prefactor: Allow multiple htlcs in/out in forwarding events for trampoline#4304
prefactor: Allow multiple htlcs in/out in forwarding events for trampoline#4304carlaKC wants to merge 19 commits intolightningdevkit:mainfrom
Conversation
|
👋 Thanks for assigning @valentinewallace as a reviewer! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4304 +/- ##
==========================================
- Coverage 85.93% 85.86% -0.08%
==========================================
Files 159 159
Lines 104693 104960 +267
Branches 104693 104960 +267
==========================================
+ Hits 89972 90119 +147
- Misses 12213 12331 +118
- Partials 2508 2510 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
🔔 1st Reminder Hey @valentinewallace! This PR has been waiting for your review. |
lightning/src/events/mod.rs
Outdated
| (11, next_user_channel_id, option), | ||
| (13, prev_node_id, option), | ||
| (15, next_node_id, option), | ||
| // Type 9 was prev_user_channel_id in 0.2 and earlier. |
There was a problem hiding this comment.
IMO it would be nice to write the old fields in cases where we have only one input. That way downgrade still includes the fields that users might have expected and even unwrapped safely.
There was a problem hiding this comment.
IMO it would be nice to write the old fields in cases where we have only one input.
Would you be happy to just write the first HLTC to the legacy fields all the time, for the sake of simplicity?
There was a problem hiding this comment.
Sure, seems fine too. We'll want to document the downgrade behavior in the docs on the event.
lightning/src/events/mod.rs
Outdated
| 25u8.write(writer)?; | ||
| write_tlv_fields!(writer, { | ||
| (0, prev_channel_id, required), | ||
| // Type 0 was prev_channel_id in 0.2 and earlier. |
There was a problem hiding this comment.
We definitely need to write this field still so that we can downgrade at all.
lightning/src/ln/channelmanager.rs
Outdated
| 1 => Ok(HTLCSource::PreviousHopData(Readable::read(reader)?)), | ||
| 2 => { | ||
| let mut previous_hop_data = Vec::new(); | ||
| let mut incoming_trampoline_shared_secret: crate::util::ser::RequiredWrapper<[u8; 32]> = crate::util::ser::RequiredWrapper(None); |
There was a problem hiding this comment.
init_and_read_tlv_fields may be helpful here.
|
🔔 2nd Reminder Hey @valentinewallace! This PR has been waiting for your review. |
b9a8b09 to
2c52690
Compare
|
🔔 3rd Reminder Hey @valentinewallace! This PR has been waiting for your review. |
|
🔔 4th Reminder Hey @valentinewallace! This PR has been waiting for your review. |
|
🔔 5th Reminder Hey @valentinewallace! This PR has been waiting for your review. |
valentinewallace
left a comment
There was a problem hiding this comment.
I gave it an initial skim some time ago and Matt covered all the feedback I had at the time, so I think this should probably be good. Can we squash/rebase and I'll take a closer look? 🙏 yay trampoline progress!
13c172a to
5cb4c2f
Compare
|
Squashed old fixups and added a few changes that came up finishing up the full trampoline flow. Rebase is non-trivial, so going to hold off on that until the new changes have had a look! |
TheBlueMatt
left a comment
There was a problem hiding this comment.
I haven't thought too deeply about the replay but at a high level it makes sense. Will let @valentinewallace opine on it as well.
| } => Self::TrampolineForward { | ||
| session_priv: outbound_payment | ||
| .as_ref() | ||
| .map(|o| o.session_priv.secret_bytes()) |
There was a problem hiding this comment.
IIUC there will be multiple HTLCSources for a single trampoline forward, due to differing TrampolineDispatch for each outbound MPP part. Thus, each outgoing HTLC will be/needs to have a different SentHTLCId. Might be worth commenting that here or on HTLCSource.
| onion_error.decode_onion_failure(&self.secp_ctx, &self.logger, &source); | ||
| let incoming_trampoline_shared_secret = Some(*incoming_trampoline_shared_secret); | ||
|
|
||
| // TODO: when we receive a failure from a single outgoing trampoline HTLC, we don't |
There was a problem hiding this comment.
This somewhat complicated landing things - if we add support for decoding new fields such that downgrades will not fail on load, we also need to actually handle things safely, which this isn't. Is your intention to hold off on this and just land all of trampoline in one go (probably for 0.4) or should we break some deserialization logic (maybe refuse to deserialize HTLCSources with trampolines for now) so we can make incremental progress?
There was a problem hiding this comment.
I believe this will be fine to merge as-is because we'll store dispatched trampoline payments in pending_outbound_payments with an even TLV, so we wouldn't be able to downgrade while we have pending trampoline payments?
Failing to deserialize HTLCSource sounds like a good safety stop if we don't want to depend on that, would be nice to land incrementally.
There was a problem hiding this comment.
Yea, would be nice to not depend on it I think because we don't want to rely on the channelmanager persistence happening. Maybe we can add a trivial custom TLV in the new HTLCSource case that just always Err(DecodeError::UnknownRequiredFeature)s?
There was a problem hiding this comment.
Good point. Updated read of HTLCSource::Trampoline to fail, easy enough to re-add it when we need it.
Change is here, along with a few of the changes that Val requested since your last review.
|
🔔 1st Reminder Hey @valentinewallace! This PR has been waiting for your review. |
valentinewallace
left a comment
There was a problem hiding this comment.
Didn't quite finish but made it more than halfway. Looks good! 🫡
lightning/src/events/mod.rs
Outdated
| // We can unwrap in the eagerly-evaluated default_value code because we | ||
| // always write legacy fields to be backwards compatible, and expect | ||
| // this field to be set because the legacy field was only None for versions | ||
| // before 0.0.107 and we do not allow upgrades with pending forwards to 0.1 | ||
| // for any version before 0.0.123. |
There was a problem hiding this comment.
OOC, did you look into what version the user_channel_id/node_id fields started being included? Just curious if we can do something similar
There was a problem hiding this comment.
tbh didn't look into it because they're both Option in HTLCPreviousHopData.
They were added to PaymentForwarded:
node_id= 0.1: I believe we still allow direct upgrades w/ pending htlcs so can't unwrap?user_chanel_id= 0.0.123: so could do the same upgrade trick (comment should be <= 0.0.123, will fix), but would have to unwrap inclaim_funds_internal
| let reason = HTLCFailReason::from_failure_code(LocalHTLCFailureReason::ChannelClosed); | ||
| let (source, hash) = htlc_source; | ||
| self.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, None); | ||
| let failure_type = source.failure_type(*counterparty_node_id, msg.channel_id); |
There was a problem hiding this comment.
I appreciate the rename from receiver to failure_type in this diff 😆
| @@ -1381,8 +1432,11 @@ impl_writeable_tlv_based_enum_upgradable!(MonitorUpdateCompletionAction, | |||
| (4, blocking_action, upgradable_required), | |||
| (5, downstream_channel_id, required), | |||
| }, | |||
| (2, EmitEventAndFreeOtherChannel) => { | |||
| (0, event, upgradable_required), | |||
| (2, EmitEventOptionAndFreeOtherChannel) => { | |||
There was a problem hiding this comment.
I think we might be able to omit this change and repurpose the FreeOtherChannelImmediately variant, which seems equivalent to this variant when the event is None? At least this passes tests, though the manager read portion may need a closer look: https://pastebin.com/5BGVXHrB. It seems like a weird state if neither field is set, which would currently be allowed.
There was a problem hiding this comment.
I took a look at that and got caught up on the expectation that FreeOtherChannelImmediately isn't persisted, because each trampoline claim is a NewClaim which pushes into monitor_update_blocked_actions which will get persisted (previously we'd only create this action for duplicates which are handled differently).
If that's okay (we can after all remove the assert like you've done) then agree this is the way to go 👍
There was a problem hiding this comment.
Hmm that's a good point, I'm definitely not sure it makes sense to use FreeOtherChannelImmediately (btw, the patch I posted was Claude'd/sus). Mostly not a fan that the state EmitOptionalEventAndFreeOtherChannel { event: None, channel: None } is allowed, which seems like something we can lean on the compiler to prevent. Maybe it would make more sense to add a new variant FreeInboundChannelTrampoline?
There was a problem hiding this comment.
Maybe it would make more sense to add a new variant FreeInboundChannelTrampoline?
Also looked at this but IIRC it led to quite a lot of duplication - let me claude up my own sus patch and see what it looks like.
There was a problem hiding this comment.
Ooh yeah I would buy that it leads to a lot of extra code. Wdyt about something like this? 796704c i.e. add an extra bool to the existing FreeOtherChan variant. I'm not sure if we'll also want some way to explicitly detect that it's a partial trampoline case before creating the variant with the bool set, however.
There was a problem hiding this comment.
Mostly not a fan that the state EmitOptionalEventAndFreeOtherChannel { event: None, channel: None } is allowed
Looking at downstream_counterparty_and_funding_outpoint, it's only Option because it was added in 0.0.116 - so we could flip this to required (because we don't allow in-flight upgrades from <= 0.0.123 to 0.1+) and then that weird state is impossible?
I think I prefer this to having a trampoline-related field in MonitorUpdateCompletionAction.
There was a problem hiding this comment.
That seems reasonable 👍 It might be nice if the FreeOtherChannelImmediately name included "duplicate" somehow to disambiguate between that case and EmitEventAndFree where the event is None, but the docs are at least clear on the difference.
valentinewallace
left a comment
There was a problem hiding this comment.
Looks good pending outstanding feedback! I haven't looked at the follow-up too closely so obviously having faith that some things like the outbound payments, startup replay stuff to be resolved there.
lightning/src/ln/channelmanager.rs
Outdated
| .. | ||
| } => { | ||
| // TODO: what do we want to do with this given we do not wish to propagate it directly? | ||
| let _decoded_onion_failure = |
| @@ -1381,8 +1432,11 @@ impl_writeable_tlv_based_enum_upgradable!(MonitorUpdateCompletionAction, | |||
| (4, blocking_action, upgradable_required), | |||
| (5, downstream_channel_id, required), | |||
| }, | |||
| (2, EmitEventAndFreeOtherChannel) => { | |||
| (0, event, upgradable_required), | |||
| (2, EmitEventOptionAndFreeOtherChannel) => { | |||
There was a problem hiding this comment.
That seems reasonable 👍 It might be nice if the FreeOtherChannelImmediately name included "duplicate" somehow to disambiguate between that case and EmitEventAndFree where the event is None, but the docs are at least clear on the difference.
5cb4c2f to
f952f59
Compare
|
Pushed suggested changes to update: eurgh, will fix clippy on squash |
|
For replay, I've gone through each restart scenario and I'm pretty confident that we'll have the information that we need on hand with this groundwork (I am new to this part of the codebase though!). There are some recent changes on main that I haven't written but have looked into and I think that we need to:
|
valentinewallace
left a comment
There was a problem hiding this comment.
Latest updates LGTM, I'm good with a rebase
f952f59 to
f15a1c5
Compare
In the commits that follow, we want to be able to free the other channel without emitting an event so that we can emit a single event for trampoline payments with multiple incoming HTLCs. We still want to go through the full claim flow for each incoming HTLC (and persist the EmitEventAndFreeOtherChannel event to be picked up on restart), but do not want multiple events for the same trampoline forward. Changing from upgradable_required to upgradable_option is forwards compatible - old versions of the software will always have written this field, newer versions don't require it to be there but will be able to read it as-is. This change is not backwards compatible, because older versions of the software will expect the field to be present but newer versions may not write it. An alternative would be to add a new event type, but that would need to have an even TLV (because the event must be understood and processed on restart to claim the incoming HTLC), so that option isn't backwards compatible either.
`downstream_counterparty_and_funding_outpoint` was added to LDK in 0.0.116. We do not allow direct upgrades with pending forwards to 0.1 from 0.0.123 and below, so we can now assume that this field will always be present. This change also makes it impossible to create a `EmitEventOptionAndFreeOtherChannel` action with nothing in it (no event or channel), which could have been possible now that we've made the event optional).
In preparation for trampoline failures, allow multiple previous channel ids. We'll only emit a single HTLCHandlingFailed for all of our failed back HTLCs, so we want to be able to express all of them in one event.
This commit adds a SendHTLCId for trampoline forwards, identified by their session_priv. As with an OutboundRoute, we can expect our HTLC to be uniquely identified by a randomly generated session_priv. TrampolineForward could also be identified by the set of all previous outbound scid/htlc id pairs that represent its incoming HTLC(s). We choose the 32 byte session_priv to fix the size of this identifier rather than 16 byte scid/id pairs that will grow with the number of incoming htlcs.
We only have payment details for HTLCSource::TrampolineForward available once we've dispatched the payment. If we get to the stage where we need a HTLCId for the outbound payment, we expect dispatch details to be present. Co-authored-by: Arik Sosman <git@arik.io> Co-authored-by: Maurice Poirrier <mpch@hey.com>
To create the right handling type based on source, add a helper. This is mainly useful for PreviousHopData/TrampolineForward. This helper maps an OutboundRoute to a HTLCHandlingFailureType::Forward. This value isn't actually used once we reach `forward_htlc_backwards_internal`, because we don't emit `HTLCHandlingFailed` events for our own payments. This issue is pre-existing, and could be addressed with an API change to the failure function, which is left out of scope of this work.
Will need to share this code when we add trampoline forwarding. This commit exactly moves the logic as-is, in preparation for the next commit that will update to suit trampoline. Co-authored-by: Arik Sosman <git@arik.io> Co-authored-by: Maurice Poirrier <mpch@hey.com>
When we introduce trampoline forwards, we're going to want to provide two external pieces of information to create events: - When to emit an event: we only want to emit one trampoline event, even when we have multiple incoming htlcs. We need to make multiple calls to claim_funds_from_htlc_forward_hop to claim each individual htlc, which are not aware of each other, so we rely on the caller's closure to decide when to emit Some or None. - Forwarding fees: we will not be able to calculate the total fee for a trampoline forward when an individual outgoing htlcs is fulfilled, because there may be other outgoing htlcs that are not accounted for (we only get the htlc_claim_value_msat for the single htlc that was just fulfilled). In future, we'll be able to provide the total fee from the channelmanager's top level view.
Implement payment claiming for `HTLCSource::TrampolineForward` by iterating through previous hop data and claiming funds for each HTLC. Co-authored-by: Arik Sosman <git@arik.io> Co-authored-by: Maurice Poirrier <mpch@hey.com>
We'll want this extracted when we need to handle trampoline and regular forwards. Co-authored-by: Arik Sosman <git@arik.io> Co-authored-by: Maurice Poirrier <mpch@hey.com>
Implement failure propagation for `HTLCSource::TrampolineForward` by iterating through previous hop data and failing each HTLC with `TemporaryTrampolineFailure`. Note that testing should be implemented when trampoline forward is completed. Co-authored-by: Arik Sosman <git@arik.io> Co-authored-by: Maurice Poirrier <mpch@hey.com>
Move recovery logic for `HTLCSource::PreviousHopData` into `channel_monitor_recovery_internal` to prepare for trampoline forward reuse. Co-authored-by: Arik Sosman <git@arik.io> Co-authored-by: Maurice Poirrier <mpch@hey.com>
Implement channel monitor recovery for trampoline forwards iterating over all hop data and updating pending forwards.
This commit uses the existing outbound payment claims replay logic to restore trampoline claims. If any single previous hop in a htlc source with multiple previous hops requires claim, we represent this with a single outbound claimed htlc because we assume that *all* of the incoming htlcs are represented in the source, and will be appropriately claimed (rather than submitting multiple claims, which will end up being duplicates of each other). This is the case for trampoline payments, where the htlc_source stores all previous hops.
f15a1c5 to
8ce1604
Compare
|
Squashed and rebased. Primary changes in rebase were 713a9d9 and a94f410 because the logic had changed quite a bit on main, also fixed test failure + clippy. Range diff for rebase is here. I'm going to open up a followup with the changes needed to support replay of trampoline htlcs. We're not in any danger of downgrading to a version that can't handle this replay properly because we refuse to downgrade ( |
TheBlueMatt
left a comment
There was a problem hiding this comment.
Got through the first half again, will resume in a few hours.
| 7u8.write(writer)?; | ||
| // Fields 1, 3, 9, 11, 13 and 15 are written for backwards compatibility. | ||
| let legacy_prev = prev_htlcs.first().ok_or(io::Error::from(IOInvalidData))?; | ||
| let legacy_next = next_htlcs.first().ok_or(io::Error::from(IOInvalidData))?; |
There was a problem hiding this comment.
Hmm, we generally strictly require that writes don't fail (and panic if writes to a Vec fail, in a few places). Can we instead debug_assert and write garbage (by skipping the TLV)? Same elsewhere in later commits.
| /// The `user_channel_id` for `channel_id`. | ||
| pub user_channel_id: Option<u128>, | ||
|
|
||
| /// The public key identity of the node that the HTLC was sent to or received from. |
There was a problem hiding this comment.
We should keep the information on since which version these are always set.
| // before 0.0.107 and we do not allow upgrades with pending forwards to 0.1 | ||
| // for any version 0.0.123 or earlier. | ||
| (17, prev_htlcs, (default_value, vec![HTLCLocator{ | ||
| channel_id: prev_channel_id_legacy.unwrap(), |
There was a problem hiding this comment.
We can't unwrap this unless we do the above read as required rather than option. We shouldn't encounter such data, of course, so switching to required seems fine, but we don't want to crash if we do.
| downstream_counterparty_and_funding_outpoint: Option<EventUnblockedChannel>, | ||
| EmitEventOptionAndFreeOtherChannel { | ||
| event: Option<events::Event>, | ||
| downstream_counterparty_and_funding_outpoint: EventUnblockedChannel, |
There was a problem hiding this comment.
Does it make sense to, instead of doing many EmitEventOptionAndFreeOtherChannels, instead do a single EmitEventAndFreeOtherChannel*s*? Its maybe not super critical/equivalent but it seems like it might simplify generation.
| /// We were responsible for pathfinding and forwarding of a trampoline payment, but failed to | ||
| /// do so. An example of such an instance is when we can't find a route to the specified | ||
| /// trampoline destination. | ||
| TrampolineForward {}, |
There was a problem hiding this comment.
Can come in a followup but we probably want to include more info here, like a routing error message or so (at least so we capture things like #4308)
TheBlueMatt
left a comment
There was a problem hiding this comment.
Okay actually got through it. All looks good, a few comments on structure and API design that aren't critical and a few minor notes but generally good.
| } | ||
|
|
||
| impl HTLCSource { | ||
| pub fn failure_type( |
There was a problem hiding this comment.
I mean I get it, but its a bit weird that we're just returning the fields passed in. I do wonder if this isn't an opportunity to move the Event::HTLCHandlingFailed::prev_channel_ids into HTLCHandlingFailureType so they can be built here?
|
|
||
| failed_forwards.push(( | ||
| // This can't be a trampoline payment because we don't process them | ||
| // as forwards (we're the last/"receiving" onion node). |
There was a problem hiding this comment.
Can we add a debug_assert!(false, "We don't handle trampoline failures in failure_handler above"); in the else branch to the if let PendingHTLCRouting::Forward { ref onion_packet, .. } = routing {? Seems to pass tests.
| ); | ||
|
|
||
| Some(events::Event::PaymentForwarded { | ||
| prev_htlcs: prev_htlcs.clone(), |
There was a problem hiding this comment.
nit: you can make this an FnOnce to avoid the clone.
| } | ||
| } | ||
|
|
||
| fn prune_forwarded_htlc( |
There was a problem hiding this comment.
We now have both a local lambda called prune_forwarded_htlc and a separate method called prune_forwarded_htlc, both identical.
| } | ||
|
|
||
| let mut fail_read = false; | ||
| if prev_hop.counterparty_node_id.is_none() { |
There was a problem hiding this comment.
I don't think we really need to retain this crap at this point. If they made it all the way to 0.3 and haven't hit this yet, we can just fail to deserialize period.
| fail_read |= fail_claim_read; | ||
| return Some(( | ||
| // When we have multiple prev_htlcs we assume that they all | ||
| // share the same htlc_source which contains all previous hops, |
There was a problem hiding this comment.
Can we validate this? Seems easy to validate?
This PR is a prefactor to support trampoline in LDK. The last commit in #3976 contains the remainder of the trampoline logic, which may be useful to understanding the context for some of these refactors. These changes move us from a one HTLC in/ one HTLC out world to one where we allow multiple incoming and multiple outgoing HTLCs, to allow MPP trampoline:
In this world, we only want to emit one
PaymentForwarded/HTLCHandlingFailedevent per trampoline forward. We will receive and process aupdate_fail/fulfillfrom each of our downstream peers (D+E), and need to understand whether we want to emit an event:pending_outbound_payments. If we only fail back once the last outgoing HTLC is cleared, we'll only emit one event.🧹 A few of these commits can definitely be squashed - I went with splitting move/rename commits up and using a few
todo!s that are later filled in to help keep review more readable. I don't mind this, but happy to squash where people see fit!Also could use some tests - will add once approach has an ACK.