Skip to content

[P2P] known TxOrphanage problems

Gloria Zhao edited this page Jan 3, 2025 · 4 revisions

Historically, orphan-handling was an opportunistic process and the TxOrphanage was not designed to be a reliable data structure. Thus, these problems are not necessarily bugs or vulnerabilities, but they do stop orphan-handling (and thus the existing and proposed package relay logic) from guaranteeing reliable propagation.

Ideal Situation

Ideally, transaction relay is "censorship-resistant." Concretely, we'd like to ensure that, for any single node, as long as we have one honest peer who wants to tell us about a transaction, we download it. We hope to guarantee this even in the presence of determined adversaries who might make lots of inbound connections to this node and send specially-crafted p2p messages to make us drop important information or give up on downloading a transaction.

The other, probably more important, design goal is to limit the resource usage of orphan-handling and the orphanage data structure. In the past, we had DoS vulnerabilities from orphan handling loops through inputs and the orphanage having no memory bounds.

Known Problems

Ordered by the order in which they should be resolved (severity and solution ease).

One per txid (resolved)

After we have placed a transaction into our orphanage, any tx that has the same txid is considered "already in orphanage" and we won't add it. All 1p1c submissions are composed of a parent and a transaction taken from the orphanage.

This means an attacker could send us a mutated version of a child transaction in a 1p1c package and we'd fail to accept the parent+child package until that transaction is evicted from orphanage. Of course, once the mutated transaction is evicted, they can send another mutated version.

Solution: check (and erase) by wtxid. See #30000

No effort to retry parent fetching

When we have an orphan transaction, we request missing parents from the sender and hope that they send it to us. If we get notfound or no response for the parents, we make no effort to try again. This means that an attacker can block us from hearing about the parent in a 1p1c package by being the first sender of the child, unless our other peers also announce it.

Solution: retry with the other peers who can provide us the parent (presumably that is everyone who announced the orphan, since they must have had the parent in order to validate the child). This brings us to the next limitation... ### Only tracking 1 fromPeer We only track 1 fromPeer for each orphan entry. When we add a tx to orphanage, we just fill in fromPeer with the NodeId of the peer who sent it. We don't add the other announcers of the tx. Similarly, if we receive an inv for a tx in orphanage, we just drop it and don't track that this peer potentially knows the parent.

When we decide to request an orphan's missing parents, we call m_txrequest.ForgetTxHash (so that we don't continue requesting/downloading this transaction that we've already seen and are keeping in orphanage). So we have also forgotten about those announcers if/when we get to the point where we have failed to resolve the orphan's missing inputs with the first peer.

See #31397

Easy Churnability

We evict randomly. This means an attacker can make it more likely for us to drop useful orphans simply by sending a large volume of orphans (which can be transactions that spend nonexistent UTXOs and never cost anything to create).

Some solutions we've discussed:

  • Limiting the amount of in-flight orphan resolutions or orphanage usage per peer. We didn't like this idea since transactions, blocks, and likely orphan resolutions are usually disproportionately provided to us by the same few outbound peers. So an in-flight limit would likely slow things down in the average case. Draft.
  • Changing our eviction algorithm to be more clever and try to load-balance across peers. IIRC I started implementing this but found it a bit too complicated.
  • For each peer, protecting a certain amount of the orphans provided by them. We can do this via some token bucket scheme in which each peer gets some number of "tokens" (a counter in a std::map), each good for X bytes of orphanage storage. The "tokens" would be returned for successful orphan acceptance, not returned for invalid orphans, and replenished at some slow rate to a certain ceiling. Draft in #27742.

No effort to try all package combinations

The 1p1c logic gathers all candidate children from the orphanage and tries submitting a maximum of 1 before dropping the rest on the floor. Limiting to 1 package is good to bound computation, but we may fail to accept a package if there are a lot of other candidates. These candidates might not be good ones.

This means an attacker may try to block us from accepting a package by sending many invalid children of the parent transaction.

This might be completely alleviated if we restrict 1p1c pairings to transactions from the same peer, since we will eventually attempt to resolve the honest peer's orphan. However, if they provided multiple candidates e.g. because of replacement(s), we may choose the wrong one and fail. (Partial solution in the last commit of #31397).

Solution: add a work queue for 1p1cs. This requires storing the parent transaction somewhere (probably also in orphanage) and bounding the memory used for this purpose.

General resource management inefficiencies

TxOrphanage has a hard upper limit of 100 orphans. This limit is global; it is not meaningfully allocated between peers, which allows any one peer to dominate it. It also doesn't do a good job of codifying the actual scarce resource, which is memory. The theoretical maximum is 40MB = 100 * 400KB, but we evict much sooner than that since most “normal” transactions are likely much smaller than 100KvB. And actually, it may be larger, since we store orphan transactions as CTransationRefs, not serialized. It's also unclear if this is a good number. It is quite large for a data struture that holds unvalidated data.