-
Notifications
You must be signed in to change notification settings - Fork 50
[P2P] known TxOrphanage problems
Historically, orphan-handling was an opportunistic process and the
TxOrphanage
was not designed to be a reliable data structure.
Thus, these problems are not necessarily bugs or vulnerabilities, but
they do stop orphan-handling (and thus the existing and proposed package relay
logic) from guaranteeing reliable propagation.
Ideally, transaction relay is "censorship-resistant." Concretely, we'd like to ensure that, for any single node, as long as we have one honest peer who wants to tell us about a transaction, we download it. We hope to guarantee this even in the presence of determined adversaries who might make lots of inbound connections to this node and send specially-crafted p2p messages to make us drop important information or give up on downloading a transaction.
The other, probably more important, design goal is to limit the resource usage of orphan-handling and the orphanage data structure. In the past, we had DoS vulnerabilities from orphan handling loops through inputs and the orphanage having no memory bounds.
Ordered by the order in which they should be resolved (severity and solution ease).
After we have placed a transaction into our orphanage, any tx that has the same txid is considered "already in orphanage" and we won't add it. All 1p1c submissions are composed of a parent and a transaction taken from the orphanage.
This means an attacker could send us a mutated version of a child transaction in a 1p1c package and we'd fail to accept the parent+child package until that transaction is evicted from orphanage. Of course, once the mutated transaction is evicted, they can send another mutated version.
Solution: check (and erase) by wtxid. See #30000
When we have an orphan transaction, we
request missing parents from the sender and hope that they send it to us. If we
get notfound
or no response for the parents, we make no effort to try again.
This means that an attacker can block us from hearing about the parent in a
1p1c package by being the first sender of the child, unless our other peers
also announce it.
Solution: retry with the other peers who can provide us the parent (presumably
that is everyone who announced the orphan, since they must have had the parent
in order to validate the child). This brings us to the next limitation... ###
Only tracking 1 fromPeer We only track 1 fromPeer
for each orphan entry. When
we add a tx to orphanage, we just fill in fromPeer
with the NodeId
of the
peer who sent it. We don't add the other announcers of the tx. Similarly, if we
receive an inv
for a tx in orphanage, we just drop it and don't track that
this peer potentially knows the parent.
When we decide to request an orphan's missing parents, we call
m_txrequest.ForgetTxHash
(so that we don't continue requesting/downloading
this transaction that we've already seen and are keeping in orphanage). So we
have also forgotten about those announcers if/when we get to the point where we
have failed to resolve the orphan's missing inputs with the first peer.
See #31397
We evict randomly. This means an attacker can make it more likely for us to drop useful orphans simply by sending a large volume of orphans (which can be transactions that spend nonexistent UTXOs and never cost anything to create).
Some solutions we've discussed:
- Limiting the amount of in-flight orphan resolutions or orphanage usage per peer. We didn't like this idea since transactions, blocks, and likely orphan resolutions are usually disproportionately provided to us by the same few outbound peers. So an in-flight limit would likely slow things down in the average case. Draft.
- Changing our eviction algorithm to be more clever and try to load-balance across peers. IIRC I started implementing this but found it a bit too complicated.
- For each peer, protecting a certain amount of the orphans provided by them.
We can do this via some token bucket scheme in
which each peer gets some number of "tokens" (a counter in a
std::map
), each good for X bytes of orphanage storage. The "tokens" would be returned for successful orphan acceptance, not returned for invalid orphans, and replenished at some slow rate to a certain ceiling. Draft in #27742.
The 1p1c logic gathers all candidate children from the orphanage and tries submitting a maximum of 1 before dropping the rest on the floor. Limiting to 1 package is good to bound computation, but we may fail to accept a package if there are a lot of other candidates. These candidates might not be good ones.
This means an attacker may try to block us from accepting a package by sending many invalid children of the parent transaction.
This might be completely alleviated if we restrict 1p1c pairings to transactions from the same peer, since we will eventually attempt to resolve the honest peer's orphan. However, if they provided multiple candidates e.g. because of replacement(s), we may choose the wrong one and fail. (Partial solution in the last commit of #31397).
Solution: add a work queue for 1p1cs. This requires storing the parent transaction somewhere (probably also in orphanage) and bounding the memory used for this purpose.
TxOrphanage
has a hard upper limit of 100 orphans. This limit is global; it is
not meaningfully allocated between peers, which allows any one peer to dominate
it. It also doesn't do a good job of codifying the actual scarce resource,
which is memory. The theoretical maximum is 40MB = 100 * 400KB, but we evict
much sooner than that since most “normal” transactions are likely much smaller
than 100KvB. And actually, it may be larger, since we store orphan transactions
as CTransationRef
s, not serialized. It's also unclear if this is a good
number. It is quite large for a data struture that holds unvalidated data.