[svs] Tackling Scalability #3
justincpresley started this conversation in General
Introduction
Reviewing the StateVectorSync (SVS) protocol reveals a challenging yet interesting issue. An NDN Interest has a size limit (8800 bytes), yet SVS does not account for it. As a result, an SVS StateVector can grow too large to fit inside an NDN Interest packet, resulting in undefined behavior. While this is not a fault in SVS's design, it can be frustrating for practical uses of SVS. Naturally, you might ask, "What can we do about this issue?"
There are many tactics we can employ. First, however, it must be stated that the right choice depends entirely on your application. For example, the default specification (without any tactics) is best for 3 datasets with relatively small names, where the scalability issue is never encountered. Of course, libraries like this one can offer that customization or provide a hybrid implementation. The tactics described here can also be used in conjunction with one another.
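To get a feel for how quickly the limit from the introduction is reached, here is a rough size estimate in Python. It is only a sketch: the entry layout (an entry TLV wrapping an encoded name and a sequence-number TLV) and the one-byte TLV types are assumptions made for illustration, not the SVS wire format. Only the variable-length number rules follow the NDN TLV encoding.

```python
MAX_INTEREST_BYTES = 8800  # limit quoted in the introduction


def var_number_size(n: int) -> int:
    """Bytes needed for an NDN TLV VAR-NUMBER (used for TLV-TYPE and TLV-LENGTH)."""
    if n < 253:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFFFFFF:
        return 5
    return 9


def nonneg_int_size(n: int) -> int:
    """Bytes needed for an NDN non-negative integer value (1, 2, 4, or 8)."""
    if n <= 0xFF:
        return 1
    if n <= 0xFFFF:
        return 2
    if n <= 0xFFFFFFFF:
        return 4
    return 8


def tlv_size(value_len: int, type_size: int = 1) -> int:
    """Total size of a TLV with a value of value_len bytes (type assumed < 253)."""
    return type_size + var_number_size(value_len) + value_len


def entry_size(name_wire_len: int, seq_no: int) -> int:
    """Assumed layout: Entry TLV { encoded name, SeqNo TLV }."""
    seq_tlv = tlv_size(nonneg_int_size(seq_no))
    return tlv_size(name_wire_len + seq_tlv)


def state_vector_size(entries) -> int:
    """entries: iterable of (name_wire_len, seq_no) pairs."""
    return tlv_size(sum(entry_size(n, s) for n, s in entries))


if __name__ == "__main__":
    vector = [(30, 1000)] * 300  # 300 datasets with 30-byte encoded names
    size = state_vector_size(vector)
    print(size, size > MAX_INTEREST_BYTES)
```

Under these assumptions each entry costs 36 bytes, so fewer than 250 entries fit, and that is before counting the Interest name and any other fields the packet carries.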
Avoidance
The scalability problem is a dangerous one, so why not try to avoid it entirely? Avoidance is not necessarily a bad choice, as the resolutions do have tradeoffs.
Resolutions
The scalability problem comes down to a single question: "What should we do when the StateVector overflows?" The options are easy to list: we can either let the StateVector overflow into additional NDN Interests or not send the full StateVector. The former is inefficient, redundant, and troublesome to implement; I hesitate to call this brute-force approach a solution at all.
The latter, however, is interesting to explore. If we send only a partition of the vector, new problems arise, yet they are fundamentally different from our original scalability problem.
Entry Selection
Our first order of business is deciding which entries to include in the partition. This is exactly what this paper discusses. Its conclusion, in effect, is that including the most recently updated entries achieves the lowest latency. One notable point not discussed in that paper is the use of application context: if certain datasets are more valuable than others, the selection can be altered to include the more valuable datasets more often.
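As a minimal sketch of such a selection, the Python below greedily fills a byte budget with the most recently updated entries, optionally letting an application-assigned priority take precedence. The `Entry` fields, the `priority` weight, and the precomputed `size` are all hypothetical names introduced for illustration; neither the paper nor SVS prescribes them.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    node_id: str
    seq_no: int
    last_updated: float  # local time the entry last changed (assumed bookkeeping)
    size: int            # encoded size in bytes (assumed to be precomputed)
    priority: int = 0    # hypothetical application weight; higher is more valuable


def select_entries(entries, budget: int):
    """Greedy pick: highest priority first, then most recently updated.

    Entries that do not fit in the remaining budget are skipped, and
    smaller entries further down the ranking may still be included.
    """
    ranked = sorted(entries, key=lambda e: (e.priority, e.last_updated), reverse=True)
    chosen, used = [], 0
    for e in ranked:
        if used + e.size <= budget:
            chosen.append(e)
            used += e.size
    return chosen


if __name__ == "__main__":
    entries = [
        Entry("/a", 5, last_updated=1.0, size=40),
        Entry("/b", 9, last_updated=3.0, size=40),
        Entry("/c", 2, last_updated=2.0, size=40),
    ]
    print([e.node_id for e in select_entries(entries, budget=80)])
```

With all priorities equal this reduces to "latest updates first". Raising the priority of a dataset makes it appear in more sync Interests at the expense of fresher but less valuable ones.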
Dealing with Incomplete State
For some applications, stopping here is acceptable. However, for the many applications whose datasets are not ephemeral/atomic (they rely on the larger context of all datasets), this solution is incomplete. A new question remains: "How do I know I am in sync?" Because the partition becomes an increasingly small portion of the entire StateVector, a node can fall out of sync given a large enough StateVector.
Again, there are two answers: a global approach and an individual one.
Global Approach for Ensuring State
The first answer is to periodically send the full StateVector by overflowing it into multiple NDN Interests. This ensures that, at a given time t, all nodes are in sync. Conceptually, this is the same as the first resolution to the scalability problem, and it shares the same drawbacks. In practice, it would cause large network spikes every time we do a full sync. We can do better.
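To make the cost of a full sync concrete, the sketch below splits a list of encoded entry sizes into groups that each fit one Interest. The function name and the idea of working on precomputed sizes are assumptions for illustration. How the resulting Interests would be named, ordered, or reassembled is left open here.

```python
def split_into_chunks(entry_sizes, budget):
    """Group consecutive entries so that each group fits within the budget.

    entry_sizes: list of encoded entry sizes in bytes.
    Returns a list of lists of entry indices, one list per Interest.
    """
    chunks, current, used = [], [], 0
    for i, size in enumerate(entry_sizes):
        if size > budget:
            raise ValueError(f"entry {i} alone exceeds the budget")
        if used + size > budget:
            # Current Interest is full; start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(split_into_chunks([40] * 5, budget=100))
```

The number of Interests per full sync grows linearly with the size of the StateVector, and every node sends them, which is where the network spike comes from.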
Individual Approach for Ensuring State
This answer rests on two observations: not all nodes need to be synced, and discovering whether a node is in sync is not the same as syncing it. So how can we help a node determine whether it needs to be synced? We can do so by providing MetaInfo about the full StateVector. Given enough MetaInfo, nodes that are out of sync, either because they just joined or because of frequent loss, can detect a meaningful difference. If they notice they are behind, they can retrieve the full StateVector from a synced node. They can find a synced node either by pinging the nodes or by including a sender ID within the sync Interest (which removes Interest aggregation).
MetaInfo
The MetaInfo must appear right before the StateVector, but what does it look like? The structure is optimized: its very first field is a boolean indicating whether the StateVector is complete. If it is complete, the structure ends there. If not, the following is included.
Approximate Metadata:
Specific Metadata:
Deciding what else the MetaInfo includes can be settled through further testing and design. For example, a hash may be useful for detecting small inconsistencies.
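One possible shape for such a structure is sketched below in Python. Only the leading completeness flag and the idea of a hash come from the text above. The `total_entries` field, the SHA-256 digest over a sorted vector, and the function names are hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def vector_digest(state: dict) -> bytes:
    """Hash of the full vector in a canonical (sorted) order."""
    h = hashlib.sha256()
    for node_id in sorted(state):
        h.update(f"{node_id}={state[node_id]};".encode())
    return h.digest()


@dataclass
class MetaInfo:
    complete: bool                       # first field: is the StateVector complete?
    total_entries: Optional[int] = None  # hypothetical; only set when incomplete
    digest: Optional[bytes] = None       # hypothetical; only set when incomplete


def make_meta(full_state: dict, sent: dict) -> MetaInfo:
    """Build MetaInfo for a partition `sent` taken from `full_state`."""
    if len(sent) == len(full_state):
        return MetaInfo(complete=True)
    return MetaInfo(False, len(full_state), vector_digest(full_state))


def needs_full_sync(local_state: dict, received: dict, meta: MetaInfo) -> bool:
    """Merge the received partition, then compare with the sender's summary."""
    merged = dict(local_state)
    for node_id, seq in received.items():
        merged[node_id] = max(merged.get(node_id, 0), seq)
    if meta.complete:
        return False
    return len(merged) != meta.total_entries or vector_digest(merged) != meta.digest


if __name__ == "__main__":
    full = {"/a": 5, "/b": 3, "/c": 7}
    partition = {"/c": 7}
    meta = make_meta(full, partition)
    print(needs_full_sync({"/a": 5, "/c": 6}, partition, meta))
```

Note that a digest mismatch only says the two vectors differ, not which side is behind, so a node that is ahead of the sender would also see a difference. Distinguishing those cases is one of the things further design would need to settle.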
Discussion
If you would like to contribute to this discussion, please do so! There are many research directions and questions like this one that arguably need community engagement. Whether it is to clarify my wording or to offer an idea, your input is welcome.