-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Max structured document size #22
Comments
are we walking about JSON Documents? because if we are, I think its better to just define the limit once, and not allow for different storage providers to use different values... for example: https://stackoverflow.com/questions/1262376/is-there-a-limit-on-how-much-json-can-hold I'd prefer to see the spec specify a set of maximum values for various things, and for vendors who want to exceed those for performance reasons to have to declare their incompatibility if its created by exceeding the spec guidelines. |
I think we need to define a limit for interoperability. We should do the same for chunk sizes for streams. |
Without a limit, a number of cases around replication or changing storage providers could get very messy or result in lock in that the standard is trying to help prevent. |
Next we need to agree on language to put into the spec that declares a limit on document size that all encrypted data vaults must abide by (and must support). The language could be something to the effect: "In order to encourage interoperability, a storage server MUST support document sizes of up to 16 MiB in size and MUST reject documents that are larger than this size. For data that is larger than 16 MiB, chunked streams MUST be used instead." Bikeshedding/PRs are welcome. |
Discussed on Jun 10, 2021 call. Tobias and Dave: less options is better, let's just pick a max size for version 1 of the spec, instead of having it be a config field. We don't want vaults to vary in that setting (otherwise, a document that fits on one vault can't migrate to another). |
No objections from the group to keeping 16MB size. |
We need to look up what the max document size is for MongoDB and express it in MiBs and then update the spec. |
Another reason for a simple limit: Migrating from EDV A to EDV B could become challenging (or not possible) without the same limits. So let's keep a single limit for version 1 of the spec. |
Saddly, BSON != JSON. |
Do we know if BSON size is always smaller than JSON? Btw, EDV implementers would be measuring by JWE size as it's the only common thing to measure. |
If we can't determine that BSON will be smaller than JSON, we should choose 10 MiB as a reasonable hedge. Multiples of 10 are easy to reason about quickly and it's likely not too small for nearly all use cases. As usual, if it's too small for a specific use case, streams can be used. There are also advantages to having the size be smaller; AEAD (authenticated encryption) requires the entire data chunk to be processed before determining whether to report an error or accept it. |
It seems that BSON can be smaller or larger than JSON, depending on the data: https://stackoverflow.com/questions/24114932/which-one-is-lighter-json-or-bson |
I'm going to create a PR shortly to address this issue. What I'm seeing so far from the group is:
Please feel free to continue discussing, and let me know if there are any objections to a 10 MiB limit! |
Your reasoning sounds solid to me, @DRK3 -- +1 to 10MiB limit. |
The spec currently has a size limit on Structured Documents, but not on Encrypted Documents. I think this should be the other way around - the limit should be on the Encrypted Documents themselves, for a few reasons:
(On a somewhat related point: since an EDV server never sees the unencrypted data (AKA Structured Document), technically I think nothing really forces you to use Structured Documents in the first place since the server won't know the difference, which begs the question: why do Structured Documents exist? I made an issue about this awhile ago: #32) |
Structured Documents exist because the client needs to know what to expect when decrypting the encrypted document, and needs to be able to encrypt documents in a way that other clients will be able to read. For example, for streams, you need to be able to encode the chunks, media type, and each chunk location in the EDV. You have to encode that information /somewhere/, and the Structured Document is that /somewhere/. The Structured Document is also has a free-form (but hopefully well specified) metadata storage location in addition to the file content. All this to say, we have Structured Documents because there has to be some place that we put file metadata and content. |
@msporny Makes sense! Thanks for explaining it to me. What do you think about having the 10 MiB limit be on the Encrypted Document size, rather than the Structured Document size? (for the reasons listed in #22 (comment)) |
Yes, +1 for the 10 MiB limit to be on the Encrypted Document size. |
Yes, this was always the intent, so if the spec says otherwise it's just a typo. |
Currently in the spec under structured documents it stipulates that the maximum document size is limited to 16MB, however this could be a decision reserved for a storage provider for which they could advertise their maximum allowed document size instead.
Suggestion is to remove this maximum document size from the spec and instead add language around how a storage provider server can advertise what their maximum size is, the spec could offer recommendations on how this should be set
The text was updated successfully, but these errors were encountered: