Proposal: Add content-encoding support to spec for posting blobs #235

Open
sargun opened this issue Feb 15, 2021 · 25 comments

Comments

@sargun
Contributor

sargun commented Feb 15, 2021

As we want to future-proof the protocol, I suggest we add Content-Encoding support. This would make it possible to push content with individual chunks compressed in whichever format best fits their use case, as opposed to whatever the media type dictates. This enables us in the future to move to better formats. I specifically think that this should be supported on the upload step.

Flow

In the step where the user creates a new session (/v2//blobs/uploads/), the server responds with Distribution-accept-encoding: ..., where ... uses the standard mechanism for listing accepted encodings and supports the quality values described in the HTTP spec (https://tools.ietf.org/html/rfc7231#section-5.3.1).

For example:

POST /v2/ubuntu/blobs/uploads/ HTTP/1.1
Accept-Encoding: gzip, deflate
Content-Length: 0
Host: ...

HTTP/1.1 202 Accepted
Content-Length: 0
Docker-Upload-Uuid: 752b820d-d52d-4a58-a816-4fcb431122fa
Location: https://.../v2/ubuntu/blobs/uploads/752b820d-d52d-4a58-a816-4fcb431122fa?_state=MDzIPDnJ5OSmNh6P7i_SK-dDAD%3D%3D
Range: 0-0
Distribution-accept-encoding: gzip; q=0.8, identity; q=0.2, zstd, br
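
To make the negotiation above concrete, here is a minimal Python sketch of how a client might rank the codings a registry advertises. It assumes the proposed Distribution-accept-encoding value follows the comma-separated Accept-Encoding grammar from RFC 7231 (a missing q-value counts as q=1). The function name `parse_accept_encoding` is made up for illustration and is not part of any spec or library.

```python
def parse_accept_encoding(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Encoding-style value into (coding, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # RFC 7231: no q parameter means a weight of 1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((coding.lower(), q))
    # sorted() is stable, so codings with the same q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


# Hypothetical header value, written in well-formed Accept-Encoding syntax.
for coding, q in parse_accept_encoding("gzip;q=0.8, identity;q=0.2, zstd, br"):
    print(coding, q)
```

A client would then walk this list and use the first coding it can actually produce.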

The other aspect that I think we should require is that the user use the same content-encoding for all blobs in a multi-blob push.

FAQ

Question: Why not return a body like:

{
  "acceptsEncodings": ["zstd", "br"]
}

Answer: This endpoint does not return a payload today, and it's unknown whether returning a payload would break any clients. IMHO, we should move the location / session info into a JSON body, but that's a bigger question

Question: Why not use transfer encoding?
Answer: It doesn't seem to have support of new compression standards.

Question: Why don't we just add media types with all possible encoding standards?
Answer: Those are largely dictated by the image spec. The current behaviour of coupling the distribution and image spec tightly makes it difficult to adopt new things in distribution. Since distribution has come out, zstd, br, and others have become much better than gzip in many ways. Lifting and shifting today would require a massive change or on the fly conversion, which isn't always possible, especially with cryptographically verified payloads.

Question: Why wouldn't we just zstd all the things?
Answer: It's not always the smallest:

320K	small.br
352K	small.gz
288K	small.xz
292K	small.zst

193M	comp.gz
 63M	comp.xz
110M	comp.zst
545M	comp

In the day of slow, WFH internet, size matters a lot to people. Scaling up endpoint compute is much easier in many cases.
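
A quick way to repeat this kind of comparison on your own data is sketched below in Python. It only uses codecs from the standard library (zlib, bz2, lzma), so zstd and br are not covered; treat those as stand-ins, and the sample payload as an arbitrary assumption. Which codec wins depends heavily on the input.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the encoded size of `data` under a few stdlib codecs."""
    return {
        "identity": len(data),
        "deflate (zlib, level 9)": len(zlib.compress(data, 9)),
        "bzip2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }


# Arbitrary, highly repetitive sample; real layers will behave differently.
sample = b"layer contents tend to repeat themselves\n" * 2000
for name, size in sorted(compressed_sizes(sample).items(), key=lambda kv: kv[1]):
    print(f"{size:>8}  {name}")
```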

@sargun changed the title from "Add content-encoding request header to spec" to "Proposal: Add content-encoding support to spec for posting blobs" on Feb 15, 2021
@justincormack

Content-Encoding supports zstd; it's in the registry: https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding

@justincormack

The main issue with changing to content encoding is handling clients who can't understand the format, as decompressing and, e.g., recompressing with gzip is expensive on the server side, assuming that the server is storing blobs with zstd. Also the client can't verify the blobs until uncompressed.

@sargun
Contributor Author

sargun commented Feb 16, 2021

@justincormack Yeah, but I think supporting Content-encoding is a nice "SHOULD" to have in the registry spec. I understand that decompressing / recompressing gzip is expensive. You can keep the content compressed ahead of time if you want server side. On the other hand, streaming decompression + hashing isn't that much overhead compared to CPU for home users. CPU has gotten way cheaper, but 10Mbps throughput still isn't common for lots of folks.

@jonjohnsonjr
Contributor

I am a little worried about giving the registry more visibility into blobs. Right now, they are completely opaque to the registry until you push a descriptor that references them, so registries generally don't try to do anything cute with blobs, which gives flexibility to clients. My concern also applies to #101

I don't feel strongly enough about this for it to sway my opinion on either issue, but I wanted to bring it up as a discussion point. If we give registries more information, it becomes possible to act on that information, even if we have spec language that says something like:

Any other types will be ignored by a registry and not be reflected when that blob is pulled.

@sargun
Contributor Author

sargun commented Feb 18, 2021

@jonjohnsonjr I believe the content-encoding could still be opaque if the registry did not want to do "transcoding" -- it would just only be allowed to serve content with the same content encoding (or reject it, and only allow for the identity encoding).

On the other hand, from an efficiency perspective, uploading uncompressed data has some major benefits.

  1. Dedupe
  2. Shared dictionaries for compression
  3. Being able to rev the compression formats / encoding based on the use case
  4. Serving the right encoding based on if the user is in the DC vs. at home on a low bandwidth connection

@jonjohnsonjr
Contributor

The other aspect that I think we should require is that the user use the same content-encoding for all blobs in a multi-blob push.

Why? This seems like it might limit the ability to cross-repo mount blobs.

Question: Why don't we just add media types with all possible encoding standards?
Those are largely dictated by the image spec.

I disagree with this point. There are OCI-specific media types defined, but that does not exclude other media types from being used as long as they conform with RFC 6838. I think a lot of folks (including myself) have made some incorrect assumptions about media types in such a way that the image spec is internally inconsistent, e.g. with the zstd stuff.

The OCI image spec is not the place for specifying these things, though documenting them there is reasonable. There is a standard process for registering media type suffixes that was not followed for the zstd layer proposals. I poked enough people to correct that by getting +zstd added to the structured suffix registry. There's no reason that other encodings couldn't be used (aside from clients not understanding them) if they are similarly added via the IANA process.

as opposed to whatever the media type dictates

How would this work for pushing and pulling?

I'm curious about how this would affect the current blob upload flow, digest calculations, and client behavior around descriptors.

Because a client is responsible for compression currently, it's trivial to produce an appropriate descriptor by counting and hashing the bytes as they're uploaded. How can a client do this if the registry has the freedom to change the encoding? Doesn't this largely break anything that depends on the properties of the CAS?

Also the client can't verify the blobs until uncompressed.

I think this is important, as it opens us up to zip bombs. If we have a trusted client, we can trust the descriptor they produced. Currently, that means we can download exactly the number of bytes we expect from the registry and hash them to ensure that the content matches what was uploaded before attempting to decompress anything. If we have to decompress the blobs to verify the contents, a malicious registry server could effectively DoS the client.

@jonjohnsonjr
Contributor

I think I know what you're getting at with this, and if I'm right, I agree with you. We'd just need to clarify what registries and clients SHOULD support.

To achieve this, clients should upload the uncompressed contents, but use compression during transport, as negotiated by Accept-Encoding.

This would mean that the resulting descriptors that point to these blobs (e.g. a manifest's layers) would have the mediaType "application/vnd.oci.image.layer.v1.tar" (no +gzip or +zstd), and the size would be the number of bytes for the uncompressed contents. In this case, the digests should match the diffid contained in the config file.
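
As a rough illustration of the descriptor described above, the Python sketch below builds one from the uncompressed tar bytes. The media type string comes from the comment itself; the helper name `describe_uncompressed_layer` and the sample bytes are assumptions for the example.

```python
import hashlib
import json


def describe_uncompressed_layer(tar_bytes: bytes) -> dict:
    """Build a descriptor whose digest and size refer to the uncompressed tar."""
    return {
        "mediaType": "application/vnd.oci.image.layer.v1.tar",  # no +gzip / +zstd
        "digest": "sha256:" + hashlib.sha256(tar_bytes).hexdigest(),
        "size": len(tar_bytes),
    }


# Placeholder bytes standing in for a real layer tarball.
print(json.dumps(describe_uncompressed_layer(b"not a real tarball"), indent=2))
```

Because the digest is taken over the uncompressed bytes, it is the same value a client would compute for the layer's diffid.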

As you described, this has a ton of benefits registry side, as they can store, dedupe, and manage content however is most efficient for them.

Clients are still free to do what they are doing today, if that would be inefficient for them.

This would end up burning some CPU in certain cases, but we can specify that registries and clients SHOULD support both gzip and zstd compression during transport for broadest compatibility.

If a client or registry doesn't want to store the uncompressed content or re-compress it themselves, they can always tee the compressed content to disk while de-compressing the blob to verify the uncompressed content's digest and size.
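
The tee approach could look roughly like the Python sketch below, assuming the transport coding is gzip and the expected digest and size come from the descriptor. `tee_and_verify` is a hypothetical helper, and a real implementation would also want to cap how much it decodes.

```python
import gzip
import hashlib
import io
import zlib


def tee_and_verify(chunks, sink, expected_digest, expected_size):
    """Store compressed chunks unchanged while checking the decoded bytes."""
    decoder = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        sink.write(chunk)  # keep the compressed bytes exactly as received
        decoded = decoder.decompress(chunk)
        digest.update(decoded)
        size += len(decoded)
    tail = decoder.flush()
    digest.update(tail)
    size += len(tail)
    return size == expected_size and "sha256:" + digest.hexdigest() == expected_digest


# Demo with made-up content, fed in small chunks as if read off the wire.
payload = b"hello layer " * 100
compressed = gzip.compress(payload)
chunks = [compressed[i:i + 7] for i in range(0, len(compressed), 7)]
sink = io.BytesIO()
expected = "sha256:" + hashlib.sha256(payload).hexdigest()
print(tee_and_verify(chunks, sink, expected, len(payload)))
```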

I think this is important, as it opens us up to zip bombs.

My concern here goes away if the descriptor has a size for the uncompressed contents (a bit). We can know an upper limit to how much should be read, at least, as you would expect compression to almost always result in smaller blobs than uncompressed content (arguably, if a registry is using an encoding that increases the size of the uncompressed content, that's probably a bug, because it should be smart enough to know that compressing the thing actually made it bigger).
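
One way to use the descriptor's size as that upper limit is sketched below in Python, assuming gzip as the transport coding. `bounded_gunzip` and `BlobTooLarge` are hypothetical names; the point is only that decoding stops as soon as the output exceeds the promised size.

```python
import gzip
import zlib


class BlobTooLarge(Exception):
    """Raised when decoded output exceeds the size the descriptor promised."""


def bounded_gunzip(chunks, max_size):
    """Decode gzip chunks, refusing to produce more than max_size bytes."""
    decoder = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    out = bytearray()
    for chunk in chunks:
        data = chunk
        while data:
            # Allow one byte past the limit so an overflow is detectable.
            budget = max_size - len(out) + 1
            out += decoder.decompress(data, budget)
            if len(out) > max_size:
                raise BlobTooLarge(f"decoded more than {max_size} bytes")
            data = decoder.unconsumed_tail
    out += decoder.flush()
    if len(out) > max_size:
        raise BlobTooLarge(f"decoded more than {max_size} bytes")
    return bytes(out)


# A megabyte of zeros compresses to a tiny blob, like a miniature zip bomb.
payload = b"\0" * 1_000_000
bomb = gzip.compress(payload)
print(len(bomb), "compressed bytes for", len(payload), "decoded bytes")
```

With the limit set to the descriptor's size the blob decodes normally; with a smaller limit the call raises instead of expanding the whole stream.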

This all seems fine and good to me -- even better than what we're doing today. I don't think we need any changes to the image-spec for this, at all. My only concerns are around registry and client compatibility. I expect that if you started doing this today, registries wouldn't even think to compress things or try to negotiate the Content-Encoding, so we'd end up just shipping around uncompressed tarballs for a while until everything got fixed.

I'd also expect some clients just assume layers are always gzipped, in spite of the spec, so it would take a while to squash all the bugs around that.

@sargun
Contributor Author

sargun commented Feb 18, 2021

@jonjohnsonjr You've got it!

The OCI image spec is not the place for specifying these things, though documenting them there is reasonable. There is a standard process for registering media type suffixes that was not followed for the zstd layer proposals. I poked enough people to correct that by getting +zstd added to the structured suffix registry. There's no reason that other encodings couldn't be used (aside from clients not understanding them) if they are similarly added via the IANA process.

I would rather not have to go through the process of adding each encoding that IANA adds. IMHO, it's better that we do it "once" -- and that "once" is relying on the HTTP spec to define the base content encodings. If we want to extend it (custom zstd dictionaries or xz, that's our prerogative).

The way that this would ~roughly work is that people can push / pull blobs and the registry can store the unencoded format, or a trivially compressed format. Periodically (let's say 1 / day), "hot" blobs would be highly compressed with a purpose built dictionary.

I'm curious about how this would affect the current blob upload flow, digest calculations, and client behavior around descriptors.

Because a client is responsible for compression currently, it's trivial to produce an appropriate descriptor by counting and hashing the bytes as they're uploaded. How can a client do this if the registry has the freedom to change the encoding? Doesn't this largely break anything that depends on the properties of the CAS?

The digest would be the contents of the unencoded file. As the client is pulling the file, it can process the hash in a streaming manner.

My reasoning for the client being responsible for choosing the encoding at pull / push time is that:

  1. Storage ("at rest") is cheap
  2. Bandwidth to the developer is expensive (WiFi for 90-something% of developers), but they have plenty of bandwidth. Also, they have high latency.
  3. CPU on registry nodes for "offline" processing is cheap
  4. CPU on worker nodes in the DC is at a premium

The blobs would be addressed by their unencoded values. Compression can happen in the background as described above.


In regards to ZIP bombs, all of the encodings (xz, br, zstd, deflate) that are popularly supported have had hardening against this, either from the browser / HTTP client vendors after years of hard work (deflate), or are built into the encoding themselves (zstd).

@jonjohnsonjr
Contributor

jonjohnsonjr commented Feb 18, 2021

I would rather not have to go through the process of adding each encoding that IANA adds. IMHO, it's better that we do it "once" -- and that "once" is relying on the HTTP spec to define the base content encodings. If we want to extend it (custom zstd dictionaries or xz, that's our prerogative).

I'm not sure that I understand your point here, so apologies if I'm just repeating what you've said (again).

I want to lean heavily on existing HTTP standards for this. Per RFC 7231, acceptable content-codings are defined by IANA in the HTTP Content Coding Registry.

I am not interested in us maintaining a parallel set of content-codings, but would prefer that anything novel here goes through the standard RFC process. This might be a great place to discuss and experiment with such things, which eventually get turned into an RFC, and then standardized through IANA, but defining them in an OCI spec is somewhat of a layering violation.

I agree that the IANA process is pretty heavy, but for something as fundamental as this, I think it's appropriate. gzip, zstd, and br are already defined here, so I think I'm clearly misunderstanding something.

In general, I'm heavily in favor of anything where we just rely on existing HTTP semantics and clarify that registries are expected to support them. You can get what we have today with +gzip media types and identity content encodings, so this isn't even a backwards-incompatible "change", AFAICT.

We don't even need to really specify much about which content-codings a registry or client should support, since that's already part of the content-negotiation process in the RFC, right?

I guess to summarize my opinion:

Yes, this is a good idea. We should be doing this, and nothing in the spec forbids it, but it might make sense to call this out as expected behavior.

I am interested in seeing client and registry implementations that take advantage of this. I am also interested to know if any registries support this today, and how many clients would be broken by this.

@sargun
Contributor Author

sargun commented Feb 18, 2021

@jonjohnsonjr Sorry, my comment was somewhat flippant. More so, support for content-encoding comes "for free" when IANA approves it and my HTTP libraries + load balancers + backend storage get support. Although it's not meant to be a hop-by-hop header, proxies can do encoding on behalf of the workload.

@jonjohnsonjr
Contributor

Sorry, my comment was somewhat flippant

Not at all, I've conflated Content-Encoding and Transfer-Encoding in my understanding of your proposal, so I'm just fumbling my way through this trying to understand.

Distribution-accept-encoding

I missed this originally -- you want a parallel version of this that corresponds to Transfer-Encoding (I think? Maybe both?) so that we don't go through IANA?

Question: Why not use transfer encoding?
Answer: It doesn't seem to have support of new compression standards.

I now understand your original point here. The HTTP Transfer Coding Registry is different from the HTTP Content Coding Registry, and Transfer-Encoding semantics seem to map better onto what I'm describing than Content-Encoding...

Names of transfer codings MUST NOT overlap with names of content
codings (Section 3.1.2.1 of [RFC7231]) unless the encoding
transformation is identical, as is the case for the compression
codings defined in Section 4.2.

This is interesting to me -- I'm not sure why zstd and br are omitted from the transfer coding registry, unless it's just an oversight? It doesn't seem like there is any reason for this not to be supported if gzip is.

Do you have more context for this? I haven't read through all of the background, but I want to suggest we just start using Transfer-Encoding and in parallel see if we can get this "fixed" to include the new compression standards. I guess either of these would make sense, depending on the registry's implementation, and would be orthogonal to the mediaType concerns from before. My point about teeing the gzipped response makes sense for Content-Encoding but not as much for Transfer-Encoding, if I'm reading this correctly.

@felixhandte any background?

@sargun
Contributor Author

sargun commented Feb 19, 2021

The content registry (as of this point in time --- https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding):

Name | Description | Reference
aes128gcm | AES-GCM encryption with a 128-bit content encryption key | [RFC8188]
br | Brotli Compressed Data Format | [RFC7932]
compress | UNIX "compress" data format [Welch, T., "A Technique for High Performance Data Compression", IEEE Computer 17(6), June 1984.] | [RFC7230] Section 4.2.1
deflate | "deflate" compressed data ([RFC1951]) inside the "zlib" data format ([RFC1950]) | [RFC7230] Section 4.2.2
exi | W3C Efficient XML Interchange | [W3C Recommendation: Efficient XML Interchange (EXI) Format]
gzip | GZIP file format [RFC1952] | [RFC7230] Section 4.2.3
identity | Reserved (synonym for "no encoding" in Accept-Encoding) | [RFC7231] Section 5.3.4
pack200-gzip | Network Transfer Format for Java Archives | [JSR 200: Network Transfer Format for Java][Kumar_Srinivasan][John_Rose]
x-compress | Deprecated (alias for compress) | [RFC7230] Section 4.2.1
x-gzip | Deprecated (alias for gzip) | [RFC7230] Section 4.2.3
zstd | A stream of bytes compressed using the Zstandard protocol | [RFC-kucherawy-rfc8478bis-05]

I believe that in HTTP/2 and HTTP/1.1, CE has superseded TE as you can basically do everything with CEs that you can with TEs:

However, a non-transparent proxy MAY modify the content-coding if the new coding is known to be acceptable to the recipient, unless the "no-transform" cache-control directive is present in the message.

There's also the Vary header which allows proxies to serve multiple versions of the same content.


RE: Distribution-accept-encoding and forcing all blobs within a given manifest / upload to have the same CE -- the idea is that uploading may be separated from the creation of the manifest. Rather than the user doing POSTs with multiple encodings and finding out the server doesn't accept a given encoding for a blob, they instead check the Distribution-accept-encoding once, and use the "most preferred" encoding. Alternatively, we can relax it to any of the CEs in the header.

@felixhandte

My understanding matches, I think, what is being discussed here. Outside of chunked in HTTP/1.1, TE is little-used, and in practice CE has come to mean what TE was intended to mean. We did not pursue a TE registry entry for zstd because we weren't aware of any modern use of it.

Without knowing much about the context of this topic, I'll offer generic advice:

The fundamental question (that is sort of touched on in TE vs CE) is whether compression is part of the format of the payload, or simply a transformation applied to it as part of storing or transporting it. My recommendation is to avoid baking compression into the format you are defining, and instead allow it to be a transformation on top of the format applied during transport or storage.

Some pros and cons:

  • - Optionality and hop-by-hop-ness in compression can lead to inefficiencies where actors in your system needlessly recompress or forget to compress.
  • - You have to decompress to validate the object, which violates "encrypt then MAC" guidance by analogy.
  • + You get a simpler format spec.
  • + Well understood, existing negotiation mechanisms handle interoperability.
  • + Within that interoperability, you get implementation agility, with individual components able to make appropriate choices for their situation, including choices that did not exist at the time you wrote the spec. (You get to use hot new compression algorithms as they come out!)
  • + You identify / hash / deduplicate / etc. payloads based on their actual values, rather than their compressed representation (which is not a deterministic transformation).
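
The last point can be seen in a few lines of Python. In this sketch zlib's deflate wrapper stands in for gzip or zstd (an assumption made to stay within the standard library): the same payload compressed at two levels yields two different byte streams, and therefore two different digests, while the digest of the decoded content stays the same.

```python
import hashlib
import zlib


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Made-up payload; any content would do.
payload = b"the same layer content, pushed by two different clients\n" * 500

fast = zlib.compress(payload, 1)  # one client favours speed
best = zlib.compress(payload, 9)  # another favours size

print("compressed digests equal:", digest(fast) == digest(best))
print("decoded digests equal:   ",
      digest(zlib.decompress(fast)) == digest(zlib.decompress(best)))
```

A registry that addresses blobs by the decoded digest can deduplicate both uploads; one that addresses them by the compressed bytes cannot.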

@jonjohnsonjr
Contributor

That matches my understanding and makes me feel better about conflating CE and TE 😉

My recommendation is to avoid baking compression into the format you are defining, and instead allow it to be a transformation on top of the format applied during transport or storage.

Yep! This is exactly what's being proposed, so I feel like we are heading in the right direction.

You have to decompress to validate the object, which violates "encrypt then MAC" guidance by analogy.

This was also my main concern, but I think we are somewhat in the clear here because we know the exact size and hash of the expected output from decompression, based on the content descriptor (which is most of the context you're probably missing, though I don't expect you to dig into this at all 😄).

Thanks again!

@sargun
Contributor Author

sargun commented Feb 20, 2021

Given decompression can happen at "wire speed" these days, is it so much more overhead than doing traditional validation itself?

Maybe the above proposal could be split into two:

  1. The registry SHOULD allow clients to upload blobs with a content encoding. If it allows clients to upload blobs using a content encoding, it MUST be able to serve that blob to other clients using other content encodings, or none at all (identity). If the registry does not implement a given content encoding at upload time, it should respond with 415 Unsupported Media Type.
  2. Some mechanism of advertising which content-encodings the given registry supports.
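
A minimal sketch of the first point on the registry side, in Python. It assumes the registry decodes uploads before storing them and only knows the codings available in the standard library; `receive_blob`, the `DECODERS` table, and the 201 success status are illustrative choices, not spec text.

```python
import gzip
import zlib

# Codings this hypothetical registry can decode on upload.
DECODERS = {
    "identity": lambda body: body,
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,  # HTTP "deflate" is the zlib format
}


def receive_blob(content_encoding, body):
    """Return (status, decoded_bytes) for an uploaded blob."""
    coding = (content_encoding or "identity").strip().lower()
    decoder = DECODERS.get(coding)
    if decoder is None:
        return 415, None  # Unsupported Media Type, as suggested above
    return 201, decoder(body)


print(receive_blob("gzip", gzip.compress(b"layer bytes")))
print(receive_blob("zstd", b"..."))
```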

@SteveLasker
Contributor

SteveLasker commented Mar 3, 2021

The idea to improve compression options is goodness. We need the means to evolve as new performant options become available and network topologies evolve.
That said, what I'm reading here is registries should evolve from receiving and distributing content, to actively engage in compression, adding compute on top of storage.
One of the many aspects of distribution is its immutability and lack of compute processing for the content. Registries can scale to billions of operations and petabytes of data, on a very small footprint of compute. Asking a registry to start negotiating content and compression will dramatically add overhead, costs, degrade performance as multi-tenant registries (what all major cloud vendors run) will then have to cope with noisy neighbor problems. I'm not opposed to adding capabilities, including compute where needed and well defined. However, asking a registry to do this active negotiation and transcoding has a number of issues captured above:

  • How does a client, like Notary v2, sign the manifest of descriptors if they can change on the server?
  • What are the various permutations that a registry would need to support? If a customer pulls from MCR to GCR, what's the expected compatibility?
  • How will CDNs be impacted, as the same content can be returned multiple ways? What does a manifest look like that supports multiple formats?

Context:
We're doing a form of this decompression with Project Teleport today. And, we make it opaque to the user. A user uploads an image, compressed with the standard formats. We decompress it, storing expanded layers. Which can be quite large, so while storage is considered cheap, customers don't delete their content, and it adds up way too fast. So, please don't be so quick to discount storing multiple formats.
The clients negotiate with the server to see if the image is teleportable, and the client is in the same region and also supports teleport. The in-flight expansion and negotiation are quite expensive and complex. Attempting to do this at an infinite number of compression formats doesn't scale well.

There should be some generic libraries to support new compression formats that each artifact tooling could support. I just don't think we want to make registries "smart" to have to engage in this processing and figure it out for all artifact types. The registry supports blobs. How a client compresses those blobs is up to each artifact type's tooling. This simplicity and separation of concerns has allowed us to scale to all types of scenarios.

I'm all for better compression and shared libraries to enable evolving forms of compression. Even asking a registry if a multi-arch tag has different versions. I'd just suggest this is an artifact-specific decision, and the image-spec can and should evolve to support any range of formats it believes are necessary.

@jonjohnsonjr
Contributor

jonjohnsonjr commented Mar 3, 2021

That said, what I'm reading here is registries should evolve from receiving and distributing content, to actively engage in compression, adding compute on top of storage.

This is just a natural part of HTTP. Registries speak HTTP, so it's reasonable to expect they could take advantage of this. It's also negotiated with the client, so a "dumb" registry that doesn't want to support any of this works just fine.

However, asking a registry to do this active negotiation and transcoding has a number of issues captured above

I don't think there were any issues captured above other than some optional sugar for clients. At least for me, I came to the conclusion that this is an excellent proposal that is pretty trivial to implement both registry side and client side.

What are the various permutations that a registry would need to support? If a customer pulls from MCR to GCR, what's the expected compatibility?

Registries wouldn't need to do anything for clients that started trying to take advantage of this. The benefits really arise when both clients and registries start doing this, but I would expect most registries could support a no-op version of this today without any changes. The client would just be uploading uncompressed blobs in that case.

To expand on this a bit, for media types ending in +gzip or +zstd, registries should keep doing what they're doing today. The blobs are opaque and they don't care about them.

For media types that are not compressed, the registry and client can perform this content-encoding negotiation to determine the best format to use when uploading blobs. On the backend, the registry can do the same content-encoding negotiation when the client downloads blobs.

Again, for registries that don't do any content-encoding negotiation, this would still work, but they'd lose out on possible compression savings.
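
The split described in the last few paragraphs could be expressed as a simple check on the descriptor's media type. This Python sketch assumes that +gzip and +zstd are the only compressed suffixes in play; `may_negotiate_encoding` is a hypothetical helper.

```python
# Suffixes that mark a blob as already compressed (assumed list).
COMPRESSED_SUFFIXES = ("+gzip", "+zstd")


def may_negotiate_encoding(media_type: str) -> bool:
    """True if the blob is uncompressed, so transport compression makes sense."""
    return not media_type.lower().endswith(COMPRESSED_SUFFIXES)


for media_type in (
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.v1.tar+gzip",
):
    print(media_type, may_negotiate_encoding(media_type))
```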

How does a client, like Notary v2 sign the manifest of descriptors if they can change on the server?

The manifest would not change on the server. The content-encoding negotiation happens as part of the HTTP protocol. How clients interpret the media types within the descriptor does not change.

What does a manifest look like that supports multiple formats?

See my comment: #235 (comment)

You could drop +gzip and +zstd from the media type in the descriptor and allow the compression used in transport to be an implementation detail of HTTP. The content is still immutable, we'd just be dealing with the hash of the content directly instead of the hash of the gzipped (or zstd-compressed) content. Of course, clients could continue to do what they're doing today, and it should still work, this would just partially walk back a hard-coded mistake in the original implementation. It's nice that we can use the hash of the gzipped blobs for verification because it enables dumb, static file registry implementations to be reasonably efficient, but it's sub-optimal for all the reasons described above.

This unshackles us from gzip in a backward compatible way, so we could actually start rolling out zstd support, for which we currently have no transition story.

We're doing a form of this decompression with Project Teleport today. And, we make it opaque to the user. A user uploads an image, compressed with the standard formats. We decompress it, storing expanded layers. Which can be quite large, so while storage is considered cheap, customers don't delete their content, and it adds up way too fast. So, please don't be so quick to discount storing multiple formats.

I think you're really misunderstanding the proposal. This wouldn't require registries to do anything extra unless it would be beneficial to them. It gives registries the flexibility to trade off CPU (this can often be offloaded to hardware) to save storage.

The in-flight expansion and negotiation are quite expensive and complex. Attempting to do this at an infinite number of compression formats doesn't scale well.

You are not expected to support an infinite number of compression formats. The entire point of the negotiation is to find a subset that both client and registry support. Worst case, you can just store and return the uncompressed bytes directly. Best case, the client supports the compression format that is most convenient for you.
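
In code, the negotiation amounts to picking the first coding both sides share and falling back to uncompressed bytes. The Python sketch below assumes the client's list is already in preference order; `negotiate` is a hypothetical name.

```python
def negotiate(client_accepts, registry_supports):
    """Return the client's most preferred coding that the registry also supports."""
    supported = {coding.lower() for coding in registry_supports}
    for coding in client_accepts:
        if coding.lower() in supported:
            return coding.lower()
    return "identity"  # worst case: ship the uncompressed bytes


print(negotiate(["zstd", "br", "gzip"], ["gzip", "identity"]))
print(negotiate(["br"], ["gzip"]))
```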

There should be some generic libraries to support new compression formats that each artifact tooling could support.

I think this is a really naive way to think about software. Not all software is written in the same language, so I don't understand what you're proposing here? These generic libraries should already exist for basically anything that's part of the IANA standards (which this proposal would lean on) because it's just compression. We don't need anything specific to OCI for registries to do compression.

I just don't think we want to make registries "smart" to have to engage in this processing and figure it out for all artifact types. The registry supports blobs. How a client compresses those blobs is up to each artifact type's tooling. This simplicity and separation of concerns has allowed us to scale to all types of scenarios.

Again, we discussed this at length. The registry can continue to be dumb, but this proposal uses a standard mechanism from HTTP to progressively enhance the experience of content transport. Right now, most manifests are being produced with the compression hard-coded as part of the media type. That's fine, and will always work, but compression should be an implementation detail of distribution, as it's not inherent to the nature of the content.

Even asking a registry if a multi-arch tag has different versions.

This seems like an incomplete thought and I don't understand how architecture would relate to compression.

I'd just suggest this is an artifact-specific decision, and the image-spec can and should evolve to support any range of formats it believes are necessary.

This doesn't change that at all, it just allows for compression to be an implementation detail of transport instead of a quality of the artifact. If we can get broad support for this, we could give guidance that artifacts should generally reference content by its uncompressed hash and use content-encoding negotiation for efficiency at transport time, if needed.

Of course, everything that currently works today would still work.

@vbatts
Member

vbatts commented Mar 4, 2021

Sitting down to read this issue with a single cup of coffee was not enough 😑

But, I am a huge fan of checksumming only the uncompressed blob, and allowing ourselves the flexibility of choosing whatever compression (or level thereof or flavor of implementation).

When we discussed this ages ago, the concerns were the 3-5x increase in storage for registry maintainers (who honestly could compress, chunk, or deflate in many ways) and breaking existing/old clients.

With that in mind, it is worth a quick PoC to show this new media-type served up and the behavior expected for clients that may never update.

It would also help to have a recommendation/methodology for registry maintainers to effectively serve existing known tarballs, whose checksum is bound to the golang gzip output, to older clients, while allowing new clients to fetch with, say, +zstd, where the checksum of the blob is on the uncompressed tar.

@jonjohnsonjr
Contributor

Sitting down to read this issue with a single cup of coffee was not enough 😑

😆

Yeah, this would definitely benefit from some more concrete examples.

@sudo-bmitch
Contributor

Registries wouldn't need to do anything for clients that started trying to take advantage of this. The benefits really arise when both clients and registries start doing this, but I would expect most registries could support a no-op version of this today without any changes. The client would just be uploading uncompressed blobs in that case.

We have two clients (push vs pull) and a registry as variables here. For pushing an image, I think we can have a graceful fallback. If the client pushing doesn't support it, then we have the situation today with +gzip as the media type. If the registry doesn't implement this, the client doing the push could see that from the headers on the POST and fall back to the older blob media types and include those in the manifest descriptors before uploading the manifest.

The risk of uncompressed transfers is if the client pushing and registry both support this, but the client pulling does not. I'm thinking of three options:

  1. To minimize that risk, clients doing the push could avoid using this functionality on the current image manifests, and initially only use it for new features like the artifact-spec.
  2. Just send uncompressed data. Not ideal, but would encourage many to update.
  3. Have the registry return a different manifest to different clients pulling. This is even less ideal to me because it means clients get a different digest for the "same" image, likely breaking a lot of workflows. And it also adds more work to the registry to reformat these manifests.

I can see implementations starting with option 1 and then switching to option 2 after enough time has passed and clients pushing are switched to use the new method for all pushes.
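
The push-side fallback described above could look roughly like the following Python sketch. It assumes the `Distribution-accept-encoding` response header from the original proposal (which is not part of any released spec) and a well-formed comma-separated list with optional `q` values; `parse_accept_encoding` and `choose_push_encoding` are hypothetical helper names. Returning `None` stands for "fall back to the older `+gzip` media types".

```python
from typing import Dict, Mapping, Optional, Sequence

# Header name taken from the proposal in this issue; an assumption, not a spec.
HEADER = "distribution-accept-encoding"


def parse_accept_encoding(value: str) -> Dict[str, float]:
    """Parse 'gzip;q=0.8, zstd' into {'gzip': 0.8, 'zstd': 1.0}."""
    prefs: Dict[str, float] = {}
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # a missing q value means full preference
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        prefs[name] = q
    return prefs


def choose_push_encoding(
    headers: Mapping[str, str], supported: Sequence[str]
) -> Optional[str]:
    """Pick the encoding the registry prefers among those the client supports.

    Returns None when the registry did not advertise the header or nothing
    overlaps, in which case the client keeps using compressed media types.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get(HEADER)
    if value is None:
        return None
    prefs = parse_accept_encoding(value)
    # Sort key: registry weight first, then the client's own preference order.
    candidates = [
        (prefs[enc], -index, enc)
        for index, enc in enumerate(supported)
        if prefs.get(enc, 0.0) > 0.0
    ]
    if not candidates:
        return None
    return max(candidates)[2]


post_response_headers = {
    "Distribution-accept-encoding": "gzip;q=0.8, identity;q=0.2, zstd, br"
}
print(choose_push_encoding(post_response_headers, ["gzip", "identity"]))
print(choose_push_encoding({}, ["gzip", "identity"]))
```

With these inputs the first call selects `gzip` and the second returns `None`, which is the signal to use the existing media types in the manifest descriptors.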

@jonjohnsonjr
Contributor

Stumbled across this in an unrelated thread: golang/go#30829 (comment)

@cben had some thoughts on how this would interact with range requests. I'm not sure I understand the problem, but I think it's relevant :)

@sargun
Contributor Author

sargun commented Jun 16, 2021

I think that this post is probably best split into two sections:

  1. Downloads
  2. Uploads

But, waiting for my other distribution spec proposal to go through before proposing anything scary.

@ndeloof

ndeloof commented Nov 9, 2022

Currently, that means we can download exactly the number of bytes we expect from the registry and hash them to ensure that the content matches what was uploaded before attempting to decompress anything. If we have to decompress the blobs to verify the contents, a malicious registry server could effectively DoS the client.

If a manifest refers to uncompressed blobs, and the client downloads with Content-Encoding negotiation to select the best supported compression, it can still rely on the manifest's layer size to constrain decompression so that the output stream can't exceed the expected size, so a gzip bomb would be easily blocked.
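
A minimal Python sketch of that size-bounded decompression, assuming a gzip-encoded response body delivered in chunks and a descriptor that carries the uncompressed size and sha256 digest. The function name `bounded_gunzip` and its parameters are hypothetical; the only library feature relied on is the `max_length` argument of `zlib`'s decompress objects.

```python
import gzip
import hashlib
import zlib
from typing import Iterable


def bounded_gunzip(
    chunks: Iterable[bytes], expected_size: int, expected_digest: str
) -> bytes:
    """Decompress a gzip stream, refusing to produce more than expected_size bytes."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    hasher = hashlib.sha256()
    out = bytearray()
    for chunk in chunks:
        data = chunk
        while data:
            remaining = expected_size - len(out)
            # Ask for at most one byte beyond the limit so an overrun is detectable
            # without ever inflating the whole stream.
            piece = decomp.decompress(data, remaining + 1)
            out += piece
            hasher.update(piece)
            if len(out) > expected_size:
                raise ValueError("decompressed data exceeds size in descriptor")
            data = decomp.unconsumed_tail
    if not decomp.eof:
        raise ValueError("truncated gzip stream")
    if len(out) != expected_size:
        raise ValueError("decompressed data is shorter than size in descriptor")
    if "sha256:" + hasher.hexdigest() != expected_digest:
        raise ValueError("digest mismatch on uncompressed content")
    return bytes(out)


# Usage: a well-formed layer passes, an oversized stream is cut off early.
layer = b"a" * 10_000
blob = gzip.compress(layer)
chunks = [blob[i:i + 64] for i in range(0, len(blob), 64)]
layer_digest = "sha256:" + hashlib.sha256(layer).hexdigest()
assert bounded_gunzip(chunks, len(layer), layer_digest) == layer
```

A malicious stream costs the client at most `expected_size + 1` bytes of output before it is rejected, which is the property the comment above relies on.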

@sudo-bmitch
Copy link
Contributor

How do we envision this change rolling out through the ecosystem? I'm assuming the fallback is storing and/or transmitting uncompressed blobs. Will this only be used for new use cases (e.g. artifacts) or do we plan to initially roll this out for container images? Will build systems avoid creating images with uncompressed layers by default until after deployed runtimes and registries have been upgraded to support compression on transport?

@ndeloof

ndeloof commented Nov 10, 2022

I haven't found any path, including hack-ish ones, to avoid waiting for runtimes and registries to be upgraded :'(
Anyway, while this couldn't have an immediate impact on public image distribution, it can still be an efficient improvement for private images, where a user/company can enforce use of up-to-date clients and registries.

8 participants