-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taming media types #791
Comments
aww dang. I your comment in an open tab for too long and lost track of
follow-up. Thanks for opening a new thread.
Funny, I just chatted with @lumjjb and posed the question about the
multiple '+' suffix.
As for tackling the `+zstd` that I recently helped merge, I am not sure
that waiting on IANA is _entirely_ required? surely there is a way to
declare locally defined addendum to those suffixes?
|
Right, I'm not proposing we revert that or anything, but it would be nice to follow the existing standard so that we can just adopt their semantics, then upstream extensions to the suffix registry if they seem reasonable (so clients that aren't OCI-aware can possibly interoperate with OCI registries). I'm imagining a better way to consume these strings, here's a strawman: type OCIMediaType struct {
// Maybe "vnd.oci" is parsed out?
// * vnd.oci.descriptor
// * vnd.oci.layout.header
// * vnd.oci.image.manifest
// * vnd.oci.image.index
// * vnd.oci.image.config
Type string
// Usually "v1"
Version string
// * json
// * tar
Encoding string
// * gzip
// * zstd
Compression string
// See https://github.com/opencontainers/image-spec/issues/747
Encryption string
// Self explanatory
Distributable bool
}
func ParseMediaType(s string) OCIMediaType {
// an implementation that takes advantage of the rfc6838 grammar
} This way, if I want to add zstd support, I can just check: if mt.Compression == "zstd" {} rather than: if mt == v1.MediaTypeImageLayerZstd || mt == v1.MediaTypeImageLayerNonDistributableZstd {} Similarly, the most obvious way to test if something is a layer is some form of: if mt == v1.MediaTypeImageLayer || mt == v1.MediaTypeImageLayerGzip || mt == v1.MediaTypeImageLayerZstd || mt == v1.MediaTypeImageLayerNonDistributable || mt == v1.MediaTypeImageLayerNonDistributableGzip || mt == v1.MediaTypeImageLayerNonDistributableZstd {
// yep it's a layer
} which feels silly. I think doing this is only really possible if we have a well defined grammar in the first place. Since once already exists, we should probably use it 😄 |
OHMAN
On 12/09/19 12:00 -0700, jonjohnsonjr wrote:
This way, if I want to add zstd support, I can just check:
```go
if mt.Compression == "zstd" {}
```
I agree that is nice and cleaner
I think doing this is only really possible if we have a well defined grammar in the first place. Since once already exists, we should probably use it 😄
:-D
|
Agreed. It also looks like the intention is not to have multiple suffixes since it represents an encoding... I will change away from the
And |
@jonjohnsonjr is ordering important when applying them though, for example from an unwrapping perspective. Not sure if that is defined as part of that specification. For example How I am considering parsing is just to split it into a base and array of extensions. Those extensions could be typed but I don't think they should be flattened into a struct. |
@dmcgowan definitely, which is why I'm nervous about having multiple suffixes. I'm consdering something like this: func main() {
mt := "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip+enc"
// OCIMediaType {
// Type: "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip",
// Encoding: "enc",
// }
parsed := ParseMediaType(mt)
// OCIMediaType {
// Type: "application/vnd.oci.image.layer.nondistributable.v1.tar",
// Encoding: "gzip",
// }
parsed = ParseMediaType(parsed.Type)
// OCIMediaType {
// Type: "application/vnd.oci.image.layer.nondistributable.v1",
// Encoding: "tar",
// }
parsed = ParseMediaType(parsed.Type)
} Clients could progressively unwrap these types until they hit the encoding they care about or one of the concrete mediatypes:
I'm also kind of frustrated with how nondistributable layers are expressed... it doesn't really fit into this model. You can't pull it off the end like compression or encryption :/ Encryption/versions complicate things as well. Right now it's not too bad to treat
Makes sense, so something like this? type OCIMediaType struct {
// * application/vnd.oci.descriptor.v1.json
// * application/vnd.oci.layout.header.v1.json
// * application/vnd.oci.image.manifest.v1.json
// * application/vnd.oci.image.index.v1.json
// * application/vnd.oci.image.config.v1.json
// * application/vnd.oci.image.layer.nondistributable.v1.tar
// * application/vnd.oci.image.layer.v1.tar
Type string
// * ["gzip"]
// * ["enc", "gzip"]
Extensions []string
} I guess it doesn't really matter what the implementation is. As long as we have a well defined ordering of extensions, clients can do whatever they want. |
I'm also a bit worried that things like opencontainers/artifacts#11 gaining traction could cement us in a bad spot if we aren't careful. Do these all fit in our world?
For example, Also, the singularity layer spelled |
Base on the regexp from https://tools.ietf.org/html/rfc6838#section-4.2, it
seems like having multiple "+"s is not illegal but only the last one is
considered the structured syntax suffix based on the spec.
|
Why does sif end in layer.tar? A sif image could conceptually be one layer, but it's definitely not a tar. Is the idea that you have to tar the image up and extract the SIF on some filesystem? The spec does have a version and if you look at the build it also can be grabbed from the git commit - but I'm not sure that's exposed beyond using the sif tool to inspect the header, and afaik it hasn't been changed. I'm not sure who wrote that (or why it has spelling mistakes!) but @ikaneshiro would be best to ask. |
This makes sense, the last one should be the format of the payload and trimming the suffix should yield the unwrapped payload's media type. |
@jonjohnsonjr Do you think the artifacts repo should provide guidance on the media types? The ship has kind of sailed on the v1 OCI types (I agree nondistributable says nothing about the content, it should have been an annotation). Also everyone doesn't need a custom media type for everything, the reason layers aren't just "application/tar" is because the tar must be processed in a specific way to handle whiteouts. As for "+tar", I'm not sure that is completely wrong if the type is just defining how to process a set of files, but not sure why they did something inconsistent. |
Definitely, but they're going to look to image-spec for inspiration, and we should have a good answer. We could document the reasoning behind the current media types so that they don't just cargo cult what's here without thinking about why. I think a small grammar on top of rfc6838 would go a long way towards keeping things consistent and usable.
I kind of agree, but I suspect there's not that much relevant adoption, especially since the distribution spec isn't really finalized. Regardless, we should fix this if we ever bump to
Right, I don't think it's "wrong", but it would be much nicer if it were consistent 😄 |
If you're asking for a standard for how different compression formats should be applied to layers, regardless of type. That sounds like greatness. And, yes, I would really like to better understand the reasoning for
I think we're talking about how tools for a specific artifact types handle their layers vs. how a registry handles multiple artifact types. For each artifact type, including an OCI image, there are a set of valid layer types. If someone introduces new layer types, how would the various tools know how to parse those layers? But, I also think we should recognize that there are different "types" being pushed to registries. "docker" images, which should equate to OCI Images have a set of known types. The image-spec speaks to the specific layers the docker/oci image format support. Singularity images are unique to the tooling around singularity, and shouldn't have to follow the conventions for an OCI Image. As @vsoch suggests, they shouldn't have to convert their format just to match the docker/oci image format. They can use their native .sif format. That was the big value for them leveraging OCI Artifacts. They no longer have to impersonate a docker image. Same with Helm, OPA, Marky Mark and who knows what else gets added. Each have their own sandbox to play in. While registries can focus on their value. Take the security scanning scenario. I have a registry that I want to secure. A security scanning solution will scan all the artifacts. If it knows how to scan a docker/oci image, it can report on its results as a known state. When it encounters a Helm chart, or Singularity image, it shouldn't report an error, rather it either knows how to scan it or not. It knows whether it can scan it because of its The purpose of OCI Artifacts is to formalize what's already happening. This allows registries to do what they do best, authenticate, accept, store, serve stuff. While, artifact authors manage their types, and tools around their types know how to work with them. This is why we're proposing well-known-types where artifact authors submit their types. In this case the misspelling could get caught before becoming "well known" For the docker/oci image, if we think all tools should support a new type, they would submit a change to the |
@jonjohnsonjr @vsoch I'll move the discussion around that to https://github.com/sylabs/singularity/issues/4437 though. Don't want to detract from this more general discussion. |
Thank you @ikaneshiro ! |
I put this on the agenda for tomorrow's weekly dev meeting. If we can't get enough people there tomorrow to discuss we can push it out a week too. |
i will not be able to make the call, but I feel the discussion is on the
right track. I do like the way Jon laid out an interface for it, but since
order matters, this feels like it'll just be a huge switch statement that
is comparing whole strings.
…On Tue, Sep 17, 2019 at 8:07 PM Derek McGowan ***@***.***> wrote:
I put this on the agenda for tomorrow's weekly dev meeting. If we can't
get enough people there tomorrow to discuss we can push it out a week too.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#791?email_source=notifications&email_token=AAAQL2LGLVMRDKOWYNUXEJDQKEME5A5CNFSM4IWHWHX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD65M3DI#issuecomment-532336013>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAQL2NHO7MK6F4YQY55IT3QKEME5ANCNFSM4IWHWHXQ>
.
|
I'd like more clarification somewhere on valid mediatypes (maybe that is the artifacts repo?) .. the Helm mediatypes are using |
I'd like to see scenarios drawn out for how the +encrypted/encoded layer type suffix would work from build to snapshot for the community at large, before we say yeah let's support a generic media type suffix. Has that been done yet? It would make more sense to me if it was say +multipart/encrypted with a first portion of the binary providing the necessary detail for decryption and/or +type.encrypted where type indicates which of a plurality of generally supported encryption algorithms for existing and new encryption types possibly specified over here iana but where we could also add more as needed. |
Yup. @stefanberger, @dmcgowan worked on this together with the new stream plugin from @crosbymichael. There is also an implementation in |
I see code, but I don't see soup to nuts scenario descriptions. I think I remember seeing some decks where scenarios were talked about in general but don't see it in the links. Thinking about this from a spec perspective. I don't see in the code, yet, how you handle all the various encryption types, is there a plain text header in the layer, or some magic number? |
Posting back here, @estesp kindly took some notes in the OCI call: https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#Notes1 Some action items from @dmcgowan
|
@jonjohnsonjr it looks like action item 2 is already in WIP: opencontainers/artifacts#6 There seems to be alot of examples of using |
I think we don't need to require that a compression format be a part of RFC6838, but rather all maintainers should agreed that the format should be added and under what name. This will avoid registry authors from supporting every type of compression under the sun and if there is a format that would be valuable to add, we don't have to wait for anyone else to be able to add it to the standard. We should keep artifacts discussion on that GitHub repository's issue tracker and scope this discussion purely to application of our spec to containers. |
Something I just ran into re: nondistributable layers... this mostly involves distribution, but most of the relevant conversation has been happening in this repo. This PR says that unrecognized media types must be ignored: #759 Which was the result of this issue to ensure flexibility in media types: opencontainers/distribution-spec#55 So I'm wondering, from a distribution perspective, what it means to ignore something? My first impression was to just completely ignore it by not even validating that the descriptor references a blob that has actually been uploaded, but I think that's probably unhelpful behavior. Nondistributable media types seem like an exception that proves the rule here; i.e. that any non-manifest should be expected to reference an uploaded blob, unless it's a nondistributable media type. That thought reminded me of this thread, where we're throwing in the towel on nondistributable media types, but I think that's a bad idea. It would be great if we could define an annotation for nondistributability such that non-layer media types can also be outside of the blobstore, e.g. for directly referencing blobs on GitHub instead of within GCR (inspired by discussion with @vbatts). WDYT @dmcgowan |
FWIW this is no longer true 🎉 |
This is a real problem, not just for me: moby/buildkit#1879 cc @tonistiigi come share in my pain |
Posted this in #787 originally but noticed it was closed, so I doubt anyone would ever see it.
Do we care if OCI media types are valid RFC6838 media types? E.g. "zstd" isn't defined in the Structured Syntax Suffix Registry.This ties into #747 as well. I'm not sure if the
+enc
double-suffix is valid (though I mostly skimmed that RFC and might misunderstand the grammar described).I was thinking about how I'd go about adding support for zstd in various places, and I'm a bit worried that, as we add more dimensions to the media types, we're going to end up with a combinatoric explosion of valid media type strings. As a registry/client author, it's currently easy enough to hard-code these strings and special-case e.g. nondistributable layers or the compression algorithm; however, I can imagine that comparing hard-coded strings will become sub-optimal at some point in the future, and we'd be better off parsing these strings into a structured form.
cc @vbatts @lumjjb @stevvooe
The text was updated successfully, but these errors were encountered: