This repository has been archived by the owner on Jul 18, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
OCI artifact manifest, Phase 1-Reference Types #29
OCI artifact manifest, Phase 1-Reference Types #29
Changes from all commits
67857e6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's been hashed out in the past, but what's to stop us from using the existing image manifest spec and adding only the additional fields needed to enable references? In my mind, this would just be
subjectManifest
as an optional field that would allow registries to begin tracking these relationships on push and use that to implement a references or referrers API.I think that we should at least list out the reasons why we think we need a new manifest type and why we can't use the existing spec. Especially with the conversations going on about another "object manifest" type in the future, I'd really like to keep the number of schemas that registries have to implement to a minimum. Adding fields to the existing spec seems like it might be simpler and reduce the overall churn over the next few years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One major issue is that if any object can have a reference (which is a reverse reference for gc) and also a normal link to a manifest, we can no longer guarantee that there are no cycles in the garbage collection graph, and gc becomes intractable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that the garbage collector will not be using the
reference
link in the manifest?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I 100% agree with this, but I do agree that it becomes more complicated. It's hard to really discuss because there are so many different GC policies.
I don't like that this doc prescribes GC behavior, though, as it limits the power of how this kind of relationship could be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limitation is simply a minimum scoped set of scenarios we're confident we can support until we have the time to work through all the scenarios in #37.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justincormack is there a restriction preventing an ArtifactManifest from referencing an ArtifactManifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. See the example where an SBOM is signed. The SBOM is an
artifact reference type
, and the signature is also anartifact referenceType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just drop this, as it is a vestige of old types. No need to carry this into new versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can I get a hell yes! ( to removing it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always wondered how
schemaVersion
relates to the embeddedmediaType
version:application/vnd.oci.artifact.manifest.v1+json
I'd be happy to avoid the duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of everything we see today is
schemaVersion
2. It was just used to differentiate it early on. It's not needed for new manifests where this doesn't matter.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just asking for clarity.
There is a difference in the schema of the .json document between
image.manifest
,image.index
andartifact.manifest
What I believe you're saying is the
manifest.mediaType
version provides enough info for how to process the.json
document?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementations generally shouldn't be trying to peek inside an artifact in order to understand how to process it. By that point, it's often too late. Instead, content should generally be presented alongside a descriptor, which contains the mediaType of the content.
For a registry, these descriptors are sometimes communicated via HTTP headers by the client on push and the server on pull. Sometimes, the descriptors are embedded within other content, e.g. when pulling a manifest list, you select one of the
manifests
entries (based on some criteria, e.g.platform
) and process what you fetch based on the mediaType in that descriptor.If you try to use the embedded mediaType to process everything, you'll probably end up having to parse content twice. Once to find its mediaType, then again to parse based on the mediaType.
Very old clients didn't send or process HTTP headers sufficiently to behave properly, so this schemaVersion existed to differentiate between schema 1 and schema 2 images. Now that we live in a universe with a nice type system, that shouldn't be necessary.
If you were to add a schemaVersion field to a new kind of artifact, you absolutely don't want to set it to
1
or2
, because this could break old clients that are using schemaVersion to select between two manifest schemas. You could useschemaVersion: 3
, but it's easier to just drop it altogether and rely on mediaTypes and descriptors in order to parse content.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jon,
Based on all this feedback and more understanding for how versioning is processed for the document to safely evolve, I did remove
schemaVersion
from the current artifact.manifest and examplesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Embedding the
mediaType
directly in the object itself makes future migration challenging or impossible. I know this supports being able to detect the type of the object, but usually this should be done through a specific descriptor path.If we want to have this ability, we should come up with a different name. Using the mediaType to set the content of "THIS" document will be confusing, since we use it in both roles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this correlates with the ^ q&a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good thread to tug on, for some background: opencontainers/image-spec#411 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is an artifactType different than a mediaType? Why not just have a mediaType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you determined above this spec reads as it is looking at "this" format (manifest) to be defined by mediaType and artifactType is a sub-type within artifacts registered with iana.. with certain additional rules and content types expected in the blobs (artifact blobs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a second type system then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm may be missing some level of detail to the question here.
There are a few manifests that registries would know how to process. Hopefully, we can converge on a few, and not have too many, but we are adding one more to generalize and add capabilities over the
oci.image.manifest
The registries need to know about these manifest, as defined by the
manifest.mediaType
. At least that's how I read the difference betweenimage.manifest
andimage.index
This decouples the registry from knowing about the specific artifacts, just like a filesystem knows how to store files, but doesn't care about the
file.extension
.artifactTypes
are the means by which Helm, WASM, Notary, SPDX, CycloneDX, Cosign identify themselves. The registry doesn't care from a processing perspective for how to store and retrieve it. But, the registry UX could show icons and details for the type and security scanners can determine how they scan and process the different types of artifacts, as they know what it is, so they know how to scan them. Similarly, each artifact tooling would know if they should continue processing the manifest, or reject it. Just like Word knows how to reject opening a.mp4
file, vs a.docx
file.And, the
/referreres
API can filter by theartifactType
so it could be requested to only returnartifactType=cncf.notary.v2
manifests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Steve here, having a spec that allows any kind of typed object to be put in a field requires registry implementations to become stricter with what they accept compared to today. For example, registries might reject an otherwise well-formed and to spec OCI Image because they don't recognize the config blob's media type, since they need to verify each object is something they know how to deal with in relatively fundamental ways. For distribution, this is as basic as knowing where this object should be located at all, given the digest.
I don't believe ignoring the unknown mediatypes is appropriate for registries accepting pushes either, as this has implications for lifecycle management. For example, I believe some implementations accepted buildkit caches, but since they are OCI Image Index which presumably contained manifests, the associated blobs were deleted during garbage collection. This is pretty opaque to the end user, and I believe that registries which do lifecycle management of manifests and their referenced objects should return
MANIFEST_INVALID
during push if they are not able to maintain the lifecycle of the manifest correctly.Having a broad category of "manifests" and "blobs", which are distinctions that I think most people here understand conventionally, allow registries to have a basic understanding of the level of expectations around an object without having to know the media type beforehand. There are already separate API routes for blobs and manifests, so this distinction exists within the v2 API spec today.
For manifests, this does mean that registry implementations will have to know about the mediatype to read the manifest content correctly since registry implementations are expected to perform actions that require inspecting the manifest, such as validation. However, for an object that's classified as a blob, the registry implementation is safe to make certain assumptions such as, that that object will not reference other objects in a way that's relevant to the registry, and that it should not be parsed as JSON. Having this clarity would allow registries and clients enough wiggle room to work together without having to keep in lock step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deleteriousEffect I generally agree with you, but I don't see how this relates to
artifactType
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self.. need to officially define layer in the distribution spec.. missed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I like defining blobs here gut says CAS should own blobs.. and image would do layers is array of blobs.. having ordinality .. over here in artifacts spec.. this would be artifacts or artifactBlobs is array of blobs where ordinality is optional..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
course we could do the same thing by changing objects to CAS objects or something similar..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer something pointing towards their use. Should this be "subjects"? How does it relate to the manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be discussing the differences between a manifest, which represents an artifact (container image, helm, wasm, opa, ...) and the content that makes up that artifact, represented as
blobs
These are distinct types. They both use descriptors to define how they're persisted as CAS objects, but they do have different meanings, where the registry, and a client processes a manifest for it's content, but should just store and serve
blobs
(akalayers
)I'm going to defer this one for now, as this concept, as I understand it, is core to the referenceTypes and the work being built atop it.
I'll suggest we continue this conversation as we transition to the artifacts-spec repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just call this
subject
, since there is not reason to restrict to a specific manifest.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would that be different than another entry in the
[blobs]
collection?This has been the subject of the cycles and lifecycle management question that revolve around a manifest is the thing we reason about for user interaction, where the blobs have been implementation details the client and registry can optimize around.
I'd like to have more conversation around this to understand what it would mean for an artifact to contain blobs, but link to other blobs, as opposed to focusing on artifacts reference other artifacts (manifests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nishakm see here and above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we all recognize adding any content added to registries needs enough info to manage its lifecycle. This includes GC on incomplete uploads or enough information for users to delete their content.
While registries implement GC and deletion management differently, we're just making sure we've provided the right level of information in the manifests to enable lifecycle management and set user expectations for consistency across registries.
For phase 1, we're scoping this tightly so we can see how this evolves over time. For instance, we definitely want to support independent artifacts with
subjectManifest
references that have tags. This is the larger #37 effort that this will evolve into. We're just scoping independent artifacts out for now, as it's not critical to our phase 1 scope for the fall of 2021There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo.. referenctType