Registry - Potential OCI Artifacts user experience #3283

Closed
majastrz opened this issue Jun 19, 2021 · 23 comments

@majastrz
Member

Goals

This is intended to provide an approximation of the Bicep registry experience IF we chose OCI Artifacts as the package manager and implement the integration. The existence of this issue DOES NOT indicate that OCI was selected as the implementation of the Bicep Registry. (See #2128 for details about other candidates.)

Gallery Experience

TBD

CLI

bicep build

Since Bicep modules can contain references to modules published to an artifact registry, the artifact contents may not exist on the local system. If the Bicep file contains references to external modules, bicep build pulls the referenced artifacts before type checking and code generation and stores them on the local file system. (In the compiler pipeline, the pull step occurs after parsing but before type checking is done.)

By default, the artifact cache is located at %USERPROFILE%\.bicep\artifacts on Windows and ~/.bicep/artifacts on Linux and Mac. The location can be customized via the BICEP_ARTIFACTS environment variable. (If the language server and CLI tools are pointed to different artifact cache paths, they will each download a separate copy of each referenced artifact.)
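
A minimal sketch of the lookup described above, written in Python purely for illustration. The function name artifact_cache_dir is hypothetical, and mapping ~ to the HOME variable is an assumption; only the default locations and the BICEP_ARTIFACTS variable come from the proposal.

```python
from pathlib import PurePosixPath, PureWindowsPath


def artifact_cache_dir(env, is_windows):
    """Return the artifact cache directory for the given environment mapping.

    Hypothetical helper: BICEP_ARTIFACTS wins when set, otherwise the
    per-platform default from the proposal is used.
    """
    override = env.get("BICEP_ARTIFACTS")
    if override:
        return override
    if is_windows:
        # %USERPROFILE%\.bicep\artifacts
        return str(PureWindowsPath(env["USERPROFILE"]) / ".bicep" / "artifacts")
    # ~/.bicep/artifacts (assumes ~ resolves to $HOME)
    return str(PurePosixPath(env["HOME"]) / ".bicep" / "artifacts")
```

Passing the environment in as a mapping keeps the sketch easy to exercise for both platforms on a single machine.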

Note: This is a custom mechanism that we will have to build ourselves. It will have to account for the possibility of concurrent pull operations.
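
One possible way to tolerate concurrent pulls, sketched in Python under stated assumptions: each pull stages its files in a private temporary directory inside the cache and then renames it into place, so a reader never sees a half-written artifact. The names store_artifact, key, and write_contents are hypothetical, and this is not a description of what Bicep actually does.

```python
import os
import shutil
import tempfile


def store_artifact(cache_dir, key, write_contents):
    """Populate cache_dir/key at most once, even if several pulls race."""
    final_dir = os.path.join(cache_dir, key)
    if os.path.isdir(final_dir):
        return final_dir  # an earlier pull already finished
    os.makedirs(cache_dir, exist_ok=True)
    # Stage next to the destination so the final rename stays on one volume.
    staging = tempfile.mkdtemp(prefix=key + ".tmp-", dir=cache_dir)
    try:
        write_contents(staging)
        os.rename(staging, final_dir)
    except OSError:
        # Either we lost the race or the write failed; only the latter is an error.
        if not os.path.isdir(final_dir):
            raise
    finally:
        # No-op after a successful rename; cleans up the staged copy otherwise.
        shutil.rmtree(staging, ignore_errors=True)
    return final_dir
```

The loser of a race simply discards its staged copy and uses the winner's directory, which avoids the need for a lock file in this simple case.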

Open Question: Do we need to expose a setting for the artifact cache path in bicepconfig.json like NuGet does?

bicep pull

In certain use cases (Docker and some CI systems), the restore and build operations need to be separated. This can be accomplished as follows:

  1. bicep pull
  2. bicep build --no-pull main.bicep

Open Question: What verb do we use for this command? (NPM and Go use install. .NET/MSBuild use restore. Docker/OCI use pull.)

Open Question: Docker has been aggressively throttling anonymous artifact/image pull requests since late 2020. Do we need to support authentication even for public registries as a result?

Reference module from an artifact

To reference a module from a package, the user opens a new or existing Bicep file in VS Code and types one or more declarations like the following:

module mod 'majastrzoci.azurecr.io/bicep/modules/test:0.1-alpha' = {
  ...
}

OCI does not have a concept of a package feed like NuGet does. Instead, references are made directly via a URI. The URI has the following components:

  • majastrzoci.azurecr.io - URI of the registry
  • bicep/modules/test - Identifies the repository. A repository is a collection of artifacts with the same name but different tags.
  • 0.1-alpha - Identifies the artifact tag, which is similar to a package version in NuGet. (However, the same image/artifact content may have multiple tags or may be untagged.)

See https://docs.microsoft.com/en-us/azure/container-registry/container-registry-concepts for more details.

Open Question: This example covers the most minimal syntax we could have. The syntax proposals for referencing external modules are tracked in #3186.

OCI also allows referencing artifacts by their SHA256 digest. The syntax for this would look like the following:

module mod 'majastrzoci.azurecr.io/bicep/modules/test@sha256:07601524fed07e648d40018070ea8927acb9d2bc695e9ebd3566ac113b98dda9' = {
  ...
}
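
To make the reference format concrete, here is a small Python sketch that splits a reference into the components listed above, covering both the tag and the digest forms. The names ArtifactRef and parse_artifact_ref are hypothetical, and the rules are deliberately simplified (for example, a reference carrying both a tag and a digest is not handled).

```python
from typing import NamedTuple, Optional


class ArtifactRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_artifact_ref(ref: str) -> ArtifactRef:
    """Split '<registry>/<repository>:<tag>' or '<registry>/<repository>@<digest>'."""
    # Split on the first '/' so a registry with a port (localhost:5000) stays intact.
    registry, sep, rest = ref.partition("/")
    if not sep or not registry or not rest:
        raise ValueError(f"expected <registry>/<repository>: {ref!r}")
    if "@" in rest:
        repository, digest = rest.split("@", 1)
        return ArtifactRef(registry, repository, None, digest)
    repository, sep, tag = rest.rpartition(":")
    if not sep or not repository or not tag:
        raise ValueError(f"expected :<tag> or @<digest>: {ref!r}")
    return ArtifactRef(registry, repository, tag, None)
```

For the first example above, this yields the registry majastrzoci.azurecr.io, the repository bicep/modules/test, and the tag 0.1-alpha.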

Open Question: Should we even support references by digest? In other words, should we allow references to untagged artifacts?

Open Question: Even if we decide to support digest references, do we need to offer completions for digests?

No mechanism exists to enumerate all possible registry URIs, which means we will NOT be able to provide completions for the registry URI portion of the module type string. Once the registry URI is typed in (and auth is set up), we will be able to provide completions for the image name and tags.

Open Question: The lack of full completion experience suggests that we should develop a custom mechanism to configure the list of registries separately from the .bicep file and reference them by alias.

If the current module has any external module references, the language server queues up a pull action in the background. The background process will check the package cache for the artifact name and tag. If missing, the package will be downloaded from the registry.

The background pull operation does not happen instantly. Until the artifact is in the local artifact cache, accurate type information is not available:

  • The language server falls back to the widest possible type for the given context. (any or object)
  • A warning is shown on the module ref indicating that the artifact is not yet available.
  • Property name and property access completions involving this particular module are unavailable.
  • The language server remains otherwise functional.

If/when restore fails:

  • An error is shown on the module ref explaining the type of failure.
  • The language server remains otherwise functional.

Once the background pull operation is finished successfully, the language server recompiles the module to make use of the downloaded type info (we can reuse the parse tree). Property name and property access completions are now available for the module.
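
The three situations above (pull pending, pull failed, pull finished) can be summarized as a small state table. The Python sketch below is only an illustration: RestoreState and module_view are hypothetical names, and falling back to any after a failed restore is an assumption, since the proposal only specifies the fallback type for the pending case.

```python
from enum import Enum


class RestoreState(Enum):
    PENDING = "pending"
    FAILED = "failed"
    SUCCEEDED = "succeeded"


def module_view(state, resolved_type=None, error=None):
    """Return (type, diagnostic, completions_available) for one module reference."""
    if state is RestoreState.SUCCEEDED:
        # Accurate type info from the downloaded artifact; completions work.
        return resolved_type, None, True
    if state is RestoreState.FAILED:
        # Assumption: keep the widest type so the rest of the file still checks.
        return "any", ("error", f"restore failed: {error}"), False
    # Pull still running: widest type plus a warning on the module ref.
    return "any", ("warning", "artifact is not yet available"), False
```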

Open Question: Where do we expose the OCI logs? In the VS Code "Output" pane under "Bicep Package Manager"?

Setup private registry

Several registries support OCI artifacts. A good list is available at https://oras.land/implementors/. An Azure Container Registry can be easily created via the Azure Portal or a .bicep file.

Open Question: How will the auth work for registries? Is there a standard extensibility model or do we need to custom build it?

Setup local registry

Using a local docker installation, you can run the registry image:

docker run -it --rm -p 5000:5000 registry

This makes a non-persistent registry available at localhost:5000.

Create package

OCI artifacts are either pushed from the local file system to a registry or pulled from the registry down to the local file system. (It is also possible to transfer artifacts between registries, but it's less relevant here.) There is no "package" concept that is a single-file representation of all the layers in an OCI artifact that would be equivalent to a .nupkg in NuGet or similar in other package managers.

Push artifact to registry

An artifact can be pushed to the registry via the bicep push <file> command. The command performs the following operations:

  • Pull referenced artifacts to populate the local cache (can opt-out with --no-pull)
  • Validate module (including configured linter rules)
  • Prepare artifact content for publishing in a temp location
  • Push to the registry

The artifact contents will be as follows:

  • The main module.
  • The local modules referenced by the main module.
  • External modules referenced by the main and local modules become dependencies of the artifact.

A more detailed discussion about package contents is tracked in #3266. The discussion of module metadata lives in #3187.

Open Question: How do we express the dependency on other artifacts? Do we implement our own custom metadata to annotate dependencies or do we lean into the layering capabilities of OCI? (The former has implications for artifact pull latency.)

Open Question: OCI is completely generic and requires only artifact name and tag. What other metadata do we need?

Open Question: How to encode min/max Bicep version in the artifact? Can we add our own custom annotations to the manifest?
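
For context on the last question: the OCI image manifest format defines an optional annotations map of string keys to string values, so custom annotations are possible in principle. The Python sketch below builds such a manifest as a plain dictionary. The media types under application/vnd.example and the annotation key com.example.bicep.min-version are made up for illustration and are not part of any Bicep design.

```python
import hashlib


def descriptor(media_type, content):
    """Build an OCI content descriptor for a blob of bytes."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


def build_manifest(module_bytes, min_bicep_version):
    """Return a manifest dict with one layer and a custom annotation."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Hypothetical media types; a real design would pick its own.
        "config": descriptor("application/vnd.example.bicep.module.config.v1+json", b"{}"),
        "layers": [
            descriptor("application/vnd.example.bicep.module.layer.v1+json", module_bytes)
        ],
        # Hypothetical annotation key carrying the minimum compiler version.
        "annotations": {"com.example.bicep.min-version": min_bicep_version},
    }
```

Whether version constraints belong in annotations or in the config blob is exactly the kind of decision the open question leaves unresolved.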

Sign artifact

TBD

@majastrz added the discussion and story: registry labels Jun 19, 2021
@ghost added the Needs: Triage 🔍 label Jun 19, 2021
@majastrz
Member Author

This will be updated in the future after discussions with OCI SMEs.

@stan-sz
Contributor

stan-sz commented Jun 21, 2021

OCI does not have a concept of a package feed like NuGet does. Instead, references are made directly via a URI. The URI has the following components:

Imagine an issue of importing a bicep module from one ACR to another (e.g. airgapped). The problem is that if the references to the dependent modules are hardcoded, then simply migrating is not enough, because the contents also need to be updated. We are facing a similar problem with helm charts and the problem is well described at https://stevelasker.blog/2020/10/21/is-it-time-to-change-default-registry-references/ and a discussion is at opencontainers/artifacts#29. @majastrz - can you think of a way to make the ACR hostname configurable (through command line or env variable) so bicep modules are transferable between registries?

Bonus question: how can one transfer a bicep module with all dependent modules from one registry to another without the need to discover all dependent modules?

@majastrz
Member Author

Imagine an issue of importing a bicep module from one ACR to another (e.g. airgapped). The problem is that if the references to the dependent modules are hardcoded, then simply migrating is not enough, because the contents also need to be updated. We are facing a similar problem with helm charts and the problem is well described at https://stevelasker.blog/2020/10/21/is-it-time-to-change-default-registry-references/ and a discussion is at opencontainers/artifacts#29. @majastrz - can you think of a way to make the ACR hostname configurable (through command line or env variable) so bicep modules are transferable between registries?

This is a great point. To support these scenarios, we would need to separate the registry URI from the Bicep files themselves. This is similar to how NuGet configures sources via the NuGet.config file. If we proceed with OCI as the registry implementation, then we could implement a similar mechanism. We already have a bicepconfig.json that could be extended for this purpose or we could introduce a new file. Regardless, it would also require the ability to override via command line or env vars (could work like the nuget sources command).

My preference is for a config file based approach (with cmd overrides) instead of a purely cmd-based solution to make it possible for the language server to consume this information and provide a good completion experience for external modules.

Question: Do you always point to a single ACR? Or are there ever cases where you're pulling images/artifacts from multiple ACRs?

Bonus question: how can one transfer a bicep module with all dependent modules from one registry to another without the need to discover all dependent modules?

The idiomatic OCI method of dealing with dependencies appears to be to build them into the artifact at bicep push time as separate file layers. (It's a variation on option 2 in #3266.) This way you could just transfer the artifact to a different registry and wouldn't need to worry about dependencies. Similarly, bicep pull latency would likely improve because we'd only need to pull 1 artifact per external module reference without having to do secondary pulls to obtain the closures of all the dependencies.
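
To illustrate the "secondary pulls" point, here is a small Python sketch that computes the dependency closure a client would have to walk if dependencies were published as separate artifacts. The function dependency_closure and the sample graph are hypothetical; the graph is supplied up front, whereas a real client would only discover each edge after pulling the artifact that declares it.

```python
def dependency_closure(root, direct_deps):
    """Return every artifact reachable from root (excluding root itself)."""
    seen = {root}
    order = []
    stack = [root]
    while stack:
        current = stack.pop()
        for dep in direct_deps.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                stack.append(dep)
    return order


# Hypothetical module graph: app -> net, storage; both -> tags.
graph = {"app": ["net", "storage"], "net": ["tags"], "storage": ["tags"]}
separate_pulls = 1 + len(dependency_closure("app", graph))
bundled_pulls = 1  # dependencies shipped as layers of the same artifact
```

In this toy graph the separate-artifact approach needs four pulls for one external reference, while the bundled approach needs one.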

@stan-sz
Contributor

stan-sz commented Jun 22, 2021

Question: Do you always point to a single ACR? Or are there ever cases where you're pulling images/artifacts from multiple ACRs?

To avoid disruption or uncontrolled variation of the dependencies (e.g. tag update), we pull all dependencies into a test ACR first and deploy to test environments from there. Later in the release pipeline, we import the exact same versions to the production ACR and deploy to production environments from there. The ultimate test of whether all dependencies have been correctly captured and imported to the production ACR is an isolated environment, where nothing can be pulled from outside.

@majastrz
Member Author

If most users would end up with a single registry, then maybe we should have a concept of a "default" registry. Then, the references could become my/amazing/artifact:myTag instead of myregistry.azurecr.io/my/amazing/artifact:myTag or myRegistryAlias:my/amazing/artifact:myTag.

@stan-sz
Contributor

stan-sz commented Jun 25, 2021

Fair point, the "default" registry could be the current registry, while still having the ability to pull other modules from public sources within the same template.
I suggest reaching out to @SteveLasker for guidance on designing OCI artifact handling.

@majastrz
Member Author

Yeah, we'd support the default and additional registries and ensure the references in the bicep file are (or can be) registry URI agnostic.

@rouke-broersma

I think there is a danger in default registries in that where the image comes from is abstracted away so it is not immediately obvious. What if you forget to set your default registry? Suddenly you might be downloading your packages from 'a public default' registry instead of your private registry. This helps create supply chain attacks which we've seen a lot of reports on lately.

If the option of a default registry gets added there should not be any 'default for the default' imo. And I think it should still be allowed to be explicit about your registry location if you wish.

@majastrz
Member Author

@SteveLasker given your article at https://stevelasker.blog/2020/10/21/is-it-time-to-change-default-registry-references/, I'd love to hear/see your thoughts on how we should approach referencing Bicep modules from an OCI registry 😊

@SteveLasker

@rouke-broersma

I think there is a danger in default registries in that where the image comes from is abstracted away so it is not immediately obvious. What if you forget to set your default registry?

There's a bunch of lessons to learn from various registry attempts.
If there's a true, single public registry (docker hub, npm, nuget), then maybe a default can make sense. But, I'd say we've learned enough to say even a default should be explicit. Meaning, all package manager clients should have a config. If the config happens to default to the "single" public registry, then it's actually explicitly mapped. And, the output would indicate as such.
For instance, if a user were to run thingthang pull mypackage:v1, and the default registry was packages.registry.io, the output would generate: pulling packages.registry.io/mypackage:v1

The other thing I think we've learned is never, never, ever, not ever or never, have a search path, where you provide a list of registries to look through. This leads to the registry squatting attack.

@majastrz I did put a bit more thought into deterministic mappings: https://github.com/SteveLasker/drafts/blob/main/registry-repo-config.md

The idea here is you can configure a registry, you can define deterministic mappings. And, you can use named parameters.

Trying to change an existing toolchain is a challenge (like the container toolchains), although we should try.
For new clients, like bicep, I'd really like to iterate on some ideas with the oras project, which was newly donated to CNCF. If folks are interested, I'd be happy to help with iterating ideas in the oras cli and libraries to enable configuration. If it works for bicep, we can enable this for all ORAS based clients (Helm, WASM, OPA, ...).

@SteveLasker

FWIW, I created this discussion topic: Registry Configurations #6 to continue thoughts across multiple artifact types.

@majastrz
Member Author

majastrz commented Jul 29, 2021

Thanks @SteveLasker! I definitely agree on these points:

  • We will definitely need config (it will likely end up in bicepconfig.json).
  • No search paths of any kind
  • Deterministic and explicit mappings

I took a look at https://github.com/SteveLasker/drafts/blob/main/registry-repo-config.md. I was definitely not considering the need to redirect to a different "path" in a particular registry, but that will be necessary if a smaller registry is being replicated to a larger one or one with a different naming convention.

I'm not yet convinced that we need variables in the first iteration of the Bicep registry, but we should not make any decisions that prevent us from adding them later on. I'll write up a proposal of what this would look like and post it in this issue, so we can discuss.

@majastrz
Member Author

majastrz commented Jul 30, 2021

Config file

We can add a new registries section to the existing bicepconfig.json schema.

{
  "registries": {
    "aliases": {
      "public": {
        "uri": "mcr.microsoft.com"
      },
      "private": {
        "uri": "example.azurecr.io"
      },
      "privateWithPath": {
        "uri": "example.azurecr.io/hello/there"
      }
    }
  }
}

When the registries section is missing or if the entire config file is missing, we would assume no registries are configured. public, private, and privateWithPath strings in the example above are just examples and have no special meaning. Any arbitrary string can be an alias.

The JSON language service would provide completions for all the elements of the config file (except URIs, of course 🙂).

Module references

Assuming the above bicepconfig.json, here's how module references would work:

// pulls example.azurecr.io/bicep/modules/myAmazingModule:0.1-alpha
module mod 'oci::private:bicep/modules/myAmazingModule:0.1-alpha' = {
  ...
}

// pulls mcr.microsoft.com/bicep/modules/role-assignment:1.42
module mod 'oci::public:bicep/modules/role-assignment:1.42' = {
  ...
}

// pulls example.azurecr.io/hello/there/something/else:1.0
module mod 'oci::privateWithPath:something/else:1.0' = {
  ...
}

In addition to making the Bicep source agnostic to the registry URI, the above reference string syntax allows us to provide completions for all segments. (Alias completions would come from the config file. OCI repo and tag completions for private registries would come from ACR APIs. Completions for the public registry - TBD.)
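
The alias mapping described above can be sketched in a few lines of Python. The function resolve_module_ref is hypothetical and only handles the alias form of the reference string; it ignores the parsing ambiguities between aliases and direct URIs. An unknown alias is an error rather than a trigger for any kind of fallback, in line with the "no search paths" point earlier in the thread.

```python
def resolve_module_ref(ref, config):
    """Expand 'oci::<alias>:<repository>:<tag>' using the configured aliases."""
    prefix = "oci::"
    if not ref.startswith(prefix):
        raise ValueError(f"not an OCI module reference: {ref!r}")
    alias, sep, path = ref[len(prefix):].partition(":")
    if not sep or not alias or not path:
        raise ValueError(f"expected oci::<alias>:<repository>:<tag>: {ref!r}")
    aliases = config.get("registries", {}).get("aliases", {})
    if alias not in aliases:
        # Deterministic mapping: never guess or search other registries.
        raise KeyError(f"registry alias {alias!r} is not configured")
    return f"{aliases[alias]['uri']}/{path}"


# Mirrors the bicepconfig.json example above.
config = {
    "registries": {
        "aliases": {
            "public": {"uri": "mcr.microsoft.com"},
            "private": {"uri": "example.azurecr.io"},
            "privateWithPath": {"uri": "example.azurecr.io/hello/there"},
        }
    }
}
```

With this config, oci::privateWithPath:something/else:1.0 expands to example.azurecr.io/hello/there/something/else:1.0, matching the comment in the third module declaration.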

To help with prototyping, we will also allow a direct URI syntax like the following:

module mod 'oci::mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha' = {
  ...
}

CLI support

To allow pulling from a different registry URI without modifying the .bicep source, the user could modify the bicepconfig.json and rebuild. However, we could also expose the ability to override the value via CLI arguments to bicep build or bicep restore. This could look like this:
bicep restore --registry-alias public=my-replica-of-mcr.azurecr.io
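
A rough Python sketch of how such an override could be layered on top of the config file. The --registry-alias flag name comes from the example above; the helper apply_alias_overrides, the parser setup, and the choice to let an override introduce an alias that is not in the config file are assumptions made for illustration.

```python
import argparse
import copy


def apply_alias_overrides(config, overrides):
    """Return a copy of config with each 'alias=uri' override applied."""
    merged = copy.deepcopy(config)
    aliases = merged.setdefault("registries", {}).setdefault("aliases", {})
    for item in overrides:
        alias, sep, uri = item.partition("=")
        if not sep or not alias or not uri:
            raise ValueError(f"expected ALIAS=URI, got {item!r}")
        aliases[alias] = {"uri": uri}
    return merged


# Stand-in for the real CLI; the flag may be repeated to override several aliases.
parser = argparse.ArgumentParser(prog="bicep-restore-sketch")
parser.add_argument("--registry-alias", action="append", default=[], metavar="ALIAS=URI")
```

Working on a copy keeps the on-disk configuration untouched, so the override only lasts for the single command invocation.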

Thoughts?

@SteveLasker

@majastrz,
sorry for the delayed response. I’m out on vacation with limited internet access, back ~August 10th.
What you have looks good, if I understand it correctly, as it provides deterministic mappings. The primary thing to avoid, which I believe you have, is any sort of fall through search paths that allows someone to squat a name on a registry in the beginning of the list of registries.
I wasn’t sure about the public, vs private references. Are those just arbitrary names, or was it assumed public=mcr.microsoft.com?

Just to possibly anticipate the question as I’m not sure when I’ll be back online, the theory is there is not “one” public registry. Content can come from anyplace, as a company may have their corporate registry (registry.acme-rockets.io/corp/some-package:v1) which is their “public” and each team, or sub-division of the company may have theirs.

@rouke-broersma

rouke-broersma commented Jul 30, 2021

I think URIs should be allowed instead of aliases in the module reference. I don't see the harm in giving the choice. Say I'm prototyping and trying out an external module. Now I have to create the bicepconfig.json only because I need to reference the registry. To make it easier to distinguish between aliases and URIs we could for example reserve the 'special' alias uri:

module mod 'oci::uri:mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha' = { ... }

@majastrz
Member Author

@majastrz,
sorry for the delayed response. I’m out on vacation with limited internet access, back ~August 10th.
What you have looks good, if I understand it correctly, as it provides deterministic mappings. The primary thing to avoid, which I believe you have, is any sort of fall through search paths that allows someone to squat a name on a registry in the beginning of the list of registries.

@SteveLasker Thanks for taking a look (especially during your time off)!

I wasn’t sure about the public, vs private references. Are those just arbitrary names, or was it assumed public=mcr.microsoft.com?

Yup, they'd just be arbitrary names. No special meaning. I updated the proposal above to clarify as well.

Just to possibly anticipate the question as I’m not sure when I’ll be back online, the theory is there is not “one” public registry. Content can come from anyplace, as a company may have their corporate registry (registry.acme-rockets.io/corp/some-package:v1) which is their “public” and each team, or sub-division of the company may have theirs.

Yeah, no restrictions on the number of public registries. I guess the only real difference is whether you need to auth and if your identity has permissions to pull artifacts (and list artifacts to power completions when authoring). Although even that line gets blurry with the move to block anonymous requests on some registries like Docker Hub.

I think URIs should be allowed instead of aliases in the module reference. I don't see the harm in giving the choice. Say I'm prototyping and trying out an external module. Now I have to create the bicepconfig.json only because I need to reference the registry. To make it easier to distinguish between aliases and URIs we could for example reserve the 'special' alias uri:

module mod 'oci::uri:mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha' = { ... }

@rouke-broersma Yeah I think that makes sense as well. I updated the proposal above to add a similar syntax to the above. The final syntax should be similar - I'm ignoring any issues with parsing ambiguities right now.

@majastrz
Member Author

majastrz commented Aug 9, 2021

Team discussion notes 8/9/2021

  • There is a valid concern that requiring bicepconfig.json for any interactions with the registry is adding friction and will complicate any samples that we post in the Bicep repo or MS docs. We feel that a good compromise will be to assume the default "public" registry if the bicepconfig.json file does not have the registries section filled in. If the registries section is present, the contents fully replace the list of configured registries (so you can remove or reconfigure the public registry if that is what you intend).
  • The default bicepconfig.json should not use public as an alias to the MCR because there may be multiple "public" registries in the future. Something more specific will be needed. Possible alternatives: mcr, ms
  • We feel that using oci as the prefix in module refs like oci::mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha is not the best choice. In most cases, the users won't really need to know that OCI is the underlying technology. Searching for "oci" online doesn't really produce anything relevant (and clashes with another cloud provider). Most users will typically interact with it via bicep push, bicep restore and by deploying an ACR. OCI is just an implementation detail. We should explore other options here.

The alternatives for the prefix would look like this:

  • registry::mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha
  • reg::mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha
  • bmr::mcr.microsoft.com/bicep/modules/myAmazingModule:0.1-alpha
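
The first bullet in the notes above (fall back to a default registry only when the registries section is absent, otherwise let the section fully replace the defaults) could be sketched in Python as follows. The names DEFAULT_ALIASES and effective_aliases are hypothetical, and the alias name mcr is just one of the candidates mentioned in the second bullet.

```python
# Hypothetical built-in default; "mcr" is one of the alias names floated above.
DEFAULT_ALIASES = {"mcr": {"uri": "mcr.microsoft.com"}}


def effective_aliases(config):
    """Return the alias table that module references are resolved against."""
    registries = (config or {}).get("registries")
    if not registries:
        # No config file, or no registries section filled in: use the default.
        return dict(DEFAULT_ALIASES)
    # A present section fully replaces the defaults (no merging).
    return dict(registries.get("aliases", {}))
```

Because the section replaces rather than merges, a user who configures only a private registry ends up with no route to the public one unless they add it back explicitly.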

Thoughts?

@rouke-broersma

rouke-broersma commented Aug 10, 2021

I would personally say that the bicepconfig.json should only be used for aliases and that there should simply never be any 'default' alias. That would mean that if the alias is not set, the full OCI URL needs to be referenced or the alias has to be passed through the CLI. No magic referencing at all, even for what is, from your standpoint, the 'default' of MCR.

The reason is that this would still be a 'fallthrough search path' which can lead to supply chain attacks based on name-squatting. This is not a theoretical attack, it happens with other registry infrastructure right now.

@SteveLasker

@matsest
Contributor

matsest commented Aug 11, 2021

To allow Bicep code to be easily configured/shared, it would be very good to have an alias type of functionality to have 'variable' config for the registries used.

I also agree that having a bicepConfig.json requirement for managing aliases will add some friction to the user experience.

If the Bicep code (.bicep) uses aliases for registries and does not contain the actual references to the registries, I think sharing source code (including code that does not use the default registry) and examples will become harder than necessary, and it will reduce insight into which registry is used. With that said, having the opportunity to easily set an alias can be handy for both readability and maintaining code.

Is it feasible to contain the reference to the registries within the .bicep files, but not verbosely on each reference? E.g. something like:

registryAlias myReg = 'example.azurecr.io'

module mod 'bmr::myReg:bicep/modules/myAmazingModule:0.1-alpha' = {
   ...
}

which could be overridden with CLI flags:

bicep [subcommand] --registry-alias myReg=example.azurecr.io

@majastrz
Member Author

@matsest scroll up to #3283 (comment). I have a proposal above that covers something like that but with alias config stored in bicepconfig.json.

@majastrz
Member Author

After a team discussion, we have decided to use the following verbs for the CLI command:

  • bicep restore - this will be the command to get modules from the registry into the local file system (automatically done by bicep build unless opted out)
  • bicep publish - this will be the command to publish modules to the registry

@majastrz
Member Author

@matsest If we allowed variables to be used in type strings (via constant folding), the syntax would have to be something like 'bmr::${myReg}/modules/myAmazingModule:0.1-alpha' to be consistent with other elements of the language and to make tooling work as expected as well.

@alex-frankel
Collaborator

Closing since this has been fully implemented

@ghost locked as resolved and limited conversation to collaborators May 27, 2023
6 participants