Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(retention): added image retention policies #1866

Merged
merged 1 commit into from
Nov 1, 2023

Conversation

eusebiu-constantin-petu-dbk
Copy link
Collaborator

feat(metaDB): add more image statistics info

What type of PR is this?

Which issue does this PR fix:

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs showing the issue:

Testing done on this change:

Automation added to e2e:

Will this break upgrades or downgrades?

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rchincha
Copy link
Contributor

Also call this "feat(retention)"

@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 2 times, most recently from 7938a52 to 318aeff Compare September 29, 2023 13:48
@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk changed the title feat(gc): added image retention policies feat(retention): added image retention policies Sep 29, 2023
@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 2 times, most recently from 371c6cd to 04143f1 Compare September 29, 2023 15:45
@rchincha
Copy link
Contributor

sync policies (are per-repo, regex, semver, etc) are a good fit here IMO. Thoughts?

@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 6 times, most recently from c8506fb to 0806c32 Compare October 2, 2023 13:33
Copy link
Contributor

@andaaron andaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are interested in both tags which would be retained and tags which would be removed, in case of dry run.
Can we show both? Are we showing both?

examples/config-sync-localhost.json Outdated Show resolved Hide resolved
pkg/api/config/config.go Outdated Show resolved Hide resolved
pkg/api/controller.go Outdated Show resolved Hide resolved
pkg/cli/server/root.go Outdated Show resolved Hide resolved
pkg/cli/server/stress_test.go Outdated Show resolved Hide resolved
pkg/storage/gc/gc_test.go Outdated Show resolved Hide resolved
pkg/storage/gc/policy.go Outdated Show resolved Hide resolved
pkg/storage/gc/retention/rules/latestpull/latestpull.go Outdated Show resolved Hide resolved
pkg/storage/gc/retention/rules/dayspush/dayspush.go Outdated Show resolved Hide resolved
pkg/storage/gc/retention/rules/retainall/retainall.go Outdated Show resolved Hide resolved
@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 2 times, most recently from 8b45eae to e3f7742 Compare October 2, 2023 15:25
@codecov
Copy link

codecov bot commented Oct 2, 2023

Codecov Report

Merging #1866 (782170b) into main (3e6053e) will increase coverage by 0.11%.
The diff coverage is 94.56%.

@@            Coverage Diff             @@
##             main    #1866      +/-   ##
==========================================
+ Coverage   89.87%   89.98%   +0.11%     
==========================================
  Files         158      164       +6     
  Lines       26978    27542     +564     
==========================================
+ Hits        24246    24785     +539     
- Misses       2012     2031      +19     
- Partials      720      726       +6     
Files Coverage Δ
errors/errors.go 100.00% <ø> (ø)
pkg/api/config/config.go 97.72% <100.00%> (+0.37%) ⬆️
pkg/api/routes.go 94.50% <100.00%> (ø)
pkg/cli/server/validate_sync_enabled.go 100.00% <100.00%> (ø)
pkg/extensions/sync/local.go 91.28% <100.00%> (ø)
pkg/extensions/sync/references/cosign.go 97.03% <100.00%> (ø)
pkg/extensions/sync/references/oci.go 91.66% <100.00%> (ø)
pkg/extensions/sync/references/oras.go 85.83% <100.00%> (+0.11%) ⬆️
pkg/extensions/sync/references/referrers_tag.go 89.52% <100.00%> (ø)
pkg/log/log.go 85.18% <100.00%> (+0.56%) ⬆️
... and 18 more

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 2 times, most recently from 0f3519c to 97fb79c Compare October 3, 2023 13:21
"deleteUntagged": true,
"tagsRetention": [{
"names": ["v2.*", "*-prod"],
"retainAlways": true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retains all tags, is true by default, and can not be used together with the other rules.

"deleteReferrers": true,
"deleteUntagged": true,
"tagsRetention": [{
"mostRecentlyPushedCount": 10,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag is retained if any of these 4 rules is met.

because names are missing, then it will be applied for all tags in matched repos.
this will be applied for all repos that do not match the above policies (can be used like a default policy)

"deleteReferrers": false,
"deleteUntagged": true,
"tagsRetention": [{
"names": ["v2.*", "*-prod"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any tag that doesn't match any tagRetention[].names will be removed!
if names rule is empty, then the rules below are applied for all tags.

"policies": [
{
"repoNames": ["infra/*", "prod/*"],
"deleteReferrers": false,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whether or not to delete manifests with missing subjects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you push an referrer but don't push the subject, then this policy kicks in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they are older than retention.delay, yes

{
"repoNames": ["infra/*", "prod/*"],
"deleteReferrers": false,
"deleteUntagged": true,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whether or not delete manifests without tags.

"dryRun": false,
"policies": [
{
"repoNames": ["infra/*", "prod/*"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the policy applied on a repo is the first one in the list that matches
same for tagsRetention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not longest prefix match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I raised that last week, can't remember why we didn't want that...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it would be very tricky to get the configuration right as a user if the entries in the policy list can be in any order.
You would need to look at potentially hundreds such entries in any order to figure out which one would apply for the repo you are interested in.

If the first one matches you don't lose anything functionality-wise.
You can place your longest prefix match first, and your shorter prefix match afterwards and get the same result you would have gotten if we matched always on longest prefix.

examples/config-gc-periodic.json Show resolved Hide resolved
examples/config-gc.json Outdated Show resolved Hide resolved
examples/config-gc.json Outdated Show resolved Hide resolved
examples/config-gc.json Outdated Show resolved Hide resolved
examples/config-gc.json Outdated Show resolved Hide resolved
pkg/api/config/config.go Outdated Show resolved Hide resolved
@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 2 times, most recently from 7c82037 to 592d362 Compare October 6, 2023 13:56
@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 3 times, most recently from d305898 to 92a7815 Compare October 23, 2023 13:00
@rchincha
Copy link
Contributor

cc: @mbshields this needs documentation as well

@lictw
Copy link

lictw commented Oct 26, 2023

Good interface, nice options for any case - great job, thanks! I will build and test it for sure, I need this!

I have one suggestion, I need it and I think it will be nice option too! I don't know how to name it exactly, let it be "dynamic tags":

Imagine tags like environment-{1,2,..}-* and I want to keep 2 latest for each environment (1, 2, etc), so I must write rules for each environment, but it's a dynamic thing, and there are many of them, I suggest to simplify it somehow.

So many groups of tags and single rule for each group, something like regex capture group (maybe named) I think the best solution cause our group value must be in any matched tag and we should extract&compare it for each tag, and it's also very "flat":

names: [(environment-\d+)-.*]

.., so for each matched tag we should compare $1 and interpret tags with same group as unique list/rule and do the things.

Maybe you imagine better interface, but I really think it's cool feature, and it will be very unique feature!

@lictw
Copy link

lictw commented Oct 27, 2023

Update after first testing:

  1. Please update main from upstream to pull changes about "delete repositories without manifests", now in your fork repositories are left after policies delete all tags.

  2. Help me understand what is deleteReferrers, how it works? Also deleteUntagged about unfinished blobs, when it's uploaded, but not tagged (canceled push for ex.)?

  3. I'm right that if some tags matched to rule, than only this rule defines who will be retained and this tags can't be now affected by rules below?

        {
          "repositories": ["**"],
          "deleteUntagged": true,
          "deleteReferrers": true,
          "keepTags": [{
            "patterns": ["main$"]
          },{
            "patterns": ["main"],
            "mostRecentlyPushedCount": 3
          },{
            "pushedWithin": "72h",
            "pulledWithin": "720h"
          }]
        }

For ex. here for tags like: main (meta-tag, always points to the last build of this branch), main.1, main.2, branch-foo (meta), branch-foo.13 I want:

  1. main meta-tag always presented
  2. 3 latest build from main branch not counting main meta-tag, cause it already matched by rule above
  3. for builds from other branches and it will not affect any main builds cause all of them already matched by above rules

So the main point that this rules will not preserve main builds for 3 days how builds from other branches and will keep only 3 latest how it says, right? Then it's great implementation, thank you very much!

Also about above feature dynamic tags, in real world for complete retention I need something like this:

        {
          "repositories": ["**"],
          "deleteUntagged": true,
          "deleteReferrers": true,
          "keepTags": [{
            "patterns": ["main$"]
          },{
            "patterns": ["main\\..+"],
            "mostRecentlyPushedCount": 8
          },{
            "patterns": ["environments-[a-z]+$"]
          },{
            "patterns": ["(environments-[a-z]+)\\..+"],
            "mostRecentlyPushedCount": 1
          },{
            "pushedWithin": "72h",
            "pulledWithin": "720h"
          }]
        }

Everything like above, but also keep meta-tag for any environment and 1 latest build for each, and also, as for main, not match them by last rule (so not delay deletion for 72h).

In general, everything looks working, I use build of fork with retention for 2 days in my main zot installation, also seems that all works as I described in 3, and so as I said - it's great!

@lictw
Copy link

lictw commented Oct 27, 2023

And also, I see there is a no difference between "patterns": ["main", "master"] and "patterns": ["(main|master)"], so it generates a single list to which the rule applies (pushedCount for example). I think it will be great if each pattern will generate personal list and when we write like "patterns": ["main", "master"] then we just want apply the single rule, but for each match, they are not the same, and my dynamic tags in this case will just automatic propagation of the patterns, when single "patterns": ["(environment-[a-z]+)\\..+"] will be processed (based on existing tags) as "patterns": ["(environment-foo\\..+", "(environment-bar\\..+", "(environment-baz\\..+"].

@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 2 times, most recently from ba6b3de to f748ba2 Compare October 30, 2023 16:23
@eusebiu-constantin-petu-dbk
Copy link
Collaborator Author

Update after first testing:

  1. Please update main from upstream to pull changes about "delete repositories without manifests", now in your fork repositories are left after policies delete all tags.
  2. Help me understand what is deleteReferrers, how it works? Also deleteUntagged about unfinished blobs, when it's uploaded, but not tagged (canceled push for ex.)?
  3. I'm right that if some tags matched to rule, than only this rule defines who will be retained and this tags can't be now affected by rules below?
        {
          "repositories": ["**"],
          "deleteUntagged": true,
          "deleteReferrers": true,
          "keepTags": [{
            "patterns": ["main$"]
          },{
            "patterns": ["main"],
            "mostRecentlyPushedCount": 3
          },{
            "pushedWithin": "72h",
            "pulledWithin": "720h"
          }]
        }

For ex. here for tags like: main (meta-tag, always points to the last build of this branch), main.1, main.2, branch-foo (meta), branch-foo.13 I want:

  1. main meta-tag always presented
  2. 3 latest build from main branch not counting main meta-tag, cause it already matched by rule above
  3. for builds from other branches and it will not affect any main builds cause all of them already matched by above rules

So the main point that this rules will not preserve main builds for 3 days how builds from other branches and will keep only 3 latest how it says, right? Then it's great implementation, thank you very much!

Also about above feature dynamic tags, in real world for complete retention I need something like this:

        {
          "repositories": ["**"],
          "deleteUntagged": true,
          "deleteReferrers": true,
          "keepTags": [{
            "patterns": ["main$"]
          },{
            "patterns": ["main\\..+"],
            "mostRecentlyPushedCount": 8
          },{
            "patterns": ["environments-[a-z]+$"]
          },{
            "patterns": ["(environments-[a-z]+)\\..+"],
            "mostRecentlyPushedCount": 1
          },{
            "pushedWithin": "72h",
            "pulledWithin": "720h"
          }]
        }

Everything like above, but also keep meta-tag for any environment and 1 latest build for each, and also, as for main, not match them by last rule (so not delay deletion for 72h).

In general, everything looks working, I use build of fork with retention for 2 days in my main zot installation, also seems that all works as I described in 3, and so as I said - it's great!

Hello, and thanks for trying it!

  1. done
  2. deleteReferrers, will remove manifests for which their subject is missing.(was deleted, or was never uploaded)
  3. deleteUntagged, will remove manifests without tags
  4. yes, that's right, only the first tag policy is applied so your "main$" should be safe.
Imagine tags like environment-{1,2,..}-* and I want to keep 2 latest for each environment (1, 2, etc), so I must write rules for each environment, but it's a dynamic thing, and there are many of them, I suggest to simplify it somehow.

^here, I don't understand why you need rules per each environment, you want different rules for each one?

Let me think about the rest of the points, thanks for raising them.

@rchincha
Copy link
Contributor

@peusebiu pls rebase

@lictw
Copy link

lictw commented Oct 31, 2023

I don't understand why you need rules per each environment, you want different rules for each one?

The same, the same to keep last N, but N for each, 1 latest for env-A, 1 latest for env-B, not 1 for any. It seems to me that the best implementation that I have reached is described here.

deleteReferrers, will remove manifests for which their subject is missing.(was deleted, or was never uploaded)

PS what is subject?)

@rchincha
Copy link
Contributor

deleteReferrers, will remove manifests for which their subject is missing.(was deleted, or was never uploaded)

PS what is subject?)

The OCI distribution spec is working towards allowing uploading of arbitrary content and add "soft links" to each other.
https://github.com/opencontainers/distribution-spec/blob/main/spec.md#enabling-the-referrers-api

@eusebiu-constantin-petu-dbk eusebiu-constantin-petu-dbk force-pushed the gc_policies branch 3 times, most recently from a074690 to f3ae020 Compare October 31, 2023 17:02
feat(metaDB): add more image statistics info

Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
Copy link
Contributor

@rchincha rchincha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rchincha rchincha merged commit 9074f84 into project-zot:main Nov 1, 2023
33 checks passed
@rchincha
Copy link
Contributor

rchincha commented Nov 1, 2023

@lictw Thanks for all the feedback and testing.
This PR has been merged, so build from main and try.
We will take up further enhancements in future PRs.

@lictw
Copy link

lictw commented Nov 1, 2023

@rchincha yep, nice job! Please ping me in future PRs to watch.

@lictw
Copy link

lictw commented Jun 24, 2024

@peusebiu Hi! I'm back. I'm faced with a situation that I can't seem to solve with the current implementation. But a task seems so basic that I can't believe in it, so I decided to ask if I understand everything correctly.

I want to keep 3 latest tags, but only for some time, something like:

          "repositories": ["foo/**"],
          "keepTags": [{
            "mostRecentlyPushedCount": 3,
            "pushedWithin": "100h"
          }]

but there is OR logic, so above will keep latest 3 tags for forever. It's a DEV builds, I want to delete them fully after some time, but I also what to have only some latest builds if there is many, so it's the reason.

Also, I also have the latest tag, so I want to always keep the latest and 3 latest pushed tags, and delete all of them after some time.

And as side question about latest pulled, is it tracking for a specific tag or for whole image? If an image has 2 tags, but was pulled only by one, the second will be deleted?

@lictw
Copy link

lictw commented Jul 10, 2024

@peusebiu up (@rchincha maybe?)

@andaaron
Copy link
Contributor

andaaron commented Jul 10, 2024

@lictw

To answer your last paragraph, only the tags are deleted, the actual image/manifest is kept/deleted afterwards based on the separate setting you used for deleteUntagged. If the manifest is still referenced by at least one other remaining tag it should not be deleted.

I'm not sure there is a possibility to do an AND (intersection) between mostRecentlyPushedCount and pushedWithin. They were designed to independently apply for the same regex pattern identifying the tag strings. And I believe this constraint was because the patterns were meant to be unique, so there wouldn't be a possibility to have the same patterns value in multiple entries inside keepTags if you wanted to implement the OR (union).
I may be mistaking though. @peusebiu , @rchincha , what do you think?

@eusebiu-constantin-petu-dbk
Copy link
Collaborator Author

eusebiu-constantin-petu-dbk commented Jul 11, 2024

@andaaron you are right with your points.

Unfortunately what you want is not possible with current implementation :(.

the best you can do is:

"repositories": ["foo/**"],
"keepTags": [
    {
        "patterns": ["latest"]
    },
    {
    "mostRecentlyPushedCount": 4,
    }
]

if you want to always keep 'latest' and 3 more recent tags ^

"repositories": ["foo/**"],
"keepTags": [
    {
        "patterns": ["latest"]
    },
    {
    "pushedWithin": "100h"
    }
]

If you want to always keep latest + all recently pushed tags. ^


"repositories": ["foo/**"],
"keepTags": [
    {
        "patterns": ["latest"]
    },
    {
    "pushedWithin": "100h",
    },
    {
    "mostRecentlyPushedCount": 4
    }
]

If you want to always keep latest + most recently pushed tags (within 100h) and as a default case keep most recent 4 tags(latest may be included) if the tags are older than 100h. ^

@lictw
Copy link

lictw commented Jul 13, 2024

@peusebiu why 4? Just to clarify that I understand how it equally works. Latest already done after first rule and should be excluded, is not it? So 3 from of the remaining.

After all, does my case is actual? Do you have any thoughts on feature improvements?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants