Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lints: add check for non-utf-8 filesystem content #983

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

allisonkarlitskaya
Copy link
Contributor

As mentioned in #975, we should add a lint to ensure that all filenames are UTF-8, giving nice errors (with full pathnames) in the event that we encounter invalid filenames.

We also check symlink targets.

Add a unit test that tries various valid and invalid scenarios.

@allisonkarlitskaya allisonkarlitskaya force-pushed the utf8-lint branch 2 times, most recently from e627b9e to 0d0acdf Compare December 19, 2024 09:58
@allisonkarlitskaya
Copy link
Contributor Author

From what I can see of how we get our root Dir reference in cli.rs, and the recommended instructions for running this in a RUN command in the Containerfile, I think this might end up walking into /dev, /sys, /proc and so on as well...

On the other hand, any bind mounts that are present might prevent us from scanning the underlying content. Imagine some scenario where /etc/resolv.conf in the image was a symlink that contained non-utf8 character, but we wouldn't be able to access that because it gets bind-mounted-over for the duration of the RUN command.

Maybe we should do a temporary bind mount of (only) the rootfs for purposes of linting?

Thoughts?

@cgwalters
Copy link
Collaborator

From what I can see of how we get our root Dir reference in cli.rs, and the recommended instructions for running this in a RUN command in the Containerfile, I think this might end up walking into /dev, /sys, /proc and so on as well...

Yeah we should avoid that for sure. There's relevant similar code in e.g. remove_all_on_mount_recurse

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from the mount traversal you noted, thanks!

@allisonkarlitskaya
Copy link
Contributor Author

I don't really trust checking dev vs. rootdev anymore: we've had some reports of that that show that overlayfs with composefs can produce strange effects in this case... like in containers/composefs-rs#41 (comment)

The overlayfs docs themselves warn about this:

While directories will report an st_dev from the overlay-filesystem, non-directory objects may report an st_dev from the lower filesystem or upper filesystem that is providing the object. Similarly st_ino will only be unique when combined with st_dev, and both of these can change over the lifetime of a non-directory object. Many applications and tools ignore these values and will not be affected.

https://docs.kernel.org/filesystems/overlayfs.html

The openat2 NO_XDEV flag approach seems kinda okay-ish I guess... does cap-std have support for this?

I think the best thing we could do here is some trickery with cloning the root mount to an fd using the new mount API and then iterating through that fd...

@allisonkarlitskaya
Copy link
Contributor Author

Okay, "some trickery" isn't very tricky at all. It mostly looks like just using open_tree() without the AT_RECURSIVE flag. rustix binds this, so I'll take a look.

https://github.com/brauner/man-pages-md/blob/main/open_tree.md

@cgwalters
Copy link
Collaborator

I don't really trust checking dev vs. rootdev anymore: we've had some reports of that that show that overlayfs with composefs can produce strange effects in this case... like in containers/composefs-rs#41 (comment)

Yeah, fair.

OK so another approach which is also used in the referenced code is this: #984

The openat2 NO_XDEV flag approach seems kinda okay-ish I guess... does cap-std have support for this?

Not directly but we can also just use rustix for stuff like this too.

cgwalters added a commit to cgwalters/man-pages-md that referenced this pull request Dec 19, 2024
This came up in containers/bootc#983 and it's actually *really* cool honestly that one can do a non-cloning (detached) `open_tree` without privileges. Let's note that implicitly by documenting that this can return `EPERM`.
@cgwalters
Copy link
Collaborator

Okay, "some trickery" isn't very tricky at all. It mostly looks like just using open_tree() without the AT_RECURSIVE flag. rustix binds this, so I'll take a look.

That is awesome! I didn't know that was usable unprivileged. Indeed it's by far the best approach. I noticed this wasn't doc'd so ➡️ brauner/man-pages-md#15

lib/src/cli.rs Outdated Show resolved Hide resolved
lib/src/cli.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks!

@cgwalters
Copy link
Collaborator

Ah CI needs some #[cfg(feature = "install")] massaging...I am quite tempted again to drop that feature or just scope it to to-disk

lib/src/mount.rs Outdated Show resolved Hide resolved
lib/src/mount.rs Outdated Show resolved Hide resolved
@cgwalters cgwalters self-assigned this Dec 19, 2024
As mentioned in containers#975, we should add a lint to ensure that all filenames
are UTF-8, giving nice errors (with full pathnames) in the event that we
encounter invalid filenames.

We also check symlink targets.

Add a unit test that tries various valid and invalid scenarios.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Signed-off-by: Colin Walters <walters@verbum.org>
@cgwalters
Copy link
Collaborator

After a live chat we decided to just rely on openat2(.. NO_XDEV), updated for that.

@cgwalters cgwalters merged commit f95d43c into containers:main Dec 19, 2024
25 checks passed
@cgwalters
Copy link
Collaborator

Also noting here just so it doesn't get lost:

We also discussed having container lint --root /path/to/root which would be usable in a multi-stage build like this:

FROM someimage as rootfs
...


FROM otherimage as linter
RUN --mount=from=rootfs,target=/rootfs bootc container lint --rootfs=/rootfs

An advantage of this approach is that we'd be able to scan the filesystem tree without having container runtime state (/proc, /dev) overmounted at all...which would actually allow us to e.g. also lint against device nodes.

But...the ergonomics would be obviously less good.

And ultimately the most powerful way to do this is to have a container scanner that runs outside of the running image, which would allow us to lint the actual image structure itself (e.g. tiny layers that should be coalesced). Or of course, a container -> container optimizer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants