-
Cross posting for visibility: https://github.com/orgs/paketo-buildpacks/discussions/191
-
I would disagree, unless the name and version combine to point to a specific tag of an image - but then you don't have the repo to know where the image resides. The reason I disagree is that there is no standard install for Ubuntu or any other Linux version. There are so many different ways to do an install of Linux, pulling in packages, stripping out packages, that I don't think this provides much useful information. At least, not when it comes to knowing what's installed in the image.
I don't like this option, but I think it's probably the only way to actually have a strong guarantee. If you are making a builder and a set of buildpacks, you can coordinate between the two. If you are making a buildpack targeting well-known builders/base images, then you can target those as well. As mentioned, it does tightly couple the builder and buildpacks, which isn't great. I could definitely see warning instead of a hard fail if the targets don't match. It also makes me wonder why we'd even change, when we have basically the same thing right now with stacks.
This doesn't work, because as a buildpack author you can only sanity-check what's in the build image. You could make sure all the required build tools are present, but you can't see the run image, so you can't check what's going to be available at runtime, and doing runtime checks is absolutely not an option. Even if we could check the runtime image, I don't like this approach because it puts a lot of extra work on the buildpacks. I can't speak for all of the Paketo buildpacks, but for the Java-related ones that supply binaries, we already kind of do option 0.) because we just have […]. There are a couple of places where we need to use different start commands based on whether a shell is present, like Tomcat, and that's probably where we'd use the target id.
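(Purely as a hypothetical illustration of that last point - this is not actual Paketo code, and the target ID and paths are invented - the decision could look something like this:)

```python
import os

# Hypothetical sketch, not actual Paketo code: pick how to launch Tomcat
# depending on whether the run image is known to ship a shell.
# The target ID and paths below are invented for illustration.
SHELL_LESS_TARGETS = {"io.example.run.tiny"}

if os.getenv("CNB_TARGET_ID", "") in SHELL_LESS_TARGETS:
    # no shell on the run image: launch the JVM directly
    command = ["java", "-cp", "/layers/example/tomcat/bin/bootstrap.jar",
               "org.apache.catalina.startup.Bootstrap", "start"]
else:
    # a shell is available: the stock startup script can be used
    command = ["bash", "catalina.sh", "run"]

print(command)
```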
100%
100%. The only other way around is rebuilding stuff. That works for some build systems like Node.js/NPM that will recompile native code, but not for PHP extensions. Not sure there's a good way around it. I think the Paketo PHP buildpack is just more tightly coupled to where it can run than other buildpacks.
-
I believe we (Google Cloud's buildpacks) would face a lot of issues similar to Heroku's in this situation. For context, we maintain our own stacks to support the different ways customers want to interact with GCP products:
Package differences

A few thoughts/questions also come to mind:
Package Management for Authors
Let's assume we can all come to a shared understanding on what it means to be on Ubuntu 22.04, and that it's always going to be the […]. Each buildpack author could then determine the additional packages that they require on top of Ubuntu 22.04. This is just a rough idea, but what if we let authors declare additional packages:
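(The snippet that originally followed here didn't survive; purely as a hypothetical illustration - the `packages` field and the use of `apt-get` are invented for this sketch, not part of any spec - it could look roughly like this, combined with the install step described in the next paragraph:)

```python
import subprocess
import tomllib  # Python 3.11+

# Hypothetical buildpack.toml fragment: extra distro packages the buildpack
# needs on top of the agreed-upon Ubuntu 22.04 baseline. The "packages" key
# is invented for illustration.
BUILDPACK_TOML = """
[[targets]]
os = "linux"
arch = "amd64"

[[targets.distros]]
name = "ubuntu"
version = "22.04"
packages = ["libpq5", "libxml2"]
"""

config = tomllib.loads(BUILDPACK_TOML)

# The platform (or an extension) would then install the union of all packages
# requested by the selected buildpacks into the build and run images.
packages = sorted({pkg
                   for target in config["targets"]
                   for distro in target.get("distros", [])
                   for pkg in distro.get("packages", [])})
subprocess.run(["apt-get", "install", "-y", *packages], check=True)
```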
And then we'd extend the... detect phase? to install the listed packages for the selected buildpacks. I think this could be interesting because:
Downsides:
-
To add: with the removal of that stack id, some buildpack authors wouldn't be able to get far enough to fail on binary execution, as they don't even know what binary to try to download. Currently, some Heroku buildpacks treat the stack id as a cache key of sorts, so there is a […]. I could convert Ubuntu 22.04 into […]
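(As a hedged sketch of what such a cache key could be derived from once the stack id is gone - the exact format below is made up - the `CNB_TARGET_*` values can simply be folded into one string:)

```python
import os

# Illustrative only: derive a cache key from the target metadata instead of
# the old stack id; the separator and fallback values are made up.
parts = [
    os.getenv("CNB_TARGET_OS", "unknown"),
    os.getenv("CNB_TARGET_ARCH", "unknown"),
    os.getenv("CNB_TARGET_DISTRO_NAME", "unknown"),
    os.getenv("CNB_TARGET_DISTRO_VERSION", "unknown"),
]
cache_key = "-".join(parts)  # e.g. "linux-amd64-ubuntu-22.04"
```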
-
Responding mostly inline, but I realize I had a typo in an important paragraph. It is not possible to match […]. Updated and slightly re-phrased:
-
So, one thing that I did not touch on at all in the original post is alignment (or lack thereof) between build and run images. A buildpack can only analyze the build image; the presence of a library during […]

I suppose the general assumption for "curated" images like Paketo's, Heroku's, or Google's is that the run images are a complete subset of the build images, but this may not always hold true, and it certainly cannot be guaranteed once users bring their own builders and images.

An evolution of the "the packages don't match" case is the "the distributions don't match" case (or even OS/architecture). Then we're effectively in "cross-building" territory. The question is how realistic this is in the real world. A JVM app built on an Ubuntu builder, then executed on a minimal RHEL runtime image, sure - linked […]

But a Ruby binary for an Ubuntu 22.04 runtime target is unlikely to work on an RHEL 7 builder. Too many differences in the dynamically loaded libraries for […]

Those are likely use cases for people who are then bringing their own images anyway and have full end-to-end control over builders and build/run images. But it does beg the question... is […]

In fact... where is the value even really supposed to come from? https://github.com/buildpacks/spec/blob/main/buildpack.md#targets doesn't say that at the moment (I know... it's a work in progress 😄).
-
Okay, so, thinking about what `$CNB_TARGET_ID` actually means: in its simplest form, it expresses (or promises, really) a run image's compatibility with the list of packages that are declared somewhere (or that the image has always offered, since the spec says "MUST NOT break ABI compatibility" for newer image versions).

In Paketo's builder, imagine this run-image metadata:

```toml
[run-image]
image = "…"
reference = "…"
[target]
id = "io.paketo.run.full"
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

And in Heroku's builder, this:

```toml
[run-image]
image = "…"
reference = "…"
[target]
id = "com.heroku.runtime.standard"
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

A buildpack that downloads pre-built binaries such as language runtimes (with a bunch of libraries dynamically linked) declares compatibility with Ubuntu 22.04, since that's what it has pre-built binaries for:

```toml
api = "0.10"
[buildpack]
id = "…"
name = "…"
version = "…"
[[order]]
[[order.group]]
# …
[[targets]]
os = "linux"
arch = "amd64"
[[targets.distros]]
name = "ubuntu"
version = "22.04"
```

This buildpack therefore would work on both of the builders above, but to be reasonably confident that, at runtime, the libraries its binaries dynamically link against exist, it now checks against some known run images it supports:

```python
# bin/compile
# ... download binary using $CNB_TARGET_OS, $CNB_TARGET_DISTRO_NAME, etc.
if os.getenv("CNB_TARGET_ID") not in ["io.paketo.run.full", "com.heroku.runtime.standard", "com.google.gcp.gae"]:
# ... do an LDD check even if that's no guarantee if the run image mismatches
# ... or print a warning, at least in case something fails later, about required libraries
```

So far, so good. But.

This would cause lots of work, for everyone, long-term

What if...
Make it not a single ID, but a set of labels!

Think of it instead this way: we want compatibility "labels", and we want to be able to express more than just a single one for an image. Let's call the resulting env var, just for the sake of this example, `$CNB_TARGET_COMPATIBILITY`.

And let's define this env var as a comma-separated list of the compatibility labels (spec'd as a set, really, since the order is irrelevant), which keeps things super simple: on the image side, it always means "a union of those labels", while buildpacks, during execution, can match against it with arbitrary logic.

So, our images from earlier would now have metadata like this:

```toml
[run-image]
# …
[target]
compatibility = ["io.paketo.run.tiny", "io.paketo.run.full"]
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

```toml
[run-image]
# …
[target]
compatibility = ["com.heroku.runtime.standard"]
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

And a buildpack can parse the env var:

```python
# bin/compile
# ... download binary using $CNB_TARGET_OS, $CNB_TARGET_DISTRO_NAME, etc.
compatible_targets = set(os.getenv("CNB_TARGET_COMPATIBILITY").split(","))
tested_targets = {"io.paketo.run.full", "com.heroku.runtime.standard", "com.google.gcp.gae"}
# check if comma-separated CNB_TARGET_COMPATIBILITY set intersects with one of our tested-against run images
if not tested_targets & compatible_targets:
# ... do an LDD check even if that's no guarantee if the run image mismatches
# ... or print a warning, at least in case something fails later, about required libraries
```

Custom images are suddenly no problem for buildpacks!

I can now make my own image for my own builder, where I use a mix of Paketo and Google buildpacks, so I ensured that the run image is a union of them, because that's what I decided is best for my purposes (I am not giving it its own "compatibility name" here, since I do not plan on sharing it with third parties, but... I could):

```toml
[run-image]
# …
[target]
compatibility = ["io.paketo.run.tiny", "io.paketo.run.full", "com.google.gcp.minimal", "com.google.gcp.basic", "com.google.gcp.gae"]
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

Any buildpack that has logic specifically for any of those labels will continue to "just work" as if the image wasn't custom.

Image "flavors" are suddenly trivial to evolve!

At Heroku, let's say we decide to start offering minimal images. No problem; the existing "standard" image's metadata changes accordingly:

```toml
[run-image]
# …
[target]
compatibility = ["com.heroku.runtime.minimal", "com.heroku.runtime.standard"]
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

A buildpack currently somehow special-cases "[…]"

And buildpack authors do not have to lift a finger!

The beautiful thing is that no existing buildpack that has explicit "shortcuts" or automated tests for well-known run images needs any adjustments when new image variants are introduced that are a superset of others. This scales really well as the number of builders, images and buildpacks goes up over time.

More use cases

Easy images for debugging

Another cool thing that becomes possible is debug run images. What if our website keeps crashing server-side with a segfault in libXML? Well...:

```toml
[run-image]
# …
[target]
compatibility = ["io.paketo.run.tiny", "io.paketo.run.full", "io.paketo.run.debug"]
os = "linux"
arch = "amd64"
[target.distro]
name = "ubuntu"
version = "22.04"
```

The extra entry in […]

Because it still identifies as compatible with "[…]"

"Fake It 'Till You Make It"

Or, what if CNBs and the infrastructure become the greatest business on the planet, and we have wonderful competition between vendors. A new company enters the marketplace and promises the very best builder images. How can they ever catch up to the big, established players, whose […]

Well, this new company fully controls their own builder and images, including the […]

Really weird buildpack requirements

This is quite unlikely to be a real use case, but the buildpack code that matches against […]

Maybe a super rare library is needed for a binary, and it's known to work on […]
The latter would of course require users of that buildpack to then have a custom image with the right packages, and both of those compatibility labels in its metadata.

What's needed to make this happen?
-
Context
As the CNB spec removes the concept of stacks, buildpacks will eventually (once the deprecation turns into a removal in a future spec revision) no longer be able to definitively express that they require e.g. Heroku-22, a "stack" that is a specific build of Ubuntu 22.04 with a known set of libraries available for buildpacks' binaries to dynamically link against:
Instead, only the distribution name and version (besides base OS flavor and architecture) are exposed by the lifecycle and are to be specified as targets; in Heroku's case, for Heroku-22, this would be "ubuntu" and "22.04":
This gives a certain basic guarantee for package and library versions, if installed (assuming they're from the official distribution sources), but not whether they're available on the given build and run images in the first place.
Many buildpacks bundle or load pre-built packages for e.g. language runtimes, which contain executables and shared objects that link against shared objects from the environment they were built on:
Dynamic linking is desirable both for efficiency (archive size) and for security (rebasing onto an updated base image also updates the linked libraries).
A standard e.g. Ubuntu 22.04 installation is not guaranteed to have these libraries; `libc-client.so.2007e` is from a package that's explicitly installed during the build of the Heroku-22 stack image.

In the future, CNBs will therefore have to handle the possibility of missing libraries for their pre-built language runtimes and other packages, as they may be executed in a builder that uses base images different from those provided by the buildpack vendor.
While there may be a `$CNB_TARGET_ID` environment variable present during the detect and build phases, it is not (and should not be) possible to define this ID as part of the `[[targets]]` list in a `buildpack.toml` for the purpose of "matching" buildpacks against "stacks".

The resulting "looser" contract makes it easier to combine buildpacks that currently explicitly depend on different stacks - instead, they will declare compatibility with certain OS distributions and versions.
The variable is optional, however, and, again, different vendors' builders and/or base images will have different values. For buildpacks to be interoperable, they must rely solely on distribution name and version e.g. when determining which pre-built language runtime binaries to download.
But then, as stated before, there are no guarantees for a buildpack that the dynamic libraries these binaries link against are actually present on the "ubuntu"/"22.04" image it is currently running on.
There are a few possible approaches for dealing with this inside buildpacks. I am listing them below for the purpose of discussion, in the hope that in the not-too-distant future, these can be incorporated into the buildpack authors' guide documentation.
All of the approaches below assume that a buildpack author is aiming for interoperability with other buildpacks, builders, and base images. In cases where somebody wants to explicitly throw together a custom buildpack to just work with a particular known stack in a controlled environment, none of this is necessary, of course. Furthermore, "utility buildpacks" that do not vendor any kind of self-compiled binaries that use dynamic linking can generally ignore all of this, as well.
Options for Buildpack authors
In ascending order of effort-to-implement ;)
0. Just ignore it all
"Hope is not a strategy", they say, but for buildpacks that do not vendor anything, or that are supposed to be used in a tightly controlled environment, not targeting anything in particular and "just knowing" that stuff will work, is perfectly fine.
1. Re-purpose `$CNB_TARGET_ID`

This is effectively the "do nothing" approach that fails a bit more gracefully. Use the value of `$CNB_TARGET_ID` during build to decide "yup, my built stuff works with that", and warn or fail, or even bail out early during detection*.

In most cases, this is not a desirable option, as it results in effectively no interoperability - a Heroku buildpack can't run in a Paketo builder or vice versa, and the result is exactly the same tight coupling that we have at present with `$CNB_STACK_ID`.
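(A minimal sketch of that approach, assuming a `bin/detect` written in Python and placeholder IDs - not a recommendation, just what the "bail out early during detection" variant could look like:)

```python
#!/usr/bin/env python3
# bin/detect - sketch of option 1: fail detection early unless the target ID
# is one this buildpack was explicitly built for. IDs are placeholders.
import os
import sys

KNOWN_TARGET_IDS = {"com.example.images.run", "io.example.images.run-full"}

if os.getenv("CNB_TARGET_ID") not in KNOWN_TARGET_IDS:
    print("unsupported target id, skipping this buildpack", file=sys.stderr)
    sys.exit(100)  # exit code 100 means "detection failed" per the spec

sys.exit(0)  # pass detection
```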
2. Rely on target OS, architecture, distribution name, and distribution version

Instead of using a stack or target ID as the main identifier, use the `CNB_TARGET_*` variables for determining compatibility and/or e.g. what pre-built binaries to vendor, for instance:

```shell
$ curl "https://my-happy-package-repo.s3.amazonaws.com/dist/${CNB_TARGET_OS}-${CNB_TARGET_ARCH}/${CNB_TARGET_DISTRO_NAME}-${CNB_TARGET_DISTRO_VERSION}/ruby-3.4.5.tar.gz"
```
And then...
2.A: "Hail Mary"
Then, hope it "just works", because someone took care to ensure the packages for the buildpacks in use are on the base images.
Again, fine for cases where a user controls the entire "stack" of builder, base images, and buildpacks. Effectively similar to Option 0.
2.B: sanity check the binaries
For any downloaded executables and shared objects, `ldd` them to see if any dynamic `NEEDED` entries are missing, and alert the user, advising them to use e.g. https://packages.ubuntu.com/search?searchon=contents&keywords=libc-client.so.2007e&mode=exactfilename&suite=jammy&arch=amd64 to figure out what package is missing from their base image.

Better user experience, although users' hands might be tied in cases where they cannot easily change the builder or its base images.
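(A rough sketch of such a check - the path is illustrative, and a real buildpack would run this over everything it vendored - could shell out to `ldd` and look for unresolved entries:)

```python
import subprocess
import sys

def missing_shared_libs(binary_path: str) -> list[str]:
    """Dynamic dependencies that the loader cannot resolve, per `ldd` output."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    missing = []
    for line in result.stdout.splitlines():
        # unresolved libraries show up as e.g. "libc-client.so.2007e => not found"
        if "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing

missing = missing_shared_libs("/layers/example/ruby/bin/ruby")  # illustrative path
if missing:
    print("Missing shared libraries on this image: " + ", ".join(missing), file=sys.stderr)
    print("Use https://packages.ubuntu.com/ to find the providing package.", file=sys.stderr)
    sys.exit(1)
```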
If `$CNB_TARGET_ID` matches a specific known/expected value, a buildpack may skip this step. See notes on `$CNB_TARGET_ID` namespacing below.

Buildpacks could also only warn, rather than fail, on missing shared libraries that might not be essential (think a PHP runtime build's shared `ext-imap` module not finding its `libc-client.so.2007e` - maybe the code doesn't even need to load the `ext-imap` extension into core).

3. Rely on all from 2. plus available libraries
Effectively, run `ldconfig -p` from the buildpack to get the full list of dynamic libraries that `ld.so` knows about, and use that list to figure out ahead of time whether pre-built binaries will even work on the current base image.

This requires maintaining appropriate metadata for built binaries (as in, the full list of shared libraries they depend on), so it's considerably more effort than either variant of option 2.
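(A hedged sketch of that idea - the required-library metadata here is a made-up example of what would be generated alongside the pre-built binaries:)

```python
import subprocess

# Example metadata: shared libraries a pre-built runtime is known to need.
# In practice this list would be generated when the binaries are built.
REQUIRED_LIBS = {"libssl.so.3", "libyaml-0.so.2", "libffi.so.8"}

def available_shared_libs() -> set[str]:
    """Names of all shared libraries the dynamic linker currently knows about."""
    out = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
    libs = set()
    for line in out.splitlines()[1:]:  # first line is a summary header
        # entries look like: "libssl.so.3 (libc6,x86-64) => /usr/lib/.../libssl.so.3"
        name = line.strip().split(" ", 1)[0]
        if name:
            libs.add(name)
    return libs

unavailable = REQUIRED_LIBS - available_shared_libs()
if unavailable:
    print("This base image is missing: " + ", ".join(sorted(unavailable)))
```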
If `$CNB_TARGET_ID` matches a specific known/expected value, a buildpack may skip this step. See notes on `$CNB_TARGET_ID` namespacing below.

Afterwards...
3.A: fail early
If the packages on the image aren't sufficient, fail with an explanation of what's missing, similar to option 2.B.
3.B: choose a "simpler" build
Of course, buildpacks could prepare several builds of language runtimes or other software, with different "levels" of dynamic libraries they link against. A buildpack could install a more "minimal" build of a program, which might be enough for the user.
Thoughts on `$CNB_TARGET_ID` "namespacing"

For effective interoperability, the value of `$CNB_TARGET_ID` should be required, by the spec, to be namespaced, as buildpacks will likely use it to short-circuit any "stack" sanity checks.

Otherwise, whoever "squats" an identifier like "`minimal`" or similar first makes it unusable for any other buildpacks - imagine both Paketo and Heroku having "full-sized" images based on Ubuntu 22.04, but with different libraries... if they both called themselves "`full`", they'd be indistinguishable, and a buildpack could not infer anything about their guaranteed contents this way.

Reverse-domain notation comes to mind as a possible simple pattern, think `com.heroku.base-images.run` or `io.paketo.images.run-minimal`.

This should also be an unversioned "flavor" or "variant" identifier, because the version (think 20.04/"focal", 22.04/"jammy") is already encoded in `$CNB_TARGET_DISTRO_(NAME|VERSION)`. Maybe `$CNB_TARGET_VARIANT` or another name is an even better choice for this variable than `$CNB_TARGET_ID`?

Long-term, e.g. the Heroku buildpacks might then whitelist e.g. "`com.heroku.base-images.run`" and "`io.paketo.images.run-full`" inside the buildpack logic, and declare `ubuntu` and `20.04`, `22.04`, `24.04` as supported in `targets.distributions`.

The spec might also benefit from some guidance on what a buildpack should do if the target ID is of an unknown value (probably the same as if it is absent... best effort, as per the possible approaches outlined above).
A note on ABI compatibility
In the list of options above, it is not possible to rely solely on the target OS and architecture plus the available libraries, and to ignore the distribution name and version, if downloaded programs dynamically link against shared libraries on the system.
The reason for that is that ABI versions (e.g. the `6` in `libstdc++.so.6`) only guarantee forward compatibility at runtime. A program compiled against an old version of a library will still execute fine when dynamically linking against a much newer version (but same ABI version number) of the library.

However, if the program is compiled against a newer version of the library, and then executed against an older version, it is likely to have, via `#ifdef` macros etc., used newly added symbols from the newer version of the library during compilation, which are not present in an older version.

For example, a program that links against `libpq.so.5`, built on Debian "bookworm", links against symbols from version 15.3 of `libpq`. When executed on an Ubuntu "jammy" system, where `libpq.so.5` is also installed, this will then not work - the `libpq` version on that Ubuntu version is 14.8.

As the combinations of exact library versions across Linux distributions and versions are effectively infinite, this means that, in practice, distributions and their versions have to be targeted precisely, both during compilation and when later "vendoring" them inside buildpacks.
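(To make that concrete with a hedged sketch - the package name and minimum version below are examples only - a buildpack could compare the distro package's upstream version against the one its binaries were built against, since the soname alone doesn't reveal the mismatch:)

```python
import subprocess

# Example values: a binary built against libpq 15.x must not be run against an
# older upstream version, even though the soname (libpq.so.5) is identical.
PACKAGE = "libpq5"
BUILT_AGAINST = (15, 3)

raw = subprocess.run(
    ["dpkg-query", "-W", "-f=${Version}", PACKAGE],
    capture_output=True, text=True, check=True,
).stdout  # e.g. "14.8-0ubuntu0.22.04.1" on jammy

upstream = raw.split(":")[-1].split("-")[0]             # strip epoch and revision
installed = tuple(int(part) for part in upstream.split(".")[:2])
if installed < BUILT_AGAINST:
    print(f"{PACKAGE} {raw} is older than the {BUILT_AGAINST} the binaries were built against")
```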