CH-IMAGE(1) | Charliecloud | CH-IMAGE(1) |
ch-image - Build and manage images; completely unprivileged
$ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT
$ ch-image [...] build-cache [...]
$ ch-image [...] delete IMAGE_GLOB [IMAGE_GLOB ...]
$ ch-image [...] gestalt [SELECTOR]
$ ch-image [...] import PATH IMAGE_REF
$ ch-image [...] list [-l] [IMAGE_REF]
$ ch-image [...] pull [...] IMAGE_REF [DEST_REF]
$ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]
$ ch-image [...] reset
$ ch-image [...] undelete IMAGE_REF
$ ch-image { --help | --version | --dependencies }
ch-image is a tool for building and manipulating container images, but not running them (for that you want ch-run). It is completely unprivileged, with no setuid/setgid/setcap helpers. Many operations can use caching for speed. The action to take is specified by a sub-command.
Options that print brief information and then exit:
Common options placed before or after the sub-command:
This is accomplished by re-parsing the module, injecting import pdb; pdb.set_trace() into the parse tree, re-compiling the tree, and replacing the module’s code with the result. This has various gotchas, including (1) module-level code in the target module is executed twice, (2) the option is parsed with bespoke early code so command line argument parsing itself can be debugged, (3) breakpoints on function definition will trigger while the module is being re-executed, not when the function is called (break on the first line of the function body instead), and (4) other weirdness we haven’t yet characterized.
Charliecloud provides the option --arch ARCH to specify the architecture for architecture-aware registry operations. The argument ARCH can be: (1) yolo, to bypass architecture-aware code and use the registry’s default architecture; (2) host, to use the host’s architecture, obtained with the equivalent of uname -m (default if --arch not specified); or (3) an architecture name. If the specified architecture is not available, the error message will list which ones are.
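The three kinds of ARCH argument can be sketched as follows. This is only an illustration of the selection logic described above; resolve_arch is a hypothetical helper, not a ch-image function, and it returns None to stand for "use the registry's default".

```python
import platform


def resolve_arch(arch="host"):
    """Map an --arch argument to an architecture name, or None for the registry default."""
    if arch == "yolo":
        return None                  # bypass architecture-aware code
    if arch == "host":
        return platform.machine()    # equivalent of uname -m
    return arch                      # an explicit architecture name


print(resolve_arch("yolo"), resolve_arch("host"), resolve_arch("arm64"))
```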
Notes:
Charliecloud does not have configuration files; thus, it has no separate login subcommand to store secrets. Instead, Charliecloud will prompt for a username and password when authentication is needed. Note that some repositories refer to the secret as something other than a “password”; e.g., GitLab calls it a “personal access token (PAT)”, Quay calls it an “application token”, and nVidia NGC calls it an “API token”.
For non-interactive authentication, you can use environment variables CH_IMAGE_USERNAME and CH_IMAGE_PASSWORD. Only do this if you fully understand the implications for your specific use case, because it is difficult to securely store secrets in environment variables.
By default for most subcommands, all registry access is anonymous. To instead use authenticated access for everything, specify --auth or set the environment variable $CH_IMAGE_AUTH=yes. The exception is push, which always runs in authenticated mode. Even for pulling public images, it can be useful to authenticate for registries that have per-user rate limits, such as Docker Hub. (Older versions of Charliecloud started with anonymous access, then tried to upgrade to authenticated if it seemed necessary. However, this turned out to be brittle; see issue #1318.)
The username and password are remembered for the life of the process and silently re-offered to the registry if needed. One case when this happens is on push to a private registry: many registries will first offer a read-only token when ch-image checks if something exists, then re-authenticate when upgrading the token to read-write for upload. If your site uses one-time passwords such as provided by a security device, you can specify --password-many to provide a new secret each time.
These values are not saved persistently, e.g. in a file. Note that we do use normal Python variables for this information, without pinning them into physical RAM with mlock(2) or any other special treatment, so we cannot guarantee they will never reach non-volatile storage.
Most registries use something called Bearer authentication, where the client (e.g., Charliecloud) includes a token in the headers of every HTTP request.
The authorization dance is different from the typical UNIX approach, where there is a separate login sequence before any content requests are made. The client starts by simply making the HTTP request it wants (e.g., to GET an image manifest), and if the registry doesn’t like the client’s token (or if there is no token because the client doesn’t have one yet), it replies with HTTP 401 Unauthorized, but crucially it also provides instructions in the response header on how to get a token. The client then follows those instructions, obtains a token, re-tries the request, and (hopefully) all is well. This approach also allows a client to upgrade a token if needed, e.g. when transitioning from asking if a layer exists to uploading its content.
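The request, 401, token, retry sequence can be sketched with an in-memory stand-in for a registry. Everything here (FakeRegistry, the token strings, the header value) is hypothetical and only illustrates the control flow; it is not Charliecloud's code and makes no network requests.

```python
class FakeRegistry:
    """Hypothetical in-memory registry; not a real registry API."""

    def __init__(self):
        self.valid_tokens = set()

    def get(self, path, token=None):
        # A request without an acceptable token is refused, but the reply
        # says where a token can be obtained.
        if token not in self.valid_tokens:
            return 401, {"WWW-Authenticate": 'Bearer realm="token-service"'}
        return 200, {"body": f"content of {path}"}

    def issue_token(self):
        # Stand-in for the token service named in the 401 reply.
        token = f"token-{len(self.valid_tokens)}"
        self.valid_tokens.add(token)
        return token


def fetch(registry, path, token=None):
    """Make the request first; obtain a token only if told to, then retry once."""
    status, reply = registry.get(path, token)
    if status == 401 and "WWW-Authenticate" in reply:
        token = registry.issue_token()
        status, reply = registry.get(path, token)
    return status, reply, token


status, reply, token = fetch(FakeRegistry(), "/v2/alpine/manifests/3.17")
print(status, token)
```

Passing the returned token to a later fetch call skips the 401 round trip, which mirrors how a client keeps re-using a token until the registry rejects it.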
The distinction between Charliecloud’s anonymous and authenticated modes is that it will only ask for anonymous tokens in anonymous mode and authenticated tokens in authenticated mode. That is, anonymous mode does involve an authentication procedure to obtain a token, but this “authentication” is done anonymously. (Yes, it’s confusing.)
Registries also often reply HTTP 401 when an image does not exist, rather than the seemingly more correct HTTP 404 Not Found. This is to avoid information leakage about the existence of images the client is not allowed to pull, and it’s why Charliecloud never says an image simply does not exist.
ch-image maintains state using normal files and directories located in its storage directory; contents include various caches and temporary images used for building.
In descending order of priority, this directory is located at:
Unlike many container implementations, there is no notion of storage drivers, graph drivers, etc., to select and/or configure.
The storage directory can reside on any single filesystem (i.e., it cannot be split across multiple filesystems). However, it contains lots of small files and metadata traffic can be intense. For example, the Charliecloud test suite uses approximately 400,000 files and directories in the storage directory as of this writing. Place it on a filesystem appropriate for this; tmpfs’es such as /var/tmp are a good choice if you have enough RAM (/tmp is not recommended because ch-run bind-mounts it into containers by default).
While you can currently poke around in the storage directory and find unpacked images runnable with ch-run, this is not a supported use case. The supported workflow uses ch-convert to obtain a packed image; see the tutorial for details.
The storage directory format changes on no particular schedule. ch-image is normally able to upgrade directories produced by a given Charliecloud version up to one year after that version’s release. Upgrades outside this window and downgrades are not supported. In these cases, ch-image will refuse to run until you delete and re-initialize the storage directory with ch-image reset.
WARNING:
Subcommands that create images, such as build and pull, can use a build cache to speed repeated operations. That is, an image is created by starting from the empty image and executing a sequence of instructions, largely Dockerfile instructions but also some others like “pull” and “import”. Some instructions are expensive to execute (e.g., RUN wget http://slow.example.com/bigfile or transferring data billed by the byte), so it’s often cheaper to retrieve their results from cache instead.
The build cache uses a relatively new Git under the hood; see the installation instructions for version requirements. Charliecloud implements workarounds for Git’s various storage limitations, so things like file metadata and Git repositories within the image should work. Important exception: No files named .git* or other Git metadata are permitted in the image’s root directory.
Extended attributes (xattrs) are ignored by the build cache by default. Cache support for xattrs belonging to unprivileged xattr namespaces (e.g. user) can be enabled by specifying the --xattrs option or by setting the CH_XATTRS environment variable. If CH_XATTRS is set, you can override it with --no-xattrs. Note that extended attributes in privileged xattr namespaces (e.g. trusted) cannot be read by ch-image and will always be lost without warning.
The cache has three modes: enabled, disabled, and a hybrid mode called rebuild where the cache is fully enabled for FROM instructions, but all other operations re-execute and re-cache their results. The purpose of rebuild is to do a clean rebuild of a Dockerfile atop a known-good base image.
Enabled mode is selected with --cache or setting $CH_IMAGE_CACHE to enabled, disabled mode with --no-cache or disabled, and rebuild mode with --rebuild or rebuild. The default mode is enabled if an appropriate Git is installed, otherwise disabled.
NOTE:
Existing tools such as Docker and Podman implement their build cache with a layered (union) filesystem such as OverlayFS or FUSE-OverlayFS and tar archives to represent the content of each layer; this approach is standardized by OCI. The layered cache works, but it has drawbacks in three critical areas:
Also, similar files are never de-duplicated, regardless of ancestry. For example, if instruction A creates a file and subsequently instruction B modifies a single bit in that file, both versions are stored in their entirety.
Our Git-based cache addresses the three drawbacks: (1) Git is purpose-built to store changing directory trees, (2) cache overhead is imposed only at instruction commit time, and (3) Git de-duplicates both identical and similar files. Also, it is based on an extremely widely used tool that enjoys development support from well-resourced actors, in particular on scaling (e.g., Microsoft’s large-repository accelerator Scalar was recently merged into Git).
In addition to these structural advantages, performance experiments reported in our paper above show that the Git-based approach is as good as (and sometimes better than) overlay-based caches. On build time, the two approaches are broadly similar, with one or the other being faster depending on context. Both had performance problems on NFS. Notably, however, the Git-based cache was much faster for a 129-instruction Dockerfile. On disk usage, the winner depended on the condition. For example, we saw the layered cache storing large sibling layers redundantly; on the other hand, the Git-based cache has some obvious redundancies as well, and one must compact it for full de-duplication benefit. However, Git’s de-duplication was quite effective in some conditions and we suspect will prove even better in more realistic scenarios.
That is, we believe our results show that the Git-based build cache is highly competitive with the layered approach, with no obvious inferiority so far and hints that it may be superior on important dimensions. We have ongoing work to explore these questions in more detail.
Charliecloud’s build cache takes advantage of Git’s file de-duplication features. This operates across the entire build cache, i.e., files are de-duplicated no matter where in the cache they are found or the relationship between their container images. Files are de-duplicated at different times depending on whether they are identical or merely similar.
Identical files are de-duplicated at git add time; in ch-image build terms, that’s upon committing a successful instruction. That is, it’s impossible to store two files with the same content in the build cache. If you try — say with RUN yum install -y foo in one Dockerfile and RUN yum install -y foo bar in another, which are different instructions but both install RPM foo’s files — the content is stored once and each copy gets its own metadata and a pointer to the content, much like filesystem hard links.
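The "content stored once, metadata stored per copy" idea can be sketched with a toy content-addressed store. This is a minimal illustration, not Git's object format; ContentStore is a hypothetical class and SHA-256 is chosen here only for convenience.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: one blob per distinct content."""

    def __init__(self):
        self.blobs = {}    # content hash -> bytes
        self.entries = {}  # (image, path) -> (mode, content hash)

    def add(self, image, path, mode, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)           # content is stored once
        self.entries[(image, path)] = (mode, digest)  # own metadata plus a pointer
        return digest


store = ContentStore()
store.add("a", "/usr/bin/foo", 0o755, b"foo binary")
store.add("b", "/usr/bin/foo", 0o700, b"foo binary")
print(len(store.entries), "entries,", len(store.blobs), "blob")
```

Both images see their own entry with its own mode, but the bytes exist only once, much like the hard-link analogy above.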
Similar files, however, are only de-duplicated during Git’s garbage collection process. When files are initially added to a Git repository (with git add), they are stored inside the repository as (possibly compressed) individual files, called objects in Git jargon. Upon garbage collection, which happens both automatically when certain parameters are met and explicitly with git gc, these files are archived and (re-)compressed together into a single file called a packfile. Also, existing packfiles may be re-written into the new one.
During this process, similar files are identified, and each set of similar files is stored as one base file plus diffs to recover the others. (Similarity detection seems to be based primarily on file size.) This delta process is agnostic to alignment, which is an advantage over alignment-sensitive block-level de-duplicating filesystems. Exception: “Large” files are not compressed or de-duplicated. We use the Git default threshold of 512 MiB (as of this writing).
Charliecloud runs Git garbage collection at two different times. First, a lighter-weight garbage collection pass runs automatically when the number of loose files (objects) grows beyond a limit. This limit is in flux as we learn more about build cache performance, but it’s quite a bit higher than the Git default. This pass runs in the background and can continue after the build completes; you may see Git processes using a lot of CPU.
An important limitation of the automatic garbage collection is that large packfiles (again, this is in flux, but it’s several GiB) will not be re-packed, limiting the scope of similar file detection. To address this, a heavier garbage collection can be run manually with ch-image build-cache --gc. This will re-pack (and re-write) the entire build cache, de-duplicating all similar files. In both cases, garbage collection uses all available cores.
ch-image build-cache prints the specific garbage collection parameters in use, and -v can be added for more detail.
Because Git uses content-addressed storage, upon commit, it must read in full all files modified by an instruction. This I/O cost can be a significant fraction of build time for some images. To mitigate this, regular files larger than the experimental large file threshold are stored outside the Git repository, somewhat like Git Large File Storage.
ch-image copies large files in and out of images at each instruction commit. It tries to do this with a fast metadata-only copy-on-write operation called “reflink”, but that is only supported with the right Python version, Linux kernel version, and filesystem. If unsupported, Charliecloud falls back to an expensive standard copy, which is likely slower than letting Git deal with the files. See File copy performance for details.
Every version of a large file is stored verbatim and uncompressed (e.g., a large file with a one-byte change will be stored in full twice), so Git’s de-duplication does not apply. However, on filesystems with reflink support, files can share extents (e.g., each of the two files will have its own extent containing the changed byte, but the rest of the extents will remain shared). This provides de-duplication between large files in images that share ancestry. Also, unused large files are deleted by ch-image build-cache --gc.
A final caveat: Large files in any image with the same path, mode, size, and mtime (to nanosecond precision if possible) are considered identical, even if their content is not actually identical (e.g., touch(1) shenanigans can corrupt an image).
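This caveat can be demonstrated with a small sketch that builds an identity key from metadata only. The function large_file_key and the file name big.bin are hypothetical; the point is simply that two files with different content compare equal once path, mode, size, and mtime agree.

```python
import os
import tempfile


def large_file_key(root, relpath):
    """Identity key from path, mode, size, and mtime; content is never read."""
    st = os.stat(os.path.join(root, relpath))
    return (relpath, st.st_mode, st.st_size, st.st_mtime_ns)


def demo():
    with tempfile.TemporaryDirectory() as img1, tempfile.TemporaryDirectory() as img2:
        for root, data in ((img1, b"AAAA"), (img2, b"BBBB")):
            path = os.path.join(root, "big.bin")
            with open(path, "wb") as f:
                f.write(data)
            os.chmod(path, 0o644)
            # Force the same timestamps on both files, as touch(1) could.
            os.utime(path, ns=(1_000_000_000, 1_000_000_000))
        return large_file_key(img1, "big.bin") == large_file_key(img2, "big.bin")


print(demo())
```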
Option --cache-large sets the threshold in MiB; if not set, environment variable CH_IMAGE_CACHE_LARGE is used; if that is not set either, the default value 0 indicates that no files are considered large.
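The precedence just described (option, then environment variable, then the default of 0) can be sketched as below. Both function names are hypothetical, and the strict "larger than" comparison is an assumption based on the wording above.

```python
import os


def large_threshold_mib(cli_value=None, environ=None):
    """Resolve the large file threshold: option first, then environment, then 0."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return int(cli_value)
    if "CH_IMAGE_CACHE_LARGE" in environ:
        return int(environ["CH_IMAGE_CACHE_LARGE"])
    return 0


def is_large(size_bytes, threshold_mib):
    """A threshold of 0 means no file is ever considered large."""
    return threshold_mib > 0 and size_bytes > threshold_mib * 2**20


print(large_threshold_mib(environ={"CH_IMAGE_CACHE_LARGE": "64"}))
print(is_large(100 * 2**20, 64), is_large(100 * 2**20, 0))
```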
(Note that Git has an unrelated setting called core.bigFileThreshold.)
Suppose we have this Dockerfile:
$ cat a.df
FROM alpine:3.17
RUN echo foo
RUN echo bar
On our first build, we get:
$ ch-image build -t foo -f a.df .
  1. FROM alpine:3.17
[ ... pull chatter omitted ... ]
  2. RUN echo foo
copying image ...
foo
  3. RUN echo bar
bar
grown in 3 instructions: foo
Note the dot after each instruction’s line number. This means that the instruction was executed. You can also see this by the output of the two echo commands.
But on our second build, we get:
$ ch-image build -t foo -f a.df .
  1* FROM alpine:3.17
  2* RUN echo foo
  3* RUN echo bar
copying image ...
grown in 3 instructions: foo
Here, instead of being executed, each instruction’s results were retrieved from cache. (Charliecloud uses lazy retrieval; nothing is actually retrieved until the end, as seen by the “copying image” message.) Cache hit for each instruction is indicated by an asterisk (*) after the line number. Even for such a small and short Dockerfile, this build is noticeably faster than the first.
We can also try a second, slightly different Dockerfile. Note that the first two instructions are the same, but the third is different:
$ cat c.df
FROM alpine:3.17
RUN echo foo
RUN echo qux
$ ch-image build -t c -f c.df .
  1* FROM alpine:3.17
  2* RUN echo foo
  3. RUN echo qux
copying image ...
qux
grown in 3 instructions: c
Here, the first two instructions are hits from the first Dockerfile, but the third is a miss, so Charliecloud retrieves that state and continues building.
We can also inspect the cache:
$ ch-image build-cache --tree
*  (c) RUN echo qux
| *  (a) RUN echo bar
|/
*  RUN echo foo
*  (alpine+3.17) PULL alpine:3.17
*  (root) ROOT

named images:     4
state IDs:        5
commits:          5
files:          317
disk used:        3 MiB
Here there are four named images: a and c that we built, the base image alpine:3.17 (written as alpine+3.17 because colon is not allowed in Git branch names), and the empty base of everything root. Also note how a and c diverge after the last common instruction RUN echo foo.
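The hit and miss pattern in this example can be reproduced with a toy cache keyed by the parent state and the instruction text. This is a simplified sketch; BuildCache is hypothetical, and the real state identifiers may depend on more than the instruction text.

```python
class BuildCache:
    """Toy cache keyed by (parent state, instruction text)."""

    def __init__(self):
        self.children = {}  # (parent state, instruction) -> state ID
        self.next_id = 1    # state 0 is the empty root image

    def build(self, instructions):
        state, log = 0, []
        for n, inst in enumerate(instructions, start=1):
            key = (state, inst)
            if key in self.children:
                mark = "*"                       # hit: reuse the cached state
            else:
                mark = "."                       # miss: execute and record
                self.children[key] = self.next_id
                self.next_id += 1
            state = self.children[key]
            log.append(f"{n}{mark} {inst}")
        return log


cache = BuildCache()
a = ["FROM alpine:3.17", "RUN echo foo", "RUN echo bar"]
c = ["FROM alpine:3.17", "RUN echo foo", "RUN echo qux"]
for dockerfile in (a, a, c):
    print("\n".join(cache.build(dockerfile)), end="\n\n")
```

Because each key includes the parent state, two Dockerfiles share cached results only up to their last common instruction, which is where the tree branches.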
Build an image from a Dockerfile and put it in the storage directory.
$ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT
See below for differences with other Dockerfile interpreters. Charliecloud supports an extended instruction (RSYNC), a few other instructions behave slightly differently, and a few are ignored.
Note that FROM implicitly pulls the base image if needed, so you may want to read about the pull subcommand below as well.
Required argument:
Options:
Note: See documentation for ch-run --bind for important caveats and gotchas.
Note: Other instructions that modify the image filesystem, e.g. COPY, can only access host files from the context directory, regardless of this option.
If no colon present in the name, append :latest.
Uses ch-run -w -u0 -g0 --no-passwd --unsafe to execute RUN instructions.
ch-image is a fully unprivileged image builder. It does not use any setuid or setcap helper programs, and it does not use configuration files /etc/subuid or /etc/subgid. This contrasts with the “rootless” or “fakeroot” modes of some competing builders, which do require privileged supporting code or utilities.
Without root emulation, this approach does confuse programs that expect to have real root privileges, most notably distribution package installers. This subsection describes why that happens and what you can do about it.
ch-image executes all instructions as the normal user who invokes it. For RUN, this is accomplished with ch-run arguments including -w --uid=0 --gid=0. That is, your host EUID and EGID are both mapped to zero inside the container, and only one UID (zero) and GID (zero) are available inside the container. Under this arrangement, processes running in the container for each RUN appear to be running as root, but many privileged system calls will fail without the root emulation methods described below. This affects any fully unprivileged container build, not just Charliecloud.
The most common time to see this is installing packages. For example, here is RPM failing to chown(2) a file, which makes the package update fail:
Updating   : 1:dbus-1.10.24-13.el7_6.x86_64            2/4
Error unpacking rpm package 1:dbus-1.10.24-13.el7_6.x86_64
error: unpacking of archive failed on file /usr/libexec/dbus-1/dbus-daemon-launch-helper;5cffd726: cpio: chown
Cleanup    : 1:dbus-libs-1.10.24-12.el7.x86_64         3/4
error: dbus-1:1.10.24-13.el7_6.x86_64: install failed
This one is (ironically) apt-get failing to drop privileges:
E: setgroups 65534 failed - setgroups (1: Operation not permitted)
E: setegid 65534 failed - setegid (22: Invalid argument)
E: seteuid 100 failed - seteuid (22: Invalid argument)
E: setgroups 0 failed - setgroups (1: Operation not permitted)
Charliecloud provides two different mechanisms to avoid these problems. Both involve lying to the containerized process about privileged system calls, but at very different levels of complexity.
This mode uses fakeroot(1) to maintain an elaborate web of deceit that is internally consistent. This program intercepts privileged system calls (e.g., setuid(2)) as well as other system calls whose return values depend on those calls (e.g., getuid(2)), faking success for privileged system calls (perhaps making no system call at all) and altering return values to be consistent with earlier fake success. Charliecloud automatically installs the fakeroot(1) program inside the container and then wraps RUN instructions having known privilege needs with it. Thus, this mode is only available for certain distributions.
The advantage of this mode is its consistency; e.g., careful programs that check the new UID after attempting to change it will not notice anything amiss. Its disadvantage is complexity: detailed knowledge and procedures for multiple Linux distributions.
This mode has three basic steps:
RUN instructions that do not seem to need modification are unaffected by this mode.
The details are specific to each distribution. ch-image analyzes image content (e.g., grepping /etc/debian_version) to select a configuration; see lib/force.py for details. ch-image prints exactly what it is doing.
WARNING:
This mode uses the kernel’s seccomp(2) system call filtering to intercept certain privileged system calls, do absolutely nothing, and return success to the program.
Some system calls are quashed regardless of their arguments: capset(2); chown(2) and friends; kexec_load(2) (used to validate the filter itself); and setuid(2), setgid(2), and setgroups(2) along with the other system calls that change user or group. mknod(2) and mknodat(2) are quashed only if they try to create a device file (e.g., creating FIFOs works normally).
The advantages of this approach are that it’s much simpler, it’s faster, it’s completely agnostic to libc, and it’s mostly agnostic to distribution. The disadvantage is that it’s a very lazy liar; even the most cursory consistency checks will fail, e.g., getuid(2) after setuid(2).
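The filter's decisions and the "lazy liar" behavior can be modeled in a few lines. This is a toy model, not the actual seccomp(2) filter: the set of names is a partial, illustrative list, and is_quashed and QuashedProcess are hypothetical.

```python
import stat

# Partial, illustrative list; the text also covers the "friends" of chown(2)
# and the other calls that change user or group.
ALWAYS_QUASHED = {"capset", "chown", "kexec_load", "setuid", "setgid", "setgroups"}


def is_quashed(syscall, mode=0):
    """Return True if this toy filter would fake success for the call."""
    if syscall in ALWAYS_QUASHED:
        return True
    if syscall in ("mknod", "mknodat"):
        # Only device files are quashed; FIFOs and the like pass through.
        return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)
    return False


class QuashedProcess:
    """Model of a process whose setuid(2) is quashed."""

    def __init__(self):
        self.uid = 0

    def setuid(self, uid):
        return 0          # reports success, changes nothing

    def getuid(self):
        return self.uid   # still the old value


proc = QuashedProcess()
print(proc.setuid(100), proc.getuid())
print(is_quashed("mknod", stat.S_IFIFO | 0o644), is_quashed("mknod", stat.S_IFCHR | 0o600))
```

A program that calls getuid after the "successful" setuid sees the unchanged UID, which is exactly the consistency check this mode cannot pass.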
While this mode does not provide consistency, it does offer a hook to help prevent programs asking for consistency. For example, apt-get -o APT::Sandbox::User=root will prevent apt-get from attempting to drop privileges, which it verifies, exiting with failure if the correct IDs are not found (which they won’t be under this approach). This can be expressed with --force-cmd=apt-get,-o,APT::Sandbox::User=root, though this particular case is built-in and does not need to be specified. The full default configuration, which is applied regardless of the image distribution, can be examined in the source file force.py. If any --force-cmd are specified, this replaces (rather than extends) the default configuration.
Note that because the substitutions are a simple regex with no knowledge of shell syntax, they can cause unwanted modifications. For example, RUN apt-get install -y apt-get will be run as /bin/sh -c "apt-get -o APT::Sandbox::User=root install -y apt-get -o APT::Sandbox::User=root". One workaround is to add escape syntax that is transparent to the shell but hides the text from the regex, such as a backslash inside the package name; e.g., RUN apt-get install -y apt-g\et.
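The over-eager substitution is easy to reproduce with Python's re module. The pattern below is a hypothetical stand-in, not the expression ch-image actually uses, and the backslash form is just one example of escape syntax that the shell removes.

```python
import re

# Hypothetical hook: a plain regex with no knowledge of shell syntax.
PATTERN = re.compile(r"apt-get")
REPLACEMENT = "apt-get -o APT::Sandbox::User=root"


def apply_hook(command):
    """Rewrite every textual match, whether it is a command or an argument."""
    return PATTERN.sub(REPLACEMENT, command)


# Both occurrences are rewritten, including the package name.
print(apply_hook("apt-get install -y apt-get"))

# A backslash inside the word hides it from the regex; the shell drops it later.
print(apply_hook(r"apt-get install -y apt-g\et"))
```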
This mode executes all RUN instructions with the seccomp(2) filter and has no knowledge of which instructions actually used the intercepted system calls. Therefore, the printed “instructions modified” number is only a count of instructions with a hook applied as described above.
In terminal output, image metadata, and the build cache, the RUN instruction is always logged as RUN.S, RUN.F, or RUN.N. The letter appended to the instruction reflects the root emulation mode used during the build in which the instruction was executed. RUN.S indicates seccomp, RUN.F indicates fakeroot, and RUN.N indicates that neither form of root emulation was used (--force=none).
ch-image is an independent implementation and shares no code with other Dockerfile interpreters. It uses a formal Dockerfile parsing grammar developed from the Dockerfile reference documentation and miscellaneous other sources, which you can examine in the source code.
We believe this independence is valuable for several reasons. First, it helps the community examine Dockerfile syntax and semantics critically, think rigorously about what is really needed, and build a more robust standard. Second, it yields disjoint sets of bugs (note that Podman, Buildah, and Docker all share the same Dockerfile parser). Third, because it is a much smaller code base, it illustrates how Dockerfiles work more clearly. Finally, it allows straightforward extensions if needed to support scientific computing.
ch-image tries hard to be compatible with Docker and other interpreters, though as an independent implementation, it is not bug-compatible.
The following subsections describe differences from the Dockerfile reference that we expect to be approximately permanent. For not-yet-implemented features and bugs in this area, see related issues on GitHub.
None of these are set in stone. We are very interested in feedback on our assessments and open questions. This helps us prioritize new features and revise our thinking about what is needed for HPC containers.
The context directory is bind-mounted into the build, rather than copied like Docker. Thus, the size of the context is immaterial, and the build reads directly from storage like any other local process would (i.e., it is reasonable to use / for the context). However, you still can’t access anything outside the context directory.
Variable substitution happens for all instructions, not just the ones listed in the Dockerfile reference.
ARG and ENV cause cache misses upon definition, in contrast with Docker where these variables miss upon use, except for certain cache-excluded variables that never cause misses, listed below.
Note that ARG and ENV have different syntax despite very similar semantics.
ch-image passes the following proxy environment variables in to the build. Changes to these variables do not cause a cache miss. They do not require an ARG instruction, as documented in the Dockerfile reference. Unlike Docker, they are available if the same-named environment variable is defined; --build-arg is not required.
HTTP_PROXY
http_proxy
HTTPS_PROXY
https_proxy
FTP_PROXY
ftp_proxy
NO_PROXY
no_proxy
In addition to those listed in the Dockerfile reference, these environment variables are passed through in the same way:
SSH_AUTH_SOCK
USER
Finally, these variables are also pre-defined but are unrelated to the host environment:
PATH=/ch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TAR_OPTIONS=--no-same-owner
Variables set with ARG are available anywhere in the Dockerfile, unlike Docker, where they only work in FROM instructions, and possibly in other ARG before the first FROM.
The FROM instruction accepts option --arg=NAME=VALUE, which serves the same purpose as the ARG instruction. It can be repeated.
The LABEL instruction accepts key=value pairs to add metadata for an image. Unlike Docker, multiline values are not supported; see issue #1512. The instruction can be repeated.
NOTE:
Especially for people used to UNIX cp(1), the semantics of the Dockerfile COPY instruction can be confusing.
Most notably, when a source of the copy is a directory, the contents of that directory, not the directory itself, are copied. This is documented, but it’s a real gotcha because that’s not what cp(1) does, and it means that many things you can do in one cp(1) command require multiple COPY instructions.
Also, the reference documentation is incomplete. In our experience, Docker also behaves as follows; ch-image does the same in an attempt to be bug-compatible.
We expect the following differences to be permanent:
WARNING:
Copying files is often simple but has numerous difficult corner cases, e.g. when dealing with symbolic or hard links. The standard instruction COPY deals with many of these corner cases differently from other UNIX utilities, lacks complete documentation, and behaves inconsistently between different Dockerfile interpreters (e.g., Docker’s legacy builder vs. BuildKit), as detailed above. On the other hand, rsync(1) is an extremely capable, widely used file copy tool, with detailed options to specify behavior and 25 years of history dealing with weirdness.
RSYNC (also spelled NSYNC) is a Charliecloud extension that gives copying behavior identical to rsync(1). In fact, Charliecloud’s current implementation literally calls the host’s rsync(1) to do the copy, though this may change in the future. There is no list form of RSYNC.
The two key usage challenges are trailing slashes on paths and symlink handling. In particular, the default symlink handling seemed reasonable to us, but you may want something different. See the arguments and examples below. Importantly, COPY is not any less fraught, and you have no choice about what to do with symlinks.
RSYNC takes the same arguments as rsync(1), so refer to its man page for a detailed explanation of all the options (with possible emphasis on its symlink options). Sources are relative to the context directory even if they look absolute with a leading slash. Any globbed sources are processed by ch-image(1) using Python rules, i.e., rsync(1) sees the expanded sources with no wildcards. Relative destinations are relative to the image’s current working directory, while absolute destinations refer to the image’s root.
For arguments that read input from a file (e.g. --exclude-from or --files-from), relative paths are relative to the context directory, absolute paths refer to the image root, and - (standard input) is an error.
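The path rules for sources and destinations can be sketched as two small helpers. These are hypothetical functions for illustration only; they use Python's glob module for wildcard expansion, in line with the "Python rules" mentioned above, and the directory names in the example call are made up.

```python
import glob
import os


def expand_sources(context, sources):
    """Resolve sources against the context, then expand wildcards with Python's glob."""
    expanded = []
    for src in sources:
        # A leading slash still means "relative to the context directory".
        pattern = os.path.join(context, src.lstrip("/"))
        expanded.extend(sorted(glob.glob(pattern)))
    return expanded


def resolve_dest(image_root, workdir, dest):
    """Absolute destinations are under the image root; relative ones under WORKDIR."""
    if not dest.startswith("/"):
        dest = os.path.join(workdir, dest)
    return os.path.join(image_root, dest.lstrip("/"))


print(resolve_dest("/storage/imgroot", "/foo", "dst"))
print(resolve_dest("/storage/imgroot", "/foo", "/dst"))
```

Only the leading slash is stripped from a source, so a trailing slash survives expansion and keeps its meaning for rsync(1).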
For example,
WORKDIR /foo
RSYNC --foo src1 src2 dst
is translated to (the equivalent of):
$ mkdir -p /foo
$ rsync -@=-1 -AHSXpr --info=progress2 -l --safe-links \
      --foo /context/src1 /context/src2 /storage/imgroot/foo/dst
Note the extensive default arguments to rsync(1). RSYNC takes a single instruction option beginning with + (plus) that is shorthand for a group of rsync(1) options. This single option is one of:
Equivalent to the rsync(1) options listed for +m plus --links (copy symlinks as symlinks unless otherwise specified) and --safe-links (silently skip unsafe symlinks).
Equivalent to the rsync(1) options listed for +m plus --links (copy symlinks as symlinks unless otherwise specified) and --copy-unsafe-links (copy the target of unsafe symlinks).
NOTE:
A small number of rsync(1) features are actively disallowed:
Note that there are likely other flags that don’t make sense and/or cause undesirable behavior. We have not characterized this problem.
The instruction is a cache hit if the metadata of all source files is unchanged (specifically: filename, file type and permissions, xattrs, size, and last modified time). Unlike Docker, Charliecloud does not use file contents. This has two implications. First, it is possible to fool the cache by manually restoring the last-modified time. Second, RSYNC is I/O-intensive even when it hits, because it must stat(2) every source file before checking the cache. However, this is still less I/O than reading the file content too.
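The first implication (fooling the cache by restoring the last-modified time) can be shown with a metadata-only fingerprint. This is a sketch under stated assumptions: metadata_fingerprint is a hypothetical function, SHA-256 is an arbitrary choice, and xattrs are left out to keep the example portable.

```python
import hashlib
import os
import tempfile


def metadata_fingerprint(paths):
    """Hash name, type and permissions, size, and mtime of each source; never content."""
    h = hashlib.sha256()
    for p in sorted(paths):
        st = os.lstat(p)  # one stat per source file, even on a cache hit
        h.update(repr((p, st.st_mode, st.st_size, st.st_mtime_ns)).encode())
    return h.hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as ctx:
        path = os.path.join(ctx, "file-basic1")
        with open(path, "w") as f:
            f.write("hello world\n")
        before = metadata_fingerprint([path])
        mtime = os.lstat(path).st_mtime_ns
        with open(path, "w") as f:
            f.write("HELLO WORLD\n")       # same size, different content
        os.utime(path, ns=(mtime, mtime))  # restore the last-modified time
        return before == metadata_fingerprint([path])


print(demo())
```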
Notably, Charliecloud’s cache ignores rsync(1)’s own internal notion of whether anything would be transferred (e.g., rsync -ni). This may change in the future.
All of these examples use the same input, whose content will be introduced gradually, using edited output of ls -oghR (which is like ls -lhR but omits user and group). Examples assume a umask of 0007. The Dockerfile instructions listed also assume a preceding:
FROM alpine:3.17
RUN mkdir /dst
i.e., a simple base image containing a top-level directory dst.
Many additional examples are available in the source code in the file test/build/50_rsync.bats.
We begin by copying regular files. The context directory ctx contains, in part, two directories containing one regular file each. Note that one of these files (file-basic1) and one of the directories (basic1) have strange permissions.
./ctx:
drwx---r-x 2 60 Oct 11 13:20 basic1
drwxrwx--- 2 60 Oct 11 13:20 basic2
./ctx/basic1:
-rw----r-- 1 12 Oct 11 13:20 file-basic1
./ctx/basic2:
-rw-rw---- 1 12 Oct 11 13:20 file-basic2
The simplest form of RSYNC is to copy a single file into a specified directory:
RSYNC /basic1/file-basic1 /dst
resulting in:
$ ls -oghR dst
dst:
-rw----r-- 1 12 Oct 11 13:26 file-basic1
Note that file-basic1’s metadata — here its odd permissions — are preserved. 1 is the number of hard links to the file, and 12 is the file size.
One can also rename the destination by specifying a new file name, and with +z, not copy metadata (from here on the ls command is omitted for brevity):
RSYNC +z /basic1/file-basic1 /dst/file-basic1_nom
dst:
-rw------- 1 12 Sep 21 15:51 file-basic1_nom
A trailing slash on the destination creates a new directory and places the source file within:
RSYNC /basic1/file-basic1 /dst/new/
dst:
drwxrwx--- 1 22 Oct 11 13:26 new
dst/new:
-rw----r-- 1 12 Oct 11 13:26 file-basic1
With multiple source files, the destination trailing slash is optional:
RSYNC /basic1/file-basic1 /basic2/file-basic2 /dst/newB
dst:
drwxrwx--- 1 44 Oct 11 13:26 newB
dst/newB:
-rw----r-- 1 12 Oct 11 13:26 file-basic1
-rw-rw---- 1 12 Oct 11 13:26 file-basic2
For directory sources, the presence or absence of a trailing slash is highly significant. Without one, the directory itself is placed in the destination (recall that this would rename a source file):
RSYNC /basic1 /dst/basic1_new
dst:
drwxrwx--- 1 12 Oct 11 13:28 basic1_new
dst/basic1_new:
drwx---r-x 1 22 Oct 11 13:28 basic1
dst/basic1_new/basic1:
-rw----r-- 1 12 Oct 11 13:28 file-basic1
A source trailing slash means copy the contents of a directory rather than the directory itself. Importantly, however, the directory’s metadata is copied to the destination directory.
RSYNC /basic1/ /dst/basic1_renamed
dst:
drwx---r-x 1 22 Oct 11 13:28 basic1_renamed
dst/basic1_renamed:
-rw----r-- 1 12 Oct 11 13:28 file-basic1
One gotcha is that RSYNC +z is a no-op if the source is a directory:
RSYNC +z /basic1 /dst/basic1_newC
dst:
At least -r is needed with +z in this case:
RSYNC +z -r /basic1/ /dst/basic1_newD
dst:
drwx------ 1 22 Oct 11 13:28 basic1_newD
dst/basic1_newD:
-rw------- 1 12 Oct 11 13:28 file-basic1
Multiple source directories can be specified, including with wildcards. This example also illustrates that copied files are by default merged with content already existing in the image.
RUN mkdir /dst/dstC && echo file-dstC > /dst/dstC/file-dstC
RSYNC /basic* /dst/dstC
dst:
drwxrwx--- 1 42 Oct 11 13:33 dstC
dst/dstC:
drwx---r-x 1 22 Oct 11 13:33 basic1
drwxrwx--- 1 22 Oct 11 13:33 basic2
-rw-rw---- 1 10 Oct 11 13:33 file-dstC
dst/dstC/basic1:
-rw----r-- 1 12 Oct 11 13:33 file-basic1
dst/dstC/basic2:
-rw-rw---- 1 12 Oct 11 13:33 file-basic2
Trailing slashes can be specified independently for each source:
RUN mkdir /dst/dstF && echo file-dstF > /dst/dstF/file-dstF
RSYNC /basic1 /basic2/ /dst/dstF
dst:
drwxrwx--- 1 52 Oct 11 13:33 dstF
dst/dstF:
drwx---r-x 1 22 Oct 11 13:33 basic1
-rw-rw---- 1 12 Oct 11 13:33 file-basic2
-rw-rw---- 1 10 Oct 11 13:33 file-dstF
dst/dstF/basic1:
-rw----r-- 1 12 Oct 11 13:33 file-basic1
Bare / (i.e., the entire context directory) is considered to have a trailing slash:
RSYNC / /dst
dst:
drwx---r-x 1 22 Oct 11 13:33 basic1
drwxrwx--- 1 22 Oct 11 13:33 basic2
dst/basic1:
-rw----r-- 1 12 Oct 11 13:33 file-basic1
dst/basic2:
-rw-rw---- 1 12 Oct 11 13:33 file-basic2
To replace (rather than merge with) existing content, use --delete. Note also that wildcards can be combined with trailing slashes and that the directory gets the metadata of the first slashed directory.
RUN mkdir /dst/dstG && echo file-dstG > /dst/dstG/file-dstG
RSYNC --delete /basic*/ /dst/dstG
dst:
drwx---r-x 1 44 Oct 11 14:00 dstG
dst/dstG:
-rw----r-- 1 12 Oct 11 14:00 file-basic1
-rw-rw---- 1 12 Oct 11 14:00 file-basic2
Symbolic links in the source(s) add significant complexity. Like rsync(1), RSYNC can do one of three things with a given symlink:
These actions are selected independently for safe symlinks and unsafe symlinks. Safe symlinks are those which point to a target within the top of transfer, which is the deepest directory in the source path with a trailing slash. For example, /foo/bar’s top-of-transfer is /foo (regardless of whether bar is a directory or file), while /foo/bar/’s top-of-transfer is /foo/bar.
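The top-of-transfer rule can be written down as a short shell function. This is a sketch of the rule as stated here, under the assumption that it depends only on the spelling of the source path; top_of_transfer is a hypothetical helper, not a ch-image or rsync(1) command.

```sh
#!/bin/sh
# Sketch only: derive the top-of-transfer from the spelling of a source path.
# top_of_transfer is a hypothetical helper.
top_of_transfer () {
    case $1 in
        /)   printf '/\n' ;;                  # bare context root
        */)  printf '%s\n' "${1%/}" ;;        # trailing slash: the path itself
        */*) parent=${1%/*}                   # otherwise: the parent directory
             printf '%s\n' "${parent:-/}" ;;
        *)   printf '.\n' ;;                  # no slash at all: current directory
    esac
}

top_of_transfer /foo/bar     # prints /foo
top_of_transfer /foo/bar/    # prints /foo/bar
```

A symlink is then safe only if its target resolves to a location under the printed directory, which is why adding a trailing slash to a source can turn previously safe links into unsafe ones.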
For the symlink examples, the context contains two sub-directories with a variety of symlinks, as well as a sibling file and directory outside the context. All of these links are valid on the host. In this listing, the absolute path to the parent of the context directory is replaced with /....
.:
drwxrwx--- 9 200 Oct 11 14:00 ctx
drwxrwx--- 2 60 Oct 11 14:00 dir-out
-rw-rw---- 1 9 Oct 11 14:00 file-out
./ctx:
drwxrwx--- 3 320 Oct 11 14:00 sym1
./ctx/sym1:
lrwxrwxrwx 1 13 Oct 11 14:00 dir-out_rel -> ../../dir-out
drwxrwx--- 2 60 Oct 11 14:00 dir-sym1
lrwxrwxrwx 1 8 Oct 11 14:00 dir-sym1_direct -> dir-sym1
lrwxrwxrwx 1 10 Oct 11 14:00 dir-top_rel -> ../dir-top
lrwxrwxrwx 1 47 Oct 11 14:00 file-out_abs -> /.../file-out
lrwxrwxrwx 1 14 Oct 11 14:00 file-out_rel -> ../../file-out
-rw-rw---- 1 10 Oct 11 14:00 file-sym1
lrwxrwxrwx 1 57 Oct 11 14:00 file-sym1_abs -> /.../ctx/sym1/file-sym1
lrwxrwxrwx 1 9 Oct 11 14:00 file-sym1_direct -> file-sym1
lrwxrwxrwx 1 17 Oct 11 14:00 file-sym1_upover -> ../sym1/file-sym1
lrwxrwxrwx 1 51 Oct 11 14:00 file-top_abs -> /.../ctx/file-top
lrwxrwxrwx 1 11 Oct 11 14:00 file-top_rel -> ../file-top
./ctx/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 14:00 dir-sym1.file
./dir-out:
-rw-rw---- 1 13 Oct 11 14:00 dir-out.file
By default, safe symlinks are preserved while unsafe symlinks are silently ignored:
RSYNC /sym1 /dst
dst:
drwxrwx--- 1 206 Oct 11 17:10 sym1
dst/sym1:
drwxrwx--- 1 26 Oct 11 17:10 dir-sym1
lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1
lrwxrwxrwx 1 10 Oct 11 17:10 dir-top_rel -> ../dir-top
-rw-rw---- 1 10 Oct 11 17:10 file-sym1
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1
lrwxrwxrwx 1 17 Oct 11 17:10 file-sym1_upover -> ../sym1/file-sym1
lrwxrwxrwx 1 17 Oct 11 17:10 file-sym2_upover -> ../sym2/file-sym2
lrwxrwxrwx 1 11 Oct 11 17:10 file-top_rel -> ../file-top
dst/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file
The source files have four rough fates:
The top-of-transfer can be changed to sym1 with a trailing slash. This also adds sym1 to the destination so the resulting directory structure is the same.
RSYNC /sym1/ /dst/sym1
dst:
drwxrwx--- 1 96 Oct 11 17:10 sym1
dst/sym1:
drwxrwx--- 1 26 Oct 11 17:10 dir-sym1
lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1
-rw-rw---- 1 10 Oct 11 17:10 file-sym1
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1
dst/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file
*_upover and *-top_rel now point outside the top-of-transfer, so they are unsafe and are omitted from the image, as the listing shows.
Another common use case is to follow unsafe symlinks and copy their targets in place of the links. This is accomplished with +u:
RSYNC +u /sym1/ /dst/sym1
dst:
drwxrwx--- 1 352 Oct 11 17:10 sym1
dst/sym1:
drwxrwx--- 1 24 Oct 11 17:10 dir-out_rel
drwxrwx--- 1 26 Oct 11 17:10 dir-sym1
lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1
drwxrwx--- 1 24 Oct 11 17:10 dir-top_rel
-rw-rw---- 1 9 Oct 11 17:10 file-out_abs
-rw-rw---- 1 9 Oct 11 17:10 file-out_rel
-rw-rw---- 1 10 Oct 11 17:10 file-sym1
-rw-rw---- 1 10 Oct 11 17:10 file-sym1_abs
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1
-rw-rw---- 1 10 Oct 11 17:10 file-sym1_upover
-rw-rw---- 1 10 Oct 11 17:10 file-sym2_abs
-rw-rw---- 1 10 Oct 11 17:10 file-sym2_upover
-rw-rw---- 1 9 Oct 11 17:10 file-top_abs
-rw-rw---- 1 9 Oct 11 17:10 file-top_rel
dst/sym1/dir-out_rel:
-rw-rw---- 1 13 Oct 11 17:10 dir-out.file
dst/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file
dst/sym1/dir-top_rel:
-rw-rw---- 1 13 Oct 11 17:10 dir-top.file
Now all the unsafe symlinks noted above are present in the image, but they have changed to the normal files and directories pointed to.
WARNING:
The sources themselves, if symlinks, do not get special treatment:
RSYNC /sym1/file-sym1_direct /sym1/file-sym1_upover /dst
dst:
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1
Note that file-sym1_upover does not appear in the image, despite being named explicitly in the instruction, because it is an unsafe symlink.
If the destination is a symlink to a file, and the source is a file, the link is replaced and the target is unchanged. (If the source is a directory, that is an error.)
RUN touch /dst/file-dst && ln -s file-dst /dst/file-dst_direct
RSYNC /file-top /dst/file-dst_direct
dst:
-rw-rw---- 1 0 Oct 11 17:42 file-dst
-rw-rw---- 1 9 Oct 11 17:42 file-dst_direct
If the destination is a symlink to a directory, the link is followed:
RUN mkdir /dst/dir-dst && ln -s dir-dst /dst/dir-dst_direct
RSYNC /file-top /dst/dir-dst_direct
dst:
drwxrwx--- 1 16 Oct 11 17:50 dir-dst
lrwxrwxrwx 1 7 Oct 11 17:50 dir-dst_direct -> dir-dst
dst/dir-dst:
-rw-rw---- 1 9 Oct 11 17:50 file-top
Build image bar using ./foo/bar/Dockerfile and context directory ./foo/bar:
$ ch-image build -t bar -f ./foo/bar/Dockerfile ./foo/bar
[...]
grown in 4 instructions: bar
Same, but infer the image name and Dockerfile from the context directory path:
$ ch-image build ./foo/bar
[...]
grown in 4 instructions: bar
Build using humongous vendor compilers you want to bind-mount instead of installing into the image:
$ ch-image build --bind /opt/bigvendor:/opt .
$ cat Dockerfile
FROM centos:7
RUN /opt/bin/cc hello.c
#COPY /opt/lib/*.so /usr/local/lib   # fail: COPY doesn’t bind mount
RUN cp /opt/lib/*.so /usr/local/lib   # possible workaround
RUN ldconfig
$ ch-image [...] build-cache [...]
Print basic information about the cache. If -v is given, also print some Git statistics and the Git repository configuration.
If any of the following options are given, do the corresponding operation before printing. Multiple options can be given, in which case they happen in this order.
$ ch-image [...] delete IMAGE_GLOB [IMAGE_GLOB ... ]
Delete the image(s) described by each IMAGE_GLOB from the storage directory (including all build stages).
IMAGE_GLOB can be either a plain image reference or an image reference with glob characters to match multiple images. For example, ch-image delete 'foo*' will delete all images whose names start with foo. Multiple images and/or globs can also be given in a single command line.
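The glob is quoted in the example so that the shell passes it to ch-image unexpanded rather than matching it against files in the current directory. The sketch below uses the shell's own pattern matching as a stand-in to show which names a glob such as foo* selects; image_matches and the image names are hypothetical, and ch-image's matching rules may differ in detail.

```sh
#!/bin/sh
# Sketch only: shell pattern matching as a stand-in for IMAGE_GLOB matching.
# image_matches and the image names below are hypothetical.
image_matches () {
    # $1 = image name, $2 = glob; $2 is left unquoted so it acts as a pattern.
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

selected=''
for img in foo foo:latest foobar:1.0 barfoo alpine:3.17; do
    if image_matches "$img" 'foo*'; then
        selected="$selected $img"
    fi
done
echo "would delete:$selected"
```

Note that barfoo is not selected, because the pattern must match the whole name starting from its beginning.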
Importantly, this sub-command does not also remove the image from the build cache. Therefore, it can be used to reduce the size of the storage directory, at the cost of the time needed to retrieve the image from the cache if it is wanted again.
WARNING:
$ ch-image [...] gestalt [SELECTOR]
Provide information about the configuration and available features of ch-image. End users generally will not need this; it is intended for testing and debugging.
SELECTOR is one of:
Print information about images. If no argument given, list the images in builder storage.
$ ch-image [...] list [-l] [IMAGE_REF]
Optional argument:
List images in builder storage:
$ ch-image list
alpine:3.17 (amd64)
alpine:latest (amd64)
debian:buster (amd64)
Print details about Debian Buster image:
$ ch-image list debian:buster
details of image:    debian:buster
in local storage:    no
full remote ref:     registry-1.docker.io:443/library/debian:buster
available remotely:  yes
remote arch-aware:   yes
host architecture:   amd64
archs available:     386       bae2738ed83
                     amd64     98285d32477
                     arm/v7    97247fd4822
                     arm64/v8  122a0342878
For remotely available images like Debian Buster, the associated digest is listed beside each available architecture. Importantly, this feature does not provide the hash of the local image, which is only calculated on push.
$ ch-image [...] import PATH IMAGE_REF
Copy the image at PATH into builder storage with name IMAGE_REF. PATH can be:
If the imported image contains Charliecloud metadata, that will be imported unchanged, i.e., images exported from ch-image builder storage will be functionally identical when re-imported.
WARNING:
Pull the image described by the image reference IMAGE_REF from a repository to the local filesystem.
$ ch-image [...] pull [...] IMAGE_REF [DEST_REF]
See the FAQ for the gory details on specifying image references.
Destination:
Options:
This script does a fair amount of validation and fixing of the layer tarballs before flattening in order to support unprivileged use despite image problems we frequently see in the wild. For example, device files are ignored, and directory and file permissions are increased to a minimum of rwx------ and rw------- respectively. Note, however, that symlinks pointing outside the image are permitted, because they are not resolved until runtime within a container.
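The permission fix-up can be imitated on an already-unpacked tree. This is a stand-in for illustration only: ch-image applies the rule while processing layer tarballs, whereas the sketch below works on a directory, uses a hypothetical helper named fix_min_perms, and assumes GNU stat for the final listing.

```sh
#!/bin/sh
# Sketch only: give the owner at least rwx on directories and rw on files.
# fix_min_perms is a hypothetical helper, not part of ch-image.
fix_min_perms () {
    # "-exec ... \;" runs as each directory is visited, before find descends
    # into it, so directories the owner cannot read are opened up first.
    find "$1" -type d -exec chmod u+rwx {} \;
    find "$1" -type f -exec chmod u+rw {} \;
}

root=$(mktemp -d)
mkdir "$root/dir"
: > "$root/dir/file"
chmod 0044 "$root/dir/file"     # ----r--r--
chmod 0055 "$root/dir"          # ---r-xr-x
fix_min_perms "$root"
stat -c '%A %n' "$root/dir" "$root/dir/file"
```

Because the modes are only added to, permissions that were already more generous than the minimum are left as they were.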
The following metadata in the pulled image is retained; all other metadata is currently ignored. (If you have a need for additional metadata, please let us know!)
Note that some images (e.g., those with a “version 1 manifest”) do not contain metadata. A warning is printed in this case.
Download the Debian Buster image matching the host’s architecture and place it in the storage directory:
$ uname -m
aarch64
$ ch-image pull debian:buster
pulling image: debian:buster
requesting arch: arm64/v8
manifest list: downloading
manifest: downloading
config: downloading
layer 1/1: c54d940: downloading
flattening image
layer 1/1: c54d940: listing
validating tarball members
resolving whiteouts
layer 1/1: c54d940: extracting
image arch: arm64
done
Same, specifying the architecture explicitly:
$ ch-image --arch=arm/v7 pull debian:buster
pulling image: debian:buster
requesting arch: arm/v7
manifest list: downloading
manifest: downloading
config: downloading
layer 1/1: 8947560: downloading
flattening image
layer 1/1: 8947560: listing
validating tarball members
resolving whiteouts
layer 1/1: 8947560: extracting
image arch: arm (may not match host arm64/v8)
Push the image described by the image reference IMAGE_REF from the local filesystem to a repository.
$ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]
See the FAQ for the gory details on specifying image references.
Destination:
Options:
Because Charliecloud is fully unprivileged, the owner and group of files in its images are not meaningful in the broader ecosystem. Thus, when pushed, everything in the image is flattened to user:group root:root. Also, setuid/setgid bits are removed, to avoid surprises if the image is pulled by a privileged container implementation.
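The effect of this flattening can be reproduced with GNU tar. The sketch below is only an analogy and not how ch-image builds its layers: it assumes GNU tar (for --owner, --group, and --mode), and the file name tool is made up for the demonstration.

```sh
#!/bin/sh
# Sketch only: an archive with ownership flattened to 0:0 (root:root) and
# setuid/setgid bits cleared, using GNU tar options as a stand-in.
img=$(mktemp -d)
layer=$(mktemp)
: > "$img/tool"
chmod 4755 "$img/tool"          # setuid bit set in the source tree

tar -C "$img" -cf "$layer" \
    --owner=0 --group=0 --numeric-owner \
    --mode='u-s,g-s' \
    .

# List the members with numeric owner and group.
tar --numeric-owner -tvf "$layer"
```

In the listing, every member is owned by 0/0 and the mode of tool no longer contains an s, even though the file on disk is still setuid and owned by the invoking user.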
Push a local image to the registry example.com:5000 at path /foo/bar with tag latest. Note that in this form, the local image must be named to match that remote reference.
$ ch-image push example.com:5000/foo/bar:latest
pushing image: example.com:5000/foo/bar:latest
layer 1/1: gathering
layer 1/1: preparing
preparing metadata
starting upload
layer 1/1: a1664c4: checking if already in repository
layer 1/1: a1664c4: not present, uploading
config: 89315a2: checking if already in repository
config: 89315a2: not present, uploading
manifest: uploading
cleaning up
done
Same, except use local image alpine:3.17. In this form, the local image name does not have to match the destination reference.
$ ch-image push alpine:3.17 example.com:5000/foo/bar:latest
pushing image: alpine:3.17
destination: example.com:5000/foo/bar:latest
layer 1/1: gathering
layer 1/1: preparing
preparing metadata
starting upload
layer 1/1: a1664c4: checking if already in repository
layer 1/1: a1664c4: not present, uploading
config: 89315a2: checking if already in repository
config: 89315a2: not present, uploading
manifest: uploading
cleaning up
done
Same, except use unpacked image located at /var/tmp/image rather than an image in ch-image storage. (Also, the sole layer is already present in the remote registry, so we don’t upload it again.)
$ ch-image push --image /var/tmp/image example.com:5000/foo/bar:latest
pushing image: example.com:5000/foo/bar:latest
image path: /var/tmp/image
layer 1/1: gathering
layer 1/1: preparing
preparing metadata
starting upload
layer 1/1: 892e38d: checking if already in repository
layer 1/1: 892e38d: already present
config: 546f447: checking if already in repository
config: 546f447: not present, uploading
manifest: uploading
cleaning up
done
$ ch-image [...] reset
Delete all images and cache from ch-image builder storage.
$ ch-image [...] undelete IMAGE_REF
If IMAGE_REF has been deleted but is in the build cache, recover it from the cache. Only available when the cache is enabled, and will not overwrite IMAGE_REF if it exists.
Also sets verbose mode if not already set (equivalent to --verbose).
If Charliecloud was obtained from your Linux distribution, use your distribution’s bug reporting procedures.
Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues
charliecloud(7)
Full documentation at: <https://hpc.github.io/charliecloud>
2014–2023, Triad National Security, LLC and others
2024-04-01 05:37 UTC | 0.37 |