CH-FROMHOST(1) | Charliecloud | CH-FROMHOST(1) |
ch-fromhost - Inject files from the host into an image directory, with various magic
$ ch-fromhost [OPTION ...] [FILE_OPTION ...] IMGDIR
Inject files from the host into the Charliecloud image directory IMGDIR.
The purpose of this command is to inject host files that a container needs in order to access host-specific resources, usually GPUs or proprietary interconnects. It is not a general copy-to-image tool; see further discussion on use cases below.
It should be run after ch-convert and before ch-run. After invocation, the image is no longer portable to other hosts.
Injection is not atomic; if an error occurs partway through injection, the image is left in an undefined state and should be re-unpacked from storage. Injection is currently implemented using a simple file copy, but that may change in the future.
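Because a failed injection can leave the image in an undefined state, one possible workaround is to inject into a scratch copy of the image and swap it into place only if injection succeeds. The sketch below is a hypothetical shell function, inject_on_copy, that is not part of Charliecloud; it assumes cp -a (GNU coreutils or BusyBox) and enough disk space for a second copy of the image.

```sh
# inject_on_copy: hypothetical helper, NOT part of Charliecloud.
# Usage: inject_on_copy IMGDIR CMD [ARG ...]
# CMD is run as "CMD ARG ... COPYDIR", e.g. CMD = ch-fromhost --path FILE,
# since ch-fromhost takes the image directory as its last argument.
inject_on_copy () {
    img=${1%/}                          # drop any trailing slash
    shift
    tmp="$img.inject.$$"
    # Copy the image, preserving symlinks, modes, and timestamps.
    cp -a "$img" "$tmp" || { rm -rf "$tmp"; return 1; }
    if "$@" "$tmp"; then
        # Success: swap the injected copy into place. These are two
        # renames, so the swap itself is still not atomic.
        mv "$img" "$img.old.$$" && mv "$tmp" "$img" && rm -rf "$img.old.$$"
    else
        rc=$?                           # exit status of the injection command
        rm -rf "$tmp"                   # failure: discard copy, keep original
        return "$rc"
    fi
}
```

For example, inject_on_copy /var/tmp/baz ch-fromhost --path /bin/bar would run ch-fromhost against /var/tmp/baz.inject.PID and leave /var/tmp/baz untouched if it fails.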
Arbitrary file and libfabric injection are handled differently.
Arbitrary file paths that contain the strings /bin or /sbin are assumed to be executables and placed in /usr/bin within the container. Paths that are not loadable libfabric providers and contain the strings /lib or .so are assumed to be shared libraries and are placed in the first-priority directory reported by ldconfig (see --print-lib below). Other files are placed in the directory specified by --dest.
If any shared libraries are injected, ch-fromhost runs ldconfig inside the container (using ch-run -w) after injection.
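The destination rules above can be sketched as a shell case statement. classify_path is a hypothetical name, and this is a minimal illustration of the documented heuristics, not the code ch-fromhost actually runs; in particular, the order of the checks (executables, then providers, then libraries) is an assumption.

```sh
# classify_path: hypothetical helper mirroring the documented rules.
# Prints the category of the host path given as $1.
classify_path () {
    case $1 in
        */bin* | */sbin*) echo executable ;;          # placed in /usr/bin
        *-fi.so)          echo libfabric-provider ;;  # next to image's libfabric.so
        */lib* | *.so*)   echo shared-library ;;      # ldconfig's first directory
        *)                echo other ;;               # directory given by --dest
    esac
}
```

Because the rules are substring matches, a path such as /opt/app/bin/settings.txt would be classified as an executable under this sketch.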
MPI implementations have numerous ways of communicating messages over interconnects. We use libfabric (OFI), an OpenFabrics framework that exports fabric communication services to applications, to manage these communications with built-in or loadable fabric providers.
Using OFI, we can (a) uniformly manage fabric communication services for both Open MPI and MPICH, and (b) use simplified methods of accessing proprietary host hardware, e.g., Cray’s Gemini/Aries and Slingshot (CXI).
OFI providers implement the application-facing software interfaces needed to access network-specific protocols, drivers, and hardware. Loadable providers, i.e., compiled OFI libraries whose names end in -fi.so (for example, Cray’s libgnix-fi.so), can be copied into, and used by, an image with an MPI configured against OFI. Alternatively, the image’s libfabric.so can be overwritten with the host’s. See details and quirks below.
The options that specify which files to inject (e.g., --cmd, --file, and --path) can be repeated, and at least one must be specified.
$ ch-fromhost --print-lib /var/tmp/bullseye
/usr/local/lib
$ ch-fromhost -v --print-lib /var/tmp/bullseye
asking ldconfig for inferred shared library destination
inferred shared library destination: /var/tmp/bullseye//usr/local/lib
/usr/local/lib
$ ch-fromhost -v -v --print-lib /var/tmp/bullseye
asking ldconfig for inferred shared library destination
/sbin/ldconfig: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig: /lib/x86_64-linux-gnu/ld-2.31.so is the dynamic linker, ignoring
inferred shared library destination: /var/tmp/bullseye//usr/local/lib
/usr/local/lib
See issue #732 for an example of how this was confusing for users.
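The inferred destination printed by --print-lib comes from asking ldconfig. A rough approximation, assuming the image has /sbin/ldconfig and that ldconfig -v lists the directories it scans as lines beginning with "/", in priority order, is to take the first such directory. first_ldconfig_dir is a hypothetical helper, and this is not necessarily the exact command ch-fromhost runs.

```sh
# first_ldconfig_dir: hypothetical helper. Reads `ldconfig -v` output on
# stdin and prints the first directory it lists. Directory lines look like
# "/usr/local/lib:" or "/usr/local/lib: (from FILE:LINE)"; library entries
# are indented and therefore skipped.
first_ldconfig_dir () {
    grep -E '^/' | head -n 1 | cut -d: -f1
}
```

A possible usage is ch-run /var/tmp/bullseye -- /sbin/ldconfig -Nv 2>/dev/null | first_ldconfig_dir, where -N avoids rebuilding the cache and discarding stderr hides the stat noise shown above.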
This command does a lot of heuristic magic; while it can copy arbitrary files into an image, this usage is discouraged and prone to error. Here are some use cases and the recommended approach:
The implementation of libfabric provider injection and replacement is experimental and has a couple of quirks.
To avoid issues and reduce complexity, the inferred injection destination for libfabric providers and replacements is always derived from the path in the image where libfabric.so is found.
Managing all possible bind mount paths is untenable. Thus, this experimental implementation injects libraries linked to a libgnix-fi.so built with the minimal modules necessary to compile, i.e.:
A Cray GNI provider linked against more complicated PEs will still work, assuming 1) the user explicitly bind-mounts the missing libraries listed in its ldd output, and 2) none of these libraries conflict with container functionality, e.g., glibc.so.
For now, on Cray systems with Slingshot (CXI), we need to overwrite the container’s libfabric.so with the host’s using --path. See the examples for details.
Please file a bug if we missed anything above or if you know how to make the code better.
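The provider destination rule can be illustrated with a hypothetical provider_dest function: find libfabric.so under the image and print its directory, as seen from inside the container, with a libfabric subdirectory appended. This matches the //usr/local/lib/libfabric destination in the GNI example below, but it is only a sketch under that assumption; ch-fromhost's own search may behave differently, e.g., when the image contains more than one libfabric.so.

```sh
# provider_dest: hypothetical helper, NOT ch-fromhost's implementation.
# $1 = image directory. Prints the in-image provider directory, or
# returns 1 if no libfabric.so is found.
provider_dest () {
    img=${1%/}                                  # drop any trailing slash
    lib=$(find "$img" -name libfabric.so -print | head -n 1)
    [ -n "$lib" ] || return 1
    dir=$(dirname "$lib")
    dir=${dir#"$img"}                           # path as seen in the container
    # A replacement libfabric.so would go in $dir itself; loadable
    # providers go in the libfabric subdirectory next to it.
    printf '%s/libfabric\n' "$dir"
}
```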
Symbolic links are dereferenced, i.e., the files pointed to are injected, not the links themselves.
As a corollary, do not include symlinks to shared libraries. These will be re-created by ldconfig.
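When building a list for --file, symlinks can be filtered out beforehand so that only the real library files are injected and ldconfig recreates the links. drop_symlinks is a hypothetical helper; this is a minimal sketch that reads one path per line.

```sh
# drop_symlinks: hypothetical filter. Reads paths on stdin, one per line,
# and prints only those that are not symbolic links.
drop_symlinks () {
    while IFS= read -r p; do
        [ -L "$p" ] || printf '%s\n' "$p"
    done
}
```

For example, ls /usr/lib64/libfoo.so* | drop_symlinks > qux.txt would keep libfoo.so.1.0 but drop the libfoo.so.1 link.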
There are two alternate approaches for nVidia GPU libraries:
Further, while these alternate approaches would simplify or eliminate this script for nVidia GPUs, they would not solve the problem for other situations.
File paths may not contain colons or newlines.
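A pre-check for this restriction, run before handing a list of paths to ch-fromhost, might look like the following. path_ok is a hypothetical helper, not a ch-fromhost option.

```sh
# A literal newline, for use in the case pattern below.
newline='
'

# path_ok: hypothetical check. Succeeds only if $1 contains neither a
# colon nor a newline.
path_ok () {
    case $1 in
        *:* | *"$newline"*) return 1 ;;
        *)                  return 0 ;;
    esac
}
```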
ldconfig tends to print stat errors; these are typically non-fatal and occur when trying to probe common library paths. See issue #732.
Cray Slingshot CXI injection.
Replace the image’s libfabric, i.e., libfabric.so, with the Cray host’s libfabric at host path /opt/cray-libfabric/lib64/libfabric.so.
$ ch-fromhost -v --path /opt/cray-libfabric/lib64/libfabric.so /tmp/ompi
[ debug ] queueing files
[ debug ] cray libfabric: /opt/cray-libfabric/lib64/libfabric.so
[ debug ] searching image for inferred libfabric destiation
[ debug ] found /tmp/ompi/usr/local/lib/libfabric.so
[ debug ] adding cray libfabric libraries
[ debug ] skipping /lib64/libcom_err.so.2
[...]
[ debug ] queueing files
[ debug ] shared library: /usr/lib64/libcxi.so.1
[ debug ] queueing files
[ debug ] shared library: /usr/lib64/libcxi.so.1.2.1
[ debug ] queueing files
[ debug ] shared library: /usr/lib64/libjson-c.so.3
[ debug ] queueing files
[ debug ] shared library: /usr/lib64/libjson-c.so.3.0.1
[...]
[ debug ] queueing files
[ debug ] shared library: /usr/lib64/libssh.so.4
[ debug ] queueing files
[ debug ] shared library: /usr/lib64/libssh.so.4.7.4
[...]
[ debug ] inferred shared library destination: /tmp/ompi//usr/local/lib
[ debug ] injecting into image: /tmp/ompi/
[ debug ] mkdir -p /tmp/ompi//var/lib/hugetlbfs
[ debug ] mkdir -p /tmp/ompi//var/spool/slurmd
[ debug ] echo '/usr/lib64' >> /tmp/ompi//etc/ld.so.conf.d/ch-ofi.conf
[ debug ] /opt/cray-libfabric/lib64/libfabric.so -> /usr/local/lib (inferred)
[ debug ] /usr/lib64/libcxi.so.1 -> /usr/local/lib (inferred)
[ debug ] /usr/lib64/libcxi.so.1.2.1 -> /usr/local/lib (inferred)
[ debug ] /usr/lib64/libjson-c.so.3 -> /usr/local/lib (inferred)
[ debug ] /usr/lib64/libjson-c.so.3.0.1 -> /usr/local/lib (inferred)
[ debug ] /usr/lib64/libssh.so.4 -> /usr/local/lib (inferred)
[ debug ] /usr/lib64/libssh.so.4.7.4 -> /usr/local/lib (inferred)
[ debug ] running ldconfig
[ debug ] ch-run -w /tmp/ompi/ -- /sbin/ldconfig
[ debug ] validating ldconfig cache
done
Same as above, except also inject Cray’s fi_info to verify Slingshot provider access.
$ ch-fromhost -v --path /opt/cray/libfabric/1.15.0.0/lib64/libfabric.so \
                 -d /usr/local/bin \
                 --path /opt/cray/libfabric/1.15.0.0/lib64/libfabric.so \
                 /tmp/ompi
[...]
$ ch-run /tmp/ompi/ -- fi_info -p cxi
provider: cxi
fabric: cxi
[...]
type: FI_EP_RDM
protocol: FI_PROTO_CXI
Cray GNI shared provider injection.
Add the Cray host-built GNI provider libgnix-fi.so to the image and verify it with fi_info.
$ ch-fromhost -v --path /home/ofi/libgnix-fi.so /tmp/ompi
[ debug ] queueing files
[ debug ] libfabric shared provider: /home/ofi/libgnix-fi.so
[ debug ] searching /tmp/ompi for libfabric shared provider destination
[ debug ] found: /tmp/ompi/usr/local/lib/libfabric.so
[ debug ] inferred provider destination: //usr/local/lib/libfabric
[ debug ] injecting into image: /tmp/ompi
[ debug ] mkdir -p /tmp/ompi//usr/local/lib/libfabric
[ debug ] mkdir -p /tmp/ompi/var/lib/hugetlbfs
[ debug ] mkdir -p /tmp/ompi/var/opt/cray/alps/spool
[ debug ] mkdir -p /tmp/ompi/opt/cray/wlm_detect
[ debug ] mkdir -p /tmp/ompi/etc/opt/cray/wlm_detect
[ debug ] mkdir -p /tmp/ompi/opt/cray/udreg
[ debug ] mkdir -p /tmp/ompi/opt/cray/xpmem
[ debug ] mkdir -p /tmp/ompi/opt/cray/ugni
[ debug ] mkdir -p /tmp/ompi/opt/cray/alps
[ debug ] echo '/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] echo '/opt/cray/alps/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] echo '/opt/cray/udreg/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] echo '/opt/cray/ugni/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] echo '/opt/cray/wlm_detect/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] echo '/opt/cray/xpmem/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] echo '/usr/lib64' >> /tmp/ompi/etc/ld.so.conf.d/ch-ofi.conf
[ debug ] /home/ofi/libgnix-fi.so -> //usr/local/lib/libfabric (inferred)
[ debug ] running ldconfig
[ debug ] ch-run -w /tmp/ompi -- /sbin/ldconfig
[ debug ] validating ldconfig cache
done
$ ch-run /tmp/ompi -- fi_info -p gni
provider: gni
fabric: gni
[...]
type: FI_EP_RDM
protocol: FI_PROTO_GNI
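Relating to the GNI quirk described above, unresolved dependencies of an injected provider can be listed from its ldd output. missing_libs is a hypothetical helper; it assumes ldd is available in the image and that unresolved libraries appear as "=> not found" lines, as glibc's ldd prints them.

```sh
# missing_libs: hypothetical helper. Reads ldd output on stdin and prints
# the names of libraries the dynamic loader could not resolve.
missing_libs () {
    awk '/=> not found/ { print $1 }'
}
```

For example, ch-run /tmp/ompi -- ldd /usr/local/lib/libfabric/libgnix-fi.so | missing_libs would list candidates for bind-mounting, still subject to the caveat that they must not conflict with container functionality.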
Within the image /var/tmp/baz, place shared library /usr/lib64/libfoo.so at path /usr/lib/libfoo.so (assuming /usr/lib is the first directory searched by the dynamic loader in the image) and executable /bin/bar at path /usr/bin/bar. Then, create appropriate symlinks to libfoo and update the ld.so cache.
$ cat qux.txt
/bin/bar
/usr/lib64/libfoo.so
$ ch-fromhost --file qux.txt /var/tmp/baz
Same as above:
$ ch-fromhost --cmd 'cat qux.txt' /var/tmp/baz
Same as above:
$ ch-fromhost --path /bin/bar --path /usr/lib64/libfoo.so /var/tmp/baz
Same as above, but place the files into /corge instead (and the shared library will not be found by ldconfig):
$ ch-fromhost --dest /corge --file qux.txt /var/tmp/baz
Same as above, and also place file /etc/quux at /etc/quux within the container:
$ ch-fromhost --file qux.txt --dest /etc --path /etc/quux /var/tmp/baz
Inject the executables and libraries recommended by nVidia into the image, and then run ldconfig:
$ ch-fromhost --nvidia /var/tmp/baz
asking ldconfig for shared library destination
/sbin/ldconfig: Can't stat /libx32: No such file or directory
/sbin/ldconfig: Can't stat /usr/libx32: No such file or directory
shared library destination: /usr/lib64//bind9-export
injecting into image: /var/tmp/baz
/usr/bin/nvidia-smi -> /usr/bin (inferred)
/usr/bin/nvidia-debugdump -> /usr/bin (inferred)
/usr/bin/nvidia-persistenced -> /usr/bin (inferred)
/usr/bin/nvidia-cuda-mps-control -> /usr/bin (inferred)
/usr/bin/nvidia-cuda-mps-server -> /usr/bin (inferred)
/usr/lib64/libnvidia-ml.so.460.32.03 -> /usr/lib64//bind9-export (inferred)
/usr/lib64/libnvidia-cfg.so.460.32.03 -> /usr/lib64//bind9-export (inferred)
[...]
/usr/lib64/libGLESv2_nvidia.so.460.32.03 -> /usr/lib64//bind9-export (inferred)
/usr/lib64/libGLESv1_CM_nvidia.so.460.32.03 -> /usr/lib64//bind9-export (inferred)
running ldconfig
This command was inspired by the similar Shifter feature that allows Shifter containers to use the Cray Aries network. We particularly appreciate the help provided by Shane Canon and Doug Jacobsen during our implementation of --cray-mpi.
We appreciate the advice of Ryan Olson at nVidia on implementing --nvidia.
If Charliecloud was obtained from your Linux distribution, use your distribution’s bug reporting procedures.
Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues
charliecloud(7)
Full documentation at: <https://hpc.github.io/charliecloud>
2014–2023, Triad National Security, LLC and others
2024-04-01 05:37 UTC | 0.37 |