job_container.conf(5) | Slurm Configuration File | job_container.conf(5)
job_container.conf - Slurm configuration file for job_container/tmpfs plugin
job_container.conf is an ASCII file which defines parameters used by Slurm's job_container/tmpfs plugin. The plugin reads job_container.conf to determine its configuration. Based on these settings it constructs a private (or optionally shared) filesystem namespace for the job and mounts a list of directories (by default /tmp and /dev/shm) inside it. This gives the job a private view of these directories. These paths are mounted inside the location specified by 'BasePath' in the job_container.conf file. When the job completes, the private namespace is unmounted and all files therein are automatically removed. To make use of this plugin, 'PrologFlags=Contain' must also be present in your slurm.conf file, as shown:
JobContainerType=job_container/tmpfs
PrologFlags=Contain
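As a quick sanity check, the running configuration can be inspected once the daemons have been restarted (output formatting may vary by Slurm version):

scontrol show config | grep -E 'JobContainerType|PrologFlags'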
The file will always be located in the same directory as the slurm.conf.
If using the job_container.conf file to define a namespace available to a specific set of nodes, the first parameter on the line should be NodeName. If configuring a namespace without specifying nodes, the first parameter on the line should be BasePath.
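For illustration only (the node names and paths here are placeholders), a node-specific line and a global line take these forms:

NodeName=node[001-004] BasePath=/local/scratch

BasePath=/local/scratch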
Parameter names are case insensitive. Any text following a "#" in the configuration file is treated as a comment through the end of that line. Changes to the configuration file take effect upon restart of Slurm daemons.
The following job_container.conf parameters are defined to control the behavior of the job_container/tmpfs plugin.
NOTE: The BasePath must be unique to each node. If BasePath is on a shared filesystem, you can use "%h" or "%n" to create node-unique directories.
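For example, a configuration along the following lines (the shared path is a placeholder) gives each node its own automatically created directory under a shared filesystem:

AutoBasePath=true
BasePath=/shared/slurm/tmpfs/%n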
NOTE: The BasePath parameter cannot be set to any of the paths specified by Dirs. Using these directories will cause conflicts when trying to mount and unmount the private directories for the job.
NOTE: /dev/shm has special handling, and instead of a bind mount is always a fresh tmpfs filesystem.
If any parameters in job_container.conf are changed while Slurm is running, then slurmd on the respective nodes will need to be restarted for the changes to take effect (scontrol reconfigure is not sufficient). Additionally, this can be disruptive to jobs already running on the node, so care must be taken to ensure that no jobs are running when changes to job_container.conf are deployed.
Restarting slurmd is safe and non-disruptive to running jobs as long as job_container.conf has not changed between restarts; if it has changed, the point above applies.
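One conservative sequence (a sketch; it assumes slurmd is managed by systemd, and the node name and reason string are only examples) is to drain the node, wait for running jobs to finish, deploy the new file, restart slurmd on the node, and then resume it:

scontrol update NodeName=node01 State=DRAIN Reason="job_container.conf update"
systemctl restart slurmd
scontrol update NodeName=node01 State=RESUME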
The first sample file will define a single basepath that applies to all nodes and is created automatically. The entries required in slurm.conf for the plugin are shown alongside it.

slurm.conf:
JobContainerType=job_container/tmpfs
PrologFlags=Contain

job_container.conf:
AutoBasePath=true
BasePath=/var/nvme/storage
The second sample file will define 2 basepaths. The first will only be on largemem[1-2] and it will be automatically created. The second will only be on gpu[1-10], will be expected to exist and will run an initscript before each job.
NodeName=largemem[1-2] AutoBasePath=true BasePath=/var/nvme/storage_a
NodeName=gpu[1-10] BasePath=/var/nvme/storage_b InitScript=/etc/slurm/init.sh
The third sample file will define 1 basepath that will be on all nodes, automatically created, with /tmp and /var/tmp as private mounts.
AutoBasePath=true
BasePath=/var/nvme/storage
Dirs=/tmp,/var/tmp
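Once such a configuration is in place, the private mounts can be checked from inside a job; for example (a sketch, with output depending on the node's filesystems):

srun --nodes=1 df -h /tmp /var/tmp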
Copyright (C) 2021 Regents of the University of California
Produced at Lawrence Berkeley National Laboratory
Copyright (C) 2021-2022 SchedMD LLC.
This file is part of Slurm, a resource management program. For details, see <https://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
slurm.conf(5)
Slurm Configuration File | November 2024