CONDOR_SUBMIT(1) | HTCondor Manual | CONDOR_SUBMIT(1)
condor_submit - Queue jobs for execution under HTCondor
condor_submit [-terse ] [-verbose ] [-unused ] [-file submit_file] [-name schedd_name] [-remote schedd_name] [-addr <ip:port>] [-pool pool_name] [-disable ] [-password passphrase] [-debug ] [-append command ...] [-batch-name batch_name] [-spool ] [-dump filename] [-interactive ] [-factory ] [-allow-crlf-script ] [-dry-run ] [-maxjobs number-of-jobs] [-single-cluster ] [<submit-variable>=<value> ] [submit description file ] [-queue queue_arguments]
condor_submit is the program for submitting jobs for execution under HTCondor. condor_submit requires one or more submit description commands to direct the queuing of jobs. These commands may come from a file, standard input, the command line, or from some combination of these. One submit description may contain specifications for the queuing of many HTCondor jobs at once. A single invocation of condor_submit may create one or more clusters. A cluster is a set of jobs specified in the submit description between queue commands for which the executable is not changed. It is advantageous to submit multiple jobs as a single cluster because the schedd uses much less memory to hold the jobs.
Multiple clusters may be specified within a single submit description. Each cluster must specify a single executable.
The job ClassAd attribute ClusterId identifies a cluster.
The submit description file argument is the path and file name of the submit description file. If this optional argument is the dash character (-), then the commands are taken from standard input. If - is specified for the submit description file, -verbose is implied; this can be overridden by specifying -terse.
If no submit description file argument is given, and no -queue argument is given, commands are taken automatically from standard input.
Note that submission of jobs from a Windows machine requires a stashed password to allow HTCondor to impersonate the user submitting the job. To stash a password, use the condor_store_cred command. See the condor_store_cred manual page for details.
For lengthy lines within the submit description file, the backslash (\) is a line continuation character. Placing the backslash at the end of a line causes the current line's command to be continued with the next line of the file. Submit description files may contain comments. A comment is any line beginning with a pound character (#).
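For example, a comment followed by a command split across two lines with the continuation character (the file names here are illustrative):

# input data for the first run
transfer_input_files = data1.dat, data2.dat, \
    data3.dat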
If the -append option is used to supply a queue command, as in

condor_submit mysubmitfile -append "queue input in A, B, C"
then the entire -append command line option and its arguments are converted to
condor_submit mysubmitfile -queue input in A, B, C
The submit description file is not modified. Multiple commands are specified by using the -append option multiple times. Each new command is given in a separate -append option. Commands with spaces in them will need to be enclosed in double quote marks.
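For example, the following illustrative invocation appends two commands, each in its own -append option:

$ condor_submit mysubmitfile -append "request_memory = 2GB" -append "priority = 10"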
On a Unix command line, the shell expands file globs before parsing occurs.
Note: more information on submitting HTCondor jobs can be found here: Submitting a Job.
As of version 8.5.6, the condor_submit language supports multi-line values in commands. The syntax is the same as in the configuration language (see the Multi-Line Values section of the configuration introduction in the admin manual).
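As a sketch of the multi-line form, assuming the same @=tag ... @tag delimiters used by the configuration language (the macro name MY_TEXT is illustrative):

MY_TEXT @=end
first line
second line
@end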
Each submit description file describes one or more clusters of jobs to be placed in the HTCondor execution pool. All jobs in a cluster must share the same executable, but they may have different input and output files, and different program arguments. The submit description file is generally the last command-line argument to condor_submit. If the submit description file argument is omitted, condor_submit will read the submit description from standard input.
The submit description file must contain at least one executable command and at least one queue command. All of the other commands have default actions.
Note that a submit file that contains more than one executable command will produce multiple clusters when submitted. This is not generally recommended, and is not allowed for submit files that are run as DAG node jobs by condor_dagman.
The commands which can appear in the submit description file are numerous. They are listed here in alphabetical order by category.
BASIC COMMANDS
In the java universe, the first argument must be the name of the class containing main.
There are two permissible formats for specifying arguments, identified as the old syntax and the new syntax. The old syntax supports white space characters within arguments only in special circumstances; when used, the command line arguments are represented in the job ClassAd attribute Args. The new syntax supports uniform quoting of white space characters within arguments; when used, the command line arguments are represented in the job ClassAd attribute Arguments.
Old Syntax
In the old syntax, individual command line arguments are delimited (separated) by space characters. To allow a double quote mark in an argument, it is escaped with a backslash; that is, the two character sequence \" becomes a single double quote mark within an argument.
Further interpretation of the argument string differs depending on the operating system. On Windows, the entire argument string is passed verbatim (other than the backslash in front of double quote marks) to the Windows application. Most Windows applications will allow spaces within an argument value by surrounding the argument with double quote marks. In all other cases, there is no further interpretation of the arguments.
Example:
arguments = one \"two\" 'three'
Produces in Unix vanilla universe:
argument 1: one
argument 2: "two"
argument 3: 'three'
New Syntax
Here are the rules for using the new syntax:

1. The entire string representing the command line arguments is surrounded by double quote marks.
2. Use white space (space or tab characters) to separate arguments.
3. To embed white space in an argument, surround the white space with single quote marks.
4. To insert a literal double quote mark, repeat the double quote mark anywhere inside the double-quoted string.
5. To insert a literal single quote mark, repeat the single quote mark anywhere inside a section surrounded by single quote marks.
Example:
arguments = "3 simple arguments"
Produces:
argument 1: 3
argument 2: simple
argument 3: arguments
Another example:
arguments = "one 'two with spaces' 3"
Produces:
argument 1: one
argument 2: two with spaces
argument 3: 3
And yet another example:
arguments = "one ""two"" 'spacey ''quoted'' argument'"
Produces:
argument 1: one
argument 2: "two"
argument 3: spacey 'quoted' argument
Notice that in the new syntax, the backslash has no special meaning. This is for the convenience of Windows users.
There are two different formats for specifying the environment variables: the old format and the new format. The old format is retained for backward-compatibility. It suffers from a platform-dependent syntax and the inability to insert some special characters into the environment.
The new syntax for specifying environment values:

1. Put double quote marks around the entire environment string. Each environment entry has the form

<name>=<value>

2. Use white space (space or tab characters) to separate environment entries.
3. To put any white space in an environment entry, surround the white space with single quote marks.
4. To insert a literal double quote mark, repeat the double quote mark anywhere inside the double-quoted string.
5. To insert a literal single quote mark, repeat the single quote mark anywhere inside a section surrounded by single quote marks.
Example:
environment = "one=1 two=""2"" three='spacey ''quoted'' value'"
Produces the following environment entries:
one=1
two="2"
three=spacey 'quoted' value
Under the old syntax, there are no double quote marks surrounding the environment specification. Each environment entry remains of the form
<name>=<value>
Under Unix, list multiple environment entries by separating them with a semicolon (;). Under Windows, separate multiple entries with a vertical bar (|). There is no way to insert a literal semicolon under Unix or a literal vertical bar under Windows. Note that spaces are accepted, but rarely desired, characters within parameter names and values, because they are treated as literal characters, not separators or ignored white space. Place spaces within the parameter list only if required.
A Unix example:
environment = one=1;two=2;three="quotes have no 'special' meaning"
This produces the following:
one=1
two=2
three="quotes have no 'special' meaning"
If the environment is set with the environment command and getenv is also set, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).
If no path or a relative path is used, then the executable file is presumed to be relative to the current working directory of the user when the condor_submit command is issued.
Matchlist is a comma-, semicolon-, or space-separated list of environment variable names and name patterns that match or reject names. Matchlist members are matched case-insensitively to each name in the environment, and those that match are imported. Matchlist members can contain * as a wildcard character, which matches anything at that position. Members can have two * characters if one of them is at the end. Members can be prefixed with ! to force a matching environment variable not to be imported. The order of members in the Matchlist has no effect on the result. getenv = true is equivalent to getenv = *.
Prior to HTCondor 8.9.7, getenv allowed only True or False as values.
Examples:
# import everything except PATH and INCLUDE (also path, include and other case-variants)
getenv = !PATH, !INCLUDE

# import everything with CUDA in the name
getenv = *cuda*

# import every environment variable that starts with P or Q, except PATH
getenv = !path, P*, Q*
If the environment is set with the environment command and getenv is also set, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).
Note that this command does not refer to the command-line arguments of the program. The command-line arguments are specified by the arguments command.
When no address is specified, HTCondor sends e-mail about the job to:

job-owner@UID_DOMAIN
where the configuration variable UID_DOMAIN is specified by the HTCondor site administrator. If UID_DOMAIN has not been specified, HTCondor sends the e-mail to:
job-owner@submit-machine-name
Note that if a program explicitly opens and writes to a file, that file should not be specified as the output file.
Note that the priority setting in an HTCondor submit file will be overridden by condor_dagman if the submit file is used for a node in a DAG, and the priority of the node within the DAG is non-zero (see the Setting Priorities for Nodes section of the DAGMan documentation for more details).
The optional argument <int expr> specifies how many times to repeat the job submission for a given set of arguments. It may be an integer or an expression that evaluates to an integer, and it defaults to 1. All but the first form of this command are various ways of specifying a list of items. When these forms are used, <int expr> jobs will be queued for each item in the list. The in, matching, and from keywords indicate how the list will be specified.

The optional argument <varname> or <list of varnames> is the name or names of variables that will be set to the value of the current item when queuing the job. If no <varname> is specified, the variable ITEM will be used. Leading and trailing whitespace is trimmed. The optional argument <slice> is a Python-style slice selecting only some of the items in the list of items. Negative step values are not supported.
A submit file may contain more than one queue statement, and if desired, any commands may be placed between subsequent queue commands, such as new input, output, error, initialdir, or arguments commands. This is handy when submitting multiple runs into one cluster with one submit description file.
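For example, the following sketch queues one job per row of the list, setting the illustrative variables infile and arg for each item; $(infile) and $(arg) may then be referenced in other commands:

queue infile,arg from (
  alpha.dat, 10
  beta.dat,  20
)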
The vanilla universe is the default (except where the configuration variable DEFAULT_UNIVERSE defines it otherwise).
The scheduler universe is for a job that is to run on the machine where the job is submitted. This universe is intended for a job that acts as a metascheduler and will not be preempted.
The local universe is for a job that is to run on the machine where the job is submitted. This universe runs the job immediately and will not preempt the job.
The grid universe forwards the job to an external job management system. Further specification of the grid universe is done with the grid_resource command.
The java universe is for programs written to the Java Virtual Machine.
The vm universe facilitates the execution of a virtual machine.
The parallel universe is for parallel jobs (e.g. MPI) that require multiple machines in order to run.
The docker universe runs a docker container as an HTCondor job.
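As an illustrative sketch of universe selection, a minimal docker universe submit description might look like the following (the image name is an assumption):

universe      = docker
docker_image  = debian:stable
executable    = /bin/cat
arguments     = /etc/os-release
output        = os.out
error         = os.err
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue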
COMMANDS FOR MATCHMAKING
request_memory = max({60, Target.TotalSlotMemory})
rank = Memory
asks HTCondor to find all available machines with more than 60 megabytes of memory and give the job the machine with the largest amount of memory. The HTCondor User's Manual contains complete information on the syntax and available attributes that can be used in the ClassAd expression.
&& (RequestCpus <= Target.Cpus)
is appended to the requirements expression for the job.
For pools that enable dynamic condor_startd provisioning, specifies the minimum number of CPUs requested for this job, resulting in a dynamic slot being created with this many cores.
&& (RequestDisk <= Target.Disk)
is appended to the requirements expression for the job.
For pools that enable dynamic condor_startd provisioning, a dynamic slot will be created with at least this much disk space.
Characters may be appended to a numerical value to indicate units. K or KB indicates KiB, 2^10 bytes. M or MB indicates MiB, 2^20 bytes. G or GB indicates GiB, 2^30 bytes. T or TB indicates TiB, 2^40 bytes.
&& (Target.GPUs >= RequestGPUs)
is appended to the requirements expression for the job.
For pools that enable dynamic condor_startd provisioning, specifies the minimum number of GPUs requested for this job, resulting in a dynamic slot being created with this many GPUs.
&& (countMatches(MY.RequireGPUs, TARGET.AvailableGPUs) >= RequestGPUs)
is appended to the requirements expression for the job. This expression cannot be evaluated by HTCondor prior to version 9.8.0. A warning to this effect will be printed when condor_submit detects this condition.
For pools that enable dynamic condor_startd provisioning and are at least version 9.8.0, the constraint will be tested against the properties of AvailableGPUs and only those that match will be assigned to the dynamic slot.
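For example, a sketch requesting two GPUs and constraining their properties (Capability is one of the GPU properties a machine may advertise):

request_GPUs = 2
require_gpus = Capability >= 8.0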
For pools that enable dynamic condor_startd provisioning, a dynamic slot will be created with at least this much RAM.
The expression
&& (RequestMemory <= Target.Memory)
is appended to the requirements expression for the job.
Characters may be appended to a numerical value to indicate units. K or KB indicates KiB, 2^10 bytes. M or MB indicates MiB, 2^20 bytes. G or GB indicates GiB, 2^30 bytes. T or TB indicates TiB, 2^40 bytes.
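For example, these unit suffixes apply to request_memory and request_disk alike:

request_memory = 2GB
request_disk   = 500MB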
request_GPUs
This does not arrange for the CUDA runtime to be present, only for the job to run on a machine whose driver supports the specified version.
For scheduler and local universe jobs, the requirements expression is evaluated against the Scheduler ClassAd which represents the condor_schedd daemon running on the access point, rather than a remote machine. Like all commands in the submit description file, if multiple requirements commands are present, all but the last one are ignored. By default, condor_submit appends the following clauses to the requirements expression:
View the requirements of a job which has already been submitted (along with everything else about the job ClassAd) with the command condor_q -l; see the command reference for condor_q. Also, see the HTCondor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
FILE TRANSFER COMMANDS
&& (TARGET.HasEncryptExecuteDirectory)
to ensure the job is matched to a machine that is capable of encrypting the contents of the execute directory. This support is limited to Windows platforms that use the NTFS file system and Linux platforms with the ecryptfs-utils package installed.
For more information about this and other settings related to transferring files, see the HTCondor User's manual section on the file transfer mechanism.
Note that should_transfer_files is not supported for jobs submitted to the grid universe.
When a path to an input file or directory is specified, this specifies the path to the file on the submit side. The file is placed in the job's temporary scratch directory on the execute side, and it is named using the base name of the original path. For example, /path/to/input_file becomes input_file in the job's scratch directory.
When a directory is specified, the behavior depends on whether there is a trailing path separator character. When a directory is specified with a trailing path separator, it is as if each of the items within the directory were listed in the transfer list. Therefore, the contents are transferred, but the directory itself is not. When there is no trailing path separator, the directory itself is transferred with all of its contents inside it. On platforms such as Windows where the path separator is not a forward slash (/), a trailing forward slash is treated as equivalent to a trailing path separator. An example of an input directory specified with a trailing forward slash is input_data/.
For grid universe jobs other than HTCondor-C, the transfer of directories is not currently supported.
Symbolic links to files are transferred as the files they point to. Transfer of symbolic links to directories is not currently supported.
For vanilla and vm universe jobs only, a file may be specified by giving a URL, instead of a file name. The implementation for URL transfers requires both configuration and available plug-in.
If you have a plugin which handles https:// URLs (and HTCondor ships with one enabled), HTCondor supports pre-signing S3 URLs. This allows you to specify S3 URLs for this command, for transfer_output_remaps, and for output_destination. By pre-signing the URLs on the submit node, HTCondor avoids transferring your S3 credentials to the execute node. You must specify aws_access_key_id_file and aws_secret_access_key_file; you may specify aws_region, if necessary; see below. To use the S3 service provided by AWS, use S3 URLs of the following forms:
# For older buckets that aren't region-specific.
s3://<bucket>/<key>

# For newer, region-specific buckets.
s3://<bucket>.s3.<region>.amazonaws.com/<key>
To use other S3 services, where <host> must contain a .:
s3://<host>/<key>

# If necessary
aws_region = <region>
You may specify the corresponding access key ID and secret access key with s3_access_key_id_file and s3_secret_access_key_file if you prefer (which may reduce confusion, if you're not using AWS).
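For example, a sketch using an illustrative region-specific bucket, with credentials read from files in the submitter's home directory:

transfer_input_files = s3://my-bucket.s3.us-east-1.amazonaws.com/input.dat
aws_access_key_id_file     = $ENV(HOME)/.condor/access_key_id
aws_secret_access_key_file = $ENV(HOME)/.condor/secret_access_key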
If you must access S3 using temporary credentials, you may specify the temporary credentials using aws_access_key_id_file and aws_secret_access_key_file for the files containing the corresponding temporary token, and +EC2SessionToken for the file containing the session token.
Temporary credentials have a limited lifetime. If you are using S3 only to download input files, the job must start before the credentials expire. If you are using S3 to upload output files, the job must finish before the credentials expire. HTCondor does not know when the credentials will expire; if they do so before they are needed, file transfer will fail.
HTCondor does not presently support transferring entire buckets or directories from S3.
HTCondor supports Google Cloud Storage URLs -- gs:// -- via Google's "interoperability" API. You may specify gs:// URLs as if they were s3:// URLs, and they work the same way. You may specify the corresponding access key ID and secret access key with gs_access_key_id_file and gs_secret_access_key_file if you prefer (which may reduce confusion).
Note that (at present), you may not provide more than one set of credentials for s3:// or gs:// file transfer; this implies that all such URLs download from or upload to the same service.
For HTCondor-C jobs and all other non-grid universe jobs, if transfer_output_files is not specified, HTCondor will automatically transfer back all files in the job's temporary working directory which have been modified or created by the job. Subdirectories are not scanned for output, so if output from subdirectories is desired, the output list must be explicitly specified. For grid universe jobs other than HTCondor-C, desired output files must also be explicitly listed. Another reason to explicitly list output files is for a job that creates many files, and the user wants only a subset transferred back.
For grid universe jobs other than with grid type condor, to have files other than standard output and standard error transferred from the execute machine back to the access point, use transfer_output_files, listing all files to be transferred. These files are found on the execute machine in the working directory of the job.
When a path to an output file or directory is specified, it specifies the path to the file on the execute side. As a destination on the submit side, the file is placed in the job's initial working directory, and it is named using the base name of the original path. For example, path/to/output_file becomes output_file in the job's initial working directory. The name and path of the file that is written on the submit side may be modified by using transfer_output_remaps. Note that this remap function only works with files but not with directories.
When a directory is specified, the behavior depends on whether there is a trailing path separator character. When a directory is specified with a trailing path separator, it is as if each of the items within the directory were listed in the transfer list. Therefore, the contents are transferred, but the directory itself is not. When there is no trailing path separator, the directory itself is transferred with all of its contents inside it. On platforms such as Windows where the path separator is not a forward slash (/), a trailing forward slash is treated as equivalent to a trailing path separator. An example of an output directory specified with a trailing forward slash is output_data/.
For grid universe jobs other than HTCondor-C, the transfer of directories is not currently supported.
Symbolic links to files are transferred as the files they point to. Transfer of symbolic links to directories is not currently supported.
If this command is absent, the output is transferred instead.
If no files or directories are specified, nothing will be transferred. This is generally not useful.
The list is interpreted like transfer_output_files, but there is no corresponding remaps command.
Trailing slashes are ignored when preserve_relative_paths is set.
name describes an output file name produced by your job, and newname describes the file name it should be downloaded to. Multiple remaps can be specified by separating each with a semicolon. If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash. You cannot specify directories to be remapped.
Note that whether an output file is transferred is controlled by transfer_output_files. Listing a file in transfer_output_remaps is not sufficient to cause it to be transferred.
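For example, to transfer one output file and rename it on the submit side (file names illustrative):

transfer_output_files = out.dat
transfer_output_remaps = "out.dat = results/out.$(Process).dat"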
Setting when_to_transfer_output to ON_EXIT_OR_EVICT will cause HTCondor to transfer the job's output files when the job completes (exits on its own) and when the job is evicted. When the job is evicted, HTCondor will transfer the output files to a temporary directory on the submit node (determined by the SPOOL configuration variable). When the job restarts, these files will be transferred instead of the input files. If transfer_output_files is not set, HTCondor considers all files in the sandbox's top-level directory to be the output; subdirectories and their contents will not be transferred.
Setting when_to_transfer_output to ON_SUCCESS will cause HTCondor to transfer the job's output files when the job completes successfully. Success is defined by the success_exit_code command, which must be set, even if the successful value is the default 0. If transfer_output_files is not set, HTCondor considers all new files in the sandbox's top-level directory to be the output; subdirectories and their contents will not be transferred.
In all three cases, the job will go on hold if transfer_output_files specifies a file which does not exist at transfer time.
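For example, a sketch that transfers output only when the job exits on its own with code 0 (file name illustrative):

when_to_transfer_output = ON_SUCCESS
success_exit_code = 0
transfer_output_files = results.dat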
POLICY COMMANDS
This attribute is intended to help minimize the time wasted by jobs which may erroneously run forever.
This command is intended to help minimize the time wasted by jobs which may erroneously run forever.
The combination of the max_retries, retry_until, and success_exit_code commands causes an appropriate OnExitRemove expression to be automatically generated. If retry command(s) and on_exit_remove are both defined, the OnExitRemove expression will be generated by OR'ing the expression specified in OnExitRemove and the expression generated by the retry commands.
max_retries = 5
retry_until = !member( ExitCode, {17, 34, 81} )
Note: non-zero values of success_exit_code should generally not be used for DAG node jobs, unless when_to_transfer_output is set to ON_SUCCESS in order to avoid failed jobs going on hold.
At the present time, condor_dagman does not take into account the value of success_exit_code. This means that, if success_exit_code is set to a non-zero value, condor_dagman will consider the job failed when it actually succeeds. For single-proc DAG node jobs, this can be overcome by using a POST script that takes into account the value of success_exit_code (although this is not recommended). For multi-proc DAG node jobs, there is currently no way to overcome this limitation.
The process by which the condor_schedd claims a condor_startd is somewhat time-consuming. To amortize this cost, the condor_schedd tries to reuse claims to run subsequent jobs, after a job using a claim is done. However, it can only do this if there is an idle job in the queue at the moment the previous job completes. Sometimes, and especially for the node jobs when using DAGMan, there is a subsequent job about to be submitted, but it has not yet arrived in the queue when the previous job completes. As a result, the condor_schedd releases the claim, and the next job must wait an entire negotiation cycle to start. When this submit command is defined with a non-negative integer, when the job exits, the condor_schedd tries as usual to reuse the claim. If it cannot, instead of releasing the claim, the condor_schedd keeps the claim until either the given number of seconds has elapsed or a new job that matches the claim arrives, whichever comes first. The condor_startd in question will remain in the Claimed/Idle state, and the original job will be "charged" (in terms of priority) for the time in this state.
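A minimal sketch, assuming the command described above is keep_claim_idle (the value is the timeout in seconds):

keep_claim_idle = 300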
As an example, if the job is to be removed once the output is retrieved with condor_transfer_data, then use
leave_in_queue = (JobStatus == 4) && ((StageOutFinish =?= UNDEFINED) || \
                 (StageOutFinish == 0))
This command has been historically used to implement a form of job start throttling from the job submitter's perspective. It was effective for the case of multiple job submission where the transfer of extremely large input data sets to the execute machine caused machine performance to suffer. This command is no longer useful, as throttling should be accomplished through configuration of the condor_schedd daemon.
For example: Suppose a job is known to run for a minimum of an hour. If the job exits after less than an hour, the job should be placed on hold and an e-mail notification sent, instead of being allowed to leave the queue.
on_exit_hold = (time() - JobStartDate) < (60 * $(MINUTE))
This expression places the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became True.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes.
For example, suppose a job occasionally segfaults, but chances are that the job will finish successfully if the job is run again with the same data. The on_exit_remove expression can cause the job to run again with the following command. Assume that the signal identifier for the segmentation fault is 11 on the platform where the job will be running.
on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)
This expression lets the job leave the queue if the job was not killed by a signal or if it was killed by a signal other than 11, representing segmentation fault in this example. So, if the job exited due to signal 11, it will stay in the job queue. In any other case of the job exiting, the job will leave the queue as it normally would have done.
As another example, if the job should only leave the queue if it exited on its own with status 0, this on_exit_remove expression works well:
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
If the job was killed by a signal or exited with a non-zero exit status, HTCondor would leave the job in the queue to run again.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
Only job ClassAd attributes will be defined for use by this ClassAd expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
Only job ClassAd attributes will be defined for use by this ClassAd expression. Note that, by default, this expression is only checked once every 60 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL, MAX_PERIODIC_EXPR_INTERVAL, and PERIODIC_EXPR_TIMESLICE configuration macros.
Only job ClassAd attributes will be defined for use by this ClassAd expression. Note that, by default, this expression is only checked once every 60 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL, MAX_PERIODIC_EXPR_INTERVAL, and PERIODIC_EXPR_TIMESLICE configuration macros.
See the Examples section of this manual page for an example of a periodic_remove expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions. So, the periodic_remove expression takes precedence over the on_exit_remove expression, if the two describe conflicting actions.
Only job ClassAd attributes will be defined for use by this ClassAd expression. Note that, by default, this expression is only checked once every 60 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL, MAX_PERIODIC_EXPR_INTERVAL, and PERIODIC_EXPR_TIMESLICE configuration macros.
Only job ClassAd attributes will be defined for use by this ClassAd expression. Note that, by default, this expression is only checked once every 60 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL, MAX_PERIODIC_EXPR_INTERVAL, and PERIODIC_EXPR_TIMESLICE configuration macros.
COMMANDS FOR THE GRID
For a grid-type-string of batch, the single parameter is the name of the local batch system, and will be one of pbs, lsf, slurm, or sge.
For a grid-type-string of condor, the first parameter is the name of the remote condor_schedd daemon. The second parameter is the name of the pool to which the remote condor_schedd daemon belongs.
For a grid-type-string of ec2, one additional parameter specifies the EC2 URL.
For a grid-type-string of arc, the single parameter is the name of the ARC resource to be used.
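For example (the host and pool names are illustrative):

# forward the job to the local Slurm batch system
grid_resource = batch slurm

# forward the job to a remote HTCondor schedd
grid_resource = condor remote.schedd.example.org remote-pool.example.org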
For transferring files other than stdin, see transfer_input_files.
For transferring files other than stdout, see transfer_output_files.
x509userproxy is relevant when the universe is vanilla, or when the universe is grid and the type of grid system is either condor or arc. Defining a value causes the proxy to be delegated to the execute machine. Further, VOMS attributes defined in the proxy will appear in the job ClassAd.
COMMANDS FOR PARALLEL, JAVA, and SCHEDULER UNIVERSES
remove_kill_sig = SIGUSR1
remove_kill_sig = 10
If this command is not present, the value of kill_sig is used.
COMMANDS FOR THE VM UNIVERSE
An example that specifies two disk files:
vm_disk = /myxen/diskfile.img:sda1:w,/myxen/swap.img:sda2:w
COMMANDS FOR THE DOCKER UNIVERSE
COMMANDS FOR THE CONTAINER UNIVERSE
ADVANCED COMMANDS
The HTCondor User's manual section on Time Scheduling for Job Execution has further details.
The HTCondor User's manual section on Time Scheduling for Job Execution has further details.
The HTCondor User's manual section on Time Scheduling for Job Execution has further details.
Due to implementation details, a deferral time may not be used for scheduler universe jobs.
The HTCondor User's manual section on Time Scheduling for Job Execution has further details.
For vanilla universe jobs where there is a shared file system, it is the current working directory on the machine where the job is executed.
For vanilla or grid universe jobs where file transfer mechanisms are utilized (there is not a shared file system), it is the directory on the machine from which the job is submitted where the input files come from, and where the job's output files go to.
For scheduler universe jobs, it is the directory on the machine from which the job is submitted where the job runs; the current working directory for file input and output with respect to relative path names.
Note that if the path to the executable is a relative path, it is relative not to initialdir but to the directory in which the condor_submit command is run.
LastMatchName0 = "most-recent-Name"
LastMatchName1 = "next-most-recent-Name"
The value for each introduced ClassAd attribute is given by the value of the Name attribute from the machine ClassAd of a previous execution (match). As a job is matched, the definitions for these attributes will roll, with LastMatchName1 becoming LastMatchName2, LastMatchName0 becoming LastMatchName1, and LastMatchName0 being set to the most recent value of the Name attribute.
An intended use of these job attributes is in the requirements expression. The requirements can allow a job to prefer a match with either the same or a different resource than a previous match.
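For example, a sketch of a requirements clause that avoids rematching the most recently used machine (the =!= operator is used so the clause also succeeds before any match has been recorded):

requirements = TARGET.Name =!= LastMatchName0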
Setting this expression does not affect the job's resource requirements or preferences. For a job to only run on a machine with a minimum MachineMaxVacateTime, or to preferentially run on such machines, explicitly specify this in the requirements and/or rank expressions.
This feature is not presently available for Windows.
This feature is not presently available for Windows.
When a resource claim is to be preempted, this expression in the submit file specifies the maximum run time of the job (in seconds, since the job started). This expression has no effect if it is greater than the maximum retirement time provided by the machine policy. If the resource claim is not preempted, this expression and the machine retirement policy are irrelevant. If the resource claim is preempted, the job will be allowed to run until the retirement time expires, at which point it is hard-killed. The job will be soft-killed when it is getting close to the end of retirement, in order to give it time to gracefully shut down. The amount of lead time for soft-killing is determined by the maximum vacating time granted to the job.
Any jobs running with nice_user priority have a default max_job_retirement_time of 0, so no retirement time is utilized by default. In all other cases, no default value is provided, so the maximum amount of retirement time is utilized by default.
Setting this expression does not affect the job's resource requirements or preferences. For a job to only run on a machine with a minimum MaxJobRetirementTime, or to preferentially run on such machines, explicitly specify this in the requirements and/or rank expressions.
Note that setting a job attribute in this way should not be used in place of one of the specific commands listed above. Often, the command name does not directly correspond to an attribute name; furthermore, many submit commands result in actions more complex than simply setting an attribute or attributes. See Job ClassAd Attributes for a list of HTCondor job attributes.
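For example, a custom attribute may be set by prefixing its name with a plus sign (the attribute name here is illustrative):

+ProjectName = "MyProject"

The attribute then appears in the job ClassAd as ProjectName = "MyProject".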
MACROS AND COMMENTS
In addition to commands, the submit description file can contain macros and comments.
<macro_name> = <string>
Several pre-defined macros are supplied by the submit description file parser. The $(Cluster) or $(ClusterId) macro supplies the value of the ClusterId job ClassAd attribute, and the $(Process) or $(ProcId) macro supplies the value of the ProcId job ClassAd attribute. The $(JobId) macro supplies the full job id; it is equivalent to $(ClusterId).$(ProcId). These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with lots of jobs, and/or could be used to supply an HTCondor process with its own cluster and process numbers on the command line.
The $(Node) macro is defined for parallel universe jobs, and is especially relevant for MPI applications. It is a unique value assigned for the duration of the job that essentially identifies the machine (slot) on which a program is executing. Values assigned start at 0 and increase monotonically. The values are assigned as the parallel job is about to start.
Recursive definition of macros is permitted. An example of a construction that works is the following:
foo = bar
foo = snap $(foo)
As a result, foo = snap bar.
Note that both left- and right- recursion works, so
foo = bar
foo = $(foo) snap
has as its result foo = bar snap.
The construction
foo = $(foo) bar
by itself will not work, as it does not have an initial base case. Mutually recursive constructions such as:
B = bar
C = $(B)
B = $(C) boo
will not work, and will fill memory with expansions.
A default value may be specified, for use if the macro has no definition. Consider the example
D = $(E:24)
Where E is not defined within the submit description file, the default value 24 is used, resulting in
D = 24
This is useful for creating submit templates where values can be passed on the condor_submit command line, but that have a default value as well. In the above example, if you give a value for E on the command line like this
condor_submit E=99 <submit-file>
The value of 99 is used for E, resulting in
D = 99
To use the dollar sign character ($) as a literal, without macro expansion, use
$(DOLLAR)
In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows the substitution of a machine ClassAd attribute value defined on the resource machine itself (gotten after a match to the machine has been made) into specific commands within the submit description file. The substitution macro is of the form:
$$(attribute)
As this form of the substitution macro is only evaluated within the context of the machine ClassAd, use of a scope resolution prefix TARGET. or MY. is not allowed.
A common use of this form of the substitution macro is for the heterogeneous submission of an executable:
executable = povray.$$(OpSys).$$(Arch)
Values for the OpSys and Arch attributes are substituted at match time for any given resource. This example allows HTCondor to automatically choose the correct executable for the matched machine.
An extension to the syntax of the substitution macro provides an alternative string to use if the machine attribute within the substitution macro is undefined. The syntax appears as:
$$(attribute:string_if_attribute_undefined)
An example using this extended syntax provides a path name to a required input file. Since the file can be placed in different locations on different machines, the file's path name is given as an argument to the program.
arguments = $$(input_file_path:/usr/foo)
On the machine, if the attribute input_file_path is not defined, then the path /usr/foo is used instead.
As a special case that only works within the submit file environment command, the string $$(CondorScratchDir) is expanded to the value of the job's scratch directory. This does not work for scheduler universe or grid universe jobs.
For example, to set PYTHONPATH to a subdirectory of the job scratch dir, one could set
environment = PYTHONPATH=$$(CondorScratchDir)/some/directory
A further extension to the syntax of the substitution macro allows the evaluation of a ClassAd expression to define the value. In this form, the expression may refer to machine attributes by prefacing them with the TARGET. scope resolution prefix. To place a ClassAd expression into the substitution macro, square brackets are added to delimit the expression. The syntax appears as:
$$([ClassAd expression])
An example of a job that uses this syntax may be one that wants to know how much memory it can use. The application cannot detect this itself, as it would potentially use all of the memory on a multi-slot machine. So the job determines the memory per slot, reducing it by 10% to account for miscellaneous overhead, and passes this as a command line argument to the application. The submit description file will contain
arguments = --memory $$([TARGET.Memory * 0.9])
To insert two dollar sign characters ($$) as literals into a ClassAd string, use
$$(DOLLARDOLLAR)
The environment macro, $ENV, allows the evaluation of an environment variable to be used in setting a submit description file command. The syntax used is
$ENV(variable)
An example submit description file command that uses this functionality evaluates the submitter's home directory in order to set the path and file name of a log file:
log = $ENV(HOME)/jobs/logfile
The environment variable is evaluated when the submit description file is processed.
The $RANDOM_CHOICE macro allows a random choice to be made from a given list of parameters at submission time. For an expression, if some randomness needs to be generated, the macro may appear as
$RANDOM_CHOICE(0,1,2,3,4,5,6)
When evaluated, one of the parameters values will be chosen.
While processing the queue command in a submit file or from the command line, condor_submit will set the values of several automatic submit variables so that they can be referred to by statements in the submit file. With the exception of Cluster and Process, if these variables are set by the submit file, they will not be modified during queue processing.
The automatic variables below are set before parsing the submit file, and will not vary during processing unless the submit file itself sets them.
condor_submit will exit with a status value of 0 (zero) upon success, and a non-zero value upon failure.
####################
#
# submit description file
# Example 1: queuing multiple jobs with differing
# command line arguments and output files.
#
####################

Executable = foo
Universe = vanilla

Arguments = 15 2000
Output = foo.out0
Error = foo.err0
Queue

Arguments = 30 2000
Output = foo.out1
Error = foo.err1
Queue

Arguments = 45 6000
Output = foo.out2
Error = foo.err2
Queue
Or you can get the same results as the above submit file by using a list of arguments with the Queue statement
####################
#
# submit description file
# Example 1b: queuing multiple jobs with differing
# command line arguments and output files, alternate syntax
#
####################

Executable = foo
Universe = vanilla

# generate different output and error filenames for each process
Output = foo.out$(Process)
Error = foo.err$(Process)

Queue Arguments From (
  15 2000
  30 2000
  45 6000
)
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################

Executable = foo
Universe = vanilla
Requirements = OpSys == "LINUX" && Arch == "INTEL"
Rank = Memory >= 64
Request_Memory = 32 Mb
Image_Size = 28 Mb

Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log

Queue 150
####################
#
# Example 3: Run on a RedHat 6 machine
#
####################

Universe = vanilla
Executable = /bin/sleep
Arguments = 30
Requirements = (OpSysAndVer == "RedHat6")

Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = sleep.log

Queue
$ condor_submit -a "log = out.log" -a "error = error.log" mysubmitfile
Note that each of the added commands is contained within quote marks because there are space characters within the command.
A job may be removed from the queue once its cumulative suspension time exceeds half of its actual run time. Including the command
periodic_remove = CumulativeSuspensionTime > ((RemoteWallClockTime - CumulativeSuspensionTime) / 2.0)
in the submit description file causes this to happen.
HTCondor User Manual
HTCondor Team
1990-2024, Center for High Throughput Computing, Computer Sciences Department, University of Wisconsin-Madison, Madison, WI, US. Licensed under the Apache License, Version 2.0.
August 25, 2024