SLURM cheatsheet: https://slurm.schedmd.com/pdfs/summary.pdf
You can submit jobs using a SLURM job script. Below is an example of a simple script:
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=10GB
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
echo 'Hello, world!'
sleep 30
You can set job resources with #SBATCH directives such as --mem; in the example above, --mem=10GB assigns 10 GB of RAM to the job.
To allocate CPU cores, use --cpus-per-task; the example above assigns 4 cores to the job.
The --gres=gpu:1 directive assigns one GPU to your job.
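Inside a running job, Slurm exposes parts of the allocation through environment variables, so a script can adapt to what it was given instead of hard-coding values. The sketch below shows one way to do that. It assumes your program reads OMP_NUM_THREADS, which is an assumption about your workload and not something the example above requires. The fallback values exist only so the script also runs outside a job.

```shell
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=10GB
#SBATCH --cpus-per-task=4

# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given.
# Fall back to 1 so the script still works when run outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Assumption: the workload honours OMP_NUM_THREADS.
export OMP_NUM_THREADS="$threads"

# SLURM_JOB_ID is set inside a job; "none" is printed otherwise.
echo "Job ${SLURM_JOB_ID:-none}: using $threads CPU thread(s)"
```

Reading the core count from the environment keeps the thread setting in step with the --cpus-per-task directive if you change it later.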
To submit the script, run sbatch your_script.sh on any of the SLURM nodes.
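If you submit from another script, it is often useful to keep the job ID for later commands. A minimal sketch, assuming the --parsable option of sbatch, which prints only the job ID (followed by a semicolon and the cluster name on multi-cluster setups). The helper name parse_job_id is hypothetical.

```shell
# Hypothetical helper: strip an optional ";cluster" suffix from the
# output of `sbatch --parsable`, leaving only the job ID.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Only submit when sbatch and the script are actually available.
if command -v sbatch >/dev/null 2>&1 && [ -f your_script.sh ]; then
    job_id=$(parse_job_id "$(sbatch --parsable your_script.sh)")
    echo "Submitted job $job_id"
fi
```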
To view the current job queue, use squeue.
By default, the output is placed in a file named "slurm-", suffixed with the job ID number and ".out" (e.g. slurm-123456.out), in the directory from which the job was submitted. Having the job ID as part of the file name is convenient for troubleshooting.
If your workflow requires it, you can specify a different name or location with the --output directive. A filename specified this way can contain certain replacement symbols, such as the job ID number, the job name, or the job array task ID. See the vendor documentation on sbatch for a complete list of replacement symbols and some examples of their use.
Error output will normally appear in the same file as standard output, just as it would if you were typing commands interactively. If you want to send the standard error channel (stderr) to a separate file, use --error.
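As an illustration of custom output and error files, the sketch below passes --output and --error on the sbatch command line using the %x (job name) and %j (job ID) replacement symbols. The directory name logs and the job name hello are hypothetical. The directory is created first on the assumption that Slurm will not create a missing directory for you.

```shell
# Hypothetical directory name; pick whatever suits your project.
log_dir="logs"

# Create the directory before submitting, since the output file is
# opened before your script starts running.
mkdir -p "$log_dir"

# %x expands to the job name and %j to the job ID, giving names
# such as logs/hello-123456.out and logs/hello-123456.err.
if command -v sbatch >/dev/null 2>&1 && [ -f your_script.sh ]; then
    sbatch --job-name=hello \
           --output="$log_dir/%x-%j.out" \
           --error="$log_dir/%x-%j.err" \
           your_script.sh
fi
```

The same options can be written as #SBATCH lines at the top of the job script instead.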
By default, squeue shows all the jobs the scheduler is managing at the moment. It runs much faster if you ask only about your own jobs:
$ squeue -u $USER
You can show only running jobs, or only pending jobs:
$ squeue -u <username> -t RUNNING
$ squeue -u <username> -t PENDING
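For scripting, the state filter combines well with the -h (no header) and -o (output format) options of squeue. The following sketch lists your pending jobs with the reason each one is waiting, then counts them. The helper count_jobs is hypothetical.

```shell
# Hypothetical helper: count non-empty lines read from stdin.
# grep exits non-zero when nothing matches, so `|| true` keeps
# the pipeline from failing on an empty queue.
count_jobs() {
    grep -c . || true
}

if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line; -o selects the columns:
    # %i job ID, %j job name, %T state, %r reason the job is waiting.
    squeue -u "$USER" -t PENDING -h -o "%i %j %T %r"

    pending=$(squeue -u "$USER" -t PENDING -h -o "%i" | count_jobs)
    echo "You have $pending pending job(s)"
fi
```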
You can show detailed information for a specific job with scontrol:
$ scontrol show job -dd <jobid>
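scontrol prints its details as Key=Value pairs, which makes a single field easy to pull out in a script. The helper below, job_field, is a hypothetical sketch and not part of Slurm. It splits the output on spaces, so it only suits fields whose values contain no spaces, such as JobState.

```shell
# Hypothetical helper: read `scontrol show job` output on stdin and
# print the value of the field named in $1 (first match only).
job_field() {
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Usage sketch, with job_id holding a real job ID:
#   scontrol show job -dd "$job_id" | job_field JobState
```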
*Do not* run squeue from a script or program at high frequency, e.g., every few seconds. Responding to squeue adds load to Slurm and may interfere with its performance or correct operation.
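If a script really does need to wait for a job, poll slowly and ask only about that one job. The sketch below uses a hypothetical helper, wait_for_job, with a default interval of 60 seconds. It treats empty squeue output as "job no longer in the queue", so a transient squeue failure would also end the loop, and you may want stricter checks in practice.

```shell
# Hypothetical helper: block until job $1 no longer appears in squeue,
# checking every $2 seconds (default 60, deliberately infrequent).
wait_for_job() {
    local job_id=$1 interval=${2:-60}
    # -j limits the query to one job; -h drops the header;
    # -o %i prints only the job ID while the job is still listed.
    while [ -n "$(squeue -h -j "$job_id" -o %i 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage sketch: wait_for_job 123456 120
```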
Use scancel with the job ID to cancel a job:
$ scancel <jobid>
You can also use it to cancel all your jobs, or all your pending jobs:
$ scancel -u $USER
$ scancel -t PENDING -u $USER
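Because a bulk cancel cannot be undone, it can help to print what is about to be cancelled first. A minimal sketch, using a hypothetical helper named cancel_pending that lists a user's pending jobs and then cancels them with the same scancel command shown above:

```shell
# Hypothetical helper: show a user's pending jobs, then cancel them.
# $1 is the user name; it defaults to the current user.
cancel_pending() {
    local user=${1:-$USER}
    echo "Pending jobs for $user:"
    # %i job ID, %j job name; -h drops the header line.
    squeue -u "$user" -t PENDING -h -o "%i %j"
    scancel -t PENDING -u "$user"
}

# Usage sketch: cancel_pending
```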