SLURM User Documentation

SLURM batch job system

SLURM cheatsheet: https://slurm.schedmd.com/pdfs/summary.pdf

You can submit jobs using a SLURM job script. Below is an example of a simple script:

#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=10GB
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
# Job commands start here; sbatch ignores any #SBATCH lines that come after the first command
echo 'Hello, world!'
sleep 30

Resources are requested with #SBATCH directives. In the script above, --time=00:15:00 sets a wall-clock limit of 15 minutes, and --mem=10GB requests 10 GB of RAM for the job (the --mem value applies per node).

For CPU core allocation, use --cpus-per-task; the script above requests 4 cores per task. --gres=gpu:1 requests one GPU for the job.
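Inside a running job, Slurm exports details of the allocation as environment variables; for example, SLURM_CPUS_PER_TASK is set when --cpus-per-task is requested. A minimal sketch of passing the allocated core count to a multithreaded program, assuming the program reads OMP_NUM_THREADS as OpenMP programs do (my_threaded_program is a hypothetical placeholder):

```shell
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=10GB
#SBATCH --cpus-per-task=4

# Use one thread per allocated core; fall back to 1 when the script is
# run outside Slurm and SLURM_CPUS_PER_TASK is unset.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# Hypothetical program name; replace with your own executable.
# ./my_threaded_program
```

Because #SBATCH lines are ordinary comments to bash, the same script can also be run directly with bash for a quick test outside the scheduler.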

Running the script

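To submit the script, run sbatch your_script.sh on any of the SLURM nodes. sbatch prints the ID of the new job (for example, "Submitted batch job 123456") and returns immediately; the job then waits in the queue until the requested resources become available.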
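When submitting from another script, the --parsable option makes sbatch print only the job ID (followed by ";cluster" on multi-cluster setups) instead of the full message, which is easier to capture. A minimal sketch; the function name submit_job is hypothetical:

```shell
# Submit a job script and print only its job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the part before ";".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    echo "${out%%;*}"
}
```

For example, jobid=$(submit_job your_script.sh) stores the ID so it can later be passed to squeue, scontrol or scancel.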

Queues

To see the jobs currently in the queue, use squeue.

Where does the output go?

By default, output is written to a file named slurm-<jobid>.out (e.g. slurm-123456.out) in the directory from which the job was submitted. Having the job ID as part of the file name is convenient for troubleshooting.
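Since the default name depends only on the job ID, the log of a job can be located from its ID alone; for job arrays the default pattern is slurm-%A_%a.out (array job ID and task index) instead. A small illustrative helper; the name default_output_file is hypothetical:

```shell
# Print the default output file name Slurm uses for a job.
#   default_output_file JOBID            -> slurm-JOBID.out
#   default_output_file ARRAY_ID TASK_ID -> slurm-ARRAY_ID_TASK_ID.out
default_output_file() {
    if [ $# -eq 2 ]; then
        echo "slurm-${1}_${2}.out"
    else
        echo "slurm-${1}.out"
    fi
}
```

For example, tail -f "$(default_output_file 123456)" follows the output of job 123456 while it runs.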

If your workflow requires a different file name or location, specify it with the --output directive. Certain replacement symbols can be used in a file name specified this way, such as %j (the job ID), %x (the job name), or %a (the job array task ID). See the vendor documentation on sbatch for a complete list of replacement symbols and some examples of their use.

Error output will normally appear in the same file as standard output, just as it would if you were typing commands interactively. If you want to send the standard error channel (stderr) to a separate file, use --error.
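A sketch of a job script header that names both files after the job, assuming an illustrative job name of hello; only the directives are shown:

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:05:00
# %x is replaced by the job name and %j by the job ID, so job 123456
# writes standard output to hello-123456.out ...
#SBATCH --output=%x-%j.out
# ... and standard error to hello-123456.err instead of the .out file
#SBATCH --error=%x-%j.err

# job commands go here
```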

Monitoring jobs

Current jobs

By default squeue will show all the jobs the scheduler is managing at the moment. It will run much faster if you ask only about your own jobs with

$ squeue -u $USER

You can show only running jobs, or only pending jobs:

$ squeue -u <username> -t RUNNING
$ squeue -u <username> -t PENDING
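The -h (--noheader) and -o (--format) options make squeue output easy to process in scripts. A minimal sketch that counts your jobs in a given state; the function name count_my_jobs is hypothetical:

```shell
# Count your jobs in a given state (default: PENDING).
# -h drops the header line and -o %i prints only the job ID,
# so each output line corresponds to one job.
count_my_jobs() {
    local state="${1:-PENDING}"
    squeue -u "$USER" -t "$state" -h -o %i | wc -l | tr -d ' '
}
```

For example, count_my_jobs RUNNING prints the number of your running jobs.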

You can show detailed information for a specific job with scontrol:

$ scontrol show job -dd <jobid>
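scontrol prints job details as space-separated Key=Value fields, so a single field such as JobState can be extracted with standard text tools. A minimal sketch; the function name job_state is hypothetical, and it prints nothing once Slurm no longer holds a record of the job:

```shell
# Print the current state (e.g. PENDING, RUNNING, COMPLETED) of one job.
# Picks the first JobState=... field from the scontrol output.
job_state() {
    scontrol show job "$1" 2>/dev/null | grep -o 'JobState=[A-Z_]*' | head -n 1 | cut -d= -f2
}
```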

*Do not* run squeue from a script or program at high frequency, e.g., every few seconds. Responding to squeue adds load to Slurm, and may interfere with its performance or correct operation.
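If a script genuinely has to wait for a job to finish, query only that job and check at a modest interval. A minimal sketch; the function name, the list of "active" states and the 60-second default are assumptions, not site policy:

```shell
# Block until a job has left the active states, checking once per interval.
# Usage: wait_for_job JOBID [SECONDS]   (default: 60 seconds between checks)
wait_for_job() {
    local jobid="$1" interval="${2:-60}"
    local active="PENDING,RUNNING,SUSPENDED,COMPLETING,CONFIGURING"
    # squeue prints the job ID while the job is in one of the active states;
    # once it has finished (or is unknown to Slurm) the output is empty.
    while [ -n "$(squeue -h -j "$jobid" -t "$active" -o %i 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

Alternatively, sbatch --wait does not return until the submitted job terminates, which avoids polling from your own script altogether.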

Cancelling jobs

Use scancel with the job ID to cancel a job:

$ scancel <jobid>

You can also use it to cancel all your jobs, or all your pending jobs:

$ scancel -u $USER

$ scancel -t PENDING -u $USER
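scancel can also select jobs by name with --name, which is convenient when a set of related jobs shares one --job-name. A minimal sketch; the function name cancel_by_name is hypothetical:

```shell
# Cancel your jobs that have a given job name, optionally only in one state.
# Usage: cancel_by_name NAME [STATE]   e.g. cancel_by_name hello PENDING
cancel_by_name() {
    local name="$1" state="$2"
    if [ -n "$state" ]; then
        scancel -u "$USER" --name="$name" -t "$state"
    else
        scancel -u "$USER" --name="$name"
    fi
}
```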

Topic revision: r1 - 2023-06-08 - HarshRoghelia
 