Slurm Submission Files (WIP)

Verify that modules are loaded properly

These scripts perform an efficient, minimal setup for running jobs with various computational codes. Each script:

  1. Loads the required modules.
  2. Checks that the required modules are loaded correctly.
  3. Activates a virtual environment with ASE installed (assumed to be located at ~/software/python/virtualenvs/ase/; see the sketch below for one way to create it).
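
If the environment does not exist yet, a minimal sketch for creating it (assuming the standard python/scipy-stack modules and the directory layout above):

module load python/3.11.5 scipy-stack
mkdir -p ~/software/python/virtualenvs
python -m venv ~/software/python/virtualenvs/ase
source ~/software/python/virtualenvs/ase/bin/activate
pip install --upgrade pip
pip install ase
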
samples/slurm/activateASEVasp.sh
#!/bin/bash

module purge
module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5
module load vasp/6.4.2
module load python/3.11.5 scipy-stack

# Note: `module list` prints to stderr, so redirect it before grepping
if [[ $(module list 2>&1 | grep 'intel/2023.2.1') == "" || $(module list 2>&1 | grep 'python/3.11.5') == "" || $(module list 2>&1 | grep 'vasp/6.4.2') == "" ]]; then
    echo "Your modules are not loaded correctly. Cancelling job... "
    exit 1
else
    echo "Your modules are loaded correctly. Proceeding to activate ASE..."
fi

echo "Changing directory to ~/software/python/virtualenvs/ase ..."
cd ~/software/python/virtualenvs/ase || exit

function load_ase() {
    source ~/software/python/virtualenvs/ase/bin/activate
}

if [[ $(pwd) == */software/python/virtualenvs/ase ]]; then
    pwd
    echo "You are in the right location! Activating ase..."
    load_ase
else
    echo "Please ensure you have the correct directory struture (~/software/python/virtualenvs/ase)..."
    echo "Exiting"
    exit 1
fi
samples/slurm/activateASEGaussian.sh
#!/bin/bash

# Gaussian does not have any dependencies :angel:
module purge
module load gaussian/g16.c01
module load python/3.11.5 scipy-stack

# Note: `module list` prints to stderr, so redirect it before grepping
if [[ $(module list 2>&1 | grep 'gaussian/g16.c01') == "" || $(module list 2>&1 | grep 'python/3.11.5') == "" ]]; then
    echo "Your modules are not loaded correctly for Gaussian. Cancelling job... "
    exit 1
else
    echo "Your modules are loaded correctly for Gaussian. Proceeding to activate ASE..."
fi

echo "Changing directory to ~/software/python/virtualenvs/ase ..."
cd ~/software/python/virtualenvs/ase || exit

function load_ase() {
    source ~/software/python/virtualenvs/ase/bin/activate
}

if [[ $(pwd) == */software/python/virtualenvs/ase ]]; then
    pwd
    echo "You are in the right location! Activating ase..."
    load_ase
else
    echo "Please ensure you have the correct directory struture (~/software/python/virtualenvs/ase)..."
    echo "Exiting"
    exit 1
fi
samples/slurm/activateASEORCA.sh
#!/bin/bash

module purge
module load StdEnv/2020 gcc/10.3.0 openmpi/4.1.1
module load orca/5.0.4
module load python/3.11.5 scipy-stack

# Note: `module list` prints to stderr, so redirect it before grepping
if [[ $(module list 2>&1 | grep 'gcc/10.3.0') == "" || $(module list 2>&1 | grep 'python/3.11.5') == "" || $(module list 2>&1 | grep 'orca/5.0.4') == "" ]]; then
    echo "Your modules are not loaded correctly for ORCA. Cancelling job... "
    exit 1
else
    echo "Your modules are loaded correctly for ORCA. Proceeding to activate ASE..."
    export PATH="${EBROOTORCA}/:$PATH"
fi

echo "Changing directory to ~/software/python/virtualenvs/ase ..."
cd ~/software/python/virtualenvs/ase || exit

function load_ase() {
    source ~/software/python/virtualenvs/ase/bin/activate
}

if [[ $(pwd) == */software/python/virtualenvs/ase ]]; then
    pwd
    echo "You are in the right location! Activating ase..."
    load_ase
else
    echo "Please ensure you have the correct directory struture (~/software/python/virtualenvs/ase)..."
    echo "Exiting"
    exit 1
fi

Reminder

Don't forget to add logic to actually run your job at the end of these scripts (e.g., python3 run.py).
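
For example, a minimal ending (a sketch; it assumes the script runs under sbatch so that SLURM_SUBMIT_DIR is set, and run.py is a placeholder for your own ASE driver):

cd "$SLURM_SUBMIT_DIR" || exit   # return to the submission directory
python3 run.py                   # replace run.py with your own calculation script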

Submit a DFT calculation

These sample scripts are more elaborate than those in the previous section. Each script performs the following steps:

  1. Sources your bash profile and loads the required modules for the computational code.
  2. Creates a scratch directory dedicated to the job, uniquely identified by the SLURM job ID, and creates a symlink to it for convenience. (This is especially useful if the job terminates unexpectedly during execution.)
  3. Copies input files to the scratch directory.
  4. Initiates the calculation by running a Python script (presumably run.py).
  5. Stops the job at 90% of the maximum run time so that enough time remains to copy files from the scratch directory back to the submission directory.
  6. Cleans up the scratch directory.
  7. Logs the completion of the job in ~/job.log in your home directory.

Additionally, each script prints debugging information that may be useful for diagnosing problem jobs (e.g., resource limits, the job ID, and the node list).
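
Once the placeholders are filled in, a script is submitted from the directory containing the input files, for example:

cd /path/to/your/calculation   # directory containing run.py and the input files
sbatch vasp.sh                 # or espresso.sh, gaussian.sh, orca.sh
squeue -u "$USER"              # check the job's status in the queue
seff JOB_ID                    # resource-usage summary after the job finishes (if seff is installed)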

samples/slurm/vasp.sh
#!/bin/bash
#SBATCH --account=def-samiras
#SBATCH --job-name=JOB_NAME
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=23:00:00
#SBATCH --mail-user=SFU_ID@sfu.ca
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_90

echo " "
echo "### Setting up shell environment ..."
echo " "

if test -e "/etc/profile"; then
  source "/etc/profile"
fi

if test -e "$HOME/.bash_profile"; then
  source "$HOME/.bash_profile"
fi

unset LANG

module purge
module load vasp
module load python/3.11.9

# Replace "$COMP_CHEM_ENV" with the path to your Python virtual environment
source "$COMP_CHEM_ENV"

export LC_ALL="C"
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1

ulimit -s unlimited

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME               = $(date '+%y-%m-%d %H:%M:%S %s')"
echo "HOSTNAME                 = ${HOSTNAME}"
echo "USER                     = ${USER}"
echo "SLURM_JOB_NAME           = ${SLURM_JOB_NAME}"
echo "SLURM_JOB_ID             = ${SLURM_JOB_ID}"
echo "SLURM_SUBMIT_DIR         = ${SLURM_SUBMIT_DIR}"
echo "SLURM_JOB_NUM_NODES      = ${SLURM_JOB_NUM_NODES}"
echo "SLURM_NTASKS             = ${SLURM_NTASKS}"
echo "SLURM_NODELIST           = ${SLURM_NODELIST}"
echo "SLURM_JOB_NODELIST       = ${SLURM_JOB_NODELIST}"

if test -f "${SLURM_JOB_NODELIST}"; then
  echo "SLURM_JOB_NODELIST (begin) ----------"
  cat "${SLURM_JOB_NODELIST}"
  echo "SLURM_JOB_NODELIST (end) ------------"
fi

echo "--------------- ulimit -a -S ---------------"
ulimit -a -S
echo "--------------- ulimit -a -H ---------------"
ulimit -a -H
echo "----------------------------------------------"

echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "

if test -e "$HOME/scratch"; then
  TMP_WORK_DIR="$HOME/scratch/${SLURM_JOB_ID}"
elif test -e /scratch/"${SLURM_JOB_ID}"; then
  TMP_WORK_DIR=/scratch/${SLURM_JOB_ID}
else
  TMP_WORK_DIR="$(pwd)"
fi

TMP_BASE_DIR="$(dirname "$TMP_WORK_DIR")"
JOB_WORK_DIR="$(basename "$TMP_WORK_DIR")"

echo "TMP_WORK_DIR         = ${TMP_WORK_DIR}"
echo "TMP_BASE_DIR         = ${TMP_BASE_DIR}"
echo "JOB_WORK_DIR         = ${JOB_WORK_DIR}"

# Creating a symbolic link to temporary directory holding work files while job running

if ! test -e "${TMP_WORK_DIR}"; then
  mkdir "${TMP_WORK_DIR}"
fi
ln -s "${TMP_WORK_DIR}" scratch_dir
cd "${TMP_WORK_DIR}" || exit

echo " "
echo "### Copying input files for job (if required):"
echo " "

script_name="${BASH_SOURCE[0]}"
AUTOJOB_SLURM_SCRIPT="$(basename "$script_name")"
export AUTOJOB_SLURM_SCRIPT
export AUTOJOB_PYTHON_SCRIPT="{{ python_script }}"
export AUTOJOB_COPY_TO_SCRATCH="CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR"
cp -v "$SLURM_SUBMIT_DIR"/{CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR} "$TMP_WORK_DIR"/

echo " "

# Preemptively end job if getting close to time limit
timeline=$(grep -E -m 1 '^#SBATCH[[:space:]]*--time=' "$script_name")
timeslurm=${timeline##*=}
IFS=- read -ra day_split_time <<< "$timeslurm"
no_days_time=${day_split_time[1]}
days=${no_days_time:+${day_split_time[0]}}
no_days_time=${day_split_time[1]:-${day_split_time[0]}}
IFS=: read -ra split_time <<< "$no_days_time"

# Time formats with days: D-H, D-H:M, D-H:M:S
if [[ $days ]]; then
  slurm_days="$days"
  slurm_hours=${split_time[0]}
  slurm_minutes=${split_time[1]:-0}
  slurm_seconds=${split_time[2]:-0}
# Time format without days: M, M:S, H:M:S
else
  slurm_days=0
  if [[ ${#split_time[*]} == 3 ]]; then
    slurm_hours=${split_time[0]}
    slurm_minutes=${split_time[1]}
    slurm_seconds=${split_time[2]}
  else
    slurm_hours=0
    slurm_minutes=${split_time[0]}
    slurm_seconds=${split_time[1]:-0}
  fi
fi

echo "Running for $(echo "$slurm_days*1" |bc)d $(echo "$slurm_hours*1" |bc)h $(echo "$slurm_minutes*1" |bc)m and $(echo "$slurm_seconds*1" |bc)s."
timeslurm=$(echo "$slurm_days*86400 + $slurm_hours*3600 + $slurm_minutes*60 + $slurm_seconds" | bc)
echo "This means $timeslurm seconds."
timeslurm=$(echo "$timeslurm *0.9" |bc)
echo "Will terminate at ${timeslurm}s to copy back necessary files from scratch"

echo ""
echo ""

# run ase calculation and time
timeout "${timeslurm}" python3 "$AUTOJOB_PYTHON_SCRIPT"

exit_code=$?

if [ "$exit_code" -eq 124 ]; then
  echo " "
  echo "Cancelled due to time limit."
else
  echo " "
  echo "Time limit not reached."
fi

echo " "
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "

AUTOJOB_FILES_TO_DELETE="*.d2e *.int *.rwf *.skr *.inp EIGENVAL IBZKPT PCDAT PROCAR ELFCAR LOCPOT PROOUT TMPCAR vasp.dipcor"
rm -vf "$AUTOJOB_FILES_TO_DELETE"
sleep 10 # Sleep some time so potential stale nfs handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}" || exit
mkdir -vp "${SLURM_SUBMIT_DIR}" # if user has deleted or moved the submit dir

echo " "
echo "Creating result tgz-file '${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz' ..."
echo " "

tar -zcvf "${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR $TMP_WORK_DIR on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }

echo " "
echo "### Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"

echo " "
echo "Extracting result tgz-file"
echo " "

cd "${SLURM_SUBMIT_DIR}" || exit
tar -xzf "${JOB_WORK_DIR}".tgz
mv "${JOB_WORK_DIR}"/* .
rm -r "${JOB_WORK_DIR}".tgz "${JOB_WORK_DIR}"

rm "${SLURM_SUBMIT_DIR}/scratch_dir"
echo "END_TIME             = $(date +'%y-%m-%d %H:%M:%S %s')"

# Record job in log file

echo "${SLURM_JOB_ID}-${SLURM_JOB_NAME}" is complete: on "$(date +'%y.%m.%d %H:%M:%S')" "${SLURM_SUBMIT_DIR}" >> ~/job.log

echo " "
echo "### Exiting with exit code ${exit_code}..."
echo " "
exit "$exit_code"
samples/slurm/espresso.sh
#!/bin/bash
#SBATCH --account=def-samiras
#SBATCH --job-name=JOB_NAME
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=23:00:00
#SBATCH --mail-user=SFU_ID@sfu.ca
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_90

echo " "
echo "### Setting up shell environment ..."
echo " "

if test -e "/etc/profile"; then
  source "/etc/profile"
fi

if test -e "$HOME/.bash_profile"; then
  source "$HOME/.bash_profile"
fi

unset LANG

module --force purge
module load gentoo/2020 python/3.11.9 espresso

# Replace "$COMP_CHEM_ENV" with the path to your Python virtual environment
source "$COMP_CHEM_ENV"

export LC_ALL="C"
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1

ulimit -s unlimited

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME               = $(date '+%y-%m-%d %H:%M:%S %s')"
echo "HOSTNAME                 = ${HOSTNAME}"
echo "USER                     = ${USER}"
echo "SLURM_JOB_NAME           = ${SLURM_JOB_NAME}"
echo "SLURM_JOB_ID             = ${SLURM_JOB_ID}"
echo "SLURM_SUBMIT_DIR         = ${SLURM_SUBMIT_DIR}"
echo "SLURM_JOB_NUM_NODES      = ${SLURM_JOB_NUM_NODES}"
echo "SLURM_NTASKS             = ${SLURM_NTASKS}"
echo "SLURM_NODELIST           = ${SLURM_NODELIST}"
echo "SLURM_JOB_NODELIST       = ${SLURM_JOB_NODELIST}"

if test -f "${SLURM_JOB_NODELIST}"; then
  echo "SLURM_JOB_NODELIST (begin) ----------"
  cat "${SLURM_JOB_NODELIST}"
  echo "SLURM_JOB_NODELIST (end) ------------"
fi

echo "--------------- ulimit -a -S ---------------"
ulimit -a -S
echo "--------------- ulimit -a -H ---------------"
ulimit -a -H
echo "----------------------------------------------"

echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "

if test -e "$HOME/scratch"; then
  TMP_WORK_DIR="$HOME/scratch/${SLURM_JOB_ID}"
elif test -e /scratch/"${SLURM_JOB_ID}"; then
  TMP_WORK_DIR=/scratch/${SLURM_JOB_ID}
else
  TMP_WORK_DIR="$(pwd)"
fi

TMP_BASE_DIR="$(dirname "$TMP_WORK_DIR")"
JOB_WORK_DIR="$(basename "$TMP_WORK_DIR")"

echo "TMP_WORK_DIR         = ${TMP_WORK_DIR}"
echo "TMP_BASE_DIR         = ${TMP_BASE_DIR}"
echo "JOB_WORK_DIR         = ${JOB_WORK_DIR}"

# Creating a symbolic link to temporary directory holding work files while job running

if ! test -e "${TMP_WORK_DIR}"; then
  mkdir "${TMP_WORK_DIR}"
fi
ln -s "${TMP_WORK_DIR}" scratch_dir
cd "${TMP_WORK_DIR}" || exit

echo " "
echo "### Copying input files for job (if required):"
echo " "

script_name="${BASH_SOURCE[0]}"
AUTOJOB_SLURM_SCRIPT="$(basename "$script_name")"
export AUTOJOB_SLURM_SCRIPT
export AUTOJOB_PYTHON_SCRIPT="{{ python_script }}"
export AUTOJOB_COPY_TO_SCRATCH="CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR"
cp -v "$SLURM_SUBMIT_DIR"/{CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR} "$TMP_WORK_DIR"/

echo " "

# Preemptively end job if getting close to time limit
timeline=$(grep -E -m 1 '^#SBATCH[[:space:]]*--time=' "$script_name")
timeslurm=${timeline##*=}
IFS=- read -ra day_split_time <<< "$timeslurm"
no_days_time=${day_split_time[1]}
days=${no_days_time:+${day_split_time[0]}}
no_days_time=${day_split_time[1]:-${day_split_time[0]}}
IFS=: read -ra split_time <<< "$no_days_time"

# Time formats with days: D-H, D-H:M, D-H:M:S
if [[ $days ]]; then
  slurm_days="$days"
  slurm_hours=${split_time[0]}
  slurm_minutes=${split_time[1]:-0}
  slurm_seconds=${split_time[2]:-0}
# Time format without days: M, M:S, H:M:S
else
  slurm_days=0
  if [[ ${#split_time[*]} == 3 ]]; then
    slurm_hours=${split_time[0]}
    slurm_minutes=${split_time[1]}
    slurm_seconds=${split_time[2]}
  else
    slurm_hours=0
    slurm_minutes=${split_time[0]}
    slurm_seconds=${split_time[1]:-0}
  fi
fi

echo "Running for $(echo "$slurm_days*1" |bc)d $(echo "$slurm_hours*1" |bc)h $(echo "$slurm_minutes*1" |bc)m and $(echo "$slurm_seconds*1" |bc)s."
timeslurm=$(echo "$slurm_days*86400 + $slurm_hours*3600 + $slurm_minutes*60 + $slurm_seconds" | bc)
echo "This means $timeslurm seconds."
timeslurm=$(echo "$timeslurm *0.9" |bc)
echo "Will terminate at ${timeslurm}s to copy back necessary files from scratch"

echo ""
echo ""

# run ase calculation and time
timeout "${timeslurm}" python3 "$AUTOJOB_PYTHON_SCRIPT"

exit_code=$?

if [ "$exit_code" -eq 124 ]; then
  echo " "
  echo "Cancelled due to time limit."
else
  echo " "
  echo "Time limit not reached."
fi

echo " "
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "

AUTOJOB_FILES_TO_DELETE="*.mix* *.wfc*"
rm -vf "$AUTOJOB_FILES_TO_DELETE"
sleep 10 # Sleep some time so potential stale nfs handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}" || exit
mkdir -vp "${SLURM_SUBMIT_DIR}" # if user has deleted or moved the submit dir

echo " "
echo "Creating result tgz-file '${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz' ..."
echo " "

tar -zcvf "${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR $TMP_WORK_DIR on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }

echo " "
echo "### Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"

echo " "
echo "Extracting result tgz-file"
echo " "

cd "${SLURM_SUBMIT_DIR}" || exit
tar -xzf "${JOB_WORK_DIR}".tgz
mv "${JOB_WORK_DIR}"/* .
rm -r "${JOB_WORK_DIR}".tgz "${JOB_WORK_DIR}"

rm "${SLURM_SUBMIT_DIR}/scratch_dir"
echo "END_TIME             = $(date +'%y-%m-%d %H:%M:%S %s')"

# Record job in log file

echo "${SLURM_JOB_ID}-${SLURM_JOB_NAME}" is complete: on "$(date +'%y.%m.%d %H:%M:%S')" "${SLURM_SUBMIT_DIR}" >> ~/job.log

echo " "
echo "### Exiting with exit code ${exit_code}..."
echo " "
exit "$exit_code"

Reminder

This script assumes that you are using a self-compiled version of Quantum Espresso and have created a corresponding module named espresso. See this tutorial for how to compile Quantum Espresso and create the necessary modulefile.
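
If the espresso modulefile lives in a personal directory, it can be made visible to module load with module use; a sketch, assuming the modulefile was placed under ~/modulefiles/espresso/:

module use ~/modulefiles           # e.g., add this line to your .bashrc
module load espresso
module list 2>&1 | grep espresso   # confirm the module is loaded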

samples/slurm/gaussian.sh
#!/bin/bash
#SBATCH --account=def-samiras
#SBATCH --job-name=JOB_NAME
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=23:00:00
#SBATCH --mail-user=SFU_ID@sfu.ca
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_90

echo " "
echo "### Setting up shell environment ..."
echo " "

if test -e "/etc/profile"; then
  source "/etc/profile"
fi

if test -e "$HOME/.bash_profile"; then
  source "$HOME/.bash_profile"
fi

unset LANG

module purge
module load gaussian/g16.c01
module load python/3.11.9

# Replace "$COMP_CHEM_ENV" with the path to your Python virtual environment
source "$COMP_CHEM_ENV"

export LC_ALL="C"
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1

ulimit -s unlimited

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME               = $(date '+%y-%m-%d %H:%M:%S %s')"
echo "HOSTNAME                 = ${HOSTNAME}"
echo "USER                     = ${USER}"
echo "SLURM_JOB_NAME           = ${SLURM_JOB_NAME}"
echo "SLURM_JOB_ID             = ${SLURM_JOB_ID}"
echo "SLURM_SUBMIT_DIR         = ${SLURM_SUBMIT_DIR}"
echo "SLURM_JOB_NUM_NODES      = ${SLURM_JOB_NUM_NODES}"
echo "SLURM_NTASKS             = ${SLURM_NTASKS}"
echo "SLURM_NODELIST           = ${SLURM_NODELIST}"
echo "SLURM_JOB_NODELIST       = ${SLURM_JOB_NODELIST}"

if test -f "${SLURM_JOB_NODELIST}"; then
  echo "SLURM_JOB_NODELIST (begin) ----------"
  cat "${SLURM_JOB_NODELIST}"
  echo "SLURM_JOB_NODELIST (end) ------------"
fi

echo "--------------- ulimit -a -S ---------------"
ulimit -a -S
echo "--------------- ulimit -a -H ---------------"
ulimit -a -H
echo "----------------------------------------------"

echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "

if test -e "$HOME/scratch"; then
  TMP_WORK_DIR="$HOME/scratch/${SLURM_JOB_ID}"
elif test -e /scratch/"${SLURM_JOB_ID}"; then
  TMP_WORK_DIR=/scratch/${SLURM_JOB_ID}
else
  TMP_WORK_DIR="$(pwd)"
fi

# Pass memory request, cpu list, and scratch directory to Gaussian
export GAUSS_MDEF="${SLURM_MEM_PER_NODE}MB"
GAUSS_CDEF=$(taskset -cp $$ | awk -F':' '{print $2}')
export GAUSS_CDEF
export GAUSS_SCRDIR=${TMP_WORK_DIR}

TMP_BASE_DIR="$(dirname "$TMP_WORK_DIR")"
JOB_WORK_DIR="$(basename "$TMP_WORK_DIR")"

echo "TMP_WORK_DIR         = ${TMP_WORK_DIR}"
echo "TMP_BASE_DIR         = ${TMP_BASE_DIR}"
echo "JOB_WORK_DIR         = ${JOB_WORK_DIR}"

# Creating a symbolic link to temporary directory holding work files while job running

if ! test -e "${TMP_WORK_DIR}"; then
  mkdir "${TMP_WORK_DIR}"
fi
ln -s "${TMP_WORK_DIR}" scratch_dir
cd "${TMP_WORK_DIR}" || exit

echo " "
echo "### Copying input files for job (if required):"
echo " "

script_name="${BASH_SOURCE[0]}"
AUTOJOB_SLURM_SCRIPT="$(basename "$script_name")"
export AUTOJOB_SLURM_SCRIPT
export AUTOJOB_PYTHON_SCRIPT="run.py"
export AUTOJOB_COPY_TO_SCRATCH="*.chk,*.py,*.traj,*.rwf"
cp -v "$SLURM_SUBMIT_DIR"/{*.chk,*.py,*.traj,*.rwf} "$TMP_WORK_DIR"/

echo " "

# Preemptively end job if getting close to time limit
timeline=$(grep -E -m 1 '^#SBATCH[[:space:]]*--time=' "$script_name")
timeslurm=${timeline##*=}
IFS=- read -ra day_split_time <<< "$timeslurm"
no_days_time=${day_split_time[1]}
days=${no_days_time:+${day_split_time[0]}}
no_days_time=${day_split_time[1]:-${day_split_time[0]}}
IFS=: read -ra split_time <<< "$no_days_time"

# Time formats with days: D-H, D-H:M, D-H:M:S
if [[ $days ]]; then
  slurm_days="$days"
  slurm_hours=${split_time[0]}
  slurm_minutes=${split_time[1]:-0}
  slurm_seconds=${split_time[2]:-0}
# Time format without days: M, M:S, H:M:S
else
  slurm_days=0
  if [[ ${#split_time[*]} == 3 ]]; then
    slurm_hours=${split_time[0]}
    slurm_minutes=${split_time[1]}
    slurm_seconds=${split_time[2]}
  else
    slurm_hours=0
    slurm_minutes=${split_time[0]}
    slurm_seconds=${split_time[1]:-0}
  fi
fi

echo "Running for $(echo "$slurm_days*1" |bc)d $(echo "$slurm_hours*1" |bc)h $(echo "$slurm_minutes*1" |bc)m and $(echo "$slurm_seconds*1" |bc)s."
timeslurm=$(echo "$slurm_days*86400 + $slurm_hours*3600 + $slurm_minutes*60 + $slurm_seconds" | bc)
echo "This means $timeslurm seconds."
timeslurm=$(echo "$timeslurm *0.9" |bc)
echo "Will terminate at ${timeslurm}s to copy back necessary files from scratch"

echo ""
echo ""

# run ase calculation and time
timeout "${timeslurm}" python3 "$AUTOJOB_PYTHON_SCRIPT"

exit_code=$?

if [ "$exit_code" -eq 124 ]; then
  echo " "
  echo "Cancelled due to time limit."
else
  echo " "
  echo "Time limit not reached."
fi

echo " "
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "

AUTOJOB_FILES_TO_DELETE="*.d2e *.int *.rwf *.skr *.inp"
rm -vf "$AUTOJOB_FILES_TO_DELETE"
sleep 10 # Sleep some time so potential stale nfs handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}" || exit
mkdir -vp "${SLURM_SUBMIT_DIR}" # if user has deleted or moved the submit dir

echo " "
echo "Creating result tgz-file '${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz' ..."
echo " "

tar -zcvf "${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR $TMP_WORK_DIR on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }

echo " "
echo "### Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"

echo " "
echo "Extracting result tgz-file"
echo " "

cd "${SLURM_SUBMIT_DIR}" || exit
tar -xzf "${JOB_WORK_DIR}".tgz
mv "${JOB_WORK_DIR}"/* .
rm -r "${JOB_WORK_DIR}".tgz "${JOB_WORK_DIR}"

rm "${SLURM_SUBMIT_DIR}/scratch_dir"
echo "END_TIME             = $(date +'%y-%m-%d %H:%M:%S %s')"

# Record job in log file

echo "${SLURM_JOB_ID}-${SLURM_JOB_NAME}" is complete: on "$(date +'%y.%m.%d %H:%M:%S')" "${SLURM_SUBMIT_DIR}" >> ~/job.log

echo " "
echo "### Exiting with exit code ${exit_code}..."
echo " "
exit "$exit_code"
samples/slurm/orca.sh

Tip

Edit the brace expansion in the cp command that copies input files to the scratch directory (and the matching AUTOJOB_COPY_TO_SCRATCH variable) to change which files are staged for the job.
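
For example, to also stage hypothetical INCAR and KPOINTS files in vasp.sh, both lines would be edited in the same way:

export AUTOJOB_COPY_TO_SCRATCH="CHGCAR,*py,*cif,POSCAR,INCAR,KPOINTS,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR"
cp -v "$SLURM_SUBMIT_DIR"/{CHGCAR,*py,*cif,POSCAR,INCAR,KPOINTS,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR} "$TMP_WORK_DIR"/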

Reminder

Don't forget to replace JOB_NAME, SFU_ID, and PYTHON_SCRIPT with appropriate values in addition to setting your desired SLURM parameters. Also, if you don't define the path to a Python virtual environment in your .bashrc file, then you should replace $COMP_CHEM_ENV with the path to the activate script (usually, path-to-environment/bin/activate).
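
For instance, COMP_CHEM_ENV can be defined once in ~/.bashrc (the path below assumes the ASE environment from the first section):

export COMP_CHEM_ENV="$HOME/software/python/virtualenvs/ase/bin/activate"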

Launch an interactive job

The following script initiates an interactive session on the cluster to avoid using the login node.

samples/slurm/InteractiveJob.sh
#!/bin/bash
# This script simply initiates an interactive session on the cluster to avoid using the login node.
# The session has 4 cores, 12 GB memory, X11 forwarding, and a 1 hour time limit.
(cd ~/scratch/ || exit; salloc --x11 --time=01:00:00 --mem-per-cpu=3G --ntasks=4 --account="$SALLOC_ACCOUNT")
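
Once the allocation is granted, you are dropped into a shell on a compute node, where you can load modules and run a quick sanity check by hand. A sketch (the module versions are only examples, and SALLOC_ACCOUNT is assumed to be set in your environment):

bash samples/slurm/InteractiveJob.sh            # run from the repository root
# Inside the interactive session:
module load StdEnv/2023 python/3.11.5 scipy-stack
source ~/software/python/virtualenvs/ase/bin/activate
python3 -c "import ase; print(ase.__version__)"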