Submitting Serial Jobs Using a Script

Jobs are submitted to the queuing system with the command qsub, which submits a shell script to the queue. Note that qsub can only submit scripts, not binary (executable) files. A shell script is a file containing a command, or a list of commands, exactly as you would type them on the command line to run the job interactively. If the shell script is called filename, submit it with the command:

      qsub [options] filename

Below is a very simple script called krun01 to run the program KSPACE. The executable is in the subdirectory KSPACE/GNU/bin of the user's home directory. The input files are in the subdirectory KRUNS/GNU/run01, which is also where the output files will go.

#!/bin/tcsh -f
#
# PBS job submission script
#
cd ${HOME}/KRUNS/GNU/run01
${HOME}/KSPACE/GNU/bin/KSPACE

This script is simple to read. The first line, starting with #!, says which shell should be used to interpret the script (in this case tcsh). The -f indicates that the file .tcshrc in the user's home directory should not be read, as it may contain customisations for interactive use. The line:

      # PBS job submission script

is a comment line. If you start a line with # followed by whitespace, the queuing system treats it as a comment. If you leave out the whitespace, the line may be interpreted as a directive, with unpredictable results. After the first line that does not begin with #, the queuing system no longer interprets lines starting with # as directives, whether they are of the form '#word' or '# word'.

The cd command changes to the correct directory, and the next line runs the program KSPACE. The job would be submitted with the command:

      qsub -l walltime=20:00:00 krun01

where

      -l walltime=20:00:00

is the option that requests 20 hours of running time. See the man page for qsub for the possible options. The -l option specifies the resources required; see the man page pbs_resources_linux for the possible resources. For a serial job, walltime is generally the only resource required. Note that because the Maui scheduler is used instead of the PBS scheduler, not all options and resources are available.

A better method of submitting a job is to embed the options in the script, so that you have a record of the options you asked for. This is done with directive lines of the form:

      #PBS option

The # must be at the start of the line, with no space between it and PBS. With this change, the script above looks like this:

#!/bin/tcsh -f
#
#PBS -l walltime=20:00:00
#
cd ${HOME}/KRUNS/GNU/run01
${HOME}/KSPACE/GNU/bin/KSPACE

and would be submitted with the simple command:

      qsub krun01

Remember that the queuing system examines your script for directives until it comes to the first line not starting with the # symbol. It will not look any further after this for directives.
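
The scanning rule above can be checked on a script before it is submitted. The following is only a sketch: demo_job is a hypothetical filename, and the awk one-liner merely imitates the rule as described here (it also skips blank lines, which is an assumption); it is not how the queuing system itself is implemented. The commands are written so that they run the same way under tcsh and sh.

```shell
# Write a small sample job script (demo_job is a hypothetical name).
# The second walltime directive is deliberately placed after a command.
cat > demo_job << 'EOF'
#!/bin/tcsh -f
#
#PBS -l walltime=20:00:00
#
cd ${HOME}/KRUNS/GNU/run01
#PBS -l walltime=40:00:00
${HOME}/KSPACE/GNU/bin/KSPACE
EOF

# Print the #PBS lines that come before the first command line.
awk '/^#PBS/ { print; next } /^#/ || /^[ \t]*$/ { next } { exit }' demo_job
```

Only the line #PBS -l walltime=20:00:00 is printed. The second directive comes after the cd command, so by the rule above it would be treated as an ordinary comment.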

By default, the queuing system will create two files,
Your_jobname.oYour_jobIDno for standard out and
Your_jobname.eYour_jobIDno for standard error,
(although they may well contain only a couple of system announcements). These files will be in the directory you were in when you submitted the job. You can use different filenames with the options -o your_filename1 (for stdout) and -e your_filename2 (for stderr), or merge the two into stdout with the option -j oe (you can also use the -o option with this). The files are only written once the job has finished. If you wish to see the output as your job runs, redirect it to a file in your script, e.g. with >&, which in tcsh redirects both standard output and standard error (the bash form &> is not valid in tcsh). So instead of ${HOME}/KSPACE/GNU/bin/KSPACE in your submission script, use

      ${HOME}/KSPACE/GNU/bin/KSPACE >& KSPACE.out

Another useful flag is -m[abe]. The -m option directs the system to send you email: a if the job aborts, b when it begins and e when it ends. If you wish to use this option, you should create a file called .forward (there really is a dot at the beginning) in your home directory containing a single line, your Bristol email address. The email will then be forwarded to where you normally collect your email. If you do not do this, the email will remain on the server you queued the job on. If you ask for an email when the job ends, it will also contain some useful information, such as how long the job took and how much memory it needed.
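
As a sketch, the output and mail options described above could be embedded as directives alongside the walltime request. The filename krun01.log is only an illustration, not something the queuing system requires, and the choice of the letters a and e is likewise just an example.

```shell
#PBS -l walltime=20:00:00
# merge standard error into standard output
#PBS -j oe
# write the merged output to this file (hypothetical name)
#PBS -o krun01.log
# send email if the job aborts and when it ends
#PBS -m ae
```

These lines would go near the top of the script, before the first command, so that the queuing system reads them as directives.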

If you wish to run your job on one of the large memory nodes, use the resource nodes=1:bigmem, e.g.

      #PBS -l walltime=20:00:00,nodes=1:bigmem

A comma separates the resources walltime and nodes. Even though the default number of nodes is one, you must specify it when asking for the attribute bigmem, and bigmem must be after the number of nodes, and separated from it by a colon. See the web pages for the queuing system for each cluster for further information about resources that can be requested.

In certain circumstances, such as when you are running a job which does a lot of file reading and writing, it is necessary to have the files in scratch space on the node instead of in your userspace. Below is a PBS submission script which creates a directory username.IDno in scratch space on the node (/data or /tmp, depending on which cluster you're using). It copies the input files to it, runs the job, moves the files back and deletes the directory. It stores the name of the node and the scratch directory in a file called rundat. To use this you would merely have to alter the variable MYDIR to the directory where your input files are, and MYEX to the executable you want to use. A copy of this script is in /usr/local/sbin, called qsubserial.example, which, if you wish, you can copy, rename and edit (there appears to be a maximum of 15 characters for the length of the name). If your executable takes arguments, or if you wish to redirect stdout, include this in MYEX (enclosed in the double quotes), e.g.

      setenv MYEX "${HOME}/TEST1/a.out > outfile"

#!/bin/tcsh -f
#
#PBS -l walltime=40:00:00
#
# ---------------------------------------------
# you would edit this section
#
setenv MYDIR "${HOME}/KRUNS/GNU/run02"
setenv MYEX "${HOME}/KSPACE/GNU/bin/KSPACE"
# on grendel & bohr TMPDIR is "tmp" on dirac it's "data"
setenv TMPDIR "tmp"
#
# ---------------------------------------------
#   everything from here on is standard
#
#  get name for scratch directory and create it
#
setenv JOBNO `echo $PBS_JOBID | cut -d . -f 1`
setenv WORKDIR "/${TMPDIR}/${PBS_O_LOGNAME}.${JOBNO}"
mkdir $WORKDIR
#
#  go to scratch directory and copy files to it
#
cd $WORKDIR
cp ${MYDIR}/* .
#
#  store name of node and scratch directory in file
#  rundat in file directory
#
hostname > ${MYDIR}/rundat
pwd >> ${MYDIR}/rundat
#
#   run program (eval so that any arguments or redirection
#   stored in MYEX are interpreted by the shell)
#
eval $MYEX
#
#  copy files back and delete scratch directory
#
cd ${MYDIR}
mv -f ${WORKDIR}/* .
rm -fr $WORKDIR
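
To see what the JOBNO line in the script does, the cut command can be tried on its own. The job ID below is a made-up value, assuming that PBS_JOBID has the form number.servername; the real value is set by the queuing system when the job starts.

```shell
# hypothetical job ID; keep only the part before the first dot
echo "12345.headnode" | cut -d . -f 1
```

This prints 12345, so with TMPDIR set to tmp the scratch directory for that job would be /tmp/username.12345.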

The principle for writing submission scripts for parallel jobs is the same. The content of the script will differ, depending on which inter-node message-passing interface is in use on the particular system. A page on parallel scripts for dirac can be found here; other clusters are similar but lack the Myrinet interface.