Checking Program Performance

It is worthwhile checking how well your program runs, especially if you are not familiar with it, or are using a different configuration. There are some tools available, although you will have to give some thought to what the results mean.

If your job puts a heavy load on the node, it will run slowly, and in the worst case it will break the queuing system, killing other people's jobs. The usual causes are insufficient memory or excessive disk IO.

Find which node your job is running on with qstat -n or qstat -f.

Log in to the node with e.g. ssh b-1-10 (on bohr use rlogin, e.g. rlogin comp22).
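As a rough sketch of the two steps above, the node names can be pulled out of the qstat -f output with a small filter. This assumes PBS/Torque-style output containing a line such as "exec_host = b-1-10/0+b-1-10/1"; job_nodes is a hypothetical helper name, and very long host lists that qstat wraps onto continuation lines are not handled.

```shell
# Sketch: print the unique node names from the exec_host line of `qstat -f`.
# Assumes PBS/Torque-style output; job_nodes is a hypothetical helper.
job_nodes() {
    awk '
        /exec_host = / {
            sub(/.*exec_host = /, "")      # keep only the host list
            n = split($0, parts, "+")      # entries are joined with "+"
            for (i = 1; i <= n; i++) {
                sub(/\/.*/, "", parts[i])  # drop the "/cpu" suffix
                if (!(parts[i] in seen)) {
                    seen[parts[i]] = 1
                    print parts[i]
                }
            }
        }'
}
```

It would be used as qstat -f JOBID | job_nodes, where JOBID stands for your own job number, and the name printed is the one to give to ssh.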

Bear in mind that some programs may take some time to initialise, and it may be a few hours before a job is using its full memory etc.

The first thing you should do is use the command top. See this page for information on what you should look for. Also read the man page.
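Because a job's behaviour can change over several hours, it may help to save a snapshot of top rather than watch it interactively. The following is a minimal sketch, assuming the procps version of top, which accepts -b (batch mode) and -n (number of iterations); top_snapshot is a hypothetical helper name.

```shell
# Sketch: print the first lines of a single, non-interactive top report.
# Assumes procps top; top_snapshot is a hypothetical helper.
top_snapshot() {
    # -b: batch mode (plain text output); -n 1: one iteration, then exit
    top -b -n 1 | head -n "${1:-20}"
}
```

Running top_snapshot 25 >> top.log at intervals appends the summary lines and the busiest processes to a file, so that later snapshots can be compared with earlier ones.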

Check memory usage with:

      free -m

See this page for information on what you should look for. Also read the man page.
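One figure worth picking out of the free -m output is the amount of swap in use, since a job that has run out of physical memory will start to use swap. The sketch below assumes the usual layout in which the "Swap:" line lists total, used and free in that order; swap_used_mb and check_swap are hypothetical helper names.

```shell
# Sketch: read `free -m` output on stdin and report swap usage in MB.
# Assumes the "Swap:" line has the columns: total, used, free.
swap_used_mb() {
    awk '$1 == "Swap:" { print $3 }'
}

check_swap() {
    used=$(swap_used_mb)
    if [ "${used:-0}" -gt 0 ]; then
        echo "swap in use: ${used} MB"
    else
        echo "no swap in use"
    fi
}
```

It would be used as free -m | check_swap. A non-zero figure is only a hint: some swap may have been in use before your job started, so compare it with the memory your job is using in top.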

Check disk activity with:

      sar -o sar.dat -b 1 30 (this collects the data)
      sar -f sar.dat -b 1 30 (this displays the data)

See this page for information on what you should look for. Also read the man page.

To find which local disk partition the activity is occurring on, use:

      iostat 5 5 (note that the first report shows averages since the system was booted, not current activity - ignore it)
      df | grep hda5 (if the activity is occurring on e.g. hda5)
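Since the first iostat report has to be ignored, it can be filtered out instead of skipped by eye. This is a minimal sketch, assuming the default iostat output in which every report starts with a line beginning "avg-cpu:"; skip_first_report is a hypothetical helper name. Recent versions of sysstat also offer iostat -y for the same purpose, but that option may not exist on older installations.

```shell
# Sketch: drop the first report from iostat output read on stdin.
# Assumes each report begins with an "avg-cpu:" line (default iostat output).
skip_first_report() {
    # Count report headers; print lines only from the second report onwards.
    awk '/^avg-cpu:/ { reports++ } reports >= 2'
}
```

It would be used as iostat 5 5 | skip_first_report, leaving only the reports measured over the 5-second intervals.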

Ab-initio chemistry programs often store integrals in a file and re-read them when required, instead of recalculating them. This is generally a bad thing: disk IO is slow compared to CPU speed, so it slows the program down as well as imposing a big load on the disk. There are usually keywords that can be put in the input file to force recalculation.