
The Beowulf Cluster

Contents

 Using the Cluster
 Copying files to and from the Cluster
 Submitting jobs to the Cluster
 Resource Limits
 Controlling jobs
 MPI jobs

Using the Cluster

The School of Computing Beowulf cluster consists of 8 dual-processor Pentium III PCs connected by a fast `Scali' network. These machines are intended as number-crunchers: they run users' programs and very little else.

For this sort of system to run efficiently, work has to be divided evenly across the machines, and no individual machine should be given more work than it can handle.

For this reason, users do not log onto cluster machines directly but submit lists of commands - known as jobs - to a batch system.

The batch system, a modified version of OpenPBS, distributes the jobs around the cluster. If the cluster is too busy, the job will be queued until resources become available.

Copying files to and from the Cluster

Users with accounts on the cluster will be able to access their directory space from other departmental computers. The space will appear in:

 /home/cswuk1/username

You can use the usual Unix commands (cp, mv, etc.) to work on files in this directory.
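
For example, to copy a source file into your cluster space and check that it has arrived (mpiprog.c here is the example file used in the MPI section below):

 cp mpiprog.c /home/cswuk1/username
 ls /home/cswuk1/username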

Submitting jobs to the Cluster

Single processor jobs

To create a job, put the commands you intend to run into a file and send it to the cluster using a command like

 qsub my.job
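
A job file is just a list of shell commands. As a minimal sketch, my.job might contain something like the following (myprog is a hypothetical program name; $PBS_O_WORKDIR is the standard OpenPBS variable giving the directory the job was submitted from):

 #!/bin/sh
 # Jobs start in the home directory: move back to the
 # directory the job was submitted from
 cd $PBS_O_WORKDIR
 # Run the (hypothetical) program
 ./myprog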

The job will be given an identifier of the form number.hostname. This can be used to track the progress of your job and, if necessary, to remove it from the system.

When the batch system is ready, the commands in the file my.job will be executed on one of the nodes in the cluster. Anything sent to standard output and standard error will appear in files called my.job.o and my.job.e with the job ID number appended.
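
For example, if the job were assigned the ID 240.cswuk1, its output would appear in

 my.job.o240
 my.job.e240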

qsub provides extra options to control how these output files are generated and to have the system report on the job's progress.

 -N NAME  Use NAME instead of the job filename as the prefix for the output files.
 -j oe    Merge standard error into standard output and store both in the .o file.
 -m e     Send mail when the job has finished. (Not currently working.)

These options can also be embedded in the job itself as `magic' comments starting with #PBS. For example, to set the name of the job to `COMPILE' and save both standard output and standard error in COMPILE.onnn, the job could start with

 #PBS -N COMPILE
 #PBS -j oe

Parallel Jobs

To submit a parallel job, you use qsub with a resource limit specifying how many processes you need. On this system, you would use a command like

 qsub -l vnodes=4 mpi.job

This will allocate 4 `virtual nodes' - effectively individual CPUs - for the job and execute the job on one of these. It is up to the job itself to spawn the processes on the remaining three CPUs.

A special mpirun command has been provided to start MPI programs from within a job. This reads the virtual node list provided by OpenPBS and launches the processes on all the chosen nodes.
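
Standard OpenPBS also publishes this node list in a file named by the PBS_NODEFILE environment variable; assuming this modified installation does the same, a job could inspect its allocation with

 # Show the virtual nodes allocated to this job
 # (PBS_NODEFILE is standard OpenPBS; may differ on this cluster)
 cat $PBS_NODEFILE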

Resource Limits

qsub can be passed other resource limits, which the batch system uses to prioritise and possibly reject jobs. Resource limits are set by adding -l options to the qsub command. For example

 qsub -l walltime=1:00

limits the total `real' time for a job to one minute.

Common resource limits include:

 Resource  Purpose                                            Current Default
 --------  -------------------------------------------------  --------------------
 walltime  Maximum `wallclock' time. Format hh:mm:ss.         10 minutes
 cput      Maximum CPU time. As all computers spend some      10 minutes
           time waiting rather than processing, this will
           be less than the walltime.
 mem       Maximum memory usage.                              All available memory
 vnodes    The number of `virtual nodes', i.e. processors,    16
           on which the program is to run. This is not part
           of standard OpenPBS. For reasons best known to
           itself, it only works if you choose an even
           number of processors.
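
Several limits can be combined in a single submission. For example, to ask for four processors and an hour of wallclock time for mpi.job (the memory figure here is purely illustrative):

 qsub -l vnodes=4 -l walltime=1:00:00 -l mem=256mb mpi.job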

Controlling jobs

You can see where your job is in the queue using the qstat command, optionally followed by a queue name. For example,

 qstat

will produce something like

 Job id           Name             User             Time Use S Queue  
 ---------------- ---------------- ---------------- -------- - -----  
 240.cswuk1       COMPILE          jasonl                  0 R cluster
 241.cswuk1       RUN4             jasonl                  0 R cluster
 244.cswuk1       RUN16            jasonl                  0 Q cluster

The listing shows three jobs in the cluster queue: the first two are running (state R) and the third is waiting to run (state Q).

If you wish to remove a job from the queue, use qdel. This needs to be given the full job ID, including the server name, e.g.

 qdel 244.cswuk1

This will delete the job called 244.cswuk1 from the server running on cswuk1.

MPI jobs

To compile an MPI C program on the cluster, copy the source to your /home/cswuk1/username directory, then create a PBS job called compile.job containing

 #PBS -N COMPILE
 #PBS -j oe

 mpicc -o mpiprog mpiprog.c

and use qsub to send it to the cluster.

mpicc is a wrapper around the standard cc command that sets the flags required for building MPI programs. FORTRAN fans have an equivalent mpif77 command.

To run the program, create a second job run.job containing

 #PBS -N RUN
 #PBS -j oe

 mpirun mpiprog

This can be submitted to the cluster using a command like:

 qsub -l vnodes=4 run.job
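
Once the batch system has run the job, anything the program printed can be read from the merged output file; for example, if the job were numbered 245:

 cat RUN.o245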