UKAFF hints for users

Hints for UKAFF1A users

Here are some general hints and tips for compiling and running codes on UKAFF1A. They are solutions to problems that UKAFF1A users have actually run into. Most are not hard-and-fast rules, however; your mileage may vary.

Compilers and compiler flags
How much memory will my code use?
Process limits
Slave process stack size

Compilers and compiler flags

When compiling OpenMP codes you should use the xlf_r and xlc_r compilers (not xlc or xlf) as these generate thread-safe code. The GNU compilers (gcc and g77) are also present on the system, but these do not support OpenMP.

For running large jobs (using more than 2GB of memory) it is almost invariably necessary to compile the code to produce a 64-bit executable. To achieve this, add the '-q64' switch to the compiler line. Where a program is composed of more than one source file, all should be compiled with this switch.

By default the compiler ignores OpenMP directives; to make it take notice of them, add the flag '-qsmp=omp'.
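
Putting the flags above together, a build script might look something like the sketch below. It is written in Bourne shell (sh/ksh) syntax rather than csh. The source file name sph_tree.f is just a placeholder, and the script only prints the command if xlf_r or the source file cannot be found.

```shell
#!/bin/sh
# Sketch only: sph_tree.f is a placeholder source file name.
FC=xlf_r                   # thread-safe Fortran compiler (use xlc_r for C)
FFLAGS="-q64 -qsmp=omp"    # 64-bit executable, and honour OpenMP directives
build_cmd="$FC $FFLAGS -o sph_tree sph_tree.f"

# Compile only if the compiler and the source file are both present;
# otherwise just show the command that would have been run.
if command -v "$FC" >/dev/null 2>&1 && [ -f sph_tree.f ]; then
    $build_cmd
else
    echo "$build_cmd"
fi
```

If the program has several source files, the same FFLAGS should be used for every one of them, since all must be compiled with '-q64'.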

How much memory will my code use?

When submitting batch jobs you will need to state how much memory your job requires. For codes which allocate their data arrays statically (i.e. they declare large arrays at compile time, rather than using dynamic memory allocation such as F90 ALLOCATE or C/C++ malloc()), you can get an estimate of the memory footprint of the code using the size64 command, e.g.

  % size64 sph_tree
       text       data        bss        dec        hex   filename
     430964      55041  456939344  457425349   1b43c1c5   sph_tree

The command shows the size in bytes of the three memory segments of the executable: 'text' holds the machine instructions generated from the source code, 'data' holds initialised static variables (e.g. those set in Fortran DATA statements), and 'bss' holds uninitialised statically allocated arrays. The 'dec' and 'hex' columns give the total of the three, in decimal and hexadecimal.

In most cases the 'bss' segment will dominate (in the example above it is ~450MB).
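
To turn the byte counts into a figure you can quote in a batch request, something along these lines may help. It is a sketch in Bourne shell syntax which assumes the six-column layout shown above, takes 1MB to be 1024*1024 bytes, and rounds up. The sample line is copied from the example; in practice you would feed in the last line of the real size64 output.

```shell
#!/bin/sh
# Sample line copied from the size64 example; in practice something like
#   line=$(size64 sph_tree | tail -1)
line="430964   55041 456939344         457425349      1b43c1c5       sph_tree"

# Columns are: text data bss dec hex filename.
# Adding 1048575 before dividing rounds up to the next whole MB.
bss_mb=$(echo "$line" | awk '{ printf "%d", ($3 + 1048575) / 1048576 }')
total_mb=$(echo "$line" | awk '{ printf "%d", ($4 + 1048575) / 1048576 }')

echo "bss: ${bss_mb}MB, whole image: ${total_mb}MB"
```

For the sph_tree example this gives 436MB for 'bss' and 437MB overall. Remember that this only covers statically allocated data; anything obtained at run time with ALLOCATE or malloc() comes on top.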

Process limits

The default stack size limit for a process on UKAFF is 8MB. Automatic variables (and arrays) in both Fortran and C are created on the stack. This means that if you call a subroutine or function which declares a large temporary array, the process limit may be exceeded, potentially causing silent corruption of data or a crash with a segmentation fault.

To avoid this, users should raise their per-process stack limit using the command:

  % limit stacksize unlimited
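
If you launch runs from a Bourne-type shell script (sh/ksh) rather than from csh, a check along the following lines can catch a too-small limit before the run starts. This is only a sketch: the 64MB requirement is a made-up figure, stack_is_enough is a hypothetical helper, and it assumes that 'ulimit -s' reports the soft limit in KB (or the word 'unlimited'), as it does in ksh and bash.

```shell
#!/bin/sh
# Hypothetical requirement: the largest automatic array in the code, in KB.
need_kb=65536

# usage: stack_is_enough LIMIT_KB NEED_KB
# Succeeds if the limit is "unlimited" or at least NEED_KB.
stack_is_enough() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

cur=$(ulimit -s)    # assumed to be in KB, or the word "unlimited"
if stack_is_enough "$cur" "$need_kb"; then
    echo "stack limit ($cur) is sufficient"
else
    echo "stack limit ${cur}KB is below ${need_kb}KB; raise it before running" >&2
fi
```

With the default 8MB limit (8192KB) the check above would print the warning.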

Slave process stack size

Another stack-related gotcha concerns the size of the stack allocated to the individual threads in a parallel process. By default each thread is allocated a stack of 4MB. This stack holds the automatic variables which are local (private) to the thread. (NB. shared data, even if allocated from the master thread's stack, does not count against this limit).

The problem arises because the memory from which these stacks are allocated is shared between the threads and is contiguous, so any thread overflowing its allocation will corrupt the stack of another thread. A tell-tale symptom is data corruption which occurs when a program is run in parallel, but not when the same program is run serially.

It is possible to raise the per-thread stack allocation by setting the STACK option of the XLSMPOPTS environment variable to the required size (in bytes), e.g.

  % setenv XLSMPOPTS 'STACK=67108864'

sets the per-slave stack size to be 64MB.
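
The byte count is easy to get wrong by hand, so in a job script you may prefer to work it out from a size in MB. The following is a sketch in Bourne shell syntax (csh users would use setenv as above); the variable names stack_mb and stack_bytes are purely illustrative.

```shell
#!/bin/sh
stack_mb=64                                # desired per-thread stack, in MB
stack_bytes=$((stack_mb * 1024 * 1024))    # 64MB -> 67108864 bytes
XLSMPOPTS="STACK=${stack_bytes}"
export XLSMPOPTS
echo "XLSMPOPTS=$XLSMPOPTS"
```

Bear in mind that every thread gets a stack of this size, so a large value multiplied by the number of threads adds to the memory your job needs.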