UKAFF hints for users
Hints for UKAFF1A users
Here are some general hints and tips for compiling and running codes
on UKAFF1A. They are solutions to problems that UKAFF1A users have
run into. Most are not hard-and-fast rules, however; your mileage may vary.
Compilers and compiler flags
How much memory will my code use?
Process limits
Slave process stack size
Compilers and compiler flags
When compiling OpenMP codes, you should use the xlf_r and
xlc_r compilers (not xlf or xlc), as these generate
thread-safe code. The GNU compilers (gcc and g77) are also
present on the system, but these do not support OpenMP.
For running large jobs (using more than 2GB of memory) it is almost
always necessary to compile the code as a 64-bit executable. To
achieve this, add the '-q64' switch to the compiler line. Where a
program is composed of more than one source file, every file must be
compiled with this switch (and it should also be given when linking),
since 32-bit and 64-bit object files cannot be linked together.
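A quick way to confirm that '-q64' really took effect is to have the code report its own pointer width when it starts. The C sketch below is portable and assumes nothing specific to AIX; pointer_bits and is_64bit_build are hypothetical helpers, not part of any library.

```c
#include <limits.h>

/* Hypothetical helper: width of a data pointer in bits.
   A build made with -q64 should report 64; a default 32-bit build reports 32. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical helper: returns 1 if this executable was built as 64-bit. */
int is_64bit_build(void)
{
    return pointer_bits() == 64;
}
```

Calling is_64bit_build() near the start of main and stopping if it returns 0 should catch a build made without the switch before a large job is queued.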
By default the compiler ignores OpenMP directives; to force it to take notice
of them, add the flag '-qsmp=omp'.
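The C sketch below shows one way to check whether the directives were honoured. It relies on the standard _OPENMP macro, which compilers define when OpenMP is enabled; the function names are hypothetical. When the directives are ignored, the parallel region simply runs once on a single thread.

```c
/* Hypothetical helper: 1 if OpenMP directives were honoured at compile
   time (_OPENMP is defined when OpenMP is enabled, e.g. by -qsmp=omp),
   0 if the compiler ignored them. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Counts the threads that actually execute a parallel region.
   Without OpenMP the pragmas are ignored and the block runs exactly once. */
int threads_in_region(void)
{
    int count = 0;
#pragma omp parallel
    {
#pragma omp atomic
        count++;
    }
    return count;
}
```

If openmp_enabled() returns 0 in a build that was meant to be parallel, the flag was probably left off the compiler line.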
How much memory will my code use?
When submitting batch jobs you will need to define how much memory your job
requires. For codes which allocate data arrays statically (ie. they declare
large arrays at compile time, rather than using dynamic memory allocation
schemes such as F90 ALLOCATE, or C/C++ malloc(), etc.), you
can get an estimate of the memory footprint of the code using the
size64 command, eg.
# size64 sph_tree
text data bss dec hex filename
430964 55041 456939344 457425349 1b43c1c5 sph_tree
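The command shows the size, in bytes, of the three memory segments in the
code: 'text' holds the machine instructions generated from the source code,
'data' holds initialised static variables (eg. variables set in Fortran
DATA statements), and 'bss' holds uninitialised statically allocated arrays.
The 'dec' and 'hex' columns give the total of the three, in decimal and
hexadecimal respectively. This figure does not include the stack or any
memory allocated at run time.
In most cases the 'bss' segment will dominate (in the example above it is
456,939,344 bytes, about 457MB).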
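The arithmetic behind the size64 output can be sketched in C as follows. The struct and function names are hypothetical; the numbers in the note after the code are the sph_tree figures shown above.

```c
/* Segment sizes in bytes, as printed by size64; the field names simply
   mirror its 'text', 'data' and 'bss' columns. */
struct segment_sizes {
    unsigned long long text;
    unsigned long long data;
    unsigned long long bss;
};

/* Static memory footprint: the sum that size64 prints as 'dec'
   (and, in hexadecimal, as 'hex'). */
unsigned long long static_footprint(struct segment_sizes s)
{
    return s.text + s.data + s.bss;
}

/* Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes). */
double bytes_to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

With text = 430964, data = 55041 and bss = 456939344, static_footprint returns 457425349 (0x1b43c1c5), matching the 'dec' and 'hex' columns, and bytes_to_mib gives roughly 436MiB for the bss segment alone.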
Process limits
The default limit for the stack size of a process on UKAFF is
8MB. Automatically allocated variables (and arrays) in both FORTRAN
and C are created on the stack. This means that if you call a
subroutine or function which declares a large temporary array, the
process limit may be exceeded, which can silently corrupt data or make
the code crash with a segmentation fault.
To avoid this, users should raise their per-process stack limit using the csh command:
% limit stacksize unlimited
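To see the limit from inside a program, the POSIX getrlimit() call reads the same per-process stack limit that 'limit stacksize' changes. The C sketch below is illustrative only; stack_limit_allows is a hypothetical helper, and because the stack also holds other call frames, a positive answer is an upper bound rather than a guarantee.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the current (soft) stack limit is
   unlimited or at least `needed` bytes, 0 if it is smaller, and -1 if
   the limit cannot be read. Leave headroom: other frames share the stack. */
int stack_limit_allows(size_t needed)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)needed <= rl.rlim_cur ? 1 : 0;
}
```

For instance, a routine about to declare an automatic array of a million doubles (8,000,000 bytes, close to the 8MB default) might call stack_limit_allows(8000000) first and report a clear error rather than crash.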
Slave process stack size
Another stack-related gotcha concerns the size of the stack allocated to
the individual threads in a parallel process. By default each thread is
allocated a stack of 4MB. This stack is used to allocate automatic variables
which are local (private) to the thread.
(NB. shared data does not count against this limit, even if it was
allocated from the master thread's stack.)
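To make "private to the thread" concrete, the C sketch below declares an automatic work array inside an OpenMP parallel loop, so each thread executing iterations has its own copy on its own stack. WORK_LEN and the function name are illustrative assumptions; without OpenMP enabled the pragma is ignored and the loop runs serially with the same result.

```c
/* Illustrative size: 1024 doubles = 8KB, well under a 4MB thread stack. */
#define WORK_LEN 1024

/* Each iteration declares an automatic array. Inside the parallel loop it
   is private to the thread running that iteration, so it is allocated on
   that thread's own stack and counts against the per-thread limit. */
double sum_with_private_work(int n)
{
    double total = 0.0;

#pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        double work[WORK_LEN];  /* private: lives on the executing thread's stack */
        double s = 0.0;

        for (int j = 0; j < WORK_LEN; j++)
            work[j] = (double)i;
        for (int j = 0; j < WORK_LEN; j++)
            s += work[j];
        total += s;
    }
    return total;
}
```

At 1024 doubles the array fits easily, but with 8-byte doubles a WORK_LEN above 524288 would on its own exceed the 4MB default per-thread stack.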
The problem arises because the memory from which these stacks are
allocated is shared between the threads and is contiguous, so a thread
that overflows its allocation can corrupt the stack of another thread.
A tell-tale symptom of this problem is corruption which occurs when a
program is run in parallel, although it works correctly when run
serially.
It is possible to raise the per-thread stack allocation by setting the
STACK option of the XLSMPOPTS environment variable to the required size
(in bytes), eg.
% setenv XLSMPOPTS 'STACK=67108864'
sets the per-thread (slave) stack size to 64MB (64 x 1024 x 1024 = 67108864 bytes).
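A program can also check the setting it is running with. The C sketch below is a hypothetical helper, not part of the XL runtime: it only recognises the STACK=<bytes> form used on this page (matching it case-insensitively) and does not attempt to parse any other XLSMPOPTS settings.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: extracts the byte count from an XLSMPOPTS value
   such as "STACK=67108864". Assumes only the STACK=<bytes> form shown on
   this page; returns 0 if opts is NULL or no such setting is present. */
unsigned long long xlsmp_stack_bytes(const char *opts)
{
    static const char key[] = "stack=";
    size_t klen = sizeof key - 1;

    if (opts == NULL)
        return 0;
    for (const char *p = opts; *p != '\0'; p++) {
        size_t k = 0;

        /* Case-insensitive comparison against "stack=". */
        while (k < klen && tolower((unsigned char)p[k]) == key[k])
            k++;
        if (k == klen)
            return strtoull(p + klen, NULL, 10);
    }
    return 0;
}
```

Calling xlsmp_stack_bytes(getenv("XLSMPOPTS")) at startup, and warning when the result is 0 (no setting, so the 4MB default applies) or smaller than the largest private arrays need, is one way to catch the problem before it shows up as corrupted results.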