GULP with MPI/LAM
I have been running MD jobs with GULP (General Utility Lattice
Program) on a 16-node Athlon-based Beowulf cluster, but I have been
unable to run consecutive MPI jobs without remaking the source each
time (just deleting the final source file and remaking, which takes a
second or two). Although this works, it is a rather unsatisfactory way
of working and limits me to one MPI job at a time.
It seems that every time GULP finishes it somehow manages to
corrupt the LAM/MPI libraries, and when a second run of GULP is
attempted an error occurs just as the first step is about to proceed:
******************************************************************************
*  Output for configuration 1  *
******************************************************************************
MPI_Recv: process in local group is dead (rank 0, MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Recv()
Rank (0, MPI_COMM_WORLD): - MPI_Reduce()
Rank (0, MPI_COMM_WORLD): - MPI_Allreduce()
Rank (0, MPI_COMM_WORLD): - main()
Looking at the source code, this seems to be the point where the
memory allocation takes place. I have therefore tried all
combinations of -DMALLOC and -DF90 in the getmachine setup,
but none of them seems to fix the problem.
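(For reference, the failing call chain above is just what a plain Allreduce
over MPI_COMM_WORLD goes through. A minimal stand-alone test of that
pattern, only a sketch with an arbitrary file name, is roughly the
following; it could be built with mpicc and rerun between GULP jobs to
check whether LAM itself is left in a bad state:)

/* allreduce_test.c -- minimal check of the MPI_Allreduce path seen in the
   LAM call stack above (a sketch, not part of GULP).
   Build: mpicc allreduce_test.c -o allreduce_test
   Run:   mpirun -np 2 allreduce_test                                     */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = (double) rank;

    /* Same collective (Allreduce -> Reduce -> Recv) as in the error trace. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Allreduce over %d processes gave %f\n", size, global);

    MPI_Finalize();
    return 0;
}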
I am using LAM-MPI compiled with the GNU compilers and
configured to use the pgf90 compiler under Linux.
Has anyone managed to compile a more robust LAM-
MPI/pgf90/GULP setup (or similar), or does anyone have any
suggestions on how to do so? An mpirun of a consecutive GULP job
with -np 1 gives no problem; the error appears only with 2 or more
processors. I have also tried using MPICH, but with that I couldn't
even compile/link/run GULP successfully.
Many thanks in advance for your help.
Stefan
________________________________________________________
Dr Stefan T. Bromley
Laboratory of Applied Organic Chemistry and Catalysis
DelftChemTech, Delft University of Technology
Julianalaan 136, 2628 BL Delft
The Netherlands
Phone : + 31 1527 89418
e-mail : S.T.Bromley - at - tnw.tudelft.nl
________________________________________________________