MPI-parallel plane wave build
Clicked A Few Times
Threads 1
Posts 6
11:47:48 AM PST - Mon, Jan 7th 2013
Hello, all.
I've been trying for some time to get a functional build of the plane wave solvers that will work in parallel via MPI, but I've thus far come up empty. I've tried building on RedHat 6.0, RedHat 6.3, CentOS 6.3, and CentOS 5.8, using MPICH2, OpenMPI, GNU Fortran, and Intel compilers, yet I always see the same behavior: when run in parallel with a plane wave input, the nwchem processes just seem to 'hang'. stracing the processes shows them all polling indefinitely:
epoll_wait(4, {}, 32, 0) = 0
epoll_wait(4, {}, 32, 0) = 0
epoll_wait(4, {}, 32, 0) = 0
epoll_wait(4, {}, 32, 0) = 0
epoll_wait(4, {}, 32, 0) = 0
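(For anyone reproducing this: one way to capture traces like the above from every rank at once is a loop along these lines; the pgrep pattern and the 30-second timeout are illustrative.)
for pid in $(pgrep -x nwchem); do
  timeout 30 strace -p "$pid" -o "strace.$pid.log" &
done
wait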
NWChem run in serial with the same input runs to completion.
To eliminate as many variables as possible, I'm currently building on a stock CentOS 6.3 system with GNU compilers and OpenMPI installed (OpenMPI 1.5.4, as provided with CentOS/RedHat). The system has no InfiniBand, to further simplify things.
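(The exact packages in play can be confirmed like so; a sketch assuming the stock CentOS RPMs:)
rpm -q openmpi openmpi-devel
mpirun --version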
This is the process I am using to perform the build/install:
cd nwchem-src-2012-12-01
export NWCHEM_TOP=$PWD
export NWCHEM_TARGET=LINUX64
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=yes
export MPI_LOC=/usr/lib64/openmpi
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=/usr/include/openmpi-x86_64
export LIBMPI="-pthread -m64 -lmpi_f90 -lmpi_f77 -lmpi -ldl"
export LARGE_FILE=TRUE
export NWCHEM_MODULES=all
export FC=gfortran
cd $NWCHEM_TOP/src
make realclean
make nwchem_config 2>&1 | tee make.nwchem_config.out.$(date +%Y%m%d%H%M)
make FC=gfortran 2>&1 | tee make.out.$(date +%Y%m%d%H%M)
NWCHEM_INSTALL_DIR=$HOME/nwchem/ompi/20121201
mkdir -p $NWCHEM_INSTALL_DIR/bin
mkdir -p $NWCHEM_INSTALL_DIR/data
cp $NWCHEM_TOP/bin/${NWCHEM_TARGET}/nwchem $NWCHEM_INSTALL_DIR/bin
chmod 755 $NWCHEM_INSTALL_DIR/bin/nwchem
cp -r $NWCHEM_TOP/src/basis/libraries $NWCHEM_INSTALL_DIR/data/
cp -r $NWCHEM_TOP/src/data $NWCHEM_INSTALL_DIR/
cp -r $NWCHEM_TOP/src/nwpw/libraryps $NWCHEM_INSTALL_DIR/data/
cat << _EOF_ > $NWCHEM_INSTALL_DIR/data/nwchemrc
nwchem_basis_library $NWCHEM_INSTALL_DIR/data/libraries/
nwchem_nwpw_library $NWCHEM_INSTALL_DIR/data/libraryps/
ffield amber
amber_1 $NWCHEM_INSTALL_DIR/data/amber_s/
amber_2 $NWCHEM_INSTALL_DIR/data/amber_q/
amber_3 $NWCHEM_INSTALL_DIR/data/amber_x/
amber_4 $NWCHEM_INSTALL_DIR/data/amber_u/
spce $NWCHEM_INSTALL_DIR/data/solvents/spce.rst
charmm_s $NWCHEM_INSTALL_DIR/data/charmm_s/
charmm_x $NWCHEM_INSTALL_DIR/data/charmm_x/
_EOF_
ln -s $NWCHEM_INSTALL_DIR/data/nwchemrc $HOME/.nwchemrc
export LD_LIBRARY_PATH=${MPI_LIB}:${LD_LIBRARY_PATH}
export NWCHEM=$HOME/nwchem/ompi/20121201/bin/nwchem
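As an aside, before blaming NWChem it's worth confirming the MPI stack itself with a trivial program (a sketch; the file name is arbitrary, and it assumes mpicc and mpirun from this same OpenMPI install are on PATH; the LIBMPI flags above can likewise be cross-checked with mpif90 -showme:link):
cat > hello_mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    /* each rank reports itself; a hang here implicates MPI, not NWChem */
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc hello_mpi.c -o hello_mpi
mpirun -np 2 ./hello_mpi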
My directory structure for the run is like so:
.
./scratch
./output
./output/S2
./output/S2/known
./output/S2/known/S2-example1.nwout
./output/S2/np1
./output/S2/np1/S2-example1.out
./output/S2/np2
./output/S2/np2/S2-example1.out
./perm
./input
./input/S2-example1.nw
I am using a simple plane wave case from the tutorial in the NWChem wiki:
echo
title "total energy of s2-dimer LDA/30Ry with PSPW method"
scratch_dir ./scratch
permanent_dir ./perm
start s2-pspw-energy
geometry
S 0.0 0.0 0.0
S 0.0 0.0 1.88
end
nwpw
simulation_cell
SC 20.0
end
cutoff 15.0
mult 3
xc lda
lmbfgs
end
task pspw energy
NWChem is executed like so:
gabe@centos6.3 [~/nwchem/pw-examples] % mpirun -np 2 $NWCHEM input/S2-example1.nw 2>&1 | tee output/S2/np2/S2-example1.out
ldd of the nwchem binary:
gabe@centos6.3 [~/nwchem/pw-examples] % ldd $NWCHEM
linux-vdso.so.1 => (0x00007fff56c7c000)
libmpi_f90.so.1 => /usr/lib64/openmpi/lib/libmpi_f90.so.1 (0x00007fa22b3fd000)
libmpi_f77.so.1 => /usr/lib64/openmpi/lib/libmpi_f77.so.1 (0x00007fa22b1c9000)
libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x0000003f4da00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003f4c200000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fa22aebd000)
libm.so.6 => /lib64/libm.so.6 (0x0000003f4be00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003f55e00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003f4c600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003f4ba00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003f5c600000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003f5ae00000)
libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x0000003f5a200000)
/lib64/ld-linux-x86-64.so.2 (0x0000003f4b600000)
Full disclosure: I am not a chemist, but an HPC administrator trying to get this working on behalf of one of my users, so I apologize in advance for my ignorance regarding the science in play.
I guess my ultimate questions are:
1) Should I even expect the plane wave solvers to work in parallel?
2) Has anyone gotten NWChem 6.x pspw/nwpw working in parallel via MPI recently?
3) If so, how?
Any help would be greatly appreciated. Thanks in advance,
Gabe
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
Forum Vet
Threads 4
Posts 597
12:49:32 PM PST - Mon, Jan 7th 2013
1) yes, all parts of NWChem are parallel and scalable.
2) yes, NWChem plane wave runs in parallel on many platforms.
3) The information is incomplete, but let me try:
a) Looks like you're running 64-bit. You may want to try to compile without export USE_MPIF4=yes.
b) You say it hangs, but where does it hang? Some output that tells us where it hangs would be helpful. Does it hang at startup or somewhere during the calculation?
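For example, attaching a debugger to each stuck rank will show exactly where it is waiting (a sketch; it assumes gdb is installed and the binary is not stripped):
for pid in $(pgrep -x nwchem); do
  echo "=== PID $pid ==="
  gdb -p "$pid" -batch -ex 'thread apply all bt'
done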
Bert
Clicked A Few Times
Threads 1
Posts 6
2:17:57 PM PST - Mon, Jan 7th 2013
Quote: Bert, Jan 7th 8:49 pm
1) yes, all parts of NWChem are parallel and scalable.
2) yes, NWChem plane wave runs in parallel on many platforms.
3) The information is incomplete, but let me try:
a) Looks like you're running 64-bit. You may want to try to compile without export USE_MPIF4=yes.
b) You say it hangs, but where does it hang? Some output that tells us where it hangs would be helpful. Does it hang at startup or somewhere during the calculation?
Bert
I appreciate the reply, Bert. I will try it without USE_MPIF4=yes. The run hangs after generating S.vpp. It does generate a number of files, though:
gabe@centos6.3 [~/nwchem/pw-examples] % find scratch perm -ls
3426406 4 drwx------ 2 gabe gabe 4096 Jan 7 16:09 scratch
3426421 4 -rw------- 1 gabe gabe 72 Jan 7 16:09 scratch/s2-pspw-energy.b^-1
3426419 4 -rw------- 1 gabe gabe 72 Jan 7 16:09 scratch/s2-pspw-energy.b
3426420 4 -rw------- 1 gabe gabe 32 Jan 7 16:09 scratch/s2-pspw-energy.p
3426415 4 -rw------- 1 gabe gabe 32 Jan 7 16:09 scratch/s2-pspw-energy.c
3426418 4 -rw------- 1 gabe gabe 32 Jan 7 16:09 scratch/s2-pspw-energy.zmat
3426407 4 drwx------ 2 gabe gabe 4096 Jan 7 16:09 perm
3426414 92 -rw------- 1 gabe gabe 91601 Jan 7 16:09 perm/s2-pspw-energy.db
3426424 156 -rw------- 1 gabe gabe 156209 Jan 7 16:09 perm/S.psp
3426422 2540 -rw------- 1 gabe gabe 2600426 Jan 7 16:09 perm/S.vpp
Also perhaps worth mentioning: a 512 MB shared memory segment is created, which I notice does not happen when nwchem is run in serial with this input:
gabe@centos6.3 [~/nwchem/pw-examples/output] % ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 131072 gabe 600 393216 2 dest
0x00000000 163841 gabe 600 393216 2 dest
0x00000000 196610 gabe 600 393216 2 dest
0x00000000 229379 gabe 600 393216 2 dest
0x00000000 425988 gabe 600 393216 2 dest
0x00000000 884741 gabe 600 393216 2 dest
0x00000000 917510 gabe 600 393216 2 dest
0x00000000 1114119 gabe 600 536870912 2 dest
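(When I kill a hung run, segments like these can be left behind; I remove my own stale ones with something along these lines; the awk filter is illustrative, so verify the owner column before deleting anything:)
ipcs -m | awk -v u="$USER" '$3 == u {print $2}' | xargs -r -n 1 ipcrm -m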
Here's the complete output:
gabe@centos6.3 [~/nwchem/pw-examples] % mpirun -np 2 $NWCHEM input/S2-example1.nw 2>&1 | tee output/S2/np2/S2-example1.out
argument 1 = input/S2-example1.nw
============================== echo of input deck ==============================
echo
title "total energy of s2-dimer LDA/30Ry with PSPW method"
scratch_dir ./scratch
permanent_dir ./perm
start s2-pspw-energy
geometry
S 0.0 0.0 0.0
S 0.0 0.0 1.88
end
nwpw
simulation_cell
SC 20.0
end
cutoff 15.0
mult 3
xc lda
lmbfgs
end
task pspw energy
================================================================================
Northwest Computational Chemistry Package (NWChem) 6.1.1
--------------------------------------------------------
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Richland, WA 99352
Copyright (c) 1994-2012
Pacific Northwest National Laboratory
Battelle Memorial Institute
NWChem is an open-source computational chemistry package
distributed under the terms of the
Educational Community License (ECL) 2.0
A copy of the license is included with this distribution
in the LICENSE.TXT file
ACKNOWLEDGMENT
--------------
This software and its documentation were developed at the
EMSL at Pacific Northwest National Laboratory, a multiprogram
national laboratory, operated for the U.S. Department of Energy
by Battelle under Contract Number DE-AC05-76RL01830. Support
for this work was provided by the Department of Energy Office
of Biological and Environmental Research, Office of Basic
Energy Sciences, and the Office of Advanced Scientific Computing.
Job information
---------------
hostname = centos6.3
program = /home/gabe/nwchem/ompi/20121201/bin/nwchem
date = Mon Jan 7 16:09:46 2013
compiled = Mon_Jan_07_15:53:25_2013
source = /home/gabe/nwchem/ompi/build/nwchem-src-2012-12-01
nwchem branch = Development
nwchem revision = 23203
ga revision = 10141
input = input/S2-example1.nw
prefix = s2-pspw-energy.
data base = ./perm/s2-pspw-energy.db
status = startup
nproc = 2
time left = -1s
Memory information
------------------
heap = 13107201 doubles = 100.0 Mbytes
stack = 13107201 doubles = 100.0 Mbytes
global = 26214400 doubles = 200.0 Mbytes (distinct from heap & stack)
total = 52428802 doubles = 400.0 Mbytes
verify = yes
hardfail = no
Directory information
---------------------
0 permanent = ./perm
0 scratch = ./scratch
NWChem Input Module
-------------------
total energy of s2-dimer LDA/30Ry with PSPW method
--------------------------------------------------
Scaling coordinates for geometry "geometry" by 1.889725989
(inverse scale = 0.529177249)
ORDER OF PRIMARY AXIS IS BEING SET TO 4
D4H symmetry detected
------
auto-z
------
Geometry "geometry" -> ""
-------------------------
Output coordinates in angstroms (scale by 1.889725989 to convert to a.u.)
No. Tag Charge X Y Z
---- ---------------- ---------- -------------- -------------- --------------
1 S 16.0000 0.00000000 0.00000000 -0.94000000
2 S 16.0000 0.00000000 0.00000000 0.94000000
Atomic Mass
-----------
S 31.972070
Effective nuclear repulsion energy (a.u.) 72.0581785872
Nuclear Dipole moment (a.u.)
----------------------------
X Y Z
---------------- ---------------- ----------------
0.0000000000 0.0000000000 0.0000000000
Symmetry information
--------------------
Group name D4h
Group number 28
Group order 16
No. of unique centers 1
Symmetry unique atoms
1
Z-matrix (autoz)
--------
Units are Angstrom for bonds and degrees for angles
Type Name I J K L M Value
----------- -------- ----- ----- ----- ----- ----- ----------
1 Stretch 1 2 1.88000
XYZ format geometry
-------------------
2
geometry
S 0.00000000 0.00000000 -0.94000000
S 0.00000000 0.00000000 0.94000000
==============================================================================
internuclear distances
------------------------------------------------------------------------------
center one | center two | atomic units | angstroms
------------------------------------------------------------------------------
2 S | 1 S | 3.55268 | 1.88000
------------------------------------------------------------------------------
number of included internuclear distances: 1
==============================================================================
****************************************************
* *
* NWPW PSPW Calculation *
* *
* [ (Grassman/Stiefel manifold implementation) ] *
* *
* [ NorthWest Chemistry implementation ] *
* *
* version #5.10 06/12/02 *
* *
* This code was developed by Eric J. Bylaska, *
* and was based upon algorithms and code *
* developed by the group of Prof. John H. Weare *
* *
****************************************************
>>> JOB STARTED AT Mon Jan 7 16:09:46 2013 <<<
================ input data ========================
library name resolved from: compiled reference
NWCHEM_NWPW_LIBRARY set to: </home/gabe/nwchem/ompi/build/nwchem-src-2012-12-01/src/nwpw/libraryps/>
Generating 1d pseudopotential for S
Generated formatted_filename: ./perm/S.vpp
Clicked A Few Times
Threads 1
Posts 6
2:43:08 PM PST - Mon, Jan 7th 2013
I unset USE_MPIF4 and rebuilt. I still see the same behavior.
Gabe
Bylaska Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
Clicked A Few Times
Threads 0
Posts 34
9:04:32 AM PST - Tue, Jan 8th 2013
Try setting
setenv USE_MPIF4 y
instead of
setenv USE_MPIF4 yes
Clicked A Few Times
Threads 1
Posts 6
9:13:23 AM PST - Tue, Jan 8th 2013
Quote: Bylaska, Jan 8th 5:04 pm
Try setting
setenv USE_MPIF4 y
instead of
setenv USE_MPIF4 yes
I actually was setting USE_MPIF4=y; the 'yes' was a typo in my original post. I appreciate the reply, however.
Fortunately, after picking the brain of a colleague at another site and poring over yet more forum posts, I came up with a winning combination late yesterday by adding the following to my build environment:
export ARMCI_NETWORK=SPAWN
export MSG_COMMS=MPI
I've tested my plane wave examples up to 24 processes with success and will now turn things over to my users for verification.
Thanks for the help!
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
Forum Vet
Threads 4
Posts 597
12:34:13 PM PST - Thu, Jan 10th 2013
To use the MPI spawn approach you would have to set ARMCI_NETWORK=MPI-SPAWN.
You got it to compile, but it built something other than what you thought: it actually built with the tcgmsg/MPI network.
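In other words, the intended setting would be something like:
export ARMCI_NETWORK=MPI-SPAWN
# with plain SPAWN the build apparently fell back to the tcgmsg/MPI layer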
Bert