From NWChem
2:30:23 PM PST - Mon, Dec 2nd 2013
I compiled NWChem 6.3 with the following settings:
setenv NWCHEM_TARGET LINUX64
setenv USE_MPI y
#setenv ARMCI_NETWORK VAPI
#setenv ARMCI_NETWORK MPI
#setenv ARMCI_NETWORK MPI2
setenv ARMCI_NETWORK OPENIB
setenv IB_HOME /usr
setenv IB_INCLUDE $IB_HOME/include
setenv IB_LIB $IB_HOME/lib64
setenv IB_LIB_NAME "-libverbs -libumad -lpthread"
setenv MA_USE_ARMCI_MEM 1
setenv MPI_LOC /opt/sgi/mpt/mpt-2.08
#setenv MPI_LOC /app/intel/impi/4.0.3.008/intel64/
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LIBMPI -lmpi
setenv NWCHEM_MODULES all
setenv DISABLE_F77 1
setenv MKL_LIB /app/intel/mkl/lib/intel64
setenv MKL_INC /app/intel/mkl/include
setenv INTEL_LIB /app/intel/lib/intel64/
setenv LASOPT "-L$MKL_LIB -I$MKL_INC -L$INTEL_LIB -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm"
However, the binary fails whenever it runs across multiple nodes (it works within a single node), no matter which ARMCI_NETWORK setting I use. It gives the following errors:
From the .out file:
argument 1 = kiet_scf.nw
-10016:Segmentation Violation error, status=: 11
(rank:-10016 hostname:r28i1n16 pid:2263722):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
16:Child process terminated prematurely, status=: 256
(rank:16 hostname:r28i1n16 pid:2263705):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigChldHandler():178 cond:0
-10000:Segmentation Violation error, status=: 11
(rank:-10000 hostname:r27i0n17 pid:4037891):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
0:Child process terminated prematurely, status=: 256
(rank:0 hostname:r27i0n17 pid:4037874):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigChldHandler():178 cond:0
From the error file:
ARMCI master: wait for child process (server) failed:: No child processes
MPT: Global rank 16 is aborting with error code 256.
Process ID: 2263705, Host: r28i1n16, Program: /work1/app/nwchem/nwchem-6.3.revision2b/bin/LINUX64/nwchem
Please advise!