Armci error 260 cond:0
From NWChem
Viewed 1798 times, With a total of 12 Posts
5:44:36 PM - Tue, Nov 2nd 2010
Hello, for the last few weeks I have been trying to analyse an NWChem crash.
The input of the calculation is from the benchmarks on this site, called C240 Buckminster Fullerene.
It is being run on 32 nodes, each with two Xeon CPUs with hyperthreading enabled, so each compute
node has 4 computational units. The network interconnect is plain Gigabit Ethernet.
The first crashes were with a home-built binary using -O3 compiler optimisation. I then rebuilt it with
-O2 optimisation: everything stops at exactly the same spot, and both binaries stop after a computation
of almost equal duration. Both builds were done with Intel MKL, so the next step is to remove MKL and
see what happens. The program is also built with MPICH2 and the ifort compiler.
It seems that ARMCI is somehow incorrectly configured or does not know how to communicate.
The significant error seems to be ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
I still have not dug into the code to find out what that means.
Here is an excerpt from the NWChem log.
dft energy failed 0
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
278: task dft energy
------------------------------------------------------------------------
------------------------------------------------------------------------
This type of error is most commonly
associatated with calculations not reaching convergence criteria
------------------------------------------------------------------------
For more information see the NWChem manual at
http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
For further details see manual section:
0:0:dft energy failed:: 0
(rank:0 hostname:j314.jotunn.rhi.hi.is pid:13071):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
I am working on testing some alternatives: eliminating MKL, eliminating BLAS altogether, and trying ATLAS and LAPACK.
Should I use the Intel C compiler instead of GNU CC?
Best regards, Anna Jonna.
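The DASSERT line in the log above records the rank, host, PID, source file, function and source line that triggered the abort, and the runs in this thread fail at several different sites. To compare crashes across many ranks or runs, a small parser can tally those sites. This is a minimal sketch, assuming the line format is exactly as printed in these logs (other ARMCI versions may format it differently); the function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# (rank:0 hostname:j314.jotunn.rhi.hi.is pid:13071):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
# Assumption: the format is exactly as it appears in the logs in this thread.
DASSERT_RE = re.compile(
    r"\(rank:(?P<rank>\d+) hostname:(?P<host>\S+) pid:(?P<pid>\d+)\):"
    r"ARMCI DASSERT fail\. (?P<file>[\w.]+):(?P<func>\w+)\(\):(?P<line>\d+)"
    r" cond:(?P<cond>\S+)"
)


def parse_dassert(log_text):
    """Return one dict of string fields per DASSERT line found in log_text."""
    return [m.groupdict() for m in DASSERT_RE.finditer(log_text)]


def tally_sites(log_text):
    """Count how often each file:function():line abort site appears."""
    return Counter(
        f"{r['file']}:{r['func']}():{r['line']}" for r in parse_dassert(log_text)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 dassert_sites.py nwchem.out
    with open(sys.argv[1]) as fh:
        for site, count in tally_sites(fh.read()).most_common():
            print(f"{count:5d}  {site}")
```

Grouping by site rather than by rank makes it easy to see whether every rank died in the same place (as in the SigBusHandler():213 case later in this thread) or whether one rank called ARMCI_Error() and the others were simply aborted.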
7:58:23 PM - Tue, Nov 2nd 2010
Running a precompiled binary from this site gives the same error, but much sooner.
This time an error is reported while creating Global Arrays.
This is with the same mpirun as before, i.e. MPICH2 compiled with ifort.
I am going to build it again with the GNU Fortran compiler and see what happens.
Screening Tolerance Information
-------------------------------
Density screening/tol_rho: 1.00D-10
AO Gaussian exp screening on grid/accAOfunc: 14
CD Gaussian exp screening on grid/accCDfunc: 20
XC Gaussian exp screening on grid/accXCfunc: 20
Schwarz screening/accCoul: 1.00D-08
------------------------------------------------------------------------
dft_main0d: Error creating ga 0
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
278: task dft energy
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
For more information see the NWChem manual at http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
For further details see manual section:
Last System Error Message from Task 0:: Inappropriate ioctl for device
0:0:dft_main0d: Error creating ga:: 0
(rank:0 hostname:j314.jotunn.rhi.hi.is pid:24216):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
0: ARMCI aborting 0 (0).
0: ARMCI aborting 0 (0).
system error message: Invalid argument
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This suggests you are not providing enough memory. Are you setting the memory keyword in the input? If so, 1) what is the size, and 2) how much memory do you have per node? Remember that the memory allocation is per core/processor in a node. And, you need to leave some memory for the operating system.
Bert
Edited On 10:07:14 PM - Thu, Nov 4th 2010 by Bert
6:41:42 PM - Thu, Nov 4th 2010
Similar problems
Hello!
First, I should say that I have a similar problem, not an answer to the last post... sorry.
In my case I'm doing an RI-MP2 geometry optimization of a complex containing Ni, C, O and H atoms.
I successfully ran a first optimization with the cc-pVDZ basis set for the light atoms and 6-31G (and, in a later calculation, 6-31G**) for the Ni atoms (a total of 388 basis functions).
The problems arise when I change to a bigger basis set for the Ni atoms.
If I change Ni to cc-pVDZ (specified explicitly, available from http://tyr0.chem.wsu.edu/~kipeters/basis.html) or to cc-pVTZ (specified explicitly or with the line "Ni library cc-pVTZ"), the process fails with the error:
1:Bus error, status=: 7
(rank:1 hostname:cl1n006 pid:28807):ARMCI DASSERT fail. signaltrap.c:SigBusHandler():213 cond:0
4:Bus error, status=: 7
2:Bus error, status=: 7
(rank:2 hostname:cl1n006 pid:28813):ARMCI DASSERT fail. signaltrap.c:SigBusHandler():213 cond:0
(rank:4 hostname:cl1n006 pid:28809):ARMCI DASSERT fail. signaltrap.c:SigBusHandler():213 cond:0
There are no input errors, and the same happens for a single-point energy or a DFT (XC B3LYP) trial.
- Could the problem be related to the total memory specification?
I'm using the line:
memory 30000 mb
- Is there some rule for estimating the amount of memory a calculation in NWChem requires?
Thanks in advance for any suggestion!
Good luck!
NWChem is running on an SGI Altix cluster, compiled with the Intel Fortran compilers, InfiniBand support and MPI.
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
10:05:16 PM - Thu, Nov 4th 2010
Quote: Diegoagomezh, Nov 4th 6:41 pm (quoted post above)
With that memory line you are requesting 30 GByte of memory per processor (per core). Memory needs depend on the calculation, but generally the total you request across all processes on a node should not exceed the memory in that node. I.e., if you have an 8-core node with 20 GByte of memory, I would probably use "memory 2000 mb" and leave some space for the operating system to run in.
Bert
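Bert's point is that `memory` is a per-process limit, so what has to fit in a node is the setting multiplied by the number of processes on that node. That arithmetic can be sketched as below. The 2 GB operating-system reserve, the 1024 MB per GB convention and the function names are assumptions for illustration, and NWChem's real footprint can differ from the `memory` setting.

```python
# Assumptions (not taken from NWChem itself): 1 GB = 1024 MB, and a fixed
# reserve for the operating system (2 GB by default, a hypothetical value).
MB_PER_GB = 1024


def node_headroom_mb(memory_mb, procs_per_node, node_mem_gb, os_reserve_gb=2.0):
    """Spare MB on one node; a negative result means the request oversubscribes it.

    The NWChem `memory` setting applies to each process, so the node total
    is memory_mb * procs_per_node.
    """
    usable_mb = (node_mem_gb - os_reserve_gb) * MB_PER_GB
    return usable_mb - memory_mb * procs_per_node


def max_memory_per_process_mb(procs_per_node, node_mem_gb, os_reserve_gb=2.0, step_mb=100):
    """Largest per-process setting, rounded down to step_mb, that still fits."""
    usable_mb = (node_mem_gb - os_reserve_gb) * MB_PER_GB
    return int(usable_mb / procs_per_node // step_mb) * step_mb


if __name__ == "__main__":
    # Bert's example node: 8 cores, 20 GB.
    print(max_memory_per_process_mb(8, 20))  # per-process ceiling under these assumptions
    print(node_headroom_mb(2000, 8, 20))     # "memory 2000 mb": positive, so it fits
    print(node_headroom_mb(30000, 8, 20))    # "memory 30000 mb": strongly negative
```

Under these assumptions an 8-core, 20 GB node allows at most 2300 MB per process, so "memory 2000 mb" fits with about 2.4 GB to spare, while "memory 30000 mb" oversubscribes the same node many times over. For a 12 GB, 8-core node the ceiling is 1200 MB, and "memory 1gb" (1024 MB) leaves 2 GB beyond the reserve.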
10:48:45 AM - Fri, Nov 5th 2010
Hello..
Bert, thank you for your answer!
I have partially solved my problem...
I had forgotten the "spherical nosegment" keywords (required for the correlation-consistent basis sets) in the BASIS directive line.
Actually, I don't know why the calculations with cc-pVDZ (for C, O and H) and 6-31G (for Ni) finished successfully without the "spherical nosegment" statement (perhaps because 6-31G was present?).
Well...
After this correction, and after fixing the memory line according to Bert's comment, the
"ARMCI DASSERT fail. signaltrap.c:SigBusHandler():213 cond:0" error apparently went away.
However, now the calculation stops after the first SCF energy calculation, when the RI-MP2 module starts. I get the error:
1:Segmentation Violation error, status=: 11
(rank:1 hostname:cl1n006 pid:3330):ARMCI DASSERT fail. signaltrap.c:SigSegvHandler():301 cond:0
Anna,
In one trial I got the "Armci error 260 cond:0" and the problem was a wrong keyword in the BASIS directive: I wrote "nosegmented" (wrong) instead of "nosegment" (right). Perhaps your problem is related to this.
Thanks again for any reply!
6:50:37 PM - Fri, Nov 5th 2010
Quote: Diegoagomezh, Nov 5th 10:48 am (quoted post above)
The spherical nosegment setting should not be the issue. I would strongly recommend removing the nosegment keyword: there is no reason to use it (it is not required for the basis set) and it increases memory usage. I would have to see an input deck so that I can test it and give you more input.
Bert
10:04:48 AM - Mon, Nov 8th 2010
Hello again...
Bert,
Yes, the "nosegment" keyword is not required for the basis (the "spherical" keyword is the one required for the cc-pVXZ basis sets; I was wrong).
However, when I run without the "nosegment" keyword, the process stops without any message before the initial guess is created. When I add it back, the calculation completes the first SCF calculation and stops when the RI-MP2 module starts. Here is a copy of my input...
Thank you again for your help!
START cokni_1h
title "Pw-ni + 1H2 P. optimization cc-pVDZ/cc-pVTZ(NiII)"
ECHO
memory 3000 mb noverify
Geometry units angstrom print NOAUTOZ
Ni 0.00000 0.00000 1.20508
Ni 0.00000 0.00000 -1.20508
O 1.87837 -0.04938 -1.13303
O 1.87837 -0.04938 1.13303
O 0.04938 1.87837 -1.13303
O 0.04938 1.87837 1.13303
O -1.87837 0.04938 -1.13303
O -1.87837 0.04938 1.13303
O -0.04938 -1.87837 -1.13303
O -0.04938 -1.87837 1.13303
C 0.05619 2.45769 0.00000
C 2.45769 -0.05619 0.00000
C -2.45769 0.05619 0.00000
C -0.05619 -2.45769 0.00000
C 3.96341 -0.04562 0.00000
H 4.35033 -0.53051 -0.90513
H 4.30455 1.00279 0.00000
H 4.35033 -0.53051 0.90513
C 0.04562 3.96341 0.00000
H -1.00279 4.30455 0.00000
H 0.53051 4.35033 0.90513
H 0.53051 4.35033 -0.90513
C -3.96341 0.04562 0.00000
H -4.30455 -1.00279 0.00000
H -4.35033 0.53051 0.90513
H -4.35033 0.53051 -0.90513
C -0.04562 -3.96341 0.00000
H 1.00279 -4.30455 0.00000
H -0.53051 -4.35033 0.90513
H -0.53051 -4.35033 -0.90513
H -0.00000103 -0.37759827 3.95105265
H 0.00000103 0.37759827 3.95105265
end
basis spherical nosegment noprint
O library cc-pVDZ
C library cc-pVDZ
H library cc-pVDZ
Ni library cc-pVTZ
end
basis "ri-mp2 basis"
O library cc-pVDZ-fit2-1
C library cc-pVDZ-fit2-1
H library cc-pVDZ-fit2-1
Ni S
1997.8237701 1.0000000
Ni S
1097.2416591 1.0000000
Ni S
496.28986263 1.0000000
Ni S
196.51539911 1.0000000
Ni S
87.216971652 1.0000000
Ni S
35.137417884 1.0000000
Ni S
11.454909043 1.0000000
Ni S
4.1042987424 1.0000000
Ni S
2.7859778875 1.0000000
Ni S
1.5965165491 1.0000000
Ni S
0.49186103149 1.0000000
Ni S
0.28416654012 1.0000000
Ni S
0.12616265844 1.0000000
Ni P
650.35199200 1.0000000
Ni P
184.43208755 1.0000000
Ni P
47.809364688 1.0000000
Ni P
15.580224065 1.0000000
Ni P
7.5148604075 1.0000000
Ni P
3.8056570107 1.0000000
Ni P
2.3464576499 1.0000000
Ni P
0.93782324546 1.0000000
Ni P
0.51838194511 1.0000000
Ni P
0.21837657698 1.0000000
Ni P
0.48219094780E-01 1.0000000
Ni D
146.64155782 1.0000000
Ni D
44.430191232 1.0000000
Ni D
19.451082526 1.0000000
Ni D
8.4371827589 1.0000000
Ni D
4.1107905672 1.0000000
Ni D
2.3210325339 1.0000000
Ni D
0.97427936878 1.0000000
Ni D
0.46721224829 1.0000000
Ni D
0.20770386175 1.0000000
Ni F
48.616960788 1.0000000
Ni F
11.017689500 1.0000000
Ni F
5.4975283859 1.0000000
Ni F
2.7863055316 1.0000000
Ni F
1.2328473329 1.0000000
Ni F
0.61191666508 1.0000000
Ni F
0.27072533276 1.0000000
Ni G
18.775427434 1.0000000
Ni G
6.3224049104 1.0000000
Ni G
2.8789771807 1.0000000
Ni G
0.93385065596 1.0000000
Ni G
0.35342268260 1.0000000
Ni H
9.6049097340 1.0000000
Ni H
5.0249732323 1.0000000
Ni H
1.9659049861 1.0000000
Ni I
9.8702668554 1.0000000
Ni I
4.5127683520 1.0000000
end
constraints
fix atom 1:30
end
scf
print low
end
mp2
freeze atomic
end
TASK RIMP2 optimize
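The Ni block of the "ri-mp2 basis" in the input above is entered as uncontracted single-primitive shells (13 S, 11 P, 9 D, 7 F, 5 G, 3 H and 2 I). Counting the functions these shells generate gives a rough sense of how large the fitting basis is, and how much that depends on whether shells are treated as spherical (2l+1 functions) or Cartesian ((l+1)(l+2)/2 functions). The sketch below only does that counting; it is not NWChem's memory estimate, and the names are hypothetical. As we understand the NWChem documentation, a basis block uses Cartesian functions unless `spherical` is given on its `basis` line, and the "ri-mp2 basis" line above does not specify it; treat that as something to verify, not as a diagnosis of the crash.

```python
# Shell counts per angular momentum in the Ni block of the "ri-mp2 basis"
# input above (each shell is a single primitive with coefficient 1.0).
NI_FIT_SHELLS = {"S": 13, "P": 11, "D": 9, "F": 7, "G": 5, "H": 3, "I": 2}

# Angular momentum quantum number l for each shell label.
L_OF = {"S": 0, "P": 1, "D": 2, "F": 3, "G": 4, "H": 5, "I": 6}


def spherical_functions(l):
    """Number of spherical-harmonic functions in a shell of angular momentum l."""
    return 2 * l + 1


def cartesian_functions(l):
    """Number of Cartesian Gaussian functions in a shell of angular momentum l."""
    return (l + 1) * (l + 2) // 2


def count_functions(shells, spherical=True):
    """Total basis functions generated by a {label: shell_count} mapping."""
    per_shell = spherical_functions if spherical else cartesian_functions
    return sum(n * per_shell(L_OF[label]) for label, n in shells.items())


if __name__ == "__main__":
    n_ni = 2  # two Ni atoms in the geometry above
    for spherical in (True, False):
        per_atom = count_functions(NI_FIT_SHELLS, spherical)
        kind = "spherical" if spherical else "Cartesian"
        print(f"{kind}: {per_atom} per Ni atom, {n_ni * per_atom} for both Ni")
```

This gives 244 spherical versus 364 Cartesian fitting functions per Ni atom (488 versus 728 for the two Ni atoms), before the O, C and H fitting sets are added.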
9:17:59 PM - Fri, Mar 25th 2011
Any updates on this issue? We have a user whose jobs are running into the "ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0" error in a spinspin calculation.
I will post details once we do some more testing on the issue.
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
9:11:15 PM - Tue, Mar 29th 2011
No solution yet. Any DASSERT error message generally points to you asking for more memory than is available on the system. What is the memory keyword in the spinspin case, how much memory is on a node, and how many cores are on a node? You need to leave some memory for the operating system, etc.
Bert
Quote: Mar 25th 9:17 pm (quoted post above)
5:57:54 PM - Wed, Mar 30th 2011
We have tried using as little as "memory 1gb" for a job using all 8 cores of a node with 12 GB of memory.
I was told that a job doing only a geometry optimization also gave the same error.
I intend to look at the code to get a better idea of where and why this happens, but my programming skills are quite limited, as is my time.
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
7:11:47 PM - Fri, Apr 1st 2011
I ran the input deck on 128 cores in under 2 hours with 3 GByte per core. In your case, with 12 GByte per node and 8 cores, I would recommend the 1 GByte memory setting in NWChem. How many cores did you run this test on? I will try to reproduce it with your number of cores and your memory settings.
Bert
Quote: Mar 30th 5:57 pm (quoted post above)
6:25:29 AM - Fri, Jun 17th 2011
Problem with MD optimize
I run an MD job with mpirun -n 2 nwchem and get:
ERROR
0:0:nga_put_common:cannot locate region: [28394:76246 ,28394:76246 ]:: -999
(rank:0 hostname:master.cluster pid:24491):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
Do you have any suggestions?
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
10:24:57 PM - Wed, Jul 6th 2011
Please start new threads on new items. Here is your item from another email:
Hello,
I have compiled NWChem 6.0 on the cluster, using OpenMPI compiled with Intel.
I want to use InfiniBand.
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
#!/bin/bash
export TCGRSH=/usr/bin/ssh
export USE_MPI=yes
export USE_MPIF=yes
export MPI_LOC=/usr/mpi/intel/openmpi-1.4.2/
export MPI_INCLUDE=$MPI_LOC/include
export MPI_LIB=$MPI_LOC/lib64
export LIBMPI="-L $MPI_LIB -lmpi -lopen-pal -lopen-rte -lmpi_f90 -lmpi_f77"
ARMCI_NETWORK=OPENIB
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all"
export NWCHEM_EXECUTABLE=$NWCHEM_TOP/bin/LINUX64/nwchem
cd $NWCHEM_TOP/src
make CC=icc FC=ifort -j4
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
OK, when I run mpirun -n 1 nwchem 1.rwc, there is no problem; it runs.
But when I run mpirun -n 2 nwchem 1.rwc, it stops:
0:0:nga_put_common:cannot locate region: [28394:76246 ,28394:76246 ]:: -999
What is the problem?
Could you please help me? I have spent a lot of time on this, but nothing works.
Best, Christophe
Your build is missing the setup for InfiniBand. See the InfiniBand section of the BUILD file for the environment variables to set when compiling for IB.
Bert
Quote: Bovigny, Jun 17th 6:25 am (quoted post above)
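Two things in Christophe's build script are worth checking before running make. First, ARMCI_NETWORK=OPENIB is assigned without `export`, so in bash it is not passed to child processes such as make (unless it was already exported elsewhere). Second, the IB-specific variables Bert refers to are not set at all. A check run as a child process of the build shell only sees exported variables, so it catches both problems. This is a sketch: the names IB_HOME, IB_INCLUDE, IB_LIB and IB_LIB_NAME are recalled from the NWChem 6.x InfiniBand build notes and are an assumption here; confirm the authoritative list in the BUILD/INSTALL file of your release. The function names are hypothetical.

```python
import os

# Variables an NWChem 6.x InfiniBand (OPENIB) build is assumed to need.
# Assumption: verify against the InfiniBand section of your BUILD/INSTALL file.
REQUIRED_IB_VARS = ["ARMCI_NETWORK", "IB_HOME", "IB_INCLUDE", "IB_LIB", "IB_LIB_NAME"]


def missing_vars(env, required=REQUIRED_IB_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def report(env):
    """Build a short human-readable report for the given environment mapping."""
    missing = missing_vars(env)
    lines = [f"ARMCI_NETWORK = {env.get('ARMCI_NETWORK', '<not exported>')}"]
    if missing:
        lines.append("unset or not exported: " + ", ".join(missing))
    else:
        lines.append("all assumed IB build variables are set")
    return "\n".join(lines)


if __name__ == "__main__":
    # os.environ only contains variables the parent shell has exported.
    print(report(os.environ))
```

Placed in the build script just before the `make` line (for example as `python3 check_ib_env.py`, a hypothetical file name), the script as posted would report ARMCI_NETWORK as not exported, along with the missing IB variables.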