Segmentation Violation error for Cr2 PBE aug-cc-pvqz
From NWChem
|
Clicked A Few Times
Threads 6
Posts 37
|
|
8:20:47 AM PST - Wed, Dec 21st 2011 |
|
Hi,
I'm getting the following error with both NWChem 6.0 and the development version http://www.nwchem-sw.org/images/Nwchem-src-2011-Oct-25.tar.gz
(installed from the RPMs at http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id262/RPMS_of_NWchem.html on CentOS 5 x86_64):
....
127:Segmentation Violation error, status=: 11
(rank:127 hostname:XXX pid:12303):ARMCI DASSERT fail. signaltrap.c:SigSegvHandler():301 cond:0
....
Problem looks similar to:
http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id43/#post_533
I ran the input below (structure from doi:10.1063/1.2162161) on up to 128 cores (2 GB memory per core).
charge 0.0
geometry noautoz noautosym
Cr 8.0 8.0 8.0
Cr 8.0 8.0 9.679
end
basis spherical
* library aug-cc-pvqz
end
dft
mult 1
xc xpbe96 cpbe96
iterations 300
convergence gradient 0.0005
convergence energy 1e-06
convergence density 1e-05
convergence nolevelshifting
grid coarse nodisk
smear 0.0
tolerances tight
direct
noio
end
property
dipole
end
memory total 1500 Mb noverify
task dft energy
aug-cc-pvtz runs fine on 4 cores.
None of the following fixes the problem with aug-cc-pvqz (tried with NWChem 6.0 on 4 cores):
0. running http://www.nwchem-sw.org/images/Nwchem-6.0-binary-redhat-5-5-gcc-4-1-2.tgz in serial on a system with 24 GB memory with "memory total 22000 Mb noverify"
1. using plain geometry (instead of geometry noautoz noautosym)
2. grid medium, with the direct and noio keywords removed
3. basis cartesian
|
|
|
-
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
|
|
Forum Regular
Threads 2
Posts 286
|
|
3:47:20 PM PST - Thu, Dec 22nd 2011 |
|
I'll need more info than the last line of the output. Can you please provide the last part of the output so I can see where it fails?
Bert
|
|
|
|
Clicked A Few Times
Threads 6
Posts 37
|
|
6:49:49 AM PST - Wed, Dec 28th 2011 |
|
Hi,
it fails at:
Superposition of Atomic Density Guess
Sum of atomic energies: -2762.66188442
0:Segmentation Violation error, status=: 11
....
|
|
|
-
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
|
|
Forum Regular
Threads 2
Posts 286
|
|
2:14:29 PM PST - Wed, Dec 28th 2011 |
|
Could you try reducing the memory per core, i.e. "memory total 1000 mb".
Bert
Quote: Marcindulak, Dec 28th 1:49 pm
Hi,
it fails at:
Superposition of Atomic Density Guess
Sum of atomic energies: -2762.66188442
0:Segmentation Violation error, status=: 11
....
|
|
|
|
Clicked A Few Times
Threads 6
Posts 37
|
|
10:42:47 AM PST - Thu, Dec 29th 2011 |
|
Hi,
it fails with "memory total 500 Mb noverify" on 16 cores, 3GB per core.
The aug-cc-pvtz job runs in serial on a single node with 24 GB memory with "memory total 500 Mb noverify",
and top reports about 100 MB resident size ("RES") while it runs.
Is it really a memory problem? See point 0 in my first post.
|
|
|
|
Clicked A Few Times
Threads 6
Posts 37
|
|
12:58:21 AM PST - Thu, Jan 5th 2012 |
|
What helps (a weird experiment) is to replace the first S function of the cc-pvqz type (the one starting with exponent 11016640.0000000) with the one of the cc-pvtz type (starting with exponent 61771940.0000000), keeping the rest of the (aug-)cc-pvqz basis untouched. For the Cr atom, the Summary of "ao basis" then changes from "32 140 9s8p6d4f3g2h" (aug-cc-pvqz) to "31 139 8s8p6d4f3g2h" (the created hybrid).
Is it a problem with the cc-pvqz basis set family, or a problem with how NWChem handles it? I see similar crashes for almost all 3d-element dimers, also with the cc-pv5z set. It is even possible to trim the cc-pvqz basis to a much smaller one (as long as it contains the first s-type function), so one could investigate the crash in a debugger.
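The shell and function counts quoted in the two "ao basis" summaries can be checked by hand: with "basis spherical", each shell of angular momentum l contributes 2l+1 functions. A minimal sketch in POSIX shell follows; the helper name count_basis is made up for illustration, and it assumes the summary string contains only s through h shells.

```shell
# Hypothetical helper (not part of NWChem): count shells and spherical
# basis functions from a summary string such as "9s8p6d4f3g2h".
# The body runs in a subshell so its variables do not leak to the caller.
count_basis() (
  rest=$1
  shells=0
  funcs=0
  while [ -n "$rest" ]; do
    num=${rest%%[spdfgh]*}            # leading digits before the shell letter
    case $num in
      ''|*[!0-9]*) echo "cannot parse: $rest" >&2; exit 1 ;;
    esac
    rest=${rest#"$num"}               # drop the digits
    letter=${rest%"${rest#?}"}        # first remaining character
    rest=${rest#?}                    # drop the letter
    case $letter in
      s) per=1 ;;                     # 2l+1 functions per spherical shell
      p) per=3 ;;
      d) per=5 ;;
      f) per=7 ;;
      g) per=9 ;;
      h) per=11 ;;
      *) echo "unknown shell type: '$letter'" >&2; exit 1 ;;
    esac
    shells=$((shells + num))
    funcs=$((funcs + num * per))
  done
  echo "$shells $funcs"
)

count_basis 9s8p6d4f3g2h   # aug-cc-pvqz summary from the post
count_basis 8s8p6d4f3g2h   # the hybrid basis from the post
```

The two calls print "32 140" and "31 139", matching the numbers in the summaries, so the hybrid differs from aug-cc-pvqz by exactly one s shell.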
|
|
|
-
Edoapra Forum:Admin, Forum:Mod, bureaucrat, sysop
|
|
Gets Around
Threads 0
Posts 45
|
|
5:02:32 PM PST - Thu, Jan 5th 2012 |
|
patch
|
Marcin
Please apply the patch below to the directory $NWCHEM_TOP/src/NWints/texas
http://www.nwchem-sw.org/images/Ab_prime2.patch.gz
In other words, please do the following
cd $NWCHEM_TOP/src/NWints/texas
wget http://www.nwchem-sw.org/images/Ab_prime2.patch.gz
gzip -d Ab_prime2.patch.gz
patch -p0 < Ab_prime2.patch
and recompile.
Please let me know if this works for you, too.
Cheers, Edo
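Before recompiling, it can be worth checking that the patch applies cleanly. A minimal sketch, assuming GNU patch is installed (its --dry-run option reports what would happen without modifying any files); the helper name apply_patch_checked is made up for illustration and is not part of NWChem.

```shell
# Hypothetical helper: apply a patch only if a dry run succeeds.
# Assumes GNU patch; uses -p0 as in the instructions above.
apply_patch_checked() {
  patch_file=$1
  if patch -p0 --dry-run < "$patch_file" > /dev/null; then
    patch -p0 < "$patch_file"
  else
    echo "patch does not apply cleanly: $patch_file" >&2
    return 1
  fi
}

# Usage, from $NWCHEM_TOP/src/NWints/texas after downloading and
# unpacking the patch as described in the post:
#   apply_patch_checked Ab_prime2.patch
```

If the dry run fails, nothing in the source tree is touched, which makes it easier to tell a mismatched source version from a build problem.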
|
|
|
|
|
1:40:29 PM PST - Fri, Jan 6th 2012 |
|
Thanks. The 6.0 patched version with aug-cc-pvqz passes the GUESS.
The patched development version http://www.nwchem-sw.org/images/Nwchem-src-2011-Oct-25.tar.gz, however,
never reaches the GUESS (last print "Schwarz screening/accCoul: 1.00D-08") and fails with:
2:2:ga_matmul:ga_matmul_irreg:xerbla:double: lapack error:: 911
(rank:2 hostname:XXX pid:18049):ARMCI DASSERT fail. ../../ga-5-0/armci/src/armci.c:ARMCI_Error():279 cond:0
** On entry to DGEMM parameter number 8 had an illegal value
xerbla:double: lapack error 911
Problem looks similar to http://www.emsl.pnl.gov/docs/nwchem/nwchem-support/2009/09/0010.Re:_NWCHEM_NWChem_OpenMPI:...
I compiled both patched versions with rpmbuild on my personal CentOS 5 x86_64 machine (I haven't used build.opensuse.org),
linking against the CentOS 5 blas/lapack. Since the build is partly manual, this could be due to a mistake.
Interestingly my non-patched Nwchem-src-2011-Oct-25 fails at GUESS, similarly to the non-patched 6.0:
0:Segmentation Violation error, status=: 11
(rank:0 hostname:XXX pid:18468):ARMCI DASSERT fail. ../../ga-5-0/armci/src/signaltrap.c:SigSegvHandler():312 cond:0
Could these be 32/64-bit issues?
Does your installation (I assume you used a development source) work after patching on x86_64?
|
|
|
-
Edoapra Forum:Admin, Forum:Mod, bureaucrat, sysop
|
|
Gets Around
Threads 0
Posts 45
|
|
2:05:17 PM PST - Fri, Jan 6th 2012 |
|
Yes
Your ga_matmul error seems to be a 32-bit vs 64-bit integer issue.
My development computer is an x86_64 box where I compile using the LINUX64 target.
When I use an optimized LAPACK/BLAS that uses 32-bit integers (a.k.a. integer*4 in Fortran parlance), I use the 64_to_32 steps described in the $NWCHEM_TOP/install file:
1) cd $NWCHEM_TOP/src
2) make clean
3) make 64_to_32
4) make USE_64TO32=y HAS_BLAS=yes BLASOPT=" optimized BLAS"
e.g. for IBM64: make USE_64TO32=y HAS_BLAS=yes BLASOPT="-lessl -lmass"
If this still does not work, you might need to recompile the tools directory from scratch, after patching the tools GNUmakefile ($NWCHEM_TOP/src/tools/GNUmakefile) with the following patch:
http://www.nwchem-sw.org/images/GNUmakefile.toolsoct25.patch.gz
You also need to define the environment variables BLAS_LIB (same value as BLASOPT) and BLAS_SIZE=4.
In other words (the following instructions are for csh/tcsh):
1) cd $NWCHEM_TOP/src/tools
2) wget http://www.nwchem-sw.org/images/GNUmakefile.toolsoct25.patch.gz
3) gzip -d GNUmakefile.toolsoct25.patch.gz
4) patch -p0 < GNUmakefile.toolsoct25.patch
5) rm -rf build install
6) setenv BLAS_LIB "location of optimized blas/lapack"
7) setenv BLAS_SIZE 4
8) recompile
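For users of bash or another sh-style shell, steps 6 and 7 would look roughly like this. This is only a syntax translation of the csh/tcsh commands; the BLAS_LIB value is the same placeholder text used in the post and has to be replaced with the actual link flags for your BLAS/LAPACK.

```shell
# sh/bash equivalents of the csh "setenv" lines in steps 6 and 7.
# The BLAS_LIB value is the placeholder from the post, not a real setting:
# substitute the link flags of your optimized BLAS/LAPACK.
export BLAS_LIB="location of optimized blas/lapack"
# BLAS_SIZE=4 as given in the post, for a BLAS built with 32-bit integers.
export BLAS_SIZE=4
```

Both variables need to be set in the same shell session that runs the rebuild in step 8, since they are read from the environment at build time.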
|
|
|
|
|
6:55:42 AM PST - Mon, Jan 9th 2012 |
|
Thanks again. For the moment I see an effect of the patch with aug-cc-pv5z on Co2:
ab_prim_2: increased dimmx to:: 729
So some variables still need to be tweaked. Please consider that when making the 6.1.0 release, if possible.
I will have a look at the 32/64-bit problems at a later time.
|
|
|