From NWChem
|
Just Got Here
Threads 1
Posts 4
|
|
6:16:26 PM - Mon, Mar 14th 2011 |
|
Hi all!
I have recently compiled NWChem with OpenIB (ifort 12.0.2, openmpi-1.4.2, OFED-1.5.2) and tried to run a CCSD calculation. The memory limit in my input file is 'memory total 1800 mb'. This input runs well with 8 MPI processes on a single node with 16 GB of RAM, consuming about 1.1 GB of memory per process at the CCSD stage.
The same input fails to run when 16 MPI processes are distributed over 2 nodes. The error is:
(rank:0 hostname:n306 pid:14660):ARMCI DASSERT fail. openib.c:armci_server_register_region():971 cond:(memhdl->memhndl!=((void *)0))
And a message in stderr:
7: WARNING:armci_set_mem_offset: offset changed 4096 to 9859072
Last System Error Message from Task 0:: Cannot allocate memory
This input runs well with 32 MPI processes.
The problem is that I cannot run production CCSD jobs (those are quite big).
Thanks for your kind support!
Roman Zubatyuk.
|
|
|
-
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
|
|
Forum Regular
Threads 2
Posts 238
|
|
8:18:21 PM - Thu, Mar 17th 2011 |
|
Try running with the following environment variable set:
ARMCI_DEFAULT_SHMMAX 2048
or 4096.
Bert
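A minimal sketch of how the suggestion above might be applied, assuming an Open MPI launch. It builds the `mpirun` command line and environment with `ARMCI_DEFAULT_SHMMAX` set, and uses Open MPI's `-x` option to forward the variable to ranks on the other node. The executable name `nwchem`, the input file name `ccsd.nw`, and the helper `build_launch` are placeholders for illustration, not taken from this thread.

```python
import os
import shlex


def build_launch(shmmax_mb, nprocs, input_file, nwchem_bin="nwchem"):
    """Return (argv, env) for an Open MPI launch with ARMCI_DEFAULT_SHMMAX set.

    nwchem_bin and input_file are hypothetical names; adjust to your install.
    """
    env = dict(os.environ)
    env["ARMCI_DEFAULT_SHMMAX"] = str(shmmax_mb)
    argv = [
        "mpirun",
        "-np", str(nprocs),
        # -x exports the named variable from the local environment
        # to the remote MPI processes.
        "-x", "ARMCI_DEFAULT_SHMMAX",
        nwchem_bin,
        input_file,
    ]
    return argv, env


argv, env = build_launch(2048, 16, "ccsd.nw")
print("ARMCI_DEFAULT_SHMMAX=" + env["ARMCI_DEFAULT_SHMMAX"])
print(" ".join(shlex.quote(a) for a in argv))
```

To actually start the job, the pair could be passed to `subprocess.run(argv, env=env, check=True)`; trying 4096 only requires changing the first argument.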
|
|
|
|
Just Got Here
Threads 1
Posts 4
|
|
6:22:30 PM - Sat, Mar 19th 2011 |
|
Unfortunately, that doesn't help. Setting this variable to 1024, 2048, or 4096 makes the calculation crash immediately with the same error (the output is completely empty). Setting it to anything from 8192 up to 32768 results in a crash at the start of the CCSD iterations.
|
|
|
|
Just Got Here
Threads 1
Posts 4
|
|
6:31:50 PM - Sat, Mar 19th 2011 |
|
I failed to post my build script to the forum. It is here. Can you see anything wrong with it? BTW, the QA tests passed.
|
|
|
-
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
|
|
Forum Regular
Threads 2
Posts 238
|
|
9:14:03 PM - Tue, Mar 29th 2011 |
|
Please try:
1. Running with 1500 mb for memory.
2. Fewer cores per node.
Bert
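The arithmetic behind both suggestions can be sketched as follows. This is a rough estimate only, under stated assumptions: the 16 processes are split evenly (8 per node), both nodes have the 16 GB of RAM mentioned for the single-node run, and `memory total` is a per-process limit. The 4-processes-per-node case is a hypothetical example of "fewer cores per node".

```python
NODE_RAM_MB = 16 * 1024  # assumed: each node has 16 GB, as in the single-node run


def node_memory_demand_mb(procs_per_node, memory_total_mb):
    """Upper bound on memory requested on one node, assuming the
    'memory total' value applies to each process separately."""
    return procs_per_node * memory_total_mb


# (processes per node, 'memory total' in mb)
cases = [
    (8, 1800),  # original setup, assuming an even 8 + 8 split over 2 nodes
    (8, 1500),  # suggestion 1: lower the memory limit
    (4, 1800),  # suggestion 2: fewer processes per node (hypothetical count)
]

for procs, mem in cases:
    demand = node_memory_demand_mb(procs, mem)
    share = demand / NODE_RAM_MB
    print(f"{procs} procs x {mem} mb = {demand} MB ({share:.0%} of node RAM)")
```

Under these assumptions the original setup may request up to 14400 MB of a 16384 MB node, which leaves little room for the operating system and for memory pinned by the InfiniBand layer; either suggestion lowers that ceiling.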
|
|
|
|
Just Got Here
Threads 1
Posts 4
|
|
7:03:58 PM - Thu, Mar 31st 2011 |
|
I tried both options, with the same result. I will try to recompile with mvapich2 or ipmi.
|
|
|
-
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
|
|
Forum Regular
Threads 2
Posts 238
|
|
7:05:48 PM - Fri, Apr 1st 2011 |
|
What does your failing input deck look like (memory allocation wise)?
|
|
|