From NWChem
5:41:30 PM PDT - Tue, May 29th 2012
Hello everyone.
I'm using NWChem 6.0 installed on one PC cluster, and now I'm trying to install it on another cluster as well.
I set some environment variables in the .bashrc file in my home folder and compiled the program with the
[make nwchem_config] and [make FC=ifort >& make.log] commands.
When the build finished, I checked that the nwchem executable had been created in the $NWCHEM_TOP/bin/LINUX64/ folder.
Here is the problem. I tested the program on a simple input file with the [./nwchem test.nw] command and got some error messages. Here they are:
libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,0,0]: OpenIB on host pcs5 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version.
CMA: unable to open /dev/infiniband/rdma_cm
forrtl: error (69): process interrupted (SIGINT) # I typed [Ctrl+c] here
Image PC Routine Line Source
libpthread.so.0 0000003331E0C5B0 Unknown Unknown Unknown
libpthread.so.0 0000003331E0AF8B Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT) # I typed [Ctrl+c] here
Image PC Routine Line Source
nwchem 0000000002A1659D Unknown Unknown Unknown
nwchem 0000000002A150A5 Unknown Unknown Unknown
nwchem 00000000029AC319 Unknown Unknown Unknown
nwchem 000000000295744F Unknown Unknown Unknown
nwchem 000000000295BF32 Unknown Unknown Unknown
libpthread.so.0 0000003331E0C5B0 Unknown Unknown Unknown
libpthread.so.0 0000003331E0AF8B Unknown Unknown Unknown
I think this error message is related to the ARMCI network, such as InfiniBand or Giganet, and I guess the program is trying to find the OPENIB protocol.
The strange thing is that the cluster I'm using does not have any network like InfiniBand or Giganet.
So I wrote [export ARMCI_NETWORK=SOCKETS] in the .bashrc file for the environment setup. I think there was no problem in the compiling process (I checked the make.log file), but I don't know why this error message comes out.
Has anybody had the same or a similar problem before? I need your help...
Thanks in advance.
Yjlee
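One way to narrow down where messages like these come from is to look at which shared libraries the executable links against. The sketch below is only illustrative: check_links is a hypothetical helper name, not part of NWChem, and it assumes a Linux system where ldd is available. An MPI library in the list suggests the binary was built against MPI. Note that some MPI installations load their network plugins at run time, so libibverbs may not appear here even when it is being probed.

```shell
# Sketch: list MPI- or InfiniBand-related shared libraries a binary links against.
# check_links is a hypothetical helper, not an NWChem tool.
check_links() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "not an executable: $bin" >&2
        return 1
    fi
    # ldd prints one shared library per line; keep only the interesting ones.
    # "|| true" keeps the assignment from failing when grep finds nothing.
    matches=$(ldd "$bin" 2>/dev/null | grep -i -E 'mpi|ibverbs|rdmacm' || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
    else
        echo "no MPI or InfiniBand libraries found in the direct link list"
    fi
}

# Example call (path taken from the post above):
#   check_links "$NWCHEM_TOP/bin/LINUX64/nwchem"
```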
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
1:45:13 PM PDT - Wed, May 30th 2012
Could you send me the make.log at bert.dejong@pnnl.gov so we can see what happened?
Bert
10:55:30 PM PDT - Wed, May 30th 2012
I just sent you the make.log file and environment setup scripts.
Thank you.
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
7:57:55 AM PDT - Thu, May 31st 2012
Hi Yongjin,
A couple of things after reviewing the info:
1. NWChem did not compile in any InfiniBand information. It looks like this comes from OpenMPI itself.
2. What is the network between the nodes in your cluster? There must be some network. The OpenMPI installation suggests it has IB. You may want to verify this.
3. If you want to compile over sockets, you should not specify any MPI variables when compiling. Instead of starting the job with mpirun, you will have to use the parallel.x command to run the code in parallel.
4. If you want to compile with MPI, you should not set ARMCI_NETWORK to SOCKETS. You should not set it at all (see the compile instructions on the NWChem web page).
Bert
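Points 3 and 4 above describe two mutually exclusive build setups. A minimal sketch of how they might look in a .bashrc follows. The function names are hypothetical, and the MPI variable names (USE_MPI, MPI_LOC, MPI_LIB, MPI_INCLUDE, LIBMPI) are my assumption of what "MPI variables" refers to, based on the NWChem compile instructions; check them against the instructions for your version. NWCHEM_TOP is assumed to be set already.

```shell
# Sketch only: two alternative environment setups for building NWChem.
# Function names are hypothetical; variable names are assumed from the
# NWChem compile instructions and should be verified for your version.

# Option A: sockets build without MPI (point 3).
# The resulting binary is started with parallel.x, not mpirun.
setup_sockets_build() {
    export NWCHEM_TARGET=LINUX64
    export ARMCI_NETWORK=SOCKETS
    # No MPI variables may be set for this kind of build.
    unset USE_MPI MPI_LOC MPI_LIB MPI_INCLUDE LIBMPI
}

# Option B: MPI build (point 4).
# ARMCI_NETWORK must not be set at all.
setup_mpi_build() {
    export NWCHEM_TARGET=LINUX64
    export USE_MPI=y
    unset ARMCI_NETWORK
}

# After calling exactly one of the two functions, rebuild as in the first post:
#   make nwchem_config
#   make FC=ifort >& make.log
```

Mixing the two, as in the original setup (MPI variables plus ARMCI_NETWORK=SOCKETS), is the combination the reply warns against.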
7:01:17 AM PDT - Wed, Jun 13th 2012
Sorry for the late reply...
I discussed this problem with the administrator of the cluster system.
He checked the OpenMPI installation and found that your first comment was right:
OpenMPI itself was looking for InfiniBand.
(Actually, he didn't know about it before. Other cluster users use MPICH instead of OpenMPI.)
The problem was fixed and NWChem is now running without any problems. Well, now the problem is
my own knowledge and understanding of computational chemistry and the program itself.
(Now I'm struggling with other problems.)
Anyway, thank you again!
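The thread does not record what the administrator changed. For readers who hit the same messages, one common approach is to tell Open MPI not to try its InfiniBand transport. The sketch below assumes an older Open MPI release that still ships the openib component and honors the btl MCA parameter. The process count in the commented mpirun line is an arbitrary illustration.

```shell
# Sketch: ask Open MPI to skip the InfiniBand (openib) transport.
# Assumes an Open MPI version that supports the "btl" MCA parameter
# and still includes the openib component.
export OMPI_MCA_btl='^openib'

# Equivalent per-run form, shown as a comment because it needs a working
# mpirun and an input file (the "-np 4" is an arbitrary example):
#   mpirun --mca btl ^openib -np 4 $NWCHEM_TOP/bin/LINUX64/nwchem test.nw

# If ompi_info is on the PATH, list the transport components this
# Open MPI installation was built with.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i btl || true
fi
```

If openib shows up in the ompi_info listing on a cluster without InfiniBand hardware, that would be consistent with the warning seen in the first post.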
Bert Forum:Admin, Forum:Mod, NWChemDeveloper, bureaucrat, sysop
8:30:57 AM PDT - Fri, Jun 15th 2012
Excellent
Bert