GA-bugs
From NWChem
Global Array Toolkit Bug and Development Projects List
NWChem-GA Bug list (in order of priority)
Priorities are Critical (C), High Importance (H), Medium Importance (M), Low Importance (L).
Issue: Wrong results when multiple shared memory regions are accessed during a calculation.
GA version: 5.0
Staff: Manoj (GA) and Bert (NWChem)
Status: The work-around is to set ARMCI_DEFAULT_SHMMAX large enough to support the single shared memory region optimization. However, setting ARMCI_DEFAULT_SHMMAX to 8196 leads to "ARMCI DASSERT fail. openib.c:armci_pin_contig_hndl():989 cond:(memhdl->memhndl!=((void *)0))" and "Cannot allocate memory" messages. Is the issue caused by one GA being distributed over multiple regions? If so, do we need to pin a separate block for each GA created?
As a fix, NWChem now uses ga_initialize_ltd, which initializes Global Arrays and at the same time sets an upper limit on the amount of memory the GAs are allowed to use. GA can then always allocate one shared memory segment of that size and never needs any additional shared memory segments.
Date fixed: (updated June 10, 2011)
MPI-SPAWN compilation with ga-5-0 (H)
Issue: ga-5-0 does not compile when --use-mpi-spawn is set; the problem seems to be finding mpi.h. Also, in NWChem, src/tools/GNUmakefile does not recognize MPI-SPAWN as an ARMCI_NETWORK.
GA version: 5.0
Staff: Jeff (GA) and Bert (NWChem)
Status: Jeff fixed this. Bert tested it and it mostly works. The GNUmakefile calls for MPI_SPAWN_INCLUDE, etc., but never picks up MPI_INCLUDE. Update (Jeff): the environment variables MPI_INCLUDE, MPI_LIB, LIBMPI, and MPI_LOC are parsed only if the environment variable USE_MPI is defined. This all takes place in src/tools/GNUmakefile. Perhaps USE_MPI is not defined, or perhaps these are not the correct MPI environment variables to look for.
Date fixed: 5/10/11 (partly)
MPI-SPAWN on OpenMPI (H)
Issue: MPI-SPAWN gives incorrect results on various user platforms when using OpenMPI. OpenMPI does support MPI_Comm_spawn_multiple, but not in the same way as other MPI implementations do.
GA version: 5.0
Staff: Abhinav, Manoj (GA) and Bert (NWChem)
Status: Not working.
Date fixed: ?
NGA_ADD_PATCH uses excessive memory (H)
Issue: NGA_Add_patch relies on the data distributions of all three patches matching exactly. If they do not match exactly, the g_c array is duplicated and the appropriate data blocks are copied into the duplicate. This can lead to excessive memory use if the g_c array is large.
GA version: 5.0
Staff: Bruce (GA) and Huub (NWChem)
Status: Issue confirmed.
Date fixed: 9/19/11. A new version of NGA_Add_patch was added to the trunk; it should save one copy of the array when A and B are not completely aligned with C. It has been tested with a few of the GA test routines and is currently back with Huub for more extensive testing.
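The alignment condition in the issue above can be sketched as follows: the patches are aligned when every process owns the same patch-relative index range in A, B and C, and in the original behaviour any mismatch cost a full duplicate of g_c. This is a hypothetical 1-D model (the Block layout and both function names are our own, and it assumes the patches cover the whole arrays); it does not reproduce GA's internal distribution logic or the new trunk version.

```python
from typing import List, Tuple

# (process id, first index, last index) of the block a process owns,
# relative to the start of the patch. 1-D for brevity (hypothetical model).
Block = Tuple[int, int, int]


def aligned(a: List[Block], b: List[Block], c: List[Block]) -> bool:
    """True when each process owns the same patch-relative range in A, B and C."""
    return sorted(a) == sorted(b) == sorted(c)


def duplicate_bytes(a: List[Block], b: List[Block], c: List[Block],
                    elem_size: int) -> int:
    """Extra memory under the original behaviour described in the issue:
    nothing when aligned, otherwise a full copy of g_c."""
    if aligned(a, b, c):
        return 0
    n_elements = sum(hi - lo + 1 for _, lo, hi in c)
    return n_elements * elem_size


# Two processes, 100 doubles; C is split at a different index than A and B.
a = [(0, 0, 49), (1, 50, 99)]
b = [(0, 0, 49), (1, 50, 99)]
c = [(0, 0, 39), (1, 40, 99)]
print(aligned(a, b, c), duplicate_bytes(a, b, c, 8))
```

With C split differently from A and B, the model reports a full 800-byte duplicate of the 100-element double array.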
Subgroups on Franklin fails (M)
Issue: Subgroups cannot be used on NERSC Franklin machine.
GA version: 5.0
Staff: Sriram (GA) and Marat (NWChem)
Status: Not working. Marat provided test case. Sriram to check in fix and have Marat test it.
Date fixed: ?
Automatic build for Hopper at NERSC fails (L)
Issue: Cannot build GA using Cray compiler on Hopper.
GA version: 5.0
Staff: Jeff (GA) and Eric (NWChem)
Status: The problem is that the Cray compiler does not support atomic intrinsics, and the Cray compiler wrappers do not automatically add the paths to the gni and dmapp headers. Jeff has reported both problems to Hopper support, and they are listed as known issues to be fixed at a later point. Until Cray/NERSC fix them, we have a workaround in place, and we will wait for Cray to fix this.
Date fixed: Workaround until Cray fixes this.
NWChem-GA Development Tasks
Tasks are in order of priority.
BlueWaters
Task: Develop initial implementation of GA Toolkit for BlueWaters hardware.
Staff: Abhinav (GA) and Bert (NWChem)
Status: Unknown
Date fixed: ?
Cray Gemini
Task: Optimized Cray Gemini support.
Staff: Abhinav and Manoj (GA), Olsen (Cray), and Huub (NWChem)
Status: Waiting for Cray to provide specific APIs
Date fixed: ?
Basic GA+OpenMP/threads
Task: Develop basic implementation of GA with OpenMP and threads.
Staff: Manoj, Daniel (GA) and Eric (NWChem)
Status: Implemented, provided that threads do not make GA calls. Eric reports that it fails for him, but it seems to work for Daniel and Manoj. Eric needs to provide a failing test case or sit down with the GA developers.
Date fixed:
New data-types
Task: Provide infrastructure for broader set of data types and operations, including mixed data type and user defined data types.
Staff: ? (GA) and Niri and Marat (NWChem)
Status: Niri uses wrappers for complex data types as a work-around. Sriram and Jeff implemented the NGA_Get_field() and NGA_Put_field() functions, which can extract the real or imaginary parts of a complex array. However, this does not address C-style automatic type promotion, mixed float/double operations, etc.
Date fixed: ?
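The field-extraction idea behind NGA_Get_field() can be shown on a local buffer: a double-complex element is stored as two 8-byte doubles (real, imaginary), so the real part is the 8-byte field at byte offset 0 of each 16-byte element and the imaginary part is the field at offset 8. This is a hypothetical local sketch of the concept (get_field and its foff/fsize parameters are our own names); it does not call or reproduce the GA API.

```python
import struct


def get_field(raw: bytes, elem_size: int, foff: int, fsize: int) -> bytes:
    """Return bytes [foff, foff + fsize) of every elem_size-byte element in raw."""
    if foff < 0 or fsize <= 0 or foff + fsize > elem_size:
        raise ValueError("field must lie inside a single element")
    if len(raw) % elem_size:
        raise ValueError("buffer is not a whole number of elements")
    out = bytearray()
    for start in range(0, len(raw), elem_size):
        out += raw[start + foff:start + foff + fsize]
    return bytes(out)


# Pack two complex numbers as interleaved little-endian doubles: re, im, re, im.
values = [complex(1.0, -2.0), complex(3.5, 4.0)]
raw = b"".join(struct.pack("<dd", z.real, z.imag) for z in values)

real = struct.unpack("<2d", get_field(raw, 16, 0, 8))  # offset 0: real parts
imag = struct.unpack("<2d", get_field(raw, 16, 8, 8))  # offset 8: imaginary parts
print(real, imag)
```

The same offset/size view explains why field access covers real and imaginary parts but not type promotion: it only selects bytes and never converts between types.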
Python
Task: Python bindings in GA Toolkit
Staff: Jeff (GA) and Eric (NWChem)
Status: The Python bindings for GA were released with ga-5-0, and the build was stabilized by ga-5-0-2. The "task python" directive in an NWChem input deck does not automatically "import ga". The GA bindings currently depend on MPI and have not been tested with TCGMSG.
Date fixed: ?
Closed NWChem-GA Bugs
Restricted arrays not working on Cray (L)
Issue: Restricted arrays, which are needed for XSCI Fault Tolerance, cannot be created on Cray.
GA version: Development
Staff: Bruce (GA) and Huub (NWChem)
Status: Huub tested it with an updated version; it seems to work properly now.
Date fixed: 5/12/2011