The newest generation of computer games places great demands on computer hardware: millions of objects must be rotated, scaled, and textured. These tasks require a large number of floating point operations and are highly parallel. As a result, graphics processing units (GPUs) have evolved into high-performance parallel processors capable of up to one trillion floating point operations per second (1 TeraFLOP/s). With the release of Nvidia's CUDA software, programmers can harness this hardware for computationally intensive tasks unrelated to rendering graphics, using a convenient and easy-to-learn programming toolkit.
We have worked to provide user-ready applications that use GPU hardware to accelerate scientific computing. Because most data analysis occurs on the user's desktop computer, which will generally contain a high-end GPU, data analysis codes are an obvious target for acceleration. We have produced a code that calculates radial distribution functions (RDFs) 99 times faster than an equivalent CPU implementation. This code is currently being incorporated into the VMD visualization software package, where it will be freely available and widely distributed.
Furthermore, we have implemented the functional forms of the CMM coarse grain model in the HOOMD-blue open source MD software package. HOOMD-blue is written to run MD simulations entirely on the GPU and achieves up to a 60x speedup over equivalent MD codes (including its internal CPU reference code). This code has been publicly available since the HOOMD-blue 0.8.2 release. Please see the Images and Movies Gallery page for some application examples.
In addition, we have worked to provide CUDA support in the CP2K direct molecular dynamics software package. At present, up to a 2x speedup can be attained in favorable cases by performing the Fourier transform on the GPU instead of the CPU. Future generations of GPU hardware should make it possible to accelerate many more time-critical parts of the CP2K code. This code is available for download from the CP2K CVS repository.
We are also currently participating in an effort by several groups throughout the US to add GPU-accelerated compute kernels to the LAMMPS MD code. This effort particularly targets future massively parallel supercomputers with large numbers of CPUs and integrated GPUs.

Contact: Benjamin Levine, David LeBard, Axel Kohlmeyer
With dual- and quad-core CPUs becoming ubiquitous on desktops and many-core processors just around the corner, there is a large potential to speed up analysis systematically and significantly, with comparatively little effort, using hardware that is already in place. To this end we have added OpenMP parallelism to several analysis tools; in particular, we have contributed several new parallel plugins to the VMD visualization and analysis software package and parallelized some of its existing tools. These include a spectral density calculator, a Savitzky-Golay sliding-window polynomial filter, a multi-threaded version of the internal VMD command for calculating radial distribution functions, and a multi-threaded FFT plugin. For sufficiently large data sets, the speed gain is frequently on the order of a factor of three on a quad-core desktop.
Many-core CPUs also pose a new problem for obtaining good parallel scaling with MPI-parallel scientific software packages on clusters and supercomputers. The growing number of compute units per node significantly increases the demand for communication bandwidth and increases latencies, since all communication must be channeled through a single communication device (an InfiniBand, Myrinet, or similar adapter). This is particularly severe when MPI_Alltoall() calls are required, as in distributed 3D Fourier transforms. The problem can to a large degree be avoided by implementing multi-level parallelism, e.g. hybrid OpenMP/MPI, in existing software packages. Our group is currently working on implementing such a hybrid parallel scheme in the LAMMPS MD simulation package.

Contact: Axel Kohlmeyer