Reduced scaling electronic structure calculations based on a versatile library for sparse matrix multiplication

PI: Jürg Hutter (University of Zurich)

Co-PIs: Joost VandeVondele, Stefan Goedecker, Nicola Marzari

January 1, 2014 - December 31, 2016

Project Summary

This project is part of the 'Materials Simulations' domain science network and will provide library support to enable highly efficient large-scale electronic structure calculations on modern computer architectures. Simulation codes using density functional theory, semi-empirical methods, and methods including wavefunction correlation will be supported. These calculations are key for the simulation of nanoparticles, electronic devices, complex interfaces, macromolecules, and disordered systems, all of which are important for technological applications in a wide variety of fields, including, for example, energy and electronics. The computational kernel essential for calculations on systems of this size is sparse matrix-matrix multiplication. This co-design project builds on our successful implementation of linear scaling SCF methods in the general atomistic simulation code CP2K and on our efficient, highly parallel sparse matrix library DBCSR. Funded in part by the PASC predecessor HP2C, we have been able to demonstrate minimal basis set DFT calculations on systems containing a million atoms, with fair scalability to a few tens of thousands of cores. Based on an analysis of these results, we have identified three main targets for this co-design project. First, further development of the sparse matrix library DBCSR is needed to follow the rapid evolution of computer hardware and to implement new and improved algorithms. In order to quickly target new hardware types (CPU, GPU, MIC, etc.) and explore new algorithmic ideas, improved modularity within the library is needed: a clearly layered library structure (API, middle layer, and backends) needs to be implemented. Furthermore, the backends must all be built on a framework for the auto-generation and auto-tuning of matrix multiplication kernels, similar to the successful libSMM, which we have developed for CPU auto-tuning.
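The auto-tuning idea behind libSMM can be illustrated with a minimal sketch: several candidate kernels for one fixed small block size are benchmarked on representative data, and the fastest is selected ahead of time. The Python below is purely illustrative of that selection step, not the library itself (the real framework generates and times Fortran/CUDA kernels); the names `kernel_ijk`, `kernel_ikj`, and `autotune` are hypothetical and not part of any DBCSR API.

```python
import timeit

# Two candidate kernels for a fixed 4x4x4 block product C += A*B.
# Blocks are stored as flat row-major lists, as block data in a
# block-sparse format typically is.
def kernel_ijk(a, b, c, m=4, n=4, k=4):
    # naive i-j-k loop order
    for i in range(m):
        for j in range(n):
            s = c[i * n + j]
            for l in range(k):
                s += a[i * k + l] * b[l * n + j]
            c[i * n + j] = s

def kernel_ikj(a, b, c, m=4, n=4, k=4):
    # i-k-j loop order: streams through rows of B
    for i in range(m):
        for l in range(k):
            ail = a[i * k + l]
            for j in range(n):
                c[i * n + j] += ail * b[l * n + j]

def autotune(candidates, trials=200):
    """Benchmark each candidate on dummy block data; return the fastest."""
    a = [float(x) for x in range(16)]
    b = [float(x) for x in range(16)]
    best, best_t = None, float("inf")
    for kern in candidates:
        c = [0.0] * 16
        t = timeit.timeit(lambda: kern(a, b, c), number=trials)
        if t < best_t:
            best, best_t = kern, t
    return best

# At "build time", pick the winning kernel once and use it everywhere.
kernel = autotune([kernel_ijk, kernel_ikj])
```

In the real setting the candidate space also covers unrolling, vectorization, and register blocking, and the winning kernel is compiled into the library for each (m, n, k) triple that occurs in the application's block sizes.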
Improved scalability and performance, especially for very large systems, require a new communication algorithm in the DBCSR library. The current scheme, based on Cannon's algorithm, performs well for matrices with moderate sparsity and has excellent worst-case behavior, but it does not exploit the spatial origin of the sparsity pattern observed for large systems. A new scheme, similar in nature to the Neutral Territory algorithms that recently became popular in biomolecular simulations, will be implemented. This scheme exploits data locality and will bring a significant reduction in communication for very large systems. Auxiliary and higher-level functionality will be added, including improved Lanczos iterations and further matrix functions such as the exponential. Second, this state-of-the-art sparse matrix library needs to be embedded into a widely used code that employs the best possible algorithms. Broad adoption of these methods and stringent testing by 'lay' users are essential to obtain robust tools useful beyond simple benchmarks. As a widely adopted and freely available code, CP2K can play an important role in establishing these new methods in an existing community of experienced simulation groups. The current linear scaling code in CP2K can be further enhanced using new algorithms for the matrix sign function, improved SCF convergence procedures, and the careful use of matrix symmetry and mixed-precision arithmetic. All these aspects will yield significant further speedups in actual applications, enhancing the impact of the library development. Furthermore, we will broaden the scope of the sparse matrix package by making methods available for low scaling electron correlation calculations, such as the atomic-orbital-based Laplace-transform MP2 method. Third, we will release the DBCSR library as a well-defined and self-contained library.
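The matrix sign function mentioned above is central to linear scaling SCF because it can be evaluated with multiplication-only iterations — exactly the operation DBCSR accelerates. As a hedged illustration, the well-known Newton-Schulz iteration X_{k+1} = X_k (3I - X_k^2) / 2 is sketched below on small dense matrices in plain Python; the function names are for illustration only, and the production code instead operates on distributed sparse blocks with truncation.

```python
def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][l] * B[l][j] for l in range(m)) for j in range(p)]
            for i in range(n)]

def sign_newton_schulz(A, tol=1e-12, maxit=100):
    """Matrix sign of A via Newton-Schulz; multiplication-only, so it
    maps directly onto a sparse matrix-matrix multiply kernel.
    Converges when the eigenvalues of A lie in a suitable interval
    around +/-1 (e.g. after scaling)."""
    n = len(A)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    X = [row[:] for row in A]
    for _ in range(maxit):
        X2 = matmul(X, X)
        # sign(A) is involutory: X^2 -> I at convergence
        err = max(abs(X2[i][j] - I[i][j]) for i in range(n) for j in range(n))
        if err < tol:
            return X
        # X <- 0.5 * X * (3I - X^2)
        M = [[3.0 * I[i][j] - X2[i][j] for j in range(n)] for i in range(n)]
        X = [[0.5 * v for v in row] for row in matmul(X, M)]
    return X

# Example: a symmetric 2x2 matrix with eigenvalues of mixed sign.
A = [[0.8, 0.1], [0.1, -0.7]]
S = sign_newton_schulz(A)
```

Because every step is a (sparse) matrix product, the cost per iteration is governed entirely by the multiplication library, which is why improvements to DBCSR translate directly into faster sign-function-based SCF.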
Wider adoption will encourage hardware and software vendors to improve support for and efficiency of the DBCSR library, and will establish a larger community of developers who can contribute to and maintain the package. In this part of the project we will thus remove the entanglement of DBCSR and CP2K, and release DBCSR as a standalone package. We will redesign the interface (API) and reduce the complexity of the exposed functionality, for example by providing standard and expert interfaces. Both the external and internal documentation will be improved in this process. Regular regression testing, based on several thousand hand-written inputs, will be complemented with unit testing. This library will then be used within the 'Materials Simulations' domain science network to implement new functionality in the Wannier90 (N. Marzari, EPFL) and BigDFT (S. Goedecker, UniBas) codes. The library will allow the BigDFT code to gain an important part of the linear scaling functionality in a well-tested and highly efficient form. The Wannier90 code will be able to exploit the inherent sparsity of the localized function representation in new algorithmic developments when implementing new functionality. Finally, to demonstrate the quality and usability of the standalone library, we will assist other groups in integrating the DBCSR library into their codes. This co-design project will be led by Prof. Jürg Hutter (PI), Prof. Joost VandeVondele (ETHZ), Prof. Nicola Marzari (EPFL), and Prof. Stefan Goedecker (UniBas), and is thus embedded in the groups that lead electronic structure software development within the Materials Simulations domain network. With pioneering applications, we have established CP2K, BigDFT, and Wannier90 as robust tools for advanced simulations. These codes are among the most used at CSCS and are adopted in supercomputer centers worldwide. We will collaborate with the group of Torsten Hoefler (Computer Science, ETHZ).
This group develops a compiler and code transformation framework that we will use to simplify and generalize the backend code of the DBCSR library. This collaboration will allow us to make the library both highly performant and hardware-independent. Furthermore, we have a history of successful collaboration and close contacts with major hardware vendors and computer centers. In particular for the DBCSR library, we already have direct contacts with CSCS, Cray (supercomputing systems), Nvidia (GPUs), and Intel (CPUs and MIC), which we will be able to intensify as a result of this co-design project. The project is essential to reach the milestones in the roadmap of the Materials Simulations domain network.