*AQUA-D: Accelerated QuICC Application - Dynamo *

*PI: Andrew Jackson (ETH Zurich)*

*July 1, 2021 – June 30, 2024*

### Project Summary

Fluid flow is ubiquitous in the Earth, atmospheric, oceanic, planetary and astrophysical science. A major role is, of course, played by computations, and today’s generation of computers allows us to understand the physics of these disciplines more fully than ever before. Here we discuss a 3-year plan to leverage the computing power available at CSCS together with a flexible code that will allow new discoveries in both the environmental sciences and in pure fluid mechanics research.

Over the last years we have developed a fully spectral code, called QuICC, that has enormous flexibility. The code is able to treat a variety of simple geometries that cover a large number of applications. Our code is designed to compute solutions to either hydrodynamical problems or magnetohydrodynamical (MHD) situations: in other words, it solves the Navier-Stokes equation of momentum transfer, the pre-Maxwell equations of electrodynamics, and the equations of heat transfer appropriate to a fluid. Thus the code can treat problems of convection in both electrically conducting and non-conducting fluids, as well as being able to treat different types of forcing, as elucidated below.

Primary focus: Convection and dynamos in planetary science

Secondary focus: turbulence in thermal convection

In the context of planetary dynamics, the code is able to treat fluid flow in whole spheres and spherical shells, thus allowing simulations of planetary cores both with and without solid inner cores. In the context of research in pure fluid dynamics, QuICC is equipped with the ability to compute solutions in a Cartesian geometry, thus allowing questions in turbulence research, astrophysics and pure fluid mechanics to be addressed. An important feature of our code is the option to include the Coriolis force that arises in a planetary context as a result of rapid rotation. We are also able to treat forces that arise from precession and libration, both of which arise in a planetary context.

A central question in fluid mechanics is the scaling of dissipation as a function of forcing, typically measured by the Reynolds number. We believe we can contribute to this question through the use of our code to simulate pipe flow. Equally our code can function in a so-called spherical Couette configuration, corresponding to a number of experiments around the world. Such experiments subject fluid trapped between two spheres to shear created by the differential rotation of these spheres.

Our research code is highly accurate by dint of its use of spectral expansions, which converge exponentially. Differentiability in different coordinate systems is treated in an exact manner. The code is highly parallelised and delivers performance that is internationally competitive, being amongst the best in the world. This provides a stable platform from which to develop further enhancements and optimisations that will make the code world-beating. These enhancements partly focus on the use of the new generation of GPUs and accelerators that will be available to CSCS in the near future. Other aspects of our optimisation are concerned with algorithmic changes accounting for the fact that memory operations are expensive while computations are cheap. As a result of this we wish to re-engineer our codes to make use of on-the-fly algorithms as opposed to precomputations and memory accesses.

Very careful methods have been developed in the design of the code to ensure sparsity in the discretised form of the different operators that need to be evaluated in the equations of motion. A special quasi-inverse technique leads to optimal sparseness that means high resolution can be achieved with little memory use and, importantly, fewer floating point operations in the required solution of the discretised equations. The techniques employed make use of the threeterm recurrence relations that every spectral expansion (Fourier, Jacobi, etc.) obeys. A unique aspect of the code is its ability to treat the Coriolis force implicitly in the Navier-Stokes equation. Most other codes treat this force explicitly, which can lead to time step constraints that are prohibitively expensive. Implicit treatment of the Coriolis force generally leads to very dense matrices. The code QuICC is able to treat the Coriolis force without loss of sparsity and has demonstrated reduced time to solution on real-world problems. The ability to treat it implicitly in the time marching of the equation is the cornerstone to reach more extreme and realistic planetary regimes.

In order to make a better use of the available computational power, the high level MPI based parallelisation will be re-engineered in order to provide two level parallelism. The 2D data distribution will be done at the node level and not, as is currently done, directly on the cores. This leads to a two-fold gain for the code. First, the communication structure becomes simpler and much more homogeneous as all communications take place between nodes; the load balancing and estimations of the communication overheads become easier at the high level. Furthermore, a parallelised low level spectral transform provides additional flexibility which helps to achieve a maximal throughput. Simultaneously the migration to two levels of parallelism makes the code portable to HPC systems where accelerators are available. In order to make maximal use of the processing power of modern HPC architectures, an implementation of the spectral transforms using the performance portable Kokkos library will be developed. A single implementation will thus be able to provide high-performance in purely CPU-based simulations as well as leverage the computational power provided by accelerators. It also makes for a smooth transition to new systems such as, for example, the upcoming LUMI platform.

An important aspect of the implementation is to maintain the code’s generic form as much as possible in order to provide a unique interface for the geometries that are being considered here, namely Cartesian and spherical. The high-level pseudo spectral method that effectively provides the time marching algorithms is implemented in a geometry-independent way. The spectral transforms can effectively be represented by a tree that describes the required operations to move between physical and spectral space in order to compute the nonlinear interactions. While the details of the tree depend on the geometry and spectral basis, the time stepping algorithm and the high level parallelisation only require knowledge of the structure and dependencies between the operations.

High resolution simulations generate a large amount of data. An Input/Output engine is required to efficiently work with the produced data. The challenges are two fold: first, on-the-fly calculations of diagnostics and statistics are needed to monitor the evolution of the simulation as well as reduce the amount of produced data. Then, snapshot data that allow simulation restarts, as well as postprocessing and visualising should also be handled efficiently.