PASCHA – Portability And Scalability of COSMO on Heterogeneous Architectures

PI: Torsten Höfler

Co-PIs: Oliver Fuhrer, Carlos Osuna, Christoph Schär, Christina Schnadt Poberaj

July 1, 2017 - June 30, 2020

Project Summary

In the last decade, weather and climate models have seen an unprecedented increase in resolution and accuracy. A key driver of this development is the continuously increasing power of high-performance computing (HPC) systems. As the horizontal resolution of weather and climate models approaches the 1 km scale, convective motions are increasingly resolved explicitly rather than parameterized. This represents a disruptive change in our capacity to accurately predict the evolution of the weather and climate system. At the same time, disruptive changes have been happening in the HPC space, with the advent of heterogeneous architectures, high-bandwidth memory and new programming models to access these capabilities. Further progress in the weather and climate domain depends critically on our ability to exploit the potential of new high-performance computing systems as they come online, both at the Swiss National Supercomputing Centre (CSCS) and worldwide.

There have been numerous attempts to port weather models to heterogeneous architectures that delivered satisfactory performance. However, many of them never made the transition from research to production codes, because the resulting rewritten models lacked portability. Portability is a serious challenge for weather and climate models, since they consist of large and complex code bases. As part of the HP2C Initiative, COSMO, the regional weather and climate model used in Switzerland, was successfully ported to GPUs. This made it possible, for the first time, to run operational weather forecasts at 1 km resolution at MeteoSwiss and European-scale climate simulations at 2 km resolution at ETH. Both efforts use HPC resources at CSCS in Lugano. A key characteristic of this effort was the portability of the resulting implementation, which also runs on traditional CPU architectures at several other weather centers and institutions.

Figure 1: Timeline of previous research projects on the portability of the COSMO model to heterogeneous architectures, with their main achievements. The last block shows the scope of the project proposed to PASC.

In recent years, the latest-generation Intel Xeon Phi many-core architecture has emerged as an important alternative to GPU accelerators in scientific computing. As a community code, and in order to avoid vendor lock-in, COSMO must run efficiently on Xeon Phi while retaining the portability and performance portability of the developments from the HP2C project. Strong scalability is also an essential requirement for weather and climate models. Apart from the obvious reduction in time-to-solution, there is a more subtle reason: increasing the horizontal resolution requires a shorter timestep for the forward integration of the governing equations. Strong scaling is the only viable way to offset the longer time-to-solution caused by this timestep reduction and the associated increase in computational effort per model gridpoint. Flagship use-cases of the COSMO model, such as the new operational 1 km configuration at MeteoSwiss and the 2 km European-scale climate simulations, are already at the limit of strong scalability on accelerators. The roadmap of the weather and climate community foresees extensive use of the COSMO model beyond 2020, and further progress depends critically on improvements in strong scalability.
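The timestep argument can be made concrete with a rough cost estimate. This is a back-of-the-envelope sketch, not a COSMO performance model: it assumes a time step limited by a CFL-type condition (so that the timestep is proportional to the horizontal grid spacing), a fixed horizontal domain and a fixed number of vertical levels. The symbols C, u_max, N_cols, N_steps and W are introduced here for illustration only.

$$
\Delta t \le C\,\frac{\Delta x}{u_{\max}}, \qquad
N_{\mathrm{cols}} \propto \Delta x^{-2}, \qquad
N_{\mathrm{steps}} \propto \Delta t^{-1} \propto \Delta x^{-1}
$$

$$
W_{\mathrm{total}} \propto N_{\mathrm{cols}}\,N_{\mathrm{steps}} \propto \Delta x^{-3}
$$

Under these assumptions, halving the grid spacing gives 4 times as many grid columns and 2 times as many timesteps, i.e. 8 times the total work. Adding 4 times as many nodes (weak scaling, constant columns per node) keeps the wall-clock time per step roughly constant, but time-to-solution still doubles because twice as many steps are needed. Keeping time-to-solution constant therefore requires each step to finish faster with fewer columns per node, which is exactly the strong-scaling regime.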

As shown in Figure 1, the COSMO HP2C project delivered a portable solution for x86 CPUs and NVIDIA GPUs, with a special focus on the analysis and optimal performance of the individual computational kernels typical of weather and climate models. The developments proposed for the PASCHA project go one step further in the scalability and portability of the COSMO model. They aim at a fully portable version of COSMO by supporting Xeon Phi architectures, and they focus on the strong-scaling behavior of the model, which is essential for taking advantage of the massively parallel heterogeneous supercomputers at CSCS. The PASCHA project targets these urgent needs and will thus have an immediate and direct impact on the main use-cases of COSMO in Switzerland and across the community. The work will be carried out in four distinct work packages.

Work package A (1.6 FTE funding) will improve the strong scalability of COSMO by introducing functional parallelism and asynchronous RDMA inter-node communication, and by optimizing the existing fine-grain parallelism (SIMD and SIMT).

Work package B (0.9 FTE funding) will port COSMO to Intel Xeon Phi architectures.

Work package C (1.1 FTE funding) will strengthen the use of the GridTools domain-specific language (DSL) in COSMO to ensure maintainability and performance portability.

Work package D (0.4 FTE funding) will automate the global optimization process in COSMO through performance-model-driven optimization.

This project builds on an experienced, interdisciplinary team with a strong track record of executing and disseminating software development projects focused on COSMO. A key aspect of the project is that it will help deepen the collaboration between the domain-science and computer-science communities.

We request funding for two PostDocs, each for a duration of two years. Furthermore, we request funding at the 10% level for staff at C2SM to help coordinate the developments, integrate them into the official trunk version of the COSMO model, and disseminate them. The total amount of requested funding (CHF 494’300) is complemented by a substantial in-kind contribution (CHF 625’109). Among several significant contributions, in-kind funding will be provided to carry one of the two PostDocs for an additional year. The project will be carried out within the framework of the Center for Climate Systems Modeling (C2SM), which was established to foster and coordinate research activities at ETH, MeteoSwiss, and CSCS with the overarching goal of improving our capability to predict weather and climate. Apart from its immediate impact on the main weather and climate use-cases of the COSMO model, the project has the potential to benefit several key applications of the PASC community that run at CSCS. The benefits of the developments in the GridTools DSL extend beyond the COSMO model, and they will be applied to other applications.