The first minisymposium in series took place at PASC’23 in beautiful Davos. Gathered in a conference room of Congress Center Davos, early adopters of SYCL programming model shared their experiences using it for performance and portability for different scientific codes. From the intricate world of sparse linear algebra with Ginkgo to the seismic simulations of SeisSol, the session was a testament to the versatility of SYCL in tackling some of the most demanding computational problems. WarpX showcased how SYCL is being harnessed to simulate particle accelerators with unprecedented precision, while the ACTS project demonstrated its potential in reconstructing charged particle tracks in high-energy physics experiments.

The full collection of presentation slides can also be viewed on FigShare.

Minisymposium abstract

Heterogeneous architectures are now common in high-performance computing. As accelerator hardware gets more diverse, HPC applications must target architectures from multiple vendors and hardware generations with different capabilities and limitations. The dominance of vendor-specific and proprietary programming models makes writing portable and future-proof code challenging. Like OpenCL, SYCL is a standard not controlled by a single vendor. SYCL is still in its early stage, but there already are several quite mature implementations of the standard by different groups, targeting a wide range of hardware. It has been adopted by several large HPC projects and is the recommended programming framework for upcoming Intel GPU-based supercomputers. SYCL allows accessing low-level hardware capabilities while providing a high-level abstraction and the ability to rely on the runtime for data movement and execution configuration choice, making it possible to write portable and performant code without compromising the ability to tune for a specific device architecture when needed. This minisymposium aims to discuss SYCL from the perspective of scientific application developers, sharing the experiences of early adopters and promoting interdisciplinary communication in software engineering for modern, performance-portable, and maintainable code.

Introductory slides

Heterogeneous Computing with SYCL in WarpX and AMReX

Authors: Andrew Myers ORCID iD Icon and Weiqun Zhang ORCID iD Icon

WarpX is a time-based electrostatic and electromagnetic Particle-in-Cell code that supports many advanced features such as perfectly-matched layers, boosted frame simulations, highly accurate pseudo-spectral solvers, and mesh refinement. It has been applied to a range of scientific settings such as plasma-based acceleration, laser-plasma interaction, microelectronics, astrophysical plasmas, and more. WarpX is also a highly parallel and highly optimized code that scales to some of the largest supercomputers in the world. For its performance and capabilities, WarpX relies on many of the features of the AMReX framework, including a performance portability layer that supports execution on NVIDIA, AMD, and Intel GPUs as well as many-core CPUs. In this talk, I will describe how WarpX and AMReX achieve performance portability across a variety of heterogenous compute environments, including its use of the SYCL programming model.

Slides

Combining CUDA and SYCL for Accelerating Earthquake Simulations in SeisSol

Authors: Ravil Dorozhinskii, Michael Bader ORCID iD Icon , Alice Gabriel ORCID iD Icon

SeisSol is a high-performance simulation package designed for large-scale simulations of earthquakes and seismic phenomena. SeisSol is based on the discontinuous Galerkin method, the ADER time integration and Local Time Stepping. SeisSol uses a custom Domain Specific Language, called YATeTo, to tackle the demand of modeling various wave propagation models as well as providing portability between various CPU and GPU architectures from different vendors which have different capabilities and limitations. In this talk, we are going to report on 1) our recent results obtained with the SYCL compiler backend of YATeTo, 2) discuss benefits of using SYCL instead of OpenMP in the implementation of the dynamic rupture model in SeisSol and thus 3) coupling CUDA/HIP GPU programming models with SYCL.

Slides

Experience From Ginkgo Porting to the SYCL Ecosystem

Authors: Yu-Hsiang Tsai, Terry Cojean, Tobias Ribizel, Hartwig Anzt

Intel has introduced new GPU architectures into the High-Performance Computing markets and has chosen the SYCL ecosystem for its devices. Based on SYCL, they create several extensions to help the coding experience and improve usability. They also provide the tools to help port the code from CUDA and measure the performance. Many applications rely on mathematical libraries, so mathematical libraries need to be available on these new devices. With the new Exascale machine Aurora using Intel GPU accelerators, Ginkgo, which is a sparse linear algebra library, also faces the challenge of porting the code to the new Intel GPUs. Moreover, Ginkgo needs to provide competitive performance and good portability. We show the ginkgo portability design such that we can add SYCL to our structure. We also share the challenges, concerns, and experiences from porting Ginkgo to SYCL. In the end, we summarize the supported functionality on different GPUs from different vendors.

Slides

Charged Particle Reconstruction Using SYCL with the Traccc Project

Authors: Attila Krasznahorkay ORCID iD Icon , Andreas Salzburger, Beomki Yeo, Stephen Nicholas Swatman ORCID iD Icon , Joana Niermann, Guilherme Metelo Rita De Almeida

Reconstructing the tracks left by high energy, electrically charged particles in specialised particle detectors is one of the most challenging software tasks in modern High Energy Physics. In recent years several projects took it on themselves to find and implement track reconstruction algorithms suitable for hardware accelerators. In many cases this lead to algorithmic compromises, using simplified descriptions of the detector geometry, input data and magnetic field. The traccc project was started as an R&D initiative of the ACTS common track reconstruction project. It aims to demonstrate the feasibility of a complete track reconstruction algorithm chain on both traditional CPU, and GPGPU hardware architectures. A large emphasis in the development is put on sharing as much code between the CPU and GPGPU implementations as possible, and avoiding algorithmic and physics performance compromises. In this talk I will present the state of the art of this R&D project. I will describe its components dedicated to handling well defined sub-tasks in the track reconstruction, and I will present performance results with the code. As run on a variety of hardware, with the help of the SYCL programming language.

Slides