Author Name:  Aymeric MILLAN
Poster Title:  Exploring SYCL for batched kernels with memory allocations
Poster Abstract: 

Batched parallelism with local allocations is an extremely common pattern in HPC, appearing in multi-dimensional FFTs, neural network processing, and the split computation of numerical operators. Supporting it efficiently is especially complex on GPUs, where memory per thread is limited and dynamic memory allocation is challenging. This study investigates whether the native abstractions of SYCL can support performance portability for this pattern. We implement versions of a batched semi-Lagrangian advection kernel using each parallel construct of SYCL. We evaluate them in terms of maintainability, performance portability and memory footprint on CPUs and GPUs (AMD, Intel, NVIDIA), with two distinct SYCL implementations (AdaptiveCpp and DPC++). Our results demonstrate that no single parallel construct of SYCL emerges as the best solution and that a construct offering a higher level of abstraction would be required to support this common pattern.
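To make the pattern concrete, the following is a minimal NumPy sketch of a batched 1D semi-Lagrangian advection step in which every batch element needs its own scratch buffer. It only illustrates the batched-with-local-allocation pattern; it is not one of the poster's SYCL implementations, and all names are illustrative.

    # Illustrative only: batched semi-Lagrangian advection where each batch
    # element requires a local scratch allocation (the pattern discussed above).
    import numpy as np

    def advect_one(f, velocity, dt, dx):
        """One batch element: trace departure points backwards and interpolate
        the field there (linear interpolation, periodic boundaries)."""
        n = f.size
        x = np.arange(n) * dx
        departure = x - velocity * dt                          # departure points
        scratch = np.interp(departure, x, f, period=n * dx)    # local allocation
        return scratch

    def advect_batch(fields, velocity, dt, dx):
        # The batch loop is what a GPU kernel would parallelize; giving every
        # work-item or work-group its own 'scratch' is the hard part on GPUs.
        return np.stack([advect_one(f, velocity, dt, dx) for f in fields])

    batch = np.random.rand(64, 128)   # 64 independent 1D fields
    out = advect_batch(batch, velocity=0.5, dt=0.1, dx=0.01)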

Poster File URL:  View Poster File


Author Name:  Michele Pellegrino
Poster Title:  Bridging the molecular and the continuous description of wetting dynamics
Poster Abstract: 

Modeling wetting dynamics is important for understanding various natural processes and for controlling numerous industrial applications. Over the past 50 years the fluid dynamics community has developed theoretical models and experiments aimed at demystifying the dynamics of contact lines, i.e. the locations in space where liquid, vapor and solid phases meet. One key conclusion of this effort is that wetting dynamics is an inherently multiscale process, in which the flow at all scales is important. The possibility of investigating the physics of contact lines is limited by the spatial resolution of experiments, which cannot probe the nanoscale. In the last two decades a new investigation tool has joined the fray: direct numerical experiments, in the form of Molecular Dynamics (MD) simulations. These 'virtual lenses' enable us to inspect wetting processes with a temporal and spatial resolution impossible to achieve experimentally.

We have used MD simulations to study how wetting occurs on hydrophilic silica-like surfaces. Our first research direction involves the parametrization of meso- and macroscopic models, which are considerably faster to simulate than MD. We have also turned our attention to molecular-scale processes: we have studied how the local layering and orientation of water molecules close to silica surfaces affect the mobility of contact lines, and we have investigated the relation between liquid-solid friction and liquid viscosity.

Large-scale non-equilibrium wetting simulations pose challenges that are rarely encountered in MD simulations outside the field of fluid dynamics. The sheer size of the simulated systems requires high-performance computing and parallelization over multiple compute nodes. Parallelization is not problematic for particle-based techniques as long as interactions remain sufficiently local; the calculation of electrostatic forces, however, requires global operations, i.e. communication between all compute nodes, which can take up to 50% of the compute time. Lastly, systems with liquid/vapor interfaces are quite inhomogeneous, which complicates load balancing for particle-based algorithms and can lead to wasted computation for grid-based algorithms.

Poster File URL:  View Poster File


Author Name:  Jakub Homola
Poster Title:  Acceleration of FETI on modern GPUs
Poster Abstract: 

Engineers today rely heavily on numerical simulations, such as heat transfer to analyze how heat propagates through a material, or linear elasticity to check how a mechanical part bends and deforms. To perform these simulations they can use, for example, the FETI method, a highly parallel solver capable of scaling to whole supercomputers. Its most time-consuming part is the application of the dual operator F = B * K^+ * B^T in every iteration of the solver, which is traditionally performed on the CPU using an implicit approach of applying the individual matrices right-to-left. To accelerate the application of the dual operator, we use GPUs. We use an explicit approach, where the local dual operator matrix is explicitly assembled, so that the application is then just a dense matrix-vector multiplication, which GPUs are well suited for. We use GPUs to perform the assembly as well, using triangular solve and matrix multiplication operations from mathematical libraries. The preprocessing time, during which the assembly is done, necessarily increases, but this is compensated by the now much shorter application. The question is whether the preprocessing cost is amortized in few enough iterations to make GPU acceleration worthwhile. In this poster, we explore several possible implementations of the assembly process and compare them on modern NVIDIA and AMD GPUs. We measured the cost of the assembly to be amortized in at most 20 and 40 iterations on NVIDIA and AMD GPUs, respectively, for most problems studied.
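As a rough sketch of the trade-off described above (the notation and timing symbols below are ours, not taken from the poster), the two ways of applying the dual operator and the break-even point of the GPU assembly can be written as

    \[
    F = B K^{+} B^{T}, \qquad
    \text{implicit: } y = B\bigl(K^{+}(B^{T}x)\bigr), \qquad
    \text{explicit: } y = F x \ \text{with } F \text{ preassembled},
    \]
    \[
    n_{\text{iter}} \;\gtrsim\; \frac{t_{\text{assemble}}}{t_{\text{implicit}} - t_{\text{explicit}}},
    \]

i.e. the one-off assembly pays off once the number of solver iterations exceeds the ratio of the assembly time to the per-iteration saving, which the poster measures to be at most 20 (NVIDIA) and 40 (AMD) iterations for most problems studied.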

Poster File URL:  View Poster File


Author Name:  Hanna Mohr
Poster Title:  Learning to Learn on HPC
Poster Abstract: 

Learning to Learn is a machine learning concept for improving learning performance. The learning process is divided into two loops. The inner loop executes an algorithm with learning capabilities on a specific task T from a family F of tasks. The performance of the algorithm is calculated using a fitness function. The (hyper)parameters and fitness values of the inner loop are sent to the outer loop. The outer loop uses an optimization technique, such as evolutionary algorithms, filtering methods, or gradient descent, to improve the performance of the inner loop. The parameters are then returned to the inner loop to perform a new iteration.
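A minimal sketch of this two-loop structure is given below; the names, the toy task, and the simple mutation scheme are illustrative and do not reflect the actual L2L API, and the real framework distributes the inner-loop evaluations over HPC resources rather than running them sequentially.

    # Toy sketch of the Learning-to-Learn double loop (illustrative names only).
    import random

    def inner_loop(params, task):
        """Run the learning algorithm on one task and return its fitness.
        Toy example: fitness is higher the closer 'lr' is to the task optimum."""
        return -(params["lr"] - task) ** 2

    def outer_loop(task_family, generations=20, population=8):
        """Evolutionary-style outer loop: mutate the best hyper-parameters found so far."""
        best, best_fit = {"lr": random.uniform(0.0, 1.0)}, float("-inf")
        for _ in range(generations):
            candidates = [{"lr": best["lr"] + random.gauss(0.0, 0.1)}
                          for _ in range(population)]
            for cand in candidates:
                # Average fitness over the task family F
                fitness = sum(inner_loop(cand, t) for t in task_family) / len(task_family)
                if fitness > best_fit:
                    best, best_fit = cand, fitness
        return best, best_fit

    tasks = [0.30, 0.32, 0.28]   # a family F of related tasks
    print(outer_loop(tasks))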

To enable this meta-learning and hyper-parameter optimization on HPC, the open-source Python tool L2L was developed. It was originally designed for neuroscience use cases, but is applicable to any scientific field. L2L supports MPI across nodes and multi-threading per node. With population-based optimization methods, simulations can easily be run in parallel. The framework handles the distribution and collection of results using the Jülich Benchmarking Environment (JUBE) as the execution manager.

This poster shows how to run simulations with the L2L framework and which issues need to be addressed, in terms of memory usage and I/O efficiency, to make it applicable to exascale computing. Furthermore, a neuroscientific use case is presented.

Poster File URL:  View Poster File


Author Name:  Vicente Lopez-Oliva
Poster Title:  Simulation of quantum circuits with Decision Diagrams
Poster Abstract: 

Quantum computing holds the potential to solve complex problems currently beyond the capabilities of classical computers by exploiting the principles of quantum mechanics. However, due to the immaturity of such computers and the inability to observe the state of a circuit during execution, efficient quantum simulators on classical computers are needed. We develop different methods for the contraction of quantum circuits represented as tensor networks. Tensor Decision Diagrams (TDDs) are used to implement two contraction ordering methods that exploit the structure of the circuits to reduce temporal and spatial costs. Experimental results show that these methods improve on other well-established approaches and that they are faster than other well-known simulation tools based on different circuit representations. In addition, an analysis of different implementations of the same quantum algorithm highlights the impact of gate sets on contraction efficiency. As a future goal, we aim to develop methods to simulate quantum circuits efficiently on classical computers using TDDs. One way to achieve this is by applying parallelization techniques at different levels of quantum circuit simulation. These levels include parallelizing a single contraction between two tensors, parallelizing contractions of disjoint tensors, and parallelizing contractions of different disjoint tensor networks.
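Why contraction ordering matters can be illustrated with a toy NumPy example (a generic tensor-network illustration, not the TDD-based method of the poster): the same three-qubit circuit fragment contracted in two different orders gives the same result but very different intermediate tensor sizes, and hence different costs.

    # Toy illustration of contraction ordering (not the poster's TDD approach).
    import numpy as np

    d = 2
    A = np.random.rand(d, d, d, d)   # two-qubit gate on qubits 1,2 (out,out,in,in)
    B = np.random.rand(d, d)         # single-qubit gate on qubit 0 (out,in)
    psi = np.random.rand(d, d, d)    # three-qubit state

    # Order 1: sweep the gates through the state; intermediates stay size d**3 = 8.
    tmp  = np.einsum('ai,ijk->ajk', B, psi)
    out1 = np.einsum('bcjk,ajk->abc', A, tmp)

    # Order 2: fuse the gates first; the intermediate operator has size d**6 = 64.
    op   = np.einsum('ai,bcjk->abcijk', B, A)
    out2 = np.einsum('abcijk,ijk->abc', op, psi)

    print(np.allclose(out1, out2))   # True: same result, different peak memory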

Poster File URL:  View Poster File


Author Name:  Erika Hoffman
Poster Title:  Deciphering Supermassive Black Hole X-ray Winds: Likelihood optimization in high-dimensional spaces with a Bayesian Framework and HPC
Poster Abstract: 

Supermassive black holes (SMBHs) at the centers of galaxies play a pivotal role in the evolution of their galactic hosts, but the physics explaining how this occurs remains poorly understood. As nearby matter is attracted to an SMBH, it forms a dense disk that becomes the most efficient powerhouse in the known universe, converting gravitational energy into outflowing radiation, collimated relativistic jets, and wide-angle winds. SMBH winds in particular have strong potential to explain SMBH-host galaxy relationships and can be studied via their interactions with X-ray radiation. Although these regions are too small to resolve in images, we can determine key wind properties, such as velocity, temperature, and composition, through careful analysis of high-spectral-resolution X-ray observations. Studying these data, which include tens of thousands of data points as a function of wavelength and brightness, can be challenging, since their behavior depends on computationally intensive physical models and likelihood optimization in high-dimensional spaces. However, recent improvements in the models used to explain the data, when rigorously analyzed using a Bayesian framework implemented with high-performance computing, can give us new, key insights. I have designed a pilot study to re-analyze archival X-ray data for the SMBH MCG-6-30-15 with these new models and methods to demonstrate their effectiveness in determining the impact of SMBH winds on galaxies.

Poster File URL:  View Poster File


Author Name:  Michael Krayer
Poster Title:  Digital twins for global to regional weather systems
Poster Abstract: 

The GLORI Digital Twins are configurable, on-demand, high-resolution digital twins based on the predictive capabilities of the ICON numerical weather and climate prediction model (www.icon-model.org). GLORI enables professional users to compute tailored forecasts of atmospheric dynamics and atmospheric components in a distributed and modular fashion. Use cases of the digital twins include, inter alia, health applications (pollen, air quality, urban heat islands), energy applications (photovoltaics), extreme events (floods, droughts), and agricultural applications. The underlying numerical model supports various parallelization paradigms, so that the digital twins can be flexibly deployed on a range of HPC platforms.

Poster File URL:  View Poster File


Author Name:  Heribert Pascual Saldaña
Poster Title:  Edge-to-Cloud Architecture and AI System for Precision Medical Gas Dosing
Poster Abstract: 

This presentation highlights the two main areas of my work. One is pursuing a part-time PhD in the Advanced Networks Architectures lab at UPC, and the other is my role as an IT specialist at CSIC. Both roles are research-focused and closely related to HPC infrastructures.

The PhD research addresses the challenge of creating lightweight predictive models for gas dosing that can be executed on edge devices. By leveraging anonymized data, it facilitates inter-hospital information sharing, supporting specialists in diagnostics and prescriptions using data from unrelated institutions. The primary goal is to develop a robust system for improving patient care through advanced data analytics and AI.

The system is based on an edge-to-cloud software architecture combined with HPC, and is designed to enhance medical diagnostics and therapeutic gas dosing through precise data collection and AI-driven decision-making and recommendations. This architecture ensures efficient data collection, processing, AI model generation, and user assessment across various platforms, while maintaining user privacy and optimizing computational resources and network bandwidth.

A key component of this work is the A6MWT system, which automates the widely used Six-Minute Walk Test (6MWT). This system integrates smart devices with a mobile application to collect extensive patient data, enabling more accurate diagnoses and AI model generation.

Additionally, the FALL3D code, a flagship model of the Centre of Excellence for Exascale in Solid Earth (ChEESE CoE), is presented. FALL3D simulates the dispersion and deposition of atmospheric particles, with a particular focus on volcanic ash.


Poster File URL:  View Poster File


Author Name:  Andrea Miola
Poster Title:  AgrUNet: a multi-GPU UNet based model for crop classification
Poster Abstract: 

Agriculture is a key factor in several aspects of human progress, acting as a catalyst for comprehensive economic growth, boosting income levels, mitigating poverty, and combating hunger. However, the steady increase in global population and the detrimental impacts of climate change induced by global warming are straining the regenerative capacity of Earth's resources. As a consequence, in recent years the need for careful monitoring and management of agricultural practices and land use has grown in importance.

The advent of high-resolution satellite missions (e.g. the Landsat satellites and the Copernicus Sentinel program) and of AI techniques has completely revolutionized the field of crop classification and Earth Observation (EO) studies as a whole, making it possible to conduct extensive studies over large areas in significantly less time than manual approaches.

In recent years, the development of new technologies such as Deep Learning (DL) algorithms has significantly boosted the performance of AI models compared to classic Machine Learning approaches, but with the downside of requiring far more computing power. At the same time, the development of High-Performance Computing (HPC) infrastructure equipped with GPUs or other types of accelerators alleviates this problem, allowing users to train DL models on dedicated hardware within reasonable times.

In this study, we developed AgrUNet, a scalable, fast, and reliable UNet-like DL architecture able to perform crop classification by image segmentation of multispectral, multitemporal satellite data. AgrUNet is designed to run on both single- and multi-GPU HPC systems with nearly ideal parallel scaling and the ability to exploit different types of numerical precision. The result is a DL model that is significantly faster than state-of-the-art CNN architectures and highly competitive in segmentation metrics.


Poster File URL:  View Poster File


Author Name:  Donovan Slabbert
Poster Title:  CAE-based Feature Extraction for QOCSVM Pulsar Anomaly Detection
Poster Abstract: 

The hybrid field of quantum machine learning (QML) shows promise to improve upon classical machine learning in many ways; however, it still faces many limiting factors. Two of the biggest challenges facing quantum computing in the current NISQ era are noise and time: noise in the form of decoherence or qubit relaxation, and the time required for feature embedding, gate operations, and measurements. Both are accentuated by an increase in data size. As the data size increases, so does the number of qubits required for encoding, and larger quantum circuits with more qubits are more susceptible to noise and take longer to execute. These limitations significantly hinder the application of QML to larger datasets, particularly image datasets such as the HTRU-1 pulsar dataset, since embedding the individual pixels onto a quantum computer necessitates larger circuits.

Although amplitude encoding partially mitigates these limitations by addressing the dimensionality problem, representing data efficiently as the amplitudes of a quantum state, dimensionality reduction may still be required for QML to be practical for large images and datasets. This dimensionality reduction should be applied without significant loss of information. We rely on a feature extraction method that involves training a classical autoencoder for image reconstruction, which serves as both a dimensionality reduction and a feature extraction step. Once the autoencoder successfully reconstructs the images, we isolate and flatten the latent space to serve as the new, reduced dataset. This latent space can then be used for further machine learning tasks, such as anomaly detection, a common machine learning problem in which the goal is to identify samples that deviate from the rest of the data.

Combining this approach with the fact that quantum support vector machines (QSVMs) typically require fewer training samples to learn patterns than variational approaches, we create a hybrid pipeline that integrates classical feature extraction, using a convolutional autoencoder (CAE) trained for image reconstruction, with a quantum-enhanced one-class support vector machine (QOCSVM) for anomaly detection. This method is applied to the real-world task of pulsar anomaly detection using the HTRU-1 image dataset. Preliminary results show that the QOCSVM performs comparably to its classical counterparts when meaningful features are extracted through image reconstruction. The hope is that this approach can be extended to larger datasets with more complicated images in the future.
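A minimal classical sketch of this pipeline is shown below (PyTorch and scikit-learn). The tiny untrained encoder and the random stand-in images are illustrative assumptions, and the quantum one-class SVM is replaced by a classical OneClassSVM for demonstration, so this is not the poster's QOCSVM implementation.

    # Illustrative sketch: CAE-style feature extraction + a classical one-class SVM.
    import torch
    import torch.nn as nn
    from sklearn.svm import OneClassSVM

    class CAE(nn.Module):
        def __init__(self, latent_dim=16):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
                nn.Conv2d(8, 4, 3, stride=2, padding=1), nn.ReLU(),   # 16x16 -> 8x8
                nn.Flatten(),
                nn.Linear(4 * 8 * 8, latent_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 4 * 8 * 8), nn.ReLU(),
                nn.Unflatten(1, (4, 8, 8)),
                nn.ConvTranspose2d(4, 8, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
                nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1),
            )

        def forward(self, x):                      # used while training for reconstruction
            return self.decoder(self.encoder(x))

    # After training the CAE for reconstruction (training loop omitted),
    # keep only the flattened latent space as the reduced dataset.
    images = torch.rand(100, 1, 32, 32)            # stand-in for HTRU-1-like images
    model = CAE()
    with torch.no_grad():
        latent = model.encoder(images).numpy()

    # A one-class SVM trained on 'normal' samples flags outliers as -1.
    detector = OneClassSVM(kernel="rbf", nu=0.1).fit(latent)
    labels = detector.predict(latent)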

Poster File URL:  View Poster File