Author Name:  Paul Grosse-Bley
Poster Title:  Performance Comparison of Multigrid Implementations on Accelerators
Poster Abstract: 

Multigrid (MG) methods are an important tool for efficiently solving large, sparse linear systems that arise, e.g., when discretizing and solving elliptic PDEs. Given the prevalence of graphics accelerators in the HPC space, this poster compares the performance of different MG algorithms in existing implementations on an Nvidia A100 accelerator. For matrices with regular sparsity patterns, as they arise when discretizing PDEs on a structured grid, geometric multigrid (GMG) schemes are in theory very efficient because they exploit the inherent geometry of the problem. In contrast to the more general algebraic multigrid (AMG), there are few broadly used, production-grade libraries that implement GMG efficiently on accelerators, i.e. using stencils instead of general sparse matrix-vector products. In addition to the generic GMG implementation in the PETSc library using cuSPARSE and cuSOLVER, we therefore take a slightly modified version of the HPGMG-CUDA benchmark and the experimental ExaStencils code generation framework as GMG implementations. AMG is represented by the AMGX library developed at Nvidia. Numerical results show how strongly certain design decisions influence solver performance.
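
To make concrete what "using stencils instead of general sparse matrix-vector products" means on a GPU, the sketch below contrasts a matrix-free weighted-Jacobi smoothing step for the 5-point Laplacian on a structured 2D grid with the CSR sparse matrix-vector product a matrix-based backend would issue for the same operator. It is a minimal illustration only: the kernel names, data layout and boundary handling are assumptions and are not taken from HPGMG-CUDA, PETSc, ExaStencils or AMGX.

    // Illustrative CUDA sketch (not from any of the compared codes):
    // matrix-free 5-point stencil smoothing vs. a generic CSR SpMV.

    __global__ void jacobi_stencil(const double* u, const double* f, double* u_new,
                                   int nx, int ny, double h2, double omega)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;              // grid index in x
        int j = blockIdx.y * blockDim.y + threadIdx.y;              // grid index in y
        if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return; // interior points only

        int id = j * nx + i;
        // The operator is implicit in the index arithmetic; no matrix is stored.
        double sigma = u[id - 1] + u[id + 1] + u[id - nx] + u[id + nx];
        double u_jac = (h2 * f[id] + sigma) / 4.0;
        u_new[id] = (1.0 - omega) * u[id] + omega * u_jac;          // weighted Jacobi update
    }

    __global__ void spmv_csr(const int* row_ptr, const int* col_idx, const double* val,
                             const double* x, double* y, int n)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;
        double sum = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)       // indirect, data-dependent accesses
            sum += val[k] * x[col_idx[k]];
        y[row] = sum;
    }

The stencil kernel reads only the five neighbouring values and needs no index arrays, which is why GMG on structured grids can be considerably more bandwidth-efficient than formulating the same operator as a stored sparse matrix.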

Poster File URL:  View Poster File


Author Name:  Gayatri Aniruddha
Poster Title:  High Time Resolution GPU Imager for Low Frequency Radio Telescopes
Poster Abstract: 

Fast Radio Bursts (FRBs) are very bright (up to tens of Jansky) millisecond radio pulses, some originating from the very distant Universe. The search for FRBs is similar to single-pulse searches in pulsar observations, performed through beam-forming. This method has a high associated computational cost, and hence, in my Masters by Research, I am exploring alternative image-based approaches, specifically for low-frequency FRB searches. Though existing imagers such as CASA, MIRIAD and WSCLEAN are suitable for FRB searches, they involve substantial I/O-heavy operations, as they require their inputs to be present in a specific format. Furthermore, there is no imaging software operating entirely or predominantly on GPUs to search for FRBs. Thus, the main aim of my Masters by Research is to develop an alternative GPU imager that will be part of the processing pipeline searching for FRBs. I will continue working on the imaging software to test and deploy it on SKA-Low stations and the Murchison Widefield Array (MWA), and to further optimise its processing efficiency by implementing it on GPUs to capitalise on parallel programming. In this poster, I will present the following:

  • Implementation of a GPU Imager using the CUDA programming environment for a single time-step and a single frequency channel. 

  • Implementation of a multi-channel, multi time-step GPU Imager, focussed on parallel gridding using a 2D grid of CUDA blocks (a minimal sketch follows this list). 

  • Comparison of the run-times of the imager on the Topaz and Setonix supercomputers.

  • A discussion of whether real-time imaging was achievable from the current multi-channel, multi time-step GPU imager.
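
A minimal sketch of the gridding kernel referred to in the second bullet above: each block of a 2D CUDA launch grid handles one (frequency channel, time step) pair, and its threads scatter visibilities onto the corresponding UV grid with atomic adds. Nearest-neighbour gridding, the data layout and all names are assumptions for illustration; this is not the actual imager code.

    // Illustrative CUDA sketch: nearest-neighbour gridding of visibilities,
    // one block per (channel, time step) pair via a 2D grid of blocks.
    #include <cuComplex.h>

    __global__ void grid_visibilities(const float* u, const float* v,     // per-visibility UV coordinates (pixels)
                                      const cuFloatComplex* vis,          // visibilities [channel][timestep][n_vis]
                                      cuFloatComplex* uv_grid,            // output grids [channel][timestep][grid_size^2]
                                      int n_vis, int grid_size)
    {
        int chan  = blockIdx.x;                                  // frequency channel
        int tstep = blockIdx.y;                                  // time step
        long slab = (long)chan * gridDim.y + tstep;              // which (channel, time step) slab
        const cuFloatComplex* vis_in = vis + slab * n_vis;
        cuFloatComplex* grid_out = uv_grid + slab * (long)grid_size * grid_size;

        for (int k = threadIdx.x; k < n_vis; k += blockDim.x) {
            int iu = (int)roundf(u[k]) + grid_size / 2;          // shift so (0,0) is the grid centre
            int iv = (int)roundf(v[k]) + grid_size / 2;
            if (iu < 0 || iv < 0 || iu >= grid_size || iv >= grid_size) continue;
            long cell = (long)iv * grid_size + iu;
            atomicAdd(&grid_out[cell].x, vis_in[k].x);           // accumulate real part
            atomicAdd(&grid_out[cell].y, vis_in[k].y);           // accumulate imaginary part
        }
    }

A 2D FFT of each gridded slab would then produce the dirty image for that channel and time step.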


Poster File URL:  View Poster File


Author Name:  Calum Snowdon
Poster Title:  Coupled Cluster on Exa-Scale Systems
Poster Abstract: 

Coupled Cluster is a family of ab initio computational chemistry methods which generate approximate solutions to the electronic Schrödinger equation for molecular systems. Coupled Cluster is capable of achieving accuracy on par with experimental results in many cases of interest. However, Coupled Cluster methods are expensive, with computational complexities of O(N^6) and above, and memory complexities of O(N^4) and above. This has historically limited the applicability of Coupled Cluster to systems of only a few dozen atoms. The newly emerging exascale-class supercomputers have the computational power to make larger-scale Coupled Cluster computations feasible, but the development of Coupled Cluster algorithms for these supercomputers has been slow due to the combined complexity of the algorithms and of the supercomputers themselves. The goal of this project is to develop a software framework that enables the implementation of Coupled Cluster methods achieving optimal utilization of exascale hardware.
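
To make the quoted complexities concrete (using only the O(N^6) time and O(N^4) memory scalings stated above, with no method-specific constants), doubling the system size N increases the cost roughly as

    \[ \frac{t(2N)}{t(N)} \approx 2^{6} = 64, \qquad \frac{m(2N)}{m(N)} \approx 2^{4} = 16, \]

so a system twice as large needs about 64 times the compute and 16 times the memory.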

Poster File URL:  View Poster File


Author Name:  Fiona Yu
Poster Title:  Automated Graph Theory-Based Molecular Fragmentation
Poster Abstract: 

The chemico-physical characterisation of large, dynamic molecular systems is a key challenge in computational chemistry. Modern quantum chemical (QC) calculations provide highly accurate computational models of the chemico-physical behaviour of matter and therefore provide a potential resolution for the characterisation of large molecular systems. However, the time required for accurate QC algorithms grows rapidly (at least O(N^4)) with system size, limiting their applicability to systems containing fewer than a few hundred atoms.

Molecular fragmentation has emerged as a successful strategy to reduce the time complexity of the underpinning QC algorithms to O(N) whilst enhancing algorithmic parallelisability. However, such fragmentation approaches are not widely adopted due to a dearth of automated fragmentation methods: most require manual fragmentation and, furthermore, none are optimised for supercomputer architectures, prolonging time-to-solution. 

To this end, we aim to develop a fast, automated fragmentation approach. We leverage concepts from computational chemistry, graph theory and high-performance computing to develop a fragmentation algorithm capable of handling molecular systems of thousands of atoms. The proposed scheme employs a scoring function to define the quality of a fragmentation and a corresponding optimisation procedure to attain high-quality fragments. This will enable accurate characterisation of large molecular systems containing thousands of atoms, and allow researchers to solve problems in chemistry and cognate fields at scales that were previously out of reach.
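
As a purely hypothetical illustration of what such a scoring function could look like (the actual function used in this work is not reproduced here), a partition of the molecular graph G into fragments F_1, ..., F_k could be scored by penalising severed bonds and over-sized fragments:

    \[ S(F_1,\dots,F_k) = w_{\mathrm{cut}} \,\bigl|\{(a,b) \in E(G) : a,\,b \text{ lie in different fragments}\}\bigr| \; + \; w_{\mathrm{size}} \sum_{i=1}^{k} \max\bigl(0, |F_i| - N_{\max}\bigr), \]

where the weights w_cut, w_size and the fragment-size cap N_max are illustrative parameters; the optimisation procedure would then search over partitions to minimise S.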


Poster File URL:  View Poster File


Author Name:  Zhaoping Ying
Poster Title:  Computational Fluid Dynamics Model of Laminar Spray Flames
Poster Abstract: 

Computational Fluid Dynamics (CFD) uses numerical analysis and data structures to analyze and solve fluid-flow problems based on the conservation laws (conservation of mass, momentum, and energy) governing fluid motion. Spray flames are highly relevant in many practical combustion systems such as industrial furnaces, liquid propulsion systems, household burners, and internal combustion engines. This poster presents a numerical model of laminar non-premixed ethanol/air spray flames under fuel-rich conditions in the counterflow configuration. The numerical method and the code flow chart are then presented. Results are shown to validate the CFD code and to illustrate a typical outcome in the laminar counterflow spray flame, namely that the droplets reverse and oscillate near the stagnation point as the initial gas strain rate increases. For future research, a multi-component droplet heating and evaporation model will be embedded in this code to analyze the realistic behavior of multi-component fuels such as Jet-A in the spray combustion system. Meanwhile, the current serial CFD code will be parallelized and run on an HPC cluster to accelerate the computation.
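
For reference, the conservation laws mentioned above can be written in the generic transport form common to CFD codes (the spray-specific source terms of the present model are not reproduced here):

    \[ \frac{\partial (\rho \phi)}{\partial t} + \nabla \cdot (\rho\, \mathbf{u}\, \phi) = \nabla \cdot \bigl(\Gamma_{\phi}\, \nabla \phi\bigr) + S_{\phi}, \]

with \phi = 1 for mass, \phi = \mathbf{u} for momentum and \phi = h (enthalpy) for energy; the coupling between the gas phase and the evaporating droplets enters through the source terms S_\phi.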

Poster File URL:  View Poster File


Author Name:  Sushree Jagriti Sahoo
Poster Title:  Design of self-consistent convolutional density functionals for solid-gas interfaces using electronic structure descriptors
Poster Abstract: 

Density functional theory (DFT) is a widely used computational tool for studying the electronic structure and properties of materials. It is used in heterogeneous catalysis to provide insights into the mechanisms of catalytic reactions and to predict the reactivity of catalysts. The exchange-correlation (XC) functional is a crucial component of DFT, as it accounts for the effects of electron-electron interactions in a system and determines the accuracy of calculated system properties. The most accurate XC functional typically depends on the type of chemical system, making it challenging to choose a functional for systems that contain interfaces between different phases of matter or where multiple types of chemical bonding are important. GGA functionals such as PBE and PW91 are used to calculate chemisorption energies, but the results differ from experimental values by as much as 1 eV, which can lead to quantitatively and qualitatively incorrect conclusions in the analysis of surface reaction systems. These functionals tend to be inaccurate for systems such as molecules and solids with localized electrons but are commonly used for modeling adsorption at catalytic interfaces due to their low cost. To address this, we propose a new functional design paradigm that moves beyond the typical model spaces into a systematically expandable model space constructed by extracting fingerprints using 3D convolutional kernels at varying length scales. This enables us to use different approximations at different points in space within a given system and to distinguish between the different electronic environments present in a system. We demonstrate this functional’s capabilities in interpolating between small organic molecules and primitive metal cells. It achieves the same level of accuracy as the RPBE functional for molecules and the WC functional for metals.
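
One illustrative way to write such a convolution-based descriptor (an assumed form for exposition, not necessarily the exact formulation used in this work) is a set of densities smeared over different length scales \sigma that then enter the exchange-correlation energy density:

    \[ \tilde{n}_{\sigma}(\mathbf{r}) = \int g_{\sigma}\bigl(|\mathbf{r}-\mathbf{r}'|\bigr)\, n(\mathbf{r}')\, d^{3}r', \qquad E_{\mathrm{xc}} \approx \int n(\mathbf{r})\, \varepsilon_{\mathrm{xc}}\bigl(\tilde{n}_{\sigma_1}(\mathbf{r}), \dots, \tilde{n}_{\sigma_k}(\mathbf{r})\bigr)\, d^{3}r, \]

where g_\sigma is a normalised 3D kernel (e.g. a Gaussian); probing the density at several length scales is what allows the functional to distinguish different electronic environments within one system.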


Poster File URL:  View Poster File


Author Name:  Bradley Pascoe
Poster Title:  Modelling the Richtmyer-Meshkov instability (RMI) under strain
Poster Abstract: 

The Rayleigh-Taylor and Richtmyer-Meshkov instabilities are two of the limiting factors on the development of controlled fusion through inertial confinement fusion (ICF). A more accurate and robust understanding of these instabilities during the fusion process will allow for greater energy yields, towards the goal of emission-free energy. To achieve this insight, the development of efficient and accurate simulation and modelling frameworks is desired. Understanding the role of strain in the growth rate of the instability is important because, during ICF, the mixing layer is spherically compressed and experiences radial and transverse strain, in contrast to planar geometry, where no strain is imposed. High-performance computing is utilised to conduct direct numerical simulations and implicit large eddy simulations of strained instability growth to help improve simpler models, such as buoyancy-drag equations and unsteady Reynolds-Averaged Navier-Stokes. By enabling the use of simpler computational models to simulate instability growth, the demand on computing resources is reduced.
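
As an illustration of the reduced models referred to above, one commonly used buoyancy-drag formulation evolves the bubble/spike penetrations h_i of the mixing layer as (coefficients and exact form vary between formulations, and this is not necessarily the variant used in this work):

    \[ \frac{dh_i}{dt} = v_i, \qquad (\rho_i + C_a\, \rho_j)\, \frac{dv_i}{dt} = (\rho_i - \rho_j)\, g(t) - C_d\, \rho_j\, \frac{v_i\, |v_i|}{h_i}, \]

where (i, j) label the heavy and light fluids, C_a is an added-mass coefficient and C_d a drag coefficient; strain would enter through additional terms or modified coefficients calibrated against the high-fidelity simulations.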


Poster File URL:  View Poster File


Author Name:  Liam Ryan
Poster Title:  Accelerating Radio Astronomy Signal Processing with Tensor Cores
Poster Abstract: 

Radio telescopes produce vast amounts of raw voltage data which must be pre-processed in real time through a series of digital signal processing stages such as filtering, channelisation, beamforming, and correlation before being written to storage. Large GPU clusters play a vital role in these pre-processing stages. Radio astronomers then perform a broad range of post-processing operations on this data, such as synthesis imaging, machine learning searches and all-sky surveys, through the HPC resources of the Pawsey Supercomputing Centre. Recent advancements have allowed correlation to be performed using Tensor Cores, giving order-of-magnitude improvements in performance and efficiency. We explore the integration of the "Tensor Core Correlator" into the Murchison Widefield Array, as well as the impact of this technology on future telescopes such as the Square Kilometre Array. 
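
The reason correlation maps so well onto Tensor Cores is that, per frequency channel, it is essentially a complex matrix multiplication: with x_p(\nu, t) the voltage of antenna p in channel \nu at time t (generic notation, not the Tensor Core Correlator's actual interface),

    \[ V_{pq}(\nu) = \sum_{t} x_{p}(\nu, t)\, x_{q}^{*}(\nu, t), \qquad \text{i.e.} \quad V(\nu) = X(\nu)\, X(\nu)^{\mathsf{H}}, \]

and accumulating X X^H is exactly the dense multiply-accumulate workload that Tensor Cores are designed for.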

Poster File URL:  View Poster File


Author Name:  Maël MARTIN
Poster Title:  Driving HPC Parallel Optimizations with DSL: Improve resource allocations
Poster Abstract: 

Performance portability of parallel applications is a major issue in a context where supercomputer architectures evolve very quickly relative to the lifespan of the applications. Using a Domain-Specific Language and Abstractions (DSL/DSA) allows an application to be adapted to a new machine without having to rewrite it. That is why we want to define a methodology to improve parallel code generation and to drive the resource allocations needed for the execution of a scientific application through DSL/DSA. We first define which properties are guaranteed by the DSL to guide optimized code generation, then adapt the intermediate representation of this language by integrating concepts related to parallelism. In this way, we can deduce the algorithmic complexity of the application and drive code generation alongside the resource allocations needed for its execution.

Poster File URL:  View Poster File


Author Name:  Jingde Zhou
Poster Title:  Development of a data buffer holding time-series data across multiple applications
Poster Abstract: 

Cross-reference simulation is a computational model that couples multiple parallel simulation codes, in which at least one simulation code needs to read data from other simulation codes to perform its own calculation. To provide a solution that can carry out cross-reference simulations efficiently and conveniently, a cross-reference simulation framework, CoToCoA, is being developed. It allows users to implement cross-reference simulations with minimal modification of their simulation codes while keeping the overhead as low as possible. 

Generally, a simulation code calculates and updates its data at each timestep, so in a cross-reference simulation other simulation codes may miss data at some timesteps due to this overwriting. In this poster, a function with a dedicated data buffer is developed to implement one-sided data communication among multiple simulation codes with minimal communication loss. The data buffer is used to save the data at each timestep, so no data are skipped as long as the buffer is not full. Meanwhile, the number of data communication calls between different nodes can be reduced significantly.
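
A minimal way to see why no timestep is lost (assuming a circular buffer with C slots; an illustrative model, not necessarily CoToCoA's actual layout): if the producer writes timestep n into slot

    \[ \mathrm{slot}(n) = n \bmod C, \]

then the data for timestep n remain available to remote readers until the producer reaches timestep n + C, i.e. as long as n_writer - n < C. Consumers that lag by fewer than C timesteps therefore never miss data, and several buffered timesteps can be fetched with a single one-sided transfer, reducing the number of communication calls.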

Poster File URL:  View Poster File