Author Name:  Philipp Ulbl
Poster Title:  High Performance Implementations of Collision Operators in Simulations of Gyrokinetic Plasma Turbulence in Magnetic Confinement Fusion Devices
Poster Abstract: 

Turbulent transport in the edge of magnetic confinement fusion devices is critical to understanding reactor-relevant properties such as heat exhaust. Predictions can be made with high-fidelity numerical codes based on gyrokinetic theory, which provides first-principles modelling of plasma turbulence. Recently, the gyrokinetic turbulence code GENE-X has been developed, enabling simulation of the plasma from the core to the wall. Full-device simulations impose strict requirements on spatial resolution, since the simulation domain grows while the finest length scale to be resolved (the Larmor radius) shrinks. Additionally, the plasma is much colder in the edge, which increases the collisionality and thus requires proper modelling of collisions.
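
The resolution argument can be made concrete with the standard textbook relation for the thermal Larmor radius (added here for context, not taken from the poster): since it scales with the square root of the temperature, the colder edge plasma forces a finer grid.

    \rho_L = \frac{m v_\perp}{|q| B} \sim \frac{\sqrt{m k_B T}}{|q| B}
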
In this work, we present the gyrokinetic turbulence code GENE-X, along with key high-performance aspects of its implementation, including the hybrid MPI/OpenMP parallelization. We focus on the efficient implementation of the collision operators used to model collisional effects on turbulence in the edge of magnetic confinement fusion devices.
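
As an illustration of the hybrid parallelization mentioned above, here is a minimal generic sketch of the MPI/OpenMP pattern (not GENE-X source code): each MPI rank owns a subdomain, OpenMP threads share the loop work within a rank, and a collective combines the results.

    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    // Generic hybrid MPI/OpenMP pattern: one MPI rank per subdomain,
    // OpenMP threads splitting the node-local loop. Illustrative only.
    int main(int argc, char** argv) {
        int provided;
        // FUNNELED: only the main thread makes MPI calls.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        std::vector<double> local(1 << 20, 1.0);  // this rank's subdomain
        double local_sum = 0.0;

        // Threads share the work within the rank...
        #pragma omp parallel for reduction(+ : local_sum)
        for (long i = 0; i < (long)local.size(); ++i)
            local_sum += local[i];

        // ...and ranks combine results across the machine.
        double global_sum = 0.0;
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        if (rank == 0) std::printf("global sum = %g\n", global_sum);
        MPI_Finalize();
    }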



Author Name:  Sarah Skinner
Poster Title:  N-pi Finite-Volume Energy Spectrum from Nf = 2 + 1 Lattice QCD
Poster Abstract: 

My research focuses on understanding the strong nuclear force, described by quantum chromodynamics (QCD), within and between particles known as hadrons. My calculations to study such interactions make use of lattice QCD (LQCD). Using Monte Carlo techniques, LQCD specialists, such as my advisor, Prof. Colin Morningstar, and our collaborators in the CalLat group, have been able to compute observables such as energy spectra, hadron scattering phase shifts, and electromagnetic form factors. Such quantities are obtained by first using the Monte Carlo method to compute vacuum expectation values of judiciously chosen quantum field operators separated in time. Contributions from quark propagation are the most costly to determine, since they involve inverting the so-called Dirac matrix. Even with the use of advanced mathematical techniques, inverting the Dirac matrix (of typical size 10^8 × 10^8) requires large-scale HPC resources. We use the Texas Advanced Computing Center’s Frontera for our LQCD simulations.
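
For context, a matrix of that size is never inverted explicitly: Krylov iterative methods such as conjugate gradient need only matrix-vector products. Below is a minimal self-contained sketch of the idea on a stand-in operator (a 1D Laplacian, not the actual Dirac matrix); all names are illustrative.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Stand-in for the Dirac matrix: a 1D Laplacian stencil, applied
    // matrix-free. The matrix itself is never formed or inverted.
    void apply_A(const std::vector<double>& x, std::vector<double>& y) {
        const std::size_t n = x.size();
        for (std::size_t i = 0; i < n; ++i) {
            double left  = (i > 0)     ? x[i - 1] : 0.0;
            double right = (i + 1 < n) ? x[i + 1] : 0.0;
            y[i] = 2.0 * x[i] - left - right;
        }
    }

    double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    int main() {
        const std::size_t n = 1000;  // real Dirac systems: ~10^8 unknowns
        std::vector<double> b(n, 1.0), x(n, 0.0), r = b, p = r, Ap(n);

        // Conjugate gradient: only matrix-vector products are required.
        double rr = dot(r, r);
        for (int it = 0; it < 10000 && std::sqrt(rr) > 1e-10; ++it) {
            apply_A(p, Ap);
            double alpha = rr / dot(p, Ap);
            for (std::size_t i = 0; i < n; ++i) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
            }
            double rr_new = dot(r, r);
            for (std::size_t i = 0; i < n; ++i)
                p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
        std::printf("residual norm: %g\n", std::sqrt(rr));
    }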



Author Name:  Mikhail Kirilin
Poster Title:  The p4est Software for Parallel AMR: Algorithms and Interfaces
Poster Abstract: 

A forest of octrees is one data structure to represent the recursive adaptive refinement of an initial, conforming coarse mesh of hexahedra. By construction, it generates a space-filling curve ordering of the inner and leaf nodes of the refinement forest that can be exploited for fast partitioning and load balancing, adhering to a strictly disjoint parallel storage of leaves. In addition, the structure allows for communication-free and efficient local neighbor finding and general remote object searches through the entire partition. p4est is a well-known software library that implements a collection of algorithms for parallel AMR. In this poster, we focus on (a) performance improvements gained by exploiting the AVX instruction set and (b) the latest additions to the set of available algorithms, in particular configurably ordered traversals and non-balanced halo gathering and mesh iteration.
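
The space-filling curve ordering can be illustrated with Morton (Z-order) keys; the sketch below shows only the bit-interleaving idea, not p4est's actual encoding, which differs in detail.

    #include <cstdint>
    #include <cstdio>

    // Interleave the bits of (x, y, z) so that sorting leaves by the
    // resulting key traverses them along a space-filling curve.
    uint64_t morton3d(uint32_t x, uint32_t y, uint32_t z) {
        uint64_t key = 0;
        for (int b = 0; b < 21; ++b) {  // 21 bits per axis fit in 63 bits
            key |= (uint64_t)((x >> b) & 1) << (3 * b);
            key |= (uint64_t)((y >> b) & 1) << (3 * b + 1);
            key |= (uint64_t)((z >> b) & 1) << (3 * b + 2);
        }
        return key;
    }

    int main() {
        // Contiguous key ranges yield strictly disjoint partitions:
        // each rank owns one slice of the sorted key order.
        std::printf("%llu %llu\n",
                    (unsigned long long)morton3d(1, 2, 3),
                    (unsigned long long)morton3d(2, 2, 3));
    }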



Author Name:  Cathal O’Brien
Poster Title:  Investigating the suitability of the Cerebras Wafer Scale Engine to HPC
Poster Abstract: 

The approach to exascale has been hampered by bottlenecks, including communication latency and memory access speeds. The Cerebras Wafer Scale Engine (WSE), informally known as the “supercomputer on a chip”, is a novel accelerator which promises to remove these bottlenecks thanks to its 1-cycle communication latency and 7-cycle memory accesses. However, the WSE has limitations of its own, such as limited total memory capacity and limited numerical precision. Given these strengths and weaknesses, this research investigates the suitability of the WSE for HPC. This will be done by porting the NAS Parallel Benchmark (NPB) suite to run on the WSE. This commonly used suite contains 8 benchmark applications, which will test the performance of the WSE on a range of common HPC patterns, such as all-to-all communication. The resulting performance will then be compared against CPU and GPU versions of the NPB suite to determine the suitability of the WSE for HPC. This is novel research, as the benchmark suite will be ported using the Cerebras Specific Language, a recently released language which offers low-level control of the WSE to researchers for the first time.
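
For reference, the all-to-all pattern mentioned above looks as follows in plain MPI, the kind of CPU-side baseline the port will be compared against (an illustrative sketch; the WSE port itself uses the Cerebras Specific Language, which is not shown here).

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    // Every rank sends a distinct block to every other rank: the
    // communication pattern that stresses network bisection bandwidth.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int block = 4;  // elements per peer (illustrative size)
        std::vector<double> send(block * size, rank), recv(block * size);

        MPI_Alltoall(send.data(), block, MPI_DOUBLE,
                     recv.data(), block, MPI_DOUBLE, MPI_COMM_WORLD);

        if (rank == 0) std::printf("recv[0] from rank 0: %g\n", recv[0]);
        MPI_Finalize();
    }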




Author Name:  Hailey D'Silva
Poster Title:  NeuroStack: A CloudFormation Tool for Analyzing Neuroimaging Data Using AWS Cloud Computing
Poster Abstract: 

Analyzing neuroimaging datasets at scale presents numerous challenges, particularly for those new to cloud-based computing. NeuroStack is a novel tool that builds Amazon Web Services (AWS) infrastructure to facilitate neuroimaging analysis in the AWS cloud. Designed to enable researchers to transition quickly to the cloud, NeuroStack is built with the NITRC Computational Environment (NITRC-CE), allowing access to all of the pre-installed software therein, including FreeSurfer, AFNI, FSL, SPM, PLINK, and more. Users install NeuroStack into their AWS account, modify a template script to fit their needs, and upload their data. The data are then automatically processed according to the script and uploaded into the user’s AWS account. NeuroStack is free to use, although users are charged by AWS for storage, computational time, and other services. To illustrate the average processing time and compute cost, I present results from using NeuroStack to process DICOM files from the Adolescent Brain Cognitive Development (ABCD) study through a structural (FreeSurfer) MRI analysis pipeline. The NeuroStack download and instructions can be found on the NITRC website at https://www.nitrc.org/projects/neurostack/.
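
As a purely hypothetical illustration of the "upload their data" step, the sketch below pushes one DICOM file to an S3 bucket with the AWS SDK for C++; the bucket and key names are placeholders, and NeuroStack's actual upload conventions may differ.

    #include <aws/core/Aws.h>
    #include <aws/s3/S3Client.h>
    #include <aws/s3/model/PutObjectRequest.h>
    #include <fstream>
    #include <iostream>

    int main() {
        Aws::SDKOptions options;
        Aws::InitAPI(options);
        {
            Aws::S3::S3Client client;  // credentials/region from environment

            Aws::S3::Model::PutObjectRequest request;
            request.SetBucket("my-neurostack-input");     // placeholder name
            request.SetKey("sub-001/anat/scan-001.dcm");  // placeholder key

            // Stream the local DICOM file as the object body.
            auto body = Aws::MakeShared<Aws::FStream>(
                "upload", "scan-001.dcm",
                std::ios_base::in | std::ios_base::binary);
            request.SetBody(body);

            auto outcome = client.PutObject(request);
            std::cout << (outcome.IsSuccess() ? "uploaded\n" : "failed\n");
        }
        Aws::ShutdownAPI(options);
    }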



Author Name:  Marie Reinbigler
Poster Title:  Frugal deep learning for neuromuscular tissue analysis for tomorrow’s gene therapies
Poster Abstract: 

Généthon is a pioneer in the development of gene therapy vectors, which can benefit several hundred neuromuscular diseases of genetic origin. Currently, pre-clinical studies are conducted to gain precise knowledge of muscle physiology and disease based on the analysis of histological sections, i.e., slices of organs observed by microscopy. This work aims to process these large histological slices by exploiting deep learning to analyze and classify the neighborhoods of pathological areas of interest using a 'frugal' computing platform, i.e., inexpensive local hardware resources, both to democratize access and to retain total control over data privacy. The core of our approach is to build a scalable processing architecture composed of a pair of differentiable blocks. First, an image-based block extracts a segmentation of the slices into related components. In a second step, a graph-based approach, built on the set of previously detected components, is deployed to perform the classification. At each step, our goal is to maximize the computational performance of the execution platform, in order to make our approach compatible with modest computational resources, an open problem in the context of deep learning, which is particularly compute- and energy-intensive.



Author Name:  Bálint Siklósi
Poster Title:  Achieving mixed precision computing with the help of domain specific libraries
Poster Abstract: 

As scientific applications in HPC increasingly depend on floating-point operations, demand is emerging for tools that obtain better performance by exploiting the characteristics of different number representations. A particularly suitable solution for this purpose is mixed-precision computing, where parts of an application are transformed to lower precision, consuming less memory and executing operations faster, while keeping precision loss as small as possible. In this project, we propose to extend OPS and OP2, two domain-specific libraries for the execution of structured and unstructured grid applications, with automatic mixed-precision transformations. These libraries can also be viewed as instantiations of the Access-Execute descriptor programming model, thus allowing the examination of interactions between kernels. As a baseline test, we hand-tuned a CFD mini application to explore the possible performance gain from mixed-precision computing in OP2. By halving the size of the most accessed data set, we achieved a 1.13x (1.1x) speedup on CPUs (GPUs), compared to the 1.76x (1.44x) speedup of an execution with all data sets reduced in size. Future tasks of the project are (1) to determine the precision loss and whether it is acceptable; (2) to develop strategies to choose candidates for precision lowering; and (3) to estimate the expected performance gain.
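
The trade-off described above follows a common mixed-precision pattern, sketched below in plain C++ rather than the OP2/OPS API: the large, most-accessed array is demoted to float, halving its memory traffic, while accumulation stays in double to limit the precision loss that task (1) will quantify.

    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 20;
        std::vector<float>  field(n, 1.000001f);  // demoted data set
        std::vector<double> weight(n, 0.5);       // kept in full precision

        double acc = 0.0;  // accumulate in double to bound rounding error
        for (std::size_t i = 0; i < n; ++i)
            acc += weight[i] * (double)field[i];

        std::printf("accumulated: %.12f\n", acc);
    }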



Author Name:  Mickael Boichot
Poster Title:  Parallel Application Characterization for HPC GPU Porting
Poster Abstract: 

Heterogeneous architectures are now an integral part of the HPC landscape in the race to exascale. Although powerful, they pose intense programming challenges that may require developers to rewrite part or all of their applications, a significant effort when porting to the GPU. Many powerful programming models exist to assist users in this task, but they do not show users how to program this type of architecture efficiently. Tools exist to predict the performance an application would achieve if ported from one GPU to another, as well as tools for automatic code transformation. For applications already ported to a GPU, profiling tools help developers locate and correct flaws in their programs. Finally, one tool (Intel Offload Advisor) supports a preliminary porting phase that identifies the parts of the code worth offloading to accelerators. Although interesting, this tool does not reflect the reality of current (multi-GPU) architectures and is restricted to certain code patterns (i.e., loops). This thesis work complements these efforts: it will analyze an application, or a sub-part of it, to determine the relevance of a possible port to heterogeneous architectures. The resulting tool will first determine whether an application is portable to the GPU, and then assist developers during the porting itself. The study is based on characterizing applications, to determine the best practices a GPU computing code should adopt, as well as characterizing target multi-GPU architectures. We distinguish our work by an approach made at two different scales: static (through compilation, to analyze the structure of the codes) and dynamic (to capture the elements that can change from one execution to another). Moreover, our work is not tied to a particular vendor, but addresses any type of machine.



Author Name:  Duy-Khoi Dang
Poster Title:  Advances in Parallel Heat-Bath Configuration Interaction
Poster Abstract: 

Full configuration interaction (CI) provides the exact solution to the electronic Schrödinger equation in a one-electron basis. However, the method scales exponentially and is thus only feasible for the smallest of molecular systems. The heat-bath configuration interaction (HBCI) method is a deterministic method that approaches the full CI limit at greatly reduced computational cost. The method employs a variational step, which recovers strong correlation effects, and a perturbative step, which recovers the remaining weak correlation. This project introduces improvements to the HBCI algorithm specifically targeting speed, parallel efficiency, and memory requirements. The code takes advantage of both MPI and OpenMP parallelization to achieve maximum performance. A hash function is used to distribute work with minimal communication during both the variational and perturbative steps. In the perturbative step, the hash function also reduces the memory requirements of the perturbative space. Benchmark calculations will demonstrate the parallel performance and overall memory footprint of the new code.
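
The hash-based distribution can be sketched generically as follows (an illustration, not the poster's code): each determinant, encoded as an occupation bitstring, hashes to a unique owner rank, so every process can compute any determinant's owner locally without communication.

    #include <cstdint>
    #include <cstdio>

    // A simple, well-known 64-bit mixing function (splitmix64).
    uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Deterministic owner assignment: no lookup tables, no messages.
    int owner_rank(uint64_t det_bits, int nranks) {
        return (int)(splitmix64(det_bits) % (uint64_t)nranks);
    }

    int main() {
        const int nranks = 8;
        // Hypothetical determinants: bit i set = spin-orbital i occupied.
        uint64_t dets[] = {0b0000111ULL, 0b0001011ULL, 0b0010011ULL};
        for (uint64_t d : dets)
            std::printf("det %llx -> rank %d\n",
                        (unsigned long long)d, owner_rank(d, nranks));
    }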



Author Name:  Caprice Phillips
Poster Title:  A Data Driven Approach to Unlocking Planet Formation: The Atmospheric Retrieval of the BD +60 1417 System
Poster Abstract: 

Resolved brown dwarf companions are unique observational treasures, as their atmospheric characterization helps tell a story of giant planet formation and evolution far from a given star (>>10 AU). Retrieving the bulk properties of brown dwarf atmospheres in comparison to their host stars can provide insights into formation mechanisms. I aim to use the atmospheric retrieval code Brewster to explore the cloud properties of a planetary-mass brown dwarf (BD+60 1417 B), as well as to robustly decouple temperature, gravity, and composition to derive gas abundances and ultimately estimate bulk composition ratios, such as C/O, to directly compare to the composition of the primary. I will present initial findings from this work to probe possible formation routes (e.g. core accretion, gravitational or turbulent disk fragmentation) for the companion.

