Author Name:  Ayush Chaturvedi
Poster Title:  Beyond ComputeCovid19+: An Architecture Aware COVID-19 Diagnosis and Monitoring via High Performance Sparse Deep Learning on CT Images
Poster Abstract: 

Many deep learning architectures are inspired by how the human brain works, but their software implementations differ in one important respect: the networks either become dense during training or are deliberately designed to be dense to achieve better generalization and accuracy, whereas the neural architecture of the brain is highly sparse. In this work we target one such model, DD-Net (short for DenseNet and Deconvolution Network), which was developed in the prior ComputeCovid19+ work to enhance CT images of COVID-19 chest scans. The model follows an encoder-decoder architecture and has high dimensionality because of its stacked convolution and deconvolution layers, so training it takes many compute hours. We propose a set of techniques that target these two aspects of the model: dimensionality and training time. First, we will prune neurons to make the model sparse and then retrain the sparsified model. This reduces its effective dimensionality while limiting the loss in overall accuracy to no more than 2% and keeping the additional retraining overhead minimal. Second, we propose a set of techniques tailored to the underlying hardware so that existing hardware components (such as Tensor Cores) are better utilized, reducing the time and computational cost required to train the model.
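The prune-then-retrain step can be illustrated with a minimal pure-Python sketch of magnitude-based neuron pruning. Each neuron (one row of a layer's weight matrix) is scored by the L1 norm of its weights, the lowest-scoring fraction is zeroed, and a fixed mask keeps pruned neurons at zero during retraining. The L1 criterion, the plain SGD update, and the function names `prune_neurons` and `masked_sgd_step` are illustrative assumptions; the abstract does not specify which pruning criterion is used for DD-Net.

```python
def prune_neurons(weights, sparsity):
    """Magnitude-based neuron pruning (illustrative criterion, not necessarily DD-Net's).

    weights: list of rows; row i holds the incoming weights of neuron i.
    sparsity: fraction of neurons to remove, 0 <= sparsity < 1.
    Returns (pruned_weights, mask), where mask[i] is False for pruned neurons.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Score each neuron by the L1 norm of its weights.
    norms = [sum(abs(w) for w in row) for row in weights]
    n_prune = int(len(weights) * sparsity)
    # The n_prune neurons with the smallest norms are removed.
    order = sorted(range(len(weights)), key=lambda i: norms[i])
    pruned = set(order[:n_prune])
    mask = [i not in pruned for i in range(len(weights))]
    pruned_weights = [row[:] if keep else [0.0] * len(row)
                      for row, keep in zip(weights, mask)]
    return pruned_weights, mask


def masked_sgd_step(weights, grads, mask, lr):
    """One SGD update during retraining; rows of pruned neurons are left at zero."""
    return [[w - lr * g for w, g in zip(row, grow)] if keep else row[:]
            for row, grow, keep in zip(weights, grads, mask)]


layer = [[0.1, -0.1], [2.0, 1.0], [0.5, 0.5], [-3.0, 0.0]]
sparse_layer, mask = prune_neurons(layer, sparsity=0.5)
print(mask)          # [False, True, False, True]
print(sparse_layer)  # [[0.0, 0.0], [2.0, 1.0], [0.0, 0.0], [-3.0, 0.0]]
```

Keeping the mask fixed during retraining is what preserves the sparsity: the retrained model recovers accuracy using only the surviving neurons, so the reduced dimensionality carries over to inference.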



Author Name:  Somdutta Ghosh
Poster Title:  Effect of the nuclear Equation of State on the outcome of Core-Collapse Supernova
Poster Abstract: 

Massive stars end their lives when their core collapses under the influence of gravity. In some cases, the collapse results in a bright and spectacular explosion known as a core-collapse supernova (CCSN). In other cases, the star fails to explode and eventually forms a black hole (BH). Despite many efforts, we have yet to answer the question of which massive stars end their lives as CCSNe and which collapse into BHs. Here, we investigate the impact of the equation of state (EOS) of dense nuclear matter on the outcome of core collapse and on the subsequent nucleosynthesis. We perform our simulations with PUSH, a parametrized method for spherically symmetric explosions that includes general-relativistic hydrodynamics and neutrino transport. We use eight different supernova EOSs and study the variation in explosion properties and nucleosynthesis yields for stars of different metallicity and zero-age main-sequence (ZAMS) mass. In this poster, I will present how the nuclear EOS influences the outcome of core collapse. I will also discuss how HPC can speed up simulations of the core-collapse process.



Author Name:  Tomas Cabrera
Poster Title:  Beyond the million-body problem with Cluster Monte Carlo
Poster Abstract: 

The high densities of astrophysical objects in globular and nuclear star clusters (GCs and NSCs, respectively) naturally lead to high rates of strong gravitational interactions among these objects. GCs are especially relevant to contemporary astrophysics as sites of gravitational-wave sources, because black holes migrate to the cluster center and subsequently form black hole binaries. To date, the largest computational models of GCs contain nearly 10 million particles, a scale that begins to approach the regime of NSCs. NSCs hosting supermassive black holes (SMBHs) occupy a parameter space not yet explored by simulations because of the complex physics involved, but one that is within reach in the near future. This poster presents our most recent successes in applying the Cluster Monte Carlo (CMC) code to understand the dynamical generation of hypervelocity stars, and outlines our plan to expand the domain of CMC to include NSCs. A simplified overview of the inner workings of CMC is also included.



Author Name:  Sualeh Khurshid
Poster Title:  Scaling in Incompressible Isotropic Turbulence
Poster Abstract: 

Universal statistical properties of turbulent flows have traditionally been associated with very high Reynolds numbers, which can be reached only in simulations with more than 8192^3 grid points on the world's largest supercomputers. Recent work has shown that, in derivative statistics, the onset of this universal behavior occurs at very modest Reynolds numbers, of order 100, which can be simulated with at most 512^3 grid points. In this work, we establish that this result holds for a range of initial conditions, and in particular for different forcing mechanisms. We also show that the moments of transverse velocity gradients have larger scaling exponents than the longitudinal moments, suggesting that the former are more intermittent than the latter.
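The gradient statistics involved can be sketched in plain Python. This assumes (a common convention in this literature, though the abstract does not state it) that the scaling exponent alpha_n is the slope of the normalized moment M_n = <|du/dx|^n> / <(du/dx)^2>^(n/2) against Reynolds number on a log-log plot, i.e. M_n ~ Re^alpha_n. The helpers `gradient_moment` and `fit_exponent` are hypothetical names; a real DNS would average over full 3-D fields rather than one periodic line, and a transverse moment would use, for example, dv/dx in place of du/dx.

```python
import math


def gradient_moment(u, dx, n):
    """Normalized n-th moment <|du/dx|^n> / <(du/dx)^2>^(n/2) of a periodic 1-D sample."""
    N = len(u)
    # Second-order central difference; u[i - 1] wraps around at i = 0.
    g = [(u[(i + 1) % N] - u[i - 1]) / (2.0 * dx) for i in range(N)]
    m2 = sum(x * x for x in g) / N
    mn = sum(abs(x) ** n for x in g) / N
    return mn / m2 ** (n / 2.0)


def fit_exponent(reynolds, moments):
    """Least-squares slope alpha of log(moment) = alpha * log(Re) + const."""
    xs = [math.log(r) for r in reynolds]
    ys = [math.log(m) for m in moments]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den
```

As a sanity check, a single sine mode gives M_4 = 1.5 and a Gaussian signal gives 3; intermittent turbulent gradients give larger values that grow with Reynolds number, and a larger fitted alpha_n indicates stronger intermittency.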



Author Name:  Tianqing Zhang
Poster Title:  Bayesian resampling inference for weak lensing and clustering cosmology
Poster Abstract: 

In the past decade, Stage-III extragalactic surveys have provided tremendous data volumes for weak lensing and clustering science, enabling cosmologists to place tighter constraints on cosmological parameters such as Ωm and S8. However, producing credible contours on the cosmological parameters requires careful marginalization over the uncertainties in nuisance parameters, which leads to a prohibitive increase in the MCMC dimensionality. We develop a resampling method, consistent with Bayes’ theorem, for marginalizing over high-dimensional or unparameterizable nuisance effects such as the redshift distribution uncertainties ∆n(z) and the point spread function (PSF). The Bayesian resampling method draws realizations of the nuisance effect and, for each realization, runs the inference pipeline on the remaining parameters in parallel. It then combines all the branched-out chains, weighting each by its Bayesian evidence. We test this method on weak lensing shear data from the Hyper Suprime-Cam (HSC). Although the method is computationally expensive, we find that it gives more credible results than other methods, such as direct resampling and shift models. We plan to further apply this method to a joint probe that combines weak lensing and clustering information.
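The combination step can be sketched in a few lines of Python, assuming each nuisance realization is drawn from its prior with equal probability, so that each branch's posterior weight is proportional to its evidence Z_k. Samples are scalars here for brevity, and the function names are hypothetical; in practice each chain would come from a separate sampler run (for example nested sampling, which also supplies log Z_k), with the runs executed in parallel.

```python
import math


def evidence_weights(log_evidences):
    """Normalized weights w_k proportional to the evidence Z_k = exp(log Z_k)."""
    m = max(log_evidences)
    # Subtracting the maximum before exponentiating avoids overflow (log-sum-exp).
    raw = [math.exp(lz - m) for lz in log_evidences]
    total = sum(raw)
    return [r / total for r in raw]


def combine_chains(chains, log_evidences):
    """Merge per-realization chains into one list of (sample, weight) pairs.

    Each chain k receives total weight w_k, split equally among its samples,
    so chains of different lengths are neither over- nor under-counted.
    """
    combined = []
    for chain, w in zip(chains, evidence_weights(log_evidences)):
        combined.extend((s, w / len(chain)) for s in chain)
    return combined


def weighted_mean(pairs):
    return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)


# Two hypothetical branches; the second has three times the evidence of the first.
chains = [[1.0, 1.0], [2.0, 2.0, 2.0]]
log_z = [0.0, math.log(3.0)]
print(weighted_mean(combine_chains(chains, log_z)))  # approximately 1.75
```

Working in log evidence matters in practice, because evidences from different realizations can differ by many orders of magnitude and exp(log Z) alone would overflow or underflow.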



Author Name:  Alex Bercik
Poster Title:  Mathematically Rigorous Discretizations for Fluid Dynamics
Poster Abstract: 

From designing aircraft to modelling weather and simulating complex astrophysical plasmas, Computational Fluid Dynamics (CFD) has applications in nearly every discipline of science and engineering. Unfortunately, current state-of-the-art simulation methods are severely limited by computational cost. Low-order methods, commonplace in industry, rely on several approximations to reduce this cost, but in doing so they lack the efficiency needed to simulate complicated flows accurately. In contrast, high-order methods are constructed to be exponentially more efficient, allowing more accurate simulations for the same computational cost. The main challenge with high-order methods is their lack of robustness and their susceptibility to numerical instabilities. Rather than relying on intuition-based stabilization techniques that lack quantitative justification, recent work has made significant progress using Summation-by-Parts (SBP) operators, which possess mathematical properties that allow for rigorous stability proofs. In addition, these methods lend themselves well to parallelization and HPC architectures. My research focusses on taking these mathematical frameworks and applying them to practical CFD solvers. By implementing these schemes within an industry-level code, I will be able to identify and address the issues that prevent them from forming a fully robust, high-order, and stable method capable of simulating challenging turbulent flows.
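The property behind these stability proofs can be seen in the classical second-order SBP first-derivative operator D = H^{-1} Q, where H is a diagonal norm matrix and Q + Q^T = B = diag(-1, 0, ..., 0, 1). This textbook operator is chosen purely for illustration and is not necessarily one used in this research. Because u^T Q u = (u_N^2 - u_0^2)/2, the discrete operator mimics integration by parts, the integral of u u_x over the domain being [u^2/2] evaluated at the boundaries, which is exactly what an energy-stability argument relies on.

```python
def sbp_first_derivative(n, dx):
    """Classical second-order SBP first-derivative operator D = H^{-1} Q on n >= 3 nodes.

    Returns (H, Q, D); H is diagonal and stored as a list of its entries.
    """
    # Diagonal norm (a trapezoidal quadrature rule).
    H = [dx * (0.5 if i in (0, n - 1) else 1.0) for i in range(n)]
    Q = [[0.0] * n for _ in range(n)]
    Q[0][0], Q[n - 1][n - 1] = -0.5, 0.5
    for i in range(n - 1):
        Q[i][i + 1] = 0.5
        Q[i + 1][i] = -0.5
    D = [[Q[i][j] / H[i] for j in range(n)] for i in range(n)]
    return H, Q, D


def matvec(A, u):
    return [sum(a * x for a, x in zip(row, u)) for row in A]


H, Q, D = sbp_first_derivative(6, 0.2)
u = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]
Du = matvec(D, u)
# Discrete energy identity: u^T H D u = (u_N^2 - u_0^2) / 2 = (4 - 1) / 2.
print(sum(ui * hi * di for ui, hi, di in zip(u, H, Du)))  # 1.5 up to rounding
```

In the interior, D reduces to the familiar central difference (u_{i+1} - u_{i-1}) / (2 dx); the one-sided boundary rows are what make the summation-by-parts identity hold exactly.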



Author Name:  Cheng Xu
Poster Title:  Parallelization of a serial program based on multiple parallel computing tools
Poster Abstract: 

Starting from a serial code that simulates permeation, we improve and optimize the original code using MPI, OpenMP, CUDA, and other parallel computing tools. The efficiency of the optimized versions will be compared with that of the serial version. We have also made comparisons based on an analysis of the characteristics of different kinds of for loops, and we have implemented a new, alternative scheduling algorithm.
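Why the characteristics of a for loop matter for scheduling can be shown with a small Python model that computes per-thread work for a loop whose iterations have unequal cost. It only counts work units and does not run threads. It also does not reproduce the poster's new scheduling algorithm, which the abstract does not describe. The triangular cost model and the function names are hypothetical.

```python
def block_partition(n_iter, n_threads):
    """Contiguous chunks of nearly equal size, similar to OpenMP schedule(static)."""
    base, extra = divmod(n_iter, n_threads)
    parts, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def cyclic_partition(n_iter, n_threads):
    """Round-robin assignment, as with OpenMP schedule(static, 1)."""
    return [list(range(t, n_iter, n_threads)) for t in range(n_threads)]


def imbalance(parts, cost):
    """Max per-thread work divided by mean per-thread work (1.0 = perfect balance)."""
    loads = [sum(cost(i) for i in part) for part in parts]
    return max(loads) / (sum(loads) / len(loads))


def triangular_cost(i):
    """Iteration i runs an inner loop of length i + 1."""
    return i + 1


print(imbalance(block_partition(8, 2), triangular_cost))   # 26 / 18, about 1.44
print(imbalance(cyclic_partition(8, 2), triangular_cost))  # 20 / 18, about 1.11
```

For loops with uniform iteration cost both partitions balance equally well and contiguous blocks usually win on cache locality; for triangular or otherwise irregular loops, interleaved or dynamic assignment reduces the time threads spend waiting for the slowest one.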



Author Name:  Andre Marchildon
Poster Title:  Development and Application of a Bayesian Optimizer for Aerodynamic Shape Optimization with Inexact Gradients
Poster Abstract: 

The aviation industry continuously seeks to increase the efficiency of aircraft in order to reduce their fuel consumption, for both environmental and economic reasons. One tool actively used to reduce aircraft fuel consumption is aerodynamic shape optimization, which requires solving for the flow of air around the aircraft. This flow can be chaotic, which makes its gradients difficult to calculate. Traditional methods for calculating gradients, such as the adjoint method, cannot be used when the flow is chaotic. Alternative methods have been developed, but they provide only inexact gradients. This research project seeks to develop a Bayesian optimizer that can solve problems that are high-dimensional, expensive to evaluate, subject to nonlinear constraints, and equipped only with inexact gradients. This will allow aerodynamic shape optimization to be performed on problems with chaotic flows.




Author Name:  Shivan Khullar
Poster Title:  Combining multiple scales in star formation simulations
Poster Abstract: 

Star formation in galaxies occurs inside Giant Molecular Clouds (GMCs). GMCs have masses in the range of 10^5 to 10^8 solar masses, whereas individual stars have masses in the range of 0.1 to 50 solar masses. Newly formed stars produce radiation that helps destroy GMCs from the inside. Supernova explosions and external radiation from nearby regions can also alter the evolution of GMCs. To perform realistic simulations, it is therefore important to resolve both the regions inside a GMC and the environment around it. Traditional simulations focus on the evolution of a single GMC but fail to include its environment.

We use HPC to answer several outstanding questions in star formation, either by simulating parts of a single GMC at high resolution or by simulating an ensemble of GMCs embedded in a galactic environment. The first approach has the advantage of resolving the dynamics inside the GMC, but it neglects the galactic context. The second approach provides the galactic environment and a statistical ensemble, but it cannot resolve the dynamics inside GMCs at high enough resolution. We aim to combine these two approaches and simulate a single GMC at high resolution while keeping the galactic environment intact, by concentrating resolution in and around the GMC while de-resolving the galactic regions far from it.




Author Name:  Hiroyuki Ootomo
Poster Title:  Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance
Poster Abstract: 

Tensor Cores are mixed-precision matrix–matrix multiplication units on NVIDIA GPUs, with a theoretical peak performance of more than 300 TFlop/s on the Ampere architecture. Tensor Cores were developed in response to the high demand for dense matrix multiplication in machine learning. However, many scientific computing applications, such as preconditioners for iterative solvers and low-precision Fourier transforms, can also exploit Tensor Cores. To compute a matrix multiplication on Tensor Cores, the input matrices must be converted to half precision, which results in a loss of accuracy. To mitigate this, the mantissa bits lost in the conversion can be kept in additional half-precision variables and used to correct the result of the matrix–matrix multiplication. Even with this correction, Tensor Cores yield higher throughput than FP32 SIMT Cores. Nevertheless, the correcting capability of this method alone is limited, and the resulting accuracy cannot match that of a matrix multiplication on FP32 SIMT Cores. We address this problem and develop a high-accuracy, high-performance, low-power matrix–matrix multiplication implementation using Tensor Cores that exactly matches the accuracy of FP32 SIMT Cores while achieving superior throughput. The implementation is based on NVIDIA’s CUTLASS. We found that the key to achieving this accuracy is how to handle the rounding inside Tensor Cores and the probability of underflow during the correction computation. On NVIDIA A100 GPUs, our implementation achieves 51 TFlop/s for a limited exponent range using FP16 Tensor Cores and 33 TFlop/s for the full FP32 exponent range using TF32 Tensor Cores, both of which exceed the theoretical FP32 SIMT Core peak performance of 19.5 TFlop/s.
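The basic correction described above, splitting each FP32 value into a half-precision part hi plus a half-precision residual lo, can be emulated in plain Python using the `struct` module's binary16 (`'e'`) and binary32 (`'f'`) formats. This is a sketch of that basic scheme only, applied to a dot product, and not the authors' CUTLASS implementation. It rounds to nearest everywhere, so it does not model the internal rounding behavior of the hardware that the abstract identifies as important, and the function names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split(x):
    """x ~= hi + lo, with the bits lost in the FP16 conversion kept in lo."""
    hi = to_fp16(x)
    lo = to_fp16(x - hi)  # may underflow to a subnormal or zero for tiny residuals
    return hi, lo


def dot_fp16(a, b):
    """Plain FP16 inputs with FP32 accumulation (no correction)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return acc


def dot_corrected(a, b):
    """Corrected product hi*hi + hi*lo + lo*hi; the tiny lo*lo term is dropped."""
    acc = 0.0
    for x, y in zip(a, b):
        xh, xl = split(to_fp32(x))
        yh, yl = split(to_fp32(y))
        acc = to_fp32(acc + xh * yh + xh * yl + xl * yh)
    return acc


exact = (1 / 3) * (1 / 3)
print(abs(dot_fp16([1 / 3], [1 / 3]) - exact))       # about 5e-5
print(abs(dot_corrected([1 / 3], [1 / 3]) - exact))  # about 1e-8
```

The underflow issue is already visible for x = 0.1: its residual (about 2.4e-5) lies below the smallest normal FP16 value, 2^-14, so lo is stored as a subnormal and loses precision. One possible remedy is to scale the residual by a power of two before conversion and undo the scaling after the multiplication.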
