Author Name:  Mattia Mencagli
Poster Title:  ISTEDDAS: a new direct N-Body code on GPU to study merging compact-object binaries in star clusters
Poster Abstract: 

Since the first gravitational-wave (GW) detection, announced in 2016, the astrophysics community has detected more than 90 GW events with the LIGO and Virgo detectors. The astrophysical interpretation of these events is challenging from both the theoretical and the computational points of view. In this poster, I present a new, powerful, and versatile direct N-body code to study merging compact objects in star clusters, which are prime candidates for GW emission. Isteddas is implemented natively in CUDA and combines C++, MPI, and OpenMP to exploit GPU workstations and clusters. Star clusters can contain as many as 10^6 stars, can be 1 pc across, and can be 10^10 years old; at the same time, it is crucial to evolve compact-object binaries whose orbits can be as tight as the radius of the Sun, with periods as short as days. Furthermore, Isteddas is coupled with SEVN, a population-synthesis code that evolves stars and binary stars, which introduces even more timescales. Because both large and small spatial and temporal scales must be resolved, complex algorithms are needed, and developing a high-performance GPU code is challenging. Isteddas employs the most advanced numerical methods currently available for direct N-body integration. The main integrator is a sixth-order Hermite scheme, coupled with the block-time-step method and the Ahmad-Cohen neighbor scheme to reduce the computational complexity. Moreover, it employs the algorithmic-regularization-chain integrator to reach the high accuracy needed in the dynamical evolution of very tight gravitational systems, such as black-hole binaries. Finally, a plethora of other astrophysical phenomena can be studied with Isteddas, including stellar evolution and dynamics in galactic nuclei, the evolution of a black-hole binary inside a dark-matter halo, and the interaction between a host galaxy and its satellite star clusters.
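
To illustrate the block-time-step idea named in the abstract, here is a minimal Python sketch. It is not taken from Isteddas: the function names, the maximum step `dt_max`, and the level cap `max_level` are hypothetical. Each particle's desired time step is rounded down to a power-of-two fraction of `dt_max`, so that particles fall into a small number of blocks that can be advanced together.

```python
def block_time_step(dt_desired, dt_max=1.0, max_level=40):
    """Round dt_desired down to dt_max / 2**k for some integer k >= 0.

    dt_max and max_level are illustrative parameters, not values from the poster.
    """
    if dt_desired <= 0:
        raise ValueError("time step must be positive")
    dt = dt_max
    level = 0
    # Halve the step until it no longer exceeds the desired one.
    while dt > dt_desired and level < max_level:
        dt *= 0.5
        level += 1
    return dt


def group_into_blocks(desired_steps, dt_max=1.0):
    """Map each quantized step to the indices of the particles that share it."""
    blocks = {}
    for index, dt_desired in enumerate(desired_steps):
        dt = block_time_step(dt_desired, dt_max)
        blocks.setdefault(dt, []).append(index)
    return blocks
```

In this sketch, particles with desired steps 0.3 and 0.26 both land in the 0.25 block, so their forces can be evaluated in the same pass.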

Poster File URL:  View Poster File


Author Name:  Ahmed Mohammed Abdelfattah Abdellatif
Poster Title:  Direct Numerical Simulation of Supersonic Turbulent Hydrogen Flows on Heterogeneous HPC Systems.
Poster Abstract: 

My research is focused on advancing the scientific and engineering knowledge of supersonic hydrogen flows with application to the next generation of energy and propulsion technologies. In particular, my work is centered on studying supersonic hydrogen ejectors by means of large-scale high-fidelity simulations. For example, their application to fuel cells is especially promising, as it increases fuel utilization by recirculating the excess hydrogen back to the inlet supply. Despite their mechanical simplicity, ejectors present a complex interplay between several flow phenomena, including shock waves, laminar-turbulent transition, and mixing. Considering the technical difficulties of experiments, CFD appears to be an ideal tool. Previous studies utilized coarse-grained computations. Even though the results compared reasonably well with experiments in terms of global quantities, such approaches were inherently unable to capture the details of the flow. The relevant dimensionless parameters correspond to turbulent (Re ~ 1e6) and supersonic (Ma ~ 2) conditions. As a result, the detailed computations envisioned for the H2 ejector require state-of-the-art HPC capabilities. In this regard, the objective is to accelerate a novel in-house MPI-based compressible flow solver with OpenACC directives and perform the first-ever scale-resolving simulations of a realistic ejector geometry in order to (i) gather fundamental insights and (ii) translate them into improved ejector designs.


Poster File URL:  View Poster File


Author Name:  Christodoulos Stylianou
Poster Title:  Using dynamic sparse matrices for performance portable SpMV
Poster Abstract: 

Sparse matrices and linear algebra are at the heart of scientific simulations. More than 70 sparse matrix storage formats have been developed over the years, targeting a wide range of hardware architectures and matrix types. Each format is developed to exploit the strengths of a particular architecture or the sparsity pattern of specific matrices, and choosing the right format can be crucial for achieving optimal performance. Dynamically selecting storage formats at run time may be desirable to achieve maximum performance; however, this requires an efficient switching mechanism that does not introduce prohibitive overheads. At the same time, this mechanism will enable the straightforward addition of new formats at a later stage without major application code changes.

In our work, we have developed Morpheus, a library that supports the runtime selection of matrix storage formats by implementing sparse matrices using a single "abstract" dynamic format and providing a transparent and efficient mechanism to switch between different implementations, depending on the matrix and/or hardware being used. With multiple formats supported across multiple hardware targets, integrating Morpheus into applications such as the HPCG (High-Performance Conjugate Gradients) benchmark results in performance-portable code that can target many-core systems without any additional code modifications and that optimizes performance by selecting and switching to the best format for the task.
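
The idea of a single dynamic matrix type that switches between concrete storage formats can be sketched as follows. This is an illustrative Python toy, not the Morpheus API (Morpheus itself is not shown in this abstract): the class names `CooMatrix`, `CsrMatrix`, and `DynamicMatrix`, the `switch_to` method, and the use of COO as a common interchange format are all assumptions made for the example.

```python
class CooMatrix:
    """Coordinate format: parallel lists of row index, column index, value."""

    def __init__(self, shape, rows, cols, vals):
        self.shape = shape
        self.rows, self.cols, self.vals = list(rows), list(cols), list(vals)

    def spmv(self, x):
        y = [0.0] * self.shape[0]
        for r, c, v in zip(self.rows, self.cols, self.vals):
            y[r] += v * x[c]
        return y

    def to_coo(self):
        return self


class CsrMatrix:
    """Compressed sparse row format: row pointers, column indices, values."""

    def __init__(self, shape, row_ptr, cols, vals):
        self.shape = shape
        self.row_ptr, self.cols, self.vals = list(row_ptr), list(cols), list(vals)

    def spmv(self, x):
        y = []
        for i in range(self.shape[0]):
            start, end = self.row_ptr[i], self.row_ptr[i + 1]
            y.append(sum((self.vals[k] * x[self.cols[k]] for k in range(start, end)), 0.0))
        return y

    def to_coo(self):
        rows = []
        for i in range(self.shape[0]):
            rows.extend([i] * (self.row_ptr[i + 1] - self.row_ptr[i]))
        return CooMatrix(self.shape, rows, self.cols, self.vals)


def coo_to_csr(coo):
    # Sort entries by (row, column), then count entries per row.
    order = sorted(range(len(coo.vals)), key=lambda k: (coo.rows[k], coo.cols[k]))
    row_ptr = [0] * (coo.shape[0] + 1)
    for k in order:
        row_ptr[coo.rows[k] + 1] += 1
    for i in range(coo.shape[0]):
        row_ptr[i + 1] += row_ptr[i]
    return CsrMatrix(coo.shape, row_ptr,
                     [coo.cols[k] for k in order], [coo.vals[k] for k in order])


# Hypothetical registry: adding a format means adding one converter here.
CONVERTERS = {"coo": lambda coo: coo, "csr": coo_to_csr}


class DynamicMatrix:
    """Single user-facing type; the active storage format can change at run time."""

    def __init__(self, concrete):
        self._impl = concrete

    @property
    def active_format(self):
        return type(self._impl).__name__

    def switch_to(self, fmt):
        self._impl = CONVERTERS[fmt](self._impl.to_coo())

    def spmv(self, x):
        return self._impl.spmv(x)
```

Application code calls only `DynamicMatrix.spmv`, so the format can be switched without touching the call sites; a real library would also need to keep the conversion cost low enough not to cancel the gain.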

Poster File URL:  View Poster File


Author Name:  Brendan Boyd
Poster Title:  3D Low Mach Simulations of Convective Urca Process in White Dwarfs
Poster Abstract: 

Prior to the thermonuclear explosion and resulting supernova, white dwarfs can go through a simmering phase in which stable carbon burning in the core drives convection. During this phase, the Urca process (a pair of beta-decay and electron-capture reactions) can alter the structure of the white dwarf. This complex process can only be studied via a full 3D simulation of the star. Since the material in this environment moves slowly compared with the sound speed (low Mach number), traditional fluid codes are ineffective. MAESTROeX is designed for exactly this kind of low Mach problem and allows the simulation to be computed efficiently and accurately by removing unnecessary sound waves, which slow down conventional hydrodynamic codes. Despite this, a full 3D simulation that can resolve the motions and mechanisms is still very computationally expensive. MAESTROeX addresses this through parallelization via a hybrid OpenMP/MPI scheme and, more recently, some GPU capabilities.

Poster File URL:  View Poster File


Author Name:  Susmita Singh
Poster Title:  Self-consistent Mean Field Solution for Frustrated Quantum Magnetic Systems
Poster Abstract: 

Frustrated magnetism is the phenomenon where localized magnetic moments of a material interact through competing exchange interactions that cannot be simultaneously satisfied. This leads to highly degenerate ground-state configurations whose number grows as ~2^N with system size N. Due to strong quantum fluctuations, these degenerate ground-state configurations form a superposition state, termed a quantum spin liquid. We study the stability of this quantum spin liquid in the framework of the extended Kitaev model by computing self-consistent mean-field parameters. This requires iterative calculation of the eigenvectors and eigenvalues of the mean-field Hamiltonian on a 2D grid, which in turn necessitates parallel computation over the grid using MPI.

 


Poster File URL:  View Poster File


Author Name:  Andrew Forembski
Poster Title:  Development of a Highly Efficient Computational Framework for Atomic Dynamics
Poster Abstract: 

Recent breakthroughs in laser technology have led to the realization of super-intense, ultra-fast, laser-driven processes at unprecedented time scales, comparable to characteristic atomic times of the order of 10^{-15} s and shorter. These extremely short time scales require an accurate theoretical treatment of the dynamics of the processes, which entails significant computational complexity and an associated trade-off between efficiency and accuracy.

Our work focuses on the direct solution of the Time-Dependent Schrödinger Equation (TDSE), a parabolic partial differential equation (PDE) that is first order in time and second order in a multidimensional space whose dimension is determined by the number of electrons in the atom. We present a first-principles approach suitable for two-electron atomic systems (e.g. helium); the approach is ab initio in the sense that no adjustable physical parameters are introduced into the formulation to simulate the atom-laser interaction processes.

More specifically, the PDE is formulated as a linear system of Ordinary Differential Equations (ODEs) in time. Our problem involves several eigendecompositions of large symmetric matrices of typical dimension n x n ~ 10^4 x 10^4. Initially, we eigendecompose the time-independent system to obtain a basis in which to expand the propagated solution and form the system of ODEs. The actual time propagation requires solving an equation involving the exponential of another large symmetric matrix, which we aim to do with a Krylov subspace method based on the Lanczos algorithm that propagates the solution in a stepwise manner. Our aim is to develop an efficient suite of programs for the solution of this problem using parallel and heterogeneous computing.
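
A single Lanczos-based propagation step of the kind described above might be sketched as follows. This is a small pure-Python illustration under stated assumptions, not the authors' code: it assumes the propagator has the form exp(-i*dt*H) with H real symmetric, the names `lanczos_step`, `m`, and `taylor_terms` are hypothetical, and the exponential of the small tridiagonal matrix is applied with a truncated Taylor series for brevity (adequate only when dt times the norm of H is modest; a production code would more likely diagonalize the tridiagonal matrix).

```python
import math


def matvec(H, v):
    """Dense matrix-vector product; H is a list of rows."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def lanczos_step(H, psi, dt, m=10, taylor_terms=40):
    """Approximate exp(-1j*dt*H) @ psi in an m-dimensional Krylov subspace."""
    n = len(psi)
    norm = math.sqrt(sum(abs(x) ** 2 for x in psi))
    V = [[x / norm for x in psi]]          # orthonormal Krylov basis vectors
    alpha, beta = [], []                   # diagonal / off-diagonal of T
    for j in range(m):
        w = matvec(H, V[j])
        a = sum(vc.conjugate() * wc for vc, wc in zip(V[j], w)).real
        alpha.append(a)
        w = [wc - a * vc for wc, vc in zip(w, V[j])]
        if j > 0:
            w = [wc - beta[j - 1] * vc for wc, vc in zip(w, V[j - 1])]
        b = math.sqrt(sum(abs(x) ** 2 for x in w))
        if j == m - 1 or b < 1e-12:        # subspace full or invariant
            break
        beta.append(b)
        V.append([x / b for x in w])
    k = len(alpha)

    def apply_T(y):
        """Multiply by the k x k tridiagonal matrix T built from alpha, beta."""
        out = []
        for i in range(k):
            s = alpha[i] * y[i]
            if i > 0:
                s += beta[i - 1] * y[i - 1]
            if i < k - 1:
                s += beta[i] * y[i + 1]
            out.append(s)
        return out

    # y = exp(-1j*dt*T) @ e1 via a truncated Taylor series (simplification).
    term = [1.0 + 0j] + [0j] * (k - 1)
    y = term[:]
    for p in range(1, taylor_terms):
        term = [(-1j * dt / p) * t for t in apply_T(term)]
        y = [yi + ti for yi, ti in zip(y, term)]
    # Map the small-space result back to the full space.
    return [norm * sum(V[j][i] * y[j] for j in range(k)) for i in range(n)]
```

Only matrix-vector products with H are needed, which is what makes the approach attractive for matrices of dimension ~10^4; repeated calls with a small `dt` propagate the solution step by step.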

Poster File URL:  View Poster File


Author Name:  Petr Vacek
Poster Title:  Stopping criteria for coarsest grid solvers in multigrid V-cycle method
Poster Abstract: 

Multigrid methods are frequently used when solving systems of linear equations, applied either as standalone solvers or as preconditioners for iterative methods. Within each cycle, the approximation is computed using smoothing on fine grids and solving on the coarsest grid.

As the problems being solved grow in size, so do the problems on the coarsest grid, and their solution can become a computational bottleneck. In practice, the problems on the coarsest grid are often solved approximately, for example by Krylov subspace methods or by direct methods based on low-rank approximation. The accuracy of the coarsest-grid solver is typically determined experimentally in order to balance the cost of the solves against the total number of multigrid cycles required for convergence.

We present an approach to analyzing the effect of approximate coarsest grid solves in the multigrid V-cycle method for symmetric positive definite problems. The results are further used to discuss effective stopping criteria for the coarsest grid solvers. The results are illustrated through numerical experiments.
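
The setting can be made concrete with a small Python sketch of a V-cycle whose coarsest-grid problem is solved only approximately. This is an illustration under assumptions, not the method or the stopping criterion analyzed in the poster: it uses a 1D Poisson model problem, weighted Jacobi smoothing, and conjugate gradients on the coarsest grid stopped by a simple relative-residual tolerance `coarse_tol`; all function names and parameters are hypothetical.

```python
import math


def apply_A(u, h):
    """Matrix-free 1D Poisson operator (-u'') with zero Dirichlet boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg_coarse_solve(b, h, rel_tol, max_iter=200):
    """CG from a zero guess, stopped when ||r|| <= rel_tol * ||b||."""
    x = [0.0] * len(b)
    r, p = b[:], b[:]
    rr = dot(r, r)
    target = rel_tol * math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= target:
            break
        Ap = apply_A(p, h)
        step = rr / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi; the diagonal of A is 2 / h**2."""
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [ui + omega * 0.5 * h * h * (fi - aui) for ui, fi, aui in zip(u, f, Au)]
    return u


def v_cycle(u, f, h, coarse_tol, coarsest_n=7, sweeps=2):
    n = len(u)
    if n <= coarsest_n:
        # Inexact coarsest-grid solve of A e = f - A u.
        residual = [fi - aui for fi, aui in zip(f, apply_A(u, h))]
        e = cg_coarse_solve(residual, h, coarse_tol)
        return [ui + ei for ui, ei in zip(u, e)]
    u = smooth(u, f, h, sweeps)
    r = [fi - aui for fi, aui in zip(f, apply_A(u, h))]
    nc = (n - 1) // 2
    # Full-weighting restriction of the residual to the coarse grid.
    rc = [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2] for i in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h, coarse_tol, coarsest_n, sweeps)
    # Linear-interpolation prolongation of the coarse correction.
    e = [0.0] * n
    for i, eci in enumerate(ec):
        e[2 * i] += 0.5 * eci
        e[2 * i + 1] += eci
        e[2 * i + 2] += 0.5 * eci
    u = [ui + ei for ui, ei in zip(u, e)]
    return smooth(u, f, h, sweeps)


def relative_residual(cycles, coarse_tol, n=63):
    """Run V-cycles on -u'' = pi^2 sin(pi x) and report ||r|| / ||f||."""
    h = 1.0 / (n + 1)
    f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    u = [0.0] * n
    for _ in range(cycles):
        u = v_cycle(u, f, h, coarse_tol)
    r = [fi - aui for fi, aui in zip(f, apply_A(u, h))]
    return math.sqrt(dot(r, r)) / math.sqrt(dot(f, f))
```

Calling `relative_residual` with the same number of cycles but different values of `coarse_tol` gives a feel for the trade-off described above between the cost of the coarsest-grid solves and the number of cycles needed.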

Poster File URL:  View Poster File


Author Name:  Ashwin Adrian Kallor
Poster Title:  An integrated peptidomic pipeline for the discovery of cancer vaccine and immunotherapy candidates
Poster Abstract: 
The treatment of cancer has been revolutionized by the development of immunotherapy and cancer vaccines. Despite their success, the number of such therapies is still low because of the vast difference between their predicted and actual efficacy: of the several thousand candidates identified, less than 1% may eventually be translated into the clinic. Thus, it is necessary to develop novel computational pipelines that can effectively mine large datasets to screen, identify, and validate peptide antigen presentation within cancerous tissues and contrast presentation patterns with those observed in healthy tissues. We demonstrate the functionality and applicability of one such pipeline, which seeks to identify peptide antigens that could a) serve as potential cancer vaccine candidates and b) be predictive of response to immunotherapy, and we highlight the crucial role that HPC plays in every stage of the pipeline.
Poster File URL:  View Poster File


Author Name:  Romain PEREIRA
Poster Title:  Hybrid and Heterogeneous MPI+OpenMP task-based programming
Poster Abstract: 
The architecture of supercomputers is evolving to expose massive parallelism by considerably increasing the number of compute units per node. At the same time, High-Performance Computing (HPC) users must adapt their application code to take this complexity into account and remain efficient. One proposed solution is to express application parallelism through a task-based parallel programming model. MPI is a mature standard widely adopted in distributed computation codes, while OpenMP provides a complete standard for shared-memory programming, including a task programming model. The MPI+OpenMP programming model thus appears to be a well-suited solution for tackling performance portability issues while enabling hybrid parallelism in applications. In this poster, we introduce hybrid MPI+OpenMP task-based programming and present a task scheduling strategy that aims at reducing idle periods in a Cholesky factorization application.
Then, we extend the study of the model by porting LULESH to this programming model and evaluating it.
Finally, we introduce heterogeneity in the model through cooperative task scheduling.
Poster File URL:  View Poster File


Author Name:  Tim Griesbach
Poster Title:  Experiments Towards Parallel Adaptive In-Situ Visualization
Poster Abstract: 

The visualization of data from large-scale numerical simulations is a challenging task. Visualization as a post-processing step has the disadvantage of writing large amounts of data to disk, which can be impractically slow. Relying on a dedicated third-party library for visualization often entails duplicating data or converting it to a prescribed external data structure, as well as a sizable increase in code, executable, and memory complexity.

We develop algorithms to visualize the simulation data using the simulation data structure itself. Specifically, we work with the widely known distributed adaptive octree structure and extend it to support in-situ visualization. One of our current approaches is to revive the radiosity method, which requires only a re-projection of the radiosity results to visualize the scene from a different viewpoint. Our algorithm exploits the octree-based data structure by using recursive top-down search algorithms and recursively excluding non-visible surface patches. The AMR data is distributed using the Morton space-filling curve, which we use to parallelize the radiosity system setup and solver. We present a parallel and natively supported radiosity solver operating on a distributed octree data structure.
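
For readers unfamiliar with the Morton space-filling curve mentioned above, the following Python sketch shows how interleaving coordinate bits yields a linear ordering of cells that can be cut into contiguous pieces, one per process. It is a generic illustration, not the authors' implementation: the function names, the 10-bit coordinate width, and the equal-count partitioning rule are assumptions.

```python
def morton_encode_3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z (x in the lowest position)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode_3d(code, bits=10):
    """Inverse of morton_encode_3d."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


def partition_by_morton(cells, num_ranks):
    """Sort cells along the curve and split them into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_encode_3d(*c))
    base, extra = divmod(len(ordered), num_ranks)
    parts, start = [], 0
    for rank in range(num_ranks):
        size = base + (1 if rank < extra else 0)
        parts.append(ordered[start:start + size])
        start += size
    return parts
```

Because cells that are close along the curve tend to be close in space, each contiguous chunk is a reasonably compact region, which is why this ordering is a common choice for distributing octree data.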

Poster File URL:  View Poster File