Poster Title:  Automatic Parallelization for Shared Memory Scientific Multiprocessing: An Analysis & Comparison
Poster Abstract: 

Parallelization schemes are essential for exploiting the full benefits of multi-core architectures, which have become widespread in recent years, especially for scientific applications. In shared memory architectures, the most common parallelization API is OpenMP. However, introducing correct and optimal OpenMP parallelization into applications is not always a simple task, due to common pitfalls in parallel shared memory management and to architecture heterogeneity. To ease this process, many automatic parallelization compilers have been created. In this paper we focus on three source-to-source compilers (AutoPar, Par4All and Cetus) that were found to be the most suitable for the task. We point out their strengths and weaknesses, analyze their performance, inspect their capabilities and suggest new paths for enhancement. We analyze and compare the compilers' performance over several exemplary test cases, each of which highlights different pitfalls, and suggest several new ways to overcome these pitfalls that yield excellent results in practice.

Poster ID:  A-16
Poster File:  PDF document poster-auto-parallelization-1.pdf
Poster Image: 
Poster URL:  https://www.cs.bgu.ac.il/~orenw/openmpcon_presentation18.pdf


Poster Title:  A localized tensor-structured algorithm for DFT calculations
Poster Abstract: 

Kohn-Sham Density Functional Theory (DFT) is commonly used for ab initio electronic structure calculations, which enable us to understand the microscopic behavior of materials arising from quantum effects. However, the computational cost of DFT calculations is extremely high, scaling as O(N^3) for the most widely used plane-wave basis functions, where N is the number of electrons. It is therefore important to improve the efficiency of the calculation by reducing both the order of the scaling and the floating-point arithmetic cost. In our research, we use a tensor algorithm to construct a basis set that resembles the exact solution, reducing the scaling with system size to sublinear. At the same time, we construct a set of localized basis functions that represents the same eigenspace as this basis. The localized basis improves the sparsity of the resulting matrix, which is solved by spectral decomposition, and so reduces the number of floating-point operations. With the tensor-structured technique, we can bring the DFT calculation to sublinear scaling and improve its arithmetic performance.

Poster ID:  A-7
Poster File: 
Poster Image: 
Poster URL: 


Poster Title:  Efficient Exploration in Distributed Reinforcement Learning
Poster Abstract: 

Reinforcement Learning (RL) methods have great potential for solving sequential decision-making tasks. They can learn adaptive optimal control policies in areas that traditionally employ heuristic methods. However, contemporary RL methods are sample inefficient, and real-life problems have a very large state space to explore. We are looking at control application areas that offer a large population of RL agents that can learn collectively by sharing their experiences. We research efficient means of exploration and of accelerating the learning process through distributed computing. Energy Harvesting Wireless Sensor Networks for the Internet of Things are one such interesting application scenario. The sensor nodes need to learn to optimize their performance with respect to a number of parameters, such as energy consumption, Quality of Service (QoS) and the utility of the sensory data. Furthermore, their working environment is highly unpredictable and diverse. Our research leverages the population of these nodes for efficient ε-greedy exploration and to learn optimal energy management policies that ensure optimal performance for perpetual operation. Our results show a 50x increase in learning performance at one-third of the cost by using DiRL. Future work includes methods to decentralize the learning process and exploit the intelligence at the nodes.
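The ε-greedy exploration with shared experience mentioned above can be sketched as follows. This is only a minimal illustration, not the DiRL implementation: the tabular Q-learning update, the function names and the parameters `alpha` and `gamma` are assumptions introduced here, since the abstract does not say which learning rule is used.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return q_values.index(max(q_values))     # exploit (first maximum)


def shared_update(q_table, state, action, reward, next_state,
                  alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a table shared by all agents.

    Assumption: every agent writes its experience into the same q_table,
    so exploration done by one agent benefits the whole population.
    """
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
```

Each agent would call `epsilon_greedy` on its own state's row of the shared table and then report its transition through `shared_update`.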

Poster ID:  C-17
Poster File:  PDF document Shaswot_SHRESTHAMALI.pdf
Poster Image: 
Poster URL: 


Poster Title:  Divergence Free Augmented Immersed Boundary Method
Poster Abstract: 

Here we address the effect of divergence-free interpolation and high-order discretizations on suppressing spurious force oscillations in fluid-structure interaction and heat transfer problems. These problems usually suffer from nonphysical oscillations, especially in forces and pressure. To tackle this, one can either impose a proper boundary condition for the pressure equation or use a divergence-free interpolation method for the velocities. Having developed a novel multigrid Poisson solver for the incompressible Navier-Stokes equations that imposes the appropriate boundary condition for the pressure at the immersed boundary surface, we are now developing a more consistent moving least squares immersed boundary library to impose the desired divergence-free boundary condition. This solver is intended for fluid-structure interaction studies with largely deforming bodies and for heat transfer in complex domains. The proposed method also has competitive advantages over other methods, such as ghost-cell and cut-cell, because it can be implemented completely independently of the fluid solver. In this respect it is akin to continuous forcing techniques, yet it retains the capabilities of the discrete forcing methods mentioned above, which makes it well suited to 3D and unstructured-grid implementations.

Poster ID:  B-11
Poster File:  PDF document Riken_Sem.pdf
Poster Image: 
Poster URL: 


Poster Title:  Yoga-Veganism: Correlation Mining of Twitter Health Data
Poster Abstract: 

Social media is now a huge source of data. People share their interests and thoughts via discussions, tweets and status updates. It is not possible to go through all of this data manually. We need to mine the data to explore hidden patterns or unknown correlations, find the dominant topics in the data and understand people's interests through their discussions. In this work, we explore Twitter data related to health. We extract the popular topics under different categories (e.g. diet, exercise) discussed on Twitter via topic modeling, observe the model's behavior on new tweets, and discover an interesting correlation (i.e. Yoga-Veganism). We evaluate accuracy by comparing against ground truth obtained by manual annotation, for both training and test data.



Poster ID:  D-4
Poster File: 
Poster Image: 
Poster URL: 


Poster Title:  High Throughput Search for New NMC Cathodes
Poster Abstract: 
Layered Li(Ni,Mn,Co)O2 (NMC) presents an intriguing ternary alloy design space for optimization as a cathode material in Li-ion batteries. In the case of NMC, however, only a select few proportions of transition metal cations have been attempted, and even fewer have been adopted on a large scale. Recently, the high cost and resource limitations of Co have added a new design constraint, and high-Ni NMC alloys have gained enormous attention despite possible performance trade-offs. Although the limited collection of NMC cathodes has been successful in providing the performance needed for many applications, specifically electric vehicles, this concern around Co requires further advancement and optimization within the NMC design space. High throughput computation is used to search the ternary phase diagram with an emphasis on high-Ni, and thus low-Co, compositional phases. This is done by feeding density functional theory training data into a reduced order model Hamiltonian that accounts for the effective electronic and spin interactions of neighboring transition metal atoms at various distances, in a background of lithium and oxygen atoms with fixed composition and position. This model can then be solved to include finite temperature thermodynamics in a convex hull analysis, in order to understand the regions of ordered and disordered solid solution as well as the transition metal orderings within the ordered region of the phase diagram. We also provide a method to propagate the uncertainty at every level of the analysis to the final prediction of thermodynamically favorable compositional phases, thus providing a quantitative measure of confidence for each prediction made.
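To illustrate the convex hull analysis mentioned above, here is a minimal sketch that reduces the problem to a single composition axis (a binary cut), whereas the poster works on the full ternary diagram. The function name and the (composition, energy) input format are assumptions made for illustration, and the sketch ignores the finite temperature and uncertainty propagation steps.

```python
def lower_hull(points):
    """Return the lower convex hull of (composition, energy) points.

    Points on the hull are the thermodynamically favorable phases in this
    simplified one-dimensional picture; points above it are unstable with
    respect to a mixture of the neighboring hull phases.
    """
    hull = []
    for p in sorted(set(points)):
        # Drop the last hull point while it lies on or above the
        # segment joining hull[-2] to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

Calling `lower_hull` on a list of computed (composition, formation energy) pairs returns only the compositions that lie on the hull, ordered by composition.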
Poster ID:  D-12
Poster File:  PDF document Houchins IHPCSS.pdf
Poster Image: 
Poster URL: 


Poster Title:  Astrophysics and Cosmology: Hydrodynamics and Kinetics
Poster Abstract: 

Hydrodynamical codes have proven to be useful in astrophysics and cosmology. They have successfully reproduced the structure we observe over a wide range of length scales, from clusters of galaxies to stellar systems. Although widely applicable, one pillar of the Eulerian/Navier-Stokes formalism, namely the assumption of a small Knudsen number (high local momentum diffusivity), does not always hold. The last year saw much progress in the development of new algorithms that allow for non-equilibrium flows of arbitrary Knudsen number. In this poster, I will briefly review how my lab and I use hydrodynamics, review these new algorithms, and outline plans to apply them to the study of astrophysical phenomena.
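For reference, the Knudsen number is the ratio of the particle mean free path to a characteristic length scale of the flow, and the hydrodynamic description assumes that this ratio is small. The symbols \lambda (mean free path) and L (length scale) are notation introduced here, not taken from the poster.

```latex
\mathrm{Kn} = \frac{\lambda}{L}, \qquad \text{hydrodynamic regime: } \mathrm{Kn} \ll 1
```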


Poster ID:  A-10
Poster File: 
Poster Image: 
Poster URL: 


Poster Title:  Uncertainty in Supernova Nucleosynthesis
Poster Abstract: 

Core-collapse supernovae are the explosive deaths of massive stars, powered by the gravitational collapse of their cores. The result is a supernova remnant and a neutron star whose properties can be measured through astrophysical observation. These explosions are multi-physics, multi-scale problems that require bleeding-edge computational science techniques to simulate. However, the exact explosion mechanism is still not known, and the scientific community is limited by the constraints of computation and observation. We study three key explosion parameters and their effect on the explosive nucleosynthesis and final yields. Using a broad set of spherically symmetric core-collapse simulations, we examine the effects of stellar mass, explosion energy, and remnant mass to place constraints on the engine and on the nucleosynthetic uncertainty of core-collapse supernovae.



Poster ID:  A-12
Poster File:  OpenDocument presentation ihpcss2019.odp
Poster Image: 
Poster URL: 


Poster Title:  LES/PDF of Sandia flame D using a pre-partitioned adaptive chemistry (PPAC) methodology
Poster Abstract: 

A pre-partitioned adaptive chemistry (PPAC) methodology has been proposed recently by Liang et al. (Combustion and Flame, 2015) for reducing the CPU time and memory requirements of particle PDF methods. PPAC generates a library of reduced kinetic models in an offline preprocessing stage. At runtime, these reduced models are dynamically utilized to perform reaction integration. The particles retain only the reduced skeletal representation during the adaptive simulation, which reduces memory use, and the use of reduced models for reaction integration reduces CPU time. PPAC has been augmented by coupling it with in-situ adaptive tabulation (ISAT) to further improve the CPU time reduction (Newale et al., 10th US National Combustion Meeting, 2017). The testing of PPAC and PPAC-ISAT in these works was done in a simplified PaSR configuration. This work examines the performance of the PPAC methodology in an LES/PDF simulation of Sandia flame D. We show that the PPAC methodology leads to a significant reduction in memory requirements. This is due especially to the use of a highly reduced mechanism for the coflow, which covers a large section of the computational domain. The PPAC methodology is also shown to provide a sizable reduction in CPU time through the use of reduced mechanisms for reaction integration.

Poster ID:  B-16
Poster File: 
Poster Image: 
Poster URL: 


Poster Title:  Stairway to Haline: GPU Acceleration of a 2D Double-Diffusive Convection Code
Poster Abstract: 

Thermohaline staircases are large-scale oceanic structures spanning depths of hundreds of metres. Their name comes from the distinctive staircase-like pattern that appears when temperature (or salinity) is plotted against depth: thick steps of relatively constant temperature and salinity separated by thin regions of rapid change. The precise details of how these staircases form are still debated; however, it is well known that they require the interaction between temperature and salinity gradients known as double-diffusive convection.

The main force driving convection is buoyancy. Since differences in both temperature and salinity produce differences in density, gradients in either can drive convection. When hot, salty water sits above cold, fresh water, the resultant double-diffusive convection is called salt-fingering, which is known to be a potential mechanism in the development and maintenance of thermohaline staircases. In the thin regions between the thick, constant-temperature and constant-salinity steps in thermohaline staircases, it is thought that salt-fingering is the main process that efficiently transports heat and salt across the thin region, maintaining the structure of the entire staircase. While the thickness of a single step can be on the order of tens of metres, the thickness of a salt-fingering layer can be on the order of centimetres.

This poster focusses on a proof-of-concept CUDA acceleration of a code used to simulate double-diffusive convection in 2D, with a view to studying thermohaline staircases. The main algorithm is made up of a vertical finite-difference expansion and a horizontal spectral expansion, with the nonlinear interactions of spectral modes handled using a Galerkin method. To properly resolve the smallest scales, extremely high spatial resolution is required, hence the use of GPUs.
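The combination of a vertical finite-difference expansion with a horizontal spectral expansion can be illustrated by applying the Laplacian to a single horizontal Fourier mode. The sketch below is plain Python rather than CUDA and is not the poster's code: the function name, the uniform grid spacing `dz` and the wavenumber `kx` are assumptions made for illustration, and boundary treatment is deliberately left out.

```python
def mode_laplacian(profile, dz, kx):
    """Apply d^2/dz^2 - kx^2 to one Fourier mode's vertical profile.

    The horizontal derivative is exact in spectral space (a factor of
    -kx^2); the vertical derivative uses second-order central differences.
    The two boundary entries are left at 0.0 because boundary conditions
    are problem-specific.
    """
    n = len(profile)
    out = [0.0] * n
    for j in range(1, n - 1):
        d2dz2 = (profile[j + 1] - 2.0 * profile[j] + profile[j - 1]) / dz**2
        out[j] = d2dz2 - kx**2 * profile[j]
    return out
```

Because each mode and each interior grid point is updated independently in this operator, it maps naturally onto one GPU thread per (mode, grid point) pair; the nonlinear Galerkin terms couple modes and are not covered by this sketch.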

Although the algorithm is fully functional and accelerated on a single node using OpenMP, the GPU acceleration is a work in progress, with the algorithm mostly, but not totally, implemented. Preliminary profiling of individual components shows a speedup of between 10x and 100x.

Poster ID:  B-17
Poster File: 
Poster Image: 
Poster URL: