Seminar - Optimized Unstructured and Structured Finite Volume Simulations using GPU and Phi Coprocessors using Dual Layer Parallelization By Dr. Matthew Ross SMITH
Date: 27 November 2014 (Thursday)
Time: 10:30 am – 11:30 am
Venue: EF311, The Hong Kong Polytechnic University
The transient simulation of compressible gas flow is often a complex and tedious procedure. To resolve the flow features that matter to the physics of the simulated fluid, fine resolution is often required in particular regions of a large flow field. The resulting large number of computational cells (in this case, finite volumes) makes transient simulation on a single CPU core prohibitively expensive, hence the need for parallel computation. Commonly available commercial software is often capable of running in parallel and is available on high performance computing clusters in Taiwan. As an alternative to this form of conventional computing, however, modern developments in coprocessor technologies have led to the increased use of two pieces of hardware: the Graphics Processing Unit (GPU) and the Intel Xeon Phi coprocessor. Such devices have a high compute density, meaning a large amount of computational capacity for a relatively small capital expense, power draw and physical space. They are therefore attractive for rapid simulations, which can then be followed up (if required) by large scale computation on a conventional HPC system. Presented here is the parallelization of two transient Finite Volume solvers, one unstructured (tetrahedral) and one structured, on GPU and Phi devices. The performance gained by applying explicit dual layer parallelization using compiler intrinsics is compared, and the implementation difficulties encountered are discussed, together with an application to an industrial research problem. The presentation will show that (i) current compiler auto-vectorization optimizations are insufficient for guaranteeing optimal performance, and (ii) higher performance is possible on structured grids than on unstructured grids for heterogeneous computing.
Background / Summary
I grew up in Brisbane before studying Mechanical and Space Engineering at the University of Queensland (UQ). I completed two theses in the final year (instead of the required single thesis): one working with Dr. Peter Jacobs on finite volume CFD, the other while working at Boeing on the development of an in-house solver for air flow through the F-111's environmental control system using a specialized Newton-Raphson algorithm. I received the 2002 UQ Mechanical Engineering Prize (a total of approximately USD 40,000) to undertake postgraduate research with Dr. Michael Macrossan on hybrid rarefied / continuum flows. It was then that I met and fell in love with my Taiwanese wife, after which I left my graduate studies early (while still accepting a master's degree from UQ in 2003) and moved to Taiwan. We later got married and moved back to Australia, where I worked as the lead production and design engineer with an Australian hydraulics and mining machinery manufacturing company. In March 2006 I accepted an Australian Postgraduate Award (APA) scholarship and began my PhD, which I completed in November 2007 (the fastest PhD in Mechanical Engineering in the school's history at UQ). In December 2007 I commenced a post-doc at NCTU in Taiwan with Prof. Chong-Shinn Wu. In August 2008 I accepted a position as an associate researcher at the National Center for High-performance Computing (NCHC), where I went on to lead a team in parallel computing research, which included GPU algorithm development. In 2012 I left the NCHC to accept a faculty position at NCKU's mechanical engineering department, where I direct the High Performance Heterogeneous Computing Laboratory.