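Title: Dynamics Sub-cell-level Parallelism of Fully-implicit CFD Algorithms on GPGPUs
Speaker: Dr. Lixiang Luo (Research Scientist), North Carolina State University, USA
Time: Tuesday, January 5, 2016, 14:00-16:00
Venue: Room 1116, Lexue Building, Jiangning
Host: College of Mechanics and Materials
Abstract: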
The performance stagnation of CPU-based HPC systems and the increasing deployment of GPGPU-based systems indicate a clear trend towards exascale computing via massive parallelism. However, most research efforts on utilizing GPGPUs for CFD applications achieve parallelism only at the grid cell level. Consequently, the performance of fully-implicit CFD schemes is consistently found to be very poor on GPGPUs, leading to a prevailing (mis)perception that GPGPUs are unsuitable for fully-implicit schemes. During our recent research on finite volume (FV) and discontinuous Galerkin (DG) applications, we realized that the large volume of memory access required by implicit algorithms (proportional to the square of the number of degrees of freedom (DOF) per grid cell) makes them severely memory-bound, a fundamental bottleneck on GPGPUs that can only be resolved by adopting fine-grained data parallelism at the sub-cell level. More importantly, the need for sub-cell-level parallelism will only increase as more high-order CFD methods are adopted. Our successful development of fine-grained block incomplete LU factorization and Jacobian matrix filling algorithms removes two key roadblocks to implementing efficient fully-implicit CFD on GPGPUs. Supported by other fine-grained algorithms, the exceptional performance of the ported code demonstrates that GPGPUs can carry out fully-implicit CFD schemes as efficiently as explicit schemes. Our research has been published by Nvidia as a Developer Success Story, the only application in the computational physics domain selected for their OpenACC promotion campaign in 2015. Finally, massive data parallelism forces everyone to approach PDE solution techniques from a more global viewpoint, furthering the understanding of the most basic concepts of numerical methods.