Challenges and Opportunities in Reshaping Linear Algebra Libraries for HPC/AI Workloads: A Testcase with 3D Unstructured Mesh Deformations
Overview
Abstract
Traditional HPC simulations and AI/Big Data applications face similar challenges when solving extreme-scale scientific problems: bulk synchronous parallelism, expensive data motion, high algorithmic complexity, and large memory footprints. Processor and memory technology scaling has mitigated these challenges thanks to an exponential growth in processor performance, but only a constant increase in memory speed and capacity. The free lunch is perhaps over as we approach the hard physical limits of silicon. The energy-efficiency gap between communication and computation keeps widening and has forced the hardware and software communities into immediate co-design action. We describe the challenges encountered during the last 15-year journey of reshaping high-performance linear algebra libraries for massively parallel systems. We explore the disruptive numerical algorithms and programming models required to continue supporting HPC applications, as well as emerging AI workloads, at the dawn of the exascale age. In particular, we assess our implementation using 3D unstructured mesh deformation based on Radial Basis Function (RBF) interpolation in the context of the HiCMA numerical library. Our HPC software solution significantly outperforms state-of-the-art implementations on the Shaheen-II (dual-socket 16-core Intel Haswell nodes), Hawk (dual-socket 64-core AMD Epyc Rome nodes), and Fugaku (48-core Fujitsu A64FX nodes) supercomputers.
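To make the testcase concrete: RBF-based mesh deformation builds a dense, symmetric interpolation matrix from pairwise distances between boundary control points, solves for RBF weights, and then evaluates the interpolant at every interior mesh node. The dense solve is exactly the kind of kernel that motivates low-rank linear algebra libraries such as HiCMA. The sketch below is a minimal dense NumPy illustration only, not the HiCMA implementation; the function name `rbf_deform` and the choice of a Gaussian basis are assumptions for exposition.

```python
import numpy as np

def rbf_deform(ctrl_pts, ctrl_disp, mesh_pts, phi=lambda r: np.exp(-r**2)):
    """Deform mesh_pts given prescribed displacements at ctrl_pts.

    ctrl_pts : (n, 3) boundary control points
    ctrl_disp: (n, 3) known displacements at the control points
    mesh_pts : (m, 3) interior mesh nodes to displace
    phi      : radial basis function (Gaussian assumed here)
    Returns    (m, 3) interpolated displacements.
    """
    # Dense symmetric n x n interpolation matrix from pairwise distances.
    d = np.linalg.norm(ctrl_pts[:, None, :] - ctrl_pts[None, :, :], axis=-1)
    A = phi(d)
    # One dense solve per coordinate (solved together as a multi-RHS system).
    w = np.linalg.solve(A, ctrl_disp)
    # Evaluate the interpolant at every mesh node: m x n matrix-vector products.
    d_eval = np.linalg.norm(mesh_pts[:, None, :] - ctrl_pts[None, :, :], axis=-1)
    return phi(d_eval) @ w
```

Because the interpolant passes through the control points, evaluating the deformation at the control points reproduces the prescribed displacements; for large point sets, the dense matrix `A` is where low-rank compression pays off.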
Brief Biography
Hatem Ltaief is the Principal Research Scientist in the Extreme Computing Research Center at KAUST, where he also advises several KAUST postdocs and students in their research. Hatem received his engineering degree from Polytech Lyon at the University of Claude Bernard Lyon I and a Master Fellowship Award from the French Government in 2003. He pursued graduate studies at the University of Houston, where he received an MSc in applied mathematics in 2004 and a PhD in computer science in 2008. He worked as a Research Scientist at the Innovative Computing Laboratory at the University of Tennessee, Knoxville, until 2011. Since joining KAUST in 2011, he has contributed to the integration of numerical algorithms into mainstream vendors' scientific libraries, such as NVIDIA cuBLAS, the NEC Numeric Library Collection, and Cray LibSci. He has been collaborating with domain scientists, e.g., astronomers, statisticians, computational chemists, and geophysicists, to prepare their applications for the opportunities of exascale. He has received multiple best paper awards at international conferences, including the Gauss Award at ISC'20. He is a Subject Area Editor of the journal Parallel Computing and vice-chair of the SIAM Activity Group on Supercomputing. He has authored or co-authored over 100 publications in computational science and engineering, numerical analysis, and computer science.