Floating point performance on 16 nodes of the Knights Landing, Knights Mill and Skylake processor generations. Shown is a large set of EDGE’s possible configurations. All results are reported in terms of non-zero operations, i.e., only operations that contribute to EDGE’s solution are counted. Dark gray bars represent the performance of single seismic forward simulations, while light gray bars show the performance of fused simulations.
The story “LIBXSMM Brings Deep-learning “Lessons Learned” to Many HPC Applications” by Rob Farber features EDGE as one of the first HPC applications to exploit Knights Mill, the Intel Xeon Phi processor for machine learning. An important takeaway of the article is that only a comprehensive approach leads to optimal performance. Key factors for seismic wave propagation simulations in EDGE are: a) extensive verification, including floating point precision as a modeling parameter, b) fused simulation technology for the exploitation of inter-simulation parallelism, and c) Just-In-Time (JIT) generated small sparse matrix-tensor kernels through the library LIBXSMM. The full story is available from the deep learning section of Medium. Further details are given in the slides of the IXPUG Middle East Conference 2018.
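The idea behind factors b) and c) can be illustrated with a minimal sketch. It is not EDGE's or LIBXSMM's code: the function name, the coordinate-format matrix and the list-based data layout are assumptions made for illustration only. The sketch applies one sparse matrix to several fused simulations at once, with the fused simulations as the innermost dimension, and counts only the floating point operations triggered by non-zero matrix entries.

```python
# Hypothetical sketch: names and data layout are illustrative,
# not the API of EDGE or LIBXSMM.

def fused_sparse_apply(rows, cols, vals, dofs, n_rows):
    """Apply a sparse matrix (coordinate format) to fused degrees of freedom.

    dofs[k][s] holds entry k of the input for simulation s, so the fused
    simulations form the innermost (fastest) dimension.
    """
    n_fused = len(dofs[0])
    out = [[0.0] * n_fused for _ in range(n_rows)]
    flops = 0
    for r, c, v in zip(rows, cols, vals):
        # One non-zero matrix entry is reused for all fused simulations;
        # a vectorized kernel could map this inner loop to SIMD lanes.
        for s in range(n_fused):
            out[r][s] += v * dofs[c][s]
        flops += 2 * n_fused  # one multiply and one add per simulation
    return out, flops


# 3x3 matrix with four non-zeros, applied to two fused simulations
rows = [0, 1, 2, 2]
cols = [0, 2, 0, 1]
vals = [2.0, -1.0, 0.5, 3.0]
dofs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
result, nonzero_flops = fused_sparse_apply(rows, cols, vals, dofs, n_rows=3)
```

In this toy setup, the zero entries of the matrix are never touched, and the work per non-zero entry grows with the number of fused simulations, which is the inter-simulation parallelism a generated kernel can exploit.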
EDGE is part of the presentation “Towards Extreme-Scale Nonlinear Earthquake Simulations” at the Seismology of the Americas 2018 conference in Miami, FL. The presentation covers ongoing work on integrating rupture physics into the code, as well as the verification of 32-bit precision and the corresponding optimizations, which are capable of utilizing deep learning hardware.
All details on the session “Numerical Modeling of Earthquake Ground Motion, Rupture Dynamics and Seismic Wave Propagation” can be found on the event’s homepage. Slides are available from the assets repository.
Alex Heinecke’s presentation “Big Applications for Small Matrix Multiplications and Convolutions” at the IXPUG Middle East Conference 2018 at KAUST featured recent optimizations of EDGE for the latest silicon. Specifically, Alex covered the efficient use of Knights Mill’s Quad Fused Multiply Add (QFMA) instruction in the inner sparse matrix-tensor kernels of the code.