In the previous engagement, Minimal Metrics studied and successfully accounted for the performance differences between compilers on multi-dimensional stencil computations on Intel’s Jaketown and Ivytown architectures. That work used the Cray and Intel compilers and was performed primarily on Volta, a Cray XC30m. This machine is just one of the Advanced System Technology Test Beds in the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) Program. Each of these machines represents a small section of the design space on the path to an exascale computer, meaning a machine capable of a billion billion (10 to the 18th power) floating point operations per second.
For this new engagement, Minimal Metrics will be working closely with the test bed team on performance studies of codes being developed to run on these (and tomorrow’s exascale) machines. The data gathered is intended not only to help guide performant software design at the lab but also to give the vendors both qualitative and quantitative feedback as the architectures mature. As part of this effort, Minimal Metrics will be prototyping a hardware performance instrumentation infrastructure for one of the new run-time systems being developed for exascale. These systems, like HPX and Qthreads, provide a lightweight abstraction of parallelism better suited to machines of this size. Integrating formal hardware performance measurement at the API level lets performance metrics be explained to, and understood by, the code developer more naturally. The end result is that developers write better code without the burden of understanding every level of the software stack and how thread-level parallelism is implemented. If all this sounds like magic, then you might want to read up on coroutines or watch this video on Google’s Go language for some background.
We are super excited to continue our involvement with Sandia and the ASC team.