Monthly Archives: July 2014

Sandia National Laboratories has just signed Minimal Metrics to help with performance analysis of its next-generation high-performance computing platforms. In particular, the Application Performance Modeling and Analysis team is interested in studying the effects of stall cycles on application performance, specifically for parallel scientific simulation kernels running on Intel’s Sandy Bridge systems. Stall cycles, or periods during which the processor is not producing any results, are notoriously difficult to account for. While hardware architects have added some instrumentation to help with this, using it requires extensive background in, and understanding of, the specific microprocessor’s architecture. Sandia, while possessing leading expertise in application performance, is looking to leverage Minimal Metrics’ unparalleled experience in the field to further its research. Minimal Metrics will be working closely with Sandia’s application developers and systems teams to ensure that their code and systems are working as optimally as…


Texas Instruments has renewed its contract with Minimal Metrics to deliver optimized numerical libraries for its next generation of high-performance microprocessors. The Keystone 2 is a revolutionary new product that integrates four cores of an ARM A15 with eight cores of the C6678 DSP on the same die. Offload to the DSPs can be accomplished either through OpenCL or through a subset of the OpenMP 4 specification, OpenMPACC. For this contract, Minimal Metrics will be providing optimized, hybrid, ARM+DSP-accelerated versions of the following libraries. These libraries are critical elements in the middleware of high-performance numerical simulations.

FFTW – Fast Fourier transform
BLAS/ATLAS – Vector/vector, matrix/vector and matrix/matrix arithmetic
LAPACK – Dense linear algebra
LIBFLAME – Dense linear algebra

Applications are accelerated transparently by the new versions of the libraries, without requiring any changes to the source or object code. In this way, HPC applications ported to the ARM gain vast increases in performance by…

