Category Archives: HPC

Sadly, Minimal Metrics has been unable to secure development funding for PerfMiner. We apologize to the many interested parties and thank you for your time and support. Should you be interested in seeing the pitch deck, please contact us at i n v e s t at minimal metrics.

Minimal Metrics has partnered with Sandia National Laboratories to build a performance tool for tomorrow’s Exascale-class supercomputers – and to help inform their design. The system has its roots in a prototype first developed by Philip and colleagues at the Parallelldatorcentrum at the Royal Institute of Technology in Stockholm, Sweden. It is a software suite designed to optimize an institution’s entire HPC software and hardware ecosystem. It can analyze individual HPC applications and their threads of execution as well as entire workloads, groups, users and multiple disjoint systems. It accomplishes this by integrating best-of-breed dashboarding and visualization methodologies with state-of-the-art performance data collection. Detailed performance metrics from the underlying architecture are provided, including memory bandwidth, memory hierarchy behavior and latencies, vectorization, hardware resource utilization, computational intensity and instruction mix. The system is able to identify issues of on- and off-node scaling, including message passing performance, load imbalance, false sharing, and…

Here at Minimal Metrics our customers are often our friends – and we have lots of them around the world. Among our favorite people to work with are the brilliant people over at Reservoir Labs, makers of R-Scope among other neat bits of technology. Reservoir also has extensive compiler development expertise. When tasked with optimizing the SPEC CPU benchmarks for a brand new 64-bit multicore (non-Intel) processor, they reached out to the Minimal Metrics crew for guidance. The SPEC benchmarks are tricky animals, chiefly because one cannot modify the source code to improve its performance; all improvements have to be made by the compiler! On top of that, the source code is ugly, grossly inefficient and poorly documented. In many ways, the SPEC benchmarks really are the most representative in the industry because of those three facts alone. Nevertheless, in order to improve generated code, one still has to understand why the processor is performing the…

In the previous engagement, Minimal Metrics studied and successfully accounted for the performance differences between compilers on multi-dimensional stencil computations on Intel’s Jaketown and Ivytown architectures. In that particular case, the Cray and Intel compilers were used and the work was primarily performed on Volta, the Cray XC30m. This machine is just one of the Advanced System Technology Test Beds in the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing Program. These machines represent small sections of the design space on the path to an exascale computer, meaning a machine capable of a billion billion (10 to the 18th power) floating point operations per second. For this new engagement, Minimal Metrics will be working closely with the test bed team to do performance studies of codes being developed to run on these (and tomorrow’s exascale) machines. The data gathered is intended not only to help guide…

Philip Mucci from Minimal Metrics will be attending the SC2014 show in New Orleans, LA from Nov 17 to Nov 20. Get in touch with us if you’d like to schedule a meeting. If not, look for Phil in the booths of Texas Instruments, Scalable Informatics or the University of Tennessee.

Minimal Metrics has partnered with Reservoir Labs to work on the prototyping of an entirely new computer architecture and the development of algorithms to run on it. About Reservoir Labs: Reservoir solves the critical technology challenges of high performance computing. Our advanced computing and communications products, thought-leading research and novel technologies have made Reservoir a trusted and respected partner of corporate clients, government agencies, and leading researchers. We thrive on opportunities to learn and create as we develop and deliver groundbreaking science and security solutions. This particular project is indeed groundbreaking science, and as such, we can’t say more other than that we’re thrilled to be part of this effort, and to be working with our friends at Reservoir Labs once again.

Minimal Metrics has signed a contract with Affinity Systems to assist in the optimization and tuning of a streaming data analytics application for the IESO, the Canadian Independent Electricity System Operator. As stated on their website: “The Independent Electricity System Operator (IESO) balances the supply of and demand for electricity in Ontario and then directs its flow across the province’s transmission lines. The IESO works at the heart of Ontario’s power system, connecting all participants – generators that produce electricity, transmitters that send it across the province, retailers that buy and sell it, industries and businesses that use it in large quantities, and local distribution companies that deliver it to people’s homes.” Minimal Metrics will assess and optimize the performance of an advanced streaming-data appliance responsible for processing millions of smart-meter readings taken constantly across the entire power grid. This appliance, built by Affinity Systems of Ontario, consists of a small…

Sandia National Laboratories has just signed Minimal Metrics to help with performance analysis of their next-generation high-performance computing platforms. In particular, the Application Performance Modeling and Analysis team is interested in studying the effects of stall cycles on application performance, specifically for parallel scientific simulation kernels running on Intel’s Sandy Bridge systems. Stall cycles, or periods of time where the processor is not producing any results, are notoriously difficult to account for. While the hardware architects have added some hardware instrumentation to help accomplish this, using that instrumentation requires extensive background in and understanding of the specific microprocessor’s architecture. Sandia, while possessing leading expertise in application performance, is looking to leverage Minimal Metrics’ unparalleled experience in the field in an attempt to further their research. Minimal Metrics will be working closely with Sandia’s application developers and systems teams to ensure that their code and systems are working as optimally as…

Texas Instruments has renewed their contract with Minimal Metrics to deliver optimized numerical libraries for their next generation of high-performance microprocessors. The Keystone 2 is a revolutionary new product that integrates four cores of an ARM A15 with 8 cores of the C6678 DSP on the same die. Offload to the DSPs can be accomplished either through the use of OpenCL or a subset of the OpenMP 4 specification, OpenMPACC. For this contract, Minimal Metrics will be providing optimized, hybrid, ARM+DSP-accelerated versions of the following libraries. These libraries are critical elements in the middleware of high-performance numerical simulations.

FFTW – Fast Fourier transform
BLAS/ATLAS – Vector/vector, matrix/vector and matrix/matrix arithmetic
LAPACK – Dense linear algebra
LIBFLAME – Dense linear algebra

Applications are accelerated transparently by the new versions of the libraries, without requiring any changes to the source or object code. In this way, HPC applications ported to the ARM gain vast increases in performance by…

Minimal Metrics has successfully completed their work with Texas Instruments to deliver an optimized and complete BLAS (Basic Linear Algebra Subprograms) library. The BLAS are the building blocks of many high performance numerical algorithms – and are known as the keystone to high performance in scientific simulations. The Minimal Metrics team worked closely with the ATLAS group in order to provide a complete and optimized implementation much faster than could be coded by hand. The target platform for the work was the C6678 DSP, aka Keystone, an 8-core DSP with industry-leading floating point performance per watt. This library will also be used for the Keystone 2, a revolutionary new product that integrates the ARM A15 CPU with 8 cores of the C6678. Coding for the DSP architecture is substantially different from coding for “traditional” RISC/CISC processors. The Keystone series provides a number of architectural features to enhance performance – including hardware-assisted software…

10/14