Minimal Metrics has successfully completed its work with Texas Instruments to deliver a complete, optimized BLAS (Basic Linear Algebra Subprograms) library. The BLAS routines are the building blocks of many high-performance numerical algorithms and are widely regarded as the keystone of high performance in scientific simulations.
The Minimal Metrics team worked closely with the ATLAS (Automatically Tuned Linear Algebra Software) group to provide a complete and optimized implementation much sooner than one could have been coded entirely by hand.
The target platform for the work was the C6678 DSP, also known as Keystone, an 8-core DSP with industry-leading floating-point performance per watt. The library will also be used on the Keystone 2, a revolutionary new product that integrates the ARM Cortex-A15 CPU with the 8 DSP cores of the C6678.
Programming a DSP is substantially different from coding for “traditional” RISC/CISC processors. The Keystone series provides a number of architectural features to enhance performance, including hardware-assisted software pipelining and software-programmable data movement. However, this added complexity means that traditional implementations fall flat in terms of performance, so the code must be extensively hand-tuned. Minimal Metrics used every available feature of the DSP in hand-tuning the “kernels” of the ATLAS library. The kernels are the lowest-level operations, upon which all others are based. ATLAS then generates the higher layers of code to provide the full BLAS library.
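To make the split between kernels and higher layers concrete, here is a minimal sketch in portable C. It is not ATLAS or TI code: the names `tile_kernel` and `gemm_blocked` and the tile size `NB` are hypothetical, and the kernel is a plain triple loop standing in for the hand-tuned DSP routine. The point is only that the outer routine does nothing except cut the problem into tiles and call the kernel, so any speedup in the kernel carries over to everything built on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a tuned build would choose this per target. */
enum { NB = 4 };

/* Kernel: C_tile += A_tile * B_tile for one NB x NB tile (row-major).
 * This small routine stands in for the part that would be hand-tuned. */
static void tile_kernel(const double *a, size_t lda,
                        const double *b, size_t ldb,
                        double *c, size_t ldc)
{
    for (size_t i = 0; i < NB; ++i) {
        for (size_t j = 0; j < NB; ++j) {
            double sum = c[i * ldc + j];
            for (size_t p = 0; p < NB; ++p)
                sum += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = sum;
        }
    }
}

/* Higher layer: C += A * B for row-major A (m x k), B (k x n), C (m x n),
 * built only from calls to the kernel. Sizes must be multiples of NB. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *a, const double *b, double *c)
{
    assert(m % NB == 0 && n % NB == 0 && k % NB == 0);
    for (size_t ib = 0; ib < m; ib += NB)
        for (size_t jb = 0; jb < n; jb += NB)
            for (size_t pb = 0; pb < k; pb += NB)
                tile_kernel(&a[ib * k + pb], k,
                            &b[pb * n + jb], n,
                            &c[ib * n + jb], n);
}
```

A caller would zero-initialize `c` and invoke, for example, `gemm_blocked(8, 8, 8, a, b, c)` on 8 x 8 row-major arrays. A real library also handles sizes that are not multiples of the tile size, which this sketch deliberately leaves out.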
The graph below demonstrates the performance of a matrix multiply operation in which one of the matrices has been transposed. The performance of these routines is equivalent to or better than that of the code written by hand by TI’s engineers; the difference is that these optimizations are now leveraged throughout the entire library.
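For readers unfamiliar with the operation being measured, the sketch below shows a reference (untuned) multiply with a transposed first operand, C = Aᵀ·B, together with the conventional way of turning a run time into a rate. The function names `gemm_at_b` and `gemm_gflops` are hypothetical, and the text does not state which units the graph uses; the 2·M·N·K operation count (one multiply and one add per inner step) is simply the usual convention for matrix multiply benchmarks.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C = A^T * B with row-major storage.
 * A is stored as k x m (so A^T is m x k), B is k x n, C is m x n. */
void gemm_at_b(size_t m, size_t n, size_t k,
               const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[p * m + i] * b[p * n + j];  /* A^T[i][p] == A[p][i] */
            c[i * n + j] = sum;
        }
    }
}

/* Conventional matrix multiply rate: 2*m*n*k floating-point operations
 * divided by the elapsed time, reported in GFLOP/s. */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1e9;
}
```

Under this convention, a 1000 x 1000 x 1000 multiply that finishes in one second counts as 2 GFLOP/s.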
Minimal Metrics is proud to have Texas Instruments as one of its partners in delivering high-performance solutions. Your organization could be next. Let us take your performance to the next level: contact us today.