AdaptMemBench: Application-Specific Memory Subsystem Benchmarking

Abstract

Application performance often depends on achieved memory bandwidth. Achieved memory bandwidth varies greatly with the combination of instruction mix and order, working set size, and access pattern. Achieving good application performance depends on optimizing these characteristics within the constraints of the given application. This task is complicated by the lack of information about how small changes affect performance. Benchmarks provide some of this information, but most memory benchmarks are confined to simple access patterns that are not representative of the patterns found in real applications. This work presents AdaptMemBench, a configurable benchmark framework designed to explore the performance capabilities of compute kernels extracted from applications. AdaptMemBench provides a framework to emulate application-specific memory access patterns. A set of templates manages standard timing and measurement tasks. The build system accommodates the polyhedral model, making the framework a convenient testbed for potential code optimizations. AdaptMemBench supports reproducible experimental results and facilitates sharing them. Because small changes in a benchmark can have a large impact on performance, the common framework isolates the measured portions of code, which eases the process of rerunning experiments and porting to new systems. The strengths of AdaptMemBench are demonstrated through a collection of case studies on common compute kernels, including streaming patterns, multidimensional stencils, and sparse matrix operations.
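As a rough illustration of the idea of separating a measured kernel from the timing and measurement code around it, the following minimal sketch times a streaming triad pattern and derives an achieved-bandwidth figure. It is not AdaptMemBench code: the names `triad` and `measure_bandwidth` are hypothetical, and the byte count assumes two reads and one write per element. In pure Python the interpreter overhead dominates the measurement, so the numbers only show the structure of the calculation; a real benchmark would use compiled kernels.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """Streaming triad access pattern: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def measure_bandwidth(kernel, n, repeats=3):
    """Run `kernel` on three arrays of n doubles.

    Returns (best_seconds, bytes_per_second), using the fastest of
    `repeats` runs. Hypothetical helper, for illustration only.
    """
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(a, b, c, 3.0)  # only this call is inside the timed region
        best = min(best, time.perf_counter() - start)

    # Assumption: each element costs two reads (b, c) and one write (a).
    bytes_moved = 3 * n * a.itemsize
    return best, bytes_moved / best
```

Calling `measure_bandwidth(triad, n)` for a range of `n` values gives a simple sweep over working set sizes, and passing a different kernel function changes the access pattern without touching the timing code.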

Publication
AdaptMemBench: Application-Specific Memory Subsystem Benchmarking