Mint: Realizing CUDA performance in 3D stencil methods with annotated C
In: Proceedings of the 25th International Conference on Supercomputing (ICS'11), ed. by Lowenthal, David K. and de Supinski, Bronis R. and McKee, Sally A., pp. 214-224, ACM Press (ISBN: 978-1-4503-0102-2)
We present Mint, a programming model that enables the non-expert to enjoy the performance benefits of hand-coded CUDA without becoming entangled in the details. Mint targets stencil methods, which are an important class of scientific applications. We have implemented the Mint programming model with a source-to-source translator that generates optimized CUDA C from traditional C source. The translator relies on annotations to guide translation at a high level. The set of pragmas is small, and the model is compact and simple. Yet Mint is able to deliver performance competitive with painstakingly hand-optimized CUDA. We show that, for a set of widely used stencil kernels in two and three dimensions, Mint realized 80% of the performance obtained from aggressively optimized CUDA on NVIDIA 200-series GPUs. Our optimizations target three-dimensional kernels, which present a daunting array of possible optimizations.
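For readers unfamiliar with the term, a stencil method updates each grid point from a fixed pattern of its neighbors. The following is a minimal sketch in plain C of one sweep of a 7-point stencil on a three-dimensional grid, the kind of loop nest that an annotation-driven translator would take as input. It is not taken from the paper: the names `idx3` and `stencil7_sweep`, the coefficients `c0` and `c1`, and the memory layout are all assumptions made for illustration, and no Mint pragma syntax is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flatten (x, y, z) into a 1D index for a grid
 * stored with x varying fastest. */
static size_t idx3(size_t x, size_t y, size_t z, size_t nx, size_t ny) {
    return (z * ny + y) * nx + x;
}

/* Hypothetical example: one Jacobi-style sweep of a 7-point stencil.
 * Each interior point of unew becomes c0 times the old center value
 * plus c1 times the sum of its six face neighbors. Boundary points
 * of unew are left untouched. */
void stencil7_sweep(const double *u, double *unew,
                    size_t nx, size_t ny, size_t nz,
                    double c0, double c1) {
    /* A Jacobi sweep reads the old grid and writes a separate new one. */
    assert(u != NULL && unew != NULL && u != unew);

    /* In an annotated-C model, a directive would mark this loop nest
     * for translation to a GPU kernel (placement is an assumption). */
    for (size_t z = 1; z + 1 < nz; z++) {
        for (size_t y = 1; y + 1 < ny; y++) {
            for (size_t x = 1; x + 1 < nx; x++) {
                size_t i = idx3(x, y, z, nx, ny);
                unew[i] = c0 * u[i]
                        + c1 * (u[i - 1]       + u[i + 1]          /* x neighbors */
                              + u[i - nx]      + u[i + nx]         /* y neighbors */
                              + u[i - nx * ny] + u[i + nx * ny]);  /* z neighbors */
            }
        }
    }
}
```

A time-stepping solver would call `stencil7_sweep` repeatedly, swapping the `u` and `unew` pointers between sweeps. Because every interior point is computed independently from the old grid, the loop nest is a natural candidate for the fine-grained parallelism that CUDA exposes.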