Helium: a transparent inter-kernel optimizer for OpenCL

Thibaut Lutz, Christian Fensch, Murray Cole

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution



State-of-the-art automatic optimization of OpenCL applications focuses on improving the performance of individual compute kernels. Programmers address opportunities for inter-kernel optimization in specific applications by ad hoc hand tuning: manually fusing kernels together. However, the complexity of interactions between host and kernel code makes this approach weak or even unviable for applications involving more than a small number of kernel invocations or a highly dynamic control flow, leaving substantial potential opportunities unexplored. It also leads to an overly complex, hard-to-maintain code base.

We present Helium, a transparent OpenCL overlay which discovers, manipulates and exploits opportunities for inter- and intra-kernel optimization. Helium is implemented as a preloaded library and uses a delay-optimize-replay mechanism in which kernel calls are intercepted, collectively optimized, and then executed according to an improved execution plan. This allows us to benefit from composite optimizations, on large, dynamically complex applications, with no impact on the code base. Our results show that Helium performs at least as well as, and frequently better than, carefully hand-tuned code. Helium outperforms hand-optimized code where the exact dynamic composition of compute kernels cannot be known statically. In these cases, we demonstrate speedups of up to 3x over unoptimized code and an average speedup of 1.4x over hand-optimized code.
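The delay-optimize-replay idea can be illustrated with a small toy model. The sketch below is not Helium's implementation and does not use the OpenCL API: the names `KernelCall` and `DeferredQueue`, the restriction to element-wise kernels, and the simple producer-consumer fusion rule are all assumptions made for illustration. Calls are recorded instead of executed (delay), adjacent calls where one kernel's output buffer feeds the next are grouped (optimize), and each group then runs as a single pass over the data (replay).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KernelCall:
    """One deferred element-wise kernel launch (hypothetical model)."""
    name: str
    fn: Callable[[float], float]  # operation applied to each element
    src: str                      # name of the input buffer
    dst: str                      # name of the output buffer


class DeferredQueue:
    """Toy command queue that delays, fuses and then replays kernel calls."""

    def __init__(self, buffers: Dict[str, List[float]]):
        self.buffers = buffers
        self.pending: List[KernelCall] = []
        self.launches = 0  # number of passes actually executed

    def enqueue(self, call: KernelCall) -> None:
        # Delay: record the call rather than running it immediately.
        self.pending.append(call)

    def _plan(self) -> List[List[KernelCall]]:
        # Optimize: chain a call onto the previous group when it consumes
        # the buffer that the group's last stage produces.
        plan: List[List[KernelCall]] = []
        for call in self.pending:
            if plan and plan[-1][-1].dst == call.src:
                plan[-1].append(call)
            else:
                plan.append([call])
        return plan

    def flush(self) -> None:
        # Replay: run each fused group as one pass over the elements.
        for group in self._plan():
            self.launches += 1
            source = self.buffers[group[0].src]
            for i in range(len(source)):
                value = source[i]
                for stage in group:
                    value = stage.fn(value)
                    # Intermediate results are still written, so the
                    # observable buffer contents match unfused execution.
                    self.buffers[stage.dst][i] = value
        self.pending.clear()


buffers = {"a": [1.0, 2.0, 3.0], "b": [0.0] * 3, "c": [0.0] * 3}
queue = DeferredQueue(buffers)
queue.enqueue(KernelCall("scale", lambda x: 2.0 * x, "a", "b"))
queue.enqueue(KernelCall("offset", lambda x: x + 1.0, "b", "c"))
queue.flush()  # stands in for a synchronization point
```

In this toy run the two recorded calls are replayed as a single fused pass, while the calling code is unchanged apart from the deferred execution. That is the sense in which such an overlay can stay transparent to the application.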
Original language: English
Title of host publication: Proceedings of the 8th Workshop on General Purpose Processing Using GPUs
Place of Publication: New York
Publisher: Association for Computing Machinery
Number of pages: 11
ISBN (Electronic): 978-1-4503-3407-5
Publication status: Published - 7 Feb 2015
Event: 8th Workshop on General Purpose Processing Using GPUs - San Francisco, United States
Duration: 7 Feb 2015 to 8 Feb 2015


Conference: 8th Workshop on General Purpose Processing Using GPUs
Abbreviated title: GPGPU 2015
Country/Territory: United States
City: San Francisco


  • JIT compilation
  • OpenCL
  • inter-kernel optimization
  • profiling
  • staging

