Automatically transform arbitrary scalar functions into their SIMD equivalents.
Integrate into any data-parallel language to exploit SIMD instruction sets.
Implemented in LLVM: source-language and target-platform independent.
Data-parallel programming languages are an important component of today's parallel computing landscape. Among them are domain-specific languages such as shading languages in graphics (HLSL, GLSL, RenderMan, etc.) and "general-purpose" languages such as CUDA and OpenCL. To achieve maximum performance for such languages on CPUs, one has to exploit both multi-threading and the additional intra-core parallelism provided by the processors' SIMD instruction sets (such as Intel's SSE, AVX, and LRBni). This intra-core parallelism is becoming more important as SIMD register sizes grow (e.g. SSE = 128 bit, AVX = 256 bit, LRBni = 512 bit).
Whole-Function Vectorization is an algorithm that transforms a scalar function in such a way that it computes W executions of the original code in parallel using SIMD instructions (W is the chosen vectorization factor which usually depends on the target architecture's SIMD width). Our implementation of the algorithm ("libWFV") is a language- and platform-independent code transformation that works on low-level intermediate code given by a control-flow graph in SSA form (LLVM bitcode).
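To make the idea of "W executions in parallel" concrete, the following is a minimal hand-written sketch, not output of libWFV and not its API. It assumes W = 4 (e.g. a 128 bit register holding four 32 bit floats) and models a SIMD register as a plain array; the names `scale_add`, `scale_add_wfv`, and `Vec` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Assumed vectorization factor: 4 lanes, e.g. 128 bit SSE with 32 bit floats.
constexpr std::size_t W = 4;

// Stand-in for a SIMD register; a real backend would use a vector type.
using Vec = std::array<float, W>;

// Original scalar function: one execution per call.
float scale_add(float x, float y) {
    return 2.0f * x + y;
}

// Hand-written equivalent of the transformed function:
// one call computes W executions of scale_add, one per lane.
Vec scale_add_wfv(const Vec& x, const Vec& y) {
    Vec r{};
    for (std::size_t lane = 0; lane < W; ++lane) {
        r[lane] = 2.0f * x[lane] + y[lane];
    }
    return r;
}
```

Each lane of the result equals the scalar function applied to the corresponding lane of the inputs, which is the property the transformation has to preserve.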
Highlights of the implementation include the ability to deal with arbitrary control flow structures even on architectures without explicit predicated execution, advanced analyses and algorithms to exploit "uniform" control flow, robust handling of non-vectorizable operations, and a slim interface for efficient integration into LLVM-based compilers.
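As an illustration of handling control flow without predicated execution, the sketch below shows one common approach: compute a per-lane mask, evaluate both sides of the branch, and blend the results. It is a hand-written example under assumptions, not libWFV code. The runtime "all lanes agree" checks only stand in for the idea of uniform control flow; the analyses in libWFV may establish uniformity differently (for example statically). All names (`clamp_or_square`, `blend`, `all_set`, `none_set`) are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t W = 4;               // assumed vectorization factor
using Vec  = std::array<float, W>;         // stand-in for a SIMD register
using Mask = std::array<bool, W>;          // one predicate bit per lane

// Scalar original with a data-dependent branch.
float clamp_or_square(float x) {
    if (x < 0.0f) return 0.0f;
    return x * x;
}

// Blend: take a[lane] where the mask is set, b[lane] otherwise.
Vec blend(const Mask& m, const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t lane = 0; lane < W; ++lane) r[lane] = m[lane] ? a[lane] : b[lane];
    return r;
}

bool all_set(const Mask& m) {
    for (bool bit : m) if (!bit) return false;
    return true;
}

bool none_set(const Mask& m) {
    for (bool bit : m) if (bit) return false;
    return true;
}

// Vectorized stand-in: the branch becomes a mask plus a blend.
Vec clamp_or_square_wfv(const Vec& x) {
    Mask neg{};
    for (std::size_t lane = 0; lane < W; ++lane) neg[lane] = x[lane] < 0.0f;

    // All lanes take the "then" path: the other path can be skipped.
    if (all_set(neg)) return Vec{};

    Vec sq{};
    for (std::size_t lane = 0; lane < W; ++lane) sq[lane] = x[lane] * x[lane];

    // All lanes take the "else" path: no blend needed.
    if (none_set(neg)) return sq;

    // Lanes diverge: both paths were computed, pick per lane.
    return blend(neg, Vec{}, sq);
}
```

When the lanes diverge, both paths are paid for, which is why recognizing cases where all lanes take the same path can matter for performance.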
We have successfully integrated libWFV into various applications, including our own shading system and OpenCL driver as well as commercial systems of industry partners.
The basic algorithm was published at CGO 2011; an extension appeared at CC 2012 (see Publications).
The LLVM-based implementation of the Whole-Function Vectorization algorithm (libWFV) and the OpenCL driver are publicly available under the LLVM license (BSD-style).
The first alpha versions of libWFV and the OpenCL driver are available on GitHub.
If you are interested in trying out the second alpha version of libWFV, contact Ralf Karrenberg.
- Presburger Arithmetic in Memory Access Optimization for Data-Parallel Languages - FroCoS 2013
Karrenberg, R., Kosta, M. and Sturm, T.
Frontiers of Combining Systems, 2013.
- Improving Performance of OpenCL on CPUs - CC 2012
Karrenberg, R. and Hack, S.
Compiler Construction, 2012.
- Whole-Function Vectorization - CGO 2011
Karrenberg, R. and Hack, S.
International Symposium on Code Generation and Optimization, 2011.