Automatic SIMD code generation for complex arithmetic reduction for architecture lacking cross data-path support
For machines whose SIMD units lack cross data-path support, such as VMX or
SPU, implementing operations such as complex multiplication can be expensive
because of the number of data reorganization operations required. However, we
have observed that reductions over complex multiplications are common in user
code, and that the reduction itself presents a good opportunity to minimize
the number of data reorganization operations. We therefore present a novel
approach to efficiently SIMDizing complex multiply reductions, using VMX as a
test platform, and demonstrate that it brings a significant performance
improvement over a naive implementation.
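
To illustrate why a reduction leaves room to cut data reorganization, here is
a minimal sketch in portable C that simulates a 4-lane SIMD register. It is
not the method from this work, only one plausible reading of the idea: keep
lane-wise partial sums inside the loop and defer the cross-lane combination
until after it. The names vec4, vmadd, vswap_pairs, and cmul_reduce are
hypothetical, and the interleaved real/imaginary layout is an assumption.

```c
#include <assert.h>

#define LANES 4 /* one simulated register holds 2 interleaved complex floats */

typedef struct { float v[LANES]; } vec4; /* hypothetical stand-in for a SIMD register */

/* Lane-wise multiply-add: no value ever crosses a lane. */
static vec4 vmadd(vec4 a, vec4 b, vec4 acc) {
    for (int i = 0; i < LANES; i++)
        acc.v[i] += a.v[i] * b.v[i];
    return acc;
}

/* Swap re/im inside each complex pair: the one data reorganization
 * (a permute on real hardware) performed per loop iteration. */
static vec4 vswap_pairs(vec4 a) {
    vec4 r = {{ a.v[1], a.v[0], a.v[3], a.v[2] }};
    return r;
}

/* Computes sum over k of a[k] * b[k] for n complex values stored as
 * interleaved (re, im) floats. Assumption: n is even, so the data
 * fills whole simulated registers. */
void cmul_reduce(const float *a, const float *b, int n, float *re, float *im) {
    assert(n % 2 == 0);
    vec4 acc_same = {{0}};  /* accumulates [ar*br, ai*bi, ar*br, ai*bi] */
    vec4 acc_cross = {{0}}; /* accumulates [ar*bi, ai*br, ar*bi, ai*br] */

    for (int k = 0; k < 2 * n; k += LANES) {
        vec4 va, vb;
        for (int i = 0; i < LANES; i++) {
            va.v[i] = a[k + i];
            vb.v[i] = b[k + i];
        }
        acc_same = vmadd(va, vb, acc_same);
        acc_cross = vmadd(va, vswap_pairs(vb), acc_cross);
    }

    /* The cross-lane combination happens once, after the loop,
     * instead of once per complex product. */
    *re = (acc_same.v[0] + acc_same.v[2]) - (acc_same.v[1] + acc_same.v[3]);
    *im = acc_cross.v[0] + acc_cross.v[1] + acc_cross.v[2] + acc_cross.v[3];
}
```

For a = {1+2i, 3+4i} and b = {5+6i, 7+8i}, stored as {1,2,3,4} and {5,6,7,8},
calling cmul_reduce with n = 2 yields re = -18 and im = 68. In this sketch the
subtraction and the sums across lanes are paid once per reduction rather than
once per product, which is the kind of saving the abstract describes.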
Greg Steffan
Last modified: Wed Aug 26 17:58:51 EDT 2009