Many codes still in production use trace their origins to software developed during the vector supercomputing era of the 1970s to 1990s. The recently released NEC Vector Engine (VE) provides an opportunity to exploit this vector heritage. The VE can provide state-of-the-art performance without a complete rewrite of a well-validated codebase, and programs do not require an additional level of abstraction to use its capabilities. Given the time and cost required to port or rewrite codes, this is an attractive solution. Further tuning, as described in this paper, can realize maximum performance.
The goal was to assess how the NEC VE's performance and ease of use compare with those of existing CPU architectures (e.g., AMD, Intel) using a legacy Computational Fluid Dynamics (CFD) solver, FDL3DI, written in Fortran. FDL3DI was originally vectorized and optimized for efficient operation on vector processing machines. The NEC VE's vector architecture, high memory bandwidth, and support for compiling Fortran were the primary motivations for this evaluation.
By profiling the key compute kernels and modifying them with typical vector optimizations and NEC VE-specific optimizations, the code was able to use the vector engine hardware with minimal modification. Scalar code developed later in FDL3DI's lifetime was replaced with vector-friendly implementations. With these optimizations, the vector architecture was found to be 3× faster than the CPU architectures for main-memory-bound problems, while the CPU architectures remained competitive for smaller problem sizes. Achieving this performance with standard, well-known techniques is considered a key benefit of this architecture.
Keywords
- Vectorization
- CFD
- Optimization