Data Parallel Programming: The Promises and Limitations of High Performance Fortran
Exploiting the full potential of parallel architectures requires a cooperative effort between the user and the language system. There is a clear trade-off between the amount of information the user has to provide and the amount of effort the compiler has to expend to generate optimal code. At one end of the spectrum are message passing languages, where the user has full control and has to provide all the details while the compiler effort is minimal. At the other end of the spectrum are sequential languages, where the compiler has the full responsibility for extracting the parallelism. For the past few years, we have been exploring intermediate solutions, such as Kali and Vienna Fortran, which provide a fairly high-level environment for distributed memory machines while giving the user some control over the placement of data and computation. These efforts have been very influential in the design of High Performance Fortran (HPF), an international effort to build a set of standard extensions for exploiting a wide variety of parallel architectures.
The common approach in these languages is to provide language constructs or directives which allow the user to carefully control the distribution of data across the memories of the target machine. However, the computation code is written using a global name space with no explicit message passing statements. It is then the compiler's responsibility to analyze the distribution annotations and generate parallel code, inserting communication statements where required by the computation. Thus, the user can focus on high-level algorithmic issues while allowing the software to deal with the complex low-level details.
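This style can be sketched with a minimal HPF fragment (the array names, sizes, and processor shape here are illustrative, not taken from any particular code): the directives describe only where data lives, while the computation itself reads like ordinary global-name-space Fortran, and the compiler supplies the boundary communication a stencil sweep needs.

```fortran
! Distribute a 2-D grid block-wise over a 4x4 logical processor array.
REAL A(1000,1000), B(1000,1000)
!HPF$ PROCESSORS P(4,4)
!HPF$ DISTRIBUTE A(BLOCK,BLOCK) ONTO P
!HPF$ ALIGN B(I,J) WITH A(I,J)

! A Jacobi-style relaxation written in a global name space; the
! compiler inserts the messages that exchange block boundaries.
FORALL (I=2:999, J=2:999)
   B(I,J) = 0.25 * (A(I-1,J) + A(I+1,J) + A(I,J-1) + A(I,J+1))
END FORALL
```

Note that no send or receive appears anywhere: the DISTRIBUTE and ALIGN directives are the only parallelism-related additions to a sequential program.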
Initial experience with HPF (and related languages) has shown that it provides excellent support for simple data parallel algorithms. However, it is clear that there are a number of scientific codes for which HPF may not be adequate. In particular, HPF may not have enough expressive power to provide all the information needed by the compiler to generate optimal code. Examples include codes using block-structured and unstructured grids, adaptive computations, and multi-disciplinary applications which exhibit multiple types of parallelism.
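The unstructured-grid difficulty can be illustrated with a sketch of an edge-based gather loop (the array names NNODE, NEDGE, IEDGE1, and IEDGE2 are hypothetical): because the index arrays are only known at run time, the compiler cannot statically determine which elements reside on remote processors.

```fortran
! Unstructured-mesh gather: accesses to X and Y go through edge
! lists read at run time, so compile-time analysis cannot tell
! which references are remote; communication schedules must be
! built at run time, which plain HPF gives the compiler no help with.
REAL X(NNODE), Y(NNODE)
INTEGER IEDGE1(NEDGE), IEDGE2(NEDGE)
!HPF$ DISTRIBUTE X(BLOCK)
!HPF$ DISTRIBUTE Y(BLOCK)

DO K = 1, NEDGE
   Y(IEDGE1(K)) = Y(IEDGE1(K)) + X(IEDGE2(K))
END DO
```

A BLOCK distribution of the nodes also ignores the mesh connectivity, so a good partition (and the matching communication schedule) generally requires information the HPF directives cannot express.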
In this talk, we will provide a short overview of HPF and its capabilities. We will then discuss its limitations, giving examples of codes for which HPF may not be adequate. We will also explore some extensions of HPF which provide support for such codes.