Parallelism Detection in Nested Loops
Loop transformations have been shown to be useful for extracting parallelism from regular nested loops for a large class of machines, ranging from vector and VLIW machines to multiprocessor architectures. Each type of machine, however, calls for differently optimized code: depending on the memory hierarchy of the target, the granularity of the generated code must be chosen carefully so that memory accesses are optimized. Fine-grain parallelism is efficient for vector machines, whereas for distributed-memory machines coarse-grain parallelism (obtained by tiling or blocking techniques) is preferable because it allows interprocessor communication to be reduced.
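The difference in granularity can be illustrated with a small sketch. It is not taken from the chapter: the loop body, the function names `untiled` and `tiled`, and the `tile` parameter are all assumptions chosen for illustration. The nest has no dependences between iterations, so the inner loop exposes fine-grain parallelism, while the tiled version groups iterations into blocks that could each be handed to one processor as a coarse-grain unit of work.

```python
def untiled(a, b):
    """Original nest: every (i, j) iteration is independent of the others."""
    n, m = len(a), len(a[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):  # fine grain: the inner loop could be vectorized
            c[i][j] = a[i][j] + b[j][i]
    return c


def tiled(a, b, tile=2):
    """Same computation after tiling (blocking) both loops.

    `tile` is a hypothetical block size; a real choice would depend on
    the memory hierarchy of the target machine.
    """
    n, m = len(a), len(a[0])
    c = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):        # loops over tiles: each (ii, jj)
        for jj in range(0, m, tile):    # block is one coarse-grain task
            for i in range(ii, min(ii + tile, n)):      # loops inside a tile
                for j in range(jj, min(jj + tile, m)):
                    c[i][j] = a[i][j] + b[j][i]
    return c


# a is 3 x 5, b is 5 x 3, so b[j][i] is defined for every (i, j) of a.
a = [[i * 5 + j for j in range(5)] for i in range(3)]
b = [[j * 3 + i for i in range(3)] for j in range(5)]
same_result = untiled(a, b) == tiled(a, b, tile=2)
```

Tiling is legal here because the iterations are independent, so both versions compute the same values and only the order and grouping of iterations changes.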