Parallelism Detection in Nested Loops
Loop transformations have been shown to be useful for extracting parallelism from regular nested loops for a large class of machines, from vector machines and VLIW machines to multiprocessor architectures. Each type of machine, however, calls for differently optimized code: depending on the memory hierarchy of the target, the granularity of the generated code must be chosen carefully so that memory accesses are optimized. Fine-grain parallelism is efficient for vector machines, whereas for distributed-memory machines, coarse-grain parallelism (obtained by tiling or blocking techniques) is preferable because it reduces interprocessor communication.
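The granularity distinction can be illustrated with a minimal sketch, assuming a two-dimensional loop nest with no loop-carried dependences. The function names `scale_untiled` and `scale_tiled` and the tile size are hypothetical choices made for this illustration and do not come from the chapter. The tiled version runs exactly the same iterations, regrouped into blocks that could each be handed to one processor as a coarse-grain unit.

```python
def scale_untiled(a, factor):
    """Reference loop nest: every (i, j) iteration is independent."""
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):  # innermost loop: a fine-grain (vectorizable) candidate
            out[i][j] = factor * a[i][j]
    return out


def scale_tiled(a, factor, tile=2):
    """Same iterations, regrouped into blocks of at most tile x tile points."""
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):        # outer loops enumerate the tiles;
        for jj in range(0, m, tile):    # each tile is one coarse-grain unit
            for i in range(ii, min(ii + tile, n)):      # inner loops stay
                for j in range(jj, min(jj + tile, m)):  # within a single tile
                    out[i][j] = factor * a[i][j]
    return out
```

Because no iteration depends on another in this example, reordering the iterations by tile is trivially legal; for nests that do carry dependences, the legality of such a reordering has to be established from the dependence graph first.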
Keywords: Dependence Graph, Nest Loop, Virtual Node, Parallel Loop, Unimodular Matrix