About this book
Since its introduction decades ago, Instruction Level Parallelism (ILP) has gradually become ubiquitous and is now featured in virtually every processor built today, from general purpose CPUs to application-specific and embedded processors. Because these architectures could not exist or (in the case of superscalar machines) cannot achieve their full potential without specific sophisticated compilation techniques to exploit ILP, the development of architectures that support ILP has proceeded hand-in-hand with the development of sophisticated compiler technology, such as Trace Scheduling and Software Pipelining. While essential for achieving the full potential of ILP, in both performance as well as power consumption management, these techniques are still not widely known, in part because of their intricacy and in part because the only widely available references for ILP techniques are the primary resources, with the brevity of introduction common to conference proceedings.
This book precisely formulates, and simplifies the presentation of Instruction Level Parallelism (ILP) compilation techniques. It uniquely offers consistent and uniform descriptions of the code transformations involved. Due to the ubiquitous nature of ILP in virtually every processor built today, from general purpose CPUs to application-specific and embedded processors, this book is useful to the student, the practitioner and also the researcher of advanced compilation techniques. With an emphasis on fine-grain instruction level parallelism, this book will also prove interesting to researchers and students of parallelism at large, in as much as the techniques described yield insights that go beyond superscalar and VLIW (Very Long Instruction Word) machines compilation and are more widely applicable to optimizing compilers in general. ILP techniques have found wide and crucial application in Design Automation, where they have been used extensively in the optimization of performance as well as area and power minimization of computer designs.
Parallelism Instruction-level parallelism VLIW Superscalar Loop parallelization Compilers Optimization Scheduling Trace scheduling Percolation scheduling Modulo scheduling Software pipelining Performance GPUs GPU architecture scheduling algorithms Parallel processing Parallel computing