Exposing Tunable Parameters in Multi-threaded Numerical Code

Qasem, Apan; Guo, Jichi; Rahman, Faizur; Yi, Qing

doi:10.1007/978-3-642-15672-4_6


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6289)

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing


Abstract

Achieving high performance on today’s architectures requires careful orchestration of many optimization parameters. In particular, the presence of shared caches on multicore architectures makes it necessary to consider, in concert, issues related to both parallelism and data locality. This paper presents a systematic and extensive exploration of the combined search space of transformation parameters that affect both parallelism and data locality in multi-threaded numerical applications. We characterize the nature of the complex interaction between blocking, problem decomposition and the selection of loops for parallelism. We identify key parameters for tuning and provide an automatic mechanism for exposing these parameters to a search tool. A series of experiments on two scientific benchmarks illustrates the non-orthogonality of the transformation search space and reiterates the need for integrated transformation heuristics to achieve high performance on current multicore architectures.
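The abstract names three interacting transformation parameters: the blocking factor, the problem decomposition across threads, and the choice of loop to parallelize. The Python sketch below is a minimal illustration, under stated assumptions, of what exposing such parameters to a search tool can look like; it is not the authors' implementation. The functions `tiled_matmul` and `search`, and the parameters `tile`, `par_loop` and `threads`, are hypothetical. Matrix multiplication stands in for a numerical kernel, and the search times every combination in the joint space rather than tuning each parameter in isolation, which is one simple way to respect the kind of interaction the paper describes. Because CPython threads share the global interpreter lock, the timings will not reflect real multicore or shared-cache behavior; only the structure of the parameterized kernel and the search loop carries over.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def tiled_matmul(A, B, tile, par_loop, threads):
    """Return C = A x B (lists of lists), blocked with square tiles.

    Hypothetical tunable parameters:
      tile     -- blocking factor applied to the i, j and k loops
      par_loop -- which tiled loop ('i' or 'j') is distributed across threads
      threads  -- problem decomposition: number of contiguous chunks of that loop
    """
    if par_loop not in ("i", "j"):
        raise ValueError("par_loop must be 'i' or 'j'")
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    i_tiles = list(range(0, n, tile))
    j_tiles = list(range(0, p, tile))

    def block(ii, jj):
        # Accumulate the C block whose top-left corner is (ii, jj).
        for kk in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                row_a, row_c = A[i], C[i]
                for k in range(kk, min(kk + tile, m)):
                    a, row_b = row_a[k], B[k]
                    for j in range(jj, min(jj + tile, p)):
                        row_c[j] += a * row_b[j]

    # The tiled loop chosen for parallelism is split into contiguous chunks.
    outer, inner = (i_tiles, j_tiles) if par_loop == "i" else (j_tiles, i_tiles)
    size = max(1, -(-len(outer) // threads))  # ceiling division
    chunks = [outer[s:s + size] for s in range(0, len(outer), size)]

    def run(chunk):
        # Each chunk owns distinct C blocks, so no two tasks write the same element.
        for o in chunk:
            for q in inner:
                if par_loop == "i":
                    block(o, q)
                else:
                    block(q, o)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(run, chunks))  # list() re-raises any worker exception
    return C


def search(A, B, tiles, par_loops, thread_counts):
    """Time every combination in the joint space; return (seconds, best_config)."""
    best = None
    for tile, par_loop, threads in itertools.product(tiles, par_loops, thread_counts):
        start = time.perf_counter()
        tiled_matmul(A, B, tile, par_loop, threads)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, {"tile": tile, "par_loop": par_loop, "threads": threads})
    return best


if __name__ == "__main__":
    n = 48
    A = [[float((i + 2 * k) % 7) for k in range(n)] for i in range(n)]
    B = [[float((3 * k + j) % 5) for j in range(n)] for k in range(n)]
    seconds, config = search(A, B, [8, 16, 32], ["i", "j"], [1, 2, 4])
    print(f"best {config} in {seconds:.4f} s")
```

In a compiled setting, the same three knobs would typically correspond to a tile size in the generated loop nest, a thread count, and the placement of a parallel-loop annotation such as an OpenMP `parallel for`. An exhaustive sweep like this is only practical for small spaces; larger ones generally call for a guided search.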

This research is funded by the National Science Foundation under Grant No. 0833203 and No. 0747357 and by the Department of Energy under Grant No. DE-SC001770.


Author information

Authors and Affiliations

Texas State University, USA
Apan Qasem
University of Texas at San Antonio, USA
Jichi Guo, Faizur Rahman & Qing Yi

Editor information

Editors and Affiliations

University of Rochester, P.O. Box 270226, 14627, Rochester, NY, USA
Chen Ding
School of Computer Science and Technology, Huazhong University of Science and Technology, 430074, Wuhan, China
Zhiyuan Shao
School of Computer Science and Technology, Services Computing Technology and Huazhong University of Science and Technology, 430074, Wuhan, China
Ran Zheng

About this paper

Cite this paper

Qasem, A., Guo, J., Rahman, F., Yi, Q. (2010). Exposing Tunable Parameters in Multi-threaded Numerical Code. In: Ding, C., Shao, Z., Zheng, R. (eds) Network and Parallel Computing. NPC 2010. Lecture Notes in Computer Science, vol 6289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15672-4_6

DOI: https://doi.org/10.1007/978-3-642-15672-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15671-7
Online ISBN: 978-3-642-15672-4
eBook Packages: Computer Science, Computer Science (R0)
