CALIPER: A Coarse Grain Parallel Performance Estimator and Predictor

  • Conference paper
  • First Online:
Emerging Technologies in Computing (iCETiC 2020)

Abstract

Empirical studies of program performance are limited by the choice of input samples used in the experiment and the bias that results from that choice. Estimation and prediction based on static analysis are more universal, superior and widely accepted. However, the higher-level language artifacts that ease program development, such as procedures, loops, conditionals and recursion, can be a hindrance to quality analysis and performance study, both in the time and effort they demand and, in extreme cases, by making the study impractical. We could instead transform the program, eliminate the constraints imposed by these program structures, and greatly ease the process of quality analysis and performance study. This transformation may also reduce estimation errors and help deliver timely results, while there is still an opportunity to use them in a later analysis phase. We propose transformations prior to estimation, namely Procedure Call Expansion, Loop Unrolling and Control Predication, collectively referred to here as Program Shape Flattening, with the structural hindrances themselves referred to as the Program Shape. The outcome of this transformation is sequential code that is easy to work with; specifically, for parallel performance estimation, we now have code that is free from control dependencies. We use the concept of Equivalence Classes to group statements based on their data-dependence behavior. Statements that belong to the same Equivalence Class are mutually dependent, directly or transitively, while statements that belong to separate Equivalence Classes are dependence free and can run in parallel without compromising program correctness. With this arrangement of program statements we claim that the program run time equals the run time of the class that runs the longest. While this scheme of grouping program instructions can be viewed as a method of parallel conversion, we use it here specifically for parallel performance estimation and prediction. After surveying the published literature and searching for similar commercial products, we did not find, at the time of writing, a comparable technology against which to assess the contributions made by Caliper, and so we claim that Caliper is the only product of its kind today.
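
The estimation scheme summarized above lends itself to a short illustration. The following Python code is a minimal sketch, not the authors' implementation: it assumes the program has already been flattened into straight-line code with only data dependencies, uses a hypothetical Statement record (id, estimated cost, ids of statements it reads from), groups statements into equivalence classes with a union-find pass over the dependences, and reports the cost of the longest class as the coarse-grain parallel run-time estimate alongside the sequential total and the implied speedup.

# Minimal sketch (not the authors' implementation): after Program Shape
# Flattening yields straight-line code with only data dependencies,
# group statements into equivalence classes with union-find and take the
# longest class as the coarse-grain parallel run-time estimate.
# Statement, depends_on and cost are hypothetical stand-ins.

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Statement:
    sid: int                                         # statement id in the flattened code
    cost: float                                      # estimated sequential cost of this statement
    depends_on: list = field(default_factory=list)   # ids of statements it reads from

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]                # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def caliper_estimate(stmts):
    """Return (sequential_time, parallel_estimate, speedup) for flattened code."""
    parent = {s.sid: s.sid for s in stmts}
    # Statements joined by a data dependence, directly or transitively,
    # end up in the same equivalence class.
    for s in stmts:
        for d in s.depends_on:
            union(parent, s.sid, d)
    class_cost = defaultdict(float)
    for s in stmts:
        class_cost[find(parent, s.sid)] += s.cost
    seq = sum(s.cost for s in stmts)
    par = max(class_cost.values())                   # classes run concurrently; the longest dominates
    return seq, par, seq / par

# Example: two independent dependence chains of unequal length.
stmts = [
    Statement(0, 1.0),
    Statement(1, 2.0, depends_on=[0]),
    Statement(2, 1.0),
    Statement(3, 1.0, depends_on=[2]),
    Statement(4, 3.0, depends_on=[1]),
]
print(caliper_estimate(stmts))                       # (8.0, 6.0, 1.33...) -> chain {0, 1, 4} dominates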

Author information

Corresponding author

Correspondence to Sesha Kalyur.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Kalyur, S., Nagaraja, G.S. (2020). CALIPER: A Coarse Grain Parallel Performance Estimator and Predictor. In: Miraz, M.H., Excell, P.S., Ware, A., Soomro, S., Ali, M. (eds) Emerging Technologies in Computing. iCETiC 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 332. Springer, Cham. https://doi.org/10.1007/978-3-030-60036-5_2

  • DOI: https://doi.org/10.1007/978-3-030-60036-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60035-8

  • Online ISBN: 978-3-030-60036-5

  • eBook Packages: Computer Science, Computer Science (R0)
