CALIPER: A Coarse Grain Parallel Performance Estimator and Predictor

  • Conference paper
  • First Online:
Emerging Technologies in Computing (iCETiC 2020)

Abstract

Empirical studies of program performance are limited by the choice of input samples used in the experiment and the bias that results from that choice. Estimation and prediction based on static analysis are more universal, superior and widely accepted. However, the higher-level language artifacts that ease program development, such as procedures, loops, conditionals and recursion, can be a hindrance to quality analysis and performance study, both in the time and effort they demand and, in extreme cases, by making the study impractical. We could instead transform the program, eliminate the constraints imposed by these program structures, and greatly ease the process of quality analysis and performance study. This transformation may also reduce estimation errors and help deliver timely results, while there is still an opportunity to use them in a later analysis phase. We propose transformations prior to estimation, namely Procedure Call Expansion, Loop Unrolling and Control Predication, collectively referred to here as Program Shape Flattening, with the structural hindrances themselves referred to as the Program Shape. The outcome of this transformation is sequential code that is easy to work with; specifically, for parallel performance estimation, we now have code that is free from control dependencies. We use the concept of Equivalence Classes to group statements based on their data-dependence behavior. Statements that belong to the same Equivalence Class are mutually dependent, directly or transitively, while statements that belong to separate Equivalence Classes are dependence free and can run in parallel without compromising program correctness. With this arrangement of program statements we claim that the program run time equals the run time of the class that runs the longest. While this scheme of grouping program instructions can be viewed as a method of parallel conversion, we use it here specifically for parallel performance estimation and prediction. After surveying the published literature and searching for similar commercial products, we did not find, at the time of writing, a comparable technology against which to assess the contributions made by Caliper, and so we claim that Caliper is the only product of its kind today.
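
The estimation scheme summarized above lends itself to a short illustration. The following Python code is a minimal sketch, not the authors' implementation: it assumes the program has already been flattened into straight-line code with only data dependencies, uses a hypothetical Statement record (id, estimated cost, ids of statements it reads from), groups statements into equivalence classes with a union-find pass over the dependences, and reports the cost of the longest class as the coarse-grain parallel run-time estimate alongside the sequential total and the implied speedup.

# Minimal sketch (not the authors' implementation): after Program Shape
# Flattening yields straight-line code with only data dependencies,
# group statements into equivalence classes with union-find and take the
# longest class as the coarse-grain parallel run-time estimate.
# Statement, depends_on and cost are hypothetical stand-ins.

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Statement:
    sid: int                                         # statement id in the flattened code
    cost: float                                      # estimated sequential cost of this statement
    depends_on: list = field(default_factory=list)   # ids of statements it reads from

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]                # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def caliper_estimate(stmts):
    """Return (sequential_time, parallel_estimate, speedup) for flattened code."""
    parent = {s.sid: s.sid for s in stmts}
    # Statements joined by a data dependence, directly or transitively,
    # end up in the same equivalence class.
    for s in stmts:
        for d in s.depends_on:
            union(parent, s.sid, d)
    class_cost = defaultdict(float)
    for s in stmts:
        class_cost[find(parent, s.sid)] += s.cost
    seq = sum(s.cost for s in stmts)
    par = max(class_cost.values())                   # classes run concurrently; the longest dominates
    return seq, par, seq / par

# Example: two independent dependence chains of unequal length.
stmts = [
    Statement(0, 1.0),
    Statement(1, 2.0, depends_on=[0]),
    Statement(2, 1.0),
    Statement(3, 1.0, depends_on=[2]),
    Statement(4, 3.0, depends_on=[1]),
]
print(caliper_estimate(stmts))                       # (8.0, 6.0, 1.33...) -> chain {0, 1, 4} dominates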

Author information

Corresponding author

Correspondence to Sesha Kalyur.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Kalyur, S., Nagaraja, G.S. (2020). CALIPER: A Coarse Grain Parallel Performance Estimator and Predictor. In: Miraz, M.H., Excell, P.S., Ware, A., Soomro, S., Ali, M. (eds) Emerging Technologies in Computing. iCETiC 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 332. Springer, Cham. https://doi.org/10.1007/978-3-030-60036-5_2

  • DOI: https://doi.org/10.1007/978-3-030-60036-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60035-8

  • Online ISBN: 978-3-030-60036-5

  • eBook Packages: Computer Science, Computer Science (R0)
