International Journal of Parallel Programming, Volume 44, Issue 4, pp 867–900

Using Machine Learning Techniques to Detect Parallel Patterns of Multi-threaded Applications



Multicore hardware and software are becoming increasingly complex. The programmability problem of multicore software has led to the use of parallel patterns. Parallel patterns reduce the effort and time required to develop multicore software by effectively capturing its thread communication and data sharing characteristics. Hence, detecting the parallel pattern used in a multi-threaded application is crucial for performance improvement and enables many architectural optimizations; however, this topic has not been widely studied. We apply machine learning techniques in a novel approach to automatically detect parallel patterns, and we compare these techniques in terms of accuracy and speed. We experimentally validate the detection ability of our techniques on benchmarks including PARSEC and Rodinia. Our experiments show that the k-nearest neighbor, decision tree, and naive Bayes classifiers are the most accurate techniques. Overall, decision trees are the fastest technique, with the lowest characterization overhead, and produce the best combination of detection results. We also show the usefulness of the proposed techniques for synthetic benchmark generation.
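The detection flow described above (summarize each application as a feature vector of thread communication and data-sharing characteristics, then classify it with a trained model) can be sketched with a minimal k-nearest-neighbor classifier. This is an illustrative toy only: the feature names, feature values, and pattern labels below are invented for the example, not the paper's actual characterization.

```python
import math
from collections import Counter

# Hypothetical training set: each application is reduced to a small
# feature vector (here: fraction of shared reads, fraction of shared
# writes, synchronization events per unit of work) with a known
# parallel-pattern label. All numbers and labels are illustrative.
TRAINING = [
    ((0.70, 0.05, 0.1), "pipeline"),
    ((0.65, 0.10, 0.2), "pipeline"),
    ((0.10, 0.02, 0.1), "task-parallel"),
    ((0.15, 0.05, 0.3), "task-parallel"),
    ((0.30, 0.40, 5.0), "data-parallel"),
    ((0.25, 0.45, 6.0), "data-parallel"),
]

def knn_detect(features, k=3):
    """Classify an unseen application by majority vote among the
    k nearest training applications (Euclidean distance)."""
    dists = sorted(
        (math.dist(features, f), label) for f, label in TRAINING
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# An unseen application whose features resemble the data-parallel
# examples is classified accordingly.
print(knn_detect((0.28, 0.42, 5.5)))  # prints "data-parallel"
```

A decision-tree or naive Bayes classifier would slot into the same flow: only the `knn_detect` step changes, while the feature characterization of the application stays identical.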


Parallel patterns · Parallel programming · Multi-threaded applications · Multicore software · Pattern detection



We would like to thank Prof. Ethem Alpaydin for his very helpful comments on early versions of the paper. This work was supported in part by Semiconductor Research Corporation under task 2082.001, Bogazici University Research Fund 7223, and the Turkish Academy of Sciences.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

Department of Computer Engineering, Bogazici University, Istanbul, Turkey
