A hybrid sample generation approach in speculative multithreading

The Journal of Supercomputing

Abstract

Speculative multithreading (SpMT) is a thread-level automatic parallelization technique for accelerating sequential programs. Machine learning has been successfully applied to SpMT to improve its performance. An appropriate sample set, which acts as the knowledge provider, is important for machine learning-based (ML-based) thread partitioning. The conventional heuristic rules-based (HR-based) sample generation approach cannot generate adaptive samples. A hybrid sample generation approach can break this bottleneck. With this method, we first automatically generate samples by heuristic rules, where each sample is MIPS code containing spawning points (SPs) and control quasi-independent points (CQIPs); second, we manually adjust the positions of the SPs and CQIPs and rebuild the pre-computation slice to obtain better performance for every sample; and third, we build a model to ensure that the probability of adjusting to the optimal partition positions keeps increasing. In implementing this approach, we take three measures: bias weighting, preservation of optimal solutions, and summarization of greedy rules. In this way, we increase the adjustment frequency for frequently called subroutines and preserve the optimal partition positions, so as to achieve a stable speedup improvement. The approach is evaluated on Prophet, a generic SpMT processor for evaluating the performance of multithreaded programs, with the SPEC2000 and Olden benchmarks as input. Experiments show that our approach obtains better sample sets, which deliver a performance improvement of about 86.9% on a 16-core platform over the samples generated by the HR-based approach. The experimental results also show that this approach is effective in generating sample sets for ML-based thread partitioning.
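The abstract names bias weighting and preservation of optimal solutions but does not give an algorithm, so the following is only a minimal sketch of how those two measures might interact, not the authors' implementation. All names (Subroutine, pick_subroutine, adjust, toy_propose, toy_evaluate) are hypothetical, "high called time" is assumed to mean a high call count, and the manual adjustment step and the Prophet simulation are replaced by toy stand-in functions.

```python
import random
from dataclasses import dataclass


@dataclass
class Subroutine:
    name: str
    call_count: int            # assumed proxy for "high called time"; must be > 0
    sp: int                    # spawning point position (instruction index)
    cqip: int                  # CQIP position (instruction index)
    best_speedup: float = 1.0  # best speedup seen so far for this subroutine


def pick_subroutine(subs, rng):
    """Bias weighting: pick a subroutine with probability proportional to call_count."""
    weights = [s.call_count for s in subs]
    return rng.choices(subs, weights=weights, k=1)[0]


def adjust(subs, evaluate, propose, rounds, seed=0):
    """Repeatedly adjust partition positions, keeping only improvements."""
    rng = random.Random(seed)
    for _ in range(rounds):
        s = pick_subroutine(subs, rng)
        sp, cqip = propose(s, rng)
        speedup = evaluate(s, sp, cqip)
        # Preservation of optimal solutions: overwrite only on strict improvement.
        if speedup > s.best_speedup:
            s.sp, s.cqip, s.best_speedup = sp, cqip, speedup
    return subs


def toy_propose(s, rng):
    """Hypothetical stand-in for the manual adjustment of SP and CQIP positions."""
    sp = rng.randint(0, 10)
    cqip = rng.randint(sp + 1, 20)
    return sp, cqip


def toy_evaluate(s, sp, cqip):
    """Hypothetical stand-in for a simulator run; returns a made-up speedup."""
    return 1.0 + (cqip - sp) / 20.0


if __name__ == "__main__":
    subs = [Subroutine("hot", 900, 0, 5), Subroutine("cold", 100, 0, 5)]
    for s in adjust(subs, toy_evaluate, toy_propose, rounds=200):
        print(s)
```

In this sketch the frequently called subroutine is selected for adjustment far more often than the rarely called one, and a candidate partition replaces the stored one only when it measures strictly better, so the recorded speedup for each subroutine never decreases.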

Acknowledgements

We thank our colleagues for their collaboration on the present work. We also thank the reviewers for their specific comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grant No. 61640219 and by the Doctoral Fund of the Ministry of Education of China under Grant No. 2013021110012.

Author information

Corresponding author

Correspondence to Yinliang Zhao.

Cite this article

Li, Y., Zhao, Y., Sun, L. et al. A hybrid sample generation approach in speculative multithreading. J Supercomput 75, 4193–4225 (2019). https://doi.org/10.1007/s11227-017-2118-3

  • DOI: https://doi.org/10.1007/s11227-017-2118-3
