Abstract
Speculative multithreading (SpMT) is a thread-level automatic parallelization technique for accelerating sequential programs. Machine learning has been successfully applied to SpMT to improve its performance. An appropriate sample set, which acts as the knowledge provider, is important for machine learning-based (ML-based) thread partitioning. The conventional heuristic rules-based (HR-based) sample generation approach cannot generate adaptive samples; a hybrid sample generation approach can break this bottleneck. With this method, we first automatically generate samples by heuristic rules; each sample is MIPS code containing spawning points (SPs) and control quasi-independent points (CQIPs). Second, we manually adjust the positions of the SPs and CQIPs and rebuild the pre-computation slice to obtain better performance for every sample. Third, we build a model to ensure that the probability of adjusting to the optimal partition positions keeps increasing. In implementing this approach, we take three measures: bias weighting, preservation of optimal solutions, and summarization of greedy rules. In this way, we raise the adjustment frequency for frequently called subroutines and preserve the optimal partition positions, so as to achieve a stable speedup improvement. We evaluate the approach on Prophet, a generic SpMT processor for evaluating the performance of multithreaded programs, using the SPEC2000 and Olden benchmarks as input. Experiments show that our approach obtains better sample sets, which deliver a performance improvement of about 86.9% on a 16-core configuration over the samples generated by the HR-based approach. The experimental results also show that this approach is effective for generating sample sets for ML-based thread partitioning.
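As a rough illustration of two of the measures named above, bias weighting and preservation of optimal solutions, the following minimal Python sketch may help. It is not the authors' implementation. The function names, the random one-instruction shift that stands in for the manual adjustment, and the `toy_speedup` scoring function that stands in for a Prophet simulation run are all assumptions made for this example.

```python
import random


def weighted_pick(call_counts, rng):
    """Bias weighting: pick a subroutine with probability proportional to its call count."""
    names = list(call_counts)
    weights = [call_counts[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def adjust_partitions(partitions, call_counts, evaluate, rounds=200, seed=0):
    """Search for better (SP, CQIP) positions per subroutine.

    partitions maps a subroutine name to an (sp, cqip) pair of instruction
    offsets; evaluate is a hypothetical scoring function (higher is better).
    """
    rng = random.Random(seed)
    best = dict(partitions)
    best_score = evaluate(best)
    for _ in range(rounds):
        name = weighted_pick(call_counts, rng)
        sp, cqip = best[name]
        # Hypothetical adjustment: shift SP and CQIP by at most one instruction,
        # keeping the CQIP strictly after the SP.
        new_sp = max(0, sp + rng.choice((-1, 0, 1)))
        new_cqip = max(new_sp + 1, cqip + rng.choice((-1, 0, 1)))
        candidate = dict(best)
        candidate[name] = (new_sp, new_cqip)
        score = evaluate(candidate)
        # Preservation of optimal solutions: accept only strict improvements,
        # so the best partition found so far is never lost.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_speedup(partitions):
    """Stand-in for a simulator run: rewards SP-to-CQIP distance up to 8."""
    return sum(min(cqip - sp, 8) for sp, cqip in partitions.values())


initial = {"hot_loop": (2, 4), "cold_init": (0, 3)}
calls = {"hot_loop": 900, "cold_init": 10}
best, score = adjust_partitions(initial, calls, toy_speedup)
```

Because candidates are drawn in proportion to call count, `hot_loop` is adjusted far more often than `cold_init`, and because only improvements are accepted, the returned score is never lower than the score of the initial partition.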
Acknowledgements
We thank our colleagues for their collaboration on the present work. We also thank all the reviewers for their specific comments and suggestions. This work was supported by the National Natural Science Foundation of China through Grant No. 61640219 and the Doctoral Fund of the Ministry of Education of China under Grant No. 2013021110012.
About this article
Cite this article
Li, Y., Zhao, Y., Sun, L. et al. A hybrid sample generation approach in speculative multithreading. J Supercomput 75, 4193–4225 (2019). https://doi.org/10.1007/s11227-017-2118-3
DOI: https://doi.org/10.1007/s11227-017-2118-3