Abstract
In recent years, automatic data-driven modeling with machine learning (ML) has received considerable attention as an alternative to analytical modeling for many modeling tasks. While ad hoc adoption of ML approaches has seen success, the real potential for automation in data-driven modeling has yet to be realized. We propose AutoMOMML, an end-to-end, ML-based framework to build predictive models for objectives such as performance and power. The framework adopts statistical approaches to reduce the modeling complexity and automatically identifies and configures the most suitable learning algorithm to model the required objectives based on hardware and application signatures. The experimental results using hardware counters as application signatures show that the median prediction errors of the performance, processor-power, and DRAM-power models are 13%, 2.3%, and 8%, respectively.
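The abstract's central idea — automatically selecting and configuring the learning algorithm that best predicts an objective from hardware-counter features — can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, candidate models, and the median-absolute-percentage-error score are illustrative assumptions standing in for real counter signatures and the framework's actual search.

```python
# Minimal sketch: pick the best regressor for a power/performance
# objective via cross-validation, scored by median absolute
# percentage error (MedAPE). Synthetic data stands in for
# hardware-counter signatures.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.random((200, 8))  # 8 hypothetical counter features per run
# Hypothetical objective (e.g., processor power in watts) with a
# baseline, a linear term, a nonlinear term, and measurement noise.
y = 100 + 50 * X[:, 0] + 20 * X[:, 1] ** 2 + rng.normal(0, 1, 200)

def medape(y_true, y_pred):
    """Median absolute percentage error; robust to outlier runs."""
    return float(np.median(np.abs((y_true - y_pred) / y_true)) * 100)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "svr": SVR(C=10.0),
}

# Score each candidate with 5-fold cross-validated predictions and
# keep the one with the lowest median error.
scores = {name: medape(y, cross_val_predict(model, X, y, cv=5))
          for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

In the actual framework this selection step would also tune each candidate's hyperparameters and use feature-reduction statistics before fitting; the sketch shows only the selection loop.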
This work was authored by an employee of the U.S. Government; rights are transferred to the extent transferable (17 U.S.C. §105 applies).
Notes
1. We will make the packages and the framework available on our website (http://www.sdsc.edu/~tiwari/AutoMOMML) at the time of publication.
Acknowledgments
This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research program under contract number DE-AC02-06CH11357.
Copyright information
© 2016 Springer International Publishing Switzerland (outside the US)
Cite this paper
Balaprakash, P., Tiwari, A., Wild, S.M., Carrington, L., Hovland, P.D. (2016). AutoMOMML: Automatic Multi-objective Modeling with Machine Learning. In: Kunkel, J., Balaji, P., Dongarra, J. (eds.) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol. 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_12
Print ISBN: 978-3-319-41320-4
Online ISBN: 978-3-319-41321-1