Fast Automatic Heuristic Construction Using Active Learning

  • William F. Ogilvie
  • Pavlos Petoumenos
  • Zheng Wang
  • Hugh Leather
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8967)

Abstract

Building effective optimization heuristics is a challenging task that often takes developers several months, if not years, to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data. However, obtaining this data can take months per platform. This problem is becoming ever more critical: if no solution is found, we shall be left with out-of-date heuristics that cannot extract the best performance from modern machines.

In this work, we present a low-cost predictive modelling approach for automatic heuristic construction that significantly reduces this training overhead. Typically, in supervised learning, the training instances to evaluate are selected at random, regardless of how much useful information they carry. This wastes effort on parts of the space that contribute little to the quality of the produced heuristic. Our approach, by contrast, uses active learning to select and focus only on the most useful training examples.

We demonstrate this technique by automatically constructing a model that determines on which device to execute four parallel programs at differing problem dimensions for a representative CPU-GPU based heterogeneous system. Our methodology is remarkably simple yet effective, making it a strong candidate for wide adoption. At high levels of classification accuracy, the average learning speed-up is 3x compared with the state of the art.
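The contrast between random and active selection of training points can be illustrated with a toy sketch. It is not the method evaluated in the paper: the one-dimensional problem size, the `CROSSOVER` value, the `profile` stand-in for a real timing run, and the threshold classifier are all assumptions made for illustration. The active strategy shown is plain uncertainty sampling, which always queries the unlabelled point closest to the current decision boundary.

```python
import random

CROSSOVER = 700  # hypothetical problem size from which the GPU becomes faster


def profile(size):
    """Stand-in for an expensive profiling run that reports the faster device."""
    return "GPU" if size >= CROSSOVER else "CPU"


def fit(labelled):
    """Fit a threshold classifier: the midpoint between the largest size
    labelled CPU and the smallest size labelled GPU."""
    lo = max(s for s, d in labelled.items() if d == "CPU")
    hi = min(s for s, d in labelled.items() if d == "GPU")
    return (lo + hi) / 2


def accuracy(threshold, pool):
    """Fraction of the pool classified correctly (evaluation only; a real
    study would use a held-out test set rather than re-profiling the pool)."""
    hits = sum(("GPU" if s >= threshold else "CPU") == profile(s) for s in pool)
    return hits / len(pool)


def learn(pool, budget, active, seed=0):
    """Spend `budget` profiling runs, choosing each query actively or at random."""
    rng = random.Random(seed)
    # Assumption: the smallest size favours the CPU and the largest the GPU,
    # so these two seed labels cover both classes.
    labelled = {pool[0]: profile(pool[0]), pool[-1]: profile(pool[-1])}
    for _ in range(budget):
        threshold = fit(labelled)
        unlabelled = [s for s in pool if s not in labelled]
        if active:
            # Uncertainty sampling: query the point nearest the boundary.
            query = min(unlabelled, key=lambda s: abs(s - threshold))
        else:
            # Baseline: query a uniformly random unlabelled point.
            query = rng.choice(unlabelled)
        labelled[query] = profile(query)
    return accuracy(fit(labelled), pool)


pool = list(range(1, 1025))
active_acc = learn(pool, budget=10, active=True)
random_acc = learn(pool, budget=10, active=False)
print(f"active: {active_acc:.3f}  random: {random_acc:.3f}")
```

In this toy setting the active learner behaves like a binary search for the crossover point, so each profiling run halves the remaining uncertainty, whereas random queries often land far from the boundary and tell the model little. Real heuristics have more features and noisy measurements, so the gap would differ in practice.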

Keywords

  • Machine learning
  • Workload scheduling

Notes

Acknowledgements

This work was funded under the EPSRC grant, ALEA (EP/H044752/1).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • William F. Ogilvie (1)
  • Pavlos Petoumenos (1)
  • Zheng Wang (2)
  • Hugh Leather (1)

  1. School of Informatics, University of Edinburgh, Edinburgh, UK
  2. School of Computing and Communications, Lancaster University, Lancaster, UK
