Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining

Cluster Computing

Abstract

Computational scientists and engineers who want the best performance from scientific applications need efficient application characterization methods to exploit high-performance hardware resources. Modern processors, however, now come with high-bandwidth on-chip memory or a large number of cores. Research on application characterization that accounts for these newly introduced hardware features in next-generation high-performance computing environments remains insufficient, and the task itself is complex. In this paper, we propose a simple and fast method that uses hardware performance counters to classify application characteristics on systems with state-of-the-art processors. The proposed method uses the counters to monitor hardware events related to system performance, and adopts a clustering approach that requires only limited understanding of the correlation between hardware events and application characteristics. We apply the characterization technique to the NAS Parallel Benchmarks on two systems, based on Intel Knights Landing and Skylake Xeon processors, respectively. We demonstrate that the proposed technique captures system and application characteristics and gives users useful insights into application execution.
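As a rough illustration of the idea in the abstract (grouping applications by counter-derived metrics without a hand-built model of each event), the following minimal sketch clusters a few feature vectors. It is not the authors' implementation: plain k-means stands in for whichever clustering algorithm the paper uses, and the application names, the three metrics, and all numeric values are hypothetical.

```python
import math

# Hypothetical per-application features derived from hardware counters:
# (instructions per cycle, LLC misses per kilo-instruction, memory bandwidth in GB/s).
# Every value below is invented for illustration only.
SAMPLES = {
    "app_a": (1.9, 0.8, 4.0),
    "app_b": (2.1, 1.1, 5.5),
    "app_c": (0.6, 22.0, 68.0),
    "app_d": (0.5, 25.5, 74.0),
    "app_e": (1.8, 1.5, 6.2),
    "app_f": (0.7, 19.0, 61.0),
}


def normalize(rows):
    """Min-max scale each feature to [0, 1] so no single counter dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per input point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def classify(samples, k):
    """Map each application name to a cluster label."""
    names = list(samples)
    labels = kmeans(normalize([samples[n] for n in names]), k)
    return dict(zip(names, labels))


if __name__ == "__main__":
    for name, label in classify(SAMPLES, k=2).items():
        print(f"{name}: cluster {label}")
```

With these invented values, the compute-bound-looking and memory-bound-looking applications fall into separate clusters; real counter data would need more events and a careful choice of the number of clusters.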

Acknowledgements

This work was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. R0190-18-2012) and a Korea Institute of Science and Technology Information (KISTI) grant (No. K-19-L02-C06-S01).

Corresponding author

Correspondence to Dukyun Nam.

Cite this article

Choi, J., Park, G. & Nam, D. Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining. Cluster Comput 23, 57–69 (2020). https://doi.org/10.1007/s10586-019-02949-7
