Optimizing Integrated Application Performance with Cache-Aware Metascheduling

  • Brian Dougherty
  • Jules White
  • Russell Kegley
  • Jonathan Preston
  • Douglas C. Schmidt
  • Aniruddha Gokhale
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7045)


Integrated applications running in multi-tenant environments are often subject to quality-of-service (QoS) requirements, such as resource and performance constraints. It is hard to allocate resources among multiple users accessing these types of applications while meeting all QoS constraints, such as ensuring that users complete execution before their deadlines. Although a processor cache can reduce the time required for a user's tasks to execute, multiple task execution schedules may exist that meet deadlines but differ in how efficiently they use the cache. It is hard to determine, without jeopardizing deadlines, which task execution schedules will use the processor cache most efficiently and provide the greatest reductions in execution time.

This paper provides three key contributions to increasing the execution efficiency of integrated applications in multi-tenant environments while meeting QoS constraints. First, we present cache-aware metascheduling, a novel approach that modifies system execution schedules to increase cache-hit rate and reduce system execution time. Second, we apply cache-aware metascheduling to 11 simulated software systems, creating two different execution schedules per system. Third, we empirically evaluate the impact of using cache-aware metascheduling to alter task schedules to reduce system execution time. Our results show that cache-aware metascheduling increases cache performance, reduces execution time, and satisfies scheduling constraints and safety requirements without requiring significant hardware or software changes.
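The abstract does not spell out the metascheduling algorithm, so the following is only a toy illustration of the underlying idea: two orderings of the same task executions can differ in cache-hit rate. It assumes a small fully associative LRU cache and made-up task memory footprints; the names `count_hits`, `FOOTPRINTS`, `interleaved`, and `grouped` are hypothetical and do not come from the paper. Deadline checking, which a real metascheduler would have to perform before accepting a reordered schedule, is omitted.

```python
from collections import OrderedDict


def count_hits(schedule, footprints, cache_lines):
    """Replay a task order against a tiny LRU cache.

    Returns (hits, accesses). This is an assumed cache model,
    not the simulator used in the paper.
    """
    cache = OrderedDict()  # keys are memory blocks, ordered oldest to newest
    hits = 0
    accesses = 0
    for task in schedule:
        for block in footprints[task]:
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)  # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits, accesses


# Hypothetical footprints: A1/A2 share data, and B1/B2 share data.
FOOTPRINTS = {
    "A1": ["a0", "a1", "a2"],
    "A2": ["a0", "a1", "a2"],
    "B1": ["b0", "b1", "b2"],
    "B2": ["b0", "b1", "b2"],
}

# Two orderings of the same four task executions.
interleaved = ["A1", "B1", "A2", "B2"]
grouped = ["A1", "A2", "B1", "B2"]

print(count_hits(interleaved, FOOTPRINTS, cache_lines=4))  # (0, 12)
print(count_hits(grouped, FOOTPRINTS, cache_lines=4))      # (6, 12)
```

In this sketch the grouped order runs tasks that share data back to back, so their blocks are still cached when they are reused; the interleaved order evicts them first. Whether a given reordering is admissible depends on the scheduling constraints the abstract mentions, which this example does not model.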


Execution Time, Integrate Application, Loop Fusion, Priority Inversion, Cache Replacement Policy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Brian Dougherty (1)
  • Jules White (1)
  • Russell Kegley (2)
  • Jonathan Preston (2)
  • Douglas C. Schmidt (3)
  • Aniruddha Gokhale (3)

  1. Virginia Tech, USA
  2. Lockheed Martin Aeronautics, USA
  3. Vanderbilt University, USA
