MiniTasking: Improving Cache Performance for Multiple Query Workloads

  • Yan Zhang
  • Zhifeng Chen
  • Yuanyuan Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


This paper proposes a novel idea, called MiniTasking, to reduce the number of cache misses by improving data temporal locality for multiple concurrent queries. Our idea is based on the observation that in many workloads, such as decision support systems (DSS), there is usually a significant amount of data sharing among different concurrent queries. MiniTasking exploits this data sharing to improve data temporal locality by scheduling query execution at three levels: (1) it batches queries based on their data sharing characteristics and the cache configuration; (2) it groups operators that share certain data; and (3) it schedules mini-tasks, which are small pieces of computation in operator groups, according to their data locality without violating their execution dependencies.
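The third scheduling level can be illustrated with a small sketch. The code below is a greedy heuristic assumed for illustration only, not the paper's actual scheduling policy. Each hypothetical `MiniTask` reads one data chunk and may depend on other mini-tasks. Among the tasks whose dependencies are satisfied, the scheduler prefers one that reads the same chunk as the previously scheduled task, a crude stand-in for "that data is probably still in cache". The names `MiniTask`, `schedule_minitasks`, `scan_query`, and `chunk_switches` are hypothetical, and counting chunk switches is only a rough proxy for cache misses.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MiniTask:
    """Hypothetical mini-task: a small piece of one operator's work."""
    name: str
    chunk: int                                      # id of the data chunk it reads (assumed)
    deps: list[str] = field(default_factory=list)   # mini-tasks that must run first


def schedule_minitasks(tasks: list[MiniTask]) -> list[str]:
    """Greedy locality-aware list scheduling (illustrative heuristic only).

    Among tasks whose dependencies are done, pick one that reads the same
    chunk as the previously scheduled task; otherwise take the oldest ready task.
    """
    by_name = {t.name: t for t in tasks}
    pending = {t.name: len(t.deps) for t in tasks}   # unfinished deps per task
    dependents = defaultdict(list)
    for t in tasks:
        for d in t.deps:
            dependents[d].append(t.name)

    ready = [t.name for t in tasks if not t.deps]
    order: list[str] = []
    last_chunk = None
    while ready:
        # Prefer a ready task that reuses the chunk just touched.
        pick = next((n for n in ready if by_name[n].chunk == last_chunk), ready[0])
        ready.remove(pick)
        order.append(pick)
        last_chunk = by_name[pick].chunk
        # Release tasks whose last outstanding dependency was `pick`.
        for nxt in dependents[pick]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("unsatisfiable dependencies (cycle or unknown task)")
    return order


def scan_query(query: str, n_chunks: int) -> list[MiniTask]:
    """A query that scans chunks 0..n-1 in order; each step depends on the previous one."""
    return [
        MiniTask(f"{query}.c{i}", i, [f"{query}.c{i - 1}"] if i else [])
        for i in range(n_chunks)
    ]


def chunk_switches(order: list[str], tasks: list[MiniTask]) -> int:
    """Rough locality proxy: how often consecutive mini-tasks read different chunks."""
    chunk = {t.name: t.chunk for t in tasks}
    return sum(1 for a, b in zip(order, order[1:]) if chunk[a] != chunk[b])


if __name__ == "__main__":
    tasks = scan_query("Q1", 3) + scan_query("Q2", 3)
    naive = [t.name for t in tasks]  # one query at a time
    order = schedule_minitasks(tasks)
    print(naive, chunk_switches(naive, tasks))
    print(order, chunk_switches(order, tasks))
```

For two queries that each scan the same three chunks in order, the query-at-a-time order switches chunks five times. The interleaved order returned by `schedule_minitasks` switches only twice, and the dependency chain inside each query is still respected. A practical scheduler would presumably also have to limit how many queries are interleaved so that their combined data fits in the cache, which appears to be the role of the cache-aware batching in level (1).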

Our experimental results show that MiniTasking can reduce the execution time of joins by up to 12%. For the TPC-H throughput test workload, MiniTasking improves end performance by up to 20%. Even with the Partition Attributes Across (PAX) layout, MiniTasking further reduces cache misses by 65% and execution time by 9%.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yan Zhang (1)
  • Zhifeng Chen (2)
  • Yuanyuan Zhou (3)

  1. National Laboratory on Machine Perception, Peking University, Beijing, China
  2. Google, USA
  3. Department of Computer Science, University of Illinois at Urbana-Champaign, USA