Algorithmic Ramifications of Prefetching in Memory Hierarchy

  • Akshat Verma
  • Sandeep Sen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4297)


External memory models, the most notable being the I/O model [3], capture the effects of the memory hierarchy and aid in algorithm design. More than a decade of architectural advancement has introduced features not captured by the I/O model, most notably the ability to prefetch data. We propose a relatively simple Prefetch model that incorporates data prefetching into the traditional I/O models and show how to design algorithms that attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, memory bandwidth is much closer to the processing speed, so intelligent use of prefetching can considerably mitigate the I/O bottleneck. For some fundamental problems, our algorithms attain running times approaching those of the idealized Random Access Machine under reasonable assumptions. Our work also explains why I/O-efficient algorithms perform significantly better on systems that support prefetching than on systems that do not.
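
To make the latency-versus-bandwidth argument concrete, the sketch below is a minimal back-of-the-envelope cost comparison; it is not taken from the paper, and the function names and parameter values (block size, access latency, bandwidth) are illustrative assumptions. It contrasts a blocking transfer regime, in which every block pays the full access latency, with an idealized prefetch pipeline, in which the latency is paid once and subsequent blocks stream at the bandwidth rate.

```python
# Illustrative cost comparison (hypothetical parameters, not from the paper).

def blocking_time(n_blocks, block_bytes, latency_s, bandwidth_bps):
    """Every block transfer pays the full access latency plus its transfer time."""
    return n_blocks * (latency_s + block_bytes / bandwidth_bps)

def prefetch_time(n_blocks, block_bytes, latency_s, bandwidth_bps):
    """Latency is overlapped by prefetching: paid once, then blocks stream at bandwidth rate."""
    return latency_s + n_blocks * block_bytes / bandwidth_bps

if __name__ == "__main__":
    n, B = 10_000, 64 * 1024           # 10,000 blocks of 64 KiB each (assumed)
    latency, bandwidth = 5e-3, 100e6   # 5 ms access latency, 100 MB/s (assumed)
    print(f"blocking transfers : {blocking_time(n, B, latency, bandwidth):6.2f} s")
    print(f"pipelined prefetch : {prefetch_time(n, B, latency, bandwidth):6.2f} s")
```

With these assumed numbers the blocking regime takes roughly 57 s while the pipelined regime takes under 7 s; this is the sense in which a predictable access pattern lets an algorithm run at close to peak bandwidth rather than at the latency-bound rate.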


Keywords: Memory Bandwidth · Memory Hierarchy · Parallel Disk · Prediction Sequence · Fast Memory



References

  1. Aggarwal, A., Alpern, B., Chandra, A., Snir, M.: A model for hierarchical memory. In: Proceedings of ACM Symposium on Theory of Computing (1987)
  2. Aggarwal, A., Chandra, A., Snir, M.: Hierarchical memory with block transfer. In: Proceedings of IEEE Foundations of Computer Science, pp. 204–216 (1987)
  3. Aggarwal, A., Vitter, J.: The input/output complexity of sorting and related problems. Communications of the ACM 31(9), 1116–1127 (1988)
  4. Alpern, B., Carter, L., Feig, E., Selker, T.: The uniform memory hierarchy model of computation. Algorithmica 12(2), 72–109 (1994)
  5. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proceedings of STOC, pp. 307–315 (2003)
  6. Chaudhry, G., Cormen, T.H.: Getting more for out-of-core columnsort. In: Mount, D.M., Stein, C. (eds.) ALENEX 2002. LNCS, vol. 2409, p. 143. Springer, Heidelberg (2002)
  7. Chen, T., Baer, J.: Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers 44(5), 609–623 (1995)
  8. Cormen, T.H., Sundquist, T., Wisniewski, L.F.: Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM Journal on Computing 28(1), 105–136 (1999)
  9. Dementiev, R., Sanders, P.: Asynchronous parallel disk sorting. In: Proceedings of SPAA (2003)
  10. Adiga, N.R., et al.: An overview of the BlueGene/L supercomputer. In: Proceedings of Supercomputing (SC) (2002)
  11. Floyd, R.: Permuting information in idealized two-level storage. Complexity of Computer Computations, 105–109 (1972)
  12. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proceedings of FOCS (1999)
  13. Worthington, B., Ganger, G., Patt, Y.: The DiskSim simulation environment (version 2.0). Available at:
  14. Hong, J.-W., Kung, H.T.: I/O complexity: The red-blue pebble game. In: Proceedings of the 13th Symposium on the Theory of Computing (May 1981)
  15. Iyer, S., Druschel, P.: Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O. In: Proceedings of SOSP (2001)
  16. Kallahalla, M., Varman, P.J.: Optimal read-once parallel disk scheduling. In: Proceedings of IOPADS, pp. 68–77 (1999)
  17. Lund, K., Goebel, V.: Adaptive disk scheduling in a multimedia DBMS. In: Proceedings of ACM Multimedia (2003)
  18. Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 434–445. Springer, Heidelberg (2003)
  19. Nesbit, K.J., Smith, J.E.: Data cache prefetching using a global history buffer. In: Proceedings of HPCA, pp. 96–105 (2004)
  20. Sen, S., Chatterjee, S., Dumir, N.: Towards a theory of cache-efficient algorithms. Journal of the ACM (2002)
  21. Verma, A., Sen, S.: Model and algorithms for prefetching in memory hierarchy. Working Draft (2005). Available at:
  22. Vishkin, U.: Can parallel algorithms enhance serial implementation? Communications of the ACM (1996)
  23. Vitter, J., Shriver, E.: Algorithms for parallel memory I: Two-level memories. Algorithmica 12(2), 110–147 (1994)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Akshat Verma (1)
  • Sandeep Sen (2)

  1. IBM India Research Lab
  2. Dept of Computer Science and Engineering, IIT Delhi
