Tight Lower Bounds for Query Processing on Streaming and External Memory Data

  • Martin Grohe
  • Christoph Koch
  • Nicole Schweikardt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3580)

Abstract

We study a clean machine model for external memory and stream processing. We show that the number of scans of the external data induces a strict hierarchy (as long as work space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting are feasible if the product of the number r(n) of scans of the external memory and the size s(n) of the internal memory buffers is sufficiently small, e.g., of size \(o(\sqrt[5]{n})\). We also establish tight bounds for the complexity of XPath evaluation and filtering.
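To make the two resources in the abstract concrete, the following is a minimal sketch of a naive multi-pass sort that counts scans r of a read-only input while holding at most s items in memory. It is an illustration only and not the machine model of the paper, which this page does not specify in detail; the names `multipass_sort`, `scan`, and `s` are hypothetical. The sketch only shows that this simple method uses roughly n/s scans, so its product r·s is about n. It does not demonstrate the lower bound itself.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def multipass_sort(scan: Callable[[], Iterable[int]], s: int) -> Tuple[List[int], int]:
    """Sort a read-only input using a buffer of at most s items.

    `scan` is a hypothetical callback: each call starts one fresh
    sequential pass over the same input. Returns (sorted items, scans).
    """
    if s < 1:
        raise ValueError("buffer size s must be at least 1")
    output: List[int] = []
    scans = 0
    last = None  # (value, position) of the largest item emitted so far
    while True:
        scans += 1
        # Max-heap (via negation) of the s smallest not-yet-emitted items.
        buf: List[Tuple[int, int]] = []
        for pos, value in enumerate(scan()):
            key = (value, pos)  # position breaks ties between duplicates
            if last is not None and key <= last:
                continue  # already emitted in an earlier scan
            if len(buf) < s:
                heapq.heappush(buf, (-value, -pos))
            elif key < (-buf[0][0], -buf[0][1]):
                # Smaller than the largest buffered item: swap it in.
                heapq.heapreplace(buf, (-value, -pos))
        batch = sorted((-nv, -npos) for nv, npos in buf)
        output.extend(value for value, _ in batch)
        if len(batch) < s:
            break  # buffer not full, so the input is exhausted
        last = batch[-1]
    return output, scans


data = [5, 3, 9, 1, 5, 7, 2, 8, 6, 4]
result, r = multipass_sort(lambda: iter(data), s=3)
print(result, r)
```

With a buffer at least as large as the input, the same function finishes in a single scan; shrinking the buffer increases the number of scans proportionally.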

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Martin Grohe (1)
  • Christoph Koch (2)
  • Nicole Schweikardt (1)
  1. Institut für Informatik, Humboldt-Universität Berlin, Germany
  2. Database Group, Universität des Saarlandes, Saarbrücken, Germany
