The VLDB Journal, Volume 22, Issue 2, pp 177–202

A survey on XML streaming evaluation techniques

Regular Paper

Abstract

XML is currently the most popular format for exchanging and representing data on the web. It is used in various applications and for different types of data, including structured, semistructured, and unstructured heterogeneous data. During the period in which XML was establishing itself, data streaming applications gained increasing attention and importance. As a result, the querying and efficient processing of XML streams has become a central issue. In this study, we survey the state of the art in XML streaming evaluation techniques, covering the streaming evaluation of both XPath expressions and XQuery queries. We classify the XPath streaming evaluation approaches into three categories according to the main data structure used for the evaluation: the automaton-based approach, the array-based approach, and the stack-based approach. We review, analyze, and compare the major techniques proposed for each approach. We also review techniques for the streaming evaluation of multiple queries. For the XQuery streaming evaluation problem, we identify and discuss four processing paradigms adopted by the existing XQuery stream query engines: the transducer-based paradigm, the algebra-based paradigm, the automata-algebra paradigm, and the pull-based paradigm. In addition, we review optimization techniques for XQuery streaming evaluation, treating the optimization of XQuery streaming evaluation as a buffer optimization problem. For all techniques discussed, we describe the research issues and the proposed algorithms, and we compare them with other relevant techniques.
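To make the automaton-based and stack-based categories concrete, the following is a minimal sketch in the general spirit of NFA-based XPath filters. It is not the algorithm of any particular system reviewed in the survey. A linear XPath expression restricted to the child (`/`) and descendant (`//`) axes, with an optional `*` wildcard, is compiled into a list of steps. A runtime stack then holds one set of active NFA states per open element: the set is pushed on each SAX start event and popped on each end event. The names `parse_path`, `PathMatcher`, `evaluate`, and `SAMPLE` are hypothetical, and predicates, other axes, attributes, and namespaces are deliberately not handled.

```python
import xml.sax
from typing import FrozenSet, List, Tuple

# Hypothetical sketch: NFA-style evaluation of a linear XPath expression
# (child '/' and descendant '//' axes only, optional '*' wildcard) over
# SAX events, with a runtime stack holding one set of active states per
# open element. Not the algorithm of any specific system in the survey.

Step = Tuple[str, str]  # (axis, name test); axis is "child" or "desc"


def parse_path(path: str) -> List[Step]:
    """Split a path such as '/a//b/c' into (axis, name) steps."""
    steps: List[Step] = []
    i = 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "desc", i + 2
        elif path[i] == "/":
            axis, i = "child", i + 1
        else:
            raise ValueError(f"unexpected character at offset {i} in {path!r}")
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        if j == i:
            raise ValueError(f"empty step at offset {i} in {path!r}")
        steps.append((axis, path[i:j]))
        i = j
    return steps


class PathMatcher(xml.sax.ContentHandler):
    """Records every element whose root-to-node path matches the query."""

    def __init__(self, path: str) -> None:
        super().__init__()
        self.steps = parse_path(path)
        # State i: the first i steps are matched on the path to the current
        # element. The bottom entry stands for the document node.
        self.stack: List[FrozenSet[int]] = [frozenset({0})]
        self.position = 0  # document-order index of the latest element
        self.matches: List[Tuple[int, str]] = []

    def startElement(self, name, attrs):
        self.position += 1
        final = len(self.steps)
        active = set()
        for state in self.stack[-1]:
            if state == final:
                continue  # the accepting state has no outgoing transitions
            axis, test = self.steps[state]
            if axis == "desc":
                active.add(state)  # '//' self-loop: skip any number of levels
            if test in ("*", name):
                active.add(state + 1)
        self.stack.append(frozenset(active))
        if final in active:
            self.matches.append((self.position, name))

    def endElement(self, name):
        self.stack.pop()  # leaving an element restores its parent's states


def evaluate(path: str, document: str) -> List[Tuple[int, str]]:
    """Stream the document through the matcher; return (position, tag) pairs."""
    handler = PathMatcher(path)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.matches


SAMPLE = "<a><b><c/></b><x><b><c/><d><c/></d></b></x></a>"

if __name__ == "__main__":
    print(evaluate("/a//b/c", SAMPLE))  # [(3, 'c'), (6, 'c')]
```

In this sketch the stack never holds more than one entry per open element, and each entry holds at most one state per query step plus one. Memory therefore grows with document depth rather than document size. Because there are no predicates, a match can be reported as soon as its start tag is seen, with no buffering. Supporting predicates is what forces real systems to buffer candidate results.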

Keywords

XML streaming evaluation, XPath, XQuery, XML query optimization

Supplementary material

ESM 1: 778_2012_281_MOESM1_ESM.pdf (PDF, 18 kb)

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. State Key Lab. of Software Engineering, Wuhan University, Wuhan, China
  2. New Jersey Institute of Technology, Newark, USA