The VLDB Journal, Volume 17, Issue 5, pp 1179–1212

Temporal XML: modeling, indexing, and query processing

  • Flavio Rizzolo
  • Alejandro A. Vaisman
Regular Paper

Abstract

In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by the data model and present algorithms for validating a temporal XML document against these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal XML query language that extends XPath 2.0. In the second part of the paper, we present our approach for summarizing and indexing temporal XML documents. In particular, we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we can dramatically increase query performance. To achieve this, we introduce a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data structures. We give a query processing strategy based on TempIndex and a type of ancestor-descendant encoding, denoted temporal interval encoding. We present a persistent implementation of TempIndex and compare it against a system based on a non-temporal path index and one based on DOM. Finally, we sketch a language for updates and show that the cost of updating the index is compatible with real-world requirements.
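
To make two notions from the abstract concrete, the following minimal Python sketch illustrates recovering the state of a document as of a given time and testing whether a path is continuous. It is not the paper's data model or the TempIndex implementation. It assumes, purely for illustration, that each containment edge carries a single closed validity interval, and that a path is continuous when the intersection of its edge intervals is non-empty. The names `Edge`, `snapshot`, `continuous_interval`, `NOW`, and the element labels are all hypothetical.

```python
from dataclasses import dataclass

# Stand-in for an open-ended "now" endpoint (an assumption of this sketch).
NOW = float("inf")


@dataclass(frozen=True)
class Edge:
    """A containment edge labeled with one closed validity interval."""
    parent: str
    child: str
    start: float  # first instant at which the edge is valid
    end: float    # last instant at which the edge is valid; NOW if still valid


def snapshot(edges, t):
    """Return the (parent, child) pairs that are valid at instant t."""
    return {(e.parent, e.child) for e in edges if e.start <= t <= e.end}


def continuous_interval(path_edges):
    """Intersect the validity intervals of the edges along a path.

    The edges are assumed to be given in root-to-leaf order. Returns
    (start, end) when the whole path is valid continuously over some
    interval, or None when the intervals do not overlap.
    """
    if not path_edges:
        return None
    start = max(e.start for e in path_edges)
    end = min(e.end for e in path_edges)
    return (start, end) if start <= end else None


# Hypothetical document history: a player belongs to a team from
# instant 3 to 7, and a coach joins the team at instant 8.
edges = [
    Edge("league", "team", 0, NOW),
    Edge("team", "player", 3, 7),
    Edge("team", "coach", 8, NOW),
]
```

Under these assumptions, `snapshot(edges, 5)` contains the `team`/`player` edge but not the `team`/`coach` edge, and `continuous_interval(edges[:2])` yields the interval over which the path league/team/player holds without interruption.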

Keywords

XML, Temporal databases, Semistructured data, Structural summaries, XPath

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Department of Computer Science, University of Toronto, Bahen Center for Information Technology, Toronto, Canada
  2. Universidad de Chile and Universidad de Buenos Aires, Ciudad Universitaria, Buenos Aires, Argentina
