A Quantitative Summary of XML Structures

  • Zi Lin
  • Bingsheng He
  • Byron Choi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4215)

Abstract

Statistical summaries in relational databases mainly focus on the distribution of data values and have been found useful for various applications, such as query evaluation and data storage. As XML has been widely used, e.g., for online data exchange, the need for corresponding statistical summaries of XML has become evident. While relational techniques may be applicable to the data values in XML documents, novel techniques are required for summarizing the structures of XML documents. In this paper, we propose metrics for major structural properties of XML documents, in particular, nestings of entities and one-to-many relationships. Our technique differs from existing ones in that we generate a quantitative summary of an XML structure. Using our approach, we illustrate that some popular real-world and synthetic XML benchmark datasets are indeed highly skewed, hardly hierarchical, and contain few recursions. We hope that this preliminary finding sheds light on improving the design of XML benchmarks and experiments.
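To make the idea of a quantitative structural summary concrete, the following is a minimal sketch that computes a few simple statistics related to the properties named above: nesting depth, recursion, and one-to-many fan-out. The function name `structural_summary`, the statistics chosen, and the sample document are assumptions made for illustration only; they are not the metrics defined in the paper, which the abstract does not spell out.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structural_summary(xml_text):
    """Illustrative structural statistics (hypothetical; NOT the paper's metrics)."""
    root = ET.fromstring(xml_text)
    total = 0               # number of elements visited
    max_depth = 0           # deepest nesting level, with the root at depth 1
    recursive = 0           # elements that have an ancestor with the same tag
    max_fanout = Counter()  # (parent tag, child tag) -> most same-tag children under one parent

    # Iterative depth-first traversal; each entry carries the tag path from the root.
    stack = [(root, 1, (root.tag,))]
    while stack:
        node, depth, path = stack.pop()
        total += 1
        max_depth = max(max_depth, depth)
        if node.tag in path[:-1]:
            recursive += 1
        # Count children per tag to expose one-to-many relationships.
        for tag, n in Counter(child.tag for child in node).items():
            key = (node.tag, tag)
            max_fanout[key] = max(max_fanout[key], n)
        for child in node:
            stack.append((child, depth + 1, path + (child.tag,)))

    return {
        "elements": total,
        "max_depth": max_depth,
        "recursive_elements": recursive,
        "max_fanout": dict(max_fanout),
    }


# Hypothetical sample document, used only to exercise the sketch.
SAMPLE = (
    "<lib>"
    "<book><title/><author/><author/></book>"
    "<book><title/><section><section/></section></book>"
    "</lib>"
)
summary = structural_summary(SAMPLE)
print(summary)
```

A fan-out greater than one for a (parent, child) pair indicates a one-to-many relationship in the instance, and a nonzero count of recursive elements indicates that an element type is nested within itself.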

Keywords

Support Ratio; Query Evaluation; Selectivity Estimation; Document Instance; Query Workload
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zi Lin (1)
  • Bingsheng He (2)
  • Byron Choi (1)
  1. Nanyang Technological University
  2. Hong Kong University of Science and Technology
