A Low-Storage-Consumption XML Labeling Method for Efficient Structural Information Extraction

  • Wenxin Liang
  • Akihiro Takahashi
  • Haruo Yokota
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5690)

Abstract

Labeling methods that extract and reconstruct the structural information of XML data have recently attracted growing attention, because they are important for many applications such as XPath queries and keyword search. To achieve efficient structural information extraction, in this paper we propose the C-DO-VLEI code, a novel update-friendly bit-vector encoding scheme. The scheme is based on register-length bit operations combined with the properties of Dewey Order numbers, an approach that cannot be implemented in other relevant existing schemes such as ORDPATH. The proposed method also achieves lower storage consumption, because it requires neither a prefix schema nor any reserved codes for node insertion. We performed experiments to evaluate the performance and storage consumption of the proposed method and to compare them with those of the ORDPATH method. Experimental results show that, compared with ORDPATH, the execution times for extracting depth information and parent node labels using the C-DO-VLEI code are about 25% and 15% less, respectively, and the average label size is about 24% smaller.
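The two structural operations measured in the experiments, extracting a node's depth and its parent's label, can be illustrated on plain Dewey Order labels. The sketch below is only an illustration under stated assumptions: it represents labels as dotted strings such as "1.2.3" and does not reproduce the C-DO-VLEI bit-vector encoding or its register-length bit operations, which the abstract does not specify. The function names `depth`, `parent`, and `is_ancestor` are hypothetical.

```python
from typing import Optional


def depth(label: str) -> int:
    """Depth of a node: the number of components in its Dewey Order label.

    The root label "1" is treated as depth 1 (an assumption of this sketch).
    """
    return len(label.split("."))


def parent(label: str) -> Optional[str]:
    """Parent label: the label with its last component removed.

    Returns None for the root, which has no parent.
    """
    head, sep, _ = label.rpartition(".")
    return head if sep else None


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is a proper ancestor of `descendant`.

    The trailing "." keeps "1.2" from matching "1.20.3".
    """
    return descendant.startswith(ancestor + ".")
```

With these definitions, `depth("1.2.3")` is 3 and `parent("1.2.3")` is `"1.2"`. Both results come from the label alone, without consulting the document tree, which is the property that makes Dewey-style labels useful for structural queries.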

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Wenxin Liang (1)
  • Akihiro Takahashi (2)
  • Haruo Yokota (2, 3)

  1. School of Software, Dalian University of Technology, China
  2. Department of Computer Science, Tokyo Institute of Technology, Japan
  3. Global Scientific Information and Computing Center, Tokyo Institute of Technology, Japan
