KCAM: Concentrating on Structural Similarity for XML Fragments

  • Lingbo Kong
  • Shiwei Tang
  • Dongqing Yang
  • Tengjiao Wang
  • Jun Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


This paper proposes a new method, KCAM, to measure the structural similarity of XML fragments that satisfy given keywords. The name comes from the method's key structure, the Keyword Common Ancestor Matrix. The KCAM of an XML fragment matching k keywords is a k × k upper triangular matrix. Each element a_{i,j} stores the level of the SLCA (Smallest Lowest Common Ancestor) node corresponding to the keywords k_i and k_j. The matrix distance between two KCAMs, denoted KDist(·, ·), can be used as an approximate measure of structural similarity. KCAM is independent of the label information in fragments, and it can effectively distinguish structural differences between XML fragments.
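The construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the following are assumptions made only for illustration: each keyword is matched by exactly one node in the fragment, node positions are given as Dewey-style paths (a hypothetical encoding in which the root is `(1,)` and the length of a path is the node's level), the common ancestor level of two nodes is the length of their common path prefix, and KDist is taken to be the Euclidean norm of the difference between the upper triangles. The function names `lca_level`, `kcam`, and `kdist` are likewise hypothetical and are not taken from the paper.

```python
from math import sqrt


def lca_level(path_a, path_b):
    """Level of the lowest common ancestor of two nodes.

    Assumption: nodes are identified by Dewey-style paths, so the
    common ancestor's level is the length of the shared prefix.
    """
    level = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        level += 1
    return level


def kcam(positions):
    """Build a k x k upper triangular matrix for one fragment.

    positions[i] is the path of the single node assumed to match
    keyword k_i. Entries below the diagonal stay 0.
    """
    k = len(positions)
    matrix = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            matrix[i][j] = lca_level(positions[i], positions[j])
    return matrix


def kdist(m1, m2):
    """Assumed matrix distance: Euclidean norm over the upper triangle."""
    if len(m1) != len(m2):
        raise ValueError("both matrices must be built for the same k keywords")
    k = len(m1)
    total = 0
    for i in range(k):
        for j in range(i, k):
            total += (m1[i][j] - m2[i][j]) ** 2
    return sqrt(total)


# Two hypothetical fragments matching the same three keywords.
frag_a = [(1, 1, 1), (1, 1, 2), (1, 2)]
frag_b = [(1, 1, 1), (1, 2, 1), (1, 2, 2)]
```

Under these assumptions, `kdist(kcam(frag_a), kcam(frag_b))` is nonzero because the keyword nodes are grouped under different ancestors in the two fragments, while no element labels are consulted at any point, which mirrors the label independence claimed in the abstract.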


Keywords (machine-generated, not supplied by the authors): Edit Distance, Node Position, Very Large Data Base, Precision Ratio, Triangle Matrix





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lingbo Kong (1)
  • Shiwei Tang (1, 2)
  • Dongqing Yang (1)
  • Tengjiao Wang (1)
  • Jun Gao (1)
  1. Department of Computer Science and Technology, Peking University, Beijing, China
  2. National Laboratory on Machine Perception, Peking University, Beijing, China
