Word Sense Disambiguation for XML Structure Feature Generation

  • Andrea Tagarelli
  • Mario Longo
  • Sergio Greco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5554)


A common limitation of most existing methods for managing XML structure information is that they do not handle the semantic meanings that may be associated with markup tags. In this paper, we study how to map the structure information available from XML elements onto semantically related concepts in order to support the generation of XML semantic features of structural type. To this end, we define an unsupervised word sense disambiguation method that selects the most appropriate meaning for each element in the context of its XML path. The proposed approach exploits the conceptual relations provided by a lexical ontology such as WordNet and employs different notions of sense relatedness. Experiments with data from various application domains are discussed, showing that our approach can be used effectively to generate structural semantic features.
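The abstract does not give the algorithm itself. The sketch below only illustrates the general idea of disambiguating a tag in the context of its XML path. For each tag on the path, it picks the sense whose gloss overlaps most with the glosses of the other tags' senses. This is a simplified Lesk-style gloss-overlap relatedness combined with a "maximize total relatedness" selection rule. It is not the authors' method. The sense inventory `SENSES`, its WordNet-style sense labels and made-up glosses, the stopword list, and the function names are all hypothetical stand-ins for real WordNet lookups.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy sense inventory standing in for WordNet: tag -> [(sense_id, gloss)].
# Sense ids mimic WordNet naming; the glosses are invented for illustration.
SENSES: Dict[str, List[Tuple[str, str]]] = {
    "book": [
        ("book.n.01", "a written work or composition that has been published"),
        ("book.n.02", "a number of pages bound together such as a record of bets"),
    ],
    "title": [
        ("title.n.01", "the name of a written work or composition"),
        ("title.n.02", "a legal right to the ownership of property"),
    ],
    "author": [
        ("author.n.01", "a person who writes a written work"),
    ],
}

# Assumed minimal stopword list; a real system would use a proper one.
STOPWORDS = {"a", "an", "the", "of", "or", "to", "that", "has", "been",
             "as", "such", "who", "is"}


def content_words(gloss: str) -> Set[str]:
    """Lowercased gloss tokens with stopwords removed."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def gloss_overlap(g1: str, g2: str) -> int:
    """Simplified Lesk relatedness: number of shared content words."""
    return len(content_words(g1) & content_words(g2))


def disambiguate_path(path: str) -> Dict[str, str]:
    """Choose one sense per known tag in an XML path such as '/book/title'.

    Each candidate sense is scored by summing, over every other tag on the
    path, its best gloss overlap with any sense of that tag; the highest
    scoring sense wins (ties go to the first listed sense).
    """
    tags = [t for t in path.strip("/").split("/") if t in SENSES]
    chosen: Dict[str, str] = {}
    for tag in tags:
        context = [t for t in tags if t != tag]

        def score(sense: Tuple[str, str]) -> int:
            _, gloss = sense
            return sum(max(gloss_overlap(gloss, g) for _, g in SENSES[c])
                       for c in context)

        chosen[tag] = max(SENSES[tag], key=score)[0]
    return chosen


if __name__ == "__main__":
    print(disambiguate_path("/book/title/author"))
```

On the path `/book/title/author`, the shared words "written", "work" and "composition" pull `book` and `title` toward their publication-related senses. Tags missing from the inventory (e.g. `library`) are simply skipped.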





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Andrea Tagarelli (1)
  • Mario Longo (1)
  • Sergio Greco (1)
  1. Dept. of Electronics, Computer and Systems Sciences, University of Calabria, Italy
