Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM

  • Marko Brunzel
  • Myra Spiliopoulou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3915)


The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need to be further improved to overcome the knowledge acquisition bottleneck. Identifying concepts and relations with only minimal user interaction is still a challenging objective. Current approaches to extracting semantics often apply association rules or clustering to regular flat text. In this paper we describe an approach to extracting semantics from Web document collections that takes advantage of the semi-structured content of XHTML Web documents (XHTML is an XML dialect that can be obtained from traditional HTML documents).

The XTREEM (Xhtml TREE Mining) method uses structural information, namely the mark-up in Web content, as an indicator of term boundaries and of co-hyponymy relations.
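
The idea of reading mark-up as a signal can be illustrated with a minimal sketch. It is not the XTREEM algorithm itself, which this abstract does not spell out. It assumes two simple heuristics: the text enclosed by one element is one candidate term (so multi-word terms stay intact), and same-tag siblings under one parent are co-hyponym candidates. The function name `sibling_term_groups` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sibling_term_groups(xhtml: str) -> list[list[str]]:
    """Return groups of text spans whose elements are same-tag siblings.

    Assumption (not from the paper): one element's text is one candidate
    term, and same-tag siblings are co-hyponym candidates.
    """
    root = ET.fromstring(xhtml)
    groups = []
    for parent in root.iter():
        by_tag = defaultdict(list)
        for child in parent:
            # The enclosing tags mark the term boundary: take the element's
            # whole text and normalise internal whitespace.
            text = " ".join("".join(child.itertext()).split())
            if text:
                by_tag[child.tag].append(text)
        # Keep only groups with at least two same-tag siblings.
        groups.extend(terms for terms in by_tag.values() if len(terms) > 1)
    return groups


# Hypothetical sample document, used only for illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Accommodation</h1>
<ul><li>Bed and Breakfast</li><li>Hotel</li><li>Youth Hostel</li></ul>
</body></html>"""

print(sibling_term_groups(SAMPLE))
```

On the sample, the three list items form a single group, and "Bed and Breakfast" is kept as one multi-word term because the `li` tags delimit it. A real collection would call for aggregating such groups over many documents before drawing any conclusion about semantic relations.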


Association Rule, Semantic Relation, Document Cluster, Text Element, Lexical Resource
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marko Brunzel (1)
  • Myra Spiliopoulou (1)
  1. Otto-von-Guericke-University Magdeburg, Germany
