Discovering Semantic Sibling Associations from Web Documents with XTREEM-SP

Brunzel, Marko; Spiliopoulou, Myra

doi:10.1007/11823728_45


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4081)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery


Abstract

The semi-automatic extraction of semantics for ontology enhancement or semantic-based information retrieval encompasses several open challenges. There are many findings on the identification of vertical relations among concepts, but far fewer on indirect, horizontal relations among concepts that share a common, a priori unknown parent, such as Co-Hyponyms and Co-Meronyms. We propose the method XTREEM-SP (Xhtml TREE Mining for Sibling Pairs) for the discovery of such binary "sibling" relations between concepts of a given vocabulary. While conventional methods process an appropriately prepared corpus, XTREEM-SP operates upon an arbitrarily heterogeneous Web Document Collection on a given topic and returns sibling relations between the concepts associated with that topic. XTREEM-SP is independent of domain and language and relies neither on linguistic preprocessing nor on background knowledge beyond the ontology it is asked to enhance. We present our evaluation results on two gold-standard ontologies and show that XTREEM-SP performs well while being computationally inexpensive.
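
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of mining sibling pairs from markup structure, not the authors' implementation. It assumes that text spans sharing the same tag path within one document are treated as sibling candidates, and that vocabulary terms co-occurring in such a group are counted as candidate pairs. The names `TagPathCollector` and `sibling_pair_counts`, the exact-match lookup against the vocabulary, and the use of raw co-occurrence counts as the association score are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from itertools import combinations


class TagPathCollector(HTMLParser):
    """Groups the text spans of one document by their tag path (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open tags, outermost first
        self.groups = defaultdict(list)   # tag path -> text spans found under it

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip().lower()
        if text:
            self.groups["/".join(self.stack)].append(text)


def sibling_pair_counts(documents, vocabulary):
    """Counts how often two vocabulary terms share a tag path in a document.

    Assumption: a raw co-occurrence count stands in for whatever association
    measure the paper actually uses.
    """
    vocab = {term.lower() for term in vocabulary}
    counts = Counter()
    for doc in documents:
        collector = TagPathCollector()
        collector.feed(doc)
        collector.close()
        for spans in collector.groups.values():
            # Keep only spans that exactly match a vocabulary term.
            terms = sorted(vocab.intersection(spans))
            counts.update(combinations(terms, 2))
    return counts


# Toy documents, invented for illustration only.
docs = [
    "<html><body><h1>Hotels</h1><ul><li>Sauna</li><li>Pool</li>"
    "<li>Bar</li></ul></body></html>",
    "<html><body><p>Rooms</p><table><tr><td>Pool</td><td>Sauna</td>"
    "</tr></table></body></html>",
]
counts = sibling_pair_counts(docs, ["sauna", "pool", "bar", "hotels"])
for pair, n in counts.most_common():
    print(pair, n)
```

In this toy input, "pool" and "sauna" share a tag path in both documents and so receive the highest count, while "hotels" appears alone under its heading and forms no pair. The sketch needs no linguistic preprocessing, which matches the spirit of the abstract, but a real collection would call for handling of unclosed tags and a proper statistical association measure.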



Author information

Authors and Affiliations

Otto-von-Guericke-University Magdeburg,
Marko Brunzel & Myra Spiliopoulou


Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040, Wien, Austria
A Min Tjoa
Department of Software and Computing Systems, University of Alicante, Spain
Juan Trujillo


Cite this paper

Brunzel, M., Spiliopoulou, M. (2006). Discovering Semantic Sibling Associations from Web Documents with XTREEM-SP. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_45


DOI: https://doi.org/10.1007/11823728_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer Science, Computer Science (R0)
