Automated Construction of Domain Ontology Taxonomies from Wikipedia

Jurić, Damir; Banek, Marko; Skočir, Zoran

doi:10.1007/978-3-642-23091-2_37

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6861)

Abstract

The key step in turning the idea of the Semantic Web into a feasible system is providing a variety of domain ontologies that are constructed on demand, in an automated manner and in a very short time. In this paper we introduce an unsupervised method for constructing domain ontology taxonomies from Wikipedia. The benefit of using Wikipedia as the source is twofold: first, Wikipedia articles are concise and have a particularly high “density” of domain knowledge; second, the articles represent a consensus of a large community, thus avoiding term disagreements and misinterpretations. The taxonomy construction algorithm, aimed at finding the subsumption relation, is based on two different techniques, both of which apply linguistic parsing: analyzing the first sentence of each Wikipedia article and processing the categories associated with the article. The method has been evaluated against human judgment for two independent domains, and the experimental results indicate that it is robust and achieves high precision.
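
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states that both techniques rely on linguistic parsing, whereas this sketch substitutes simple regular expressions and a naive plural-stripping rule as a stand-in. All function names, the stop-word list, and the example inputs are hypothetical and chosen only for illustration.

```python
import re

# ASSUMPTION: a copular pattern ("X is a Y") approximates the first-sentence
# analysis; the paper itself uses linguistic parsing, not regular expressions.
_COPULA = re.compile(
    r"\b(?:is|are|was|were)\s+(?:an|a|the)\s+(?P<np>[^.,;()]+)",
    re.IGNORECASE,
)

# ASSUMPTION: hypothetical list of words that end the head noun phrase.
_STOP = re.compile(
    r"\b(?:that|which|who|of|in|for|used|designed|from|with|on|by)\b",
    re.IGNORECASE,
)


def hypernym_from_first_sentence(sentence):
    """Return the last word of the noun phrase after the copula, or None."""
    match = _COPULA.search(sentence)
    if match is None:
        return None
    noun_phrase = _STOP.split(match.group("np"), maxsplit=1)[0].strip()
    words = noun_phrase.split()
    return words[-1].lower() if words else None


def head_from_category(category):
    """Return a naively singularized head word of a category label, or None."""
    noun_phrase = _STOP.split(category, maxsplit=1)[0].strip()
    words = noun_phrase.split()
    if not words:
        return None
    head = words[-1].lower()
    # Naive singularization; a real system would use a lemmatizer.
    if head.endswith("ies"):
        return head[:-3] + "y"
    if head.endswith("s") and not head.endswith("ss"):
        return head[:-1]
    return head


def subsumption_candidates(title, first_sentence, categories):
    """Collect candidate (concept, parent) pairs from both sources."""
    parents = set()
    from_sentence = hypernym_from_first_sentence(first_sentence)
    if from_sentence:
        parents.add(from_sentence)
    for category in categories:
        from_category = head_from_category(category)
        if from_category:
            parents.add(from_category)
    parents.discard(title.lower())  # a concept does not subsume itself
    return sorted((title, parent) for parent in parents)
```

For example, calling `subsumption_candidates` with the title "Violin", the first sentence "The violin is a wooden string instrument in the violin family." and the categories "String instruments" and "Violins" proposes "instrument" as the only parent candidate, since both sources agree and the category that merely repeats the title is discarded.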

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
Damir Jurić, Marko Banek & Zoran Skočir

Editor information

Editors and Affiliations

IRIT Institut de Recherche en Informatique de Toulouse, Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg and Johannes Kepler University Linz, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
School of Information Technology and Electrical Engineering, University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou

Cite this paper

Jurić, D., Banek, M., Skočir, Z. (2011). Automated Construction of Domain Ontology Taxonomies from Wikipedia. In: Hameurlain, A., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6861. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23091-2_37

DOI: https://doi.org/10.1007/978-3-642-23091-2_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23090-5
Online ISBN: 978-3-642-23091-2
eBook Packages: Computer Science, Computer Science (R0)
