Abstract
The key step in turning the idea of the Semantic Web into a feasible system is providing a variety of domain ontologies that are constructed on demand, automatically, and in a very short time. In this paper we introduce an unsupervised method for constructing domain ontology taxonomies from Wikipedia. The benefit of using Wikipedia as the source is twofold: first, Wikipedia articles are concise and have a particularly high “density” of domain knowledge; second, the articles represent the consensus of a large community, thus avoiding term disagreements and misinterpretations. The taxonomy construction algorithm, aimed at finding the subsumption relation, is based on two techniques that both apply linguistic parsing: analyzing the first sentence of each Wikipedia article and processing the categories associated with the article. The method has been evaluated against human judgment for two independent domains, and the experimental results indicate that it is robust and achieves high precision.
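To make the two techniques concrete, the following is a minimal sketch of how a subsumption (is-a) candidate might be read off an article's first sentence and off a category label. It is not the method of the paper, which relies on linguistic parsing; here a plain regular expression and a stop-word cut-off stand in for the parser. All names (`COPULA`, `STOP`, `hypernym_from_first_sentence`, `hypernym_from_category`) are hypothetical, and the heuristic is expected to fail on many real sentences, for example those with participial modifiers.

```python
import re

# Assumption: a definitional first sentence has the shape
# "<term> is/are/was/were a/an/the [kind|type|form|class of] <noun phrase> ...".
COPULA = re.compile(
    r"\b(?:is|are|was|were)\s+(?:a|an|the)\s+"
    r"(?:(?:kind|type|form|class)\s+of\s+)?"
    r"(?P<np>[^,.;()]+)",
    re.IGNORECASE,
)

# Assumption: these words mark the end of the head noun phrase.
STOP = {"of", "in", "for", "that", "which", "who", "from", "with", "on", "by", "to", "at"}


def _head(phrase):
    """Return the last word before the first stop word, lowercased."""
    words = []
    for word in phrase.split():
        if word.lower() in STOP:
            break
        words.append(word)
    return words[-1].lower() if words else None


def hypernym_from_first_sentence(sentence):
    """Guess the parent concept from a definitional first sentence."""
    match = COPULA.search(sentence)
    return _head(match.group("np")) if match else None


def hypernym_from_category(category):
    """Guess the parent concept from the head noun of a category label."""
    return _head(category)


if __name__ == "__main__":
    sentence = ("A router is a networking device that forwards "
                "data packets between computer networks.")
    print(hypernym_from_first_sentence(sentence))
    print(hypernym_from_category("Networking hardware"))
```

In this sketch both sources yield a single head noun, so their outputs can be compared or merged per article; a real system would use a dependency parser to locate the head and would also need to normalize plural category labels.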
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jurić, D., Banek, M., Skočir, Z. (2011). Automated Construction of Domain Ontology Taxonomies from Wikipedia. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6861. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23091-2_37
DOI: https://doi.org/10.1007/978-3-642-23091-2_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23090-5
Online ISBN: 978-3-642-23091-2
eBook Packages: Computer Science, Computer Science (R0)