PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation Using Probabilistic Methods

Abeyruwan, Saminda; Visser, Ubbo; Lemmon, Vance; Schürer, Stephan

doi:10.1007/978-3-642-35975-0_12


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7123)


Abstract

It is well known that manually formalizing a domain is a tedious and cumbersome process. It is constrained by the knowledge acquisition bottleneck. Therefore, many researchers have developed algorithms and systems to help automate the process. Among them are systems that incorporate text corpora in the knowledge acquisition process. Here, we provide a novel method for unsupervised bottom-up ontology generation. It is based on lexico-semantic structures and Bayesian reasoning to expedite the ontology generation process. To illustrate our approach, we provide three examples generating ontologies in diverse domains and validate them using qualitative and quantitative measures. The examples include the description of high-throughput screening data relevant to drug discovery and two custom text corpora. Our unsupervised method produces viable results with sometimes unexpected content. It is complementary to the typical top-down ontology development process. Our approach may therefore also be useful to domain experts.



Author information

Authors and Affiliations

Department of Computer Science, University of Miami, Florida, USA
Saminda Abeyruwan & Ubbo Visser
The Miami Project to Cure Paralysis, University of Miami Miller School of Medicine, Florida, USA
Vance Lemmon
Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Florida, USA
Stephan Schürer


Editor information

Editors and Affiliations

Department of Computer Science and Systems Engineering, University of Zaragoza, Ada Byron Building, Maria de Luna 1, 50018, Zaragoza, Spain
Fernando Bobillo
Department of Systems Engineering and Operations Research, George Mason University, Engineering Building, 4400 University Drive, 22030-4444, Fairfax, VA, USA
Paulo C. G. Costa
Dipartimento di Informatica, Università degli Studi di Bari, Via Orabona 4, 70125, Bari, Italy
Claudia d’Amato
Dipartimento di Informatica and CILA, Università degli Studi di Bari, Via Orabona 4, 70125, Bari, Italy
Nicola Fanizzi
Department of Systems Engineering and Operations Research, George Mason University, Engineering Building, 4400 University Drive, 22030-4444, Fairfax, VA, USA
Kathryn B. Laskey
MITRE Corporation, M/S H305, 7515 Colshire Drive, 22102-7508, McLean, VA, USA
Kenneth J. Laskey
Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Thomas Lukasiewicz
Technische Universität München, Abteilung für Informatik, Boltzmannstraße 3, 85748, Garching, Germany
Matthias Nickles
Goldman Sachs, 30 Hudson Street, 07302-4699, Jersey City, NJ, USA
Michael Pool

About this paper

Cite this paper

Abeyruwan, S., Visser, U., Lemmon, V., Schürer, S. (2013). PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation Using Probabilistic Methods. In: Bobillo, F., et al. Uncertainty Reasoning for the Semantic Web II. URSW 2008, URSW 2009, URSW 2010, UniDL 2010. Lecture Notes in Computer Science (LNAI), vol 7123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35975-0_12


DOI: https://doi.org/10.1007/978-3-642-35975-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35974-3
Online ISBN: 978-3-642-35975-0
eBook Packages: Computer Science, Computer Science (R0)
