Abstract
Manually formalizing a domain is well known to be tedious and cumbersome, and it is constrained by the knowledge acquisition bottleneck. Many researchers have therefore developed algorithms and systems to help automate the process, among them systems that incorporate text corpora into knowledge acquisition. Here, we present a novel method for unsupervised bottom-up ontology generation that uses lexico-semantic structures and Bayesian reasoning to expedite the process. To illustrate our approach, we provide three examples that generate ontologies in diverse domains, and we validate them using qualitative and quantitative measures. The examples include the description of high-throughput screening data relevant to drug discovery and two custom text corpora. Our unsupervised method produces viable results, sometimes with unexpected content. Because it is complementary to the typical top-down ontology development process, our approach may also be useful to domain experts.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abeyruwan, S., Visser, U., Lemmon, V., Schürer, S. (2013). PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation Using Probabilistic Methods. In: Bobillo, F., et al. Uncertainty Reasoning for the Semantic Web II. URSW 2010, URSW 2009, URSW 2008, UniDL 2010. Lecture Notes in Computer Science, vol 7123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35975-0_12
DOI: https://doi.org/10.1007/978-3-642-35975-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35974-3
Online ISBN: 978-3-642-35975-0
eBook Packages: Computer Science, Computer Science (R0)