PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation Using Probabilistic Methods

  • Conference paper
Uncertainty Reasoning for the Semantic Web II (URSW 2010, URSW 2009, URSW 2008, UniDL 2010)

Abstract

Manually formalizing a domain is a tedious and cumbersome process, constrained by the knowledge acquisition bottleneck. Many researchers have therefore developed algorithms and systems to help automate it, among them systems that incorporate text corpora in the knowledge acquisition process. Here, we provide a novel method for unsupervised bottom-up ontology generation that uses lexico-semantic structures and Bayesian reasoning to expedite the ontology generation process. To illustrate our approach, we provide three examples that generate ontologies in diverse domains and validate them using qualitative and quantitative measures. The examples include the description of high-throughput screening data relevant to drug discovery and two custom text corpora. Our unsupervised method produces viable results, sometimes with unexpected content. It is complementary to the typical top-down ontology development process and may therefore also be useful to domain experts.



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abeyruwan, S., Visser, U., Lemmon, V., Schürer, S. (2013). PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation Using Probabilistic Methods. In: Bobillo, F., et al. Uncertainty Reasoning for the Semantic Web II. URSW 2010, URSW 2009, URSW 2008, UniDL 2010. Lecture Notes in Computer Science, vol 7123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35975-0_12

  • DOI: https://doi.org/10.1007/978-3-642-35975-0_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35974-3

  • Online ISBN: 978-3-642-35975-0

  • eBook Packages: Computer Science, Computer Science (R0)
