Probabilistic Topic Models for Enriching Ontology from Texts

Abstract

Ontology enrichment is a text-based process, and the application domain at hand is circumscribed by the content of the related texts. The main challenge in ontology enrichment is learning: approaches that can enrich an ontology automatically from a textual corpus or a dataset covering various topics are still lacking. In this paper, we describe a new approach for automatically learning terminological ontologies from a textual corpus, based on probabilistic models. We explore two topic modeling algorithms, latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (pLSA), for learning a topic ontology. The objective is to capture the semantic relationships between words and topics and between topics and documents as probability distributions, and to use them to build a topic ontology and an ontology graph with minimal human intervention. Experimental analysis on building a topic ontology and on retrieving the corresponding topic ontology for a user query demonstrates the effectiveness of the proposed approach.
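To make the word-topic and topic-document distributions mentioned in the abstract concrete, the following is a minimal sketch of pLSA fitted with expectation-maximisation. It is not the authors' implementation: the function name `plsa`, its parameters, the smoothing constant, and the toy vocabulary and counts are all assumptions introduced here for illustration.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-word count matrix.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_w_z, p_z_d): p_w_z[z][w] = P(w|z), p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # Tiny smoothing (an assumption of this sketch) keeps every
        # probability strictly positive and avoids division by zero.
        smoothed = [x + 1e-12 for x in row]
        total = sum(smoothed)
        return [x / total for x in smoothed]

    # Random initialisation of the word-topic and topic-document distributions.
    p_w_z = [normalize([rng.random() for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                norm = sum(post)
                for z in range(n_topics):
                    resp = n_dw * post[z] / norm
                    acc_w_z[z][w] += resp
                    acc_z_d[d][z] += resp
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


# Hypothetical toy corpus: four documents over a four-word vocabulary.
vocab = ["ontology", "concept", "topic", "model"]
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 1, 4, 3],
]
p_w_z, p_z_d = plsa(counts, n_topics=2)
for z, row in enumerate(p_w_z):
    top = max(range(len(vocab)), key=lambda w: row[w])
    print(f"topic {z}: most probable word = {vocab[top]}")
```

Each row of `p_w_z` is a distribution over words for one topic, and each row of `p_z_d` is a distribution over topics for one document; these are the two kinds of relationship from which a topic ontology of the sort described above could be assembled. LDA differs mainly in placing Dirichlet priors on these distributions.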

Notes

  1. http://nlp.stanford.edu/software/tagger.shtml.

  2. http://tartarus.org/~martin/PorterStemmer/.

  3. http://lucene.apache.org/.

  4. http://www.cis.upenn.edu/datamining/softwaredist/PennAspect/.

  5. http://alias-i.com/lingpipe/.

Author information

Corresponding author

Correspondence to Anis Tissaoui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Web for Information and Knowledge Exploration, Sharing and Security (Section 1: Web2Touch)” guest edited by Haider Abbas, Hammad Afzal, Rodrigo Bonacin, Ismail Bouassida, Khalil Drira, Riccardo Martoglia, Olga Nabuco, and Fatiha Saïs.

About this article

Cite this article

Tissaoui, A., Sassi, S. & Chbeir, R. Probabilistic Topic Models for Enriching Ontology from Texts. SN COMPUT. SCI. 1, 336 (2020). https://doi.org/10.1007/s42979-020-00349-y

  • DOI: https://doi.org/10.1007/s42979-020-00349-y

Keywords

  • Knowledge acquisition
  • Ontology enrichment
  • Ontology learning
  • Probabilistic topic models