An Ontology-Driven Approach for Semantic Annotation of Documents with Specific Concepts

  • Céline Alec (email author)
  • Chantal Reynaud-Delaître
  • Brigitte Safar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


This paper presents an ontology-driven approach for the semantic annotation of documents from a corpus in which each document describes an entity of the same domain. The goal is to annotate each document with concepts that are too specific to be mentioned explicitly in the texts. The only thing known about these concepts is their labels; no semantic information about them is available. Moreover, their characteristics are only incompletely described in the texts. We propose an ontology-based approach, named Saupodoc, that performs this annotation by combining several techniques. Saupodoc relies on a domain ontology for the field under study, which plays a pivotal role; on the population of this ontology with property assertions coming from documents and external resources; and on its enrichment with formal definitions of the specific concepts. Experiments carried out in two application domains show the benefit of the approach compared to well-known classifiers.
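
The three ingredients named in the abstract (population with property assertions, formal definitions of specific concepts, annotation) can be illustrated with a toy sketch. It is not the Saupodoc implementation: plain Python checks stand in for an ontology and a reasoner, a missing property simply fails a definition (a closed-world simplification), and every name in it (`Entity`, `populate`, `annotate`, the `BeachDestination` and `SkiDestination` concepts and their properties) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Entity:
    """Individual standing for the entity that one document describes."""
    name: str
    assertions: Dict[str, Any] = field(default_factory=dict)


def populate(entity: Entity,
             from_document: Dict[str, Any],
             from_external: Dict[str, Any]) -> Entity:
    """Add property assertions from the document and from external resources.

    Assumption of this sketch: when both sources give a value for the same
    property, the value extracted from the document wins.
    """
    entity.assertions.update(from_external)
    entity.assertions.update(from_document)
    return entity


# Hypothetical formal definitions: each specific concept is a conjunction of
# constraints on property values.
Definition = Dict[str, Callable[[Any], bool]]

DEFINITIONS: Dict[str, Definition] = {
    "BeachDestination": {
        "has_beach": lambda v: v is True,
        "avg_summer_temp_c": lambda v: v >= 25,
    },
    "SkiDestination": {
        "has_ski_lifts": lambda v: v is True,
    },
}


def satisfies(entity: Entity, definition: Definition) -> bool:
    """True if every constrained property is asserted and passes its check."""
    return all(
        prop in entity.assertions and check(entity.assertions[prop])
        for prop, check in definition.items()
    )


def annotate(entity: Entity, definitions: Dict[str, Definition]) -> List[str]:
    """Return the labels of all specific concepts whose definition holds."""
    return sorted(label for label, d in definitions.items()
                  if satisfies(entity, d))


doc_entity = populate(
    Entity("doc_1"),
    from_document={"has_beach": True},          # found in the text
    from_external={"avg_summer_temp_c": 28},    # completed from another source
)
print(annotate(doc_entity, DEFINITIONS))
```

In this toy setting the document never mentions the concept label itself; the annotation follows only from the populated property values, which is the situation the abstract describes.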


Ontology-driven approach · Ontology population · Ontology enrichment with specific concepts



This work has been funded by the Poraso project, as part of a collaboration with the Wepingo company.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Céline Alec (1), email author
  • Chantal Reynaud-Delaître (1)
  • Brigitte Safar (1)
  1. LRI, Univ. Paris-Sud, CNRS, Université Paris-Saclay, Orsay, France
