Abstract
Keeping ontology-based digital libraries up to date raises two main issues. First, documents are often unstructured and stored in heterogeneous data formats, which makes it harder to extract information from them and to search them. Second, manual ontology population is time consuming, so automatic methods to support this process are needed.
In this paper, we present an ontology-based framework for populating ontologies. In particular, we propose an approach for triplet extraction from heterogeneous and unstructured documents in order to automatically populate ontology-based digital libraries. Finally, we evaluate the proposed framework on a real-world case study.
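To make the idea of populating an ontology from extracted triplets concrete, the following is a minimal sketch and not the framework described in the paper. It assumes a triplet is a (subject, predicate, object) tuple and that an ontology can be approximated by a set of accepted property names. The class names `Triplet` and `Ontology`, the `populate` method, and the identifiers `doc_42`, `hasAuthor`, and `person_7` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triplet:
    """A (subject, predicate, object) statement extracted from a document."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Ontology:
    """Toy stand-in for an ontology: known property names plus stored assertions."""
    properties: set                      # hypothetical vocabulary of accepted predicates
    assertions: set = field(default_factory=set)

    def populate(self, triplets):
        """Store triplets whose predicate is a known property; return how many were added."""
        added = 0
        for t in triplets:
            if t.predicate in self.properties and t not in self.assertions:
                self.assertions.add(t)
                added += 1
        return added


# Hypothetical extraction output: one valid triplet, one duplicate,
# and one whose predicate is not part of the ontology vocabulary.
extracted = [
    Triplet("doc_42", "hasAuthor", "person_7"),
    Triplet("doc_42", "hasAuthor", "person_7"),
    Triplet("doc_42", "mentions", "place_3"),
]

library = Ontology(properties={"hasAuthor"})
added = library.populate(extracted)
```

In this sketch, duplicates and triplets with unknown predicates are silently skipped. A real system would more plausibly log or queue them for manual review.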
Notes
- 1.
- 2.
- 3.
We used the OpenNLP default parameters, namely a cutoff frequency of 5 and 100 iterations.
Acknowledgments
The authors would like to thank Dr. Anastasia Di Nunzio for helpful discussion on linguistic typology studies.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Pandolfo, L., Pulina, L., Adorni, G. (2016). A Framework for Automatic Population of Ontology-Based Digital Libraries. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds) AI*IA 2016 Advances in Artificial Intelligence. AI*IA 2016. Lecture Notes in Computer Science, vol 10037. Springer, Cham. https://doi.org/10.1007/978-3-319-49130-1_30
DOI: https://doi.org/10.1007/978-3-319-49130-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49129-5
Online ISBN: 978-3-319-49130-1
eBook Packages: Computer Science, Computer Science (R0)