Automatic Identification of Relations in Quebec Heritage Data
Heritage data is often represented in unstructured formats, especially textual ones. In this paper, our objective is to extract instances of predefined relations between persons and real estate properties from historical notices written in French. Using several vector-based representations and supervised learning algorithms, we build classifiers that achieve an F-measure between 75% and 85% for relation detection. Our results show that performance is highly dependent on the type of relation and on the evaluation metric used. Our best results are obtained using either a TF-IDF vector representation with a support vector machine classifier or Word2Vec vectors combined with a multilayer perceptron classifier.
Keywords: Relation extraction, Heritage data, Supervised learning, Word2Vec, TF-IDF
This work has been funded by the Quebec Ministry of Culture and Communication.
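As a rough illustration of the TF-IDF + support vector machine configuration mentioned in the abstract, the sketch below turns sentences into smoothed, L2-normalized TF-IDF vectors, trains a bias-free linear SVM with the Pegasos stochastic sub-gradient method, and scores predictions with F1 (the harmonic mean of precision and recall). It is not the authors' implementation: the whitespace tokenizer, the IDF smoothing formula, the hyperparameters (`lam`, `epochs`), the binary "built by" relation, and the French sentences are all hypothetical, and the Word2Vec + multilayer perceptron variant is not shown.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split (no French-specific handling).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF (an assumption): idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    # Term counts weighted by IDF, then L2-normalized; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    # Sparse dot product between a weight dict and a feature dict.
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    # Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    # Labels are +1 / -1; no bias term, to keep the sketch short.
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active for this example
            for k in w:
                w[k] *= 1.0 - eta * lam  # gradient step on the regularizer
            if violated:
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


def f_measure(gold, pred, positive=1):
    # F1 on the positive class: harmonic mean of precision and recall.
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    fp = sum(p == positive != g for g, p in zip(gold, pred))
    fn = sum(g == positive != p for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data for a binary "built by" relation (+1) vs. another relation (-1).
train = [
    ("maison construite par joseph tremblay", 1),
    ("église construite par thomas baillairgé", 1),
    ("maison vendue à marie gagnon", -1),
    ("église vendue à pierre roy", -1),
]
docs, labels = zip(*train)
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
w = train_linear_svm(X, list(labels))
```

On this toy data, a new sentence such as `tfidf("grange construite par jean côté", idf)` is classified as +1 because only the positive examples contribute weight to "construite" and "par". A real setup would more likely rely on a library vectorizer and SVM solver, a proper French tokenizer, and one classifier (or one-vs-rest scheme) per relation type.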