Abstract
In this paper, we present an efficient and accurate method to represent events from numerous public sources, such as Wikidata or more specific knowledge bases. We focus on events happening in the real world, such as festivals or assassinations. Our method merges knowledge from Wikidata and Wikipedia article summaries to gather the entities involved in events, together with their dates, types and labels. This event characterization procedure is extended by including vernacular languages. We evaluate the method in a comparative experiment on two datasets, which shows that events are represented more accurately and exhaustively when vernacular languages are included. This can help extend research that mainly exploits hub languages, that is, the biggest language editions of Wikipedia. The method and the tool we release will, for instance, enhance event-centered semantic search engines, a context in which we already use them. An additional contribution of this paper is the public release of the tool's source code and of the corresponding datasets.
Notes
1. The package is a Python 3 library called wikivents on Pypi.org and available on the Software Heritage repository at https://archive.softwareheritage.org/swh:1:dir:ef325a054ba6f7eb1121807da7b1c92b9ecde8f8.
2. The dataset is hosted on Zenodo: https://doi.org/10.5281/zenodo.4733506.
3. Available on the Software Heritage repository at https://archive.softwareheritage.org/swh:1:dir:ef325a054ba6f7eb1121807da7b1c92b9ecde8f8.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bernard, G., Suire, C., Faucher, C., Doucet, A. (2021). A Comprehensive Extraction of Relevant Real-World-Event Qualifiers for Semantic Search Engines. In: Berget, G., Hall, M.M., Brenn, D., Kumpulainen, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2021. Lecture Notes in Computer Science, vol 12866. Springer, Cham. https://doi.org/10.1007/978-3-030-86324-1_19
DOI: https://doi.org/10.1007/978-3-030-86324-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86323-4
Online ISBN: 978-3-030-86324-1
eBook Packages: Computer Science, Computer Science (R0)