A Comprehensive Extraction of Relevant Real-World-Event Qualifiers for Semantic Search Engines

  • Conference paper
Linking Theory and Practice of Digital Libraries (TPDL 2021)

Abstract

In this paper, we present an efficient and accurate method to represent events from numerous public sources, such as Wikidata or more specific knowledge bases. We focus on events happening in the real world, such as festivals or assassinations. Our method merges knowledge from Wikidata and Wikipedia article summaries to gather the entities involved in events, as well as event dates, types and labels. This event characterization procedure is extended to include vernacular languages. We evaluate the method in a comparative experiment on two datasets, which shows that events are represented more accurately and exhaustively when vernacular languages are included. This can help extend research that mainly exploits hub languages, or the biggest language editions of Wikipedia. The method and the tool we release will, for instance, enhance event-centered semantic search engines, a context in which we already use it. An additional contribution of this paper is the public release of the tool's source code and of the corresponding datasets.
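
As a rough illustration of the merging idea described in the abstract, the following Python sketch combines a structured record (labels, types, dates) with the entities linked from article summaries in several languages. It is only a sketch under stated assumptions: the `EventRecord` class, the `merge_event` and `entities_in_at_least` functions, the input layout, and the placeholder identifiers are all hypothetical and are not taken from the wikivents library or from the paper's actual procedure.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping, Set


@dataclass
class EventRecord:
    """Hypothetical container for one event (not the wikivents data model)."""
    identifier: str
    labels: Dict[str, str] = field(default_factory=dict)  # language -> label
    types: Set[str] = field(default_factory=set)
    dates: Set[str] = field(default_factory=set)
    # entity identifier -> number of language editions whose summary links it
    entities: Counter = field(default_factory=Counter)


def merge_event(identifier: str,
                structured: Mapping[str, object],
                summaries: Mapping[str, Iterable[str]]) -> EventRecord:
    """Merge a structured record with entities found in per-language summaries.

    `structured` is an assumed dict with optional "labels", "types" and
    "dates" keys; `summaries` maps a language code to the entity identifiers
    linked from that language's article summary.
    """
    record = EventRecord(identifier)
    record.labels.update(structured.get("labels", {}))
    record.types.update(structured.get("types", []))
    record.dates.update(structured.get("dates", []))
    for linked in summaries.values():
        # Count each entity at most once per language edition.
        record.entities.update(set(linked))
    return record


def entities_in_at_least(record: EventRecord, min_languages: int) -> List[str]:
    """Return entities mentioned in at least `min_languages` editions."""
    return sorted(e for e, n in record.entities.items() if n >= min_languages)


# Toy input with placeholder identifiers (no real Wikidata items).
structured = {
    "labels": {"en": "Example festival", "fr": "Festival exemple"},
    "types": ["festival"],
    "dates": ["2020-07-01"],
}
summaries = {"en": ["E1", "E2", "E1"], "fr": ["E1"], "de": ["E1", "E3"]}
record = merge_event("EVT", structured, summaries)
```

Counting an entity once per language edition, rather than once per mention, is a design choice made for this sketch: it lets a threshold such as `entities_in_at_least(record, 2)` keep only entities that several editions agree on, which is one simple way to see how adding languages can change an event's representation.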

Notes

  1. The package is a Python 3 library called wikivents on Pypi.org and available on the Software Heritage repository at https://archive.softwareheritage.org/swh:1:dir:ef325a054ba6f7eb1121807da7b1c92b9ecde8f8.

  2. The dataset is hosted on Zenodo: https://doi.org/10.5281/zenodo.4733506.

  3. Available on the Software Heritage repository at https://archive.softwareheritage.org/swh:1:dir:ef325a054ba6f7eb1121807da7b1c92b9ecde8f8.

Corresponding author

Correspondence to Guillaume Bernard.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bernard, G., Suire, C., Faucher, C., Doucet, A. (2021). A Comprehensive Extraction of Relevant Real-World-Event Qualifiers for Semantic Search Engines. In: Berget, G., Hall, M.M., Brenn, D., Kumpulainen, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2021. Lecture Notes in Computer Science, vol 12866. Springer, Cham. https://doi.org/10.1007/978-3-030-86324-1_19

  • DOI: https://doi.org/10.1007/978-3-030-86324-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86323-4

  • Online ISBN: 978-3-030-86324-1

  • eBook Packages: Computer Science, Computer Science (R0)
