VLX-Stories: Building an Online Event Knowledge Base with Emerging Entity Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11779)

Abstract

We present an online multilingual system for event detection and comprehension from media feeds. The system retrieves information from news sites, aggregates it into events (event detection), and summarizes each event by extracting semantic labels of its most relevant entities (event representation) in order to answer the journalistic Ws: who, what, when and where. The generated events populate VLX-Stories, an event ontology, transforming unstructured text data into a structured knowledge base representation. Our system exploits an external entity Knowledge Graph (VKG) to help populate VLX-Stories. In turn, this external knowledge graph can be extended with a Dynamic Entity Linking (DEL) module, which detects emerging entities (EE) in unstructured data. The system is currently deployed in production and used by media producers in the editorial process, providing real-time access to breaking news. Each month, VLX-Stories detects over 9000 events from over 4000 news feeds from seven different countries and in three different languages. It also detects over 1300 EE per month, which populate VKG.
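As a rough illustration of the two ideas named in the abstract, aggregating articles into events and flagging emerging entities that are missing from a knowledge graph, the toy sketch below may help. It is not the authors' method: the abstract does not specify the clustering or linking algorithms, so the Jaccard-overlap grouping, the `threshold` and `min_articles` parameters, and all class and function names here are assumptions made purely for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    mentions: list[str]  # entity mentions assumed to be extracted upstream


@dataclass
class Event:
    articles: list[Article] = field(default_factory=list)

    def mention_set(self) -> set[str]:
        # All distinct mentions seen in any article of this event.
        return {m for a in self.articles for m in a.mentions}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aggregate(articles: list[Article], threshold: float = 0.3) -> list[Event]:
    """Greedy online grouping (hypothetical): attach each article to the
    most similar existing event, or open a new event if none is close."""
    events: list[Event] = []
    for art in articles:
        best, best_sim = None, 0.0
        for ev in events:
            sim = jaccard(set(art.mentions), ev.mention_set())
            if sim > best_sim:
                best, best_sim = ev, sim
        if best is not None and best_sim >= threshold:
            best.articles.append(art)
        else:
            events.append(Event([art]))
    return events


def emerging_candidates(
    event: Event, known_entities: set[str], min_articles: int = 2
) -> list[str]:
    """Mentions absent from the knowledge graph that appear in at least
    `min_articles` articles of the event (an assumed heuristic)."""
    doc_freq = Counter(m for a in event.articles for m in set(a.mentions))
    return sorted(
        m
        for m, n in doc_freq.items()
        if n >= min_articles and m not in known_entities
    )
```

In this sketch, a mention becomes an emerging-entity candidate only when several articles of the same event agree on it, which is one simple way to reduce noise from one-off extraction errors; a production system would need proper entity linking and disambiguation.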

Notes

  1. http://news.google.com

  2. http://news.yahoo.com

  3. https://www.vilynx.com

  4. https://spacy.io/

  5. https://schema.org/

  6. https://spacy.io/usage/facts-figures

  7. https://www.icews.com/

  8. https://www.gdeltproject.org/

  9. https://dataverse.harvard.edu/dataverse/icews

  10. http://openeventdata.org/

  11. http://www.newsreader-project.eu/

Acknowledgments

Dèlia Fernàndez-Cañellas is funded by contract 2017-DI-011 of the Industrial Doctorate Program of the Government of Catalonia.

Corresponding author

Correspondence to Dèlia Fernàndez-Cañellas.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Fernàndez-Cañellas, D. et al. (2019). VLX-Stories: Building an Online Event Knowledge Base with Emerging Entity Detection. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_24

  • DOI: https://doi.org/10.1007/978-3-030-30796-7_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30795-0

  • Online ISBN: 978-3-030-30796-7

  • eBook Packages: Computer Science, Computer Science (R0)
