Abstract
We present an online multilingual system for event detection and comprehension from media feeds. The system retrieves information from news sites, aggregates it into events (event detection), and summarizes each event by extracting semantic labels for its most relevant entities (event representation) in order to answer the journalistic Ws: who, what, when, and where. The generated events populate VLX-Stories, an event ontology, transforming unstructured text data into a structured knowledge base representation. Our system exploits an external entity Knowledge Graph (VKG) to help populate VLX-Stories. At the same time, this external knowledge graph can be extended by a Dynamic Entity Linking (DEL) module, which detects emerging entities (EE) in unstructured data. The system is currently deployed in production and used by media producers in the editorial process, providing real-time access to breaking news. Each month, VLX-Stories detects over 9000 events from over 4000 news feeds from seven different countries and in three different languages. Over the same period, it detects over 1300 EE per month, which populate VKG.
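The two stages named in the abstract, aggregating news items into events and summarizing each event by its most relevant entities, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Article` and `Event` classes, the `aggregate` and `jaccard` helpers, the `threshold` value, and the premise that entity mentions are already linked to knowledge-graph identifiers and assigned to a W slot. The greedy entity-overlap grouping is a simple stand-in, not the clustering method used by VLX-Stories.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Article:
    """Hypothetical news item; entity mentions are assumed to be linked already."""
    title: str
    published: date
    entities: dict[str, list[str]]  # W slot ("who", "what", "where") -> entity ids


@dataclass
class Event:
    """Hypothetical event: a group of articles about the same happening."""
    articles: list[Article] = field(default_factory=list)

    def entity_set(self) -> set[str]:
        # All entity ids mentioned by any article in the event.
        return {e for a in self.articles for ids in a.entities.values() for e in ids}

    def summary(self) -> dict[str, str]:
        # Most frequent entity per W slot; "when" is the earliest publication date.
        out: dict[str, str] = {}
        for slot in ("who", "what", "where"):
            counts = Counter(e for a in self.articles for e in a.entities.get(slot, []))
            if counts:
                out[slot] = counts.most_common(1)[0][0]
        out["when"] = min(a.published for a in self.articles).isoformat()
        return out


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two entity sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aggregate(articles: list[Article], threshold: float = 0.3) -> list[Event]:
    """Greedy single pass: attach each article to its most similar event, if similar enough."""
    events: list[Event] = []
    for art in sorted(articles, key=lambda a: a.published):
        ents = {e for ids in art.entities.values() for e in ids}
        best = max(events, key=lambda ev: jaccard(ents, ev.entity_set()), default=None)
        if best is not None and jaccard(ents, best.entity_set()) >= threshold:
            best.articles.append(art)
        else:
            events.append(Event([art]))
    return events
```

Calling `aggregate` on a list of `Article` objects returns a list of `Event` objects, and `Event.summary()` yields one entity per W slot plus a date. A production system would need incremental processing and multilingual entity linking, which this sketch leaves out.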
Acknowledgments
Dèlia Fernàndez-Cañellas is funded by contract 2017-DI-011 of the Industrial Doctorate Program of the Government of Catalonia.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernàndez-Cañellas, D. et al. (2019). VLX-Stories: Building an Online Event Knowledge Base with Emerging Entity Detection. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_24
DOI: https://doi.org/10.1007/978-3-030-30796-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30795-0
Online ISBN: 978-3-030-30796-7
eBook Packages: Computer Science, Computer Science (R0)