Giveme5W: Main Event Retrieval from News Articles by Extraction of the Five Journalistic W Questions

Hamborg, Felix; Lachnit, Soeren; Schubotz, Moritz; Hepp, Thomas; Gipp, Bela

doi:10.1007/978-3-319-78105-1_39

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10766)

Included in the following conference series:

International Conference on Information

Abstract

Extracting event descriptors from news articles is commonly required for various tasks, such as clustering related articles, summarization, and news aggregation. Due to the lack of generally usable and publicly available methods optimized for news, many researchers must redundantly implement such methods for their projects. Answers to the five journalistic W questions (5Ws) describe the main event of a news article, i.e., who did what, when, where, and why. The main contribution of this paper is Giveme5W, the first open-source, syntax-based 5W extraction system for news articles. The system retrieves an article’s main event by extracting phrases that answer the journalistic 5Ws. In an evaluation with three assessors and 60 articles, we find that the extraction precision of 5W phrases is \( p = 0.7 \).
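
As a rough illustration of what a precision figure of this kind measures, the sketch below computes precision as the share of extracted phrases that an assessor judged relevant, overall and per W question. The `Judgment` record, the binary relevance labels, and the toy values are assumptions made for illustration only; they are neither the paper's data nor its exact evaluation protocol.

```python
from dataclasses import dataclass

# The five journalistic W questions named in the abstract.
QUESTIONS = ("who", "what", "when", "where", "why")


@dataclass(frozen=True)
class Judgment:
    """One assessed extraction (hypothetical record, not from the paper)."""
    question: str   # one of QUESTIONS
    relevant: bool  # assessor's binary verdict on the extracted phrase


def precision(judgments):
    """Fraction of extracted phrases judged relevant; 0.0 if there are none."""
    judgments = list(judgments)
    if not judgments:
        return 0.0
    return sum(j.relevant for j in judgments) / len(judgments)


def precision_by_question(judgments):
    """Precision computed separately for each of the five W questions."""
    judgments = list(judgments)
    return {
        q: precision(j for j in judgments if j.question == q)
        for q in QUESTIONS
    }


# Toy judgments, invented purely to exercise the functions.
toy = [
    Judgment("who", True),
    Judgment("who", True),
    Judgment("what", True),
    Judgment("when", False),
    Judgment("why", False),
]

if __name__ == "__main__":
    print(precision(toy))
    print(precision_by_question(toy))
```

A graded relevance scale or averaging over several assessors would change the arithmetic; the binary version is used here only because it is the simplest correct instance of precision.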

Notes

1. We use the POS-tag abbreviations from the Penn Treebank Project [33].
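
For readers unfamiliar with these abbreviations, the following sketch lists a few common Penn Treebank POS tags as a lookup table. The selection of tags and the `describe` helper are illustrative assumptions; this is not the set of tags the system relies on.

```python
# A small, hand-picked subset of Penn Treebank part-of-speech tags.
# The choice of tags is illustrative, not taken from the paper.
PENN_POS_TAGS = {
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "RB": "adverb",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
}


def describe(tag):
    """Return a description for a tag, or 'unknown tag' if it is not listed."""
    return PENN_POS_TAGS.get(tag, "unknown tag")


if __name__ == "__main__":
    for tag in ("NNP", "VBD", "XYZ"):
        print(tag, "->", describe(tag))
```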

References

1. Agence France-Presse: Taliban attacks German consulate in Northern Afghan city of Mazar-i-Sharif with truck bomb. The Telegraph (2016)
2. Allan, J., et al.: 1998 Topic detection and tracking pilot study: final report. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 194–218 (1998)
3. Altenberg, B.: Causal linking in spoken and written English. Studia Linguistica 38(1), 20–69 (1984)
4. Asghar, N.: Automatic extraction of causal relations from natural language texts: a comprehensive survey. arXiv preprint arXiv:1605.07895 (2016)
5. Best, C., et al.: Europe media monitor (2005)
6. Bethard, S., Martin, J.H.: Learning semantic links from a corpus of parallel temporal and causal relations. In: Proceedings of the 46th Annual Meeting of the ACL on Human Language Technologies, pp. 177–180 (2008)
7. Bird, S., et al.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media, Inc., Sebastopol (2009)
8. Carreras, X., Màrquez, L.: Introduction to the CoNLL-2005 shared task: semantic role labeling. In: Proceedings of the Ninth Conference on Computational Natural Language, pp. 152–164 (2005)
9. Charniak, E., Johnson, M.: Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In: Proceedings of the 43rd Annual Meeting on ACL, pp. 173–180 (2005)
10. Christian, D., et al.: The Associated Press Stylebook and Briefing on Media Law. Associated Press, New York (2014)
11. Das, A., Bandyaopadhyay, S., Gambäck, B.: The 5W structure for sentiment summarization-visualization-tracking. In: Gelbukh, A. (ed.) CICLing 2012. LNCS, vol. 7181, pp. 540–555. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28604-9_44
12. Finkel, J.R., et al.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd Annual Meeting on ACL, pp. 363–370 (2005)
13. Girju, R.: Automatic detection of causal relations for question answering. In: Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, vol. 12, pp. 76–83 (2003)
14. Greene, D., Cunningham, P.: Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 377–384 (2006)
15. Hamborg, F., et al.: Identification and analysis of media bias in news articles. In: Proceedings of the 15th International Symposium of Information Science (2017)
16. Hamborg, F., et al.: Matrix-based news aggregation: exploring different news perspectives. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, p. 10 (2017)
17. Hamborg, F., et al.: news-please: A generic news crawler and extractor. In: Proceedings of the 15th International Symposium of Information Science, pp. 218–223 (2017)
18. Hripcsak, G., Rothschild, A.S.: Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. 12(3), 296–298 (2005)
19. Jurafsky, D.: Speech and Language Processing. Pearson Education India, New Delhi (2000)
20. Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in IR evaluation. J. Am. Soc. Inform. Sci. Technol. 53(13), 1120–1129 (2002)
21. Khoo, C.S.G., et al.: Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing. Lit. Linguist. Comput. 13(4), 177–186 (1998)
22. Khoo, C.S.G.: Automatic identification of causal relations in text and their use for improving precision in information retrieval (1995)
23. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)
24. Manning, C.D., et al.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
25. McKeown, K.R., et al.: Tracking and summarizing news on a daily basis with Columbia’s Newsblaster. In: Proceedings of the 2nd International Conference on Human Language Technology Research, pp. 280–285 (2002)
26. Oliver, P.E., Maney, G.M.: Political processes and local newspaper coverage of protest events: from selection bias to triadic interactions. Am. J. Sociol. 106(2), 463–505 (2000)
27. Park, S., et al.: NewsCube: delivering multiple aspects of news to mitigate media bias. In: Proceedings of SIGCHI 2009 Conference on Human Factors in Computing Systems, pp. 443–453 (2009)
28. parsedatetime - Parse human-readable date/time strings. https://github.com/bear/parsedatetime. Accessed 21 Aug 2017
29. Parton, K., et al.: Who, what, when, where, why?: comparing multiple approaches to the cross-lingual 5W task. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 1, pp. 423–431 (2009)
30. Sharma, S., et al.: News event extraction using 5W1H approach & its analysis. Int. J. Sci. Eng. Res. – IJSER 4(5), 2064–2067 (2013)
31. Stemler, S.: An overview of content analysis. Pract. Assess. Res. Eval. 7(17), 137–146 (2001)
32. Tanev, H., Piskorski, J., Atkinson, M.: Real-time news event extraction for global crisis monitoring. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds.) NLDB 2008. LNCS, vol. 5039, pp. 207–218. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69858-6_21
33. Taylor, A., et al.: The Penn treebank: an overview. In: Abeillé, A. (ed.) Treebanks. TLTB, vol. 20, pp. 5–22. Springer, Dordrecht (2003). https://doi.org/10.1007/978-94-010-0201-1_1
34. Wang, W., et al.: Chinese news event 5W1H elements extraction using semantic role labeling. In: 2010 Third International Symposium on Information Processing (ISIP), pp. 484–489 (2010)
35. Yaman, S., et al.: Classification-based strategies for combining multiple 5-W question answering systems. In: INTERSPEECH, pp. 2703–2706 (2009)
36. Yaman, S., et al.: Combining semantic and syntactic information sources for 5-W question answering. In: INTERSPEECH, pp. 2707–2710 (2009)

Author information

Authors and Affiliations

University of Konstanz, Konstanz, Germany
Felix Hamborg, Soeren Lachnit, Moritz Schubotz, Thomas Hepp & Bela Gipp

Corresponding author

Correspondence to Felix Hamborg.

Editor information

Editors and Affiliations

Northumbria University, Newcastle upon Tyne, United Kingdom
Gobinda Chowdhury
Northumbria University, Newcastle upon Tyne, United Kingdom
Julie McLeod
University of Sheffield, Sheffield, United Kingdom
Val Gillet
University of Sheffield, Sheffield, United Kingdom
Peter Willett

About this paper

Cite this paper

Hamborg, F., Lachnit, S., Schubotz, M., Hepp, T., Gipp, B. (2018). Giveme5W: Main Event Retrieval from News Articles by Extraction of the Five Journalistic W Questions. In: Chowdhury, G., McLeod, J., Gillet, V., Willett, P. (eds) Transforming Digital Worlds. iConference 2018. Lecture Notes in Computer Science, vol 10766. Springer, Cham. https://doi.org/10.1007/978-3-319-78105-1_39

DOI: https://doi.org/10.1007/978-3-319-78105-1_39
Published: 15 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78104-4
Online ISBN: 978-3-319-78105-1
eBook Packages: Computer Science (R0)
