Summarizing News Articles Using Question-and-Answer Pairs via Learning

Wang, Xuezhi; Yu, Cong

doi:10.1007/978-3-030-30793-6_40

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11778)

Included in the following conference series:

International Semantic Web Conference

Abstract

The launch of the new Google News in 2018 (https://www.blog.google/products/news/new-google-news-ai-meets-human-intelligence/) introduced a Frequently Asked Questions (FAQ) feature that structurally summarizes each news story on its Full Coverage page. While news summarization has been a research topic for decades, this feature is poised to usher in a new line of summarization techniques. There are two fundamental approaches: mining the questions from data associated with the news story, and learning the questions directly from the content of the story. To the best of our knowledge, this paper provides the first study of a learning-based approach that generates a structured summary of news articles as question-and-answer pairs capturing salient and interesting aspects of the story. Specifically, the learning-based approach reads a news article, predicts its attention map (i.e., the important snippets in the article), and generates multiple natural-language questions for each snippet. We also describe a mining-based approach that serves as the mechanism for generating weak supervision data to train the learning-based approach. We evaluate our approach on the existing SQuAD dataset (https://rajpurkar.github.io/SQuAD-explorer/) and on a large dataset of 91K news articles that we constructed. Our system achieves an AUC of 0.734 for document attention map prediction, a BLEU-4 score of 12.46 for natural question generation, and a BLEU-4 score of 24.4 for question summarization, outperforming state-of-the-art baselines.
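
The abstract reports an AUC of 0.734 for document attention map prediction. The exact evaluation setup is not described on this page, so the sketch below rests on an assumption: the attention map is treated as a per-token probability that the token lies inside an important snippet, scored against binary gold labels (1 = inside a snippet, 0 = outside). It computes ROC AUC with the rank-sum (Mann-Whitney U) formulation. The function name `roc_auc` and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed with the rank-sum (Mann-Whitney U) formulation.

    Assumes binary labels (1 = token inside an important snippet, 0 = not).
    The result is the probability that a randomly chosen positive token
    scores higher than a randomly chosen negative token, ties counting 1/2.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative token")

    # Sort token indices by score so that tied scores form contiguous runs.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of scores tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Every member of a tie group gets the group's average 1-based rank.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalized by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical per-token attention probabilities for a 4-token article,
    # with gold labels marking tokens inside an important snippet.
    predicted = [0.4, 0.4, 0.9, 0.1]
    gold = [1, 0, 1, 0]
    print(roc_auc(predicted, gold))  # 0.875
```

An AUC of 0.5 corresponds to a random ranking of tokens and 1.0 to a perfect separation of snippet tokens from the rest. The rank-based form needs only one sort, O(n log n), instead of enumerating every positive/negative token pair, which helps for long articles.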

Notes

1. Private communication with Google's news team: in user studies, FAQ was shown to improve users' understanding of news stories, which is an important launch criterion.
2. Private communication.
3. https://rajpurkar.github.io/SQuAD-explorer/
4. https://nlp.stanford.edu/software/tagger.html
5. Note that there can be multiple questions with the same answer snippet; for example, another question candidate could be: "Under which name is the Black Eagle Brewery also known?" Our learning-based approach can learn such diverse questions, provided that the training data captures the same diversity.
6. https://nlp.stanford.edu/software/tokenizer.shtml
7. https://nlp.stanford.edu/software/tokenizer.shtml

References

Angeli, G., Premkumar, M.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: ACL (2015)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: ACL (2017)
Du, X., Cardie, C.: Identifying where to focus in reading comprehension for neural question generation. In: EMNLP (2017)
Radev, D.R., Jing, H., Budzikowska, M.: Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In: NAACL-ANLP Workshop on Automatic Summarization (2000)
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. In: JAIR (2004)
Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: EMNLP (2011)
Fader, A., Zettlemoyer, L., Etzioni, O.: Paraphrase-driven learning for open question answering. In: ACL (2013)
Feng, X., Huang, L., Tang, D., Qin, B., Ji, H., Liu, T.: A language-independent neural network for event detection. In: ACL (2016)
Gu, J., Lu, Z., Li, H., Li, V.O.: Incorporating copying mechanism in sequence-to-sequence learning. In: ACL (2016)
Höffner, K., Walter, S., Marx, E., Usbeck, R., Lehmann, J., Ngonga Ngomo, A.C.: Survey on challenges of question answering in the semantic web. Semant. Web 8(6), 895–920 (2017)
Joachims, T.: Optimizing search engines using clickthrough data. In: KDD (2002)
Kedzie, C., Diaz, F., McKeown, K.: Real-time web scale event summarization using sequential decision making. In: IJCAI (2016)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kolomiyets, O., Moens, M.F.: A survey on question answering technology from an information retrieval perspective. Inf. Sci. 181(24), 5412–5434 (2011)
Koutra, D., Bennett, P.N., Horvitz, E.: Events and controversies: influences of a shocking news event on information seeking. In: WWW (2015)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: ACL (July 2004). https://www.microsoft.com/en-us/research/publication/rouge-a-package-for-automatic-evaluation-of-summaries/
Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: EMNLP (2015)
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. In: LREC (2018)
Nenkova, A., McKeown, K.: A survey of text summarization techniques. In: Mining Text Data, pp. 43–76. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_3
Nguyen, D.B., Abujabal, A., Tran, K., Theobald, M., Weikum, G.: Query-driven on-the-fly knowledge base construction. In: VLDB (2017)
Nguyen, T.H., Cho, K., Grishman, R.: Joint event extraction via recurrent neural networks. In: NAACL (2016)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP (2016)
See, A., Liu, P., Manning, C.: Get to the point: summarization with pointer-generator networks. In: ACL (2017)
Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. In: ICLR (2017)
Shen, C., Liu, F., Weng, F., Li, T.: A participant-based approach for event summarization using Twitter streams. In: NAACL-HLT (2013)
Upstill, T.: The new Google News: AI meets human intelligence (2018). https://www.blog.google/products/news/new-google-news-ai-meets-human-intelligence/
Walker, C., Strassel, S., Medero, J., Maeda, K.: ACE 2005 multilingual training corpus (February 2006). https://catalog.ldc.upenn.edu/ldc2006t06
Yu, A.W., et al.: QANet: combining local convolution with global self-attention for reading comprehension. In: ICLR (2018)
Zhou, Q., Yang, N., Wei, F., Tan, C., Bao, H., Zhou, M.: Neural question generation from text: a preliminary study. In: NLPCC (2017)

Author information

Authors and Affiliations

Google Research, New York, USA
Xuezhi Wang & Cong Yu

Corresponding author

Correspondence to Xuezhi Wang.

Editor information

Editors and Affiliations

Fondazione Bruno Kessler, Trento, Italy
Chiara Ghidini
Linköping University, Linköping, Sweden
Olaf Hartig
University of Bonn, Bonn, Germany
Maria Maleshkova
University of Economics Prague, Prague, Czech Republic
Vojtěch Svátek
University of Illinois at Chicago, Chicago, IL, USA
Isabel Cruz
University of Chile, Santiago, Chile
Aidan Hogan
Memect Technology, Beijing, China
Jie Song
Mines Saint-Etienne, Saint-Etienne, France
Maxime Lefrançois
Inria Sophia Antipolis - Méditerranée, Sophia Antipolis, France
Fabien Gandon

About this paper

Cite this paper

Wang, X., Yu, C. (2019). Summarizing News Articles Using Question-and-Answer Pairs via Learning. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. https://doi.org/10.1007/978-3-030-30793-6_40

DOI: https://doi.org/10.1007/978-3-030-30793-6_40
Published: 17 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30792-9
Online ISBN: 978-3-030-30793-6
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships: the Semantic Web Science Association