Summarizing News Articles Using Question-and-Answer Pairs via Learning

  • Conference paper
The Semantic Web – ISWC 2019 (ISWC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11778)

Abstract

The launch of the new Google News in 2018 (https://www.blog.google/products/news/new-google-news-ai-meets-human-intelligence/) introduced the Frequently Asked Questions (FAQ) feature, which provides a structured summary of a news story on its Full Coverage page. While news summarization has been a research topic for decades, this new feature is poised to usher in a new line of news summarization techniques. There are two fundamental approaches: mining the questions from data associated with the news story, and learning the questions directly from the content of the story. To the best of our knowledge, this paper presents the first study of a learning-based approach that generates a structured summary of news articles as question-and-answer pairs capturing salient and interesting aspects of the news story. Specifically, this learning-based approach reads a news article, predicts its attention map (i.e., the important snippets in the article), and generates multiple natural language questions corresponding to each snippet. Furthermore, we describe a mining-based approach that generates weak supervision data for training the learning-based approach. We evaluate our approach on the existing SQuAD dataset (https://rajpurkar.github.io/SQuAD-explorer/) and on a large dataset of 91K news articles that we constructed. Our proposed system achieves an AUC of 0.734 for document attention map prediction, a BLEU-4 score of 12.46 for natural question generation, and a BLEU-4 score of 24.4 for question summarization, outperforming state-of-the-art baselines.
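The abstract reports an AUC of 0.734 for attention map prediction but does not say here how that AUC is computed. As a minimal sketch, assume the predicted attention map is one salience score per token and the gold map labels each token 1 if it falls inside an important snippet and 0 otherwise. The area under the ROC curve is then the probability that a randomly chosen salient token outscores a randomly chosen non-salient one. The function `roc_auc` below is hypothetical, not the authors' evaluation code; it computes this quantity through the Mann-Whitney U statistic, giving tied scores their average rank.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC of per-token attention scores against
    binary gold labels (1 = token inside an important snippet, 0 = outside),
    computed via the Mann-Whitney U statistic. Tied scores get their average rank."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one salient and one non-salient token")

    # Rank tokens by score (1-based), averaging ranks within each block of ties.
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # U counts (salient, non-salient) pairs where the salient token scores higher,
    # with ties counted as half; dividing by the number of pairs gives the AUC.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four salient/non-salient token pairs are ordered correctly. Whether scores are pooled across articles or averaged per article is a further choice this section leaves open.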

Notes

  1. Private communication with Google’s news team: in user studies, the FAQ feature was shown to improve users’ understanding of news stories, which is an important launch criterion.

  2. Private communication.

  3. https://rajpurkar.github.io/SQuAD-explorer/

  4. https://nlp.stanford.edu/software/tagger.html

  5. Note that there can be multiple questions with the same answer snippet; for example, another question candidate could be: Under which name is the Black Eagle Brewery also known? Our learning-based approach can learn such diverse questions, provided that the training data captures the same diversity.

  6. https://nlp.stanford.edu/software/tokenizer.shtml

  7. https://nlp.stanford.edu/software/tokenizer.shtml
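The abstract reports BLEU-4 scores for question generation and question summarization, and note 5 points out that one answer snippet can admit several valid questions. A minimal sketch of corpus-level BLEU-4 as defined by Papineni et al. (2002), with support for multiple reference questions per hypothesis, looks like this: clipped n-gram precisions for n = 1 to 4 are combined by a uniform geometric mean and multiplied by a brevity penalty. The function names `ngrams` and `corpus_bleu4` are hypothetical, no smoothing is applied, and inputs are assumed to be pre-tokenized (notes 6 and 7 point to the Stanford tokenizer; the examples here simply use lists of words). This section does not state that the reported scores were computed exactly this way.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Hypothetical helper: count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses: List[List[str]], references: List[List[List[str]]]) -> float:
    """Hypothetical corpus-level BLEU-4: clipped n-gram precision for n = 1..4,
    uniform weights, and a corpus-level brevity penalty. Each hypothesis may
    have several references. No smoothing; the result is scaled to 0..100."""
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # hypothesis n-gram counts for n = 1..4
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

For instance, scoring the hypothesis `["the", "cat", "sat", "on", "the", "mat"]` against the single reference `["the", "cat", "sat", "on", "a", "mat"]` gives clipped precisions 5/6, 3/5, 2/4 and 1/3 with no brevity penalty, so the score is 100 * (1/12) ** 0.25, roughly 53.7. Taking the clipping maximum over all references is what lets a generated question match whichever of several valid reference questions it is closest to.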

Author information

Correspondence to Xuezhi Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, X., Yu, C. (2019). Summarizing News Articles Using Question-and-Answer Pairs via Learning. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. https://doi.org/10.1007/978-3-030-30793-6_40

  • DOI: https://doi.org/10.1007/978-3-030-30793-6_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30792-9

  • Online ISBN: 978-3-030-30793-6

  • eBook Packages: Computer Science, Computer Science (R0)