Open-Domain Multi-Document Summarization via Information Extraction: Challenges and Prospects

Multi-source, Multilingual Information Extraction and Summarization

Abstract

Information Extraction (IE) and Summarization share the goal of extracting and presenting the relevant information in a document. While IE was a primary element of early abstractive summarization systems, it has been left out of more recent extractive systems. However, extracting facts and recognizing entities and events should provide useful information to those systems and help resolve semantic ambiguities that they cannot tackle. This paper explores novel approaches to taking advantage of cross-document IE for multi-document summarization. We propose multiple approaches to IE-based summarization and analyze their strengths and weaknesses. One of them, re-ranking the output of a high-performing summarization system with IE-informed metrics, leads to improvements in both manually evaluated content quality and readability.

Notes

  1. http://www.nist.gov/speech/tests/ace/

Acknowledgements

The first author and the third author were supported by the U.S. Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053, the U.S. NSF CAREER Award under Grant IIS-0953149, and the PSC-CUNY Research Program. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Author information

Corresponding author

Correspondence to Heng Ji.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ji, H., Favre, B., Lin, WP., Gillick, D., Hakkani-Tur, D., Grishman, R. (2013). Open-Domain Multi-Document Summarization via Information Extraction: Challenges and Prospects. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_9

  • DOI: https://doi.org/10.1007/978-3-642-28569-1_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28568-4

  • Online ISBN: 978-3-642-28569-1

  • eBook Packages: Computer Science, Computer Science (R0)
