
Open-Domain Multi-Document Summarization via Information Extraction: Challenges and Prospects

  • Heng Ji
  • Benoit Favre
  • Wen-Pin Lin
  • Dan Gillick
  • Dilek Hakkani-Tur
  • Ralph Grishman
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Information Extraction (IE) and summarization share the goal of extracting and presenting the relevant information in a document. While IE was a primary component of early abstractive summarization systems, it has been left out of more recent extractive systems. However, extracting facts and recognizing entities and events should provide useful information to those systems and help resolve semantic ambiguities that they cannot tackle on their own. This paper explores novel ways of taking advantage of cross-document IE for multi-document summarization. We propose multiple approaches to IE-based summarization and analyze their strengths and weaknesses. One of them, re-ranking the output of a high-performing summarization system with IE-informed metrics, leads to improvements in both manually evaluated content quality and readability.
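
The re-ranking idea mentioned above can be illustrated with a small sketch. This is not the authors' method: the abstract does not say which IE-informed metrics are used, so the scoring here is an assumption. Each candidate summary from a baseline extractive system gets an IE score equal to the fraction of extracted fact confidence whose subject and object both appear in the summary, and this is linearly interpolated with the baseline score using a hypothetical weight `alpha`. The names `Candidate`, `ie_score`, and `rerank`, as well as the toy facts, are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical IE output: (subject, relation, object) triples extracted
# across the document cluster, each mapped to a confidence weight.
Fact = tuple[str, str, str]


@dataclass(frozen=True)
class Candidate:
    """A candidate summary produced by a baseline extractive system (assumed structure)."""
    sentences: tuple[str, ...]
    baseline_score: float  # assumed already normalized to [0, 1]


def ie_score(candidate: Candidate, facts: dict[Fact, float]) -> float:
    """Fraction of total fact confidence covered by the candidate, in [0, 1].

    A fact counts as covered when its subject and object both occur in the
    summary text. Substring matching is a crude stand-in for entity- and
    coreference-aware alignment.
    """
    total = sum(facts.values())
    if total == 0:
        return 0.0
    text = " ".join(candidate.sentences).lower()
    covered = sum(
        conf
        for (subj, _rel, obj), conf in facts.items()
        if subj.lower() in text and obj.lower() in text
    )
    return covered / total


def rerank(candidates, facts, alpha=0.5):
    """Sort candidates by (1 - alpha) * baseline + alpha * IE score, best first."""
    def combined(c: Candidate) -> float:
        return (1 - alpha) * c.baseline_score + alpha * ie_score(c, facts)
    return sorted(candidates, key=combined, reverse=True)


# Toy usage with made-up facts and candidates.
facts = {
    ("Obama", "visit", "Berlin"): 0.9,
    ("Merkel", "meet", "Obama"): 0.7,
}
a = Candidate(("Obama arrived in Berlin on Monday.",), 0.6)
b = Candidate(("The weather in Berlin was mild.",), 0.7)
best = rerank([a, b], facts)[0]
```

With `alpha=0.5`, candidate `a` scores 0.5 × 0.6 + 0.5 × 0.5625 ≈ 0.58 and `b` scores 0.35, so `a` is ranked first even though the baseline prefers `b`. A linear interpolation is only one simple choice; it requires both scores to be on comparable scales, which is why the IE score is normalized here.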

Keywords

Information Extraction, Source Document, Coreference Resolution, Semantic Role Label, Summarization System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

The first and third authors were supported by the U.S. Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053, the U.S. NSF CAREER Award under Grant IIS-0953149, and the PSC-CUNY Research Program. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Heng Ji (1)
  • Benoit Favre (2)
  • Wen-Pin Lin (1)
  • Dan Gillick (3)
  • Dilek Hakkani-Tur (4)
  • Ralph Grishman (5)

  1. Computer Science Department, Queens College and Graduate Center, City University of New York, New York, USA
  2. LIF, Aix-Marseille Université, Marseille, France
  3. Computer Science Department, University of California, Berkeley, USA
  4. Speech Labs, Microsoft, Mountain View, USA
  5. Computer Science Department, New York University, New York, USA
