What Makes a Good Summary?
One of the biggest challenges for intelligence analysts who participate in the prevention of or response to terrorist acts is quickly finding relevant information in massive amounts of data. Along with research on information retrieval and filtering, text summarization is an effective technique for helping intelligence analysts shorten the time needed to find critical information and make timely decisions. Multi-document summarization is particularly useful because it quickly describes a collection of information. Its obvious shortcoming lies in what it cannot capture, especially in more diverse collections. Thus, the question is the adequacy and/or usefulness of such summaries to the target analyst. In this chapter, we report our experimental study on the sensitivity of users to the quality and content of multi-document summarization. We used the DUC 2002 collection for multi-document summarization as our testbed. Two groups of document sets were considered: (I) sets consisting of closely correlated documents with highly overlapping content; and (II) sets consisting of diverse documents covering a wide scope of topics. Intuitively, this suggests that creating a quality summary would be more difficult in the latter case. However, when asked to rank the performance of various automated summarizers, human evaluators proved fairly insensitive to this difference. In this chapter, we examine and analyze our experiments in order to better understand this phenomenon and how we might address it to improve summarization quality. In particular, we present a new metric based on document graphs that can distinguish between the two types of document sets.
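To make the document-graph idea concrete, the sketch below illustrates one plausible overlap-style similarity between two document graphs, in the spirit of graph-based summary evaluation. This is an illustrative assumption, not the chapter's actual metric: the node and edge representations, the function name `dg_similarity`, and the toy graphs are all invented for the example.

```python
# Illustrative sketch only: a simple document-graph (DG) overlap measure.
# Assumption: a DG is modeled as a set of concept nodes plus a set of
# (head, relation, tail) edges between concepts. The real metric in the
# chapter may differ.

def dg_similarity(nodes_a, edges_a, nodes_b, edges_b):
    """Fraction of graph A's nodes and edges that also appear in graph B."""
    total = len(nodes_a) + len(edges_a)
    if total == 0:
        return 0.0
    matched = len(nodes_a & nodes_b) + len(edges_a & edges_b)
    return matched / total

# Toy example: two short "documents" reduced to concept graphs.
doc1_nodes = {"summary", "analyst", "information"}
doc1_edges = {("analyst", "needs", "information"),
              ("summary", "condenses", "information")}
doc2_nodes = {"summary", "analyst", "terrorism"}
doc2_edges = {("analyst", "needs", "information")}

score = dg_similarity(doc1_nodes, doc1_edges, doc2_nodes, doc2_edges)
print(round(score, 2))  # 0.6 — 2 of 3 nodes and 1 of 2 edges matched
```

Under such a measure, a set of closely correlated documents (group I) would yield consistently high pairwise scores, while a diverse set (group II) would yield low ones, which is the kind of distinction the proposed metric is meant to capture.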
Keywords: Human Judgment · Document Graph · Good Summary · Ranking Approach · African National Congress