Summarizing Similarities and Differences Among Related Documents

Mani, Inderjeet; Bloedorn, Eric

doi:10.1023/A:1009930203452

Published: April 1999

Information Retrieval, Volume 1, pages 35–67 (1999)


Abstract

In many modern information retrieval applications, a common problem is the existence of multiple documents covering similar information, as in the case of multiple news stories about an event or a sequence of events. A particular challenge for text summarization is to summarize the similarities and differences in information content among these documents. The approach described here exploits the results of recent progress in information extraction to represent salient units of text and their relationships. By exploiting meaningful relations between units, based on an analysis of text cohesion and the context in which the comparison is desired, the summarizer can pinpoint similarities and differences and align text segments. In evaluation experiments, these techniques for exploiting cohesion relations yield summaries that (i) help users complete a retrieval task more quickly, (ii) improve alignment accuracy over baselines, and (iii) improve identification of topic-relevant similarities and differences.
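The abstract does not spell out the algorithm, so the following is only a toy illustration of the general idea of pinpointing similarities and differences between two related documents. It assumes raw term frequency as a crude stand-in for salience and ignores the cohesion relations and information extraction the paper relies on; the function names `salient_terms` and `compare`, the stopword list, and the sample texts are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "by", "for"}


def salient_terms(text, top_n=5):
    """Return up to top_n most frequent non-stopword terms (crude salience proxy)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {term for term, _ in counts.most_common(top_n)}


def compare(doc_a, doc_b, top_n=5):
    """Split the salient terms of two documents into shared and unique sets."""
    a = salient_terms(doc_a, top_n)
    b = salient_terms(doc_b, top_n)
    return {"common": a & b, "only_a": a - b, "only_b": b - a}


doc_a = "The earthquake struck the city. Rescue teams reached the city."
doc_b = "The earthquake damaged a bridge. Engineers inspected the bridge."
result = compare(doc_a, doc_b, top_n=10)
print(sorted(result["common"]))
print(sorted(result["only_a"]))
print(sorted(result["only_b"]))
```

Set intersection and difference give the shared and document-specific terms directly; a system of the kind the abstract describes would instead compare richer units of text and the relations among them.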



Mani, I., Bloedorn, E. Summarizing Similarities and Differences Among Related Documents. Information Retrieval 1, 35–67 (1999). https://doi.org/10.1023/A:1009930203452


Issue Date: April 1999
DOI: https://doi.org/10.1023/A:1009930203452
