
Text segmentation by topic

  • Jay M. Ponte
  • W. Bruce Croft
Information Retrieval I
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1324)

Abstract

We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before, but with different goals. This study focuses on data in which segments are relatively small and sentences within a segment have relatively few words in common, which makes the problem challenging. We present a segmentation method that uses a query expansion technique to find common features for the topic segments. Experiments with the technique show that it can be effective.
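To make the idea concrete, here is a minimal sketch of why expansion helps when sentences in the same segment share few words. It is an illustration under stated assumptions, not the method of the paper: the co-occurrence expansion is a toy stand-in for a query expansion technique, the threshold rule for placing boundaries is a simplification, and every name in it (`tokenize`, `expand`, `cosine`, `segment`, the tiny `corpus`) is hypothetical.

```python
from collections import Counter
from math import sqrt

def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    return [w.strip(".,;:!?").lower() for w in text.split() if w.strip(".,;:!?")]

def expand(tokens, corpus_docs, k=3):
    # Toy expansion (assumption): add the k terms that most often co-occur
    # with the sentence's words in a background corpus.
    words = set(tokens)
    cooc = Counter()
    for doc in corpus_docs:
        doc_set = set(doc)
        overlap = len(words & doc_set)
        if overlap:
            for term in doc_set - words:
                cooc[term] += overlap
    # Sort by descending score, then alphabetically, so ties are deterministic.
    ranked = sorted(cooc.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [term for term, _ in ranked[:k]]
    return Counter(tokens) + Counter(extra)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(sentences, corpus_docs, threshold=0.1, k=3):
    # Place a boundary before sentence i+1 when adjacent expanded
    # sentences are less similar than the (assumed) threshold.
    vecs = [expand(tokenize(s), corpus_docs, k) for s in sentences]
    return [i + 1 for i in range(len(vecs) - 1)
            if cosine(vecs[i], vecs[i + 1]) < threshold]

# Hypothetical background corpus and input text.
corpus = [
    tokenize("stocks shares market trading investors"),
    tokenize("rain storm weather forecast flooding"),
]
sentences = [
    "Stocks fell sharply.",
    "Investors sold shares.",
    "Rain is expected tomorrow.",
    "The storm caused flooding.",
]

print(segment(sentences, corpus))        # → [2]
print(segment(sentences, corpus, k=0))   # → [1, 2, 3]
```

With expansion disabled (`k=0`), no two adjacent sentences share a word, so every gap looks like a topic change. With expansion, the two finance sentences and the two weather sentences acquire common terms, and only the boundary between the topics remains.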



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Jay M. Ponte (1)
  • W. Bruce Croft (1)
  1. Computer Science Department, University of Massachusetts, Amherst, USA
