Text segmentation by topic

  • Information Retrieval I
  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1324))

Abstract

We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before, but with different goals. This study focuses on data with relatively small segment sizes, in which sentences within a segment have relatively few words in common, making the problem challenging. We present a segmentation method that uses a query expansion technique to find common features for the topic segments. Experiments with the technique show that it can be effective.
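The abstract does not describe the method in detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "query expansion" can be approximated by adding related terms from a hand-written lookup table to each sentence, and that a topic boundary can be placed wherever adjacent expanded sentences have low cosine similarity. The names `related`, `expand`, `segment`, and the threshold value are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, whitespace-split, and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def expand(tokens, related):
    """Add related terms to a bag of words.

    `related` is a hypothetical term -> related-terms table standing in
    for a real query expansion component.
    """
    bag = Counter(tokens)
    for t in tokens:
        for r in related.get(t, ()):
            bag[r] += 1
    return bag

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(sentences, related, threshold=0.3):
    """Return indices i where a new segment is assumed to start at sentence i."""
    bags = [expand(tokenize(s), related) for s in sentences]
    return [
        i + 1
        for i in range(len(bags) - 1)
        if cosine(bags[i], bags[i + 1]) < threshold
    ]

# Illustrative data (assumed, not from the paper): short sentences that
# share almost no words within a topic.
sentences = [
    "The senate passed the budget.",
    "Lawmakers debated taxes.",
    "The striker scored twice.",
    "Fans cheered the goal.",
]
related = {
    "senate": ["politics"], "budget": ["politics"],
    "lawmakers": ["politics"], "taxes": ["politics"],
    "striker": ["soccer"], "scored": ["soccer"],
    "fans": ["soccer"], "goal": ["soccer"],
}

with_expansion = segment(sentences, related)
without_expansion = segment(sentences, {})
```

On this toy input, the unexpanded sentences overlap so little that a boundary is placed between every pair of sentences, while the expanded versions share enough features for a single boundary to be found between the second and third sentences. This mirrors the motivation stated in the abstract: when sentences within a segment have few words in common, added features can supply the missing overlap.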

Editor information

Carol Peters, Costantino Thanos

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ponte, J.M., Croft, W.B. (1997). Text segmentation by topic. In: Peters, C., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1997. Lecture Notes in Computer Science, vol 1324. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026725

  • DOI: https://doi.org/10.1007/BFb0026725

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63554-3

  • Online ISBN: 978-3-540-69597-4

  • eBook Packages: Springer Book Archive
