Title Generation Using a Training Corpus

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2004))

Abstract

This paper discusses fundamental issues involved in word selection for title generation. We review several methods for title generation, namely extractive summarization and two versions of a Naïve Bayesian method, and compare their performance using an F1 metric. In addition, we introduce a novel approach to title generation using the k-nearest neighbor (KNN) algorithm. Both the KNN method and a limited-vocabulary Naïve Bayesian method outperform the other evaluated methods, each with an F1 score of around 20%. Since KNN produces complete and legible titles, we conclude that KNN is a very promising method for title generation, provided good content overlap exists between the training corpus and the test documents.
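To make the idea concrete, the following is a minimal sketch of nearest-neighbor title generation and of a word-overlap F1 score. It is not the authors' implementation: the abstract does not specify the similarity measure, the value of k, or how F1 is computed, so the bag-of-words cosine similarity, the choice k = 1, the whitespace tokenizer, and the function names `nearest_title` and `title_f1` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_title(document, corpus):
    """Return the title of the training document most similar to `document`.

    `corpus` is a list of (document_text, title) pairs. Assumption: k = 1,
    so the single closest training document donates its title unchanged.
    """
    query = Counter(tokenize(document))
    _, best_title = max(
        corpus, key=lambda pair: cosine(query, Counter(tokenize(pair[0])))
    )
    return best_title


def title_f1(generated, reference):
    """Word-overlap F1 between a generated title and a reference title."""
    gen = Counter(tokenize(generated))
    ref = Counter(tokenize(reference))
    overlap = sum((gen & ref).values())  # words shared by both titles
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy training corpus (invented for this example, not from the paper).
corpus = [
    ("stocks fell sharply as markets reacted to interest rate news",
     "Markets Fall on Rate News"),
    ("the team won the championship game in overtime",
     "Team Wins Championship in Overtime"),
]

generated = nearest_title("interest rate fears sent stocks and markets lower", corpus)
score = title_f1(generated, "Stocks Fall on Rate Fears")
```

Because the generated title is copied whole from a human-written training title, it is complete and readable by construction. The same property explains the caveat in the abstract: if no training document resembles the test document, the borrowed title will share few words with the reference and the F1 score drops.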

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, R., Hauptmann, A.G. (2001). Title Generation Using a Training Corpus. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2001. Lecture Notes in Computer Science, vol 2004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44686-9_23

  • DOI: https://doi.org/10.1007/3-540-44686-9_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41687-6

  • Online ISBN: 978-3-540-44686-6

  • eBook Packages: Springer Book Archive
