Title Generation Using a Training Corpus

Jin, Rong; Hauptmann, Alexander G.

doi:10.1007/3-540-44686-9_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2004)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper discusses fundamental issues involved in word selection for title generation. We review several methods for title generation, namely extractive summarization and two versions of a Naïve Bayesian method, and compare their performance using an F1 metric. In addition, we introduce a novel approach to title generation using the k-nearest neighbor (KNN) algorithm. Both the KNN method and a limited-vocabulary Naïve Bayesian method outperform the other evaluated methods, with an F1 score of around 20%. Since KNN produces complete and legible titles, we conclude that KNN is a very promising method for title generation, provided good content overlap exists between the training corpus and the test documents.
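
To make the KNN idea and the F1 comparison concrete, the following is a minimal sketch and not the authors' implementation. It assumes a bag-of-words representation, cosine similarity, and k = 1 (the title of the single most similar training document is reused); it also assumes F1 is computed over the sets of unique words in the generated and reference titles. The helper names `tokenize`, `cosine`, `knn_title`, and `title_f1` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; real systems do more)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_title(document, corpus):
    """Return the title of the training document most similar to `document`.

    `corpus` is a list of (document_text, title) pairs. Using k = 1 is an
    assumption made for this sketch.
    """
    query = Counter(tokenize(document))
    _, best_title = max(
        corpus, key=lambda pair: cosine(query, Counter(tokenize(pair[0])))
    )
    return best_title


def title_f1(generated, reference):
    """F1 over unique title words (assumed definition of the metric)."""
    gen, ref = set(tokenize(generated)), set(tokenize(reference))
    overlap = len(gen & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the returned title is copied whole from a human-written training title, it is always a complete, readable phrase, which matches the abstract's point about legibility. It also shows why the method depends on content overlap: if no training document resembles the test document, the borrowed title will be unrelated.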

Author information

Authors and Affiliations

Language Technology Institute, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
Rong Jin & Alexander G. Hauptmann

Editor information

Editors and Affiliations

CIC (Centro de Investigación en Computación), IPN (Instituto Politécnico Nacional), Av. Juan de Dios Bátiz s/n esq. M. Othon Mendizabal, Col. Nuevo Vallejo, CP 07738, México, Mexico
Alexander Gelbukh (Unidad Profesional “Adolfo López Mateos”)

About this paper

Cite this paper

Jin, R., Hauptmann, A.G. (2001). Title Generation Using a Training Corpus. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2001. Lecture Notes in Computer Science, vol 2004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44686-9_23

DOI: https://doi.org/10.1007/3-540-44686-9_23
Published: 16 March 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41687-6
Online ISBN: 978-3-540-44686-6
eBook Packages: Springer Book Archive
