Abstract
This paper discusses fundamental issues involved in word selection for title generation. We review several methods for title generation, namely extractive summarization and two versions of a Naïve Bayesian approach, and compare their performance using an F1 metric. In addition, we introduce a novel approach to title generation using the k-nearest neighbor (KNN) algorithm. Both the KNN method and a limited-vocabulary Naïve Bayesian method outperform the other evaluated methods, each with an F1 score of around 20%. Since KNN produces complete and legible titles, we conclude that KNN is a very promising method for title generation, provided good content overlap exists between the training corpus and the test documents.
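The abstract does not spell out how F1 is computed or how KNN selects a title, so the following is only a minimal sketch under stated assumptions: F1 is taken as word-level overlap between a generated title and a reference title, and the KNN step is reduced to k = 1 with cosine similarity over raw word counts. The function names (`title_f1`, `nearest_title`) and the whitespace tokenizer are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def title_f1(generated, reference):
    """Word-level F1 between a generated title and a reference title."""
    gen, ref = Counter(tokenize(generated)), Counter(tokenize(reference))
    overlap = sum((gen & ref).values())  # words shared by both titles
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_title(document, training_pairs):
    """Return the title of the most similar training document (k = 1).

    `training_pairs` is a list of (document_text, title) tuples.
    """
    doc_vec = Counter(tokenize(document))
    best = max(
        training_pairs,
        key=lambda pair: cosine(doc_vec, Counter(tokenize(pair[0]))),
    )
    return best[1]
```

Because the returned title is copied whole from a human-written training example, it is complete and legible by construction. The same property explains the caveat in the abstract: the title is only useful when the training corpus contains a document with similar content.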
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, R., Hauptmann, A.G. (2001). Title Generation Using a Training Corpus. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2001. Lecture Notes in Computer Science, vol 2004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44686-9_23
DOI: https://doi.org/10.1007/3-540-44686-9_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41687-6
Online ISBN: 978-3-540-44686-6
eBook Packages: Springer Book Archive