Abstract
Text summarization is the process of identifying the most salient information in a document or a set of related documents. This paper analyzes the performance of a Turkish text summarization system that applies two algorithms based on Latent Semantic Analysis (LSA), each combined with different preprocessing phases. One of these preprocessing methods, called “Consecutive Words Detection”, is new: it uses links in Turkish Wikipedia to represent related consecutive words as a single term, and it improves the performance of Turkish text summarization.
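The abstract does not spell out either algorithm, so the following is only a minimal sketch under stated assumptions. A small hand-written phrase set (`PHRASES`, hypothetical) stands in for multi-word terms mined from Turkish Wikipedia links, terms are raw frequencies with no stemming, and sentence selection follows Gong and Liu's well-known LSA scheme (take the top right singular vectors of the term-by-sentence matrix and pick the strongest sentence for each). That scheme is not necessarily one of the two algorithms evaluated in the chapter, and all function names here are illustrative.

```python
import numpy as np

# HYPOTHETICAL phrase inventory. The chapter derives multi-word terms from
# Turkish Wikipedia links; here a few are hard-coded purely for illustration.
PHRASES = {
    ("yapay", "zeka"),
    ("doğal", "dil", "işleme"),
}


def merge_consecutive_words(tokens, phrases, max_len=3):
    """Greedily join the longest known phrase starting at each position into one term."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No known phrase starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


def term_sentence_matrix(sentences):
    """Build an m x n matrix of raw term frequencies (terms x sentences)."""
    vocab = sorted({t for s in sentences for t in s})
    index = {t: j for j, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for col, sent in enumerate(sentences):
        for t in sent:
            A[index[t], col] += 1.0
    return A, vocab


def gong_liu_select(A, k):
    """Pick one sentence per top-k latent topic, in the style of Gong and Liu.

    A = U S Vt; row i of Vt weights every sentence on the i-th latent topic.
    Absolute values are used because singular-vector signs are arbitrary,
    and already chosen sentences are skipped (both are adaptations).
    """
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for row in Vt[:k]:
        for idx in np.argsort(-np.abs(row)):
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    return sorted(chosen)


def summarize(sentences, k, phrases=PHRASES):
    """Return k sentences in their original order. No stemming is applied.

    Note: str.lower() is not Turkish-aware (it maps I to i, not to ı), so a
    real system would need locale-aware casing and morphological analysis.
    """
    tokenized = [merge_consecutive_words(s.lower().split(), phrases)
                 for s in sentences]
    A, _ = term_sentence_matrix(tokenized)
    return [sentences[i] for i in gong_liu_select(A, k)]


if __name__ == "__main__":
    doc = [
        "Yapay zeka hızla gelişiyor",
        "Doğal dil işleme yapay zeka alanının bir koludur",
        "Metin özetleme doğal dil işleme uygulamasıdır",
    ]
    print(summarize(doc, 2))
```

Merging a phrase such as “doğal dil işleme” into one term means the LSA step sees a single dimension instead of three loosely related words, which is the intuition behind the preprocessing idea described above.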
Keywords
- Turkish Text Summarization
- Latent Semantic Analysis
- Turkish Wikipedia
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Güran, A., Bayazıt, N.G. (2012). A New Preprocessing Phase for LSA-Based Turkish Text Summarization. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25781-0_46
DOI: https://doi.org/10.1007/978-3-642-25781-0_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25780-3
Online ISBN: 978-3-642-25781-0
eBook Packages: Engineering, Engineering (R0)