A New Preprocessing Phase for LSA-Based Turkish Text Summarization

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 124)

Abstract

Text summarization is the process of identifying the most salient information in a document or a set of related documents. This paper analyzes the performance of a Turkish text summarization system that applies two algorithms based on Latent Semantic Analysis (LSA) with different preprocessing phases. One of the preprocessing methods, “Consecutive Words Detection”, is new: it uses Turkish Wikipedia links to represent related consecutive words as a single term, and it improves the performance of text summarization in Turkish.
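To make the idea of representing related consecutive words as a single term concrete, the following is a minimal sketch and not the authors' implementation. It assumes a set of multi-word phrases has already been collected (here the hypothetical `WIKI_PHRASES`, standing in for phrases taken from Turkish Wikipedia links) and that the input tokens are already normalized. The function name `merge_consecutive_words`, the greedy longest-match strategy, the `max_len` limit, and the underscore joiner are all illustrative assumptions; the abstract does not specify these details.

```python
def merge_consecutive_words(tokens, phrases, max_len=4, joiner="_"):
    """Merge runs of consecutive tokens that form a known phrase into one term.

    tokens  -- list of already-normalized word tokens
    phrases -- set of tuples, each tuple being a known multi-word phrase
    max_len -- longest phrase length (in words) to look for (assumed limit)
    """
    merged = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                match_len = n
                break
        if match_len:
            # Represent the whole phrase as a single term.
            merged.append(joiner.join(tokens[i:i + match_len]))
            i += match_len
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical phrase set; in the paper these come from Turkish Wikipedia links.
WIKI_PHRASES = {("yapay", "zeka"), ("metin", "özetleme")}

sentence = ["yapay", "zeka", "metin", "özetleme", "için", "kullanılır"]
print(merge_consecutive_words(sentence, WIKI_PHRASES))
```

In a pipeline of this kind, the merged tokens would then be used as the terms of the term-by-sentence matrix that an LSA-based summarizer decomposes, so that a phrase such as "yapay zeka" contributes one row instead of two.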

Keywords

  • Turkish Text Summarization
  • Latent Semantic Analysis
  • Turkish Wikipedia

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Güran, A., Bayazıt, N.G. (2012). A New Preprocessing Phase for LSA-Based Turkish Text Summarization. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25781-0_46

  • DOI: https://doi.org/10.1007/978-3-642-25781-0_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25780-3

  • Online ISBN: 978-3-642-25781-0

  • eBook Packages: Engineering, Engineering (R0)