Abstract
Automatic text summarization is a challenging task in natural language processing. In multilingual settings, particularly for low-resource, morphologically complex languages, summarization datasets are scarce and difficult to construct. In this work, we propose a novel technique that extracts Odia text from image files using optical character recognition (OCR) and summarizes the extracted text using extractive summarization techniques. We also performed a manual evaluation of summary quality to validate the approach. The proposed approach is found suitable for generating summarized Odia text, and the same technique can be extended to extractive summarization in other low-resource languages.
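As a rough illustration of the extractive step only, the following Python sketch scores each sentence by the average TF-IDF weight of its words and keeps the top-ranked sentences in their original order. It is a sketch under stated assumptions, not the system described in the chapter: the abstract does not specify the scoring method, so TF-IDF, the function names (`split_sentences`, `tokenize`, `summarize`), and the parameter `k` are all illustrative choices. The input is assumed to be plain text already produced by an OCR engine, with sentences ending in the danda (U+0964).

```python
import math
import re
from collections import Counter

DANDA = "\u0964"  # Odia sentence terminator
PUNCT = DANDA + ",.;:!?\"'()"

def split_sentences(text):
    # Assumption: sentences end with a danda, '?' or '!'.
    parts = re.split(r"(?<=[\u0964?!])\s*", text.strip())
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    # Whitespace tokenization keeps Odia vowel signs attached to their words.
    words = (w.strip(PUNCT) for w in sentence.split())
    return [w for w in words if w]

def summarize(text, k=2):
    """Return the k highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences containing each word.
    df = Counter(w for toks in tokens for w in set(toks))
    scores = []
    for toks in tokens:
        tf = Counter(toks)
        # Smoothed IDF; a word present in every sentence contributes 0.
        total = sum(c * math.log((1 + n) / (1 + df[w])) for w, c in tf.items())
        scores.append(total / len(toks) if toks else 0.0)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(ocr_text, k=3)` on the OCR output would return three sentences copied verbatim from the input, which is what makes the summary extractive rather than abstractive.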
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pattnaik, P., Mallick, D.K., Parida, S., Dash, S.R. (2020). Extractive Odia Text Summarization System: An OCR Based Approach. In: Dehuri, S., Mishra, B., Mallick, P., Cho, SB., Favorskaya, M. (eds) Biologically Inspired Techniques in Many-Criteria Decision Making. BITMDM 2019. Learning and Analytics in Intelligent Systems, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-39033-4_13
DOI: https://doi.org/10.1007/978-3-030-39033-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39032-7
Online ISBN: 978-3-030-39033-4
eBook Packages: Intelligent Technologies and Robotics (R0)