Abstract
Text summarization is the task of producing a shorter text from one or more source texts without losing their significant information. A text summarization tool compresses the text and displays only the important content to the user. With text summarization, decisions can be made in less time and the core of a document can be understood quickly. This paper focuses on an extractive approach and its implementation in Java. The extractive approach selects the significant sentences using a thematic approach. Before the thematic words are selected, Hindi stop-words are removed and stemming is applied to retrieve the root words of the sentences under consideration. Stop-word elimination removes the semantically null words from the input document, and stemming helps cluster together words that share the same radix (root) term. The system is based on an algorithm that scores each sentence by the occurrence of the radix of thematic words. The sentences with the highest scores are added to the summary. The generated summary is then processed further by removing extraneous phrases from the selected sentences, so as to bring them closer to a human-generated summary. The accuracy of the system can be tested with a technique called the Expert Game. In the Expert Game, experts underline and extract the most interesting or informative fragments of the text. The recall and precision of the system's summary are measured against the human extract. Based on this testing, the system is found to be 85% accurate.
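The pipeline described in the abstract (stop-word removal, stemming, scoring by thematic roots, selection of the top sentences, and precision/recall against a human extract) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `ThematicSummarizer`, its method names, the plain suffix-stripping stemmer, and the choice of "most frequent roots" as the thematic words are all assumptions made for the example. The stop-word set and suffix list are passed in as parameters, so a Hindi word list would have to be supplied by the caller. The phrase-removal step is not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical. */
final class ThematicSummarizer {
    private final Set<String> stopWords;
    private final List<String> suffixes; // assumed to be ordered longest first

    ThematicSummarizer(Set<String> stopWords, List<String> suffixes) {
        this.stopWords = stopWords;
        this.suffixes = suffixes;
    }

    /** Strips the first matching suffix, keeping at least two characters of the root. */
    String stem(String word) {
        for (String suffix : suffixes) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Splits on whitespace, drops stop-words, and stems the remaining tokens. */
    List<String> roots(String sentence) {
        List<String> out = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                out.add(stem(token));
            }
        }
        return out;
    }

    /** Returns the summarySize highest-scoring sentences, in document order. */
    List<String> summarize(List<String> sentences, int thematicCount, int summarySize) {
        // Count how often each root occurs in the whole document.
        Map<String, Integer> freq = new HashMap<>();
        List<List<String>> rootsPerSentence = new ArrayList<>();
        for (String sentence : sentences) {
            List<String> r = roots(sentence);
            rootsPerSentence.add(r);
            for (String root : r) {
                freq.merge(root, 1, Integer::sum);
            }
        }
        // Assumption: the thematic roots are the most frequent ones (ties broken alphabetically).
        List<String> ranked = new ArrayList<>(freq.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(freq.get(b), freq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        Set<String> thematic =
                new HashSet<>(ranked.subList(0, Math.min(thematicCount, ranked.size())));

        // Score each sentence by the number of thematic-root occurrences it contains.
        int[] score = new int[sentences.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            for (String root : rootsPerSentence.get(i)) {
                if (thematic.contains(root)) {
                    score[i]++;
                }
            }
            order.add(i);
        }
        // Highest score first; earlier sentences win ties.
        order.sort((a, b) -> score[b] != score[a]
                ? Integer.compare(score[b], score[a])
                : Integer.compare(a, b));
        List<Integer> chosen =
                new ArrayList<>(order.subList(0, Math.min(summarySize, order.size())));
        Collections.sort(chosen); // restore document order
        List<String> summary = new ArrayList<>();
        for (int i : chosen) {
            summary.add(sentences.get(i));
        }
        return summary;
    }

    /** Precision and recall of the system extract against a human extract: {precision, recall}. */
    static double[] precisionRecall(Set<String> system, Set<String> human) {
        Set<String> common = new HashSet<>(system);
        common.retainAll(human);
        double precision = system.isEmpty() ? 0.0 : (double) common.size() / system.size();
        double recall = human.isEmpty() ? 0.0 : (double) common.size() / human.size();
        return new double[] {precision, recall};
    }
}
```

In this sketch the input is already split into sentences; for Hindi text, splitting on the danda (।) would be a natural preprocessing step. The two size parameters (`thematicCount`, `summarySize`) are tuning knobs introduced for the example and are not values taken from the paper.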
Copyright information
© 2015 Springer India
About this paper
Cite this paper
Vimal Kumar, K., Yadav, D. (2015). An Improvised Extractive Approach to Hindi Text Summarization. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 339. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2250-7_28
DOI: https://doi.org/10.1007/978-81-322-2250-7_28
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2249-1
Online ISBN: 978-81-322-2250-7
eBook Packages: Engineering, Engineering (R0)