An Improvised Extractive Approach to Hindi Text Summarization

Vimal Kumar, K.; Yadav, Divakar

doi:10.1007/978-81-322-2250-7_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 339)

Abstract

Text summarization is the task of condensing one or more texts into a shorter text without losing their significant information. A text summarization tool compresses the text and displays only the important content to the user. With a summary, decisions can be made in less time and the core of the document can be understood quickly. This paper focuses on an extractive approach and its implementation in Java. The extractive approach selects significant sentences using a thematic approach. Before the thematic words are selected, Hindi stop-words are removed and stemming is applied to retrieve the root words of the sentences under consideration. Stop-word elimination removes the semantically null words from the input document, and stemming clusters together words that share the same radix (root) term. The system is based on an algorithm that scores each sentence by the occurrence of the radix of thematic words. The sentences with the highest scores are added to the summary. The generated summary is then processed further by removing extraneous phrases from the selected sentences, so as to bring them closer to a human-generated summary. The accuracy of the system can be tested using a technique called the Expert Game, in which experts underline and extract the most interesting or informative fragments of the text. The recall and precision of the system's summary are measured against the human extract. Based on this testing, the system is found to be 85% accurate.
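
The pipeline described in the abstract (stop-word removal, stemming, thematic-word scoring, sentence selection, and precision/recall against a human extract) can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name `ThematicSummarizer`, the toy suffix-stripping stemmer, the choice of the most frequent roots as thematic words, the tie-breaking rules, and the romanized sample sentences are all assumptions made for illustration. The phrase-removal step is omitted.

```java
import java.util.*;
import java.util.function.UnaryOperator;

final class ThematicSummarizer {

    /** Toy stemmer (assumption): strips one of a few romanized suffixes, keeping at least 3 letters. */
    static String toyStem(String word) {
        for (String suffix : new String[] {"on", "en", "e"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Lower-cases and tokenizes a sentence, drops stop-words, and stems the remaining tokens. */
    static List<String> roots(String sentence, Set<String> stopWords, UnaryOperator<String> stemmer) {
        List<String> result = new ArrayList<>();
        // Split on whitespace, ASCII punctuation, and the Devanagari danda (U+0964).
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[\\s\\p{Punct}\\u0964]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                result.add(stemmer.apply(token));
            }
        }
        return result;
    }

    /** Returns the indices of the selected sentences, in document order. */
    static List<Integer> summarize(List<String> sentences, Set<String> stopWords,
                                   UnaryOperator<String> stemmer, int themeCount, int summarySize) {
        List<List<String>> rooted = new ArrayList<>();
        Map<String, Integer> freq = new HashMap<>();
        for (String sentence : sentences) {
            List<String> r = roots(sentence, stopWords, stemmer);
            rooted.add(r);
            for (String root : r) {
                freq.merge(root, 1, Integer::sum);
            }
        }
        // Assumption: thematic words are the most frequent roots (ties broken alphabetically).
        List<String> byFreq = new ArrayList<>(freq.keySet());
        byFreq.sort((a, b) -> {
            int c = Integer.compare(freq.get(b), freq.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        Set<String> themes = new HashSet<>(byFreq.subList(0, Math.min(themeCount, byFreq.size())));

        // Score each sentence by the number of thematic roots it contains.
        int[] score = new int[sentences.size()];
        for (int i = 0; i < rooted.size(); i++) {
            for (String root : rooted.get(i)) {
                if (themes.contains(root)) {
                    score[i]++;
                }
            }
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < score.length; i++) {
            order.add(i);
        }
        // Stable sort: on equal scores the earlier sentence stays first.
        order.sort((a, b) -> Integer.compare(score[b], score[a]));
        List<Integer> picked = new ArrayList<>(order.subList(0, Math.min(summarySize, order.size())));
        Collections.sort(picked);
        return picked;
    }

    /** Precision and recall of system-selected sentence indices against a human extract. */
    static double[] precisionRecall(Set<Integer> system, Set<Integer> human) {
        Set<Integer> overlap = new HashSet<>(system);
        overlap.retainAll(human);
        double precision = system.isEmpty() ? 0.0 : (double) overlap.size() / system.size();
        double recall = human.isEmpty() ? 0.0 : (double) overlap.size() / human.size();
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        // Romanized toy sentences (assumption) so the source file stays ASCII-only.
        List<String> sentences = List.of(
                "Kisan khet mein kaam karta hai.",
                "Mausam accha hai.",
                "Kisanon ke khet bade hain.");
        Set<String> stopWords = Set.of("mein", "hai", "hain", "ke");
        List<Integer> picked = summarize(sentences, stopWords, ThematicSummarizer::toyStem, 2, 2);
        System.out.println(picked);
        double[] pr = precisionRecall(new HashSet<>(picked), Set.of(0, 1));
        System.out.println(pr[0] + " " + pr[1]);
    }
}
```

In this toy run the roots `kisan` and `khet` each occur twice and become the thematic words, so sentences 0 and 2 are selected and the program prints `[0, 2]`. Against a hypothetical human extract of sentences 0 and 1, precision and recall are both 0.5. A real system would replace `toyStem` with a proper Hindi stemmer operating on Devanagari suffixes and use a full Hindi stop-word list.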

Author information

Authors and Affiliations

Jaypee Institute of Information Technology, Noida, India
K. Vimal Kumar & Divakar Yadav

Corresponding author

Correspondence to K. Vimal Kumar.

Editor information

Editors and Affiliations

University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal
Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
Dean, Faculty of Engineering, Technology, University of Kalyani, Kalyani, West Bengal, India
Manas Kumar Sanyal
Engineering and Technological Studies, University of Kalyani, Kalyani, West Bengal, India
Partha Pratim Sarkar
Department of Computer Science & Engineering, University of Kalyani, Kalyani, India
Anirban Mukhopadhyay

About this paper

Cite this paper

Vimal Kumar, K., Yadav, D. (2015). An Improvised Extractive Approach to Hindi Text Summarization. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 339. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2250-7_28

DOI: https://doi.org/10.1007/978-81-322-2250-7_28
Published: 21 January 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2249-1
Online ISBN: 978-81-322-2250-7
eBook Packages: Engineering, Engineering (R0)
