A Novel Hybrid Text Summarization System for Punjabi Text

Gupta, Vishal; Kaur, Narvinder

doi:10.1007/s12559-015-9359-3

Published: 19 October 2015

Volume 8, pages 261–277, (2016)

Cognitive Computation

Abstract

Text summarization is the task of shortening a text document while retaining its overall meaning and information. A good summary should highlight the main concepts of the document. Many statistical, location-based and linguistic techniques are available for text summarization. This paper describes a novel hybrid technique for automatic summarization of Punjabi text. Punjabi is an official language of the state of Punjab in India, and very few linguistic resources are available for it. The proposed summarization system combines conceptual, statistical, location-based and linguistic features for Punjabi text. It uses four new location-based features and two new statistical features (an entropy measure and the Z score), and the results are encouraging. A support vector machine-based classifier is used to classify Punjabi sentences into summary and non-summary sentences and to handle imbalanced data, and the synthetic minority over-sampling technique is applied to over-sample the minority class. The results of the proposed system are compared with those of different baseline systems: its F score, Precision, Recall and ROUGE-2 score compare reasonably well with those of the baselines. Moreover, the summary quality of the proposed system is comparable to that of the gold summary.
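As a rough illustration of two ideas named in the abstract, the sketch below computes a standard Z score over a list of feature values and generates synthetic minority-class points by SMOTE-style interpolation between a sample and one of its nearest minority neighbours. This is a minimal sketch, not the authors' implementation: the function names `z_scores` and `smote_oversample`, the toy feature vectors, and the parameters `k` and `seed` are assumptions made for illustration, and the paper's exact feature definitions may differ.

```python
import math
import random


def z_scores(values):
    """Standard score (x - mean) / std for each value (population std).

    Assumption: this is the textbook Z score, not necessarily the exact
    formulation used in the paper.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical word frequencies within one document.
    print(z_scores([1, 2, 3, 10]))
    # Hypothetical 2-D feature vectors of "summary" (minority) sentences.
    summary_sentences = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    print(smote_oversample(summary_sentences, n_new=4))
```

In a full pipeline, the synthetic vectors would be appended to the minority class before training the classifier, so that summary and non-summary sentences are represented more evenly.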

Author information

Authors and Affiliations

University Institute of Engineering and Technology, Panjab University Chandigarh, Chandigarh, India
Vishal Gupta & Narvinder Kaur

Corresponding author

Correspondence to Vishal Gupta.

About this article

Cite this article

Gupta, V., Kaur, N. A Novel Hybrid Text Summarization System for Punjabi Text. Cogn Comput 8, 261–277 (2016). https://doi.org/10.1007/s12559-015-9359-3

Received: 02 October 2014
Accepted: 26 September 2015
Published: 19 October 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s12559-015-9359-3
