A Novel Hybrid Text Summarization System for Punjabi Text

Cognitive Computation

Abstract

Text summarization is the task of shortening text documents while retaining their overall meaning and information. A good summary should highlight the main concepts of a text document. Many statistical, location-based and linguistic techniques are available for text summarization. This paper describes a novel hybrid technique for automatic summarization of Punjabi text. Punjabi is an official language of the state of Punjab in India, and very few linguistic resources are available for it. The proposed summarization system is a hybrid of conceptual, statistical, location-based and linguistic features for Punjabi text. The system uses four new location-based features and two new statistical features (an entropy measure and a Z score), and the results are encouraging. A support vector machine-based classifier is also used to classify Punjabi sentences into summary and non-summary sentences and to handle imbalanced data; the synthetic minority over-sampling technique (SMOTE) is applied to over-sample the minority class. The results of the proposed system are compared with those of several baseline systems, and its F score, precision, recall and ROUGE-2 score compare reasonably well with those of the baselines. Moreover, the summary quality of the proposed system is comparable to that of the gold summary.
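
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, a single gold summary, and sentence-level scoring by sentence index. The function names `rouge_2` and `sentence_prf` are hypothetical and are not taken from the paper, whose exact evaluation setup is not given here.

```python
from collections import Counter


def bigrams(tokens):
    """Return the multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """ROUGE-2 recall, precision and F1 from clipped bigram overlap.

    Assumption: both texts are split on whitespace, which is a
    simplification of any real Punjabi tokenizer.
    """
    cand = bigrams(candidate.split())
    ref = bigrams(reference.split())
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


def sentence_prf(selected, gold):
    """Precision, recall and F score over sets of sentence indices.

    Assumption: a system sentence counts as correct only if the same
    sentence index appears in the gold summary.
    """
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f
```

For example, `sentence_prf([0, 2, 5], [0, 2, 3, 4])` scores a system that picked sentences 0, 2 and 5 against a gold summary made of sentences 0, 2, 3 and 4: two of its three choices are correct, and two of the four gold sentences are recovered.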

Author information

Corresponding author

Correspondence to Vishal Gupta.

About this article

Cite this article

Gupta, V., Kaur, N. A Novel Hybrid Text Summarization System for Punjabi Text. Cogn Comput 8, 261–277 (2016). https://doi.org/10.1007/s12559-015-9359-3

  • DOI: https://doi.org/10.1007/s12559-015-9359-3
