Panini: a transformer-based grammatical error correction method for Bangla

  • Original Article
Neural Computing and Applications

Abstract

Bangla grammatical error correction (BGEC) aims to automatically identify and correct syntactic, morphological, semantic, and punctuation errors in written Bangla text using computational models, improving the precision and fluency of the text. The task matters because it supports language learning, aids effective communication, and keeps written text clear and accurate, reducing the risk of ambiguity or unintended meaning. Prior work has tried to overcome the limitations of rule-based and statistical methods with machine learning and deep learning methods, aiming to improve accuracy by capturing complex linguistic patterns, contextual cues, and semantic nuances. In this study, we address the absence of a baseline for the task by developing a large-scale parallel corpus comprising 7.7M source-target pairs and by exploring the untapped potential of transformers. Alongside the corpus, we introduce Panini, an efficient Vaswani-style monolingual transformer-based Bangla grammatical error corrector that leverages transfer learning. Panini achieves state-of-the-art results on the task, surpassing BanglaT5 and T5-Small by 18.81% and 23.8% in accuracy and by 11.5 and 15.6 in SacreBLEU score, respectively. The empirical findings indicate that the method captures intricate linguistic rules and patterns better than the other approaches. Moreover, we evaluate the proposed method on the Bangla paraphrase task, where it also outperforms the previous state-of-the-art method. The BanglaGEC corpus and Panini, along with the baselines for BGEC and the Bangla paraphrase task, are publicly available at https://tinyurl.com/BanglaGEC.
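
To make the accuracy comparison concrete, the following is a minimal sketch of how a sentence-level accuracy score could be computed over source-target pairs. It assumes that accuracy means exact match between the predicted and the reference sentence, which the abstract does not state; the function names, the placeholder corrector, and the English stand-in sentences are all hypothetical and are not taken from the article or its released code.

```python
from typing import List, Tuple


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions identical to their reference after trimming whitespace.

    Assumption: accuracy is sentence-level exact match; the article may define it differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


# Hypothetical (erroneous source, corrected target) pairs; English stands in for Bangla.
pairs: List[Tuple[str, str]] = [
    ("he go to school", "he goes to school"),
    ("she are happy", "she is happy"),
]


def identity_corrector(sentence: str) -> str:
    """Placeholder model that returns its input unchanged (not the Panini model)."""
    return sentence


predictions = [identity_corrector(src) for src, _ in pairs]
references = [tgt for _, tgt in pairs]
score = exact_match_accuracy(predictions, references)
```

Swapping `identity_corrector` for a trained model would give that model's score on the same pairs. Exact match is a strict criterion, which is one reason an overlap-based metric such as SacreBLEU is usually reported alongside it.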

Data availability

Datasets will be made available on request.

Notes

  1. https://github.com/sagorbrur/bnlp.

Acknowledgements

This research is funded by the Institute of Advanced Research (Grant No. UIU/IAR/02/2021/SE/22), United International University, Bangladesh.

Author information

Corresponding author

Correspondence to Swakkhar Shatabda.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Bangla grammar follows the rules set by Panini, one of the earliest linguists and grammarians.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hossain, N., Bijoy, M.H., Islam, S. et al. Panini: a transformer-based grammatical error correction method for Bangla. Neural Comput & Applic 36, 3463–3477 (2024). https://doi.org/10.1007/s00521-023-09211-7

  • DOI: https://doi.org/10.1007/s00521-023-09211-7
