Categorization of Bangla Medical Text Documents Based on Hybrid Internal Feature

  • Conference paper
Computational Intelligence, Communications, and Business Analytics (CICBA 2018)

Abstract

This paper develops an automatic text categorization system that separates Bangla medical text documents from non-medical ones using two primary features: word length and the presence of English equivalent words in the text. It first shows that, given the word length and the number of English equivalent words in a text, Bangla medical documents can be distinguished from text documents of any other domain. A Stochastic Gradient Descent (SGD) classifier is used and achieves an accuracy of 97.75%. The system is also compared against other commonly used classifiers, and SGD is observed to outperform them.
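The abstract does not say how the two features are computed or how the classifier is configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "word length" is summarized as the mean word length of a document, that "English equivalent words" can be detected with a small lookup lexicon of transliterated terms (the `ENGLISH_EQUIVALENTS` set is hypothetical), and that the SGD classifier is a logistic regression trained one sample at a time. The function names, learning rate, and epoch count are likewise illustrative assumptions.

```python
import math
import random

# Hypothetical lexicon of Bangla transliterations of English terms (an assumption).
ENGLISH_EQUIVALENTS = {"ইনসুলিন", "ভাইরাস", "ক্যান্সার"}


def extract_features(text, lexicon=ENGLISH_EQUIVALENTS):
    """Return [mean word length, share of words found in the lexicon]."""
    words = text.split()
    if not words:
        return [0.0, 0.0]
    mean_length = sum(len(w) for w in words) / len(words)
    english_share = sum(w in lexicon for w in words) / len(words)
    return [mean_length, english_share]


def fit_scaler(X):
    """Per-feature mean and standard deviation, used to standardize inputs."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = []
    for col, m in zip(zip(*X), means):
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        stds.append(sd if sd > 0 else 1.0)  # avoid division by zero
    return means, stds


def scale(x, means, stds):
    """Standardize one feature vector with the fitted means and deviations."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def train_sgd(X, y, epochs=200, lr=0.1, seed=0):
    """Fit logistic regression with per-sample SGD; y holds 1 (medical) or 0."""
    means, stds = fit_scaler(X)
    Xs = [scale(x, means, stds) for x in X]
    w, b = [0.0] * len(Xs[0]), 0.0
    order = list(range(len(Xs)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, Xs[i])) + b
            error = 1.0 / (1.0 + math.exp(-z)) - y[i]  # log-loss gradient w.r.t. z
            w = [wj - lr * error * xj for wj, xj in zip(w, Xs[i])]
            b -= lr * error
    return {"w": w, "b": b, "means": means, "stds": stds}


def predict(model, x):
    """Return 1 (medical) when the linear score is non-negative, else 0."""
    xs = scale(x, model["means"], model["stds"])
    z = sum(wj * xj for wj, xj in zip(model["w"], xs)) + model["b"]
    return 1 if z >= 0 else 0
```

Under these assumptions, a document would be classified by calling `predict(model, extract_features(document_text))` on a model trained from labeled feature vectors. The features are standardized before training because plain SGD tends to converge slowly when inputs sit on very different scales, as a mean word length and a ratio between 0 and 1 do.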

Acknowledgement

One of the authors would like to thank the Department of Science and Technology (DST) for its support in the form of an INSPIRE fellowship.

Author information

Corresponding author

Correspondence to Ankita Dhar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dhar, A., Dash, N.S., Roy, K. (2019). Categorization of Bangla Medical Text Documents Based on Hybrid Internal Feature. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_15

  • DOI: https://doi.org/10.1007/978-981-13-8581-0_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-8580-3

  • Online ISBN: 978-981-13-8581-0

  • eBook Packages: Computer Science, Computer Science (R0)
