Categorization of Bangla Medical Text Documents Based on Hybrid Internal Feature

Dhar, Ankita; Dash, Niladri Sekhar; Roy, Kaushik

doi:10.1007/978-981-13-8581-0_15

Conference paper
First Online: 26 June 2019

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1031)

Abstract

This paper develops an automatic text categorization system that separates Bangla medical text documents from non-medical ones using two primary features: word length and the presence of English equivalent words in the documents. It first shows that, based on word length and the number of English equivalent words present in a text, Bangla medical documents can be distinguished from text documents of any other domain. A Stochastic Gradient Descent (SGD) classification algorithm is used, achieving an accuracy of 97.75%. The system is also compared against other commonly used classifiers, and SGD is observed to outperform them.
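The two features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define either feature precisely, so the sketch assumes that "word length" means the mean token length in Unicode code points and that an "English equivalent word" can be approximated by a token written entirely in Latin script. The four toy sentences, the function names, and the hand-written logistic-loss SGD loop are all hypothetical and exist only to show the idea on a tiny example.

```python
import math
import random
import re

LATIN_WORD = re.compile(r"[A-Za-z]+")  # assumption: Latin script marks an English word
PUNCT = ".,;:!?()[]\"'।"                # common punctuation, including the Bangla danda

def extract_features(text):
    """Return [mean token length, share of Latin-script tokens] for one document."""
    tokens = [t.strip(PUNCT) for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return [0.0, 0.0]
    # len() counts Unicode code points, so Bangla vowel signs count as characters.
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    latin_share = sum(1 for t in tokens if LATIN_WORD.fullmatch(t)) / len(tokens)
    return [mean_len, latin_share]

def fit_scaler(rows):
    """Per-feature mean and standard deviation (std falls back to 1.0 if zero)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_sgd(X, y, epochs=200, lr=0.1, seed=0):
    """Logistic regression fitted by SGD: one weight update per training example."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 (medical) if the linear score is non-negative, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical toy corpus: label 1 = medical, 0 = non-medical.
docs = [
    "রোগীর diabetes এবং hypertension আছে",
    "ডাক্তার antibiotic ও paracetamol দিলেন",
    "আমি আজ বাজারে যাব",
    "সে কাল গান গাইবে",
]
labels = [1, 1, 0, 0]

raw = [extract_features(d) for d in docs]
means, stds = fit_scaler(raw)
X = [scale(r, means, stds) for r in raw]
w, b = train_sgd(X, labels)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

Both features are standardized before training because SGD tends to be sensitive to feature scale; whether the paper applies any such preprocessing is not stated in the abstract. A corpus of four sentences says nothing about real accuracy, so the reported 97.75% should be read only against the authors' own data and setup.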

Acknowledgement

One of the authors would like to thank the Department of Science and Technology (DST) for support in the form of an INSPIRE fellowship.

Author information

Authors and Affiliations

Department of Computer Science, West Bengal State University, Kolkata, West Bengal, India
Ankita Dhar & Kaushik Roy
Linguistic Research Unit, Indian Statistical Institute, Kolkata, India
Niladri Sekhar Dash

Corresponding author

Correspondence to Ankita Dhar.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Department of Computer Science and Engineering, Assam University, Silchar, Assam, India
Somnath Mukhopadhyay
Department of Computer and Systems Sciences, Visva Bharati University, Santiniketan, West Bengal, India
Paramartha Dutta
Department of Computer Science and Engineering, Kalyani Government Engineering College, Kalyani, West Bengal, India
Kousik Dasgupta

Cite this paper

Dhar, A., Dash, N.S., Roy, K. (2019). Categorization of Bangla Medical Text Documents Based on Hybrid Internal Feature. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_15

DOI: https://doi.org/10.1007/978-981-13-8581-0_15
Published: 26 June 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8580-3
Online ISBN: 978-981-13-8581-0
eBook Packages: Computer Science, Computer Science (R0)
