Building a Language Data Set in Telugu Using Machine Learning Techniques to Address Suicidal Ideation and Behaviors in Adolescents

Soumya, K.; Garg, Vijay Kumar

doi:10.1007/978-981-16-3067-5_1

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 778)

Abstract

Taking one’s own life is a tragic reaction to stressful life situations. The number of suicides in Telangana increases noticeably every year [1], and most of those affected are adolescents and young people. There is therefore an urgent need for research on suicidal ideation and on preventive methods that support mental health professionals and psychotherapists, and this paper aims to develop technological solutions to the problem. Suicides can be prevented if the mental health condition of a person with ideation can be identified and its severity predicted at an early stage [2]. In this paper, we apply machine learning algorithms to identify persons with suicidal ideation from data recorded, in the form of textual questionnaires, during an adolescent’s visit to a mental health professional. The data are recorded in the native Telugu language during the session, as most cases involve people who are illiterate [1, 3]. Classifying patient test data more accurately therefore requires a Telugu language corpus of ideation. This paper offers insight into the creation of a suicidal-language, or ideation, corpus in Telugu.

References

1. Nilesh V (2019) Telangana has third-highest suicide rate in India. NCRB. https://www.newindianexpress.com/states/telangana/2019/nov/11/telangana-has-third-highest-suicide-rate-in-india-ncrb-2060087.html
2. Choudhary N, Singh R, Bindlish I, Shrivastava M (2018a) Emotions are universal: learning sentiment based representations of resource-poor languages using siamese networks. arXiv preprint arXiv:1804.00805
3. Rohit PS, State records highest suicide rate in country. https://www.thehindu.com/news/national/telangana/state-records-highest-suicide-rate-in-country/article8433720.ece#comments_14219168
4. Naidu R, Bharti SK, Babu KS, Mohapatra RK (2017) Sentiment analysis using Telugu SentiWordNet. In: 2017 international conference on wireless communications, signal processing and networking (WiSPNET), Chennai, pp 666–670. https://doi.org/10.1109/wispnet.2017.8299844
5. Magdum D, Dubey MS, Patil T, Shah R, Belhe S, Kulkarni M (2015) Methodology for designing and creating Hindi speech corpus. In: 2015 international conference on signal processing and communication engineering systems, Guntur, pp 336–339. https://doi.org/10.1109/spaces.2015.7058279
6. Gangula RR, Mamidi R (2018) Resource creation towards automated sentiment analysis in Telugu (a low resource language) and integrating multiple domain sources to enhance sentiment prediction. In: Language resources and evaluation conference, Miyazaki, Japan
7. Srirangam V, Abhinav A, Singh V, Shrivastava M (2019) Corpus creation and analysis for named entity recognition in Telugu-English code-mixed social media data. In: Proceedings of the 57th annual meeting of the association for computational linguistics: student research workshop. https://doi.org/10.18653/v1/p19-2025
8. Abdelali A, Guzman F, Sajjad H, Vogel S (2014) The AMARA corpus: building parallel language resources for the educational domain. In: Proceedings of the ninth international conference on language resources and evaluation (LREC’14). European Language Resources Association (ELRA), Reykjavik, Iceland, pp 1856–1862
9. Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. SAGE J. https://doi.org/10.1177/0265532217710675
10. Choi Y, Wiebe J (2014) Effectwordnet: sense-level lexicon acquisition for opinion inference. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1181–1191
11. Wołk K, Marasek K (2014) A sentence meaning based alignment method for parallel text corpora preparation. Adv Intell Syst Comput 275:107–114. arXiv:1509.09090
12. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 785–794
13. Aguilar WG, Alulema D, Limaico A, Sandoval D (2017) Development and verification of a verbal corpus based on natural language for Ecuadorian dialect. In: IEEE 11th international conference on semantic computing (ICSC), San Diego, CA, 2017, pp 515–519. https://doi.org/10.1109/icsc.2017.82
14. Choudhary N, Singh R, Bindlish I, Shrivastava M (2018b) Sentiment analysis of code-mixed languages leveraging resource rich languages. arXiv preprint arXiv:1804.00806

Author information

Authors and Affiliations

Computer Science and Engineering Department, Lovely Professional University, Phagwara, Punjab, India
K. Soumya & Vijay Kumar Garg
VBIT, Hyderabad, India
K. Soumya

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Ankur Choudhary
Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Arun Prakash Agrawal
Asia Pacific Centre for Analytics (APCA), Asia Pacific University of Technology and Innovation (APU), Kuala Lumpur, Malaysia
Rajasvaran Logeswaran
Information Technology, University of South Florida Sarasota–Manatee Campus, Sarasota, FL, USA
Bhuvan Unhelkar

About this paper

Cite this paper

Soumya, K., Garg, V.K. (2021). Building a Language Data Set in Telugu Using Machine Learning Techniques to Address Suicidal Ideation and Behaviors in Adolescents. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_1

DOI: https://doi.org/10.1007/978-981-16-3067-5_1
Published: 27 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3066-8
Online ISBN: 978-981-16-3067-5
eBook Packages: Computer Science, Computer Science (R0)