Galo Lexical Tone Recognition Using Machine Learning Approach

Kamdak, Bomken; Taye, Gom; Bhattacharjee, Utpal

doi:10.1007/978-981-99-7093-3_23


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 798)

Included in the following conference series:

International Conference on Image Processing and Capsule Networks


Abstract

In this paper, we present the development of a Galo speech corpus and the design of a Galo lexical tone recognizer. The Galo language is spoken by the Galo people residing in Arunachal Pradesh, a frontier state of North East India. Galo belongs to the Tani branch of the Tibeto–Burman language family. Galo exhibits two discernible lexical tones: High/Plain, characterized by a moderately high-level pitch contour and Low/Tense, characterized by a descending pitch contour. The speech database used in this study contains Galo phrases spoken by 27 individuals, comprising 13 male and 14 female speakers aged between 20 and 50 years. The recording of each speaker encompasses a phonetically diverse script consisting of twenty Galo tonal words and all their possible tonal variations. The work aims to automatically recognize these lexical tones by employing seven features extracted from the fundamental frequency (F0) contour. Two classification models based on machine learning, specifically support vector machine (SVM) and random forest (RF), were employed to identify lexical tones. The RF-based recognizer achieves a recognition accuracy of 90.42%, while the SVM-based recognizer surpasses that with an accuracy of 92.31%.
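The abstract does not list the seven F0-derived features, so the following is only a minimal sketch of the general idea: reducing an F0 contour to a fixed-length vector that a classifier such as an SVM or random forest could consume. The seven descriptors chosen here (mean, standard deviation, minimum, maximum, range, slope, and end-minus-start difference), the function name `f0_features`, and the convention that a value of 0 marks an unvoiced frame are all assumptions made for illustration, not the authors' feature set.

```python
from statistics import mean, pstdev


def f0_features(f0_hz):
    """Return seven illustrative descriptors of an F0 contour given in Hz.

    Hypothetical feature set: the paper's actual seven features are not
    stated in the abstract.
    """
    # Assumption: frames with F0 <= 0 are unvoiced and are ignored.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    n = len(voiced)
    mu = mean(voiced)

    # Least-squares slope of F0 against frame index (Hz per frame).
    t_mean = (n - 1) / 2
    num = sum((i - t_mean) * (f - mu) for i, f in enumerate(voiced))
    den = sum((i - t_mean) ** 2 for i in range(n))
    slope = num / den

    return {
        "mean": mu,
        "std": pstdev(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
        "slope": slope,
        "end_minus_start": voiced[-1] - voiced[0],
    }


# Two synthetic contours: a roughly level one and a steadily falling one.
level = f0_features([200, 201, 199, 200, 200])
falling = f0_features([210, 200, 190, 180, 170])
```

On these synthetic contours the slope and end-minus-start values are near zero for the level contour and clearly negative for the falling one, which is the kind of contrast a level versus descending tone distinction would rely on. Real contours would first need pitch tracking and probably speaker normalization, neither of which is shown here.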



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Rajiv Gandhi University, Itanagar, Arunachal Pradesh, 791112, India
Bomken Kamdak, Gom Taye & Utpal Bhattacharjee


Corresponding author

Correspondence to Utpal Bhattacharjee.

Editor information

Editors and Affiliations

Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
Subarna Shakya
Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
João Manuel R. S. Tavares
Universidad de Castilla-La Mancha, Albacete, Albacete, Spain
Antonio Fernández-Caballero
Department of Computer Science, International Hellenic University, Kavala, Greece
George Papakostas


About this paper

Cite this paper

Kamdak, B., Taye, G., Bhattacharjee, U. (2023). Galo Lexical Tone Recognition Using Machine Learning Approach. In: Shakya, S., Tavares, J.M.R.S., Fernández-Caballero, A., Papakostas, G. (eds) Fourth International Conference on Image Processing and Capsule Networks. ICIPCN 2023. Lecture Notes in Networks and Systems, vol 798. Springer, Singapore. https://doi.org/10.1007/978-981-99-7093-3_23


DOI: https://doi.org/10.1007/978-981-99-7093-3_23
Published: 18 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7092-6
Online ISBN: 978-981-99-7093-3
eBook Packages: Intelligent Technologies and Robotics (R0)
