Investigating the Recognition of Non-articulatory Sounds by Using Statistical Tests and Support Vector Machine
People with articulation and phonological disorders need training to plan and execute speech sounds. Compared to other children, children with Down syndrome have significantly delayed speech development because of developmental disabilities, mainly apraxia of speech. In practice, speech therapists plan and conduct training of articulatory and non-articulatory sounds, such as blowing and lip popping, to assist speech production. Mobile applications can be integrated into clinical treatment to transcend the boundaries of clinics and schedules and thereby reach more people at any time. Artificial intelligence and machine learning techniques can improve this kind of application. The aim of this pilot study is to assess speech recognition methods that prioritize the training of sounds for speech production, particularly non-articulatory sounds. These methods apply Mel-Frequency Cepstral Coefficients and the Laplace transform to extract features, and traditional statistical tests and a Support Vector Machine (SVM) to recognize sounds. This study also reports experimental results on the effectiveness of the methods on a set of 197 sounds. Overall, the SVM provides higher accuracy than the statistical tests.
Keywords: Delayed speech development; Speech recognition methods; Machine learning; Automatic speech recognition
The authors are grateful to CNPq (442533/2016-0) and FAPESP (2016/13206-4) for the funding. We would also like to thank Maria Roberta Cantarelli, Myrian Neves, Thais Moretti, Aline Camargo, and the individuals with DS for their participation.