Abstract
This study compared the speech recognition performance of cloud-based Open APIs on Korean phonological rules and analyzed the recognition characteristics of those rules. In the experiment, Kakao and MS showed good speech recognition performance. By phonological rule, Kakao performed well in all areas except nasalization and flat stop sound formation in the final syllable. Across the phonological rules, recognition was good for /l/ nasalization and /h/ deletion. Recognition performance on words subject to phonological rules accounted for a very high percentage of overall word recognition performance, and performance on phonological rules differed more among companies than between speakers. We hope this study contributes to improving the performance of cloud companies' speech recognition systems on Korean phonological rules, and we expect it to help developers select an Open API when building speech recognition applications.
Rights and permissions
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Yoo, H.J., Seo, S., Im, S.W. et al. The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API. Int J Netw Distrib Comput 9, 10–18 (2021). https://doi.org/10.2991/ijndc.k.201218.005
DOI: https://doi.org/10.2991/ijndc.k.201218.005