The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API

Yoo, Hyun Jae; Seo, Sungwoong; Im, Sun Woo; Gim, Gwang Yong

doi:10.2991/ijndc.k.201218.005

Research Article
Open access
Published: 08 January 2021

Volume 9, pages 10–18, (2021)

International Journal of Networked and Distributed Computing

Hyun Jae Yoo¹, Sungwoong Seo¹, Sun Woo Im² & Gwang Yong Gim¹

Abstract

This study compared the speech recognition performance of cloud-based Open APIs on Korean phonological rules and analyzed the recognition characteristics of each rule. In the experiment, Kakao and MS showed good speech recognition performance. By phonological rule, Kakao performed well in all areas except nasalization and flat stop sound formation in the final syllable. Across the phonological rules, recognition performance was good for /l/ nasalization and /h/ deletion. Recognition performance on words subject to phonological rules accounted for a very high proportion of recognition performance over all words, and performance on phonological rules differed more between companies than between speakers. This study aims to contribute to improving the performance of cloud providers' speech recognition systems on Korean phonological rules, and it is expected to help developers select an Open API when building speech recognition applications.

Author information

Authors and Affiliations

Department of IT Policy and Management, Graduate School, Soongsil University, Seoul, Korea
Hyun Jae Yoo, Sungwoong Seo & Gwang Yong Gim
Graduate School of Korean Language and Literature, Soongsil University, Seoul, Korea
Sun Woo Im

Corresponding author

Correspondence to Gwang Yong Gim.

Rights and permissions

This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

About this article

Cite this article

Yoo, H.J., Seo, S., Im, S.W. et al. The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API. Int J Netw Distrib Comput 9, 10–18 (2021). https://doi.org/10.2991/ijndc.k.201218.005

Received: 09 October 2020
Accepted: 18 November 2020
Published: 08 January 2021
Issue Date: January 2021
DOI: https://doi.org/10.2991/ijndc.k.201218.005
