
International Journal of Speech Technology

Volume 22, Issue 4, pp 971–977

A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system

  • Hyok-Chol Ri

Abstract

In large vocabulary continuous speech recognition (LVCSR), choosing an appropriate recognition unit is important for improving system performance. In Korean continuous speech recognition, the morph rather than the word is generally used as the recognition unit because Korean is agglutinative, and good performance is obtained by combining high-frequency morph sequences, which, however, leads to a larger vocabulary and a high out-of-vocabulary (OOV) rate. Sub-lexical units such as syllables and graphones are widely used for inflectional languages, but they have not been introduced successfully into Korean speech recognition because they carry little linguistic information. In this paper, we investigate the use of the syllable unit to resolve the mismatch between recognition unit and vocabulary size that frequently arises in Korean large vocabulary speech recognition. We apply local segmentation into syllables based on morphological statistics and perform experiments using a language model (LM) constructed from mixed unit types: morphemes, combined morphemes and syllables. The proposed model yields an absolute reduction of around 0.4% in word error rate (WER) compared with a traditional LM consisting of morphemes and combined morphemes.
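
The idea of a mixed-unit vocabulary can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not state the segmentation criterion, so the sketch assumes a simple frequency threshold (the hypothetical parameter `min_count`) and a hypothetical continuation marker `+`. Morphs seen often enough stay whole, and all other morphs are split into syllables, relying on the fact that one precomposed Hangul character is one syllable.

```python
import unicodedata
from collections import Counter


def build_vocab(corpus, min_count):
    """Return the morphs that occur at least min_count times.

    corpus is a list of sentences, each a list of morph strings.
    min_count is an assumed threshold, not a value from the paper.
    """
    counts = Counter(morph for sentence in corpus for morph in sentence)
    return {morph for morph, count in counts.items() if count >= min_count}


def to_mixed_units(sentence, vocab, marker="+"):
    """Keep in-vocabulary morphs whole and split the rest into syllables.

    Every syllable except the last one of a split morph carries the
    marker, so the original morph can be rejoined after decoding.
    """
    units = []
    for morph in sentence:
        if morph in vocab:
            units.append(morph)
            continue
        # NFC composes Hangul jamo so that one character is one syllable.
        syllables = list(unicodedata.normalize("NFC", morph))
        if not syllables:
            continue
        units.extend(s + marker for s in syllables[:-1])
        units.append(syllables[-1])
    return units


# Toy corpus, invented for illustration only.
corpus = [
    ["학교", "에", "갔다"],
    ["학교", "에", "왔다"],
    ["도서관", "에", "갔다"],
]
vocab = build_vocab(corpus, min_count=2)
mixed = to_mixed_units(["도서관", "에", "갔다"], vocab)
print(mixed)
```

With this toy corpus the rare morph 도서관 is broken into three syllable units while the frequent morphs are kept intact. Text rewritten in this way could then be passed to an n-gram toolkit to train an LM over mixed units, with the marker used to restore morph boundaries in the recognizer output.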

Keywords

Recognition unit · Language model · Morpheme · Syllable

Notes

Acknowledgements

We thank Dr. Kim and Prof. Ri for helpful discussions, and the anonymous reviewers and editors for their many invaluable comments and suggestions for improving this paper.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. College of Information Science, KIM IL SUNG University, Pyongyang, Democratic People’s Republic of Korea
