Speech Activity Detection for Deaf People: Evaluation on the Developed Smart Solution Prototype

  • Ales Berger
  • Filip Maly
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 830)

Abstract

This research presents a relatively new approach: a smart solution that emerged from earlier work combining Google Glass with available speech detection services. Over the past year the authors developed, tested and evaluated the prototype in order to determine which service delivers better results than third-party speech detection services such as Google Speech API or IBM Watson Speech To Text. These findings should significantly help the authors with data evaluation and testing of the developed smart solution. The starting point is a functional basic solution, a working and usable prototype, which nevertheless still has shortcomings to address. To achieve the best possible results, the authors added another element to the solution. A challenging problem in this domain concerns substantial data savings, server load and detection quality, and it opens space for further improvements, follow-up research and testing. The added element, the Hidden Markov Model, is a statistical technique that has been used in speech recognition applications for the last twenty years. The authors examined many articles and scientific sources in order to find the most efficient approach to speech recognition usable in the developed prototype (and in this article).
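
The role of the Hidden Markov Model can be illustrated with a short, self-contained R sketch of a two-state (silence/speech) voice activity detector. This is an assumption-laden illustration rather than the authors' actual pipeline: it relies on the CRAN packages tuneR and HMM, a hypothetical input file sample.wav, a single log-energy feature per 25 ms frame, and made-up start, transition and emission probabilities.

# Minimal HMM-based speech activity detection sketch in R.
# Assumptions: the input file, frame length and all probabilities
# below are illustrative values, not the authors' tuned parameters.
library(tuneR)   # WAV input
library(HMM)     # discrete Hidden Markov Model

wav   <- readWave("sample.wav")              # hypothetical input file
x     <- wav@left / 32768                    # normalise 16-bit samples
frame <- round(0.025 * wav@samp.rate)        # 25 ms frames
nfrm  <- floor(length(x) / frame)

# Short-term log-energy per frame, discretised into two symbols.
energy <- sapply(seq_len(nfrm), function(i) {
  seg <- x[((i - 1) * frame + 1):(i * frame)]
  10 * log10(sum(seg^2) + 1e-12)
})
obs <- ifelse(energy > median(energy), "high", "low")

# Two hidden states (silence / speech) with assumed probabilities.
hmm <- initHMM(States  = c("sil", "speech"),
               Symbols = c("low", "high"),
               startProbs    = c(0.7, 0.3),
               transProbs    = matrix(c(0.9, 0.1,
                                        0.1, 0.9), 2, byrow = TRUE),
               emissionProbs = matrix(c(0.8, 0.2,
                                        0.2, 0.8), 2, byrow = TRUE))

# Most likely state sequence marks speech vs. non-speech frames.
path <- viterbi(hmm, obs)
speech_frames <- which(path == "speech")

Frames labelled as speech by the Viterbi path are the only ones that would need to be forwarded to a third-party recogniser such as Google Speech API or IBM Watson Speech To Text, which is where the data savings, lower server load and improved detection quality mentioned above would come from.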

Keywords

Speech and natural language processing · Voice detection · Smart device · Smart solution · Android OS · Deafness · R language · RStudio

Acknowledgements

This work and the contribution were supported by a project of the Students Grant Agency at the Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic. Ales Berger is a student member of the research team.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Informatics and Management, University of Hradec Kralove, Hradec Kralove, Czech Republic
