Circuits, Systems, and Signal Processing, Volume 34, Issue 4, pp. 1279–1304

Mouth State Detection From Low-Frequency Ultrasonic Reflection


Abstract

This paper develops, simulates, and experimentally evaluates a novel method based on non-contact low-frequency (LF) ultrasound which can determine, from airborne reflection, whether the lips of a subject are open or closed. The method accurately distinguishes between open and closed lip states using a low-complexity detection algorithm and is highly robust to interfering audible noise. A novel voice activity detector built on the proposed method is implemented and shown to detect voice activity with high accuracy, even in the presence of high levels of background noise. The lip state detector is evaluated at a number of angles of incidence to the mouth and under various background noise conditions. The underlying technique relies upon an inaudible LF ultrasonic excitation, generated in front of the face of the user, which either reflects back from the face as a simple echo when the mouth is closed or resonates inside the open mouth and vocal tract, altering the spectral response of the reflected wave. The difference between these echo and resonance behaviours forms the basis for automated lip opening detection, i.e. determining whether the mouth is open or closed at the lips. Beyond this, potential applications include voice-generation prostheses for speech-impaired patients, hands-free control of the electrolarynx and similar rehabilitation devices, silent speech interfaces, and speech authentication.
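The echo-versus-resonance contrast described above lends itself to a simple spectral comparison. The Python sketch below is a minimal illustration of that idea, not the paper's published detector: the function name, the 20–40 kHz band, the Welch window length, and the 3 dB threshold are all assumptions chosen for the example. It scores a received reflection against a stored closed-mouth echo reference and declares the lips open when the log-spectral distance exceeds the threshold.

```python
import numpy as np
from scipy.signal import welch

def lip_state_from_reflection(reflection, reference_echo, fs,
                              band=(20e3, 40e3), threshold_db=3.0):
    """Classify lip state from an airborne LF-ultrasonic reflection.

    Illustrative sketch only: compares the spectrum of the received
    reflection against a stored closed-mouth echo reference inside the
    excitation band; vocal-tract resonances reshape that spectrum when
    the lips are open.
    """
    # Power spectra of the received reflection and the closed-mouth
    # reference (fs must exceed twice the top of the band, e.g. 96 kHz).
    f, p_rx = welch(reflection, fs=fs, nperseg=1024)
    _, p_ref = welch(reference_echo, fs=fs, nperseg=1024)

    # Restrict the comparison to the inaudible LF-ultrasonic band, so
    # audible background noise cannot influence the decision.
    mask = (f >= band[0]) & (f <= band[1])
    eps = 1e-12  # guard against log of zero

    # RMS log-spectral distance (dB) between reflection and reference:
    # a plain echo stays close to the reference; a resonating open
    # vocal tract does not.
    d = 10.0 * np.log10((p_rx[mask] + eps) / (p_ref[mask] + eps))
    distance_db = float(np.sqrt(np.mean(d ** 2)))

    return ("open" if distance_db > threshold_db else "closed"), distance_db
```

Comparing against a calibrated closed-mouth reference, rather than an absolute spectral template, keeps such a decision rule low-complexity and, because the comparison band lies above the audible range, inherently robust to audible interference, consistent with the robustness the paper reports.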

Keywords

Lip state detection · Low-frequency ultrasound · Mouth state detection · Speech activity detection · Voice activity detection


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

1. National Engineering Laboratory of Speech and Language Information Processing, The University of Science & Technology of China, Hefei, China
