Mouth State Detection From Low-Frequency Ultrasonic Reflection

Abstract

This paper develops, simulates and experimentally evaluates a novel non-contact method that uses airborne low-frequency (LF) ultrasound reflection to determine whether the lips of a subject are open or closed. The method accurately distinguishes between open and closed lip states using a low-complexity detection algorithm, and is highly robust to interfering audible noise. A novel voice activity detector is implemented and evaluated using the proposed method, and is shown to detect voice activity with high accuracy, even in the presence of high levels of background noise. The lip state detector is evaluated at a number of angles of incidence to the mouth and under various conditions of background noise. The underlying mouth state detection technique relies upon an inaudible LF ultrasonic excitation generated in front of the face of a user. When the mouth is closed, the excitation reflects back from the face as a simple echo; when the mouth is open, it resonates inside the mouth and vocal tract, which alters the spectral response of the reflected wave. The difference between echo and resonance behaviours is used as the basis for automated lip opening detection, that is, determining whether the mouth is open or closed at the lips. Beyond voice activity detection, potential applications include voice generation prostheses for speech-impaired patients and hands-free control of electrolarynx and similar rehabilitation devices. The method is also applicable to silent speech interfaces and may be useful for speech authentication.
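The contrast between echo and resonance can be illustrated with a rough plane-wave calculation. The sketch below is not the paper's simulation: it uses the sound speeds given in the Notes, while the tissue and air densities, the 0.17 m tube length, and the 20 to 24 kHz analysis band are assumptions chosen only for illustration. It estimates how strongly an air/tissue boundary reflects (the closed-mouth echo) and how many quarter-wave resonances an idealised uniform tube, closed at one end, would place inside the assumed band (a crude stand-in for the open mouth and vocal tract).

```python
# Sound speeds follow the Notes; every other constant here is an assumption.
C_AIR = 343.0         # m/s, speed of sound in air
C_MUSCLE = 1600.0     # m/s, speed of sound in muscle
RHO_AIR = 1.2         # kg/m^3, assumed density of air
RHO_MUSCLE = 1060.0   # kg/m^3, assumed density of soft tissue


def pressure_reflection(z_from, z_to):
    """Plane-wave pressure reflection coefficient at a boundary between
    media with characteristic impedances z_from and z_to."""
    return (z_to - z_from) / (z_to + z_from)


def tube_resonances(length_m, f_lo, f_hi, c=C_AIR):
    """Quarter-wave resonances f_n = (2n - 1) * c / (4 * L) of a uniform
    tube closed at one end, restricted to the band [f_lo, f_hi]."""
    found = []
    n = 1
    while True:
        f = (2 * n - 1) * c / (4.0 * length_m)
        if f > f_hi:
            break
        if f >= f_lo:
            found.append(f)
        n += 1
    return found


# Closed mouth: nearly all incident pressure is reflected by the face.
r_face = pressure_reflection(RHO_AIR * C_AIR, RHO_MUSCLE * C_MUSCLE)

# Open mouth: resonances of a hypothetical 0.17 m tube in an assumed band.
in_band = tube_resonances(0.17, 20_000.0, 24_000.0)

print(f"air/tissue reflection coefficient: {r_face:.4f}")
print("in-band resonances (Hz):", [round(f) for f in in_band])
```

Under these assumptions the closed face behaves as an almost perfect reflector, while the open tube contributes several resonances within the band, which is the kind of spectral difference a low-complexity detector could exploit. A real vocal tract is not a uniform tube, and higher-order modes matter at these frequencies, so the numbers are indicative only.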

Notes

  1. The speed of sound is approximated to 1,600 m/s in muscle and 343 m/s in air.

  2. The six vowel geometries are: /i/, /æ/, /u/, /ɛ/, /ɔ/, /o/ as in heed, had, who, head, paw, and hoe respectively.

  3. The conversion was made using the lpcaa2rf() and lpcrf2ar() functions from the excellent Voicebox package [9].

  4. Office and Car recordings were obtained as 96 kHz, 24- and 32-bit sample files from Freesound.org (nos. 108695 and 193780 respectively), recorded on Tascam DR-100 mk-II using on board directional condenser microphones (TEAC Corp., Tokyo, Japan). Other recordings were made by the author using the on board directional condenser microphones of a Zoom H4n (Zoom Corp., Tokyo, Japan), recorded at a 96 kHz sample rate with 16-bit resolution. The original recordings are available upon request.

  5. Note that, since the system detects lip opening rather than speaking, it is possible that some of these false detections did actually correspond to non-speech lip opening events if the subject opened their lips, for example to breathe through their mouth.
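Note 3 refers to converting vocal tract area functions to reflection coefficients and then to autoregressive (AR) coefficients with Voicebox. The following is a minimal Python sketch of that kind of conversion, not a port of the Voicebox functions: the function names are hypothetical, the sign convention and the ordering of the tube sections are assumptions, and any glottis or lip termination areas are left out.

```python
def areas_to_reflection(areas):
    """Reflection coefficient at each junction of a lossless tube model.

    Assumed convention: r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k), where
    A_k are the cross-sectional areas of consecutive sections.
    """
    areas = [float(a) for a in areas]
    if len(areas) < 2 or any(a <= 0 for a in areas):
        raise ValueError("need at least two positive section areas")
    return [(nxt - cur) / (nxt + cur) for cur, nxt in zip(areas, areas[1:])]


def reflection_to_ar(rf):
    """Step-up (Levinson) recursion: reflection coefficients to AR
    polynomial coefficients [1, a_1, ..., a_p]."""
    ar = [1.0]
    for k in rf:
        ext = ar + [0.0]
        # a_i(m) = a_i(m-1) + k_m * a_{m-i}(m-1)
        ar = [x + k * y for x, y in zip(ext, reversed(ext))]
    return ar


# Hypothetical four-section area function (arbitrary units).
example_areas = [1.0, 2.0, 2.0, 0.5]
example_rf = areas_to_reflection(example_areas)
example_ar = reflection_to_ar(example_rf)
print("reflection coefficients:", example_rf)
print("AR coefficients:", example_ar)
```

With positive areas every reflection coefficient has magnitude below one, so the resulting all-pole model is stable; a uniform tube gives all-zero reflection coefficients and a trivial AR polynomial.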

References

  1. F. Ahmadi, Voice replacement for the severely speech impaired through sub-ultrasonic excitation of the vocal tract. Ph.D. Thesis, Nanyang Technological University (2013). http://repository.ntu.edu.sg/handle/10356/52661

  2. F. Ahmadi, M. Ahmadi, I.V. McLoughlin, Human mouth state detection using low frequency ultrasound, in INTERSPEECH (2013), pp. 1806–1810

  3. F. Ahmadi, I.V. McLoughlin, The use of low-frequency ultrasonics in speech processing, in Signal Processing, ed. by Sebastian Miron (InTech, 2010). ISBN: 978-953-7619-91-6

  4. F. Ahmadi, I.V. McLoughlin, Measuring resonances of the vocal tract using frequency sweeps at the lips, in 2012 5th International Symposium on Communications Control and Signal Processing (ISCCSP) (2012)

  5. F. Ahmadi, I.V. McLoughlin, S. Chauhan, G. ter Haar, Bio-effects and safety of low-intensity, low-frequency ultrasonic exposure. Progr. Biophys. Mol. Biol. 108, 3 (2012)

  6. F. Ahmadi, I.V. McLoughlin, H.R. Sharifzadeh, Autoregressive modelling for linear prediction of ultrasonic speech, in INTERSPEECH (2010), pp. 1616–1619

  7. S.P. Arjunan, H. Weghorn, D.K. Kumar, W.C. Yau, Vowel recognition of English and German language using facial movement (SEMG) for speech control based HCI, in Proceedings of the HCSNet workshop on Use of vision in human–computer interaction, vol. 56, VisHCI ’06 (Australian Computer Society, Inc., 2006), pp. 13–18

  8. D. Beautemps, P. Badin, R. Laboissière, Deriving vocal-tract area functions from midsagittal profiles and formant frequencies: a new model for vowels and fricative consonants based on experimental data. Speech Commun. 16, 27–47 (1995)

  9. M. Brookes, et al., Voicebox: Speech processing toolbox for MATLAB. Software, available [Mar. 2011] from www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html (1997)

  10. G.L. Calhoun, G.R. McMillan, Hands-free input devices for wearable computers, in Proceedings of the Fourth Symposium on Human Interaction with Complex Systems, HICS ’98 (IEEE Computer Society, 1998), p. 118

  11. B.G. Douglass, Apparatus and Method for Detecting Speech Using Acoustic Signals Outside the Audible Frequency Range (United States Patent and Trademark Office, United States, 2006)

  12. J. Epps, J.R. Smith, J. Wolfe, A novel instrument to measure acoustic resonances of the vocal tract during speech. Meas. Sci. Technol. 8, 1112–1121 (1997)

  13. L.J. Eriksson, Higher order mode effects in circular ducts and expansion chambers. J. Acoust. Soc. Am. 68(2), 545–550 (1980)

  14. J.-P. Fouque, J. Garnier, G. Papanicolaou, K. Solna, Wave Propagation and Time Reversal in Randomly Layered Media (Springer, 2010)

  15. J. Freitas, A. Teixeira, M.S. Dias, Towards a silent speech interface for Portuguese: surface electromyography and the nasality challenge, in Proceedings of the International Conference on Bio-inspired Systems and Signal Processing BIOSIGNALS 2012 (Vilamoura, Algarve, Portugal, 2012)

  16. C. Jorgensen, S. Dusan, Speech interfaces based upon surface electromyography. Speech Commun. 52(4), 354–366 (2010)

  17. K. Kalgaonkar, R. Hu, B. Raj, Ultrasonic Doppler sensor for voice activity detection. IEEE Signal Process. Lett. 14(10), 754–757 (2007)

  18. R. Kaucic, B. Dalton, A. Blake, Real-time lip tracking for audio-visual speech recognition applications, in Computer Vision ECCV ’96, vol. 1065, Lecture Notes in Computer Science, ed. by B. Buxton, R. Cipolla (Springer, Berlin / Heidelberg, 1996), pp. 376–387

  19. M. Kob, C. Neuschaefer-Rube, A method for measurement of the vocal tract impedance at the mouth. Med. Eng. Phys. 24, 467–471 (2002)

  20. R.J. Lahr, Head-worn, Trimodal Device to Increase Transcription Accuracy in a Voice Recognition System and to Process Unvocalized Speech (United States Patent and Trademark Office, United States, 2002)

  21. I. McLoughlin, Super-audible voice activity detection. IEEE/ACM Trans. Audio Speech Lang. Process. 22(9), 1424–1433 (2014). doi:10.1109/TASLP.2014.2335055

  22. I.V. McLoughlin, Applied Speech and Audio Processing (Cambridge University Press, Cambridge, 2009)

  23. I.V. McLoughlin, F. Ahmadi, Method and apparatus for determining mouth state using low frequency ultrasonics. UK Patent Office (pending) (2012)

  24. I.V. McLoughlin, F. Ahmadi, A new mechanical index for gauging the human bioeffects of low frequency ultrasound, in Proceedings of the IEEE Engineering in Medicine and Biology Conference (2013), pp. 1964–1967

  25. B. Rivet, L. Girin, C. Jutten, Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures. IEEE Trans. Audio Speech Lang. Process. 15(1), 96–108 (2007)

  26. H.R. Sharifzadeh, I.V. McLoughlin, F. Ahmadi, Speech rehabilitation methods for laryngectomised patients, in Electronic Engineering and Computing Technology, vol. 60, Lecture Notes in Electrical Engineering, ed. by S.I. Ao, L. Gelman (Springer, Netherlands, 2010), pp. 597–607

  27. D.J. Sinder, Speech synthesis using an aeroacoustic fricative model (PhD Thesis). The State University of New Jersey (1999)

  28. M.M. Sondhi, B. Gopinath, Determination of vocal-tract shape from impulse response at the lips. J. Acoust. Soc. Am. 49(6), 1867–1873 (1971)

  29. B.H. Story, Physiologically-based speech simulation using an enhanced wave-reflection model of the vocal tract (PhD Thesis). The University of Iowa (1995)

  30. B.H. Story, I.R. Titze, E.A. Hoffman, Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am. 100, 1 (1996)

  31. Texas Instruments: TIMIT database (Texas Instruments and MIT). A CD-ROM database of phonetically classified recordings of sentences spoken by a number of different male and female speakers (1990)

  32. C.A. Tosaya, J.W. Sliwa, Signal Injection Coupling into the Human Vocal Tract for Robust Audible and Inaudible Voice Recognition (United States Patent and Trademark Office, United States, 1999)

  33. H.K. Vorperian, S. Wang, M.K. Chung, E.M. Schimek, R.B. Durtschi, R.D. Kent, A.J. Ziegert, L.R. Gentry, Anatomic development of the oral and pharyngeal portions of the vocal tract: an imaging study. J. Acoust. Soc. Am. 125, 1666 (2009)

  34. J. Wolfe, M. Garnier, J. Smith, Vocal tract resonances in speech, singing and playing musical instruments. Hum. Front. Sci. Progr. J. 3, 6–23 (2009)

  35. J.A. Zagzebski, Essentials of Ultrasound Physics (Mosby, Elsevier, St. Louis, 1996)

  36. A.J. Zuckerwar, Speed of sound in fluids, in Handbook of Acoustics, ed. by M.J. Crocker (Wiley, New York, 1998)

Acknowledgments

Some of the data for this paper was recorded and processed at the School of Computer Engineering, Nanyang Technological University (NTU), Singapore by student assistants Farzaneh Ahmadi, Mark Huan, and Chu Thanh Minh. Their contribution to this work is gratefully acknowledged, particularly the PhD research of Farzaneh Ahmadi [1]. Thanks are also due to Prof. Eng Siong Chng of NTU, and Jingjie Li of USTC for their assistance with the experimental work.

Corresponding author

Correspondence to Ian Vince McLoughlin.

Cite this article

McLoughlin, I.V., Song, Y. Mouth State Detection From Low-Frequency Ultrasonic Reflection. Circuits Syst Signal Process 34, 1279–1304 (2015). https://doi.org/10.1007/s00034-014-9904-4

Keywords

  • Lip state detection
  • Low frequency ultrasound
  • Mouth state detection
  • Speech activity detection
  • Voice activity detection