Optical flow based waveform for the assessment of the vocal fold vibrations

  • Heyfa Ammar
Scientific Paper


Assessing vocal fold (VF) vibrations is important for the diagnosis of several diseases and is made possible through the analysis of videoendoscopy recordings. However, visual analysis of these recordings is difficult because of the high acquisition rate. For this reason, it is common to extract the laryngeal activity information from the recordings and represent it in a form suited to visual analysis. Waveforms, images and playbacks are examples of representations reported in the literature. Some of these representations cannot precisely locate the pathology within the VFs, whereas others require the segmentation of the glottis in every image of the video, which is a complex task given the large number of images and the need for user intervention. To overcome these problems, the present study proposes a new waveform that maps the local vibrations of the VFs without the need to segment all the images of the video. Instead, segmentation is restricted to one image per vibratory cycle, and a new optical flow based technique is then proposed to deduce the cycle-to-cycle dynamics of the VFs. The ability of the proposed approach to provide a reliable visual assessment is evaluated experimentally using different types of phonation and different vocal pathologies.
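To make the optical flow idea concrete, the following is a minimal sketch of a generic least-squares (Lucas-Kanade style) motion estimate between two frames, for instance one representative image from each of two consecutive vibratory cycles. It is not the technique proposed in the paper, whose details are not given in this abstract. The function names `gaussian_frame` and `window_flow`, the synthetic blob used as a stand-in for a video frame, and the single-window formulation are all assumptions made for illustration.

```python
import math


def gaussian_frame(size, cx, cy, sigma=4.0):
    """Hypothetical stand-in for a video frame: one smooth bright blob."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def window_flow(frame_a, frame_b):
    """Estimate one displacement (u, v) for the whole window.

    Solves the brightness-constancy constraint Ix*u + Iy*v + It = 0 in the
    least-squares sense over all interior pixels. u is the shift along x
    (columns) and v the shift along y (rows), in pixels.
    """
    h, w = len(frame_a), len(frame_a[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference spatial gradients, averaged over both frames.
            ix = 0.25 * (frame_a[y][x + 1] - frame_a[y][x - 1]
                         + frame_b[y][x + 1] - frame_b[y][x - 1])
            iy = 0.25 * (frame_a[y + 1][x] - frame_a[y - 1][x]
                         + frame_b[y + 1][x] - frame_b[y - 1][x])
            # Temporal difference between the two frames.
            it = frame_b[y][x] - frame_a[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve the 2x2 normal equations [[sxx, sxy], [sxy, syy]] [u, v] = -[sxt, syt].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


# Synthetic example: the blob moves 0.4 px along x and -0.2 px along y.
frame_a = gaussian_frame(32, 16.0, 16.0)
frame_b = gaussian_frame(32, 16.4, 15.8)
u, v = window_flow(frame_a, frame_b)
```

On this synthetic pair the estimate should land close to the true shift of roughly 0.4 and -0.2 pixels. A dense method would compute such a vector per pixel rather than one per window, and the linearization only holds for small displacements, so larger motions would need a coarse-to-fine scheme.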


Vocal fold vibrations, Waveform, Optical flow, Pathology, Cycle-to-cycle analysis



This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia (Grant No. J-552-612-38). The author, therefore, gratefully acknowledges the DSR financial support.

Compliance with ethical standards

Conflict of interest

The author declares having no competing interests.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Research involving human participants and/or animals

The video recordings related to healthy phonation, as well as the EGG and audio signals used in this paper, are provided by Erkki Bianco and Gilles Degottex, IRCAM (Institut de Recherche et Coordination Acoustique/Musique), upon permission. The recordings are part of the USC_2008 database produced by the owners. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The recordings related to pathological phonation are publicly available at

Supplementary material

Supplementary material 1 (PDF, 526 KB): 13246_2018_717_MOESM1_ESM.pdf


Copyright information

© Australasian College of Physical Scientists and Engineers in Medicine 2018

Authors and Affiliations

  1. King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
  2. University of Tunis El Manar, Tunis, Tunisia
