Optical flow based waveform for the assessment of the vocal fold vibrations
Assessing vocal fold (VF) vibrations is important for the diagnosis of several diseases and is made possible through the analysis of videoendoscopy recordings. However, visual analysis of these recordings is difficult because of the high acquisition rate. For this reason, the laryngeal activity information is commonly extracted from the recordings and represented in a form suitable for visual analysis. Waveforms, images and playbacks are examples of representations reported in the literature. Some of these representations cannot precisely locate the pathology within the VFs, whereas others require segmenting the glottis in every image of the video, which is a complex task given the large number of images and the need for user intervention. To overcome these problems, the present study proposes a new waveform that maps the local vibrations of the VFs without segmenting every image of the video. Instead, segmentation is restricted to one image per vibratory cycle, and a new optical flow based technique is then used to deduce the cycle-to-cycle dynamics of the VFs. The ability of the proposed approach to provide a reliable visual assessment is evaluated experimentally using different types of phonation and different vocal pathologies.
Keywords: Vocal fold vibrations, Waveform, Optical flow, Pathology, Cycle-to-cycle analysis
This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia (Grant No. J-552-612-38). The author, therefore, gratefully acknowledges the DSR financial support.
Compliance with ethical standards
Conflict of interest
The author declares having no competing interests.
Informed consent was obtained from all individual participants included in the study.
Research involving human participants and/or animals
The video recordings of healthy phonation, as well as the EGG and audio signals used in this paper, were provided with permission by Erkki Bianco (email@example.com) and Gilles Degottex, IRCAM (Institut de Recherche et Coordination Acoustique/Musique, firstname.lastname@example.org). The recordings are part of the USC_2008 database produced by the owners. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The recordings of pathological phonation are publicly available at www.entusa.com.