Uncertainty of Vowel Predictions as a Digital Biomarker for Ataxic Dysarthria

Isaev, Dmitry Yu.; Vlasova, Roza M.; Di Martino, J. Matias; Stephen, Christopher D.; Schmahmann, Jeremy D.; Sapiro, Guillermo; Gupta, Anoopum S.

doi:10.1007/s12311-023-01539-z

Research
Published: 11 April 2023

Volume 23, pages 459–470, (2024)
The Cerebellum

Dmitry Yu. Isaev¹,
Roza M. Vlasova²,
J. Matias Di Martino³,
Christopher D. Stephen⁴,
Jeremy D. Schmahmann⁴,
Guillermo Sapiro¹,³,⁵ &
Anoopum S. Gupta⁴

Abstract

Dysarthria is a common manifestation across cerebellar ataxias, leading to impaired communication, reduced social connections, and decreased quality of life. While dysarthria symptoms may be present in other neurological conditions, ataxic dysarthria is a perceptually distinct motor speech disorder whose most prominent characteristics are articulation and prosody abnormalities along with distorted vowels. We hypothesized that the uncertainty of vowel predictions made by an automatic speech recognition system can capture the speech changes present in cerebellar ataxia. Speech of participants with ataxia (N=61) and healthy controls (N=25) was recorded during a “picture description” task. Additionally, participants’ dysarthric speech and ataxia severity were assessed with the Brief Ataxia Rating Scale (BARS). Eight participants with ataxia had speech and BARS data at two timepoints. A neural network trained for phoneme prediction was applied to the speech recordings. The average entropy of vowel token predictions (AVE) was computed for each participant’s recording, together with the mean pitch and intensity standard deviations (MPSD and MISD) in the vowel segments. AVE and MISD were associated with the BARS speech score (Spearman’s rho=0.45 and 0.51, respectively), and AVE was associated with BARS total (rho=0.39). In the longitudinal cohort, the Wilcoxon signed-rank test showed an increase in BARS total and AVE, while BARS speech and the acoustic measures did not increase significantly. The relationship of AVE to both BARS speech and BARS total, as well as its ability to capture disease progression even in the absence of measured speech decline, indicates the potential of AVE as a digital biomarker for cerebellar ataxia.
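
As a rough illustration of the entropy measure described in the abstract, the following Python sketch averages the Shannon entropy of per-frame phoneme probability distributions over the frames whose most likely label is a vowel. The function names, the vowel label set, the frame-level selection rule, and the use of natural-log units are assumptions made for illustration; they are not taken from the paper or from the authors' released code.

```python
import math

# Hypothetical ARPAbet-style vowel labels; the inventory used in the paper may differ.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average_vowel_entropy(frame_probs, labels):
    """Mean entropy over frames whose most likely label is a vowel.

    frame_probs: list of per-frame probability distributions.
    labels: phoneme label for each index of a distribution.
    """
    entropies = []
    for probs in frame_probs:
        # Index of the most probable phoneme in this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        if labels[best] in VOWELS:
            entropies.append(shannon_entropy(probs))
    if not entropies:
        raise ValueError("no vowel frames found")
    return sum(entropies) / len(entropies)


# Toy input: two vowel frames (one uncertain, one confident) and one consonant frame.
labels = ["iy", "s", "aa"]
frames = [
    [0.5, 0.0, 0.5],  # uncertain vowel frame
    [0.0, 1.0, 0.0],  # consonant frame, ignored
    [0.0, 0.0, 1.0],  # confident vowel frame, entropy 0
]
ave = average_vowel_entropy(frames, labels)
```

In this toy input the confident vowel frame contributes zero entropy and the uncertain one contributes ln 2, so a speaker whose vowels the recognizer finds ambiguous would receive a higher average.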

Data Availability

Raw speech samples cannot be shared due to privacy concerns. Deidentified extracted features can be shared by request with qualified investigators.

Abbreviations

A-T:: Ataxia-telangiectasia
ARCA1:: Autosomal recessive cerebellar ataxia type 1
AVE:: Average vowel entropy
ASR:: Automatic speech recognition
BARS:: Brief Ataxia Rating Scale
CANVAS:: Cerebellar ataxia with neuropathy and vestibular areflexia syndrome
FTNN:: Fine-tuned neural network. A wav2vec2 model trained initially on LibriSpeech-960 dataset, then fine-tuned on TIMIT phonetic annotations dataset
FXTAS:: Fragile X associated tremor ataxia syndrome
MGH:: Massachusetts General Hospital
MISD:: Mean, across all vowel segments, of the intensity standard deviation computed within each segment
MPSD:: Mean, across all vowel segments, of the pitch (F0) standard deviation computed within each segment
MRI:: Magnetic resonance imaging
MSA-C:: Multiple system atrophy, cerebellar type
NN:: Neural network
PER:: Phoneme error rate
PSP-C:: Progressive supranuclear palsy, with predominant cerebellar ataxia
SARA:: Scale for Assessment and Rating of Ataxia
SCA[#N]:: Spinocerebellar ataxia type #N
SPG7:: Spastic paraplegia type 7
TIMIT:: A corpus of speech data for acoustic-phonetic studies, created as a joint effort of Texas Instruments (TI), SRI International and Massachusetts Institute of Technology (MIT)
TP:: Timepoint
VSA:: Vowel space area
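
The MPSD and MISD definitions above share one pattern: compute a standard deviation within each vowel segment, then average those values across segments. A minimal Python sketch of that pattern follows. The function name, the choice of the sample standard deviation, and the skipping of segments with fewer than two samples are assumptions for illustration, not details confirmed by the source.

```python
import statistics


def mean_of_segment_sds(segments):
    """Mean of per-segment standard deviations.

    segments: list of lists, each holding the pitch (or intensity)
    samples measured inside one vowel segment.
    Segments with fewer than two samples are skipped (an assumption),
    because the sample standard deviation is undefined for them.
    """
    sds = [statistics.stdev(seg) for seg in segments if len(seg) >= 2]
    if not sds:
        raise ValueError("need at least one segment with two or more samples")
    return statistics.fmean(sds)


# Toy pitch tracks in Hz for two vowel segments.
pitch_segments = [[100, 102, 104], [200, 200, 200]]
mpsd = mean_of_segment_sds(pitch_segments)
```

The same function applied to per-segment intensity samples would give the MISD analogue.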

References

Klockgether T. Chapter 35 - ataxias. In: Goetz CG, editor. Textbook of clinical neurology (Third Edition). Philadelphia: W.B. Saunders; 2007. p. 765–80.
Ziegler W. Chapter 1 - the phonetic cerebellum: cerebellar involvement in speech sound production. In: Mariën P, Manto M, editors. The Linguistic Cerebellum. San Diego: Academic Press; 2016. p. 1–32.
Duffy, J.R., Motor speech disorders: substrates, differential diagnosis, and management. 2nd ed. 2005, St. Louis, Mo.: Elsevier Mosby. xiii, 578 p.
Gibilisco P, Vogel AP. Friedreich ataxia. BMJ. 2013;347:f7062.
Kent RD, et al. Ataxic dysarthria. J Speech Lang Hear Res. 2000;43(5):1275–89.
Darley FL, Aronson AE, Brown JR. Differential diagnostic patterns of dysarthria. J Speech Hear Res. 1969;12(2):246–69.
Kent RD, et al. A speaking task analysis of the dysarthria in cerebellar disease. Folia Phoniatr Logop. 1997;49(2):63–82.
Kent RD, Netsell R, Abbs JH. Acoustic characteristics of dysarthria associated with cerebellar disease. J Speech Hear Res. 1979;22(3):627–48.
Trouillas P, et al. International Cooperative Ataxia Rating Scale for pharmacological assessment of the cerebellar syndrome. The Ataxia Neuropharmacology Committee of the World Federation of Neurology. J Neurol Sci. 1997;145(2):205–11.
Schmahmann JD, et al. Development of a brief ataxia rating scale (BARS) based on a modified form of the ICARS. Mov Disord. 2009;24(12):1820–8.
Weyer A, et al. Reliability and validity of the scale for the assessment and rating of ataxia: a study in 64 ataxia patients. Mov Disord. 2007;22(11):1633–7.
Kewley-Port D, Burkle TZ, Lee JH. Contribution of consonant versus vowel information to sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. J Acoust Soc Am. 2007;122(4):2365–75.
Lansford KL, Liss JM. Vowel acoustics in dysarthria: speech disorder diagnosis and classification. J Speech Lang Hear Res. 2014;57(1):57–67.
Lansford KL, Liss JM. Vowel acoustics in dysarthria: mapping to perception. J Speech Lang Hear Res. 2014;57(1):68–80.
Kent RD, Rountrey C. What acoustic studies tell us about vowels in developing and disordered speech. Am J Speech Lang Pathol. 2020;29(3):1749–78.
Boersma, P. and D. Weenink, Praat: doing phonetics by computer [Computer program]. Version 6.1.38, retrieved 2 January 2021 from http://www.praat.org. 2021.
Odell K, et al. Perceptual characteristics of vowel and prosody production in apraxic, aphasic, and dysarthric speakers. J Speech Hear Res. 1991;34(1):67.
Delgado-Hernandez J. Pilot study of the acoustic values of the vowels in Spanish as indicators of the severity of dysarthria. Revista de neurología. 2017;64(3):105.
Liss JM, et al. Lexical boundary error analysis in hypokinetic and ataxic dysarthria. J Acoust Soc Am. 2000;107(6):3415–24.
Borrie SA, Lansford KL, Barrett TS. Rhythm perception and its role in perception and learning of dysrhythmic speech. J Speech Lang Hear Res. 2017;60(3):561–70.
Hertrich I, Ackermann H. Temporal and spectral aspects of coarticulation in ataxic dysarthria: an acoustic analysis. J Speech Lang Hear Res. 1999;42(2):367–81.
Ackermann H, et al. Phonemic vowel length contrasts in cerebellar disorders. Brain Lang. 1999;67(2):95–109.
Liu, A.T., et al. Mockingjay: unsupervised speech representation learning with deep bidirectional transformer encoders. in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020.
Song, X., et al. Speech-XLNet: Unsupervised acoustic model pretraining for self-attention networks. 2019. arXiv:1910.10387.
Chi, P.-H., et al. Audio ALBERT: A lite BERT for self-supervised learning of audio representation. in 2021 IEEE Spoken Language Technology Workshop (SLT). 2021.
Liu, A.T., S.-W. Li, H.-y. Lee TERA: Self-supervised learning of transformer encoder representation for speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021. 29: p. 2351–66.
Baevski, A., et al. wav2vec 2.0: a framework for self-supervised learning of speech representations. Advances in neural information processing systems, 2020. 33: p. 12449–60.
Garofolo, John S., et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Web Download. Philadelphia: Linguistic Data Consortium, 1993.
Panayotov, V., et al. Librispeech: an ASR corpus based on public domain audio books. in 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). 2015.
Zhu J, Zhang C. Performing forced alignment with Wav2vec 2.0. J Acoust Soc Am. 2021;150(4):A357–7.
Noffs G, et al. Acoustic speech analytics are predictive of cerebellar dysfunction in multiple sclerosis. Cerebellum. 2020;19(5):691–700.
Vogel AP, et al. Voice in Friedreich Ataxia. J Voice. 2017;31(2):243.e9–243.e19.
Vogel AP, et al. Features of speech and swallowing dysfunction in pre-ataxic spinocerebellar ataxia type 2. Neurology. 2020;95(2):e194–205.
Blair IA, et al. The current state of biomarker research for Friedreich’s ataxia: a report from the 2018 FARA biomarker meeting. Future Sci OA. 2019;5(6):Fso398.
Kashyap B, et al. Quantitative assessment of speech in cerebellar ataxia using magnitude and phase based cepstrum. Ann Biomed Eng. 2020;48(4):1322–36.
Blaney B, Hewlett N. Dysarthria and Friedreich’s ataxia: what can intelligibility assessment tell us? Int J Lang Commun Disord. 2007;42(1):19–37.
Kent RD, Vorperian HK. Static measurements of vowel formant frequencies and bandwidths: A review. J Commun Disord. 2018;74:74–97.
Ludlow CL, Kent RD, Gray LC. Measuring voice, speech, and swallowing in the clinic and laboratory. San Diego, United States: Plural Publishing, Incorporated; 2014.
Zhou H, et al. Assessment of gait and balance impairment in people with spinocerebellar ataxia using wearable sensors. Neurol Sci. 2022;43(4):2589–99.
Goodglass, H., et al., Boston diagnostic aphasia examination. 2001.
Chang Z, et al. Accurate detection of cerebellar smooth pursuit eye movement abnormalities via mobile phone video and machine learning. Sci Rep. 2020;10(1):18641.
Wolf, T., et al. Transformers: state-of-the-art natural language processing. Online: Association for Computational Linguistics 2020.
Lee K, Hon H. Speaker-independent phone recognition using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing. 1989;37(11):1641–8.
Shannon CE, Weaver W. The mathematical theory of communication, vol. v. Urbana: University of Illinois Press; 1949. p. 117.
Jadoul Y, Thompson B, de Boer B. Introducing parselmouth: a python interface to Praat. JPhon. 2018;71:1–15.
Tukey JW. Exploratory data analysis. Addison-Wesley series in behavioral science, vol. xvi. Reading, Mass: Addison-Wesley Pub. Co; 1977. p. 688.
Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965;52(3/4):591–611.
Long JS. Regression models for categorical and limited dependent variables. Advanced quantitative techniques in the social sciences, vol. xxx. Thousand Oaks: Sage Publications; 1997. p. 297.
Folker JE, et al. Differentiating profiles of speech impairments in Friedreich’s ataxia: a perceptual and instrumental approach. Int J Lang Commun Disord. 2012;47(1):65–76.
Daniloff RG, Hammarberg RE. On defining coarticulation. J Phon. 1973;1(3):239–48.
Stilp CE, Kluender KR. Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility. Proc Natl Acad Sci USA. 2010;107(27):12387–92.
Shor J, Venugopalan S. TRILLsson: distilled universal paralinguistic speech representations. 2022. arXiv:2203.00236.
Shor, J., et al. Universal paralinguistic speech representations using self-supervised conformers. 2022. in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2022.
Korzekwa, D., et al. Interpretable deep learning model for the detection and reconstruction of dysarthric speech. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019 p. 3890–94.
Kim H, et al. Dysarthric speech database for universal access research. In: Proceedings of the Annual Conference of the International Speech Communication Association: INTERSPEECH; 2008. p. 1741–4.
Weston J et al. Learning de-identified representations of prosody from raw audio. In International Conference on Machine Learning. 2021. PMLR.
Grabe E, Low EL. Durational variability in speech and the rhythm class hypothesis. Lab Phonol. 2002;7:515–46.
Low EL. Prosodic prominence in singapore english: University of Cambridge; 1998.
Conneau A et al. Unsupervised cross-lingual representation learning for speech recognition. 2020. arXiv:2006.13979.
Malmsten M, Haffenden C, Börjeson L. Hearing voices at the national library -- a speech corpus and acoustic model for the Swedish language. 2022. arXiv:2205.03026.
Xu Q, Baevski A, Auli M. Simple and effective zero-shot cross-lingual phoneme recognition. 2021. arXiv:2109.11680.

Acknowledgements

We would like to thank Mary Donovan, Winnie Ching, and Nergis Khan for recruitment and data collection.

Funding

This work was partially supported by NIH 1R01-NS117826. Additional support from NSF and the Ataxia-Telangiectasia Children’s Project is also acknowledged.

Author information

Authors and Affiliations

Department of Biomedical Engineering, Duke University, Durham, NC, USA
Dmitry Yu. Isaev & Guillermo Sapiro
Department of Psychiatry, UNC School of Medicine, University of North Carolina, Chapel Hill, NC, USA
Roza M. Vlasova
Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
J. Matias Di Martino & Guillermo Sapiro
Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Christopher D. Stephen, Jeremy D. Schmahmann & Anoopum S. Gupta
Departments of Mathematics & Computer Science, Duke University, Durham, NC, USA
Guillermo Sapiro

Contributions

D.Yu.I., R.M.V., J.M.D.M., G.S., and A.S.G. contributed to the conception and design of the study; C.D.S., J.D.S., and A.S.G. contributed to acquisition of the data; D.Yu.I., J.M.D.M., G.S., and A.S.G. contributed to the analysis of the data; D.Yu.I., R.M.V., J.M.D.M., G.S., and A.S.G. contributed to drafting the text and figures; all authors revised the manuscript for intellectual content.

Corresponding author

Correspondence to Dmitry Yu. Isaev.

Ethics declarations

Ethical Approval

All experimental protocols were approved by the Partners Healthcare Institutional Review Board (Protocol# 2016P001048) and were in accordance with guidelines of the Declaration of Helsinki.

Consent to participate

All participants provided written informed consent and/or assent prior to participation in the study.

Consent for Publication

Not applicable.

Competing Interests

G.S. is also affiliated with Apple, Inc.; the work here reported was initiated before such affiliation and is independent of it.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Endnotes

¹ Fine-tuning followed the same procedure as the original speech-to-text training in Baevski et al. (2020). That paper reported a PER of 8.3 on the TIMIT dataset; however, the model was not released, and the discrepancy is likely caused by slight differences in training parameters.

² The code for average vowel entropy computation is made publicly available at https://github.com/dyisaev/average-vowel-entropy.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Isaev, D.Y., Vlasova, R.M., Di Martino, J.M. et al. Uncertainty of Vowel Predictions as a Digital Biomarker for Ataxic Dysarthria. Cerebellum 23, 459–470 (2024). https://doi.org/10.1007/s12311-023-01539-z

Accepted: 27 February 2023
Published: 11 April 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s12311-023-01539-z
