
Accelerated Pseudo 3D Dynamic Speech MR Imaging at 3T Using Unsupervised Deep Variational Manifold Learning

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)


Magnetic resonance imaging (MRI) of vocal tract shaping and the surrounding articulators during speech is a powerful tool in several application areas, such as understanding language disorders and informing treatment plans in oropharyngeal cancers. However, it is a challenging task because of fundamental tradeoffs between spatio-temporal resolution, organ coverage, and signal-to-noise ratio. Current volumetric vocal tract MR methods are either restricted to imaging during sustained sounds or perform dynamic imaging at highly compromised spatio-temporal resolutions, suitable only for slowly moving articulators. In this work, we propose a novel unsupervised deep variational manifold learning approach to recover a “pseudo-3D” dynamic speech dataset from the sequential acquisition of multiple 2D slices during speaking. We demonstrate “pseudo-3D” (or time-aligned multi-slice 2D) dynamic imaging at a high temporal resolution of 18 ms, capable of resolving vocal tract motion for arbitrary speech tasks. The approach jointly learns low-dimensional latent vectors corresponding to the image time frames and the parameters of a 3D convolutional neural network (CNN) based generator that produces volumes of the deforming vocal tract. Both are learned by minimizing a cost function that enforces: a) temporal smoothness on the latent vectors; b) \(l_1\)-norm-based regularization on the generator weights; c) a zero-mean, unit-variance Gaussian distribution on the latent vectors of all slices; and d) data consistency with the measured k-space versus time data. We evaluate the proposed method using in-vivo vocal tract airway datasets from two normal volunteers producing repeated speech tasks, and compare it against state-of-the-art 2D and 3D dynamic compressed sensing (CS) schemes in speech MRI. Finally, we demonstrate (for the first time) extraction of quantitative 3D vocal tract area functions from undersampled 2D multi-slice datasets to characterize vocal tract shape changes in 3D during speech production. Code:
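The four-term cost function described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation; it only shows how terms a) to d) might combine into one scalar. Several choices here are assumptions made for illustration: the function name and the weights `lam_smooth`, `lam_l1`, and `lam_kl` are hypothetical, temporal smoothness is taken as squared finite differences, the Gaussian prior is written as a KL divergence for diagonal Gaussians, and the forward model is a single-coil Cartesian masked FFT rather than whatever sampling operator the paper uses.

```python
import numpy as np

def variational_manifold_cost(mu, log_var, weights, recon, mask, kspace,
                              lam_smooth=1.0, lam_l1=1e-4, lam_kl=1e-3):
    """Toy version of the four-term cost (all names are hypothetical).

    mu, log_var : (n_slices, n_frames, d) mean and log-variance of the latents
    weights     : list of arrays standing in for the generator parameters
    recon       : (n_slices, n_frames, ny, nx) complex generator output
    mask        : binary k-space sampling mask, same shape as recon
    kspace      : measured k-space, same shape as recon, zero where unsampled
    """
    # a) temporal smoothness: squared differences between consecutive frames
    smooth = np.sum(np.diff(mu, axis=1) ** 2)

    # b) l1 norm of the generator weights
    l1 = sum(np.sum(np.abs(w)) for w in weights)

    # c) KL divergence from N(mu, diag(exp(log_var))) to N(0, I),
    #    pulling the latents of every slice toward zero mean and unit variance
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0)

    # d) data consistency: compare only the sampled k-space locations
    #    (assumed forward model: orthonormal 2D FFT followed by the mask)
    pred = mask * np.fft.fft2(recon, norm="ortho")
    dc = np.sum(np.abs(pred - kspace) ** 2)

    return dc + lam_smooth * smooth + lam_l1 * l1 + lam_kl * kl
```

In an actual reconstruction, `recon` would be produced by the generator from samples of the latent distribution, and the cost would be minimized jointly over the latent parameters and the generator weights with an automatic-differentiation framework. The sketch evaluates the objective only.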





This work was conducted on an MRI instrument funded by NIH-S10 instrumentation grant: 1S10OD025025-01. We also acknowledge Yongwan Lim (University of Southern California, USA) for providing example code for vocal tract area function estimation.

Author information


Corresponding author

Correspondence to Rushdi Zahid Rusho.


1 Electronic supplementary material


Supplementary material 1 (pdf 1513 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rusho, R.Z., Zou, Q., Alam, W., Erattakulangara, S., Jacob, M., Lingala, S.G. (2022). Accelerated Pseudo 3D Dynamic Speech MR Imaging at 3T Using Unsupervised Deep Variational Manifold Learning. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13436. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16445-3

  • Online ISBN: 978-3-031-16446-0

  • eBook Packages: Computer Science (R0)
