Instantaneous Physiological Estimation Using Video Transformers

Multimodal AI in Healthcare

Part of the book series: Studies in Computational Intelligence (SCI, volume 1060)

Abstract

Video-based physiological signal estimation has been limited primarily to predicting episodic scores over windowed intervals. While these intermittent values are useful, they provide an incomplete picture of a patient's physiological status and may lead to late detection of critical conditions. We propose a video Transformer for estimating instantaneous heart rate and respiration rate from face videos. Physiological signals are typically confounded by alignment errors in space and time. To overcome this, we formulated the loss in the frequency domain. We evaluated the method on the large-scale Vision-for-Vitals (V4V) benchmark. It outperformed both shallow and deep-learning-based methods for instantaneous respiration rate estimation. For heart rate estimation, it achieved an instantaneous mean absolute error (MAE) of 13.0 beats per minute.

GitHub link: https://github.com/revanurambareesh/instantaneous_transformer.
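The abstract does not spell out the exact form of the frequency-domain loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It compares magnitude spectra rather than raw samples, which makes the loss insensitive to a temporal misalignment between the predicted and reference waveforms. The frame rate `FPS`, the window length `N`, the synthetic 75 bpm pulse, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Return the normalized DFT magnitudes for bins 0..N//2 of a real signal."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum


def frequency_domain_loss(pred, target):
    """Mean squared difference between magnitude spectra (illustrative loss)."""
    p, q = magnitude_spectrum(pred), magnitude_spectrum(target)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def time_domain_loss(pred, target):
    """Plain sample-wise mean squared error, shown for comparison."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def dominant_rate_bpm(signal, fps):
    """Convert the strongest non-DC frequency bin to a rate in beats per minute."""
    mags = magnitude_spectrum(signal)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * fps / len(signal) * 60


FPS = 25       # assumed camera frame rate (frames per second)
N = 100        # assumed window length: 4 seconds of video
HR_HZ = 1.25   # synthetic pulse at 1.25 Hz, i.e. 75 beats per minute

# Reference pulse and the same pulse delayed by 5 frames (a temporal misalignment).
target = [math.sin(2 * math.pi * HR_HZ * t / FPS) for t in range(N)]
shifted = [math.sin(2 * math.pi * HR_HZ * (t - 5) / FPS) for t in range(N)]

loss_time = time_domain_loss(shifted, target)
loss_freq = frequency_domain_loss(shifted, target)
rate = dominant_rate_bpm(shifted, FPS)

print(f"time-domain MSE:      {loss_time:.4f}")
print(f"frequency-domain MSE: {loss_freq:.2e}")
print(f"dominant rate (bpm):  {rate:.1f}")
```

In this toy setting the delayed pulse carries the same rate information as the reference, yet the sample-wise error is large while the spectral error is essentially zero. That is the motivation for preferring a frequency-domain objective when labels and video are not perfectly synchronized.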

Acknowledgements

This project is funded by the Bill & Melinda Gates Foundation (BMGF). Any opinions, findings, or conclusions are those of the authors and do not necessarily reflect the views of the sponsors.

Author information

Corresponding author

Correspondence to Ambareesh Revanur.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Revanur, A., Dasari, A., Tucker, C.S., Jeni, L.A. (2023). Instantaneous Physiological Estimation Using Video Transformers. In: Shaban-Nejad, A., Michalowski, M., Bianco, S. (eds) Multimodal AI in Healthcare. Studies in Computational Intelligence, vol 1060. Springer, Cham. https://doi.org/10.1007/978-3-031-14771-5_22
