Abstract
Video-based physiological signal estimation has largely been limited to predicting episodic scores over windowed intervals. While these intermittent values are useful, they give an incomplete picture of a patient's physiological status and may delay the detection of critical conditions. We propose a video Transformer for estimating instantaneous heart rate and respiration rate from face videos. Physiological signals are typically confounded by alignment errors in space and time; to overcome this, we formulate the loss in the frequency domain. We evaluate the method on the large-scale Vision-for-Vitals (V4V) benchmark. It outperforms both shallow and deep-learning-based methods for instantaneous respiration rate estimation. For heart rate estimation, it achieves an instantaneous-MAE of 13.0 beats per minute.
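To illustrate why a frequency-domain loss can tolerate temporal misalignment, the following minimal sketch compares a sample-wise loss with a loss on magnitude spectra for a time-shifted copy of the same pulse signal. It is not the chapter's actual loss, which the abstract does not specify. The function names, the naive DFT, the 25 fps sampling rate, and the 1.25 Hz test signal are all assumptions made for illustration. The sketch also assumes that instantaneous-MAE means the mean absolute error over per-frame rate estimates.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Return |DFT| / N for bins 0..N//2 of a real sequence (naive O(N^2) DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) / n)
    return spectrum


def time_domain_loss(pred, target):
    """Mean squared error between samples; sensitive to temporal shifts."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def frequency_domain_loss(pred, target):
    """Mean squared error between magnitude spectra; phase is discarded."""
    sp, st = magnitude_spectrum(pred), magnitude_spectrum(target)
    return sum((a - b) ** 2 for a, b in zip(sp, st)) / len(st)


def instantaneous_mae(pred_rates, true_rates):
    """Assumed metric: mean absolute error over per-frame rate estimates."""
    return sum(abs(p - t) for p, t in zip(pred_rates, true_rates)) / len(true_rates)


# Hypothetical signal: a 1.25 Hz pulse (75 bpm) sampled at 25 fps for 4 s.
fps, seconds, freq_hz = 25, 4, 1.25
n = fps * seconds
target = [math.sin(2 * math.pi * freq_hz * t / fps) for t in range(n)]
# The same pulse, misaligned by 5 frames (0.2 s, a quarter of a cycle).
shifted = [math.sin(2 * math.pi * freq_hz * (t + 5) / fps) for t in range(n)]

print("time-domain loss:", time_domain_loss(shifted, target))
print("frequency-domain loss:", frequency_domain_loss(shifted, target))
print("instantaneous MAE:", instantaneous_mae([70, 75], [72, 72]))
```

In this toy case the shifted signal carries the same rate information as the target, yet the sample-wise loss is large while the spectral loss is close to zero. A practical implementation would use an FFT routine from a numerical library rather than the naive DFT used here.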
Acknowledgements
This project is funded by the Bill & Melinda Gates Foundation (BMGF). Any opinions, findings, or conclusions are those of the authors and do not necessarily reflect the views of the sponsors.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Revanur, A., Dasari, A., Tucker, C.S., Jeni, L.A. (2023). Instantaneous Physiological Estimation Using Video Transformers. In: Shaban-Nejad, A., Michalowski, M., Bianco, S. (eds) Multimodal AI in Healthcare. Studies in Computational Intelligence, vol 1060. Springer, Cham. https://doi.org/10.1007/978-3-031-14771-5_22
DOI: https://doi.org/10.1007/978-3-031-14771-5_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14770-8
Online ISBN: 978-3-031-14771-5
eBook Packages: Intelligent Technologies and Robotics (R0)