A Deep Neural Network Video Framework for Monitoring Elderly Persons

  • M. Farrajota
  • João M. F. Rodrigues
  • J. M. H. du Buf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9738)


The rapidly increasing population of elderly persons is a phenomenon which affects almost the entire world. Although there are many telecare systems that can be used to monitor senior persons, none integrates one key requirement: detection of abnormal behavior related to chronic or new ailments. This paper presents a framework based on deep neural networks for detecting and tracking people in known environments, using one or more cameras. Video frames are fed into a convolutional network, and faces and upper/full bodies are detected in a single forward pass through the network. Persons are recognized and tracked by a Siamese network which compares faces and/or bodies in previous frames with those in the current frame. This allows the system to monitor the persons in the environment. By taking advantage of the parallel processing of ConvNets on GPUs, the system runs in real time on an NVIDIA Titan board, performing all of the above tasks simultaneously. This framework provides the basic infrastructure for future pose inference and gait tracking, in order to detect abnormal behavior and, if necessary, to trigger timely assistance by caregivers.
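The comparison step described above can be illustrated with a minimal sketch. It assumes that a shared-weight (Siamese) embedding network has already turned each face or body crop into a feature vector; the function names `euclidean` and `match_tracks`, the threshold value, and the greedy closest-first assignment are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_tracks(previous, current, threshold=0.5):
    """Greedily pair current detections with previously seen identities.

    previous:  dict mapping track id -> embedding from earlier frames
    current:   list of embeddings for detections in the current frame
    threshold: assumed maximum distance for two crops to count as the
               same person (illustrative value, not from the paper)

    Returns a list with one entry per current detection: the matched
    track id, or None when no previous embedding is close enough.
    """
    # Enumerate every (previous, current) pair, closest first.
    pairs = sorted(
        (euclidean(emb, cur), tid, i)
        for tid, emb in previous.items()
        for i, cur in enumerate(current)
    )
    assigned = [None] * len(current)
    used = set()
    for dist, tid, i in pairs:
        if dist > threshold:
            break  # all remaining pairs are even farther apart
        if tid in used or assigned[i] is not None:
            continue  # each identity and each detection is used once
        assigned[i] = tid
        used.add(tid)
    return assigned


# Toy embeddings standing in for network outputs.
previous = {"anna": [0.0, 0.0], "ben": [1.0, 1.0]}
current = [[0.9, 1.1], [0.1, 0.0], [5.0, 5.0]]
result = match_tracks(previous, current)
```

With these toy vectors, the first detection is paired with "ben", the second with "anna", and the third stays unmatched, which a tracker could treat as a new person entering the scene.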


  • Design for aging
  • Design for quality of life technologies
  • Deep learning



This work was supported by the FCT project LARSyS: UID/EEA/50009/2013 and FCT PhD grant to author MF (SFRH/BD/79812/2011).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • M. Farrajota (1), email author
  • João M. F. Rodrigues (1)
  • J. M. H. du Buf (1)
  1. Vision Laboratory, LARSyS, University of the Algarve, Faro, Portugal
