Smart environment architecture for robust people detection by infrared and visible video fusion


Smart people detection systems nowadays use heterogeneous cameras. This paper proposes an architecture focused on robustly detecting people in smart environments by fusing infrared and visible video. The architecture covers all levels provided by the INT\(^3\)-Horus framework, which was initially designed to perform monitoring and activity interpretation tasks. INT\(^3\)-Horus serves as the development environment, and the approach starts with image segmentation in both the infrared and visible spectra. The segmentation results are then fused to enhance overall detection performance. The paper describes in detail the INT\(^3\)-Horus levels selected to implement the new architecture: acquisition, segmentation, fusion, identification and tracking.
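As a rough illustration of the segment-then-fuse idea described in the abstract, the following minimal Python sketch segments an infrared and a visible frame separately and then fuses the two masks. It is not the paper's method or the INT\(^3\)-Horus API. Every function name, both thresholds, the pixel-wise OR fusion rule and the toy frames are assumptions chosen only to keep the example small.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]   # grayscale intensities, 0..255 (assumed representation)
Mask = List[List[bool]]   # True marks a foreground (candidate person) pixel


def segment_infrared(frame: Frame, threshold: int = 200) -> Mask:
    """Hypothetical IR segmentation: treat bright (warm) pixels as foreground."""
    return [[px >= threshold for px in row] for row in frame]


def segment_visible(frame: Frame, background: Frame, min_diff: int = 30) -> Mask:
    """Hypothetical visible segmentation: simple background subtraction."""
    return [
        [abs(px - bg) >= min_diff for px, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def fuse(ir_mask: Mask, vis_mask: Mask) -> Mask:
    """Assumed fusion rule: pixel-wise OR of the two segmentation masks."""
    return [[a or b for a, b in zip(r1, r2)] for r1, r2 in zip(ir_mask, vis_mask)]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_min, col_min, row_max, col_max) of the foreground, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy 3x4 frames: the IR camera sees a warm region in columns 1-2,
# the visible camera sees a change against the background in columns 2-3.
ir_frame = [[10, 10, 10, 10], [10, 220, 230, 10], [10, 10, 10, 10]]
vis_background = [[50] * 4 for _ in range(3)]
vis_frame = [[50, 50, 50, 50], [50, 50, 120, 125], [50, 50, 50, 50]]

fused = fuse(segment_infrared(ir_frame), segment_visible(vis_frame, vis_background))
box = bounding_box(fused)
```

In this toy case each spectrum recovers only part of the region, and the fused mask covers columns 1 to 3 of the middle row. A pixel-wise AND would be the more conservative alternative if false positives matter more than missed detections.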




This work was partially supported by the Spanish Ministerio de Economía y Competitividad / FEDER under grants TIN2013-47074-C2-1-R and DPI2016-80894-R.

Author information

Corresponding author

Correspondence to Antonio Fernández-Caballero.

About this article

Cite this article

Castillo, J.C., Fernández-Caballero, A., Serrano-Cuerda, J. et al. Smart environment architecture for robust people detection by infrared and visible video fusion. J Ambient Intell Human Comput 8, 223–237 (2017).


  • Architecture
  • Video fusion
  • Visible video
  • Infrared video
  • People detection