Multimedia Tools and Applications, Volume 73, Issue 1, pp 61–89

Human segmentation by geometrically fusing visible-light and thermal imageries

  • Jian Zhao
  • Sen-ching S. Cheung


From depth sensors to thermal cameras, the increased availability of camera sensors beyond the visible spectrum has created many exciting applications. Most of these applications require combining information from these cameras with a regular RGB camera. Information fusion from multiple heterogeneous cameras can be a very complex problem. The data can be fused at different levels, from pixels to voxels or even semantic objects, with large variations in accuracy, communication cost, and computation cost. In this paper, we propose a system for robust segmentation of human figures in video sequences by fusing visible-light and thermal imageries. Our system focuses on the geometric transformation between visual blobs corresponding to human figures observed by both cameras. This approach provides the most reliable fusion at the expense of high computation and communication costs. To reduce the computational complexity of the geometric fusion, an efficient calibration procedure is first applied to rectify the two camera views without the complex procedure of estimating the intrinsic parameters of the cameras. To geometrically register different blobs at the pixel level, a blob-to-blob homography in the rectified domain is then computed in real time by estimating the disparity for each blob pair. Precise segmentation is finally achieved using a two-tier tracking algorithm and a unified background model. Our experimental results show that the proposed system provides significant improvements over existing schemes under various conditions.
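The blob-to-blob registration step can be illustrated with a minimal sketch. The abstract does not give the matching criterion, so the code below rests on assumptions: each blob is treated as lying at a single disparity in the rectified views, and that disparity is found by exhaustively maximizing the overlap of two binary masks. The function names (`shift_columns`, `estimate_blob_disparity`, `disparity_homography`) and the `max_disp` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def shift_columns(mask, d):
    """Shift a 2-D mask by d columns (positive = right), padding with False."""
    out = np.zeros_like(mask)
    if d > 0:
        out[:, d:] = mask[:, :-d]
    elif d < 0:
        out[:, :d] = mask[:, -d:]
    else:
        out[:] = mask
    return out


def estimate_blob_disparity(mask_a, mask_b, max_disp):
    """Return the column shift that best aligns mask_a onto mask_b.

    Assumes both masks come from rectified views, so corresponding points
    share a row and differ only by a horizontal offset. The overlap score
    is an illustrative choice, not the paper's criterion.
    """
    best_d, best_overlap = 0, -1
    for d in range(-max_disp, max_disp + 1):
        overlap = np.logical_and(shift_columns(mask_a, d), mask_b).sum()
        if overlap > best_overlap:
            best_d, best_overlap = d, overlap
    return best_d


def disparity_homography(d):
    """3x3 homography that translates homogeneous pixel (x, y, 1) by d in x."""
    return np.array([[1.0, 0.0, float(d)],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])


# Toy example: the same 3x3 blob seen 3 columns apart in the two views.
mask_a = np.zeros((5, 10), dtype=bool)
mask_b = np.zeros((5, 10), dtype=bool)
mask_a[1:4, 2:5] = True
mask_b[1:4, 5:8] = True

d = estimate_blob_disparity(mask_a, mask_b, max_disp=6)
H = disparity_homography(d)
mapped = H @ np.array([2.0, 1.0, 1.0])  # pixel (x=2, y=1) in view A
```

Under the single-disparity assumption the homography reduces to a pure horizontal translation, which is why rectifying the views first makes the per-blob search one-dimensional.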


Keywords: Sensor fusion, Human segmentation, Multi-camera fusion, Thermal cameras



We would like to thank the anonymous reviewers and the guest editors for their valuable comments.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Windows Phone, Microsoft Corporation, One Microsoft Way, Redmond, USA
  2. Center for Visualization and Virtual Environments, University of Kentucky, Lexington, USA
