Hough Forests Revisited: An Approach to Multiple Instance Tracking from Multiple Cameras

  • Georg Poier
  • Samuel Schulter
  • Sabine Sternig
  • Peter M. Roth
  • Horst Bischof
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8753)


Tracking multiple objects in parallel is a difficult task, especially if instances interact with and occlude each other. Taking multiple camera views into account alleviates these problems but increases the computational effort. Because this calls for very efficient methods, rather simple approaches such as background subtraction are often applied, and these tend to fail in more difficult scenarios. In this work, we therefore introduce a powerful multi-instance tracking approach building on Hough Forests. By refining the time-consuming building blocks, we drastically reduce their computational complexity without a significant loss in accuracy. In fact, we show that the test time can be reduced by one to two orders of magnitude, which makes it possible to efficiently process the large amount of image data coming from multiple cameras. Furthermore, we adapt the pre-trained generic forest model online to train an instance-specific model, making it well suited for multi-instance tracking. Our experimental evaluations show the effectiveness of the proposed efficient Hough Forests for object detection as well as for the actual task of multi-camera tracking.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Georg Poier (1)
  • Samuel Schulter (1)
  • Sabine Sternig (1)
  • Peter M. Roth (1)
  • Horst Bischof (1)
  1. Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
