A Spatio-temporal Approach for Multiple Object Detection in Videos Using Graphs and Probability Maps

  • Henrique Morimitsu
  • Roberto M. Cesar Jr.
  • Isabelle Bloch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8815)


This paper presents a novel framework for object detection in videos that considers both structural and temporal information. Detection begins by applying low-level feature extraction techniques to each frame of the video. Additional robustness is then obtained by exploiting the temporal stability of videos, using particle filters and probability maps that encode the expected location of each object. Lastly, the structural information of the scene is described using graphs, which allows us to further improve the results. As a practical application, we evaluate our approach on two table tennis video databases: the UCF101 table tennis shots and an in-house one. The observed results indicate that the proposed approach is robust, showing a high hit rate on both databases.
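The abstract does not give the details of the temporal step, so the following is only a minimal sketch of the general idea, assuming a generic bootstrap particle filter over 2D image positions rather than the authors' actual formulation. The function names (`particle_filter_step`, `probability_map`), the random-walk motion model, the Gaussian likelihood and the parameters `motion_std` and `obs_std` are all hypothetical choices made for illustration.

```python
import math
import random


def particle_filter_step(particles, detection, rng, motion_std=5.0, obs_std=10.0):
    """One predict/update/resample step for a list of (x, y) particles.

    Assumptions (not from the paper): random-walk motion and an isotropic
    Gaussian likelihood centred on a per-frame detection.
    """
    # Predict: diffuse every particle with Gaussian motion noise.
    moved = [(x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
             for x, y in particles]
    # Update: weight each particle by how well it agrees with the detection.
    dx, dy = detection
    weights = [math.exp(-0.5 * ((x - dx) ** 2 + (y - dy) ** 2) / obs_std ** 2) + 1e-300
               for x, y in moved]
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def probability_map(particles, width, height):
    """Accumulate particles into a height x width grid that sums to 1."""
    pmap = [[0.0] * width for _ in range(height)]
    share = 1.0 / len(particles)
    for x, y in particles:
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        pmap[row][col] += share
    return pmap


if __name__ == "__main__":
    rng = random.Random(0)
    # Start with particles spread uniformly over a 100 x 100 frame.
    particles = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
    for _ in range(10):
        # Hypothetical detector output: the object stays near (50, 30).
        particles = particle_filter_step(particles, (50.0, 30.0), rng)
    pmap = probability_map(particles, 100, 100)
    print(sum(map(sum, pmap)))
```

In this sketch the resulting map concentrates around the repeatedly detected position, which is one simple way that the expected location of an object could be encoded and carried across frames.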


Object detection, Structural information, Graph, Tracking, Video





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Henrique Morimitsu (1)
  • Roberto M. Cesar Jr. (1)
  • Isabelle Bloch (2)
  1. University of São Paulo, São Paulo, Brazil
  2. Institut Mines Télécom, Télécom ParisTech, CNRS LTCI, Paris, France
