Scene Modelling and Classification Using Learned Spatial Relations

  • Hannah M. Dee
  • David C. Hogg
  • Anthony G. Cohn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5756)


This paper describes a method for building visual scene models from video data using quantized descriptions of motion. This method enables us to make meaningful statements about video scenes as a whole (such as “this video is like that video”) and about regions within these scenes (such as “this part of this scene is similar to this part of that scene”). We do this through unsupervised clustering of simple yet novel motion descriptors, which provide a quantized representation of gross motion within scene regions. Using these we can characterise the dominant patterns of motion, and then group spatial regions based upon both proximity and local motion similarity to define areas or regions with particular motion characteristics. We are able to process scenes in which objects are difficult to detect and track due to variable frame-rate, video quality or occlusion, and we are able to identify regions which differ by usage but which do not differ by appearance (such as frequently used paths across open space). We demonstrate our method on 50 videos making up very different scene types: indoor scenarios with unpredictable unconstrained motion, junction scenes, road and path scenes, and open squares or plazas. We show that these scenes can be clustered using our representation, and that the incorporation of learned spatial relations into the representation enables us to cluster more effectively.
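As a rough illustration of what a quantized description of motion within scene regions might look like, the sketch below bins short motion vectors into a small number of directions per grid cell and compares cells by their direction histograms. It is a minimal sketch under stated assumptions, not the descriptor used in the paper: the grid size, the eight direction bins plus a "still" bin, the speed threshold, and the function names `quantize_direction`, `cell_histograms` and `histogram_distance` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def quantize_direction(dx, dy, n_bins=8, min_speed=0.5):
    """Map a motion vector to a bin: 0 means 'no significant motion',
    1..n_bins are direction bins (bin 1 is centred on the +x axis).
    Directions are in image coordinates. All parameters are assumptions."""
    if math.hypot(dx, dy) < min_speed:
        return 0
    width = 2 * math.pi / n_bins
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a bin so that bins are centred on 0, 45, 90, ... degrees.
    shifted = (angle + width / 2) % (2 * math.pi)
    return int(shifted // width) % n_bins + 1


def cell_histograms(tracks, cell_size=16, n_bins=8, min_speed=0.5):
    """Build one normalised direction histogram per grid cell.
    `tracks` is an iterable of (x, y, dx, dy) tuples, e.g. from a
    feature tracker; the grid layout is a hypothetical choice."""
    counts = defaultdict(lambda: [0] * (n_bins + 1))
    for x, y, dx, dy in tracks:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell][quantize_direction(dx, dy, n_bins, min_speed)] += 1
    return {
        cell: [v / sum(c) for v in c]
        for cell, c in counts.items()
    }


def histogram_distance(h1, h2):
    """Total variation distance between two normalised histograms:
    0 for identical motion patterns, 1 for completely disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))
```

Histograms of this kind could then be passed to any unsupervised clustering routine, and neighbouring cells with a small `histogram_distance` could be merged into regions. How the paper itself clusters descriptors and groups regions is not specified in the abstract, so those steps are left out here.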





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hannah M. Dee (1)
  • David C. Hogg (1)
  • Anthony G. Cohn (1)
  1. School of Computing, University of Leeds, Leeds, United Kingdom
