Shot scale distribution in art films


Shot scale, i.e. the apparent distance of the camera from the main subject of a scene, is one of the main stylistic and narrative functions of audiovisual products, conveying meaning and inducing the viewer’s emotional state. The statistical distribution of different shot scales in a film may be an important identifier of an individual film, of an individual author, and of various narrative and affective functions of a film. To understand at which level the shot scale distribution (SSD) of a movie might become its fingerprint, shot scale must be recognized automatically on a large movie corpus. In this work we propose an automatic framework for estimating the SSD of a movie by using inherent characteristics of shots that carry information about camera distance, without the need to recover the 3D structure of the scene. In the experimental investigation, the comparison of the obtained results with manual SSD annotations proves the validity of the framework. Experiments conducted on movies by Michelangelo Antonioni taken from different stylistic periods (1950–57, 1960–64, 1966–75, 1980–82) show a strong similarity in shot scale distributions within each period, opening interesting research lines regarding the possible aesthetic and cognitive sources of such a regularity.
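To make the notion of an SSD concrete, the following is a minimal sketch of how a distribution could be computed from per-shot scale labels and how two films could be compared. It is not the framework described in the abstract: the three-class vocabulary (`close`, `medium`, `long`), the per-shot counting (rather than weighting by shot duration), the toy label lists, and the use of total variation distance as a similarity measure are all assumptions made for illustration.

```python
from collections import Counter

# Assumed scale vocabulary; the abstract does not list the classes used.
SCALES = ("close", "medium", "long")


def shot_scale_distribution(labels, scales=SCALES):
    """Return the fraction of shots that fall in each scale class."""
    if not labels:
        raise ValueError("at least one shot label is required")
    counts = Counter(labels)
    unknown = set(counts) - set(scales)
    if unknown:
        raise ValueError(f"unknown scale labels: {sorted(unknown)}")
    total = len(labels)
    return [counts[s] / total for s in scales]


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical per-shot labels for two films (illustrative data only).
film_a = ["close", "medium", "medium", "long"]
film_b = ["close", "close", "medium", "long"]

ssd_a = shot_scale_distribution(film_a)
ssd_b = shot_scale_distribution(film_b)
distance = total_variation(ssd_a, ssd_b)
```

The same distance could be used to compare an automatically estimated SSD with a manually annotated one for the same film; a small value would indicate close agreement.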


Author information



Corresponding author

Correspondence to Michele Svanera.

Cite this article

Benini, S., Svanera, M., Adami, N. et al. Shot scale distribution in art films. Multimed Tools Appl 75, 16499–16527 (2016).

Keywords

  • Shot scale distribution
  • Antonioni
  • Feature extraction
  • Cognitive pattern
  • Authorship