Multimedia Tools and Applications, Volume 75, Issue 23, pp 16499–16527

Shot scale distribution in art films

  • Sergio Benini
  • Michele Svanera
  • Nicola Adami
  • Riccardo Leonardi
  • András Bálint Kovács


Shot scale, i.e. the apparent distance of the camera from the main subject of a scene, is one of the main stylistic and narrative functions of audiovisual products, conveying meaning and inducing the viewer’s emotional state. The statistical distribution of different shot scales in a film may be an important identifier of an individual film, an individual author, and of various narrative and affective functions of a film. To understand to what extent the shot scale distribution (SSD) of a movie might serve as its fingerprint, shot scale must be recognized automatically on a large movie corpus. In our work we propose an automatic framework for estimating the SSD of a movie by using inherent characteristics of shots that carry information about camera distance, without the need to recover the 3D structure of the scene. In the experimental investigation, a comparison of the obtained results with manual SSD annotations confirms the validity of the framework. Experiments conducted on movies by Michelangelo Antonioni taken from different stylistic periods (1950–57, 1960–64, 1966–75, 1980–82) show a strong similarity in shot scale distributions within each period, thus opening interesting research lines regarding the possible aesthetic and cognitive sources of such a regularity.
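As a rough illustration of what an SSD is and how two of them might be compared, the following minimal sketch computes the fraction of shots per scale class from a list of per-shot labels and measures similarity with histogram intersection. The three class names, the shot-count weighting, the similarity measure, and the toy label lists are all assumptions made for this example; the abstract does not specify the classes, the weighting, or the comparison method used in the paper.

```python
from collections import Counter

# Hypothetical scale classes; the abstract does not list the classes used.
SCALES = ("close", "medium", "long")


def shot_scale_distribution(labels, scales=SCALES):
    """Return the fraction of shots falling in each scale class."""
    if not labels:
        raise ValueError("at least one shot label is required")
    counts = Counter(labels)
    unknown = set(counts) - set(scales)
    if unknown:
        raise ValueError(f"unknown scale labels: {sorted(unknown)}")
    total = len(labels)
    return [counts[s] / total for s in scales]


def histogram_intersection(p, q):
    """Similarity in [0, 1] between two distributions (1 means identical)."""
    return sum(min(a, b) for a, b in zip(p, q))


# Toy per-shot labels for two films (invented for illustration only).
film_a = ["close", "medium", "medium", "long"]
film_b = ["close", "close", "medium", "long"]

ssd_a = shot_scale_distribution(film_a)
ssd_b = shot_scale_distribution(film_b)
similarity = histogram_intersection(ssd_a, ssd_b)
```

In practice the labels would come from an automatic shot scale classifier or from manual annotation, and one might weight each shot by its duration rather than counting shots equally; that choice is left open here.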


Shot scale distribution, Antonioni, Feature extraction, Cognitive pattern, Authorship



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Sergio Benini (1)
  • Michele Svanera (1)
  • Nicola Adami (1)
  • Riccardo Leonardi (1)
  • András Bálint Kovács (2)

  1. Department of Information Engineering, Università degli studi di Brescia, Brescia, Italy
  2. Film Department, ELTE University, Budapest, Hungary
