Movie films consumption in Brazil: an analysis of support vector machine classification


We employ the support vector machine (SVM) classifier with several types of kernels to investigate whether observable characteristics of individuals and their households can describe their decision to watch films at theaters in Brazil. Using a large dataset of 340,000 individuals living in the metropolitan areas of a large developing economy, we carry out a Knowledge Discovery in Databases (KDD) process to classify film consumers, which correctly classifies 80% of the instances. To reduce the degrees of freedom of the SVM and to learn the most important determinants of film consumption, we apply Linear Discriminant Analysis (LDA), which allows us to identify the key determinants of this consumption. The main individual characteristics are age, education (which merges with being a student), income, and preferences for cultural goods. The main geographic characteristics are the timing of the sample, population concentration, and the supply of movie theaters. The results point to an ineffective policy for the sector during the period investigated.
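The two steps named in the abstract, ranking candidate determinants with a discriminant criterion and then classifying with an SVM, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the study's data or software: the data are synthetic, the function names are hypothetical, the per-feature Fisher criterion is a simplified univariate stand-in for full LDA, and the linear SVM is trained with a Pegasos-style stochastic subgradient method without an intercept.

```python
import random

def make_synthetic(n=400, seed=0):
    """Hypothetical data: two informative features and one pure-noise feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else -1
        # Features 0 and 1 shift with the label; feature 2 ignores it.
        X.append([rng.gauss(1.5 * label, 1.0),
                  rng.gauss(1.0 * label, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

def fisher_scores(X, y):
    """Per-feature Fisher criterion: squared mean gap over summed class variances."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, t in zip(X, y) if t == 1]
        neg = [x[j] for x, t in zip(X, y) if t == -1]
        m_pos, m_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        v_pos = sum((v - m_pos) ** 2 for v in pos) / len(pos)
        v_neg = sum((v - m_neg) ** 2 for v in neg) / len(neg)
        scores.append((m_pos - m_neg) ** 2 / (v_pos + v_neg))
    return scores

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    No intercept is fitted: the synthetic classes are symmetric about the origin.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)           # decaying step size
            margin = y[i] * dot(w, X[i])    # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:                # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

X, y = make_synthetic()
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

scores = fisher_scores(X_train, y_train)
w = train_linear_svm(X_train, y_train)
accuracy = sum(predict(w, x) == t for x, t in zip(X_test, y_test)) / len(y_test)

print("Fisher scores per feature:", [round(s, 3) for s in scores])
print("Held-out accuracy:", round(accuracy, 3))
```

In this toy setting the noise feature should receive a score near zero, which is the sense in which a discriminant criterion can flag variables that contribute little to the classification. A nonlinear kernel would replace the dot product with a kernel function; that variation is not shown here.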

Fig. 1 (image not available). Source: Kinto (2011)

Fig. 2 (image not available). Source: ANCINE, the Brazilian film agency


  • Akbani R, Kwek S, Japkowicz N (2004) Applying support vector machines to imbalanced datasets. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Machine learning: ECML 2004. ECML 2004. Lecture notes in computer science, vol 3201. Springer, Berlin

  • Bruzzone L, Serpico SB (1997) Classification of imbalanced remote-sensing data by neural networks. Pattern Recogn Lett 18(11–13):1323–1328

  • Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at Accessed 03 Aug 2018

  • Chen X, Chen Y, Weinberg CB (2013) Learning about movies: the impact of movie release types on the nationwide box office. J Cult Econ 37:359–386

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297

  • Diniz SC, Machado AF (2011) Analysis of the consumption of artistic-cultural goods and services in Brazil. J Cult Econ 35(1):1–18

  • Eaton JW, Bateman D, Hauberg S, Wehbring R (2014) GNU Octave version 3.8.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform. ISBN 1441413006. Accessed 30 July 2018

  • Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AI Mag 17(3)

  • Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188

  • Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2013) A review on ensembles for the class imbalance problem: bagging, boosting, and hybrid-based approaches. IEEE Trans Syst Man Cybern C 42:463–484

  • Jehle GA, Reny PJ (2000) Advanced microeconomic theory, 2nd edn. Prentice Hall, USA

  • Kinto EA (2011) Otimização e análise das máquinas de vetores de suporte aplicadas à classificação de documentos. PhD Dissertation, University of Sao Paulo, Sao Paulo, Brazil, p 145

  • McLachlan GJ (2004) Discriminant analysis and statistical pattern recognition, vol 544. Wiley, USA

  • Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Science/Engineering/Math

  • Moretti E (2011) Social learning and peer effects in consumption: evidence from movie sales. Rev Econ Stud 78(1):356–393

  • Amo S, Rocha AR (2003) Mining sequential patterns using genetic programming. In: International Conference on Artificial Intelligence, Las Vegas, USA, pp 451–456

  • Russell SJ, Norvig P (1995) Artificial intelligence: a modern approach. Pearson Education, Malaysia

  • Scott AJ (2017) Creative cities: the role of culture. Revue d’économie Politique 120(1):181–204

  • Segaran T (2007) Advanced classification: kernel methods and SVMs. In: Programming collective intelligence: building smart web 2.0 applications. O’Reilly

  • Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Elsevier, Netherlands

Author information


Corresponding author

Correspondence to Marislei Nishijima.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nishijima, M., Nieuwenhoff, N., Pires, R. et al. Movie films consumption in Brazil: an analysis of support vector machine classification. AI & Soc 35, 451–457 (2020).


Keywords

  • Film at theaters
  • SVM
  • LDA
  • KDD
  • Classification
  • Consumers
  • Individual data