Abstract
Predicting movie success during the development phase is challenging because very little information is available at that stage. The plot, however, is established during development and is a crucial factor in determining a movie's success. This research proposes novel time-series-based features that capture “story popularity” in order to predict movie success more accurately. Multiple time series are generated to represent the sentiment of a story and its plot topics; collectively, these are termed story popularity. A hybrid voting-based classifier combining Gradient Boosting, Random Forest, Support Vector Machine, Multilayer Perceptron, and Deep Learning classifiers forecasts the success of movies in the development phase. Compared with the state of the art, the proposed features improve accuracy by 11.9% and achieve an F1 score of 75.1%. The study also reports experiments that highlight the importance of story popularity at release time for movie success.
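To illustrate the voting step mentioned in the abstract, the following is a minimal sketch of how predictions from several classifiers could be combined. It assumes hard (majority) voting over binary success labels; the abstract does not state whether hard or soft voting is used, and the function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Dict[str, Sequence[int]]) -> List[int]:
    """Combine per-classifier label predictions by hard (majority) voting.

    `predictions` maps a classifier name to its predicted labels, one per movie.
    Ties go to the label that was voted first in dictionary order, which is an
    arbitrary choice made for this sketch.
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    columns = list(predictions.values())
    n_samples = len(columns[0])
    if any(len(col) != n_samples for col in columns):
        raise ValueError("all classifiers must predict the same number of samples")

    combined = []
    for votes in zip(*columns):
        # most_common keeps first-encountered order among labels with equal counts.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions (1 = success, 0 = failure) for four movies.
example = {
    "gradient_boosting": [1, 0, 1, 0],
    "random_forest": [1, 0, 0, 0],
    "svm": [0, 0, 1, 1],
    "mlp": [1, 1, 1, 0],
    "deep_learning": [1, 0, 0, 1],
}
result = majority_vote(example)
print(result)
```

With an odd number of classifiers and two classes, as here, no tie can occur, so the tie-breaking rule only matters for other configurations.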
References
Abidi SMR, Xu Y, Ni J, Wang X, Zhang W (2020) Popularity prediction of movies: from statistical modeling to machine learning techniques. Multimed Tools Appl, pp 1–35
Ahmad IS, Bakar AA, Yaakub MR (2020) Movie revenue prediction based on purchase intention mining using youtube trailer reviews. Inf Process Manag 57(5):102278
Ahmed U, Waqas H, Afzal MT (2020) Pre-production box-office success quotient forecasting. Soft Comput 24(9):6635–6653
Banik R (2017) The movies dataset, dataset on kaggle. Version 7:3
Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR (2021) Abcdm: an attention-based bidirectional cnn-rnn deep model for sentiment analysis. Futur Gener Comput Syst 115:279–294
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
Chaturvedi S, Srivastava S, Roth D (2018) Where have i heard this story before? identifying narrative similarity in movie remakes. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol 2 (Short Papers). pp 673–678
Chen K, Franko K, Sang R (2021) Structured model pruning of convolutional networks on tensor processing units. arXiv:2107.04191
Chow PS (2020) You are here: home/spring 2020_# intelligence/ghost in the (hollywood) machine: emergent applications of artificial intelligence...
Dashtipour K, Gogate M, Li J, Jiang F, Kong B, Hussain A (2020) A hybrid Persian sentiment analysis framework: integrating dependency grammar based rules and deep neural networks. Neurocomputing 380:1–10
Dora L, Agrawal S, Panda R, Abraham A (2018) Nested cross-validation based adaptive sparse representation algorithm and its application to pathological brain classification. Expert Syst Appl 114:313–321
Eliashberg J, Hui S, Zhang S (2010) Green-lighting movie scripts: revenue forecasting and risk management. Ph.D. thesis, University of Pennsylvania
Eliashberg J, Hui SK, Zhang ZJ (2014) Assessing box office performance using movie scripts: a kernel-based approach. IEEE Trans Knowl Data Eng 26 (11):2639–2648
Ertugrul AM, Karagoz P (2018) Movie genre classification from plot summaries using bidirectional lstm. In: 2018 IEEE 12th International Conference on Semantic Computing ICSC, IEEE. pp 248–251
Fathalla A, Salah A, Li K, Li K, Francesco P (2020) Deep end-to-end learning for price prediction of second-hand items. Knowl Inf Syst, pp 1–28
Franses PH (2021) Modeling box office revenues of motion pictures. Technol Forecast Soc Change 169:120812
Gao Z, Malic V, Ma S, Shih P (2019) How to make a successful movie: factor analysis from both financial and critical perspectives. In: International Conference on Information, Springer. pp 669–678
Gorinski PJ, Lapata M (2018) What’s this movie about? a joint neural network architecture for movie content analysis. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol 1 (Long Papers), pp 1770–1781
Gross JA, Roberson WC, Foley-Cox JB (2021) Cs 230: film success prediction using nlp techniques
Hunter I, David S, Smith S, Singh S (2016) Predicting box office from the screenplay: a text analytical approach. J Screenwriting 7(2):135–154
Hutto CJ, Gilbert E (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth International AAAI Conference on Weblogs and Social Media
Hyndman RJ, Athanasopoulos G (2018) Forecasting: principles and practice. OTexts
Jabrayilzade E, Arslan AP, Para H, Polatbilek O, Sezerer E, Tekir S (2020) A turkish topic modeling dataset for multi-label classification of movie genre. In: 2020 28th Signal Processing and Communications Applications Conference (SIU), IEEE. pp 1–5
Kim DH (2021) What types of films are successful at the box office? predicting opening weekend and non-opening gross earnings of films. J Media Bus Stud, pp 1–21
Kim T, Hong J, Kang P (2015) Box office forecasting using machine learning algorithms based on sns data. Int J Forecast 31(2):364–390
Kim Y-J, Lee J-H, Cheong Y-G (2019) Prediction of a movie’s success from plot summaries using deep learning models. ACL 2019:127
Lash MT, Zhao K (2016) Early predictions of movie success: the who, what, and when of profitability. J Manag Inf Syst 33(3):874–903
Lee O-J, Jung JJ (2018) Explainable movie recommendation systems by using story-based similarity. In: IUI Workshops
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press
Moon S, Jalali N, Song R (2022) Green-lighting scripts in the movie pre-production stage: an application of consumption experience carryover theory. J Bus Res
Mun MK, Chong CW (2018) Forecasting movie demand using total and split exponential smoothing. Jurnal Ekonomi Malaysia 52(2):81–94
Mundra S, Dhingra A, Kapur A, Joshi D (2019) Prediction of a movie’s success using data mining techniques. In: Information and Communication Technology for Intelligent Systems, Springer. pp 219–227
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: ICML
Nawar A, Toma NT, Mamun S, Kaiser MS, Mahmud M, Rahman MA (2021) Cross-content recommendation between movie and book using machine learning. In: 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), IEEE. pp 1–6
Parvandeh S, Yeh H-W, Paulus MP, McKinney B (2020) Consensus features nested cross-validation. bioRxiv, pp 2019–12
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12(Oct):2825–2830
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv:1802.05365
Rasmussen NV (2020) Data, camera, action: how algorithms are shaking up European screen production. AoIR Selected Papers of Internet Research
Razeen F, Sankar S, Banu WA, Magesh S (2021) Predicting movie success using regression techniques. In: Intelligent Computing and Applications, Springer. pp 657–670
Redfern N (2012) Genre trends at the us box office, 1991 to 2010. Eur J of Am Cult 31(2):145–167
Ru Y, Li B, Liu J, Chai J (2018) An effective daily box office prediction model based on deep neural networks. Cogn Syst Res 52:182–191
Ryoo JH, Wang X, Lu S (2021) Do spoilers really spoil? using topic modeling to measure the effect of spoiler reviews on box office revenue. J Mark 85 (2):70–88
Sarimax: Introduction (2020) https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html Accessed: 2020-02-30
Seabold S, Perktold J (2010) Statsmodels: econometric and statistical modeling with python. In: Proceedings of the 9th Python in Science Conference. vol 57, Austin, TX. pp 61
Silver-Lasky P (2004) Screenwriting for the 21st century. Sterling Publishing Company
Usero B, Hernández V, Quintana C (2022) Social media mining for business intelligence analytics: an application for movie box office forecasting. In: Intelligent Computing, Springer. pp 981–999
van Gerven M, Bohte S (2018) Artificial neural networks as models of neural information processing. Frontiers Media SA
Wang Z, Zhang J, Ji S, Meng C, Li T, Zheng Y (2020) Predicting and ranking box office revenue of movies based on big data. Inf Fusion
Wei WW (2006) Time series analysis. In: The Oxford Handbook of Quantitative Methods in Psychology: vol 2
Where data and the movie business meet (2020) https://www.the-numbers.com/, Accessed: 2020-02-30
Xu L, Yu X, Gulliver TA (2021) Intelligent outage probability prediction for mobile iot networks based on an igwo-elman neural network. IEEE Trans Veh Technol 70(2):1365–1375
Zhang C, Tian Y-X, Fan Z-P (2021) Forecasting the box offices of movies coming soon using social media analysis: a method based on improved Bass models. Expert Syst Appl, p 116241
Zhou Q, Han R, Li T, Xia B (2019) Joint prediction of time series data in inventory management. Knowl and Inf Syst 61(2):905–929
Author information
Contributions
Muzammil Hussain Shahid: Conceptualization, Methodology, Writing (original draft preparation), Software, Validation, Formal Analysis, Data Curation. Muhammad Arshad Islam: Conceptualization, Supervision, Writing (review and editing). Mirza Beg: Conceptualization, Supervision, Writing (review and editing).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Availability of Data and Material
The data that support the findings of this study are available on Kaggle at [4]. The time-series-based data were derived from the public-domain resource “the-numbers” at [50].
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
“Any great film is always driven by script, script, script.” Ridley Scott.
About this article
Cite this article
Shahid, M.H., Islam, M.A. & Beg, M. Exploiting time series based story plot popularity for movie success prediction. Multimed Tools Appl 82, 3509–3534 (2023). https://doi.org/10.1007/s11042-022-13219-x
DOI: https://doi.org/10.1007/s11042-022-13219-x