
A Gaussian copula regression model for movie box-office revenues prediction




In this article, we revisit the task of predicting movie box-office revenues using multi-type features. Box-office revenues are affected by numerous factors. Previous work with discriminative models assumes that these factors are independent and identically distributed. The correlations between them are rarely considered, which limits the performance of discriminative models on this task. To address these problems, we investigate a novel Gaussian copula regression model. With this model, we do not need to make any prior assumptions about the marginal distributions of the features. In particular, we perform a cumulative probability estimation on each of the smoothed features. The estimation learns the marginal distributions and maps all features into a uniform vector space. Subsequently, we couple the marginal distributions with a copula function to create their joint distribution and learn the dependency structure between them. Moreover, we propose a computationally efficient approximate algorithm for response variable inference. Experimental results on two movie datasets from the Chinese and U.S. markets show that our approach outperforms strong discriminative regression baselines.


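In this paper, we discuss the task of predicting movie box-office revenues using multiple types of features. Many factors affect box-office revenues. The discriminative models used in previous work assume that these factors are independent and identically distributed. The correlations among the factors are rarely considered, and this assumption limits the effectiveness of discriminative models on this task. To address these problems, we adopt a novel Gaussian copula regression model. With this model, we do not need to make any prior assumptions about the marginal distributions of the features. Specifically, we first estimate the cumulative probability distribution of each smoothed feature. Through this estimation we learn the marginal distributions of the features and project the features into a common vector space. We then use a Gaussian copula function to combine these marginal distributions into their joint distribution and obtain the dependencies among them. In addition, we propose an efficient approximate algorithm for inferring the dependent variable from the joint distribution. Experimental results on two datasets from the U.S. and Chinese movie markets show that our method outperforms discriminative baseline methods.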
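The pipeline described in the abstract (estimate each marginal, map to a common scale, couple the marginals with a Gaussian copula, then infer the response) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a plain empirical CDF stands in for the smoothed estimate, the response is inferred by the exact conditional mean in the latent Gaussian space rather than by the paper's approximate algorithm, and all function names (`empirical_cdf`, `fit`, `predict`) and the toy data are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used by the Gaussian copula


def empirical_cdf(sample):
    """Return F(x) estimated from the sample, kept strictly inside (0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        rank = bisect_right(ordered, x)   # number of sample points <= x
        return max(rank, 0.5) / (n + 1)   # never 0 or 1, so inv_cdf is defined

    return cdf


def correlation(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(features, response):
    """features: list of columns (lists of floats); response: list of floats."""
    cdfs = [empirical_cdf(col) for col in features]
    # Map every variable to normal scores: z = Phi^{-1}(F(x)).
    z_cols = [[STD_NORMAL.inv_cdf(cdf(x)) for x in col]
              for cdf, col in zip(cdfs, features)]
    y_cdf = empirical_cdf(response)
    z_y = [STD_NORMAL.inv_cdf(y_cdf(y)) for y in response]
    # Dependency structure: correlations between the latent variables.
    sigma_xx = [[correlation(a, b) for b in z_cols] for a in z_cols]
    sigma_xy = [correlation(a, z_y) for a in z_cols]
    weights = solve(sigma_xx, sigma_xy)  # Sigma_xx^{-1} Sigma_xy
    return cdfs, weights, sorted(response)


def predict(model, x):
    """Point prediction for one feature vector x."""
    cdfs, weights, sorted_y = model
    z = [STD_NORMAL.inv_cdf(cdf(v)) for cdf, v in zip(cdfs, x)]
    z_y = sum(w * zi for w, zi in zip(weights, z))  # latent conditional mean
    u_y = STD_NORMAL.cdf(z_y)
    # Empirical quantile of the training responses at level u_y.
    idx = min(int(u_y * len(sorted_y)), len(sorted_y) - 1)
    return sorted_y[idx]


if __name__ == "__main__":
    # Toy data invented for illustration only.
    budget = [float(i) for i in range(1, 21)]
    buzz = [float((7 * i) % 20) for i in range(1, 21)]
    revenue = [b ** 2 for b in budget]
    model = fit([budget, buzz], revenue)
    print(predict(model, [10.0, 3.0]))
```

Because the quantile transform is monotone, mapping the latent conditional mean back to the original scale gives an estimate of the conditional median of the response, not its mean. The sketch also never assumes a parametric form for any marginal, which is the property the abstract emphasizes.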





This work was supported by the National Basic Research Program of China (Grant No. 2014CB340503) and the National Natural Science Foundation of China (Grant Nos. 71532004, 61133012, 61472107).

Author information

Correspondence to Ting Liu.

Additional information

Conflict of interest: The authors declare that they have no conflict of interest.


About this article


Cite this article

Duan, J., Ding, X. & Liu, T. A Gaussian copula regression model for movie box-office revenues prediction. Sci. China Inf. Sci. 60, 092103 (2017). https://doi.org/10.1007/s11432-015-0905-6



  • Gaussian copula
  • movie box-office revenue
  • multi-variate regression
  • text regression
  • social media


  • Gaussian copula
  • movie box office
  • regression model
  • text regression
  • social media