Science China Information Sciences, 60:092103

A Gaussian copula regression model for movie box-office revenues prediction

  • Junwen Duan
  • Xiao Ding
  • Ting Liu
Research Paper


In this article, we revisit the task of predicting movie box-office revenues using multi-type features. Box-office revenues are affected by numerous factors. Previous work with discriminative models assumes that these factors are independent and identically distributed, and the correlations between them are rarely considered, which limits the performance of discriminative models on this task. To address these problems, we investigate a novel Gaussian copula regression model. With this model, we do not need to make any prior assumptions about the marginal distributions of the features. In particular, we perform a cumulative probability estimation on each of the smoothed features. The estimation learns the marginal distributions and maps all features into a uniform vector space. Subsequently, we link the marginal distributions with a copula function to form their joint distribution and learn the dependency structure between them. Moreover, we propose a computationally efficient approximate algorithm for inferring the response variable. Experimental results on two movie datasets from the Chinese and U.S. markets show that our approach outperforms strong discriminative regression baselines.
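The pipeline summarized above can be illustrated with a minimal sketch. It is written under several assumptions and is not the authors' implementation: the marginals are estimated with a rescaled empirical CDF (no kernel smoothing), the dependency structure is the Pearson correlation of the normal scores, and inference uses the conditional mean of the latent Gaussian rather than the approximate algorithm proposed in the paper. All function names and the toy numbers are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def make_ecdf(sample):
    """Return a rescaled empirical CDF whose values stay strictly inside (0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(value):
        rank = bisect_right(ordered, value)      # number of points <= value
        return min(max(rank, 1), n) / (n + 1)    # clip to [1/(n+1), n/(n+1)]

    return ecdf


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


def fit(features, target):
    """features: one list per feature column; target: list of revenues."""
    cdfs = [make_ecdf(col) for col in features]
    target_cdf = make_ecdf(target)
    # Map every variable to normal scores: z = Phi^{-1}(F(x)).
    z_cols = [[STD_NORMAL.inv_cdf(cdf(v)) for v in col]
              for cdf, col in zip(cdfs, features)]
    z_target = [STD_NORMAL.inv_cdf(target_cdf(v)) for v in target]
    # Gaussian copula parameters: correlations between the normal scores.
    sigma_xx = [[correlation(a, b) for b in z_cols] for a in z_cols]
    sigma_xy = [correlation(a, z_target) for a in z_cols]
    weights = solve(sigma_xx, sigma_xy)          # Sigma_xx^{-1} Sigma_xy
    return cdfs, weights, sorted(target)


def predict(model, row):
    """Conditional-mean prediction, mapped back through the empirical quantiles."""
    cdfs, weights, ordered_target = model
    z = [STD_NORMAL.inv_cdf(cdf(v)) for cdf, v in zip(cdfs, row)]
    z_y = sum(w * zi for w, zi in zip(weights, z))   # E[z_y | z_x]
    u_y = STD_NORMAL.cdf(z_y)
    index = min(int(u_y * len(ordered_target)), len(ordered_target) - 1)
    return ordered_target[index]


# Toy usage with made-up numbers (two features, eight movies).
budget = [1, 2, 3, 4, 5, 6, 7, 8]
buzz = [2, 1, 4, 3, 6, 5, 8, 7]
revenue = [10, 20, 30, 40, 50, 60, 70, 80]
model = fit([budget, buzz], revenue)
print(predict(model, [8, 7]), predict(model, [1, 2]))
```

Because each feature passes through its own estimated CDF before entering the Gaussian, no parametric form is imposed on any marginal; only the dependence between the normal scores is modeled.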


Gaussian copula, movie box-office revenue, multi-variate regression, text regression, social media



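In this paper, we discuss the task of predicting movie box-office revenues using multiple types of features. Many factors affect box-office revenues. The discriminative models used in previous work assume that these factors are independent and identically distributed. The correlations between the factors are rarely considered, and this assumption limits the performance of discriminative models on this task. To address these problems, we adopt a novel Gaussian copula regression model. With this model, we do not need to make any prior assumptions about the marginal distributions of the features. In particular, we first estimate the cumulative probability distribution of each smoothed feature. Through this estimation we learn the marginal distributions of the features and project the features into a common vector space. We then use a Gaussian copula function to turn these marginal distributions into their joint distribution and obtain the dependencies between them. In addition, we propose an efficient approximate algorithm for inferring the dependent variable under the joint distribution. Experimental results on two datasets from the U.S. and Chinese movie markets show that our method outperforms discriminative baseline models.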


Gaussian copula, movie box office, regression model, text regression, social media



This work was supported by the National Basic Research Program of China (Grant No. 2014CB340503) and the National Natural Science Foundation of China (Grant Nos. 71532004, 61133012, 61472107).



Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China
