Cross-Domain Analysis of the Blogosphere for Trend Prediction

  • Patrick Siehndel
  • Fabian Abel
  • Ernesto Diaz-Aviles
  • Nicola Henze
  • Daniel Krause

Abstract

In recent years, blogs have become an important part of the web. New technologies such as smartphones enable blogging at any time and make blogs more up to date than ever before. Because of their popularity, blogs are a valuable source of information about public opinion on all kinds of topics. Blog postings that refer to products are of particular interest to companies that want to adjust marketing campaigns or advertising. In this article we compare the blogging characteristics of two domains: music and movies. We investigate how chatter from the blogosphere can be used to predict the success of products. We identify and analyze typical patterns of blogging behavior around the release of a product, point out methods for extracting features from the blogosphere, and show that these features can be exploited to predict the monetary success of movies and music with high accuracy.

Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  • Patrick Siehndel (1)
  • Fabian Abel (2)
  • Ernesto Diaz-Aviles (1)
  • Nicola Henze (1)
  • Daniel Krause (1)

  1. L3S Research Center, Leibniz University Hannover, Hannover, Germany
  2. Web Information Systems, Delft University of Technology, Delft, The Netherlands
