
Data Mining and Knowledge Discovery, Volume 28, Issue 2, pp 442–474

Affinity-driven blog cascade analysis and prediction

  • Hui Li
  • Sourav S Bhowmick
  • Aixin Sun
  • Jiangtao Cui
Article

Abstract

Information propagation within the blogosphere is important for implementing policies, marketing research, launching new products, and other applications. In this paper, we take a microscopic view of the information propagation pattern in the blogosphere by investigating blog cascade affinity. A blog cascade is a group of linked posts discussing the same topic, and cascade affinity refers to a blog's inclination to join a specific cascade. We identify and analyze an array of macroscopic and microscopic content-oblivious features that may affect a blogger's cascade-joining behavior, and use these features to predict the cascade affinity of blogs. Based on these features, we present two strategies, one non-probabilistic and one probabilistic, namely a support vector machine (SVM) classification-based approach and a Bipartite Markov Random Field-based (BiMRF) approach, to predict the probability of blogs' affinity to a cascade and rank the blogs accordingly. Evaluated on a real dataset consisting of 873,496 posts, our prediction strategies generate high-quality results (\(F1\)-measure of 72.5 % for SVM and 71.1 % for BiMRF), compared with approaches that use only traditional single features such as elapsed time and number of participants, whose \(F1\)-measures are around 11.2 % and 8.9 %, respectively. Our experiments also show that, among all the features identified, the number of quasi-friends is the most important factor affecting bloggers' inclination to join cascades.
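The abstract compares the proposed predictors against baselines built on single features such as elapsed time and number of participants, and reports all results as \(F1\)-measure over blogs ranked by predicted affinity. The following minimal Python sketch (standard library only) shows how such baseline features, a top-k cut of a ranking, and the \(F1\)-measure might be computed. The `Post` record, the helper names, the choice of `k`, and the exact feature definitions (elapsed time measured from the cascade's first post, participants counted as distinct blogs) are illustrative assumptions, not the paper's definitions, and the sketch does not implement the SVM or BiMRF predictors themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    blog_id: str      # blog that authored the post
    timestamp: float  # posting time in hours (hypothetical unit)


def elapsed_time(cascade, now):
    """Time since the cascade's first post (assumed definition)."""
    return now - min(p.timestamp for p in cascade)


def num_participants(cascade):
    """Number of distinct blogs that have posted in the cascade."""
    return len({p.blog_id for p in cascade})


def top_k(scores, k):
    """Blogs with the k highest affinity scores; ties broken by blog id."""
    ranked = sorted(scores, key=lambda blog: (-scores[blog], blog))
    return set(ranked[:k])


def f1_measure(predicted, actual):
    """F1 = 2PR / (P + R) between predicted and observed joiners."""
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical cascade: blogs "a", "b", "c" have already posted.
cascade = [Post("a", 0.0), Post("b", 2.5), Post("a", 4.0), Post("c", 6.0)]
print(elapsed_time(cascade, now=10.0))  # 10.0
print(num_participants(cascade))        # 3

# Hypothetical affinity scores for candidate blogs (not model output).
scores = {"d": 0.9, "e": 0.7, "f": 0.6, "g": 0.2, "h": 0.1}
predicted = top_k(scores, k=3)          # {"d", "e", "f"}
actual = {"d", "f", "g", "h"}           # blogs that actually joined
print(round(f1_measure(predicted, actual), 3))  # 0.571
```

Here the scores are invented; in the paper's setting they would be the affinity probabilities produced by the SVM or BiMRF predictor, and the abstract does not say how the predicted set is cut from the ranking.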

Keywords

Social networks, Network evolution, Blog cascade, Information flow

Notes

Acknowledgments

Part of this work was done while the first author was pursuing a PhD at the School of Computer Engineering, Nanyang Technological University, Singapore. This work is partly supported by NSFC 61202179 and 61173089.

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Hui Li (1)
  • Sourav S Bhowmick (2)
  • Aixin Sun (2)
  • Jiangtao Cui (1)
  1. School of Computer Science and Technology, Xidian University, Xi’an, China
  2. School of Computer Engineering, Nanyang Technological University, Singapore
