Early detection of persistent topics in social networks

Original Article

Abstract

In social networking services (SNSs), persistent topics are extremely rare and valuable. In this paper, we propose an algorithm for detecting persistent topics in SNSs based on the topic graph. A topic graph is the subgraph of the ordinary social network graph that consists of the users who have shared a given topic up to a certain point in time. Based on the assumption that the topic graphs associated with persistent and non-persistent topics evolve differently over time, we detect persistent topics by performing anomaly detection on feature values extracted from the time evolution of the topic graph. For anomaly detection, we use principal component analysis to capture the subspace spanned by normal (non-persistent) topics. We demonstrate our technique on a real dataset gathered from Twitter and show that it performs significantly better than baseline methods based on power-law curve fitting, the linear influence model, ridge regression, and support vector machines.
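The abstract describes the pipeline only at a high level: build each topic's topic graph over time, extract features from its evolution, and score topics by how far they fall outside the principal subspace of non-persistent topics. The following is a minimal sketch of that idea under stated assumptions, not the paper's implementation. The features (node and edge counts of the topic graph at fixed checkpoints), the names `topic_graph_features`, `fit_normal_subspace`, and `anomaly_score`, the number of components `k`, and the thresholding rule are all hypothetical choices. PCA is computed with plain power iteration so the example needs only the Python standard library, and the score is the residual norm of a topic's centered feature vector after projecting out the normal subspace, a common form of PCA-based anomaly detection.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Edge = Tuple[str, str]
Vector = List[float]


def topic_graph_features(edges: Sequence[Edge], share_time: Dict[str, int],
                         checkpoints: Sequence[int]) -> Vector:
    """Hypothetical feature vector: node and edge counts of the topic graph at each
    checkpoint. The topic graph at time t is the subgraph of the social graph induced
    by the users who shared the topic at or before t."""
    features: Vector = []
    for t in checkpoints:
        users = {u for u, s in share_time.items() if s <= t}
        n_edges = sum(1 for u, v in edges if u in users and v in users)
        features.extend([float(len(users)), float(n_edges)])
    return features


def _leading_direction(cov: List[Vector], found: List[Vector],
                       iters: int) -> Optional[Vector]:
    """Power iteration restricted to the orthogonal complement of `found`."""
    d = len(cov)
    v = [1.0 + 0.01 * i for i in range(d)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        for u in found:  # Gram-Schmidt against components already extracted
            p = sum(a * b for a, b in zip(w, u))
            w = [a - p * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < 1e-12:  # no variance left outside the found subspace
            return None
        v = [a / norm for a in w]
    return v


def fit_normal_subspace(X: List[Vector], k: int,
                        iters: int = 200) -> Tuple[Vector, List[Vector]]:
    """Mean and top-k principal directions of the normal (non-persistent) topics."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in X) / n for j in range(d)]
           for i in range(d)]
    comps: List[Vector] = []
    for _ in range(min(k, d)):
        u = _leading_direction(cov, comps, iters)
        if u is None:
            break
        comps.append(u)
    return mu, comps


def anomaly_score(x: Vector, mu: Vector, comps: List[Vector]) -> float:
    """Norm of the part of (x - mu) that the normal subspace cannot explain."""
    r = [a - m for a, m in zip(x, mu)]
    for u in comps:  # comps are orthonormal, so sequential removal = projection
        p = sum(a * b for a, b in zip(r, u))
        r = [a - p * b for a, b in zip(r, u)]
    return math.sqrt(sum(a * a for a in r))
```

In use, one would fit the subspace on feature vectors of topics known to be non-persistent, pick a threshold (for example, the largest score observed on those training topics, again an assumption), and flag a new topic as a persistent-topic candidate when its score exceeds it. Because only the residual outside the normal subspace is scored, a topic whose graph grows in an unusual pattern can be flagged even if its overall size is ordinary.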

Keywords

Social networks, Information diffusion, Anomaly detection, Principal component analysis, Complex networks, Topic graph

Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  1. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
  2. Toyota Technological Institute at Chicago, Chicago, USA
  3. CREST, JST, Tokyo, Japan
