Social Network Analysis and Mining, Volume 3, Issue 2, pp 209–220

Improving network response times using social information

  • Sharath Hiremagalore
  • Chen Liang
  • Angelos Stavrou
  • Huzefa Rangwala
Original Article

Abstract

Social networks and discussion boards have become significant outlets for people to communicate and freely express their opinions. Although the social networks themselves are usually well provisioned, participating users frequently link to external websites to substantiate their discussions. The heavy traffic suddenly imposed on these externally linked websites can make them unresponsive, a phenomenon known as the “flash crowd effect.” Flash crowds present a real challenge because their intensity and occurrence times are hard to predict. Moreover, most present-day web hosting servers and caching systems, although increasingly capable, are designed to handle only a nominal request load and become unresponsive once the bandwidth or processing power allocated to the hosting site is exhausted. In this paper, we quantify the prevalence of flash crowd events for a popular social discussion board (Digg). Using PlanetLab, we measured the response times of 1,289 unique popular websites and verified that 89% of the popular URLs suffered variations in their response times. In an effort to identify flash crowds in advance, we evaluated and compared traffic forecasting mechanisms. We showed that predicting network traffic from network measurements alone has very limited success and cannot be used for large-scale prediction. However, by analyzing the content and structure of social discussions, we were able to accurately forecast popularity for 86% of the websites within 5 minutes of a story’s submission and for 95% of the sites when more social content (5 hours’ worth) became available. Our work indicates that social activity can be used effectively to forecast network events that would otherwise be infeasible to anticipate.
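As a rough illustration of what checking a site for response-time variation could look like, the Python sketch below flags a URL when any measured response time exceeds a multiple of that URL's median. The abstract does not state the criterion the authors used, so the median baseline, the `ratio_threshold` value, and both function names are assumptions made purely for illustration.

```python
from statistics import median


def has_response_time_variation(samples_ms, ratio_threshold=2.0):
    """Return True if any sample exceeds ratio_threshold times the median.

    The median baseline and the threshold are illustrative assumptions,
    not the criterion used in the paper.
    """
    if not samples_ms:
        raise ValueError("need at least one response-time sample")
    baseline = median(samples_ms)
    if baseline <= 0:
        return False
    return max(samples_ms) > ratio_threshold * baseline


def fraction_with_variation(series_by_url, ratio_threshold=2.0):
    """Return the fraction of URLs whose samples are flagged as varying."""
    if not series_by_url:
        raise ValueError("need at least one URL")
    flagged = sum(
        has_response_time_variation(samples, ratio_threshold)
        for samples in series_by_url.values()
    )
    return flagged / len(series_by_url)
```

Applied to per-URL series of response times in milliseconds, `fraction_with_variation` gives a single summary figure comparable in spirit to the percentage reported above, although the exact value depends entirely on the assumed threshold.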

Keywords

Flash crowds; Traffic prediction; Social networks; Website response time; Social content data mining

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Sharath Hiremagalore (1)
  • Chen Liang (1)
  • Angelos Stavrou (1)
  • Huzefa Rangwala (2)

  1. Center for Secure Information Systems, George Mason University, Fairfax, USA
  2. Department of Computer Science and Engineering, George Mason University, Fairfax, USA
