Social Network Analysis and Mining, Volume 3, Issue 4, pp 1403–1415

Spiraling Facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution

Original Article

Abstract

Sampling the content of an Online Social Network (OSN) is a major application area because of the growing interest in collecting social information, e.g., email, location, age and number of friends. Large-scale social networks such as Facebook can be difficult to sample because of the amount of data and the privacy settings the company imposes. Sampling techniques therefore require reliable algorithms that can cope with an unknown environment. Our main purpose in this manuscript is to examine whether the normal distribution of the Metropolis–Hastings random walk (MHRW) can be replaced by a spiral approach as an alternative and reliable distribution. We propose a sampling algorithm, the Alternative Metropolis–Hastings random walk (AMHRW), and study its effect on collecting digital profiles from two different datasets. We examine the soundness and robustness of the proposed algorithm through independent walks on two different representative samples of Facebook. We observe that the performance of the normal distribution can be approximated by using an Illusion spiral. We also provide a formal convergence analysis to evaluate the performance of our independent walks and to assess whether the sample of draws has attained an equilibrium state. Finally, our preliminary results provide experimental evidence that collecting data with the AMHRW algorithm can be as effective as with the MHRW algorithm on large-scale networks.
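
For readers unfamiliar with the baseline, the following is a minimal sketch of a Metropolis–Hastings random walk in a common graph-sampling formulation: the walker proposes a uniformly random neighbour and accepts it with probability min(1, deg(current)/deg(candidate)), which makes the stationary distribution uniform over nodes. It is an illustration only. The function name `mhrw`, the toy graph and the seed are assumptions introduced here, and the normal and spiral proposal distributions studied in the article are not specified in this abstract and are not reproduced.

```python
import random


def mhrw(graph, start, steps, rng=None):
    """Metropolis-Hastings random walk targeting a uniform node distribution.

    `graph` maps each node to a list of its neighbours (undirected, connected).
    Hypothetical helper for illustration; not the article's AMHRW.
    """
    rng = rng or random.Random()
    current = start
    visited = [current]
    for _ in range(steps):
        # Proposal: pick one neighbour of the current node uniformly at random.
        candidate = rng.choice(graph[current])
        # Acceptance ratio deg(current) / deg(candidate) corrects the
        # degree bias of a simple random walk.
        ratio = len(graph[current]) / len(graph[candidate])
        if rng.random() < ratio:
            current = candidate
        # On rejection the walker stays put and the node is counted again.
        visited.append(current)
    return visited


# Toy undirected graph (assumed data): node "a" has degree 3, node "d" degree 1.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

walk = mhrw(toy_graph, "a", 20000, random.Random(0))
freq = {node: walk.count(node) / len(walk) for node in toy_graph}
```

On a long enough walk the visit frequencies in `freq` should be close to equal across nodes despite the differing degrees, which is the property that makes this family of walks attractive for unbiased OSN sampling.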

Keywords

Random walks; Spiral searching; Online social networks; Markov chain Monte Carlo methods; Facebook

Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  1. School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK