Inference of the Russian drug community from one of the largest social networks in the Russian Federation

Abstract

The criminal nature of narcotics complicates the direct assessment of a drug community, yet a good understanding of the type of people who are drawn to or currently using drugs is vital for finding effective intervention strategies. This is of immediate concern for the Russian Federation in particular, given the dramatic increase in drug abuse it has seen since the fall of the Soviet Union in the early nineties. Using unique data from the Russian social network ‘LiveJournal’, which has over 39 million registered users worldwide, we were able for the first time to identify the on-line drug community by context-sensitive text mining of the users’ blogs with a dictionary of known drug-related official and ‘slang’ terminology. By comparing the interests of the users who most actively spread information on narcotics over the network with the interests of individuals outside the on-line drug community, we found that the ‘average’ drug user in the Russian Federation is mostly interested in topics such as Russian rock, non-traditional medicine, UFOs, Buddhism, yoga and the occult. We identify three distinct scale-free sub-networks of users, which can be uniquely classified as being either ‘infectious’, ‘susceptible’ or ‘immune’.

Notes

  1. LiveJournal is available at http://www.livejournal.com (English) and http://www.livejournal.ru (Russian).

  2. Facebook is available at http://www.facebook.com.

  3. Twitter is available at http://www.twitter.com.

  4. LiveJournal’s own statistics page can be found at http://www.livejournal.com/stats.bml.

  5. The homepage of SPb IAC can be found at http://iac.spb.ru (in Russian).

  6. The full drug-dictionary is freely available and can be downloaded at http://escience.ifmo.ru/?ws=sub48.

  7. The number of phrases (8,359) is rather high in comparison to the number of words (368) in this dictionary. This is because we treat, for example, a phrase consisting of the words ‘injecting’ and ‘heroin’ and a phrase consisting of the words ‘injection’, ‘heroin’ and ‘needle’ as two separate expressions (the latter being associated with a higher weight than the former).

  8. A \(\chi ^2\) test originally designed for \(2 \times 2\) contingency tables by Sir R. A. Fisher (1922).

  9. Strictly speaking, the expected false discovery rate is only upper bounded when the \(m\) test statistics are independent, which does not hold in this particular case. B. Efron makes the case in his book Large-Scale Inference (2010) that this independence constraint is not strong.

  10. The website of the governmental statistics agency of the Russian Federation can be found at http://www.gks.ru (in Russian), with links to its rather extensive database.

  11. A rank/frequency log–log plot is a plot of the occurrence frequency versus the rank on logarithmically scaled axes. For a more elaborate description of how to construct such a plot, see the paper by Newman (2005), Appendix.
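Notes 8 and 9 refer to a test on a \(2 \times 2\) contingency table followed by false discovery rate control over the \(m\) resulting tests. A minimal sketch of how such a pipeline might look is given below, assuming a two-sided Fisher exact test per interest and the step-up procedure of Benjamini and Hochberg (1995). The function names and the example counts are hypothetical and are not taken from the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for index in order[:k_max]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical counts per interest: (inside community with interest,
    # inside without, outside with, outside without).
    tables = {
        "yoga": (30, 70, 100, 900),
        "football": (10, 90, 105, 895),
    }
    names = list(tables)
    p_values = [fisher_exact_two_sided(*tables[name]) for name in names]
    for name, keep in zip(names, benjamini_hochberg(p_values)):
        print(name, "significant" if keep else "not significant")
```

The step-up rule rejects every hypothesis up to the largest rank that passes its threshold, even if some smaller \(p\) values fail their own thresholds. As note 9 points out, the bound on the expected false discovery rate is only guaranteed in this form for independent test statistics.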

References

  1. Agar, M.: Agents in living color: towards emic agent-based models. J. Artif. Soc. Soc. Simul. 8(4). http://jasss.soc.surrey.ac.uk/8/1/4.html (2005). Accessed 13 May 2013

  2. Agresti, A.: A survey of exact inference for contingency tables. Stat. Sci. 7(1), 131–177 (1992)

  3. Albert, R., Jeong, H., Barabasi, A.L.: Error and attack tolerance of complex networks. Nature 406, 378–382 (2000)

  4. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57(1), 289–300 (1995)

  5. Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29(4), 1165–1188 (2001)

  6. Bernades, D.F., Latapy, M., Tarissan, F.: Relevance of SIR model for real-world spreading phenomena: experiments on a large-scale p2p system. In: Proceedings of the International Conference on Advances in Social Network Analysis and Mining (ASONAM), Istanbul (2012)

  7. Bollobas, B., Riordan, O.: Robustness and vulnerability of scale-free random graphs. Internet Math. 1(1), 1–35 (2004)

  8. Cafarella, M., Cutting, D.: Building nutch: open source search. ACM Queue 2(2), 54–61 (2004). doi:10.1145/988392.988408

  9. Clauset, A., Shalizi, C., Newman, M.: Power-law distributions in empirical data. SIAM Rev. 51, 661–703 (2009). doi:10.1137/070710111

  10. Crucitti, P., Latora, V., Marchiori, M., Rapisarda, A.: Efficiency of scale-free networks: error and attack tolerance. Phys. A Stat. Mech. Appl. 320, 622–642 (2003)

  11. Daley, D., Kendall, D.: Epidemics and rumours. Nature 204, 1118 (1964). doi:10.1038/2041118a0

  12. Efron, B.: Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing and Prediction. Cambridge University Press, Cambridge (2010)

  13. Everitt, B., Landau, S., Leese, M.: Cluster Analysis. Arnold, London (2001)

  14. Ferri, F., Grifoni, P., Guzzo, T.: New forms of social and professional digital relationships: the case of Facebook. Soc. Netw. Anal. Min. 2(2), 121–137 (2012)

  15. Fisher, R.: On the interpretation \(\chi ^2\) from contingency tables, and the calculation of \(p\). J. R. Stat. Soc. 85(1), 87–94 (1922)

  16. Gallos, L.K., Barttfield, P., Havlin, S., Sigman, M., Makse, H.A.: Collective behavior in the spatial spreading of obesity. Sci. Rep. 2(45), 1–9 (2012)

  17. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2009)

  18. Iribarren, J.B., Moro, E.: Impact of human activity patterns on the dynamics of information diffusion. Phys. Rev. Lett. 103(3), 8–11 (2009)

  19. Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. IEEE Press/Wiley-Interscience, Hoboken (2011)

  20. Lämmel, R.: Google’s MapReduce programming model – revisited. Sci. Comput. Program. 70, 1–30 (2007)

  21. Mityagin, S.A.: Modeling the spread of drug-addiction through the population on the basis of complex networks (in Russian—Modelirovanie processov narkotizatsiya nasileniya na osnove kompleksnix cetei). Dissertation, National Research University of Information Technologies, Mechanics and Optics (2012)

  22. Newman, M.: Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 46, 323–351 (2005)

  23. Ochiai, A.: A zoogeographic studies on the solenoid fishes found in Japan and its neighbouring regions. Bull. Jpn. Soc. Fish Sci. 22, 526–530 (1957)

  24. Onnela, J., Saramäki, J., Hyvönen, J., Szabó, G., Lazer, D., Kaski, K., Kertész, J., Barabási, A.L.: Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. USA 104(8), 7332–7336 (2007)

  25. Pinto, C., Mendes Lopez, A., Machado, J.: A review of power laws in real life phenomena. Commun. Nonlinear Sci. Numer. Simul. 17(9), 3558–3578 (2012)

  26. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

  27. Porter, M.F.: Stemming algorithms for various European languages. http://www.snowball.tartarus.org/texts/stemmersoverview.html (2006). Accessed 19 November 2012

  28. Scott, J.: Social network analysis: developments, advances, and prospects. Soc. Netw. Anal. Min. 1(1), 21–26 (2011)

  29. Sunami, A.N.: Drug-conflict management in the context of information warfare (in Russian—Politika upravleniya narkokonfliktom v kontekste informatsionnoi voiny). Saint Petersburg State University, Saint Petersburg (2007)

  30. Wilson, R.E., Gosling, S.D., Graham, L.T.: A review of Facebook research in the social sciences. Perspect. Psychol. Sci. 7(3), 203–220 (2012)

  31. White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Yahoo! Press, New York (2009)

Acknowledgments

The authors thank Dr. Sergey Mityagin from the Saint Petersburg Information and Analytical Center (SPb IAC) for fruitful discussions on the drug addiction profiles in the Russian Federation. In addition, the authors would like to express their gratitude to Prof. Dr. T.K. Dijkstra from the University of Groningen (RUG) and the Free University Amsterdam (VU) for introducing us to false discovery rate control and for his useful remarks. This work is supported by the Leading Scientist Program of the Russian Federation, contract 11.G34.31.0019, as well as by the Complexity program of NTU, Singapore. Peter Sloot also acknowledges the support from the FET-Proactive Grant TOPDRIM, Number FP7-ICT-318121.

Author information

Corresponding author

Correspondence to P. M. A. Sloot.

Additional information

L. J. Dijkstra and A. V. Yakushev share first authorship of this work.

Appendix: LiveJournal user interests

Figure 6a shows the frequency of occurrence of interests within the crawled population. Note that the distribution is heavy right-tailed; its slope suggests that it might follow a power-law, see Eq. (5). Figure 6b shows the corresponding rank/frequency log–log plot of the histogram in Fig. 6a. The exponent \(\gamma \approx 1.54\) and the start of the distribution \(x_{min} = 3\) were estimated using the maximum likelihood method proposed by Clauset et al. (2009). Note that the fitted line in Fig. 6b approximates the distribution quite well. The standard goodness-of-fit test (Clauset et al. 2009) gives a \(p\) value of approximately \(0.57\), so there is no reason to reject the hypothesis that the distribution follows a power-law.
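As a rough illustration of the maximum likelihood step, the closed-form estimators given by Clauset et al. (2009) can be sketched as follows: \(\hat{\gamma} = 1 + n \left[ \sum_i \ln (x_i / x_{min}) \right]^{-1}\) for continuous data, with \(x_{min}\) replaced by \(x_{min} - 1/2\) as an approximation for discrete data. The function name and the sample values are hypothetical, \(x_{min}\) is assumed to be given rather than estimated, and whether this variant matches the one used for Fig. 6b is an assumption.

```python
import math


def power_law_exponent(samples, x_min, discrete=True):
    """Maximum likelihood estimate of the exponent for samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    # Discrete data use the shifted lower bound x_min - 1/2 (an approximation).
    lower = x_min - 0.5 if discrete else x_min
    if lower <= 0:
        raise ValueError("the lower bound must be positive")
    log_sum = sum(math.log(x / lower) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample above the lower bound")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Hypothetical frequencies of occurrence, not the LiveJournal data.
    frequencies = [3, 3, 4, 5, 7, 12, 40, 1, 2]
    print(power_law_exponent(frequencies, x_min=3))
```

Samples below \(x_{min}\) are discarded before the estimate is computed, which is why the choice of \(x_{min}\) matters for the fitted exponent.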

Fig. 6 a The histogram of interests expressed by the users in the crawled LiveJournal data set. b The rank/frequency log–log plot of the histogram in 6a and the maximum likelihood power-law fit (\(\gamma \approx 1.54\) and \(x_{min} = 3\))
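The rank/frequency representation in panel b can be constructed along the following lines. This is a minimal sketch that assumes each user is represented by a list of interest strings (a hypothetical data layout): it counts how many users list each interest, sorts these frequencies in decreasing order and pairs each with its rank. The function names and example data are illustrative only.

```python
from collections import Counter
from math import log10


def rank_frequency(interest_lists):
    """Return (rank, frequency) pairs, most frequent interest first."""
    # Count each interest at most once per user.
    counts = Counter(
        interest for interests in interest_lists for interest in set(interests)
    )
    frequencies = sorted(counts.values(), reverse=True)
    return list(enumerate(frequencies, start=1))


def to_log_log(points):
    """Map (rank, frequency) pairs to base-10 log-log coordinates."""
    return [(log10(rank), log10(freq)) for rank, freq in points]


if __name__ == "__main__":
    # Hypothetical users and interests, not the crawled data set.
    users = [["yoga", "ufo"], ["yoga"], ["yoga", "buddhism", "ufo"]]
    for x, y in to_log_log(rank_frequency(users)):
        print(f"{x:.3f}\t{y:.3f}")
```

On logarithmic axes a power-law appears as an approximately straight line, which is what makes this representation convenient for a visual check of the fit.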

The fact that the distribution of interests within the SNS LiveJournal is heavy right-tailed explains why the number of susceptible users (see Table 4) is relatively small compared to the other groups.

About this article

Cite this article

Dijkstra, L.J., Yakushev, A.V., Duijn, P.A.C. et al. Inference of the Russian drug community from one of the largest social networks in the Russian Federation. Qual Quant 48, 2739–2755 (2014). https://doi.org/10.1007/s11135-013-9921-6

Keywords

  • Illicit drug use
  • Drug use
  • Social network
  • LiveJournal
  • Power-law
  • Russian Federation