Sampling dark networks to locate people of interest

  • Pivithuru Wijegunawardana
  • Vatsal Ojha
  • Ralucca Gera
  • Sucheta Soundarajan
Original Article


Dark networks, that is, networks with covert entities and connections such as those representing illegal activities, are of great interest to intelligence analysts. Before such a network can be studied, however, appropriate network data must be collected. Collecting accurate data in this setting is challenging: data collectors must draw inferences, which may be incorrect, from available intelligence data, which may itself be misleading. In this paper, we consider how to sample dark networks effectively when sampling queries may return incorrect information, with the specific goal of locating people of interest. We present RedLearn and RedLearnRS, two algorithms for crawling dark networks that aim to maximize the number of nodes of interest identified under a limited sampling budget. RedLearn assumes that a query on a node accurately reveals whether that node represents a person of interest, while RedLearnRS dispenses with this assumption. We consider realistic error scenarios that model how individuals in a dark network may attempt to conceal their connections. We present results on several real-world networks, including dark networks, as well as on various synthetic dark network structures proposed in the criminology literature. Our analysis shows that RedLearn and RedLearnRS match or outperform other sampling strategies.
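The abstract does not specify how RedLearn or RedLearnRS choose which node to query next, so the following is only a minimal, hypothetical sketch of the problem setting itself: a crawl limited to `budget` queries, where each query is assumed to reveal a node's neighbour list and a label that may be wrong. The single error model used here (a person of interest conceals their status with probability `hide_prob`) and the greedy rule (query the frontier node with the most neighbours already reported as persons of interest) are illustrative assumptions. This is a simple baseline, not RedLearn or RedLearnRS, and the names `noisy_oracle` and `greedy_crawl` are hypothetical.

```python
"""Hypothetical sketch of budget-limited crawling with noisy queries (not RedLearn)."""
import random
from typing import Callable, Dict, Hashable, List, Set

Node = Hashable


def noisy_oracle(true_red: Set[Node], hide_prob: float,
                 rng: random.Random) -> Callable[[Node], bool]:
    """Return a query function. A true person of interest ('red' node) hides,
    i.e. is reported as not red, with probability hide_prob.
    This one-parameter error model is an illustrative assumption."""
    def query(v: Node) -> bool:
        return v in true_red and rng.random() >= hide_prob
    return query


def greedy_crawl(adj: Dict[Node, List[Node]], seed: Node, budget: int,
                 query: Callable[[Node], bool]) -> Set[Node]:
    """Spend at most `budget` queries starting from `seed`; return the nodes
    reported red. Each step queries the frontier node with the most
    already-queried neighbours that were reported red."""
    queried: Set[Node] = set()
    reported_red: Set[Node] = set()
    # Frontier: unqueried node -> number of red-reported queried neighbours.
    red_links: Dict[Node, int] = {seed: 0}
    while red_links and len(queried) < budget:
        # Sorting by str first makes tie-breaking deterministic.
        v = max(sorted(red_links, key=str), key=lambda u: red_links[u])
        del red_links[v]
        queried.add(v)
        is_red = query(v)  # one unit of budget: (possibly wrong) label + neighbours
        if is_red:
            reported_red.add(v)
        for w in adj.get(v, []):
            if w not in queried:
                red_links[w] = red_links.get(w, 0) + (1 if is_red else 0)
    return reported_red


if __name__ == "__main__":
    # Toy network: r0..r3 are persons of interest; a, b, c are not.
    adj = {"r0": ["r1", "r2", "a"], "r1": ["r0", "r2", "b"],
           "r2": ["r0", "r1", "r3"], "r3": ["r2", "c"],
           "a": ["r0"], "b": ["r1"], "c": ["r3"]}
    red = {"r0", "r1", "r2", "r3"}
    found = greedy_crawl(adj, "r0", 4, noisy_oracle(red, 0.0, random.Random(0)))
    print(sorted(found))  # ['r0', 'r1', 'r2']
```

With `hide_prob=0.0` the query is error-free, mirroring the assumption RedLearn makes; with `hide_prob=1.0` every person of interest conceals their status and the same crawl reports no one, which shows why a strategy that trusts query answers can fail under concealment scenarios of the kind RedLearnRS is designed to handle.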


Keywords: Sampling, Lying scenarios, Nodes of interest, Dark networks



R. Gera thanks the DoD for partially sponsoring this work. This research was supported in part through computational resources provided by Syracuse University.



Copyright information

© 2018. This is a U.S. Government work and is not under copyright protection in the US; foreign copyright protection may apply.

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, USA
  2. Science and Humanities Scholars Program, Carnegie Mellon University, Pittsburgh, USA
  3. Department of Applied Mathematics, Naval Postgraduate School, Monterey, USA
