Data Mining and Knowledge Discovery

Volume 32, Issue 1, pp 187–217

Selective harvesting over networks

  • Fabricio Murai
  • Diogo Rennó
  • Bruno Ribeiro
  • Gisele L. Pappa
  • Don Towsley
  • Krista Gile


Active search on graphs focuses on collecting certain labeled nodes (targets) given global knowledge of the network topology and its edge weights (encoding pairwise similarities) under a query budget constraint. However, in most current networks, nodes, network topology, network size, and edge weights are all initially unknown. In this work we introduce selective harvesting, a variant of active search where the next node to be queried must be chosen among the neighbors of the current queried node set; the available training data for deciding which node to query is restricted to the subgraph induced by the queried set (and their node attributes) and their neighbors (without any node or edge attributes). Therefore, selective harvesting is a sequential decision problem, where we must decide which node to query at each step. A classifier trained in this scenario can suffer from what we call a tunnel vision effect: without any recourse to independent sampling, the urge to only query promising nodes forces classifiers to gather increasingly biased training data, which we show significantly hurts the performance of active search methods and standard classifiers. We demonstrate that it is possible to collect a much larger set of targets by using multiple classifiers, not by combining their predictions as a weighted ensemble, but by switching between the classifiers used at each step, as a way to ease the tunnel vision effect. We discover that switching classifiers collects more targets by (a) diversifying the training data and (b) broadening the choices of nodes that can be queried in the future. This highlights an exploration, exploitation, and diversification trade-off in our problem that goes beyond the exploration and exploitation duality found in classic sequential decision problems.
Based on these observations we propose D\(^3\)TS, a method based on multi-armed bandits for non-stationary stochastic processes that enforces classifier diversity, which outperforms all competing methods on five real network datasets in our evaluation and exhibits comparable performance on the other two.
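The idea of switching between scoring models at each step, rather than averaging them, can be illustrated with a small Python sketch. This is not the authors' D\(^3\)TS implementation: the function `selective_harvest`, the three toy scorers, and the `cap` parameter are hypothetical names chosen for illustration, and the capped Beta update is only loosely modeled on Thompson sampling for non-stationary rewards. Each "arm" is one scorer; at every step an arm is sampled, the border node it ranks highest is queried, and the arm is rewarded if that node turns out to be a target.

```python
import random

def target_neighbors(v, adj, observed):
    # Number of already-queried neighbors of v that are targets.
    return sum(1 for u in adj[v] if observed.get(u, False))

def target_fraction(v, adj, observed):
    # Fraction of already-queried neighbors of v that are targets.
    seen = [u for u in adj[v] if u in observed]
    return sum(observed[u] for u in seen) / len(seen) if seen else 0.0

def seen_neighbors(v, adj, observed):
    # Number of already-queried neighbors of v, regardless of label.
    return sum(1 for u in adj[v] if u in observed)

SCORERS = [target_neighbors, target_fraction, seen_neighbors]

def selective_harvest(adj, is_target, seed_node, budget,
                      scorers=SCORERS, cap=10.0, rng=None):
    """Query up to `budget` border nodes; return (query order, targets found)."""
    rng = rng or random.Random(0)
    observed = {seed_node: is_target[seed_node]}  # labels revealed so far
    order = [seed_node]
    border = set(adj[seed_node]) - set(observed)
    alpha = [1.0] * len(scorers)  # Beta(1, 1) prior for each arm
    beta = [1.0] * len(scorers)
    for _ in range(budget):
        if not border:
            break
        # Thompson sampling: draw one value per arm, use the best-looking arm.
        draws = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(scorers)), key=draws.__getitem__)
        # Only neighbors of the queried set (the border) may be queried.
        node = max(sorted(border),
                   key=lambda v: scorers[arm](v, adj, observed))
        reward = 1.0 if is_target[node] else 0.0
        observed[node] = is_target[node]
        order.append(node)
        border.discard(node)
        border.update(u for u in adj[node] if u not in observed)
        # Assumed update: cap alpha + beta so old evidence is discounted.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total = alpha[arm] + beta[arm]
        if total > cap:
            alpha[arm] *= cap / total
            beta[arm] *= cap / total
    found = [v for v in order if is_target[v]]
    return order, found
```

Note that the scorers only ever read labels of nodes that have already been queried, which mirrors the restriction that training data comes from the queried subgraph. Real classifiers trained on node attributes would replace the toy scorers.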


Active search, Network search, Tunnel vision effect, Model selection



This work was sponsored by the ARO under MURI W911NF-12-1-0385, the U.S. Army Research Laboratory under Cooperative Agreement W911NF-09-2-0053, the CNPq (National Council for Scientific and Technological Development, Brazil), FAPEMIG, and the NSF under SES-1230081, including support from the National Agricultural Statistics Service. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARL or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The authors thank Xuezhi Wang and Roman Garnett for kindly providing code and datasets used in Wang et al. (2013).


  1. Ali A, Caruana R, Kapoor A (2014) Active learning with model selection. In: AAAI conference on artificial intelligence, pp 1673–1679
  2. Attenberg J, Provost F (2011) Online active inference and learning. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 186–194
  3. Attenberg J, Melville P, Provost F (2010) Guided feature labeling for budget-sensitive learning under extreme class imbalance. In: ICML workshop on budgeted learning
  4. Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77
  5. Avrachenkov K, Basu P, Neglia G, Ribeiro B (2014) Pay few, influence most: online myopic network covering. In: Computer communications workshops (INFOCOM WKSHPS), 2014 IEEE conference on, pp 813–818
  6. Baram Y, El-Yaniv R, Luz K (2004) Online choice of active learning algorithms. J Mach Learn Res 5:255–291
  7. Beygelzimer A, Dasgupta S, Langford J (2009) Importance weighted active learning. In: International conference on machine learning, ACM, pp 49–56
  8. Beygelzimer A, Langford J, Li L, Reyzin L, Schapire RE (2011) Contextual bandit algorithms with supervised learning guarantees. In: International conference on artificial intelligence and statistics, pp 19–26
  9. Bnaya Z, Puzis R, Stern R, Felner A (2013) Bandit algorithms for social network queries. In: Social computing (SocialCom), 2013 international conference on
  10. Borgs C, Brautbar M, Chayes J, Khanna S, Lucier B (2012) The power of local information in social networks. Internet and network economics. Springer, Berlin, pp 406–419
  11. Cao Z, Qin T, Liu TY, Tsai MF, Li H (2007) Learning to rank: from pairwise approach to listwise approach. In: International conference on machine learning, ACM, pp 129–136
  12. Friedman J, Hastie T, Tibshirani R (2009) The elements of statistical learning. Springer series in statistics, vol 1. Springer, Berlin
  13. Ganti R, Gray AG (2012) UPAL: unbiased pool based active learning. In: International conference on artificial intelligence and statistics, pp 422–431
  14. Ganti R, Gray AG (2013) Building bridges: viewing active learning from the multi-armed bandit lens. In: Conference on uncertainty in artificial intelligence, pp 232–241
  15. Garnett R, Krishnamurthy Y, Wang D, Schneider J, Mann R (2011) Bayesian optimal active search on graphs. In: Workshop on mining and learning with graphs
  16. Garnett R, Krishnamurthy Y, Xiong X, Mann R, Schneider JG (2012) Bayesian optimal active search and surveying. In: International conference on machine learning, ACM, pp 1239–1246
  17. Gouriten G, Maniu S, Senellart P (2014) Scalable, generic, and adaptive systems for focused crawling. In: ACM conference on hypertext and social media, pp 35–45
  18. Gupta N, Granmo OC, Agrawala A (2011) Thompson sampling for dynamic multi-armed bandits. In: International conference on machine learning and applications and workshops, vol 1, pp 484–489
  19. Helleputte T (2015) LiblineaR: linear predictive models based on the LIBLINEAR C/C++ library. R package version 1.94-2
  20. Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15(3):651–674
  21. Hsu WN, Lin HT (2015) Active learning by learning. In: AAAI conference on artificial intelligence, pp 2659–2665
  22. Khuller S, Purohit M, Sarpatwar KK (2014) Analyzing the optimal neighborhood: algorithms for budgeted and partial connected dominating set problems. In: ACM-SIAM symposium on discrete algorithms, pp 1702–1713
  23. Kuncheva LI (2003) That elusive diversity in classifier ensembles. In: Iberian conference on pattern recognition and image analysis, Springer, pp 1126–1138
  24. Lakshminarayanan B, Roy DM, Teh YW (2014) Mondrian forests: efficient online random forests. In: Advances in neural information processing systems, pp 3140–3148
  25. Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection.
  26. Liu W, Principe JC, Haykin S (2011) Kernel adaptive filtering: a comprehensive introduction, vol 57. Wiley, Hoboken
  27. Ma Y, Huang TK, Schneider JG (2015) Active search and bandits on graphs using sigma-optimality. In: Conference on uncertainty in artificial intelligence, pp 542–551
  28. Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
  29. Pant G, Srinivasan P (2005) Learning to crawl: comparing classification schemes. ACM Trans Inf Syst 23(4):430–462
  30. Pfeiffer III JJ, Neville J, Bennett PN (2012) Active sampling of networks. In: Workshop on mining and learning with graphs
  31. Pfeiffer III JJ, Neville J, Bennett PN (2014) Active exploration in networks: using probabilistic relationships for learning and inference. In: ACM international conference on information and knowledge management, pp 639–648
  32. Robins G, Pattison P, Kalish Y, Lusher D (2007) An introduction to exponential random graph (\(p^*\)) models for social networks. Soc Networks 29(2):173–191
  33. Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007) Recent developments in exponential random graph (\(p^*\)) models for social networks. Soc Networks 29(2):192–215
  34. Schein AI, Ungar LH (2007) Active learning for logistic regression: an evaluation. Mach Learn 68(3):235–265
  35. Settles B (2010) Active learning literature survey, vol 52(55–66). University of Wisconsin, Madison
  36. Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: ACM workshop on computational learning theory, pp 287–294
  37. Stapenhurst R (2012) Diversity, margins and non-stationary learning. Ph.D. thesis, University of Manchester
  38. Tang EK, Suganthan PN, Yao X (2006) An analysis of diversity measures. Mach Learn 65(1):247–271
  39. Wang X, Garnett R, Schneider J (2013) Active search on graphs. In: ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 731–738
  40. Xie P, Zhu J, Xing E (2016) Diversity-promoting Bayesian learning of latent variable models. In: International conference on machine learning, PMLR, vol 48, pp 59–68

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. University of Massachusetts Amherst, Amherst, USA
  2. Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  3. Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  4. Purdue University, West Lafayette, USA
