Autonomous Agents and Multi-Agent Systems, Volume 33, Issue 5, pp 645–671

Expertise drift in referral networks

  • Ashiqur R. KhudaBukhsh
  • Jaime G. Carbonell


Learning-to-refer is a challenge in expert referral networks, wherein Active Learning helps experts (agents) estimate the topic-conditioned skills of other connected experts for problems that the initial expert cannot solve and must therefore refer to experts with more appropriate expertise. Recent research has investigated different reinforcement action selection algorithms to assess the viability of the learning setting, both with uninformative priors and with partially available noisy priors, where experts are allowed to advertise a subset of their skills to their colleagues. Prior to this work, time-varying expertise drift (e.g., experts learning with experience) had not been considered, though it is an aspect that may often arise in practice. This paper addresses the challenge of referral learning with time-varying expertise, proposing Hybrid, a novel combination of Thompson Sampling and Distributed Interval Estimation Learning (DIEL) with variance reset, the latter first proposed in this paper. In our extensive empirical evaluation, considering both biased and unbiased drift, the proposed algorithm outperforms the previous state-of-the-art (DIEL) and other competitive algorithms, e.g., Thompson Sampling and Optimistic Thompson Sampling. We further show that our method is robust to topic-dependent drifts and expertise level-dependent drifts, and that the newly proposed DIEL\(_{reset}\) can be effectively combined with other Bayesian approaches, e.g., Optimistic Thompson Sampling, Dynamic Thompson Sampling, and Discounted Thompson Sampling, for improved performance.
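To make the two families of action selection named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's Hybrid or DIEL\(_{reset}\) procedure. It assumes binary task outcomes and a uniform Beta(1, 1) prior for Thompson Sampling, and it uses a generic interval-estimation score (sample mean plus standard deviation divided by the square root of the sample count). All names here (`BetaThompsonArm`, `interval_estimate`, `run_thompson`, `true_probs_at`) are hypothetical and introduced only for illustration; the variance-reset step is deliberately omitted because the abstract does not specify it.

```python
import math
import random


class BetaThompsonArm:
    """Beta-Bernoulli posterior over one expert's success probability."""

    def __init__(self):
        # Assumption: uniform Beta(1, 1) prior (an uninformative prior).
        self.successes = 1.0
        self.failures = 1.0

    def sample(self, rng):
        # Draw one plausible success probability from the posterior.
        return rng.betavariate(self.successes, self.failures)

    def update(self, reward):
        # Bayesian update after observing a solved (True) or failed task.
        if reward:
            self.successes += 1.0
        else:
            self.failures += 1.0


def interval_estimate(rewards):
    """Generic interval-estimation score: mean + std / sqrt(n).

    Assumption: population standard deviation over the observed rewards.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / n
    return mean + math.sqrt(variance) / math.sqrt(n)


def run_thompson(true_probs_at, n_experts, steps, seed=0):
    """Refer each task to the expert with the highest posterior sample.

    true_probs_at(t) returns the experts' success probabilities at step t,
    so a time-varying function models expertise drift.
    """
    rng = random.Random(seed)
    arms = [BetaThompsonArm() for _ in range(n_experts)]
    picks = []
    for t in range(steps):
        samples = [arm.sample(rng) for arm in arms]
        choice = max(range(n_experts), key=lambda i: samples[i])
        reward = rng.random() < true_probs_at(t)[choice]
        arms[choice].update(reward)
        picks.append(choice)
    return picks
```

Passing a time-dependent function, for example `lambda t: [0.9, 0.1] if t < 500 else [0.1, 0.9]`, simulates an abrupt drift. In this plain form the posterior counts keep accumulating, so the sampler may adapt slowly after the change, which is the kind of behaviour that motivates drift-aware variants.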


Keywords: Active Learning, Referral networks, Expertise drift




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
