Expertise drift in referral networks

Autonomous Agents and Multi-Agent Systems

Abstract

Learning-to-refer is a challenge in expert referral networks: Active Learning helps experts (agents) estimate the topic-conditioned skills of the other experts they are connected to, so that problems the initial expert cannot solve can be referred to colleagues with more appropriate expertise. Recent research has investigated several reinforcement action-selection algorithms to assess the viability of this learning setting, both with uninformative priors and with partially available noisy priors, where experts may advertise a subset of their skills to their colleagues. Prior to this work, time-varying expertise drift (e.g., experts learning with experience) had not been considered, though it often arises in practice. This paper addresses the challenge of referral learning with time-varying expertise, proposing Hybrid, a novel combination of Thompson Sampling and Distributed Interval Estimation Learning with variance reset (DIEL\(_{reset}\)), the latter first proposed in this paper. In an extensive empirical evaluation considering both biased and unbiased drift, the proposed algorithm outperforms the previous state-of-the-art (DIEL) and other competitive algorithms, e.g., Thompson Sampling and Optimistic Thompson Sampling. We further show that our method is robust to topic-dependent and expertise-level-dependent drifts, and that the newly proposed DIEL\(_{reset}\) can be effectively combined with other Bayesian approaches, e.g., Optimistic Thompson Sampling, Dynamic Thompson Sampling, and Discounted Thompson Sampling, for improved performance.
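
To ground the terminology, the sketch below illustrates the two ingredients Hybrid draws on: DIEL's interval-estimation score and Thompson Sampling's posterior draw, together with the variance reset behind DIEL\(_{reset}\). This is a minimal sketch under an assumed Gaussian reward model; the class and function names are hypothetical, and the precise rule by which Hybrid combines the two criteria (and when resets fire) is specified in the paper itself.

```python
import math
import random


class ExpertEstimate:
    """Reward statistics one agent keeps per connected expert.

    Illustrative only: the names and the Gaussian reward model are
    assumptions, not the paper's exact formulation.
    """

    def __init__(self, prior_mean=0.5, prior_var=1.0):
        self.mean = prior_mean
        self.var = prior_var
        self.n = 1  # pseudo-count so interval widths stay defined

    def update(self, reward):
        # Welford-style incremental mean/variance update.
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.var += (delta * (reward - self.mean) - self.var) / self.n

    def diel_score(self, c=2.0):
        # (D)IEL: rank experts by mean plus c standard errors.
        return self.mean + c * math.sqrt(self.var / self.n)

    def thompson_draw(self):
        # Thompson Sampling: draw a plausible skill level from a
        # Gaussian posterior over the mean reward.
        return random.gauss(self.mean, math.sqrt(self.var / self.n))

    def reset_variance(self, prior_var=1.0):
        # Variance reset (the DIEL_reset idea): re-inflate uncertainty
        # so an expert whose skill may have drifted is re-explored.
        self.var = prior_var
        self.n = 1


def refer(colleagues, sample=False):
    """Pick the colleague with the highest score (DIEL) or draw (TS)."""
    if sample:
        return max(colleagues, key=lambda e: colleagues[e].thompson_draw())
    return max(colleagues, key=lambda e: colleagues[e].diel_score())
```

The reset is what matters under drift: without it, the standard error \(\sqrt{var/n}\) shrinks monotonically, so an expert whose skill later improves would rarely be re-sampled.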


Notes

  1. The data set can be downloaded from https://www.cs.cmu.edu/~akhudabu/referral-networks.html.

  2. [54] reports 0.8 as the optimal value of \(\gamma\) for slowly moving distributions. However, in our experiments we obtained better performance for both Discounted TS and Hybrid\(_{\texttt{Discounted TS}}\) when \(\gamma\) was set to 0.95. We did not perform extensive parameter tuning for the new TS variants and chose values that seemed reasonable; with such tuning it may be possible to squeeze a further performance boost out of Hybrid\(_{\texttt{Dynamic TS}}\), but our primary goal was to test Hybrid's design compatibility with other TS variants.
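
For context, \(\gamma\) here is the discount factor of Discounted TS [54], which geometrically decays accumulated evidence so that observations gathered before a drift fade away. Below is a minimal Bernoulli-reward sketch of that update; the prior pseudo-counts and the choice to decay all arms each round are illustrative assumptions, not the exact experimental configuration.

```python
import random


class DiscountedTS:
    """Discounted Thompson Sampling for Bernoulli rewards (after [54]).

    Success/failure counts decay by gamma every round, so evidence
    older than roughly 1/(1 - gamma) rounds carries little weight.
    """

    def __init__(self, n_arms, gamma=0.95, alpha0=1.0, beta0=1.0):
        self.gamma = gamma  # 0.95 per this note (vs. 0.8 in [54])
        self.alpha0, self.beta0 = alpha0, beta0  # assumed Beta(1, 1) prior
        self.s = [0.0] * n_arms  # discounted success counts
        self.f = [0.0] * n_arms  # discounted failure counts

    def select(self):
        # Sample one plausible success rate per arm and play the best.
        draws = [random.betavariate(self.s[k] + self.alpha0,
                                    self.f[k] + self.beta0)
                 for k in range(len(self.s))]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # Decay every arm's evidence, then credit the played arm.
        self.s = [self.gamma * x for x in self.s]
        self.f = [self.gamma * x for x in self.f]
        self.s[arm] += reward
        self.f[arm] += 1.0 - reward
```

With \(\gamma = 0.95\) the effective memory is about 20 rounds, while \(\gamma = 0.8\) shortens it to about 5, which is consistent with the larger value working better on slowly drifting networks.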

References

  1. Abdallah, S., & Kaisers, M. (2016). Addressing environment non-stationarity by repeating Q-learning updates. The Journal of Machine Learning Research, 17(1), 1582–1612.

  2. Agrawal, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054–1078.

  3. Agrawal, S., & Goyal, N. (2012). Analysis of Thompson sampling for the multi-armed bandit problem. In COLT (pp. 39.1–39.26).

  4. Akakpo, N. (2008). Detecting change-points in a discrete distribution via model selection. arXiv preprint arXiv:0801.0970.

  5. Allesiardo, R., & Féraud, R. (2015). Exp3 with drift detection for the switching bandit problem. In IEEE international conference on data science and advanced analytics (DSAA) (pp. 1–7). IEEE.

  6. Audibert, J. Y., Munos, R., & Szepesvári, C. (2007). Tuning bandit algorithms in stochastic environments. In International conference on algorithmic learning theory (pp. 150–165). Springer.

  7. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256.

  8. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 48–77.

  9. Babaioff, M., Sharma, Y., & Slivkins, A. (2014). Characterizing truthful multi-armed bandit mechanisms. SIAM Journal on Computing, 43(1), 194–230.

  10. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  11. Baram, Y., El-Yaniv, R., & Luz, K. (2004). Online choice of active learning algorithms. Journal of Machine Learning Research, 5(Mar), 255–291.

  12. Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (monographs on statistics and applied probability) (Vol. 12). Berlin: Springer.

  13. Bertsimas, D., & Niño-Mora, J. (2000). Restless bandits, linear programming relaxations, and a primal–dual index heuristic. Operations Research, 48(1), 80–90.

  14. Bouneffouf, D., & Feraud, R. (2016). Multi-armed bandit problem with known trend. Neurocomputing, 205, 16–21.

  15. Bowling, M., & Veloso, M. (2001). Rational and convergent learning in stochastic games. In International joint conference on artificial intelligence (Vol. 17, pp. 1021–1026). Lawrence Erlbaum Associates Ltd.

  16. Burtini, G., Loeppky, J., & Lawrence, R. (2015). A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757.

  17. Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.

  18. Chapelle, O., & Li, L. (2011). An empirical evaluation of Thompson sampling. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 2249–2257).

  19. Donmez, P., Carbonell, J., & Schneider, J. (2010). A probabilistic framework to learn from multiple annotators with time-varying accuracy. In Proceedings of the 2010 SIAM international conference on data mining (pp. 826–837). SIAM.

  20. Donmez, P., Carbonell, J. G., & Bennett, P. N. (2007). Dual strategy active learning. In Machine learning: ECML 2007 (pp. 116–127). Springer.

  21. Donmez, P., Carbonell, J. G., & Schneider, J. (2009). Efficiently learning the accuracy of labeling sources for selective sampling. In Proceedings of the 15th ACM international conference on knowledge discovery and data mining (p. 259).

  22. Dragoni, N. (2006). Fault tolerant knowledge level inter-agent communication in open multi-agent systems. AI Communications, 19(4), 385–387.

  23. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4), 44.

  24. Garivier, A., & Moulines, E. (2008). On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415.

  25. Gittins, J. C., & Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In J. Gani (Ed.), Progress in statistics, European meeting of statisticians (Vol. 1, pp. 241–266).

  26. Gorner, J. M. (2011). Advisor networks and referrals for improved trust modelling in multi-agent systems. Master's thesis, University of Waterloo.

  27. Graepel, T., Candela, J. Q., Borchert, T., & Herbrich, R. (2010). Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 13–20).

  28. Granmo, O. C. (2010). Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton. International Journal of Intelligent Computing and Cybernetics, 3(2), 207–234.

  29. Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 3.

  30. Gupta, N., Granmo, O. C., & Agrawala, A. (2011). Thompson sampling for dynamic multi-armed bandits. In 10th International conference on machine learning and applications and workshops (ICMLA) (Vol. 1, pp. 484–489). IEEE.

  31. Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., & Sebag, M. (2006). Multi-armed bandit, dynamic environments and meta-bandits. In NIPS-2006 workshop, Online trading between exploration and exploitation. Whistler, Canada.

  32. Hasselt, H. V. (2010). Double Q-learning. In Advances in neural information processing systems (pp. 2613–2621).

  33. Holme, P., & Kim, B. J. (2002). Growing scale-free networks with tunable clustering. Physical Review E, 65(2), 026107.

  34. Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., & Tygar, J. (2011). Adversarial machine learning. In Proceedings of the 4th ACM workshop on security and artificial intelligence (pp. 43–58). ACM.

  35. Kaelbling, L. P. (1993). Learning in embedded systems. Cambridge: MIT Press.

  36. Kaisers, M., & Tuyls, K. (2010). Frequency adjusted multi-agent Q-learning. In Proceedings of the 9th international conference on autonomous agents and multiagent systems (Vol. 1, pp. 309–316). International Foundation for Autonomous Agents and Multiagent Systems.

  37. Kaufmann, E., Korda, N., & Munos, R. (2012). Thompson sampling: An asymptotically optimal finite-time analysis. In International conference on algorithmic learning theory (pp. 199–213). Springer.

  38. Kautz, H., Selman, B., & Milewski, A. (1996). Agent amplified communication. In Proceedings of the thirteenth national conference on artificial intelligence (AAAI-96) (pp. 3–9).

  39. KhudaBukhsh, A. R., & Carbonell, J. G. (2018). Expertise drift in referral networks. In Proceedings of the 17th international conference on autonomous agents and multiagent systems (pp. 425–433). International Foundation for Autonomous Agents and Multiagent Systems.

  40. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2016). Proactive-DIEL in evolving referral networks. In European conference on multi-agent systems (pp. 148–156). Springer.

  41. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2016). Proactive skill posting in referral networks. In Australasian joint conference on artificial intelligence (pp. 585–596). Springer.

  42. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2017). Incentive compatible proactive skill posting in referral networks. In European conference on multi-agent systems. Springer.

  43. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2017). Robust learning in expert networks: A comparative analysis. In International symposium on methodologies for intelligent systems (ISMIS) (pp. 292–301). Springer.

  44. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2018). Robust learning in expert networks: A comparative analysis. Journal of Intelligent Information Systems, 51(2), 207–234.

  45. KhudaBukhsh, A. R., Jansen, P. J., & Carbonell, J. G. (2016). Distributed learning in expert referral networks. In European conference on artificial intelligence (ECAI) (pp. 1620–1621).

  46. Lai, T. L. (2001). Sequential analysis. New York: Wiley Online Library.

  47. Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4–22.

  48. Langford, J., Strehl, A., & Wortman, J. (2008). Exploration scavenging. In Proceedings of the 25th international conference on Machine learning (pp. 528–535). ACM.

  49. Levine, N., Crammer, K., & Mannor, S. (2017). Rotting bandits. In Advances in neural information processing systems (pp. 3074–3083).

  50. Liu, K., & Zhao, Q. (2010). Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Transactions on Information Theory, 56(11), 5547–5567.

  51. Lu, X., Adams, N., & Kantas, N. (2017). On adaptive estimation for dynamic Bernoulli bandits. arXiv preprint arXiv:1712.03134.

  52. May, B. C., Korda, N., Lee, A., & Leslie, D. S. (2012). Optimistic Bayesian sampling in contextual-bandit problems. Journal of Machine Learning Research, 13(Jun), 2069–2106.

  53. Noda, I. (2009). Recursive adaptation of stepsize parameter for non-stationary environments. In ALA (pp. 74–90). Springer.

  54. Raj, V., & Kalyani, S. (2017). Taming non-stationary bandits: A Bayesian approach. arXiv preprint arXiv:1707.09727.

  55. Shivaswamy, P. K., & Joachims, T. (2012). Multi-armed bandit problems with history. In N. D. Lawrence & M. Girolami (Eds.), International Conference on Artificial Intelligence and Statistics (pp. 1046–1054).

  56. Silva, B. C. D., Basso, E. W., Bazzan, A., & Engel, P. M. (2006). Dealing with non-stationary environments using context detection. In Proceedings of the 23rd international conference on machine learning (pp. 217–224). ACM.

  57. Slivkins, A., & Upfal, E. (2008). Adapting to a changing environment: The Brownian restless bandits. In COLT (pp. 343–354).

  58. Tekin, C., & Liu, M. (2012). Online learning of rested and restless bandits. IEEE Transactions on Information Theory, 58(8), 5588–5611.

  59. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.

  60. Tsymbal, A. (2004). The problem of concept drift: Definitions and related work. Technical report, Computer Science Department, Trinity College Dublin.

  61. Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.

  62. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442.

  63. Weber, R. R., & Weiss, G. (1990). On an index policy for restless bandits. Journal of Applied Probability, 27(3), 637–648.

  64. Wei, W., Li, C. M., & Zhang, H. (2008). A switching criterion for intensification and diversification in local search for SAT. Journal on Satisfiability, Boolean Modeling and Computation, 4, 219–237.

  65. Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25(A), 287–298.

  66. Wiering, M., & Schmidhuber, J. (1998). Efficient model-based exploration. In Proceedings of the fifth international conference on simulation of adaptive behavior (SAB’98) (pp. 223–228).

  67. Yolum, P., & Singh, M. P. (2003). Dynamic communities in referral networks. Web Intelligence and Agent Systems, 1(2), 105–116.

  68. Yolum, P., & Singh, M. P. (2003). Emergent properties of referral systems. In Proceedings of the second international joint conference on autonomous agents and multiagent systems (pp. 592–599). ACM.

  69. Yu, B. (2002). Emergence and evolution of agent-based referral networks. Ph.D. thesis, North Carolina State University.

  70. Yu, B., Venkatraman, M., & Singh, M. P. (2003). An adaptive social network for information access: Theoretical and experimental results. Applied Artificial Intelligence, 17, 21–38.

  71. Yu, J. Y., & Mannor, S. (2009). Piecewise-stationary bandit problems with side observations. In Proceedings of the 26th annual international conference on machine learning (pp. 1177–1184). ACM.

Author information

Corresponding author

Correspondence to Ashiqur R. KhudaBukhsh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work appeared in [39]. That version contained an experimental bug caused by an inadvertent error in our random-sequence generation; we have fixed the error and re-designed Hybrid accordingly. The new design of Hybrid is more elegant and produces results qualitatively similar to those previously published. This version additionally contains a thorough robustness analysis considering topic-dependent drifts, expertise-level-dependent drifts, and combined topic-and-expertise drift. Extending our results to other Thompson Sampling variants, effectively combining Hybrid's design with Dynamic Thompson Sampling [30], Discounted Thompson Sampling [54], and Optimistic Thompson Sampling [52], is also new. Finally, we provide an extensive design-component analysis of Hybrid, presenting empirical evidence that no simpler design of Hybrid matches our current design's performance.

About this article

Cite this article

KhudaBukhsh, A.R., Carbonell, J.G. Expertise drift in referral networks. Auton Agent Multi-Agent Syst 33, 645–671 (2019). https://doi.org/10.1007/s10458-019-09419-9
