Expertise drift in referral networks

Autonomous Agents and Multi-Agent Systems

Abstract

Learning-to-refer is a challenge in expert referral networks: Active Learning helps experts (agents) estimate the topic-conditioned skills of the other experts they are connected to, so that problems the initial expert cannot solve can be referred to colleagues with more appropriate expertise. Recent research has investigated several reinforcement action-selection algorithms to assess the viability of this learning setting, both with uninformative priors and with partially available noisy priors, where experts may advertise a subset of their skills to their colleagues. Prior to this work, time-varying expertise drift (e.g., experts learning with experience) had not been considered, though it often arises in practice. This paper addresses the challenge of referral learning with time-varying expertise, proposing Hybrid, a novel combination of Thompson Sampling and Distributed Interval Estimation Learning with variance reset (DIEL\(_{reset}\)), the latter first proposed in this paper. In an extensive empirical evaluation considering both biased and unbiased drift, the proposed algorithm outperforms the previous state-of-the-art (DIEL) and other competitive algorithms, e.g., Thompson Sampling and Optimistic Thompson Sampling. We further show that our method is robust to topic-dependent and expertise-level-dependent drifts, and that the newly proposed DIEL\(_{reset}\) can be effectively combined with other Bayesian approaches, e.g., Optimistic Thompson Sampling, Dynamic Thompson Sampling, and Discounted Thompson Sampling, for improved performance.
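
To ground the terminology, the sketch below illustrates the two ingredients Hybrid draws on: DIEL's interval-estimation score and Thompson Sampling's posterior draw, together with the variance reset behind DIEL\(_{reset}\). This is a minimal sketch under an assumed Gaussian reward model; the class and function names are hypothetical, and the precise rule by which Hybrid combines the two criteria (and when resets fire) is specified in the paper itself.

```python
import math
import random


class ExpertEstimate:
    """Reward statistics one agent keeps per connected expert.

    Illustrative only: the names and the Gaussian reward model are
    assumptions, not the paper's exact formulation.
    """

    def __init__(self, prior_mean=0.5, prior_var=1.0):
        self.mean = prior_mean
        self.var = prior_var
        self.n = 1  # pseudo-count so interval widths stay defined

    def update(self, reward):
        # Welford-style incremental mean/variance update.
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.var += (delta * (reward - self.mean) - self.var) / self.n

    def diel_score(self, c=2.0):
        # (D)IEL: rank experts by mean plus c standard errors.
        return self.mean + c * math.sqrt(self.var / self.n)

    def thompson_draw(self):
        # Thompson Sampling: draw a plausible skill level from a
        # Gaussian posterior over the mean reward.
        return random.gauss(self.mean, math.sqrt(self.var / self.n))

    def reset_variance(self, prior_var=1.0):
        # Variance reset (the DIEL_reset idea): re-inflate uncertainty
        # so an expert whose skill may have drifted is re-explored.
        self.var = prior_var
        self.n = 1


def refer(colleagues, sample=False):
    """Pick the colleague with the highest score (DIEL) or draw (TS)."""
    if sample:
        return max(colleagues, key=lambda e: colleagues[e].thompson_draw())
    return max(colleagues, key=lambda e: colleagues[e].diel_score())
```

The reset is what matters under drift: without it, the standard error \(\sqrt{var/n}\) shrinks monotonically, so an expert whose skill later improves would rarely be re-sampled.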


Notes

  1. The data set can be downloaded from https://www.cs.cmu.edu/~akhudabu/referral-networks.html.

  2. [54] reports 0.8 as the optimal value of \(\gamma\) for slowly moving distributions. However, in our experiments we obtained better performance for both Discounted TS and Hybrid\(_{\texttt{Discounted TS}}\) when \(\gamma\) was set to 0.95. We did not perform extensive parameter tuning for the new TS variants and chose values that seemed reasonable; with such tuning it may be possible to squeeze a further performance boost out of Hybrid\(_{\texttt{Dynamic TS}}\), but our primary goal was to test Hybrid's design compatibility with other TS variants.
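
For context, \(\gamma\) here is the discount factor of Discounted TS [54], which geometrically decays accumulated evidence so that observations gathered before a drift fade away. Below is a minimal Bernoulli-reward sketch of that update; the prior pseudo-counts and the choice to decay all arms each round are illustrative assumptions, not the exact experimental configuration.

```python
import random


class DiscountedTS:
    """Discounted Thompson Sampling for Bernoulli rewards (after [54]).

    Success/failure counts decay by gamma every round, so evidence
    older than roughly 1/(1 - gamma) rounds carries little weight.
    """

    def __init__(self, n_arms, gamma=0.95, alpha0=1.0, beta0=1.0):
        self.gamma = gamma  # 0.95 per this note (vs. 0.8 in [54])
        self.alpha0, self.beta0 = alpha0, beta0  # assumed Beta(1, 1) prior
        self.s = [0.0] * n_arms  # discounted success counts
        self.f = [0.0] * n_arms  # discounted failure counts

    def select(self):
        # Sample one plausible success rate per arm and play the best.
        draws = [random.betavariate(self.s[k] + self.alpha0,
                                    self.f[k] + self.beta0)
                 for k in range(len(self.s))]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # Decay every arm's evidence, then credit the played arm.
        self.s = [self.gamma * x for x in self.s]
        self.f = [self.gamma * x for x in self.f]
        self.s[arm] += reward
        self.f[arm] += 1.0 - reward
```

With \(\gamma = 0.95\) the effective memory is about 20 rounds, while \(\gamma = 0.8\) shortens it to about 5, which is consistent with the larger value working better on slowly drifting networks.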

References

  1. Abdallah, S., & Kaisers, M. (2016). Addressing environment non-stationarity by repeating Q-learning updates. The Journal of Machine Learning Research, 17(1), 1582–1612.

  2. Agrawal, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054–1078.

  3. Agrawal, S., & Goyal, N. (2012). Analysis of Thompson sampling for the multi-armed bandit problem. In COLT (pp. 39.1–39.26).

  4. Akakpo, N. (2008). Detecting change-points in a discrete distribution via model selection. arXiv preprint arXiv:0801.0970.

  5. Allesiardo, R., & Féraud, R. (2015). Exp3 with drift detection for the switching bandit problem. In IEEE international conference on data science and advanced analytics (DSAA) (pp. 1–7). IEEE.

  6. Audibert, J. Y., Munos, R., & Szepesvári, C. (2007). Tuning bandit algorithms in stochastic environments. In International conference on algorithmic learning theory (pp. 150–165). Springer.

  7. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256.

  8. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 48–77.

  9. Babaioff, M., Sharma, Y., & Slivkins, A. (2014). Characterizing truthful multi-armed bandit mechanisms. SIAM Journal on Computing, 43(1), 194–230.

  10. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  11. Baram, Y., El-Yaniv, R., & Luz, K. (2004). Online choice of active learning algorithms. Journal of Machine Learning Research, 5(Mar), 255–291.

  12. Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (monographs on statistics and applied probability) (Vol. 12). Berlin: Springer.

  13. Bertsimas, D., & Niño-Mora, J. (2000). Restless bandits, linear programming relaxations, and a primal–dual index heuristic. Operations Research, 48(1), 80–90.

  14. Bouneffouf, D., & Feraud, R. (2016). Multi-armed bandit problem with known trend. Neurocomputing, 205, 16–21.

  15. Bowling, M., & Veloso, M. (2001). Rational and convergent learning in stochastic games. In International joint conference on artificial intelligence (Vol. 17, pp. 1021–1026). Lawrence Erlbaum Associates Ltd.

  16. Burtini, G., Loeppky, J., & Lawrence, R. (2015). A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757.

  17. Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.

  18. Chapelle, O., & Li, L. (2011). An empirical evaluation of Thompson sampling. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 2249–2257).

  19. Donmez, P., Carbonell, J., & Schneider, J. (2010). A probabilistic framework to learn from multiple annotators with time-varying accuracy. In Proceedings of the 2010 SIAM international conference on data mining (pp. 826–837). SIAM.

  20. Donmez, P., Carbonell, J. G., & Bennett, P. N. (2007). Dual strategy active learning. In Machine learning: ECML 2007 (pp. 116–127). Springer.

  21. Donmez, P., Carbonell, J. G., & Schneider, J. (2009). Efficiently learning the accuracy of labeling sources for selective sampling. In Proceedings of the 15th ACM international conference on knowledge discovery and data mining (p. 259).

  22. Dragoni, N. (2006). Fault tolerant knowledge level inter-agent communication in open multi-agent systems. AI Communications, 19(4), 385–387.

  23. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4), 44.

  24. Garivier, A., & Moulines, E. (2008). On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415.

  25. Gittins, J. C., & Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In J. Gani (Ed.), Progress in statistics, European meeting of statisticians (Vol. 1, pp. 241–266).

  26. Gorner, J. M. (2011). Advisor networks and referrals for improved trust modelling in multi-agent systems. Master's thesis, University of Waterloo.

  27. Graepel, T., Candela, J. Q., Borchert, T., & Herbrich, R. (2010). Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 13–20).

  28. Granmo, O. C. (2010). Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton. International Journal of Intelligent Computing and Cybernetics, 3(2), 207–234.

  29. Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 3.

  30. Gupta, N., Granmo, O. C., & Agrawala, A. (2011). Thompson sampling for dynamic multi-armed bandits. In 10th International conference on machine learning and applications and workshops (ICMLA) (Vol. 1, pp. 484–489). IEEE.

  31. Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., & Sebag, M. (2006). Multi-armed bandit, dynamic environments and meta-bandits. In NIPS-2006 workshop, Online trading between exploration and exploitation. Whistler, Canada.

  32. Hasselt, H. V. (2010). Double Q-learning. In Advances in neural information processing systems (pp. 2613–2621).

  33. Holme, P., & Kim, B. J. (2002). Growing scale-free networks with tunable clustering. Physical Review E, 65(2), 026107.

  34. Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., & Tygar, J. (2011). Adversarial machine learning. In Proceedings of the 4th ACM workshop on security and artificial intelligence (pp. 43–58). ACM.

  35. Kaelbling, L. P. (1993). Learning in embedded systems. Cambridge: MIT Press.

  36. Kaisers, M., & Tuyls, K. (2010). Frequency adjusted multi-agent Q-learning. In Proceedings of the 9th international conference on autonomous agents and multiagent systems (Vol. 1, pp. 309–316). International Foundation for Autonomous Agents and Multiagent Systems.

  37. Kaufmann, E., Korda, N., & Munos, R. (2012). Thompson sampling: An asymptotically optimal finite-time analysis. In International conference on algorithmic learning theory (pp. 199–213). Springer.

  38. Kautz, H., Selman, B., & Milewski, A. (1996). Agent amplified communication. In Proceedings of the thirteenth national conference on artificial intelligence (AAAI-96) (pp. 3–9).

  39. KhudaBukhsh, A. R., & Carbonell, J. G. (2018). Expertise drift in referral networks. In Proceedings of the 17th international conference on autonomous agents and multiagent systems (pp. 425–433). International Foundation for Autonomous Agents and Multiagent Systems.

  40. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2016). Proactive-DIEL in evolving referral networks. In European conference on multi-agent systems (pp. 148–156). Springer.

  41. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2016). Proactive skill posting in referral networks. In Australasian joint conference on artificial intelligence (pp. 585–596). Springer.

  42. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2017). Incentive compatible proactive skill posting in referral networks. In European conference on multi-agent systems. Springer.

  43. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2017). Robust learning in expert networks: A comparative analysis. In International symposium on methodologies for intelligent systems (ISMIS) (pp. 292–301). Springer.

  44. KhudaBukhsh, A. R., Carbonell, J. G., & Jansen, P. J. (2018). Robust learning in expert networks: A comparative analysis. Journal of Intelligent Information Systems, 51(2), 207–234.

  45. KhudaBukhsh, A. R., Jansen, P. J., & Carbonell, J. G. (2016). Distributed learning in expert referral networks. In European conference on artificial intelligence (ECAI) (pp. 1620–1621).

  46. Lai, T. L. (2001). Sequential analysis. New York: Wiley Online Library.

  47. Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4–22.

  48. Langford, J., Strehl, A., & Wortman, J. (2008). Exploration scavenging. In Proceedings of the 25th international conference on Machine learning (pp. 528–535). ACM.

  49. Levine, N., Crammer, K., & Mannor, S. (2017). Rotting bandits. In Advances in neural information processing systems (pp. 3074–3083).

  50. Liu, K., & Zhao, Q. (2010). Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Transactions on Information Theory, 56(11), 5547–5567.

  51. Lu, X., Adams, N., & Kantas, N. (2017). On adaptive estimation for dynamic Bernoulli bandits. arXiv preprint arXiv:1712.03134.

  52. May, B. C., Korda, N., Lee, A., & Leslie, D. S. (2012). Optimistic Bayesian sampling in contextual-bandit problems. Journal of Machine Learning Research, 13(Jun), 2069–2106.

  53. Noda, I. (2009). Recursive adaptation of stepsize parameter for non-stationary environments. In ALA (pp. 74–90). Springer.

  54. Raj, V., & Kalyani, S. (2017). Taming non-stationary bandits: A Bayesian approach. arXiv preprint arXiv:1707.09727.

  55. Shivaswamy, P. K., & Joachims, T. (2012). Multi-armed bandit problems with history. In N. D. Lawrence & M. Girolami (Eds.), International Conference on Artificial Intelligence and Statistics (pp. 1046–1054).

  56. Silva, B. C. D., Basso, E. W., Bazzan, A., & Engel, P. M. (2006). Dealing with non-stationary environments using context detection. In Proceedings of the 23rd international conference on machine learning (pp. 217–224). ACM.

  57. Slivkins, A., & Upfal, E. (2008). Adapting to a changing environment: The Brownian restless bandits. In COLT (pp. 343–354).

  58. Tekin, C., & Liu, M. (2012). Online learning of rested and restless bandits. IEEE Transactions on Information Theory, 58(8), 5588–5611.

  59. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.

  60. Tsymbal, A. (2004). The problem of concept drift: Definitions and related work. Technical report, Computer Science Department, Trinity College Dublin.

  61. Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.

  62. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442.

  63. Weber, R. R., & Weiss, G. (1990). On an index policy for restless bandits. Journal of Applied Probability, 27(3), 637–648.

  64. Wei, W., Li, C. M., & Zhang, H. (2008). A switching criterion for intensification and diversification in local search for SAT. Journal on Satisfiability, Boolean Modeling and Computation, 4, 219–237.

  65. Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25(A), 287–298.

  66. Wiering, M., & Schmidhuber, J. (1998). Efficient model-based exploration. In Proceedings of the fifth international conference on simulation of adaptive behavior (SAB’98) (pp. 223–228).

  67. Yolum, P., & Singh, M. P. (2003). Dynamic communities in referral networks. Web Intelligence and Agent Systems, 1(2), 105–116.

  68. Yolum, P., & Singh, M. P. (2003). Emergent properties of referral systems. In Proceedings of the second international joint conference on autonomous agents and multiagent systems (pp. 592–599). ACM.

  69. Yu, B. (2002). Emergence and evolution of agent-based referral networks. Ph.D. thesis, North Carolina State University.

  70. Yu, B., Venkatraman, M., & Singh, M. P. (2003). An adaptive social network for information access: Theoretical and experimental results. Applied Artificial Intelligence, 17, 21–38.

  71. Yu, J. Y., & Mannor, S. (2009). Piecewise-stationary bandit problems with side observations. In Proceedings of the 26th annual international conference on machine learning (pp. 1177–1184). ACM.

Author information

Corresponding author

Correspondence to Ashiqur R. KhudaBukhsh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work appeared in [39]. That version contained an experimental bug caused by an inadvertent error in our random-sequence generation; we have fixed the error and re-designed Hybrid accordingly. The new design of Hybrid is more elegant and produces results qualitatively similar to those previously published. This version additionally contains a thorough robustness analysis considering topic-dependent drifts, expertise-level-dependent drifts, and combined topic-and-expertise drift. Extending our results to other Thompson Sampling variants, effectively combining Hybrid's design with Dynamic Thompson Sampling [30], Discounted Thompson Sampling [54], and Optimistic Thompson Sampling [52], is also new. Finally, we provide an extensive design-component analysis of Hybrid, presenting empirical evidence that no simpler design of Hybrid matches our current design's performance.

About this article

Cite this article

KhudaBukhsh, A.R., Carbonell, J.G. Expertise drift in referral networks. Auton Agent Multi-Agent Syst 33, 645–671 (2019). https://doi.org/10.1007/s10458-019-09419-9
