
Incentive Compatible Proactive Skill Posting in Referral Networks

  • Ashiqur R. KhudaBukhsh
  • Jaime G. Carbonell
  • Peter J. Jansen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10767)

Abstract

Learning to refer in a network of experts (agents) consists of distributed estimation of other experts’ topic-conditioned skills, so that problem instances too difficult for the referring agent can be passed to the colleague most likely to solve them. This paper focuses on the cold-start case, in which experts proactively post a subset of their top skills to connected agents; as the results show, such posting improves overall network performance and, in particular, early-learning-phase behavior. The method surpasses the previous state of the art, proactive-DIEL, by introducing a new mechanism that penalizes experts who misreport their skills, and it extends the technique to other distributed learning algorithms: proactive-\(\epsilon\)-Greedy and proactive-Q-Learning. The proposed mechanism discourages strategic lying more strongly, both in the limit and in finite-horizon empirical analysis, and is shown to be robust to noisy self-skill estimates and to evolving networks.
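The abstract describes the mechanism only at a high level. The sketch below is a minimal, illustrative take on how a proactive-\(\epsilon\)-Greedy referral policy could seed its skill estimates from proactively posted skills (addressing the cold start) and penalize colleagues whose observed performance falls short of their posted claims. The class name `EpsilonGreedyReferrer`, the incremental-mean update, and the proportional penalty rule are assumptions for illustration, not the paper's exact mechanism.

```python
import random
from collections import defaultdict

class EpsilonGreedyReferrer:
    """Illustrative proactive-epsilon-Greedy referral policy (hypothetical sketch).

    Posted skills serve as cold-start priors on each colleague's skill for a
    given topic; estimates that later fall below the posted claim are
    penalized, discouraging inflated skill posting.
    """

    def __init__(self, colleagues, posted_skills, epsilon=0.1, penalty=0.5):
        self.epsilon = epsilon          # exploration probability
        self.penalty = penalty          # strength of the misreporting penalty (assumed form)
        self.colleagues = list(colleagues)
        self.posted = dict(posted_skills)
        # Seed estimates with proactively posted skills (cold-start priors).
        self.estimate = {c: self.posted.get(c, 0.0) for c in self.colleagues}
        self.counts = defaultdict(int)

    def choose(self):
        # With probability epsilon, explore a random colleague;
        # otherwise refer to the colleague with the highest estimated skill.
        if random.random() < self.epsilon:
            return random.choice(self.colleagues)
        return max(self.colleagues, key=lambda c: self.estimate[c])

    def update(self, colleague, reward):
        # Incremental-mean update of the colleague's estimated skill
        # from the observed outcome (e.g., 1 if the referred instance was solved).
        self.counts[colleague] += 1
        n = self.counts[colleague]
        est = self.estimate[colleague] + (reward - self.estimate[colleague]) / n
        # Penalize colleagues whose observed skill lags their posted claim:
        # the estimate is pushed further down in proportion to the shortfall.
        if colleague in self.posted and est < self.posted[colleague]:
            est -= self.penalty * (self.posted[colleague] - est)
        self.estimate[colleague] = est
```

A typical use would construct one such referrer per topic, call `choose()` whenever an instance exceeds the agent's own skill, and call `update()` with the observed outcome; the penalty term makes over-posting unprofitable in expectation, which is the incentive-compatibility idea the paper analyzes.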

Keywords

Active learning · Referral networks · Proactive skill posting


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Ashiqur R. KhudaBukhsh (1)
  • Jaime G. Carbonell (1)
  • Peter J. Jansen (1)
  1. Carnegie Mellon University, Pittsburgh, USA
