Machine Learning, Volume 106, Issue 9–10, pp 1747–1770

Group online adaptive learning

  • Alon Zweig
  • Gal Chechik
Article
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2017 Journal Track

Abstract

Sharing information among multiple learning agents can accelerate learning. It can be particularly useful when learners operate in continuously changing environments, because a learner can benefit from the previous experience of another learner to adapt to its new environment. Such group-adaptive learning has numerous applications, from predicting financial time-series, through content recommendation systems, to visual understanding for adaptive autonomous agents. Here we address the problem in the context of online adaptive learning. We formally define the learning setting of Group Online Adaptive Learning and derive an algorithm named Shared Online Adaptive Learning (SOAL) to address it. SOAL avoids explicitly modeling changes or their dynamics, and instead shares information continuously. The key idea is that learners share a common small pool of experts, which they can use in a weighted adaptive way. We define group adaptive regret and prove that SOAL maintains the known bounds on adaptive regret obtained for single adaptive learners. Furthermore, it adapts quickly when learning tasks are related to each other. We demonstrate the benefits of the approach in two domains: vision and text. First, in the visual domain, we study a visual navigation task where a robot learns to navigate based on outdoor video scenes. We show how navigation improves when knowledge from other robots in related scenes is available. Second, in the text domain, we create a new dataset for the task of assigning submitted papers to relevant editors. This is inherently an adaptive learning task due to the dynamic nature of research fields evolving over time. We show how learning to assign editors improves when knowledge from other editors is available. Together, these results demonstrate the benefits of sharing information across learners in concurrently changing environments.
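The key idea in the abstract — multiple learners sharing a common small pool of experts, each weighting the experts adaptively — can be illustrated with a minimal sketch. This is not the paper's SOAL algorithm; the `SharedExpertPool` and `Learner` classes and the multiplicative-weights (Hedge-style) update below are simplified assumptions for illustration only.

```python
import math

class SharedExpertPool:
    """A common pool of experts shared by all learners.

    Each expert is a prediction function; here they are fixed
    constant predictors for simplicity.
    """
    def __init__(self, experts):
        self.experts = experts  # list of callables: x -> prediction

class Learner:
    """One learner keeping its own weights over the shared pool,
    updated with a multiplicative-weights (Hedge-style) rule."""
    def __init__(self, pool, eta=0.5):
        self.pool = pool
        self.eta = eta
        self.w = [1.0] * len(pool.experts)

    def predict(self, x):
        # Weighted average of the shared experts' predictions.
        total = sum(self.w)
        return sum(wi / total * e(x)
                   for wi, e in zip(self.w, self.pool.experts))

    def update(self, x, y):
        # Downweight each expert by its squared loss (clipped to [0, 1]).
        for i, e in enumerate(self.pool.experts):
            loss = min(1.0, (e(x) - y) ** 2)
            self.w[i] *= math.exp(-self.eta * loss)

# Two learners share one pool but face different target streams;
# each adapts its own mixture weights over the same experts.
pool = SharedExpertPool([lambda x: 0.0, lambda x: 1.0])
a, b = Learner(pool), Learner(pool)
for t in range(50):
    a.update(None, 1.0)   # learner A's stream keeps target 1.0
    b.update(None, 0.0)   # learner B's stream keeps target 0.0
```

After training, each learner's prediction concentrates on the expert that fits its own stream, while both keep access to the full shared pool — so if a learner's environment shifts toward the other target, the matching expert is already available.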

Keywords

Multi-task learning · Knowledge transfer · Adaptive learning · Online learning · Domain adaptation


Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Qylur Intelligent Systems Inc., Palo Alto, USA
  2. Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel
  3. Google Research, Mountain View, USA