Abstract
Current machine learning techniques face well-known complaints: they require huge amounts of training data and proficient training skills, struggle with continual learning, risk catastrophic forgetting, and may leak data privacy or proprietary information. Most research efforts address one of these issues in isolation, paying less attention to the fact that they are entangled in practice. The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision, has not yet resolved these issues, while itself becoming a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch by reusing small models, possibly for purposes beyond their original design. The key ingredient is the specification, which enables a trained model to be adequately identified and reused according to the requirements of future users who know nothing about the model in advance.
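To make the specification idea concrete, the following is a minimal, illustrative sketch, not the paper's implementation, of how a market of learnwares might identify a helpful model for a new user: each learnware carries a small sample set summarizing its training distribution (in the spirit of kernel-mean-embedding specifications), and the market returns the learnware whose summary is closest in maximum mean discrepancy to the user's own data. All names here (Learnware, identify_learnware, gamma) are hypothetical.

```python
# Hypothetical sketch of specification-based learnware identification.
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between the empirical kernel mean embeddings of X and Y."""
    return (gaussian_kernel(X, X, gamma).mean()
            + gaussian_kernel(Y, Y, gamma).mean()
            - 2 * gaussian_kernel(X, Y, gamma).mean())

class Learnware:
    """A trained model paired with a small specification sample set."""
    def __init__(self, name, model, spec_samples):
        self.name = name
        self.model = model        # any object exposing .predict(X)
        self.spec = spec_samples  # small array summarizing the training data

def identify_learnware(market, user_data, gamma=1.0):
    """Return the learnware whose specification best matches the user's data."""
    return min(market, key=lambda lw: mmd2(user_data, lw.spec, gamma))
```

In this simplified view, the user never sees the learnwares' original training data; matching is done purely against the compact specifications, which is the point of letting users reuse models they know nothing about in advance.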
Acknowledgements This work was supported by the National Natural Science Foundation of China (Grant No. 62250069).