Learnware: small models do big

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

There are well-known complaints about current machine learning techniques: they require huge amounts of training data and proficient training skills, continual learning is difficult, catastrophic forgetting is a risk, and data privacy or proprietary information may leak. Most research efforts focus on one of these issues in isolation, paying less attention to the fact that the issues are entangled in practice. The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues, while becoming a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which aims to free users from building machine learning models from scratch, in the hope of reusing small models for purposes even beyond those they were originally built for. Its key ingredient is the specification, which enables a trained model to be adequately identified and reused according to the requirements of future users who know nothing about the model in advance.
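
To make the specification idea concrete, here is a minimal illustrative sketch, not the authors' implementation: it assumes each developer summarizes its training data with a small data sketch interpreted as an (approximate) kernel mean embedding, and the market recommends the model whose summary is closest, in maximum mean discrepancy (MMD), to a summary supplied by the user. The function names (rbf_kernel, mmd_squared, identify_learnware), the Gaussian kernel, and the matching rule are assumptions made for illustration only.

```python
# Illustrative sketch only, under the assumptions stated above; not the
# authors' method. Each "specification" here is a small sample sketch whose
# empirical kernel mean embedding stands in for the model's training data
# distribution; the market picks the model with the smallest MMD to the
# user's sketch.

import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq)


def mmd_squared(X, Y, gamma=1.0):
    """Squared MMD between the empirical kernel mean embeddings of X and Y."""
    return (
        rbf_kernel(X, X, gamma).mean()
        - 2.0 * rbf_kernel(X, Y, gamma).mean()
        + rbf_kernel(Y, Y, gamma).mean()
    )


def identify_learnware(user_sample, specifications, gamma=1.0):
    """Return the index of the specification closest to the user's sample."""
    dists = [mmd_squared(user_sample, spec, gamma) for spec in specifications]
    return int(np.argmin(dists))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical developers, each uploading a small sketch of their data.
    spec_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
    spec_b = rng.normal(loc=3.0, scale=1.0, size=(50, 5))
    # A user's unlabeled sample, drawn from a task close to developer B's.
    user = rng.normal(loc=2.8, scale=1.0, size=(30, 5))
    print("recommended model:", identify_learnware(user, [spec_a, spec_b]))
```

In a real market the specifications would be compact reduced sets rather than raw training data, so neither the developer's nor the user's raw data needs to be disclosed, which is consistent with the privacy concern raised in the abstract.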

Acknowledgements This work was supported by National Natural Science Foundation of China (Grant No. 62250069).

Author information

Corresponding author

Correspondence to Zhi-Hua Zhou.

About this article

Cite this article

Zhou, ZH., Tan, ZH. Learnware: small models do big. Sci. China Inf. Sci. 67, 112102 (2024). https://doi.org/10.1007/s11432-023-3823-6
