
A unified pruning framework for vision transformers

  • Letter
  • Published in Science China Information Sciences

Conclusion

In this study, we proposed a novel method, UP-ViTs, to prune ViTs in a unified manner. Our framework can prune all components in a ViT and its variants, maintains the model's structure, and generalizes well to downstream tasks. UP-ViTs achieve state-of-the-art results when pruning various ViT backbones. Moreover, we studied the transfer ability of the compressed models and found that UP-ViTs also outperform the original ViTs. We further extended our method to NLP tasks and obtained more efficient transformer models. Please refer to the appendix for more details.
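The unified pruning idea above — removing channels from a transformer component while keeping the surrounding structure intact — can be sketched as follows. This is an illustrative example only: the channel-importance criterion (L2 norm of each hidden channel's weights) and the function names are assumptions for exposition, not the scoring rule actually used by UP-ViTs.

```python
import numpy as np

def prune_ffn(w1, b1, w2, keep_ratio=0.5):
    """Structured pruning of a transformer FFN's hidden channels.

    w1: (d_model, d_hidden) first projection weight
    b1: (d_hidden,)         first projection bias
    w2: (d_hidden, d_model) second projection weight

    Channel importance is taken here as the L2 norm of each hidden
    channel's weights across both projections (a common magnitude-based
    criterion, used only for illustration). Pruning slices the same
    channels from w1's columns, b1, and w2's rows, so the block keeps a
    valid, dense structure after pruning.
    """
    score = np.linalg.norm(w1, axis=0) + np.linalg.norm(w2, axis=1)
    k = max(1, int(round(keep_ratio * w1.shape[1])))
    keep = np.sort(np.argsort(score)[-k:])  # top-k channels, original order
    return w1[:, keep], b1[keep], w2[keep, :]
```

Because every weight touching a removed channel is sliced consistently, the pruned FFN is a drop-in replacement with a smaller hidden dimension — no masking or sparse kernels are needed at inference time.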



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62276123, 61921006).

Author information


Corresponding author

Correspondence to Jianxin Wu.

Additional information

Supporting information

Appendixes A–E. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.

Supplementary File


About this article


Cite this article

Yu, H., Wu, J. A unified pruning framework for vision transformers. Sci. China Inf. Sci. 66, 179101 (2023). https://doi.org/10.1007/s11432-022-3646-6
