
Learning Bilevel Sparse Regularized Neural Network

  • Conference paper
Artificial Intelligence (CICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13070)

Abstract

Sparse regularization has attracted considerable attention in the machine learning community in recent years as a powerful and widely used strategy for high-dimensional learning problems. However, deep neural networks (DNNs) contain many redundant weights and unnecessary connections, and little work has been devoted to regularizer-based methods for DNN sparsification. We therefore aim to develop a sparse regularizer that avoids adding excessive computational complexity to DNNs. In this paper, we find that learning the sparse regularizer corresponds to learning an activation function. The regularizer is then learned via bilevel optimization, which requires fewer function evaluations. Moreover, we design a novel learning method, named the bilevel sparse regularized neural network (BSRL), to learn the regularization parameters based on prior knowledge of the system. Experimental results on standard benchmark datasets show that the proposed BSRL framework outperforms models with state-of-the-art sparse regularizers.

This work was supported by JD.com, Beijing, China.
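
The abstract compresses the method into a few sentences, so the following is a minimal sketch of the general bilevel idea it describes, under stated assumptions rather than the paper's actual BSRL formulation: the sparse regularizer is taken to be an L1 penalty with a learnable strength lam, the lower level runs a few unrolled proximal-gradient (ISTA) steps whose soft-thresholding map plays the role of the learned "activation function" the abstract alludes to, and the upper level updates lam by backpropagating a validation loss through the unrolled steps.

# Minimal sketch of bilevel sparse-regularizer learning (NOT the authors'
# exact BSRL algorithm). Assumptions for illustration only: L1 penalty with
# learnable strength `lam`, unrolled ISTA inner steps, validation-loss outer
# objective, and toy random data in place of a real benchmark.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy regression data standing in for a real train/validation split.
X_tr, y_tr = torch.randn(256, 20), torch.randn(256, 1)
X_va, y_va = torch.randn(64, 20), torch.randn(64, 1)

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: an elementwise shrinkage map,
    # i.e. the "activation function" induced by the L1 regularizer.
    return torch.sign(w) * torch.clamp(w.abs() - t, min=0.0)

W = torch.zeros(20, 1, requires_grad=True)    # lower-level weights
log_lam = torch.zeros(1, requires_grad=True)  # upper-level parameter
outer_opt = torch.optim.Adam([log_lam], lr=1e-2)
eta = 0.01                                    # inner step size

for outer_step in range(100):
    lam = log_lam.exp()
    w = W.clone()
    # Lower level: a few unrolled proximal-gradient (ISTA) steps on the
    # training loss; create_graph=True keeps them differentiable in lam.
    for _ in range(5):
        g, = torch.autograd.grad(F.mse_loss(X_tr @ w, y_tr), w,
                                 create_graph=True)
        w = soft_threshold(w - eta * g, eta * lam)
    # Upper level: update the regularization strength on validation data.
    val_loss = F.mse_loss(X_va @ w, y_va)
    outer_opt.zero_grad()
    val_loss.backward()
    outer_opt.step()
    # Warm-start the next inner solve from the current weights.
    W = w.detach().requires_grad_(True)

print(f"learned lambda: {log_lam.exp().item():.4f}, "
      f"weight sparsity: {(W.abs() < 1e-6).float().mean().item():.2f}")

Unrolling is used here only to keep the sketch short and differentiable end to end; implicit differentiation is a common alternative when the goal, as in the abstract, is to reduce the number of function evaluations.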

Author information

Corresponding author

Correspondence to Xin Xu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, X., Zhang, L., Kong, Q. (2021). Learning Bilevel Sparse Regularized Neural Network. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13070. Springer, Cham. https://doi.org/10.1007/978-3-030-93049-3_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-93049-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93048-6

  • Online ISBN: 978-3-030-93049-3

  • eBook Packages: Computer Science, Computer Science (R0)
