Abstract
Sparse regularization has attracted considerable attention in the machine learning community in recent years as a powerful and widely used strategy for high-dimensional learning problems. However, deep neural networks (DNNs) contain many redundant weights and unnecessary connections, and little work has been devoted to regularizer-based methods for DNN sparsification. We therefore aim to develop a sparse regularizer that avoids excessive computational complexity in DNNs. In this paper, we show that learning a sparse regularizer corresponds to learning an activation function. The regularizer is then learned via bilevel optimization, which requires fewer function evaluations. Building on this, we design a novel learning method, named bilevel sparse regularized neural network (BSRL), to learn the regularization parameters based on prior knowledge of the system. Experimental results on standard benchmark datasets show that the proposed BSRL framework outperforms models with other state-of-the-art sparse regularizers.
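To illustrate the basic mechanism the abstract builds on, the sketch below shows how a fixed L1 (lasso) penalty zeroes out redundant weights of a linear model via proximal gradient descent (ISTA). This is a generic illustration only, not the paper's BSRL method; all names and the choice of solver are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 features are informative
y = X @ true_w + 0.1 * rng.normal(size=200)

def train(lam, steps=2000, lr=0.01):
    """Least squares with an L1 penalty, solved by proximal gradient (ISTA)."""
    w = np.zeros(20)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)      # gradient of the data-fit term
        w = w - lr * grad
        # soft-thresholding: the proximal operator of the L1 norm
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_dense = train(lam=0.0)   # no regularization: all weights stay nonzero
w_sparse = train(lam=0.5)  # L1 penalty: irrelevant weights driven to exact zero
print("zeros without L1:", int((np.abs(w_dense) < 1e-6).sum()))
print("zeros with L1:   ", int((np.abs(w_sparse) < 1e-6).sum()))
```

A learned regularizer, as in BSRL, would replace the fixed soft-thresholding step with a parameterized shrinkage function (equivalently, an activation function) whose parameters are tuned by an outer bilevel optimization loop.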
This work was supported by JD.com, Beijing, China.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, X., Zhang, L., Kong, Q. (2021). Learning Bilevel Sparse Regularized Neural Network. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science(), vol 13070. Springer, Cham. https://doi.org/10.1007/978-3-030-93049-3_16
Print ISBN: 978-3-030-93048-6
Online ISBN: 978-3-030-93049-3