Abstract
Sparse regularization has attracted considerable attention in the machine learning community in recent years as a powerful and widely used strategy for high-dimensional learning problems. However, deep neural networks (DNNs) contain many redundant weights and unnecessary connections, and little work has been devoted to regularizer-based methods for DNN sparsification. We therefore aim to develop a sparse regularizer that avoids excessive computational complexity in DNNs. In this paper, we show that learning a sparse regularizer corresponds to learning an activation function. The regularizer is then learned via bilevel optimization, which requires fewer function evaluations. Building on this, we design a novel learning method, named bilevel sparse regularized neural network (BSRL), to learn the regularization parameters based on prior knowledge of the system. Experimental results on standard benchmark datasets show that the proposed BSRL framework outperforms models with other state-of-the-art sparse regularizers.
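To illustrate the basic mechanism the abstract builds on, the sketch below shows how a fixed L1 (lasso) penalty zeroes out redundant weights of a linear model via proximal gradient descent (ISTA). This is a generic illustration only, not the paper's BSRL method; all names and the choice of solver are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 features are informative
y = X @ true_w + 0.1 * rng.normal(size=200)

def train(lam, steps=2000, lr=0.01):
    """Least squares with an L1 penalty, solved by proximal gradient (ISTA)."""
    w = np.zeros(20)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)      # gradient of the data-fit term
        w = w - lr * grad
        # soft-thresholding: the proximal operator of the L1 norm
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_dense = train(lam=0.0)   # no regularization: all weights stay nonzero
w_sparse = train(lam=0.5)  # L1 penalty: irrelevant weights driven to exact zero
print("zeros without L1:", int((np.abs(w_dense) < 1e-6).sum()))
print("zeros with L1:   ", int((np.abs(w_sparse) < 1e-6).sum()))
```

A learned regularizer, as in BSRL, would replace the fixed soft-thresholding step with a parameterized shrinkage function (equivalently, an activation function) whose parameters are tuned by an outer bilevel optimization loop.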
This work was supported by JD.com, Beijing, China.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, X., Zhang, L., Kong, Q. (2021). Learning Bilevel Sparse Regularized Neural Network. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science(), vol 13070. Springer, Cham. https://doi.org/10.1007/978-3-030-93049-3_16
Print ISBN: 978-3-030-93048-6
Online ISBN: 978-3-030-93049-3