Abstract
Multi-objective bi-level optimization (MOBLO) addresses nested multi-objective optimization problems that arise in a wide range of applications. However, the combination of multiple objectives with a hierarchical bi-level structure makes MOBLO notably difficult to solve. Gradient-based MOBLO algorithms have recently grown in popularity because they effectively solve crucial machine learning problems such as meta-learning, neural architecture search, and reinforcement learning. Unfortunately, these algorithms rely on solving a sequence of approximation subproblems to high accuracy, which incurs unfavorable time and memory complexity and lowers their numerical efficiency. To address this issue, we propose a gradient-based algorithm for MOBLO, called gMOBA, which has fewer hyperparameters to tune and is therefore both simple and efficient. We also establish its theoretical validity by showing that it achieves the desirable Pareto stationarity. Numerical experiments confirm the practical efficiency of the proposed method and verify the theoretical results. To accelerate the convergence of gMOBA, we introduce an L2O (learning to optimize) neural network, called L2O-gMOBA, implemented as the initialization phase of the gMOBA algorithm. Comparative numerical results illustrate the performance of L2O-gMOBA.
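To make the setting concrete, below is a minimal PyTorch sketch of the class of methods the abstract describes, not the authors' gMOBA itself: each iteration takes one gradient step on the lower-level variable, differentiates through that step to obtain approximate hypergradients of each upper-level objective, and aggregates them with an MGDA-style min-norm combination into a common descent direction. The toy objectives F1 and F2, the lower-level loss g, and the step sizes are all illustrative assumptions.

```python
# A minimal sketch of gradient-based MOBLO (not the authors' gMOBA implementation).
# Upper level: minimize (F1(x, y*(x)), F2(x, y*(x))); lower level: y*(x) = argmin_y g(x, y).
import torch

def g(x, y):   # lower-level objective (strongly convex in y for this toy example)
    return 0.5 * ((y - x) ** 2).sum()

def F1(x, y):  # first upper-level objective
    return ((x - 1.0) ** 2).sum() + (y ** 2).sum()

def F2(x, y):  # second upper-level objective
    return ((x + 1.0) ** 2).sum() + ((y - 1.0) ** 2).sum()

x = torch.zeros(2, requires_grad=True)
y = torch.zeros(2)
alpha, beta = 0.5, 0.1  # inner (lower-level) and outer (upper-level) step sizes

for _ in range(200):
    # One unrolled inner gradient step; create_graph=True lets hypergradients
    # of the upper-level objectives flow through y_plus back to x.
    y_req = y.detach().requires_grad_(True)
    grad_y = torch.autograd.grad(g(x, y_req), y_req, create_graph=True)[0]
    y_plus = y_req - alpha * grad_y

    # Approximate hypergradient of each upper-level objective w.r.t. x.
    g1 = torch.autograd.grad(F1(x, y_plus), x, retain_graph=True)[0]
    g2 = torch.autograd.grad(F2(x, y_plus), x)[0]

    # MGDA for two objectives: the min-norm convex combination
    # min_{lam in [0,1]} ||lam*g1 + (1-lam)*g2||^2 has a closed form.
    diff = g2 - g1
    lam = torch.clamp((g2 @ diff) / (diff @ diff + 1e-12), 0.0, 1.0)
    d = lam * g1 + (1.0 - lam) * g2  # common descent direction; d = 0 at Pareto stationarity

    with torch.no_grad():
        x -= beta * d
    y = y_plus.detach()

print("x:", x.detach(), "y:", y)
```

In this two-objective case the min-norm weight has a closed form; with more objectives the aggregation step becomes a small quadratic program over the simplex, which is the computational pattern that single-loop methods of this kind aim to keep cheap.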
Acknowledgements
Yang’s work was supported by the Major Program of National Natural Science Foundation of China (Grant Nos. 11991020 and 11991024). Yao’s work was supported by National Natural Science Foundation of China (Grant No. 12371305). Zhang’s work was supported by National Natural Science Foundation of China (Grant No. 12222106), Guangdong Basic and Applied Basic Research Foundation (Grant No. 2022B1515020082) and Shenzhen Science and Technology Program (Grant No. RCYX20200714114700072).
Cite this article
Yang, X., Yao, W., Yin, H. et al. Gradient-based algorithms for multi-objective bi-level optimization. Sci. China Math. 67, 1419–1438 (2024). https://doi.org/10.1007/s11425-023-2302-9