
Learning to project in a criterion space search algorithm: an application to multi-objective binary linear programming

  • Original Paper
  • Published in Optimization Letters

Abstract

In this paper, we investigate the possibility of improving the performance of multi-objective optimization solution approaches using machine learning techniques. Specifically, we focus on multi-objective binary linear programs and employ one of the most effective and recently developed criterion space search algorithms, the so-called KSA, in our study. This algorithm computes all nondominated points of a problem with p objectives by searching on a projected criterion space, i.e., a \((p-1)\)-dimensional criterion space. We present an effective and fast learning approach to identify on which projected space the KSA should work. We also present several generic features/variables that can be used in machine learning techniques for identifying the best projected space. Finally, we present an effective bi-objective optimization-based heuristic for selecting the subset of the features to overcome the issue of overfitting in learning. Through an extensive computational study over 2000 instances of tri-objective knapsack and assignment problems, we demonstrate that an improvement of up to 18% in time can be achieved by the proposed learning method compared to a random selection of the projected space. To show that the performance of our algorithm is not limited to instances of knapsack and assignment problems with three objective functions, we also report similar performance results when the proposed learning approach is used for solving random binary integer program instances with four objective functions.

References

  1. Alvarez, A.M., Louveaux, Q., Wehenkel, L.: A machine learning-based approximation of strong branching. INFORMS J. Comput. 29(1), 185–195 (2017)

  2. Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)

  3. Boland, N., Charkhgard, H., Savelsbergh, M.: A criterion space search algorithm for biobjective integer programming: the balanced box method. INFORMS J. Comput. 27(4), 735–754 (2015)

  4. Boland, N., Charkhgard, H., Savelsbergh, M.: The L-shape search method for triobjective integer programming. Math. Program. Comput. 8(2), 217–251 (2016)

  5. Boland, N., Charkhgard, H., Savelsbergh, M.: A new method for optimizing a linear function over the efficient set of a multiobjective integer program. Eur. J. Oper. Res. 260(3), 904–919 (2017)

  6. Boland, N., Charkhgard, H., Savelsbergh, M.: The quadrant shrinking method: a simple and efficient algorithm for solving tri-objective integer programs. Eur. J. Oper. Res. 260(3), 873–885 (2017)

  7. Boland, N., Charkhgard, H., Savelsbergh, M.: Preprocessing and cut generation techniques for multi-objective binary programming. Eur. J. Oper. Res. 274(3), 858–875 (2019)

  8. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer (2010)

  9. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  10. Charkhgard, H., Eshragh, A.: A new approach to select the best subset of predictors in linear regression modeling: bi-objective mixed integer linear programming. ANZIAM J. 61(1), 64–75 (2019)

  11. Charkhgard, H., Talebian, M., Savelsbergh, M.: Nondominated nash points: application of biobjective mixed integer programming. 4OR - A Q. J. Oper. Res. 16, 151–171 (2018)

  12. Charkhgard, H., Takalloo, M., Haider, Z.: Bi-objective autonomous vehicle repositioning problem with travel time uncertainty. 4OR - A Q. J. Oper. Res. 18(4), 477–505 (2020)

  13. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2(Dec), 265–292 (2001)

  14. Dächert, K., Klamroth, K., Lacour, R., Vanderpooten, D.: Efficient computation of the search region in multi-objective optimization. Eur. J. Oper. Res. 260(3), 841–855 (2017)

  15. Dai, R., Charkhgard, H.: A two-stage approach for bi-objective integer linear programming. Oper. Res. Lett. 46(1), 81–87 (2018)

  16. He, H., Daume III, H., Eisner, J.M.: Learning to search in branch and bound algorithms. In: Advances in Neural Information Processing Systems, pp. 3293–3301 (2014)

  17. Hutter, F., Hoos, H.H., Stützle, T.: Automatic algorithm configuration based on local search. AAAI 7, 1152–1157 (2007)

  18. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Automated configuration of mixed integer programming solvers. In: International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pp. 186–202. Springer (2010)

  19. Khalil, E.B., Le Bodic, P., Song, L., Nemhauser, G.L., Dilkina, B.N.: Learning to branch in mixed integer programming. In: AAAI, pp. 724–731 (2016)

  20. Khalil, E.B., Dilkina, B., Nemhauser, G.L., Ahmed, S., Shao, Y.: Learning to run heuristics in tree search. In: Proceedings of the International Joint Conference on Artificial Intelligence. AAAI Press, Melbourne, Australia (2017)

  21. Kirlik, G., Sayın, S.: A new algorithm for generating all nondominated solutions of multiobjective discrete optimization problems. Eur. J. Oper. Res. 232(3), 479–488 (2014)

  22. Le, Q.V., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.Y.: On optimization methods for deep learning. In: Proceedings of the 28th International Conference on International Conference on Machine Learning, pp. 265–272. Omnipress (2011)

  23. Lindauer, M., Hoos, H.H., Hutter, F., Schaub, T.: Autofolio: an automatically configured algorithm selector. J. Artif. Intell. Res. 53, 745–778 (2015)

  24. Lokman, B., Köksalan, M.: Finding all nondominated points of multi-objective integer programs. J. Global Optim. 57(2), 347–365 (2013)

  25. Malitsky, Y., Sabharwal, A., Samulowitz, H., Sellmann, M.: Non-model-based algorithm portfolios for sat. In: International Conference on Theory and Applications of Satisfiability Testing, pp. 369–370. Springer (2011)

  26. Nikolić, M., Marić, F., Janičić, P.: Instance-based selection of policies for sat solvers. In: International Conference on Theory and Applications of Satisfiability Testing, pp. 326–340. Springer (2009)

  27. Nikolić, M., Marić, F., Janičić, P.: Simple algorithm portfolio for sat. Artif. Intell. Rev. 40(4), 457–465 (2013)

  28. Özlen, M., Azizoğlu, M.: Multi-objective integer programming: a general approach for generating all non-dominated solutions. Eur. J. Oper. Res. 199, 25–35 (2009)

  29. Özlen, M., Burton, B.A., MacRae, C.A.G.: Multi-objective integer programming: an improved recursive algorithm. J. Optim. Theory Appl. 160(2), 470–482 (2013)

  30. Özpeynirci, Ö., Köksalan, M.: An exact algorithm for finding extreme supported nondominated points of multiobjective mixed integer programs. Manage. Sci. 56(12), 2302–2315 (2010)

  31. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  32. Prinzie, A., Van den Poel, D.: Random forests for multiclass classification: random multinomial logit. Expert Syst. Appl. 34(3), 1721–1732 (2008)

  33. Przybylski, A., Gandibleux, X.: Multi-objective branch and bound. Eur. J. Oper. Res. 260(3), 856–872 (2017)

  34. Rice, J.R.: The algorithm selection problem. In: Advances in Computers, Vol. 15, pp. 65–118. Elsevier (1976)

  35. Roth, D., Yih, W.-t.: Integer linear programming inference for conditional random fields. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 736–743. ACM (2005)

  36. Sabharwal, A., Samulowitz, H., Reddy, C.: Guiding combinatorial optimization with UCT. In: International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pp. 356–361. Springer (2012)

  37. Serafini, P.: Some considerations about computational complexity for multi objective combinatorial problems. In: Recent Advances and Historical Development of Vector Optimization, pp. 222–232. Springer (1987)

  38. Sierra-Altamiranda, A., Charkhgard, H.: A new exact algorithm to optimize a linear function over the set of efficient solutions for bi-objective mixed integer linear programming. INFORMS J. Comput. 31(4), 823–840 (2019)

  39. Sierra-Altamiranda, A., Charkhgard, H.: Ooesalgorithm.jl: a julia package for optimizing a linear function over the set of efficient solutions for biobjective mixed integer linear programming. Int. Trans. Oper. Res. 27(2), 945–957 (2020)

  40. Snoek, J., Larochelle, H., Adams, R.P.: Practical bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)

  41. Soylu, B., Yıldız, G.B.: An exact algorithm for biobjective mixed integer linear programming problems. Comput. Oper. Res. 72, 204–213 (2016)

  42. Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2012)

  43. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 58, 267–288 (1996)

  44. Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the Twenty-first International Conference on Machine Learning, pp. 104. ACM (2004)

  45. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Satzilla-07: the design and analysis of an algorithm portfolio for sat. In: International Conference on Principles and Practice of Constraint Programming, pp. 712–727. Springer (2007)

  46. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Satzilla: portfolio-based algorithm selection for sat. J. Artif. Intell. Res. 32, 565–606 (2008)

  47. Xu, L., Hoos, H., Leyton-Brown, K.: Hydra: automatically configuring algorithms for portfolio-based selection. In: Twenty-Fourth AAAI Conference on Artificial Intelligence (2010)

  48. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Hydra-mip: Automated algorithm configuration and selection for mixed integer programming. In: RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion at the International Joint Conference on Artificial Intelligence (IJCAI), pp. 16–30 (2011)

  49. Xu, L., Hutter, F., Shen, J., Hoos, H.H., Leyton-Brown, K.: Satzilla2012: Improved algorithm selection based on cost-sensitive classification models. In: Proceedings of SAT challenge, pp. 57–58 (2012)

Author information

Correspondence to Hadi Charkhgard.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: MSVM

Our proposed ML framework requires a classification method. The classification algorithm that we use in this study is MSVM. This approach seeks to learn a function \(\beta :\varPhi \rightarrow \varOmega \) that predicts which objective function should be selected. Let \(S=\{(\varvec{\phi }^1,y^1),\dots ,(\varvec{\phi }^s,y^s)\}\) be a set of training instances. For each instance \(i\in \{1,\dots , s\}\), \(\varvec{\phi }^i\) is the vector of features and \(y^i\) is its corresponding label, i.e., the index of the objective function with the best performance. We use the multi-class formulation in [13], which establishes a classifier of the form:

$$\begin{aligned} \beta _{{\textbf {W}}}(\varvec{\phi },y)=\arg \max \{\varGamma _{\varvec{w}_r}(\varvec{\phi },y): r\in \{1,\dots ,p\}\} \end{aligned}$$
(3)

where \({\textbf {W}}\in \mathbb {R}^{p\times k}\) is a parameter matrix, p is the number of objectives, and k is the number of features. Also, \(\varGamma _{\varvec{w}_r}(\varvec{\phi },y):=\varvec{w}_r\cdot \varPsi (\varvec{\phi },y)\), where \(\varvec{w}_r\) is the rth row of \({\textbf {W}}\) and \(\varPsi (\varvec{\phi },y)\) is a feature vector relating \(\varvec{\phi }\) and y. Observe that \(\varGamma _{\varvec{w}_r}(\varvec{\phi },y)\) is a linear function, which makes it suitable for understanding the impact of the features on our predictions. Eventually, this helps our best subset selection of features, which is explained in detail in the next subsection. We use the MSVM algorithm described in [44] to train the model, which finds the solution of the following optimization problem:

$$\begin{aligned} &\min _{{\textbf {W}},\,\xi _i\ge 0}\ \ \frac{1}{2}{\textbf {W}}^{\intercal }{\textbf {W}}+\frac{C}{s}\sum _{i=1}^{s}\xi _i\\ &\text {s.t. }\ \forall \ i\in \{1,\dots ,s\},\ \bar{y}^i\in \varOmega : \varGamma (\varvec{\phi }^i,y^i)-\varGamma (\varvec{\phi }^i,\bar{y}^i)\ge \varDelta (y^i,\bar{y}^i)-\xi _i, \end{aligned}$$
(4)

where \(C > 0\) is the regularization parameter and \(\varDelta :\varOmega ^2\rightarrow \{0,1\}\) is the loss function; \(\varDelta (y,\bar{y})\) returns 0 when the predicted label \(\bar{y}\) is equal to y, and returns 1 otherwise. In general, Problem (4) is not solved to optimality; instead, a tolerance parameter \(\epsilon \) is defined to stop the search when the absolute gap reaches this value.
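
To make the training step concrete, the following minimal sketch shows how such a classifier could be fit with scikit-learn [31], whose LinearSVC supports the Crammer-Singer multi-class formulation of [13] through the multi_class='crammer_singer' option. The feature matrix Phi, the labels y, and the parameter values below are illustrative placeholders rather than the data or settings used in our experiments.

# Sketch only: train a linear multi-class SVM in the spirit of Problem (4).
# Phi is an s x k matrix of instance features; y[i] is the index (0..p-1) of the
# objective whose projection performed best on training instance i.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
s, k, p = 200, 15, 3                      # illustrative sizes
Phi = rng.normal(size=(s, k))             # placeholder feature vectors
y = rng.integers(0, p, size=s)            # placeholder best-objective labels

# C is the regularization parameter and tol plays the role of the stopping
# tolerance epsilon described above.
clf = LinearSVC(multi_class="crammer_singer", C=1.0, tol=1e-4, max_iter=10000)
clf.fit(Phi, y)

# clf.coef_ is the p x k parameter matrix W; the predicted projection for a new
# feature vector phi is argmax_r w_r . phi, as in the classifier (3).
W = clf.coef_
print(W.shape, clf.predict(Phi[:5]))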

Appendix B: List of features

In this section, we present all \(5p^2+106p-50\) features that we used in this study. Note that since \(p=3\) in our computational study, the total number of features is 313. For convenience, we partition the proposed features into subsets and present them next. We use the letter F to represent each subset.

The first subset of features is \(F^1=\{\varvec{c}_1^\intercal \varvec{\tilde{x}},\varvec{c}_2^\intercal \varvec{\tilde{x}},\dots ,\varvec{c}_p^\intercal \varvec{\tilde{x}}\}\), which will be automatically computed during the pre-ordering process. To incorporate the impact of the size of an instance in the learning process, we introduce \(F^2=\{n\}\), \(F^3=\{m\}\), and \(F^4=\{\text{ density }(A)\}\). To incorporate the impact of zero coefficients of the objective functions in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_{i}^{5}=\{\text{ size }(S^5_i)\}, \end{aligned}$$

where \(S^5_i=\{c\in \varvec{c}_{i}:c=0\}\). To incorporate the impact of positive coefficients of the objective functions in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{6}=\{\text{ size }(S^6_i),\text{ Avg }(S^6_i),\text{ Min }(S^6_i),\text{ Max }(S^6_i),\text{ Std }(S^6_i),\text{ Median }(S^6_i)\}, \end{aligned}$$

where \(S^6_i=\{c\in \varvec{c}_{i}:c>0\}\). To incorporate the impact of negative coefficients of the objective functions in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{7}=\{\text{ size }(S^7_i),\text{ Avg }(S^7_i),\text{ Min }(S^7_i),\text{ Max }(S^7_i),\text{ Std }(S^7_i),\text{ Median }(S^7_i)\}, \end{aligned}$$

where \(S^7_i=\{c\in \varvec{c}_{i}:c<0\}\). To establish a relation between the objective functions and A in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{8}=\{\text{ Avg }(S^8_i),\text{ Min }(S^8_i),\text{ Max }(S^8_i),\text{ Std }(S^8_i),\text{ Median }(S^8_i)\}, \end{aligned}$$

where \(S^8_i=\cup _{j\in \{1,\dots ,m\}}\{\varvec{c}^\intercal _i \varvec{a}^\intercal _{j.}\}\) and \(\varvec{a}_{j.}\) represents row j of matrix A. For the same purpose, for each \(i\in \{1,\dots ,p\}\) we also introduce:

$$\begin{aligned} F^{9}_{i}=\varvec{c}^\intercal _i\times A^\intercal \times \varvec{b}. \end{aligned}$$

For each \(j\in \{1,\dots ,m\}\), let \(b'_j:=b_j+1\) if \(b_j \ge 0\) and \(b'_j:=b_j-1\) otherwise. To establish a relation between the positive and negative coefficients in the objective functions and \(\varvec{b}\) in the learning process, for each \(i\in \{1,\dots ,p\}\) and \(k\in \{10,11\}\) we introduce:

$$\begin{aligned} F_i^{k}=\{\text{ Avg }(S^{k}_i),\text{ Min }(S^{k}_i),\text{ Max }(S^{k}_i),\text{ Std }(S^{k}_i),\text{ Median }(S^{k}_i)\}, \end{aligned}$$

where \(S^{10}_i=\cup _{j\in \{1,\dots ,m\}}\{\frac{\sum _{c\in S_{i}^{6}}c}{b'_j}\}\) and \(S^{11}_i=\cup _{j\in \{1,\dots ,m\}}\{\frac{\sum _{c\in S_{i}^{7}}c}{b'_j}\}\).
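
As an illustration, the coefficient-based subsets \(F^5\)-\(F^7\) and \(F^{10}\)-\(F^{11}\) can be computed with a few lines of NumPy. The helper below is a sketch under our notation only (it is not taken from the paper's implementation), with C denoting the \(p\times n\) matrix of objective coefficients and b the right-hand-side vector.

# Sketch: coefficient-statistics features for one instance.
import numpy as np

def stats(values):
    # size/Avg/Min/Max/Std/Median of a 1-D array; zeros if the set is empty.
    v = np.asarray(values, dtype=float)
    if v.size == 0:
        return [0.0] * 6
    return [float(v.size), v.mean(), v.min(), v.max(), v.std(), float(np.median(v))]

def coefficient_features(C, b):
    b_shift = np.where(b >= 0, b + 1.0, b - 1.0)   # b'_j as defined above
    features = []
    for c_i in C:
        pos, neg = c_i[c_i > 0], c_i[c_i < 0]
        features.append(float(np.sum(c_i == 0)))   # F^5_i: number of zero coefficients
        features += stats(pos)                     # F^6_i: statistics of positive coefficients
        features += stats(neg)                     # F^7_i: statistics of negative coefficients
        features += stats(pos.sum() / b_shift)     # F^10_i: positive-coefficient sum over each b'_j
        features += stats(neg.sum() / b_shift)     # F^11_i: negative-coefficient sum over each b'_j
    return np.array(features)

# Tiny example: p=3 objectives, n=8 variables, m=4 constraints.
rng = np.random.default_rng(1)
print(coefficient_features(rng.integers(-5, 6, size=(3, 8)), rng.integers(-3, 10, size=4)).shape)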

For each \(i\in \{1,\dots ,p\}\), let \(l_{i}:=\min _{\varvec{x} \in \mathcal {X}_{LR}} \ z_i(\varvec{x})\) and \(u_{i}:=\max _{\varvec{x} \in \mathcal {X}_{LR}} \ z_i(\varvec{x})\) where \(\mathcal {X}_{LR}\) is the linear programming relaxation of \(\mathcal {X}\). To incorporate the impact of the volume of the search region in a projected criterion space, i.e., a \((p-1)\)-dimensional criterion space, in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{12}=\prod \limits _{j\in \{1,\dots ,p\}\setminus \{i\}}(u_{j}-l_{j}). \end{aligned}$$
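
A sketch of how these bounds and the resulting volume feature could be obtained is given below; it assumes the linear programming relaxation replaces \(x\in \{0,1\}^n\) by \(0\le x\le 1\) with constraints of the form \(Ax\le b\), and uses scipy.optimize.linprog purely for illustration.

# Sketch: LP-relaxation bounds l_i, u_i and the projected-volume features F^12_i.
import numpy as np
from scipy.optimize import linprog

def volume_features(C, A, b):
    p = C.shape[0]
    l, u = np.zeros(p), np.zeros(p)
    for i in range(p):
        # linprog minimizes, so minimizing c_i gives l_i and minimizing -c_i gives -u_i.
        l[i] = linprog(C[i], A_ub=A, b_ub=b, bounds=(0, 1), method="highs").fun
        u[i] = -linprog(-C[i], A_ub=A, b_ub=b, bounds=(0, 1), method="highs").fun
    # F^12_i is the volume of the (p-1)-dimensional box obtained by dropping objective i.
    return np.array([np.prod(np.delete(u - l, i)) for i in range(p)])

rng = np.random.default_rng(2)
p, n, m = 3, 10, 5
C = rng.integers(1, 21, size=(p, n)).astype(float)
A = rng.integers(1, 11, size=(m, n)).astype(float)
b = A.sum(axis=1) / 2.0
print(volume_features(C, A, b))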

Let \(\bar{c}'_i:=\frac{\sum _{c\in \varvec{c}_i}{|c|}}{n}\) be the average of the absolute values of the elements of objective \(i\in \{1,\dots ,p\}\). Note that \(\varvec{c}_i\) is a vector (and not a set), so \(c\in \varvec{c}_i\) is not well-defined mathematical notation. However, for simplicity, we keep this notation and treat each component of the vector as an element.

We also introduce some features that measure the size of an instance in an indirect way. Specifically, for each \(i\in \{1,\dots ,p\}\) and \(k\in \{13,14,15\}\) we introduce:

$$\begin{aligned} F_i^{k}=\{\text{ Avg }(S^{k}_i),\text{ Min }(S^{k}_i),\text{ Max }(S^{k}_i),\text{ Std }(S^{k}_i),\text{ Median }(S^{k}_i)\}, \end{aligned}$$

to measure n, p and m, respectively, where

$$\begin{aligned} S^{13}_i&:=\cup _{j\in \{1,\dots ,p\}\backslash \{i\}}\left\{ \frac{\sum _{c\in \varvec{c}_i}{|c|}}{\bar{c}'_j+1}\right\} ,\\ S^{14}_i&:=\cup _{k\in \{1,\dots ,n\}}\left\{ \frac{\sum _{j\in \{1,\dots ,p\}\setminus \{i\}}{|c_{jk}|}}{|c_{ik}|+1}\right\} , \end{aligned}$$

and

$$\begin{aligned} S^{15}_i:=\cup _{k\in \{1,\dots ,n\}}\left\{ \frac{\sum _{j=1}^{m}{|a_{jk}|}}{|c_{ik}|+1}\right\} . \end{aligned}$$
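
The sets \(S^{13}\)-\(S^{15}\) are cheap to evaluate; a possible NumPy sketch (using 0-based indices and our notation only; the features \(F^{13}\)-\(F^{15}\) are then the Avg/Min/Max/Std/Median of these sets) is:

# Sketch: indirect size-measuring sets S^13-S^15 for objective index i.
import numpy as np

def size_proxy_sets(C, A, i):
    p, n = C.shape
    cbar = np.abs(C).sum(axis=1) / n                            # average absolute coefficient per objective
    others = [j for j in range(p) if j != i]
    s13 = np.abs(C[i]).sum() / (cbar[others] + 1)               # one value per other objective
    s14 = np.abs(C[others]).sum(axis=0) / (np.abs(C[i]) + 1)    # one value per variable
    s15 = np.abs(A).sum(axis=0) / (np.abs(C[i]) + 1)            # one value per variable
    return s13, s14, s15

rng = np.random.default_rng(5)
s13, s14, s15 = size_proxy_sets(rng.integers(-9, 10, size=(3, 6)),
                                rng.integers(-9, 10, size=(4, 6)), i=0)
print(len(s13), len(s14), len(s15))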

Motivated by the idea of using the product of two variables to study the interaction effect between them, for each \(i\in \{1,\dots ,p\}\), we introduce:

$$\begin{aligned} F_i^{16}=\{\text{ Avg }(S^{16}_i),\text{ Min }(S^{16}_i),\text{ Max }(S^{16}_i),\text{ Std }(S^{16}_i),\text{ Median }(S^{16}_i)\}, \end{aligned}$$

where \(S_i^{16}=\cup _{j\in \{1,\dots ,p\}\backslash \{i\}}\{\sum _{l=1}^{n} c_{il}c_{jl}\}\). Similarly, we also define a subset of features based on the leverage score \(LS_j\) of the variable \(j\in \{1,\dots ,n\}\) in the matrix A. Specifically, for each \(i\in \{1,\dots ,p\}\), we introduce:

$$\begin{aligned} F_i^{17}=\sum _{j=1}^n c_{ij} LS_j. \end{aligned}$$

where \(LS_j:= \frac{||\varvec{a}_{j}||^2}{\sum _{l=1}^{n} ||\varvec{a}_{l}||^2}\) and \(\varvec{a}_{j}\) represents column j of matrix A for each \(j\in \{1,\dots ,n\}\). Let \(\text {Avg}(C):=\text {Avg}(\varvec{c}_1,\dots ,\varvec{c}_p)\), \(\text {Std}(C):=\text {Std}(\varvec{c}_1,\dots ,\varvec{c}_p)\), and

$$\begin{aligned} O:=\{(-\infty ,-1),(-1,-0.5),(-0.5, 0),(0,0.5),(0.5,1),(1,\infty )\}. \end{aligned}$$

For each \(i\in \{1,\dots ,p\}\), we define

$$\begin{aligned} F_i^{18}=\cup _{(l,u)\in O}\{car(\varvec{c}_i^{l,u})\} \end{aligned}$$

where

$$\begin{aligned} \varvec{c}_i^{l,u}:=\{c\in \varvec{c}_i:\text {Avg}(C)+l\ \text {Std}(C)\le c\le \text {Avg}(C)+u\ \text {Std}(C)\}. \end{aligned}$$
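
For instance, the interval counts in \(F^{18}\) can be obtained directly from the global mean and standard deviation of the objective coefficients; the snippet below is an illustrative sketch only (it assumes \(\text{Std}(C)>0\)).

# Sketch: the counts car(c_i^{l,u}) making up F^18_i for every objective i.
import numpy as np

def f18(C):
    mu, sigma = C.mean(), C.std()
    O = [(-np.inf, -1), (-1, -0.5), (-0.5, 0), (0, 0.5), (0.5, 1), (1, np.inf)]
    return np.array([[int(np.sum((c_i >= mu + l * sigma) & (c_i <= mu + u * sigma)))
                      for (l, u) in O]
                     for c_i in C])

rng = np.random.default_rng(3)
print(f18(rng.integers(-10, 11, size=(3, 12))))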

The following observation creates the basis of the remaining features.

Observation 1

Let \(\alpha _i\) for \(i\in \{1,\dots ,p\}\) and \(\beta _k\) for \(k\in \{1,\dots ,m\}\) be positive constants. For a MOBLP, an equivalent problem can be constructed as follows:

$$\begin{aligned} \min \ &\left\{ \sum _{j=1}^n\alpha _1c_{1j}x_j,\dots ,\sum _{j=1}^n\alpha _pc_{pj}x_j\right\} \\ \text{ s.t. }\ &\sum _{j=1}^n\beta _k a_{kj}x_{j}\le \beta _kb_k \qquad \forall \ k\in \{1,\dots ,m\},\\ &x_j\in \{0,1\} \qquad \forall \ j\in \{1,\dots ,n\} \end{aligned}$$
(5)

Observation 1 is critical because it shows that our ML approach should not be sensitive to positive scaling. The remaining features are specifically designed to address this issue: they are similar to the ones introduced above, but they are less sensitive to positive scaling.

Let \(c^{max}_{i}=\max \{|c_{i1}|,\dots , |c_{in}|\}\) and \(\bar{\varvec{c}}_{i}=(\frac{c_{i1}}{c^{max}_{i}},\dots ,\frac{c_{in}}{c^{max}_{i}})\). To incorporate the impact of the relative number of zero, positive, and negative coefficients of the objective functions in the learning process, for each \(i\in \{2,\dots ,p\}\), we introduce:

$$\begin{aligned} F_{i}^{19}&=\left\{ \ln \left( 1+\frac{car(\bar{\varvec{c}}_1)^0}{1+ car(\bar{\varvec{c}}_i)^0}\right) \right\} ,\\ F_{i}^{20}&=\left\{ \ln \left( 1+\frac{car(\bar{\varvec{c}}_1)^+}{1+car(\bar{\varvec{c}}_i)^+}\right) \right\} , \end{aligned}$$

and

$$\begin{aligned} F_{i}^{21}=\left\{ \ln \left( 1+\frac{car(\bar{\varvec{c}}_1)^-}{1+car(\bar{\varvec{c}}_i)^-}\right) \right\} , \end{aligned}$$

where \(car(\bar{\varvec{c}}_i)^0\) is the number of elements in \(\bar{\varvec{c}}_i\) with zero values. Also, \(car(\bar{\varvec{c}}_i)^+\) is the number of elements in \(\bar{\varvec{c}}_i\) with positive values. Finally, \(car(\bar{\varvec{c}}_i)^-\) is the number of elements in \(\bar{\varvec{c}}_i\) with negative values.

The following function is helpful for introducing some other features:

$$\begin{aligned} g(a)= \left\{ \begin{array}{ll} a+1 & \text{ if } a\ge 0 \\ a-1 & \text{ otherwise } \end{array} \right. \end{aligned}$$

For each \(l\in \{1,\dots ,m\}\), let \(a^{max}_{l}=\max \{|a_{l1}|,\dots , |a_{ln}|\}\), \(\bar{\varvec{a}}_{l}=(\frac{a_{l1}}{a^{max}_{l}},\dots ,\frac{a_{ln}}{a^{max}_{l}})\), and \(\bar{b}_l=\frac{b_l}{a^{max}_{l}}\). To incorporate the relative impact of the magnitude of the objective function coefficients and constraints in the learning process, for each \(i\in \{2,\dots ,p\}\) and \(k\in \{22,23,24,25,26,27\}\), we introduce:

$$\begin{aligned} F_i^{k}=\{\text{ Avg }(S^{k}_i),\text{ Min }(S^{k}_i),\text{ Max }(S^{k}_i),\text{ Std }(S^{k}_i),\text{ Median }(S^{k}_i)\}, \end{aligned}$$

where

$$\begin{aligned} S_i^{22}&=\left\{ \frac{\bar{c}_{11}}{g(\bar{c}_{i1})},\dots , \frac{\bar{c}_{1n}}{g(\bar{c}_{in})}\right\} ,\\ S_i^{23}&=\left\{ \frac{\bar{c}^2_{11}}{g(\bar{c}^2_{i1})},\dots , \frac{\bar{c}^2_{1n}}{g(\bar{c}^2_{in})}\right\} ,\\ S_i^{24}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{1j}}{ng(\bar{c}_{ij})},\dots , \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{mj}}{ng(\bar{c}_{ij})}\right\} ,\\ S_i^{25}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{1j}}{ng(\bar{c}_{ij})}-\bar{b}_1,\dots , \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{mj}}{ng(\bar{c}_{ij})}-\bar{b}_m\right\} ,\\ S_i^{26}&=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{1j}}{ng(\bar{c}^2_{ij})},\dots , \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{mj}}{ng(\bar{c}^2_{ij})}\right\} , \end{aligned}$$

and

$$\begin{aligned} S_i^{27}=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{1j}}{ng(\bar{c}^2_{ij})}-\bar{b}_1,\dots , \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{mj}}{ng(\bar{c}^2_{ij})}-\bar{b}_m\right\} . \end{aligned}$$
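
A compact sketch of the normalization and of two representative scale-robust sets (\(S^{22}\) and \(S^{24}\)) follows; the helper names are ours, and the snippet assumes 0-based objective indices with objective 0 playing the role of objective 1 above.

# Sketch: row-wise normalization, the shift function g, and the sets S_i^22 and S_i^24.
import numpy as np

def g(a):
    # Shift away from zero so that ratios of the form x / g(a) are always defined.
    return np.where(a >= 0, a + 1.0, a - 1.0)

def normalize_rows(M):
    # Divide each row by the maximum absolute value of its entries.
    return M / np.abs(M).max(axis=1, keepdims=True)

def s22_s24(C, A, i):
    Cbar, Abar = normalize_rows(C.astype(float)), normalize_rows(A.astype(float))
    n = C.shape[1]
    s22 = Cbar[0] / g(Cbar[i])                               # n element-wise ratios
    s24 = (Abar * (Cbar[0] / (n * g(Cbar[i])))).sum(axis=1)  # one value per constraint row
    return s22, s24

rng = np.random.default_rng(4)
s22, s24 = s22_s24(rng.integers(-5, 6, size=(3, 8)), rng.integers(1, 11, size=(4, 8)), i=1)
print(len(s22), len(s24))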

For the same reason, for each \(i\in \{1,\dots ,p\}\) and \(l\in \{1,\dots ,p\}\backslash \{i\}\), we introduce:

$$\begin{aligned} F_{il}^{28}=\left\{ \text{ Avg }(S^{28}_{il}),\text{ Min }(S^{28}_{il}),\text{ Max }(S^{28}_{il}),\text{ Std }(S^{28}_{il}),\text{ Median }(S^{28}_{il})\right\} , \end{aligned}$$

where

$$\begin{aligned} S_{il}^{28}=\left\{ \frac{\bar{c}_{i1}}{g(\bar{c}_{l1})},\dots , \frac{\bar{c}_{in}}{g(\bar{c}_{ln})}\right\} . \end{aligned}$$

Finally, let \(\bar{A}_j=\{\bar{a}_{1j},\dots ,\bar{a}_{mj}\}\) for each \(j\in \{1,\dots ,n\}\). For each \(i\in \{2,\dots ,p\}\), the following subsets of features are also defined for linking the constraints and objective functions:

$$\begin{aligned} F_i^{29}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}_{ij})},\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}_{ij})},\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}_{ij})}, \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}_{ij})},\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}_{ij})}\right\} ,\\ F_i^{30}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Avg }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Min }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Max }(\bar{b}_j),\right. \\&\quad \left. \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Std }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Median }(\bar{b}_j)\right\} ,\\ F_i^{31}&=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\right. \\&\quad \left. \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}\right\} ,\\ F_i^{32}&=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Avg }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Min }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Max }(\bar{b}_j),\right. \\&\quad \left. \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Std }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Median }(\bar{b}_j)\right\} , \end{aligned}$$

where \(\bar{c}^2_{ij}=\bar{c}_{ij}\bar{c}_{ij}\) for each \(i\in \{1,\dots ,p\}\) and \(j\in \{1\dots ,n\}\).

Appendix C: Experiments on tri-objective random binary programming instances

In this section, we generate 1000 tri-objective random binary programming instances and replicate the experiment conducted in Sect. 6.1. The instances are divided into 5 subclasses, each with 200 instances. In each subclass, the number of decision variables n and the number of constraints m are equal, i.e., \(n=m\). Specifically, we assume that \(n\in \{50,55,60,65,70\}\). Each instance has the following structure:

$$\begin{aligned} \max _{\varvec{x} \in \mathcal {X}} \ \{z_1(\varvec{x}),z_2(\varvec{x}),z_3(\varvec{x})\}, \end{aligned}$$

where \(\mathcal {X}:=\big \{\varvec{x}\in \mathbb {B}^{n}:A\varvec{x}\le \varvec{b}\big \}\) represents the feasible set in the decision space, and \(z_i(\varvec{x})=\varvec{c}^\intercal _{i}\varvec{x}\) for \(i=1,2,3\). We randomly generate the parameters such that \(\varvec{c}_{i}\in \{1,2,\dots ,20\}^n\), \(A\in \{1,2,\dots ,10\}^{n\times n}\) with a sparsity of 0.75, and \(b_j=\Big \lceil {\frac{\sum _{i=1}^{n}a_{ij}}{b'}}\Big \rceil \) for \(j=1,2,\dots ,n\), where \(a_{ij}\) is an element of A and \(b'\) is drawn randomly from the interval (1, 3).
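
One possible implementation of this generation scheme is sketched below; the exact random streams, the interpretation of the 0.75 sparsity (here taken as roughly 75% zero entries), and the orientation of the sum defining \(b_j\) reflect our reading of the description above rather than the authors' code.

# Sketch: generate one tri-objective random binary instance with n variables and n constraints.
import numpy as np

def generate_instance(n, p=3, sparsity=0.75, seed=0):
    rng = np.random.default_rng(seed)
    C = rng.integers(1, 21, size=(p, n))                 # objective coefficients in {1,...,20}
    A = rng.integers(1, 11, size=(n, n)).astype(float)   # constraint coefficients in {1,...,10}
    A[rng.random(size=(n, n)) < sparsity] = 0.0          # make roughly 75% of the entries zero
    b_prime = rng.uniform(1, 3)                          # b' drawn from the interval (1, 3)
    b = np.ceil(A.sum(axis=0) / b_prime)                 # b_j = ceil(sum_i a_ij / b')
    return C, A, b

C, A, b = generate_instance(n=50)
print(C.shape, A.shape, b.shape)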

Table 6 Summary results for experiments on random binary instances with \(p=3\)

We observe from Table 6 that the accuracy of our ML framework is 58.0% (without considering tie cases). The time columns report the average time that the KSA takes to solve the problem when projecting based on the best objective, a random objective, and the objective proposed by the ML approach. Our approach yields a time improvement of around 4%-8% on average compared to a random projection, and it captures between 45% and 69% of the maximum possible time decrease. Note that, for instances with 65 and 70 variables, the time decrease achieved by our approach is between 1000 and 3000 s on average.

Appendix D: Experiments on instances with four objectives

In our final experiment, we replicate the experiment performed in Sect. 6.1 over a set of 1000 random binary instances with four objectives. The instances are created using the procedure described in the previous experiment, i.e., Appendix C. Again, there are five subclasses, each with 200 instances, which are defined based on the number of variables, i.e., \(n\in \{40,45,50,55,60\}\). Based on the formula presented in Sect. 4.1, the number of features for this experiment is 484. However, our proposed feature subset selection approach selects 72 of these features.

Table 7 Summary results for experiments on random binary instances with \(p=4\)

Note that for \(p=4\), the average percentage of success when selecting a random objective is 25%. Observe from Table 7 that the improvement in accuracy achieved by our approach with respect to the random case is between 60.0% and 78.4% (without considering tie cases). Finally, our ML framework captures between 40.9% and 53.5% of the average maximum time decrease with respect to a random selection of the objective to project. This means that our approach reduces the computational time for the largest instances by between 1 and 2 h on average.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sierra-Altamiranda, A., Charkhgard, H., Dayarian, I. et al. Learning to project in a criterion space search algorithm: an application to multi-objective binary linear programming. Optim Lett (2024). https://doi.org/10.1007/s11590-024-02100-5
