
Learning to project in a criterion space search algorithm: an application to multi-objective binary linear programming

  • Original Paper
  • Published in Optimization Letters

Abstract

In this paper, we investigate the possibility of improving the performance of multi-objective optimization solution approaches using machine learning techniques. Specifically, we focus on multi-objective binary linear programs and employ one of the most effective and recently developed criterion space search algorithms, the so-called KSA, in our study. This algorithm computes all nondominated points of a problem with p objectives by searching on a projected criterion space, i.e., a \((p-1)\)-dimensional criterion space. We present an effective and fast learning approach to identify on which projected space the KSA should work. We also present several generic features/variables that can be used in machine learning techniques for identifying the best projected space. Finally, we present an effective bi-objective optimization-based heuristic for selecting the subset of the features to overcome the issue of overfitting in learning. Through an extensive computational study over 2000 instances of tri-objective knapsack and assignment problems, we demonstrate that an improvement of up to 18% in time can be achieved by the proposed learning method compared to a random selection of the projected space. To show that the performance of our algorithm is not limited to instances of knapsack and assignment problems with three objective functions, we also report similar performance results when the proposed learning approach is used for solving random binary integer program instances with four objective functions.

References

  1. Alvarez, A.M., Louveaux, Q., Wehenkel, L.: A machine learning-based approximation of strong branching. INFORMS J. Comput. 29(1), 185–195 (2017)

  2. Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)

  3. Boland, N., Charkhgard, H., Savelsbergh, M.: A criterion space search algorithm for biobjective integer programming: the balanced box method. INFORMS J. Comput. 27(4), 735–754 (2015)

  4. Boland, N., Charkhgard, H., Savelsbergh, M.: The L-shape search method for triobjective integer programming. Math. Program. Comput. 8(2), 217–251 (2016)

  5. Boland, N., Charkhgard, H., Savelsbergh, M.: A new method for optimizing a linear function over the efficient set of a multiobjective integer program. Eur. J. Oper. Res. 260(3), 904–919 (2017)

  6. Boland, N., Charkhgard, H., Savelsbergh, M.: The quadrant shrinking method: a simple and efficient algorithm for solving tri-objective integer programs. Eur. J. Oper. Res. 260(3), 873–885 (2017)

  7. Boland, N., Charkhgard, H., Savelsbergh, M.: Preprocessing and cut generation techniques for multi-objective binary programming. Eur. J. Oper. Res. 274(3), 858–875 (2019)

  8. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer (2010)

  9. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  10. Charkhgard, H., Eshragh, A.: A new approach to select the best subset of predictors in linear regression modeling: bi-objective mixed integer linear programming. ANZIAM J. 61(1), 64–75 (2019)

  11. Charkhgard, H., Talebian, M., Savelsbergh, M.: Nondominated nash points: application of biobjective mixed integer programming. 4OR - A Q. J. Oper. Res. 16, 151–171 (2018)

  12. Charkhgard, H., Takalloo, M., Haider, Z.: Bi-objective autonomous vehicle repositioning problem with travel time uncertainty. 4OR - A Q. J. Oper. Res. 18(4), 477–505 (2020)

  13. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2(Dec), 265–292 (2001)

  14. Dächert, K., Klamroth, K., Lacour, R., Vanderpooten, D.: Efficient computation of the search region in multi-objective optimization. Eur. J. Oper. Res. 260(3), 841–855 (2017)

  15. Dai, R., Charkhgard, H.: A two-stage approach for bi-objective integer linear programming. Oper. Res. Lett. 46(1), 81–87 (2018)

  16. He, H., Daume III, H., Eisner, J.M.: Learning to search in branch and bound algorithms. In: Advances in Neural Information Processing Systems, pp. 3293–3301 (2014)

  17. Hutter, F., Hoos, H.H., Stützle, T.: Automatic algorithm configuration based on local search. AAAI 7, 1152–1157 (2007)

  18. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Automated configuration of mixed integer programming solvers. In: International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pp. 186–202. Springer (2010)

  19. Khalil, E.B., Le Bodic, P., Song, L., Nemhauser, G.L., Dilkina, B.N.: Learning to branch in mixed integer programming. In: AAAI, pp. 724–731 (2016)

  20. Khalil, E.B., Dilkina, B., Nemhauser, G.L., Ahmed, S., Shao, Y.: Learning to run heuristics in tree search. In: Proceedings of the International Joint Conference on Artificial Intelligence. AAAI Press, Melbourne, Australia (2017)

  21. Kirlik, G., Sayın, S.: A new algorithm for generating all nondominated solutions of multiobjective discrete optimization problems. Eur. J. Oper. Res. 232(3), 479–488 (2014)

  22. Le, Q.V., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.Y.: On optimization methods for deep learning. In: Proceedings of the 28th International Conference on International Conference on Machine Learning, pp. 265–272. Omnipress (2011)

  23. Lindauer, M., Hoos, H.H., Hutter, F., Schaub, T.: Autofolio: an automatically configured algorithm selector. J. Artif. Intell. Res. 53, 745–778 (2015)

  24. Lokman, B., Köksalan, M.: Finding all nondominated points of multi-objective integer programs. J. Global Optim. 57(2), 347–365 (2013)

  25. Malitsky, Y., Sabharwal, A., Samulowitz, H., Sellmann, M.: Non-model-based algorithm portfolios for sat. In: International Conference on Theory and Applications of Satisfiability Testing, pp. 369–370. Springer (2011)

  26. Nikolić, M., Marić, F., Janičić, P.: Instance-based selection of policies for sat solvers. In: International Conference on Theory and Applications of Satisfiability Testing, pp. 326–340. Springer (2009)

  27. Nikolić, M., Marić, F., Janičić, P.: Simple algorithm portfolio for sat. Artif. Intell. Rev. 40(4), 457–465 (2013)

  28. Özlen, M., Azizoğlu, M.: Multi-objective integer programming: a general approach for generating all non-dominated solutions. Eur. J. Oper. Res. 199, 25–35 (2009)

  29. Özlen, M., Burton, B.A., MacRae, C.A.G.: Multi-objective integer programming: an improved recursive algorithm. J. Optim. Theory Appl. 160(2), 470–482 (2013)

  30. Özpeynirci, Ö., Köksalan, M.: An exact algorithm for finding extreme supported nondominated points of multiobjective mixed integer programs. Manage. Sci. 56(12), 2302–2315 (2010)

  31. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  32. Prinzie, A., Van den Poel, D.: Random forests for multiclass classification: random multinomial logit. Expert Syst. Appl. 34(3), 1721–1732 (2008)

  33. Przybylski, A., Gandibleux, X.: Multi-objective branch and bound. Eur. J. Oper. Res. 260(3), 856–872 (2017)

  34. Rice, J.R.: The algorithm selection problem. In: Advances in Computers, Vol. 15, pp. 65–118. Elsevier (1976)

  35. Roth, D., Yih, W.-t.: Integer linear programming inference for conditional random fields. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 736–743. ACM (2005)

  36. Sabharwal, A., Samulowitz, H., Reddy, C.: Guiding combinatorial optimization with UCT. In: International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pp. 356–361. Springer (2012)

  37. Serafini, P.: Some considerations about computational complexity for multi objective combinatorial problems. In: Recent Advances and Historical Development of Vector Optimization, pp. 222–232. Springer (1987)

  38. Sierra-Altamiranda, A., Charkhgard, H.: A new exact algorithm to optimize a linear function over the set of efficient solutions for bi-objective mixed integer linear programming. INFORMS J. Comput. 31(4), 823–840 (2019)

  39. Sierra-Altamiranda, A., Charkhgard, H.: Ooesalgorithm.jl: a julia package for optimizing a linear function over the set of efficient solutions for biobjective mixed integer linear programming. Int. Trans. Oper. Res. 27(2), 945–957 (2020)

  40. Snoek, J., Larochelle, H., Adams, R.P.: Practical bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)

  41. Soylu, B., Yıldız, G.B.: An exact algorithm for biobjective mixed integer linear programming problems. Comput. Oper. Res. 72, 204–213 (2016)

  42. Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2012)

  43. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 58, 267–288 (1996)

  44. Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the Twenty-first International Conference on Machine Learning, pp. 104. ACM (2004)

  45. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Satzilla-07: the design and analysis of an algorithm portfolio for sat. In: International Conference on Principles and Practice of Constraint Programming, pp. 712–727. Springer (2007)

  46. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Satzilla: portfolio-based algorithm selection for sat. J. Artif. Intell. Res. 32, 565–606 (2008)

  47. Xu, L., Hoos, H., Leyton-Brown, K.: Hydra: automatically configuring algorithms for portfolio-based selection. In: Twenty-Fourth AAAI Conference on Artificial Intelligence (2010)

  48. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Hydra-mip: Automated algorithm configuration and selection for mixed integer programming. In: RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion at the International Joint Conference on Artificial Intelligence (IJCAI), pp. 16–30 (2011)

  49. Xu, L., Hutter, F., Shen, J., Hoos, H.H., Leyton-Brown, K.: Satzilla2012: Improved algorithm selection based on cost-sensitive classification models. In: Proceedings of SAT challenge, pp. 57–58 (2012)

Author information

Correspondence to Hadi Charkhgard.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: MSVM

Our proposed ML framework requires a classification method. The classification algorithm that we use in this study is MSVM. This approach seeks to learn a function \(\beta :\varPhi \rightarrow \varOmega \) that predicts which objective function should be selected. Let \(S=\{(\varvec{\phi }^1,y^1),\dots ,(\varvec{\phi }^s,y^s)\}\) be a set of training instances. For each instance \(i\in \{1,\dots , s\}\), \(\varvec{\phi }^i\) is the vector of features and \(y^i\) is its corresponding label, i.e., the index of the objective function with the best performance. We use the multi-class formulation in [13], which establishes a classifier of the form:

$$\begin{aligned} \beta _{{\textbf {W}}}(\varvec{\phi },y)=\arg \max \{\varGamma _{\varvec{w}_r}(\varvec{\phi },y): r\in \{1,\dots ,p\}\} \end{aligned}$$
(3)

where \({\textbf {W}}\in \mathbb {R}^{p\times k}\) is a parameter matrix, p is the number of objectives, and k is the number of features. Also, \(\varGamma _{\varvec{w}_r}(\varvec{\phi },y):=\varvec{w}_r\cdot \varPsi (\varvec{\phi },y)\), where \(\varvec{w}_r\) is the rth row of \({\textbf {W}}\) and \(\varPsi (\varvec{\phi },y)\) is a feature vector relating \(\varvec{\phi }\) and y. Observe that \(\varGamma _{\varvec{w}_r}(\varvec{\phi },y)\) is a linear function, which makes it suitable for understanding the impact of the features on our predictions. Eventually, this helps our best subset selection of features, which is explained in detail in the next subsection. We use the MSVM algorithm described in [44] to train the model, which finds the solution of the following optimization problem:

$$\begin{aligned} &\min _{{\textbf {W}},\,\xi _i\ge 0}\ \ \frac{1}{2}{\textbf {W}}^{\intercal }{\textbf {W}}+\frac{C}{s}\sum _{i=1}^{s}\xi _i\\ &\text {s.t. }\ \forall \ i\in \{1,\dots ,s\},\ \bar{y}^i\in \varOmega : \varGamma (\varvec{\phi }^i,y^i)-\varGamma (\varvec{\phi }^i,\bar{y}^i)\ge \varDelta (y^i,\bar{y}^i)-\xi _i, \end{aligned}$$
(4)

where \(C > 0\) is the regularization parameter and \(\varDelta :\varOmega ^2\rightarrow \{0,1\}\) is the loss function; \(\varDelta (y,\bar{y})\) returns 0 when the predicted label \(\bar{y}\) is equal to y, and returns 1 otherwise. In general, Problem (4) is not solved to optimality; instead, a tolerance parameter \(\epsilon \) is defined to stop the search when the absolute gap reaches this value.
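
To make the training step concrete, the following minimal sketch shows how such a classifier could be fit with scikit-learn [31], whose LinearSVC supports the Crammer-Singer multi-class formulation of [13] through the multi_class='crammer_singer' option. The feature matrix Phi, the labels y, and the parameter values below are illustrative placeholders rather than the data or settings used in our experiments.

# Sketch only: train a linear multi-class SVM in the spirit of Problem (4).
# Phi is an s x k matrix of instance features; y[i] is the index (0..p-1) of the
# objective whose projection performed best on training instance i.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
s, k, p = 200, 15, 3                      # illustrative sizes
Phi = rng.normal(size=(s, k))             # placeholder feature vectors
y = rng.integers(0, p, size=s)            # placeholder best-objective labels

# C is the regularization parameter and tol plays the role of the stopping
# tolerance epsilon described above.
clf = LinearSVC(multi_class="crammer_singer", C=1.0, tol=1e-4, max_iter=10000)
clf.fit(Phi, y)

# clf.coef_ is the p x k parameter matrix W; the predicted projection for a new
# feature vector phi is argmax_r w_r . phi, as in the classifier (3).
W = clf.coef_
print(W.shape, clf.predict(Phi[:5]))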

Appendix B: List of features

In this section, we present all \(5p^2+106p-50\) features that we used in this study. Note that since \(p=3\) in our computational study, the total number of features is 313. For convenience, we partition the proposed features into subsets and present them next. We use the letter F to represent each subset.

The first subset of features is \(F^1=\{\varvec{c}_1^\intercal \varvec{\tilde{x}},\varvec{c}_2^\intercal \varvec{\tilde{x}},\dots ,\varvec{c}_p^\intercal \varvec{\tilde{x}}\}\), which will be automatically computed during the pre-ordering process. To incorporate the impact of the size of an instance in the learning process, we introduce \(F^2=\{n\}\), \(F^3=\{m\}\), and \(F^4=\{\text{ density }(A)\}\). To incorporate the impact of zero coefficients of the objective functions in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_{i}^{5}=\{\text{ size }(S^5_i)\}, \end{aligned}$$

where \(S^5_i=\{c\in \varvec{c}_{i}:c=0\}\). To incorporate the impact of positive coefficients of the objective functions in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{6}=\{\text{ size }(S^6_i),\text{ Avg }(S^6_i),\text{ Min }(S^6_i),\text{ Max }(S^6_i),\text{ Std }(S^6_i),\text{ Median }(S^6_i)\}, \end{aligned}$$

where \(S^6_i=\{c\in \varvec{c}_{i}:c>0\}\). To incorporate the impact of negative coefficients of the objective functions in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{7}=\{\text{ size }(S^7_i),\text{ Avg }(S^7_i),\text{ Min }(S^7_i),\text{ Max }(S^7_i),\text{ Std }(S^7_i),\text{ Median }(S^7_i)\}, \end{aligned}$$

where \(S^7_i=\{c\in \varvec{c}_{i}:c<0\}\). To establish a relation between the objective functions and A in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{8}=\{\text{ Avg }(S^8_i),\text{ Min }(S^8_i),\text{ Max }(S^8_i),\text{ Std }(S^8_i),\text{ Median }(S^8_i)\}, \end{aligned}$$

where \(S^8_i=\cup _{j\in \{1,\dots ,m\}}\{\varvec{c}^\intercal _i \varvec{a}^\intercal _{j.}\}\) and \(\varvec{a}_{j.}\) represents row j of matrix A. For the same purpose, for each \(i\in \{1,\dots ,p\}\) we also introduce:

$$\begin{aligned} F^{9}_{i}=\varvec{c}^\intercal _i\times A^\intercal \times \varvec{b}. \end{aligned}$$

For each \(j\in \{1,\dots ,m\}\), let \(b'_j:=b_j+1\) if \(b_j \ge 0\) and \(b'_j:=b_j-1\) otherwise. To establish a relation between the positive and negative coefficients in the objective functions and \(\varvec{b}\) in the learning process, for each \(i\in \{1,\dots ,p\}\) and \(k\in \{10,11\}\) we introduce:

$$\begin{aligned} F_i^{k}=\{\text{ Avg }(S^{k}_i),\text{ Min }(S^{k}_i),\text{ Max }(S^{k}_i),\text{ Std }(S^{k}_i),\text{ Median }(S^{k}_i)\}, \end{aligned}$$

where \(S^{10}_i=\cup _{j\in \{1,\dots ,m\}}\{\frac{\sum _{c\in S_{i}^{6}}c}{b'_j}\}\) and \(S^{11}_i=\cup _{j\in \{1,\dots ,m\}}\{\frac{\sum _{c\in S_{i}^{7}}c}{b'_j}\}\).
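
As an illustration, the coefficient-based subsets \(F^5\)-\(F^7\) and \(F^{10}\)-\(F^{11}\) can be computed with a few lines of NumPy. The helper below is a sketch under our notation only (it is not taken from the paper's implementation), with C denoting the \(p\times n\) matrix of objective coefficients and b the right-hand-side vector.

# Sketch: coefficient-statistics features for one instance.
import numpy as np

def stats(values):
    # size/Avg/Min/Max/Std/Median of a 1-D array; zeros if the set is empty.
    v = np.asarray(values, dtype=float)
    if v.size == 0:
        return [0.0] * 6
    return [float(v.size), v.mean(), v.min(), v.max(), v.std(), float(np.median(v))]

def coefficient_features(C, b):
    b_shift = np.where(b >= 0, b + 1.0, b - 1.0)   # b'_j as defined above
    features = []
    for c_i in C:
        pos, neg = c_i[c_i > 0], c_i[c_i < 0]
        features.append(float(np.sum(c_i == 0)))   # F^5_i: number of zero coefficients
        features += stats(pos)                     # F^6_i: statistics of positive coefficients
        features += stats(neg)                     # F^7_i: statistics of negative coefficients
        features += stats(pos.sum() / b_shift)     # F^10_i: positive-coefficient sum over each b'_j
        features += stats(neg.sum() / b_shift)     # F^11_i: negative-coefficient sum over each b'_j
    return np.array(features)

# Tiny example: p=3 objectives, n=8 variables, m=4 constraints.
rng = np.random.default_rng(1)
print(coefficient_features(rng.integers(-5, 6, size=(3, 8)), rng.integers(-3, 10, size=4)).shape)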

For each \(i\in \{1,\dots ,p\}\), let \(l_{i}:=\min _{\varvec{x} \in \mathcal {X}_{LR}} \ z_i(\varvec{x})\) and \(u_{i}:=\max _{\varvec{x} \in \mathcal {X}_{LR}} \ z_i(\varvec{x})\) where \(\mathcal {X}_{LR}\) is the linear programming relaxation of \(\mathcal {X}\). To incorporate the impact of the volume of the search region in a projected criterion space, i.e., a \((p-1)\)-dimensional criterion space, in the learning process, for each \(i\in \{1,\dots ,p\}\) we introduce:

$$\begin{aligned} F_i^{12}=\prod \limits _{j\in \{1,\dots ,p\}\setminus \{i\}}(u_{j}-l_{j}). \end{aligned}$$
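
A sketch of how these bounds and the resulting volume feature could be obtained is given below; it assumes the linear programming relaxation replaces \(x\in \{0,1\}^n\) by \(0\le x\le 1\) with constraints of the form \(Ax\le b\), and uses scipy.optimize.linprog purely for illustration.

# Sketch: LP-relaxation bounds l_i, u_i and the projected-volume features F^12_i.
import numpy as np
from scipy.optimize import linprog

def volume_features(C, A, b):
    p = C.shape[0]
    l, u = np.zeros(p), np.zeros(p)
    for i in range(p):
        # linprog minimizes, so minimizing c_i gives l_i and minimizing -c_i gives -u_i.
        l[i] = linprog(C[i], A_ub=A, b_ub=b, bounds=(0, 1), method="highs").fun
        u[i] = -linprog(-C[i], A_ub=A, b_ub=b, bounds=(0, 1), method="highs").fun
    # F^12_i is the volume of the (p-1)-dimensional box obtained by dropping objective i.
    return np.array([np.prod(np.delete(u - l, i)) for i in range(p)])

rng = np.random.default_rng(2)
p, n, m = 3, 10, 5
C = rng.integers(1, 21, size=(p, n)).astype(float)
A = rng.integers(1, 11, size=(m, n)).astype(float)
b = A.sum(axis=1) / 2.0
print(volume_features(C, A, b))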

Let \(\bar{c}'_i:=\frac{\sum _{c\in \varvec{c}_i}{|c|}}{n}\) be the average of the absolute values of the elements of objective \(i\in \{1,\dots ,p\}\). Note that \(\varvec{c}_i\) is a vector (and not a set), so \(c\in \varvec{c}_i\) is not well-defined mathematical notation. However, for simplicity, we keep this notation and treat each component of the vector as an element.

We also introduce some features that measure the size of an instance in an indirect way. Specifically, for each \(i\in \{1,\dots ,p\}\) and \(k\in \{13,14,15\}\) we introduce:

$$\begin{aligned} F_i^{k}=\{\text{ Avg }(S^{k}_i),\text{ Min }(S^{k}_i),\text{ Max }(S^{k}_i),\text{ Std }(S^{k}_i),\text{ Median }(S^{k}_i)\}, \end{aligned}$$

to measure n, p and m, respectively, where

$$\begin{aligned} S^{13}_i&:=\cup _{j\in \{1,\dots ,p\}\backslash \{i\}}\left\{ \frac{\sum _{c\in \varvec{c}_i}{|c|}}{\bar{c}'_j+1}\right\} ,\\ S^{14}_i&:=\cup _{k\in \{1,\dots ,n\}}\left\{ \frac{\sum _{j\in \{1,\dots ,p\}\setminus \{i\}}{|c_{jk}|}}{|c_{ik}|+1}\right\} , \end{aligned}$$

and

$$\begin{aligned} S^{15}_i:=\cup _{k\in \{1,\dots ,n\}}\left\{ \frac{\sum _{j=1}^{m}{|a_{jk}|}}{|c_{ik}|+1}\right\} . \end{aligned}$$
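
The sets \(S^{13}\)-\(S^{15}\) are cheap to evaluate; a possible NumPy sketch (using 0-based indices and our notation only; the features \(F^{13}\)-\(F^{15}\) are then the Avg/Min/Max/Std/Median of these sets) is:

# Sketch: indirect size-measuring sets S^13-S^15 for objective index i.
import numpy as np

def size_proxy_sets(C, A, i):
    p, n = C.shape
    cbar = np.abs(C).sum(axis=1) / n                            # average absolute coefficient per objective
    others = [j for j in range(p) if j != i]
    s13 = np.abs(C[i]).sum() / (cbar[others] + 1)               # one value per other objective
    s14 = np.abs(C[others]).sum(axis=0) / (np.abs(C[i]) + 1)    # one value per variable
    s15 = np.abs(A).sum(axis=0) / (np.abs(C[i]) + 1)            # one value per variable
    return s13, s14, s15

rng = np.random.default_rng(5)
s13, s14, s15 = size_proxy_sets(rng.integers(-9, 10, size=(3, 6)),
                                rng.integers(-9, 10, size=(4, 6)), i=0)
print(len(s13), len(s14), len(s15))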

Motivated by the idea of using the product of two variables to study the interaction effect between them, for each \(i\in \{1,\dots ,p\}\), we introduce:

$$\begin{aligned} F_i^{16}=\{\text{ Avg }(S^{16}_i),\text{ Min }(S^{16}_i),\text{ Max }(S^{16}_i),\text{ Std }(S^{16}_i),\text{ Median }(S^{16}_i)\}, \end{aligned}$$

where \(S_i^{16}=\cup _{j\in \{1,\dots ,p\}\backslash \{i\}}\{\sum _{l=1}^{n} c_{il}c_{jl}\}\). Similarly, we also define a subset of features based on the leverage score \(LS_j\) of the variable \(j\in \{1,\dots ,n\}\) in the matrix A. Specifically, for each \(i\in \{1,\dots ,p\}\), we introduce:

$$\begin{aligned} F_i^{17}=\sum _{j=1}^n c_{ij} LS_j. \end{aligned}$$

where \(LS_j:= \frac{||\varvec{a}_{j}||^2}{\sum _{l=1}^{n} ||\varvec{a}_{l}||^2}\) and \(\varvec{a}_{j}\) represents column j of matrix A for each \(j\in \{1,\dots ,n\}\). Let \(\text {Avg}(C):=\text {Avg}(\varvec{c}_1,\dots ,\varvec{c}_p)\), \(\text {Std}(C):=\text {Std}(\varvec{c}_1,\dots ,\varvec{c}_p)\), and

$$\begin{aligned} O:=\{(-\infty ,-1),(-1,-0.5),(-0.5, 0),(0,0.5),(0.5,1),(1,\infty )\}. \end{aligned}$$

For each \(i\in \{1,\dots ,p\}\), we define

$$\begin{aligned} F_i^{18}=\cup _{(l,u)\in O}\{car(\varvec{c}_i^{l,u})\} \end{aligned}$$

where

$$\begin{aligned} \varvec{c}_i^{l,u}:=\{c\in \varvec{c}_i:\text {Avg}(C)+l\ \text {Std}(C)\le c\le \text {Avg}(C)+u\ \text {Std}(C)\}. \end{aligned}$$
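
For instance, the interval counts in \(F^{18}\) can be obtained directly from the global mean and standard deviation of the objective coefficients; the snippet below is an illustrative sketch only (it assumes \(\text{Std}(C)>0\)).

# Sketch: the counts car(c_i^{l,u}) making up F^18_i for every objective i.
import numpy as np

def f18(C):
    mu, sigma = C.mean(), C.std()
    O = [(-np.inf, -1), (-1, -0.5), (-0.5, 0), (0, 0.5), (0.5, 1), (1, np.inf)]
    return np.array([[int(np.sum((c_i >= mu + l * sigma) & (c_i <= mu + u * sigma)))
                      for (l, u) in O]
                     for c_i in C])

rng = np.random.default_rng(3)
print(f18(rng.integers(-10, 11, size=(3, 12))))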

The following observation creates the basis of the remaining features.

Observation 1

Let \(\alpha _i\) for \(i\in \{1,\dots ,p\}\) and \(\beta _k\) for \(k\in \{1,\dots ,m\}\) be positive constants. For a MOBLP, an equivalent problem can be constructed as follows:

$$\begin{aligned} \min \ &\left\{ \sum _{j=1}^n\alpha _1c_{1j}x_j,\dots ,\sum _{j=1}^n\alpha _pc_{pj}x_j\right\} \\ \text{ s.t. }\ &\sum _{j=1}^n\beta _k a_{kj}x_{j}\le \beta _kb_k \qquad \forall \ k\in \{1,\dots ,m\},\\ &x_j\in \{0,1\} \qquad \forall \ j\in \{1,\dots ,n\} \end{aligned}$$
(5)

Observation 1 is critical because it shows that our ML approach should not be sensitive to positive scaling. The remaining features are specifically designed to address this issue: they are similar to the ones introduced above, but they are less sensitive to positive scaling.

Let \(c^{max}_{i}=\max \{|c_{i1}|,\dots , |c_{in}|\}\) and \(\bar{\varvec{c}}_{i}=(\frac{c_{i1}}{c^{max}_{i}},\dots ,\frac{c_{in}}{c^{max}_{i}})\). To incorporate the impact of the relative number of zero, positive, and negative coefficients of the objective functions in the learning process, for each \(i\in \{2,\dots ,p\}\), we introduce:

$$\begin{aligned} F_{i}^{19}&=\left\{ \ln \left( 1+\frac{car(\bar{\varvec{c}}_1)^0}{1+ car(\bar{\varvec{c}}_i)^0}\right) \right\} ,\\ F_{i}^{20}&=\left\{ \ln \left( 1+\frac{car(\bar{\varvec{c}}_1)^+}{1+car(\bar{\varvec{c}}_i)^+}\right) \right\} , \end{aligned}$$

and

$$\begin{aligned} F_{i}^{21}=\left\{ \ln \left( 1+\frac{car(\bar{\varvec{c}}_1)^-}{1+car(\bar{\varvec{c}}_i)^-}\right) \right\} , \end{aligned}$$

where \(car(\bar{\varvec{c}}_i)^0\) is the number of elements in \(\bar{\varvec{c}}_i\) with zero values. Also, \(car(\bar{\varvec{c}}_i)^+\) is the number of elements in \(\bar{\varvec{c}}_i\) with positive values. Finally, \(car(\bar{\varvec{c}}_i)^-\) is the number of elements in \(\bar{\varvec{c}}_i\) with negative values.

The following function is helpful for introducing some other features:

$$\begin{aligned} g(a)= \left\{ \begin{array}{ll} a+1 & \text{ if } a\ge 0 \\ a-1 & \text{ otherwise } \end{array} \right. \end{aligned}$$

For each \(l\in \{1,\dots ,m\}\), let \(a^{max}_{l}=\max \{|a_{l1}|,\dots , |a_{ln}|\}\), \(\bar{\varvec{a}}_{l}=(\frac{a_{l1}}{a^{max}_{l}},\dots ,\frac{a_{ln}}{a^{max}_{l}})\), and \(\bar{b}_l=\frac{b_l}{a^{max}_{l}}\). To incorporate the relative impact of the magnitude of the objective function coefficients and constraints in the learning process, for each \(i\in \{2,\dots ,p\}\) and \(k\in \{22,23,24,25,26,27\}\), we introduce:

$$\begin{aligned} F_i^{k}=\{\text{ Avg }(S^{k}_i),\text{ Min }(S^{k}_i),\text{ Max }(S^{k}_i),\text{ Std }(S^{k}_i),\text{ Median }(S^{k}_i)\}, \end{aligned}$$

where

$$\begin{aligned} S_i^{22}&=\left\{ \frac{\bar{c}_{11}}{g(\bar{c}_{i1})},\dots , \frac{\bar{c}_{1n}}{g(\bar{c}_{in})}\right\} ,\\ S_i^{23}&=\left\{ \frac{\bar{c}^2_{11}}{g(\bar{c}^2_{i1})},\dots , \frac{\bar{c}^2_{1n}}{g(\bar{c}^2_{in})}\right\} ,\\ S_i^{24}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{1j}}{ng(\bar{c}_{ij})},\dots , \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{mj}}{ng(\bar{c}_{ij})}\right\} ,\\ S_i^{25}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{1j}}{ng(\bar{c}_{ij})}-\bar{b}_1,\dots , \sum _{j=1}^n \frac{\bar{c}_{1j}\bar{a}_{mj}}{ng(\bar{c}_{ij})}-\bar{b}_m\right\} ,\\ S_i^{26}&=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{1j}}{ng(\bar{c}^2_{ij})},\dots , \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{mj}}{ng(\bar{c}^2_{ij})}\right\} , \end{aligned}$$

and

$$\begin{aligned} S_i^{27}=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{1j}}{ng(\bar{c}^2_{ij})}-\bar{b}_1,\dots , \sum _{j=1}^n \frac{\bar{c}^2_{1j}\bar{a}_{mj}}{ng(\bar{c}^2_{ij})}-\bar{b}_m\right\} . \end{aligned}$$
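
A compact sketch of the normalization and of two representative scale-robust sets (\(S^{22}\) and \(S^{24}\)) follows; the helper names are ours, and the snippet assumes 0-based objective indices with objective 0 playing the role of objective 1 above.

# Sketch: row-wise normalization, the shift function g, and the sets S_i^22 and S_i^24.
import numpy as np

def g(a):
    # Shift away from zero so that ratios of the form x / g(a) are always defined.
    return np.where(a >= 0, a + 1.0, a - 1.0)

def normalize_rows(M):
    # Divide each row by the maximum absolute value of its entries.
    return M / np.abs(M).max(axis=1, keepdims=True)

def s22_s24(C, A, i):
    Cbar, Abar = normalize_rows(C.astype(float)), normalize_rows(A.astype(float))
    n = C.shape[1]
    s22 = Cbar[0] / g(Cbar[i])                               # n element-wise ratios
    s24 = (Abar * (Cbar[0] / (n * g(Cbar[i])))).sum(axis=1)  # one value per constraint row
    return s22, s24

rng = np.random.default_rng(4)
s22, s24 = s22_s24(rng.integers(-5, 6, size=(3, 8)), rng.integers(1, 11, size=(4, 8)), i=1)
print(len(s22), len(s24))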

For the same reason, for each \(i\in \{1,\dots ,p\}\) and \(l\in \{1,\dots ,p\}\backslash \{i\}\), we introduce:

$$\begin{aligned} F_{il}^{28}=\left\{ \text{ Avg }(S^{28}_{il}),\text{ Min }(S^{28}_{il}),\text{ Max }(S^{28}_{il}),\text{ Std }(S^{28}_{il}),\text{ Median }(S^{28}_{il})\right\} , \end{aligned}$$

where

$$\begin{aligned} S_{il}^{28}=\left\{ \frac{\bar{c}_{i1}}{g(\bar{c}_{l1})},\dots , \frac{\bar{c}_{in}}{g(\bar{c}_{ln})}\right\} . \end{aligned}$$

Finally, let \(\bar{A}_j=\{\bar{a}_{1j},\dots ,\bar{a}_{mj}\}\) for each \(j\in \{1,\dots ,n\}\). For each \(i\in \{2,\dots ,p\}\), the following subsets of features are also defined for linking the constraints and objective functions:

$$\begin{aligned} F_i^{29}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}_{ij})},\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}_{ij})},\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}_{ij})}, \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}_{ij})},\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}_{ij})}\right\} ,\\ F_i^{30}&=\left\{ \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Avg }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Min }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Max }(\bar{b}_j),\right. \\&\quad \left. \sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Std }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}_{ij})}-\text{ Median }(\bar{b}_j)\right\} ,\\ F_i^{31}&=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\right. \\&\quad \left. \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}^2_{ij})},\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}\right\} ,\\ F_i^{32}&=\left\{ \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Avg }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Avg }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Min }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Min }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Max }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Max }(\bar{b}_j),\right. \\&\quad \left. \sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Std }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Std }(\bar{b}_j),\sum _{j=1}^n \frac{\bar{c}^2_{1j}\text{ Median }(\bar{A}_j)}{ng(\bar{c}^2_{ij})}-\text{ Median }(\bar{b}_j)\right\} , \end{aligned}$$

where \(\bar{c}^2_{ij}=\bar{c}_{ij}\bar{c}_{ij}\) for each \(i\in \{1,\dots ,p\}\) and \(j\in \{1\dots ,n\}\).

Appendix C: Experiments on tri-objective random binary programming instances

In this section, we generate 1000 tri-objective random binary programming instances and replicate the experiment conducted in Sect. 6.1. The instances are divided into 5 subclasses, each with 200 instances. In each subclass, the number of decision variables n and the number of constraints m are equal, i.e., \(n=m\). Specifically, we assume that \(n\in \{50,55,60,65,70\}\). Each instance has the following structure:

$$\begin{aligned} \max _{\varvec{x} \in \mathcal {X}} \ \{z_1(\varvec{x}),z_2(\varvec{x}),z_3(\varvec{x})\}, \end{aligned}$$

where \(\mathcal {X}:=\big \{\varvec{x}\in \mathbb {B}^{n}:A\varvec{x}\le \varvec{b}\big \}\) represents the feasible set in the decision space, and \(z_i(\varvec{x})=\varvec{c}^\intercal _{i}\varvec{x}\) for \(i=1,2,3\). We randomly generate the parameters such that \(\varvec{c}_{i}\in \{1,2,\dots ,20\}^n\), \(A\in \{1,2,\dots ,10\}^{n\times n}\) with a sparsity of 0.75, and \(b_j=\Big \lceil {\frac{\sum _{i=1}^{n}a_{ij}}{b'}}\Big \rceil \) for \(j=1,2,\dots ,n\), where \(a_{ij}\) is an element of A and \(b'\) is drawn randomly from the interval (1, 3).
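
One possible implementation of this generation scheme is sketched below; the exact random streams, the interpretation of the 0.75 sparsity (here taken as roughly 75% zero entries), and the orientation of the sum defining \(b_j\) reflect our reading of the description above rather than the authors' code.

# Sketch: generate one tri-objective random binary instance with n variables and n constraints.
import numpy as np

def generate_instance(n, p=3, sparsity=0.75, seed=0):
    rng = np.random.default_rng(seed)
    C = rng.integers(1, 21, size=(p, n))                 # objective coefficients in {1,...,20}
    A = rng.integers(1, 11, size=(n, n)).astype(float)   # constraint coefficients in {1,...,10}
    A[rng.random(size=(n, n)) < sparsity] = 0.0          # make roughly 75% of the entries zero
    b_prime = rng.uniform(1, 3)                          # b' drawn from the interval (1, 3)
    b = np.ceil(A.sum(axis=0) / b_prime)                 # b_j = ceil(sum_i a_ij / b')
    return C, A, b

C, A, b = generate_instance(n=50)
print(C.shape, A.shape, b.shape)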

Table 6 Summary results for experiments on random binary instances with \(p=3\)

We observe from Table 6 that the accuracy of our ML framework is 58.0% (without considering tie cases). The time columns report the average time that the KSA takes to solve the problem when projecting based on the best objective, a random objective, and the objective proposed by the ML approach. Our approach yields a time improvement of around 4%-8% on average compared to a random projection, and it captures between 45% and 69% of the maximum possible time decrease. Note that, for instances with 65 and 70 variables, the time decrease achieved by our approach is between 1000 and 3000 s on average.

Appendix D: Experiments on instances with four objectives

In our final experiment, we replicate the experiment performed in Sect. 6.1 over a set of 1000 random binary instances with four objectives. The instances are created using the procedure described in the previous experiment, i.e., Appendix C. Again, there are five subclasses, each with 200 instances, which are defined based on the number of variables, i.e., \(n\in \{40,45,50,55,60\}\). Based on the formula presented in Sect. 4.1, the number of features for this experiment is 484. However, our proposed feature subset selection approach selects 72 of these features.

Table 7 Summary results for experiments on random binary instances with \(p=4\)

Note that for \(p=4\), the average percentage of success when selecting a random objective is 25%. Observe from Table 7 that the improvement in accuracy achieved by our approach with respect to the random case is between 60.0% and 78.4% (without considering tie cases). Finally, our ML framework captures between 40.9% and 53.5% of the average maximum time decrease with respect to a random selection of the objective to project. This means that our approach reduces the computational time for the largest instances by between 1 and 2 h on average.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sierra-Altamiranda, A., Charkhgard, H., Dayarian, I. et al. Learning to project in a criterion space search algorithm: an application to multi-objective binary linear programming. Optim Lett (2024). https://doi.org/10.1007/s11590-024-02100-5
