Skip to main content
Log in

The cluster problem in constrained global optimization

  • Published:
Journal of Global Optimization Aims and scope Submit manuscript

Abstract

Deterministic branch-and-bound algorithms for continuous global optimization often visit a large number of boxes in the neighborhood of a global minimizer, resulting in the so-called cluster problem (Du and Kearfott in J Glob Optim 5(3):253–265, 1994). This article extends previous analyses of the cluster problem in unconstrained global optimization (Du and Kearfott 1994; Wechsung et al. in J Glob Optim 58(3):429–438, 2014) to the constrained setting based on a recently-developed notion of convergence order for convex relaxation-based lower bounding schemes. It is shown that clustering can occur both on nearly-optimal and nearly-feasible regions in the vicinity of a global minimizer. In contrast to the case of unconstrained optimization, where at least second-order convergent schemes of relaxations are required to mitigate the cluster problem when the minimizer sits at a point of differentiability of the objective function, it is shown that first-order convergent lower bounding schemes for constrained problems may mitigate the cluster problem under certain conditions. Additionally, conditions under which second-order convergent lower bounding schemes are sufficient to mitigate the cluster problem around a global minimizer are developed. Conditions on the convergence order prefactor that are sufficient to altogether eliminate the cluster problem are also provided. This analysis reduces to previous analyses of the cluster problem for unconstrained optimization under suitable assumptions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Adjiman, C.S., Floudas, C.A.: Rigorous convex underestimators for general twice-differentiable problems. J. Glob. Optim. 9(1), 23–40 (1996)

    Article  MATH  MathSciNet  Google Scholar 

  2. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, Hoboken (2013)

    MATH  Google Scholar 

  3. Bompadre, A., Mitsos, A.: Convergence rate of McCormick relaxations. J. Glob. Optim. 52(1), 1–28 (2012)

    Article  MATH  MathSciNet  Google Scholar 

  4. Bompadre, A., Mitsos, A., Chachuat, B.: Convergence analysis of Taylor models and McCormick–Taylor models. J. Glob. Optim. 57(1), 75–114 (2013)

    Article  MATH  MathSciNet  Google Scholar 

  5. Bonnans, J.F., Ioffe, A.: Second-order sufficiency and quadratic growth for nonisolated minima. Math. Oper. Res. 20(4), 801–817 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  6. Clarke, F.H.: Optimization and Nonsmooth Analysis, Classics in Applied Mathematics, vol. 5. SIAM, Philadelphia (1990)

    Book  Google Scholar 

  7. Du, K., Kearfott, R.B.: The cluster problem in multivariate global optimization. J. Glob. Optim. 5(3), 253–265 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  8. Floudas, C.A., Pardalos, P.M., Adjiman, C., Esposito, W.R., Gümüs, Z.H., Harding, S.T., Klepeis, J.L., Meyer, C.A., Schweiger, C.A.: Handbook of test problems in local and global optimization. In: Nonconvex Optimization and Its Applications, 1st edn, vol. 33. Springer, Berlin (1999)

  9. Goldsztejn, A., Domes, F., Chevalier, B.: First order rejection tests for multiple-objective optimization. J. Glob. Optim. 58(4), 653–672 (2014)

    Article  MATH  MathSciNet  Google Scholar 

  10. Hijazi, H., Liberti, L.: Constraint qualification failure in action. Oper. Res. Lett. 44(4), 503–506 (2016)

    Article  MathSciNet  Google Scholar 

  11. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches, 3rd edn. Springer, Berlin (1996)

    Book  MATH  Google Scholar 

  12. Ioffe, A.: On sensitivity analysis of nonlinear programs in Banach spaces: the approach via composite unconstrained optimization. SIAM J. Optim. 4(1), 1–43 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  13. Kearfott, R.B., Du, K.: The cluster problem in global optimization: the univariate case. In: Validation Numerics, Computing Supplementum, vol. 9, pp. 117–127. Springer, Berlin (1993)

  14. Khan, K.A., Watson, H.A.J., Barton, P.I.: Differentiable McCormick relaxations. J. Glob. Optim. 67(4), 687–729 (2017)

  15. Krawczyk, R., Nickel, K.: Die zentrische form in der Intervallarithmetik, ihre quadratische Konvergenz und ihre Inklusionsisotonie. Computing 28(2), 117–137 (1982)

    Article  MATH  MathSciNet  Google Scholar 

  16. Mayer, G.: Epsilon-inflation in verification algorithms. J. Comput. Appl. Math. 60(1), 147–169 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  17. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I—convex underestimating problems. Math. Program. 10(1), 147–175 (1976)

    Article  MATH  Google Scholar 

  18. Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis. Society for Industrial and Applied Mathematics, Philadelphia (2009)

    Book  MATH  Google Scholar 

  19. Najman, J., Bongartz, D., Tsoukalas, A., Mitsos, A.: Erratum to: multivariate McCormick relaxations. J. Glob. Optim. 68(1), 219–225 (2017)

    Article  MATH  Google Scholar 

  20. Najman, J., Mitsos, A.: Convergence analysis of multivariate McCormick relaxations. J. Glob. Optim. 66(4), 597–628 (2016)

    Article  MATH  MathSciNet  Google Scholar 

  21. Neumaier, A.: Complete search in continuous global optimization and constraint satisfaction. Acta Numer. 13, 271–369 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  22. Rudin, W.: Principles of Mathematical Analysis, 3rd edn. McGraw-Hill, New York (1976)

    MATH  Google Scholar 

  23. Schöbel, A., Scholz, D.: The theoretical and empirical rate of convergence for geometric branch-and-bound methods. J. Glob. Optim. 48(3), 473–495 (2010)

    Article  MATH  MathSciNet  Google Scholar 

  24. Scholtes, S.: Introduction to Piecewise Differentiable Equations. SpringerBriefs in Optimization, 1st edn. Springer, New York (2012)

    Book  MATH  Google Scholar 

  25. Scholz, D.: Theoretical rate of convergence for interval inclusion functions. J. Glob. Optim. 53(4), 749–767 (2012)

    Article  MATH  MathSciNet  Google Scholar 

  26. Tsoukalas, A., Mitsos, A.: Multivariate McCormick relaxations. J. Glob. Optim. 59(2–3), 633–662 (2014)

    Article  MATH  MathSciNet  Google Scholar 

  27. Van Iwaarden, R.J.: An improved unconstrained global optimization algorithm. Ph.D. thesis, University of Colorado at Denver (1996)

  28. Wechsung, A.: Global optimization in reduced space. Ph.D. thesis, Massachusetts Institute of Technology (2014)

  29. Wechsung, A., Schaber, S.D., Barton, P.I.: The cluster problem revisited. J. Glob. Optim. 58(3), 429–438 (2014)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Acknowledgements

The authors would like to thank three anonymous reviewers and an associate editor for suggestions which helped improve the readability of this article, and Dr. Johannes Jäschke for helpful discussions (in particular, for bring references [5] and [12] to our attention). The first author would also like to thank Jaichander Swaminathan for helpful discussions regarding the proof of Theorem 2.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Paul I. Barton.

Additional information

The authors gratefully acknowledge financial support from BP. This work was conducted as a part of the BP-MIT conversion research program.

Appendix: Proofs

Appendix: Proofs

1.1 Proof of Lemma 4

Lemma 4 Consider Problem (P). Suppose \(\mathbf {x}^*\) is nonisolated and f is differentiable at \(\mathbf {x}^*\). Then \(\forall \theta > 0\), \(\exists \alpha > 0\) such that

$$\begin{aligned} \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t> 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}> \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} }{\min }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}- \theta . \end{aligned}$$

Proof

We proceed by contradiction. Define

$$\begin{aligned} L(\alpha )&:= \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t > 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}, \\ L^*&:= \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} }{\min }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}, \end{aligned}$$

and note that \(L(\alpha )\) is monotonically nonincreasing on \((0,+\infty )\). Suppose \(\exists \theta > 0\) such that \(\forall \alpha > 0\), we have \(L(\alpha ) \le L^* -\theta \). Consider a sequence \(\{\alpha _k\} \rightarrow 0\) with \(\alpha _k > 0\), and a corresponding sequence \(\{\mathbf {d}_k\}\) such that

$$\begin{aligned} \mathbf {d}_k \in \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t_k > 0 \, : \, (\mathbf {x}^*+ t_k \mathbf {d}) \in \mathscr {N}^1_{\alpha _k}(\mathbf {x}^*) \cap \mathscr {F}(X), {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\le L^* -\frac{\theta }{2} \right\} . \end{aligned}$$

The existence of \(\mathbf {d}_k\) follows from the assumption that \(L(\alpha ) \le L^* -\theta \), \(\forall \alpha > 0\). Since \({||}\mathbf {d}_k{||}_1 = 1\), \(\forall k\), we have the existence of \(\mathbf {d}^* \in \mathbb {R}^{n_x}\) with \(\mathbf {d}^* = \underset{k_q \rightarrow \infty }{\lim } \mathbf {d}_{k_q}\) and \({||}\mathbf {d}^*{||}_1 = 1\) for some convergent subsequence \(\{\mathbf {d}_{k_q}\}\). Furthermore, \(\mathbf {d}^* \in T(\mathbf {x}^*)\) and \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}^* \le L^* -\displaystyle \frac{\theta }{2}\), since \(\forall k_q\) we have \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}_{k_q} \le L^* -\displaystyle \frac{\theta }{2}\), which contradicts the definition of \(L^*\). \(\square \)

1.2 Proof of Proposition 2

Proposition 2 Consider Problem (P) with \(m_E \ge 1\). Suppose \(\mathbf {x}^*\) is nonisolated, f is differentiable at \(\mathbf {x}^*\), functions \(h_k\), \(k = 1,\ldots ,m_E\), are differentiable at \(\mathbf {x}^*\), and \(\mathscr {A}(\mathbf {x}^*) = \emptyset \). Furthermore, suppose there exist multipliers \(\varvec{\lambda }^* \in \mathbb {R}^{m_E}\) corresponding to the equality constraints such that \((\mathbf {x}^*,\mathbf {0},\varvec{\lambda }^*)\) is a KKT point. Then

$$\begin{aligned} \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} }{\min }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0. \end{aligned}$$

Proof

Since \((\mathbf {x}^*,\mathbf {0},\varvec{\lambda }^*)\) is a KKT point, we have

$$\begin{aligned} \nabla f(\mathbf {x}^*) + \displaystyle \sum _{k=1}^{m_E}{\lambda ^*_k \nabla h_k(\mathbf {x}^*)} = \mathbf {0}. \end{aligned}$$

From the assumption that \(\mathbf {x}^*\) is a nonisolated feasible point, we have that the set \(\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} \) is nonempty. Additionally, we have

$$\begin{aligned} T(\mathbf {x}^*) \subset \mathscr {L}(\mathbf {x}^*) := \left\{ \mathbf {d}\in \mathbb {R}^{n_x} : {\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0, \forall k \in \{1,\ldots ,m_E\} \right\} , \end{aligned}$$

where \(\mathscr {L}(\mathbf {x}^*)\) denotes the linearized cone at \(\mathbf {x}^*\) (see, for instance, [2]). Consequently, for each \(\mathbf {d}\in T(\mathbf {x}^*)\) with \({||}\mathbf {d}{||}_1 = 1\), we have \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0\). \(\square \)

1.3 Proof of Theorem 2

Theorem 2 Suppose the assumptions of Lemma 5 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\), \(r = \displaystyle \frac{2\varepsilon }{L}\).

  1. 1.

    If \(\delta \ge 2r\), let \(N = 1\).

  2. 2.

    If \(\displaystyle \frac{2r}{m-1} > \delta \ge \displaystyle \frac{2r}{m}\) for some \(m \in \mathbb {N}\) with \(m \le n_x\) and \(2 \le m \le 5\), then let

    $$\begin{aligned} N = \displaystyle \sum _{i=0}^{m-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{m-3}{3}{\rceil }. \end{aligned}$$
  3. 3.

    Otherwise, let

    $$\begin{aligned} N= & {} {\lceil }2 \left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil }^{n_x-1} \left( {\lceil }2 \left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil } + 2n_x {\lceil }\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil } \right) . \end{aligned}$$

Then, N is an upper bound on the number of boxes of width \(\delta \) required to cover \(\hat{X}_{5}\).

Proof

This proof is rederived based on Corollary 2.1 in [28] and the proof of Lemma 3 in [29]. Note that the condition in the second case is corrected to ‘\({2 \le m \le 5}\)’ as opposed to ‘\({2 \le m \le 6}\)’ in [28].

From Lemma 5, we have \(\hat{X}_{5} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} \subset \Big \{\mathbf {x}:{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le \displaystyle \frac{2\varepsilon }{L}\Big \} {=}{:} \tilde{B}\). Therefore, an upper bound on the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) can be obtained by conservatively estimating the number of boxes of width \(\delta \) required to cover \(\tilde{B}\). In what follows, we will assume without loss of generality that \(\mathbf {x}^* = \mathbf {0}\).

  1. 1.

    Suppose \(\delta \ge 2r\). Consider the box \(B_{\delta }\) of width \(\delta \) centered at \(\mathbf {x}^* = \mathbf {0}\). We have

    $$\begin{aligned} \mathbf {x}\in \tilde{B} \implies {||}\mathbf {x}{||}_1 \le \displaystyle \frac{2\varepsilon }{L} \implies {||}\mathbf {x}{||}_{\infty } \le \displaystyle \frac{2\varepsilon }{L} = r \le \displaystyle \frac{\delta }{2} \implies \mathbf {x}\in B_{\delta }, \end{aligned}$$

    where we have used the fact that \({||}\mathbf {x}{||}_{\infty } \le {||}\mathbf {x}{||}_1\), \(\forall \mathbf {x}\in \mathbb {R}^{n_x}\). Therefore, \(B_{\delta }\) is sufficient to cover \(\tilde{B}\).

  2. 2.

    Suppose \(m \le n_x\) with \(m \in \{2,\ldots ,5\}\) and \(\delta \ge \frac{2r}{m}\). Place a box \(B_{\delta }\) of width \(\delta \) centered at \(\mathbf {x}^* = \mathbf {0}\) (the condition on \(\delta \) ensures that \(B_{\delta }\) intersects the boundary of \(\tilde{B}\)). Let

    $$\begin{aligned} E_i := \left\{ \mathbf {e}\in \mathbb {R}^{n_x} : e_j \in \left\{ -\displaystyle \frac{\delta }{2}, 0, \displaystyle \frac{\delta }{2}\right\} , \, \forall j \in \{1,\ldots ,n_x\}, \,\, \displaystyle \sum _{j=1}^{n_x}{I_0(e_j)} = i \right\} , \end{aligned}$$

    where \(I_0:\mathbb {R}\rightarrow \{0,1\}\) is defined as \(I_0(x) := {\left\{ \begin{array}{ll} 0, &{}\text{ if } x = 0 \\ 1, &{} \text{ otherwise } \end{array}\right. }\), denote the set of midpoints of the \((n_x-i)\)-dimensional faces of \(B_{\delta }\) (each element of \(E_i\) has exactly i nonzero components, each of which is \(\pm \frac{\delta }{2}\)). Note that \({|}*{|}{E_i} = 2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) \), \(\forall i \in \{1,\ldots ,n_x\}\). Under the assumption \(\delta \ge \frac{2r}{m}\), we will show that, in addition to \(B_{\delta }\), it is sufficient to place one box beside \(B_{\delta }\) along the directions in \(E_1, \ldots , E_{m-1}\) when \(m = 2\) or \(m = 3\), and two boxes beside \(B_{\delta }\) along the directions in \(E_1\) and one box beside \(B_{\delta }\) along the directions in \(E_2, \ldots , E_{m-1}\) when \(m = 4\) or \(m = 5\) in order to cover \(\tilde{B}\). First, we show that we need not place any boxes beside \(B_{\delta }\) along the directions in \(E_{m}, \ldots , E_{n_x}\). Let \(\mathbf {e}\in E_i\) with \( i \in \{m,\ldots ,n_x\}\). We have \({||}\mathbf {e}{||}_1 = \frac{\delta }{2}i \ge \frac{i}{m}r \ge r\), which implies \(\mathbf {e}\in \partial \tilde{B} \cup \tilde{B}^{\text {C}}\) (where \(\partial \tilde{B}\) denotes the boundary of \(\tilde{B}\)). Consequently, boxes placed beside \(B_{\delta }\) along the directions in \(E_{m}, \ldots , E_{n_x}\) do not intersect the interior of \(\tilde{B}\) and are not required to cover \(\tilde{B}\). Suppose \(\delta \ge \frac{2r}{m}\), and let \(\mathbf {e}\in E_i\) for some \(i \in \{1,\ldots ,m-1\}\). The distance from \(\mathbf {e}\), which is the midpoint of an \((n-i)\)-dimensional face of \(B_{\delta }\), to \(\frac{2r}{\delta i}\mathbf {e}\), which is a point on the boundary of \(\tilde{B}\) in the direction \(\mathbf {e}\), in the \(\infty \)-norm is \(\frac{r}{i} - \frac{\delta }{2} \le \frac{r}{i} - \frac{r}{m}\). If this distance is less than \(\delta \) for each \(i \in \{1,\ldots ,m-1\}\), then one box beside \(B_{\delta }\) along the directions in \(E_1, \ldots , E_{m-1}\) is sufficient to cover \(\tilde{B}\). This amounts to requiring

    $$\begin{aligned}&\qquad \quad \,\,\displaystyle \frac{r}{i} - \displaystyle \frac{r}{m} \le \displaystyle \frac{2r}{m}, \, \forall i \in \{1,\ldots , m-1\} \iff m \le 3i, \, \forall i \in \{1,\ldots , m-1\}\\&\iff m = 2 \,\, \text {or} \,\, m = 3. \end{aligned}$$

    Note that if \(m = 4\) or \(m = 5\), we still have \(m \le 3i\), \(\forall i \in \{2,\ldots , m-1\}\). Additionally, \(\frac{r}{1} - \frac{r}{m} \le \frac{4r}{m} \le 2\delta \) in such cases. Therefore, when \(m = 4\) or \(m = 5\), two boxes along the directions in \(E_1\) and one box along the directions in \(E_2, \ldots , E_{m-1}\) are sufficient to cover \(\tilde{B}\).

  3. 3.

    If the previous assumptions on \(\delta \) are not satisfied, a box of width \(\delta \) centered at \(\mathbf {x}^*\) may not intersect \(\partial \tilde{B}\). To estimate the number of boxes of width \(\delta \) required to cover \(\tilde{B}\), we first estimate the number of boxes, \(N_r\), of width \(r = \frac{2\varepsilon }{L}\) required to cover \(\tilde{B}\) using the previous analysis, and then estimate the number of boxes of width \(\delta \) required to cover the intersection of these \(N_r\) boxes with \(\tilde{B}\).

    The number of boxes of width r required to cover \(\tilde{B}\) is \(N_r := 1 + 2n_x\), where ‘1’ corresponds to the box centered at \(\mathbf {x}^* = \mathbf {0}\), and ‘\({2n_x}\)’ corresponds to the boxes along the directions in \(E_1\). Note that \(E_1\) is now defined as

    $$\begin{aligned} E_1 := \left\{ \mathbf {e}\in \mathbb {R}^{n_x} : e_j \in \left\{ -\displaystyle \frac{r}{2}, 0, \displaystyle \frac{r}{2}\right\} , \, \forall j \in \{1,\ldots ,n_x\}, \, \, \displaystyle \sum _{j=1}^{n_x}{I_0(e_j)} = 1 \right\} \end{aligned}$$

    since \(\tilde{B}\) is first covered using boxes of width r. The box of width r centered at \(\mathbf {x}^*\) can be covered using \({\lceil }\frac{r}{\delta }{\rceil }^{n_x}\) boxes of width \(\delta \). Note that the entire volume of the \(2n_x\) boxes along the directions in \(E_1\) need not be covered using boxes of width \(\delta \) since parts of those boxes have no intersection with \(\tilde{B}\). To estimate the extent to which each of the \(2n_x\) boxes need to be covered with boxes of width \(\delta \), we compute the distance between any \(\mathbf {e}\in E_1\) (which is a midpoint of a one-dimensional face of the box of width r centered at \(\mathbf {x}^*\)) and \(\frac{2r}{r \times 1}\mathbf {e}= 2\mathbf {e}\) (which is a point on the boundary of \(\tilde{B}\) in the direction \(\mathbf {e}\)) in the \(\infty \)-norm. This distance turns out to be equal to \(\frac{r}{2}\). This implies at most half the volumes of the \(2n_x\) boxes need to be covered using boxes of width \(\delta \), which yields the estimate of \(2n_x {\lceil }\frac{r}{\delta }{\rceil }^{n_x - 1} {\lceil }\frac{r}{2\delta }{\rceil }\) boxes of width \(\delta \) that are required to cover the \(2n_x\) boxes of width r along the directions in \(E_1\). \(\square \)

1.4 Proof of Lemma 7

Lemma 7 Consider Problem (P). Suppose \(\mathbf {x}^*\) is nonisolated, f is locally Lipschitz continuous on X and directionally differentiable at \(\mathbf {x}^*\), and \(\exists \alpha > 0\) such that

$$\begin{aligned} L := \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t> 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf } f'(\mathbf {x}^*;\mathbf {d}) > 0. \end{aligned}$$

Then, \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_{5} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} . \end{aligned}$$

Proof

Let \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\) with \({||}\mathbf {d}{||}_1 = 1\) and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 > 0\). We have (see Theorem 3.1.2 in [24])

$$\begin{aligned} f(\mathbf {x})&= f(\mathbf {x}^* + t\mathbf {d}) \\&= f(\mathbf {x}^*) + f'(\mathbf {x}^*;(\mathbf {x}- \mathbf {x}^*)) + o({||}\mathbf {x}- \mathbf {x}^*{||}_1) \\&= f(\mathbf {x}^*) + tf'(\mathbf {x}^*;\mathbf {d}) + o(t) \\&\ge f(\mathbf {x}^*) + Lt + o(t), \end{aligned}$$

where Step 2 follows from the directional differentiability of f at \(\mathbf {x}^*\). Consequently, there exists \(\hat{\alpha } \in (0,\alpha ]\) such that for all \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {F}(X)\) with \({||}\mathbf {d}{||}_1 = 1\) and \(t \in [0,\hat{\alpha })\):

$$\begin{aligned} f(\mathbf {x}) \ge f(\mathbf {x}^*) + Lt + o(t) \ge f(\mathbf {x}^*) + \displaystyle \frac{L}{2} t. \end{aligned}$$

Therefore, \(\forall \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) we have \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {F}(X)\) with \({||}\mathbf {d}{||}_1 = 1\) and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 < \hat{\alpha }\), and

$$\begin{aligned} \varepsilon \ge f(\mathbf {x}) - f(\mathbf {x}^*) \ge \displaystyle \frac{L}{2} t \implies Lt = L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon . \end{aligned}$$

\(\square \)

1.5 Proof of Lemma 10

Lemma 10 Consider Problem (P), and suppose the assumptions of Lemma 8 hold. Then \(\exists \hat{\alpha } \in (0,\alpha ]\) and constants \(\rho _1, \rho _2 \ge 0\) such that the region \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_5 = \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon , \, -\rho _1\varepsilon \le {\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \le \rho _2\varepsilon \right\} . \end{aligned}$$

Proof

The result trivially follows from Lemma 8 when \(\nabla f(\mathbf {x}^*) = \mathbf {0}\).

Suppose \(\nabla f(\mathbf {x}^*) \ne \mathbf {0}\). From Lemma 8, we have

$$\begin{aligned}&\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5 \subset \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon \right\} . \end{aligned}$$
(2)

Suppose we represent each \(\mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \mathscr {F}(X)\) by \(\mathbf {x}:= \mathbf {x}^* + \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\), where \(\beta _1, \beta _2 \in \mathbb {R}\) and \(\mathbf {d}\perp \nabla f(\mathbf {x}^*)\) with \({||}\mathbf {d}{||} = 1\). Consider the case when \(\beta _1 \ge 0\). We have

$$\begin{aligned} f(\mathbf {x}) - f(\mathbf {x}^*)= & {} {\nabla f(\mathbf {x}^*)}^\mathrm{T} (\mathbf {x}- \mathbf {x}^*) + \displaystyle \frac{1}{2}{(\mathbf {x}- \mathbf {x}^*)}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) (\mathbf {x}- \mathbf {x}^*) + o\left( {||}\mathbf {x}- \mathbf {x}^*{||}^2\right) \\= & {} {\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) \\&+ \displaystyle \frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) + o\left( \beta ^2_1 + \beta ^2_2\right) \\= & {} \beta _1 {||}\nabla f(\mathbf {x}^*){||}^2 \\&\displaystyle +\frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) + o\left( \beta ^2_1 + \beta ^2_2\right) . \end{aligned}$$

Therefore, \(\forall \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) with \(\mathbf {x}= \mathbf {x}^* + \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\), \(\beta _1 \ge 0, \beta _2 \in \mathbb {R}\) and \(\mathbf {d}\perp \nabla f(\mathbf {x}^*)\) with \({||}\mathbf {d}{||} = 1\), we have

$$\begin{aligned}&\beta _1 {||}\nabla f(\mathbf {x}^*){||}^2 + \displaystyle \frac{1}{2}{( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d})}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) ( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}) {+} o(\beta ^2_1 + \beta ^2_2) \le \varepsilon \nonumber \\ \implies&\beta _1 {||}\nabla f(\mathbf {x}^*){||}^2 \le \varepsilon - \displaystyle \frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) \nonumber \\&\qquad \qquad \qquad \qquad \quad - o\left( \beta ^2_1 + \beta ^2_2\right) \nonumber \\ \implies&\beta _1 {||}\nabla f(\mathbf {x}^*){||}^2 \le \varepsilon - \displaystyle \frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) \nonumber \\&\qquad \qquad \qquad \qquad \quad + \displaystyle \frac{\gamma }{2} \left( \beta ^2_1 {||}\nabla f(\mathbf {x}^*){||}^2 + \beta ^2_2 \right) , \end{aligned}$$
(3)

where the last step uses the fact that \(\hat{\alpha }\) is chosen such that \(o\left( \beta ^2_1 + \beta ^2_2\right) \ge -\displaystyle \frac{\gamma }{2} ( \beta ^2_1 {||}\nabla f(\mathbf {x}^*){||}^2 + \beta ^2_2 )\) (see Eq. 1). Note that \(\beta _1 \le \displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma {||}\nabla f(\mathbf {x}^*){||}^2}}\) and \({|}*{|}{\beta _2} \le \displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma }}\) follow from Eq. 2. The right hand side of Eq. 3 is \(O(\varepsilon )\) since \(\beta _1 = \beta _2 = O(\displaystyle \sqrt{\varepsilon })\), thereby establishing the existence of \(\rho _2 \ge 0\).

Next, suppose \(\beta _1 \le 0\). From the assumptions of Lemma 8, we have for each \(\mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) with \(\mathbf {x}= \mathbf {x}^* + \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\), \(\beta _1 \le 0, \beta _2 \in \mathbb {R}\) and \(\mathbf {d}\perp \nabla f(\mathbf {x}^*)\) with \({||}\mathbf {d}{||} = 1\):

$$\begin{aligned}&{\nabla f(\mathbf {x}^*)}^\mathrm{T} (\mathbf {x}- \mathbf {x}^*) + \displaystyle \frac{1}{2}{(\mathbf {x}- \mathbf {x}^*)}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) (\mathbf {x}- \mathbf {x}^*) \ge \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \nonumber \\ \implies&{\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) + \displaystyle \frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) \nonumber \\&\quad \ge \gamma \left( \beta ^2_1 {||}\nabla f(\mathbf {x}^*){||}^2 + \beta ^2_2 \right) \nonumber \\ \implies&\beta _1 {||}\nabla f(\mathbf {x}^*){||}^2 + \displaystyle \frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) \nonumber \\&\quad \ge \gamma \left( \beta ^2_1 {||}\nabla f(\mathbf {x}^*){||}^2 + \beta ^2_2 \right) \nonumber \\ \implies&\displaystyle \frac{1}{2}{\left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \beta _1 \nabla f(\mathbf {x}^*) + \beta _2 \mathbf {d}\right) - \gamma \left( \beta ^2_1 {||}\nabla f(\mathbf {x}^*){||}^2 + \beta ^2_2 \right) \nonumber \\&\quad \ge -\beta _1 {||}\nabla f(\mathbf {x}^*){||}^2, \end{aligned}$$
(4)

and \(\beta _1 \ge -\displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma {||}\nabla f(\mathbf {x}^*){||}^2}}\), \({|}*{|}{\beta _2} \le \displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma }}\) from Eq. 2. The left hand side of Eq. 4 is \(O(\varepsilon )\) since \(\beta _1 = \beta _2 = O(\displaystyle \sqrt{\varepsilon })\), thereby establishing the existence of \(\rho _1 \ge 0\). \(\square \)

1.6 Proof of Proposition 4

Proposition 4 Consider Problem (P) with \(m_E \ge 1\). Suppose \(\mathbf {x}^*\) is a nonisolated constrained minimizer, f is differentiable at \(\mathbf {x}^*\), functions \(h_k\), \(k = 1,\ldots ,m_E\), are differentiable at \(\mathbf {x}^*\), and \(\mathscr {A}(\mathbf {x}^*) = \emptyset \). Furthermore, suppose there exist multipliers \(\varvec{\lambda }^* \in \mathbb {R}^{m_E}\) corresponding to the equality constraints such that \((\mathbf {x}^*,\mathbf {0},\varvec{\lambda }^*)\) is a KKT point. Then \(\not \exists \alpha > 0\), \(\mathscr {D}_0\) such that the assumptions of Lemma 11 are satisfied.

Proof

Since \((\mathbf {x}^*,\mathbf {0},\varvec{\lambda }^*)\) is a KKT point, we have

$$\begin{aligned} \nabla f(\mathbf {x}^*) + \displaystyle \sum _{k=1}^{m_E}{\lambda ^*_k \nabla h_k(\mathbf {x}^*)} = \mathbf {0}. \end{aligned}$$

From the assumption that \(\mathbf {x}^*\) is a nonisolated feasible point, we have that the set \(\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} \) is nonempty. Additionally, we have from the proof of Proposition 2 that for each \(\mathbf {d}\in T(\mathbf {x}^*)\) with \({||}\mathbf {d}{||}_1 = 1\), \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0\) and \({\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0, \forall k \in \{1,\ldots ,m_E\}\).

Assume, by way of contradiction that \(\exists \alpha > 0\) and a set \(\mathscr {D}_0\) satisfying the assumptions of Lemma 11. Consequently, \(\exists L_f, L_I > 0\) such that

$$\begin{aligned} L_f = \underset{\mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I}{\inf } {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\end{aligned}$$

and

$$\begin{aligned} L_I = \underset{\mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0}{\inf } \,\, \underset{k \in \{1,\ldots ,m_E\}}{\max }\left|{\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\right|. \end{aligned}$$

Since \(\exists \mathbf {d}\in T(\mathbf {x}^*)\) with \({||}\mathbf {d}{||}_1 = 1\) such that \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0\) and \({\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0, \forall k \in \{1,\ldots ,m_E\}\), we have that the set

$$\begin{aligned} S := \left\{ \mathbf {d}\in \mathbb {R}^{n_x} : {||}\mathbf {d}{||}_1 = 1, \, {|}*{|}{{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}}< L_f, \, {|}*{|}{{\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}} < L_I, \forall k \in \{1,\ldots ,m_E\} \right\} \end{aligned}$$

is nonempty. All that remains to reach a contradiction is to show that \(\exists \bar{\mathbf {d}} \in S \cap \mathscr {D}_I\).

From the above arguments, we have the existence of \(\bar{\mathbf {d}} \in S\), \(\bar{k} \in \{1,\ldots ,m_E\}\) such that \({|}*{|}{{\nabla h_{\bar{k}}(\mathbf {x}^*)}^\mathrm{T} \bar{\mathbf {d}}} \in (0, L_I)\), since the assumption \(L_I > 0\) implies all of the equality constraint gradients \(\nabla h_k(\mathbf {x}^*)\), \(k \in \{1,\ldots ,m_E\}\), cannot simultaneously be \(\mathbf {0}\). Since \({\nabla h_{\bar{k}}(\mathbf {x}^*)}^\mathrm{T} \bar{\mathbf {d}} \ne 0\), we have \(\bar{\mathbf {d}} \not \in T(\mathbf {x}^*)\) (this follows from the arguments made in the proof of Proposition 2). Consequently, \(\exists t \in (0,\alpha )\) such that \((\mathbf {x}^*+t\bar{\mathbf {d}}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} \implies \bar{\mathbf {d}} \in \mathscr {D}_I\). This implies that either \(\bar{\mathbf {d}} \in \mathscr {D}_0\), or \(\bar{\mathbf {d}} \in \mathscr {D}_I \backslash \mathscr {D}_0\), which contradicts the definition of \(L_f\) or \(L_I\) since \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \bar{\mathbf {d}} < L_f\) and \({|}*{|}{{\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \bar{\mathbf {d}}} < L_I, \forall k \in \{1,\ldots ,m_E\}\). \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kannan, R., Barton, P.I. The cluster problem in constrained global optimization. J Glob Optim 69, 629–676 (2017). https://doi.org/10.1007/s10898-017-0531-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10898-017-0531-z

Keywords

Mathematics Subject Classification

Navigation