The paper by Izmailov and Solodov provides an excellent overview of recent research on a phenomenon of slow convergence of optimization algorithms. About 10 years ago, this phenomenon was still unknown. For me, one of the most striking outcomes of this research is that the phenomenon may cause serious problems for software when it is applied to certain optimization problems. Although the occurrence of the phenomenon also depends on the point from which an algorithm starts, the mere existence of a critical multiplier frequently forces an algorithm to show this slow convergence. So far, this behavior is not fully understood, and remedies are still being sought. The invited paper is well suited to get researchers interested in the topic.

In the remainder of the discussion, I would first like to make a short excursion to a research problem that started to attract attention in the second half of the nineties. Then, I will raise some questions related to both the excursion and the phenomenon of slow convergence.

One root of the invited paper’s topic goes back about 20 years. At that time, researchers thought about how to deal algorithmically with optimization problems whose set of Lagrange multipliers associated to a stationary point contains more than one element (degenerate case). In more detail, let us consider the optimization problem

$$\begin{aligned} \text{ minimize }\quad f(x)\quad \text{ subject } \text{ to }\quad g(x)\le 0,\;h(x)=0, \end{aligned}$$
(1)

where \(f:\mathbb {R}^n\rightarrow \mathbb {R}\), \(g:\mathbb {R}^n\rightarrow \mathbb {R}^m\), and \(h:\mathbb {R}^n\rightarrow \mathbb {R}^l\) are sufficiently smooth functions. For the sake of simplicity, Izmailov and Solodov deal with equality constraints only. Here, inequality constraints are involved as well. A point \(\bar{x}\in \mathbb {R}^n\) is called stationary for problem (1) if \(\bar{x}\), together with some multipliers \(\mu \in \mathbb {R}^m\) and \(\lambda \in \mathbb {R}^l\), satisfies the Karush–Kuhn–Tucker (KKT) system

$$\begin{aligned} \frac{\textstyle \partial }{\textstyle \partial x}\bigg (f(x)+\langle \mu ,g(x)\rangle +\langle \lambda ,h(x)\rangle \bigg )&= 0, \quad h(x)=0,\nonumber \\ \mu \ge 0,\quad g(x)\le 0,\quad \langle \mu ,g(x)\rangle&= 0. \end{aligned}$$
(2)

The set of Lagrange multipliers associated to \(\bar{x}\) is denoted by \(\mathcal M(\bar{x})\). Our notation is close to that of Izmailov and Solodov. Moreover, references from their paper will appear as [IS #].
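To make the degenerate case concrete, the following minimal sketch evaluates a residual of the KKT system (2) for a toy instance of problem (1). The instance (minimize \(x\) subject to \(-x\le 0\) and \(-2x\le 0\)), the function names, and the use of the componentwise minimum to encode the complementarity conditions are assumptions made for illustration only; they are not taken from the invited paper.

```python
import math

# Hypothetical toy instance of problem (1): minimize f(x) = x
# subject to g1(x) = -x <= 0 and g2(x) = -2x <= 0 (no equality constraints).

def grad_lagrangian(x, mu):
    # d/dx [ f(x) + mu1*g1(x) + mu2*g2(x) ] = 1 - mu1 - 2*mu2
    return 1.0 - mu[0] - 2.0 * mu[1]

def kkt_residual(x, mu):
    # Euclidean norm of the violation of the KKT system (2).
    g = (-x, -2.0 * x)
    # min(mu_i, -g_i(x)) = 0 encodes mu_i >= 0, g_i(x) <= 0, mu_i * g_i(x) = 0.
    comp = [min(m, -gi) for m, gi in zip(mu, g)]
    return math.sqrt(grad_lagrangian(x, mu) ** 2 + sum(c * c for c in comp))

# Three different multiplier vectors for the same stationary point x = 0.
multipliers = [(1.0, 0.0), (0.5, 0.25), (0.0, 0.5)]
residuals = [kkt_residual(0.0, mu) for mu in multipliers]
```

In this sketch, every \(\mu \ge 0\) with \(\mu _1+2\mu _2=1\) yields a zero residual at \(\bar{x}=0\), so \(\mathcal M(\bar{x})\) is a whole line segment rather than a single point.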

The construction of locally superlinearly convergent methods for problems where the set \(\mathcal M(\bar{x})\) of Lagrange multipliers has infinitely many (nonisolated) elements was not an easy task in general. Note that classical conditions for superlinear convergence of methods for problem (1), like SQP-type methods in Robinson (1974) and Wilson (1963), or algorithms based on the reformulation of the KKT system (2) as a nonsmooth system of equations (Facchinei et al. 1998a; Qi and Jiang 1997), imply the local uniqueness of a KKT point. A review of some early works on the topic of dealing with nonunique multipliers can be found in Fischer (1999). For further references we just refer the reader to the invited paper by Izmailov and Solodov. Here, we would like to mention only two approaches that, in a wider sense, might be useful to deal with inequality constraints in the degenerate case. The first one (Ralph and Wright 1997, 2000) is based on an interior point technique for solving monotone variational inequalities and allows local superlinear convergence under conditions that do not imply the uniqueness of the multiplier. A related result for not necessarily convex optimization problems is given in Vicente and Wright (2002). The second approach is about identifying active inequality constraints at a stationary point \(\bar{x}\) if problem (1) is degenerate. Such techniques may help to locally replace inequality constraints by equations. They were developed, applied, and extended in several contexts, for example see Dan et al. (2002), De Leone and Lazzari (2010), Facchinei et al. (1998b), Izmailov and Solodov (2008), Oberlin and Wright (2006) and Wright (2003). A basic ingredient of many identification techniques is a computable local error bound that holds in some neighborhood of the set of KKT points or, similarly, in the neighborhood of a certain particular KKT point. 
In the latter case, such an error bound is a function \(\delta :\mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^l\rightarrow [0,\infty )\) which, for some KKT point \((\bar{x},\bar{\mu },\bar{\lambda })\) and some \(\gamma \in (0,1]\), \(C>0\), satisfies

$$\begin{aligned} \delta (x,\mu ,\lambda )^\gamma \ge C\text{ dist }\left[ (x,\mu ,\lambda ),\{\bar{x}\}\times \mathcal M(\bar{x})\right] \!, \end{aligned}$$

for all \((x,\mu ,\lambda )\) in a neighborhood of \((\bar{x},\bar{\mu },\bar{\lambda })\). The locally correct identification of active inequalities according to Facchinei et al. (1998b) works for any \(\gamma \in (0,1]\). Interestingly, if an error bound with \(\gamma =1\) is available, it can be used to construct algorithms that converge locally superlinearly to a KKT point of a degenerate optimization problem. The first algorithm of this kind is the stabilized SQP method in [IS 51]. For more discussion and developments see Sect. 4 of the paper by Izmailov and Solodov. It is worth noting that the existence of an error bound around the KKT point \((\bar{x},\bar{\mu },\bar{\lambda })\) with \(\gamma =1\) is equivalent to requiring that the multiplier \((\bar{\mu },\bar{\lambda })\) is noncritical [IS 10]. Thus, for starting points sufficiently close to a noncritical multiplier, critical multipliers do not spoil those stabilized algorithms.
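The link between the exponent \(\gamma \) and criticality can be checked numerically on a small example. The sketch below assumes the purely equality-constrained toy problem \(f(x)=x^2\), \(h(x)=x^2\), for which \(\bar{x}=0\), every \(\lambda \in \mathbb {R}\) is a multiplier, and \(\bar{\lambda }=-1\) is the multiplier at which the Hessian of the Lagrangian vanishes. The choice of \(\delta \) as the norm of the KKT residual, the sampling grid, and all function names are assumptions of this illustration.

```python
import math

def residual(x, lam):
    # delta(x, lam): norm of the KKT residual (dL/dx, h(x))
    # for the assumed toy problem f(x) = x^2, h(x) = x^2.
    return math.hypot(2.0 * (1.0 + lam) * x, x * x)

def worst_ratio(lam_bar, gamma, radius=0.5, n=41,
                xs=(1e-1, 1e-2, 1e-3, 1e-4)):
    # Smallest sampled value of delta^gamma / dist near (0, lam_bar).
    # Since every lam is a multiplier, dist[(x, lam), {0} x M] = |x|.
    worst = math.inf
    for x in xs:
        for k in range(n):
            lam = lam_bar - radius + 2.0 * radius * k / (n - 1)
            worst = min(worst, residual(x, lam) ** gamma / abs(x))
    return worst

ratio_noncritical = worst_ratio(0.0, 1.0)      # near lam_bar = 0, gamma = 1
ratio_critical = worst_ratio(-1.0, 1.0)        # near lam_bar = -1, gamma = 1
ratio_critical_half = worst_ratio(-1.0, 0.5)   # near lam_bar = -1, gamma = 1/2
```

On these samples the ratio stays bounded away from zero near \(\bar{\lambda }=0\), whereas near \(\bar{\lambda }=-1\) it shrinks with \(|x|\) for \(\gamma =1\) and remains bounded away from zero only for the weaker exponent \(\gamma =1/2\). This is a sampled check, not a proof.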

It is certainly not by chance that, in a more general sense, the use of error bounds plays a very important role for solving KKT systems or other problems that have nonisolated solutions. Such problems with nonisolated solutions may arise, for example, if we consider optimization problems with nonisolated primal solutions, systems of equations and inequalities, complementarity problems, multicriteria optimization, KKT conditions or Fritz John conditions (Dorsch et al. 2013) for generalized Nash equilibrium problems. In such cases a first task is to design appropriate algorithms that allow a certain stabilization or regularization (for example see Behling and Fischer 2012; Dong and Fischer 2006; Facchinei et al. 2014; Kanzow et al. 2004, [IS 13]), where we would like to explicitly mention the breakthrough in Yamashita and Fukushima (2001) by means of an appropriate regularization within the Levenberg–Marquardt method. In some cases, it is also not obvious how one can construct a computable local error bound (with \(\gamma =1\)) or under which sufficient conditions such a bound exists, see Dreves et al. (2014), Facchinei et al. (1998b), Fischer and Shukla (2008), Izmailov and Solodov (2014) and [IS 11] for corresponding results.
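As an illustration of such a regularization, here is a minimal sketch of a Levenberg–Marquardt iteration for a single equation whose solution set is nonisolated (the unit circle). The test equation, the starting point, and the choice of the regularization parameter as the squared residual are assumptions of this sketch; the last of these is meant in the spirit of the regularization mentioned above, not as a reproduction of any particular published algorithm.

```python
import math

def F(x):
    # Assumed test equation: F(x) = x1^2 + x2^2 - 1 = 0 (unit circle).
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_F(x):
    return (2.0 * x[0], 2.0 * x[1])

def levenberg_marquardt(x, tol=1e-12, max_iter=50):
    history = [abs(F(x))]
    for _ in range(max_iter):
        r = F(x)
        if abs(r) <= tol:
            break
        g = grad_F(x)
        mu = r * r  # regularization parameter, here the squared residual
        # For one equation, the step d solving (g g^T + mu I) d = -g r
        # has the closed form d = -r g / (|g|^2 + mu).
        scale = -r / (g[0] ** 2 + g[1] ** 2 + mu)
        x = (x[0] + scale * g[0], x[1] + scale * g[1])
        history.append(abs(F(x)))
    return x, history

x_final, history = levenberg_marquardt((2.0, 1.0))
```

The closed-form step avoids solving a nearly singular linear system close to the solution set, which is why the sketch is restricted to one equation. The list `history` records the absolute residual after each step, so the local convergence rate can be inspected.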

In view of these works on problems with nonisolated solutions, the question arises whether a notion of criticality is useful for those more general problems. More importantly, do globally convergent algorithms for those problems tend to converge slowly to certain solutions? In analogy to the case of nonunique multipliers, one could think that criticality of a solution of a more general problem means that, for this solution, there is no local error bound with \(\gamma =1\). However, before making a conjecture, it appears advisable to further clarify the criticality of multipliers and their influence on the global convergence behavior of algorithms. Also note that the degeneracy may vanish if the feasible set is described differently.

I do not expect that one will be able to completely avoid the phenomenon that algorithms “like” to converge to critical multipliers. Therefore, as Izmailov and Solodov say, it might be helpful “to improve the efficiency in the case of convergence to critical multipliers”. There have been several attempts to accelerate the linear rate of convergence of algorithms applied to problems that are singular in some sense, see Griewank (1980), Izmailov and Solodov (2002), Oberlin and Wright (2009) and references therein. Another direction of thinking is whether it can be useful to avoid dealing with multipliers altogether. Some of the restoration techniques (see Martínez and Pilotta 2000; Fischer and Friedlander 2010) do not use multipliers (with the effect of slow local convergence). Results in Birgin and Martínez (2005) and Izmailov et al. (2014) show that superlinear convergence is possible (by means of multipliers). Maybe a higher (but expensive) accuracy when dealing with feasibility could help.