Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization

Abstract

We propose a multi-timescale quasi-Newton based smoothed functional (QN-SF) algorithm for stochastic optimization, both with and without inequality constraints. The algorithm combines the smoothed functional (SF) scheme for estimating the gradient with a quasi-Newton method for solving the optimization problem. Newton-based algorithms typically update the Hessian estimate at each instant and subsequently (a) project it onto the space of symmetric positive definite matrices and (b) invert the projected Hessian; the latter operation is computationally expensive. To save this computational effort, we propose a QN-SF algorithm based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update rule. In Bhatnagar (ACM Trans. Model. Comput. Simul. 18(1): 27–62, 2007), a Jacobi variant of the Newton SF (JN-SF) algorithm was proposed and implemented with the same aim. We compare our QN-SF algorithm with the gradient SF (G-SF) and JN-SF algorithms on two problems: a simple stochastic function minimization problem and a problem of optimal routing in a queueing network. The experiments show that QN-SF performs significantly better than both G-SF and JN-SF in both settings. We then extend the QN-SF algorithm to the constrained optimization setting, where it again performs much better than JN-SF. Finally, we present proofs of convergence for the QN-SF algorithm in both the unconstrained and constrained settings.
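As a reading aid only, the following Python sketch illustrates the two building blocks the abstract refers to: a Gaussian smoothed functional (SF) gradient estimate and a BFGS update of the inverse-Hessian estimate. The interface (simulate, qn_sf, bfgs_inverse_update), the two-sided estimator, the constant step sizes and the curvature safeguard are illustrative assumptions made here; this is not the authors' multi-timescale QN-SF recursion, which additionally averages the noisy simulation outputs on separate timescales and also treats the constrained case.

import numpy as np

def sf_gradient_estimate(simulate, theta, beta, rng):
    # Two-sided Gaussian smoothed functional gradient estimate:
    # eta is a standard Gaussian perturbation, beta > 0 the smoothing parameter,
    # and simulate(theta) returns a noisy sample of the objective.
    eta = rng.standard_normal(theta.shape)
    return (eta / (2.0 * beta)) * (simulate(theta + beta * eta)
                                   - simulate(theta - beta * eta))

def bfgs_inverse_update(H, s, y, tol=1e-10):
    # Standard BFGS update of the inverse-Hessian estimate H, with
    # s = theta_new - theta_old and y = grad_new - grad_old.
    # The curvature check s.y > 0 keeps H symmetric positive definite;
    # otherwise the update is skipped.
    sy = float(s @ y)
    if sy <= tol:
        return H
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def qn_sf(simulate, theta0, n_iter=1000, beta=0.1, a=0.01, seed=0):
    # Illustrative (unconstrained) QN-SF loop: SF gradient estimates are
    # combined with a BFGS inverse-Hessian estimate; constant step sizes
    # are used here purely for simplicity.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    H = np.eye(theta.size)
    grad = sf_gradient_estimate(simulate, theta, beta, rng)
    for _ in range(n_iter):
        theta_new = theta - a * (H @ grad)
        grad_new = sf_gradient_estimate(simulate, theta_new, beta, rng)
        H = bfgs_inverse_update(H, theta_new - theta, grad_new - grad)
        theta, grad = theta_new, grad_new
    return theta

For instance, qn_sf(lambda th: float((th ** 2).sum() + 0.1 * np.random.randn()), np.ones(4)) runs the sketch on a noisy quadratic whose minimizer is the origin.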


Notes

  1. An alternative, simpler proof of the stability of \(\{Z(n)\}\) can be obtained by directly verifying the stability conditions of [14], which are easily seen to hold in our setting.

References

  1. Andradottir, S.: A scaled stochastic approximation algorithm. Manag. Sci. 42, 475–498 (1996)


  2. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Comput. Netw. 38(4), 393–422 (2002)


  3. Bhatnagar, S.: Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization. ACM Trans. Model. Comput. Simul. 15(1), 74–107 (2005)


  4. Bhatnagar, S.: Adaptive Newton-based smoothed functional algorithms for simulation optimization. ACM Trans. Model. Comput. Simul. 18(1), 27–62 (2007)


  5. Bhatnagar, S., Borkar, V.S.: A two time scale stochastic approximation scheme for simulation based parametric optimization. Probab. Eng. Inf. Sci. 12, 519–531 (1998)


  6. Bhatnagar, S., Fu, M.C., Marcus, S.I., Fard, P.J.: Optimal structured feedback policies for ABR flow control using two-timescale SPSA. IEEE/ACM Trans. Netw. 9(4), 479–491 (2001)


  7. Bhatnagar, S., Fu, M.C., Marcus, S.I., Bhatnagar, S.: Two timescale algorithms for simulation optimization of hidden Markov models. IIE Trans. 33(3), 245–258 (2001)


  8. Bhatnagar, S., Hemachandra, N., Mishra, V.: Stochastic approximation algorithms for constrained optimization via simulation. ACM Trans. Model. Comput. Simul. 21(2), 15:1–15:22 (2011)


  9. Bhatnagar, S., Prasad, H.L., Prashanth, L.A.: Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods. Springer, New York (2013). LNCIS Series


  10. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A Stochastic Quasi-Newton Method for Large-Scale Optimization. CoRR arXiv:1401.7020 (2014)

  11. Bordes, A., Bottou, L., Gallinari, P.: SGD-QN: careful quasi-Newton stochastic gradient descent. J. Mach. Learn. Res. 10, 1737–1754 (2009)


  12. Borkar, V.S.: Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press and Hindustan Book Agency, New Delhi (2008)


  13. Borkar, V.S.: An actor-critic algorithm for constrained Markov decision processes. Syst. Control Lett. 54, 207–213 (2005)


  14. Borkar, V.S., Meyn, S.P.: The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 38(2), 447–469 (2000)


  15. Brandiere, O.: Some pathological traps for stochastic approximation. SIAM J. Control Optim. 36, 1293–1314 (1998)


  16. Cohen, J.E., Kelly, F.P.: A paradox of congestion in a queueing network. J. Appl. Probab. 27, 730–734 (1990)


  17. Dennis, J.E., Moré, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. 19(1), 46–89 (1977)


  18. Harchol-Balter, M., Crovella, M., Murta, C.: On choosing a task assignment policy for a distributed server system. IEEE J. Parallel Distrib. Comput. 59(2), 204–228 (1999)


  19. Hirsch, M.W.: Convergent activation dynamics in continuous time networks. Neural Netw. 2, 331–349 (1989)


  20. Kao, C., Chen, S.: A stochastic quasi-Newton method for simulation response optimization. Eur. J. Oper. Res. 173, 30–46 (2006)


  21. Katkovnik, V.Y., Kulchitsky, Y.: Convergence of a class of random search algorithms. Autom. Remote Control 8, 1321–1326 (1972)


  22. Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications. Springer, New York (2003)


  23. Lakshmanan, K., Bhatnagar, S.: Smoothed functional and quasi-Newton algorithms for routing in multi-stage queueing network with constraints. In: International Conference on Distributed Computing and Internet Technology (ICDCIT), vol. 6536, pp. 175–186. LNCS (2011)

  24. Pemantle, R.: Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18, 698–712 (1990)


  25. Schweitzer, P.J.: Perturbation theory and finite Markov chains. J. Appl. Probab. 5, 401–413 (1968)


  26. Spall, J.C.: Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45, 1839–1853 (2000)


  27. Sunehag, P., Trumpf, J., Vishwanathan, S.V.N., Schraudolph, N.N.: Variable metric stochastic approximation theory. In: Proceedings of 12th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 560–566 (2009)

  28. Vazquez-Abad, F.J., Kushner, H.J.: Estimation of the derivative of a stationary measure with respect to a control parameter. J. Appl. Probab. 29, 343–352 (1992)


  29. Xiao, X., Ni, L.M.: Internet QoS: a big picture. IEEE Netw. 13, 8–18 (1999)


  30. Zhu, X., Spall, J.C.: A modified second-order SPSA optimization algorithm for finite samples. Int. J. Adapt. Control. 16, 397–409 (2002)


Author information

Corresponding author

Correspondence to K. Lakshmanan.

Appendix: Proofs of Sect. 4

Proof of Lemma 1

Note that \(Q_l(n)\) is \(\mathcal {F}(n)\)-measurable for all \(n\ge 0\). Further, it is easy to see that \(E[Q_l(n+1)\,|\,\mathcal {F}(n)] = Q_l(n)\) a.s., \(\forall n \ge 0\). We first show that, for any real numbers \(a_n\) and \(b_n\),

$$\begin{aligned} \bigg ( \sum _{n=1}^K (a_n - b_n) \bigg )^2&= \bigg ( \sum _{n=1}^K \frac{a_n-b_n}{\sqrt{a_n^2+b_n^2}} \sqrt{a_n^2+b_n^2} \bigg )^2\\&\le \sum _{n=1}^K \bigg (1-\frac{2 a_n b_n}{a_n^2 + b_n^2} \bigg ) \sum _{n=1}^K (a_n^2 + b_n^2) \le 2K \sum _{n=1}^K(a_n^2+ b_n^2), \end{aligned}$$

where the first inequality uses the Cauchy-Schwarz inequality and the second follows since \(-\frac{2a_nb_n}{a_n^2+b_n^2} \le 1\) for all \(n\). Hence we have

$$\begin{aligned} E[Q_l^2(n)]&\le \frac{2n}{\beta ^2} \sum _{m=1}^n b^2(m) E \bigg ( \eta _{l}^2(m) \Big ( h(X^{'}_{m}) - h(X_m) \Big )^2 \\&\quad + E^2 \Big ( \eta _l(m) \big ( h(X^{'}_{m}) - h(X_m) \big ) \big | \mathcal {F}(m-1) \Big ) \bigg ). \end{aligned}$$

Here, we let \(E^a(X)\) denote \((E(X))^a\). By conditional Jensen’s inequality, we have

$$\begin{aligned}&E^2 \Big ( \eta _l(m) \big ( h(X^{'}_{m}) - h(X_m) \big ) \big | \mathcal {F}(m-1) \Big ) \\&\quad \le E \Big ( \eta _l^2(m) \big ( h(X^{'}_{m}) - h(X_m) \big )^2 \big | \mathcal {F}(m-1) \Big ). \end{aligned}$$

Hence by the Cauchy-Schwarz inequality

$$\begin{aligned} E[Q_l^2(n)]&\le \frac{4n}{\beta ^2} \sum _{m=1}^n b^2(m) E \bigg ( \eta _{l}^2(m) \Big ( h(X^{'}_{m}) - h(X_m) \Big )^2 \bigg ) \\&\le \frac{4n}{\beta ^2} \sum _{m=1}^n b^2(m) E^{1/2} \Big ( \eta _{l}^4(m) \Big ) E^{1/2} \Big ( \big ( h(X^{'}_{m}) - h(X_m) \big )^4 \Big ). \end{aligned}$$

Since \(h(\cdot )\) is a Lipschitz continuous function, we have \(|h(X^{'}_m) - h(X_m)|^4 \le K ||X^{'}_m - X_m ||^4\) for some constant \(K > 0\). Hence, \(E^{1/2}[(h(X^{'}_m) - h(X_m))^4] \le \sqrt{K} E^{1/2} [|| X^{'}_m - X_m ||^4]\). As a consequence of Assumption 3, \(\sup _m E[ || X_m - X^{'}_m ||^4] < \infty \) [4]. Thus, \(E[Q^2_{l}(n)] < \infty \) for all \(n \ge 1\), i.e., the \(Q_l(n)\) are square-integrable (and hence integrable) random variables. Thus \((Q_l(n),\mathcal {F}(n))\), \(n\ge 0\), is a square-integrable martingale sequence. We now show that its quadratic variation process is convergent. To this end, note that

$$\begin{aligned}&\sum _n E[(Q_l(n+1) - Q_l(n) )^2 | \mathcal {F}(n)] \\&= \sum _n E \Big [ \Big ( b(n+1) \Big ( \frac{\eta _l(n+1)}{\beta } \Big ( h(X^{'}_{n+1}) - h(X_{n+1}) \Big ) \\&\quad -E \Big ( \frac{\eta _l(n+1)}{\beta } \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big ) \big | \mathcal {F}(n) \Big ) \Big ) \Big )^2 \Big | \mathcal {F}(n) \Big ] \\&\le \sum _n b^2(n+1) \bigg ( E \Big [ \frac{\eta _l^2(n+1)}{\beta ^2} \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big )^2 \big | \mathcal {F}(n) \Big ]\\&\quad + E \Big [ E^2 \big [ \frac{\eta _l(n+1)}{\beta } \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big ) \big | \mathcal {F}(n) \big ] \big | \mathcal {F}(n) \Big ] \bigg ) \\&\le \sum _n 2b^2(n+1) \bigg ( E \Big [ \frac{\eta _l^2(n+1)}{\beta ^2} \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big )^2 \big | \mathcal {F}(n) \Big ] \bigg ) \end{aligned}$$

where the second inequality follows by another application of the conditional Jensen inequality. As before, an application of the Cauchy-Schwarz inequality together with Assumptions 2 and 3 shows that

$$\begin{aligned} \sup _n \frac{1}{\beta ^2} E \big [ \eta _l^2(n+1) (h(X^{'}_{n+1}) - h(X_{n+1}))^2 \big | \mathcal {F}(n) \big ] < \infty \text{ w.p. } \text{1. } \end{aligned}$$

Now from Assumption 4, \(\sum _n E[(Q_l(n+1) - Q_l(n) )^2 | \mathcal {F}(n)] < \infty \) a.s. Thus, the quadratic variation process of \(\{Q_l(n)\}\) is almost surely convergent. Hence, by the martingale convergence theorem for square-integrable martingales, \(\{Q_l(n)\}\) is an a.s. convergent martingale sequence. \(\square \)
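Remark (added for readability): the convergence criterion invoked above is the standard one for square-integrable martingales and is applied here with \(M_n = Q_l(n)\): if \((M_n, \mathcal {F}(n))\), \(n \ge 0\), is a square-integrable martingale satisfying

$$\begin{aligned} \sum _{n} E\big [(M_{n+1} - M_n)^2 \,\big |\, \mathcal {F}(n)\big ] < \infty \text{ a.s.}, \end{aligned}$$

then \(M_n\) converges almost surely to a finite random variable.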

Proof of Lemma 3

Note that (9) can be rewritten as

$$\begin{aligned} Z_l(n+1)&= Z_l(n) + b(n) \bigg ( E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) - Z_l(n) \bigg ) \nonumber \\&\quad + b(n) \bigg ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \nonumber \\&\quad - E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) \bigg ). \end{aligned}$$
(34)

Hence we have

$$\begin{aligned}&\sum _n (Z_l(n+1) - Z_l(n))\nonumber \\&\quad = \sum _n b(n) \Big ( E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) - Z_l(n) \Big )\nonumber \\&\qquad \quad + \sum _n b(n) \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) - E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) \Big ). \end{aligned}$$
(35)

Note that, as a consequence of Lemma 1, there exists an a.s. finite random variable \(Q_l(\infty )\) such that \(Q_l(n) \rightarrow Q_l(\infty )\) a.s. as \(n\rightarrow \infty \). From the definition of \(Q_l(n)\), it is then clear that

$$\begin{aligned} Q_l(\infty ) =&\sum _{m=1}^\infty b(m) \bigg ( \frac{\eta _l(m)}{\beta } \Big ( h(X^{'}_{m}) - h(X_m) \Big ) \\&- E \Big ( \frac{\eta _l(m)}{\beta } \big ( h(X^{'}_{m}) - h(X_m) \big ) \big | \mathcal {F}(m-1) \Big ) \bigg ) < \infty \text{ a.s. } \end{aligned}$$

Note that the second term on the RHS of (35) is precisely \(Q_l(\infty )<\infty \). As a consequence, it suffices to show the boundedness of the following recursion in place of (34):

$$\begin{aligned} {\bar{Z}}_l(n+1) = {\bar{Z}}_l(n) + b(n) \bigg ( E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) - {\bar{Z}}_l(n) \bigg ),\nonumber \\ \end{aligned}$$
(36)

with \({\bar{Z}}(0) = Z(0)\).

As in the proof of Lemma 1, it can be seen that

$$\begin{aligned} \sup _n \left| E \left[ \frac{\eta _l(n)}{\beta } \left( h \left( X'_n \right) - h \left( X_n \right) \right) \Big | \mathcal {F}(n-1) \right] \right| < \infty \end{aligned}$$

with probability 1. Now, since \(b(n) \rightarrow 0\) as \(n \rightarrow \infty \), there exists \(p_0\) such that \(0 \le b(n) \le 1\) for all \(n \ge p_0\). Hence, for all \(n \ge p_0\), \({\bar{Z}}(n+1)\) is a convex combination of \({\bar{Z}}(n)\) and a quantity that is almost surely uniformly bounded. The claim follows (see Note 1). \(\square \)
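Remark (added for clarity): the final convex-combination argument can be made explicit as follows. Writing \(C(n)\) for the conditional expectation term in (36) and \(C\) for its almost sure uniform bound established above, (36) reads

$$\begin{aligned} {\bar{Z}}_l(n+1) = (1 - b(n))\, {\bar{Z}}_l(n) + b(n)\, C(n), \qquad \sup _n |C(n)| \le C < \infty \text{ a.s.}, \end{aligned}$$

so that \(|{\bar{Z}}_l(n+1)| \le \max (|{\bar{Z}}_l(n)|, C)\) for all \(n \ge p_0\), and hence, by induction, \(\sup _n |{\bar{Z}}_l(n)| \le \max \big ( \max _{m \le p_0} |{\bar{Z}}_l(m)|, C \big ) < \infty \) a.s.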

Cite this article

Lakshmanan, K., Bhatnagar, S. Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization. Comput Optim Appl 66, 533–556 (2017). https://doi.org/10.1007/s10589-016-9875-4
