Abstract
When function evaluations are not only expensive but also contaminated by noise, the popular simultaneous perturbation stochastic approximation (SPSA) algorithm is attractive because it requires only two function measurements per iteration. In this paper, we present a method that requires, on average, only one function measurement per iteration. We prove the strong convergence and asymptotic normality of the new algorithm. Limited experimental results demonstrate the effectiveness and potential of our algorithm for solving low-dimensional problems.
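For context, the classical two-measurement SPSA iteration discussed above (Spall, 1992, in the references) can be sketched as follows. This is an illustrative sketch of the standard algorithm, not the one-measurement method proposed in this paper; the gain sequences, the noisy test objective, and all numerical values are assumed choices.

```python
import numpy as np

def spsa_step(L, x, a_k, c_k, rng):
    """One iteration of classical two-measurement SPSA (Spall, 1992).

    L   : noisy objective; L(x) returns a scalar measurement
    x   : current iterate (1-D array)
    a_k : gain (step size) for this iteration
    c_k : perturbation size for this iteration
    """
    n = x.size
    # Symmetric Bernoulli +/-1 perturbation direction
    delta = rng.choice([-1.0, 1.0], size=n)
    # Two function measurements per iteration
    y_plus = L(x + c_k * delta)
    y_minus = L(x - c_k * delta)
    # Simultaneous-perturbation gradient estimate
    g_hat = (y_plus - y_minus) / (2.0 * c_k) / delta
    return x - a_k * g_hat

rng = np.random.default_rng(0)
x = np.array([2.0, -1.5])
for k in range(1, 2001):
    # Standard gain choices a_k = a / k^0.602, c_k = c / k^0.101
    a_k, c_k = 0.1 / k**0.602, 0.1 / k**0.101
    x = spsa_step(lambda z: np.sum(z**2) + 0.01 * rng.standard_normal(),
                  x, a_k, c_k, rng)
# x approaches the minimizer of z -> ||z||^2, i.e., the origin
```

The one-measurement variant studied in this paper reduces the two evaluations per step to one in the average sense; only its convergence analysis appears in the appendices below.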
Data Availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
References
Abdulsadda, A.T., Iqbal, K.: An improved algorithm for system identification using fuzzy rules for training neural networks. Int. J. Autom. Comput. 8(3), 333–339 (2011)
Altaf, M.U., Heemink, A.W., Verlaan, M., Hoteit, I.: Simultaneous perturbation stochastic approximation for tidal models. Ocean Dyn. 61(8), 1093–1105 (2011)
Bartkutė, V., Sakalauskas, L.: Simultaneous perturbation stochastic approximation of nonsmooth functions. Eur. J. Oper. Res. 181(3), 397–409 (2007)
Bartkutė, V., Sakalauskas, L.: Statistical inferences for termination of Markov type random search algorithms. J. Optim. Theory Appl. 141(3), 475–493 (2009)
Doob, J.L.: Stochastic processes. John Wiley and Sons, New York (1953)
Fabian, V.: On asymptotic normality in stochastic approximation. Ann. Math. Stat. 39(4), 1327–1332 (1968)
Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
Kushner, H.J., Clark, D.S.: Stochastic approximation methods for constrained and unconstrained systems. Springer, New York (1978)
Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)
Spall, J. C.: Accelerated second-order stochastic optimization using only function measurements. In: Proceedings of the 36th IEEE Conference on Decision and Control, vol. 2 pp. 1417–1424. (1997)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Spall, J.C.: A one-measurement form of simultaneous perturbation stochastic approximation. Automatica 33(1), 109–112 (1997)
Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans. Aerosp. Electron. Syst. 34(3), 817–823 (1998)
Spall, J.C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control. 37(3), 332–341 (1992)
Sadegh, P., Spall, J.C.: Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control. 43(10), 1480–1484 (1998)
Spall, J. C.: Stochastic version of second-order (Newton-Raphson) optimization using only function measurements. In: Proceedings of the 1995 Winter Simulation Conference, pp. 347–352. (1995)
Xu, Z., Dai, Y.H.: A stochastic approximation frame algorithm with adaptive directions. Numer. Math. Theory Methods Appl. 1(4), 460–474 (2008)
Zhu, X., Spall, J.C.: A modified second-order SPSA optimization algorithm for finite samples. Int. J. Adapt. Control Sig. Process. 16(5), 397–409 (2002)
Funding
This research is supported by the Beijing Natural Science Foundation (grant Z180005), by the National Natural Science Foundation of China (grants 12171021, 12071279, and 11822103), and by the General Project of the Shanghai Natural Science Foundation (grant 20ZR1420600).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Lemma 1
Proof
We start from the observation
Notice that each component \(\hat{\xi }_{ki}~\left( i=1,2,\cdots ,n\right) \) of \(\hat{\xi }_k\) obeys the symmetric Bernoulli distribution, and that \(\hat{\xi }_k^T \hat{g}_k\ge 0\). Therefore, at least half of the components of \(\hat{\xi }_k\) have the same signs as the corresponding components \(\hat{g}_{ki}~\left( i=1,2,\cdots ,n\right) \) of \(\hat{g}_k\). If n is even, \(\hat{\xi }_k\) has \(C_{n}^{ n/2}+C_{n}^{ n/2 +1}+\cdots +C_{n}^{n}=2^{n-1}+ C_{n}^{n/2}/2\) choices. So for any possible choice \(\zeta _j\), we have
If the signs of \(\hat{\xi }_{ki}\) and \(\hat{g}_{ki}\) agree for a given \(i\in \{1,2,\cdots ,n\}\), then at least \( n/2-1\) of the remaining \(n-1\) elements of \(\hat{\xi }_k\) share the same signs as the corresponding elements of \(\hat{g}_k\). In this case, \(\hat{\xi }_{k}\) has \(C_{n-1}^{\left( n-2\right) /2}+C_{n-1}^{ n/2}+\cdots +C_{n-1}^{n-1}=2^{n-2}+C_{n-1}^{\left( n-2\right) /2}\) choices. Then we can write
If n is odd, \(\{\zeta _j\}\) has \(C_{n}^{\left( n+1\right) /2}+C_{n}^{\left( n+1\right) /2+1}+\cdots +C_{n}^{n}=2^{n-1}\) choices, each with probability \(P\left( \hat{\xi }_k=\zeta _j\right) = 1/2^{n-1}\). The sign of \(\hat{\xi }_{ki}\) is either the same as or opposite to that of \(\hat{g}_{ki}\). If their signs are the same, then at least \(\left( n-1\right) /2\) of the remaining \(n-1\) elements of \(\hat{\xi }_k\) share the same signs as the corresponding elements of \(\hat{g}_k\). In this case, \(\hat{\xi }_k\) has \(C_{n-1}^{\left( n-1\right) /2}+C_{n-1}^{\left( n+1\right) /2}+\cdots +C_{n-1}^{n-1}=2^{n-2}+ C_{n-1}^{\left( n-1\right) /2}/2\) choices. Then we can write
It then holds that
For \(\rho \) defined in Step 1 of Algorithm SPSA1-A, we have
Let \(\rho _k=\rho /\Vert \hat{g}_k\Vert _{\infty }\). We can complete the proof
\(\square \)
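As a sanity check, the binomial-coefficient identities used in the counting argument above can be verified numerically. This is an illustrative verification, not part of the proof:

```python
from math import comb

# Verify the counting identities of Appendix A for a range of dimensions n.
for n in range(2, 21, 2):          # n even
    # sign vectors agreeing with g_hat in at least n/2 components
    assert sum(comb(n, j) for j in range(n // 2, n + 1)) \
        == 2 ** (n - 1) + comb(n, n // 2) // 2
    # one component fixed to agree: at least n/2 - 1 of the other n - 1 agree
    assert sum(comb(n - 1, j) for j in range(n // 2 - 1, n)) \
        == 2 ** (n - 2) + comb(n - 1, (n - 2) // 2)

for n in range(3, 21, 2):          # n odd
    # at least (n+1)/2 agreeing components
    assert sum(comb(n, j) for j in range((n + 1) // 2, n + 1)) == 2 ** (n - 1)
    # one component fixed to agree: at least (n-1)/2 of the other n - 1 agree
    assert sum(comb(n - 1, j) for j in range((n - 1) // 2, n)) \
        == 2 ** (n - 2) + comb(n - 1, (n - 1) // 2) // 2
```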
Appendix B: Proof of Lemma 2
Proof
By a proof similar to that of Lemma 1, we have
Then, we can obtain
The rest of the lemma can then be proved similarly to the corresponding result in [14]. \(\square \)
Appendix C: Proof of Proposition 1
Proof
According to [8], based on Lemma 2 and Assumption 1, we have
i)
Next, we prove that
ii)
First, for any \(l\in \{1,2,\cdots ,n\}\), we have
Then we obtain
It follows from [14] that
Since \(\frac{1}{\left( 1+\rho _i\right) ^2}=\frac{1}{\left( 1+\rho /\vert g_{il}\vert \right) ^2}\le 1\) and \(\mathbb E\hat{\xi }_{il}^2= 1\), we have
Therefore, it holds that
Next, since \(\{\sum _{i=k}^{m}a_ie_i\}_{m\ge k}\) is a martingale sequence, it follows from the inequality in [5, P. 315] (see also [8, P. 27]) that
where the equality holds as \(\mathbb E\left[ e_{i}^Te_j\right] =\mathbb E\left[ e_{i}^T\mathbb E\left[ e_j\vert \hat{x}_j\right] \right] =0,~\forall ~i<j\).
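The equality invoked here is the standard orthogonality-of-increments computation for martingale differences; under the conditional-mean-zero property stated above, expanding the squared norm gives

\[
\mathbb E\left\Vert \sum _{i=k}^{m}a_ie_i\right\Vert ^2
=\sum _{i=k}^{m}a_i^2\,\mathbb E\Vert e_i\Vert ^2
+2\sum _{k\le i<j\le m}a_ia_j\,\mathbb E\left[ e_i^Te_j\right]
=\sum _{i=k}^{m}a_i^2\,\mathbb E\Vert e_i\Vert ^2,
\]

since \(\mathbb E\left[ e_{i}^Te_j\right] =\mathbb E\left[ e_{i}^T\mathbb E\left[ e_j\vert \hat{x}_j\right] \right] =0\) for all \(i<j\).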
Then, by (C1) and Assumption 1, we complete the proof of ii). \(\square \)
Appendix D: Proof of Proposition 2
Proof
To complete the proof, we need to verify that conditions (2.2.1), (2.2.2), and (2.2.3) in Fabian [6] hold. Here we assume that all assumptions on \(\theta _k\) or \(\mathscr {F}_k\) hold. Following the notation in [6], we get
where \(\Gamma _k=aH~\left( \overline{x}_k\right) \), \(V_{k}=k^{-{\gamma }}\left\{ \frac{\hat{g}_k (\hat{x}_k)+\hat{\xi }_k}{1+\rho _k}-{\mathbb E}\left[ \frac{\hat{g}_k(\hat{x}_k)+\hat{\xi }_k}{1+\rho _k}\bigg \vert \hat{x}_k\right] \right\} \), \(\Phi _k=-aI\), and \(T_k=-ak^{\beta /2}b_k\left( \hat{x}_k\right) \). In fact, there is an open neighborhood of \(\hat{x}_k\) (for k sufficiently large) containing \( x^*\) in which \(H\left( \cdot \right) \) is continuous. Then
where \(\Gamma _k=aH\left( \overline{x}_k\right) \) and \(\overline{x}_k\) lies in the line segment between \(\hat{x}_k\) and \(x^*\).
Based on the continuity of \(H\left( \cdot \right) \) and a.s. convergence of \(\hat{x}_k\), we have \(\Gamma _k=aH\left( \overline{x}_k\right) \rightarrow aH\left( x^*\right) \) a.s.
Now we prove the convergence of \(T_k\) for \(3\gamma -\alpha /2\ge 0\). When \(3\gamma -\alpha /2>0\), as \(b_k\left( \hat{x}_k\right) =O\left( k^{-2\gamma }\right) \) a.s., we can write that \(T_k\rightarrow 0\) a.s. When \(3\gamma -\alpha /2=0\), by the facts that \(\hat{x}_k\rightarrow x^*\) a.s. and the uniform boundedness of \(L^{\left( 3\right) }\) near \(x^*\), we have
Then the components \({\xi _{ki}}\) are i.i.d. and symmetrically distributed for each k, which means that the l-th element of \(T_k\) satisfies
Therefore, \(T_k\) converges for \(3\gamma -\alpha /2\ge 0\).
We can write
where
Define \(\xi ^{-1}_k:=\left( \xi ^{-1}_{k1},\cdots ,\xi ^{-1}_{kn}\right) ^T\). Then we have
where
and the last expression tends to 1 as \(k\rightarrow \infty \). Therefore, (D2) is the same as the third term in (3.5) of [14]. As the elements of \(\hat{g}_k\hat{\xi }_k^T+\hat{\xi }_k\hat{g}_k^T+\hat{\xi }_k\hat{\xi }_k^T\) are bounded, we have
and
According to [14], we obtain
Thus we have verified conditions (2.2.1) and (2.2.2) of [6]. Next we prove condition (2.2.3), i.e.,
By Hölder’s inequality and \(0<\delta '<\delta /2\), the upper bound of the above limit can be obtained as
Notice that
The proof is then completed by following the proof of Proposition 1 in [14]. \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, S., Xia, Y. & Xu, Z. Simultaneous perturbation stochastic approximation: towards one-measurement per iteration. Numer Algor 94, 1085–1101 (2023). https://doi.org/10.1007/s11075-023-01528-7