
A simultaneous perturbation weak derivative estimator for stochastic neural networks

  • Original Paper
  • Computational Management Science

Abstract

In this paper we study gradient estimation for a network of nonlinear stochastic units known as the Little model. Many machine learning systems can be described as networks of homogeneous units, and the Little model is of a particularly general form, which includes several popular machine learning architectures as special cases. However, since no closed-form expression for its stationary distribution is known, gradient methods that work for similar models, such as the Boltzmann machine or the sigmoid belief network, cannot be used. To address this, we introduce a method for calculating derivatives for this system based on measure-valued differentiation and simultaneous perturbation. This extends previous work in which gradient estimation algorithms were presented for networks with restrictive features such as symmetry or acyclic connectivity.


Fig. 1


Notes

  1. This norm for matrices is defined as \(\Vert w\Vert _{\infty } = \sup _{\Vert u\Vert _{\infty }=1}\Vert wu\Vert _{\infty }\), where for the vectors u and wu the norm \(\Vert \cdot \Vert _{\infty }\) is defined in the usual way.

  2. http://yann.lecun.com/exdb/mnist/.
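For reference, the induced \(\infty\)-norm of footnote 1 reduces to the maximum absolute row sum of the matrix; a minimal sketch (the helper name is ours, not the paper's):

```python
def matrix_inf_norm(w):
    # induced infinity-norm: maximum over rows of the sum of absolute entries
    return max(sum(abs(e) for e in row) for row in w)

print(matrix_inf_norm([[1.0, -2.0],
                       [3.0,  4.0]]))  # 7.0
```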



Corresponding author

Correspondence to Thomas Flynn.


Appendix A: Derivations related to the Little model

A.1 Derivation of Eq. 12

Fix an \(x^0\) and a direction \(v \in \mathbb {R}^{n\times n}\times \mathbb {R}^n\). Then

$$\begin{aligned}&\nabla _{\lambda }P_{\theta + \lambda v}(x^{0},x^{1})\\&\quad = P_{\theta + \lambda v}(x^0,x^1) \nabla _{\lambda }\log P_{\theta + \lambda v}(x^0,x^1) \\&\quad = P_{\theta + \lambda v}(x^0,x^1) \sum \limits _{i=1}^{n}\nabla _{\lambda } \log \left( \sigma ( (x_{i}^{1})^{\dag }u_{i}(x^{0},\theta +\lambda v))\right) \\&\quad = P_{\theta + \lambda v}(x^0,x^1) \sum \limits _{i=1}^{n} \left( 1 -\sigma ( (x_{i}^{1})^{\dag }u_{i}(x^{0},\theta +\lambda v))\right) (x_{i}^{1})^{\dag } \nabla _{\lambda }u_{i}(x^{0},\theta +\lambda v) \\&\quad = P_{\theta + \lambda v}(x^0,x^1) \sum \limits _{i=1}^{n} \left( 1 -\sigma ( (x_{i}^{1})^{\dag }u_{i}(x^{0},\theta +\lambda v))\right) (x_{i}^{1})^{\dag } \left( \sum \limits _{j=1}^{n}v_{i,j}x^{0}_{j} + v_{i}\right) . \end{aligned}$$

Evaluating this at \(\lambda =0\) we find that

$$\begin{aligned} \nabla _{\theta }P_{\theta }(x^{0},x^{1})v = P_{\theta }(x^0,x^1) \sum \limits _{i=1}^{n}(1 -\sigma ( (x_{i}^{1})^{\dag }u_{i}(x^{0},\theta ))) (x_{i}^{1})^{\dag } \left( \sum \limits _{j=1}^{n}v_{i,j}x^{0}_{j} + v_{i}\right) . \end{aligned}$$
(24)
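As a sanity check, the directional derivative (24) can be compared against a central finite difference. The sketch below assumes the transition probability factorizes as \(P_{\theta }(x^0,x^1)=\prod _{i}\sigma ((x_{i}^{1})^{\dag }u_{i}(x^{0},\theta ))\) with \(u_{i}(x^{0},\theta ) = \sum _{j}w_{i,j}x^{0}_{j} + b_{i}\) and the convention \(x^{\dag } = 2x-1\); all function and variable names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def u(i, x0, w, b):
    # local field of unit i given the previous state x0 (assumed form)
    return sum(w[i][j] * x0[j] for j in range(len(x0))) + b[i]

def P(x0, x1, w, b):
    # one-step transition probability; x_dag = 2*x - 1 maps {0,1} to {-1,+1}
    p = 1.0
    for i in range(len(x1)):
        p *= sigmoid((2 * x1[i] - 1) * u(i, x0, w, b))
    return p

def dP(x0, x1, w, b, vw, vb):
    # directional derivative in the direction (vw, vb), as in Eq. (24)
    s = 0.0
    for i in range(len(x1)):
        xd = 2 * x1[i] - 1
        s += (1 - sigmoid(xd * u(i, x0, w, b))) * xd * (
            sum(vw[i][j] * x0[j] for j in range(len(x0))) + vb[i])
    return P(x0, x1, w, b) * s

x0, x1 = [1, 0], [0, 1]
w = [[0.5, -0.3], [0.2, 0.1]]
b = [0.1, -0.2]
vw = [[1.0, 0.0], [0.0, -1.0]]
vb = [0.5, 0.25]
eps = 1e-5
wp = [[w[i][j] + eps * vw[i][j] for j in range(2)] for i in range(2)]
bp = [b[i] + eps * vb[i] for i in range(2)]
wm = [[w[i][j] - eps * vw[i][j] for j in range(2)] for i in range(2)]
bm = [b[i] - eps * vb[i] for i in range(2)]
fd = (P(x0, x1, wp, bp) - P(x0, x1, wm, bm)) / (2 * eps)
print(abs(fd - dP(x0, x1, w, b, vw, vb)) < 1e-8)  # True
```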

Note also that

$$\begin{aligned} (1- \sigma ( x^{\dag }u))x^{\dag } = {\left\{ \begin{array}{ll} (1-\sigma (u)) &{}\text { if } x = 1 \\ -\sigma (u)&{}\text { if } x = 0 \end{array}\right. } \end{aligned}$$

which means

$$\begin{aligned} (1-\sigma (x^{\dag }u))x^{\dag } = x - \sigma (u). \end{aligned}$$
(25)
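Identity (25) is easy to verify numerically, again under the assumed convention \(x^{\dag } = 2x - 1\):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# verify (1 - sigmoid(x_dag * u)) * x_dag == x - sigmoid(u)
# for both binary states, with the assumed mapping x_dag = 2*x - 1
for x in (0, 1):
    for u in (-1.5, 0.0, 2.0):
        x_dag = 2 * x - 1
        lhs = (1 - sigmoid(x_dag * u)) * x_dag
        assert abs(lhs - (x - sigmoid(u))) < 1e-12
print("identity (25) holds")
```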

Combining (24) and (25),

$$\begin{aligned}&\nabla _{\theta }\textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^{0},x^{1})v\nonumber \\&\quad = \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1) \textstyle \sum \limits _{i=1}^{n}(1 -\sigma ( (x_{i}^{1})^{\dag }u_{i}(x^{0},\theta )))(x_{i}^{1})^{\dag } \left( \textstyle \sum \limits _{j=1}^{n}v_{i,j}x^{0}_{j} + v_i\right) \nonumber \\&\quad = \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1) \textstyle \sum \limits _{i=1}^{n}(x_{i}^{1} -\sigma (u_{i}(x^{0},\theta ))) \left( \textstyle \sum \limits _{j=1}^{n}v_{i,j}x^{0}_{j} + v_i\right) \nonumber \\&\quad = \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\left[ \textstyle \sum \limits _{i=1}^{n}x_{i}^{1} \left( \textstyle \sum \limits _{j=1}^{n}v_{i,j}x^{0}_{j} + v_i\right) \right. \nonumber \\&\qquad \left. - \textstyle \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \textstyle \sum \limits _{j=1}^{n}v_{i,j}x^{0}_{j} + v_i\right) \right] . \end{aligned}$$
(26)

Splitting each \(v_{i,j}\) and \(v_i\) into positive and negative parts,

$$\begin{aligned}&= \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\left[ \textstyle \sum \limits _{i=1}^{n}x_{i}^{1} \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_{i})_+\right) \right. \nonumber \\&\qquad \left. - \textstyle \sum \limits _{i=1}^{n}x_{i}^{1} \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_{i})_{-}\right) \right] \nonumber \\&- \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\left[ \textstyle \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_{i})_{+}\right) \right. \nonumber \\&\qquad \left. - \textstyle \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_{i})_{-}\right) \right] \nonumber \\&= \left( \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\left[ \textstyle \sum \limits _{i=1}^{n}x_{i}^{1} \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_i)_{+}\right) \right. \right. \nonumber \\&\qquad \left. \left. + \textstyle \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_{i})_{-}\right) \right] \right) \nonumber \\&\quad - \left( \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\left[ \textstyle \sum \limits _{i=1}^{n}x_{i}^{1} \left( \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_i)_{-}\right) \right. \right. \nonumber \\&\qquad \left. \left. + \textstyle \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_{i})_{+}\right) \right] \right) \nonumber \\&= \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\sum \limits _{i=1}^{n} \left[ x_{i}^{1}\left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_{i})_{+}\right) \right. \nonumber \\&\qquad \left. 
+ \sigma (u_{i}(x^{0},\theta )) \left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_i)_{-} \right) \right] \nonumber \\&\quad - \textstyle \sum \limits _{x^{1}}e(x^{1})P_{\theta }(x^0,x^1)\textstyle \sum \limits _{i=1}^{n} \left[ x_{i}^{1}\left( \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_i)_-\right) \right. \nonumber \\&\qquad \left. + \sigma (u_{i}(x^{0},\theta ))\left( \textstyle \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_i)_{+}\right) \right] .\nonumber \\ \end{aligned}$$
(27)

Note that

$$\begin{aligned}&\sum \limits _{x^{1}}P_{\theta }(x^0,x^1)\sum \limits _{i=1}^{n} \left[ x_{i}^{1}\left( \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_{i})_{+}\right) \right. \nonumber \\&\qquad \left. + \sigma (u_{i}(x^{0},\theta )) \left( \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_{i})_{-}\right) \right] \nonumber \\&\quad =\sum \limits _{i=1}^{n}\left( \sum \limits _{x^{1}}P_{\theta }(x^0,x^1)x_{i}^{1}\right) \left( \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_{i})_{+}\right) \nonumber \\&\qquad + \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_i)_{-}\right) \nonumber \\&\quad =\sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \sum \limits _{j=1}^{n}(v_{i,j})_{+}x^{0}_{j} + (v_i)_{+}\right) \nonumber \\&\qquad + \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta )) \left( \sum \limits _{j=1}^{n}(v_{i,j})_{-}x^{0}_{j} + (v_{i})_{-}\right) \nonumber \\&\quad = \sum \limits _{i=1}^{n}\sigma (u_{i}(x^{0},\theta ))|v_i| + \sum \limits _{i=1}^{n}\sum \limits _{j=1}^{n} \sigma (u_{i}(x^{0},\theta ))|v_{i,j}|x^{0}_{j}. \end{aligned}$$
(28)

Combining (27) with (28) and the definitions (13) and (14) we obtain (12).
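The structure of (27), a difference of two nonnegative terms, is the usual weak-derivative (Hahn–Jordan) decomposition: the derivative of an expectation is rewritten as a constant times the difference of expectations under two probability measures. A minimal finite-state sketch of that idea (a generic illustration, not the paper's estimator):

```python
def weak_derivative(grad_p):
    # Hahn-Jordan decomposition of the derivative of a finite probability
    # vector: grad_p = c * (p_plus - p_minus), where p_plus and p_minus are
    # probability vectors (assumes grad_p is not identically zero)
    pos = [max(g, 0.0) for g in grad_p]
    neg = [max(-g, 0.0) for g in grad_p]
    c = sum(pos)  # equals sum(neg), since the probabilities sum to 1
    return c, [g / c for g in pos], [g / c for g in neg]

# Bernoulli(theta): p_theta = (1 - theta, theta), so d/dtheta = (-1, 1)
grad_p = [-1.0, 1.0]
e = [2.0, 5.0]  # a cost attached to the two states
c, p_plus, p_minus = weak_derivative(grad_p)
mvd = c * (sum(a * b for a, b in zip(e, p_plus))
           - sum(a * b for a, b in zip(e, p_minus)))
direct = sum(a * g for a, g in zip(e, grad_p))
print(mvd, direct)  # 3.0 3.0
```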

A.2 Derivation of Eqs. 17 and 18

We have

$$\begin{aligned} Q(x_1)= & {} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} Q(x_1,x_2,\ldots ,x_n) \nonumber \\= & {} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c} \prod _{i=1}^{n} \beta _i^{x_i} (1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=1}^{n}\alpha _{i}x_i \right) \nonumber \\= & {} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c} \prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \beta _1^{x_1}(1-\beta _1)^{1-x_{1}} \left( d + \alpha _1x_1 + \sum \limits _{i=2}^{n}\alpha _{i}x_i \right) \nonumber \\= & {} \beta _1^{x_1}(1-\beta _1)^{1-x_1} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c} \prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \alpha _1x_1 + \sum \limits _{i=2}^{n}\alpha _{i}x_i \right) \nonumber \\= & {} \beta _1^{x_1}(1-\beta _1)^{1-x_1} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c}\prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=2}^{n}\alpha _{i}x_i \right) \nonumber \\&\qquad + \beta _1^{x_1}(1-\beta _1)^{1-x_1} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c}\prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i}\alpha _1x_1 \nonumber \\= & {} \beta _1^{x_1}(1-\beta _1)^{1-x_1} \alpha _1x_1\frac{1}{c}\nonumber \\&\qquad + \beta _1^{x_1}(1-\beta _1)^{1-x_1} \frac{1}{c} \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i}\left( d + \sum \limits _{i=2}^{n}\alpha _{i}x_i \right) .\nonumber \\ \end{aligned}$$
(29)

To simplify this equation, note that for \(n>1\),

$$\begin{aligned}&\sum \limits _{x_1 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \prod _{i=1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=1}^{n}\alpha _ix_i\right) \nonumber \\&\quad = \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \left[ \beta _1\prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=2}^n\alpha _ix_i + \alpha _1\right) \right. \nonumber \\&\qquad \left. + (1-\beta _1)\prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=2}^n\alpha _ix_i\right) \right] \nonumber \\&\quad =\sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}}\left[ \beta _1\alpha _1\prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} + \prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=2}^n\alpha _ix_i\right) \right] \nonumber \\&\quad = \beta _1\alpha _1 + \sum \limits _{x_2 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \prod _{i=2}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=2}^n\alpha _ix_i\right) ,\nonumber \\ \end{aligned}$$
(30)

and if \(n=1\) then

$$\begin{aligned} \begin{aligned} \sum \limits _{x_1 \in \{0,1\}}\prod _{i=1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=1}^{n}\alpha _ix_i\right)&= \beta _1(d + \alpha _1) + (1-\beta _1)d \\&= \beta _1d + \beta _1\alpha _1 + d - \beta _1d = \beta _1\alpha _1 + d. \end{aligned} \end{aligned}$$
(31)

Combining Eqs. 30 and 31, we see that for any \(n\ge 1\),

$$\begin{aligned} \sum \limits _{x_1 \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \prod _{i=1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=1}^{n}\alpha _ix_i \right) = d + \sum \limits _{i=1}^{n}\beta _i\alpha _i. \end{aligned}$$
(32)
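Equation (32) is the statement that the expectation of an affine function of independent Bernoulli(\(\beta _i\)) variables equals \(d + \sum _i \beta _i\alpha _i\); a brute-force enumeration confirms it (helper name is ours):

```python
from itertools import product

def bernoulli_affine_expectation(beta, alpha, d):
    # brute-force LHS of Eq. (32): enumerate every x in {0,1}^n
    total = 0.0
    for x in product((0, 1), repeat=len(beta)):
        w = 1.0
        for bi, xi in zip(beta, x):
            w *= bi if xi else (1 - bi)
        total += w * (d + sum(a * xi for a, xi in zip(alpha, x)))
    return total

beta = [0.2, 0.7, 0.5]
alpha = [1.0, -3.0, 4.0]
d = 2.5
closed = d + sum(b * a for b, a in zip(beta, alpha))  # RHS of Eq. (32)
print(abs(bernoulli_affine_expectation(beta, alpha, d) - closed) < 1e-12)  # True
```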

Combining Eqs. 29 and 32,

$$\begin{aligned} Q(x_1)&= \beta _1^{x_1}(1-\beta _1)^{1-x_1}\frac{\alpha _1x_1}{c} + \beta _1^{x_1}(1-\beta _1)^{1-x_1}\frac{1}{c} \left( d + \sum \limits _{i=2}^{n}\beta _i\alpha _i\right) \\&= \beta _1^{x_1}(1-\beta _1)^{1-x_1}\frac{1}{c} \left[ d + \alpha _1x_1 + \sum \limits _{i=2}^{n}\beta _i\alpha _i\right] . \end{aligned}$$

In general,

$$\begin{aligned}&Q(x_k, x_{k-1},\ldots , x_1) \\&\quad = \textstyle \sum \limits _{x_{k+1} \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} Q(x_1,\ldots ,x_k,x_{k+1},\ldots ,x_n) \\&\quad = \textstyle \sum \limits _{x_{k+1} \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c}\prod _{i=1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=1}^{n}\alpha _{i}x_i\right) \\&\quad = \textstyle \sum \limits _{x_{k+1} \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c} \prod _{i=k+1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \\&\quad \quad \times \beta _k^{x_k}(1-\beta _k)^{1-x_k} \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( \sum \limits _{i=1}^{k-1}\alpha _ix_i + \alpha _kx_k + d + \sum \limits _{i=k+1}^{n}\alpha _{i}x_i \right) \\&\quad = \textstyle \sum \limits _{x_{k+1} \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c} \prod _{i=k+1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \beta _k^{x_k}(1-\beta _k)^{1-x_k}\\&\qquad \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( \alpha _kx_k + \sum \limits _{i=1}^{k-1}\alpha _ix_i\right) \\&\qquad + \textstyle \sum \limits _{x_{k+1} \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}} \frac{1}{c} \prod _{i=k+1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \beta _k^{x_k}(1-\beta _k)^{1-x_k}\\&\qquad \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=k+1}^{n}\alpha _{i}x_i \right) \\&\quad = \textstyle \beta _k^{x_k}(1-\beta _k)^{1-x_k}\frac{1}{c} \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( \alpha _kx_k + \sum \limits _{i=1}^{k-1}\alpha _ix_i\right) \\&\qquad + \textstyle \beta _k^{x_k}(1-\beta _k)^{1-x_k}\frac{1}{c} \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \sum \limits _{x_{k+1} \in \{0,1\}}\ldots \sum \limits _{x_n \in \{0,1\}}\\&\qquad \prod _{i=k+1}^{n}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d + \sum \limits _{i=k+1}^{n}\alpha _{i}x_i \right) \\&\quad = \textstyle \beta _k^{x_k}(1-\beta _k)^{1-x_k}\frac{1}{c} \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( \alpha _kx_k + \sum \limits _{i=1}^{k-1}\alpha _ix_i\right) \\&\qquad + \textstyle \beta _k^{x_k}(1-\beta _k)^{1-x_k}\frac{1}{c} \prod _{i=1}^{k-1}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d +\sum \limits _{i=k+1}^{n}\beta _i\alpha _i\right) \\&\quad = \textstyle \frac{1}{c} \prod _{i=1}^{k}\beta _i^{x_i}(1-\beta _i)^{1-x_i} \left( d+ \sum \limits _{i=1}^{k}\alpha _ix_i + \sum \limits _{i=k+1}^{n}\beta _i\alpha _i\right) . \end{aligned}$$
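The closed-form marginal above can be checked against brute-force marginalization of \(Q\) for small \(n\) (function names are ours; \(c\) is treated as a given normalizing constant):

```python
from itertools import product

def Q(x, beta, alpha, d, c):
    # Q(x) = (1/c) * prod_i beta_i^{x_i}(1-beta_i)^{1-x_i} * (d + sum_i alpha_i x_i)
    w = 1.0
    for bi, xi in zip(beta, x):
        w *= bi if xi else (1 - bi)
    return w * (d + sum(a * xi for a, xi in zip(alpha, x))) / c

def marginal_bruteforce(xk, beta, alpha, d, c):
    # sum Q over the unfixed coordinates x_{k+1}, ..., x_n
    n, k = len(beta), len(xk)
    return sum(Q(list(xk) + list(rest), beta, alpha, d, c)
               for rest in product((0, 1), repeat=n - k))

def marginal_closed(xk, beta, alpha, d, c):
    # the closed form derived above
    k = len(xk)
    w = 1.0
    for bi, xi in zip(beta[:k], xk):
        w *= bi if xi else (1 - bi)
    return w * (d + sum(a * xi for a, xi in zip(alpha[:k], xk))
                + sum(b * a for b, a in zip(beta[k:], alpha[k:]))) / c

beta = [0.3, 0.8, 0.6, 0.5]
alpha = [2.0, -1.0, 0.5, 3.0]
d, c = 1.0, 4.2
xk = (1, 0)
print(abs(marginal_bruteforce(xk, beta, alpha, d, c)
          - marginal_closed(xk, beta, alpha, d, c)) < 1e-12)  # True
```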


About this article


Cite this article

Flynn, T., Vázquez-Abad, F. A simultaneous perturbation weak derivative estimator for stochastic neural networks. Comput Manag Sci 16, 715–738 (2019). https://doi.org/10.1007/s10287-019-00357-1
