# Stochastic Adaptive Control

**DOI:** https://doi.org/10.1007/978-1-4471-5102-9_231-1

## Definition

Stochastic adaptive control denotes the control of partially known stochastic control systems. The stochastic control systems can be described by discrete- or continuous-time Markov chains or Markov processes, linear and nonlinear difference equations, and linear and nonlinear stochastic differential equations. The solution of a stochastic adaptive control problem typically requires the identification of the partially known stochastic system and the simultaneous control of that system using the information from the concurrent identification scheme. Two desirable goals for the solution of a stochastic adaptive control problem are called self-tuning and self-optimality. Self-tuning denotes the convergence of the family of adaptive controls indexed by time to the optimal control for the true system. Self-optimality denotes the convergence of the long-run average costs to the optimal long-run average cost for the true system; an adaptive control with this property is called self-optimizing. Typically, to achieve self-optimality, it is important that the family of parameter estimators from the identification scheme be strongly consistent, that is, that this family converges (almost surely) to the true parameter values. Thus, with self-optimality, a partially known system can asymptotically be controlled as well as the corresponding known system.
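As a sketch of these three notions in symbols, assume (purely for illustration; none of this notation appears above) that the true parameter is \(\theta_0\), the estimate at time \(t\) is \(\hat{\theta}(t)\), the adaptive control is a feedback \(U(t) = K(t)X(t)\) with optimal gain \(K^0\) for the true system, and the running cost is \(c(x, u)\) with optimal long-run average cost \(J^*\). Under these assumptions the definitions can be written as:

\[
\begin{aligned}
&\text{strong consistency:} && \lim_{t \to \infty} \hat{\theta}(t) = \theta_0 \quad \text{a.s.}, \\
&\text{self-tuning:} && \lim_{t \to \infty} K(t) = K^0 \quad \text{a.s.}, \\
&\text{self-optimality:} && \limsup_{T \to \infty} \frac{1}{T} \int_0^T c(X(t), U(t))\,dt = J^* \quad \text{a.s.}
\end{aligned}
\]

The display uses continuous time; in discrete time the integral is replaced by a sum.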

## Motivation and Background

In almost every formulation of a stochastic control problem from a physical system, the physical system is incompletely known so the stochastic system model is only partially known. This lack of knowledge can often be described by some unknown parameters for a mathematical model, and the noise inputs for the model can describe unmodeled dynamics or perturbations to the system. The lack of knowledge of some parameters of the model can be modeled either by random variables with known prior distributions or as fixed unknown values. The former description requires Bayesian estimation, and the latter description requires parameter estimation such as least squares or maximum likelihood.

Stochastic adaptive control arose as a natural evolution from the results in stochastic control, and in particular it developed for some well-known control problems. The optimal control of Markov chains had been developed for some time, so it was natural to investigate the adaptive control of Markov chains. Mandl (1973) was probably the first to consider this adaptive control problem in generality. His conditions for strong consistency of a family of estimators were fairly restrictive. Borkar and Varaiya (1982) simplified the conditions for the estimation part of the problem by only requiring convergence of the estimators of the parameters so that the resulting transition probabilities of the Markov chain are identical to the transition probabilities for the true optimal solution.

A second major direction for stochastic adaptive control is described by ARMAX (autoregressive moving average with exogenous inputs) models. These are discrete-time models that can be described in terms of polynomials in a time shift operator. A closely related and often equivalent model is a multidimensional linear difference equation in state-space form. Since the solution of the infinite time horizon stochastic control problem was available in the late 1950s, it was natural to consider the adaptive control problem. Methods such as least squares, weighted least squares, maximum likelihood, and stochastic approximation were used for parameter identification, together with a certainty equivalence adaptive control for the system, that is, a control computed by using the current estimate of the parameters as if it were the true parameters; self-optimality was then verified for the resulting scheme. An important development in stochastic adaptive control is a result called the self-tuning regulator, where the convergence of estimators of unknown parameters implied the convergence of the output tracking error (Astrom and Wittenmark 1973; Goodwin et al. 1981; Guo 1995, 1996; Guo and Chen 1991; Kumar 1990).
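The certainty equivalence idea can be illustrated on a much simpler model than a general ARMAX system. The following is a minimal sketch, assuming a first-order model y[t+1] = a·y[t] + b·u[t] + w[t+1] with white noise w, ordinary recursive least squares for the estimates, and a regulation target of zero. The function name, the default parameter values, the guard on the estimate of b, and the control clipping are all hypothetical choices made for this illustration, not part of any cited algorithm.

```python
import numpy as np


def simulate_self_tuning_regulator(a=0.9, b=0.5, sigma=0.1, steps=2000, seed=0):
    """Certainty-equivalence regulation of y[t+1] = a*y[t] + b*u[t] + w[t+1].

    The controller does not know (a, b); it uses the current recursive least
    squares estimate as if it were the true parameter.
    """
    rng = np.random.default_rng(seed)
    theta = np.array([0.0, 1.0])  # initial guess for (a, b); an assumption
    P = 100.0 * np.eye(2)         # large initial "covariance" for the estimator
    y = 1.0
    sum_sq_output = 0.0
    for _ in range(steps):
        # Guard against dividing by an estimate of b that is close to zero
        # (an ad hoc safeguard for this sketch).
        b_hat = theta[1] if abs(theta[1]) > 1e-3 else 1e-3
        # Certainty-equivalence control: choose u so the predicted output is 0.
        u = float(np.clip(-theta[0] / b_hat * y, -10.0, 10.0))
        phi = np.array([y, u])
        y_next = a * y + b * u + sigma * rng.standard_normal()
        # Recursive least squares update with the new observation.
        gain = P @ phi / (1.0 + phi @ P @ phi)
        theta = theta + gain * (y_next - phi @ theta)
        P = P - np.outer(gain, phi @ P)
        y = y_next
        sum_sq_output += y * y
    return theta, sum_sq_output / steps


if __name__ == "__main__":
    estimate, avg_sq_output = simulate_self_tuning_regulator()
    print("estimate of (a, b):", estimate)
    print("average squared output:", avg_sq_output, "noise variance:", 0.1 ** 2)
```

In this sketch the quantity to watch is the average squared output, which should be compared with the noise variance `sigma**2`. Because the regressor is generated in closed loop, the individual estimates of a and b need not be accurate even when the output is regulated well.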

A number of monographs treat various aspects of stochastic adaptive control problems, e.g., Astrom and Wittenmark (1989), Chen and Guo (1991), Kumar and Varaiya (1986), and Ljung and Soderstrom (1983). An extensive survey article on the early years of stochastic adaptive control is given by Kumar (1985).

## Structures and Approaches

Various requirements can be made for the adaptive control of a stochastic system. It may be required only that the family of adaptive controls stabilizes the unknown system, or that the family of adaptive controls converges to the optimal control for the true system, or that the family of adaptive controls has a long-run average cost equal to the optimal average cost for the true system. The identification part of the adaptive control problem can be Bayesian estimation (Kumar 1990) if the parameters are assumed to be random variables or parameter estimation (Bercu 1995; Lai and Wei 1982) if the parameters are assumed to be unknown constants. The identification scheme may also incorporate information about the running cost.

For linear systems with white noise inputs, least squares (or equivalently maximum likelihood) estimation is the standard method for estimating parameters. However, for stochastic adaptive control problems, the sufficient conditions for the family of estimators to be strongly consistent are fairly restrictive (e.g., Lai and Wei 1982), and in fact the family of estimators may not even converge in general. A weighted least squares estimation scheme can guarantee convergence of the family of estimators (Bercu 1995) and is often strongly consistent (Guo 1996). Other estimation methods include stochastic approximation (Guo and Chen 1991) and an ordinary differential equation approach (Ljung and Soderstrom 1983). For discrete-time nonlinear systems, a family of strongly consistent estimators may not converge rapidly enough even to stabilize the nonlinear system (Guo 1997).
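To make the difference between ordinary and weighted least squares concrete, here is a minimal sketch of a recursive weighted least squares estimator for a regression y = φᵀθ + noise. The weight sequence 1/log^(1+δ)(r), with r equal to e plus the accumulated squared norm of the regressors, is an assumption chosen for illustration as one example of a slowly decreasing weight; the function names and the synthetic data are hypothetical.

```python
import math

import numpy as np


def weighted_rls(phis, ys, delta=0.5, p0=100.0):
    """Recursive weighted least squares for y_t = phi_t . theta + noise.

    Each observation receives the weight 1 / log(r_t)**(1 + delta), where
    r_t = e + sum of |phi_s|^2 up to time t (an illustrative choice).
    """
    dim = phis.shape[1]
    theta = np.zeros(dim)
    P = p0 * np.eye(dim)
    r = math.e  # starting at e keeps log(r) >= 1, so the weights stay in (0, 1]
    for phi, y in zip(phis, ys):
        r += float(phi @ phi)
        weight = 1.0 / math.log(r) ** (1.0 + delta)
        P_phi = P @ phi
        # Gain of the weighted recursion; weight = 1 gives ordinary RLS.
        gain = P_phi / (1.0 / weight + phi @ P_phi)
        theta = theta + gain * (y - phi @ theta)
        P = P - np.outer(gain, P_phi)
    return theta


def demo(seed=1, n=5000):
    """Synthetic open-loop data with well-excited regressors."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([0.9, 0.5])
    phis = rng.standard_normal((n, 2))
    ys = phis @ theta_true + 0.1 * rng.standard_normal(n)
    return theta_true, weighted_rls(phis, ys)


if __name__ == "__main__":
    truth, estimate = demo()
    print("true parameter:", truth, "estimate:", estimate)
```

The demo uses independent, well-excited regressors, a setting in which ordinary least squares would also work. The weighting matters in closed loop, where the regressors depend on past estimates, and this sketch does not try to reproduce that situation.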

The study of stochastic adaptive control of continuous-time linear stochastic systems with long-run average quadratic costs developed somewhat after the corresponding discrete-time study (e.g., Duncan and Pasik-Duncan 1990). A solution that uses a weighted least squares identification scheme and requires essentially only the natural assumptions from the known system problem is given in Duncan et al. (1999).

Another family of stochastic adaptive control problems is described by linear stochastic equations in an infinite dimensional Hilbert space. These models can describe stochastic partial differential equations and stochastic hereditary differential equations. Some linear-quadratic-Gaussian control problems have been solved, and these solutions have been used to solve some corresponding stochastic adaptive control problems (e.g., Duncan et al. 1994a).

Optimal control methods such as Hamilton-Jacobi-Bellman equations and a stochastic maximum principle have been used to solve stochastic control problems described by nonlinear stochastic differential equations (Fleming and Rishel 1975). Thus, it was natural to consider stochastic adaptive control problems for these systems. The results are more limited than the results for linear stochastic systems (e.g., Duncan et al. 1994b).

Other stochastic adaptive control problems have recently emerged for multi-agent systems, such as mean field stochastic adaptive control problems (e.g., Nourian et al. 2012).

## A Detailed Example: Adaptive Linear-Quadratic-Gaussian Control

The controlled system is described by the linear stochastic differential equation

\[
dX(t) = AX(t)\,dt + BU(t)\,dt + C\,dW(t),
\]

where \(X(t) \in \mathbb{R}^{n}\), \(U(t) \in \mathbb{R}^{m}\), \((W(t), t \geq 0)\) is an \(\mathbb{R}^{p}\)-valued standard Brownian motion, and \((A, B, C)\) are appropriate linear transformations. \(X(t)\) is the state of the system at time \(t\) and \(U(t)\) is the control at time \(t\). It is assumed that \(A, B, C\) are unknown linear transformations. The cost functional, \(J(\cdot)\), is a long-run average (ergodic) quadratic cost functional that is given by

\[
J(U) = \limsup_{T \to \infty} \frac{1}{T} \int_0^T \left( \langle QX(t), X(t) \rangle + \langle RU(t), U(t) \rangle \right) dt,
\]

where \(R > 0\) and \(Q \geq 0\) are symmetric linear transformations and \(\langle \cdot, \cdot \rangle\) is the canonical inner product in the appropriate Euclidean space. The standard assumptions for the control of the known system are made also for the adaptive control problem, that is, the pair \((A, B)\) is controllable and \((A, Q^{\frac{1}{2}})\) is observable. An optimal control for the known system is

\[
U^0(t) = -R^{-1}B^T S X(t),
\]

where \(S\) is the unique positive, symmetric solution of the following algebraic Riccati equation:

\[
A^T S + SA - SBR^{-1}B^T S + Q = 0.
\]

\(C^T C\) can be identified given \((X(t), t \in [a, b])\) for \(a < b\) arbitrary from the quadratic variation of Brownian motion, so the identification of \(C\) is not considered here. Since it is assumed that the pair \((A, B)\) is unknown, the system equation is rewritten in the following form:

\[
dX(t) = \theta^T \varphi(t)\,dt + C\,dW(t),
\]

where \(\theta^T = [A \ B]\) and \(\varphi^T(t) = [X^T(t) \ U^T(t)]\). A family of continuous-time weighted least squares recursive estimators \((\theta(t), t \geq 0)\) of \(\theta\) is given by the following stochastic equation:

\[
\begin{aligned}
d\theta(t) &= a(t)P(t)\varphi(t)\left[dX^T(t) - \varphi^T(t)\theta(t)\,dt\right], \\
dP(t) &= -a(t)P(t)\varphi(t)\varphi^T(t)P(t)\,dt,
\end{aligned}
\]

where \((a(t), t \geq 0)\) is a suitable family of positive stochastic weights (Duncan et al. 1999). A family of estimates \((\hat{\theta}(t), t \geq 0)\) is obtained from \((\theta(t), t \geq 0)\) and is expressed as \(\hat{\theta}(t) = [A(t) \ B(t)]\) (Duncan et al. 1999). A process \((S(t), t \geq 0)\) is obtained using \((A(t), B(t))\) by solving the following stochastic algebraic Riccati equation for each \(t \geq 0\):

\[
A^T(t)S(t) + S(t)A(t) - S(t)B(t)R^{-1}B^T(t)S(t) + Q = 0.
\]

The certainty equivalence principle assumes that the pair \((A(t), B(t))\) is the correct pair for the true system, so a certainty equivalence adaptive control \(U(t)\) is given by

\[
U(t) = -R^{-1}B^T(t)S(t)X(t).
\]

It can be shown (Duncan et al. 1999) that the family of estimators \(((A(t), B(t)), t \geq 0)\) is strongly consistent and that the family of adaptive controls given by the previous equality is self-optimizing, that is, the long-run average cost satisfies \(J(U) = J(U^0) = \operatorname{tr}(C^T S C)\), where \(S\) is the solution of the algebraic Riccati equation for the true system.
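The structure of this example can be illustrated numerically in the scalar case. The following is a rough sketch under several assumptions that are not taken from Duncan et al. (1999): the state and control are one-dimensional, the stochastic differential equation is discretized by an Euler-Maruyama step, the weights are 1/log^(1+δ)(r(t)) with r(t) equal to e plus the integral of |φ|², the estimator is advanced with a discrete weighted least squares step that approximates the continuous-time recursion, and the estimate of the control coefficient is simply kept above a small positive threshold so that the scalar Riccati equation remains solvable. All function names and parameter values are hypothetical.

```python
import math

import numpy as np


def riccati_scalar(a, b, q, r):
    """Positive root s of the scalar equation 2*a*s - (b*s)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def simulate_adaptive_lq(a=0.5, b=1.0, c=0.5, q=1.0, r=1.0,
                         horizon=200.0, dt=0.01, delta=0.5, seed=0):
    """Certainty-equivalence control of dX = (a X + b U) dt + c dW, (a, b) unknown."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.0, 1.0])  # initial guess for (a, b); an assumption
    P = np.eye(2)
    info = math.e                 # r(t) = e + integral of |phi|^2 (assumed form)
    x, cost = 1.0, 0.0
    for _ in range(int(round(horizon / dt))):
        a_hat = theta[0]
        b_hat = max(theta[1], 0.1)  # crude guard; presumes the sign of b is known
        feedback = b_hat * riccati_scalar(a_hat, b_hat, q, r) / r
        u = -feedback * x           # certainty-equivalence control
        dx = (a * x + b * u) * dt + c * math.sqrt(dt) * rng.standard_normal()
        phi = np.array([x, u])
        info += float(phi @ phi) * dt
        weight = 1.0 / math.log(info) ** (1.0 + delta)
        # Discrete weighted least squares step approximating the recursion
        # d(theta) = a(t) P phi (dX - phi^T theta dt), dP = -a(t) P phi phi^T P dt.
        P_phi = P @ phi
        P = P - (weight * dt) * np.outer(P_phi, P_phi) / (1.0 + weight * dt * (phi @ P_phi))
        theta = theta + weight * (P @ phi) * (dx - (phi @ theta) * dt)
        cost += (q * x * x + r * u * u) * dt
        x += dx
    optimal_cost = c * c * riccati_scalar(a, b, q, r)
    return theta, cost / horizon, optimal_cost


if __name__ == "__main__":
    estimate, average_cost, optimal_cost = simulate_adaptive_lq()
    print("estimate of (a, b):", estimate)
    print("empirical average cost:", average_cost, "optimal cost:", optimal_cost)
```

The printed optimal cost is the scalar version of tr(CᵀSC). Over a finite horizon the empirical average cost only approximates the long-run average, and this sketch carries none of the guarantees proved for the scheme in the cited paper.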

## Future Directions

A number of important directions for stochastic adaptive control are easily identified. Only three of them are described briefly here. The adaptive control of the partially observed linear-quadratic-Gaussian control problem (Fleming and Rishel 1975) is a major problem to be solved using the same assumptions of controllability and observability as for the known system. This problem is a generalization of the example given above where the output (linear transformation) of the system is observed with additive noise and the family of controls is restricted to depend only on these observations. Another major direction is to modify the detailed example above by replacing the Brownian motion in the stochastic equation for the state by an arbitrary fractional Brownian motion or by an arbitrary square-integrable stochastic process with continuous sample paths. For this latter problem it is necessary to use recent results for optimal controls for the true system and to have strongly consistent families of estimators. A third major direction is the adaptive control of nonlinear stochastic systems.

## Notes

### Acknowledgements

Research supported by NSF grant DMS 1108884, AFOSR grant FA9550-12-1-0384, and ARO grant W911NF-10-1-0248.

## References

- Astrom KJ, Wittenmark B (1973) On self-tuning regulators. Automatica 9:185–199
- Astrom KJ, Wittenmark B (1989) Adaptive control. Addison-Wesley, Reading
- Bercu B (1995) Weighted estimation and tracking for ARMAX models. SIAM J Control Optim 33:89–106
- Borkar V, Varaiya P (1982) Identification and adaptive control of Markov chains. SIAM J Control Optim 20:470–489
- Chen HF, Guo L (1991) Identification and stochastic adaptive control. Birkhauser, Boston
- Duncan TE, Pasik-Duncan B (1990) Adaptive control of continuous time linear systems. Math Control Signals Syst 3:43–60
- Duncan TE, Maslowski B, Pasik-Duncan B (1994a) Adaptive boundary and point control of linear stochastic distributed parameter systems. SIAM J Control Optim 32:648–672
- Duncan TE, Pasik-Duncan B, Stettner L (1994b) Almost self-optimizing strategies for the adaptive control of diffusion processes. J Optim Theory Appl 81:470–507
- Duncan TE, Guo L, Pasik-Duncan B (1999) Adaptive continuous-time linear quadratic Gaussian control. IEEE Trans Autom Control 44:1653–1662
- Fleming WH, Rishel RW (1975) Deterministic and stochastic optimal control. Springer, New York
- Goodwin G, Ramadge P, Caines PE (1981) Discrete time stochastic adaptive control. SIAM J Control Optim 19:820–853
- Guo L (1995) Convergence and logarithm laws of self-tuning regulators. Automatica 31:435–450
- Guo L (1996) Self-convergence of weighted least squares with applications. IEEE Trans Autom Control 41:79–89
- Guo L (1997) On critical stability of discrete time adaptive nonlinear control. IEEE Trans Autom Control 42:1488–1499
- Guo L, Chen HF (1991) The Astrom-Wittenmark self-tuning regulator revisited and ELS based adaptive trackers. IEEE Trans Autom Control 36:802–812
- Kumar PR (1985) A survey of some results in stochastic adaptive control. SIAM J Control Optim 23:329–380
- Kumar PR (1990) Convergence of adaptive control schemes with least squares estimates. IEEE Trans Autom Control 35:416–424
- Kumar PR, Varaiya P (1986) Stochastic systems, estimation, identification and adaptive control. Prentice-Hall, Englewood Cliffs
- Lai TL, Wei CZ (1982) Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann Stat 10:154–166
- Ljung L, Soderstrom T (1983) Theory and practice of recursive identification. MIT, Cambridge
- Mandl P (1973) On the adaptive control of finite state Markov processes. Z Wahr Verw Geb 27:263–276
- Nourian M, Caines PE, Malhame RP (2012) Mean field LQG control in leader-follower stochastic multi-agent systems: likelihood ratio based adaptation. IEEE Trans Autom Control 57:2801–2816