Nonlinear systems are everywhere, yet most engineering curricula devote very little time, if any, to them. The reason is twofold: first, there is no general theory describing arbitrary nonlinear systems; second, the theory of linear systems is effective in facilitating the design of many real-world systems. In fact, for sufficiently small input signals, the behaviour of most nonlinear systems can be approximated by a linear model. As a consequence, the majority of engineered systems are designed based on linear system theory, and their usability is limited in one way or another by the deviation of the real system from the assumed linear behaviour. This book is intended to give engineers a powerful tool to model, understand and reduce the impact of mild deviations from linear behaviour, and thereby to design better systems.

This chapter develops some intuition for what we call weakly-nonlinear systems. It also gives an idea of the theory to which this book is devoted and of its applicability. The chapter is not meant to introduce any concept in a precise way; the exposition is deliberately informal. A proper, systematic development of the theory starts with the next chapter.

1.1 Nonlinear Phenomena

The range of phenomena exhibited by nonlinear systems is much richer than that of linear systems. To understand the applicability of the presented theory it is useful to have an idea of the main phenomena that may appear. In the following we give a bird’s-eye view of them with qualitative descriptions.

1.1.1 Multiple Equilibrium Points

Most dynamical systems can be described by a system of differential equations that can be written in the form

$$\begin{aligned} \frac{\textrm{d}}{{\textrm{d}t}}u = f(u,x,t)\,, \end{aligned}$$

with \(u\in \mathbb {R}^n\) the state of the system and \(x\in \mathbb {R}^m\) the driving or input signal. For simplicity in this chapter we limit ourselves to autonomous systems. These are systems described by the simpler equation

$$\begin{aligned} \frac{\textrm{d}}{{\textrm{d}t}}u = f(u)\,. \end{aligned}$$
(1.1)

In the case of first- and second-order systems one can obtain a good qualitative understanding by examining the phase portrait of the system. This is a graphical representation of a family of state trajectories \(t \mapsto u(t)\) for various initial conditions \(u_0\), drawn in the plane spanned by the components \(u_1\) and \(u_2\) of u, which in this context is called the state or phase space. Note that the phase portrait can be sketched without solving the equation, by considering the vector field defined by f in the state plane. From it, the trajectory of the state u can easily be estimated for every initial condition \(u_0\).

Of special interest are the zeros of the vector function f, because in those states the time derivative of the state vector u vanishes. In other words, the zeros of f are the equilibrium points of the system. A nonlinear function f in general has several zeros, and this is a first fundamental difference from linear systems, which generically have a single equilibrium point.

If f is well-behaved,Footnote 1 for every initial state \(u_0\) the system (1.1) has a unique solution. This means that the trajectories in the phase plane do not intersect. Therefore, trajectories can only begin or end at equilibrium points, at infinity, or on limit cycles (see below).

Fig. 1.1 Pendulum in Earth’s gravitational field

As an example, consider the ideal frictionless pendulum shown in Fig. 1.1, described by the differential equation

$$\begin{aligned} \frac{{\textrm{d}}^{2}}{{\textrm{d}t}^{2}}\phi + \omega _0^2 \sin (\phi ) = 0\,, \qquad \omega _0 = \sqrt{\frac{g}{l}} \end{aligned}$$

with \(\phi \) the angle from the vertical, g the gravitational acceleration and l the length of the arm. The equation can be rewritten as

$$\begin{aligned} \frac{\textrm{d}}{{\textrm{d}t}}\begin{pmatrix} u_1\\ u_2 \end{pmatrix} = \begin{pmatrix} u_2\\ -\omega _0^2 \sin (u_1) \end{pmatrix} \end{aligned}$$

where we have set \(u_1 = \phi \), \(u_2 = \textrm{d}\phi /\textrm{d}t\). The phase portrait of this system is clearly periodic along the \(u_1\) axis. We can therefore limit the study to the range \(u_1 \in [-\pi , \pi )\).Footnote 2 In this range the system has two equilibrium points: \(u_{0a}=(0, 0)\) and \(u_{0b} = (\pi , 0)\).
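
As a concrete illustration of how such a phase portrait can be produced numerically, the following minimal Python sketch (our own, assuming NumPy, SciPy and Matplotlib are available) draws the vector field defined by f and overlays a few trajectories obtained by integrating the state equation:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

w0 = 1.0  # natural frequency omega_0 in rad/s, as in Fig. 1.2

def f(t, u):
    # State equation of the ideal pendulum: u1' = u2, u2' = -w0^2 sin(u1)
    return [u[1], -w0**2 * np.sin(u[0])]

# Vector field defined by f on a grid of the phase plane
u1, u2 = np.meshgrid(np.linspace(-np.pi, np.pi, 30), np.linspace(-3, 3, 30))
plt.streamplot(u1, u2, u2, -w0**2 * np.sin(u1), density=1.2)

# A few trajectories for different initial conditions u0
for u0 in [(1.0, 0.0), (2.5, 0.0), (0.0, 2.5)]:
    sol = solve_ivp(f, (0, 20), u0, max_step=0.05)
    plt.plot(sol.y[0], sol.y[1])

plt.xlabel(r'$u_1 = \phi$')
plt.ylabel(r'$u_2 = \mathrm{d}\phi/\mathrm{d}t$')
plt.show()
```

Initial conditions of small energy produce the closed curves around (0, 0) discussed below, while energetic ones produce the rotating trajectories above and below the separatrix.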

Fig. 1.2 Phase portrait of an ideal pendulum with \(\omega _0=1\)

The phase portrait of this system is depicted in Fig. 1.2 with the equilibrium points shown as black dots. The dashed lines connect the two equilibrium points and separate the phase plane into two distinct regions in which the system behaves differently. The boundary between the two regions (the curve formed by the dashed lines) is called the separatrix. Trajectories surrounding the equilibrium point \(u_{0a}\) are closed curves representing oscillations. The trajectories above and below the separatrix represent the pendulum perpetually rotating around the pivot.

This illustrates a second fundamental difference from linear systems: nonlinear systems can exhibit different behaviour and characteristics in different regions of the phase space.

1.1.2 Limit Cycles

A further phenomenon present in some nonlinear systems, and absent in linear ones, is that of limit cycles. These are isolated periodic solutions of the equations occurring at specific signal levels. As a simple example, consider the oscillator shown in Fig. 1.3a. It consists of a passive RLC resonator and a nonlinear saturating voltage-controlled current source (VCCS) with characteristic

Fig. 1.3 (a) Electrical RLC oscillator with nonlinear feedback. (b) Voltage-controlled current-source characteristic

$$\begin{aligned} i(v) = I_0 \tanh \Big (\frac{v}{V_s}\Big ) \end{aligned}$$

and plotted in Fig. 1.3b. The system is described by

$$\begin{aligned} \frac{{\textrm{d}}^{2}}{{\textrm{d}t}^{2}}v + \frac{\omega _0}{q} \Big [ 1 - G_m(v) R \Big ] \frac{\textrm{d}}{{\textrm{d}t}}v + \omega _0^2 v = 0. \end{aligned}$$
(1.2)

with

$$\begin{aligned} \omega _0 &= \frac{1}{\sqrt{LC}} & q &= \frac{R}{\omega _0 L} \end{aligned}$$

and the nonlinear transconductance

$$\begin{aligned} G_m(v) = \frac{\textrm{d}}{\textrm{d}v} i(v) = \frac{I_0}{V_s} \textrm{sech}^2\Big (\frac{v}{V_s}\Big ). \end{aligned}$$

If the maximum of \(|v(t)|\) over a full period \({\mathcal {T}}=2\pi /\omega _0\) remains small compared to \(V_s\), then the value of \(\textrm{sech}^2(v(t)/V_s)\) remains very nearly 1 over a full cycle. Under this assumption, if \(G_m(0) R > 1\) the coefficient of the first-order derivative of v in (1.2) is negative and the (0, 0) equilibrium point of the equation is unstable. Conversely, if the maximum of \(|v(t)|\) over a period is much larger than \(V_s\), then the value of \(\textrm{sech}^2(v(t)/V_s)\) approaches 0 for most of the period. In this regime of operation the system is governed by an equation corresponding to that of a damped oscillator. Between these two extreme cases there is a periodic trajectory, a limit cycle. On this trajectory the energy dissipated during one cycle by the resistor R is perfectly balanced by the energy injected into the resonator by the controlled source. This behaviour of the system is clearly discernible in the phase portrait shown in Fig. 1.4, in which we chose the current flowing through the inductor (downwards) \(i_L\) and v as state variables.
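
The convergence onto the limit cycle can be observed with a short numerical experiment. The sketch below (ours; it uses v and dv/dt as state variables instead of the pair (v, \(i_L\)) of Fig. 1.4, and the parameter values from the figure caption) integrates (1.2) from a very small and a very large initial amplitude; both trajectories settle onto the same closed curve:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

w0, q, R, Vs = 1.0, 4.0, 1.0, 1.0  # rad/s, -, ohm, V (as in Fig. 1.4)
I0 = 3 * Vs / R

def Gm(v):
    # Nonlinear transconductance: (I0/Vs) * sech^2(v/Vs)
    return (I0 / Vs) / np.cosh(v / Vs)**2

def f(t, u):
    v, dv = u
    # Second-order equation (1.2) rewritten as a first-order system
    return [dv, -(w0 / q) * (1 - Gm(v) * R) * dv - w0**2 * v]

# Spiral out from near the unstable equilibrium and in from a large swing
for u0 in [(0.01, 0.0), (6.0, 0.0)]:
    sol = solve_ivp(f, (0, 80), u0, max_step=0.02)
    plt.plot(sol.y[0], sol.y[1])

plt.xlabel('v')
plt.ylabel('dv/dt')
plt.show()
```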

Fig. 1.4 Phase portrait of an electrical oscillator. \(\omega _0=1\,\text {rad/s}, q = 4, R = 1~\Omega , V_s=1~\text {V}, I_0 = 3V_s/R\)

This example has a stable limit cycle, but there are systems with unstable limit cycles: any infinitesimally small deviation from the perfectly periodic trajectory leads to a trajectory diverging from the limit cycle. Limit cycles can also be stable on one side and unstable on the other.

1.1.3 Bifurcations

All practical systems depend upon some parameters. For example, the oscillator of the previous section depends on the value of the resistor R, and it is interesting to study how the value of that parameter affects the behaviour of the system. In particular, the number and type of equilibrium points of a system may depend on the value of some parameter. This is in fact the case for our oscillator: for \(G_m(0)R < 1\) the system has a single stable equilibrium point, while for \(G_m(0)R > 1\) that equilibrium point becomes unstable and a limit cycle appears. Parameter values at which the character of the system behaviour changes are called critical or bifurcation points.

Fig. 1.5 (a) Pitchfork bifurcation potential. (b) Pitchfork bifurcation equilibrium points

As a second example, consider a system described by the differential equation

$$\begin{aligned} \frac{{\textrm{d}}^{2}}{{\textrm{d}t}^{2}}u = \lambda u - u^3. \end{aligned}$$

The system can be interpreted as having a potential energy \(U_\lambda \), so that \(\frac{{\textrm{d}}^{2}}{{\textrm{d}t}^{2}}u = -\textrm{d}U_\lambda /\textrm{d}u\), with

$$\begin{aligned} U_\lambda (u) = \frac{u^4}{4} - \lambda \frac{u^2}{2}. \end{aligned}$$

For \(\lambda < 0\) the potential energy has a single minimum at \(u=0\), while for \(\lambda > 0\) it has two minima, as shown in Fig. 1.5a. In the latter case \(u=0\) is an unstable equilibrium point and two new stable equilibrium points appear at \(\pm \sqrt{\lambda }\). If we draw the equilibrium points of the system as a function of \(\lambda \) we obtain the so-called pitchfork shown in Fig. 1.5b.
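
The pitchfork diagram itself is easily reconstructed (a sketch of ours, not the book’s code): for each \(\lambda \) we plot the equilibria and classify their stability by the sign of \({\textrm{d}}^{2}U_\lambda /{\textrm{d}u}^{2} = 3u^2 - \lambda \):

```python
import numpy as np
import matplotlib.pyplot as plt

lam = np.linspace(-1, 1, 400)

# u = 0 is an equilibrium for every lambda:
# U'' = -lambda, so it is stable for lambda < 0 and unstable for lambda > 0
plt.plot(lam[lam < 0], np.zeros_like(lam[lam < 0]), 'b-', label='stable')
plt.plot(lam[lam > 0], np.zeros_like(lam[lam > 0]), 'r--', label='unstable')

# For lambda > 0 two stable equilibria appear at +/- sqrt(lambda)
lp = lam[lam > 0]
plt.plot(lp, np.sqrt(lp), 'b-')
plt.plot(lp, -np.sqrt(lp), 'b-')

plt.xlabel(r'$\lambda$')
plt.ylabel('equilibrium $u$')
plt.legend()
plt.show()
```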

1.1.4 Chaos

Consider the driven pendulum shown in Fig. 1.6. It is similar to the one of Sect. 1.1.1, with the difference that the pivot now moves vertically in time, as described by the function \(y_p\). This movement introduces a driving term in the differential equation, which then becomes

Fig. 1.6 Driven pendulum in Earth’s gravitational field

$$\begin{aligned} \frac{{\textrm{d}}^{2}}{{\textrm{d}t}^{2}}\phi + \omega _0^2 \sin \phi = - \frac{\sin \phi }{l} \frac{{\textrm{d}}^{2}}{{\textrm{d}t}^{2}}y_p(t). \end{aligned}$$

Let’s assume that the drive is periodic, \(y_p(t) = A \cos (\omega t)\). Figure 1.7 shows the time evolution of \(\phi \) for two almost identical initial conditions. The upper curve was computed with the pendulum starting with \(\frac{\textrm{d}}{{\textrm{d}t}}\phi (0) = 0\) rad/s at an angle of \(\phi (0) = 1\) rad. The lower curve was computed with the almost identical initial conditions \(\frac{\textrm{d}}{{\textrm{d}t}}\phi (0) = 0\) rad/s and \(\phi (0) = 1 + 10^{-10}/l\) rad, a displacement of roughly one atomic diameter from the upper one. The evolution of the two is initially very similar. However, after some time they become completely different and uncorrelated. This extreme sensitivity to initial conditions is the defining characteristic of chaotic systems and makes long-term predictions essentially impossible. In such systems the initial difference between adjacent trajectories grows on average exponentially [3].
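
The divergence of the two trajectories can be reproduced with the following sketch (ours; parameters as in the caption of Fig. 1.7, with integration tolerances tightened because chaotic trajectories amplify numerical errors as well):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

g, l = 9.8, 9.8                 # m/s^2, m
w0 = np.sqrt(g / l)             # = 1 rad/s
w, A = 2 * w0, l / 10           # drive frequency and amplitude

def f(t, u):
    phi, dphi = u
    ddyp = -A * w**2 * np.cos(w * t)   # second derivative of y_p(t)
    return [dphi, -w0**2 * np.sin(phi) - np.sin(phi) / l * ddyp]

t = np.linspace(0, 200, 20000)
for phi0 in [1.0, 1.0 + 1e-10 / l]:   # two almost identical starts
    sol = solve_ivp(f, (0, 200), [phi0, 0.0], t_eval=t,
                    rtol=1e-10, atol=1e-12)
    plt.plot(sol.t, sol.y[0])

plt.xlabel('t (s)')
plt.ylabel(r'$\phi$ (rad)')
plt.show()
```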

Fig. 1.7 Time evolution of the driven pendulum with \(\omega _0=1\,\text {rad/s}, \omega = 2\omega _0, g = 9.8\,\text {m/s}^2, l = 9.8\,\text {m}, A = l/10\). The upper curve was computed with initial conditions \(\phi (0) = 1\,\text {rad}, \frac{\textrm{d}}{{\textrm{d}t}}\phi (0) = 0\,\text {rad/s}\), the lower one with initial conditions \(\phi (0) = 1 + 10^{-10}/l\,\text {rad}, \frac{\textrm{d}}{{\textrm{d}t}}\phi (0) = 0\,\text {rad/s}\)

In this simple case the phenomenon is intuitively understandable: when the pendulum reaches a position very close to the vertical, an infinitesimal difference in velocity can determine whether it makes a full turn or swings back.

Note that if the initial oscillation is sufficiently small, the force exerted by the vertical drive is almost orthogonal to the direction in which the mass is free to move. For this reason, small oscillations are not pushed to large swings and do not show chaotic behaviour. There are therefore regions of the phase space exhibiting chaotic behaviour and regions not exhibiting it. The extent of these regions depends of course on the amplitude A of the drive: small values of A lead to large areas in which the system behaves predictably and only small areas displaying chaotic behaviour.

1.2 Weakly-Nonlinear Systems

Chaos, bifurcations and other phenomena of nonlinear systems are fascinating, important and sometimes fundamental to the problem at hand. However, the vast majority of engineered systems operate around stable equilibrium points by design. From an engineering point of view, a quantitative theory for studying the behaviour of nonlinear systems in the proximity of stable equilibrium points is therefore very important.

Inspection of the phase portraits presented above suggests that in the neighbourhood of equilibrium points the behaviour of nonlinear systems is not too different from that of linear systems, with the deviation increasing as the state moves away from the equilibrium point. In fact this statement can be made more precise. Consider a time-invariant, single-input single-output (SISO) system with input x whose state dynamics is governed by the system of first-order differential equations

$$\begin{aligned} \frac{\textrm{d}}{{\textrm{d}t}}u(t) = f(u(t),x(t))\,, \qquad f: \mathbb {R}^n \times \mathbb {R}\rightarrow \mathbb {R}^n \end{aligned}$$

and its output y by the algebraic equation

$$\begin{aligned} y(t) = g(u(t),x(t))\,, \qquad g: \mathbb {R}^n \times \mathbb {R}\rightarrow \mathbb {R}\,. \end{aligned}$$

If around the equilibrium point \(u_0 = 0\)Footnote 3 and \(x=0\) the functions f and g are differentiable, then, using a Taylor expansion, the system behaviour can be approximated by the linear equations

$$\begin{aligned} \frac{\textrm{d}}{{\textrm{d}t}}u(t) \approx A u(t) + B x(t) \qquad A \in \mathbb {R}^{n \times n}\,, \quad B \in \mathbb {R}^{n \times 1} \end{aligned}$$

and

$$\begin{aligned} y(t) \approx C u(t) + D x(t)\,, \qquad C \in \mathbb {R}^{1 \times n}\,, \quad D \in \mathbb {R}\,. \end{aligned}$$

The response of the system to the input signal can then be expressed by a convolution integral between the impulse response h of the system and the input signal

$$\begin{aligned} y(t) = h(\tau ) *x(t)\,. \end{aligned}$$
(1.3)

Note that in this chapter by stable equilibrium point we mean one for which all eigenvalues of the linearized state matrix A have negative real parts.
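
The linearisation step can be made concrete with a small sketch (ours; the nonlinear system used here is an arbitrary placeholder, and finite-difference Jacobians stand in for the analytic Taylor coefficients). The response (1.3) of the linearised model is then obtained with a standard routine such as scipy.signal.lsim:

```python
import numpy as np
from scipy.signal import lsim

# Placeholder nonlinear SISO system: a damped oscillator with a cubic spring
def f(u, x):
    return np.array([u[1], -u[0] - 0.5 * u[1] - u[0]**3 + x])

def g(u, x):
    return u[0]

def linearize(f, g, u0, x0, eps=1e-6):
    # Finite-difference estimates of the Taylor coefficients A, B, C, D
    n = len(u0)
    A = np.zeros((n, n))
    C = np.zeros((1, n))
    for j in range(n):
        du = np.zeros(n)
        du[j] = eps
        A[:, j] = (f(u0 + du, x0) - f(u0 - du, x0)) / (2 * eps)
        C[0, j] = (g(u0 + du, x0) - g(u0 - du, x0)) / (2 * eps)
    B = ((f(u0, x0 + eps) - f(u0, x0 - eps)) / (2 * eps)).reshape(n, 1)
    D = (g(u0, x0 + eps) - g(u0, x0 - eps)) / (2 * eps)
    return A, B, C, D

A, B, C, D = linearize(f, g, np.zeros(2), 0.0)

# Response y = h * x of the linearised system to a small input signal, cf. (1.3)
t = np.linspace(0, 30, 3000)
x = 0.05 * np.sin(t)
t, y, _ = lsim((A, B, C, D), x, t)
```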

Linear systems theory is very useful. However, for many practical applications this idealisation is too crude and doesn’t capture effects that limit the usability of a vast array of systems. The theory presented in this book enables one to solve the system equations when f and g are approximated by higher-order polynomials or even by power series. It is therefore able to give a more faithful description of the behaviour of many real systems. In particular, it allows probing into effects outside the reach of linear systems theory.

Consider first a memory-less system, that is, a system whose output y(t) at time t depends only on the value of its input signal x(t) at time t and not on any of its past (or future) values. Such a system can be represented by a function g that, for every value of t, maps the value x(t) to y(t)

$$\begin{aligned} y(t) = g(x(t))\,. \end{aligned}$$

Let’s assume that for a zero input signal the output is zero and that g can be expanded in a Taylor series. Then we can write

$$\begin{aligned} y(t) = \sum _{k=1}^\infty g_k x^k(t)\,. \end{aligned}$$
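
The practical consequence of these higher-order terms can be seen with a few lines of code (ours; the coefficients are arbitrary): a sinusoid passing through a memory-less cubic nonlinearity acquires a component at three times the input frequency, since \(\sin ^3\theta = (3\sin \theta - \sin 3\theta )/4\):

```python
import numpy as np

g1, g3 = 1.0, 0.1                # arbitrary series coefficients, g2 = 0
fs, f0, N = 1000.0, 10.0, 1000   # exactly 10 input cycles in the window
t = np.arange(N) / fs
a = 0.5                          # input amplitude
x = a * np.sin(2 * np.pi * f0 * t)
y = g1 * x + g3 * x**3           # truncated power series of the nonlinearity

Y = np.abs(np.fft.rfft(y)) / (N / 2)   # single-sided amplitude spectrum
freqs = np.fft.rfftfreq(N, 1 / fs)
for k in (1, 2, 3):
    i = np.argmin(np.abs(freqs - k * f0))
    print(f'harmonic {k}: amplitude {Y[i]:.4e}')
# The third harmonic has amplitude g3*a^3/4 ~ 3.1e-3, and the cubic term
# also adds 3*g3*a^3/4 to the fundamental; the second harmonic is absent.
```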

Using the Dirac \(\delta \) distribution, the power series above can be written in the form

$$\begin{aligned} y(t) = \sum _{k=1}^\infty g_k\,\delta (\tau _1,\dotsc ,\tau _k) *x^{\otimes k}(t)\,. \end{aligned}$$

That’s because, under assumptions to be made precise later, the \(\delta \) distribution is the unit of the convolution product

$$\begin{aligned} \delta (\tau ) *x(t) = x(t)\,. \end{aligned}$$

The response of a linear memory-less system can therefore be written as

$$\begin{aligned} y(t) = g_1 \delta (\tau ) *x(t) \end{aligned}$$

which shows a striking similarity with (1.3), the response of a linear dynamical system with impulse response h. In fact \(g_1\delta \) is the impulse response of a linear memory-less system, and it vanishes everywhere except at the origin as expected.

From these considerations it’s natural to hypothesise that the response of a class of nonlinear systems, around a stable equilibrium point, can be represented by a series of the form

$$\begin{aligned} y(t) = \sum _{k=1}^\infty h_k(\tau _1,\dotsc ,\tau _k) *x^{\otimes k}(t)\,. \end{aligned}$$

This is in fact true, and it is the Volterra series representation of the system, with \(h_k\) its kth-order impulse response. This representation is valid only for sufficiently small input signals that do not push the state of the system beyond a separatrix. This limitation of the Volterra series should not be surprising: power series, which are a subset of the Volterra series, in general also have a finite radius of convergence.
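
To make the structure of the series concrete, the sketch below (ours; the kernels are arbitrary placeholders, not derived from any system in this chapter) evaluates a series truncated at second order. The kth term is a k-fold convolution, so the second-order contribution at each instant is a double sum over the kernel \(h_2\):

```python
import numpy as np

dt = 0.02
tau = np.arange(0, 4, dt)            # kernel memory

# Placeholder kernels: h1 a decaying exponential, h2 separable for simplicity
h1 = np.exp(-tau)
h2 = np.outer(np.exp(-tau), np.exp(-2 * tau))

t = np.arange(0, 10, dt)
x = np.sin(2 * np.pi * 0.5 * t)      # input signal

# First-order term: an ordinary convolution h1 * x
y1 = np.convolve(x, h1)[:len(t)] * dt

# Second-order term: y2(t) = sum_{i,j} h2[i,j] x(t - tau_i) x(t - tau_j) dt^2
y2 = np.zeros(len(t))
for n in range(len(t)):
    m = min(n + 1, len(tau))
    xs = x[n::-1][:m]                # x(t - tau) for tau = 0 .. m*dt
    y2[n] = xs @ h2[:m, :m] @ xs * dt**2

y = y1 + y2                          # truncated Volterra series response
```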

The similarity between power series and the Volterra series doesn’t end here. By introducing suitable definitions, we can represent the cascade of nonlinear systems, each represented by its respective Volterra series, in a way similar to the composition of power series.

We call systems that can be represented by a Volterra series weakly-nonlinear systems. A feature that weakly-nonlinear systems share with linear ones is that the differential equations describing the system have to be solved only once to obtain the impulse responses. The response of the system to a large set of different input signals can then be computed directly from them. The impulse responses therefore completely characterise weakly-nonlinear systems. As with linear systems, weakly-nonlinear ones have a frequency-domain representation in terms of nonlinear transfer functions.

The theory can be extended to cover time-varying systems. In this case the impulse responses (or nonlinear transfer functions) have an explicit dependence on time

$$\begin{aligned} h_k(t, \tau _1,\dotsc ,\tau _k)\,. \end{aligned}$$

1.3 Distributions

The Dirac \(\delta \) distribution plays a key role in highlighting the relationship between power and Volterra series. An ad hoc use of the \(\delta \) distribution, however, easily leads to problems.

Consider for example the Heaviside function (or unit step function)

$$\begin{aligned} \textsf{1}_{+}(t) :={\left\{ \begin{array}{ll} 0 &{} t < 0 \\ 1 &{} t \ge 0 \end{array}\right. } \end{aligned}$$
(1.4)

and the theorem stating that the Laplace transform of the derivative of a function f continuous for \(t>0\) is

$$\begin{aligned} s F(s) - f(0+) \end{aligned}$$

with F the Laplace transform of f and \(f(0+)\) the right-hand limit of the function at 0. A careless application of this theorem to \(\textsf{1}_{+}\) gives

$$\begin{aligned} {\mathcal {L}}\left\{ \frac{\textrm{d}}{{\textrm{d}t}}\textsf{1}_{+}\right\} = s \frac{1}{s} - 1 = 0 \end{aligned}$$

where we have used the fact that \({\mathcal {L}}\left\{ \textsf{1}_{+}\right\} = 1/s\). However, we will show that the derivative of \(\textsf{1}_{+}\) is the \(\delta \) impulse, whose Laplace transform is 1. The error lies in the fact that \(\delta \) is not a function, but rather a Schwartz distribution, or distribution for short. The above theorem, in the stated form, is therefore not applicable.
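
The point can be verified numerically (a sketch of ours): replace \(\textsf{1}_{+}\) by a smooth approximation, differentiate it, and evaluate \(\int _0^\infty e^{-st} f'(t)\,\textrm{d}t\). As the rise becomes steeper the derivative approaches a narrow pulse and the integral approaches 1, the Laplace transform of \(\delta \), and not 0:

```python
import numpy as np

s = 2.0                      # evaluate the Laplace transform at s = 2
t = np.linspace(0, 1, 200001)
dt = t[1] - t[0]

for w in [0.1, 0.01, 0.001]:
    t0 = 5 * w               # centre the rise just to the right of t = 0
    step = 0.5 * (1 + np.tanh((t - t0) / w))   # smooth approximation of 1_+
    dstep = np.gradient(step, t)               # its ordinary derivative
    L = np.sum(np.exp(-s * t) * dstep) * dt    # integral of exp(-st) f'(t)
    print(f'w = {w:g}: transform of the derivative ~ {L:.4f}')
# Printed values tend to 1 as w -> 0, matching L{delta} = 1, not 0.
```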

Distributions are the proper setting for studying linear and weakly-nonlinear systems. In this setting the convolution product comes to play a central role. In fact, distributions allow defining convolution algebras with \(\delta \) playing the role of the unit. The Laplace transform then not only maps convolution products into multiplications, but also maps the unit of the convolution algebra into the unit of multiplication. In addition, in this setting, the derivative of a distribution f can be represented as the convolution of the distribution with the derivative of the unit

$$\begin{aligned} \frac{\textrm{d}}{{\textrm{d}t}}f = \frac{\textrm{d}}{{\textrm{d}t}}\delta *f\,. \end{aligned}$$

Differential equations can therefore be transformed into convolution equations, yielding a complete time-domain mirror image of the algebraic equations of the Laplace domain. Distributions also enable a uniform representation, in terms of convolution products, of ubiquitous and embarrassingly simple linear systems such as the inductor, something that a theory based on functions is unable to provide

$$\begin{aligned} v = L\frac{\textrm{d}}{{\textrm{d}t}}\delta *i\,. \end{aligned}$$

Here we see the current i as the input and the voltage v as the output of the system.

While we have been implicitly assuming causal systems and signals vanishing for \(t<0\), there are other convolution algebras. One of them is the convolution algebra of periodic distributions, intimately related to the Fourier series, in which the \(\delta \) distribution plays a central role as well.

1.4 Numerical Simulations

Learning a theory requires some investment of time. The question is: is it worth the effort in a world full of computers, where numerical methods able to solve most nonlinear equations are readily available? In our view the answer is a resounding yes. Numerical simulations and theory are not in competition; rather, they complement each other.

The theory is able to reveal the origin of the various effects at play and to clarify how each parameter affects the performance of a system. However, to do so we must use relatively simple models and therefore, most of the time, obtain approximate results.

Simulations, on the other hand, can be used to obtain accurate answers taking into account all details. However, the results are presented as tables of numbers (or curves) valid only for a specific set of parameter values. We can of course run many simulations and sweep parameters, but complex simulations are not fast and this poses practical limits. In addition, inferring the relationship between parameters and a specific effect from simulation results alone is often challenging. We can say that a good theory-based engineering model is like a (slightly distorted) picture, while numerical simulations are like the dots of a halftone image. A good model is worth thousands of simulations.

Most of the time the difficulty in engineering problems lies in finding the simplest model able to correctly characterise the effects of interest. During the phase of model development, numerical simulators can be extremely useful as ideal laboratories in which to validate hypotheses. In these virtual laboratories it’s easy to change the laws of physics and to suppress or decouple phenomena in ways that are impossible in the real world. Experiments conducted in them can therefore be an invaluable guide in the development of a model. Once a good model has been found, it will rapidly guide the development of the system. Numerical simulations will then serve for final tuning and verification.

1.5 Historical Notes

Around 1887 Vito Volterra developed the concept of functionals as an extension of functions of multiple variables to functions of an uncountably infinite number of variables [4, 5]. Let \(f(x_1,\dotsc ,x_k)\) be a real-valued function of the k real variables \(x_1\) to \(x_k\). The latter can be interpreted as the values of a function x evaluated at the discrete points 1 to k. Volterra therefore conceived a functional as a function of another function x defined over a continuous finite interval. He then proposed the series now bearing his name as an extension of the Taylor series from functions to functionals.

In 1910 the mathematician M. Fréchet published a more detailed analysis of the conditions under which a functional can be expanded into a Volterra series [6]. This work is regarded by most as the foundation demonstrating the validity of Volterra’s series expansion.

Volterra was well known and was invited to present his work in several countries, including the United States. During the Second World War there was great pressure to develop anti-aircraft systems, and N. Wiener of the Massachusetts Institute of Technology (MIT) found that by using Volterra’s series he could analyse the response of a nonlinear device to noise [7]. His report was initially restricted; its release after the war sparked interest in the engineering community at MIT and elsewhere. Several studies followed from the 1950s to the 1970s applying Volterra’s series to nonlinear engineering problems, with [8–10] among the most significant. Wiener himself remained interested in the subject and developed his own variant of the theory, based on Brownian motion and leading to what is now called the Wiener series [11]. At the beginning of the 1980s some books summarised the Volterra and Wiener theories [12, 13]. At around the same time, with the rise of desktop computers, engineering practice increasingly embraced numerical methods.

During the first decades of the 20th century, two mathematical tools used by engineers and physicists kept mathematicians occupied. The first problem was to find a solid mathematical justification for the operational calculus popularised by O. Heaviside. The second was the search for a solid mathematical interpretation of the \(\delta \) distribution and its derivatives, used extensively by P. A. M. Dirac in his landmark treatise on quantum mechanics, which first appeared in 1930 [14].

The former was solved in two ways:

  (i) With the help of the Laplace and Fourier transforms, highlighting the frequency-domain aspects, and

  (ii) by Mikusinski [15] using purely algebraic methods.

The second was solved by L. Schwartz by introducing new mathematical objects called distributions [16]. These are a special class of functionals with particularly attractive properties, such as the fact that they are infinitely differentiable. In addition, differentiation of distributions is a continuous linear operation, so that convergent series can always be differentiated term by term. With distributions Schwartz not only put the \(\delta \) “function” and its derivatives on solid ground, but also introduced convolution algebras and unified the two justifications of the operational calculus.

A deep understanding of distributions requires familiarity with advanced concepts of topological vector spaces [16, 17], which is probably why they are rarely introduced to engineers. However, the elementary part of the theory can be developed without resorting to particularly deep mathematical concepts and is of great practical value in physics and engineering problems. The aim of this book is to introduce distributions to engineers and to use them to view the Volterra series and, more generally, weakly-nonlinear systems from a point of view different from the traditional one. The advantages are, among others, a conceptual simplification, a simpler notation freeing expressions from multiple integrals, and an exposition of the theory of weakly-nonlinear systems as a natural extension of the linear one.