1 Introduction

Modelling and understanding democratic processes such as elections and referendums has become increasingly important in recent years, in light of the potential threat to democracy posed by the large-scale dissemination of disinformation on the internet and in other media (cf. [1, 2]). In particular, in response to the widely publicised circulation of disinformation during the 2016 US presidential election and the ‘Brexit’ referendum in the UK on membership of the European Union, considerable research effort has been devoted to the detection, prevention, and retrospective analysis of false information circulated on the internet (so-called fact checking), for which there is already a substantial body of literature [3,4,5,6,7,8,9]. Whilst important, fact checking alone is insufficient to counter the impact of disinformation; equally important is the development of dynamical modelling frameworks for democratic processes, and for how the dissemination of disinformation might affect them, with a view towards scenario analysis, impact studies, and strategic planning.

In response to this demand we have recently introduced an information-based framework for modelling the dynamics of opinion-poll statistics in elections and referendums [10]. The idea that imperfect information about candidates’ positions on issues, or about their competency and integrity, must play a significant role in modelling elections has been argued previously, for instance in [11], where game theory under incomplete information is applied to model electoral competition. In fact, various authors have explored the role of information in elections. To name a few, in [12] conditions for the existence of an equilibrium state are established when there are informed and uninformed voters. The aggregation of information in elections with imperfect information, and conditions for establishing equilibrium, are investigated in [13]. There are also empirical studies showing how voting patterns vary in accordance with how informed the electorate is [14]. In view of the prevalence of social media usage and the advances in mobile technology, where information can instantly be transmitted far and wide, the role of information in election modelling is now becoming markedly more important. With this in mind, here we explore election modelling in further detail by extending the work presented in [10].

What distinguishes the information-based approach of [10] from previous work is that a dynamical model for the flow of information is specified as a starting point. How the flow of information affects the dynamical evolution of the opinion poll is then derived as an output, rather than being modelled. This is achieved by borrowing mathematical techniques from signal processing (cf. [15, 16]) for converting noisy information into the best estimates of the quantities of interest. Because the transformation of the underlying information process into the output is highly nonlinear, even with a relatively simple model choice for the flow of information it is nevertheless possible to describe dynamical behaviours of complex systems in a way that is consistent with our intuition, as we shall demonstrate through various analyses and examples. (That this is the case has been demonstrated in financial modelling for the price processes of various assets, including credit-risky bonds [17], reinsurance contracts [18], or crude oil [19].) When it comes to modelling electoral competition, in particular, we emphasise that our scheme offers, to the best of our knowledge, the first fully dynamical framework that captures the impact of information revelation in a noisy environment; whereas previously proposed dynamical models are either deterministic or else are not dependent on information revelation (see, e.g., [20,21,22,23]). This, in turn, allows us to work out probabilities of the occurrences of future events explicitly, as we shall demonstrate, and these results are indispensable for scenario analysis, impact studies, and strategic planning.

The paper is organised as follows. We begin in Sect. 2 with a brief overview of the two approaches for modelling electoral competition proposed in [10]: one called the structural approach and the other called the reduced-form approach. Our focus in the present paper will be on the latter approach, which is developed further in Sect. 3 with an analysis of how to identify the arrival of new information in a noisy communication channel. The results are then applied in Sect. 4 to derive the formula for the a priori probability of a given candidate winning the election. The dynamical evolution of the winning probability is then worked out in Sect. 5. In Sect. 6 we examine the impact of disinformation on the winning probability. In particular, we identify the optimal strategy for disinformation so as to maximally impact the winning probability, in the case of a very simple model for fake news. More generally, the management of information in an electoral competition is examined in Sect. 7, where we analyse how the winning probability is affected as we vary the information flow rate in a time-dependent setup. The sensitivity of the winning probability to the information flow rate parameter is analysed in Sect. 8 by borrowing techniques from information geometry and working out the Fisher information. The result will be useful for the purpose of cost-benefit analysis in an advertisement campaign. We conclude in Sect. 9 with a discussion on how the techniques developed in this paper can be applied, mutatis mutandis, within the structural approach, to help election campaigns in a realistic setup, and more generally to manage information in advertisement. We remark that the purpose of this paper is not to develop new mathematics, but rather to develop a novel information-based approach to model the impact of advertisement, with a focus on electoral competition. Indeed, most of the mathematical manipulations are elementary, and can be found, for instance, in [17, 24]. With this declaration we shall omit repetitive citation of these papers when similar calculations are performed in different contexts.

2 Information-based modelling frameworks for election

In [10] we introduced two closely-related approaches to model election dynamics: one is called an election-microstructure approach, or a structural approach, which encapsulates structural details in a voting scenario; and the other is called a representative voter approach, or a reduced-form approach, which captures qualitative features of election dynamics without the specification of structural details. Let us begin by briefly describing these approaches.

In the structural approach, one considers a set of issues that are of concern to the electorate. These may include, for instance, the candidates’ positions on social welfare, immigration, abortion, climate policy, gun control, healthcare, and so on. The l-th candidate’s position on the k-th issue will then be represented by \(X^l_k\), whose value is not necessarily apparent to the voters. The uncertainties for the factor \(X^l_k\) thus make it a random variable on a suitably defined probability space equipped with the ‘real-world’ probability measure \({{\mathbb {P}}}\). The idea is that, for instance, if the k-th factor were concerned with, say, climate policy, then in a simple model setup \(X^l_k=0\) would represent the situation in which candidate l is against implementing policies to counter climate change, while \(X^l_k=1\) would represent the situation in which candidate l is for implementing such policies. Not all factors need to be binary, of course, but at any rate the voters are not certain about the positions the candidates would take on these issues if they were elected.

The voters, however, possess partial information concerning the values of these factors, and as they learn more about the candidate, or perhaps about the political party to which the candidate belongs, their best estimates for these factors will change in time. The voters also have their preferences: what is a desirable position of a candidate on a given issue to one voter may well be undesirable to another voter. Let us denote by \(w^k_n\) the preference weight of the n-th voter for the k-th factor, which may be positive or negative, depending on the voter’s position on that issue. Then writing \({{\mathbb {E}}}_t[X^l_k]\) for the expectation under the probability measure \({{\mathbb {P}}}\) conditional on the information available at time t, the score at time t assigned to the l-th candidate by the n-th voter, under the assumption of a linear scoring rule, is given by the sum

$$\begin{aligned} S^l_n(t) = \sum _{k} w^k_n\, {{\mathbb {E}}}_t[X^l_k]\, . \end{aligned}$$
(1)

In particular, the voter will select the candidate with the highest score at time \(T\ge t\) when the election takes place. Thus by modelling the flow of information associated with each of the factors, we arrive at a rather detailed dynamical model for election, and this is the basis of the structural approach.
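As a purely illustrative sketch of the linear scoring rule (1), the following Python fragment computes the scores that one voter assigns to two candidates and picks the higher one. The weights, the conditional expectations, and the candidate names are made-up numbers chosen for illustration; they are not taken from [10].

```python
# Illustrative sketch of the linear scoring rule (1); all numbers are hypothetical.

def score(weights, expected_positions):
    """Score S^l_n(t) = sum_k w^k_n * E_t[X^l_k] for one voter and one candidate."""
    return sum(w * x for w, x in zip(weights, expected_positions))

# Hypothetical voter n: favours issue 0 (positive weight), opposes issue 1 (negative weight).
voter_weights = [1.0, -0.5]

# Hypothetical conditional expectations E_t[X^l_k] for two candidates on the two issues.
expected = {
    "candidate_A": [0.8, 0.3],
    "candidate_B": [0.4, 0.1],
}

scores = {name: score(voter_weights, xs) for name, xs in expected.items()}

# The voter selects the candidate with the highest score.
preferred = max(scores, key=scores.get)
```

As the conditional expectations move with the arrival of information, recomputing the scores at each time t would give the voter-level dynamics that the structural approach describes.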

In the present paper we shall be concerned primarily with the reduced-form approach and develop the theory in more detail. We consider an election in which there are N candidates. In the reduced-form approach, the voters are in general not fully certain about which candidate they should be voting for, but they have their opinions based on (1) the information available to them about the candidates, or perhaps about the political party to which they belong, and (2) the voter preferences. The diverse opinion held by the public can then be aggregated in the form of a probability distribution, representing the public preference likelihoods of the candidates. Thus we can think of an abstract random variable X taking the value \(x_i\) with the a priori probability \({{\mathbb {P}}}(X=x_i)=p_i\) defined on a probability space, where \(x_i\) represents the i-th candidate, and the a priori probability \(p_i\) represents today’s opinion-poll statistics.

Today’s public opinion, however, changes over time in accordance with the unravelling of new information relevant to the election. Hence the a priori probability will be updated accordingly, generating a shift in the opinion-poll statistics. To model these dynamics, let us assume (merely for simplicity) that reliable knowledge regarding which candidate represents the ‘best choice’ increases linearly in time, at a constant rate \(\sigma \). There are also many rumours and speculations that obscure the reliable information in the form of noise. This uncertainty, or noise, will be modelled by a Brownian motion, denoted by \(\{B_t\}\), which is assumed to be independent of X because otherwise it cannot be viewed as representing pure noise. Hence, under these modelling assumptions the flow of information, which we denote by \(\{\xi _t\}\), can be expressed in the form

$$\begin{aligned} \xi _t = \sigma X t + B_t . \end{aligned}$$
(2)

For the voters, the quantity of interest is the actual value of X, but there are two unknowns, X and \(\{B_t\}\), and only one known, \(\{\xi _t\}\). In this situation, a rational individual considers the probability that \(X=x_i\) conditional on the information contained in the time series \(\{\xi _s\}_{0\le s\le t}\) gathered up to time t.
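The information process (2) is straightforward to simulate on a time grid, which may help in visualising what the voters actually observe. The sketch below is one possible discretisation using only the Python standard library; the function names, the grid size, the seed, and the parameter values are assumptions made for illustration.

```python
import math
import random

def draw_candidate(xs, priors, rng):
    """Draw the hidden value of X from the a priori probabilities p_i."""
    return rng.choices(xs, weights=priors, k=1)[0]

def simulate_information(x, sigma, T, n_steps, rng):
    """Sample path of xi_t = sigma * x * t + B_t on an equally spaced grid of [0, T]."""
    dt = T / n_steps
    times, path = [0.0], [0.0]
    b = 0.0
    for k in range(1, n_steps + 1):
        b += rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, distributed N(0, dt)
        t = k * dt
        times.append(t)
        path.append(sigma * x * t + b)      # signal plus noise; only the sum is observed
    return times, path

rng = random.Random(0)                      # fixed seed so the sketch is reproducible
xs, priors = [0.0, 1.0], [0.52, 0.48]       # hypothetical two-candidate poll
x = draw_candidate(xs, priors, rng)
times, xi = simulate_information(x, sigma=0.5, T=1.0, n_steps=250, rng=rng)
```

Only the list `xi` would be visible to the voters; the drawn value `x` and the noise path remain hidden, which is the inference problem addressed next.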

We proceed to determine the conditional probability \({{\mathbb {P}}}(X=x_i|\{\xi _s\}_{0\le s\le t})\). We begin by remarking that the information process \(\{\xi _t\}\) is Markovian. An intuitive way of seeing this is to observe that the increment \(\mathrm{d}\xi _t\) of \(\{\xi _t\}\) is given by the sum of \(\sigma X \mathrm{d}t\) and \(\mathrm{d}B_t\), but the coefficient of \(\mathrm{d}t\) is constant in time, while the Brownian motion has independent increments, so the process \(\{\xi _t\}\) of (2) does not carry any ‘memory’. Establishing the Markov property is equivalent to showing that \( {{\mathbb {P}}}(\xi _t\le x | \xi _s,\xi _{s_1},\xi _{s_2},\ldots ,\xi _{s_k}) = {{\mathbb {P}}}( \xi _t\le x|\xi _s)\) for any collection of ordered times \(t, s, s_1, s_2, \ldots ,s_k\). However, the process \(\{\xi _t\}\) conditional on \(X=x_i\) is just a drifted Brownian motion, which clearly is Markovian, so we have

$$\begin{aligned} {{\mathbb {P}}}\left( \xi _t\le x| \xi _s,\xi _{s_1}, \ldots ,\xi _{s_k}\right)= & {} \sum _i {{\mathbb {P}}}\left( \xi _t\le x| X=x_i, \xi _s,\xi _{s_1}, \ldots ,\xi _{s_k}\right) {{\mathbb {P}}}(X=x_i) \nonumber \\= & {} \sum _i {{\mathbb {P}}}\left( \xi _t\le x| X=x_i,\xi _s\right) {{\mathbb {P}}}(X=x_i) \nonumber \\= & {} {{\mathbb {P}}}\left( \xi _t\le x\Big | \xi _s\right) . \end{aligned}$$
(3)

In addition to the Markovian property, the random variable X is a function of the time series \(\{\xi _t\}\) because, with probability one we have

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{\xi _t}{\sigma t} = X . \end{aligned}$$
(4)

It follows that the conditional probability \({{\mathbb {P}}}(X=x_i|\{\xi _s\}_{0\le s\le t})\) simplifies to \({{\mathbb {P}}}(X=x_i|\xi _t)\).

The logical step of converting the prior probabilities \({{\mathbb {P}}}(X=x_i)\) into the posterior probability \({{\mathbb {P}}}(X=x_i|\xi _t)\) is captured by the Bayes formula:

$$\begin{aligned} {{\mathbb {P}}}(X=x_i|\xi _t)= & {} \frac{{{\mathbb {P}}}(X=x_i)\rho (\xi _t|X=x_i)}{\sum _{j} {{\mathbb {P}}} (X=x_j) \rho (\xi _t|X=x_j)} . \end{aligned}$$
(5)

Here the conditional density function \(\rho (\xi _t|X=x_i)\) for the random variable \(\xi _t\) is defined by the relation

$$\begin{aligned} {{\mathbb {P}}}\left( \xi _t\le y|X=x_i\right) =\int _{-\infty }^y \rho (\xi |X=x_i)\,\mathrm{d}\xi , \end{aligned}$$
(6)

and is given by

$$\begin{aligned} \rho (\xi |X=x_i)=\frac{1}{\sqrt{2\pi t}} \exp \left( - \frac{(\xi -\sigma x_i t)^2}{2t}\right) . \end{aligned}$$
(7)

This follows from the fact that, conditional on \(X=x_i\), the random variable \(\xi _t\) is normally distributed with mean \(\sigma x_i t\) and variance t. We thus deduce that

$$\begin{aligned} {{\mathbb {P}}}(X=x_i|\xi _t) =\frac{p_i\exp \left( \sigma x_i \xi _t- \frac{1}{2} \sigma ^2 x_i^2 t\right) }{\sum _j p_j \exp \left( \sigma x_j \xi _t-\frac{1}{2} \sigma ^2 x_j^2 t\right) } . \end{aligned}$$
(8)

Inferences based on the use of (8) are optimal in the sense that they minimise the uncertainty concerning the value of X, as measured by the variance or entropic measures subject to the information available. Thus the a posteriori probabilities (8) determine the best estimate for the unknown variable X, in the sense of minimising the error.
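Formula (8) can be evaluated directly once the current value of the information process is known. The following Python sketch does so for an arbitrary number of candidates; the function name and the numerical inputs are illustrative assumptions, and the subtraction of the largest log-weight is merely a standard numerical safeguard rather than part of the model.

```python
import math

def posterior(xi_t, t, sigma, xs, priors):
    """A posteriori probabilities P(X = x_i | xi_t) according to formula (8)."""
    # Work with log-weights and subtract the maximum before exponentiating,
    # so that large values of sigma * x * xi_t cannot overflow.
    log_w = [math.log(p) + sigma * x * xi_t - 0.5 * sigma**2 * x**2 * t
             for x, p in zip(xs, priors)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical inputs: two candidates labelled 0 and 1, priors 52% and 48%.
xs = [0.0, 1.0]
priors = [0.52, 0.48]

at_start = posterior(0.0, 0.0, 0.5, xs, priors)   # at t = 0 the priors are recovered
later = posterior(0.6, 1.0, 0.5, xs, priors)      # after observing xi_1 = 0.6
```

In this example the observed value lies above the midpoint \(\frac{1}{2}\sigma t\) between the two possible drifts, so the probability assigned to candidate 1 rises above its prior value.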

In the reduced-form approach it is the conditional probability (8), which is a nonlinear function of the model input \(\{\xi _t\}\), that models the complicated dynamics of the opinion poll statistics. Our objective in this paper is to investigate various properties of the model, as well as to explore different ways in which the model can be exploited.

3 Election dynamics and the arrival of new information

We begin by considering the dynamical evolution of the conditional probability obtained in (8). For this purpose let us introduce a simpler notation by writing \(\pi _{it}={{\mathbb {P}}}(X=x_i|\xi _t)\). Then an application of Ito’s formula on (8) yields

$$\begin{aligned} \mathrm{d}\pi _{it} = \sigma \left( x_i-{{\hat{X}}}_t\right) \pi _{it} \left( \mathrm{d}\xi _t - \sigma {{\hat{X}}}_t \mathrm{d}t \right) , \end{aligned}$$
(9)

where we have written

$$\begin{aligned} {{\hat{X}}}_t = \sum _{i} x_i {{\mathbb {P}}}(X=x_i|\xi _t) \end{aligned}$$
(10)

for the conditional expectation of X given \(\xi _t\).

Examining the dynamics of the conditional probability is of interest because it allows us to isolate the arrival of new information. We remark that the time series \(\{\xi _t\}\) in itself contains new information as well as redundant information that has already been identified. (It is often the case that when one reads a newspaper article, for instance, only a fraction of what one reads is genuinely new, and this tendency is more pronounced with web-generated news content, which is becoming mainstream.) That the conditional expectation (10) gives the ‘best’ estimate of X is then reflected in the remarkable fact that at time t, the small increment in the electorate’s belief, represented by \(\mathrm{d}\pi _{it}\), is induced only by the new information. In other words, the small increment \(\mathrm{d}W_t\) of the process \(\{W_t\}\) defined by

$$\begin{aligned} W_t = \xi _t - \sigma \int _0^t {{\hat{X}}}_u \mathrm{d}u \end{aligned}$$
(11)

represents the arrival of new information.

The process \(\{W_t\}\) defined in (11) is known as the innovations process in the theory of signal processing [16]. It can be shown, in fact, that \(\{W_t\}\) is a standard Brownian motion. For this purpose, we recall the Lévy criterion for Brownian motion that if a process \(\{W_t\}\) is a martingale, and if \((\mathrm{d}W_t)^2=\mathrm{d}t\), then \(\{W_t\}\) is a Brownian motion. To show that \(\{W_t\}\) is a martingale, we observe for \(t\ge s\) that

$$\begin{aligned} {{\mathbb {E}}}_s\left[ W_t\right]= & {} {{\mathbb {E}}}_s[\xi _t]-\sigma \, {{\mathbb {E}}}_s\left[ \int _0^t {{\hat{X}}}_u {\mathrm{d}}u\right] \nonumber \\= & {} \sigma t {{\hat{X}}}_s + {{\mathbb {E}}}_s[ B_t] - \sigma \int _0^s {{\hat{X}}}_u \mathrm{d}u - \sigma (t-s) {{\hat{X}}}_s , \end{aligned}$$
(12)

but from the tower property of conditional expectation we have \({{\mathbb {E}}}_s[B_t]={{\mathbb {E}}}_s[B_s]\), so writing \({{\hat{X}}}_s={{\mathbb {E}}}_s[X]\) in (12) we find, on account of \({{\mathbb {E}}}_s[ \sigma s X + B_s] = {{\mathbb {E}}}_s[ \xi _s]=\xi _s\), that

$$\begin{aligned} {{\mathbb {E}}}_s\left[ W_t\right] = {{\mathbb {E}}}_s[ \sigma s X + B_s] - \sigma \int _0^s {{\hat{X}}}_u \mathrm{d}u = W_s \, , \end{aligned}$$
(13)

establishing the martingale condition. Then from \((\mathrm{d}\xi _t)^2=\mathrm{d}t\) we find at once that \((\mathrm{d}W_t)^2=\mathrm{d}t\), from which it follows that \(\{W_t\}\) is a Brownian motion under the measure \({{\mathbb {P}}}\) with respect to the information flow generated by \(\{\xi _t\}\).
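The innovations process (11) can be approximated on a grid by subtracting the estimated drift \(\sigma {{\hat{X}}}_t \mathrm{d}t\) from each increment of \(\{\xi _t\}\). The sketch below is an Euler-type discretisation under assumed parameter values, and it checks only one of the two ingredients of the Lévy criterion, namely that the sum of squared increments is close to the elapsed time. It is a numerical illustration and not a substitute for the argument above.

```python
import math
import random

def x_hat(xi, t, sigma, xs, priors):
    """Conditional mean (10), built from the posterior weights of formula (8)."""
    w = [p * math.exp(sigma * x * xi - 0.5 * sigma**2 * x**2 * t)
         for x, p in zip(xs, priors)]
    return sum(x * wi for x, wi in zip(xs, w)) / sum(w)

def innovation_increments(x_true, sigma, xs, priors, T, n_steps, rng):
    """Discretised increments dW = d(xi) - sigma * X_hat * dt of the process (11)."""
    dt = T / n_steps
    xi = 0.0
    increments = []
    for k in range(n_steps):
        t = k * dt
        estimate = x_hat(xi, t, sigma, xs, priors)   # uses information up to time t only
        d_xi = sigma * x_true * dt + rng.gauss(0.0, math.sqrt(dt))
        increments.append(d_xi - sigma * estimate * dt)
        xi += d_xi
    return increments

rng = random.Random(1)                               # fixed seed for reproducibility
T = 1.0
dws = innovation_increments(1.0, 0.5, [0.0, 1.0], [0.5, 0.5], T, 20000, rng)
w_T = sum(dws)                                       # terminal value W_T
quadratic_variation = sum(d * d for d in dws)        # expected to be close to T
```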

With this observation at hand, we infer from (11) that the information process \(\{\xi _t\}\) in the ‘real-world’ probability measure \({{\mathbb {P}}}\) is a drifted Brownian motion, which means that there exists a fictitious probability measure \({{\mathbb {Q}}}\) under which the information process \(\{\xi _t\}\) is itself a Brownian motion. The details of this measure change become handy in the analysis below for the probability of a given candidate winning the election. To clarify the relation between the two measures \({{\mathbb {P}}}\) and \({{\mathbb {Q}}}\) let us examine the denominator of the conditional probability obtained in (8), and define it to be

$$\begin{aligned} \Phi _t = \sum _j p_j \exp \left( \sigma x_j \xi _t-\textstyle \frac{1}{2}\sigma ^2 x_j^2 t\right) . \end{aligned}$$
(14)

Then an application of Ito’s formula shows that

$$\begin{aligned} \frac{\mathrm{d}\Phi _t}{\Phi _t} = \sigma {{\hat{X}}}_t \mathrm{d}\xi _t \, , \end{aligned}$$
(15)

from which, upon integration and making use of the initial condition \(\Phi _0=1\), it follows that

$$\begin{aligned} \Phi _t = \exp \left( \sigma \int _0^t {{\hat{X}}}_s \mathrm{d}\xi _s - \textstyle \frac{1}{2}\sigma ^2 \int _0^t {{\hat{X}}}_s^2\mathrm{d}s\right) . \end{aligned}$$
(16)

Therefore, on account of Girsanov’s theorem there exists an equivalent probability measure \({{\mathbb {Q}}}\) over any finite time horizon such that the information process \(\{\xi _t\}\), which by (11) is a drifted Brownian motion under \({{\mathbb {P}}}\), is a standard Brownian motion in the \({{\mathbb {Q}}}\)-measure, where \(\{\Phi _t\}\) is the change-of-measure density martingale. In particular, for any measurable random variable \(Y_t\) the conditional expectations in these two probability measures are related according to

$$\begin{aligned} {{\mathbb {E}}}_s^{{\mathbb {P}}}[Y_t]=\frac{1}{\Phi _s}{\mathbb E}_s^{{\mathbb {Q}}} [\Phi _t Y_t] \quad \mathrm{and}\quad {\mathbb E}_s^{{\mathbb {Q}}}[Y_t] = \Phi _s{{\mathbb {E}}}_s^{{\mathbb {P}}} \left[ \frac{1}{\Phi _t} Y_t\right] . \end{aligned}$$
(17)

4 Probability of winning an election

We have derived in (8) the dynamical process for the a posteriori probability that the preferred candidate is the k-th one. With this at hand we can ask a range of quantitative questions; for instance, given that the upcoming election is in T years’ time, what is the probability that candidate k will secure more than K% of the votes? We now address this question in the simple case where there are only two dominant candidates. It is worth remarking that such a question cannot be answered without a dynamical model at hand.

For a two-candidate election (or, equivalently, for a ‘yes-no’ referendum), we may label the candidates using the binary system by calling them 0 and 1. In other words, we let \(x_0=0\) and \(x_1=1\); and accordingly, for the a priori probabilities we set \(p_0=p\) and \(p_1=1-p\). Then the conditional expectation of the random variable X is given by

$$\begin{aligned} {{\hat{X}}}_t = \sum _{i=0}^1 x_i {{\mathbb {P}}}(X=x_i|\xi _t) = \frac{(1-p)\exp \left( \sigma \xi _t- \frac{1}{2} \sigma ^2 t\right) }{p + (1-p) \exp \left( \sigma \xi _t-\frac{1}{2} \sigma ^2 t\right) } , \end{aligned}$$
(18)

which has the interpretation of representing the a posteriori expectation of the percentage share of the votes for candidate 1, because in the binary case we have \({{\hat{X}}}_t=\pi _{1t}\).

We now examine the a priori probability \({{\mathbb {P}}}({{\hat{X}}}_T>K)\) that candidate 1 will get more than \(K\%\) of the votes in the election to take place at time T in the future. We shall make use of the fact that

$$\begin{aligned} {{\mathbb {P}}}\left( {{\hat{X}}}_T>K\right) = {{\mathbb {E}}}\left[ {\mathbb {1}}\{ {{\hat{X}}}_T>K\} \right] , \end{aligned}$$
(19)

that is, the probability of an event can be calculated by taking the expectation of the indicator function for that event. To calculate the expectation in the right side of (19) we shall change the measure \({{\mathbb {P}}} \rightarrow {{\mathbb {Q}}}\) by use of the density martingale

$$\begin{aligned} \Phi _t = p + (1-p) \exp \left( \sigma \xi _t-\textstyle \frac{1}{2}\sigma ^2 t\right) . \end{aligned}$$
(20)

Then we have

$$\begin{aligned} {{\mathbb {P}}}\left( {{\hat{X}}}_T>K\right) = {{\mathbb {E}}}^{{\mathbb {Q}}}\left[ \Phi _T \, {\mathbb {1}}\{ {{\hat{X}}}_T>K\} \right] . \end{aligned}$$
(21)

We now observe that \({{\hat{X}}}_T\) is an increasing function of \(\xi _T\). Thus the condition \({{\hat{X}}}_T>K\) is equivalent to a condition on \(\xi _T\). This can be worked out explicitly. We have

$$\begin{aligned} (1-p)\exp \left( \sigma \xi _T-\textstyle \frac{1}{2}\sigma ^2 T\right) > K \left[ p + (1-p) \exp \left( \sigma \xi _T-\textstyle \frac{1}{2}\sigma ^2 T\right) \right] , \end{aligned}$$
(22)

from which it follows that \({{\hat{X}}}_T>K\) holds if and only if \(\xi _T>z^*\sqrt{T}\), where

$$\begin{aligned} z^* = \frac{ \log \left( \frac{pK}{(1-p)(1-K)}\right) + \frac{1}{2}\sigma ^2 T}{\sigma \sqrt{T}} . \end{aligned}$$
(23)

Fig. 1 Winning likelihood. The probability that candidate 1 will win the election in a year (\(T=1\)), as a function of the a priori probability \(p\in (0,1)\) and the information flow rate \(\sigma \in (0,0.75)\). In the limit \(\sigma \rightarrow 0\) where the future uncertainty is large, the probability approaches a step function, whereas in the opposite limit \(\sigma \rightarrow \infty \) the probability approaches a linear function of the current poll represented by p.

As indicated above, under the measure \({{\mathbb {Q}}}\) the information process \(\{\xi _t\}\) is a standard Brownian motion, so we deduce that

$$\begin{aligned} {{\mathbb {P}}}\left( {{\hat{X}}}_T>K\right) = \frac{1}{\sqrt{2\pi }} \int _{z^*}^\infty \mathrm{e}^{-\frac{1}{2}z^2} \left( p + (1-p) \mathrm{e}^{\sigma \sqrt{T} z -\frac{1}{2} \sigma ^2 T} \right) \mathrm{d}z . \end{aligned}$$
(24)

Therefore, if we define \(d^-=-z^*\) and \(d^+=\sigma \sqrt{T}-z^*\), or more explicitly,

$$\begin{aligned} d^\pm = \frac{ \log \left( \frac{(1-p)(1-K)}{pK}\right) \pm \frac{1}{2}\sigma ^2 T}{\sigma \sqrt{T}} , \end{aligned}$$
(25)

then we have

$$\begin{aligned} {{\mathbb {P}}}\left( {{\hat{X}}}_T>K\right) = p\, N(d^-) + (1-p)\, N(d^+) , \end{aligned}$$
(26)

where

$$\begin{aligned} N(x) = \frac{1}{\sqrt{2\pi }} \int _{-\infty }^{x} \mathrm{e}^{-\frac{1}{2}z^2} \mathrm{d}z \end{aligned}$$
(27)

denotes the cumulative normal distribution function. By setting \(K=1/2\) in (26) we thus obtain the probability that candidate 1 will win the election. This is sketched in Fig. 1. We remark, incidentally, that the formula (26) thus obtained for the probability of a given candidate winning the election is essentially identical to the pricing formula in financial markets for an option on a stock in the Black–Scholes model [25].
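The closed-form expression (26) is easy to evaluate numerically. The Python sketch below implements (25) and (26), with K expressed as a fraction, and, as an independent check, evaluates the integral (24) by a simple midpoint rule. The function names, the quadrature settings, and the sample parameter values are assumptions made for illustration.

```python
import math

def norm_cdf(x):
    """Cumulative normal distribution function N(x) of (27)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(p, sigma, T, K=0.5):
    """Formula (26) for P(X_hat_T > K), where p = P(X = 0) and K is a fraction in (0, 1)."""
    log_term = math.log((1.0 - p) * (1.0 - K) / (p * K))
    vol = sigma * math.sqrt(T)
    d_minus = (log_term - 0.5 * vol**2) / vol
    d_plus = (log_term + 0.5 * vol**2) / vol
    return p * norm_cdf(d_minus) + (1.0 - p) * norm_cdf(d_plus)

def win_probability_by_quadrature(p, sigma, T, K=0.5, n=20000):
    """Midpoint-rule evaluation of the integral (24), truncated far in the upper tail."""
    vol = sigma * math.sqrt(T)
    z_star = (math.log(p * K / ((1.0 - p) * (1.0 - K))) + 0.5 * vol**2) / vol
    upper = max(z_star, vol) + 10.0          # integrand is negligible beyond this point
    h = (upper - z_star) / n
    total = 0.0
    for k in range(n):
        z = z_star + (k + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += density * (p + (1.0 - p) * math.exp(vol * z - 0.5 * vol**2)) * h
    return total

# Hypothetical poll: candidate 1 currently has 52% support (p = 0.48), sigma = 0.5, T = 1.
closed_form = win_probability(0.48, 0.5, 1.0)
numerical = win_probability_by_quadrature(0.48, 0.5, 1.0)

# Limiting behaviour described in the caption of Fig. 1.
small_sigma = win_probability(0.49, 0.01, 1.0)   # approaches a step function in p
large_sigma = win_probability(0.30, 20.0, 1.0)   # approaches the current poll 1 - p
```

For the sample values above the winning probability exceeds the current support rate of 52%, in line with the nonlinearity visible in Fig. 2.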

Fig. 2 Winning likelihood. The probability that candidate 1 will win the election in a year (\(T=1\)), as a function of the current popularity rate \(1-p\) for the candidate. If today’s poll were the predictor for the winning probability, then the function should be a straight line shown here (in brown). However, according to the information-based model we see that the actual likelihood of winning the election is higher than what today’s poll might suggest if \(1-p>\frac{1}{2}\); and conversely lower than what the poll suggests if \(1-p<\frac{1}{2}\). Two examples are shown, corresponding to the values \(\sigma =0.15\) (in purple) and \(\sigma =0.95\) (in magenta) for the information flow rate. The result shows that the deviation away from the poll indicator is greater if future uncertainty is greater (smaller \(\sigma \)).

It is interesting to examine more closely the behaviour of the probability (26) of candidate 1 winning the future election. For instance, in the limit \(\sigma \rightarrow 0\) the probability as a function of the current poll approaches a step function. That is, if, say, candidate 1 has 51% of support today, then in this limit the probability of candidate 1 winning the future election approaches one. This is because in the limit where the information flow rate goes to zero, no information relevant to the election will be revealed. Hence, in the absence of any further information, the current state will be the future state, i.e. 51% of the voters will vote for candidate 1, and hence the probability of winning approaches one.

In reality, of course, information unravels, resulting in the dynamical evolution of the poll. In Fig. 2 we plot the cross-section of (26) for two values of \(\sigma \). If the current poll were the reflection of the election predictor, then the probability of a given candidate winning the future election would be a linear function of the current popularity, i.e. the current support rate equals the likelihood of election victory. However, according to the information-based model, the correspondence is nonlinear. In particular, if today’s support rate of a candidate is greater than 50%, then the likelihood of that candidate winning the future election is always greater than what is suggested by today’s poll, and conversely if the current support rate is less than 50% then the actual likelihood of winning is smaller than what is suggested by today’s poll. Furthermore, the gap between today’s poll and the winning likelihood increases as the future uncertainty increases.

5 Predicting the election outcome

In the foregoing analysis, we have introduced an abstract random variable X that represents in some sense the ‘preferred’ candidate. Consequently, the conditional expectation \({{\hat{X}}}_t\) of X does not converge to any one of the candidates because the variability in public opinion remains wide leading up to the election day. From the viewpoint of a campaign manager, an election pollster, an election pundit, or a betting agency involved in elections, however, what matters is not so much about an abstract idea of which candidate might ultimately be judged by history to be the most ideal candidate. What matters to them is the more concrete notion of who might actually win the election.

To model the prediction of an election we require a dynamical version of the winning probability (26). To keep the discussion simple, let us for the moment continue on the assumption that there are only two candidates: candidate 0 and candidate 1. Then the probability of candidate 1, say, winning the election has to converge to either zero or one, depending on the election outcome. In other words, we need to consider the probability

$$\begin{aligned} {{\mathbb {P}}}_t\left( {{\hat{X}}}_T>K\right) = {{\mathbb {E}}}_t\left[ {\mathbb {1}}\{ {{\hat{X}}}_T>K\} \right] \end{aligned}$$
(28)

conditional on the information available up to time t. This conditional probability process will then evolve in a random manner such that it will converge to either zero or one, as the election day approaches, i.e. as \(t\rightarrow T\).

As a matter of clarification we remark that (28) represents a dynamical extension of the a priori probability (26) in the sense that while (26) with \(K=1/2\) represents the current (at time \(t=0\)) probability that candidate 1 will secure a win on the election day at \(t=T\), this probability will change from day to day dynamically in accordance with the revelation of new information, so in particular given the information available at time t the winning probability changes to (28) with \(K=1/2\).

To work out the conditional expectation in (28) we begin by remarking that the model under consideration entails a dynamic consistency in the following sense. Suppose that information has been gathered until time \(s\in [0,t]\) so that the a priori probability \(p_i\) has changed into the a posteriori probability \(\pi _{is}={{\mathbb {P}}}(X=x_i|\xi _s)\). Then starting from this point, and given the knowledge of the past \(\{\xi _u\}_{0\le u\le s}\), the reinitialised information process commencing from time s, according to the original model (2) for information flow, will take the form

$$\begin{aligned} \xi _{st} = \xi _t-\xi _s = \sigma X (t-s) + (B_t-B_s) . \end{aligned}$$
(29)

Thus, starting from time s, what was the a posteriori probability \(\pi _{is}\) now becomes the a priori probability for the future times \(t\ge s\), so according to the logic leading to (8) the new a posteriori probability \(\pi _{it}={{\mathbb {P}}}(X=x_i|\xi _{t})\) should be of the form

$$\begin{aligned} \pi _{it} =\frac{\pi _{is}\exp \left( \sigma x_i \xi _{st}- \frac{1}{2} \sigma ^2 x_i^2 (t-s)\right) }{\sum _j \pi _{js} \exp \left( \sigma x_j \xi _{st}-\frac{1}{2} \sigma ^2 x_j^2 (t-s)\right) } . \end{aligned}$$
(30)

Substituting the expressions for \(\xi _{st}\) and \(\pi _{is}\) in (30), a short calculation shows that the resulting expression indeed agrees with that obtained in (8), thus establishing dynamic consistency of the model.
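The dynamic consistency can also be confirmed numerically: updating the priors in one step with (8), or in two steps via an intermediate time s and the reinitialised process (29) with (30), should give the same answer up to rounding. The sketch below uses three hypothetical candidates and made-up observations of the information process; none of the numbers comes from the source.

```python
import math

def bayes_update(priors, xs, sigma, increment, elapsed):
    """Update 'priors' using an increment of the information process over 'elapsed' time.

    With increment = xi_t and elapsed = t this is formula (8);
    with increment = xi_t - xi_s and elapsed = t - s it is formula (30).
    """
    w = [p * math.exp(sigma * x * increment - 0.5 * sigma**2 * x**2 * elapsed)
         for p, x in zip(priors, xs)]
    total = sum(w)
    return [wi / total for wi in w]

xs = [0.0, 1.0, 2.0]          # hypothetical labels for three candidates
p = [0.5, 0.3, 0.2]           # hypothetical a priori probabilities
sigma = 0.4
s, t = 0.3, 1.0
xi_s, xi_t = 0.25, 0.9        # hypothetical observed values of the information process

direct = bayes_update(p, xs, sigma, xi_t, t)                    # (8) at time t
pi_s = bayes_update(p, xs, sigma, xi_s, s)                      # (8) at time s
two_step = bayes_update(pi_s, xs, sigma, xi_t - xi_s, t - s)    # (30) starting from time s
```

The agreement reflects the fact that the exponents in (8) are additive in \(\xi _t\) and t, so the intermediate normalisation cancels.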

The dynamic consistency implies that to work out the conditional expectation in (28) it suffices to recycle the calculation leading up to (26) instead of performing a direct calculation making use of the time-dependent version of the measure change rule \({{\mathbb {E}}}_t^{{\mathbb {P}}}[Y_T]={{\mathbb {E}}}_t^{{\mathbb {Q}}} [\Phi _T Y_T]/\Phi _t\). In particular, we deduce at once that

$$\begin{aligned} {{\mathbb {P}}}_t\left( {{\hat{X}}}_T>K\right) = \pi _t\, N(d_t^-) + (1-\pi _t)\, N(d_t^+) , \end{aligned}$$
(31)

where \(\pi _t={{\mathbb {P}}}(X=0|\xi _{t})\) and

$$\begin{aligned} d_t^\pm = \frac{ \log \left( \frac{(1-\pi _t)(1-K)}{\pi _t K}\right) \pm \frac{1}{2}\sigma ^2 (T-t)}{\sigma \sqrt{T-t}} . \end{aligned}$$
(32)

Setting \(K=1/2\) in (31) we obtain the a posteriori probability that candidate 1 will win the election to take place at time T, and this indeed converges to either zero or one, depending on how information unravels along the way. Figure 3 shows five sample-path simulations of the conditional probability process. We remark, incidentally, that if there are more than two candidates, say, N candidates competing, then the corresponding a posteriori probability at time t of the k-th candidate winning the election to take place at time T is determined by computing \({{\mathbb {P}}}_t(\pi _{kT}>N^{-1})\).
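Sample paths of the kind described here can be generated by combining a simulation of the information process (2) with formulae (8) and (31). The sketch below is one possible way of doing this and is not necessarily the procedure used to produce the figure; the grid size and the seed are assumptions, while the parameter values \(p=0.48\), \(\sigma =0.5\), \(T=1\) follow the caption of Fig. 3.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_win_probability(pi_t, sigma, time_left, K=0.5):
    """Formula (31), with pi_t = P(X = 0 | xi_t) and time_left = T - t > 0."""
    log_term = math.log((1.0 - pi_t) * (1.0 - K) / (pi_t * K))
    vol = sigma * math.sqrt(time_left)
    d_minus = (log_term - 0.5 * vol**2) / vol
    d_plus = (log_term + 0.5 * vol**2) / vol
    return pi_t * norm_cdf(d_minus) + (1.0 - pi_t) * norm_cdf(d_plus)

def simulate_prediction(p, sigma, T, n_steps, rng):
    """One sample path of P_t(X_hat_T > 1/2) on the grid t = 0, dt, ..., T - dt."""
    x = 0.0 if rng.random() < p else 1.0          # hidden value of X, drawn from the prior
    dt = T / n_steps
    xi = 0.0
    path = []
    for k in range(n_steps):
        t = k * dt
        w1 = (1.0 - p) * math.exp(sigma * xi - 0.5 * sigma**2 * t)
        pi_t = p / (p + w1)                       # posterior for candidate 0, from (8)
        path.append(conditional_win_probability(pi_t, sigma, T - t))
        xi += sigma * x * dt + rng.gauss(0.0, math.sqrt(dt))
    return path

rng = random.Random(3)                            # fixed seed for reproducibility
paths = [simulate_prediction(0.48, 0.5, 1.0, 500, rng) for _ in range(5)]
```

Every path starts from the a priori value given by (26) and then fluctuates as information arrives; the grid stops one step short of T because (32) is singular at \(t=T\).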

Fig. 3

Dynamics of election prediction. The probability \({{\mathbb {P}}}_t({{\hat{X}}}_T>1/2)\) at time t that candidate 1 will win the election to take place at time T is simulated. Five sample paths are shown here, with the parameter values \(p=0.48\), \(\sigma =0.5\), and \(T=1\)
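A sample path of the kind shown in Fig. 3 can be generated along the following lines. This is a sketch under stated assumptions rather than the code used to produce the figure: we take \(X\in \{0,1\}\) with \({{\mathbb {P}}}(X=0)=p\), assume the two-point form \(\pi _t=p/[p+(1-p)\exp (\sigma \xi _t-\frac{1}{2}\sigma ^2t)]\) for (8), and use a simple Euler discretisation of \(\xi _t=\sigma Xt+B_t\). The function names and the number of time steps are hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(pi_t, sigma, time_left, K=0.5):
    """Formulae (31)-(32), given pi_t = P(X=0 | xi_t) and time_left = T - t."""
    log_term = math.log((1.0 - pi_t) * (1.0 - K) / (pi_t * K))
    vol = sigma * math.sqrt(time_left)
    d_minus = (log_term - 0.5 * vol**2) / vol
    d_plus = (log_term + 0.5 * vol**2) / vol
    return pi_t * norm_cdf(d_minus) + (1.0 - pi_t) * norm_cdf(d_plus)

def simulate_path(p=0.48, sigma=0.5, T=1.0, steps=250, seed=1):
    """Return the realised X and one path of P_t(X_hat_T > 1/2) on [0, T)."""
    rng = random.Random(seed)
    dt = T / steps
    x = 0 if rng.random() < p else 1      # X = 0 with probability p
    xi, path = 0.0, []
    for k in range(steps):                # stop before t = T, where (32) is singular
        t = k * dt
        # Assumed two-point form of (8) for pi_t = P(X=0 | xi_t).
        likelihood = math.exp(sigma * xi - 0.5 * sigma**2 * t)
        pi_t = p / (p + (1.0 - p) * likelihood)
        path.append(win_probability(pi_t, sigma, T - t))
        xi += sigma * x * dt + rng.gauss(0.0, math.sqrt(dt))
    return x, path

x, path = simulate_path()
print(x, path[0], path[-1])
```

Changing the seed gives further paths; each one starts from the same a priori value and tends towards zero or one as t approaches T.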

6 Impact of disinformation: when to release fake news?

The information-based model for election naturally allows for a generalisation to model situations in which there is disinformation in circulation, with a deliberate intent to obscure the true value of X. In particular, in [10] we have defined what one might mean by ‘fake news’ as a modification of the information process (2) in the following form:

$$\begin{aligned} \eta _t = \sigma X t + B_t + F_t, \end{aligned}$$
(33)

where the term \(\{F_t\}\), which has a bias so that \({{\mathbb {E}}}[F_t]\ne 0\), models deliberate disinformation. The idea can be described as follows. There are unfounded rumours and speculations obscuring the value of X, but a large number of such random speculations will average out to give rise to an unbiased noise so that \({{\mathbb {E}}}[B_t]=0\). In other words, while noise will interfere with giving an accurate estimate for X, it does not point in any particular direction; whereas fake news can be distinguished from conventional noise by its intent to disorient the public. Thus, those who are not aware of the existence of \(\{F_t\}\) will arrive at their estimates based on formula (8), but with the distorted information \(\{\eta _t\}\) in place of \(\{\xi _t\}\). In other words, they will ‘perceive’ the information as taking the normal form (2) and proceed to make appropriate inferences based on (8); but their inferences are now skewed owing to the existence of \(\{F_t\}\).

In the information-based model, the effect of disinformation can be understood in an intuitive and transparent manner: if \({{\mathbb {E}}}[F_t]>0\) then people (unaware of the existence of \(\{F_t\}\)) are misled into thinking that the true value of X is greater than what it really is; and similarly, if \({{\mathbb {E}}}[F_t]<0\) then people are misled into thinking that the true value of X is less than what it really is. By choosing specific models for the process \(\{F_t\}\) one can therefore apply a simulation study to determine, for a given choice of \(\{F_t\}\), how the opinion-poll statistics might be affected in that situation.

From the viewpoint of those who wish to disseminate disinformation, the most obvious question that arises is: how to find the optimum choice for \(\{F_t\}\)? Evidently, the notion of optimality depends on the choice of the criteria, but in the present context perhaps the most natural criterion is the one that maximises the probability of a given candidate winning the election. In general, finding a solution to this question requires solving a new type of stochastic optimisation problem that combines both (a) the theory of signal detection and in particular nonlinear filtering [16], and (b) the theory of change-point detection [26]. Thus, we encounter a situation here in which a new type of problem in society leads to a new type of mathematical challenge.

Here we shall analyse this problem in a simple setup in which there is only one chance for disseminating fake news. Hence the problem is to find the optimal timing to release fake news so as to maximise its impact on the upcoming election. We shall assume, in particular, a model for fake news of the form \(F_t = \mu (t-\tau ) \, \mathrm{e}^{-\alpha (T-\tau )}\, {\mathbb {1}}\{t>\tau \}\), where \(\mu \) and \(\alpha >0\) are constants, \(\tau \) denotes the time at which fake news is released, and \({\mathbb {1}}\{t>\tau \}\) as before denotes the indicator function so that \({\mathbb {1}}\{t>\tau \}=0\) if \(t\le \tau \) and \({\mathbb {1}}\{t>\tau \}=1\) otherwise. This choice of \(F_t\) has the interpretation that when fake news is released at time \(\tau \), initially its strength grows linearly in time at the rate \(\mu \), but over time the strength is suppressed exponentially at the rate \(\alpha \).

Let us consider the problem in the context of a two-candidate election. Recall that in the absence of disinformation the probability of candidate 1 securing an election win has been worked out in (26) with \(K=1/2\). If, however, disinformation is disseminated in such a way that the voters are unaware of this, then this probability is altered in the following way. Noting that (26) has been obtained by assuming a genuine information flow \(\{\xi _t\}\), if in reality this is replaced by \(\{\eta _t\}\), then the threshold value \(z^*\) of (23) is now replaced by \(z^*-F_T/\sqrt{T}\). This follows on account of the fact that the original condition was \(\xi _T>z^*\sqrt{T}\), but in the presence of disinformation \(\xi _T\) is replaced by \(\xi _T+F_T\); hence the condition now reads \(\xi _T>(z^*-F_T/\sqrt{T})\sqrt{T}\). Accordingly, the variables \(d^\pm \) are replaced by \(d^\pm +F_T/\sqrt{T}\). The effect of this is to increase (respectively, decrease) the winning probability \({{\mathbb {P}}}({{\hat{X}}}_T>1/2)\) for candidate 1 if \(F_T>0\) (respectively, if \(F_T<0\)). Thus, in principle one can optimise the form of \(\{F_t\}\) so as to maximise (or minimise) the probability \({{\mathbb {P}}}({{\hat{X}}}_T>1/2)\). However, in reality, owing to the Markovianity of \(\{\xi _t\}\) along with the assumption that the existence of \(\{F_t\}\) is hidden from the voters, maximum impact is obtained in this case by simply maximising \(F_T\). In the case of the model \(F_t=\mu (t-\tau )\, \mathrm{e}^{-\alpha (T-\tau )}\, {\mathbb {1}}\{t>\tau \}\), therefore, to achieve a maximum impact on the election outcome, fake news should be released at

$$\begin{aligned} \tau ^* = \frac{\alpha T-1}{\alpha } \end{aligned}$$
(34)

if \(\alpha >T^{-1}\), and at \(\tau ^*=0\) if \(\alpha \le T^{-1}\).
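The release-timing rule can be illustrated with a short calculation. The sketch below is hedged on two assumptions: that (26) coincides with (31)–(32) evaluated at \(t=0\) (so that \(\pi _0=p\)), and that the only effect of fake news is the shift \(d^\pm \rightarrow d^\pm +F_T/\sqrt{T}\) described above. The function names are hypothetical, and the parameter values are those quoted for Fig. 4.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fake_news_at_T(tau, mu, alpha, T):
    """Terminal value F_T of the fake-news term for release time tau < T."""
    if tau >= T:
        return 0.0
    return mu * (T - tau) * math.exp(-alpha * (T - tau))

def optimal_release_time(alpha, T):
    """Formula (34), with tau* = 0 when alpha <= 1/T."""
    return (alpha * T - 1.0) / alpha if alpha > 1.0 / T else 0.0

def win_probability(p, sigma, T, shift=0.0, K=0.5):
    """P(X_hat_T > K) with d^- and d^+ both shifted by `shift` = F_T / sqrt(T)."""
    log_term = math.log((1.0 - p) * (1.0 - K) / (p * K))
    vol = sigma * math.sqrt(T)
    d_minus = (log_term - 0.5 * vol**2) / vol + shift
    d_plus = (log_term + 0.5 * vol**2) / vol + shift
    return p * norm_cdf(d_minus) + (1.0 - p) * norm_cdf(d_plus)

p, sigma, T, mu, alpha = 0.505, 0.3, 1.0, 1.5, 5.0
tau_star = optimal_release_time(alpha, T)

# Brute-force check of (34): maximise F_T over a grid of release times.
grid = [k * T / 1000 for k in range(1000)]
tau_grid = max(grid, key=lambda tau: fake_news_at_T(tau, mu, alpha, T))

baseline = win_probability(p, sigma, T)
boosted = win_probability(p, sigma, T,
                          shift=fake_news_at_T(tau_star, mu, alpha, T) / math.sqrt(T))
print(tau_star, tau_grid, baseline, boosted)
```

The closed-form optimum follows from writing \(s=T-\tau \) and maximising \(\mu s\,\mathrm{e}^{-\alpha s}\), which gives \(s=1/\alpha \) whenever \(1/\alpha <T\); the grid search is included only as an independent check.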

Fig. 4

Winning likelihood. The probability that candidate 1 will win the election to take place in one year (\(T=1\)), as a function of the timing \(\tau \) at which fake news in favour of candidate 1 is released, is plotted, when we set \(\alpha =5\). The figure shows that by releasing fake news four-fifths of the way into the election period, the probability of candidate 1 winning the election can be enhanced by about 5.5 percentage points (from about 47.3% to about 52.8%). Other model parameters are: \(\sigma =0.3\) for the information flow rate, \(p=0.505\) for the a priori probability that candidate 0 wins the election, and \(\mu =1.5\) for the fake news strength

In Fig. 4 we plot the probability \({{\mathbb {P}}}({{\hat{X}}}_T>1/2)\) of candidate 1 winning the election in the presence of fake news as a function of the release timing \(\tau \). For the parameter choices made therein, we see that in the absence of fake news, if the a priori probability (today’s opinion poll) for candidate 1 to win the election is \(1-p=49.5\)%, then the actual probability (today’s prediction) of winning the election one year later is about 47.3%. However, the release of fake news in favour of candidate 1, if undetected by the voters, will enhance this likelihood whenever it is released prior to the election day. If, in particular, the release timing is optimised, then the probability can be enhanced by as much as 5.5 percentage points (in this example), perhaps just sufficient to overcome statistical uncertainties for candidate 1 to secure a victory. The specific figures mentioned here are of course based on arbitrarily chosen model parameters, but the model clearly illustrates the impact of fake news in an intuitive manner, and allows for more comprehensive impact studies, scenario analysis, and parameter sensitivity analysis, the results of which we hope will then be fed into the development of countermeasures to tackle the impact of fake news.

7 Managing information flow in an election campaign

In the previous section, and more generally in [10], we examined the impact of the dissemination of disinformation on the dynamics of the opinion poll statistics. It should be apparent, however, that the framework is not restricted to modelling disinformation. For instance, if the campaign team is confident about the value of X, then they can proactively promote (or perhaps hide) relevant information, but not in secret. In the simplest situation one could set \(F_t=\kappa X (t-\tau ) {\mathbb {1}}\{t>\tau \}\) for some \(\kappa >0\), where \(\{F_t\}\) now represents genuine information. Then from time \(\tau \) onwards the voters will obtain more reliable information about X than otherwise. This is equivalent to having a time-dependent information flow rate \(\sigma _t\) such that \(\sigma _t=\sigma \) for \(t\le \tau \) and \(\sigma _t=\sigma +\kappa \) for \(t>\tau \). More generally, we may consider a generic time-dependent information flow rate \(\sigma _t\). If the campaign team is in a position to control certain information, then they would naturally like to optimise the way in which information is managed so as to maximise the chances of their candidate winning the election. (In fact, as explained in the final section below, such a situation is very natural within the structural approach; whereas in the reduced-form approach the random variable X is to an extent an abstraction so it is not always apparent whether the campaign team can manage information on X. However, because the mathematical treatment of the problem in either of the frameworks is the same, and because the present paper is concerned with the reduced-form model, we shall proceed to develop the idea here, with the caveat that practical implementations of the ideas presented in this section are more suitable within the structural framework.)

In the general case for which the information flow rate is time dependent, the information process takes the form

$$\begin{aligned} \xi _t = X \int _0^t \sigma _s \mathrm{d}s + B_t . \end{aligned}$$
(35)

In this case, the information process \(\{\xi _t\}\) is no longer Markovian so that we do not have the simplifying reduction \({{\mathbb {P}}}(X=x_i|\{\xi _s\}_{0\le s\le t}) \rightarrow {{\mathbb {P}}}(X=x_i|\xi _t)\). In other words, the conditional probability for the random variable X is now path dependent. To work out the posterior probabilities we begin by remarking that if we define the process \(\{\Phi _t\}\) according to

$$\begin{aligned} \Phi _t = \exp \left( X\int _0^t \sigma _s \mathrm{d}\xi _s - \frac{1}{2}X^2\int _0^t\sigma _s^2\mathrm{d}s \right) , \end{aligned}$$
(36)

then we can use the unit-initialised martingale \(\{\Phi _t\}\) to change probability measure \({{\mathbb {P}}}\rightarrow {{\mathbb {Q}}}\) with the property that (i) the information process \(\{\xi _t\}\) is a Brownian motion under \({{\mathbb {Q}}}\) independent of X; (ii) the random variable X has the same probability law under \({{\mathbb {Q}}}\) as it does under \({{\mathbb {P}}}\); and (iii) the conditional expectation \(f_t={{\mathbb {E}}}^{{\mathbb {P}}}[f(X)|\{\xi _s\}_{0\le s\le t}]\) for a function of the random variable X can be obtained by use of the Kallianpur–Striebel [27] formula

$$\begin{aligned} f_t = \frac{{{\mathbb {E}}}^{{\mathbb {Q}}}[f(X) \Phi _t|\{\xi _s\}_{0\le s\le t}]}{{{\mathbb {E}}}^{{\mathbb {Q}}}[\Phi _t|\{\xi _s\}_{0\le s\le t}]} . \end{aligned}$$
(37)

By setting \(f(X)={\mathbb {1}}\{X=x_i\}\) in (37) we thus deduce the expression for the conditional probability \(\pi _{it}= {{\mathbb {P}}}(X=x_i|\{\xi _s\}_{0\le s\le t})\) with time-dependent information flow:

$$\begin{aligned} \pi _{it} = \frac{p_i \exp \left( x_i \int _0^t \sigma _s \mathrm{d}\xi _s - \frac{1}{2}x_i^2\int _0^t \sigma _s^2 \mathrm{d}s\right) }{\sum _j p_j \exp \left( x_j \int _0^t \sigma _s \mathrm{d}\xi _s - \frac{1}{2}x_j^2\int _0^t \sigma _s^2 \mathrm{d}s\right) } , \end{aligned}$$
(38)

from which the best estimate \({{\hat{X}}}_t\) for X can be obtained according to \({{\hat{X}}}_t = \sum _i x_i \pi _{it}\).

Let us proceed further by considering, as in Sect. 4, the case in which there are only two candidates. Then we have

$$\begin{aligned} {{\hat{X}}}_T = \frac{(1-p) \exp \left( \int _0^T \sigma _s \mathrm{d}\xi _s - \frac{1}{2}\int _0^T \sigma _s^2 \mathrm{d}s\right) }{p + (1-p) \exp \left( \int _0^T \sigma _s \mathrm{d}\xi _s - \frac{1}{2}\int _0^T \sigma _s^2 \mathrm{d}s\right) } , \end{aligned}$$
(39)

which is an increasing function of a single Gaussian random variable \(\int _0^T \sigma _s \mathrm{d}\xi _s\). It follows that \({{\hat{X}}}_T>K\) holds if and only if

$$\begin{aligned} \frac{\int _0^T \sigma _s \mathrm{d}\xi _s}{\sqrt{\int _0^T \sigma _s^2 \mathrm{d}s}} > \frac{ \log \left( \frac{pK}{(1-p)(1-K)}\right) + \frac{1}{2}\int _0^T \sigma _s^2 \mathrm{d}s}{\sqrt{\int _0^T \sigma _s^2 \mathrm{d}s}} . \end{aligned}$$
(40)

On account of the relation \({{\mathbb {P}}}({{\hat{X}}}_T>K)= {{\mathbb {E}}}^{{\mathbb {Q}}}[\Phi _T {\mathbb {1}}\{{{\hat{X}}}_T>K\}]\), where the conditional expectation of \(\Phi _T\) given \(\{\xi _s\}_{0\le s\le T}\) is the denominator of (39), a short calculation then shows that

$$\begin{aligned} {{\mathbb {P}}}\left( {{\hat{X}}}_T>K\right) = p\, N(d^-) + (1-p)\, N(d^+) , \end{aligned}$$
(41)

where

$$\begin{aligned} d^\pm = \frac{ \log \left( \frac{(1-p)(1-K)}{pK}\right) \pm \frac{1}{2}\int _0^T \sigma _s^2 \mathrm{d}s}{\sqrt{\int _0^T \sigma _s^2 \mathrm{d}s}} . \end{aligned}$$
(42)

Setting \(K=1/2\) in (41) we thus obtain the probability that candidate 1 wins the future election, when information flow rate \(\{\sigma _t\}\) is time dependent.
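As an illustration, (41) can be compared with a direct simulation of (39). The sketch below assumes the step profile \(\sigma _t=\sigma \) for \(t\le \tau \) and \(\sigma _t=\sigma +\kappa \) for \(t>\tau \) discussed earlier in this section, with hypothetical parameter values, and uses the fact that, conditional on X, the integral \(\int _0^T \sigma _s \mathrm{d}\xi _s\) is Gaussian with mean \(X\int _0^T \sigma _s^2 \mathrm{d}s\) and variance \(\int _0^T \sigma _s^2 \mathrm{d}s\).

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(p, variance, K=0.5):
    """Formulae (41)-(42); `variance` is the integral of sigma_s^2 over [0, T]."""
    log_term = math.log((1.0 - p) * (1.0 - K) / (p * K))
    root = math.sqrt(variance)
    d_minus = (log_term - 0.5 * variance) / root
    d_plus = (log_term + 0.5 * variance) / root
    return p * norm_cdf(d_minus) + (1.0 - p) * norm_cdf(d_plus)

def monte_carlo(p, variance, n=100_000, K=0.5, seed=0):
    """Estimate P(X_hat_T > K) by sampling X and the stochastic integral."""
    rng = random.Random(seed)
    root = math.sqrt(variance)
    wins = 0
    for _ in range(n):
        x = 0 if rng.random() < p else 1
        # Given X, the integral of sigma d(xi) is N(X * variance, variance).
        y = x * variance + root * rng.gauss(0.0, 1.0)
        weight = (1.0 - p) * math.exp(y - 0.5 * variance)
        x_hat = weight / (p + weight)          # formula (39)
        if x_hat > K:
            wins += 1
    return wins / n

# Hypothetical step profile for the information flow rate.
sigma, kappa, tau, T, p = 0.3, 0.4, 0.6, 1.0, 0.505
variance = sigma**2 * tau + (sigma + kappa)**2 * (T - tau)

closed_form = win_probability(p, variance)
estimate = monte_carlo(p, variance)
print(closed_form, estimate)
```

Only the integrated squared flow rate enters (41), so any other profile with the same value of \(\int _0^T \sigma _s^2 \mathrm{d}s\) yields the same a priori winning probability.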

From the viewpoint of the campaign team for candidate 1, if they had the ability to control the rate of information flow, then it would be desirable to find a \(\{\sigma _t\}\) that maximises the success probability (41). The basic idea can already be inferred from inspecting Fig. 2 in the case of constant \(\sigma \): depending on the value of the a priori probability p, one would like to either let \(\sigma \rightarrow \infty \) or to let \(\sigma \rightarrow 0\) so as to impact the probability of a given candidate winning the future election. These extreme cases, however, are unrealistic (even in the structural model): release of information (marketing) is in general costly. It is therefore reasonable to assume that, if the campaign period is [0, T], then the number \(\int _0^T \sigma _s\, \mathrm{d}s\) is strictly bounded.

With this in mind, we examine how the winning probability \(P= {{\mathbb {P}}}({{\hat{X}}}_T>1/2)\) for candidate 1 changes as the information flow rate \(\sigma _t\) is varied. To this end let us consider the functional derivative of P with respect to \(\sigma _t\). That is, regarding \(P=P[\sigma ]\) as a functional of \(\sigma \), we consider perturbing \(\sigma _t\) by a small amount \(\epsilon \) in the direction of \(\phi _t\):

$$\begin{aligned} \frac{\delta P}{\delta \sigma } = \lim _{\epsilon \rightarrow 0} \frac{P[\sigma +\epsilon \phi ]-P[\sigma ]}{\epsilon } . \end{aligned}$$
(43)

Then a short calculation shows that

$$\begin{aligned} \frac{\delta P}{\delta \sigma } =\frac{1}{\sqrt{2\pi }} \left[ p\, \mathrm{e}^{-\frac{1}{2}(d^-)^2} \, \frac{\delta d^-}{\delta \sigma } + (1-p)\, \mathrm{e}^{-\frac{1}{2}(d^+)^2} \, \frac{\delta d^+}{\delta \sigma } \right] , \end{aligned}$$
(44)

where

$$\begin{aligned} \frac{\delta d^\pm }{\delta \sigma } = \frac{\int _0^T \sigma _s \phi _s \mathrm{d}s}{\sqrt{\int _0^T\sigma _s^2\mathrm{d}s}} \left[ \frac{\log \left( \frac{pK}{(1-p)(1-K)}\right) }{\int _0^T \sigma _s^2\mathrm{d}s} \pm \frac{1}{2}\right] . \end{aligned}$$
(45)

From (44) we deduce that the perturbation on P is proportional to the inner product \(\int _0^T \sigma _s \phi _s \mathrm{d}s\) between \(\{\sigma _t\}\) and \(\{\phi _t\}\). Therefore, for a given \(\{\sigma _t\}\) one can explore how it may be perturbed so as to either increase or decrease P. In practical applications, however, it is likely that one would be working with a parametric family of models for the information flow rates \(\{\sigma _t\}\), and in this case optimisation can be achieved with normal differentiation, not with a functional derivative.
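The directional derivative can be checked against a finite difference. In the sketch below, which is an illustration under stated assumptions rather than part of the derivation, the time integrals are replaced by Riemann sums on a uniform grid, the variation of \(d^\pm \) is obtained by differentiating (42) directly with respect to \(\int _0^T \sigma _s^2 \mathrm{d}s\), and the profile \(\sigma _t\) and the direction \(\phi _t\) are hypothetical choices.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def win_probability(p, sig, dt, K=0.5):
    """(41)-(42) with the integral of sigma_s^2 replaced by a Riemann sum."""
    V = sum(s * s for s in sig) * dt
    L = math.log((1.0 - p) * (1.0 - K) / (p * K))
    root = math.sqrt(V)
    return (p * norm_cdf((L - 0.5 * V) / root)
            + (1.0 - p) * norm_cdf((L + 0.5 * V) / root))

def directional_derivative(p, sig, phi, dt, K=0.5):
    """Formula (44), with the variation of d^- and d^+ from the chain rule."""
    V = sum(s * s for s in sig) * dt
    inner = sum(s * f for s, f in zip(sig, phi)) * dt   # <sigma, phi>
    L = math.log((1.0 - p) * (1.0 - K) / (p * K))
    root = math.sqrt(V)
    d_minus = (L - 0.5 * V) / root
    d_plus = (L + 0.5 * V) / root
    # d(d^-)/dV and d(d^+)/dV, multiplied by delta V = 2 <sigma, phi>.
    dd_minus = (-L / (2.0 * V**1.5) - 0.25 / root) * 2.0 * inner
    dd_plus = (-L / (2.0 * V**1.5) + 0.25 / root) * 2.0 * inner
    return (p * norm_pdf(d_minus) * dd_minus
            + (1.0 - p) * norm_pdf(d_plus) * dd_plus)

n, T, p = 200, 1.0, 0.505
dt = T / n
times = [(k + 0.5) * dt for k in range(n)]
sig = [0.3 + 0.2 * t for t in times]             # hypothetical flow-rate profile
phi = [math.sin(math.pi * t) for t in times]     # hypothetical perturbation

eps = 1e-6
up = win_probability(p, [s + eps * f for s, f in zip(sig, phi)], dt)
down = win_probability(p, [s - eps * f for s, f in zip(sig, phi)], dt)
finite_difference = (up - down) / (2.0 * eps)
analytic = directional_derivative(p, sig, phi, dt)
print(analytic, finite_difference)
```

For a parametric family of flow rates the same finite-difference pattern applies with \(\phi _t\) replaced by the derivative of \(\sigma _t\) with respect to the parameter.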

We note, more generally, that the way in which the winning probability P changes against a perturbation of the information flow rate today will not be the same as what it is tomorrow. Indeed, the a posteriori probability \(\pi _t={{\mathbb {P}}}(X=x_0|\{\xi _s\}_{0\le s\le t})\) tomorrow may change from the a priori probability p today in such a way that while there is an advantage in increasing \(\sigma \) today, it would be advantageous to decrease \(\sigma \) tomorrow. In other words, what we require is a dynamical version of (44) based on the conditional version of the winning probability: \(P_t={{\mathbb {P}}}({{\hat{X}}}_T>1/2|\{\xi _s\}_{0\le s\le t})\). This can be worked out straightforwardly, and we deduce that the result takes a form identical to (44), except that p is replaced by \(\pi _t\), \(\int _0^T \sigma _s^2\mathrm{d}s\) is replaced by \(\int _t^T \sigma _s^2\mathrm{d}s\), and \(\int _0^T \sigma _s \phi _s \mathrm{d}s\) is replaced by \(\int _t^T \sigma _s \phi _s \mathrm{d}s\).

8 Sensitivity analysis

Another aspect of the information-based model that will be useful to explore for strategic planning is concerned with parameter sensitivity. If, for instance, the release of information is costly (e.g., advertising cost), and if the result is not likely to significantly change the state of affairs, then it may not be advantageous to proceed with the release. For such a consideration, knowledge of the parameter sensitivity of the model will assist decision making.

In the present context, we are interested, in particular, in the parameter sensitivity of the probability distribution of the future random variable \({{\hat{X}}}_T\). Let us therefore begin by working out the a priori density function for \({{\hat{X}}}_T\). Recalling that the density function \(\rho (x)\) for \({{\hat{X}}}_T\) is given by \(\rho (x)= \mathrm{d}{{\mathbb {P}}}({{\hat{X}}}_T<x)/\mathrm{d}x\), we deduce, by differentiating (26), that

$$\begin{aligned} \rho (x) = \frac{1}{\sqrt{2\pi \sigma ^2T} x(1-x)} \left[ p\,\mathrm{e}^{-\frac{1}{2}(d^-)^2} + (1-p) \mathrm{e}^{-\frac{1}{2}(d^+)^2} \right] , \end{aligned}$$
(46)

where

$$\begin{aligned} (d^\pm )^2 = \left( \frac{ \log \left( \frac{(1-p)(1-x)}{px}\right) }{\sigma \sqrt{T}} \right) ^2 + \frac{1}{4} \sigma ^2 T \pm \log \left( \frac{(1-p)(1-x)}{px}\right) . \end{aligned}$$
(47)

In Fig. 5 we plot the density function \(\rho (x)\) for different values of the parameter \(\sigma \). It may not be immediately apparent from (46) that \(\rho (x)\) is a density function satisfying the normalisation condition \(\int _0^1\rho (x)\mathrm{d}x=1\) (although by definition \(\rho (x)\) clearly is a density function), but in fact \(\rho (x)\) is a Gaussian mixture density in a transformed variable. To see this, let us change the variable \(x\rightarrow u\) according to

$$\begin{aligned} u = \log \left( \frac{1-x}{x}\right) , \end{aligned}$$
(48)

and likewise to simplify the notation let us define \(\beta =\log [(1-p)/p]\). Then it should be immediately apparent that while x varies from 0 to 1, u varies from \(\infty \) to \(-\infty \). Together with the fact that \(\mathrm{d}u = -\mathrm{d}x /x(1-x)\) we thus deduce that the density function f(u) for the transformed variable \(U_T=\log [(1-{{\hat{X}}}_T)/{{\hat{X}}}_T]\) takes the form

$$\begin{aligned} f(u) = \frac{1}{1+\mathrm{e}^\beta } \frac{1}{\sqrt{2\pi \sigma ^2T}} \left[ \mathrm{e}^{-\frac{1}{2\sigma ^2T} \left( u + \beta - \sigma ^2 T/2\right) ^2} + \mathrm{e}^\beta \mathrm{e}^{-\frac{1}{2\sigma ^2T} \left( u + \beta + \sigma ^2 T/2\right) ^2} \right] , \end{aligned}$$
(49)

which we recognise at once to be a normalised Gaussian mixture density.
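The normalisation of \(\rho (x)\) can also be confirmed numerically. The following sketch evaluates (46)–(47) directly and integrates over \(x\in (0,1)\) through the substitution \(x=1/(1+\mathrm{e}^u)\), for which \(\mathrm{d}x=-x(1-x)\,\mathrm{d}u\); the quadrature range and step count are hypothetical choices made for illustration.

```python
import math

def rho(x, p, sigma, T):
    """Density (46)-(47) of X_hat_T at a point x in (0, 1)."""
    L = math.log((1.0 - p) * (1.0 - x) / (p * x))
    v = sigma**2 * T
    d_minus_sq = L * L / v + 0.25 * v - L
    d_plus_sq = L * L / v + 0.25 * v + L
    bracket = (p * math.exp(-0.5 * d_minus_sq)
               + (1.0 - p) * math.exp(-0.5 * d_plus_sq))
    return bracket / (math.sqrt(2.0 * math.pi * v) * x * (1.0 - x))

def total_mass(p, sigma, T, u_max=10.0, n=4000):
    """Trapezoidal integral of rho(x) over (0, 1), carried out in the variable u."""
    h = 2.0 * u_max / n
    total = 0.0
    for k in range(n + 1):
        u = -u_max + k * h
        x = 1.0 / (1.0 + math.exp(u))
        weight = 0.5 if k in (0, n) else 1.0
        # rho(x) |dx/du| = rho(x) x (1 - x) is the density in the variable u.
        total += weight * rho(x, p, sigma, T) * x * (1.0 - x) * h
    return total

for sigma in (0.25, 0.5, 0.75):
    print(sigma, total_mass(p=0.5, sigma=sigma, T=1.0))
```

Working in the variable u avoids the factor \(1/x(1-x)\) near the end points, where a quadrature in x would be poorly conditioned.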

Fig. 5
figure 5

Density function for \({{\hat{X}}}_T\). The a priori probability density \(\rho (x)\) for the random variable \({{\hat{X}}}_T\) is shown here for different values of the information flow-rate parameter \(\sigma \in [0.25,0.75]\). Other model parameters are \(p=0.5\) and \(T=1\)

With the expression for the density function at hand we are in a position to address the question regarding parameter sensitivity. We are interested, in particular, in the sensitivity of the model against the information flow rate \(\sigma \) and the current poll statistics p, for, \(\sigma \) can be directly linked to the advertisement cost, while the value of p can be deduced from pollsters. For this analysis we shall borrow ideas from information geometry (see, e.g., [28, 29]) to determine the parameter sensitivity, i.e. we shall examine the Fisher information matrix (or the Fisher–Rao metric) associated with the parameters \(\sigma \) and p. The Fisher–Rao information metric \(G_{ij}(\sigma ,p)\) is useful inasmuch as it introduces the notion of a metric in the space of density functions that determines the relative separation of the densities for different parameter values. In other words, the amount of impact caused by changing, say, the value of the information revelation rate from \(\sigma \) to \(\sigma '\) is in general not related to the naive difference \(\sigma -\sigma '\). Instead, it is measured in terms of the geodesic distance associated with the Fisher–Rao information metric. Writing \(\theta ^1=\sigma \) and \(\theta ^2=p\), the Fisher information is determined by the expression

$$\begin{aligned} G_{ij}(\sigma ,p) = \int _{-\infty }^\infty \frac{1}{f(u)} \frac{\partial f(u)}{\partial \theta ^i}\frac{\partial f(u)}{\partial \theta ^j} \mathrm{d}u . \end{aligned}$$
(50)

It should be noted here that the Fisher–Rao metric, unlike invariant quantities such as the curvature, is not strictly invariant under the variable transformation \({{\hat{X}}}_T\rightarrow U_T\). However, under certain regularity conditions, Chentsov’s theorem shows that the Fisher information is, up to scaling, the only invariant metric under certain statistically important transformations [30]. Hence, without loss of generality, and for the gain of computational simplicity, we shall investigate the Fisher information associated with the density function f(u), rather than that for \(\rho (x)\). Specifically, if we substitute (49) in (50), then after a short calculation we deduce that

$$\begin{aligned} G_{ij}(\sigma ,p) = \left( \begin{array}{cc} T + \frac{2}{\sigma ^2} &{} \frac{2p-1}{\sigma p(1-p)} \\ \frac{2p-1}{\sigma p(1-p)} &{} ~~\frac{1}{p(1-p)}\left( 1+ \frac{1}{\sigma ^2Tp(1-p)} \right) \end{array} \right) . \end{aligned}$$
(51)

On inspection of the one-dimensional subspace parameterised by the information revelation rate \(\sigma \), the corresponding Fisher information is just a constant T plus the Fisher information \(2/\sigma ^2\) associated with the normal density function of standard deviation \(\sigma \), and this is decreasing in \(\sigma \), indicating that the smaller the \(\sigma \) is, the greater is the impact (associated with changing the value of \(\sigma \)) on the distribution of the projected future values of X. Putting the matter differently, if there is already a lot of reliable information being transmitted to the electorate, then spending campaign funding on further advertisements is probably not advisable because it will entail little additional impact; whereas if there is very little information being transmitted, then it is worth engaging in an additional advertisement campaign. This conclusion may be intuitively obvious; what is less obvious from intuition alone, however, is where one draws the line between these two extremes. The advantage of working out the Fisher information is that it provides a quantitative measure that allows one to analyse such a problem. In particular, as regards the sensitivity of the distribution on the information revelation parameter, the geodesic distance between the densities associated with the parameter values \(\sigma \) and \(\sigma '\) can be worked out in closed form (see [29]), which can be used to conduct a quantitative cost-benefit analysis.
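The closed-form matrix (51) can be compared with a direct numerical evaluation of (50). The sketch below is an illustration under stated assumptions: the density of \(U_T\) is obtained from (46)–(47) by writing \(\log [(1-p)(1-x)/(px)]=u+\beta \), the parameter derivatives are approximated by central differences, and the integral is approximated by a Riemann sum; the step sizes and the parameter values \(\sigma =0.5\), \(p=0.4\), \(T=1\) are hypothetical.

```python
import math

def density(u, sigma, p, T):
    """Density of U_T implied by (46)-(47), i.e. rho(x) x (1 - x) at x = 1/(1+e^u)."""
    v = sigma * sigma * T
    L = u + math.log((1.0 - p) / p)
    a = p * math.exp(-(L - 0.5 * v) ** 2 / (2.0 * v))
    b = (1.0 - p) * math.exp(-(L + 0.5 * v) ** 2 / (2.0 * v))
    return (a + b) / math.sqrt(2.0 * math.pi * v)

def fisher_numeric(sigma, p, T, u_max=12.0, n=4000, eps=1e-5):
    """Formula (50) by quadrature, with central differences in sigma and p."""
    h = 2.0 * u_max / n
    G = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        u = -u_max + k * h
        f0 = density(u, sigma, p, T)
        if f0 < 1e-300:          # skip points where the density underflows
            continue
        grads = [
            (density(u, sigma + eps, p, T) - density(u, sigma - eps, p, T)) / (2.0 * eps),
            (density(u, sigma, p + eps, T) - density(u, sigma, p - eps, T)) / (2.0 * eps),
        ]
        for i in range(2):
            for j in range(2):
                G[i][j] += grads[i] * grads[j] / f0 * h
    return G

def fisher_closed(sigma, p, T):
    """Formula (51)."""
    off = (2.0 * p - 1.0) / (sigma * p * (1.0 - p))
    g_pp = (1.0 + 1.0 / (sigma**2 * T * p * (1.0 - p))) / (p * (1.0 - p))
    return [[T + 2.0 / sigma**2, off], [off, g_pp]]

numeric = fisher_numeric(0.5, 0.4, 1.0)
closed = fisher_closed(0.5, 0.4, 1.0)
print(numeric)
print(closed)
```

The same routine can be used to tabulate \(G_{ij}\) over a range of \(\sigma \) and p when comparing the marginal value of additional information release at different stages of a campaign.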

9 Towards election planning: structural approach

The two modelling approaches introduced in [10] are similar to the structural and reduced-form models used in credit-risk management in the finance and investment banking context [31]. In the financial context, reduced-form models are commonly used because for a given financial contract, the number of cash flows and the number of independent market factors related to it are typically so vast that it is impractical to even attempt to dissect the product so as to identify its structural details. In contrast, in the reduced-form approach it is possible to capture abstractly all the qualitative features resulting from structural models, without going into any of the structural details, and this makes the reduced-form approach more practical to implement. In the context of election modelling, on the other hand, the structural approach is entirely feasible, for, (a) typically in an election there are only a small number of plausible candidates; and (b) the number of important independent issues relevant to a large number of voters is also relatively small.

In the present paper we have focused our attention on reduced-form models because (i) they capture all the qualitative features arising from structural models, thus making them an ideal framework in which to conduct an academic study of the system; (ii) one can pursue mathematical analysis quite far without cluttering it with all the structural details; and (iii) the mathematical formalisms and derived formulae carry through directly to structural models. However, if one were to apply the information-based framework as a part of election planning, then structural models are considerably more advantageous, for, there is a degree of abstraction in the reduced-form model that, in some situations, makes it difficult to implement in practice. For instance, in the discussion in Sect. 7 on controlling the flow of information, this is feasible if the information is concerned specifically with the candidate’s own positions on key issues, which would be the case in the structural approach. An analogous statement can be made on the material considered in Sect. 8. Thus, while the features of the information-based model investigated here for electoral competition can be well explored within the reduced-form approach presented above, for a realistic implementation it is preferable to apply the techniques outlined in the present paper in the structural setup of [10], where independent factors have realistic and tangible interpretations. In this connection it is worth adding that for model calibration one may use (\({\upalpha }\)) the current poll statistics to estimate the a priori probabilities \(\{p_i\}\); (\({\upbeta }\)) the magnitude of the volatility for the poll statistics to estimate the information flow rate \(\sigma \); and (\({\upgamma }\)) various public surveys conducted by pollsters to estimate the density for the preference weights \(\{w_n^k\}\). In the presence of disinformation, the distribution of \(\{F_t\}\) can be estimated from data available to fact checkers.

With these in mind, we conclude by remarking that the analysis presented in this paper is in fact applicable more generally to generic advertisements, not merely to electoral competition. From this point of view we can regard the foregoing material as providing a new information-based mathematical model for controlling information, as well as for characterising the impact of advertisement.