# Observational Learning

**DOI:** https://doi.org/10.1057/978-1-349-95189-5_2990

## Abstract

Observational learning occurs when privately informed individuals sequentially choose among finitely many actions after seeing predecessors’ choices. We summarise the general theory of this paradigm: *belief convergence* forces *action convergence*; specifically, copycat ‘herds’ arise. Also, beliefs converge to a point mass on the truth exactly when the private information is not uniformly bounded. This subsumes two key findings of the original herding literature: With multinomial signals, *cascades* occur, where individuals rationally ignore their private signals, and incorrect herds start with positive probability. The framework is flexible – some individuals may be committed to an action, or individuals may have divergent cardinal or even ordinal preferences.

## Keywords

Action herd; Experimentation; Information aggregation; Informational cascade; Informational herding; Limit cascade; Markov process; Martingale; Observational learning; Social learning; Stochastic difference equation

## JEL Classifications

D8; D83

## Observational Learning

Suppose that an infinite number of individuals each must make an irreversible choice among finitely many actions – encumbered solely by uncertainty about the state of the world. If preferences are identical, there are no congestion effects or network externalities, and information is complete and symmetric, then all ideally wish to make the same decision.

*Observational learning* occurs specifically when the individuals must decide sequentially, all in some preordained order. Each may condition his decision both on his endowed private signal about the state of the world and on all his predecessors’ decisions, but *not* their hidden private signals. This article summarizes the general framework for the herding model that subsumes all signals, and establishes the correct conclusions. The framework is flexible – e.g., some individuals may be committed to an action, or individuals may have divergent preferences.

Banerjee (1992) and Bikhchandani et al. (1992) (hereafter, BHW) both introduced this framework. Ottaviani and Sørensen (2006) later noted that the same mechanism drives expert herding behaviour in the earlier model of Scharfstein and Stein (1990), after dropping their assumption that private signals are conditionally correlated. In BHW’s logic, *cascades* eventually start, in which individuals rationally ignore their private signals. Copycat action herds therefore arise *ipso facto*. Also, despite the surfeit of available information, a herd develops on an incorrect action with positive probability: after some point, everyone might just settle on the identical less profitable decision. This result sparked a welcome renaissance in informational economics. Observational learning explains correlation of human behaviour in environments without network externalities where one might otherwise expect greater independence. Various twists on the herding phenomenon have been applied in a host of settings from finance to organisational theory, and even lately into experimental and behavioural work.

In this article, we develop and flesh out the general theory of how Bayes-rational individuals sequentially learn from the actions of their predecessors, as developed in Smith and Sørensen (2000). Our logical structure is to deduce that almost sure *belief convergence* occurs, which in turn forces *action convergence*, or the action herds. Also, beliefs converge to a point mass on the correct state exactly when the private signal likelihood ratios are not uniformly bounded. For instance, incorrect herds arose in the original herding papers since they assumed finite multinomial signals. We hereby correct a claim by Bikhchandani et al. (2008), which unfortunately concludes, ‘In other words, in a continuous signals setting herds tend to form in which an individual follows the behaviour of his predecessor with high probability, even though this action is not necessarily correct. Thus, the welfare inefficiencies of the discrete cascades model are also present in continuous settings’.

Multinomial signals also violate a log-concavity condition, and for this reason yield the rather strong form of belief convergence that is a cascade. One recent lesson is the extent to which cascades are the exception rather than the rule.

## The Model

Assume a completely ordered sequence of individuals 1, 2,… Each faces an identical binary choice decision problem, choosing an action *a* ∈ {1, 2}. Individual *n*’s payoff *u*(*a*_{n}, *ω*) depends on the realisation of a state of the world, *ω* ∈ {*H*, *L*}, common across *n*. The high action pays more in the high state: *u*(1, *L*) > *u*(2, *L*) and *u*(1, *H*) < *u*(2, *H*). Individuals act as Bayesian expected utility maximisers, choosing action *a* = 2 above a threshold posterior belief \( \overline{r} \), and otherwise action *a* = 1. All share a common prior *q*_{0} = *P*(*ω* = *H*), and for simplicity, *q*_{0} = 1/2.

The decision-making here is partially informed. For exogenous reasons, each individual *n* privately observes the realisation of a noisy signal *σ*_{n}, whose distribution depends on the state *ω*. Conditional on *ω*, signals are independently and identically distributed. Observational learning is modelled via the assumption that individual *n* can observe the full history of actions *h*_{n} = (*a*_{1}, … , *a*_{n−1}). While predecessors’ private signals cannot be observed directly, they may be partially inferred. The interesting properties of observational learning follow because the private signals are filtered by coarse public action observations.

The private signal \( \sigma_n \), with no other information, yields an updated *private belief* \( p_n \in [0,1] \) in the state of the world \( \omega = H \). The private belief \( p_n \) is a sufficient statistic for the private signal \( \sigma_n \) in the *n*th individual’s decision problem. Its cumulative distribution \( F(p|\omega) \) in state \( \omega \) is a key primitive of the model. Define the unconditional cumulative distribution \( F(p) = [F(p|H) + F(p|L)]/2 \).

The theory is valid for arbitrary signal distributions, having a combination of discrete and continuous portions. But to simplify the exposition, we assume a continuous distribution with density \( f \). The state-conditional densities \( f(p|\omega) \) obey the Bayesian relation \( p = (1/2) f(p|H)/f(p) \) with \( f(p) = [f(p|H) + f(p|L)]/2 \), implying \( f(p|H) = 2p f(p) \) and \( f(p|L) = 2(1-p) f(p) \). The equality \( f(p|H)/f(p|L) = p/(1-p) \) can be usefully reinterpreted as a *no introspection condition*: understanding the model likelihood ratio of one’s private belief \( p \) does not allow any further inference about the state. This special ratio ordering implies that the conditional distributions share the same support, but that \( F(p|H) < F(p|L) \) for all private beliefs strictly inside the support (Fig. 1).

Private beliefs are said to be *bounded* if there exist *p*′, *p*″ ∈(0, 1) with *F*(*p*′) = 0 and *F*(*p*″) = 1, and *unbounded* if *F*(*p*) ∈ (0, 1) for all *p* ∈ (0, 1). For instance, a uniform density *f* (*p*) ≡ 1 results in the unbounded private belief distributions *F*(*p|H* ) = *p*^{2}< 2*p* – *p*^{2} = *F*(*p|L*). But if *f* (*p*) ≡ 3 on the support [1/3, 2/3], then the bounded private belief distributions are *F*(*p|H* ) = (3*p* – 1)(1 + 3*p*)/3 < (3*p* – 1)(5 – 3*p*)/3 =*F*(*p|L*).
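The two example distributions can be checked numerically. The following is a minimal sketch, not part of the original analysis: it integrates the state-conditional densities \( f(p|H) = 2pf(p) \) and \( f(p|L) = 2(1-p)f(p) \) by the midpoint rule and compares the result with the closed forms quoted above. The function names (`cdf_from_density`, `F_uniform`, `F_bounded`) are hypothetical labels chosen for illustration.

```python
def cdf_from_density(f, lo, hi, p, state, steps=2000):
    """Integrate the state-conditional density from lo up to p (midpoint rule).

    Uses f(x|H) = 2*x*f(x) and f(x|L) = 2*(1-x)*f(x), where f is the
    unconditional density of the private belief on the support [lo, hi].
    """
    p = min(max(p, lo), hi)
    width = (p - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        weight = 2 * x if state == "H" else 2 * (1 - x)
        total += weight * f(x) * width
    return total


def F_uniform(p, state):
    """Closed form for f(p) = 1 on [0, 1] (unbounded private beliefs)."""
    p = min(max(p, 0.0), 1.0)
    return p * p if state == "H" else 2 * p - p * p


def F_bounded(p, state):
    """Closed form for f(p) = 3 on [1/3, 2/3] (bounded private beliefs)."""
    p = min(max(p, 1 / 3), 2 / 3)
    if state == "H":
        return (3 * p - 1) * (1 + 3 * p) / 3
    return (3 * p - 1) * (5 - 3 * p) / 3


if __name__ == "__main__":
    for p in (0.35, 0.5, 0.6):
        numeric = cdf_from_density(lambda x: 3.0, 1 / 3, 2 / 3, p, "H")
        print(p, numeric, F_bounded(p, "H"), F_bounded(p, "L"))
```

In both examples the printed values respect the ordering \( F(p|H) < F(p|L) \) strictly inside the support.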

## Analysis via Stochastic Processes

The *public belief* \( q_n \) in state \( H \) is based on the observed history of the first \( n-1 \) actions alone. The associated *likelihood ratio* of state \( L \) to state \( H \) is then \( \ell_n = (1-q_n)/q_n \). And if so desired, we can recover public beliefs from the likelihood ratios using \( q_n = 1/(1+\ell_n) \). Incorporating the most recent private belief \( p_n \) yields the posterior belief \( r_n = p_n/(p_n + \ell_n(1-p_n)) \) in state \( H \). So indifference prevails at the *private belief threshold* \( \overline{p}(\ell) \) defined by

$$ \overline{r}=\frac{\overline{p}(\ell)}{\overline{p}(\ell)+\ell\left(1-\overline{p}(\ell)\right)}. \tag{1} $$

Individual *n* chooses action *a* = 1 for all private beliefs \( {p}_n\le \overline{p}\left(\ell \right) \), and otherwise picks *a* = 2. Since higher public beliefs (i.e., lower likelihood ratios) compensate for lower private beliefs in Bayes Rule, the threshold is monotone \( {\overline{p}}^{\prime}\left(\ell \right)>0 \).

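Action \( a = 1, 2 \) happens with chance \( \rho(a|\ell,\omega) \) in state \( \omega \in \{H, L\} \), where

$$ \rho(1|\ell,\omega)=F\left(\overline{p}(\ell)|\omega\right)=1-\rho(2|\ell,\omega). \tag{2} $$

When individual \( n \) takes action \( a_n \), the updated public likelihood ratio is

$$ \ell_{n+1}=\varphi(a_n,\ell_n)\equiv \ell_n\,\frac{\rho(a_n|\ell_n,L)}{\rho(a_n|\ell_n,H)}. \tag{3} $$

This defines a stochastic process \( (\ell_n) \) that is a *martingale*, conditional on state \( H \).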
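The martingale property can be verified numerically for a given belief distribution. The sketch below is illustrative only. It assumes the uniform example \( F(p|H) = p^2 \), \( F(p|L) = 2p - p^2 \), takes the action chance as \( F \) evaluated at the private belief threshold, and updates the likelihood ratio by Bayes’ rule as \( \ell \) times the ratio of the action chances in states \( L \) and \( H \). The names `threshold`, `rho`, `phi` and `expected_next_ell` are hypothetical.

```python
def F_uniform(p, state):
    """Private belief CDF for the uniform density f(p) = 1 on [0, 1]."""
    p = min(max(p, 0.0), 1.0)
    return p * p if state == "H" else 2 * p - p * p


def threshold(ell, r_bar):
    """Private belief p solving r_bar = p / (p + ell * (1 - p))."""
    return r_bar * ell / (1.0 - r_bar + r_bar * ell)


def rho(a, ell, state, F, r_bar):
    """Chance of action a (1 or 2) at likelihood ratio ell in the given state."""
    low = F(threshold(ell, r_bar), state)
    return low if a == 1 else 1.0 - low


def phi(a, ell, F, r_bar):
    """Updated likelihood ratio (state L to state H) after observing action a."""
    return ell * rho(a, ell, "L", F, r_bar) / rho(a, ell, "H", F, r_bar)


def expected_next_ell(ell, F, r_bar):
    """E[ell_{n+1} | ell_n = ell, H], summing over actions that can occur."""
    total = 0.0
    for a in (1, 2):
        chance = rho(a, ell, "H", F, r_bar)
        if chance > 0:
            total += chance * phi(a, ell, F, r_bar)
    return total


if __name__ == "__main__":
    r_bar = 0.5  # assumed threshold posterior, chosen only for illustration
    for ell in (0.2, 1.0, 3.5):
        print(ell, expected_next_ell(ell, F_uniform, r_bar))
```

The expectation returns the starting value because the state-\( H \) action chances cancel against the denominators of the update, leaving \( \ell \) times the sum of the state-\( L \) action chances, which is one.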

Since \( (\ell_n) \) is a non-negative martingale in state \( H \), the Martingale Convergence Theorem applies. Namely, \( (\ell_n) \) converges almost surely to the (random variable) limit \( \ell_\infty = \lim_{n\to\infty}\ell_n \), having (finite) values in \( [0,\infty) \). The support of \( \ell_\infty \) contains all candidate limit likelihood ratios. Among the most immediate of implications, *learning cannot result in a fully erroneous belief* \( \ell = \infty \) *with positive probability.* Just as well, this follows from Fatou’s Lemma in measure theory, for \( E[\liminf_{n\to\infty}\ell_n|H] \le \liminf_{n\to\infty}E[\ell_n|H] = \ell_0 \).

The pair \( (a_n,\ell_n) \) is also a *Markov process* on the domain \( \{1,2\}\times[0,\infty) \). For we can see that each new pair only depends on the last: given \( (a_n,\ell_n) \) and the state \( \omega \), the next likelihood ratio is \( \ell_{n+1}=\varphi(a_n,\ell_n) \), and the next action is \( a_{n+1}=m \) with chance \( \rho(m|\ell_{n+1},\omega) \).

Although the process \( (a_n,\ell_n) \) is clearly nonstandard, Smith and Sørensen (2000) prove the following version of the Markov stationarity condition: *If the transition functions* \( \rho \) *and* \( \varphi \) *are continuous in* \( \ell \), *then for any* \( \widehat{\ell} \) *in the support of* \( \ell_\infty \) *and for all m, we have either* \( \rho(m|\widehat{\ell},H)=0 \) *or* \( \varphi(m,\widehat{\ell})=\widehat{\ell} \). In other words, either an action does not occur, or it yields no new information, or both.

The stationary points of the (*a*_{n}, ℓ_{n}) process are therefore the *cascade sets*: those sets of likelihood ratios ℓ, indexed by actions *m*, that almost surely repeat action *m*, namely \( {\overline{J}}_m=\left\{\ell |\rho \left(m|\ell, H\right)=1\right\} \). With *bounded private beliefs*, there must exist some high (low) enough likelihood ratios ℓ that keep the posterior belief below (above) the threshold \( \overline{r} \) for every private belief in the support. In this case, the cascade sets \( {\overline{J}}_1,{\overline{J}}_2 \) for the two actions are both non-empty. When private beliefs are unbounded, the cascade sets collapse to the extreme points, \( {\overline{J}}_1 \) = {∞} and \( {\overline{J}}_2 \) = {0}. And since we have seen that ℓ = ∞ cannot arise with positive probability in state *H*, we must converge to a point mass on the truth, that is, to ℓ = 0.

Next, we claim that convergence of beliefs implies convergence of actions. Whenever someone optimally chooses action *m*, any successor must optimally follow suit if he bases his decision just on public information. Individual *n* – 1 solves the same decision problem as *n* faces, but with more information, (*a*_{1},…, *a*_{n−2}) and σ_{n−1}. Contrary actions completely ‘overturn’ the weight of the entire action history, however long. By this *Overturning Principle*, an infinite subsequence of contrary actions precludes belief convergence. By the Martingale Convergence Theorem, this almost surely cannot happen. By the last paragraph, we conclude that *with unbounded private beliefs, a correct herd eventually arises.*

## When Only Correct Herds Arise

Consider an illustrative example, with individuals deciding whether to ‘invest’ in or ‘decline’ an investment project of uncertain value. Investing (action 2) is risky, paying *u* > 1 in state *H* and −1 in state *L*; declining (action 1) is a neutral action with zero payoff in both states. Indifference prevails at the posterior belief \( \overline{r} \) = 1/(1 + *u*). Then Eq. 1 yields the private belief threshold \( \overline{p} \)(ℓ) = ℓ/(*u* + ℓ).

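With the unbounded private beliefs of the uniform example, \( F(p|H)=p^2 \) and \( F(p|L)=2p-p^2 \), the laws of motion yield the transition chances \( \rho(1|\ell,H)=\ell^2/(u+\ell)^2 \) and \( \rho(1|\ell,L)=\ell(\ell+2u)/(u+\ell)^2 \), and the continuations \( \varphi(1,\ell)=\ell+2u \) and \( \varphi(2,\ell)=u\ell/(u+2\ell) \). In state \( H \), the stationarity condition leaves \( \ell=0 \) as the only candidate limit, since action 1 occurs with positive chance and raises the likelihood ratio whenever \( \ell>0 \). The limit \( \ell_\infty \) is thus concentrated on 0, the truth.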

Whenever action 1 is taken, the new likelihood ratio is at least 2*u*. Since the likelihood ratio converges to 0 in state *H*, this can only happen finitely many times.^{1} So belief convergence implies action convergence, namely, a herd. This example precisely illustrates the logic for one main result: a herd arises despite the fact that a cascade never does, since at each and every stage, a contrary action is possible. Since convergence occurs towards the cascade set but forever lies outside, this is called a *limit cascade*.
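A short simulation can illustrate the limit cascade in the unbounded example. This is a sketch under stated assumptions rather than part of the original argument: the true state is *H*, private beliefs have the uniform-example distribution \( F(p|H)=p^2 \) (so a draw is the square root of a uniform variate), the threshold is \( \ell/(u+\ell) \), and the updates after actions 1 and 2 are \( \ell+2u \) and \( u\ell/(u+2\ell) \), which follow from Bayes’ rule for this distribution. The function name, the payoff `u = 2.0` and the horizon are hypothetical choices.

```python
import random


def simulate_unbounded(u=2.0, ell0=1.0, periods=1000, rng=None):
    """Simulate the public likelihood ratio in state H with unbounded beliefs.

    Returns the final likelihood ratio and the last period in which someone
    declined (action 1), i.e. the last contrarian before the herd on action 2.
    """
    rng = rng or random.Random(0)
    ell = ell0
    last_contrarian = 0
    for n in range(1, periods + 1):
        p = rng.random() ** 0.5          # inverse-CDF draw from F(p|H) = p^2
        if p <= ell / (u + ell):         # posterior below threshold: decline
            ell = ell + 2 * u            # action 1 raises ell by 2u
            last_contrarian = n
        else:                            # invest
            ell = u * ell / (u + 2 * ell)
    return ell, last_contrarian


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        final_ell, last = simulate_unbounded(rng=rng)
        print(f"final ell = {final_ell:.5f}, last contrarian at period {last}")
```

Along a run of investments, \( 1/\ell \) grows by \( 2/u \) each period, so the chance of a contrarian falls roughly like \( 1/n^2 \). This is the summability that the footnote appeals to.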

## When Incorrect Herds Must Sometimes Arise

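Now suppose instead that private beliefs are bounded, with the density \( f(p)\equiv 3 \) on the support \( [1/3,2/3] \). With the threshold \( \overline{p}(\ell)=\ell/(u+\ell) \), the laws of motion (2)–(3) yield transitions

$$ \varphi(1,\ell)=\frac{\ell\left(5u+2\ell\right)}{u+4\ell}\quad \text{and}\quad \varphi(2,\ell)=\frac{\ell\left(4u+\ell\right)}{2u+5\ell} $$

for \( \ell\in(u/2,2u) \). As seen in Fig. 2 (left panel), a cascade can never start after the first individual decides. But since the likelihood ratio must converge, a limit cascade starts, towards one of the cascade sets \( \overline{J}_1 \) or \( \overline{J}_2 \). A herd on the corresponding action must then start eventually, lest beliefs fail to converge.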

We now explore the easy logic for why *an incorrect herd occurs with strictly positive probability given bounded beliefs.* Again, we appeal to a big gun from measure theory. For if we start at some public likelihood ratio ℓ_{0} ∈(*u*/2, 2*u*), then by Fig. 2, dynamics are trapped in (*u*/2, 2*u*). Since 0 ≤ ℓ_{n} ≤ 2*u*, Lebesgue’s Dominated Convergence Theorem allows us to swap the expectation and limit operations, and thus conclude that *E*[ℓ_{∞} | *H* ] = lim_{n→∞}*E*[ℓ_{n} | *H* ] = ℓ_{0}. Write ℓ_{0} = π(*u*/2) + (1 – π)(2*u*), where 0 < π < 1 whenever *u*/2 <ℓ_{0}< 2*u*. Then the random variable ℓ_{∞} places weight π on *u*/2 and weight 1 – π on 2*u*. So in state *H*, a herd arises with chance π on action 2, and with chance 1 – π on action 1.
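The chance of an incorrect herd can be estimated by Monte Carlo. The sketch below is illustrative and rests on assumptions not stated in this form in the text: the true state is *H*, private beliefs have the bounded distribution \( F(p|H)=(3p-1)(1+3p)/3 \) on \( [1/3,2/3] \), and the continuations are \( \ell(5u+2\ell)/(u+4\ell) \) after action 1 and \( \ell(4u+\ell)/(2u+5\ell) \) after action 2, as obtained by applying Bayes’ rule to that distribution. With the hypothetical values \( u=1 \) and \( \ell_0=1 \), the weights solve \( 1=\pi/2+2(1-\pi) \), so the incorrect herd on action 1 should occur with chance \( 1-\pi=1/3 \).

```python
import random


def simulate_bounded(u=1.0, ell0=1.0, periods=200, rng=None):
    """Simulate the likelihood ratio in state H with bounded private beliefs."""
    rng = rng or random.Random(0)
    ell = ell0
    for _ in range(periods):
        # Inverse-CDF draw from F(p|H) = 3p^2 - 1/3 on [1/3, 2/3].
        p = ((rng.random() + 1 / 3) / 3) ** 0.5
        if p <= ell / (u + ell):                          # decline (action 1)
            ell = ell * (5 * u + 2 * ell) / (u + 4 * ell)
        else:                                             # invest (action 2)
            ell = ell * (4 * u + ell) / (2 * u + 5 * ell)
    return ell


def wrong_herd_share(runs=3000, u=1.0, ell0=1.0, seed=0):
    """Share of runs ending near 2u (herd on action 1), and the mean final ell."""
    rng = random.Random(seed)
    midpoint = 1.25 * u                                   # halfway between u/2 and 2u
    finals = [simulate_bounded(u, ell0, rng=rng) for _ in range(runs)]
    share = sum(1 for ell in finals if ell > midpoint) / runs
    return share, sum(finals) / runs


if __name__ == "__main__":
    share, mean_ell = wrong_herd_share()
    print(f"incorrect herd share: {share:.3f}, mean final ell: {mean_ell:.3f}")
```

The mean of the final likelihood ratio stays close to \( \ell_0 \), as the martingale property requires, which is what pins down the weights on the two limit points.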

## Herds Without Cascades

In the preceding example, \( \ell_n\in(u/2,2u) \) for all \( n \), provided that initially \( \ell_0\in(u/2,2u) \). So while the analysis in BHW explicitly depended on cascades ending the dynamics in finite time, a somewhat subtler dynamic story emerges here: *Herds must arise even though a contrarian has positive probability at every stage.*

This no-cascades result is robust to changes in both the signal distribution and payoffs, for it arises whenever the continuation functions φ(1,ℓ), φ(2,ℓ) are monotone increasing in ℓ. Monotonicity asserts the seemingly plausible condition that a higher prior public belief implies a higher posterior public belief after every action. Yet, despite how intuitive this property may seem, it is violated by any multinomial signal distribution (loosely, because it is ‘lumpy’).

We have shown in Smith and Sørensen (2008) that the continuation functions are monotone under an easily verifiable regularity condition – namely, that the unconditional density of the log-likelihood ratio log(*p*/(1 – *p*)) be log-concave. Most popular continuous distributions satisfy this condition, for instance, the Gaussian, uniform or generalised exponential. But the analysis in BHW and a vast number of successor papers was based on the multinomial family – namely, the one main signal family for which the regularity condition *fails*. This discussion hereby corrects the claim by Bikhchandani et al. (2008), that ‘In some continuous signal settings cascades do not form (Smith and Sørensen 2000)’. On the contrary, one really must view cascades as the informationally rare outcome, a case where a tractable example class proved misleading. The true touchstone of this literature is simply the observed phenomenon of action herding.

## Cascades with Smooth Signals

To fully flesh out this picture, we offer an example of a continuous signal distribution that violates monotonicity. (This example is based on one included in the original working paper version of Smith and Sørensen (2000), found in Sørensen (1996).) To this end, we construct a sufficiently heroic violation of our log-concavity condition. Suppose that private beliefs *p* have a quadratic density *f* (*p*) = 324(*p* – 1/2)^{2} over the bounded support [1/3, 2/3]. Then the conditional private belief densities are *f* (*p|H* ) = 2*pf* (*p*) and *f* (*p|L*) = 2(1 –*p*)*f* (*p*), as depicted in the right panel of Fig. 1. Integration yields the (suppressed) polynomial expressions for *F*(*p|L*), *F*(*p|H*).

For \( \ell\in(u/2,2u) \), the likelihood ratio transitions then follow from the laws of motion, and are depicted in the left panel of Fig. 3.
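The transitions for this quadratic density can be computed directly. The sketch below is our own illustration, not a reproduction of the suppressed expressions. Writing \( x=p-1/2 \), integrating \( 2pf(p) \) and \( 2(1-p)f(p) \) from \( 1/3 \) gives \( F(p|H)=162x^4+108x^3+3/8 \) and \( F(p|L)=-162x^4+108x^3+5/8 \). The code assumes the threshold \( \ell/(u+\ell) \) from the investment example, with the hypothetical value \( u=1 \).

```python
def F_quad(p, state):
    """Private belief CDF for f(p) = 324 (p - 1/2)^2 on [1/3, 2/3]."""
    p = min(max(p, 1 / 3), 2 / 3)
    x = p - 0.5
    if state == "H":
        return 162 * x**4 + 108 * x**3 + 3 / 8
    return -162 * x**4 + 108 * x**3 + 5 / 8


def phi_quad(a, ell, u=1.0):
    """Likelihood ratio after action a, for ell strictly inside (u/2, 2u)."""
    p_bar = ell / (u + ell)                    # private belief threshold
    low_H, low_L = F_quad(p_bar, "H"), F_quad(p_bar, "L")
    if a == 1:
        return ell * low_L / low_H             # decline
    return ell * (1 - low_L) / (1 - low_H)     # invest


if __name__ == "__main__":
    u = 1.0
    for ell in (0.6, 2 / 3, 1.0, 1.5, 1.9):
        print(f"ell={ell:.3f}  phi(1)={phi_quad(1, ell, u):.4f}  "
              f"phi(2)={phi_quad(2, ell, u):.4f}")
```

Under these assumptions, the continuation after action 1 exceeds \( 2u \) at \( \ell=1.5u \) but is lower at \( \ell=1.9u \). So it is not monotone, and a single action can carry the likelihood ratio from outside a cascade set to inside it.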

## A More General Observational Learning Framework

The Overturning Principle may not sound very realistic, *a priori*. Should we expect that a single deviator from an action herd of one million individuals can, entirely by himself, change the course of subsequent play? Is the excessive reliance on the assumption of common knowledge of rationality implicit in the overturning principle reasonable? Experimental results on the informational herding model, e.g., Çelen and Kariv (2004), have cast doubt on this. (The review by Anderson and Holt (2008) speaks more broadly to such experimental evidence.)

It turns out that our reduction of the model to a stochastic difference equation in the likelihood ratio obeying a martingale property is robust to a wide array of economically inspired modifications that can accommodate deviations from the overturning principle. For instance, suppose that a fraction of ‘crazy’ individuals randomly choose actions. Figure 3 depicts the modified continuation functions in the right panel, for a case where 10% of individuals are committed to action 1 and 10% are committed to action 2. The remaining population is rational. Since all actions occur with a non-vanishing frequency, none can have drastic effects. Yet the limit beliefs are unaffected by the noise, contrary actions being deemed irrational (and ignored) inside the cascade sets. Of course, the failure of the overturning principle invalidates the argument that limit cascades force herds. But because actions are still informative of beliefs, social learning is productive.
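The effect of committed types on the continuation functions can be sketched as follows. This is an illustration under assumptions that go beyond the text: the rational individuals have the bounded private beliefs with density 3 on \( [1/3,2/3] \) and threshold \( \ell/(u+\ell) \), a share `k1` always takes action 1 and a share `k2` always takes action 2, and observers cannot tell the types apart. The distribution behind Fig. 3 is not stated here, so the numbers need not match that figure. All function and parameter names are hypothetical.

```python
def F_bounded(p, state):
    """Private belief CDF for the density f(p) = 3 on [1/3, 2/3]."""
    p = min(max(p, 1 / 3), 2 / 3)
    if state == "H":
        return (3 * p - 1) * (1 + 3 * p) / 3
    return (3 * p - 1) * (5 - 3 * p) / 3


def action_chance(a, ell, state, u=1.0, k1=0.1, k2=0.1):
    """Chance of action a when shares k1, k2 are committed to actions 1, 2."""
    low = F_bounded(ell / (u + ell), state)    # rational chance of action 1
    rational = low if a == 1 else 1.0 - low
    committed = k1 if a == 1 else k2
    return committed + (1.0 - k1 - k2) * rational


def phi_noisy(a, ell, u=1.0, k1=0.1, k2=0.1):
    """Updated likelihood ratio after action a in the presence of noise."""
    ratio = action_chance(a, ell, "L", u, k1, k2) / action_chance(a, ell, "H", u, k1, k2)
    return ell * ratio


if __name__ == "__main__":
    for ell in (0.4, 1.0, 1.6, 2.5):
        print(ell, phi_noisy(1, ell), phi_noisy(2, ell))
```

Inside a cascade set the rational chance of the contrary action is zero in both states, so a contrary action is attributed to a committed type and leaves the likelihood ratio unchanged. Outside the cascade sets, each action moves beliefs less than it would without noise.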

We show more strongly in Smith and Sørensen (2000) that herds nonetheless do arise among all rational (non-crazy) individuals, when beliefs are bounded and have non-zero density near the bounds. Essentially, the public likelihood ratios (ℓ_{n}) converge so fast that the chance of an infinite string of rational contrarians is zero. (Of course, an outside observer of the action history would hardly be able to detect infrequent rational non-herders, should they occur.)

Alternatively, we may relax the assumption that all individuals solve the same decision problem. Individuals may well have different rational preference types. First, if ordinal preferences are aligned, so that everyone takes action 2 for stronger beliefs in state *H*, then the limit likelihood ratio ℓ_{∞} is focused on the intersection of their respective cascade sets.

Suppose instead that the ordinal preferences differ for some pair of types. Then there arises the possibility of a *confounded learning point*. This is a non-cascade likelihood ratio ℓ^{*} such that if ℓ_{n} = ℓ^{*}, then the action *a*_{n} of individual *n* is non-informative to his successors – the probabilities satisfy *ρ*(1|ℓ^{*}, *H*) = *ρ*(1|ℓ^{*}, *L*). In this case, ℓ_{n+1} = ℓ_{n} following either action of individual *n*. If such a confounding outcome ℓ^{*} exists, then it is *locally stochastically stable*: there is positive probability that ℓ_{∞} = ℓ^{*} provided some ℓ_{n} is ever sufficiently close to ℓ^{*}.

## Conclusion

This article has explored a modelling framework for analysing imitation of observed behaviour. The model is quite tractable. Public beliefs based on the ever-lengthening action history must converge to a limit, which is among the fixed points of a stochastic difference equation. As long as all ordinal preferences coincide, we eventually settle on an action herd, even though beliefs might never settle down. When private signals sufficiently violate a log-concavity condition, a cascade can arise.

Lee (1993) noted that beliefs can be perfectly revealed when the action space is continuous, just like the belief space. The social learning paradigm instead by and large explores when a coarse action set communicates the private beliefs of decision makers. This coarseness may sufficiently frustrate the learning dynamics that an incorrect action herd occurs. If individuals seek to help each other by taking more informative actions, and if this signalling is understood by successors, then any cascade sets shrink, and the welfare of later individuals generally rises. As we show in Smith and Sørensen (2008), the analysis is qualitatively similar to that outlined here, although solving for the new, forward-looking transition chances requires dynamic programming.

A greater message of social learning is the self-defeating nature of learning from others. Moving outside the finite action, sequential entry model into a Gaussian world, Vives (1993) found that social learning is slower than private learning in a market setting where individual decisions are obscured by Gaussian noise.

If observations are not made of an ever-expanding history, such as simply knowing the number but not order of past action choices, then our approach is less useful. The survey by Gale and Kariv (2008) discusses the problem of learning in networks. In Smith and Sørensen (1994), and Chapter 3 of Sørensen (1996), we identified a case where the stochastic difference equation is a useful tool, even when public beliefs do not follow a martingale.

## Footnotes

1. Still, it helps to introspect on exactly *why* no such contrarian can arise. Let the chance that the *k*th individual breaks the herd be \( p_k \), given the state. If these chances vanish fast enough that they are summable, then their tail sum can be made as small as desired. Then by conditional independence, the chance that no one among 1, 2, … , *k* breaks the herd is positive: $$ \left(1-{p}_1\right)\cdots \left(1-{p}_k\right)>1-{p}_1-{p}_2-\cdots -{p}_k>0. $$

## Bibliography

- Anderson, L., and C.A. Holt. 2008. Information cascade experiments. In *The New Palgrave dictionary of economics*, 2nd ed., ed. S.N. Durlauf and L.E. Blume. New York: Palgrave Macmillan.
- Banerjee, A.V. 1992. A simple model of herd behavior. *Quarterly Journal of Economics* 107: 797–817.
- Bikhchandani, S., D. Hirshleifer, and I. Welch. 1992. A theory of fads, fashion, custom, and cultural change as information cascades. *Journal of Political Economy* 100: 992–1026.
- Bikhchandani, S., D. Hirshleifer, and I. Welch. 2008. Information cascades. In *The New Palgrave dictionary of economics*, 2nd ed., ed. S.N. Durlauf and L.E. Blume. New York: Palgrave Macmillan.
- Çelen, B., and S. Kariv. 2004. Distinguishing informational cascades from herd behavior in the laboratory. *American Economic Review* 94: 484–498.
- Gale, D., and S. Kariv. 2008. Learning and information aggregation in networks. In *The New Palgrave dictionary of economics*, 2nd ed., ed. S.N. Durlauf and L.E. Blume. New York: Palgrave Macmillan.
- Lee, I.H. 1993. On the convergence of informational cascades. *Journal of Economic Theory* 61: 395–411.
- Ottaviani, M., and P.N. Sørensen. 2006. Professional advice. *Journal of Economic Theory* 126: 120–142.
- Scharfstein, D.S., and J.C. Stein. 1990. Herd behavior and investment. *American Economic Review* 80: 465–479.
- Smith, L., and P. Sørensen. 1994. An example of non-martingale learning. MIT Working Paper.
- Smith, L., and P. Sørensen. 2000. Pathological outcomes of observational learning. *Econometrica* 68: 371–398.
- Smith, L., and P.N. Sørensen. 2008. Informational herding and optimal experimentation. University of Copenhagen Working Paper.
- Sørensen, P. 1996. Rational social learning. PhD thesis, MIT.
- Vives, X. 1993. How fast do rational agents learn? *Review of Economic Studies* 60: 329–347.