1 Introduction

The fact that probabilistic inference methods break down in situations of complexity is well known and has been discussed extensively in the literature (e.g., Weaver, 1948; von Hayek, 1967; Zadeh, 1962, 1969; Malik, 1996). However, these discussions are rather anecdotal (e.g., von Hayek, 1967; Churchman, 1961) or are often interpreted as if the central cause of this inability were a lack of sufficient data and/or computational power (Seising, 2012; Weaver himself pinned his hope on the power of digital computers to solve problems of complexity). Indeed, the literature makes it seem as though a perfect computer exploiting big data could transcend those all-too-human limitations (Zin et al., 2019). In this paper, we argue that the problem is far deeper than a mere lack of power or a paucity of data. Rather, we argue that it stems from a widely discussed philosophical conundrum: the problem of induction. Specifically, we hope both to demonstrate that the failure of probabilistic inference methods in complex situations is a particularly interesting case of this philosophical problem and to argue that this problem belies the hope that more raw computational power and richer or more precise data can save us from states of deep uncertainty, which are particularly present in situations of high complexity. This, we believe, is necessary stage-setting so that the problem of complexity in banking and elsewhere can be thought about properly.

To begin: in philosophy, the problem of induction has been widely debated since at least Hume. Roughly, the problem is that there is no principled philosophical justification that allows one to infer from the fact that x has always y-ed (e.g., the sun has always risen) to the claim that x will y at a future point (e.g., the sun will rise tomorrow). Many potential solutions have been proposed, including: a radical rethinking of the nature of causality so that it is grounded in the inherent features of the objects in question (e.g., Cartwright, 2013); a reliance on nomic regularities that link the initial event and the outcome together (e.g., Davidson, 1967); a pragmatic shrug-of-the-shoulders (Hume's own reply); and so on. However, much of this discussion in the philosophical literature has focused on what should be paradigmatic examples of causality. By contrast, our focus is on the use of probabilistic methods that attempt to "tame chance" (Hacking, 2006). Specifically, our paper follows several theorists who have argued that probabilistic reasoning is neither useful nor successful in the realm of so-called organized complexity (Weaver, 1948). However, it has been shown that these thinkers' discussions are often afflicted with vagueness (e.g., Gershenson, 2008; Byrne & Callaghan, 2014; Hoffmann, 2017). Ergo, the central aim of our paper is to supplement and enrich these discussions by (a) explicitly deducing the limits of probability theory from philosophical considerations and (b) verifying that this deduction holds for the measurement of complex risks in financial systems. Elaborating on financial systems makes sense for three main reasons:

(1) A paradigm from a complexity lens: Modern financial systems are more interconnected and complex than many other (economic) systems (Bookstaber et al., 2015: 152; Acharya & Yorulmazer, 2008; Allen & Gale, 2000).

(2) Special status: Whereas risk management in organizations in other industries usually focuses on the negative—threats and failures rather than opportunities and successes (Kaplan & Mikes, 2012: 50)—it plays a decisively different, namely a broader, more central, and more sophisticated, role in financial institutions: in the financial universe, risk and return are two sides of the same coin. Absorbing, repackaging, and passing on risks to others (with different levels of intensity, to be sure) is the core of their business model (Bessis, 2010: 38). Accordingly, the effective management of risks is central to a financial institution's performance (Saunders & Cornett, 2010: 182).

(3) Alleged expertise: Financial institutions and intermediaries have been regarded as the world's processors of risk par excellence and as leaders in the use of risk assessment and management techniques (Pergler & Freeman, 2008: 13; Mehta et al., 2012). Yet numerous clear examples of financial risks that could not have been prevented due to the problem of induction, such as the financial and economic crisis that erupted in 2008 or the failure of LTCM (Lowenstein, 2002), show that these same institutions have also been the worst hit by such eruptions (or simply by reality). This highlights shortcomings in their methodologies and suggests that too much emphasis has been placed on the deceptively precise outputs of statistical measures and probabilistic models—"the hubris of spurious precision" (Rebonato, 2007: xi). Financial institutions have not managed to act as the specialists in risk measurement and management they are supposed to be (Saunders & Cornett, 2010: 18).

Should our argument prove correct, it will have severe implications for business practice, since probability theory would then lose its claim to be capable of guiding the practical management of complex risks in banking and other financial institutions.

In Sect. 2, we delineate the philosophical problem of induction and briefly review its development in a systematic (not historical) manner. We do this because "[n]owhere is the problem of induction more relevant than in the world of trading [in the branch of financial economics, more generally; C.H.]—and nowhere has it been as ignored" (Taleb, 2007: 116; cf. also Spitznagel, 2013: 98, 178, 243; von Mises, 1949/1998: 31).Footnote 1 In Sect. 3, we introduce an axiomatic and formal version of probability theory. In Sect. 4, we discuss attempts to apply probability to complex systems as well as common criticisms of them. In Sect. 5, we discuss more sophisticated models that are thought to avoid the application problems voiced in Sect. 4. We argue that these, too, fall short. In Sect. 6, we develop our central argument at length. Specifically, we reason that the real source of the breakdown of probability in the face of complexity emerges as a conceptual correlate of the problem of induction. Indeed, we find that in situations of organized and dynamic complexity, especially, Hume et al.'s seemingly 'academic' worries take on a concrete reality. Moreover, we also demonstrate that this way of thinking about the failure of probability is far deeper and more articulate than worries about data or a lack of computational power. Section 7 closes the paper by linking this conceptual critique with objections to the use of probability theory in risk management in banking. In addition, we include an appendix that, on the one hand, clarifies the sorts of systems one might be interested in as well as how these relate to complexity, organization, and predictable behavior. On the other hand, it details the characteristics of financial systems and banking in terms of complexity.

2 Philosophical Roots of the Problem of Induction: Some Preliminaries

Arguably, David Hume's greatest single contribution to the philosophy of science has been the so-called problem of induction (1739-40/1888).Footnote 2 At a first pass, and at a minimum, induction concerns inferences whose conclusions are not validly entailed by the premises (Hájek & Hall, 2002: 149). However, notice that this minimalist definition would imply that all non-deductive inferences—from random guesswork to careful empirical studies—are instances of the same reasoning pattern. Ergo, this net is cast far too wide. To home in on the form of problematic induction that interests us, let us briefly consider some more specific versions of the problem.

To begin from an intuitive place, it is clear that one form of induction is what is referred to as enumerative induction or universal inference. This is, basically, an inference from particular instances:

\(a_1, a_2, \ldots, a_n\) are all Fs (a certain predicate) that are also G (another predicate),


to a general law or principle

All Fs are G.

Notice that this form of enumerative induction is problematic, as it faces many counterexamples. For example, prior to 1697 AD, a common example of induction ran as follows: I observe a swan around here (i.e., in Europe) and it is white; I observe another swan around here and it is white; etc. I then conclude "All swans have the property of whiteness." However, Australian (black) swans clearly refute this inference. Given this problem with induction, a new problem of induction emerges (this has some overlap with Goodman (1955) and his discussion of the new riddle of induction). This new problem of induction is narrow in that it does not focus on induction as such. Rather, it begins by assuming that induction is already in good working order. Given this assumption, the new problem is finding some criterion that distinguishes a good inductive inference (e.g., copper conducts electricity) from a flawed one (e.g., all swans are white).Footnote 3

Though this new problem of induction is certainly interesting, it is not our focus, for two reasons. First, the new problem simply assumes that induction is in order. Pace this, part of what we show in this paper is that in domains of organized complexity, it is most certainly not in good working order. Second, by presupposing induction, the new riddle of induction avoids further specification of what constitutes a specifically inductive inference. In turn, this, again, lends itself to the thought that inductive inferences are simply all and only non-deductive inferences (as Rudolf Carnap suggested).

As suggested, we find this dichotomy insufficient; let us therefore attempt to characterize inductive inferences further. Fortunately, though inductive inferences are not easily characterized, we do have lucid indicators of what marks off an inference as an instance of induction. Specifically, inductive inferences are (a) "contingent, deductive inferences are necessary" (Vickers, 2014)Footnote 4; and induction is (b) ampliative, i.e., it "can amplify and generalize our experience, broaden and deepen our empirical knowledge", whereas deduction is explicative, i.e., it "orders and rearranges our knowledge without adding to its content" (ibid.). Though these features fall well short of a proper definition of induction, they will suffice to guide our analysis.

Accordingly, the original, or old, problem of induction can now be explicated a bit. To wit, the problem of induction concerns "the support or justification of inductive methods" (Vickers, 2014)—i.e., methods that predict or infer, from past experiences, features of the future. Thus, and in Hume's words, the problem of induction concerns what justifies our assumption that "instances of which we have had no experience resemble those of which we have had experience" (Hume, 1739–40/1888: 89), i.e., that nature continues always uniformly the same. In turn, this demand for justification gives rise to a deep dilemma. On the one hand, the claim that nature is uniform, the Uniformity Principle (UP), "cannot be proved deductively, for it is contingent, and only necessary truths can be proved deductively" (Vickers, 2014). On the other, uniformity cannot "be supported inductively—by arguing that it has always or usually been reliable in the past—for that would beg the question" (ibid.). In sum, for this paper, the old problem of induction concerns the lack of justification for assuming uniformity in the behavior of nature. As it were, we have no principled reason to assume the future will be like the past. More formally, Hume's argument can be reconstructed as follows (following Henderson, 2018, where premises are labeled as P, and subconclusions and conclusions as C):

P1. There are only two kinds of arguments: demonstrative and probable (Hume’s fork).

P2. Inductive inferences presuppose the UP.


1st horn:

P3. A demonstrative argument establishes a conclusion whose negation is a contradiction.

P4. The negation of the UP is not a contradiction.

C1. There is no demonstrative argument for the UP (by P3 and P4).


2nd horn:

P5. Any probable argument for UP presupposes UP.

P6. An argument for a principle may not presuppose the same principle (Non-circularity).

C2. There is no probable argument for the UP (by P5 and P6).

C3. There is no argument for the UP (by P1, C1 and C2).


Consequences:

P7. If there is no argument for the UP, there is no chain of reasoning from the premises to the conclusion of any inference that presupposes the UP.

C4. There is no chain of reasoning from the premises to the conclusion of inductive inferences (by P2, C3 and P7).

P8. If there is no chain of reasoning from the premises to the conclusion of inductive inferences, the inferences are not justified.

C5. Inductive inferences are not justified (by C4 and P8).

Many different replies to the different problems of induction have been brought forward in the literature, and it would be beyond the scope of the present study to restate and evaluate them. Two concise comments shall suffice at this point. On the one hand, there is an extended debate revolving around probability and induction, a point we return to briefly in the next paragraph.Footnote 5 On the other hand, one of the most influential and controversial views on the problem of induction has been that of Karl Popper, who argued that induction has no place in the logic of science (Popper, 1959).Footnote 6

In sum, we focus on the justificatory aspect of the old problem of induction. In other words, we focus on what grounds or reasons can be adduced to warrant the inference from "the x's have always y-ed" to "the x's will y tomorrow." Notice, to presage our discussion a bit, that one very likely candidate for doing this justificatory work seems to be probability: e.g., (early) Peirce (1878/1992: 155–169) or, later on, Reichenbach (1949) push in exactly this direction. Indeed, it seems as if the analytic status and precision of probability can offer a powerful justification for inductive inferences. As it were, I can calculate that, given enough time, the relative frequency of even rolls of a fair, six-sided die will converge to ½. And I am justified in this inference exactly because of the mathematical structure of probability theory itself.
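
To make the convergence claim concrete, the following minimal simulation (our own illustration, not drawn from the cited literature) tracks the relative frequency of even outcomes over repeated rolls of a fair six-sided die; the frequency settles near ½, the value the probability calculus assigns in advance.

```python
import random

def even_frequency(n_rolls: int, seed: int = 42) -> float:
    """Relative frequency of even outcomes in n_rolls of a simulated fair six-sided die."""
    rng = random.Random(seed)
    evens = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) % 2 == 0)
    return evens / n_rolls

for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} rolls: frequency of even outcomes = {even_frequency(n):.4f}")
# The printed frequencies approach 0.5, the value dictated by the
# mathematical structure of the probability model itself.
```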

3 Probability Theory in a Nutshell

The Latin root of probability is a combination of probare, which means to test, to prove, or to approve, and the suffix -ilis, which means able to be: something like "worthy of approbation" (Bernstein, 1996: 48). Probability has always carried this double meaning, one looking into the future, the other interpreting the past; one concerned with our opinions and beliefs (the subjective interpretation of probabilities), the other concerned with what we actually know or with frequencies in the real world (the objective concept) (Hájek, 2019; Hacking, 2006: Chpt. 2; Jaynes, 2003: 39). But even though the interpretation of probability is an important foundational problem (ibid.) and even though the inadequacy of frequency interpretations of probability for many situations in risk management deserves appreciation (Rebonato, 2007),Footnote 7 it plays no major role in motivating and developing the central argument given in Sect. 6. Nor is probability theory in some strict sense the object of our criticism. This is exactly because we focus on what could be termed the pure logical syntax of probability theory, without any interpretation per se. In other words, what we refer to as "probability theory" is the branch of mathematics and the standard formalism concerned with probability as axiomatized by Kolmogorov (1933) (cf. Schneider, 1988: Chapter 8). To be sure, Kolmogorov's axiomatization, which we present shortly, has achieved the status of orthodoxy for discourses on risk (McNeil et al., 2005: 1f.), and it is typically what economists, risk experts, and others, including the present authors, have in mind when they think of "probability theory".Footnote 8

Following Kolmogorov (1933) and the outline by Hájek (2019), let Ω be a non-empty set (‘the universal set’). A field (or algebra) on Ω is a set F of subsets of Ω that has Ω as a member, and that is closed under complementation (with respect to Ω) and union. Let P be a function from F to the real numbers obeying:

1. (Non-negativity) P(A) ≥ 0, for all A ∈ F.

2. (Normalization) P(Ω) = 1.

3. (Finite additivity) P(A ∪ B) = P(A) + P(B) for all A, B ∈ F such that A ∩ B = ∅.

Call P a probability function, the bearers of probabilities "events", "sentences (of a formal language)", or "outcomes", and (Ω, F, P) a probability space. The assumption that P is defined on a field guarantees that these axioms are non-vacuously instantiated, as are the various theorems following from them (ibid.).

In the words of Hájek (2019), let us now strengthen our closure assumptions regarding F, requiring it to be closed under complementation and countable union; it is then called a sigma field (or \(\sigma \)-algebra) on Ω:Footnote 9

3′. (Countable additivity) If \(A_1, A_2, A_3, \ldots\) is a countably infinite sequence of (pairwise) disjoint sets, each of which is an element of F, then

$$P \left(\bigcup_{n=1}^{\infty }{A}_{n}\right) = \sum_{n=1}^{\infty }P \left({A}_{n}\right)$$

The non-negativity and normalization axioms establish, by and large, a convention for measurement, although it is non-trivial that probability functions take at least the two values 0 and 1, and that they have a maximal value (unlike various other measures, such as length, volume, and so on, which are unbounded). It is now easy to see that the theory applies to various familiar cases. For example, we may represent the results of throwing a single fair die once by the set Ω = {1, 2, 3, 4, 5, 6}, and let F be the set of all subsets of Ω. Under the natural assignment of probabilities to members of F, we obtain such welcome results as P ({6}) = 1/6, P (even) = P ({2} ∪ {4} ∪ {6}) = 3/6, P (odd or less than 4) = P (odd) + P (less than 4) – P (odd ∩ less than 4) = 1/2 + 1/2 − 2/6 = 4/6, and so on (ibid.).
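
For readers who prefer to see the die example executed, the following minimal sketch (our own illustration, not part of Hájek's exposition) represents Ω, takes F to be its power set, defines the natural uniform probability function, and reproduces the values just cited.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one throw of a fair die.
omega = frozenset({1, 2, 3, 4, 5, 6})

def power_set(s):
    """All subsets of s; here this plays the role of the field F."""
    items = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

F = power_set(omega)

def P(event: frozenset) -> Fraction:
    """Natural (uniform) assignment: |A| / |Omega| for any A in F."""
    return Fraction(len(event), len(omega))

even, odd, less_than_4 = frozenset({2, 4, 6}), frozenset({1, 3, 5}), frozenset({1, 2, 3})

print(P(frozenset({6})))                               # 1/6
print(P(even))                                         # 1/2 (i.e., 3/6)
# P(odd or less than 4) via inclusion-exclusion, as in the text:
print(P(odd) + P(less_than_4) - P(odd & less_than_4))  # 2/3 (i.e., 4/6)

# Spot-check the axioms: normalization and finite additivity on disjoint events.
assert P(omega) == 1
assert P(even | frozenset({1})) == P(even) + P(frozenset({1}))
```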

To the extent that Kolmogorov's first two axioms are simply matters of convention (which is not undisputed),Footnote 10 they are not controversial. In any case, the third postulate ('finite/countable additivity') is seen as the most contentious and the most debated, and justifying it is rightly regarded as one of the main achievements of Bayesian scholarship (Bradley, 2017).Footnote 11 However, none of these axioms is the target of our central argument. Given that our argument does not turn on possible axiomatizations of probability theory, we hereby adopt the Kolmogorovian axioms as an elegantly abstract and logically precise formulation. However, circumspect readers should keep in mind that our scope is broader than this.

4 Probability, Complexity, and Common Problems

With the nuts and bolts of probability in view, we now turn both to attempts to deploy it in complex situations (e.g., Gilboa, 2009) and to standard criticisms of these deployments. First, we lay out a fairly typical view of how one can employ probability in complex situations. Second, we examine several criticisms of this application. We should emphasize that most critics of the use of probability in complex systems (and we ourselves) do not argue that probability-based risk models are invalid (in terms of their deductive reasoning). Rather, we argue that modeling complex risks lies outside the scope of classical probability theory. In a nutshell, the problem lies with the application, not with the theory itself.

To begin, let us bring the problem of complexity into view. Specifically, let us consider the contrast between two categorically different situations. One of these is a dice game with Pascal. Given that the event space for the dice is known, given that we can assign the probabilities correctly (e.g., that the distribution of probabilities is in part guided by the prior uniformity of the event space), etc., the use of probability makes perfect sense. I know, e.g., that Pascal's bet on snake eyes is not likely to win, as the chances of rolling two dice so that each shows a one are 1/36. By contrast, consider a rather different situation. Ted wants to figure out how likely it is that Jane will enjoy the meal he is cooking for her. In marked contrast to our dice with Monsieur Pascal, we simply have no idea how to begin to partition the possible event space, assign values, and so on. Given this fairly obvious difference, trying to make an analogy between these two cases commits what Taleb has called "the ludic fallacy" (Taleb, 2007: 309), wherein we confuse games of chance with complex worldly systems (cf. also Gilboa, 2009). And, clearly, the financial world is more complex than either dice or dinner.

Given these intuitive cases, it must nonetheless be noted that the number of different possible applications of probability theory is, not surprisingly, much larger than the variety of playgrounds where dice, coins, urns, playing cards, etc. are found. Specifically, in addition to games of chance, which constitute a key application of probability theory (Weaver, 1963), there are other fields or types of problems to which probability theory reasonably applies and which have been classified as disorganized and complex: disorganized complexity is represented by systems with very large numbers of variables and high degrees of randomness (Weaver, 1948: 537f.). Interest in systems of this sort began in the late nineteenth century with the investigation of systems representing the motions of gas molecules in a closed space. This is the realm where the methods of statistical mechanics are applicable, emanating from the work of Ludwig Boltzmann and Josiah W. Gibbs (see Table 1 in our Appendix for a perspicuous overview of different sorts of systems).

However, this use of probability, licensed by the theory of complex and disorganized systems, has been extended illegitimately to an application of probability to complex systems altogether, which raises a host of problems (Weaver, 1948). Given our paper's focus, we examine one proposed, and allegedly predestined, application of probability, namely that of risk management in banking, which, at the same time, is a classic example of dynamic or organized complexity (see Table 2 in the Appendix). The purpose of probabilistic risk management is not necessarily to determine "the correct" probability distribution (where it might be unclear what that even means), but to make decision-guiding forecasts of the likely losses that would be incurred for a variety of risks the bank faces (Rebonato, 2007: 43, 107f.): "it is vital to remain mindful that it is the forecast that matters" (Brose et al., 2014: 340). The choice of risk metrics has been called the cornerstone of risk management in this connection (Stulz, 2008). In other words, a goal of risk management via probability theory is to find "better" probability distributions, wherein "better" means leading to more reliable and useful predictions for use in risk management.Footnote 12 The application of probability to complexity in the shape of risk management can be undertaken in several ways, including risk measures such as Value at Risk, Expected Shortfall, etc. (cf. e.g., Jorion, 1997; McNeil et al., 2005). Critically, however, for these applications to count as applications of probability, they must be grounded in probability theory (Malz, 2011). In other words, a risk model must be a probability model. This entails two things:

a) Risk models rely on formal languages that are often drawn from Kolmogorov's (1933) axiomatization of probability. Indeed, this and its accompanying theory provide the lingua franca for describing risks, which often corresponds to the standard view of risk (McNeil et al., 2005: 1f.).

b) Risk models are probabilistic in the sense that they represent financial quantities such as profits and losses with random variables whose values are described by probability distributions (Pergler & Freeman, 2008: 3; Rebonato, 2007: 148). That quantities are expressed by random variables entails that the quantities are governed by (stable) randomness, corresponding to functions on the probability space (see below).Footnote 13

Probability distributions are thus fundamental to risk models. In a purely formal sense, a probability distribution is a function that satisfies Kolmogorov's axioms of probability (Frigg et al., 2014: 5). Thus, there seem, at least at first glance, to be ways to extend probability to cover complex situations. Given this, and the above, it is also clear that one extension of probability models is to risk models, designed explicitly to cope with complex situations—and that modern financial systems are truly complex, and not just disorganized and complex, is unquestioned (Hoffmann, 2017; see also Table 2 in the Appendix). Before finding out whether this is a legitimate application, let us examine how the probability distributions for real-world risk modeling situations are constructed. Intuitively, to apply probability to Ted's question of whether Jane will like his meal, we need to construct a probability distribution of, e.g., Jane's gustatory taste. Similarly, when we want to use a risk model in banking, we need to construct a probability distribution. Broadly speaking, three different ways to do this have been proposed (Mark & Krishna, 2014: 40f.; Rebonato, 2007: 148f., 180; Embrechts, 2000: 449f.):

1) Historical method: The distribution of the losses is simply constructed by 'brute force', i.e., by using the empirical distribution (frequencies) of past returns or losses. For example, and following Rebonato (2007: 148f.), imagine that we have 1000 observations of changes in the S&P index and, therefore, 1000 ranked hypothetical profits and losses. Let us take the tenth-worst hypothetical outcome, and let us assume that it was − $20 million (ibid.). We have 990 better (more favorable) outcomes and 9 worse outcomes (ibid.). By definition, we have just built from our sample an estimate of the true 99th percentile of the population (ibid.). Since the historical method appears to make use of no free parameters, "it is often referred to as a nonparametric approach" (ibid.: 148). Yet an extremely important parameter is used, namely the length of the statistical record, or the length of the so-called 'statistical window' to be considered in the analysis (ibid.). This implicitly determines what constitutes the relevant past, and does so in a binary manner—fully relevant and included in the data set, or totally irrelevant and outside the boundaries of our data set (ibid.).

2) Parametric techniques, including the 'Variance–Covariance Approach' (Mark & Krishna, 2014: 40), the 'Empirical Fitting Approach' (Rebonato, 2007: 150f.), and the 'Fundamental Fitting Approach' (ibid.: 152f.): These techniques assume that the market, or the distribution of profits and losses, obeys a simple statistical formula or is a member of one of the many families of distribution functions one finds in statistics books. For example, we take our 1000 observations of changes in the S&P index again, but this time map them onto the elegant Gaussian distribution, so that only two parameters are then needed to describe the returns: its mean and its variance (or standard deviation).

3) Monte Carlo simulation method: This method randomly generates market factors such as interest rates and calculates the expected return on, e.g., the portfolio. The procedure is repeated many times (potentially hundreds of thousands of times). Ultimately, the resultant set of portfolio values forms a distribution from which risk measures such as Value at Risk can be calculated. Notice that this method is particularly useful for sampling the tails, provided that we have strong ex ante reason to believe that the probability distribution is of a certain type and that we know its parameters. (A minimal code sketch of all three construction routes follows this list.)
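
The following sketch is our own schematic illustration of the three construction routes just listed, using synthetic data and simplifying assumptions (i.i.d. returns; a Gaussian model in the parametric and Monte Carlo variants); it is not the procedure of any particular bank or of the cited authors.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
# Hypothetical stand-in for 1000 observed daily index returns (fat-tailed by construction).
returns = rng.standard_t(df=3, size=1000) * 0.01

def var_historical(r, level=0.99):
    """Historical method: read the loss quantile straight off the empirical record."""
    return float(np.quantile(-r, level))

def var_parametric_normal(r, level=0.99):
    """Parametric method: fit a Gaussian (mean, standard deviation) and use its quantile."""
    mu, sigma = r.mean(), r.std(ddof=1)
    return float(-mu + sigma * NormalDist().inv_cdf(level))

def var_monte_carlo(r, level=0.99, n_sims=100_000):
    """Monte Carlo method: resimulate returns from an assumed model (here: normal)."""
    mu, sigma = r.mean(), r.std(ddof=1)
    simulated = rng.normal(mu, sigma, size=n_sims)
    return float(np.quantile(-simulated, level))

print("99% VaR, historical :", round(var_historical(returns), 4))
print("99% VaR, parametric :", round(var_parametric_normal(returns), 4))
print("99% VaR, Monte Carlo:", round(var_monte_carlo(returns), 4))
# The three figures need not agree: each route embeds its own assumptions about
# which past is relevant and which distributional family governs the future.
```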

Given this construction, it may seem as if we can apply risk models to complex systems like banking without further ado. However, each proposal has some critical weaknesses. For example, Rebonato (2007: 157f.) points out that in common situations ex ante information on the type of the probability distribution in question is not available or is of dubious validity (especially in the tails): "What we do have is a finite number of actual data points, painstakingly collected in the real world. On the basis of these data points I can choose an acceptable distribution and its most likely parameters given the data […]. This approach may well do a good job at describing the body of the distribution (where the points I have collected are plentiful); but, when it comes to the rare events that the timorous risk manager is concerned about, the precision in determining the shape of the tails is ultimately dictated, again, by the quantity and quality of my real data [emphasis added]." (Rebonato, 2007: 157f.). But there is worse news. There is a fatal trade-off between the increase in accuracy that comes from learning more precisely what the relevant conditions are and the loss of estimation power that arises from the progressive loss of pertinent data. For example, if we wish to describe the magnitude of bank losses by means of a distribution, then we can take data from the last 100 years, which include long periods of boring and predictable banking—beautifully illustrated by the activities of the Bailey Brothers' Building and Loan in the classic It's a Wonderful Life. Or we can refer to the last 40 years, when modern financial systems came into existence, when in the wake of the Vietnam War and the oil price shocks of 1974 and 1979 inflation and interest rates in the money market rose significantly (Admati & Hellwig, 2013: 53) and deregulation of financial markets set in (e.g., the so-called 'Big Bang' in 1986).Footnote 14 The results would be very different. Moreover, what constitutes the relevant past is not 'contained in the data'. Consequently, there is also an assumption, implicit or explicit, about what constitutes relevance.
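
A minimal, entirely stylized sketch of this window problem (our own illustration with synthetic data): the same historical estimator applied to the full record versus only the most recent, more turbulent stretch of the same series yields very different loss figures, and nothing in the data itself says which window is the 'relevant' one.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic daily losses: a long calm regime followed by a shorter turbulent one.
calm = rng.normal(0.0, 0.5, size=15_000)      # stand-in for the 'boring' decades
turbulent = rng.normal(0.0, 2.5, size=6_000)  # stand-in for the modern era
losses = np.concatenate([calm, turbulent])

def loss_percentile_99(window):
    """Historical 99th-percentile loss over whichever statistical window we choose."""
    return float(np.quantile(window, 0.99))

print("99th percentile loss, full record        :", round(loss_percentile_99(losses), 2))
print("99th percentile loss, recent window only :", round(loss_percentile_99(losses[-6_000:]), 2))
# The two estimates diverge sharply; the (implicit) choice of window, not the
# data themselves, decides which figure we report.
```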

In turn, these weaknesses can engender at least two reactions. One of them is to accept that probability theory cannot be applied to risk in banking. This would entail that one abandons the use of risk models. However, many theorists think that this reaction is both an over-reaction and that it throws the baby out with the bathwater. Another reaction is to try to locate the source of the weaknesses and remove them. It is one such proposal that we now turn to.

5 Fat-Tails and Big Problems

Recall that in the last section we discussed the attempt to apply probability to banking via risk models and argued that this application faces substantial problems. One prima facie way to avoid these problems and continue to use risk models is to abandon the assumption that the probability distribution we construct for such cases should be 'normal' or thin-tailed. In turn, one might contend that more sophisticated risk models can cope with the above worries. And indeed, many scholars (Hagel, 2013; Newman, 2013; Helbing, 2010; Mandelbrot & Taleb, 2010; Sornette, 2009) think that we should instead rely on, e.g., 'power laws', 'Pareto distributions', or 'Zipf's law', or, more generally, so-called 'heavy-tailed distributions', which are allegedly better suited for describing many fundamental quantities such as stock and real asset prices.Footnote 15 First, we lay out how these more sophisticated models work. Second, however, we argue that they still face a fundamental problem. In turn, this means the increase in sophistication has not addressed the underlying issue. This sets the stage for our presentation of the real problem in Sect. 6.

To begin, advocates of non-standard distributions note that a key problem with using standard distributions is that they systematically and dramatically underestimate extreme risks and black swan events (such as those realized in the financial crisis, which crescendoed in 2008; see, e.g., Taleb, 2007). They further believe that, once this problem is solved, the application of risk models to banking is no longer problematic. To see the difference, Fig. 1 juxtaposes the 'old and denounced' normal distributionFootnote 16 with a 'new and powerful' power law distribution.

Fig. 1 Heavy/fat and long versus thin and short tails: Under the normal distribution (the Gaussian distribution or bell curve), the probability of values lying more than a few standard deviations away from the mean is practically zero. Insofar, the choice of a bell curve may not be appropriate when a significant fraction of outliers—values which lie many standard deviations away from the mean—is to be expected. A tricky problem, as Taleb (2013: 9) states, is that fat-tailed probability distributions can masquerade as thin-tailed ones.
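
To give the contrast in Fig. 1 a rough numerical face, the following sketch (our own illustration; the Student-t with three degrees of freedom merely stands in for 'some fat-tailed alternative') compares tail probabilities of a standard normal and a fat-tailed distribution several units from the mean.

```python
from statistics import NormalDist
from scipy.stats import t

normal = NormalDist()   # standard normal (mean 0, standard deviation 1)
fat = t(df=3)           # fat-tailed stand-in: Student-t with 3 degrees of freedom

for k in (2, 4, 6, 8):
    p_normal = 1 - normal.cdf(k)   # P(X > k) under the bell curve
    p_fat = fat.sf(k)              # P(X > k) under the fat-tailed alternative
    print(f"P(X > {k}): normal = {p_normal:.2e}, fat-tailed = {p_fat:.2e}")
# A few units out, the normal tail is practically zero, while the fat-tailed
# alternative still assigns non-negligible probability to such outcomes.
```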

Adding to this contrasting juxtaposition, Fig. 2 shows that actual empirical data, i.e., an empirical return distribution, conform poorly to a prespecified theoretical normal distribution (cf. also Lux, 1998). (Obviously, the exact form of an empirical return distribution depends on the asset type and on the data frequency; Söderlind, 2014.Footnote 17 In other words, different data sets are approximated to different degrees by a given prespecified theoretical distribution.)

Fig. 2 Normal distributions are not the (new) norm(al) (Source: Söderlind, 2014): A QQ plot of daily S&P 500 returns (the American stock market index Standard & Poor's 500), from 1957:Q1–2014:Q3. In general, a QQ plot is a way to assess whether an empirical distribution matches a prespecified theoretical distribution reasonably well; for instance, a normal distribution (or a normally distributed continuous random variable N) whose mean (µ) and variance (σ²) have been estimated from the data. Each point in the QQ plot shows a specific percentile (quantile) according to the empirical as well as according to the theoretical distribution. For instance, if the 0.02 percentile is at − 10 in the empirical distribution, but at only − 3 in the theoretical distribution, then this shows that the two distributions have fairly different left tails. The visual inspection of a histogram can also give useful clues for assessing the normality of returns.

In particular, this so-called Q–Q plot (quantile–quantile plot) indicates that the two distributions have fairly different left tails, which means that the frequency of actual extreme losses is not well described by the tail of a bell curve. This might be seen as grist for the mill of power law proponents.
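
In the spirit of the QQ plot just described, the following sketch (our own illustration, run on synthetic fat-tailed 'returns' rather than the S&P 500 data underlying Fig. 2) compares a few empirical percentiles with those of a normal distribution fitted to the same data; the discrepancy in the far left tail is the telltale sign.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
# Synthetic stand-in for daily returns with heavier tails than a Gaussian.
returns = rng.standard_t(df=3, size=20_000) * 0.01

# Fit a normal distribution to the same data (estimate mu and sigma).
fitted = NormalDist(mu=float(returns.mean()), sigma=float(returns.std(ddof=1)))

for q in (0.001, 0.01, 0.05, 0.50, 0.99):
    empirical = float(np.quantile(returns, q))
    theoretical = fitted.inv_cdf(q)
    print(f"quantile {q:>5}: empirical = {empirical:+.4f}, fitted normal = {theoretical:+.4f}")
# The extreme left-tail quantiles of the data sit well below those of the fitted
# normal distribution; this is exactly the mismatch a QQ plot makes visible.
```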

Given this increase in sophistication, one might think that the problem of deploying probability theory in complex systems has been solved. Indeed, in the words of the mathematician Walter Willinger and his colleagues: “[t]he presence of [power law] distributions in data obtained from complex natural or engineered systems should be considered the norm rather than the exception” (Mitchell, 2009: 269) and so a method designed explicitly for such situations should solve, or at least take a step towards solving, the underlying problem.

However, notice that this account simply assumes (rather than establishes) that 'fat tail' analysis leads to better predictions. And we have, as yet, been given no reason to accept this claim. Indeed, the triumphalist account of 'fat tails' voiced above rests on the unjustified assertion (*): namely, that we now have a better (i.e., more reliable or trustworthy) or the right (?) extrapolation rule at hand; or, in the end, even an underlying theory of how bodies in the financial system behave and interact. Again, we simply have not been given a reason to accept this rather bold presupposition.

Indeed, the following statement by Sornette (2009: 5), who embraces the abandonment of 'normal' distributions in favor of wilder ones, feeds the doubt that the search for the one right rule or theory is tantamount to a never-ending task: "[I]n a significant number of complex systems, extreme events are even 'wilder' than predicted by the extrapolation of the power law distributions in their tail" (cf. also Spitznagel, 2013: 244). Moreover, the physicist and network scientist Cosma Shalizi offered a less polite phrasing of his sentiments: "Our tendency to hallucinate power laws is a disgrace" (Mitchell, 2009: 254). Stumpf & Porter (2012) reinforce this stance and contend that most reported power laws lack statistical support and mechanistic backing: "[…] a statistically sound power law is no evidence of universality without a concrete underlying theory to support it" (ibid.: 666). Renn (2008: 16) had already made the fundamental observation that the interactions between human activities and consequences are more complex, sophisticated, or unique than the average probabilities used in technical risk analyses are able to capture. Alan Greenspan conceded "that nobody can predict the future when people are involved. Human behavior hasn't changed, people are unpredictable." (Cited in Sornette, 2003: 321). Churchman (1961: 164) draws the distinction that in the case of the gambler, such a theory [in the context of his discussion: a 'well-substantiated' theory of the manner in which draws of various types occur for a given reference class] may be available, whereas in the case of the executive [metaphorically speaking of real-world businesses and risk management], there is no such theory.Footnote 18 Von Hayek (1967: 30) makes a very similar point.Footnote 19 And, finally, if we attempt to retreat from the 'right' distribution to simply a 'better' one, as defined in Sect. 4, we are faced with an odd epistemic situation. To wit, we may simply have no idea why one distribution is better for some specific case. In turn, this means we have no reliable way to determine if, when, how, and how far the better distribution can be applied to other cases (or even to the same case, if it changes enough).

All of this is to say that the assertion (*), that fat tail analysis and the real behavior of markets align, is, as it stands, unjustified, since it has not yet sufficiently accounted for the challenges posed by the complexity of modern financial systems (see Table 2). To sum up in a more abstract manner, consider the following:

While Mandelbrot (1997) and other scholars have taken a step in the right direction by addressing some

  • non-linearities (e.g., the fatter tail syndrome) and

  • some dependencies (e.g., conspicuous and strong symptoms of non-independence of price changes or other time series),


important aspects of complexity have been neglected:

  • real dynamics or unstable time series—complex systems that evolve or change over time—which do not entitle us to pick a specific probability function, because we have no good reason to prefer one to another, and

  • other dependencies such as feedback in open systems—self-similarity and invariance in distribution with respect to changes of time and space scale do not indicate systems whose parts interact and that exchange material, information, and/or energy with their environment across a boundary.Footnote 20

This finding can be packaged in a resounding, but in this form still under-determined and merely suggestive, research statement (which is why we refer to it as a conjecture, not as a fully-fledged proposition):

There is no good theory of how bodies in the financial system behave and interact and, therefore, a scientific basis for proclaiming the accuracy, suitability, and commonality of power law distributions is missing.

From the perspective of dynamic complexity, it is not power laws per se that are important, but the presence of underlying high variability and dynamics (Alderson & Doyle, 2010: 849). Moreover, the foundational and epistemological problem of identifying and utilizing the 'right' (or else providing an objective sense of 'better') probability distribution is not, and perhaps cannot be, solved (Sect. 6). All in all, 'power law' proponents, like other probabilists, seem to be just poking around in the dark. Trying to predict future (extreme) losses with structurally wrong models is like trying to predict the visibility of a comet with an Aristotelian model. For Aristotle, a comet was an atmospheric event and, because of this, an Aristotelian model would utilize the wrong parameters, idealizations, and so on, as it simply misunderstands what a comet is. And no amount of fancy math, more data (against the background of a flawed theoretical assumption), and so on, could help here. Similarly, the use of such risk models for complex and organized systems simply misunderstands the nature of these systems (see Table 1). Given this misunderstanding, no fancy math, no additional parameters, no amount of data, and so on, can help here either. In short, without prior (and correct) knowledge of the targeted system, the use of fancier math, attempts to add parameters, and so on, are doomed to miss their mark. Comets are not atmospheric events, and risk in the financial system is not akin to risk when betting on dice.

Before moving on, let us take stock. We have discussed the syntactic structure of axiomatic probability theory. We have brought into focus a range of possible applications. We argued that one such application, risk models in banking, suffers from some weaknesses. We have examined attempts to fix this by making the mathematical machinery more complicated. However, we argued that this more complicated machinery does not fix the problem. At this point, the reader might suspect that the problem is not that our math isn't fancy enough or that our data aren't rich enough. And, indeed, the next section tries to systematically articulate this suspicion.

6 The Central Argument Against Using Probability Theory for Financial Risk Management

In this section, we show that the determination of a probability (or loss) distribution for risk management purposes in banking is not possible for either standard or fat-tail style analysis. This is due to a foundational epistemological problem of induction concerning deep doxastic uncertainty, which, in turn, stems from the organized complexity characterizing modern financial systems (see Table 2). Moreover, should this argument prove valid and sound, it will also show that Hume's worries about induction, at least in this instance, are far from mere 'academic' skepticism.

To begin, let us assume that the application of probability theory to a situation requires the partitioning of an event space. This assumption is simply built into the Kolmogorovian axiomatization of probability we sketched in Sect. 3. Given this, the partition can be made on either a priori or a posteriori grounds where “a priori” means that the event space is already given (e.g., dice) and “a posteriori” means that the event space is derived from empirical evidence (e.g., Bernoulli, 1713a, b/1988: 65). Let us examine each in turn.

If the event space can be partitioned a priori, then the use of probability is perfectly licit and the application of probability to the target raises no issues. However, as Taleb has stressed, the assumption that we have access to such a priori partitions for almost any real-world complex and organized system rests on a flawed analogy between these systems and, e.g., games of chance (e.g., Taleb, 2007: 309). Such an analogy is flawed for at least three reasons. First, unlike in simple games, it is impossible to know all the parameters involved, especially given that modern financial systems are organizationally complex, i.e., consist of a plethora of parts and elements that are tightly coupled and in close interrelation (Hoffmann, 2017; Admati & Hellwig, 2013: 89, 76; Williams, 2010). Second, unlike in simple games, every (risk) model developed invariably extrapolates from data of the past to the future (Heinemann, 2014: 194). In particular, there are basically three ways of building a probability distribution for some financial random variable (see Sect. 4), which depend mainly on hindsight and which are thus based on 'empirics' (Stulz, 2008). In other words, the methodology of risk management in banking relies on inductive inferences. Third, even granting that complex financial systems have a set of probability distributions, we have no epistemic access to it, unlike in simple games (Taleb & Pilpel, 2004). This is partly because making sense of and understanding the behavior of complex systems is hardly possible, since they are non-linear and have too many interdependent variables (e.g., Churchman, 1961). Thus, it seems that a priori partitioning is not viable.

If the event space is a posteriori, then the goal should be to derive the proper (or at least a better) partition of an event space from some empirical record. However, this derivation faces at least five different objections. The first and second objections were mentioned in our introduction. To wit, the data we have are woefully incomplete, spotty, sketchy, and so on, and our computational power is deeply inadequate. However, as we also noted, these two objections are contingent, and one may retain the hope that, eventually, we (or perhaps some super AI) can overcome these flaws. In contrast, the next three objections undermine this hope.

The third objection begins with a well-established point from the philosophy of science. To wit, a given hypothesis is always underdetermined by (finite) data (e.g., Duhem, 1991: 185–218; Quine, 1951).Footnote 21 This is partly because the testing of a hypothesis relies on the broader theory's architecture, complete with auxiliary hypotheses, devices for idealization, pre-set parameters, and so on. Given this, any number of these background assumptions can be tweaked so that recalcitrant data can be accommodated. For example, one response to the perturbation of Mercury's orbit was not to alter Newton's theory, but instead to posit a non-existent planet, Vulcan, whose presence was supposed to cause the perturbation, and then to argue that the telescopes simply weren't powerful enough yet to detect Vulcan. Moreover, a structurally similar move had earlier led astronomers who noticed a perturbation in the orbit of Uranus to posit, and later discover, Neptune. It is also partly because one key purpose of a theory and its hypotheses (and indeed of risk management) is to go beyond the extant data and allow for predictions. For us, there are two key lessons to be learned from this. First, a theory only confronts evidence against the background of a host of complex assumptions, idealizations, and so on, and we can always tweak these and save the theory (or increase the theory's power!) with respect to this evidence. Second, a good theory does not simply cope with data but gives us means to go beyond it.

Given this, by analogy, a probability distribution is under-determined by the finite data it is applied to, for similar reasons. For example, how I partition the event space is not something I can ‘read off’ from the data as I can partition it in any number of ways. Returning to Ted’s dinner, and his dining experiences with Jane, he might think Jane’s taste is important. However, he might equally well take atmosphere, conversation, comfort, Jane’s day, liberal amounts of wine, etc., as critical. And nothing in Jane’s past dining behavior determines which is correct. Moreover, it is simply not enough that we come up with a distribution that merely copes with the data. Rather, we need this partition to make reliable predictions. However, without some idea of why some given distribution makes reliable predictions for this case, we have no reason to expect it to work outside the case at issue. Indeed, if the case undergoes fairly drastic changes (as complex organized systems are wont to do), we have no reason to expect it works for a future version of it either.

Following on from the third objection, the fourth objection focuses on the data itself. Let us make the bold assumption that the data we have will always be finite as, if for no other reason, the universe 'banged' into being at a certain point. If this is so, then underdetermination is not a feature we can overcome by accruing more and more data. Indeed, the insistence that 'more data' can overcome the problem is a promissory note that cannot be cashed.

Finally, fifth, and most powerfully, one key assumption behind the attempt to derive a posteriori a probability distribution for complex organized systems like financial markets is that their past behavior gives us some handle on their future behavior. Indeed, the entire hope of such a derivation is that obtaining some probability distribution will enable prediction. However, first, this runs right into Hume's objection against induction. To reiterate, we have been given no reason to grant this assumption—i.e., that the future will be like the past. Second, relatedly but worse, empirical studies of the behavior of complex organized systems give us solid empirical evidence to reject it (see our Appendix, Tables 1 and 2). Indeed, non-linear dynamic systems might be fully deterministic and yet in principle unpredictable, as the number of interdependent variables belies any hope that we can compress the behavior of the system into some simple and tractable form. As is often said, the simplest description of such a complex system is often the system itself. And this undermines the hope that we can develop some way to 'outrun' what it is doing. Here, indeed, we have good reason to expect past behavior not to be like future behavior. Moreover, if this is so, even trying to find a 'better' distribution is misguided, as 'better' also presupposes that what the system did before will be like what it does in the future. And Hume laughs.

6.1 The Ramifications for Risk Models

Thus, we have given arguments that an a priori partition of the event space is not viable for real-world, complex and organized systems like the global financial markets. And we have proffered five objections to the attempt to derive such a partition, three of which we take to be in-principle epistemic objections. If these arguments (or even only some subset of them) are valid and sound, then it is clear that the application of probability to risk management is deeply problematic.

Moreover, objection five, in particular, is an instance of Hume's argument concerning induction adumbrated in Sect. 2. In both cases, what we have established is that inferring from the past behavior of a target to its future behavior is not justified. However, whereas Hume's problem is 'academic' in that it applies to everything from the sun rising tomorrow to sophisticated inferences about copper and electricity, ours is more focused and specific. Precisely, this is partly because the banking system is an organized and complex system, whereas in situations of disorganized complexity we find systems with very large numbers of variables and high degrees of (genuine) randomness, so that the law of large numbers, the Central Limit Theorem, and other so-called "properties of stochasticity" (1966: 604) hold. By contrast, in the organized and complex system of risks in banking, any inference from past behavior is illicit, and Hume's worries are not 'merely academic.' If this is so, our argument has clear ramifications for the use of risk models in the banking system. Specifically:

  • Local: The suitability of a certain probability distribution for describing some past and future profit and loss data as well as for a given risk management purpose cannot be determined.

  • Global: Thus, any choice of a particular probability distribution for a given risk management purpose is necessarily arbitrary and cannot be justified. In other words, no compelling reason or ground can be given that can both (a) assure us that the distribution we claim for the past behavior of a target is the right one and (b) ensure that a projection of the target's past behavior into the future will be 'enough alike' to allow for predictions.

7 Conclusion

In order to draw inferences from data as described by econometric texts, it is necessary to make whimsical assumptions […]. The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable.

(Edward E. Leamer, 1983)

Our findings and conclusions seem to be in line with the results reported by many mathematicians (in mathematical finance)—in contrast to some applied scientists and practitioners (involved in the broader quantitative finance scene): "[…] The [Financial] Crisis [of 2007–09] saw numerous examples where 'practice' fully misunderstood the conditions under which some mathematical concepts or results could be applied. Or indeed where models were applied to totally insufficient or badly understood data; the typical 'garbage in, garbage out' syndrome." (Das et al., 2013: 702; cf. also Riedel, 2013: 23f., 33, 52, 180; Jaynes, 2003: 65f., 74; Daníelsson et al., 2001: 3). In light of our central argument, however, they unfortunately phrased their concerns in too reluctant a manner.

In fact, our global conclusion is different from, and stronger than, this charge of abusing mathematical tools. The real issue is not that we have 'garbage' data, or that if we had more computational power we could make predictions, etc. Rather, the problem is an inherent feature of complex systems themselves. Given their very nature, we have no reason to assume that their behavior will evolve in such a way that their future will be like (i.e., predictable in some set abstract way from) their past. Moreover, we have no way to establish and justify that our understanding of a system's past behavior is correct.

It is important to see, though, that the correctness of our argument does not depend on problems of complex risk assessment being organized or dynamically complex and escaping statistical analysis—otherwise, the reasoning would be circular. Rather, we have simply emphasized and specified the link between complexity and the inadequacy of probability-based modeling of risks. Our central argument has several repercussions.

First and foremost, it exposes the uselessness of probability-based models of predominantly extreme risks, as the differences between probability distributions are less significant in the body (see Fig. 1). Put differently, when we push the logic to the extreme, we clearly reveal a wrong track of risk modeling in today's world, namely one which cannot succeed by normal statistical means. The output of risk measures like Value at Risk, for example, is a value for a loss of money, the magnitude of which depends on which loss or probability distribution is assumed. Probability distributions are not directly observable, and as long as we have no systematic way of drawing a line between the 'good' and the 'bad' distributions, we should not rely on our calculations of minimum or expected losses, etc., when making provisions for the future (Frigg et al., 2013: 487). We are aware that there exists an established body of literature on precisely this question of where to draw the line, and it is argued in that literature that scoring rules or "epistemic utility" measures (as they are sometimes called in philosophy) give us the answer. Yet we showed that, conditional on the settings depicted by the central argument, and excluding, for example, situations of disorganized complexity, probabilistic forecasts are unreliable and do not provide a good guide for action as such (ibid.: 490). In other words, our central argument contends that the issue is not one of finding better distributions; the problem lies with the very idea that complex organized systems work in the way risk management assumes they do.
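
As a final, purely illustrative sketch (our own, with synthetic data and an arbitrarily chosen fat-tailed alternative): the same loss record yields markedly different extreme Value-at-Risk figures depending on whether a normal or a Student-t distribution is assumed, which is precisely the arbitrariness the central argument points to.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic daily profit-and-loss record (hypothetical, in millions).
pnl = rng.standard_t(df=3, size=2_000) * 2.0
losses = -pnl
level = 0.999

# Assumption A: losses are normally distributed.
mu, sigma = losses.mean(), losses.std(ddof=1)
var_normal = stats.norm(loc=mu, scale=sigma).ppf(level)

# Assumption B: losses follow a fitted Student-t distribution.
df_, loc_, scale_ = stats.t.fit(losses)
var_t = stats.t(df_, loc=loc_, scale=scale_).ppf(level)

print(f"99.9% VaR under the normal assumption   : {var_normal:.2f}")
print(f"99.9% VaR under the Student-t assumption: {var_t:.2f}")
# Same data, two 'reasonable' distributional assumptions, two very different
# provisions for extreme losses; nothing in the data itself adjudicates.
```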

And second, this entire discussion of probability distributions would have remained completely theoretical if it were not occurring against the background of risk management practices in the financial industry. Any real-life risk calculations which hinge on knowledge about probability distributions are suspect. This covers many aspects of finance in banking, such as trading and asset or portfolio management, and does not exclude recent or future developments of financial markets and banking, such as the development of algorithmic trading, sometimes linked to the promise of making markets more stable (Li et al., 2019; Georges & Pereira, 2019). And since it can be very expensive to use the 'wrong' probability distribution (consider, for example, the collapse of the hedge fund LTCM; cf. Lowenstein, 2002), which according to the central argument we cannot separate from a suitable one, it is finally time to conclude:

There is no point in drawing on probability theory for organizationally complex risk management purposes.

Our central argument is valid, and the rationale behind the premises ought to ensure that they are correct (or at least very plausible) and that, therefore, the whole argument is sound. In that case, the house of modern risk management, in which probability theory and probability distributions are constitutive elements, is not in danger of collapsing, but rather has never been well established, since it lacks a theoretical foundation.

Probabilistic reasoning is not worthless, not even in terms of its application to the real world (as opposed to gambling games of various sorts and textbook cases); but, instead of using it because we are convinced that it gets things right, we (should) use it because it is so very convenient, or maybe even the only tool we currently have. In other words, it is time, at the risk of tiresome repetition, to use it with caution, deliberation, and restraint; the latter especially when facing organized and dynamic complexity (cf. also Peters & Gell-Mann, 2016). In sum, Hume's pragmatic shrug may be the best we can do. However, we must always keep in mind that a shrug is not a justification. Even more, risk management professionals should be shaken out of their 'dogmatic slumber'.