About sixty years ago, Hans Albert (1959, 1963) criticized economists for what he called “model platonism”, that is, a methodological attitude that immunizes theoretical models against empirical criticism.Footnote 1 Model platonism is opposed to scientific realism in economics, that is, the view that economic theories and models are attempts to capture interesting truths about the world and can, therefore, at least in principle be criticized empirically, by demonstrating that the relevant claims are false.

Since the original publication of the model-platonism critique, economics has changed radically. Historians of economics speak of an “empirical turn” in economics or announce the “age of the applied economist” (Backhouse & Cherrier, 2017). The two or three decades after World War II when pure theory dominated economics now appear as an exception.

The self-image of modern economists emphasizes the interplay between theory and experience (see, e.g., Rodrik, 2018, 276). Economic theory comes in the form of theoretical models. A model is a list of assumptions yielding conclusions concerning some phenomena of interest. The mainstream view is that such a model has to be tested empirically, by confronting its conclusions with relevant observations. If the observations contradict the conclusions, the model must be modified or discarded. If the observations bear out the conclusions, the model can be accepted for explanatory and predictive purposes and as a basis for deriving policy recommendations.

However, there is a hitch. In the discussion of a model, all sides, including the model’s inventor, typically agree that the model is unrealistic: many or even all of its assumptions are considered false. Indeed, most economists think that this is unavoidable (see, e.g., Sugden, 2000, 24, 28; Pfleiderer, 2020, 81–82): the complexities of the situations under investigation defy realistic description; whether they like it or not, economists must simplify. If, however, many of the assumptions of the model are believed to be false from the outset, what do we learn from an empirical test? What can “empirical criticism” of such a model mean?

One radical attempt to resolve this difficulty draws upon Friedman’s (1953) methodology of positive economics: a model’s unrealistic assumptions do not matter as long as the conclusions of interest are correct, which is what one finds out by testing them. However, if the realism of assumptions were completely irrelevant, any absurd model with an interesting conclusion would have to be taken as seriously as any traditional model: aliens, magic, fairy tales—anything would go. Indeed, it is hard to see why one would need theories and models at all. Why not just invent interesting and testable hypotheses in an ad-hoc fashion, without recourse to any models?

Actually, of course, most economists tend to take their models seriously. But this requires a less permissive solution to the problem of unrealistic assumptions and an explanation of the relevance of empirical investigations to economic theory and model building. Without a clear understanding of the interaction between theory and experience, model building and empirical work tend to remain separate activities, and severe outbreaks of model platonism are to be expected. Pfleiderer (2020), for instance, criticizes methodological attitudes I would subsume under the term “model platonism”, emphasizing the irrelevance of empirical work inspired by models with highly unrealistic assumptions and the noxiousness of policy advice derived from them.

Though heavy reliance on unrealistic assumptions nowadays often meets with criticism, the critics still fail to clarify the interaction between theoretical model building and empirical work (Sect. 1). The reason for this seems to be a failure to distinguish two kinds of assumptions in a model and their different roles in empirical investigations: law-like assumptions, which taken together form the theoretical part of the model (or the theory, for short), and situational assumptions, which describe the real or hypothetical situation to which the theory is applied (Sect. 2).Footnote 2 Without taking this distinction into account, it is impossible to understand the logic of empirical investigations and, specifically, to deal with unrealistic assumptions without falling back into model platonism.

The solution to the problem of unrealistic assumptions lies in robustness considerations. Robustness is often invoked but the usual definition of robustness is unconvincing and the details of how robustness requirements might solve the problem of unrealistic assumptions remain hazy. On the basis of the distinction between law-like and situational assumptions, I propose an improved account of robustness that makes robustness a testable conjecture (Sect. 3). This solves the problem of unrealistic situational assumptions. Moreover, it allows for a straightforward analysis of the sense in which models with unrealistic situational assumptions can provide approximate explanations if their law-like assumptions are true.

The problem of unrealistic law-like assumptions—that is, of false theories—is not amenable to the same kind of solution. However, solving the problem of unrealistic situational assumptions is sufficient to restore the logic of empirical investigations. On this basis, it is also possible to specify the role of false but nevertheless useful theories.

The proposed solution results from a straightforward application of Hans Albert’s critical rationalismFootnote 3 to the process of modelling in economics. It not only shows how to escape from model platonism. It also yields a more convincing description of the research process than the naïve empiricism of the modern self-image—a description reflecting the practice of critical model discussion in those not so rare instances where methodological common sense prevails (Sects. 4 and 5).

1 How to be a Model Platonist

1.1 Elementary Model Platonism

Model platonism in economics is a methodological attitude that immunizes economic theory against empirical criticism. Technically, platonism (spelled with a small “p” to indicate that this is a modern position rather than Plato’s) is “the view that there exist such things as abstract objects—where an abstract object is an object that does not exist in space or time and which is therefore entirely non-physical and non-mental” (Balaguer, 2016).

Model platonism in this sense results if one takes the assumptions of a model to be statements about an “ideal type” while empirical inquiry is concerned with “real types” existing in space and time.Footnote 4 Since the model’s assumptions define the ideal type, they are necessarily true (unless they are contradictory). If economic theory talks only about ideal types and not about real types, it contains no testable claims and is immune to empirical criticism.

The term “model platonism” is, however, not restricted to a platonism in the preceding sense. It encompasses all methodological positions denying that economic models can contain testable claims. Nevertheless, model platonism leaves room for empirical economics: one can investigate how historical situations differ from models. However, if economic models involve no testable claims, such comparisons yield no evaluations of the models but only implicit classifications of historical situations: a situation under investigation is judged to be more or less similar to a model, where similarities along some dimensions may, of course, be accompanied by dissimilarities along others. “Unrealistic assumptions” just fall under the heading of dissimilarities.

Model platonism, then, turns empirical economics into economic history, but economic history without a lesson. After all, noting similarities and dissimilarities between models and historical situations is just a way to re-describe the empirical data.Footnote 5 By themselves, these data yield neither explanations nor predictions nor policy advice. This would require a theoretical model that applies to the situation under investigation—which we cannot have according to model platonism.

One might, of course, select some model for predictions or policy advice anyway. However, the only conclusion from failures would be that, in the situation under investigation, the model was, in an important respect, not similar enough to the situation, implying that the researcher chose the wrong model. Model platonists can only blame researchers for failures; the models are always innocent.Footnote 6

In order to save empirical investigations from practical irrelevance, model platonists have two options. The first option is inductivism. Roughly speaking, inductive arguments are arguments whose premises describe observed cases—for instance, similarities and dissimilarities between a model and several historical situations—and whose conclusions are about, or extend to, unobserved cases, in particular, future cases. The problem with inductive arguments is that their conclusions do not follow from their premises, that is, one can, without contradiction, accept the premises and still reject the conclusion. Inductivists maintain, nevertheless, that science proceeds by induction: they claim that, at least under certain circumstances—for instance, if the number of observed cases is large enough—it is rational to accept the conclusions of inductive arguments.

There is, of course, nothing wrong with forming conjectures inspired by observations (or, for that matter, by anything else). According to critical rationalism, however, these conjectures need to be tested severely before they can be accepted. Listing supporting cases is no substitute for severe testing. A severe test is an effort to find counterexamples. It makes use of background ideas and, possibly, competing conjectures suggesting where counterexamples might be found. New conjectures should not be accepted unless they have survived a serious search for counterexamples. Indeed, survival of such a search is the best reason we can have for accepting a conjecture.Footnote 7

This principle applies not only to science, where counterexamples are failed predictions, but also to logic and mathematics (Lakatos, 1976). A logical or mathematical proof is nothing but a chain of conjectures about deductive relations between propositions. Each of these conjectures can, in principle, be refuted by a counterexample. If no counterexamples can be found, the proof is tentatively accepted. The subjective feeling of certainty caused by reading a well-presented proof is irrelevant. What is relevant is the fact that a critical discussion among experienced specialists has yielded no counterexamples to any of the steps in the chain of deductions.

Inductivism, then, cannot solve model platonism’s problems. The second option is more attractive. Model platonists might develop assumptions about the circumstances under which models with unrealistic assumptions are, in certain respects, sufficiently similar to situations in time and space. This means that there are two levels of theorizing: a first level where unrealistic models are developed, and a second level concerning the relation of the unrealistic models to experience.

Yet, a list of second-level assumptions would be a second-level model. If this second-level model can do without unrealistic assumptions, the same should be possible on the first level. And if the second-level model also features unrealistic assumptions, model platonists face, again, the same problem they were unable to solve on the first level.

Model platonism, then, is hardly convincing once one begins to wonder how it might make sense of empirical investigations. Before turning to the critical-rationalist alternative, I want to discuss two important variants of model platonism: the theory-as-tautology view and structuralism. Both run into the problems just discussed. In each case, the problem of unrealistic assumptions takes center stage.

1.2 The Theory-As-Tautology View

According to the standard view, economic theories come in the form of models, and models are lists of assumptions. Let \({A}_{1}\) to \({A}_{n}\) be a model’s assumptions. We write \(\wedge\) for “and” and \(A\) for the conjunction \({A}_{1}\wedge \dots \wedge {A}_{n}\). Let \(F\) be a conclusion of interest (or a conjunction of such conclusions) that follows from these assumptions.

As an example, consider the neoclassical model of a competitive exchange economy, which consists of the following assumptions, stated in a non-technical way: There are many agents, each of whom is endowed with specific quantities of several consumption goods. Each agent has a complete preference ordering over the set of all conceivable bundles of these goods. There are perfectly competitive markets for the goods where agents trade these goods at market-clearing prices. External effects are absent. From the conjunction of these assumptions (\(A\)), it follows that the allocation of goods resulting from market transactions is efficient (\(F\)).

Which claim of this model should be tested? Superficially, the logical structure of the model gives no hints. Clearly, it makes no sense to test just \(F\): the model’s message is not that, unconditionally, market allocations are always efficient. Any relevant testable claim derived from the model must be a conditional claim.Footnote 8

One might be tempted to consider, as an alternative to just \(F\), the conditional statement “if \(A\), then \(F\)”, symbolically \(A\to F\). However, since \(F\) follows from \(A\), the statement \(A\to F\) is a conceptual truth or “tautology”, as it is often called in economics: it is true but contains no factual information. From a logical point of view, \(A\to F\) is equivalent to “All bachelors are unmarried”.Footnote 9
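The conceptual-truth character of \(A\to F\) can be made vivid with a toy check (the propositions are invented for illustration; any case where \(F\) follows from \(A\) behaves the same way):

```python
from itertools import product

# Invented toy propositions: assumptions A = "p and q", conclusion F = "p".
# Since F follows from A, "A -> F" should be a tautology.
def A(p, q):
    return p and q

def F(p, q):
    return p

# Evaluate "A -> F" in all four possible worlds: it is always true,
# so no conceivable observation could refute it.
print(all((not A(p, q)) or F(p, q)
          for p, q in product([True, False], repeat=2)))  # prints True
```

The brute-force check over all truth-value assignments confirms that the conditional carries no factual information.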

The theory-as-tautology view holds that economic theory consists solely of tautologies like \(A\to F\): the only message of a model is that its conclusions must hold under its assumptions.

This view is quite unattractive. Tautologies, after all, do not explain or predict. We cannot explain why Uncle Bill is unmarried by pointing out that he is a bachelor. Nor does it make any sense to use the fact that he is a bachelor to predict that he is unmarried.Footnote 10

Moreover, the theory-as-tautology view runs into problems when confronted with the question of how theory and empirical investigations interact. Empirical testing of a tautology like \(A\to F\) is obviously a waste of time since the result is known beforehand, independently of all empirical facts. If we find no cases where \(A\) is true, \(A\to F\) is irrelevant but not refuted. If we find cases where \(A\) is true, that is, cases where all the assumptions of the model hold, then \(F\) must hold—just as every bachelor must turn out to be unmarried.

So what does empirical work achieve? On the basis of the theory-as-tautology view, there is no reasonable answer. For instance, in a book that, in its day, was regarded as an exemplary combination of theory and empirical investigation, the econometrician and international-trade theorist Edward Leamer retreated to the claim that “[a] judgment about the success of an empirical approximation to a tautological theory is ultimately a matter of aesthetics” (Leamer, 1984, xvi). Yet, claiming that a tautology is only approximately true is a contradiction, as when one claims that almost all, but not all, bachelors are unmarried.Footnote 11

The only attractive aspect of the theory-as-tautology view is that it absolves theorists from taking unrealistic assumptions seriously. Leamer (1984), for instance, is concerned with a model of international trade—an extension of the neoclassical model of the competitive exchange economy—whose assumptions are highly unrealistic, as he points out in great detail. Not only are most of the assumptions false for the years and countries he considers; for some of them, it is almost inconceivable that they could ever be true of any group of trading countries at any time and place.

While the conjunction \(A\) of the model’s assumptions is false, one can still empirically check some conclusion \(F\). In Leamer’s case, \(F\) is a linear relation between trade vectors and factor endowments across countries. However, what could one learn from testing \(F\)? Should \(F\) turn out to be true in some situation, it would be unclear why—the model, at least, cannot explain such a result since it states conditions for \(F\) that did not hold. Should \(F\) turn out to be false, this implies that at least one of the model’s assumptions must have been false in the situation under investigation—but this was already known before the test.

Unrealistic assumptions, then, make it difficult to say what a test could achieve. This difficulty should matter. But according to the theory-as-tautology view, it does not. As a tautology, the theory cannot and need not be tested. Of course, while the assumptions of the model might be false, elementary logic implies that the conclusion of interest might still be true. Indeed, in Leamer’s (1984, 187) opinion, the linear relationship works surprisingly well in the two years, 1958 and 1975, under consideration, despite drastic deviations from the model’s assumptions. Because the whole exercise is not considered as a test of a model or a theory, it does not matter that the econometric test is just concerned with the model’s conclusion.

Leamer the trade theorist, then, is completely safe from Leamer the econometrician. The trade theorist states only the tautology \(A\to F\) and is silent on the question of whether any of the assumptions or the conclusion might be true. The econometrician shows that \(A\) is false but that \(F\), nevertheless, looks quite good from an econometric point of view. This is considered as a surprise and might please the trade theorist. But if \(F\) had been rejected, the tautology \(A\to F\) would still be true.

The empirical results themselves are contributions to economic history. Leamer found that, in two years, a certain linear relationship held up quite well among sixty countries. No conclusion for other times and places follows. Any conclusions going beyond the data require as a further premise some non-tautological theory; however, according to the theory-as-tautology view, economics has no such theories on offer.

1.3 Structuralism: The Non-statement View of Theories

A second variant of model platonism is called structuralism. While the theory-as-tautology view appears only in side remarks, structuralism is a movement in the philosophy of science.Footnote 12 Structuralists view the model of the exchange economy as a purely formal structure involving variables like “agent 1”, “good 1” and so on. An assumption like “agent 1 has ten units of good 1” must be considered as a formula which is neither true nor false. In order to apply the formal structure, the variables have to be interpreted in terms of a specific historical situation. Of course, not any interpretation will do. There exists an intended interpretation: “agent 1” has to be interpreted as a person, “good” as a consumption good, and so on. For instance, in the situation under consideration, there might be a person, Adam, and some apples. We stipulate that agent 1 is Adam and that good 1 is apples, with pieces as the unit of measurement. With this interpretation, “agent 1 has ten units of good 1” turns into “Adam has ten apples”, which is true or false depending on the number of apples owned by Adam.

Given an interpretation of the complete formal structure, we can ask whether the interpretation turns all formulas into true statements. In this case, the set of all the things providing the interpretation (Adam, the apples, and so on) are said to be a model (in the logico-mathematical sense) of the formal structure. What economists call a model, then, structuralists consider as a formal structure (a set of formulas), and each interpretation that turns the formulas into true statements is called a model. From this point of view, the principal empirical question is whether the formal structure has any models.

Since structuralism considers scientific theories not as statements about the world but only as formal structures, it has also been called the “non-statement view” of scientific theories. If structuralism meets unrealistic assumptions, model platonism results. A formal structure whose intended interpretation yields unrealistic assumptions has no relevant models (in the logico-mathematical sense) in space and time. The way out is to assume that there exists a model of the formal structure not in time and space but as an abstract object. When we then interpret the formal structure in terms of the abstract object, the formulas turn into true statements about the abstract object. This move brings us back to platonism in the strict sense of the term.

According to structuralism, what theoreticians have to say is, again, necessarily true and not subject to empirical criticism. Empirical economists are left with the task of finding out which economic model fits which historical situation. Since, however, no economic model fits exactly (because of the unrealistic assumptions), they can only classify the situations they consider as more or less similar to the theoreticians’ abstract objects. This is, of course, just a way of re-describing the observational data. In order to achieve more, structuralists need to embrace inductivism or resort to second-level theorizing about the relations between the theorists’ abstract objects and the world of experience.

1.4 Approximation and Robustness

Some economists explicitly oppose the unrestricted use of unrealistic assumptions, demanding greater realism in economic models. On the face of it, these economists seem to reject model platonism. Yet, a closer look reveals that demands for more realism alone are insufficient to escape from model platonism.

Let us begin with the intuitively appealing idea that a model with unrealistic assumptions should in some sense be a good approximation to the situation under investigation. This seems to rule out wildly unrealistic assumptions. Alas, Friedman (1953, 15) defines a model to be a good approximation if and only if the conclusion of interest from the model holds in the situation under investigation. With this definition, the requirement that a model should be a good approximation to the situation under investigation just means that the conclusion of interest should hold. Again, it does not matter whether the assumptions are realistic or unrealistic.

Some economists have tried to improve upon this idea of models as approximations by adding robustness requirements, thereby restricting the use of unrealistic assumptions. According to Gibbard and Varian (1978, 674), even models with very unrealistic assumptions may help us to understand a situation if their conclusions are robust, meaning that the conclusions do not depend on the details of the assumptions. Similarly, Ng (2016, 182) considers simplifying assumptions to be acceptable if they simplify the analysis but do not change the conclusions substantially. Pfleiderer (2020, 84–85) argues in favor of using a “real-world filter”, rejecting models if critical assumptions contradict what is already known. Rodrik (2015, 19, 26–27, 94–98), citing an earlier version of Pfleiderer’s paper, agrees and requires critical assumptions to be close to reality. Actually, Solow (1956, 65) already expresses similar thoughts in an opening paragraph that reads like an implicit criticism of Friedman (1953).

All these considerations come down to the same point: one may use unrealistic assumptions only if this simplifies the analysis without changing the conclusion of interest. This robustness requirement sounds reasonable, but as it stands, it is useless. Let us state the requirement formally. Let \(A\) be the conjunction of the assumptions of the unrealistic model and \(F\) the conclusion of interest from the model, so that \(A\to F\) is a tautology. “Unrealistic” means: in the situation under investigation, it is known that \(A\) is false. Robustness in this sense would require that the unrealistic model \(A\) could, in principle if not in practice, be replaced by a perfectly realistic model \({A}^{*}\) also implying \(F\).

It is surprisingly trivial to check empirically whether robustness in this sense holds: just check \(F\). If \(F\) turns out to be false in the situation under investigation, this implies that \({A}^{*}\) is false for any tautology \({A}^{*}\to F\); hence, robustness fails. If \(F\) turns out to be true in the situation under investigation, and if we assume that \(F\) is no miracle and, therefore, amenable to an explanation in principle (even if we may be unable to find this explanation), then there must exist some perfectly realistic model \({A}^{*}\) implying \(F\). Consequently, the robustness requirement holds if and only if \(F\) is true.Footnote 13
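In the notation above, the two directions of this argument can be sketched formally:

\[\neg F\wedge \left({A}^{*}\to F\right)\ \models\ \neg {A}^{*}.\]

If \(F\) is false, modus tollens rules out every model implying \(F\), so no true \({A}^{*}\) implying \(F\) exists and robustness fails; if \(F\) is true and no miracle, some true \({A}^{*}\) with \({A}^{*}\to F\) exists, and robustness holds.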

For this reason, the robustness requirement invoked by the post-Friedmanian proponents of realism adds nothing to Friedman’s approach to approximation. Again, an unrealistic model turns out to be a good approximation if and only if the conclusion of interest from the model holds in the situation under investigation, because then, and only then, does the robustness requirement hold.

While the demand for more realism in economic models goes in the right direction, it must be supported by a more detailed analysis of a model’s components, which then allows for a better characterization of robustness and approximate explanation.

2 Theories and Models

2.1 Law-like Assumptions, Situational Assumptions, and the Rationale of Model Building

The problem of unrealistic assumptions changes completely once one acknowledges that there are two kinds of assumptions in a model: law-like assumptions describing relationships assumed to hold always and everywhere,Footnote 14 and situational assumptions describing a situation to which the law-like assumptions are applied. The point of testing is to find true law-like assumptions (called “laws”) and weed out false ones.Footnote 15

Obviously, both kinds of assumptions appear in economic models. Consider the neoclassical model of an exchange economy already discussed above. The assumption that all agents have complete preference orderings on the set of alternatives is law-like: it is made in each neoclassical model and belongs to the core of the neoclassical theory. On the other hand, the assumption that each agent is endowed with some stock of consumption goods is just a description of the situation where the agents act.

A theoretical model can be written as \(T\wedge S\), with \(T\) as the conjunction of all the law-like assumptions, usually called the theory, and with \(S\) as the conjunction of all the situational assumptions. We also refer to \(S\) as the description of a situation. We consider a theoretical model where \(S\) is generic, that is, given in general terms, without specifying a time or location of the situation or the persons involved.

Again, we consider some consequence of interest \(F\) of the model \(T\wedge S\). The statement \(T\wedge S\to F\), then, is a tautology. Now, however, the focus is not on the tautology \(T\wedge S\to F\) but on the statement \(S\to F\). This statement is not a tautology but a law-like consequence of the theory \(T\): the theory implies that, whenever and wherever the situation \(S\) obtains, \(F\) must hold.Footnote 16 If \(T\) is true, its law-like consequence \(S\to F\) is also true, even if the situation described by \(S\) never occurs or is impossible.Footnote 17
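This shift of focus is underwritten by a standard equivalence of propositional logic (a version of the deduction theorem):

\[T\wedge S\ \models\ F\quad \Longleftrightarrow \quad T\ \models\ S\to F.\]

The left-hand side says that \(F\) follows from the model; the right-hand side says that \(S\to F\) follows from the theory alone and is, therefore, a claim the theory makes about every situation in which \(S\) holds.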

Deriving this kind of law-like statement from a theory is the rationale of modeling. Tests, explanations, predictions, policy advice—no matter what we want to do, we have to find out what our theories imply for the situations where we want to apply them. In the case of the competitive exchange economy, the situation is hypothetical, although situations coming close to it can be implemented in laboratories. Considering hypothetical situations is not only relevant for model building but also a crucial element in decision making. A rational decision maker selects from the available options one whose causal consequences he believes to be at least as good as the causal consequences of the others. His assumptions about the hypothetical scenarios resulting from different choices—so-called subjunctive conditionals of the form “if I took action \(a\), consequence \(c\) would obtain”—will be true only if they follow from true law-like assumptions.

The concept of a theoretical model as a combination of law-like and situational assumptions captures the main use of the term “model” in economics and in other sciences (see Bunge, 1973, 97–99). It has an important but often overlooked aspect: a theory comes with its own language, that is, a set of terms occurring in the law-like assumptions and denoting the things to which the theory refers (Hans Albert, 1987, 108–111). The description \(S\) of a situation contained in a model \(T\wedge S\) uses only the language of the theory.

For instance, neoclassical theory speaks, among other things, of agents and goods. These basic terms have no explicit definition within the theory but, of course, a meaning: the agents of economic theory, for instance, are humans.Footnote 18 These terms leave some room for interpretation since meanings are not perfectly sharp. Does every human being qualify as an economic agent, or are there, for instance, some age qualifications? In special cases, such fine points might matter. More important, however, is the fact that the language of neoclassical economics lacks many of the terms that are used for describing people’s personal characteristics or the characteristics of the goods people’s preferences refer to.

Characteristics of a situation that cannot be described with the language of the theory must be ignored in any relevant description of a situation. Therefore, a certain level of abstraction is a built-in feature of any theory. Given the usual law-like assumptions of neoclassical theory, different assumptions about the color of agents’ eyes among the situational assumptions would make no relevant difference: according to the model of a competitive exchange economy, trade among blue-eyed agents would be as efficient as trade among brown-eyed agents. The often-heard claim (e.g., Roberts, 1987, 838) that a perfectly realistic model would be as useless as a map of scale 1:1, then, is false. The situational assumptions of a perfectly realistic neoclassical model must be stated in the language of neoclassical theory, and this theory already implies that an enormous amount of details would have to be left out of the model.

The situational assumptions must state not only what is present in the situation under investigation but also what is absent. For instance, an economic model’s assumption that there are two agents is to be interpreted as the assumption that there are exactly two agents: no one else is present. In the same spirit, the situational assumptions of an economic model implicitly exclude any feature of the situation which can be described in the language of the relevant theory but which is not explicitly mentioned.

Since economists do not distinguish explicitly between law-like and situational assumptions, economic models are often ambiguous in this respect. For instance, when testing the neoclassical theory of behavior in laboratory experiments by letting players play some game, the relevant model assumes that players’ strategies are in equilibrium. Is this a situational assumption or a law-like assumption? This is not easy to say.

If it were assumed that experimental subjects always play according to equilibrium strategies, the equilibrium assumption would be law-like. Alternatively, one might consider the equilibrium assumption as situational, which, however, would rob the theory of its empirical content. Experimentalists often consider a version of the theory which assumes that experimental subjects need time to learn about the game and the other players before equilibrium play occurs. This alternative interpretation invokes (typically not very precise) law-like assumptions about learning, which are used to choose an experimental design under which the equilibrium assumption is predicted to hold. In effect, problematic situational assumptions are replaced by law-like assumptions claiming that, under relatively easy-to-check situational assumptions, the problematic situational assumptions will hold. This is one of the ways to “operationalize” a theory.

Ambiguities concerning the distinction between law-like and situational assumptions must be resolved, explicitly or implicitly, in any test of a theory. Different ways to resolve these ambiguities yield similar but nevertheless different theories.Footnote 19 Often, the hypotheses introduced at this stage are so-called auxiliaries, that is, law-like hypotheses required to derive predictions for specific contexts but not really in the center of interest. In experimental economics, for instance, an often-used auxiliary is that experimental subjects who correctly answered some test questions about the experiment have understood the instructions.

In practice, there is a large set of law-like assumptions, some of them quite similar to each other, which are subjected to tests in different combinations. Some combinations of law-like assumptions turn out to be successful in empirical tests, others fail. The process of weeding out false law-like assumptions and identifying true ones is complicated; its discussion is beyond the scope of the present paper. However, for the whole process to work at all, it is necessary to forge a connection between theories, that is, combinations of law-like assumptions, and empirical investigations. And this requires a solution to the problem posed by unrealistic assumptions.

2.2 Basic Methodological Problem Constellations

On the basis of the distinction between law-like and situational assumptions, the interaction between model building and empirical investigations can be clarified. Given a theory \(T\), a description \(S\) of a situation and a conclusion of interest \(F\) implied by the model \(T\wedge S\), the focus is on the law-like statement \(S\to F\): whenever and wherever the situation \(S\) obtains, \(F\) must hold. Depending on the status of the model’s components, we can distinguish several problem constellations in empirical investigations (see Table 1).

Table 1 Possible cases in an empirical investigation based on a model \(T\wedge S\)

                          \(S\) realistic     \(S\) unrealistic
\(T\) untested            Case I              Case IV
\(T\) corroborated        Case II             Case V
\(T\) falsified           Case III            Case VI

With respect to the theory \(T\), we have to acknowledge that tests are never completely conclusive. False theories might yield correct predictions in some situations, and observational errors might lead us to the false conclusion that predictions from a true theory failed. If theories must be tested by statistical methods, both kinds of errors may, in addition, be caused by sampling variation. Therefore, we can never be certain whether \(T\) is true or false. Moreover, a small number of tests, no matter what the results might be, will usually be insufficient to support even a tentative judgment. Therefore, we distinguish three cases: \(T\) may be well-corroborated and tentatively accepted as true, untested or insufficiently tested, or falsified and tentatively rejected as false (corroborated, untested, or falsified, for short).

Strictly speaking, the same categories apply to the situational assumptions \(S\). We assume that these assumptions can be checked by direct observation, but since observational errors are always possible, such checks are best viewed as tests. However, this problem seems to be less severe than in the case of theories. We therefore simplify and assume that \(S\) is correctly classified as realistic (true) or unrealistic (false) in the situation under investigation.

If the situational assumptions cannot all be checked by direct observation, an empirical investigation might be considered as an indirect test of the unobservable situational assumptions or as a means to estimate some variable whose value cannot be measured in a more direct way. The robustness considerations of Sect. 3 below can be adapted to deal with this case but this is beyond the scope of the present paper.

On the basis of these considerations and restrictions, we can distinguish the six different cases of Table 1. Cases I-III are relatively unproblematic textbook cases. The main idea of the paper is to use robustness considerations to reduce cases IV-VI to their relatively unproblematic counterparts I-III.

2.3 Case I: Untested Theory \(T\) and Realistic Situational Assumptions \(S\)

This is the textbook case of theory testing. The derivation of \(S\to F\) for some observable conclusion \(F\) allows us to test \(T\) by checking \(F\). If \(F\) holds, \(T\) is corroborated; otherwise, it is falsified. A single corroboration or falsification is usually not enough to determine the status of a theory. Yet, repeated falsifications usually trigger a search for alternatives for at least one of the law-like statements used in the derivation of \(F\).

If \(T\) is corroborated in several tests based on different situations \(S, {S}^{\prime},\dots\) yielding different conclusions \(F, {F}^{\prime},\dots\) and never falsified, \(T\) achieves the status of a corroborated theory and can provisionally be accepted as true.Footnote 20 Acceptance is always provisional because even a well-corroborated theory might be false, so that future falsification can never be ruled out.

Checking whether the conclusion \(F\) from the model \(T\wedge S\) holds, then, is a means to test \(T\), the set of law-like assumptions of the model. Learning about law-like assumptions is crucial in science because only these assumptions have implications beyond the situation under investigation. The aim of tests is to weed out false theories and trigger the search for better ones. As the history of science demonstrates, this process can lead to impressive successes even if the new theories it produces are, again, falsified. These successes are due to the fact that false theories may have important law-like consequences that are true, or approximately true in the sense that predictive errors are small for practical purposes.

2.4 Case II: Corroborated Theory \(T\) and Realistic Situational Assumptions \(S\)

Typically, the scientific community carries on with testing even well-corroborated and provisionally accepted theories. The point is, of course, not to endlessly check the same conclusions in the same kind of situations but to come up with new, hitherto untested conclusions for new situations.

Alternatively, accepted theories may be used for making predictions or for finding explanations. Predictions are formally identical to tests but are made in the hope not of finding something new but of getting the prediction right. In the case of an explanation, the phenomenon \(F\) to be explained has already been observed. The challenge is to show that \(F\) follows from the theory and the description \(S\) of the situation where \(F\) occurred. If this turns out to be the case, the model \(T\wedge S\) is said to explain \(F\).Footnote 21

Obviously, predictive failures yield falsifications. The same may happen if the search for an explanation of \(F\) fails, that is, if it turns out that the realistic model \(T\wedge S\) implies that \(F\) should not occur. This case is, however, often less straightforward because it may be quite difficult to observe, or reconstruct after the fact, the relevant situation where \(F\) occurred. In contrast, testing a theory allows the researcher to seek out easily observable situations or to implement such situations in a laboratory.

2.5 Case III: Falsified Theory \(T\) and Realistic Situational Assumptions \(S\)

Even if the theory \(T\) is false, its law-like consequence \(S\to F\) might be true and can, therefore, reasonably be tested by checking \(F\). In this way, a theory that, in principle, has been rejected as false can serve as a heuristic for finding new and true law-like hypotheses.

This is especially relevant if \(T\) had been successful for a long time, that is, used to be well-corroborated and had been provisionally accepted as true. The most important case in the history of science is classical mechanics, which must be considered as falsified but is still used for many purposes. Using classical mechanics is, of course, made easy because its successor, general relativity theory, is extremely well-corroborated and predicts in which situations which consequences of classical mechanics should hold.

The situation in economics is less fortunate. The neoclassical theory of human behavior has been thoroughly falsified in laboratory experiments but lacks a well-corroborated successor. Yet, the theory is still used. This can be justified if the theory turns out to be a useful heuristic (see Albert, 1996). Testing a new law-like hypothesis \(S\to F\) derived from a falsified theory \(T\) is not only interesting because \(S\to F\) might be true; it can also be considered as a test of the heuristic quality of \(T\). While the logic of testing remains the same, judgments about a heuristic are more lenient: a heuristic may be considered useful even if its rate of failures is quite high.

Of course, heuristic successes of a falsified theory do not speak against attempts to come up with better theories. In the search for better theories, a falsified theory may serve as a benchmark: one may try to find out in new empirical investigations which consequences of the falsified theory are more or less in agreement with reality. Knowing exactly where and how a theory fails may yield important information for finding a successor—or, less desirable but possibly relevant in the case of behavioral economics, a set of successors, with each new theory covering only a subset of the domain of its falsified predecessor.

2.6 Cases IV-VI: Unrealistic Situational Assumptions \(S\)

The discussion of cases I-III shows that, with realistic situational assumptions, the logic of empirical investigations of a model based on a given theory \(T\) is always the same, independently of \(T\)’s status. Even in case III, an empirical investigation may make sense and, if undertaken, must proceed according to the same principles as in cases I and II.

With unrealistic situational assumptions, the logic of empirical investigations seems to break down. If \(S\) is false in the situation under investigation, the law-like consequence \(S\to F\) following from \(T\) is irrelevant: it predicts \(F\) for some other situation while implying nothing at all for the situation at hand. Hence, the observation of \(\neg F\) would provide no argument against \(T\), implying that the empirical investigation is not a test of \(T\). This also implies that observing \(F\) is no argument in favor of \(T\): a corroboration requires a severe test. For the same reasons, the empirical investigation would not contribute to the evaluation of \(T\)’s heuristic potential if \(T\) was already falsified.

However, at this point, it is possible to come back to the ideas of robustness and approximation considered before, although with important modifications made possible by the distinction between law-like and situational assumptions.

3 Robustness as a Testable Conjecture

3.1 The Method of Decreasing Abstraction

Let us consider some theory \(T\) and a set of several different models \(T\wedge {S}_{1}, T\wedge {S}_{2},\dots\) based on \(T\), with the different situational-assumption parts \({S}_{k}\) of the models collected in a (possibly infinite) set \(\Sigma := \{{S}_{1}, {S}_{2},\dots \}\). According to a standard definition, a conclusion \(F\) is robust in this set of models if \(F\) follows from each model in the set. With a given theory \(T\), we just write that \(F\) is robust in \(\Sigma\).
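Spelled out in symbols, the standard definition reads:

\[F \text{ is robust in } \Sigma \quad :\Longleftrightarrow \quad T\wedge {S}_{k}\ \models\ F\ \text{ for all } {S}_{k}\in \Sigma .\]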

Let us further consider an empirical investigation, and let \({S}^{*}\) be the true description of the situation in the language of \(T\). Thus, \({S}^{*}\) is perfectly realistic, while \(T\) might be false. As we have seen, the status of \(T\)—untested, corroborated, or even falsified—makes no difference for the logic of empirical investigations. Moreover, even if many of the descriptions in \(\Sigma\) were unrealistic, the problem of unrealistic situational assumptions would be absent if it were known that \(F\) is robust in \(\Sigma\) and that \({S}^{*}\in \Sigma\).

This unproblematic case of robustness often holds in economics with respect to the “dimensionality” of models. For instance, consider the model of the competitive exchange economy with an unspecified but finite number of goods. From a logical point of view, this model is actually an infinite set of models, each with a different number of goods. The conclusion that trade leads to an efficient allocation is robust in this set of models, that is, it holds independently of the number of goods. To apply the model to some situation, then, one need not know the number of goods in this situation in order to conclude that the theory predicts efficiency.Footnote 22

The question is whether robustness considerations can be extended to the case where the realistic description \({S}^{*}\) of the situation is not available and/or \(T\wedge {S}^{*}\), if available, cannot be analyzed, so that it is unknown whether \(F\) follows from \(T\wedge {S}^{*}\).

The simplest extension of the robustness argument relies on induction: one argues that one’s confidence that \(T\wedge {S}^{*}\) implies \(F\) increases with the size of the set \(\Sigma\) of unrealistic descriptions where \(F\) is robust. In this crude form, the inductive argument is obviously not acceptable: it is often easy to come up with many models whose conclusion is \(\neg F\). While \(F\) might be robust in \(\Sigma\), \(\neg F\) might be robust in some other set \({\Sigma }^{\prime}\). For the inductive argument to make any sense, the set \(\Sigma\) must be relevant to the situation under investigation.

Sugden (2000) considers an improved version of the argument where the elements of \(\Sigma\) form a sequence \({S}_{0}, {S}_{1}, \dots\) of increasingly realistic but still unrealistic descriptions of the situation under investigation. While the completely realistic description \({S}^{*}\) is not in \(\Sigma\), he argues that it may be possible to conclude by induction that, if \(F\) is robust in \(\Sigma\), it is also the case that \(F\) follows from \(T\wedge {S}^{*}\).Footnote 23

In economics, the idea of constructing a sequence of increasingly realistic models is known as the “method of decreasing (or diminishing) abstraction”. As already noted by Hans Albert, the method can be misused by model platonists to immunize their theory against empirical criticism. After all, theoreticians could blame any failures of their models on the fact that their assumptions were not yet realistic enough, thereby postponing severe testing of their theory indefinitely. Obviously, an acceptable argument to the effect that \(F\) actually follows from the perfectly realistic model \(T\wedge {S}^{*}\) would block this immunization strategy since it would imply that \(\neg F\) speaks against \(T\).Footnote 24

However, inductive arguments are not acceptable. As already explained, the best reason we can have for tentatively accepting a conjecture, even a mathematical or logical conjecture, is that it survived severe tests, that is, a serious search for counterexamples. The challenge, then, is to come up with an improved testable version of robustness.

3.2 Robustness, Approximate Explanations, and Critical Assumptions

The improved definition of robustnessFootnote 25 involves four elements: a given theory \(T\), a situation under investigation where \(T\) is to be applied, an unrealistic description \({S}_{0}\) of this situation in the language of \(T\), and some interesting consequence \(F\) of the model \(T\wedge {S}_{0}\). We call \(F\) a robust consequence of \(T\) for the situation under investigation if and only if \(F\) follows from all models that are more realistic than \(T\wedge {S}_{0}\), that is, all models combining \(T\) with a description of the situation that is more realistic than \({S}_{0}\). The set of these more realistic descriptions is denoted by \({\Sigma }_{0}\).
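In symbols, with \({\Sigma }_{0}\) as just defined:

\[F \text{ is a robust consequence of } T\quad :\Longleftrightarrow \quad T\wedge S\ \models\ F\ \text{ for all } S\in {\Sigma }_{0}.\]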

The realistic description \({S}^{*}\) of the situation under investigation belongs to \({\Sigma }_{0}\) by definition. If \(F\) is actually robust in \({\Sigma }_{0}\), this implies that \(F\) follows from \(T\wedge {S}^{*}\). Therefore, the conjecture that \(F\) is a robust consequence of \(T\) in the situation under investigation implies that the problematic cases IV-VI of Table 1 can be treated like the relatively unproblematic cases I-III.

This concept of robustness includes an important special case where increasingly realistic models \(T\wedge {S}_{0}\), \(T\wedge {S}_{1}\) etc. lead to increasingly precise predictions \({F}_{0}, {F}_{1}\) etc. (see, e.g., Betz, 2011, 657). “Increasingly precise” means that the predictions are numerical intervals, with each interval predicted by a more realistic model being a proper subset of the interval predicted by previous models. This is a special case of robustness because \({F}_{k+1}\) implies \({F}_{k}\) in this sequence: the less precise prediction is correct if the more precise prediction is correct. Hence, \({F}_{0}\) follows from all models more realistic than \(T\wedge {S}_{0}\).
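A schematic numerical illustration (the intervals are invented): suppose the increasingly realistic models predict

\[{F}_{0}: x\in \left[0,10\right],\qquad {F}_{1}: x\in \left[2,8\right],\qquad {F}_{2}: x\in \left[4,6\right].\]

Then \({F}_{2}\Rightarrow {F}_{1}\Rightarrow {F}_{0}\): whenever a more precise prediction is correct, all less precise predictions are correct as well, so \({F}_{0}\) follows from every model in the sequence.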

While the definition of robustness is a generalization of this special case, it is still very strong, which means that the corresponding robustness conjecture is also very strong. A definition, moreover, cannot solve the problem of unrealistic situational assumptions. However, the definition simplifies the presentation of the solution.

In presenting the solution, we focus on the case of explaining an observed phenomenon described by \(F\) using a corroborated theory \(T\) (case V). Accordingly, we supplement the definition of robustness by a definition of an approximate explanation: in the situation under investigation, the unrealistic model \(T\wedge {S}_{0}\) approximately explains \(F\) if and only if \(T\) is true and \(F\) is a robust consequence of \(T\).

By definition, an approximate explanation could in principle be extended into a perfect explanation, which is given by the perfectly realistic model \(T\wedge {S}^{*}\). As before, we assume that this is not possible in practice because \({S}^{*}\) is unavailable and/or \(T\wedge {S}^{*}\) cannot be analyzed. By the definition of an approximate explanation, however, \(T\) is true and \(F\) is robust, meaning that \(F\) follows from all models more realistic than \(T\wedge {S}_{0}\), including \(T\wedge {S}^{*}\). Hence, by definition, an approximate explanation of \(F\) implies the existence of a perfect explanation, even if this perfect explanation is not available. Yet, taken by itself, the unrealistic model \(T\wedge {S}_{0}\) does not explain \(F\) because an explanation needs to be true while \({S}_{0}\) is false in the situation under investigation. Hence, the label “approximate explanation” is justified.

Robustness implies that none of the unrealistic assumptions in \({S}_{0}\) is critical: no improvement of the realism of the situational assumptions would lead to a model not implying \(F\). Of course, conclusions other than \(F\) that also derive from \(T\wedge {S}_{0}\) may be false. Moreover, \(T\wedge {S}_{0}\) may not be the simplest model approximately explaining \(F\). Nevertheless, it seems that an approximate explanation of \(F\) by \(T\wedge {S}_{0}\) would be completely satisfactory.

Consider, in contrast, the case where \(F\) is not robust, that is, the case where a more realistic model \(T\wedge {S}_{1}\) not implying \(F\)—or, more drastically, implying \(\neg F\)—exists. Under these circumstances, \(T\wedge {S}_{0}\) could not be considered as an explanation of \(F\), even if it were known that \(T\wedge {S}^{*}\) implied \(F\). Moving from \(T\wedge {S}_{0}\) to \(T\wedge {S}_{1}\) improves the realism of the situational assumptions, that is, it introduces some features of the situation under investigation that are represented by \({S}_{1}\) but not by \({S}_{0}\). If \(T\wedge {S}^{*}\) actually implies \(F\), \(T\wedge {S}_{0}\) is insufficient as an explanation because the situation under investigation must contain additional features not represented by \({S}_{1}\)—features counteracting the effect of the features newly introduced in \({S}_{1}\), thereby restoring \(F\). A satisfactory approximate explanation of \(F\) would have to include these countervailing features of the situation under investigation.

A trivial example illustrates the point (see Fig. 1). A ball rolls down a sloping plane. The fact \(F\) to be explained is that the ball reaches the floor. The plane is bumpy and crossed by a curved solid ridge. The ridge has a hole at its lowest point where the ball can pass through. Without the hole, the ball would be caught by the ridge.

Fig. 1 A ball moves down a sloping and bumpy plane. On the way, it meets a ridge with a hole where it can pass through. The description on the left leaves out the ridge; the description in the middle leaves out the hole; the description on the right, like the others, still ignores the bumps.

A first model assumes that the plane is perfectly flat (\({S}_{0}\)); together with the theory of gravity \(T\), the model implies \(F\). A more realistic model includes the ridge (\({S}_{1}\)); according to this model, the ball is caught at the lowest point of the ridge (\(\neg F\)). A further, even more realistic model includes the hole in the ridge (\({S}_{2}\)). This last model again predicts \(F\) but still leaves out the bumps on the plane that have an effect on the exact movement of the ball (not to speak of air resistance, friction, etc.). However, all more realistic models imply that the ball eventually finishes its journey down to the floor.

In the example, the initial assumption that the plane is perfectly flat is false and critical. The assumption that the plane is perfectly flat except for a ridge without holes is also false and critical. The assumption that the plane is perfectly flat except for a ridge with a hole at the lowest point is false but uncritical since accounting for the bumps on the plane would not change the conclusion. The conjecture that the conclusion is robust in the set of all models more realistic than the third model is true.
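For readers who prefer to see this structure spelled out, here is a minimal sketch in Python (the names are hypothetical, and encoding “more realistic” as proper feature inclusion is a simplifying assumption of the sketch, not part of the example itself):

```python
# Situation descriptions as sets of features the model takes into account.
S0 = set()                           # perfectly flat plane
S1 = {"ridge"}                       # ridge without a hole
S2 = {"ridge", "hole"}               # ridge with a hole at its lowest point
S_star = {"ridge", "hole", "bumps"}  # the realistic description

def F(features):
    """What the gravity model predicts for a given description:
    does the ball reach the floor?"""
    if "ridge" in features and "hole" not in features:
        return False                 # the ridge catches the ball
    return True                      # bumps etc. slow the ball but do not stop it

def robust(S, candidates=(S1, S2, S_star)):
    """Is F robust among the (here: explicitly listed) descriptions
    that are more realistic than S, i.e., proper supersets of S?"""
    return all(F(s) for s in candidates if S < s)

print(F(S0), robust(S0))  # True False: S1 is more realistic and yields not-F
print(F(S2), robust(S2))  # True True: adding the bumps does not change F
```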

Given the situation under investigation, the initial model \(T\wedge {S}_{0}\) cannot satisfactorily explain how the ball could reach the floor, although it predicts this event. A satisfactory explanation needs to mention the ridge and the hole in the ridge. This is achieved by the last model, \(T\wedge {S}_{2}\). Nevertheless, \(T\wedge {S}_{2}\) is not a perfect explanation since \({S}_{2}\) is still false; hence, the standard definition of an explanation, which requires that all components of an explanation be true, does not apply. Nor is it possible to achieve a standard explanation by taking the robustness conjecture into account in some way or other. While the robustness conjecture is true in the example, it is neither a law-like nor a situational assumption; it cannot be a part of the model but says something about the model in relation to the situation under investigation. We are therefore stuck with an approximate explanation provided by \(T\wedge {S}_{2}\).

There is a further sense in which the explanation is approximate. Typically, many conclusions from an unrealistic model will be false. In the example, the model \(T\wedge {S}_{2}\) is an approximate explanation for the observation that the ball reaches the floor. Let us assume that there are further observations, for instance, the time it takes the ball to reach the floor. The sequence of models just considered focuses on the explanation of just one of the known facts—that the ball reached the floor—at the expense of the other known facts.Footnote 26 The last model \(T\wedge {S}_{2}\) approximately explains the selected fact; however, its conclusion with respect to, for instance, the ball’s traveling time might be quite off the mark since this is influenced by a lot of factors not included in the model. The situational assumptions of the model, then, are an approximation chosen for a purpose, namely, explaining a specific fact rather than all the facts.

3.3 The Critical Discussion of Models

Even if the theory \(T\) is well-corroborated and tentatively accepted as true, the question of whether an unrealistic model \(T\wedge {S}_{0}\) can be accepted as an approximate explanation is difficult to answer. If the realistic description \({S}^{*}\) is unavailable and/or unanalyzable, the same goes for many elements of the set \({\Sigma }_{0}\) of more realistic descriptions. Therefore, an argument to the effect that \(T\wedge {S}_{0}\) can be accepted as an approximate explanation of \(F\) involves two problems, which often come together. Identifying members of \({\Sigma }_{0}\), that is, descriptions of the situation under investigation more realistic than \({S}_{0}\), is an empirical problem. Given some more realistic description \({S}_{1}\in {\Sigma }_{0}\), it needs to be shown that \(T\wedge {S}_{1}\) implies \(F\), which is a theoretical (that is, logical or mathematical) problem.

While it is impossible to prove the conjecture that \(F\) is robust in \({\Sigma }_{0}\), the conjecture can be tested. Severe testing means searching for features in the situation under investigation which are missing in \({S}_{0}\) and which, when taken into account, lead to a more realistic model \(T\wedge {S}_{1}\) not implying \(F\) (or, even more strongly, implying \(\neg F\)). Any rebuttal of the robustness conjecture can trigger a search for an even more realistic model \(T\wedge {S}_{2}\) implying \(F\) and a new robustness conjecture: \(F\) might be robust in \({\Sigma }_{2}\), the set of all descriptions more realistic than \({S}_{2}\).
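
Stated formally, the conjecture under test is

\[ \forall S\in {\Sigma }_{0}:\;\; T\wedge S\Rightarrow F, \]

and a counterexample consists of a single more realistic description \({S}_{1}\in {\Sigma }_{0}\) for which \(T\wedge {S}_{1}\) does not imply \(F\).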

With respect to the simple example of Fig. 1, we can imagine an equally simple story of a critical discussion leading to an approximate explanation. Let us assume that it is a well-established fact \(F\) that balls rolling down the plane reach the floor. Furthermore, let us assume that it is difficult to observe the paths taken by the balls and the exact properties of the plane but that it is already known that the plane is bumpy. Nevertheless, a researcher proposes the unrealistic “perfectly flat plane” model \(T\wedge {S}_{0}\) as an explanation of \(F\). This model is criticized by a second researcher who found evidence of a ridge crossing the plane. Extending the model to account for the ridge leads to the more realistic model \(T\wedge {S}_{1}\) implying \(\neg F\). Hence, in the situation under investigation, the conclusion \(F\) is not robust in \({\Sigma }_{0}\). It is now unclear how the balls can reach the floor—they might, for instance, jump over the ridge for some reason.

However, further empirical investigations by the first researcher lead to the discovery of the hole in the ridge and, consequently, to an even more realistic model \({T\wedge S}_{2}\) which again implies \(F\). Against this third model, the second researcher argues that the bumps on the plane might be so high and lie so densely that the balls cannot pass them. This is conceivable, but logical possibility alone is not an admissible argument. If, despite their best efforts, nobody can produce empirical evidence of further impediments that could stop the balls, the conjecture that \(F\) is robust in the set of all models more realistic than the third model becomes accepted. This also implies that the third model is accepted as an approximate explanation of \(F\).

This simple story can be extended in several ways. For instance, the researchers may go on to improve the model in order to explain, in addition, the variation in the traveling times of the balls. This may require an extension of the theory used in the explanation: the theory of gravity is insufficient if there is friction and air resistance. If this research program is successful, the aim of model building might become to improve the precision of the model’s explanation of traveling times. Yet, given the limited possibilities of observing the features of the plane, it may be impossible to come up with explanations for the distribution of traveling times. Moreover, there may be theoretical problems involved in determining the conclusions of the models.

Counterexamples need not be based on models incorporating all the complexities that have been discussed up to this point. The models starting from some initial model \(T\wedge {S}_{0}\) may form a treelike structure rather than a line. Along each branch of the tree, the realism of the situational assumptions increases. However, there is no need to compare models from different branches with respect to their degree of realism. In fact, it is completely unnecessary to define the notion of a degree of realism; it suffices to identify increases of realism. This possibility is implicitly admitted by all sides of the debate about unrealistic assumptions: any evidence-based argument to the effect that some situational assumption is unrealistic already indicates what kinds of assumptions would increase a model’s realism.
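
In schematic terms, “more realistic than” need only be a partial order \(\succ\) on situational descriptions, so that \({\Sigma }_{0}=\{S:S\succ {S}_{0}\}\); descriptions on different branches of the tree are simply incomparable under \(\succ\).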

This point is exemplified by Leamer’s detailed empirical investigations of the situational assumptions of his own model of international trade. The assumption of zero transport costs, for instance, is shown to be unrealistic by presenting statistics on these costs, which were quite high in the years he considers. A more realistic model would have to include these costs (but would no longer imply the linear relation tested by Leamer).

Any empirical criticism of a situational assumption, then, points the way to improvements in realism. However, this holds only for situational assumptions. A falsification of the theory part of the model does not indicate what a more realistic theory would look like. It just throws up a fact contradicting the theory. All we learn from the falsification is that a new theory must avoid conflicting with this fact (as with other known facts).

Importantly, the focus on criticizing robustness allows for a broader range of arguments than a focus on proving robustness. Numerical simulations are often considered insufficient surrogates for mathematical proofs. However, simulations within the empirically relevant range of parameter values provide counterexamples to the robustness conjecture if parameter values are found for which the conclusion of interest \(F\) fails. In the search for counterexamples, it is legitimate to focus on extreme values of the parameters—values at which background knowledge suggests that robustness might fail—as long as one stays in the empirically relevant range. Consequently, a failure to find counterexamples by simulations that are, in this way, “rigged” against \(F\) is a corroboration of the robustness conjecture.
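
To illustrate the structure of such a search, here is a minimal Python sketch; the model inside conclusion_holds, the parameter names alpha and beta, and their ranges are hypothetical placeholders, not taken from any paper discussed here.

import itertools
import numpy as np

def conclusion_holds(alpha, beta):
    # Hypothetical stand-in for solving a model T & S numerically and
    # checking whether the conclusion of interest F obtains.
    return alpha - beta ** 2 > 0  # placeholder condition for F

# Empirically relevant parameter ranges (assumed for illustration).
alphas = np.linspace(0.5, 2.0, 40)
betas = np.linspace(0.0, 1.2, 40)

# "Rig" the search against F: try the most extreme parameter values first.
center = (alphas.mean(), betas.mean())

def extremeness(pair):
    return max(abs(pair[0] - center[0]), abs(pair[1] - center[1]))

grid = sorted(itertools.product(alphas, betas), key=extremeness, reverse=True)
counterexample = next(((a, b) for a, b in grid if not conclusion_holds(a, b)), None)

if counterexample is None:
    print("No counterexample found: the robustness conjecture is corroborated.")
else:
    print("Robustness conjecture refuted at (alpha, beta) =", counterexample)

Sorting by extremeness rigs the search against \(F\): if even the most suspicious corners of the empirically relevant range yield no counterexample, the corroboration is more severe.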

Laboratory experiments may also provide robustness checks. To see this, let us again assume that the aim of research is to explain some observed fact \(F\) on the basis of some accepted theory \(T\). Let there be an unrealistic initial model \(T\wedge {S}_{0}\) implying \(F\), and some more realistic model \(T\wedge {S}_{1}\) which, however, is too difficult to analyze so that it is unknown whether it implies \(F\). Yet, it may be possible to implement the situation described by \({S}_{1}\) as an experimental design in the laboratory. If \(F\) fails to occur in this experiment, and if \(T\) is indeed true, \(F\) does not follow from \(T\wedge {S}_{1}\) and is, therefore, not a robust consequence of \(T\). Hence, laboratory experiments with designs that cannot be analyzed theoretically can provide counterexamples to robustness conjectures. The failure to refute robustness in the laboratory contributes to the corroboration of the robustness conjecture and, consequently, helps to establish the simple model as an approximate explanation of \(F\).

There is no logical endpoint to the search for an approximate explanation. Any conclusion is accepted only tentatively, until somebody comes up with a new argument. In this respect, the search for approximate explanations is no different from any other dispute in science (including logic and mathematics). Different contributors take different positions, and one side wins if the other sides run out of arguments. Although it is never possible to ascertain conclusively what is true and who is right, there are methodological rules regulating this competitive process. They determine which kinds of arguments are admissible and which side has, at a given time, the upper hand. While the details of scientific methodologies, which depend on accepted scientific theories and available technologies, change over time, the top-level rules regulating the interplay of theoretical argumentation and empirical investigation remain the same. The conjecture that these top-level rules are the best rules for promoting scientific progress can be accepted in view of the success of science and the fact that, despite intense critical discussions, no better rules have been found.

As already explained, searches for an explanation can lead to the conclusion that \(F\) cannot, in fact, be explained by \(T\). This can, of course, also happen in the case of approximate explanations, and it seems to me that it happened, for instance, in the case of the precession of the perihelion of the planet Mercury, for which no explanation on the basis of Newton’s theory of gravitation could be found. Given the complexity of the solar system and the fact that, according to Newton’s theory, all masses in the solar system and, indeed, in the whole universe instantaneously affect the movement of the bodies in the solar system, the search for an explanation in this case must be considered a search for an approximate explanation. That search famously failed, while an explanation (it seems: also an approximate one) could be found on the basis of the general theory of relativity.

The example of Newton’s theory suggests that the notion of a given situation under investigation is not as straightforward as it seems. While the language of the theory \(T\) determines what could conceivably be relevant (namely, anything that can be described in this language), the content of \(T\) determines which elements of this description are actually relevant and in which way. Without consulting \(T\), it is impossible to say what has to be included and where the limits of the situation under investigation are to be drawn. Depending on \(T\), things that are far away in space and time might be relevant. Moreover, the question is: relevant to what? If we focus on just one conclusion of interest \(F\), some things turn out to be irrelevant that may be relevant to some other conclusion.

Fortunately, it is not necessary to clarify the limits of the situation under investigation in advance. As far as such a clarification is needed, it emerges as a byproduct of the robustness discussion. The discussion begins with the problem of explaining, with the help of \(T\), some observed phenomenon \(F\). The observation of \(F\) always comes with a rough-and-ready notion of the situation under investigation. We propose a model \(T\wedge {S}_{0}\) and call \(F\) a robust consequence of \(T\) for the (not precisely defined) situation under investigation if and only if \(F\) follows from all models that are more realistic than \(T\wedge {S}_{0}\). Such a more realistic model replaces \({S}_{0}\) by situational assumptions \({S}_{1}\) improving on \({S}_{0}\) in the light of some empirical criticism raised against \({S}_{0}\). If it can be shown that \(F\) is not a consequence of \(T\wedge {S}_{1}\), robustness has been successfully criticized and it has been shown that the phenomena newly included in \({S}_{1}\) belong to the situation under investigation. If, after serious attempts at refuting it, we accept the conjecture that \(F\) is a robust consequence of some model \(T\wedge {S}_{n}\), we thereby also accept that some observable features of the situation under investigation describable in the language of \(T\) but not captured in \({S}_{n}\) are irrelevant for an approximate explanation of \(F\). Whether these irrelevant features are, intuitively, considered as part of the situation under investigation or not makes no difference.

Robustness conjectures, then, lend a specific structure to a critical discussion of potential approximate explanations of some observed phenomenon \(F\) on the basis of a well-corroborated and tentatively accepted theory \(T\). The same kind of interaction between empirical and theoretical considerations is relevant if \(T\) is untested or falsified (cases IV and VI in Table 1). If one wants to argue that the result of checking some conclusion \(F\) from an unrealistic model \(T\wedge {S}_{0}\) is relevant for the assessment of the truth or the heuristic potential of \(T\), one must argue that, in the situation under investigation, \(F\) is a robust consequence of \(T\). Of course, experimental robustness checks presuppose that \(T\) is true; they make no sense if \(T\) is not accepted. But apart from this caveat, the robustness conjecture always triggers the same kind of critical discussion.

Independently of the status of the theory \(T\), then, the logical structure of the critical discussion of models is always the same and drives model building in the direction of increasing realism or “decreasing abstraction”. This variant of the “method of decreasing abstraction” is not a pretext for indefinitely postponing empirical criticism. Rather, it is an application of the basic principles of critical rationalism: a way of integrating empirical criticism into the modelling process.

4 Robustness in Economic Research

In the case of Leamer (1984), the conclusion of interest—the linear relationship between factor endowments and trade—is not robust in the situations under investigation, which makes it hard to say what the empirical investigation of this conclusion is meant to achieve. But there are other examples where robustness seems to hold. I want to discuss two of them.

The first example comes from the field of mechanism design. The typical problem in mechanism design is to find institutional arrangements that generate some desirable result or avoid some undesirable result. In the present context, mechanism design is interesting for two reasons. First, designers frequently use numerical simulations and experiments to supplement theoretical considerations. Second, design problems illustrate an important general point: the same considerations relevant in the search for explanations are also relevant in the search for solutions to practical problems.

Roth (2002) reviews the history of the US labor market for new doctors seeking a first job at hospitals. After the market came into existence around 1900, intense competition caused it to “unravel”: in an effort to secure an attractive partner before their respective competitors became active, hospitals and medical students entered into contracts earlier and earlier in students’ careers. By the 1940s, students were hired almost two years before graduation, at a time when hospitals still lacked reliable information about the prospective doctors’ qualifications and students had not yet found their preferred field of specialization. As a consequence, matchings between doctors and hospitals were highly inefficient.

Attempts to reform the matching process led, in the 1950s, to the creation of a centralized clearinghouse. Hospitals interviewed graduates as they saw fit and then provided preference rankings of graduates to the clearinghouse, while graduates provided their preference rankings of hospital positions. The clearinghouse used these rankings to propose a matching, using an algorithm that was subsequently improved several times. Although participation in the clearinghouse was voluntary and the matching provided by the clearinghouse was a non-binding proposal, this solved the problem of unraveling. In the 1990s, however, hospitals and medical students became dissatisfied with the operation of the clearinghouse (for reasons we need not discuss), and Roth was asked to improve the situation. This led to the adoption of a new algorithm. In what follows, we focus on selected aspects relevant to the design of this algorithm.

Roth (2002, 1348) begins with the question of what explains the successes and failures of clearinghouses in various labor markets. The explanation he seeks is based on the neoclassical theory of human behavior and, specifically, game theory. As the paper shows, Roth accepts this theory in a slightly weakened form. It is not assumed that equilibria are reached without a learning phase; indeed, exploratory behavior may persist to some degree even after the learning phase. Moreover, it is conceded that individuals are unable to solve very complex problems; therefore, equilibrium predictions for some situation are taken to be theoretically relevant also for slightly different situations in which small gains could actually be achieved by different but hard-to-find strategies. These deviations from hard-core neoclassicism are not unusual in applied and, specifically, experimental economics.Footnote 27

The starting point for the explanation is what Roth calls a “too simple model” of the labor market. The model assumes that the clearinghouse cannot be circumvented and finds a stable matching between doctors and hospitals. A matching is stable if and only if (a) all participants are assigned partners acceptable to them and (b) there exists no hospital-doctor pair where both prefer each other to their assigned partners. Stability goes beyond efficiency: it ensures that searching for a better match will not be successful.
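
Condition (b) amounts to the absence of a “blocking pair”. The following minimal Python sketch checks both conditions for the simplest one-to-one case, with one position per hospital; the data structures are hypothetical simplifications, not Roth’s actual implementation.

def is_stable(matching, doctor_prefs, hospital_prefs):
    # Preferences are ranked lists of acceptable partners; `matching`
    # maps each doctor to a hospital or to None.
    doctor_of = {h: d for d, h in matching.items() if h is not None}

    def prefers(prefs, agent, new, current):
        # True if `agent` ranks `new` above its current partner
        # (an acceptable `new` always beats being unmatched).
        if new not in prefs[agent]:
            return False
        if current is None:
            return True
        return prefs[agent].index(new) < prefs[agent].index(current)

    # Condition (a): assigned partners must be mutually acceptable.
    for d, h in matching.items():
        if h is not None and (h not in doctor_prefs[d] or d not in hospital_prefs[h]):
            return False

    # Condition (b): no doctor-hospital pair prefer each other to
    # their assigned partners.
    for d in doctor_prefs:
        for h in doctor_prefs[d]:
            if (prefers(doctor_prefs, d, h, matching.get(d))
                    and prefers(hospital_prefs, h, d, doctor_of.get(h))):
                return False
    return True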

Stability of the proposed matchings is a plausible requirement for the success of a clearinghouse. With unstable matchings, participants dissatisfied with their assigned partner may find better matches, thereby displacing others who would then have to search for a new partner. Depending on the extent of this effect, using the clearinghouse may become unattractive, leading to the decentralization of the market and unraveling.

In line with this intuition, an empirical investigation of several clearinghouses from different markets suggested that using “stable algorithms” tends to prevent unraveling while using “unstable algorithms” does not (Roth, 2002, 1351). A matching algorithm is called (un)stable if it leads to (un)stable matchings on the basis of stated preferences. Specifically, the algorithm used by the successful clearinghouse on the medical labor market was stable. One important question, then, is whether the initial success of the medical clearinghouse is, indeed, explained by the stability of its algorithm.
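
For concreteness: the canonical example of a stable algorithm is the Gale–Shapley deferred-acceptance procedure, of which the algorithms used by the medical clearinghouse are, in essence, variants. Here is a minimal doctor-proposing sketch for the same hypothetical one-to-one setting as in the stability check above.

def deferred_acceptance(doctor_prefs, hospital_prefs):
    # Doctors propose down their ranked lists; each hospital tentatively
    # holds the best acceptable proposal received so far. The outcome is
    # stable with respect to the stated preferences.
    next_choice = {d: 0 for d in doctor_prefs}  # index of d's next proposal
    held = {}                                   # hospital -> tentatively held doctor
    free = list(doctor_prefs)                   # doctors still proposing

    while free:
        d = free.pop()
        if next_choice[d] >= len(doctor_prefs[d]):
            continue                            # d's list is exhausted; d stays unmatched
        h = doctor_prefs[d][next_choice[d]]
        next_choice[d] += 1
        if d not in hospital_prefs[h]:
            free.append(d)                      # d is unacceptable to h; d proposes elsewhere
        elif h not in held:
            held[h] = d                         # h holds its first acceptable proposal
        elif hospital_prefs[h].index(d) < hospital_prefs[h].index(held[h]):
            free.append(held[h])                # h upgrades to d and releases its doctor
            held[h] = d
        else:
            free.append(d)                      # h keeps its current doctor; d tries again
    return {d: h for h, d in held.items()}

doctor_prefs = {"d1": ["h1", "h2"], "d2": ["h1"]}
hospital_prefs = {"h1": ["d2", "d1"], "h2": ["d1"]}
print(deferred_acceptance(doctor_prefs, hospital_prefs))  # {'d2': 'h1', 'd1': 'h2'}

On stated preference lists of this kind, the returned matching passes the is_stable check above; this is the sense in which such an algorithm is “stable”.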

Due to the complications of the medical labor market, this question could not be answered on the basis of existing models. Roth’s discussion of these complications is an instance of robustness checks in the sense of the present paper. There is a situation, the medical labor market in the 1950s. The theory \(T\) is the (slightly weakened version of the) neoclassical theory. The conclusion \(F\) of interest is that a stable algorithm prevents unraveling. While this intuitively appealing conclusion does not follow from the “too simple” model (in which the clearinghouse cannot be circumvented), an extended model actually implies \(F\) (Roth & Xing, 1994). However, this model still ignores several complications of the medical labor market. We discuss just two of them.

One complication is the incentive to manipulate the outcome of the algorithm by lying about one’s preferences. In the model of Roth and Xing (1994), such incentives are absent; however, the theory implies that incentives to lie must exist in the medical labor market (Roth & Sotomayor, 1992, 525–527). Lying about one’s preferences is a problem because a stable algorithm ensures stability of the matching only with respect to stated preferences. If market participants lie about their preferences, the resulting matching can be unstable with respect to their true preferences.Footnote 28

Yet, the incentive problem may be irrelevant. In sufficiently large markets, and given a lack of information about the preferences of other market participants, manipulating the algorithm by lying is difficult and the gains tend to be small.Footnote 29 The problem is that there exists no clear-cut theoretical result yielding a threshold beyond which a market is large enough. It therefore remains open whether the medical labor market is large enough for the incentive problem to be negligible.

Kagel and Roth (2000) tackle this problem experimentally.Footnote 30 The experiment is not a test of the basic theory: Kagel and Roth were unable to derive an equilibrium prediction for the experimental design; moreover, they do not express any doubts that the theory is correct in this domain of application.Footnote 31 Nor is the experiment a simulation of any labor market encountered in the field: the experimental design is still much too simple (see Kagel & Roth, 2000, 208). The experiment can only be interpreted as a robustness test. And, indeed, Kagel and Roth (2000, 202, 229) repeatedly claim that the experiment checks the robustness with respect to market size of the hypothesis that clearinghouses using stable algorithms prevent unraveling. This seems to be correct, although the authors do not provide a complete account of this robustness check. The missing step in the argument, which may have been obvious to the authors, is that, in the very small markets implemented in the experiment, incentive problems loom much larger than in large markets. Given this premise, the experiments provide a severe robustness check. The experimental subjects first gained experience with a decentralized market that led to unraveling. Then, a clearinghouse was introduced which used either a stable or an unstable algorithm. However, subjects could still make early binding contracts instead of waiting for the clearinghouse to open. Moreover, lying was possible, potentially profitable, and did in fact occur.Footnote 32 Yet, the clearinghouse reduced unraveling significantly if it used a stable algorithm; if the algorithm was unstable, unraveling prevailed.

Another complication in the medical labor market, which became increasingly relevant in the 1970s, is the presence of couples of doctors seeking jobs in the same city. Couples pose a problem because they cannot individually state their true preferences and may, therefore, have reasons to circumvent the clearinghouse. While the algorithm of the medical clearinghouse can be, and has been, adjusted to allow for couples seeking positions at the same hospital, it can be shown that, depending on participants’ preferences, stable matchings may not exist. It is just a conjecture that some modified algorithm is stable. However, numerical simulations with actual stated-preference data from several years indicated that this seems not to be a problem in practice: a suitably modified algorithm always found a stable matching and, therefore, seems to be stable within the range of observed stated preferences (Roth, 2002, 1359). Again, these simulations can only be interpreted as robustness checks in the sense of the present paper since no theory or model is tested.Footnote 33

Thus, the conclusion that the modified algorithm would prevent unraveling in the medical labor market has survived at least two serious robustness checks.

It is perhaps no surprise that model platonism plays no role when economists work out solutions to practical problems. Yet, robustness considerations are also relevant for the development of pure theory, as shown by a further example, Akerlof’s (1970) lemon-market model. This is one of the two paradigmatic examples of useful unrealistic models discussed by Sugden (2000).

The lemon-market model is an extremely simple model of the market for used cars. The observation the model intends to explain is the price of almost-new used cars, which often seems to be much lower than the quality of the cars would warrant. Akerlof’s explanation for this alleged fact is that buyers on the used-car market cannot distinguish between almost-new good cars and almost-new “lemons” (that is, cars with manufacturing defects), while sellers know the quality of their cars. This information asymmetry between buyers and sellers prevents an efficient market equilibrium even if all other assumptions of a competitive market hold, because the willingness to sell as such signals low quality. In the extreme case, the market could break down, meaning that only cars of the lowest quality are traded at all: the good cars do not command the price at which sellers would sell them, although buyers would pay this price if they could be sure not to get a lemon.
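
The logic of breakdown can be conveyed by a standard textbook version of the model (used here for illustration; Akerlof’s own setup is slightly richer). Suppose quality \(q\) is uniformly distributed on \([0,2]\), a seller values a car of quality \(q\) at \(q\), and a buyer values it at \(\tfrac{3}{2}q\). At any price \(p\), only cars with \(q\le p\) are offered for sale, so the average quality on offer is \(p/2\) and a risk-neutral buyer is willing to pay only

\[ \tfrac{3}{2}\cdot \frac{p}{2}=\tfrac{3}{4}\,p<p. \]

No positive price supports trade: the market breaks down completely, although every car is worth more to the buyer than to the seller.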

Akerlof’s (1970) basic model and the extensions he discusses are quite unrealistic. Nevertheless, one gets the impression that one learns a lot about markets from reading the paper. The reason, however, is not, as Sugden (2000) argued, that one concludes inductively that conclusions from several unrealistic models should also follow from a realistic model of the used-car market. The persuasive power of Akerlof’s argumentation is based on his robustness discussion, which shows that obvious counterarguments against his conclusion fail. He extends his model by adding familiar market institutions like warranties which, at first sight, might be able to overcome the inefficiency caused by asymmetric information. Then he goes on to show that these institutions are unable to restore efficiency.

While the discussion of the model extensions is informal, it is quite clear that one might write down a model along these lines which still exhibits a market failure. At the end of Akerlof’s discussion, one is left without objections: there seems to be no conceivable remedy for the problem. Akerlof’s arguments also demonstrate that a believed-to-be-robust consequence of neoclassical economics—that competitive markets are efficient if there are no externalities—is actually not robust.Footnote 34 This forces believers in market efficiency as well as non-believers to consider existing complications in competitive markets and to continue Akerlof’s informal discussion with theoretical and empirical arguments.

Akerlof’s line of argument illustrates the fact that robustness discussions in the sense of the present paper are an essential element in the evaluation of economic models. Critics of a model point out complications existing in the situation under investigation but missing in the model and try to show that accounting for these complications invalidates the conclusions from the initial model. Defenders of the model do not argue that Friedman taught us that the realism of assumptions is irrelevant. Instead, they try to show that accounting for the complications does not change the conclusions. Such robustness discussions drive model building in the direction of greater realism of the situational assumptions of economic models.

5 Conclusion

By acknowledging the distinction between law-like and situational assumptions, then, economists can escape from model platonism. Yet, as the examples above show, it is not necessary to refer to the distinction explicitly in order to come up with reasonable arguments. Nor do economists need to specialize in the philosophy of science. Apart from a few basic ideas, all it takes to practice Hans Albert’s critical rationalism is a commitment to realism and critical discussion, and some common sense.