Robustness analysis and tractability in modeling

In the philosophy of science and epistemology literature, robustness analysis has become an umbrella term that refers to a variety of strategies. One of the main purposes of this paper is to argue that different strategies rely on different criteria of justification. More specifically, I will claim that: i) robustness analysis differs from de-idealization even though the two concepts have often been conflated in the literature; ii) the comparison of different model frameworks requires different justifications than the comparison of models that differ only in the assumption under test; iii) the replacement of specific assumptions with different ones can encounter specific difficulties in scientific practice. These claims will be supported by a case study in population ecology and a case study in geographical economics.


Introduction
In the philosophy of science and epistemology literature, robustness analysis has become an umbrella term that refers to a variety of strategies. Probably because different taxonomies were put forward by various authors in quick succession (Kuorikoski et al. 2010; Weisberg and Reisman 2008; Woodward 2006), a certain confusion has emerged in the literature, along with an overlap in terminology.
Scientists often refer to one sense or the other without specifying their source, or without mapping their terminology onto that of other scientists. More importantly, the epistemic virtues underlying different senses of robustness analysis have often been left implicit, and their legitimacy as confirmatory tools remains to be clarified.
It is one of the purposes of this paper to distinguish the different epistemological arguments behind distinct uses of robustness analysis in theoretical models. In so doing, this work identifies certain gaps between the ideal characterization of robustness analysis and its application in scientific practice. In what follows, I will first show that robustness analysis differs from de-idealization even though the two concepts have often been conflated in the literature. Secondly, I will show that there are different justifications for using robustness analysis, according to whether it is considered as a strategy to compare different modeling frameworks or models that differ only as to the assumption being tested. Finally, I will show that in scientific practice it can be difficult to introduce single changes in a model without altering its main structure. If robustness analysis were a 'surgical' operation, in which controversial aspects could be replaced by different ones with no other relevant changes, then the role of a single assumption could be evaluated and the invariance of the results assessed. However, the connection between simplifying assumptions and mathematical tractability is often so intimate that variations can only be introduced by altering the overall structure of the model. This argument will be supported by a case study in population ecology and a case study in geographical economics. Possible solutions to these shortcomings and related difficulties will also be examined.
This paper is organized as follows. In Section 2, I introduce the argument for robustness analysis and present different taxonomies proposed in the literature. In Section 3, drawing on a model in population ecology, I explain how robustness analysis differs from de-idealization. In Section 4, I examine the goal and the import of robustness analysis as a strategy to compare different mathematical approaches to describing the same phenomenon. In Section 5, I discuss a case study from geographical economics, which reveals some possible practical difficulties in using robustness analysis. In Section 6, I conclude by pointing out some challenges that robustness analysis faces in actual scientific practice, which are candidate directions for future research.

Robustness analysis
Suppose that we have a theoretical model, based on a set of initial assumptions, from which we can derive a number of predictions. If the initial assumptions are unrealistic representations of a real-world phenomenon, it is natural to ask how the predictions can apply to the real-world phenomenon, where these unrealistic assumptions do not hold. Intuitively, a way of proceeding is to replace the initial assumptions with slightly different ones, in order to observe whether the result holds true across conditions. Invariance of the result would suggest that the unrealistic assumptions were irrelevant to the final result; variation of the result would show that the predictions were not independent of the specific initial assumptions.
This method of testing whether a result is invariant under different initial assumptions is known as robustness analysis. The idea behind this strategy is that confidence in the predictions of a model increases if the predictions are invariant to small changes in the assumptions from which they are derived. In a slightly more formal notation, robustness analysis can be described as follows: we start from a model M, which consists of a core assumption C and an auxiliary assumption $A_1$, from which the result R follows. If the same result R occurs respectively under different auxiliary assumptions $A_2$, $A_3$, $A_4$, etc., we can conclude that changes in the assumptions do not influence the final result. In other words, if the result is invariant across conditions, we have an indication that it is the core of the model that is driving the result rather than the auxiliary assumptions (Kuorikoski et al. 2010).
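The schema just described can be stated compactly. What follows is only a notational sketch: the entailment symbol, the index $i$ and the bound $n$ are assumptions introduced here for illustration and are not taken from Kuorikoski et al. (2010).

```latex
% M_i: the model obtained by combining the core C with the i-th auxiliary assumption A_i
M_i = C \wedge A_i, \qquad i = 1, \dots, n
% R is robust with respect to A_1, ..., A_n when every such model yields R
\forall i \in \{1, \dots, n\} : \quad (C \wedge A_i) \vdash R
```

Read this way, C is the only component held fixed across the derivations, which is why invariance of R is taken as an indication that the core, rather than any particular $A_i$, is driving the result.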
As an example of robustness analysis, consider Schelling's segregation model (Schelling 1978), which describes the dynamics that lead to racial segregation within social groups. Schelling's model starts from a simplified representation of its target system: a checkerboard, standing for a certain metropolitan area, and dimes and pennies, standing for the individuals of two different groups, for example Blacks and Whites. The main rule of the game is that the individuals on the checkerboard move from one place to another until the composition of their neighborhood meets their preferences.
As it turns out, regardless of their initial distribution in the metropolitan area, Black and White citizens will end up being clustered in different parts of the city, as a consequence of their preference for having at least half their neighbors of their own color. With respect to robustness, the fact that segregation is shown to follow from different initial positions provides a robust result, which does not depend on one specific assumption, i.e. a particular representation of the distribution of the individuals in space.
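The variation of initial positions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Schelling's original procedure: the grid size, the number of vacant cells, the eight-cell neighborhood, the rule that a discontented agent jumps to a randomly chosen vacant cell, and the function names are all choices made here. Only the 'at least half' preference and the dimes and pennies come from the text.

```python
import random


def like_share(grid, row, col):
    """Share of occupied neighboring cells (8-cell neighborhood) holding the same token."""
    size = len(grid)
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr or dc) and 0 <= r < size and 0 <= c < size and grid[r][c] is not None:
                total += 1
                same += grid[r][c] == grid[row][col]
    return same / total if total else 1.0  # assumption: an isolated agent is content


def mean_like_share(grid):
    """Average like-neighbor share over all agents: a crude segregation measure."""
    size = len(grid)
    shares = [like_share(grid, r, c)
              for r in range(size) for c in range(size) if grid[r][c] is not None]
    return sum(shares) / len(shares)


def run_schelling(seed, size=20, empty=40, threshold=0.5, sweeps=50):
    """Run one simulation; the seed fixes the initial positions (the assumption varied)."""
    rng = random.Random(seed)
    agents = size * size - empty
    tokens = ["dime"] * (agents // 2) + ["penny"] * (agents - agents // 2) + [None] * empty
    rng.shuffle(tokens)
    grid = [tokens[r * size:(r + 1) * size] for r in range(size)]
    vacant = [(r, c) for r in range(size) for c in range(size) if grid[r][c] is None]
    before = mean_like_share(grid)
    for _ in range(sweeps):
        moved = False
        for r in range(size):
            for c in range(size):
                if grid[r][c] is None or like_share(grid, r, c) >= threshold:
                    continue  # vacant cell, or agent with at least half like neighbors
                k = rng.randrange(len(vacant))
                new_r, new_c = vacant[k]
                grid[new_r][new_c], grid[r][c] = grid[r][c], None
                vacant[k] = (r, c)  # the cell just left becomes vacant
                moved = True
        if not moved:
            break  # everyone is content
    return before, mean_like_share(grid)


if __name__ == "__main__":
    for seed in range(3):
        before, after = run_schelling(seed)
        print(f"seed {seed}: mean like-neighbor share {before:.2f} -> {after:.2f}")
```

Each seed stands for a different initial distribution of the two groups. Comparing the measure before and after the run across seeds is the kind of check the text calls robustness with respect to initial positions.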
The robustness of Schelling's result has been tested under a number of different assumptions, other than the initial position. For example, Bruch and Mare (2006) have shown that segregation occurs under different structures of neighborhood, and alternative choice functions. Muldoon et al. (2012) have shown that segregation occurs even when the individuals prefer to be in the minority group of their neighborhood. In the literature, alternative mathematical approaches to Schelling's model have even been explored that are based on analytical methods instead of simulations (Zhang 2004). The comparison of the results achieved via simulated models versus analytical ones is a further example of robustness analysis, this time applied across different mathematical treatments of the problem under analysis.
The different senses of robustness analysis illustrated above can be traced back to a classification drawn by Weisberg and Reisman (2008), according to which: 1) parameter robustness refers to variations in the initial conditions or in the value of the parameters of the model; 2) structural robustness refers to changes in the variables included in the model; 3) representational robustness refers to modifications in the mathematical structure in which the model has been implemented, as in the example of the analytical versus the simulated version of Schelling's model.
In the context of economic modeling, a further distinction has been introduced by Kuorikoski et al. (2010), who refer to robustness analysis as a strategy to assess the role of different tractability assumptions, i.e. different mathematical formulations of the same factor in a model. The label tractability assumptions was originally introduced in the literature by Hindriks (2005, 2006, 2012) to indicate assumptions imposed if the problem at issue cannot be solved or is significantly more difficult to solve without them. From this brief introduction, it should already be evident that a variety of strategies have been described as robustness analysis. The attempt to connect one account with another sheds light on important differences between them that the use of a similar vocabulary has so far obscured.
In the philosophy of science literature, robustness analysis as a confirmatory strategy is at the centre of a contentious debate; the core of the dispute is about whether this practice is appropriate to guide the comparison between a theory or a model and the empirical world. According to the critics (Cartwright 1991; Odenbaugh and Alexandrova 2011; Orzack and Sober 1993; Sugden 2001; Stegenga 2009), robustness analysis is not a method of boosting confidence in a hypothesis. Above all, they maintain that robustness analysis is a non-experimental method of inquiry, at odds with the principles of scientific method. According to these principles, our hypotheses should be tested against the empirical evidence rather than against a priori reasoning.
According to its advocates, on the other hand, robustness analysis can be an effective guide in scientific practice (Kuorikoski et al. 2010; Weisberg and Reisman 2008). If a process or event is shown to be invariant across a range of assumptions, then scientists can omit the details of the problem without this undermining the final result. This turns out to be a crucial feature in all those areas of science where scientists cannot know or specify the exact configuration of the system under scrutiny.
A way to reconcile the different positions is to consider the merits that each one has. In line with its critics, it can be said that robustness analysis has lower confirmatory power than empirical testing. In line with its advocates, however, robustness analysis remains the preliminary strategy to assess the results of abstract mathematical models.
In this paper, I will spell out different epistemic arguments underlying different cases of robustness analysis. In so doing, I will investigate the extent to which robustness analysis is a viable strategy to be used by scientists. In the example of Schelling's model, it is straightforward to vary, for instance, the initial position of the agents on the checkerboard and to observe the result across conditions. Schelling's model is often characterized in the literature as an example of a 'toy model', to indicate a very abstract representation that idealizes away the characterizing features of the real-world phenomenon. In Schelling's case, as well as in other examples, the single parts of a model can be replaced with different ones as if they were Lego building blocks. But is robustness analysis an option in scientific practice, when modelers have to deal with more complex model structures? I will show that in the case of complex models, whose components are related to each other partly to satisfy analytical requirements, it becomes more difficult to break them down into single units that can be exchanged for different ones. Even though this objection does not undermine the validity of robustness analysis under ideal conditions, it suggests that an evaluation of robustness analysis is still needed for those cases where the ideal conditions do not obtain.

Robustness analysis and de-idealization
The previous section lists a number of ways in which robustness analysis has been described in the literature. What is common to a variety of cases is that robustness analysis is a strategy to increase confidence in the results of theoretical models; also, that confidence increases through changes in the assumptions of a model and observation of the consequent effects. The main differences between cases are the elements that are manipulated and the logic behind their manipulation.
Let us start from the question of why an assumption should be replaced with a different one. In the philosophy of science, the literature on idealizations, abstractions, approximations, simplifications etc. is now extensive, but it is uncontroversial across different accounts that unrealistic assumptions are not problematic per se. As a simple example, consider negligible assumptions, i.e. assumptions that represent factors that are irrelevant for the phenomenon under study (Mäki 2009, 2011). A model should not be criticised for being unrealistic with respect to negligible assumptions. A model is by definition a partial representation of the target system; what makes a model adequate, at least according to certain accounts, is that it isolates the causal mechanism that is relevant for the phenomenon under study (Mäki 2009, 2011). In its most basic sense, robustness analysis is conducted precisely to test whether the result of a model depends on the putative causes rather than on possible confounders. To do so, the assumptions are replaced by different ones, so as to test that changes in the final result depend on variations in the causal factors and not on confounding factors. Note that this is the sense of robustness analysis that corresponds to what Weisberg and Reisman call parameter robustness analysis.
A further reason why an assumption might require modifications is that it leaves out aspects of the target system that might be relevant for the phenomenon under study. In such a case, one way to proceed is to relax the problematic assumption so as to consider the result under a more accurate representation of the target system. To illustrate the case, I will refer to Weisberg and Reisman's discussion of robustness analysis. In their paper 'The Robust Volterra Principle', they present a body of results from the Lotka-Volterra model, i.e. a population abundance model for a predator species and a prey species, consisting of a pair of coupled first-order differential equations:
\[ \frac{dV}{dt} = rV - (aV)P, \qquad \frac{dP}{dt} = B(aV)P - mP \]
where V is the population of the prey, t is time, P is the population of the predator, r is the intrinsic rate of increase in the prey population, a is a measure of the capture efficiency, B is a measure of the conversion of captured prey into more predators, and m is the death rate of the predator population. One of the most important properties of the Lotka-Volterra model is known as the Volterra principle, which shows that the introduction of an external cause of death in the system, such as a pesticide that equally affects the prey and the predator, determines a relative increase in the abundance of the prey population.
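The Volterra principle can be checked with a short calculation on the non-trivial equilibrium of the model, V* = m/(aB) and P* = r/a, which for the classical Lotka-Volterra equations also equals the time-averaged abundances over a cycle. The sketch below rests on assumptions made here for illustration: the pesticide is represented by a single rate d that lowers the prey growth rate r and raises the predator death rate m by the same amount, and the function names and parameter values are hypothetical.

```python
def equilibrium(r, a, B, m):
    """Non-trivial equilibrium of dV/dt = rV - (aV)P, dP/dt = B(aV)P - mP.

    Setting dP/dt = 0 gives V* = m / (a * B); setting dV/dt = 0 gives P* = r / a.
    """
    prey = m / (a * B)
    predator = r / a
    return prey, predator


def equilibrium_with_biocide(r, a, B, m, d):
    """Equilibrium after adding an external death rate d to both species.

    Assumption made for this sketch: the biocide lowers the prey's intrinsic
    growth rate to r - d and raises the predator's death rate to m + d.
    """
    if not 0 <= d < r:
        raise ValueError("d must satisfy 0 <= d < r so that the prey can still grow")
    return equilibrium(r - d, a, B, m + d)


if __name__ == "__main__":
    params = dict(r=1.0, a=0.1, B=0.5, m=0.4)  # illustrative values only
    v0, p0 = equilibrium(**params)
    v1, p1 = equilibrium_with_biocide(**params, d=0.2)
    print(f"without biocide: prey {v0:.2f}, predator {p0:.2f}, prey/predator {v0 / p0:.2f}")
    print(f"with biocide:    prey {v1:.2f}, predator {p1:.2f}, prey/predator {v1 / p1:.2f}")
```

Under these assumptions the prey equilibrium rises with d while the predator equilibrium falls, so the prey-to-predator ratio increases. This is the relative increase in prey abundance that the principle states.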
In the Lotka-Volterra model, unrealistic assumptions are, e.g., that populations are treated as continuous, even though populations are discrete, and that they grow indefinitely and continuously in time (see Colyvan 2013 for this and similar examples). These assumptions are said to be of the tractability kind, to indicate that they are mainly adopted for reasons of mathematical tractability: without these assumptions, it would not be possible to derive the solution of the problem (see note 8). Suppose it were not possible to quantify the error that these assumptions entail. In this case, one way to increase confidence in the validity of the model is to test the results under more realistic assumptions than the initial ones. Consider how the Lotka-Volterra model treats time, which is one such tractability assumption in population dynamics. According to Colyvan and Ginzburg (2003, 72-3): 'Our view, in fact, calls for discrete equations, where time is treated discretely. These equations, however, are notoriously hard to deal with. We therefore continue to use differential equations, but we bear in mind that these are idealizations of the underlying finite, discrete-time model'.
Here, not only is it expected that both mathematical treatments, i.e. discrete and differential equations, would provide results that are largely in agreement, if differential equations are a good enough representation of the phenomenon under analysis; more importantly, confidence in the validity of the results increases once a more accurate mathematical treatment is provided for the system under analysis. When this is the case, the more accurate formulation becomes the benchmark by which to assess the validity of the results; it is not the invariance of the result that increases confirmation of the result itself. This is because, if the results were not consistent, this would reveal certain problematic aspects of the original assumption.
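The expectation that a discrete-time and a continuous-time treatment 'largely agree' when the continuous idealization is good enough can be illustrated with a deliberately simpler stand-in than the predator-prey system: a single population growing at rate r. The sketch below is an assumption-laden toy, not a model taken from Colyvan and Ginzburg: it compares the differential equation dV/dt = rV with the difference equation V(t+1) = V(t) + rV(t), and all function names are hypothetical.

```python
import math


def continuous_growth(v0, r, t):
    """Solution of dV/dt = r * V at time t (continuous-time idealization)."""
    return v0 * math.exp(r * t)


def discrete_growth(v0, r, steps):
    """Population after `steps` iterations of V(t+1) = V(t) + r * V(t)."""
    v = v0
    for _ in range(steps):
        v += r * v
    return v


def relative_gap(v0, r, t):
    """Relative disagreement between the two treatments after t time units."""
    cont = continuous_growth(v0, r, t)
    disc = discrete_growth(v0, r, t)
    return abs(cont - disc) / cont


if __name__ == "__main__":
    for rate in (0.01, 0.1, 0.5):
        print(f"r = {rate}: relative gap after 10 steps = {relative_gap(100.0, rate, 10):.4f}")
```

For small per-step growth rates the two treatments nearly coincide, while for large rates they diverge markedly. In the second situation the discrete formulation, if it is taken to be the more accurate one, serves as the benchmark against which the continuous result is judged.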
But the replacement of a particular assumption with a more realistic one is based on different epistemological criteria than the standard argument for robustness analysis. When we replace an unrealistic assumption with a more realistic assumption, what we are doing is known in the literature as the de-idealization of a model. This method has been widely discussed in the philosophy of science and in epistemology, and it raises questions that are tangential to robustness analysis (see e.g. Batterman 2008; Cartwright 2006; Mäki 2011; McMullin 1985; Odenbaugh and Alexandrova 2011). Examples of such questions are how de-idealization should proceed, or what the appropriate level of idealization is given the nature and aims of the inquiry.
Robustness analysis is a different strategy from de-idealization. In robustness analysis, it is because different assumptions, none of which is more realistic than another, all determine the same result, that we claim that the unrealistic aspects of the assumptions do not compromise the validity of the model. This is the sense in which 'our truth is at the intersection of independent lies', as the famous biologist Richard Levins affirmed when introducing the notion of robustness analysis in the literature (Levins 1966, 423). When we replace a tractability assumption with a more realistic one, we are not working within a network of models or assumptions, each of which controls for different aspects of the problem under consideration. Here, our truth is neither at the intersection of different lies, nor are these lies independent. Levins' idea refers instead to a situation in which we have a collection of models, which either stand or fall together, since each of them tackles specific aspects of the problem under analysis.
Note 8: As with the concept of robustness analysis, so also 'tractability assumption' is a heavily loaded term. According to Colyvan (2013), the effect of the assumptions adopted for reasons of mathematical tractability is often negligible; for instance, nobody would criticize the result of the Lotka-Volterra model because it relies on continuous populations. Hindriks (2006), on the other hand, regards tractability assumptions as those whose effects are presumably non-negligible for the final result. In turn, Mäki (2011) has questioned the non-negligible features attributed to tractability assumptions. Overall, I will use the term in its general sense to indicate assumptions that omit aspects of the target system in order to facilitate mathematical tractability.
Despite its different underlying justifications, robustness analysis has often been referred to as a strategy to increase confidence in the validity of a model by showing that the result is invariant under more realistic representations of the system under analysis. Weisberg and Reisman's notion of structural robustness reflects this intent, as exemplified by the analysis of Schelling's result against refined utility functions, or of the Volterra principle under the density dependence parameter (Weisberg and Reisman 2008).
A further example comes from evolutionary game theory. Here, a standard objection to the validity of certain results about the emergence of cooperative behaviors is based on their alleged lack of robustness with respect to the individuals' cognitive constraints (Skyrms 1996; Sugden 1986; D'Arms et al. 1998). A limitation on the kind of possible strategies that can be transmitted across generations is the cognitive load they impose on individuals; thus, a result will not be considered significant if it is not robust under a model that takes these limitations into account.
In the above cases, confidence in the validity of the result increases as more realistic models are adopted. Robustness analysis, by contrast, is not a process of 'concretization' of the model. In robustness analysis, unrealistic assumptions are replaced by other unrealistic assumptions in order to test the extent to which the final result of the model depends on them. On the one hand, this implies that the possibility cannot be ruled out that the further unrealistic assumptions might be affecting the final result. On the other hand, however, in abstract mathematical models, both from physics and economics, it is often the case that the level of theoretical abstraction is such that it is difficult to assess their validity in terms of the accuracy with which they represent the target system. Credible and unrealistic aspects are intertwined with one another to an extent that makes it inappropriate to talk about de-idealization when replacing any of them with different ones. In these circumstances, robustness analysis does not deal with the realism or truth of the assumptions. The underlying idea is rather that if a result is invariant across conditions, then the result does not strictly depend on the particular way in which the assumptions represent the target system, and thus on their falsity.
Regardless of the position that one takes on the argument for robustness analysis, conflating it with de-idealization creates terminological as well as conceptual confusion. Terminologically, if robustness analysis is taken as a synonym for de-idealization, the original definition by Levins no longer applies. Conceptually, the overlap of the two notions obscures the fact that the alleged confirmatory power of the two methods relies on different grounds. In the next section, by looking at an example of how scientists conduct robustness analysis, we unravel some of the philosophical confusion, but at the same time some practical problems come to light.

Across-models robustness analysis
In the previous section, I introduced the problem of tractability assumptions in the Lotka-Volterra model. Ideally, if there are no other ways to justify their adoption, tractability assumptions should be replaced by more realistic ones. However, de-idealization is often not an easy matter. In the Lotka-Volterra model, mathematical assumptions are adopted precisely because it is not clear how to proceed otherwise. In these circumstances, it is also difficult to replace them with other tractability assumptions. Continuous populations, infinite populations and continuous time would have to be exchanged for assumptions such as discrete or finite populations, or discrete time. A different way to proceed in these cases is by comparing models that differ from one another along multiple lines. This type of robustness analysis corresponds to what Weisberg and Reisman (2008) define as representational robustness analysis, i.e. a test of the invariance of predictions across different mathematical approaches.
With respect to representational robustness, the predator-prey interaction has been analyzed both via differential equations and computational simulations and it has been shown that the Volterra principle holds in both cases. By deploying different modeling frameworks, i.e. a population-level model (differential equations) and an individual-based model (simulations), Weisberg and Reisman (2008) compare two different mathematical approaches for the analysis of predator-prey interaction. Let us consider in more detail the purpose of this comparison.
In biology-related disciplines, individual-based models are becoming increasingly common despite the lack of analytical results. This is mainly because the degree of specificity they enable scientists to achieve is higher than that achievable via previous standard analytic treatments, such as the differential equations of the Lotka-Volterra model. The question for robustness analysis is what to expect from a comparison of the results from the population-level and the individual-based models. Are the assumptions of one mathematical framework being tested by using another framework that does not take the same assumptions into account? What exactly is being compared across cases?
Consider again a mathematical assumption in the Lotka-Volterra model, such as that populations are continuous, not discrete. Is the individual-based model testing the effect of this assumption on the result? Strictly speaking, the effect of the continuous populations assumption in the Lotka-Volterra model can be tested when adopting the assumption of discrete populations, which is possible once the original tractability problem has been solved.
On the one hand, the fact that an individual-based model which is based on discrete populations gives the same result as the Lotka-Volterra model is an indication that the Volterra principle can also be derived under the assumption of discrete populations. On the other hand, however, when translating the Lotka-Volterra model into an individual-based model, many aspects of the initial model change. These changes come within an entirely new modeling 'package', whose assumptions will have to be tested in turn. Note that the more aspects have been changed, the further we are from analyzing the effect of one specific assumption.
The above claim can be illustrated with an example from Schelling's model. Suppose that a modeler were interested in testing whether two different network structures have different impacts on segregation. The two network structures differ from one another in the number of neighbors that the individuals take into account when making their decision to move on the checkerboard. Suppose that, apart from the network structures, the two models were alike. By simulation, the result proves to be invariant across conditions, which indicates that the differences between the two assumptions do not have relevant effects on the final result.
When the comparison is between an individual-based model and a population-based model we are in a different situation. Here we are not testing the effect of one single change in the assumptions. In this case, we are comparing two models that differ not only in the assumption that we wanted to test, but in that one plus many others. This is because, at least at the moment, we are only able to include discrete populations within an entirely different modeling structure. This means that, whenever we are testing the invariance of results across conditions, we always test the original tractability assumption plus a number of other assumptions that are implied by the new one. Hence, whether or not the results are in agreement, we cannot conclude that this provides an indication about the role of the original tractability assumption. The result has to be taken as determined by the model as a whole, not as a case of robustness analysis where a single or a few assumptions have been replaced with different ones to assess their impact on the result.
An objection to the above claim is that it does not really matter that several elements change from one model to another, if the result is invariant across conditions. In other words, if two models differ in many respects, and still the result is invariant, this provides an even stronger indication of the validity of the result, regardless of whether the target system is more accurately represented by one model or the other.
Notice, however, that the argument just outlined rests on considerations other than the standard argument for robustness analysis. If we take the comparison of entirely different models as an instance of robustness analysis, then the confirmatory power of this strategy no longer derives from the source that has hitherto been claimed for it (Kuorikoski et al. 2010; Levins 1966; Odenbaugh 2011). Robustness analysis has been described as a practice of building models of the same phenomenon, which differ slightly from one another, so as to identify which assumptions are necessary for deriving the final result. This is done on the basis that the results that are robust across conditions depend on the shared, rather than on the different assumptions. According to Lehtinen (2016, 2): 'If a result is robust, only the assumptions that overlap between the models could be needed for its derivation, and the other assumptions are thus dispensable.' This is not the case for the simulated version of the Lotka-Volterra model. Here, Weisberg and Reisman (2008) had to introduce new factors, such as a density dependence parameter, in order to obtain results that could be compared with those of the population-based model at all. In fact, a situation in which very different models provide the same result is quite a fortunate case, probably an exception in science. More often the results differ, and at that point the problem becomes that of assessing which result is more accurate on the basis of the different merits of each model.
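The contrast between the two kinds of comparison can be made concrete by treating each model as a set of assumptions. The sketch below is purely illustrative: the assumption labels are hypothetical placeholders invented here, not an inventory of the actual models discussed by Weisberg and Reisman, and the two helper functions are names introduced for this example.

```python
def shared_assumptions(*models):
    """Assumptions common to all models: on the quoted view, the only ones
    that could be needed to derive a result that is robust across them."""
    return frozenset.intersection(*models)


def differing_assumptions(model_a, model_b):
    """Assumptions that appear in exactly one of the two models."""
    return model_a ^ model_b


# Hypothetical labels: two versions of one model that differ in a single assumption.
baseline = frozenset({"core mechanism", "8-cell neighborhood", "random initial positions"})
variant = frozenset({"core mechanism", "4-cell neighborhood", "random initial positions"})

# Hypothetical labels: two modeling frameworks for the same phenomenon.
population_level = frozenset({"core mechanism", "continuous populations",
                              "continuous time", "deterministic dynamics"})
individual_based = frozenset({"core mechanism", "discrete populations", "discrete time",
                              "stochastic dynamics", "density dependence"})

if __name__ == "__main__":
    print("within one framework:", sorted(differing_assumptions(baseline, variant)))
    print("across frameworks:   ", sorted(differing_assumptions(population_level,
                                                                individual_based)))
    print("shared across frameworks:", sorted(shared_assumptions(population_level,
                                                                 individual_based)))
```

When only one assumption has been swapped, agreement or disagreement in the result can be attributed to that swap. When the set of differing assumptions is large, the same attribution is no longer available, which is the point made in the text about comparisons across frameworks.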
The example above illustrates in simple terms a problem under discussion among scientists working with complex simulation models. In assessments of climate science models, we find that experts are cautious about the possibility of comparing the results of models that differ from one another in a number of different elements. According to Parker (2006, 350): 'Complex climate models generally are physically incompatible with one another-they represent the physical processes acting in the climate system in mutually incompatible ways'. According to Lenhard and Winsberg (2010, 258): 'The complexity of interaction between the modules of the simulation is so severe that it becomes impossible to independently assess the merits or shortcomings of each submodel'.
In conclusion, the question of how to compare substantially different models differs from robustness analysis defined as a method of testing the effect of controversial assumptions by replacing them, one at a time, with different ones. The comparison of different modeling frameworks needs further investigation, and the subject opens new challenges that are already attracting the efforts and attention of scientists; these challenges, however, differ from those raised by the comparison of models that differ only with respect to the assumption under analysis. When frameworks are compared, the differences between models are several and such that it is not clear how to map the different components onto one another. When this is so, it is an open question whether, and on what grounds, the robust results support each other. In the next section, I will return to the initial problem of how to replace a particular assumption in isolation, this time with a case study in geographical economics.

Robustness analysis and tractability assumptions
In the previous sections, I have first shown that robustness analysis differs from de-idealization; and then, that when the replacement of a particular assumption in isolation is not a viable option, an underdetermination problem occurs concerning how to compare models that differ from one another along multiple lines. In this section, I will present a case study from geographical economics which again provides insights into the actual process of model manipulation and into the possibility of changing single assumptions in isolation.
In the literature on robustness analysis, a paradigmatic case study comes from the literature in geographical economics (Kuorikoski et al. 2010;McCann 2005;Neary 2001). Geographical economics is at the centre of a debate between economists and philosophers of science precisely because of the tractability assumptions on which it is based (see below). Broadly speaking, geographical economics is a sub-field of economics that studies the relation between economic activity and spatial location. The model at the centre of the debate is known as the Core-Periphery model; it was formulated by Paul Krugman in 1991 and earned him the Nobel prize for economics in 2008.
The Core-Periphery model investigates the conditions under which an economic activity agglomerates in a certain region (the core), as against the conditions under which it disperses (the periphery). Various factors influence this process. The forces affecting geographical concentration depend on the advantages of being in a region with good access to the market as against the advantages of being in a region where competition is lower and there is no risk of market congestion. A key factor is the cost of transporting goods from the place of production to that of delivery. The higher the transportation costs, the nearer the economic activities to the place of demand and, contrariwise, the lower the transportation costs the farther the economic activity from the centre.
In the history of geographical economics, Krugman's contribution was crucial in determining a paradigm shift from previous theories of international trade, which were based on tariff costs. The advance was made possible by the introduction of an 'iceberg' cost function, so called because it is based on the principle that part of the goods 'melts away' when transferred from the place of production to the place of delivery.
Even though the 'iceberg' formulation is obviously a theoretical construct, i.e. it is not based on direct observation, it is considered appropriate mainly for two reasons: first, it reflects the idea that goods are costly to transport; secondly, it enables the formulation of transport factors not as a separate component of the model but as part of the goods themselves. This is the sense in which the 'iceberg' cost function is a tractability assumption. Since it would be problematic to introduce additional factors into the model to account explicitly for the diminishing value of the goods, the mathematical trick is to proceed as if a smaller quantity of goods arrived at their destination. In the words of Krugman: 'In terms of modelling convenience, there turns out to be a spectacular synergy between [...] market structure and 'iceberg' transport costs: not only can one avoid the need to model an additional industry, but because the transport cost between any two locations is always a constant fraction of the free-on-board price, the constant elasticity of demand is preserved' (Krugman 1998, 11).
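Krugman's remark about the preservation of constant elasticity can be made explicit with a minimal sketch. The notation is introduced here for illustration only and is not taken from the sources cited: $p$ is the free-on-board price, $T > 1$ the iceberg factor (the number of units shipped for one unit to arrive), $t > 0$ a hypothetical additive per-unit transport charge, and $q = A\,p_d^{-\sigma}$ an assumed constant-elasticity demand in the delivered price $p_d$.

```latex
% Illustrative notation (an assumption of this sketch, not the cited authors' own):
% p = free-on-board price, p_d = delivered price, sigma = demand elasticity.
\begin{align*}
  \text{Iceberg (multiplicative) cost:}\quad
    p_d &= T\,p, &
    q &= A\,(T p)^{-\sigma}, &
    -\frac{\partial \ln q}{\partial \ln p} &= \sigma, \\
  \text{Additive cost:}\quad
    p_d &= p + t, &
    q &= A\,(p + t)^{-\sigma}, &
    -\frac{\partial \ln q}{\partial \ln p} &= \sigma\,\frac{p}{p + t} < \sigma .
\end{align*}
```

Under these assumptions, the multiplicative 'iceberg' cost leaves the price elasticity equal to $\sigma$ at every price and for every pair of locations, whereas an additive charge makes the elasticity depend on $p$ and $t$, so that it is no longer constant.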
In the geographical economics literature, some of the features of the Core-Periphery model are a matter of debate. According to Fingleton and McCann (2007, 168) for instance: "Geography enters these [economic geography] models specifically and only via the Krugman adaptation of the Samuelson (1952) model, the properties of which are implausible and counter to most observed evidence." In response to this and other difficulties, in subsequent formulations of the Core-Periphery model, geographical economists have tried to measure how sensitive the predictions are to the 'iceberg' assumption. To do so, attempts have been made to test the results under different functions that do not show the same problematic aspects. The main problem is that functions that differ from the 'iceberg' function, and so lack the contested properties, are difficult to implement in the Core-Periphery model, which is what a test of robustness would require.
One of the most controversial aspects of the Core-Periphery model is the convexity of the price function. This convexity is derived from the way in which price, value and quantity of goods are defined in geographical economics and are combined in the Core-Periphery model. It is not guaranteed that, if a feature such as 'price increases convexly with distance' needs to be tested, a model can be built in which the feature 'price is concave with distance' is introduced while everything else remains as before. The convexity of transportation costs follows from the mathematics of the model as a whole.
In fact, one of the reasons why the 'iceberg' cost function was initially introduced was to make a certain problem mathematically tractable. The assumption accommodates the analytical requirements of the model, such as increasing returns to scale, imperfect competition and constant elasticity of substitution. Because of these very features, it is particularly difficult to replace it with a different assumption and leave the rest of the model untouched.
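The link between the 'iceberg' form and convexity can be illustrated with a short derivation. It assumes the exponential formulation commonly used in this class of models, in which a constant fraction of the good melts per unit of distance; the symbols $p_0$ (price at the place of production), $\tau > 0$ (melting rate) and $d$ (distance) are introduced here for illustration and are not drawn from the sources cited.

```latex
% Assumed exponential 'iceberg' form: s(d) is the share of a shipped unit
% that survives transport over distance d.
\begin{align*}
  s(d)   &= e^{-\tau d}, \\
  p(d)   &= \frac{p_0}{s(d)} = p_0\, e^{\tau d}
           && \text{(delivered price per unit that arrives)}, \\
  p'(d)  &= \tau\, p_0\, e^{\tau d} > 0, \\
  p''(d) &= \tau^{2}\, p_0\, e^{\tau d} > 0
           && \text{(price is convex in distance)}.
\end{align*}
```

On this sketch, convexity is not a separate assumption that could be switched off: any $\tau > 0$ yields $p''(d) > 0$, so a concave price schedule would require abandoning the proportional-melting form itself.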
A further consideration is that a model that differed from the initial one in having price concave in distance would not maintain the crucial properties of the Core-Periphery model, i.e. concentration of economic activity in the Core versus dispersion in the Periphery. This is because high transportation costs act as a counterbalance to agglomeration, which is crucial for the interplay of centrifugal and centripetal forces in equilibrium formation (McCann 2005).
A different strategy would be to prove that other theoretical frameworks, not based on 'iceberg' transportation costs, produce results similar to those of the Core-Periphery model, thereby providing independent evidence. However, according to McCann: 'It is almost impossible to provide direct comparisons between models with the 'iceberg' assumption and those with other sets of transport costs assumptions embedded in them. [...] This is because these more traditional transport costs functions are analytically incompatible with new-economic geography models' (McCann 2005, 312). Theories in international trade that differ from the geographical economics model have not been as successful as the Core-Periphery model in terms of equilibrium analysis, so that at the moment there is no theoretical alternative available with which the results of the Core-Periphery model can be compared. This brings us back to the problem discussed in the previous section: when the results of different models embedded in different mathematical frameworks are tested against one another, a way has to be found to map their constituents onto one another for the comparison of the results to be meaningful.
In line with this analysis, the scientific community reacted to the shortcomings of the 'iceberg' cost function by trying to build models that were not based on the same problematic assumption. For instance, according to Isard: 'The first advance [in space economy] would involve dropping the iceberg assumption regarding transport cost.' (Isard 1999, 383). Similarly, the World Bank Development Report of 2009, which was dedicated to geographical economics, states: 'By using techniques that essentially assumed away the internal workings of transport [...] the more critical policy-related aspects also have been assumed away.' (World Bank Report 2009, 185). These remarks show that economists responded to the shortcomings of the model by seeking ways to avoid tractability assumptions in the first place.
The case study discussed above shows a particular way in which robustness analysis can go wrong. A single counterexample does not undermine robustness analysis across the board. The strategy can still be successful in evaluating models that differ from one another in some specific aspects. The question is to what extent the very concept of tractability assumptions poses a limit to the possibility of replacing them in isolation. This depends on how we interpret the concept of tractability assumptions. If the set of tractability assumptions is stretched so far as to include all kinds of assumptions, insofar as they represent factors in a way that has to be tractable in some sense, then the problems highlighted above are restricted to some extreme cases. However, if there is something specific about tractability assumptions, in that they have a particular mathematical role in a model, as in the geographical economics case or in the Lotka-Volterra example, then replacing them with specific different ones might be problematic for the very reasons why they were used at the outset. Consider for instance how Colyvan defines this kind of assumption: "These idealisations are usually invoked in order to employ familiar and well-understood mathematical machinery." (2013, 1339). Implied in this statement is that the reason why we do not use different mathematical machinery is that it is not as well understood as the machinery that we do use. Or consider how Morrison discusses mathematical abstractions: "In situations like this where we have mathematical abstractions that are necessary for arriving at a certain result there is no question of relaxing or correcting the assumptions in the way we de-idealize cases like frictionless planes and so on; the abstractions are what make the model work." (2009, 110). Here, the replacement of a mathematical assumption is not even considered as a possibility.
Overall, several different accounts have been put forward in the literature concerning the status of tractability assumptions and their role in a model. There is a continuum of cases that ranges from assumptions considered to be innocuous, to assumptions that cannot be relaxed, to assumptions that are adopted in spite of their unrealistic features. The claim defended here does not hinge on the peculiarities of any specific position. Across cases, the possibility has to be considered that, if we conceive of models as systems of inter-connected parts, then it is reasonable to expect that changes in certain aspects of the model will in turn bring about further changes in other aspects of the model, in a chain of related effects. This is especially the case for assumptions introduced partly for the purpose of satisfying certain analytical requirements dictated by the formal structure of the model. The adoption of assumptions of this kind turns out to play a crucial part in a variety of cases of model building. Thus, it is particularly urgent to think of novel criteria for the assessment of models that rely on their use.

Conclusion
The aim of this paper was to investigate the epistemic goal of robustness analysis in theoretical models and to spell out the details of the procedure in scientific practice. In philosophy of science, robustness analysis is defended as a method of testing the invariance of a model's results under different assumptions. As argued in this paper, several different reasons underlie the replacement of an assumption with a different one. One reason is that an assumption represents a possible confounding factor; by changing it, a modeler tests whether the result depends on the mechanism identified as responsible for the phenomenon and not on possible confounders. Another reason is that an assumption omits aspects of the target system that might be relevant for the phenomenon under study. In this case, a possible strategy is to replace that assumption with a different one, so as to assess the results across conditions. In this paper, two examples have been presented, one in population ecology and one in geographical economics, in which replacing an assumption with a different one requires changing the model in more than the aspect under test. Comparing substantially different models, however, rests on different considerations from comparing models that differ only with respect to a few assumptions. Just as different experimental practices might lead to different results, thereby raising the question of how to interpret these results (Stegenga 2009), the same is true of predictions deriving from models with different initial assumptions. A view on model validation that is becoming prominent in the philosophy of science literature maintains that families of models, rather than single models, should be used as a basis for assessing the final results (Knuuttila 2011; Muldoon 2007; Wimsatt 2007).
However, the standard argument for robustness analysis does not necessarily apply to these situations, which makes this an area of research where philosophical work is particularly needed. The problem of how to compare results deriving from structurally different models is one of the most interesting questions that the debate on robustness analysis has raised for today's scientific practice, and promising work can be expected from this research area in the near future.