Classical or Gravity? Which Trade Model Best Matches the UK Facts?

We examine the empirical evidence bearing on whether UK trade is governed by a Classical model or by a Gravity model, using annual data from 1965 to 2015 and the method of Indirect Inference which has very large power in this application. The Gravity model here differs from the Classical model in assuming imperfect competition and a positive effect of total trade on productivity. We found that the Classical model passed the test comfortably, and that the Gravity model also passed it but at a rather lower level of probability, though as the test power was raised it was rejected. The two models’ policy implications are similar.


Introduction
In the last few years debate has raged over whether EU trade arrangements are beneficial, in particular to the UK. The EU is a customs union and so erects trade barriers around its Single Market where economic activity is regulated according to EU rules.
The welfare effects of a customs union have always been controversial. According to classical trade theory, global welfare is reduced compared with free trade, as is the average welfare of citizens inside the customs union; however, one country's citizens may gain from the union if it is a net exporter to others in the union, as then its terms of trade gain may offset the losses experienced by its consumers (Meade 1955). In recent times, however, a new line of reasoning has become popular amongst trade economists: this 'gravity model' (e.g. Costinot and Rodriguez-Clare 2014) regards international trade as an outgrowth of internal trade, the only difference being that it crosses borders. Otherwise it grows naturally due to the specialisation and division of labour within neighbouring markets. Viewed through the lens of the gravity model, a customs union merely makes official what is already a fact of neighbourly inter-trade. Other sorts of trade, with more distant markets, grow analogously but more weakly the greater the distance; the size of distant markets may make up for their distance to some extent, because they are a 'neighbourhood' that naturally leads to inter-trade. 'Gravity' in trade creation can be thought of as a function of distance and size. In this view of trade it makes no sense to put obstacles in the way of trade with close neighbours such as the EU in the hope of boosting trade with distant markets via new trade agreements that lower trade costs. The disruption from the former will reduce welfare whilst the gains from the latter will be small, simply because the reduced trade costs will have little effect in switching demand from existing products in the presence of weak and imperfect competition.
Clearly these two models, the classical and the gravity models, are different and so may well have different welfare implications. However, whilst trade economists have recently tended to favour the gravity model over the classical, there has been no convincing empirical test of the two models as overall predictors of the data. Gravity modellers do point to the Tinbergen (1962) gravity regressions as evidence in favour of the gravity model. However these regressions have long been familiar to trade economists, and classical trade models too can generate trade data in line with these regressions. Thus we face here an identification problem: two models can both generate the same data, at least that would be the claim of their proponents. We need an empirical test that can discriminate powerfully between the two models. To state this requirement is to ask for something that hitherto has not been attempted by trade economists: it has seemed simply too difficult to subject these large non-linear general equilibrium models, often with many hundreds of equations, to any such test. It is after all hard enough to solve them for particular policy constellations, let alone have them generate predictions that can be compared with the facts of trade. So it is quite understandable that trade economists have not felt any urgency in testing their models other than in informal and quite casual ways. This is a case however where cross-fertilisation can occur in economics; in other areas of the subject there have been substantial steps taken in developing methods that can allow large models to be tested against the data. These areas are econometrics, macroeconomics and computer studies. The power of the computer has grown steadily and massively over the past few decades and brought within reach highly computer-intensive methods of model estimation and testing. 
The use of Monte Carlo experiments has enabled economists to gauge the effectiveness of these methods in small samples, to which the trade economist, like the macroeconomist, is of course condemned.
One example of this progress is in Bayesian estimation in which priors are assigned representing theorists' knowledge. However, for an area of such controversy as trade models today, it would be hard to construct priors that would command any agreement. Instead we are at a scientific inflexion point where we need to have tests that convincingly decide what the world is like, so that this knowledge can later be embedded in priors for future research. Such tests are 'frequentist', that is they reject models that do not generate the known data with adequate frequency or probability.
Macroeconomic models share many features with trade models: they are large, complex, based on maximising behaviour of agents, may be non-linear, and the time-series data which they require are limited in quantity. It so happens that a great deal of work has been done in applying frequentist methods to macroeconomic models. Two main methods are candidates for the roles of estimation and testing: maximum likelihood and indirect inference. Le et al. (2016) review the two approaches and their small sample properties; they conclude that maximum likelihood has poor small sample properties, with both substantial estimation bias and low testing power. By contrast indirect inference has low estimation bias in small samples and high testing power. They review an increasing number of examples where indirect inference has been applied (such as Le et al. 2011) and show how policymakers could have benefited from considerable assurance about the robustness of their models.
Indirect Inference is a relatively unfamiliar procedure but is being increasingly used because of these properties. In essence it uses the same tools as Bayesian estimation, namely simulation of the model being tested by the method of bootstrapping in which the actual model errors are repeatedly resampled as the best guide to their underlying distribution: one can think of bootstrapping as a practical way of applying Monte Carlo methods of simulation when the underlying error distribution is not known. In Indirect Inference the facts of the data behaviour are estimated separately from the model being tested; this estimated model of the data is known as the 'auxiliary model' and it is designed to capture the key relationships in the data that the modellers need to match with their theory-based ('structural') model under test. The test procedure is highly intuitive. First we estimate the auxiliary model which records the relationships found in the data for the sample period we are dealing with. Then we simulate the model repeatedly to generate 'parallel histories' of this sample period; each of these parallel history samples then has the same auxiliary model estimated on it, the logic being that each 'sample' could have occurred and therefore could have given a different set of auxiliary relationships. Finally the many different estimated auxiliary relationships give us their 'joint distribution', that is, the probability of different combinations of them according to the structural model. From this joint distribution we can determine how likely it was that this model generated the actual relationships we found in the data. To put it quite informally, we create the world according to the model and then we ask how likely the actual world we see would be according to that model. If the likelihood is low (typically we choose a cut-off probability of 5%), then we reject the model.
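The test procedure just described can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: the 'structural model' is a toy AR(1) process rather than a trade model, the auxiliary model is an OLS autoregression, and all function names and the parameter `n_boot` are hypothetical choices made for the example.

```python
import numpy as np

def aux_coeffs(y):
    """Auxiliary model: OLS of y_t on a constant and y_{t-1}.
    Returns the slope and the residual variance as the data 'features'."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ beta
    return np.array([beta[1], resid.var()])

def simulate(rho, shocks, y0):
    """Toy structural model (an assumption): y_t = rho * y_{t-1} + shock_t."""
    y = np.empty(len(shocks) + 1)
    y[0] = y0
    for t, e in enumerate(shocks):
        y[t + 1] = rho * y[t] + e
    return y

def indirect_inference_test(y, rho, n_boot=500, seed=0):
    """Return the Wald statistic of the data and its bootstrap p-value."""
    rng = np.random.default_rng(seed)
    a_data = aux_coeffs(y)
    # Structural errors implied by the model under test and the actual data.
    errors = y[1:] - rho * y[:-1]
    # 'Parallel histories': resample the errors and re-simulate the model.
    sims = np.array([
        aux_coeffs(simulate(rho, rng.choice(errors, size=len(errors)), y[0]))
        for _ in range(n_boot)
    ])
    mean = sims.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sims, rowvar=False))

    def wald(a):
        d = a - mean
        return float(d @ cov_inv @ d)

    w_data = wald(a_data)
    w_sims = np.array([wald(a) for a in sims])
    # Share of simulated histories at least as far from the mean as the data.
    p_value = float((w_sims >= w_data).mean())
    return w_data, p_value
```

A model would be rejected at the 5% level when the returned p-value falls below 0.05, that is, when the data-based auxiliary coefficients lie in the tail of the distribution the model itself implies.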
In this paper we have applied this method to testing the gravity and classical models on available UK post-war annual data from 1965 to 2015. It is the first time to our knowledge that any trade model has been tested by modern computer-based methods and so we feel that it should be a useful contribution to the debate. To anticipate our main conclusions, we find that both the models pass our main test fairly easily, the gravity model having the lower probability (its probability drops further as the gravity elements are strengthened and, if the test's power is raised enough, it is rejected); however both models behave quite similarly and their key policy conclusions on tariffs do not differ.
The paper proceeds with the following sections. We begin by describing the two models and discussing how we might set them up as alternatives. We go on to describe the classical model we choose here in full detail. In the next section we do the same for the gravity model, explaining exactly where it departs from the classical model. After a short section showing the data, we proceed to a section describing the auxiliary model and then to the section where the models are tested by indirect inference, going through the mechanics of the whole process and revealing the results. We then move to our conclusions.

What are the Classical and Gravity Models of Trade?
At the current time many economists who specialise in trade favour, as already noted, the gravity model of trade; see Breinlich et al. (2016) and Costinot and Rodriguez-Clare (2014). Under this model trade is determined largely by the forces of demand, from neighbours wanting imports and from others modified by the factor of distance (due to transport costs and border costs); competition is rather limited, highly 'imperfect', and prices are set by producers as a mark-up on costs, so they move rather little. Once demand has determined trade and the production to meet it, foreign direct investment (FDI) and associated innovation follow it, boosting productivity. In short, whilst supply is important in this gravity approach, supply is largely determined by the forces of demand.
Because it is hard to break into new and distant markets it makes sense in this approach to support existing markets. Hence leaving the EU will damage existing markets' demand, so reducing trade and so reducing supply and productivity via falling FDI and innovation. Reducing trade barriers with the rest of the world will only weakly substitute for this loss of demand by stimulating more demand there.
Even though the EU protects its markets via trade barriers, this on the gravity view is good for the UK because it raises demand for our exports within the EU. Hence this school of thought is in favour of EU protectionism; it could be called 'neoprotectionist'. In general free trade according to the gravity approach is something that must be evaluated case by case on the basis of its effects on demand for UK products and so the supply side of the economy.
Proponents of this gravity approach claim that it is supported by the 'facts', consisting of many estimated relationships between exports and the GDP of the demanding countries, adjusted for distance. Indeed the gravity 'model' is essentially calibrated to replicate these relationships. However, as already explained, we need to allow for a possible identification problem: that the rival classical model also generates these relationships.
The rival model of trade is the classical one developed by the great trade theorists of the past two centuries, starting with Ricardo (1817), and pursued in much empirical work based on it. The fact that these ideas come from a long tradition of thinking does not of course mean that they are thereby wrong because 'old'. We have also witnessed an earlier major reversal of classical thought, the Keynesian Revolution, which has now been largely ditched in favour of a return to classical principles.
The classical model assumes high competition across world markets, with world prices being the same across the world subject to transport costs and trade barriers; there is free entry into all industries so that prices equal average costs. Capital flows freely across borders in the modern world version, but each country has largely fixed supplies of other factors, namely unskilled labour, skilled labour and land. In this model supply forces such as the supply factors and their productivity determine the size of a country's different sectors. The resulting income is then spent according to home demands and the surplus of supply over demand is then exported, the deficit imported in each sector. The model is silent on the allocation of demand to imports and home goods and on the allocation of exports to different foreign markets. However, it would be normal to add on some such allocative model on top of the basic structure, as we will do here. Thus it can be seen that the causal structure of the classical model is quite different from that of the gravity model. In the classical model supply determines the essential structure of trade; demand adjusts to be consistent with this. In the gravity model demand determines the structure of trade and in turn forces supply to adjust to this.

What Must Be in a Trade Model of Either Type?
The aim of this paper is to set out and test a model of UK trade that can answer questions about big trade regime changes, such as Brexit. Such a model needs to capture some salient features of the modern globalised world.
One such feature has been the inexorable rise of highly competitive supply chains where buyers for the final product distributors have ruthlessly eliminated cost from their supplies. A good example is the way in which Tesco has used these techniques to streamline its purchasing and create 'lean' inputs; see Evans and Mason (2015).
Related to this rise of the supply-chain is the massive fall in tariffs that has occurred around the world without any assistance from a multilateral 'round' (the Doha round having failed). The World Bank data bank shows that weighted average world tariffs fell from around 34% in 1996 to around 2% today, an astonishing drop. It appears that so eager are countries to have their own input products join supply chains that they eliminate all tariffs on their inputs to enhance their competitiveness for the chain. Countries further down the chain buying from them do the same and the whole tariff level comes tumbling down. One must assume that the same is happening for non-tariff barriers along these chains since exactly the same logic applies. Data on these are of course rather sparse.
Another feature should be the presence of brands. However note that brands will buy the cheapest inputs as part of their survival strategy. A brand that does not may well go out of business; examples are IBM laptops, Nokia and Blackberry.
Input markets are business-to-business and do not generally follow branding strategies, rather relying on demonstrable quality (reviewed by professionals) in the business market. Free entry cannot be prevented and since the world market is of massive size, economies of scale can be assumed to be exploited.
A feature that must also be included is the country cost base as determined by its factor endowments. Capital can in general be considered mobile and therefore not specific to any country. However, land and labour (with different education and skill levels) differ markedly across countries and have a natural role in determining product mix. It is plain for example that the UK's heavy endowment of educated and skilled labour is an important factor in its emergence as a major supplier of traded services, such as education, healthcare and 'City'/financial services. A further element in the cost base are the 'institutional endowments', such as good law and infrastructure, which reveal themselves in sectoral productivity.
We are interested in the capacity of the structural model to generate the trend behaviour found in the data. Plainly we do not want to judge our model by some short term behaviour since it is a model of the long term behaviour of trade and the economy. As a computable general equilibrium model it is solved by comparative static methods and it has no explicit dynamics; it is not a 'Dynamic' Stochastic General Equilibrium model like a macro model whose role is to pick up short and medium term economic fluctuations.
The questions trade models are designed to answer concern which sectors of output will grow or contract via trade channels and how trade patterns will develop with other countries/blocs; also the effects on factor markets, such as wages and labour supply. These elements should be in the auxiliary model. A further element could be the effects of commercial regime changes, such as joining the EU or making changes in tariffs and non-tariff barriers. The main difficulty in Indirect Inference testing is that any factors must be stochastic so that they can meaningfully be simulated. For example German reunification cannot intelligibly be considered to be a stochastic event, at least for a sample just of the post-war period; any sample of data with such an event has to have this event's effects stripped out of it, much as seasonality is stripped out. Similarly the act of the UK joining the EU is a one-off event, with no stochastic distribution. However in terms of a trade model its significance lies in the resulting changes in commercial policy, such as tariff changes brought in by the UK, including those resulting from EU accession, together with later EU-instigated tariff changes; these can together be treated as a process with stochastic properties.
To test a model's simulation performance against the data behaviour requires careful selection of the data features to be matched. Indirect inference tests tend towards unlimited power as the number of features is increased: as one tries to match all features of behaviour one ultimately requires to have the real world itself as the model. Hence to give the test a reasonable level of power, that on the one hand will reject tractable models of some moderate falsity but on the other will not reject all models that are even slightly false, a small number of relevant data behaviour features need to be selected; experience suggests close to a dozen.
The main data movement we want to explain is in output shares by sector and trade (export+import or total trade) shares by country bloc. We have two of each: i.e. manufactures and services output (the implied residual share being agriculture) and trade shares of the EU and North America (the final one being the rest of the world). These two sets of shares summarise the economy's output structure and direction of trade. Accompanying these trends are:
a) world relative prices and UK relative productivity of manufactures and services, treating raw materials as the numeraire;
b) UK relative factor supplies of land and skilled labour, treating unskilled labour supply as the numeraire;
c) relative tariffs and transport costs from each country bloc into the UK, and from the UK into each bloc.
Here the main changes will be in the relative fall in transport costs from more distant markets as containerisation has reduced shipping and air freight costs; and in trade barriers with the UK's joining of the EU in 1973 and subsequent changes in EU commercial policy.
Not all of these elements are 'exogenous' necessarily. In the gravity model productivity is endogenous, as are relative factor supplies in both models. We are concerned to use statistical relationships we find in the data and since all this data is trended in some way we need to be assured that the associated variables are cointegrated, which we can check by testing their residuals for stationarity.
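The residual-based check mentioned here can be illustrated as follows. This is a minimal sketch, assuming a simple Dickey-Fuller regression without lags on the OLS residuals; the function name is hypothetical, and the resulting statistic would need to be compared with critical values appropriate for residual-based cointegration tests, which are more negative than the standard Dickey-Fuller ones and are not given in the text.

```python
import numpy as np

def residual_df_stat(y, X):
    """Regress y on a constant and X, then return the Dickey-Fuller
    t-statistic from regressing the change in the residual on its lag.
    A strongly negative value points towards stationary residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                      # residuals of the levels regression
    du, u_lag = np.diff(u), u[:-1]
    rho = (u_lag @ du) / (u_lag @ u_lag)  # no-constant DF regression slope
    e = du - rho * u_lag
    se = np.sqrt((e @ e) / (len(du) - 1) / (u_lag @ u_lag))
    return rho / se
```

With trended series that share a common stochastic trend the statistic is typically large and negative; with unrelated trended series it tends to stay close to zero, which is the warning sign of a spurious relationship.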
To construct these relationships we relate the trade shares and the output shares and these other elements in a series of multiple regressions; these constitute the auxiliary model. We would hope to find around a dozen key coefficients from this to use as elements of the Wald statistic matching the data behaviour to the simulated behaviour from the structural model.
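For concreteness, the Wald statistic referred to here is usually written in the indirect inference literature in the following form. The symbols are notation introduced for this illustration only: $a^{data}$ is the vector of auxiliary coefficients estimated on the actual data, $\bar{a}^{sim}$ is their mean across the simulated samples, and $\hat{\Omega}$ is their covariance matrix across those samples.

```latex
W = \left(a^{data} - \bar{a}^{sim}\right)' \, \hat{\Omega}^{-1} \, \left(a^{data} - \bar{a}^{sim}\right)
```

With around a dozen coefficients, $W$ measures the joint distance of the data-based coefficients from what the structural model implies, and its position in the simulated distribution of $W$ gives the probability used in the test.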

The Classical Model of Trade
We begin with the 'classical' model of world trade, whose intellectual origins lie in the work of Ricardo (1817), Heckscher (1959), Ohlin (1933), Stolper and Samuelson (1941) and Rybczynski (1955). In this model output is determined by factor supplies and sectoral productivity. Outputs here are defined as intermediate products, which will be used as inputs into final goods for consumption; they are divided into primary (agriculture and raw materials), manufactures, traded services and nontraded output. For the UK world prices are exogenous as is also the commercial policy regime setting tariffs and non-tariff barriers. Capital is freely available from the rest of the world at the world's exogenous cost of capital.
UK consumers can choose consumption by product origin for each sector. The idea is that distribution is imperfectly competitive, whilst intermediate output is all sold in perfectly competitive world markets. Retail products are bundles of intermediate supply-chain products. These bundles are 'branded' to create distinct products that consumers will not easily switch from owing to shortage of time, habit etc. However bundlers will buy inputs that are commoditised to yield best value.
The bundles are differentiated by country of origin (and also by product type, though we ignore this aspect here). The origin differentiation arises because of differential tariffs etc. and transport costs, together termed 'trade frictions'. Thus whilst all the inputs have the same cost at some notional point in the world market midway between borders, their total cost includes these frictions. The distributor applies a mark-up reflecting the elasticity of substitution in the final market.
However because of perfect competition in the world intermediate market world intermediate prices are immune to all tariffs and transport costs in a standard way. This can be seen informally as follows. Imagine a country, the EU, puts a tariff on the manufactures from the UK and we assume for simplicity that it lowers the tariffs on other sources so that consumer income is unchanged and only relative prices altered. We assume total EU demand for the product is unchanged therefore; this is the case in the model where total demand equals GDP, and the share of the product depends on its relative price, determined in world markets. Now demand for the UK product in the EU falls, demand for non-UK product rises. With world prices of intermediates unchanged total supplies of intermediates from all countries remain the same. Hence in other markets supplies from non-UK sources will be smaller by exactly the amount that UK supplies will be larger. Hence we can think of retail bundlers using more UK supply and less non-UK supply in retail brands where the two origins are equal in frictional costs. Effectively the UK output displaced from the EU is diverted to other markets whilst non-UK output is diverted to the EU market; in the third markets bundlers are indifferent between the two supplies and switch seamlessly between them, so avoiding any movement in world prices. We get pure trade diversion from the imposition of the tariff.
The model here is as in Minford et al. (2015), a CGE model of trade, output, factor supply and demand with four products, four factors and four 'countries' (or country blocs), of which the UK is one, and the others are the EU, NAFTA and the Rest of the World. Capital is mobile. The products are manufactures, other goods (agriculture and raw materials), traded services and non-traded.
These products are considered as intermediates which are supplied at the border or the factory gate in country markets to country distribution industries that operate under imperfect competition. As noted above we treat these products as aggregated and do not consider any disaggregation by type of sub-product. However, we consider disaggregation by product 'country origin'. Thus all products are supplied by distributors as branded products which differ according to country origin characteristics. Country products will therefore be identically branded if they happen to have the same country origin characteristics, i.e. the same transport cost and tariff, which are the features distinguishing different countries of origin.

The Model of Consumption
Distributors' costs are identical and all supply to the retail market at marginal cost times a mark-up reflecting the (identical) elasticity of demand. Demand for each brand is determined by an Armington cascade model in each country. Thus consumers have a disaggregated utility function, $C$, over country brands as follows: Maximising this subject to total consumption demand, being the mark-up. $J$ is the main product category and $MC$ is normalised at unity. $C_J$ is the amount demanded of the main product according to the model's Cobb-Douglas demand function. Overall demand (consumption) is set equal to overall output of each country by the equilibrium conditions. $p_i$ is the relative price of the $i$th product within $J$. $\rho < 0$ so that $\sigma = \frac{1}{1+\rho} > 1$. $P_J$, the product's price to the country from the world market, is set equal to world prices adjusted for the general MFN tariff rate and transport cost from the world market to the country. $p_i$ is the relative price of the country product, dependent on the country's relative distance and tariff rate.
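The way relative prices allocate demand across country brands can be illustrated with a short sketch. It assumes the constant-elasticity demand form $C_i = C_J (p_i/\upsilon_i)^{-\sigma}$ with $\sigma = 1/(1+\rho)$, which is how the first-order condition is read here with $\lambda = 1$; the function name, the treatment of $\upsilon_i$ as a brand weight defaulting to one, and the numbers used are all assumptions made for illustration.

```python
def brand_demands(c_j, rel_prices, rho, weights=None):
    """Demand for each country brand within product category J.

    c_j        : total demand for the category (C_J)
    rel_prices : dict mapping origin to relative price p_i, which rises
                 with that origin's tariff and transport cost
    rho        : substitution parameter, assumed to lie in (-1, 0)
    weights    : optional dict of brand weights (hypothetical; default 1)
    """
    if not -1.0 < rho < 0.0:
        raise ValueError("rho must lie in (-1, 0) so that sigma > 1")
    sigma = 1.0 / (1.0 + rho)  # elasticity of substitution between brands
    weights = weights or {}
    return {
        origin: c_j * (p / weights.get(origin, 1.0)) ** (-sigma)
        for origin, p in rel_prices.items()
    }
```

For example, with `rho = -0.5` (so that the elasticity is 2), doubling an origin's relative price through higher trade frictions cuts demand for that brand to a quarter of its previous level.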
The demand functions above are specified for the UK, the EU and NAFTA where we have data on differential tariffs by country. In the Rest of the World (ROW) we assume that MFN tariffs hold and distances from the three other blocs are all the same. Thus in effect the ROW acts as a residual market where product not demanded by other countries is sold, by virtue of the world balance conditions noted earlier.
By these demand mechanisms we allocate all UK output to the home, EU, NAFTA and ROW markets by destination; and we allocate all UK demand similarly to all these markets as origins. We do not consider the origin/destination of other countries' trade, since the focus of our model is on the UK solely for testing purposes in this paper; of course it could be done for them in principle. But testing trade models on other countries' experience is a substantial undertaking which we believe to be an essential one for the trade economist community, hitherto oblivious as it has been to issues of empirical testing. For this test of the model on UK data we treat EU and US consumption of each J product as exogenous, rather than solving the model for all EU and US variables.

The Model of Intermediate Production and Trade
This model follows the one Minford et al. (1997) developed for assessing the effects of globalisation on the world economy. This model performed well empirically in accounting for the trade trends of the 1970-1990 period; it identified a group of major causal 'shocks' during this period which between them gave a good fit to the salient features of the period, including terms of trade, production shares, sectoral trade balances, relative wage movements and employment/unemployment trends. The model adopts the key assumptions of the Heckscher-Ohlin-Samuelson set-up. Production functions are assumed to be Cobb-Douglas and identical across countries, up to a differing productivity multiplier factor; thus factor shares are constant, enabling us to calibrate the model parsimoniously from detailed UK data that we were able to gather. There are four sectors: non-traded and three traded ones, viz. primary, basic (unskilled-labour-intensive) manufacturing, and services and other (skilled-labour-intensive) manufacturing. Three immobile factors of production are identified: unskilled and skilled labour and land. Capital is mobile. All sectors are competitive and prices of traded goods of each sector are equalised across borders.

Footnote (to the model of consumption): The first order condition yields $C_i = C_J \left( \frac{\lambda p_i}{\upsilon_i} \right)^{-\sigma}$. To find $\lambda$ note that from the Lagrangean $\frac{\partial L}{\partial C_J} = \lambda$. Note also that when the constraint is satisfied (as it must be at all times) $L = C_J$, so that in addition $\frac{\partial L}{\partial C_J} = 1$. Hence $\lambda = 1$.
This set-up gives rise to a well-known set of equations:
1. given world prices of traded goods, price = average cost determines the prices of immobile factors of production;
2. these factor prices induce domestic supplies of these factors;
3. outputs of each sector are determined by these immobile factor supplies; non-traded sector output is fixed by demand, the traded sector outputs by the supplies of immobile factors not used in the non-traded sector;
4. demands for traded goods are set by the resulting level of total GDP;
5. world prices are set by world demand = world supply.
The world is divided into four blocs: UK, REU (rest of EU), NAFTA, ROW (rest of world). In our model here, focusing on the UK, we treat world prices and other countries' consumption as exogenous processes.
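Step 1 of the list above can be made concrete with a small sketch. Under Cobb-Douglas technology the condition price = average cost is linear in logs, so with three traded sectors and three immobile factors the factor prices follow from a 3x3 linear system once world prices, productivity and the world cost of capital are given. The cost shares below are hypothetical round numbers chosen for illustration, not the calibrated UK shares, and constant terms in the unit cost functions are normalised away.

```python
import numpy as np

# Hypothetical cost shares (each row plus its capital share sums to one).
# Rows: manufactures, services, primary.
# Columns: unskilled labour, skilled labour, land.
SHARES_IMMOBILE = np.array([
    [0.50, 0.15, 0.05],
    [0.20, 0.50, 0.05],
    [0.15, 0.10, 0.25],
])
SHARES_CAPITAL = np.array([0.30, 0.25, 0.50])

def factor_prices(world_prices, productivity, capital_cost,
                  shares_immobile=SHARES_IMMOBILE,
                  shares_capital=SHARES_CAPITAL):
    """Solve price = average cost for the three immobile factor prices.
    In logs: shares_immobile @ ln(w) + shares_capital * ln(r)
             = ln(p) + ln(productivity)."""
    rhs = (np.log(world_prices) + np.log(productivity)
           - shares_capital * np.log(capital_cost))
    return np.exp(np.linalg.solve(shares_immobile, rhs))

def unit_costs(w, productivity, capital_cost,
               shares_immobile=SHARES_IMMOBILE,
               shares_capital=SHARES_CAPITAL):
    """Average cost of each sector implied by factor prices w."""
    log_cost = shares_immobile @ np.log(w) + shares_capital * np.log(capital_cost)
    return np.exp(log_cost) / productivity
```

Feeding the solved factor prices back into `unit_costs` reproduces the world prices, which is the zero-profit condition the step relies on; the remaining steps (factor supplies, outputs, demands) would then be built on these factor prices.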
In the UK we treat primary sector output (agriculture mainly) as politically controlled and essentially fixed exogenously because of the highly interventionist planning system. The supply of land is adjusted (via planning and other controls) to adjust to this output requirement; in other words the supply of land is demand-determined. Whilst this assumption is crude in overriding all incentive effects on output, the reality of agricultural production is closer to this than to the uncontrolled alternative.

The Full Model
To these equations we add the demand equations discussed above. These are: EU and NAFTA demand for UK products, these being UK exports to these areas; UK demand for EU, NAFTA, and Rest of World products, these being UK imports from these areas. Exports to the Rest of the World are determined as the residual to ensure current account balance, as explained above.
The model can now be listed:

$\pi_M, \pi_S, \pi_A, \pi_D$ are exogenous productivity error processes.

5-7 Factor demands, UK [Rest of EU, NAFTA, Rest of World], $N$, $H$, $L$:

$\ldots + 0.51832 \cdot p_S \cdot y_S + 0.132 \cdot p_A \cdot y_A) \cdot e_S$

$L = l^{-1} \cdot (0.113 \cdot p_D \cdot y_D + 0.035 \cdot y_M \cdot p_M + 0.033 \cdot p_S \cdot y_S + 0.079 \cdot p_A \cdot y_A) \cdot e_A$

$\ldots \cdot \{0.331 \cdot p_D \cdot y_D + 0.299 \cdot p_M \cdot y_M + 0.237 \cdot p_S \cdot y_S + 0.642 \cdot p_A \cdot y_A\} \cdot e_K$

$e_M, e_S, e_A, e_K$ are factor demand error processes; $e_A$ is the agriculture land demand error process.

9-11 Factor supplies: $L$ is supplied equal to demand through the government/planning system (which fixes agricultural output exogenously).
$T_M$, $T_S$, $T_A$ are simply the tariff+non-tariff+transport cost real barriers to trade between the UK and world markets. As we do not have time-series data on these, they are all set to unity; what this implies is that all these effects are absorbed into the model's error terms. The exchange rate simply changes all prices in proportion in sterling, leaving them unchanged in dollars. So effectively all the prices in this model are in dollars relative to world manufacturing prices in dollars, the numeraire.
World prices, $p_{World}$: exogenous processes.

29 Error process: We assume the log (errors) in the model follow an AR(1) process with intercept and trend.

Tariffs and other trade barriers affect these demands, but as already noted we do not have time-series data for these, so their effects are included in the errors.
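The error process assumption can be illustrated with a short sketch. It assumes the specification $\ln e_t = a + b\,t + \rho \ln e_{t-1} + \varepsilon_t$, estimates it by OLS, and then resamples the fitted innovations to build a bootstrapped error path of the kind used in simulating the model. The function names are hypothetical and the estimation method is an assumption, since the text does not say how the process is fitted.

```python
import numpy as np

def fit_ar1_trend(log_err):
    """OLS of ln e_t on a constant, a time trend and ln e_{t-1}.
    Returns the coefficients (a, b, rho) and the fitted innovations."""
    y = log_err[1:]
    t = np.arange(1, len(log_err))
    X = np.column_stack([np.ones(len(y)), t, log_err[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    innovations = y - X @ coef
    return coef, innovations

def bootstrap_path(coef, innovations, start, rng):
    """Rebuild one simulated log-error path by resampling the innovations
    with replacement and feeding them through the fitted process."""
    a, b, rho = coef
    draws = rng.choice(innovations, size=len(innovations), replace=True)
    path = [start]
    for t, eps in enumerate(draws, start=1):
        path.append(a + b * t + rho * path[-1] + eps)
    return np.array(path)
```

Each call to `bootstrap_path` with a fresh random generator state gives one alternative history of the error, of the same length as the original series.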

Setting Up the Gravity Model
In the gravity model trade patterns are determined by the trade share equations. Because of imperfect competition throughout all markets the supply of goods is determined by their demand; the trade share equations express this demand. We now need to include the effect of the real exchange rate, RXR, in the trade equations since prices are no longer set in world markets; instead the real exchange rate moves the prices of UK goods relative to foreign competitor prices in order to achieve current account balance.
Thus we now have the same trade bloc except that now the demand from the rest of the world also determines exports to the ROW, and all trade shares are affected by RXR.
Trade share bloc: This now gives us total trade. The $em_i$ and $ex_i$ are exogenous error processes; these include the effects of trade barriers, which we cannot observe in a time-series manner. We estimate $cm_i$ and $cx_i$ by OLS and bootstrap the trade share data ($M_i/E_T$ and $X_i/GDP_i$) from the above equations; we set the elasticities of demand to the real exchange rate at $\psi = 2$ (import) and $\psi = -2$ (export).
According to gravity modellers, the total size of trade (exports plus imports) determines flows of foreign direct investment and so productivity, via intensifying links with foreign firms through trade relationships. We therefore now write the productivity terms as a function of total trade, $T$: $\pi_M, \pi_S, \pi_A, \pi_D$ are no longer purely exogenous productivity error processes but each contain a term in $T$. $T$ is defined as follows.

fixed (equal to the sample mean), and

$$\ln(M_i/E_T) = cm_i + e_{M,i}, \qquad i = EU,\ NAFTA,\ ROW$$

$$\ln(X_i/E_i) = cx_i + e_{X,i}, \qquad i = EU,\ NAFTA,\ ROW$$
so that $T$ is an exogenous variable; here we omit the RXR effect on trade flows on the grounds that it will not affect total trade, only the relative size of exports and imports. Thus a fall in RXR will raise exports and lower imports through expenditure-switching, leaving total trade approximately unchanged. The productivity terms then each contain a term in $T$.

We now turn to the factor price equations where, as before, productivity is a key determinant. World prices as before, together with productivity, determine home factor prices. Note that there is in addition 1) a sectoral imperfect competition mark-up relating home prices to world prices for that sector; as we assume this mark-up is exogenous, it will be absorbed into the productivity error process, which is found from these equations; 2) a general imperfect competition mark-up across all traded sectors, representing a real devaluation. This last is the same across all sectors, and world prices here are adjusted for it; effectively they are world prices converted into sterling. The rest of the model is the same.
In this Gravity model we have imperfect competition; but UK suppliers must adjust their mark-up, RXR, in order to achieve current account balance. RXR moves to solve for current account equilibrium.
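To make the role of RXR concrete, here is a deliberately stylised sketch in which a single export share and a single import share respond to RXR with the elasticities quoted in the text (2 for imports, -2 for exports), and RXR is solved so that the two are equal. The log-linear functional form, the common base for the two shares, the constants `cm` and `cx`, and all function names are assumptions for illustration only; the paper's model solves a full system of equations rather than this one-equation toy.

```python
import math

PSI_IMPORT = 2.0   # elasticity of imports to RXR, as quoted in the text
PSI_EXPORT = -2.0  # elasticity of exports to RXR, as quoted in the text

def trade_gap(log_rxr, cm, cx):
    """Exports minus imports in a toy log-linear setting (assumed form)."""
    imports = math.exp(cm + PSI_IMPORT * log_rxr)
    exports = math.exp(cx + PSI_EXPORT * log_rxr)
    return exports - imports

def solve_log_rxr(cm, cx, lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on log RXR; the gap falls monotonically as RXR rises."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if trade_gap(mid, cm, cx) > 0:
            lo = mid   # exports exceed imports: RXR must rise
        else:
            hi = mid   # imports exceed exports: RXR must fall
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical constants: a negative shock to export demand (cx < cm)
# requires a lower RXR (a real depreciation) to restore balance.
log_rxr_star = solve_log_rxr(cm=-1.0, cx=-1.2)
```

In this toy setting the solution has the closed form $\ln RXR = (cx - cm)/4$, which shows why moderate demand shocks move RXR only a little when the elasticities are of this size.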

Data
The sources of the data are as follows:

1) Output by sector: Agriculture, Industry, Service, Nontraded. Source: ONS national accounts.

2) Trade data (export and import data) by sector: Agriculture, Industry, Service; 3) Population and employment. Sources for 2) and 3): World Bank, World Development Indicators.

4) Skilled workers (adult tertiary education as % of total population). Sources: Statistical Abstract for the United Kingdom 1935, Board of Trade; Annual Abstract of Statistics, ONS/CSO; Higher Education Statistics Agency.

5) Earnings of skilled workers: ratio of skilled earnings to unskilled earnings (Decile 9/Decile 5). Source: OECD Database.

6) Goods price index: Agriculture, Industry, Service. Source: Free market commodity price indices, United Nations Conference on Trade and Development. This source has two price indices: the agriculture and raw materials price index, and the unit value index of manufactured goods exports by developed economies. We use them as the world agriculture and manufacturing price indices respectively. World service price data are not available; we use UK service producer prices, obtained from the Office for National Statistics (ONS), as a proxy for the world service price index.

7) Rent on land (£ per hectare), real interest rate. Source: ONS.
All data are annual data from 1965 to 2015. Figure 1 below plots the data series.

..., $OS_{UK} = y_M/y_S$, which we put on the left-hand side for convenience; and on the right-hand side we have the relative productivity residual of manufacturing/services, $\pi_M/\pi_S$; the relative factor share, skilled/unskilled labour, $H/N$; the wage of unskilled relative to skilled workers, $w/h$; and EU GDP and NAFTA GDP.
The auxiliary model equations are potentially: We will use these equations in full at a final point in our analysis. However, we begin with a reduced set of equations, 1)-3), and without the coefficients on $w/h$. The reason for choosing this reduced set was to achieve good but not excessive power in our test. As noted above, the more features are included in the test (in this case the features are the coefficients $\alpha_{ij}$), the higher generally is the test's power; it is therefore possible for the power to be so great that only models very close to the real world can pass, in which case none will. Our basic test is chosen to keep a limit on the test's power. Later, we will discuss the effects of raising the test power further.
These variables, endogenous and exogenous, will not be stationary but rather will have either deterministic or stochastic trends. However the residuals in the reduced form are stationary since the regressions will be relationships derived from equilibrium structural relationships such as those found in our CGE model; these should be co-integrated therefore (Table 1).

Testing the Models by Indirect Inference
The indirect inference (II) test criterion is based on the difference between the descriptors of the data, given by the auxiliary model, when estimated on simulated data and on actual data; this difference is represented by a Wald statistic, hence we call it an IIW (Indirect Inference Wald) test. If the structural model is correct (the null hypothesis) then the simulated data, and the data descriptors based on these data, will not be significantly different from those derived from the actual data. The simulated data from the structural model are obtained by bootstrapping the model using the structural shocks implied by the given (or previously estimated) model and computed from the historical data; we bootstrap the UK shocks but not the exogenous world variables, so that in effect we are using the model to create histories that embody local UK shocks but all include the same world history. The test then compares the data descriptors estimated on the actual data with the distribution of data descriptors derived from multiple independent sets of the simulated data. Intuitively, we can think of this as asking whether actual UK history, which of course embodies the actual UK shocks as well as actual world history, can be shown at some chosen test level of probability to come from the distribution of potential histories created by differential UK shocks together with actual world history. This forms the basis of our test which, as we will shortly see, has considerable power (see footnote 2).

Footnote 2: In these trade models world variables are solved for in the model when the whole world model is operating; here this is not the case as the UK part of the model is solved on its own, with the rest of the world treated as exogenous. We hope in future work to endogenise the rest of the world's trade in the context of the whole model working in full stochastic mode. It might be thought one could treat the exogenous variables as simple time series and bootstrap them accordingly, as is often done with DSGE models of the open economy. However, such bootstrapping produces unbounded and unlikely behaviour in the highly nonlinear UK trade model; when disciplined by the whole world model's structure these values would be tightly bounded by the whole model's solution processes.
We then use a Wald statistic based on the difference between $a_T$, the estimates of the data descriptors derived from actual data, and $\overline{a_S(\theta_0)}$, the mean of their distribution based on the simulated data, which is given by:

$$IIW = \left(a_T - \overline{a_S(\theta_0)}\right)' \, W(\theta_0) \, \left(a_T - \overline{a_S(\theta_0)}\right)$$

where $W(\theta_0)$ is the inverse of the variance-covariance matrix of the distribution of simulated estimates $a_S$ and $\theta_0$ is the vector of parameters of the trade model on the null hypothesis that it is true.
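The Wald statistic described above can be sketched as follows: take the auxiliary-model estimates from the data and from each simulated sample, form the mean and covariance of the simulated estimates, and compute the quadratic form. The function names and the small numerical example are assumptions for illustration, and the covariance is divided by N here as a simple choice rather than as a statement of the authors' implementation.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def wald_statistic(a_data, a_sims):
    """Quadratic form (a_data - mean)' Cov^{-1} (a_data - mean).

    a_data: auxiliary-model coefficients estimated on the actual data.
    a_sims: list of coefficient vectors, one per simulated sample.
    """
    n, k = len(a_sims), len(a_data)
    mean = [sum(s[j] for s in a_sims) / n for j in range(k)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in a_sims) / n
            for j in range(k)] for i in range(k)]
    d = [a_data[j] - mean[j] for j in range(k)]
    z = solve_linear(cov, d)  # z = Cov^{-1} d, without forming the inverse
    return sum(di * zi for di, zi in zip(d, z))

# Hypothetical numbers: four simulated coefficient pairs and one data pair.
sims = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
w_data = wald_statistic([3.0, 1.0], sims)
```

Solving the linear system instead of inverting the covariance matrix explicitly is a numerical convenience; the statistic is the same.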
The following steps summarise our implementation of the Wald test by bootstrapping. A detailed description of the IIW test can also be found in Le et al. (2016).
Step 1 Estimate the errors of the economic model conditional on the observed data and $\theta_0$

Estimate the structural errors of the structural model, $x_t(\theta_0)$, given the stated values $\theta_0$ and the observed data. The number of independent structural errors is taken to be less than or equal to the number of endogenous variables. The errors are not assumed to be normally distributed. Where the equations contain no expectations the errors can simply be backed out of the equation and the data. This is of course the case in the models here.
Step 2 Derive the simulated data

On the null hypothesis the $\{\pi_{i,t}\}_{t=1}^{T}$ and $\{e_{i,t}\}_{t=1}^{T}$ are the structural errors. The simulated disturbances are drawn from these errors. One requirement for the bootstrap is that the disturbances are serially independent. In some models, including the trade model, many of the structural errors are assumed to be generated by autoregressive processes rather than being serially independent. If they are, then under our method we need to estimate them. Depending on the stationarity properties of the structural errors, we may estimate them as AR(1), AR(1) with time trend, or AR(1) on the first-differenced process.
We then derive the simulated data by drawing the bootstrapped disturbances by time vector to preserve any simultaneity between them, and solving the resulting model. To obtain the N bootstrapped simulations we repeat this, drawing each sample independently.
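Drawing "by time vector" can be sketched as resampling whole dates: each draw takes the innovations of all shocks at one randomly chosen date, so that shocks occurring together stay together. The function names, the seed and the toy innovation matrix below are assumptions for the example only.

```python
import random

def draw_by_time_vector(innov, rng):
    """Resample rows (dates) with replacement.

    innov: list of T rows; row t holds the innovations of every shock at date t.
    Keeping each row intact preserves the contemporaneous co-movement of shocks.
    """
    t_len = len(innov)
    return [innov[rng.randrange(t_len)][:] for _ in range(t_len)]

def bootstrap_samples(innov, n_boot, seed=0):
    """Return n_boot independent resampled innovation histories."""
    rng = random.Random(seed)
    return [draw_by_time_vector(innov, rng) for _ in range(n_boot)]

# Toy innovations for three dates and two shocks (illustrative numbers only).
toy_innov = [[0.01, -0.02], [0.00, 0.03], [-0.01, 0.01]]
samples = bootstrap_samples(toy_innov, n_boot=5, seed=42)
```

Each resampled history would then be passed through the estimated error processes and the model solution to produce one simulated data set.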
Step 3 Compute the Wald statistic

We estimate the auxiliary model, using both the actual data and the N samples of simulated data, to obtain estimates $a_T$ and $a_S(\theta_0)$ of the vector $\alpha$. The distribution of $a_T - a_S(\theta_0)$ and its covariance matrix $W(\theta_0)^{-1}$ are estimated by bootstrapping $a_S(\theta_0)$. The bootstrapping proceeds by drawing N bootstrap samples of the structural model, and estimating the auxiliary model on each, thus obtaining N values of $a_S(\theta_0)$; we obtain the covariance of the simulated variables directly from the bootstrap samples. The resulting set of $a_k$ vectors ($k = 1, \ldots, N$) represents the sampling variation implied by the structural model, from which estimates of its mean and covariance matrix can be calculated:

$$W(\theta_0)^{-1} = \frac{1}{N}\sum_{k=1}^{N}(a_k - \bar{a})(a_k - \bar{a})', \qquad \text{where } \bar{a} = \frac{1}{N}\sum_{k=1}^{N} a_k .$$

We then calculate the Wald statistic for the data sample; we estimate the bootstrap distribution of the Wald from the N bootstrap samples. The IIW statistic is given by

$$IIW = (a_T - \bar{a})' \, W(\theta_0) \, (a_T - \bar{a}).$$

We can show where in the Wald statistic's bootstrap distribution the Wald statistic based on the data lies (the Wald percentile). We can also show the Mahalanobis Distance based on the same joint distribution, normalised as a t-statistic, and also the equivalent Wald p-value, as an overall measure of closeness between the model and the data.

One important issue concerns the power of the Wald test in this context. We gauge this by a Monte Carlo experiment where we treat one of these models as true and generate many samples from it. We then test the model on each of these samples and compute the rate at which it is rejected by our 5% test. Plainly when it is true it will be rejected 5% of the time. What we want to know is how this rejection rate will rise as the model departs further and further from the truth; we do this by changing all the structural model parameters by x% (+ and − alternately). Table 2 shows the results of this experiment where the classical model is treated as true.
We carry out this experiment for our main auxiliary model, which contains equations 1)-3) above without the relative wage variable; here we include the trade shares with the EU and NAFTA but not with ROW.
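The bookkeeping of the power experiment can be sketched in three small pieces: falsifying the parameters by x% with alternating sign, finding the 5% critical value from the bootstrap distribution of the Wald statistic, and computing the share of Monte Carlo samples that exceed it. The function names and the numbers used are hypothetical; producing the Wald values themselves requires simulating and solving the structural model, which is not shown here.

```python
def perturb_alternately(params, x_pct):
    """Change each parameter by x_pct per cent, with + and - signs alternating."""
    return [p * (1.0 + (x_pct / 100.0) * (1 if i % 2 == 0 else -1))
            for i, p in enumerate(params)]

def critical_value(boot_walds, level_pct=5):
    """Value at the (100 - level_pct)th percentile of the bootstrap Wald values."""
    ordered = sorted(boot_walds)
    idx = (len(ordered) * (100 - level_pct)) // 100  # integer arithmetic
    return ordered[min(idx, len(ordered) - 1)]

def rejection_rate(sample_walds, crit):
    """Share of Monte Carlo samples whose Wald statistic exceeds the critical value."""
    return sum(1 for w in sample_walds if w > crit) / len(sample_walds)

# Hypothetical inputs, for illustration only.
false_params = perturb_alternately([1.0, 2.0, 4.0], x_pct=10)
crit = critical_value(list(range(100)))
rate = rejection_rate([90, 96, 100, 95], crit)
```

Repeating the last step for increasing values of x traces out a power function of the kind the text reports in tabular form.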

II Test Results
Step 1 Estimate the errors of the economic model conditional on the observed data and $\theta_0$

For the classical and gravity trade models listed above, we extract the structural errors $\pi_{i,t}$, $e_{i,t}$, $em_{i,t}$, $ex_{i,t}$ given the stated parameter values in the model and the observed actual data. We test the stationarity of the errors by ADF and KPSS tests (Table 3) and estimate an appropriate process.
Step 2 Derive the simulated data

Classical trade model
Based on the ADF test above, we assume that the trade share errors follow an AR(1) process. We estimate this process; the implied model innovations $\varepsilon_{mi}$ and $\varepsilon_{xi}$ are serially independent. We draw the bootstrapped innovations and then the trade share errors, and generate trade share data from the trade share equations in the classical trade model.
To bootstrap the other trade variables listed from Eqs. 1 to 27, we first get the implied model residuals ($\pi_i$ and $e_i$) from Eqs. 1 to 27. Based on the tests in Table 3, the productivity errors are nonstationary and we assume their first differences follow an AR(1) process with drift. The factor share residuals are trend stationary and we assume they follow an AR(1) process with a constant and time trend. We estimate these AR(1) processes and bootstrap the productivity residuals ($\pi_{i,t}$) and factor share residuals ($e_{i,t}$). We can then bootstrap all the other endogenous variables by solving the trade model listed from Eqs. 1 to 27. The details of the model solving process are summarised in Appendix 1.
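Once innovations have been resampled, the two kinds of error process described above can be rebuilt recursively. The sketch below shows one possible way to do this; the parameter names (`a`, `b`, `rho`, `c`), starting values and innovations are placeholders rather than the estimates reported in the paper.

```python
def rebuild_trend_stationary(e0, a, b, rho, innov):
    """Rebuild e[t] = a + b*t + rho*e[t-1] + eps[t] from resampled innovations."""
    path = [e0]
    for t, eps in enumerate(innov, start=1):
        path.append(a + b * t + rho * path[-1] + eps)
    return path

def rebuild_difference_ar1(level0, diff0, c, rho, innov):
    """Rebuild a nonstationary series whose first difference is AR(1) with drift:
    d[t] = c + rho*d[t-1] + eps[t], level[t] = level[t-1] + d[t]."""
    levels, diff = [level0], diff0
    for eps in innov:
        diff = c + rho * diff + eps
        levels.append(levels[-1] + diff)
    return levels

# Placeholder values: zero innovations make the recursion easy to follow by hand.
factor_share_path = rebuild_trend_stationary(0.0, 1.0, 0.0, 0.5, [0.0, 0.0])
productivity_path = rebuild_difference_ar1(1.0, 0.0, 0.1, 0.5, [0.0, 0.0])
```

The productivity path accumulates its differences, so shocks to it are permanent, whereas shocks to the trend-stationary factor share residuals die away at rate `rho`.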
The trade share errors are stationary and we assume they follow an AR(1) process with a constant. We estimate this AR(1) process and draw the bootstrapped trade share data from the trade share equations in the classical trade model.

Gravity model
To bootstrap the other trade variables listed from Eqs. 1 to 27 in the gravity model, the productivity terms are determined by the trade effect $T$, where we assume a semi-elasticity of 2.0 for both manufacturing and traded services; thus a 1 percentage point rise in the total trade share in GDP causes a 2% rise in productivity in each case. The factor share residuals are trend stationary and follow an AR(1) process with a constant and time trend.
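The semi-elasticity quoted above can be illustrated with a one-line calculation. The sketch assumes the effect enters log productivity linearly, with the trade share measured as a fraction of GDP; this functional form and the function name are assumptions for the example.

```python
import math

def productivity_multiplier(delta_trade_share, semi_elasticity=2.0):
    """Proportional change in productivity for a change in the total trade share.

    delta_trade_share is in fractions of GDP (0.01 = 1 percentage point).
    Assumes d ln(productivity) = semi_elasticity * d(trade share).
    """
    return math.exp(semi_elasticity * delta_trade_share)

# A 1 percentage point rise in the trade share with a semi-elasticity of 2.0.
base_case = productivity_multiplier(0.01)
# The same rise with the semi-elasticity tripled to 6.0.
tripled_case = productivity_multiplier(0.01, semi_elasticity=6.0)
```

With a semi-elasticity of 2.0 the multiplier is about 1.02, matching the 2% figure in the text; tripling the semi-elasticity gives roughly 6%.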
We estimate the equations above and bootstrap the productivity residuals ($\pi_{i,t}$) and factor share residuals ($e_{i,t}$). We can then bootstrap all the other endogenous variables from the trade model listed from Eqs. 1 to 27 in the gravity model.
The trade share errors are stationary and we assume they follow an AR(1) process with a constant. We estimate this AR(1) process and draw the bootstrapped trade share data from the trade share equations in the gravity trade model.
The estimated coefficients for the error processes are reported in Table 4 below. Appendix 2 shows the residuals for the classical model (Fig. 3) and the model innovations (classical model Fig. 4 and gravity model Fig. 5).
Step 3 Compute the Wald statistic

II Wald test results, bootstrap number 5000.

What we see here is that both models pass the test, with the classical model having a higher probability. Some indication of why this might be happening is provided by Fig. 2, which shows the behaviour of the data on our variables and also the average of all the simulations by each model for these. As can be seen, the gravity model tends to overpredict the EU and NAFTA trade shares, and also fails to pick up the trend in the output ratio. These comparisons are merely indicative, since the rigorous Wald test (Table 5) is based on the whole joint distribution of the simulated coefficients of the auxiliary model, which plainly depends on all the simulations and not simply the average.

We also examine the results when one eliminates the specific gravity model effects one by one. The two 'gravity effects' are 1) the assumption of imperfect competition, which affects the trade share equations (the 'Gravity trade share equations'), and 2) the effect of the total trade share on productivity (the dT effect). What we see is that as we remove either of these gravity effects the probability of the gravity model rises to about the same as the classical model.
Essentially one can see from these results that the models are actually stochastically rather close to each other. The 'gravity effects' within this computable general equilibrium model are quite small in the end. The total trade shares do not fluctuate enough to have much effect on productivity; and the disturbances to current balance equilibrium from demand shocks to trade do not make RXR move much either, so that the trade shares move much as they do in the classical model. Of course it would be possible to construct another 'gravity' model entirely where the production functions differed from the ones assumed here. But such a model would differ not just because of the gravity assumptions but because of other differences in approach, on the supply side of the model; that would be another story. What we have investigated here is what happens when one introduces imperfect competition, with a limited size of elasticities, in trade and also a link from trade shares to productivity (via channels such as FDI), these being the two elements stressed in the recent gravity literature. The answer seems to be not much due to each element alone (Table 6).
As a last experiment we greatly increased the quantitative size of the gravity effects, tripling the elasticity of trade shares on productivity and halving the RXR elasticities. We denote this as Gravity model Mega. The results are reported in the table below. The result is a big deterioration in the probability of the gravity model (Table 7).
The implication of all these experiments is simple enough. The most probable model is the classical model. The gravity model, specified in a moderate way, is about 15% less probable. By dropping either imperfect competition or the link from trade size to productivity the probability loss can be roughly eliminated. Making the gravity model elements stronger, by tripling the size of the trade/productivity link and halving the trade elasticities (more imperfection in competition), reduces the gravity model probability further still, making it 40% less probable than the classical model.
We have chosen to use the three equations 1)-3) as our auxiliary model. It is of interest to ask what happens as we raise the number of equations and features included in the test, thus raising its power. Adding equation 4) would raise the power of the test, increasing the rejection rate, and it also puts even more emphasis on trade shares as opposed to output or other aspects of the data. So we have not used it as our main criterion. What we see in the following table of Wald p-values is that it lowers the probability of both models to about equal, further illustrating the point that these models are close in character (Table 8).

Finally, we add into all the equations the relative wage, $w/h$, as an extra regressor; this additional feature raises the power of the test further, as is evident from the Monte Carlo experiment shown in Table 9. Any model with 3% or more inaccuracy is rejected virtually 100% of the time. We can also see from Table 10 that at this level of power the gravity model is rather strongly rejected, whilst the classical model continues to be accepted quite easily.
The general conclusion from this series of Indirect Inference tests with increasing power is that the classical model fits the UK trade facts well, and better than the gravity model. With a test of really considerable power, the gravity model is even rejected quite strongly whereas the classical model survives.

Finally, if we consider a typical policy simulation where we raise the tariff rate on food and manufactures by 10%, we can see that the results do not differ much across the two models. What this table shows is that the two models generate the same welfare loss from a rise in UK-imposed tariffs of 10% on food and manufacturing.

Table note: The base run is based on year 2015 data. Welfare is computed as

$$\text{Welfare} = 100\,\bigl[\,y_t/p_t - y/p - (N_t + H_t + L_t + K_t - N - H - L - K)\,\bigr]/y,$$

where $y_t, p_t, N_t, H_t, L_t, K_t$ are simulated data after the tariff.

An important part of the UK government's free trade policy is the negotiating away of the tariffs on food and manufacturing currently placed by the EU on UK imports from non-EU sources; as we have noted, this also raises prices from EU sources within both the models here, so that it is as if this is also a tariff on imports from the EU where the tariff revenue goes to EU producers. Thus we can think of UK government policy as consisting of a) the abolition of a general tariff on food and manufactures plus b) the return of tariff revenue currently paid to EU producers. This simulation considers a); b) can be computed from net UK imports from the EU of these commodities. What is interesting is that on both models this policy is computed to have the same effect (Table 11).
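The welfare measure quoted in the table note can be written as a short function. The dictionary layout, the function name and the numbers in the example are hypothetical; only the formula itself is taken from the text.

```python
def welfare_change(sim, base):
    """Welfare = 100 * [y_t/p_t - y/p - (change in total factor inputs)] / y.

    sim and base are dicts with keys 'y', 'p', 'N', 'H', 'L', 'K', holding the
    post-tariff simulation and the base run respectively.
    """
    factors = ("N", "H", "L", "K")
    real_income_change = sim["y"] / sim["p"] - base["y"] / base["p"]
    factor_input_change = sum(sim[f] for f in factors) - sum(base[f] for f in factors)
    return 100.0 * (real_income_change - factor_input_change) / base["y"]

# Hypothetical numbers: output falls by 1 with prices and factor inputs unchanged.
base_run = {"y": 100.0, "p": 1.0, "N": 10.0, "H": 10.0, "L": 10.0, "K": 10.0}
tariff_run = {"y": 99.0, "p": 1.0, "N": 10.0, "H": 10.0, "L": 10.0, "K": 10.0}
loss = welfare_change(tariff_run, base_run)
```

The measure nets out any change in factor inputs, so it records the change in real income that is not simply the result of using more or fewer resources.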

Conclusions
In this paper we have examined the empirical evidence bearing on whether UK trade is governed by a Classical model or by a Gravity model. We used annual data from 1965 to 2015 and the method of Indirect Inference, which has very large power in this application. The Gravity model here differs from the Classical model in two ways: it assumes imperfect competition in world markets (affecting the trade share equations) and it assumes that the total trade share has a positive impact on productivity. We found that the Classical model passed our main test rather easily, and that the Gravity model did so also, if at a rather lower level of probability; however, as the power of the test was raised to include the maximum number of data features to be matched, the gravity model was rejected whilst the classical model survived. These are stringent tests; our Monte Carlo power function implies that even in the least powerful test quite small parameter errors would cause rejection all the time. The fact that both these models can pass the least powerful test suggests that they are close in character and also close to the truth. It is therefore not surprising that the policy implications of the two models do not seem to differ on the key issue of protection.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Step 4 Solve for $E_t$ and other endogenous variables in the model