Distortions and the size distribution of plants: evidence from cross-country data

We study the relationship between economic distortions and the size distribution of plants using comparable plant-level data across 104 developing countries. Our main result is to show that, other things equal, countries with larger economic distortions allocate more labor to small unproductive units. By decomposing the business environment into different types of distortions, we find that poor access to financial credit is the one driving our results. We also show that there exists a significant cross-country relationship between the size distribution and aggregate productivity. These results are consistent with a large recent literature on misallocation of resources across firms.


Introduction
Why do some countries have such low levels of income per capita? Why, for instance, is income per capita in Nepal only 2.5 % of that of the United States? A common view is that a high proportion of income variation across countries can be attributed to differences in total factor productivity (TFP). 1 Moreover, a recent strand of literature has started to emphasize misallocation of resources across plants as a source of these differences in aggregate productivity. 2 This literature emphasizes that the existence of distortions favoring small low-productivity firms and hindering large high-productivity firms makes the economy deviate from its first best. Any distortion that leads to too many resources being allocated to relatively small unproductive firms makes aggregate productivity fall. As a result, the size distribution of firms becomes a crucial object in understanding the aggregate productivity of a country.
Although these recent works on misallocation have provided valuable insights on the impact of distortions on how resources are allocated across firms, our understanding of the underlying factors driving the variation across countries in the firm size distribution remains somewhat limited. In this paper, we empirically investigate the cross-country relationship between the amount of labor allocated to small plants and a number of economic distortions using plant-level data: the Enterprise Surveys of the World Bank 2006-2010. This dataset has three main advantages. First, it is standardized. This means that every plant in every country answers the same questions, allowing for comparability across countries. Second, coverage is very broad (more than 100 countries), which gives us statistical power to assess the significance of our findings. And third, the sample of surveyed plants is representative of the population of formal private non-agricultural plants. This allows us to establish some facts about the allocation of resources beyond manufacturing. Figure 1 shows that there exists substantial variation across countries in the share of labor allocated to small plants. To motivate our analysis, we first show that there exists a strong relationship between the size distribution and productivity at the aggregate level: countries with a higher amount of labor allocated to small plants have lower levels of income per worker and TFP. We then show that high economic distortions, as measured by the Doing Business Index, are systematically associated with a higher amount of labor allocated to small plants. This holds after conditioning on other determinants of the size distribution, such as the size of the informal sector, the amount of FDI, and the presence of export firms. We address endogeneity concerns by instrumenting the economic distortions with variables argued in the literature to provide exogenous variation in institutions.
We also explore the different components of the business environment driving the previous result. We decompose the Doing Business Index into measures of access to credit, tax rates, costs of entry, rule of law, trading easiness, and corruption. We find that, when we include all these components together, access to credit is the one driving our results. From a theoretical point of view, the sign of the relationship between TFP and the share of employment accounted for by small plants is ambiguous, since it depends on the type of model and the nature of the distortions studied. This comes from the fact that the amount of misallocation and its relationship with the prevalence of small plants is also model specific. However, in the type of models and distortions that have been analyzed in the recent misallocation literature, this latter relationship is generally positive, implying a negative association between TFP and the amount of resources allocated to small plants. This is the case, for instance, in models of occupational choice à la Lucas (1978) under the presence of distortions that impose restrictions on firms' size, see for instance Guner et al. (2008) and García-Santana and Pijoan-Mas (2014). This is also the case in the model used by Hsieh and Klenow (2009) when firms' idiosyncratic taxes are positively correlated to productivity. In their model, this positive correlation implies that more productive firms are inefficiently small, whereas small unproductive firms are inefficiently large. This has actually been found to be the case when carrying out the Hsieh and Klenow (2009) exercise for a large set of developing countries, see Busso et al. (2013). Finally, Bartelsman et al. (2013) find substantial cross-country variation in the covariance between size and productivity. In particular, they find that this covariance is generally lower in poor countries. 
This suggests that the higher presence of distortions in low-income countries biases the size distribution of plants towards small unproductive units.
The rest of the paper is organized as follows. Section 2 discusses the related literature and places the paper within it. Section 3 explains in detail the characteristics of our dataset. Section 4 illustrates the relationship between size and productivity at the aggregate level. Section 5 shows that economic distortions significantly explain part of the cross-country variation in the amount of labor allocated to small plants. Section 6 analyzes the role of particular distortions. Finally, Sect. 7 gives concluding remarks.

Related literature
Our paper is related to earlier work that studies the size distribution of plants in developing countries. Banarji (1978) shows for a small number of countries that the average size of plants is positively correlated with physical capital intensity. Liedholm and Donald (1987) provide evidence of poor countries having most of the employment allocated to small and large plants, establishing a phenomenon known as "the missing middle". In a classic paper, Tybout (2000) collects this evidence and relates it to the poor performance of the manufacturing sector in developing countries. Leaning on country-level studies, he argues that strong business regulation can be behind the excessive presence of small entrepreneurs. By remaining small, entrepreneurs are able to avoid government regulation and hence do not achieve a larger size. 3 However, in a recent work, Hsieh and Olken (2014) find evidence that suggests that large firms are actually the ones most affected by regulation.
Our paper also contributes to a more recent literature that investigates in a systematic way the cross-country variation in the size distribution of firms. Alfaro et al. (2009) use establishment-level data for 79 countries to calibrate a closed economy version of Melitz (2003), which they use to infer the level of distortions necessary to generate the observed deviation in the size distribution of establishments with respect to the US. Our exercise is different, since we take direct measures of distortions from the data rather than inferring them, and look at their relationship with the size distribution of firms. Poschke (2014) uses the Global Entrepreneurship Monitor (GEM) and Amadeus datasets for around 50 countries to document that the average, standard deviation, and skewness of the size distribution of firms are positively correlated with income per capita. His empirical findings are consistent with ours, since he shows a strong cross-country negative relationship between income and the importance of small production units in the economy. He then constructs a model of occupational choice with skill-biased change in entrepreneurial technology that can account for these facts. Busso et al. (2013) apply the Hsieh and Klenow (2009) methodology to 10 Latin American countries using establishment-level data produced by the countries' statistical offices. They find that dispersion in the marginal products of capital and labor accounts for the low measured productivity in the manufacturing sector of these countries. Finally, in a very recent work, Bento and Restuccia (2015) construct a new dataset containing information on the size distribution of establishments in 134 countries. They find a clear relationship between income per capita and the prevalence of small production units.
They rationalize this finding with a version of Hsieh and Klenow (2009) with endogenous entry and productivity investment in which firms' idiosyncratic distortions, i.e., wedges, are positively correlated to physical productivity.
Our paper provides additional evidence about the cross-country differences in size distribution, and emphasizes that a poor regulatory environment is behind the excessive amount of resources allocated to small plants in developing economies. This result is consistent with the cross-country implications of a recent influential literature that uses theoretical frameworks to quantitatively measure the aggregate effects of the presence of distortions. 4 This literature shows that the existence of distortions diverts the economy from its first best. In particular, distortions cause too many resources to be allocated to small unproductive firms, generating big output losses. Guner et al. (2008) show that policies that reduce the average size of establishments by 20 % lead to reductions in output up to around 8 %. Hsieh and Klenow (2009) find that removing distortions in India and China such that marginal products are equalized to the extent observed in the US would imply TFP gains of up to 50 % in China and up to 60 % in India. García-Santana and Pijoan-Mas (2014) show that removing a particular size-dependent policy in India, the Reservation Laws, would imply a TFP gain of 2 % in the Indian manufacturing sector. 5

Our paper is also related to recent works that investigate the relationship between finance and misallocation. The quantitative impact of financial frictions on aggregate productivity varies across studies, since it depends on the framework used. Erosa and Hidalgo-Cabrillana (2008) show that financial frictions can generate misallocation of resources both across entrepreneurs of different talent and across industries with different needs of external financing. Amaral and Quintin (2010) present a model of occupational choice in which poor enforcement can generate big output losses by increasing the use of less productive technologies. Buera et al. (2011) calibrate a two-sector version of Lucas (1978) to the US economy, showing that financial frictions can generate TFP losses of up to 40 %. Caselli and Gennaioli (2013) study a model in which poor contract enforceability is associated with the prevalence of dynastic family firms, which reduces TFP. Moll (2014) shows that, when firms' productivity shocks are persistent, TFP steady state losses due to financial frictions are small, since entrepreneurs can overcome financial constraints by self-financing using their own wealth. Using Korean plant-level data, Midrigan and Xu (2014) show that most of the income losses due to financial frictions stem from distortions associated with entry and technology adoption decisions. In these works, the presence of financial frictions is often associated with a higher presence of unproductive small entrepreneurs and hence lower average firm sizes. This relationship is explicitly quantified by Quintin (2008), who shows that a model of limited enforcement calibrated to the US, by matching the measures of access to finance of some developing countries, can replicate the observed differences in average firm size between the US and Argentina and Mexico. Our paper contributes to this literature by presenting cross-country evidence of this relationship.

Enterprise surveys of the world bank
The ESWB are a collection of plant-level surveys meant to be representative of a country's non-agricultural private formal economy. The goal of these surveys is to collect information about the business environment and how it affects the performance of plants across developing countries. The data is collected from business owners and top managers. They include number of employees, amount of capital employed, sales, type of ownership, perception of corruption, finance, and obstacles to growth, among other plant characteristics.
Although the ESWB are compelling from the country coverage point of view, the number of observations per country is usually small. Typically between 1200 and 1800 establishments are interviewed in large economies, 360 in medium-sized economies and 150 in small economies. While representative pictures of the aggregates can be computed by using the provided population weights, the dataset is of limited use for constructing representative measures at a high level of disaggregation. This is why we take the country-level regressions as our preferred specifications, and carry out some country-(broadly defined) sector analysis as a robustness exercise.
We use the Standardized Data covering the period 2006 to 2010. 6 Our sample consists of 104 countries, most of them of low and middle income per capita. 7 Table  9 in the "Appendix 1" lists the countries included as well as some characteristics of them. Per capita GDP of percentiles 25, 50 and 75 in our sample are respectively 3, 10 and 24 % that of the US.
The ESWB do not cover either informal firms or establishments with fewer than 5 employees. Both issues work against finding a positive relationship between economic distortions and more labor in small firms. In a simple regression of the share of labor in small plants on economic distortions, the coefficient on distortions is biased downwards, because informal firms are generally smaller, and they are more prevalent in countries with higher economic distortions. Despite this issue, we still find a significant positive correlation between economic distortions and the amount of labor in small plants. Moreover, we observe the degree of competition stemming from informal firms, hence we introduce it as an additional control.
Micro data not targeting firms under a certain size is a common shortcoming in the literature on the size distribution. For instance, Alfaro et al. (2009) truncate the data at 20 employees, as countries with low coverage in their database are very likely to over-represent older and larger establishments. Also, Hsieh and Klenow (2009) work with Indian plants of more than 10 workers and with non-state Chinese firms of more than 5 million yuan in revenue. The dataset used by Poschke (2014), the Global Entrepreneurship Monitor (GEM), includes very small establishments. This survey targets households instead of firms in order to identify entrepreneurship. Its main limitations are that coverage is not very wide and that it does not contain information about large firms. Poschke (2014) overcomes the latter problem by merging GEM with Amadeus, which provides good coverage of large firms. Table 1 shows the main characteristics of the datasets used for cross-country analysis of the size distribution.

Table 1 The main characteristics of different datasets used to perform cross-country analysis of the size distribution of production units

External validation of enterprise surveys
Most of our sample is comprised of countries with very low levels of income per capita. This raises a concern regarding the accuracy of the measurement of employment in our data, as surveys may be less reliable for poorer countries. We address this issue by performing an external validation of the ESWB data, comparing it to a widely used aggregate dataset, the Penn World Table 7.0 (PWT), see Heston et al. (2011). We proceed as follows. For each country in our sample, we estimate the total number of workers in the sectors targeted by the Enterprise Surveys. The sampling methodology of the surveys is random sampling with replacement, and weights are provided for each observation. 8 We estimate the total number of workers in the ESWB by multiplying the number of employees of each plant by the associated weight and summing over plants (see Table 10 in the "Appendix 2" for a definition of all variables used in the paper). Then, we compare this number with the total number of workers in the country reported by the PWT. Panel A of Fig. 2 shows that there is a high cross-country correlation between both estimates (0.69), suggesting a fair degree of accuracy of the Enterprise Surveys data.
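The validation step described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the country labels, plant records and economy-wide worker counts are made up, and taking logs before computing the correlation is an assumption (the text reports the correlation without specifying the transformation for Panel A).

```python
import math

def weighted_total_employment(plants):
    """Estimated total workers: sum over plants of employees * sampling weight."""
    return sum(employees * weight for employees, weight in plants)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical survey records per country: (employees, sampling weight).
surveys = {
    "A": [(10, 50.0), (200, 4.0)],
    "B": [(8, 300.0), (120, 20.0)],
    "C": [(30, 400.0), (500, 16.0)],
}
# Hypothetical economy-wide worker counts standing in for the PWT figures.
pwt_workers = {"A": 4000.0, "B": 9000.0, "C": 60000.0}

eswb_totals = {c: weighted_total_employment(p) for c, p in surveys.items()}
countries = sorted(surveys)
log_eswb = [math.log(eswb_totals[c]) for c in countries]
log_pwt = [math.log(pwt_workers[c]) for c in countries]
corr = pearson(log_eswb, log_pwt)  # cross-country correlation of the two estimates
```

The survey totals are expected to fall short of the economy-wide counts, since the surveys exclude agriculture, informal firms and very small establishments; what the exercise checks is whether the two series move together across countries.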
Nevertheless, we still find some dispersion between the number of workers computed from both data sources, and this dispersion widens in countries with a lower number of workers. The ESWB do not target all economic sectors; agriculture, for instance, is excluded. By contrast, the PWT takes into account overall employment (including agriculture). For this reason, we do a second exercise consisting of regressing the log number of workers provided by the PWT against the log number of workers estimated from the ESWB, adding the share of employment in agriculture as a control. Panel B of Fig. 2 shows the partial correlation (net of employment in agriculture) of the number of workers in the ESWB and the PWT. The dispersion is significantly reduced.
There are a few countries that depart from the straight line, though. For the sake of transparency, we decided not to drop them from our analysis. Nevertheless, as shown later, excluding them makes the results stronger. 9 Overall, this comparison speaks in favor of the quality of employment data of the Enterprise Surveys.

Fig. 2 Number of workers in the ESWB and in the Penn World Table 7.0 (PWT). a displays the raw correlation. b shows the correlation controlling for the size of the agricultural sector

Size distribution of plants and productivity at the aggregate level
In this section, we provide evidence of a cross-country negative relationship between aggregate productivity and a size distribution of plants skewed towards small establishments. Our aim is to show that the plants' size distribution is an important object for understanding the variation in aggregate productivity. This motivates the analysis of the next section, in which we explore the determinants of the cross-country heterogeneity in the amount of labor employed in small plants. 10 We consider two measures of aggregate productivity: TFP-computed as in Caselli (2005)-and labor productivity-GDP per worker. 11 We explore the relationship between aggregate productivity and two statistics of the size distribution: the average plant size and the share of employment accounted by plants of different size.
We start our analysis by looking at the cross-country relationship between average plant size and aggregate productivity. We classify countries in different groups according to their aggregate productivity. For TFP, we split countries in two groups (below and above the median) and for GDP per worker we consider three groups of the same size. We then compare the mean across countries of the average plant size between the different categories of aggregate productivity. 12 We find that plants are on average considerably larger in countries with a level of TFP above the median, see Panel A of Table 2. In particular, we find that the average plant size in countries with a high level of TFP is around 30 % higher than in countries with low TFP. The same pattern is found in Panel B, where we compare countries across different levels of labor productivity. We observe a monotonic positive cross-country relationship between the average plant size and the level of GDP per worker. In particular, the average plant size in countries with medium and high labor productivity is around 25 percent higher than in countries with low labor productivity, and these differences are statistically significant.

10 In "Appendix 3" we analyze the relationship between size and productivity at the micro level. The results are in line with a broad literature that has documented the positive association between firm size and productivity, see, for instance, Leung et al. (2008), Bernard et al. (2003), Van Ark and Monnikhof (1996), and Little (1987).

11 Our measure of TFP is $\mathrm{TFP} = y / (k^{1/3} h^{2/3})$, where y = real GDP per worker in international dollars (PWT 6.1); k = capital-labor ratio (PWT 6.1) and h = average human capital computed using Barro and Lee (2001).

12 There are 47 countries in ESWB with data on TFP and 99 with data on GDP per worker. The 5 countries without data on GDP per worker are assigned to a group according to their level of income per capita using the classification of the World Bank.

Table 2 The mean across countries of the average log size of establishments for different levels of aggregate productivity relative to the US. Panel A uses TFP, estimated as in Caselli (2005), and Panel B uses GDP per worker. The 47 countries in the ESWB with data on TFP are divided in two groups: below and above median TFP. The 99 countries in our sample with data on GDP per worker are classified in three groups of the same size: low, medium and high labor productivity. The 5 countries without data on GDP per worker are assigned to a group according to their level of income per capita using the classification of the World Bank. Standard deviations are in parentheses. Columns (3) and (7) display the P values of a one-sided test with null hypothesis being that the average of the low group is lower than the high group, allowing for unequal variances

Next, we compute for each country the amount of labor allocated to plants of different size. In particular, we calculate the share of labor allocated to small (less than 20 employees) and large plants (100 or more employees). That is, we compute:

$$S^{s}_{c} = \frac{\sum_{i=1}^{N_c} \omega_i L_i \, \mathbb{1}\{L_i < 20\}}{\sum_{i=1}^{N_c} \omega_i L_i} \qquad (1)$$

$$S^{l}_{c} = \frac{\sum_{i=1}^{N_c} \omega_i L_i \, \mathbb{1}\{L_i \geq 100\}}{\sum_{i=1}^{N_c} \omega_i L_i} \qquad (2)$$

where $S^{s}_{c}$ and $S^{l}_{c}$ are the shares of employment allocated to small and large plants in country c, respectively; i denotes an establishment, $N_c$ is the number of establishments surveyed in country c, and $\mathbb{1}\{\cdot\}$ is an indicator function. $L_i$ is the number of employees and $\omega_i$ corresponds to the weight associated to establishment i.
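As an illustration of how these weighted shares can be computed from plant-level records, the following Python sketch uses the size thresholds stated in the text (fewer than 20 employees for small plants, 100 or more for large plants). The function name and the toy records are hypothetical and not taken from the paper's code or data.

```python
def employment_shares(plants, small_below=20, large_from=100):
    """Return (share in small plants, share in large plants) for one country.

    `plants` is a list of (employees, sampling_weight) pairs. Each plant
    contributes employees * weight to the country's estimated employment.
    """
    total = sum(l * w for l, w in plants)
    small = sum(l * w for l, w in plants if l < small_below)
    large = sum(l * w for l, w in plants if l >= large_from)
    return small / total, large / total

# Hypothetical country with four surveyed plants: (employees, weight).
toy_country = [(5, 10.0), (15, 4.0), (50, 2.0), (300, 1.0)]
share_small, share_large = employment_shares(toy_country)
```

In this toy example the weighted employment is 50 + 60 + 100 + 300 = 510, so the small-plant share is 110/510 and the large-plant share is 300/510. Plants with 20 to 99 employees enter only the denominator, which is why the two shares need not sum to one.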
Panel A of Table 3 shows that countries with lower levels of TFP allocate more employment to small plants. In particular, we observe that, on average, countries with TFP below the median allocate 17 % of employment to small establishments, whereas in countries with TFP above the median, this share is just 9 %. Not surprisingly, this pattern is reversed when looking at the percentage of workers employed in large plants. In countries with high TFP, 70 % of employment is accounted for by large establishments, whereas this figure is just 58 % in countries with low TFP. 13 The same picture emerges when we use GDP per worker as a measure of productivity. Panel B of Table 3 shows that countries with low GDP per worker allocate on average 20 percent of employment to small plants, whereas countries with high GDP per worker allocate just 14 %. The opposite result arises when we look at large plants. These account for more than 60 % of total employment in countries with high GDP per worker and only 54 percent in countries with low GDP per worker. 14

Table 3 The average share of employment accounted for by small and large establishments across countries according to their level of TFP (Panel A) and labor productivity (Panel B). TFP is computed as in Caselli (2005). Labor productivity is calculated as GDP per worker. The 47 countries in our sample with data on TFP are divided in two groups according to their TFP below or above the median (low and high). The 99 countries in our sample with data on GDP per worker are divided in three groups of the same size according to their GDP per worker (low, medium and high). The 5 countries without data on GDP per worker are assigned to a group according to their level of income per capita, using the World Bank classification. Standard deviations are in parentheses

In Fig. 3 we show the linear correlation between the share of employment accounted for by small plants and aggregate productivity (TFP in Panel A and GDP per worker in Panel B). A one standard deviation increase in the share of employment accounted for by small plants is associated with a .35 decrease in TFP and a .23 decrease in GDP per worker. These relationships are statistically significant in both cases. Specifically, both linear equations look as follows:

[Equations (3) and (4): fitted linear relationships for TFP and GDP per worker, respectively]

This negative cross-country association between productivity and the share of employment allocated to small plants is consistent with previous works. There is a large literature that shows that low income countries tend to allocate more employment to small production units, see Tybout (2000). Using recent data, Poschke (2014) documents a positive correlation between firms' average employment and per capita GDP, a fact that is also present in our data. 15 Given the strong positive cross-country association between income per capita and productivity, productivity and the share of employment allocated to small plants are then expected to be negatively correlated.
In the next section, we emphasize that economic distortions can significantly explain why some countries allocate more resources to small plants.

Size distribution and distortions
In this section, we provide evidence on the relevance of economic distortions in explaining the cross-country variation in the plants' size distribution. We perform OLS regressions of the share of employment accounted by small plants (which in our specification summarizes the size distribution of a country) on economic distortions and several controls.
We start with the Ease of Doing Business Index as a proxy for economic distortions. This index provides objective measures of the business regulations that local firms face, covering a wide range of topics such as entry costs, government bureaucracy, financial conditions, taxes, contract enforcement, etc. One main advantage of this index is that it covers many countries in the developing world, for which statistics are usually hard to obtain. Also, it is based mostly on quantitative measures, hence cross-country comparisons are more meaningful. In its original form, the index is a ranking of economies from best to worst business environment, ranging in our sample from 13 to 183 (the country with lowest economic distortions and the country with highest, respectively, see "Appendix" Table 9). We normalize it so that a higher value of the index means a better business environment and to have unit standard deviation in order to ease the interpretation of the coefficients. Later on we focus on the specific distortions that potentially underlie the results that we find on the overall business environment.
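The normalization of the index can be sketched as follows. This is only one way to implement the description in the text: the ranking is flipped so that higher values mean a better business environment and then divided by its standard deviation. Subtracting from the worst rank, and using the population rather than the sample standard deviation, are assumptions; the text does not specify either choice, and the ranks below are illustrative apart from the sample endpoints 13 and 183.

```python
import statistics

def normalize_ranking(ranks):
    """Flip a best-to-worst ranking and rescale it to unit standard deviation.

    Assumption: the flip is done as (worst rank - rank); any affine flip
    gives the same regression slope once the index has unit SD.
    """
    sd = statistics.pstdev(ranks)  # population SD (an assumption)
    worst = max(ranks)
    return [(worst - r) / sd for r in ranks]

# Illustrative ranks; 13 and 183 are the sample endpoints quoted in the text.
ranks = [13, 50, 100, 183]
index = normalize_ranking(ranks)
```

After this transformation a regression coefficient on the index reads as the change in the dependent variable associated with a one standard deviation improvement in the business environment.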
Certainly, there are many additional factors beyond the business environment that affect the size distribution of plants across countries. These factors generate inconsistent estimates of the causal relationship of local distortions on the size distribution, as long as they simultaneously affect the business environment and the size distribution through a channel independent from the business environment. This is probably the case, as the regulatory environment of a country is the result of numerous discretionary decisions, complex economic forces, and natural conditions, which very likely have an additional independent effect on the size distribution. For instance, foreign and export firms are on average larger than domestic firms, see for instance Bernard et al. (2007), Eaton et al. (2011) and García-Santana (2014), and the presence of these types of firms is probably explained by factors that go beyond the prevailing regulatory conditions (e.g. entrepreneurship, cultural and political distance to rich countries, comparative advantage in tradable sectors, etc.). As controlling for all these factors is difficult (some of them are unobservable), causal effects of the business environment on the size distribution in this cross-country framework are hard to establish and we favor a prudent predictive interpretation of our results. Nonetheless, to attenuate these endogeneity concerns and to provide a specification able to explain as much as possible of the variation in the size distribution across countries, we include several variables aimed at controlling for these factors and run instrumental variables regressions to check the robustness of the results.
Our set of controls includes the presence of informal, export and foreign firms, the size of the country and human capital. Our choice of controls is based on economic theory and the availability of data. The latter allows us to estimate different specifications on a stable sample, which guarantees transparency and eases the interpretation of the results. Accounting for informality allows us to correct a negative bias on the coefficient of economic distortions, as countries with a worse business environment have more informal firms and these tend to be very small. As formerly stated, export and foreign firms are larger on average than firms selling only to the domestic market, thus including these controls prevents a positive bias on the coefficient we are interested in. Also, higher human capital is positively associated with firms of larger size, certainly through channels that go beyond its effect on economic regulations (through technology adoption, for instance). Finally, the size of the country might affect both the optimal size of plants and the emergence of institutions that shape the business environment.
Our OLS regressions take the following form:

$$S^{s}_{c} = \beta_0 + \beta_1 \text{Distortions}_c + \beta_2 \text{Informality}_c + \beta_3 \text{Log Population}_c + \beta_4 \text{\%Foreign Firms}_c + \beta_5 \text{\%Export Firms}_c + \beta_6 \text{Human Capital}_c + u_c \qquad (5)$$

where the dependent variable, $S^{s}_{c}$, is the share of employment allocated to small plants in country c, as computed in Eq. (1). The level of distortions is captured by the Doing Business Index and that of informality by the percentage of plants that report that competition from informal firms is a major or severe obstacle to growth. Additional controls include the percentage of foreign and export firms, as well as the average number of years of experience of managers in the country, which proxies for human capital and managerial talent. See Table 10 in the "Appendix 2" for further details on these variables.
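To make the estimation step concrete, the sketch below fits an OLS regression of this form by solving the normal equations in plain Python. It is a simplified illustration, not the authors' code: it uses only two of the six regressors (the normalized index and informality), the data are synthetic and generated without noise so that the coefficients are recovered exactly, and it does not compute the robust standard errors reported in the paper.

```python
def ols(X, y):
    """OLS coefficients [constant, b1, b2, ...] via the normal equations.

    X is a list of regressor rows (without the constant), y the outcomes.
    Solves (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic countries: (normalized business-environment index, informality).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
# Hypothetical data-generating process: share = 0.30 - 0.03*index + 0.02*informality.
y = [0.30 - 0.03 * d + 0.02 * inf for d, inf in X]
beta = ols(X, y)
```

With real data one would add the remaining controls of Eq. (5) as extra columns of `X` and use a statistics package that reports heteroskedasticity-robust standard errors.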
The results of the estimation of Eq. (5) are shown in Table 4. In column (1) we show that countries with a better business environment have on average a significantly lower share of labor allocated to small plants. A one standard deviation improvement in the business environment is associated with a 3.0 percentage point decrease in the share of labor allocated to small plants. This is 0.24 standard deviations of the dependent variable. Column (2) adds informality as an additional covariate. As expected, the estimated decrease becomes larger, at 3.7 percentage points. It shrinks to 2.9 percentage points when we add a full set of controls in column (3), but it remains quantitatively large and statistically significant at 1 percent. The point estimate indicates that, conditional on the controls, a country improving its business environment from the worst to the best value in our sample is associated with 10.4 percentage points less labor allocated to small plants. According to Eq. (3), this means an improvement of TFP relative to the US of 8.2 percentage points. The controls have the expected signs and most of them are statistically significant. Countries with more export and foreign firms as well as managers with more experience allocate less labor to small plants. Also, bigger countries assign more resources to larger firms. The specification of column (3) accounts for 42 % of the variation in the size distribution across countries.
In column (4) we explore the sensitivity of these results to dropping the countries that did not pass the validation exercise carried out in Sect. 3.1. The surveys of these countries are suspected of being less representative of the population of plants. When we exclude them from the regression, the coefficient associated with the business environment becomes quantitatively larger. A country going from the highest to the lowest economic distortions would be associated with 12.4 percentage points less labor allocated to small plants and 9.8 percentage points higher TFP relative to the US. 16 In Fig. 4 we show the correlation between each covariate and the dependent variable once the effect of the remaining covariates is controlled for (partial correlations). It is reassuring that no outliers drive the results, especially the relationship between economic distortions and the size distribution.

Table 4 Relationship between economic distortions and size distribution. Regressions of the share of employment accounted for by small firms on economic distortions, captured by the Doing Business Index, and several covariates. A higher value of the index means lower economic distortions. Columns (1) to (4) report OLS regressions, whereas columns (5) and (6) report IV regressions. Column (1) includes the Doing Business Index as the only covariate. Column (2) adds informality as a control. Column (3) adds additional controls. Column (4) excludes the countries suspected of not having a World Bank survey representative of the population of plants, as documented in Sect. 3.1. These are Angola, Lesotho, Nigeria, Samoa and Tonga. Column (5) instruments the business environment with distance from the equator, the percentage of the population speaking the main European languages, and the predicted trade share constructed from a gravity model, see Hall and Jones (1999). Column (6) instruments the business environment with log settler mortality of European colonizers, see Acemoglu et al. (2001). Robust standard errors are in parentheses. Significance levels: * 10 %; ** 5 %; *** 1 %.
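To make the simplest specification concrete (column (1), with the Doing Business Index as the only covariate), the following is a minimal sketch of a bivariate OLS regression with heteroskedasticity-robust standard errors in plain Python. The data are invented for illustration and do not come from the Enterprise Surveys. The function names `ols_robust` and `sample_sd` are hypothetical, and the HC1 correction is an assumption: the text only states that robust standard errors are reported.

```python
import math

def ols_robust(x, y):
    """Bivariate OLS of y on x with an intercept.

    Returns (intercept, slope, robust_se_slope). The standard error uses
    the HC1 heteroskedasticity-robust formula (an assumed choice).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    sxy = sum(d * (yi - y_bar) for d, yi in zip(dx, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 "sandwich" variance of the slope, scaled by n / (n - 2) for HC1.
    hc0 = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    robust_se = math.sqrt(hc0 * n / (n - 2))
    return intercept, slope, robust_se

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# Hypothetical data: a normalized business-environment index (higher means
# fewer distortions) and the share of employment in small plants, in percent.
index = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
small_share = [48.0, 41.0, 44.0, 35.0, 33.0, 27.0]

intercept, slope, se = ols_robust(index, small_share)
# Effect of a one-standard-deviation improvement in the index.
one_sd_effect = slope * sample_sd(index)
print(f"slope = {slope:.2f} (robust s.e. {se:.2f})")
print(f"effect of a one-s.d. improvement = {one_sd_effect:.2f} points")
```

Multiplying the slope by the standard deviation of the regressor is how a statement of the form "a one standard deviation improvement is associated with an x percentage point decrease" is usually obtained.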
In columns (5) and (6) we address the issue of endogeneity that arises when studying the relationship between economic distortions and the size distribution. We instrument the Doing Business Index with variables that have been argued in the literature to cause exogenous variation in institutions. Specifically, we use the instruments proposed by Hall and Jones (1999) in column (5) and by Acemoglu et al. (2001) in column (6). 17 The instruments of Hall and Jones (1999) are based on the extent of Western European influence around the world, which correlates with geographic characteristics of a country as well as language. Specifically, the instruments are distance from the equator, the extent to which the primary languages of Western Europe (English, French, German, Portuguese, and Spanish) are spoken as first languages today, and the predicted trade share based on a gravity model of international trade, constructed by Frankel and Romer (1999). 18 The instrument proposed by Acemoglu et al. (2001) is based on a theory of institutional differences among countries colonized by Europeans. Their proposal is to use European mortality rates during the period of colonization as a source of exogenous variation in institutions. 19 We find that the coefficient on the Doing Business Index increases to 6.8 percentage points when we use the instruments of Hall and Jones (1999), roughly twice as large as the OLS estimate, see column (5) of Table 4. When we use the instrument proposed by Acemoglu et al. (2001), the effect of economic distortions is even higher and bears the expected sign, although it is less precisely estimated (the p-value is 0.11). Probably, this is due to the significant drop in the number

16 We also tried including alternative controls with high data availability, such as log area, internal distance, openness, foreign direct investment and education of the labor force, and obtained similar results.
17 A similar IV strategy is used by Barseghyan (2008), who studies the cross-country relationship between entry costs and TFP. He finds big effects: an increase of half a standard deviation in entry costs is associated with a 22 % lower TFP.

18 It is argued that Western influence leads to better institutions today, for instance through the ideas of Adam Smith, the importance of property rights, etc. The positive correlation between European languages and Western influence seems reasonable. Distance to the equator is argued to be correlated with Western influence for two reasons. First, Western Europeans were more likely to migrate and settle in regions of the world that were sparsely populated, which are those far from the equator. Second, they were more likely to migrate to regions with a similar climate, which again points to regions far from the equator. Regarding the exclusion restriction, it can be argued that Europeans did not systematically conquer areas of the world that today exhibit better economic outcomes. See Hall and Jones (1999) for a more detailed explanation.

19 This theory rests on three premises. First, there were different types of colonization policies, which created different sets of institutions, ranging from "extractive states" (extractive institutions) to "Neo-Europes" (replications of European institutions). Second, the colonization strategy was influenced by the feasibility of settlements: places with unfavorable disease environments were more likely to develop extractive institutions. Third, the colonial state and institutions persisted even after independence. The exclusion restriction implies that, conditional on controls, the mortality rates of European settlers have no effect on the size distribution today.
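The logic of the instrumental-variable regressions can be sketched in a stripped-down setting: one endogenous regressor, one instrument, and no controls. This is closer in spirit to column (6), which uses a single instrument, than to column (5), which uses three, and it omits the covariates of Eq. (5). The data and the function names `cov`, `iv_slope` and `two_stage_slope` are hypothetical. The sketch only shows that the ratio-of-covariances estimator and the explicit two-stage procedure give the same number.

```python
def cov(a, b):
    """Sample covariance (the n - 1 divisor cancels in the ratios below)."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def iv_slope(z, x, y):
    """Just-identified IV estimate of the effect of x on y, using z."""
    return cov(z, y) / cov(z, x)

def two_stage_slope(z, x, y):
    """The same estimate computed in two explicit stages."""
    # First stage: project the endogenous regressor x on the instrument z.
    b1 = cov(z, x) / cov(z, z)
    a1 = sum(x) / len(x) - b1 * sum(z) / len(z)
    x_hat = [a1 + b1 * zi for zi in z]
    # Second stage: regress the outcome y on the fitted values of x.
    return cov(x_hat, y) / cov(x_hat, x_hat)

# Hypothetical data: z is the instrument, x the business environment index,
# y the share of employment in small plants.
z = [0.0, 1.0, 2.0, 3.0]
x = [1.0, 2.0, 4.0, 5.0]
y = [9.0, 9.0, 3.0, 3.0]

print(iv_slope(z, x, y), two_stage_slope(z, x, y))
```

With several instruments and additional controls, as in column (5), the first stage becomes a multiple regression, and the standard errors must be computed from the structural residuals rather than from the second-stage regression.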

Robustness: sectoral decomposition of size distribution
In this section we address the issue of the sectoral composition of activity. Countries specialize in different sectors, and firms in different sectors are of different size. If there is a correlation between economic distortions and the sectoral composition of activity, the results of the previous section might be contaminated (for example, if firms in the manufacturing sector are relatively larger and distortions are associated with a lower share of manufacturing in overall activity). Table 5 shows statistics of the size distribution of plants across sectors. We consider four sectors: manufacturing, construction, trade and services. Column (1) shows that there are big differences in average plant size across sectors. Scales of production are much larger in manufacturing and construction than in trade and services. For instance, average size in manufacturing is almost 60 log points higher than in trade. These differences are also observed in the amount of labor allocated to plants of different size. Manufacturing and construction have a lower share of labor working in small plants than services and trade. 21 This suggests that the sectoral composition of activity is a relevant issue in explaining the aggregate allocation of labor across countries.
To control for the sectoral composition of activity and check the robustness of our previous results, we compute the share of employment accounted for by small plants at the country-sector level and run the following regression:

$$S^s_{cj} = \beta_0 + \beta_1\,\text{Distortions}_c + \beta_2\,\text{Informality}_{cj} + \beta_3\,\text{Log Population}_c + \beta_4\,\%\,\text{Foreign Firms}_{cj} + \beta_5\,\%\,\text{Export Firms}_{cj} + \beta_6\,\text{Human Capital}_{cj} + \sum_j \gamma_j + u_{cj} \qquad (6)$$

where $S^s_{cj}$ is the share of employment allocated to small plants in sector $j$ of country $c$ and the rest of the covariates are those of Eq. (5), with variation at the country-sector level when data are available. $\gamma_j$ are sector dummies corresponding to manufacturing, construction, services and trade. The inclusion of sector dummies controls for technological characteristics of each sector that affect the scale of production and hence the distribution of employment. Table 6 shows the results of estimating Eq. (6). The results are very similar to those found in the cross-country counterpart regressions of Table 4. In column (1) we observe that countries with a better business environment are associated with a lower amount of labor allocated to small firms at the country-sector level. This relationship is stronger when we add informality as a control, in column (2), and remains of similar magnitude when we add several covariates, in column (3). Finally, excluding the countries for which the quality of the data might be compromised strengthens the relationship found. Quantitatively, the point estimates of these country-sector regressions are of similar magnitude to those found in the cross-country regressions of Table 4.

21 These differences in the size distribution across sectors are statistically significant under a t test of equality of means. The only differences that are not statistically significant are the share of employment accounted for by small plants between manufacturing and construction and that accounted for by large plants between trade and services.
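The role of the sector dummies can be illustrated with a small sketch. By the Frisch-Waugh-Lovell theorem, the slope on a regressor in a regression that includes a full set of sector dummies equals the slope obtained after subtracting sector means from both the regressor and the outcome. The example below uses invented, deliberately unbalanced country-sector observations (not all sectors are observed at all values of the index) so that the pooled and within-sector slopes differ. The function names `pooled_slope` and `within_sector_slope` are hypothetical, and the other covariates of Eq. (6) are omitted.

```python
from collections import defaultdict

def pooled_slope(x, y):
    """OLS slope of y on x with an intercept, ignoring sectors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def within_sector_slope(sectors, x, y):
    """Slope on x in a regression of y on x plus sector dummies.

    Computed by demeaning x and y within each sector (Frisch-Waugh-Lovell).
    """
    groups = defaultdict(list)
    for s, xi, yi in zip(sectors, x, y):
        groups[s].append((xi, yi))
    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        x_bar = sum(o[0] for o in obs) / len(obs)
        y_bar = sum(o[1] for o in obs) / len(obs)
        sxy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs)
        sxx += sum((xi - x_bar) ** 2 for xi, _ in obs)
    return sxy / sxx

# Hypothetical observations: within each sector the small-plant share falls
# by 2 points per unit of the index, but trade has a much higher level.
sectors = ["manufacturing"] * 3 + ["trade"] * 3
index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
small_share = [30.0, 28.0, 26.0, 54.0, 52.0, 50.0]

print("pooled slope:", pooled_slope(index, small_share))
print("within-sector slope:", within_sector_slope(sectors, index, small_share))
```

In this constructed example the pooled slope has the wrong sign because differences in sector levels are mixed with the within-sector relationship, which is the kind of contamination that sector dummies are meant to remove.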

A look at particular distortions
In the previous section, we analyzed the relationship between economic distortions (captured by the Doing Business Index) and the share of employment allocated to small plants. As mentioned above, the Doing Business Index is a composite index that accounts for several features of the business environment. In this section, we look at particular distortions in order to shed light on the specific policies that drive the previous results. In particular, we focus on access to finance, taxes, cost of entry, ease of exporting, rule of law, and corruption.
We capture the availability of financial credit by computing from our micro data the percentage of firms in each country that have neither a line of credit nor a loan and report being in need of capital. As mentioned in Sect. 2, many works emphasize the importance of financial frictions in explaining cross-country differences in aggregate productivity. The mechanism through which financial frictions can generate misallocation, and hence TFP losses, is straightforward. Suppose that there are poor and rich individuals, and that both rich and poor can be talented or untalented. Without full contract enforcement, poor talented people will either not operate at all or will do so at too small a scale. They will not be able to obtain enough resources from financial markets to achieve their optimal size. On the other hand, rich entrepreneurs will be able to finance themselves using their own resources. Then, if the correlation between wealth and talent is not one, misallocation of entrepreneurial talent, labor, and capital arises in equilibrium. In this kind of situation, the aggregate demand for labor and hence equilibrium wages are inefficiently low, implying too low an average plant size and too many resources allocated to small plants.
The quantitative effect of taxation in a context of heterogeneous producers has also been studied in the literature, for instance by Guner et al. (2008). Government policies that promote the existence of small, less productive firms by levying taxes on large ones can generate big TFP and output losses. This is because taxing large firms makes the aggregate demand for labor, as well as the equilibrium wage, inefficiently low. In such a situation, unproductive entrepreneurs can afford to operate, biasing the size distribution of plants towards small production units. We measure taxes as the percentage of commercial profits taxed by the public administration.
Our proxy for the cost of entry is the cost of business start-up procedures as a percentage of gross national income per capita. As noted by de Soto (1989), barriers to entry aim at protecting current producers. Taking advantage of the lack of competition, these producers are able to extract rents. As insiders, small unproductive firms prevent the entry of productive and potentially large ones. If productive firms cannot enter, wages remain low, allowing small unproductive producers to keep operating. 22

A large recent literature has documented the fact that export firms are bigger and more productive than domestic firms, see, for instance, Clerides et al. (1998), Aw et al. (2000), and Bernard et al. (2007). Therefore, policies aimed at facilitating the process of exporting and importing have the potential to shift the size distribution towards big firms. The mechanism is twofold. On the one hand, a reduction in trade costs provides larger business opportunities for the most productive plants, which are able to grow, as in Melitz (2003). On the other hand, as shown by De Loecker (2013), by serving foreign markets, firms are able to learn better technologies and improve their productivity. We approximate the ease of conducting business with foreign agents with an index that computes the procedural requirements for exporting and importing a standardized cargo of goods.

22 Recent papers have emphasized the importance of entry costs in explaining differences in income levels and growth. Barseghyan and DiCecio (2001) and Herrendorf and Teixeira (2011) quantify the effects of entry costs on aggregate TFP and income in developing countries. Nicoletti et al. (2003) show that differences in the regulation of entry explain the productivity growth divergence between continental Europe and the US during the 1980s and the 1990s. Asturias et al. (2012) study the effect of the interaction between financial frictions and entry barriers on growth.
Finally, we also explore the relationship between the size distribution and the protection of property rights in a country (rule of law) as well as corruption. These components of the economic environment may act as constraints on firms' growth, see, for instance, Svensson (2003).

Table 7 shows the relationship between these features of the business environment and the amount of labor allocated to small plants. The regressions are the same as in Eq. (5), except that we replace the Doing Business Index with the particular distortions mentioned above. We find that, when included separately, access to finance and entry costs play a significant role in explaining the share of employment accounted for by small plants, as shown in columns (1) and (3). By contrast, lower taxes, a better rule of law, lower international trade costs, and better control of corruption do not appear to be significantly related to a lower amount of resources in small plants, although they bear the expected signs.
In column (1) of Table 8 we include all these features of the business environment simultaneously. It turns out that only financial constraints have significant explanatory power in accounting for the cross-country variation in the amount of labor employed by small plants. Conditional on the rest of the covariates, a one standard deviation reduction in the financial constraints faced by local firms (meaning that the percentage of financially constrained firms is reduced by 19.6 percentage points) is associated with a 3.5 percentage point decrease in the share of labor allocated to small firms. This translates into a TFP gain of 2.8 percentage points, according to Eq. (3). In the following specifications we check the robustness of this result. In column (2) we exclude the countries for which the quality of the data might be compromised, according to Sect. 3.1, and find that the coefficient on financial constraints increases in magnitude and is significant at 1 %. In columns (3) to (5) we add different proxies for access to credit. In column (3) we include the Getting Credit Index, which measures the strength of legal rights, the depth of credit information, and both the public and private coverage of credit histories of individuals. In column (4) we add the percentage of firms using banks to finance investment, and in column (5) the ratio of domestic credit to the private sector over GDP. All these variables enter significantly in the regressions: the better the financial conditions of a country, the lower the share of employment accounted for by small plants. Finally, in column (6) we instrument the financial constraints with the instruments of Hall and Jones (1999). The results show that the estimated effect is even larger than the OLS estimate: a one standard deviation improvement in financial conditions is associated with a 6.8 percentage point decrease in the amount of labor employed by small firms, and a TFP gain of 5.4 percentage points relative to the US.
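As a consistency check on the reported magnitudes, the sketch below collects the four pairs quoted in the text (a change in the share of labor in small plants and the associated TFP gain relative to the US) and computes the implied TFP gain per percentage point of labor reallocated. Eq. (3) itself is not reproduced in this section, so the near-constant ratio is only an inference from the reported numbers, not the authors' formula. The dictionary labels are descriptive shorthand introduced here.

```python
# Each entry: (percentage-point fall in the small-plant labor share,
#              reported TFP gain in percentage points relative to the US),
# as quoted in the text.
reported = {
    "OLS, worst to best business environment": (10.4, 8.2),
    "OLS, excluding flagged countries": (12.4, 9.8),
    "financial constraints, OLS, one s.d.": (3.5, 2.8),
    "financial constraints, IV, one s.d.": (6.8, 5.4),
}

# Implied TFP gain per percentage point of labor moved out of small plants.
ratios = {label: tfp / labor for label, (labor, tfp) in reported.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.3f} TFP points per labor-share point")
```

All four ratios are close to 0.8, which suggests that the mapping in Eq. (3) is approximately linear over the range of changes considered.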
Table 7 Regressions of the share of employment accounted for by small firms on several components of the business environment: financial constraints, column (1); taxes, column (2); cost of entry, column (3); rule of law, column (4); trade easiness, column (5); and corruption, column (6). Robust standard errors are in parentheses. Significance levels: * 10 %; ** 5 %; *** 1 %.

Conclusions
In this paper we show cross-country empirical evidence of the relationship between aggregate productivity, the size distribution of plants, and economic distortions. Consistent with the recent literature on misallocation, we show that countries that allocate more resources to small plants are associated with lower levels of aggregate productivity, and that economic distortions can partly explain this excessive allocation of resources to small production units. After decomposing the set of economic distortions, we conclude that distortions related to the capacity of the economy to provide credit are the main driver of our results. Our findings open the door to further investigation of the specific mechanisms through which distortions affect the size distribution of firms. One of the main issues worth exploring is how the business environment affects the life cycle of plants. Looking at cross-country differences in how plants enter, grow, and exit would shed more light on the specific mechanisms through which economic distortions affect the allocation of resources and hence aggregate productivity. A recent work that looks at differences in the life cycle of plants across countries is Hsieh and Klenow (2014). They find that, whereas in the US surviving plants grow dramatically over time, this growth is much more moderate in Mexico and almost non-existent in India. We view the study of these differences in the life cycle of plants across countries as a promising avenue for future research.

Column (6) of Table 8 instruments the measure of financial constraints of column (1) with distance from the equator, the percentage of the population speaking the main European languages, and the predicted trade share constructed from a gravity model, see Hall and Jones (1999). Robust standard errors are in parentheses. Significance levels: * 10 %; ** 5 %; *** 1 %.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix 1: Countries included in dataset
See Table 9. Column (1) lists the countries included in our sample. Column (2) shows the three-digit ISO code. Column (3) corresponds to the year in which the latest Enterprise Survey was conducted. Column (4) shows the share of employment allocated to plants of less than 20 employees. Column (5) shows real GDP per capita with respect to the US. Column (6) shows the Doing Business Index as originally provided by the World Bank. Column (7) shows the normalized Doing Business Index.

Appendix 2: Data: definitions and sources
See Table 10. Definitions and sources of all variables used throughout the paper. When the source is the Enterprise Surveys of the World Bank (ESWB), the codes in parentheses in the definition correspond to the codes of the ESWB questionnaire.

Appendix 3: Size and productivity at the plant-level
In this section we analyze the relationship between size and productivity at the micro level in our dataset. We compare the productivity of plants of different size, controlling for country, sector, and several determinants of the efficiency of plants. Specifically, we control for the age of the plant, export status, foreign ownership status and the capital-labor ratio. These characteristics are likely to be associated with both higher levels of productivity and a larger firm size. Our measure of size is the number of permanent plus temporary employees employed by the plant. We define three types of plants according to their size: small plants are those with less than 20 employees, medium establishments are those with between 20 and 99 workers, and large establishments are those with 100 or more employees. We also consider a continuous measure of size, defined as the log number of employees. Productivity is computed as value added per worker. Value added is defined as revenue minus the cost of raw materials and electricity.

Table 11 Relationship between size and productivity: plant-level evidence. Firm-level regressions of log labor productivity on the size of the establishment. Observations are weighted to be consistent with the stratified random sampling procedure of the ESWB. Columns (1) to (3) measure size in three categories, small (excluded), medium and large establishments, defined as less than 20 employees, between 20 and 99 employees and more than 99 employees, respectively. Columns (4) to (6) measure size as the log number of employees. Sector dummies correspond to 2-digit ISIC Rev 3.1. Robust standard errors are in parentheses, clustered at the country level. Significance levels: * 10 %; ** 5 %; *** 1 %.
We run the following regression:

$$\log \frac{VA_{ijc}}{L_{ijc}} = \gamma_0 + \gamma_1\,\text{Size}_i + \gamma_2 \log \text{Age}_i + \gamma_3\,\text{Foreign}_i + \gamma_4\,\text{Export}_i + \gamma_5 \log \frac{K_i}{L_i} + \dots \qquad (7)$$

where $VA_{ijc}/L_{ijc}$ is value added per worker of plant $i$ in sector $j$ and country $c$; Size is either an indicator of whether plant $i$ is small, medium or large, or the log number of employees; log Age is the log number of years during which plant $i$ has been operating; Foreign is a dummy taking value one if plant $i$ is foreign owned; Export takes value one if plant $i$ exports and zero otherwise; and $K_i/L_i$ is plant $i$'s capital-labor ratio. Sector (2-digit ISIC) and country dummies are also included in all specifications. Observations are weighted according to the stratified random sampling procedure of the ESWB. Table 11 shows the results of estimating Eq. (7). Column (1) shows that, within countries and sectors, large establishments are, on average, 97 percent more productive than small plants, which is the excluded category. This difference is highly statistically significant and quantitatively large. In column (2), when conditioning on plant characteristics, the difference decreases to 86 %. Interestingly, export and foreign plants have a higher labor productivity conditional on size. In column (3) we include an additional firm characteristic, the capital-labor ratio, in order to control for substitution between production inputs. This tackles the concern that small plants might have lower labor productivity because they use capital less intensively. We find that, conditional on the capital-labor ratio, large firms exhibit a significantly higher labor productivity, of 72 % on average. In columns (4) to (6) we use a continuous measure of size: the log number of employees. The same qualitative result arises: larger establishments are significantly more productive than small plants, even when conditioning on plant characteristics.
A 10 % increase in the number of employees is associated with an increase in labor productivity of between 2 and 3 percent, depending on the specification.
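When both productivity and size enter in logs, the coefficient on size is an elasticity. The sketch below backs out the coefficient implied by the statement that a 10 % increase in employment goes with a 2 to 3 percent increase in labor productivity. The function name `implied_elasticity` is hypothetical, and the calculation is an interpretation of the reported magnitudes, not the estimates of Table 11.

```python
import math

def implied_elasticity(pct_change_size, pct_change_productivity):
    """Log-log coefficient implied by a pair of percentage changes."""
    return (math.log(1 + pct_change_productivity / 100)
            / math.log(1 + pct_change_size / 100))

# A 10 % increase in employees paired with a 2 % or a 3 % productivity gain.
low = implied_elasticity(10, 2)
high = implied_elasticity(10, 3)
print(f"implied coefficient on log employees: {low:.2f} to {high:.2f}")
```

For small changes the elasticity is close to the simple ratio of percentage changes, which is why coefficients in the range of roughly 0.2 to 0.3 match the statement in the text.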