Market versus endowment: explaining early industrial location in Italy (1871–1911)

This article aims to explain the location of manufacturing industries in Italy in the period 1871–1911. The analytical framework takes into account two competing theories on the determinants of the location of economic activity: the Heckscher–Ohlin (H–O) theory of factor endowments and the new economic geography (NEG) theory of market access. The methodology used here is based on Midelfart-Knarvik et al. (The location of European industry, European Economy Economic Papers 142. European Commission, 2000) and has seen several historical applications. The location of industries is explained through interactions between characteristics of the regions and characteristics of the sectors, of both H–O and NEG type. The main finding is that endowments, in particular energy and human capital, were the determinants of the geography of the first Italian industrialization. Market access, at this stage of industrialization, mattered only in its domestic formulation and only through economies of scale.


Introduction
Since its Unification, Italy has been characterized by profound regional disparities. At the time, regions already differed in both social and economic indicators, suggesting that the diverging trajectories originated well before 1861. The decades following Unification saw a consolidation of the North-South gap with the formation of the so-called Industrial Triangle between the three regions of the North-west of the country: Piedmont, Lombardy and Liguria. The "Questione Meridionale" (Southern Question) is as old as Italy, and it emerged immediately as one of the most pressing issues of the post-Unification period. At the turn of the century, the idea emerged that the poor economic conditions of the South represented an obstacle for the entire nation. Nitti (1900) claimed that the first Italian tax system redistributed wealth from South to North, impairing the economic development of the South; De Viti de Marco (1930) suggested that the South fell behind because it had been transformed into a colonial market for northern industrial goods; Salvemini (1955) believed that the delay of the South was due to inequalities in land ownership, which ought to have been addressed by a land reform. More recently, economic historians have proposed further explanations for the formation and persistence of regional inequality. For instance, Cafagna (1989) and Zamagni (1990) pointed to the backwardness of southern agriculture as the cause of the gap, a view later challenged by Federico (2007). In the 1990s, the work by Putnam et al. (1993) started a new line of research focusing on southern culture and institutions, which were regarded as less conducive to economic growth than northern ones. Human capital has also gained increasing attention as an explanatory variable, along with social capital (Felice 2012; Cappelli 2016, 2017).
Felice and Vasta (2015) identified as the origin of the divergence the failure of elites in the South to guide society through the stages of active industrialization, with progress taking place only through state intervention. Over time, differences in physical geography have also been seen as a possible cause of regional disparities: Fenoaltea (2014) focuses on differences in energy endowments from water, and Daniele and Malanima (2011) on proximity to the centre of Europe.
The present article aims to explain what drove the industrialization of the Italian regions in the period between Unification and the First World War by testing several of the above hypotheses in a unified model. The goal is to identify the determinants of the industrialization of regions within an analytical framework that takes account of both the Heckscher-Ohlin (H-O) theory on factor endowment and the new economic geography (NEG) theory on market access. The methodology used is theoretically grounded on the seminal work by Midelfart-Knarvik et al. (2000) on the location of European industries from 1970 to 2000. This methodology has also been applied to historical cases by Crafts and Mulatu (2006) on Britain, Wolf (2007) on Poland, Martinez-Galarraga (2012) on Spain, Klein and Crafts (2012) on the USA and Nikolić (2018) on Yugoslavia.
The methodology used here mainly follows Klein and Crafts (2012) and is quite straightforward: the dependent variable is the share of employment of each industrial sector in each region. This is explained by interactions between industry characteristics and regional characteristics of both the H-O and the NEG type, plus region and sector controls. The regional characteristics considered include market access, energy endowment, agricultural labour and human capital availability. Industry characteristics include measures of energy, labour and skill intensity as well as intermediate input use, sales to industry and mean plant size.
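To make the construction of the dependent variable and of one interaction regressor concrete, the Python sketch below computes ln(s_ik) and a single region-sector interaction from a toy table. All numbers and variable names are hypothetical, not taken from the paper's data. Whether the denominator of the share is total regional employment or manufacturing employment only is an assumption here; the toy simply passes regional totals explicitly.

```python
import numpy as np

# Hypothetical employment by region (rows) and manufacturing sector (columns).
employment = np.array([
    [120.0, 30.0, 50.0],   # region 0
    [40.0, 80.0, 80.0],    # region 1
])
# Hypothetical total employment of each region (assumed denominator of the share).
region_total = np.array([1000.0, 800.0])

# s[i, k]: share of sector k in the total employment of region i.
shares = employment / region_total[:, None]
log_shares = np.log(shares)          # dependent variable ln(s_ik)

# Hypothetical characteristics: a regional one (e.g. literacy rate) and an
# industry one (e.g. share of white-collar workers in the sector).
literacy = np.array([0.6, 0.3])
white_collar = np.array([0.05, 0.10, 0.02])

# Interaction regressor: literacy[i] * white_collar[k] for every region-sector cell.
interaction = np.outer(literacy, white_collar)

print(np.round(shares, 3))
print(np.round(interaction, 3))
```

The outer product is a convenient way to build every region-sector cell of an interaction at once; each further interaction (energy, credit, market potential) would be built the same way from its own pair of characteristics.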

This work makes several contributions. First, we bring a new perspective to the quantitative study of the Italian North-South divide. The development of the industrial sectors played a major role in the process of regional divergence, and studying how industries located in their early stages is informative about how the overall disparities among regions were formed in the first place. Also, by applying a well-established methodology to the Italian case, we provide the first quantitative test of several hypotheses regarding market and endowment forces, exploiting variation across industrial sectors and regions and making use of both regional and industrial characteristics. Moreover, the Italian case is particularly fruitful for the study of location decisions in early stages of industrialization and their persistence over the long run.
The main finding of this study is that the energy interaction is the most consistent of all in explaining industrial location, with positive and significant coefficients across all specifications. Human capital also gives positive and significant results. Market forces drive industrial location, through the interaction capturing economies of scale, only when domestic markets are considered.
The paper is organized as follows. Section 2 provides some historical background and literature on the economic development of the Italian regions. Section 3 explains the methodology for studying the determinants of industrial location. Section 4 provides an overview of the sources. Section 5 shows the empirical results for the Italian case. Section 6 compares our results with the existing literature. Section 7 concludes.

Historical background and literature
The years between the Unification of 1861 and the outbreak of the First World War are fundamental to understanding the dynamics behind Italy's economic development. At the time of Unification, the country was predominantly agricultural, with some early manufacturing in the North-west, in particular in the textile sector (Cafagna 1989). In this period, all Italian regions experienced industrial growth of some kind, and by 1911 all the modern sectors had been established to some extent (Zamagni 1990). This process was much smaller in scale than the big industrial boom of the 1960s and 1970s, when Italy became one of the most industrialized countries in the world. However, this first wave of industrialization deserves attention for several reasons. According to Felice (2015a), in spite of the limited growth experienced by the country between Unification and the Second World War (GDP per capita only doubled), the dynamics of this period created the premises for the economic boom that started in the 1950s. From a more pragmatic perspective, studying the period immediately after Unification and before the Fascist regime presents several advantages. First, all internal borders and tariffs were removed and the administrative and legal framework became the same for all regions, making them more comparable across periods. Second, the relatively limited state intervention in the decades preceding Fascism makes this period suitable for isolating the impact of geographical factors. 1 Moreover, many of the preconditions for growth took shape in this period, as did their uneven geographical distribution. This is when some sectors were established for the first time and some regions experienced their very first industrialization: choosing this period therefore makes it possible to partially avoid a path dependency bias that would later become more serious.
This paper tests geography, in both its first-nature and second-nature aspects, as the primary cause of the imbalanced process of industrialization. The issue has of course been addressed in previous literature. For instance, Fenoaltea (2014) gives a possible explanation for the imbalanced industrialization of Italy before the First World War based on the comparative advantages of the North, in particular in terms of energy endowment from water. This hypothesis, however, is not translated into a formal model by the author. Both Cafagna (1999) and Daniele and Malanima (2011) discuss the role of the physical distance of the southern regions from the centre of Europe and claim that the position of the South constituted a natural disadvantage for its industrialization. Felice (2015b) strongly rejects this explanation, using other historical examples (California, Japan, Australia and New Zealand) to show that mere physical distance from the core does not necessarily imply a disadvantage. Another work that links geography to regional disparities in Italy is A'Hearn and Venables (2012). Their chapter explores the relationship between regional disparities, internal geography and external trade from 1861 to 2011. The authors identify different drivers of economic activity across Italian regions in different periods: in 1861-1890 the main driver was natural advantage; in 1890-1950 it was access to the domestic market; and in the post-war period it was access to foreign markets. The idea that the north-western regions had an advantage in a triad of resources made up of natural endowments, human capital and social capital was proposed by Cafagna (1999). Felice (2010) has since re-proposed this view, arguing that different elements of it mattered at different stages of development (natural resources before 1880, human capital between 1880 and 1970 and social capital after 1970).
Felice (2012) took a step forward by formally testing the role of human and social capital in regional convergence from 1891, confirming that social capital mattered only in more recent decades, while human capital explained convergence before the Second World War. More recently, other scholars have engaged in further quantitative testing of some of these hypotheses. For instance, Daniele et al. (2016) proposed an econometric model to test the effect of market access, measured following Harris (1954), on manufacturing employment at the province level. The model also includes controls such as urbanization, latitude, whether a province is landlocked, and its literacy rate. Although their article represents an advance in that it uses a formal model, the dependent variable is limited to total manufacturing employment in the provinces, without any distinction among sectors. This neglects the variation across sectors that can be studied by including each industrial sector separately in the model. Moreover, although the article presents some descriptive statistics for 1871, the econometric analysis starts only in 1911 and continues until 2011. This is consistent with a broad long-term approach, but there is an implicit trade-off between temporal comparability and detailed analysis of a more specific period. Measures of industrial performance have also been related to some of the classical hypotheses, for instance by Cappelli (2017) in relation to human and social capital and by Nuvolari and Vasta (2017) in relation to secondary schooling and innovation. Finally, the most recent contribution in this direction is that of Basile and Ciccarelli (2018), who propose a formal model to explain the value added of each sector at the provincial level using literacy, energy and market forces as the main explanatory variables.
Although this latter contribution does exploit the rich intra-industrial information available, the authors do not look at interactions between industry characteristics and region characteristics.
In this article we go one step further by testing the effect of both market and endowment at the region-sector level, providing for the first time an explicit study of the interactions between region and industry characteristics. In doing so, we are able to identify the channels through which regions attract certain types of industrial activity.

Modelling industrial location
The methodology used here goes back to Midelfart-Knarvik et al. (2000), a reduced form of which has been applied to historical cases by Crafts and Mulatu (2006), Wolf (2007), Martinez-Galarraga (2012), Klein and Crafts (2012) and Nikolić (2018). We rely here on the specification used by Klein and Crafts (2012). The model is described by Eq. (1):

$$\ln(s_{i,k}) = \sum_j \beta_j X^j_{i,k} + \sum_i \delta_i R_i + \sum_k \gamma_k I_k + \varepsilon_{i,k} \qquad (1)$$

where $s_{i,k}$ is the share of employment of industry $k$ in region $i$. 2 $X^j_{i,k}$ is the $j$-th interaction between a regional characteristic and an industry characteristic, and the $\beta_j$ are the coefficients of the interactions, which are the focus of our analysis. A set of dummies for each region ($R_i$) and for each industrial sector ($I_k$) is included to control for size differences among regions and sectors and for any other unobserved characteristics. The model is estimated as a repeated cross section. 3 This is due to the length of the period we are looking at, during which a fair amount of structural change took place. The intuition behind the model is the following. The coefficients of the interactions indicate whether industries with a high level of a given characteristic tend to be overrepresented in regions where the corresponding regional characteristic is more abundant. For instance, if energy endowment is a determinant of industrial location, we expect the interaction between water power production in the regions and horse power use in the sectors to be significant, meaning that manufacturing sectors that use more energy tend to locate in regions with larger water power production. The use of interactions distinguishes this model from those already proposed in the literature for the Italian case. Interactions allow us to consider explicitly which features of the industrial sectors are attracted by certain regional features; for instance, we are able to disentangle the possible channels through which market access attracts industries. Interactions also help isolate the specific effect of a regional characteristic: higher literacy rates, for instance, could be associated with the presence of industries in a region simply because they are an indicator of other factors attracting industries (trust, better institutions, more public spending, etc.). The use of interactions ensures that the effect found is indeed related to the direct use of a certain factor in the industrial sectors. In this sense, interactions identify specific channels through which the regional characteristics act.

2 Similarly to Crafts and Mulatu (2006) and Wolf (2007), and unlike Klein and Crafts (2012), we decided to use employment rather than value added to measure industrial location. In principle, in a country with similar labour productivity across regions this choice is not a fundamental one. But in the case of the Italian regions, which present dramatic differences in productivity, value added would not properly account for the location of less productive firms, because unproductive firms may have a much larger share of the labour force than of total value added.

3 As in the case of Klein and Crafts (2012), a Chow poolability test suggests that the different benchmark years should not be pooled: the null hypothesis $\beta^t_j = \beta_j$ is rejected at the 1% level, with an F statistic of 5.23. Note that the null hypothesis is rejected even though only the regional characteristics are time variant.

The estimating equation (Eq. 2) is the following:

$$\ln(s_{i,k}) = \beta_1(\text{Literacy Rate} \times \text{White-collars}) + \beta_2(\text{Agr. Employment} \times \text{Agr. Input}) + \beta_3(\text{Deposits per capita} \times \text{Horsepower}) + \beta_4(\text{Water power} \times \text{Horsepower}) + \beta_5(\text{Market Potential} \times \text{Forward Linkages}) + \beta_6(\text{Market Potential} \times \text{Backward Linkages}) + \beta_7(\text{Market Potential} \times \text{Mean Plant Size}) + \sum_i \delta_i R_i + \sum_k \gamma_k I_k + \varepsilon_{i,k} \qquad (2)$$

The analysis considers all 16 Italian regions (Fig. 1) and 12 manufacturing sectors according to the population and industrial censuses of the period. The population censuses were carried out in 1871, 1881, 1901 and 1911. 4 The analysis is therefore based on 10-year benchmarks, with the exception of 1891, for which no census is available. The interactions are presented in Table 1. Table 1, Panel A, shows the H-O type interactions.
The first interaction measures the availability of human capital in the regions, proxied by literacy rates. The intensity of human capital in an industrial sector is measured as the share of white-collar workers in the total number of workers in each sector. The human capital interaction is straightforward to interpret, as higher literacy rates are always expected to attract sectors with a larger share of white-collar workers. The second interaction is an agricultural interaction that links the share of the labour force in agriculture in each region to the share of inputs from agriculture used in each sector. This interaction is expected to be positive since, whenever there is a high share of labour force in agriculture, the regional economy is expected to attract industries that are more intensive in the use of agricultural inputs. However, this expectation might not be fulfilled when a high share of agricultural workers in the labour force does not translate into a high availability of agricultural inputs. This can happen in regions where a large share of the population is engaged in agriculture but agricultural productivity is very low. In these cases the abundance of agricultural labour may have a negative effect on industrial location. The third interaction captures the availability and intensity of financial capital, measured through credit per capita in each region (proxied by bank deposits per capita) and the capital intensity of each sector, proxied by horse power per worker. Deposits per capita are expected to be positively associated with the presence of more capital-intensive sectors. The last H-O interaction is based on energy, linking its endowment to its industrial use.
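A minimal sketch of how an equation of the form of Eq. (2) could be estimated for a single benchmark year: OLS of ln(s_ik) on the interaction terms plus region and sector dummies. The function name `estimate_interaction_model`, the synthetic data and the choice of dropping one sector dummy (to avoid collinearity with the full set of region dummies) are illustrative assumptions, not the authors' code, and the sketch returns point estimates only, without the standard errors a real application would need. Because the synthetic outcome is generated without noise, the estimated coefficient should reproduce the true one.

```python
import numpy as np

def estimate_interaction_model(log_shares, interactions):
    """OLS of ln(s_ik) on interaction terms plus region and sector dummies.

    log_shares: (R, K) array of ln employment shares (regions x sectors).
    interactions: list of (R, K) arrays, one per interaction term.
    Returns the estimated interaction coefficients beta_j.
    """
    n_regions, n_sectors = log_shares.shape
    y = log_shares.ravel()                                  # observation r * K + k
    region = np.repeat(np.arange(n_regions), n_sectors)     # region index per observation
    sector = np.tile(np.arange(n_sectors), n_regions)       # sector index per observation
    region_dummies = (region[:, None] == np.arange(n_regions)).astype(float)
    # Drop the first sector dummy: the full set of region dummies already spans the constant.
    sector_dummies = (sector[:, None] == np.arange(1, n_sectors)).astype(float)
    x_interactions = np.column_stack([m.ravel() for m in interactions])
    design = np.column_stack([x_interactions, region_dummies, sector_dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[: len(interactions)]

# Synthetic example (hypothetical numbers): 4 regions, 3 sectors, one interaction, true beta = 2.
literacy = np.array([0.2, 0.4, 0.6, 0.8])
white_collar = np.array([0.05, 0.10, 0.20])
interaction = np.outer(literacy, white_collar)
region_effect = np.array([-3.0, -2.5, -2.0, -1.5])
sector_effect = np.array([0.0, 0.3, -0.2])
log_shares = 2.0 * interaction + region_effect[:, None] + sector_effect[None, :]

beta_hat = estimate_interaction_model(log_shares, [interaction])
print(beta_hat)
```

Under the repeated cross-section design described above, such a regression would be run separately for each benchmark year rather than on pooled data.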
For this interaction, we decided to keep separate the two main sources of energy used in Italy in the period: water power and hydroelectric power. 5 The intensity of energy use is measured by horse power per unit of production. For Italy we use water power as the baseline interaction, as hydroelectric production started only in the second half of our period. For both water and hydroelectric power interacted with horse power, the expected sign is positive. In particular, we expect water power to be more likely to drive the location of industries because it reflects an energy source that is often produced very close to the plant, if not in the plant itself. By contrast, hydroelectric power could be transported over relatively longer distances and may therefore have had a weaker connection with industrial location.
Before moving on to the NEG-type interactions, a remark on the inclusion of further endowment variables is necessary. In the literature, other factors have been proposed as possible drivers of economic activity. One example is technological innovation, often proxied by patent availability and intensity. We do not include such variables in our model because the industrial census of 1911 does not contain information on the patent intensity of the industrial sectors, which makes this type of variable unsuitable for this particular case study. Moreover, the industrial classification used by Nuvolari and Vasta (2015) in their regional reconstruction of Italian patents is not consistent with the one used in the available employment reconstructions, which makes it hard to include patents on the regional side of the interaction as well. 6 The other variable that broadly represents an endowment is social capital, which has been the object of various studies, most recently Felice (2012) and Cappelli (2017). The nature of our model makes it hard to include a measure of social capital: we would need a clear hypothesis about the industrial channel through which social capital attracts industrial employment, but it is hard to think of a single measurable industry characteristic to use. Moreover, our model already includes human capital, which Felice (2012) found to be more important than social capital in explaining regional patterns. Cappelli (2017) reaches a similar conclusion looking specifically at this period.
The NEG-type interactions of Table 1, Panel B, are based on the calculation of market potential at the regional level. Market potential is a standard measure of market access that is widely used in this literature. In this work, we rely on the estimates by Missiaia (2016), who uses the formulation proposed by Harris (1954): the market potential of region A is defined as the sum of the GDP of all the other regions, each weighted by the inverse of its distance from region A, plus the GDP of the region itself. The idea behind market potential is quite straightforward. For a given region A, the larger the GDP of the other regions, the better the region's access to markets; and the greater the distance between region A and the other regions, the lower the weight of each of these regions' GDP in the market access of region A. This formulation of market potential has been used in several works: Crafts (2005) applies it to the regions of Britain, Schulze (2007) to the Habsburg regions (1870–1910) and Martinez-Galarraga (2012) to Spain (1859–1929). The first two NEG interactions combine market potential with forward and backward linkages: forward linkages measure the value of outputs sold to other sectors and backward linkages the value of inputs taken from other sectors, both as a share of the total value added of the sector. The third interaction is between the mean number of workers per plant and market potential, and captures the tendency of firms to benefit from economies of scale by locating in regions with good access to markets. In a new economic geography approach, when transport costs are very high or very low, market access does not influence location decisions. This is because inputs either cannot be transported at all or can be transported so cheaply that transport costs play no role in location decisions. However, when transport costs fall to an intermediate level, market access becomes relevant. We believe this was the case for Italy in our period.
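The Harris (1954) formulation described above can be written as MP_A = GDP_A + sum over B != A of GDP_B / d_AB. The Python sketch below computes it for a hypothetical three-region system; the function name, GDP figures and distances are invented for illustration. The distance matrix could hold either straight-line distances or transport cost-adjusted distances, which is the only input that changes between the two versions of market potential. Following the verbal definition, the region's own GDP enters unweighted.

```python
import numpy as np

def harris_market_potential(gdp, dist):
    """MP_A = GDP_A + sum over B != A of GDP_B / d_AB (own GDP unweighted, as in the text)."""
    gdp = np.asarray(gdp, dtype=float)
    dist = np.asarray(dist, dtype=float)
    n = len(gdp)
    mp = np.empty(n)
    for a in range(n):
        # Other regions' GDP, each weighted by the inverse of its distance from region a.
        mp[a] = gdp[a] + sum(gdp[b] / dist[a, b] for b in range(n) if b != a)
    return mp

# Hypothetical GDP levels and a symmetric distance matrix (arbitrary units).
gdp = [100.0, 50.0, 30.0]
dist = [[0.0, 2.0, 5.0],
        [2.0, 0.0, 3.0],
        [5.0, 3.0, 0.0]]

mp = harris_market_potential(gdp, dist)
print(mp)
```

The resulting vector would then be interacted with the industry characteristics (forward linkages, backward linkages, mean plant size) in the same way as the endowment variables.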
The three market interactions proposed are all expected to have a positive sign when transport costs are at an intermediate level and market forces determine industrial location. The two interactions based on inter-industry linkages are positive when firms tend to locate close to their suppliers or to the sectors they supply. The last interaction, between market potential and mean plant size, is expected to be positive and significant when firms use market access to achieve economies of scale. Before moving on to the description of sources, a comment on the unit of analysis is necessary. Unlike the more recent literature, we carry out our main analysis at the regional rather than the provincial level, for two sets of reasons. The first is empirical: we believe that the quality of measurement currently achievable for some fundamental variables is insufficient for the purpose of our model. In particular, we lack both provincial GDP and transport cost-adjusted distances, and shortcut methods could, in our opinion, lead to severe bias. For example, the shortcut method used by both Daniele et al. (2016) and Basile and Ciccarelli (2018) to obtain provincial GDP per capita allocates regional GDP to provinces based on population figures. This method relies on the assumption of homogeneous labour productivity among the provinces of a region, which is problematic in two respects. First, there is no reason to believe that there was no productivity gap between rural and urban areas. Second, the method is biased if provinces have different employment structures, which population figures do not capture.
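A toy calculation illustrates the population-based allocation discussed above. The figures are hypothetical: a region with GDP 300 and two provinces, one urban and one rural, whose output per inhabitant differs. Output per inhabitant combines labour productivity and the share of the population employed, the two sources of bias mentioned in the text. Allocating regional GDP by population implicitly assumes equal output per inhabitant, so it misallocates output between the provinces even though the regional total is preserved.

```python
# Hypothetical region with one urban and one rural province.
regional_gdp = 300.0
population = {"urban": 100.0, "rural": 200.0}
# Hypothetical true output per inhabitant; the shortcut implicitly treats these as equal.
output_per_head = {"urban": 2.0, "rural": 0.5}

total_population = sum(population.values())

# Shortcut: allocate regional GDP in proportion to population.
shortcut_gdp = {p: regional_gdp * population[p] / total_population for p in population}
# "True" provincial GDP under the assumed productivity gap (same regional total).
true_gdp = {p: population[p] * output_per_head[p] for p in population}

for p in population:
    print(p, shortcut_gdp[p], true_gdp[p])
```

In this example the shortcut assigns the rural province twice its assumed true output and the urban province half of it, while the regional total is unchanged, so the error is invisible at the regional level.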
Furthermore, Missiaia (2016) shows that, for Italy, the correct accounting of transport cost-adjusted distances is fundamental for measuring market access: straight-line distances neglect the potential advantage that the southern regions had in accessing international markets through their ports, and therefore the use of this simplified version of market potential dramatically changes the picture of relative market access. 7 There is no reason to think that this bias would be any less severe at the provincial level, and it might well be more severe. The second set of reasons for conducting our analysis at the regional level has to do with the integration of the regional economies and their meaning as a unit of analysis. Kim and Margo (2003), in their work on the historical patterns of the economic geography of the USA, remarked that "the most compelling reason for studying geographical areas of differing scales is that models which explain the location of economic activities at one scale, such as the region, may not apply to smaller scales such as urban areas, or even finer ones like financial and industrial districts". In line with this view, we believe that some of the determinants of location across regions might differ from those across provinces in the Italian case. For this reason, we focus on the regional level and leave the provincial one for future research, in light of more precise provincial data reconstructions. 8 Finally, it should be noted that our model uses employment at the region-industry level as the dependent variable. This represents a different choice compared to all previous empirical works on the Italian case.
We made this choice because we are interested in explaining location patterns (what made the industrial sectors locate where they did, regardless of their performance) rather than productivity patterns (why there was more industrial production in certain regions).

Sources
In this section we discuss the sources for the dependent variable, the region characteristics and the industry characteristics.

Employment figures
The dependent variable of the model is the logarithm of the share of employment in a given sector in the total employment of each region. The employment figures are taken from Ciccarelli and Missiaia (2013), who present labour force estimates from the population censuses at the provincial and regional levels. 9 Ciccarelli and Missiaia (2013) discuss at length the shortcomings related to the misreported textile figures for women in the southern regions. 10 We decided to follow Fenoaltea (2003) and correct female textile employment by capping the number of women at

Region characteristics
This section describes the sources for each of the region characteristics, which are reported in Table 2.

Agricultural employment
The agricultural regional characteristic used here is the share of the labour force in agriculture. The population censuses (MAIC 1874, 1883, 1902, 1914a) provide the figures for the active population in agriculture.

Human capital
Literacy rates are used as a measure of human capital endowment in the regions. A'Hearn et al. (2011) provide the latest estimates of literacy rates at the regional level for the whole population over the age of 15. This threshold is convenient for the present work because the population aged 15 and over is the age group that best captures industrial workers. Since literacy may itself respond to the presence of industry, we propose two instruments for it: the first is the 10-year lagged literacy rate (for instance, the literacy rate in 1861 as an instrument for 1871), and the second is the inverse of the geographical distance from Paris. The two instruments are different in nature: the first uses the temporal dimension to avoid contemporaneous feedback effects, while the second relies on the exogeneity of physical distance. The motivation for including the former is chronological (literacy at time t cannot be caused by literacy at time t + 1), while for the latter it is historical: literacy rates have a strong North-South gradient and a milder but still significant West-East gradient. This gradient is well reflected by the inverse of the straight-line distance from Paris. 11 The historical validity of this instrument goes back to the influence of France over the Italian pre-unitary states during the Napoleonic period. The Italian pre-unitary states entered the Napoleonic sphere of influence under different political arrangements. Piedmont, Tuscany and Lazio were simply annexed to the French Empire and were therefore exposed to the same reforms as the rest of France, although at different times (Piedmont was first occupied in 1798 and annexed in 1802, Tuscany and Rome in 1808); Lombardy and Emilia had been under French influence, although not directly annexed, since 1796, as part of various republics and finally as part of the Kingdom of Italy. Venetia joined them later, in 1805.
The continental South also fell under the influence of the French Empire from 1806, becoming the Kingdom of Naples, which was first ruled by Joseph Bonaparte, brother of the Emperor, and later by his brother-in-law, Joachim Murat. Sicily and Sardinia remained under the rule of the Houses of Bourbon and Savoy, respectively, and were therefore untouched by any Napoleonic reform.

11 If we were to use the transport cost from Paris, we would run into a problem similar to that of market potentials and would also undermine the validity of the instrument, since transport costs are not necessarily exogenous.
In terms of the effect of the reforms, Grab (2003) divides Europe into three parts: the inner empire, where "reforms had a major impact and were implemented effectively", the outer empire, where "the application of the reforms was inadequate and left few institutional traces", and the intermediate states, where the reform policies were "carried out with limited degrees of success". According to the author, in Northern Italy, which was part of the inner empire, the reforms implemented during the Napoleonic period persisted after 1814. 12 By contrast, in the Kingdom of Naples, which falls among the intermediate cases, the Napoleonic reforms had a restricted impact. 13 Our claim is that the parts of Italy that were under French rule more directly or for longer were those that France was able to conquer first (or at all), partly thanks to their proximity. The existing literature supports the idea that Napoleon saw public schooling as a fundamental tool for spreading revolutionary values, while his opponents saw it as a danger. It is therefore not surprising that the areas more affected by French rule were those that managed to implement the reforms better and to maintain them after 1815, producing the North-west to South gradient that is still visible after Unification. Looking at our candidate instrument, we see that it indeed presents a correlation of 0.93 with literacy rates. Therefore, we use the inverse of the distance from Paris as an instrument for literacy after 1861.
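The instrumental-variable strategy can be sketched as two-stage least squares. The Python example below is a generic illustration on synthetic data, not the authors' estimation: the function name `two_stage_least_squares` is hypothetical, the instrument `z` stands in for the inverse of the distance from Paris, `x` for literacy and `y` for an outcome, and all data-generating numbers are assumptions. In the interaction model, the instrumented regressor would presumably be the literacy interaction, with the instrument interacted with the same industry characteristic; that extension is also an assumption and is not shown. The manual second stage gives valid point estimates, but its naive standard errors are not valid, so a dedicated IV routine would be needed for inference.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """Just-identified 2SLS with a constant; returns [intercept, slope]."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # First stage: regress the endogenous regressor on the instrument.
    first_stage, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ first_stage
    # Second stage: regress the outcome on the first-stage fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    second_stage, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return second_stage

rng = np.random.default_rng(0)
n = 5000
z = rng.uniform(0.0, 1.0, n)        # synthetic stand-in for inverse distance from Paris
shock = rng.normal(0.0, 1.0, n)     # unobserved factor affecting both x and y
x = 0.5 + 1.0 * z + 0.5 * shock + rng.normal(0.0, 0.1, n)   # endogenous regressor
y = 1.0 + 2.0 * x + shock           # true slope = 2

ols_slope = np.polyfit(x, y, 1)[0]  # biased upward because x is correlated with the shock
iv_slope = two_stage_least_squares(y, x, z)[1]
first_stage_corr = np.corrcoef(z, x)[0, 1]  # instrument relevance check
print(round(ols_slope, 2), round(iv_slope, 2), round(first_stage_corr, 2))
```

The comparison of `ols_slope` and `iv_slope` shows why an instrument is needed when the regressor shares an unobserved determinant with the outcome; the correlation between instrument and regressor plays the role of the relevance check reported in the text.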

Credit
Bank deposits at the regional level in 1911, used as a proxy for credit availability, are taken for the various types of bank from the statistical yearbook for Italy (ISTAT 1912). Unfortunately, not all the statistical yearbooks provide this information for all types of bank and all years. We therefore use the information on the Casse di risparmio ordinarie, which is available for all years, as a proxy for all types of bank. We are aware of the limitations of this strategy, but it has not been possible to find a common source for all types of bank.

Energy endowment
Water power data are constructed from two ministerial sources: MAIC (1884), a province-level census of water power production in 1877, and MLP (1935), which records the new concessions for water power production between 1870 and 1932. The production level for 1877 is aggregated by region, and the data on new concessions are then used to extend the series backward and forward in time. Hydroelectric power production is taken from Mortara (1934). It is present only in the last two benchmark years of the sample, since production before 1901 was negligible. Both variables are instrumented with the total stream of rivers per km² in each region, from SVIMEZ (2011). The instrument rests on the idea that rivers represent an exogenous potential for the production of water power and, similarly, of hydroelectric power. A similar variable is used by Basile and Ciccarelli (2018) to proxy for energy.

12 Grab (2003) provides a generally positive assessment of the application of the Napoleonic reforms in Northern Italy, in spite of the relatively low level of funding. Moreover, Pepe (2003) describes the republican period between 1796 and 1799 as fundamental in laying the basis of public education in Lombardy and Emilia. The fact that Venetia was under French rule only from 1805 could explain its lower literacy levels in 1871 compared to Lombardy.

13 The low impact and persistence of the Napoleonic reforms in the Kingdom of Naples are described in detail by Lupo (2006), who notes that, in spite of the best intentions, very little funding could be devoted to public primary schools. Moreover, after the Restoration of 1815, the king saw public education as a nest of revolutionary ideas and openly opposed it: in 1821, about half of the teachers were fired on suspicion of having taken part in the riots against the re-established monarchy, further dismantling the Napoleonic reforms.
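The backward and forward extension of the 1877 benchmark can be sketched as follows. This is a minimal illustration that assumes each concession adds capacity in the year it is granted and that capacity is never retired; the section does not spell out these rules, and the figures below are hypothetical.

```python
def water_power_series(base_year, base_level, new_concessions, years):
    """Extend a benchmark water-power level backward and forward in time.

    new_concessions maps a year to the capacity newly conceded in that year.
    Assumption (not stated in the source): capacity is added in the concession
    year and never retired.
    """
    series = {}
    for t in years:
        if t >= base_year:
            # Forward: add concessions granted after the benchmark, up to year t.
            added = sum(v for y, v in new_concessions.items() if base_year < y <= t)
            series[t] = base_level + added
        else:
            # Backward: subtract concessions granted after t, up to the benchmark.
            removed = sum(v for y, v in new_concessions.items() if t < y <= base_year)
            series[t] = base_level - removed
    return series


# Hypothetical regional figures (not the paper's data).
example = water_power_series(
    base_year=1877,
    base_level=1000.0,
    new_concessions={1875: 50.0, 1880: 100.0, 1900: 300.0, 1910: 200.0},
    years=[1871, 1881, 1901, 1911],
)
print(example)
```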

Market potentials
The estimates of market potential for the 16 Italian regions in the benchmark years 1871, 1881, 1901 and 1911 are from Missiaia (2016), except that here they are taken in current prices. 14 Missiaia (2016) discusses two specifications of market potential for the Italian regions. The first, domestic market potential, includes the Italian regions only, while the second, total market potential, includes all the Italian regions and the main trading partners of Italy. One of the main results is that domestic market potential is a stronger and more consistent determinant of regional GDP per capita. For this reason, we show the results with both formulations of market potential. As an instrument, following Missiaia (2016), we use the inverse of transport costs, which represents the part of market potential that does not depend on GDP. Removing GDP takes out an important source of endogeneity while still capturing the different effects of the various means of transportation, which would be lost if straight-line distances were used.
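A minimal sketch of the two objects involved, assuming a Harris-type market potential in which region i's potential is the sum over regions j (including i itself, with an internal transport cost) of GDP_j divided by the transport cost between i and j. The exact formula of Missiaia (2016) is not reproduced in this section, so the functional form, the cost matrix and the GDP values are assumptions. "The inverse of transport costs" is read here as the same sum with every GDP_j set to one.

```python
def market_potential(gdp, cost, i):
    """Harris-type market potential of region i: sum over j of GDP_j / cost_ij.

    cost[i][i] is an assumed internal transport cost, so region i's own GDP
    enters with a finite weight.
    """
    return sum(g / cost[i][j] for j, g in enumerate(gdp))


def gdp_free_instrument(cost, i):
    """Market potential with the GDP component removed (every GDP_j set to 1)."""
    return market_potential([1.0] * len(cost[i]), cost, i)


# Hypothetical three-region example: GDP levels and symmetric transport costs.
gdp = [100.0, 50.0, 30.0]
cost = [
    [1.0, 2.0, 4.0],
    [2.0, 1.0, 2.0],
    [4.0, 2.0, 1.0],
]
for region in range(len(gdp)):
    print(region, market_potential(gdp, cost, region), gdp_free_instrument(cost, region))
```

Adding the main trading partners as extra GDP entries and extra cost columns would turn this domestic measure into a total market potential; the instrument keeps only the transport cost structure.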

Industry characteristics
This section describes the sources of the industry characteristics used in the model. The two main sources are the Industrial Census of 1911 and the input-output table provided by Vitali (2003). Both are described in detail below, while the variables are reported in Table 3.

Industrial Census of 1911
The Industrial Census of 1911 (MAIC 1914b) was the first complete census of this type carried out in unified Italy. It provides information on the number of plants, the number of workers in each plant, and the number of employees by type of occupation and by sector. It also reports the horsepower used by plants in each sector. The interactions relying on this source are: the human capital interaction, with the share of white-collar workers in each industrial sector; the energy interactions, with horsepower per unit of value added used in the plants; the financial capital interaction, which uses horsepower per worker as a proxy for the capital-labour ratio; and the interaction between market potential and economies of scale, measured by mean plant size in terms of employment.

Input-output table
Market potential is interacted with three industry characteristics. The first, mean plant size, discussed above, is computed from the 1911 Industrial Census. The other two are forward and backward linkages: the value of a sector's output used as input by other sectors (forward linkages) and the value of its inputs that come from other sectors (backward linkages), both as a share of the sector's total value added. 15 The source here is the input-output table by Vitali (2003), which also provides the value of inputs from agriculture to each industrial sector, used in the agricultural interaction.
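The two linkage measures can be illustrated on a toy input-output table. The sketch assumes that flows[i][j] records the value of sector i's output used as input by sector j and excludes each sector's use of its own output ("other sectors"); the three sectors and all values are hypothetical.

```python
def forward_linkages(flows, value_added):
    """Output of sector i used as input by other sectors, over i's value added."""
    n = len(flows)
    return [sum(flows[i][j] for j in range(n) if j != i) / value_added[i] for i in range(n)]


def backward_linkages(flows, value_added):
    """Inputs of sector j bought from other sectors, over j's value added."""
    n = len(flows)
    return [sum(flows[i][j] for i in range(n) if i != j) / value_added[j] for j in range(n)]


# Hypothetical flows: flows[i][j] = value of sector i's output used by sector j.
flows = [
    [10.0, 20.0, 30.0],
    [5.0, 0.0, 15.0],
    [0.0, 10.0, 5.0],
]
value_added = [100.0, 50.0, 40.0]
print(forward_linkages(flows, value_added))
print(backward_linkages(flows, value_added))
```

Forward linkages read across a row of the table and backward linkages down a column, which is why the two functions differ only in which index is summed.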

Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total regional employment.

Determinants of industrial location: empirical results
In this section, we present the estimation results of our model of the determinants of the location of manufacturing industries. Table 4 shows the estimation of Eq. 2 as a repeated cross section, showing for each year the specification with domestic and total market potential as explanatory variables. All coefficients are standardized beta coefficients, and we report heteroscedastic-robust and regional cluster-robust t statistics. 16 Region and industry fixed effects are also included in all specifications.
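The estimation logic can be illustrated with a stripped-down sketch: a single interaction between a regional characteristic (for example literacy) and an industry characteristic (for example the white-collar share), region and industry fixed effects, and a standardized beta coefficient. It assumes a balanced region-by-industry panel, in which removing region and industry means is equivalent to including both sets of dummies; the data are synthetic, and the full Eq. 2, with all interactions estimated jointly, is not reproduced here.

```python
from statistics import mean, pstdev


def build_interaction(region_char, industry_char):
    """Interaction y_i * z_k for every region-industry pair."""
    return {(i, k): region_char[i] * industry_char[k]
            for i in region_char for k in industry_char}


def two_way_demean(v, regions, industries):
    """Remove region and industry means (valid for a balanced panel only)."""
    grand = mean(list(v.values()))
    region_mean = {i: mean([v[(i, k)] for k in industries]) for i in regions}
    industry_mean = {k: mean([v[(i, k)] for i in regions]) for k in industries}
    return {(i, k): v[(i, k)] - region_mean[i] - industry_mean[k] + grand
            for i in regions for k in industries}


def fe_slope(y, x, regions, industries):
    """OLS slope of y on x with region and industry fixed effects."""
    yd = two_way_demean(y, regions, industries)
    xd = two_way_demean(x, regions, industries)
    return sum(xd[c] * yd[c] for c in xd) / sum(xd[c] ** 2 for c in xd)


def standardized_beta(slope, x, y):
    """Beta coefficient: slope rescaled by sd(x) / sd(y)."""
    return slope * pstdev(list(x.values())) / pstdev(list(y.values()))


# Hypothetical regions, industries and characteristics (not the paper's data).
regions = ["A", "B", "C"]
industries = ["textiles", "metals", "chemicals"]
literacy = {"A": 0.2, "B": 0.5, "C": 0.7}
white_collar = {"textiles": 0.05, "metals": 0.10, "chemicals": 0.20}
x = build_interaction(literacy, white_collar)

# Synthetic employment shares with fixed effects and a known interaction effect of 0.5.
region_fe = {"A": 0.1, "B": -0.2, "C": 0.05}
industry_fe = {"textiles": 0.3, "metals": 0.0, "chemicals": -0.1}
y = {(i, k): region_fe[i] + industry_fe[k] + 0.5 * x[(i, k)]
     for i in regions for k in industries}

beta_hat = fe_slope(y, x, regions, industries)
print(beta_hat, standardized_beta(beta_hat, x, y))
```

Because the synthetic outcome is built with an interaction effect of 0.5, the fixed-effects slope recovers it; the standardized beta then rescales the slope by the ratio of standard deviations, which is what makes coefficients on differently scaled interactions comparable.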
In our baseline specification, two H-O interactions stand out as having a significant impact on industrial location. The first is human capital, significant in all years except the first when domestic market potential is used in the formulation. In terms of size, its coefficient nearly doubles over the period. The other significant H-O interaction, in this case in the last two benchmark years, is energy, based on water power production as the regional characteristic. In 1901 and 1911, its effect appears slightly smaller than that of human capital but of the same order of magnitude, about 0.2 standard deviations.
Turning to the NEG interactions, we first notice that when domestic rather than total market potential is used, two of the three interactions are significant. In particular, the interaction with mean plant size is significant, with a coefficient between 0.6 and 0.9 in the four benchmark years. The interaction with forward linkages is also significant, though never at the 1% level, and its coefficient is about half the size of the one with mean plant size.
The different results obtained with the two formulations of market potential are perfectly in line with the previous findings by Missiaia (2016) on the impact of market access on GDP per capita, suggesting once more that at this stage of industrialization the home market played a more important role than international markets. 17 It is also interesting that the effect of human capital is smaller when domestic rather than total market potential is included, suggesting that once the market is properly accounted for, the effect of human capital is weaker. Summing up, the first baseline regression suggests that the determinants of industrial location are to be found in two endowment forces, namely human capital and energy, and possibly in two market forces, but only when the Italian market is considered. Table 5 proposes an alternative formulation of the energy interaction using hydroelectric rather than water power. Here the exercise can be performed only for the last two benchmark years, as a sizeable production of hydroelectric power in Italy started only at the turn of the century. The results are similar to those of Table 4, showing a positive and significant effect of the energy interaction on industrial location in both years. The results of the first two tables would suggest that both H-O and NEG forces played a role in shaping the industrial geography of Italy in the pre-WWI period. However, as already noted in the previous literature, some of the components of the interactions might present endogeneity issues. To address these issues, the next three tables propose an instrumental variable strategy. Table 6 starts by instrumenting the regional characteristic included in the energy interactions of Tables 4 and 5. As an instrument, we use the total stream of rivers per km² in each region.
This measure is intended to capture the potential for producing energy from water and is therefore exogenous with respect to the regional employment structure and to economic performance in general. We use this instrument for both water and hydroelectric power. In all the specifications the instrument is significant at the 1% level in the first stage. Columns 1-4 show the results for the former and columns 5-6 for the latter. The coefficients are reduced in magnitude, but the results of the previous tables are basically confirmed in terms of significance, suggesting that in 1901 and 1911 energy was an important determinant of the location of industries across the Italian regions.

Table 5 The determinants of industrial location with hydroelectric power. Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total regional employment.
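The two-stage logic behind the instrumented tables can be sketched for the simplest just-identified case, with one endogenous regressor and one instrument. This is a textbook two-stage least squares illustration on hypothetical data, not the authors' estimation code, which also includes fixed effects and several interactions.

```python
from statistics import mean


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def tsls(y, x, z):
    """Just-identified 2SLS with one endogenous regressor x and one instrument z.

    Returns (second-stage slope, first-stage slope).
    """
    # First stage: project the endogenous regressor on the instrument.
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Second stage: regress the outcome on the first-stage fitted values.
    _, b1 = ols(x_hat, y)
    return b1, a1


# Hypothetical data: z might stand for river streams per km², x for the
# endogenous regional characteristic, y for the outcome.
z_ex = [1.0, 2.0, 3.0, 4.0, 5.0]
x_ex = [2.0, 2.5, 4.5, 4.0, 6.0]
y_ex = [3.0, 1.0, 4.0, 1.0, 5.0]
print(tsls(y_ex, x_ex, z_ex))
```

In the just-identified case the second-stage slope equals cov(z, y)/cov(z, x). Standard errors from running the second stage as a plain OLS are not valid, because they use residuals computed with the fitted rather than the actual regressor; dedicated IV routines correct for this.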
Literacy is a notoriously difficult variable to instrument, and there have been few attempts to do so in the literature on industrial location. For the Italian literacy rates, we propose two different instruments, illustrated in the previous section: the inverse of the geographical distance from Paris and the 10-year lagged literacy rates. Table 7 shows the results. In all years the instrument is significant in the first stage at the 1% level. When distance from Paris is used as instrument, literacy in the second stage is significant at the 5% level in 1881 and 1901, and not significant in 1871 and 1911; when its lagged values are used as instrument, the same variable is significant at the 10% level in 1871 and at the 5% level in 1881 and 1901, and not significant in 1911. The coefficient is similar in size with the two instruments, about half of that in the simple OLS specification. These results suggest that, in spite of the challenge posed by the IV strategy for literacy, a positive effect of the human capital interaction is still detectable. Finally, in Table 8 we propose a standard instrumentation for domestic market potential, also used in Missiaia (2016). We take the inverse of the sum of all transport cost-adjusted distances, which basically corresponds to taking market potential without its GDP component. Here the first stage presents more problems than the other instrumentations. For the interactions with backward and forward linkages, the instrument is always significant at the 1% level. However, for the interaction based on mean plant size, the instrument is significant at the 5% level in 1871, 1901 and 1911, but not in 1881. In the second stage, the interactions with both forward and backward linkages are now not significant, leaving the interaction with mean plant size as the only NEG interaction that plays a role, with a large coefficient. Table 9 summarizes the beta coefficients to help the reader assess the relative impact of each determinant. We only report the coefficients of the interactions that were significant after instrumentation, with those significant at least at the 10% level in bold. The sizes of the coefficients are relatively stable across the different specifications, with some exceptions.

Table 6 The determinants of industrial location (IV energy). Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total regional employment.

Table 7 The determinants of industrial location (IV literacy), 1871-1911. Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total regional employment. Columns 1, 3, 5 and 7 use distance from Paris as instrument; columns 2, 4, 6 and 8 use lagged literacy rates as instrument.
The largest relative impact is that of domestic market potential interacted with economies of scale, always around 0.8-0.9 standard deviations (except when instrumented, where the coefficient rises to 1.5, or even 2 in 1881). Human capital and energy (using both water power and hydroelectric power) have similar impacts in the OLS regressions, but when instrumented their coefficients fall to 0.112 and 0.151 in 1901. Literacy also shows a similar decrease in size when instrumented, so that both variables end up with a comparable impact.
Summing up, our econometric analysis suggests three main determinants of industrial location in post-Unification Italy: human capital, which is significant in all benchmark years in the OLS regressions (although both instruments fail to confirm its role in 1911); energy, which is more consistently significant in the last two benchmark years (with both water power and hydroelectric power, and also when instrumented); and economies of scale through the domestic market, although the instrumentation of this interaction appears more problematic than the others.

Discussion
The general result of our work is that endowments, and in particular energy in the form of water and hydroelectric power, were central to the location of the Italian industrial sectors during the first industrialization of the country. We also observe that energy gained importance over time, which is consistent with the technological path taken by a late-industrializing economy characterized by an extreme scarcity of home-produced coal (Bardini 1997). Human capital also shows a consistent effect, but the instrumentation is effective only for the earlier period. This pattern would suggest a transition from human capital to energy as the main determinant among endowments. The results on market interactions suggest that only domestic market potential had a positive and significant effect, pointing to a strong role for the domestic market when interacted with mean plant size. However, the instrumentation proved more challenging for this interaction. Our analysis also confirms that international markets were not yet fundamental in shaping the industrial geography of the country, as the results are not confirmed when total market potential is used.
The study of the different trajectories and regional patterns of the Italian industries is not new in the literature. The role of energy availability (in the form of water power) in shaping the industrial geography of the country has already been brought forward by several scholars with a variety of approaches: Cafagna (1999) considers it one of the main determinants of Italian dualism along with human and social capital, while Fenoaltea (2014) even gives energy first place among these three. This thesis is essentially confirmed by our formal analysis. On the other hand, our results partly qualify the findings of Bardini (1997), who claims that electricity did not play a big role in Italy's first industrialization: its contribution to overall economic growth might have been small, but different energy endowments appear to have influenced location decisions. Moreover, the large gap in literacy rates between southern regions on one side and north-western regions on the other has also attracted much scholarly attention as an explanation of Italian dualism (see Zamagni 1978 and, more recently, A'Hearn et al. 2011; Felice and Vasta 2015). More recently, Felice (2012) finds that human capital is responsible for divergence in the period 1891-1911, which is consistent with our results at least until 1901. 18 The most interesting comparison for our results is with a number of empirical works released in recent years that use a variety of formal models to tackle the determinants of industrial production and location in the pre-First World War period.

Table 8 The determinants of industrial location (IV MP). Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total regional employment.
The first to be mentioned is Ciccarelli and Fachin (2017), in which productivity is explained by human and social capital, political participation and the building of infrastructure. The article finds no dynamic spillover effects, measured as the growth rate of industrial value added in neighbouring provinces, which the authors propose as evidence that market access did not play a role in shaping the industrial geography of Italy in this early period of industrialization. This contribution is indeed the first to propose a formal causal model of regional patterns of industrial activity; however, the lack of a proper market access measure accounting for provincial GDP and transport costs leads to conclusions on the role of the home market that differ from ours. Nuvolari and Vasta (2017) use the levels of provincial industrial production as the dependent variable, finding that patents were connected with technical education and that these two variables were determinants of industrial activity. Here, regional domestic market potential from Missiaia (2016) is included as a regressor at the provincial level and turns out not to be significant. Again, the lack of an ad hoc measure of market access at the same unit of analysis as the dependent variable is problematic. Cappelli (2017) also looks at the growth of industrial value added for provinces in relation to human and social capital, finding that human capital is a stronger determinant of output growth than social capital. This article does not explicitly model market access, but it includes controls for water power and hydroelectric power, finding no effect on industrial growth rates. Finally, the recent work by Basile and Ciccarelli (2018) represents the most relevant comparison.
The article looks at the location of industrial output at the provincial level, using literacy, market potential and water power production as explanatory variables, and is the first to use data on each sector separately. The main result is that capital-intensive production was driven by domestic market potential and that, once both market access and literacy are accounted for, energy gives mixed results. These two latter works at the provincial level thus appear to find energy less of a driving force than our results do. In our view, it is not surprising that models at the provincial level more rarely find an effect of energy variables: as the unit of analysis gets smaller, it becomes more likely that different geographical units share the same source of energy located in only one of them. This is particularly true for hydroelectric power, which is often produced in a mountainous area adjacent to a more industrialized one. This is confirmed by the tentative estimation at the provincial level that we propose in the "Appendix", where hydroelectric power interacted with energy use in industries is not significant. Regarding human capital, there is a consensus that it was an important factor explaining both employment and production. 19 Regarding market access, we find that market potential was indeed important in its domestic formulation only, as in Basile and Ciccarelli (2018). The results in the "Appendix" show that at the provincial level the effect of economies of scale is far weaker. Once again, a model based on interactions needs to account properly for GDP and transport costs in order to assess the effect of market access correctly. This different result could also be connected with the unit of analysis used, as firms might be able to exploit the market access of neighbouring provinces, and not only of their own province, to take advantage of economies of scale.
Summing up, although previous scholars have approached the drivers of industrial location from many perspectives, we believe that the present contribution breaks new ground in several respects. First, it looks at manufacturing employment rather than manufacturing value added. This distinction is paramount, as regional location and productivity patterns do not necessarily go hand in hand: when looking at manufacturing employment in a certain region, we are asking where industrial activity is located, regardless of the regional productivity of the labour force; when looking at value added in a certain region, we are identifying the most productive locations. In the Italian case in particular, these two questions might provide quite different answers because of very large productivity gaps across regions and provinces. The second novelty of this paper is that it uses, for the first time, market potential estimates that do not rely on any major shortcut, either in the GDP component or in the transport cost component. This is possible because we use regions rather than provinces as the unit of analysis. Given the current state of GDP and transport cost reconstructions at the provincial level, the gains from extending the analysis to provinces do not always outweigh the loss in terms of noise in the data. In particular, we believe that the comparison performed by Missiaia (2016) between domestic regional market potential computed with transport cost-adjusted distances and with straight-line distances gives a strong warning against drawing conclusions on the effects of market access without properly accounting for transport costs. The third contribution of this paper is to relate explicitly, for the first time, specific characteristics of the regions to the corresponding characteristics of the sectors. This formulation of the model allows us to look at the specific channels through which certain regional characteristics attract industrial employment.

Conclusions
The aim of this article is to account for what determined the location of the Italian manufacturing industries in the period 1871-1911, which embraces the first wave of industrialization of the country. Our methodology explains the regional share of employment in each industrial sector using a set of interactions between industry characteristics and region characteristics of both the H-O and NEG type.
In terms of comparison of our results with the previous literature on Italy, there are a few remarks to make. First, in spite of the variety of approaches, it seems clear that human capital had a strong effect in shaping the industrial geography of Italy. However, we find that when regions, rather than provinces, are used as the unit of analysis, energy plays an even more important role than human capital. Regarding the role of markets, we confirm the intuition by Missiaia (2016) and the results by Basile and Ciccarelli (2018) that the market mattered only in its domestic formulation. We also add to the literature the identification of economies of scale as the main channel through which the market mattered.
The importance of energy endowment, human capital and domestic market has also been underlined in works on industrial location in other countries. Crafts and Mulatu (2006) find that access to coal was a determinant for Britain while total market potential was not. Wolf (2007) finds that human capital and the access to the domestic market were determinant for industrial location in interwar Poland. Nikolić (2018), looking at interwar Yugoslavia, also finds that both market forces and human capital mattered in location decisions. Other studies focusing on the twentieth century, such as Midelfart-Knarvik et al. (2000) on the EU and Klein and Crafts (2012) on the USA, find evidence that market forces were more important than endowment forces in explaining industrial patterns. Our analysis suggests that the Italian case fits the typical pattern of nineteenth-century industrializing countries in which endowments prevailed over the international market in determining the location of economic activity, although there was a role for domestic market access in determining the location of industries.
estimates following the widespread methodology by Geary and Stark (2002) and transport cost-adjusted distances, which represent the two components of market potential. Existing empirical works such as Daniele et al. (2016) and Basile and Ciccarelli (2018) have filled this gap by allocating the regional GDP reconstructed by Emanuele Felice to provinces based on population figures. This is far from ideal, because it assumes that all workers are equally productive across provinces and that the regional structure of employment is reflected in each province. Similarly, we do not have a comprehensive reconstruction of railway and sea transportation costs across provinces. Missiaia (2016) has shown that, for the Italian regions, taking straight-line distances can lead to very different results for market potential. This problem is particularly severe for Italy because of the frequent interchange between different means of transportation, which makes physical distance only one of the determinants of transport costs (the others being access to sea transportation and the existence of railways). For our provincial estimates, we propose a different shortcut for market potential that we believe gives a more realistic picture than the existing literature.
For market potential, we have taken the estimates from Missiaia (2016) and allocated them to provinces following three equally weighted principles: a third of market potential is allocated according to the province's share of the regional population; a third according to the province's share of the regional railways, using the reconstructions by Ciccarelli and Groote (2017); and the remaining third to the provinces within each region that had a port. 20 The intuition behind the three principles is to account for all three factors that drive a composite measure such as market potential. Using all three weights, we capture the economic size of each province (proxied by population, as in the previous literature), the presence of railways and access to the sea. Another variable that could not be directly extended to the provincial level is deposits per capita. We used provincial population weights to go from regions to provinces. This is not ideal because of possible endogeneity between the population weights and the dependent variable. For hydroelectric power production, the source used at the regional level does not provide provincial information. We allocated the regional production to the provinces in each region that have mountainous areas, in proportion to their geographical extension.
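The three-thirds allocation rule can be sketched as follows. Two details are assumptions not stated in this section: the port third is split equally among the port provinces of a region, and if a region has no port (or no railways) that third follows population instead. Province names and figures are hypothetical.

```python
def allocate_to_provinces(regional_value, provinces):
    """Split a regional market potential across provinces in three equal thirds.

    provinces maps a name to {"pop": population, "rail_km": railway length,
    "port": bool}. Assumptions: the port third is shared equally by port
    provinces; without ports or railways, that third follows population.
    """
    third = regional_value / 3.0
    pop_total = sum(p["pop"] for p in provinces.values())
    rail_total = sum(p["rail_km"] for p in provinces.values())
    ports = [name for name, p in provinces.items() if p["port"]]
    shares = {}
    for name, p in provinces.items():
        pop_w = p["pop"] / pop_total
        rail_w = p["rail_km"] / rail_total if rail_total > 0 else pop_w
        if ports:
            port_w = 1.0 / len(ports) if p["port"] else 0.0
        else:
            port_w = pop_w
        shares[name] = third * (pop_w + rail_w + port_w)
    return shares


# Hypothetical region with three provinces (not the paper's data).
provinces_ex = {
    "A": {"pop": 200.0, "rail_km": 30.0, "port": True},
    "B": {"pop": 100.0, "rail_km": 10.0, "port": False},
    "C": {"pop": 100.0, "rail_km": 60.0, "port": True},
}
print(allocate_to_provinces(300.0, provinces_ex))
```

Since each of the three weight sets sums to one within the region, the provincial values always add up to the regional market potential.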
The dependent variable is based on the figures by Ciccarelli and Missiaia (2013). Provincial agricultural employment was easily retrieved from the population censuses, while the provincial literacy rates are the ones used by Cappelli (2017). For water power, the same source used for the regional level was also available at the province level. Tables 10 and 11 report, at the provincial level, the same specifications as Tables 4 and 5 of the main text.

Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total provincial employment.

Table 11 The determinants of industrial location (with hydroelectric power). Heteroscedastic-robust t statistics in parentheses. *, ** and *** denote a coefficient significantly different from zero at the 10, 5 and 1% significance levels, respectively. The dependent variable is the share of sectoral employment in total provincial employment.