1 Introduction

Countries characterized by extractive institutions and the concentration of political power in the hands of small elites tend to display a pattern of limited long-term economic growth (Acemoglu et al. 2001, 2005). Recent studies have also shown that human capital is essential in explaining the link between institutions and economic development (Gennaioli et al. 2012; Glaeser et al. 2004). Under the assumption that political power is linked to landownership concentration in pre-industrial economies, assessing the role of land inequality becomes fundamental for understanding the path of development of countries and regions.

This paper aims to analyze the relationship between land inequality and literacy rates in the late nineteenth century and the early twentieth century in Italy.Footnote 1 I build a new dataset at both the district and province levels in Italy from 1871 to 1921, which allows me to exploit both the temporal and cross-sectional dimensions of the relationship between landownership concentration and education. I estimate several regressions at each point during the considered period, showing how this relationship seems to be fairly stable over time. However, when I explore the panel dimension of the dataset, the relationship vanishes over time. This was likely due to different institutional shocks such as population enfranchisement and polity regime changes to implement public policies (i.e., decentralization vs centralization). The most important shock was the introduction of the Daneo-Credaro law in 1911, which transferred the decisions to invest in public schooling from municipalities to the central government, thus limiting the political power of local notables. By interacting the initial level of landownership concentration with time shocks, I employ a stepwise difference-in-differences strategy to account for the differential effect of those institutional changes on areas with a different distribution of political power.

The dataset’s cross-sectional and panel structure creates several advantages in terms of identification strategy. The panel dimension allows me to control for time-invariant unobserved heterogeneity, whereas the cross-sectional dimension permits me to exploit the advantages of an instrumental variable approach that helps identify a causal relationship between landownership concentration and education levels. More specifically, I use malaria pervasiveness as a source of exogenous variation in land inequality. Extensive research documents the link between latifundia and malaria. This thesis was put forth by Celli and Fraentzel (1933), who linked malaria endemicity with the creation of large estates, a pattern observed for centuries in the Pontine marshes, a plain in the surroundings of Rome. At the same time, extensive literature documents the greater capacity of large farmers to cope with the risks of malaria-infested areas (see Snowden (2008)). They were forced to manage adverse shocks and adopt appropriate incentive-compatible strategies, while rural workers were discouraged from living close to the lands, thus influencing their settlement pattern.
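
The logic of the instrumental variable approach can be illustrated with a minimal sketch, assuming a single instrument and no covariates. The variable names and toy numbers below are hypothetical and are not taken from the dataset; the estimator shown is the simple ratio form of IV, not the full specification estimated in this paper.

```python
from statistics import mean


def covariance(a, b):
    """Covariance with a 1/n scale (the scale cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical toy data: z plays the role of malaria, x of land inequality,
# y of literacy, and u of an unobserved confounder uncorrelated with z.
z = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]
x = [zi + ui for zi, ui in zip(z, u)]               # x depends on z and u
y = [-2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true effect of x is -2

beta_ols = ols_slope(x, y)   # contaminated by the confounder u
beta_iv = iv_slope(z, x, y)  # uses only the variation in x driven by z
```

In this constructed example the OLS slope is pulled away from the true effect by the confounder, whereas the IV ratio recovers it because the instrument is, by construction, unrelated to the confounder. Whether malaria satisfies the analogous condition in the historical data is an empirical question, which is why the exclusion restriction receives separate attention.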

I find the following results. First, separate cross-section estimates at different points in time reveal a negative impact of the measure of land inequality on literacy rates. Second, IV estimates, using malaria as an instrument, rule out concerns regarding potential endogeneity. Third, the validity of the exclusion restriction hypothesis is addressed in Section 5.3. Although malaria may affect education outcomes through health channels, robustness checks demonstrate that the main findings are hardly changed under second-order assumptions. Finally, once I explore the panel dimension of the dataset, further evidence to support the negative association between land inequality and education levels is provided, albeit it seems to vanish over time, in line with the “passive modernization process” Italy was facing at that time (Felice 2013).

The main contribution is twofold. First, although much literature at a global level has already documented the impact of extractive institutions and inequality on growth factors, I find that the epidemiological environment is a strong predictor of historical land inequality. Specifically, this paper demonstrates that malaria virulence was a major driver of landownership concentration in Italy. Second, I explore the mechanism of the effect transmission behind the observed relationship to distinguish between the impact of supply and demand factors. I argue that landownership concentration may have adversely affected literacy rates, not only influencing the supply of schooling through the political process but also affecting the private demand for education of landless peasants.

A vast amount of literature has primarily focused on the role of landowning elites in preventing the poor majority from gaining education and power. Assuming that large landowners rule local municipalities and acquire decision autonomy to invest in human capital, as in Italy in the post-unification decades, their interest lies in the vote against the expansion of public spending on mass schooling. I gather information on the number of public schools and the corresponding appointed teachers and municipal expenditures and restrict the sample to rural areas with low urbanization rates to test the relationship between supply factors in education and land inequality. I find partial support for the hypothesis that the transmission mechanism worked through political voice. Consequently, while local notables, on the one hand, might have low returns to invest in public schooling, on the other hand, they could have burgeoning stakes in the industrial sector.Footnote 2 However, since the industrialization process presumably made for a greater demand for education (see Galor and Moav (2006)), I assess whether demand factors played a role too. Using data on enrollment rates and child labor, I find that the poor majority, which faces tightening budget constraints and limited opportunities for upward mobility and offering their labor outside agriculture, had fewer incentives to invest in human capital. Hence, land inequality affects human capital accumulation in diverse ways. This influence not only operates through the role of extractive institutions blocking the supply of public schooling infrastructures but also via demand effects.

I proceed as follows. In Section 2, I review the related literature. In Section 3, I briefly provide the historical background of post-unification Italy. Section 4 describes the data, while Section 5 outlines the econometric methodology and reports the main empirical results, focusing on the role of malaria as a source of exogenous variation in latifundia. Moreover, I conduct several robustness checks, enabling the identification of a plausibly causal relationship between land inequality and education. Particular attention is also devoted to panel estimates, with a stepwise diff-in-diffs approach. Section 6 focuses on the different possible channels of transmission of the effect. Finally, Section 7 provides concluding remarks.

2 Literature review

The literature on the role of inequality and its long-run effect on human capital is vast and presents broad empirical support. Galor et al. (2009) build a theoretical model where capitalists, on the one hand, push for policies aimed at promoting the education of the masses to have a skilled labor force, whereas landowners, on the other hand, support policies that deprive the masses of education. The interest of landowners is to reduce labor mobility to keep wages low and have an available labor force in the countryside. That is especially true in later stages of development, when industrialization has already emerged, increasing the return to education and the incentives to invest in human capital (Galor 2011). Besides, Galor et al. (2009) assess the inverse relationship between land inequality and education, deriving from the theoretical model an empirically testable prediction, which they test using US data for the 1900–1940 period, finding that changes in landownership concentration negatively affect education expenditures. Moreover, to demonstrate the causality of the effect, they use as an instrument the interaction between changes in the relative price of crops and climatic characteristics across states.

Ramcharan (2010) investigates the effect of land inequality on redistributive policies. By looking at US census data for the 1890–1930 period, he finds a negative association between inequality and expenditures in education, identifying a causal relationship using geographic variables as a source of exogenous variation in land inequality.Footnote 3 Similarly, Vollrath (2013) shows a negative correlation between landownership concentration and taxes for funding public schooling in US counties in 1890. The same results can be found in Chaudhary (2009), who links low public spending on primary schooling with greater castes’ interests in India.Footnote 4

At the same time, Engerman and Sokoloff (1997); Sokoloff and Engerman (2000) assert that Latin American elites blocked the expansion of mass schooling and suffrage enlargement to retain their political power. Go and Lindert (2010) aim to explain differences in enrollment rates between the North and the South of the USA by looking at local autonomy and more widespread access to political voice for the masses in the North. Also, they analyze the effect of enfranchisement on tax levels for public schooling and on enrollment rates. Along the same line of research, focusing on estimating the correlation between schooling and education levels with inequality in political power, Mariscal and Sokoloff (2000) find that inequality in Latin America is negatively associated with enrollment and literacy rates. Besides, they argue that the extension of the franchise promoted mass schooling, a result also found by Acemoglu and Robinson (2000) and Gallego (2010). The latter states that democratization enhances primary education while decentralizing political power impacts only secondary and higher education levels. At odds with these latter findings, Aghion et al. (2019) argue that the role of democratization is irrelevant to explaining higher education levels, and they propose military rivalry as the real incentive for countries to invest in mass schooling.

Recently, another body of literature has called this view into question, as incentives to invest in human capital entering individuals’ utility functions and concerns regarding their limited budget constraints may be considered non-mutually exclusive channels of transmission of the effect. Indeed, a more equally distributed land ownership may have contributed to unleashing resources to invest in acquiring literacy and numeracy skills (Cinnirella and Hornung 2016; Beltrán Tapia and Martinez-Galarraga 2018), leaving the supply-side mechanism linked to the political process aside.Footnote 5 Overall, less empirical research has been carried out on the demand-based viewpoint concerning the historical link between land inequality and human capital. The first attempt dates back to Galor and Zeira (1993), which highlights that in the presence of frictions in the credit market, unequal distribution of resources can have a long-lasting effect on the lower class’ ability to invest in human capital. Individual underinvestment in human capital resulting from capital market imperfections is also found by Deininger and Squire (1998). At the same time, Reis (2005) provides a somewhat similar explanation of disparities in education levels across European countries in pre-industrial times. He emphasizes the role of education opportunity cost for the vast majority of the population rather than for the powerful landed nobility acting as a constraint for the supply of schooling. Cinnirella and Hornung (2016) find that emancipated peasants had more incentives and resources to invest in education after serfdom abolition in nineteenth-century Prussia. Thus, the negative relationship between land inequality and literacy was mitigated in Prussian counties with a larger number of land and labor redemption cases, indicating the achievement of complete independence from the feudal landlord. 
Similarly, Beltrán Tapia and Martinez-Galarraga (2018), under the assumption that girls were not fully engaged in agricultural tasks, indicate that while the share of farm laborers was negatively associated with both male and female enrollment rates, it had a negative effect only on the supply of male school teachers. In the same direction, they also argue that land access inequality affects boys’ schooling enrollment more than the provision of male teachers. Lastly, the results seem unaltered when the analysis is restricted to a rural sample, where landowners’ power would plausibly be more extensive. In another work, relying on a newly assembled dataset on individuals living in Madrid in 1880 and 1905, the authors find that although the supply of mass schooling improved access to education for children from disadvantaged backgrounds, the public effort was insufficient to offset the constraints these families faced (Beltrán Tapia and de Miguel Salanova 2019).

I contribute to this literature by providing new insights on the role of land inequality on human capital accumulation in post-unification Italy, a country characterized by severe inequality levels and a North-South gap. Before describing data and the empirical strategy to test the proposed hypotheses, I briefly outline a historical background of the Italian economy in the liberal age, focusing on its educational and agricultural structure.

3 Historical context

3.1 Italy’s education system, 1859–1921

During the unification process, the Casati law (1859), initially in force only in the Kingdom of Sardinia, was extended to the annexed regions that later became part of the Kingdom of Italy. The resulting education system consisted of first-grade primary schools free of charge, delivered by municipalities according to their fiscal capacity. The first two years of primary school were mandatory for boys and girls, but households faced no sanctions for non-compliance. Municipalities with more than 4000 inhabitants were responsible for setting up second-grade schools, characterized by two additional years of primary schooling. Each municipal council was then authorized to build schoolhouses, hire teachers, pay their salaries, and enforce attendance.

In 1877, the Coppino reform amended the education system established by the Casati Law; its main contribution consisted in raising the compulsory years of schooling from two to three. It also introduced the possibility for municipal councils to receive subsidies for building schoolhouses and furnishing didactic material, and it established sanctions for households not complying with the attendance rule. Generally, it enhanced the power of local councils to enforce compulsory attendance, especially in rural and less developed areas of the country. The Nasi and Orlando laws, approved in 1903 and 1904, respectively, followed this reform; they further raised the age of mandatory education and increased teachers’ salaries, which had been frozen since 1886 (see Cives (1990)). Finally, in 1906, the Special Law for the South of Italy was enacted, including reforms aiming to bridge the socioeconomic gap in schooling between northern and southern regions. However, although these reforms improved the education system, none decisively boosted human capital.

A further crucial step was taken with the introduction of the Daneo-Credaro Law in 1911, following the Corradini Inquiry published one year before. The reform significantly changed the education system in Italy, centralizing the funding of schooling to a large extent. The main characteristic consisted of creating the provincial school boards, the so-called Consiglio Scolastico Provinciale (CSP), which worked as a link between municipalities and the central government. On the one hand, they limited the power of municipal councils, whereas, on the other hand, they allowed a certain degree of administrative decentralization in the management of public resources. While municipalities could apply for free loans to fund the construction of schoolhouses, the state was responsible for paying teachers’ salaries. Moreover, municipalities included in the CSP system were obliged to transfer an amount of money equal to the previous years’ budget to the Treasury. The state would, in turn, re-transfer these funds to the CSPs, which were charged with the duty of financing public education (see Cappelli (2015) for further details).

3.2 Agrarian structure of the Italian economy

Italy, soon after unification, was a predominantly rural economy. In 1911, according to the Population Census, the share of agricultural employment over the total labor force was about 60%, and it remained relatively unchanged until the Second World War (see Felice (2018)). Since the first years after unification, land inequality has always been considered one of the main issues at the core of political projects aimed at improving the conditions of the southern economy. The debate around latifundia involved prominent scholars such as Gramsci, Nitti, Sereni, and Croce, at least as late as the implementation of the land reform after the Second World War.

Land inequality matters here for two reasons. First, it is a measure of wealth dispersion per se, which sheds light on the level of concentration of resources among the population. Second, land constitutes the main productive input in a prevalently agrarian economy. It may therefore create oligopsonistic local labor markets if highly concentrated in the hands of a few landowners. As a result, many rural laborers lose bargaining power and are left with migration as the only alternative, looking for better conditions outside agriculture. Naturally, large landowners’ interests will consist of keeping wages low, trying to impede the migration process, and keeping a large labor force at their disposal.

Traditional historical literature viewed latifundia negatively, regarding them as one of the deep causes of the backwardness of the South. Accordingly, large landowners were deemed absentee and economically irrational agents, more concerned with rent extraction than with undertaking productive investments on their land (Sereni 1971).

A more recent branch of literature revised the traditionalist view. Specifically, the hypothesis of southern large landowners’ irrational behavior is called into question (Petrusewicz 1989; Placanica 1990). Instead, their decisions are attributed to the existence of specific non-removable constraints they were facing, different from the ones landowners were to address in the Po valley, such as drought in the summer, lack of rain, and the presence of malaria (Galassi and Cohen 1994; Bevilacqua 2005; Lupo 1990).

Notwithstanding an entrenched controversy and long-term debate on the role of land inequality in the North-South divide in Italy, there have been no attempts to integrate the discussion with quantitative evidence for the post-unification period. I try to fill this void to assess its correlation with human capital accumulation. Then, I focus on one specific component of the development process, leaving aside other possible channels through which land inequality affects long-run economic growth, such as efficiency in agriculture (Deininger and Squire 1998), excessive taxation (Persson and Tabellini 1994), trade policy that protects rent extraction (Adamopoulos 2008), extractive institutions (Acemoglu et al. 2002), and market power (Martinelli 2014).

The following section describes the main variables employed in the empirical analysis.

4 Data

I rely on a newly assembled dataset that includes land inequality and human capital information in post-unification Italy. I digitized data from various sources, referring to six years (1871, 1881, 1891, 1901, 1911, 1921) for the 69 provinces and to only three points in time (1881, 1901, and 1921) for 205 districts, to test whether the concentration of landownership is associated with the expansion of education.Footnote 6 The Population Censuses provide a range of information, including literacy rates, all agrarian contracts employed in agriculture, their percentage out of the total labor force, surface extent, child labor, and population. I integrated this information with further data coming from alternative sources. Specifically, data on municipal expenditures, literacy rates for the population aged between 15 and 19, the number of children and teachers, enrollment rates in primary school, and the number of schools divided by gender come from Bozzano et al. (2019) and Cappelli and Quiroga Valle (2020) and were kindly provided by the authors.

Since Population Censuses only provide information on contractual agreements employed in agriculture, without including information on the size of landholdings and their distribution among individuals, I cannot build a proper measure of land inequality. Vollrath (2013) uses a Gini index as the first measure of land inequality, despite several issues to be addressed. It is computed across all operating farms grouped by size bin and adjusted to account for males aged 21 or over. Similarly, Ramcharan (2010) studies the effect of land inequality on redistributive policy in the USA, employing the Gini coefficient at the county level for 1890–1930. Another accurate measure of land inequality alternative to the Gini index would have been the size of landholdings (Ramcharan 2010; Cinnirella and Hornung 2016; Vollrath 2013). However, it has been used to approximate the extent of serfdom or the tax base in rural areas. Therefore, I follow Beltrán Tapia and Martinez-Galarraga (2018) and Oto-Peralías and Romero-Ávila (2016), who measure land inequality, respectively, with the percentage of landless peasants over the total agricultural active workforce and the fraction of farm laborers over the total agricultural population.Footnote 7 In this context, an adequate measure of the variable of interest is the number of farm laborers divided by the total labor force engaged in agriculture. In particular, the numerator includes the number of daily laborers, the so-called braccianti, and wage workers, generally hired on an annual basis. The two agrarian figures constitute the mass of landless peasants, counterposed to large farm owners. Furthermore, the Censuses explicitly state that if a person declared two or more professions, exercised simultaneously or alternately, only the profession declared as the primary one was taken into account. Besides, the same computation applies homogeneously throughout the Italian territory.
Hence, double-counting issues and heterogeneous misrepresentation of the occupational structure in circumscribed areas are simultaneously avoided. Besides, the share of farm workers is more appropriate than the Gini index to measure large landowners’ de facto political power in the first phase of the industrialization process. Indeed, I am primarily interested in the class conflict between landowners and landless peasants rather than among landowners. Thus, farm workers’ share is better suited to capture the extensive margin of political power distribution rather than the intensive distribution of landownership. I expect the fraction of farm workers to be negatively associated with literacy rates.
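
The construction of the measure can be sketched as follows. This is a minimal illustration of the ratio described above; the function name, argument names, and the census counts are hypothetical and do not come from the dataset.

```python
def farm_laborer_share(day_laborers, annual_wage_workers, agricultural_labor_force):
    """Share of landless farm laborers in the agricultural labor force.

    The numerator sums daily laborers (braccianti) and wage workers hired
    on an annual basis; the denominator is the total agricultural labor force.
    """
    if agricultural_labor_force <= 0:
        raise ValueError("agricultural labor force must be positive")
    landless = day_laborers + annual_wage_workers
    if landless > agricultural_labor_force:
        raise ValueError("farm laborers cannot exceed the agricultural labor force")
    return landless / agricultural_labor_force


# Hypothetical census counts for one district (not actual data).
share = farm_laborer_share(
    day_laborers=12_000,
    annual_wage_workers=3_000,
    agricultural_labor_force=40_000,
)
```

With these illustrative counts, 15,000 of 40,000 agricultural workers are landless laborers. Because each person is recorded under a single primary profession, the numerator never counts the same individual twice.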

In Fig. 1, I report the share of farm laborers in 1881 (one of the years included in my sample). Although Fig. 1 reflects what I expected to find about land inequality in Italy, it is possible that some of those registered as landless in fact owned small plots of land, though not enough to reach the subsistence level. Nevertheless, this type of rural worker was more prevalent where braccianti were less widespread, balancing the disparities in land inequality among regions. Moreover, farm workers had to rely on wage work and child labor to meet their needs, further incentivizing them not to invest in education because of its opportunity cost. Generally, incentives to acquire education are associated with occupations attaching economic value to literacy skills, most widespread in urban areas. Indeed, towns and cities offered easier access to schools and gave inhabitants more occasions to take part in market exchanges and to deal with the law and the authorities. In such a context, being able to read and write bills of exchange and other trade documents was indispensable. In contrast to urban areas, the countryside was mainly characterized by a less dynamic environment. Thus, it is more likely that only local notables had access to education and the opportunity to acquire literacy and numeracy skills.

Fig. 1 Farm workers (share over total labor force in agriculture) in 1881. Author’s elaborations on data coming from 1881 Population Census

Figure 2 plots literacy rates for Italian districts for the same reference year in the liberal age, 1881. They were much higher in the northwestern part of the country, coinciding with the so-called industrial triangle around Turin, Genova, and Milan, Italy’s three most industrialized cities. Although some income disparities already existed between the North-West and the rest of the country at the unification and in the first decades after 1861, industrialization was only taking its first steps, and we do not observe noticeable disparities yet.Footnote 8 Differences in the industrialization index would have enlarged from that period, bringing about an even larger gap in income, particularly accentuated in the interwar period.Footnote 9 Remarkably, the North of Italy was not urban, except in Turin, Milan, and Genova, but was mainly constituted by sparsely settled small towns, with a high number of small municipalities. The area was marked by the settlement pattern of the cascina irrigua, the typical Po Valley agrarian regime, which favored the settlement of peasants close to their land. The highest urbanization rates were registered in some southern areas, mostly coinciding with the so-called agro-towns, composed of a huge mass of agrarian laborers settled on top of hills and far from cultivated lands, often infested by malaria (Malanima 2005).

Fig. 2 Literacy rate (1881). Author’s elaborations on 1881 Population Census

Figure 3 displays the variation of literacy rates over time for the four main Italian macroregions. Whereas the graph shows a clear North-South gap, literacy rates appear to grow steadily at a similar pace throughout the period under analysis. Neither divergence nor convergence seems to occur (at best modest convergence rates) until the implementation of the Daneo-Credaro law. From that moment on, literacy growth appears to accelerate in all Italian regions, slightly more so in the southern ones.Footnote 10

Fig. 3 Literacy rate by macroregion (1861–1921)

I test the hypothesis that the degree of landownership concentration is negatively associated with the level of human capital. A preliminary overview is consistent with the conjecture, but the results may be driven by omitted or unobserved factors related to both variables. Hence, in the next section, I attempt to address these concerns by including additional controls and implementing an identification strategy based on an instrumental variable approach. Furthermore, I exploit the panel structure of the data by employing a stepwise diff-in-diffs strategy, accounting for the differential effect of structural reforms on literacy between areas with low and high initial land inequality. Among these reforms, the centralization of public investment decisions in mass schooling with the introduction of the Daneo-Credaro law in 1911 could have played a decisive role in promoting education, acting as a “substitute for prerequisites” (Allen 2011; Gerschenkron 1962; Cappelli and Vasta 2020).
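
A stepwise diff-in-diffs strategy of this kind can be written, purely as an illustrative sketch and not necessarily as the exact equation estimated later in the paper, by interacting initial land inequality with period indicators. The symbols \(\mu _{i}\), \(\lambda _{t}\), \(\delta _{\tau }\), and the base year \(t_{0}\) are notation assumed for this sketch only:

$$\begin{aligned} Lit_{it}=\mu _{i}+\lambda _{t}+\sum _{\tau >t_{0}}\delta _{\tau }\left( LandIneq_{it_{0}}\times \mathbb {1}[t=\tau ]\right) +\gamma X^{\prime }_{it}+\epsilon _{it} \end{aligned}$$

Here \(\mu _{i}\) are district or province fixed effects that absorb time-invariant unobserved heterogeneity, \(\lambda _{t}\) are census-year effects common to all units, and each \(\delta _{\tau }\) measures how the literacy gap associated with initial land inequality in year \(\tau\) differs from its level in the base year. Under this reading, a positive \(\delta _{\tau }\) after a reform would indicate that areas with high initial land inequality were catching up, that is, that the negative association was fading.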

Descriptive statistics for the variables employed in the empirical analysis are reported in Tables 14 and 15 in the Appendix, for both districts and provinces and for each year. For a thorough definition of the variables and their sources, see Table 16 in the Appendix.

5 Empirical strategy

This section evaluates the contemporaneous and lagged relationship between landownership concentration and literacy rates. First, I present OLS results from separate cross-sections at different points in time, both for districts and provinces in post-unification Italy. Then, I introduce an instrumental variable approach based on the presence of malaria, which aims to address potential endogeneity. Next, I conduct a robustness check analysis by including further controls and validating the instrument with a reduced form regression and several falsification tests. Lastly, I explore the temporal dimension of the dataset presenting panel estimates, which allow accounting for time-invariant unobserved heterogeneity.

5.1 OLS estimates

I estimate a standard OLS model where literacy rates are a function of the share of farm laborers in each district or province i:

$$\begin{aligned} Lit_{it}=\alpha _{1}+\beta _{1}LandIneq_{it}+\gamma _{1}X^{\prime }+\epsilon _{it}; \end{aligned}$$
(1)

\(\forall t \in \{1871, 1881, 1891, 1901, 1911, 1921\}\), where Lit is the literacy rate, LandIneq is the share of farm workers, and \(\beta _{1}\) is the coefficient of interest. One may contend that the literacy of the overall population is determined much earlier than the date at which farm laborers are measured. Hence, LandIneq is measured both at time t and at \(t-1\), whenever possible. \(\epsilon _{it}\) is the error term; standard errors (henceforth, SE) are clustered at the district and province levels. For robustness, I also compute Conley SE (Conley 1999) to account for the possibility that errors are correlated among neighboring districts and provinces, and wild cluster bootstrap SE, following Cameron et al. (2008).Footnote 11 X is a vector of socio-economic and geographical covariates, which allows me to control for other factors related to education and land inequality that may drive the results. These variables capture other aspects of the demand for and supply of schooling potentially associated with landownership concentration. I include population (in logs), the urbanization rate, the share of sharecroppers in the total agricultural workforce, and the fraction of workers engaged in agriculture. In the absence of disaggregated data on GDP and GDP per capita for the period under analysis, the inclusion of these variables makes it possible to account for opportunities outside agriculture that create an incentive to invest in human capital. Indeed, urban areas are typically more prosperous, as more productive agriculture can sustain other activities where literacy skills are required. Following the same argument, I include the share of the agricultural labor force to distinguish between rural and urban geographic areas. I expect it to be negatively associated with literacy rates.
The proportion of sharecroppers allows me to control for areas where high levels of land inequality are likely but are not captured by the way the main variable is constructed.Footnote 12 Furthermore, I control for a set of geographic factors: a dummy equal to 1 if the district or the province is landlocked and 0 otherwise; latitude; and four dummies for the Italian macroregions. The first variable captures the openness of the local environment to trade and cultural flows, commonly due to better access to the sea. Latitude, in contrast, controls for differences in climate and temperature along the peninsula, which potentially drive different agrarian regimes. Finally, the four macroregional dummies capture most of the unobserved heterogeneity at the macroregional level.
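
For a single census year, the point estimates of equation (1) can be obtained from the normal equations. The sketch below is a minimal pure-Python illustration under stated assumptions: the data are invented, only three of the covariates are included, and clustered, Conley, or bootstrap standard errors are not computed. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(y, regressors):
    """Return OLS coefficients [intercept, slopes...] from the normal equations."""
    rows = [[1.0] + list(r) for r in regressors]  # prepend the constant
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical cross-section for one census year: each row is
# (share of farm laborers, log population, landlocked dummy).
regressors = [
    (0.20, 11.0, 0.0),
    (0.35, 10.5, 1.0),
    (0.50, 11.5, 0.0),
    (0.65, 10.0, 1.0),
    (0.30, 12.0, 1.0),
    (0.55, 10.8, 0.0),
]
# Literacy is generated without noise so the known coefficients are recovered.
literacy = [0.6 - 0.5 * s + 0.02 * p + 0.03 * d for s, p, d in regressors]

alpha, beta_land, gamma_pop, gamma_landlocked = ols(literacy, regressors)
```

Running the same routine separately on each census year corresponds to the separate cross-sections described above; replacing the contemporaneous share with its value from the previous census gives the lagged variant.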

OLS results are presented in Tables 1 and 2, respectively, at the district- and province-level. The coefficient of interest is negative and statistically significant for all years under analysis, both for the district and the province estimates.Footnote 13 Furthermore, concerning estimates based on a more detailed territorial level (district-level), the coefficient magnitude appears to decrease over time, meaning that ongoing changes associated with landownership concentration had a role in reducing its impact on education levels. Therefore, even though OLS results are not conclusive, they suggest that the relationship between land inequality and literacy rates was likely to be more accentuated in the first stages of development. The share of the agrarian labor force, distinguishing between rural vis-à-vis urban areas, shows a negative and statistically significant coefficient, as expected. Even the share of sharecroppers over the total labor force in the primary sector is negatively associated with literacy, in line with the fact that only northwestern regions had higher levels of education. Latitude is positively associated with literacy, while other controls seem less relevant, except the dummy for being landlocked, positively associated with the provincial estimates and the district-level outcomes in 1881 and 1901. Moreover, in columns 3 and 5 of Table 1 and columns 3, 5, 7, 9, and 11 of Table 2, I report estimates with lagged levels of farm workers (\(LandIneq_{it-1}\)), presenting an even greater magnitude of the parameter of interest.Footnote 14 Finally, to account for spatially correlated standard errors, I report wild-bootstrap p-values (1,000 repetitions) for SE of farm workers clustered at the regional level and Conley SE adjusted considering a radius of 200km at the bottom of the table. Both alternative ways of computing SE produce similar results about the significant impact of farm workers on literacy. 
Overall, while OLS estimates highlight a robust association between land inequality and human capital in post-unification Italy, they still present several sources of endogeneity to be addressed, and the estimated coefficients cannot be interpreted as causal relations.

Table 1 Literacy and farm workers, OLS, district level
Table 2 Literacy and farm workers, OLS, province level

5.2 IV estimates: the role of malaria

While the empirical evidence from the OLS regressions appears robust, the findings cannot be given a causal interpretation, as unobserved omitted factors correlated with land inequality and literacy might bias the results. Reverse causality is a further concern. On the one hand, more educated peasants may be more able to rent plots of land they can redeem and obtain better tenant conditions. This could be particularly true in a fixed-rent regime, ultimately resulting in a more equally distributed land ownership. Conversely, more educated peasants may decide to sell their lands to seize better rents in occupations outside agriculture, with a higher payoff for high-skilled workers, potentially leading to increased land inequality. In the latter case, the estimated coefficient would be biased downward, whereas in the former, it would be biased upward, toward zero. Thus, I address endogeneity by adopting an instrumental variable approach. Specifically, I exploit a new measure related to malaria pervasiveness in the liberal age as a source of exogenous variation in land inequality. Before moving to the empirical analysis, I briefly describe the peculiar relationship between malaria and land inequality and their role in Italian historiography.

5.2.1 Malaria and latifundia in the Italian countryside

Malaria is the basis of all social life. It determines relations of production and the distribution of wealth. Malaria lies at the root of the most important demographic and economic facts. The distribution of property, the prevailing crop systems, and patterns of settlement are under the influence of this one powerful cause.

Francesco Saverio Nitti

Italy registered a long record of malaria exposure throughout the centuries. Archaeological evidence dates the presence of the disease on Italian territory back to the eighth century BC, attesting to abandoned settlements along the southern Italian and Sicilian coasts attributable to malaria epidemics (Corbellini and Merzagora 1998, pp. 22–23). First, the ancient Romans met one of their greatest enemies in malaria, as the disease rendered the Pontine swamplands around Rome completely uninhabitable (Nájera 2001). Then, around 400–100 BC, it spread along Tuscan coasts, either coming from the South of Italy or directly from North Africa (Sallares et al. 2004). During the last decades of the nineteenth century and the first half of the twentieth century, the disease was still considered one of the main political and socio-economic issues because of its association with rural poverty and underpopulation (Amorosa Jr et al. 2005). Malarial zones in Sardinia, in the Pontine marshes (Agro Romano), and along the coasts of Mezzogiorno were all characterized by low agricultural productivity and population density (Celli and Fraentzel 1933; Missiroli 1949).

While the interrelationship between high land inequality and malaria has a long track record, environmental factors contributed to entrenching high inequalities once institutional factors became less relevant after the subversion of feudalism. Feudal residues and fiscal privileges were removed between the Napoleonic wars and the unification. Inheritance laws such as maggiorasco and fideicommesso, which had maintained high levels of wealth inequality, were eventually abolished.Footnote 15 Nevertheless, the lands of the fiefs, which remained deprived of the feudal lords’ possession, returned to the availability of pre-unification governments, which reconfigured their ownership through assignments, taking the “useful possession” and, in its absence, direct assignment as the main guideline criteria. Therefore, it was easy for local notables to exhibit all the necessary titles regarding their “useful possession” of lands.Footnote 16 Besides, many workers who received a share of land could not pay land taxes, given the scarcity of resources and means at their disposal. They were compelled to sell it to the wealthiest owners, because the rights still recognized to the barons were so extensive that individuals could redeem them only by resorting to the courts, exposing themselves to intolerable economic costs. Consequently, after the subversive laws of feudalism, local notables maintained property on most of the feudal lands de facto (Villani 1965; Anatra 1977).

Thus, although this process of institutional change did not profoundly modify the agrarian structure of the economy, geographical non-removable constraints played a major role in exacerbating high wealth and land inequality well after the unification. Malaria-infested regions, impeding rural households from living close to their lands, contributed to further concentrating land ownership and maintaining latifundism. A similar agrarian structure of society was perpetuated as late as the interwar period. Arrigo Serpieri, former Minister of Agriculture during fascism, attested that a different geographical and social environment was necessary to switch from an extensive to an intensive agrarian regime. He also added that “(...)intensified cultivation systems are not possible where good and tolerable hygienic conditions are not determined before or in parallel with their introduction. Many large estates are now infested with malaria, in front of which the current system has the great merit of not requiring the permanence in the countryside of several people, especially during the period of the most serious infection” (my translation from Boccini and Piccialuti (2003)).

The hypothesis that malaria was a driver of latifundia creation was originally suggested by Celli and Fraentzel (1933). More recently, it was tested by Chaves (2013), who uses Markov chain models to study the dynamics of latifundia creation in 1930 Spain. Celli observes a high correlation between malaria endemicity and the creation of large estates in the Pontine marshes, a plain in the surroundings of Rome.Footnote 17 More specifically, the historical archives of the Roman Pontine marshes document the debilitating effects of malaria on farmers, which prevented them from harvesting crops and properly cultivating their lands. Thus, by forcing rural households to abandon their lands, malaria meant that only a few wealthy landowners could afford them. Then, large landowners would exploit the land as latifundia, based on extensive and land-intensive agriculture, requiring a landless peasants’ workforce. During harvest time, farm workers were forced to settle in towns located on hills and far from infested areas. Idle lands reinforced the presence of latifundia, ultimately resulting in a vicious circle that maintained such a feudal agrarian regime as late as the Second World War.

Moreover, extensive literature documents the greater capacity of large farmers to cope with climatic and environmental risks (Binswanger and Rosenzweig 1993; Eastwood et al. 2010; Ramcharan 2010). Among these factors, malaria played a major role, exposing the infested areas to constant uncertainty and risk. It forced large landowners to manage adverse shocks and adopt appropriate incentive-compatible strategies. Snowden (2008) documents the greater capacity of larger farmers to promptly replace sick workers during the harvest season, further complicated by their location far away from cultivated lands. Extensive and labor-saving agriculture, farming practices based on specific cultivars, and measures to cut transaction costs in hiring external laborers prevailed in malaria-infested regions. At the local level, land-use regimes and tenancy systems based on short-term leases involving an annual circulation of tenants dominated. Extensive mono-cultivar cereal farming demanded work only at certain times of the year. Hence, residence on the land was discouraged, and peasants preferred living in the “agro-towns,” located on top of hills (King and Strachan 1978).Footnote 18 The settlement pattern of the population was therefore highly influenced by the presence of malaria.

To exploit information on malaria at a very granular territorial level, I digitized the map “Carta Della Malaria dell’Italia,” illustrated by Luigi Torelli in 1882 (see Fig. 4), obtaining a measure of its diffusion at provincial and district levels. The map was obtained as a result of the governmental investigation entitled “Reclamation of malaria regions along the Italian railways,”Footnote 19 led by Senator Luigi Torelli in 1882, and contained within the documentation of bill 17 of the 15th legislature. Malaria prevalence is classified into three degrees of intensity: weak, moderate, and severe.Footnote 20 To my knowledge, there exists no other similar historical data on malaria virulence at such a high spatial resolution for pre-industrial Italy.

Fig. 4 “Carta della malaria dell’Italia,” Torelli map (1880)

Figure 4 shows more than one similarity with the pattern observed for farm workers in the same year. As well as in the South and the two islands, I find malaria widespread in the areas of Maremma and Pontine marshes, Tuscany and Latium, and some provinces of the Po Valley. I claim that this measure is representative of the diffusion of malaria throughout Italy over the whole liberal age, even though it is observed at only one point in time, and I am not able to exploit its panel dimension. Indeed, according to Snowden (2008), no remarkable changes occurred in the South until the Second World War, despite the introduction of quinine at the turn of the twentieth century, which somewhat reduced mortality due to malaria, especially in the North.Footnote 21 Nevertheless, mortality rates from malaria are a less suitable index of the extent of the phenomenon. Indeed, like other parasitic diseases adapted to their hosts, it is generally associated with a low case-mortality ratio. In 1882, the Italian government conducted a survey attesting that around 11 million people were at constant risk of infection, and another 2 million contracted the disease every year (see Snowden (2008)).Footnote 22 Hence, while the disease can infect a substantial percentage of the population, the number of deaths directly attributable to malaria endemicity accounts for only a small fraction of the overall annual mortality rate.

I use the variable as an excluded instrument in all cross-sectional IV regressions at different points in time, both for districts and provinces. The following equation expresses the first-stage regression:

$$\begin{aligned} LandIneq_{it}=\alpha _{2}+\beta _{2}\,Malaria_{it}+X^{\prime }_{it}\,\gamma _{2}+u_{it} \end{aligned}$$
(2)

where LandIneq is the proxy for landownership concentration, Malaria is the share of geographic territory covered by malaria, and X is the same vector of covariates included in the second-stage equation. The identification strategy hinges on the assumption that malaria endemicity affects literacy rates only through land inequality. Nonetheless, if malaria can influence agricultural productivity and crop mixes requiring different skill intensities, this would violate the exclusion restriction. Following the same argument, malaria could affect households’ fertility decisions and variations of child labor in agriculture, keeping children out of school. This phenomenon might even be reinforced by adopting crops requiring high skills and monitoring levels, inducing households not to invest in their children’s education. In light of these considerations, the estimates should be interpreted with caution, as a violation of the exclusion restriction may still occur.
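
The logic of the instrument can be illustrated with a minimal sketch on synthetic data. It is not the estimator behind the reported tables: it drops the covariates X, uses a single instrument, and every number in it (sample size, coefficients, noise levels) is an assumption chosen only for illustration. In this just-identified case, the IV estimate is the ratio of the reduced-form covariance to the first-stage covariance.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate: reduced form divided by first stage."""
    return cov(z, y) / cov(z, x)


def simulate(n=5000, beta=-0.4, seed=0):
    """Synthetic data: z shifts x, while an unobserved u moves both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.random()                                  # hypothetical malaria share
        ui = rng.gauss(0.0, 0.1)                           # unobserved confounder
        xi = 0.4 + 0.3 * zi + ui + rng.gauss(0.0, 0.05)    # land inequality proxy
        yi = 0.6 + beta * xi + ui + rng.gauss(0.0, 0.05)   # literacy
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
print("OLS slope:", round(ols_slope(x, y), 3))
print("IV slope: ", round(iv_slope(z, x, y), 3))
```

Because the simulated confounder raises both the land inequality proxy and literacy, the OLS slope is biased upward, while the IV ratio should lie close to the assumed value of beta. The sketch says nothing about whether the exclusion restriction holds in the actual data.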

5.2.2 Malaria and latifundia: an empirical analysis

First-stage results are reported in Tables 17 and 18 in the Appendix, showing the relationship between malaria and farm workers for each year at the district and provincial levels. The share of the surface covered by malaria is significantly and positively correlated with the fraction of farm workers. The first-stage F-statistic is consistently high, confirming the instrument’s relevance, that is, its strong association with the land ownership structure. Indeed, values exceed the rule-of-thumb threshold of ten, ranging from 11.18 to 26.49 for provincial estimates and from 36.32 to 76.86 for the district dataset.Footnote 23 Columns gradually add controls to evaluate the robustness of the association to different model specifications. Only the inclusion of the share of sharecroppers seems to substantially reduce the coefficient size in the first-stage regressions, while estimates remain stable when conditioning on the remaining variables. In particular, residual baseline covariates and time-invariant geographic controls do not alter the association between malaria and land inequality. Besides, the F-test of excluded instruments slowly decreases as baseline and geographic controls are gradually added, yet it never falls below ten.
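
For readers unfamiliar with the rule of thumb, the following sketch shows how a first-stage F-statistic can be computed in the simplest possible case. It assumes a single instrument, no covariates, and homoskedastic errors, so it is not the clustered statistic reported in the Appendix tables; the function name and the toy inputs are hypothetical.

```python
def first_stage_f(z, x):
    """F-statistic for H0: slope = 0 in a bivariate regression of x on z.

    With a single excluded instrument, this equals the squared t-statistic.
    Homoskedastic errors are assumed; no covariates are partialled out.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = szx / szz
    rss = sxx - slope * szx        # residual sum of squares
    sigma2 = rss / (n - 2)         # residual variance estimate
    return slope ** 2 / (sigma2 / szz)


# Toy inputs, purely illustrative (not taken from the paper's data).
malaria_share = [0, 1, 2, 3, 4]
farm_workers = [1, 3, 2, 5, 4]
print("First-stage F:", first_stage_f(malaria_share, farm_workers))
```

A value above ten is the conventional threshold referred to in the text. It speaks to the strength of the instrument, not to the exclusion restriction.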

To provide further evidence of malaria as a fundamental driver of landownership concentration, I gather data on the share of the population scattered in the countryside and not located in towns (over the total population) for the district-level dataset at each point in time. Then, in another specification, I regress this measure on malaria and the set of geographic controls. Living close to cultivated lands was dangerous in mosquito-infested regions. Thus, farmer families implemented a risk-coping strategy based on their settlement in towns located on top of hills, far away from malarious lands dispersed in the plains. The results reported in Table 3 confirm my conjecture. Where malaria was pervasive, I find fewer people living in scattered dwellings and more people concentrated in small centers and villages. A 10-percentage-point increase in the share of territory covered by malaria induces a reduction of 0.9 percentage points in the population scattered in the countryside. This indicates that such a settlement pattern, while it seemed inevitable to offer protection from contagion, moved the agricultural workers away from the fields. The increased distance to the workplace favored the adoption of labor-saving agrarian regimes and extensive agriculture based on large estates concentrated in the hands of absentee landowners.

Table 3 Scattered dwellings and malaria, OLS, district level

Second-stage results are reported in Tables 4 and 5.Footnote 24 The two-stage least-squares (2SLS) regressions confirm the significant negative relationship between landownership concentration and human capital in Italy at the end of the nineteenth century and in the first years of the twentieth century. In terms of magnitude, a one-percentage-point increase in the share of farm workers in the Italian districts is associated with a literacy rate 0.41 percentage points lower in 1881, 0.24 percentage points lower in 1901, and 0.221 percentage points lower in 1921. In particular, consider the difference between the farm workers’ share in the districts of Bari and Lugo in 1881. In Lugo, LandIneq = 0.556 (which is the 25th percentile of the distribution of LandIneq across districts in Italy), and in Bari, LandIneq = 0.8032 (which is the 75th percentile). Using the estimates in the first column of Table 4 implies that Bari’s literacy rate in 1881 would have been almost 10 percentage points higher if it had had a farm workers’ share as small as Lugo’s. The coefficient magnitude is also high for provincial regressions, ranging from 0.381 percentage points in 1881 to about 0.27 in 1921, even though it is smaller than in the district specification, plausibly due to unobserved factors not accounted for that bias the parameters downward.Footnote 25 To account for the small number of regions at which SEs are clustered, I additionally report wild bootstrap p-values and, alternatively, Conley SE adjusted for spatial correlation (cutoff of 200km) at the bottom of the table. The statistical inference produces similar findings under both approaches.
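
As a back-of-the-envelope check of the Bari and Lugo comparison, taking the rounded 1881 district coefficient of 0.41 in absolute value (the exact entry in Table 4 may differ slightly), the implied literacy gap is:

$$\begin{aligned} |\Delta Lit| \approx 0.41\times (0.8032-0.556)=0.41\times 0.2472\approx 0.101 \end{aligned}$$

that is, roughly 10 percentage points of literacy for a move from the 75th to the 25th percentile of the farm workers’ share.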

Table 4 IV estimates, district level
Table 5 IV estimates, provincial level

Overall, the IV coefficients are higher than the OLS coefficients at all points in time. The magnitude of the bias is in line with the results obtained by Cinnirella and Hornung (2016) and Ramcharan (2010). The authors find that IV results are more than three times larger than OLS estimates. The bias toward zero referred to OLS results might arise for several reasons, ranging from measurement error to the presence of unobservables only partially captured by the model and the identification strategy. However, the size of the bias varies according to the period. The \(\beta\) coefficient of the district level IV regression is almost 1.5 times higher than the corresponding OLS coefficient of 1881, while about 1.2 and 1.3 times higher in 1901 and 1921. This further suggests that the bias might be due to measurement error of farm workers in the first years, presumably because of a less refined grade of data collection in cadastres and censuses. Finally, similar to OLS results, we observe a decreasing effect over time by comparing coefficients across the three separate cross-sections for the district-level dataset. The same holds for the provincial estimates, except for an increase in size in 1911.

As to the other control variables, the share of the total agricultural labor force, capturing opportunities outside agriculture and distinguishing between rural and urban areas, shows a negative and statistically significant sign, as expected. However, excluding this variable does not affect the results. Interestingly, the negative relationship between sharecropping activities and human capital remains robust, although slightly more modest than the variable of interest, unlike in the OLS framework. The macroregional coefficients (not reported for brevity) vary in significance, with northern areas positively associated with literacy rates and latitude. The remaining control variables are not related to literacy. Specifically, urbanization rates are not associated with more education, likely because of high levels registered in the South due to the agro-towns’ presence, as already mentioned in Section 4. The dummy for the landlocked area is slightly positively associated with literacy rates in the first stages of the period under analysis. Finally, odd columns show results for lagged values of farm workers. Interestingly, the size of the associated parameter is more than twice the size of the contemporary effect in the district-level regression in 1901, while it seems in line with the respective coeval regressions in all remaining years. These findings therefore seem consistent with the conjecture that the structure of the agricultural population and the share of farm workers are determined earlier than observed literacy rates.

In Section 5.3, I also report the model specification with the inclusion of the industrialization index, mortality rates due to malaria, agricultural productivity, and the role of domestic market access. Again, changes in the results are not significant, confirming the goodness of the model specification and the robustness of the identification strategy to factors potentially invalidating the exclusion restriction hypothesis. Moreover, I go even further by undertaking three falsification tests: the first consisting of a reduced-form regression of literacy rates on the indicator for malaria; the second allowing for a deviation from the perfect exogeneity assumption; and the third consisting in making assumptions on the degree and sign of the variable of interest’s endogeneity. Next, in Section 5.4, I exploit the panel dimension of the dataset.

5.3 Robustness checks

Malaria-endemic countries exhibit lower economic growth rates. The channels through which malaria prevents a country from undertaking a process of development are multiple, including effects on fertility, poverty, population growth, saving and investment, worker productivity, mortality rates, and medical costs (Sachs and Malaney 2002). Malaria is also seen to reduce the labor force and impede the adoption of new technologies and practices in agriculture (see Asenso-Okyere et al. (2011) for a review). All the more, the effects of malaria on school attendance and performance may directly affect human capital accumulation. Bleakley (2010) finds a positive effect of eradication campaigns in terms of income and an increase in years of schooling, whereas Cutler et al. (2010) find support for a mixed but indicative effect only for female education, with no impact on male income and education values. Similarly, Lucas (2010) finds that malaria eradication increased years of educational attainment and literacy. In contrast, high dropout rates are generated by school absenteeism, resulting from the diffusion of the disease.Footnote 26

Furthermore, a new and increasing branch of research also points toward ways in which malaria can permanently affect the cognitive performance of children (see Al Serouri et al. (2000); Holding and Snow (2001)). According to these studies, parasitemic children, for instance, score lower grades on specific tests than non-parasitemic children. Also, the long-term cognitive performance of a child may be inhibited by its previous in-utero experience during the pregnancy of the malaria-infected mother (McCormick et al. 1992).

5.3.1 Additional controls

In this section, I address the problem of potential time-varying confounding factors that can bias previous results. By doing so, I carry out some robustness checks of the IV baseline specification, including additional controls that can affect literacy through their interaction with landownership concentration. Results are reported in Table 6 at the provincial level.Footnote 27 For simplicity, I report regressions only for 1901, as all variables are at hand for this year, and comparison among different specifications is possible.

Table 6 Robustness checks, 1901

In the first column, I add mortality rates due to malaria in 1900, the first year this information is available (Ministero di Agricoltura 1900). According to Malaney et al. (2004), in areas characterized by high levels of malaria endemicity, its mortality burden generally falls most heavily on children less than 5 years of age. Nonetheless, including mortality rates due to malaria does not affect the results, revealing that differences in mortality caused by malaria morbidity were not relevant in explaining literacy rates.

Column 2 includes a regional measure of labor productivity in agriculture to control for heterogeneity in agricultural productivity.Footnote 28 Indeed, malaria can undermine land suitability, ultimately affecting crop mixes and the adopted agrarian regimes. Nevertheless, the magnitude and significance of the coefficients are very similar to the estimates for 1901 in the IV baseline specification.Footnote 29

Moreover, a shift from an agricultural to an industrial economy, creating a suitable environment for commerce and trade, might cause heterogeneity in the incentives to invest in education. I therefore include a provincial industrialization index in column 3, coming from the work by Ciccarelli and Fenoaltea (2010). Highly industrialized areas, coinciding with the provinces of the so-called industrial triangle, are plausibly negatively associated with high land inequality levels, capturing its effect on literacy rates. Nevertheless, once again, the coefficient of the variable of interest retains its significance and remains almost unaltered.

The same holds once I add a regional measure of market access, coming from Missiaia (2016), to account for heterogeneity in opportunities to trade. Indeed, market access may have favored education expansion and the acquisition of literacy skills thanks to the closeness with markets and more flourishing trade. In her work, the author argues that only domestic market potential, representing the home market, shows a traditional north-south divide. In contrast, the South does not show remarkable disparities with the rest of the country whenever international markets are introduced. Therefore, I only use such a measure of domestic market potential, and I add it within the set of covariates. The estimates presented in column 4 with its inclusion reveal that the magnitude and significance of the coefficient of interest are virtually unchanged.

5.3.2 Validating the instrument: falsification tests

As is common in an IV framework, although it is easy to assess whether an instrument is relevant, less obvious is the satisfaction of the exclusion restriction hypothesis. As mentioned before, a problem with the choice of malaria as an instrument is that it may impact schooling through health channels, thus violating the exclusion restriction hypothesis. To rule out this channel, or at least to show that it is not the dominant one, I provide the reader with some insight into the instrument’s validity by performing specific falsification tests.

Reduced-form The first test consists of running a reduced form regression of literacy on the instrument, i.e., the indicator of malaria prevalence in a territory.Footnote 30 Results are reported in Table 7, in the row “Reduced form (full sample).” Not surprisingly, the presence of malaria has a negative and statistically significant correlation with literacy rates in 1881. To verify that the effect is not spurious, I replicate the exercise by running two regressions dividing the sample into subsamples with low and high values of the farm workers’ share (respectively, rows “Direct effect (remaining sample)” and “Direct effect (zero-first-stage group)”). Specifically, I consider the median of the variable of interest as the threshold value, being 0.6952 in 1881, 0.4196 in 1901, and 0.4155 in 1921. In the former case, the correlation between malaria and literacy rates is negative but not significantly different from zero (positive for 1901). By contrast, the regression conducted over the subsample with high values of farm workers shows a negative and statistically significant correlation between the indicator of malaria and literacy rates (except in 1921). Moreover, the magnitude of the effect almost reaches the one shown for the whole sample (i.e., −0.04 against −0.05 in 1881), meaning that the significant correlation displayed in the baseline case is mainly driven by districts exhibiting a substantial landownership concentration. In other words, when the conditions for latifundia creation are not met, the presence of malaria is not a sufficient condition to have lower human capital levels.
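
The mechanics of this split-sample exercise can be sketched as follows. The code is a stylized illustration only: it omits all covariates, the function names are hypothetical, and the toy data are constructed by hand so that the instrument matters for the outcome only where the treatment is high.

```python
from statistics import median


def slope(a, b):
    """Slope from a bivariate OLS regression of b on a."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = sum((ai - ma) ** 2 for ai in a)
    return num / den


def split_reduced_form(z, x, y):
    """Reduced-form slopes of y on z below and above the median of x."""
    cut = median(x)
    low = [(zi, yi) for zi, xi, yi in zip(z, x, y) if xi <= cut]
    high = [(zi, yi) for zi, xi, yi in zip(z, x, y) if xi > cut]
    low_z, low_y = zip(*low)
    high_z, high_y = zip(*high)
    return slope(low_z, low_y), slope(high_z, high_y)


# Toy data: malaria (z) is unrelated to literacy (y) where the farm
# workers' share (x) is low, and negatively related where it is high.
z = [0, 1, 2, 3, 0, 1, 2, 3]
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0.5, 0.5, 0.5, 0.5, 0.5, 0.4, 0.3, 0.2]
low_slope, high_slope = split_reduced_form(z, x, y)
print("Low-share subsample: ", low_slope)
print("High-share subsample:", high_slope)
```

A reduced-form slope close to zero in the low-share subsample, together with a clearly negative slope in the high-share subsample, is the pattern the test looks for.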

Table 7 Falsification tests: summary of regression results

Plausibly (and beyond) exogenous test The second and most reliable falsification test comes from the relaxation of the assumption that the instrument is strictly exogenous. As widely argued, several concerns can be raised about the instrumental variable’s validity and strictly exogenous nature. Hence, I relax the hypothesis that malaria is not correlated with other unobservables influencing literacy; that is, I argue that malaria is plausibly exogenous. Such a definition comes from Conley et al. (2012), who present a tractable and straightforward test of conducting inference consistent with the hypothesis that the exclusion restriction does not hold exactly. Expressly, the authors assume that the instrument enters the second-stage regression linearly as a further control and directly affects the outcome variable. Hence, the associated coefficient, which is the parameter on which they infer, is allowed to have a non-zero value, in contradiction with the exclusion restriction hypothesis. Based on two distinct approaches, the union-of-confidence intervals (henceforth, UCI) and the local-to-zero approach (henceforth, LTZ), they make prior assumptions on the true value of such a parameter that I will call \(\gamma\). In the first case, the support for \(\gamma\) is restricted to be an interval between two values, whereas, in the second case, further hypotheses are advanced concerning its distribution function.Footnote 31
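
As a simplified illustration of this setup, and not necessarily the exact estimator used for the reported results, consider the just-identified case with one instrument and a known value of \(\gamma\). The symbols \(\rho\), for the reduced-form coefficient of literacy on malaria, and \(\delta\), for the covariate coefficients, are introduced here only for exposition; \(\beta _{2}\) is the first-stage coefficient from Eq. (2):

$$\begin{aligned} Lit_{it}=\alpha +\beta \,LandIneq_{it}+\gamma \,Malaria_{it}+X^{\prime }_{it}\,\delta +\epsilon _{it}\quad \Longrightarrow \quad {\hat{\beta }}(\gamma )=\frac{{\hat{\rho }}-\gamma }{{\hat{\beta }}_{2}}={\hat{\beta }}_{IV}-\frac{\gamma }{{\hat{\beta }}_{2}} \end{aligned}$$

Setting \(\gamma =0\) recovers the standard IV estimate. With a positive first stage and a negative direct effect of malaria, the correction moves the estimate toward zero. Under the UCI approach, the confidence set for \(\beta\) is the union of the confidence intervals obtained for each \(\gamma\) in the assumed support.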

Therefore, the plausibly exogenous method is advantageous if we have prior information on the sign and size of the violation of the exclusion restriction. However, no guidance is provided on how to obtain a plausible and credible prior. Of course, priors over \(\gamma\) centered at 0 may be inappropriate in this case, as the related literature suggests a negative effect of malaria on education (Bleakley 2010; Lucas 2010). A recent contribution by van Kippersluis and Rietveld (2018) proposed a two-step procedure to provide a novel sensitivity analysis for IV estimation. First, this approach estimates the direct effect of the IV on the outcome in a subsample group for which the first stage (that is, the effect of the IV on the treatment variable) is zero. The intuition is that in the same subsample, the reduced form (the effect of the IV on the outcome) should also be zero if the exclusion restriction is satisfied. Afterward, this estimate of the direct effect is used as input for the plausibly exogenous method developed by Conley et al. (2012).Footnote 32

Although we do not know the subsample for which the first-stage effect is exactly zero, we might rely on the reduced form specification for low levels of the treatment variable. In such a case, we expect malaria pervasiveness to have, at the very least, a lower association with land inequality. Hence, the estimates of the reduced-form relationship between malaria and literacy where the percentage of farm workers is low provide consistent estimates of \(\gamma\), as we expect malaria to be less likely to produce an effect on literacy.Footnote 33

Table 7 summarizes regression results and reports estimated beta coefficients of the main variables for each year.Footnote 34 While malaria retains a statistically significant adverse effect in the zero-first-stage group (despite being smaller in magnitude), the respective direct effect is not significantly different from zero for each year.Footnote 35 This is reassuring, as a significant association between the instrument and the outcome in the subgroup for which we expect to have a zero-first-stage effect would have us believe in a substantial violation of the exclusion restriction. The corresponding results displayed in rows “plausibly exogenous” and “plausibly exogenous (with uncertainty)” are mixed and, to some extent, inconclusive. While the “plausibly exogenous” row shows that the effect of farm workers on literacy completely disappears when correcting for the estimated direct effect of the IV on the outcome in the reference sample, the subsequent row incorporating uncertainty around the direct effect shows that its size does not seem sufficiently large to alter the main conclusions substantively.

Figure 5 shows that while in some cases there is still a significant effect of landownership concentration on literacy rates, even allowing for plausible amounts of imperfect exogeneity, in others more caution is required in interpreting the results.Footnote 36 For instance, in the UCI approach (left side) for the 1881 regression, if I allow \(\gamma\) to be negative and to range from −0.05 to 0, the corresponding confidence set for \(\beta\) is approximately \([-0.6879; 0.2453]\), while for 1921 it is \([-0.3346; 0.0703]\). By contrast, in 1901, it remains negative at almost every value in the support of \(\gamma\). As for the LTZ approach, we can straightforwardly verify that the confidence intervals for \(\beta\) include 0 as soon as I allow the direct effect to be larger in absolute value than approximately −0.018, −0.02, and −0.026 in 1881, 1901, and 1921, respectively.Footnote 37 This means that priors consistent with a direct effect beyond this threshold are sufficient to lose confidence in finding a significantly negative effect of land inequality on education. By contrast, for direct effects smaller in absolute value than the threshold, I find evidence that \(\beta\) still represents an economically relevant impact.

Fig. 5 Plausibly exogenous test: 95% interval estimates. Notes: The figure presents 95% confidence intervals for the effect of farm workers (share over total labor force in agriculture) on literacy rates in 1881 at the district level. The definition of \(\gamma\) differs between the support-only intervals (UCI, left side) and the interval that uses the full prior (LTZ, right side). In the first approach, I let \(\gamma\) assume a range of values between −0.05 and 0, while in the LTZ approach, I impose a normal distribution with a mean equal to \(\hat{\gamma }\) in the zero-first-stage group and variance equal to \({S^2_0}\)

To conclude, applying Conley et al. (2012), we can show that the negative estimate of \(\gamma\) reported in Table 7 for the zero-first-stage group does not exceed (in absolute value) the value of \(\gamma\) necessary to lose confidence in the finding of a negative impact of land inequality on literacy in 1881 and 1901. For instance, in 1881, for the 90% confidence interval for \(\beta\) to include zero, \(\gamma\) must be larger in absolute value than approximately −0.018, which is more than three times the estimate of −0.006 in Table 7. Therefore, even allowing for reasonable amounts of imperfect exogeneity, we can still confirm the presence of an adverse effect of land inequality on education. However, the opposite holds for 1921, as the estimate of −0.031 exceeds the threshold of −0.026 in absolute value. This implies that, for 1921, we cannot reject a zero effect of land access inequality on literacy rates. The significant OLS and IV results might be the consequence of the selection of comparatively more malaria-pervasive districts into those with higher farm worker proportions.Footnote 38
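The comparison behind this conclusion is simple arithmetic and can be restated as a short check. The sketch below uses only the rounded values quoted in the text for 1881 and 1921; the estimate for 1901 is not quoted here, so it is left out. The dictionary layout and the function name `survives` are illustrative choices.

```python
# Rounded values quoted in the text: estimated direct effect in the
# zero-first-stage group and the threshold at which the interval for
# beta starts to include zero.
cases = {
    1881: {"gamma_hat": -0.006, "threshold": -0.018},
    1921: {"gamma_hat": -0.031, "threshold": -0.026},
}


def survives(gamma_hat, threshold):
    """True when the estimated direct effect is smaller in absolute value than the threshold."""
    return abs(gamma_hat) < abs(threshold)


for year, c in cases.items():
    verdict = "negative effect survives" if survives(**c) else "zero effect cannot be rejected"
    print(year, verdict)
```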

Degree of endogeneity An alternative identification strategy that does not rely on the exclusion restriction hypothesis comes from Kiviet (2020). Instead of constructing conservative confidence sets that allow for a mild violation of the exclusion restrictions, assuming a plausible range of direct effects for the instruments, the author imposes assumptions on the degree of regressor endogeneity, which is left unrestricted in an IV world. In other words, this approach identifies the confidence bounds of regression coefficients by allowing the admissible correlation of the regressors with the error term to be within plausible bounds.Footnote 39 The kinky least squares approach (henceforth, KLS) developed by Kiviet (2020) corrects the OLS estimator for all values on a grid of endogeneity correlations to provide a set of consistent coefficient estimates under the postulated endogeneity range.Footnote 40 Although inference can thus be conducted without instruments, the KLS approach enables testing the validity of any potential exclusion restrictions by adding them into the KLS regression and (set)-identifying their direct effect.Footnote 41
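To fix ideas, the point-estimate correction can be sketched for the simplest possible case: a single regressor, no controls, and a postulated correlation \(\rho\) between the regressor and the error term. In that case the probability limit of OLS equals the true slope plus \(\rho \sigma_{\varepsilon}/\sigma_{x}\), and the OLS residual variance equals \(\sigma^2_{\varepsilon}(1-\rho^2)\), which yields the correction below. This is a textbook simplification on simulated data, not the full estimator of Kiviet (2020), and it does not compute KLS confidence intervals; the names `kls_slope` and `simulate` are hypothetical.

```python
import math
import random


def kls_slope(y, x, rho):
    """OLS slope corrected for an assumed correlation rho between x and the error."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b_ols = sxy / sxx
    su2 = sum(((b - my) - b_ols * (a - mx)) ** 2 for a, b in zip(x, y)) / n
    sigma_eps = math.sqrt(su2 / (1.0 - rho ** 2))  # OLS residuals understate the error s.d.
    sigma_x = math.sqrt(sxx / n)
    return b_ols - rho * sigma_eps / sigma_x


def simulate(n=20000, seed=0, beta=-0.5, rho=-0.3):
    """Data with slope beta and corr(x, error) = rho by construction."""
    rng = random.Random(seed)
    y, x = [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        eps = rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x.append(xi)
        y.append(beta * xi + eps)
    return y, x


y, x = simulate()
for r in (-0.4, -0.3, -0.2, 0.0, 0.2, 0.4):
    print(r, round(kls_slope(y, x, r), 3))
```

With a negative OLS slope, a more negative postulated correlation pushes the corrected estimate toward zero, while a positive one makes it more negative.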

Several hypotheses can be made on the farm workers’ plausible degree of endogeneity. On the one hand, omitted variables proxying for economic performance and prosperity can be conducive to a negative correlation between the error term and the endogenous variable. Similarly, omitted variables capturing channels through which malaria might impact literacy, such as cognitive development of children, childhood exposure to malaria, childhood mortality due to malaria, and school absenteeism, might induce the endogenous regressor to further negatively correlate with the disturbances. On the other hand, allowing for a mild measurement error of farm workers, an imperfect measure of landownership concentration and political power, induces a positive correlation with the error term by construction.Footnote 42

KLS results are presented in Fig. 6 in conjunction with 2SLS estimates.Footnote 43 While KLS confidence intervals are narrower than IV’s, the actual correlation rate is unknown, and one should consider the union of confidence intervals over the whole range of correlations, or at least a reasonable one. For high levels (in absolute value) of correlations, the union of KLS confidence sets is around three times larger than the 2SLS confidence interval. Notably, however, KLS and 2SLS confidence intervals overlap for median endogeneity correlations, and the 2SLS point estimates are inside the KLS intervals over a sub-range of slightly positive values. This is at the very least reassuring about the appropriateness of the chosen instrumental variable. Nevertheless, without prior information on the reasonable range of endogeneity, I cannot sharpen the KLS inference.

Fig. 6 KLS and 2SLS coefficient estimates and confidence intervals of farm workers for each year. Notes: Vertical red lines indicate the point at which KLS coefficient estimates become insignificant

In every year, for a positive range of postulated endogeneity correlations, the KLS confidence intervals never include 0, so the estimated effect is always significantly negative. By contrast, assuming a negative range of endogeneity, there is always a cutoff value (vertical red lines) beyond which we lose confidence in a significantly negative impact of farm workers on literacy. The opposite holds with priors consistent with a postulated endogeneity range that stays short of the same threshold value. Similarly, we do not reject the null hypothesis that the instrument is validly excluded from the model for values ranging from slightly negative to small positive correlation levels.Footnote 44 Hence, I add malaria to the set of covariates in the following specification, as the KLS approach has the advantage that we can obtain valid inference without any instruments.Footnote 45

I report graphs of interest in Fig. 7. Directly implied by the exclusion restrictions tests, the coefficient of malaria seems to take on negative values when allowing for negative endogeneity correlations to occur, although the process takes place at a slower speed than with the coefficient of farm workers. This suggests that the coefficient associated with malaria is more stable in comparison and steadily remains at zero for a broader range of endogeneity correlations.

Fig. 7 KLS coefficient estimates with malaria among regressors for each year

Nevertheless, I can further test the exclusion restriction by using the malaria coefficient from the previous KLS specification as a prior in the plausibly exogenous test of Conley et al. (2012). As seen, shaping a prior belief about a plausible range of direct effects can be as complex as, or even more challenging than, agreeing on a reasonable range of endogeneity correlations. Hence, I take the value corresponding to the KLS specification with a negative correlation level equal to the threshold above which the coefficient of farm workers loses significance.Footnote 46 I then assume that the direct effect is half of that value, that is, equidistant from 0 and that value. Thus, I can finally apply Conley et al. (2012) to obtain the corresponding union of confidence intervals.

Instead of reporting the graph from the plausibly exogenous test, I add the resulting confidence bands, \([-0.6437; 0.0073]\) in 1881, \([-0.3333; -0.0142]\) in 1901, and \([-0.3164; -0.0188]\) in 1921, to the graphical output of the KLS regression.Footnote 47 The results are displayed in Fig. 8. Notice that, because zero is imposed as the lower limit for the direct effect of malaria pervasiveness (the value at which it would be a validly excluded instrument), the lower bounds of the plausibly exogenous (PE) test coincide with the respective 2SLS bounds. The principal result of this empirical exercise is that we cannot reject the hypothesis that malaria is a valid instrument for the range of endogeneity correlations that do not overstep the previous cutoff values. Only for 1881 do the widened PE confidence bounds, which result from allowing malaria to have a (small) nonpositive direct effect, make it more difficult to draw meaningful conclusions. In contrast, the KLS inference remains informative for the 1901 and 1921 regressions as long as I restrict attention to a subset of postulated endogeneity correlations.

Fig. 8 KLS, 2SLS, and PE coefficient estimates and confidence intervals in the specification with malaria pervasiveness among covariates

Overall, maintaining the assumption that the endogeneity of farm workers is due to measurement error (and the endogeneity correlation is thus positive), the KLS estimate of the farm workers coefficient is still significantly negative. In contrast, if we are sure that omitted variables play a crucial role and the correlation between the endogenous variable and the error term is negative, two possible outcomes may arise. On the one hand, when land inequality is mildly endogenous, the associated coefficient remains significantly negative (but smaller), and the bounds of the estimates encompass the respective 2SLS point estimates. This indicates that malaria pervasiveness might indeed be a valid and relevant instrument. On the other hand, if we accept that the omitted-variables bias is so large that the endogeneity correlation cannot plausibly lie between 0 and a small negative value, we are forced to acknowledge that malaria becomes an invalid instrument. However, a well-established literature points to measurement error when evaluating the effect of land inequality on education, justifying the upwardly biased OLS coefficient and the difference in magnitude with IV estimates (Cinnirella and Hornung 2016; Easterly 2007; Ramcharan 2010; Beltrán Tapia and Martinez-Galarraga 2018). Therefore, it is unrealistic to assume a largely negative true endogeneity correlation, which supports an identified adverse causal effect of landownership concentration on education during this juncture of Italian history.

5.4 Panel results

Exploiting the panel dimension of the dataset to estimate the within-province and within-district relationship between changes in land inequality and literacy produces similar results. The advantage is twofold: I can account for all unobserved time-invariant heterogeneity potentially correlated with the landownership structure (fixed effects) and for time shocks common to all units of observation, which may produce changes in the variable of interest that in turn affect education levels (time dummies). Furthermore, shocks could have changed the political power of landowning elites. Hence, I interact land inequality with time dummies to account for institutional changes common to all units in a stepwise diff-in-diffs approach. The corresponding equation takes the following form:

$$\begin{aligned} Edu_{it}=\alpha _{i}+\delta _{t}+\sum \limits _{t}\,\beta \,Land_{i,1881}\,\delta _{t}+\sum \limits _{t}\,\gamma \,X^{\prime }_{it}\,\delta _{t}+\upsilon _{it} \end{aligned}$$
(3)

where \(Edu_{it}\) denotes literacy rates (overall and by gender), \(\alpha _{i}\) and \(\delta _{t}\) are, respectively, fixed effects and time shocks, and \(X^{\prime }_{it}\,\delta _{t}\) represents the interaction between baseline controls and time dummies. The variable of interest is the interaction term \(\sum _{t}\,\beta \,Land_{i,1881}\,\delta _{t}\); finally, \(\upsilon _{it}\) is the error term. Land inequality is fixed at its 1881 value, and the coefficients on the interaction terms are to be interpreted with respect to this initial value. Doing so rules out that changes in farm workers are endogenously driven by changes in education levels. I am primarily interested in the \(\beta\) coefficients.
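As a minimal illustration of how the regressors of interest in Eq. (3) are built, the sketch below constructs the interaction between the 1881 land inequality measure and a dummy for each non-base year. The function name, the example share of 0.6, and the list of years are hypothetical; the base year 1881 follows the interpretation given in the text.

```python
def interaction_terms(land_1881, year, years, base_year=1881):
    """Return {s: Land_{i,1881} * 1[year == s]} for every non-base year s."""
    return {s: (land_1881 if year == s else 0.0) for s in years if s != base_year}


years = [1881, 1901, 1921]   # illustrative set of census years
land_1881 = 0.6              # hypothetical initial share of farm workers

for t in years:
    print(t, interaction_terms(land_1881, t, years))
```

Each unit contributes its fixed 1881 value only in the column of the year being observed, so the coefficient on each column measures the differential relative to the base year.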

Results are reported in Tables 8 and 9 for districts and provinces, respectively. As mentioned above, I opt for a model specification in which all regressors are interacted with time dummies to allow their marginal effects to change over time (see Cinnirella and Hornung (2016) and Cappelli (2016)). This is due to the different nature of the variables: literacy rates follow a constantly increasing trend, while the fraction of farm workers changes slowly, showing remarkable geographical differences only between regions.Footnote 48 The coefficient on the initial level of land inequality is negative and statistically significant in 1901 and positive but not significantly different from zero in 1921. In the provincial case, with a longer time series at hand, it becomes significant in 1891, peaks in 1901, and then decreases, eventually losing significance in 1921. When I consider literacy by gender, the evidence is more mixed. In the district-level panel dataset, while the impact on female literacy rates seems to follow the same path, the effect on male literacy rates is surprisingly positive and significant in 1921. Similarly, in the provincial case, I find a negative and statistically significant effect on female literacy rates throughout the considered period, with the coefficient becoming smaller and insignificant over time, while the impact on male literacy vanishes even earlier, eventually turning positive in 1921 (though not significant). In columns 1, 4, and 7, the control variables are allowed to change over time. However, changes in the covariates might be endogenous to those in the distribution of political power, thus biasing the estimates. As with the share of farm workers, I therefore fix all time-varying control variables in 1881 and interact them with time shocks (columns 2, 5, and 8). The coefficients on interacted landownership concentration become even larger in size, reinforcing the effect on literacy. Interestingly, in the district panel dataset, the impact on literacy for the whole population and for males becomes significantly positive in 1921. The pattern of results for the provincial estimates is hardly affected. In columns 3, 6, and 9, I go one step further: I predict farm laborers in 1881 with malaria in 1881 and use the predicted values interacted with time dummies as the main regressor. Again, while the size of the coefficients increases, their time pattern remains unaffected, with a negative impact vanishing before turning positive in 1921 (though not significant). Finally, accounting for spatially correlated standard errors does not alter the main conclusions. Wild bootstrap p-values (1,000 repetitions) substantially confirm the main results, although they slightly weaken the significance of the coefficients in columns 1, 2, and 5 of Table 8, and columns 1, 2, 4, and 5 of Table 9. Conley standard errors computed with a distance threshold of 200 km corroborate previous findings, only reducing the significance of the coefficient interacted with the 1891 dummy.

Table 8 Panel results: district level
Table 9 Panel results: province level

Overall, a comparison of the coefficients on the interaction terms indicates that the relationship between landownership concentration and literacy changes over time in both districts and provinces, with the former becoming less and less influential due to a weakening of the political power of local notables. That is, districts and provinces with an initially stronger landownership concentration first exhibit lower literacy rates relative to 1881 and then see their education levels increase at the end of the period. Take Table 8, column 2 as an example: while in 1901 literacy rates are 1.3 percentage points lower relative to 1881 in districts with a 10 percentage point higher initial share of farm workers, in 1921 those same districts exhibit literacy rates 1.2 percentage points higher relative to 1881. Remarkably, the differential seems to increase after 1911, suggesting a greater scope for convergence for more unequal areas.
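The reading of Table 8, column 2 given above amounts to multiplying each interaction coefficient by the gap in the initial share of farm workers. The short sketch below reproduces that arithmetic; the per-point coefficients of −0.13 and 0.12 are not read from the table but are backed out from the figures quoted in the text (−1.3 and 1.2 percentage points for a 10-point gap), and the function name is hypothetical.

```python
def literacy_differential(coef_per_point, share_gap_points):
    """Differential in literacy (percentage points) relative to 1881."""
    return coef_per_point * share_gap_points


# Implied coefficients per percentage point of farm workers (assumption:
# backed out from the text as -1.3 / 10 and 1.2 / 10).
implied = {1901: -0.13, 1921: 0.12}
gap = 10  # percentage-point difference in the initial share of farm workers

for year, coef in implied.items():
    print(year, round(literacy_differential(coef, gap), 2))

swing = literacy_differential(implied[1921], gap) - literacy_differential(implied[1901], gap)
print("change in the differential between 1901 and 1921:", round(swing, 2))
```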

Similar results conform with the passive modernization process Italy was experiencing at that time, with the imposition of many top-down public policies that, along with minor changes in the land ownership structure, may have reduced its impact on education through time. If the conjecture is correct, reforms of the end of the nineteenth century and the beginning of the twentieth century somewhat changed the relationship between the landed nobility and the peasantry and their incentives to invest in education. Indeed, democratization and the achievement of political equality in Western European countries during the late nineteenth and early twentieth century were generally followed by rapid public policy changes. Lindert (2004) defined it as “the 1880-1930 laboratory,” documenting the historical proximity between franchise extension and public provision of education, increased spending in social transfers, labor market reforms, and creation of income tax systems. Post-unitary Italy is part of this laboratory, as local governments were in charge of providing many pro-growth public goods, including education and healthcare. Their decisions were likely to have determined long-lasting regional variation in income per capita. The Daneo-Credaro bill, enacted in 1911, is only one example of similar top-down policies. Another fundamental structural break is the piecemeal enfranchisement process experienced in Italy at different stages, with the stepwise enlargement of the electoral franchise for male people as late as 1919. All these institutional shocks weakened the political power of landowning elites, thus limiting their ability to intervene in public choice in later stages of development. In this case, it is reasonable to expect such a variable to have had lesser explanatory power regarding human capital accumulation.

In the next section, I explore the mechanism of transmission of the effect by focusing on the effect of land inequality on some intermediate measures of supply and demand of schooling.

6 Mechanism of transmission: supply vs demand

Literacy rates represent a so-called equilibrium measure of human capital. Despite finding a negative and statistically significant relationship with the share of farm workers, which approximates land inequality, I still do not know whether this impact runs through a demand or a supply channel. On the one hand, an unequal landownership structure implies more concentrated economic and political power in the hands of a few large landowners. This is particularly relevant for the purpose of this study since local notables, who took over municipal councils, could influence the decisions to invest in mass schooling (certainly until 1911) and prevent the majority of the population from gaining decision-making power. On the other hand, a high fraction of annual and daily workers in agriculture, a social class owning few economic resources and facing a binding budget constraint, had lower incentives to invest in human capital, given its low payoff and uncertain long-term returns.

As discussed in Section 1, Galor et al. (2009) assert that human capital is complementary to industry-related tasks, most widespread in urban and industrialized areas. In contrast, the rural labor force is not required to possess literacy skills. Thus, large landowners had a stake in reducing and possibly opposing human capital-enhancing institutions, limiting migration, and maintaining low agricultural wages. By contrast, capitalists favored institutions promoting education. To explore how land inequality affects human capital accumulation, I re-estimate the IV model restricting the analysis to rural districts and compare the results to those obtained using the entire sample.Footnote 49 We expect the coefficient associated with land inequality to be even larger where landowning elites were better able to obstruct the expansion of mass schooling in rural areas.

Results are reported in Table 10. I employ malaria pervasiveness as the exogenous instrument in all specifications and include the complete set of controls.Footnote 50 Comparing the estimated coefficients with those reported in Table 4, we observe that they are between 1.3 and 1.75 times larger in the subsample containing districts with urbanization below the 33rd percentile, and between 1.16 and 1.4 times larger in the subsample containing districts with urbanization rates below the median. This evidence seems to confirm the supply-side mechanisms stressed by Galor et al. (2009).

Table 10 Restricted sample: IV with rural districts

The landed nobility retaining political power in rural areas was responsible for building school infrastructure, appointing school teachers, and managing municipal fiscal resources. Thus, I employ these intermediate outputs approximating schooling supply to test further whether landowning elites could obstruct education provision. Specifically, I use the child-teacher ratio, school density, and municipal expenditures per child, aggregated at the provincial level. In addition, I gather information on enrollment rates and child labor to assess the role of rural households’ demand for education. Unfortunately, for these demand-side variables I can only run IV regressions and cannot exploit the panel dimension of the dataset, because this information is only observed for 1881. Moreover, data at the more granular district level are only available for child labor, while enrollment rates are only obtainable for provinces.Footnote 51 Enrollment rates are constructed by dividing the number of people enrolled in primary schools by the population aged between six and fifteen. It is also possible to disaggregate the variable by gender.

The baseline specification reports IV estimates, employing malaria as an excluded instrument and including the set of baseline controls.Footnote 52 Although these variables aim to proxy supply and demand factors, they are to some extent endogenous. Indeed, a larger number of teachers appointed to schools and a higher school density may induce children to attend school. Conversely, enrollment rates, capturing children’s demand for schooling, may boost the supply of schooling. The same argument applies to municipal expenditures per child, which may be greater where education demand is high. Nonetheless, assessing the impact of inequality on these schooling outputs remains highly informative per se.

Results are reported in Table 11. I find a positive and statistically significant association between farm workers and the ratio between children and teachers. This finding suggests that where landownership was highly concentrated, the number of teachers appointed to schools was lower, likely resulting from a minor effort to expand education. The significance seems to vanish over time, suggesting this supply channel worked more vigorously in the first stages of the period under analysis and lost strength over time with changes influencing power balance. When it comes to school density, I find a negative association, though not statistically significant.Footnote 53 This means that disparities in the construction of schoolhouses between areas with greater and lower levels of landownership concentration had been almost non-existent as late as the rise of the Fascist party. The effort spent to build them up appears to be the same across Italy’s municipalities.Footnote 54 Similarly, except for the first period under analysis, I find no evidence of a significant systematic correlation when investigating the relationship between farm workers and the value of municipal expenditures per child.Footnote 55

Table 11 Mechanism of transmission: supply

The sufficient cross-sectional and temporal variation allows panel estimates for all three outcomes. I do not use the presence of malaria as an exogenous instrument due to its lack of time variation. However, as in Section 5.4, it is possible to account for unobserved provincial heterogeneity (fixed effects) and for temporal shocks common to all provinces (time dummies). Furthermore, I employ the same model specification as in Table 9, with the share of farm workers fixed at the beginning of the period and interacted with time shocks. Covariates are also interacted with time dummies, in the vein of the stepwise diff-in-diffs estimation already employed. Columns 2, 5, and 8 fix all control variables in 1881, and columns 3, 6, and 9 use predicted values of farm workers in 1881, with the malaria index as an instrument.

Results are reported in Table 12. In the first three columns, I find a negative and statistically significant effect of farm workers on the child-teacher ratio. Also, the impact displays an increasing trend in magnitude. Take column two as an example: the parameter ranges from −0.1 for 1891 to a value of −0.2 for 1921. It means that geographic areas that exhibit high levels of land inequality at the beginning of the period register the highest decreases in child-teacher ratios through time. Besides, the effect magnitude is even reinforced moving forward in time, apparently suggesting an increasing role of the supply process in expanding public schooling.

Table 12 Mechanism of transmission: panel estimates

Nevertheless, while this finding might seem at odds with the cross-sectional results and appear counterintuitive, the stepwise diff-in-diffs estimates must be interpreted differently. They suggest that the channel operating through the appointment of teachers becomes relevant as the distribution of political power shifts away from landholders. This is evidenced by the magnitude of the coefficient on the interaction term in 1921, which is almost twice the previous ones. After registering steady negative values in the first three periods, the coefficient drops sharply, in conjunction with the introduction of the Daneo-Credaro law in 1911. Thus, higher investments in teachers are to be attributed to the central government.

The following columns display findings using municipal expenditures per child and school density as outcome variables. While we observe a modest role of political power in affecting public expenditure, school density is never influenced by the interaction between the initial level of farm workers and time shocks. Finally, spatial analysis clustering standard errors with the Conley method and wild bootstrap estimation confirm the main findings in both tables. Therefore, while findings corroborate the results of the IV model specifications, further research is needed to find out more about which supply factors mattered most to transmit the effect, if any.

As for the demand side of the story, I test the effect on enrollment rates and child labor. Although they are highly correlated with literacy rates, enrollment rates should more accurately capture decisions to attend school and hence the demand for education. This is because literacy rates refer to the whole population aged more than six, thereby including people who could have acquired this skill years before. In this case, the contemporaneous correlation with land inequality would say little about the presence of literate people within a district or a province. The literacy endowment of a specific territory may thus represent the result of private agents’ decisions undertaken years or even decades earlier.Footnote 56 By contrast, people enrolled in primary schools are predominantly children, thus representing only one population cohort. Hence, enrollment should better reflect the decisions of families to invest in their children’s education.

Results are reported in Table 13, panel A. Unlike estimates with school density and municipal expenses as outcomes, land inequality negatively affects enrollment rates in every specification (overall and by gender). Moreover, the magnitude of the effect is stronger than the one on the schooling supply outcomes. For instance, relative to the regression with the child-teacher ratio as the outcome variable for the same year (Table 11, column 2), the coefficient is more than 1.5 times higher. Unfortunately, contrary to the case of the child-teacher ratio, here we cannot investigate the robustness of the correlation between the two phenomena by evaluating whether the effect vanishes over time.

Table 13 Mechanism of transmission: demand

Nevertheless, most often, the number of children enrolled at school does not coincide with the proportion of children attending school. Absenteeism rates were elevated in depressed areas, and the law was not equally enforced throughout the territory. Since attendance rates are not observed for the period under analysis, I rely on district-level data on child labor in 1881 by gender to approximate the decision of families to send their children to work.Footnote 57 Indeed, child labor is the alternative to education.

Doepke and Zilibotti (2005) propose a theoretical approach in which unskilled workers compete with children in the labor market, facing a tradeoff. On the one hand, unskilled workers support child labor regulations to raise their own wages. On the other hand, they oppose them when their working children provide a large share of family income. Furthermore, historical evidence shows a preference for child labor regulations in the first phases of industrialization because of increasing returns to education. Hence, I expect a higher proportion of child labor to be associated with greater landownership concentration.

Panel B of Table 13 reports results using agricultural child labor as the outcome variable, overall and divided by gender. When I do not distinguish by gender, regression outputs only show a modest significant result. Nevertheless, a positive and statistically significant impact is found once I account only for male child labor. This aligns with the argument that only males were devoted entirely to working the land, albeit sometimes female involvement in agricultural tasks was likely to occur.Footnote 58 Both methods to spatially cluster SE produce consistent results.

Finally, I go one step further by considering only people aged between 15 and 19 who have acquired literacy skills. In this case, the pattern of literacy should reflect the contemporaneous outcome of households' decisions to invest in education rather than send their children to work. Following a similar argument, I expect the share of farm workers to be negatively correlated with literacy computed for people aged between 15 and 19, in contrast to the coefficient signs for agricultural child labor. Hence, I run regressions employing this measure at the provincial level as the outcome, finding a negative and significant effect of land inequality. Results are robust to different model specifications and estimators (OLS and IV) and different years.Footnote 59 The results are less striking in panel regressions that employ the usual stepwise diff-in-diffs approach. Only when controls are held fixed at the beginning of the period do I find that provinces with a larger share of farm workers register a significant increase in literacy for people aged between 15 and 19 in 1911. This finding is in line with the declining size of the coefficients in the cross-sectional estimates. De facto political power and landownership concentration play a steadily decreasing role in the ongoing modernization process of Italy.
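The logic of the stepwise diff-in-diffs approach mentioned above can be illustrated with a minimal sketch on synthetic data. It is not the specification estimated in the paper: the variable names, the set of years, the simulated coefficients, and the absence of controls and clustered standard errors are all assumptions made for illustration. The sketch regresses an outcome on unit fixed effects, year fixed effects, and interactions between the initial level of land inequality and year dummies, with the first year as the omitted base.

```python
import numpy as np

def stepwise_did(y, land0, unit, year, base_year):
    """OLS of y on unit FE, year FE, and (initial land inequality x year) terms.

    Returns one interaction coefficient per non-base year, each measured
    relative to the base year. Illustrative only: no controls, no clustering.
    """
    later = [int(t) for t in np.unique(year) if t != base_year]
    cols = [(unit == u).astype(float) for u in np.unique(unit)]  # unit fixed effects
    cols += [(year == t).astype(float) for t in later]           # year fixed effects
    cols += [land0 * (year == t) for t in later]                 # interaction terms
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(later, beta[-len(later):]))

# Synthetic panel: all numbers below are hypothetical, not estimates from the paper.
rng = np.random.default_rng(0)
years = np.array([1871, 1881, 1901, 1911, 1921])
n_units = 60
unit = np.repeat(np.arange(n_units), len(years))
year = np.tile(years, n_units)
land0 = np.repeat(rng.uniform(0.2, 0.9, n_units), len(years))  # initial land inequality

# Assumed differential effects relative to 1871 (positive = the gap narrows).
true_delta = {1871: 0.0, 1881: 0.0, 1901: 0.1, 1911: 0.3, 1921: 0.4}
delta = np.array([true_delta[int(t)] for t in year])
unit_effect = np.repeat(rng.normal(0.5, 0.1, n_units), len(years))
year_effect = 0.05 * np.searchsorted(years, year)
y = (unit_effect + year_effect - 0.5 * land0 + delta * land0
     + rng.normal(0.0, 0.01, len(year)))

estimates = stepwise_did(y, land0, unit, year, base_year=1871)
for t, b in estimates.items():
    print(t, round(float(b), 3))
```

In this setup the time-invariant level effect of land inequality is absorbed by the unit fixed effects, so only the year-specific changes relative to the base year are identified, which is why a vanishing cross-sectional effect shows up as positive interaction coefficients in later years.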

Therefore, although the findings do not rule out the role of mass schooling supply and of the decisions of landowning elites in promoting it, they point to the existence of alternative factors explaining the negative relationship between land inequality and education. Such demand-based incentives may be related to opportunities outside agriculture, to which the payoffs of literacy skills are attached, and to the binding budget constraints of the rural population during this stage of development. Further research is needed to identify the precise mechanism.

7 Conclusions

Relying on an adequate indicator of land inequality in post-unification Italy, a uniquely assembled database at both the district and provincial levels makes it possible to investigate whether and how landownership concentration affected literacy rates. I find that areas with higher levels of landownership concentration show systematically lower literacy rates throughout the period under analysis. Exploration of the panel dimension of the dataset shows that the magnitude and significance of the effect decrease over time, in line with the “passive modernization process” hypothesis. An instrumental variable approach based on malaria virulence addresses omitted variable and reverse causality concerns. Under the assumption that malaria is exogenously determined, IV regressions corroborate OLS results, confirming a negative relationship between landownership concentration and literacy. Moreover, the findings seem particularly robust to different sample sizes (districts and provinces) and to the inclusion of additional controls, whose omission might have violated the exclusion restriction. Furthermore, several falsification tests to validate the identification strategy support the hypothesis of a causal impact.

Another remarkable result is obtained when analyzing the effect’s transmission mechanism. Notwithstanding the difficulty of distinguishing between supply and demand factors, land inequality may have adversely affected literacy rates not only by influencing schooling supply through the political process but also through private demand for education. Results employing several measures of intermediate outputs of schooling supply and demand confirm this conjecture, showing that the latter played an even greater role in determining disparities in human capital accumulation. Indeed, once I employ enrollment rates, child labor, and literacy rates for people aged between 15 and 19 as outcomes, results show a robust association with landownership concentration, indicating that rural households preferred to put their children to work rather than send them to school. Thus, where land ownership was highly concentrated, the economic condition of the vast majority of the rural population was extremely precarious, and investing in education lay beyond the binding budget constraints of large segments of society. Likewise, the opportunities to acquire literacy skills were limited because the possibilities of upward mobility were almost non-existent.

Therefore, although alternative explanations cannot be discarded, the findings of this work shed new light on the mechanism through which land inequality shapes different human capital accumulation paths, highlighting the role of malaria in creating and maintaining latifundism in the first place.