The role of lactase persistence in precolonial development


This paper argues that a genetic adaptation to the Neolithic Revolution led to differential levels of development in the precolonial era. The ability to digest milk, or to be lactase persistent, is conferred by a gene variant that is unequally distributed across the Old World. Milk provided qualitative and quantitative advantages to the diet that led to differences in the carrying capacities of respective countries. It is shown through a number of specifications that country-level variation in the frequency of lactase persistence is positively and significantly related to population density in 1,500 CE; specifically, a one standard deviation increase in the frequency of lactase persistent individuals (roughly 24 percentage points) is associated with roughly a 40 % increase in precolonial population density. This relationship is robust to a large number of sample specifications and potentially omitted variables.
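The headline magnitude can be unpacked with a little arithmetic. The following is a minimal sketch, assuming the estimating equation is log-linear in population density and that the frequency is measured as a fraction between 0 and 1; neither assumption is stated in the abstract. The function names are illustrative only.

```python
import math

SD_LACTASE = 0.24      # one standard deviation (24 percentage points), from the abstract
EFFECT_PER_SD = 0.40   # roughly a 40 % increase in density, from the abstract


def implied_coefficient(effect: float, sd: float) -> float:
    """Coefficient b such that exp(b * sd) - 1 equals the stated effect.

    Assumes log(density) = a + b * frequency + ..., which is an assumption here.
    """
    return math.log(1.0 + effect) / sd


def percent_change(coefficient: float, delta: float) -> float:
    """Proportional change in density for a change `delta` in the frequency."""
    return math.exp(coefficient * delta) - 1.0


b = implied_coefficient(EFFECT_PER_SD, SD_LACTASE)
print(round(b, 3))                               # implied semi-elasticity
print(round(percent_change(b, SD_LACTASE), 2))   # recovers the stated effect
```

Under these assumptions, the stated effect corresponds to a coefficient of about 1.4 on the frequency of lactase persistence; the paper's reported estimates should be consulted for the actual values.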



  1. The Neolithic Revolution is the name given to the transition from hunting and gathering to agriculture.

  2. As explained in Sect. 1.3, lactase persistence is equivalent to lactose tolerance.

  3. Sect. 1.3 and the supplemental appendix discuss the validity of this assumption in more detail.

  4. This is discussed in greater detail in Sect. 2.1.1. In order to confirm my results, I also use a second, cruder strategy of assigning majority ethnic groups to represent countries in 1,500 CE. This strategy is pursued in similar research, i.e., Spolaore and Wacziarg (2009). The correlation between the two measures of the frequency of lactase persistence is 0.98. Estimation with the alternative measure is found in the supplemental appendix.

  5. See, e.g., Acemoglu et al. (2001), Bockstette et al. (2002), Chanda and Putterman (2007), Comin et al. (2010), Engerman and Sokoloff (1997), Engerman and Sokoloff (2002), La Porta et al. (1999), Nunn (2008).

  6. For simplicity, I refer to milk as coming from cattle.

  7. This idea from Nunn and Qian is supported by a companion paper that shows the introduction of the potato following the Columbian Exchange had larger effects in countries with large frequencies of lactase persistence (Cook forthcoming).

  8. All infants produce lactase in order to digest mother’s milk.

  9. Lactose is found in all milk.

  10. Weaning is the process of an infant taking nourishment other than by suckling.

  11. Consistent with the literature, I use lactase persistence instead of lactose tolerance, although the two terms have equivalent definitions.

  12. This is dependent on the availability of milk. If no milk is available, no advantage exists.

  13. A selective sweep is defined as “The process in which a favorable mutation becomes fixed in a population” (Hartl and Clark 2006, p. 184).

  14. The supplemental appendix provides a fuller discussion of the potential for reverse causality.

  15. A measure of the frequency of lactase persistence has been calculated by using the frequency of the gene that allows for the continued production of lactase in European populations. Substituting this measure into the estimating equation specified above leads to a positive and significant coefficient, but the use of the European gene frequency is sensitive to the inclusion of a number of controls. This is to be expected, due to the gene’s positive relationship with milk consumption in Europeans and nonexistent relationship with milk consumption in all other ethnic populations, which results in a large measurement error on the explanatory variable of interest and an attenuation of the coefficient.

  16. A phenotype is the physical expression of a genotype (Hartl and Clark 2006).

  17. Swedes and Danes belong to the East Scandinavian branch of the Indo-European language group, Filipinos and Maori belong to Malayo-Polynesian branch of the Austronesian language group, and the Fon and Yoruba belong to the Volta-Niger branch of the Niger-Congo language group.

  18. See the supplemental appendix for all estimations involving the reduced, conservative sample.

  19. The supplemental appendix explores an alternative method of establishing ethnic compositions in 1,500 CE.

  20. Another possible concern involves the monotonicity, or relative frequencies, of lactase persistence over the past 500 years. For this to bias estimation, one of two (or both) events must occur: Either highly populated countries obtained lactase persistence at a greater rate than less dense countries, or less populated countries have reduced their frequencies of lactase persistence relative to densely populated countries. Please see the supplemental appendix for a fuller discussion.

  21. The world lactase persistence frequency calculated by Ingram et al. (2009a), however, is based on a flawed population weighted average.

  22. Finland, Norway, and Sweden are also statistical outliers in the relationship between the frequency of lactase persistence and population density, as signified by a robust standardized residual larger than 2.25 (Verardi and Croux 2009). The coefficient of lactase persistence is unchanged in magnitude or significance when excluding the Scandinavian dummy. The Scandinavian dummy consists of markers for Denmark, Finland, Norway, and Sweden. While Finland is not culturally Scandinavian, it does exhibit the characteristics for which the indicator is intended to control, mainly high levels of lactase persistence and relatively low population density. The exclusion of Finland from the Scandinavia indicator, or the exclusion of the indicator itself, does not alter the estimated relationship between the country-level frequency of lactase persistence and population density in 1,500 CE.

  23. As is shown in Table 6, the country-level advantage of lactase persistence is a relatively recent phenomenon. Lactase persistence was most likely associated with pastoral societies and did not result in denser populations until the widespread use of sedentary agriculture.

  24. The suitability of land for pasture uses the contemporary suitability, implying that this measure may also be influenced by 1,500 CE population densities; therefore, as with the measures from the Ethnographic Atlas, the use of land suitable for pasture may not be controlling for historic differences in land for pasture but differences in population density in 1,500 CE.

  25. The main measure of ecological suitability for the tsetse fly is the average across three strains–Fusca, Morsitans, and Palpalis. The inclusion of each suitability measure, either separately or together, does not alter the coefficient of the frequency of lactase persistence.

  26. Distance from a technological frontier should also account for the potential effects of trade.

  27. A previous version of this paper also considered solar radiation as a potential instrument for the country-level frequency of lactase persistence.

  28. The individual inclusion of either measure does not alter the coefficient of lactase persistence. Additionally, the use of the complete base sample given in Table 2 does not alter the results.

  29. The percent within the tropics also accounts for tropical disease environments that may prevent the spread of cattle and populations.

  30. Both historic economic development and the fraction of the diet from animal husbandry are derived from George Murdock’s Ethnographic Atlas; therefore, each measure is based on an ethnicity (Murdock and White 1969). Using these data on ethnicities and the contemporary ethnic composition of countries (Lewis 2009), Alesina et al. (2013) create country-level measures for many of the variables within the Ethnographic Atlas. These country-level measures are based on contemporary populations and not population compositions in 1,500 CE; however, limited migration in the Old World likely results in roughly consistent ethnic compositions across time. For example, the correlation between the 1,500 CE measure of lactase persistence and the measure of lactase persistence using contemporary populations is roughly 1. A further concern with data from the Ethnographic Atlas is that the time of collection falls after the date of the primary dependent variable of interest, population density in 1,500 CE. As with the frequency of lactase persistence, variation in these measures may be driven by differences in historic population densities; therefore, the inclusion of the intensity of animal husbandry and economic complexity may not be exogenously accounting for differences in the specified variable.

  31. The land suitable for pasture, the intensity of plow agriculture, and the ecological suitability of the tsetse fly are also intended as controls for the additional benefits of domesticated animals.

  32. Biogeographic controls are excluded due to the sample truncation.

  33. The effect of lactase persistence remains positive, statistically significant, and consistent in magnitude when altering the sample across the differing variables of Table 5. The reduced consistent sample does not significantly alter the relationship between lactase persistence and precolonial population density.

  34. The frequency of lactase persistence is calculated with ethnic compositions for the year 1,500. I have little reason to suspect the measure is invalid as a proxy for previous periods.

  35. The p-value of the coefficient of the frequency of lactase persistence in column (4) is 0.058.

  36. Use of Maddison income estimates for the year 1,500 results in a sample size of 25. As a consequence of this reduced sample, the estimated relationship between lactase persistence and population density becomes insignificant. Urbanization from Nunn and Qian (2011) is the fraction of a country’s population living within a city of a population greater than 40,000.

  37. The three countries excluded from Table 7, which are missing data on urbanization, are Azerbaijan, Bahrain, and Georgia. The baseline relationship between lactase persistence and population density in 1,500 CE is unaffected by the exclusion of these countries.

  38. The negative relationship between lactase persistence and urbanization is unaffected by the exclusion of the dummy for Roman heritage.

  39. The contemporary controls include an ancestry adjusted measure of precolonial population densities, an ancestry adjusted measure for the millennia a country has practiced agriculture, ethnic fractionalization, land productivity (i.e., the first principal component of agricultural suitability and the percent of arable land), the suitability of land for plow-positive crops, plow-negative crops, and pasture, absolute latitude, and the average country-level distance to the coast or navigable river.


  • Acemoglu, D., Johnson, S., & Robinson, J. (2001). The colonial origins of comparative development: An empirical investigation. American Economic Review, 91(5), 1369–1401.

  • Acemoglu, D., Johnson, S., & Robinson, J. (2002). Reversal of fortune: Geography and institutions in the making of the modern world income distribution. Quarterly Journal of Economics, 117, 1231–1294.

  • Acemoglu, D., Johnson, S., & Robinson, J. (2005). The rise of Europe: Atlantic trade, institutional change and economic growth. American Economic Review, 95(3), 546–579.

  • Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S., & Wacziarg, R. (2003). Fractionalization. Journal of Economic Growth, 8(2), 155–194.

  • Alesina, A., Giuliano, P., & Nunn, N. (2013). On the origins of gender roles: Women and the plough. Quarterly Journal of Economics, 128(2), 469–530.

  • Alsan, M. (2012). The effect of the tsetse fly on African development. Working Paper.

  • Anderson, B., & Vullo, C. (1994). Did malaria select for primary adult lactase deficiency? Gut, 35(10), 1487–1489.

  • Ashraf, Q., & Galor, O. (2011). Dynamics and stagnation in the Malthusian epoch. American Economic Review, 101(5), 2003–2041.

  • Ashraf, Q., & Galor, O. (2013). The “Out of Africa” hypothesis, human genetic diversity, and comparative economic development. American Economic Review, 103(1), 1–46.

  • Bersaglieri, T., Sabeti, P., Patterson, N., Vanderploeg, T., Schaffner, S., Drake, J., et al. (2004). Genetic signatures of strong recent positive selection at the lactase gene. The American Journal of Human Genetics, 74, 1111–20.

  • Bockstette, V., Chanda, A., & Putterman, L. (2002). States and markets: The advantage of an early start. Journal of Economic Growth, 7, 347–369.

  • Burger, J., Kirchner, M., Bramanti, B., Haak, W., & Thomas, M. (2007). Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proceedings of the National Academy of Sciences, 104, 3736.

  • Cavalli-Sforza, L. L., Menozzi, P., & Piazza, A. (1994). The history and geography of human genes. Princeton, NJ: Princeton University Press.

  • Chanda, A., & Putterman, L. (2007). Early starts, reversals and catch-up in the process of economic development. Scandinavian Journal of Economics, 109(2), 387–413.

  • Clark, G. (2008). A farewell to alms: A brief economic history of the world. Princeton, NJ: Princeton University Press.

  • Coelho, M., Luiselli, D., Bertorelle, G., Lopes, A., Seixas, S., Destro-Bisol, G., et al. (2005). Microsatellite variation and evolution of human lactase persistence. Human Genetics, 117, 329–339.

  • Cohen, M., & Armelagos, G. (1984). Paleopathology at the origins of agriculture. New York, NY: Academic Press.

  • Comin, D., Easterly, W., & Gong, E. (2010). Was the wealth of nations determined in 1000 BC? American Economic Journal—Macroeconomics, 2(3), 65–97.

  • Cook, C.J. (forthcoming). Potatoes, milk, and the Old World population boom. Journal of Development Economics.

  • Cook, G., & Al-Torki, M. (1975). High intestinal lactase concentrations in adult Arabs in Saudi Arabia. British Medical Journal, 135–136.

  • Cooper, M., & Spillman, W. (1917). Human food from an acre of staple farm products. Washington, D.C.: US Department of Agriculture.

  • Copley, M., Berstan, R., Dudd, S., Docherty, G., Mukherjee, A., Straker, V., et al. (2003). Direct chemical evidence for widespread dairying in prehistoric Britain. Proceedings of the National Academy of Sciences USA, 100, 1524.

  • Craig, O., Taylor, G., Mulville, J., Collins, M., & Parker Pearson, M. (2005). The identification of prehistoric dairying activities in the Western Isles of Scotland: An integrated biomolecular approach. Journal of Archaeological Science, 32, 91–103.

  • Di Sabatino, A., & Corazza, G. (2009). Coeliac disease. Lancet, 373, 1480–1493.

  • Diamond, J. (1997). Guns, Germs, and Steel. New York, NY: W.W. Norton & Company.

  • Dunne, J., Evershed, R., Salque, M., Cramp, L., Bruni, S., Ryan, K., et al. (2012). First dairying in green Saharan Africa in the fifth millennium BC. Nature, 486, 390–394.

  • Engerman, S., & Sokoloff, K. (1997). Factor endowments, institutions, and differential paths of growth among New World economies: A view from economic historians of the United States. In Stephen Haber (Ed.), How Latin America fell behind (pp. 260–304). Stanford, CA: Stanford University Press.

  • Engerman, S., & Sokoloff, K. (2002). Factor endowments, inequality, and paths of development among New World economies. NBER Working Paper 9259. Cambridge, MA: National Bureau of Economic Research.

  • Evershed, R., Payne, S., Sherratt, A., Copley, M., Coolidge, J., Urem-Kotsu, D., et al. (2008). Earliest date for milk use in the Near East and southeastern Europe linked to cattle herding. Nature, 455, 528–531.

  • Fallang, L., Bergseng, E., Hotta, K., Berg-Larsen, A., Kim, C., & Sollid, L. (2010). Differences in the risk of celiac disease associated with HLA-DQ2.5 or HLA-DQ2.2 are related to sustained gluten antigen presentation. Nature Immunology, 10(10), 1096–1102.

  • Fasano, A., Berti, I., Gerarduzzi, T., Not, T., Colletti, R., & others (2003). Prevalence of celiac disease in at-risk and not-at-risk groups in the United States: a large multicenter study. Archives of Internal Medicine, 163(3), 286–292.

  • Galor, O., & Michalopoulos, S. (2012). Evolution and the growth process: Natural selection of entrepreneurial traits. Journal of Economic Theory, 147(2), 759–780.

  • Galor, O., & Moav, O. (2002). Natural selection and the origin of economic growth. Quarterly Journal of Economics, 117(4), 1133–1191.

  • Galor, O., & Moav, O. (2007). The Neolithic origins of contemporary variation in life expectancy. Brown University Department of Economics Working Paper 2007–14.

  • Gallup, J., Sachs, J., & Mellinger, A. (1999). Geography and economic development. CID Working Paper No. 1, March 1999.

  • Gonzalez-Galarza, F., Christmas, S., Middleton, D., & Jones, A. (2011). Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic Acids Research, 39(Database Issue), D913–D1009.

  • Hartl, D., & Clark, A. (2006). Principles of population genetics (4th ed.). Sunderland, MA: Sinauer.

  • Hibbs, D., & Olsson, O. (2004). Geography, biogeography, and why some countries are rich and others are poor. Proceedings of the National Academy of Sciences USA, 101, 3715–3720.

  • Heston, A., Summers, R., & Aten, B. (2012). Penn World Table Version 7.1. Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania.

  • Hoppe, C., Molgaard, C., & Michaelsen, K. (2006). Cow’s milk and linear growth in industrialized and developing countries. Annual Review of Nutrition, 26, 131–173.

  • Ingram, C., Mulcare, C., Itan, Y., Thomas, M., & Swallow, D. (2009a). Lactose digestion and the evolutionary genetics of lactase persistence. Human Genetics, 124(6), 579–591.

  • Ingram, C., Raga, T., Tarekegn, A., Browning, S., Elamin, M., Bekele, E., et al. (2009b). Multiple rare variants as a cause of a common phenotype: Several different lactase persistence associated alleles in a single ethnic group. Journal of Molecular Evolution, 69, 577–588.

  • Jain, A., Hsu, T., Freedman, R., & Chang, M. (1970). Demographic aspects of lactation and postpartum amenorrhea. Demography, 7, 255–271.

  • Kiszewski, A., Mellinger, A., Spielman, A., Malaney, P., Sachs, S., & Sachs, J. (2004). A global index representing the stability of malaria transmission. The American Journal of Tropical Medicine and Hygiene, 70(5), 486–498.

  • La Porta, R., Lopez-de-Silanes, F., Shleifer, A., & Vishny, R. (1999). The quality of government. Journal of Law, Economics and Organization, 15, 222–279.

  • Lewis, M. P. (2009). Ethnologue: Languages of the World (Sixteenth ed.). Dallas, TX: SIL International.

  • McEvedy, C., & Jones, R. (1976). Atlas of world population history. New York, NY: Facts on File.

  • Michalopoulos, S. (2012). The origins of ethno-linguistic diversity. American Economic Review, 102(4), 1508–1539.

  • Mulcare, C. (2006). The evolution of the lactase persistence phenotype. London: University of London.

  • Murdock, G., & White, D. (1969). Standard cross-cultural sample. Ethnology, 8(4), 329–369.

  • Nielsen, R., Williamson, S., Kim, Y., Hubisz, M., Clark, A., & Bustamante, C. (2005). Genomic scans for selective sweeps using SNP data. Genome Research, 15, 1566–1575.

  • Nordhaus, W. (2006). Geography and macroeconomics: New data and new findings. Proceedings of the National Academy of Sciences USA, 103(10), 3510–3517.

  • Nunn, N. (2008). The long-term effects of Africa’s slave trades. Quarterly Journal of Economics, 123(1), 139–176.

  • Nunn, N. (2009). The importance of history for economic development. Annual Review of Economics, 1(1), 65–92.

  • Nunn, N., & Puga, D. (2012). Ruggedness: The blessing of bad geography in Africa. Review of Economics and Statistics, 94(1), 20–36.

  • Nunn, N., & Qian, N. (2011). The potato’s contribution to population and urbanization: Evidence from a historical experiment. Quarterly Journal of Economics, 126(2), 593–650.

  • Plantinga, T., Alonso, S., Izagirre, N., et al. (2012). Low prevalence of lactase persistence in Neolithic South-West Europe. European Journal of Human Genetics, 20, 778–782.

  • Putterman, L. (2008). Agriculture, diffusion, and development: Ripple effects of the Neolithic Revolution. Economica, 75, 729–748.

  • Putterman, L., & Trainor, C. (2006). Agricultural Transition Year Country Data Set.

  • Putterman, L., & Weil, D. (2010). Post-1500 population flows and the long-run determinants of economic growth and inequality. Quarterly Journal of Economics, 125(4), 1627–82.

  • Ramankutty, N., Foley, J., Norman, J., & McSweeney, K. (2002). The global distribution of cultivable lands: Current patterns and sensitivity to possible climate change. Global Ecology and Biogeography, 11(5), 377–392.

  • Simoons, F. (1969). Primary adult lactose intolerance and the milking habit: A problem in biological and cultural interrelations. I. Review of the medical research. The American Journal of Digestive Diseases, 14, 819–836.

  • Simoons, F. (1970). Primary adult lactose intolerance and the milking habit: A problem in biological and cultural interrelations. II. A culture historical hypothesis. The American Journal of Digestive Diseases, 15, 695–710.

  • Simoons, F. (1978). The geographic hypothesis and lactose malabsorption: A weighing of the evidence. The American Journal of Digestive Diseases, 23, 963–80.

  • Spolaore, E., & Wacziarg, R. (2009). The diffusion of development. Quarterly Journal of Economics, 124(2), 469–529.

  • Tishkoff, S., Reed, F., Ranciaro, A., Voight, B., Babbitt, C., Silverman, J., Powell, K., Mortensen, H., Hirbo, J., Osman, M., et al. (2006). Convergent adaptation of human lactase persistence in Africa and Europe. Nature Genetics, 39, 31–40.

  • Verardi, V., & Croux, C. (2009). Robust regression in Stata. The Stata Journal, 9(3), 439–453.

  • Vuorisalo, T., Arjamaa, O., Vasemagi, A., & Taavitsainen, J. (2012). High lactose tolerance in North Europeans: A result of migration, not in situ milk consumption. Perspectives in Biology and Medicine, 55(2), 163–174.

  • Wint, W., & Rogers, D. (2000). Predicted distributions of tsetse in Africa. DFID working paper.

  • World Health Organization. (1998). The World Health Organization multinational study of breast-feeding and lactational amenorrhea. I. Description of infant feeding patterns and of the return of menses. Fertility and Sterility, 70(3), 448–460.

  • World Health Organization. (2009). Milk fluoridation for the prevention of dental caries. Geneva: WHO.


I owe thanks to Areendam Chanda for thoughtful discussion and direction; to Quamrul Ashraf, James Feyrer, Oded Galor, David Weil, participants at the 2013 Deep Determinants of International Comparative Development Conference at Brown University, participants at the 2011 Integrating Genetics and the Social Sciences Conference at the University of Colorado, participants at the LSU 3rd year paper presentation, and three anonymous referees for helpful comments and suggestions; and to Stelios Michalopoulos for sharing data on agricultural suitability and Anastasia Litina for suggesting the use of pastoral suitability. All errors and omissions are my own.

Author information

Authors and Affiliations


Corresponding author

Correspondence to C. Justin Cook.

Additional information

The majority of this work was completed as part of my dissertation at Louisiana State University.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 145 KB)

Supplementary material 2 (zip 116 KB)


Appendix 1: Frequency of lactase persistence for 1,500 CE

See Tables 9 and 10.

Table 9 Ethnic lactase persistence frequencies
Table 10 Country level lactase persistence frequencies

Appendix 2: Variable definitions and sources (alphabetical order)

Absolute latitude

The absolute value of a country’s representative latitude. Representative latitude is given by the centroid latitude of a country from The World Factbook (2011).

Average of tsetse suitability index

An average measure of the ecological suitability of three species of tsetse fly: Fusca, Morsitans, and Palpalis. Data are from Wint and Rogers (2000).

Distance from coast or navigable river

The average distance in thousands of kilometers from an ice-free coast or navigable river. This variable is from the Center for International Development, which is derived from Gallup et al. (1999).

Distance to technology frontier

Technology frontiers are the two largest cities within each continent belonging to differing polities. For the Old World, the frontiers are London (UK), Paris (France), Cairo (Egypt), Fez (Morocco), Constantinople (Turkey), and Peking (China). Country-level distances, in thousands of kilometers, are calculated by the distance from a country’s modern capital to the closest frontier. These data are from Ashraf and Galor (2011).
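The distance calculation described above can be sketched as follows. This is a minimal illustration that assumes great-circle (haversine) distance, which the definition does not specify; the city coordinates are approximate values supplied here for illustration, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate (latitude, longitude) of the Old World frontiers; illustrative values.
FRONTIERS = {
    "London": (51.51, -0.13),
    "Paris": (48.86, 2.35),
    "Cairo": (30.04, 31.24),
    "Fez": (34.03, -5.00),
    "Constantinople": (41.01, 28.98),
    "Peking": (39.90, 116.41),
}


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_frontier(capital):
    """Return (closest frontier, distance in thousands of km) for a capital's (lat, lon)."""
    lat, lon = capital
    name, km = min(
        ((city, great_circle_km(lat, lon, *coords)) for city, coords in FRONTIERS.items()),
        key=lambda pair: pair[1],
    )
    return name, km / 1000.0


# Example with an approximate location for Madrid.
frontier, dist = distance_to_frontier((40.42, -3.70))
print(frontier, round(dist, 2))
```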

Fraction within boreal

The fraction of a country within a Köppen–Geiger boreal climate. These data come from Gallup et al. (1999).

Fraction within desert

The fraction of a country with sandy desert, dunes, rocky or lava flows. These data come from Nunn and Puga (2012).

Fraction within dry Temperate

The fraction of a country within a Köppen–Geiger dry temperate climate. These data come from Gallup et al. (1999).

Fraction within Polar

The fraction of a country within a Köppen–Geiger polar climate. These data come from Gallup et al. (1999).

Fraction within Tropics

The fraction of a country with a Köppen–Geiger tropical climate. These data come from Nunn and Puga (2012).

Fraction within Subtropics

The fraction of a country with a Köppen–Geiger subtropical climate. These data come from Gallup et al. (1999).

Fraction within wet Temperate

The fraction of a country within a Köppen–Geiger wet temperate climate. These data come from Gallup et al. (1999).

The frequency of DQ 2.5

This variable is intended to be a proxy for the frequency of Celiac Disease. The data represent the fraction of a country’s population containing the HLA-DQ 2.5 haplotype, or gene variant combination. The DQ 2.5 haplotype is the dual occurrence of DQA1*0501 and DQB1*0201 genes. These data are given at the ethnic level, which are then matched to ethnic groups given by Ingram et al. (2009a) and aggregated to the country level by 1,500 CE ethnic compositions. Data at the ethnic level can be found in Gonzalez-Galarza et al. (2011).

The frequency of lactase persistence in 1,500 CE

Lactase persistence frequencies for Old World ethnicities are given by Ingram et al. (2009a). Ethnic data are then aggregated to the country level by matching ethnic groups from Ingram et al. (2009a) to compositions in Alesina et al. (2003) by language group similarities. This gives a contemporary, country-level measure for the frequency of lactase persistence. Contemporary ethnic compositions are modified by the inverse of the Putterman and Weil Migration Matrix (2010) to create representative ethnic compositions for the year 1,500 CE.
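The aggregation step can be illustrated with a population-weighted average. This is only a sketch under simplifying assumptions: the group labels, shares, and frequencies below are hypothetical, and the adjustment of contemporary compositions to 1,500 CE with the migration matrix is not shown.

```python
def country_frequency(shares, ethnic_freq):
    """Weighted average of ethnic lactase persistence frequencies.

    shares: mapping from ethnic group to its share of the country's population.
    ethnic_freq: mapping from ethnic group to its lactase persistence frequency.
    Shares are renormalized so that they need not sum exactly to one.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("population shares must sum to a positive number")
    return sum(share * ethnic_freq[group] for group, share in shares.items()) / total


# Hypothetical country with two ethnic groups.
shares_1500 = {"group_a": 0.75, "group_b": 0.25}
frequencies = {"group_a": 0.80, "group_b": 0.20}
print(country_frequency(shares_1500, frequencies))
```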

GDP per capita in 2,000

PPP converted GDP per capita in constant 2005 prices. Data come from the Penn World Table, version 7.1 (Heston et al. 2012).

Genetic distance from the U.K.

Genetic distance is a measure of genetic diversity between societies. This measure is calculated with the fixation index, or \(F_{ST}\), from population genetics and measures the variation in gene frequencies across differing groups. \(F_{ST}\) scores are given for 42 indigenous populations; the data come from Cavalli-Sforza et al. (1994). The genetic distance measures are then aggregated to the country level by Spolaore and Wacziarg (2009), from which genetic distance to the UK is found for 206 countries. The UK is chosen as the technology frontier in 1,500 CE. Genetic distance from this frontier is intended to convey difficulty in the diffusion of technology.
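As a point of reference, one common textbook form of the fixation index compares heterozygosity in the pooled population with the average heterozygosity within the groups. The symbols \(H_T\) and \(H_S\) are introduced here for illustration, and the estimator behind the Cavalli-Sforza et al. (1994) scores may differ in detail:

\[ F_{ST} = \frac{H_T - H_S}{H_T} \]

Here \(H_T\) is the expected heterozygosity of the combined population and \(H_S\) is the mean expected heterozygosity within the groups, so \(F_{ST} = 0\) when the groups share identical gene frequencies and rises toward 1 as they diverge.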

Genetic diversity

Genetic diversity is the predicted country-level heterozygosity based on migratory distance from East Africa. These data represent the probability that two randomly selected individuals contain different gene variants at the same locus. The data are from Ashraf and Galor (2013).
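The probability described above has a simple closed form at a single locus. The following is a minimal sketch of expected heterozygosity computed from allele frequencies; it illustrates the definition only, not the predicted country-level values based on migratory distance, and the example frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two randomly drawn gene copies carry different variants.

    allele_freqs: frequencies of each variant at one locus; they must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Subtract the probability that both draws are the same variant.
    return 1.0 - sum(p * p for p in allele_freqs)


print(expected_heterozygosity([0.5, 0.5]))       # two equally common variants
print(expected_heterozygosity([1.0]))            # a single fixed variant
print(expected_heterozygosity([0.7, 0.2, 0.1]))  # three variants, hypothetical
```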

Historic intensity of animal husbandry

A measure from 0 to 1 % that measures an ethnicity’s historic dependency on animal husbandry. The measure is aggregated to the country-level by the contemporary ethnic composition of a country found within the Ethnologue (Lewis 2009). These data are from Alesina et al. (2013).

Historic economic development

An aggregated country-level index from 1 to 8 that represents historic economic development–nomadic or fully migratory, semi-nomadic, semi-sedentary, compact but temporary settlements, neighborhoods of dispersed family homes, separated hamlets forming a single community, compact and relatively permanent, complex settlements–for ethnicities within modern country borders. These data are from Alesina et al. (2013).

Land productivity

Land productivity is the first principal component of a country’s fraction of arable land and the country’s suitability of agriculture. The fraction of arable land comes from the World Development Indicators. Suitability of agriculture is an index capturing soil and climate conditions favorable for agriculture. Suitability data are from Ramankutty et al. (2002) and aggregated to the country level by Michalopoulos (2012). Land productivity data are adopted from Ashraf and Galor (2011).
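For two variables, the first principal component can be written down directly. The sketch below assumes both inputs are standardized first (that is, the component is taken from the correlation matrix), which the definition does not state; the input values are hypothetical. With two standardized variables the leading eigenvector has equal loadings of 1/√2, with the sign of the second loading following the sign of the correlation.

```python
import math
import statistics


def standardize(values):
    """Rescale to mean zero and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def first_principal_component(x, y):
    """Scores on the first principal component of two standardized variables."""
    zx, zy = standardize(x), standardize(y)
    # Correlation between the two variables.
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
    sign = 1.0 if r >= 0 else -1.0
    return [(a + sign * b) / math.sqrt(2) for a, b in zip(zx, zy)]


# Hypothetical arable-land fractions and suitability indices for four countries.
arable = [0.1, 0.2, 0.3, 0.4]
suitability = [0.2, 0.4, 0.6, 0.8]
scores = first_principal_component(arable, suitability)
print([round(s, 3) for s in scores])
```

The variance of the scores equals one plus the absolute correlation, so in this perfectly correlated example the component captures all of the variation in the two inputs.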

Malaria ecology index

The malaria ecology index takes into account differences in the environment and mosquito vectors that contribute to the spread of malaria. These data come from Kiszewski et al. (2004).

Mean elevation

The average elevation in kilometers above sea-level. Data are from Nordhaus (2006) by way of Ashraf and Galor (2013).

Mean precipitation

The average yearly precipitation in millimeters within a country between 1960 and 1990. Data are from Nordhaus (2006) by way of Ashraf and Galor (2013).

Mean temperature

The average yearly temperature in Celsius within a country between 1960 and 1990. Data are from Nordhaus (2006) by way of Ashraf and Galor (2013).

Member of the Roman Empire

An indicator variable coded to one for countries with Roman heritage, that is, countries that were part of the Roman Empire but did not belong to the Ottoman Empire. These include Belgium, Britain, France, Italy, the Netherlands, Portugal, Spain, and Switzerland. These data are from Acemoglu et al. (2005).

Millennia of agriculture

The millennia since the majority of a country’s population adopted agriculture for subsistence. These data are from Putterman and Trainor (2006).

Number of potential domesticate animals

The number of prehistoric, native animals that were a potential source of domestication within a country. These data are from Hibbs and Olsson (2004).

Number of potential domesticate plants

The number of prehistoric, native plants that were a potential source of domestication within a country. These data are from Hibbs and Olsson (2004).

Population density in 1, 1,000, and 1,500 CE

Population data for 1, 1,000, and 1,500 CE come from McEvedy and Jones (1976). Land area for each country is based on contemporary borders and is from the World Development Indicators. These data are adopted from Ashraf and Galor (2011).


Ruggedness

Ruggedness represents the average standard deviation of grid elevation within a country. These data come from Nunn and Puga (2012).

State history in 1,500

This variable measures differing levels of societal formation in 50 year intervals from 1 CE to 1,500 CE. The data are then aggregated to form an index for state history. These data come from Chanda and Putterman (2007).

Suitability of land for pasture

A suitability index, ranging from 0 to 1, for pasture within 5 arc-minute by 5 arc-minute grids, which is then averaged within modern country borders. This index takes into account climate, soil, and terrain conditions necessary in developing grasslands. The raster data can be found at

Suitability of land for plow-negative crops

The agricultural suitability for sorghum, maize, millet, roots, tubers, and tree crops. Data are from Alesina et al. (2013).

Suitability of land for plow-positive crops

The agricultural suitability for wheat, teff, barley, and rye. Data are from Alesina et al. (2013).

Cite this article

Cook, C.J. The role of lactase persistence in precolonial development. J Econ Growth 19, 369–406 (2014).


Keywords

  • Historical development
  • Genetic diversity
  • Neolithic Revolution
  • Population density

JEL Classification

  • O13
  • N5
  • Z13