The origins of cultural divergence: evidence from Vietnam

Cultural norms diverge substantially across societies, often within the same country. We propose and investigate a self-domestication/selective migration hypothesis, which holds that cultural differences along the individualism–collectivism dimension are driven by the out-migration of individualistic people from collectivist core regions of states to peripheral frontier areas, and that such patterns of historical migration are reflected even in the current distribution of cultural norms. Gaining independence in 939 CE after about a thousand years of Chinese colonization, historical Vietnam emerged in the region that is now north Vietnam with a collectivist social organization. From the eleventh to the eighteenth centuries, historical Vietnam gradually expanded its territory southward to the Mekong River Delta through repeated waves of conquest and migration. Using a nationwide household survey, a population census, and a lab-in-the-field experiment, we demonstrate that areas annexed earlier to historical Vietnam are currently more prone to collectivist norms, and that these cultural norms are embodied in individual beliefs. Relying on many historical accounts, together with various robustness checks, we argue that the southward out-migration of individualistic people during the eight centuries of territorial expansion is an important driver, among many others, of these cultural differences.


Introduction
Economic research has uncovered strong associations between many cultural traits and various indicators of individual behavior, and institutional and economic development (e.g., Guiso et al., 2011; Fernández, 2011; Algan & Pierre, 2014; Doepke & Fabrizio, 2014; Alesina & Giuliano, 2014). Among the cultural traits, the individualism-collectivism dimension has been found to be a powerful predictor of economic and democratic development in a large sample of countries (Gorodnichenko & Roland, 2011, 2017). These empirical findings lead us to an important question: Why are some societies more or less collectivistic or individualistic than others?
In the present paper, we hypothesize that cultural differences along the individualism-collectivism dimension across modern societies can be traced back to repeated processes of territorial expansion and migration that happened in historical times. In particular, we advance a selective migration hypothesis, consisting of three building blocks. First, in regions where settled agriculture and states arose early, collectivist societies emerged through a process of self-domestication as communities made the transition from hunter-gatherer strategies of food procurement, which were characterized by individualism, to agricultural food production, resulting in a gradual strengthening of "civilizing" collectivism. Second, these collectivist societies triggered the out-migration of individualistic members toward peripheral areas. This pattern then repeated itself as the individualistic migrants inhabited and developed these peripheral areas into less collectivistic societies compared to the ones they left behind, which in turn induced more individualistic members to migrate toward more peripheral areas. Eventually, these migration processes gave rise to cultural differences along the individualism-collectivism dimension across societies. Third, owing to the slow-moving nature of culture, these differences have persisted over time and constitute an important feature of the cultural landscapes exhibited in modern times. As a result, the time elapsed since the collectivist transformation can predict the strength of collectivism across modern societies.
Testing the selective migration hypothesis requires a historical setting in which a large number of people migrated out of a collectivist society in the core region to settle in new regions with a biogeography similar to that of the core region, and in which this migration repeated itself once collectivism had gradually strengthened in the new regions. We find such an ideal setting in the process of territorial expansion and migration in historical Vietnam. Gaining independence from the colonization of imperial China during the first millennium, historical Vietnam initially governed the region that is now north Vietnam with a centralized government and a collectivist social organization. At the same time, the territory to the south of historical Vietnam was sparsely populated by many ethnic tribes that did not have a centralized government. From 968 to 1757, historical Vietnam gradually expanded its territory southward to the Mekong River Delta to establish the country as it is today (see Fig. 1). This process happened through successive waves of state conquest followed by civil migration, resulting in the displacement of most of the population of local ethnic tribes. Applying the logic of the selective migration hypothesis, we argue that the time elapsed since annexation to historical Vietnam is an important predictor of the strength of collectivism across regions within contemporary Vietnam.
To test the selective migration hypothesis, an ideal empirical strategy would consist of three integral parts. The first part should demonstrate that some early agricultural states were characterized by collectivism, and that people who migrated to the new territories following state expansions were less collectivistic or more individualistic than those who stayed. The second requires historical data to prove that these selective migrations gave rise to early cultural differences along the individualism-collectivism dimension between the initial regions and the new territories. Finally, the third part involves using present-day data to conduct an empirical analysis on the relationship between the time elapsed since the collectivist transformation and the strength of collectivism.
To match this ideal empirical strategy in the context of Vietnam, we first present qualitative accounts to demonstrate that the initial society of historical Vietnam was characterized by strong collectivist norms. Second, we examine available primary records on the territorial expansion of historical Vietnam to identify the categories of people who migrated to the newly annexed territories. In addition, we provide qualitative accounts and descriptive statistics to show that cultural differences along the individualism-collectivism dimension across regions were already present in Vietnam in the seventeenth century. Third, we provide empirical evidence for a positive relationship between the time elapsed since an area was annexed to historical Vietnam and various indicators of collectivism in the present day. Using different robustness checks, we further show that these empirical findings are consistent with the self-domestication/selective migration hypothesis.
To capture the strength of collectivism, we focus on the societal ability to solve collective action problems, which is the main feature of collectivism studied in related economic models (Gorodnichenko & Roland, 2015, 2017). What constitutes a collective action, of course, varies significantly across societies. In Vietnam, labor contribution to public goods production is a typical collective action (Adams & Hancock, 1970). In particular, every year households in a local area send their members to work without payment to build or repair local public infrastructure such as roads, wells, irrigation, schools, and health clinics. Because collectivist societies are considered to be better at solving collective action problems, they should be able to mobilize a larger amount of voluntary labor contribution to public goods production from their in-group members.
Using data on voluntary labor contribution to public goods production from around 30,000 households in the Vietnam Household Living Standard Survey, we aggregate three related indicators at the district level: (1) the percentage of households contributing labor, (2) the average number of persons per household making labor contributions, and (3) the average number of labor days contributed per household. We find that districts annexed earlier to historical Vietnam currently have higher percentages of households contributing labor, more members per household making labor contributions, and more labor days contributed per household. The estimated effects are economically and statistically significant, and robust to the inclusion of potential confounding factors, various sub-samples, and omitted-variable bias, among other checks.
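The district-level aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the record fields (`district`, `persons_contributing`, `labor_days`) and the toy values are hypothetical, a household is assumed to count as contributing when it reports at least one contributing member, and the two averages are assumed to be taken over all surveyed households in a district.

```python
from collections import defaultdict

# Hypothetical household records; field names and values are illustrative only
# and do not reproduce the survey's actual variables.
households = [
    {"district": "A", "persons_contributing": 2, "labor_days": 6.0},
    {"district": "A", "persons_contributing": 0, "labor_days": 0.0},
    {"district": "B", "persons_contributing": 1, "labor_days": 2.0},
    {"district": "B", "persons_contributing": 1, "labor_days": 4.0},
]


def aggregate_by_district(records):
    """Aggregate household records into three district-level indicators."""
    # Group household records by district identifier.
    grouped = defaultdict(list)
    for record in records:
        grouped[record["district"]].append(record)

    indicators = {}
    for district, rows in grouped.items():
        n = len(rows)
        # Assumption: a household contributes if at least one member does.
        contributing = sum(1 for r in rows if r["persons_contributing"] > 0)
        indicators[district] = {
            # (1) percentage of households contributing labor
            "pct_households_contributing": 100.0 * contributing / n,
            # (2) average number of contributing persons per household
            "avg_persons_per_household": sum(r["persons_contributing"] for r in rows) / n,
            # (3) average number of labor days contributed per household
            "avg_labor_days_per_household": sum(r["labor_days"] for r in rows) / n,
        }
    return indicators


indicators = aggregate_by_district(households)
```

Each district then carries one value per indicator, which could be related to the district's year of annexation in a regression framework.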
In addition, we use data from the Population and Housing Census (covering in total 16.5 million households nationwide) to construct two measures of individualism-collectivism traits that have frequently been used in the literature as proxies for such cultural norms (Vandello & Cohen, 1999; Talhelm et al., 2014): extended family structure and marriage stability. In line with our hypothesis, we find that districts that were annexed earlier have a higher percentage of households with grandchildren living in them and a lower prevalence of divorced households.
Fig. 1 The Vietnamese Southern advance. Note: The year at which a district was annexed into historical Vietnam. See Online Appendix A for data sources.
To examine in-group cooperation in more detail, we conduct a lab-in-the-field public goods experiment with high school students from the same districts, including an earlier-annexed district and a later-annexed district. This is a subject pool that is old enough to be aware of the cooperative norms in their communities, but not yet affected by living outside their communities. The advantage of the experiment is that the institutional setting can be kept constant, which helps rule out the influence of informal institutions on cooperative behavior. More importantly, the experimental design allows us to examine whether the difference in the contribution to public goods between the two chosen districts is driven by a difference in preferences for cooperation or a difference in beliefs about the cooperative behavior of others. We find that subjects from the earlier-annexed district contribute substantially more in the public goods experiments than subjects from the later-annexed district, and that the result is mainly driven by beliefs about the contribution levels of the other subjects. Thus, the experimental findings corroborate the survey data analysis and further suggest that cultural differences across Vietnamese regions are embodied in individual beliefs.
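To make the structure of a public goods experiment concrete, the sketch below computes payoffs in a textbook linear public goods game (the voluntary contribution mechanism). It is an assumption that the experiment has this general form; the endowment, group size, and marginal per-capita return (`mpcr`) are hypothetical values chosen for illustration, not the parameters of the study.

```python
def public_goods_payoffs(contributions, endowment=20.0, mpcr=0.4):
    """Payoffs in a linear public goods game (illustrative parameters).

    Each subject keeps what they do not contribute and receives
    mpcr times the total amount contributed by the whole group.
    """
    if any(c < 0 or c > endowment for c in contributions):
        raise ValueError("each contribution must lie between 0 and the endowment")
    total = sum(contributions)
    return [endowment - c + mpcr * total for c in contributions]


# Hypothetical group of four subjects with different contribution levels.
payoffs = public_goods_payoffs([20, 10, 0, 10])
```

With `mpcr` below 1 and `mpcr` times the group size above 1, contributing nothing maximizes an individual's own payoff while full contribution maximizes the group's total payoff. This tension is why contributions are commonly read as a measure of cooperation, and why a subject's contribution can depend on what they believe others will contribute.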
Our research relates to a growing multidisciplinary literature examining the origins of cultural differences along the individualism-collectivism dimension. Theories based on ecological context posit that some forms of production in subsistence economies (e.g., farming) require more functional interdependence than others (e.g., hunting), which gave rise to collectivism as an adaptation mechanism (e.g., Vandello & Cohen, 1999; Talhelm et al., 2014). In a recent paper, Buggle (2020) documents that societies where irrigation agriculture was practiced tend to have stronger collectivist norms (and a lower degree of innovative activities) even today. In related research, Bentzen et al. (2017) show that historical irrigation is also associated with autocratic governance. Litina (2016) argues further that a lower level of natural land productivity increased the return to public agricultural infrastructure, which generated higher incentives for cooperation to solve the problem of collective action. Motivated by the history of settlement in the United States and its highly individualist culture, Kitayama et al. (2006, 2009) proposed the voluntary settlement hypothesis, asserting that settlers in frontier areas are likely to have highly autonomous, independent, and goal-oriented mindsets. Bazzi et al. (2020) expand on this theme and study the cultural legacy of the nineteenth century westward expansion in the United States. The authors show that contemporary individualism is stronger in historical frontier areas and that a selective migration of individualists to the periphery explains part of this pattern, along with the particular characteristics of wilderness and isolation in the west. Knudsen (2019) finds a similar pattern of selective migration among Scandinavian migrants to the United States in the nineteenth century, using uncommonness of first names as a proxy for the degree of individualism. She also documents that the out-migration of strong individualists made the home regions more collectivist in the long run. Giavazzi et al. (2019) study the evolution of preferences among European immigrants to the United States and find that the persistence in cultural attitudes is substantial across the spectrum of values. Using a similar line of argument as in the current paper, Olsson and Paik (2016) show that collectivism is stronger in regions across western Eurasia that adopted agriculture earlier during the Neolithic Revolution.
The present paper builds on and adds to this literature in various ways. To the best of our knowledge, no studies on the origins of cultural differences along the individualism-collectivism dimension have examined the societal ability to solve the problem of collective action, especially using a combination of survey and experimental data. Furthermore, most studies so far have either employed cross-country comparisons or concentrated on currently developed societies. Because these societies have gone through the modernization process to a greater extent, the reduction of the traditional cultural landscapes makes it harder to study the historical origins of cultural differences. By comparing different regions within a single and biogeographically homogenous developing country that experienced a relatively recent economic modernization, our research is able to overcome these limitations.
In a related recent study, Dell et al. (2018) use a border regression discontinuity design along a border segment in southern Vietnam which they claim was a stable demarcation between historical Vietnam and tributary polities to the Khmer Empire from 1698 to the 1830s. The main hypothesis is that the presence of a centralized historical state should crowd in local collective action, which in turn was beneficial for subsequent economic development. The authors show that living standards are currently higher in the border areas governed for a longer period of time by the centralized states of historical Vietnam. As one potential mechanism for this result, Dell et al. (2018) explore whether historical institutions contributed to a greater ability of collective action, measured by participation in civil society organizations.
Our research differs from Dell et al. (2018) in the following ways. First, our emphasis is on a mechanism of selective migration rather than on crowding in of norms by a powerful state. In the sections below, we argue that our mechanism has strong support in the historical literature on Vietnam, as well as reflecting a general pattern throughout Southeast Asia (Scott, 2009). It should be recognized, though, that the two hypotheses are strongly linked and that the true historical process probably had elements of both crowding in and selective migration. Second, our main outcome variable is cultural norms of collectivism-individualism rather than indicators for economic development. We would argue our research is complementary since it investigates a different cultural dimension for understanding long-run economic development. Third, rather than using a border regression discontinuity design as in Dell et al. (2018), our main empirical strategy is to exploit a countrywide sample of districts across all of Vietnam. Our basic rationale for this strategy is that our coding of the official chronicles of historical Vietnam suggests a more or less continuous process of state expansion that was completed in 1757.
The remainder of the present paper is organized as follows. The next section discusses in detail the conceptual framework behind the selective migration hypothesis. Section 3 provides the historical background of the southward territorial expansion of historical Vietnam and the accompanying migration process, with a focus on the three building blocks of the selective migration hypothesis. Section 4 presents the empirical analysis with survey data. Section 5 describes the sample selection, experimental design, and corresponding results. Section 6 closes the paper with some concluding remarks.

Conceptual framework
In this section, we first define the individualism-collectivism dimension in the cultural repertoire of a population. We then outline a theory of selective migration and cultural divergence along the individualism-collectivism dimension. This theory is the backbone of the selective migration hypothesis.

Individualism versus collectivism
Research on the individualism-collectivism dimension of culture was first initiated within social psychology. Many of the key insights were summarized by Triandis (1995) and in subsequent cross-country empirical research by Hofstede (2001). In this voluminous literature, collectivism is considered to be characterized by a strong focus on the goals of a collective that forms the in-group boundary. In other words, the goals of the individual are subordinate to the goals of the collective, and the individual willingly makes costly sacrifices for the group. The individual typically has low self-expression and self-esteem, and an interdependent sense of agency. Family, duty, honor, and respect for the elders are central for collectivists. On the macro level, collectivist societies are typically characterized by a highly stratified or autocratic leadership (sometimes referred to as vertical collectivism) and hostility towards out-groups.
Individualism is the opposite of collectivism on all the features mentioned above. There is a strong focus on the goals of the individual, and the in-group identity is weak. The goals of the individual are superior to the goals of the collective, and the individual is typically unwilling to make costly contributions to the group at the expense of himself or herself. The individual has a strong sense of personal agency, high self-expression and self-esteem. The extended family does not play a central role, and individual preferences and fulfillment are more important than duty and honor. Individualists tend to live in egalitarian societies, are not very loyal to their fellow in-group members, and are willing to cooperate with outgroup members (Triandis, 1995).
How do these cultural norms translate into economic behaviors? This issue has recently been studied in a series of papers by Gorodnichenko and Roland (2011, 2017). The authors outline a hypothesis and demonstrate empirically that societies characterized by individualism are less bound by rules and authority, reward personal achievement, and hence tend to be associated with fewer constraints and stronger incentives for innovation. Analogously, the strong norms towards in-group cooperation, combined with subordination of the self to the goals of the collective, give collectivists a comparative advantage for collective action and public goods production that under certain circumstances might be necessary for the in-group to survive. Individualistic societies are thus more loosely held together but are on the other hand more dynamic, whereas collectivist societies have tight social ties and effective cooperation but limited growth potential in the longer run.

Selective migration and cultural divergence
To develop the self-domestication/selective migration hypothesis, we build on Triandis (1995) and Olsson and Paik (2016), and assign a crucial role to the rise and consolidation of the early agricultural states. Before the first transition to agriculture about 12,000 years ago, all societies relied on hunting-gathering-fishing where the household of core family members was often the main unit of social organization. Households only stayed in larger camps during shorter periods but then splintered in order to avoid crowding and social tensions. In some environments, hunting required greater coordination, which sometimes led to larger and semi-sedentary social groups, but whenever possible, the basic tendency in pre-agricultural societies was autonomous households without stronger bonds or obligations to other in-group members (Johnson & Earle, 2000).
The first agricultural societies emerged in regions such as Southwest Asia and China. In these regions, a highly productive irrigation agriculture gave rise to a dense and sedentary population, living in crowded villages and depending on the cultivation of a few domesticated crops and animals. Compared to hunter-gatherer households in pre-agricultural societies, these early farming villages were characterized by a much greater degree of collectivism, where the goals of the collective were far more important than individual aspirations. The survival of such villages often required sophisticated public goods such as irrigation canals, protective walls, military defense, public granaries, and deep wells. Such projects were initiated, coordinated and supervised by a social elite that managed to control the great majority of the population.
By the 4th millennium BCE, the first states arose from such complex farming communities in Mesopotamia, and soon after that, also in Egypt (Borcan et al., 2021; Scott, 2017). As recently studied by Mayshar et al. (2017, 2019), the ability of the elite to economically exploit the great majority of farmers depended to a great extent on the transparency and appropriability of agricultural production. Mayshar et al. (2019) argue that crops such as wheat and rice were more appropriable for taxation purposes than tubers like potatoes. In a related paper, Mayshar et al. (2017) contend that the greater reliability of the Nile floods in comparison to those of the Euphrates and Tigris explains why Egypt had a more centralized and more durable state organization during antiquity than the various state formations that repeatedly emerged and collapsed in Mesopotamia. Comparing the attitudes of people from Chinese areas dominated by highly labor- and coordination-intensive rice production with areas dominated by wheat cultivation, Talhelm et al. (2014) find that collectivist norms are stronger in rice areas. Thus, even among communities where a sedentary population practiced a cultivation of domesticated plants, there were great differences in the strength of individualism-collectivism depending on the specific character of the agricultural production process.
The domestication of plants and animals some 12,000 years ago not only fundamentally changed human subsistence production, it also gradually domesticated humans themselves (Johnson & Earle, 2000; Scott, 2017). The term domestication denotes a long-term relationship where one species strongly influences and controls the reproduction and care of another species so that the domesticated species loses its capacity to survive in its original habitat. The most often discussed of such relationships is of course modern man's domestication of plants and animals during the Neolithic.
By self-domestication, we mean the process whereby a new kind of subsistence food production and its associated social organization induced humans to adopt a more collaborative and docile social behavior, which in turn implied a loss of capacity to survive on their own. One might for instance argue that the control of fire and the universally adopted invention of cooking, several hundred thousand years ago, implied a first self-domestication of hominins. The Neolithic transition to agriculture was a second self-domestication event. In the permanent early farming villages, food production was more functionally interdependent across households and people had to live together in much larger units than before and without the option of fissioning and setting up camp elsewhere as in most hunting and gathering communities. The natural inclinations towards family-level social units were overcome through a strong selective pressure favoring individuals and groups who could more successfully adapt to the new lifestyle in the farming villages, with a higher pathogen load, more toil and work hours in the fields, a new diet with more carbohydrates and less protein, and more children per woman.
In addition to these changes, we argue that there must have been a very strong pressure towards the adoption of collectivist norms. It is well known that social stratification expanded with agriculture and the goals of the individual were suppressed for the benefit of the common good, involving larger and larger collective action projects such as irrigation, city walls, the construction of cult centers, and even massive burial complexes for divine rulers (Diamond, 1997). This development was not only evident in areas with irrigation agriculture. The fact that people became sedentary also in rain-fed areas implied the need for much stronger investments in permanent physical structures, property rights claims tied to a clan or lineage, and a stronger reliance on protection from a strongman or a proto-state. This kind of social organization would not have been possible without a great increase in the proliferation of collectivist norms. In the official narrative of this process, chroniclers of the early states would typically describe it as the introduction of civilization to an environment populated by previously primitive or barbaric tribes. We argue that the civilizing self-domestication process discussed above first emerged in regions with favorable conditions for agriculture, but it then repeated itself all around the world when farming replaced hunting-gathering and states arose from the dense farming communities. This self-domestication process probably included several related mechanisms. Evolution provided a selective advantage for individuals with genes that helped them cope with the physiological, psychological and cultural challenges of intensive farming. In addition, there were presumably push factors such as a conscious "weeding out" of individualists who did not adapt to the new collectivist norms. Social exclusion or ostracism might be one mechanism whereby individualists were pushed from the collectivist core to peripheral areas.
There were surely also pull factors, inducing individualists to leave the collectivist core voluntarily in order to have a freer life at the periphery. In his description of the history of state formation in Southeast Asia, Scott (2009) describes the very conscious escape of population groups from the rice-growing core areas of the different states as "an art of not being governed". For these people, a withdrawal to peripheral areas was a key feature of their strategy of state evasion. Ho (2020) further shows that the Vietnamese state often tempted landless migrants with the prospect of obtaining private property rights in the new lands, rights that were sometimes retracted after a few generations when population density had increased.
A similar "pioneer spirit" was emphasized in Turner's (1920) work on the westward expansion of settlers in North America, and more recently studied by Bazzi et al. (2020). It is also similar to Kitayama et al.'s (2006, 2009) notion of voluntary settlement of peripheral areas by individualistic people. As discussed by Bazzi et al. (2020), the adaptation to the living conditions in the "rugged frontier", where a strong sense of individual agency most likely was necessary for survival, surely also contributed to a greater degree of individualism even among people with collectivistic inclinations. As shown by Knudsen (2019), the out-migration of individualists probably contributed to making the culture in the core even more collectivist than before.
Typically, the peripheries to the original agricultural core region were sooner or later colonized by a collectivist farmer-state through territorial expansion. Evolutionary adaptation, push and pull forces then played out in a similar manner, making the peripheral population more collectivistic as well. The exact nature of these adaptations would depend importantly on the biogeographical characteristics of the settled peripheral areas, which in turn would determine the specific technology of agricultural production. But as described by Olsson and Paik (2016) in their application of the selective migration logic to the expansion of Neolithic agriculture throughout western Eurasia, the most individualistic people in the periphery would soon once again take off towards more peripheral areas in repeated frontier colonizations.
Scott (2009) provides a narrative account of how rice-based states gradually expanded across Southeast Asia and provoked marginal population groups to settle the highlands or the more peripheral parts of the lowland plains. The core areas of the early states in contemporary Myanmar, Thailand, Cambodia and Vietnam were generally characterized by highly productive irrigated rice cultivation in the lowland valleys of major rivers such as the Irrawaddy (Myanmar), the Mekong (Cambodia) and Red River (Vietnam). Historical Southeast Asia had a much lower population density than India and China, which meant that the periphery was often a feasible alternative for population groups who, for different reasons, wanted to evade the influence of the central governments. People used many different strategies, and had many different reasons, for trying to evade the influence of the expanding states in the region. In the words of Scott (2009, p. 326):
"Those who for whatever reason wished to evade incorporation as subjects had to place themselves out of range either on the plains at greater remove from the core or in the less accessible hills. ... it is clear that the hills were populated increasingly by pulses of migration by state subjects fleeing valley kingdoms for any one of several reasons - corvée labor, taxes, conscription, war, struggles over succession, religious dissent - all having directly to do with state making."
Despite this unwillingness to be subjects of the expanding states, the populations in the periphery were often willing to engage in mutually beneficial trade. Their societies were non-hierarchical and fluid and often based on foraging or swiddening agriculture. The officials of the rice states typically considered the peripheral populations as uncivilized and barbaric (Scott, 2009). In the terminology of our framework above, we might describe them as non-domesticated individualists.
Since self-domestication, just like evolution, is a function of time, the penetration of collectivist norms was typically also an increasing function of the time elapsed since the "civilizing" collectivist transformation. In this manner, a gradient arose with the greatest degree of collectivism in the oldest regions and the highest degree of individualism in the youngest territories of the farmer-state. The slow-moving nature of culture implied that, centuries or even millennia after the first settlement of individualistic farmers, signals from these early migration processes are still visible in the contemporary cultural record.
Nevertheless, as already argued by Triandis (1995), the Industrial Revolution in Europe, with innovation as a key driving factor, once again turned the tables and gave individualism an economic advantage in north European countries such as Britain and the Netherlands. Thus, we might expect that the collectivist legacy of the transition from hunting-gathering to farming should be weaker in countries where an industrial economy has existed for a longer period of time. In addition, Western colonization of regions outside Europe might have changed the indigenous cultural landscapes to a large extent. In some developing countries that only experienced industrialization recently and had strong indigenous states, the cultural imprint from the historical expansion of the collectivist farmer-state is more likely to be observable in the present day.

Historical background
In the previous section, we outlined a theory of self-domestication, selective migration and cultural divergence along the individualism-collectivism dimension. In this section, we survey historical materials to examine three building blocks of our theory in the context of Vietnam: (1) the initial region of historical Vietnam was home to a collectivist society; (2) individualistic people migrated southward as the country expanded its territory, eventually giving rise to cultural differences along the individualism-collectivism dimension; and (3) these cultural differences have persisted to the present day.

Core region of historical Vietnam was a collectivist society
The history of modern humans in Southeast Asia goes back at least 65,000 years and genetic analyses suggest that the region was an important node in out-of-Africa migrations into China, East Asia, and Oceania (Pischedda et al., 2017). Archaeological evidence indicates that ancient populations, probably migrating from southern China, had settled down in the Red River Delta with rice agriculture around 2000 BCE during the Neolithic Revolution (Nguyen et al., 2004). They gradually assimilated or replaced indigenous hunter-gatherer groups in the area. The farming populations lived together, without a centralized state, in the region that is now north Vietnam (see Fig. 1). From 111 BCE to 939 CE, the whole region was brought under the colonization of the centralized bureaucracy of imperial China. During this period, "the Vietnamese evolved from a preliterate society within a "south-sea civilization" into a distinctive member of the East Asian cultural world" (Taylor, 1983, p. xvii).
After the victory over historical China in 939 CE, the first unified state of historical Vietnam was founded in 968 CE and inherited a centralized bureaucratic system from the Chinese colonizer (Taylor, 2013, p. 51-77). Thus, in terms of the theory discussed above, we might argue that Vietnamese society was, to a large extent, domesticated into a collectivist social organization by an external agent. The dominant ethnic group in this new state formation was the Kinh, speaking an Austroasiatic language with its roots in southern China. Subsequent dynasties governing historical Vietnam continued to build stronger structures and orders into the society, which emphasized the values of social groups above the needs and desires of its constituent members (Whitmore, 1984, 1997). The collectivist nature of historical Vietnam was best exemplified by its village-based administrative system and family organization. The village was the lowest administrative level, which was responsible for regulating almost all aspects of the daily living of its members (Nguyen, 2003). Two important responsibilities of the village were to allocate public land under its management to its members (Dao, 1993), and to organize unpaid labor for public goods production such as irrigation facilities, roads, and communal buildings (Adams & Hancock, 1970). With respect to the family, parents had absolute authority over their children in almost all aspects of life (e.g., education, marriage, and housing), while children had to serve and obey their parents with the utmost respect throughout their lives (Haines, 1984).
The area bordering historical Vietnam in the south, which is now central Vietnam (see Fig. 1), was inhabited by various ethnic groups that formed the Champa Kingdom. The dominant ethnic group at the time was the Cham, an Austronesian-speaking group that is believed to have originally settled central Vietnam sometime during the first millennium BCE. The settlement was part of a larger expansion of Austronesians, probably originating from Taiwan, across Indonesia, Melanesia and the Pacific Islands. Further south of the Champa Kingdom, in what is now south Vietnam (Fig. 1), lay a large area of swampy forest belonging to the Khmer Empire.
In contrast to the centralized state of historical Vietnam, both the Champa Kingdom and Khmer Empire were basically networks of small political entities (Hall, 2011, p. 67-102, 159-210). The Champa Kingdom in the south was traditionally a trading-oriented nation integrated in the south Asian spice trade and in the broader Austronesian cultural community. Available historical materials do not allow us to draw any comparison between these societies and historical Vietnam along the individualism-collectivism dimension. However, in the terminology of Scott (2009), it is clear that southern Vietnam, populated over long periods mainly by Cham and Khmer ethnic groups, was a periphery to the more centralized states in the core areas of historical Vietnam and the Khmer Empire. The fact that the Champa Kingdom was less centralized and more open to contact with foreigners probably made it a relatively attractive refuge for more individualist people during the Vietnamese southern advance (see also below).

Selective migration and cultural divergence
From 968 to 1757, historical Vietnam expanded its territory southward along the coast to the Mekong River Delta. This so-called Vietnamese Southern Advance (Nam Tien) took place gradually through various annexations and was completed in 1757, by which time the border of Vietnam was established as it is today (see Online Appendix A). Historical Vietnam first annexed the land from modern Quang Binh to modern Binh Dinh from 968 to 1471. This land was effectively governed by the Nguyen Lords since the early sixteenth century, when the fight to control the throne erupted between them and the Trinh Lords in the initial core region. From 1611 to 1757, the Nguyen Lords continued to expand the country southwards to the Mekong River Delta to establish the border as it is today. Compared to the initial region, the annexed region under the government of the Nguyen Lords was more open towards foreign trade (Tana, 1998, p. 59-98).
After historical Vietnam conquered an area, immigration into the area always followed. There are two types of evidence on this process: anecdotal evidence from historical chronicles and ethno-genetic evidence from historical and contemporary populations. Records from the two official chronicles of historical Vietnam, Dai Viet Su Ky Toan Thu (from 204 BCE to 1675) and Dai Nam Thuc Luc (from 1558 to 1888), indicate that Vietnamese migrants to the annexed region ranged from landless farmers to rich adventurers, who took advantage of the opportunities in the new land, and from exiled criminals to recruited soldiers, who were sent to the new land by the government (see Online Appendix B). There are no records available to identify who were the dominant settlers, let alone their cultural characteristics. 10 Regarding the local ethnic groups, most of their populations moved to more distant peripheries such as the highlands or to the hinterland of other states (Scott, 2009), while those who decided to stay were acculturated to the Vietnamese culture (Wook, 2004). 11 The logic of the selective migration hypothesis discussed in the previous section implies that cultural differences along the individualism-collectivism dimension between the annexed region and the core region would emerge as a result of the selective out-migration of individualists. The historical evidence presented below is in support of this prediction.
Within historical Vietnam, cultural differences along the individualism-collectivism dimension between the annexed region and the core region were already remarkable as early as the seventeenth century. For example, Tana (1998, p. 99-116) provides many historical accounts to demonstrate that the social environment in the annexed region was characterized by greater openness, mobility and autonomy compared to the core region. Available statistics of land allocation in the early nineteenth century also illustrate this cultural divergence. In the core region, land was only allocated to or owned by village members (Nguyen, 2003). In the annexed region, however, the in-village/out-village distinction was loosened and land was commonly allocated to or owned by people from other villages. For example, studies on the land registries (cadastres) in the annexed region in the early nineteenth century show that the proportions of land allocated to or owned by people from other villages were 20-30% in the southernmost provinces (Nguyen, 1994) compared to 8-15% in more northern provinces (Nguyen, 1997, 2010, 1996a).
The second kind of evidence on the Southern Advance migrations comes from the ethno-genetic configuration of the contemporary population. Although nothing like a census from 968 CE exists, most sources indicate that the territory of current Vietnam was a mosaic of smaller ethnic groups a thousand years ago. Today, Vietnam's population is dominated by the Kinh group (86% of the population) while there are 53 recognized ethnic minorities that make up the remaining 14%. These 54 groups can in turn be subdivided into five different macro language families, of which the Austroasiatic family (henceforth "AA", including the Kinh and Muong) is by far the largest.
The people who migrated southwards and into the highlands during the Southern Advance were of two different kinds: individualistic people from Kinh and other AA-speakers who wanted to evade the burdens of the expanding Vietnamese state, and people from out-groups who moved in order to preserve their cultural and political independence vis-à-vis the dominant Kinh. In a recent study, Liu et al. (2020) collect genome-wide data from 22 ethnic groups in Vietnam, including the Kinh, Muong, and Cham. On the basis of these data, it is possible to reconstruct each group's effective population size during the past 50 generations (equivalent to about 1450 years). The results show that, whereas most groups have gone through a decline in effective population size, the closely related AA-speakers Kinh and Muong experienced a rapid expansion about 20 generations back (Liu et al., 2020, Figure 5). This surge coincides closely with the date of a great military defeat of the Champa Kingdom by the Vietnamese state in 1471. As a result of this defeat, a great share of the Cham population was either killed or enslaved and replaced by incoming Kinh. In the 1700s, the last remnants of the Champa Kingdom were dissolved and the Cham group experienced a large out-migration to what is now Cambodia. Those who remained were to a great extent assimilated into the Vietnamese state.
Among the out-group refugees from the Vietnamese state, one would expect that trying to retain their unique ethnic identity was a key motivating factor. Several of these groups indeed still speak a non-AA language, which sets them apart from the dominant Kinh. Somewhat surprisingly, the data from Liu et al. (2020) show that a number of these contemporary ethnic groups are actually genetically proximate to Kinh. A likely reason is that Kinh highland colonists over the centuries have interbred with the highland peoples in a more or less forced manner. Despite this genetic proximity, these groups have managed to maintain a distinct cultural identity.
Besides the selective migration of individualistic people as proposed by our theory, there are certainly other potential explanations for the cultural differences along the individualism-collectivism dimension between the annexed southern and northern core regions of historical Vietnam as described above. First, the characteristics of the frontier environment in the annexed region (e.g., sparsely populated) might have induced Vietnamese migrants to be more individualistic. Second, Vietnamese migrants might have been influenced by the cultural characteristics of the Champa Kingdom and Khmer Empire, which in turn might have been more individualistic than historical Vietnam. Finally, Vietnamese migrants to the annexed region in the south might have picked up individualistic traits from foreigners because of the open trade policy of the Nguyen Lords, which in turn was a continuation of the trade orientation of the Champa Kingdom within a greater cross-border Austronesian community. The main difference between our theory of selective migration and these explanations is that our theory predicts cultural differences even within the annexed region, i.e., areas annexed earlier are predicted to be more collectivistic. The empirical evidence presented below is in support of this prediction.

Cultural differences have persisted to the present day
The last block of the selective migration hypothesis argues that the cultural differences across regions of historical Vietnam found around the seventeenth century have persisted and made up a key characteristic of the cultural landscape of modern Vietnam. 12 In other words, the time elapsed since annexation to historical Vietnam is an important predictor of the strength of collectivism within contemporary Vietnam. The north-south cultural differences along the individualism-collectivism dimension in modern Vietnam have been documented in detail in many anthropological studies, e.g., Hickey (1964), Rambo (1973), and Van Luong (1992). This north-south cultural divergence is also commonly noted in descriptions of modern Vietnam. 13 To sum up, the north-south cultural differences along the individualism-collectivism dimension were already in place as early as the seventeenth century and are currently a central theme in accounts of Vietnam. Our theory of selective migration discussed in the previous section predicts that areas annexed earlier to historical Vietnam are currently more prone to collectivist norms, and that this relationship holds even within the annexed region. We now turn to investigate these predictions empirically using survey data in Sect. 4 and experimental data in Sect. 5.

Empirical model
In this section, we use survey data to investigate the proposed theory of selective migration in the context of Vietnam. The key argument of the theory is that collectivist societies triggered the out-migration of individualistic members toward peripheral areas, and, owing to the slow-moving nature of culture, these differences have persisted over time. Our empirical strategy revolves around regressing a measure capturing the strength of collectivism on the time elapsed since a district was annexed to historical Vietnam, while controlling for potential confounding factors. The core regression model takes the following form:

$$\text{Collectivism}_i = \beta\,\text{TimeSinceAnnexation}_i + X_i\gamma + \varepsilon_i. \qquad (1)$$

In this equation, $\text{Collectivism}_i$ is a measure of the strength of collectivism in district $i$, $\text{TimeSinceAnnexation}_i$ is the time since annexation to historical Vietnam, $X_i$ is a set of potential confounding factors, and $\varepsilon_i$ is a random error term. Our hypothesis postulates that $\beta$ is positive with respect to the strength of the collectivist measure, i.e., the longer the time since annexation, the stronger the collectivist norms.
12 The French colonization started in 1858 and ended with the Vietnamese victory in the First Indochina War (1946-1954), during which the French colonizers concentrated most of their activities in south Vietnam. Following the French defeat was the American intervention in south Vietnam (i.e., the Second Indochina War, commonly known as the Vietnam War), which ended with the reunification of the country in 1975. Meanwhile, Communism started to develop in north Vietnam in the early twentieth century and gained control of this part of the country in 1954.
13 Ending his Vietnamese history, Taylor (2013, p. 624) notes that "northerners are more disciplined to accept and to exercise government authority" and "southerners are more individualistic, egalitarian, entrepreneurial, interested in wealth more than in authority". Although regarding collectivism as the main cultural theme in their practical guide to Vietnam, Ashwill and Diep (2005, p. 71-72) note that "northerners are considered to be more intelligent, conservative, austere, serious, and frugal, ..., [and] are more apt to save for a rainy day", while "southerners are perceived as fun-loving, easy-going, open people who rarely think of saving for a rainy day".
Ideally, our main independent variable should capture historical migrations from the core area. Unfortunately, we have not been able to find such a direct measure of selective migration. Our main variable $\text{TimeSinceAnnexation}_i$ is an indirect proxy for historical migrations in the sense that we should expect the regions annexed last to host the greatest number of population groups seeking to evade the influence of the northern state.
To what extent would an estimated coefficient $\beta > 0$ rule out other potential hypotheses regarding the persistent cultural impact of historical states? In particular, does our main explanatory variable allow us to distinguish between (1) selective migration, (2) a crowding-in of collectivist norms by a strong state, and (3) migrants' adoption of individualist norms that were already strong in the southern periphery?
We argue that $\beta > 0$ would be consistent with our selective migration hypothesis, but that it would not disprove the two other hypotheses. In fact, as discussed earlier in our theoretical framework, we recognize that there is a significant overlap between the three hypotheses, and that they are to some extent reflections of the same underlying process. For instance, a strong crowding-in of norms by a collectivist state will push individualists to migrate to the periphery, and the absence of a strong state and a culture of individualism in the south will pull an even greater number of individualist migrants to the periphery.

Variables
The Individualism-Collectivism Trait. In the present paper, we follow the conventional definition of culture in economic research as "decision making heuristics or 'rules of thumb' that have evolved given our need to make decisions in complex and uncertain environments" (Nunn, 2012, p. S109). 14 Based on this definition, many observable outcomes have been used in the literature to capture different aspects of the individualism-collectivism trait, such as extended family structure, marriage stability, and inventiveness (Vandello & Cohen, 1999; Talhelm et al., 2014) or unusual names (Bazzi et al., 2020; Knudsen, 2019). 15 We argue that an outcome variable must satisfy two conditions to be a good measure of the individualism-collectivism dimension. First, it must capture an aspect of the individualism-collectivism trait that is theoretically relevant to understand individual behaviors or economic development. Second, it must feature as a traditional practice of the society in question, i.e., it captures a decision making heuristic in daily living.
In the present paper, we use voluntary labor contribution to public goods production to capture the strength of collectivism. We argue that this measure satisfies the two conditions mentioned above. First, the ability to solve collective action problems such as public goods production is the main feature of collectivism in related economic models (Gorodnichenko & Roland, 2015, 2017). Because collectivist societies are considered to be better in this respect, they should be able to mobilize a larger amount of voluntary labor contribution to public goods production from their in-group members. Second, labor contribution to public goods production is a traditional activity in Vietnam (Adams & Hancock, 1970). In particular, every year households in a local area send their members to work without payment to build or repair local public infrastructure such as roads, wells, irrigation, schools, and health clinics. These labor contributions are not paid, and hence are arguably voluntary. Thus, the decision to contribute labor to public goods production should capture a decision making heuristic in daily living.
Our main dataset is the Vietnam Household Living Standard Survey (VHLSS) in 2002, which covers almost 30,000 households in 607 (out of 630) districts (roughly 50 households per district) across all 61 provinces in Vietnam and is the only survey round that contains detailed information about voluntary labor contribution to public goods production. We measure cultural norms at the district level by aggregating household data. 16 In particular, we construct three related variables based on voluntary labor contribution to public goods production. First, we calculate the percentage of households contributing labor in the district to measure the prevalence of labor contributions. Second, we calculate the average number of persons making labor contributions per household. Finally, we calculate the average number of labor days contributed per household. These last two variables capture the intensity of labor contributions. Table 1 shows that, in 2002, around 31% of households contributed labor to public goods production, whereas the average number of persons making labor contributions per household is 0.55 and the average number of labor days contributed per household is 4.05. Figure 2 provides a spatial presentation of these data. A visual comparison with Fig. 1 suggests that districts annexed earlier to historical Vietnam currently have higher percentages of households contributing labor, more members per household making labor contributions, and more labor days contributed per household.
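The aggregation from households to districts described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the district identifiers, and the function name `district_measures` are hypothetical, the numbers are made up rather than taken from the VHLSS, and survey weights are ignored. The averages are taken over all households in a district, including those that contribute nothing, which is an assumption suggested by the reported mean of 0.55 contributing persons per household.

```python
from collections import defaultdict

# Hypothetical household records: (district_id, persons_contributing, labor_days).
# These values are invented for illustration and are not VHLSS data.
households = [
    ("d01", 1, 6.0),
    ("d01", 0, 0.0),
    ("d01", 2, 10.0),
    ("d01", 0, 0.0),
    ("d02", 0, 0.0),
    ("d02", 1, 3.0),
]


def district_measures(records):
    """Aggregate household records into three district-level measures."""
    grouped = defaultdict(list)
    for district, persons, days in records:
        grouped[district].append((persons, days))

    measures = {}
    for district, rows in grouped.items():
        n = len(rows)
        measures[district] = {
            # Prevalence: percentage of households with at least one contributor.
            "pct_households_contributing": 100.0 * sum(1 for p, _ in rows if p > 0) / n,
            # Intensity: average number of contributing persons per household.
            "avg_persons_per_household": sum(p for p, _ in rows) / n,
            # Intensity: average number of labor days contributed per household.
            "avg_days_per_household": sum(d for _, d in rows) / n,
        }
    return measures


result = district_measures(households)
```

Each district ends up with one value per measure, which is the unit of observation used in the regressions.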
To further validate the empirical results, we also employ two other measures of the individualism-collectivism trait that have been used in previous studies: extended family structure and marriage stability (Vandello & Cohen, 1999; Talhelm et al., 2014). Households in collectivist societies should be more likely to include extended family members rather than simply nuclear family members, and less likely to divorce. We follow previous studies and measure, at the district level, extended family structure by the percentage of households with grandchildren living in them, and marriage instability by the number of divorced households per 100 married households. We use data from the Population and Housing Census in 1999, which cover around 16.5 million households across all districts in Vietnam. Table 1 shows that, on average, the percentage of households with grandchildren in them is 7.2% (ranging from 0 to 27.17%), and the number of divorced households per 100 married households is 2.46 (ranging from 0.21 to 7.82). As shown in Table D1 in Online Appendix D, the correlation coefficient between labor contribution and extended family is 0.5 (p value = 0.000), between labor contribution and marriage instability is −0.3 (p value = 0.000), and between extended family and marriage instability is −0.2 (p value = 0.000).
The Time since Annexation to Historical Vietnam. As previously mentioned, our main explanatory variable is the time elapsed since an area was annexed to historical Vietnam. Following the historical background discussed earlier, we choose the first unified state of historical Vietnam in 968 as the beginning year, while the terminal year is set to 1990. In our analyses, we measure the time since annexation in centuries (100 years) to make the estimated coefficients easy to read in the reported tables. The descriptive statistics in Table 1 show that the annexations took place between 2.33 and 10.22 centuries before the terminal year of 1990.
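The construction of the main explanatory variable amounts to simple arithmetic, sketched below. The function name `centuries_since_annexation` is hypothetical; the years 968, 1471, and 1757 are taken from the historical background, and the terminal year 1990 from the text. The two endpoints reproduce the range of 2.33 to 10.22 centuries reported in Table 1.

```python
BEGIN_YEAR = 968      # first unified state of historical Vietnam (core region)
TERMINAL_YEAR = 1990  # terminal year chosen in the text


def centuries_since_annexation(annexation_year, terminal_year=TERMINAL_YEAR):
    """Time elapsed since annexation, measured in centuries (100 years)."""
    if not BEGIN_YEAR <= annexation_year <= terminal_year:
        raise ValueError(f"annexation year out of range: {annexation_year}")
    return (terminal_year - annexation_year) / 100.0


core_region = centuries_since_annexation(968)       # initial core region
after_1471 = centuries_since_annexation(1471)       # defeat of the Champa Kingdom
last_annexation = centuries_since_annexation(1757)  # end of the Southern Advance
```

A district in the initial core region is therefore assigned 10.22 centuries, and a district annexed at the end of the Southern Advance in 1757 is assigned 2.33 centuries.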
To construct the time since annexation to historical Vietnam for each modern district, two dimensions are needed: (1) the district's corresponding area in historical Vietnam and (2) the year that this area was annexed. For the year that a historical area was annexed, we rely on two official chronicles of historical Vietnam, Dai (2012) and Dai (2002), recording events from the beginning to 1675 and in the 1558-1888 period, respectively. These chronicles were written by state officials of historical Vietnam to keep track of events and, to the best of our knowledge, constituted the primary sources for Vietnamese histories. We code an area as having been annexed when there is a record in the chronicles demonstrating that this area was under the control of historical Vietnam. To link historical areas to their modern counterparts, we rely on two seminal works of Vietnamese historians: Dao (2005) and Le Phan et al. (2011). All details on the coding procedure are presented in Online Appendix A.
Control Variables. To tackle the endogeneity of the time since annexation into historical Vietnam, we identify a set of potential confounding factors, i.e., factors that might influence both the time since annexation to historical Vietnam and the strength of collectivism. A necessary condition for a variable to be a confounding factor is that it must have existed before the annexation to historical Vietnam. Variables that came to exist after the annexation such as demographic characteristics in the modern day might be caused by the annexation, and hence are bad controls (Angrist & Pischke, 2009). Nevertheless, as shown in the following subsection, our empirical results are also robust to the inclusion of numerous bad controls. Below, we describe the set of potential confounding factors in detail.
First, agricultural suitability might have both attracted historical Vietnam to conquer a region and promoted the development of collectivism. We control for natural land productivity, which has been argued to influence the incentive to cooperate in the production of public infrastructure in the subsistence agricultural economy (Litina, 2016). Second, geographical conditions might affect the difficulty in conquering a region. Geographical isolation is also conducive to cultural assimilation and the development of an in-group identity, giving rise to collectivism (Triandis, 1995). We control for distance to the coast, elevation, and ruggedness to capture geographical isolation. Third, we also control for irrigation suitability, because irrigation agriculture has been shown to be conducive to the development of collectivism (Buggle, 2020). In addition, we control for climatic zones to capture any potential influence of climatic conditions on the development of collectivism.
Natural land productivity is measured by caloric suitability constructed at 5 arc-minute resolution by Galor and Özak (2016), who make their calculation based on data from the Global Agro-Ecological Zones project of the Food and Agriculture Organization. This index measures the average potential yield (millions of kilocalories per square kilometer per year) attainable in each grid cell given the set of crops that are suitable for cultivation in the post-1500 period. To capture the natural component of land productivity, the production conditions are set at a low level of inputs and rain-fed agriculture based on agro-climatic conditions, which are unaffected by human intervention. Distance to the coast is measured by the shortest (straight-line) distance to the coastline. Elevation is taken from the Global 30 Arc-Second Elevation Dataset (GTOPO30) provided by the Earth Resources Observation and Science Center. The terrain ruggedness index was originally devised by Riley et al. (1999) to quantify topographic heterogeneity in wildlife habitats providing concealment for prey and lookout posts. This index is calculated by Nunn and Puga (2012) based on the GTOPO30 dataset.
The Food and Agriculture Organization of the United Nations (FAO) defines irrigation suitability as the potential increase in agricultural output that can be obtained by fully exploiting irrigation compared to rain-fed agriculture, and classifies irrigation suitability into five classes: (1) only suitable for rain-fed agriculture, (2) output yield increases by 0-20%, (3) output increases by 20-50%, (4) output increases by 50-100%, and (5) output increases by more than 100% (Fischer et al., 2002). Following Buggle (2020), we classify an area to be suitable for irrigation if agricultural output increases by at least 50%. Climatic zones are defined by the Köppen-Geiger classification, and are constructed using precipitation and temperature data in the period of 1901-1925 (Rubel & Kottek, 2010). Descriptive statistics of all variables can be found in Table 1.
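The irrigation-suitability rule can be written as a small classification function. This is only a sketch of the threshold described above: the names `FAO_CLASSES` and `is_irrigation_suitable` are hypothetical, and how grid-cell classes are combined into a district-level variable is not specified here and is left open.

```python
# FAO irrigation-suitability classes as listed in the text (Fischer et al., 2002).
FAO_CLASSES = {
    1: "only suitable for rain-fed agriculture",
    2: "output increases by 0-20%",
    3: "output increases by 20-50%",
    4: "output increases by 50-100%",
    5: "output increases by more than 100%",
}


def is_irrigation_suitable(fao_class):
    """Return True when irrigation raises output by at least 50% (classes 4 and 5)."""
    if fao_class not in FAO_CLASSES:
        raise ValueError(f"unknown FAO class: {fao_class}")
    return fao_class >= 4


suitability = {c: is_irrigation_suitable(c) for c in FAO_CLASSES}
```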

Baseline results
To begin with, we regress the percentage of households contributing labor in a district on the time since a district was annexed to historical Vietnam, controlling for potential confounding factors as discussed above. In all regressions, we report both robust standard errors and standard errors adjusted for spatial autocorrelation following Conley (1999). 17 Table 2 shows that the estimated coefficients of the time since annexation are positive and significant, whether or not all control variables are included. Thus, districts annexed earlier to historical Vietnam today have a higher percentage of households contributing labor on average. Relative to the mean value of the dependent variable, the marginal effect is economically significant. When all control variables are included, for example, a one-century increase in the time since annexation is associated with an additional 1.9 percentage points of households contributing labor, which is more than 6% of the mean value of the variable. The reduction in the magnitude of the estimated coefficient of the time since annexation when control variables are added indicates that these variables do confound the impact of the time since annexation on the prevalence of collectivism to some extent. The time since annexation accounts for almost 10% of the variation in the percentage of households contributing labor.
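To make the mechanics concrete, the following sketch runs the simplest version of this regression, with a constant and no control variables, and computes a heteroskedasticity-robust standard error for the slope. It is an illustration under stated assumptions rather than the estimation used in the paper: the function name `ols_with_robust_se` is hypothetical, the five data points are invented, the robust variance is the HC0 (White) form, and the Conley (1999) spatial adjustment is omitted. The last line reproduces the relative size of the reported effect using the figures quoted in the text.

```python
import math


def ols_with_robust_se(x, y):
    """Bivariate OLS of y on x with a constant.

    Returns (intercept, slope, robust_se), where robust_se is the
    heteroskedasticity-robust (HC0) standard error of the slope.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 variance of the slope: sum((x - x_bar)^2 * e^2) / sxx^2.
    meat = sum(((xi - x_bar) ** 2) * (ei ** 2) for xi, ei in zip(x, residuals))
    robust_se = math.sqrt(meat) / sxx
    return intercept, slope, robust_se


# Invented district data: centuries since annexation and the percentage
# of households contributing labor. Not taken from the paper's dataset.
time_since = [2.33, 3.5, 5.19, 7.0, 10.22]
pct_contributing = [18.0, 25.0, 27.0, 36.0, 41.0]
intercept, slope, robust_se = ols_with_robust_se(time_since, pct_contributing)

# Relative size of the reported effect: 1.9 percentage points per century
# against a mean of roughly 31% of households contributing labor.
relative_effect_pct = 100 * 1.9 / 31
```

With the figures quoted in the text, the relative effect is a little over 6% of the mean, in line with the statement above.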
The estimated coefficient of caloric suitability is negative and significant (Column 2), suggesting that a higher natural land productivity is associated with a lower percentage of households contributing labor, which concurs with Litina (2016). In line with Triandis (1995), the estimated coefficients of distance to the coast, elevation, and ruggedness are all significant and positive (Columns 3 to 5), indicating that areas farther from the coastline, more highly elevated and more rugged have higher percentages of households contributing labor. The estimated coefficient of irrigation suitability is negative and significant (Column 6), while only the dummy for the Cwa climatic zone has a significant and positive estimated coefficient. When all control variables are added together, their estimated coefficients decrease substantially in magnitude, which is expected given that these variables are correlated (Table D1 in Online Appendix D). The estimated coefficients of caloric suitability, irrigation suitability, and climatic zones are now not different from zero (Column 8). Together, these control variables account for nearly 20% of the total variation in the percentage of households contributing labor.
Robust standard errors are in parentheses; standard errors in square brackets are calculated following Conley (1999) with the assumption that autocorrelation decreases in the distance between district centroids and equals zero for districts that are more than 50 km apart. All regressions include a constant. The base climatic zone is Am. *p < 0.1; **p < 0.05; ***p < 0.01
Table 3 reports the results of similar regressions for the two other dependent variables measured at the district level, i.e., the average number of persons per household making labor contributions (panel A) and the average number of labor days contributed per household (panel B). With respect to both variables, the estimated coefficients of the time since annexation are significant and positive, whether or not all control variables are included. Districts annexed earlier to historical Vietnam currently have more members per household making labor contributions and more labor days contributed per household on average. For both dependent variables, the marginal effects are economically significant, i.e., more than 7% of the respective mean values. The time since annexation accounts for approximately 10% of the total variations in both dependent variables, while control variables altogether account for another 20%. Table 4

Robustness analysis
Sub-sample Analysis. As discussed earlier in the historical background, we recognize that various characteristics of the annexed region (i.e., harsh living conditions, openness to trade, and existing individualist norms of the Cham and Khmer) certainly play a role in explaining the cultural differences along the individualism-collectivism dimension between the annexed and the initial regions. To examine whether our selective migration hypothesis can be distinguished from these channels, we investigate the relationship between the time since annexation to historical Vietnam and the strength of collectivism within the subsample of the annexed areas. As can be seen in Panel A of Table 5, the estimated coefficients of the time since annexation to historical Vietnam remain qualitatively the same with respect to all dependent variables, whether or not all control variables are added. We argue that this finding is consistent with our hypothesis that selective migration of individualistic people in the past is an important driver behind the contemporary cultural differences across Vietnam.
Next, historical Vietnamese immigrants (the Kinh ethnicity) often inhabited the coastal plain with their traditional rice agriculture. At the same time, the highland areas were mainly inhabited by various other ethnic groups. After the Reunification in 1975, the Kinh started to migrate to the highland areas on a large scale through state-sponsored programs under the centrally planned economy to establish new production zones (Hardy, 2003). These later migrations, therefore, might be different from those that happened in historical times. To examine this issue, we further exclude from the estimation all districts in the highland areas, i.e., where the average elevations are above 500 meters (the results are robust to other values such as 400 and 600 meters). 18 Furthermore, we also exclude two provinces, Ha Noi (in the north) and Ho Chi Minh City (in the south), which are the two biggest destinations for immigrants in modern times. Panel B of Table 5 shows that the estimated coefficients of the time since annexation remain qualitatively the same with respect to all dependent variables, whether or not all control variables are added.

Table 3 notes: Robust standard errors are in parentheses; standard errors in square brackets are calculated following Conley (1999) under the assumption that autocorrelation decreases in the distance between district centroids and equals zero for districts that are more than 50 km apart. All regressions include a constant. The base climatic zone is Am.

Table column heading: Percentage of households with grandchildren in them. Notes: Standard errors, constant, and base climatic zone as in the Table 3 notes. *p < 0.1; **p < 0.05; ***p < 0.01.

Table 5. Robustness to different sub-samples (column heading: Labor contribution). Notes: Robust standard errors are in parentheses; standard errors in square brackets are calculated following Conley (1999) under the assumption that autocorrelation decreases in the distance between district centroids and equals zero for districts that are more than 50 km apart. Control variables include caloric suitability, distance to the coast, elevation, ruggedness, irrigation suitability, Köppen-Geiger climatic zones, and a constant. Panel A only includes districts in the annexed region. Panel B further excludes districts in Ha Noi and Ho Chi Minh City and districts whose elevations are above 500 m. *p < 0.1; **p < 0.05; ***p < 0.01.
Omitted-Variable Bias. Although we have considered an extensive list of confounding factors, some factors cannot be included owing to data availability. To examine the potential omitted-variable bias caused by some of these factors, we use an instrumental variable (IV) estimation. We observe that, from the Red River Delta in the north, historical Vietnam could not conquer the Mekong River Delta in the south without annexing all areas located in between, making the north-south geographical order a strong predictor of the time since annexation within the subsample of the annexed areas. 19 Thus, by using the north-south geographical order as an instrumental variable for the time since annexation, we can examine the potential bias caused by unobserved factors that are likely to influence the time since annexation and the individualism-collectivism trait, but do not correlate with the north-south geographical order. In Online Appendix C, we discuss many such factors in detail, including indigenous cultural environments, local economic activities, market access, and the influence of northern culture or institutions.
We propose to use the distance from an annexed area to a northern reference point as a measure of the north-south geographical order. Quang Binh (see Fig. 1), the first area that was annexed to historical Vietnam, is arbitrarily chosen as the northern reference point (the result is robust to other choices as well), and the walking distance along the coast (instead of the geodesic, "bird-fly" distance) is calculated to capture the military route in historical times. Distance is measured in 100 km using the district centroids, where district borders are taken from the Global Administrative Unit Layers. The walking distance from Quang Binh to the farthest district in the south is roughly 1350 km. Panel A of Table 6 shows that the estimated coefficients of the time since annexation remain economically and statistically significant with respect to all dependent variables, whether or not all control variables are added. The first-stage results ensure that the walking distance to Quang Binh is a strong predictor of the time since annexation, i.e., it has significant and negative estimated coefficients and large F statistics (full results are available upon request).
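The logic of the instrumental-variable strategy can be illustrated with a minimal sketch. It assumes a single instrument and no control variables, so it is far simpler than the regressions reported in Table 6, and all numbers are made up for illustration; the variable names (`distance`, `confounder`, `years`, `outcome`) are hypothetical and do not come from the paper's data.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate of the effect of x on y, using z as the instrument."""
    return covariance(z, y) / covariance(z, x)


# Made-up illustration data, NOT taken from the paper.
distance = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]     # instrument, in units of 100 km
confounder = [1.0, -1.0, 0.0, 0.0, 0.0, -1.0, 1.0]   # unobserved; uncorrelated with distance
# Time since annexation falls with distance and also depends on the confounder.
years = [700.0 - 40.0 * d + 10.0 * u for d, u in zip(distance, confounder)]
# Outcome: the assumed true effect of time since annexation is 0.02.
outcome = [5.0 + 0.02 * t + 3.0 * u for t, u in zip(years, confounder)]

first_stage = ols_slope(distance, years)      # negative: farther south, annexed later
beta_ols = ols_slope(years, outcome)          # contaminated by the confounder
beta_iv = iv_slope(distance, years, outcome)  # recovers the assumed effect

print(f"first stage: {first_stage:.2f}, OLS: {beta_ols:.5f}, IV: {beta_iv:.5f}")
```

In this constructed example the instrument is uncorrelated with the confounder by design, so the IV estimate equals the assumed effect while the OLS estimate is biased upward. Whether the same exclusion restriction holds for the walking distance to Quang Binh is the substantive question discussed in the text.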
In the spirit of Scott (2009), more distant and rugged areas would make it easier for migrants to escape the influence of the central state. Thus, we can use the interaction between the walking distance from Quang Binh and terrain ruggedness as an instrumental variable to further examine the credibility of the IV estimation. Panel B of Table 6 shows that the estimated coefficients of the time since annexation remain qualitatively the same with respect to all dependent variables, whether or not all control variables are added. The first-stage results also suggest that the interaction term between the walking distance from Quang Binh and terrain ruggedness is a strong predictor of the time since annexation, i.e., it has significant and negative estimated coefficients and large F statistics (full results are available upon request).

Table 6. Robustness to instrumental variables (column heading: Labor contribution). Notes: Robust standard errors are in parentheses; standard errors in square brackets are calculated following Conley (1999) under the assumption that autocorrelation decreases in the distance between district centroids and equals zero for districts that are more than 50 km apart. Control variables include caloric suitability, distance to the coast, elevation, ruggedness, irrigation suitability, Köppen-Geiger climatic zones, and a constant. All regressions only include districts in the annexed region. In Panel A, walking distance to Quang Binh is the instrumental variable. In Panel B, the interaction term between walking distance to Quang Binh and terrain ruggedness is added as an instrumental variable. *p < 0.1; **p < 0.05; ***p < 0.01.

Another way to examine the potential bias from omitted variables is to use the change in the estimated coefficients of the time since annexation into historical Vietnam when the observed confounding factors are included to infer the potential change caused by the unobserved confounding factors. Under the assumption that the selection on unobserved confounding factors is proportional to the selection on observed confounding factors, Oster (2019) shows that a consistent estimate adjusted for omitted-variable bias can be obtained for each value of this assumed proportional relationship. To examine how the estimated coefficients of the time since annexation respond to omitted variables, we assume that the selection on unobserved confounding factors equals one half of the selection on observed confounding factors. 20 Column 8 of Tables 2, 3, and 4 shows that the adjusted estimated coefficients remain more than one half in magnitude compared to their unadjusted counterparts with respect to all dependent variables. Thus, the potential bias from omitted variables, if there is any, should be relatively small.
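To make the proportional-selection adjustment concrete, the following sketch applies the commonly cited approximation to the Oster (2019) bias-adjusted coefficient. It is an illustration under stated assumptions, not the computation behind Tables 2 to 4: the coefficients and R-squared values are hypothetical, and the maximum attainable R-squared (`r2_max`) is set to 1.3 times the R-squared of the controlled regression, a rule of thumb that may differ from the value actually used in the paper.

```python
def oster_adjusted_beta(beta_short, r2_short, beta_long, r2_long, delta, r2_max):
    """Approximate bias-adjusted coefficient under proportional selection.

    beta_short, r2_short: coefficient and R-squared without the observed controls
    beta_long,  r2_long:  coefficient and R-squared with the observed controls
    delta:  assumed ratio of selection on unobservables to selection on observables
    r2_max: assumed R-squared if all confounders were observed
    """
    if r2_long <= r2_short:
        raise ValueError("controls must raise the R-squared")
    movement = beta_short - beta_long                    # coefficient change from adding controls
    scaling = (r2_max - r2_long) / (r2_long - r2_short)  # remaining vs. explained variation
    return beta_long - delta * movement * scaling


# Hypothetical inputs, NOT taken from the paper.
beta_short, r2_short = 0.010, 0.10
beta_long, r2_long = 0.008, 0.30
delta = 0.5                 # unobserved selection assumed to be half of observed selection
r2_max = 1.3 * r2_long      # assumed rule of thumb

adjusted = oster_adjusted_beta(beta_short, r2_short, beta_long, r2_long, delta, r2_max)
print(f"adjusted coefficient: {adjusted:.5f} ({adjusted / beta_long:.0%} of unadjusted)")
```

With these illustrative inputs the adjusted coefficient stays close to its unadjusted counterpart, which is the kind of comparison the text reports as "more than one half in magnitude".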
Bad Controls. Numerous factors that came into existence after the annexation into historical Vietnam might influence the strength of collectivism. Examples include big cities in the annexed region, such as Hoi An and Sai Gon, which were built after the annexation; the concentration of French colonizers and American intervention in south Vietnam (the latest region to be annexed into historical Vietnam); and various present-day demographic characteristics. These factors are bad controls and should not be included in the regression (Angrist & Pischke, 2009). Nevertheless, Table D2 in Online Appendix D reports that the estimated coefficients of the time since annexation to historical Vietnam remain qualitatively intact when the shortest distance to big cities, a dummy variable for south Vietnam, and various district demographic characteristics, such as percentage of households with male head, average head age, average schooling years of head, percentage of urban households, percentage of Kinh households, average household size, and per capita expenditure, are added to the regression model.

Discussion
The survey and census data analysis so far has demonstrated that, on average, districts annexed earlier to historical Vietnam are today more prone to collectivist norms, and this result is robust to a battery of checks. In particular, a novel finding is that voluntary labor contribution to public goods production is both more prevalent and more intensive in districts annexed earlier to historical Vietnam. We have also shown that family structure is more traditional and marriage is more stable in districts annexed earlier. Although the VHLSS provides naturally occurring data that are available for almost all districts across Vietnam, the biggest drawback is that it cannot help us examine further why more collectivist societies in districts annexed earlier to historical Vietnam could mobilize a larger amount of voluntary labor contribution to public goods production from their in-group members. Was that because these societies developed informal institutions that punished non-contributing members (e.g., ostracism)? Or because they have a large fraction of members with strong cooperative preferences? Or because their members share a strong belief that other people would also contribute labor to public goods production?
To complement the survey data analysis, we conduct a lab-in-the-field public goods experiment. Although it is impossible to run the experiment in all districts across Vietnam, the advantage of the experiment is that it allows us to examine more deeply why more collectivist societies in districts annexed earlier to historical Vietnam could mobilize a larger amount of voluntary labor contribution to public goods production from their in-group members. First, the experiment holds the institutional setting constant, eliminating the possibility that there are informal institutions that punish non-contributing members. Second, as discussed in detail below, we adopt an experimental design that allows us to measure preferences for cooperation and beliefs about the contributing behaviors of other members. In turn, we can investigate whether it is preferences or beliefs that drive individual contributions to public goods production.

Sample selection
To capture in-group cooperation, we recruit subjects who come from the same local areas. A crucial aspect is selecting the experimental sites in a way that minimizes differences between them. First, we focus on the annexed region to rule out differences in the historical frontier environment. Second, we select provinces, and rural districts in them, located along the coast, which was the typical route of migration and settlement of historical Vietnamese. Finally, we choose provinces, and rural districts in them, that were historically inhabited mainly by the Kinh ethnicity (historical Vietnamese) and whose populations have been living there for many generations, i.e., places with neither significant immigration nor significant emigration. This procedure leaves us with coastal, rural, and Kinh-dominated districts in the annexed region. From this subsample, we randomly select one of the districts with the longest time since annexation to historical Vietnam and one of the districts with the shortest time. This process yields one rural district in Thua Thien Hue province and one rural district in Ben Tre province; the former is located farther north and thus has a longer time since annexation (see Fig. 1).
We use high school students as our subjects in the experiment since they are old enough to embody the cultural environments of the places where they grew up, but not yet affected by living outside their communities, which could make it harder to capture the local cultural norms. 21 Our proposed selective migration hypothesis predicts that subjects in Thua Thien Hue (henceforth the "northern site") share stronger norms of in-group cooperation, and hence on average contribute at a higher level compared to subjects from Ben Tre (henceforth the "southern site"). Each rural district in Vietnam has three to five high schools. To keep the selected districts comparable, we randomly selected one school located in the center of the district among the schools that had at least six classes for the oldest age cohort, which means that students come from a larger catchment area where they have attended different secondary schools. The latter requirement was imposed to avoid measuring cooperation norms within a specific class, which might have developed its own norms, whereas we aim to measure norms in the community in which the students live.

The public goods experiment
We build our experimental design on the one-shot linear public goods experiment developed by Fischbacher et al. (2001). 22 We begin by describing the general features of a public goods experiment before discussing the specific features of the design in Fischbacher et al. (2001).
The basic idea of a public goods experiment is to create a social dilemma situation, where there is a conflict between the social and private optima. In our setting, the subjects are randomly assigned to groups of three, where each member comes from a different class at the high school, and this was clearly stated in the instructions of the experiment. This feature of the design was chosen to avoid having subjects allocated to groups consisting of classmates with whom subjects might have developed a specific norm of behavior, which would reduce the possibility of measuring norms of cooperation in the places where they reside. All subjects receive an endowment of 20 tokens and must decide simultaneously how much of their endowments to invest in a public good; the residual is kept for themselves and is labeled as a private good. The marginal per capita return (MPCR) from the public good is 0.5, which means that each token contributed to the public good by a group member yields 0.5 tokens to every group member, including the member who contributes the token. If a subject is rational and selfish, then an MPCR below 1 leads to a dominant strategy to free ride (i.e., to contribute zero to the public good), because the return from the public good is lower than the return from the private good. Nevertheless, it is socially optimal to contribute the whole endowment if MPCR × n > 1, where n is the number of group members. With n = 3, our choice of an MPCR of 0.5 gives MPCR × n = 1.5 and thus generates the conflict between private and social optima that characterizes a public good. The payoff for subject i consists of two components: (1) the amount of the endowment that is not invested in the public good (i.e., what is kept as a private good), and (2) the return from the public good. The payoff function for subject i is given by:

$$\pi_i = 20 - g_i + 0.5 \sum_{j=1}^{3} g_j$$

where $g_i$ denotes the contribution of subject i to the public good and the sum runs over all three group members. Each token earned in the experiment is exchanged for money at the exchange rate of one token equals 10000 Vietnamese Dong.
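The payoff rule described above (an endowment of 20 tokens, an MPCR of 0.5, and groups of three) can be sketched in a few lines. The function and constant names are illustrative choices rather than anything used in the experiment itself.

```python
ENDOWMENT = 20   # tokens per subject, as stated in the text
MPCR = 0.5       # marginal per capita return, as stated in the text


def payoff(own_contribution, group_contributions, endowment=ENDOWMENT, mpcr=MPCR):
    """Payoff in tokens: tokens kept privately plus the return from the public good.

    group_contributions must list the contributions of ALL group members,
    including the subject's own contribution.
    """
    if not 0 <= own_contribution <= endowment:
        raise ValueError("contribution must lie between 0 and the endowment")
    return endowment - own_contribution + mpcr * sum(group_contributions)


# Everyone free rides: each subject simply keeps the endowment.
all_free_ride = payoff(0, [0, 0, 0])        # 20.0 tokens
# Everyone contributes fully: the social optimum pays more to each member.
all_contribute = payoff(20, [20, 20, 20])   # 30.0 tokens
# A lone free rider among full contributors earns the most individually.
lone_free_rider = payoff(0, [0, 20, 20])    # 40.0 tokens

print(all_free_ride, all_contribute, lone_free_rider)
```

The three cases show the dilemma directly: full cooperation beats universal free riding (30 versus 20 tokens), yet any individual does better by withholding (40 tokens when the other two contribute fully).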
The specific feature of the public goods experiment developed by Fischbacher et al. (2001) is that it is based on the strategy method. In their design, each subject makes two types of contribution decisions to the public good: (1) unconditional contribution and (2) conditional contribution. In the unconditional contribution decision, which is the standard public goods experiment described above, each subject states how much he or she would like to contribute to the public good from his or her endowment of 20 tokens. The additional feature of the design of Fischbacher et al. (2001) is the introduction of the contribution table in which subjects make contribution decisions conditional on the other group members' average contributions. In a contribution table, which includes all possible average contributions of the two other players in the group, rounded to integers and ranging from 0 to 20 points, a subject indicates how much he or she would contribute to the public good if these were the average contributions to the public good by the other two group members. The contributions reported in the table are referred to as conditional contributions. The final feature of the design of Fischbacher et al. (2001) is to ensure that all decisions, i.e., both unconditional and conditional contributions, are incentive compatible by using the following approach. For two randomly selected group members, it is the unconditional contribution to the public good that is pay-off relevant. For the third member, the average unconditional contribution of the other two group members is calculated, and the contribution of the third member is then determined from her conditional contribution given the average contribution of the other two group members. Thus, when a subject makes his or her decisions, he or she does not know which of all the decisions will be payoff relevant, and hence has no incentive to choose anything other than the preferred option. 
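The incentive-compatibility mechanism just described can be sketched as follows: two members' unconditional contributions count, and the third member's contribution is read from his or her contribution table at the rounded average of the other two. This is a simplified illustration; the function names are hypothetical, and rounding an average ending in .5 upward is an assumption, since the text only states that averages are rounded to integers.

```python
import random


def rounded_average(a, b):
    """Average of two integer contributions, rounding .5 upward (an assumption)."""
    return (a + b + 1) // 2


def realized_contributions(unconditional, tables, conditional_member):
    """Return the payoff-relevant contribution of each of the three group members.

    unconditional:       three unconditional contributions (0 to 20 tokens)
    tables:              three contribution tables, each with 21 entries, where
                         entry k is the contribution if the others average k tokens
    conditional_member:  index (0, 1, or 2) of the member whose table is used
    """
    others = [i for i in range(3) if i != conditional_member]
    average = rounded_average(unconditional[others[0]], unconditional[others[1]])
    result = list(unconditional)
    result[conditional_member] = tables[conditional_member][average]
    return result


# Hypothetical decisions for one group.
unconditional = [10, 5, 8]
tables = [
    list(range(21)),              # matches the others' average exactly
    [0] * 21,                     # never contributes
    [k // 2 for k in range(21)],  # contributes half of the others' average
]

# In the experiment the conditional member is drawn at random; a seeded draw
# is shown here, and a fixed index is used below so the example is reproducible.
random_member = random.Random(0).randrange(3)
example = realized_contributions(unconditional, tables, conditional_member=2)
print(example)
```

Because a subject does not know in advance whether the unconditional decision or a table entry will count, every entry is potentially payoff relevant, which is what gives each of the 21 conditional decisions its incentive.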
After the experiment, we also elicited beliefs by asking each subject what he or she thought the other two group members had contributed unconditionally on average. We paid subjects for the accuracy of their guesses to create incentives for truthful revelation. 23

The strength of the strategy method is that subjects can be categorized into different contributor types based on their 21 conditional contribution decisions to the public good, i.e., how much they decided to contribute to the public good conditional on the average contribution of the other two group members for all integers in the range 0 to 20. These contributor types capture the preferences for cooperation. We use the same classification as proposed in the original paper by Fischbacher et al. (2001). A subject is classified as a "conditional cooperator" if his or her conditional contribution increases weakly monotonically with the average contribution of the other group members, or if the relationship between his or her conditional contribution and the average contribution of the others is positive and significant at the 1% level according to a Spearman rank correlation coefficient. A "free rider" is a subject who contributes zero to the public good for all levels of the average contribution by others. A "hump-shaped" contributor is a subject who meets the same criterion as a conditional cooperator (weakly monotonically increasing contributions or a positive Spearman rank correlation coefficient significant at the 1% level), but only up to an inflection point. For average contribution levels by others above this point, the subject's own conditional contributions decrease weakly monotonically or show a negative Spearman rank correlation coefficient that is significant at the 1% level. Those who cannot be categorized based on any of the above criteria are referred to as "others".
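The classification rules above can be sketched in code. The version below is deliberately simplified and should not be read as the exact procedure used in the paper: it applies only the weak-monotonicity criteria and omits the alternative Spearman rank correlation criterion, and it treats a flat, non-zero table as "others", which is an assumption.

```python
def classify_contributor(table):
    """Classify a contribution table (21 entries, for others' averages 0 to 20).

    Simplified sketch: uses only the monotonicity criteria and omits the
    Spearman rank correlation test described in the text.
    """
    if len(table) != 21:
        raise ValueError("expected one entry for each average from 0 to 20")
    if all(c == 0 for c in table):
        return "free rider"

    steps = [later - earlier for earlier, later in zip(table, table[1:])]
    # Weakly increasing with at least one strict increase (assumption: a flat,
    # non-zero table is not counted as conditional cooperation).
    if all(s >= 0 for s in steps) and any(s > 0 for s in steps):
        return "conditional cooperator"

    # Hump shape: weakly increasing up to an interior peak, weakly decreasing after.
    peak = table.index(max(table))
    rises = all(s >= 0 for s in steps[:peak])
    falls = all(s <= 0 for s in steps[peak:])
    if 0 < peak < 20 and rises and falls:
        return "hump-shaped"
    return "others"


# Hypothetical contribution tables.
examples = {
    "matches others": list(range(21)),
    "always zero": [0] * 21,
    "triangle": [min(k, 20 - k) for k in range(21)],
    "flat five": [5] * 21,
}
for label, table in examples.items():
    print(label, "->", classify_contributor(table))
```

Adding the Spearman criterion would reclassify some noisy but clearly upward-sloping tables from "others" to "conditional cooperator", so this sketch tends to understate the share of conditional cooperators relative to the full rule.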
At the beginning of the experiment, subjects received written instructions for the experiment and the instructions were also read aloud. 24 Before the experiment began, various examples were given to facilitate understanding of the experiment and the subjects also completed some exercises. When the experiments were finished, subjects answered a short survey about basic socioeconomic information. Finally, subjects were called one at a time and paid in private. Subjects were recruited by teachers, and the participation rates of students are similar across schools: 70% in the northern site (140 out of 200) and 73% in the southern site (235 out of 320). In accordance with our expectation, around 97% of the subjects were born in the sampled districts, while the others were born in other districts in the sampled provinces. 25 Table D3 of Online Appendix D summarizes other socioeconomic characteristics (gender, household size, and a wealth index), which will be controlled for in the following regression analysis.

Table 7 shows that subjects from the northern site and southern site on average unconditionally contributed 7.50 tokens and 6.58 tokens, respectively, out of the endowment of 20. 26 Thus, subjects from the northern site on average contributed 0.92 tokens more than subjects from the southern site, and this difference is statistically significant (p value = 0.024, Mann-Whitney U test). Previous studies have indicated that a large fraction of subjects are conditional cooperators, i.e., their contributions are positively correlated with contribution levels by others. We also elicited guesses about the average contributions by the other two group members; subjects from the northern site and southern site on average guessed 8.25 tokens and 7.60 tokens, respectively (p value = 0.053, Mann-Whitney U test). At the aggregate level, the results indicate conditional cooperative behavior.
It is common that guesses about the average contribution by others are higher than own contribution levels because, on average, people are imperfect conditional cooperators, and there is also a fraction of free-riders.

Results
The innovative part of the design developed by Fischbacher et al. (2001) is that it allows us to classify subjects into different contributor types. The lower panel of Table 7 shows the distribution of types. By far, conditional cooperators are the most frequent type both in the northern site (52.17%) and in the southern site (54.04%), while the fractions of free riders are low (3.62% and 5.53% in the northern and the southern site, respectively). 27 Indeed, we cannot reject the null hypothesis that the compositions of contributor types in the northern site and the southern site are drawn from the same distribution (p value = 0.930, Pearson's χ² test), indicating that cooperative preferences are not significantly different between the two sites. Furthermore, Table 7 shows that, except for free riders, other types in the northern site on average have higher levels of unconditional contribution and guesses about the average contribution of other group members compared to their counterparts in the southern site. The largest contributor type is the conditional cooperators, and this type is considered the key group for contributions to public goods. Their average unconditional contributions are 7.29 and 6.39 in the northern and the southern site, respectively, with guesses of 8.19 and 7.88. These findings suggest that the north-south difference in contribution behaviors is driven by beliefs about the contributing behaviors of other people rather than cooperative preferences.

25 The following results are robust to the omission of subjects who were not born in the sampled district. Details are available upon request.
26 The contribution levels are similar to what has been found in the literature of public goods experiments (Zelmer, 2003; Chaudhuri, 2011).
27 This distribution of types is also similar to previous findings in the literature (Chaudhuri, 2011).
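The comparison of type distributions across the two sites rests on Pearson's chi-squared test of homogeneity. The sketch below computes only the test statistic and its degrees of freedom for a site-by-type table of counts; the counts shown are hypothetical, since the full breakdown behind Table 7 is not reproduced here, and obtaining a p value would additionally require the chi-squared distribution function from a statistics library.

```python
def chi_square_statistic(table):
    """Pearson's chi-squared statistic and degrees of freedom for a table of counts.

    table: list of rows (e.g., one row per site), each a list of counts
           (e.g., one count per contributor type).
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both rows share the same distribution over columns.
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts, NOT the paper's data.
# Columns: conditional cooperator, free rider, hump-shaped, others.
counts = [
    [70, 5, 15, 48],    # northern site
    [125, 13, 25, 72],  # southern site
]
statistic, dof = chi_square_statistic(counts)
print(f"chi-squared = {statistic:.3f} with {dof} degrees of freedom")
```

A small statistic relative to its degrees of freedom, as in a table whose rows are nearly proportional, corresponds to a large p value and hence to failing to reject equal type compositions.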
We use regression models to examine the unconditional contribution behaviors further, and the results are reported in Table 8. In all models, we include a dummy variable equal to one if a subject comes from the northern site. In line with the descriptive statistics, the estimated coefficient of the northern site dummy is positive and significant when it enters the regression alone (Column 1). In the next regression model, we add the belief about the average contribution of the other two group members and find that its estimated coefficient is positive and significant (Column 2). In the same regression, the estimated coefficient of the northern site dummy is reduced substantially in magnitude, indicating that higher levels of contribution in the northern site are mainly driven by higher beliefs about the average contribution of the other two group members. These estimated coefficients remain positive and significant when the socioeconomic characteristics (gender, household size, and a wealth index) are also added to the regression (Columns 3 to 6).
To summarize, the experimental findings corroborate the tendency found in survey data that districts annexed earlier to historical Vietnam currently have stronger norms for cooperation, and further suggest that cultural differences across Vietnamese regions are embodied in individual beliefs.

Conclusion
The individualism-collectivism dimension has been found to be a powerful predictor of economic and democratic development in a large sample of countries (Gorodnichenko & Roland, 2011, 2017). Thus, why some societies have become more collectivistic or individualistic than others is a crucial question in understanding long-run comparative development. In the present paper, we propose and investigate the selective migration hypothesis, stating that cultural differences along the individualism-collectivism dimension are driven by the out-migration of individualistic people from collectivist societies to settle down in frontier areas, and that such patterns of historical migration are reflected even in the current distribution of cultural norms. We use the territorial expansion of historical Vietnam from the eleventh to the eighteenth centuries as an ideal setting to empirically examine this hypothesis. During this period, historical Vietnam gradually expanded its territory southward along the coast from the Red River Delta to the Mekong River Delta through various waves of conquest and migration to form the country as it is today.
Our empirical analysis focuses on the ability to solve collective action problems, which is the main feature of collectivism in related economic models, by using data on voluntary contributions to public goods, which is the most typical collective action in daily life in Vietnam. Using a household survey, we find that areas annexed earlier to historical Vietnam currently have higher levels of voluntary labor contribution to public goods production. Using a population census, we also find that districts annexed earlier have a higher percentage of households with grandchildren living in them and a lower prevalence of divorced households, which are two other standard measures of individualism-collectivism traits. Conducting a public goods experiment with high school students, we find that subjects from areas annexed earlier to historical Vietnam contribute substantially more to the public good compared to subjects from areas annexed later, and that the result is mainly driven by the belief about the contributions of other subjects. Relying on various Vietnamese historical accounts, together with various robustness checks, we show that the southward out-migration of individualistic people during eight centuries of territorial expansion of historical Vietnam is an important driver behind these cultural differences.
We recognize, however, that despite our efforts it remains empirically challenging in the current study to completely isolate the effects of selective migration from a crowding-in of collectivist norms by a strong state or from the pre-existence of individualist norms in the periphery. We leave it to future research on other areas, with access to more detailed data on historical migration patterns and attitudes, to shed further light on the relative contributions of each of these interrelated mechanisms.
We believe that the present paper makes a contribution by offering an extended conceptual framework and an empirical strategy combining survey and experimental evidence for understanding long-run cultural divergence. First and foremost, the migration patterns in the distant past played a crucial role in explaining cultural differences across modern societies. As time goes on, similar processes may continue to enhance cultural differences across societies. These cultural differences may, in turn, have important implications for future levels of comparative development.