1 Introduction

Creativity is often highly concentrated in time and space, and across different domains. In the fifteenth century, Florence was home to an amazing number of groundbreaking innovators in literature, painting, sculpture and philosophy. At the turn of the nineteenth century, Vienna hosted pioneers in painting, medicine, biology, psychology, philosophy and music, who interacted with each other. Antwerp in the sixteenth century, London and Paris in the seventeenth and eighteenth centuries, and San Francisco and New York in the past few decades are other examples (Banks, 1997; Kandel, 2012).

What explains the formation and decay of such centers of innovation? A priori, one would expect creative clusters to emerge in societies that are: (i) more affluent—because economic prosperity fuels the demand for the services of creatives, and provides resources for training and education; and (ii) more free—because political freedoms create a more open and dynamic social environment. In this paper we show that the historical evidence from European cities strongly supports (ii), but not (i).

We analyze data on European creative elites born in the eleventh–nineteenth centuries. We exploit information on the date and place of birth and death of notable individuals in arts, humanities, science and business. Our main source is Freebase.com, a large database owned by Google and coded by Schich et al. (2014), which stores information from a variety of publicly editable sources, most notably Wikipedia. After integrating these individual data with additional information scraped from the internet, we match them with a historical data set on European cities and local institutions put together by Bairoch et al. (1988) and Bosker et al. (2013). Our unit of observation is thus a city in a particular century between the eleventh and nineteenth centuries.

We consider two main variables. First, the number of famous people born in a city (per 1000 inhabitants) during a century. Births of famous creatives are a measure of the local opportunities for radical innovation offered to young talent. As emphasized by Kubler (1962), successful creativity reflects a fortuitous match of individual predispositions with local opportunities for innovation.Footnote 1 Place of birth is more informative than place of death, because the external environment has a greater impact on younger individuals, through role models and opportunities for social learning and training—see also Bell et al. (2019).Footnote 2

Second, we consider the number of famous immigrants, defined as the number of deaths (per 1000 inhabitants) of famous creatives born elsewhere. This variable captures the attractiveness of a locality due to opportunities for professional enhancement or a market for one’s services. Given the breadth of our data in terms of time, geography and disciplines, we do not have information on where these notable individuals did their most important work. We doubt that this is an important omission, however, since any invention reflects ideas and experiences accumulated through a lifetime.Footnote 3

We find two main results. First, there is no evidence that changes in local economic conditions play an important role in the formation or decay of creative clusters, except in a few specific disciplines. In line with other historical studies—e.g. Bosker et al. (2013)—we use urban population as an index of local economic conditions. Neither current nor lagged changes in population are correlated with changes in famous births or immigrants, although there is some heterogeneity across disciplines: changes in births of famous non-performing artists and changes in famous immigrants in business are preceded by changes in population in the same direction. The same finding also holds in a gravity model, where changes in the bilateral flow of creative migrants across cities are not correlated with changes in the size of the origin or destination city. Population is an imperfect measure of local economic conditions. We thus repeat the analysis with historical data on real wages of skilled workers. Here the sample only includes about 30 major European cities, but time is measured in decades rather than centuries, and for most cities the period goes from 1400 to the mid nineteenth century. There is no evidence that wages started increasing before famous births or the arrival of famous immigrants. This is true for all disciplines. A possible concern with wage data is that they do not exhibit enough variation, but there is no correlation with creativity even if we restrict the sample to cities and decades where wage variation is more pronounced. On the other hand, our wage data do predict urban population, indicating that they vary sufficiently over time to have explanatory power in other contexts.

This finding may seem surprising to economists, and it could reflect measurement error in our indicators of economic conditions. But it is in line with historical anecdotal evidence. Although there are instances where good local economic conditions and artistic florescence went hand in hand, like the emergence of a market for Belgian and Dutch paintings in the fifteenth and sixteenth centuries (De Marchi and Van Miegroet, 2006), there are also prominent examples to the contrary. For instance, London under Queen Elizabeth, Florence during the Renaissance, and Spanish cities in the seventeenth century reached peak creativity during difficult economic times, while rich Genoa remained in artistic obscurity for several centuries.Footnote 4

Fig. 1

Transitions into commune and births of famous creatives. The figure plots point estimates for leading and lagging indicators for the change in commune status. We estimate (5) by OLS, including separate indicator variables for two centuries before the transition, the century of the transition, one century after, and for centuries 2 and forward. In other words, we constrain the effects of the transition to remain constant from century 2 onwards. We normalize \(\beta _{-1}\) to zero, so that all post-event coefficients can be interpreted as treatment effects. The dependent variable is \(Log (1+Births)\). The variable Births is equal to the number of famous creatives born in a city, per 1000 inhabitants. Vertical bars correspond to 95% confidence intervals with region-clustered standard errors. We include city FE and century FE. Results are very similar when adding the full set of controls, including the spatial lag
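The event-time indicators described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function name, the dummy labels, and the choice to bin all centuries two or more before the transition into a single indicator are assumptions made for illustration, and cities that never become a Commune are not handled.

```python
def event_time_indicators(century, transition_century):
    """Return event-time dummies for one city-century observation.

    Both arguments are century start years (e.g. 1300). The century just
    before the transition is the reference category (beta_{-1} = 0), so it
    gets no dummy of its own.
    """
    k = (century - transition_century) // 100  # centuries since the transition
    if k <= -2:
        active = "pre2"       # ASSUMPTION: all k <= -2 are binned together
    elif k == -1:
        active = None         # omitted reference period
    elif k == 0:
        active = "t0"         # century of the transition
    elif k == 1:
        active = "post1"
    else:
        active = "post2plus"  # effect held constant from century 2 onwards
    names = ["pre2", "t0", "post1", "post2plus"]
    return {name: int(name == active) for name in names}
```

For a city whose transition is dated 1300, the observation for 1200 has all four dummies equal to zero, so every estimated coefficient is read relative to that century.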

Fig. 2

Spatial distribution of births of famous creatives, fifteenth century. The darker the tone, the higher the number of famous creatives born in a city during the century, per 1000 inhabitants. The larger the circle, the larger the population of the city. The names in the map indicate the location of present-day cities, which may have been small or may have not existed in the fifteenth century. The map only displays those cities in our sample which are geographically more central

Our second main result is that, instead, the formation of creative clusters is strongly correlated with becoming a free city, as measured by the dummy variable “Commune” coded by Bosker et al. (2013). Communes protected basic economic freedoms, promoted trade, and guaranteed freedom of movement, freedom from censorship and other personal freedoms. This attracted religious and political exiles and created a dynamic social environment, in frequent contact with other trading centers and open to external ideas and innovations.Footnote 5 Communal participatory institutions also established an inclusive social order that reinforced civic capital and emphasized the importance of the common good over particularistic interests (Brucker, 2015). These cultural traits created a fertile ground for innovative activities that would benefit all, such as the pursuit of knowledge and artistic creations. Moreover, Communes often promoted works of art that could become symbols of the city and enhance its prestige; see, for instance, Paoletti and Radke (2005) and Connell (2002) on Florence, and Norman (1999) on Siena. These institutional features that made Communes hubs of innovation are reflected in our findings. Becoming a Commune is followed by a rise in births of famous creatives, as illustrated in Fig. 1. The treatment variable is Commune, and it is measured at about the beginning of each century. Becoming a Commune (date 0 in Fig. 1) is associated with a 5 percentage point increase in the births of creative individuals (per 1000 inhabitants) during the same century (an increase of about 25% relative to the average number of famous births), with an additional increase in the subsequent century. We obtain similar results in a gravity model where we study the bilateral flow of creative immigrants across European cities, using a diff-in-diff methodology: becoming a Commune is associated with an inflow of notable individuals that almost doubles in size.

As described by Pirenne (2014) and Parker (2004), communal institutions often evolved from within the city, and were guided by the aspiration of the urban middle classes to gain freedom and independence from external influence (primarily in opposition to the Church or an external Lord). In some cases, autonomy and freedom were granted in order to encourage new settlements during periods of intense migration (Bartlett, 1994). An obvious concern, therefore, is that the economic and social changes that led to the emergence of communal institutions also had a direct effect on creative endeavors, for instance by creating a demand for works of art or for education. Our first finding, that economic conditions are not correlated with current or subsequent creativity, reduces this concern somewhat. Nevertheless, to limit the scope of omitted variables, we construct two instrumental variables. Our main instrument exploits the idea that political transitions were facilitated by external forces, such as a vacuum of regional powers or contagion effects in the aspirations of cities. Adapting a strategy introduced by Persson and Tabellini (2009) and Acemoglu et al. (2019) to study democratic transitions, we instrument Commune with the incidence of Commune in the remainder of the region (defined by current NUTS 1 administrative borders). The identifying assumption is that, conditional on time and city fixed effects and other covariates, regional waves of institutional transitions influence city-level creativity only through the city's political institutions. To make this assumption less restrictive, we also control for regional waves of creativity (measured by the spatial lag of births of notable individuals in the region). These IV estimates confirm the results of the event study and are very robust. On average, births of creative people increase by 10 percentage points or more during the century (almost a 50% increase relative to the average), upon a transition into Commune.
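The main instrument, the incidence of Commune in the remainder of the region, is a leave-one-out regional share. A minimal sketch follows, assuming an unweighted share measured in the same century; the text does not specify weighting or timing, and the function and field names are hypothetical.

```python
from collections import defaultdict


def leave_one_out_incidence(rows):
    """Share of OTHER cities in the same region and century that are Communes.

    rows: list of dicts with keys "city", "region", "century", "commune" (0/1).
    Returns {(city, century): share}, with None when the city is alone in
    its region in that century (ASSUMPTION: such cases are left undefined).
    """
    totals = defaultdict(lambda: [0, 0])  # (region, century) -> [communes, cities]
    for r in rows:
        key = (r["region"], r["century"])
        totals[key][0] += r["commune"]
        totals[key][1] += 1

    out = {}
    for r in rows:
        communes, cities = totals[(r["region"], r["century"])]
        if cities > 1:
            # Remove the city's own status from numerator and denominator.
            out[(r["city"], r["century"])] = (communes - r["commune"]) / (cities - 1)
        else:
            out[(r["city"], r["century"])] = None
    return out
```

Excluding the city's own status is what keeps the instrument from mechanically containing the endogenous variable.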

As a robustness check, we also rely on a second instrument inspired by Schulz et al. (2019): whether a city was exposed to the medieval Church policy of banning cousin-marriage. As argued by Goody and Goody (1983), this religious innovation led to the dissolution of kin networks in early medieval Europe. This resulted in profound cultural transformations, also documented in Schulz et al. (2019), which in turn facilitated the emergence of participatory political institutions. This second instrument is not strongly correlated with the first one, and it confirms the robustness of our findings.

Finally, the paper describes a number of stylized facts on the temporal and spatial patterns of creative clusters. First, births of creative people and famous immigrants are more spatially concentrated than population, and they are clustered across disciplines. Hence spillover effects associated with local proximity and/or local factors are important for creative activities, and operate across disciplines and not just within each field. This finding is consistent with the discussion and evidence in Jacobs (1969) and Glaeser et al. (1992).Footnote 6 Second, births and immigration are persistent, but less than population. Cities that are at the frontier of creativity in one period retain an advantage that persists for a while but not indefinitely. This too echoes similar results on clusters of innovation (Saxenian, 1994; Duranton, 2007; Kerr, 2010). Estimating a transition matrix, we also find that persistence of creativity is higher at the bottom of the distribution than at the top. Most small and uncreative cities remain in that condition. But at the top of the distribution there is more reshuffling in creative clusters than for population: while most large cities keep growing and remain large, creative clusters exhibit more change over the centuries. Third, the overall spatial proximity of births and the distribution of birth-to-death distances did not change much over the centuries. This stability is somewhat surprising, in light of the consolidation of states and the improvements in the means of transportation throughout this period. It suggests that the agglomeration of creative activities is not very sensitive to the cost of transportation and communication, but reflects historically stable forces.Footnote 7
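The transition-matrix exercise mentioned above can be sketched as follows. The sketch assumes that each city has already been assigned to a discrete state in every century (for instance, a quantile bin of births per 1000 inhabitants); the binning rule and the function name are illustrative assumptions, not the paper's specification.

```python
def transition_matrix(states_by_city, n_states):
    """Row-normalized frequencies of moves between states in consecutive centuries.

    states_by_city: {city: [state in century 1, state in century 2, ...]},
    with states coded as integers 0..n_states-1.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for sequence in states_by_city.values():
        # Pair each century's state with the next century's state.
        for origin, destination in zip(sequence, sequence[1:]):
            counts[origin][destination] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions are left as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Persistence in a given state is read off the diagonal: a larger entry in the bottom-state row than in the top-state row corresponds to the pattern described in the text.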

Our paper is related to a large literature. A strong link is with the important work of Mokyr (1990) on the history of technology, and Mokyr et al. (2002) and Mokyr (2016) on the flow of ideas across Europe, and their role in the industrial revolution. Mokyr et al. (2002) and Mokyr (2016) mostly focus on the second half of our sample period, and emphasize the importance of interactions within a European community of intellectuals. Our results suggest that self-governing cities were an important component of the relatively free environment in which these exchanges thrived. Our paper is also related to Cox (2017), who argues that local autonomy and economic freedoms were key to European growth in the period leading to the industrial revolution. Finally, our study is also motivated by the literature on upper tail human capital and the industrial revolution (Mokyr, 2009; Meisenzahl and Mokyr, 2011; Squicciarini and Voigtländer, 2014).

The link between democratic institutions and innovation has been studied in the context of economic growth by Acemoglu and Robinson (2012) and Acemoglu (2008), while De Long and Shleifer (1993), Bosker et al. (2013) and Guiso et al. (2016) have shown the positive relationship between political institutions and urban development. Others have investigated how local cultural traits affect innovation. In particular, Bénabou et al. (2015a) and Bénabou et al. (2015b) show that religiosity is negatively correlated with indicators of innovation. See also Saxenian (1994), Florida (2005), Falck et al. (2011), Acemoglu et al. (2014), Akcigit et al. (2017a) and Akcigit et al. (2017b).

Our paper is also related to a growing literature on innovation. One line of research, surveyed in Carlino and Kerr (2015), analyzes the connections between agglomeration and innovation. Agglomeration advantages are also reviewed by Combes and Gobillon (2015). Much of this research focuses on recent periods and exploits patent data.

A similar historical perspective to ours is taken by a set of studies using microdata on upper tail human capital, such as Schich et al. (2014), De la Croix and Licandro (2015), Gergaud et al. (2016) and Laouenan et al. (2021).Footnote 8 In particular Gergaud et al. (2016) analyze a database of more than one million famous individuals and more than seven million places associated with them throughout human history (3000BCE−2015AD). They document several interesting facts regarding notable people, including a positive correlation between the number of entrepreneurs and artists and subsequent urban growth, which is consistent with our evidence, and a zero or negative correlation between the share of “militaries, politicians and religious people” and urban growth. Relative to their paper, we focus on the effects of local self-government institutions on the formation of creative clusters. More recently, De La Croix et al. (2020) have studied the mobility decisions of scholars across European universities; relative to this paper, our sample also includes creatives who were not affiliated with any university, such as artists and members of scientific academies, and it probably represents the upper tail of the distribution of scholars. Finally, two related papers study the effects of local institutions on innovation, using historical data on Germany. Donges et al. (2016) show that counties whose institutions are more inclusive as a consequence of the French occupation after 1789 turn out to be more innovative (in terms of patents per capita). Dittmar and Meisenzahl (2020) show that sixteenth century legal reforms establishing mass public education and increasing state capacity had a positive effect on upper tail human capital.

The outline of the paper is as follows. The next section defines the data and their sources. Section 3 describes a number of stylized facts about the spatial and temporal distribution of creativity. Section 4 studies the correlation between indicators of local economic conditions and creativity, while Sect. 5 focuses on the relationship between local institutions and the births of famous creatives. In Sect. 6 we study the migration of famous people between European cities. Section 7 discusses future directions and concludes.

2 Data and variable construction

The data used in this paper cover Europe between the eleventh and the nineteenth centuries, and match information on notable individuals with population and institutional variables at the city level.

Table 1 Freebase professional categories
Table 2 Count of famous creatives and population

Notable individuals The records on notable individuals come from Freebase.com, as coded by Schich et al. (2014). Freebase is a “large Google-owned knowledge base that is publicly editable and available under a Creative Commons Attribution (CC-BY) license, which allows for both sharing and remixing of the data” (Schich et al. Supplementary Material, p. 2). It stores information from a variety of sources, most notably Wikipedia, and contains dates and locations of birth and death, as well as occupations, of notable people. Location information in the records is geocoded, making good-quality latitude and longitude data available. Notability of people is “simply defined as the curatorial decision of inclusion” in the (partly crowd-sourced) Freebase (Schich et al., 2014, p. 558). Using these records, we identify 40,980 notable individuals who can be matched by city of birth and/or city of death to the Bairoch et al. (1988) sample (described below). If an individual is born or dies in a small city not included in the Bairoch et al. (1988) sample, we assign that individual to the closest city in the sample.Footnote 9 Of these individuals, we retain 21,906 who became famous thanks to their creative endeavours in the following domains: arts (performing and non-performing), humanities and science, and business. Table 1 provides a count of the individuals active in each domain. The last row reports the total number of creatives.Footnote 10 Famous creatives in performing arts include: actors, singers, musicians, playwrights; in non-performing arts: writers, novelists, journalists, composers, authors, architects; in humanities and science: mathematicians, physicians, philosophers, scientists, physicists, chemists, historians; in business: entrepreneurs, engineers, business-persons, sailors, managers. A further breakdown of polymaths is available in Table A.1 in the Appendix.Footnote 11
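The assignment of a birthplace or place of death to the closest sample city can be illustrated with a short sketch. It assumes that "closest" means great-circle distance between geocoded coordinates, which the text does not state; the function names are hypothetical, and the coordinates in the usage note are approximate and purely illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest_city(lat, lon, sample_cities):
    """Name of the sample city nearest to (lat, lon).

    sample_cities: {name: (latitude, longitude)} for the cities in the sample.
    """
    return min(sample_cities, key=lambda name: haversine_km(lat, lon, *sample_cities[name]))
```

For example, with `sample_cities = {"Florence": (43.77, 11.25), "Siena": (43.32, 11.33)}`, a birthplace near Vinci at roughly (43.78, 10.92) would be assigned to Florence.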

Using this information, we define the variable Births\(_{ct}\) as the number of famous creatives born in city c during century t, per 1000 inhabitants at the beginning of each century. This variable measures the city production of upper-tail creative human capital. As a measure of attraction of upper-tail human capital, we define the variable Immigrants\(_{ct}\) as the number of deaths in city c of famous creatives born elsewhere during century t, also per 1000 inhabitants. For individuals who die in a century different from that of birth, we face a problem. Ideally, we would like to attribute these famous people to the century in which their migration decision was taken. Using the century of death risks dating their migration decision too late, while using the century of birth risks dating it too early. One of our goals is to estimate the effect of a change in city institutions (or other observables) on migrations. Using the century of birth (irrespective of the century of death) minimizes the risk of erroneously attributing to institutional changes outcomes that actually took place earlier.
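The two definitions above can be made concrete with a minimal sketch. It assumes that a century is labelled by its start year (so a birth in 1452 falls in the century labelled 1400) and that city-centuries with missing population are dropped; both are assumptions for illustration, as are the function and field names. Immigrants are dated by century of birth, as the paragraph explains.

```python
def century_of(year):
    """Label a year by the start of its century, e.g. 1452 -> 1400 (ASSUMPTION)."""
    return (year // 100) * 100


def births_and_immigrants(people, population):
    """Compute Births_ct and Immigrants_ct per 1000 inhabitants.

    people: list of dicts with "birth_city", "death_city" (or None), "birth_year".
    population: {(city, century): inhabitants at the beginning of the century}.
    Returns two dicts keyed by (city, century).
    """
    births, immigrants = {}, {}
    for p in people:
        t = century_of(p["birth_year"])  # both variables use the century of birth
        key_b = (p["birth_city"], t)
        births[key_b] = births.get(key_b, 0) + 1
        # An immigrant is a famous creative who dies in a city other than the birthplace.
        if p["death_city"] is not None and p["death_city"] != p["birth_city"]:
            key_i = (p["death_city"], t)
            immigrants[key_i] = immigrants.get(key_i, 0) + 1

    def per_1000(counts):
        # Drop city-centuries with missing or zero population.
        return {k: 1000 * v / population[k] for k, v in counts.items() if population.get(k)}

    return per_1000(births), per_1000(immigrants)
```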

Table 2 reports descriptive information on the number of famous creatives (unscaled by city population) born in, and immigrating to, all the cities in our sample, together with total city population (the city sample is that of Bairoch et al. (1988), described below). The Table shows that there is substantial mobility of famous creatives in each century: the number of immigrants is a large fraction of the number of births, even in earlier centuries. Note that there are far fewer famous creatives in earlier centuries.

Our dataset is very broad in terms of time, geography and discipline. Since the universe from which famous creatives are drawn is open-ended, our sample includes individuals who became famous for their achievements, and it is not restricted to members of specific professions or institutions. In particular, compared to De La Croix et al. (2020), our data also include artists and members of scientific academies who were not affiliated with any European university. This breadth comes at the cost of some limitations, however. As notability is not based on uniform criteria, selection into the sample reflects the idiosyncrasies of crowd-sourcing. First, as illustrated in Table 2, we have better records of more recent individuals. Second, Freebase editors may have an English bias and a Western bias, as well as a gender bias towards males (Yu et al., 2016; Laouenan et al., 2021); furthermore, the database may be unsuccessful in recording information on works where the participation of creative groups (e.g. orchestras or research teams in firms) outweighs that of creative individuals. In our regression analysis we always include century fixed effects and city fixed effects, which address these concerns.Footnote 12 Third, it is possible that information is more readily available for individuals who were born in (or migrated to) cities that at the time were renowned centers of excellence in their discipline. If so, our data would overweight creative clusters and discount cities that only gave birth to (or attracted) a few famous creatives. This kind of non-linearity would not be a problem, however, since our goal is to describe and explain patterns in clusters of creativity, more than explaining the location of a few isolated innovators.
The opposite mismeasurement is also possible: young individuals born in the vicinity of an existing creative cluster are likely to move there in their formative years, and yet we may classify them as not born in the creative cluster because their birthplace is a city nearby; see the brief narrative on Florence in the early Renaissance (Sect. A.I of the Appendix) for some prominent examples. A more serious concern would be over-recording of famous creatives who were born or died in important political and economic centers, because this would create spurious positive correlation with some of our variables of interest. This may be an issue for state capitals, but it is less likely to be a problem for the cities that became Communes, given that in our analysis we control for city population.

A final limitation is that these data treat all notable individuals equally, without weighting them by their achievements and visibility. Below we also discuss the robustness of our results to a similar data set that weights individuals by citations. Specifically, Yu et al. (2016) collect records of individuals present in more than 25 languages in Wikipedia, which contain dates and locations of birth, as well as occupations. Compared to the dataset constructed by Schich et al. (2014), this dataset is significantly smaller and does not include information on the place of death. However, it is manually verified, and it is enriched with the Historical Popularity Index (HPI), a measure that integrates information on the number of languages in which a biography is present in Wikipedia, the time since birth, and the number of page-views between 2008 and 2013. Moreover, the creatives in Yu et al. (2016) belong to the upper tail of the fame distribution in Schich et al. (2014), which also addresses concerns related to city-based determinants of visibility for lesser known individuals. Using these records, we identify 1583 notable individuals who (a) can be matched by city of birth to the Bairoch et al. (1988) sample and (b) became famous thanks to their creative endeavours in the arts (performing and non-performing), humanities and science, and business. We define the variable Births, Yu et al.\(_{ct}\) as the HPI-weighted number of famous creatives born in city c during century t, per 1000 inhabitants at the beginning of each century. The correlation between Births and Births, Yu et al. is 0.48.

Finally, a word on the timing of construction of the records of notable individuals in our sample. Freebase editors are our contemporaries, and this has the advantage of generating some distance between the date of the innovation and the construction of their record, or of the weights used by Yu et al. (2016). This distance arguably allows for a better assessment of breakthrough ideas that may have been too radical (and therefore not accepted) at the time of conception.Footnote 13 The creators of this type of innovation would be more likely to be recorded in later editions than by their contemporaries. And conversely, individuals generating fashionable ideas that did not stand the test of time would be less likely to be recorded in later editions. All that said, we have compared Births with records from the Index Bio-bibliographicus Notorum Hominum (IBN), which was compiled from around 3000 biographical sources (mainly dictionaries and encyclopedias) with year of publication between 1600 and 1980; see De la Croix and Licandro (2015). We define the variable Births, IBN\(_{ct}\) as the number of famous creatives in IBN born in city c during century t, per 1000 inhabitants. Unfortunately the geocoding information available to us covers only 59 cities in our estimation sample. We therefore cannot perform our main estimation analysis with these data. However, the correlation between our variable Births and the variable Births, IBN is 0.57.

Correlation with measures of technological innovation The famous creatives included in our sample are relatively few. Hence, we are really measuring the upper tail of creativity and innovation, particularly in the early part of the sample where the numbers are smaller. This is not necessarily a drawback. Since the returns to innovation are often highly non-linear, a focus on exceptional clusters of creativity is particularly appropriate. Nevertheless, there is evidence that these exceptional clusters were also a locus of less radical innovation. Meisenzahl and Mokyr (2011) collected data on mechanics and engineers born in the UK between 1660 and 1830, and on the patents that they created.Footnote 14 Many of these individuals were not great inventors, but rather highly skilled and able craftsmen, who adapted new technologies and provided micro innovations. Almost one third of the 747 innovators in Meisenzahl and Mokyr (2011) are also included in our sample of famous creatives, probably the individuals with greater accomplishments. But interestingly, our variable Births is also correlated with the birthplace of the remaining mechanics and engineers, included in Meisenzahl and Mokyr (2011) but not classified as notable individuals in our sample.

Specifically, from the Oxford Dictionary of National Biography we obtained the place of birth of the innovators in Meisenzahl and Mokyr (2011) not in our sample, and matched it with the cities in our data set. Let Inventors be the number of mechanics and engineers born in a city during a century (per 1000 inhabitants), and Patents be their number of patents (also per 1000 inhabitants), both variables restricted to the individuals in Meisenzahl and Mokyr (2011) that are not in our data set. We then regress Log(1 + Inventors) and Log(1 + Patents) on Log(1 + Births) plus the other city covariates described below (namely, the variables Large state, Bishop, Archbishop, Capital, Plundered, Commune, Population, University). As shown in Figs. A.1 and A.2 in the Appendix, which depict the added-variable plots, our variable Births is positively and significantly correlated with both dependent variables. Specifically, a 10 p.p. change in Births is associated with a 1.8 p.p. change in Inventors and a 1.6 p.p. change in Patents.Footnote 15 Thus, for the UK between the eleventh and nineteenth centuries, our proxy for production of creative talent predicts indicators of local innovation, and in particular of the technological micro-innovations that contributed to the industrial revolution.
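As a reading aid, the regression just described can be written compactly. The symbols \(\alpha\), \(\beta\), \(\gamma\) and \(\varepsilon_{ct}\) are illustrative notation that does not appear in the text, and since the passage does not say whether fixed effects are included, none are shown:

\[
\log (1 + Inventors_{ct}) = \alpha + \beta \, \log (1 + Births_{ct}) + X_{ct}'\gamma + \varepsilon_{ct}
\]

Here \(X_{ct}\) collects the listed city covariates (Large state, Bishop, Archbishop, Capital, Plundered, Commune, Population, University) for city c in century t, and the coefficient of interest is \(\beta\). The second regression replaces the dependent variable with \(\log (1 + Patents_{ct})\).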

European cities City population is measured at about the beginning of each century. The source is Bairoch et al. (1988). This is a wide-ranging database with information on 2200 European cities that reached 5000 residents between 800 and 1800. Given the scarcity of data on notable individuals in the very early part of the sample, we restrict the analysis to the period between the eleventh and the nineteenth centuries, interpolating population for the missing century 1100.Footnote 16 In some sensitivity analysis, for Italian cities we also use the population data of Malanima (1998).Footnote 17 Information on socio-economic and institutional variables comes from Bosker et al. (2013), who, for a subset of the cities in Bairoch et al. (1988), assembled a large database covering an extensive array of institutional characteristics of European cities between the ninth and the nineteenth centuries. The sample covered by Bosker et al. (2013) includes all cities in Bairoch et al. (1988) that reached 10,000 inhabitants between 800 and 1800. In our analysis on the effect of institutions, our sample is always that of Bosker et al. (2013), except in Sect. 3 where we describe the main features of the data and we rely on the larger sample of cities by Bairoch et al. (1988). Both panels of cities are unbalanced: because of the gradually increasing (urban) population, the number of cities increased during our sample period (Bosker et al., 2013, p. 1421). Note that the data coded by Bosker et al. (2013) seek to capture the institutions that were in place at the beginning of each century. Thus, institutional changes that took place during century t would generally show up in century \(t+1\).
In other words, although undoubtedly measured with much timing error, our data are more likely to erroneously postpone institutional changes than to anticipate them, compared to our outcomes of interest (births and immigrations of famous people). This is a conservative feature of the data definition, given our goal of estimating the effects of institutional changes.
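The interpolation of population for the missing century 1100 is not spelled out in this passage. As a hedged illustration only, two common choices for filling a midpoint between the observations for 1000 and 1200 are a linear average and a geometric average (constant growth rate); the function name and both rules are assumptions, not the authors' documented procedure.

```python
import math


def interpolate_midpoint(pop_before, pop_after, geometric=True):
    """Fill a missing mid-century value between two observed populations.

    geometric=True assumes a constant growth rate between the two dates;
    geometric=False takes the simple arithmetic average.
    """
    if geometric:
        return math.sqrt(pop_before * pop_after)
    return (pop_before + pop_after) / 2
```

With populations of 10 and 40 (thousand) in 1000 and 1200, the geometric rule gives 20 and the linear rule gives 25, so the choice matters most for fast-growing cities.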

Our main institutional variable of interest is whether a city had a form of self-governance that gave it some autonomy and could constrain the dominant role of the church, state or feudal lords. This is captured by the dummy variable \(Commune_{ct}\) coded by Bosker et al. (2013). This variable measures the extent of local participatory government at the beginning of each century. Typically Communes had autonomy in the regulation of commerce, taxation and other administrative activities. Communal institutions also guaranteed economic and personal freedoms and enforced the rule of law, also through the evolution of civil and penal codes. Besides checks and balances on executive authority, Communal political institutions also had forms of limited representative democracy (Pirenne, 1925).

Forms of local participative government began to develop in the eleventh and twelfth centuries when Europe was politically fragmented, after the fall of the Carolingian Empire. In the power vacuum that ensued, cities could claim a kind of self-rule that was frequently recognized by the sovereign in return for taxes or loyalty (Jones, 2003). This form of self-government emerged in Northern Italy between the eleventh and twelfth centuries, then spread to Southern France; at nearly the same time as in Italy, Communes also began to appear in Flanders and in Northern France (Pirenne, 1925). Independent cities emerged in Germany in the thirteenth century, also in association with migration to the East where imperial control was weaker (Parker, 2000 and Fig. A.8 in the Appendix).

As emphasized by Clarke (1926), communes were constitutional oligarchies that represented the interests of merchants, bankers and landowners. Representatives of a ruling class initially acted as the link between the town and its overlord, and gradually gained autonomy from external influence and became accountable to the city bourgeoisie. The degree of emancipation of cities varied across Europe and over the centuries, depending on the strength of the prince's control over his territory. In the kingdom of Naples and Sicily, control was strong enough that communes were rare or non-existent. By contrast, the cities of Northern and Central Italy took advantage of the conflict between the Empire and the Pope to gain autonomy from both sources of power. Similarly, in Germany the landed aristocracy was fully occupied “in resisting or supporting the Emperor, extending boundaries and colonising new lands in the north and east”, and this gave German towns an opportunity to grab autonomy and develop their own city institutions (Clarke, 1926). England is somewhere in between: the territory administered by the King was large enough that he had to delegate administrative tasks and tax collection to the main towns in his territory. Yet, unlike in Italy and in the Low Countries, the King retained sufficient military capacity to prevent self-administered cities from gaining full independence (Angelucci et al., 2017).

The status of Commune is not irreversible, and we observe transitions in both directions, though transitions into the status of Commune (Fig. A.9 in the Appendix) are much more frequent than transitions out of it (Fig. A.10 in the Appendix). The twelfth century is the period with the highest number of transitions into Commune, and the highest incidence of Commune in our sample is observed at the beginning of the sixteenth century. During the fifteenth century, Communes in Italy started to grant long-term authority to a strongman who then acquired absolutist powers over the city and its territory. Here too, external circumstances played an important role, through emulation or because specific external threats convinced towns to grant extraordinary powers to a single individual. As noted by Guiso et al. (2016): “In several cases the Signoria retained the fundamental institutions of the commune, including the principle that power originated from the people and was to be exercised in the people’s name. In cities such as Florence and Genoa, the Signoria also preserved the political institutions and the personal liberties that had characterized the commune period.” In this regard the Signoria was an evolution of the Commune (Prezzolini, 1948; Chittolini, 1999). Nevertheless, Bosker et al. (2013) (and hence our data) code the transitions into Signoria as a loss of status of Commune. In other parts of Europe, such as the Netherlands, local lords instead favoured towns, “granted rights of jurisdiction and administration freely and protected commerce from troublesome neighbours” (Clarke, 1926). In our estimation strategy we exploit this geographic variation in the emergence and stability of Communal institutions to build an instrumental variable for being a Commune. A second instrument for Commune, constructed from data by Schulz et al. (2019), is described in context.

Unavoidably, a dummy variable can only imperfectly capture the complexities of historically distant political institutions. Bosker et al. (2013) rely on several criteria for their classification. First, they check whether historical sources (national sources, the Lexikon des Mittelalters and historical encyclopedias) mention the presence of communal institutions such as consuls and town councils. The date is then attributed to the whole century subsequent to the first evidence of such institutions. As a fallback option, they use the building date of a town hall, and if this information is also missing, they use information on the first time city rights were granted (as mentioned in historical encyclopedias), dating the commune from the first century after such rights were first granted. The criterion for exit from Commune status is symmetrical: they code an exit if local participatory institutions stopped functioning, either because the town council was taken over by a powerful local family (as in hereditary Signorie) or because it was dissolved by a central authority or external power (again dating it from the first whole century after the occurrence). In the absence of specific evidence that the town council stopped functioning or that the city was included in a hereditary Signoria, it is assumed that local participatory institutions kept functioning until 1800, in line with local historical sources. Since preservation of Commune status is the default assumption, it is possible that some exits from Commune are under-reported. In the sensitivity analysis we assess the robustness of our estimates to this kind of measurement error by only considering the effect of transitions into Commune up to the first century after the transition, neglecting subsequent centuries and transitions out of Commune.Footnote 18

Notable people could also be attracted by (or be born in) cities that had universities or that were the location of political or religious power, although religion also led to persecutions and hence could expel rather than attract innovative individuals. To capture these features, we rely on the following variables, also in the data set by Bosker et al. (2013): whether a city has a university (University), and three dummy variables indicating a city’s status in the political and ecclesiastical hierarchy, namely whether a city is the seat of a bishop (Bishop), the seat of an archbishop (Archbishop), or a state capital (Capital). We also make use of variables, constructed by Bosker et al. (2013), indicating the number of times a city was plundered in the previous century (Plundered), and whether it is ruled by a large state (Large state).Footnote 19

To study the time series properties of famous creatives and their correlation with local economic conditions, we also collected data on the average nominal wage of skilled workers, expressed in grams of silver per day, from a variety of sources, as well as data on real wages (i.e. adjusted for purchasing power).Footnote 20 For details on the data sources, see Sect. A.II. These data are only available for 28 major European cities for a long enough period (18 for real wages), but they are yearly and they cover several centuries. Table A.2 in the appendix lists the cities and years included in our sample. To reduce measurement error and minimize missing observations, we express the wage as a 10-year average (called Wage), and obtain an unbalanced panel by decades that covers the period 1260–1890. Because real wages display more time variation, we only report results using this variable, although results are similar with nominal wages. Figure A.3 shows the time variation in real wages in five prominent cities that we further discuss below.
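The 10-year averaging described above can be illustrated with a minimal sketch. The function name `decade_average` and the toy wage figures are hypothetical, and the rule that a decade is retained whenever at least one yearly observation exists is an assumption of this sketch rather than a documented feature of the data construction.

```python
from collections import defaultdict

def decade_average(yearly_wages):
    """Collapse {year: wage} into {decade_start: mean wage over available years}.

    Years whose wage is None are treated as missing and skipped, so a decade
    is kept as soon as one yearly observation exists (assumption of this sketch).
    """
    buckets = defaultdict(list)
    for year, wage in yearly_wages.items():
        if wage is not None:
            buckets[(year // 10) * 10].append(wage)
    return {decade: sum(v) / len(v) for decade, v in sorted(buckets.items())}

# Hypothetical yearly wages (grams of silver per day) for a single city.
toy_wages = {1260: 4.0, 1261: 5.0, 1263: None, 1270: 6.0, 1275: 8.0}
wage_by_decade = decade_average(toy_wages)
```

Averaging within decades trades time resolution for fewer gaps, which is the motivation stated in the text.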

Finally, in some specifications we define region-specific or nation-specific variables. Unless noted otherwise, they all refer to current administrative borders, as defined by Eurostat. NUTS 1 refers to macro-regions, NUTS 2 to regions. Other geographic variables are defined in context below, when we introduce them.

Table 3 Summary statistics
Table 4 Spearman’s rank correlation coefficient for key variables over time

Table 3 reports summary statistics for all these variables including births and immigrants for the 2137 cities in our sample.Footnote 21 To give a better sense of the data, Appendix Tables A.3–A.7 list the famous creatives in our sample that were born during periods of peak creativity of five prominent cities, namely Florence in the early Renaissance, Antwerp and Amsterdam between 1200 and 1599, Paris and Vienna in the eighteenth and nineteenth centuries. Section A.I of the Appendix also provides a brief narrative of these amazing periods.

3 Stylized facts

In this section we document several stylized facts. Our goal here is not to test specific hypotheses or establish causality, but to describe the spatial and temporal patterns of the data.

Fig. 3 Spatial distribution of births of famous creatives, nineteenth century. The darker the tone, the higher the number of famous creatives born in a city during the century, per 1000 inhabitants. The larger the circle, the larger the population of the city. See Fig. 2 for further notes

Fig. 4 Coefficient of variation of births, immigrants, population, fourteenth–nineteenth centuries

Fig. 5 Distribution of distances between place of birth of any two famous creatives

Fig. 6 Distribution of birth-to-death distances over time

Spatial agglomeration How concentrated are famous creatives in space? Figure 2 displays the spatial distribution of Births in the fifteenth century, the middle of our sample. Darker tones indicate a larger number of Births, while population size is captured by the circle diameter. Famous Births are concentrated in a subset of the cities, not always those with larger populations. Amongst the large cities, Florence, Nuremberg and Siena have the most births of creatives, per 1000 inhabitants. These cities are recognized as the centers of the Renaissance in Italy and in Northern Europe respectively. Figure 3 displays the spatial distribution of Births in the nineteenth century, the end of our sample period. Now many more cities are included in the sample, and the darker tones have shifted to Northern Europe and the UK. The spatial distribution of Immigrants displays similar patterns, with Florence and Rome having the largest number of famous immigrants per capita in the fifteenth century (cf. Appendix Figs. A.6 and A.7). Births and Immigrants are positively correlated: their correlation coefficient is 0.56 (see Appendix Table A.8). Thus, cities that give birth to creative individuals also tend to attract famous immigrants; in this sense, one can speak of clusters of creativity. Finally, note that in the overall sample around half of city-century observations have zero Births and around 70% have zero Immigrants, although the fraction of cities with zero Births and Immigrants declines over time.

How did spatial concentration evolve over time? Figure 4 plots the coefficient of variation of Births, Immigrants and population between cities, in each century between 1300 and 1800. A higher coefficient of variation indicates more geographic concentration (a plot of Gini coefficients is very similar). Recall that Births and Immigrants are expressed per capita. The following facts stand out. First, Immigrants are always more spatially concentrated than both population and Births, presumably due to sorting. Second, until 1600 Births were also more spatially concentrated than population. These features suggest that local factors or spillovers associated with spatial proximity were particularly important for creative activities. Third, while early on the tendency for famous creatives was towards less concentration (convergence) over time, the spatial concentration and the spatial patterns of famous creatives did not change much between 1500 and 1800, despite the consolidation of states and the improvements in the means of transportation, suggesting that the forces behind agglomeration of creative activities are historically stable. Nevertheless, Immigrants were more concentrated in 1700, a century of significant innovations in several domains.
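For readers who want to see the concentration measure in concrete terms, the following is a minimal sketch of a cross-city coefficient of variation for one century. The helper name `coefficient_of_variation` and the toy figures are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not specify which is used.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, computed across cities.

    Uses the population standard deviation (assumption of this sketch).
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m

# Hypothetical cross-section: births per 1000 inhabitants in four cities.
concentrated = [0.0, 0.0, 0.0, 4.0]   # all births in a single city
even = [1.0, 1.0, 1.0, 1.0]           # births spread evenly

cv_concentrated = coefficient_of_variation(concentrated)
cv_even = coefficient_of_variation(even)
```

Both toy vectors have the same mean, so the difference in the statistic reflects only how unevenly births are spread across cities.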

Persistence Next, we analyze the persistence of clusters of famous creatives. A comparison of Figs. 2 and 3 suggests that there is some spatial movement of clusters over time. To further explore the temporal patterns of the data, Table 4 displays Spearman’s rank correlation coefficients for Births, Immigrants and Population over each consecutive century (each row reports Spearman’s \(\rho\) between the same variable measured in t and in \(t+1\)). The Spearman’s \(\rho\) for Births and Immigrants increases over time but it remains below 0.5 until the last century. Births are generally less correlated over time than Immigrants, and both are much less auto-correlated than population, though this may also reflect larger measurement error in famous creatives than in population. Table 4 suggests that cities at the frontier of creativity have an advantage, but generally not one strong enough to guarantee dominance in creativity in the next century.Footnote 22
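The rank correlation between a variable in century t and in century \(t+1\) can be sketched as follows. This is a standard-library illustration with hypothetical function names and toy data; ties, which are frequent here because many cities have zero Births, are given average ranks, which is the usual convention but is an assumption about how the reported coefficients treat them.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical Births per 1000 inhabitants for the same four cities.
births_t = [0.0, 0.0, 3.0, 1.0]
births_t1 = [0.0, 1.0, 2.0, 4.0]
rho = spearman_rho(births_t, births_t1)
```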

Table 5 Markov transition matrices for key variables across cities
Table 6 Coagglomeration

To complement this analysis, we estimate Markov transition matrices for Births, Immigrants and population. Specifically, for each variable (Births, Immigrants and population) and each century, we partition cities into five groups: the first group includes cities that in a given century had a value of zero for that variable; the remaining groups correspond to the quartiles of the distribution in any given century, conditional on being positive. Table 5 displays the probability that a city transits from the row group to the column group in the next century, estimated by Maximum Likelihood (assuming that transition probabilities have remained constant over time). Thus, with regard to Births, the first row says that a city that had 0 Births in century t has a 0.61 probability of retaining 0 Births in \(t+1\), a 0.13 probability of being in the first quartile of cities with positive Births in \(t+1\), and so on.
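Under the assumption of time-constant transition probabilities, the maximum likelihood estimate of each cell is the number of observed transitions from the row group to the column group divided by the total number of transitions out of the row group. The sketch below illustrates the grouping and the estimator; the function names, the tie-handling rule within quartiles, and the toy paths are hypothetical, and each path is assumed to list consecutive centuries for one city.

```python
def assign_groups(values):
    """Group 0 for zeros; groups 1-4 for quartiles of the positive values.

    Tied positive values share the highest rank among them (assumption).
    """
    positives = sorted(v for v in values if v > 0)
    n = len(positives)
    groups = []
    for v in values:
        if v == 0:
            groups.append(0)
        else:
            rank = sum(1 for p in positives if p <= v)  # 1-based rank
            groups.append(min(4, 1 + (4 * (rank - 1)) // n))
    return groups

def transition_matrix(paths, n_states=5):
    """Row-normalised transition counts: the ML estimate for a
    time-homogeneous Markov chain. Rows never visited are left as zeros."""
    counts = [[0] * n_states for _ in range(n_states)]
    for path in paths:
        for origin, destination in zip(path, path[1:]):
            counts[origin][destination] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical group sequences for three cities over three centuries.
toy_paths = [[0, 0, 1], [4, 4, 3], [0, 1, 0]]
P = transition_matrix(toy_paths)
```

In this toy example, `P[0]` is the estimated distribution over next-century groups for a city currently in the zero group.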

For all variables, the top left and bottom right cells in Table 5 are the largest, indicating that the probability of remaining in the bottom and top groups, conditional on being there, is the highest. For cities at the top of the distribution, there is strong evidence of more persistence in population than in famous creatives: a city that belongs to the fourth quartile in population has a probability of 0.79 of remaining there in the next century, while for Immigrants this probability is 0.49, and for Births it is 0.43. On the other hand, at the bottom of the distribution persistence is roughly similar for famous creatives and for population: cities that have just a few famous creatives and belong to the first quartile have a probability of remaining there or falling in group 0 of about 0.55 for Births and Immigrants; the corresponding value for population is 0.5. In other words, emerging as a large city or a creative cluster is an unlikely event for cities that start out at the bottom of the distribution: most small and uncreative cities remain in that condition. But at the top of the distribution there is more reshuffling in creative clusters than for population: while most large cities keep growing and remain large, creative clusters exhibit more change over the centuries. This pattern conforms with anecdotal evidence about the rise and decline of creativity in cities like Florence, Rome and Vienna.

Notwithstanding the spatial movement of clusters, the overall pattern of spatial proximity of creatives is quite stable over time, despite the consolidation of states and the improvements in the means of transportation throughout these centuries. This can be seen in Figs. 5 and 6. The former displays the distribution of the distance between places of birth of every pair of famous creatives born in the same century, for different centuries. The latter displays the distribution of birth-to-death distances. Both distributions remained quite stable in different centuries, despite the changes in the cost of transportation and communication during this period.

Agglomeration across disciplines We then explore whether creative clusters tend to be specialized or diverse. We want to know whether spillover effects and local factors (observed or unobserved) operate across or within disciplines. We thus estimate the matrix of pairwise correlation coefficients of famous people by discipline, for both Births and Immigrants. Disciplines are disaggregated into performing arts, non-performing arts, humanities and science, and business. Table 6 reports the results, in Panel A for Births and in Panel B for Immigrants. In the first line of each matrix, we condition on century dummy variables, to make variables comparable over time; thus, we first regress famous people in each discipline (Births or Immigrants) on a full set of century dummy variables, and then estimate the correlation coefficients across disciplines of the estimated residuals. In the second line of each matrix we condition on both century dummy variables and the set of observable city characteristics described above.Footnote 23 Thus, the second line of each matrix is a measure of co-agglomeration across disciplines due to either unobserved common local factors or spillover effects, while the first line also includes the effect of observables. In Panel A, the dependent variable is defined as Log(1 + Births in discipline i). We take the Log of 1 + Births in discipline i (rather than of Births in discipline i) to retain in the sample the large number of city observations with 0 births (see also the more extensive discussion in the next section). In Panel B, the dependent variable is defined as Log(1 + Immigrants in discipline i).
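The first line of each matrix can be illustrated with a short sketch. Regressing a variable on a full set of century dummies and taking residuals is equivalent to subtracting the century-specific mean, so the sketch demeans Log(1 + Births) within each century and then correlates the residuals of two disciplines. Function names and toy data are hypothetical, and the additional conditioning on city characteristics used for the second line is not reproduced here.

```python
import math
from collections import defaultdict

def demean_by_century(values, centuries):
    """Residuals from a regression on a full set of century dummies,
    obtained by subtracting the within-century mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, centuries):
        sums[c] += v
        counts[c] += 1
    return [v - sums[c] / counts[c] for v, c in zip(values, centuries)]

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def coagglomeration(births_a, births_b, centuries):
    """Correlation across two disciplines of century-demeaned log(1 + births)."""
    ya = demean_by_century([math.log1p(b) for b in births_a], centuries)
    yb = demean_by_century([math.log1p(b) for b in births_b], centuries)
    return correlation(ya, yb)

# Hypothetical city-century observations for two disciplines.
centuries = [1400, 1400, 1500, 1500]
arts = [0.0, 1.0, 0.0, 3.0]
science_same = [0.0, 1.0, 0.0, 3.0]
science_opposite = [1.0, 0.0, 3.0, 0.0]

r_same = coagglomeration(arts, science_same, centuries)
r_opposite = coagglomeration(arts, science_opposite, centuries)
```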

All correlation coefficients are positive, significant and quantitatively large, indicating that creative elites are clustered across disciplines, as in the prominent examples of Florence and Vienna cited above. Correlation coefficients are very similar with or without conditioning on the full set of observables, implying that common observable shocks are not responsible for the correlations. For instance, based on Panel A of Table 6, if we compare two cities with the same observables, but city A has 10 p.p. more Births of non-performing artists (per 1000 inhabitants) than city B, then on average city A also has 5.5 p.p. more (scaled) Births in humanities and sciences compared to city B. This suggests that spillover effects and/or unobserved local factors operate across disciplines and not just within each field. Note that correlations tend to be somewhat stronger for Immigrants than for Births.Footnote 24

4 Population, wages and famous creatives

Given the patterns discussed so far, a natural question is whether creative clusters are influenced by local economic conditions. Higher income and wealth increase the demand for services of creatives and provide resources for investing in innovation, so one would expect that creative clusters are more likely to emerge during good economic times. On the other hand, the primary goal of artists and scientists is to seek influence and recognition amongst peers, knowing that wider social recognition and material rewards will follow. Moreover, proximity to other creatives facilitates innovation more than proximity to power and money. With these non-experimental data, we cannot provide a definitive answer. Nevertheless, we can exploit the temporal variation in the data, using city size or wages as indicators of local economic conditions in a distributed lag model that only exploits within-city correlations.

Population We start by estimating the following specification:

$$\begin{aligned} Y_{ct}&= \alpha _{c}+\delta _{t}+\pi _{1}\log (Population)_{ct}+\pi _{2}\log (Population)_{ct-1}\\&\quad +\pi _{3}X_{ct}+u_{ct} \end{aligned} \tag{1}$$

where \(Y_{ct}\) is either \(Log(1+Births)\) or \(Log(1+Immigrants)\), \(\alpha _{c}\) and \(\delta _{t}\) are city and century fixed effects, and \(X_{ct}\) are other covariates described below. We use this functional form to allow for observations where Births or Immigrants are 0. Standard errors are clustered at the level of NUTS 2 regions.Footnote 25 A finding that \(\pi _{1}+\pi _{2}\) is not significantly different from 0 would imply that the formation of creative clusters cannot be predicted by contemporaneous or past city size, casting doubt on the possibility of a causal effect. This being a rather short panel, we do not include the lagged dependent variable, but results are similar if we include it.

Table 7 Population and famous creatives
Table 8 Real wage of skilled workers and famous creatives

The results are presented in Table 7. In Columns 1 and 2 we only include the population variables. We cannot reject the null hypothesis that creative people (born or immigrated) are uncorrelated with current and previous population, although the p-value in the Immigrants regression is only barely above 0.1. In Columns 3–4 we add current values of Commune and of other variables indicating a city’s status in the political and ecclesiastical hierarchy, namely Bishop, Archbishop and Capital. Again, we fail to reject the null hypothesis of no positive correlation with current and lagged population, and here p-values are close to 0.9. In results untabulated here, we have explored heterogeneity across disciplines. Births of famous non-performing artists and deaths of famous immigrants in business are correlated with lagged population, as one would expect if good local economic conditions increase the demand for non-performing arts and attract entrepreneurship.

This evidence is only suggestive, because time is measured at 100 year intervals and population could be measured with error, but the lack of correlation between city size and our indicators of creativity is robust. Results are similar when population of Italian cities is measured by data from Malanima (1998), which is available for a larger number of cities than in Bairoch et al. (1988). We have also used the interaction between a dummy variable indicating whether the city is an Atlantic port and dummy variables from 1500 (Acemoglu et al., 2005) as an alternative proxy for economic success, and reached similar conclusions. Moreover, we obtain very similar results when (a) using Births rather than \(Log(1+Births)\) as dependent variable, (b) entering two lags rather than one, (c) allowing the coefficient of population to vary before/after the middle of our sample period. Overall, these results suggest that changes in urban population, an indicator of local economic conditions, do not play an important role in the formation or decay of creative clusters. As remarked in the Introduction, these negative findings are in line with historical anecdotal evidence.

Wages To further investigate the dynamics of the relationship between local economic conditions and famous creatives, we also rely on real wages of skilled workers. Wages are a better measure of local economic conditions than population. Although skilled workers are not the main source of demand for the services of creatives, wages of urban workers are likely to be correlated with other sources of income, in particular income from trade. Moreover, wage data are available at yearly frequency for a subset of European cities, and this is important to detect which variable moves first.

We begin our exploration by estimating the following specification:

$$\begin{aligned} Log(1+Unscaled\,Births)_{ct}=\alpha _{c}+\delta _{t}+\sum _{k=0}^{5}\beta _{k}\log (Wage)_{ct-k}+\sum _{k=1}^{5}\gamma _{k}\log (1+Unscaled\text { }Births)_{ct-k}+u_{ct} \end{aligned} \tag{2}$$

where t now represents a decade. Thus, our lag structure goes back 50 years. Because population is not available at the frequency of 10 years, we measure the dependent variable as the sum of famous births during the decade, not scaled by population (Unscaled Births). The sample is an unbalanced panel of 18 cities over the period 1260–1890. Note that here we can include the lagged dependent variable, despite estimating with a city fixed effect, because the panel is sufficiently long (on average about 34 periods per city).Footnote 26 Results are similar when estimating without the lagged dependent variable. Standard errors are clustered at the city level. We also test for first order serial correlation in the estimated residuals, and we can never reject absence of serial correlation.

The results are presented in Column 1 of Table 8: the F test cannot reject that the sum of estimated coefficients of current and lagged real wages is equal to zero. Note that failure to reject is not due to lack of statistical power, because the estimated coefficients on the lagged endogenous variable (lagged births) are always highly significant. This regression suggests that wages do not help predict famous births. To further explore this possibility, we estimate a specification with \(\log (Wage)\) as the dependent variable, and the contemporaneous values and leads and lags of \(Log(1+Unscaled\) Births) on the RHS (for 5 decades):

$$\begin{aligned} Log(Wage)_{ct}=\alpha _{c}+\delta _{t}+\sum _{k=1}^{5}\beta _{k}\log (Wage)_{ct-k}+\sum _{k=-5}^{5}\gamma _{k}\log (1+Unscaled\text { } Births)_{ct-k}+u_{ct} \end{aligned} \tag{3}$$

If the estimated coefficients on the leads of births were significantly different from zero, this would suggest that shocks to wages in period t were correlated with famous births in subsequent periods. As shown in column 2, this is not the case. The F-test of joint significance of the lead values of famous creatives born in city c is not significant. This again suggests that wages are not a leading indicator of subsequent accelerations in famous births.
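To make the lead and lag structure of this specification concrete, the following minimal sketch assembles, for one city and one decade, the values of a series at every offset k from five leads (k = -5) to five lags (k = 5), where offset k refers to the decade \(t-k\). The function name `lead_lag_row` and the toy series are hypothetical, and dropping a decade whenever any required lead or lag is missing is an assumption of this sketch about how gaps in the unbalanced panel might be handled.

```python
def lead_lag_row(series, decade, max_lag=5, max_lead=5):
    """Return {k: value at decade - 10*k} for k in -max_lead..max_lag.

    Negative k are leads (future decades), positive k are lags.
    Returns None if any required decade is missing (listwise deletion,
    an assumption of this sketch).
    """
    row = {}
    for k in range(-max_lead, max_lag + 1):
        source_decade = decade - 10 * k
        if source_decade not in series:
            return None
        row[k] = series[source_decade]
    return row

# Hypothetical decadal series for one city: 1300, 1310, ..., 1410.
toy_series = {d: float(i) for i, d in enumerate(range(1300, 1420, 10))}

row_1350 = lead_lag_row(toy_series, 1350)   # all eleven offsets available
row_1340 = lead_lag_row(toy_series, 1340)   # the fifth lag (1290) is missing
```

Requiring five leads and five lags removes the first and last five decades of each city's series, which is one reason a long panel matters for this exercise.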

In Columns 3–4 we repeat the same set of regressions replacing births with the number of famous immigrants born in each decade (again, not scaled by population).Footnote 27 The patterns are very similar to Columns 1–2.

In results untabulated here, we have also estimated the same regressions when famous creatives are disaggregated by discipline. Unlike for population, in all four domains real wages are uncorrelated with famous creatives. The conclusions are also very similar if we replace real with nominal wages, which are available for a larger number of city-periods, or if we include a distributed lag of immigrants on the right hand side of (2)—here lagged immigrants have a positive and significant effect on subsequent births.

A possible concern is that real wages are measured with error, and do not display sufficient time variation. However, the results do not change if we restrict the sample to the 9 cities for which the coefficient of variation in real wages over time is above the median. As a further check of the statistical power of these regressions, Appendix Table A.9 replaces the dependent variable with Log(Population) and estimates by century and by half century (since 1700). Here contemporaneous and lagged real wages have positive and significant estimated coefficients, suggesting that they display enough time variation to have statistical power.

Overall, therefore, these estimates confirm that the formation of creative clusters cannot be predicted by contemporaneous or past indicators of economic development.

5 City institutions and birth of famous creatives

We now turn to a more detailed analysis of how city institutions influence the formation of creative clusters. In particular, we study how transitions into and out of the status of Commune influence the births of famous creatives. As already discussed, we expect that more democratic and participatory forms of self-government, protecting economic and political freedoms, favor a more open, inclusive and innovative social environment, and thus are positively associated with births of creative individuals.

5.1 OLS estimates

The regression equation that forms the basis of our empirical analysis in this section is:

$$\begin{aligned} Log(1+Births_{ct})=\beta _{1}Commune_{ct}+\beta _{2}X_{ct}+\beta _{3}Spatial\_Lag\_B_{ct}+\alpha _{c}+\delta _{t}+u_{ct}, \end{aligned} \tag{4}$$

where \(X_{ct}\) are city-level covariates, \(\alpha _{c}\) and \(\delta _{t}\) are city and century fixed effects. The covariates \(X_{ct}\) belong to two groups: those less likely to be affected by the status of Commune (Large state, Bishop, Archbishop, Capital, Plundered), and those more likely to be influenced by Commune or correlated with the error term, and hence to possibly be “bad controls” (Population, University). Exchange of ideas is crucial for successful innovation, and interactions could take place with neighboring areas, and not just within the city. An isolated city is in a very different situation compared to a city located in the middle of a very creative area. To control for these spatial determinants of creativity, we also show estimates including the spatial lag of \(Log(1+Births_{ct}),\) defined as \(Spatial\_Lag\_B_{ct}=\sum _{d\ne c}\varpi _{d}Log(1+Births_{dt}),\) where the weights \(\varpi _{d}\) are the inverse of distance between cities d and c within the same NUTS 1 region, and 0 outside the NUTS 1 region. This spatial lag thus measures the “creative potential” in the macro region, namely neighboring creativity. It captures possible direct effects of being close to other creative cities, as well as possibly omitted variables correlated with creativity in the vicinity of each city. As discussed below, our instrumental variable approach exploits regional waves of institutional change, and this spatial lag is also important to make the IV exclusion restriction more credible.
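The spatial lag defined above can be sketched for a single century as follows. The function name `spatial_lag`, the city records and the use of planar coordinates in kilometres with Euclidean distance are all assumptions made for illustration; the text does not state how distance is measured, and the weights are left unnormalised, as in the formula.

```python
import math

def spatial_lag(cities, target):
    """Inverse-distance weighted sum of log(1 + births) over the other
    cities in the same NUTS 1 region as `target`.

    cities: {name: (x_km, y_km, nuts1_code, births_per_1000)}
    Cities outside the target's NUTS 1 region receive weight 0.
    Co-located cities (distance 0) are not handled in this sketch.
    """
    x0, y0, region0, _ = cities[target]
    total = 0.0
    for name, (x, y, region, births) in cities.items():
        if name == target or region != region0:
            continue
        distance = math.hypot(x - x0, y - y0)  # planar distance (assumption)
        total += math.log1p(births) / distance
    return total

# Hypothetical cities: coordinates in km, NUTS 1 code, births per 1000.
toy_cities = {
    "A": (0.0, 0.0, "R1", 1.0),
    "B": (3.0, 4.0, "R1", math.e - 1.0),  # 5 km from A, log(1 + births) = 1
    "C": (0.0, 10.0, "R1", 0.0),          # same region, no births
    "D": (1.0, 1.0, "R2", 5.0),           # other region: weight 0
}
lag_A = spatial_lag(toy_cities, "A")
```

In the toy example only city B contributes to the lag of city A: city C has no births and city D lies in a different macro-region.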

Table 9 Commune and births, OLS regressions
Table 10 Commune and births, 2SLS regressions

Table 9 shows OLS estimates of Eq. (4). Standard errors are clustered at the region (current NUTS 2) level. In Column 1 we estimate a parsimonious version of the baseline specification where we include period dummies and city fixed effects only. In Column 2, we add the first set of controls. In Column 3 we add the remaining covariates in \(X_{ct},\) and in Column 4 we add the Spatial Lag of the dependent variable. All columns report a positive and significant coefficient on Commune. According to the estimate in column 4, becoming a Commune is associated with a 6 percentage point increase in Births, or an increase of 0.7 unscaled births (a 26% increase relative to the average).Footnote 28 Regarding the other city-level variables, we find a positive and significant coefficient for (possibly endogenous) University, and for Capital. Religious institutions have a negative estimated coefficient, weakly statistically significant only in some specifications.Footnote 29

Note that the inclusion of Log(population) and of University in column 3 does not affect the estimated coefficient of Commune, suggesting that the effect of local political institutions is unlikely to go through these two channels. On the other hand, the inclusion of the spatial lag reduces the estimated coefficient of Commune from 0.70 to 0.58 in column 4, and the estimated coefficient of the spatial lag of \(Log(1+Births_{ct})\) is large and highly significant, suggesting the possible presence of spatially correlated unobserved determinants of creativity, or positive spillover effects from being close to other creative cities.

Finally, note that transition into Commune status of city c may indirectly affect creative births also in non-treated cities different from c. As emphasized by anecdotal evidence and as shown in Sect. 6, when a city becomes a Commune it attracts creative immigrants. Through social learning, this may reduce births of famous creatives in the cities experiencing the outflow. The opposite could also happen, since creative individuals may exert positive spillover effects on neighboring cities. Such general equilibrium effects imply that, even neglecting the identification issues due to unobserved heterogeneity and discussed in Sect. 5.3, we cannot interpret the estimated coefficient on Commune as an average treatment effect. Note however that some of these general equilibrium effects may be captured by the inclusion of the spatial lag of creative births in column (4) of Table 9. Moreover, adding to the regressors also the spatial lags of famous immigrants into neighboring cities and/or the spatial lag of population does not materially change the numerical value or the statistical significance of the estimated coefficient of Commune (results available upon request).

5.2 Event study

A concern for the estimation of the relationship between Commune and Births arises from the possibility of differential pre-trends. Moreover, there may be some interesting post-transition dynamics that are not captured by the estimation procedure of Table 9. We therefore turn to an “event-study” approach as in Kline (2011) and Autor (2003). This allows us to test for the presence of differential pre-trends and to recover any dynamics of the Commune effect. We compare changes in Births of treated cities (i.e. localities that experience the transition into Commune status) both to cities that have not yet been treated and to cities that will never be treated during our sample period.

Specifically, the regression equation is:

$$\begin{aligned} \log (1+ Births _{ct})=\mathop {\displaystyle \sum }\limits _{\tau =-2}^{T}\beta _{\tau }D_{ct}^{\tau }+\alpha _{c}+\delta _{t}+u_{ct}, \end{aligned}\tag{5}$$

where \(\alpha _{c}\) and \(\delta _{t}\) are city and century fixed effects, and \(D_{ct}^{\tau }\) are a sequence of “event-time” dummies that equal one when the transition to Commune is \(\tau\) centuries away in city c and T is the end of the sample period (expressed in event time). Therefore \(\tau =0\) is the period of transition to Commune and the \(\beta _{\tau }\) coefficients characterize the time path of creativity relative to the date of transition for “treated” cities, conditional on the unobserved variance components \(\alpha _{c},\delta _{t},u_{ct}.\) We estimate (5) by OLS, including separate indicator variables for two centuries before the transition, the century of the transition, one century after, and for centuries 2 and forward. In other words, we constrain the effects of the transition to remain constant from century 2 onwards. We normalize \(\beta _{-1}\) to zero, so that all post-event coefficients can be interpreted as treatment effects.Footnote 30
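To make the coding of the event-time indicators concrete, the following Python sketch shows one way they could be constructed for a single city-century. It is illustrative only: the function name, the dictionary keys and the use of integer century indices are hypothetical, and grouping periods more than two centuries before the transition with the reference period is an assumption of the sketch rather than something stated above.

```python
from typing import Dict, Optional

# Event-time bins described in the text: tau = -2, 0, 1 and "2 and forward".
# tau = -1 is the omitted reference category, which normalizes beta_{-1} to 0.
BINS = ("tau_m2", "tau_0", "tau_1", "tau_2plus")


def event_time_dummies(century: int, transition: Optional[int]) -> Dict[str, int]:
    """Return the event-time indicators for one city-century.

    `century` and `transition` are century indices (e.g. 13 for the
    thirteenth century); `transition` is None for never-treated cities.
    """
    dummies = {name: 0 for name in BINS}
    if transition is None:
        return dummies  # never treated: every indicator stays at zero
    tau = century - transition
    if tau == -2:
        dummies["tau_m2"] = 1
    elif tau == 0:
        dummies["tau_0"] = 1
    elif tau == 1:
        dummies["tau_1"] = 1
    elif tau >= 2:
        dummies["tau_2plus"] = 1  # effect constrained to be constant from tau = 2
    # tau == -1 (reference) and, by assumption, tau < -2 leave all zeros
    return dummies
```

For a city that becomes a Commune in century 13, `event_time_dummies(14, 13)` switches on only the `tau_1` indicator, while the observation for century 12 is the reference period and has all indicators equal to zero.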

Figure 1 displays the estimates. There is no evidence of significant pre-existing trends in Births of famous people before the transition into Commune. Becoming a Commune at the beginning of the century (date 0 in Fig. 1) is associated with a 5 percentage point increase in the birth of creative individuals (per 1000 inhabitants) during the current century, with an additional increase in the subsequent century. Results are very similar when adding the same controls as in Table 9, including the spatial lag. In results not tabulated here, we have explored more complex dynamics by including additional periods of indicator variables.Footnote 31 The data reject these more complex specifications in favor of those found in Fig. 1. Specifically, we find no evidence of an accumulating impact on Births beyond 2 centuries, nor is there evidence of mean reversion in the longer term. It thus appears that the dynamics of the Births response to the transition are resolved within 2 centuries.

5.3 2SLS estimates

An important limitation of the estimation framework in Fig. 1 is the possibility that both local institutions and creativity may be influenced by time-varying omitted factors. For instance, the emergence of a vibrant and successful class of merchants and financiers could induce political transitions into Commune, and also exert a direct effect on the demand for the services of innovative artists. To tackle this challenge, we adapt a strategy introduced by Persson and Tabellini (2009) and Acemoglu et al. (2019) in their analysis of democratic transitions in a panel of countries. Namely, we instrument Commune with the proportion of other cities with Commune status in the region (defined by current NUTS 1 administrative borders) and in the same century, leaving out the own-city observation—we call our instrument Regional Commune. This instrumental variable relies on the idea that transitions in and out of Commune were influenced by external factors common to an entire region. As described above, the diffusion of communal institutions occurred in regional waves, reflecting wider events such as wars of succession at the time of death of the local lord, or an enfeebled sovereign, or the support of the Church against imperial bishops during the Investiture Conflict (Tierney, 1983; Rubin, 1986; Parker, 2004; Becker et al., 2020). These regional factors also reflected knowledge spillovers and contagion effects in the design of political institutions: towns imitated successful institutional innovations of others, giving rise to families of urban law (Bartlett, 1994).Footnote 32 External forces could also work in reverse if regional threats induced transition into despotic rule (as with the Signorie in Italy), or if the consolidation of central states deprived cities of their autonomy.
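As an illustration of the leave-one-out construction behind Regional Commune, the sketch below computes, for each city-century, the share of the other cities in the same region and century that have Commune status. The row layout, the function name and the choice to return `None` when a region contains no other city are hypothetical and made only for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

# One row per city-century: (city, region, century, commune), commune in {0, 1}
Row = Tuple[Hashable, Hashable, int, int]


def regional_commune(rows: List[Row]) -> Dict[Tuple[Hashable, int], Optional[float]]:
    """Leave-one-out share of Commune cities in the same region and century."""
    total = defaultdict(int)  # number of Commune cities per (region, century)
    count = defaultdict(int)  # number of cities per (region, century)
    for _, region, century, commune in rows:
        total[(region, century)] += commune
        count[(region, century)] += 1

    instrument: Dict[Tuple[Hashable, int], Optional[float]] = {}
    for city, region, century, commune in rows:
        others = count[(region, century)] - 1
        if others == 0:
            # No other city in the region-century: the share is undefined
            instrument[(city, century)] = None
        else:
            # Subtract the own-city observation before taking the share
            instrument[(city, century)] = (total[(region, century)] - commune) / others
    return instrument
```

Removing the own-city observation ensures that the instrument for a city varies only with the institutions of its regional neighbors, not mechanically with its own status.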

The identifying assumption is that, conditioning on all included regressors, regional waves of institutional transitions influenced city level creativity only through a city’s own political institutions. To make this assumption more credible, the regressors also control for regional creativity, measured by the spatial lag of Log(1+Births). Controlling for the spatial lag of the dependent variable reduces the concern that neighboring cities with strong institutions produce or attract more creatives, which in turn exerts direct spillover effects in the region through migration or knowledge diffusion.

The 2SLS estimates are shown in Table 10, which also reports a summary of the first-stage and the reduced-form results. The sequence of specifications mirrors that of Table 9. Regional Commune is always highly significant both in the first stage and in the reduced form regressions (F-statistics for the excluded instrument range from 18 to 23). Table A.10 in the Appendix reports the full first stage estimates.

On average, upon a transition into Commune, births of creative people (per 1000 inhabitants) increase by about 12 percentage points in the more inclusive specification with the spatial lag. This corresponds to about 1.4 more creative births per century, or about a 47% increase relative to average births. The estimated coefficient of Commune is about twice as large as in the corresponding OLS regression. The fact that our IV strategy produces larger effects of city institutions on creativity may reflect attenuation bias in the OLS estimates due to measurement error in Commune. Another possibility is that the effect of political institutions is heterogeneous across cities. If so, then consistent OLS estimates the average effect of Commune on creativity across all cities. On the other hand, 2SLS estimates the average effect for the cities that are marginal in the transition, in the sense that they become communes if and only if there exists a regional wave of institutional change.Footnote 33 If the effect of Commune on creativity is larger for cities that are marginal in the transition, the 2SLS estimates exceed those of consistent OLS.

Note that the results are not affected by inclusion of city size on the RHS, which is never statistically significant, although we know from Bosker et al. (2013) that becoming a Commune is also associated with an increase in city size. This reinforces our previous claim that the formation of creative clusters does not seem to operate through local economic conditions. The coefficient estimates of the other institutional variables (not shown) are very similar to the OLS estimates.

To further reduce the concern that contemporaneous general equilibrium effects or omitted variables correlated with the instrument may violate the identifying assumption, we have also used Regional Commune lagged by one century as the instrument; the coefficient of Commune remains significant. The estimates are also not materially changed if the spatial lag of famous immigrants or of population is included among the regressors. These results are available upon request.

Alternative instrument Even after conditioning on the spatial lag of Births, the current or lagged regional incidence of Commune could exert a direct influence on the birth of famous creatives in cities within the region, violating our exclusion restriction. A specific concern is non-classical measurement error in Commune: some richer trading centers were erroneously coded as Commune by Bosker et al. (2013), and our estimates reflect the economic and social effects of trade integration rather than those of city institutions. This would pose a threat to our identification assumption, because the regional incidence of Commune could be positively correlated with trade intensity.

To cope with these concerns, we follow Schulz et al. (2019) and consider a second instrument, which measures city exposure to the Church ban of kin marriage. In the middle of the sixth century, the Church banned kin marriage. This policy contributed to the dissolution of kin networks throughout Europe, leading to profound cultural transformations, including greater individualism, more impersonal prosocial psychology and a more universalistic value system (see Goody and Goody (1983), Schulz et al. (2019), Enke (2019), Greif and Tabellini (2017)). The Church policy was not uniformly implemented, and areas less exposed to the Church doctrine were slower to abandon kin networks. Schulz et al. (2019) exploit this geographic and temporal variation, and show that transitions into Commune were more likely in cities more exposed to the Church ban. They also provide evidence that exposure to the medieval Church predicts weak kin networks across countries, ethnicities and European regions, and that weaker kin networks are associated with more democratic governance traditions in ethnicities and countries. Inspired by this work, we exploit exposure to the Church ban of cousin marriage as a second instrument for Commune. Specifically, following Schulz et al. (2019), we define the variable W. Church Exposure as the sum of all instances (in 50-year intervals) that a city was within a 100-km radius of the nearest bishopric between 550 AD and century t; the first synods that banned cousin marriage took place between 500 and 550. This variable thus captures proximity to a religious authority only during the period of the ban. Since most bishops were appointed after 550 AD, more exposed cities are close to earlier bishoprics.
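The definition of W. Church Exposure can be sketched in a few lines of Python. The example assumes a hypothetical input that maps the first year of each 50-year interval to the city's distance (in km) from the nearest bishopric at that time, and it counts intervals that start before a given end year; both the input format and this reading of "between 550 AD and century t" are assumptions of the sketch.

```python
from typing import Mapping


def church_exposure(nearest_bishopric_km: Mapping[int, float],
                    end_year: int,
                    start_year: int = 550,
                    step: int = 50,
                    radius_km: float = 100.0) -> int:
    """Count the 50-year intervals starting in [start_year, end_year) during
    which the city lay within `radius_km` of its nearest bishopric.

    Intervals missing from `nearest_bishopric_km` are treated as not exposed.
    """
    exposure = 0
    for year in range(start_year, end_year, step):
        distance = nearest_bishopric_km.get(year)
        if distance is not None and distance <= radius_km:
            exposure += 1
    return exposure
```

Because the count is cumulative, a city that came within 100 km of a bishopric early on accumulates more exposure by any given century than a city reached later, which is the sense in which more exposed cities are close to earlier bishoprics.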

This second instrument is not highly correlated with the first one (their unconditional correlation coefficient is 0.3), and its validity hinges on different assumptions: namely that, conditional on the included regressors, proximity to earlier bishoprics is not correlated with relevant omitted variables. Recall that we always control for being the seat of a Bishop or Archbishop, and in some specifications we also control for century fixed effects interacted with a dummy variable that equals one if the city has ever been the seat of a bishop. The concern that this instrument could be correlated with trade integration or other sources of non-classical measurement error in Commune, therefore, seems less relevant.

Table 11 Commune and births, using Western Church Exposure as IV
Table 12 Commune and births: QMLE Poisson estimates
Table 13 Determinants of the migration of famous creatives

The 2SLS estimates are shown in Table 11, which also reports a summary of the first-stage and the reduced-form results. Period dummies, city FE, the full set of city-level controls and the Spatial lag of Births are always included. Column (1) displays the just-identified model with only W. Church Exposure as instrument. Column (2), our preferred specification, displays the over-identified model with Regional Commune and W. Church Exposure as instruments. The p-value of Hansen’s J statistic for the over-identification test is reported at the bottom, and the over-identification restriction is not rejected. W. Church Exposure is significant both in the first stage and in the reduced form regressions (the F-statistic for the excluded instrument is 14). On average, upon a transition into Commune, births of creative people (per 1000 inhabitants) increase by about 13 percentage points. In Columns (3) and (4) we also control for whether the city was ever the seat of a bishop interacted with period dummies. The estimates are very similar to those in Columns (1) and (2).

Overall, this evidence is consistent with the idea that becoming a Commune, and enjoying the resulting autonomy and economic and political freedoms, spreads a culture of openness that encourages innovation and creativity in arts, sciences and business.

5.4 Sensitivity analysis

We now investigate the robustness of the estimates to a number of other issues.

Alternative measures of creativity In our data, being famous is equivalent to being included in the database Freebase.com. Yu et al. (2016) have created a similar database that weights individuals by their influence (see Sect. 2). Table A.11 in the Appendix replicates the 2SLS estimates of Table 10, replacing the dependent variable Births with the corresponding weighted variable obtained from Yu et al. (2016). The estimated coefficient of Commune is positive and significant across all IV specifications, although the implied effect of a transition into Commune (expressed in percent of the mean of the dependent variable) is somewhat smaller than with the unweighted data used in Table 10.Footnote 34

Specification and sample restrictions In results not tabulated here we have explored the sensitivity of the estimates to alternative specifications and sample restrictions. First, we have included the interaction between a dummy variable indicating whether the city is an Atlantic port and dummies from 1500 onwards, as in Acemoglu et al. (2005). Second, we eliminated city-century observations with unusually high values of Births (trimming observations above the 99th percentile). Results are largely unchanged.

So far we have exploited all transitions into and out of Commune, estimating an average effect. In Appendix Table A.12 we estimate the effect of transitions in the two directions separately. Thus, when estimating the effect of entry into Commune, we drop the city-century observations following a negative transition (from \(Commune=1\) back to \(Commune=0\)). When studying the effect of exits, we drop the city-century observations prior to a positive transition (from \(Commune=0\) to \(Commune=1\)). The OLS estimates (not reported in Appendix Table A.12 but available upon request) remain very similar to those of Table 9, with a p-value below 1%, suggesting that the effect of Commune on Births is symmetric for transitions in both directions. When estimating by 2SLS, the estimated coefficients of transitions into Commune (columns 1 and 2) remain statistically significant and similar to the IV estimates reported in Tables 10 and 11, although the standard errors increase somewhat. The estimated coefficients of transitions out of Commune (columns 3 and 4) are also positive and larger than the OLS estimates, but they are smaller than for the positive transitions, and no longer statistically significant, possibly reflecting the smaller number of negative transitions (cf. Appendix Fig. A.10).

As discussed in Sect. 2, it is possible that Bosker et al. (2013) were too generous in preserving the status of Commune even when a city lost its independence or fell under the absolutist regime of a noble family. To assess the robustness of our estimates to this kind of measurement error, in columns 5 and 6 of Appendix Table A.12 we consider only transitions into Commune and retain only the first two centuries after the transition (the century of the transition and the subsequent century), dropping all observations from the third century after the transition onwards. The 2SLS estimates, displayed in columns 5 and 6 of Appendix Table A.12, remain statistically significant and remarkably similar to those of columns (1) and (2), which include all centuries of Commune = 1 after a positive transition. This suggests that possible coding errors in preserving the status of Commune for too long are not biasing our results.

A natural question is whether our results are due to a particular period in history, or to a specific set of countries. To answer, we have dropped the earliest centuries (eleventh, twelfth and thirteenth altogether) or the most recent one (nineteenth) and the results are very similar. However, when we also drop the fourteenth century (in addition to the three earliest centuries) or the eighteenth century (in addition to the most recent one), the coefficient of Commune, while positive, is no longer significant. We have also included the interaction between Commune and a dummy for the period from 1400 onwards (instrumented with the interaction between Regional Commune and this dummy). The coefficient of Commune remains positive and significant while the interaction is not significant. Similar results were obtained with a dummy for the period from 1500 onwards. These findings suggest that the estimates do not differ much across centuries, but that the central period 1300–1799 is particularly important for the observed correlations.

We have also dropped (individually) countries representing at least 5% of the sample (France, Germany, Italy, Spain, UK) and pairs of countries representing macro-regions (Spain and Portugal, France and Germany). The coefficient of Commune remains positive and significant. When dropping the 7 countries belonging to Eastern Europe, the coefficient of Commune, while losing significance, is not very different from the one on the full sample (equal to 0.083 with standard error of 0.051). Finally, the estimates remain very similar if the sample is restricted to the borders of the former Carolingian Empire plus Britain. This suggests that our results are not driven by a particular geographic area, and that the positive effect of Commune on births of famous creatives is present throughout Europe. All these results are available upon request.

Poisson estimation The log-linear specification described above has several advantages. OLS is the best linear unbiased estimator, its consistency properties are transparent and we can also easily estimate by instrumental variables. Moreover, scaling the dependent variable by Population reduces concerns about omitting an important regressor, or vice versa including an important “bad control”: Bosker et al. (2013) show that transitions into Commune have significant positive effects on city Population during the same century.Footnote 35 Nevertheless, a possible problem with the log-linear specification is the large number of zero observations in Births (about half of the overall observations have 0 births; see Appendix Fig. A.4). To cope with it, here we also estimate by QMLE Poisson, conditional on the same fixed effects described above. Thus, the dependent variable is the number of famous creatives not scaled by population.

Here the concern that Population is an endogenous regressor is important, because the dependent variable is unscaled and hence the error term is likely to be correlated with Population. To avoid including a “bad control”, rather than controlling for Log(Population) as a regressor, we include a set of dummy variables that classify cities according to their place in the size distribution or according to their size. These dummy variables are more time invariant than Population, and hence they are less likely to suffer from the “bad control” problem that plagues Population, and yet their inclusion makes cities of different size comparable. We estimate with two different definitions of the set of dummies. First, we enter separate dummies for belonging to each of the deciles going from the first to the ninth (in the overall sample of observations), plus one dummy for belonging to the set of percentiles from the 90th to the 94th and one for the percentiles 95th to 98th; thus, the default group consists of cities belonging to the 99th percentile. This specification groups cities so that the first 9 groups have roughly the same number of observations. The last decile would include observations that are very heterogeneous in terms of size, because there are few very large cities. To make these cities more comparable within this top decile, we split it in the finer partition described above. In our second and alternative definition, we include a set of dummy variables that classify cities according to the value of Log(Population), irrespective of the frequency in each bin. Specifically we split the range of variation of Log(Population) in the entire sample into 10 equally sized intervals, and enter a dummy variable for each interval except the last one (which is the default). 
Thus, this specification groups cities so that each interval corresponds to cities of roughly similar size and that differ from each other by about the same percentage, irrespective of the frequency distribution.
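The two groupings of city size can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names and group labels are hypothetical, the percentile rank is computed as the share of observations strictly smaller than the city, and natural logarithms are used for the equal-width bins.

```python
import math
from bisect import bisect_left
from typing import List


def percentile_group(pop: float, sorted_pops: List[float]) -> str:
    """First definition: group by position in the overall size distribution.

    `sorted_pops` holds the population of every observation, sorted ascending.
    """
    # Percentile rank in [0, 100): share of observations strictly smaller
    rank = 100.0 * bisect_left(sorted_pops, pop) / len(sorted_pops)
    if rank < 90:
        return f"decile_{int(rank // 10) + 1}"  # deciles 1 to 9
    if rank < 95:
        return "pct_90_94"
    if rank < 99:
        return "pct_95_98"
    return "top"  # omitted (default) group


def log_size_bin(pop: float, min_pop: float, max_pop: float, n_bins: int = 10) -> int:
    """Second definition: equal-width intervals of log population (0-based).

    The last bin (n_bins - 1) plays the role of the default category.
    """
    lo, hi = math.log(min_pop), math.log(max_pop)
    width = (hi - lo) / n_bins
    # Clamp so that the sample maximum falls into the last bin
    return min(int((math.log(pop) - lo) / width), n_bins - 1)
```

The first function yields groups with roughly equal numbers of observations (apart from the finer split at the top), whereas the second yields groups of cities that differ from each other by about the same percentage in size, whatever the number of observations in each bin.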

Table 12 reports these Poisson estimates. Note that, having changed the dependent variable, we redefine the spatial lag accordingly, as the spatial lag of unscaled famous births. City and century fixed effects are always included. We also control for all city observables described above (except Population) plus the spatial lag of the dependent variable. In column 1 we include the dummy variables based on the frequency distribution of Log(Population), while in column 2 we include the dummy variables based on the values of Log(Population). The estimated coefficient of Commune is very similar in both specifications. The estimated coefficient of 0.94 for Commune implies that the birth rate of famous creatives in each century is exp(0.94) = 2.6 times as large in cities that are Commune as in the others. On average a non-Commune city in the sample features about 1.4 births of famous creatives per century, implying that becoming a Commune is associated with an increase of about 2.1 famous births per century. This is much larger than in the OLS estimate of the log-linear specification of Table 9, where we estimated that transitions into Commune are associated with an increase of about 0.7 unscaled births per century.Footnote 36 Overall, although these Poisson estimates cannot exploit our instrument for Commune, they confirm the main finding above.
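The step from a Poisson coefficient to an implied number of births can be checked with a short calculation. The helper below is a hypothetical illustration that uses only the rounded figures quoted in the text, so its output matches the reported magnitudes only approximately.

```python
import math
from typing import Tuple


def poisson_effect(coef: float, baseline_count: float) -> Tuple[float, float]:
    """Translate a Poisson coefficient on a dummy variable into a rate ratio
    and the implied change in the expected count from `baseline_count`."""
    rate_ratio = math.exp(coef)  # multiplicative effect on the expected count
    extra_count = baseline_count * (rate_ratio - 1.0)
    return rate_ratio, extra_count


# Rounded inputs from the text: coefficient 0.94, baseline of 1.4 births
ratio, extra = poisson_effect(0.94, 1.4)
print(f"rate ratio = {ratio:.2f}, implied additional births = {extra:.2f}")
```

With these rounded inputs the rate ratio is close to 2.6 and the implied increase is in the neighborhood of two additional famous births per century; small differences from the figure quoted above presumably reflect rounding of the inputs.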

6 Migration of famous creatives

In this section we study the determinants of the migration of famous people between European cities, in a gravity model. This section has one goal: to describe how migration is correlated with observable features of European cities, and in particular which institutional features make a city an attractive destination.

6.1 Microfoundations

Let \(m_{jit}\) denote the number of immigrants (unscaled by city size) who die in city i and were born in city j during century t (throughout, the century refers to the date of birth, as explained above). Also let \(b_{jt}\) denote the number of famous individuals born in city j during century t. By definition, we have:

$$\begin{aligned} m_{jit}=p_{jit}b_{jt} \end{aligned}\tag{6}$$

where \(p_{jit}\) is the share of individuals who move from j to i in century t.

The share \(p_{jit}\) is the result of a deliberate decision to migrate. We model it as in the standard Random Utility Model, following Beine et al. (2016) and McFadden (1974). Specifically, let subscript k denote individuals, and define \(U_{kjit}\) as the utility of individual k born in j if he moves to i in century t. We assume:

$$\begin{aligned} U_{kjit}=w_{it}-c_{jit}+\varepsilon _{kjit} \end{aligned}\tag{7}$$

where \(w_{it}\) refers to a deterministic component of utility, such as income and other benefits from being in city i, \(c_{jit}\) denotes the cost of moving from j to i in century t, and \(\varepsilon _{kjit}\) is an individual-specific random component of utility. Note that, due to our data limitations, we assume that the deterministic component of utility from being in city i, \(w_{it}\), depends only on time and on the destination city, for all individuals irrespective of their origin.

If we assume that \(\varepsilon _{kjit}\) is independently and identically distributed according to an Extreme Value Type-1 distribution, then (6) and (7) imply that the expected number of immigrants from j to i in century t can be written as—see Beine et al. (2016):

$$\begin{aligned} E(m_{jit})=\phi _{jit}y_{it}b_{jt}/\Omega _{jt} \end{aligned}\tag{8}$$

where \(\phi _{jit}=\exp (-c_{jit})\) is decreasing in the cost of moving from j to i, \(y_{it}=\exp (w_{it})\) is a measure of attractiveness of location i, and \(\Omega _{jt}=\sum _{l}\phi _{jlt}y_{lt}\) is the expected utility from all possible alternatives available to an individual born in j, including the decision to stay (corresponding to \(l=j\)). Thus, the flow of immigrants from j to i is higher if city i is more attractive relative to the average of all other cities (weighted by the cost of moving and including the city of origin), if the city of origin j has more famous natives (i.e. more potential migrants), and if the cost of moving from j to i is lower.
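The expected flows in Eq. (8) can be computed directly once values for \(w_{it}\) and \(c_{jit}\) are given. The sketch below does so for a single origin city and century; the function name, the dictionary inputs and the numbers in the usage note are hypothetical and serve only to illustrate the formula.

```python
import math
from typing import Dict, Hashable


def expected_flows(w: Dict[Hashable, float],
                   cost_from_j: Dict[Hashable, float],
                   births_j: float) -> Dict[Hashable, float]:
    """Expected number of natives of origin j ending up in each city i.

    `w[i]` is the deterministic utility of city i and `cost_from_j[i]` the
    cost of moving from j to i (zero for i = j, i.e. for staying).
    """
    # phi_ji * y_i = exp(w_i - c_ji) for every alternative, staying included
    attraction = {i: math.exp(w[i] - cost_from_j[i]) for i in w}
    omega = sum(attraction.values())  # multilateral resistance term Omega_j
    return {i: births_j * a / omega for i, a in attraction.items()}
```

For instance, with `w = {"j": 0.0, "a": 1.0, "b": 1.0}`, costs `{"j": 0.0, "a": 1.0, "b": 2.0}` and 10 famous natives, cities j and a receive the same expected number of natives, because the higher utility of a exactly offsets the cost of moving there, while the more distant city b receives fewer. The flows always sum to the number of natives of j.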

Adding a well behaved error term \(e_{jit}\) to Eq. (8), such that \(E(e_{jit})=1,\) allows us to estimate the following gravity equation with dyadic data referring to cities of birth and of death for which \(i\ne j\):

$$\begin{aligned} m_{jit}=\phi _{jit}y_{it}\frac{b_{jt}}{\Omega _{jt}}e_{jit}=\exp (w_{it}-c_{jit})\frac{b_{jt}}{\Omega _{jt}}e_{jit} \end{aligned}\tag{9}$$

6.2 Data and estimation

Since the dependent variable is a count variable with a very large number of zeros, we estimate Eq. (9) by Poisson Maximum Likelihood in the sample of cities included in Bosker et al. (2013). Table A.13 in the Appendix summarizes the main features of the dependent variable. There is a very large number of zeros (more than 99% of all observations), and when positive most dyadic observations have only a few immigrants from the same origin city per century. Nevertheless, more than 83% of the cities in our restricted sample received at least one immigrant throughout the period, and several of them, such as Paris and London, received several hundred overall. The number of destination cities included in the data set ranges from 77 in the eleventh century to 358 in the nineteenth century. Note that we discard all the dyadic city-century observations where the origin city-century has 0 births, since the probability of receiving an immigrant from that origin is always zero by construction.

The variables on the right hand side of (9) have the following observable counterparts.

To measure the bilateral cost of moving, \(c_{jit}\), we use geographic distance (expressed in units of 100 km), a time-varying variable measuring the fraction of each century in which cities i and j belonged to the same historical state (Schönholzer and Weese, 2018), and a set of dummy variables that equal 1 if cities i and j belong, respectively, to the same (modern) NUTS 1 region and to (modern) countries that share the same first official language (Bahar and Rapoport, 2018).

The utility of being in the destination city i, \(w_{it}\), is proxied by population size (measured, in this section only, in units of 100,000 inhabitants), which stands in for economic development, and by a set of dummy variables that capture the most relevant institutional variables in the dataset by Bosker et al. (2013), namely Commune, University, Capital, Bishop, Archbishop, and Plundered. We expect that being a Commune, a state Capital and having a University all make a city a more attractive destination, while having been Plundered has the opposite effect. The two ecclesiastic variables have an ambiguous sign: on the one hand the Church was a sponsor of creative endeavours in artistic domains, but on the other hand it was also a source of discrimination and censorship.

The number of famous births in the origin city j is measured in logs, to be consistent with the exponential functional form of the Poisson regression (see (9)), and includes all famous people born in city j during century t, irrespective of whether or not they died in a different city, since they were all at risk of migrating.

Finally, the so-called “multilateral resistance” term \(\Omega _{jt}\) refers to the attractiveness of all the alternative destinations for an individual born in city j. This term has no easily observable counterpart. We thus incorporate it as follows. In a first and most restrictive specification, we assume that the only relevant alternative to moving from j to i is remaining in the origin city, and thus proxy \(\Omega _{jt}\) with \(w_{jt},\) namely with the same institutional variables described above but referring to the origin city. Here we also include a full set of century fixed effects, to capture possible symmetric changes in the cost of moving or in the available alternatives. We then relax this assumption by adding also origin and destination fixed effects, to capture possible time-invariant omitted variables. Finally, we estimate with a full set of destination and origin-century fixed effects, with which we fully capture the multilateral resistance term \(\Omega _{jt}\) and any other variable that varies by origin and century (as well as any time-invariant destination variable). Standard errors are always clustered two ways, by origin and by destination, as suggested by Cameron et al. (2011).
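For readers unfamiliar with Poisson pseudo-maximum likelihood, the following self-contained sketch fits an exponential-mean model by Newton's method using only the Python standard library. It is a toy illustration and not the estimation routine behind Table 13: the function names are hypothetical, fixed effects would have to be entered as explicit dummy columns of `X`, and neither two-way clustered standard errors nor the handling of a large number of fixed effects is attempted. In applied work a dedicated estimation package would normally be used instead.

```python
import math
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def poisson_pml(X: List[List[float]], y: List[float],
                max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Coefficients maximizing the Poisson log-likelihood with mean exp(x'b)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score: sum of (y - mu) * x; negative Hessian: sum of mu * x x'
        score = [sum((yi - mi) * row[a] for yi, mi, row in zip(y, mu, X))
                 for a in range(k)]
        info = [[sum(mi * row[a] * row[c] for mi, row in zip(mu, X))
                 for c in range(k)] for a in range(k)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With an intercept and a single dummy regressor, the fitted coefficient on the dummy equals the log of the ratio of mean counts between the two groups, which is the same rate-ratio interpretation used for the Commune coefficient in the results below.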

The inclusion of destination (and origin) fixed effects implies that we are identifying the parameters of interest with a diff-in-diff methodology. Namely we assume that changes in the institutions of interest are randomly assigned to cities, after controlling for the remaining covariates. In particular, we must assume that there is no time-varying unobserved heterogeneity making cities that adopt specific institutions also more likely to attract or send out immigrants (for reasons unrelated to the institutional changes). Note that immigrants are measured by century of birth, to minimize the risk that the migration decision precedes the institutional change.

6.3 Results

Table 13 reports the estimates for the three specifications: with only century fixed effects (column 1), with century, destination and origin fixed effects (column 2), and with destination and origin-century fixed effects (column 3). The most credible specification is the one reported in column (3), but the estimated coefficients of the destination variables remain very stable in columns (2) and (3). Being a Commune and being a state capital are each associated with a larger inflow of immigrants; the coefficient on University is not significant; Bishop has a negative coefficient. The estimated coefficients on all the distance measures are highly significant and with the expected sign. Cities that gave rise to more births send out more migrants (the estimated coefficient, being less than 1, implies that some famous natives do not migrate, as we know from the presence of several natives who die in the city of origin). Finally, population size is not robustly associated with any migration patterns.

The estimated coefficient of 0.520 for Commune implies that the arrival rate of immigrants in each century from the same origin city is 70% larger in cities that are Commune, compared to the others.Footnote 37 On average a non-Commune city receives about 1.6 immigrants per century from all origin cities in this sample, implying that becoming a Commune is associated with an increase of about 1.1 famous immigrants per century. Becoming a bishop-city is associated with a drop of about the same size in the immigration rate (though estimated less precisely; p-value is 0.11). Becoming a state capital is associated with an increase in the arrival rate of immigrants of 130%, implying about 2 more immigrants per century.
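As a rough check on these magnitudes, using only the rounded figures quoted above and the standard reading of a Poisson coefficient on a dummy variable as a log rate ratio, the implied numbers are approximately:

$$\begin{aligned} \exp (0.520)-1\approx 0.68,\qquad 1.6\times 0.68\approx 1.1,\qquad 1.6\times 1.30\approx 2.1 \end{aligned}$$

The first expression is the proportional increase in the arrival rate associated with Commune (about 70% after rounding), the second is the implied number of additional immigrants per century for a city starting from the non-Commune average of 1.6, and the third is the corresponding number for a state capital.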

7 Concluding remarks

It is often argued that open and tolerant political institutions that protect individual rights and prevent abuse of power by authoritarian leaders are a prerequisite to sustain innovation-based growth. This argument is strongly supported by the historical evidence from European cities.

To date, there has been no systematic study of the spatial patterns of creativity over a long historical period. After describing the main features of the formation and decay of creative clusters, we study how changes in city institutions affect local creativity. We find that institutions promoting local autonomy and protecting economic and political freedoms encourage the production and attraction of creative talent. The effects are quantitatively large. Becoming a Commune is associated with an increase in the births of famous people of about 40% relative to the average, while the attraction of famous immigrants almost doubles in size upon becoming a Commune. Overall, our estimates strongly suggest that inclusive local institutions and an open environment facilitate the attraction and production of upper-tail human capital in creative occupations.

What are the mechanisms through which becoming a Commune fosters local creativity? We know from Bosker et al. (2013) that transitions into Commune are also associated with subsequent increases in city size. Could this be the mechanism, namely that transitions into Commune enhance local economic conditions (or occur in tandem with economic development), and that this in turn induces an increase in local creativity? Although this mechanism cannot be entirely ruled out, given the non-experimental nature of our data and the difficulties in measuring local economic conditions, the historical evidence seems inconsistent with this conjecture. We find no evidence that changes in historical indicators of economic activity precede or are correlated with changes in creative clusters.

This leaves open the question of the mechanisms through which Communal institutions favor the production and accumulation of creative talent. Section A.I of the Appendix provides a brief narrative of a few European cities that became amazing creative clusters in specific periods. These narratives suggest the following considerations about how participatory city institutions and local autonomy might have influenced innovation and creativity.

First, the protection of personal and economic freedoms and an inclusive environment changed the local culture, making it more receptive to innovations and new ideas, enhancing the importance of the common good over particularistic interests, and fostering the appreciation of individual achievements in creative endeavors. Second, the new institutions also changed incentives, through a more meritocratic and less rigid social environment, but also by encouraging works of art and innovations that would enhance the prestige of the city. The Italian Renaissance period exemplifies these two mechanisms. Third, free cities attracted talented and creative individuals who escaped censorship and persecution elsewhere, and this created role models and facilitated social learning, breeding new generations of innovators. Venice, which attracted large numbers of creative immigrants from Greece and Turkey, but also from several European cities, stands out as an example of this mechanism (De Maria, 2010). The inflow of Jews into Vienna from all over the Habsburg empire, after travel restrictions were removed in the mid-nineteenth century, is another example (Weinzierl, 2003). Fourth, the political priority given to the protection of the interests of merchants facilitated the emergence of market infrastructures and exchange networks that could also be exploited for creating a market for works of art. The history of Dutch and Belgian cities, such as Bruges, Antwerp and Amsterdam in the fifteenth, sixteenth and seventeenth centuries, is an important example. These mechanisms are not mutually exclusive. Assessing their relative importance and understanding how they operate in different circumstances is an important task for future research, also to assess the external validity of these findings for modern economic development.