1 Introduction

Galor (2009) and Ray (2010), among others, illustrate mechanisms through which not only the level but also the distribution of economic prosperity greatly matters for the institutional stability and economic growth of nations in the long run. As argued in Glaeser and Henderson (2017), the increasingly uneven urban development of emerging economies such as China, India, and Nigeria appears as one of the crucial challenges of our times. The reasons are plenty and connected in a complex web of causalities, including the direct effects of urbanization and urban concentration on poverty alleviation, access to basic services, employment possibilities, socio-political tensions, pollution and the environment (see, e.g., Ravallion 2015; Sun et al. 2016). Concerning socio-political tensions, the mounting rage felt by the impoverished provinces towards the so-called cosmopolitan elites of the capitals of the world is a phenomenon very much at the center of the rise of populism in recent times (see, e.g., Eatwell and Goodwin 2018).

Our work is chiefly motivated by the need to understand dynamics of urban development which, to the best of our knowledge, are so far unexplored.Footnote 1 Taking a stylized approach in the tradition of coalition formation and threshold models of social interaction,Footnote 2 we model the agglomeration of a population into cities of endogenous number and size and study how their relative size changes when a larger fraction of the population moves from rural to urban areas. Our equilibrium predictions include a U-shaped relation between the level of urbanization (i.e., the fraction of the population living in urban areas) and urban primacy (i.e., the fraction of the urban population living in the largest city), a hypothesis that we test empirically using World Bank data. The economic, environmental, and socio-political implications of such a U-shaped trend are extensive. Assuming the bottom of the U was reached sometime in the late 20th century, we should now expect the urban population (and thus economic activity, wealth, and power) to increasingly concentrate in the capital city rather than the provinces. While this may feel true for many industrialized countries, the bottom of the U may not have been reached yet in many developing countries, which may thus expect the opposite trend of empowerment of the provinces.

To illustrate the relevance of our U-shaped hypothesis in a historical example, consider the long-run effects of the industrial revolution on the level of urbanization and urban primacy in the United Kingdom over the last two to three centuries. Roughly speaking, before the industrial revolution, a large share of non-agricultural activity was concentrated in London and focused on services related to trade. When the industrial revolution took off, economic activity started diversifying across sectors and spreading geographically north towards growing industrial clusters such as Birmingham (automotive), Manchester (textile), and Newcastle (shipbuilding and steel).Footnote 3 By the mid-20th century, these peripheral centers had reached levels of economic prosperity never witnessed before, but their economic growth reached an apex sometime in the mid-1970s and never recovered. With their economic decline becoming evident and urbanization still on the rise, London reacquired its uncontested centrality in the last decades, in line with the general pattern of the “renaissance of the metropolis” across the developed world (Glaeser 2011). To summarize, as time progresses, we observe a steadily increasing trend in the level of urbanization and a U-shaped trend in the level of urban primacy, with London remaining the largest city through the whole time span.

While our (anecdotal, and later statistical) evidence is only suggestive, it is remarkable that related fields in the literature found similarly U-shaped correspondences between the degree of concentration and the level of mobilization of resources akin to economic development. One group includes, among others, Imbs and Wacziarg (2003) for GDP per capita and concentration of economic activity across industrial sectors, and related papers on GDP per capita and sectoral concentration of exports.Footnote 4 Another cluster revolves around inequality of income (or wealth) and GDP per capita as documented for instance by Piketty and Saez (2003) and Saez and Zucman (2016), which points directly against the well-known Kuznets hypothesis. Our paper provides a theory for this U-shaped relation in the context of urban development. While we do not claim one-to-one portability across fields, there are obvious interconnections between the distributions of people across space, those of people and resources across industrial sectors, and those of resources across people.

While other predictions of our model are broadly in line with stylized facts of urban economics,Footnote 5 our U-shaped hypothesis directly contradicts the line of inquiry that, in reminiscence of the Kuznets curve, postulates an inverted U-shaped relation between urban primacy and the level of urbanization.Footnote 6 Henderson (2003) provides an extensive review of this literature arguing that, although the inverted U-shape may still be present in the 1985–1995 decade, it is much noisier than in the 1965–1975 decade and it may be fading away in recent times (see, e.g., his Figure 4). We believe this fading effect is partly related to the aforementioned “renaissance of the metropolis” and the tumultuous urban transformation of certain emerging economies. Hence, in our view, there is scope for further debate on the empirical relation between urban primacy and urbanization, particularly in light of the opposite projections on future trends in urban concentration and the drastically different policy responses they may require.

Let us describe our framework in more detail. Our theoretical model considers a continuum of agents scattered on a territory, where the set of agents inhabiting a location is called a city if it has positive mass and a village (or a solitary settlement) otherwise. For simplicity we assume that all locations are equally distant from each other, thus abstracting from the spatial dimension and focusing exclusively on the distribution of population across locations. With some narrative license, our agents can be interpreted as entrepreneurs with different business plans that are heterogeneous in their degree of ambition.Footnote 7 The core intuition is that while ambitious plans can lead to higher profits, they are more difficult to launch, requiring more supportive stakeholders at early stages of implementation. Reflecting various frictions, these initial supporters are typically local and thus larger cities are more likely to provide the critical mass necessary to realize more ambitious plans.Footnote 8 We thus assume that the ambition of an agent is a threshold (or type) such that her business plan is operative if and only if she inhabits a city of size larger than or equal to this type. Acknowledging that larger cities also typically lead to higher crowding costs (e.g., higher rents, congestion, pollution, etc.), we then model the agent’s preferences such that her crowding cost is minimized conditional on her business plan being launched.

We define an urban distribution as a partition of the set of agents into cities and villages, and we call it an equilibrium if no agent prefers to leave her location. As the presence of agents in a city constitutes the very incentive for more agents to locate there, this naturally leads to the multiplicity of equilibria and potential coordination failures that are typical of the development discourse.Footnote 9 We characterize the broad set of equilibria showing that the equilibrium distribution is always determined by an algorithm that lends itself to intuitive visualization in a simple diagram. Specifically, in every equilibrium, agents are sorted such that cities correspond to different intervals of types while villages are inhabited by the lowest and the highest (but non-utilized) types.Footnote 10 Under mild restrictions, this implies that a larger city size positively affects the dispersion of the residents’ utility but not necessarily the mean, in line with the empirical observations in Eeckhout et al. (2014) and Gaubert (2018).Footnote 11

Further analysis shows that the distribution of agents that maximizes utilitarian welfare is necessarily an equilibrium, and this equilibrium must be cost-efficient in the sense that it minimizes the aggregate crowding cost for given profits of each agent. This insight delivers a one-to-one mapping between the levels of urbanization (i.e., the fraction of agents living in cities instead of villages) and the set of cost-efficient equilibria. A crucial feature of cost-efficient equilibria is the presence of an infinite number of arbitrarily small cities and a limited number of bigger cities of heterogeneous size, where the number of cities whose size falls within the interval \([s, s + k]\) naturally decreases in s for any k (ignoring any intervals entirely devoid of cities). We feel this delivers a fairly realistic and tractable framework roughly in line with the empirical evidence on the frequency of city sizes in relation to Gibrat’s and Zipf’s laws.Footnote 12

Focusing on cost-efficient equilibria, we then engage in comparative statics that are relevant for urban development in the short to long run. We consider population replications that increase the mass of agents, and shifts in the distribution of ambition that lead to first-order stochastic dominance and mean-preserving spreads. In the short run,Footnote 13 we determine that increases in the mass of agents systematically reduce urban primacy (i.e., the share of the urban population living in the largest city), upward shifts in the distribution of ambition have the opposite effect, while higher inequality in the distribution of ambition always leads to higher (lower) urban primacy if the level of urbanization is sufficiently high (low). By contrast, we find that the long run effects depend on specific assumptions and no general pattern can be discerned, with one crucial exception: we fully characterize how a change in the level of urbanization should affect urban primacy. Under fairly general conditions, this delivers a U-shaped relation between urban primacy and the level of urbanization across cost-efficient equilibria for any given distribution of ambition. We view this U-shaped relation to be the principal testable prediction of our paper, and using openly accessible World Bank data across all countries of the world from 1960 to 2016, we provide preliminary evidence in support of this hypothesis.

The paper develops as follows. Section 2 defines the basic model. The core equilibrium and welfare analyses are in Sect. 3, the comparative statics in Sect. 4, and the empirics in Sect. 5. Section 6 concludes. All proofs can be found in the Appendix.

2 Model

2.1 Urban distributions

We consider a continuum of agents of mass \(a>0\), denoted by the set \(A\subset \mathbb {R}\). These agents are distributed on a territory constituted by an arbitrarily large set of locations. Assuming all locations are identical and abstracting from spatial distances, we define an urban distribution as a partition of A into a collection of sets of zero mass (villages) and positive mass (cities). We denote by \(\mathcal {D}\) the set of all urban distributions of agents (i.e., the set of all possible partitions of A). Note that, solely by our definition of city as a set of agents of positive mass, any urban distribution has countably many cities; these cities can be ranked in terms of the mass of agents they contain, and there can be multiple cities with equal mass of agents.

Let \(D\in \mathcal {D}\) be any urban distribution. For each possible rank \(k\in \mathbb {N}\) of a city in terms of mass of agents, we denote by \(n_k^D\in \mathbb {N}\cup \{0\}\) the number of cities ranked k and by \(m_k^D\in \mathbb {R}_+\) the mass of agents contained in each of them. If the number of cities in the urban distribution D is finite we write \(m_k^D=n_k^D=0\) for all ranks k larger than the rank of the city with the smallest mass of agents. Then, the structure of D is summarized by the sequence \(\mathcal {S}(D):=\left( m_k^D, n_k^D\right) _{k=1}^{\infty }\).Footnote 14

We define the level of urbanization of \(D \in \mathcal {D}\) as the fraction of agents who are urban,

$$\begin{aligned} \mathcal {U}(D):=\frac{1}{a}\sum _{k=1}^\infty n_k^D m_k^D. \end{aligned}$$

We think of the degree of urban concentration as a measure of the inequality of the distribution of the mass of the urban agents across cities. By the principle of transfers (i.e., the defining property of an inequality measure) urban concentration should not increase whenever a positive mass of agents is relocated from a larger city to a smaller city (or to a village that becomes a city), as long as this transfer is small enough so that the receiving city or village does not become larger than the providing city. It seems also desirable that a measure of urban concentration is scale invariant, in the sense that it remains constant whenever the mass of agents in each city is multiplied by the same positive factor (so that the proportions of mass of agents across cities are maintained). A measure of urban concentration that satisfies these properties is the generalized Herfindahl-Hirschman Index,

$$\begin{aligned} \mathcal {K}(D):=\sum _{k=1}^\infty n_k^D \phi \left( m_k^D/\sum _{k=1}^\infty n_k^D m_k^D\right) , \end{aligned}$$

where the function \(\phi :\mathbb {R}_+\rightarrow \mathbb {R}_+\) satisfies \(\phi (0)=0\) and it is differentiable, increasing and strictly convex. Finally, we define the level of urban primacy as the fraction of urban population that inhabits one of the largest cities,

$$\begin{aligned} \mathcal {P}(D):= m_1^D/\sum _{k=1}^\infty n_k^D m_k^D. \end{aligned}$$

Urban primacy is a crude but popular measure of urban concentration that is sensitive only to transfers of urban agents that involve the largest cities. As we will see, these three measures of urban development are intimately related to the workings and predictions of our model. Specifically, the measure of urban concentration \(\mathcal {K}(D)\) will be crucial for the interpretation of the cost-efficient equilibria our analysis will focus on, while the level of urbanization \(\mathcal {U}(D)\) and the level of urban primacy \(\mathcal {P}(D)\) will be the core ingredients of our U-shaped prediction.
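The three measures can be sketched numerically as follows. This is a minimal illustration rather than part of the model: the function names, the representation of \(\mathcal {S}(D)\) as a finite list of \((m_k^D, n_k^D)\) pairs, and the default choice \(\phi (x)=x^2\) (the standard Herfindahl-Hirschman case) are assumptions made here, and the sketch presumes at least one city so that the urban mass is positive.

```python
from typing import Callable, Sequence, Tuple

# Assumed representation: (m_k, n_k) pairs ranked by decreasing mass m_k.
Structure = Sequence[Tuple[float, int]]


def urban_mass(structure: Structure) -> float:
    """Total mass of agents living in cities: sum_k n_k * m_k."""
    return sum(n * m for m, n in structure)


def urbanization(structure: Structure, a: float) -> float:
    """Level of urbanization U(D): urban mass divided by total mass a."""
    return urban_mass(structure) / a


def concentration(structure: Structure,
                  phi: Callable[[float], float] = lambda x: x ** 2) -> float:
    """Generalized Herfindahl-Hirschman index K(D) for a convex phi."""
    total = urban_mass(structure)
    return sum(n * phi(m / total) for m, n in structure)


def primacy(structure: Structure) -> float:
    """Urban primacy P(D): share of the urban mass in one largest city."""
    return structure[0][0] / urban_mass(structure)


# Hypothetical structure: one city of mass 0.4 and two cities of mass 0.1.
example = [(0.4, 1), (0.1, 2)]
```

With total mass \(a=1\), the hypothetical structure above has urban mass 0.6, so urbanization is 0.6, primacy is 2/3, and the index with \(\phi (x)=x^2\) is \((2/3)^2+2(1/6)^2=1/2\).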

2.2 Preferences

We think of the agents in our model as entrepreneurs, each endowed with a different idea or business plan. These business ideas are heterogeneous in their degree of ambition, which affects both profits and implementability. More ambitious plans potentially lead to higher profits but require a higher critical mass of initial stakeholders (investors, customers, etc.) to become operative. We assume that, due to various frictions related to distance, these initial stakeholders are necessarily local and that larger cities can provide more (varied) resources. For each agent \(i\in A\), we denote by the threshold \(t_i\in \mathbb {R}\) the minimum city size that allows her business plan to be realized, so that agent i makes profits if and only if she inhabits a city of mass larger than or equal to \(t_i\). We refer to \(t_i\) as the type of agent \(i\in A\), which is the critical mass required to implement her business plan and indicates her level of ambition.Footnote 15

Our definition of agents’ preferences is schematic but at the same time relatively general. We shall assume that each agent always prefers to make profits to not making profits, and because of increasing crowding costs she will prefer to live in the smallest available city that allows her to make profits. If she is unable to make profits in any available city, she will prefer to live in a village. These statements fully characterize the preferences that we will use in our general analysis, which are lexicographic with ‘making profits’ as the primary criterion and ‘minimizing the crowding cost’ as the secondary one.Footnote 16 The basic idea is that, while an agent’s profits may increase steeply in her degree of ambition, they should be relatively independent of the mass of the city she inhabits (once her business plan is operative) which seems to be a plausible simplification if a business operates on a national or global scale.

We now define the central element of our model, the distribution of types. For each possible city mass \(m\in [0,a]\), we denote by F(m) the total mass of agents whose types are lower than or equal to m, so that they all can make profits in any city of size m or larger. This cumulative mass function \(F: [0,a] \rightarrow [0,a]\) is non-decreasing by construction and we shall assume it is increasing and twice differentiable on the pre-image of [0, a), so that there is a density function \(f(m):= d F(m) /dm\) that is positive and differentiable on such a domain. Denoting by \(\overline{m}_F\) the smallest \(m\in [0,a]\) such that \(F(m)=a\), we can then write \(f(m)>0\) if \(m< \overline{m}_F\) and \(f(m)=0\) if \(m\ge \overline{m}_F\).

Our examples of distributions of types will primarily focus on the case of \(a=1\), making use of well-known distributions from probability theory. A convenient example distribution is the Beta density

$$\begin{aligned} f(m)=\frac{m^{\alpha -1} (1-m)^{\beta -1}}{\int _0^1x^{\alpha -1} (1-x)^{\beta -1} d x}, \end{aligned}$$

whose cumulative mass function satisfies \(F(0) =0\) and \(F(1) =1\) for all parameter configurations \(\alpha ,\beta >0\). Another convenient distribution is based on the Gumbel density

$$\begin{aligned} f(m)=\frac{1}{\beta } e^{- (m-\alpha )/\beta - e^{ - (m-\alpha )/\beta } }, \end{aligned}$$

which substantially differs from the Beta as \(F(0) > 0\) and \(F(1) < 1\) for all parameter configurations \(\alpha \in \mathbb {R}\), \(\beta \in \mathbb {R}_{++}\). Note that by \(F(0) > 0\) there is a positive mass of types (non-positive) that can make profits even in villages, while by \(F(1) < 1\) there is a positive mass of types (larger than \(a=1\)) that cannot make profits in any contingency. With the aforementioned Beta distribution, instead, by \(F(0) =0\) and \(F(1) =1\) such cases have zero mass.

2.3 Welfare

We now present the various welfare criteria that we will employ in our analysis. Let \(D, D' \in \mathcal {D}\) be any pair of urban distributions. We say that D Pareto dominates \(D'\) if a positive mass of agents prefers D to \(D'\) while no positive mass of agents prefers \(D'\) to D. While Pareto dominance leads to unquestionable welfare rankings, it typically leaves many pairs of urban distributions unranked. Hence, to sharpen our predictions, we impose some more structure. Let the function \(\pi :\mathbb {R} \rightarrow \mathbb {R}_{+} \) define the potential profits of each agent depending on her type, and let the function \(c: \mathbb {R}_+ \rightarrow \mathbb {R}_+ \) define the crowding cost of each agent depending on the mass of the city that she inhabits. We shall assume that these functions are twice differentiable and c satisfies \(c(0)=0\), is increasing and weakly convex, and that \(\pi (x)> c(x)\) for all \(x\in [0,a]\).Footnote 17 We can now represent the preferences of each agent \(i\in A\) by the utility function

$$\begin{aligned} u(t_i,m_{r(i)}^D)= \pi (t_i) I(t_i\le m_{r(i)}^D)-c(m_{r(i)}^D), \end{aligned}$$

in which \(m_{r(i)}^D\) denotes the mass of the city inhabited by agent i in the urban distribution \(D\in \mathcal {D}\) and \(I(t_i\le m_{r(i)}^D)\) is an indicator function that takes value 1 if \(t_i\le m_{r(i)}^D\) and 0 otherwise.Footnote 18 Fig. 1 is an illustration of these ideas.

Fig. 1

The solid lines in the left, central and right panels, respectively, represent the potential profits \(\pi (t)=.2+.8 \sqrt{t}\) of an agent of type \(t\in [0,1]\), the actual profits \(\pi (t) I(t \le m)\) of an agent of type \(t=.25\) in a city of size \(m\in [0,1]\), and the crowding cost \(c(m)=.9 m^2\) of an agent in a city of size \(m\in [0,1]\). Note that these specifications of potential profits, actual profits and crowding cost are consistent with our restrictions on preferences given \(a=1\)
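The utility function can be evaluated directly for the specifications used in Fig. 1. The sketch below is only illustrative: the function names are chosen here, and types are restricted to \(t\in [0,1]\) as in the figure, since the square root is not defined for negative types.

```python
import math


def potential_profit(t: float) -> float:
    """Potential profits pi(t) = .2 + .8 * sqrt(t), as in Fig. 1 (t in [0, 1])."""
    return 0.2 + 0.8 * math.sqrt(t)


def crowding_cost(m: float) -> float:
    """Crowding cost c(m) = .9 * m^2, as in Fig. 1."""
    return 0.9 * m ** 2


def utility(t: float, m: float) -> float:
    """u(t, m) = pi(t) * I(t <= m) - c(m) for an agent of type t in a location of mass m."""
    operative = 1.0 if t <= m else 0.0
    return potential_profit(t) * operative - crowding_cost(m)
```

For an agent of type \(t=.25\), the smallest city in which her plan is operative (mass .25) yields a higher utility than a larger city of mass .5, while a city of mass .2 only imposes a crowding cost and is therefore worse than a village of mass 0.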

We say that an urban distribution \(D\in \mathcal {D}\) is cost-efficient if, for a given level of urbanization, it is not possible to decrease the aggregate crowding costs

$$\begin{aligned} C(D):=\sum _{k=1}^{\infty } n_k^D m_k^D c(m_k^D) \end{aligned}$$

without decreasing the profits of some agent. Note that the constrained minimization of C(D) is equivalent to the minimization of urban concentration in the form of the generalized Herfindahl-Hirschman Index \(\mathcal {K}(D)\), as urbanization is held constant in such minimization. Finally, we say that an urban distribution is welfare-efficient if it maximizes utilitarian welfare, which, for each \(D\in \mathcal {D}\), is defined by the average utility

$$\begin{aligned} W(D):=&\frac{1}{a}\int _{i\in A} u(t_i,m_{r(i)}^D) d i \\ =&\frac{1}{a}\int _{i\in A} \pi (t_i) I(t_i\le m_{r(i)}^D) d i - \frac{1}{a} C(D). \end{aligned}$$

Note that cost-efficiency is a necessary condition for welfare-efficiency.

3 Equilibrium and welfare analysis

In this section we develop the core theoretical results, characterizing the subset of urban distributions to be used in the comparative statics analysis. Specifically, we start by characterizing the set of equilibria and then proceed by pinning down the subset of equilibria that are cost-efficient, arguing that the welfare-efficient urban distribution is one of them.

We say that an urban distribution \(D\in \mathcal {D}\) is an equilibrium if no agent prefers to move from her city or village to another existing city or village. The basic idea is that individuals are free to move from one location to another but—being of sub-atomic size—take the existence and size of cities as given.

We say that an urban distribution \(D\in \mathcal {D}\) is assortative if each of the following conditions holds: (i) for each rank \(k\in \mathbb {N}\), the type of an agent inhabiting a city of mass \(m_k^D\) takes a value in \(\left( m_{k+1}^D,m_k^D\right] \); (ii) the type of an agent inhabiting a village takes a value in \(\left( -\infty ,0\right] \) or \( \left( m_1^D,+\infty \right) \). So, by assortativeness agents are segregated into cities according to their types, guaranteeing that each agent inhabits the smallest city where she can make profits while villages are inhabited by a mix of highly ambitious and highly unambitious agents.

We say that an urban distribution \(D\in \mathcal {D}\) has nested structure if \(F(m_{k+1}^D)=F(m_k^D)-n_k^D m_k^D\) for each rank \(k\in \mathbb {N}\), which is a recurrence relation that determines the series of masses of cities \(\left( m_k^D\right) _{k=1}^\infty \) given the largest city mass \(m_1^D\) and the series of numbers of cities \(\left( n_k^D\right) _{k=1}^\infty \). Intuitively, this nestedness condition is intimately related to assortativeness.
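The recurrence can be iterated numerically once F can be inverted. The following sketch does so for a Gumbel cumulative mass function with parameters \((\alpha ,\beta )=(0,.05)\) and \(a=1\); the function names, the truncation to finitely many ranks, and the stopping rule (no further city once \(F(m_k^D)-n_k^D m_k^D\le F(0)\)) are assumptions made for illustration.

```python
import math

ALPHA, BETA = 0.0, 0.05  # Gumbel parameters, chosen for illustration


def F(m: float) -> float:
    """Gumbel cumulative mass function (total mass a = 1)."""
    return math.exp(-math.exp(-(m - ALPHA) / BETA))


def F_inv(p: float) -> float:
    """Inverse of F on (0, 1)."""
    return ALPHA - BETA * math.log(-math.log(p))


def nested_masses(m1: float, counts: list) -> list:
    """City masses m_1, m_2, ... solving F(m_{k+1}) = F(m_k) - n_k * m_k.

    counts[k] is the number n_k of cities at each rank; the sequence is
    truncated at len(counts) ranks, or earlier if no positive mass remains.
    """
    masses = [m1]
    for n in counts[:-1]:
        remaining = F(masses[-1]) - n * masses[-1]
        if remaining <= F(0.0):  # types at or below 0 do not need a city
            break
        masses.append(F_inv(remaining))
    return masses


# One city per rank, largest city of mass 0.2, first 100 ranks.
masses = nested_masses(0.2, [1] * 100)
```

Summing the recurrence over ranks telescopes, so with one city per rank the cumulated mass of the first ranks approaches \(F(m_1^D)-F(0)\) as more ranks are added.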

Proposition 1

1. An urban distribution is an equilibrium if and only if it is assortative. 2. Each equilibrium has nested structure.

Note that, as all equilibria have nested structure, we can represent the structure of each equilibrium graphically using the recurrence relation of nestedness. In Figs. 2 and 3, we consider two examples of distributions of types and the graphical representations of the corresponding equilibria. Each of them is useful to identify critical points to be addressed in the subsequent analysis.

Figure 2 illustrates the structures of six equilibria for the Beta distribution with parameters \((\alpha ,\beta )=(2,5)\). Together with the equilibrium with no cities, the figure fully characterizes the set of all seven equilibria in this example. All six equilibria shown Pareto dominate the equilibrium with no cities, as they introduce new cities all else equal, and many other pairs of equilibria can be Pareto ranked (although not all of them).Footnote 19

Fig. 2

Given \(a=1\), F(m) corresponds to the cumulative mass function of the Beta distribution with parameters \((\alpha ,\beta )=(2,5)\). Each panel depicts the nested structure of a different equilibrium, where the solid lines indicate the sizes of the various cities

In the example of Fig. 2, Pareto rankings are evident because the equilibria have a very limited number of cities (at most three). In reality, we typically observe a much higher number of cities on the territory of a country and, given that we have a continuum of agents in our model (a convenient approximation of a large finite population), it may seem natural to expect infinitely many cities in equilibrium. This can be achieved with suitable restrictions on the distribution of types that are introduced in the next example.

Figure 3 illustrates the structures of three equilibria for the Gumbel distribution with parameters \((\alpha ,\beta )=(0,.05)\). As \(F(0)= e^{-1}\approx .37\), there is a positive mass of agents that can make profits in villages, and the nested structure of each equilibrium must be identified using the shifted cumulative mass function \(F(m)-F(0)\), represented by the dotted line. The maximum level of urbanization that can be achieved in equilibrium corresponds to the case of a single city of mass \(m^* \approx .63\) in the left panel, where \(m^*\) is determined by the equation \(F(m^*)-F(0)=m^*\). There are uncountably many other equilibria, at least one for each size of the largest city \(m\in (0,m^*]\), each presenting infinitely many cities and an urbanization level equal to \((F(m)-F(0))/a\). For instance, the central panel depicts an equilibrium with an infinite number of cities, each of different size, where the largest size is .2, while the right panel depicts another equilibrium with an infinite number of cities, each of different size except for the two largest ones, each of size .2. Note that there is no Pareto dominance across these three equilibria, although we may expect the equilibrium in the central panel to lead to higher welfare than the one in the right panel as it presents equal urbanization levels (which implies equal profits for all agents) while having much lower urban concentration (which implies lower aggregate crowding cost, by the weak convexity of c). These insights on efficiency and welfare will be formalized shortly, in Proposition 2. Before doing so, we briefly discuss desirable restrictions on the distribution of types.

Fig. 3

Given \(a=1\), F(m) corresponds to the cumulative mass function of the Gumbel distribution with parameters \((\alpha ,\beta )=(0,.05)\), represented by the solid curve, while the dotted curve represents \(F(m)-F(0)\). Each panel depicts the nested structure of a different equilibrium, where the vertical lines indicate the sizes of the various cities
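The two numbers quoted for this example, \(F(0)=e^{-1}\approx .37\) and \(m^*\approx .63\), can be reproduced with a few lines of code. The sketch below solves \(F(m^*)-F(0)=m^*\) by bisection; the function names and the bracketing interval \([.3, 1]\) are assumptions chosen here (on that interval the left-hand side minus the right-hand side is decreasing, so the root is unique).

```python
import math


def F(m: float, alpha: float = 0.0, beta: float = 0.05) -> float:
    """Gumbel cumulative mass function with the parameters of Fig. 3 (a = 1)."""
    return math.exp(-math.exp(-(m - alpha) / beta))


def excess(m: float) -> float:
    """F(m) - F(0) - m: positive when a city of mass m leaves types unhoused."""
    return F(m) - F(0.0) - m


def largest_single_city(lo: float = 0.3, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisection for the root m* of excess(m) on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m_star = largest_single_city()
```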

As suggested by the example in Fig. 3, one can show that, in our model, there exists an equilibrium with an infinite number of cities if and only if \(f(0)> 1\). Note that this implies the existence of \(\epsilon >0\) such that \(m< F(m)- F(0)\) for each \(m\in (0,\epsilon ]\), that is, there is an excess of agents who can make profits in a city of size smaller than or equal to \(\epsilon \) and cannot make profits in a village. In this spirit, we now consider a stronger condition on the distribution of types that allows us to focus on equilibria with an infinite number of cities for a broad set of urbanization levels.Footnote 20 We say that a distribution of types is non-constraining if \(m< F(m)- F(0)\) for each \(m\in (0,\overline{m}_F)\), which means that for each m in the pre-image of (0, a) there is an excess of agents who can make profits in a city of size m and cannot make profits in a village. This greatly simplifies the analysis, leading to the general properties of equilibria discussed in and after Remark 1. Before we turn to this discussion, however, we state a brief observation on the stability of the equilibria our analysis concentrates on.
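The inequality \(m< F(m)- F(0)\) is easy to inspect numerically. The sketch below checks it on a finite grid, which is a diagnostic rather than a proof; the function names, the grid resolution, and the argument `upper` (standing in for the right end of the interval on which the condition is required) are assumptions made here. The Beta(2, 5) cumulative mass function is written in its closed binomial-sum form, which is available because both parameters are integers.

```python
import math
from typing import Callable


def gumbel_cdf(m: float, alpha: float = 0.0, beta: float = 0.05) -> float:
    """Gumbel cumulative mass function with (alpha, beta) = (0, .05), a = 1."""
    return math.exp(-math.exp(-(m - alpha) / beta))


def beta_2_5_cdf(m: float) -> float:
    """Beta(2, 5) cumulative mass function on [0, 1] as a binomial tail sum."""
    return sum(math.comb(6, j) * m ** j * (1.0 - m) ** (6 - j) for j in range(2, 7))


def is_non_constraining(F: Callable[[float], float], upper: float, steps: int = 1000) -> bool:
    """Check m < F(m) - F(0) on an evenly spaced grid over (0, upper]."""
    grid = (upper * k / steps for k in range(1, steps + 1))
    return all(m < F(m) - F(0.0) for m in grid)
```

On this grid the Gumbel example satisfies the inequality up to \(m=.6\) but not up to \(m=.7\), consistent with the crossing point \(m^*\approx .63\), while the Beta(2, 5) example violates it already for small m because its density vanishes at 0.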

We motivate our focus on non-constraining distributions by the argument that, within our framework, they guarantee the existence of equilibria with the “realistic” feature of representing a high number of cities. Another way to motivate this focus is by the stability of the implied equilibria. The argument is that, if a distribution systematically leads to unstable outcomes, there will be forces—by evolution or design—pushing for a change towards stability. We briefly sketch the argument here, which is along the lines of Granovetter (1978) in our extended framework with crowding costs. For a given equilibrium, consider an exogenous marginal decrease in the size of a city. If, on the one hand, the distribution is non-constraining, such a marginal decrease does not affect the size of other cities of different size, as agents have no incentive to migrate to or from these cities. Conversely, it affects the size of other cities of equal size only minimally, as agents of these cities will marginally migrate to the perturbed city so that their sizes re-balance. In this sense, the equilibrium may thus be considered stable. If, on the other hand, the distribution is constraining, in a typical equilibrium there must be a city whose size is determined by F crossing the 45 degree line from below (for an example, see Fig. 2 where this is the case for all equilibria except the one with no cities and the one with a single city containing the whole population). One can show that a marginal decrease in the size of such a city then leads to a chain reaction so that all residents leave the perturbed city for the villages. As this drastically alters the structure of the equilibrium, such a situation may thus be considered unstable.

Remark 1

Given that the distribution of types is non-constraining:

1. For each \(m \in (0,a-F(0))\), there exists an equilibrium with size of the largest city equal to m, infinite number of cities, and level of urbanization equal to \((F(m)-F(0))/a \) if \(m\le \overline{m}_F\) and equal to \((a-F(0))/a \) if \(m> \overline{m}_F\).

2. There exist multiple equilibria exhibiting up to \(n\in \mathbb {N}\) cities of same size \(m \in (0,a-F(0))\) if and only if \(n m\le F(m)-F(0)\).

Recall that, in the example of Fig. 2, certain equilibria Pareto dominate others because they create new cities all else equal. Conversely, while there is no Pareto dominance across the equilibria of Fig. 3, we may expect the equilibrium in the central panel to lead to higher welfare than the one in the right panel, as it presents an equal urbanization level while having much lower urban concentration. These two intuitions are at the core of our welfare analysis.

We say that an urban distribution \(D\in \mathcal {D}\) has substantial structure if \(m_{1}^D \ge \underline{m}_F:= F^{-1}\left( \max _{m\in [0,\overline{m}_F]} [ F(m)-m ] \right) \), a condition which rules out particularly low levels of urbanization (e.g., no cities) because they are Pareto dominated.

We say that an urban distribution \(D\in \mathcal {D}\) has hierarchical structure if \(n_k^D=1\) for each rank \(k\in \mathbb {N}\) with \(m_k^D>0\), which means that there are no multiple cities of same size so that the aggregate crowding cost is minimized for a given urbanization level.

Proposition 2

Given that the distribution of types is non-constraining:

1. An equilibrium is cost-efficient if and only if it has hierarchical structure and the size of the largest city is lower than or equal to \(\overline{m}_F\).

2. An urban distribution is welfare-efficient only if it is an equilibrium (up to misallocation of zero mass of agents) that is cost-efficient and has substantial structure.
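The role of hierarchical structure can be illustrated by comparing the aggregate crowding cost of two equilibria with the same largest city size, one with a single largest city and one with two. The sketch below uses the Gumbel example with \((\alpha ,\beta )=(0,.05)\) and the crowding cost \(c(m)=.9 m^2\) from Fig. 1; the function names, the truncation to 100 further ranks, and the pairing of this cost function with this distribution are assumptions made purely for illustration.

```python
import math


def F(m: float, alpha: float = 0.0, beta: float = 0.05) -> float:
    """Gumbel cumulative mass function (a = 1)."""
    return math.exp(-math.exp(-(m - alpha) / beta))


def F_inv(p: float, alpha: float = 0.0, beta: float = 0.05) -> float:
    """Inverse of F on (0, 1)."""
    return alpha - beta * math.log(-math.log(p))


def crowding_cost(m: float) -> float:
    """Individual crowding cost c(m) = .9 * m^2 (assumed specification)."""
    return 0.9 * m ** 2


def equilibrium_structure(m1: float, n1: int, tail: int = 100) -> list:
    """(m_k, n_k) pairs from the nestedness recurrence, one city per rank after the first."""
    structure = [(m1, n1)]
    for _ in range(tail):
        m, n = structure[-1]
        remaining = F(m) - n * m
        if remaining <= F(0.0):
            break
        structure.append((F_inv(remaining), 1))
    return structure


def urban_mass(structure: list) -> float:
    return sum(n * m for m, n in structure)


def aggregate_cost(structure: list) -> float:
    """C(D) = sum_k n_k * m_k * c(m_k)."""
    return sum(n * m * crowding_cost(m) for m, n in structure)


hierarchical = equilibrium_structure(0.2, 1)      # a single largest city
non_hierarchical = equilibrium_structure(0.2, 2)  # two largest cities of mass 0.2
```

Up to truncation error, both structures urbanize the same mass \(F(.2)-F(0)\), but the hierarchical one carries a lower aggregate crowding cost.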

Proposition 2 formalizes the aforementioned intuitions on the optimality of substantial and hierarchical structures. Firstly, it states that cost-efficiency implies hierarchical structure, meaning that the urban distribution cannot present cities of equal size. The intuition is that, by the weak convexity of the cost function c and for a given mass of urbanized \(U>0\), the crowding cost

$$\begin{aligned} C(D)=\sum _{k=1}^{\infty } n_k^D m_k^D c(m_k^D) \end{aligned}$$

is effectively a measure of urban concentration belonging to the family of generalized Herfindahl-Hirschman indices,

$$\begin{aligned} \mathcal {K}(D)=\sum _{k=1}^\infty n_k^D \phi \left( m_k^D/\sum _{k=1}^\infty n_k^D m_k^D\right) , \end{aligned}$$

with \(\phi (m/ U)=m c(m)\). The crowding cost then naturally decreases when a larger city is dismantled (or reduced in size) by redistributing its population to smaller cities. This is exactly what happens in our model when transitioning from a non-hierarchical to a hierarchical equilibrium, implying a lower crowding cost. Secondly, Proposition 2 provides novel insights on the connection between the upper bound \(\overline{m}_F\) and the cost-efficient size of the largest city as well as the relation between welfare-efficiency and equilibrium (where the former implies the latter). The reason for the upper bound \(\overline{m}_F\) is best understood via the example in Fig. 4, which shows that increasing the size of the largest city above \(\overline{m}_F\) leaves urbanization (and the profits of each agent) unchanged while it increases urban concentration (therefore increasing the aggregate crowding cost). Finally, regarding the stated relation between welfare-efficiency and equilibrium in Proposition 2, the former implies the latter because there is an excess of agents in the population that can make profits in a city of any size, the distribution of types being non-constraining.Footnote 21 This implies that agents can always be rearranged so that there is no need to keep anyone in a city unwillingly, that is, it is efficient to keep an individual in a city only if such individual actually wants to be there. Hence, in our model the only source of inefficiency is miscoordination on the wrong equilibrium, as the efficient structure is an equilibrium itself and thus self-sustaining.

Fig. 4

Given \(a=1\), \(F(m)=.2+\sqrt{m}\) corresponds to the cumulative mass function of the shifted Beta distribution with parameters \((\alpha ,\beta )=(0,.5)\), represented by the solid curve, while the dotted curve represents \(F(m)-F(0)\) where \(\overline{m}_F=.64\). Each panel depicts the nested structure of a different equilibrium, where the vertical lines indicate the sizes of the various cities

Proposition 2 greatly simplifies the maximization of utilitarian welfare. Suppose that the distribution of types is non-constraining. By Proposition 2, a cost-efficient equilibrium is fully characterized by the mass of the largest city, and a welfare-efficient urban distribution must be a cost-efficient equilibrium that is substantial. Then, denoting by \(D^*(\mu _1)\in \mathcal {D}\) the cost-efficient equilibrium with mass of the largest city equal to \(\mu _1 \in [\underline{m}_F,\overline{m}_F]\), the maximization of utilitarian welfare can be simply stated as

$$\begin{aligned} \max _{\mu _1 \in [\underline{m}_F,\overline{m}_F]} W(D^*(\mu _1))&= \frac{1}{a}\int _{0}^{\mu _1} \pi (t) d F(t) - \frac{1}{a} \sum _{k=1}^{\infty } \mu _k c(\mu _k) \\ \text {s.t. } \mu _{k}&=F^{-1} \left( F(\mu _{k-1})- \mu _{k-1}\right) \text { for each } k\ge 2. \end{aligned}$$

It is noteworthy that, on the considered domain, choosing the size of the largest city \(\mu _1\) is equivalent to choosing the corresponding level of urbanization \(\mathcal {U}(D^*(\mu _1))=(F(\mu _1)-F(0))/a\), which by our previous considerations must take a value in

$$\begin{aligned} \left[ {\left( F(\underline{m}_F)-F(0)\right) }/{a}, {\left( a-F(0)\right) }/{a} \right] . \end{aligned}$$
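The recursive constraint in the maximization problem above can be traced numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the example \(F(m)=.2+\sqrt{m}\) with \(a=1\) from Fig. 4, for which the inverse is available in closed form, and all function names (`city_sizes`, `urbanization`, `primacy`) are illustrative choices introduced here.

```python
import math

A = 1.0    # mass of agents, a = 1 as in the example of Fig. 4
F0 = 0.2   # F(0) for the example F(m) = .2 + sqrt(m)


def F(m):
    """Example distribution of types from Fig. 4."""
    return F0 + math.sqrt(m)


def F_inv(y):
    """Closed-form inverse of the example F, valid for y >= F(0)."""
    return (y - F0) ** 2


def city_sizes(mu1, n_cities):
    """First n_cities sizes from mu_k = F^{-1}(F(mu_{k-1}) - mu_{k-1})."""
    sizes = [mu1]
    for _ in range(n_cities - 1):
        prev = sizes[-1]
        sizes.append(F_inv(F(prev) - prev))
    return sizes


def urbanization(mu1):
    """Level of urbanization (F(mu1) - F(0)) / a."""
    return (F(mu1) - F0) / A


def primacy(mu1):
    """Urban primacy mu1 / (F(mu1) - F(0))."""
    return mu1 / (F(mu1) - F0)


if __name__ == "__main__":
    # Largest city at the upper bound .64 reported in the caption of Fig. 4.
    print([round(s, 4) for s in city_sizes(0.64, 5)])
    print(urbanization(0.64), primacy(0.64))
```

Because \(F(\mu _{k})=F(\mu _{k-1})-\mu _{k-1}\), the partial sums of the city sizes telescope to \(F(\mu _1)-F(\mu _{N+1})\), so in this example they stay below \(F(\mu _1)-F(0)\) for any finite number of cities.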

Going back to our examples, one can show that each of the equilibria with hierarchical and substantial structure depicted in the left and central panels of Fig. 3 is welfare-efficient for some combination of cost and profit functions. This is because the corresponding distribution of types is non-constraining. Conversely, if the distribution of types is constraining such as the one in Fig. 2, it is possible that no equilibrium is welfare-efficient for a given combination of cost and profit functions.

4 Comparative statics of urban development

In this section we focus on welfare-efficient solutions and study how they should change with shocks to the fundamentals. Assuming F to be non-constraining, we exclusively consider cost-efficient equilibria, as the welfare-efficient urban distribution is one of them. Specifically, the two variables of interest are the level of urbanization and the level of urban primacy of cost-efficient equilibria, which can be written as

$$\begin{aligned} \mathcal {U}(D^*(\mu _1))= \left( F(\mu _1)-F(0)\right) /a \text { and } \mathcal {P}(D^*(\mu _1))= \mu _1/ \left( F(\mu _1)-F(0)\right) \end{aligned}$$

for each size of the largest city \(\mu _1 \in [0,\overline{m}_F]\). Note that \(\mathcal {U}(D^*(\mu _1))\) and \( \mathcal {P}(D^*(\mu _1))\) can be easily visualized graphically as the height of the function F evaluated at \(\mu _1\) (shifted by F(0) and divided by a) and the fraction of this height that lies below the \(45^{\circ }\) line, respectively.

In what follows, we divide our comparative static analysis into short run and long run considerations. The short run is defined by a fixed level of urbanization, and we assume that any shock summarized by a change in the distribution of types from \(F'\) to F maps each cost-efficient equilibrium given the old distribution \(F'\) into the unique cost-efficient equilibrium with the same urbanization level given the new distribution F. Within this framework, our short run analysis determines whether urban primacy should increase or decrease, depending on the specific shock. In the long run, we assume that urbanization can adjust to the welfare-efficient level (provided that coordination is achieved). While the analysis of the long run consequences of shocks to F does not lead to sharp predictions, we can fully determine the relation between the level of urbanization and the level of urban primacy across cost-efficient equilibria for any given distribution of types F. Intuitively, this relation is suggestive of the long run trends in the levels of urbanization and urban primacy of the welfare-efficient solution driven by shifts in the functions \(\pi \) and c, and more generally, of the relation between the levels of urbanization and urban primacy across different levels of development akin to the solution of coordination problems.

4.1 Short run considerations

In our short run analysis, we consider three shocks to the fundamentals that change the qualitative properties of the distribution of types.

We say that the distribution of types F is a population replication of the distribution of types \(F'\) corresponding to a mass of agents equal to a if there is \(k>1\) such that \(F(t)=k F'(t)\) for all \(t\in [0, a]\). Then, a population replication rescales the mass of agents by a factor of k while leaving the distribution of types unchanged (in relative terms).

We say that the distribution of types F is more ambitious than (first-order stochastically dominates) the distribution of types \(F'\) on [0, a] if each of the following conditions holds: (i) \(F(t)=F'(t)\) if \(t\in \left\{ 0, a \right\} \); (ii) \(F(t)< F'(t)\) if \(t\in (0,a)\). This means that high types are relatively more abundant in F than in \(F'\) (while low types are relatively scarcer).

We finally consider a mean-preserving spread that transfers mass from the center of a distribution to the sides, leaving the mean unchanged. Formally, we say that the distribution of types F is an expansion of the distribution of types \(F'\) on [0, a] if each of the following conditions holds: (i) \(F(t)=F'(t)\) if and only if \(t\in \left\{ 0, \int _{0}^a r d F(r), a \right\} \); (ii) \(\int _{0}^{t} F'(r)d r> \int _{0}^{t} F(r)d r\) for all \(t\in (0,a)\); (iii) \(\int _{0}^{a} r d F(r)= \int _{0}^{a} r d F'(r) \).

Proposition 3

Restricting attention to non-constraining distributions of types:

  1. If the distribution of types F is a population replication of \(F'\), urban primacy is lower in the cost-efficient equilibrium with F than in the cost-efficient equilibrium with \(F'\) for any given level of urbanization.

  2. If the distribution of types F is more ambitious than \(F'\) on [0, a], urban primacy is higher in the cost-efficient equilibrium with F than in the cost-efficient equilibrium with \(F'\) for any given level of urbanization.

  3. If the distribution of types F is an expansion of \(F'\) on [0, a], there is \(\lambda ^* \in (0,(a-F(0))/a)\) such that urban primacy is higher (lower) in the cost-efficient equilibrium with F than in the cost-efficient equilibrium with \(F'\) for any given level of urbanization that is higher (lower) than \(\lambda ^*\).

Figure 5 is an illustration of the results summarized by Proposition 3. The left panel considers a population replication that doubles the population and compares the old cost-efficient equilibrium with the new cost-efficient equilibrium with equal level of urbanization. As shown by the dotted lines, the size of the largest city is left unchanged, which implies that the level of urban primacy decreases with the population replication (it becomes half). This illustrates Point 1 above.
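The halving in the left panel can be checked directly from the expressions for urbanization and urban primacy given at the start of this section. The following is a short worked check, assuming (as the definition of a population replication suggests) that the mass of agents becomes \(ka\) when \(F=kF'\), and writing \(\mu _1\) for the size of the largest city:

$$\begin{aligned} \frac{F(\mu _1)-F(0)}{ka}&=\frac{k\left( F'(\mu _1)-F'(0)\right) }{ka}=\frac{F'(\mu _1)-F'(0)}{a}, \\ \frac{\mu _1}{F(\mu _1)-F(0)}&=\frac{1}{k}\cdot \frac{\mu _1}{F'(\mu _1)-F'(0)}. \end{aligned}$$

The first line shows that the same \(\mu _1\) delivers the same level of urbanization under both distributions, and the second that urban primacy is scaled by \(1/k\), that is, halved when \(k=2\).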

The central panel of Fig. 5 considers a shift in the distribution of types that leads the new distribution to first-order stochastically dominate the old. As shown by the dotted lines, for a fixed level of urbanization, the size of the biggest city is larger in the cost-efficient equilibrium of the new distribution, which implies that urban primacy is higher as predicted by Point 2 above.

Finally, the right panel of Fig. 5 considers a shift in the distribution of types that leads the new distribution to be an expansion of the old. As shown by the dotted lines, for a fixed level of urbanization, the size of the largest city is smaller in the cost-efficient equilibrium of the new distribution than in the corresponding equilibrium of the old. Moreover, this remains true for any old size of the largest city below .5 (the old size is .4 in the example), while the opposite would be true if the old size of the largest city were above .5. As the level of urbanization is increasing in the size of the largest city (see Point 1 of Remark 1), this illustrates Point 3 above.

Fig. 5

In the left panel the solid curve corresponds to the case \(a=1\), depicting the cumulative mass function of the Beta distribution with parameters \((\alpha ,\beta )=(0,.5)\), while the dotted curve depicts a population replication that doubles the mass of agents. The central panel focuses on \(a=1\), depicting the cumulative mass functions of the Beta distributions with parameters \((\alpha ,\beta )=(0,.5)\) (solid line) and \((\alpha ,\beta )=(0,.7)\) (dotted line), where the second distribution first-order stochastically dominates the first. The right panel also focuses on \(a=1\), depicting the cumulative mass functions of the shifted Beta distributions \(F'(m)=m+m^2(1-m)\) (solid line) and \(F(m)=m+m(1-m)^2\) (dotted line), where F is an expansion of \(F'\). Each panel depicts the nested structures of two different equilibria, where the vertical solid (dotted) lines indicate the sizes of the various cities that correspond to the equilibrium with the solid (dotted) cumulative mass function
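The comparison in the right panel can be reproduced numerically. The sketch below is illustrative only and rests on assumptions: it takes the two functions from the caption, \(F'(m)=m+m^2(1-m)\) and \(F(m)=m+m(1-m)^2\) with \(a=1\), inverts them by bisection on [0, 1] (both are increasing there), and compares urban primacy at a fixed level of urbanization. The helper names `largest_city` and `primacy_at` are hypothetical.

```python
def F_old(m):
    """Old distribution F'(m) = m + m^2 (1 - m) from the right panel of Fig. 5."""
    return m + m**2 * (1 - m)


def F_new(m):
    """New distribution F(m) = m + m (1 - m)^2, an expansion of F'."""
    return m + m * (1 - m) ** 2


def largest_city(F, level, a=1.0, tol=1e-12):
    """Bisection for mu1 solving (F(mu1) - F(0)) / a = level.

    Assumes F is increasing on [0, 1] and that the level is attainable there.
    """
    lo, hi = 0.0, 1.0
    target = level * a + F(0.0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def primacy_at(F, level, a=1.0):
    """Urban primacy mu1 / (F(mu1) - F(0)) at a given level of urbanization."""
    return largest_city(F, level, a) / (level * a)


if __name__ == "__main__":
    for level in (0.496, 0.625, 0.8):
        print(level, primacy_at(F_old, level), primacy_at(F_new, level))
```

In this example the two functions coincide at \(m=.5\), where both equal .625, so the sign of the primacy comparison switches at that level of urbanization, in line with Point 3 of Proposition 3.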

4.2 Long run considerations

We now consider long run trends in urban development, when the level of urbanization can adjust to the welfare-efficient level (provided that coordination is achieved). In principle, one can always identify the optimal level of urbanization by solving the constrained maximization problem stated at the end of Sect. 3. However, our attempts suggest that results crucially depend on specific assumptions on the functions F, \(\pi \) and c, and no general pattern emerges.Footnote 22

While we cannot generally predict whether urbanization increases or decreases in the long run as a consequence of shocks to F, we can determine how a change in the urbanization level should affect urban primacy across cost-efficient equilibria for a given F. Intuitively, by the constrained maximization problem at the end of Sect. 3, this analysis is suggestive of the long run trends in the levels of urbanization and urban primacy of the welfare-efficient solution due to rescaling of the functions \(\pi \) and c. More generally, it can indicate the relation between the levels of urbanization and urban primacy across different levels of development. In this context, we can think of developmental increments as the solution to coordination problems limiting the agglomeration of agents into cities. Recall that, in our model, the expectation of many agents inhabiting a city constitutes the very incentive for such agents to actually go and settle there.

Proposition 4

Let F be non-constraining. For each \(\mu _1 \in [0,\overline{m}_F)\), the relation between urban primacy and the level of urbanization of the cost-efficient equilibrium \(D^*(\mu _1)\) is such that a marginal increase in the urbanization level leads to an increase (decrease) in urban primacy if

$$\begin{aligned} f(\mu _1)< (>) \left( F(\mu _1)-F(0)\right) / \mu _1 . \end{aligned} \tag{1}$$

To appreciate Proposition 4, it is fundamental to give meaning to the two variables \(f(\mu _1) \) and \( \left( F(\mu _1)-F(0)\right) / \mu _1 \) which govern the long run relation between the level of urbanization and urban primacy across cost-efficient equilibria. On the one hand, \(f(\mu _1) \) is the marginal density of the urbanized types in the cost-efficient equilibrium \(D^*(\mu _1)\), which indicates the total mass of agents that would become urbanized if the level of urbanization was to be marginally increased. On the other hand, \( \left( F(\mu _1)-F(0)\right) / \mu _1\) is the average density of the urbanized types in such an equilibrium, which indicates the relative abundance of agents that can make profits in the largest city. We are now ready to grasp the intuition of Proposition 4. Note that, by the nature of cost-efficient equilibria, an increase in urbanization must go hand in hand with a proportional increase in the size of the largest city. All newly urbanized agents must be residents of the largest city, but these may or may not be enough to match the new size of the largest city, and consequent migration in or out of the largest city may be triggered. Note that such migration must necessarily be from or to the smaller cities, not the villages, therefore involving the urban population only. Thus, by changing the fraction of urban population that resides in the largest city, these population movements directly affect urban primacy. Specifically, when \(f(\mu _1)< \left( F(\mu _1)-F(0)\right) / \mu _1\), the mass of newly urbanized joining the largest city is relatively small, and a marginal increase in urbanization should lead to migration of agents from the smaller cities to the largest to fill in the vacant slots, thus increasing urban primacy. 
Conversely, when \(f(\mu _1)> \left( F(\mu _1)-F(0)\right) / \mu _1 \), the mass of newly urbanized joining the largest city is relatively large and the migration must go in the opposite direction, thus decreasing urban primacy.Footnote 23
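Condition (1) can also be read off from a direct differentiation of the expressions for \(\mathcal {U}(D^*(\mu _1))\) and \(\mathcal {P}(D^*(\mu _1))\) given at the start of this section. The following is a brief worked check rather than a substitute for the proof, assuming that F is differentiable at \(\mu _1\) with density \(f(\mu _1)>0\):

$$\begin{aligned} \frac{d\,\mathcal {U}(D^*(\mu _1))}{d\mu _1}&=\frac{f(\mu _1)}{a}, \\ \frac{d\,\mathcal {P}(D^*(\mu _1))}{d\mu _1}&=\frac{\left( F(\mu _1)-F(0)\right) -\mu _1 f(\mu _1)}{\left( F(\mu _1)-F(0)\right) ^2}. \end{aligned}$$

Since urbanization is increasing in \(\mu _1\) under this assumption, urban primacy rises with urbanization exactly when the numerator in the second line is positive, that is, when \(f(\mu _1)< \left( F(\mu _1)-F(0)\right) / \mu _1\).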

We now argue that, under fairly general conditions, the mechanism identified in Proposition 4 predicts a U-shaped relation between urban primacy and the level of urbanization in a cost-efficient equilibrium. While Proposition 5 identifies a sufficient condition to state this formally, Fig. 6 illustrates this in an example.

We say that a distribution of types F has a density f that is single-peaked on \( (0,\overline{m}_F)\) if there is \(m^* \in (0,\overline{m}_F)\) such that \(df(m)/dm >(<) 0\) if \(m<(>) m^*\) for all \(m\in (0,\overline{m}_F)\).

Proposition 5

Let F be non-constraining and satisfy \(f(\mu _1)= \left( F(\mu _1)-F(0)\right) / \mu _1 \) for some \(\mu _1 \in (0,\overline{m}_F)\).Footnote 24 If the density f is single-peaked on \( (0,\overline{m}_F)\), the relation between urban primacy and the level of urbanization of cost-efficient equilibria is U-shaped.

The crucial assumption behind Proposition 5 is that the density f is single-peaked on \( (0,\overline{m}_F)\), which we now argue to be a plausible property of a distribution of types. Consider an extension of our model where F is endogenously determined in a pregame interaction in which individuals choose their types by maximizing expected utility under strategic uncertainty on the formation of the urban distribution. Although this extension is far from obvious,Footnote 25 we can immediately see that certain predictions should hold generally and serve to justify the single-peakedness of f. Intuitively, if a distribution of types emerges from the maximization of expected utility, business plans of intermediate ambition should be the most common as they are close to the optimal compromise in the trade-off between higher profits and lower crowding costs. Conversely, highly or minimally ambitious plans should be relatively scarce due to excessive crowding costs and insufficient profits, respectively. So, in this setup, we should expect f to be single-peaked in the interior, and the peak of f should coincide with the ex-ante optimal type.

Fig. 6

Given \(a=1\), the dotted, dashed and solid lines respectively depict the (non-constraining) shifted Beta distribution \(F(m)=m+m^2(1-m)\), its density function \(f(m)=1+2m(1-m)-m^2\), and the level of urban primacy corresponding to the cost-efficient equilibrium with the largest city of size \(\mu _1=m\)
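The U-shape in the example of Fig. 6 can be verified with a few lines of code. The sketch below is a hedged illustration, not the authors' code: it uses the functions from the caption, evaluates condition (1) of Proposition 4 on a grid of sizes in (0, 1) (assumed here to lie within the admissible range for this example), and locates the size of the largest city at which urban primacy is lowest. The names `primacy_direction` and `grid` are introduced for illustration.

```python
def F(m):
    """Distribution of types from Fig. 6: F(m) = m + m^2 (1 - m)."""
    return m + m**2 * (1 - m)


def f(m):
    """Density from Fig. 6: f(m) = 1 + 2 m (1 - m) - m^2."""
    return 1 + 2 * m * (1 - m) - m**2


def urbanization(mu1, a=1.0):
    """Level of urbanization (F(mu1) - F(0)) / a."""
    return (F(mu1) - F(0)) / a


def primacy(mu1):
    """Urban primacy mu1 / (F(mu1) - F(0))."""
    return mu1 / (F(mu1) - F(0))


def primacy_direction(mu1):
    """Sign of the marginal effect of urbanization on primacy via condition (1)."""
    average_density = (F(mu1) - F(0)) / mu1
    if f(mu1) < average_density:
        return "increasing"
    if f(mu1) > average_density:
        return "decreasing"
    return "flat"


# Grid of candidate sizes for the largest city, excluding the endpoints.
grid = [k / 100 for k in range(1, 100)]
mu_min = min(grid, key=primacy)

if __name__ == "__main__":
    for mu1 in (0.2, 0.5, 0.8):
        print(mu1, urbanization(mu1), primacy(mu1), primacy_direction(mu1))
    print("primacy is lowest on the grid at mu1 =", mu_min)
```

For these functions the difference between the marginal and the average density simplifies to \(m(1-2m)\), so primacy falls with urbanization for \(m<.5\) and rises for \(m>.5\).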

As a final note, we wish to point out that the converse of Proposition 5 can also hold under different assumptions. Roughly speaking, if we consider a single-dipped density f (i.e., if there is \(m^* \in (0,\overline{m}_F)\) such that \(df(m)/dm < (>) 0\) if \(m<(>) m^*\) for all \(m\in (0,\overline{m}_F)\)), a Kuznets-type inverted U-shaped relation between urban primacy and level of urbanization is generated by the same arguments as in Proposition 4. While in the following section we concentrate on the U-shaped relation using 20th and 21st century observations, the opposite could follow from a bi-modal distribution of ambition ascribed to the lack of access to education of large parts of pre-20th century populations. More generally, Kuznets-type cycles of inverted U-shaped and then U-shaped relations between urban primacy and level of urbanization can be generated as a consequence of the introduction of new technologies and the subsequent growth of access to education for the use of such technologies (see Chapter 2 in Milanovic 2016 for a related approach).

5 An empirical pattern

To test the predicted U-shaped relation empirically, we base the analysis of this section on the World Bank’s dataset, topic “Urban Development,” which includes a panel reporting the levels of urbanization and urban primacy for each country in the world, annually from 1960 to 2016.Footnote 26 As predicted by Proposition 5, the scatter plot in Fig. 7 suggests a U-shaped empirical relation between the level of urbanization and urban primacy. While this scatter plot is based on cross-country average data, the rest of this section tests this hypothesis further using econometric analysis of a panel consisting of all 218 covered countries of the world through the last 60 years.

Our analysis is similar in spirit to the highly influential Imbs and Wacziarg (2003) on stages of economic development. They document a remarkably robust U-shaped relation between sectoral concentration and GDP per capita. Since industrial sectors typically cluster in specialized cities according to increasing returns from spatial proximity, and since higher levels of GDP per capita typically coincide with higher levels of urbanization as joint manifestations of higher levels of economic development, we would like to pose our model as a common theoretical foundation for the empirical observations in Imbs and Wacziarg (2003) and ours. With some caution, one may also link our prediction to the empirical U-shaped relation between the inequality of income (or wealth) and GDP per capita as documented for instance by Piketty and Saez (2003) and Saez and Zucman (2016). Intuitively, when economic resources concentrate in fewer cities and industries, it may also be that income concentrates in the hands of the fewer individuals who dominate these cities and industries.

Fig. 7

U-shaped cross-country relation between the average level of urbanization and the average urban primacy, where these averages are computed within each country across the years 1960–2016. Source: Own calculations based on World Bank data

In the following, our empirical strategy consists of a linear regression with the level of urban primacy of each country and year as dependent variable and the level of urbanization and the level of urbanization squared in the same country and year as the two main independent variables. We start by considering basic econometric specifications with robust standard errors with fixed effects for year and continent/country.Footnote 27 The resulting estimations are in Table 1.
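To make the specification concrete, the sketch below estimates a quadratic relation with country fixed effects on a synthetic panel. It is a minimal illustration under loud assumptions: the data are simulated and are not World Bank data, the coefficients `TRUE_B1` and `TRUE_B2` are arbitrary, urbanization is measured as a fraction, year effects and robust standard errors are omitted, and the helper names are hypothetical. It does not reproduce Table 1.

```python
import random

random.seed(0)

# Synthetic panel (NOT World Bank data): primacy is a quadratic in urbanization
# plus a country effect and noise. The coefficients are illustrative assumptions.
TRUE_B1, TRUE_B2 = -1.2, 1.0
rows = []  # (country, urbanization, primacy)
for country in range(30):
    effect = random.uniform(0.3, 0.6)
    start = random.uniform(0.1, 0.6)
    for year in range(20):
        u = start + 0.015 * year + random.gauss(0, 0.01)
        p = effect + TRUE_B1 * u + TRUE_B2 * u * u + random.gauss(0, 0.01)
        rows.append((country, u, p))


def demean_by_country(rows):
    """Within transformation: subtracting country means absorbs country fixed effects."""
    groups = {}
    for country, u, p in rows:
        groups.setdefault(country, []).append((u, u * u, p))
    out = []
    for obs in groups.values():
        n = len(obs)
        means = [sum(col) / n for col in zip(*obs)]
        out.extend(tuple(v - m for v, m in zip(o, means)) for o in obs)
    return out


def ols_two_regressors(data):
    """Solve the 2x2 normal equations for y = b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(x1 * x1 for x1, _, _ in data)
    s12 = sum(x1 * x2 for x1, x2, _ in data)
    s22 = sum(x2 * x2 for _, x2, _ in data)
    s1y = sum(x1 * y for x1, _, y in data)
    s2y = sum(x2 * y for _, x2, y in data)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


b1, b2 = ols_two_regressors(demean_by_country(rows))
turning_point = -b1 / (2 * b2)  # urbanization level at the bottom of the fitted U

if __name__ == "__main__":
    print("urbanization:", b1, "urbanization squared:", b2)
    print("turning point:", turning_point)
```

A negative coefficient on urbanization together with a positive coefficient on its square is the sign pattern that a U-shaped relation implies, and the ratio \(-b_1/(2b_2)\) gives the level of urbanization at which the fitted curve bottoms out.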

Table 1 Relation between urban primacy and urbanization in the world sample

As shown in columns (1) and (2), the specifications which do not include country fixed effects yield statistically significant estimations of the two coefficients of interest which are negative for urbanization and positive for urbanization squared, and are thus in line with our predictions. Most notably, the specification in column (2) with year and continent fixed effects confirms the U-shaped relation. These estimations are robust to marginal changes of the empirical specification such as excluding certain countries from the sample, e.g., those in the top-right corner of Fig. 7. However, when we introduce country fixed effects the evidence is somewhat weakened as the significance of the estimations depends on the exact empirical specification. For instance, the empirical pattern continues to hold as long as we exclude from the sample the countries that belong to the continent-label ‘Middle East and North Africa’, as shown in column (3), while the empirical pattern is blurred when these countries are included. Intuitively, dynamics other than those captured by our analysis may be at play in these countries as many of them have been systematically plagued by political turmoil, civil war, and international conflict.

One weakness of the above estimations is that, when we consider the relation between urban primacy and urbanization within a country and across time, the distribution of types is generally not constant as assumed in Proposition 5. This motivates our second empirical exercise, in which we add to a standard regression with country and year fixed effects a set of control variables roughly corresponding to the shocks to the distribution considered in Proposition 3. As within the World Bank’s dataset these controls are reported only for a relatively small subset of rich countries and recent years, we exclusively focus on the corresponding subsamples within Europe and Central Asia and the world.Footnote 28 The resulting estimations are shown in Table 2, which considers two alternative sets of three control variables as empirical proxies for the three shocks.

Table 2 Relation between urban primacy and urbanization on restricted samples with additional control variables

In these alternative specifications, ‘population replication’ is either population density or total population, ‘more ambition’ is either tertiary education expenditure (as % of total government expenditure on education) or tertiary education enrollment (as % of the age group that is entitled to enrollment), and ‘expansion’ is income inequality measured either as Gini coefficient or as income share held by the top 10%.Footnote 29 As shown in Table 2, no matter which set of controls we choose or whether we focus on ‘Europe and Central Asia’ or the world, our empirical estimations are systematically consistent with the U-shaped hypothesis.

To conclude, the econometric exercises in Tables 1 and 2 together with the scatter plot in Fig. 7 are suggestive of an empirical pattern that is consistent with the U-shaped hypothesis predicted by Proposition 5. We provide additional evidence in the Appendix, in which Fig. 8 demonstrates the robustness of the pattern across time (with scatter plots for the time periods 1960–1979, 1980–1999, 2000–2016) while Table 3 and Fig. 9 show that the U-shape persists with polynomial specifications of higher order. Arguably, our handful of plots and regressions are far from a comprehensive analysis as many alternative empirical specifications can be chosen in terms of, e.g., subsamples and control variables. However, in combination with the much more robust evidence in Imbs and Wacziarg (2003) on the U-shaped relation between sectoral concentration and the level of economic development and the related findings in Piketty and Saez (2003) and Saez and Zucman (2016), we believe this is sufficient to motivate our model as empirically relevant.

6 Conclusions

We take a novel approach to urban development in the tradition of threshold models of social interaction. In our model, the number and the sizes of cities are endogenously determined by the incentives of agents to freely move across municipalities, where settlers in larger cities face a trade-off between higher productivity and higher crowding costs. In this setup, we characterize the set of equilibria, study their welfare properties, and analyze the equilibrium relation between three key measures of urban development: urbanization, urban concentration, and urban primacy. One appealing feature of our model is that all equilibria are defined by a simple recursive algorithm that can be represented graphically with an intuitive diagram, and welfare-efficiency corresponds to an urban distribution with an infinite number of cities of heterogeneous size.

Focusing on welfare-efficient solutions (and the weaker concept of cost-efficiency, which does not require the level of urbanization to be welfare-efficient) we find that in the short run population replications tend to decrease urban primacy, while the short run effects on urban primacy of changes in population characteristics are positive if they come in the form of first-order stochastic dominance, and positive/negative depending on the high/low level of urbanization if they come in the form of mean-preserving spreads. Although we cannot generally pin down the long run effects of these shocks, we can fully determine how changing the level of urbanization should affect other variables. Assuming that the distribution of types is single-peaked in the interior, our findings suggest a U-shaped relation between the level of urbanization and urban primacy. We find preliminary confirmation of this prediction considering a panel of all countries of the world through the last 60 years.

Due to its simplicity and versatility, our model of urban development has potential for various applications and extensions. One possibility is to explore the conflict of interest across cities. While here we have focused on welfare-efficient solutions, in practice these may be difficult to implement because of the necessary compensation of the ‘losers’ using part of the gains of the ‘winners’ of a welfare improvement. As these compensatory transfers should occur across cities in our model, they may often be infeasible and motivate an analysis of second-best solutions. From an empirical viewpoint, an interesting application would be to estimate the distribution of types of a country from the distribution of city sizes assuming that the nestedness condition holds. This would allow for more extensive testing of our predictions as one could monitor how the estimated distribution of types changes across time and countries, and whether these patterns are broadly in line with what we know from other sources.