Minority Population in the One-Dimensional Schelling Model of Segregation

Schelling models of segregation attempt to explain how a population of agents or particles of two types may organise itself into large homogeneous clusters. They can be seen as variants of the Ising model. While such models have been extensively studied, unperturbed (or noiseless) versions have largely resisted rigorous analysis, with most results in the literature pertaining to models in which noise is introduced, so as to make them amenable to standard techniques from statistical mechanics or stochastic evolutionary game theory. We rigorously analyse the one-dimensional version of the model in which one of the two types is in the minority, and establish various forms of threshold behaviour. Our results are in sharp contrast with the case when the distribution of the two types is uniform (i.e. each agent has an equal chance of being of each type in the initial configuration), which was studied in Brandt et al. (in: STOC ’12: proceedings of the 44th symposium on theory of computing, pp. 789–804, 2012) and Barmpalias et al. (in: 55th Annual IEEE symposium on foundations of computer science, Oct 18–21, Philadelphia, FOCS’14, 2014).

in large cities. Perhaps the earliest agent-based model studied by economists, it has since become an archetype of agent-based modelling, featuring prominently in libraries of modelling software tools such as NetLogo [35] and often being the subject of experimental analysis and simulations in the modelling and AI communities [11,12,15,16,18,19,21,32,36]. Many versions of the model have been analysed theoretically, from a number of different viewpoints and disciplines: statistical mechanics [8,14] and [4,Sect. 3.1], evolutionary game theory [37,38,39,40], the social sciences [9,10,27], and more recently computer science and AI [1,5,7,13]. It was observed in [7], however, that despite the vast amount of work that has been done on the Schelling model in the last 40 years, rigorous mathematical analyses in the previous literature generally concern altered versions of the model, in which noise is introduced in the dynamics, i.e. where one allows that agents may, with small probability, make non-rational decisions that are detrimental to their welfare. The introduction of such 'perturbations' may be justifiable from a 'bounded rationality' standpoint.
The model (which will be formally defined shortly) concerns a population of agents arranged geographically, each being of one of two types. Each agent has a certain neighbourhood around it that it is concerned with, and also an intolerance parameter τ ∈ [0, 1] which we shall assume here to be the same for all agents. An agent's behaviour is dictated by the proportion of the agents in its neighbourhood which are of its own type. So long as this proportion is ≥ τ the agent may be considered 'happy' and will not move. Starting with a random configuration, one then considers a discrete time dynamical process. At each stage unhappy agents may be given the opportunity to move, swapping positions with another agent, so as to increase the proportion of their own type within their neighbourhood. Now one might justify a perturbed version of these dynamics, in which agents will occasionally move in such a way as to decrease their utility (i.e. the proportion of their own type within their neighbourhood) by arguing, for example, that it is reasonable to suppose that only incomplete information about the make-up of each neighbourhood is available to the agents. It is a fact, however, that (a) the methods used for the analysis of the perturbed models do not apply to the unperturbed model; (b) the segregation occurring in the perturbed models is very different from that in the unperturbed model.
In the unperturbed models the underlying Markov chain does not have the regularities that are found in the perturbed case (e.g. the Markov process is irreversible). The presence of a large variety of absorbing states means that entirely different and more combinatorial methods are now required. Beyond the basic aim of a rigorous analysis for these unperturbed models, which have been so extensively studied via simulations, further motivation is provided by the fact that the Schelling model is part of a large family of models, arising in a broad variety of contexts (spin glass models, Hopfield nets, cascading phenomena as studied by those in the networks community), all of which aim at understanding the discrete time dynamics of competing populations on underlying network structures of one kind or another, and for many of which the unperturbed dynamics are of significant interest. The hope is that techniques developed in analysing unperturbed Schelling segregation may pave the way for similar analyses in these variants of the model.
The first rigorous analysis of an unperturbed Schelling model was described by Brandt et al. in [7]. In this work it was also demonstrated that the eventual state of the process differs significantly from the stochastically stable states of the perturbed models. This study focused on the one-dimensional Schelling model and provided an asymptotic analysis, in the sense that the results hold with arbitrarily high probability for all sufficiently large neighbourhoods and population. More significantly, however, it dealt only with the symmetric case where the intolerance parameter is τ = 0.5 (i.e. an agent is happy when at least 50% of the agents in its neighbourhood are of its own type). In [1] a much more general analysis of the unperturbed one-dimensional Schelling model for τ ∈ [0, 1] was provided. In fact it was shown there that various forms of surprising threshold behaviour exist.
A significant symmetry assumption underlying the results in [1,7] is that the populations of the two types of agents are assumed to be uniform (i.e. each agent has an equal chance of being of each type in the initial configuration). Indeed, there is no rigorous study of the unperturbed spatial proximity model with swapping agents for the rather realistic case where the distribution of the two types of agents is skewed. In fact, the question as to what type of segregation occurs with a skewed population distribution was raised by Brandt et al. in [7,Sect. 4] as well as in popular expositions of the Schelling model like [20]. The purpose of the present work is to give an answer to this question. We show that complete segregation is the likely outcome if and only if the intolerance parameter is larger than 0.5. Moreover, in the case that the minority type is at most 25%, there is a dichotomy between complete segregation and almost complete absence of segregation.

Definition of the Model
Schelling's model of residential segregation belongs to a large family of agent-based models, where a system of competitive agents perform actions in order to increase their personal welfare, while possibly decreasing the welfare of other individuals. This phenomenon roughly corresponds to the so-called spontaneous order approach in the economics literature, which studies the emergence of norms from the endogenous agreements among rational individuals.
The Schelling model that we study is a direct generalisation of that in [7] and also that studied by the authors in [1]. The one-dimensional model with parameters n, w, τ, ρ (as listed in Table 1) is defined as follows. We consider n individuals which occupy an equal number of sites 0, . . . , n − 1 (ordered clockwise) on a circle. Each of the individuals belongs to one of the two types α and β. The type assignment of individuals is independent and identically distributed (i.i.d.), with each individual having probability ρ of being type β. Without loss of generality we always assume that ρ ≤ 0.5, i.e. that the individuals of type β are the expected minority (so long as ρ ≠ 0.5). This random type assignment takes place at stage 0 of the process, and defines the initial state. At the end of stage 0, we let ρ* be the actual proportion of the individuals that are of type β (i.e. ρ* is the random proportion, as opposed to the expected proportion ρ).
Unless stated otherwise, addition and subtraction on indices for sites are performed modulo n. Given two sites u, v in any configuration of the individuals on the circle, the interval [u, v] consists of the individuals that occupy sites between u and v (inclusive). For example, if 0 ≤ v < u < n then we let [u, v] denote the set of nodes [u, n − 1] ∪ [0, v] (while [v, u] is, of course, understood in the standard way). When we talk about a particular configuration, we identify each individual with the site it occupies, referring to both entities as a node. The neighbourhood of node u consists of the interval [u − w, u + w], where w is a parameter of the model that we call the (neighbourhood) radius. The tolerance threshold τ ∈ (0, 1) is another parameter of the model that reflects how tolerant a node is to nodes of a different type in its neighbourhood. We say that a node is happy if the proportion of the nodes in its neighbourhood which are of its own type is at least τ.
Given the initial type assignment (colouring) of the nodes, the Schelling process then evolves dynamically in stages as follows. At each stage s > 0 we pick uniformly at random a pair of unhappy nodes of different type, and we swap them provided that in both cases the number of nodes of the same type in the new neighbourhood is at least that in the original neighbourhood. If at some stage there are no further legal swaps the process terminates. If at some stage all nodes of the same type are grouped into a single block (i.e. a contiguous interval), we say that at that stage we have complete segregation.
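The dynamics just defined can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the simulations reported in this paper: the function names (`same_type_count`, `is_happy`, `initial_state`, `try_swap`, `run`), the encoding of the types α, β as the characters 'a', 'b', and the stage budget `max_stages` are all hypothetical. It assumes n > 2w + 1, so that a neighbourhood does not wrap around onto itself, and it stops only when one type has no unhappy nodes or the budget runs out (it does not test exhaustively whether any legal swap remains).

```python
import random

def same_type_count(state, u, w):
    """Count nodes in the neighbourhood [u - w, u + w] (sites mod n) that share u's type."""
    n = len(state)
    return sum(state[(u + d) % n] == state[u] for d in range(-w, w + 1))

def is_happy(state, u, w, tau):
    """Happy: the same-type proportion within the 2w + 1 neighbourhood is at least tau."""
    return same_type_count(state, u, w) >= tau * (2 * w + 1)

def initial_state(n, rho, rng):
    """Stage 0: each node is independently of type 'b' with probability rho, else 'a'."""
    return ['b' if rng.random() < rho else 'a' for _ in range(n)]

def try_swap(state, w, tau, rng):
    """One stage. Returns None if some type has no unhappy node,
    True if the sampled pair was swapped, False if the swap was rejected."""
    unhappy = {'a': [], 'b': []}
    for u in range(len(state)):
        if not is_happy(state, u, w, tau):
            unhappy[state[u]].append(u)
    if not unhappy['a'] or not unhappy['b']:
        return None
    # Independent uniform choices give a uniformly random pair of different types.
    u, v = rng.choice(unhappy['a']), rng.choice(unhappy['b'])
    before_a, before_b = same_type_count(state, u, w), same_type_count(state, v, w)
    state[u], state[v] = state[v], state[u]
    # The 'a' individual now sits at v and the 'b' individual at u.
    after_a, after_b = same_type_count(state, v, w), same_type_count(state, u, w)
    if after_a >= before_a and after_b >= before_b:
        return True
    state[u], state[v] = state[v], state[u]  # undo: the swap would hurt one of the two
    return False

def run(n, w, tau, rho, max_stages, seed=0):
    """Run the process for at most max_stages stages; return final state and swap count."""
    rng = random.Random(seed)
    state = initial_state(n, rho, rng)
    swaps = 0
    for _ in range(max_stages):
        outcome = try_swap(state, w, tau, rng)
        if outcome is None:
            break
        if outcome:
            swaps += 1
    return state, swaps

if __name__ == "__main__":
    final, swaps = run(n=200, w=5, tau=0.6, rho=0.3, max_stages=2000, seed=1)
    print(swaps, "".join(final))
```

Recomputing the unhappy nodes from scratch at every stage costs O(n · w) time, which keeps the sketch short; a serious simulation at the population sizes discussed later would maintain the neighbourhood counts incrementally.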
This completes the definition of the Schelling process with parameters n, w, τ and ρ, which we denote by the tuple (n, w, τ, ρ). The process can be seen as a Markov chain with 2^n states corresponding to the configurations that we get by varying the type of each node between α and β. A state is called dormant if either all α-nodes are happy, or all β-nodes are happy. We shall be interested in the case that w is large, and that n is large compared to w. In this context it will turn out that the absorbing states of the Schelling process are exactly the dormant states and, in fact, the only recurrence classes of the Schelling process are the dormant states and complete segregation. Complete segregation is, strictly speaking, a recurrence class of the process, consisting of the rotations of the two blocks, one consisting of all the α-nodes and the other consisting of all the β-nodes. Hence, modulo symmetries, we may regard complete segregation as an absorbing state. Dormant states are a different kind of absorbing state, as the process actually stops when it hits a dormant state. Note that a static process cannot come to complete segregation, since it does not allow a sufficient number of swaps for the complete separation of the node types on the ring to be formed, starting from a random state. Note that the number of nodes of type α and of type β does not change between transitions, once the initial state has been chosen.
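To make the two kinds of terminal behaviour concrete, the following sketch tests a given configuration for dormancy and for complete segregation, directly from the definitions above. The helper names (`is_happy`, `is_dormant`, `type_boundaries`, `is_completely_segregated`) and the encoding of α, β as 'a', 'b' are assumptions made for illustration only.

```python
def is_happy(state, u, w, tau):
    """Happy: the same-type proportion in the neighbourhood [u - w, u + w] is at least tau."""
    n = len(state)
    same = sum(state[(u + d) % n] == state[u] for d in range(-w, w + 1))
    return same >= tau * (2 * w + 1)

def is_dormant(state, w, tau):
    """Dormant: all 'a'-nodes are happy, or all 'b'-nodes are happy."""
    def all_happy(kind):
        return all(is_happy(state, u, w, tau)
                   for u in range(len(state)) if state[u] == kind)
    return all_happy('a') or all_happy('b')

def type_boundaries(state):
    """Number of sites u on the ring whose clockwise neighbour has the other type."""
    n = len(state)
    return sum(state[u] != state[(u + 1) % n] for u in range(n))

def is_completely_segregated(state):
    """Each type forms a single contiguous block, i.e. at most two type boundaries."""
    return type_boundaries(state) <= 2

if __name__ == "__main__":
    blocks = ['a'] * 6 + ['b'] * 6
    print(is_completely_segregated(blocks), is_dormant(blocks, 1, 0.7))
```

In the two-block toy configuration with w = 1 and τ = 0.7, the nodes at the block boundaries see only 2 of 3 neighbourhood nodes of their own type, so the state is completely segregated without being dormant. This is the small-scale analogue of the remark above that, for τ > 0.5, complete segregation is a recurrence class of rotating blocks rather than a state where the process stops.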

Our Results
Given the Schelling process (n, w, τ, ρ) we wish to determine with high probability the type of equilibrium that will eventually occur in the system. Given a constant c ≥ 0, we write 'for c ≪ w ≪ n' to mean 'for all w sufficiently large compared to c, and all n sufficiently large compared to w'; formally, we say that a result with parameters w, n holds for c ≪ w ≪ n if there exist functions c ↦ W_c, w ↦ N_w such that the result holds for each w > W_c and n ≥ N_w. We are interested in asymptotic results, i.e. statements that hold with arbitrarily high probability for 0 ≪ w ≪ n. The following definition encapsulates the type of asymptotic statements about the Schelling process (n, w, τ, ρ) that we are interested in establishing.
Definition 1 (Properties with high probability and static processes) Suppose that R is a property which may or may not be satisfied by any given run of the Schelling process (n, w, τ, ρ), and T is a property of τ, ρ. The sentence "if T(τ, ρ), then with high probability R(n, w, τ, ρ)" means that, provided that τ, ρ satisfy T, for all ε > 0 and 1/ε ≪ w ≪ n, the process (n, w, τ, ρ) satisfies R with probability at least 1 − ε. We say that the process (n, w, τ, ρ) is static if, given ε > 0, with high probability the number of nodes that ever change their type in the entire duration of the process is ≤ ε · n.
By [1,7], the asymptotic behaviour of the process (n, w, τ, ρ) is known for ρ = 0.5 (except on the threshold τ = κ_0 ≈ 0.353). The present work is dedicated to the case where one type of node is the minority, i.e. when ρ < 0.5. We show that with probability 1 the process will either reach complete segregation or reach a dormant state. Moreover we show that when τ > 0.5 the highly probable outcome is complete segregation. In addition, in many cases when τ ≤ 0.5 the outcome is negligible segregation (i.e. the process is static). Let κ_0 ≈ 0.353 and λ_0 ≈ 0.4115 be the unique solutions of (0.5 − x)^{0.5−x} = (1 − x)^{1−x} and 2τ · (0.5 − τ)^{1−2τ} = (1 − τ)^{2(1−τ)} respectively in [0, 0.5].
Theorem 1 (Main result) If τ > 0.5, ρ < 0.5 and τ + ρ ≠ 1, then with high probability the Schelling process (n, w, τ, ρ) reaches complete segregation. The process is static (with high probability) if …
The values of (τ, ρ) for which we show that the process is static correspond to the triangular area of the first diagram (or, equivalently, the collapsed part of the surface of the third diagram) of Fig. 1. The case when ρ ≤ 0.25 presents a remarkable contrast as τ crosses the boundary of 0.5. In this case, when τ exceeds the threshold 0.5, the process changes from static to the other extreme of complete segregation. We display these results in Table 2.
In Sects. 2-4 we present the argument that proves these results. This argument uses a number of smaller results which are stated without proof, and are the building blocks of the proof of Theorem 1. It is our intention that the reader gets a fairly good understanding of our analysis in this part of the paper, without the burden of having to verify some of the more technical parts of the proof. Sect. 5 contains the detailed proofs of all the facts that were used in Sects. 2-4, and completes the proofs of Theorem 1 and Corollary 1.
Fig. 1 The two-dimensional axes refer to τ and ρ. In the first figure, the process is static except for the (τ, ρ) in the small area at the top right corner. The second figure is a plot of P_stab and P_unhap (for w = 100) as functions of (τ, ρ). The third figure is a plot of g(τ, ρ) for w = 100
Table 2 The main result (columns: process parameters; segregation)
Our proof of Theorem 1 is nonuniform, and the analysis is roughly divided into the two cases displayed in Table 3: balanced happiness (when τ + ρ > 1, τ > 0.5) and unbalanced happiness (when τ + ρ < 1, τ > 0.5). Here happiness refers to the numbers of initially happy nodes of the two types, and determines the dynamics that drives the process to an equilibrium. Of the two cases, unbalanced happiness is the more challenging to deal with, and the dynamics is driven by a small number of unhappy α-nodes against the large number of unhappy β-nodes, which in fact is preserved throughout a significant part of the process.
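The two threshold constants κ_0 and λ_0 introduced before Theorem 1 are defined only implicitly, so it may be useful to see how they can be approximated. The sketch below takes logarithms of both defining equations and applies plain bisection on (0, 0.5); the function names and the cut-off `EPS` are hypothetical choices, and the method is a generic numerical check rather than anything used in the proofs.

```python
import math

def bisect_root(f, lo, hi, iters=200):
    """Plain bisection, assuming f(lo) < 0 < f(hi) and a single sign change."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def kappa_eq(x):
    # log of (0.5 - x)^(0.5 - x) minus log of (1 - x)^(1 - x)
    return (0.5 - x) * math.log(0.5 - x) - (1 - x) * math.log(1 - x)

def lambda_eq(t):
    # log of 2t * (0.5 - t)^(1 - 2t) minus log of (1 - t)^(2(1 - t))
    return (math.log(2 * t) + (1 - 2 * t) * math.log(0.5 - t)
            - 2 * (1 - t) * math.log(1 - t))

EPS = 1e-9  # keeps the arguments of log strictly positive
kappa0 = bisect_root(kappa_eq, EPS, 0.5 - EPS)
lambda0 = bisect_root(lambda_eq, EPS, 0.5 - EPS)

if __name__ == "__main__":
    print(kappa0, lambda0)
```

Both log-differences are increasing on (0, 0.5): their derivatives are ln((1 − x)/(0.5 − x)) and 1/τ + 2 ln((1 − τ)/(0.5 − τ)) respectively, both positive there. This is consistent with the uniqueness of the solutions, and the computed values should agree with the approximations 0.353 and 0.4115 quoted above.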

Schelling Models and Relation to Spin-1 Physical Models
The definition of the Schelling model in Sect. 1.1 is rather standard, close to the spatial proximity model from [28,30] and identical to the model studied in [1,7]. Most significantly, it is an unperturbed Schelling model, where agents cannot make moves that are detrimental to their welfare. We have already remarked in the introduction that various rigorously analysed perturbed versions of the model in the literature (such as [38]) actually force 'regularity' on the process, which makes it fit an already existing methodology (such as Markov chains with a unique stationary distribution, or with properties that guarantee stochastically stable states). Even if we commit to the absence of perturbations in the model, it is possible to add complications to the simple dynamics defined in Sect. 1.1. For example, the agents may take into account the distance they need to travel before they move, and such considerations separate models with 'short-range' interactions from models with 'long-range' interactions, such as the Schelling model. Although similar models with 'short-range' interactions have been studied in the past, e.g. [17], rigorous results for the corresponding models with 'long-range' interactions are currently under study. Numerous authors, for example [14,23,25,26,34], have noted the close relationship between Schelling models and variants of the Ising model, widely studied by statistical physicists to understand phase transitions. In this situation, perturbed or noisy versions of the model correspond to a temperature T > 0, which can productively be analysed using the Boltzmann distribution. Typically the limit T → 0 is then studied. Thus the current work can be viewed as a study of a family of 1-dimensional kinetic Ising models with range of interaction w, as non-equilibrium systems under rapid cooling, that is at T = 0 (where behaviour radically different from that in the limit T → 0 can be observed).
In this situation, the Boltzmann distribution is no longer a viable tool, and the use of the threshold τ can be seen as a simple alternative in determining whether or not a spin will update if selected.
In the current work, the model evolves by swapping pairs of agents of opposite types, corresponding to "closed" spin-1 systems under Kawasaki dynamics in which magnetization is conserved (and which are used to model alloy systems), whereas versions such as [2] in which individual agents switch type correspond to "open" spin-1 systems under Glauber dynamics in which magnetization is not conserved. To make the connection explicit, (temporarily) write S_i(t) = +1 (respectively −1) if site i is occupied by a node of type α (respectively β) at time t. Since the number of nodes in the neighbourhood of i that share its type is Σ_{j=i−w}^{i+w} (1 + S_i(t) S_j(t))/2, the spin at site i is unhappy (and thus willing to swap) if and only if
S_i(t) · Σ_{j=i−w}^{i+w} S_j(t) < (2τ − 1)(2w + 1).
In our view, it is the simplicity of the original Schelling model, contrasted by the complexity of the analysis required to specify its behaviour, as demonstrated in [1,7] and the present work, that makes this topic fundamental and interesting. Under the above requirement for simplicity and proximity to the original model, there remain a number of ways that the model can be altered or generalised. For example, note that in the case that τ > 0.5 in the model of Sect. 1.1, two nodes may swap although the number of same-type nodes in their neighbourhoods remains the same after the swap. One may alternatively require that for such a swap, the corresponding numbers of same-type nodes in the neighbourhoods increase (note that such a modification would not make a difference if τ ≤ 0.5). Our choice on this issue follows Brandt et al. in [7, §2]. One generalisation, considered in [2], is to allow different tolerance thresholds for the two types of individuals. Another generalisation, already present in [30], is to introduce a number of vacancies, i.e. to allow the total number of individuals to be smaller than the number of sites.
We can also alter the dynamics. Instead of swapping two chosen individuals at each stage, we could choose one individual and change its type. Such an action may be interpreted as the departure of the individual to some external location and the arrival of an individual of the opposite type at the site that has just become available. Models with this dynamics are often said to have switching agents (see [2], where such a model was analysed) as opposed to the swapping agents of the current model. Finally one can consider versions of this model in higher dimensions, where the most relevant recent works are [3,22] and concern switching agents.

Objectives of the Analysis of the Unperturbed Model and Related Work
We use the notation of Sect. 1.1, so that the symbol n always means the population variable of the process, and w always is the parameter of the process which determines the length of the neighbourhood of nodes. Similarly, τ, ρ always refer to the parameters of the Schelling process. In Sect. 2.2.3 we show that, with probability one, the process (n, w, τ, ρ) either reaches complete segregation or it reaches a dormant state. In the second case, we wish to determine the extent of segregation in the dormant state. In view of the large number of states that the process may have (most of them 'random') a question arises as to how to classify or even talk precisely about different states that may be the outcome of the process. Brandt et al. noticed in [7] that, at least in the case τ = ρ = 0.5 that they considered, the extent of the segregation that occurs in the final state depends crucially on w. In fact, they showed that the dependence on w is at most 'polynomial'. We say that a state exhibits polynomial segregation if, with high probability, a randomly chosen node belongs to a contiguous block of size proportional to the value of a polynomial in w. A similar definition applies to exponential segregation. These two notions turn out to provide a very useful language for explaining the eventual outcome of the Schelling process. A full characterization (extending the work of Brandt, Immorlica, Kamath, and Kleinberg [7]) of the asymptotic behaviour of the process (n, w, τ, ρ) for ρ = 0.5 and τ ∈ [0, 1] was provided by the authors in [1] in terms of polynomial and exponential segregation, as well as static processes. Intuitively, a random state is non-segregated, while polynomial and exponential segregation correspond to highly non-random states.
The characterization from [1] is summarized in Table 4. It is rather striking that when intolerance is increased from, say, 0.4 to 0.5, the segregation is decreased. This phenomenon is akin to the many paradoxes that stem from the missing link between local motives of agents and global behaviour of a system (e.g. see Schelling's classic monograph [31], and in particular Chapter 4 which relates to his segregation models). Even more strikingly, the authors showed in [1] that the paradox occurs for all τ ∈ (κ_0, 0.5), i.e. as τ approaches 0.5 the segregation (in the final state) decreases. This paradoxical phenomenon is also clear in many simulations of the model. Figure 2 shows typical runs of the processes (5×10^5, 3×10^3, τ, 0.5) for τ ∈ {0.485, 0.49, 0.495, 0.5}. The final state is depicted in the circle, where the nodes of one type are black and the nodes of the other type are grey. We use the space between the centre of the ring and the ring in order to record the actual process, as it evolves in time. In particular, if a grey node switches its place with a black node, we put a black node (the colour of the more recent node) between the location of the node and the centre of the ring, at a distance from the centre which is proportional to the stage where the swap occurred. Hence we may observe 'cascades' of swaps of nodes of the same type, which are less severe as τ approaches 0.5. Such cascades are crucial in the rigorous analysis of the model, both in [7] and in [1]. Figure 2 shows that as τ approaches 0.5, the segregation is decreased. This behaviour can be traced to the probability that a node is unhappy in the initial configuration, and in fact, the threshold constant κ_0 is derived by comparing related probabilities in [1].
Fig. 3 The logic of the proof that if τ > 0.5, with high probability the process reaches complete segregation
In the case ρ = 0.5 the two constants κ 0 and 0.5 mark phase transitions in the limit state of the process (n, w, τ, ρ), as τ takes values in [0, 1]. This brings us to another important objective of the analysis of the Schelling process, which is the discovery of phase transitions with respect to the parameters τ, ρ. Incidentally, we note that the discovery of phase transitions has been one of the original motivations for the study of the one and two dimensional Ising model, when one varies the temperature (see the end of Sect. 1.3 for a brief discussion of the analogy between the Ising and the Schelling models). Finally we are also interested in the expected time that the process takes to converge.

Overview of Our Analysis
We use different methods for the cases τ ≤ 0.5 (Sect. 4) and τ > 0.5 (Sects. 2 and 3). If τ ≤ 0.5, in order to derive conditions under which the process is static, we analyse and compare the probabilities of initially unhappy nodes and stable intervals. If τ > 0.5 we consider the two cases τ + ρ < 1 (Sect. 3) and τ + ρ > 1 (Sect. 2) and argue (using distinct arguments) that in each of them complete segregation is the high probability outcome.
Case τ > 0.5 This case is divided into the cases τ + ρ > 1 and τ + ρ < 1, and the structure of the analysis is depicted as a flowchart in Fig. 3 (along with the sections where the various implications are analysed), and in more detail in Fig. 4. First, we show that asymptotically (in w, n), from any state there is a series of transitions that leads to either a dormant state, or complete segregation. Hence, since there are only finitely many states, with probability one the process will reach either a dormant state or complete segregation. So in order to establish complete segregation as the eventual outcome, it suffices to show that the process maintains unhappy nodes of each colour during all stages.
Fig. 4 The logic of the proof that if τ > 0.5, with high probability the process reaches complete segregation. Here 'β-block' refers to the persistent β-block of Sect. 3.1
First, assume that τ + ρ > 1, a case which is dealt with in Sect. 2. In this case we can show in Sect. 2.2.3 that, assuming that the actual proportion of β-nodes is sufficiently close to ρ (which is very likely according to the law of large numbers), no reachable state is dormant. More precisely, we show that given such numbers of α and β-nodes, every permutation of them on the ring corresponds to a state which has both unhappy α and unhappy β-nodes. Since the numbers of nodes of each type do not change during each transition, this argument suffices for this case. States with the property that no series of transitions from them leads to dormant states are called safe. So, in the case τ + ρ > 1 we argue that (with high probability) the initial state is safe.
Second, we assume that τ + ρ < 1, which is a considerably harder case that we deal with in Sect. 3. Under this hypothesis, in the initial configuration we have o(n) many unhappy α-nodes and Ω(n) many unhappy β-nodes. As before, it suffices to show that (with high probability) the process never reaches a dormant state. It is not hard to see that (with high probability) the initial state is not dormant. However it is no longer clear if the initial state is safe. In Sect. 2 we show that given the expected numbers of nodes of the two types in the initial state (or numbers sufficiently close to their expectations) any permutation of the nodes on a ring corresponds to a state with at least one unhappy β-node. Hence, with high probability, the process will never run out of unhappy β-nodes and we only need to argue about the preservation of unhappy α-nodes. Already it should be clear that this is an asymmetric case where the α-nodes (the majority) and the β-nodes (the minority) play different roles. When τ + ρ < 1 there are many permutations of the nodes which correspond to states where all α-nodes are happy, i.e. dormant states. So the argument that was used in the case τ + ρ > 1 is no longer relevant for arguing for the preservation of unhappy α-nodes in the process. The argument we use instead (technically overviewed in Sect. 3.2 and executed in Sects. 3.3, 3.4) is based on the asymmetry between the number of unhappy β-nodes and the unhappy α-nodes, which creates a dynamic that favours the preservation of unhappy α-nodes. More precisely, it favours the preservation of β-blocks of length > w, which is a condition implying the existence of unhappy α-nodes (indeed, the α-nodes neighbouring a β-block of length at least w are unhappy).
Hence if we show that the expected number of unhappy α-nodes remains small during the stages of the process, then we can expect the existence of unhappy α-nodes (and unhappy β-nodes) up to the point where the total number of unhappy nodes is small.
In addition we show that if the total number of unhappy nodes in a state is sufficiently small, then this state is safe, i.e. there is no series of transitions from it to a dormant state. The argument is concluded in Sect. 3.5 by showing that it is very likely that by stage n the process will arrive at a state with an appropriately low number of unhappy nodes, before it reaches a dormant state. Figure 5 is a plot of the numbers of unhappy α-nodes and unhappy β-nodes during the stages, taken from two typical simulations (one with large and one with small population), when τ + ρ < 1. The process we described is clearly visible: the number of unhappy α-nodes remains small, until the number of unhappy β-nodes becomes small. Up to the latter point, as we explained, the dynamics favours the preservation of unhappy α-nodes.
Case τ ≤ 0.5 In this case, dealt with in Sect. 4, we have τ + ρ < 1, and this means that in the initial configuration the α-population is happy with a few exceptions, while the β-population is unhappy, with a few exceptions. Recall that in this case we wish to show that the process is static. By the definition of the dynamics of the model α-to-β swaps can only occur in areas where there are unhappy α-nodes. Hence in this case the α-to-β swaps will be concentrated in a very few selected areas in the ring, at least in the first stages of the process. This concentration of α-to-β swaps creates cascades of α-node evictions which can be clearly seen in simulations such as the one displayed in Fig. 6. If we could argue that such cascades are restricted to small areas around the initially unhappy α-nodes, then it is not hard to argue that the process reaches a dormant state rather quickly, having affected only a very small number of nodes. The way we do this is through stable intervals, a device that was also used in [1]. Roughly speaking, these are intervals that do not allow the spread of unhappy α-nodes through them.
If ρ is very small, or if τ is very small, then stable intervals occur with high probability. On the other hand, if ρ, τ get sufficiently large, the probability of a stable interval tends to 0 as w → ∞. This contrasts with the prevalence of unhappy α-nodes. When τ, ρ are small, the probability of (the occurrence of) an unhappy α-node is small, while it gets large when τ, ρ increase. Figure 7 shows the actual probabilities (as calculated in Sect. 4) as functions of τ, ρ for the specific value of w = 100 (the shape of the plots does not change significantly for different values of w). The interesting case is the range for τ, ρ where both probabilities tend to 0 as w → ∞, i.e. both events become rare. Somewhere on the horizontal τ-ρ plane there is a line marking the intersection of the two surfaces. This is where the probability of a stable interval becomes less than the probability of an unhappy α-node. Moreover, as w → ∞ the ratio of the two probabilities tends to infinity or zero, depending on whether τ, ρ sit on one side of the plane (with respect to the intersection line) or the other. The crux of the argument in Sect. 4 is that for many values of τ, ρ stable intervals are much more common than unhappy α-nodes in the initial configuration. This allows us to argue that, in this case, the process has to reach a dormant state after o(n) many swaps, which implies that the process is static.
Fig. 6 The evolution of the infected area when τ + ρ < 1. The current state is in the outer circle, the initial state is in the inner circle, and each move of an α-node is represented by a dot at the coordinates of the new position, but at a distance from the centre which is proportional to the stage where the swap occurred. This representation of the process in time illustrates the cascades of swaps that occur and start from the initial unhappy α-nodes
Fig. 7 The probabilities of a stable interval and an unhappy α-node, as functions of τ, ρ ≤ 0.5 when w = 100
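The claim that unhappy α-nodes are rare for small τ, ρ while unhappy β-nodes are common can be illustrated by a direct binomial computation from the definitions of Sect. 1.1. The sketch below is an assumption-laden illustration and not the calculation of Sect. 4: the function names are hypothetical, it treats the 2w other nodes of a neighbourhood as i.i.d. (which requires n > 2w + 1), and it requires 0 < ρ < 1. A node is unhappy exactly when 1 + X < τ(2w + 1), where X counts its same-type neighbours and is binomial with 2w trials.

```python
import math

def prob_unhappy(w, tau, p_same):
    """P(1 + X < tau * (2w + 1)) for X ~ Binomial(2w, p_same), with 0 < p_same < 1.
    Terms are evaluated in log-space so that large w does not overflow."""
    m = 2 * w
    total = 0.0
    for k in range(m + 1):
        if 1 + k < tau * (m + 1):  # the node itself plus k same-type neighbours
            log_term = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
                        + k * math.log(p_same) + (m - k) * math.log(1 - p_same))
            total += math.exp(log_term)
    return total

def initial_unhappy_probabilities(w, tau, rho):
    """Return (P[a given alpha-node is unhappy], P[a given beta-node is unhappy]) at stage 0."""
    return prob_unhappy(w, tau, 1 - rho), prob_unhappy(w, tau, rho)

if __name__ == "__main__":
    p_alpha, p_beta = initial_unhappy_probabilities(w=100, tau=0.4, rho=0.3)
    print(p_alpha, p_beta)
```

For w = 100, τ = 0.4 and ρ = 0.3, the first probability is extremely small and the second is close to 1, matching the qualitative picture described above for τ + ρ < 1.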

Probability Terminology and Asymptotic Notation
In Sect. 5.1 we summarize some facts from probability theory that are used in our analysis. In Sect. 5.2 we state and prove some basic probabilistic facts about the Schelling model, which are also needed in our analysis. Asymptotic notation will be useful in expressing various statements in our analysis. We already defined the notation 0 ≪ w ≪ n in Sect. 1.2. Given two functions f, g on the positive integers, (as is standard) we say that f is O(g) if there exists a positive constant c such that f(t) ≤ c · g(t) for all t. We say that g is Ω(f) if f is O(g), and that g is Θ(f) if both f is O(g) and f is Ω(g). We also use this notation, however, in a more general sense: we say that f is g(O(t)) if there exists some c > 0 such that f(t) ≤ g(ct) for all t. For example, when we say that a function f is ne^{−O(t)}, this means that there is c > 0 such that f(t) ≤ ne^{−ct} for all t. Or, if we say that f is n(1 − e^{−O(t)}), this means that there is c > 0 such that f(t) ≤ n(1 − e^{−ct}) for all t. Similarly, we use Θ in a more general sense. We say that f is g(Θ(t)) to mean that there exist constants c_0 and c_1 such that g(c_0 · t) ≤ f(t) ≤ g(c_1 · t) for all t. We say that f = o(g) if lim_t f(t)/g(t) = 0. The (often hidden) variable underlying the asymptotic notation in the various expressions will be w. In other words, for fixed values of ρ and τ, the choice of constants required in the asymptotic notation will always depend only on w. We also combine the 'high probability' terminology with the asymptotic notation in a manner which is worth clarifying. When we say, for example, that 'with high probability the number of initially unhappy α-nodes in the process (n, w, τ, ρ) is n · (1 − ρ) · e^{−Θ(w)}', this means that there exist constants c_0 and c_1 such that, with high probability, the number of initially unhappy α-nodes in the process (n, w, τ, ρ) lies between n · (1 − ρ) · e^{−c_0·w} and n · (1 − ρ) · e^{−c_1·w}.
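As a rough numerical illustration of the e^{−Θ(w)} example above, the following sketch computes the probability that a fixed α-node is unhappy in the initial configuration. It rests on two assumptions that are not restated in this section: each of the other 2w nodes in the neighbourhood is independently of type α with probability 1 − ρ, and a node is happy when at least τ(2w + 1) nodes of its neighbourhood (itself included) share its type. The function name is hypothetical.

```python
from math import ceil, comb

def prob_unhappy_alpha(w, tau, rho):
    """P(a given alpha-node is initially unhappy), under the assumptions stated above."""
    need = ceil(tau * (2 * w + 1))   # alpha-nodes required in the neighbourhood, node included
    p = 1.0 - rho                    # assumed chance that any other node is of type alpha
    # Unhappy iff the number X of alpha-nodes among the other 2w nodes has 1 + X < need,
    # i.e. X <= need - 2.
    return sum(comb(2 * w, k) * p**k * (1 - p)**(2 * w - k)
               for k in range(0, max(need - 1, 0)))

# Illustrative parameters with tau + rho < 1 (chosen here, not taken from the text).
for w in (10, 20, 40, 80):
    print(w, prob_unhappy_alpha(w, tau=0.6, rho=0.3))
```

For these illustrative parameters the printed probabilities shrink as w grows, which is the qualitative behaviour that the e^{−Θ(w)} notation expresses.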

Metrics and Reaching Complete Segregation (τ > 0.5, τ + ρ > 1)
One of the most challenging problems in the analysis of the segregation process is the large number of absorbing states. In order to understand which transitions are possible, we use certain metrics that describe the current state.

Welfare, Mixing, and Expectations
We define global metrics that reflect the welfare of the entire population. These metrics and their properties are essential in all of the proofs that will follow. An obvious choice is the number of happy nodes at a given state. It is not hard to devise transitions of the process which reduce the total number of happy nodes (see the second plot of Fig. 5). However, it is possible to show that if τ > 0.5 the total number of happy nodes is approximately non-decreasing (in the sense that it is Θ(g) for some nondecreasing function g on the stages, where the underlying constant depends only on w). Let the utility of a node (at a certain state) be the number of nodes of the same type in its neighborhood. A better behaved global metric of welfare of a state (compared to the number of happy nodes) is the sum of the utilities of the nodes in the state. We call this parameter the social welfare of the state and denote it by V. A consequence of the transition rule and the definition of utility is that the social welfare does not decrease along the stages of the process. Furthermore, if τ ≤ 0.5, every transition of the process strictly increases the social welfare. Let the mixing index of a node be the number of nodes in its neighbourhood that are of different type. The mixing index mix of a state is the sum of the mixing indices of the α-nodes in that state. The mixing index of a state is also equal to the sum of the mixing indices of the β-nodes in that state. Since the utility and the mixing index of a node add up to the size 2w + 1 of its neighbourhood, the relationship between the two metrics is V + 2 · mix = n(2w + 1). Hence the mixing index is non-increasing along the transitions. Note that a single swap cannot decrease the mixing index by more than 4w. On the other hand, by linearity of expectation we can calculate that the expectation of the mixing index in the initial state of (n, w, τ, ρ) is 2nwρ(1 − ρ). The mixing index of complete segregation (in nontrivial cases) is w(w + 1).
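The metrics above are easy to compute directly. The sketch below is only an illustration: it encodes a state as a string over 'a' (α) and 'b' (β) arranged on a ring, and it assumes that the neighbourhood of a node consists of the 2w + 1 sites within distance w, the node itself included. All function names are hypothetical.

```python
def neighbourhood(i, w, n):
    """Indices of the 2w+1 sites within distance w of site i on a ring of n sites."""
    return [(i + d) % n for d in range(-w, w + 1)]

def social_welfare(state, w):
    """Sum over all nodes of the number of same-type nodes in the neighbourhood."""
    n = len(state)
    return sum(sum(state[j] == state[i] for j in neighbourhood(i, w, n))
               for i in range(n))

def mixing_index(state, w, node_type="a"):
    """Sum, over nodes of the given type, of the opposite-type nodes in the neighbourhood."""
    n = len(state)
    return sum(sum(state[j] != state[i] for j in neighbourhood(i, w, n))
               for i in range(n) if state[i] == node_type)

w = 3
segregated = "a" * 20 + "b" * 10            # one alpha-block and one beta-block
mixed = "aabababbaaabbabaaabaabbbaaabab"    # an arbitrary state, for illustration only
for state in (segregated, mixed):
    print(social_welfare(state, w), mixing_index(state, w, "a"), mixing_index(state, w, "b"))
```

On the segregated ring the mixing index evaluates to w(w + 1), and for both states the value is the same whether it is computed over the α-nodes or over the β-nodes.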
Since ρ ≤ 1/2, this means that (with high probability) the process can reach complete segregation only after at least (nρ − (w + 1))/4 > nρ/5 stages, i.e. Ω(n) stages. On the other hand, a case analysis shows that if τ ≤ 0.5, each swap decreases the mixing index by at least 4 (so it is not possible that a constant number of nodes swap more than o(n) times). We have shown that the second clause of Corollary 1 (concerning the time to the final state) follows from the first clause.
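The count of stages can be spelled out as follows. This is a sketch that uses the expected initial mixing index in place of the high-probability estimate; the symbol mix_0 for the initial mixing index is introduced here only for illustration.

```latex
\[
\#\text{stages}
\;\ge\; \frac{\mathrm{mix}_0 - w(w+1)}{4w}
\;\approx\; \frac{2nw\rho(1-\rho) - w(w+1)}{4w}
\;=\; \frac{2n\rho(1-\rho) - (w+1)}{4}
\;\ge\; \frac{n\rho - (w+1)}{4},
\]
% The last step uses 1 - \rho \ge 1/2. Moreover
\[
\frac{n\rho - (w+1)}{4} > \frac{n\rho}{5}
\iff n\rho > 5(w+1),
\]
% which holds once n is sufficiently large relative to w.
```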
As another measure of mixing, we may consider the number k_β of maximal β-blocks in the state. These are the contiguous β-blocks that are maximal, in the sense that they cannot be extended to a larger contiguous β-block. Let U be the number of unhappy nodes in a state. It is not hard to show that if τ > 0.5 then mix = Θ(U) = Θ(k_β), with explicit bounds given in (1). This means that the number of unhappy nodes at a certain state reflects the progress of the process towards segregation. More precisely, the metrics mix, k_β, U are mutually proportional when τ > 0.5, where the proportionality constant depends on w (see Fig. 5). In Table 5 we display these global metrics of welfare, along with their dynamics. A function (on the stages of the process) has positive dynamics if it is non-decreasing, and approximately positive dynamics if it is Θ(g) for some nondecreasing function g, where the multiplicative constant does not depend on n. Similar definitions apply for 'negative'. The first clause of Theorem 1 (the case when τ > 0.5) is the hardest to prove. It turns out that in this case we can deduce a non-trivial lower bound on the mixing index of dormant states.
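Counting maximal blocks on a ring is straightforward; the following minimal sketch (with a hypothetical function name, and the same 'a'/'b' string encoding of a state assumed purely for illustration) counts them by locating the left end of each block.

```python
def count_maximal_blocks(state, node_type="b"):
    """Number of maximal contiguous blocks of node_type on a ring."""
    if node_type not in state:
        return 0
    if all(s == node_type for s in state):
        return 1  # the whole ring is a single block
    # A maximal block starts exactly where a node of the given type follows a node
    # of the other type; index -1 wraps around, which closes the ring.
    return sum(state[i] == node_type and state[i - 1] != node_type
               for i in range(len(state)))

print(count_maximal_blocks("aabbab"))
print(count_maximal_blocks("baab"))
```

In the second call the β-nodes at the two ends of the string are adjacent on the ring, so they form a single block.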
The case τ > 0.5 is further divided in two cases, which reflect the proportions of happy nodes in the initial state. We display these in Table 3, along with the corresponding expectations for the numbers of happy nodes of each type. Lemma 1 is crucial for the proof of the first clause of Theorem 1 (in particular the case τ + ρ < 1).

Accessibility of Dormant States and Complete Segregation
Here we prove Theorem 1 for the case τ > 0.5 and τ + ρ > 1. However Lemma 3 is more general and will also be used in Sect. 3 which deals with the harder part of the proof of Theorem 1.

Overview of the Proof of Theorem 1 for τ > 0.5 and τ + ρ > 1
This argument consists of two parts. First, we show that in this case with high probability the initial state is such that every state with the same number of α-nodes has unhappy nodes of both types (i.e. it is not dormant). Hence under these conditions, no accessible state is dormant. The second part consists of showing that from every state there is a sequence of transitions to either a dormant state or complete segregation. Moreover the latter fact holds in general, for any values of τ, ρ, so it can be reused for the case when τ + ρ < 1, in Sect. 3. This latter case is more challenging, as it can be seen that there are permutations of the initial state which are dormant.
Given ρ, by the law of large numbers with high probability (tending to 1, as n tends to infinity) ρ * will be arbitrarily close to ρ. Hence we may deduce the absence of dormant states (with high probability) in the case that τ + ρ > 1.
It remains to show the accessibility of either a dormant state or complete segregation, from any state of the process. An inductive argument can be used in order to prove this fact, which along with Corollary 2 shows Theorem 1 for τ > 0.5 and τ + ρ > 1. Here is a sketch of the proof. If τ ≤ 0.5 the mixing index is strictly decreasing through the transitions, so it is immediate that the process will reach a dormant state (indeed, 0 is a lower bound for the mixing index). For the case where τ > 0.5 (which we assume for the duration of this discussion) we can argue inductively, in four steps. First we show that from a state with few unhappy nodes of one type (here 5w^4 is a convenient upper bound of what we mean by 'few', which is by no means optimal) there is a series of transitions which lead to either a state with a contiguous block of length 2w or a dormant state. Second, from a state with a contiguous block of length ≥ 2w there is a series of transitions to complete segregation or to a dormant state. Third, from any state which has at least w^4 unhappy nodes of each type, there is a series of transitions to a state with a contiguous block of length at least w. Finally, from a state that has a contiguous block of length ≥ w and at least 4w unhappy nodes of opposite type from the block, there is a series of transitions to a state with a contiguous block of length ≥ 2w. The combination of these four statements constitutes a strategy for arriving at a dormant state or a state of complete segregation, from any given state. We illustrate this strategy in Fig. 8, where two arrows leaving a node indicate that at least one of these routes is possible.

Proof of Lemma 2 and Corollary 2
It is crucial to understand the dormant states and assess their accessibility from an initial state. We demonstrate that this issue ultimately depends on the given parameters τ, ρ. We show that if τ + ρ > 1 then with high probability we may assert that no dormant state is accessible from the initial state. 11 The following lemma implies Lemma 2.
Proof Given the parameters θ*, τ, w which is large, and any state of the process (n, w, τ, ρ) with no unhappy γ-nodes, it suffices to produce an upper bound on n (which does not depend on the particular state but only on θ*, τ, w and the fact that no γ-nodes are unhappy). Let δ ∈ {α, β} − {γ}. Since τ > 0.5 and all γ-nodes are happy, there are no δ-blocks of length ≥ w. We may assume that n > 3w + 1. Define the bias B(I) of an interval I of nodes to be the difference between the number of γ-nodes in the interval and the number of δ-nodes in the interval. Without loss of generality suppose that the node occupying site w is a γ-node (otherwise consider a rotation). We define a sequence (u_i) of γ-nodes in the state, starting with u_0 = w. Let N_i denote the neighbourhood of u_i. Given u_i, define u_{i+1} to be the rightmost γ-node in N_i. Since there are no δ-blocks of length ≥ w, the sequence (u_i) is well defined and it never happens that u_i = u_{i+1}. Let m be the largest number such that none of the neighbourhoods N_i for 0 < i ≤ m contain the node at site 0. Since n > 3w Note that I_m contains all of the nodes except at most w. Moreover since u_{i+1} − u_i ≤ w we have Let L_i and R_i be the leftmost and rightmost w-many nodes in N_i respectively. Since N_i contains at least τ(2w + 1) nodes of type γ: Note, however, that some nodes have been counted multiple times in the sum that defines V_m, since the intervals N_i are not disjoint. For each k ∈ N let J^m_k consist of the nodes in I_m which belong to exactly k distinct intervals N_i.
By the definition of (u_i), the node u_{i+2} is always outside N_i (since it is a γ-node, and if it was in N_i then u_{i+1} would not be the rightmost γ-node in N_i). Similarly, u_{i+4} is always outside N_{i+2}. This means that it is not possible for the neighbourhoods of 5 consecutive terms of (u_i) to have a nonempty intersection. This, in turn, implies that J^m_k = ∅ for each k > 4. A similar consideration shows that J^m_4 consists entirely of δ-nodes (hence B(J^m_4) ≤ 0). Next, note that J^m_1 ⊆ L_0 ∪ R_m, so |J^m_1| ≤ 2w. Hence by counting the multiplicities of the nodes in the sum which defines V_m, we have Then from (5) From the second clause of (3) and (4) we have If x_m, y_m are the numbers of γ and δ nodes in I_m respectively, then x_m + y_m = |I_m| and (2), By (6) we may deduce that We may assume that w is larger than By this condition and the fact that τ − θ* > 0, the left side of (7) is positive. Also, n ≤ |I_m| + w, so by (2) we have n ≤ 3w + 1 + mw. If we combine the latter inequality with (7) we get which is the required bound on n.
Note that in the above result, the lower bound that is required on w depends only on τ, ρ * , while the lower bound that is required on n depends on τ, ρ * and w. We may now apply Lemma 4 in order to establish the conditional existence of unhappy nodes of both types.
Given ρ, by the law of large numbers with high probability (tending to 1, as n tends to infinity) ρ * will be arbitrarily close to ρ. Hence we may deduce the absence of dormant states (with high probability) in the case that τ + ρ > 1.

Corollary 4 (Absence of dormant states)
If ρ ≤ 0.5 < τ and τ + ρ > 1 then with high probability none of the accessible states of the process (n, w, τ, ρ) is dormant.
This corollary along with the remark made in Footnote 11 establishes the main dichotomy in the analysis of the process.

Proof of Lemma 3 (Accessibility of Complete Segregation or Dormant State)
A central part of our analysis is the fact that from any state there is a transition to either a dormant state or complete segregation. This is what we prove in this section. This also means that the only absorbing states of the process are the dormant states.
If τ ≤ 0.5 then it is clear that the only absorbing states of the process are the dormant states, since unhappy pairs of nodes of different type can always swap. Consider the mixing index which is non-negative and strictly decreasing in stages for τ ≤ 0.5. This means that there can only be finitely many swaps in the process, and so a dormant state must eventually be reached.
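The termination argument for τ ≤ 0.5 can be observed in a small simulation. The sketch below makes several assumptions that are not spelled out in this section: states are strings over 'a' (α) and 'b' (β) on a ring, a node is unhappy when fewer than τ(2w + 1) nodes of its neighbourhood (itself included) share its type, and at each stage a uniformly random unhappy α-node is swapped with a uniformly random unhappy β-node. All names and parameter values are hypothetical.

```python
import random

def same_type_count(state, i, w):
    """Number of nodes of the type of site i in its neighbourhood [i-w, i+w], node included."""
    n = len(state)
    return sum(state[(i + d) % n] == state[i] for d in range(-w, w + 1))

def is_unhappy(state, i, w, tau):
    return same_type_count(state, i, w) < tau * (2 * w + 1)

def mixing_index(state, w):
    """Sum over alpha-nodes of the number of beta-nodes in their neighbourhood."""
    return sum(2 * w + 1 - same_type_count(state, i, w)
               for i in range(len(state)) if state[i] == "a")

def run(state, w, tau, rng):
    """Swap random unhappy alpha/beta pairs until one type has no unhappy nodes."""
    state = list(state)
    history = [mixing_index(state, w)]
    while True:
        ua = [i for i, s in enumerate(state) if s == "a" and is_unhappy(state, i, w, tau)]
        ub = [i for i, s in enumerate(state) if s == "b" and is_unhappy(state, i, w, tau)]
        if not ua or not ub:
            return "".join(state), history   # dormant: no unhappy pair is left
        i, j = rng.choice(ua), rng.choice(ub)
        state[i], state[j] = state[j], state[i]
        history.append(mixing_index(state, w))

rng = random.Random(0)
initial = "".join("b" if rng.random() < 0.4 else "a" for _ in range(200))
final, history = run(initial, w=3, tau=0.5, rng=rng)
print(len(history) - 1, "swaps; mixing index", history[0], "->", history[-1])
```

Because the recorded mixing index is a non-negative integer that drops at every swap when τ ≤ 0.5, the loop cannot run forever, mirroring the argument in the text.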
For the case where τ > 0.5 more effort is required. We argue in four steps. The numbers in what follows are fairly arbitrary. First we show that from a state with at most a small number of unhappy nodes of one type (here 5w^4 is a convenient upper bound of what we mean by 'small', which is by no means optimal) there is a series of transitions which lead to either a state with a contiguous block of length 2w or a dormant state. Second, from a state with a contiguous block of length ≥ 2w there is a series of transitions to complete segregation or to a dormant state. Third, from any state which has at least 2w^4 unhappy nodes of each type, there is a series of transitions to a state with a contiguous block of length at least w, and at least w^4 unhappy nodes of each type. Finally, from a state that has a contiguous block of length ≥ w and at least 4w unhappy nodes of opposite type from the block, there is a series of transitions to a state with a contiguous block of length ≥ 2w. The combination of these four statements constitutes a strategy for arriving at a dormant state or a state of complete segregation, from any given state.
In the following arguments we will often make use of two rather simple facts that hold when τ > 0.5. The first is that (if w > (1 − τ)/(2τ − 1)) any β-node that is adjacent to a happy α-node is unhappy. The second concerns the situation where next to a happy α-node there is a β-node, and we swap the β-node for another α-node. Then, provided that before the swap the second α-node is outside the neighbourhood of the β-node, both α-nodes will be happy after the swap.
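The threshold on w in the first fact can be recovered by a short count. This is a sketch which assumes that a node is happy exactly when at least τ(2w + 1) of the 2w + 1 nodes in its neighbourhood (itself included) are of its own type; the names a and b for the two nodes are used only here.

```latex
Let $a$ be a happy $\alpha$-node and $b$ an adjacent $\beta$-node. The neighbourhoods
of $a$ and $b$ share $2w$ sites, so the neighbourhood of $b$ contains at least
$\tau(2w+1) - 1$ nodes of type $\alpha$, and hence at most
\[
(2w+1) - \bigl(\tau(2w+1) - 1\bigr) = (1-\tau)(2w+1) + 1
\]
nodes of type $\beta$. The node $b$ is unhappy as soon as this number is less than
$\tau(2w+1)$:
\[
(1-\tau)(2w+1) + 1 < \tau(2w+1)
\iff (2\tau - 1)(2w+1) > 1
\iff w > \frac{1-\tau}{2\tau - 1}.
\]
```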
Lemma 5 (Shortage of unhappy nodes) Suppose that τ > 0.5 and 0 ≪ w ≪ n. From a state with fewer than 5w^4 unhappy nodes of one of the types, there is a series of transitions to either a dormant state or to a state containing a contiguous block of length at least 2w.
Proof Without loss of generality suppose that the state has fewer than 5w^4 unhappy α-nodes. Since ρ* ∈ (0, 1) and 0 ≪ w ≪ n, if there does not already exist a contiguous block of length 2w then there exists an interval [u, v] of 2w nodes which contains at least one α-node and such that any unhappy α-node is at distance at least 2w^2 from any node in [u, v]. Any unhappy α-node which cannot see any node in [u, v] can move to any position in [u, v] that is adjacent to an α-node (because by doing so it becomes happy, and because if a swap is legal for one member of a potential swapping pair then it is legal for both). Hence we can start successively replacing the β-nodes in [u, v] which are adjacent to α-nodes with unhappy α-nodes, each time choosing unhappy α-nodes that have maximal distance from u, v. Note that this recursive procedure is valid because all α-nodes in [u, v] are happy after each swap. Ultimately we either run out of unhappy α-nodes, or else [u, v] becomes an α-block.
Lemma 6 (Toward a block of length w) Suppose τ > 0.5. If 0 ≪ w ≪ n then from any state which has at least 2w^4 unhappy nodes of each type, there is a series of transitions to a state with an α-block or β-block of length at least w, and at least w^4 many unhappy nodes of each type.
Proof Suppose that we are given a certain state of the process. Define a sequence u_i, i ≤ w^2, of α-nodes with neighbourhoods N_{u_i} respectively, by induction as follows. Let u_0 be the least α-node whose neighbourhood contains the minimum number of α-nodes amongst all neighbourhoods of α-nodes. If u_i is defined and i < w^2, define u_{i+1} to be the least α-node whose neighbourhood is disjoint from ∪_{j≤i} N_{u_j} and whose neighbourhood contains the minimum number of α-nodes amongst all α-nodes with the same property (i.e. with neighbourhoods that are disjoint from ∪_{j≤i} N_{u_j}). This completes the definition of (u_i), which is sound provided that n is sufficiently large. We define a sequence v_i, i ≤ w^2, of β-nodes with neighbourhoods N_{v_i} respectively, in a way entirely analogous to the above definition, ensuring also that all neighbourhoods N_{u_i} and N_{v_j} are disjoint.
The sequences (u_i) and (v_i) provide a pool of nodes which will be used for legitimate swaps in a series of transitions which will lead to the desired state of the process. We start by considering an interval J of nodes of length 3w which is disjoint from ∪_{j≤w^2} N_{u_j} and disjoint from ∪_{j≤w^2} N_{v_j}. Such an interval exists, provided that n is sufficiently large. Let I consist of the w-many nodes in J that are at distance at least w + 1 from any node outside the interval. Clearly any swap that occurs between a node in I and one of the nodes u_i does not affect the composition of the neighbourhoods N_{u_j} for j ≠ i, or N_{v_j} for j ≤ w^2 (and similarly for a swap between a node in I and one of the v_i).
Let t_i, i < w, be the nodes of I enumerated from left to right. We shall describe a swapping process, involving fewer than w^2 swaps. At the end of this process of legal swaps, all nodes in I will be of the same type (but which type that is will not be determined until the end of the process). This process has w-many steps, with each step s involving up to s swaps. Let γ_s be the type of t_s at the end of stage s. Also, let V_s contain the nodes u_i, v_i, i ≤ w^2, which are of type γ_s and have not been involved in a swap by the end of stage s. The construction is designed so that γ_s is the type of all t_i, i ≤ s, at the end of stage s. This feature guarantees that at the end of the process, all nodes in I have the same type. Stage 0 is null (i.e. we carry out no instructions at stage 0).
At stage s + 1 we check if t_{s+1} has type γ_s. If so, then we go to the next stage. If not, then suppose first that t_{s+1} is unhappy. In the case that t_s is happy, any unhappy γ_s-node outside J can swap with t_{s+1} (because an unhappy γ_s-node moving next to a happy γ_s-node cannot decrease its utility). In the case that t_s is unhappy, we claim that any node x from V_s can legitimately swap with t_{s+1}. In order to see this, note that the number of γ_s-nodes in the neighbourhood of t_s is at least as large as this number at the beginning of the process. By the definition of V_s, this number is at least as large as the number of γ_s-nodes in the neighbourhood of x. This means that if x moves to the place that t_{s+1} occupies, its utility will not decrease.
The last case in the procedure is when t_{s+1} is happy and of type different from γ_s. In this case we define γ_{s+1} ∈ {α, β} − {γ_s} and swap all t_i, i ≤ s, with distinct nodes in V_{s+1}, starting with t_s and moving to the left. These are legitimate swaps, as nodes of type γ_{s+1} move next to happy nodes of the same type (so their utility is not decreased after the swap). This concludes the description of the process.
By the end of stage w − 1, all nodes in I are of the same type. Since we perform fewer than w^2 many swaps, there are fewer than 2(2w + 1)w^2 many nodes whose neighbourhoods are affected by these swaps. Since w is large, there are therefore at least w^4 many unhappy nodes of each type remaining.
Lemma 7 (Toward a contiguous block of length 2w) Suppose that τ > 0.5 and 0 ≪ w ≪ n. From a state that has an α-block of length ≥ w and at least w^4 unhappy nodes of each type, there is a series of transitions to a state with an α-block of length ≥ 2w. The same holds for β-blocks.
Proof Consider the given state and assume that there is no α-block of length ≥ 2w (otherwise 0 transitions suffice). Let [x, y] be the longest α-block in the given state, and let J consist of all the nodes that are at distance at least w from the interval [y − 2w, y]. Note that x − 1 is a β-node and since τ > 0.5 it is unhappy. Let z be the rightmost α-node to the left of x. If z is unhappy, then we may swap it with x − 1 since its utility will not decrease. Otherwise, if z is happy, then it is at a distance at most w from x and we may successively swap the β-nodes in (z, x), starting from z + 1 and moving to the right, for an equal number of unhappy α-nodes in J. This is possible because each time that we move an α-node next to a happy α-node, the new α-node becomes happy. We repeat this process until an α-block of length 2w has been formed. Since each step of the process increases the length of the α-block that is adjacent and to the left of y, the process will terminate. We also perform at most w many swaps, meaning that we shall not run out of unhappy nodes to perform the swaps with.
Proof (of Lemma 8) Consider any state which is not completely segregated, but which has a contiguous block of length at least 2w. Without loss of generality, suppose that this is a block of α-nodes occupying the interval [u, v], where this interval is chosen to be of maximum possible length. Our aim is to show that from this state, one may legally reach another with a contiguous block of greater length (or else a dormant state). Now if the nodes u and v are both happy then the length of the interval ensures that all nodes in the block are happy. In this case, if there exists an unhappy α-node u′, then let t ∈ {u, v} be at distance at least w + 1 from u′. Then u′ and the β neighbour of t may legally be swapped, increasing the length of the run by at least 1.
So suppose instead that at least one of the nodes u and v is not happy, and without loss of generality suppose that u has bias less than or equal to v, where the bias of a node is the number of α-nodes minus the number of β-nodes in its neighbourhood. Then u and v + 1 may legally be swapped. Performing this swap causes position v + 1 to have at least the same bias as v did before the swap, and causes u + 1 to have at most the same bias as u did before the swap. Thus, the swap has the effect of shifting the run one position to the right and may be repeated until the length of the run is increased by at least 1, i.e. for successive i ≥ 0 we can swap the nodes u + i and v + i + 1, so long as the latter is of type β. The first stage at which the latter is of type α the length of the contiguous block has been increased. Putting these observations together, we conclude that from any state which has a contiguous block of length at least 2w it is possible to reach full segregation.
Finally, we piece together the above processes in order to show the following comprehensive statement.
Corollary 5 (Complete segregation or dormant state) From any state of the process (n, w, τ, ρ) with 0 ≪ w ≪ n, there exists a series of transitions to complete segregation or to a dormant state.
Proof The case τ ≤ 0.5 was considered earlier. Suppose that τ > 0.5. We may assume that ρ* ∈ (0, 1), because otherwise every state is a dormant state. If there exist at most 5w^4 unhappy nodes of one of the types in the state, Lemma 5 shows how to reach a dormant state or a state with a contiguous block of length ≥ 2w. In the latter case, Lemma 8 shows that there is a series of transitions to complete segregation or to a dormant state. So we may assume that the given state has more than 5w^4 unhappy nodes of each type. Then Lemma 6 shows how to reach a state with a contiguous block of length ≥ w and at least w^4 many unhappy nodes of each type. Furthermore, from such a state Lemma 7 shows how to reach a dormant state or a state with a contiguous block of length ≥ 2w. In the latter case, Lemma 8 shows that there is a series of transitions to complete segregation or to a dormant state. This is an exhaustive analysis that establishes a path to a dormant state or complete segregation, from every state.
This completes our proof of Theorem 1 for the case that τ + ρ > 1.

Reaching Complete Segregation when τ > 0.5, τ + ρ < 1
This case of Theorem 1 is challenging because we need to show that the process avoids accessible dormant states, until it reaches a safe state i.e. a state from which no dormant state is accessible. The reason for this avoidance is (in contrast with the case τ + ρ > 1 of Sect. 2.2) the dynamics of the process with the given parameters. The methodology we use is based on a martingale argument, which involves a great deal of the analytical tools (e.g. the metrics of social welfare) and their properties that were developed in the previous sections. Having shown that dormant states are avoided until the process reaches a safe state, Lemma 3 gives Theorem 1 (for the case where τ > 0.5 and τ + ρ < 1). An overview of this argument is given in Fig. 3.

The Persistence of Large Contiguous β-Blocks
According to our plan, we wish to establish the existence of unhappy nodes of both types until a safe state is reached. By Lemma 2, we do not have to worry about the existence of unhappy β-nodes. One device that guarantees the existence of unhappy α-nodes is a contiguous block of β-nodes, of length at least w. Such a block exists in the initial random state (with high probability). One way to argue for its preservation in subsequent stages is to consider the ratio of the unhappy nodes of the two types. Even more relevant is the ratio between the number of unhappy α-nodes, and the number of β-nodes which are not just unhappy, but actually sufficiently unhappy that they can swap with any unhappy α-node.
Definition 2 (Very unhappy β-nodes) Given a stage of the process, a node of type β is very unhappy if there are at least (2w + 1)τ nodes of type α in its neighbourhood. The number of very unhappy β-nodes is denoted by U*_β.
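To make Definition 2 concrete, the following sketch counts the unhappy α-nodes and the very unhappy β-nodes of a state. It assumes a ring encoded as a string over 'a' (α) and 'b' (β), a neighbourhood of 2w + 1 sites that includes the node itself, and that an α-node is unhappy when fewer than τ(2w + 1) nodes of its neighbourhood are of type α. The function names and the random initial state are illustrative only.

```python
import random

def alpha_count(state, i, w):
    """Number of alpha-nodes among the 2w+1 sites within distance w of site i."""
    n = len(state)
    return sum(state[(i + d) % n] == "a" for d in range(-w, w + 1))

def unhappy_alpha_and_very_unhappy_beta(state, w, tau):
    """Return (number of unhappy alpha-nodes, number of very unhappy beta-nodes)."""
    threshold = tau * (2 * w + 1)
    unhappy_alpha = sum(1 for i, s in enumerate(state)
                        if s == "a" and alpha_count(state, i, w) < threshold)
    very_unhappy_beta = sum(1 for i, s in enumerate(state)
                            if s == "b" and alpha_count(state, i, w) >= threshold)
    return unhappy_alpha, very_unhappy_beta

# A random initial state with tau > 0.5 and tau + rho < 1 (parameters chosen for illustration).
rng = random.Random(1)
state = "".join("b" if rng.random() < 0.3 else "a" for _ in range(2000))
print(unhappy_alpha_and_very_unhappy_beta(state, w=10, tau=0.6))
```

An isolated β-node in a sea of α-nodes is very unhappy, whereas the β-nodes at the edge of a long β-block are not, which is why the two counts behave so differently.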
In the case that we study (τ > 0.5 and τ + ρ < 1) initially, the number of very unhappy β-nodes is Ω(n) while the number of unhappy α-nodes is o(n). The following lemma says that as long as this imbalance (large number of very unhappy β-nodes versus small number of unhappy α-nodes) is preserved, it is very likely that a sufficiently long contiguous block of β-nodes is preserved.
Lemma 9 (Persistent β-block) Consider the process (n, w, τ, ρ) with τ > 0.5 and let s* be the least stage where the ratio between the very unhappy β-nodes and the unhappy α-nodes becomes less than 4w^2 (putting s* = ∞ if no such stage exists). Then with high probability there is a β-block of length ≥ 2w at all stages < s* of the process.
Since a β-block of length at least w is a guarantee for unhappy α-nodes, we get the following corollary. It remains to construct an elaborate martingale argument in order to show that the imbalance between U_α and U*_β persists for a sufficiently long time (until the process reaches a safe state).

Overview of the Infected Area of the Schelling Process
In the case of unbalanced happiness (i.e. when τ > 0.5, τ + ρ < 1, see Table 3) the unhappy α-nodes are initially very rare, so the interesting activity (namely α-to-β swaps) occurs in small intervals of the entire population (at least in the early stages). These intervals contain the unhappy α-nodes, and gradually expand, while outside these intervals all β-nodes are very unhappy. Figure 9 shows the development of this process, where the height of the nodes (perpendicular lines) is proportional to the number of α-nodes in their neighborhood and the horizontal black line denotes the threshold where an α-node becomes unhappy. Hence nodes with a high proportion of α-nodes in their neighbourhood will be higher than the nodes with a low proportion of α-nodes in their neighbourhood. The three horizontal bars are snapshots of the process, and show cascades forming, originating from the initially unhappy α-nodes. Figure 6 shows the same process, with the current state in the outer circle, and with swaps represented by a dot at a distance from the center which is proportional to the stage where the swap occurred. These cascades that spread the unhappy α-nodes are due to the following domino effect. An unhappy α-node moves out of a neighbourhood, thus reducing the number of α-nodes in that interval. This in turn often makes another α-node in the interval unhappy, which can move out at a later stage, thus causing another α-node nearby to be unhappy, and so on. The expanding intervals are the infected segments, which start their life as incubators. Roughly speaking, incubators are small intervals that surround the unhappy α-nodes in the initial state. Moreover they are defined in such a way that every β-node that is outside the incubators is very unhappy in the initial state. During the process, as we discussed above, these expand into larger infected segments, so that at each stage every unhappy α-node is inside an infected segment. The union of all infected segments is called the infected area which, roughly speaking, consists of the areas containing unhappy α-nodes (the formal definition is given in the next section). At any stage, every β-node outside the infected area is very unhappy and every α-node outside the infected area is happy. It is not hard to show that if τ + ρ < 1, the probability that a node belongs to an incubator is e^{−Θ(w)}. Hence with high probability the number of incubators as well as the number of nodes belonging to incubators of the process (n, w, τ, ρ) is ne^{−Θ(w)}.
Fig. 9 Formation and dynamics of the infected area when τ + ρ < 1. Here instead of a circle, we lay the nodes horizontally, representing them as perpendicular lines of a fixed length, which appear lower or higher according to the proportion of α-nodes in their neighborhood. The horizontal line represents the level below which an α-node becomes unhappy
It turns out that the number of unhappy β-nodes in an interval of nodes is conveniently bounded in terms of the number of α-nodes in the interval. This means that if the number of α-nodes in the infected area remains o(n), then the number of unhappy β-nodes in the infected area also remains o(n). In order to give a clear sketch of the argument depicted in Fig. 3 (for the current case when τ > 0.5 and τ + ρ < 1) let us define the global variables in Table 6. Let Z_s be the number of the α-nodes in the infected area and Y_s be the number of unhappy β-nodes in the infected area at stage s. Also let G_s be the number of β-nodes outside the infected area and let C be the number of nodes inside the incubators (in the initial state). Let U_s be the number of unhappy nodes at stage s (this metric was discussed in Sect. 2.1 in the context of a fixed state).
Note that U_s ≤ G_s + Y_s + Z_s. A combinatorial argument can be used in order to show that Y_s ≤ Z_s/(1 − τ) + 2wC (see Sect. 5.5 for the proof). Hence U_s ≤ G_s + Z_s · (2 − τ)/(1 − τ) + 2wC. (8) By (1) we know that a stage where the number of unhappy nodes is less than nτρ*/w is a safe stage. Hence we wish to show that (with high probability) the process will arrive at a stage where each of the three summands in (8) is at most nτρ*/(3w). We know that C can be bounded appropriately. Our main argument will show how to obtain a similar bound for Z_s. Note that G_s plays a different role, since it is initially large and shrinks monotonically (as the infected area expands monotonically). In order to find a stage where G_s becomes sufficiently small, it is instructive to consider what a typical swap in the process looks like. At the start of the process the infected area is a very small proportion of the entire ring. The vast majority of unhappy β-nodes occur outside the infected area, while all unhappy α-nodes are inside the infected area. It follows that with high probability a swap will involve an α-node in the infected area and a β-node outside the infected area. A bogus swap is one that is not of this kind.

Definition 3 (Bogus swaps)
A swap which involves a β-node currently inside the infected area is called bogus. Given an infected segment I , a bogus swap in I is a swap that moves an α-node into I .
Note that any swap which is not bogus, reduces G s by at least 1. Hence if we show that the bogus swaps have small probability throughout a significant part of the process, we can ensure that G s becomes sufficiently small. In order to be more precise, recall the stopping time s * from Lemma 9. We introduce a few more stopping times, all of which will turn out to be earlier than s * (with high probability). These basically concern the satisfaction of conditions which will ensure that the mixing index is sufficiently low as to guarantee a safe state. By (1) we have mix ≤ U · w(w + 1) and in order to ensure a safe state (by Lemma 1) we want mix < n(w + 1)τρ * . So we want U < nτρ * /w at some stage of the process. Let T mix be the first stage which satisfies this condition. Similarly, consider the stopping times T g , T stop of Table 7 (for simplicity, we will not consider T y in the present discussion). Here T g is the first stage where G s ≤ τρ · n/(4w) and T stop is the first stage at which there are no more unhappy α-nodes.
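The step from the bound on U to the bound on the mixing index is a one-line computation. It is written out below using only the two relations quoted in the paragraph above (mix ≤ U · w(w + 1) and the target mix < n(w + 1)τρ*).

```latex
\[
U < \frac{n\tau\rho^{*}}{w}
\;\Longrightarrow\;
\mathrm{mix} \;\le\; U \cdot w(w+1)
\;<\; \frac{n\tau\rho^{*}}{w}\cdot w(w+1)
\;=\; n(w+1)\tau\rho^{*}.
\]
```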
We use an elaborate martingale argument in order to show Lemma 10. This lemma in combination with Lemma 9 implies that T_g ≤ s* ≤ T_stop. Hence every stage up to T_g involves a swap. Then it follows from the second clause of Lemma 10 that T_g < n (since G_s is reduced by at least 1 at every non-bogus swap). Hence by (8) we have established (with high probability) the existence of a stage T_g < n such that U_{T_g} < nτρ*/w. Hence by (1) we have T_mix ≤ T_g, which means that by stage T_g a safe state has been reached. Then by Corollary 3 the process will reach complete segregation, with probability 1 − o(1).

Corollary 7 (Safe state arrival)
Suppose that τ + ρ < 1. Then with high probability the process (n, w, τ, ρ) reaches a safe state by stage n, and eventually complete segregation.
This argument (with the full details given in the following sections) concludes the proof of Theorem 1 for the case τ > 0.5. The case τ ≤ 0.5 is dealt with in Sect. 4.

Infected Area and Random Variables (Formal Definitions)
In this case of unbalanced happiness (i.e. when τ > 0.5 and τ + ρ < 1, see Table 3) the unhappy α-nodes are initially very rare, so the interesting activity (namely α-to-β swaps) occurs in small intervals of the entire population (at least in the early stages). These intervals contain the unhappy α-nodes, and gradually expand, while outside these intervals all β-nodes are very unhappy. Figure 9 (produced from a simulation) shows the development of this process, where the height of the nodes (perpendicular lines) is proportional to the number of α-nodes in their neighborhood and the horizontal black line denotes the threshold where an α-node becomes unhappy. These cascades that spread the unhappy α-nodes are due to the following domino effect. An unhappy α-node moves out of a neighbourhood, thus reducing the number of α-nodes in that interval. This in turn often makes another α-node in the interval unhappy, which can move out at a later stage, thus causing another α-node nearby to be unhappy, and so on. The expanding intervals are the infected segments which start their life as incubators.

Definition 4 (Incubators)
Consider the set I of nodes in the initial state which belong to an interval of nodes of length w with fewer than w(1 − ρ + τ)/2 many α-nodes, that is, with α-proportion below τ + (1 − τ − ρ)/2. Let I* be the set of nodes whose neighborhood contains a node in I. An incubator is a maximal interval of nodes that is entirely contained in I*.
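Definition 4 can be turned into a short procedure on a ring. The sketch below is illustrative only: the string encoding ("a" for α, "b" for β), the function name and the return format (clockwise first and last index of each incubator) are assumptions made for the example.

```python
def incubators(config, w, tau, rho):
    """Maximal ring intervals contained in I*, as (first, last) index pairs.

    I  : nodes lying in some window of w consecutive nodes that holds
         fewer than w*(1 - rho + tau)/2 alpha-nodes.
    I* : nodes whose neighbourhood [u-w, u+w] meets I.
    """
    n = len(config)
    limit = w * (1 - rho + tau) / 2

    in_I = [False] * n
    for start in range(n):
        window = [(start + k) % n for k in range(w)]
        if sum(config[i] == "a" for i in window) < limit:
            for i in window:
                in_I[i] = True

    in_I_star = [any(in_I[(u + d) % n] for d in range(-w, w + 1))
                 for u in range(n)]

    if all(in_I_star):          # degenerate case: the whole ring
        return [(0, n - 1)]
    if not any(in_I_star):
        return []

    # Scan clockwise, starting just after a node outside I*, so that
    # no maximal run is split by the wrap-around.
    origin = in_I_star.index(False)
    runs, run = [], []
    for k in range(1, n + 1):
        u = (origin + k) % n
        if in_I_star[u]:
            run.append(u)
        elif run:
            runs.append((run[0], run[-1]))
            run = []
    return runs


ring = "a" * 10 + "b" + "a" * 9
print(incubators(ring, w=2, tau=0.6, rho=0.2))
```

With w = 2 the threshold is 1.4, so any window of two nodes containing the single β-node qualifies, and the incubator is that window set widened by w on each side.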
Each incubator in the initial state may initiate a cascade of α-to-β swaps, which generates additional unhappy α-nodes nearby, and which sustains itself by motivating additional α-to-β swaps, thus creating a cycle of α-to-β swaps and unhappy nearby α-nodes. The infected segment around an incubator I is, informally speaking, the minimum interval containing I which is affected by this process, i.e. by the appearance of additional nearby unhappy α-nodes. The infected area is the union of all of these infected segments, and it is always expanding during the process. We give a precise inductive definition of this notion in order to eliminate any ambiguity. An interval of nodes is called active at a certain state if it contains an unhappy α-node.

Definition 5 (Infected segments)
For each incubator I , let the infected segment I 0 corresponding to I at stage 0 be I itself. At the end of stage s + 1, suppose that the infected segment I s is defined for each incubator I , and define I s+1 for each incubator I as follows. Starting at position 0 and moving clockwise, consider each I s which is currently active and was also active at the end of stage s, and define I s+1 = I s ∪ J , where J consists of the nodes which do not already belong to another active infected segment (by the time we consider I s ) and whose neighborhood contains an unhappy α-node in I s at stage s + 1. Finally we define I s+1 for those incubators I which are no longer active by letting I s+1 = I s − Q, where Q consists of the nodes in I s which now belong to an active infected segment.
The infected area is the union of the infected segments. The fresh infected segment corresponding to infected segment I is I − I_0, i.e. it consists of the nodes of I except the nodes in its incubator. Hence a fresh infected segment consists of two growing intervals of nodes. The fresh infected area is the infected area except the nodes in the incubators. The interior of a set of nodes J consists of those nodes whose neighbourhood is entirely contained in J. The boundary of J consists of the nodes in J which are not in the interior. It is not hard to show that if τ + ρ < 1, the probability that a node belongs to an incubator is e^{−Θ(w)}. Hence with high probability the number of incubators as well as the number of nodes belonging to incubators of the process (n, w, τ, ρ) is n·e^{−Θ(w)}.
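The exponential decay in w can be checked numerically for a single window. The sketch below rests on two assumptions that are not spelled out in this section: nodes of the initial configuration are independent, and each is an α-node with probability 1 − ρ (so that ρ is the density of the minority type). It compares the exact binomial probability that a window of w nodes has fewer than w(1 − ρ + τ)/2 α-nodes with the Hoeffding bound exp(−2wε²), where ε = (1 − τ − ρ)/2. The function names are hypothetical.

```python
import math


def window_tail(w, tau, rho):
    """Exact P[Bin(w, 1 - rho) < w(1 - rho + tau)/2] under the i.i.d. assumption."""
    p = 1 - rho
    limit = w * (1 - rho + tau) / 2
    return sum(math.comb(w, k) * p**k * (1 - p)**(w - k)
               for k in range(w + 1) if k < limit)


def hoeffding_bound(w, tau, rho):
    """Hoeffding upper bound exp(-2 w eps^2) with eps = (1 - tau - rho)/2."""
    eps = (1 - tau - rho) / 2
    return math.exp(-2 * w * eps**2)


for w in (10, 50, 200):
    print(w, window_tail(w, 0.6, 0.2), hoeffding_bound(w, 0.6, 0.2))
```

Floating-point rounding can move the boundary value of k by one; passing `fractions.Fraction` values for `tau` and `rho` avoids this, and the comparison with the bound holds either way.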
Our goal is now to show that the number of unhappy α-nodes remains suitably bounded throughout a significant part of the process. Formally, the main idea is to bound this number with a martingale. Intuitively though, why should the number of unhappy α-nodes remain fairly small? At the start of the process the infected area is a very small proportion of the entire ring. The vast majority of unhappy β-nodes occur outside the infected area, while all unhappy α-nodes are inside the infected area. It follows that with high probability a swap will involve an α-node in the infected area and a β-node outside the infected area.
In the absence of bogus swaps, it is not hard to show that the α-nodes in the infected area (except those in the incubators) are unhappy. This in turn can be used in order to show that the number of α-nodes in the infected area (and so, the number of unhappy α-nodes too) is likely to remain o(n). However there will be bogus swaps, and these can make certain α-nodes in the infected area happy.

Definition 6 (Anomalous nodes)
A node is called actively anomalous at some stage of the process if it is a happy α-node in the interior of the fresh infected area; it is called anomalous if it has been actively anomalous in this or a previous stage. Finally a node is called generally anomalous at some stage, if it is in the current infected area and has been or will be actively anomalous at some later stage of the process.
Clearly actively anomalous implies anomalous, which in turn implies generally anomalous (but not the other way around). Let D_s denote the number of anomalous nodes at stage s, and let D̄_s denote the number of generally anomalous nodes at stage s. A martingale argument will be used in order to show that as long as D_s = o(n), the number of α-nodes in the infected area is likely to remain o(n). The definition of anomalous nodes and D̄_s may seem strange at this point, not least because D̄_s is not predictable at stage s. The reason that we introduce D̄_s is that D_s is very hard to analyze, and very hard to bound directly via a martingale (adapted to the stages of the process). However it is possible to bound D̄_s via a martingale argument of a more general type (i.e. which is not adapted to the stages of the process). Since D_s ≤ D̄_s, this suffices for our purposes.
Define the global variables Table 6 (also recall Table 5). By the definitions we have Here (d) holds because of the likely total size of the incubators and (c) holds because β-nodes outside the infected area are very unhappy.

Probabilities in the Infected Area and Anomalous Nodes
Our current goal is to show that the number of unhappy α-nodes remains suitably bounded for a significant part of the process. 18 The basic idea is that if the number of unhappy α-nodes increases sufficiently, then the infected area must become quite large, and it becomes very likely that the next swap will involve an unhappy α-node in the interior of the infected area. We shall be able to argue that there are good chances that the swap is not bogus. This means that this α-node will move outside the infected area and will become happy. The anomalous nodes, however, present a difficulty with this line of argument. The eviction of the α-node from the infected area (and its replacement by a β-node) may produce more unhappy α-nodes in its neighbourhood. So it is not absolutely true that the total number of unhappy α-nodes will decrease. In fact, as the simulations of Fig. 5 suggest, at the early stages of the process this number is likely to increase slightly. If we assume the absence of bogus swaps, then it is not hard to show that the nodes in the interior of the infected area and outside the incubators have neighborhoods with proportion of α-nodes well below (2w + 1)τ . In this case it is straightforward to employ a martingale argument which shows that the number of α-nodes in the infected area (hence also the total number of unhappy α-nodes) remains bounded with high probability throughout the process. Indeed, in this case there will be no happy α-nodes in the interior of the fresh infected area, so (according to the argument we outlined above) the likely swap absolutely reduces the total number of unhappy α-nodes.
In the presence of bogus swaps, we will use a more sophisticated martingale argument to bound the anomalous nodes. This bound can be used by another simpler martingale argument, in order to bound the number of unhappy α-nodes, at least up to some stopping time of the process and with high probability. This plan requires the calculation of certain probabilities.

Lemma 11 (Probability of a bogus swap) At each stage s + 1, the probability that the current swap will be bogus is bounded above by Y s /G s .
Proof The number of pairs which can cause a bogus swap is bounded by U_α(s) · Y_s. On the other hand, any unhappy α-node can swap with a β-node outside the infected area, because the number of α-nodes in the neighbourhood of any β-node outside the infected area is at least (2w + 1)τ. Hence there are at least U_α(s) · G_s pairs of nodes that can swap at stage s + 1. We conclude that the probability of a bogus swap is bounded by (U_α(s) · Y_s)/(U_α(s) · G_s) = Y_s/G_s.
The calculation of the following probabilities is a first step towards our martingale argument.

Lemma 12 (Probabilities for Z_s) The numbers

(Z_s − D_s − 2wC) · G_s / ((G_s + Y_s) · U_α(s)) and 2wC · (G_s + Y_s) / (G_s · U_α(s)) (9)

are a lower bound for the probability that Z_{s+1} < Z_s and an upper bound for the probability that Z_{s+1} > Z_s, respectively.
Proof The probability that Z_{s+1} < Z_s is at least as much as the probability that the swap is not bogus and it involves a node in the interior of the infected area at stage s + 1. Indeed, in this case the swap moves an α-node from the interior of the infected area to outside the infected area, so Z_{s+1} = Z_s − 1, because the length of the infected area remains the same. The unhappy α-nodes of the infected area that cannot be part of such a swap are the ones that belong to the boundary of the infected area, so they are at most 2wC many. This means that there are at least Z_s − D_s − 2wC nodes of type α which can be picked as part of a swapping pair at stage s + 1 such that Z_{s+1} − Z_s is negative. Note that each of these α-nodes forms a swapping pair with any β-node outside the infected area, since all such β-nodes are very unhappy. Therefore there are at least (Z_s − D_s − 2wC) · G_s many swapping pairs which make Z_{s+1} − Z_s negative. On the other hand, the total number of swapping pairs is at most (G_s + Y_s) · U_α(s). Hence the first expression of (9) is a lower bound for the probability that Z_{s+1} < Z_s. For the second clause, note that Z_{s+1} > Z_s can only happen in the case that the infected area expands at stage s + 1. This can only occur if the swapping pair involves an α-node that belongs to the boundary of the infected area of stage s. There are at most 2wC such nodes, so there are at most 2wC · (G_s + Y_s) swapping pairs that can cause Z_{s+1} > Z_s. Moreover there are at least G_s · U_α(s) possible swapping pairs for stage s + 1. Hence the second expression of (9) is an upper bound for the probability that Z_{s+1} > Z_s.
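The two counting bounds obtained in the proof above can be evaluated exactly for given values of the global variables. The sketch below reads the two ratios off the pair counts in the proof; the function names and the use of exact rational arithmetic are illustrative choices.

```python
from fractions import Fraction


def decrease_lower_bound(Z, D, C, G, Y, U_alpha, w):
    """Lower bound on P[Z_{s+1} < Z_s]:
    at least (Z - D - 2wC) * G favourable pairs,
    out of at most (G + Y) * U_alpha swapping pairs."""
    return Fraction((Z - D - 2 * w * C) * G, (G + Y) * U_alpha)


def increase_upper_bound(C, G, Y, U_alpha, w):
    """Upper bound on P[Z_{s+1} > Z_s]:
    at most 2wC * (G + Y) pairs that can expand the infected area,
    out of at least G * U_alpha swapping pairs."""
    return Fraction(2 * w * C * (G + Y), G * U_alpha)


# Illustrative values only (not taken from the paper).
q = decrease_lower_bound(Z=500, D=20, C=2, G=1000, Y=100, U_alpha=480, w=3)
p = increase_upper_bound(C=2, G=1000, Y=100, U_alpha=480, w=3)
print(float(q), float(p))
```

These are the two quantities that are compared, as wp against q, when the expected change of Z_s is bounded in the proof of Lemma 13.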
We may now identify our first supermartingale. Note that the following fact is the reason why we defined the anomalous nodes the way we did. The fact that D s is nondecreasing is a necessary part of the following proof.

Lemma 13 (Non-anomalous nodes in an infected segment) The following process is a supermartingale, for all s < T_y: Z*_s = max{Z_s − D_s, 11w² · C}.
Proof At the end of stage s (and given all information as to how the process has unfolded so far) denote the probability that Z_{s+1} < Z_s by q and the probability that Z_{s+1} > Z_s by p. Let E be the expected value of Z_{s+1}. Now at stage s + 1 the infected area can expand by at most w nodes. Moreover, it is not possible that at stage s + 1 an α-node which is not in the infected area of stage s is moved to a position in the infected area of stage s + 1. This is because all α-nodes outside the infected area of stage s are happy at stage s. It follows that Z_{s+1} − Z_s ≤ w at each stage s. Therefore E ≤ Z_s + wp − q. By Lemma 12, in order to ensure that wp − q ≤ 0, it suffices that Z_s − D_s − 2wC ≥ 2w²C · (1 + Y_s/G_s)². Since s < T_y the expression inside the parentheses in the latter inequality is bounded above by 2. Hence for the condition wp − q ≤ 0 it is sufficient that Z_s ≥ D_s + 10w² · C for all s < T_y. So now we divide into two cases. If Z_s < D_s + 10w² · C then Z*_{s+1} = Z*_s = 11w² · C. Otherwise, E ≤ Z_s and the result follows from the fact that D_s is non-decreasing.
Now to get from Z*_s to Z_s, we need to bound D_s. Intuitively, we expect the proportion of the α-nodes in neighborhoods of nodes in the interior of the infected area to be rather low, e.g. considerably lower than the threshold (2w + 1)τ. The following lemma gives a justification for such an expectation and is also the reason why we chose ε* = (1 − τ − ρ)/2 in the definition of incubators, Definition 4. Here is an intuitive explanation of this fact. Let us say that a node in the infected area which is not in the interior of the infected area is in the boundary of the infected area. A node in the boundary of the infected area can see a node outside the infected area. The nodes in the complement of the infected area have never seen unhappy α-nodes, hence the proportion of α-nodes in their semi-neighbourhoods can only increase.
This means that one of the semi-neighbourhoods of each node in the boundary of the infected area has not been affected by α-to-β swaps. The following lemma says that such a node can only be included in the interior of the infected area if the semi-neighbourhood of it which has been affected by α-to-β swaps is affected by at least ε*w such swaps. In other words, the expansion of the infected area requires a considerable number of stages. The particular statement refers to the case where the infection travels from right to left. By symmetry, an analogous statement holds for the case where the infection travels in the opposite direction.
Proof Let s be a stage of the process and suppose that there have been no α-to-β swaps in [a − w, a) by stage s. Suppose that there is an unhappy α-node in [a − w, u] at stage s. Then there must have been an unhappy α-node in [u − w, u] at some stage ≤ s. Consider the first such stage t_0 and let v_0 be the rightmost α-node in (u − w, u] which became unhappy at stage t_0. By our hypothesis, up to stage t_0 there have been no α-to-β swaps in (v_0, u]. Hence all of the α-to-β swaps that occurred in the right semi-neighbourhood of v_0 are also in the right semi-neighbourhood of u. The proportion of the α-nodes in the left semi-neighbourhood of v_0 is more than τ + δ. Since v_0 is unhappy at t_0, the proportion of the α-nodes in its neighbourhood is less than τ. Hence the proportion of the α-nodes in its right semi-neighbourhood is at most τ − δ at stage t_0. Hence by hypothesis, by stage t_0 at least 2wδ many α-to-β swaps have occurred in the right semi-neighbourhood of v_0. By the above discussion, these swaps have also occurred in the right semi-neighbourhood of u. According to the definition of incubators, this fact is relevant for δ = (1 − τ − ρ)/2 and shows that the infected area expands reasonably slowly in the stages of the process (n, w, τ, ρ).
Indeed, the proportion of α-nodes in the neighbourhood of any node outside the infected area at any particular stage is at least τ + (1 − τ − ρ)/2. This also shows that, in the absence of bogus swaps, all α-nodes in the interior of the fresh infected area are always unhappy (i.e. there are no anomalous nodes). In the presence of bogus swaps this is no longer true, and this is why we have to work in order to bound the spread of anomalous nodes.

Bounding the Anomalous Nodes
Recall that D_s denotes the number of anomalous nodes at stage s. In this section we construct a martingale process which shows that D_s is likely to be bounded appropriately, throughout a significant part of the Schelling process. This argument requires us to consider the random variables localized into the individual infected segments. Recall the stopping times defined in Table 7. We use τρn/(4w) rather than τρn/(3w) in the definition of T_g so as to allow for the slight discrepancy which one might expect between ρ and ρ*.

Definition 7 (Stopping times)
Let T_g be the least stage such that G_{T_g} ≤ τρn/(4w). Define T_y to be the first stage which is either T_g or else such that Y_s > G_s. Finally let T_mix be the first stage for which mix < n(w + 1)τρ*. In all cases, if the stage described does not exist then we define the corresponding stopping time to be ∞.
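Given recorded trajectories of G_s and Y_s (for instance from a simulation), the stopping times T_g and T_y of Definition 7 can be read off as follows. This is a sketch under the assumption that the trajectories are available as finite lists indexed by stage; the function name is hypothetical, and a stopping time that does not occur within the recorded stages is reported as infinity.

```python
import math


def stopping_times(G, Y, n, w, tau, rho):
    """Return (T_g, T_y) for trajectories G[s], Y[s].

    T_g: least stage with G[s] <= tau*rho*n/(4w).
    T_y: first stage that is either T_g or has Y[s] > G[s].
    """
    level = tau * rho * n / (4 * w)
    T_g = next((s for s, g in enumerate(G) if g <= level), math.inf)
    first_y_above_g = next(
        (s for s, (g, y) in enumerate(zip(G, Y)) if y > g), math.inf)
    return T_g, min(T_g, first_y_above_g)


G = [100, 80, 60, 40, 20, 5]
Y = [0, 1, 2, 50, 4, 6]
print(stopping_times(G, Y, n=1000, w=5, tau=0.6, rho=0.2))
```

By construction T_y ≤ T_g always holds, which is the fact used when the two stopping times are compared later in the argument.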
Given an infected segment I, let D̄_s = D̄_s(I) be the number of nodes in I_s that will ever become anomalous, up to stage T_g. This is a version of the generally anomalous nodes D̄_s. A stage is called an I-stage if a swap occurs involving a node from I.
If (ν(s)) is an enumeration of the I-stages, let D̄*_s = D̄_{ν(sw⁵)} and I*_s = I_{ν(sw⁵)}. We use * as a superscript in other variables in the following, in order to indicate that they are 'jump processes' in the sense that they are not updated at every stage or even every I-stage of the Schelling process. For example, D̄*_s is only updated every w⁵ many I-stages of the Schelling process.
We may view the underlying probability space Ω as a tree, where the nodes are states and branchings correspond to state transitions. Let Ω ∧ T g denote the subspace restricted to the stages up to time T g (which may be infinite). Normally we would say that an event A ⊆ Ω ∧T g is I -independent if it did not impose any branching restrictions regarding the I -stages that occur in the reals in it. We give a sightly more general definition which is more appropriate for the argument to follow. An event A ⊆ Ω ∧ T g is called I -independent if for each β ∈ A and any s such that the transition from β s to β s+1 occurs at an I -stage, β s * S ∈ A for every state that is obtained from β s through a non-bogus swap. A filtration A s ⊆ A s+1 ⊆ Ω ∧ T g is called I -independent if for each s the event A s is I -independent. Analogously, a process (J s ) on Ω is called I -independent if the natural filtration of it is I -independent. Intuitively, a process (J s ) on the underlying probability space Ω is I -independent, if for each s, fixing the value of J s does not impose any restriction on (i.e. is compatible with all) the transitions of the Schelling process from stage s to stage s + 1 that involve a non-bogus swap and a node from I . Here we use boldface font for J s because this process will typically be global, in the sense that it involves information about the process that is not restricted to the infected segment I . In the following lemma we use (J * s ) for the underlying I -independent global process in order to indicate that it refers to the subsequence of stages sw 5 of the process, much likeD * s .

Lemma 15 (I -supermartingale)
Given an infected interval I, the process D̄*_s − 10ws is a supermartingale relative to any I-independent process J*_s to which D̄*_s is adapted.
Proof Given an I -independent process J * s such thatD * s is adapted to J * s (i.e.D * s is a function of J * s ) it suffices to show that E D * s J s−1 ≤D * s−1 + 10w for all s. LetD * 0 s be the number of nodes in the fresh part of I s , to the left of its incubator, that will ever become anomalous, up to stage T g . Similarly letD * 1 s be the number of nodes in the fresh part of I s , to the right of its incubator, that will ever become anomalous, up to stage T g . ClearlyD We will show that all of these events yield small expectation (conditional on J s ) on the number of happy α-nodes that will ever appear in the interval H * i s after I -stage sw 5 of the original process (in particular, the probabilities of (b)-(d) are very small). We decide to accept 4w happy α nodes in H i s as a desirable (i.e. not too high) count. So, irrespective of likelihood, event (a) is desirable. Note that by Lemma 14, in w 5 many I -stages I cannot grow by more than 2w 5 / (1 − τ − ρ). (11) Note that Lemma 11 also holds locally, by the same proof. In other words, given an interval of nodes of length , then the probability that at stage s + 1 a bogus swap will occur involving a β-node from the given interval is bounded above by /G s . Since all stages are bounded by T g , it follows that the probability (conditional on J * s−1 ) of a bogus swap in an area of length is less than 4w /nτρ. Hence by (11), event (b) has probability (conditional on J * s−1 ) upper bounded by w 2·5+2 /n. In this case we can bound the expectation trivially by w 3·5+2 /n = w 17 /n. Now suppose that (a), (b) do not occur so that, by Lemma 14, each subinterval of length w in the interior of H * i s has α-proportion at most τ − * at I -stage sw 5 , where recall that ( * = 1 − τ − ρ)/2. In particular, all α nodes in the interior of H * i s are unhappy, and remain so unless wδ bogus swaps happen in H i s . 
We wish to show that in this case event (c) has probability (conditional on J * s−1 ) upper bounded by (w 3·5 /n) wδ . Indeed couple this process (conditional on J * s−1 , where each stage is either a non-bogus swap in H * i s , or something else) with a gambler's ruin process, where the gambler has w 5 /δ chips and the house has wδ chips, and the ratio of the winning probabilities is less than q = 4w 5+2 /n in favor of the house. Then we can estimate an upper bound the probability that wδ bogus swaps occur in J i s before all the interior turns into a β-firewall. According to the standard gambler's ruin result, this is which is also a bound on the (conditional) probability of event (c). Now assume that (a)-(c) do not occur, and lets estimate an upper bound for the probability of (d). Again, couple this process (conditionally on J * s−1 ) with a biased random walk where a negative move corresponds to a bogus swap moving something from the w border (one or the other) of the firewall, and a positive move is swapping the α-node at the edge with a β-node (other events are ignored). The ratio of the probabilities is bounded above by 2w/(nτρ/4w) which is bounded by w 3 /n. Also note that a negative move chips (at most) w away from the firewall, while a positive move only contributes (at least) one node to the firewall. Then the probability that it will eat up tw at any future time is bounded by w · (w 3 /n) t−1 . For t = 4 we get event (d) has probability (conditional on J * s−1 ) upper bounded by w 10 /n 3 . Then the expectation of the number of anomalous nodes that will ever appear in H * i s is bounded by Finally under case (e) it is clear that the conditional expectation of D * i s is also bounded bȳ D * i s−1 + 5w. Considering all the different cases, by the law of alternatives for conditional expectation we have that E D * i s J * s−1 ≤D * i s−1 + 5w, which concludes the proof.
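For reference, the standard gambler's ruin probability invoked in the proof can be computed as follows. This is the textbook formula for unit bets with a fixed win probability per round; how its parameters map onto the chip counts and the ratio q used in the proof above is left as in the text, and the function name is a hypothetical choice.

```python
def ruin_probability(gambler_chips, house_chips, p_win):
    """Probability that the gambler loses all chips before winning
    all of the house's chips, for independent unit bets won with
    probability p_win (0 < p_win <= 1)."""
    total = gambler_chips + house_chips
    if p_win == 0.5:
        return house_chips / total
    r = (1 - p_win) / p_win          # odds ratio against the gambler
    return (r**gambler_chips - r**total) / (1 - r**total)


# A strongly favoured gambler is ruined with geometrically small probability.
print(ruin_probability(2, 2, 0.9))
print(ruin_probability(5, 5, 0.9))
```

When the odds ratio r is small, the ruin probability is roughly r raised to the gambler's number of chips, which is the shape of bound used for event (c).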
Let I_j, j < t be the infected segments (and I_j[s] their state at stage s). Recall that D̄_s is the sum of all D̄_s(I_j), j < t. In order to bound D̄_s we need to prove a global version of Lemma 15. An immediate obstacle is the asynchrony of the I-stages with respect to the various infected segments I. We need to find a process L_s relative to which D̄_s (or some 'asynchronous' version D̃_s of it) is a supermartingale. For each j < t let τ_s(j) be the stage where exactly s · w⁵ many I_j-stages have occurred. Also let (τ_i) be a monotone enumeration of the times {τ_s(j) | j < t, s ∈ N}. Let λ_s(j) be τ_m(j) for the maximum m such that τ_m(j) ≤ s. Let D̃_s be the sum of all D̄_{λ_s(j)}(I_j), j < t. The point of this definition is that D̃_s considers the values of D̄(I_j), j < t at the last stage ≤ s where they completed a cycle (which happens at every w⁵ many I_j-stages) and outputs their sum. Define L_s to be the vector containing the tuples (D̄_{λ_s(j)}(I_j), λ_s(j)) for each j < t. In this way, the process (D̃_s) is adapted to (L_s) (in other words, for each s, the value of D̃_s is a function of L_s). Note that D̃_s remains constant in the intervals [τ_s, τ_{s+1}), just as D̄_{λ_s(j)}(I_j) remains constant in the interval [τ_s(j), τ_{s+1}(j)).
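The bookkeeping of cycle times can be made concrete. The sketch below assumes that, for a fixed segment I_j, the sorted list of its I_j-stages is available, and it takes the cycle length as a parameter (w⁵ in the text). The function names are illustrative, and λ_s(j) is taken to be 0 before the first cycle completes.

```python
import bisect


def cycle_times(segment_stages, cycle):
    """tau_m(j) for m = 1, 2, ...: the stage at which exactly m*cycle
    I_j-stages have occurred, given the sorted I_j-stages."""
    return [segment_stages[m * cycle - 1]
            for m in range(1, len(segment_stages) // cycle + 1)]


def last_completed_cycle(segment_stages, cycle, s):
    """lambda_s(j): the largest tau_m(j) that is <= s (0 if none yet)."""
    taus = cycle_times(segment_stages, cycle)
    k = bisect.bisect_right(taus, s)
    return taus[k - 1] if k else 0


stages = [2, 3, 7, 8, 11, 15, 16]    # stages at which segment I_j is hit
print(cycle_times(stages, cycle=3))
print([last_completed_cycle(stages, 3, s) for s in (6, 7, 14, 15)])
```

The asynchronous sum is then obtained by evaluating each segment's count at its own `last_completed_cycle` value and adding the results over all segments.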

Lemma 16
The process D̃_{τ_s} − 20ws is a supermartingale relative to the process L_{τ_s}.
Proof Using the law of alternatives for conditional expectation, it suffices to show that for each s there is a (finite) partition A of events relative to L τ s such that for each Each event A ∈ A describes which pair of infected intervals I j completes a cycle at stage τ s+1 , and the sequence of I j -stages (for each of the two j) from λ j (τ s ) to τ s+1 . Formally, event A is a tuple one tuple (m 0 , m 1 ) where m i < t, and for each i = 0, 1 an increasing sequence of stages starting from λ m i (τ s ) and ending on the same number a. If m 0 = m 1 then the two sequences should be the same. The meaning of A is that τ s+1 = a and infected intervals with indices m i are hit at stage a, with the sequence of stages representing the exact stages from λ m i (τ s ) to a where a swap occurs in I m i . By the definition of A, this event is I m i -independent for i = 0, 1. At stage τ s+1 of the process there must be exactly one tuple (m 0 , m 1 ) where i < t, such that the swap occurred in I m 0 and I m 1 . For each such event A on L τ s we have and by Lemma 15 we have since A is I m i -independent for i = 0, 1. Therefore (12) holds for each of the events A. By the law of alternatives, and since there can be at most two infected segments that complete a cycle at stage τ s+1 , we get ThereforeD τ s − 10ws is a supermartingale adapted to L τ s .

Corollary 8
Let a ∈ N. With probability > 1 − 1/a, for all s < T_g we have D_s < a + 20s/w⁴ + nw⁵e^{−O(w)}.
Proof By Lemma 16 and the maximal inequality for supermartingales, given any a > 1, with probability at least 1 − 1/a we have D̃_{τ_s} < a + 20ws for all s < T_g. Since each stage can be an I_j-stage for at most two distinct j < t, we have |{i | τ_i ≤ s}| ≤ 2s/w⁵. Hence for each a > 1 we have with probability > 1 − 1/a,

D̃_s < a + 20s/w⁴ for all s < T_g. (13)

Also, note that at each stage s we have D̄_s(I_j) ≤ D̄_{λ_s(j)}(I_j) + w⁵ for each j < t. Hence D̄_s ≤ D̃_s + nw⁵e^{−O(w)}. Since we also have D_s ≤ D̄_s for all s, the corollary follows from (13).

Bounding the Arrival Time to a Safe State
By Lemma 13 and Corollary 8 we have the desired bound on Z s .

Corollary 9
Let a ∈ N. With probability > 1 − 1/a, for all s < T_y we have Z_s < a + 20s
By Lemma 11 and Corollaries 12 and 9 we have the following.

Corollary 10
Let a ∈ N. With probability > 1 − 1/a, for all s < min{T_y, n} we have w³ · Z_s = o(n) and p_s = o(1).
The following result is the technical basis for the result that with high probability a safe state will be reached (at some finite stage). It says that, with high probability the stopping times T y , T g are equal and are bounded by n.
Lemma 17 (Stopping times) With probability 1 − o(1) we have T_y = T_g < n.
Proof Let ε > 0 such that 1 − ε > ρ + 1/8. By Hoeffding's inequality for Bernoulli trials we may consider n large enough such that the probability that G_0 > (ρ + 1/8)n is less than ε/4. Recall that p_s is the probability of a bogus swap at stage s + 1. Suppose that w is large enough such that with probability at least 1 − ε/4 (a) w²C < nτρ/32; Clause (a) can be ensured by Lemma 33. Clause (b) can be ensured by Corollary 9. Clause (c) can be ensured by Corollary 10. First, for a contradiction, assume that T_y < T_g. Then G_{T_y} < Y_{T_y}. By Corollary 10 and since (by definition) T_y ≤ T_g we have which is the required contradiction. Hence with probability > 1 − ε/4 we have T_y = T_g. Second, we show that with probability at least 1 − ε/2 we have T_y < n. By clause (c) above, with probability at least 1 − ε/4, at all stages s < min{T_y, n} we have p_s < ε²/4. (14) By (14), with probability at least 1 − ε/4, the expectation of the number of bogus swaps that have occurred by stage T_y is < ε² · T_y/4. Hence, conditionally on the event that p_s < ε²/4 for all stages s ≤ min{T_y, n}, the probability that by stage min{T_y, n} more than εn bogus swaps have occurred is less than ε/4. Hence the unconditional probability that by stage min{T_y, n} at most εT_y bogus swaps have occurred is at least (1 − ε/4)² > 1 − ε/2. We conclude the argument. We have established that the probability of the event T_y < T_g or G_0 > n(ρ + 1/8) is bounded by ε/2. It remains to show that outside this rare event, T_y < n. Since every non-bogus swap reduces G_s by (at least) 1, and G_0 ≤ n(ρ + 1/8), ρ < 0.5, with probability at least 1 − ε/2 we have which shows that T_y = T_g < n with probability at least 1 − ε.

Corollary 11 (Safe state arrival)
Suppose that τ + ρ < 1, τ > 0.5. Then with high probability the process (n, w, τ, ρ) reaches a safe state, and then complete segregation.
Proof Let ε > 0. By the law of large numbers, with probability at least 1 − ε/4 and sufficiently large n we have 3ρ < 4ρ*. Pick w, n large enough such that (a) T_g = T_y < n with probability > 1 − ε/4; Clause (a) can be ensured by Lemma 17 and clause (b) can be ensured by Corollary 9. Clause (c) can be ensured by Lemma 33. By the definition of T_g, G_{T_g} ≤ τρn/(4w). Hence by Corollary 10 we have U_{T_g} < nτρ*/w. But mix ≤ U · w(w + 1), so the mixing index at stage T_g is less than nτρ* · (w + 1). In other words, T_mix ≤ T_g, so by Proposition 1 the process at stage T_g is in a safe state, with probability more than 1 − ε. Hence by Corollary 5, the process will arrive at complete segregation with probability at least 1 − ε.

The Case When Intolerance is at Most 50% (τ ≤ 0.5)
In this case the behaviour of the process (n, w, τ, ρ) is very different, since the mixing index is strictly decreasing. This means that the process is bound to arrive at a dormant state, with absolute certainty. Note that if τ ≤ 0.5 then complete segregation is a dormant state, but it can be shown that the final state is never complete segregation. We show that in most typical cases for ρ, the outcome is static when τ ≤ 0.5.

Overview of the Argument When τ ≤ 0.5
We assume that ρ < 0.5 because the case ρ = 0.5 has already been analysed in [1,7] and the case ρ > 0.5 is symmetric. Hence on the hypothesis τ ≤ 0.5 we have ρ + τ < 1 and by Table 3 the unhappy α-nodes are an arbitrarily small proportion of the α-nodes as w → ∞. In any case, since ρ < 0.5 < 1 − ρ we have τ − ρ < 1 − τ − ρ, so the probability that an α-node is unhappy is much smaller than the probability that a β-node is unhappy. However, what matters in the analysis for τ ≤ 0.5 is the relationship between the likelihood of stable intervals and unhappy α-nodes. This analysis is reminiscent of the work in [1], but has some new features.

Table 8 Likelihood of various properties in the initial configuration under certain conditions, when ρ ≤ 0.5 and τ ≤ 0.5 (columns: Property, Probability, Distribution, Likelihood; one surviving entry: Always rare)

Definition 8 (Stable intervals)
A stable interval is an interval of nodes of length w which contains at least (2w +1)τ nodes of one or the other type. An interval is α-stable if it contains at least (2w + 1)τ nodes of type α.
The β-stable intervals are defined analogously. Note that no α-node which is inside an α-stable interval can swap during the process. The reason is that such α-nodes are happy just because of the presence of the other α-nodes in the same interval. Then a simple induction shows that they will continue to be happy throughout the process, thereby remaining immune to swaps and fixed in their initial positions. A similar observation applies to β-stable intervals. The existence of stable intervals is characteristic of the case τ ≤ 0.5.
The events we are interested in are the occurrences of α-stable intervals and unhappy α-nodes. The probabilities P_stab, P_unhap of these two rare events can be viewed as tails of certain binomial distributions. Consider the variables, probabilities and distributions of Table 8. It is not hard to see that P_stab = P[Z_stab ≥ (2w + 1)τ] and P_unhap = (1 − ρ) · P[Z_unhap > 2w(1 − τ)], where Z_stab ∼ B(w, 1 − ρ) and Z_unhap ∼ B(2w, ρ). We are interested in the event where the ratio P_unhap/P_stab becomes small, because of the following fact.

Lemma 18 (Static processes)
Suppose that τ, ρ are such that P_unhap = O(c^{−w} · P_stab) for some c > 1. Then with high probability the process (n, w, τ, ρ) is static, and in fact there exists some c_* > 1 such that with high probability the process stops after at most n · c_*^{−w} many steps.
The intuition here is that, if the unhappy α-nodes are much rarer than the α-stable intervals (i.e. if P_unhap = o(P_stab)) then it is very likely that unhappy α-nodes are enclosed in small intervals which are guarded by α-stable intervals. This means that the familiar cascades that can be caused by the eviction of an unhappy α-node are bound to be contained in small areas of nodes. The very definition of stable intervals ensures that such cascades cannot pass through them. Hence the condition P_unhap = o(P_stab) guarantees that any α-to-β swaps are contained in small areas of nodes of total size o(n). Due to the monotonicity of the mixing index, this means that there can only be at most o(n) swaps in this case.
The second item in Fig. 1 shows the probabilities P_stab, P_unhap (for w = 100) with respect to τ, ρ. We see that for points away from (0.5, 0.5), the surface P_unhap is above P_stab, and there is a threshold curve beyond which the opposite relationship is established. Using basic results about the tail of the binomial distribution, and Stirling's approximation, we can derive the following sufficient condition for P_unhap = o(P_stab): The third item of Fig. 1 is a representation of g(τ, ρ) in the space, up to where it becomes negative, at which point we project it on the plane. The values of τ, ρ that we are interested in correspond to points on the plane, outside the collapsed area. This boundary (a curve) is clearer in the first item of Fig. 1, which is the projection of the surface to the plane, with different colours indicating the points which make g positive or negative. This boundary can be simplified (with slight loss of generality) if we consider the line that passes through the two points where the boundary curve intersects the lines τ = 0.5 and ρ = 0.5. Hence if 2ρ · (1 − 2κ_0) + τ + κ_0 < 1, we are in the stable region, which shows a clause of Theorem 1.

Note that both of the partial derivatives of g are negative when τ, ρ ∈ [0, 0.5). If we fix ρ = 0.5 then the largest value of τ that keeps g(τ, ρ) ≥ 0 is the solution (κ_0 ≈ 0.353092313) of the first equation of Table 9. Hence we may conclude that if τ < κ_0 and ρ ∈ (0, 0.5] then P_unhap = O(c^{−w} · P_stab) for some c > 1. We can also look for the largest square that is contained in the large area of the first item of Fig. 1 (where the process is static). The edge of this square is given in Table 9. Hence if ρ, τ ∈ (0, λ_0) then P_unhap = O(c^{−w} · P_stab) for some c > 1.
We have one last observation to make about the function g. If we do not restrict the values of τ ∈ (0, 0.5) then we wish to find the values of ρ such that g(τ, ρ) ≥ 0 for all such τ. According to the properties of g (in particular its negative derivative on ρ), these are all the positive numbers which are less than the limit (which is also an infimum) lim τ →0.5 Hence we may conclude that if ρ ≤ 0.25 and τ ∈ (0, 0.5) then P_unhap = O(c^{−w} · P_stab) for some c > 1. This concludes the proof of the second clause of Theorem 1.
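The simplified linear condition from the overview can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: it assumes the value κ_0 ≈ 0.353092313 quoted in the text and the two boundary points (τ, ρ) = (κ_0, 0.5) and (0.5, 0.25) mentioned in the discussion, and evaluates the margin 1 − (2ρ(1 − 2κ_0) + τ + κ_0), which should vanish at both points. The function names are illustrative assumptions.

```python
# Sketch only: the line 2*rho*(1 - 2*kappa_0) + tau + kappa_0 = 1 from the text.
KAPPA_0 = 0.353092313  # numerical value of kappa_0 quoted in the text


def linear_margin(tau, rho, kappa=KAPPA_0):
    """Return 1 - (2*rho*(1 - 2*kappa) + tau + kappa).

    A positive value means (tau, rho) satisfies the simplified linear condition.
    """
    return 1.0 - (2.0 * rho * (1.0 - 2.0 * kappa) + tau + kappa)


def in_linear_region(tau, rho, kappa=KAPPA_0):
    """True when (tau, rho) lies strictly on the stable side of the line."""
    return linear_margin(tau, rho, kappa) > 0.0


if __name__ == "__main__":
    # Both boundary points should give a margin of (numerically) zero.
    print(linear_margin(KAPPA_0, 0.5))
    print(linear_margin(0.5, 0.25))
    # A sample point well inside the region and one near (0.5, 0.5).
    print(in_linear_region(0.3, 0.3), in_linear_region(0.45, 0.45))
```

Algebraically, substituting ρ = 0.5 gives τ = κ_0 and substituting τ = 0.5 gives ρ = 0.25, so the line is the chord joining the two intersection points.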

Proofs for the Case τ ≤ 0.5 (Stable Intervals)
Let P_stab be the probability that an interval of length w in the initial configuration is α-stable. In order to express P_stab as a tail of the binomial distribution with w trials and probability of success 1 − ρ, let Z_stab ∼ B(w, 1 − ρ). Then a w-block is α-stable if and only if there are at least (2w + 1)τ successes. Similarly, let P_unhap be the probability that a node u in the initial configuration is α and unhappy. Note that if u is an α-node then it is unhappy if and only if there are more than (2w + 1)(1 − τ) nodes of type β in N(u). If w is sufficiently large, this is equivalent to having more than 2w(1 − τ) nodes of type β in N(u) − {u}. These are 2w Bernoulli trials with probability of success ρ, and any unhappy α-node has at least 2w(1 − τ) successes. In order to express P_unhap as a tail of the binomial distribution with 2w trials and probability of success ρ, let Z_unhap ∼ B(2w, ρ). Then P_stab = P[Z_stab ≥ (2w + 1)τ] and P_unhap = (1 − ρ) · P[Z_unhap > 2w(1 − τ)]. We are interested in the event where the ratio P_unhap/P_stab becomes small. Note that P_unhap, P_stab are functions of w. Hence, using the asymptotic notation, we may say that we are interested in finding conditions on τ, ρ such that P_unhap = o(P_stab), which means that P_unhap/P_stab → 0 as w → ∞. By Hoeffding's inequality for Bernoulli trials, if 2τ + ρ < 1 stable intervals are common. Hence if 2τ + ρ < 1 then P_unhap = o(P_stab). (16) Note that if τ < 1/4 we have 2τ + ρ < 1, so we have what we want (independently of ρ).
In other words, if τ ≤ 1/4 and ρ < 1/2 then P_unhap = o(P_stab). Similarly, if ρ < 1/3 and τ < 1/3 then P_unhap = o(P_stab). When 2τ + ρ > 1, we have P_stab → 0 and P_unhap → 0 so the previous argument does not apply. The analysis in this case requires some work, and gives better results than the ones we just discussed. We wish to derive conditions on τ, ρ under which we have P_unhap = o(P_stab). Our argument only requires that τ, ρ < 0.5 and τ + ρ < 1. However it is particularly essential for the case where (in addition) 2τ + ρ > 1, which is not covered by our previous discussion. We start by calculating suitable bounds for P_unhap, P_stab.
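To get a feel for the regime 2τ + ρ < 1, the two tail probabilities can be evaluated exactly for moderate w. The sketch below is an illustration under stated assumptions, not the paper's computation: it reads the definitions in the text as "an interval of w nodes is α-stable when it has at least (2w + 1)τ nodes of type α (each α with probability 1 − ρ)" and "a node is an unhappy α-node when it is α and more than 2w(1 − τ) of its 2w neighbours are β (each β with probability ρ)". The helper names are hypothetical.

```python
import math


def log_binom_pmf(n, k, p):
    """Natural log of P[Z = k] for Z ~ B(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def upper_tail(n, p, k_min):
    """P[Z >= k_min] for Z ~ B(n, p), summed term by term."""
    k_min = max(0, k_min)
    if k_min > n:
        return 0.0
    return math.fsum(math.exp(log_binom_pmf(n, k, p))
                     for k in range(k_min, n + 1))


def p_stab(w, tau, rho):
    """Assumed reading: at least (2w+1)*tau alpha-nodes among w nodes."""
    return upper_tail(w, 1.0 - rho, math.ceil((2 * w + 1) * tau))


def p_unhap(w, tau, rho):
    """Assumed reading: node is alpha and more than 2w(1-tau) neighbours are beta."""
    threshold = math.floor(2 * w * (1.0 - tau)) + 1  # strictly more than 2w(1-tau)
    return (1.0 - rho) * upper_tail(2 * w, rho, threshold)


if __name__ == "__main__":
    tau, rho = 0.2, 0.4  # satisfies 2*tau + rho < 1
    for w in (10, 20, 40, 80):
        print(w, p_stab(w, tau, rho), p_unhap(w, tau, rho) / p_stab(w, tau, rho))
```

With these sample parameters the stable-interval probability stays close to 1 while the ratio P_unhap/P_stab shrinks rapidly with w, which is the behaviour that (16) describes.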
Hence by Lemma 24 there exists a quadratic polynomial w → p(w) such that P_stab is bounded below by Since (1 + 1/(2w))^{2w} ≥ e^{−1} we get The next step is to establish a suitable upper bound for P_unhap. For this reason, and since ρ < 1 − τ, we may use Lemma 25 with h(2w) = 2w(1 − τ) and parameter k ∈ (τρ/((1 − ρ)(1 − τ)), 1) in order to get Then by Lemma 24 there exists a quadratic polynomial w → q(w) such that Hence, combining this inequality with (17), there exists a quadratic polynomial w → r(w) such that We are interested in conditions on τ, ρ which guarantee that P_unhap/P_stab → 0 as w → ∞. The latter condition is equivalent to the condition that the expression inside the square brackets in (18) is less than 1. So for P_unhap/P_stab → 0 it suffices that i.e. that the function (15) is positive.

The first illustration of Fig. 1 is a representation of g(τ, ρ) in the space, up to where it becomes negative, at which point we project it on the plane. The values of τ, ρ that we are interested in correspond to points on the plane, outside the collapsed area. This boundary is clearer in the second illustration of Fig. 1, which is the projection of the surface to the plane, with different colours indicating the points which make g positive or negative (along with three areas which we are going to discuss in the following). Note that both of the partial derivatives of g are negative when τ, ρ ∈ [0, 0.5). If we fix ρ = 0.5 then the largest value of τ that keeps g(τ, ρ) ≥ 0 is the solution of the equation This equation was previously considered in [1]. It has a unique solution in [0, 0.5] which is κ_0 ≈ 0.353092313. Hence (as indicated in the second illustration of Fig. 1) we may conclude that if τ < κ_0 and ρ ∈ (0, 0.5] then P_unhap = O(c^{−w} · P_stab) for some c > 1. Let us find the largest number λ_0 such that for all τ, ρ ≤ λ_0 we have g(τ, ρ) ≥ 0.
In other words, we are asking for the edge of the largest square that is contained in the large area of the second illustration of Fig. 1. According to the dynamics of g, this is the solution of the equation We have one last observation to make about the function g. If we do not restrict the values of τ ∈ (0, 0.5) then we wish to find the values of ρ such that g(τ, ρ) ≥ 0 for all such τ. According to the properties of g (in particular its negative derivative on ρ), these are all the positive numbers which are less than the limit lim τ →0.5 Hence we may conclude that if ρ ≤ 0.25 and τ ∈ (0, 0.5) then P_unhap = O(c^{−w} · P_stab) for some c > 1. (22) This area of (τ, ρ) is indicated in the second illustration of Fig. 1.
We wish to show that, for some c > 1, the condition P_unhap = O(c^{−w} · P_stab) implies that the process is static. Recall that the definition of static processes (see Definition 1) is given in terms of the number of swaps that occur throughout the process. Before we lay out the main argument, we note that this definition can be alternatively and equivalently given in terms of the number of nodes that swap throughout the process, or even in terms of the growth of the infected area throughout the process. In particular, we show that a process is static if and only if one of the following conditions holds:
- an arbitrarily small proportion of nodes are ever involved in swaps;
- the final length of the infected area is an arbitrarily small proportion of the population.
The number of swaps S_T that have occurred before we arrived at a certain state T tells us how late into the process we are by the time we hit that state. If M_T is the number of nodes that changed colour by the time we arrived at T then 4S_T/(2w + 1)^2 ≤ M_T ≤ 2S_T, which means that the variables S_T, M_T provide similar information about the history of the process before state T was reached (recall that a state can be reached at most once through the process). The second inequality is clear, while the first one requires justification. If the same bunch of, say, k nodes keep getting swapped with each other, then the welfare progress is concentrated in their neighbourhoods, which consist of at most (2w + 1)k nodes. Since each swap increases the social welfare by at least 4, and the total social welfare of the (2w + 1)k nodes cannot exceed (2w + 1)^2 k, it follows that a collection of k nodes can sustain at most (2w + 1)^2 k/4 swaps. Hence S_T swaps have to produce at least 4S_T/(2w + 1)^2 many moved nodes. So M_T ≥ 4S_T/(2w + 1)^2.
Another metric that provides similar information is the length of the infected area at that state. It can be shown that the two metrics agree, in the sense that knowing one can provide a bound for the other. We only need one direction of this 'equivalence', which we encapsulate in the following lemma.
Lemma 19 (Length of infected area and number of swaps) Suppose that the infected area at stage t of the process has length ℓ. Then the number of swaps that have occurred up to stage t is bounded above by (w + 1)ℓ.

Proof
In general, when a swap occurs (and τ ≤ 0.5) the β-welfare increases by at least 2 (one for the one which moved and one from the difference of the other β-nodes in the two neighbourhoods). Note that each swap (whether it is bogus or not) moves a β-node into a position inside the interior of the infected area. This is because at the end of each stage, only the interior of the infected area contains unhappy α-nodes. Moreover the infected area does not shrink. We wish to count the increase of the β-welfare inside the infected area at each swap. If a swap is bogus, then an α-node inside the interior of the infected area moves to a position which is on the boundary of the infected area. In this case we count only part of the decrease of the β-welfare in the new position of α. But since the total decrease is less than the increase (which occurs inside the new neighbourhood of the α-node, entirely included in the infected area) there is still a positive increase of at least 2 overall, of the total β-welfare inside the infected area. Hence we may conclude that each bogus swap strictly increases the β-welfare of the nodes in the infected area. On the other hand each non-bogus swap involves the swap between an α-node inside the interior of the infected area and a β-node outside the infected area. Again the difference in the β-welfare in the two neighbourhoods is at least 2, and we only have to count the overall increase of the β-welfare in the interior of the infected area (which is even larger than the overall increase of the β-welfare in the total population). It follows that every swap increases the β-welfare of the infected area by at least 2. Now consider the process (n, w, τ, ρ) up to some stage t. The β-welfare of the nodes in the infected area at stage t can be at most (2w + 1)ℓ, so there can be at most (2w + 1)ℓ/2, i.e. at most (w + 1)ℓ, swaps up to stage t.
Recall that the probabilities P unhap , P stab (as functions of w) are completely determined by the parameters τ, ρ. By the definition of α-stable intervals, the infected area cannot pass through them. Indeed, only the creation of unhappy α-nodes can spread the infected area, but no such event can happen through an α-stable interval. Based on this observation, we show Lemma 18, i.e. that the cases that we exhibited earlier in this section lead to static processes.

Proof of Lemma 18
By our previous discussion it suffices to show that, under the hypothesis of the lemma, there exists some d_* > 1 such that for every ε > 0 and 0 ≪ w ≪ n, with probability at least 1 − ε the process (n, w, τ, ρ) stops after at most n · d_*^{−w} many steps. Given a node u in a non-dormant state, let x_u be the first node v to the left of u which has one of the following properties: (a) it belongs to an α-stable interval; (b) there is an unhappy α-node in its neighbourhood. Moreover let y_u be the first node to the right of u which satisfies (a) or (b). Note that x_u, y_u are well defined and for each u it is not possible that x_u or y_u satisfy both (a), (b) (because an α-stable interval does not contain unhappy α-nodes). Given a state of the process, we say that a node u is exposed if at least one of x_u, y_u satisfies clause (b). Moreover, let us use the term exposed area for the collection of the exposed nodes.
Let us bound the probability that a node is exposed (in the initial configuration). By our hypothesis there exist constants c_1 > 0 and c_2 > 1 such that P_stab > c_1 · c_2^{δw} · P_unhap for all w. Then by Lemma 27, if n is sufficiently large, the probability that x_u belongs to a stable interval is at least

simple random walks. Here a random walk with respect to the integer-valued random variables (Z_i) is the stochastic process R_k = r + Σ_{i<k} Z_i, for some r ∈ N. We say that (R_i) is ruined at step k if k is the least number such that R_k ≤ 0. The following two facts are folklore.
Lemma 21 (Random walk simulation) Let t_0, t_1 ∈ N, let X_i ∈ {−t_0, 0, t_1} be (possibly dependent) random variables, let X̂_i ∈ {−t_0, 0, t_1} be independent Bernoulli trials and let Y_k = Σ_{i<k} X_i, Ŷ_k = Σ_{i<k} X̂_i be the associated random walks. Provided that, no matter what occurs at stages prior to i, at stage i we have , then for all k, x ∈ N the probability that (Y_i + x) is ruined by step k is bounded above by the probability that (Ŷ_i + x) is ruined by step k.
Lemma 22 (Biased random walks) Let t_0, t_1, r ∈ N, and let X_i ∈ {−t_0, 0, t_1} be (possibly dependent) random variables such that at stage i, no matter what has occurred at previous stages, we have P[X_i = t_1 | X_i ≠ 0] > t_0/(t_0 + t_1) + δ for some δ > 0. Let Y_j = r + Σ_{i<j} X_i be the associated random walk. Then the probability that (Y_j) is ever ruined is bounded above by e^{−2rδ^2/t_0}/(1 − e^{−2δ^2}).
Our analysis depends on various exponential bounds that we can obtain on the expectations of certain parameters (e.g. the number of unhappy α-nodes). The following fact will be routinely used in order to express such bounds in a canonical form. In the following statement the variables Z_s concern stage s of the Schelling process (n, w, τ, ρ) and the constants q, q′, p are independent of n, w.

Lemma 23 (Expectation bounds) Let f be a polynomial, p < 1 and Z_s a random variable such that E(Z_s) < np for all s. If E(Z_s) ≤ n · f(w) · e^{−wq} for some q > 0, all s and all sufficiently large w, then there exists q′ > 0 such that E(Z_s) ≤ n · e^{−wq′} for all w, s.
The binomial distribution with t trials and success probability p is denoted by B(t, p), and Z ∼ B(t, p) means that the random variable Z follows this distribution. Stirling's formula asserts that n! ≈ √(2π) · n^{n+1/2} · e^{−n}, i.e. that the ratio of the two expressions tends to 1 as n tends to infinity.
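The convergence in Stirling's formula is easy to observe numerically. The following is a small Python sketch (an illustration, not part of the paper) that uses the standard form n! ≈ √(2π) · n^{n+1/2} · e^{−n} and computes the ratio of the two sides in log space to avoid overflow. The function name is an assumption made for this example.

```python
import math


def stirling_ratio(n):
    """Return n! / (sqrt(2*pi) * n**(n + 1/2) * exp(-n)) for an integer n >= 1.

    Both sides are evaluated through logarithms (math.lgamma gives log(n!)).
    """
    log_factorial = math.lgamma(n + 1)
    log_stirling = 0.5 * math.log(2.0 * math.pi) + (n + 0.5) * math.log(n) - n
    return math.exp(log_factorial - log_stirling)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, stirling_ratio(n))
```

The ratio approaches 1 from above, with an error of order 1/n, which is why polynomial correction factors such as the p(y) of Lemma 24 are enough to turn the approximation into two-sided bounds.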
Lemma 24 (Stirling's approximation) There exists a polynomial y → p(y) such that for all k ∈ N and all x ∈ R ∩ (0, k)

In our analysis of the Schelling process for the case when τ ≤ 0.5 we will need to compare the tails of different binomial distributions. For this purpose we use the following fact from [6, Theorem 1.1].

Lemma 25 (Tails of the binomial distribution) Suppose that X_N ∼ B(N, p), p, k ∈ (0, 1) and for all sufficiently large N, for all sufficiently large N. In asymptotic notation we have P[

The combination of this result with Lemma 24 gives the required information about the asymptotic behaviour of the ratio of the two binomial probabilities of interest (unhappy nodes and stable intervals).

Probability in Schelling Segregation
We lay out a general way for arguing about the probability of the various properties P_u that a node u can have in the initial configuration. For example, P_u could be the property 'at least half of the nodes in the neighborhood of u are of the same type as u'.

Definition 9 (Rare and common events in the initial configuration) A property of a node in the initial configuration is called rare (or a rare event) if it holds with probability at most n · e^{−δw}, for some positive constant δ which may depend on τ, ρ but not on w, n. A property whose negation is rare is called common.
Definition 10 (Local properties) Given f : N → N, a property P = P u of a node u in the initial configuration is f -local if it only depends on the nodes that are at most f (w)-near from u. In other words, P is f -local if, for any two nodes u, v such that for all i ∈ [− f (w), f (w)], u + i is of the same type as v + i, we have that P u holds if and only if P v holds.
Note that the two probabilities mentioned in Lemma 26 are on different spaces. The first one refers to the product space where a point is an infinite series of initial states. The second one refers to the space of points on a random initial state.
Lemma 26 (Strong law of large numbers for the Schelling process) Given a local property P u of nodes in the initial state of the process (n, w, τ, ρ), with probability one, as n → ∞ the proportion of nodes u that satisfy P u tends to the probability of P u .
Proof Let p be the probability of P_u and let f be the function indicating the area around u on which P_u depends (as in Definition 10). We wish to use the strong law of large numbers, so we need to manufacture a series of independent trials of properties with given expectation. Let m ∈ N be a parameter that depends on n (to be specified shortly). We consider the ring as a union of intervals of length m·f(w) + 2f(w) (which we think of as an interval of length m·f(w) with padding of f(w) nodes on each side). We always assume that m·f(w) + 2f(w) < n. Starting from node 0, denote the ith such interval by V_i so that |V_i| = m·f(w) + 2f(w). Also, denote the subinterval of V_i that results from deleting the f(w)-node prefix and the f(w)-node suffix of V_i by I_i. Hence |I_i| = m·f(w). Let M_n ∈ N be the largest integer such that M_n(m·f(w) + 2f(w)) ≤ n, so that M_n → ∞ as n → ∞ and n − M_n(m·f(w) + 2f(w)) < m·f(w) + 2f(w). Hence for each i < M_n, the intervals V_i are defined and are disjoint. The same is true for I_i, i < M_n. Moreover, if S is the set of all nodes,

For each i < M_n let Y_i be the number of nodes u ∈ I_i such that P_u holds, and note that these random variables are independent. Moreover, by linearity of expectation, E(Y_i) = p·m·f(w).
Recall that M_n → ∞ as n → ∞. According to the strong law of large numbers, (Σ_{i<M_n} Y_i)/M_n → p·m·f(w) as n → ∞, with probability 1.
By (23), the required proportion is where δ, ζ range in [0, 1) (depending on how close n is to being a multiple of f(w)(m + 2)).
If we take 0 ≪ m ≪ n, the ratio m/M_n tends to 0, so by (24) the required proportion tends to Since 0 ≪ m, the required proportion tends to p. More formally, we may let m = log n. In this case, as n → ∞ we have m/M_n → 0 because (log n)^2/n tends to 0. Moreover M_n → ∞ and m → ∞ when n → ∞ so the previous argument applies as indicated.
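The statement of Lemma 26 can be illustrated by simulation. The sketch below is a hedged example, not the paper's argument: it takes the local property 'u is an α-node and unhappy', assuming (as in the later proofs) that a node is happy exactly when at least (2w + 1)τ of the 2w + 1 nodes in its neighbourhood, itself included, share its type, and that each node is β independently with probability ρ. It compares the empirical proportion on one large random ring with the exact probability. All function names are hypothetical.

```python
import math
import random


def random_ring(n, rho, rng):
    """Each node is of type beta (True) independently with probability rho."""
    return [rng.random() < rho for _ in range(n)]


def unhappy_alpha_fraction(ring, w, tau):
    """Proportion of nodes that are alpha and unhappy, via a sliding window."""
    n = len(ring)
    size = 2 * w + 1
    # Number of beta nodes in the window of 2w+1 nodes centred at node 0.
    beta_in_window = sum(ring[i % n] for i in range(-w, w + 1))
    count = 0
    for u in range(n):
        alpha_in_window = size - beta_in_window
        if not ring[u] and alpha_in_window < size * tau:
            count += 1
        # Slide the window one step to the right (the ring is cyclic).
        beta_in_window += ring[(u + w + 1) % n] - ring[(u - w) % n]
    return count / n


def exact_probability(w, tau, rho):
    """P[u is alpha] * P[1 + Bin(2w, 1 - rho) < (2w+1)*tau]."""
    p_alpha = 1.0 - rho
    tail = 0.0
    for k in range(2 * w + 1):
        if 1 + k < (2 * w + 1) * tau:
            tail += math.comb(2 * w, k) * p_alpha ** k * rho ** (2 * w - k)
    return p_alpha * tail


if __name__ == "__main__":
    rng = random.Random(0)
    w, tau, rho = 4, 0.6, 0.4
    for n in (1_000, 10_000, 100_000):
        ring = random_ring(n, rho, rng)
        print(n, unhappy_alpha_fraction(ring, w, tau), exact_probability(w, tau, rho))
```

The events at nearby nodes are dependent, which is exactly why the proof splits the ring into padded blocks; the simulation only shows the limiting behaviour, not the rate.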
The following fact concerns pairs of properties P and Q that a node can have, which may both be rare but one (say P) occurs with much higher probability than the other. It asserts that in this case, a random node u is much more likely to be nearer to a node v satisfying P than a node t satisfying Q (although it may be far from any node satisfying P or Q). In the statement and proof of this result we use P u as a Boolean random variable which asserts that 'u satisfies P' (and similar with Q u ).
Lemma 27 (Rare properties in the Schelling ring) Let P_u, Q_u be ℓ-local properties of nodes in the initial state (where ℓ = ℓ_w is a function of w) and for each node u let x_u be the first node v to the right of u such that either P_v or Q_v holds. If ρ, λ are the probabilities of P_u, Q_u respectively, the probability that P_{x_u} holds and there is no node v with Q_v to the left of and at distance at most ℓ from x_u tends to a number ≥ ρ/(ρ + λ(2ℓ + 1)) as n → ∞.

An analogous result holds when 'right' is replaced by 'left'.
Proof Consider a partition of the ring into disjoint neighbourhoods, starting from a node u_0 as follows. Given u = u_0, suppose inductively that u_t has been defined. Then define u_{t+1} = x_{u_t} + 2ℓ + 1. This iteration continues as long as u_{t+1} < n. Let k_n be the number of iterations in this recursive definition (i.e. the number of terms of the sequence (u_t)). Consider the property T_u: P_{x_u} holds and no node v to the left of x_u and at distance at most ℓ satisfies Q_v.
The sequence (u_i) can be seen as independent trials for this property. Let π_n be the proportion of the terms of (u_i) that satisfy T_{u_i} in a random initial state. Note that k_n → ∞ as n → ∞ with probability 1. If π is the probability of T_u, by the strong law of large numbers we have that π_n → π as n → ∞ with probability 1. Let ρ_n be the proportion of nodes that satisfy P_u and let λ_n be the proportion of nodes that satisfy Q_u. Note that we view π_n, ρ_n, λ_n as random variables that depend on the initial state. Then ρ_n/λ_n ≤ (2ℓ + 1)π_n k_n/((1 − π_n)k_n) = (2ℓ + 1)π_n/(1 − π_n), which implies π_n ≥ ρ_n/(ρ_n + λ_n(2ℓ + 1)). By Lemma 26 we have ρ_n → ρ and λ_n → λ as n → ∞, which gives the required asymptotic bound.
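The last implication in the proof is a short rearrangement. As a worked step (a sketch that writes ℓ for the locality radius and assumes λ_n > 0 and π_n < 1, so that both fractions are defined), it can be spelled out as follows.

```latex
\begin{align*}
\frac{\rho_n}{\lambda_n} \le \frac{(2\ell+1)\,\pi_n}{1-\pi_n}
  &\iff \rho_n\,(1-\pi_n) \le (2\ell+1)\,\pi_n\,\lambda_n \\
  &\iff \rho_n \le \pi_n\,\bigl(\rho_n + (2\ell+1)\,\lambda_n\bigr) \\
  &\iff \pi_n \ge \frac{\rho_n}{\rho_n + \lambda_n\,(2\ell+1)} .
\end{align*}
```

Letting n → ∞ and using ρ_n → ρ, λ_n → λ then gives the bound ρ/(ρ + λ(2ℓ + 1)) stated in Lemma 27.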

Deferred Proofs from Sect. 2.1: Welfare, Mixing, and Expectations
The social welfare V of the state can easily be seen to be non-decreasing along the transitions of the process. Let us establish the relationship with the mixing index. Given a certain state of the process and a node u, we let u^α denote the number of α-nodes that are located in the neighbourhood of u at this state. Similarly, we let u^β denote the number of β-nodes that are located in the neighbourhood of u. Given a state, let n_α, n_β be the number of α and β-nodes respectively, and let α_i, i < n_α and β_j, j < n_β be the finite sequences of α and β nodes respectively in the state. Hence α_j^β denotes the number of β-nodes that are located in the neighbourhood of α_j while β_j^α denotes the number of α-nodes that are located in the neighbourhood of β_j. Then Σ_{i<n_α} α_i^β = Σ_{j<n_β} β_j^α. (25) In order to prove this equality, consider the configuration of α and β types in the state and start by removing all β from their positions. Then, adding the β types one-by-one back to their original positions we can see each placement incurs the same increase to the two sums. Hence by induction, the two sums are equal. We call the number in (25) the mixing index of the state, because it can be used as a metric of how mixed (i.e. not segregated) the population of α and β types is at the given state. Indeed, suppose that the state has at least 2w + 1 nodes of each type. In the state of complete segregation the sums in (25) take the value 2 · (1 + · · · + w), which is w(w + 1). This can be shown to be the minimum mixing index (in a state which has at least 2w + 1 nodes of each type). At the other extreme, if the two types are uniformly mixed (in the sense that every interval I has approximately ρ* · |I| green nodes) then the sums in (25) take approximately the value n · 2w · ρ*(1 − ρ*), which can be shown to be the maximum possible mixing index. We also have From (25) and (26) we get V = (2w + 1) · n − 2 · mix.
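The identities of this paragraph can be checked directly on small rings. The sketch below is illustrative only and rests on one labelled assumption: the welfare of a node is taken to be the number of nodes of its own type in its neighbourhood of 2w + 1 nodes (itself included), which is the reading under which V = (2w + 1) · n − 2 · mix holds. Types are encoded as the strings 'a' and 'b', and the function names are hypothetical.

```python
import random


def neighbourhood(u, w, n):
    """Indices of the 2w+1 nodes centred at u on a ring of n >= 2w+1 nodes."""
    return [(u + d) % n for d in range(-w, w + 1)]


def mixing_sums(ring, w):
    """Return (sum over alpha-nodes of beta-neighbours,
               sum over beta-nodes of alpha-neighbours)."""
    n = len(ring)
    alpha_side = beta_side = 0
    for u in range(n):
        others = sum(1 for v in neighbourhood(u, w, n) if ring[v] != ring[u])
        if ring[u] == 'a':
            alpha_side += others
        else:
            beta_side += others
    return alpha_side, beta_side


def social_welfare(ring, w):
    """Assumed definition: total number of same-type nodes over all neighbourhoods."""
    n = len(ring)
    return sum(sum(1 for v in neighbourhood(u, w, n) if ring[v] == ring[u])
               for u in range(n))


if __name__ == "__main__":
    w = 3
    segregated = ['b'] * 20 + ['a'] * 30
    print(mixing_sums(segregated, w), w * (w + 1))
    rng = random.Random(1)
    mixed = [rng.choice('ab') for _ in range(60)]
    a_side, b_side = mixing_sums(mixed, w)
    print(a_side, b_side, social_welfare(mixed, w), (2 * w + 1) * 60 - 2 * a_side)
```

The equality of the two sums reflects the symmetry of the neighbourhood relation: every pair of nearby nodes of different types is counted once from each side.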
A similar argument shows that the difference in the sum of the mixing indices of nodes in N v − I is 2t − 3i − 4y. Hence overall (and since the nodes outside N u ∪ N v maintain the same mixing index before and after the swap) the difference in the (total) mixing index is 4(x − y). Since x < y this means that a decrease by at least 4 occurs due to the swap.
In our analysis, one of the basic facts used is that dormant states have at least a reasonably high mixing index. If we can show that with high probability the process reaches a point where the mixing index is too low for dormant states to be accessible, then by Corollary 5 we will have shown that with high probability complete segregation is the eventual outcome. Proposition 1 below provides an appropriate bound for the mixing index of dormant states. First we prove a technical lemma, which will then be used in the proof of Proposition 1. Given c ∈ N, we say that a node γ is c-near to a node δ if either there are at most c − 1 nodes between γ and δ or there are at most c − 1 nodes between δ and γ .
Proof Since the second claim implies the first, it suffices to prove the second claim. By Lemma 3 we can assume that there are unhappy β-nodes in the given state. For a contradiction, suppose that some β-node is not (1 − τ )w -near to any α-node. Consider the α-node which is adjacent to the block and to the right of it. For large w, 2 (1 − τ )w + 1 < w, meaning that this α-node has at least 2 (1 − τ )w + 1 nodes of type β in its neighbourhood. Hence the α-node has at most 2w − 2 (1 − τ )w nodes of type α in its neighbourhood, which is less than (2w + 1)τ . The fact that this α node is unhappy means that the state is not dormant.
Proof Suppose that in a dormant state the mixing index is at most n(w + 1)τρ*. Since there are nρ* nodes of type β, there exists such a node u with mixing index at most (w + 1)τ. By Lemma 29 there exists an α-node v within (1 − τ)w nodes to the left or to the right of u. The number of α-nodes in the neighbourhood of v is therefore at most (w + 1)τ + (1 − τ)w (because given c ∈ N, the mixing index can change by at most c when we shift inside an interval of β-nodes of length c). However this same number must be at least (2w + 1)τ since v is happy in a dormant state. This holding for arbitrarily large w would imply that (1 − τ) ≥ τ which gives the required contradiction.
As another measure of mixing, we may consider the number k_β of maximal contiguous β-blocks in the state. Let β_i be the ith node of type β and let β_i^α denote the number of α-nodes in the neighbourhood around β_i. Let [x, y] be a finite interval of integers such that {β_i : i ∈ [x, y]} constitutes a block (i.e. there is no α-node between β_x and β_y). If y − x ≥ w then β_x^α + · · · + β_y^α is bounded above by 2 · (1 + · · · + w) = w(w + 1). If y − x < w the number w(w + 1) continues to be a bound for β_x^α + · · · + β_y^α. Therefore Σ_{i<n_β} β_i^α ≤ w(w + 1) · k_β, where k_β is the number of maximal β-blocks.
This inequality is a formal expression of the rather obvious fact that the fewer maximal β-blocks there are, the less mixed the two types are. By the definition of happy nodes, if τ > 0.5 and w > (1 − τ)/(2τ − 1) then no two adjacent nodes of different types can both be happy. This means that, as we move around the circle of nodes, every time we cross the border between a maximal β-block and a maximal α-block we may count an additional unhappy node. So, provided that τ > 0.5 and w is sufficiently large, the number of maximal β-blocks is bounded above by the number of unhappy nodes in the state. Then by (27) we get mix ≤ w · (w + 1) · k_β ≤ w · (w + 1) · U. Intuitively this inequality says that the only way to have a small number of unhappy nodes is a small mixing index, i.e. a large degree of segregation. On the other hand we may bound the number of unhappy nodes in terms of the mixing index. By (25) and the definition of unhappy nodes, U_α · (1 − τ)(2w + 1) ≤ mix and U_β · (1 − τ)(2w + 1) ≤ mix, where U_α, U_β are the numbers of unhappy nodes of type α and β respectively. So mix ≤ w · (w + 1) · k_β ≤ w · (w + 1) · U ≤ mix · 2w(1 + 1/w)/((1 − τ)(2 + 1/w)) < mix · 2w/(1 − τ) and (1/w) · mix/(w + 1) ≤ k_β ≤ U < (2/(1 − τ)) · mix/(w + 1), which means that if τ > 0.5 then U = Θ(k_β) = Θ(mix).
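The chain of inequalities relating mix, k_β and U can be sanity-checked on a random ring. The following Python sketch is an illustration under stated assumptions: a node counts as happy when at least (2w + 1)τ of the 2w + 1 nodes in its neighbourhood (itself included) share its type, the ring contains both types, and τ > 0.5 with w > (1 − τ)/(2τ − 1). The encoding ('a'/'b') and the function names are hypothetical.

```python
import random


def same_type_count(ring, u, w):
    """Number of nodes of u's type among the 2w+1 nodes centred at u."""
    n = len(ring)
    return sum(1 for d in range(-w, w + 1) if ring[(u + d) % n] == ring[u])


def ring_statistics(ring, w, tau):
    """Return (mix, k_beta, U_alpha, U_beta) for a ring containing both types."""
    n = len(ring)
    size = 2 * w + 1
    mix = unhappy_alpha = unhappy_beta = 0
    for u in range(n):
        same = same_type_count(ring, u, w)
        if ring[u] == 'b':
            mix += size - same          # alpha-nodes seen from a beta-node
        if same < size * tau:           # unhappy node
            if ring[u] == 'a':
                unhappy_alpha += 1
            else:
                unhappy_beta += 1
    # A maximal beta-block starts wherever a beta-node follows an alpha-node.
    k_beta = sum(1 for u in range(n) if ring[u] == 'b' and ring[u - 1] == 'a')
    return mix, k_beta, unhappy_alpha, unhappy_beta


if __name__ == "__main__":
    rng = random.Random(1)
    w, tau, rho = 5, 0.6, 0.3
    ring = ['b' if rng.random() < rho else 'a' for _ in range(400)]
    mix, k_beta, u_a, u_b = ring_statistics(ring, w, tau)
    print(mix, k_beta, u_a + u_b)
```

On any such ring one should observe mix ≤ w(w + 1) · k_β, k_β ≤ U and U_α(1 − τ)(2w + 1) ≤ mix, matching the bounds in the text.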

Deferred Proofs About Initial Expectations (Sects. 2.1 and 3.4)
An important part of our analysis relies on the values of the welfare metrics at the initial state. By measure concentration inequalities, with high probability, these will be near to their expected values, which we may compute. We start with the mixing index.

Lemma 30
The expectation of the mixing index in the initial state of (n, w, τ, ρ) is 2nwρ(1− ρ).
Proof Consider the random variables β_i^α and note that E[β_i^α] = 2w(1 − ρ) for each i. If n_β is the number of β-nodes, the expectation of the mixing index in the initial state is n_β · 2w(1 − ρ) by the linearity of expectation. If we see n_β as a random variable, its expected value is nρ. By the rule of iterated expectation, the expected value of the mixing index is 2nwρ(1 − ρ).
Note that the expected value of the mixing index in the initial state is only slightly smaller than the maximum possible mixing index n · (2w + 1) · ρ * (1 − ρ * ). This is hardly surprising, as a random state will be almost perfectly mixed, with the occasional non-uniformities that are implied by randomness (e.g. the existence of contiguous blocks of certain sizes).
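Lemma 30 can be illustrated with a small Monte Carlo experiment. The sketch below is not the paper's proof, only a numerical check under the assumption that each node is β independently with probability ρ and that the mixing index is the sum, over β-nodes, of the number of α-nodes among their 2w neighbours. The function names, sample sizes and seed are arbitrary choices made for this example.

```python
import random


def mixing_index(ring, w):
    """Sum over beta-nodes (True) of the alpha-nodes (False) in their neighbourhood."""
    n = len(ring)
    total = 0
    for u in range(n):
        if ring[u]:
            total += sum(1 for d in range(-w, w + 1) if not ring[(u + d) % n])
    return total


def average_initial_mix(n, w, rho, trials, seed=0):
    """Average mixing index over independent random initial rings."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(trials):
        ring = [rng.random() < rho for _ in range(n)]
        acc += mixing_index(ring, w)
    return acc / trials


if __name__ == "__main__":
    n, w, rho = 1000, 4, 0.3
    print(average_initial_mix(n, w, rho, trials=40), 2 * n * w * rho * (1 - rho))
```

The empirical average should be close to 2nwρ(1 − ρ); the spread around it shrinks as the number of trials grows.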
Next, we are interested in the expected number of unhappy nodes of each type. It is not hard to see that this depends on whether τ + ρ < 1 or τ + ρ > 1 (we will not consider the special case where τ + ρ = 1).
Proof Let $X_j$ be 1 if the jth node $u_j$ in the initial state is of type α and unhappy, and 0 otherwise. By Lemma 26, it suffices to show that $E[X_j]$ is $e^{-\Theta(w)}$. Recall that the nodes are labelled independently, following a Bernoulli distribution, with the probability of a β-label being ρ. Let ε = 1 − ρ − τ, which is positive according to our hypothesis. If $u_j$ is an unhappy α-node, then the proportion of β-nodes in its neighbourhood $N(u_j)$ is larger than 1 − τ. Hence the proportion of β-nodes in $N(u_j) - \{u_j\}$ is larger than 1 − τ, so it is at least ρ + ε.
Similar arguments give information about the number of the unhappy β-nodes and the incubators.
Proof Let $Y_j$ be 1 if $u_j$ is of type β and happy, and 0 otherwise. Then, provided that ρ < τ, by Hoeffding's inequality for Bernoulli variables we have $E[Y_j] \le \rho \cdot e^{-4w(\tau-\rho)^2}$. Then Lemma 26 gives the first clause of the claim. Now let us assume that we also have τ + ρ < 1. Then by the second clause of Lemma 20 we get $E[Y_j] \ge \rho \cdot e^{-4w(\tau-\rho)^2/\rho}$. This application is possible with p = ρ and ε = τ − ρ because ρ < 0.5 and τ + ρ < 1, which means that ε < 1 − 2p. Then by Lemma 26 we get the second clause of the claim.
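The exponential factor in the upper bound is the standard Hoeffding estimate for 2w independent Bernoulli labels. As a hedged numerical illustration, the sketch below compares the exact binomial upper tail with the bound exp(−4w(τ − ρ)²). It treats the 2w other nodes of a neighbourhood as independent trials and ignores the node itself, which is a simplifying assumption of this sketch; the function name and the values of ρ and τ are illustrative.

```python
from fractions import Fraction
from math import ceil, comb, exp

def binomial_upper_tail(m, p, k):
    """Exact P[Bin(m, p) >= k] for a rational success probability p."""
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k, m + 1))

rho, tau = Fraction(3, 10), Fraction(3, 5)  # illustrative values with rho < tau
eps = tau - rho
rows = []
for w in (5, 10, 20, 40):
    m = 2 * w                      # number of independent labels considered
    k = ceil(tau * m)              # smallest count with proportion at least tau
    tail = float(binomial_upper_tail(m, rho, k))
    bound = exp(-4 * w * float(eps) ** 2)  # Hoeffding: exp(-2 m eps^2), m = 2w
    rows.append((w, tail, bound))
    print(w, tail, bound)
```

In every row the exact tail should not exceed the Hoeffding bound, and both columns decay as w grows.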

Lemma 33 (Number of incubators)
If τ + ρ < 1, the probability that a node belongs to an incubator is $e^{-\Theta(w)}$. Hence with high probability the number of incubators, as well as the number of nodes belonging to incubators, of the process (n, w, τ, ρ) is $n e^{-\Theta(w)}$.
Proof Let $\epsilon_* = (1-\tau-\rho)/2$, and let $X_j$ be the indicator variable of the event that the left semi-neighbourhood of the jth node has fewer than $(\tau+\epsilon_*)w$ α-nodes. Given that τ + ρ < 1, by Hoeffding's inequality for Bernoulli variables and its tightness (Lemma 20), the probability that $X_j = 1$ is $e^{-\Theta(w)}$. Let $Y_j$ be the indicator variable of the event that the jth node belongs to an incubator, so that the probability that $Y_j = 1$ is $e^{-\Theta(w)}$ (since $(2w+1)e^{-\Theta(w)}$ is $e^{-\Theta(w)}$). Hence $E[Y_j]$ is $e^{-\Theta(w)}$. Then by Lemma 26, with high probability the number of nodes belonging to incubators of the process (n, w, τ, ρ) is $n e^{-\Theta(w)}$.

Deferred Proofs from Sect. 3.1: Persistent Blocks and Unhappy Nodes
Now we focus on the case where τ > 0.5 and τ + ρ < 1. Having established that a low number of unhappy nodes suffices to ensure dormant states are inaccessible, we now wish to show that such a state is reached before any dormant state is reached. Since in this case there are always unhappy β-nodes, we are only concerned about the existence of unhappy α-nodes. One way to ensure this is to establish the existence of blocks of β-nodes of length > w.
Lemma 34 (Persistent β-block) Consider the process (n, w, τ, ρ) with τ > 0.5 and let $s_*$ be the least stage where the ratio between the very unhappy β-nodes and the unhappy α-nodes becomes less than $4w^2$ (putting $s_* = \infty$ if no such stage exists). Then with high probability there is a β-block of length ≥ 2w at all stages < $s_*$ of the process.
Proof Let ε > 0, let δ = 2w/(2w + 1) − w/(w + 1), and let y be a sufficiently large integer so that $e^{-2(y-2w)\delta^2/w}/(1-e^{-2\delta^2}) < \epsilon/2$.

$\beta_i$, i < k, have disjoint neighbourhoods, each of them containing at least y(2w + 1) nodes of type α. Hence ky(2w + 1) ≤ x, so k(2w + 1) ≤ x/y, which means that the number of nodes that are contained in the union of these neighbourhoods is bounded by x/y. Since these are neighbourhoods of β-nodes that are not weak, the number of β-nodes that are contained in the union of these neighbourhoods is at most (x/y)(1 − y) = x/y − x. Let $x_i$ be the distance between the right endpoint of the neighbourhood of $\beta_i$ and the left endpoint of the neighbourhood of $\beta_{i+1}$. Note that for each i < k there is a block of at least $x_i$ nodes of type α in the left semi-neighbourhood of $\beta_{i+1}$. Indeed, according to the definition of $(\beta_i)$, the only reason why there is some distance d between the two endpoints is that there is a block of α-nodes of length d immediately to the left of $\beta_{i+1}$. We may conclude that there are at least $\sum_i x_i$ nodes of type α. By hypothesis there are exactly x nodes of type α, so $\sum_i x_i \le x$. Hence the number of β-nodes in B that do not belong to the neighbourhood of some $\beta_i$, i < k, is at most x + 2w (where 2w is an upper bound for the number of β-nodes in the final segment of B to the right of the neighbourhood of $\beta_{k-1}$, or the whole of B if k = 0). Hence, overall, there are at most (x/y − x) + (x + 2w) = x/y + 2w nodes of type β in B, which concludes the proof.
A β-node is unhappy if and only if the proportion of α-nodes in its neighbourhood is more than 1 − τ. Hence applying Lemma 35 with y equal to a value that is slightly larger than 1 − τ (taking the limit y → 1 − τ from above, and taking into account that the numbers of nodes are integers) gives the following bound on the number of unhappy β-nodes in a block.
Corollary 12 (Unhappy β-nodes versus α-nodes) Consider a block of adjacent nodes of type α or β such that exactly x of these nodes are of type α. Then there are at most x/(1 − τ ) + 2w unhappy β-nodes in this block.
By applying this fact to each of the infected segments of the process, and adding up the numbers of unhappy nodes in each of the segments, we see that $Y_s \le Z_s/(1-\tau) + 2wC$, which is the fact used in the main part of our analysis.
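The summation step can be spelled out as follows. This is only a sketch under our reading of the notation: we assume that at stage s there are C infected segments, that $Y_s$ counts the unhappy β-nodes lying in them and that $Z_s$ counts the α-nodes lying in them. The symbols $S_c$, $x_c$ and $y_c$ are introduced here purely for illustration and do not appear in the main text.

```latex
% Assumed notation: S_1, ..., S_C are the infected segments at stage s,
% x_c is the number of alpha-nodes in S_c,
% y_c is the number of unhappy beta-nodes in S_c.
\begin{align*}
  y_c &\le \frac{x_c}{1-\tau} + 2w
      && \text{(Corollary 12 applied to } S_c\text{)} \\
  Y_s = \sum_{c=1}^{C} y_c
      &\le \sum_{c=1}^{C} \left( \frac{x_c}{1-\tau} + 2w \right)
       = \frac{1}{1-\tau} \sum_{c=1}^{C} x_c + 2wC
       = \frac{Z_s}{1-\tau} + 2wC .
\end{align*}
```

The additive term 2wC arises because the boundary allowance of 2w in Corollary 12 is paid once per segment.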