1 Introduction

Tournaments play an important role in numerous situations as a means of representing entities and a dominance relationship between them. For instance, both the outcome of a round-robin sports competition and the majority relation of voters in an election can be represented by a tournament. A question that occurs frequently is therefore the following: given a tournament, how can we choose the “best” alternatives in a consistent manner? This question has been addressed by a rich and beautiful literature on tournament solutions, which have found applications in areas ranging from sports competitions (Ushakov 1976) to multi-criteria decision analysis (Arrow and Raynaud 1986; Bouyssou 2004) to biology (Schjelderup-Ebbe 1922; Landau 1953; Slater 1961; Allesina and Levine 2011). Over the past half century several tournament solutions have been proposed, two of the oldest and best-known of which are the top cycle (Good 1971; Schwartz 1972; Miller 1977) and the uncovered set (Miller 1980).Footnote 1

Given that the purpose of tournament solutions is to discriminate the “best” alternatives from the remaining ones, it perhaps comes as a surprise that many common tournament solutions—including the top cycle, the uncovered set, the Banks set, and the minimal covering set—select all alternatives with high probability in a large random tournament (Fey 2008; Scott and Fey 2012). Put differently, the aforementioned tournament solutions almost never exclude any alternative in a tournament chosen at random. Nevertheless, these results are based on the uniform random model, in which all tournaments are drawn with equal probability, or equivalently each edge is oriented in one direction or the other with equal probability independently of other edges. For a large majority of applications of tournaments, one would not expect that this assumption holds. Indeed, stronger teams are likely to beat weaker teams in a sports competition, and candidates with a large base of support have a higher chance of winning an election. Moreover, real-world tournaments often exhibit a certain degree of transitivity: If alternatives a, b, and c are such that a dominates b and b dominates c, then it is more likely that a dominates c than the other way around.

A more general model of random tournaments is the Condorcet random model, previously considered by Frank (1968), Łuczak et al. (1996), Vassilevska Williams (2010) and Kim et al. (2017). In this model, there is a linear order of alternatives, which can be interpreted as an ordering of the alternatives from strongest to weakest. For each pair of alternatives, the probability that the edge is oriented from the alternative that occurs later in the linear order to the alternative that occurs earlier in the linear order is p, independently of other pairs of alternatives.Footnote 2 Crucially, the value of p is the same for all pairs of alternatives. The Condorcet random model generalizes the uniform random model, since the latter can be obtained from the former by taking \(p=1/2\). Łuczak et al. (1996) showed that under the Condorcet random model, the top cycle selects all alternatives as long as \(p\in \omega (1/n)\). The same authors show furthermore that this bound is tight, that is, the statement no longer holds if \(p\in O(1/n)\).Footnote 3

Although the Condorcet random model addresses the issues raised above with regard to the uniform random model, it is still rather unrealistic for two important reasons. Firstly, in tournaments in the real world, the orientation of different edges are typically determined by different probabilities. For instance, in a sports tournament the probability that a very strong team beats a very weak team is usually higher than the probability that a moderately strong team beats a moderately weak team; a similar phenomenon can be observed in elections. Secondly, even though one can roughly order the alternatives in a tournament according to their strength, it is often the case that not all probabilities of the orientation of the edges respect the ordering. Indeed, this precisely corresponds to the notion of “bogey teams”—weak teams that nevertheless frequently beat certain supposedly stronger teams. Given the limitations of the uniform random model and the Condorcet random model, it is natural to ask whether previous results continue to hold under more general and realistic models of random tournaments, or whether they break down as soon as we move beyond these restricted models.

In this paper, we show that a number of tournament solutions, including the top cycle and the uncovered set, still choose all alternatives with high probability under a significantly more general model of random tournaments. Unlike the Condorcet random model, our model does not rely on an ordering of the alternatives. Instead, the orientation of each edge is determined by probabilities within the range \([p,1-p]\) for some parameter p, and these probabilities are allowed to vary across edges. The only substantive assumption that we make is that the orientations of different edges are chosen independently from one another. Under this model, which is more general than both the uniform random model and the Condorcet random model, we establish in Sect. 3 that the top cycle almost never rules out any alternative as long as \(p\in \omega (1/n)\), thus generalizing the result by Łuczak et al. (1996). We also show that our bound is asymptotically tight, and that analogous results hold for two other tournament solutions based on the set of Condorcet winners and losers as well. Moreover, we prove in Sect. 4 that the uncovered set is likely to include the whole set of alternatives when \(p\in \omega (\sqrt{\log n/n})\). This bound is again asymptotically tight, and the same holds for another tournament solution based on the uncovered set. Since the condition that the top cycle or the uncovered set chooses all alternatives have meaningful graph-theoretic interpretations—the top cycle is the whole set of alternatives if and only if the tournament is strongly connected,Footnote 4 and the uncovered set fails to exclude any alternative exactly when all alternatives are kingsFootnote 5—we believe that our results are of independent interest in graph theory and discrete mathematics. Finally, we complement our theoretical results with experimental data in Sect. 5.

1.1 Related work

The study of the behavior of tournament solutions in large random tournaments goes back to Moon and Moser (1962), who showed that the top cycle almost never rules out any alternative in a large tournament chosen uniformly at random. In fact, they proved a stronger statement that the probability that the top cycle excludes at least one alternative is inverse exponential in the number of alternatives; the estimate was later made more precise by Moon (1968) in his seminal book on tournaments. Bell (1981) also considered the top cycle but assumed that tournaments are generated from the preferences of a large number of voters, each with a uniform random ranking over the alternatives; he likewise found that the top cycle selects all alternatives with high probability under this assumption. Fey (2008) and later Scott and Fey (2012) established results on several tournament solutions including the uncovered set, the Banks set, the Copeland set, the minimal covering set, and the bipartisan set using the uniform random model. While the uncovered set, the Banks set, and the minimal covering set are likely to include all alternatives in a large random tournament, the same event is unlikely to occur for the Copeland set. On the other hand, the bipartisan set chooses on average half of the alternatives in a random tournament of any fixed size (Fisher and Ryan 1995); it is the unique most discriminating tournament solution satisfying standard properties proposed in the literature (Brandt et al. 2018).

The discriminative power of tournament solutions has also been investigated empirically by Brandt and Seedig (2016). Building on the observation that the distributions of real-world tournaments are typically far from uniform, these authors examined the behavior of eleven common tournament solutions on tournaments generated according to stochastic preference models and empirical data. The stochastic models that they used include the impartial culture model, the Mallows mixtures model, and the Pólya–Eggenberger urn model. They reported that under these more realistic models, most tournament solutions are in fact much more discriminating than the analytical results for uniform random tournaments suggest.

2 Preliminaries

A tournament T consists of a set \(A=\{a_1,a_2,\ldots ,a_n\}\) of alternatives and a dominance relation. The dominance relation is an asymmetric and connex binary relation on A represented by a directed edge between each unordered pair of distinct alternatives in A. We say that alternative \(a_i\)dominates another alternative \(a_j\) if there is an edge from \(a_i\) to \(a_j\). An alternative is said to be a Condorcet winner if it dominates all of the remaining alternatives, and a Condorcet loser if it is dominated by all of the remaining alternatives. We extend the dominance relation to sets and say that a set \(A'\subseteq A\) of alternatives dominates another set \(A''\subseteq A\) of alternatives disjoint from \(A'\) if for all \(a'\in A'\) and \(a''\in A''\), \(a'\) dominates \(a''\). A tournament is commonly interpreted as the outcome of a round-robin sports competition and as the majority relation of an odd number of voters with linear preferences. In the former interpretation, alternative \(a_i\) dominating alternative \(a_j\) means that the player or team represented by \(a_i\) beats the player or team represented by \(a_j\) in the competition. In the latter interpretation, the same dominance relation signifies that more than half of the voters prefer \(a_i\) to \(a_j\).

We are interested in tournament solutions, which are functions that map each tournament to a nonempty subset of its alternatives, usually referred to as the choice set. Two simple tournament solutions are \( COND \), which chooses a Condorcet winner if one exists and chooses all alternatives otherwise,Footnote 6 and the set of Condorcet non-losers (\( CNL \)), which consists of all alternatives that are not Condorcet losers. Other tournament solutions considered in this paper are the following:

  • The top cycle (\( TC \)) is the (unique) smallest set of alternatives such that all alternatives in the set dominate all alternatives not in the set;

  • The uncovered set (\( UC \)) consists of all alternatives that can reach all other alternatives via a domination path of length at most twoFootnote 7;

  • The iterated uncovered set (\( UC ^\infty \)) is the result of iteratively computing the uncovered set until there is no further reduction.

The inclusions \( UC ^\infty (T)\subseteq UC (T)\subseteq TC (T)\subseteq CNL (T)\) and \( TC (T)\subseteq COND (T)\) hold for any tournament T.

Next, we describe the random model for generating tournaments that we consider in this paper, which we call the general random model. In this model, for each pair of distinct alternatives \(a_i,a_j\), there is an edge from \(a_i\) to \(a_j\) with probability \(p_{i,j}\) and an edge from \(a_j\) to \(a_i\) with probability \(p_{j,i}=1-p_{i,j}\), independently of other pairs of alternatives. Several models for generating random tournaments considered in previous work are special cases of our model. For example, the uniform random model (Fey 2008; Scott and Fey 2012) corresponds to taking \(p_{i,j}=1/2\) for all ij in the general random model. The Condorcet random model (Frank 1968; Łuczak et al. 1996; Vassilevska Williams 2010; Kim et al. 2017) corresponds to taking \(p_{i,j}=p\) for all \(i<j\) in the general random model, for some fixed value of p. Following standard terminology, we say that an event occurs “with high probability” or “almost surely” if the probability that the event occurs converges to 1 as n, the number of alternatives, goes to infinity.

We end this section by listing some standard tools for deriving probabilistic bounds. Our first lemma is the Chernoff bound, which gives us an upper bound on the probability that a sum of independent random variables is far away from its expected value.

Lemma 1

(Chernoff bound). Let \(X_1,X_2, \dots , X_r\) be independent random variables that take on values in the interval [0, 1], and let \(S=X_1 + X_2 + \cdots + X_r\). For every \(\delta \ge 0\), we have

$$\begin{aligned} \Pr [S \ge (1 + \delta )\mathbb {E}[S]] \le \exp {\left( \frac{-\delta ^2\mathbb {E}[S]}{3}\right) }. \end{aligned}$$

The next two lemmas allow us to estimate the expression \(1+x\) from above and below.

Lemma 2

(Bernoulli’s inequality). For all real numbers \(r\ge 1\) and \(x\ge -1\), we have

$$\begin{aligned} (1+x)^r \ge 1+rx. \end{aligned}$$

Lemma 3

For all real numbers x, we have \(1+x\le e^x\).

3 Top cycle

In this section, we consider the top cycle. We show that when each probability \(p_{i,j}\) is between f(n) and \(1-f(n)\) for some function \(f(n)\in \omega \left( 1/n\right) \), \( TC \) chooses all alternatives with high probability (Theorem 1). By using the inclusion relationships between \( TC \), \( COND \), and \( CNL \), we obtain analogous statements for \( COND \) and \( CNL \). We also show that our results are asymptotically tight—for all three tournament solutions, the statement ceases to hold if \(f(n)\in O\left( 1/n\right) \) (Theorem 2).

We begin with our main result of the section.

Theorem 1

Let \(f:\mathbb {Z}^+\rightarrow \mathbb {R}_{\ge 0}\) be a function such that \(f(n)\le 1/2\) for all n and \(f(n)\in \omega \left( 1/n\right) \). Assume that a tournament T is generated according to the general random model, and that

$$\begin{aligned} p_{i,j}\in \left[ f(n),1-f(n)\right] \end{aligned}$$

for all \(i\ne j\). Then with high probability, \( TC (T)=A\).

Theorem 1 generalizes a result by Łuczak et al. (1996) that establishes the claim for the case where \(p_{i,j}=f(n)\) for all \(i<j\) (or, by symmetry, the case where \(p_{i,j}=1-f(n)\) for all \(i<j\)). We remark that their proof relies crucially on the assumption that there is a linear order of alternatives and all edges are more likely to be oriented in one direction than in the other direction according to the order. Indeed, this assumption allows the authors to show that with high probability, any alternative can be reached by the strongest alternative and can reach the weakest alternative via a domination path of length at most two each. Moreover, with the assumption \(f(n)\in \omega \left( 1/n\right) \) one can show that the weakest alternative can almost surely reach the strongest alternative via a domination path of length four, thus establishing the strong connectivity of the tournament. In contrast, we do not assume that the edges in the tournament are likely to be oriented in one direction or the other. As such, we will need a completely different approach for our proof.

Before we go into the proof of Theorem 1, we first give a high-level overview. We observe that \( TC (T)\ne A\) exactly when there exists a proper, nontrivial subset of alternatives B that dominates the complement set of alternatives \(A\backslash B\). Using the union bound, we then upper bound the probability that \( TC (T)\ne A\) by the sum over all sets B of the probabilities that B dominates \(A\backslash B\); this sum can be written entirely in terms of the variables \(p_{i,j}\) for \(i<j\). From here, a relatively straightforward calculation yields the desired result for \(f(n)\in \Omega \left( \log n/n\right) \). However, substantially more work is required to shave off the logarithmic factor and tighten the bound. To this end, we note that the aforementioned sum is linear in all of the variables \(p_{i,j}\), implying that its maximum is attained when all variables take on a value at one of the two boundaries of their domain. Using a number of helper lemmas (Lemmas 4, 5, and 6), which include an interesting combinatorial extension of Karamata’s inequality (Lemma 5), we show that the sum is in fact maximized when all variables take on a value at the same boundary (i.e., when the tournament is generated by the Condorcet random model). This allows us to bound the sum directly by plugging in the value at a boundary and complete the proof.

In what follows, we assume that \(\mathbf x =(x_1,x_2,\dots ,x_n)\) and \(\mathbf y =(y_1,y_2,\dots ,y_n)\) are vectors of nonnegative integers with n components. We start by defining majorization, a preorder on vectors that we will use frequently in our proof.

Definition 1

(Majorization). For a vector \(\mathbf x \), let \(\mathbf x ^\downarrow =(x_1^\downarrow ,x_2^\downarrow ,\dots ,x_n^\downarrow )\) be the vector with the same components, but sorted in descending order. Given two vectors \(\mathbf x ,\mathbf y \), we say that \(\mathbf x \)majorizes\(\mathbf y \), and write \(\mathbf x \succ \mathbf y \), if the following two conditions are satisfied:

  1. (i)

    \(\sum _{i=1}^j x_i^\downarrow \ge \sum _{i=1}^j y_i^\downarrow \) for \(j=1,2,\dots ,n-1\);

  2. (ii)

    \(\sum _{i=1}^n x_i=\sum _{i=1}^n y_i\).

When one vector majorizes another vector, Karamata’s inequality allows us to compare the sum of an arbitrary convex function at the components of one vector to the corresponding sum of the other vector.

Lemma 4

(Karamata’s inequality). Let \(f:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {R}\) be a convex function, and let \(\mathbf x ,\mathbf y \) be vectors with n components such that \(\mathbf x \succ \mathbf y \). Then

$$\begin{aligned} \sum _{i=1}^n f(x_i) \ge \sum _{i=1}^n f(y_i). \end{aligned}$$

We next show that if one vector majorizes another vector, then an analogous statement holds for the two vectors that arise from taking the sum of all subsets with any fixed number of components of the original vectors.

Definition 2

Let n be a positive integer and \(k\in \{1,2,\dots ,n\}\). For a vector \(\mathbf x \) with n components, define \(\mathbf x ^{(k)}\) to be the vector with \(\left( {\begin{array}{c}n\\ k\end{array}}\right) \) components consisting of all sums of k distinct components of \(\mathbf x \) in nonincreasing order.

For example, if \(n=4\) and \(\mathbf x =(2,4,5,7)\), then \(\mathbf x ^{(2)}=(12,11,9,9,7,6)\).

Lemma 5

If two vectors \(\mathbf x ,\mathbf y \) with n components are such that \(\mathbf x \succ \mathbf y \), then we also have \(\mathbf x ^{(k)}\succ \mathbf y ^{(k)}\) for all \(k=1,2,\dots ,n\).

For the sake of continuity, we leave the proof of Lemma 5 along with that of the next lemma to the appendix.

Our final lemma shows that the outdegree vector of a transitive tournament majorizes the corresponding vector of any tournament. Given a tournament T and alternative a in the tournament, denote by \(\deg _T(a)\) the outdegree of a in T.

Lemma 6

Let W be a transitive tournament on alternatives \(d_1,d_2,\dots ,d_n\), where \(d_i\) dominates \(d_j\) for all \(i<j\). For any tournament U with alternatives \(b_1,b_2,\dots ,b_n\), we have

$$\begin{aligned} (\deg _W(d_1),\deg _W(d_2),\dots ,\deg _W(d_n))\succ (\deg _U(b_1),\deg _U(b_2),\dots ,\deg _U(b_n)). \end{aligned}$$

With Lemmas 4, 5, and 6 in hand, we are now ready to prove Theorem 1.

Proof of Theorem 1

Fix an arbitrary \(\varepsilon > 0\), and choose a constant \(c\ge 10\) such that \(16ce^{-\frac{c}{2}} < \varepsilon \). Since \(f(n)=\omega \left( 1/n\right) \), there exists \(N'\) such that \(f(n)\ge c/n\) for all \(n\ge N'\). Let \(N=\max (N',4c)\). We will show that for \(n\ge N\), the probability that \( TC \) does not choose the whole set of alternatives is at most \(16ce^{-\frac{c}{2}} < \varepsilon \). This suffices to establish the desired result.

Assume that \(n\ge N\). Observe that \( TC (T)\ne A\) exactly when there is a proper, nontrivial set of alternatives that dominate the complement set of alternatives. Hence

$$\begin{aligned} \Pr [ TC (T)\ne A]&= \Pr [B\text { dominates } A\backslash B \text { for some } \emptyset \ne B\subset A] \nonumber \\&\le \sum _{k=1}^{n-1}\sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \Pr [B\text { dominates } A\backslash B] \nonumber \\&= \sum _{k=1}^{n-1}\sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}}\prod _{\begin{array}{c} 1\le i\ne j\le n \\ a_i\in B \\ a_j\in A\backslash B \end{array}} p_{i,j} \nonumber \\&= \sum _{k=1}^{n-1}\sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}}\left( \prod _{\begin{array}{c} 1\le i<j\le n \\ a_i\in B \\ a_j\in A\backslash B \end{array}} p_{i,j}\prod _{\begin{array}{c} 1\le i<j\le n \\ a_i\in A\backslash B \\ a_j\in B \end{array}} \left( 1-p_{i,j}\right) \right) , \end{aligned}$$
(1)

where we use the union bound for the inequality. We will derive an upper bound for expression (1). Note that if we view the terms \(p_{i,j}\) with \(i<j\) as variables, then the expression in linear in each variable. This implies that the maximum of expression (1) over the range \(p_{i,j}\in \left[ c/n,1-c/n\right] \) is attained when each \(p_{i,j}\) is either c / n or \(1-c/n\) (but not necessarily when all \(p_{i,j}\) are identical). We henceforth assume that for each \(i<j\), either \(p_{i,j}=c/n\) or \(p_{i,j}=1-c/n\). We will show that expression (1) is maximized when \(p_{i,j}=1-c/n\) for all \(i<j\) (or alternatively, when \(p_{i,j}=c/n\) for all \(i<j\)). In fact, we will show the stronger statement that for each particular value of k in the outermost summation, the expression inside the outermost summation is also maximized when \(p_{i,j}=1-c/n\) for all \(i<j\).

Fix \(k\in \{1,2,\dots ,n-1\}\). Define a tournament U on n alternatives \(b_1,b_2,\dots ,b_n\) as follows: For any \(i,j\in \{1,2,\dots ,n\}\) with \(i<j\), there is an edge from \(b_i\) to \(b_j\) if \(p_{i,j}=1-c/n\) and an edge from \(b_j\) to \(b_i\) if \(p_{i,j}=c/n\). We have

$$\begin{aligned}&\sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}}\left( \prod _{\begin{array}{c} 1\le i<j\le n \\ a_i\in B \\ a_j\in A\backslash B \end{array}} p_{i,j}\prod _{\begin{array}{c} 1\le i<j\le n \\ a_i\in A\backslash B \\ a_j\in B \end{array}} \left( 1-p_{i,j}\right) \right) \nonumber \\&\quad = \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}}\prod _{\begin{array}{c} 1\le i\ne j\le n \\ a_i\in B \\ a_j\in A\backslash B \end{array}} p_{i,j} \nonumber \\&\quad = \left( 1-\frac{c}{n}\right) ^{k(n-k)} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{c}{n-c}\right) ^{|\{(a_i,a_j)\in B\times A\backslash B \text { such that } (b_i,b_j)\text { is not an edge in } U\}|} \nonumber \\&\quad = \left( 1-\frac{c}{n}\right) ^{k(n-k)} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{c}{n-c}\right) ^{(n-1)k-\left( {\begin{array}{c}k\\ 2\end{array}}\right) -\sum _{i:a_i\in B}\deg _U(b_i)} \nonumber \\&\quad = \left( 1-\frac{c}{n}\right) ^{k(n-k)}\left( \frac{c}{n-c}\right) ^{(n-1)k-\left( {\begin{array}{c}k\\ 2\end{array}}\right) }\sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{n-c}{c}\right) ^{\sum _{i:a_i\in B}\deg _U(b_i)}. \end{aligned}$$
(2)

Let W be a transitive tournament on n alternatives \(d_1,d_2,\dots ,d_n\), where \(d_i\) dominates \(d_j\) for all \(i<j\). In particular, \(\deg _W(d_i)=n-i\) for all \(i=1,2,\dots ,n\). To show that expression (1) is maximized when \(p_{i,j}=1-c/n\) for all \(i<j\), it suffices to show that expression (2) is maximized when \(U=W\). The terms outside the summation do not depend on the tournament U that we choose, so for the purpose of maximizing expression (2) we may ignore them.

From Lemma 6, we know that

$$\begin{aligned} (\deg _W(d_1),\deg _W(d_2),\dots ,\deg _W(d_n))\succ (\deg _U(b_1),\deg _U(b_2),\dots ,\deg _U(b_n)). \end{aligned}$$

Lemma 5 then implies that

$$\begin{aligned} (\deg _W(d_1),\deg _W(d_2),\dots ,\deg _W(d_n))^k\succ (\deg _U(b_1),\deg _U(b_2),\dots ,\deg _U(b_n))^k. \end{aligned}$$

Using Lemma 4 with the convex function \(f(x)=\left( \frac{n-c}{c}\right) ^x\), we find that

$$\begin{aligned} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{n-c}{c}\right) ^{\sum _{i:a_i\in B}\deg _U(b_i)} \le \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{n-c}{c}\right) ^{\sum _{i:a_i\in B}\deg _W(d_i)}. \end{aligned}$$

It follows that expression (2) is maximized when \(U=W\), as claimed.

We return to expression (1), which we now know is maximized when \(p_{i,j}=1-c/n\) for all \(i<j\). Substituting \(p_{i,j}=1-c/n\) for all \(i<j\), expression (1) becomes

$$\begin{aligned}&\sum _{k=1}^{n-1}\left( \left( 1-\frac{c}{n}\right) ^{k(n-k)} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{c}{n-c}\right) ^{(n-1)k-\left( {\begin{array}{c}k\\ 2\end{array}}\right) -\sum _{i:a_i\in B}\deg _W(d_i)}\right) \\&\quad \le 2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( e^{-\frac{ck(n-k)}{n}} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{2c}{n}\right) ^{(n-1)k-\left( {\begin{array}{c}k\\ 2\end{array}}\right) -\sum _{i:a_i\in B}(n-i)}\right) \\&\quad \le 2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( e^{-\frac{ck}{2}} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{2c}{n}\right) ^{\sum _{i:a_i\in B}i-\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) }\right) , \end{aligned}$$

where we use Lemma 3, the assumption \(n\ge 4c\), and the symmetry between the terms with \(k=i\) and \(k=n-i\) for the first inequality. Observe that \(\sum _{i:a_i\in B}i-\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \) is always nonnegative, and is zero exactly when \(B=\{1,2,\dots ,k\}\). Moreover, for any \(j \in \{1, 2, \dots , k\}\), the number of subsets \(B\subset A\) with \(|B|=k\) such that \(\sum _{i:a_i\in B}i-\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \le j\) is at most \(n^j\). Indeed, if a subset B satisfies this inequality, the \(n-j\) smallest elements of B must be \(1,2,\dots ,n-j\), which leaves at most \(n^j\) choices for the remaining elements. Note also that \(|\{B\subset A\mid |B|=k\}|=\left( {\begin{array}{c}n\\ k\end{array}}\right) \le n^k\). We have

$$\begin{aligned}&2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( e^{-\frac{ck}{2}} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{2c}{n}\right) ^{\sum _{i:a_i\in B}i-\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) }\right) \\&\quad \le 2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( e^{-\frac{ck}{2}} \sum _{\begin{array}{c} B\subset A \\ |B|=k \end{array}} \left( \frac{2c}{n}\right) ^{\min \left( k,\sum _{i:a_i\in B}i-\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \right) }\right) \\&\quad \le 2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( e^{-\frac{ck}{2}} \sum _{j=0}^k \left( n^j\cdot \left( \frac{2c}{n}\right) ^j \right) \right) \\&\quad = 2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( e^{-\frac{ck}{2}} \sum _{j=0}^k (2c)^j\right) \\&\quad \le 2 \sum _{k=1}^{\lfloor n/2\rfloor } e^{-\frac{ck}{2}} (4c)^k \\&\quad = 2 \sum _{k=1}^{\lfloor n/2\rfloor } \left( 4ce^{-\frac{c}{2}}\right) ^k \\&\quad \le 2 \sum _{k=1}^{\infty } \left( 4ce^{-\frac{c}{2}}\right) ^k \\&\quad = \frac{8ce^{-\frac{c}{2}}}{1-4ce^{-\frac{c}{2}}} \\&\quad \le 16ce^{-\frac{c}{2}}, \end{aligned}$$

where we use the assumption \(c\ge 10\) for the last inequality.

In conclusion, when \(n\ge N\), the probability that \( TC (T)\ne A\) is at most \(16ce^{-\frac{c}{2}}\), completing our proof. \(\square \)

Since \( TC (T)\subseteq COND (T)\) and \( TC (T)\subseteq CNL (T)\), we immediately obtain the following corollary.

Corollary 1

Let \(f:\mathbb {Z}^+\rightarrow \mathbb {R}_{\ge 0}\) be a function such that \(f(n)\le 1/2\) for all n and \(f(n)\in \omega \left( 1/n\right) \). Assume that a tournament T is generated according to the general random model, and that

$$\begin{aligned} p_{i,j}\in \left[ f(n),1-f(n)\right] \end{aligned}$$

for all \(i\ne j\). Then with high probability, \( COND (T)= CNL (T)=A\).

Next, we show that Theorem 1 and Corollary 1 are tight in the sense that if \(f(n)\in O\left( 1/n\right) \), the results no longer hold.

Theorem 2

Let \(c\ge 0\) be a constant. Assume that a tournament T is generated according to the general random model, and that

$$\begin{aligned} p_{i,j}\le \frac{c}{n} \end{aligned}$$

for all \(i>j\). Then for large enough n, with at least constant probability both \( TC (T)\) and \( COND (T)\) contain a single alternative. Moreover, for large enough n, with at least constant probability \( CNL (T)\) does not contain all alternatives.

Proof

The probability that \(a_1\) dominates all of the remaining alternatives is at least

$$\begin{aligned} \left( 1-\frac{c}{n}\right) ^{n-1}\rightarrow e^{-c} \end{aligned}$$

as \(n\rightarrow \infty \). When this occurs, both \( TC \) and \( COND \) only choose \(a_1\).

An analogous argument shows that \(a_n\) is dominated by all of the remaining alternatives with at least constant probability for large enough n. When this occurs, \( CNL \) chooses all alternatives except \(a_n\). \(\square \)

Theorems 1 and 2 and Corollary 1 allow us to obtain the following corollary on the Condorcet random model.

Corollary 2

Let \(f:\mathbb {Z}^+\rightarrow \mathbb {R}_{\ge 0}\) be a function such that \(f(n)\le 1/2\) for all n. Assume that a tournament T is generated according to the general random model, and that \(p_{i,j}=f(n)\) for all \(i>j\).

  • If \(f(n)\in \omega \left( 1/n\right) \), then with high probability, \( TC (T)= COND (T)= CNL (T)=A\).

  • If \(f(n)\in o\left( 1/n\right) \), then with high probability, \( TC (T)\) and \( COND (T)\) contain a single alternative, and \( CNL (T)\) does not contain all alternatives.

  • If \(f(n)\in O\left( 1/n\right) \), then for large enough n, with at least constant probability \( TC (T)\) and \( COND (T)\) contain a single alternative. Moreover, for large enough n, with at least constant probability \( CNL (T)\) does not contain all alternatives.

Łuczak et al. (1996) also considered the case where \(p_{i,j}=c/n\) for all \(i>j\) and showed that the probability that \( TC \) selects all alternatives converges to \((1-e^{-c})^2\) in this special case. Our next theorem establishes an analogous result for \( COND \) and \( CNL \).

Theorem 3

Let \(c\ge 0\) be a constant. Assume that a tournament T is generated according to the general random model, and that

$$\begin{aligned} p_{i,j} = \frac{c}{n} \end{aligned}$$

for all \(i>j\). Then the probability that \( COND (T)=A\) converges to \(1-e^{-c}\) as \(n\rightarrow \infty \). The same statement holds for \( CNL \).

Proof

We show the result for \( COND \); a similar argument holds for \( CNL \). We have

$$\begin{aligned} \Pr [ COND (T)\ne A]&= \sum _{i=1}^n \Pr [a_i \text { is a Condorcet winner}] \\&= \sum _{i=1}^n \left( 1-\frac{c}{n}\right) ^{n-i}\left( \frac{c}{n}\right) ^{i-1} \\&= \left( 1-\frac{c}{n}\right) ^{n-1}\cdot \sum _{i=0}^{n-1}\left( \frac{c}{n-c}\right) ^i. \end{aligned}$$

The first term converges to \(e^{-c}\) as \(n\rightarrow \infty \). For the second term, notice that it is always at least 1. Moreover, when \(n\ge (k+1)c\) for some positive \(k>1\), the term is at most

$$\begin{aligned} 1+\frac{1}{k}+\frac{1}{k^2}+\dots =\frac{k}{k-1}, \end{aligned}$$

which approaches 1 for large n. Hence the second term converges to 1, and therefore the probability that \( COND (T)\ne A\) converges to \(e^{-c}\), yielding the desired result. \(\square \)

4 Uncovered set

In this section, we turn our focus to the uncovered set. We show that when each probability \(p_{i,j}\) is between f(n) and \(1-f(n)\) for some function \(f(n)\ge c\sqrt{\log n/n}\) with \(c>\sqrt{2}\) a constant, \( UC \) chooses all alternatives with high probability (Theorem 4). As with \( TC \), we also show that our result is asymptotically tight—if \(f(n)\le 0.6\sqrt{\log n/n}\), the statement no longer holds (Theorem 5). It follows that similar results hold for \( UC ^\infty \), implying that \(\Theta (\sqrt{\log n/n})\) is the threshold where the two tournament solutions go from almost always choosing all alternatives to excluding at least one alternative with high probability.

Our first result of the section shows that \( UC \) chooses the whole set of alternatives for a wide range of distributions over tournaments.

Theorem 4

Let \(c>\sqrt{2}\) be a constant. Assume that a tournament T is generated according to the general random model, and that

$$\begin{aligned} p_{i,j}\in \left[ c\sqrt{\frac{\log n}{n}},1-c\sqrt{\frac{\log n}{n}}\right] \end{aligned}$$

for all \(i\ne j\). Then with high probability, \( UC (T)=A\).

Proof

Choose N such that \(\frac{c^2(N-2)}{N}>2\), and let \(n\ge N\). Fix a pair of distinct alternatives \(a_i,a_j\). We first bound the probability that \(a_i\) cannot reach \(a_j\) via a domination path of length at most two. For each \(l\not \in \{i,j\}\), the probability that there is an edge from \(a_i\) to \(a_l\) and an edge from \(a_l\) to \(a_j\) is at least \(\left( c\sqrt{\log n/n}\right) ^2=c^2\log n/n\). The probability that \(a_i\) cannot reach \(a_j\) via a domination path of length at most two is therefore bounded above by

$$\begin{aligned} \left( 1-\frac{c^2\log n}{n}\right) ^{n-2}&\le e^{-\frac{c^2(n-2)\log n}{n}}\\&= n^{-\frac{c^2(n-2)}{n}}, \end{aligned}$$

where we use Lemma 3 for the inequality.

Observe that \( UC (T)=A\) exactly when any alternative can reach any other alternative via a domination path of length at most two. Using the union bound over all (ordered) pairs of distinct alternatives ij, we find that the probability that some alternative cannot reach some other alternative via a domination path of length at most two is no more than

$$\begin{aligned} n(n-1)n^{-\frac{c^2(n-2)}{n}}\le n^{2-\frac{c^2(n-2)}{n}}, \end{aligned}$$

which vanishes for large n. \(\square \)

Since the uncovered set is the finest tournament solution satisfying the axioms of Condorcet consistency, neutrality, and expansion (Moulin 1986), Theorem 4 implies that any tournament solution that satisfies these three axioms also selects all alternatives with high probability when the tournament is generated according to the assumptions of the theorem.

Next, we show that the statement of Theorem 4 breaks down if \(f(n)\le 0.6\sqrt{\log n/n}\), thus confirming that the assumption of the theorem cannot be relaxed asymptotically.

Theorem 5

Assume that a tournament T is generated according to the general random model, and that

$$\begin{aligned} p_{i,j}\le 0.6\sqrt{\frac{\log n}{n}} \end{aligned}$$

for all \(i>j\). Then with high probability, \( UC (T)\ne A\).

Proof

Let \(A_1=\{a_1,a_2,\dots ,a_{\left\lfloor n^{0.49}\right\rfloor }\}\), and let \(A_2\) be the set of alternatives that \(a_n\) dominates. We first prove the following claim.

Claim: With high probability, the following two events occur simultaneously: (i) \(a_n\) does not dominate any of the alternatives in \(A_1\), and (ii) \(|A_2|\le 0.61\sqrt{n\log n}\).

Proof of Claim:

First, using Lemma 2, the probability that \(a_n\) dominates at least one of the alternatives in \(A_1\) is at most

$$\begin{aligned} 1-\left( 1-0.6\sqrt{\frac{\log n}{n}}\right) ^{\left\lfloor n^{0.49}\right\rfloor }&\le 1-\left( 1-0.6\left\lfloor n^{0.49}\right\rfloor \sqrt{\frac{\log n}{n}}\right) \\&\le 0.6\cdot \frac{\sqrt{\log n}}{n^{0.01}}, \end{aligned}$$

which converges to 0 as \(n\rightarrow \infty \).

Next, for each \(a_i\in A\) with \(i=1,2,\dots ,n-1\), let \(X_i\) be an indicator random variable that indicates whether \(a_n\) dominates \(a_i\) or not: \(X_i\) takes on the value 1 if \(a_n\) dominates \(a_i\) and 0 otherwise. We have

$$\begin{aligned} \mathbb {E}[X_i]\le 0.6\sqrt{\frac{\log n}{n}}\le \frac{0.6\sqrt{n\log n}}{n-1}. \end{aligned}$$

Define \(X'_i=X_i+\frac{0.6\sqrt{n\log n}}{n-1}-\mathbb {E}[X_i]\), and \(X'=\sum _{i=1}^{n-1} X'_i\). We now have

$$\begin{aligned} \mathbb {E}[X'_i]=\frac{0.6\sqrt{n\log n}}{n-1} \text { and } \mathbb {E}[X']=0.6\sqrt{n\log n}. \end{aligned}$$

Moreover, observe that \(|A_2|=\sum _{i=1}^{n-1} X_i\). By Lemma 1, it follows that

$$\begin{aligned} \Pr \left[ |A_2|> 0.61\sqrt{n\log n}\right]&\le \Pr \left[ X'> 0.61\sqrt{n\log n}\right] \\&= \Pr \left[ X' > \frac{0.61}{0.6}\cdot \mathbb {E}[X']\right] \\&\le \exp \left( -\left( \frac{0.61}{0.6}-1\right) ^2\cdot \frac{\mathbb {E}[X']}{3}\right) \\&= \exp \left( -\frac{1}{3600}\cdot \frac{0.6\sqrt{n\log n}}{3}\right) , \end{aligned}$$

which again vanishes for large n.

Using the union bound over the two events, we have our claim. \(\square \)

From now on, we assume that \(a_n\) does not dominate any of the alternatives in \(A_1\) and that \(|A_2|\le 0.61\sqrt{n\log n}\). Under this assumption, \(a_n\) can reach all of the alternatives in \(A_1\) via a domination path of length at most two if and only if each alternative in \(A_1\) is dominated by some alternative in \(A_2\). Note that the event that this holds for a particular alternative in \(A_1\) is independent of the corresponding events for other alternatives in \(A_1\). It follows that

$$\begin{aligned}&\Pr \left[ a_n \text { can reach all } a_i\in A_1 \text { via a domination path of length at most two}\right] \\&\quad = \Pr \left[ a_n \text { can reach a fixed } a_i\in A_1 \text { via a domination path of length two}\right] ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad = \left( 1-\Pr \left[ \text {a fixed } a_i\in A_1 \text { dominates } a_j \text { for all } a_j\in A_2\right] \right) ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad \le \left( 1-\left( 1-0.6\sqrt{\frac{\log n}{n}}\right) ^{0.61\sqrt{n\log n}}\right) ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad = \left( 1-\left( \left( 1-0.6\sqrt{\frac{\log n}{n}}\right) ^{0.61\sqrt{\frac{n}{\log n}}}\right) ^{\log n}\right) ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad \le \left( 1-\left( 1-0.6\sqrt{\frac{\log n}{n}}\cdot 0.61\sqrt{\frac{n}{\log n}}\right) ^{\log n}\right) ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad = \left( 1-0.634^{\log n}\right) ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad \le \left( 1-n^{-0.46}\right) ^{\left\lfloor n^{0.49}\right\rfloor } \\&\quad \le e^{-n^{-0.46}\left\lfloor n^{0.49}\right\rfloor }, \end{aligned}$$

where we use Lemma 2, the estimate \(0.634>e^{-0.46}\), and Lemma 3 for the second, third, and fourth inequalities, respectively.

Finally, since

$$\begin{aligned} \lim _{n\rightarrow \infty }e^{-n^{-0.46}\lfloor n^{0.49}\rfloor }=0, \end{aligned}$$

the probability that \(a_n\not \in UC (T)\) converges to 1 as n goes to infinity. This implies that with high probability, \( UC (T)\) is not the whole set of alternatives, as desired. \(\square \)

Since \( UC (T)=A\) exactly when \( UC ^\infty (T)=A\), we immediately have the following corollary.

Corollary 3

Assume that a tournament T is generated according to the general random model.

  • Let \(c>\sqrt{2}\) be a constant. If \(p_{i,j}\in \left[ c\sqrt{\frac{\log n}{n}},1-c\sqrt{\frac{\log n}{n}}\right] \) for all \(i\ne j\), then with high probability, \( UC ^\infty (T)=A\).

  • If \(p_{i,j}\le 0.6\sqrt{\frac{\log n}{n}}\) for all \(i>j\), then with high probability, \( UC ^\infty (T)\ne A\).

Theorems 4 and 5 and Corollary 3 allow us to obtain the following corollary on the Condorcet random model.

Corollary 4

Let \(f:\mathbb {Z}^+\rightarrow \mathbb {R}_{\ge 0}\) be a function such that \(f(n)\le 1/2\) for all n. Assume that a tournament T is generated according to the general random model, and that \(p_{i,j}=f(n)\) for all \(i>j\).

  • If \(f(n)\in \omega \left( \sqrt{\log n/n}\right) \) or \(f(n)\ge c\sqrt{\log n/n}\) for some constant \(c>\sqrt{2}\), then with high probability, \( UC (T)= UC ^{\infty }(T)=A\).

  • If \(f(n)\in o\left( \sqrt{\log n/n}\right) \) or \(f(n)\le 0.6\sqrt{\log n/n}\), then with high probability, \( UC (T)\ne A\) and \( UC ^{\infty }(T)\ne A\).

5 Experiments

To complement our theoretical results, in this section we investigate the asymptotic behavior of random tournaments according to the Condorcet random model as well as another more realistic model that we call the gap model.

5.1 Condorcet random model

Starting from a set of alternatives \(\{a_1, a_2, \ldots ,a_n \}\), we generate random tournaments according to the Condorcet random model by inserting for each pair of alternatives \(a_i,a_j\) with \(i>j\) an edge from \(a_i\) to \(a_j\) with probability p and an edge in the reverse direction with probability \(1-p\). The tournament solutions that we consider can all be computed efficiently: A simple counting algorithm suffices to compute \( COND \), a depth-first search algorithm computes \( TC \) in linear time, and the asymptotic running time for computing \( UC \) equals that of matrix multiplication (Hudry 2009). In our experimental setup, we draw 10,000 random tournaments of each size \(n \in \{5,10,20,30, \dots ,100\}\) for each \(p\in \{0.5,0.3,1/n,1/n^2,\sqrt{2\log n/n}, 0.6\sqrt{\log n/n}\}\) and check for each tournament solution \(S \in \{ COND , UC , TC \}\) whether it selects all alternatives.Footnote 8\(^{,}\)Footnote 9 Out of that, we compute the percentage of tournaments in which all alternatives are selected. The resulting graphs are displayed in Fig. 1.

Fig. 1
figure 1

Percentage of tournaments for which the tournament solution chooses the entire set of alternatives in the Condorcet random model, for different values of the probability p. The horizontal and vertical axes correspond to the number of alternatives in the tournament and the percentage, respectively. Averages are taken over 10,000 runs

For \(p=0.5\), which corresponds to the uniform random model, our experimental results in Fig. 1a coincide with the main theorem of Fey (2008). The results moreover reveal that \( UC \) chooses all alternatives with high probability in tournaments with at least 50 alternatives while \( COND \) and \( TC \) already do so in much smaller tournaments. As p decreases from 0.5 toward 0, the curves of \( COND \), \( TC \), and \( UC \) are shifted to the right; this is to be expected since for smaller p the tournament is more skewed, making it more likely for weaker alternatives to be excluded. Nevertheless, for any fixed p the fraction of tournaments in which all alternatives are chosen approaches 1. In particular, when \(p = 0.3\), \( UC \) almost never rules out any alternative in tournaments of size 100 or more (Fig. 1b).

Next, we look at the regimes where the probability p goes to 0 as n approaches infinity. For the case of \(p = 1/n\) we find that, in line with Theorem 3, the probability that \( COND \) selects all alternatives converges to \(1-e^{-1}\approx 0.6321\) (Fig. 1c). Similarly, the probability that \( TC \) selects all alternatives converges to \((1-e^{-1})^2 \approx 0.3996\) for the same value of p, confirming a result by Łuczak et al. (1996). Letting p approach 0 even faster, we find that for \(p = 1/n^2\), both \( TC \) and \( COND \) are discriminative with high probability (Fig. 1d). As \(1/n^2 \in o\left( 1/n\right) \), this is consistent with Corollary 2. Note that \( UC \) is discriminative for almost all tournaments for both \(p=1/n\) and \(p=1/n^2\); indeed, this is implied by Corollary 4 since already \(1/n \in o\left( \sqrt{\log n/n}\right) \).

Finally, we consider the regime \(p=\Theta \left( \sqrt{\log n/n}\right) \), which according to Corollary 4 is the boundary between \( UC \) almost never ruling out any alternative and almost always ruling out at least one alternative. The experimental setting for \(p = c\sqrt{\log n/n}\) with \(c \in \{0.6, \sqrt{2} \}\) differs from the previous settings in that we only examined tournaments of size \(n \ge 50\), since for small n the expression \(\sqrt{2\log n/n}\) is larger than 0.5, making it unsuitable for our experiments. On the other hand, as p decreases rather slowly, we examined random tournaments up to size 1000 in order to increase the expressive power of our experiments. We find that \( COND \) and \( TC \) select all alternatives with high probability for both values of c; this is in line with Corollary 2 and the observation that \(c\sqrt{\log n/n}\in \omega \left( 1/n\right) \). On the other hand, our experiments indicate that \( UC \) returns all alternatives in almost all tournaments in the case of \(p = \sqrt{2\log n/n}\) (Fig. 1e) but is discriminative in almost all tournaments when \(p = 0.6\sqrt{\log n/n}\) (Fig. 1f). These findings coincide with Corollary 4 and demonstrate the interesting fact that a small gap in the constant factor constitutes the threshold with regard to the discriminative power of \( UC \).

5.2 Gap model

As we explained in the introduction, while the Condorcet random model is commonly used in theoretical analyses, it does not properly capture tournaments in the real world since it assigns the same probability to all edges regardless of the difference in strength between the two alternatives adjacent to that edge. We next consider a different model, which we call the gap model, that takes this issue into account.

Like in the Condorcet random model, in the gap model there is a linear order of alternatives from strongest to weakest as well as a parameter \(p\le 0.5\). However, the probability that a stronger alternative dominates a weaker alternative depends linearly on the size of the gap between the two alternatives in the linear order: For two alternatives \(a_i,a_j\) with \(i<j\), there is an edge from \(a_i\) to \(a_j\) with probability \(0.5 + \frac{(0.5-p)(j-i)}{n-1}\), and an edge in the reverse direction with the remaining probability. In particular, there is an edge from \(a_1\) to \(a_n\) with probability \(1-p\). We perform experiments on the gap model using the same values of p as we did for the Condorcet random model. The only exception is \(p=0.5\) which we replace by \(p=0\), the reason being that both models coincide when \(p=0.5\). Moreover, we double the sizes of the tournaments considered for the first four values of p in order to better illustrate the convergence behavior. The resulting graphs are displayed in Fig. 2.

Fig. 2
figure 2

Percentage of tournaments for which the tournament solution chooses the entire set of alternatives in the gap model, for different values of the probability p. The horizontal and vertical axes correspond to the number of alternatives in the tournament and the percentage, respectively. Averages are taken over 10,000 runs

We now make some observations. Firstly, in Fig. 2b, we find that for \(p=0.3\) all three tournament solutions are unlikely to exclude any alternative in large tournaments; this is consistent with Theorems 1 and 4. The same phenomenon occurs for \(p=\sqrt{2\log n/n}\) (Fig. 2e). We remark that these phenomena cannot be explained by any theoretical result prior to our work. In the remaining figures, we likewise find that in the gap model, all tournament solutions cease to be discriminative as the size of the tournament grows. This is the case even for the extreme value \(p=0\) (Fig. 2a); the Condorcet random model for this value of p clearly always produces a Condorcet winner. Intuitively, the reason that all alternatives are likely to be chosen is that in the gap model, the overall difference in strength between alternatives is significantly less than in the Condorcet random model. Note that the observations for the latter values of p are not captured by our results, since our theorems require all edge probabilities to be in the range \([p,1-p]\) for appropriate values of p. Indeed, an intriguing direction for future work would be to generalize our results so that some edge probabilities are allowed to be outside of this range.Footnote 10

6 Conclusion

In this paper, we investigate the behavior of a number of tournament solutions in large random tournaments under a general probabilistic model. We establish tight asymptotic bounds on the boundary of the probability range for which each tournament solution is unlikely to exclude any alternative. In particular, we illustrate a difference between the discriminative power of the top cycle and the uncovered set; this difference is not evident in previous studies that focused on more restricted models. Indeed, while both tournament solutions include all alternatives with high probability in the uniform random model, our results suggest that the uncovered set is in fact considerably more discriminative than the top cycle.

Our work leaves many interesting open questions for future study. A natural next step would be to investigate the asymptotic behavior of other tournament solutions that have been previously studied in the uniform random model—including the Banks set (Fey 2008), the minimal covering set (Scott and Fey 2012), and the bipartisan set (Fisher and Ryan 1995)—using our general probabilistic model. For instance, it is conceivable that the approach used by Fey (2008) to show that the Banks set almost never rules out any alternative in the uniform random model can be extended to establish an analogous statement when each edge probability is drawn from some constant range. It is not clear, however, whether the approach would still work if we allow the range to depend on the number of alternatives in the tournament like we do in the current work.

From a broader point of view, we believe that an important direction is to apply our model to other tournament problems beyond those concerning tournament solutions, for example the problem of finding a dominating set of minimum size. It is well-known that a dominating set of size at most \(\log _2(n+1)\) always exists and can be found using a simple greedy algorithm. While a dominating set can be as small as a singleton in tournaments that admit a Condorcet winner, Scott and Fey (2012) showed that for uniform random tournaments, a dominating set of logarithmic size is the best that one can hope for. More precisely, these authors showed that given any constant \(0<c<1\), the smallest dominating set of a tournament chosen uniformly at random contains at least \(c\log _2n\) alternatives with high probability. Establishing a similar result in our general probabilistic model is an intriguing technical challenge that would allow us to better understand the behavior of such structures in the real world.