Risk-Limiting Audits by Stratified Union-Intersection Tests of Elections (SUITE)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11143)

Abstract

Risk-limiting audits (RLAs) offer a statistical guarantee: if a full manual tally of the paper ballots would show that the reported election outcome is wrong, an RLA has a known minimum chance of leading to a full manual tally. RLAs generally rely on random samples. Stratified sampling—partitioning the population of ballots into disjoint strata and sampling independently from the strata—may simplify logistics or increase efficiency compared to simpler sampling designs, but makes risk calculations harder. We present SUITE, a new method for conducting RLAs using stratified samples. SUITE considers all possible partitions of outcome-changing error across strata. For each partition, it combines P-values from stratum-level tests into a combined P-value; there is no restriction on the tests used in different strata. SUITE maximizes the combined P-value over all partitions of outcome-changing error. The audit can stop if that maximum is less than the risk limit. Voting systems in some Colorado counties (comprising 98.2% of voters) allow auditors to check how the system interpreted each ballot, which allows ballot-level comparison RLAs. Other counties use ballot polling, which is less efficient. Extant approaches to conducting an RLA of a statewide contest would require major changes to Colorado’s procedures and software, or would sacrifice the efficiency of ballot-level comparison. SUITE does not. It divides ballots into two strata: those cast in counties that can conduct ballot-level comparisons, and the rest. Stratum-level P-values are found by methods derived here. The resulting audit is substantially more efficient than statewide ballot polling. SUITE is useful in any state with a mix of voting systems or that uses stratified sampling for other reasons. We provide an open-source reference implementation and exemplar calculations in Jupyter notebooks.

Notes

  1. See https://github.com/pbstark/CORLA18.

  2. “If” is straightforward. For “only if,” suppose \(\omega _{w\ell } \ge V_{w\ell }\). Set \(\lambda _s = \frac{\omega _{w\ell , s}}{\sum _t \omega _{w\ell , t}}\). Then \(\sum _s \lambda _s = 1\), and \(\omega _{w\ell , s} = \lambda _s \omega _{w\ell } \ge \lambda _s V_{w\ell }\) for all s.

  3. If the stratum-level tests had continuously distributed P-values, the distribution would be exactly chi-square with 2S degrees of freedom, but if any of the P-values has atoms when the null hypothesis is true, it is in general stochastically smaller. This follows from a coupling argument along the lines of Theorem 4.12.3 in [3].

  4. See https://www.sos.state.co.us/pubs/elections/RLA/2017RLABackground.html.

  5. See [8] for a different (Bayesian) approach to auditing contests that include both CVR counties and no-CVR counties. In general, Bayesian audits are not risk-limiting.

  6. Since so few ballots are cast in no-CVR counties, cruder approaches might work, for instance, pretending that no-CVR counties had CVRs, but treating any ballot sampled from a no-CVR county as if it had a 2-vote overstatement error. See [1].

  7. So are some forms of preferential and approval voting, such as Borda count, and proportional representation contests, such as D’Hondt [15]. For a derivation of ballot-level comparison risk-limiting audits for super-majority contests, see https://github.com/pbstark/S157F17/blob/master/audit.ipynb. (Last visited 14 May 2018.) Changes for IRV/STV are more complicated.
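
Note 3 concerns the null distribution of a statistic that combines S independent stratum-level P-values. The sketch below assumes that statistic is Fisher's combining function, \(-2 \sum _s \ln p_s\), which is the standard statistic with a chi-square distribution on 2S degrees of freedom; the function name is hypothetical and this is not the reference implementation. Because the degrees of freedom are even, the chi-square tail has a closed form and no statistics library is needed.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's combining function.

    Assumes the statistic -2 * sum(ln p_s) is referred to a chi-square
    distribution with 2S degrees of freedom (S = number of strata).
    """
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("need at least one P-value, each in (0, 1]")
    s = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the Fisher statistic
    # Chi-square survival function with 2s degrees of freedom at 2 * half:
    # exp(-half) * sum_{k=0}^{s-1} half**k / k!
    term, total = 1.0, 1.0
    for k in range(1, s):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Two strata, each with a stratum-level P-value of 0.1 (made-up inputs).
print(fisher_combined_pvalue([0.1, 0.1]))
```

With a single stratum the combined P-value equals the stratum-level P-value, which is a quick sanity check on the tail formula.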

References

  1. Bañuelos, J., Stark, P.: Limiting risk by turning manifest phantoms into evil zombies. Technical report, arXiv.org (2012). http://arxiv.org/abs/1207.3413. Accessed 17 July 2012

  2. California Secretary of State: California Secretary of State Post-Election Risk-Limiting Audit Pilot Program 2011–2013: Final Report to the United States Election Assistance Commission (2014). http://votingsystems.cdn.sos.ca.gov/oversight/risk-pilot/final-report-073014.pdf. Accessed 6 May 2018

  3. Grimmett, G.R., Stirzaker, D.R.: Probability and Random Processes. Oxford University Press, Oxford (2001)

  4. Higgins, M., Rivest, R., Stark, P.: Sharper p-values for stratified post-election audits. Stat. Polit. Policy 2(1) (2011). http://www.bepress.com/spp/vol2/iss1/7

  5. Lindeman, M., Stark, P., Yates, V.: BRAVO: ballot-polling risk-limiting audits to verify outcomes. In: Proceedings of the 2011 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE 2011). USENIX (2012)

  6. Lindeman, M., Stark, P.B.: A gentle introduction to risk-limiting audits. IEEE Secur. Priv. 10, 42–49 (2012)

  7. Pesarin, F., Salmaso, L.: Permutation Tests for Complex Data: Theory, Applications, and Software. Wiley, West Sussex (2010)

  8. Rivest, R.L.: Bayesian tabulation audits: explained and extended, 1 January 2018. https://arxiv.org/abs/1801.00528

  9. Stark, P.: Conservative statistical post-election audits. Ann. Appl. Stat. 2, 550–581 (2008). http://arxiv.org/abs/0807.4005

  10. Stark, P.: Auditing a collection of races simultaneously. Technical report. arXiv.org (2009). http://arxiv.org/abs/0905.1422v1

  11. Stark, P.: CAST: canvass audits by sampling and testing. IEEE Trans. Inf. Forensics Secur. Spec. Issue Electron. Voting 4, 708–717 (2009)

  12. Stark, P.: Risk-limiting post-election audits: \(P\)-values from common probability inequalities. IEEE Trans. Inf. Forensics Secur. 4, 1005–1014 (2009)

  13. Stark, P.: Risk-limiting vote-tabulation audits: the importance of cluster size. Chance 23(3), 9–12 (2010)

  14. Stark, P.: Super-simple simultaneous single-ballot risk-limiting audits. In: Proceedings of the 2010 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE 2010). USENIX (2010). http://www.usenix.org/events/evtwote10/tech/full_papers/Stark.pdf

  15. Stark, P.B., Teague, V.: Verifiable European elections: risk-limiting audits for D’Hondt and its relatives. JETS: USENIX J. Election Technol. Syst. 3(1) (2014). https://www.usenix.org/jets/issues/0301/stark

  16. Stark, P.B., Wagner, D.A.: Evidence-based elections. IEEE Secur. Priv. 10, 33–41 (2012)

  17. Wald, A.: Sequential tests of statistical hypotheses. Ann. Math. Stat. 16, 117–186 (1945)

Acknowledgements

We are grateful to Ronald L. Rivest and Steven N. Evans for helpful conversations and suggestions.

Author information

Correspondence to Kellie Ottoboni or Philip B. Stark.

Appendices

A Comparison Tests for an Overstatement Quota

A.1 Notation

  • \(\mathcal {W}\): the set of reported winners of the contest

  • \(\mathcal {L}\): the set of reported losers of the contest

  • \(N_s\) ballots were cast in stratum s. (The contest might not appear on all \(N_s\) ballots.)

  • P “batches” of ballots are in stratum s. A batch contains one or more ballots. Every ballot in stratum s is in exactly one batch.

  • \(n_p\): number of ballots in batch p. \(N_s = \sum _{p=1}^P n_p\).

  • \(v_{pi} \in \{0, 1, \ldots , n_p\}\): reported votes for candidate i in batch p

  • \(a_{pi} \in \{0, 1, \ldots , n_p\}\): actual votes for candidate i in batch p. If the contest does not appear on any ballot in batch p, then \(a_{pi} = 0\).

  • \(V_{w\ell ,s} \equiv \sum _{p=1}^P (v_{pw} - v_{p\ell })\): Reported margin in stratum s of reported winner \(w \in \mathcal {W}\) over reported loser \(\ell \in \mathcal {L}\), in votes.

  • \(V_{w\ell }\): overall reported margin in votes of reported winner \(w \in \mathcal {W}\) over reported loser \(\ell \in \mathcal {L}\) for the entire contest (not just stratum s)

  • \(V \equiv \min _{w \in \mathcal {W}, \ell \in \mathcal {L}} V_{w \ell }\): smallest reported overall margin in votes between any reported winner and reported loser

  • \(A_{w\ell ,s} \equiv \sum _{p=1}^P (a_{pw} - a_{p\ell })\): actual margin in votes in the stratum of reported winner \(w \in \mathcal {W}\) over reported loser \(\ell \in \mathcal {L}\)

  • \(A_{w\ell }\): actual margin in votes of reported winner \(w \in \mathcal {W}\) over reported loser \(\ell \in \mathcal {L}\) for the entire contest (not just in stratum s)

A.2 Reduction to Maximum Relative Overstatement

If the contest is entirely contained in stratum s, then the reported winners of the contest are the actual winners if

$$ \min _{w \in \mathcal {W}, \ell \in \mathcal {L}} A_{w\ell ,s} > 0. $$

Here, we address the case that the contest may include a portion outside the stratum. To combine independent samples in different strata, it is convenient to be able to test whether the net overstatement error in a stratum is greater than or equal to a given threshold.

Instead of testing that condition directly, we will test a condition that is sufficient but not necessary for the inequality to hold, to get a computationally simple test that is still conservative (i.e., the level is not larger than its nominal value).

For every winner, loser pair \((w, \ell )\), we want to test whether the overstatement error is greater than or equal to some threshold, generally one tied to the reported margin between w and \(\ell \). For instance, for a hybrid stratified audit, we set the threshold to be \(\lambda _s V_{w\ell }\).

We want to test whether

$$ \sum _{p=1}^P (v_{pw}-a_{pw} - v_{p\ell } + a_{p\ell })/V_{w\ell } \ge \lambda _s. $$

The maximum of sums is not larger than the sum of the maxima; that is,

$$ \max _{w \in \mathcal {W}, \ell \in \mathcal {L}} \sum _{p=1}^P (v_{pw}-a_{pw} - v_{p\ell } + a_{p\ell })/V_{w\ell } \le \sum _{p=1}^P \max _{w \in \mathcal {W}, \ell \in \mathcal {L}} (v_{pw}-a_{pw} - v_{p\ell } + a_{p\ell })/V_{w\ell }. $$

Define

$$ e_p \equiv \max _{w \in \mathcal {W}, \ell \in \mathcal {L}} (v_{pw}-a_{pw} - v_{p\ell } + a_{p\ell })/V_{w\ell }. $$

Then no reported margin is overstated by a fraction \(\lambda _s\) or more if

$$ E \equiv \sum _{p=1}^P e_p < \lambda _s. $$

Thus if we can reject the hypothesis \(E \ge \lambda _s\), we can conclude that no pairwise margin was overstated by as much as a fraction \(\lambda _s\).

Testing whether \(E \ge \lambda _s\) would require a very large sample if we knew nothing at all about \(e_p\) without auditing batch p: a single large value of \(e_p\) could make E arbitrarily large. But there is an a priori upper bound for \(e_p\). Whatever the reported votes \(v_{pi}\) are in batch p, we can find the potential values of the actual votes \(a_{pi}\) that would make the error \(e_p\) largest, because \(a_{pi}\) must be between 0 and \(n_p\), the number of ballots in batch p:

$$ \frac{v_{pw}-a_{pw} - v_{p\ell } + a_{p\ell }}{V_{w\ell }} \le \frac{v_{pw}- 0 - v_{p\ell } + n_p}{V_{w\ell }}. $$

Hence,

$$\begin{aligned} e_p \le \max _{w \in \mathcal {W}, \ell \in \mathcal {L}} \frac{v_{pw} - v_{p\ell } + n_p}{V_{w\ell }} \equiv u_p. \end{aligned}$$
(4)

Knowing that \(e_p \le u_p\) might let us conclude reliably that \(E < \lambda _s\) by examining only a small number of batches—depending on the values \(\{ u_p\}_{p=1}^P\) and on the values of \(\{e_p\}\) for the audited batches.
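
The definitions of \(e_p\) and \(u_p\) can be sketched in code as follows. This is a minimal illustration, not the reference implementation: the function name, the dictionary layout, and the vote counts are all hypothetical. Exact rational arithmetic is used so the results can be compared with hand calculation.

```python
from fractions import Fraction

def batch_error_and_bound(reported, actual, n_p, margins):
    """Return (e_p, u_p) for one batch.

    reported, actual: dict candidate -> votes in the batch
    n_p: number of ballots in the batch
    margins: dict (winner, loser) -> overall reported margin V_wl (> 0)
    """
    # e_p: largest relative overstatement over all (winner, loser) pairs.
    e_p = max(
        Fraction(reported[w] - actual[w] - reported[l] + actual[l], v_wl)
        for (w, l), v_wl in margins.items()
    )
    # u_p: a priori bound, taking a_pw = 0 and a_pl = n_p (Eq. 4).
    u_p = max(
        Fraction(reported[w] - reported[l] + n_p, v_wl)
        for (w, l), v_wl in margins.items()
    )
    return e_p, u_p

# Hypothetical batch of 10 ballots; A is the reported winner.
margins = {("A", "B"): 100, ("A", "C"): 400}
reported = {"A": 6, "B": 3, "C": 1}
actual = {"A": 5, "B": 4, "C": 1}
e_p, u_p = batch_error_and_bound(reported, actual, 10, margins)
print(e_p, u_p)  # 1/50 13/100
```

Here one vote moved from A to B overstates the A-over-B margin by 2 votes out of 100, so \(e_p = 1/50\), well below the bound \(u_p = 13/100\).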

To make inferences about E, it is helpful to work with the taint \(t_p \equiv \frac{e_p}{u_p} \le 1\). Define \(U \equiv \sum _{p=1}^P u_p\). Suppose we draw batches at random with replacement, with probability \(u_p/U\) of drawing batch p in each draw, \(p = 1, \ldots , P\). (Since \(u_p \ge 0\), these are all nonnegative numbers, and they sum to 1, so they define a probability distribution on the P batches.)

Let \(T_j\) be the value of \(t_p\) for the batch p selected in the jth draw. Then \(\{T_j\}_{j=1}^n\) are IID, \(\mathbb {P} \{T_j \le 1\} = 1\), and

$$ \mathbb {E} T_1 = \sum _{p=1}^P \frac{u_p}{U} t_p = \frac{1}{U}\sum _{p=1}^P u_p \frac{e_p}{u_p} = \frac{1}{U} \sum _{p=1}^P e_p = E/U. $$

Thus \(E = U \mathbb {E} T_1\). So, if we have strong evidence that \(\mathbb {E} T_1 < \lambda _s/U\), we have strong evidence that \(E < \lambda _s\).
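
The identity \(\mathbb {E} T_1 = E/U\) can be checked numerically. The sketch below uses made-up values of \(e_p\) and \(u_p\) (all with \(u_p > 0\)) and hypothetical function names; it computes the expectation exactly and also shows how draws with probability proportional to \(u_p\) might be simulated with the standard library.

```python
import random
from fractions import Fraction

def expected_taint(e, u):
    """Exact E[T_1] when batch p is drawn with probability u_p / U."""
    U = sum(u)
    return sum((u_p / U) * (e_p / u_p) for e_p, u_p in zip(e, u))

def draw_taints(e, u, n, seed=0):
    """Simulate n draws with replacement, probability proportional to u_p."""
    rng = random.Random(seed)
    taints = [e_p / u_p for e_p, u_p in zip(e, u)]
    return rng.choices(taints, weights=[float(x) for x in u], k=n)

# Hypothetical errors and bounds for three batches.
e = [Fraction(1, 50), Fraction(0), Fraction(-1, 100)]
u = [Fraction(13, 100), Fraction(1, 5), Fraction(1, 10)]

print(expected_taint(e, u), sum(e) / sum(u))  # both equal E / U
print(draw_taints(e, u, 5))
```

A negative \(e_p\) (an understatement) gives a negative taint, which lowers \(\mathbb {E} T_1\); only the upper bound \(T_j \le 1\) is needed for the tests that follow.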

This approach can be simplified even further by noting that \(u_p\) has a simple upper bound that does not depend on \(v_{pi}\). At worst, the reported result for batch p shows \(n_p\) votes for the “least-winning” apparent winner of the contest with the smallest margin, but a hand interpretation would show that all \(n_p\) ballots in the batch had votes for the runner-up in that contest. Since \(V_{w\ell } \ge V\equiv \min _{w \in \mathcal {W}, \ell \in \mathcal {L}} V_{w \ell }\) and \(0 \le v_{pi} \le n_p\),

$$ u_p = \max _{w \in \mathcal {W}, \ell \in \mathcal {L}} \frac{v_{pw} - v_{p\ell } + n_p}{V_{w\ell }} \le \max _{w \in \mathcal {W}, \ell \in \mathcal {L}} \frac{n_p - 0 + n_p}{V_{w\ell }} \le \frac{2n_p}{V}. $$

Thus if we use \(2n_p/V\) in lieu of \(u_p\), we still get conservative results. (We also need to re-define U to be the sum of those upper bounds.) An intermediate, still conservative approach would be to use this upper bound for batches that consist of a single ballot, but use the sharper bound (4) when \(n_p > 1\). Regardless, for the new definition of \(u_p\) and U, \(\{T_j\}_{j=1}^n\) are IID, \(\mathbb {P} \{T_j \le 1\} = 1\), and

$$ \mathbb {E} T_1 = \sum _{p=1}^P \frac{u_p}{U} t_p = \frac{1}{U}\sum _{p=1}^P u_p \frac{e_p}{u_p} = \frac{1}{U} \sum _{p=1}^P e_p = E/U. $$

So, if we have evidence that \(\mathbb {E} T_1 < \lambda _s/U\), we have evidence that \(E < \lambda _s\).
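
As a worked illustration with hypothetical numbers (not taken from the paper), suppose there is a single winner-loser pair with \(V_{w\ell } = V = 100\), and a batch of \(n_p = 10\) ballots with reported votes \(v_{pw} = 6\) and \(v_{p\ell } = 3\). Then

$$ u_p = \frac{6 - 3 + 10}{100} = 0.13 \le \frac{2 \cdot 10}{100} = 0.20 = \frac{2n_p}{V}, $$

so for this batch the cruder bound is about 54% larger than the sharper bound (4), in exchange for not depending on the reported batch totals.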

A.3 Testing \(\mathbb {E} T_1 \ge \lambda _s/U\)

A variety of methods are available to test whether \(\mathbb {E} T_1 < \lambda _s/U\). One particularly elegant sequential method is based on Wald’s Sequential Probability Ratio Test (SPRT) [17]. Harold Kaplan pointed out this method on a website that no longer exists. A derivation of this Kaplan-Wald method is in Appendix A of [15]; to apply the method here, take \(t = \lambda _s\) in their Eq. 18. A different sequential method, the Kaplan-Markov method (also due to Harold Kaplan), is given in [12].
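
To make the sequential idea concrete, here is a minimal sketch of a Kaplan-Markov style P-value for the null \(\mathbb {E} T_1 \ge \lambda _s/U\). It assumes the form \(\prod _j (1 - \lambda _s/U)/(1 - T_j)\), which generalizes the threshold-1 version in [12] to a threshold \(\lambda _s\); that generalization, the function name, and the inputs are assumptions of this sketch, and it is not the Kaplan-Wald method of [15].

```python
def kaplan_markov_pvalues(taints, lam, U):
    """Running P-values for the null E[T_1] >= lam / U.

    taints: observed taints T_1, ..., T_n, each <= 1
    lam: overstatement threshold for the stratum (lambda_s)
    U: sum of the batch error bounds u_p
    Returns the running minimum of the product after each draw.
    """
    if not 0 < lam < U:
        raise ValueError("need 0 < lam < U")
    product, smallest, out = 1.0, 1.0, []
    for t in taints:
        if t >= 1:
            # A taint of 1 makes the factor unbounded: no further evidence
            # against the null can accrue from this product.
            product = float("inf")
        else:
            product *= (1 - lam / U) / (1 - t)
        smallest = min(smallest, product)
        out.append(smallest)
    return out

# Three draws with no error found, lam = 1 and U = 10 (made-up values).
print(kaplan_markov_pvalues([0.0, 0.0, 0.0], lam=1, U=10))
```

Each error-free draw multiplies the P-value by \(1 - \lambda _s/U\), so the number of clean draws needed to reach a given risk limit grows as U grows.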

B Ballot-Polling Tests for an Overstatement Quota

In this section, we derive a ballot-polling test of the hypothesis that the margin (in votes) in a single stratum is less than or equal to a threshold c.

B.1 Wald’s SPRT with a Nuisance Parameter

Consider a single stratum s containing \(N_s\) ballots, of which \(N_{w,s}\) have a vote for w but not for \(\ell \), \(N_{\ell ,s}\) have a vote for \(\ell \) but not for w, and \(N_{u,s} = N_s - N_{w,s} - N_{\ell ,s}\) have votes for both w and \(\ell \) or neither w nor \(\ell \), including undervotes and invalid ballots. Ballots are drawn sequentially without replacement, with equal probability of selecting each as-yet-unselected ballot in each draw.

We want to test the compound hypothesis that \(N_{w,s} - N_{\ell ,s} \le c\) against the alternative that \(N_{w,s} = V_{w,s}\), \(N_{\ell ,s} = V_{\ell ,s}\), and \(N_{u,s} = V_{u,s}\), with \(V_{w,s} - V_{\ell ,s} > c\).

The values \(V_{w,s}\), \(V_{\ell ,s}\), and \(V_{u,s}\) are the reported results for stratum s (or values related to those reported results; see [5]). In this problem, \(N_{u,s}\) (equivalently, \(N_{w,s} + N_{\ell ,s}\)) is a nuisance parameter: we care about \( N_{w,s} - N_{\ell ,s}\).

Let \(X_k\) be w, \(\ell \), or u according to whether the ballot selected on the kth draw shows a vote for w but not \(\ell \), \(\ell \) but not w, or something else. Let \(W_n \equiv \sum _{k=1}^n 1_{X_k = w}\); and define \(L_n\) and \(U_n\) analogously.

The probability of a given data sequence \(X_1, \ldots , X_n\) under the alternative hypothesis is

$$ \frac{\prod _{i=0}^{W_n-1} (V_{w,s}-i) \; \prod _{i=0}^{L_n-1} (V_{\ell ,s}-i) \; \prod _{i=0}^{U_n-1} (V_{u,s}-i)}{\prod _{i=0}^{n-1} (N_s-i)}. $$
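
The sequence probability above is a ratio of falling factorials, which is easy to check on a tiny population. The sketch below is illustrative only: the function names and the 4-ballot stratum are hypothetical. It verifies that the probabilities of all length-2 sequences sum to 1.

```python
from fractions import Fraction
from itertools import product

def falling(a, k):
    """a * (a - 1) * ... * (a - k + 1); equals 1 when k == 0."""
    out = 1
    for i in range(k):
        out *= a - i
    return out

def sequence_probability(seq, counts):
    """Probability of drawing seq (symbols 'w', 'l', 'u') without replacement
    from a stratum holding counts['w'], counts['l'], counts['u'] ballots."""
    total = sum(counts.values())
    numerator = 1
    for kind in "wlu":
        numerator *= falling(counts[kind], seq.count(kind))
    return Fraction(numerator, falling(total, len(seq)))

# Hypothetical stratum: 2 ballots for w, 1 for l, 1 other.
counts = {"w": 2, "l": 1, "u": 1}
total_prob = sum(sequence_probability(s, counts) for s in product("wlu", repeat=2))
print(total_prob)  # 1
```

A sequence that needs more ballots of one kind than the stratum holds picks up a zero factor, so its probability is 0, as it should be.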

If \(L_n \ge W_n - cn/N_s\), the data obviously do not provide evidence against the null, so we suppose that \(L_n < W_n - cn/N_s\), in which case, the element of the null that will maximize the probability of the observed data has \(N_{w,s} - c = N_{\ell ,s}\). Under the null hypothesis, the probability of \(X_1, \ldots , X_n\) is

$$ \frac{ \prod _{i=0}^{W_n-1} (N_{w,s}-i) \; \prod _{i=0}^{L_n-1}(N_{w,s}-c - i) \; \prod _{i=0}^{U_n-1} (N_{u,s}-i)}{\prod _{i=0}^{n-1} (N_s-i)}, $$

for some value \(N_{w,s}\) and the corresponding \(N_{u,s} = N_s - 2N_{w,s}+c\). How large can that probability be under the null? The probability under the null is maximized by any integer \(x \in \{ \max (W_n, L_n+c), \ldots , (N_s-U_n)/2 \}\) that maximizes

$$ \prod _{i=0}^{W_n-1} (x-i) \; \prod _{i=0}^{L_n-1} (x-c-i) \; \prod _{i=0}^{U_n-1} (N_s-2x+c - i). $$

The logarithm is monotonic, so any maximizer \(x^*\) also maximizes

$$ f(x) = \sum _{i=0}^{W_n-1} \ln (x-i) + \sum _{i=0}^{L_n-1} \ln (x-c-i) + \sum _{i=0}^{U_n-1} \ln (N_s-2x+ c - i).$$

The first two terms on the right increase monotonically with x and the last term decreases monotonically with x. This yields bounds without having to evaluate f everywhere. Suppose \(y < z\). Then for all integer x between y and z,

$$ f(x) \le \sum _{i=0}^{W_n-1} \ln (z-i) + \sum _{i=0}^{L_n-1} \ln (z-c-i) + \sum _{i=0}^{U_n-1} \ln (N_s-2y+c-i).$$

The optimization problem can be solved using a branch and bound approach. For instance, start by evaluating

$$ f^+(x) \equiv \sum _{i=0}^{W_n-1} \ln (x-i) + \sum _{i=0}^{L_n-1} \ln (x-c-i) $$

and

$$ f^-(x) \equiv \sum _{i=0}^{U_n-1} \ln (N_s-2x+c-i) $$

at \(\max (W_n, L_n+c)\), \((N_s-U_n)/2\), and their midpoint, to get the values of \(f = f^+ + f^-\) at those three points, along with upper bounds on f on the ranges between them. At stage j, we have evaluated f, \(f^+\), and \(f^-\) at j points \(x_1< x_2< \ldots < x_j\), and we have upper bounds on f on the \(j-1\) ranges \(R_m = \{x_m, x_m+1, \ldots , x_{m+1}\}\) between those points. Let \(U_m\) be the upper bound on f(x) for \(x \in R_m\). Suppose that for some h, \(f(x_h) = \max _{m=1}^{j-1} U_m\). Then \(x^* = x_h\) is a global maximizer of f. If there is some \(U_m > \max _i f(x_i)\), then subdivide the range with the largest \(U_m\), calculate f, \(f^+\), and \(f^-\) at the new point, and repeat. This algorithm must terminate by identifying a global maximizer \(x^*\) after a finite number of steps.
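
The branch and bound search just described might be sketched as below. This is one possible reading under stated assumptions, not the reference implementation: the function names are hypothetical, ranges are bisected at their integer midpoint, and a range is bounded only over its unevaluated interior points (a range with no interior points needs no further work). The example takes \(c = 0\) and made-up sample counts.

```python
import bisect
import math

def make_f(W, L, U, N, c):
    """Return (f_plus, f_minus): the increasing and decreasing parts of f."""
    def f_plus(x):
        return (sum(math.log(x - i) for i in range(W))
                + sum(math.log(x - c - i) for i in range(L)))
    def f_minus(x):
        return sum(math.log(N - 2 * x + c - i) for i in range(U))
    return f_plus, f_minus

def branch_and_bound_max(f_plus, f_minus, lo, hi):
    """Maximize f_plus + f_minus over the integers lo..hi, assuming f_plus is
    nondecreasing and f_minus is nonincreasing. Returns (x_star, f(x_star))."""
    xs, fp, fm = [], {}, {}  # sorted evaluated points and cached values

    def evaluate(x):
        if x not in fp:
            fp[x], fm[x] = f_plus(x), f_minus(x)
            bisect.insort(xs, x)

    for x in (lo, hi, (lo + hi) // 2):
        evaluate(x)
    while True:
        best = max(xs, key=lambda x: fp[x] + fm[x])
        best_val = fp[best] + fm[best]
        # For a < x < b: f(x) <= f_plus(b) + f_minus(a).
        open_ranges = [(fp[b] + fm[a], a, b)
                       for a, b in zip(xs, xs[1:]) if b - a > 1]
        if not open_ranges:
            return best, best_val
        bound, a, b = max(open_ranges)
        if bound <= best_val:
            return best, best_val
        evaluate((a + b) // 2)  # subdivide the range with the largest bound

# Made-up sample: W_n = 12, L_n = 4, U_n = 4 from N_s = 100 ballots, c = 0.
f_plus, f_minus = make_f(12, 4, 4, 100, 0)
x_star, value = branch_and_bound_max(f_plus, f_minus, 12, 48)
print(x_star, value)
```

Every pass either stops or evaluates a new integer, so the loop ends after at most as many evaluations as there are integers in the search range.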

A conservative P-value for the null hypothesis after n items have been drawn is thus

$$ P_n = \frac{\prod _{i=0}^{W_n-1} (x^*-i) \; \prod _{i=0}^{L_n-1} (x^*-c-i) \; \prod _{i=0}^{U_n-1} (N_s-2x^*+c-i)}{\prod _{i=0}^{W_n-1}(V_{w,s}-i) \; \prod _{i=0}^{L_n-1} (V_{\ell ,s}-i) \; \prod _{i=0}^{U_n-1} (V_{u,s}-i)}. $$

Because the test is built on Wald’s SPRT, the sample can expand sequentially and (if the null hypothesis is true) the chance that \(P_n < p\) is never larger than p. That is, \(\Pr \{ \inf _n P_n < p \} \le p\) if the null is true.

A Jupyter notebook implementing this approach is given in https://github.com/pbstark/CORLA18.
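
Independently of the notebook cited above, the computation of \(P_n\) can be sketched in a few lines. The sketch below makes several assumptions that should be read as such: the function names are hypothetical; the maximization over x is done by brute force rather than branch and bound; the upper end of the search range is taken to be the largest integer x that keeps every factor positive, \(\lfloor (N_s - U_n + c)/2 \rfloor\), which coincides with \((N_s - U_n)/2\) when \(c = 0\); and data that are impossible under the alternative are given a P-value of 1.

```python
import math

def log_falling(a, k):
    """ln of a(a-1)...(a-k+1); -inf if any factor is <= 0."""
    if k > 0 and a - k + 1 <= 0:
        return -math.inf
    return sum(math.log(a - i) for i in range(k))

def ballot_polling_pvalue(W, L, U, N, c, V_w, V_l, V_u):
    """Conservative P-value for the null N_w - N_l <= c in one stratum.

    W, L, U: sample counts W_n, L_n, U_n; N: ballots in the stratum (N_s)
    c: integer threshold; V_w, V_l, V_u: reported counts (the alternative)
    """
    n = W + L + U
    if L * N >= W * N - c * n:  # L_n >= W_n - c n / N_s: no evidence vs. null
        return 1.0
    log_alt = log_falling(V_w, W) + log_falling(V_l, L) + log_falling(V_u, U)
    if log_alt == -math.inf:  # sample impossible under the alternative
        return 1.0
    lo = max(W, L + c)
    hi = (N - U + c) // 2  # assumption: largest x with all factors positive
    log_null = max(
        (log_falling(x, W) + log_falling(x - c, L)
         + log_falling(N - 2 * x + c, U) for x in range(lo, hi + 1)),
        default=-math.inf,
    )
    return min(1.0, math.exp(log_null - log_alt))

# Made-up stratum: 100 ballots, reported 60 / 30 / 10; sample of 20 ballots.
p = ballot_polling_pvalue(W=12, L=4, U=4, N=100, c=0, V_w=60, V_l=30, V_u=10)
print(p)
```

Working with sums of logarithms avoids overflow in the products, and the common denominator \(\prod _{i=0}^{n-1} (N_s - i)\) cancels in the ratio, so it never needs to be computed.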

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ottoboni, K., Stark, P.B., Lindeman, M., McBurnett, N. (2018). Risk-Limiting Audits by Stratified Union-Intersection Tests of Elections (SUITE). In: Krimmer, R., et al. Electronic Voting. E-Vote-ID 2018. Lecture Notes in Computer Science, vol 11143. Springer, Cham. https://doi.org/10.1007/978-3-030-00419-4_12

  • DOI: https://doi.org/10.1007/978-3-030-00419-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00418-7

  • Online ISBN: 978-3-030-00419-4

  • eBook Packages: Computer Science, Computer Science (R0)
