
Using past experience to optimize audit sampling design


Abstract

Optimal sampling designs for auditing, which minimize the mean squared error of the estimated misstatement amount, are proposed. They are derived from a general statistical model that describes the error process with the help of available auxiliary information. We show that, if the model is adequate, these optimal designs, based on balanced sampling with unequal inclusion probabilities, are more efficient than monetary unit sampling. We discuss how to implement the optimal designs in practice. Monte Carlo simulations based on audit data from the Swiss hospital billing system confirm the benefits of the proposed method.


References

  • AAMAS (2009) National health care billing audit guidelines. Technical report, American Association of Medical Audit Specialists

  • AICPA (2008) Audit guide: audit sampling. Technical report, American Institute of Certified Public Accountants

  • Bickel PJ (1992) Inference and auditing: the Stringer bound. Int Stat Rev 60(2):197–209

  • Chauvet G, Tillé Y (2005) Fast SAS macros for balancing samples: user’s guide. Software manual, University of Neuchâtel. http://www2.unine.ch/statistics/page10890.html

  • Deville J-C, Särndal C-E (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87:376–382

  • Deville J-C, Tillé Y (2004) Efficient balanced sampling: the cube method. Biometrika 91:893–912

  • Deville J-C, Tillé Y (2005) Variance approximation under balanced sampling. J Stat Plann Inference 128:569–591

  • Dickhaut JW, Eggleton IRC (1975) An examination of the processes underlying comparative judgements of numerical stimuli. J Account Res 13:38–72

  • Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD (1980) Case mix definition by diagnosis-related groups. Med Care 18:1–53

  • Gafford WW, Carmichael DR (1984) Materiality, audit risk and sampling—a nuts-and-bolts approach (part one). J Account 158(4):109–110

  • Gemayel NM, Stasny EA, Tackett JA, Wolfe DA (2011) Ranked set sampling: an auditing application. Rev Quant Finance Account 39:413–422

  • Grimlund RA, Felix D (1987) Simulation evidence and analysis of alternative methods of evaluating dollar-unit samples. Contemp Account Res 62(3):455–480

  • Hansen SC (1993) Strategic sampling, physical units sampling, and dollar units sampling. Account Rev 68(2):232–345

  • Higgins HN, Nandram B (2009) Monetary unit sampling: improving estimation of the total audit error. Adv Account 25(2):174–182

  • Hoogduin LA, Hall TW, Tsay JJ, Pierce BJ (2015) Does systematic selection lead to unreliable risk assessments in monetary-unit sampling applications? Audit J Pract Theory 34(4):85–107

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685

  • Hosmer DW, Lemeshow S (2013) Applied logistic regression. Wiley, New York

  • Kaplan RS (1973) Statistical sampling in auditing with auxiliary information estimators. J Account Res 11(2):238–258

  • Leitch RA, Neter J, Plante R, Sinha P (1981) Implementation of upper multinomial bound using clustering. J Am Stat Assoc 76(375):530–533

  • Leslie DA, Teitlebaum AD, Anderson RJ (1980) Dollar-unit sampling: a practical guide for auditors. Pitman, London

  • Madow WG (1949) On the theory of systematic sampling, II. Ann Math Stat 20:333–354

  • Nedyalkova D, Tillé Y (2008) Optimal sampling and estimation strategies under the linear model. Biometrika 95:521–537

  • Neter J, Loebbecke JK (1977) On the behavior of statistical estimators when sampling accounting populations. J Am Stat Assoc 72(359):501–507

  • Pea J, Qualité L, Tillé Y (2007) Systematic sampling is a minimal support design. Comput Stat Data Anal 51:5591–5602

  • Rosenberg MA, Fryback DG, Katz DA (2000) A statistical model to detect DRG upcoding. Health Serv Outcomes Res Methodol 1(3–4):233–252

  • SFAO (2014) Kontrolle von DRG-Spitalrechnungen durch die Krankenversicherungen [Auditing of DRG hospital bills by the health insurers]. Technical report EFK-14367, Swiss Federal Audit Office

  • Smieliauskas W (1986) Control of sampling risks in auditing. Contemp Account Res 3(1):102–124

  • Statistics Canada (2010) Survey methods and practices, Catalogue no. 12-587-X. Technical report, Statistics Canada

  • Stringer KW (1963) Practical aspects of statistical sampling in auditing. In: ASA proceedings of the business and economic statistics section. American Statistical Association, pp 405–411

  • Tillé Y (2006) Sampling algorithms. Springer, New York

  • Tillé Y, Matei A (2012) sampling: survey sampling. R package version 2.5

  • Tsui KW, Matsumura EM, Tsui KL (1985) Multinomial-Dirichlet bounds for dollar-unit sampling in auditing. Account Rev 60(1):76–97


Acknowledgments

We are grateful to Sandro Prosperi and Giordano Macchi for their helpful suggestions.

Author information


Correspondence to Alfio Marazzi.

Appendix

Proof of equations (3)–(6)

We have

$$\begin{aligned} y_k&\,=\,x_k\exp \left( j_k {\mathbf{z}}_k^{\rm T} \beta \right) \exp (j_k\varepsilon _k) \\&\,=\,x_k - x_k j_k + j_k x_k \exp \left( {\mathbf{z}}_k^{\rm T} \beta \right) \exp (\varepsilon _k) \\&\,=\,x_k - x_k j_k \left\{ 1-\exp \left( {\mathbf{z}}_k^{\rm T} \beta \right) \exp (\varepsilon _k) \right\}, \end{aligned}$$

and since \(\varepsilon _k\) has a normal distribution,

$$\begin{aligned} \mathrm{E}_M \left[ \exp ({\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k)\right]&\,=\,\exp \left( {\mathbf{z}}_k^{\rm T} \beta + \sigma ^2/2\right) , \\ \mathrm{Var}_M \left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right]&\,=\,\left[ \exp (\sigma ^2)-1\right] \exp \left( 2{\mathbf{z}}_k^{\rm T}\beta + \sigma ^2\right). \end{aligned}$$

Therefore,

$$\mathrm{E}_M \left\{ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right\} = \psi _k\mathrm{E}_M \left[ \exp \left( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k\right) |j_k=1 \right] + (1-\psi _k)=\psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -1\right] +1$$

and

$$\begin{aligned}&\mathrm{Var}_M \left\{ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right\} \\&\quad = \psi _k \mathrm{Var}_M \left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k\right) \right] |j_k=1\right] + (1-\psi _k)\mathrm{Var}_M \left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] |j_k=0\right] \\&\qquad + \psi _k\left\{ \mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] |j_k=1\right] -\mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right] \right\} ^2 \\&\qquad + (1-\psi _k) \left\{ \mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] |j_k=0\right] -\mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right] \right\} ^2 \\&\quad = \psi _k \mathrm{Var}_M\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k\right) |j_k=1\right] +0 \\&\qquad + \psi _k\left\{ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -\psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2\right) -1\right] -1\right\} ^2 \\&\qquad + (1-\psi _k)\left\{ 1-\psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2\right) -1\right] -1\right\} ^2 \\&\quad = \psi _k \left[ \exp \left( \sigma ^2\right) -1\right] \exp \left( 2{\mathbf{z}}_k^{\rm T}\beta +\sigma ^2\right) \\&\qquad + \psi _k\left\{ (1-\psi _k)\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -1\right] \right\} ^2+(1-\psi _k)\left\{ \psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -1\right] \right\} ^2 \\&\quad = \psi _k[\exp (\sigma ^2)-1]\exp \left( 2{\mathbf{z}}_k^{\rm T}\beta +\sigma ^2\right) + \psi _k(1-\psi _k)\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2\right) -1\right] ^2. \end{aligned}$$

It follows that

$$\begin{aligned} \mathrm{E}_M (y_k)&\,=\,x_k\left\{ \psi _k[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]+1\right\} , \\ \mathrm{Var}_M (y_k)&\,=\,x_k^2\left\{ \psi _k[\exp (\sigma ^2)-1]\exp (2{\mathbf{z}}_k^{\rm T}\beta +\sigma ^2) +\psi _k(1-\psi _{k})[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]^2\right\}. \end{aligned}$$

Using \(a_k=\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)\), we have

$$\mathrm{E}_M(y_k)=x_k\psi _k(a_k-1)+x_k=x_k\psi _{k}a_k-x_k\psi _k+x_k.$$

Finally,

$$\begin{aligned} r_{k}&\,=\,y_k-\mathrm{E}_M(y_k)=y_k-x_k\psi _k a_k+x_k\psi _k-x_k \\&\,=\,x_k - x_k j_k\left\{ 1-\exp ({\mathbf{z}}_k^{\rm T}\beta )\exp (\varepsilon _k)\right\} - x_k \left\{ \psi _k[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]+1\right\} \\&\,=\,-x_k j_k\left\{ 1-\exp ({\mathbf{z}}_k^{\rm T}\beta )\exp (\varepsilon _k)\right\} - x_k\psi _k[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1] \\&\,=\,x_k\exp ({\mathbf{z}}_k^{\rm T}\beta )\left[ j_k\exp (\varepsilon _k) -\psi _k\exp (\sigma ^2/2)\right] -x_k(j_k-\psi _k). \end{aligned}$$
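These moment formulas are easy to check numerically. Below is a minimal sketch in R; the values chosen for \(x_k\), \(\psi _k\), \({\mathbf{z}}_k^{\rm T}\beta\) and \(\sigma\) are arbitrary illustrations, not values from the paper.

## Numerical check of E_M(y_k) and Var_M(y_k) under model M2;
## all inputs below are arbitrary illustrative values.
set.seed(1)
R     <- 1e6                          # Monte Carlo replicates
xk    <- 100                          # recorded amount x_k
psik  <- 0.3                          # error probability psi_k
zb    <- -0.2                         # linear predictor z_k^T beta
sigma <- 0.1                          # standard deviation of epsilon_k

jk  <- rbinom(R, 1, psik)             # error indicators j_k
eps <- rnorm(R, 0, sigma)             # log-errors epsilon_k
yk  <- xk * exp(jk * (zb + eps))      # audited values under M2

ak <- exp(zb + sigma^2 / 2)
c(mean(yk), xk * psik * (ak - 1) + xk)                    # E_M(y_k) vs formula
c(var(yk),
  xk^2 * (psik * (exp(sigma^2) - 1) * exp(2 * zb + sigma^2) +
          psik * (1 - psik) * (ak - 1)^2))                # Var_M(y_k) vs formula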

Lemma 1

Under model M1, we have

$$\begin{aligned} \mathrm{MSE}({\hat{D}}_1)&\,=\,\mathrm{E}_p \left( \sum _{k\in S}\frac{x_k}{\pi _k}-\sum _{k\in U}x_k +\sum _{k\in S}\frac{x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k}-\sum _{k\in U}x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta \right) ^2 \\&\quad+\sum _{k\in U} \frac{1-\pi _k}{\pi _k} x_k^2 \psi _k \left[ \sigma ^2+ ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) \right] . \end{aligned}$$

Proof

We use the following general result. If

$$y_k = {\mathbf{z}}_k^{\rm T}\beta + u_k,$$

with \(\mathrm{E}_M(u_k)=0\), \(\mathrm{Var}_M(u_k)=\sigma _{uk}^2\), \(\mathrm{Cov}_M(u_k,u_\ell )=0\), then

$$\mathrm{E}_p\mathrm{E}_M\left( \sum _{k\in S}\frac{y_k}{\pi _k}-\sum _{k\in U}y_k\right) ^2 = \mathrm{E}_p\left( \sum _{k\in S}\frac{{\mathbf{z}}_k^{\rm T}\beta }{\pi _k}-\sum _{k\in U}{\mathbf{z}}_k^{\rm T}\beta \right) ^2 + \sum _{k\in U}\frac{1-\pi _k}{\pi _k}\sigma _{uk}^2.$$
(11)
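As a numerical sanity check of (11), the following sketch compares its two sides under simple random sampling without replacement, so that \(\pi _k=n/N\); the population size, model parameters and numbers of replicates are arbitrary illustrative choices.

## Numerical check of (11) under simple random sampling without
## replacement; all inputs are arbitrary illustrative values.
set.seed(2)
N   <- 200; n <- 20
mu  <- rnorm(N, 5, 1)                 # fixed parts z_k^T beta
sdu <- runif(N, 0.5, 1.5)             # residual standard deviations sigma_uk
pik <- rep(n / N, N)                  # inclusion probabilities under SRSWOR

R   <- 2e4
lhs <- replicate(R, {                 # joint draw from model and design
  y <- mu + rnorm(N, 0, sdu)
  s <- sample(N, n)
  sum(y[s] / pik[s]) - sum(y)
})
rhs <- replicate(R, {                 # design-only draw for the first term
  s <- sample(N, n)
  sum(mu[s] / pik[s]) - sum(mu)
})
mean(lhs^2)                                 # left-hand side of (11)
mean(rhs^2) + sum((1 - pik) / pik * sdu^2)  # right-hand side of (11)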

A proof can be found, for instance, in Nedyalkova and Tillé (2008). Now, under model M1, we have

$$\begin{aligned} \hat{D}_1 - D&\,=\,\hat{Y} - Y \nonumber \\&\,=\,\sum _{k\in S} \frac{x_k}{\pi _k} - \sum _{k\in U}x_k + \sum _{k\in S} \frac{x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k )}{\pi _k} - \sum _{k\in U}x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k ) \nonumber \\&\,=\,\sum _{k\in S} \frac{x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta )}{\pi _k} - \sum _{k\in U}x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta ) \nonumber \\&\quad+\sum _{k\in S} \frac{x_k[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]}{\pi _k} - \sum _{k\in U}x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ] . \end{aligned}$$
(12)

Since \(\mathrm{E}_M\{x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k]\} =0\),

$$\begin{aligned}&\mathrm{Var}_M \{ x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k] \}\\&\quad = x_k^2 \mathrm{E}_M[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]^2\\&\quad = x_k^2 \mathrm{E}_M [ (j_k-\psi _k)^2({\mathbf{z}}_k^{\rm T} \beta )^2 + j_k^2\varepsilon _k^2+2(j_k-\psi _k)j_k{\mathbf{z}}_k^{\rm T} \beta \,\varepsilon _k]\\&\quad =x_k^2 \psi _k\left[ (1-\psi _k)({\mathbf{z}}_k^{\rm T} \beta )^2+\sigma ^2 \right], \end{aligned}$$

since \(\mathrm{E}_M(j_k-\psi _k)^2=\psi _k(1-\psi _k)\), \(\mathrm{E}_M(j_k^2\varepsilon _k^2)=\psi _k\sigma ^2\), and the cross term has zero expectation because \(\varepsilon _k\) is independent of \(j_k\) with mean zero.

We now put \(u_k = x_k[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k]\), and apply (11) to (12). We obtain

$$\begin{aligned}&\mathrm{E}_p\mathrm{E}_M\left( \hat{D}_1 - D \right) ^2\nonumber \\&\quad =\mathrm{E}_p\left( \sum _{k\in S}\frac{x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta )}{\pi _k} - \sum _{k\in U}x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta ) \right) ^2 \nonumber \\&\qquad + \sum _{k\in U}\frac{1-\pi _k}{\pi _k}x_k^2 \psi _k\left[ \sigma ^2+(1-\psi _k) ({\mathbf{z}}_k^{\rm T} \beta )^2 \right] . \end{aligned}$$
(13)

Lemma 2

Under model M1, we have

$$\begin{aligned} \mathrm{MSE}({\hat{D}}_2)&\,=\,\mathrm{E}_p \left( \sum _{k\in S} \frac{x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k}-\sum _{k\in U}x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta \right) ^2 \\&\quad+\sum _{k\in U} \frac{1-\pi _k}{\pi _k}x_k^2 \psi _k\left[ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2(1-\psi _k) \right] . \end{aligned}$$
(14)

Proof

We have

$$\begin{aligned}&\hat{D}_2 - D\\&\quad =\sum _{k\in S} \frac{x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k )}{\pi _k} - \sum _{k\in U}x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k) \nonumber \\&\quad =\sum _{k\in S} \frac{x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k} - \sum _{k\in U}x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta \nonumber \\&\qquad +\sum _{k\in S} \frac{x_k[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]}{\pi _k} - \sum _{k\in U}x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]. \end{aligned}$$
(15)

We reuse the variance of \(u_k\) computed in the proof of Lemma 1, and apply (11) to (15). We obtain

$$\begin{aligned}&\mathrm{E}_p\mathrm{E}_M\left( \hat{D}_2 - D \right) ^2 \\&\quad = \mathrm{E}_p\left( \sum _{k\in S} \frac{x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k} - \sum _{k\in U}x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta \right) ^2 + \sum _{k\in U}\frac{1-\pi _k}{\pi _k}x_k^2 \psi _k\left[ \sigma ^2+(1-\psi _k)({\mathbf{z}}_k^{\rm T} \beta )^2 \right] . \end{aligned}$$

Proof of Result 1

In order to minimize (13), we can first select a sample that is balanced on \(x_k\) and \(x_k\psi _k {\mathbf{z}}_k\), i.e. a sample such that

$$\begin{aligned} \sum _{k\in S} \frac{x_k}{\pi _k} &\,=\, \sum _{k\in U}x_k,\\ \sum _{k\in S} \frac{x_k\psi _k {\mathbf{z}}_k}{\pi _k} &\,=\, \sum _{k\in U}x_k\psi _k {\mathbf{z}}_k. \end{aligned}$$

In this case, the first term of (13) vanishes. Now, if we minimize the second term with respect to \(\pi _k\)

$$\sum _{k\in U}\frac{1-\pi _k}{\pi _k}x_k^2 \psi _k\left[ \sigma ^2+(1-\psi _k)({\mathbf{z}}_k^{\rm T} \beta )^2 \right]$$

subject to

$$\sum _{k\in U}\pi _k=n,$$

we find, using the Lagrange technique (writing \(c_k\) for the coefficient of \(1/\pi _k\) in the objective, stationarity requires \(-c_k/\pi _k^2+\lambda =0\), so that \(\pi _k\propto \sqrt{c_k}\)),

$$\pi _k \propto \sqrt{x_k^2 \psi _k\left[ \sigma ^2+(1-\psi _k)({\mathbf{z}}_k^{\rm T} \beta )^2 \right] }.$$
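In practice the \(\pi _k\) must be rescaled to sum to n and capped at 1. A minimal sketch, using the inclusionprobabilities function of the R sampling package (Tillé and Matei 2012); all numeric inputs are arbitrary illustrations, and in practice \(\psi _k\), \(\beta\) and \(\sigma\) would be estimated from past audits.

## Sketch: optimal inclusion probabilities of Result 1;
## x, psi, zb and sigma below are arbitrary illustrative inputs.
library(sampling)                     # Tillé and Matei (2012)
set.seed(3)
N <- 1000; n <- 100
x     <- rlnorm(N, 0.7, 0.4)          # recorded amounts
psi   <- runif(N, 0.05, 0.6)          # error probabilities
zb    <- rnorm(N, 0.1, 0.15)          # linear predictors z_k^T beta
sigma <- 0.1

ck  <- x^2 * psi * (sigma^2 + (1 - psi) * zb^2)  # coefficient of 1/pi_k
pik <- inclusionprobabilities(sqrt(ck), n)       # prop. to sqrt(c_k), capped at 1
sum(pik)                                         # equals n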

Lemma 3

Under model M2, we have

$$\begin{aligned}&\mathrm{MSE}({\hat{D}}_1)\\&\quad = \mathrm{E}_p \left( \sum _{k\in S}\frac{x_k \psi _k a_k}{\pi _k}-\sum _{k\in U}x_k \psi _k a_k - \sum _{k\in S}\frac{ x_k \psi _k}{\pi _k}+\sum _{k\in U} x_k \psi _k + \sum _{k\in S}\frac{x_k}{\pi _k}-\sum _{k\in U}x_k\right) ^2 \\&\qquad +\sum _{k\in U} \frac{1-\pi _k}{\pi _k} x_k^2 v_k^2, \end{aligned}$$

where \(a_k\) and \(v_k\) are defined in (5)-(6).

Lemma 4

Under model M2, we have

$$\begin{aligned} \mathrm{MSE}({\hat{D}}_2)&\,=\,\mathrm{E}_p\left( \sum _{k\in S}\frac{x_k \psi _k a_k}{\pi _k}-\sum _{k\in U}x_k \psi _k a_k - \sum _{k\in S}\frac{ x_k \psi _k}{\pi _k}+\sum _{k\in U} x_k \psi _k\right) ^2 \\&\quad+\sum _{k\in U} \frac{1-\pi _k}{\pi _k} x_k^2 v_k^2. \end{aligned}$$

Proofs of Lemmas 3 and 4, and of Result 2

The proofs are the same as the proofs of Lemmas 1 and 2, and of Result 1, once we notice that M2 can be written as a linear heteroscedastic model:

$$\begin{aligned} y_k&\,=\,x_k \psi _k a_k- x_k \psi _k+x_k + r_k,\\ \mathrm{E}_M(r_k)&\,=\,0,\quad \mathrm{Var}_M(r_k)= v_k^2, \end{aligned}$$

where \(r_k\) is defined in (4), \(a_k\) is defined in (5), and \(v_k\) is defined in (6).

Lemma 5

$$\mathrm{MSE}({\hat{J}})=\mathrm{E}_p \left( \sum _{k\in S}\frac{\psi _k}{\pi _k}-\sum _{k\in U}\psi _k\right) ^2+\sum _{k\in U} \psi _k(1-\psi _k)\frac{1-\pi _k}{\pi _k}.$$

Proof

We only have to note that

$$\begin{aligned}&\hat{J} - J \\&\quad = \sum _{k\in S} \frac{j_k }{\pi _k} - \sum _{k\in U}j_k \\&\quad = \sum _{k\in S} \frac{\psi _k }{\pi _k} - \sum _{k\in U}\psi _k + \sum _{k\in S} \frac{j_k-\psi _k }{\pi _k} - \sum _{k\in U}(j_k-\psi _k) . \end{aligned}$$

Since \(\mathrm{E}_M(j_k-\psi _k)=0\) and \(\mathrm{Var}_M(j_k-\psi _k)=\psi _k(1-\psi _k)\), we can proceed as in the proof of Lemma 1. \(\square\)

Proof of Result 3

The proof is the same as for Result 1.
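Result 3 itself is stated in the body of the paper and not reproduced here; applying the same Lagrange argument to the second term of the MSE in Lemma 5 gives \(\pi _k\propto \sqrt{\psi _k(1-\psi _k)}\). A minimal sketch with arbitrary illustrative \(\psi _k\):

## Sketch: inclusion probabilities minimizing the second term of Lemma 5;
## the psi_k below are arbitrary illustrative values.
library(sampling)
set.seed(4)
psi  <- runif(1000, 0.05, 0.6)
pikJ <- inclusionprobabilities(sqrt(psi * (1 - psi)), 100)
sum(pikJ)                             # equals the sample size 100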

A Monte Carlo simulation

A population of size \(N=1000\) was generated according to models M0 and M1 as follows (a code sketch reproducing this setup is given after the list):

  • the \(x_k\) are independent lognormal variables with parameters \(\mu =0.7\) and \(\sigma =0.4\),

  • \({\mathbf{v}}_k=(1,x_k,b_k)^{\rm T}\), \(\gamma =(-1,0.007,1.0)^{\rm T}\),

  • the \(b_k\) are independent Bernoulli variables such that \(\mathrm{Pr}(b_k=1)=0.8\),

  • \({\mathbf{z}}_k = (1,z_k)^{\rm T}\), \(\beta =(0,0.5 )^{\rm T}\),

  • the \(z_k\) are independent normal variables such that \(z_k\sim N(0.2,0.3^2),\)

  • the \(\varepsilon _k\) are independent normal variables such that \(\varepsilon _k\sim N(0,0.1^2)\).
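The following sketch reproduces this setup in R. Model M1 is taken in the form \(y_k=x_k\{1+j_k({\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k)\}\), as read off (12); the seed and the sign convention \(D=X-Y\) (recorded total minus audited total) are assumptions of the sketch, so the realized J and D will not exactly match the values reported below.

## Population generated according to models M0 and M1, following the
## settings listed above; the seed is arbitrary (not from the paper).
set.seed(123)
N   <- 1000
x   <- rlnorm(N, meanlog = 0.7, sdlog = 0.4)   # recorded amounts x_k
b   <- rbinom(N, 1, 0.8)                       # group indicators b_k
z   <- rnorm(N, 0.2, 0.3)                      # covariates z_k
eps <- rnorm(N, 0, 0.1)                        # errors epsilon_k

gamma <- c(-1, 0.007, 1.0)                     # logistic coefficients
eta   <- gamma[1] + gamma[2] * x + gamma[3] * b
psi   <- exp(eta) / (1 + exp(eta))             # error probabilities (model M0)
j     <- rbinom(N, 1, psi)                     # error indicators j_k

beta <- c(0, 0.5)
zb   <- beta[1] + beta[2] * z                  # z_k^T beta
y    <- x * (1 + j * (zb + eps))               # audited values (model M1)

J <- sum(j)                                    # number of incorrect transactions
D <- sum(x - y)                                # misstatement (sign convention assumed)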

Figure 3 shows the generated population as well as the error probabilities \(\psi _k =\exp ({\mathbf{v}}_k^{\rm T} \gamma )/(1+\exp ({\mathbf{v}}_k^{\rm T} \gamma ))\) as a function of \(x_k\) and \(b_k\). There are \(J=292\) incorrect transactions and \(D=-2313.277\). There are more errors on small transactions than on large ones. The variables \(b_k\) define two groups with different error levels, and the size of the errors depends on \(z_k\). We compared the following strategies:

  • cSRS: Simple random sampling without replacement, calibrated on x,

  • OPTD: Optimal design for D, balanced and calibrated on x and \(x\psi {\mathbf{z}}\),

  • OPTJ: Optimal design for J, balanced on \(\psi\), calibrated on x and \(\psi\),

  • MUS: monetary unit sampling, naturally balanced on x,

  • iMUS: improved MUS balanced on \(x\psi\) and \(x\psi {\mathbf{z}}\), calibrated on x, \(x\psi\) and \(x\psi {\mathbf{z}}.\)

OPTD and OPTJ were based on Results 1 and 3. Calibration was computed with the raking-ratio technique (Deville and Särndal 1992). We selected 10,000 samples of size 100. Table 7 reports the empirical \(\mathrm{MSE}\) (mse) of \({\hat{D}}={\hat{D}}_1={\hat{D}}_2\) and \({\hat{J}}\). The simulation confirms the theory: OPTD is the best strategy for estimating D and OPTJ is the best strategy for estimating J. MUS is a usable tool for estimating D, but it can be improved by balancing and calibrating. A sketch of one OPTD replicate follows.
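The sketch below reuses the population objects (x, psi, z, zb, y, N) from the previous sketch and the samplecube, inclusionprobabilities and calib functions of the R sampling package (Tillé and Matei 2012). Treating \(\psi _k\), \(\beta\) and \(\sigma\) as known is a simplification; in practice they are estimated from past audits.

## One OPTD replicate: cube-method balanced sampling with the optimal
## pi_k, followed by raking-ratio calibration; reuses x, psi, z, zb, y
## from the previous sketch.
library(sampling)
n     <- 100
sigma <- 0.1

ck  <- x^2 * psi * (sigma^2 + (1 - psi) * zb^2)
pik <- inclusionprobabilities(sqrt(ck), n)     # optimal pi_k (Result 1)

## Balancing variables: pik (to fix the sample size), x_k and the
## components of x_k psi_k z_k, with z_k = (1, z_k)^T.
Xbal <- cbind(pik, x, x * psi, x * psi * z)
s    <- samplecube(Xbal, pik, comment = FALSE) == 1

## Raking-ratio calibration (Deville and Särndal 1992) on the same totals.
Xs <- cbind(x, x * psi, x * psi * z)[s, ]
g  <- calib(Xs, d = 1 / pik[s],
            total = c(sum(x), sum(x * psi), sum(x * psi * z)),
            method = "raking")
w  <- g / pik[s]                               # calibrated weights

D_hat <- sum(w * (x[s] - y[s]))                # calibrated estimator of D

Averaging \((\hat{D}-D)^2\) over many such replicates gives the empirical MSE reported in Table 7.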

Fig. 3 Left panel: simulated population of transactions generated according to models M0 and M1; x is the recorded value and y the audited value, and the points on the diagonal are the correct transactions. Right panel: error probabilities \(\psi _k\) against \(x_k\), for \(b_k=1\) and \(b_k=0\)

Table 7 Monte Carlo simulation: empirical MSE over the 10,000 samples of \({\hat{D}}\) (standardized w.r.t. MUS) and \({\hat{J}}\) (standardized w.r.t. SRS) for the various strategies


Marazzi, A., Tillé, Y. Using past experience to optimize audit sampling design. Rev Quant Finan Acc 49, 435–462 (2017). https://doi.org/10.1007/s11156-016-0596-7
