Using past experience to optimize audit sampling design

Marazzi, Alfio; Tillé, Yves

doi:10.1007/s11156-016-0596-7

Original Research
Published: 31 August 2016

Volume 49, pages 435–462, (2017)

Review of Quantitative Finance and Accounting

Alfio Marazzi^1,2 &
Yves Tillé³

Abstract

Optimal sampling designs for audit, minimizing the mean squared error of the estimated amount of the misstatement, are proposed. They are derived from a general statistical model that describes the error process with the help of available auxiliary information. We show that, if the model is adequate, these optimal designs based on balanced sampling with unequal probabilities are more efficient than monetary unit sampling. We discuss how to implement the optimal designs in practice. Monte Carlo simulations based on audit data from the Swiss hospital billing system confirm the benefits of the proposed method.

References

AAMAS (2009) National health care billing audit guidelines. Technical report, The American Association of Medical Audit Specialists
AICPA (2008) Audit guide: audit sampling. Technical report. American Institute of Certified Public Accountants
Bickel PJ (1992) Inference and auditing: the Stringer bound. Int Stat Rev 60(2):197–209
Chauvet G, Tillé Y (2005) Fast SAS macros for balancing samples: user’s guide. Software Manual, University of Neuchâtel. http://www2.unine.ch/statistics/page10890.html
Deville J-C, Särndal C-E (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87:376–382
Deville J-C, Tillé Y (2004) Efficient balanced sampling: the cube method. Biometrika 91:893–912
Deville J-C, Tillé Y (2005) Variance approximation under balanced sampling. J Stat Plann Inference 128:569–591
Dickhaut JW, Eggleton IRC (1975) An examination of the processes underlying comparative judgements of numerical stimuli. J Account Res 13:38–72
Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD (1980) Case mix definition by diagnosis-related groups. Med Care 18:1–53
Gafford WW, Carmichael DR (1984) Materiality, audit risk and sampling—a nuts-and-bolts approach (part one). J Account 158(4):109–110
Gemayel NM, Stasny EA, Tackett JA, Wolfe DA (2011) Ranked set sampling: an auditing application. Rev Quant Finance Account 39:413–422
Grimlund RA, Felix D (1987) Simulation evidence and analysis of alternative methods of evaluating dollar-unit samples. Contemp Account Res 62(3):455–480
Hansen SC (1993) Strategic sampling, physical units sampling, and dollar units sampling. Account Rev 68(2):232–345
Higgins HN, Nandram B (2009) Monetary unit sampling: improving estimation of the total audit error. Adv Account 25(2):174–182
Hoogduin LA, Hall TW, Tsay JJ, Pierce BJ (2015) Does systematic selection lead to unreliable risk assessments in monetary-unit sampling applications? Audit J Pract Theory 34(4):85–107
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hosmer DW, Lemeshow S (2013) Applied logistic regression. Wiley, New York
Kaplan RS (1973) Statistical sampling in auditing with auxiliary information estimators. J Account Res 11(2):238–258
Leitch RA, Neter J, Plante R, Sinha P (1981) Implementation of upper multinomial bound using clustering. J Am Stat Assoc 76(375):530–533
Leslie DA, Teitlebaum AD, Anderson RJ (1980) Dollar-unit sampling: a practical guide for auditors. Pitman, London
Madow WG (1949) On the theory of systematic sampling, II. Ann Math Stat 20:333–354
Nedyalkova D, Tillé Y (2008) Optimal sampling and estimation strategies under linear model. Biometrika 95:521–537
Neter J, Loebbecke JK (1977) On the behavior of statistical estimators when sampling accounting populations. J Am Stat Assoc 72(359):501–507
Pea J, Qualité L, Tillé Y (2007) Systematic sampling is a minimal support design. Comput Stat Data Anal 51:5591–5602
Rosenberg MA, Fryback DG, Katz DA (2000) A statistical model to detect DRG upcoding. Health Serv Outcomes Res Methodol 1(3–4):233–252
SFAO (2014) Kontrolle von DRG-Spitalrechnungen durch die Krankenversicherungen. Technical Report EFK-14367, Swiss Federal Audit Office
Smieliauskas W (1986) Control of sampling risks in auditing. Contemp Account Res 3(1):102–124
Statistics Canada (2010) Survey methods and practices, catalogue no. 12-587-x. Technical report. Statistics Canada
Stringer KW (1963) Practical aspects of statistical sampling in auditing. In: ASA proceedings of the business and economic statistics section. American Statistical Association, pp 405–411
Tillé Y (2006) Sampling algorithms. Springer, New York
Tillé Y, Matei A (2012) Sampling: survey sampling. R package version 2.5
Tsui KW, Matsumura EM, Tsui KL (1985) Multinomial-Dirichlet bounds for dollar-unit sampling in auditing. Account Rev 60(1):76–97

Acknowledgments

We are grateful to Sandro Prosperi and Giordano Macchi for their helpful suggestions.

Author information

Authors and Affiliations

Institute of Social and Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
Alfio Marazzi
Nice Computing SA, Ch. de Maillefer 37, 1052, Le Mont/Lausanne, Switzerland
Alfio Marazzi
Institute of Statistics, University of Neuchâtel, Avenue de Bellevaux, 51, 2000, Neuchâtel, Switzerland
Yves Tillé

Corresponding author

Correspondence to Alfio Marazzi.

Appendix

Proof of equations (3)–(6)

We have

$$\begin{aligned} y_k&\,=\,x_k\exp \left( j_k {\mathbf{z}}_k^{\rm T} \beta \right) \exp (j_k\varepsilon _k) \\&\,=\,x_k - x_k j_k + j_k x_k \exp \left( {\mathbf{z}}_k^{\rm T} \beta \right) \exp (\varepsilon _k) \\&\,=\,x_k - x_k j_k \left\{ 1-\exp \left( {\mathbf{z}}_k^{\rm T} \beta \right) \exp (\varepsilon _k) \right\}, \end{aligned}$$

and since $\varepsilon _k$ has a normal distribution,

$$\begin{aligned} \mathrm{E}_M \left[ \exp ({\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k)\right]&\,=\,\exp \left( {\mathbf{z}}_k^{\rm T} \beta + \sigma ^2/2\right) , \\ \mathrm{Var}_M \left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right]&\,=\,\left[ \exp (\sigma ^2)-1\right] \exp \left( 2{\mathbf{z}}_k^{\rm T}\beta + \sigma ^2\right). \end{aligned}$$

Therefore,

$$\mathrm{E}_M \left\{ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right\} = \psi _k\mathrm{E}_M \left[ \exp \left( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k\right) |j_k=1 \right] + (1-\psi _k)=\psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -1\right] +1$$

and

$$\begin{aligned}&\mathrm{Var}_M \left\{ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right\} \\&\quad = \psi _k \mathrm{Var}_M \left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k\right) \right] |j_k=1\right] + (1-\psi _k)\mathrm{Var}_M \left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] |j_k=0\right] \\&\qquad + \psi _k\left\{ \mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] |j_k=1\right] -\mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right] \right\} ^2 \\&\qquad + (1-\psi _k) \left\{ \mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] |j_k=0\right] -\mathrm{E}_M\left[ \exp \left[ j_k\left( {\mathbf{z}}_k^{\rm T}\beta + \varepsilon _k\right) \right] \right] \right\} ^2 \\&\quad = \psi _k \mathrm{Var}_M\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k\right) |j_k=1\right] +0 \\&\qquad + \psi _k\left\{ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -\psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2\right) -1\right] -1\right\} ^2 \\&\qquad + (1-\psi _k)\left\{ 1-\psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2\right) -1\right] -1\right\} ^2 \\&\quad = \psi _k \left[ \exp \left( \sigma ^2\right) -1\right] \exp \left( 2{\mathbf{z}}_k^{\rm T}\beta +\sigma ^2\right) \\&\qquad + \psi _k\left\{ (1-\psi _k)\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -1\right] \right\} ^2+(1-\psi _k)\left\{ \psi _k\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta + \sigma ^2/2\right) -1\right] \right\} ^2 \\&\quad = \psi _k[\exp (\sigma ^2)-1]\exp \left( 2{\mathbf{z}}_k^{\rm T}\beta +\sigma ^2\right) + \psi _k(1-\psi _k)\left[ \exp \left( {\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2\right) -1\right] ^2. \end{aligned}$$

It follows that

$$\begin{aligned} \mathrm{E}_M (y_k)&\,=\,x_k\left\{ \psi _k[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]+1\right\} , \\ \mathrm{Var}_M (y_k)&\,=\,x_k^2\left\{ \psi _k[\exp (\sigma ^2)-1]\exp (2{\mathbf{z}}_k^{\rm T}\beta +\sigma ^2) +\psi _k(1-\psi _{k})[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]^2\right\}. \end{aligned}$$

Using $a_k=\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)$, we have

$$\mathrm{E}_M(y_k)=x_k\psi _k(a_k-1)+x_k=x_k\psi _{k}a_k-x_k\psi _k+x_k.$$

Finally,

$$\begin{aligned} r_{k}&\,=\,y_k-\mathrm{E}_M(y_k)=y_k-x_k\psi _ka_k+x_k\psi _k-x_k \\&\,=\,x_k - x_k j_k\left\{ 1-\exp ({\mathbf{z}}_k^{\rm T}\beta )\exp (\varepsilon _k)\right\} - x_k \left\{ \psi _k[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]+1\right\} \\&\,=\,-x_k j_k\left\{ 1-\exp ({\mathbf{z}}_k^{\rm T}\beta )\exp (\varepsilon _k)\right\} - x_k\left\{ \psi _k[\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)-1]\right\} \\&\,=\,x_k\exp ({\mathbf{z}}_k^{\rm T}\beta )\left[ j_k\exp (\varepsilon _k) -\psi _k\exp (\sigma ^2/2)\right] -x_k(j_k-\psi _k). \end{aligned}$$
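The closed-form moments of $y_k$ derived above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that $j_k$ is Bernoulli with $\mathrm{Pr}(j_k=1)=\psi_k$ and independent of $\varepsilon_k\sim N(0,\sigma^2)$, and the argument `zb` stands for the scalar ${\mathbf{z}}_k^{\rm T}\beta$. The function names and the parameter values in the demonstration are hypothetical.

```python
import math
import random


def m2_moments(x, psi, zb, sigma):
    """Closed-form E_M(y_k) and Var_M(y_k); zb stands for z_k^T beta."""
    a = math.exp(zb + sigma**2 / 2)
    mean = x * (psi * (a - 1) + 1)
    var = x**2 * (
        psi * (math.exp(sigma**2) - 1) * math.exp(2 * zb + sigma**2)
        + psi * (1 - psi) * (a - 1) ** 2
    )
    return mean, var


def simulate_m2(x, psi, zb, sigma, reps, rng):
    """Empirical mean and variance of y = x * exp(j * (zb + eps))."""
    draws = []
    for _ in range(reps):
        j = 1 if rng.random() < psi else 0  # error indicator, Pr(j = 1) = psi
        eps = rng.gauss(0.0, sigma)         # normal error on the log scale
        draws.append(x * math.exp(j * (zb + eps)))
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(2016)  # arbitrary seed, illustrative values
    print("formula   :", m2_moments(2.0, 0.3, 0.1, 0.5))
    print("simulation:", simulate_m2(2.0, 0.3, 0.1, 0.5, 200_000, rng))
```

With a large number of replications the two pairs of values should agree up to simulation noise.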

Lemma 1

Under model M1, we have

$$\begin{aligned} \mathrm{MSE}({\hat{D}}_1)&\,=\,\mathrm{E}_p \left( \sum _{k\in S}\frac{x_k}{\pi _k}-\sum _{k\in U}x_k +\sum _{k\in S}\frac{x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k}-\sum _{k\in U}x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta \right) ^2 \\&\quad+\sum _{k\in U} \frac{1-\pi _k}{\pi _k} x_k^2 \psi _k \left[ \sigma ^2+ ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) \right] . \end{aligned}$$

Proof

We use the following general result. If

$$y_k = {\mathbf{z}}_k^{\rm T}\beta + u_k,$$

with $\mathrm{E}_M(u_k)=0$, $\mathrm{Var}_M(u_k)=\sigma _{uk}^2$, $\mathrm{Cov}_M(u_k,u_\ell )=0$, then

$$\mathrm{E}_p\mathrm{E}_M\left( \sum _{k\in S}\frac{y_k}{\pi _k}-\sum _{k\in U}y_k\right) ^2 = \mathrm{E}_p\left( \sum _{k\in S}\frac{{\mathbf{z}}_k^{\rm T}\beta }{\pi _k}-\sum _{k\in U}{\mathbf{z}}_k^{\rm T}\beta \right) ^2 + \sum _{k\in U}\frac{1-\pi _k}{\pi _k}\sigma _{uk}^2.$$

(11)

A proof is available, among others, in Nedyalkova and Tillé (2008). Now, under model M1, we have

$$\begin{aligned} \hat{D}_1 - D&\,=\,\hat{Y} - Y \nonumber \\&\,=\,\sum _{k\in S} \frac{x_k}{\pi _k} - \sum _{k\in U}x_k + \sum _{k\in S} \frac{x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k )}{\pi _k} - \sum _{k\in U}x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k ) \nonumber \\&\,=\,\sum _{k\in S} \frac{x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta )}{\pi _k} - \sum _{k\in U}x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta ) \nonumber \\&\quad+\sum _{k\in S} \frac{x_k[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]}{\pi _k} - \sum _{k\in U}x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ] . \end{aligned}$$

(12)

Since $\mathrm{E}_M\{ x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ] \} =0$,

$$\begin{aligned}&\mathrm{Var}_M \{ x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k] \}\\&\quad = x_k^2 \mathrm{E}_M[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]^2\\&\quad = x_k^2 \mathrm{E}_M [ (j_k-\psi _k)^2({\mathbf{z}}_k^{\rm T} \beta )^2 + j_k^2\varepsilon _k^2+2(j_k-\psi _k)j_k{\mathbf{z}}_k^{\rm T} \beta \varepsilon _k]\\&\quad =x_k^2 \psi _k [ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) ]. \end{aligned}$$

The last step uses $\mathrm{E}_M(j_k-\psi _k)^2=\psi _k(1-\psi _k)$ and $\mathrm{E}_M(j_k^2\varepsilon _k^2)=\psi _k\sigma ^2$; the cross term has zero expectation because $\varepsilon _k$ has mean zero and is independent of $j_k$.

We now put $u_k = x_k[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]$, and apply (11) to (12). We obtain

$$\begin{aligned}&\mathrm{E}_p\mathrm{E}_M\left( \hat{D}_1 - D \right) ^2\nonumber \\&\quad =\mathrm{E}_p\left( \sum _{k\in S}\frac{x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta )}{\pi _k} - \sum _{k\in U}x_k(1+\psi _k {\mathbf{z}}_k^{\rm T} \beta ) \right) ^2 \nonumber \\&\qquad + \sum _{k\in U}\frac{1-\pi _k}{\pi _k}x_k^2 \psi _k [ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) ] . \end{aligned}$$

(13)
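The model variance that enters the second term of Lemma 1, $x_k^2\psi_k[\sigma^2+({\mathbf{z}}_k^{\rm T}\beta)^2(1-\psi_k)]$, can be verified by simulation. The sketch below is illustrative only and assumes, as in the proof, that $j_k$ is Bernoulli with parameter $\psi_k$ and independent of $\varepsilon_k\sim N(0,\sigma^2)$; `zb` denotes ${\mathbf{z}}_k^{\rm T}\beta$, and all function names and numerical values are hypothetical.

```python
import random


def m1_residual_variance(x, psi, zb, sigma):
    """Var_M(u_k) for u_k = x[(j - psi) zb + j eps], as stated in Lemma 1."""
    return x**2 * psi * (sigma**2 + (1 - psi) * zb**2)


def simulate_m1_residual(x, psi, zb, sigma, reps, rng):
    """Empirical mean and variance of u_k under model M1."""
    draws = []
    for _ in range(reps):
        j = 1 if rng.random() < psi else 0  # error indicator
        eps = rng.gauss(0.0, sigma)         # additive relative error
        draws.append(x * ((j - psi) * zb + j * eps))
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(11)  # arbitrary seed, illustrative values
    print("formula   :", m1_residual_variance(2.0, 0.3, 0.4, 0.2))
    print("simulation:", simulate_m1_residual(2.0, 0.3, 0.4, 0.2, 200_000, rng))
```

The simulated mean should be close to zero and the simulated variance close to the closed form.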

Lemma 2

Under model M1, we have

$$\begin{aligned} \mathrm{MSE}({\hat{D}}_2)&\,=\,\mathrm{E}_p \left( \sum _{k\in S} \frac{x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k}-\sum _{k\in U}x_k \psi _k {\mathbf{z}}_k^{\rm T} \beta \right) ^2 \\&\quad+\sum _{k\in U} \frac{1-\pi _k}{\pi _k}x_k^2 \psi _k\left[ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2(1-\psi _k) \right] . \end{aligned}$$

(14)

Proof

We have

$$\begin{aligned}&\hat{D}_2 - D\\&\quad =\sum _{k\in S} \frac{x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k )}{\pi _k} - \sum _{k\in U}x_k j_k ( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k) \nonumber \\&\quad =\sum _{k\in S} \frac{x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k} - \sum _{k\in U}x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta \nonumber \\&\qquad +\sum _{k\in S} \frac{x_k[ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]}{\pi _k} - \sum _{k\in U}x_k [ (j_k-\psi _k) {\mathbf{z}}_k^{\rm T} \beta + j_k \varepsilon _k ]. \end{aligned}$$

(15)

We use (13) again, and apply (11) to (15). We obtain

$$\begin{aligned}&\mathrm{E}_p\mathrm{E}_M\left( \hat{D}_2 - D \right) ^2 \\&\quad = \mathrm{E}_p\left( \sum _{k\in S} \frac{x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k} - \sum _{k\in U}x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta \right) ^2 + \sum _{k\in U}\frac{1-\pi _k}{\pi _k}x_k^2 \psi _k [ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) ] . \end{aligned}$$

Proof of Result 1

In order to minimize (13), we can first select a sample that is balanced on $x_k$ and $x_k\psi _k {\mathbf{z}}_k$, i.e. a sample such that

$$\begin{aligned} \sum _{k\in S} \frac{x_k}{\pi _k} &\,=\, \sum _{k\in U}x_k,\\ \sum _{k\in S} \frac{x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta }{\pi _k} &\,=\, \sum _{k\in U}x_k\psi _k {\mathbf{z}}_k^{\rm T} \beta . \end{aligned}$$

In this case, the first term of (13) vanishes. Now, if we minimize the second term with respect to $\pi _k$

$$\sum _{k\in U}\frac{1-\pi _k}{\pi _k}x_k^2 \psi _k [ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) ]$$

subject to

$$\sum _{k\in U}\pi _k=n,$$

we find, using the Lagrange technique,

$$\pi _k \propto \sqrt{x_k^2 \psi _k [ \sigma ^2 + ({\mathbf{z}}_k^{\rm T} \beta )^2 (1-\psi _k) ]}.$$
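In practice, inclusion probabilities proportional to a size measure must sum to $n$ and cannot exceed 1. The sketch below shows one common way to compute them, using the model variance $x_k^2\psi_k[\sigma^2+({\mathbf{z}}_k^{\rm T}\beta)^2(1-\psi_k)]$ stated in Lemma 1 as the squared size measure. It is an illustration under stated assumptions rather than the authors' implementation: the iterative capping at 1 is a standard device that the proof does not discuss, `zb` is a list holding ${\mathbf{z}}_k^{\rm T}\beta$, and all function names are hypothetical.

```python
import math


def size_measure_for_D(x, psi, zb, sigma):
    """Square root of the M1 model variance of each unit (up to a constant)."""
    return [
        xk * math.sqrt(pk * (sigma**2 + (1 - pk) * zk**2))
        for xk, pk, zk in zip(x, psi, zb)
    ]


def inclusion_probabilities(size, n):
    """Probabilities proportional to size, summing to n, capped at 1."""
    pi = [0.0] * len(size)
    free = set(range(len(size)))  # units not yet fixed at probability 1
    remaining = n
    while free:
        total = sum(size[k] for k in free)
        capped = [k for k in free if remaining * size[k] >= total]
        if not capped:
            for k in free:
                pi[k] = remaining * size[k] / total
            break
        for k in capped:          # units selected with certainty
            pi[k] = 1.0
            free.discard(k)
        remaining -= len(capped)
    return pi


def model_variance_term(pi, x, psi, zb, sigma):
    """Second (design-free) term of the MSE in Lemma 1."""
    return sum(
        (1 - p) / p * xk**2 * pk * (sigma**2 + (1 - pk) * zk**2)
        for p, xk, pk, zk in zip(pi, x, psi, zb)
    )
```

Comparing `model_variance_term` for these probabilities with the value obtained for probabilities proportional to $x_k$ alone (the MUS choice) gives a quick indication of the potential gain for a given population.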

Lemma 3

Under model M2, we have

$$\begin{aligned}&\mathrm{MSE}({\hat{D}}_1)\\&\quad = \mathrm{E}_p \left( \sum _{k\in S}\frac{x_k \psi _k a_k}{\pi _k}-\sum _{k\in U}x_k \psi _k a_k - \sum _{k\in S}\frac{ x_k \psi _k}{\pi _k}+\sum _{k\in U} x_k \psi _k + \sum _{k\in S}\frac{x_k}{\pi _k}-\sum _{k\in U}x_k\right) ^2 \\&\qquad +\sum _{k\in U} \frac{1-\pi _k}{\pi _k} x_k^2 v_k^2, \end{aligned}$$

where $a_k$ and $v_k$ are defined in (5)–(6).

Lemma 4

Under model M2, we have

$$\begin{aligned} \mathrm{MSE}({\hat{D}}_2)&\,=\,\mathrm{E}_p\left( \sum _{k\in S}\frac{x_k \psi _k a_k}{\pi _k}-\sum _{k\in U}x_k \psi _k a_k - \sum _{k\in S}\frac{ x_k \psi _k}{\pi _k}+\sum _{k\in U} x_k \psi _k\right) ^2 \\&\quad+\sum _{k\in U} \frac{1-\pi _k}{\pi _k} x_k^2 v_k^2. \end{aligned}$$

Proofs of Lemma 3, Lemma 4, and Result 2

The proofs are the same as the proofs of Lemmas 1 and 2 and Result 3, once we note that M2 can be written as a linear heteroscedastic model:

$$\begin{aligned} y_k&\,=\,x_k \psi _ ka_k- x_k \psi _ k+x_k + r_k,\\ \mathrm{E}_M(r_k)&\,=\,0,\quad \mathrm{Var}_M(r_k)= v_k^2, \end{aligned}$$

where $r_k$ is defined in (4), $a_k$ is defined in (5), and $v_k$ is defined in (6).

Lemma 5

$$\mathrm{MSE}({\hat{J}})=\mathrm{E}_p \left( \sum _{k\in S}\frac{\psi _k}{\pi _k}-\sum _{k\in U}\psi _k\right) ^2+\sum _{k\in U} \psi _k(1-\psi _k)\frac{1-\pi _k}{\pi _k}.$$

Proof

We only have to note that

$$\begin{aligned}&\hat{J} - J \\&\quad = \sum _{k\in S} \frac{j_k }{\pi _k} - \sum _{k\in U}j_k \\&\quad = \sum _{k\in S} \frac{\psi _k }{\pi _k} - \sum _{k\in U}\psi _k + \sum _{k\in S} \frac{j_k-\psi _k }{\pi _k} - \sum _{k\in U}(j_k-\psi _k) . \end{aligned}$$

Since $\mathrm{E}_M(j_k-\psi _k)=0$ and $\mathrm{Var}_M(j_k-\psi _k)=\psi _k(1-\psi _k)$, we can proceed as in the proof of Result 1. $\square$

Proof of Result 3

The proof is the same as for Result 1.

A Monte Carlo simulation

A population of size $N=1000$ was generated according to models M0 and M1 as follows:

the $x_k$ are independent lognormal variables with parameters $\mu =0.7$ and $\sigma =0.4$,
${\mathbf{v}}_k=(1,x_k,b_k)^{\rm T}$, $\gamma =(-1,0.007,1.0)^{\rm T}$,
the $b_k$ are independent Bernoulli variables such that $\mathrm{Pr}(b_k=1)=0.8$,
${\mathbf{z}}_k = (1,z_k)^{\rm T}$, $\beta =(0,0.5 )^{\rm T}$,
the $z_k$ are independent normal variables such that $z_k\sim N(0.2,0.3^2),$
the $\varepsilon _k$ are independent normal variables such that $\varepsilon _k\sim N(0,0.1^2)$.
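The generation steps listed above can be sketched as follows. This is a hedged illustration, not the authors' simulation code: M0 is read here as the logistic model $\psi_k=\exp({\mathbf{v}}_k^{\rm T}\gamma)/(1+\exp({\mathbf{v}}_k^{\rm T}\gamma))$ for the error indicator $j_k$, and M1 as $y_k=x_k+x_kj_k({\mathbf{z}}_k^{\rm T}\beta+\varepsilon_k)$, the form used in (12). The function name, the dictionary keys, and the seed are hypothetical, so the realized values of $J$ and $D$ will differ from those of the population used in the paper.

```python
import math
import random


def generate_population(N=1000, seed=1):
    """Generate one population from the listed parameter values."""
    rng = random.Random(seed)
    gamma = (-1.0, 0.007, 1.0)  # coefficients of v_k = (1, x_k, b_k)
    beta = (0.0, 0.5)           # coefficients of z_k = (1, z_k)
    population = []
    for _ in range(N):
        x = rng.lognormvariate(0.7, 0.4)       # book value
        b = 1 if rng.random() < 0.8 else 0     # group indicator
        z = rng.gauss(0.2, 0.3)                # auxiliary variable
        eps = rng.gauss(0.0, 0.1)              # error term
        eta = gamma[0] + gamma[1] * x + gamma[2] * b
        psi = 1.0 / (1.0 + math.exp(-eta))     # equals exp(eta) / (1 + exp(eta))
        j = 1 if rng.random() < psi else 0     # 1 if the transaction is incorrect
        zb = beta[0] + beta[1] * z
        y = x + x * j * (zb + eps)             # audited value under M1
        population.append({"x": x, "b": b, "z": z, "psi": psi, "j": j, "y": y})
    return population


if __name__ == "__main__":
    pop = generate_population()
    J = sum(u["j"] for u in pop)               # number of incorrect transactions
    D = sum(u["y"] - u["x"] for u in pop)      # total misstatement, as in (15)
    print("J =", J, " D =", round(D, 3))
```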

Figure 3 shows the generated population as well as $\psi _k =\exp ({\mathbf{v}}_k^{\rm T} \gamma )/(1+\exp ({\mathbf{v}}_k^{\rm T} \gamma ))$ as a function of $x_k$ and $b_k$. There are $J=292$ incorrect transactions and $D=-2313.277$. There are more errors on small transactions than on large transactions. The variables $b_k$ define two groups with different error levels. The amount of errors depends on $z_k.$ We compared the following strategies:

cSRS: Simple random sampling without replacement, calibrated on x,
OPTD: Optimal design for D, balanced and calibrated on x and $x\psi {\mathbf{z}}$,
OPTJ: Optimal design for J, balanced on $\psi$, calibrated on x and $\psi$,
MUS: naturally balanced on x,
iMUS: improved MUS balanced on $x\psi$ and $x\psi {\mathbf{z}}$, calibrated on x, $x\psi$ and $x\psi {\mathbf{z}}.$
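The statement that MUS is naturally balanced on x can be illustrated with a small sketch. It assumes that MUS is implemented by systematic selection with inclusion probabilities $\pi_k=nx_k/\sum_{k\in U}x_k$, which is one common implementation and not necessarily the one used in the simulation; the function names are hypothetical. With a fixed sample size $n$, the Horvitz-Thompson estimator of the total of $x$ then reproduces the population total exactly.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def mus_probabilities(x, n):
    """Inclusion probabilities proportional to book value x."""
    total = sum(x)
    pi = [n * xk / total for xk in x]
    if max(pi) > 1.0:
        raise ValueError("an item exceeds the sampling interval; treat it separately")
    return pi


def systematic_sample(pi, rng):
    """Systematic unequal-probability sampling with a random start."""
    n = round(sum(pi))
    cum = list(accumulate(pi))
    cum[-1] = float(n)        # guard against rounding in the last partial sum
    start = rng.random()      # random start in [0, 1)
    # unit k is selected when start + i falls in [cum[k-1], cum[k])
    return [bisect_right(cum, start + i) for i in range(n)]


def horvitz_thompson(values, pi, sample):
    """Horvitz-Thompson estimator of the population total of values."""
    return sum(values[k] / pi[k] for k in sample)


if __name__ == "__main__":
    rng = random.Random(3)  # arbitrary seed
    x = [rng.lognormvariate(0.7, 0.4) for _ in range(1000)]
    pi = mus_probabilities(x, 100)
    s = systematic_sample(pi, rng)
    print(horvitz_thompson(x, pi, s), sum(x))  # the two totals coincide
```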

OPTD and OPTJ were based on Results 1 and 3. Calibration was computed according to the raking ratio technique (Deville and Särndal 1992). We selected 10,000 samples of size 100. Table 7 reports the empirical $\mathrm{MSE}$ (mse) of ${\hat{D}}={\hat{D}}_1={\hat{D}}_2$ and ${\hat{J}}$. The simulation confirms the theory: OPTD is the best strategy to estimate D and OPTJ is the best strategy to estimate J. MUS is a usable tool for estimating D but can be improved by balancing and calibrating.
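For a single calibration variable, the raking ratio technique can be sketched as below. This is an illustrative reading of the method, not the code used for the simulation: the calibrated weights take the form $w_k=d_k\exp(\lambda x_k)$, where $d_k=1/\pi_k$ are the design weights, and $\lambda$ is found by Newton's method so that $\sum_{k\in S}w_kx_k$ equals the known total of $x$. The function name, the tolerance, and the iteration limit are assumptions, and positive $x_k$ are assumed.

```python
import math
import random


def raking_weights(d, x, total_x, tol=1e-10, max_iter=50):
    """Raking weights w_k = d_k * exp(lam * x_k) calibrated on the total of x."""
    lam = 0.0
    for _ in range(max_iter):
        w = [dk * math.exp(lam * xk) for dk, xk in zip(d, x)]
        gap = sum(wk * xk for wk, xk in zip(w, x)) - total_x
        if abs(gap) <= tol * abs(total_x):
            return w
        slope = sum(wk * xk * xk for wk, xk in zip(w, x))  # derivative w.r.t. lam
        lam -= gap / slope                                  # Newton step
    raise RuntimeError("raking did not converge")


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed
    x = [rng.lognormvariate(0.7, 0.4) for _ in range(1000)]
    sample = rng.sample(range(1000), 100)  # simple random sample without replacement
    d = [1000 / 100] * 100                 # design weights N / n
    xs = [x[k] for k in sample]
    w = raking_weights(d, xs, sum(x))
    print(sum(wk * xk for wk, xk in zip(w, xs)), sum(x))
```

Calibrating on several variables at once replaces the scalar $\lambda$ by a vector and the Newton step by the solution of a linear system.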

Table 7 Monte Carlo simulation: empirical MSE/10,000 of ${\hat{D}}$ (standardized w.r.t. MUS) and ${\hat{J}}$ (standardized w.r.t. SRS) according to various strategies

About this article

Cite this article

Marazzi, A., Tillé, Y. Using past experience to optimize audit sampling design. Rev Quant Finan Acc 49, 435–462 (2017). https://doi.org/10.1007/s11156-016-0596-7

Issue Date: August 2017