Abstract
Optimal sampling designs for audit that minimize the mean squared error of the estimated amount of misstatement are proposed. They are derived from a general statistical model that describes the error process with the help of available auxiliary information. We show that, if the model is adequate, these optimal designs, which are based on balanced sampling with unequal probabilities, are more efficient than monetary unit sampling. We discuss how to implement the optimal designs in practice. Monte Carlo simulations based on audit data from the Swiss hospital billing system confirm the benefits of the proposed method.
References
AAMAS (2009) National health care billing audit guidelines. Technical report, The American Association of Medical Audit Specialists
AICPA (2008) Audit guide: audit sampling. Technical report. American Institute of Certified Public Accountants
Bickel PJ (1992) Inference and auditing: the Stringer bound. Int Stat Rev 60(2):197–209
Chauvet G, Tillé Y (2005) Fast SAS macros for balancing samples: user’s guide. Software Manual, University of Neuchâtel. http://www2.unine.ch/statistics/page10890.html
Deville J-C, Särndal C-E (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87:376–382
Deville J-C, Tillé Y (2004) Efficient balanced sampling: the cube method. Biometrika 91:893–912
Deville J-C, Tillé Y (2005) Variance approximation under balanced sampling. J Stat Plann Inference 128:569–591
Dickhaut JW, Eggleton IRC (1975) An examination of the processes underlying comparative judgements of numerical stimuli. J Account Res 13:38–72
Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD (1980) Case mix definition by diagnosis-related groups. Med Care 18:1–53
Gafford WW, Carmichael DR (1984) Materiality, audit risk and sampling—a nuts-and-bolts approach (part one). J Account 158(4):109–110
Gemayel NM, Stasny EA, Tackett JA, Wolfe DA (2011) Ranked set sampling: an auditing application. Rev Quant Finance Account 39:413–422
Grimlund RA, Felix D (1987) Simulation evidence and analysis of alternative methods of evaluating dollar-unit samples. Contemp Account Res 62(3):455–480
Hansen SC (1993) Strategic sampling, physical units sampling, and dollar units sampling. Account Rev 68(2):232–345
Higgins HN, Nandram B (2009) Monetary unit sampling: improving estimation of the total audit error. Adv Account 25(2):174–182
Hoogduin LA, Hall TW, Tsay JJ, Pierce BJ (2015) Does systematic selection lead to unreliable risk assessments in monetary-unit sampling applications? Audit J Pract Theory 34(4):85–107
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hosmer DW, Lemeshow S (2013) Applied logistic regression. Wiley, New York
Kaplan RS (1973) Statistical sampling in auditing with auxiliary information estimators. J Account Res 11(2):238–258
Leitch RA, Neter J, Plante R, Sinha P (1981) Implementation of upper multinomial bound using clustering. J Am Stat Assoc 76(375):530–533
Leslie DA, Teitlebaum AD, Anderson RJ (1980) Dollar-unit sampling: a practical guide for auditors. Pitman, London
Madow WG (1949) On the theory of systematic sampling, II. Ann Math Stat 20:333–354
Nedyalkova D, Tillé Y (2008) Optimal sampling and estimation strategies under linear model. Biometrika 95:521–537
Neter J, Loebbecke JK (1977) On the behavior of statistical estimators when sampling accounting populations. J Am Stat Assoc 72(359):501–507
Pea J, Qualité L, Tillé Y (2007) Systematic sampling is a minimal support design. Comput Stat Data Anal 51:5591–5602
Rosenberg MA, Fryback DG, Katz DA (2000) A statistical model to detect DRG upcoding. Health Serv Outcomes Res Methodol 1(3–4):233–252
SFAO (2014) Kontrolle von DRG-Spitalrechnungen durch die Krankenversicherungen. Technical Report EFK-14367, Swiss Federal Audit Office
Smieliauskas W (1986) Control of sampling risks in auditing. Contemp Account Res 3(1):102–124
Statistics Canada (2010) Survey methods and practices, catalogue no. 12-587-x. Technical report. Statistics Canada
Stringer KW (1963) Practical aspects of statistical sampling in auditing. In: ASA proceedings of the business and economic statistics section. American Statistical Association, pp 405–411
Tillé Y (2006) Sampling algorithms. Springer, New York
Tillé Y, Matei A (2012) Sampling: survey sampling. R package version 2.5
Tsui KW, Matsumura EM, Tsui KL (1985) Multinomial-Dirichlet bounds for dollar-unit sampling in auditing. Account Rev 60(1):76–97
Acknowledgments
We are grateful to Sandro Prosperi and Giordano Macchi for their helpful suggestions.
Appendix
Proof of equations (3)–(6)
We have
and since \(\varepsilon _k\) has a normal distribution,
Therefore,
and
It follows that
Using \(a_k=\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)\), we have
Finally,
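The key step in this proof is the mean of a lognormal factor: if \(\varepsilon _k\sim N(0,\sigma ^2)\), then \(\mathrm{E}[\exp ({\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k)]=\exp ({\mathbf{z}}_k^{\rm T}\beta +\sigma ^2/2)=a_k\). The sketch below checks this identity, together with the standard lognormal variance, by simulation. It assumes the error ratio under M2 has the form \(\exp ({\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k)\), which is what the definition of \(a_k\) suggests; the exact expression of \(v_k\) in (6) is not reproduced here, and the function names and the value of `mu` are illustrative only.

```python
import numpy as np


def lognormal_moments(mu: float, sigma: float) -> tuple[float, float]:
    """Exact mean and variance of exp(mu + eps) with eps ~ N(0, sigma^2)."""
    mean = np.exp(mu + sigma**2 / 2)
    var = np.exp(2 * mu + sigma**2) * np.expm1(sigma**2)
    return float(mean), float(var)


def monte_carlo_moments(mu: float, sigma: float, n_draws: int = 1_000_000, seed: int = 1):
    """Simulated mean and variance of the same lognormal factor."""
    rng = np.random.default_rng(seed)
    r = np.exp(mu + rng.normal(0.0, sigma, size=n_draws))
    return float(r.mean()), float(r.var())


# mu plays the role of z_k^T beta for one unit (illustrative value only)
mu, sigma = 0.1, 0.1
a_exact, v_exact = lognormal_moments(mu, sigma)
a_mc, v_mc = monte_carlo_moments(mu, sigma)
print(a_exact, a_mc)
print(v_exact, v_mc)
```

The simulated mean and variance should agree with the exact values up to Monte Carlo error.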
Lemma 1
Under model M1, we have
Proof
We use the following general result. If
with \(\mathrm{E}_M(u_k)=0\), \(\mathrm{Var}_M(u_k)=\sigma _{uk}^2\), \(\mathrm{Cov}_M(u_k,u_\ell )=0\), then
A proof can be found, among others, in Nedyalkova and Tillé (2008). Now, under model M1, we have
Since \(\mathrm{E}_M[x_k (j_k-\psi _k)( {\mathbf{z}}_k^{\rm T} \beta + \varepsilon _k)] =0\),
We now put \(u_k = x_k[(j_k-\psi _k){\mathbf{z}}_k^{\rm T}\beta + j_k\varepsilon _k]\), and apply (11) to (12). We obtain
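The general result quoted in this proof is not reproduced in the text above. One standard form of it, which we assume here, is that when the estimation error equals \(\sum _{k\in S}u_k/\pi _k-\sum _{k\in U}u_k\) with \(u_k\) independent of the design, \(\mathrm{E}_M(u_k)=0\), \(\mathrm{Var}_M(u_k)=\sigma _{uk}^2\) and zero covariances, its expected square is \(\sum _{k\in U}\sigma _{uk}^2(1-\pi _k)/\pi _k\). The following sketch checks this numerically. Poisson sampling is used only because it is easy to simulate, and all inputs are hypothetical.

```python
import numpy as np


def mse_formula(sigma_u: np.ndarray, pi: np.ndarray) -> float:
    """sum_{k in U} sigma_uk^2 (1 - pi_k) / pi_k."""
    return float(np.sum(sigma_u**2 * (1 - pi) / pi))


def mse_monte_carlo(sigma_u, pi, n_rep=100_000, seed=2) -> float:
    """E_p E_M of (sum_S u_k/pi_k - sum_U u_k)^2, estimated by simulation.

    Poisson sampling (independent Bernoulli(pi_k) inclusions) is used only
    because it is easy to simulate; the identity needs only the pi_k.
    """
    rng = np.random.default_rng(seed)
    N = len(pi)
    u = rng.normal(0.0, sigma_u, size=(n_rep, N))   # model draws, mean 0
    in_s = rng.random((n_rep, N)) < pi              # sample indicators
    err = (u * in_s / pi).sum(axis=1) - u.sum(axis=1)
    return float(np.mean(err**2))


pi = np.linspace(0.2, 0.9, 10)        # illustrative inclusion probabilities
sigma_u = np.linspace(0.5, 2.0, 10)   # illustrative model standard deviations
print(mse_formula(sigma_u, pi), mse_monte_carlo(sigma_u, pi))
```

Because the cross terms vanish under the model, only the first-order inclusion probabilities enter the result, which is why any design with the same \(\pi _k\) gives the same value.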
Lemma 2
Under model M1, we have
Proof
We have
We use (13) again, and apply (11) to (15). We obtain
Proof of Result 1
In order to minimize (13), we can first select a sample that is balanced on \(x_k\) and \(x_k\psi _k {\mathbf{z}}_k\), i.e. a sample such that
In this case, the first term of (13) vanishes. Now, if we minimize the second term with respect to \(\pi _k\)
subject to
we find, using the Lagrange technique,
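To illustrate the Lagrange step, assume the remaining term has the form \(\sum _{k\in U}\sigma _k^2(1-\pi _k)/\pi _k\), where \(\sigma _k\) is the model standard deviation of unit \(k\). This is an assumption, since the exact expression is not shown above. Minimizing it subject to \(\sum _{k\in U}\pi _k=n\) gives \(\pi _k\propto \sigma _k\). The sketch below also handles units whose unconstrained value would exceed 1 by taking them with certainty. The function names are hypothetical.

```python
import numpy as np


def optimal_inclusion_probabilities(sigma, n: int) -> np.ndarray:
    """pi_k proportional to sigma_k, summing to n, with pi_k <= 1.

    Minimizes sum sigma_k^2 (1 - pi_k) / pi_k subject to sum pi_k = n.
    Units whose unconstrained pi_k would exceed 1 are taken with certainty
    and the remaining sample size is reallocated among the others.
    Assumes every sigma_k > 0 and n <= N.
    """
    sigma = np.asarray(sigma, dtype=float)
    certain = np.zeros(sigma.shape, dtype=bool)
    while True:
        rest = ~certain
        pi = np.ones_like(sigma)
        pi[rest] = (n - certain.sum()) * sigma[rest] / sigma[rest].sum()
        too_big = rest & (pi > 1.0)
        if not too_big.any():
            return pi
        certain |= too_big


def second_term(sigma, pi) -> float:
    """The assumed objective sum sigma_k^2 (1 - pi_k) / pi_k."""
    return float(np.sum(np.asarray(sigma) ** 2 * (1 - pi) / pi))


sigma = np.array([1.0, 1.0, 1.0, 10.0])
pi_opt = optimal_inclusion_probabilities(sigma, n=2)
print(pi_opt)
```

Here the last unit would receive \(\pi _k>1\), so it is taken with certainty and the remaining sample size of 1 is shared equally by the other three units.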
Lemma 3
Under model M2, we have
where \(a_k\) and \(v_k\) are defined in (5)-(6).
Lemma 4
Under model M2, we have
Proofs of Lemma 3, Lemma 4, and Result 2
The proofs are the same as those of Lemmas 1 and 2 and Result 1, once we note that M2 can be written as a linear heteroscedastic model:
where \(r_k\) is defined in (4), \(a_k\) is defined in (5), and \(v_k\) is defined in (6).
Lemma 5
Proof
We only have to note that
Since \(\mathrm{E}_M(j_k-\psi _k)=0\) and \(\mathrm{Var}_M(j_k-\psi _k)=\psi _k(1-\psi _k)\), we can proceed as in the proof of Result 1. \(\square\)
Proof of Result 3
The proof is the same as for Result 1.
A Monte Carlo Simulation
A population of size \(N=1000\) was generated according to models M0 and M1 as follows:
- the \(x_k\) are independent lognormal variables with parameters \(\mu =0.7\) and \(\sigma =0.4\),
- \({\mathbf{v}}_k=(1,x_k,b_k)^{\rm T}\), \(\gamma =(-1,0.007,1.0)^{\rm T}\),
- the \(b_k\) are independent Bernoulli variables such that \(\mathrm{Pr}(b_k=1)=0.8\),
- \({\mathbf{z}}_k = (1,z_k)^{\rm T}\), \(\beta =(0,0.5)^{\rm T}\),
- the \(z_k\) are independent normal variables such that \(z_k\sim N(0.2,0.3^2)\),
- the \(\varepsilon _k\) are independent normal variables such that \(\varepsilon _k\sim N(0,0.1^2)\).
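The generation scheme above can be sketched as follows. The logistic error probability \(\psi _k\) is taken from the text. The error amount is assumed to be \(d_k=x_kj_k({\mathbf{z}}_k^{\rm T}\beta +\varepsilon _k)\), the form suggested by the expression for \(u_k\) in the proof of Lemma 1. Function and variable names are hypothetical, and the realized \(J\) and \(D\) depend on the random seed, so they will not in general reproduce the values reported below.

```python
import numpy as np


def generate_population(N: int = 1000, seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    x = rng.lognormal(mean=0.7, sigma=0.4, size=N)    # book values x_k
    b = (rng.random(N) < 0.8).astype(float)           # b_k ~ Bernoulli(0.8)
    z = rng.normal(0.2, 0.3, size=N)                  # z_k ~ N(0.2, 0.3^2)
    eps = rng.normal(0.0, 0.1, size=N)                # eps_k ~ N(0, 0.1^2)

    V = np.column_stack([np.ones(N), x, b])           # v_k = (1, x_k, b_k)
    gamma = np.array([-1.0, 0.007, 1.0])
    Z = np.column_stack([np.ones(N), z])              # z_k = (1, z_k)
    beta = np.array([0.0, 0.5])

    eta = V @ gamma
    psi = np.exp(eta) / (1.0 + np.exp(eta))           # error probability psi_k
    j = (rng.random(N) < psi).astype(int)             # j_k = 1 if transaction k is incorrect
    d = x * j * (Z @ beta + eps)                      # ASSUMED form of the error amount under M1
    return {"x": x, "b": b, "z": z, "psi": psi, "j": j, "d": d}


pop = generate_population()
J, D = int(pop["j"].sum()), float(pop["d"].sum())
print(J, D)   # realized values depend on the seed
```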
Figure 3 shows the generated population as well as \(\psi _k =\exp ({\mathbf{v}}_k^{\rm T} \gamma )/(1+\exp ({\mathbf{v}}_k^{\rm T} \gamma ))\) as a function of \(x_k\) and \(b_k\). There are \(J=292\) incorrect transactions and \(D=-2313.277\). There are more errors on small transactions than on large transactions. The variables \(b_k\) define two groups with different error levels. The error amounts depend on \(z_k\). We compared the following strategies:
- cSRS: simple random sampling without replacement, calibrated on x,
- OPTD: optimal design for D, balanced and calibrated on x and \(x\psi {\mathbf{z}}\),
- OPTJ: optimal design for J, balanced on \(\psi\), calibrated on x and \(\psi\),
- MUS: monetary unit sampling, naturally balanced on x,
- iMUS: improved MUS, balanced on \(x\psi\) and \(x\psi {\mathbf{z}}\), calibrated on x, \(x\psi\) and \(x\psi {\mathbf{z}}\).
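Monetary unit sampling selects transactions with probability proportional to their book values, \(\pi _k=nx_k/\sum _{\ell \in U}x_\ell\), and is often carried out by systematic selection on the cumulated amounts. The sketch below uses that common systematic implementation (Madow 1949), which is an assumption about how MUS is run. It shows why MUS is "naturally balanced on x": with a fixed sample size \(n\), the Horvitz–Thompson estimator of \(\sum _{k\in U}x_k\) is exact. All names are hypothetical.

```python
import numpy as np


def mus_inclusion_probabilities(x: np.ndarray, n: int) -> np.ndarray:
    """pi_k = n x_k / X, the inclusion probabilities of monetary unit sampling."""
    pi = n * x / x.sum()
    if np.any(pi > 1.0):
        raise ValueError("n*x_k/X > 1 for some units: treat them as certainty units first")
    return pi


def systematic_sample(pi: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Systematic unequal-probability sampling with one random start u in [0, 1).

    Unit k is selected when a point u + i (i = 0, 1, ..., n - 1) falls in
    [V_{k-1}, V_k), where V_k is the cumulative sum of the pi_l.
    """
    cum = np.concatenate([[0.0], np.cumsum(pi)])
    u = rng.random()
    hits = np.ceil(cum[1:] - u) - np.ceil(cum[:-1] - u)
    return np.flatnonzero(hits > 0)


def horvitz_thompson(y: np.ndarray, pi: np.ndarray, s: np.ndarray) -> float:
    """Horvitz-Thompson estimator sum_{k in S} y_k / pi_k."""
    return float(np.sum(y[s] / pi[s]))


rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.7, sigma=0.4, size=1000)   # illustrative book values
pi = mus_inclusion_probabilities(x, n=100)
s = systematic_sample(pi, rng)
print(len(s), horvitz_thompson(x, pi, s), x.sum())
```

Each selected unit contributes \(x_k/\pi _k=\sum _{\ell \in U}x_\ell /n\), so the \(n\) selected units reproduce the population total of x exactly.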
OPTD and OPTJ were based on Results 1 and 3. Calibration was computed according to the raking ratio technique (Deville and Särndal 1992). We selected 10,000 samples of size 100. Table 7 reports the empirical \(\mathrm{MSE}\) (mse) of \({\hat{D}}={\hat{D}}_1={\hat{D}}_2\) and \({\hat{J}}\). The simulation confirms the theory: OPTD is the best strategy to estimate D and OPTJ is the best strategy to estimate J. MUS is a usable tool for estimating D, but it can be improved by balancing and calibration.
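In the framework of Deville and Särndal (1992), raking ratio calibration uses weights of the form \(w_k=d_k\exp ({\mathbf{x}}_k^{\rm T}\lambda )\), where \(d_k=1/\pi _k\). The vector \(\lambda\) is chosen so that \(\sum _{k\in S}w_k{\mathbf{x}}_k\) equals the known population totals. The sketch below solves these calibration equations by Newton's method and computes an empirical MSE over simulated estimates. The solver, function names, and example data are illustrative assumptions, not the software used in the paper.

```python
import numpy as np


def raking_ratio_weights(d, Xs, totals, tol=1e-10, max_iter=50):
    """Calibrated weights w_k = d_k exp(x_k^T lam) with sum_S w_k x_k = totals.

    d: design weights 1/pi_k of the sampled units, shape (n,)
    Xs: calibration variables of the sampled units, shape (n, p)
    totals: known population totals of those variables, shape (p,)
    Solves the calibration equations by Newton's method.
    """
    d = np.asarray(d, dtype=float)
    Xs = np.asarray(Xs, dtype=float)
    totals = np.asarray(totals, dtype=float)
    lam = np.zeros(Xs.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(Xs @ lam)
        resid = totals - Xs.T @ w
        if np.max(np.abs(resid)) <= tol * np.max(np.abs(totals)):
            return w
        jac = Xs.T @ (Xs * w[:, None])        # sum_S w_k x_k x_k^T
        lam += np.linalg.solve(jac, resid)
    raise RuntimeError("raking ratio calibration did not converge")


def empirical_mse(estimates, true_value) -> float:
    """Mean of (estimate - true value)^2 over the simulated samples."""
    est = np.asarray(estimates, dtype=float)
    return float(np.mean((est - true_value) ** 2))


# Illustrative use: one SRS sample of size 100 from N = 1000, calibrated on x.
rng = np.random.default_rng(4)
N, n = 1000, 100
x = rng.lognormal(mean=0.7, sigma=0.4, size=N)
s = rng.choice(N, size=n, replace=False)
w = raking_ratio_weights(np.full(n, N / n), x[s, None], [x.sum()])
print(w @ x[s], x.sum())
```

The exponential form keeps every calibrated weight positive. In a full simulation, the calibrated estimate \(\sum _{k\in S}w_kd_k\) would be recomputed for each of the replicated samples and passed to `empirical_mse` together with the true D.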
About this article
Cite this article
Marazzi, A., Tillé, Y. Using past experience to optimize audit sampling design. Rev Quant Finan Acc 49, 435–462 (2017). https://doi.org/10.1007/s11156-016-0596-7
DOI: https://doi.org/10.1007/s11156-016-0596-7
Keywords
- Audit
- Hospital bill audit
- Monetary unit sampling
- Dollar unit sampling
- Balanced sampling
- Horvitz–Thompson Estimator