Abstract
Data quality and data fraud are of increasing concern in the digital world. Benford’s Law is used worldwide to detect non-conformance or fraud in numerical data. It states that the first non-zero digit \(D_1\) of a data item drawn from a universe is not uniformly distributed; instead, its probability decays roughly logarithmically, starting with \(P(D_1=1)\approx 0.3\). Benford’s Law should not be applied to detect manipulated or faked data before the goodness of fit of the probability model has been examined while the business process is free of manipulations, i.e. ‘under control’. In this paper, we are concerned with the goodness-of-fit phase, not with fraud detection itself. We selected five publicly accessible empirical numerical data sets of various sample sizes as a kind of benchmark and evaluated the performance of three statistical tests: the chi-square goodness-of-fit test, which is the standard test in business practice, the Kolmogorov–Smirnov test, and the MAD test originated by Nigrini (1992). We further analyze whether the invariance properties of Benford’s Law might improve the tests.
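As an illustration of the quantities named in the abstract, the following minimal Python sketch computes the Benford first-digit probabilities \(P(D_1=d)=\log_{10}(1+1/d)\) and the three test statistics for a sample. It is not the authors' implementation: all function names are hypothetical, the KS statistic is taken here as \(\sqrt{n}\) times the largest gap between the cumulative digit frequencies, and the MAD statistic as the mean absolute deviation over the nine digits. Critical values, in particular those for the KS and MAD tests, are not reproduced here.

```python
import math
from collections import Counter

def benford_probs():
    """Benford first-digit probabilities P(D1 = d) = log10(1 + 1/d), d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """First non-zero digit of a non-zero number, read off scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def digit_frequencies(data):
    """Sample size n and relative frequencies of first digits 1..9 (zeros skipped)."""
    digits = [first_digit(x) for x in data if x != 0]
    n = len(digits)
    counts = Counter(digits)
    return n, [counts.get(d, 0) / n for d in range(1, 10)]

def chi_square_stat(n, freq, probs):
    """Pearson chi-square statistic (8 degrees of freedom for nine digits)."""
    return n * sum((f - p) ** 2 / p for f, p in zip(freq, probs))

def ks_stat(n, freq, probs):
    """Assumed form: sqrt(n) times the largest cumulative-frequency gap."""
    cum_f = cum_p = worst = 0.0
    for f, p in zip(freq, probs):
        cum_f += f
        cum_p += p
        worst = max(worst, abs(cum_f - cum_p))
    return math.sqrt(n) * worst

def mad_stat(freq, probs):
    """Mean absolute deviation between observed and Benford frequencies."""
    return sum(abs(f - p) for f, p in zip(freq, probs)) / len(probs)

# Illustrative data only: the powers of two, a sequence known to follow Benford's Law.
sample = [2 ** k for k in range(100)]
n, freq = digit_frequencies(sample)
probs = benford_probs()
print(chi_square_stat(n, freq, probs), ks_stat(n, freq, probs), mad_stat(freq, probs))
```

The chi-square statistic can be compared with the 0.95 quantile of the chi-square distribution with 8 degrees of freedom (about 15.51). The KS and MAD statistics require critical values suited to discrete first-digit data, which this sketch leaves open.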
Keywords
- Benford’s Law
- Invariance properties
- Goodness-of-fit tests
- Data quality
- Data fraud
- Data manipulation
Notes
- 1. \(\log(x)=\log_{10}(x)\).
- 2.
- 3.
- 4.
- 5.
- 6. Note that all \(c_{1-\alpha}\) of the KS test are chosen according to Morrow (2014).
References
Allart, P. C. (1997). An invariant-sum characterization of Benford’s law. Journal of Applied Probability, 34(1), 288–291.
Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78(4), 551–572.
Berger, A., & Hill, T. P. (2015). An introduction to Benford’s Law. Princeton: Princeton University Press.
Berger, A., & Hill, T. P. (2011). A basic theory of Benford’s Law. Probability Surveys, 8, 1–126.
Darling, D. A. (1957). The Kolmogorov-Smirnov, Cramér-von Mises tests. Annals of Mathematical Statistics, 28(4), 823–838.
Deutsche Bank Aktiengesellschaft, Quartalsfinanzbericht zum 30. September 2017. http://www.bundesanzeiger.de/ebanzwww/wexsservlet.
Göb, R. (2007). Data conformance testing by digital analysis: A critical review and an approach to more appropriate testing. Quality Engineering, 19(4), 281–297.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn. dell’Inst. Ital. degli Att., 4, 83–91.
Miller, L. H. (1956). Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association, 51(273), 111–121.
Morrow, J. (2014). Benford’s Law, families of distributions and a test basis. Discussion Paper No 1291, Centre for Economic Performance, LSE, London.
Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4(1), 39–40.
Nigrini, M. (1992). The detection of income evasion through an analysis of digital distributions. PhD dissertation, University of Cincinnati.
Nigrini, M. J. (2000). Digital analysis using Benford’s Law: Tests and statistics for auditors. Vancouver: Global Audit Publication.
Nigrini, M. (2012). Benford’s Law: Applications for forensic accounting, auditing, and fraud detection. Hoboken: Wiley.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine and Journal of Science, 5(50), 157–175.
Pinkham, R. S. (1961). On the distribution of first significant digits. Annals of Mathematical Statistics, 32(4), 1223–1230.
Smirnov, N. V. (1948). Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics, 19(2), 279–281.
UNStats Report. (2016). https://unstats.un.org/unsd/demographic-social/products/dyb/documents/dyb2016//table08.pdf.
Worldpopulation Report. (2016). http://worldpopulationreview.com/countries/china-population/cities/.
Acknowledgements
The authors thank an anonymous referee for helping to improve the paper substantially.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kössler, W., Lenz, HJ., Wang, X.D. (2021). Is the Benford Law Useful for Data Quality Assessment?. In: Knoth, S., Schmid, W. (eds) Frontiers in Statistical Quality Control 13. ISQC 2019. Frontiers in Statistical Quality Control. Springer, Cham. https://doi.org/10.1007/978-3-030-67856-2_22
DOI: https://doi.org/10.1007/978-3-030-67856-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67855-5
Online ISBN: 978-3-030-67856-2
eBook Packages: Mathematics and Statistics (R0)