Is the Benford Law Useful for Data Quality Assessment?

Part of the Frontiers in Statistical Quality Control book series (FSQC)

Abstract

Data quality and data fraud are of increasing concern in the digital world. Benford’s Law is used worldwide for detecting non-conformance or fraud in numerical data. It states that the first non-zero digit \(D_1\) of a data item drawn from a universe is not uniformly distributed. Instead, its distribution decays roughly logarithmically, starting with \(P(D_1=1)\cong 0.3\). It is self-evident that Benford’s Law should not be applied to detect manipulated or faked data before the goodness of fit of the probability model has been examined while the business process is free of manipulations, i.e. ‘under control’. In this paper, we are concerned with this goodness-of-fit phase, not with fraud detection itself. We selected five publicly accessible empirical numerical data sets of various sample sizes as a kind of benchmark and evaluated the performance of three statistical tests on them: the chi-square goodness-of-fit test, which is the standard test in business practice, the Kolmogorov–Smirnov test, and the MAD test, which originated with Nigrini (1992). We further analyze whether the invariance properties of Benford’s Law might improve the tests.
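
The three test statistics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard first-digit form of Benford’s Law, \(P(D_1=d)=\log_{10}(1+1/d)\), and textbook definitions of the statistics. The function names and the powers-of-two sample are illustrative choices, not taken from the paper, and the KS variant shown is a simple discrete version that does not use the critical values of Morrow (2014).

```python
import math
from collections import Counter

DIGITS = range(1, 10)

# Benford's first-digit law: P(D1 = d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in DIGITS}


def first_digit(x):
    """Return the first non-zero digit of x (x must be non-zero)."""
    # Scientific notation puts the leading significant digit first.
    return int(f"{abs(float(x)):.15e}"[0])


def digit_frequencies(data):
    """Relative frequencies of first digits 1..9 and the sample size."""
    digits = [first_digit(x) for x in data if x != 0]  # zeros have no first digit
    counts = Counter(digits)
    n = len(digits)
    return {d: counts[d] / n for d in DIGITS}, n


def chi_square_stat(data):
    """Pearson chi-square statistic against Benford (8 degrees of freedom)."""
    freq, n = digit_frequencies(data)
    return n * sum((freq[d] - BENFORD[d]) ** 2 / BENFORD[d] for d in DIGITS)


def ks_stat(data):
    """Discrete KS-type statistic: sqrt(n) times the largest CDF gap."""
    freq, n = digit_frequencies(data)
    cum_obs = cum_exp = max_gap = 0.0
    for d in DIGITS:
        cum_obs += freq[d]
        cum_exp += BENFORD[d]
        max_gap = max(max_gap, abs(cum_obs - cum_exp))
    return math.sqrt(n) * max_gap


def mad_stat(data):
    """Mean absolute deviation between observed and Benford frequencies."""
    freq, _ = digit_frequencies(data)
    return sum(abs(freq[d] - BENFORD[d]) for d in DIGITS) / 9


if __name__ == "__main__":
    # Illustrative sample (an assumption, not one of the paper's data sets):
    # powers of two are known to follow Benford's Law closely.
    sample = [2 ** k for k in range(100)]
    print("chi-square:", chi_square_stat(sample))
    print("KS-type:   ", ks_stat(sample))
    print("MAD:       ", mad_stat(sample))
```

With nine digit classes the chi-square statistic has 8 degrees of freedom, so at the 5% level it would be compared with the \(\chi^2_8\) critical value of about 15.51. The KS-type and MAD statistics need their own reference values, which this sketch deliberately leaves out.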

Notes

  1. \(\log (x)=\log _{10}(x)\).

  2. http://www.bundesanzeiger.de/ebanzwww/wexsservlet.

  3. http://redditmetrics.com/top.

  4. https://unstats.un.org/unsd/demographic-social/products/dyb/documents/dyb2016//table08.pdf.

  5. http://worldpopulationreview.com/countries/china-population/cities/.

  6. Note that all \(c_{1-\alpha }\) of the KS test are chosen according to Morrow (2014).

References

  • Allaart, P. C. (1997). An invariant-sum characterization of Benford’s law. Journal of Applied Probability, 34(1), 288–291.

  • Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78(4), 551–572.

  • Berger, A., & Hill, T. P. (2015). An introduction to Benford’s Law. Princeton: Princeton University Press.

  • Berger, A., & Hill, T. P. (2011). A basic theory of Benford’s Law. Probability Surveys, 8, 1–126.

  • Darling, D. A. (1957). The Kolmogorov-Smirnov, Cramér-von Mises tests. Annals of Mathematical Statistics, 28(4), 823–838.

  • Deutsche Bank Aktiengesellschaft, Quartalsfinanzbericht zum 30. September 2017. http://www.bundesanzeiger.de/ebanzwww/wexsservlet.

  • Göb, R. (2007). Data conformance testing by digital analysis - A critical review and an approach to more appropriate testing. Quality Engineering, 19(4), 281–297.

  • Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari, 4, 83–91.

  • Miller, L. H. (1956). Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association, 51(273), 111–121.

  • Morrow, J. (2014). Benford’s Law, families of distributions and a test basis. Discussion Paper No 1291, Centre for Economic Performance, LSE, London.

  • Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4(1), 39–40.

  • Nigrini, M. (1992). The detection of income tax evasion through an analysis of digital distributions. PhD dissertation, University of Cincinnati.

  • Nigrini, M. J. (2000). Digital analysis using Benford’s Law: Tests and statistics for auditors. Vancouver: Global Audit Publication.

  • Nigrini, M. (2012). Benford’s Law: Applications for forensic accounting, auditing, and fraud detection. Hoboken: Wiley.

  • Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine and Journal of Science, 5(50), 157–175.

  • Pinkham, R. S. (1961). On the distribution of first significant digits. Annals of Mathematical Statistics, 32(4), 1223–1230.

  • Smirnov, N. V. (1948). Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics, 19(2), 279–281.

  • UNStats Report. (2016). https://unstats.un.org/unsd/demographic-social/products/dyb/documents/dyb2016//table08.pdf.

  • Worldpopulation Report. (2016). http://worldpopulationreview.com/countries/china-population/cities/.

Acknowledgements

The authors thank an anonymous referee for helping to improve the paper substantially.

Author information

Corresponding author

Correspondence to Hans-J. Lenz.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kössler, W., Lenz, H.-J., Wang, X.D. (2021). Is the Benford Law Useful for Data Quality Assessment? In: Knoth, S., Schmid, W. (eds) Frontiers in Statistical Quality Control 13. ISQC 2019. Frontiers in Statistical Quality Control. Springer, Cham. https://doi.org/10.1007/978-3-030-67856-2_22
