Severe testing of Benford’s law

Cerqueti, Roy; Lupi, Claudio

doi:10.1007/s11749-023-00848-z

Original paper
Published: 28 February 2023

TEST, volume 32, pages 677–694 (2023)

Abstract

Benford’s law is often used to support critical decisions related to data quality or the presence of data manipulations or even fraud in large datasets. However, many authors argue that conventional statistical tests will reject the null of data “Benford-ness” if applied to samples of the size typical in this kind of application, even in the presence of tiny and practically unimportant deviations from Benford’s law. Therefore, they suggest using alternative criteria that, however, lack solid statistical foundations. This paper contributes to the debate on the “large n” (or “excess power”) problem in the context of Benford’s law testing. This issue is discussed in relation to the notion of severity testing for goodness-of-fit tests, with a specific focus on tests for conformity with Benford’s law. To do so, we also derive the asymptotic distribution of the mean absolute deviation (MAD) statistic as well as an asymptotic standard normal test. Finally, the severity testing principle is applied to six controversial large datasets to assess their “Benford-ness”.
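
The “large n” problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' method (their R scripts are not reproduced here), and it makes two loudly stated assumptions: the perturbation size `EPS` and the perturbed distribution are hypothetical, and the observed relative frequencies are taken to equal the perturbed probabilities exactly, so sampling noise is ignored. Under those assumptions, the Pearson chi-square statistic grows linearly with the sample size, while the MAD between observed and Benford first-digit frequencies does not depend on n.

```python
import math

# Benford first-digit probabilities: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def mad(freqs):
    """Mean absolute deviation between relative frequencies and Benford."""
    return sum(abs(f - p) for f, p in zip(freqs, BENFORD)) / len(BENFORD)


def chi_square(freqs, n):
    """Pearson chi-square statistic for relative frequencies and sample size n."""
    return n * sum((f - p) ** 2 / p for f, p in zip(freqs, BENFORD))


# Hypothetical tiny deviation (an assumption for illustration only):
# move EPS of probability mass from digit 1 to digit 9.
EPS = 0.002
perturbed = list(BENFORD)
perturbed[0] -= EPS
perturbed[8] += EPS

# 95th percentile of the chi-square distribution with 8 degrees of freedom.
CRIT_5PCT_DF8 = 15.507

# Idealized setting: frequencies equal the perturbed probabilities exactly.
for n in (1_000, 100_000, 10_000_000):
    stat = chi_square(perturbed, n)
    print(f"n={n:>10}  MAD={mad(perturbed):.6f}  "
          f"chi2={stat:.2f}  reject at 5%: {stat > CRIT_5PCT_DF8}")
```

In this idealized setting the same practically negligible deviation goes unrejected at the smaller sample sizes and is rejected at the largest one, which is the behaviour that motivates the severity-based assessment discussed in the paper.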


Availability of data and materials

The data used in the paper are publicly available at https://web.williams.edu/Mathematics/sjmiller/public_html/benfordresources/.

Code availability

R scripts are available upon request.

References

Barney BJ, Schulzke KS (2016) Moderating “cry wolf” events with excess MAD in Benford’s law research and practice. J Forensic Account Res 1(1):A66–A90. https://doi.org/10.2308/jfar-51622
Benford F (1938) The law of anomalous numbers. Proc Am Philos Soc 78(4):551–572
Berkson J (1938) Some difficulties of interpretation encountered in the application of the chi-square test. J Am Stat Assoc 33(203):526–536. https://doi.org/10.1080/01621459.1938.10502329
Block HW, Savits TH (2010) A general example for Benford data. Am Stat 64(4):335–339. https://doi.org/10.1198/tast.2010.09169
Cerqueti R, Lupi C (2021) Some new tests of conformity with Benford’s law. Stats 4(3):745–761. https://doi.org/10.3390/stats4030044
Cho WKT, Gaines BJ (2007) Breaking the (Benford) law: statistical fraud detection in campaign finance. Am Stat 61(3):218–223. https://doi.org/10.1198/000313007x223496
Cohen J (1994) The earth is round (p < 0.05). Am Psychol 49(12):997–1003. https://doi.org/10.1037/0003-066x.49.12.997
Drake PD, Nigrini MJ (2000) Computer assisted analytical procedures using Benford’s law. J Account Educ 18(2):127–146. https://doi.org/10.1016/s0748-5751(00)00008-7
Druică E, Oancea B, Vâlsan C (2018) Benford’s law and the limits of digit analysis. Int J Account Inf Syst 31:75–82. https://doi.org/10.1016/j.accinf.2018.09.004
Fewster RM (2009) A simple explanation of Benford’s law. Am Stat 63(1):26–32. https://doi.org/10.1198/tast.2009.0005
Granger CW (1998) Extracting information from mega-panels and high-frequency data. Stat Neerl 52(3):258–272. https://doi.org/10.1111/1467-9574.00084
Hill TP (1995a) Base-invariance implies Benford’s law. Proc Am Math Soc 123(3):887–895. https://doi.org/10.1090/s0002-9939-1995-1233974-8
Hill TP (1995b) A statistical derivation of the significant-digit law. Stat Sci 10(4):354–363. https://doi.org/10.1214/ss/1177009869
Kaiser M (2019) Benford’s law as an indicator of survey reliability—can we trust our data? J Econ Surv 33(5):1602–1618. https://doi.org/10.1111/joes.12338
Kossovsky AE (2021) On the mistaken use of the chi-square test in Benford’s law. Stats 4(2):419–453. https://doi.org/10.3390/stats4020027
Lehmann EL, Romano JP (2005) Testing statistical hypotheses, 3rd edn. Springer Texts in Statistics. Springer, New York
Li F, Han S, Zhang H et al (2019) Application of Benford’s law in data analysis. J Phys Conf Ser 1168(3):032133. https://doi.org/10.1088/1742-6596/1168/3/032133
Lindley DV (1957) A statistical paradox. Biometrika 44(1/2):187–192. https://doi.org/10.2307/2333251
Mayo DG (2018) Statistical inference as severe testing: how to get beyond the statistics wars. Cambridge University Press, Cambridge
Mayo DG, Spanos A (2006) Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. Br J Philos Sci 57(2):323–357. https://doi.org/10.1093/bjps/axl003
Mayo DG, Spanos A (2010) The error-statistical philosophy. In: Mayo DG, Spanos A (eds) Error and inference—recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science. Cambridge University Press, Cambridge, chap 2, pp 15–27
Mayo DG, Spanos A (2011) Error statistics. In: Bandyopadhyay PS, Forster MR (eds) Handbook of the philosophy of science, vol 7. Philosophy of statistics. Elsevier, pp 153–198. https://doi.org/10.1016/b978-0-444-51862-0.50005-8
Newcomb S (1881) Note on the frequency of use of the different digits in natural numbers. Am J Math 4(1):39–40. https://doi.org/10.2307/2369148
Nigrini MJ (2012) Benford’s law: applications for forensic accounting, auditing, and fraud detection. John Wiley & Sons, Hoboken. https://doi.org/10.1002/9781119203094
R Development Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://R-project.org
Raimi RA (1976) The first digit problem. Am Math Mon 83(7):521–538. https://doi.org/10.2307/2319349
Rodriguez RJ (2004) First significant digit patterns from mixtures of uniform distributions. Am Stat 58(1):64–71. https://doi.org/10.1198/0003130042782
Ross KA (2011) Benford’s law, a growth industry. Am Math Mon 118(7):571–583. https://doi.org/10.4169/amer.math.monthly.118.07.571
Stigler SM (1980) Stigler’s law of eponymy. Trans N Y Acad Sci 39(1 Series II):147–157. https://doi.org/10.1111/j.2164-0947.1980.tb02775.x
Tsagbey S, de Carvalho M, Page GL (2017) All data are wrong, but some are useful? Advocating the need for data auditing. Am Stat 71(3):231–235. https://doi.org/10.1080/00031305.2017.1311282
Whyman G, Shulzinger E, Bormashenko E (2016) Intuitive considerations clarifying the origin and applicability of the Benford law. Results Phys 6:3–6. https://doi.org/10.1016/j.rinp.2015.11.010
Wickham H (2016) ggplot2: Elegant graphics for data analysis. Use R!, Springer, New York. https://ggplot2.tidyverse.org

Acknowledgements

We would like to express our gratitude to Aris Spanos for his comments and suggestions on an early draft of this paper. Comments from Marcel Ausloos and two anonymous referees are gratefully acknowledged. We owe special thanks to Alex Kossovsky for making his data public. All computations were carried out using R 4.1.2 (R Development Core Team 2021); the graphs benefited greatly from the package “ggplot2” (Wickham 2016).

Funding

No funds, grants, or other support was received.

Author information

Both authors contributed equally to this work.

Authors and Affiliations

Department of Social and Economic Sciences, La Sapienza University of Rome, P.le Aldo Moro 5, 00185, Rome, Italy
Roy Cerqueti
GRANEM, Université d’Angers, SFR Confluences, 49000, Angers, France
Roy Cerqueti
Department of Economics, University of Molise, Via F. De Sanctis, 86100, Campobasso, Italy
Claudio Lupi

Corresponding author

Correspondence to Claudio Lupi.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cerqueti, R., Lupi, C. Severe testing of Benford’s law. TEST 32, 677–694 (2023). https://doi.org/10.1007/s11749-023-00848-z

Received: 20 January 2023
Accepted: 05 February 2023
Published: 28 February 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11749-023-00848-z
