Advertisement

Computational Statistics

, Volume 29, Issue 5, pp 1095–1128 | Cite as

On the accuracy of statistical procedures in Microsoft Excel 2010

  • Guy Mélard
Original Paper

Abstract

All previous versions of Microsoft Excel until Excel 2007 have been criticized by statisticians for several reasons, including the accuracy of statistical functions, the properties of the random number generator, the quality of statistical add-ins, the weakness of the Solver for nonlinear regression, and the data graphical representation. Until recently Microsoft did not make an attempt to fix all the errors in Excel and was still marketing a product that contained known errors. We provide an update of these studies given the recent release of Excel 2010 and we have added OpenOffice.org Calc 3.3 and Gnumeric 1.10.16 to the analysis, for the purpose of comparison. The conclusion is that the stream of papers, mainly in Computational Statistics and Data Analysis, has started to pay off: Microsoft has partially improved the statistical aspects of Excel, essentially the statistical functions and the random number generator.

Keywords

Statistical function Random number generator Nonlinear regression OpenOffice.org Calc 3.3 Gnumeric 

Notes

Acknowledgments

I would like to thank Christian Ritter, Pierre Dagnelie, Bruce McCullough, and Talha Yalta for some references and Atika Cohen and two anonymous referees of a first version for their comments. I am deeply grateful to Bruce McCullough and Talha Yalta who kindly gave me their test files, for their support and their comments. I am not sure I would have completed this paper without the insistence of Bruce McCullough. I thank Richard Simard for his advice on TestU01. I would like to thank David Heiser, Daniel Fylstra and Skylab Gupta, both of Frontline Systems, for their comments on a second version and Joseph Billo on a third version. I am indebted to Jerry W. Lewis for his remarks, although not all could be addressed. I would like to thank the two anonymous referees of the last version of the paper, the co-editor Juergen Symanzik and the production editor Balaji Rajagopal, for their help, and Wassim Cornet for editing. I assume full responsibility for the paper. It is dedicated to the memory of Leo Knüsel who passed away on April 22, 2013.

Supplementary material

180_2014_482_MOESM1_ESM.pdf (217 kb)
Supplementary material 1 (pdf 217 KB)

References

  1. Almiron MG, Lopes B, Oliveira ALC, Medeiros AC, Frery AC (2010) On the numerical accuracy of spreadsheets. J Stat Softw 34(4):1–29Google Scholar
  2. Bailey DH (1993) A portable high performance multiprecision package. RNR technical report 90–022, NASA Ames Research Center, Moffett Field, Calif.Google Scholar
  3. Berger RL (2007) Nonstandard operator precedence in Excel. Comput Stat Data Anal 51:2788–2791CrossRefMATHGoogle Scholar
  4. Cleveland W (1994) The elements of graphing data, 2nd edn. Hobart Press, Summit, NJGoogle Scholar
  5. Crabtree C (2009) How random is random. http://cliffcrabtree.net/randomarticle.html. Accessed 4 Feb 2014
  6. Cryer JD (2001) Problems with using Microsoft Excel for statistics. In: Proceedings of the joint statistical meetings. American Statistical Association, Atlanta.Google Scholar
  7. European Spreadsheet Risks Interest Group (2013) http://www.eusprig.org/index.htm. Accessed 29 Jan 2014
  8. Fylstra D, Lasdon L, Watson J, Waren AD (1998) Design and use of the Microsoft Excel Solver. Interfaces 28:29–55CrossRefGoogle Scholar
  9. Guibas LJ, Odlyzko AM (1981) Periods in strings. J Comb Theory A 30:1942Google Scholar
  10. Hargreaves BR, McWilliams TP (2010) Polynomial trendline function flaws in Microsoft Excel. Comput Stat Data Anal 54:1190–1196MathSciNetCrossRefMATHGoogle Scholar
  11. Heiser DA (2009) Microsoft Excel 2000, 2003 and 2007 faults, problems, workarounds and fixes. http://www.daheiser.info/excel/frontpage.html. Accessed 26 Feb 2013
  12. Herndon T, Ash M. Pollin R (2013) Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. PERI working paper no. 322, April, University of Massachusetts Amherst.Google Scholar
  13. Hümmer T (2010) International OpenOffice market shares. http://www.webmasterpro.de/portal/news/2010/02/05/international-openoffice-market-shares.html. Accessed 29 Jan 2014
  14. IBM (2010) The risks of using spreadsheets for statistical analysis. IBM software business analytics, Somers. http://public.dhe.ibm.com/common/ssi/ecm/en/ytw03240usen/YTW03240USEN.PDF. Accessed 29 Jan 2014
  15. Keeling KB, Pavur RJ (2011) Statistical accuracy of spreadsheet software. Am Stat 65:265–273MathSciNetCrossRefGoogle Scholar
  16. Knüsel L (1998) On the accuracy of statistical distributions in Microsoft Excel 97. Comput Stat Data Anal 26:375–377CrossRefGoogle Scholar
  17. Knüsel L (2003) Computation of statistical distributions–documentation of program ELV, 2nd edn. http://www.statistik.lmu.de/ knuesel. Accessed 26 Feb 2013
  18. Knüsel L (2005) On the accuracy of statistical distributions in Microsoft Excel 2003. Comput Stat Data Anal 48:445–449CrossRefMATHGoogle Scholar
  19. Knüsel L (2011) On the accuracy of statistical distributions in Microsoft Excel 2010. http://www.csdassn.org/software_reports/Excel2011.pdf. Accessed 29 Jan 2014
  20. Koenker R (2005) Quantile regression. Cambridge University Press, CambridgeCrossRefMATHGoogle Scholar
  21. Lasdon LS, Waren AD, Jain A, Ratner M (1978) Design and testing of a generalized reduced gradient code for nonlinear programming. ACM Trans Math Softw 4:34–50CrossRefMATHGoogle Scholar
  22. L’Ecuyer P, Simard R (2007) TESTU01: A C library for empirical testing of random number generators. ACM Trans Math Softw 33(22):1–40MathSciNetGoogle Scholar
  23. L’Ecuyer P, Simard R (2009) TestU01: A software library in ANSI C for empirical testing of random number generators. User’s guide, compact version. http://www.iro.umontreal.ca/ \(\sim \)lecuyer/. Accessed 29 Jan 2014
  24. Matsumoto M, Nishimura T (1998) Mersenne Twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans Model Comput Simul 8:3–30CrossRefMATHGoogle Scholar
  25. McCullough BD (1998) Assessing the reliability of statistical software: part I. Am Stat 52:358–366Google Scholar
  26. McCullough BD (1999) Assessing the reliability of statistical software: part II. Am Stat 53:149–159Google Scholar
  27. McCullough BD (2000) The accuracy of Mathematica 4 as a statistical package. Comput Stat 15:279–299CrossRefMATHGoogle Scholar
  28. McCullough BD (2004a) Some details of nonlinear estimation. In: Altman M, Gill J, McDonald MP (eds) Numerical issues in statistical computing for the social scientist, Chapter 8 . Wiley, New York, pp 199–218Google Scholar
  29. McCullough BD (2004b) Wilkinson’s tests and econometric software. J Econ Soc Meas 29:261–270Google Scholar
  30. McCullough BD (2004c) Fixing statistical errors in spreadsheet software: the cases of Gnumeric and Excel. CSDA Statistical Software Newsletter. www.csdassn.org/software_reports.cfm.Google Scholar
  31. McCullough BD (2006) A review of TESTU01. J Appl Econ 21:677–682CrossRefGoogle Scholar
  32. McCullough BD (2008a) Editorial: special section on Microsoft Excel 2007. Comput Stat Data Anal 52:4568–4569MathSciNetCrossRefMATHGoogle Scholar
  33. McCullough BD (2008b) Microsoft Excel 2007’s ’Not the Wichmann-Hill’ random number generator. Comput Stat Data Anal 52:4587–4593MathSciNetCrossRefMATHGoogle Scholar
  34. McCullough BD, Wilson B (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Comput Stat Data Anal 31:27–37CrossRefMATHGoogle Scholar
  35. McCullough BD, Renfro CR (2000) Some numerical aspects of nonlinear estimation. J Econ Soc Meas 26:63–77Google Scholar
  36. McCullough BD, Wilson B (2002) On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Comput Stat Data Anal 40:713–721MathSciNetCrossRefMATHGoogle Scholar
  37. McCullough BD, Wilson B (2005) On the accuracy of statistical procedures in Microsoft Excel 2003. Comput Stat Data Anal 49:1244–1252MathSciNetCrossRefMATHGoogle Scholar
  38. McCullough BD, Heiser DA (2008) On the accuracy of statistical procedures in Microsoft Excel 2007. Comput Stat Data Anal 52:4570–4578MathSciNetCrossRefMATHGoogle Scholar
  39. McCullough BD, Yalta AT (2013) Spreadsheets in the cloud–not ready yet. J Stat Softw 52(7):1–14Google Scholar
  40. Microsoft (2013) Changes in Office 2013. http://technet.microsoft.com/en-us/library/cc178954.aspx. Accessed 29 Jan 2014
  41. Oppenheimer DM (2009) Function improvements in Excel 2010. http://blogs.msdn.com/b/excel/archive/2009/09/10/function-improvements-in-excel-2010.aspx. Accessed 29 Jan 2014
  42. Reinhart CM, Rogoff KS (2010) Growth in a time of debt. Am Econ Rev 100:573–578CrossRefGoogle Scholar
  43. Rinnooy-Kan AHG, Timmer GT (1987a) Stochastic global optimization methods. Part I: clustering methods. Math Program 39:27–56MathSciNetMATHGoogle Scholar
  44. Rinnooy-Kan AHG, Timmer GT (1987b) Stochastic global optimization methods. Part II: multi level methods. Math Program 39:57–78MathSciNetCrossRefMATHGoogle Scholar
  45. Sawitzki G (1994) Report on the numerical reliability of data analysis systems. Comput Stat Data Anal 18:289–301CrossRefGoogle Scholar
  46. Simon G (2000) Excel produces numerically inaccurate results. https://www.jiscmail.ac.uk/cgi-bin/wa.exe?A2=ind0012&L=ASSUME&F=&S=&X=31BDFD6E4E1E0E4F11&P=4325. Accessed 5 Sept 2010 (Online resource 1)
  47. Su Y-S (2008) It’s easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs. Comput Stat Data Anal 52:4594–4601CrossRefMATHGoogle Scholar
  48. Tufte ER (1983) The visual display of quantitative information. Graphic Press, CheshireGoogle Scholar
  49. Walkenbach J (2007) Excel 2007 charts. Wiley, New YorkGoogle Scholar
  50. Wichmann BA (1985) Hill ID (1982) Algorithm AS 183: an efficient and portable pseudo-random number generator. J R Statist Soc Ser C Appl Stat 31:188–190. Reprinted with correction. In: Hill ID, Griffith P (eds) Applied statistics algorithms. Ellis Horwood, ChichesterGoogle Scholar
  51. Yalta AT (2008) The accuracy of statistical distributions in Microsoft Excel 2007. Comput Stat Data Anal 52:4579–4586MathSciNetCrossRefMATHGoogle Scholar
  52. Yalta AT, Jenal O (2009) On the importance of verifying forecasting results. Int J Forecast 25:62–73CrossRefGoogle Scholar
  53. Yalta AT, Yalta AY (2010) Should economists use open source software for doing research? Comput Econ 35:371–394CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. 1.Université libre de BruxellesECARES and Solvay Brussels School of Economics and ManagementBrusselsBelgium

Personalised recommendations