Abstract
All previous versions of Microsoft Excel until Excel 2007 have been criticized by statisticians for several reasons, including the accuracy of statistical functions, the properties of the random number generator, the quality of statistical add-ins, the weakness of the Solver for nonlinear regression, and the data graphical representation. Until recently Microsoft did not make an attempt to fix all the errors in Excel and was still marketing a product that contained known errors. We provide an update of these studies given the recent release of Excel 2010 and we have added OpenOffice.org Calc 3.3 and Gnumeric 1.10.16 to the analysis, for the purpose of comparison. The conclusion is that the stream of papers, mainly in Computational Statistics and Data Analysis, has started to pay off: Microsoft has partially improved the statistical aspects of Excel, essentially the statistical functions and the random number generator.
This is a preview of subscription content, access via your institution.






References
Almiron MG, Lopes B, Oliveira ALC, Medeiros AC, Frery AC (2010) On the numerical accuracy of spreadsheets. J Stat Softw 34(4):1–29
Bailey DH (1993) A portable high performance multiprecision package. RNR technical report 90–022, NASA Ames Research Center, Moffett Field, Calif.
Berger RL (2007) Nonstandard operator precedence in Excel. Comput Stat Data Anal 51:2788–2791
Cleveland W (1994) The elements of graphing data, 2nd edn. Hobart Press, Summit, NJ
Crabtree C (2009) How random is random. http://cliffcrabtree.net/randomarticle.html. Accessed 4 Feb 2014
Cryer JD (2001) Problems with using Microsoft Excel for statistics. In: Proceedings of the joint statistical meetings. American Statistical Association, Atlanta.
European Spreadsheet Risks Interest Group (2013) http://www.eusprig.org/index.htm. Accessed 29 Jan 2014
Fylstra D, Lasdon L, Watson J, Waren AD (1998) Design and use of the Microsoft Excel Solver. Interfaces 28:29–55
Guibas LJ, Odlyzko AM (1981) Periods in strings. J Comb Theory A 30:1942
Hargreaves BR, McWilliams TP (2010) Polynomial trendline function flaws in Microsoft Excel. Comput Stat Data Anal 54:1190–1196
Heiser DA (2009) Microsoft Excel 2000, 2003 and 2007 faults, problems, workarounds and fixes. http://www.daheiser.info/excel/frontpage.html. Accessed 26 Feb 2013
Herndon T, Ash M. Pollin R (2013) Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. PERI working paper no. 322, April, University of Massachusetts Amherst.
Hümmer T (2010) International OpenOffice market shares. http://www.webmasterpro.de/portal/news/2010/02/05/international-openoffice-market-shares.html. Accessed 29 Jan 2014
IBM (2010) The risks of using spreadsheets for statistical analysis. IBM software business analytics, Somers. http://public.dhe.ibm.com/common/ssi/ecm/en/ytw03240usen/YTW03240USEN.PDF. Accessed 29 Jan 2014
Keeling KB, Pavur RJ (2011) Statistical accuracy of spreadsheet software. Am Stat 65:265–273
Knüsel L (1998) On the accuracy of statistical distributions in Microsoft Excel 97. Comput Stat Data Anal 26:375–377
Knüsel L (2003) Computation of statistical distributions–documentation of program ELV, 2nd edn. http://www.statistik.lmu.de/ knuesel. Accessed 26 Feb 2013
Knüsel L (2005) On the accuracy of statistical distributions in Microsoft Excel 2003. Comput Stat Data Anal 48:445–449
Knüsel L (2011) On the accuracy of statistical distributions in Microsoft Excel 2010. http://www.csdassn.org/software_reports/Excel2011.pdf. Accessed 29 Jan 2014
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Lasdon LS, Waren AD, Jain A, Ratner M (1978) Design and testing of a generalized reduced gradient code for nonlinear programming. ACM Trans Math Softw 4:34–50
L’Ecuyer P, Simard R (2007) TESTU01: A C library for empirical testing of random number generators. ACM Trans Math Softw 33(22):1–40
L’Ecuyer P, Simard R (2009) TestU01: A software library in ANSI C for empirical testing of random number generators. User’s guide, compact version. http://www.iro.umontreal.ca/ \(\sim \)lecuyer/. Accessed 29 Jan 2014
Lewis JW (2005). https://groups.google.com/forum/#!msg/microsoft.public.excel/n4PKPeo45QE/9a8eXXiV-24J. Accessed 4 Feb 2014
Matsumoto M, Nishimura T (1998) Mersenne Twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans Model Comput Simul 8:3–30
McCullough BD (1998) Assessing the reliability of statistical software: part I. Am Stat 52:358–366
McCullough BD (1999) Assessing the reliability of statistical software: part II. Am Stat 53:149–159
McCullough BD (2000) The accuracy of Mathematica 4 as a statistical package. Comput Stat 15:279–299
McCullough BD (2004a) Some details of nonlinear estimation. In: Altman M, Gill J, McDonald MP (eds) Numerical issues in statistical computing for the social scientist, Chapter 8 . Wiley, New York, pp 199–218
McCullough BD (2004b) Wilkinson’s tests and econometric software. J Econ Soc Meas 29:261–270
McCullough BD (2004c) Fixing statistical errors in spreadsheet software: the cases of Gnumeric and Excel. CSDA Statistical Software Newsletter. www.csdassn.org/software_reports.cfm.
McCullough BD (2006) A review of TESTU01. J Appl Econ 21:677–682
McCullough BD (2008a) Editorial: special section on Microsoft Excel 2007. Comput Stat Data Anal 52:4568–4569
McCullough BD (2008b) Microsoft Excel 2007’s ’Not the Wichmann-Hill’ random number generator. Comput Stat Data Anal 52:4587–4593
McCullough BD, Wilson B (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Comput Stat Data Anal 31:27–37
McCullough BD, Renfro CR (2000) Some numerical aspects of nonlinear estimation. J Econ Soc Meas 26:63–77
McCullough BD, Wilson B (2002) On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Comput Stat Data Anal 40:713–721
McCullough BD, Wilson B (2005) On the accuracy of statistical procedures in Microsoft Excel 2003. Comput Stat Data Anal 49:1244–1252
McCullough BD, Heiser DA (2008) On the accuracy of statistical procedures in Microsoft Excel 2007. Comput Stat Data Anal 52:4570–4578
McCullough BD, Yalta AT (2013) Spreadsheets in the cloud–not ready yet. J Stat Softw 52(7):1–14
Microsoft (2009a) Function improvements in Microsoft Excel 2010, White paper. http://blogs.office.com/cfs-filesystemfile.ashx/__key/CommunityServer-Components-PostAttachments/00-00-00-49-53/ExcelFunctionImprovements-10_2D00_05_2D00_09-_2800_1_2900_. Accessed 26 Feb 2013
Microsoft (2009b) About Solver. http://office.microsoft.com/en-us/excel-help/about-solver-HP005198368.aspx?pid=CH010004571033. Accessed 29 Jan 2014
Microsoft (2013) Changes in Office 2013. http://technet.microsoft.com/en-us/library/cc178954.aspx. Accessed 29 Jan 2014
Oppenheimer DM (2009) Function improvements in Excel 2010. http://blogs.msdn.com/b/excel/archive/2009/09/10/function-improvements-in-excel-2010.aspx. Accessed 29 Jan 2014
Reinhart CM, Rogoff KS (2010) Growth in a time of debt. Am Econ Rev 100:573–578
Rinnooy-Kan AHG, Timmer GT (1987a) Stochastic global optimization methods. Part I: clustering methods. Math Program 39:27–56
Rinnooy-Kan AHG, Timmer GT (1987b) Stochastic global optimization methods. Part II: multi level methods. Math Program 39:57–78
Sawitzki G (1994) Report on the numerical reliability of data analysis systems. Comput Stat Data Anal 18:289–301
Simon G (2000) Excel produces numerically inaccurate results. https://www.jiscmail.ac.uk/cgi-bin/wa.exe?A2=ind0012&L=ASSUME&F=&S=&X=31BDFD6E4E1E0E4F11&P=4325. Accessed 5 Sept 2010 (Online resource 1)
Su Y-S (2008) It’s easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs. Comput Stat Data Anal 52:4594–4601
Tufte ER (1983) The visual display of quantitative information. Graphic Press, Cheshire
Walkenbach J (2007) Excel 2007 charts. Wiley, New York
Wichmann BA (1985) Hill ID (1982) Algorithm AS 183: an efficient and portable pseudo-random number generator. J R Statist Soc Ser C Appl Stat 31:188–190. Reprinted with correction. In: Hill ID, Griffith P (eds) Applied statistics algorithms. Ellis Horwood, Chichester
Yalta AT (2008) The accuracy of statistical distributions in Microsoft Excel 2007. Comput Stat Data Anal 52:4579–4586
Yalta AT, Jenal O (2009) On the importance of verifying forecasting results. Int J Forecast 25:62–73
Yalta AT, Yalta AY (2010) Should economists use open source software for doing research? Comput Econ 35:371–394
Acknowledgments
I would like to thank Christian Ritter, Pierre Dagnelie, Bruce McCullough, and Talha Yalta for some references and Atika Cohen and two anonymous referees of a first version for their comments. I am deeply grateful to Bruce McCullough and Talha Yalta who kindly gave me their test files, for their support and their comments. I am not sure I would have completed this paper without the insistence of Bruce McCullough. I thank Richard Simard for his advice on TestU01. I would like to thank David Heiser, Daniel Fylstra and Skylab Gupta, both of Frontline Systems, for their comments on a second version and Joseph Billo on a third version. I am indebted to Jerry W. Lewis for his remarks, although not all could be addressed. I would like to thank the two anonymous referees of the last version of the paper, the co-editor Juergen Symanzik and the production editor Balaji Rajagopal, for their help, and Wassim Cornet for editing. I assume full responsibility for the paper. It is dedicated to the memory of Leo Knüsel who passed away on April 22, 2013.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Mélard, G. On the accuracy of statistical procedures in Microsoft Excel 2010. Comput Stat 29, 1095–1128 (2014). https://doi.org/10.1007/s00180-014-0482-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-014-0482-5
Keywords
- Statistical function
- Random number generator
- Nonlinear regression
- OpenOffice.org Calc 3.3
- Gnumeric