Computational Statistics

, Volume 32, Issue 4, pp 1411–1421 | Cite as

Inaccurate regression coefficients in Microsoft Excel 2003: an investigation of Volpi’s “zero bug”

Original Paper

Abstract

Leonardo Volpi found that Excel 2003, rather than report correct coefficients, would sometimes change them to zero. We have investigated this so-called “zero bug” of the linear regression function LINEST(), and have found that the inaccuracy is caused by a non-standard modified back-substitution procedure. The modification, for which we can find no justification in the numerical analysis or statistical literature, uses a logic to control the bug: when certain conditions are met, accurate coefficients are replaced with inaccurate coefficients that may be zeros or nonzeros. Although Excel 2003 is now out of support, it is still in use. We do not know whether the modification is limited to Excel 2003, or whether Microsoft has programmed similar inaccuracies into other functions or other versions of Excel.

Keywords

Linear regression function LINEST() StRD KB828533 QR decomposition Back-substitution IDA Pro free version IEEE-754 calculator 

Notes

Acknowledgements

We thank the developers of IDA Pro free version and IEEE-754 calculator. Without these free applications, our investigation could not be realized. We also thank very much to Volpi’s group for sharing their application, their discovery and their test dataset. We thank various referees and editors for useful comments on the present version and previous versions of this paper. In particular, we thank a referee who verified our x86 processor instructions obtained with IDA Pro. We also thank Talha Yalta, Robert Pavur, Kellie Keeling, and especially Guy Mélard for helpful comments. The research by Professors Sun and Fukuda was funded by Kenkyuhi of Kyushu Sangyo University.

References

  1. Almiron M, Lopes B, Oliveira A, Medeiros A, Frery A (2010) On the numerical accuracy of spreadsheets. J Stat Softw 34(1):1–29Google Scholar
  2. Carlberg C (2013) Forcing the constant in regression to zero: understanding excel’s LINEST() error. http://www.informit.com/articles/article.aspx?p=2019170. Accessed 20 Aug 2017
  3. Cohen J, Cohen P, West SG, Aiken LS (2003) Applied multiple regression/correlation analysis for the behavioral sciences, 3rd edn. Lawrence Earlbaum and Associates, MahwayGoogle Scholar
  4. Keeling KB, Pavur RJ (2005) Numerical accuracy issues in using excel for simulation studies. In: Proceedings of the 2004 winter simulation conference, pp 1513–1518Google Scholar
  5. Keeling KB, Pavur RJ (2011) Statistical accuracy of spreadsheet software. Am Stat 65(4):265–273CrossRefMathSciNetGoogle Scholar
  6. Knüsel L (1998) On the accuracy of statistical distributions in Microsoft Excel 97. Comput Stat Data Anal 26(3):375–377CrossRefMathSciNetGoogle Scholar
  7. Knüsel L (2005) On the accuracy of statistical distributions in Microsoft Excel 2003. Comput Stat Data Anal 48:445–449CrossRefMATHMathSciNetGoogle Scholar
  8. McCullough BD (2008) Microsoft Excel’s ’not the Wichmann–Hill’ random number generators. Comput Stat Data Anal 52(10):4587–4593CrossRefMATHMathSciNetGoogle Scholar
  9. McCullough BD, Heiser DA (2008) On the accuracy of statistical procedures in Microsoft Excel 2007. Comput Stat Data Anal 52(10):4570–4578CrossRefMATHMathSciNetGoogle Scholar
  10. McCullough BD, Wilson B (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Comput Stat Data Anal 31(1):27–37CrossRefMATHGoogle Scholar
  11. McCullough BD, Wilson B (2002) On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Comput Stat Data Anal 40(4):713–721CrossRefMATHMathSciNetGoogle Scholar
  12. McCullough BD, Wilson B (2005) On the accuracy of statistical procedures in Microsoft Excel 2003. Comput Stat Data Anal 49(4):1244–1252CrossRefMATHMathSciNetGoogle Scholar
  13. Mélard G (2014) On the accuracy of statistical procedures in Microsoft Excel 2010. Comput Stat 29(5):1095–1128CrossRefMATHMathSciNetGoogle Scholar
  14. Patrizio A (2013) Office 2003 is a bigger threat to Microsoft than Google Docs.http://www.networkworld.com/article/2225614/microsoft-subnet/office-2003-is-a-bigger-threat-to-microsoft-than-google-docs.html. Accessed 27 Mar 2017
  15. Sawitzki G (1994) Report on the numerical reliability of data analysis systems. Comput Stat Data Anal 18(2):289–301CrossRefGoogle Scholar
  16. Simonoff J (2008) Statistical analysis using Microsoft Excel. Manuscript. http://pages.stern.nyu.edu/~jsimonof/classes/1305/pdf/excelreg.pdf. Accessed 23 Mar 2017
  17. Thisted RA (1988) Elements of statistical computing. Chapman and Hall, New YorkMATHGoogle Scholar
  18. Yalta AT (2008) The accuracy of statistical distributions in Microsoft Excel 2007. Comput Stat Data Anal 52(10):4579–4586CrossRefMATHMathSciNetGoogle Scholar
  19. Yalta AT, Yalta AY (2009) Should economists use open source software for doing research? Comput Econ 35(4):371–394CrossRefGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. 1.Faculty of ManagementKyushu Sangyo UniversityFukuokaJapan
  2. 2.LeBow College of BusinessDrexel UniversityPhiladelphiaUSA

Personalised recommendations