Inaccurate regression coefficients in Microsoft Excel 2003: an investigation of Volpi’s “zero bug”
Leonardo Volpi found that Excel 2003, rather than reporting correct regression coefficients, would sometimes change them to zero. We have investigated this so-called “zero bug” of the linear regression function LINEST() and have found that the inaccuracy is caused by a non-standard, modified back-substitution procedure. The modification, for which we can find no justification in the numerical analysis or statistical literature, applies conditional logic: when certain conditions are met, accurate coefficients are replaced with inaccurate ones, which may be zero or nonzero. Although Excel 2003 is no longer supported, it is still in use. We do not know whether the modification is limited to Excel 2003, or whether Microsoft has programmed similar inaccuracies into other functions or other versions of Excel.
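For orientation, the following is a minimal sketch of textbook back-substitution, the standard step that recovers regression coefficients from the upper-triangular factor after a QR decomposition. It is not a reconstruction of Excel's modified procedure, whose conditions are not stated here. The function name `back_substitute` and the small example system are assumptions made purely for illustration.

```python
def back_substitute(R, c):
    """Solve R x = c for x, where R is an n-by-n upper-triangular matrix.

    Textbook back-substitution (illustrative only; not Excel's routine).
    R is a list of rows, c is the right-hand side.
    """
    n = len(c)
    x = [0.0] * n
    # Work upward from the last row, which contains a single unknown.
    for i in range(n - 1, -1, -1):
        if R[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot: R is singular")
        # Subtract the contribution of the unknowns already solved.
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - known) / R[i][i]
    return x


# Hypothetical 3-by-3 upper-triangular system, chosen so the solution is exact.
R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
c = [1.0, 12.0, 12.0]
coefficients = back_substitute(R, c)
```

In the standard procedure every coefficient is computed from the triangular system and returned as is; no step overwrites a computed value.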
Keywords: Linear regression function LINEST(); StRD; KB828533; QR decomposition; Back-substitution; IDA Pro free version; IEEE-754 calculator
We thank the developers of the IDA Pro free version and the IEEE-754 calculator; without these free applications, our investigation would not have been possible. We are also very grateful to Volpi’s group for sharing their application, their discovery, and their test dataset. We thank various referees and editors for useful comments on the present and previous versions of this paper. In particular, we thank a referee who verified the x86 processor instructions that we obtained with IDA Pro. We also thank Talha Yalta, Robert Pavur, Kellie Keeling, and especially Guy Mélard for helpful comments. The research by Professors Sun and Fukuda was funded by Kenkyuhi of Kyushu Sangyo University.