Computational Statistics

Volume 33, Issue 2, pp 757–786

On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables

  • Jason S. Bergtold
  • Krishna P. Pokharel
  • Allen M. Featherstone
  • Lijia Mo
Original Paper

Abstract

The numerical reliability of statistical software packages, including SAS 9.4, MATLAB R2015b, R 3.3.1, Stata/IC 14, and LIMDEP 10, was examined for logistic regression models. Thirty unique benchmark datasets were created by simulating alternative conditional binary choice processes that examine rare events, near-multicollinearity, quasi-separation, and nonlinear transformation of variables. Certified benchmark estimates of the parameters and standard errors for these datasets were obtained following standards set out by the National Institute of Standards and Technology. The logarithm of relative error was used as the measure of accuracy for numerical reliability. The paper finds that the choice of software package and estimation procedure affects the accuracy of logistic regression estimates, and that relying on default settings in these packages may significantly reduce the reliability of results in some situations.
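The abstract names the logarithm of relative error (LRE) as its accuracy measure but does not state a formula. The sketch below uses the definition common in the software reliability literature, LRE = -log10(|q - c| / |c|) for an estimate q and certified value c, falling back to -log10(|q|) when c is zero. The function name, the cap of 15 digits, and the floor at zero are illustrative assumptions, not details taken from the paper.

```python
import math


def log_relative_error(estimate: float, certified: float,
                       max_digits: float = 15.0) -> float:
    """Approximate number of correct significant digits in `estimate`.

    Assumed conventions (not taken from the paper):
    - an exact match returns `max_digits`, since log10(0) is undefined;
    - a certified value of zero uses the log absolute error instead;
    - the result is clipped to the range [0, max_digits].
    """
    if estimate == certified:
        return max_digits
    if certified == 0.0:
        # Relative error is undefined, so use the absolute error.
        lre = -math.log10(abs(estimate))
    else:
        lre = -math.log10(abs(estimate - certified) / abs(certified))
    return max(0.0, min(lre, max_digits))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(log_relative_error(1.001, 1.0))
```

Read this way, a larger LRE means more digits of agreement with the certified value, and an LRE near zero means the estimate shares essentially no significant digits with it.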

Keywords

Accuracy; Benchmark datasets; Logistic regression; Maximum likelihood estimation; Econometric software

Notes

Acknowledgements

Partial support for this research was obtained from the National Science Foundation Grant: From Crops to Commuting: Integrating the Social, Technological, and Agricultural Aspects of Renewable and Sustainable Biorefining (I-STAR); NSF Award No. DGE-0903701. The analysis and conclusions set forth are those of the authors based on the independent assessments of statistical software.

Supplementary material

180_2017_776_MOESM1_ESM.docx (79 kb)
Supplementary material 1 (docx 78 KB)
180_2017_776_MOESM2_ESM.zip (2 mb)
Supplementary material 2 (zip 2,021 KB)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Jason S. Bergtold (1)
  • Krishna P. Pokharel (2)
  • Allen M. Featherstone (2)
  • Lijia Mo (2)

  1. Department of Agricultural Economics, Kansas State University, Manhattan, USA
  2. Department of Agricultural Economics, Kansas State University, Manhattan, USA
