Abstract
The numerical reliability of statistical software packages for estimating logistic regression models was examined, including SAS 9.4, MATLAB R2015b, R 3.3.1, Stata/IC 14, and LIMDEP 10. Thirty unique benchmark datasets were created by simulating alternative conditional binary choice processes that exhibit rare events, near-multicollinearity, quasi-separation, and nonlinear transformations of variables. Certified benchmark estimates of the parameters and standard errors for these datasets were obtained following the standards set out by the National Institute of Standards and Technology. The log relative error (LRE) was used as the measure of accuracy for numerical reliability. The paper finds that the choice of software package and estimation procedure affects the accuracy of logistic regression estimates, and that relying on the default settings of these packages may substantially reduce the reliability of results in some situations.
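In the software-reliability literature the LRE is usually defined, following McCullough (1998), as LRE = −log10(|q − c| / |c|), where q is the estimate reported by a package and c is the certified value; it can be read roughly as the number of leading significant digits on which the two agree. The following is a minimal sketch assuming that definition. The function name is hypothetical, and the choices to fall back to the log absolute error when c = 0 and to return infinity on exact agreement are assumptions made here; reporting conventions (for example, capping or truncating LREs) vary between studies.

```python
import math


def log_relative_error(estimate: float, certified: float) -> float:
    """Log relative error (LRE) of `estimate` against a `certified` value.

    Roughly, the number of significant digits on which the two agree.
    Hypothetical helper; conventions for edge cases are assumptions.
    """
    if estimate == certified:
        # Zero error: the logarithm is unbounded, so report infinity.
        return math.inf
    if certified == 0.0:
        # Relative error is undefined; fall back to the log absolute error.
        return -math.log10(abs(estimate))
    return -math.log10(abs(estimate - certified) / abs(certified))


if __name__ == "__main__":
    certified = 1.2345678901  # hypothetical certified coefficient
    for estimate in (1.2345678, 1.2346, 1.3):
        lre = log_relative_error(estimate, certified)
        print(f"estimate={estimate:<10} LRE={lre:.2f}")
```

A larger LRE means more agreement with the certified value, so comparing LREs for the same coefficient across packages gives a digits-of-accuracy ranking.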
Notes
In any empirical work, researchers do not know the true parameter values; cross-validation of results across software packages therefore becomes critical for verifying the numerical reliability of estimates from nonlinear models.
Quasi-separation (also known as quasi-complete separation) occurs when a collection of covariates can almost completely separate the outcome groups in a discrete choice model. That is, only a few observations remain that make the outcome groups overlap; in terms of discriminant analysis, the discriminant can almost perfectly delineate the outcome groups (Hosmer et al. 2013).
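For a single covariate, the informal description above can be made concrete by asking how many observations the best possible cut-point rule misclassifies. The sketch below uses a hypothetical helper, `min_threshold_errors`; it is a rough screening device consistent with this note, not the formal definition of quasi-complete separation (which is stated in terms of a separating hyperplane over all covariates) and not a multi-covariate test. A count of zero means the covariate separates the outcomes completely; a small positive count is the "almost separated" situation described here.

```python
from typing import Sequence


def min_threshold_errors(x: Sequence[float], y: Sequence[int]) -> int:
    """Fewest observations misclassified by any single cut-point on one
    covariate, predicting y = 1 either above or below the cut.

    Hypothetical one-covariate screening helper (not a formal test).
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_ones = sum(label for _, label in pairs)
    best = n
    ones_below = 0
    # Cut i places the first i sorted observations below the threshold.
    for i in range(n + 1):
        if i > 0:
            ones_below += pairs[i - 1][1]
        # A cut cannot fall between two identical covariate values.
        if 0 < i < n and pairs[i - 1][0] == pairs[i][0]:
            continue
        zeros_below = i - ones_below
        ones_above = total_ones - ones_below
        zeros_above = (n - i) - ones_above
        # Rule A predicts 1 above the cut; rule B predicts 1 below it.
        errors_a = ones_below + zeros_above
        errors_b = zeros_below + ones_above
        best = min(best, errors_a, errors_b)
    return best


if __name__ == "__main__":
    # Hypothetical data: one y = 0 observation sits among the y = 1 group.
    x = [0.1, 0.4, 0.8, 1.1, 1.5, 1.9, 2.3, 2.7]
    y = [0, 0, 0, 1, 0, 1, 1, 1]
    print(min_threshold_errors(x, y))
```

In data like the example, a logistic regression on `x` can still be fitted, but the slope estimate tends to be large and poorly determined, which is where differences between optimizers and their default settings become visible.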
References
Altman M, Gill J, McDonald MP (2004) Numerical issues in statistical computing for the social scientist. Wiley, New York
Arnold BC, Castillo E, Sarabia JM (1999) Conditional specification of statistical models. Springer, New York
Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms. Wiley, Hoboken
Bergtold JS, Spanos A, Onukwugha E (2010) Bernoulli regression models: revisiting the specification of statistical models with binary dependent variables. J Choice Model 3(2):1–28
Cameron AC, Trivedi PK (2009) Microeconometrics: methods and applications. Cambridge University Press, New York
Chang JB, Lusk JL (2011) Mixed logit models: accuracy and software choice. J Appl Econom 26(1):167–172
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Econometric Software, Inc. (2012) LIMDEP 10 and NLOGIT 5. http://www.limdep.com/features/documentation.php. Accessed 15 Aug 2015
Greene WH (2002) Econometric analysis. Prentice Hall, Englewood Cliffs
Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley, Hoboken
Huber J, Train K (2001) On the similarity of classical and Bayesian estimates of individual mean partworths. Mark Lett 12(3):259–269
Kay R, Little S (1987) Transformations of the explanatory variables in the logistic regression model for binary data. Biometrika 74:495–501
Keeling KB, Pavur RJ (2007) A comparative study of the reliability of nine statistical software packages. Comput Stat Data Anal 51(8):3811–3831
Kolenikov S (2001) Review of Stata 7. J Appl Econom 16(5):637–646
McCullough BD (2000a) The accuracy of Mathematica 4 as a statistical package. Comput Stat 15(2):279–300
McCullough BD (2000b) Is it safe to assume that software is accurate? Int J Forecast 16(3):349–357
McCullough BD, Renfro CG (1998) Benchmarks and software standards: a case study of GARCH procedures. J Econ Soc Meas 25:59–71
McCullough BD, Renfro CG (2000) Some numerical aspects of nonlinear estimation. J Econ Soc Meas 26(1):63–77
McCullough BD (1998) Assessing the reliability of statistical software: part I. Am Stat 52(4):358–366
McCullough BD (1999a) Assessing the reliability of statistical software: part II. Am Stat 53(2):149–159
McCullough BD (1999b) Econometric software reliability: EViews, LIMDEP, SHAZAM and TSP. J Appl Econom 14(2):191–202
McCullough BD, Vinod HD (1999) The numerical reliability of econometric software. J Econ Lit 37:633–665
McCullough BD, Vinod HD (2003) Verifying the solution from a nonlinear solver: a case study. Am Econ Rev 93(3):873–892
McCullough BD, Wilson B (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Comput Stat Data Anal 31(1):27–37
McKenzie CR, Takaoka S (2003) 2002: a LIMDEP odyssey. J Appl Econom 18(2):241–247
Murray W (1972) Failure, the causes and cures. In: Murray W (ed) Numerical methods for unconstrained optimization. Academic Press, New York, pp 107–122
Musa JD, Iannino A, Okumoto K (1987) Software reliability: measurement, prediction, application. McGraw-Hill Inc, New York
National Institute of Standards and Technology (2014) Statistical reference datasets. http://www.itl.nist.gov/div898/strd. Accessed 15 Apr 2014
Odeh OO, Featherstone AM, Bergtold JS (2010) Reliability of statistical software. Am J Agric Econ 92(5):1472–1479
Oster RA (2002) An examination of statistical software packages for categorical data analysis using exact methods. Am Stat 56(3):235–246
Oster RA (2003) An examination of statistical software packages for categorical data analysis using exact methods: part II. Am Stat 57(3):201–213
Ryan TP (2009) Modern regression methods, 2nd edn. Wiley, Hoboken
SAS Manual (2009) SAS/STAT 13.2 user's guide. http://support.sas.com/documentation/cdl/en/statug/67523/PDF/default/statug.pdf. Accessed 25 Aug 2014
Scrucca L, Weisberg S (2004) A simulation study to investigate the behavior of the log-density ratio under normality. Commun Stat Simul Comput 33:159–178
Stokes HH (2004) On the advantage of using two or more econometric software systems to solve the same problem. J Econ Soc Meas 29(1):307–320
Tomek WG (1993) Confirmation and replication in empirical econometrics: a step toward improved scholarship. Am J Agric Econ 75(Special Issue):6–14
Train KE (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge
Wolfram (2015a) Unconstrained optimization: methods of local minimization. Wolfram Language & System online documentation. http://reference.wolfram.com/language/tutorial/UnconstrainedOptimizationOverview.html. Accessed 3 Dec 2015
Wolfram (2015b) Wolfram Mathematica tutorial collection. https://www.wolfram.com/learningcenter/tutorialcollection/complete/. Accessed 3 Dec 2015
Acknowledgements
Partial support for this research was obtained from the National Science Foundation Grant: From Crops to Commuting: Integrating the Social, Technological, and Agricultural Aspects of Renewable and Sustainable Biorefining (I-STAR); NSF Award No. DGE-0903701. The analysis and conclusions set forth are those of the authors based on the independent assessments of statistical software.
Cite this article
Bergtold, J.S., Pokharel, K.P., Featherstone, A.M. et al. On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables. Comput Stat 33, 757–786 (2018). https://doi.org/10.1007/s00180-017-0776-5
DOI: https://doi.org/10.1007/s00180-017-0776-5