On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables

  • Original Paper
  • Computational Statistics

Abstract

The numerical reliability of statistical software packages, including SAS 9.4, MATLAB R2015b, R 3.3.1, Stata/IC 14, and LIMDEP 10, was examined for logistic regression models. Thirty unique benchmark datasets were created by simulating alternative conditional binary choice processes involving rare events, near-multicollinearity, quasi-separation, and nonlinear transformations of variables. Certified benchmark estimates of the parameters and standard errors for each dataset were obtained following standards set out by the National Institute of Standards and Technology. The logarithm of relative error was used as the measure of accuracy for numerical reliability. The paper finds that the choice of software package and estimation procedure for logistic regression affects accuracy, and that relying on the default settings in these packages may significantly reduce the reliability of results in some situations.
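The accuracy measure named in the abstract can be illustrated with a minimal sketch. It assumes the definition of the logarithm of relative error (LRE) that is common in the software-reliability literature: the negative base-10 logarithm of the relative error against the certified value, falling back to the absolute error when the certified value is zero. The function name `lre` and the handling of an exact match are illustrative assumptions, not the authors' code, and the excerpt does not say whether the paper caps or truncates the score.

```python
import math


def lre(estimate: float, certified: float) -> float:
    """Logarithm of relative error (roughly, the number of correct digits).

    Assumed convention: when the certified value is zero the relative
    error is undefined, so the log absolute error is used instead.
    """
    if estimate == certified:
        # Exact agreement: every representable digit matches.
        return math.inf
    if certified == 0:
        return -math.log10(abs(estimate))
    return -math.log10(abs(estimate - certified) / abs(certified))


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from the paper).
    print(lre(0.73105, 0.73106))
```

A larger LRE means closer agreement with the certified value. A negative LRE means the estimate is off by more than the magnitude of the certified value itself.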

Notes

  1. In any empirical work, researchers do not know the true values; cross-validation of research results therefore becomes critical for verifying the numerical reliability of estimates from nonlinear models.

  2. Quasi-separation (also known as quasi-complete separation) occurs when a collection of covariates can almost completely separate the outcome groups in the discrete choice model. That is, only a few observations make the outcome groups overlap; in terms of discriminant analysis, the discriminant can almost perfectly delineate the outcome groups (Hosmer et al. 2013).
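The idea in Note 2 can be sketched for the simplest case of a single covariate. The sketch assumes the usual formal reading: separation is complete when a threshold splits the two outcome groups with no ties, and quasi-complete when the groups touch only at the threshold. The function name `separation_type` and the example data are hypothetical. Detecting separation across several covariates generally requires a linear-programming check, which this sketch does not attempt.

```python
from typing import Sequence


def separation_type(x: Sequence[float], y: Sequence[int]) -> str:
    """Classify how a single covariate x separates a binary outcome y.

    Returns "complete", "quasi-complete", or "overlap".
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome groups must be present")

    # Check both orientations: either group may sit above the other.
    orientations = ((x0, x1), (x1, x0))

    for low, high in orientations:
        if max(low) < min(high):
            return "complete"

    for low, high in orientations:
        # Groups touch only at the boundary value; a constant covariate
        # (all values tied) is not treated as separation.
        if max(low) == min(high) and min(low) < max(high):
            return "quasi-complete"

    return "overlap"


if __name__ == "__main__":
    # Hypothetical data: the groups share only the boundary value 2.
    print(separation_type([1, 2, 2, 3], [0, 0, 1, 1]))
```

In the complete and quasi-complete cases the maximum likelihood estimate of the slope does not exist as a finite value, which is one reason such datasets stress-test an estimation routine.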

References

  • Altman M, Gill J, McDonald MP (2004) Numerical issues in statistical computing for the social scientist. Wiley, New York

  • Arnold BC, Castillo E, Sarabia JM (1999) Conditional specification of statistical models. Springer, New York

  • Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms. Wiley, Hoboken

  • Bergtold JS, Spanos A, Onukwugha E (2010) Bernoulli regression models: revisiting the specification of statistical models with binary dependent variables. J Choice Model 3(2):1–28

  • Cameron AC, Trivedi PK (2009) Microeconometrics: methods and applications. Cambridge University Press, New York

  • Chang JB, Lusk JL (2011) Mixed logit models: accuracy and software choice. J Appl Econom 26(1):167–172

  • R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Econometric Software, Inc. (2012) LIMDEP 10 and NLOGIT 5. http://www.limdep.com/features/documentation.php. Accessed on 15 Aug 2015

  • Greene WH (2002) Econometric analysis. Prentice Hall, Englewood Cliffs

  • Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley, Hoboken

  • Huber J, Train K (2001) On the similarity of classical and Bayesian estimates of individual mean partworths. Mark Lett 12(3):259–269

  • Kay R, Little S (1987) Transformations of the explanatory variables in the logistic regression models for binary data. Biometrika 74:495–501

  • Keeling KB, Pavur RJ (2007) A comparative study of the reliability of nine statistical software packages. Comput Stat Data Anal 51(8):3811–3831

  • Kolenikov S (2001) Review of Stata 7. J Appl Econom 16(5):637–646

  • McCullough BD (2000a) The accuracy of Mathematica 4 as a statistical package. Comput Stat 15(2):279–300

  • McCullough BD (2000b) Is it safe to assume that software is accurate? Int J Forecast 16(3):349–357

  • McCullough BD, Renfro CG (1998) Benchmarks and software standards: a case study of GARCH procedures. J Econ Soc Meas 25:59–71

  • McCullough BD, Renfro CG (2000) Some numerical aspects of nonlinear estimation. J Econ Soc Meas 26(1):63–77

  • McCullough BD (1998) Assessing the reliability of statistical software: part I. Am Stat 52(4):358–366

  • McCullough BD (1999a) Assessing the reliability of statistical software: part II. Am Stat 53(2):149–159

  • McCullough BD (1999b) Econometric software reliability: EViews, LIMDEP, SHAZAM and TSP. J Appl Econom 14(2):191–202

  • McCullough BD, Vinod HD (1999) The numerical reliability of econometric software. J Econ Lit 37:633–665

  • McCullough BD, Vinod HD (2003) Verifying the solution from a nonlinear solver: a case study. Am Econ Rev 93(3):873–892

  • McCullough BD, Wilson B (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Comput Stat Data Anal 31(1):27–37

  • McKenzie CR, Takaoka S (2003) 2002: a LIMDEP odyssey. J Appl Econom 18(2):241–247

  • Murray W (1972) Failure, the causes and cures. In: Murray W (ed) Numerical methods for unconstrained optimization. Academic Press, New York, pp 107–122

  • Musa JD, Iannino A, Okumoto K (1987) Software reliability: measurement, prediction, application. McGraw-Hill Inc, New York

  • National Institute of Standards and Technology (2014) Statistical reference datasets. http://www.itl.nist.gov/div898/strd. Accessed on 15 April 2014

  • Odeh OO, Featherstone AM, Bergtold JS (2010) Reliability of statistical software. Am J Agric Econ 92(5):1472–1479

  • Oster RA (2002) An examination of statistical software packages for categorical data analysis using exact methods. Am Stat 56(3):235–246

  • Oster RA (2003) An examination of statistical software packages for categorical data analysis using exact methods–part II. Am Stat 57(3):201–213

  • Ryan TP (2009) Modern regression models, 2nd edn. Wiley, Hoboken, NJ

  • SAS Manual (2009) SAS/STAT 13.2 user’s guide. http://support.sas.com/documentation/cdl/en/statug/67523/PDF/default/statug.pdf. Accessed on 25 Aug 2014

  • Scrucca L, Weisberg S (2004) A simulation study to investigate the behavior of the log-density ratio under normality. Commun Stat Simul Comput 33:159–178

  • Stokes HH (2004) On the advantage of using two or more econometric software systems to solve the same problem. J Econ Soc Meas 29(1):307–320

  • Tomek WG (1993) Confirmation and replication in empirical econometrics: a step toward improved scholarship. Am J Agric Econ 75(Special Issue):6–14

  • Train KE (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge

  • Wolfram (2015a) Unconstrained optimization: methods of local minimization. Wolfram language & system. Online documentation. http://reference.wolfram.com/language/tutorial/UnconstrainedOptimizationOverview.html. Last accessed 3 Dec 2015

  • Wolfram (2015b) Wolfram Mathematica tutorial collection. https://www.wolfram.com/learningcenter/tutorialcollection/complete/. Last accessed 3 Dec 2015

Acknowledgements

Partial support for this research was obtained from the National Science Foundation Grant: From Crops to Commuting: Integrating the Social, Technological, and Agricultural Aspects of Renewable and Sustainable Biorefining (I-STAR); NSF Award No. DGE-0903701. The analysis and conclusions set forth are those of the authors based on the independent assessments of statistical software.

Author information

Corresponding author

Correspondence to Jason S. Bergtold.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 78 KB)

Supplementary material 2 (zip 2,021 KB)

About this article

Cite this article

Bergtold, J.S., Pokharel, K.P., Featherstone, A.M. et al. On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables. Comput Stat 33, 757–786 (2018). https://doi.org/10.1007/s00180-017-0776-5

  • DOI: https://doi.org/10.1007/s00180-017-0776-5
