On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables

Bergtold, Jason S.; Pokharel, Krishna P.; Featherstone, Allen M.; Mo, Lijia

doi:10.1007/s00180-017-0776-5


Original Paper
Published: 17 November 2017

Volume 33, pages 757–786 (2018)

Computational Statistics



Abstract

The numerical reliability of statistical software packages, including SAS 9.4, MATLAB R2015b, R 3.3.1, Stata/IC 14, and LIMDEP 10, was examined for logistic regression models. Thirty unique benchmark datasets were created by simulating alternative conditional binary choice processes that exhibit rare events, near-multicollinearity, quasi-separation, and nonlinear transformations of variables. Certified benchmark estimates for the parameters and standard errors of the associated datasets were obtained following standards set out by the National Institute of Standards and Technology. The logarithm of the relative error was used as the measure of numerical accuracy. The paper finds that the choice of software package and estimation procedure affects accuracy, and that relying on the default settings of these packages may significantly reduce the reliability of results in some situations.
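The accuracy measure named above can be sketched as follows. This is a minimal sketch assuming the log relative error (LRE) as it is commonly defined in the software-reliability literature (e.g., McCullough 1998): for an estimate x and a certified value c ≠ 0, LRE = -log10(|x - c| / |c|), falling back to the log absolute error -log10(|x|) when c = 0. The LRE roughly counts the number of correct significant digits. The function name `lre`, the cap `max_digits=15`, and the clamp at zero are illustrative assumptions, not details taken from this paper.

```python
import math


def lre(estimate: float, certified: float, max_digits: float = 15.0) -> float:
    """Log relative error of `estimate` against a certified benchmark value.

    Assumption: results are clamped to [0, max_digits], a convention often
    used when reporting LREs; it is not taken from the paper itself.
    """
    if estimate == certified:
        # Exact agreement: report the maximum number of digits.
        return max_digits
    if certified == 0.0:
        # Log absolute error when the certified value is zero.
        value = -math.log10(abs(estimate))
    else:
        value = -math.log10(abs(estimate - certified) / abs(certified))
    # Clamp: negative LREs carry no correct digits; cap at max_digits.
    return max(0.0, min(value, max_digits))
```

For example, `lre(1.000001, 1.0)` is approximately 6, meaning the estimate agrees with the certified value to about six significant digits, while `lre(5.0, 1.0)` is clamped to 0.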


Notes

In empirical work, researchers do not know the true parameter values; thus, cross-validation of research results is critical for verifying the numerical reliability of estimates from nonlinear models.
Quasi-separation (also known as quasi-complete separation) occurs when a collection of covariates can almost completely separate the outcome groups in a discrete choice model. That is, only a few observations remain that make the outcome groups overlap; in terms of discriminant analysis, the discriminant can almost perfectly delineate the outcome groups (Hosmer et al. 2013).
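In the one-covariate case, separation can be checked directly. The sketch below uses a hypothetical helper, `separation_status`, that is not from the paper. It applies the common formal criterion: the groups are completely separated if a threshold c splits them strictly, and quasi-completely separated if they overlap only among observations whose covariate value equals c.

```python
from typing import Sequence


def separation_status(x: Sequence[float], y: Sequence[int]) -> str:
    """Classify one covariate x against a binary outcome y coded 0/1.

    Hypothetical helper for illustration; handles a single covariate only.
    """
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    if not ones or not zeros:
        return "no variation"  # only one outcome group is observed
    # Complete separation: some threshold strictly splits the two groups.
    if max(zeros) < min(ones) or max(ones) < min(zeros):
        return "complete"
    # Quasi-complete separation: the groups touch only at one boundary value.
    if max(zeros) == min(ones) or max(ones) == min(zeros):
        return "quasi-complete"
    return "overlap"


print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))  # groups tie only at x = 2
```

When either form of separation is present, the maximum likelihood estimate of the logit coefficient does not exist as a finite value, so different packages may stop at very different values or issue convergence warnings. With several covariates, separation depends on linear combinations of the covariates and typically requires solving a linear program; that case is not covered by this sketch.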

References

Altman M, Gill J, McDonald MP (2004) Numerical issues in statistical computing for the social scientist. Wiley, New York
Arnold BC, Castillo E, Sarabia JM (1999) Conditional specification of statistical models. Springer, New York
Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms. Wiley, Hoboken
Bergtold JS, Spanos A, Onukwugha E (2010) Bernoulli regression models: revisiting the specification of statistical models with binary dependent variables. J Choice Model 3(2):1–28
Cameron AC, Trivedi PK (2009) Microeconometrics: methods and applications. Cambridge University Press, New York
Chang JB, Lusk JL (2011) Mixed logit models: accuracy and software choice. J Appl Econom 26(1):167–172
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Econometric Software, Inc. (2012) LIMDEP 10 and NLOGIT 5. http://www.limdep.com/features/documentation.php. Accessed 15 Aug 2015
Greene WH (2002) Econometric analysis. Prentice Hall, Englewood Cliffs
Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley, Hoboken
Huber J, Train K (2001) On the similarity of classical and Bayesian estimates of individual mean partworths. Mark Lett 12(3):259–269
Kay R, Little S (1987) Transformations of the explanatory variables in the logistic regression models for binary data. Biometrika 74:495–501
Keeling KB, Pavur RJ (2007) A comparative study of the reliability of nine statistical software packages. Comput Stat Data Anal 51(8):3811–3831
Kolenikov S (2001) Review of Stata 7. J Appl Econom 16(5):637–646
McCullough BD (2000a) The accuracy of Mathematica 4 as a statistical package. Comput Stat 15(2):279–300
McCullough BD (2000b) Is it safe to assume that software is accurate? Int J Forecast 16(3):349–357
McCullough BD, Renfro CG (1998) Benchmarks and software standards: a case study of GARCH procedures. J Econ Soc Meas 25:59–71
McCullough BD, Renfro CG (2000) Some numerical aspects of nonlinear estimation. J Econ Soc Meas 26(1):63–77
McCullough BD (1998) Assessing the reliability of statistical software: part I. Am Stat 52(4):358–366
McCullough BD (1999a) Assessing the reliability of statistical software: part II. Am Stat 53(2):149–159
McCullough BD (1999b) Econometric software reliability: EViews, LIMDEP, SHAZAM and TSP. J Appl Econom 14(2):191–202
McCullough BD, Vinod HD (1999) The numerical reliability of econometric software. J Econ Lit 37:633–665
McCullough BD, Vinod HD (2003) Verifying the solution from a nonlinear solver: a case study. Am Econ Rev 93(3):873–892
McCullough BD, Wilson B (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Comput Stat Data Anal 31(1):27–37
McKenzie CR, Takaoka S (2003) 2002: a LIMDEP odyssey. J Appl Econom 18(2):241–247
Murray W (1972) Failure, the causes and cures. In: Murray W (ed) Numerical methods for unconstrained optimization. Academic Press, New York, pp 107–122
Musa JD, Iannino A, Okumoto K (1987) Software reliability: measurement, prediction, application. McGraw-Hill, New York
National Institute of Standards and Technology (2014) Statistical reference datasets. http://www.itl.nist.gov/div898/strd. Accessed 15 Apr 2014
Odeh OO, Featherstone AM, Bergtold JS (2010) Reliability of statistical software. Am J Agric Econ 92(5):1472–1479
Oster RA (2002) An examination of statistical software packages for categorical data analysis using exact methods. Am Stat 56(3):235–246
Oster RA (2003) An examination of statistical software packages for categorical data analysis using exact methods, part II. Am Stat 57(3):201–213
Ryan TP (2009) Modern regression models, 2nd edn. Wiley, Hoboken
SAS Manual (2009) SAS/STAT 13.2 user's guide. http://support.sas.com/documentation/cdl/en/statug/67523/PDF/default/statug.pdf. Accessed 25 Aug 2014
Scrucca L, Weisberg S (2004) A simulation study to investigate the behavior of the log-density ratio under normality. Commun Stat Simul Comput 33:159–178
Stokes HH (2004) On the advantage of using two or more econometric software systems to solve the same problem. J Econ Soc Meas 29(1):307–320
Tomek WG (1993) Confirmation and replication in empirical econometrics: a step toward improved scholarship. Am J Agric Econ 75(Special Issue):6–14
Train KE (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge
Wolfram (2015a) Unconstrained optimization: methods of local minimization. Wolfram Language & System online documentation. http://reference.wolfram.com/language/tutorial/UnconstrainedOptimizationOverview.html. Accessed 3 Dec 2015
Wolfram (2015b) Wolfram Mathematica tutorial collection. https://www.wolfram.com/learningcenter/tutorialcollection/complete/. Accessed 3 Dec 2015


Acknowledgements

Partial support for this research was obtained from the National Science Foundation Grant: From Crops to Commuting: Integrating the Social, Technological, and Agricultural Aspects of Renewable and Sustainable Biorefining (I-STAR); NSF Award No. DGE-0903701. The analysis and conclusions set forth are those of the authors based on the independent assessments of statistical software.

Author information

Authors and Affiliations

Department of Agricultural Economics, Kansas State University, 307 Waters Hall, Manhattan, KS, 66506-4011, USA
Jason S. Bergtold
Department of Agricultural Economics, Kansas State University, 342 Waters Hall, Manhattan, KS, 66506-4011, USA
Krishna P. Pokharel, Allen M. Featherstone & Lijia Mo


Corresponding author

Correspondence to Jason S. Bergtold.

Electronic supplementary material


Supplementary material 1 (docx 78 KB)

Supplementary material 2 (zip 2,021 KB)


About this article

Cite this article

Bergtold, J.S., Pokharel, K.P., Featherstone, A.M. et al. On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables. Comput Stat 33, 757–786 (2018). https://doi.org/10.1007/s00180-017-0776-5


Received: 10 May 2016
Accepted: 03 November 2017
Published: 17 November 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s00180-017-0776-5
