A two-stage Bridge estimator for regression models with endogeneity based on control function method

  • Original Paper
Computational Statistics

Abstract

In this study, we investigate a penalty-based two-stage least squares estimator for regression models in which the explanatory variables are correlated with the error term. We propose a two-stage Bridge estimator to overcome this endogeneity problem in high-dimensional data. The proposed estimator is consistent and asymptotically normal. As special cases, the method handles ill-conditioned settings such as multicollinearity and sparsity. The performance of the proposed estimator is demonstrated in simulation studies and compared with that of existing estimators. An application to a real data set is presented for illustration.
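
The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which is not given in this preview; it rests on several assumptions. Stage one regresses the endogenous regressors on instruments by ordinary least squares and keeps the residuals as control functions. Stage two fits a Bridge-penalized regression (penalty lam * sum |b_j|^gamma) of the response on the regressors and those residuals. The Bridge fit uses a local quadratic approximation, which is one common choice among several. The function names, the simulated data, the tuning values, and the decision to penalize the control-function coefficients are all hypothetical.

```python
import numpy as np

def bridge_lqa(X, y, lam, gamma, n_iter=200, eps=1e-8, tol=1e-10):
    """Approximately minimise ||y - Xb||^2 + lam * sum(|b_j|**gamma)
    by local quadratic approximation (iteratively reweighted ridge).
    Illustrative helper, not taken from the paper."""
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    b = np.linalg.solve(XtX + lam * np.eye(p), Xty)  # ridge starting value
    for _ in range(n_iter):
        # |b|^gamma is approximated by (gamma/2) * |b_old|^(gamma-2) * b^2
        w = (np.abs(b) + eps) ** (gamma - 2.0)
        b_new = np.linalg.solve(XtX + 0.5 * lam * gamma * np.diag(w), Xty)
        done = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if done:
            break
    return b

def two_stage_bridge_cf(y, X, Z, lam, gamma):
    """Control-function sketch: first-stage OLS residuals enter stage two."""
    # Stage 1: regress each endogenous regressor on the instruments.
    Pi_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
    V_hat = X - Z @ Pi_hat  # estimated control functions
    # Stage 2: Bridge regression of y on [X, V_hat].
    # Assumption: all coefficients are penalised, including those on V_hat.
    coef = bridge_lqa(np.hstack([X, V_hat]), y, lam, gamma)
    p = X.shape[1]
    return coef[:p], coef[p:]

# Simulated example with two endogenous regressors and three instruments.
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))
v = rng.normal(size=(n, 2))
u = 0.8 * v[:, 0] + 0.8 * v[:, 1] + 0.5 * rng.normal(size=n)  # error correlated with v
Pi = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
X = Z @ Pi + v
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + u

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # ignores endogeneity
beta_cf, rho_cf = two_stage_bridge_cf(y, X, Z, lam=1.0, gamma=1.5)
print("OLS:", beta_ols)
print("two-stage Bridge (control function):", beta_cf)
```

With gamma = 2 the update reduces to ridge regression, and gamma = 1 corresponds to the lasso penalty, which is why a Bridge penalty can cover both the multicollinearity and the sparsity settings mentioned in the abstract. In practice lam and gamma would be chosen by a data-driven rule such as cross-validation.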

Notes

  1. We use the R software, and all code is available upon request.

Author information

Corresponding author

Correspondence to Ayyub Sheikhi.

About this article

Cite this article

Bahador, F., Sheikhi, A. & Arabpour, A. A two-stage Bridge estimator for regression models with endogeneity based on control function method. Comput Stat 39, 1351–1370 (2024). https://doi.org/10.1007/s00180-023-01379-9

  • DOI: https://doi.org/10.1007/s00180-023-01379-9
