A robust method for regression and correlation analysis of socio-economic indicators

Quality & Quantity

Abstract

Ordinary least squares regression and ‘product-moment’ correlation are the most commonly used statistical tools for analysing cross-national and other socio-economic indicator data. However, their use depends on assumptions that may not be plausible when applied to such data. Moreover, the use of squared deviations in formulas leads to an exaggerated influence of outliers. In this paper, an alternative methodology based on the ratio of absolute deviations is considered, and a simulation study is presented to evaluate its robustness against outliers and departures from normality. The results show that this methodology is highly resistant to outliers and has a higher breakdown point than the traditional methods.
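The claim that squared deviations give outliers an exaggerated influence can be illustrated with a small sketch. The data below are invented for illustration: ten points lying exactly on y = x, after which a single value is replaced by a gross outlier. The function name `pearson_r` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical helper: product-moment correlation built from squared deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Invented illustration data: ten points lying exactly on y = x.
x = np.arange(1.0, 11.0)
y_clean = x.copy()

# Replace a single value with a gross outlier (e.g., a recording error).
y_dirty = y_clean.copy()
y_dirty[-1] = -40.0

r_clean = pearson_r(x, y_clean)
r_dirty = pearson_r(x, y_dirty)
print(f"r without outlier: {r_clean:.3f}")   # 1.000
print(f"r with one outlier: {r_dirty:.3f}")  # about -0.36
```

One contaminated observation out of ten is enough to reverse the sign of r, which is the kind of sensitivity a method based on absolute deviations aims to reduce.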

Notes

  1. Note that a correlation analysis is conventionally applied when both variables are observed, whereas a linear regression should be used when the values of the independent variable are considered known constants (e.g., values chosen by the researcher in an experimental protocol) (Schober et al. 2018).

  2. Whether they be individual indicators (e.g., “GDP per capita”) or composite indicators (e.g., the Human Development Index) (Sharpe 2004).

  3. For example, Li and Frank (2020) proposed a method to quantify the robustness of a causal inference in an observational study by calculating the probability of a robust inference for internal validity (PIV). The PIV is only meaningful if the null hypothesis (i.e., H0: the slope of the regression line equals 0) is rejected based on the observed sample.

  4. Macro data are data derived from individual-level data by computing statistics on groups or aggregates, such as counts, means, or frequencies.

  5. McGranahan et al. (1985) advised against using the term ‘error’ in connection with the values of countries (or of any other geographic areas) on development indicators.

  6. A median line is a line that minimizes the sum of absolute deviations on one of the variables, rather than the sum of squared deviations (Nievergelt 2012).

  7. Note that formula (2) gives results closer in general magnitude to r than the formula \(\frac{SAE^{-} - SAE^{+}}{SAE^{-} + SAE^{+}}\), which yields considerably lower correlations and reacts somewhat more strongly to extension of range (McGranahan et al. 1985).

  8. Symmetric regression is used to specify an intrinsic functional relationship between X and Y, when the choice of the predictor is unclear, arbitrary, or ambiguous, and when both variables are measured with error (von Eye and Schuster 1998).

  9. For example, a line halfway between the two ‘extreme’ lines may be used.

  10. Note that both the BFL and its CL leave an equal number of points above and below them.

  11. An outlier is an observation that appears to deviate markedly from the other members of the sample in which it occurs (i.e., it is inconsistent with the rest of the data, relative to an assumed model). Such an extreme observation may reflect an abnormality in the measured characteristic, or it may result from an error in measurement or recording (Everitt and Skrondal 2010).

  12. We say that a distribution is bivariate exponential (uniform) when both univariate marginal distributions are exponential (uniform) (Devroye 1986).

  13. This is a bivariate normal distribution with parameters \(\mu_X = 3\), \(\mu_Y = 5\), \(\sigma_X = 1\), \(\sigma_Y = 1\) and \(\rho = 1\). Thus, \(y_i = x_i + 2\) (\(i = 1, \ldots, 100\)), and the slope of the regression line is \(\beta = 1\).

  14. The Johnson-Ramberg bivariate uniform family was considered.

  15. The trivariate reduction method for bivariate gamma distribution was used.
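Note 6 defines a median line as the line minimizing the sum of absolute deviations on one variable. A minimal brute-force sketch for vertical deviations follows. It relies on a classical property of L1 line fitting: when the x values are not all equal, some optimal least-absolute-deviations line passes through at least two data points, so searching all such lines suffices. The function name `median_line` and the data are hypothetical; ties are broken by keeping the first minimizer found, since L1 fits need not be unique (Harter 1977).

```python
import itertools

import numpy as np


def median_line(x, y):
    """Hypothetical helper: brute-force least-absolute-deviations line y = a + b*x.

    Searches every line through two points with distinct x values; an optimal
    L1 line of this kind exists whenever the x values are not all equal.
    Cost is O(n^3), so this is only meant for small illustrative samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None  # (sum of absolute errors, intercept, slope)
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: not of the form y = a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sae = np.abs(y - (a + b * x)).sum()
        if best is None or sae < best[0]:
            best = (sae, a, b)  # strict '<' keeps the first minimizer on ties
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Invented data on y = 1 + 2x, with one gross outlier at x = 8 (true value 17).
x = np.arange(11.0)
y = 1.0 + 2.0 * x
y[8] = 56.0

a_l1, b_l1 = median_line(x, y)
b_ls, a_ls = np.polyfit(x, y, 1)  # ordinary least squares, for comparison
print(f"median line: y = {a_l1:.2f} + {b_l1:.2f}x")
print(f"OLS line:    y = {a_ls:.2f} + {b_ls:.2f}x")
```

The median line recovers y = 1 + 2x exactly, whereas the least-squares slope is pulled upward by the single outlier.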
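Note 7 contrasts formula (2) with the difference-sum ratio \((SAE^{-} - SAE^{+})/(SAE^{-} + SAE^{+})\), but the notes do not define \(SAE^{-}\) and \(SAE^{+}\). The sketch below therefore rests on an explicit assumption. For z-scored variables, \(\sum (z_x + z_y)^2 = 2n(1 + r)\) and \(\sum (z_x - z_y)^2 = 2n(1 - r)\), so the corresponding difference-sum ratio of squared errors equals r exactly (cf. Pareto 2021). By analogy, the code assumes \(SAE^{+} = \sum |z_x - z_y|\) and \(SAE^{-} = \sum |z_x + z_y|\), with each variable centred on its median and scaled by its mean absolute deviation from the median. This standardisation and all function names are hypothetical choices for illustration, not the paper's definitions.

```python
import numpy as np


def ls_ratio(x, y):
    """Difference-sum ratio of squared errors on z-scores; equals Pearson's r."""
    zx = (x - x.mean()) / x.std()  # population SD (ddof=0)
    zy = (y - y.mean()) / y.std()
    sse_minus = np.sum((zx + zy) ** 2)  # deviations from the line z_y = -z_x
    sse_plus = np.sum((zx - zy) ** 2)   # deviations from the line z_y = z_x
    return (sse_minus - sse_plus) / (sse_minus + sse_plus)


def robust_z(v):
    """ASSUMED robust standardisation (hypothetical): centre on the median,
    scale by the mean absolute deviation from the median."""
    med = np.median(v)
    return (v - med) / np.mean(np.abs(v - med))


def sae_ratio(x, y):
    """ASSUMED absolute-deviation analogue: (SAE- - SAE+) / (SAE- + SAE+)."""
    zx, zy = robust_z(x), robust_z(y)
    sae_minus = np.sum(np.abs(zx + zy))
    sae_plus = np.sum(np.abs(zx - zy))
    return (sae_minus - sae_plus) / (sae_minus + sae_plus)


rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.6 * x + 0.8 * rng.normal(size=200)

print(ls_ratio(x, y), np.corrcoef(x, y)[0, 1])  # identical up to rounding
print(sae_ratio(x, y))
```

Under these assumptions the ratio equals +1 for an exact increasing linear relation and -1 for an exact decreasing one.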
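Note 8 describes symmetric regression. One familiar symmetric estimator, used here purely as an example and not necessarily the method adopted in the paper, is the reduced major axis (geometric-mean) line, whose slope is \(\operatorname{sign}(r)\, s_Y / s_X\). Swapping the roles of X and Y simply inverts this slope, so the fitted relationship does not depend on which variable is treated as the predictor. With OLS, by contrast, the product of the slope of Y on X and the slope of X on Y is \(r^2\). The function names and simulated data are illustrative.

```python
import numpy as np


def rma_slope(x, y):
    """Reduced major axis slope of y on x: sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y) / np.std(x)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)


rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.5, size=100)
r = np.corrcoef(x, y)[0, 1]

print(rma_slope(x, y) * rma_slope(y, x))          # 1 up to rounding: symmetric in X and Y
print(ols_slope(x, y) * ols_slope(y, x), r ** 2)  # both equal r squared
```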
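The arithmetic in note 13 can be checked by generating the sample through the conditional representation of a bivariate normal, \(Y = \mu_Y + \rho (\sigma_Y/\sigma_X)(X - \mu_X) + \sigma_Y \sqrt{1 - \rho^2}\, \varepsilon\), with \(\varepsilon\) an independent standard normal variable. With \(\rho = 1\) the noise term vanishes and every point lies on \(y = x + 2\). The helper name, generator and seed are arbitrary illustrative choices; the paper's own simulation code is not reproduced here.

```python
import numpy as np


def bivariate_normal_sample(n, mu_x, mu_y, sd_x, sd_y, rho, seed=None):
    """Hypothetical helper: draw n pairs from a bivariate normal via the
    conditional distribution of Y given X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sd_x, size=n)
    eps = rng.normal(size=n)
    y = mu_y + rho * (sd_y / sd_x) * (x - mu_x) + sd_y * np.sqrt(1.0 - rho ** 2) * eps
    return x, y


# Parameters of note 13: mu_X = 3, mu_Y = 5, sigma_X = sigma_Y = 1, rho = 1.
x, y = bivariate_normal_sample(100, 3.0, 5.0, 1.0, 1.0, 1.0, seed=42)

slope, intercept = np.polyfit(x, y, 1)
print(np.allclose(y, x + 2.0))               # True: y_i = x_i + 2
print(round(slope, 6), round(intercept, 6))  # approximately 1 and 2
```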
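Note 15 refers to the trivariate reduction method. In its standard form, three independent gamma variables \(G_0, G_1, G_2\) with shapes \(a_0, a_1, a_2\) and a common scale are combined as \(X = G_0 + G_1\) and \(Y = G_0 + G_2\). Then X and Y have gamma marginals with shapes \(a_0 + a_1\) and \(a_0 + a_2\), and \(\operatorname{corr}(X, Y) = a_0 / \sqrt{(a_0 + a_1)(a_0 + a_2)}\), which is never negative. Choosing \(a_0 + a_1 = a_0 + a_2 = 1\) gives exponential marginals, i.e. a bivariate exponential distribution in the sense of note 12. The shape values below are arbitrary; the parameters used in the paper's simulation are not reproduced here.

```python
import numpy as np


def bivariate_gamma(n, a0, a1, a2, scale=1.0, seed=None):
    """Trivariate reduction: X = G0 + G1, Y = G0 + G2 with independent gammas."""
    rng = np.random.default_rng(seed)
    g0 = rng.gamma(a0, scale, size=n)  # shared component induces the correlation
    g1 = rng.gamma(a1, scale, size=n)
    g2 = rng.gamma(a2, scale, size=n)
    return g0 + g1, g0 + g2


def theoretical_corr(a0, a1, a2):
    """Correlation implied by trivariate reduction with a common scale."""
    return a0 / np.sqrt((a0 + a1) * (a0 + a2))


# Illustrative shapes: a0 + a1 = a0 + a2 = 1, so both marginals are Exponential(1).
a0, a1, a2 = 0.6, 0.4, 0.4
x, y = bivariate_gamma(200_000, a0, a1, a2, seed=7)

print(x.mean(), y.mean())                                     # both close to 1
print(np.corrcoef(x, y)[0, 1], theoretical_corr(a0, a1, a2))  # both close to 0.6
```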

References

  • Abdullah, M.B.: On a robust correlation coefficient. Statistician 39, 455–460 (1990)

  • Alaimo, L.S., Maggino, F.: Sustainable development goals indicators at territorial level: conceptual and methodological issues - the Italian perspective. Soc. Indic. Res. 147, 383–419 (2020)

  • Bargiela, A., Hartley, J.K.: Orthogonal linear regression algorithm based on augmented matrix formulation. Comput. Oper. Res. 20, 829–836 (1993)

  • Barrington-Leigh, C., Escande, A.: Measuring progress and well-being: a comparative review of indicators. Soc. Indic. Res. 135, 893–925 (2018)

  • Birkes, D., Dodge, Y.: Alternative Methods of Regression. Wiley, New York (1993)

  • Bloomfield, P., Steiger, W.L.: Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston (1983)

  • Carsey, T.M., Harden, J.J.: Monte Carlo Simulation and Resampling Methods for Social Science. SAGE, Los Angeles (2014)

  • Daniel, C., Wood, F.S.: Fitting Equations to Data. Wiley, New York (1971)

  • DeCatanzaro, D., Taylor, J.C.: The Scaling of dispersion and correlation: a comparison of least-squares and absolute-deviations statistics. Br. J. Math. Stat. Psychol. 49, 171–188 (1996)

  • Devroye, L.: Non-Uniform Random Variate Generation. Springer-Verlag, New York (1986)

  • Dietz, T., Scott Frey, R., Kalof, L.: Estimation with cross-national data: robust and nonparametric methods. Am. Sociol. Rev. 52, 380–390 (1987)

  • Draper, N.R., Smith, H.: Applied Regression Analysis, 2nd edn. Wiley, New York (1981)

  • Everitt, B.S., Skrondal, A.: The Cambridge Dictionary of Statistics, 4th edn. Cambridge University Press, New York (2010)

  • Farebrother, R.W.: A simple recursive procedure for the L1 norm fitting of a straight line. Appl. Stat. 37, 457–489 (1988)

  • Gentle, J.E.: Least absolute values estimation: an introduction. Commun. Stat. Simul. Comput. B6, 313–328 (1977)

  • Harter, H.L.: Nonuniqueness of least absolute values regression. Commun. Stat. Theor. Meth. A6, 829–838 (1977)

  • Imai, K., King, G., Stuart, E.A.: Misunderstandings between experimentalists and observationalists about causal inference. J. R. Stat. Soc. A Stat. Soc. 171, 481–502 (2008)

  • Karst, O.J.: Linear curve fitting using least deviations. J. Am. Stat. Assoc. 53, 118–132 (1958)

  • Kowalski, C.J.: On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. J. R. Stat. Soc. C Appl. Stat. 21, 1–12 (1972)

  • Lewis, P.A.W., Orav, E.J.: Simulation Methodology for Statisticians, Operations Analysts, and Engineers. Wadsworth & Brooks/Cole, Pacific Grove (1989)

  • Li, T., Frank, K.: The probability of a robust inference for internal validity. Sociol. Methods Res. (2020). https://doi.org/10.1177/0049124120914922

  • Li, C.-N., Shao, Y.-H., Deng, N.-Y.: Robust L1-norm two-dimensional linear discriminant analysis. Neural Netw. 65, 92–104 (2015)

  • McGranahan, D., Richard-Proust, C.: Methods of Estimation and Prediction in Socioeconomic Development: Regression and the Best-Fitting Line. United Nations Research Institute for Social Development, Geneva (1973)

  • McGranahan, D., Pizarro, E., Richard, C.: Measurement and Analysis of Socioeconomic Development. United Nations Research Institute for Social Development, Geneva (1985)

  • McGranahan, D.: Development indicators and development models. In: UNRISD (ed.) UNRISD Classics, Vol. I: Social Policy and Inclusive Development, pp. 19–30. United Nations Research Institute for Social Development, Geneva (2015)

  • Megiddo, N., Tamir, A.: Finding least-distances lines. SIAM J. Alg. Disc. Meth. 4, 207–211 (1983)

  • Mishra, S.K.: Construction of composite indices in presence of outliers. SSRN Electr. J. (2008). https://ssrn.com/abstract=1137644

  • Nievergelt, Y.: Real and generic data without unconstrained best-fitting Verhulst curves and sufficient conditions for median Mitscherlich and Verhulst curves to exist. Am. Math. Mon. 119, 211–234 (2012)

  • Nyquist, H.: Least orthogonal absolute deviations. Comput. Stat. Data Anal. 6, 361–367 (1988)

  • Nyquist, H.: Orthogonal L1-norm estimation. In: Dodge, Y. (ed.) Statistical Data Analysis Based on the L1-Norm and Related Methods, pp. 171–181. Birkhäuser, Berlin (2002)

  • OECD: Beyond GDP: Measuring what counts for economic and social performance. OECD Publishing, Paris (2018)

  • Pareto, A.: A new look at the correlation coefficient: correlation as the difference-sum ratio of SSEs. Commun. Stat. Theor. Methods (2021). https://doi.org/10.1080/03610926.2021.1961153

  • Pasman, V.R., Shevlyakov, G.L.: Robust methods of estimation of a correlation coefficient. Autom. Remote. Control. 27, 70–80 (1987)

  • Rodgers, J.L., Nicewander, W.A.: Thirteen ways to look at the correlation coefficient. Am. Stat. 42, 59–66 (1988)

  • Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)

  • Schlossmacher, E.J.: An iterative technique for absolute deviations curve fitting. J. Am. Stat. Assoc. 68, 857–859 (1973)

  • Schober, P., Boer, C., Schwarte, L.A.: Correlation coefficients: appropriate use and interpretation. Anesth. Analg. 126, 1763–1768 (2018)

  • Sharpe, A.: Literature Review of Frameworks for Macro-indicators. Centre for the Study of Living Standards, Ottawa (2004)

  • Shevlyakov, G.L., Vilchevski, N.O.: Robustness in Data Analysis: Criteria and Methods. VSP, Utrecht (2002)

  • Tofallis, C.: Robust correlation measures. SSRN Electr. J. (2007). https://doi.org/10.2139/ssrn.2261450

  • Von Eye, A., Schuster, C.: Regression Analysis for Social Sciences. Academic Press, San Diego (1998)

  • Wald, A.: The fitting of straight lines if both variables are subject to error. Ann. Math. Statist. 11, 284–300 (1940)

  • Wonnacott, R.J., Wonnacott, T.H.: Econometrics. John Wiley and Sons, New York (1979)

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Correspondence to Adriano Pareto.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pareto, A. A robust method for regression and correlation analysis of socio-economic indicators. Qual Quant 57, 5035–5053 (2023). https://doi.org/10.1007/s11135-022-01599-z

  • DOI: https://doi.org/10.1007/s11135-022-01599-z