Abstract
Ordinary least squares regression and ‘product-moment’ correlation are the most commonly used statistical tools for analysing cross-national and other socio-economic indicator data. However, their use depends on assumptions that may not be plausible when applied to such data. Moreover, because their formulas use squared deviations, outliers exert an exaggerated influence on the results. In this paper, an alternative methodology based on the ratio of absolute deviations is considered, and a simulation study is presented to evaluate its robustness against outliers and departures from normality. The results show that this methodology is highly resistant to outliers and has a higher breakdown point than the traditional methods.
Notes
Note that a correlation analysis is conventionally applied when both variables are observed, whereas a linear regression should be used when the values of the independent variable are considered known constants (e.g., values chosen by the researcher in an experimental protocol) (Schober et al. 2018).
Whether they be individual indicators (e.g., “GDP per capita”) or composite indicators (e.g., the Human Development Index) (Sharpe 2004).
For example, Li and Frank (2020) proposed a method to quantify the robustness of a causal inference in an observational study by calculating the probability of a robust inference for internal validity (PIV). The PIV is only meaningful if the null hypothesis (i.e., H0: the slope of the regression line equals 0) is rejected based on the observed sample.
Macro data are data derived from individual-level data by computing statistics on groups or aggregates, such as counts, means, or frequencies.
McGranahan et al. (1985) advised against using the term ‘error’ in connection with country (or any other geographic area) values on development indicators.
A median line is a line that minimizes the sum of absolute deviations on one of the variables, rather than the sum of squared deviations (Nievergelt 2012).
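To make the contrast with least squares concrete, a median (least absolute deviations, LAD) line can be fitted by brute force on a small sample. The sketch below relies on the known property that some optimal LAD line passes through at least two observations, so it only tries lines through pairs of points; this is O(n³) and meant for illustration, not as the paper's algorithm. The function names `lad_line` and `ols_line` are hypothetical, and the data are made up: points on y = x + 2 with one value replaced by a gross outlier.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Median (LAD) line y = a + b*x: minimizes the sum of absolute vertical deviations.

    Brute force over lines through pairs of points with distinct x values,
    which is enough because an optimal L1 line interpolates at least two points.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        sad = sum(abs(y - a - b * x) for x, y in zip(xs, ys))
        if best is None or sad < best[0]:
            best = (sad, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least squares line y = a + b*x (minimizes squared deviations)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Illustrative data: y = x + 2, except that the last y value is a gross outlier.
xs = list(range(1, 11))
ys = [x + 2 for x in xs]
ys[-1] = 100

print("LAD (a, b):", lad_line(xs, ys))  # stays on the line y = x + 2
print("OLS (a, b):", ols_line(xs, ys))  # slope pulled to about 5.8 by one point
```

A single aberrant value shifts the OLS slope from 1 to roughly 5.8, while the median line ignores it, which is the behaviour the abstract attributes to squared versus absolute deviations.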
Note that (2) gives results closer in general magnitude to r than formula \(\frac{{SAE^{ - } - SAE^{ + } }}{{SAE^{ - } + SAE^{ + } }}\), which yields considerably lower correlations and reacts somewhat more strongly to extension of range (McGranahan et al. 1985).
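The difference-sum formula in this note can be explored numerically. Formula (2) and the paper's exact definitions of SAE⁺ and SAE⁻ are not reproduced in this section, so the sketch below assumes, for illustration only, that they are sums of absolute deviations of the standardized data from the lines z_Y = z_X and z_Y = −z_X, in the spirit of the difference-sum ratio of SSEs in Pareto (2021). With squared deviations this ratio reproduces r exactly (the two sums equal 2n(1 − r) and 2n(1 + r)), which serves as a check; with absolute deviations it gives a noticeably lower value on the toy data. Function names and data are hypothetical.

```python
import math


def standardize(v):
    """Return z-scores using the population standard deviation (divisor n)."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((t - m) ** 2 for t in v) / n)
    return [(t - m) / s for t in v]


def pearson_r(xs, ys):
    """Product-moment correlation as the mean product of z-scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def diff_sum_ratio(xs, ys, power):
    """(E_minus - E_plus) / (E_minus + E_plus) on standardized data.

    Assumed definitions: E_plus sums |zy - zx|**power (deviations from zy = zx),
    E_minus sums |zy + zx|**power (deviations from zy = -zx).
    power=2 gives the SSE version, power=1 the SAE version.
    """
    zx, zy = standardize(xs), standardize(ys)
    e_plus = sum(abs(b - a) ** power for a, b in zip(zx, zy))
    e_minus = sum(abs(b + a) ** power for a, b in zip(zx, zy))
    return (e_minus - e_plus) / (e_minus + e_plus)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
# r and the SSE ratio agree; the SAE ratio is lower.
print(pearson_r(xs, ys), diff_sum_ratio(xs, ys, 2), diff_sum_ratio(xs, ys, 1))
```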
Symmetric regression is used to specify an intrinsic functional relationship between X and Y, when the choice of the predictor is unclear, arbitrary, or ambiguous, and when both variables are measured with error (von Eye and Schuster 1998).
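This note does not say which symmetric method is meant. As one common example (an assumption here), orthogonal or major-axis least squares minimizes squared perpendicular distances, so neither variable is privileged. The hypothetical sketch below checks the defining symmetry: fitting y on x and x on y describes the same line, so the two slopes are reciprocals.

```python
import math


def orthogonal_slope(xs, ys):
    """Slope of the orthogonal (major-axis) regression line of y on x.

    Minimizes the sum of squared perpendicular distances to the line,
    so swapping x and y yields the reciprocal slope (the same line).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    d = syy - sxx
    return (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b_yx = orthogonal_slope(xs, ys)  # y treated as the "response"
b_xy = orthogonal_slope(ys, xs)  # x treated as the "response"
print(b_yx, 1 / b_xy)            # equal: one line, whichever variable is the predictor
```

By contrast, the two OLS slopes (y on x and x on y) multiply to r², so they describe different lines unless |r| = 1.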
For example, a line halfway between the two ‘extreme’ lines may be used.
Note that both the BFL and its CL leave an equal number of points above and below them.
An outlier is an observation that appears to deviate markedly from the other members of the sample in which it occurs (i.e., it is inconsistent with the rest of the data, relative to an assumed model). Such an extreme observation may reflect an abnormality in the measured characteristic, or it may result from an error in measurement or recording (Everitt and Skrondal 2010).
We say that a distribution is bivariate exponential (uniform) when both univariate marginal distributions are exponential (uniform) (Devroye 1986).
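Because the definition constrains only the marginals, many dependence structures qualify. As an illustration only, the sketch below uses a Gaussian copula (correlated normals mapped through the normal CDF), which is not the Johnson-Ramberg family named in a later note; the function name and its parameters `n`, `rho`, and `seed` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def copula_pairs(n, rho, seed=0):
    """Draw n dependent pairs with uniform marginals and n with Exp(1) marginals."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    uniform_pairs, exponential_pairs = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with Corr(z1, z2) = rho
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Probability integral transform: Phi(Z) ~ Uniform(0, 1)
        uniform_pairs.append((cdf(z1), cdf(z2)))
        # Inverse exponential CDF, -log(1 - U) ~ Exp(1), using 1 - Phi(z) = Phi(-z)
        exponential_pairs.append((-math.log(cdf(-z1)), -math.log(cdf(-z2))))
    return uniform_pairs, exponential_pairs


uniform_pairs, exponential_pairs = copula_pairs(20000, rho=0.8, seed=1)
n = len(uniform_pairs)
print(sum(u for u, _ in uniform_pairs) / n)      # close to 0.5, the Uniform(0, 1) mean
print(sum(e for e, _ in exponential_pairs) / n)  # close to 1.0, the Exp(1) mean
```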
This is a bivariate normal distribution with parameters µX = 3, µY = 5, σX = 1, σY = 1 and ρ = 1. Thus, we have: yi = xi + 2 (i = 1, …, 100), and the slope of the regression line is β = 1.
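The relation follows from the conditional mean and variance of a bivariate normal distribution with these parameters:

```latex
\[
E(Y \mid X = x) = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) = 5 + 1 \cdot \frac{1}{1}\,(x - 3) = x + 2,
\qquad
\operatorname{Var}(Y \mid X = x) = \sigma_Y^{2}\,(1 - \rho^{2}) = 0 .
\]
```

Since the conditional variance is zero, every simulated point lies exactly on the line y = x + 2, whose slope is β = ρσY/σX = 1.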
The Johnson-Ramberg bivariate uniform family was considered.
The trivariate reduction method for the bivariate gamma distribution was used.
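The note does not give the parameters used. A minimal sketch of the trivariate reduction idea, assuming a common scale parameter: X = G1 + G3 and Y = G2 + G3 with independent gamma variates, so the shared G3 induces the dependence and each marginal remains gamma. It uses Python's `random.gammavariate(alpha, beta)` (shape, scale); the function names and the shapes `a1`, `a2`, `a3` are hypothetical.

```python
import math
import random


def bivariate_gamma(n, a1, a2, a3, scale=1.0, seed=0):
    """Trivariate reduction: X = G1 + G3, Y = G2 + G3, G_k ~ Gamma(a_k, scale).

    Marginals: X ~ Gamma(a1 + a3, scale), Y ~ Gamma(a2 + a3, scale).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g1 = rng.gammavariate(a1, scale)
        g2 = rng.gammavariate(a2, scale)
        g3 = rng.gammavariate(a3, scale)  # shared component
        pairs.append((g1 + g3, g2 + g3))
    return pairs


def theoretical_corr(a1, a2, a3):
    # Cov(X, Y) = Var(G3) = a3 * scale**2 and Var(X) = (a1 + a3) * scale**2, etc.
    return a3 / math.sqrt((a1 + a3) * (a2 + a3))


def sample_corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = bivariate_gamma(20000, 1.0, 1.0, 1.0, seed=42)
print(sample_corr(pairs), theoretical_corr(1.0, 1.0, 1.0))  # sample value near 0.5
```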
References
Abdullah, M.B.: On a robust correlation coefficient. Statistician 39, 455–460 (1990)
Alaimo, L.S., Maggino, F.: Sustainable development goals indicators at territorial level: conceptual and methodological issues - the Italian perspective. Soc. Indic. Res. 147, 383–419 (2020)
Bargiela, A., Hartley, J.K.: Orthogonal linear regression algorithm based on augmented matrix formulation. Computers Ops. Res. 20, 829–836 (1993)
Barrington-Leigh, C., Escande, A.: Measuring progress and well-being: a comparative review of indicators. Soc. Indic. Res. 135, 893–925 (2018)
Birkes, D., Dodge, Y.: Alternative Methods of Regression. Wiley, New York (1993)
Bloomfield, P., Steiger, W.L.: Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston (1983)
Carsey, T.M., Harden, J.J.: Monte Carlo Simulation and Resampling Methods for Social Science. SAGE, Los Angeles (2014)
Daniel, C., Wood, F.S.: Fitting Equations to Data. Wiley, New York (1971)
DeCatanzaro, D., Taylor, J.C.: The scaling of dispersion and correlation: a comparison of least-squares and absolute-deviations statistics. Br. J. Math. Stat. Psychol. 49, 171–188 (1996)
Devroye, L.: Non-Uniform Random Variate Generation. Springer-Verlag, New York (1986)
Dietz, T., Scott Frey, R., Kalof, L.: Estimation with cross-national data: robust and nonparametric methods. Am. Sociol. Rev. 52, 380–390 (1987)
Draper, N.R., Smith, H.: Applied Regression Analysis, 2nd edn. John Wiley and Sons Interscience Publication, New York (1981)
Everitt, B.S., Skrondal, A.: The Cambridge Dictionary of Statistics, 4th edn. Cambridge University Press, New York (2010)
Farebrother, R.W.: A simple recursive procedure for the L1 norm fitting of a straight line. Appl. Stat. 37, 457–489 (1988)
Gentle, J.E.: Least absolute values estimation: an introduction. Commun. Stat. Simula. Computa. B6, 313–328 (1977)
Harter, H.L.: Nonuniqueness of least absolute values regression. Commun. Stat. Theor. Meth. A6, 829–838 (1977)
Imai, K., King, G., Stuart, E.A.: Misunderstandings between experimentalists and observationalists about causal inference. J. R. Stat. Soc. Ser. A Stat. Soc. 171, 481–502 (2008)
Karst, O.J.: Linear curve fitting using least deviations. J. Am. Stat. Assoc. 53, 118–132 (1958)
Kowalski, C.J.: On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. J. R. Stat. Soc. 21, 1–12 (1972)
Lewis, P.A.W., Orav, E.J.: Simulation Methodology for Statisticians, Operations Analysts, and Engineers. Wadsworth & Brooks/Cole, Pacific Grove (1989)
Li, T., Frank, K.: The probability of a robust inference for internal validity. Sociol. Methods Res. (2020). https://doi.org/10.1177/0049124120914922
Li, C.-N., Shao, Y.-H., Deng, N.-Y.: Robust L1-norm two-dimensional linear discriminant analysis. Neural Netw. 65, 92–104 (2015)
McGranahan, D., Richard-Proust, C.: Methods of Estimation and Prediction in Socioeconomic Development: Regression and the Best-Fitting Line. United Nations Research Institute for Social Development, Geneva (1973)
McGranahan, D., Pizarro, E., Richard, C.: Measurement and Analysis of Socioeconomic Development. United Nations Research Institute for Social Development, Geneva (1985)
McGranahan, D.: Development indicators and development models. In: UNRISD. UNRISD CLASSICS VOL. I – Social Policy and Inclusive Development, pp. 19–30. United Nations Research Institute for Social Development, Geneva (2015)
Megiddo, N., Tamir, A.: Finding least-distances lines. SIAM J. Alg. Disc. Meth. 4, 207–211 (1983)
Mishra, S.K.: Construction of composite indices in presence of outliers. SSRN Electr. J. (2008). https://ssrn.com/abstract=1137644
Nievergelt, Y.: Real and generic data without unconstrained best-fitting Verhulst curves and sufficient conditions for median Mitscherlich and Verhulst curves to exist. Am. Math. Mon. 119, 211–234 (2012)
Nyquist, H.: Least orthogonal absolute deviations. Comput. Stat. Data Anal. 6, 361–367 (1988)
Nyquist, H.: Orthogonal L1-norm estimation. In: Dodge, Y. (ed.) Statistical Data Analysis Based on the L1-Norm and Related Methods, pp. 171–181. Birkhäuser, Berlin (2002)
OECD: Beyond GDP: Measuring what counts for economic and social performance. OECD Publishing, Paris (2018)
Pareto, A.: A new look at the correlation coefficient: correlation as the difference-sum ratio of SSEs. Commun. Stat. Theor. Methods (2021). https://doi.org/10.1080/03610926.2021.1961153
Pasman, V.R., Shevlyakov, G.L.: Robust methods of estimation of a correlation coefficient. Autom. Remote Control 27, 70–80 (1987)
Rodgers, J.L., Nicewander, W.A.: Thirteen ways to look at the correlation coefficient. Am. Stat. 42, 59–66 (1988)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)
Schlossmacher, E.J.: An iterative technique for absolute deviations curve fitting. J. Am. Stat. Assoc. 68, 857–859 (1973)
Schober, P., Boer, C., Schwarte, L.A.: Correlation coefficients: appropriate use and interpretation. Anesth. Analg. 126, 1763–1768 (2018)
Sharpe, A.: Literature Review of Frameworks for Macro-indicators. Centre for the Study of Living Standards, Ottawa (2004)
Shevlyakov, G.L., Vilchevski, N.O.: Robustness in Data Analysis: Criteria and Methods. VSP, Utrecht (2002)
Tofallis, C.: Robust correlation measures. SSRN Electr. J. (2007). https://doi.org/10.2139/ssrn.2261450
Von Eye, A., Schuster, C.: Regression Analysis for Social Sciences. Academic Press, San Diego (1998)
Wald, A.: The fitting of straight lines if both variables are subject to error. Ann. Math. Statist. 11, 284–300 (1940)
Wonnacott, R.J., Wonnacott, T.H.: Econometrics. John Wiley and Sons, New York (1979)
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The author has no relevant financial or non-financial interests to disclose.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pareto, A. A robust method for regression and correlation analysis of socio-economic indicators. Qual Quant 57, 5035–5053 (2023). https://doi.org/10.1007/s11135-022-01599-z
DOI: https://doi.org/10.1007/s11135-022-01599-z