Abstract
In any data analysis we should look for ability to predict and for connections to a broader comparative context. Our equations must not predict absurdities, even under extreme circumstances, if we want to be taken seriously as scientists. Poorly done linear regression analysis often does lead to absurd predictions. Fixed exponent and exponential patterns seem more prevalent in social nature than linear patterns. Before applying regression to two variables, graph them against each other, showing the borders of the conceptually allowed space and possible logical anchor points. Transform the data until anchor points and data points do fit a straight line which does not pierce conceptual ceilings or floors. During regression, consider symmetric regression, because Ordinary Least Squares y-on-x and x-on-y differ from each other and their slopes depend on the degree of scatter. After regression, look at the numerical values of parameters and ask what they tell us in a comparative context. When considering multivariable regression, pay more than lip service to Occam's Razor.
Acknowledgements
I thank Allan Sikk, Mirjam Allik, Rune H. Andersen, Russ Dalton, Steve Coleman and two anonymous reviewers for thoughtful comments on the manuscript, and Rune also for finalizing the graphs.
APPENDIX
TURNING CURVES INTO STRAIGHT LINES AND CALCULATING THE PARAMETERS
Unbounded field
If we expect a linear pattern $y=a+bx$ because the field is unbounded, no transformation is needed. Graph y versus x. If the data cloud is linear, then $y=a+bx$ applies, and we can regress y versus x.
How can we find the coefficients a and b, using the visual best-fit line?
- Intercept a is the value of y where the line crosses the y axis (because here $x=0$).
- Slope b is the ratio $-a/c$, c being the value of x where the line crosses the x axis (because here $y=0$).
How can we find the coefficient values in y=a+bx from two points?
- Take two ‘typical’ points along the axis of the data belt, far away from each other: $(x_1, y_1)$ and $(x_2, y_2)$.
- For $y=a+bx$ we have $b=(y_1-y_2)/(x_1-x_2)$. Then $a=y_1-bx_1$.
- When $a=0$ is imposed, the equation is reduced to $y=bx$. Then $b=y_1/x_1$.
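The two-point recipe above can be sketched in Python as follows. This is a minimal illustration only: the function names and the sample coordinates are made up for the example and do not come from the article.

```python
def line_from_two_points(x1, y1, x2, y2):
    """Return (a, b) for y = a + b*x passing through two points."""
    if x1 == x2:
        raise ValueError("the two points need different x values")
    b = (y1 - y2) / (x1 - x2)   # slope
    a = y1 - b * x1             # intercept
    return a, b


def slope_through_origin(x1, y1):
    """Return b for y = b*x, i.e. when a = 0 is imposed."""
    return y1 / x1


# Hypothetical 'typical' points read off a graph
a, b = line_from_two_points(1.0, 5.0, 4.0, 11.0)
print(a, b)  # 3.0 2.0
```

The further apart the two points lie along the data belt, the less a small misreading of either point distorts the slope.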
Only one quadrant allowed
If we expect a fixed exponent pattern $y=Ax^k$, because only one quadrant is allowed, taking logarithms leads to a linear relationship between log y and log x: $\log y=\log A+k\log x$. Designating log A as a takes us to the familiar linear form $(\log y)=a+k(\log x)$. Graph log y versus log x. If the data cloud is linear, then $y=Ax^k$ applies. Then we can regress log y versus log x.
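A minimal sketch of this log-log regression, assuming plain Ordinary Least Squares of log y on log x and using only the Python standard library. The helper names `ols` and `fit_power_law` and the data are hypothetical; the data are generated exactly from a known power law so that the recovered parameters can be checked.

```python
import math


def ols(xs, ys):
    """Ordinary least squares of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_power_law(xs, ys):
    """Fit y = A * x**k by regressing log10(y) on log10(x)."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    a, k = ols(log_x, log_y)   # a = log10(A)
    return 10 ** a, k


# Made-up data generated exactly from y = 3 * x**0.5
xs = [1.0, 4.0, 9.0, 16.0, 25.0]
ys = [3.0 * x ** 0.5 for x in xs]
A, k = fit_power_law(xs, ys)
print(round(A, 6), round(k, 6))  # 3.0 0.5
```

With scattered real data, the y-on-x and x-on-y slopes would differ, so this one-directional fit is only a starting point.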
How can we find the coefficients A and k in $y=Ax^k$, using the log-log graph?
- Coefficient A is the value of y where the line crosses the log y axis (because here $\log x=0$ and $x=1$).
- Exponent k is the ratio $-(\log A)/c$, c being the value of log x where the line crosses the log x axis (because here $\log y=0$).
How can we find the coefficient values in $y=Ax^k$ from two points on the curved graph y versus x?
- Take two ‘typical’ points of the data belt, far away from each other: $(x_1, y_1)$ and $(x_2, y_2)$.
- For $y=Ax^k$ we have $k=\log(y_1/y_2)/\log(x_1/x_2)$. Then $A=y_1/x_1^{k}$.
- When $A=1$ is imposed, the equation is reduced to $y=x^k$. Then $k=\log y_1/\log x_1$.
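As a hedged illustration of the two-point formulas for the fixed exponent pattern, the sketch below uses hypothetical function names and two made-up points that lie exactly on y = 3x². Any logarithm base gives the same k, because the base cancels in the ratio.

```python
import math


def power_law_from_two_points(x1, y1, x2, y2):
    """Return (A, k) for y = A * x**k through two points with x, y > 0."""
    k = math.log(y1 / y2) / math.log(x1 / x2)
    A = y1 / x1 ** k
    return A, k


def exponent_with_unit_coefficient(x1, y1):
    """Return k for y = x**k, i.e. when A = 1 is imposed (x1 must not be 1)."""
    return math.log(y1) / math.log(x1)


# Two points taken from y = 3 * x**2
A, k = power_law_from_two_points(2.0, 12.0, 8.0, 192.0)
print(round(A, 6), round(k, 6))  # 3.0 2.0
```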
Only two quadrants allowed
If we expect an exponential pattern $y=AB^x$, because only the two positive-y quadrants are allowed, taking logarithms leads to a linear relationship between log y and non-logged x: $\log y=\log A+x\log B$. Designating log A as a and log B as b takes us to the familiar linear form $(\log y)=a+bx$. Graph log y versus x itself. If the data cloud is linear, then $y=AB^x$ applies. Then we can regress log y versus x itself.
There are often reasons to use the alternative exponential expression $y=Ae^{kx}$ and natural logarithms (ln). By definition $\ln e=1$. Hence the logarithms are related as $\ln y=\ln A+kx=a+kx$. Graph ln y versus x itself. If the data cloud is linear, then $y=Ae^{kx}$ applies. Then we can regress ln y versus x itself.
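The semilog regression can be sketched along the same lines. This is an assumption-laden illustration, not the article's procedure: it uses plain Ordinary Least Squares of ln y on x, a hypothetical helper `fit_exponential`, and made-up data generated exactly from y = 2(1.5^x). It also shows how the two exponential forms connect, since B equals e raised to the power k.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A * exp(k*x) by OLS of ln(y) on x; returns (A, k)."""
    n = len(xs)
    ln_y = [math.log(y) for y in ys]
    mx = sum(xs) / n
    my = sum(ln_y) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ln_y))
    sxx = sum((x - mx) ** 2 for x in xs)
    k = sxy / sxx
    a = my - k * mx          # a = ln(A)
    return math.exp(a), k


# Made-up data generated exactly from y = 2 * 1.5**x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * 1.5 ** x for x in xs]
A, k = fit_exponential(xs, ys)
B = math.exp(k)              # base of the equivalent form y = A * B**x
print(round(A, 6), round(k, 6), round(B, 6))  # 2.0 0.405465 1.5
```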
- How do natural (ln x) and decimal (log x) logarithms relate? $\ln x=2.30\log x$ and conversely, $\log x=0.434\ln x$. Often we can use either.
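The two conversion factors follow from the change-of-base rule, with log understood as the base-10 logarithm:

```latex
\ln x = \ln 10 \cdot \log_{10} x \approx 2.303\,\log_{10} x,
\qquad
\log_{10} x = \frac{\ln x}{\ln 10} \approx 0.434\,\ln x .
```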
How can we find the coefficients in $y=AB^x$ or $y=Ae^{kx}$, using the ‘semilog’ graph?
One may get confused between log and ln, so it is better to use the two-point formula below.
How can we find the coefficient values in $y=AB^x=Ae^{kx}$ from two points on the curved graph y versus x?
- Take two ‘typical’ points of the data belt, far away from each other: $(x_1, y_1)$ and $(x_2, y_2)$.
- For $y=AB^x$ we have $\log B=[\log(y_1/y_2)]/(x_1-x_2)$. Then $B=10^{\log B}$ and $A=y_1/B^{x_1}$.
- For $y=Ae^{kx}$ we have $k=[\ln(y_1/y_2)]/(x_1-x_2)$. Then $A=y_1e^{-kx_1}$.
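Both versions of the two-point formula can be checked against each other in a short sketch. The function names are illustrative, and the two points are made up so that they lie exactly on y = 5(2^x); the decimal-log route should then return B = 2 and the natural-log route k = ln 2, with the same A.

```python
import math


def exponential_from_two_points(x1, y1, x2, y2):
    """Return (A, k) for y = A * exp(k*x) through two points with y > 0."""
    k = math.log(y1 / y2) / (x1 - x2)
    A = y1 * math.exp(-k * x1)
    return A, k


def base_from_two_points(x1, y1, x2, y2):
    """Return (A, B) for y = A * B**x, using decimal logarithms."""
    log_B = math.log10(y1 / y2) / (x1 - x2)
    B = 10 ** log_B
    A = y1 / B ** x1
    return A, B


# Two points taken from y = 5 * 2**x
A, k = exponential_from_two_points(1.0, 10.0, 4.0, 80.0)
A10, B = base_from_two_points(1.0, 10.0, 4.0, 80.0)
print(round(A, 6), round(k, 6))    # 5.0 0.693147
print(round(A10, 6), round(B, 6))  # 5.0 2.0
```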
Cite this article
Taagepera, R. Adding Meaning to Regression. Eur Polit Sci 10, 73–85 (2011). https://doi.org/10.1057/eps.2010.28
DOI: https://doi.org/10.1057/eps.2010.28