
Adding Meaning to Regression

  • Research
European Political Science

Abstract

In any data analysis we should look for the ability to predict and for connections to a broader comparative context. If we want to be taken seriously as scientists, our equations must not predict absurdities, even under extreme circumstances. Poorly done linear regression analysis often does lead to absurd predictions. Fixed-exponent and exponential patterns seem more prevalent in social nature than linear patterns. Before applying regression to two variables, graph them against each other, showing the borders of the conceptually allowed space and any logical anchor points. Transform the data until the anchor points and data points fit a straight line that does not pierce conceptual ceilings or floors. During regression, consider symmetric regression, because the Ordinary Least Squares regressions of y on x and of x on y differ from each other, and their slopes depend on the degree of scatter. After regression, look at the numerical values of the parameters and ask what they tell us in a comparative context. When considering multivariable regression, pay more than lip service to Occam's Razor.
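The claim that the two Ordinary Least Squares lines differ, and that their slopes depend on scatter, can be checked numerically. The following is a minimal Python sketch, assuming that symmetric regression is taken as the line through the means with slope sign(r)·s_y/s_x, which is the geometric mean of the y-on-x slope and the x-on-y slope re-expressed as dy/dx. The function name `regression_slopes` is hypothetical.

```python
import math


def regression_slopes(xs, ys):
    """Compare OLS and symmetric slopes for paired data (illustrative sketch).

    Returns (b_yx, b_xy, b_sym, a_sym):
      b_yx  : OLS slope of y regressed on x (equals r * s_y / s_x)
      b_xy  : OLS slope of x regressed on y, re-expressed as dy/dx
              (equals s_y / (r * s_x))
      b_sym : symmetric slope sign(r) * s_y / s_x, assumed here to be
              the geometric mean of b_yx and b_xy
      a_sym : intercept of the symmetric line, which passes through the means
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_yx = sxy / sxx
    b_xy = syy / sxy  # undefined when x and y are uncorrelated (sxy == 0)
    b_sym = math.copysign(math.sqrt(syy / sxx), sxy)
    a_sym = my - b_sym * mx
    return b_yx, b_xy, b_sym, a_sym


# Five scattered points: the two OLS slopes bracket the symmetric one.
print(regression_slopes([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
# -> (0.8, 1.25, 1.0, 0.0)
```

Both OLS slopes contain the correlation coefficient r, so they drift apart as scatter grows, whereas the symmetric slope s_y/s_x does not involve r at all.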


[Figures 1–9 not included in this text]



Acknowledgements

I thank Allan Sikk, Mirjam Allik, Rune H. Andersen, Russ Dalton, Steve Coleman and two anonymous reviewers for thoughtful comments on the manuscript, and Rune also for finalizing the graphs.

APPENDIX

TURNING CURVES INTO STRAIGHT LINES AND CALCULATING THE PARAMETERS

Unbounded field

If we expect a linear pattern $y = a + bx$ because the field is unbounded, no transformation is needed. Graph y versus x. If the data cloud is linear, then $y = a + bx$ applies, and we can regress y on x.

How can we find the coefficients a and b from the visual best-fit line?

  • Intercept a is the value of y where the line crosses the y axis (because there $x = 0$).

  • Slope b is the ratio $-a/c$, c being the value of x where the line crosses the x axis (because there $y = 0$).
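The two graph readings above reduce to simple arithmetic: because $y = a + bx$ passes through $(0, a)$ and $(c, 0)$, setting $y = 0$ gives $0 = a + bc$, hence $b = -a/c$. A minimal Python sketch; the function name is hypothetical.

```python
def line_from_axis_crossings(a, c):
    """Return (a, b) for y = a + b*x, read off a visually fitted line.

    a : value of y where the line crosses the y axis (x = 0)
    c : value of x where the line crosses the x axis (y = 0); must be nonzero
    """
    b = -a / c  # from 0 = a + b*c
    return a, b


# A line crossing the y axis at 6 and the x axis at 3 has slope -2.
a, b = line_from_axis_crossings(6, 3)
print(a, b)  # 6 -2.0
```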

How can we find the coefficient values in $y = a + bx$ from two points?

  • Take two ‘typical’ points along the axis of the data belt, far away from each other: $(x_1, y_1)$ and $(x_2, y_2)$.

  • For $y = a + bx$ we have $b = (y_1 - y_2)/(x_1 - x_2)$. Then $a = y_1 - b x_1$.

  • When $a = 0$ is imposed, the equation reduces to $y = bx$. Then $b = y_1/x_1$.
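The two-point formulas above can be written as a pair of small functions. This is a minimal Python sketch; the function names are hypothetical, and the example points are chosen only for illustration.

```python
def line_from_two_points(x1, y1, x2, y2):
    """Return (a, b) for y = a + b*x through two typical points (x1 != x2)."""
    b = (y1 - y2) / (x1 - x2)
    a = y1 - b * x1
    return a, b


def slope_through_origin(x1, y1):
    """Return b for y = b*x when a = 0 is imposed (x1 != 0)."""
    return y1 / x1


print(line_from_two_points(1, 5, 4, 11))  # (3.0, 2.0)
print(slope_through_origin(4, 10))        # 2.5
```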

Only one quadrant allowed

If we expect a fixed-exponent pattern $y = Ax^k$ because only one quadrant is allowed, taking logarithms leads to a linear relationship between log y and log x: $\log y = \log A + k \log x$. Designating log A as a takes us to the familiar linear form $(\log y) = a + k(\log x)$. Graph log y versus log x. If the data cloud is linear, then $y = Ax^k$ applies, and we can regress log y on log x.
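The log-log procedure in this paragraph can be sketched in Python as an ordinary straight-line fit on the logged data, with A recovered as $10^a$. Plain OLS of log y on log x is used here only for brevity (the abstract recommends considering symmetric regression, which would change only the slope formula); the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = A * x**k by OLS regression of log10(y) on log10(x).

    Requires all x > 0 and y > 0 (the single allowed quadrant).
    Returns (A, k).
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    k = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = my - k * mx  # intercept a = log A
    return 10 ** a, k


# Data generated exactly from y = 2 * x**1.5 should give A close to 2, k close to 1.5.
xs = [1, 2, 4, 8, 16]
ys = [2 * x ** 1.5 for x in xs]
print(fit_power_law(xs, ys))
```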

How can we find the coefficients A and k in $y = Ax^k$ from the log-log graph?

  • Coefficient A is the value of y where the line crosses the log y axis (because there $\log x = 0$, that is, $x = 1$).

  • Exponent k is the ratio $-(\log A)/c$, c being the value of log x where the line crosses the log x axis (because there $\log y = 0$, so $0 = \log A + kc$).
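Reading a power law off a log-log graph can be sketched as follows. Since $\log y = \log A + k \log x$, the point where the line reaches $\log y = 0$ at $\log x = c$ satisfies $0 = \log A + kc$, so $k = -(\log A)/c$. A minimal Python sketch, assuming decimal logarithms on both axes; the function name is hypothetical.

```python
import math


def power_from_loglog_crossings(A, c):
    """Return (A, k) for y = A * x**k from a line drawn on a log-log graph.

    A : value of y where the line crosses the log y axis (log x = 0, x = 1)
    c : value of log10(x) where the line crosses the log x axis (log y = 0);
        must be nonzero
    """
    k = -math.log10(A) / c  # from 0 = log A + k*c
    return A, k


# A line through y = 100 at x = 1 that reaches log y = 0 at log x = 4
# has k = -0.5, i.e. y = 100 * x**-0.5.
print(power_from_loglog_crossings(100, 4))
```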

How can we find the coefficient values in $y = Ax^k$ from two points on the curved graph of y versus x?

  • Take two ‘typical’ points of the data belt, far away from each other: $(x_1, y_1)$ and $(x_2, y_2)$.

  • For $y = Ax^k$ we have $k = \log(y_1/y_2)/\log(x_1/x_2)$. Then $A = y_1/x_1^k$.

  • When $A = 1$ is imposed, the equation reduces to $y = x^k$. Then $k = \log y_1/\log x_1$.
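The two-point formulas for the fixed-exponent pattern translate directly into code. This is a minimal Python sketch; the function names and example points are hypothetical.

```python
import math


def power_from_two_points(x1, y1, x2, y2):
    """Return (A, k) for y = A * x**k through two points with positive coordinates."""
    k = math.log10(y1 / y2) / math.log10(x1 / x2)
    A = y1 / x1 ** k
    return A, k


def exponent_with_unit_coefficient(x1, y1):
    """Return k for y = x**k when A = 1 is imposed (x1 > 0, x1 != 1, y1 > 0)."""
    return math.log10(y1) / math.log10(x1)


# Points (4, 16) and (16, 128) lie on y = 2 * x**1.5.
print(power_from_two_points(4, 16, 16, 128))
print(exponent_with_unit_coefficient(100, 1000))
```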

Only two quadrants allowed

If we expect an exponential pattern $y = AB^x$ because only the two quadrants with positive y are allowed, taking logarithms leads to a linear relationship between log y and non-logged x: $\log y = \log A + x \log B$. Designating log A as a and log B as b takes us to the familiar linear form $(\log y) = a + bx$. Graph log y versus x itself. If the data cloud is linear, then $y = AB^x$ applies, and we can regress log y on x itself.
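The semilog procedure differs from the log-log one only in leaving x unlogged. A minimal Python sketch, again using plain OLS of log y on x for brevity and recovering $A = 10^a$ and $B = 10^b$; the function name `fit_exponential` is hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A * B**x by OLS regression of log10(y) on x itself.

    Requires all y > 0; x may take any sign. Returns (A, B).
    """
    ly = [math.log10(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ly) / n
    b = (sum((x - mx) * (v - my) for x, v in zip(xs, ly))
         / sum((x - mx) ** 2 for x in xs))  # b = log B
    a = my - b * mx                        # a = log A
    return 10 ** a, 10 ** b


# Data generated exactly from y = 5 * 2**x, including negative x.
xs = [-2, -1, 0, 1, 2, 3]
ys = [5 * 2 ** x for x in xs]
print(fit_exponential(xs, ys))
```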

There are often reasons to use the alternative exponential expression $y = Ae^{kx}$ with natural logarithms (ln). By definition, $\ln e = 1$. Hence taking natural logarithms gives $\ln y = \ln A + kx = a + kx$. Graph ln y versus x itself. If the data cloud is linear, then $y = Ae^{kx}$ applies, and we can regress ln y on x itself.

  • How do natural (ln x) and decimal (log x) logarithms relate? $\ln x = 2.30 \log x$ and, conversely, $\log x = 0.434 \ln x$. Often we can use either.
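The conversion factors in the bullet above are $\ln 10$ and $\log e$, and the two exponential forms describe the same curve when $k = \ln B$. A minimal Python sketch using only the standard `math` module; the variable names and sample values are illustrative.

```python
import math

LN_10 = math.log(10)        # about 2.302585, so ln x = 2.30 log x
LOG_E = math.log10(math.e)  # about 0.434294, so log x = 0.434 ln x

x = 7.5
print(math.log(x), LN_10 * math.log10(x))   # equal up to rounding
print(math.log10(x), LOG_E * math.log(x))   # equal up to rounding

# The same curve written both ways: A * B**x equals A * e**(k*x) with k = ln B.
A, B = 5.0, 2.0
k = math.log(B)
for x_val in (-1.0, 0.0, 2.5):
    print(A * B ** x_val, A * math.exp(k * x_val))
```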

How can we find the coefficients in $y = AB^x$ or $y = Ae^{kx}$ from the ‘semilog’ graph?

Because log and ln are easily confused, it is safer to use the two-point formulas below.

How can we find the coefficient values in $y = AB^x = Ae^{kx}$ from two points on the curved graph of y versus x?

  • Take two ‘typical’ points of the data belt, far away from each other: $(x_1, y_1)$ and $(x_2, y_2)$.

  • For $y = AB^x$ we have $\log B = \log(y_1/y_2)/(x_1 - x_2)$. Then $B = 10^{\log B}$ and $A = y_1/B^{x_1}$.

  • For $y = Ae^{kx}$ we have $k = \ln(y_1/y_2)/(x_1 - x_2)$. Then $A = y_1/e^{k x_1}$.
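Both two-point routes for the exponential pattern can be computed side by side, which also confirms that they yield the same A and that $k = \ln B$. A minimal Python sketch; the function name and example points are hypothetical.

```python
import math


def exponential_from_two_points(x1, y1, x2, y2):
    """Return (A, B, k) for y = A * B**x = A * e**(k*x) through two points.

    Requires y1, y2 > 0 and x1 != x2.
    """
    log_B = math.log10(y1 / y2) / (x1 - x2)
    B = 10 ** log_B
    A = y1 / B ** x1
    k = math.log(y1 / y2) / (x1 - x2)  # natural-log route, so k = ln B
    return A, B, k


# Points (1, 10) and (4, 80) lie on y = 5 * 2**x.
A, B, k = exponential_from_two_points(1, 10, 4, 80)
print(A, B, k)
print(10 / math.exp(k * 1))  # A recovered through the e-based form
```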

Cite this article

Taagepera, R. Adding Meaning to Regression. Eur Polit Sci 10, 73–85 (2011). https://doi.org/10.1057/eps.2010.28
