Social Imaginaries and Econometrics for Education Policy
A notable feature in the evolution of economics over the past 50 years has been the emergence of “mainstream” economics. This is a term adopted largely by its critics within the economics community to highlight the dominance of a pared-down approach reliant on abstract theory, primarily neoclassical economics. Economists have also come to focus principally on quantitative research methods, with one in particular, econometrics, being overwhelmingly dominant. Not surprisingly, then, economists have encouraged the use of econometrics in the analysis of the teaching of economics, or, as they refer to it, the study of economic education. The American Economic Association has produced an “Online Handbook for the Use of Contemporary Econometrics in Economic Education Research,” developed by Becker, who is also coeditor of a book on the subject. The dominance of the technique is made clear in the preface to the book. There MacDowell and Highsmith suggest that its coverage of model building, simultaneous equations, and qualitative response analysis offers inexperienced researchers “a comprehensive coverage of the basic statistical estimation and testing procedures required for the evaluation of learning” (Becker and Walstad 1987, p. xii, emphasis added). It is implicitly assumed that the techniques are appropriate.
It is not only in economics education but also in education and education policy more broadly that these techniques have taken hold. Hanushek in particular has published extensively in this area and econometrics is increasingly being used for policy-focused analysis in education.
Ideas always come in history wrapped up in certain practices, even if these are only discursive practices. But the motivations that drive toward the adoption and spread of these packages may be very varied.
The practice of econometrics involves the acceptance of generally unstated ideas about the nature of variables and the relationships between them. The interpretation of results involves assumptions about the nature of causality and policy decisions. So what drove the development of these practices and the adoption of these approaches?
In econometrics, there has been simultaneous development of computer-based packages and electronic databases. Over time, this meant that the techniques have become accessible to an increasing number of analysts, the required skill level has fallen, and conventions have developed. Initially tentative assumptions have evolved into firm foundations required for the application of the techniques. These are now so familiar that they are unquestioned by all but a small minority. In other words, “…the theory is schematized in the dense sphere of common practice” (Taylor 2004, p. 30).
Mainstream neoclassical microeconomic theory is built on a foundation that takes universal perfect competition as an ideal. Its suitability is not questioned in standard textbooks and research papers, although it is based on an unrealistic view of the world and a specific optimality criterion (Pareto optimality). Deviations from the ideal are considered to be “market failures” for which various remedies have been devised.
A parallel could be drawn for econometrics. Criteria for determining the relationship between selected variables, trying to find a “best fit,” were originally tentatively selected but have subsequently assumed great significance. There is now a concept of optimal estimation which can be achieved under “ideal” circumstances and corrections that can be made where the required assumptions are not met. Consider a variable, Y, the value of which depends in large part on the values of a set of variables, X1, X2, and so on. Y could be a test result, and the Xs could be the student’s mark the previous year, hours spent studying, the teacher this year, etc. Ordinary least squares multiple regression might be used to estimate such a relationship. In econometric theory, it has been shown that, under the required conditions, this approach meets certain desirable criteria. Specifically, it provides the “best linear unbiased estimates” (BLUE) of the parameters of a model. This has conventionally been viewed as giving the best estimates that can be achieved. If the required conditions are not met, problems arise. Without going into technical details, some common problems are those of autocorrelation, heteroskedasticity, and multicollinearity. These are well covered in standard econometrics textbooks, and methods have been devised to address them. Resulting training has focused on the application of these techniques. By comparison, limited attention has been given to broader issues of choice of technique (what should a “best” estimator do?) and associated reservations and qualifications (are the variables related to each other in a linear way?). Consequently, the question remains whether the “ideal” is really so desirable or relevant.
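The setup described above can be sketched with simulated data (all variable names and numbers here are hypothetical, chosen only for illustration): Y is a test result, X1 a previous mark, X2 hours spent studying.

```python
import numpy as np

# Hypothetical data: Y = test result, X1 = previous year's mark,
# X2 = weekly study hours. The "true" coefficients are invented.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(40, 90, n)                          # previous year's mark
x2 = rng.uniform(0, 20, n)                           # weekly study hours
y = 5.0 + 0.8 * x1 + 1.5 * x2 + rng.normal(0, 5, n)  # linear model plus noise

# Ordinary least squares: choose the betas minimising the sum of squared errors.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimated intercept and slope coefficients
```

Under the classical assumptions this estimator is BLUE; the discussion that follows concerns how often those assumptions actually hold.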
The purpose of this particular section is to identify the social imaginaries which may be associated with this approach. In other words, the aim is to identify some of the limitations that are commonly overlooked in the standard approaches. This is done by examining the basic assumptions of the supposed ideal situation.
Econometrics involves the estimation of a relationship between variables. A general functional form is specified, with the parameters indicating the relationship between variables. The estimation process then provides information on possible values for the parameters and additional statistical diagnostic information. The starting point is therefore the choice of variables and specification of the functional form.
Ideally there is some theoretical basis for these decisions. Statistical criteria relate to the numbers alone, the values in the data set. They are independent of the meaning assigned to the numbers, whatever they are (student grades, class size, teachers’ years of teaching, etc.).
Theory involves simplification. Subsequent empirical analysis, even if based on theory, involves further simplification. The result is at best a tenuous relationship with reality. The following sections consider first the criteria for BLUE estimators, then the relationship between such estimators and theoretically specified relationships between variables, followed by some common refinements and interpretations of results.
Criteria for BLUE Estimators
Linearity – linear models assume a fixed value for each βi. This is actually very restrictive. Consider Y as an individual’s income in a particular year and Xi as years of education. First, if these variables are used, each extra year of education is assumed to have the same incremental effect, βi, on income irrespective of whether it is the third or the twelfth year of education. Second, the effect is assumed to be invariant to differences in the values of any other explanatory variables, such as the school attended, or parents’ socioeconomic status – an additional year of education would still increase income by an amount βi. Third, if the aim is to determine the effect of a year’s education on income, the effect is considered to be fully identified in the income of the period represented by the data. This would not suit careers with very different earnings profiles over time, such as doctors for whom income can be expected to rise with age and manual workers whose incomes may peak much earlier. Fourth, the effect of an additional year of education is assumed to be the same for every person in the study. More generally, all the observations are assumed to be of the same structure, and so the effects are also identical. These requirements are far more restrictive than most theories would specify. Nevertheless, poor explanatory power for an equation is put down to either measurement error or omitted variables. Inappropriate or imprecise specification of the relationships should be an equally serious concern.
Unbiasedness – an unbiased estimator is one for which the expected value of the estimate is equal to the true value. Note, however, that the estimator is unbiased only in the form in which the variables are estimated. Sometimes a linear relationship is estimated when the underlying relationship of interest is nonlinear. Using log transformations, multiplication becomes addition and raising to a power becomes multiplication. Consider Z = AX^aY^b. Converting to log form, this becomes (logZ) = (logA) + a(logX) + b(logY). The latter form is linear. Adding an error term, it can be estimated using linear regression. If the required assumptions hold, the resulting estimates of logZ will be unbiased. However, that does not mean that the results will still be unbiased when transformed back into the desired form, Z. Consider Z having the values 10 and 100 and hence a mean of 55. LogZ to base 10 is 1 and 2, respectively, with a mean of 1.5, which has an antilog of 31.6.
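The closing arithmetic can be checked directly, using the two observations given above:

```python
import math

# The two observations from the text: Z = 10 and Z = 100.
z = [10, 100]
mean_z = sum(z) / len(z)                    # mean of Z itself: 55.0

# Estimation in logs is unbiased for log Z, whose mean here is 1.5 ...
log_z = [math.log10(v) for v in z]
mean_log = sum(log_z) / len(log_z)

# ... but transforming back gives 10**1.5, about 31.6, well below 55.
back_transformed = 10 ** mean_log
print(mean_z, back_transformed)
```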
Best – an estimator of the parameters β0 to βn is BLUE if, of all the possible linear unbiased estimators, it is best according to some quality measures. The measure used is that it has the lowest sum of squared errors, the sum of the squared differences between actual and estimated values of the dependent variable in the observations used. This is a conventional measure, but there are other possibilities, such as for quantile regression which uses absolute errors. “Best” means best out of the subset of possible estimators. There may be nonlinear or biased estimators with lower sum of squared errors such that the resulting estimates are more accurate. (It may be better to have many shots hit a target closely grouped but slightly to the right of the bull’s eye, rather than being widely scattered but with the bull’s eye at the center of the loose grouping.)
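The bull’s-eye analogy can be made concrete with a standard textbook example, sketched here as a Monte Carlo simulation (the population values are invented): when estimating the variance of a normal population, dividing the sum of squares by n+1 gives a biased estimator with a lower mean squared error than the unbiased division by n−1.

```python
import numpy as np

# Monte Carlo sketch: repeatedly estimate a known variance from small samples.
rng = np.random.default_rng(1)
true_var = 4.0
n, reps = 10, 100_000
samples = rng.normal(0, np.sqrt(true_var), size=(reps, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

unbiased = ss / (n - 1)   # unbiased, but more widely scattered
biased = ss / (n + 1)     # biased low, but more tightly grouped
mse_unbiased = ((unbiased - true_var) ** 2).mean()
mse_biased = ((biased - true_var) ** 2).mean()
print(mse_unbiased, mse_biased)  # the biased estimator has the smaller MSE
```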
Estimators and Theoretical Relationships
It is not uncommon to find, for instance, research on dropout that fails to distinguish dropout resulting from academic failure from that which is the outcome of voluntary withdrawal. Nor is it uncommon to find permanent dropouts placed together with persons whose leaving may be temporary in nature or may lead to transfer to other institutions of higher education. (Tinto 1975, pp. 89–90)
There are also limitations in the functional forms that are being estimated. Stock variables describe a point in time and flow variables measure quantities aggregated over discrete units of time, such as semesters or years. Reality occurs in continuous time, in which the values of stock variables can change and the timing of flow variables is important, although it is not possible to identify from the data where within a time period events took place.
There is also commonly a very restrictive functional relationship between independent and dependent variables (or, in simultaneous equation models, exogenous and endogenous variables). Consider the pattern of change over time of an independent variable providing an “input wave” and the impact of this on the dependent variable observed as an “output wave.” In a linear model, the two waves would have either identical patterns if the coefficient, β, is positive or inverse patterns if β is negative. A new teacher is assumed to result in an immediate increase or drop in student performance, for example. The timing and nature of impact of one variable on another can be far more complex than this, but that cannot be easily identified. Similarly if a change in an independent variable occurs at the end of a time period, the impact on the dependent variable will be felt not in the current, but in the next or later periods. Hence aggregation into time periods results in imprecise specification even if the real timing of impact is exact and identical in all cases.
The relevance of a variable in an equation is commonly assessed by statistical criteria. However, statistical significance depends in part on the number of observations (which is quite distinct from the underlying importance of the variable), and statistical significance is not the same as policy significance. A statistically significant result, even if it coincides with underlying causal factors, may be such that attainable changes are very small or very costly, or alternatively a useful policy instrument may have a large but variable (and hence statistically insignificant) effect.
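The dependence of statistical significance on the number of observations can be seen in a simple calculation (the effect size is hypothetical): with a fixed effect of 0.05 standard deviations, the z-statistic grows with the square root of n, so the same tiny effect moves from "insignificant" to "highly significant" as the sample grows.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal approximation."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

effect = 0.05  # hypothetical effect: 0.05 standard-deviation units, held fixed
for n in (100, 1_000, 10_000, 100_000):
    z = effect * math.sqrt(n)   # the z-statistic grows with sqrt(n)
    print(f"n={n:>6}  z={z:.2f}  p={two_sided_p(z):.4f}")
```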
Moreover, there is a fundamental problem with statistical tests of significance, known as the “fallacy of the transposed conditional.” Significance tests estimate the probability of an outcome if the null hypothesis is true. It is argued that an unlikely result indicates that the null hypothesis may be false. However, there is no information about the likelihood of an outcome when the null hypothesis is false. “The likelihood of an outcome given (conditional on) the null hypothesis” is not the same as “the likelihood of the null hypothesis given (conditional on) an outcome.” The latter is what the researcher actually wants from a hypothesis test, hence the reference to a “transposed conditional.”
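A small simulation (with invented proportions) shows how far apart the two conditionals can be: even at a 5% significance level, the share of true nulls among “significant” results can be many times 5% when real effects are rare.

```python
import numpy as np

# Invented setup: 90% of tested hypotheses have no real effect.
rng = np.random.default_rng(2)
n_tests = 100_000
null_true = rng.random(n_tests) < 0.9

# p-values: uniform when the null is true; concentrated near zero otherwise.
p = np.where(null_true, rng.random(n_tests), rng.beta(1, 20, n_tests))
significant = p < 0.05

# P(null true | significant) -- the transposed conditional -- is far above 0.05.
share_null_among_sig = (null_true & significant).sum() / significant.sum()
print(share_null_among_sig)
```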
An additional problem arises in that correlation neither implies causality nor covers all possible causal relationships. Correlation is a measure only of linear association between data series. There are numerous other possible causal patterns that could be observed, such as a threshold effect (drowning and depth of water) or a viable range (survival in relation to temperature), not to mention INUS conditions. The latter refer to situations where an event can occur when a set of conditions arises, and there may be several such sets that produce the same effect. University study could be considered an INUS condition for higher income. It is Insufficient on its own (the person would then have to work). It is a Necessary part of a set of conditions (getting higher income by becoming a doctor). The set of conditions is Unnecessary (high income can come from being a top sportsperson or musician), but it is Sufficient (the education followed by employment as a doctor will give higher income). Similarly, consider “causes” of car accidents, workplace deaths, or obtaining a company directorship.
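The viable-range case can be illustrated numerically (a deterministic toy example, not real survival data): survival depends entirely on temperature, yet the Pearson correlation between the two is essentially zero because the relationship is not linear.

```python
import numpy as np

# Toy "viable range": survival is 1 inside a band of temperatures, 0 outside.
temp = np.linspace(-20, 60, 201)
survival = np.where(np.abs(temp - 20) < 15, 1.0, 0.0)  # viable roughly 5-35 degrees

# The relationship is deterministic, but the linear correlation is near zero.
r = np.corrcoef(temp, survival)[0, 1]
print(round(r, 3))
```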
Interpretation of results is further complicated in that some data series are used as proxies for something else; hence, household income or a parent’s education might be used as a measure of socioeconomic status. Should results then be interpreted in terms of the intended variable or the proxy variable? Even if not planned, the result for one variable may actually be picking up the effect of something else with which it is highly correlated.
generalized, one-size-fits-all solutions do not work…Without intimate knowledge of local context, one cannot hope to devise solutions to local problems. All problems are de facto local; inquiry must be decentralized to the local context. (Stringer 2007, p. ix)
Interpretations of Results
Regression results are generally biased if relevant variables are omitted. One commonly accepted response is the inclusion of “control variables,” which are claimed to control for the effects of the designated variable. Hence, for example, addition of a household income variable could be used to claim that results apply “after having controlled for household income.” Unfortunately, these variables are added without regard to the functional relationship. In extreme cases, this is done in blanket fashion simply through the addition of a “vector of control variables,” an increasingly common practice. The functional form is important, but linearity assumes “additive separability,” whereby each variable can be considered in isolation. This is problematic because “Observational data…are rife with dependency structures…No one variable can meaningfully be ‘held constant’ while others are allowed to vary” (Babones 2014, pp. 123–124).
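The additive-separability problem can be sketched with simulated data (all values invented): when the effect of one variable depends on another, adding the second variable as a linear control recovers no meaningful effect at all.

```python
import numpy as np

# Invented data: the effect of x1 on y depends entirely on x2 (an interaction).
rng = np.random.default_rng(3)
n = 5_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 * x2 + rng.normal(0, 0.1, n)

# Additive model "controlling for" x2: y = b0 + b1*x1 + b2*x2.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])  # near zero: the additive model misses the dependence entirely
```

Holding x2 “constant” in this additive sense tells us nothing, because there is no single effect of x1 to hold anything constant for.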
Hanushek writes: “Most research articles, after finding a set of things that is correlated with student performance, immediately go to a section on policy conclusions” (Hanushek 1997, p. 303). He is concerned about causality and replication, or the wider applicability of findings from a study. Models alone do not address all the aspects to be considered when making policy decisions. Moreover, “A good model is merely one type of evidence among others, not the end of the argument. Much less the ultimate authority.” (Majone 1989, p. 51)
There are many complicating factors, including differing responses to passive versus active use of policy variables, learning and changed behavior (so the structure may change), the need to consider alternative policy options and their costs and benefits, and subgroups responding differently to an approach. In discussion on teacher value-added estimates as a basis for performance pay, Hanushek and Rivkin (2010, pp. 269–270) raise: “concerns about accuracy, fairness, and potential adverse effects of incentives based on a limited set of outcomes…[and] concerns about incentives to cheat, adopt teaching methods that teach narrowly to tests, and ignore non-tested subjects.”
The focus on econometrics as a social imaginary provides a valuable insight into the problems that might result. The concerns are further highlighted in parallel literature on the concept of framing, the role of theories as providing analogies and the additional issues to be considered to relate these to the real world (Birks 2015). Additional support can be found in cultural political economy, which asks why particular imaginaries may arise and emphasizes that “both history and institutions matter” (Jessop and Oosterlynck 2008, p. 1156).
Fifty years ago, basic econometric research focused on building up techniques with the expectation that there would be parallel improvements in databases. It was hoped that this would result in valuable research at some stage in the future. In reality, econometrics packages and online databases made the techniques far more accessible, but the techniques and data quality were not able to live up to expectations. However, the practices became entrenched, and people chose to focus on this “high status” activity using packages and large available online databases, ignoring the reservations that should be raised. This was at the cost of more pragmatic analysis of real-world situations. There is a place for econometrics in an analyst’s toolkit, but it must be used with care, is only suited to certain types of data, and should be used in conjunction with other research techniques. The term “mainstream economics” is increasingly used to refer to what is perceived by many as a current narrowly focused approach to economics. In contrast, pluralist approaches are less prominent but offer alternative, more diverse techniques and theories. Education researchers may find that this alternative literature can provide useful insights.
- Babones, S. J. (2014). Methods for quantitative macro-comparative research. Los Angeles, CA: SAGE Publications.
- Becker, W. E., & Walstad, W. B. (Eds.). (1987). Econometric modeling in economic education research. Boston, MA: Kluwer Nijhoff Publishing.
- Majone, G. (1989). Evidence, argument, and persuasion in the policy process. New Haven, CT: Yale University Press.
- Stringer, E. T. (2007). Action research. Los Angeles, CA: Sage Publications.
- Taylor, C. (2004). Modern social imaginaries. Durham, NC: Duke University Press.
- Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45, 89–125. Retrieved from http://www.jstor.org/stable/1170024.