1 Introduction

“Most of the observed increase in global average temperatures since the mid-twentieth century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations, and it is likely that there has been significant anthropogenic warming over the past 50 years averaged over each continent except Antarctica” (IPCC 2007: p. 5). In this proposition, which was part of the summary for policy makers of the famous IPCC report on climate change that has influenced politicians throughout the world, the phrase “very likely due to” needs clarification. As such, it is a causal statement of the form “increase of X very likely causes increase of Y.” More specifically, the causal statement is “anthropogenic increasing of GHG concentrations very likely causes increasing of the (mean) global temperature.”

According to Bunge (1959; see Tacq 2010), a causal relationship between X and Y requires that the relationship is (a) conditional (if X, then Y); (b) unique (one cause, one effect); (c) asymmetrical (when X causes Y, then Y does not cause X); and (d) invariable (no probabilistic causality). Although elegant in its simplicity, Bunge's rigid framework does not seem to extend much beyond the realm of elastic collisions as observed, e.g., in a game of billiards. At the other end of the spectrum stands American pragmatism, where one ceases to look for metaphysical grounds of a theory and instead measures the value of a hypothesis on causality by the cash-value of the idea (James 1907):

Grant an idea or belief to be true, what concrete difference will its being true make in anyone's actual life? How will the truth be realized? What experiences will be different from those which one would obtain if the belief were false? What, in short, is the truth's cash-value in experiential terms?

Since, as a matter of fact, the cash-value of the idea of anthropogenic global warming (AGW) has proved to be positive, this alone may serve as sufficient motivation for its maintenance in science. For those satisfied with this argument, Q.E.D., and the story ends here. However, others may not so easily dismiss the fact that causal statements in science have been subject to debate for as long as science has existed (see, e.g., Pearl 2000; Tacq 2010). As early as 1737, the philosopher David Hume wrote that

When we look about us towards external objects, and consider the operation of causes, we are never able, in a single instance, to discover any power or necessary connection; any quality, which binds the effect to the cause, and renders the one an infallible consequence of the other.

In other words, regardless of the rationale on which a causal statement has been based, its validity will always remain open to dispute. While early scientists depended on the principle of verification for the validation of theories (including causal hypotheses), the critique of Hume gradually convinced scientists that other principles should be applied in order to establish whether or not a theory was scientific. Whereas the positivists of the Wiener Kreis still advocated that scientific theories are sound if they can be verified by empirical data, the logical shortcomings of the verification criterion led Karl Popper to break radically with this tradition.

Briefly stated, the major shortcoming of the verification criterion is that it allows only experience to decide upon the truth or falsity of scientific statements (Popper 1965: 42; see Rapp 1975). Popper's most important contribution to the debate was to state that every scientific theory should be able to list counter-examples which, if found in reality, disconfirm (“falsify”) the theory. This is the principle of falsification. In the case of anthropogenic global warming (AGW), the theory should list one or more counter-examples that could (potentially) disconfirm the theory. This listing of potential falsifiers appears to be missing in the present debate on AGW. In fact, some skeptics in the debate on AGW point out that all natural climatic disasters are used as evidence (verification) for the human impact on climate, whereas evidence that a post-WWII global warming is absent in, e.g., the Greenland Ice-Core Bore Record is ignored as falsifying evidence (see, e.g., Dahl-Jensen et al. 1998; Feldman and Marks 2009). Needless to say, a methodologically sound theory would encompass all available evidence and not “cherry-pick” those pieces of evidence that confirm the theory while ignoring those that do not.

Unfortunately, when a theoretical phenomenon such as AGW becomes a global political program, it soon becomes vulnerable to methodological fallacies in the realm of social and political science. Leaving aside the quality of the data and methods used, the IPCC report aimed at reaching a consensus. Consensus is recognized by some social scientific methodologists as the defining feature of social science (Swanborn 1996; Feyerabend 1987). However, if reaching consensus were really the hallmark of sound science, the scientific theories of Galileo, Copernicus, Darwin, and many others would never have seen daylight. Also, there is no guarantee that majorities will reach sensible opinions (think only of the democratic Weimar Republic in the 1930s). Finally, scientists need to make a living, and they will not bite the hand that feeds them, an argument used by some advocates of AGW who claim that climate skeptics are sponsored by “Big Carbon”. Therefore, consensus must be dismissed as a defining feature of science. The IPCC recognizes the limitation of consensus by adding the phrase ‘and much evidence’ when it makes statements as in, e.g., “there is high agreement and much evidence that with current climate change mitigation policies and related sustainable development practices, global GHG emissions will continue to grow over the next few decades” (IPCC 2007: p. 7, italics added). We must therefore discuss the sources of evidence that are used to formulate the many causal statements on AGW issued in the report.

The quality of all scientific research depends, of course, on the quality of the data that are being processed. Regardless of the quality of the (statistical) model used for analysis, if bad data are fed to the model, then the result of the analysis will be bad. This principle is known as garbage in–garbage out. In other words, if the data that are fed into climate models are open to dispute, then so are the projections of these models. In the scientific (i.e., peer-reviewed) literature, several authors have expressed doubts about the quality of the analyzed data and the possibility of arriving at valid inferences on human impact on global warming (e.g., Jaworowski 1994; Soon et al. 2004; Michaels 2008; Pielke et al. 2007). However, since the author of this article is no expert on climate science, the issue of whether or not data used in climate science are of sufficient quality will be left for others to decide. Instead, in this methodological note on the making of causal statements in the debate on AGW, we focus on the study designs that are used to establish the causal hypotheses. The following sections briefly discuss the consequences of the lack of experiment and of the reliance on correlational data for establishing causal relationships. This discussion prepares the ground for the formulation of possible falsifiers of AGW. Some concluding remarks are given in the last section.

2 On the establishing of causality in time series

2.1 The consequence of the lack of experiment

The challenge in corroborating any causal hypothesis is to determine what kinds of evidence constitute actual proof of the hypothesis. The study design appropriate for establishing causality, that is, for establishing cause-and-effect relationships, is the experiment, as any freshman's course or introductory work in research methodology will reveal (e.g., Kumar 2005: 100; Ford 2000: 141; de Vaus 2001: 70; Gomm 2008: 60; Neale and Robert 1986: 134). The key to the conducting of true experiments is the installing of control groups that receive no, or different, experimental interventions. However, despite efforts to find planets with equal conditions in the universe, no replicate of the Planet Earth had been found at the time that this article went to press, and therefore no true experiment can be conducted that would compare climate change on a planet with and without anthropogenic carbon dioxide production. Furthermore, artificial interventions in which, for instance, the impact of doubling anthropogenic carbon dioxide on global warming is investigated are, if not infeasible, surely unlikely to be carried out by contemporary researchers. These observations may seem rather straightforward, but their consequences are far from trivial, for the immediate implication is that the only study design that is available to test the AGW hypothesis is the longitudinal study, either a panel or a trend study (see, e.g., Kumar 2005: 111).

The lack of a control group makes the testing of counterfactual relationships, for instance ¬ X → ¬ Y (“if there were no Industrial Revolution, then there would not have been an increase in global temperature”), impossible.

In sciences where experimentation is difficult if not impossible, such as the social sciences, researchers rely on correlational research in order to establish causal relationships. Simply stated, two variables correlate positively when increasing values of X correspond to increasing values of Y. Thus, a change of X leads to a change of Y, which is one of the prerequisites for establishing causality (see Section 1). In order to find a (statistically significant) correlation, however, there is no requirement that X precedes Y. Both hypotheses X → Y and Y → X will produce identical correlations.
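The symmetry of the correlation coefficient can be checked numerically. The following is a minimal sketch, assuming a small made-up data set; the helper `pearson` and the values of `x` and `y` are illustrative only and are not taken from any climate record.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy series in which y rises roughly in step with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r_xy = pearson(x, y)  # x in the role of "cause"
r_yx = pearson(y, x)  # y in the role of "cause"

# Both numbers coincide: the coefficient carries no information on direction.
print(round(r_xy, 4), round(r_yx, 4))
```

Because the formula treats both arguments identically, swapping the roles of cause and effect cannot change the result, which is why a correlation alone cannot discriminate between X → Y and Y → X.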

Furthermore, a significant correlation of empirical data can be observed when, in fact, no causal relationship is involved. For instance, the priests of ancient Egypt were paid for seeing to it that, in the evening, the sun would set and in the morning, the sun would come up again. Because they upheld their ceremonies every evening and morning, and because every morning the sun did come up and every evening the sun did set, the priests proved to the general public that their actions had value. Of course, the causal relationship between the making of offerings and the behavior of the sun was nil, whereas the correlation between the actions of the priests and the sun equaled one, that is, the correlation was maximal.

Another well-known problem relates to the so-called third variable (e.g., Neale and Robert 1986: 239). In order to illustrate this phenomenon, a much used example is where the number of churches in a city is compared with the number of crimes committed in a city: with a large enough sample of cities, a positive correlation will be found indicating that churches cause crime (or vice versa). The third or intervening variable, in this case, is the size of the city, and the empirical correlation is called spurious. In sum, even when correlation is a necessary condition in causality, it is not a sufficient one.
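The churches-and-crime example can be imitated with simulated data. The sketch below assumes, purely for illustration, that both variables depend linearly on city size plus independent noise; all coefficients, the sample size, and the helper names `pearson` and `residuals` are hypothetical. Once the linear effect of city size is removed from both variables, the remaining (partial) correlation should be close to zero.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def residuals(y, z):
    """Residuals of the least-squares line y = a + b*z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
             / sum((zi - mean_z) ** 2 for zi in z))
    return [yi - mean_y - slope * (zi - mean_z) for yi, zi in zip(y, z)]

random.seed(0)  # fixed seed so the sketch is reproducible
size = [random.uniform(10, 1000) for _ in range(500)]        # city size (assumed)
churches = [0.05 * s + random.gauss(0, 5) for s in size]     # depends on size only
crimes = [2.0 * s + random.gauss(0, 200) for s in size]      # depends on size only

raw = pearson(churches, crimes)
partial = pearson(residuals(churches, size), residuals(crimes, size))

print("raw correlation:", round(raw, 3))
print("correlation after controlling for size:", round(partial, 3))
```

The sketch only works because the third variable is known and measured; in empirical research there is no comparable guarantee that every such variable has been identified.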

2.2 Granger causality

In the literature, the notion of Granger causality, an econometric concept that has also been applied in the analysis of climate data (e.g., Stern and Kaufmann 1999; Verdes 2005; Kauffmann et al. 2006), has received increasing interest (e.g., Dufour and Renault 1998). Briefly stated, a variable X is said to Granger cause another variable Y (X → gY) if the observation of X up to time t can help one to predict Y at time t + 1, when the corresponding observation on Y is available. That is, G-causation implies that knowledge of the past of X and Y produces better predictions of Y than knowledge of the past of Y alone (Freeman 1983). This definition of causality does not require presence of a plausible theory for the causal connection. Rather, it focuses on one important aspect of causal hypotheses, namely the power to predict, and the validity of a predictive model can be expressed in terms of error in prediction and model fit statistics (e.g., likelihood ratios).
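The definition can be made concrete by comparing two one-step-ahead prediction models. The following is a simplified sketch and not a formal Granger test: it assumes a single lag, zero-mean simulated data, and no intercept, and it reports only the relative reduction in the residual sum of squares rather than a test statistic. The data-generating process and all function names are hypothetical.

```python
import random

def rss_one(target, a):
    """Residual sum of squares of the no-intercept fit target ~ b*a."""
    b = sum(t * u for t, u in zip(target, a)) / sum(u * u for u in a)
    return sum((t - b * u) ** 2 for t, u in zip(target, a))

def rss_two(target, a, c):
    """Residual sum of squares of target ~ b1*a + b2*c (2x2 normal equations)."""
    saa = sum(u * u for u in a)
    scc = sum(v * v for v in c)
    sac = sum(u * v for u, v in zip(a, c))
    sat = sum(u * t for u, t in zip(a, target))
    sct = sum(v * t for v, t in zip(c, target))
    det = saa * scc - sac * sac
    b1 = (sat * scc - sct * sac) / det
    b2 = (sct * saa - sat * sac) / det
    return sum((t - b1 * u - b2 * v) ** 2 for t, u, v in zip(target, a, c))

def improvement(effect, cause):
    """Relative drop in RSS when the past of `cause` is added to the past of `effect`."""
    target = effect[1:]                      # value at time t + 1
    own, other = effect[:-1], cause[:-1]     # values at time t
    return 1.0 - rss_two(target, own, other) / rss_one(target, own)

random.seed(1)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]   # x is pure noise
y = [0.0]
for t in range(n - 1):                       # y depends on its own past and on past x
    y.append(0.5 * y[t] + 0.8 * x[t] + random.gauss(0, 1))

gain_x_to_y = improvement(y, x)   # substantial: past x helps to predict y
gain_y_to_x = improvement(x, y)   # negligible: past y does not help to predict x

print(round(gain_x_to_y, 3), round(gain_y_to_x, 3))
```

In applied work the comparison would normally involve several lags and a significance test on the added terms; the sketch only shows the logic of comparing a restricted with an unrestricted predictive model.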

Use of Granger causality in order to confirm the AGW hypothesis has received critique in the literature (e.g., Triacca 2001, 2005). The most important problem with statistical models such as Error Correction Models and related time series models is that they cannot serve as proof of a causal relationship. Firstly, the associations that are found may be spurious, meaning that one or more intervening variables have been omitted from the model. Of course, such intervening variables can be included in the model, e.g., in a vector Z of auxiliary variables (which requires, of course, measurement of this vector), but proving that all intervening variables have been captured is impossible. This is because absence of proof is not equivalent to proof of absence.

Secondly, how does one prove that X is the precursor for Y in a statistical analysis of empirical data? At the very least, the goodness-of-fit statistics of the models where both data generating processes (X → gY and Y → gX) are assessed need to be compared. This comparison can have four possible outcomes with corresponding conceptual and operational hypotheses:

| Conceptual hypothesis | Operational hypothesis |
| --- | --- |
| 1. X → Y ∧ ¬ (Y → X) | X → gY has significant model fit and Y → gX does not |
| 2. ¬ (X → Y) ∧ Y → X | X → gY does not have significant model fit and Y → gX does |
| 3. ¬ (X → Y) ∧ ¬ (Y → X) | Neither X → gY nor Y → gX fit the data |
| 4. X → Y ∧ Y → X | Both X → gY and Y → gX display significant model fit |
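The four outcomes amount to a two-by-two decision table over the two operational results. As a small sketch, assuming that each direction has already been reduced to a yes/no verdict on significant model fit, the mapping can be written as follows; the function name and the label texts are illustrative.

```python
def classify_outcome(x_to_y_fits: bool, y_to_x_fits: bool) -> int:
    """Return the outcome number (1-4) for the two model-fit verdicts."""
    if x_to_y_fits and not y_to_x_fits:
        return 1
    if y_to_x_fits and not x_to_y_fits:
        return 2
    if not x_to_y_fits and not y_to_x_fits:
        return 3
    return 4

# What each outcome provides evidence for (evidence, not proof).
labels = {
    1: "evidence for X -> Y and against Y -> X",
    2: "evidence for Y -> X and against X -> Y",
    3: "evidence against both directions",
    4: "evidence for both directions",
}

outcome = classify_outcome(True, True)
print(outcome, labels[outcome])
```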

In applied research, the operational hypothesis relates to the expected outcome from statistical inference based on analysis of empirical data. It is important to note that validity of the operational hypothesis provides evidence for, not proof of, validity of the conceptual hypothesis. For instance, the first outcome provides evidence for the simple mechanistic hypothesis that X is a cause (or at least a mediator) of Y; and similarly, the second outcome provides evidence for the alternative hypothesis that Y causes (or is a mediator of) X. Outcome 3 is a falsifier for both outcome 1 and outcome 2. The fourth outcome suggests the possibility of a feedback loop, i.e., causal recursion.

Modeling matters in terms of verification (and falsification) become complicated in the case that a feedback loop is present. Among issues that need to be taken into account are the different time lags for the X → Y and Y → X relationships; the time span that is covered by data on X and Y; the direction of the feedback (negative or positive); and the (non)linearity of the relationships. In verifying causal recursion, it is further important to note that outcome 4 is a necessary but insufficient condition for presence of a feedback loop. For instance, consider two simple time series X = sin(t) and Y = cos(t). In these data, a feedback loop is absent, while both X → gY and Y → gX will produce equally well fitting models: we need only remember that cos(t) = sin(t + π/2) and sin(t) = cos(t − π/2). Also, in Error Correction Models such as the one applied by Kauffmann et al. (2006: p. 256), if the model produces significant parameters with Y as effect (i.e., dependent variable), it will at the same time produce significant parameters with X as effect (this is so because the mathematical model specifies the two variables in terms of linear combinations of each other). In both cases, the data will result in outcome 4, and other kinds of evidence than the verification of statistical model fit are needed to decide what the proper direction of the causal relationship is (see, e.g., Hausman 1982). A possible strategy for obtaining such evidence is outlined in the next section.
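The sine and cosine example can be reproduced numerically. The sketch below assumes a sampling step of 0.5 and predicts each series from two lagged values of the other series by least squares without intercept; these choices, and the helper `fit_r2`, are illustrative. Note that it demonstrates equally good predictive fit in both directions and is not a formal Granger test, since in noise-free data each series is also perfectly predictable from its own past.

```python
import math

def fit_r2(target, a, c):
    """Uncentered R^2 of the no-intercept least-squares fit target ~ b1*a + b2*c."""
    saa = sum(u * u for u in a)
    scc = sum(v * v for v in c)
    sac = sum(u * v for u, v in zip(a, c))
    sat = sum(u * t for u, t in zip(a, target))
    sct = sum(v * t for v, t in zip(c, target))
    det = saa * scc - sac * sac
    b1 = (sat * scc - sct * sac) / det
    b2 = (sct * saa - sat * sac) / det
    rss = sum((t - b1 * u - b2 * v) ** 2 for t, u, v in zip(target, a, c))
    tss = sum(t * t for t in target)
    return 1.0 - rss / tss

step = 0.5                                   # assumed sampling interval
times = [step * k for k in range(400)]
x = [math.sin(t) for t in times]
y = [math.cos(t) for t in times]

# Predict the series at time t from the other series at lags 1 and 2.
r2_x_to_y = fit_r2(y[2:], x[1:-1], x[:-2])
r2_y_to_x = fit_r2(x[2:], y[1:-1], y[:-2])

print(round(r2_x_to_y, 6), round(r2_y_to_x, 6))
```

Both fits are essentially perfect although neither series acts on the other, which illustrates why outcome 4 is not sufficient to conclude that a feedback loop is present.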

2.3 Qualitative model validation and falsification

By verifying causal relationships, we only see one side of the coin when, in fact, we need to see both sides in order to show that the coin is genuine. That is to say, a verifying modeling procedure must be supplemented by a falsifying one. It was suggested by Van der Zouwen and Van Dijkum (2001: 237) that the formulation of potential falsifiers in recursive models can be based on qualitative mathematical knowledge:

A qualitative strategy may result in the identification of nullclines and equilibria which put constraints on the set of (differential) equations and make it possible that whole sets of models are falsified and others verified.

Qualitative mathematical knowledge is equally useful in the validation of relatively simple constituent non-recursive models, that is, causal chains that are parts of the fully specified model.

For example, we may establish that X indeed precedes Y (and not vice versa) by placing constraints on the pattern of minima and maxima of the time series (see also Van Dijkum et al. 2001). Formally, we may require that first (and higher order) derivatives of the cause predict, with a given time lag, first (and higher order) derivatives of the effect:

$$ \frac{\partial}{\partial t}\mathrm{X} \to_{\mathrm{g}} \frac{\partial}{\partial t}\mathrm{Y}. $$

Notably, if, in a given time span, maxima (minima) of X are not consistently followed (let alone preceded) by maxima (minima) of Y, then the hypothesis that X is a precursor for Y may be considered falsified. This, in turn, requires a time span of measurements where changes of the gradients must be detected for both X and Y. Furthermore, the time series of derivatives must be computed independently for X and Y (artifacts are produced if they are derived from differentiation of the autoregressive models used for establishing G-causality!), for instance, by the use of splines (see, e.g., de Boor 1978).
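The falsifier described here can be sketched as a simple check on the ordering of maxima. For brevity, the example below locates maxima with finite differences on noise-free synthetic series instead of the spline-based derivatives mentioned above; that substitution, the maximum admissible lag of 8 steps, and the function names are assumptions made for illustration only.

```python
import math

def local_maxima(series):
    """Indices at which the series rises and then stops rising."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] >= series[i + 1]]

def maxima_follow(cause, effect, max_lag):
    """True if every usable maximum of `cause` is followed by a maximum of
    `effect` within 1..max_lag steps; False counts as a falsification."""
    peaks_cause = local_maxima(cause)
    peaks_effect = local_maxima(effect)
    # Skip maxima so close to the end that the effect could not yet be observed.
    usable = [p for p in peaks_cause if p + max_lag < len(effect)]
    return all(any(0 < q - p <= max_lag for q in peaks_effect) for p in usable)

# Synthetic series in which y repeats x with a delay of 5 steps.
n = 200
x = [math.sin(0.2 * k) for k in range(n)]
y = [math.sin(0.2 * (k - 5)) for k in range(n)]

x_precedes_y = maxima_follow(x, y, max_lag=8)   # regularity holds
y_precedes_x = maxima_follow(y, x, max_lag=8)   # regularity is violated

print(x_precedes_y, y_precedes_x)
```

With real, noisy measurements the maxima would have to be derived from smoothed series, and the admissible lag would have to be justified theoretically before the data are inspected; otherwise the check loses its value as a falsifier.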

Obviously, the qualitative strategy of model validation requires a theory about causal relationships that can be expressed in a mathematical model. It also requires data of substantial quality. For instance, if the AGW hypothesis is to be confirmed, data are required that quantify the amount of human-produced CO2 as separated from the amount produced by other sources (e.g., volcanoes) and as separated from the amount of CO2 dissipated in oceans, for a time span that covers theoretically justified time lags (perchance in the order of centuries). An “aggregate” causal variable that combines radiative forcing of greenhouse gases, anthropogenic sulfur emissions, and radiative forcing of solar irradiance cannot provide “direct evidence that, since 1870, human activity is largely responsible for the increase in global surface temperature” (cfr. Kauffmann et al. 2006: 225, 250). Of course, evidence that the simple non-recursive causal chain is valid (conceptual hypothesis) mounts with increasing numbers of maxima and minima that follow the regularity constraint (operational hypothesis). However, the precise point at which evidence changes into proof remains a subjective decision based on consensus among scientists.

3 Concluding remarks

The fact that a true experiment (one including control groups) is not feasible in the case of studying causality relating to warming on Planet Earth does not mean that the weaknesses of correlational research do not apply in climate research. A well-known weakness of such research designs is that absence of proof (of spurious relationships) does not imply proof of absence (of spurious relationships). In order to scientifically corroborate the AGW hypothesis, the present focus on verification of the AGW hypothesis should shift towards a focus on its falsification. A potential falsifier is when empirical data fail to show that maxima (minima) of the cause (e.g., human-produced CO2) produce, at a specified time lag, maxima (minima) of the effect variable (e.g., global temperature). The latter requirement necessitates the availability of time spans of data that are large enough to display changes in the gradients of both cause and effect variables, and the application of models that allow for inference on derivatives.

As said, the challenge in corroborating any causal hypothesis is to determine what kinds of evidence constitute actual proof of the hypothesis. The field of verification and falsification of recursive spatial–temporal causality is underdeveloped and merits future research. Meanwhile, consensus among scientists will continue to play a major role in deciding when empirical evidence suffices as proof of a causal hypothesis. However, in adopting Swanborn's (1996) “regulative idea of striving after truth by consensus within the scientific community over research results”, we are always in danger of replacing the purpose of science (knowledge) with a by-product of science (consensus in the form of “common sense”). Karl Popper's requirement of a sound scientific theory, that it should produce counter-examples that falsify its validity, serves as a first line of defense against this danger. Of course, failure to find falsifying evidence in empirical data will render the AGW hypothesis much stronger.