1 Laws and Causation

Many scientists and philosophers have assumed that general causal relations are expressed by causal laws. That is wrong, as Bertrand Russell was the first to realise:

‘The law of causality, I believe, like much that passes muster among philosophers, is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm.’ (Russell, 1913).

It is not obvious what Russell meant by ‘the law of causality’, although it had been a standard phrase for a long time in philosophy. In any case, we take it that he held that there are no such things as causal laws in science. If so, he was basically right. Physics and chemistry do not contain anything that rightly could be called a ‘causal law’ and it is doubtful if there are any laws whatsoever outside physics and chemistry.

But we use laws (or weaker connections, regularities, to be discussed in Sect. 6.3) when making inferences about cause-effect relations. Physical and chemical laws connect variables to each other, or state invariance principles, and these laws are used in derivations.

The point is that several criteria must be satisfied for establishing a general cause-effect relation. The mathematical connection between the variables is only one of these conditions. A law, expressed as an equation, (or a regularity expressed as an equation including a random variable) is a necessary but not sufficient condition for there being a causal relation between the variables. In order to see this more clearly we may consider an episode in the history of science, Boyle’s discovery of the law named after him.

Robert Boyle (1627–1691) studied the connection between pressure and volume of gases using a J-shaped tube filled with mercury, as in Fig. 6.1. By adding mercury to the open end of the tube he changed the pressure on the air in the closed end of the tube. The volume of the air was measured by the scale on the left leg of the tube. He found that the product of pressure and volume is constant, abbreviated as \(pV= constant\), which is Boyle’s law.

Fig. 6.1
A schematic of a J-tube apparatus, showing the initial mercury level (29 11/16 inches in the shorter, closed leg with the scale) and the mercury column increased by adding mercury to the open end of the tube.

Boyle’s experiment, figure adapted from https://chemed.chem.purdue.edu/genchem/topicreview/bp/ch4/gaslaws3.html

This simple mathematical relation between two quantities does not say which is the cause and which is the effect. That distinction is made only by identifying what is manipulated in a particular concrete case. In the experiments reported in (Boyle, 1662), Boyle changed the pressure by increasing the amount of mercury and passively observed the volume of the air; hence the pressure change is the cause and the volume change is the effect in this experiment. In another experiment it could be the converse.

It is obvious from the mere form of Boyle’s law that it doesn’t say anything about cause and effect. But the law is needed for establishing cause-effect relations involving changes of pressure and volume of gases.

This is a general trait of laws; mathematical relations between quantities only tell us that a change in any of the variables logically entails changes of at least one of the other variables. But mathematical-logical relations are not the same as causal relations.
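The symmetry of Boyle’s law can be made vivid in a small sketch (the value of the constant is our own illustrative choice): the equation licenses computing either variable from the other, so the causal direction must come from knowing what was manipulated.

```python
# Boyle's law, pV = constant, relates the two quantities symmetrically;
# the equation itself does not say which variable is the cause.
C = 100.0  # illustrative value of the constant (arbitrary units)

def volume_from_pressure(p):
    """Treat pressure as the manipulated variable (Boyle's experiment)."""
    return C / p

def pressure_from_volume(v):
    """Treat volume as the manipulated variable (the converse experiment)."""
    return C / v

# Both directions are equally licensed by the law:
assert volume_from_pressure(4.0) == 25.0
assert pressure_from_volume(25.0) == 4.0
```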

2 Laws, Regularities and Ceteris Paribus Clauses

2.1 The Form of Laws

Boyle’s law is one among a number of laws in physics and chemistry. These laws have the common feature of being general statements relating a number of quantities to each other, see (Johansson, 2019). But the generality is never explicit; usually only the numerical relations between quantities are explicit in law statements, while it is tacitly understood that these quantities are attributes of real objects of some kind. Here are some examples of physical laws expressing relations between quantitative variables:

  1.

    Newton’s second law:

    $$\displaystyle \begin{aligned} f=ma \end{aligned} $$
    (6.1)
  2.

    Coulomb’s law:

    $$\displaystyle \begin{aligned} f=k\frac{q_1 q_2}{r^2} \end{aligned} $$
    (6.2)
  3.

    Maxwell’s equations:

    $$\displaystyle \begin{aligned} \triangledown \cdot \mathbf{E} =\frac{\rho}{ \epsilon} {} \end{aligned} $$
    (6.3)
    $$\displaystyle \begin{aligned} \triangledown \cdot \mathbf{B} = 0 {} \end{aligned} $$
    (6.4)
    $$\displaystyle \begin{aligned} \triangledown \times \mathbf{E} = -\frac{\partial \mathbf{B}} {\partial t} {} \end{aligned} $$
    (6.5)
    $$\displaystyle \begin{aligned} \triangledown \times \mathbf{B} = \frac{4 \pi k}{c^2} \mathbf{J} + \frac{1}{c^2}\frac{\partial \mathbf{E}}{\partial t} {} \end{aligned} $$
    (6.6)

These equations, stating relations between quantities, are to be understood as abbreviations for full law statements. In any particular application the quantities are attributed to physical objects, i.e., bodies or fields. So the full verbal formulation of a physical law always contains a generalisation over all objects that can be attributed such quantities. For example, Newton’s second law is the following more complete statement:

  • Newton’s second law: For any body with mass m, acceleration a and upon which a total force f acts, it holds that \(f=ma\).

Thus, the logical form of scientific laws is that of universally generalised conditionals, UGCs for short. They do not tell us anything about causal relations; they merely inform us about numerical relations between some quantities attributed to a set of objects. They are general statements since they are true of all objects in a domain.

UGCs: Universally Generalised Conditionals

A conditional is a sentence of the form ‘if A, then B’, where A and B are complete sentences. (The word ‘then’ is often omitted.)

Example: If it is raining, then the ground is wet.

A universally generalised conditional is a sentence of the form

‘For all x, if x is A, then x is B’.

Example: For all x: if x is a human, x has a heart.

In logic, the expression ‘For all’ is called ‘the universal quantifier’; hence a sentence of the form shown above may be called a universally generalised conditional.

A conditional is true if and only if either the antecedent is false or the consequent is true. The conditional doesn’t say anything about what makes it true; whether it follows from mathematical or logical axioms, or if the meaning of B is included in the meaning of A, or whether A causes B. The same goes for UGCs.
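Over a finite domain, the truth conditions of a UGC can be checked mechanically, which makes the truth-functional reading of the conditional concrete. The domain and predicates below are toy assumptions of our own, not from the text:

```python
# 'For all x: if A(x) then B(x)' over a finite toy domain.
domain = [2, 3, 4, 6, 8]

def is_even(x):
    return x % 2 == 0

def has_small_prime_factor(x):
    return any(x % p == 0 for p in (2, 3, 5))

# A conditional is true iff the antecedent is false or the consequent true;
# the UGC holds iff the conditional is true for every object in the domain.
ugc_holds = all((not is_even(x)) or has_small_prime_factor(x) for x in domain)
assert ugc_holds  # every even number in the domain has 2 as a factor
```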

One may now ask whether there are any tacit conditions behind such general statements. In the case of Boyle’s law, several researchers, the first being Amontons (1663–1705), discovered that the pressure of a gas depends on its temperature if the volume is constant. So a tacit assumption in Boyle’s law is that the temperature is constant. By combining these two relations and introducing the amount of matter, the number of moles, n, we arrive at the general law of gases, \(pV=nRT\).
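As a numerical check of the combined law (the input values are standard textbook figures), one mole of an ideal gas at 273.15 K in 22.4 litres exerts roughly atmospheric pressure:

```python
R = 8.314  # molar gas constant, J/(mol K)

def pressure(n, T, V):
    """Ideal gas law pV = nRT, solved for the pressure p (SI units)."""
    return n * R * T / V

p = pressure(n=1.0, T=273.15, V=0.0224)  # 1 mol at 0 deg C in 22.4 L
assert 1.00e5 < p < 1.03e5               # about 101 kPa, i.e. 1 atm
```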

Is this the final truth about gases? No. In extreme conditions, for example at very high pressure or very low temperature, one must take into account the finite volume of the molecules and the forces between them, which entails some adjustments, resulting in van der Waals’ equation.

2.2 Strict and Not-So-Strict Laws

The process of adjustments and improvements of laws has in some cases come to an end, or so we believe. When all tacit conditions have been made explicit in the antecedent of a law, we have arrived at a strict law. So we distinguish between strict and not-so-strict laws, the former being those where we believe no further adjustments are needed.

But there are a vast number of not-so-strict connections between variables in the sciences. That they are not strict means that there are unknown but relevant conditions that have not been incorporated in the antecedent of the law statement. These unspecified and unknown conditions are sometimes referred to by a ceteris paribus clause. (This Latin expression means ‘other things equal’.) The crucial thing is that we do not have complete knowledge of such conditions; for if we knew them, we could incorporate them all into the antecedent of a strict law, just as temperature was combined with Boyle’s law, resulting in the general law of gases. So by saying that an observed regularity obeys a certain law, ceteris paribus, we indicate that we recognise the possibility of refinements, or even radical changes, in the so far not-so-strict law.

We think it better to call such non-strict laws with unknown scope of validity ‘regularities’ instead. This is less committal; calling something a regularity leaves room for changes and/or restrictions of scope. Woodward (1997) suggests instead the label ‘restricted invariances’.

Instead of referring to ceteris paribus conditions, one may add a random variable, an error term, to an equation expressing a not so strict relation between the variables. (Talk about randomness is in most cases another way of saying that there are factors about which we at present lack information.)Footnote 1

3 Correlation, Regression and Causation

Strict laws, in the sense of UGCs without any ceteris paribus clause or random variable, are so far not found in any discipline outside physics and chemistry. In e.g., biology, ecology, sociology and economics only weaker connections, regularities, have been found. In statistical terms such regularities are described by two functions, correlation and regression.

A scattergram displays vividly the information contained in the coefficient of correlation and the slope of the regression line, see Fig. 6.2.

Fig. 6.2
A scatter and line graph plots data points in and around the increasing trend.

Correlation between Indicator of Quality of Student Achievement (IQSA) (x-axis) and economic growth (EG) (y-axis), adapted from (Burhan et al., 2023)

The coefficient of correlation tells us how strong the connection between the two variables is. If the correlation is \(-1\) or \(+1\), one can with certainty derive the value of variable Y from information about the value of variable X, or vice versa. If the correlation is zero there is no linear connection at all. In Fig. 6.2, where the correlation is rather strong (\(R=0.74\)), one can, for a chosen value of X, determine an interval for the corresponding Y, or vice versa. One may further conclude that there must be other variables affecting economic growth, although they contribute less than the quality of student achievement does.

The coefficient of correlation is a measure of the spread of data-points around the regression line. If all data points were on the regression line, the correlation would be 1 (or \(-\)1, if the slope is negative). If the data points are completely randomly spread over the entire area of the scattergram the correlation is 0.
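A minimal sketch of how the coefficient is computed from data (the textbook sample formula; the toy data points are our own):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# All points exactly on an increasing line: r = 1.
assert abs(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]) - 1.0) < 1e-12
# All points exactly on a decreasing line: r = -1.
assert abs(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]) + 1.0) < 1e-12
```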

This means that if we want to formulate this as a regularity, we should write something like

$$\displaystyle \begin{aligned} EG= constant \cdot IQSA\ + U, \end{aligned} $$
(6.7)

where U is a probability distribution function representing all other factors, known or unknown. It is obvious that if the random variation in U is large, the equation is not of much use. A well-known example from economics may be useful as further illustration.
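A regularity of the form (6.7) can be sketched with ordinary least squares on synthetic data. The numbers below are our own stand-ins, not the data of Burhan et al.; the ‘true’ coefficient 0.8 is an assumption of the simulation:

```python
import random

random.seed(0)

# Simulate EG = constant * IQSA + U, with U a random disturbance term.
iqsa = [x / 10 for x in range(10, 60)]
eg = [0.8 * x + random.gauss(0.0, 0.5) for x in iqsa]

# Ordinary least-squares estimates of slope and intercept.
n = len(iqsa)
mx, my = sum(iqsa) / n, sum(eg) / n
slope = sum((x - mx) * (y - my) for x, y in zip(iqsa, eg)) / \
        sum((x - mx) ** 2 for x in iqsa)
intercept = my - slope * mx

# The estimate recovers the simulated coefficient, up to the noise in U:
assert 0.6 < slope < 1.0
```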

A.W. Phillips published in 1958 a well-known paper (Phillips, 1958) which showed that inflation and unemployment are roughly inversely proportional. This result is called the ‘Phillips curve’, see Fig. 6.3, which is our own drawing.

Fig. 6.3
A scatter and line graph of inflation versus unemployment in percentage plots data points in and around the concave-up decreasing trend.

Inflation vs unemployment in the UK 1861–1913

Paul Samuelson integrated this result into economic theory, see Samuelson (1983). For some time economic researchers thought that this relation was close to a real economic law, tacitly assuming that inflation can be manipulated in order to decrease unemployment. Policy makers throughout the western world used this result for policy decisions; when governments wanted to decrease the unemployment rate they increased expenditure and budget deficits, calculating that this would increase inflation and decrease unemployment. However, after some years it was realised that it didn’t work as expected; often one got higher inflation without any decrease in unemployment. The conclusion was that the roughly inverse correlation had not been stable, hence some unknown ceteris paribus factor had changed. Here is a quote from a review report published by the Fed:Footnote 2

Federal Reserve Chair Jerome Powell has been asked about the Phillips curve, during his July 2019 testimony before Congress. He noted that the connection between economic slack and inflation was strong 50 years ago. However, he said that it has become “weaker and weaker and weaker to the point where it’s a faint heartbeat that you can hear now.”

In discussing why this weakening had occurred, he said, “One reason is just that inflation expectations are so settled, and that’s what we think drives inflation.” (Engemann, 2020)

Two things are pretty clear. The first is that since the data points are dispersed around the curve, there must be more factors than unemployment that determine inflation. This means that there is at most a probabilistic relation between unemployment and inflation. The second is that, since the connection has weakened over the years due to reduced inflation expectations, this factor, inflation expectations, was one of the unknown factors in the original study. Powell suggests that it is the main cause of inflation (he used the word ‘drives’). Hence, the Phillips curve cannot be used as a basis for political measures; it does not reflect a useful causal relation between high unemployment and low inflation.

A further obvious conclusion can be drawn: mere observational data, statistics, are not sufficient for inferring causal relations; one also needs other kinds of information. And since manipulability, or more generally, intervention, is strongly connected to causation, we need information from experiments, carefully designed interventions or natural experiments, in order to determine whether an observed correlation is a sign of a causal relation or not. The tools needed for such inferences are discussed in some detail in (Pearl, 2009). Here is a quote from this paper:

Remarkably, although much of the conceptual framework and algorithmic tools needed for tackling such problems are now well established, they are hardly known to researchers who could put them into practical use. The main reason is educational. Solving causal problems systematically requires certain extensions in the standard mathematical language of statistics, and these extensions are not generally emphasised in the mainstream literature and education. As a result, large segments of the statistical research community find it hard to appreciate and benefit from the many results that causal analysis has produced in the past two decades. These results rest on contemporary advances in four areas:

  1.

    Counterfactual analysis

  2.

    Nonparametric structural equations

  3.

    Graphical models

  4.

    Symbiosis between counterfactual and graphical methods.

(op.cit. pp. 97–98)

We have discussed counterfactual analysis in Chap. 4 (and suggested replacing counterfactuals with potential outcomes) and will bring up structural equations and graphical models in this chapter. The symbiosis of counterfactual and graphical methods will not be discussed in this book.

4 Correlations Between Boolean Variables

Boolean variables (after George Boole, 1815–1864) have only two possible values, such as true–false, yes–no, or 0–1. Boolean variables are common in the social sciences; they are used when organising data in two categories (male–female, college education–no college education, etc.). The measure of correlation between two Boolean variables is the \(\phi \) coefficient (‘mean square contingency coefficient’).

Suppose we have two variables, X and Y, each taking the values ‘0’ and ‘1’. If we have n observations, we display their distribution as follows:

            y = 0               y = 1               Total
x = 0       \(n_{00}\)          \(n_{01}\)          \(n_{0\bullet }\)
x = 1       \(n_{10}\)          \(n_{11}\)          \(n_{1\bullet }\)
Total       \(n_{\bullet 0}\)   \(n_{\bullet 1}\)   \(n\)

It is rather obvious that if \(n_{00}\) and \( n_{11}\) together make up all the observations, so that \(n_{01}\) and \( n_{10 }\) both are zero, we have a perfect correlation between X and Y. Likewise if the situation is completely reversed, all observations belonging to \(n_{01}\) or \(n_{10}\). Thus, the \(\phi \) coefficient is defined as:

$$\displaystyle \begin{aligned} \phi ={\frac {n_{11}n_{00}-n_{10}n_{01}}{\sqrt {n_{1\bullet }n_{0\bullet }n_{\bullet 0}n_{\bullet 1}}}} \end{aligned} $$
(6.8)

So, just as with the usual coefficient of correlation \(\rho \), the value is between \(-\)1 and 1, where the zero value means no correlation at all.
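Formula (6.8) is easy to compute directly; a small sketch with the boundary cases mentioned above (the counts are illustrative):

```python
from math import sqrt

def phi(n00, n01, n10, n11):
    """Mean square contingency coefficient for a 2x2 table of counts."""
    n0_, n1_ = n00 + n01, n10 + n11    # row totals
    n_0, n_1 = n00 + n10, n01 + n11    # column totals
    return (n11 * n00 - n10 * n01) / sqrt(n1_ * n0_ * n_0 * n_1)

assert phi(50, 0, 0, 50) == 1.0    # all observations on the diagonal
assert phi(0, 50, 50, 0) == -1.0   # completely reversed situation
assert phi(25, 25, 25, 25) == 0.0  # no association at all
```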

5 Directed Graphs and Structural Equations

The details of causal mechanisms may usefully be described using directed graphs and structural equations. Directed graphs visualise mechanisms and by using structural equations we can state quantitative relations between variables, i.e., we can give a measure of the strength of different causal connections.

5.1 Directed Graphs

Directed graphs are a conceptual and visual tool for displaying causal relations between, basically, values of variables. If we for example know that Z has two causes, X and Y, i.e., that the values of the variable Z are causally determined by the values of the variables X and Y, but not the other way round, we can visualise that with a directed graph of the form shown in Fig. 6.4. Figure 6.5 illustrates a situation where the variable Y has only one cause, the variable X, which in turn has only one cause, the intervention variable I. Directed graphs can be used to display rather complex structures, as is shown by e.g., Pearl (2000). Figure 6.6 is an example from his book (p. 215).

Fig. 6.4
A network. Nodes X and Y lead to node Z via mechanisms A and B.

The variables X and Y are each individually contributing causes of Z

Fig. 6.5
A network. Nodes I, X, and Y are serially connected.

The variable Y is directly caused by the variable X only, and the intervention I is the only direct cause of X. It means that the only way to change the value of X, and hence the value of Y, is to do something with I

Fig. 6.6
A network. Nodes U 1 and I lead to Q. Q leads to P via b 2 and P leads to Q via b 1. W and U 2 lead to P. I leads to Q via d 1 and W leads to P via d 2.

A diagram depicting the causal relations between price (P) and demand (Q) for a certain product. \(U_1\) and \(U_2\) are unknown external factors, I the household income and W the wage costs for producing the product, see (Pearl 2000, 215)

Assuming a free market economy, we can see that according to this model there are two ways to affect the price of a product: either to manipulate the wage costs for producing the product, or to manipulate the household income. If for example the household income is roughly the same during a certain period and the price has decreased, we may infer that the decrease was caused by a decrease in wage costs. As always, we infer a causal relation between two individual events using information about causal relations between variables.

5.2 Structural Equations

Let us not forget that a causal relation between two variables is based, in an ontological sense, on causal relations between individual values of these variables. Thus, the fact that the variable X is the only cause of Y means that the event that X has a certain value, say \(x_i\), is the cause of the event of Y having the corresponding value \(y_i\). From an epistemological point of view we go in the opposite direction: we first establish knowledge about causal relations between variables by performing experiments, which then enables us to infer a causal relation between a pair of individual events or states of affairs.

These relations can more precisely be represented by so called structural equations. The following equation represents the situation depicted in Fig. 6.4 (\(k_1\) and \(k_2\) are parameters giving the relative contributions from X and Y):

$$\displaystyle \begin{aligned} Z = k_1X + k_2 Y \end{aligned} $$
(6.9)

Using linear equations is no substantial restriction. If the relation between an observed cause X and an effect Z is non-linear, one can easily make a variable transformation \(X \to X^\prime \): \(X^\prime = X + a_1X^2 + \dots + a_nX^n\), so that Z is linearly dependent on \(X^\prime \). (All continuous functions, whatever their shape, can in any limited domain be approximated by functions of this type.)

Such equations differ from ordinary equations used in mathematical expositions of physics, economics and other ‘hard’ sciences in that the transformation rules of algebra are not valid in structural equations. The rule of interpretation for structural equations is that the left hand side represents the effect and the right hand side represents the cause or causes of this effect. This means that one cannot rewrite the equation by moving terms from the left hand side to the right hand side of ‘=’, or vice versa, as is legitimate when using ordinary equations in derivations.

Therefore, using the identity sign, ‘=’, in structural equations is not appropriate; it would be a better idea, and in fact necessary, to use an asymmetric sign, for example ‘=:’ instead,Footnote 3 see (Pearl, 2000, 138) thus writing the equation above as:

$$\displaystyle \begin{aligned} Z =: k_1X + k_2 Y \end{aligned} $$
(6.10)

Here the sign ‘=:’ is to be read as ‘… is caused by …’, and the entire equation means ‘The values of the variable Z are caused by the values of X and Y according to the weight factors \(k_1\) and \(k_2\)’. A situation depicted as in Fig. 6.5 can be given as a system of equations:

$$\displaystyle \begin{aligned} \left\{\begin{array}{l} Y=: k_1X \\ X=: k_2I \end{array}\right. \end{aligned}$$

and the relations depicted in Fig. 6.6 are given as

$$\displaystyle \begin{aligned} \left\{\begin{array}{l} Q=: b_1P+d_1I+U_1 \\ P=: b_2Q+d_2W+U_2 \end{array}\right. \end{aligned}$$

(So there is a feedback mechanism here; see Sect. 5.4 and the discussion in Sect. 8.5.2.) It is obvious how to extend this to more factors and more steps.
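The asymmetric reading of structural equations can be sketched in code: an equation like (6.10) is a one-way assignment, evaluated only from causes to effect (the weight factors below are illustrative choices of our own, not from the text).

```python
# Structural equations as one-way assignments: the effect is computed
# from its causes, never the other way round.
k1, k2 = 2.0, 3.0  # illustrative weight factors

def z(x, y):
    """Z =: k1*X + k2*Y, read 'the value of Z is caused by X and Y'."""
    return k1 * x + k2 * y

# An intervention sets a cause directly; the change propagates downstream:
z_before = z(x=1.0, y=1.0)
z_after = z(x=2.0, y=1.0)   # intervene on X, hold Y fixed
assert z_after - z_before == k1 * 1.0
```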

Each step in such a chain of causal relations may be realised by different kinds of links. Such a chain of causes is a causal mechanism, and providing the mechanism connecting a cause and its final effect is a common way to respond to quests for a causal explanation, to be further discussed in Chap. 8.

5.3 Bayesian Networks

In Fig. 6.6 the two arrows between price (P) and demand (Q) go in opposite directions. This represents a mutual dependency between these variables, see the last equation system in the previous subsection. Is this mutual dependency due to causal mechanisms or not? In economics it is assumed, we believe, that there is a feedback loop here, meaning that the value of e.g., the variable Q at a certain time \( t_1\) is causally dependent on the value of P at an earlier time, and a P-value at some time \(t_2\) depends on earlier Q-values.

If one knows, or has good reason to believe, that there are no feedback loops in a system, one may use Bayesian Networks, see e.g., (Pearl, 2000, Sect. 1.2), for modelling causal relations.

A Bayesian Network has two components, a Directed Acyclic Graph (DAG for short), and a set of conditional probabilities, one for each arrow in the graph. Figure 6.7 is a DAG and since there are five arrows, each representing a conditional probability for the connected variables, one needs information about five conditional probability distributions. For example, the left-most arrow connecting the sun activity and earth’s temperature represents a conditional probability of the form \(prob( T=t_1 | S=s_1) = p\), where T is the earth’s temperature, S is the sun activity (in some measure) and p is the probability.

Fig. 6.7
A network. Node sun activity leads to T. Nodes oil or coal burning and the number of cows lead to T via the concentrations of \(CO_2\) and \(CH_4\).

A DAG depicting causal links from sun’s activity, oil/coal burning and number of cows to the temperature in the atmosphere, T, via carbon dioxide and methane concentration. Observe that the three causal links are depicted as independent of each other, i.e., that the modularity condition is satisfied

One should keep in mind Cartwright’s ‘no causes in, no causes out’, (Cartwright, 1989). In other words, without causal assumptions as input in the construction of the network, one cannot draw any conclusions about causal relations from the network itself; it merely depicts statistical relations. (See further discussions about statistics and causation in Sect. 7.1.) But with input about causal relations, Bayesian Networks are useful tools for understanding causal structures and for making calculations.

When drawing the DAG one should ask oneself whether there are any causal interferences between different causal chains. In Fig. 6.7 there is no arrow between the concentration of \(CO_2\) and of \(CH_4\). The fact that no such arrow is drawn is a visualisation of the input, assumed to be correct, that there is no causal link between them. So when constructing the DAG, one needs to know whether there is any such link. The lack of causal couplings between different causal chains is in the literature called modularity, which is defined as follows:

  • Modularity: If \(X_i\) does not cause \(X_j\), then the probability distribution of \(X_j\) is unchanged when there is an intervention with respect to \(X_i\).

This is related to the Causal Markov Condition, CM: (V is the set of variables in a Bayesian Network):

  • CM: For all \(X_i , X_j, i\neq j\) in V, if \(X_i\) does not cause \(X_j\), then \(X_i\) and \(X_ j\) are probabilistically independent conditional on the set of parents, \(pa_i\), of \(X_i \).

These two conditions are related, since given a set of extra assumptions one can derive CM from Modularity, see (Hausman and Woodward, 2004). Thus it is possible to perform a statistical test for the assumption that there is no causal link from \(X_i\) to \(X_j\).

It may be observed that in constructing the figure we have taken for granted some causal relations, e.g. that cows produce great quantities of methane and this gas increases the temperature on Earth.

Why then is this network called ‘Bayesian’? Because we use Bayes’ theorem for updating the probabilities when new information is available.
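A minimal sketch of such an update (the numbers are invented for illustration and are not estimates of real climate probabilities): suppose a prior probability for elevated \(CO_2\) and likelihoods for observing a high temperature.

```python
def bayes_update(prior, lik_h, lik_not_h):
    """Posterior P(H | E) by Bayes' theorem:
    P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|not H)P(not H))."""
    num = lik_h * prior
    return num / (num + lik_not_h * (1.0 - prior))

# Invented numbers: prior P(elevated CO2) = 0.3,
# P(high T | elevated CO2) = 0.9, P(high T | normal CO2) = 0.2.
posterior = bayes_update(prior=0.3, lik_h=0.9, lik_not_h=0.2)
assert 0.65 < posterior < 0.67  # observing high T raises 0.3 to about 0.66
```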

(Barbrook-Johnson and Penn, 2022) contains a useful description of Bayesian Networks (in that book called ‘Bayesian Belief Networks’). It contains a list of software packages that may be used in constructing Bayesian Networks.

The authors rightly stress that the conditional probabilities connecting the nodes in the Network must be based on causal information, not mere observed statistics. Such information typically comes from stakeholders, whose importance the authors stress:

We must encourage users to acknowledge that BBNs are always dependent on stakeholder opinion (unless developed based solely on data) and that removing outputs from that context, and not making clear either the process, or the network (i.e. the model), from which they are derived almost always dooms us to see them misinterpreted. Even in cases where outputs are not misused or misunderstood, the appeal of the diagram of a BBN with conditional probabilities annotated can also lead many to view BBN and its associated analysis as a product, rather than a process. Not recognising the value in the process of using this method is to ignore at best half its value, at worst, all its value. (op.cit., p. 107)

6 Non-linear Dynamics

When studying the associationFootnote 4 between two variables, the first step is to see how well the data points fit a linear regression line of the form \(Y=a +bX\). The calculation of such a linear regression can be done using any statistical package.

When looking at a graph one sometimes gets the impression that a non-linear equation would fit the data points better. (A more reliable method is to use a statistical package by which one can calculate the best fit of the data points to different functions.) So one may repeat the procedure with e.g., an equation of the form \(Y= a +bX +cX^2\), or, as was the case with the Phillips curve, a function of the form \(Y= a +bX^{-1}\). And one can go further and use non-linear equations of higher and higher degrees as mathematical models of the observations. (Feedback loops, see Sect. 5.4, are one mechanism that may generate non-linear dynamics.)
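A function of the Phillips-curve form reduces to a linear fit after the transformation \(X^\prime = X^{-1}\); a sketch on synthetic, noise-free data (the coefficients a = 1 and b = 6 are our own illustrative choices, not Phillips’s estimates):

```python
# Fit Y = a + b * X^(-1) by transforming the regressor and running OLS.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0 + 6.0 / x for x in xs]          # exact data from Y = 1 + 6/X

xt = [1.0 / x for x in xs]                # transformed regressor X' = 1/X
n = len(xt)
mx, my = sum(xt) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xt, ys)) / \
    sum((x - mx) ** 2 for x in xt)
a = my - b * mx

# On noise-free data OLS recovers the coefficients exactly:
assert abs(b - 6.0) < 1e-9 and abs(a - 1.0) < 1e-9
```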

So far, all equations discussed are continuous, and one may wonder whether discontinuous changes are possible in the real world. Well, the question is not whether there really are discontinuous changes in reality, but whether there are state changes so fast that a discontinuous function is a good representation. If for example one can measure a dependent variable at most once a day, it may one day change so abruptly that a step function is a good description of its state evolution. One might assume that the variable had intermediate values in between the two measurements, but for predictive purposes it doesn’t matter.

It is important to keep in mind that even if observational data fit a non-linear equation quite well, this fact in itself does not allow us to infer that the independent variable is a cause of the dependent variable. Just as is the case when a linear equation is a good fit, there may be common causes that produce the mathematical relation. The Phillips curve is a fine illustration; it is a non-linear equation, but, as we saw, there is virtually no causal connection between inflation and unemployment.

6.1 Predictions and Non-linear Dynamics

Non-linear evolution often surprises us, because we have a natural tendency to begin with the simplest hypothesis, a linear function, when investigating the relation between two variables. Consider, as an example, a simple physical experiment often made in physics courses in secondary school. The pupils are given a resistor, a current source, a current meter and a voltage meter. They are instructed to determine the resistance of the resistor by making series of measurements of voltage and current. A typical outcome could be something like this:

Voltage (V)   Current (mA)
1.0            3.3
2.0            6.4
3.0            8.1
4.0           12.3
5.0           14.4

A graph of these results strongly suggests that the current is a linear function of voltage (Fig. 6.8).

Fig. 6.8
A scatter and line graph of current in milliamperes versus voltage in volts plots a slant-increasing line for R = 330 ohms, with data points in and around the line.

Measurements of voltage and current in a resistor

In other words, one feels justified in concluding that the resistor has a constant resistance (R = U/I) of \(R\approx 330\, \Omega \). However, if one continues measuring voltage and current, this inference may be proven wrong. For one common type of resistor one would get something like the following data:

Voltage (V)   Current (mA)
1.0            3.3
2.0            6.4
3.0            8.0
4.0           12.3
5.0           14.4
6.0           16.5
7.0           19
8.0           21
9.0           23
10.0          24
11.0          24.5

We see that the resistance increases as the voltage increases. Higher voltage leads to higher current, which results in warming, which leads to higher resistance. It is well known both from experiments and theory that the power dissipation in many materials, as measured by warming, is proportional to the square of the current; hence there is no linear relation between current and voltage, except as an approximation at low voltage (Fig. 6.9).

Fig. 6.9 Extended measurements of voltage and current in a resistor

This is just a very simple example of a non-linear response, where one can find a non-linear equation that fits any number of experimentally obtained data points well. If this non-linear but continuous function can be guessed, or derived from theory, one still has an explanation and can make good predictions.
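As a sketch, one can reproduce the extrapolation failure described above with the measured values copied from the two tables: fitting Ohm's law (a straight line through the origin) to the five low-voltage points gives a good local fit, but the linear extrapolation clearly overshoots the measured current at 11 V.

```python
# Fit Ohm's law (a straight line through the origin) to the five low-voltage
# measurements, then compare the linear extrapolation at 11 V with the
# measured value from the extended table.
low_v = [1.0, 2.0, 3.0, 4.0, 5.0]      # voltage (V)
low_i = [3.3, 6.4, 8.1, 12.3, 14.4]    # current (mA)

# Least-squares slope through the origin, in mA/V.
slope = sum(u * i for u, i in zip(low_v, low_i)) / sum(u * u for u in low_v)
resistance = 1000.0 / slope            # in ohms; close to the estimate above

predicted_at_11v = slope * 11.0        # linear extrapolation to 11 V
observed_at_11v = 24.5                 # measured current at 11 V (mA)
print(f"R = {resistance:.0f} ohms; at 11 V: predicted {predicted_at_11v:.1f} mA, "
      f"observed {observed_at_11v} mA")
```

The linear model predicts roughly 32 mA at 11 V, against the 24.5 mA actually measured: a good fit on a small sample says nothing about behaviour outside the observed range.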

In complex social-ecological systems there are many mechanisms, most of which are poorly understood. There is seldom any possibility of performing controlled experiments, and data points are sparse. Given a small sample of data points which, just as in this simple example, suggests a linear relation between two variables, one naturally extrapolates this linearity to non-observed situations. But the extrapolation may prove wrong: the relation between the variables was not linear after all. In short, failed predictions are very often attributed to a non-linear and unforeseen connection between the predictor and the response variable.

7 Causation, Manipulation and Intervention

Our use of causal notions is basically connected to our interest in performing beneficial actions: we want to improve our conditions in all possible ways. Thus, several philosophers have suggested defining causation in terms of manipulation: the cause-effect relation is the relation between an action and its outcome. Critics have objected that this is circular, since ‘manipulation’ itself expresses a causal notion. Menzies and Price (1993) countered this argument by pointing out that we have direct experience of ourselves acting as agents:

The basic premise is that from an early age, we all have direct experience of acting as agents. That is, we have direct experience not merely of the Humean succession of events in the external world, but of a very special class of such successions: those in which the earlier event is an action of our own, performed in circumstances in which we both desire the later event, and believe that it is more probable, given the act in question, than it would be otherwise. To put it more simply, we all have direct personal experience of doing one thing and thence achieving another. We might say that the notion of causation thus arises, not as Hume has it, from our experience of mere succession; but rather from our experience of success; success in the ordinary business of achieving our ends by acting one way rather than another. (Menzies and Price, 1993, 194)

The point is that the meaning of the term ‘cause’ and its synonyms is determined by its use in direct linguistic interactions between people in concrete circumstances.Footnote 5 This is true not only of ‘manipulation’, but also of many other terms with a clear causal sense, as thoroughly discussed in Chaps. 2 and 3. So we hold Menzies and Price’s defence to be valid, and it fits nicely with our observations in those chapters.

But, as Pearl observed (see the quotation in Chap. 2), we have long since extended our use of causal notions to cover phenomena beyond the scope of any possible human action. For example, the tides are caused by the motions of the sun and the moon, but we certainly cannot manipulate the motions of these celestial objects. This example indicates that we have generalised the concept of cause from covering merely human manipulations and their effects to a broader class of events. What, then, is the implicit idea behind this particular generalisation?

Extending the scope of a concept is always based on perceived similarities between old and new cases. In the case of extending the causal relation to the connection between the motions of the moon and the sun and the tides, it is the physical link that forms the basis.

The starting point is the application of the cause-effect relation to collisions between two bodies: the impact is the cause and the change of motion of the second body is the effect. Such events have functioned as paradigmatic examples of cause-effect relations since the scientific revolution.

After Newton’s Principia we further learnt that physical interactions may obtain at a distance, transmitted by gravitational, electromagnetic and other fields. So when gravitation theory could be used to derive the tides, using the motions of the sun and the moon as input, this interaction was naturally classified as an instance of causation.

The Causal Link Between the Tides and the Motions of the Sun and the Moon

Tables of the tides in English ports were published in 1555,Footnote 6 if not earlier. The tables were calculated from the motions of the sun and the moon, which had long been predictable, and the correlations between the tides and the positions of the moon and the sun were known. But it was not known how the motion of these celestial bodies could cause the tides. Explaining this was one of Newton’s achievements. In his Principia (published in 1687) he showed that the law of gravitation, applied to the water in the seas, the moon and the sun, explains the tides. In other words, he showed that there is a physical link, a force, connecting these celestial bodies and the water in the seas. That was obviously sufficient for classifying this interaction as a cause-effect relation.Footnote 7 This is an example of how forces in general came to be conceived as mediators of causal effects.Footnote 8

We have already discussed another important extension of the idea of causation as manipulation, namely so-called natural experiments. In such experiments no intentional manipulation of a variable is made. Some authors have thus introduced the term ‘intervention’ as a substitute for, or rather an extension of, ‘manipulation’ when talking about natural experiments. An intervention is a change of a variable that need not be the result of an intentional action by an agent.

Those who first introduced the concept of intervention had a rather restricted notion in mind, but it was soon extended. Here is Woodward’s description of this evolution:

Another important extension of interventionist ideas, also with a focus on inference but containing conceptual innovations as well, is due to Eberhardt (2007) and Eberhardt and Scheines (2007).

These authors generalise the notion of intervention in two ways. First, they consider interventions that do not deterministically fix the value of variable(s) intervened on but rather merely impose a probability distribution on those variables. Second, they explore the use of what have come to be called “soft” interventions. These are interventions that unlike the fully surgical (“hard”) interventions considered above (both Pearl’s setting interventions and the notion associated with M1–M4), do not completely break the previously existing relationships between the variable X intervened on and its causes C, but rather supply an exogenous source \(I\) of variation to X that leaves its relations to C intact but where I is uncorrelated with C.

Certain experiments are naturally modelled in this way. For example, in an experiment in which subjects are randomly given various amounts of additional income (besides whatever income they have from other sources) this additional income functions as a soft, rather than a hard intervention. Soft interventions may be possible in practice or in principle in certain situations in which hard interventions are not. Eberhardt (2007) and Eberhardt and Scheines (2007) explore what can be learned from various combinations of soft and hard, indeterministic and deterministic interventions together with non-experimental data in various contexts. Unsurprisingly each kind of intervention and associated data have both advantages and limitations from the point of view of inference. (Woodward, 2016)

The list M1–M4 of conditions for interventions referred to above is as follows (I = Intervention):

  (M1) I must be the only cause of X; i.e., the intervention must completely disrupt the causal relationship between X and its previous causes so that the value of X is set entirely by I.

  (M2) I must not directly cause Y via a route that does not go through X.

  (M3) I should not itself be caused by any cause that affects Y via a route that does not go through X.

  (M4) I leaves the values taken by any causes of Y except those that are on the directed path from I to X to Y (should this exist) unchanged.
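The contrast between hard and soft interventions can be sketched in a small simulation. The variable names C, X, Y follow the Woodward quotation above, but the coefficients and noise levels are illustrative assumptions, not taken from the text.

```python
import random

# C is a cause of X that also affects Y directly (a confounder).
# A hard intervention sets X outright, cutting the C -> X link (condition M1);
# a soft intervention adds exogenous variation I while leaving C -> X intact.
random.seed(0)

def sample(mode):
    C = random.gauss(0, 1)
    if mode == "hard":
        X = 1.0                               # I alone sets X; C -> X is cut
    elif mode == "soft":
        X = 0.8 * C + random.gauss(0, 1)      # exogenous I added; C -> X intact
    else:                                     # plain observation
        X = 0.8 * C + random.gauss(0, 0.3)
    Y = 2.0 * X + 1.0 * C + random.gauss(0, 0.3)
    return C, X, Y

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

n = 5000
soft = [sample("soft") for _ in range(n)]
hard = [sample("hard") for _ in range(n)]
c_soft = corr([s[0] for s in soft], [s[1] for s in soft])
x_hard = [s[1] for s in hard]
print(f"Corr(C, X) under a soft intervention: {c_soft:.2f}")   # well above zero
print(f"Spread of X under a hard intervention: {max(x_hard) - min(x_hard)}")
```

Under the soft intervention, C and X remain correlated; under the hard intervention X does not vary with C at all, which is exactly the surgical break described by M1.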

Drawing conclusions about causal relations from statistical information is a central task in much of empirical science, and several books have been written about the topic. Some useful ones are Freedman et al. (2010), Hernán and Robins (2020), Imbens and Rubin (2015), and Illari et al. (2011).

8 Summary

A common view is that scientific laws express causal relations. This is wrong; scientific laws state numerical relations between quantities such as mass, energy, momentum etc., but they do not express any causal relations between these quantities. The distinction between cause and effect is based on which quantity we directly manipulate in a concrete situation. The change of a variable brought about by a certain manipulation is the cause, and the change of some other variable, which according to a scientific law must then change, is the effect. It follows that from a merely observed regression or correlation one cannot infer any causal relation. We need further information, telling us which interventions have been made, in order to draw any valid conclusion about a cause-effect relation.

Thus, mere statistical information, i.e. knowledge about correlations and regressions between two variables, is not sufficient evidence for a causal relation between them.

The argument applies also to non-quantitative variables. One can calculate statistical measures, such as Chi-square values for Boolean variables or for rank-ordered data, and estimate statistical dependencies. But the general lesson applies: statistical dependencies are insufficient for conclusions about causal relations.
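A small simulation, with illustrative probabilities, shows why such dependencies are insufficient: two Boolean variables with no causal link between them, both driven by a common cause, still yield a very large Chi-square value.

```python
import random

# A and B are Boolean variables with no causal connection between them; both
# are driven by a common cause Z. The probabilities are illustrative.
random.seed(1)
n = 10000
counts = [[0, 0], [0, 0]]          # 2x2 contingency table, counts[a][b]
for _ in range(n):
    z = random.random() < 0.5
    a = int(random.random() < (0.8 if z else 0.2))
    b = int(random.random() < (0.8 if z else 0.2))
    counts[a][b] += 1

# Pearson's Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
row = [sum(counts[i]) for i in (0, 1)]
col = [counts[0][j] + counts[1][j] for j in (0, 1)]
chi2 = sum((counts[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in (0, 1) for j in (0, 1))
print(f"Chi-square: {chi2:.0f}")   # far above the 3.84 threshold (df = 1, p = 0.05)
```

The statistic correctly reports a strong dependence between A and B, yet neither causes the other; only knowledge of the interventions, or of the common cause Z, resolves the matter.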

The available data may allow a generalisation in the form of an ordinary equation where one variable is a function of one or several others, but this is not sufficient for taking the independent variables in that equation to be the causes of the dependent variable. The basic reason is that an ordinary equation, i.e., an identity sign flanked by two mathematical expressions, is symmetric; there is no asymmetry in the identity sign ‘=’. But structural equations aim to distinguish between left and right: the left-hand side is taken to represent the effect and the right-hand side the total cause. Thus structural equations differ sharply from ordinary equations, which is a strong reason not to use the common identity sign ‘=’ in structural equations.
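The symmetry of the identity sign can be illustrated with a short simulation (the coefficients and noise level are illustrative): data generated with X as the cause of Y are fitted equally well by a regression in either direction, so the equation alone cannot reveal the causal direction.

```python
import random

# Generate data in which X is, by construction, the cause of Y.
random.seed(3)
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

def ols_slope(u, v):
    # Ordinary least-squares slope of the line v = a*u + b.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / sum((a - mu) ** 2 for a in u))

a_fwd = ols_slope(xs, ys)   # regress Y on X: recovers the generating slope 2
a_rev = ols_slope(ys, xs)   # regress X on Y: slope about 0.5, equally good fit
print(f"Y on X: {a_fwd:.2f};  X on Y: {a_rev:.2f}")
```

Both regressions fit the data excellently; nothing in the fitted equations singles out X as the cause. That information must come from knowing which variable was intervened on.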

A structural equation represents the asymmetry of cause and effect, and this asymmetry is postulated, hypothesised or empirically proven as the reason for formulating the structural equation.

Discussion Questions

  1. Why are experiments considered necessary for reliable inference to causal relations?

  2. Are there any scientific laws that clearly and explicitly express a causal relation?

  3. Are there any scientific laws that do not have the form of a universally generalised conditional?

  4. Quite often one observes two correlated variables. What is required for us to infer that this correlation is due to a causal relation between them?

  5. Similarly with regression: if Y = aX + b, under what conditions can one infer that X is a cause of Y?

  6. What is the difference between manipulation and intervention?