4.1 Causal Inference

4.1.1 Covariance and Causation

Theory building and data analyses based on three or more variables offer many possibilities for refinement and increased accuracy beyond what has been discussed in Chaps. 2 and 3. One of these involves “causal inference.”

We know that a correlation between two variables, even a strong and statistically significant correlation—a correlation that justifies risking a Type I error—does not provide evidence that the relationship between the two variables involves causality. The distinction between a correlation and a causal connection is sometimes illustrated by silly, but humorous, examples. Here is one that we heard in the U.S. a few years ago.

Popular folktales pretend that newborn babies are brought to waiting parents by a stork. The image of a baby in a blanket hanging from the stork’s beak is familiar, at least in the U.S. Of course, storks do not really deliver babies; but wait, it turns out that there is a strong and significant correlation across a sample of geographic localities between the presence of storks and a relatively high number of babies born each year.

Does this mean that we should not be so quick to dismiss the story in the folktale? Of course not. The correlation does not indicate a causal connection. It reflects the impact of a third variable, and that third variable is probably whether or not a locality is urban or rural. Birth rates are higher in localities that are more rural, and storks are more likely to be found in rural localities. Thus, whether or not there are both more babies and more storks, or fewer of each, depends on whether the locality is urban or rural. It is the impact of this third variable, rather than a causal relationship between the original two, that causes a measure of storks and a measure of babies to covary.

Well, maybe this folktale is not so humorous, after all! At least it is silly! There are many silly examples of things that covary but do not involve a causal relationship. Consider another example: wearing shorts and eating ice cream covary. Is it possible that there is something about wearing shorts that pushes a person dressed this way to eat ice cream, that wearing shorts causes a person to eat ice cream? The correlation, again, does not indicate a causal connection. It reflects, rather, the impact of a third variable, and that third variable might be the temperature outside a person’s residence. When it is hot, people are more likely both to wear shorts and to eat ice cream; and so it is, again, the impact of this third variable, rather than a causal relationship between wearing shorts and eating ice cream, that causes the two variables to covary.

These examples, silly as they are, remind us that phenomena that vary together may do so for reasons having nothing to do with the variance on one variable determining, that is to say causing, the variance on a second variable. An online search of “correlation causation” yields many accounts and examples, some humorous and some less so, of strong bivariate relationships that do not involve causation. Among the examples given in a series of lectures entitled Real Statistics: An Islamic Approach (Footnote 1) is the correlation between height and reading ability among school-aged children. An increase in height does not cause an increase in reading ability, as if books and other things to read were on the upper shelves of bookcases and could be reached only by taller individuals. Rather, as illustrated in Fig. 4.1, age is a confounding variable. As young people get older, their height increases, they go farther in school, and their reading skills improve.

Fig. 4.1 Impact of a Confounding Third Variable

Relationships that involve covariance, or association, but not causation are considered “spurious,” meaning that the two variables may appear to be causally related but in fact are not. Spuriousness may result from covariance that is coincidental (Footnote 2), or it may reflect the influence of a third variable that is connected to both of the two strongly correlated variables. The latter possibility explains the three previous examples, with the confounding third variable being, respectively, the urban-rural character of the locality, the outside temperature, and age. This phenomenon is also sometimes called “omitted variable bias.”
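The logic of a confounding third variable can be illustrated with a small simulation. The sketch below is hypothetical throughout: the numbers are invented, and the names age, height, and reading simply echo the example in Fig. 4.1. Height and reading scores are each generated from age plus random noise, with no link between them. The code then compares the raw correlation between height and reading with the correlation that remains once age has been removed from both, which is one simple way of holding a third variable constant.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """What is left of y after a bivariate least-squares fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

random.seed(1)
# Invented data: both outcomes depend on age only, never on each other.
age = [random.uniform(6, 16) for _ in range(2000)]
height = [80 + 6 * a + random.gauss(0, 5) for a in age]
reading = [10 + 5 * a + random.gauss(0, 8) for a in age]

raw_r = pearson(height, reading)
# Correlation between the parts of height and reading not explained by age.
partial_r = pearson(residuals(height, age), residuals(reading, age))

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation with age held constant: {partial_r:.2f}")
```

In a run of this sketch the raw correlation is strong, while the correlation with age held constant is close to zero, which is the pattern one would expect of a spurious relationship.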

Exercise 4.1

Can you think of another three-variable relationship in which two of the variables might appear to be causally related but are not because the third variable is a confounding variable?

  • What are the two variables that appear to be causally related? Which is the presumed cause, and which is the presumed effect?

  • Why do they appear to be causally related? Why might it be reasonable to think they are causally related?

  • What is the confounding variable? How is the confounding variable related to the other two variables in a way that makes them covary without being causally related?

How can a researcher determine, and offer evidence, that a relationship involves causality and is not spurious? This is among the most salient and frequently asked questions that a social science investigator needs to answer, and it is among the most challenging. Social science research designs that incorporate experiments offer one strategy for determining whether a relationship between two variables is causal, and then for offering evidence of causality should that be found. Experiments—natural experiments, field experiments, and survey experiments, among others—are frequently conducted by social scientists, and the use of experiments in social science research will be very briefly discussed later in this chapter. With the possible exception of social psychology, however, other data collection and/or data analysis procedures are much more common, usually because the conduct of an experiment is not possible or not appropriate given the topic and hypotheses being investigated.

This brings the focus of our discussion to multivariate analysis when seeking to infer causality, when testing hypotheses that posit a causal relationship between a dependent variable and an independent variable. And take note, the use of the term “infer” is deliberate. Causality is usually inferred, not demonstrated or proved, meaning that the investigator seeks to determine whether or not an observed relationship is probably, or very likely to be, causal. Described as “causal inference,” this involves elements both of theory and of research design. In quantitative studies, it will usually also involve multivariate statistical analysis. Analogous to the choice between Type I error and Type II error discussed in Chap. 3, the goal is to minimize the chance of error if causality is inferred, if an investigator concludes that a relationship is causal.

As discussed in the previous chapter, causal inference begins with the development of a causal story that is referenced by a testable hypothesis. Sources of the causal story, and therefore also the hypothesis, may include previous investigations by the researcher herself or other investigators, the researcher’s knowledge and personal experience relating to the subject of the causal story and hypothesis, and new insights drawn from reflections and theorizing that call upon what is sometimes described as the “sociological imagination.”

The important point here is that building a case for causal inference includes the delineation of a causal story that is at least very plausible and ideally very persuasive. While this might seem less rigorous than offering findings from a multivariate statistical test as evidence that a relationship is causal, a topic to which we turn next, it is actually no less important. In fact, the two must align; the causality attributed to a relationship whose statistical significance is confirmed must also make good sense. The fact that it does make good sense—that the causal story is coherent and persuasive, or at the very least plausible—is part of the case for causal inference that the researcher will need to build.

Against this background, we now turn to three interrelated considerations pertaining to causal inference. The first is the importance of a temporal sequence between the independent variable and the dependent variable. The second consideration is the use of multivariate statistical tests to derive probability values, which in turn give the researcher a basis for determining whether or not a hypothesized variable relationship that purports to be causal is very likely to be true—whether to risk a Type I error or a Type II error, in other words. The discussion focuses on multivariate regression, a widely used statistical technique that permits including and holding constant one or more control variables. Multivariate regression is the natural extension of bivariate regression, which was discussed in Chap. 3. The third consideration is a deeper look at control variables, including how they are defined and may be identified and why they are important. These three considerations are foundational elements of a convincing and robust causal story.

4.1.2 Temporal Sequence

Elements of research design that are relevant for causal inference include decisions related to the selection and measurement of key variables, beginning, of course, with the dependent variable and the independent variable. Further, there must be an appropriate temporal sequence between these two variables if causality is to be inferred. The cause, which is the independent variable, must in other words precede the effect, which is the dependent variable.

Sometimes a temporal sequence occurs naturally given the structure of the study or the nature of the variables, and in these instances, the investigator need not do anything to ensure a sequential ordering. For example, an individual-level hypothesis that posits level of education as a determinant of current attitudes toward government held by adults posits a relationship between two variables that are by their nature sequentially arranged.

The requirement of a temporal sequence imposed by a concern for causal inference will very often determine the structure of an investigator’s research design. Designs that incorporate some of the elements of an experiment, and that might, therefore, be described as quasi-experimental, constitute one possibility. For example, studies that seek to assess the impact and explanatory power of a particular action or event can measure the dependent variable at a time before the action or event and then measure it again at a time after the action or event. The difference between the two time-specific measures, hence the variance on the dependent variable, may be attributable to the impact of the intervening action or event, which in this instance is the independent variable, the presumed cause.

The variance on the dependent variable might also be attributable to other things that took place during the time between the two measures; and for this reason, other elements will need to be included in the analysis before a persuasive case for causal inference can be built. These elements, most notably the identification and inclusion of control variables, will be taken up later in this chapter. The point to be retained at present is that the existence of a temporal sequence, while not sufficient for advancing a claim of causality, is a necessary element of a research design concerned with a causal relationship.

It is not unusual to survey a country’s population before and after a significant event, a national election for example, and then consider whether the attitudes or behavior of that population have changed in ways that might have been caused by the election. If the surveys are probability-based and nationally representative, the same population, although possibly not all of the same individuals, will have been surveyed at two points in time. Country is the unit of analysis in such studies, as illustrated by the meta-analysis described below.

An interesting variation on this country-level “Before and After” research design is provided by a meta-analysis that seeks to assess whether and how the Arab Spring uprisings in Tunisia, beginning at the end of 2010, contributed to changing, and improving, the status of women. The specific two-stage causal story to be assessed posits greater social media freedom both as a determinant of reduced violence against women and as a consequence of political changes brought by the country’s Arab Spring experience. Several studies have suggested this causal story, or some variation of it.

A review of these studies prepared by Lilia Labidi, a prominent Tunisian social psychologist, looks at research projects undertaken both before and after Tunisia’s Arab Spring uprisings. These uprisings, frequently described as the “jasmine revolution,” brought the fall of the country’s authoritarian government and, most relevant for the hypothesis, the removal of Internet censorship and restrictions on access to social media. Labidi reports that a number of private television and radio stations were started, and social media opportunities multiplied, with one individual able to maintain several Facebook accounts; and she then gives examples of the ways that advocates of women’s rights and gender equality used the new media freedoms to advance their cause (Footnote 3).

One common criticism of a proposed causal story is the possibility that the direction of the causal relationship is actually reversed. With regard to the previous example, a critic might argue that increased support for women’s rights and gender equality pushed toward media reform, thus reversing the direction of the causality. But while worthy of consideration in the case of some hypothesized relationships, in this particular example Labidi calls attention to the temporal sequence, the before and after structure of the data. This helps to stave off criticism and strengthens her causal story.

Lagging independent variables, often referred to simply as lags, are a common way to ensure that a temporal sequence is built into the data and analysis used to test a hypothesized causal relationship. Lags provide the analytical structure, for example, in country-level studies in which both variables are time-specific, most often yearly, measures of aggregate national or societal performance or status. If, for example, a country-level study sought to test the hypothesis that Foreign Direct Investment (FDI) reduces a country’s level of unemployment, and if the study’s investigators had obtained or collected data on both variables for the 5 years between 2015 and 2020, the following are among the measures that might be used to test the hypothesis:

Dependent Variable

  • The dependent variable might be the difference between the percent unemployed in a given year and an earlier year

  • The specific year could be any in which the researcher has a particular interest or considers particularly important

  • The magnitude of the time between the 2 years will be specified by the researcher based on her knowledge of the data and the mechanisms of her causal story

  • The dependent variable might thus be the difference between unemployment in 2020 and 2019, or in 2020 and 2018, or even in 2020 and 2015, whichever best captures the variance for which the researcher seeks to account

Independent Variable

  • The measure of the independent variable might be the difference in FDI as a percentage of Gross National Product (GNP) between the earliest of the years on which the dependent variable is based and an earlier year

  • If the dependent variable is the difference between unemployment in 2020 and 2019, the independent variable might be the difference in FDI between 2019 and 2018

  • As in the case of the dependent variable, the magnitude of the time between the two measures of the independent variable will be specified by the researcher based on her knowledge of the data and the mechanisms of her causal story

In this hypothetical example, as noted, the specifics of the hypothesis to be tested might be that an increase in FDI between 2018 and 2019 caused the level of unemployment to decrease between 2019 and 2020. Notice the careful choice of the years for the independent and dependent variables. If a researcher were to claim that the increase in FDI between 2019 and 2020 caused the level of unemployment to decrease between 2019 and 2020, a critic would immediately respond that not enough time could have passed for the increase in FDI to be the driver of the change in unemployment. By lagging the independent variable and looking at FDI between the years 2018 and 2019, the researcher creates a temporal sequence between the independent and dependent variables and thus a much more convincing causal story.
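The construction of a lagged independent variable can be sketched in a few lines. Everything in the example below is an assumption made for illustration: the yearly figures are invented for a single hypothetical country, and the function and parameter names (change, lagged_pair, dv_span, iv_span) are not drawn from any study. The point is only to show that the change in FDI is measured over a period that ends where the change in unemployment begins.

```python
# Invented yearly figures for one hypothetical country (illustration only).
fdi_pct_gnp = {2015: 2.0, 2016: 2.2, 2017: 2.1, 2018: 2.5, 2019: 3.1, 2020: 3.0}
unemployment_pct = {2015: 12.0, 2016: 11.8, 2017: 11.9,
                    2018: 11.5, 2019: 11.2, 2020: 10.6}

def change(series, start_year, end_year):
    """Difference between a series' value in end_year and in start_year."""
    return series[end_year] - series[start_year]

def lagged_pair(year, dv_span=1, iv_span=1):
    """Return (change in FDI, change in unemployment) for a given end year.

    The unemployment change covers year - dv_span to year. The FDI change
    covers the iv_span years that end where the unemployment period begins,
    so the presumed cause is measured before the presumed effect.
    """
    dv_start = year - dv_span
    dv = change(unemployment_pct, dv_start, year)
    iv = change(fdi_pct_gnp, dv_start - iv_span, dv_start)
    return iv, dv

# FDI change 2018 to 2019 paired with unemployment change 2019 to 2020.
iv_2020, dv_2020 = lagged_pair(2020)

# The same pairing for every end year the invented data can support.
all_pairs = {year: lagged_pair(year) for year in range(2017, 2021)}
for year, (iv, dv) in all_pairs.items():
    print(year, round(iv, 2), round(dv, 2))
```

With data for many countries, each country would contribute one such pair per year, and the pairs would then be analyzed together. Changing dv_span or iv_span reproduces the other choices described above, such as comparing unemployment in 2020 with unemployment in 2018.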

This example is hypothetical, of course, and it is also simplified. But in fact, there have been serious tests of the proposition that an increase in FDI brings about a decrease in unemployment. A 2019 macroeconomic study of unemployment rates in general and youth unemployment in particular in eight Arab countries reports, “A positive impact of FDI on reducing national unemployment is proven in the group as a whole and individually in Jordan, Morocco, and Tunisia while it leads to an increase in unemployment in Egypt. The impact of FDI on reducing youth unemployment is not proven.” (Footnote 4)

This might be the place to introduce readers to the Cairo-based Economic Research Forum (Footnote 5). The ERF commissions and makes available numerous studies based on aggregate data with variables measured sequentially over time. A large proportion of these studies have an applied and policy-relevant focus. Among the ERF publications and working papers are, in fact, several studies that examine the relationship between FDI and unemployment in Arab countries. One of these compares and contrasts the impact of FDI in Arab countries and Asian countries (Footnote 6). The ERF website gives access to many other country-level studies that use aggregate data over time and lagged independent variables to test hypotheses about determinants of the variance associated with important economic and societal features, behavior, and performance.

4.1.3 Multivariate Regression

Multivariate regression is one of the inferential statistics most commonly used to test hypotheses in social science research. There are other statistical tests, of course, but attention to regression will be sufficient for present purposes, particularly because it has been used in many of this chapter’s examples of causal stories that involve more than two variables. Among the kinds and purposes of the additional variables that a multivariate regression analysis might include, beyond the dependent variable and one independent variable, are the following:

  • Multiple independent variables. Multivariate regression allows the researcher to test each of several hypotheses by considering one independent variable at a time with any others held constant

  • Control variables, which are discussed later in this chapter

  • Multiple indicators of the same, more abstract concept, or concepts, in order to consider the possibility that explanatory power resides in some dimensions of the concept but not in other dimensions

  • Other “third” variables, to which we turn in the section of this chapter devoted to “Third Variable Possibilities.”

These possibilities are not mutually exclusive. It would not be unusual for an investigator to include in her regression analysis variables selected for several of these objectives, or possibly even all of them.
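What it means to consider one independent variable “with any others held constant” can be seen in a simulated example. The sketch below is a minimal illustration under stated assumptions: the data are invented, the names x1, x2, and y are placeholders, and the small ols function is written out by hand only to keep the example self-contained (a researcher would normally rely on a statistical package). Two correlated independent variables have opposite effects on the dependent variable, so a bivariate regression on x1 alone blends the two effects, while the multivariate regression separates them.

```python
import random

def ols(predictors, y):
    """Least-squares coefficients [constant, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

rng = random.Random(3)
# Invented data: x2 is correlated with x1, and the two have opposite effects on y.
x1 = [rng.gauss(0, 1) for _ in range(3000)]
x2 = [0.7 * a + rng.gauss(0, 1) for a in x1]
y = [3.0 + 1.5 * a - 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

bivariate = ols([[a] for a in x1], y)        # y on x1 only
multivariate = ols(list(zip(x1, x2)), y)     # y on x1 with x2 held constant

print("x1 alone:", round(bivariate[1], 2))
print("x1 with x2 held constant:", round(multivariate[1], 2))
print("x2 with x1 held constant:", round(multivariate[2], 2))
```

In this simulation the coefficient for x1 alone is close to zero, because the positive effect of x1 is nearly cancelled by the negative effect of the x2 that travels with it. Once x2 is included, the coefficient for x1 moves close to the value of 1.5 that was built into the invented data.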

Social science researchers, and certainly those that work with quantitative data, need to be broadly knowledgeable and competent with respect to inferential statistics, including, but not limited to, multivariate regression. Would-be researchers without this knowledge and competence should consult one of the many books on social statistics. With respect to the present discussion, only a cursory introduction to multivariate regression is offered, with an emphasis on how the statistic is used and how the results of its use should be understood. The goal of the present discussion is only to give readers enough familiarity with regression to understand and find instructive its use in the “third variable” designs to be introduced. The term “third variable” refers, generically, to the variable or variables that are added to the dependent variable and the independent variable in an analysis in order to enrich and/or increase confidence in a hypothesized bivariate relationship.

Ordinary least squares (OLS) regression is a statistical method of analysis used to estimate whether and to what extent a change in one or more independent variables brings a change in a dependent variable. This is sometimes described as estimating the strength of a relationship or predicting the effect that an independent variable has on a dependent variable. OLS is the most commonly used method for estimating the parameters of the linear regression model, and perhaps the most commonly used method overall in the social sciences. In addition to linear regression, which is used in the examples in this chapter, there are logistic and non-parametric forms of regression. These kinds of regression will be described very briefly following the discussion of OLS regression.

Linearity

OLS regression is a linear statistic, meaning that its estimates pertain to variable relationships that are presumed to be linear, or in a straight line. As discussed in Chap. 3, a linear relationship may be direct, or positive, in which case an increase in the independent variable brings an increase in the dependent variable, or it may be inverse, or negative, in which case an increase in the independent variable brings a decrease in the dependent variable. Figure 4.2 illustrates several different degrees to which a relationship may be linear, either positive or negative. These figures are called scatter plots. Each point in a scatter plot represents one case, positioned according to its values on the two variables that define, respectively, the vertical axis and the horizontal axis. Generally, the vertical, or y, axis shows values of the dependent variable, while the horizontal, or x, axis shows values of the independent variable.

As an example, consider the hypothesis that in Arab countries there is a negative linear relationship between an individual’s level of education and her satisfaction with the overall performance of the government. Accordingly, H1 posits that individuals who have had more education are more likely than individuals who have had less education to have an unfavorable judgment of the government’s overall performance. Although a positive linear relationship between education and satisfaction with government performance might have been expected and might seem more plausible, Arab Barometer data from Wave 5 surveys will in fact confirm the existence of a negative relationship, and this is no less instructive for illustrating linearity.

Figure 4.3 shows the points that represent two of the respondents in the Arab Barometer Wave 5 surveys, plotted by level of education and satisfaction with government performance. The dependent variable, satisfaction with overall government performance, is on the vertical axis and is an 11-point scale, with 0 = total dissatisfaction and 10 = total satisfaction. The independent variable, level of education, is measured by a 7-point scale, with 1 = no schooling and 7 = a postgraduate degree. In between are 2 = primary school, 3 = intermediate school, 4 = secondary school, 5 = some post-secondary school education, and 6 = university bachelor’s degree or a comparable degree.

One of the two respondents in Fig. 4.3 has completed high school and has a score of 6 on the 11-point scale of satisfaction. The other respondent has had tertiary education, meaning some post-secondary schooling, and has a score of 3 on the 11-point satisfaction scale. Once the ratings on both variables have been entered for all of the respondents in the Arab Barometer Wave 5 surveys, the scatter plot will be complete and ready for visual inspection. Tests of statistical significance will, of course, guide the researcher’s decision about whether to conclude that the hypothesis has been confirmed, whether, in other words, to risk a Type I error. Visual inspection is often needed as well, however, in order to determine the structure of a relationship when the hypothesis of no relationship, the null hypothesis, has been rejected. The null hypothesis may have been rejected because the hypothesized linear relationship is true, or probably true. Or it may be rejected because the relationship between the independent and dependent variables appears to have a different structure than that proposed by the hypothesis.

Fig. 4.2 Scatter plots showing degrees and direction of linearity

Fig. 4.3 Scatter plot with ratings of two respondents on level of education and satisfaction with overall government performance

Probability Values

When researchers perform regression analyses, they are often most interested in the variable-specific coefficients and probability values, or p-values, that regression yields. A variable-specific coefficient, sometimes also called the slope, indicates the direction and magnitude of the relationship between an independent variable and a dependent variable. More specifically, as will be discussed more fully shortly, the coefficient provides an estimate of how much the dependent variable will change, either increasing or decreasing, in response to an increase of one unit in the independent variable.
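How a coefficient, or slope, is obtained in the simplest bivariate case can be sketched as follows. The seven respondents and their ratings are invented for illustration and are not Arab Barometer data; only the two scales (education from 1 to 7, satisfaction from 0 to 10) follow the description in the text. The function name fit_line is likewise an assumption.

```python
# Invented ratings for seven hypothetical respondents (not survey data).
education = [1, 2, 3, 4, 5, 6, 7]        # 7-point education scale
satisfaction = [8, 7, 7, 5, 4, 4, 2]     # 0-10 satisfaction scale

def fit_line(x, y):
    """Return (constant, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: how x and y vary together, relative to how much x varies.
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    # Constant: the value of the line where x equals zero.
    constant = mean_y - slope * mean_x
    return constant, slope

constant, slope = fit_line(education, satisfaction)
print(f"slope = {slope:.3f}, constant = {constant:.3f}")
```

For these invented ratings the slope is about −0.96, so each additional step on the education scale goes with a drop of a little under one point in satisfaction. The negative sign gives the direction of the relationship and the size of the number gives its magnitude.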

The p-values associated with each coefficient indicate the likelihood that the researcher would have obtained the observed data, and thus the observed coefficients, from a population of units for which the null hypothesis, the hypothesis of no relationship, is true. Probability values are often of most immediate interest to an investigator because they provide her with a basis for deciding whether to reject the null hypothesis and accept her research hypothesis. Or, possibly, she may find a relationship between the variables in her hypothesis that differs significantly from the null hypothesis, hence a low p-value, but does not have the same structure as the one her hypothesis posits. In such a case, a scatter plot based on the independent variable and the dependent variable may help the researcher identify the structure of a variable relationship that differs from both the null hypothesis and the research hypothesis.

For example, suppose a researcher performs regression analysis using a sample from her population of interest and obtains a probability value of p = .01 for the independent variable in which she is interested. This indicates that if the null hypothesis were true of the population, and she were to draw 100 random samples from it, most likely only one of these samples would show a relationship as strong as the one she has observed. The other 99 would show a weaker relationship or none at all. Another way to think about this is that, with a p-value of .01, there is only a 1 in 100 chance of obtaining a relationship this strong through sampling variation alone when the two variables are in fact unrelated in the population. Those are pretty good odds, and in the social sciences most researchers would confidently reject the null hypothesis upon obtaining a p-value of .01.
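The idea behind a probability value can be made concrete by simulating what the null hypothesis would produce. The sketch below uses a permutation approach, which is a swapped-in technique chosen for transparency and is not the formula that regression software uses to report p-values. All data are invented, and the names slope and permutation_p_value are assumptions. Shuffling the dependent variable breaks any link with the independent variable, so each shuffle mimics a sample in which the null hypothesis is true. The p-value is then the share of shuffles that yield a slope at least as large, in absolute terms, as the one observed.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def permutation_p_value(x, y, draws=2000, seed=0):
    """Share of shuffled datasets whose slope is at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(draws):
        rng.shuffle(shuffled)          # mimic a sample with no relationship
        if abs(slope(x, shuffled)) >= observed:
            hits += 1
    return hits / draws

rng = random.Random(42)
# Invented data: 80 hypothetical respondents on a 7-point education scale.
x = [rng.randint(1, 7) for _ in range(80)]
y_related = [8 - 0.8 * a + rng.gauss(0, 1.5) for a in x]    # built-in negative link
y_unrelated = [5 + rng.gauss(0, 1.5) for _ in x]            # no link at all

p_related = permutation_p_value(x, y_related)
p_unrelated = permutation_p_value(x, y_unrelated)
print("p-value, related data:", p_related)
print("p-value, unrelated data:", p_unrelated)
```

For the data with a built-in relationship, almost none of the shuffled datasets match the observed slope, so the p-value is very small. For the unrelated data the p-value can fall anywhere between 0 and 1.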

As discussed in Chap. 3, probability values that are usually considered low enough to reject the null hypothesis are p < .05, p < .01 and p < .001, each of which indicates the probability of a Type I error. In other words, if a researcher decides to set her significance level at p < .01, she is saying that she is willing to accept her research hypothesis, and reject the null hypothesis, only if a relationship as strong as the one she observes would occur in no more than 1 percent of samples drawn from a population for which the null hypothesis is in fact true. The .05, .01, and .001 probability values are also sometimes described as levels of statistical significance, or alpha values. Lower alpha values (and p-values) give greater confidence in the researcher’s findings.

Although widely used as standards for estimating statistical significance, the three alpha values are nonetheless arbitrary and subjective, as would be any other p-value. They are arbitrary since a p-value can be any number between 0 and 1. And they are subjective in the sense that the p-value does not tell an investigator how low is low enough to reject the null hypothesis and risk a Type I error. Other things being equal, the cost and consequences of making a Type I error will figure prominently in a researcher’s decision about whether or not to reject the null hypothesis. The higher the cost and the more injurious the consequences of being wrong, the lower the probability value she will require before considering the research hypothesis to be confirmed and proceeding to act on this basis.

Regression Results and Tables

Table 4.1 shows the results of an OLS regression analysis that uses data from Arab Barometer Wave 5 surveys to test the hypothesis that individuals who have had more education are more likely than individuals who have had less education to have an unfavorable judgment of the government’s overall performance. The hypothesis thus posits a strong negative, or inverse, relationship between the independent variable and the dependent variable. The table presents findings from a pooled analysis of data from 11 of the countries surveyed in Wave 5, the years of which are 2018 and 2019. The table also presents findings from single-country analyses of data from surveys in three of these 11 countries: Iraq, Lebanon, and Palestine. Importantly, the findings are not the same for the four sets of results presented in the table.

The numbers in the cells of Table 4.1 are regression coefficients and standard errors. The coefficients, as stated, express the magnitude and direction of a change in the dependent variable that is associated with an increase of one unit in the independent variable, with “caused by” replacing “associated with” if the relationship is very likely to be causal. As shown, a coefficient of −1.043 is obtained when judgments about government performance, the dependent variable, are regressed on education, the independent variable. This means that each one-unit increase in level of education is associated with a decrease of 1.043 points on the 11-point scale of satisfaction with government performance.

The standard error reported with each coefficient is an estimate of how far the coefficient obtained from an investigator’s sample is likely to be from the value that characterizes the population her sample purports to represent. Given this, the standard error is at the same time an estimate of how much the coefficient would vary across other samples she might draw were she to repeat her study. As is implied by the term “error,” lower values for standard errors indicate increased confidence in OLS results and hypothesized relationships.
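The connection between a standard error and the idea of repeating a study can be checked by simulation. In the sketch below, which rests entirely on invented data and hypothetical names (slope_and_se, draw_sample), a standard error for the slope is first computed from a single sample using the usual bivariate OLS formula. The study is then “repeated” 500 times, and the spread of the 500 resulting slopes is measured directly. If the standard error is doing its job, the two numbers should be close.

```python
import random

def slope_and_se(x, y):
    """Least-squares slope and its estimated standard error (bivariate case)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    constant = my - b * mx
    # Sum of squared residuals around the fitted line.
    ssr = sum((c - (constant + b * a)) ** 2 for a, c in zip(x, y))
    se = (ssr / (n - 2) / sxx) ** 0.5
    return b, se

def draw_sample(rng, n=200):
    """One invented sample: education 1-7, satisfaction falling with education."""
    x = [rng.randint(1, 7) for _ in range(n)]
    y = [8.0 - 1.0 * a + rng.gauss(0, 2) for a in x]
    return x, y

rng = random.Random(7)

# Standard error estimated from a single sample.
one_slope, one_se = slope_and_se(*draw_sample(rng))

# Spread of the slope across 500 repetitions of the same hypothetical study.
slopes = [slope_and_se(*draw_sample(rng))[0] for _ in range(500)]
mean_slope = sum(slopes) / len(slopes)
spread = (sum((s - mean_slope) ** 2 for s in slopes) / (len(slopes) - 1)) ** 0.5

print(f"standard error from one sample: {one_se:.3f}")
print(f"spread of slopes across repeated samples: {spread:.3f}")
```

Increasing the sample size n in draw_sample shrinks both numbers, which is one reason larger samples give greater confidence in regression results.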

The table also gives the value of the constant, also known as the intercept. This is the value of the dependent variable when the independent variable has a value of zero. As will be shown, a formula that includes both the constant and the regression coefficient can be used to estimate, or predict, hypothetical values of the dependent variable. Estimating or predicting values of the dependent variable might be, but very often is not, the objective of a social science research project that employs multivariate regression. Nevertheless, as in the case for the coefficient and the standard error, it is important to understand the kind of information that is provided by each value in a regression table.
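The way the constant and the coefficient combine to give a predicted value can be sketched as follows. The coefficient of −1.043 is the one reported in the text for education. The constant used here is a placeholder chosen only so that the code runs; it is an assumption and is not the value in Table 4.1. The function name predict is also hypothetical.

```python
COEFFICIENT = -1.043          # education coefficient reported in the text
HYPOTHETICAL_CONSTANT = 9.0   # placeholder only; NOT the constant in Table 4.1

def predict(constant, coefficient, x):
    """Predicted value of the dependent variable for a given value of x."""
    return constant + coefficient * x

# Predicted satisfaction at two adjacent education levels.
secondary = predict(HYPOTHETICAL_CONSTANT, COEFFICIENT, 4)        # 4 = secondary school
post_secondary = predict(HYPOTHETICAL_CONSTANT, COEFFICIENT, 5)   # 5 = some post-secondary

print(f"difference between adjacent levels: {post_secondary - secondary:.3f}")
```

Whatever the constant, the difference between the predictions for two adjacent education levels equals the coefficient. The constant itself corresponds to an education score of zero, which lies outside the 1 to 7 range of the scale, so in this example it anchors the line rather than describing any actual respondent.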

The findings in Table 4.1 that are of most immediate interest are the probability values, which indicate the likelihood of being wrong if the research hypothesis is accepted and the null hypothesis is rejected. As discussed above, these p-values estimate the likelihood of finding the pattern an investigator actually observes if her sample has been selected from a population of units (individuals, countries, or any other unit of analysis) that is characterized by the null hypothesis. The lower the likelihood that the population is characterized by the null hypothesis, the safer it is to conclude that the sample or subset of units drawn from that population depicts an existing, or true, relationship. And accordingly, then, the lower will be the likelihood of making a Type I error when concluding that the independent variable does in fact account for some of the variance on the dependent variable.

As shown in Table 4.1, levels of statistical significance are often indicated by the presence and number of stars next to each variable in the table. A note at the bottom of the table indicates the p-values represented by one, two, and three stars. A variable next to which there are no stars is not related to the dependent variable at a conventional level of statistical significance; the probability of finding a relationship as strong as the one observed in a sample drawn from a population characterized by the null hypothesis is not low enough to reject the null hypothesis.
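The star notation can be expressed as a simple rule that maps a p-value to a number of stars. The sketch below assumes the cutoffs most commonly used in social science tables (one star for p < .05, two for p < .01, three for p < .001). This is an assumption, since the cutoffs that actually apply are those stated in the note beneath the table being read. The function name is hypothetical.

```python
def significance_stars(p_value, thresholds=(0.05, 0.01, 0.001)):
    """Return one star for each threshold that the p-value falls below.

    The default thresholds are a common convention, assumed here for
    illustration; always check the note under the table being read.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "*" * sum(p_value < t for t in thresholds)


# Hypothetical p-values, for illustration only.
for p in (0.0004, 0.03, 0.20):
    print(p, repr(significance_stars(p)))
```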

Table 4.1 Findings from regression analyses testing the hypothesis that individuals who are more educated are more likely than individuals who are less educated to judge overall government performance to be unsatisfactory

Several conclusions about the hypothesized relationship between level of education and satisfaction with overall performance of the government can be drawn from Table 4.1. First, focusing on the pooled analysis based on data from the Arab Barometer Wave 5 surveys in 11 countries, the hypothesized relationship between overall satisfaction with government performance and level of education is very strong and statistically significant at the .001 level of confidence. It is extremely unlikely that data exhibiting a relationship as strong as this were drawn from a population characterized by the null hypothesis. Accordingly, given these findings, the investigator would normally consider the research hypothesis to have been confirmed, reject the null hypothesis, and run the risk of a Type I error.

Second, the findings from the pooled analysis are not the same as the findings for each individual country. Sometimes the relationship between education and satisfaction with the government is also strong and statistically significant, as in Palestine. Sometimes it is statistically significant but at a lower level of confidence, as in Iraq. And sometimes, as in Lebanon, the relationship is not statistically significant and the researcher would probably choose to risk a Type II error, accepting the null hypothesis and rejecting the research hypothesis even though there is a chance that the latter might be true.

Third, given different findings across at least some of the countries included in the Wave 5 surveys of the Arab Barometer, the investigator might wish to reflect on, and perhaps offer hypotheses about, the determinants of this cross-country variance. Formulating, and perhaps also testing, such hypotheses would have country as the unit of analysis, have the bivariate relationship between education and satisfaction with the government as the dependent variable, and have country attributes or experiences as independent variables.

The Tradeoffs of Pooled Analyses

As Table 4.1 shows, findings from analyses that take together data from 11 of the countries surveyed in Wave V of the Arab Barometer, in what is called a “pooled” analysis, may not be the same as findings from analyses based on data from each individual country. This may or may not mean that findings from pooled analyses are misleading and that such analyses should not be undertaken.

If the objective of a research project is to identify univariate, bivariate, or multivariate relationships that apply to all of the groups, countries in this case, on which an investigator has data, findings from a pooled analysis may be misleading. In this case, the researcher will need to consider each group, or country, separately in order to determine whether or not the same findings apply to each group. The researcher may still wish to carry out a pooled analysis, for convenience or other reasons, but she may not claim that findings produced by a pooled analysis apply to all of the groups that make up the pool unless she has analyzed each group separately and found this to be the case.

If the objective of a research project is not to identify patterns that apply to all of the groups, countries in this case, on which an investigator has data, but rather to test hypotheses and offer insight and evidence about important causal stories, then pooled analysis will expand the data available and may be completely appropriate. Hypotheses being tested will, if confirmed, have been found to have substantial and broad explanatory power, even if they do not necessarily describe explanations of variance that obtain in any particular group.
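The difference between pooled and group-by-group findings can be illustrated with a few made-up numbers. The sketch below computes a bivariate OLS slope for three hypothetical countries and then for all of their cases taken together. The country labels and data are invented for illustration and do not come from Table 4.1.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Bivariate OLS slope: covariation of x and y divided by variation in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


education = [1, 2, 3, 4, 5, 6, 7]

# Hypothetical satisfaction scores for three invented countries.
groups = {
    "Country A": [9, 8, 7, 6, 5, 4, 3],  # satisfaction falls steeply with education
    "Country B": [6, 6, 5, 5, 4, 4, 3],  # satisfaction falls more gently
    "Country C": [5, 5, 5, 5, 5, 5, 5],  # no relationship at all
}

slopes = {name: ols_slope(education, y) for name, y in groups.items()}

# Pooled analysis: stack the cases from every group into one dataset.
pooled_x = education * len(groups)
pooled_y = [score for y in groups.values() for score in y]
pooled_slope = ols_slope(pooled_x, pooled_y)

print(slopes)
print(pooled_slope)
```

In this invented example, the pooled slope is negative even though one of the three groups shows no relationship at all, which is why a pooled finding cannot be assumed to hold within every group.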

The Slope-Intercept Equation

There is a simple equation, frequently called the slope-intercept equation, that makes use of the information provided by the coefficient, or slope, and the constant, or intercept, to estimate the value of the dependent variable for a particular value of the independent variable: y = mx + b

Where:

  • y is the value of the dependent variable, which is not known

  • x is the value of the independent variable, which is known and specified

  • m is the value of the change in y produced by a change of one unit in x; in regression, this is given by the coefficient, or slope, and frequently called the beta value or beta estimate

  • b is the value of y when x = 0; in regression, b is given by the constant and is frequently called the intercept

This equation omits a term that is ordinarily included in multivariate regression when the goal is to estimate the value of a dependent variable. This is called the “error term,” and it is represented by “e” as shown in the following equation: y = mx + b + e. The error term is a value that represents the difference between the value of a population or universe, a value that can only be estimated and is therefore sometimes called the “theoretical value,” and the actual observed value based on available data, usually a sample.

Note also that when referring to OLS, the slope-intercept equation, simplified here, is frequently rendered as: Y = Beta_0 + (Beta_x) (X).

Where:

  • Beta_0 is the constant, or intercept

  • (Beta_x) is the coefficient, or slope, for the variable(s) included in the regression

  • (X) is an observed value of the independent variable

An application of the slope-intercept equation to predict a value of the dependent variable (y) based on the coefficient and constant in Table 4.1 is shown below. For this illustration, the value of the independent variable (X) is 4, meaning that the value of the dependent variable is being predicted for individuals with a secondary school education, those with a 4 on the 7-point education-level scale. The dependent variable, again, is an 11-point scale of satisfaction with overall government performance, with 0 = totally dissatisfied and 10 = totally satisfied. Application of the formula predicts that individuals with a secondary school education will have a score of 5.483 on this scale.

  • y = mx + b

  • y = 4 * coefficient + constant

  • y = 4 * −1.043 + 9.655

  • y = −4.172 + 9.655

  • y = 5.483
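The same arithmetic can be written as a short function, which makes it easy to repeat the prediction for other values of the independent variable. This is a minimal sketch. The coefficient and constant are the pooled values used in the steps above, and the function name is hypothetical.

```python
def predict(x, coefficient, constant):
    """Slope-intercept prediction: y = m * x + b."""
    return coefficient * x + constant


# Pooled coefficient (slope) and constant (intercept) from the worked example.
COEFFICIENT = -1.043
CONSTANT = 9.655

# Education level 4 on the 7-point scale (secondary school).
predicted = predict(4, COEFFICIENT, CONSTANT)
print(f"{predicted:.3f}")  # 5.483
```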

The error term, as noted, is a value that represents the difference between the value of a population or universe, a value that can only be estimated, and the actual observed value based on an investigator’s data. In the example above, 5.483 is the predicted value on the 11-point perception of government performance scale for individuals with a 4 on the 7-point level of education scale. However, not every individual with an education level of 4 surveyed by the Arab Barometer answered the government satisfaction item with a response of 5.483. In fact, obviously, a response of exactly 5.483 was not an option. The error term is the difference between an individual’s actual response and the predicted response of 5.483. The error term for an individual who had a secondary school education and judged government performance to deserve a 7 on the 11-point scale would be 7 − 5.483, or 1.517. Further discussion of the error term is beyond the scope and purpose of the present account. Readers wishing additional information about the conceptualization and measurement of the error term, and about multivariate regression more generally, will find this readily available in books on multivariate statistics.
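The calculation of the error for an individual respondent can be sketched in the same way. The predicted value of 5.483 and the response of 7 come from the example above. The other responses in the list are hypothetical and are included only to show that errors can be positive or negative.

```python
def error_term(observed, predicted):
    """Difference between an observed response and the predicted value."""
    return observed - predicted


PREDICTED = 5.483  # predicted score for education level 4, from the example above

# The response of 7 is from the text; the others are hypothetical.
responses = [3, 5, 7, 10]
errors = [error_term(r, PREDICTED) for r in responses]

for response, error in zip(responses, errors):
    print(response, f"{error:+.3f}")
```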

Exercise 4.2. Estimating Satisfaction with Overall Government Performance

Use the findings presented in Table 4.1 to estimate the satisfaction with overall government performance score of each set of respondents listed below. Satisfaction with overall government performance, the dependent variable, is measured by a 0–10 scale with 10 indicating the highest level of satisfaction. What is the score on this 11-point scale for each of the following:

  1. All Wave V respondents with a rating of 6 on the 1–7 scale measuring level of education. A rating of 6 on the 7-point education scale indicates that an individual has had a university education.

  2. Iraqi respondents with a rating of 4 on the 1–7 scale measuring level of education. A rating of 4 on the 7-point education scale indicates that an individual has had a secondary school education.

  3. Palestinian respondents with a rating of 4 on the 1–7 scale measuring level of education. A rating of 4 on the 7-point education scale indicates that an individual has had a secondary school education.

  4. In what way is satisfaction with overall government performance different for Iraqi respondents with a secondary school education and Palestinian respondents with a secondary school education?

  5. Lebanese respondents with a rating of 2 on the 1–7 scale measuring level of education. A rating of 2 on the 7-point education scale indicates that an individual has had a primary school education.

Other Types of Regression

Multivariate regression is a parametric statistic, meaning that assumptions are made about the distribution of variables in the population from which the data to be analyzed have been obtained. OLS regression is used when the dependent variable is continuous. It makes several strong assumptions, including, most importantly, that there is a linear relationship between the independent and dependent variables.

Although perhaps the most common, OLS regression is not the only regression model used to test hypothesized variable relationships. Another form of parametric regression is logistic regression, which is used when the dependent variable is a categorical variable. Binary logistic regression is used when the dependent variable has two categories, such as agree/disagree, present/absent, or high/low; multinomial logistic regression is used when the dependent variable has more than two categories; and ordinal logistic regression is used when the dependent variable has ordered categories, such as primary, secondary, and university levels of education.

There are also non-parametric types of regression, which require fewer assumptions about the shape or form of variable distributions in the population from which the data to be analyzed have been obtained. Non-parametric regression statistics are not as powerful with smaller samples.

Discussion of these other forms of multivariate regression is beyond the scope and purpose of the present account. They are mentioned only to alert readers to their existence. Readers wishing additional information, including about parametric requirements and assumptions, will find this readily available in books on social statistics.

4.1.4 Control Variables

Tests of hypotheses that posit variable relationships that purport to be causal usually require multivariate analysis. Along with the independent variable and the dependent variable, the analysis will require the inclusion of one or more additional variables. Control variables are of most immediate importance here, and it is here that the multivariate character of analyses devoted to causal inference comes into play.

Control variables are usually variables that are related to both the dependent variable and the independent variable and that, because of these parallel relationships, might cause the dependent variable and the independent variable to covary. Should this be the case, an investigator might be tempted to conclude, mistakenly, that the dependent variable and the independent variable covary because variance on the former is determined, or caused, by variance on the latter. However, should this be the case, such a conclusion would be wrong; as far as causality is concerned, the relationship would be spurious.

The reason that a relationship between two variables that covary might be spurious is illustrated by the stork-baby folktale and the shorts-ice cream story mentioned earlier. Despite the covariance of the stork measure and the baby measure, or of a wearing shorts measure and an ice cream measure, there is no causal connection between the measures in each pair. Rather, it is the impactful relationship of both variables in each pair to a third variable, rural-urban character of the localities in the first instance and temperature outside the home in the second, that produces the covariance.

This situation, or problem, where there is the possibility of attaching causality to a hypothesized variable relationship that is actually spurious with respect to causality, is often referenced by the term omitted variable bias, which is a form of endogeneity. A researcher has an endogeneity problem when there is a third variable, an endogenous variable, or a number of endogenous variables, that is related to her dependent variable and is also related to her independent variable but has not been identified and taken into consideration. It has not, more specifically, been included in a test of the hypothesized relationship between the investigator’s independent variable and her dependent variable.

It is the designation “not taken into consideration” that makes endogeneity a problem. If a researcher knows that there is a third variable that affects both the dependent variable and the independent variable, she can include it in her analysis and treat it as a control variable. In this way, she can hold it constant and remove its impact. The challenge is to know, or identify, the third variable, or the fourth, fifth, sixth, and other variables, that fit this description and need to be included in the analysis and controlled.

In all probability, the researcher already knows some of the variables that need to be controlled. Research projects in which the individual is the unit of analysis very often include demographic attributes, like sex, age, education, and others as control variables. Studies in which country is the unit of analysis might include per capita gross national product, per capita national income, and position on a democracy-authoritarianism scale as control variables. Beyond these, however, and also in research projects that focus on other units of analysis, an investigator must be alert to less immediately obvious variables that may need to be controlled, perhaps interpersonal trust or civic engagement for individuals and ethnic diversity or percentage of women in the labor force for countries.

To identify other variables that may need to be controlled, an investigator will want to consult previous research on the subject of her study; and still others may suggest themselves as she continues to reflect and deepen her understanding of the causal stories that her hypotheses represent. In any event, to the extent that relevant variables are not identified and an endogeneity problem persists, findings about the researcher’s hypotheses may be incomplete or even wrong.

After finding a strong and significant bivariate relationship between a dependent variable and an independent variable, identifying and adding one or more endogenous variables to the analysis as control variables—endogenous variables being, again, those that are related to the independent and dependent variables—will produce one of two possible results. Either the strong and significant relationship between the independent and dependent variables will remain strong and significant or it will cease to be strong and significant.

If the relationship between the independent variable and the dependent variable ceases to be strong and significant when a third variable, an endogenous variable, is included in the analysis and thereby controlled, it will become clear that the relationship found in a bivariate analysis was indeed spurious with respect to causality. To return for a moment to the humorous and silly examples previously used to illustrate the possibility of an endogeneity problem, the relationship between storks and babies will cease to be significant if the nature of the locality is considered as a control variable, as will the relationship between shorts and ice cream if temperature outside the home is considered. It will then be clear that the relationship between the independent variable and the dependent variable, however strong might be the bivariate correlation between them, is not a causal relationship.

Alternatively, if the relationship between the independent variable and the dependent variable remains strong and significant when one or more endogenous variables are included in the analysis and controlled, the case for causal inference will be strengthened. Whatever might be the impact of the control variable, or control variables, this is not the reason that the independent variable and dependent variable covary. More likely, the independent variable and dependent variable covary because the former is a determinant of the latter, because variance on the independent variable is a cause of variance on the dependent variable.

Causation cannot be proved, of course. It can only be inferred. And there may well be endogenous variables that have not been identified and controlled. Nevertheless, the case for causal inference will be stronger, and the probability of making a Type I error will be lower, if there is a strong and statistically significant relationship between the independent variable and the dependent variable, if there is a temporal sequence between these two variables, and if relevant and potentially endogenous variables have been identified and included in the analysis.

In multivariate statistical analysis, control variables are often included in the regression models that are run. Here model refers to a particular set of variables that are included in an analysis, along with the dependent and independent variables. It is not unusual for an investigator to run a number of models, one without any control variables in order to observe the strength and significance of the hypothesized relationship without any interference, and then one or more models with control variables, or different subsets of control variables, in order to see if the strength and significance of the hypothesized relationship change. As stated, whether a significant hypothesized relationship loses significance or remains significant in models that include control variables has clear implications for causal inference.
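The logic of comparing a model without controls to a model with a control variable can be sketched with simulated data based on the stork and baby example. In the simulation below, which is an illustration and not an analysis of real data, a locality's rural character drives both the stork measure and the birth measure, and storks have no effect on births. To keep the code free of external libraries, the control is applied by "partialling out": each variable is first regressed on the control, and the leftover parts are then regressed on each other, which yields the same coefficient as including the control in a multivariate OLS model. Only coefficients are computed here, not significance tests, and all names and numbers are assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing the part predicted by x."""
    slope = ols_slope(x, y)
    intercept = mean(y) - slope * mean(x)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Third variable: a standardized index of how rural each locality is.
rural = [rng.gauss(0, 1) for _ in range(n)]
# Both measures depend on rural character; neither depends on the other.
storks = [3.0 * r + rng.gauss(0, 1) for r in rural]
births = [2.0 * r + rng.gauss(0, 1) for r in rural]

# Model 1: no control variable.
slope_no_control = ols_slope(storks, births)

# Model 2: rural character held constant.
slope_with_control = ols_slope(residuals(rural, storks), residuals(rural, births))

print(round(slope_no_control, 2), round(slope_with_control, 2))
```

In this simulation the coefficient for storks is clearly positive in the model without a control and falls to approximately zero once rural character is held constant, which is the pattern that marks a relationship as spurious with respect to causality.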

Procedures along these lines, with the goal being causal inference, are employed in numerous political and social science studies carried out in Arab countries and societies. One innovative and instructive study examines the behavior of members of parliament in Algeria and Morocco. It analyzes data from an original survey of 200 male and female parliamentarians (Footnote 7). Some of the study’s hypotheses seek to account for variance in the kind or amount of constituent service work undertaken by different categories of deputies, variance associated with constituent service work being, therefore, the dependent variable. Below are two of these hypotheses. Each specifies a different independent variable, which it posits as one of the determinants, or causes, of the variance associated with the kind of constituent service work that deputies perform.

  • H1. Female deputies are more likely to serve female and less influential constituents than are male deputies.

  • H2. Quota-elected female deputies are more likely to serve female and less influential constituents than are non-quota-elected female deputies.

The table below is a reconstructed and simplified version of one of the tables in the article that describes this study and reports its findings. The table, based on an analysis of the 82 Moroccan deputies who were interviewed, presents the results of a multivariate statistical analysis, ordinary least squares regression, that tests the two hypotheses. Model 1 tests H1 and Model 2 tests H2. The dependent variable in both is the extent to which a deputy carried out service work on behalf of female and less influential constituents. The degree of service work devoted to these constituents is measured on an 8-point scale, with 8 = more service work on behalf of female and less influential constituents. Each model shows the relationship between the independent variable and the dependent variable. Each model also includes six control variables, a subset of those in the published article, and shows the statistical significance of each one’s relationship to the dependent variable.

The findings presented in Table 4.2 show that the analysis supports both hypotheses. In both cases, the probability that the null hypothesis is true and that the researcher will make a Type I error if she rejects it and considers the research hypothesis to be confirmed, is less than .05. Note that in this table, the p-values have not been explicitly listed, so the reader must rely on the star system detailed in the note below the table and mentioned previously in this chapter in order to determine statistical significance.

With respect to H1, the probability that deputies who do more service work on behalf of female and less influential constituents are not more likely to be female is less than 5 percent (p < 0.05), and so the investigator concluded that the risk of making a Type I error is low and reported, accordingly, that H1 is confirmed. With respect to H2, the probability that deputies who do more service work on behalf of female and less influential constituents are not more likely to be female and to have entered the assembly through a quota of seats reserved for women is less than 5 percent (p < 0.05), and so the investigator again concluded that the risk of making a Type I error is low and reported, accordingly, that H2 is confirmed.

The six control variables in Table 4.2 are only some of the control variables in the table in the published article. As discussed, control variables are selected for inclusion in multivariate statistical analyses that test hypotheses because the investigator wishes to consider, and hopefully rule out, the possibility that a hypothesized variable relationship that purports to be causal is actually spurious. If the relationship between a dependent variable and a hypothesized independent variable is statistically significant when control variables are not included in the analysis but then ceases to be statistically significant when one or more control variables are included, the researcher will be forced to conclude that the relationship is not causal—or at least that it is not a direct causal relationship. The possibility of an indirect causal relationship will be discussed in the section of this chapter devoted to “Third Variable Possibilities.”

Table 4.2 Some determinants of service work on behalf of female and less influential constituents by Moroccan deputies

Exercise 4.3 Connecting Hypotheses and Causal Stories

H1 and H2 in the study of constituent service work done by members of the Algerian and Moroccan national assemblies represent and call attention to a fuller causal story. Describe in two or three sentences what, in your best judgment, is a plausible causal story that tells why it is that Algerian and Moroccan members of parliament who do more service work on behalf of female and less influential constituents are more likely to be female and also more likely to have entered the assembly through a quota of seats reserved for women.

To select control variables, an investigator will reflect on and attempt to identify variables that may be related to both the dependent variable and the independent variable in ways that cause the two to covary or otherwise have an impact on the relationship between them. The impact of a potential control variable cannot always be known in advance, and it is unlikely that an investigator will be able to identify and include in a test of her hypotheses all of the control variables that might possibly be relevant. Nevertheless, confidence in a finding that her data support a hypothesized causal relationship will be much lower if plausible control variables have not been identified and included in her analyses.

Table 4.2 shows that the hypothesized causal relationships that H1 and H2 posit remain statistically significant at the .05 level of confidence when a number of control variables have been included in the analysis. Accordingly, this makes it more reasonable not only to risk a Type I error and accept the hypotheses but also to infer that the hypothesized relationships are very probably causal. When a hypothesized dependent variable-independent variable relationship is found to be statistically significant, in most cases with a p-value of .05 or lower, the inclusion of control variables increases confidence that this significant relationship is not due to the impact of one or more other variables and thus is probably not spurious.

In addition, however, confidence in causality when control variables are included in the analysis depends on the plausibility and relevance of the particular control variables that have been selected. On the one hand, variables that are known to be associated with both the dependent variable and the independent variable, or might reasonably and logically be thought to be associated, are those whose inclusion is most important. On the other hand, it is also important to have theoretical reasons for the control variables that are included, meaning that their connections to the dependent and independent variables should make sense in terms of the hypotheses and causal stories being investigated.

Researchers should be cautious about including additional control variables just in case they might have an unsuspected impact on the hypothesized relationship. Too many control variables can damage statistical estimates, particularly if the size of a researcher’s dataset is small. By itself, the availability of data is not a good reason to include a control variable. Rather, if a researcher cannot explain why a particular variable should be included as a control, she probably should not include it in her analysis. In this way she avoids a common pitfall of multivariate analysis known as overfitting.

The control variables in Table 4.2, which are among those in the table in the published article, were selected with the previously mentioned criteria in mind. The author’s rationale for including variables based on the urban population of the deputy’s home district is given below. It suggests that these district-level attributes might influence the relationship between, on the one hand, a deputy’s gender and/or whether or not she entered the assembly through a quota of parliamentary seats reserved for women and, on the other hand, the categories of constituents most likely to benefit from the deputy’s service work. Without the inclusion of these variables as controls, the researcher’s ability to conclude and then report that her hypotheses had been confirmed, including the causal connection that the hypotheses posit, would be very much weaker.

[Measures of district population are among the variables related to perceived electoral incentives that have been included as controls.] Women elected in larger districts, including Algiers with 3 million residents and 32 seats, may be more responsive to females, due to greater ability to serve less influential constituencies, the presence of civil society organizations, and urban, employed female constituents. Women elected in small districts (e.g., Moroccan districts with 2 to 5 seats) may have stronger incentives to cultivate a personal vote among constituents of both genders and have lower responsiveness to women.

Exercise 4.4 Identifying and Selecting Control Variables

Table 4.2 includes six control variables, and the author’s rationale for selecting some of them has been given. She states that they were selected because they are “among the variables related to perceived electoral incentives.”

One of the variables included as a control variable is whether or not the deputy is a member of an established center-left political party.

  • Why do you think the researcher thought it necessary to control this variable? In what way might this variable be related to the dependent variable and the independent variable, possibly causing them to covary and, for this reason, making it necessary to include it in the analysis as a control variable?

The six control variables in Table 4.2 are only some of the variables the investigator deemed it necessary to control.

  • Making your best guess, identify another variable that it would probably be necessary to control. Then give your reasons for selecting it; suggest how and why it might be related to the dependent and independent variables, thereby requiring that it be controlled.

4.2 Third Variable Possibilities

4.2.1 Other “Third” Variables

We turn now to ways that the refinement of a research design and the inclusion of additional variables, beyond those included for purposes of control, can enrich causal stories and/or make them more informative and more precise. These additional variables are frequently referred to as “third” variables, even though more than one of them may be added to the variables already included in the researcher’s models. Table 4.3 identifies and describes the three components into which our discussion of third variables is divided. What the three third variable types and roles share is attentiveness to the possibility that a causal story may involve more than two variables that are significantly related and remain so when tested in analyses that include relevant control variables.

Table 4.3 Third variables in multivariate causal stories

4.2.2 Direct and Indirect Relationships

A researcher might wonder not only whether the bivariate relationship she has observed is causal but also whether it is a direct or an indirect relationship, a distinction with very different implications about whether and how the independent variable impacts and accounts for variance on the dependent variable. Our discussion to this point has not made a distinction between direct and indirect relationships; we have for the most part proceeded as if our concern were only with direct relationships, relationships for which a change in the independent variable directly brings, and presumably causes, a change in the dependent variable. In this case, the causal story does not involve any other variables. The pathway from the independent variable to the dependent variable does not run through one or more other variables.

This is not the only possibility, however. The pathway at the center of a causal story may not lead directly to the dependent variable. Instead, it may initially lead to a third variable, making this third variable part of the causal story, and then lead from the third variable to the dependent variable. For example, the previously noted individual-level relationship between evaluation of the government’s economic performance and the likelihood of voting might involve such a pathway. The hypothesis that posits evaluation of the government’s economic performance as a determinant of the decision to vote or not to vote might actually involve an additional variable, such that the pathway leads from evaluation of government performance not to the decision about voting but rather to a third variable, perhaps trust in the government, and then from the third variable to voting.

This was, in fact, the finding of a study that used Arab Barometer data from five countries to test a hypothesis about the relationship between evaluation of government economic performance and voting and specifically to test the proposition that more favorable evaluations of government performance push toward greater likelihood of voting [Footnote 8]. Bivariate analysis showed that the relationship between evaluation of government performance and voting had a very low probability of being spurious, and this remained the case in a multivariate analysis that included control variables. Accordingly, with the likelihood of making a Type I error very low, the researchers judged the hypothesized bivariate relationship to have been confirmed.

The figures below illustrate different possibilities with respect to direct and indirect relationships. An unbroken line between two variables indicates that the analysis has found a statistically significant relationship between these variables. X, Y, and M are the variables in this illustration; Y is the dependent variable; X is the independent variable; M is a third variable, trust in the government in this case.

  • Figure 4.4 illustrates the results of a bivariate analysis that finds a statistically significant and direct relationship between X and Y. It is direct because there is not another variable in the pathway from X to Y. The researcher will recognize, of course, that the finding of an indirect relationship is not possible in bivariate analysis. The bivariate analysis can only determine whether or not the relationship between two variables is statistically significant, and also the structure and direction of the relationship. The structure and direction of variable relationships were discussed in Chap. 3.

Fig. 4.4 Bivariate analysis finds a direct relationship

  • Figure 4.5 illustrates the results of a multivariate analysis that finds a statistically significant and direct relationship between X and Y. The analysis also finds a significant indirect relationship, wherein one of the pathways leads from X to M and then from M to Y. If all of the separate bivariate relationships are statistically significant, as they are in Fig. 4.5, the researcher can report that the causal story, in this case, includes both a direct way and an indirect way that the independent variable affects and accounts for variance on the dependent variable. The basis for attributing causality to these relationships will be much stronger if control variables are included in the analysis and the relationships shown in Fig. 4.5 remain statistically significant.

Fig. 4.5 Multivariate analysis finds both a direct and an indirect relationship

  • Figure 4.6 is taken directly from the study, previously cited, that used Arab Barometer data to test the individual-level hypothesis that the more favorable an individual’s evaluation of the government’s economic performance, the more likely this individual will vote in national elections. In this case, in contrast to the relationships shown in Fig. 4.5, the multivariate analysis shows that there is only one statistically significant relationship involving the independent variable, and that this statistically significant relationship is not directly between the independent variable and the dependent variable.

Figure 4.6 shows that there is, nonetheless, a pathway leading from the independent variable to the dependent variable. It is indirect rather than direct, however, with the pathway running through a mediator variable, trust in the regime, in this case. The causal story to be reported by the investigators is, therefore: more positive evaluations of the government’s economic performance increase trust in the government, and greater trust in government increases the likelihood that a citizen will vote. The dotted arrow leading from evaluation of the government’s economic performance to likelihood of voting is intended to show that this relationship was significant in a bivariate analysis but ceased to be significant in a multivariate analysis that included trust in the governing regime, also sometimes called political trust.

Fig. 4.6 Multivariate analysis finds only an indirect relationship

One way, and the most straightforward way, to test for the indirect effects shown in Figs. 4.5 and 4.6 begins with running three different bivariate regressions, one for each of the three relationships between X, Y, and M taken two at a time. There will be the possibility of a direct relationship if the X–Y connection is statistically significant, and there will be the possibility of an indirect relationship if the X–M connection and the M–Y connection are also both statistically significant.

The researcher will then proceed to multivariate analysis to see if these relationships remain significant in models that include all three variables—and also any relevant control variables. If all three two-variable relationships remain significant in the multivariate analysis, as shown in Fig. 4.5, the independent variable, X, will have been shown to have both a direct and an indirect effect on the dependent variable, Y.

Figure 4.6 depicts an alternative possibility: that there is an indirect relationship between the independent variable and the dependent variable but there is not a direct relationship between the two variables. This is the case if:

  • the two relationships in Fig. 4.5 involving the mediating variable (M), trust in the governing regime, remain significant; and

  • the relationship between evaluation of the government’s economic performance and likelihood of voting, the X–Y relationship shown in Figs. 4.4 and 4.5, ceases to be significant when the multivariate analysis includes M.
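The two-step procedure described above can be sketched in code. The following is a minimal illustration only: it uses simulated data rather than survey data, the variable names and effect sizes are assumptions made for the example, no control variables are included, and significance is judged with the conventional large-sample cutoff of 1.96 for the t-statistic. The data are constructed so that X affects Y only through M, which is the situation depicted in Fig. 4.6.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    # Augment the matrix with the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        # Partial pivoting for numerical stability.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ols(y, *predictors):
    """OLS of y on the predictors plus an intercept.

    Returns (coefficients, t_statistics); index 0 is the intercept.
    """
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(beta[i] * cols[i][r] for i in range(k))
             for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t = [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, t

# Simulated data (NOT survey data): X influences Y only through M.
random.seed(0)
n_cases = 2000
x = [random.gauss(0, 1) for _ in range(n_cases)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [0.5 * mi + random.gauss(0, 1) for mi in m]

# Step 1: three bivariate regressions, one per pair of variables.
_, t_xy = ols(y, x)
_, t_xm = ols(m, x)
_, t_my = ols(y, m)

# Step 2: multivariate regression of Y on X and M together.
b_full, t_full = ols(y, x, m)

CUTOFF = 1.96  # conventional large-sample threshold for p < 0.05
direct = abs(t_full[1]) > CUTOFF
indirect = abs(t_xm[1]) > CUTOFF and abs(t_full[2]) > CUTOFF

print("bivariate t-statistics  X-Y, X-M, M-Y:",
      round(t_xy[1], 2), round(t_xm[1], 2), round(t_my[1], 2))
print("multivariate t-statistics  X, M:",
      round(t_full[1], 2), round(t_full[2], 2))
print("direct effect supported:", direct)
print("indirect effect supported:", indirect)
```

With data built this way, the X coefficient would be expected to shrink toward zero once M enters the model, while the M coefficient remains large. In an actual study, the same comparisons would be made with relevant control variables added to each model.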

As noted, this discussion of “Other Third Variables” seeks to introduce some of the ways that the addition of variables can produce instructive findings that might otherwise have been missed. Multivariate analysis makes it possible to test hypothesized bivariate relationships with control variables included in the analysis, and this can very significantly strengthen the case for causal inference. Beyond this, however, are many other ways in which a more sophisticated and nuanced, and hence more informative, causal story can be proposed and evaluated. Considering indirect as well as direct relationships is one such possibility. Moreover, indirect variable relationships can be proposed when hypotheses are formulated. In other words, unanticipated findings that result from data analysis are not the only way that attention might be called to such relationships. Indirect variable relationships may also, when relevant, deserve attention in the theorizing that precedes data analysis.

4.2.3 Disaggregation/Conditional Effects

Disaggregation refers to the process of separating something into its component parts. In social science research, an investigator may find it useful to consider disaggregation with respect to the population, or sample, of the units on which she has data. She may also find it useful to consider the disaggregation of the more abstract concepts, or the indicators of these concepts, that are important parts of the causal story she seeks to evaluate. Accordingly, the purposes for which an investigator may wish to disaggregate elements of a research project include more nuanced insights and greater precision in specifying the cases to which key findings apply. Disaggregation may also be undertaken to capture the dimensionality of key concepts and of associated variable relationships.

The line plot in Fig. 4.7 is based on Arab Barometer Waves 3, 4, and 5 surveys in Tunisia, which were carried out, respectively, in 2013, 2016, and 2018. The distribution based on all respondents (in orange) that is plotted over the three time periods shows the percentage of those who agreed or agreed strongly with a sentence stating that a university education is more important for men than for women. The line plot suggests questions that might be instructive to explore. Why, for example, did the share of individuals who agreed with a proposition that is inconsistent with gender equality decrease in 2016 and increase in 2018?

But while this and other questions raised by the line plot might deserve attention, the chart is presented here as a very simple example of disaggregation. The line plot shows that Tunisian men are significantly and consistently more likely than Tunisian women to agree with a proposition that favors men over women in university education. This is not the view of most men. Even at its highest level, in 2018, only 25 percent of Tunisian men expressed this view. Nevertheless, men in this instance are always less supportive of gender equality than are women, and this important finding would have been missed if men and women had not been analyzed separately, if there had not been disaggregation with respect to sex. The relationships that result from disaggregation are sometimes called conditional effects, with sex in this case being the conditioning variable.

This particular example is perhaps too simple; if support for gender equality were either the dependent variable or an independent variable, it is very likely that the analysis would at some point have compared the attitudes of men and women, perhaps by including sex as a control variable and thus making disaggregation superfluous. But the principle remains relevant and important: Distributions and relationships may appear one way for some subsets but not the same way for other subsets of the population or sample on which a researcher has data.

Fig. 4.7 Percent of Tunisians in 2013, 2016 and 2018 surveys agreeing that university education is more important for men
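The kind of disaggregation shown in the line plot can be sketched as follows. The records below are made up for illustration and are not Arab Barometer data; the helper names are likewise hypothetical. The sketch computes the percentage agreeing with a statement for all respondents in each survey year and then separately for men and women.

```python
from collections import defaultdict

def make(year, sex, n_agree, n_total):
    """Build made-up records of the form (year, sex, agrees)."""
    return ([(year, sex, True)] * n_agree
            + [(year, sex, False)] * (n_total - n_agree))

# Invented counts, for illustration only.
records = (make(2013, "male", 4, 20) + make(2013, "female", 2, 20)
           + make(2016, "male", 3, 20) + make(2016, "female", 1, 20))

def percent_agree(rows, by):
    """Percent agreeing within each group; `by` maps a record to a group key."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number agreeing, total]
    for row in rows:
        key = by(row)
        counts[key][0] += row[2]  # True counts as 1, False as 0
        counts[key][1] += 1
    return {key: 100.0 * agree / total
            for key, (agree, total) in counts.items()}

overall = percent_agree(records, by=lambda r: r[0])           # by year only
by_sex = percent_agree(records, by=lambda r: (r[0], r[1]))    # by year and sex

for key in sorted(overall):
    print(key, "all respondents:", overall[key])
for key in sorted(by_sex):
    print(key, by_sex[key])
```

In these invented data, the overall percentage for each year sits between the percentages for men and women, so the difference between the sexes is visible only once the sample is disaggregated.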

The following example also uses individual-level survey data, and the dependent variable is again views about gender equality. In a study published in 2017, the authors analyzed a dataset constructed from one or more surveys in 15 different Arab countries [Footnote 9], and they tested hypotheses that posit religious, economic, and political factors as determinants of the variance associated with attitudes toward gender equality. The measures of some variables, including attitudes toward gender equality and personal religiosity, are indices based on a number of items in the survey instruments. Two of the hypotheses, one pertaining to religiosity and one pertaining to economic circumstance, are shown below. The table that follows shows the findings about each hypothesis, first for all respondents and then for subsets of respondents disaggregated on the basis of gender, age, and education taken together.

  • H1. Individuals who are more religious are less likely than individuals who are less religious to support gender equality.

  • H2. Individuals in more favorable economic circumstances are more likely than individuals in less favorable economic circumstances to support gender equality.

The table below, Table 4.4, shows that among all respondents there is, as hypothesized, (1) a significant and inverse relationship between personal religiosity and support for gender equality; and (2) a significant and positive relationship between more favorable economic circumstance and support for gender equality. It also shows, however, that there is more to be learned from disaggregation. The hypotheses are tested for subsets of respondents grouped according to sex, age, and level of education taken together, and the table shows that in only some of these demographic categories are the findings the same as those based on all respondents. For example, the hypothesized inverse relationship between religiosity and support for gender equality is confirmed when the analysis is based on all respondents, but in fact religiosity does not have explanatory power among younger men and younger women.

A fuller discussion of the nature and implications of these findings is beyond the scope of the present discussion. Nevertheless, it will be clear that the study would have reported findings that are at best incomplete had the authors not disaggregated their respondents. Rather than reporting only that religiosity bears a statistically significant and inverse relationship to support for gender equality among citizens of the countries from which data have been collected, and that economic circumstance bears a significant and positive relationship to support for gender equality among the same population, the investigators are able, having disaggregated their respondents on potentially important demographic variables, to specify the characteristics of the respondents to whom these conclusions do and do not apply. This permits the investigators to offer better insights and present a much richer and more fine-grained causal story about some of the determinants of support for gender equality.
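A conditional effect of this kind can be illustrated with a short sketch. The scores below are invented for the purpose and are not taken from the study; the group labels are hypothetical. The sketch computes the bivariate regression slope of support for gender equality on religiosity, first for all cases pooled together and then within each subgroup. It reports slopes only; an actual analysis would also test statistical significance and include control variables.

```python
from statistics import fmean

def slope(xs, ys):
    """Least-squares slope of ys on xs (the bivariate regression coefficient)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented scores, for illustration only:
# group -> (religiosity scores, support-for-equality scores)
groups = {
    "older": ([1, 2, 3, 4, 5], [5, 4, 4, 2, 1]),
    "younger": ([1, 2, 3, 4, 5], [3, 4, 3, 4, 3]),
}

# Pool all cases, ignoring group membership.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]

pooled = slope(all_x, all_y)
by_group = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

print("pooled slope:", round(pooled, 3))
for name, value in by_group.items():
    print(name, "slope:", round(value, 3))
```

In these invented data the pooled slope is negative, but it is produced entirely by the "older" group; within the "younger" group the slope is zero. A report based only on the pooled slope would therefore overstate how widely the relationship applies.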

Table 4.4 Impact of personal religiosity and of economic circumstance on support for gender equality among respondents grouped by sex, age, and education

As stated previously, a researcher may find it useful to consider disaggregation with respect to the population, or sample, of the units on which she has data. This does not mean that the units are always individuals, however, or that variables on which there is disaggregation are always demographic attributes. On the contrary, the potential utility of disaggregation is not limited to research in which the individual is the unit of analysis. There are numerous studies that analyze data on a different unit of analysis and report that their analyses and findings have been enriched by disaggregation. A few diverse and randomly selected examples are below:

  • In a study of determinants of household poverty in Egypt, household was the unit of analysis. The attributes of the household considered included whether the head of the household was male or female, whether this person was in or not in the labor force, and if in the labor force, in what sector did the household head work. Most employed female heads were “blue collar workers,” and to achieve a more fine-grained analysis, the investigator disaggregated this category by sector, including agriculture, fishing, service, and other [Footnote 10].

  • In a study of the foreign policies of Arab countries, the unit of analysis is foreign policy output. The authors, two senior Egyptian political scientists, write that their study includes “disaggregation of foreign policy output into its relevant components: the actor’s general objectives, orientation, or strategy and specific foreign policy behavior. The breaking down of foreign policy output into general objectives and concrete behavior draws attention to important questions for both the empirical analysis of foreign policy and theory-building.” [Footnote 11]

  • A study of “Women’s Empowerment and Political Voice” in Morocco considered many criteria, ranging from women in parliament to the Moroccan family code. Another important variable is public spending, which is the unit of analysis for this part of the research. The authors praised the recent introduction of “gender-responsive budgeting,” with allocations and expenditures disaggregated by gender. They also complained that “inadequate disaggregated data on women’s social and economic status limits the extent to which the state can be held to account.” [Footnote 12]

Finally, disaggregation is also potentially useful in measurement, especially the measurement of more abstract or multidimensional concepts. A good general example is the United Nations’ Human Development Index (HDI). Created as an alternative to the economic indices that are frequently used to measure a country’s level of development, the HDI seeks to measure the well-being of the ordinary citizens of a country. Toward this end, the index is composed of indicators pertaining to health, education, and standard of living, the latter measured by gross national income (GNI) per capita. There is agreement that the HDI measures something important that is not captured when economic indices such as per capita gross domestic product or per capita national income are used on their own. But while the HDI is frequently used, there are instances when it is useful to disaggregate the index and consider separately the explanatory power of one or two of its component indicators.

Personal religiosity, in Arab countries as elsewhere, is often measured by a composite index that includes behavior, such as prayer and reading religious books; belief in God and in the religion’s central articles of faith; and other actions, such as preferring to consult religious officials to discuss personal problems. Construction of a scale based on all or at least some of these different indicators, perhaps by factor analysis or another scaling technique, provides a measure of personal religiosity that is more complete and sometimes more useful. But in some instances, too, it may be more instructive to disaggregate the composite measure and consider relationships in which only religious action or only religious belief is a variable.

Table 4.5 presents findings from a research project that used survey data from Algeria and Morocco to explore the relationship between attitudes toward political Islam and attitudes toward democracy [Footnote 13]. Support for political Islam is defined in this study as a belief that Islam and politics should not be separated and that the religion should play an important role in the governance of the respondent’s country. Based on ordinary least squares regression analysis, the table presents standardized coefficients and gives t-statistics in parentheses. (t-statistics are an alternative measure of confidence in the coefficient estimates, similar and mathematically related to p-values.)
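The relationship between t-statistics and p-values can be made concrete with a short sketch. It relies on an assumption that should be stated plainly: the sample is large enough that the t-distribution is well approximated by the standard normal distribution. For small samples, an exact calculation would use the t-distribution with the appropriate degrees of freedom. The function name is hypothetical, and the t values printed are arbitrary examples, not values from Table 4.5.

```python
import math

def two_sided_p_from_t(t_stat):
    """Approximate two-sided p-value for a t-statistic.

    Uses the standard normal approximation, which is an assumption that is
    reasonable only when the sample is large.
    """
    # For a standard normal Z, P(|Z| > |t|) equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2))

# Arbitrary example values of the t-statistic.
for t in (1.0, 1.96, 2.58, 3.29):
    print(f"t = {t:4.2f}  ->  approximate p = {two_sided_p_from_t(t):.4f}")
```

The larger the absolute value of the t-statistic, the smaller the p-value. Under this approximation, an absolute t-statistic of about 1.96 corresponds to the familiar two-sided threshold of p = 0.05.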

The table illustrates both measurement disaggregation and unit of analysis disaggregation in the analysis of both the Algerian data and the Moroccan data. More specifically, there is a composite measure of attitudes toward political Islam that has been disaggregated, with its political and economic dimensions considered separately; and the sex of respondents has also been disaggregated, with men and women considered separately.

The findings in Table 4.5 are the same for both Algeria and Morocco, and they are instructive findings that would have been missed had there been no disaggregation. There is a statistically significant inverse relationship between a favorable attitude toward democracy and a favorable attitude toward political Islam when the composite measure of attitudes toward political Islam is employed and the full sample of respondents is included in the analysis. The findings differ with disaggregation, however. It turns out that this significant inverse relationship only reflects the positive attitudes held by women toward the economic and commercial dimensions of governance that might be guided by Islam [Footnote 14]. Had there been no disaggregation, the findings reported would have at best been incomplete and, in fact, somewhat misleading.

Table 4.5 Multiple regression showing the influence of attitudes toward political Islam on attitudes toward democracy

4.2.4 Scope Conditions

Scope conditions refer to the subset of cases, defined in terms of their most relevant attributes, to which a theory applies.

In positivist and empirical social science research, as discussed here, the goal of a research project is very often a causal story, that is to say a set of confirmed causal relationships and interrelationships. This is also frequently called a theory. Scope conditions are the characteristics, or parameters, that specify and describe in terms of concepts and variables the circumstances in which this theory is believed to apply. These concepts and variables are the conditionalities.

Scope conditions may be specified by an investigator prior to data collection and data analysis, in effect making them part of the causal story. In this case, the analysis will subsequently offer evidence about the degree to which the researcher’s specification is correct. Alternatively, scope conditions may not receive serious attention until the research project’s findings have become clear. In this case, the investigator will reflect on the attributes of the case or setting that has or has not lent support to her causal story and designate the case or setting attributes that she believes constitute appropriate scope conditions.

In many and probably most studies, an investigator will give attention to scope conditions both before the study has been conducted and after its findings are clear. Initially, her selection of the case or cases to be studied will almost certainly be based on her ideas about the conditions under which her hypotheses will have explanatory power. Subsequently, her findings will provide evidence about the accuracy of these ideas and a basis for thinking further about scope conditions. It is possible that some conditionalities will have been clarified and confirmed, that some will have been shown to be incorrect, and/or that some may be instructive but will need to be revised and refined.

Attention to scope conditions reflects the cumulative character of scientific research, including social scientific research. It recognizes that the work of an individual researcher or research project has increased value to the extent it contributes to the work of a community of investigators seeking to answer the same or very similar questions. This might not be the case if the objective of an investigation is to provide only thick description of a particular place and time. In fact, however, researchers usually aspire to identify generalizable insights, that is to say causal stories that have explanatory power in cases beyond those studied by any one investigator. To do this, except in the very unlikely event that the focus is on the rare causal stories that are believed to be universal and to apply in all times and places, the path toward cumulativeness lies with scope conditions. It is to say more than that the causal story sometimes applies and sometimes does not apply. It is to say, or to contribute to the research community’s ability to say, that the causal story is disproportionately likely to apply in cases or settings that are characterized by specific attributes.

Cumulativeness also signifies that the identification of scope conditions is an ongoing process. Individual researchers or research teams undertake to reduce uncertainty and add to what is known about the conditionalities attached to a given explanation of variance—to a given causal story or theory. An investigator recognizes that she cannot offer definitive insights about these conditionalities. She also recognizes, however, that she can and should add to the insights about conditionalities that have been, are being, and will be added by other investigators. And so the contribution of her research, if successful, is not only a causal story that meaningfully accounts for variance, but also a delineation of the most relevant attributes of the case or setting for which she has found this causal story to have a high probability of being true.

A good example of attention to scope conditions comes from an individual-level study of the relationship between observing and participating in elections, the independent variable, and attitudes toward democracy, the dependent variable. Students of democratization argue that elections in non-democracies, particularly elections that are at least somewhat competitive, expose ordinary citizens to democratic principles and procedures and that the experience of electoral participation increases the likelihood that an individual will have a positive view of democracy. This is significant since public support for democracy appears to be necessary for a sustained and consolidated democratic transition.

A study in Algeria examined a modified version of this proposition, hypothesizing that the impact of electoral participation on attitudes toward democracy depends on whether the elections are, or are perceived to be, free and fair. Analyzing data from surveys both before and after a presidential election, the study found that the country’s electoral experience decreased support for democracy among those Algerians who believed that the election was not free and fair. Accordingly, the take-away, at least in its basic formulation, is that if ordinary citizens observe and experience an important election that they judge to be fraudulent and unfair, their support for democracy as a desirable political system will then diminish significantly [Footnote 15].

Turning then to scope conditions, reflection is invited on the attributes of the cases or settings to which this analytical insight has been found to apply. In this instance, Algeria is the case to which a causal story about the effect of elections on attitudes toward democracy has been found to apply. But “Algeria” is not a conditionality. It is rather the relevant attributes of the Algerian case that constitute conditionalities, and their specification is frequently described as replacing proper names with variable names.

So, what are the names of variables that might replace the name “Algeria” in order to specify the conditions under which what was found in Algeria might be found elsewhere? Among the likely scope conditions that specify the applicability of the Algeria study’s findings are that they apply when the country is not democratic and is not actively engaged in a robust democratic transition, when the elections are at the national level and perhaps also are presidential, when the election is competitive to the extent that there are multiple candidates and/or political parties competing for votes, and when there are both candidates and parties aligned with the government and candidates and parties that are not aligned with the government.
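The idea of replacing proper names with variable names can be expressed as a small data structure. The sketch below is illustrative only: the class, the attribute names, and the two example cases are hypothetical, and the conditions simply restate the candidate scope conditions listed above, which remain proposals to be tested by further research.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElectionCase:
    """A case described by variable names rather than by its proper name."""
    label: str
    is_democracy: bool
    in_robust_transition: bool
    national_election: bool
    multiple_competitors: bool
    government_and_opposition_compete: bool

def in_proposed_scope(case):
    """True if the case has all of the candidate scope conditions."""
    return (not case.is_democracy
            and not case.in_robust_transition
            and case.national_election
            and case.multiple_competitors
            and case.government_and_opposition_compete)

# Hypothetical cases, for illustration only.
case_a = ElectionCase("Case A", is_democracy=False, in_robust_transition=False,
                      national_election=True, multiple_competitors=True,
                      government_and_opposition_compete=True)
case_b = ElectionCase("Case B", is_democracy=True, in_robust_transition=False,
                      national_election=True, multiple_competitors=True,
                      government_and_opposition_compete=True)

for case in (case_a, case_b):
    print(case.label, "within proposed scope:", in_proposed_scope(case))
```

Writing the conditions as a conjunction treats every attribute as necessary. Whether all of them are in fact necessary, or only some, is exactly the kind of question that additional research on other cases would need to settle.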

Additional research, by others and perhaps also by the researchers themselves, will be necessary to determine whether these proposed scope conditions actually do specify when what was found in Algeria will be found elsewhere. Additional research on the relationship between elections and attitudes toward democracy will also be necessary to determine whether these are only some of the conditions under which the findings from Algeria apply, and whether all or only some of these particular scope conditions are necessary. That these determinations about scope conditions require additional research reflects the cumulative character of the production of knowledge in social science research.

Another opportunity to think about scope conditions is provided by a study in Tunisia, Algeria, and Morocco that tests hypotheses about the determinants of variance in attitudes toward political Islam [Footnote 16]. One of the several hypotheses that were tested is shown below, and the results of an OLS regression analysis are presented in Table 4.6. The hypothesis was tested at two points in time, 2013 and 2016.

  • H1. Individuals with lower levels of economic satisfaction are more likely than are individuals with higher levels of economic satisfaction to favor a political formula that gives Islam an important role.

Table 4.6 OLS Regression coefficients showing the influence of personal economic circumstance on attitudes toward political Islam among all respondents and among respondents grouped by country and year

The findings in Table 4.6 pertaining to H1 are straightforward. The hypothesis posited that higher levels of economic dissatisfaction push toward support for political Islam, and this was confirmed only in Morocco, and in Morocco for both 2013 and 2016. More research on the individual-level relationship between economic circumstance and attitudes toward political Islam will be needed before conclusions about scope conditions can be drawn with any degree of confidence. But the findings presented in Table 4.6 do contribute to this ongoing and cumulative enterprise. First, more often than not, economic circumstances do not have an impact on attitudes toward political Islam, and so it appears that H1 does not posit a relationship that is broadly applicable.

Second, in certain circumstances, that is to say under particular conditions, the relationship proposed in H1 does apply. And in the case being considered, the scope conditions are likely to be political and economic attributes of Morocco that are at least somewhat stable over time.

The place of Islamist parties and movements in Morocco points to what may be a conditionality. Extremist and anti-regime Islamist movements have been marginalized in Morocco. But the most important Islamist movement, the Party of Justice and Development, not only operates in the mainstream of Moroccan political life, it has in fact been victorious in elections and has led the government during the 2010s. Accordingly, a political situation that gives an Islamist party considerable influence may be an important conditionality.

Economic circumstances may also be a relevant conditionality. Given that Morocco is one of the poorest Arab countries, a likely scope condition is also that a significant proportion of a country’s population is living in poverty. Both of these conditionalities, attributes that characterized Morocco but not Tunisia or Algeria at the time of the research, are discussed in the publication from which this example is taken. What distinguishes Morocco and may constitute scope conditions favorable to the existence of the hypothesized causal relationship probably lies in “the interaction between Morocco’s overall relative and absolute poverty and its experience with the Justice and Development Party.”

Exercise 4.5. Thinking about scope conditions in the Maghreb

Table 4.2 presents the results of an innovative study in Morocco and Algeria that tested hypotheses about determinants of variance in the constituent service work of parliamentary deputies. It found that female deputies are more likely to serve female and less influential constituents than are male deputies, and that quota-elected female deputies are more likely to serve female and less influential constituents than are non-quota-elected female deputies. But it found these relationships only in Morocco. These or very similar patterns were not found in Algeria.

  • What might be scope conditions in this case? Offer your thoughts about the conditionalities that determine when this finding is disproportionately likely to be found elsewhere.

The relationships shown in Table 4.5 offer another opportunity to think about scope conditions. The study is based on surveys in Algeria and Morocco, and attitude toward democracy is the dependent variable. Attitude toward political Islam is the independent variable. Interestingly and quite significantly, in both countries, attitudes toward one and only one of the two dimensions of political Islam were found to have explanatory power, and this was found to be the case only for women. This is a somewhat particular and unusual finding, reflecting both unit of analysis disaggregation and measurement disaggregation. To identify scope conditions, an investigator must consider whether there are attributes of both Algeria and Morocco, or of the situation of the two countries with respect to political Islam and to women, that may specify the conditions under which the same or very similar variable relationships will be found elsewhere.

  • What might be scope conditions in this case? Offer your thoughts about the conditionalities that determine when this particular and somewhat unusual finding is disproportionately likely to be found elsewhere.

Table 4.5 also shows that personal religiosity has explanatory power among women but not men in Algeria and does not have explanatory power among either sex in Morocco. For women in Algeria, greater personal religiosity pushes toward unfavorable attitudes toward democracy.

  • What might be scope conditions in this case? Offer your thoughts about the conditionalities that determine when this finding is disproportionately likely to be found elsewhere.

4.2.5 Experiments

Although controlling for potentially confounding variables helps to establish that a relationship is not spurious, there are often variables that a researcher cannot control due to measurement limitations or the absence of relevant data. In addition, there may be variables that could produce a spurious relationship that an investigator did not think to include in the causal story she proposes to test and, therefore, are not included as control variables in her hypothesis-testing analyses.

Experiments offer an alternative approach to controlling sources of extraneous variance and, thereby, reducing the chance of making a Type I error or a Type II error. Experiments also significantly strengthen the basis not only for concluding that an observed variable relationship very likely represents accurately the population of cases of which it is a subset, but also for establishing that the relationship is very likely to be causal. Accordingly, when appropriate given the hypothesis and causal story to be evaluated, experiments offer a powerful methodology for generating and analyzing data.

Experiments have traditionally been used more frequently in some social science disciplines than others. They have been conducted most frequently in psychology and educational psychology. Although not entirely absent, experimental research designs have been less common in political science and, to some extent, sociology. In recent years, however, the conduct of experiments has become much more common in the latter disciplines, and it has also become common to include an experiment as one element of a multi-method research design. Thus, although a thorough discussion of experiments is beyond the scope of this guide, political scientists and researchers in other social science disciplines should be familiar with at least the basic elements of experiments. A short overview of these elements, along with a few examples, is presented here for this purpose.

The basic structure of an experiment is simple and straightforward. To begin, an investigator randomly assigns the units of analysis on which she has data and that she plans to use in the experiment to two or more groups. One group will be designated the control group and it will not be subject to the treatment, or treatments, associated with the experiment. A second group will be exposed to the experimental treatment, and if the experiment involves more than one treatment, there will be more than one treatment group. After the treatment group or groups have received the treatment, the researcher compares the measures of the dependent variable in the control group and the treatment group(s) in order to estimate the effect of the treatment.
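The logic of random assignment followed by a comparison of groups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 1,000 simulated units, and the built-in "true" effect of 0.5 are all hypothetical and are not drawn from any study discussed in this chapter.

```python
import random
from statistics import mean

def assign_groups(unit_ids, group_names, seed=0):
    """Randomly assign units to groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the groups in turn.
    return {unit: group_names[i % len(group_names)]
            for i, unit in enumerate(shuffled)}

def effect_estimates(outcomes, assignment, control="control"):
    """Mean outcome in each treatment group minus the control group mean."""
    by_group = {}
    for unit, group in assignment.items():
        by_group.setdefault(group, []).append(outcomes[unit])
    control_mean = mean(by_group[control])
    return {group: mean(values) - control_mean
            for group, values in by_group.items() if group != control}

# Hypothetical data: 1,000 units and a simulated dependent variable.
units = range(1000)
assignment = assign_groups(units, ["control", "treatment"], seed=42)

noise = random.Random(7)
TRUE_EFFECT = 0.5  # assumption built into the simulation, not a real finding
outcomes = {
    u: noise.gauss(0, 1) + (TRUE_EFFECT if assignment[u] == "treatment" else 0.0)
    for u in units
}

estimates = effect_estimates(outcomes, assignment)
print(estimates)
```

Because assignment is random, the estimated difference should fall close to the simulated effect. With more than one treatment, additional names can be added to the list passed to `assign_groups`.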

Survey Experiments

An example of a survey experiment is provided by a study pertaining to attitudes toward the Islamic State (Daesh) that was embedded in the 2016 Wave 4 Arab Barometer surveys. With attitudes about Daesh the dependent variables, the purpose of the experiment was to determine whether receiving different kinds of information about the goals and tactics of the Islamic State affected the attitudes toward the terrorist group held by ordinary citizens in the Arab world. In this experiment, as shown in Table 4.7, there were four treatments. One treatment called attention to the group’s claim to be establishing a caliphate and its use of violence against both non-Muslims and Muslims in pursuit of this objective. Each of the three remaining treatment groups received the same information about the pursuit of a caliphate and the use of violence, and then received additional information about one of the goals that was espoused by the Islamic State and emphasized in the group’s social media messaging.

Table 4.7 Control group and treatment groups in an Arab Barometer Wave 4 experiment on the influence of information about the Islamic State (Daesh) on attitudes toward the terrorist group

As noted, it is essential that group assignments be random. This assures that the groups, five in this example, are comparable with respect to anything other than the treatments that might affect the attitudes of the respondents. To measure their attitudes, respondents were asked to indicate their agreement or disagreement with a number of statements, three of which are listed below. These statements were presented to respondents after the experimental treatments had been introduced, and the influence of the information provided by each treatment was measured by comparing the responses of individuals in each treatment group to the responses of individuals in the control group. As long as assignments to the control group and the four treatment groups are random, the groups are almost certainly comparable with respect to other possible determinants of attitudes. And with other possible determinants thus held constant, control group-treatment group attitudinal differences can be attributed to the explanatory power of the treatment, rather than to any confounding variable, with a low likelihood of error.

  • To what extent do you agree with the goals of the Islamic State? (3.1 percent of the individuals in the control group agree or somewhat agree)

  • To what extent do you agree with the Islamic State’s use of violence? (2.9 percent of the individuals in the control group agree or somewhat agree)

  • To what extent do you agree that the Islamic State’s tactics are compatible with the teachings of Islam? (5.3 percent of the individuals in the control group agree or somewhat agree)

A fuller account of this experiment is beyond the purview of the present discussion (Footnote 17). A few points may nonetheless be briefly noted. First, the proportion of respondents expressing even somewhat positive attitudes toward the Islamic State is very low. The percentage agreeing to a large extent or to some extent is given in the parentheses after each statement. Second, the impact of the experimental treatments sometimes does but sometimes does not push the percentage of individuals with positive attitudes even lower, and the impact of some treatments varies from country to country.

Finally, some of the most instructive findings emerge when control group and treatment group comparisons are made for subsets of respondents, rather than for all respondents. This involves disaggregation, as discussed earlier in this chapter. Figure 4.8 offers an example and further illustrates the use and utility of experiments. It considers the attitudes of younger and less well educated men, a prime target of Islamic State recruitment efforts, and it shows that the proportion of individuals with positive attitudes toward the goals of the Islamic State is somewhat higher among those in the control group but significantly lower among those in each of the treatment groups. It is also lower in some treatment groups than in others.

Fig. 4.8 Support for Daesh’s goals by treatment among younger less educated men

Another interesting example of an experiment embedded in a survey addressed determinants of attitudes toward gender equality. The survey was conducted in Egypt in 2013, and the dependent variable was the views of ordinary citizens toward women’s roles in public and political life (Footnote 18). The specific question to be answered by the experiment was whether support for female political leadership would increase among individuals, Egyptians in this case, if they were exposed to arguments in favor of women’s political equality that were grounded in the Qur’ān, Islam’s holiest text. The authors write in this study that they draw on recent scholarship on religion and politics, including work by some who describe themselves as “Islamic feminists,” to hypothesize that such exposure would increase support for women’s political equality.

There were two treatment groups and a control group in the experiment. Respondents in one treatment group were given a statement in support of gender equality based on Islamic sources. Respondents in the other treatment group were given a statement in support of gender equality based on scientific research. Both statements are shown below.

  • Treatment 1. Some say that there is no problem if a woman assumes a position of authority, such as the presidency of the republic or the prime ministership. And they rely on a verse from Sūrat al-Tawba (Chapter of Atonement) in the Holy Qur’ān that says, “Believing men and believing women are protectors of one another.” And they interpret it to mean that God does not distinguish between men and women in their capabilities.

  • Treatment 2. Some say that there is no problem if a woman assumes a position of authority, such as the presidency of the republic or the prime ministership. And they rely on the results of numerous scientific studies. For example, in 2010, a group of leading scholars completed a study that showed that women and men have the same leadership capabilities.

Following the treatments, the survey continued and respondents were asked to respond to the item shown below. To test their hypothesis about the impact of religious discourse on attitudes toward political leadership by women, the authors compared the post-treatment attitudes of respondents who received Treatment 1 to the attitudes of respondents in the control group. The post-treatment attitudes of respondents who received Treatment 2 were similarly compared to the attitudes of respondents in the control group. The differences between each treatment group and the control group were then compared to see not only whether the religious treatment made a difference but also whether it made more of a difference than the non-religious treatment.

Between the following two opinions, which one is closer to your personal opinion?

  (a) It is not good for a woman to assume a position of authority, such as the presidency of the republic or the prime ministership, or

  (b) There is no problem if a woman assumes a position of authority, such as the presidency of the republic or the prime ministership.

The findings of the experiment supported the authors’ hypothesis that religious discourse can be used to make inroads against conservative attitudes. Among respondents in the control group, 32.6 percent chose the statement that there is no problem if a woman assumes a position of authority. Among respondents in the treatment group receiving a statement in support of female political leadership based on scientific research, 34.3 percent agreed that there is no problem if a woman assumes a position of authority. This is only slightly higher than the percentage of respondents in the control group who took this position, a difference that is not statistically significant. By contrast, this position was taken by 40.5 percent of the respondents in the treatment group given a religious basis for gender equality. Both the difference between this treatment group and the control group and the difference between this group and the other treatment group were statistically significant, the former at the .01 level and the latter at the .05 level of significance.
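One common way to assess whether differences of this kind are statistically significant is a two-proportion z-test. The sketch below applies it to the three percentages reported above. It is an illustration only: the text does not report group sizes, so the value of `N_PER_GROUP` is a purely hypothetical assumption, and the test shown here is not necessarily the one the study's authors used.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: hypothetical group size; the actual sizes are not given in the text.
N_PER_GROUP = 800

# Shares choosing "no problem if a woman assumes a position of authority".
control, science, religious = 0.326, 0.343, 0.405

comparisons = {
    "religious vs. control": two_proportion_z_test(
        religious, N_PER_GROUP, control, N_PER_GROUP),
    "science vs. control": two_proportion_z_test(
        science, N_PER_GROUP, control, N_PER_GROUP),
    "religious vs. science": two_proportion_z_test(
        religious, N_PER_GROUP, science, N_PER_GROUP),
}

for label, (z, p) in comparisons.items():
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

The resulting p-values depend heavily on the assumed group size, so they should be read as an illustration of the procedure rather than as a reproduction of the study's results.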

Conjoint Experiments

The two experiments described above illustrate what are sometimes called “discrete” experiments or “unidimensional” experiments. This refers to the fact that questions measuring the dependent variable are asked and answered one at a time. Participants are not asked to consider the interaction between the subjects about which different questions ask.

In conjoint experiments, sometimes described as “multi-dimensional” experiments, participants are asked to respond to questions that propose a number of alternatives based on two or more variables taken in combination. This allows the investigator to assess the impact of experimental treatments on concepts that are complex and multidimensional.

An example of a conjoint experiment is provided by a study in Tunisia of the way that voters evaluate candidates running for office based on their gender and their religiosity taken together (Footnote 19). Respondents in a nationally representative survey conducted in 2012 were randomly assigned to one of two treatment groups. Both groups were shown two pictures of possible candidates for office and asked to indicate for each whether they definitely would, probably would, probably would not, or definitely would not vote for the person in the picture. In one treatment group, respondents were shown pictures of a secular man and a secular woman. In a second treatment group, respondents were shown pictures of a religious man and a religious woman, with dress and appearance indicating religiosity. In both cases, the researchers were very careful to make the pictures clear and believable and otherwise comparable.

Examining support for men and women with religiosity taken into consideration, the authors found different patterns in the two treatment groups. In the second treatment group, those who gave a high score to the religious male candidate gave an even higher score to the religious female candidate. In the treatment group with secular candidates, those who gave a high score to the secular male candidate did not give a higher score, and sometimes gave a lower score, to the female secular candidate. The larger theoretical goal of the study was, first, to identify demographic and ideological factors that account for variance across the four candidate preference types; and second, to use the profile of supporters of each candidate type to assess the relevance and explanatory power of three theoretical frameworks: modernization theory, role contiguity theory, and social identity theory.

Natural Experiments and Matching

Natural experiments are very similar in design to other experiments in that they rely on randomization to ensure that treatment and control groups are comparable. The main difference is that in natural experiments, the randomization of the treatment occurs “naturally” in the world, instead of being done by the researcher herself. Matching may be used in instances where assignment to the control and treatment groups is not based on randomization.

Suppose, for example, that the Ministry of Education in your country has been planning to develop a new set of high school textbooks that present a new interpretation of important events in your country’s history. However, the ministry does not have enough copies of the new textbooks for all of the schools in the country, and so it decides for this reason to select a subset of high schools to receive and use the new textbooks. This creates the foundation for a natural experiment. Schools in which the old textbook continues to be used constitute, in effect, the control group of an experiment, and schools using the new textbook constitute, in effect, an experimental treatment group. An investigator could use this opportunity to conduct a natural experiment by surveying students in schools that did and in schools that did not receive the new textbooks, and she could then compare the students in the two groups to see whether and how using the new textbooks affected the knowledge, attitudes, and/or behavior of the students.

In this example, the way that the ministry selected the schools that received the new textbooks would have important implications for the experiment. In order to draw conclusions about whether and how the new textbooks affect student knowledge, attitudes, and behavior, the control group and the treatment group must be comparable. In other words, factors other than the textbook used must be held constant and thus be deprived of any possible explanatory power. If use of the new textbooks is the only difference between the control and treatment schools, any difference in knowledge, attitudes, or behavior, which are the dependent variables, can be attributed to the textbooks rather than to a confounding variable.

Randomization in assigning schools to either the old textbook control group or the new textbook treatment group is the best way to create groups that are comparable apart from the treatment. As in other types of experiments, randomization in group assignment should be used whenever possible. But randomization is not always possible, especially in natural experiments since group assignments are made in a way and for reasons having nothing to do with an experiment. In this case, it may be possible to use matching to establish comparability between the control group and the treatment group or groups.

Matching uses observational data where the treatment and control groups are not randomly assigned. Based on observable pre-treatment covariates, meaning variables that need to be held constant, each unit of analysis in the treatment group is paired (matched) with a very similar unit of analysis in the control group. These matched pairs are then used to assess the explanatory power of the experimental treatment. Units of analysis that cannot be matched are not included in the analysis. Matching does not provide the same degree of comparability as randomization. Nevertheless, the degree to which matching makes the treatment and control groups very similar, and similar especially with respect to variables that may be sources of extraneous variance, will increase the researcher’s confidence in her findings about the impact of the treatment.
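The pairing procedure described above can be sketched as a simple nearest-neighbor match. This is a minimal illustration, not the method of any particular study: the school records, the covariate names (`size`, `income`), the `score` outcome, and the caliper of 0.5 are all hypothetical, and the covariates are assumed to be already standardized so that distances are comparable.

```python
import math

def nearest_neighbor_match(treated, controls, covariates, caliper):
    """Greedy one-to-one nearest-neighbor matching without replacement.

    Each treated unit is paired with the closest still-unused control unit,
    measured by Euclidean distance on the listed covariates. Treated units
    with no control within the caliper are dropped from the analysis.
    """
    def distance(a, b):
        return math.sqrt(sum((a[c] - b[c]) ** 2 for c in covariates))

    available = list(controls)
    pairs = []
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: distance(t, c))
        if distance(t, best) <= caliper:
            pairs.append((t, best))
            available.remove(best)
    return pairs

# Hypothetical schools; covariates are assumed to be standardized.
treated = [
    {"id": "T1", "size": 0.2, "income": -0.1, "score": 71},
    {"id": "T2", "size": 1.5, "income": 0.9, "score": 80},
    {"id": "T3", "size": -3.0, "income": 2.5, "score": 65},  # no close match
]
controls = [
    {"id": "C1", "size": 0.1, "income": 0.0, "score": 68},
    {"id": "C2", "size": 1.4, "income": 1.0, "score": 77},
    {"id": "C3", "size": -0.5, "income": -0.6, "score": 70},
]

pairs = nearest_neighbor_match(treated, controls, ["size", "income"], caliper=0.5)

# Average within-pair difference on the dependent variable.
effect = sum(t["score"] - c["score"] for t, c in pairs) / len(pairs)
print([(t["id"], c["id"]) for t, c in pairs], effect)
```

Greedy matching is the simplest variant and its results can depend on the order in which treated units are processed. More careful approaches exist, but the basic idea of pairing on pre-treatment covariates and discarding unmatched units is the same.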

An interesting and potentially important finding comes from a natural experiment in the Israeli-occupied West Bank (Footnote 20). Matching was used to establish the treatment group and the control group in this experiment. The groups were two West Bank Palestinian villages that had many similar characteristics and thus matched one another and were broadly comparable.

Among the elements of the occupation that are particularly problematic and oppressive for West Bank Palestinians are the checkpoints through which they must pass when traveling in many areas. Although it is not always the case, Palestinians may be interrogated, searched, or otherwise detained at checkpoints, sometimes for a long time. It is not surprising that checkpoints contribute to anti-Israel attitudes among Palestinians.

In 2009, Israel decided to remove a checkpoint that monitored traffic on an important highway and through which travelers into and out of a large Palestinian village were obliged to pass. To make this the basis for a natural experiment, with removal of the checkpoint as the treatment, the research team identified another village with very similar characteristics that could serve as a control group. The second village was affected by a checkpoint that was not being removed, which qualified its residents as a control group. The research team then interviewed a representative sample of Palestinian residents in each village several months before the checkpoint was removed and then again several months after it had been removed. The survey questions asked about various aspects of the Israeli-Palestinian conflict in general and about the degree of militancy of respondent attitudes in particular.

The data were analyzed using a “difference in differences” model, often called simply dif-in-dif or DID. The dif-in-dif analysis involved measuring the difference in treatment village attitudes before and after removal of the checkpoint, measuring also the difference in control village attitudes before and after removal of the checkpoint in the other village, and then comparing these two measures of difference. To the extent that the two villages are broadly comparable, and to the extent that attitudes changed significantly more among residents of the treatment village than among residents of the control village, the researchers were able to conclude that the presence or absence of a checkpoint is a significant determinant of variance in the attitudes of West Bank Palestinians toward the conflict with Israel.

More specifically, the attitudes about the Israeli-Palestinian conflict of the treatment village residents were less militant and hostile in the post-treatment survey than they had been in the pre-treatment survey. Among control village residents, by contrast, attitudes toward the conflict were actually more militant and hostile in the post-treatment survey than they had been in the pre-treatment survey. Among the study’s conclusions: Checkpoints have a significant effect on West Bank Palestinian attitudes toward their conflict with Israel, and the nature of this effect involves making Palestinians more militant and hostile in their views about the conflict.
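The arithmetic behind a difference-in-differences comparison can be sketched as follows. The numbers are invented solely for illustration: they are hypothetical militancy scores on an assumed 1 to 5 scale and are not the data from the West Bank study.

```python
from statistics import mean

def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change

# Hypothetical militancy scores (higher = more militant), not real data.
treat_pre = [3.4, 3.6, 3.5, 3.5]      # treatment village, before removal
treat_post = [3.0, 3.2, 3.1, 3.1]     # treatment village, after removal
control_pre = [3.5, 3.3, 3.4, 3.4]    # control village, first survey
control_post = [3.6, 3.6, 3.5, 3.7]   # control village, second survey

did = diff_in_diff(treat_pre, treat_post, control_pre, control_post)
print(round(did, 2))
```

In this invented example, the treatment village's average score falls while the control village's average rises, so the difference-in-differences estimate is negative. Subtracting the control village's change removes shifts in attitudes that would have occurred in both villages regardless of the checkpoint, provided the two villages are in fact comparable.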

This chapter has discussed many different types and goals of multivariate analysis, and it has provided a diverse array of examples for purposes of illustration. The discussion in the section on “Causal Inference” is primarily concerned with the requirements for establishing that a bivariate relationship very probably involves causality. Identifying relationships that are causal, that have explanatory power, in other words, is a central preoccupation in social science research. This is not the goal of all social science research. Some studies simply seek to better understand variance and are primarily descriptive; they seek to provide information and insight about what the variance looks like or how it is distributed, not why a variable behaves as it does or with what consequences. And variance itself is not the preoccupation of all social science research.

Nevertheless, formulating and testing theories and hypotheses that purport to account for variance, that seek to discover determinants, that seek to explain, remains one of the most widespread and important dimensions of positivist social science. Multivariate analysis is required to determine whether a variable relationship is indeed causal, or very likely to be, and to inspire confidence in an investigator’s claim that her findings demonstrate causality. The concepts and procedures discussed in the section of this chapter on “Causal Inference” are the elements of a research design to which an investigator must be attentive in order to build a case for causal inference.

The section on “Third Variable Possibilities” discusses some of the important ways that multivariate analysis, meaning the addition of one or more “third” variables, can enrich the sophistication and shed light on the applicability of a causal story. A distinction between direct and indirect relationships, the concept of disaggregation, and attention to scope conditions all contribute to these important goals. This does not exhaust the list of ways that the inclusion of one or more additional or “third” variables can contribute to the refinement of a causal story or to increased understanding of the conditions that specify when the causal story probably does and probably does not have explanatory power. Again, however, the topics covered in the section on “Third Variable Possibilities” are widely used in social science research and constitute powerful methodologies with the potential to significantly enhance the value of any research project.

Numerous real-world examples of social science research projects carried out in the Arab world have been provided to illustrate these points about causal inference and the addition of “third” variables. Interested readers, or perhaps students reading this guide in a classroom setting, might find it useful to consult and discuss some of the original publications that have been cited in this chapter. Or, such readers might find it profitable to seek out and review additional examples of relevant research reports, especially those designed and conducted by Arab social scientists. More examples of research projects designed and carried out by Arab social scientists will contribute to a fuller understanding of the application in Arab environments of the concepts and procedures discussed in this and other chapters.