2.1 Thinking About Research

Why are some people in the Arab region more likely than others to vote in elections? Why do some countries have higher levels of satisfaction with their healthcare systems than others? Why is domestic violence more prevalent in some communities than others? What causes some people but not others to become more politically engaged, or less politically engaged, over time? Every day, we come across phenomena that make us ask how, why, and with what implications they vary across people, countries, communities, and/or time. These phenomena (e.g., voting, satisfaction with health care, domestic violence, and political engagement) are variables, and the variance they express is the foundation and the point of departure for positivist social science research. Accordingly, they are “variables of interest” in research projects that are motivated by this variance and that seek to answer questions of the kind listed above.

Most research projects begin by describing the variance referenced by the variable or variables of interest. This is description, which is the focus of the present chapter. The research project usually then goes on to propose and evaluate hypotheses about factors that account for some of the variance on the variable of interest. This is explanation, which will be the focus of Chaps. 3 and 4.

In many research projects, the concern for explanation, as expressed in hypotheses, offers a causal story about some of the determinants of the variance on the variable of interest. It is a cause-and-effect story in which the variable of interest is the effect. Alternatively, the goal of a research project motivated by the variance on the variable of interest may be to ask not why it varies as it does but rather what difference that variance makes. This is also a cause-and-effect story, but this time the variable of interest is the cause.

For example, if aggregate voter turnout by country is the variable of interest, an investigator might ask and try to answer the question of why citizens of some countries are more likely to vote than are citizens of other countries. One possibility she might consider, offered only as an illustration, is that higher levels of corruption incentivize voting, such that voter turnout will be higher in countries with more corruption. An investigator might also, or instead, be interested in whether voter turnout is itself a determinant of the variance on a particular and presumably important issue. She might advance for testing the proposition that greater voter turnout helps to explain why some countries have more developed and better-quality infrastructure than other countries.

Before an investigator decides on a variable (or variables) of interest and begins a research project, she must consider why the topic is important, and not only to her but also to the broader society and global community. She must also decide on the causal story she will investigate. Will she seek to identify important determinants of the variance on the variable of interest, and then explicate the ways, or mechanisms, by which these determinants exert influences? Or will she choose to consider the variable of interest as a determinant, and then investigate whether, and also how, the variable of interest exerts influence on the variance of other variables?

As stated, the variable of interest has been chosen because the investigator considers it important, and because identifying its relationships with other variables will contribute to a better understanding of political, social, or economic life. The investigator will also want to consider what has been said about the topic in the scholarly literature, and what her investigations have the potential to add to this literature. She will ask whether the topic has for the most part been overlooked, and if not, whether findings from previous research are persuasive or appear to be flawed, or whether there are knowledge gaps that her research can help to fill. By choosing important topics and investigating how and why certain phenomena vary, social science research can make valuable contributions and enrich our knowledge and understanding of societal dynamics.

The variables and variable relationships mentioned above are fictitious, provided only to illustrate that positivist social science research usually begins with the designation of a variable of interest and a description of the way it varies, then proceeds to investigate the nature and direction of its relationships, and very often its causal relationships, with other variables. Of course, designing a research project involves much more than selecting a variable of interest and specifying its relationships to other variables, and this will be the focus of Chaps. 3 and 4. Readers should keep a concern for explanation in mind as they engage with this chapter’s emphasis on description.

2.2 Variance and Variables

2.2.1 The Concept of Variance

Once you have decided on a research topic, or while you are deciding whether a particular topic is of interest, the first objective of every researcher should be to understand how a variable varies. Thus, a central preoccupation of this chapter is with the concept of variance, with discovering and then presenting information about the way that the subject or subjects of interest vary. The chapter focuses, therefore, on univariate analysis, that is to say, variables taken one at a time.

The concept of variance is a foundational building block of a positivist approach to social and political inquiry, an approach that refers to investigations that rely on empirical evidence, or factual knowledge, acquired either through direct observation or measurement based on observable indicators. Positivist social science research does not always limit itself to considering variables one at a time, of course. As discussed briefly in the introductory chapter and as a central preoccupation of Chaps. 3 and 4, discerning relationships and patterns of interaction that connect two or more variables is often the objective of inquiry that begins with taking variables one at a time. Often described as theory-driven evidence-based inquiry, positivist social science research that begins with separate descriptions of variance on relevant phenomena, that is to say on the variables of interest to an investigator, frequently does so in order to establish a base for moving from description to explanation, from discerning and describing how something varies to shedding light on why and/or with what implications it varies.

Although discerning and describing variance may not in these instances be the end product of a social scientific investigation, being rather the point of departure for more complex bivariate and multivariate analyses concerned with determinants, causal stories, and conditionalities, it remains important to be familiar with working with variables one at a time. Relevant to this topic are: sources and methods of collecting data on variables of interest; the development of measures that are valid and reliable and capture the variance of interest to an investigator, sometimes requiring the use of indicators to measure abstract concepts that cannot be directly observed; and the use of statistics and graphs to summarize and display variance in order to parsimoniously communicate findings. These are among the topics to which the present chapter devotes attention.

It must also be added that descriptive analysis, that is to say measuring and reporting on the variance associated with particular concepts or variables, need not always be the first stage in an investigation with multivariate objectives. It can be, and often is, an end in and of itself. When the phenomenon being investigated is important, and when the structure and/or extent of its variance are not already known, either in general or in a particular social or political environment, a description of that variance has significance of its own and can be the end goal of a research project rather than merely the first step in a multi-step investigation.

Finally, while the principal preoccupation of this chapter and the remaining chapters is with what should be done once an investigator has decided on a research question or variables of interest, the first half of this chapter may also be helpful in choosing a research topic. The following sections discuss how to think about and describe variance, not only on the variable(s) of interest but also on other variables that will be included in the research project. Attentiveness to variance, along with the considerations discussed earlier, will help an investigator to think about a research topic and the design of her research project.

2.2.2 Units of Analysis and Variance

Positivist social science research can be conducted with both quantitative and qualitative data and methods, each of which has strengths and limitations. Some topics and questions are best addressed using quantitative data and methods, while others are better suited to qualitative research. Still other researchers utilize both quantitative and qualitative data and methods, often using insights derived from qualitative research to better understand patterns and variable associations that result from the analysis of quantitative data.

This chapter, as well as those that follow, places emphasis on social science research that works with quantitative data. In part, this is because of the volume’s connection to the Arab Barometer and the ready availability of the Barometer’s seven waves of survey data. Nevertheless, the concept of variance also occupies a foundational position in positivist social science research that works with qualitative data. For this reason, we briefly discuss qualitative research later in the chapter and illustrate its value with several examples.Footnote 1 We also discuss what to consider when choosing between different styles of research and types of data.

In this section, we present a few examples from Arab Barometer surveys and other data sources to highlight the importance of describing and understanding the variance of certain phenomena. These examples, which use quantitative data, will also reintroduce the notion of a unit of analysis, which is the entity being studied whose status with respect to the variance is being measured. The unit of analysis in studies based on Arab Barometer data is usually, although not always, the individual. These studies investigate how (and very often also why, as discussed in Chaps. 3 and 4) individuals give different responses to the same survey question. An investigator would in effect be asking, what is the range of ways that individuals, that is to say, respondents, answered a question; and how many, or what proportion, of these respondents answered the question in each of the different ways that it could be answered.
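The two questions an investigator asks of a single survey item, namely which answers were given and what proportion of respondents gave each, can be illustrated with a minimal Python sketch. The responses below are invented for illustration and are not Arab Barometer data; the function name `univariate_distribution` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical answers given by ten respondents to one survey question.
# These values are invented for illustration; they are not real survey data.
answers = [
    "Strongly disagree", "Disagree", "Strongly disagree", "Agree",
    "Strongly disagree", "Disagree", "Strongly disagree", "Strongly agree",
    "Strongly disagree", "Disagree",
]

def univariate_distribution(values):
    """Return {category: (frequency, percentage)} for one variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100.0 * count / n)
            for category, count in counts.items()}

distribution = univariate_distribution(answers)

# Print the frequency and percentage for each answer category,
# most frequent first.
for category, (count, percent) in sorted(
        distribution.items(), key=lambda item: -item[1][0]):
    print(f"{category}: {count} respondents ({percent:.1f} percent)")
```

The unit of analysis here is the individual respondent: each element of the list is one person's answer, and the distribution summarizes how those individuals vary.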

Although still somewhat limited in comparison to most other world regions, the number of systematic and high-quality surveys in Arab countries is growing, as is the number of published studies based on these surveys. For example, a study of “Gender Ideals in Turbulent Times,” published in Comparative Sociology in 2017, used Arab Barometer data to describe the gender-related attitudes of men in Algeria, Egypt, Tunisia, and Yemen. After describing the variance on the attitudes of men in each country, the authors considered the impact of religiosity on this variable of interest and found that the impact of religiosity on attitudes about women varies in instructive ways across the four countries.Footnote 2

Another example, and one with objectives that clearly involve description, uses data from earlier surveys in Kuwait in 1988, 1994, and 1996 to map continuity and change in Kuwaiti social and political attitudes. Led by a team of Kuwaiti, Egyptian, and American scholars, and published in International Sociology in 2007, the study found, among other things, that support for democracy increased over time but attitudes pertaining to the status of women did not change. Consistent with their focus on description, the authors note that their study serves as a “baseline” for later research seeking to take account of differences in Kuwait and other nations.Footnote 3

Country is another commonly used unit of analysis in social science research, including quantitative work. Studies in which country is the unit of analysis might compute a country-level measure by aggregating data on the behavior or attitudes of the individuals who live in that country.Footnote 4 For example, respondents in a nationally representative survey of citizens of voting age might be asked if they voted in a given election, and responses might then be aggregated to develop a country-level measure of voter turnout. In comparing countries for descriptive and/or explanatory purposes, the country-level measure might be a single value based on an average, such as the percent who voted. Or it might involve the comparison of response distributions across the countries included in the study.
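As a rough illustration of this kind of aggregation, the following Python sketch turns individual-level answers to a voting question into a country-level turnout measure. The country names, the records, and the function name `turnout_by_country` are hypothetical, and the sketch ignores survey weights, which an actual analysis of nationally representative data would normally apply.

```python
from collections import defaultdict

# Hypothetical individual-level records: (country, did the respondent vote?).
# Invented for illustration only; not real survey data.
responses = [
    ("Country A", True), ("Country A", False),
    ("Country A", True), ("Country A", True),
    ("Country B", False), ("Country B", False),
    ("Country B", True), ("Country B", False),
]

def turnout_by_country(records):
    """Aggregate individual answers into the percent who voted per country."""
    voted = defaultdict(int)   # respondents who say they voted
    total = defaultdict(int)   # all respondents interviewed
    for country, did_vote in records:
        total[country] += 1
        if did_vote:
            voted[country] += 1
    return {country: 100.0 * voted[country] / total[country]
            for country in total}

turnout = turnout_by_country(responses)
for country, percent in sorted(turnout.items()):
    print(f"{country}: {percent:.1f} percent voted")
```

After aggregation the unit of analysis has changed: the eight individual records have become two country-level values, and it is these values that would be compared across countries.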

Measures of both kinds have been used, for example, in the reports of Arab Barometer findings that have been published in Journal of Democracy after each wave of surveys. In 2012, for instance, JoD published “New Findings on Arabs and Democracy.” The article presented and compared findings about attitudes and understandings related to democracy and about Islam’s role in political affairs in countries included in the first and second wave of Arab Barometer surveys. It reported, for instance, that the percentage of ordinary citizens agreeing that “it would be better if more religious people held public office” varied from a low of 17.6 percent in Lebanon to a high of 61.0 percent in Palestine in the first wave of surveys, and in the second wave, from a low of 14.3 percent in Lebanon to a high of 61.1 percent in Yemen.Footnote 5 Thus, the individual-level data from the Arab Barometer survey was aggregated by country to create statistics at the country-level.

Of course, measuring variance across countries, that is to say, making country the unit of analysis, does not involve only the aggregation of individual-level data about the country’s citizens. Numerous commonly used country-level measures are produced by important international institutions, such as the United Nations, the World Bank, the Arab League, and many others. The measures themselves are numerous and very diverse, ranging, for example, from Gross Domestic Product to the UN’s Human Development Index, which combines indicators of life expectancy, education, and per capita income.

Without attempting to be comprehensive, and for the broader purpose of insisting on the need to be self-aware and designate the unit of analysis in systematic social science research, it may be useful to present a small number of additional examples. On the one hand, there are quasi-academic institutions that present and regularly update ratings of countries on important concepts and variables. One example among many is Freedom House, which rates countries each year with respect to political rights and civil liberties. It awards 0 to 4 points for each of 10 political rights indicators and 15 civil liberties indicators, giving a total score from 0 to 100. In 2019, it awarded a total of 34 points to Algeria, 37 points to Morocco, and 70 points to Tunisia.

Country-level measures are also produced by individual scholars and scholarly teams for use in data-based research on particular topics or issues. A good example is the scholarly literature on what has been called the “Resource Curse,” which considers the proposition that oil and mineral wealth impedes democracy. There are active debates about both theory and method in this field of research, and one result has been the development of country-level measures of key concepts and variables. Among the measures developed by an important early study, for example, is an index of oil-reliance. Scores for the 25 most oil-reliant countries at the time of the study ranged from 47.58 (Brunei) to 3.13 (Colombia). Among Arab countries, Kuwait was judged to be the most oil-reliant and given a score of 46.14. Other Arab countries judged sufficiently oil-reliant to be rated include Bahrain (45.60), Yemen (38.58), Oman (38.43), Saudi Arabia (33.85), Qatar (33.85), Libya (29.74), Iraq (23.48), Algeria (21.44), and Syria (15.00).Footnote 6 Although subsequently used in multivariate analysis to test resource curse hypotheses, this country-level index, like numerous others in which country is the unit of analysis, offers valuable descriptive information.

Individual and country are not the only units of analysis, of course. Among the many others are community and group, with numerous possibilities for describing the attributes with respect to which these units vary. Size, location, administrative structure, ethnicity and/or religion, and economic well-being are only a few of the possibilities. Each attribute is a concept and variable with respect to which the units—communities, groups—differ, and descriptions of this variance in the form of univariate distributions can be very useful. An innovative example comes from a study of Lebanese communities that sought to assign to each community a measure pertaining to public goods provision and also to governance structure. In advance of exploring the connection between these two variables, the investigator needed to develop measures of each and present these in univariate descriptions. Interestingly, an attribute related to community governance that turned out to be particularly important was whether the community was dominated by a single faction or whether there was competition for community leadership.Footnote 7 The cross-community variance associated with this concept—community governance—was mapped in the initial, descriptive portion of the project, which involved univariate analysis and variables being considered one at a time.

2.2.3 Univariate Distributions

Before continuing the discussion of variance, and also taking a brief detour into working with qualitative data, the nature and value of univariate analysis and the presentation of descriptive information can be further illustrated by presenting univariate distributions of answers to four questions asked in Arab Barometer surveys. First, we’ll look at responses to two questions about the Islamic State; and second, we’ll consider responses to two questions about sexual harassment and domestic violence. These issues are obviously important, and in all four cases, variance in the experience, behavior, and attitudes of ordinary citizens is at best imperfectly known. Accordingly, particularly since samples in Arab Barometer surveys are probability-based and nationally representative, there can be little doubt that univariate distributions of responses given by the individuals interviewed by the Arab Barometer team are valuable and instructive with regard to Arab society at large.

The first example, which was explored in the fourth wave of Arab Barometer surveys, in 2016–2017, concerns Arab attitudes toward the Islamic State. Findings, presented in the top half of Table 2.1, are based on surveys in Jordan, Lebanon, Palestine, Tunisia, Algeria, and Morocco, taken together. The table shows that the overwhelming majority of those interviewed have very negative attitudes toward the Islamic State. At the same time, small minorities agree with the goals of the Islamic State and believe its actions to be compatible with the teachings of Islam.

The second example, which was explored during the fifth and sixth waves of Arab Barometer surveys, in 2018–2019 and 2020–2021, deals with sexual harassment and domestic violence, very important issues about which the variance within and across countries in the Arab world (and elsewhere) is not well-known. Accordingly, once again, discerning and then describing the variance with respect to relevant experiences or behaviors make a very valuable social scientific contribution, and this is quite apart from whatever, if anything, might be learned through bivariate and multivariate analyses in subsequent phases of the research. The lower half of Table 2.1, based on all of the respondents in the 12 countries surveyed in Wave V of the Arab Barometer, shows how people answered questions about unwanted sexual advances and in-household physical abuse. It shows that substantial majorities have not experienced physical sexual harassment and do not reside in a household in which there has been domestic violence.

Table 2.1 Univariate frequency and percentage distributions based on responses to questions about the Islamic State, unwanted sexual advances, and in-household physical abuse

Respondent answers to survey questions can be and frequently are aggregated by country for research projects in which country is the unit of analysis. Table 2.2 shows the distribution by country of one of the Wave IV questions about the Islamic State and one of the Wave V questions about sexual harassment and domestic violence. The construction of univariate distributions in which country is the unit of analysis is simple and straightforward. In most cases, it involves simply totaling the responses of everyone in the country who was surveyed and then calculating the distribution of percentages. Less straightforward, in some instances, is deciding which unit of analysis is most appropriate for the description (and explanation) of the variance that an investigator seeks to discern.

A third example from Arab Barometer data further illustrates the choice among units of analysis that a researcher may have to make. The variance in this case refers to voting, and specifically to whether or not the respondent voted in the last parliamentary election in her country. Based on responses to the question about voting in Wave V surveys, and again aggregating data from the surveys in Jordan, Lebanon, Palestine, Tunisia, Algeria, and Morocco, 45.5 percent of the individuals interviewed say they voted and 54.5 percent say they did not. This may be exactly what an investigator wishes to know, and it may be a point of departure for a study that asks about the attributes of individuals who are more likely or less likely to vote.

Alternatively, an investigator may not be very interested in how often individuals vote but rather in the variance in voting across a sample of countries. In this case, the variable of interest to a researcher is voter turnout; and as seen in Table 2.2, turnout ranges across the six countries from a low of 20.8 percent in Algeria to a high of 63.8 percent in Lebanon. Whether it is individual-level voting or country-level turnout that references the variance of interest to an investigation, and whether, therefore, the relevant unit of analysis is the individual or the country, depends, of course, on the goals of the researcher. Either one may be most relevant, and it is also possible that both will be relevant in some studies.

Table 2.2 Selected country-level univariate percentage distributions

Readers are encouraged to access the Arab Barometer website, arabbarometer.org, and take a closer look at the data. The website’s online analysis tool permits replication of the response distributions shown in Tables 2.1 and 2.2. Responses to topically associated questions not shown in these tables can also be accessed. In addition, the online analysis tool permits simple mapping operations, which involve disaggregating the data and examining the variance that characterizes specific subsets of the population, such as women, older individuals, or less religious individuals.

2.2.4 Qualitative Research

While this volume places emphasis on social science research that works with quantitative data, the concepts of variance and unit of analysis also occupy a foundational position in positivist social science research that works with qualitative data. In positivist qualitative social science research, as in quantitative research, the initial objective is to discern the various empirical manifestations of each concept of interest to an investigator, and then to assign each unit on which the investigator has data to one of the empirical manifestations of each concept. The resulting frequency or percentage distributions provide potentially valuable descriptive information, as they do in quantitative research.

As in quantitative research, the objectives of a qualitative study may be descriptive, in which case no more than univariate distributions are needed. Alternatively, again as in quantitative research, these distributions on variables of interest to the investigator may be the beginning stage of research projects that aspire to explanation as well as description, and that, for this reason, anticipate bivariate and/or multivariate analysis. In any of these instances, the point that deserves emphasis is that the notion of variance is central to most positivist inquiry, be it quantitative or qualitative.

A small number of examples involving qualitative research may be mentioned very briefly to illustrate this point. Among these are two studies based on fieldwork in Lebanon, one by Daniel Corstange of Columbia University and one by Melani Cammett of Harvard University. Both projects included the collection and analysis of qualitative data.

The unit of analysis in the Corstange study is the community, some of which were villages and some of which were neighborhoods in larger agglomerations. The variables with respect to which these communities were classified—hence, the variance that Corstange sought to capture—included the inter-religious confessional composition of the community and whether its leadership structure involved competition or was dominated by one group. These qualitative distinctions with respect to which each community was classified were part of Corstange’s larger goal of explaining why some communities fared better than others in obtaining needed public goods, such as electricity and water.Footnote 8

The Cammett study involved the construction of a typology based on two variables taken together. Typologies almost always involve qualitative distinctions, even if one or both of the variables used in their construction are themselves quantitative. Typologies can be particularly useful in conceptualizing and measuring variance that involves more than a single dimension.

In the Cammett study, the unit of analysis was the welfare association, more formally defined as a domestic non-state welfare provider, and each was classified according to the presence or absence of a linkage to a political organization and also to an identity-based community. The concatenation of these two dichotomous distributions yielded four categories, each representing a particular “type” of welfare society based on its political and confessional connections taken together. Cammett’s distinctions with respect to type, as is usually the case with typologies, reference qualitative variance. Among the larger goals of the Cammett study, based on the proposition that the motivations of Lebanese welfare societies are not entirely charitable, was to discern whether and how welfare society type was related to the characteristics of those that the association tended to serve and how it made decisions.Footnote 9
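The logic of building a typology from two dichotomous variables can be sketched in a few lines of Python. The six associations and their attributes below are invented, and the type labels are placeholders chosen for this illustration; they are not the category names or the data used in the Cammett study.

```python
from collections import Counter

# Hypothetical welfare associations, each coded on two dichotomous variables.
# Names and codings are invented for illustration only.
associations = [
    {"name": "Association 1", "political_link": True,  "community_link": True},
    {"name": "Association 2", "political_link": True,  "community_link": False},
    {"name": "Association 3", "political_link": False, "community_link": True},
    {"name": "Association 4", "political_link": False, "community_link": False},
    {"name": "Association 5", "political_link": True,  "community_link": True},
    {"name": "Association 6", "political_link": False, "community_link": True},
]

def classify(association):
    """Combine the two dichotomies into one of four qualitative types."""
    political = "political" if association["political_link"] else "non-political"
    communal = ("identity-based" if association["community_link"]
                else "not identity-based")
    return f"{political}, {communal}"

# Univariate distribution of the resulting typology.
type_counts = Counter(classify(a) for a in associations)
for type_label, count in sorted(type_counts.items()):
    print(f"{type_label}: {count}")
```

Two variables with two values each yield 2 × 2 = 4 possible types, and the resulting counts form a univariate distribution on the single qualitative variable "type."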

The unit of analysis in another study, conducted in Palestine by Wendy Pearlman of Northwestern University, was the Palestinian national movement. The project focused on the movement’s resistance to the Zionist project prior to Israeli independence and to Israel’s occupation of the West Bank and Gaza following the war of June 1967. The variable of interest to Pearlman was whether resistance activities were essentially non-violent or included significant violence, often directed at Israeli citizens who did not live in the West Bank or Gaza. Pearlman gathered information about resistance activities over time, beginning with the post-World War I period, and then classified each instance of resistance according to whether the national movement used non-violent or violent methods in pursuit of its goals. The larger goal of Pearlman’s research project was not only to describe the variance in national movement resistance activities but also to test hypotheses about determinants of this variance.Footnote 10

A final example is provided by an older but very important study by Tunisian sociologist Elbaki Hermassi. Hermassi’s project focuses on Tunisia, Algeria, and Morocco, and country is the unit of analysis. An important qualitative variable in Hermassi’s study is the character of the governing regime at independence, 1956 for Tunisia and Morocco and 1962 for Algeria. Tunisia at independence was governed by Western-educated leaders who were supported by a mass-membership political party; in Algeria, the country was led by a military-civilian coalition backed by the military and without a popular grass-roots institutional base; and in Morocco, the king sat at the top of a political system that included a parliament in which the largest party had Islamist origins.Footnote 11

Hermassi uses this country-level variation in political regime to address and answer an important question: Why did the three countries arrive at independence with such differing political systems? After all, each was part of the Arab west and each was colonized by the French. To answer this question, Hermassi takes his readers on a sophisticated and insightful historical journey that can only be hinted at here. He identifies and describes differences among the three countries—making distinctions that are also qualitative—at critical historical periods and junctures. These include differences in pre-colonial society, differences in the character of French colonialism, and differences in the origins and leadership of the nationalist movement and the struggle for independence. The classification of the three countries during each of these time periods is anchored in thick description and extensive historical detail.

These qualitative differences between the three North African countries define a multi-stage temporal sequence through which Hermassi and his readers travel using a method known as process tracing. Country-level qualitative differences during one time period help to explain country-level qualitative differences during the time period that followed, leading in the end to an explanation of the reasons that the countries began their respective political lives at independence with very different governing regimes.

Although brief, and beyond the central, quantitative, focus of this research guide, this overview of qualitative social science research suggests several take-aways, all of which apply to quantitative social science research as well. One is that the concept of variance is no less relevant to qualitative social science investigations than it is to quantitative social science research. A second is that measuring and describing qualitative variance still involves specifying the unit of analysis. A third is that typologies, which make qualitative distinctions among units of analysis, are a useful technique for capturing the variance on concepts defined by more than one attribute or experience. A fourth is that the variance being measured may be among different entities at the same point in time, within the same entity at different points in time, or among different entities at different points in time. And finally, these examples illustrate the importance of fieldwork and deep knowledge of the circumstances of the unit of analysis and variables on which the research project will focus.

2.2.5 Descriptive Statistics

While we often want to know only whether a variable has very much or very little variance across our units of analysis, it can also be useful to understand how to calculate variance mathematically. We may also want to describe a variable’s distribution of values (numbers) in other ways, such as giving the average value of a variable (mean) or identifying the value in a distribution that occurs most frequently (mode). Two properties commonly used to describe a distribution are central tendency and dispersion, and there are descriptive statistics for each that can be calculated mathematically.

Measures of central tendency are the mean, the median, and the mode. The mean, or average, is the sum of all observations for a variable divided by the total number of observations. The median is the “middle” value in a variable’s distribution of values; it is the value that separates the higher half from the lower half of the values in a distribution. The mode is the value in a distribution that appears most often.

It is important to understand that measures of central tendency—the mean, median, and mode—do not tell us how spread out a distribution of values is. The values might be clustered in the middle, spread out evenly, or clustered at the extremes, with each of the distributions having the same mean. Measures of dispersion can be calculated to determine the degree to which the values of a variable differ from the mean, or how spread out the distribution is. Two of the most important measures of dispersion are the variance and standard deviation. The standard deviation, which is the square root of the variance, is one of the most frequently used ways to determine and show the dispersion of a distribution. The interquartile range is another measure of dispersion. It shows how spread out the middle 50 percent of the distribution is by subtracting the value of the 25th percentile from the value of the 75th percentile.
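As a rough illustration of these measures, the sketch below uses Python’s standard `statistics` module on three sets of ratings. The values are hypothetical and invented for illustration only; they do not come from any survey. All three sets have the same mean, yet their dispersion differs markedly.

```python
import statistics

# Hypothetical ratings on a 1-5 scale, invented for illustration only.
distributions = {
    "clustered in the middle": [3, 3, 3, 3, 3],
    "spread out evenly": [1, 2, 3, 4, 5],
    "clustered at the extremes": [1, 1, 3, 5, 5],
}


def describe(values):
    """Return common measures of central tendency and dispersion."""
    # quantiles(..., n=4) returns the three cut points between quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),    # every most-frequent value
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # square root of the variance
        "iqr": q3 - q1,                           # 75th minus 25th percentile
    }


summaries = {name: describe(values) for name, values in distributions.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Note that software packages use slightly different conventions for computing percentiles in small samples, so the interquartile range reported by another tool may differ somewhat from the one printed here.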

Calculating the Variance

The variance expresses the degree to which the values in a variable’s distribution of values differ from one another. As a descriptive statistic, it is a measure of how much the values differ from the mean. For example, if satisfaction with a country’s healthcare system is measured on a scale of 1 to 4, with 4 indicating a high level of satisfaction, and if every individual in a study chooses 4 to express her opinion, there is no variance. The mean will be 4 and none of the ratings given by these individuals differs from the mean. Alternatively, if some participants in the study choose 2, others choose 3, and still others choose 4 to express their opinion, there is variance. We calculate the degree of variance, or dispersion, by squaring the difference between the value and the mean for each participant in the study, then summing the squared deviations for all participants, and then dividing this sum by the number of participants minus 1 (see footnote 12). These calculations are expressed by the following formula, and its application is shown in an exercise below.

$$ s^2 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1} $$

where \( s^2 \) is the variance, \( x_i \) is the value for each individual, \( \bar{x} \) is the mean, and \( n \) is the number of individuals.

In the example above, the unit of analysis is the individual. How much variance do you think there is in satisfaction with the healthcare system at the country level in the MENA region? Do you think it is about the same in every country, or do you think it is much higher in some countries and much lower in others? Offer only your “best guess.”

You can use the online analysis tool on the Arab Barometer’s website, following the steps shown below, to evaluate the accuracy of your best guess.

  • Click on the following link: https://www.arabbarometer.org/survey-data/data-analysis-tool/.

  • Click “AB Wave V—2018,” click “Select all,” and click “See results.”

  • Click “Evaluation of public institutions and political attitudes.”

  • Click “Satisfaction with the healthcare system in your country.”

You can see that there is quite a bit of variance. Which two countries have the most different levels of satisfaction with their healthcare system? Which countries are most similar to each other? Do you have any ideas about why they are similar?

Do you think the variance across the countries would be greater, about the same, or lesser if the cross-country variance was calculated using only male respondents? You can evaluate your answer by adding a filter:

  • Click “Add filter”

  • Click “Gender”

  • Click “Apply”

An example based on the tables below, Tables 2.3 and 2.4, further illustrates computation of the variance. In one case, the unit of analysis is the individual. In the other, country is the unit of analysis. The variable of interest in each case is satisfaction with the country’s healthcare system, which is measured on a 1 to 4 scale with 4 being very satisfied and 1 being very dissatisfied.

First are some questions based on Table 2.3, in which the unit of analysis is the individual. You should be able to answer the questions without doing any major calculations.

Table 2.3 Unit of analysis: individual

Table 2.4 Unit of analysis: country

  • What is the mean of the healthcare system ratings by the individuals in each of the three countries?

  • In which of the three countries is the variance of healthcare system ratings by individuals greatest? In which is the variance lowest?

We can now calculate the variance of ratings by the five individuals in Jordan. Thereafter, you may wish to calculate the variance of ratings by the five individuals in Iraq and in Tunisia. This will permit you to check the accuracy of your earlier estimates of rank with respect to magnitude of the variance among distributions for the three countries. To calculate the variance, we take the following steps:

  • Calculate the mean: Average satisfaction with the healthcare system in Jordan = sum of individual values/total number of observations = (4 + 4 + 3 + 1 + 2)/5 = 2.8

  • Calculate the sum of squared differences between each value and the mean: (4 − 2.8)² + (4 − 2.8)² + (3 − 2.8)² + (1 − 2.8)² + (2 − 2.8)² = 1.44 + 1.44 + 0.04 + 3.24 + 0.64 = 6.8

  • Divide the sum of the differences squared by the number of observations in the data set minus 1: 6.8/4 = 1.7

We have determined that the individual-level variance in satisfaction with the healthcare system in Jordan is 1.7.
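The three steps above can also be written as a short Python sketch. The function name `sample_variance` is our own choice for illustration; the ratings are the five Jordan values used in the calculation above.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                  # step 1: the mean
    squared_deviations = [(x - mean) ** 2 for x in values]  # step 2: squared differences
    return sum(squared_deviations) / (n - 1)                # step 3: divide by n - 1


# Ratings by the five individuals in Jordan, as used in the steps above.
jordan_ratings = [4, 4, 3, 1, 2]

jordan_mean = sum(jordan_ratings) / len(jordan_ratings)
jordan_variance = sample_variance(jordan_ratings)
print(f"mean = {jordan_mean:.1f}, variance = {jordan_variance:.1f}")
# prints: mean = 2.8, variance = 1.7
```

The same function can be applied to the ratings for Iraq and Tunisia to check your earlier ranking of the three countries.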

We turn now to variance at the country level based on Jordan, Iraq, Lebanon, and Palestine, as shown in Table 2.4. This means we are interested in how much the average values of satisfaction with the healthcare system of these countries differ from each other. Before doing any calculations, do you think the variance will be high or low?

The first step in calculating variance at the country level is to calculate the mean of the “satisfaction with the healthcare system” country scores shown in Table 2.4. Thereafter, following the procedures used when individual was the unit of analysis, the sum of the squared difference between the country-specific values and the mean of all countries together, divided by the number of units (countries) minus one gives the variance. These operations are shown below.

  • Mean = (3.3 + 2.6 + 3.5 + 2.6)/4 = 12/4 = 3

  • Variance = sum of squared differences between each value and the overall mean/(number of observations minus 1) = ((3.3 − 3)² + (2.6 − 3)² + (3.5 − 3)² + (2.6 − 3)²)/3 = (0.09 + 0.16 + 0.25 + 0.16)/3 = 0.66/3 = 0.22

In this example, we see that there is variance in healthcare system satisfaction at the individual level and at the country level. However, variance at the individual level is much higher. Another way to think about this is to say that satisfaction with the healthcare system is more similar across countries than it is across individuals. What in your opinion is the significance of this unit of analysis difference in degree of variance? If you wanted to study healthcare satisfaction in the MENA region, how might this difference in degree of variance influence your research design?
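For readers who want to check the comparison numerically, the sketch below recomputes both variances with Python’s standard `statistics` module. It uses the five Jordan ratings and the four country scores quoted in the calculations above. The ratio and the standard deviations are additions for illustration and are not part of the original exercise.

```python
import math
import statistics

jordan_ratings = [4, 4, 3, 1, 2]      # individual-level ratings (Jordan)
country_means = [3.3, 2.6, 3.5, 2.6]  # country-level scores from the calculation above

individual_variance = statistics.variance(jordan_ratings)  # sample variance (n - 1)
country_variance = statistics.variance(country_means)

# The standard deviation is the square root of the variance.
individual_sd = math.sqrt(individual_variance)
country_sd = math.sqrt(country_variance)

ratio = individual_variance / country_variance
print(f"individual-level variance: {individual_variance:.2f} (sd {individual_sd:.2f})")
print(f"country-level variance:    {country_variance:.2f} (sd {country_sd:.2f})")
print(f"individual-level variance is about {ratio:.1f} times larger")
```

Keep in mind that these are very small illustrative sets of values, so the size of the ratio should not be read as an estimate for the MENA region as a whole.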

Now, think about your own country or a country in the MENA region that you know well. What do you think is the average level of satisfaction with the healthcare system in this country? How much individual-level variance in healthcare system satisfaction do you think there is? What do you think causes, and thus helps to explain, this individual level variance?

2.2.6 Visual Descriptions

Investigators and analysts will often wish to see and show more about the distribution of a variable than is expressed by univariate measures of dispersion such as the variance or standard deviation, or more about a distribution than is expressed by a measure of dispersion and a measure of central tendency taken together. They may also wish to see and show exactly how the variance is distributed. Are the values on a variable clustered at the high and low extremes? Are they spread out evenly across the range of possible values? For these reasons, an analyst will often prepare a frequency distribution to show the variance.

Frequency distributions are a common way of looking at variance. A frequency distribution is a univariate table that shows the number of times each distinct value of a variable occurs. As shown for Jordan in Table 2.3, for example, the value “2” appears once and the value “4” appears two times. A percentage distribution shows the percentage of times a value appears in the data. In the Jordan example, the value “2” accounts for 20 percent of all observations and the value “4” accounts for 40 percent of all observations. Table 2.5 shows an example of a frequency and percentage distribution.
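A frequency and percentage distribution for the five Jordan ratings used earlier (4, 4, 3, 1, 2) can be tallied with a few lines of Python. This is only a sketch using the standard library’s `collections.Counter`; the rows of `#` characters are a crude text stand-in for a bar chart.

```python
from collections import Counter

# The five Jordan ratings used in the variance calculation earlier in this chapter.
jordan_ratings = [4, 4, 3, 1, 2]

frequencies = Counter(jordan_ratings)  # value -> number of occurrences
total = len(jordan_ratings)
percentages = {value: 100 * count / total for value, count in frequencies.items()}

print("value  count  percent")
for value in sorted(frequencies):
    bar = "#" * frequencies[value]  # one mark per observation
    print(f"{value:>5}  {frequencies[value]:>5}  {percentages[value]:>6.1f}%  {bar}")
```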

Table 2.5 Example of a frequency and percentage distribution: satisfaction with the healthcare system in the MENA Region (Arab Barometer Wave V, 2018–2019)

There are many other ways to visualize the data on a variable. A bar chart is a visualization of a frequency distribution, where the bars represent distinct (categorical) responses and the length or height of the bars represents the frequency of each. A histogram is similar to a bar chart, but it is used to display groupings of a variable’s values. A pie chart is similar, but the numerical values of a variable are represented by different “slices,” with each representing its proportion of the whole.

You will see examples of all of these visual descriptions in the exercise below. In the Arab Barometer’s online analysis tool, frequency distributions are on the left side of the page and charts and graphs are on the right side of the page. Above the charts and graphs on the right side of the page is a legend that permits selecting the particular type of chart or graph that is desired.

Exercise 2.1. Units of Analysis, Variance, and Descriptive Charts

  1. Go to the data analysis tool on the Arab Barometer website using the following link: https://www.arabbarometer.org/survey-data/data-analysis-tool/

  2. Choose a wave of the Arab Barometer. For this exercise, let’s click “AB Wave V—2018-2019.” Select the country or countries you want to examine. For this exercise, click “Select all.” Next, click on “See results.” This will bring you to a page that says “Select a Survey Topic.” Select “General topics.”

  3. Let’s begin with looking at a variable at the individual level of analysis: interpersonal trust. Respondents can choose among the following options: “Most people can be trusted,” “I must be very careful in dealing with people,” or “Don’t Know.”

  4. Click on “Interpersonal Trust.” You should now see a table showing the frequency and percentage of responses. On the right side, you should also see a bar chart with the percentage of each response. Congratulations, you have just made two types of frequency distributions! We can see that there is variance in interpersonal trust in the MENA region: 15.7 percent of the individuals surveyed trust most people, while 83.1 percent do not trust others very much. What questions could we ask based on the fact that there is variance in interpersonal trust in the MENA region? We could consider other individual-level factors, such as “Does someone’s age affect how much they trust other people?” Or country-level factors, such as “Does the amount of corruption in the country in which people live affect how much they trust other people?”

  5. We might also be interested in how interpersonal trust varies over time. To see this, click “Time series.” How does interpersonal trust vary between 2007 and 2019? Why do you think people started trusting others less after 2013?

  6. Suppose you are considering doing research on the question: Does gender affect interpersonal trust in the MENA region? Click “Cross by” and select “Gender.” Describe the variance in how much men and how much women in the MENA region trust others.

  7. Now, suppose you are interested in how much gender affects interpersonal trust in a certain country, in a certain age group, or in some other category. Click “Select countries,” then “Deselect all,” and then select Morocco. Describe the variance in interpersonal trust based on gender in Morocco.

  8. What are the advantages or disadvantages of studying how gender affects interpersonal trust in the MENA region vs. in Morocco? Which study do you think would be more interesting, or more instructive? We see that the distributions for men and for women in the MENA region look almost identical: 15.6 percent of men trust most people, and 15.8 percent of women trust most people. On the other hand, in Morocco, 19 percent of men and 26 percent of women trust most people. You might conclude from this that you want to pursue this research question in Morocco, and not in the entire MENA region.

  9. You may also be interested in how interpersonal trust varies at the country level. To see this, click “Select countries,” “Select all,” and “Apply.” You should now see the average, or mean, of respondents who selected each response category in each country. There is quite a lot of variance in interpersonal trust at the country level! Describe the distribution you observe. Are there clusters of countries with similar degrees of interpersonal trust? What might be the reasons these countries have similar degrees of interpersonal trust?

  10. Repeat the steps above for two other variables: one that you would like to explore at the individual level (either in the entire MENA region or in one or more specific countries in which you are particularly interested), and one that you are interested in exploring at the country level. Describe the variance you observe in each case. Are there individual-level factors that you think might help to explain the variance you observe? Are there country-level factors that you think might help to explain the variance you see?

Arab Barometer data have been used to illustrate the points this chapter makes about variance, variables, univariate analysis, and unit of analysis. This has been done, in part, for convenience, but also because Arab Barometer data are readily available, which offers readers an opportunity to replicate, deepen, or expand on the examples used above. The points being illustrated with Arab Barometer data are, of course, of general significance. Different kinds of examples will be offered in the second half of this chapter.

2.3 Data Collection and Measurement

Remaining to be discussed are two essential and interrelated topics that must be addressed in the design of a research project and then implemented before any analysis can be undertaken. One of these involves data collection, which obviously must precede both the calculation of descriptive statistics and the preparation of graphs and charts. Since most of our examples use Arab Barometer data, our discussion will give special attention to the collection and use of survey data. A fuller overview of survey research is presented in Appendix 3. We will also, however, discuss other sources and methods of data collection and data generation. Even researchers who work with data that have already been collected and cleaned should have an understanding of the sources and processes associated with data collection.

The second essential topic concerns measurement, which merits special attention when the concepts and variables of interest to a researcher are to some degree abstract and cannot be directly observed. In this case, measurement involves the selection and use of indicators, phenomena that can be observed and will permit inferences about the more abstract concepts. In survey research, and equally in many other modes of data collection, the concepts and variables to be measured and the indicators to be used must be identified before data collection can begin.

2.3.1 Types of Data and Measurement Scales

Data can be categorized as categorical or numerical, and many research projects utilize both types of data. Categorical data are often the main type of data used in qualitative research, although numerical data may also be used.

Categorical data are data that are divided into groups based on certain characteristics, such as gender, nationality, or level of education. Categorical data can either be nominal or ordinal. Nominal data don’t have a logical order to the categories. There is no set order to male or female, for example. Ordinal data do have a logical order. There is an order to primary school education, secondary school education, and university degree, for example. Sometimes categorical data are represented with numbers in datasets to facilitate statistical analyses, such as assigning “female” the number 1, and “male” the number 2. When a researcher does this, they will generally provide a codebook to assist others in understanding what the numbers mean.

Numerical data are data that can be measured with numbers. Numerical data can either be discrete or continuous. Discrete data can only take on certain values, such as the number of protests that take place in a month: you can’t have 3.1 or 3.5 protests. Continuous data are data that can take any value within a certain range, such as the GDP of a country. It could be $20.05 billion, $40.26 billion, or any other number larger than zero.
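The distinctions among nominal, ordinal, and discrete numerical data can be made concrete with a small Python sketch. The codebook, the education categories, and the three respondents below are hypothetical and invented for illustration only.

```python
import statistics

# Hypothetical codebook: the numeric codes are labels, not quantities.
GENDER_CODEBOOK = {1: "female", 2: "male"}  # nominal: no logical order

# Ordinal categories, listed from lowest to highest.
EDUCATION_ORDER = ["primary", "secondary", "university"]

# Hypothetical respondents, invented for illustration.
respondents = [
    {"gender": 1, "education": "university", "protests_attended": 3},
    {"gender": 2, "education": "primary", "protests_attended": 0},
    {"gender": 1, "education": "secondary", "protests_attended": 1},
]

# Nominal data: decode the codes with the codebook; averaging them would be meaningless.
genders = [GENDER_CODEBOOK[r["gender"]] for r in respondents]

# Ordinal data: categories can be ranked, so sorting and a median category make sense.
rank = {level: position for position, level in enumerate(EDUCATION_ORDER)}
by_education = sorted(respondents, key=lambda r: rank[r["education"]])
median_education = by_education[len(by_education) // 2]["education"]

# Discrete numerical data: each count is a whole number, but the mean need not be.
mean_protests = statistics.mean(r["protests_attended"] for r in respondents)

print(genders, median_education, round(mean_protests, 2))
```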

2.3.2 Data Sources and Data Collection

There are many different types of data sources, and each of them is useful in different contexts. We will not discuss all of them in detail in this guide, but it may be useful to get an idea of some of the major sources of data.

Existing datasets can be extremely useful for a researcher. Many existing datasets are free and accessible online to everyone. The Arab Barometer, and other similar surveys, such as the Afrobarometer, the World Values Survey, and the European Social Survey, measure diverse attitudes, beliefs, and behaviors in various regions of the world. International organizations, such as the UN and the World Bank, publish data on socioeconomic indicators and other topics on their websites. Most of the datasets are aggregated at the country level, but some data come from surveys or administrative systems and are at the individual or sub-national level. Many countries or administrative units also publish datasets, such as crime statistics.

Many researchers also make the data that they collect available online without charge, such as through Harvard Dataverse or personal or university websites. For example, a recently published dataset accessible through Harvard Dataverse is the Global Parties Survey, which compares the values, issue positions, and rhetoric of over 1000 political parties. Researchers make datasets available not only to make future research easier, but also to increase the transparency and replicability of their own research. This is important, as transparency and replication increase our confidence in researchers’ findings and can make their propositions easier to test in other settings.

Archival research involves using documents, images, correspondence, reports, audio or audiovisual recordings, or other objects that already exist. Archival research is commonly used to answer historical research questions. Additional types of records one might access when conducting archival research include medical records, government papers, news articles, personal collections, or even tweets. Archival materials are generally accessed at museums, government offices or, of course, archives. In some cases, a researcher may need to get special permission or training before being allowed to access archival materials. What documents are used depends on the research question. Researchers sometimes use content analysis to categorize or quantify archival documents.

Content analysis is a related research technique that can generate both quantitative and qualitative data. The goal of content analysis is to characterize textual or audio data, such as news articles, speeches of officials, or even a set of tweets. Content analysis can generate quantitative data by counting the frequency, space, or time devoted to certain words, ideas, or themes in the documents being analyzed. Content analysis can generate qualitative data, and sometimes also quantitative data, through directed coding, for example by categorizing certain speeches as either in favor or not in favor of a certain policy. Directed coding is usually done by multiple coders who are instructed to employ a set of coding guidelines, and confidence in the data produced usually requires agreement among the decisions and assignments of the different coders. Sentiment analysis is a type of directed coding in which texts are classified as containing certain sentiments or emotions, such as positive, negative, sad, angry, or happy. Content analysis is most useful when there is a large amount of scattered text from which it is difficult to draw conclusions without analyzing it systematically. More recently, advances in the fields of computational linguistics and natural language processing have allowed researchers to conduct content analysis on much larger amounts of data and have reduced the need for human coders.
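Two of the operations described here, counting the frequency of themes and checking agreement among coders, can be sketched in Python. The documents, keywords, and coder decisions below are hypothetical and invented for illustration. Simple percent agreement is used only because it is easy to read; chance-corrected measures of inter-coder agreement, such as Cohen’s kappa, are often preferred in practice.

```python
import re
from collections import Counter

# Hypothetical documents and theme keywords, invented for illustration.
documents = [
    "The minister promised reform of the health sector and more hospitals.",
    "Protesters demanded reform and an end to corruption.",
    "Corruption in the health ministry delays reform.",
]
themes = {"reform", "corruption", "health"}


def theme_counts(texts, keywords):
    """Count how often each keyword appears across all texts (case-insensitive)."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word in keywords)
    return counts


def percent_agreement(codes_a, codes_b):
    """Share of items to which two coders assigned the same category."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


counts = theme_counts(documents, themes)

# Hypothetical directed coding of four speeches by two coders.
coder_1 = ["in favor", "not in favor", "in favor", "in favor"]
coder_2 = ["in favor", "not in favor", "not in favor", "in favor"]
agreement = percent_agreement(coder_1, coder_2)

print(dict(counts), agreement)
```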

Observational research is exactly what it sounds like—observing behavior. Sometimes observational research occurs in a laboratory setting, where aspects of the environment are controlled to test how participants react, but often observational research occurs in public. A researcher might be interested in the gender dynamics of a protest, for example. The researcher might attend the protest and take notes or record who stands where in a crowd, what kinds of things women and their signs say versus what kinds of things men and their signs say. This is a very flexible method of data collection, but it can be difficult to draw conclusions with so many uncontrollable factors.

A focus group is a group discussion of a specific topic led by a moderator or interviewer. You have probably heard of focus groups in the context of consumer research. Focus groups are a good way to learn and understand what a target audience thinks about a specific topic. They are sometimes used in the early stages of survey research, before actually conducting the survey. In this connection, focus groups are used to gain ideas and insights that help in developing the survey instrument or in evaluating the clarity and efficacy of a survey instrument that an investigator is planning to use.

Interviews are a way of collecting information by talking with other people. They can be structured, unstructured, or semi-structured. You might conduct interviews of protesters as though you were having a conversation to get a sense of their motivations. This unstructured format might make the respondents feel more at ease and then disclose valuable information you would never have thought to ask. On the other hand, you might also conduct interviews in a structured way, by asking the protesters a predetermined set of questions. This allows you to more easily compare responses between respondents.

Survey research is another way to collect data by asking people questions. The answers people give are the data. You might conduct a survey through face-to-face interviews, as the Arab Barometer does. This means administering a questionnaire face-to-face, with the interviewer asking the questions and recording the responses. You might also conduct a survey using phone calls, text messages, or online messaging. Another method of conducting surveys is having people complete questionnaires in person, online, or through mail. In this case, the surveys are called self-administered rather than interview-administered. We have included more information about survey research in the Appendix of this research guide.

Sampling refers to the fact that it is usually impossible for a researcher to collect all of the data that are relevant for her research project. Sampling is very often a concern in survey research, but it may also be a concern when other data collection procedures are employed. There are some projects for which this is not the case, but these are the exceptions in social science research. In survey research, for example, an investigator may be interested in the political attitudes of all of the adult citizens of her country, but it is very unlikely, virtually impossible, actually, that she or her research team can survey all of these men and women. Or, a researcher may be interested in the gender-related behavior of students in college social science classes, but again, it is virtually impossible for the researcher and her team to observe all of the social science classes in all of the colleges in her country, let alone in other countries.

A sample refers to the units about whom or which the researcher will actually collect data, and these are the data she will analyze and with which she will carry out her investigation. Population refers to the units in which the researcher is interested, those about whom or which her study seeks to provide information. In the first example above, the population is all of the adult citizens of the researcher’s country; and in the second, it is college social science classes in general, meaning virtually all such classes.

The distinction between population and sample raises two important questions that are discussed elsewhere in this guide.

  • The first question asks which are the units about whom or which the investigator will collect information. In other words, it asks which members of the population will be included in the sample, and how will those in the sample be selected. The answers lie in the design and construction of the investigator’s sample, a topic that is discussed in the appendix on survey research with examples from the Arab Barometer.

  • The second question concerns the relationship between the population and the sample. It asks whether, and if so, when, how, and with what degree of confidence, can findings based on analyses of the data in the sample provide information and insight that apply to the population. This important question is taken up in the next chapter.
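The distinction between a population and a sample can be illustrated with a small simulation. The sketch below assumes a simple random sample, which is only one of several possible sample designs and is not necessarily the design used by the Arab Barometer or any other survey. The population of ratings is hypothetical and generated at random.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Hypothetical population: healthcare satisfaction ratings (1-4) for 10,000 adults.
population = [rng.randint(1, 4) for _ in range(10_000)]

# Simple random sample of 500 units, drawn without replacement.
sample = rng.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In real research the population mean is unknown, which is precisely why the relationship between sample and population matters. Here it can be computed only because the population was simulated.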

2.3.3 Conceptual Definitions and Operational Definitions

The indicators and type of data that are best suited for measuring a certain concept depend on how the concept is defined. A conceptual definition should specify how we are thinking about variance related to the concept: what is the unit of analysis and what is the variance we want to capture? Take, for example, the concept of quality of healthcare services. If country is the unit of analysis, do we want to capture the amount of healthcare services that the government provides? If so, we might consider measuring the concept by using the percent of the national budget devoted to health and healthcare, or the number of physicians per 100,000 citizens. On the other hand, we may be interested in citizens’ perceptions of healthcare service provision. In this case, we may want to measure perceived quality of healthcare services by asking questions, most probably in a survey, such as “Do you find doctors helpful when you are sick?” or “When you have been sick, were you able to obtain the healthcare services you needed?” We will want to conceptualize “quality of healthcare services” differently depending on our research question, and then, accordingly, collect or use data at the appropriate level of analysis.

Once an investigator has formulated and determined her conceptual definitions, she is ready to think about and specify her operational definitions. An operational definition describes the data, indicators, and methods of analysis she will use to rate, classify or assign values to the entities in which she is interested. An operational definition, in other words, describes the procedures an investigator will use to measure each of the abstract concepts and variables in her study, concepts and variables that cannot be directly observed.

In formulating an operational definition, an investigator must decide what data and indicators best fit the variance specified in the conceptual definition of each concept and variable in her study that cannot be directly observed or directly measured. She asks, therefore, what data can be obtained or collected, do these data contain the indicators she needs, and of course, what will be the quality of the data.

Returning to the previous illustration, suppose you have decided to measure the satisfaction of individuals with the provision of healthcare services. You need to decide what type of data to use, and in this case, it makes sense to use public opinion data. Perhaps, however, survey data on this topic do not exist, or the survey data that do exist do not ask questions that you would consider good indicators of the concept you want to measure. You might consider administering your own survey, but if this is not feasible, you can consider other types of data and data collection and build your own new dataset, informed and guided by the conceptual definition you are seeking to operationalize. For example, you might collect tweets related to healthcare and use content analysis by coders to rate each tweet on a spectrum ranging from very negative to very positive.
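One possible way of turning coders’ judgments about tweets into a numerical measure is sketched below. The five-point scale, the numeric values attached to it, the tweet identifiers, and the codes themselves are all hypothetical; they illustrate the idea of an operational definition and are not a recommended coding scheme.

```python
import statistics

# Hypothetical five-point coding scheme for tweets about healthcare.
SCALE = {
    "very negative": -2,
    "negative": -1,
    "neutral": 0,
    "positive": 1,
    "very positive": 2,
}

# Hypothetical codes assigned by three coders to each tweet (IDs are invented).
coded_tweets = {
    "tweet_1": ["negative", "very negative", "negative"],
    "tweet_2": ["positive", "positive", "very positive"],
    "tweet_3": ["neutral", "negative", "positive"],
}


def tweet_score(labels):
    """Average the coders' labels after converting them to numeric values."""
    return statistics.mean(SCALE[label] for label in labels)


scores = {tweet_id: tweet_score(labels) for tweet_id, labels in coded_tweets.items()}
overall_score = statistics.mean(scores.values())

for tweet_id, score in scores.items():
    print(f"{tweet_id}: {score:+.2f}")
print(f"overall: {overall_score:+.2f}")
```

Note that the third tweet receives an average of zero even though the coders disagree about it, which is one reason to examine agreement among coders before relying on averaged scores.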

Think of another concept in which you are interested and then ask yourself the following questions. Are you interested in variance at the individual level, country level, or a different level and unit of analysis? What is a conceptual definition that makes clear and gives meaning to the variance of the concept that you seek to measure? What type of data would best measure your variable? What elements that might be good indicators of the variable you seek to measure—questions in a survey, for example—should the dataset contain? How feasible do you think it is to obtain data that will contain these elements?

Researchers cannot always use the data and method of measuring their concepts and variables that are best suited to operationalizing their conceptual definitions. Collecting data takes time, resources, and certain skill sets. Also significant, certain types of data collection may pose risks to the researcher or the research subjects, and for this reason they must be avoided due to ethical concerns. For example, in some countries it can be dangerous to interview people about their participation in opposition political parties or movements. It is important to consider the trade-offs in using different types of data and, in some cases, different indicators. Which type of data collection is most feasible? What are good indicators of the variance you want to capture? We discuss these questions in the following section.

2.3.4 Measurement Quality

How do we decide whether a certain kind of data or particular survey questions are good indicators of the concept we seek to measure? We want data to be reliable and valid, which are the criteria by which the quality of a measure may be evaluated. We also want measures that capture all of the variance associated with the concepts and variables to be used in analyses. Attention to these criteria is particularly important if the concept to be measured is abstract and not directly observable. In this case, we will probably be measuring indicators of the concept, rather than the concept itself.


Reliability refers to the absence of error variance, which means that the observed variance is based solely on the measure, as intended, and not on extraneous factors as well. In survey research, for example, a question will be a reliable measure if the response is based solely on the content of that question and not also on factors such as ambiguous wording, the appearance or behavior of the interviewer, comments by other persons who were present when the interview was conducted, or even the time of day of the interview.

Attention to reliability is no less important in other forms of research. For example, when coding events data from newspapers in order to classify countries or other units of analysis with respect to newsworthy events such as protests, instances of violence, violations of human rights, labor strikes, elections and electoral outcomes, or other attributes, error variance may result from unclear coding instructions, inconsistent newspaper selections, or changing standards about what constitutes an instance of violence or a violation of human rights.

Consistency over multiple trials offers evidence of reliability. In survey research, once again, this means that a question would be answered in the same way—perhaps a response of “somewhat satisfied” to a question about satisfaction with the country’s healthcare system—regardless of who was the interviewer or the time of day at which the interview was conducted. In natural science, especially laboratory science, this means that the result of a measuring operation is reproducible.

In social science, evidence of reliability is often provided by consistency among multiple indicators that purport to measure the same concept or variable. This applies not only to questions asked in a survey, but also to data collected or generated in other ways as well. A measure based on multiple indicators that agree with one another can also be described as a unidimensional measure, and unidimensionality across multiple indicators demonstrates that a measure is reliable. For this reason, researchers often seek to use multiple indicators to measure the same concept in order to increase the robustness of their results.

Note also that the values or ratings produced by different indicators need not be absolutely identical. To be consistent with one another, they need only to be strongly correlated. A number of statistical tests are available for determining the degree of inter-indicator agreement, or consistency. Cronbach’s alpha is probably the test most frequently used for this purpose. Factor analysis is also frequently used. We discuss some of these statistical techniques in Chaps. 3 and 4.
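
As a rough illustration of how inter-indicator agreement can be quantified, the sketch below computes Cronbach's alpha from its standard formula, k/(k − 1) × (1 − sum of item variances / variance of the summed scores). The ratings are invented for this example, and the function name `cronbach_alpha` is ours; it is a minimal sketch, not a substitute for a statistical package.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: a list of indicators, each a list of scores for the same cases."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two indicators")
    # Total score of each case across all indicators.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical ratings of five countries on three indicators (1 = low, 5 = high).
agree = [[1, 2, 3, 4, 5], [1, 2, 3, 5, 5], [2, 2, 3, 4, 5]]
disagree = [[1, 2, 3, 4, 5], [2, 4, 1, 5, 3], [3, 1, 4, 5, 2]]

alpha_agree = cronbach_alpha(agree)
alpha_disagree = cronbach_alpha(disagree)
print(f"agreeing indicators: alpha = {alpha_agree:.2f}")
print(f"disagreeing indicators: alpha = {alpha_disagree:.2f}")
```

Note that the agreeing indicators are not identical to one another; alpha is high because they rank the cases in nearly the same way.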

Table 2.6, which presents hypothetical data on indicators purporting to measure a country’s level of development, provides a simplified illustration of three patterns of agreement among multiple indicators. Although the data are fictitious, the indicators might be thought of as Gross Domestic Product, Per Capita National Income, Percentage of the Population below the Poverty Line, the Level of Unemployment, or other potentially reliable indicators of national development. The table illustrates the following three patterns.

  • Pattern A indicates strong inter-indicator agreement and hence a high degree of reliability. Even though the ratings are not completely identical, the correlations among them are very strong. Each of the indicators can be used with confidence that it is a reliable measure. Its reliability has been demonstrated by its agreement with other indicators. The items can also be combined to form a scale or index, which, again, can be used with confidence that it is a reliable measure.

  • Pattern B indicates the absence of inter-indicator agreement and hence a low level of reliability. It is possible that one of the indicators is a reliable measure of national development, but there is no basis for determining which item is the reliable measure. It is also possible that all are reliable measures but of different dimensions of national development, meaning that the concept and the measure are not unidimensional and, hence, Pattern B does not provide evidence that any indicator or combination of indicators constitutes a reliable measure for the specific concept of concern.

  • Pattern C indicates strong inter-indicator agreement among three of the indicators (I-1, I-2, and I-4), and these three, but not the fourth (I-3), may be considered reliable measures and used in the ways described for Pattern A. In the absence of evidence that it is reliable, I-3 should not be used to measure the same concept, national development, that the other three indicators measure.

    Table 2.6 Patterns of agreement among hypothetical indicators of National Development and Reliability Implications
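
The kind of comparison described for Pattern C can be sketched with pairwise correlations. The ratings below are invented for illustration and are not the values in Table 2.6; only the indicator labels I-1 to I-4 follow the text, and the helper `pearson` is written out here so the example needs no external library.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical ratings of five countries on four indicators.
ratings = {
    "I-1": [1, 2, 3, 4, 5],
    "I-2": [1, 2, 3, 5, 5],
    "I-3": [4, 1, 5, 2, 3],  # constructed to disagree with the others
    "I-4": [2, 2, 3, 4, 5],
}

pairwise = {
    (a, b): pearson(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

In this constructed example every pair that excludes I-3 is strongly correlated, while every pair that includes I-3 is not, which is the situation the text describes as Pattern C.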


Validity asks whether, or to what degree, a measure actually measures what it purports to measure. A concern for validity is important whenever the concept or variable to be measured cannot be directly observed, and so the investigator in this case must use an indicator, rather than the concept or variable itself, to capture the relevant variance.

It is useful to think of validity as expressing the congruence between the conceptual definition and the operational definition of a concept or variable. A conceptual definition makes explicit what an investigator understands the concept to mean, and it is important that she provide a conceptual definition when the concept cannot be directly observed, is abstract, and might also be multi-dimensional. By contrast, if the concept or variable is familiar and there is a widely shared understanding of what it means, the investigator may make this the basis of her conceptual definition.

The operational definition makes explicit the way that the concept or variable will be measured. What indicator or indicators should an investigator use, and how exactly should she use them? Suppose, for example, that an investigator is designing a study in which country is the unit of analysis and the goal is to measure the degree to which each country is democratic. Her operational definition will spell out how she will capture the cross-country variance in degree of democracy. If you, Dear Reader, were the investigator, what would be your operational definition? What indicator or indicators would you use, and how would you use them?

A concern for validity often emerges when using Arab Barometer survey data. Suppose you wanted to rate or classify individuals with respect to tolerance and interpersonal trust. After offering conceptual definitions of tolerance and interpersonal trust, what would be your operational definitions? What item or items would you feel comfortable treating as indicators of each concept and would you, therefore, include in your questionnaire or survey instrument?

It is important to make clear that validity is not about how well the variance is captured by an operational definition. That is an important concern and one by which the quality of a measure is judged, as discussed in the next section. Validity asks not how much of the variance is captured but, rather, whether the variance that is captured, however complete or incomplete it may be, actually pertains to the concept specified in your conceptual definition.

The standardized tests used to evaluate and classify students are often mentioned to illustrate this point. Do intelligence tests really measure intelligence, or do they rather measure something else—perhaps being the oldest child, perhaps income, perhaps something else? Do university entrance exams, the tawjihi, for example, really measure what they purport to measure: the likelihood of success at university? Or do they again measure something else—perhaps growing up in a middle class household? You might find it useful to construct your own operational definition of the concept “likelihood of success at university.” What indicators would you use, and how would you use them to construct a measure that would give a rating or score to each student?

An exercise with Arab Barometer survey data provides another illustration, and one in which the importance of the conceptual definition is also demonstrated. Suppose that the variable to be measured is satisfaction with the government, and the goal is to rate each respondent on a five-point scale ranging from 1 = no satisfaction at all to 5 = very high satisfaction. Do you think, for example, that a question about government corruption is a valid indicator—an indicator of the concept as it is defined in your conceptual definition? What about a question that asks, “Do you think government officials care more about being re-elected than solving important problems for the country’s citizens?” What question would you write to attempt to measure the concept of government satisfaction?

Once you have specified your measurement strategy, or operational definition, it may be necessary, or at least very advisable, to offer evidence that your measure is valid—that you can use it to measure a concept with confidence and that it does actually measure that concept. Unlike reliability, which can be demonstrated, validity must be inferred. An investigator will state why it is very likely that the measure does indeed measure the concept it purports to measure, and when appropriate, she will offer evidence or reasoning in support of this assertion.

Face Validity Sometimes, asserting “face validity” is sufficient to establish validity and to persuade consumers of the findings produced by a research project that an operational definition does indeed measure what it purports to measure. This may be the case if there is an apparently very good fit between the conceptual definition and the operational definition of a particular concept or variable. In many cases, however, face validity may not be evident and the assertion of face validity by an investigator is unlikely to be persuasive. Below are brief descriptions of the ways an investigator can support her assertion that a measure is valid. Although different, each involves some sort of comparison.

Construct Validity The measure may be considered valid if it is related to the same phenomena, and in the same way, as the concept being measured is known to be. For example, if an investigator conducting a survey seeks to measure interpersonal trust, and if it is known that interpersonal trust is related to personal efficacy, construct validity can be demonstrated by a significant correlation between the investigator’s measure of interpersonal trust and a measure of personal efficacy.

Criterion Validity Also sometimes known as Predictive Validity. The measure may be considered valid if there is a significant correlation between the results of an investigator’s operational definition and a distinct, established, and commonly used measure of the concept or variable the investigator seeks to measure. For example, an investigator using aggregated survey data to classify countries with respect to democracy might assess validity by comparing her country ratings with those provided by Freedom House.

Known Groups The measure may be considered valid if it correctly differentiates between groups that are known to differ with respect to the concept or variable being measured. For example, evidence that the tawjihi examination is a valid measure of likelihood of success at university would be provided if university students currently doing well at university have higher exam scores than university students currently doing less well at university.
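
A known-groups check of this kind might be sketched as follows. The exam scores are invented, and Cohen's d (the difference in group means divided by the pooled standard deviation) is our choice of summary rather than one prescribed in the text; a formal significance test could be used instead.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Standardized difference between two group means (pooled standard deviation)."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_var)


# Hypothetical entrance-exam scores, grouped by current standing at university.
doing_well = [88, 92, 79, 85, 90, 84]
doing_less_well = [70, 82, 75, 68, 77, 73]

gap = mean(doing_well) - mean(doing_less_well)
effect = cohens_d(doing_well, doing_less_well)
print(f"mean gap = {gap:.1f} points, Cohen's d = {effect:.2f}")
```

A clear gap in the expected direction is consistent with validity; a gap near zero, or one in the wrong direction, would count against it.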

Inter-Indicator Agreement Inter-indicator agreement builds on the discussion of reliability, particularly on the significance of unidimensionality and the patterns of inter-indicator agreement shown in Table 2.6. If each indicator in a battery of indicators has face validity, and if each one agrees with each of the others, it is very unlikely that they measure something other than the concept or variable they purport to measure.

As noted in the discussion of reliability, various statistical procedures, including Cronbach’s alpha and factor analysis, can be used to assess the degree to which indicators are inter-correlated and, therefore, taken together, constitute a unidimensional and reliable measure. And if different indicators possessing face validity all reliably measure the same concept or variable, it is reasonable to infer that this concept or variable is indeed the one the investigator seeks to measure and is, therefore, valid as well as reliable.

Content Validity Content validity refers to whether, or to what degree, an operational definition captures the full range of the concept’s meaning and variance. As discussed in the next section, using multiple indicators increases the likelihood that a measure will possess content validity.

Exercise 2.2. Inter-Indicator Agreement and Reliability and Validity

Which of the following items from Arab Barometer surveys do you think would be most likely to be reliable and valid indicators of support for gender equality? Briefly state the reasons you have selected these particular items, or all of the items, if that is what you chose. Then, referring to the patterns of inter-indicator agreement shown in Table 2.6, describe the pattern that you think the items you have selected would resemble. Finally, describe and explain the implications for reliability and validity of the pattern of inter-indicator agreement that you think the items you have selected would resemble.

  1. Do you think it is important for girls to go to high school?

  2. A married woman can work outside the home, if she wishes

  3. It is acceptable for a woman to be a member of parliament

  4. A university education is more important for a boy than a girl

  5. Men and women should have equal job opportunities and wages

  6. Women have the right to get divorced upon their request

  7. A woman can be president or prime minister of a Muslim country

  8. A woman should cease to work outside the home after marriage in order to devote full time to home and family

  9. A woman can travel abroad by herself, if she wishes

  10. On the whole, men make better political leaders than women

2.3.5 Capturing Variance

Although reliability and validity are recognized and widely-used criteria for assessing the quality of a measure in social science (and other) research, there is an additional criterion that is important as well. This is the degree to which a measure captures all, as opposed to only some, of the variance associated with the concept that an investigator seeks to measure. This can be described as the completeness of a measure.

If the variance of the concept to be measured is continuous, ranging from low to high or weak to strong, for example, a measure will be flawed if it does not capture the whole of the continuum, or at least the whole of that part of the continuum in which the investigator is interested. Such a measure may be reliable and valid, and in this sense of very good quality. But its utility may still be limited, or it may at least be less than maximally useful, if it captures only some of the variance.

A simplified and hypothetical example of a survey about religious involvement illustrates this point. If an investigator wishes to know how often a respondent attended Friday prayers at the mosque during the past year, her survey instrument should not ask a question like the following: “Over the last year, on average, did you pray at the mosque on Friday at least once a month?” A response of “No” will lump together respondents who never pray at the mosque on Friday and those who do so in two months out of three. A response of “Yes” will lump together those who attend Friday prayers once a month and those who do so every week.

In constructing the survey instrument, the investigator may have had good reason to ask a Yes-No question and make “once a month” the cutting point. But such a cutting point can be implemented during the data analysis phase, if needed, rather than asking the initial survey question in a manner that reduces much of the variance that characterizes the population being surveyed.
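
The point can be sketched in a few lines: if the survey records a count, any cutting point can be applied later, whereas a Yes-No question fixes one cutting point for good. The counts and the threshold of 12 Fridays per year (our reading of "on average, at least once a month") are assumptions made for this illustration.

```python
# Hypothetical answers to "On how many Fridays did you pray at the mosque
# during the past year?", one value per respondent.
fridays_attended = [0, 3, 8, 12, 26, 52]


def at_least(count, cutoff):
    """Apply a cutting point during analysis rather than in the questionnaire."""
    return count >= cutoff


# The analyst can dichotomize at "once a month" ...
monthly = [at_least(c, 12) for c in fridays_attended]
# ... and later try a different cutting point without re-running the survey.
weekly = [at_least(c, 52) for c in fridays_attended]

print("monthly or more:", monthly)
print("every week:     ", weekly)
print("distinct values in the raw data:", len(set(fridays_attended)))
print("distinct values after the Yes-No split:", len(set(monthly)))
```

Had only the Yes-No answer been recorded, the second cutting point could not be recovered from the data.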

A more realistic example, perhaps, would be survey questions that ask about age or income. Ideally, the investigator should ask respondents for their exact age and their exact monthly or annual income. Sometimes this is difficult or impossible, however; in the case of income, for example, the matter may be considered sensitive. If this is the case, the investigator may decide to use age and income categories, such as 18–25 years of age and 500–1000 dinars per month. Again, while the data obtained may be reliable and valid and also useful, not all of the variance that characterizes the population has been captured: individuals 19 years of age and individuals 24 years of age are treated as if they have the same age, so the variance in their actual ages is not captured. Even more variance would remain uncaptured by wider categories or categories with no lower or upper limit, such as 55 years of age or older and, say, 5000 or more dinars a month.
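
One way to see how much variance categories leave uncaptured is to compare the variance of exact ages with the variance that remains once each respondent is represented only by his or her category. In the sketch below the respondents are invented, and only the 18–25 and 55-or-older categories come from the text; the two middle categories are assumed for illustration.

```python
from statistics import mean, pvariance

# Age categories as (lowest age, highest age); None means no upper limit.
BANDS = [(18, 25), (26, 40), (41, 54), (55, None)]


def age_band(age):
    """Return the category into which an exact age falls."""
    for low, high in BANDS:
        if age >= low and (high is None or age <= high):
            return (low, high)
    raise ValueError(f"age {age} falls outside the survey's categories")


ages = [19, 24, 31, 47, 58, 83]  # hypothetical respondents
bands = [age_band(a) for a in ages]

# Represent each respondent by the mean age of his or her category, which is
# all that the categorized data can still distinguish.
band_means = {
    b: mean(a for a, ab in zip(ages, bands) if ab == b) for b in set(bands)
}
banded_ages = [band_means[b] for b in bands]

share_captured = pvariance(banded_ages) / pvariance(ages)
print(f"share of age variance captured by the categories: {share_captured:.2f}")
```

The share falls further as categories become wider, and open-ended categories such as 55 or older can hide especially large differences.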

The same concern arises when the data are categorical rather than continuous. Variance in this case refers to a range of types or kinds or categories. As is standard with categorizations, categories should be comprehensive and mutually exclusive, meaning that every member of a population can be assigned to one but only one category.
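
Whether a set of categories is comprehensive and mutually exclusive can be checked mechanically. The sketch below uses income brackets in whole dinars; both category schemes and the incomes are hypothetical, and the helper name `check_categories` is ours.

```python
def check_categories(categories, values):
    """Return the values that fall into no category or into more than one."""
    problems = {}
    for v in values:
        hits = [name for name, (low, high) in categories.items() if low <= v <= high]
        if len(hits) != 1:
            problems[v] = hits
    return problems


# Brackets that share boundary values and have no top category.
overlapping = {
    "under 500": (0, 500),
    "500-1000": (500, 1000),
    "1000-5000": (1000, 5000),
}
# Brackets revised so that each whole-dinar income fits exactly one category.
revised = {
    "under 500": (0, 499),
    "500-999": (500, 999),
    "1000-4999": (1000, 4999),
    "5000 or more": (5000, float("inf")),
}

incomes = [250, 500, 1000, 7500]  # hypothetical monthly incomes in dinars
problems_overlapping = check_categories(overlapping, incomes)
problems_revised = check_categories(revised, incomes)
print(problems_overlapping)
print(problems_revised)
```

In the first scheme, incomes of exactly 500 or 1000 dinars match two categories and an income of 7500 matches none; the revised scheme assigns every income to one and only one category.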

The challenge here is for the investigator to be knowledgeable about the array of one-and-only-one categories into which she wishes to sort the entities whose attributes she seeks to measure. And in principle, this means she must be knowledgeable about the actual, real-world variance, as well as about the categories relevant for her particular study. With respect to religious affiliation, for example, asking people in Lebanon whether they are Muslim or Christian would leave a great deal of variance uncaptured since there are important subdivisions within each category. Asking only about Muslims and Christians would therefore be appropriate only if the researcher is aware of the subdivisions within each category and has explicitly determined that her project does not require attention to these subdivisions.

There are numerous examples that involve a unit of analysis other than the individual. Consider, for example, a study in several Arab countries in which non-governmental organization, NGO, is the unit of analysis, and the variable of interest is NGO type. The investigator seeks, in other words, to prepare a distribution of NGO types, perhaps to see if the distribution differs from country to country. In this case, the investigator must decide on the categories of NGO type that she will use, and these categories, taken together, must be such that each NGO can be assigned to one and only one category.

To make it easier to assign each NGO to a category of NGO type, the investigator might be inclined to define NGO type very broadly, such as economic, sociocultural, and political NGOs. But this will, again, leave much of the variance uncaptured. Assigning NGOs to the “sociocultural” NGO type category, for example, will group together NGOs that may actually differ from one another with respect to objectives and strategies and perhaps in other ways as well. The investigator must be aware of these within-NGO type differences and make an informed decision about their relevance for her study.

Finally, there is an additional and somewhat different way in which an investigator needs to think about the variance that will and will not be captured, and this concerns the dimensional structure of the concept to be measured. For example, the United Nations has developed an index of human development and it annually gives each of the world’s countries a numerical HDI score. Investigators can use the index provided by the UN if it measures a concept that is relevant for their studies. But the HDI is based on a formula that combines a country’s situation with respect to health, education, and income, and countries with an identical HDI score are not necessarily the same with respect to the three elements. One country’s HDI may be driven up by the excellent quality of its educational system, whereas it might be the excellent quality of its health care system that is driving up the HDI score of another country.

Does this mean that the investigator should abandon the HDI and instead include separate measures of education, income, and health in her analyses? Of course, it depends on the goals of the study. But investigators seeking to measure concepts with multiple dimensions or multiple elements must be aware of these differences and then, in light of the goals of each specific research project, make informed decisions about whether the variance these differences represent does or does not need to be captured.
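
The HDI example can be made concrete with a small sketch. It assumes that the index is the geometric mean of three dimension indices, each scaled from 0 to 1, which is how the UN's technical notes have described the calculation since 2010; the two countries and their dimension scores are invented.

```python
def composite_index(health, education, income):
    """Geometric mean of three dimension indices, each between 0 and 1."""
    return (health * education * income) ** (1 / 3)


# Two hypothetical countries with different profiles.
country_a = {"health": 0.90, "education": 0.60, "income": 0.80}
country_b = {"health": 0.60, "education": 0.90, "income": 0.80}

score_a = composite_index(**country_a)
score_b = composite_index(**country_b)

print(f"Country A: {score_a:.3f}")
print(f"Country B: {score_b:.3f}")
print("same score:", abs(score_a - score_b) < 1e-12)
print("same profile:", country_a == country_b)
```

The two countries receive the same composite score even though one is strong in health and the other in education, so an analysis that uses only the composite cannot tell them apart.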

Other examples, ones in which the individual is the unit of analysis, remind us that attitudes and behavior may also have multiple dimensions or components, and that a researcher must again decide, therefore, whether her investigation will be best served by considering the dimensions separately or by constructing an index that combines them.

Attitudes about immigration, for example, probably have an economic, a cultural, and perhaps a political dimension. Similarly, the important concept of trust has multiple components, including general interpersonal trust, trust in important political institutions, trust in people who belong to a different religion, etc. In cases such as these, the researcher will likely want to ask about each of these dimensions or components. The use of multiple questions will enable the researcher to capture more of the variance associated with attitudes toward immigration or the concept of trust. It will remain, however, for the investigator to decide whether to consider the various elements separately or construct an index that considers them in combination with one another.

The concepts and procedures discussed in this chapter focus on description, on taking variables one at a time. But while the objectives of a positivist social science research project might be descriptive, and this might well produce valuable information and insight, familiarity with the topics discussed in the present chapter is necessary not only, or even primarily, for investigators with descriptive objectives. An understanding of many of these topics is essential for investigators who seek to explain as well as describe variance. This is not the case for every topic considered. Descriptive statistics and visual descriptions are, as their name indicates, for descriptions, for variables taken one at a time. But most of the other concepts and procedures are building blocks, or points of departure, for research endeavors that seek to explain and, toward this end, carry out bivariate and multivariate analyses. Accordingly, readers of Chaps. 3 and 4 will want to keep in mind, and may occasionally find it helpful to refer back to, the material covered in the present chapter.