1 The Atomic Model

This chapter examines the role of the Rasch (1960) model for dichotomous data from the perspective of first principles concerning the measurement of psychometric constructs. The chapter therefore begins with an atomic model for a single, dichotomous item. The descriptive term atomic implies that single-item models appear as basic units in more elaborate so-called molecular models that incorporate multiple items. Typically, one of the two outcomes for the dichotomous item is the favored or successful response, and this outcome is denoted as 1. The atomic model defines a construct on persons by specifying that each person has a probability of responding 1 to the item, and this probability varies from person to person. This success probability can, if we choose, be taken as the person parameter, that is, the true measure of a person on the construct defined by the atomic model. The variability from person to person is a sine qua non for the construct-defining property of the atomic model. A model for the tossing of a coin, in which each person has the same probability of obtaining the favored outcome (i.e., heads), does not define a construct on persons (Wood, 1978).

Two important observations can be made about the atomic model and its associated construct: (1) the quantity that represents the construct is a latent variable (i.e., it is not directly observable as data); and (2) the essential character of this latent variable is ordinal. We consider the implications of both of these observations in some detail, beginning with the latter.

The ordinal character referred to in the second observation above does not imply that the atomic model has only order relationships among the objects of measurement (i.e., persons). The success probabilities are a valid numerical representation of the construct. Ordinal character instead implies that any monotonic transformation of the success probabilities would be as valid a representation of the construct as the success probabilities themselves. Thus, the probabilities could be converted to logits or probits, perhaps followed by multiplicative rescaling. Transforming the probabilities from an atomic model into logits yields a Rasch model with the difficulty for the single item set to zero. Setting the difficulty parameter to something other than zero in the Rasch model for a single item produces an alternative monotonic transformation. If that transformation is followed by a multiplicative rescaling, the result is the two-parameter model for a single item with specified location and discrimination parameters (Birnbaum, 1968). Because of the ordinal character of the construct obtained from the atomic model, there is no inherent reason to prefer any particular monotonic transformation of the probabilities over any other. Within the confines of the atomic model, the choice of a numerical structure to represent the order relationships is a matter of personal preference.
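
To make the ordinal-equivalence point concrete, the following sketch (illustrative Python, with hypothetical probability values and an arbitrary location and scaling) applies several monotonic transformations to the same set of success probabilities and confirms that each induces the same ordering of persons.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical success probabilities for five persons on a single dichotomous item.
p = np.array([0.15, 0.30, 0.50, 0.70, 0.90])

logits = np.log(p / (1 - p))       # logit transform: Rasch model with item difficulty fixed at 0
probits = norm.ppf(p)              # probit transform
two_param = 0.5 + logits / 1.2     # shifted and rescaled logits: a two-parameter reparameterization

# Every strictly monotonic transformation induces the same ordering of persons,
# which is all the atomic model can pin down about the construct.
for scores in (logits, probits, two_param):
    assert np.array_equal(np.argsort(scores), np.argsort(p))
```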

The first of the two important observations has critical implications for measurement. When the objective is measurement, it is not enough to define the construct in terms of a latent variable. The sine qua non for measurement is the ability to use observable data to discriminate reliably between objects that differ on the construct by an amount large enough to matter. In almost all situations, the one bit of information produced by the observable response to a single dichotomous item does not yield the reliable discrimination required for measurement. Reliable discrimination requires replication, and that in turn requires we broaden the atomic model to accommodate more items.

Some of the previously introduced concepts (e.g., a construct difference that is considered large enough to matter) as well as concepts to be introduced subsequently are context-dependent. It is helpful to have a prototype example to which to refer when discussing these context-dependent concepts. As such an example, consider an item from a reading test in which a passage is presented followed by a task, scored as correct or incorrect, to assess the reader's understanding of the passage. The construct is reader ability. We assume that better readers have higher probabilities of succeeding at the assessment task.

The inadequacy of the discrimination obtainable from a single dichotomous item is easy to see in this example. The difference between a reader with a 40% chance of success and a reader with an 80% chance of success is almost certainly large enough to matter. There is a less than even chance of successful detection of this difference with the single item. Hence, replication is needed.

The purpose of replication is to generate new information that can be used to estimate the value of the latent variable with less uncertainty. In many other dichotomous experiments (e.g., coin-tossing), the experiment can be repeated and the second outcome assumed to be independent of the first, but the assumption of independence for a repeat encounter is not appropriate in the current context. If a reader is presented again with the same passage and the same assessment task, the same outcome is virtually assured to occur. When the second outcome is independent of the first, it provides new information that can be used to reduce uncertainty about the latent variable; when the second outcome is perfectly predictable from the first, no new information is produced. This dependence problem can be overcome by introducing a new passage and task as the replication, but in this situation, the single-item atomic model is no longer adequate. There are now at least two atoms to consider. Continuing the metaphor, a model that represents two or more dichotomous items will be called molecular.

2 Molecular Models

With two dichotomous items, call them Item A and Item B, each could be represented by its own atomic model. Thus, each person has two probabilities, pA and pB, that represent the probability of a correct response on Item A and Item B, respectively. These two latent variables could potentially represent two distinct constructs. A third possible construct, represented by the latent variable equal to the sum of the two probabilities, comes readily to mind.

The replication needed to achieve measurement requires multiple items whose atomic models represent the same construct. What conditions must be imposed on the molecular model to ensure that its constituent atomic models define the same construct? The answer lies in the essential ordinal character of the construct obtained from the atomic model.

In the two-atom molecular model, the set of success probabilities for Item A generates a rank ordering of persons that allows the possibility of ties. A similar statement applies to Item B. The two atomic models define the same construct if and only if the success probabilities for the two items generate the same rank ordering of persons, including ties. A set of two or more items whose atomic models define the same construct is said to satisfy the unidimensionality condition. It is worth noting that unidimensionality may depend on the population of persons. A set of items that satisfies unidimensionality for a given population may not be unidimensional when the population is extended.

When the unidimensionality condition is satisfied for a two-atom molecular model, the constructs derived from the two atomic models and the construct obtained from the sum of the two success probabilities are all identical. The latent variables that arise from the two atomic models are related by a strictly monotonic transformation. The proof of this assertion is straightforward. Define a function h as follows:

If a person has success probabilities pA and pB, then h(pA) = pB.

The preservation of ties implies that h is unambiguously defined. The preservation of rank order implies that h is strictly monotonic. The ordinal character of the construct implies that any strictly monotonic transformation of a valid numerical representation is another valid numerical representation of the same construct.
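
A minimal sketch of the unidimensionality check, using hypothetical success probabilities for Items A and B, illustrates both the comparison of rank orderings (ties included) and the resulting mapping h.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical success probabilities for six persons on Items A and B.
p_A = np.array([0.20, 0.35, 0.35, 0.60, 0.80, 0.95])
p_B = np.array([0.10, 0.25, 0.25, 0.50, 0.75, 0.90])

# Unidimensionality: the two items rank the persons identically, ties included.
same_ordering = np.array_equal(rankdata(p_A), rankdata(p_B))
print(same_ordering)   # True: the two atomic models define the same construct

# The mapping h with h(p_A) = p_B is then unambiguous and strictly increasing.
h = dict(zip(p_A, p_B))
print(sorted(h.items()))
```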

More than two items of replication are generally necessary to achieve the measurement objective. A multi-item instrument can provide adequate replication if the items satisfy the unidimensionality condition (Stout, 1990). The question is whether unidimensionality is a reasonable assumption in a given context. For an answer in the context of the prototype example, consider a test of reader ability that consists of 40 dichotomous items. The assumption of unidimensionality asserts that examinees can be ordered by reading ability such that for every item on the test, more able readers have higher success probabilities than less able readers.

3 Parameterizations

As mentioned above, any monotonic transformation of the success probabilities in an atomic model can be used as a numerical representation of the person parameter for the construct defined by that model. Let θ = θ(p) be the result of applying a monotonic transformation to a success probability p. The change from p to θ can be regarded as a reparameterization of the atomic model. The inverse function p = p(θ), which is also monotonic, is called the item characteristic function. Note that the form of the item characteristic curve depends on the reparameterization, and the selection of the reparameterizing monotonic transformation is arbitrary. If success probabilities are converted to logits, the item characteristic curve will be the logistic ogive. If success probabilities are converted to probits, the item characteristic curve will be the normal ogive.
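
The dependence of the item characteristic curve on the chosen reparameterization can be illustrated with a short sketch (hypothetical values; the logistic and normal distribution functions from scipy stand in for the inverses of the logit and probit transformations).

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit

# The item characteristic function is the inverse of the chosen reparameterization.
theta = np.linspace(-3, 3, 7)

p_logistic = expit(theta)    # theta defined as the logit of p:  logistic ogive
p_normal = norm.cdf(theta)   # theta defined as the probit of p: normal ogive

# Same underlying ordinal construct, different curve shapes under different parameterizations.
print(np.round(p_logistic, 3))
print(np.round(p_normal, 3))
```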

In a molecular model that incorporates multiple items, each person has multiple success probabilities, one for each item in the model. If the model satisfies the unidimensionality condition, it unambiguously determines a single construct of ordinal character. When a parameterization of that construct is selected, each person can be mapped to the construct with a parametric value that determines all of that person's success probabilities.

For example, the construct from a unidimensional molecular model could be parameterized in terms of the success probabilities for a single canonical item, say Item A. For any other item in the model, say Item B, there is a monotonic function hB that satisfies pB = hB(pA) for all persons. For this parameterization, the item characteristic curve for Item A will be a straight line from the origin to the point (1, 1). For Item B, the item characteristic curve need not be a straight line but will start at the origin and increase monotonically until it reaches (1, 1). If instead values for a latent variable θ are obtained by transforming the success probabilities for Item A into logits or probits, the item characteristic curve pA(θ) for A will be the logistic or normal ogive, and Item B's characteristic curve will be hB(pA(θ)).
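
The following sketch illustrates this canonical-item parameterization with a hypothetical link function hB (the squaring function is purely illustrative); under the canonical parameterization Item A's curve is the identity line, and under a logit reparameterization both curves become functions of θ.

```python
import numpy as np
from scipy.special import expit

# Parameterize the construct by the success probability on a canonical Item A.
p_A = np.linspace(0.0, 1.0, 11)

# A hypothetical strictly increasing link h_B relating Item B to Item A (illustrative only).
def h_B(p):
    return p ** 2            # maps 0 to 0 and 1 to 1, so the curve runs from (0, 0) to (1, 1)

icc_A = p_A                  # identity: a straight line from the origin to (1, 1)
icc_B = h_B(p_A)             # monotone, not necessarily straight

# Under the alternative parameterization theta = logit(p_A), the curves become
# p_A(theta) = expit(theta) and h_B(p_A(theta)).
theta = np.linspace(-4, 4, 11)
icc_A_theta = expit(theta)
icc_B_theta = h_B(expit(theta))
print(np.round(icc_B_theta, 3))
```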

4 Unidimensionality, Replication, and Measurement

At this point it is appropriate to return to the prototype example to examine the progress toward the goal of achieving measurement. Two questions should be raised when a new item is included in the assessment instrument: (1) Is the new item on the construct? and (2) Does its inclusion provide new information about the person's location on the construct (i.e., the person parameter)? To achieve our goal of measurement, we need affirmative answers to both questions. The first question can be answered yes when the unidimensionality condition holds, and the second can be answered yes if the new item has the property of local independence with the other items in the instrument. Local independence occurs when the responses to two items by the same person are statistically independent. In a model in which the person parameter is not represented as a random variable, local independence is identical to statistical independence. If the person parameter is a random variable, as in some Bayesian models, local independence is a conditional independence, given the person parameter. In either case, the addition of new items with the local independence property produces replication that leads to a reduction in the standard error of measurement.
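
As a rough illustration of how locally independent replication reduces the standard error of measurement, the following sketch uses the logit parameterization with hypothetical item difficulties and sums the item information for tests of increasing length.

```python
import numpy as np
from scipy.special import expit

# A sketch of how locally independent, on-construct items reduce the standard error of
# measurement (SEM), using the logit parameterization and hypothetical item difficulties.
theta = 0.5
rng = np.random.default_rng(0)

for n_items in (5, 10, 20, 40):
    b = rng.uniform(-2, 2, size=n_items)    # hypothetical item difficulties
    p = expit(theta - b)                    # this person's success probabilities
    information = np.sum(p * (1 - p))       # test information adds up under local independence
    sem = 1.0 / np.sqrt(information)
    print(n_items, round(sem, 3))
```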

Achieving the measurement objective requires replication, but how much replication is required to achieve reliable discrimination between objects that differ by an amount large enough to matter? The answer depends on the context. The finer the differences to be discriminated and the higher the desired level of confidence in the discrimination, the more replication is needed. In the prototype example, if a valid reading test does not have sufficient reliability for the purpose at hand, it may be possible to find more items with affirmative answers to Questions #1 and #2 that could be appended to the test to increase the reliability to the desired level.
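
One common way to quantify how much additional replication is needed is the Spearman-Brown prophecy formula, which is not developed in this chapter but is sketched below with illustrative reliability values.

```python
# A back-of-the-envelope sketch using the Spearman-Brown prophecy formula (not from the
# chapter) with illustrative reliabilities, to show how the required replication grows
# with the target reliability.
current_reliability = 0.80
target_reliability = 0.90

# Lengthening factor k for parallel, on-construct items:
k = (target_reliability * (1 - current_reliability)) / (
    current_reliability * (1 - target_reliability))
print(round(k, 2))   # 2.25: a 40-item test would need roughly 90 comparable items
```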

At this stage, the unidimensional molecular model can be enhanced with a replication capability by assuming an inexhaustible supply of new on-construct items. Without making further assumptions, what are the measurement implications of this replication-enhanced, unidimensional model? The answer is that reliable discrimination can be achieved between two objects whose construct values differ, but only to the extent of determining the order relationship between the two objects. In the prototype example, suppose John and Alice have different reading abilities. If replication is available via an inexhaustible supply of reading items, but all that can be said of these items is that they have local independence and are on construct (the unidimensionality assumption), then a test that consists of a sufficient number of these items can determine who is the better reader but not by how much. Being on construct (i.e., ordering persons identically) is not a stringent enough assumption about items to yield any information about persons beyond their order relationships. Measurement on an interval scale requires more specificity in the assumptions about the atomic models for the items.
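
A small simulation sketch (hypothetical item probabilities) illustrates the point: with enough on-construct items the ordering of John and Alice is recovered reliably, but the size of the score gap remains tied to the arbitrary choice of items.

```python
import numpy as np

# A simulation sketch: with only unidimensionality and local independence, enough items
# settle who is the better reader, but the size of the score gap remains scale-dependent.
rng = np.random.default_rng(1)
n_items = 400

# Hypothetical item-level success probabilities; Alice exceeds John on every item.
p_john = rng.uniform(0.30, 0.50, size=n_items)
p_alice = p_john + 0.15

john_score = rng.binomial(1, p_john).sum()
alice_score = rng.binomial(1, p_alice).sum()
print(alice_score > john_score)   # with enough items, the ordering is detected reliably

# The numerical gap depends on the arbitrary item probabilities, so it says nothing
# about "how much better" on an interval scale.
print(alice_score - john_score)
```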

5 The Stringency Construct for Model Specifications

It is possible and often useful to contemplate a stringency construct defined for statistical models. Ordinal position on this construct is determined by the stringency of the assumptions incorporated into the statistical model. These assumptions are called the model's specification, and a so-called tight specification has more stringent assumptions than a loose one. The molecular model that asserts only that there is some encounter-specific probability of success associated with each encounter between a person and an item is a loose specification—the loosest under consideration. Incorporating local independence and unidimensionality tightens the specification but only enough to allow ordinal measurement even though the model is based on a numerical latent variable. The Rasch model is clearly an even tighter specification because it assumes more about the items' atomic models.

If we denote by ULI the model that assumes only unidimensionality and local independence, there is a stringency difference between ULI and the Rasch model. Two questions naturally arise: (1) What further tightening assumptions must be added to the ULI model to yield the Rasch model specification? and (2) Are there any important model specifications located between ULI and the Rasch model on the stringency construct?

The answer to the second question is yes. When the roles of persons and items are reversed in the molecular model, the success probabilities associated with a person can constitute an atomic model for a construct on items. There are as many such atomic models as there are persons. Application of the assumption of unidimensionality to these atomic models tightens the specification. Models can satisfy person unidimensionality without satisfying item unidimensionality. Models that satisfy both unidimensionality conditions are called doubly monotonic models. As the discussion in the next section demonstrates, the doubly monotonic model specification is tighter than person unidimensionality but looser than the Rasch model specification.

6 Doubly Monotonic Models

In the prototype example, it may happen that for each reader, the success probability is lower for Item B than for Item A. As a possible explanation for this feature, perhaps the passage in Item B presents more of a challenge to comprehension than does the passage for Item A (e.g., more complex syntax and more difficult vocabulary). This suggests the possibility that a text readability construct for items might be obtainable from a molecular model that incorporates multiple persons and items. So far, persons have been the objects of measurement and items have been regarded as instruments for measuring the person construct. These roles can be reversed. Consider a molecular model with multiple persons and items and a success probability for each encounter between a person and an item. There is an atomic model associated with each person, and each of these atomic models defines a construct on items. The unidimensionality condition will be satisfied if the rank order of items is consistent for all persons’ atomic models. In the prototype example, this unidimensionality of the persons’ atomic models implies that the text readability rank of two passages will not depend on who happens to be reading them.

A molecular model with multiple items and persons has two sets of atomic models: one set provides rankings of persons for each item and the other provides rankings of items for each person. If unidimensionality is satisfied for both sets, it is called a doubly monotonic model. If a doubly monotonic model is capable of replication, then ordinal measurement is enabled for persons if enough items are included and for items if enough persons are included.

When a molecular model has double monotonicity, the item characteristic curves do not cross regardless of the parameterization. Conversely, if a unidimensional model does not have double monotonicity, the item characteristic curves will be monotonic, but at least one pair of curves will cross. In a doubly monotonic model, the person characteristic curves also do not cross. If the atomic models for persons define a unidimensional construct on items, the person characteristic curves will be monotonic. If there is any crossing of these person characteristic curves, the molecular model will not define a unidimensional construct on persons.
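
For a finite array of success probabilities, double monotonicity can be checked directly, as in the following sketch with a hypothetical persons-by-items array.

```python
import numpy as np

# A hypothetical persons-by-items array of success probabilities, with persons listed
# from least to most able on Item 1.
P = np.array([
    [0.20, 0.15, 0.10],
    [0.40, 0.30, 0.25],
    [0.70, 0.60, 0.50],
    [0.90, 0.85, 0.80],
])

# Person unidimensionality: every column increases down the rows, so every item orders
# the persons identically and the item characteristic curves do not cross.
person_unidim = bool(np.all(np.diff(P, axis=0) > 0))

# Item unidimensionality: every row decreases left to right, so every person orders
# the items identically and the person characteristic curves do not cross.
item_unidim = bool(np.all(np.diff(P, axis=1) < 0))

print(person_unidim and item_unidim)   # True: the array is consistent with double monotonicity
```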

7 Numerical Conjoint Measurement Models

The Rasch model is an example of a numerical conjoint measurement model. With the Rasch model, when the success probabilities are transformed to logits, the result is an array of numbers that can be expressed as differences between a person parameter that does not change from item to item and an item parameter that does not change from person to person. In other words, the interaction in the array of success probabilities can be removed by transforming them to logits. This feature, existence of a monotonic transformation to remove the interaction from the array of success probabilities, characterizes a numerical conjoint measurement model. A numerical conjoint measurement model is necessarily doubly monotonic.
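
A brief sketch with hypothetical person and item values illustrates the interaction-removal property: a Rasch-type array of success probabilities, once transformed to logits, decomposes additively into person and item terms.

```python
import numpy as np
from scipy.special import expit

# Hypothetical person and item parameters on the logit scale.
theta = np.array([-1.0, 0.0, 0.5, 1.5])     # persons
b = np.array([-0.5, 0.2, 1.0])              # items

# Rasch-type persons-by-items array of success probabilities.
P = expit(theta[:, None] - b[None, :])

# Transforming the probabilities to logits removes the person-by-item interaction:
logits = np.log(P / (1 - P))
print(np.allclose(logits, theta[:, None] - b[None, :]))   # True: additive, interaction-free

# Parallelism: each item's logit column differs from every other by a constant shift.
print(np.round(logits[:, 0] - logits[:, 1], 6))            # constant across persons
```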

The absence of interaction means that the transformed success probabilities can be expressed as differences between a latent variable for persons and a latent variable for items. Consequently, the constructs for persons and items (e.g., reader ability and text readability in the prototype example) can be expressed on a common interval scale. The absence of interaction also implies that item and person characteristic curves are horizontally parallel, that is, any item characteristic curve can be transformed to any other by moving it either left or right. A similar statement applies to person characteristic curves. The parallelism implies a trade-off: a given difference between two reader measures can be exchanged for an identical difference between two text measures while holding the success probability (i.e., the comprehension rate) constant.

The Rasch model specification is not the only one with the numerical conjoint measurement property. For every numerical conjoint measurement model, there is a monotonic transformation that removes the interaction from the array of success probabilities. An important feature of the logistic transformation, and thus a feature unique to the Rasch model, is the property that the raw scores for persons and the item p values are sufficient statistics for estimating the respective latent variables.

This is not to say that the Rasch model is an instantiation of the theory of conjoint measurement (Luce & Tukey, 1964; Krantz, Luce, Suppes, & Tversky, 1971). The Rasch model is not concerned with the ordinal and equivalence relations necessary and sufficient for additive representation (i.e., those entailed by the hierarchy of cancellation axioms; Scott, 1964).

8 The Score Sufficiency Condition and Its Implications

Sufficiency is an important technical term in the language of statistical inference. It is especially important in the current context because of its implications with respect to the Rasch model, which is unique in having the property that the raw score—the total number of correct responses—is a sufficient statistic for estimation of the person parameter. Raw score sufficiency means that, once we know the total number of correct responses, we can learn nothing more about the person parameter from the response pattern.

The precise statement of this result is as follows. Assume that the multi-item, multi-person molecular model is unidimensional for persons and that local independence holds. If, in addition, the raw score is a sufficient statistic for the person parameter, then the molecular model is necessarily a Rasch model. In other words, in the context of a unidimensional molecular model with local independence, the reparameterization obtained by transforming the success probabilities to logits produces horizontally parallel item characteristic curves.

The proof is rather straightforward. Let Item A and Item B represent two arbitrarily selected items, and let pA(θ) and pB(θ) denote the success probabilities for these items expressed as an arbitrary monotonic function of a person parameter θ. Raw score sufficiency implies that the conditional probability distribution of response patterns with the same raw score does not depend on the person parameter. Consider a response pattern in which Item A is answered correctly but Item B is not, in comparison to a pattern in which the only change is to reverse these two responses (i.e., Item B correct, Item A not). These two patterns have the same raw score. The ratio of the probabilities of these patterns in the conditional distribution is the same as their ratio in the unconditional distribution.

If this ratio is denoted by R, it is defined by the equation:

$$ p_{A} \left( 1 - p_{B} \right) = R \left( 1 - p_{A} \right) p_{B} , \tag{1} $$

where the common factors for the responses to other items have been canceled out. Dividing this equation by (1 - pA)(1 - pB) and taking logarithms yields:

$$ \ln\frac{p_{A}}{1 - p_{A}} = \ln\frac{p_{B}}{1 - p_{B}} + \ln R . $$

Because the ratio R applies to the conditional as well as to the unconditional distribution and the unconditional distribution is independent of the person parameter, R cannot depend on the person parameter θ. This implies that the item characteristic curves are horizontally parallel when the success probabilities are transformed to logits. That is the defining characteristic of the Rasch model.
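
The argument can be verified numerically. In the sketch below (hypothetical item parameters), the pattern-probability ratio R is constant in θ when the two items share a common discrimination, as the Rasch model requires, and varies with θ when the discriminations differ, so that sufficiency of the raw score fails.

```python
import numpy as np
from scipy.special import expit

def pattern_ratio(theta, a_A=1.0, a_B=1.0, b_A=-0.3, b_B=0.8):
    """Ratio R of P(A correct, B incorrect) to P(A incorrect, B correct) at ability theta."""
    p_A = expit(a_A * (theta - b_A))
    p_B = expit(a_B * (theta - b_B))
    return (p_A * (1 - p_B)) / ((1 - p_A) * p_B)

thetas = np.array([-2.0, 0.0, 2.0])

# Rasch case (equal discriminations): R is the same at every theta, so the raw score
# exhausts the information about the person parameter.
print(np.round(pattern_ratio(thetas), 4))

# Non-Rasch case (unequal discriminations): R varies with theta, so the response
# pattern carries information beyond the raw score.
print(np.round(pattern_ratio(thetas, a_A=1.5, a_B=0.7), 4))
```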

9 Tightening via Theory

Although the Rasch model, in which person abilities and item difficulties are parameters to be estimated from data, is the tightest model yet considered, further tightening of the model specification is possible and desirable. As an alternative to a data-based empirical approach for the estimation of item difficulties, a theory might be used to predict an item's difficulty from characteristics of the item. In the prototype example, a readability formula for the text passage might be used as a predictor of item difficulty.

Tightening a model's specifications is, however, a two-edged sword. This process increases the number of ways in which a model can be wrong, which can hamper a model's usefulness. On the other hand, tightening can enhance a model's capability for measurement, which adds value to the model. What then is the enhanced capability afforded by a theory of item difficulty? For an answer to this question, consider the prototype example and the enhancement that occurs at various stages of tightening the model specification.

In the prototype example, the assumption of unidimensionality under replication allows the comparison between John and Alice as to who is the better reader. That specification is too loose to allow a data-based answer to the question of how much better. The Rasch model specification with person and item parameters to be estimated is tighter still, and with the assumption of adequate replication for both items and persons, this model can answer the question of how much better Alice is than John. The answer to this question does not change when other items that satisfy the specification are used instead of the original items. The capability of providing a measure of the difference in reading ability between John and Alice independent of the items (qua instrument) used to effect the measurement is called specific objectivity (Rasch, 1977).

How well does Alice read? This is a question about Alice's reading ability apart from any comparison with the reading ability of John or anyone else. Suppose the only data available from which to infer an answer to this question are Alice's responses to a set of dichotomous reading items. The Rasch model specification with undetermined item difficulties does not have the capability to provide an answer. When the Rasch model is tightened by using theory to determine item difficulties, the question can be answered using only the data from Alice's responses. The enhancement provided by the theory-based determination of item difficulties is substantial. Alice now has an absolute measure on the reading ability construct, as distinct from a measure relative to another person, and this measure is independent of the items (qua instrument) used to obtain it.
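
A sketch of this absolute measurement is given below: with hypothetical theory-based item difficulties treated as known, Alice's measure is obtained from her responses alone by maximizing the Rasch likelihood.

```python
import numpy as np
from scipy.special import expit
from scipy.optimize import minimize_scalar

# A sketch of answering "How well does Alice read?" from her responses alone, assuming
# hypothetical theory-based item difficulties (in logits) are already known.
b = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5, 2.0])
responses = np.array([1, 1, 1, 0, 1, 0, 0])          # Alice's hypothetical item responses

def negative_log_likelihood(theta):
    p = expit(theta - b)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

theta_hat = minimize_scalar(negative_log_likelihood, bounds=(-6, 6), method="bounded").x
print(round(theta_hat, 3))   # Alice's absolute measure on the theory-anchored scale
```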

The use of substantive theory in the form of a construct specification equation (Stenner et al., 1983) adds stringency to the model specification. The specification equation has several other uses. First, it explains the variation detected by an instrument: the specification equation includes just those features of the measurement context that cause variation in success probabilities. In the prototype example, the construct theory states that as we move up the scale, we will encounter text that places higher syntactic and semantic demands on the reader, and the specification equation includes proxies for these two text features. As these text features are manipulated, the theory predicts changes in the observed item difficulties. Some argue that there is no more compelling validity evidence than causal control over the variation an instrument detects. Second, the specification equation brings nontest behavior into the measurement frame of reference. In the Lexile Framework for Reading, books are imagined to be tests with theoretical calibrations provided by the specification equation. The Rasch model is solved for the reader ability given an arbitrary but useful relative raw score of 75% and the theoretical item calibrations. The resulting reader measure required to answer correctly 75% of the virtual test items is the text readability measure assigned to the book. Thus, an important nontest behavior, comprehension of a particular book by a particular reader, can be forecasted. Third, the specification equation can generate item calibrations for reading items that have been built by a software program. This application enables one-off instruments to be used with each examinee and then disposed of, as with disposable (single-use) thermometers. The specification equation and item engineering rules maintain the unit from instrument to instrument (Stenner et al., 2006).
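
The second use can be sketched numerically: given hypothetical theoretical calibrations for a book's virtual test items, the Rasch model is solved for the reader ability at which the expected success rate equals 75%.

```python
import numpy as np
from scipy.special import expit
from scipy.optimize import brentq

# Hypothetical theoretical calibrations (logits) for the "virtual test" built from a
# book's text, as might be produced by a construct specification equation.
b = np.array([-0.6, -0.2, 0.1, 0.4, 0.9, 1.3])

# Solve the Rasch model for the reader ability at which expected success is 75%.
def expected_success_gap(theta):
    return expit(theta - b).mean() - 0.75

theta_75 = brentq(expected_success_gap, -6, 6)
print(round(theta_75, 3))   # the text readability measure assigned to the book (in logits)
```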

10 Applying the Framework

There are important issues to consider when formulating and evaluating models for the purpose of effecting measurement from dichotomous data. Dimensionality, differential item functioning, and the number of parameters needed to represent item characteristics are examples of such issues. The developments presented in this chapter can provide a framework for addressing these issues when we formulate and evaluate models.

As an example of the framework in action, suppose the items on a test of verbal ability have been organized into subtests labeled comprehension and vocabulary. Do these subtests measure two distinct person constructs, or is the full test unidimensional? The framework provides a basis for the conduct of data analysis to answer this question. At issue is whether the ordering of persons is the same for the two subtests. Unidimensionality within each subtest and the ability to replicate are the only assumptions needed for measurement to provide the answer. In practice, however, there could be a reason to tighten the specification. Observed between-subtest differences in the rankings of persons could occur despite overall unidimensionality as a result of measurement error. Assessing the statistical significance of the departure from overall unidimensionality may therefore require assumptions about the interrelationships among the items' atomic models that are more stringent than within-subtest unidimensionality.
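
A rough simulation sketch of such an analysis is given below; the data are generated from a single hypothetical construct, so the two subtest orderings of persons should agree up to measurement error.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import spearmanr

# A rough simulation sketch: if both subtests tap the same construct, the person
# orderings they produce should agree up to measurement error.
rng = np.random.default_rng(2)
ability = rng.normal(size=200)                                   # hypothetical common construct
comp = rng.binomial(1, expit(ability[:, None] - rng.uniform(-1, 1, 15)))
vocab = rng.binomial(1, expit(ability[:, None] - rng.uniform(-1, 1, 15)))

rho, _ = spearmanr(comp.sum(axis=1), vocab.sum(axis=1))
print(round(rho, 3))   # a high rank correlation is consistent with overall unidimensionality
```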

This is just one example. Other important questions can be similarly addressed from the perspective of the framework.

11 Summary and Conclusion

A typical psychometric model represents the encounter between a person and a dichotomous item as a probability. For a given item, the probabilities associated with persons define a construct that is ordinal in character, that is, the probabilities can be arbitrarily subjected to a monotonic transformation without changing the character of the construct. Although a construct may be defined by the model for a single item, the single-item model is insufficient for measurement, which requires replication, and replication requires multiple items that all define the same construct. Because single-item models are the basic building blocks for multiple-item models, it is natural to use the terms atomic and molecular for single-item and multiple-item models, respectively.

Molecular models involve assumptions about the relationships between the atomic models of the items, and these assumptions vary as to their stringency. This variation enables us to locate model specifications on an ordinal stringency construct where less stringent specifications are described as loose and more stringent ones as tight. Tighter specifications have less data-fitting flexibility but compensate with features that enhance their usefulness for measurement. The Rasch model is a tight specification that enables conjoint measurement. Although other, equally tight specifications enable conjoint measurement, the Rasch model is the only one for which the raw scores and item p-values provide all of the information that is relevant to the measurement of persons and items on the same scale. When the Rasch model is further tightened with item difficulties specified by theory, each person can be measured on an absolute scale with a measurement derived from single-use items.

For a given measurement application, the choice of model is likely to depend on particulars of the context. The concepts of atomic and molecular models and the location of a model specification on the stringency construct provide a framework that can help guide the choice of a model. The framework can also help to guide analyses of issues such as multidimensionality and differential item functioning that can threaten the validity of model-based inferences.