The map of a variable is the beginning and end of assessment. We must immediately add, however, that variable construction never ends: it is never complete but continues indefinitely. Variables require continuous attention for their development and maintenance. The map of a variable is a visual representation of the current status of variable construction, a picture of the “state of the art” in constructing that variable.

1 The Origin of Mapping

Maps are visual guides. They ground us in a stable frame of reference and give us a sense of direction. How frequently we use expressions of understanding that imply vision: “Do you see?”, “Now I see.”, “Show me what you mean.”, and “Put me in the picture.” These expressions testify to the visual power inherent in pictorial representations, a power conveyed even in speech and writing. Mapping visualizes the extent of our knowledge.

Maps are indispensable to planning and traveling. Map making has great utility. The inability to understand or make use of maps is a handicap to understanding the world.

The earliest maps used naturally occurring phenomena, celestial and terrestrial, to identify features. If we look at the sky on a starry night, we can use the “pointers” of the Big Dipper to locate Polaris, the pole star. Although both dippers move, they rotate around Polaris, which appears fixed and determines north. From this “fixed” star we orient ourselves to the points of the compass. More comprehensive maps of the heavens include the popular constellations of the Zodiac, lesser-known constellations, and other celestial features. The more celestial features we know, the better oriented we become on a starry night.

Terrestrial markers also serve to orient. A lake, a river, or a mountain may be used to anchor locations. Celestial and terrestrial maps have been used for centuries and are sometimes brought into relationship with one another. Today’s roadmaps are but a current update of the state of knowledge in local geography. A map is an analogy, an idea that pictures an abstraction. While the map may initially seem superficial, incomplete, or even inaccurate, it still serves a purpose. The map shows the current status of what is known about a domain.

Maps, by their very nature, invite improvement. Every edition of a map calls attention to its accuracy and inaccuracy. Each new edition incorporates changes from the previous one, resulting in a new and more accurate version.

Consider a map with the lines of longitude and latitude. This illustrates the benefits of superimposing an abstraction upon the natural contours of land and sea. Abstractions enhance maps by expediting generalization.

Natural reference points also aid explanation by serving as markers that ground our observations. The more natural reference points we can employ, the fewer the resulting errors. The wider apart the natural markers, the greater the possibility of error. Brown (1949) provides a comprehensive history of map building with numerous illustrations that record how maps have become increasingly accurate. Wilford (1981) has produced a similar but more recent history. Edward Tufte’s recent publications (1983, 1997) offer a panorama of useful visual strategies, together with his critique of how visual displays can either facilitate the interpretation of data or mislead.

The use of maps illustrates several important aspects:

  1. Maps are useful pictures of experience.

  2. Inaccuracies are successively and inevitably corrected.

  3. Abstractions, such as longitude and latitude, enhance mapping.

  4. More knowledge produces greater accuracy.

2 Maps of Variables

Mapping is a useful model for psychometrics because a map can serve as an abstraction of a variable. The variable implied by a test can first be pictured as a line (Wright & Stone, 1979, pp. 1–6), a line with direction indicated by an arrow. The variable is defined by items and persons, but other useful characteristics can also be incorporated on the map. Continuous improvement is irresistible: maps invite further correction, and the more information we gather about the variable, the more accurate our representation becomes. Finally, this pictorial representation of the variable invites further abstractions that generalize understanding.

Rudolf Carnap wrote:

The nineteenth-century model was not a model in this abstract sense (i.e., a mathematical model). It was intended to be a spatial model of a structure, in the same way that a model ship or airplane represents an actual ship or plane. Of course, the chemist does not think that molecules are made up of little colored balls held together by wires; there are many features of his model that are not to be taken literally. But, in general spatial configuration, it is regarded as a correct picture of the spatial configuration of atoms of the actual molecule. As has been shown, there are good reasons sometimes for taking such a model literally—a model of a solar system for example, or a crystal or molecule. Even when there are no grounds for such an interpretation, visual models can be extremely useful. The mind works intuitively, and it is often helpful for a scientist to think with the aid of visual pictures. At the same time, there must always be an awareness of the model’s limitations. The building of a neat visual model is no guarantee of a theory’s soundness, nor is the lack of a visual model an adequate reason to reject a theory. (Carnap, 1966, p. 176)

Carnap’s exposition shows the value of a map in providing pictures through which an intuitive idea can be conceptualized visually. He also cautions that maps are not substitutes for reality but pictures, and as such they cannot be interpreted literally.

3 Using Maps

There are three uses of maps:

  • To DIRECT… where we are planning to go,

  • To LOCATE… where we are, along the way, and

  • To RECORD… where we have been.

These three uses indicate that a map is the beginning and end of test construction (Stone, 1995). In the beginning stages, a map defines our intentions. At the end, it is a realization of progress to date. In between are markers along the way. Maps of variables are never finished because they invite constant correction and improvement. When maps embody abstractions derived from experience, they connect the world of the mind to the world of experience. Abstraction is validated by correspondence to experience, and experience is understood through abstraction.

Mapping illustrates the dialogue that must take place between these two worlds if they are to communicate constructively. A map is a visual, operational definition of a variable. While maps are necessarily only models, their pictorial representation invites continual correction, ever increasing their accuracy.

4 Graphs as Maps

Graphs of functions are maps showing the relationship between two variables. Graphs make it easy to see at a glance whether a useful function is emerging.

Graphs make functions recognizable and familiar. We recognize linearity in a straight line, and a parabola signals a quadratic relation. Other well-known functions, like the undulating curve of the sine, are easily recognized by their shape. The graphs of functions are maps as familiar to their users as roadmaps are to motorists. They aid understanding by simplifying complex relationships and allowing us to “see” them.

5 A Map Is an Analogy

Measurement is made by analogy. Our most efficient and utilitarian measures rely upon visual representation. The ruler, the watch face, the mercury column, and the dial are common analogies used to record length, time, temperature, and weight. The utilitarian success of analogy in these measuring tools is demonstrated by their ubiquity.

  • The “intended map” of the variable is the idea, plan, and best formulation of our intentions.

  • The “realized map” of the variable, made from item calibrations and person measures, implements the plan.

  • Continuous dialogue between intention (idea) and realization (data) produces and maintains the validity of the variable.

  • A “map of a variable” defines the scope and sequence of instruction because it shows how to sequence instruction and how to relate it to assessment.

  • Progress from instruction and resultant learning can be located on the variable. Growth can be seen and measured.

  • There are shortcomings to maps, especially evident if their use is “pushed” to extremes.

What writers like Carnap (1966) and Kaplan (1964) say about the shortcomings of models also applies to maps. We must be careful not to expect too much of a map or to ascribe more substance to what is produced than can be justified. Constant monitoring of map building is necessary. Monmonier’s (1996) book “How to Lie with Maps” presents, in a useful and amusing way, the fallacies that can result from viewing a map as a “finished product” rather than as a “fiction,” an approximation that is always in process and never completed. Braithwaite (1956) has also cautioned, “The price of the employment of models [maps] is eternal vigilance.”

Psychometric maps serve as the plan for instrument development and revision. The map of a variable is a blueprint for a test. When a map is logical and well constructed, implementing it as a set of ordered items can be straightforward.

Figure 1 is a flowchart of the steps in bringing a variable into existence. Its development is guided by a map of intention.

Fig. 1 Flowchart for variable construction

When the map is empirically verified, it documents a successful realization of an idea. The map of the variable pictures both the idea and its realization in the form of calibrated items and measured persons (Wright & Stone, 1996, Chap. 14). It embodies the construct validity of the instrument.

Figure 2 is the item/person map for the Knox Cube Test (Stone & Wright, 1980) generated by BIGSTEPS (Wright & Linacre, 1991 to date). This map, as well as others generated by WINSTEPS (Linacre, 1999), greatly assists the psychometrist in variable construction. However, it is simple maps like this one that make psychometric analysis understandable to content specialists and to others who are interested in the results but not in the methodology.
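The kind of item/person map shown in Fig. 2 can be roughed out in plain text. The sketch below is not BIGSTEPS or WINSTEPS output and makes no claim about how those programs draw their maps; it assumes only that person measures and item calibrations are already available in logits. The function name `item_person_map`, the band width `step`, and the example values are hypothetical. Persons appear as X's on the left and items on the right, with the ablest persons and hardest items at the top.

```python
from collections import defaultdict

def item_person_map(person_measures, item_calibrations, step=1.0):
    """Return the lines of a text item/person map, top line = highest band.

    person_measures: person locations in logits (non-empty list).
    item_calibrations: dict of item label -> difficulty in logits (non-empty).
    step: width of each logit band (hypothetical default).
    """
    persons = defaultdict(int)   # band -> number of persons in that band
    items = defaultdict(list)    # band -> item labels in that band
    for b in person_measures:
        persons[int(b // step)] += 1
    for label, d in item_calibrations.items():
        items[int(d // step)].append(label)
    bands = set(persons) | set(items)
    lines = []
    for band in range(max(bands), min(bands) - 1, -1):
        left = "X" * persons[band]
        right = " ".join(sorted(items[band], key=lambda k: item_calibrations[k]))
        lines.append(f"{band * step:5.1f} {left:>10} | {right}")
    return lines

# Hypothetical measures and calibrations, in logits.
for line in item_person_map([-1.2, -0.3, 0.4, 0.6, 1.8],
                            {"A": -1.5, "B": -0.2, "C": 0.5, "D": 1.1}):
    print(line)
```

Each row is labeled with the lower edge of its logit band, so persons and items on the same row share roughly the same location on the variable.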

Fig. 2 Item/person map

Binet’s work in test development began more than 100 years ago. His work implies mapping, although he did not use the term.

First of all, it will be noticed that our tests are well arranged in a real order of increasing difficulty. It is as the result of many trials, that we have established this order; we have by no means imagined that which we present. If we had left the field clear to our conjectures, we should certainly not have admitted that it required the space of time comprised between 4 and 7 years, for a child to learn to repeat 5 figures in place of 3. Likewise we should never have believed that it is only at ten years that the majority of children are able to repeat the names of the months in correct order without forgetting any; or that it is only at 10 years that a child recognizes all the pieces of our money. (Binet, 1916, p. 329).

Binet clearly indicates how data from experience was used to establish a hierarchy of item difficulty. He makes special note of the requirement for “well-arranged” items expressing a “real order”. Binet also relied on “numerous” replications of ordered items in order to produce the level of accuracy he desired.

One might almost say, ‘It matters very little what the tests are so long as they are numerous’ (Binet, 1916, p. 329).

Binet clearly stressed (1) the arrangement of items in order of difficulty and (2) items numerous enough for precision. How else can one be successful? There is no other way except to do as Binet did: begin with an idea for a variable, illustrate the variable by items, arrange them by their intended difficulty, and measure persons by their locations among the items. The hallmark of Binet’s work is his early effort at benchmarking items and persons on a variable. He must have had a mental map of what he intended, although there is no indication of one in his writings.
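Binet’s last step, measuring persons by their locations among ordered items, can be made concrete with a model he did not have. The sketch below swaps in the Rasch model that this chapter relies on elsewhere; it is not Binet’s procedure. Given items already calibrated in logits and a person’s right/wrong responses, Newton–Raphson iteration finds the location at which the person’s expected score equals the observed score. The function names and the example difficulties are hypothetical.

```python
import math

def rasch_p(b, d):
    """Rasch probability that a person at b succeeds on an item at d (logits)."""
    return 1.0 / (1.0 + math.exp(d - b))

def estimate_measure(responses, difficulties, max_iter=50, tol=1e-8):
    """Maximum-likelihood person measure from 1/0 responses to calibrated items."""
    n = len(responses)
    score = sum(responses)
    if score == 0 or score == n:
        # All right or all wrong: the person lies beyond an end of the item line.
        raise ValueError("extreme score has no finite estimate")
    # Start from the mean item difficulty plus the log-odds of the raw score.
    b = sum(difficulties) / n + math.log(score / (n - score))
    for _ in range(max_iter):
        ps = [rasch_p(b, d) for d in difficulties]
        expected = sum(ps)                       # model-expected raw score at b
        info = sum(p * (1.0 - p) for p in ps)    # test information at b
        step = (score - expected) / info         # Newton-Raphson update
        b += step
        if abs(step) < tol:
            break
    return b

items = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical calibrations, easiest first
print(round(estimate_measure([1, 1, 1, 0, 0], items), 2))
```

Under the Rasch model only the raw score enters the estimate, so two persons with the same score on the same items are located at the same point among those items.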

An early example of a psychometric map is Thurstone’s “Scale of Seriousness of Offense” (1927, 1959) shown in Fig. 3. His map marks out the severity of offenses from “vagrancy” located at the bottom end to “rape” at the top.

Fig. 3 Scale of seriousness of offense. From Thurstone (1959, p. 75)

The scale is further subdivided into three offense categories: (1) sex offenses, located at the top of the scale, (2) injury to the person, located from the top to the middle, and (3) property offenses, located from the middle of the scale down. Thurstone’s map provides insight into a hierarchy of criminal acts and a practical “ruler” for determining not only the location of offenses but also the “distance” between them.
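Thurstone built scales of this kind from paired-comparison judgments. The sketch below assumes the simplest form of his method, Case V of the law of comparative judgment (equal, uncorrelated discriminal dispersions): each offense’s scale value is the average normal deviate of the proportions of judges who rated it more serious than each other offense. The offense labels and proportions are invented for illustration and are not Thurstone’s data; the function name `case_v_scale` and the clipping floor are also assumptions.

```python
from statistics import NormalDist

def case_v_scale(labels, prop, floor=0.01):
    """Thurstone Case V scale values from a paired-comparison matrix.

    prop[i][j] = proportion of judges rating item i more serious than item j.
    Proportions are clipped to [floor, 1 - floor] so no z-score is infinite.
    Returns label -> scale value, shifted so the lowest value is 0.
    """
    z = NormalDist().inv_cdf
    n = len(labels)
    raw = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            # An item compared with itself contributes p = 0.5, i.e. z = 0.
            p = 0.5 if i == j else min(max(prop[i][j], floor), 1 - floor)
            total += z(p)
        raw.append(total / n)
    low = min(raw)
    return {label: s - low for label, s in zip(labels, raw)}

# Hypothetical judgments for three unnamed offenses.
labels = ["offense_a", "offense_b", "offense_c"]
prop = [[0.50, 0.20, 0.05],
        [0.80, 0.50, 0.10],
        [0.95, 0.90, 0.50]]
scale = case_v_scale(labels, prop)
print(scale["offense_c"] - scale["offense_b"])  # "distance" between two offenses
```

Because Case V values lie on an interval scale, differences between them (the “distances” mentioned above) are meaningful, while the zero point set here is arbitrary.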

Figure 4 is a map of an achievement variable: WRAT3 (Wilkinson, 1993). This achievement test measures (1) word naming, (2) arithmetic computation, and (3) spelling from dictation. Items are arranged according to difficulty. These maps progress from left to right, indicating increases in item difficulty and person ability. The arrangement of items indicates the expected arrangement of persons according to their abilities. Less able persons will be located to the left of more able persons. Able persons will find the items on the left easier than those further to the right. These three variables follow developmental lines of learning, correspond to instructional goals, and make test administration efficient and informative. The map of each WRAT3 variable is enhanced by sample items illustrating progressive difficulty; below the items, an equal-interval scale indicates the measures.
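The left-to-right reading of these maps can be expressed as expected success. Assuming a Rasch-type model for the equal-interval scale, a person’s probability of success falls steadily across items ordered by difficulty, and a more able person has a higher probability on every item. The function name, item difficulties, and person measures below are hypothetical logits, not WRAT3 values.

```python
import math

def success_profile(ability, difficulties):
    """Rasch probabilities of success for one person on items listed in map order."""
    return [1.0 / (1.0 + math.exp(d - ability)) for d in difficulties]

items_left_to_right = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical, easiest first
less_able = success_profile(-0.5, items_left_to_right)
more_able = success_profile(1.0, items_left_to_right)
for d, p_low, p_high in zip(items_left_to_right, less_able, more_able):
    print(f"item at {d:+.1f}: {p_low:.2f} vs {p_high:.2f}")
```

Both profiles decline from left to right, and the more able person’s profile lies above the less able person’s at every item, which is the ordering the map displays.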

Fig. 4 Wide range achievement test: WRAT3. From Wilkinson (1993). The Wide Range Achievement Test (1993 Edition). Wilmington, DE: Wide Range

These maps have immediate application. Like the marks made on a door jamb to show the increasing height of a child, this map shows student progress on three achievement variables. The maps show the order of the items and measures. Interpretation of pupils’ progress along this educational ruler is enhanced by criterion and normative reference points. The grade and age norms show growth. The map provides useful information to students, teachers, and parents.

Figure 5 is a reduced version of the “map” of the Lexile Scale of Reading (© MetaMetrics, 1995).

Fig. 5 The Lexile framework. From MetaMetrics (1995) [For a more legible version, see https://www.isbe.net/Documents/lexile.pdf]

The master map is larger and more comprehensive; it requires a chart larger than 2 ft by 3 ft to picture even part of the large amount of available information. Lexile calibration values have been computed for a substantial number of trade books, texts, and tests. The title column indicates the content validity of the scaling. The educational levels column shows the increase in difficulty corresponding to more difficult reading materials. Educational levels, ages, and other information can be positioned on the Lexile map, and these map locations and relationships demonstrate criterion and construct validity.

Mapping technology offers a powerful tool for conjointly ordering objects of measurement (i.e., readers) and indicants (i.e., texts). Meaning accrues as this conjoint ordering of readers and texts is juxtaposed with other orderings, including grade level, income, or job classification. Collections of these “orderings” constitute a rich interpretive framework for bringing meaning to the measurement of human behavior.
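The conjoint ordering described above amounts to placing readers and texts on one scale, reading them as a single sequence, and laying another ordering, such as grade level, alongside. The sketch assumes reader and text measures are already on a common scale; every name, measure, and band boundary in it is hypothetical and is not taken from the Lexile framework.

```python
import bisect

# Hypothetical band boundaries on the common scale (lower edges).
BAND_EDGES = [0, 300, 600, 900]
BAND_NAMES = ["band 1", "band 2", "band 3", "band 4"]

def band_for(measure):
    """Band whose lower edge is the largest edge <= measure.

    Values below the first edge are assigned to the first band.
    """
    i = bisect.bisect_right(BAND_EDGES, measure) - 1
    return BAND_NAMES[max(i, 0)]

def conjoint_order(readers, texts):
    """Merge readers and texts into one ordering by measure, with a band label."""
    entries = [(m, "reader", name) for name, m in readers.items()]
    entries += [(m, "text", name) for name, m in texts.items()]
    return [(m, kind, name, band_for(m)) for m, kind, name in sorted(entries)]

readers = {"reader_1": 250, "reader_2": 720}
texts = {"text_a": 180, "text_b": 450, "text_c": 800}
for row in conjoint_order(readers, texts):
    print(row)
```

Any further ordering (age, job classification) could be attached the same way, as another lookup keyed on the common measure.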

A major leap in understanding and utility is accomplished when the ordering of indicants along the line of the variable can be predicted from theory. In every application of physical science measurement, instrument calibration is accomplished via theory, not data. Social science measurement stands alone in its reliance on data for instrument calibration and for co-calibration between instruments.

Perhaps the key advantage of theory-based calibrations is that an absolute framework for measure interpretation can be constructed without reference to any individual or group measures on objects or indicants. The prospect of absolute measurement, long taken for granted in the physical sciences, has until recently eluded social scientists. The building of maps for the major dimensions of human behavior is now possible because of the theoretical work of Rasch and Wright, amplified by the work of colleagues.

One pretender to the kind of mapping process outlined above is evident in NAEP’s use of the Reading Proficiency Scale (RPS). The RPS is a transformed Rasch scale with an operating range of 0 to 500. NAEP describes performance at grades 4, 8, and 12 as rudimentary, basic, intermediate, adept, or advanced, depending upon the RPS attained by each student. Thus, a rudimentary reader has an RPS of 150, an intermediate reader an RPS of 250, and an advanced reader an RPS of 350. So far, so good, since all we have done is “name” certain “anchor” points on the RPS scale.
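Naming anchor points on a transformed Rasch scale is harmless because a linear transformation preserves order and intervals. The sketch below illustrates only that step. The slope and intercept are hypothetical placeholders, not NAEP’s actual transformation constants, and only the three anchor values stated in the text are used; the function names are also assumptions.

```python
import bisect

# The three anchor points named in the text (other NAEP levels omitted).
ANCHORS = [(150, "rudimentary"), (250, "intermediate"), (350, "advanced")]

def to_rps(logit, slope=50.0, intercept=250.0):
    """Linearly rescale a Rasch logit to a 0-500 reporting range.

    slope and intercept are hypothetical; values outside 0-500 are clipped.
    """
    return min(500.0, max(0.0, intercept + slope * logit))

def highest_anchor_reached(rps):
    """Name of the highest anchor at or below rps, or None if below them all."""
    points = [p for p, _ in ANCHORS]
    i = bisect.bisect_right(points, rps)
    return ANCHORS[i - 1][1] if i else None

for logit in (-2.0, 0.2, 2.0):
    score = to_rps(logit)
    print(logit, score, highest_anchor_reached(score))
```

Nothing in this step says what a reader at a given anchor can do; as the following paragraphs argue, such descriptions depend on the text as well as on the reader.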

Problems develop when reader performance on the RPS scale is described using relative language such as stating that a rudimentary reader “can follow brief written directions” or “can carry out simple, discrete reading tasks” or a basic reader “can understand specific or sequentially related information.” An intermediate reader “can search for specific information, interrelate ideas, and make generalizations.” An adept reader “can analyze and integrate less familiar material and provide reactions to and explanations of the text as a whole.” An advanced reader “can understand the links between ideas even when those links are not explicitly stated.”

These statements are not appropriate descriptions of scale points along the RPS scale. Rather they are good descriptions of the behavioral consequences of more or less accurately matching the demands of a text with the capabilities of a reader. Thus, rather than describing absolute scale positions, these annotations, in fact, describe differences between a reader measure and a text measure. When a text’s measure exceeds a reader’s measure, comprehension is low and the kinds of reader behaviors used above describe a “basic” result.

When a reader’s measure exceeds a text’s measure, the kinds of reader behaviors used to describe adept and advanced readers are evident. The key point is that each of these behaviors can be elicited in the same reader simply by altering the level of the text presented to the reader. Thus we can make a 400L (second-grade level) reader adept by presenting a 100L text, or make the same reader rudimentary by presenting an 800L text. Comprehension rate is always relative to the match between reader and text, and it is this rate, rather than the reader’s measure, that is appropriately described in behavioral and proficiency terms. Much confusion has resulted from a failure to recognize this distinction.
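The argument can be checked with a Rasch-style function in which expected comprehension depends only on the difference between reader and text measures. The scaling constant `lexiles_per_logit` is a hypothetical placeholder, not MetaMetrics’ published value, so the absolute percentages are illustrative only. The point is that the same 400L reader moves from high to low expected success as the text changes from 100L to 800L.

```python
import math

def expected_success(reader_measure, text_measure, lexiles_per_logit=200.0):
    """Expected success rate as a logistic function of reader minus text.

    lexiles_per_logit is a hypothetical constant converting Lexile
    differences to logits; only the difference between measures matters.
    """
    gap = (reader_measure - text_measure) / lexiles_per_logit
    return 1.0 / (1.0 + math.exp(-gap))

reader = 400
print(round(expected_success(reader, 100), 2))  # easy text: 0.82
print(round(expected_success(reader, 400), 2))  # matched text: 0.5
print(round(expected_success(reader, 800), 2))  # hard text: 0.12
```

A 700L reader facing a 400L text gets exactly the same value as a 400L reader facing a 100L text, which is why behavioral descriptions attach to the reader-text match rather than to the reader’s measure alone.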

6 Summary

Successful item calibration and person measurement produce a map of the variable. The resulting map is no less a ruler than the ones constructed to measure length.

The map indicates the extent of content, criterion, and construct validity for the variable. The empirical calibrations of items and the measures of persons should correspond to the intended item and person placements, and changes must be made when correspondence is not achieved. There should be continuous dialogue among the plan, the person measures, and the item calibrations. Variables are never created once and for all. Continuous monitoring of the variable is required to keep the map coherent and up to date. Support for reliability and validity rests not in coefficients but in demonstrations that items and measures yield relevant and stable indices. Such indications must be continuously monitored to maintain the variable map and ensure its relevance.
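One routine check implied by this dialogue compares the intended placement of items with their empirical calibrations. The sketch assumes both are expressed in logits and centers each set (a Rasch scale has an arbitrary origin). It then reports Spearman’s rank correlation between the two orderings and flags items whose displacement exceeds a tolerance. The tolerance, item labels, values, and function names are hypothetical, and the rank formula used assumes no ties.

```python
def _ranks(values):
    """Rank positions 1..n (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(x), _ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def centered(values):
    """Shift values so their mean is zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def flag_displaced(labels, intended, empirical, tolerance=0.5):
    """Labels of items whose centered calibration departs from the plan by > tolerance."""
    return [lab for lab, i, e in zip(labels, centered(intended), centered(empirical))
            if abs(e - i) > tolerance]

labels = ["A", "B", "C", "D", "E"]
intended = [-2.0, -1.0, 0.0, 1.0, 2.0]    # planned order, easiest first
empirical = [-1.8, -1.2, 0.9, 0.1, 2.0]   # hypothetical calibrations
print(spearman_rho(intended, empirical), flag_displaced(labels, intended, empirical))
```

A flagged item is a prompt to revisit either the item or the plan, which is the continuing dialogue between intention and realization described above.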