Introduction

With the increasing hegemony of quantification in archaeobotany (and, indeed, in the social and natural sciences in general), results that depend on complex statistical arguments are often viewed with skepticism. Much of this suspicion is borne by a set of statistical tools commonly called “multivariate statistics,” which I will call “traditional” or “standard” multivariate statistics to distinguish them from graphical analysis, another method of dealing with multivariate data. Books like How to Lie with Statistics (Huff 1954) and aphorisms like the tricolon crescendo “Lies, damn lies, and statistics” indicate the skepticism with which statistical results are sometimes viewed. Some archaeobotanists have pointed out specific limitations of multivariate methods (e.g., Hastorf 1993; Hubbard and Clapham 1992), and many more archaeologists simply ignore bodies of work that seem to depend too heavily on statistical arguments.

One reason for this sort of skepticism is that mathematical statistics has become a complicated technical field with many procedures (developed by statisticians) that are either inherently complex or explained in complex technical language. Therefore, the average archaeologist frequently has no way to evaluate the legitimacy of a statistical conclusion.

Another problem is that the assumptions behind statistical tools limit the questions to which they can usefully be applied. This can be stated in technical language (e.g., “every exact probability is conditional upon a finite set of possible outcomes” or “every test of a hypothesis is a test of multiple joint hypotheses”) (Quine 1953; Lakatos 1978). This amounts to the argument that the world is a complicated place and making simplifying assumptions so that the mathematics work out nicely may lead to missing something important.

By the time they are used in archaeology, standard multivariate statistical tools are seldom or never wrong per se, but they are frequently misapplied. Even more frequently, they are correctly applied but contribute nothing of anthropological interest to the question being addressed. Differences in terminology and confusion over concepts allow reporting of results that are statistically correct but irrelevant to the anthropological or historical questions of interest.

In archaeobotany, standard multivariate methods like principal components analysis are increasingly becoming routine tools for data analysis. Therefore, it now seems important to examine their use critically, and to make the suggestion that “graphical analysis,” or the “visual display of quantitative information” (Tufte 1983), is a viable alternative or accessory for the routine analysis of archaeobotanical data. This is a solution that has been suggested by statisticians many times in the past few decades (Anscombe 1973; Chernoff 1973; Tukey 1977; Wang 1978; Kleiner and Hartigan 1981; Tufte 1983; Cleveland 1985) but is still underemployed in archaeobotany.

In this paper, I begin by examining some theoretical limitations that arise in archaeobotanical practice, and then apply a form of graphical analysis to a small data set from the fourth millennium BC in what is now eastern Syria in order to show how it copes with some of these theoretical limitations better than standard multivariate statistics. There is a vast and disparate literature on graphical tools for data analysis, which cannot be summarized or reviewed here. Instead, I hope to open a window into an area of statistical thought that is not covered in introductory statistics courses or textbooks for social scientists, and provide one example of how graphical analysis can be applied in archaeobotany.

Semiquantitative data and excess precision

Whatever methods are used to accumulate and analyze archaeobotanical data, at some point in the process of collection and analysis, the data are represented as a matrix of numbers. Typically, for instance, taxa or anatomical parts form the row labels and the sample names form the column labels of such a matrix. However, the “numbers” that appear in such a matrix are seldom real, fully quantitative “numbers” in the same way as the “five” in the statement “I have five fingers” is a number. Instead, they are numbers like the “five” in “I’ll be there in 5 min.” In the former case, the “five” means that the speaker has five fingers, as opposed to four-and-a-half or three; exactly corresponding to the integral number five; in the latter case, however, “five” could well mean four-and-a-half or (one frequently finds) as many as 25, but is unlikely to mean 5 s or 5 h. In other words, the phrase “five minutes” conveys more information than “pretty soon,” but not as much information as “a number of minutes well approximated by a gaussian random variable with mean 5 and standard deviation 2.” Furthermore, the meaning of “five minutes” varies depending on the reliability of the person using the phrase, the tone of voice in which it was uttered, and the context in which it is said. “Your boiled egg will be ready in 5 min” differs from “just 5 min while I print it out.”

In the same way, a count of 213,987 barley grains in a typical archaeological sample conveys little more information than “about 200,000” or even “thousands of barley grains.” Providing numbers with too many significant figures is sometimes referred to as “excess precision,” but the distinction between “precision” and “accuracy” is not the same as the difference between the quantitative “fingers’ five” and the semiquantitative “minutes’ five”. Unlike accuracy, the “minutes’ five” is fundamentally descriptive, not quantitative.

This kind of unquantifiable error or uncertainty has been pointed out in several different fields. In instrumental analytic chemistry, the term “semiquantitative” is used: numbers produced by an uncalibrated instrument run are referred to as semiquantitative data, which are useful for internal comparison or order-of-magnitude estimates but are not to be confused with elemental concentrations. In economics, Knight (1921) distinguished true uncertainty (now sometimes called Knightian or radical uncertainty) from quantifiable probabilities or risks, and in discussion of the history of probability theory, Hacking (1985) drew a similar distinction between “epistemological” and “aleatory” probability. Here, the most useful way of formulating this distinction is to say that some random variables have distributions that can be well approximated by quantitative parameters and some do not. Statistical models and tests apply only to random variables that meet or approximate the assumptions for which the procedures are designed.

In archaeobotanical data, the precise numerical counts are often the byproducts of many, complex, confounding factors, and frequently, the interesting samples are the statistical outliers. For instance, Hubbard and Clapham (1992) make the point that depending on the context from which an archaeobotanical sample is recovered, taphonomy may be a more significant determinant of the numbers counted than the original distribution of the material on the site. They point out that the contexts in which (charred) archaeobotanical material is found range from a perfectly preserved, closed storage container in its original anthropogenic context, which they call a “Type A” context, through mixed or potentially contaminated but identifiable contexts (“Type B”), to material obtained by flotation of sediment from an indeterminate context (“Type C”). They then argue that quantification of Type C samples, which are the most frequently encountered, is largely pointless because the complex processes that produce such samples can never be sufficiently disentangled that statements can be made about agricultural or dietary practices.

As with all areas influenced by New Archaeology, this debate over the value of quantification may continue. In practice, however, most contemporary archaeobotanists do attempt to quantify their samples. If archaeobotanical methods are ultimately a tool to understand past human societies better, then it seems reasonable to see taphonomy and site formation processes as producing large and potentially unquantifiable errors in the seed counts of flotation samples from unidentified contexts.

Other sources of numerical errors in archaeobotanical data are obvious and easily corrected. For instance, the number of embryo ends (a way of quantifying the minimum number of individual seeds present in a sample) is not directly comparable with counts of every identifiable element. Other sources of error may be ultimately unquantifiable, like the subjectivity of identification and the possibility of contamination or mislabelling of samples. It is often possible with care to reduce or eliminate some of these potential sources of error, but it is never possible to eliminate, or even identify, all possible errors. Thus, we are led to the conclusion that the numbers in a matrix of seed counts are semiquantitative: usually more like the “minutes’ five” than the “fingers’ five.”

Moreover, the anthropological or historical significance of equivalent numbers of seeds can be radically different. A single well preserved seed of Zea mays (maize) from a sealed context well dated to the third century AD of the Near East would threaten to revise our entire historical framework of world civilization—in fact, such a report would be so radical that it would almost certainly be treated as a mistake or a hoax. A single Hordeum vulgare (barley) caryopsis from the same context might not even be interesting enough to be worth recording.

Significance, pattern, and meaning

Appropriate statistical tools are available for dealing with some types of semiquantitative data. For instance, nonparametric statistics allow some analyses of quantities whose exact sizes are not known, so long as they can be ranked or ordered from small to large. These are useful, but there remains a second fundamental limitation on statistical analysis of archaeobotanical data: identifying a pattern or statistically significant correlation is not sufficient to show a scientifically interesting effect. This point is essentially the same as is made by the cliché that correlation is not causation, but in practice, there are many patterns that are not correlations.
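The rank-based idea can be illustrated with a short sketch. The code below computes Spearman's rank correlation, one common nonparametric statistic, using only the Python standard library. The counts are hypothetical and are not taken from any site; the function names are likewise illustrative. The point of the example is that rounding a count such as 213,987 to "about 200,000" leaves the ranks, and therefore the statistic, unchanged.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical counts for five samples (illustration only).
barley = [213987, 1500, 40, 900, 12]
chaff = [5200, 150, 9, 310, 2]

# The same barley counts reported with far fewer significant figures.
barley_rounded = [200000, 2000, 40, 900, 10]

rho_raw = spearman(barley, chaff)
rho_rounded = spearman(barley_rounded, chaff)
```

Because only the ordering of the samples enters the calculation, `rho_raw` and `rho_rounded` are identical here, which is the sense in which rank methods tolerate semiquantitative counts.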

A simple univariate example of this is the well-known demonstration by Anscombe (1973) that a number of different distributions of points in two-dimensional space can produce the same regression line (see Fig. 1). No one who looks at the plots is fooled, but without a graphical examination, even a statistical procedure as well understood as univariate, least-squares regression can hide important differences between data sets. Multivariate data can prevent the sort of simple visualization that identifies the differences between Anscombe’s data sets. In the case of archaeobotanical data, which is usually highly multidimensional, it is easy to be led astray.

Fig. 1

Figure plotted from data in Anscombe (1973). Four least-squares regressions are shown, which all produce the same regression line: y = 3 + 0.5x. Though this line is a mathematically accurate statistical model in all cases, it fails to capture essential aspects of the data in all cases except the first (x1 vs y1)
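The numerical side of this demonstration can be checked with a few lines of code. The sketch below fits an ordinary least-squares line to each of the four data sets published by Anscombe (1973) and also records the largest residual, a crude stand-in for what the eye sees at once in the plots. The function and variable names are illustrative.

```python
# The four data sets from Anscombe (1973); x1, x2 and x3 are identical.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "x1 vs y1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                        7.24, 4.26, 10.84, 4.82, 5.68]),
    "x2 vs y2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                        6.13, 3.10, 9.13, 7.26, 4.74]),
    "x3 vs y3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                        6.08, 5.39, 8.15, 6.42, 5.73]),
    "x4 vs y4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                 [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                  5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def max_abs_residual(x, y, intercept, slope):
    """Largest vertical distance between a data point and the fitted line."""
    return max(abs(b - (intercept + slope * a)) for a, b in zip(x, y))


fits = {name: least_squares_line(x, y) for name, (x, y) in ANSCOMBE.items()}
worst = {
    name: max_abs_residual(x, y, *fits[name])
    for name, (x, y) in ANSCOMBE.items()
}

for name, (intercept, slope) in fits.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f}x, "
          f"largest residual {worst[name]:.2f}")
```

All four fits agree with y = 3 + 0.5x to two decimal places, yet the residuals differ: the third data set, for example, contains a single point lying more than three units from the line.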

The tools that constitute traditional multivariate statistics are designed for dealing with data in three or more dimensions. Most of the archaeobotanical applications of multivariate statistics have employed what is called Q-mode analysis, which is the practice of classifying or grouping objects based on a number of variables (Legendre and Legendre 1998). With archaeobotanical data, the taxa counted in each sample are generally treated as variables and the samples are treated as objects; in other words, the intent is to classify samples.

Q-mode analysis can be separated into two general sets of tools, both of which have been used by archaeobotanists: clustering algorithms and eigenvalue methods (which include factorial analysis, principal components, canonical correspondence analysis, and multidimensional scaling). The general intent behind both types of Q-mode analysis is to classify objects based on a large number of measured variables. Most clustering algorithms treat all measured variables as equally important, while eigenvalue methods focus on identifying a few axes along which the objects are spread out (thereby reducing the dimensionality of the data) and classifying objects by reference to these axes.

The major advantage that these forms of mathematical classification have over classifications made by eye is usually referred to as “objectivity,” by which it is meant that they are presumed to reflect “natural” or “inherent” relationships among the objects being classified. In linguistics and anthropology, there has been debate over the role of the classifier (Pike 1967). Happily, for our purposes, there is no difference between “finding natural groups in data” and “classifying data well.” In both cases, the object is to obtain classes or groups that reflect important or interesting aspects of the data and are stable when applied to more data of the same type, under examination by different people at different times and in different places. Therefore, in order to side-step the epistemological issue, we can describe the advantage that mathematical forms of classification enjoy as “bias of a type radically different from most human biases.”

Such forms of classification are useful because they provide foils to the biases of trained archaeobotanists, but they are not necessarily less biased than classification by eyeball. They merely substitute arbitrary mathematical biases for rational human ones. To use a simplistic analogy: imagine a room full of people with a number of characteristics that can be measured, like age, height, weight, and hair color, and a number of things one might like to know, like life expectancy or cultural affiliation. Actuarial analysis is not needed to realize that age will reveal a lot and hair color virtually nothing about life expectancy, while the significance of the variables is reversed if one is interested in cultural affiliation. Which of these two things one is interested in is a question that no statistical method can answer. By and large, statisticians accept that “There is no obligation to slavishly [sic] accept a numerical solution...every research scientist knows the value of a courteous and able assistant who will make constructive and unbiased suggestions, and this function a computer equipped with a suitable battery of clustering programs is able to fulfil. And like a good assistant, it knows its place: it issues neither judgements nor commands.” (Williams 1971, p. 324). However, the translation between disciplines is confusing. The archaeological consumer needs to distinguish between a classification that is objective merely by virtue of being arbitrary and a classification that explains interesting variation in the archaeobotanical record.

As a recent review of multivariate statistics in ecology and systematics puts it: “Much of the misuse of statistical tools is attributed to miscommunication between statisticians and biologists...Statistical usage of terms like ‘effect’ or ‘explanatory variable’ is not meant to imply causation...The objective of the present review is to help the researcher navigate between the Scylla of oversimplification...and the Charybdis of assuming that patterns in data necessarily reflect factors in nature, that they have a common cause, or, worse, that statistical methods alone have sorted out multiple causes.” (James and McCulloch 1990, p. 131f).

Hastorf (1993) has already made the point that “Analysis must transform raw counts quantitatively or qualitatively so that they are interpretable...Because of the biases inherent in the data, most paleoethnobotanists choose to use some form of relative presentation, maintaining an internal comparison, to control for preservation differences and other post-deposition effects...I have found that no one technique gives a complete picture of what the data have to offer archaeologists.”

Exploratory data analysis

As pointed out by Tukey (1977) (among others), the object of exploratory data analysis is not to simplify or summarize data but to represent it in ways that allow us to appreciate it better. This goal is much older than traditional multivariate statistics or its critiques. Graphical display of quantitative information is actually the primitive form of data analysis, and multivariate statistics is a comparatively recent innovation, so “traditional multivariate statistics” is a misnomer. Tufte, who coined the phrase “visual display of quantitative information” (Tufte 1983), provides a brief discussion of the earlier history of the practice beginning with William Playfair in the eighteenth century.

Data collection and analysis, however, were totally changed by the introduction of the electronic computer about 40 years ago. The idea of analyzing data by graphical representation was effectively reintroduced and stated in modern statistical terms by Chernoff (1973). Essentially, he argued that it is easier for us to appreciate complex patterns in familiar objects (like faces) than in pages of numbers. Therefore, in order to enhance the power of human pattern recognition, he coded data onto small face-like icons or glyphs. An elaboration of this strategy was discussed in the Journal of the American Statistical Association (see Kleiner and Hartigan 1981: article, comments, and reply), and several examples of its application are given in Wang (1978). The logic behind this approach is that traditional multivariate tools focus on reducing the dimensionality or complexity of data, summarizing, simplifying, or eliminating all but the axes of maximal variation.

Frequently, archaeobotanists already understand the general characteristics of their data and are interested in addressing complicated multifaceted relationships or in finding patterns that were not anticipated. Summarizing or reducing dimensionality can obscure complexity, which is why exploratory data analysis relies heavily on visual representation (Tukey 1977). Note that the term “exploratory data analysis” is most frequently understood to include hierarchical cluster analysis, principal components analysis, and other techniques of data summary that do not involve the testing of explicit hypotheses. All of these techniques have graphical aspects and can provide effective methods of plotting data. I use the term “graphical analysis” to refer to visual representations of data that do not rely on mathematical summary or simplification.

To illustrate how this strategy can be applied to archaeobotanical data, I will use a preliminary matrix of seed counts from the site of Tell Brak in northeastern Syria (Green 1999). (Please note that the seed counts shown in Table 1 in Appendix A and the stratigraphic relationships in Fig. 7 are preliminary and should not be relied upon for archaeological interpretations except in general terms. The archaeological and archaeobotanical conclusions drawn from this material will be presented elsewhere (Charles et al. 2009), as this paper is exclusively about the methods of data analysis.)

Figure 2 shows a star plot (also known as a radar plot or rose diagram) representing an archaeobotanical sample. In this example, six categories of archaeobotanical material are assigned to axes around a center. The number of specimens in each category is plotted along the axes and their apices joined to form an irregular polygon; in this case, a hexagon whose axes correspond to the numbers of wheat seeds, nongrain crop seeds, barley grains, weed seeds, chaff fragments, and seeds of hydrophytic plants. This is labeled the “composition hexagon.” An accompanying circle (the “abundance circle”) gives the total number of specimens in all categories in the sample and another dotted circle gives the total number of specimens counted in all samples for scale (“unit circle”). This is a very general method for exploring complex multivariate data, which, so far as I know, has only been employed once before (Hastorf 1993) for archaeobotanical samples. This particular version was produced by a macro in the program Matlab (Appendix B), though a number of other software packages [in particular, the R statistics and graphics language, R Development Core Team (2004), which, unlike Matlab, is free and open-source] also provide the ability to produce arbitrary graphical representations of numbers.

Fig. 2

Key to star plot; encoding was done with a macro (“M-file”) in Matlab 5.3. Text in the figure explains the details of coding and scaling, but it is only necessary to remember that the size of the circle indicates the abundance of a sample and the hexagon gives its composition
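The geometry of such a glyph is simple enough to sketch in a few lines. The code below is not the Matlab macro of Appendix B; it is a minimal Python illustration under assumptions of my own, since the exact coding and scaling are given only in the figure. It assumes that each spoke of the hexagon is drawn in proportion to that category's share of the sample, that the radius of the abundance circle is the square root of the sample's share of all specimens (so that circle area tracks abundance), and that the unit circle has radius 1. The counts and the function name are hypothetical.

```python
import math

# Axis order follows the description of the composition hexagon.
CATEGORIES = ["wheat", "nongrain crops", "barley",
              "weeds", "chaff", "hydrophytes"]


def star_glyph(counts, grand_total):
    """Return the geometry of one glyph centered on the origin.

    counts      -- specimens per category for one sample, in CATEGORIES order
    grand_total -- specimens counted in all samples (sets the unit circle)

    Scaling choices here are assumptions made for illustration only.
    """
    total = sum(counts)
    if total <= 0 or grand_total <= 0:
        raise ValueError("counts and grand_total must be positive")
    vertices = []
    for k, count in enumerate(counts):
        angle = 2 * math.pi * k / len(counts)  # evenly spaced axes
        length = count / total                 # share of this sample
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return {
        "vertices": vertices,
        "abundance_radius": math.sqrt(total / grand_total),
        "unit_radius": 1.0,
    }


# A hypothetical sample of 100 specimens out of 400 counted overall.
glyph = star_glyph([10, 0, 30, 40, 20, 0], grand_total=400)
```

The returned vertices can be joined in order with any plotting library to draw the hexagon, with the two circles drawn around the same center.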

In Fig. 3, the counts in Appendix A are represented as a series of these star plots, one for each sample. The size of the circle represents the abundance of the sample and the vertices of the hexagon give its composition, as described above. In Fig. 3, the glyphs are printed in an arbitrary order, while in Fig. 4, they have been clustered by eye, without reference to contextual data.

Fig. 3

Star plots with all contextual information removed, arranged in an arbitrary order

Fig. 4

Star plots with all contextual information removed, clustered by eye

The groups displayed in Fig. 4 are based only on pattern recognition and, thus, should not be biased by archaeological preconceptions. If additional objectivity is needed, resampling values analogous to bootstrap or jackknife numbers could be obtained for given groups by showing the top box in Fig. 4 to a large number of people and respecting only the groups that are found reliably. The chances are, however, that archaeologically interesting groups will be obvious and revealed by any reasonable method of data presentation. If clustering is so weak as to disappear or change radically when a different method of analysis is applied, it probably contains little of archaeological interest.
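One way such a check might be tallied is sketched below. Each viewer's by-eye grouping is recorded as a mapping from sample name to an arbitrary group label, and for every pair of samples the code counts the fraction of viewers who placed the two together. The viewers, sample names, labels, and the 0.8 threshold are all hypothetical; this is an illustration of the idea, not a procedure that was carried out for this paper.

```python
from itertools import combinations


def co_grouping_frequency(groupings):
    """Fraction of viewers who put each pair of samples in the same group.

    groupings -- list of dicts, one per viewer, mapping sample -> group label.
    Labels need not agree between viewers; only co-membership is compared.
    """
    samples = sorted(groupings[0])
    frequency = {}
    for a, b in combinations(samples, 2):
        together = sum(1 for g in groupings if g[a] == g[b])
        frequency[(a, b)] = together / len(groupings)
    return frequency


def reliable_pairs(frequency, threshold=0.8):
    """Pairs grouped together by at least `threshold` of the viewers."""
    return sorted(pair for pair, f in frequency.items() if f >= threshold)


# Three hypothetical viewers sorting four hypothetical samples.
viewers = [
    {"s1": "A", "s2": "A", "s3": "B", "s4": "B"},
    {"s1": "x", "s2": "x", "s3": "y", "s4": "x"},
    {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
]

frequency = co_grouping_frequency(viewers)
stable = reliable_pairs(frequency)
```

In this toy case, only the pairing of s1 with s2 is found by every viewer; the pairing of s3 with s4 is found by two of the three and falls below the threshold.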

In order to compare graphical analysis with standard multivariate tools, Fig. 5 combines a hierarchical cluster analysis with graphical analysis by plotting the stars from Fig. 3 in the locations where they fall in a hierarchical cluster dendrogram. As can be seen, the groups revealed are very similar to those produced by eye in Fig. 4.

Fig. 5

A hierarchical cluster map (dendrogram) showing the relationships between samples as produced by an algorithmic procedure. The procedure used is one of many possible options using Ward’s method for clustering and a euclidean distance metric. Note that it basically replicates the groups apparent in Fig. 4
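For readers unfamiliar with the algorithm, the following is a minimal sketch of agglomerative clustering with Ward's criterion on euclidean distances. It is a naive illustration written for clarity, not the software used to produce the figure: it stops at a chosen number of groups instead of recording the full dendrogram, and the sample names and proportions are hypothetical.

```python
def centroid(points):
    """Mean vector of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-group sum of squares if groups a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(data, n_clusters):
    """Merge the cheapest pair of groups until n_clusters remain.

    data -- dict mapping sample name -> vector of measurements.
    Returns a sorted list of sorted lists of sample names.
    """
    clusters = [[name] for name in data]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([data[n] for n in clusters[i]],
                                 [data[n] for n in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical proportions of six categories in five samples.
samples = {
    "s1": [0.10, 0.00, 0.30, 0.40, 0.20, 0.00],
    "s2": [0.12, 0.02, 0.28, 0.38, 0.20, 0.00],
    "s3": [0.60, 0.10, 0.20, 0.05, 0.05, 0.00],
    "s4": [0.55, 0.15, 0.20, 0.05, 0.05, 0.00],
    "s5": [0.08, 0.00, 0.32, 0.40, 0.18, 0.02],
}

groups = ward_clusters(samples, n_clusters=2)
```

With these made-up values the three weed-rich samples fall into one group and the two wheat-rich samples into the other. Changing the distance metric or the merging criterion is a one-line change in `ward_cost`, which is one reason the caption describes this procedure as one of many possible options.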

In Fig. 6, the same star plots are ordinated in a bivariate plot of the first two principal components. Again, the groupings that are produced are roughly similar. An additional advantage of this presentation is that the covariation of weed seeds, grain, and chaff is shown by the cluster of four arrows pointing to the left in a similar direction, which suggests that the first principal component is related to crop processing.

Fig. 6

Here, the star plots are themselves plotted in the bivariate space defined by the first and second principal components. Geometrically, this is the projection of data points in 6-dimensional space onto the plane whose axes are defined by the linear combinations of the six variables that maximize the spread of the scatter of points
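The projection described in this caption can be sketched without any statistical package. The code below centers the data, forms the covariance matrix, and finds the two leading eigenvectors by power iteration with deflation, a simple numerical method chosen here for brevity; it is an assumption of this sketch and not necessarily the method used by any particular software. The sample values are hypothetical proportions, and the function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]
    return [[v - m for v, m in zip(r, means)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1)
             for b in range(dims)] for a in range(dims)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply and renormalize a start vector."""
    dims = len(matrix)
    vec = [1.0 / (k + 1) for k in range(dims)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = dot(nxt, nxt) ** 0.5
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    return vec


def first_two_components(rows):
    """Return (pc1, pc2, scores), with scores as (x, y) pairs per row."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = leading_eigenvector(cov)
    eigenvalue = dot(pc1, [dot(row, pc1) for row in cov])
    dims = len(cov)
    # Deflation: remove the variance already explained by pc1.
    deflated = [[cov[a][b] - eigenvalue * pc1[a] * pc1[b]
                 for b in range(dims)] for a in range(dims)]
    pc2 = leading_eigenvector(deflated)
    scores = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return pc1, pc2, scores


# Hypothetical proportions of six categories in five samples.
samples = [
    [0.10, 0.00, 0.30, 0.40, 0.20, 0.00],
    [0.12, 0.02, 0.28, 0.38, 0.20, 0.00],
    [0.60, 0.10, 0.20, 0.05, 0.05, 0.00],
    [0.55, 0.15, 0.20, 0.05, 0.05, 0.00],
    [0.08, 0.00, 0.32, 0.40, 0.18, 0.02],
]

pc1, pc2, scores = first_two_components(samples)
```

Each pair in `scores` gives the position at which a sample's glyph would be drawn, and the entries of `pc1` and `pc2` give the directions of the variable arrows in such a plot. The sign of each component is arbitrary, so a plot may appear mirrored relative to one produced by other software.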

Finally, in Fig. 7, the glyphs representing each sample are inserted into a Harris matrix to show how the flexibility of graphical presentation allows specialized data like stratigraphic relationships and a certain amount of contextual information to be incorporated.

Fig. 7

Harris matrix showing stratigraphic relationships between samples, as well as sample number, name of the stratigraphic unit in which the sample was found, and an abbreviated description of the archaeological context

Discussion

Graphical analysis has some limits not shared by some other multivariate tools: on the 2-dimensional page, it is sometimes possible to fit a dozen or more variables (for instance, Fig. 7 gives six quantitative, two categorical, and one relational variable for 31 objects). When there are more than 50 or 100 variables of interest, however, some form of summary or simplification is needed. Original graphical displays tailored to a particular purpose frequently take more effort to produce and may require either artistic skill or facility with computer programming.

These defects, however, are offset by the advantages visual tools have over traditional multivariate statistics: they are better at coping with the semiquantitative nature of data because we naturally view shapes and sizes by reference to other shapes and sizes (Cleveland 1985), whereas numbers can imply a precision that the data do not support. Graphical displays are also more easily interpreted by readers without reference to the statistical literature.

The best strategy usually consists of the application of several different tools to the same data (Hastorf 1993), but almost all anthropological or historical points can be made by reference to a graphical display instead of a statistical summary or the numerical output of a mathematical manipulation. This is not to imply that graphs should be published alone; it is still important to publish numerical data in as raw a form as practicable. It is also important to recognize that no analytical choices or statistical techniques can compensate for data that are carelessly collected or can add relevance or precision to data that are not pertinent to anthropological or historical questions of interest. Archaeobotanists without a background or interest in statistics or quantitative analysis should be reassured that elaborate statistical manipulations are seldom or never necessary. A picture is worth a thousand words, and often even more numbers.