Introduction

Along with every other graduate student of evolutionary biology in the 1970’s, I pored over Olson and Miller’s great treatise Morphological Integration of 1958. The book is basically an extension of a paper of theirs earlier in the decade that had introduced their \(\rho \)F-model (“rho-eff model”), an approach to quantitative morphology that combined “\(\rho \)-groups,” sets of variables all mutually intercorrelated above some threshold, with “F-groups,” variables that were “thought to be functionally or developmentally related” (Olson & Miller, 1951, p. 337). The ensuing biometric model, a version of factor analysis complementary to Wright’s approach (see Sect. “A Classic Example of Disarticulated Distance Data”), enthralled many of us right through the early 1990’s, as long as those morphometric data sets of “measures” generally consisted of quantified extents (lengths, areas, volumes) or some of their derived biomechanical indices. The book, which has been cited over 1100 times (Google Scholar, retrieved 6/21/2022), remains central to the methodology of morphological integration.

Back in those graduate days I had simply overlooked some text following those definitions in the later publication:

The principal differences between the earlier \(\rho \)F-model and the reformulated model presented in this chapter relate to the use of the concept of F. In the original model \(\rho \)F-groups were formed by the intersection of a \(\rho \)-group and an F-group. Such groups were thought to have biological significance. The \(\rho \)F-group is not formed in the modified model, for the concept of F has been eliminated from the formal framework. This has provided greater flexibility in subsequent mathematical manipulations of the groups, \(\rho \)-groups, and follows logically from tests that have confirmed the hypothesis that \(\rho \)-groups imply F-groups. The \(\rho \)F-group remains as an important conceptual unit but one that is beyond the structure of the model. (Olson & Miller, 1958, p. 56; emphasis mine.)

Olson and Miller recuse themselves to some extent from this radical empiricism when, in their final chapter, “Summary and Theoretical Considerations,” they state,

What we have attempted in the pages of this book is a description and development of a quantitative approach to representation and analysis of anatomical changes. We feel that this goal has been reasonably well achieved. Now we propose to turn to a survey of time-dependent phenomena in the history of life. ... Interpretation of such phenomena brings the implications of causality into sharp focus. ... We specify our goal in this respect as follows: If, for a given problem in the context of this book, we are able to predict, by virtue of reducing the problem to a few quantitative terms and then noting regularities or lack of them in time-ordered sequences, we shall then claim that a step has been taken in the direction of a causal explanation. (Olson & Miller, 1958, pp. 259–260, italics omitted.)

The concept of time-ordering invoked here comes at diverse scales. In the examples of the present paper, it is either calendar age or size; in the authors’ setting of paleobiology, presumably it includes geological age and selection gradients as well. In either context it serves to set a boundary on the notion of integration that is being invoked: a criterion for what counts as causal for the purposes of interpreting any patterns that emerge from the data analysis.

In the 1950’s, the subordination of F-groups I set in italics in the first quotation above might have been seen as plausible, at least for its intended context, multiple measures of organismal size. The extension of multivariate analysis from distances like these to log-distances (Jolicoeur, 1963) makes little difference here—as will be noted in Sects. “A Classic Example of Disarticulated Distance Data” and “Landmark Configurations Analyzed by Distance Networks” of this paper, statistics of distance measures are relatively resistant to gentle algebraically monotone transformations.Footnote 1 However, the Olson–Miller approach has not been updated to accommodate the revolution in morphometrics (Rohlf and Marcus, 1993) occasioned by the advent of the geometrical approach that emerged as a core toolkit beginning in the early 1990’s and then its extension to semilandmarks a few years later. The problem today would be not only the replacement of distances by Cartesian coordinates but also respect for the simple count of today’s variables. The H. sapiens midsagittal skull example in Sect. “A More Complicated Example, Involving Semilandmarks” here generates \(92\cdot 91/2=4186\) alternative distance measures among its 92 points and then \(4186\cdot 4185/2= 8759205\) coefficients of pairwise association among those distances. It is unimaginable to approach such a range of possibilities without access to the crucial additional information about such recondite data structures that is carried by the landmark scheme according to which they were digitized in the first place.

My concern here for what today’s museum biologist might refer to as “metadata pertinent to the list of traits” was not a focus of earlier reconsiderations of this Olson and Miller classic. For example, an excellent review from nearly a quarter of a century ago, Chernoff and Magwene (1999), which “extend[s] beyond the originally published material to include new topics or those underrepresented in the original text,” briefly considers a problem it calls “congruence of traits” in the section headed “Homology and Morphometric Traits” at pages 328–330. While arguing that “care must be taken” regarding the “potential ambiguity in our measures, descriptors, and analyses,” they overlook the specifically geometric aspects of the formulas that extract individual quantifications by combining the coordinates of landmark locations arising as long untheorized lists, the specific concern of this article. Chernoff and Magwene acknowledge the possible relevance of partial warps (Bookstein, 1991, more fully explained later in Bookstein 2014, 2018) to the design of schemes of “measured lengths,” but, understandably, they did not anticipate the deeper application of that spectrum to the conceptualization of integration announced in Bookstein (2015) and relied on in Figs. 8 and 1115 here. Neither the partial warp score vector nor any other specific rotation of the shape coordinates is appropriate as a canonical list of traits to be submitted to some standardized integration analysis. Their role in descriptions of morphological integration, to be explored in this essay, is quite a different one.

For a methodology of morphological integration to accommodate today’s turn to megavariate morphological data in evolutionary biology, developmental biology, and medicine, it is mandatory that we remedy this defect—failure to attend to the a-priori logic of those trait lists—in the linear multivariate approach that suited the biometrics of that distant time. The present essay approaches this purpose by a reananalysis of data sets from earlier publications using the toolkit of geometric morphometrics (GMM) in more or less its present form. Some of these core techniques date from the previous century [e.g., thin-plate splines (TPS), Bookstein 1989], while others are of more recent invention and thus a lesser degree of dissemination (e.g., self-similarity as a null model for integration, Bookstein 2015, or Boas coordinates as a replacement for Procrustes coordinates whenever size or age is an important explanatory factor, Bookstein 2021a). The combination of these tools ultimately results in the generation of multiple aspects of morphological integration (the “dimensions” referred to in the title of this essay) that are always latent factors of the distributed coordinate data, not principal components per se. Furthermore, the principal components taking part in these novel analyses are not the conventional “relative warps” derived from Procrustes shape coordinates as in the GMM textbooks. Instead, they are principal components of Boas coordinates—Boas principal components (BPC’s)—from sublists of the available point data, thereby fusing the two purposes of Olson and Miller’s original thrust, the statistical and the biological, by subsetting the available information resources in light of the available spectrum of biotheoretically reasonable pattern explanations.

One main component of today’s Olson-Miller-citing literature is the exploitation of an assortment of statistics that attempt to summarize a morphometric configuration in terms of the “general degree of integration” of some collection of morphological measures. There are, for instance, Olson & Miller’s (1958, p. 155) squared ratio of the count of correlations greater than some threshold to the maximum count possible, and also the statistic \(V_\mathrm{rel}\) of Pavlicev et al. (2009), which is the normed variance of the eigenvalues of that correlation matrix. But that theme is not closely related to the concerns of this essay. From the point of view of the organismal biologist, the freedom to select particular geometric patterns from the manifold offered by any competent GMM data structure is far more important than any simplistic algebraic formula for scalar summary of such a selection promulgated a-priori. The present paper thus strongly deprecates the use of a statistic like \(V_\mathrm{rel}\) in the course of any morphometric study. To the extent that so compact a numerical summary appears to suggest some analytical inference, that analysis is unlikely to be meaningful in any biological sense.

Methodological Preliminaries

I have no quarrel with the definition of morphological integration implicit in most expositions that embrace Olson and Miller’s spirit, to wit, the careful survey of possible evidentiary consequences of shared causes of form-change or form-variation that will eventually be interpreted exogenously, which is to say, by arguments beyond the morphometric data at hand. These arguments might be biomechanical in origin [e.g. D’Arcy Thompson’s (1917) “origins of form in force”], genetic, morphogenetic in terms of germ-layer or tissue-level processes (e.g. Jacobson and Gordon, 1976; Schüepp, 1966), or otherwise. The present discussion, echoing Olson and Miller’s in its thrust though not its arithmetic, concerns protocols for interpreting the multivariate morphological data that accumulate nonexperimentally over multiple specimens. Techniques for this purpose must combine the multiple measurements by some algebraic formalism (usually linear combinations) as overseen by one or another standard representation of the interdependencies (usually covariances or correlations) of those measures. This century’s easy access to powerful computing packages foregrounds one particular tool, principal component analysis, for initial exploratory data analyses toward this goal.

This paper will shortly consider three morphometric examples that together convey my main argument for multifactorial interpretations of morphometric integration patterns that involve both choice of measures (analogous to Olson and Miller’s “F-groups”) and choice of a figure of concordance (analogous to their “\(\rho \)-groups”). But before beginning that discussion we must examine a pair of fundamental maneuvers that lie at the root of any principal component analysis based on geometric data, whether distances or normalized Cartesian coordinates. Principal component analysis (PCA), familiar to us all from graduate school on and the topic of many standard textbooks (e.g., Jolliffe, 2002; Jackson 1991), is the eigenanalysis of some symmetric matrixFootnote 2 of covariances or correlations. It entails two successive algebraic maneuvers each risking an extensive loss of potentially relevant geometric information. The first is the most familiar and at the same time the most invisible: the decision, in the Gaussian model \(N(\mu ,\Sigma )\) usually underlying interpretations of data schemes like this, to ignore the vector \(\mu \) of means of the variables, restricting the computations just to the covariance term \(\Sigma \) of that model. In the second loss, not only the means but also the variances of those variables are ignored when the matrix \(\Sigma \) of their covariances is replaced by the matrix \(R=\Sigma \Bigg /\Bigl (\sqrt{\mathrm{diag}(\Sigma )}\otimes \sqrt{\mathrm{diag}(\Sigma )}\Bigr )\) of their correlations. (The notation “\(\otimes \)” means “outer product,” the matrix whose entries are pairwise products of the two vectors either side of the \(\otimes ,\) and the division operator on the right-hand side of this expression is meant elementwise.)

The correlation matrix R is then passed through a PCA that expresses it as \(R=E\Lambda E'\). The terms of this equation represent the findings of the PCA: the eigenvectors of R (the mutually orthogonal, unit-length columns of E) are the loadings of R’s principal components, meaning the coefficients of its scores that will later be scattered; while the diagonal elements of \(\Lambda ,\) the only nonzero elements it has, are the corresponding eigenvalues of these same linear combinations, their variances now that every original variable has been rescaled to variance 1. Yet in typical applications of the method, neither of the manpulations that preceded the PCA, the erasing of the mean vector nor the normalization of the covariance matrix by the square roots of its diagonal, is subjected to any critical analysis, though of course both are explicitly present wherever the PCA has been written out in algebraic notation.Footnote 3

For any principal component analysis of morphometric data to contribute to a meaningful quantitative discussion of “integration,” the (usually nonexperimental) search for detailed evidence of shared causes, both of these intentional losses of information must be highlighted and strenuously defended. Otherwise there is no reason to trust the analyst’s implicit claims that interpretation of the eigenvalues \(\Lambda \) and eigenvectors E of the eigenanalyses to come should not depend on the means or variances of those variables, claims that strike me as unlikely a priori. (Of course interpretation depends even more on the list of variables sent for eigenanalysis.) Writing prior to the entry of eigenanalysis into the discussion, Olson and Miller seem never to mention either of these limitations of the correlational approach, saying only, “high correlation of elements in morphogenetic fields suggests relationship of association and genetic heritage, although at the present stage the evidence is admittedly tenuous” (Olson & Miller, 1958, p. 31). Yet inasmuch as their method of \(\rho \)-groups requires a common scale for the coefficients \(\rho ,\) the reliance on correlations appears more or less mandatory.

Wright (1954) likewise speaks only of correlations without any justification beyond their pervasive presence in the equations of population genetics, even though he tabulates the ratio of standard deviation to mean value for the six measures that concern him. Writing much later, in volume one (1968) of his four-volume Evolution and the Genetics of Population, he explains how the concept of correlation actually arose first, in the inspired amateur guesses of Francis Galton and then the exquisite statistical investigations of his protégé Karl Pearson.Footnote 4 Similar axioms about the irrelevance of lost information are hidden in the dataflow of other popular morphometric approaches to biological explanations. In the toolkit usually called “geometric morphometrics” that deals with labelled Cartesian point data, for example, the currently standard Procrustes analysis begins with a centering of the configuration data, specimen by specimen, then a rotation of each configuration around its own centroid to a best sum-of-squares fit with the shared average of those same rotated forms, and finally a division of each rotated form by the square root of the second central moment of the centered data, the quantity usually called Centroid Size. I have argued elsewhere (Bookstein, 2016) that every step of this cascade demands its own separate explicit justification, particularly the last two, which involve erasing rotation information that is often relevant in biomechanical settings and also information about size that is often the single most powerful explanation of whatever data patterns emerge from the subsequent computations. In this same GMM context, furthermore, the investigator must justify any decision to report a data analysis by principal components of the resulting coordinates, a style of reporting lamentably susceptible to accidents of landmark density and separation, rather than referring to any more natural spectrum of descriptors such as Sneath’s (1967) polynomial trend surfaces or the partial warps of the thin-plate spline approach (Bookstein, 2014, 2018). This dependence on point spacing and clustering is especially important in data sets that combine discrete landmark points with the representations of curves or surfaces called semilandmarks.

This paper introduces and iterates the theme that the nexus of concepts associated with the term “integration” cannot be treated by simple numerical summaries, or even by principal component analyses alone, but requires acknowledgement of the deleted information channels and, if appropriate, their restoration. Let me put the matter in the form of three dogmas.

1.:

The yes/no question “Is this data set integrated?” is always the wrong question. Living creatures are integrated more or less axiomatically by the very definition of life. No single binary digit can adequately convey the biological implications of any study competent to investigate this phenomenon to begin with. To find a data set about living metazoan form not to be “integrated” is to admit that the data set was not worth collecting in the first place, or, worse, that it was collected incompetently.

2.:

An apparently more sophisticated query, “How integrated is this data set?”, is still always inappropriate in any GMM context. As was the case for the one-bit reduction, this version still lamentably fails to investigate the logic of both the list of landmark coordinates contributing to those measurements and the list of measurements, shape coordinates or Boas coordinates making up the data set whose covariances are being investigated. (Regarding the latter, this paper will demonstrate explicitly how unstable one of the standard “measures of integration,” \(V_\mathrm{rel},\) is when applied to different parameterizations, equally biotheoretically plausible, of the selfsame landmark configuration.) Put this more baldly: inasmuch as the delineation of the particular set of quantities to be sent for multivariate analysis is a purely psychological process inside the mind of the biologist, lacking any extrinsic biological meaning, no concept of “increasing” or “decreasing” “strength of integration” in a GMM context can be biologically meaningful in the absence of an unacceptable count of lengthy table notes and caveats.

Thus the goal of a GMM analysis should never be “testing a null hypothesis,” a composite of these first two questions, as the term is used in the social sciences or medicine. Important morphometric questions never have binary answers (yes/no) or even simple scalar answers (a single numerical magnitude of some “effect” abstracted from the multidimensional settings where contemporary morphometrics overwhelmingly does its work). Wherever the space of possible “effects” is of some high dimension, the investigator’s task is to ascertain what simultaneous morphogenetic patterns—note the plural—the observed variability actually embodies. Were the null hypothesis conceivably true as a biological theory—were a finding of no patterning genuinely tenable for the data set at hand—usually the study should not have been undertaken in the first place: its statistical power would have been just too low. In these high-dimensional settings, there must be prior assurances that there are patterns to be unearthed worth the effort of collecting the data and exercising the software that purports to reveal them. “Statistical significance” is useless as a guide to the scientific understanding of a real effect—the clearest argument along these lines was already published 65 years ago by the physicist E. T. Jaynes (see his posthumous textbook of 2003). It is even more irrelevant to progress in domains such as evolutionary or developmental biology where the existence of real effects is known for certain from experimental or paleobiological evidence and where the descriptions of good multivariate data sets reveal features consistent with a range of simultaneous explanations, only some of which are biologically coherent.

3.:

In other words, the only operationalizations of the integration concept that can lead to valid morphometric explanations are those that allow an answer to the quite different interrogation “How are these data integrated?” Those answers must be based in a data-dependent rhetoric for the reporting of whole patterns of covarying measurements along with the homologies of the corresponding point locations. A typical answer, indeed, will refer to a list of unearthed factors (patterns of covariation that might be under the control of separate evolutionary and/or developmental processes). At the least, these analyses must involve information from the means and the variances of the constituent data series; at best, they also articulate with information channels distinct from those of the morphometric domain per se. In morphometrics, principal components analysis is not the inspection of a covariance matrix alone, much less just the list of its eigenvalues, but instead requires attending to both the loadings of eigenvectors and their scores, so as to expedite discussion of the patterns manifested in both domains. In sum, I will argue, because (as Olson and Miller already said so many years ago) assertions about integration are matters not of statistical algebra but of causal analysis, reference is required to the biologist’s entire a-priori knowledge base pertaining to the system(s) governing the data, knowledge that should have contributed to the decision to analyze just these particular data using these particular arithmetical manipulations in the first place.

To be meaningful, then, a GMM analysis must take the form of a spectrum of simultaneously applicable pattern findings, not just a single one. In the leghorn example to come, Sect. “A Classic Example of Disarticulated Distance Data”, this will be the spectrum of one general size factor together with some number (which turns out to be three) of “special factors”; in the rodent neurocranial example, Sect. “Landmark Configurations Analyzed by Distance Networks”, a general growth factor incorporating three simultaneous gradients, together with an effect focal to one specific landmark location with respect to its neighbors. Section “Landmarks Analyzed by Boas Coordinates” replicates the finding of Sect. “Landmark Configurations Analyzed by Distance Networks” using up-to-date GMM tools, and Sect. “A More Complicated Example, Involving Semilandmarks” applies the same protocol to a data set I have not previously submitted to so searching a reanalysis, a H. sapiens example from 2003. The finding, coincidentally, is in essence the same: a single-factor model for size allometry combining three gradients plus a strong effect local to the vicinity of one particular structure. The point of each of these analyses is to arrive at the simplest description that adequately accounts for the pattern(s) the analytic algebra has revealed. But none of these simplest descriptions reduce to single decimal quantities. Section “Discussion”, a closing discussion, draws inferences from these examples for criteria of adequacy in the execution of studies of morphological integration more generally.

What makes one of these accounts “adequate” is not a function of the statistical algebra or its standard errors. Rather, its justification stems from what Olson and Miller referred to as its “F-group,” the biological context in which the investigation as a whole has taken place. That context might be heterochrony versus speciation, or the relative proportions of the uniform and first through fifth partial warps of the rat’s neurocranial growth as they are informed by theories of fluid pressure versus directionality of stresses versus geography of osteogenesis, or the detailed integration (or not) of the factor of maxillary or bimaxillary prognathism that so sharply characterizes human cranial growth allometry. In short, no GMM finding should ever be a binary decision, zero or one, yes or no, and likewise should never be a single quantity (not even the \(\mu \)-BPC crossproduct to be introduced presently).

A Classic Example of Disarticulated Distance Data

While the central concern of GMM analysis is with data that are Cartesian coordinates of labelled points, it is helpful to begin with a considerably older data scheme, the simple list of distance measures. In this realm, the morphometric methodology formalized by Olson and Miller was at root an imitation and adaptation of the much earlier tradition of factor analysis instigated as far back as 1900 and disseminated over the twentieth century into a wide range of the natural and social sciences (see, for instance, Reyment & Jöreskog, 1993 or Malinowski, 2002; cf. Harman, 1976). The locus classicus of the specifically morphometric version of this work is a pair of papers by Sewall Wright (1932, 1954) that my University of Michigan group reviewed a long time ago, in Bookstein et al. (1985). Here I follow the condensed narrative in Bookstein (2014), Sec. 5.2.3.3, which combines Wright’s story about factor analysis with an auxiliary motif from an entirely different multivariate theorem.

Begin with that “auxiliary motif,” the concentration matrix (sometimes called the precision matrix) usually appearing in the context of a multiple regression. For any covariance structure \(\Sigma \) of full rank there is an inverse matrix \(\Sigma ^{-1},\) the matrix that, when multiplied by \(\Sigma \) (on either side), yields the identity matrix of appropriate rank. Divide each element of \(\Sigma ^{-1}\) by the square root of the product of the diagonal elements in its row and in its column. (In other words, using the same maneuver as the one by which we get from a covariance to a correlation matrix, replace the matrix \(\Sigma ^{-1}\) by the matrix \(\Sigma ^{-1}\Big /\Bigl (\sqrt{\mathrm{diag}(\Sigma ^{-1})}\otimes \sqrt{\mathrm{diag}(\Sigma ^{-1})}\Bigr )\), where, again, the division operation is meant termwise.) Off the diagonal, the resulting quantities, multiplied by \(-1\), are the partial correlations of each pair of the original variables after regressing out all of the other variables in the study. By the standards of multivariate statistics, this fact appears to be relatively new—its first appearance in print seems to have been Guttman (1938). There is an accessible elementary derivation in Bookstein (2014, pp. 321–322); the multiplication by \(-1\) just mentioned comes from the minus signs in the matrix factor of the explicit formula \(\begin{pmatrix}c&{}-b\\ -b&{}a\end{pmatrix}\Big /(ac-b^2)\) for the inverse of the \(2\times 2\) matrix \(\begin{pmatrix}a&{}b\\ b&{}c\end{pmatrix}\). This approach is sensible whenever the ultimate explanation of the network’s behavior is intended as a superposition of “general factors,” applying to large subsets of the measurements if not all of them at once, atop “special factors” that represent processes local to small sublists. In the approach via concentration matrices, “local” is taken to mean “dyadic.” See also Mitteroecker and Bookstein (2009).

Consider a data set first brought to the attention of evolutionary biologists by Sewall Wright in 1932: a matrix of six length measurements on skeletal elements of 276 leghorn chickens originally published by L. C. Dunn in 1928. Leslie Marcus transcribed the data set into the file data/fowlfem.sas.txt on Jim Rohlf’s web site https://sbmorphometrics.org. Figure 1 scatters all fifteen pairs of these six length measures for the 274 specimens that appear not to be outliers on any of the measurements. As the labels indicate, the measurements fall into three pairs, two each for the birds’ skulls, wings, and legs.

Fig. 1
figure 1

Scatterplots of all pairs of Dunn’s (1928) six measurements of leghorn chickens, exploited for pedagogical purposes by Sewall Wright beginning in the 1930's. The data set is reduced from Wright’s count of 276 organisms by the deletion of two outlying cases, specimens 5 and 101 of the archival version at https://sbmorphometrics.org.

All the panels here seem to be consistent with the elliptic shape first noted by Francis Galton in the 1870’s. They differ, however, in their apparent axis ratios, as summarized effectively in their correlation matrix R:

$$\begin{aligned} R=\begin{pmatrix} 1.000&{}0.583&{}0.567&{}0.603&{}0.6230&{}0.603\\ 0.583&{}1.000&{}0.515&{}0.550&{}0.590&{}0.525\\ 0.567&{}0.515&{}1.000&{}0.930&{}0.883&{}0.879\\ 0.603&{}0.550&{}0.930&{}1.000&{}0.887&{}0.903\\ 0.623&{}0.590&{}0.883&{}0.887&{}1.000&{}0.945\\ 0.603&{}0.525&{}0.879&{}0.903&{}0.945&{}1.000 \end{pmatrix}. \end{aligned}$$

Wright’s exposition begins with the standard eigenanalysis of this matrix, its decomposition \(R=E\Lambda E'\) by principal components. But he firmly rejects this as a biological explanation once he notices that all components after the first take the unrealistic form of contrasts between two of the measures and the other four or three against the other three. No biological processes obviously correspond to such contrasts. Instead, relying on his own favorite method of path analysis, Wright recommends a different approximation to this correlation matrix as a sum of four outer products (expressions of causal factor effects) \(g\otimes g + s_1\otimes s_1 + s_2\otimes s_2 + s_3 \otimes s_3\) where \(g=(0.636,\, 0.583,\, 0.958,\, 0.947,\, 0.914,\, 0.932)\), \(s_1=(0.468,\, 0.468,\, 0,\, 0,\, 0,\, 0),\) \(s_2=(0,\, 0,\, 0.182,\, 0.182,\, 0,\, 0),\) and \(s_3=(0,\, 0,\, 0,\, 0,\, 0.269,\, 0.269)\) and, for sound but technical reasons, R’s diagonal entries, all 1.000, are not part of the approximation. The vector g is referred to as “general size” in this context, whereas \(s_1\) is the special factor for head size, \(s_2\) is the special factor for leg size, and \(s_3\) is the special factor for wing size.

As the highlighted subsets of the original Wright example are all pairs of variables (not by happenstance—this is a very simple, tightly constructed data set), we can (temporarily) forget about PCA completely, instead applying the method of concentration matrices from first principles. The inverse of Wright’s correlation matrix is

$$\begin{aligned} R^{-1}=\begin{pmatrix} 1.886&{}-0.620&{} 0.140&{} -0.376&{}-0.341&{}-0.273\\ -0.620&{} 1.803&{} 0.276&{} -0.523&{}-1.235&{} 0.824\\ 0.140&{} 0.276&{} 8.402&{} -5.861&{}-2.513&{} 0.050\\ -0.376&{}-0.523&{} -5.861&{} 9.986&{} 0.283&{}-3.628\\ -0.341&{}-1.235&{} -2.513&{} 0.283&{}11.675&{}-8.222\\ -0.273&{} 0.824&{} 0.050&{} -3.628&{}-8.222&{}11.731 \end{pmatrix}, \end{aligned}$$

so that after its diagonal is normalized to unity and the off-diagonal entries reversed in sign those completely partialled correlations off the diagonal are seen to be

$$\begin{aligned} \begin{pmatrix} 1.000&{} 0.336&{}-0.035&{} 0.087&{} 0.073&{} 0.058\\ 0.336&{} 1.000&{}-0.071&{} 0.123&{} 0.269&{} -0.179\\ -0.035&{}-0.071&{} 1.000&{} 0.640&{} 0.254&{} -0.005\\ 0.087&{} 0.123&{} 0.640&{} 1.000&{} -0.026&{} 0.335\\ 0.073&{} 0.269&{} 0.254&{} -0.026&{} 1.000&{} 0.703\\ 0.058&{}-0.179&{}-0.005&{} 0.335&{} 0.703&{} 1.000 \end{pmatrix} \end{aligned}$$

in which the largest entries are evidently in the (1,2) and (2,1) cells, the (3,4) and (4,3) cells, and the (5,6) and (6,5) cells. In other words, the three largest entries in the normalized concentration matrix are in the same positions as the three special factors \(s_1, s_2, s_3\) in Wright’s version of the analysis, and now we see that two of them, for the limbs, are roughly equal at a value double that of the third (for the head). The concentration of these residuals in the larger two of these three numbers would be surprising enough to start a hypothesis rolling were the analysis not nearly a century old already. ConsilienceFootnote 5 would probably require the sort of detailed knowledge of morphogenetic control mechanisms that arose only toward the end of the twentieth century.

Wright’s expositions emphasize that this analysis is not closely related to principal component analysis (PCA) even though one obvious iterative algorithm for its estimation starts there. The general factor g is close to, but not identical with, the first principal component of the six length measures. More importantly, the classic Perron–Frobenius theorem that applies to eigenanalysis of matrices of all-positive quantities forces all principal components after the first to have loadings of mixed sign, some positive and some negative (Reyment, 2013). Each of these thus must be taken as the formula for some “contrast,” which Wright rightly rejected as any part of a valid interpretation of a correlation matrix whose entries were all so large and positive.

For the purposes of this paper, interrogate one aspect of Wright’s exemplar that has become central to discussions of a methodology, geometric morphometrics, that was not even imagined during Wright’s lifetime: why is he using correlations? As the previous section noted, analysis of correlation matrices tacitly ignores two major sources of biological information, the vector of means of the variables considered and the vector of their separate variances, that in principle should contribute to any meaningful summary of whatever morphometric patterns are unearthed by subsequent arithmetical maneuvers.

So let us go back to Wright’s borrowed data, in the 274-specimen version of Fig. 1, and compare three different approaches to this concentration-matrix maneuver: analysis of the covariance matrix of the original distance measures, analysis of the corresponding correlation matrix, and analysis of an alternative version put forward by Jolicoeur (1963), the covariances of the logarithms of the same six distances. Here are the two other covariance matrices we need. (For convenience and conciseness of both text and graphics I will henceforth refer to these options as the “cov and “log cov” versions; the original, printed earlier, is the “cor” version.) The covariance matrix of the Wright data set is

$$\begin{aligned} \begin{pmatrix} 1.575&{}0.682&{}2.288&{}3.730&{} 2.204&{}2.068\\ 0.682&{} 08691&{}1.542&{}2.530&{} 1.550&{}1.337\\ 2.288&{}1.542&{}10.327&{}14.735&{}7.996&{}7.721\\ 3.730&{}2.530&{}14.735&{}24.332&{}12.336&{}12.170\\ 2.204&{}1.550&{}7.996&{}12.336&{}7.944&{}7.277\\ 2.068&{}1.337&{}7.721&{}12.170&{}7.277&{}7.471 \end{pmatrix}. \end{aligned}$$

Standard deviations of the six distance measures, in millimeters, are 1.255, 0.932, 3.216, 4.933, 2.818, and 2.73, a more than fivefold range. So the limb measures, especially the tibia, are much more variable than the skull measures. But of course that is mainly because they are simply larger as lengths—the means of the six variables are 38.8, 29.8, 77.4, 115.0, 74.7, and 68.9 mm. In the corresponding correlation matrix R, clearly the limb measures correlate more among themselves than the skull measures do with each other or with the limbs.

The log cov version of this same data set is

$$\begin{aligned} 0.001 \times \begin{pmatrix} 1.057&{}0.597&{}0.767&{}0.844&{}0.766&{}0.780\\ 0.597&{}0.984&{}0.675&{}0.748&{}0.702&{}0.659\\ 0.767&{}0.675&{}1.730&{}1.661&{}1.391&{}1.458\\ 0.844&{}0.748&{}1.661&{}1.846&{}1.446&{}1.548\\ 0.766&{}0.702&{}1.391&{}1.446&{}1.435&{}1.426\\ 0.780&{}0.659&{}1.458&{}1.548&{}1.426&{}1.589 \end{pmatrix} \end{aligned}$$

from which the corresponding standard deviations are 0.0325, 0.0314, 0.0420, 0.0430, 0.0379, and 0.0399. These are all very small, of course—in effect the log transform is virtually equivalent to dividing each original length measure by its own mean, one of the principal justifications of log transformations in general (Bookstein et al., 1985). And also their range is much smaller than it was for the untransformed distances. Nothing to this point suggests that it was a good idea to switch to correlations; rather, the appropriate standardization should have been switching to logarithms all along.

Turn now to the corresponding three principal component analyses. The 6-vectors of extracted eigenvalues, normalized to the same sum of squares, are plotted (to a log scale) in Fig. 2.Footnote 6 All three variant analyses confirm the overwhelming dominance of the first component (general size), but whereas the cor and log cov analyses closely track each other, the analysis of raw covariances (the open circles) differs greatly, as four of the six variables are limb factors of high variances separately and also high covariances among them, leaving much less signal to be allotted to the other five dimensions of this six-dimensional space. Judging integration by some function of these eigenvalue plots will evidently not result in sound biometric inferences (as the decision to use cov, cor, or log cov is evidently not a matter of biology).

Fig. 2
figure 2

Eigenvalues of the three principal component analyses of the Wright leghorn example: covariances or correlations of the measured lengths, or covariances of their logarithms. The correlation matrix, Wright’s choice, closely matches the log-covariance approach, which does not involve standardizing the variances of the separate distance measurements. The vertical axis of this plot is standardized and log-transformed to clarify the parallelism of the three profiles of principal component variances after the first

We learn more by inspecting the matrices of normed eigenvectors arising in the same three eigenanalyses. For the covariances, this is

$$\begin{aligned} \begin{pmatrix} 0.115&{}0.229&{}0.656&{}0.504&{}-0.476&{}-0.153\\ 0.078&{}0.139&{}0.357&{}0.264&{}0.674&{}0.568\\ 0.446&{}-0.188&{}-0.552&{}0.666&{}-0.073&{}0.109\\ 0.703&{}-0.529&{}0.331&{}-0.322&{}0.062&{}-0.098\\ 0.386&{}0.616&{}-0.109&{}-0.041&{}0.377&{}-0.563\\ 0.375&{}0.483&{}-0.127&{}-0.356&{}-0.410&{}0.561 \end{pmatrix} \end{aligned}$$

where we see that loadings on the first component, while all positive, vary by a factor of nearly ten. All subsequent factors must have mixed sign, as required by the Perron–Frobenius theorem. In the second column, for instance, this “contrast” is mainly of the two wing measures against only one of the leg measures, tibia length.

In connection with the GMM analyses to come it will be of interest to describe the relation between this first column of loadings and the vector of means of the six measures. One reasonable approach is to convert the list of six means to a unit vector (0.217, 0.167, 0.433, 0.644, 0.418, 0.386) and then sum its crossproducts by that first eigenvector: \(0.217\times 0.115+\ldots + 0.386\times 0.375=0.988.\) (If the vectors were perfectly proportional, this crossproduct would be 1.0. The difference of this coefficient from 1.0 is analogous to the Procrustes distance between mean shape and BPC1 that will be coming in Fig. 7 below.)

Continuing, we may examine the normalized eigenvectors for the correlation matrix. These are

$$\begin{aligned} \begin{pmatrix} 0.347&{}-0.529&{}-0.772&{} 0.049&{} 0.026&{}-0.008\\ 0.324&{}-0.702&{} 0.629&{} 0.028&{}-0.015&{}-0.078\\ 0.433&{} 0.289&{} 0.054&{} 0.549&{} 0.590&{}-0.276\\ 0.441&{} 0.232&{} 0.040&{} 0.422&{}-0.667&{} 0.355\\ 0.445&{} 0.163&{} 0.061&{}-0.516&{} 0.356&{} 0.615\\ 0.440&{} 0.252&{}-0.009&{}-0.501&{}-0.281&{}-0.643 \end{pmatrix} \end{aligned}$$

where we note a radically different second eigenvector (and thus all subsequent eigenvectors) contrasting the two skull lengths with the suite of four limb lengths weighted almost equally. The third eigenvector is a contrast of the two skull measures, the fourth the contrast of the two limb dyads. Apparently all are potentially “meaningful,” or, rather, would be potentially meaningful had they not been forced into the form of contrasts.

This matrix is close to the loading matrix for our third eigenanalysis option, the log cov matrix, for which, as Fig. 2 shows, the vectors of eigenvalues were closely matched. The loadings are

$$\begin{aligned} \begin{pmatrix} 0.274&{}-0.623&{}-0.727&{}-0.084&{}-0.369&{}-0.008\\ 0.244&{}-0.682&{}0.681&{}0.029&{}0.005&{}-0.104\\ 0.472&{}0.264&{}0.045&{}0.518&{}-0.625&{}-0.215\\ 0.495&{}0.191&{}0.031&{}0.401&{}0.698&{}0.263\\ 0.435&{}0.076&{}0.047&{}-0.512&{}0.291&{}0.675\\ 0.456&{}-0.186&{}-0.054&{}-0.549&{}0.187&{}-0.647 \end{pmatrix} \end{aligned}$$

where the greater discrepancy between the skull loadings and the limb loadings on the general size factor (column 1) reflects the greater variance of those four log size measures.

Because correlations of multiple regression residuals are independent of rescaling, the partial correlations derived from the concentration matrix of the raw covariances must be the same as those printed earlier from the correlation matrix. For the log cov version, the partial correlations are

$$\begin{aligned} \begin{pmatrix} 1.000&{} 0.338&{} -0.036&{} 0.090&{} 0.074&{} 0.054\\ 0.338&{} 1.000&{} -0.070&{} 0.127&{} 0.265&{} -0.177\\ -0.036&{} -0.070&{} 1.000&{} 0.635&{} 0.253&{} -0.003\\ 0.090&{} 0.127&{} 0.635&{} 1.000&{} -0.023&{} 0.339\\ 0.074&{} 0.265&{} 0.253&{} -0.023&{} 1.000&{} 0.698\\ 0.054&{} -0.177&{} -0.003&{} 0.339&{} 0.698&{} 1.000 \end{pmatrix}. \end{aligned}$$

This matrix conveys exactly the same three “special size factors,” corresponding to the large entries in the (1,2), (3,4), and (5,6) cells—0.338 instead of 0.336, 0.635 instead of 0.640, 0.698 instead of 0.703. We have thereby learned something quite important for the extensions to GMM that will follow: in principle, neither standardization of size nor standardization of variability is necessary for the exploration of morphological integration.

Distill this whole line of calculations down into the following dialogue:

  • Is this data set integrated? Wrong question.

  • How integrated is this data set? Still the wrong question.

  • How is this data set integrated? We find four factors that together explain nearly all of the patterning here. One factor is “general size,” with pattern approximately (0.274, 0.244, 0.472, 0.495, 0.435, 0.456) (this will be slightly altered from PC1 after accounting for the special factors, as explained in Bookstein et al., 1985), while the other three are “special size factors” accounting for the three additional associations among the two skull measures, the two limb measures, and the two leg measures. General size accounts for much more variation of these 6-vectors of measured length (or log length) than any of the special factors. One might say that the analysis has confirmed what any rural child already knows, that a living chicken has a head along with two pairs of limbs that function differently. The principal finding is thus one of integration at multiple levels. It is this rhetoric that extends almost verbatim to the GMM context.

Landmark Configurations Analyzed by Distance Networks

Another locus classicus of morphometric pedagogy is the Vilmann data set of eight landmarks on the midsagittal neural skulls of 21 closebred laboratory rats radiographed at eight ages each, Fig. 3. Clockwise from lower left, the landmarks are Basion, Opisthion, Intrapariental Suture, Lambda, Bregma, Sphenoëthmoid synchondrosis, Intersphenoidal synchondrosis, and Sphenoöccipital synchrodrosis, or Bas, Opi, IPP, Lam, Brg, SES, ISS, SOS for short. Breeding and imaging were by the Danish craniofacial biologist Henning Vilmann; the landmarks were digitized by Melvin Moss more than 40 years ago. From the 164 8-vectors of digitized locations published in Bookstein (1991) I have extracted the 144 octagonal configurations for the 18 animals with complete data at all eight ages of observation (7 days, 14, 21, 30, 40, 60, 90, and 150 days). This data set has been the subject of many previous demonstrations (e.g. Bookstein 2014, 2018, 2021a), but the exposition here is new.

Fig. 3
figure 3

Eight landmarks on the midsagittal neurocranial octagon of Vilmann’s study of laboratory rat growth. From Bookstein (1991), Fig. 3.4.1

To begin, let us imitate the previous methodology in its PCA version. Treating the Vilmann data set as specifying only measures of length (i.e., ignoring the mean landmark configuration that will play so important a role in subsequent diagrams), eight landmarks yield 28 distances. A preliminary GMM analysis involving them all might initially seem promising—when the three styles of eigenanalyses of the 28 distances (pairwise covariances or pairwise correlations of the distances, or pairwise covariances of their logarithms) are compared (Fig. 4, eigenvalues scaled always to sum of squares 28), we see that the apparent “strength” of the first component, which would be Wright’s “general size,” is always the same, and always exceeds the variance explained by the second component by a ratio of at least 30 to 1. If the only information we had were these measured distances, we would have just one dimension to report on.

Fig. 4
figure 4

Traces of log-transformed eigenvalues for the same three versions of the association matrix, 28 interlandmark distances for the Vilmann rodent neurocranial octagons. The plot is truncated near eigenvalue 13 because the number of independent degrees of freedom for distances or any near-linear modification of distances among eight landmarks in two dimensions is only \(2\cdot 8-3=13\); all the eigenvalues past that point express mere nonlinearity of the multiple triangulations. All traces are scaled to sum of squares 28, and clearly they agree on the explanatory power of that first eigenvector, general size

Unlike the case for Wright’s leghorn data, the findings for the correlation approach differ from the findings for the other two. One sees this quite clearly in the elements of the corresponding first eigenvectors, presented as a scatterplot in the left panel of Fig. 5. Distances IPP-Lam and Bas-Opi, two of the three distances loading least on General Size in the log-covariance approach, evidently behave differently from the other 26 distances in the analysis of loadings on the first correlation eigenvector (left panel, horizontal axis), By contrast, the loadings on this first eigenvector for the covariances of the logarithms vary over a threefold range. The same two distances are outliers (in opposite directions) in the right-hand plot, the loadings on the second principal component of the two types of analysis; nevertheless the patterns for the second eigenvector are much more in concord between the methods.

Fig. 5
figure 5

Scatters of the loadings on the first two principal components for the cor and log cov matrices of the 28 distances of the Vilmann neurocranial octagons

But treating the data resource merely as a list of 28 distances this way offers us no tools by which to interpret the vertical axis in the left panel of Fig. 5, the roster of loadings on that general size factor itself in the log cov approach. (For instance, as the caption for Fig. 4 indicates, we cannot carry out a concentration matrix analysis for these 28 distances, as only thirteen of its dimensions bear information. We shall see that a search for Wright-style “special factors” here requires more information about those distances than just those covariances.) The log cov approach does indeed take into account both the mean and the variance of each separate distance, but still it pays no attention to the pairing of distances that share a landmark, the near-collinearity of some distances, the near-perpendicularity of other distances, and, generally speaking, the configuration of landmark points that underlay all these computations. Figure 6 shows the situation as starkly as possible by jointly diagramming all three of these candidate “profiles of integration,” just for PC1. A fourth panel, lower right, would seem the simplest possible: ignoring all the multivariate algebra, it just displays the raw growth ratios, age 7 days to age 150 days, of these same 28 distances. The log cov eigenvector and the explicit growth ratios, the panels in the figure’s lower row, correlate 0.993. Their import is thus exactly the same when interpreted in toto as complete networks among interlandmark segments varying in location and direction. (The analysis by correlations is least aligned with the others, correlating only 0.483 with the observed growth rates.)

Fig. 6
figure 6

Saturated networks of growth descriptions for the 28 interlandmark distances of the octagon in Fig. 3, in four versions: first principal component of the cov, cor, and log cor analyses, and raw growth ratios from the average of these animals’ age-7-day measures to their age-150-day averages. For landmark names, see Fig. 3

But these distances are not logically independent observations, the way they were in the leghorn example. They all depend together on the set of eight points characterizing each octagon; they total only thirteen degrees of freedom, not 28. The list of relevant means to accommodate must include the mean landmark configuration itself, the arrangement of these points each in relation to all the others. And the relevant “variance” to study is the variation of all those shapes at once, in a context of all possible shape measurements of their configuration.

Hence Fig. 6 reminds us that we know much more about these specimens than just the distances—we also know the configuration of the landmarks between which these distances were taken. GMM calls this calibration the template of the specimens. As soon as we inject that construction into the workflow, as Fig. 6 already has done, an interpretation of any of the eigenfindings becomes clear: smaller growth rates, or smaller coefficients on any of the PC1’s, for the verticals than the horizontals; larger growth rates or PC1 entries for the landmarks along the cranial base (SES-ISS-SOS-Bas) than for the landmarks along the calvaria (IPP-Lam-Brg).

Hence in diagramming the data this way, it turns out that we did not need any multivariate modeling at all—the patterns of loadings of the log covariances, lower left panel, is almost exactly the same as the pattern of the raw growth ratios of those distances themselves, lower right panel. We didn’t need PCA to notice the excess of those horizontal changes over the vertical changes and, of those, the excess along the cranial base over those of the calvaria; nor did we need PCA to see the anomalous value of IPP-Opi growth at the posteriormost edge of this octagon. We need, however, to use this language: to report changes in all the distances, not just a few, considered with reference to the layout of the landmarks that delimit them all (some pairs close together, others far apart; some collinear, others perpendicular; some sharing an endpoint; etc.) For completeness I note the value of the correlation-based coefficient \(V_\mathrm{rel}\) for the data set of these 28 distances; it is 0.852. Of course fifteen of the 28 eigenvalues are structural zeroes.

Landmarks Analyzed by Boas Coordinates

Here at the midpoint of our second example it is convenient to adopt again the call-and-response model of a dialogue that interrogates this data analysis from a biological point of view.

  • Is the Vilmann data set integrated? Wrong question.

  • How integrated is this data set of neurocranial octagons? The answer would summarize one of those eigenanalyses in some single quantity; so this is still the wrong question.

  • How is this data set integrated? That is where biologically relevant reportage should be focused. Figure 6 seems essential for communication of the findings to this point, but its geometry was not actually involved in any of the computations reported there. To answer the real question about integration, we need a method mapping the findings about changes of landmark position onto this prior information deriving from the study design per se. Such arithmetic requires the comparison of two representations having the same formal structure, one the typical or textbook or average landmark locations, the other the joint variation of these locations as it structures potential comparisons of interest in that same original study design.

Thus it is time to abandon the analysis of distances, the only technology available to Wright so long ago, in favor of the analysis of point coordinates at the core of today’s GMM: to understand the details of that first principal component concept as deriving from some multivariate analysis of the standardized locations of the eight landmarks individually instead of the distances among them (even though the information content of variation in the two representations, distances or locations, is precisely the same). The Boas formalism allows a straightforward approach to this information in which an eigenanalysis of the configural variation of the Vilmann data can be calibrated to the average of those same configurations. For data sets where general size will prove to be one of the explanatory factors, the best way of proceeding (Bookstein, 2021a) is by a variant of the classical GMM approach that excises the size-standardization step from the rhetoric by which the findings are reported. This is now an analysis not of 28 distances but of 16 standardized coordinates, still with 13 degrees of freedom. Joe Felsenstein and I named them Boas coordinates because they were first hinted at in an article by Frans Boas in the remarkably distant year of 1905. I have reviewed this particular analysis in Bookstein (2021a), Because of the appearance of centrifugality in the next figure—all the landmarks seem to be diverging from a shared origin near the center of the figureFootnote 7—I labelled this pattern “centric allometry” in Bookstein (2021a). Here the allometric process it is expressing is obviously the growth of these skulls. To explicate that process quantitatively, it is sufficient to examine only the loadings of that first principal component as functions of landmark position within the mean configuration.

Fig. 7
figure 7

A “pseudoprocrustes” analysis of the loadings of the Vilmann neurocranial landmarks on Boas principal component 1 (BPC1). a Vertices of the 144 neurocranial octagons, in Boas superposition (centered and rotated but not scaled). b The vectors of this first Boas component, drawn as displacements (to an arbitrary scale) from the means of the eight landmarks in frame a. c The same BPC1 loadings, now drawn as if they were landmarks themselves. d The “pseudoprocrustes analysis”—superimposition of the Boan mean and the loadings of BPC1 as if both were shapes. e The same after correction for the apparent vertical mismatch in frame d, that is, the vertical compression of frame c with respect to the means in frame b. f The same for a seven-landmark analysis, IPP omitted

The result of such an elementary analysis, new to this paper, is remarkable in its geometric clarity. Figure 7 displays a series of panels that unambiguously extract the features of this growth configuration. Frame (a) shows all 144 octagons of this Vilmann data set; in this diagram style its centric allometry (centrifugal organization) is obvious, inasmuch as the landmarks are all peripheral and the form is convex. For completeness I note the value of \(V_\mathrm{rel}\) for this set of 16 coordinates; it is 0.536 (with three structural zeroes), radically different from the value of 0.852 for the distances, Fig. 6, even though the information content of the two data resources is precisely the same. Frame (b) draws the loadings of BPC1, the first eigenvector of the covariances of the Boas 16-vector of coordinates (8 x’s, 8 y’s), as displacements from the means in frame (a). Frame (c) draws those same vectors as displacements from the origin rather than the mean landmark positions; the resulting configuration is similar to the actual template for these same data, Fig. 3. We can quantify that similarity explicitly by an innovative application of a familiar methodology, Procrustes fitting.

Panel (d) of the figure “superimposes” the geometric expression of that first eigenvector on the actual mean Boas form of the sample—frame (c) on the dots of frame (b). I have named this a “pseudoprocrustes plot” because while Procrustes analysis in GMM explicitly pertains to locations of observed landmarks upon organisms, here one of the configurations to which the analysis is applied represents instead the coefficients of a vector that is merely being visualized as locations upon an organism. Interestingly, this is the original setting of GMM’s version of Procrustes analysis as propounded by John Gower a long time ago: Gower (1975) attributes the name to Schönemann (1968), who in turn attributes it to Hurley and Cattell (1962), where it pertains to comparisons among configurations of factor loadings in any number of dimensions. The spirit of the pseudoprocrustes plot echoes the discussion of the correlation of 0.988 between the vector of means and the first principal component loadings for the Wright leghorn data covariance structure, Sect. “A Classic Example of Disarticulated Distance Data”. In both, we are calibrating the parallelism between an empirical sample average and a multivariate optimum descriptor of covariances—a relation between the \(\mu \) and the \(\Sigma \) of the conventional model \(N(\mu ,\Sigma )\).

The fit in Fig. 7d seems meaningful, and becomes more so when we acknowledge an interpretable aspect of panel (c), the vertical compression of this layout versus that of the average form. This is the graphical version of what we saw already in the panels of Fig. 6, the dominance of vertical growth rate over horizontal. To sequester this phenomenon from the visual impact of panel (d) we use another standard tool of Procrustes-based GMM, the display of the nonaffine coordinates (Bookstein 2018, p. 478) that partial out whatever aspect of the shape variation is uniform. The result, panel (e), shows a very close superposition except at one landmark, IPP. That suggests, finally, the reanalysis in panel (f), where I have simply omitted landmark IPP from the entire sequence of steps. Now, with only seven points, the integration of the scheme seems nearly perfect: every expression of that first principal component now lies very close to the adjusted mean position of the corresponding landmark.

Graphical matches like these, Boas PC1 against Boas mean, can be quantified by the same arithmetic I already suggested for summarizing the leghorn covariance analysis in Sect. “A Classic Example of Disarticulated Distance Data”: the crossproduct of BPC1, treated as a 16-vector, against the corresponding normalized Boas mean configuration when it too is normalized to geometric length 1. Because both of these vectors are in the same centered-rotated coordinate system, that crossproduct is identical with the ordinary correlation of those two 16-vectors. For the full Vilmann octagon, panel (c) of Fig. 7, this is 0.928. When procrustes-registered forms are this similar, the squared Procrustes distance between them is approximately twice the difference between that correlation and 1.0; here that value would be about 0.144. (It is actually 0.149.) For the heptagon of landmarks remaining after IPP is sequestered, the correlation rises to 0.961, almost twice as close to the value of 1.0 that would correspond to perfect centric allometry as modeled in Bookstein (2021a), and the squared nonaffine pseudoprocrustes distance drops to 0.054.

But there is even more in this figure. We see how it matches, too, the report we’ve already noted in connection with Fig. 6, that horizontal extension is stronger along the cranial base than along the calvaria. We can verify this with another conventional statistic, extending the “regression” (in quotes because in this context it is only a data summary) of the horizontal component of that first eigenvector against the horizontal coordinate of the mean Boas position to include an interaction term with the vertical component. In the corresponding multiple regression, both geometric terms, the horizontal component and its interaction with the vertical, are found “significant” (though that concept is meaningless here), and of the net variance in those eight horizontal loadings, a full 98.7% is accounted for by those two functionals of the mean configuration. Notice, finally, that to the extent that the polygons in Fig. 7d are not the same, the formula for Boas PC1 is not the formula for Centroid Size—the quantity divided out in the course of the Procrustes analysis is not the appropriate regressor for explaining allometry anyway. (For a numerical confirmation of this assertion for these data, see Bookstein 2021a; for confirmation in a different data set, see Klingenberg 2022.)

With these crucial additional graphical resources now in hand, let us return to our theme’s basic dialogue.

  • Is this data set integrated, or not? Wrong question.

  • How integrated is this data set? Still the wrong question.

  • How is this data set integrated? The findings to this point have unearthed four geometrically distinct factors accounting for the observed pattern of changes in this octagonal landmark configuration. Three of the four qualify as aspects of integration, while any or all might be the site of experimentally manipulable processes. The census of these includes the growth of horizontal dimensions by about a factor of two over this growth period, the growth of vertical dimensions by a percentage increase of about half that, the overgrowth of the horizontal along the cranial base, and this additional, distinctive effect of the vertical overgrowth between Opi and IPP. All these effects are seen much more easily in Fig. 7 than in Fig. 6.

To make the study of integration fully consistent with state-of-the-art GMM methodology, one more tool is needed. This tool responds to a well-known problem of PCA in any context of factor analysis, where causal considerations play a prominent, often dominant, role (Reyment and Jöreskog, 1993). The relation between principal components and causal explanations is sensitive to a decisive but rarely highlighted feature of a landmark-based data set, namely, the spacing of landmarks over the form. This is crucial but also easily obscured when the study’s template incorporates semilandmarks as well as landmarks: points lying along curves (which themselves may or may not link proper landmark points) at a density which is completely the investigator’s choice to set a-priori. The problem is not with the notion of semilandmarks on curves or surfaces per se—they certainly convey information relevant to the understanding of morphogenesis and function. The problem is, rather, that principal component analysis, the technique on which the exegeses of the previous section relied, is inappropriately and counterintuitively sensitive to arbitrary weightings of the actual factors which will prove ultimately responsible for the empirical pattern(s) uncovered.Footnote 8

This problem can be brought under the investigator’s control to some extent if we take advantage of one further formalism in the standard GMM toolkit: the rotation of the nonaffine part of form space from its most accessible, diagrammable basis, the list of separate shape coordinates, into the analytically far more tractable basis of the partial warps introduced in the very first paper about thin-plate spline models of deformation (Bookstein, 1989). A much later paper (Bookstein, 2015) introduced a powerful analysis of integration based on a deep mathematical property of intrinsic random fields (distance-based patterns of correlation among multiple landmarks), the possibility of self-similarity when expressed in the basis of partial warps. Put simply, for well-behaved landmark data the partial warps are always available as an unexpectedly insightful prior spectrum, and often are seen to be self-similar, or nearly, in that spectrum; the investigator should allow the biometric algorithm to explore this elegant morphometric possibility. The best elementary tool for such explorations is the graphic I have named the bending energy – partial warp variance (bePwV) plot. This is a log-log plot of the eigenvalue of each eigenvector of the spline’s bending energy matrix, a function of the mean configuration only, against the summed variance of the corresponding pair of scores (projection onto that eigenvector) for the data set of variations at hand. As the logarithm of zero is not a number, bePwV plots omit the uniform component (which has no bending) of the partial warp spectrum.

The isotropic Mardia-Dryden distribution (Dryden and Mardia, 2016; Bookstein, 2018), a model wholly unrealistic for metazoan data, corresponds to a slope of zero in this plot. The model of the self-similar intrinsic Gaussian field, having an equal chance of information at every geometric scale, corresponds to a slope of \(-1\), and strong integration such as a powerful quadratic growth gradient to a slope of \(-2\) or better. It is obvious why one needs such a prior spectrum in order to make sense of variation as observed in arbitrary landmark configurations—neighboring landmarks are most often found in positions that covary with respect to landmarks at any greater distance, and so the patterns of their covariation are necessarily confounded with the pattern of their mean locations. The spectrum of partial warps adjusts for this in a variety of realistic contexts. As the technique is somewhat esoteric, this paper is not the place to review it in any algebraic detail. The interested reader is referred either to the original 2015 publication or to the pedagogic version in Section 5.6.8 of my 2018 textbook.

It is useful to summarize the finding here—the description of the integration pattern characterizing this GMM data set—in the four-panel Fig. 8. The contents of this exemplar and also Figs. 11 through 15 to follow lay out the basic strategy this paper is proposing for GMM integration studies in four different graphics. At upper left is the vector graphic of Boas PC1 that we have seen before, and at upper right the scatterplot of the scores on this principal component (along the abscissa here) versus BPC2 (along the ordinate). According to the scatter, the centric allometry we already noted explains very nearly all the variation in this data set: the pattern is almost one-dimensional. As all the 7-day-old rats fall at the left end of the scatter and all the 150-day-old animals at the right end, we have confirmed that the horizontal axis—the pattern shown explicitly in terms of the coordinate data at the upper left—must be understood as growth. It explains more than 40 times as much variance as the next eigenvector.

Fig. 8
figure 8

First example of a useful four-panel display of single-component analyses, showing both how to describe their patterns of integration and how to segregate out hints of nonintegrated features. Upper left, duplicate of panel (b) in Fig. 7: vector plot of an arbitrary multiple of the loadings on the first (strongly dominant) Boas principal component, usually meriting an interpretation as general size (Wright’s label for this situation). Upper right, scatter of the first two principal component scores for the same analysis, BPC1 horizontal. The youngest (7-day) configurations are plotted as the open circles at the left, the oldest (150-day) as the \(\times \) symbols at the right; axis scales are the same. Lower left, thin-plate spline (again at an arbitrary multiple) of this same first Boas principal component. Lower right, two versions of the bePwV plot, log of bending energy against log of partial warp variance, on the same axes. Open circles, for the full data set, Fig. 7; closed circles, for BPC1 alone, rescaled to the same net Procrustes variance (compare Windhager et al. 2017)

But just attributing the pattern to growth does not constitute an interpretation of the pattern. Complementing (i) this clear increase of general size, we see, from the thin-plate spline (lower left), (ii) a general change of aspect ratio—horizontal growth rates greater than vertical growth rates; (iii) a greater horizontal rate along the cranial base than along the calvaria; and (iv) a focal abnormality at the upper left corner of the octagon, involving landmark IPP, again as already noted in Fig. 7. (Indeed that unconformity was already hinted at by the vector attached to IPP in the upper left panel—it is disproportionate given the position of this landmark between Opi and Lam.)

Finally, the lower right panel shows the bePwV plot for these data in two versions: in the open circles, the full span of 144 landmark configurations (18 animals by eight ages each); in the closed circles, the same for the first eigenvector only, the pattern graphed in two different ways in the left column, normalized to the same sum of variances. With the exception of its rightmost point, the fifth partial warp, you see that this first principal component BPC1 is more integrated than the raw data themselves—the apparent slope of the regression through these first four points is steeper than \(-2,\) versus a slope of about \(-1.4\) for the same four open circles. (For the interpretation of this slope as an intensity of integration when these points lie near a line, see Bookstein 2015.) But in both versions, the last partial warp, which is the deviation of landmark IPP from the others, carries additional information, information not appropriately described as any sort of integrated phenomenon: not, at least, at this rather crude scale of neurocranial description.

The reader may wonder why the discussion of the BPC’s here is limited to BPC1 even though the plot of the first two BPC’s, Fig. 7c, is obvious curvilinear. For an interpretation of how to think about BPC2 in cases like these, see the Discussion, in the vicinity of Fig. 20.

A More Complicated Example, Involving Semilandmarks

The paper’s final example involves 21 modern human skulls from a data set first published in Bookstein et al. (2003) and reused as an example in all three of my textbooks since then (Weber and Bookstein, 2011; Bookstein, 2014, 2018). Originally the data set concerned 38 dried skulls that were CT-scanned and thereafter represented by computed midsagittal sections. The selection of 21 here matches what was used in Section 4.5 of the Weber–Bookstein text except for deletion of the extremely early archaic modern Mladeč specimen. The analyses published earlier dealt with 21 digitized landmarks together with 73 semilandmarks, points taken along curving arcs to minimize the net bending information of their variation from the average given the positions of the true landmarks (see Bookstein 2014, 2018). Two of those landmarks (B point and Crista galli) have been omitted from the present analysis, and a third, Glabella, has been recoded as a semilandmark: hence, 18 landmarks and 74 semilandmarks, 92 points in total, over 21 specimens. It will be relevant to the integration analysis that the landmarks and semilandmarks span a list of bones having different names in the anatomy atlases and different functional implications. Figure 9 displays the Boas mean form for this set of 21 92-gons. As the legend says, the 18 landmarks, along with three pairs of semilandmarks, are coded to indicate their role in the discussion to follow. Figure 10 ordinates the full data set of \(21\cdot 92=1932\) points; now only the named landmarks are highlighted (the “eighteen-point” configuration reported in Fig. 13 to come). Already there are two hints here about integration: variation in the thicknesses across the calvaria that appears to be stable along that calvarial midsagittal section, and size variation of the ten landmarks of the “core” common to Figs. 11 and 12. For completeness I note the value of \(V_\mathrm{rel}\) for this set of \(2\cdot 92\) Boas coordinates. It is 0.197 (with \(184-20=164\) structural zeroes), strikingly less than either quantity characterizing the Vilmann data set, even though their apparent integration patterns will end up described in very nearly the same way and with the same degree of assurance. For a restricted quantification, the 92 distances of these 92 points from their shared Boas center, the value of \(V_\mathrm{rel}\) is nearly doubled, at 0.386. Clearly this summary statistic is driven by something unrelated to either evolution or development—it also depends very strongly on the investigator’s arbitrary choice of the data matrix sent for eigenanalysis of correlations.

Fig. 9
figure 9

Boas mean configuration (Procrustes shape coordinates, rescaled to original centroid sizes) for 92 points on synthetic midsagittal sections of 21 Homo sapiens skulls. Landmark names and operational definitions: Alv, alveolare, inferior tip of the bony septum between the two maxillary central incisors; ANS, anterior nasal spine, top of the spina nasalis anterior; Bas, basion, midsagittal point on the anterior margin of the foramen magnum; BrE, BrI, external and internal bregma, outermost and innermost intersections of sagittal and lambdoidal sutures; CaO, canalis opticus intersection, intersection point of a chord connecting the two canalis opticus landmarks with the midsagittal plane; FCe, foramen caecum, anterior margin of foramen caecum in the midsagittal plane; FoI, fossa incisiva, midsagittal point on the posterior margin of the fossa incisiva; InE, InI, external and internal inion, most prominent projections of the occipital bone in the midsagittal; LaE, LaI, external and internal lambda, outermost and innermost intersections of sagittal and lambdoidal sutures; Nas, nasion, highest point on the nasal bones in the midsagittal plane; Opi, opisthion, midsagittal point on the posterior margin of the foramen magnum; PNS, posterior nasal spine, most posterior point of the spina nasalis; Rhi, rhinion, lowest point of the internasal suture in the midsagittal plane; Sel, sella turcica, top of dorsum sellae; Vmr, vomer, sphenobasilar suture in the midsagittal plane. Dotted lines follow the semilandmarks that trace the curving of these forms between selected landmark pairs. Legend: large black dots, the ten “core landmarks” not on the calvaria; \(\bigtriangleup \), the four landmarks of the inner calvaria curve; \(\bigtriangledown \), the four paired landmarks of the outer calvaria curve; centered dots, three pairs of semilandmarks transecting the calvaria between those four landmark pairs; small black dots, the other 68 semilandmarks; circled \(\times \), center of the Boas coordinates of the 24 points highlighted. (Modified from Bookstein, 2018)

Fig. 10
figure 10

Modification of Fig. 9 to show the distribution of all 21 forms. Large black disks, Boas coordinates for the 18 named landmarks of the previous figure in all 21 specimens

An initial comparison of two different displays deriving from this common data set makes our analytic problem clear. Figure 11 emulates the four-panel display, Fig. 8, for a fourteen-landmark configuration comprising the ten core landmarks plus the four landmarks of the outer calvaria; Fig. 12, the same substituting the four landmarks of the inner calvaria.

Fig. 11
figure 11

First Boas principal component of the ten core landmarks together with the four (Nasion, external Bregma, external Lambda, external Inion) of the outer calvaria curve. Layout is as in Fig. 8. Legend for the scatterplot at upper right: 2, 10, 14, 17, subadults of the indicated age; 4a, 4b, the two four-year-olds; a through q (omitting i and o), the fifteen adults. In this and subsequent bePwV plots, at the lower right, open circles seemingly omitted from their vertical columns are those that happened to share their plotted location with the solid disk in the same column (as the horizontal coordinate, the log bending energy, is the same for the two spectra)

Fig. 12
figure 12

The same for the ten core landmarks together with the four (Foramen caecum, internal Bregma, internal Lambda, internal Inion) of the inner calvaria curve

The analyses of Figs. 11 and 12 agree in several respects. In both of the BPC1 vector plots (upper left panels), the landmarks are generally centrifugal, moving away from the centroid: Wright would surely have referred to either of these first principal components as a first guess at a “general size” factor. And the scatterplots of scores on the first two BPCs, upper right panels, are likewise nearly identical. (Note further that in both of these figures, the specimens that are extreme on the temporal sorting of the data set, the subadults aged 2 and 4 years, lie at one extreme of the data scatter, in keeping with the Olson–Miller requirement of “time-ordered sequences” I discussed in the Introduction.) The thin-plate splines are quite similar throughout the anterior halves of these configurations—they agree that the maxilla is both taller and more prognathic with increasing size. Finally, the bePwV plots, lower right, are quite similar in the slopes of the line of open circles (around \(-0.5\) in both). But the leftmost five black dots of the bePwV plots are rearranged between the two analyses, and in the corresponding thin-plate spline diagrams the anterocaudal gradient is clearly more homogeneous posteriorly in Fig. 11 than in Fig. 12. Note, for instance, how when drawn at this degree of extrapolation the inner-calvaria BPC1 grid, Fig. 12, nearly folds in the vicinity of InI (internal Inion). Examination of the corresponding vector plot shows that this landmark fails to be displaced outward as fast as its neighbors—it fails to accommodate the isotropic (non-shape-changing) feature of the whole posterior region of this configuration. (Compare how the homologous grid cells in Fig. 11 are almost unchanged from the original square configuration.) The analyses also disagree regarding the extent to which the mean configuration explains the BPC1 loadings: the correlation that was 0.928 for the Vilmann data is 0.920 for the outer 14-gon, 0.868 for the inner 14-gon. I return to this theme in the Discussion, where the above language about displacement “outward” will be revised.

One might be tempted to simply superimpose these two configurations in the fused eighteen-landmark analysis of Fig. 13. (In fact, that was the first of these four-panel displays to be generated originally: the two fourteen-landmark versions were designed only after the perplexities of Fig. 13 came to light.) While three of the four panels here seem well-behaved, that at the upper left (the vector plot) being perhaps the most insightful (with a Boas mean – BPC1 correlation of 0.886), the outline of the splined rectangular grid at lower left seems to show spikes and dents all around that make little sense in terms of the actual landmarks driving the deformation. Also, there is an unfamiliar, nearly incomprehensible T-formation of creases (splines reducing distance nearly to zero in one direction) in the vicinity of the origin of coordinates, where there aren’t actually any landmark data (and where the nearest three landmarks, CaO, Sel, and Vmr, appear not to be shifting relative position much at all). These artifacts warn against drawing any inferences about the actual integration of this configuration of landmarks from the selection in this particular diagram.

Fig. 13
figure 13

Attempted fusion of Figs. 11 and 12 into one combined eighteen-landmark scheme. The vector plot, the scatter of scores, and the bePwV cascades all look quite similar to their earlier versions, and now the discrepancy between the shifts at InE and InI is clear. But what on earth could that crumpling of the spline near the center of the Boas coordinate system mean, or that weird alternation of spines with indentations around the outer edge of the grid?

We might investigate further whether those inconvenient grid features are due to the discrete spacing of those four paired landmarks along the calvaria. In Fig. 14 the eighteen landmarks are combined with three pairs of semilandmarks across the calvaria, corresponding to semilandmarks facing each other across this arch. The configuration here has been aligned so that each of the seven pairs of adjacent points (outer calvaria, inner calvaria) is now collinear with the center (0, 0) of the Boas coordinate system for the full set of 24 (so that the spline’s behavior is not a function of the semilandmark spacing that was part of the original digitizing operation 20 years ago). On closer inspection of this figure we note that the arrowheads of most of the seven calvaria transect pairs are farther apart than the black dots from which they spring—this implies that the calvaria is becoming relatively thicker as the skull grows, a theme that will concern us much more in a moment. The odd misbehavior of the spline, that T-shaped crumpling near the center in the absence of data there and also the spines and indentations around the periphery of the grid, persists. At the same time, the other three panels of this graphic are well-behaved. The alignment procedure alters the Boas mean – BPC1 correlation, so it should not be reported for this design.

Fig. 14
figure 14

The perplexing configuration of the thin-plate spline persists when we increase the density of representatives of shape along the calvaria from four pairs of matching points to seven. The seven pairs have now been aligned, collinear with the origin of coordinates, as described in the text

Figure 15 extends this four-panel representation to the entire 92-point data set of Fig. 10. The vector plot at upper left, though perhaps difficult to read at this printed scale, now suggests a gradient of relative calvaria thickening that is higher at both ends of this arc while more or less uniform elsewhere. As the calvaria is thin in this section, the thickening does not particularly degrade the Boas mean – BPC1 correlation, which has risen to 0.904 owing to the density of all those semilandmarks appearing displaced outward in tandem around the calvaria. The ordination of the 21 specimens, upper right, is roughly the same, but at this degree of semilandmark oversampling (which serves the interest of scientific illustration far more than it serves any theoretically driven analysis) the dominance of the first Boas principal component over the second has weakened. Oversampling of semilandmarks in this way has damaged the interpretability of the finding as “integration” as well as injecting meaningless local turbulence into the thin-plate spline display all along the calvaria. In particular, the slope of the bePwV curve for the black dots toward the right in this plot is now nearly \(-2.0,\) an artificial aggrandizement of the degree of “small-scale integration” that is driven not by the landmark positions themselves but by the covariances among the semilandmarks engendered by their joint dependence on those landmarks. Here is a strong lesson indeed about digitizing templates and strategies (compare Katina et al., 2007).

Fig. 15
figure 15

Misbehavior of the spline near the origin and around the grid outline (graphical implications unrelated to any information in the GMM data) persists in a misrepresentation of the integration when the data set reverts to the full roster of 92 points in Figs. 9 and 10

At upper left in Fig. 14 or 15 one notes how clearly the BPC1 signal incorporates a systematic thickening of the calvaria along its entire length. This finding, while in no way a surprise, is a key to the misbehavior of the spline in the vicinity of the origin of coordinates here. Its mathematical origin was originally explored by Green and me 30 years ago (Bookstein and Green, 1993) but has received hardly any attention since then.Footnote 9 Its resolution, however, is crucial to understanding the meaning of the splines that involve transects of the calvaria. They imply an additional integrated aspect of the general size pattern here, namely, calvaria thickening. The algebraic-geometric modification of the analysis that they afford circumvents those undesiderable peripheral spines and indentations and also the central degradation of the grid.

The anomalies in Figs. 13 through 15 arise because the thin-plate spline is being commanded to display a situation to which it has no mathematical access: the combination of two phenomena, one highly directional, at distinctly different spatial scales in different parts of the diagram not bridged by any data in-between. Figure 16 shows, in panels (a) and (b), the effect of thickening within one single sector of this configuration. In frame (a) you see how this thickening is accommodated in a compensatory thinning toward the center that is incorrect in the way it is graded; frame (b) also shows how in the other direction it induces a correspondingly misleading extrapolation of the thickening itself. Frames (c) and (d) indicate how iterations of this thickening around either a small or a large range of orientations combine in the emergence of four distinct apparent “features” of the spline of which only one (the thickening of the ring) is present in the data, while the other three (the spine at a corner of the grid, the indentation along the central edge of the grid, the grading toward the center of the semilandmark alignment of that compensatory thinning) are joint artifacts of the spline’s algebra or our choice of coordinate system for the grid. (For a survey of the range of biologically relevant coordinate systems other than our familiar Cartesian scheme, see Bookstein 1981.)

Fig. 16
figure 16

Thinning or thickening of a shell of semilandmarks is represented by the thin-plate spline in a strikingly counterintuitive way. In all frames, the “center,” presumed at a great enough distance from the biology of the annulus not to be functionally related, is fixed between the configurations. a A short segment of nearby paired semilandmarks distant from a center and aligned with that separation is thickened (here in a ratio of 150%) in the normal direction. The spline visualizes the thickening correctly but converts its fixed displacement from the center to a gently compensating imputed thinning in that direction, a gradient for which there is no evidence in the data. b The same where the grid happens to lie at 45 degrees to the thickening. In addition to the centripetal thinning we now see an outward extension of the directionality of the thinning. The appearance of a spine in the spline is an artifact of the position of the widening arc with respect to the rectangular grid of the standard diagram. c,d Meristic accumulations of \(45^\circ \) or \(90^\circ \) over multiple instances of the configuration in a or b are consistent among themselves in respect of both the extrapolatory and the compensatory aspects of the thickening. e Model for the thickening of the calvaria in Fig. 14: somewhat more than a semicircle’s worth of this same annular thickening model, misrepresented by a thin-plate spline. f The same deformation, viewed in polar coordinates, now clearly indicates the rotational symmetry in the vicinity of the landmark configuration. g When the thickening is extended to a complete circular annulus, the spline’s symmetry is likewise perfect when visualized in this customized coordinate system. In all frames, the starting landmark or semilandmark positions are plotted with the \(\times \) symbol and their final positions, corresponding to thickening of whatever structure they lie across, with the \(\bullet \) symbol. In frames f and g, the choice of radii is arbitrary, but they were evenly spaced in the starting configuration (not shown)

Extended to the full angular range of the calvaria in this example, frame (e) serves as a model for how the spline misapprehends a uniform thickening around the slightly more than half a circle of this semilandmark scheme to include two imaginary effects, a central thinning together with an approximate fourfold symmetry alternating the spines with the indentations. The diagram strikingly reconstitutes all the apparent features, not only the true thickening of the calvaria but also all the artifacts, in the thin-plate splines of Figs. 13 through 15. Frame (f) of Fig. 16 corrects some of these misrepresentations by switching from a Cartesian to a polar coordinate system. The indentations and the spines are gone, but the minimization of area at the origin persists along with the true phenomenon, arch thickening, with which we are concerned. Because these circles were evenly spaced in the starting grid, their apparent crowding toward the center of this revised panel confirms the (inappropriate) compensation of the calvaria’s thickening by imputed thinning in the vicinity of the center in Figs. 13 through 15. Because the arch is not fully symmetric, neither is this transformation, even in polar coordinates. The figure’s final panel (g) completes the annulus over which the thickening is observed; now the resulting coordinate transformation is perfectly symmetrical as well, although the collapse of area toward the origin of coordinates remains an artifact unrelated to any information in the data—it is purely an effect of the mismatch between the spline’s assumptions and the biology of the scene when, as in these skulls, the interior of the annulus is actual tissue with different properties from bone. Invoking the thin-plate spline in Fig. 15 may have solved one problem, imbalance of landmark/semilandmark densities, but only at the cost of creating another difficulty, one that feels less tractable.Footnote 10

To remedy this difficulty the investigator of integration who wishes to make some use of thin-plate spline representations like these must sever the thickening of the calvaria from the rest of the anatomy, in order to represent the possibility of its integration without these hallucinations. Figure 17 shows a sequence of recalculations that ultimately rescues the spline from its mistake by straighting the calvaria without changing its degree of thickening. In all six frames of this figure the thickening of the arc is the same, proceeding at a greater rate than increase in distance from the (fixed) center. In the first pair of panels, “aligned transects,” the angles of the action of BPC1 match those in Fig. 14 except for a rotation that eased my coding, and the spline indeed appears quite similar to the mystifying bad example there—for the purpose of describing changes at the calvaria, it is enough to model the rest of the landmark configuration as unchanging. Successive frames of Fig. 17 straighten this arc by factors of 0.2, 0.4, 0.6, 0.7, 0.8 (i.e. reducing the rotation of the tangent and normal to the arc by those fractions). You see how the apparent singularity of the grid recedes away from this thickened arc until it is eventually well off the diagram. But in the real, unstraightened setting, all those imputed thinnings are, in effect, cumulative—“focused”—at one single point, the center of the arc, as in the model of Fig. 16, frames (e) or (f). This effect, a maximally smooth “compensation” along the grid of a deformation that that ought not to be smooth at all when crossing the edge of the calvaria, fools our visual system into finding a signal there, in the center, where there actually are no data. By the final panel of Fig. 17, that “compensation” has been displaced leftward toward infinity, and now the thickening is graphically visible with adequate clarity.

Some readers may find the algebra of this straightening worth imitating. In the package Splus, a variant of the popular language R, I coded it in complex arithmetic (Bookstein 2018, pp. 370–371) as follows, where jhe03.21.straighten.boas begins as the matrix of aligned Boas coordinates (14 landmarks or semilandmarks in seven pairs) from Fig. 14, rotated clockwise by about 40\(^\circ \) to a new horizontal just for graphical convenience, and jhe03.21.straighten.boas.mean is their average over the specimens. Then for extents of straightening

straightangle\(\leftarrow \)c(0, .2, .4, .6, .7, .8) that were themselves chosen for visual convenience, the straightened data will attenuate the angle out of the centroid of each original aligned pair from jhe03.21.straighten.boas by an angle proportional to that straightening extent. In code, inside a loop over landmark straightening index j and landmark index ilmk, we have

straighten \(\leftarrow \) exp(0-(0+1i)*

(straightangle[j]*Arg(jhe03.21.straighten.boas.mean[ilmk]))) which is a rotation by a fraction straighten against the original angle of that landmark’s calvarial normal away from the new horizontal. (That this is an attenuation owes to that minus sign preceding the imaginary unit 0+1i in the formula.) An inner loop over specimen number then multiplies each original jhe03.21.straighten.boas by the corresponding factor straighten, thus affording the same attenuation for each little segment linking the aligned pairs in Fig. 14, by this attenuation factor away from its original direction (lengthy code line not shown). The net result is to shrink the full angular span of the calvarial arc toward straightness around its new horizontal displacement. Once straightened this way, an appropriate multiple of the first BPC of variation over the full sample of 21 midsagittal configurations can be visualized by thin-plate spline in the usual Cartesian fashion, as in Fig. 17. Interested readers may challenge their understanding of this procedure by realizing why an extent 2.0 of straightening would have the effect of turning the original calvarial arc upside-down, whereas an extent of 1.2 would attenuate most of the curvature of that version.

Such an algebraic straightening of the actual form of the calvaria is required if the integration of its thickening with aspects of the general size effect, such as maxillary prognathism, is to be captured in some way by a grid deformation. But then in this two-border arcuate region we have no real use for the grid, only for one of its dimensions, the thickening. Hence the involvement of the calvaria in the integrated effect of general size can be confirmed without damaging the visual coherence of the spline as in Figs. 13 or 14 if one simply segregates the calvaria data for the purpose of this computation, computing a principal component separately across these seven separate arc thicknesses and correlating it with the “general size” factor from Fig. 11, the factor that incorporates the position of the calvaria but depends on its thickness only to a very minor extent. (Even better would have been to use a midcalvaria medial axis, Bookstein 1991:80–87.) This correlation is 0.831, just as persuasive as any of the first PC’s in Figs. 11, 12, 13, 14 and 15, now that the calvaria’s arc has been algebraically straightened. The thickening, by the way, is nonuniform (Fig. 17, lower right, “straightened 0.8”)—it appears to be greater toward Inion than toward Nasion and greater at either end than along the central portion (a perception confirmed by the loadings on this seven-term principal component, 0.408, 0.236, 0.202, 0.224, 0.233, 0.220, 0.764 as ordered from Nas to InI). Likely this increase of thickening focal to the vicinity of Inion itself owes to the insertion of the tendons there that hold the head in its upright bipedal posture.

Fig. 17
figure 17

The effect of straightening an arch on the representation of thickening as a thin-plate spline deformation. Upper left pair: aligned fourteen-point calvaria data for the 21 sapiens, rotated for coding convenience, accompanied by the corresponding BPC1 thin-plate spline. Other panels: the same paired diagrams after the arch is straightened by fractions 0.2, 0.4, 0.6, 0.7, 0.8 of the original angular range of transects through the Boas center of the configuration. In the straightened frames, the center point (0, 0) fixed in the spline transformation is outside the plot to the left except in the two configurations in the first row. In the TPS frames, that fixed center is plotted by an open circle

In summary: the collection of 21 sapiens in this example, while not the balanced growth design characterizing Vilmann’s rats, nevertheless spans a considerable range of sizes, from a 2-year-old through a range of adults. The sample as a whole manifests one dominant principal component that combines (i) a nearly isotropic size increase in the posterior half of the configuration with (ii) greater increase in both dimensions of the maxillary region, especially (iii) prognathism along its palatal border, together with (iv) a greater relative thickening all around the more than 180-degree expanse of the calvaria. Again, as in the Vilmann neurocranial example, a single integrated principal component is described via three general features (a general size and two gradients) worth exporting for potential causal exploration, along with a special factor, here measured redundantly enough to qualify as one aspect of an integrated description, but perhaps suggestive of a separate biomechanical signal all its own. We see, finally, that the standard thin-plate spline cannot easily handle the pattern of points in this example, involving as it does an imbalance of an extended densely sampled curve where the deformation’s gradients are highly anisotropic (thickening principally in one direction only) against large empty sectors characterized by growth gradients that are roughly commensurate around the circle of directions. When landmark/semilandmark data have been gathered in so unbalanced a design, it is not the splines but the Boas coordinate vector plots that convey the message meriting an interpretation in terms of dimensions of morphological integration.

Discussion

The polar coordinate rendering style for thin-plate deformations, frames (f) and (g) of Fig. 16, could be of interest well beyond the bounds of the present paper, whenever a concern for integration interacts with structures that may be regulated in the context of a center at some distance. More generally, just as neither \(V_\mathrm{rel}\) nor any other single algebraic summary procedure should be considered adequate for pursuing the variety of phenomena the organismal biologist ought to be calling “integration,”Footnote 11 discussion of so subtle a composite inference about the evolution or development of animals should not rely on a community’s preference for one single graphical style, however convenient, as a constraint on the report of such GMM-driven studies.

Reverting to the analysis of Sect. “Landmarks Analyzed by Boas Coordinates”, Fig. 18 visualizes the growth of Vilmann’s neurocranial octagons from age 7 to age 150 days using the same polar-coordinate alternative just introduced in Fig. 16. In a choice that is probably unfair to arachnology, it is an irresistible temptation to refer to these as spiderweb diagrams. The polar option is just as informative as the Cartesian option for this comparison—note its agreement on the nonconformity of the landmark IPP in the lower left panel. The central column confirms the near-return of symmetry between cranial base and calvaria when landmark IPP is omitted from the octagon (the asymmetry between ends of the cranial base persists). At the right is the analogous polar graphic for the aligned 24-point analysis of the 21-sapiens BPC1. The display is based in the same information as the lower-left panel of Fig. 14, but the way it reaches the viewer’s eye is different in several ways. It confirms the thickening of the calvaria, of course, but also suggests an echo of that geometry halfway around the form from that region, in the maxilla. Though the misrepresentation of that central region remains, the spines and indentations of the Cartesian (rectangular) boundary are gone now. This polar-coordinate version of the spline deserves to be proffered by the standard TPS packages as a new user-accessible option, an alternate graphical style that replaces Thompson’s (1917) Cartesian grid by a polar grid as its icon.

Fig. 18
figure 18

Spiderweb renderings (polar coordinate grids) for the principal findings of the spline deformation of interest in the paper’s two main examples. Top row, undistorted polar coordinate grids; bottom row, polar representation of the standard spline deformation. Left column, deformation from the age-7 Vilmann mean to the 150-day mean, equivalent to the deformation Figs. 6 and 7 called BPC1, extrapolated threefold (Bookstein 2014,p. 445, 2018, p. 458). Central column, the same after landmark IPP is omitted (as it was in the final panel of Fig. 7). Right column, polar representation of BPC1 for the 21 aligned human 24-landmark configurations displayed by Cartesian coordinates in Fig. 14. Each panel is centered on the centroid of the point configuration there

Combining the rodent and sapiens examples, it is clear that the task of assessing “integration” in a GMM data set is not a matter that can be handled by unsupervised algorithms, or maybe not even by unsupervised graduate students. Rather, discerning human judgment is required, along with considerable knowledge of the classical and contemporary literature of evolutionary or developmental organismal biology and a willingness to refine computations in lengthy series along which both landmark counts and their configurations are varied. When seemingly innocuous changes of a landmark list impose major changes on the structure of the statistical workup (contrast Fig. 11 with Fig. 13), neither one ought to be believed; more work needs to be done. [Compare Joe Felsenstein’s (1985) famous point about the standard error of computed phylogenies.] The main lesson of Olson and Miller (1958) for GMM turns out to be its extensive exploration of alternatives for reporting the \(\rho \)-groups. Insofar as the import of a correlation-driven PCA tracks the averages (though not the maxima) of that matrix to some extent, this Olson–Miller concern is closely allied to the surveys of alternative landmark sets and even the manipulations of calvaria geometry explored in this paper. Surely no single formula such as the normed variance of the diagonal of the PCA-derived matrix \(\Lambda \) based on correlations, the quantity usually called \(V_\mathrm{rel},\) is clever enough to be trusted as an empirical guide in these matters.

Regarding this paper’s conclusions about the structure of integration in its three examples, I have confirmed Wright’s four-factor finding while rederiving it using an alternative computational strategy toward that end, and likewise there is nothing new in the analysis of the Vilmann data set, which, except for Fig. 7, has been published before in its present form. The reanalysis of the 21 H. sapiens configurations, however, is novel, inasmuch as it claims that some obvious features of previously published thin-plate splines are artifacts and that the data set as a whole has too many semilandmarks to be a reliable guide to principal-component interpretations of integration or any other biologically meaningful decomposition of its structure without extensive methodological modifications. (This problem, the miscalibration of our intuitive apprehension of a situation involving a very large number of very weak predictors, was handled well over a century ago: see, for instance, Pearson 1914.) Both examples conform to my narrative purpose here, which has been the interweaving of GMM algebra with the spectrum of pattern incarnations of integration afforded by all the other available biological information. Sometimes this additional (but far from ancillary) information is as elementary as the interpretation of size change as “growth,” but it can be as complex as the arrangements of the bones of the mammalian skull over its important categories of content, what Melvin Moss famously called its “functional matrices.”

What about the second Boas principal component? In his leghorn chicken example, Wright arrived at the factor model by inspecting the artificial contrasts along principal components 2, 3, and 4 of the analysis in Sect. “A Classic Example of Disarticulated Distance Data” of this paper. I showed how the same result could be arrived at without any PCA simply by inspecting the concentration matrix, inverse of the correlation matrix Wright examined. For GMM data, however, the corresponding matrix argument, covariances of the Boas coordinates, is not invertible. Does one need to revert to examination of second and subsequent principal components to complete a study of integration begun along the lines of Sects. “Landmarks Analyzed by Boas Coordinates” and “A More Complicated Example, Involving Semilandmarks” of this paper, the close and careful examination of BPC1? Figure 19 replicates that graphical style for the second Boas principal components of our two GMM examples. In the left column is a vector plot of the loadings of that second Boas principal component in one or the other direction. (Since it must be orthogonal to the general factor, which is size, it does not have a natural polarity in either setting.) In the other columns are thin-plate spline representations of both directions of those displacement patterns as applied to the corresponding Boas mean form.

Fig. 19
figure 19

Analyses of BPC2 for the two GMM examples in this paper. Left column, displacement plots (at two different arbitrary magnifications); center and right columns, thin-plate splines in the direction of BPC2 and its opposite. See text

In the upper row of Fig. 19 is an analysis of BPC2 for the Vilmann neurocanial octagons, which appears to modify the effect of BPC1 over time (Fig. 8, upper right) in an informative way. The dimension is a combination of a shear and a contrast. The contrast is to some extent over space (cranial base lengthening versus neurocranial height shortening), but its interpretation is not so much about space as about time—according to the upper-right panel of Fig. 8, it applies in this direction over the first three milestones of this animal growth data set, from ages 7 to 21 days, followed by the opposite deformation from age 30 days onward. In either direction, the transformation appears to be highly integrated over the octagon, and so it deserves to participate in summaries of this pattern in spite of its much smaller explained Boas variance if there is some plausible explanation of a biomechanical origin of the change point, for instance, the weaning of these rats around the fourth week of postnatal growth. But, explanation or not, BPC2 is not itself a dimension of integration. Rather, it is the difference between two versions of BPC1, each of which describes growth during a specific ontogenetic interval; BPC2 per se is not an aspect of “integration” any more than PC2 was in Wright’s leghorn example. The biological reality is not a “reversal,” but a sequence of two slightly discrepant geometries of integration. Figure 20 shows the analyses of these two growth domains separately; note how the features identified in Figs. 7 and 8 arise mainly in the second column, growth after 30 days of age.

Fig. 20
figure 20

As already hinted in Fig. 8, the second Boas principal component of the Vilmann data set is best interpreted as a moderator between two integrated growth regimes agreeing in certain aspects and disagreeing in others. Upper row, BPC1 plots for the age ranges of 7 to 21 days (54 octagons) and 30 to 150 days (90 octagons), at the same arbitrary multiple. Lower row, corresponding thin-plate splines, earlier to late Boas mean form, extrapolated by a factor of two

For our other GMM example, the 21 H. sapiens midsagittal configurations in their aligned 24-point form (Fig. 14), the urge to examine this second Boas principal component might be stronger, as this second eigenvector explains around 22% as much variance as the first. The two deformation grids in the lower row of Fig. 19 obviously do not share the appearance of integration seen in the row above. The thickening/thinning artifact of Figs. 16 and 17 is gone now, as the polarity of that change reverses from forehead to occiput—this is indeed a contrast with respect to BPC1—but there is also a larger pattern here, a contrast between horizontal extensions against vertical extension across the diagram as a whole, its directionality concentrated in the maxilla. Yet in the panel at lower left the displacement shows vectors in this region to be quite disorganized. Hence it would be unwise to trust this BPC2 as some sort of expression of a “second-order integration” in the absence of a denser representation of landmarks here in the maxilla, additional landmarks that were not available.

In sum, the second Boas principal component of the octagon data set suggests a valid contrast, early growth against late growth, that would modify the description associated with Fig. 8 to correspond to these two developmental ranges. For the sapiens example, on the other hand, visualization of BPC2 seems to add nothing to the description of integration already arrived at by close examination of BPC1 only. As all the textbooks already say, in a biota where “everything is correlated with everything else,” a second principal component, by itself, cannot be an independent descriptor of integration or any other interesting morphological process. It can serve only as a modifier of its predecessor, principal component 1. Here, that means that BPC2, if it has anything to do with integration, gains its meaning only insofar as it works as a possibly meaningful modifier of the action of BPC1, as it does in the first of our two examples, but not the second.

Pseudoprocrustes plots for the display of integration findings vis-à-vis mean configurations. What matters when a first principal component of some set of Boas coordinates is interpreted in terms of integration is not the ratio of this first eigenvalue to the second (as long as scatters like that in Fig. 8 are reassuring), but instead the relation of its loadings to the multiple geometric features that the investigator should eagerly convert into explanatory suggestions of hypotheses worth testing in new data. This closely matches Wright’s original intention in putting forward the leghorn four-factor finding as one example in his 1932 paper collecting several examples with an emphasis on the corresponding genetics. But the methods here also fulfill Olson and Miller’s original purpose, the combined inspection of associations and functional considerations they meant when they said (Olson & Miller 1951, p. 325) that “attention is focused upon the nature of morphological change at a low taxonomic level including analyses of the mechanism and modes of these changes. ... Much of interest and significance may be gained by carefully planned exploitation of the data that are available.” While the “available data” have surely changed drastically since 1951, the desire to balance mere statistical manipulations against deeper consideration of the “modes” of change (in this paper, the diverse types of geometrical patterns that can be superimposed over one another for diverse subsamples of coordinate pairs) closely matches Olson and Miller’s original theme, the complementarity of alignment between statistical and functional or developmental considerations.

The “pseudoprocrustes plot,” panel (d) of Fig. 7, suggests a further diagram style in connection with the deleterious effect of imbalanced templates that concerned us in connection with any of the H. sapiens analyses that involved paired data directly transecting the calvaria, Figs. 13 through 17. When the Boas mean – BPC1 correlation is large enough (0.9 or better) and there are more than a handful of landmarks, as in this example, it is often helpful to replace the simple Procrustes superposition of that earlier figure with the equivalent conventional thin-plate spline. Figure 21 contrasts the extremes here: on the left, the 14-landmark analysis of Fig. 11, with no information about calvaria thickness; on the right, the full 92-point data resource. The 14-point plot is a suitable icon for the extended interpretation of integration reported there: directional enlargement of the maxilla with hardly any shape change toward the posterior. In showing us how the integration of the data set is so clearly different between the neurocranium and the maxilla, its import is closely analogous to what Jolicoeur (1963) established for the principal component analysis of a data set of logged distances: because the logarithm of a distance is effectively a surrogate for its ratio to its own mean, a principal component of distance data can be understood only in the context of those same means, which is to say, by comparison with an eigenvector proportional to the vector of all 1’s.

Fig. 21
figure 21

Thin-plate splines for two extreme pseudoprocrustes analyses from the sapiens example. The deformation is from the mean landmark configuration to the configuration of the tips of the first Boas principal component, treated simply as displacements from a centroid as in panel c of Fig. 7. Left, only the outer 14-gon; right, the full 92-point data set

The 92-point equivalent display, at right, is far more difficult to decode, owing to the irrelevance of all this small-scale information to the actual theme of integration with which this paper is concerned. The directional hypermorphosis of the maxilla is now partially concealed by the semilandmarks along the palate and the nasal bone, and even though the \(\mu \)-BPC1 correlation has not dropped below 0.90, ordinary digitizing noise along all the semilandmarks of the calvaria obliterates any legibility of the diagram in that vicinity except for that now-familiar general thickening. The result is to confirm the wisdom of Olson and Miller in insisting that pattern analysis of morphological integration not be separated from the causal explanations to follow. Here, the explanation of the integration pattern, BPC1, is calibrated to the mean Boas configuration that must guide its interpretation as an eigenvector. That linkage is very clear in the simpler data configuration, but obscured in the more complex version. In short, what eases the visualization of the form’s geometry as you might hold it in your hand (here, the great increase in spatial density of the Cartesian information resource) does not necessarily suit the understanding of its biology. The pseudoprocrustes grid with fewer landmarks is likely more effective whenever the \(\mu \)-BPC1 correlation is close to unity—whenever the explanation of integration is dominated by a graded centric allometry.

For that reason, it is unfortunate that where Olson and Miller’s (1951) scheme referred explicitly to \(\rho \)F-groups, their 1958 monograph explicitly discarded the F component owing to the fallacious claim that “the hypothesis that \(\rho \)-groups imply F-groups” has been “confirmed.” Even in the Wright example, this is not really true—the skull measures are not, of themselves, a \(\rho \)-group, while the strength of the partial correlations along the supradiagonal of the negative of the concentration matrix is more commensurate with the Fisher z-transformations \(\begin{pmatrix} ~ &{} 1.66&{}1.39&{}1.37\\ 1.66&{}~ &{} 1.41&{}1.49\\ 1.39&{}1.41&{}~ &{}1.78\\ 1.37&{}1.49&{}1.78&{} \end{pmatrix}\) of the underlying \(4\times 4\) submatrix of limb measures than with the raw correlations, which the first principal component fits by least squares instead. More saliently, Wright’s example makes clear what Richard Reyment emphasized in his last paper, Reyment (2013). When morphometric data is sufficiently integrated, all principal components after the first take the seriously misleading form of contrasts, and thus should be quickly discarded as candidates for any sort of biologically meaningful description. Fortunately, GMM’s rich toolkit offers alternatives to this misinterpretation by mapping the loading pattern of PC1 alone onto the landmark configuration diagram whenever the PCA is of Boas coordinates rather than distances. In both of the GMM examples here, the evenly spaced octagon and the badly imbalanced 92-gon, there is immensely more variety in the domain of possible form-features, the mathematical space of possible F’s, than there is information in the covariances alone to guide us toward sensible inferences. Hence the analysis must fully involve the mean form of the configuration via the rebuilding of the landmark subsets to examine different subconfigurations. Only by forwarding the landmark resource to be enriched in the course of factor studies can the investigator manage the tension between those “special factors” that are actually features of BPC1 and others that might be dealing with genuinely incidental components of the circumstances under which any specific sample of organisms was accumulated.

Return for a moment to a theme briefly mentioned in the introduction to this paper, the stereotyped role of null hypotheses in morphometric investigations. I have argued elsewhere (Bookstein, 2014, 2018) that this concept is almost always useless for the multivariate biometrics of either evolutionary or developmental biology. We know a-priori that data from living organisms are patterned at a staggeringly broad range of scales and formalisms of morphological information. The “hypothesis” of no pattern is simple nonsense. What characterizes unpromising inferences in morphometrics is not the absence of a pattern but its noninterpretability—its failure to suggest further investigation. For allometry, the suitable model that stops most further questioning is Jolicoeur’s (1963) isometric model for the first principal component of log-distances whose loadings are a multiple of a vector of all 1’s (although, as Sect. “A Classic Example of Disarticulated Distance Data” showed, the concentration-matrix-based search for pairs can apply here as well). For landmark and semilandmark configurations, the equivalent would be the pattern Bookstein (2015) called self-similarity, the uniform distribution of interpretable features (signal amplitudes) over the range of possible geometric scales. In all these contexts, “interpretation” must be in a context of everything else that is known about the data scheme from the more powerful contexts of biomechanics, bioenergetics, phylogeny, embryology, or genomics—anywhere and everywhere that validated theories have preceded the empirical morphometric study at hand. The role of a GMM integration analysis, in effect, is to restrict the list of potentially interpretable features to those aligned with the first one or two principal Boas components. Such a task, a radical simplification of the organismal biologist’s subsequent efforts, is well worth pursuing.

In this connection one can draw some inferences for GMM methodology from the details of the calvaria representation in Fig. 17. Calling the calvaria a “module” is not helpful, as it participates fully in the general size expansion. Its position suits the TPS approach, but not its local, anisotropic thickening. Parameters like these need to be shifted from the space of Boas coordinates into their own separate vector space. Such a treatment—the separation of finite GMM analyses from differential analyses like thinning/thickening—can be generalized. Here, a report of increased thickening between internal and external Inions is much more cogent than the comment a few pages above that InI “fails to be displaced outward as fast as its neighbors.” (Both statements are arithmetically correct, but the anatomically appropriate frame of reference for a description of the thickening is the centerline of the calvaria, not the centroid of the landmark configuration.) The problems that dogged the unthinking application of GMM tools in this data set (specifically, the inclusion of too closely spaced calvarial semilandmarks in Fig. 15 and elsewhere) arose from a misunderstanding of what the thin-plate spline is capable of doing. Because its mathematics is rigorously isotropic, structures that are obviously directional in their biometric patterning (sheets, shells, tubes, folds, crests, foramina, holes, branches, fenestrae, aneurysms, bundles) must be dealt with by their own parametric subsystems the way we segregated the thinning of the calvaria in Fig. 17 from its positioning in Fig. 11.

While the composite TPS representation risks misrepresenting these highly directional organs and tissues, the Boas principal component approach seems capable of indicating patterns of loadings consistent with such derived parameterizations as long as the count of landmarks counteracts the count of semilandmarks to a sufficient degree. Figure 14 is probably a good balance in this sense, seven pairs against ten unpaired points. The problem is with the axioms driving the thin-plate spline itself as a mathematical model, specifically, the assumption that anatomical space is homogeneous and isotropic everywhere in the diagram, regardless of the positions of the landmarks. Recall the basic description of the underlying TPS model: an infinite, infinitely thin, uniform metal sheet subject to infinitesimal displacements normal to itself (Bookstein 2018, p. 425). But anatomy is neither infinite nor homogeneous in this way. To be truly realistic (which admittedly is not the spline’s actual purpose) one must have recourse to the finite-element analyses parameterized by spatially varying anistropic material properties, information very difficult to articulate to this much less sophisticated “geometric” style of morphometrics. As regards the context of discovery, clearly the existing tools of GMM are insufficient for the task of discovering arches and other extended differential models – all our insights must be couched in the prior jargon of the anatomy atlases. We could not have detected the biometric relevance of the calvaria, as in Figs. 11, 12, 13, 14 and 15, had we not already known of its existence on anatomical grounds and peppered it with (too many) semilandmarks in advance.

In all these settings, with or without semilandmarks, the proper question to ask is neither “Is this data set integrated?” nor “How integrated is this data set?” To be useful as a suite of morphometric findings, the operationalization of integration needs to have taken the form of a composite answer to the different question “How is this data set integrated?” And the answer will refer variously to subsets of distances or subsets of deformation patterns, all defined in principle as patterns over the whole organism. Interpretation of a GMM pattern in terms of some “modularity” reflecting “prior hypotheses” (e.g., Klingenberg, 2009) severely depauperizes the actual range of these morphometric pattern findings. (Philipp Mitteroecker and I have discussed this matter in earlier publications prior to the rediscovery of Boas coordinates. See Mitteroecker and Bookstein, 2007, 2008, 2009.) Even in Wright’s leghorn example, it is not that skull, wing, and leg are “modules,” but that four factors are found to participate in explaining the pattern here, one a “general size” factor and the other three “special size factors.” “Is the roof of the skull a module, or is it not?”—the conceptual basis of so simplistic a question obviously fails to apply here. In the rodent neurocranial example, there is one focal factor along with three extended geometric features of the first Boas principal component. Asking whether there are “modules” here, say, the calvaria vis-à-vis the cranial base, is a question without a purpose. In GMM the concept of modularity seems simply not to apply, owing to the richness of all that auxiliary information about geometrically distributed means and variances along with the library of anatomy atlases. The simplistic algebraic structure of descriptions of “modules”—mere lists of landmarks—is not able to capture the geometry of the phenomena intended to be captured by that term.

A reasonable GMM-compatible definition of “module” requires there to be more than one. The methodology cannot be an analogue of the classic psychometric factor analyst’s pursuit of a simple structure (Thurstone, 1935), a morphometric factor the loadings of which are heavily concentrated on one specific anatomical component of a few landmarks that are closely clustered both by location and by embryology, genetics, or physiology within the organism. (For a general survey of the role of factor analysis in the natural sciences, including GMM, see, again, Reyment and Jöreskog, 1993.) Instead, the range of potentially valid “prior hypotheses” is actually a remarkably broad one, incorporating, among other possibilities, regionalized size variation, displacement/rotation of components, and interactions either along or across some common boundary curve. The dimensionality of this space of possible reifications of modularity is thus far broader than the conventional comparison of within-module to between-module correlations. The only instance of this sort of substructure in the examples of this paper is the emergence of the calvaria as a feature requiring a separate analysis in Sect. “A More Complicated Example, Involving Semilandmarks”, where, you will note, the evidence for “modularity” deals only with thickening, not with position, orientation, or relative size per se.

Thus, all details and caveats notwithstanding, the argument of this paper is easily summarized. When data take the form of anatomically labelled Cartesian coordinate pairs or triples, a description of “integration” capable of alignment with the purposes of evolutionary and developmental biology often goes well as a graphical exegesis based mainly in a detailed decomposition of the loadings of the first Boas principal component together with segregation of nonconforming regions of the form. There results a multifarious morphometric pattern description, in part homogeneous, in part characterized by gradients, and in part focal. This new protocol fulfills the original thrust of Olson & Miller’s (1951) “\(\rho \)F-groups,” the matching of statistical evidence to biologically explanatory guidance, while respecting the enormously greater range of morphological descriptors afforded by well-designed landmark-semilandmark configurations.