Stature is a fundamental component of a person’s body size and morphology. Ιt is a highly inherited, polygenic trait meaning that its phenotype is controlled by different genes. During the last decades, genome-wide association studies have linked numerous SNPs (single nucleotide polymorphisms) to human height, and genomic research focuses on the examination of selection patterns in height variation (Cox et al. 2019, 2021; Friebel et al. 2019; Gudbjartsson et al. 2008; Mathieson et al. 2015; Marouli et al. 2017). However, apart from genetics, stature is the result of a long developmental process and is influenced by changes in environmental and nutritional conditions (Bogin 2001; Hermanussen et al. 2010; Jelenkovic et al. 2020; Mancinelli and Vargiu 2012; Rosenstock et al. 2019; Ruff 2018; Rühli et al. 2008). Malnutrition, lack of dietary protein, and exposure to infections have been found to postpone growth and lead to short stature (Bhan et al. 2001; Bhutta et al. 2017; Cole 2003; Jee et al. 2014; Jelenkovic et al. 2020; Perkins et al. 2016; Pomeroy and Stock 2012). Therefore, anthropometric values and mean stature variations are used as nutritional and health status indexes of past societies.

Stature from human skeletal remains can be estimated by two different methods: the anatomical and the mathematical. The anatomical method, established by Georges Fully (1956) and revised by Michelle Raxter and colleagues (2006), provides the most accurate estimations (Koukli et al. 2021; Raxter et al. 2007; Ruff et al. 2012). However, its application is often limited since it requires measurements from all bone elements that constitute body height (i.e., cranium, vertebrae, sacrum, femur, tibia, talus, and calcaneus). Mathematical methods use only long bone measurements. They are based on regression equations developed on modern reference series with actual or reported living stature or skeletal collections with anatomically reconstructed stature values. Two types of regressions are utilized: the ordinary least squares (OLS) and the reduced major axis (RMA) equations. The traditional OLS regressions produce clear, unidirectional stature estimations (from long bones to stature) and are generally thought to be more accurate (Legendre and Legendre 2012, Table 10.4; Koukli et al. 2021; Smith 2009). However, other researchers support that OLS tend to underestimate tall and overestimate short individuals. They propose, therefore, that RMA models should be used only if the understudy population exhibit a high degree of variability in body size (Auerbach and Ruff 2010; Konigsberg et al. 1998; Konigsberg and Jantz 2018; Maijanen and Niskanen 2006; Pomeroy and Stock 2012; Ruff et al. 2012; Sjøvold 1990). Soft tissue loss affects the estimation of living stature from skeketal height for approximately 10.0 to 13.0 cm and necessitates the use of reliable correction factors (Raxter et al. 2006; 2007; Ruff et al. 2019).

Population-specific equations provide the most accurate results. However, this is difficult to achieve in osteological material of archaeological origin due to poor preservation. Additionally, those that do exist are mainly produced by small reference samples, a fact that may affect the validity of their estimates. In the absence of a population-specific regression, the choice of the suitable mathematical method from the literature is a difficult task. It is suggested that the suitability of mathematical estimations is influenced by the degree of similarity between the anthropometric characteristics of the reference sample and the target population (i.e., body proportions, geographic origin; Feldesman and Fountain 1996; Holliday and Ruff 2001; Jantz and Jantz 1999; Pearson 1899; Ross and Konigsberg 2002; Ruff 1994; Ruff et al. 2012; 2019; Vercellotti et al. 2009; Vercellotti 2018). However, other studies show that there is no clear indicator for choosing the optimal mathematical method and that similarities in body proportions might play a role in the quality of estimations, but this relationship is not always clear and straightforward (Formicola and Franceschi 1996; Koukli et al. 2021; Ruff et al. 2019).

Nowadays, there is a growing interest in the factors (evolutionary, environmental) that influence human morphology. Therefore, valid data on the diachronic anthropometric characteristics and their secular changes are necessary. So far, many studies offer average body height, anthropometric secular trends, and regression equations for past populations that mainly originate from central and northern Europe (Maijanen and Niskanen 2006; Mays 2016; Pomeroy and Stock 2012; Ruff et al. 2012; Ruff 2018; Sládek et al. 2015; Vercellotti et al. 2009). Mediterranean populations are less studied (Cox et al. 2019; Danubio et al. 2017; Rosenstock et al. 2019), whereas data from ancient Greece are scarce (Angel 1945, 1946, 1971, 1972, 1984; Bourbou 2003; McGeorge 1983; Wesolowsky 1973). The few available stature values are estimated using the Trotter and Gleser (1952) method which produces rather inaccurate results for ancient Greek populations (Koukli et al. 2021).

Thus, the aim of the present study is to (a) generate population-specific stature estimation equations suitable for populations of southeastern Europe and especially for ancient Greece, (b) present stature secular trends during Greek antiquity, (c) test the newly generated regressions on other European populations and compare them with widely used stature estimation methods, and (d) introduce the Logarithmic Human Stature Index (LHSI), a promising methodological tool in anthropometry that simplifies inter-population comparisons and offers a new perspective in stature estimation theory.

For this purpose, we collected a numerous dataset of 137 skeletons (8th BC–9th c. AD) with anatomically reconstructed stature values and generated population-specific equations for ancient Greece. We collected anthropometric data and estimated stature in 775 individuals (10th millennium BC–9th c. AD) presenting for the first-time average body height and diachronic secular trends for this part of the ancient world.


The Greek reference dataset with anatomically reconstructed stature values

For the development of population-specific equations, we collected data from 129 individuals, 70 males and 59 females from northern Greece. The 129 individuals preserved all skeletal elements necessary for the estimation of stature by the anatomical method (“Fully” technique modified by Raxter et al. (2006)). The individuals originate from two major rural and urban settlements of antiquity: the Kitrini Limni region (situated near the modern city of Kozani) and the ancient city of Thessaloniki (Fig. 1) (Acheilara 2010; Fotiadis et al. 2000; Karamitrou-Mentessidi et al. 2013, 2016; Makropoulou et al. 2015). One hundred twenty-one individuals date to the Roman period (2nd c. BC–4th c. AD), 2 to the late Hellenistic (3rd–2nd c. BC), and 6 to the Early Byzantine period (4th–9th c. AD). We named this dataset the Greek reference dataset (GRD).

Fig. 1
figure 1

Map of Greece. Geographical and temporal distribution of the osteological material used in the present study. Numbers on the map indicate the archaeological sites. The number of individuals per chronological period and site is given in the table, and bold indicates the number of anatomically reconstructed skeletons

Sex was estimated by the combination of cranial and pelvic morphological traits (Ascádi and Nemeskéri 1970; Buikstra and Ubelaker 1994; Ferembach et al. 1980; Sjøvold 1988). Age was estimated using the cranial suture closure (Ferembach et al. 1980), the pubic symphysis morphology (Brooks and Suchey 1990), and the changes in the auricular surface of the pelvis (Lovejoy et al. 1985). We collected data only from adults, with fused long bone epiphyses and vertebral endplates, without any gross pathologies or injuries that could affect the surface of the bones and thus result in possible biases in measurements. Skeletal age was not considered for the stature estimation since none of the individuals exceeded the age of 45 years, and as thus, no severe errors by soft tissue thickness were expected (Raxter et al. 2006, 2007). Precisely, most of the individuals ranged between 30 and 40 years old (n = 80, 62%) and 20–30 (n = 49, 38%).

Greek dataset for estimating secular changes in stature of prehistoric and historic period

To reconstruct the secular changes in stature of prehistoric and historic period Greece, we collected long bone measurements from 775 individuals, 402 males and 373 females. The skeletal material originates from key archaeological sites of the Greek mainland and the islands and dates from the Mesolithic (10th millennium BC) to the early Byzantine period (9th c. AD) (Fig. 1). The 599 individuals, 296 males and 303 females (Table S1), were studied by the authors. Age and sex estimation as well as long bone measurements were estimated identically to the GRD. For the remaining 176 individuals, 106 males and 70 females, long bone measurements were retrieved from published studies (Table S2; Angel 1945, 1946, 1971; Musgrave and Popham 1991; McGeorge 2018; Manani 2018). We used the published long bones measurements only from adult individuals to whom the estimation of sex was documented by the use of both cranial and pelvic elements, and the measurements were taken using an osteometric board following Martin’s (1928) guidelines. We included this additional dataset because it broadens the temporospatial limits of our investigation. The exact number for every long bones measurements is given in Table S3.

The test datasets

Population-specific equations estimate the stature of the reference sample with high accuracy. To overcome this bias, we tested the newly developed equations to three different test populations:

  1. A.

    The Archaic/Classical period test population from Greece

It consists of eight individuals dated to 8th–4th c. BC: seven individuals (6 males and 1 female) from the ancient cemetery of Akanthos and 1 male from the region of Kitrini Limni who was excluded from the GRD dataset for reasons of chronological homogeneity. Akanthos was a Greek colony, established in the middle of the 7th c. BC in Athos peninsula in northern Greece (Fig. 1) (Hansen and Nielsen 2004; Kaltsas 1998; Trakosopoulou-Salakidou and Paspalas 2006; Tiverios 2008; Tsigarida and Xydopoulos 2015). The 8 individuals were adults and had fully preserved skeletal elements that allowed the application of the anatomical method (Raxter et al. 2006). Thus, they serve as an excellent test sample to compare the efficacy of the newly generated equations. Sex and age estimation of the individuals was performed identically to the GRD.

  1. B.

    The Ruff et al. (2012/2018) test populations

This published dataset includes 1063 individuals (607 males and 456 females) with stature estimation values based on the anatomical method (Raxter et al. 2006) (Ruff et al. 2012; Ruff 2018, pp. 21). This reference sample of European origin is divided into chronological (from Neolithic to Modern times) and geographical groups (Iberia, Scandinavia, France, Italy, Balkans, Britain, Italy, and North and Central Europe).

  1. III.

    The “mean value” test population

We collected published mean values of long bone measurements that were used for the generation of stature estimation formulae. We included data from research articles that provide (1) a large sample size, i.e. > 50, (2) published mean values of long bone measurements (but no individual raw data), and (3) mean stature values that were available either through medical records or were reconstructed by the anatomical technique or by modified cadaveric lengths. In total, 15 research articled satisfied the above criteria (Fig. 2). Table S4 includes all raw data and information retrieved from the 15 studies, as well as descriptions of our calculations. For every population, published long bone lengths, stature, and standard deviation values were used to create a “standard individual” (male, female, or pooled). In order to integrate individuals with more diverse values, in every case where a standard deviation value was available, we used it to calculate two more individuals: the “short” individual (mean stature value subtracting one s.d.) and the “tall” individual (mean stature value adding one s.d.). For example, if the mean stature value for one population is 160.0 cm with a standard deviation value of 3.0 cm, the “shorter” and the “taller” individuals are 157.0 cm and 163.0 cm, respectively. Doing so, 93 standard individuals (48 males, 40 females, 5 pooled) were generated.

Fig. 2
figure 2

Geographical origin of the “mean value test population”. Colored dots illustrate the geographic origin of the anthropological collection and the corresponding reference. Details on the anthropological collection are presented in Table S4


Stature reconstruction based on the anatomical method

Stature was reconstructed using the revised anatomical method of Raxter et al. (2006) and specifically the age-free (A2) equation. Long bone measurements (H1, R1, U1, F1, F2, T1, T1a, T1b, Fib1) were taken after Martin (1928) to the nearest millimeter by the use of an osteometric board. Both left and right sides were measured per individual, and the mean values were used. Data are presented in Tables S5 and S6. Maximum cranial height (BBH) was measured with a spreading caliper to the nearest millimeter. The maximum height of the vertebral bodies (C2-S1) was measured with a sliding caliper and the talo-calcaneal height (TCH) with an osteometric board to the nearest millimeter. The bicondylar length of the femur (F2) and the maximum length of the tibia (T1) from the medial malleolus to the lateral condyle were measured after Raxter et al. (2006). Only a small number of individuals had missing skeletal elements. Specifically, in 11 individuals out of 129, we had to reconstruct the height of two thoracic vertebrae and for one individual the talo-calcaneal height. The reconstruction of missing bones was calculated following the equations of Auerbach (2011). Body proportions were assessed by the lower limb/stature index, the femur/lumbar index, and the tibia/femur index, estimated after Ruff et al. (2019).

Generating a population-specific formula presumes the homogeneity in anthropometric characteristics of the individuals that constitute the reference sample. Therefore, we tested whether the Kitrini Limni and the Thessaloniki datasets were similar enough. We applied the Kolmogorov–Smirnov test with Lilliefors correction to all long bone measurements. Normal distribution to all cases was presented with no statistically significant deviations. We then applied a parametric t-test to compare differences between the two datasets for long bone measurements and the estimated stature values. Statistically significant differences were present only in the measurements of humerus (H1) and radius (R1), but not in any of the other measurements (U1, F1, F2, T1, T1a, T1b, Fib1) or in the stature values (confidence interval: 0.95). Thus, we considered both populations as a uniform reference sample.

In addition, we checked inter alia the Q-Q plots, Cook’s distances, and Bonferroni outlier test in order to identify any extremes in long bone values. Results revealed that 4 measurements (1 humerus (H1: 230 mm), 1 radius (R1: 222 mm), 1 tibia (T1: 375 mm), and 1 fibula (316 mm)) from 4 different individuals were outliers. For this, we excluded from the dataset the four measurements but not the individuals.

Development of the population-specific equation

We generated OLS and RMA sex-specific and pooled sexes regression equations. We present both single and multivariate regressions including the tibia, the F1 and T1, F1 and T1a, and F1 and T1b combinations. Correlation coefficients values (R2 adj.), unstandardized residuals, standard estimation errors (SEE), and mean percent prediction error (%PE) have been calculated to evaluate the performance of the regression equations and to observe differences between the results acquired by the OLS and the RMA models. Sexual dimorphism was calculated as sexual dimorphism% = (mean male stature − mean female stature) * 100 / mean female stature. Differences in mean stature between males and females for each chronological period are given also in centimeter.

We compared the GRD stature values acquired by the newly generated regressions with those from six commonly used mathematical methods (Vercellotti et al. 2009, Maijanen and Niskanen 2010, Ruff et al. 2012 for North/South and Ruff et al. 2019, North/South adjusted equations). Vercellotti et al. (2009) and Maijanen and Niskanen (2010) regressions were selected because they were found to perform satisfactory for estimating stature on ancient Greek population (n = 68) dated to the Roman period (2nd c. BC–4th c. AD) (Koukli et al. 2021). The Ruff et al. regressions were included to examine the performance of the “universally applicable” equations to populations from Ancient Greece. For the comparisons, we used the pooled sexes equations that use both the femur and tibia.

Statistical analyses were done using R 4.0 version (R Core Team 2020). Linear regressions were performed with the standard function “lm,” and RMA models were calculated by the “lmodel2” package (Legendre 2018).

The application of the Logarithmic Human Stature Index

The logarithmic index is inspired by earlier studies in archaeozoology (Meadow 1984; Uerpmann 1982). It is used for comparing the relative dimensions of animals. For each individual species (cow, goat, gazelle, or similar) and each measure collected, the difference from a (commonly agreed and well-documented) standard individual is calculated, and this value is logarithmized. This offers the advantage to pool and compare all measurements, even from different skeletal elements, into a single (logarithmic) size difference between the understudy individual and the commonly agreed standard. Populations of diverse geographical origins and chronological periods get comparable by this approach and are broadly used especially in archaeozoology (Cramer 2002; Hongo 1996, Figs. 1–3; Price et al. 2013, Fig. 2; Uerpmann and Uerpmann 2000, 2008).

We introduce the LHSI as a useful tool for assessing and comparing long bone lengths in a simple numerical way. By that, we can emphasize small differences between anthropometric values and compare them effectively. LHSI is calculated as

$$\begin{array}{c}\mathrm{LHSI}=\mathrm{log}({\mathrm{length}}_{\mathrm{obs}})-\mathrm{log}({\mathrm{length}}_{\mathrm{std}})\\ (\mathrm{length}=\mathrm{long\,bone\,measured\,length},\mathrm{ observed }(\mathrm{obs})=\mathrm{individual\,actually}\\ \mathrm{measured},\mathrm{ standard }(\mathrm{std})=\mathrm{standard\,individual})\end{array}$$

In order to compare whether an individual has longer limbs than the standard individual, we calculate the mean value from all generated LHSI values of each long bone, using the formula: LHSI = mean (LHSI_H1, LHSI_U1, LHSI_R1, LHSI_F1, LHSI_T1, LHSI_Fib1). If the generated LHSI value is negative, the target individual has a lower mean value of long bones than the standard (mean value). If the generated LHSI values are positive, the target individual has longer long bones. The same procedure is applicable to population means. The standard population is an original dataset that is used as a proxy and includes the logarithmized mean values of all available long bone lengths.

In the present study, we set the GRD as the standard population. We combined both male and female values to create a standard individual. We calculated the mean value of right and left sides for each long bone length. For the upper limbs, we used the H1, U1, and R1 values, and for the lower limbs, we used the F1, T1, and Fib1 values (Martin 1928). We then compared each one of the populations in Ruff et al. (2012/2018) following the process described above. Thus, we were able to identify similarities and differences between the different populations and the GRD and to juxtapose these results with the performance of the regressions on estimating stature.

This approach offers many advantages: (1) by changing to a logarithmic scale, small differences are emphasized, while huge differences get less important than without transformation; (2) the standard population can be easily generated and compared with one or many individuals; (3) lengths (or breadths) of various bones can be pooled and generate a single value that represents the size difference between individuals or groups.


The stature estimation of the Greek reference dataset

According to the anatomical method (Raxter et al. 2006), the mean height for the GRD was for males 168.6 cm (s.d.: 6.0), for females 156.6 cm (s.d.: 5.3), and for pooled sexes 163.1 cm (s.d.: 8.2) (Table 1). The mean lower limb/stature index is 52.59%, the femur/lumbar index is 3.15, and the crural index (tibia/femur) is 81.66% (Table S7). As seen in Table 1, the mathematical estimation methods compared to the anatomical method tend to underestimate body height (stature ranges for pooled sexes: 160.8–163.0 cm). The Vercellotti et al. (2009) equation generates the lowest errors followed by Maijanen and Niskanen 2010 regression (mean res. for total: − 0.12 cm and − 0.58 cm, respectively). Ruff et al. (2012) equations, as well as the “adjusted” sets after Ruff et al. (2019), produced an error range from − 0.70 to − 2.30 cm.

Table 1 Stature estimation of the Greek reference dataset: comparison between the anatomically reconstructed stature after Raxter et al. (2006) given in bold and the estimations generated by four regression equations (all values in cm)

OLS and RMA population-specific regression equations for stature estimation

Tables 2 and 3 present the generated OLS regression equations for pooled sexes and their respective errors. We generated single regressions for six different long bones and multivariate regressions that combine the measurements of the femur and the tibia. For single regressions, we observe that lower limbs provide better estimations than upper limb bones. This is evident by the small values of mean errors with respect to the standard deviation. For lower limbs, mean error values range from − 0.001 to 0.017 (s.d.: 3.49–3.88), whereas for upper limbs from − 0.001 to 0.005 (s.d.: 4.09–4.19), respectively. We also observe that the most frequently used measurements of R1, F2, and T1 produce lower errors than the R2, F1, T1a, or T1b. For multivariate equations, the combination of F1 and T1 is better than the F1 and T1a and F1 and T1b, showing the lowest mean error and s.d. values (F1 and T1: − 0.004 s.d.: 3.09, F1 and T1a: 0.021 s.d.: 3.13, F1 and T1b: 0.008 s.d.: 3.14). In general, multivariate sets present lower mean errors followed by the lowest s.d. (range: 3.09–3.14).

Table 2 Single and multivariate OLS equations for pooled sexes generated in the present study (constant is in cm)
Table 3 Calculated errors in OLS equations (pooled sexes, all values in cm)

Tables 4 and 5 present the calculated RMA regression equations and errors. Similarly, to the OLS, lower limb bones generate smaller errors than the upper limbs. Lower limb mean error values range from − 0.009 to 0.007 (s.d.: 3.55–3.95), whereas upper limbs from − 0.089 to 0.006 (s.d.: 4.16–4.24). In general, comparing SEE values and resulting residuals, OLS regressions and especially the F1 and T1 generated in all cases smaller error compared to the RMA (Tables 3 and 5).

Table 4 Single RMA equations for pooled sexes generated in the present study (constant is in cm)
Table 5 Calculated errors in RMA equations (all values in cm)

We also generated OLS sex-specific regression equations (Tables S8S11). Similarly, lower limbs give better estimations than the upper limbs, whereas multivariate sets, and especially the combination of F1 and T1, are superior to the single counterparts. Mean %PE and s.d. were used to compare between the pooled sexes and the sex-specific regressions (Table S12). Pooled sexes equations produce low errors (mean %PE: 0.002) and s.d. (1.91) almost equal to the sex-specific sets (males: 0.000, s.d.: 1.50; females: − 0.007, s.d.: 1.89).

Testing the newly generated population-specific equations

The newly generated OLS femur and tibia (F1 and T1) equation for pooled sexes together with the Ruff et al. (2012) North and South, the Vercellotti et al. (2009) and the Maijanen and Niskanen (2010) femur and tibia regression equations were applied to three test populations.

According to the anatomical method, the mean stature for both sexes in the Archaic/Classical period test population from northern Greece was 161.8 cm (s.d.: 4.09) (Table 6). The newly generated equations provide a mean stature value of 161.5 cm (s.d.: 3.12, mean res.: − 0.33 cm). Similar results provide the Maijanen and Niskanen’s (2010) Eqs. (161.9 cm, s.d.: 2.96, mean res.: 0.10 cm). On the contrary, Vercellotti et al. (2009) as well as Ruff et al. (2012) equations produce a higher mean difference in the range of centimeters. The former overestimates stature (mean res.: 1.84 cm), whereas Ruff et al. North underestimates it (mean res.: − 1.10 cm). The bigger difference (mean res.: − 2.07 cm) was generated by the Ruff et al. (2012), South regression.

Table 6 Stature estimation of the “Archaic/Classical period test population” from Greece using the anatomical method (after Raxter et al. (2006), the newly generated mathematical method of the present study (equations for pooled sexes), and four published equations (Ruff et al. 2012NorthSouth; Vercellotti et al. 2009; Maijanen and Niskanen 2010)

Table 7 presents the results obtained by the application to the “mean values test population”. In general, there are no significant variations in generated errors by the applied mathematical methods (range of the mean res.: 0.09–1.21). The GRD performs poorer than the other three methods, overestimating stature up to 1.21 cm. Maijanen and Niskanen (2010) equations produce a very small difference (mean res.: 0.09 cm). Vercellotti et al. (2009) overestimate (0.90 cm), whereas for Ruff et al. (2012), the North set of regressions is distinctively better than the South (0.18 vs − 1.09 cm).

Table 7 Reported stature of the “mean values test populations” estimated by the newly generated mathematical method of the present study (equations for pooled sexes) and four published equations (Ruff et al. 2012NorthSouth; Vercellotti et al. 2009; Maijanen and Niskanen 2010)

Table 8 shows the performance of 5 regression equations to the dataset of Ruff et al. (2012 and 2018) which we divided chronologically and geographically, according to the original publication. Differences between the estimations and the anatomical values (residuals) show that the Ruff et al. (2012) equation is the most suitable method for this dataset. The Maijanen and Niskanen (2010) regression is equally good for the Early Upper Paleolithic” (− 0.47 cm), the Iron/Roman (0.26 cm), and the “very recent” populations (− 0.27 cm) as well as the groups from Britain (0.07 cm), Iberia (0.10 cm), and Scandinavia/Finland (0.09 cm). Vercellotti et al. (2009) estimations were very close to the actual stature in the Late Upper Paleolithic group (− 0.05 cm). The GRD equations do not produce significant differences compared to the anatomical values (overall difference: 1.65 cm), nor do they differ a lot in terms of accuracy to the rest of the formulae. Good (below or close to 1.0 cm) is its performance to the late Upper Paleolithic (0.96 cm) and Neolithic (0.70 cm) samples, as well as to the northern European populations from Britain (0.92 cm) and Scandinavia/Finland (1.03 cm).

Table 8 Comparison of the generated residuals (given in cm) between the estimations produced by the present study (OLS equations) and by four published regressions for pooled sexes to the datasets collected by Ruff et al. (2012/2018) (grouped by chronological period and geographical region)

The application of LHSI: comparing Ruff et al. (2012/2018) to the GRD

Table 9 presents the calculated LHSI for the GRD, and Table 10 presents the generated LHSI values of Ruff et al. (2012/2018) datasets, when compared to the GRD. In this scheme, an LHSI value close to zero indicates a similarity between the two datasets. Positive LHSI values mean that the Ruff et al. dataset includes overall taller individuals than the GRD, and negative values that includes shorter individuals. We observe that the GRD, i.e., a Roman period population from Greece, is similar in body size to the Iron Age/Roman period populations from France and North/Central Europe (− 0.0001 and − 0.0008, respectively). These populations are slightly shorter (in mean long bone lengths) than the GRD (Iron/Roman France: 161.6 cm, North/Central Europe: 161.5 cm, GRD: 163.1 cm). Differences are small between the GRD and the Neolithic groups from Britain (− 0.0006), Scandinavia/Finland (0.0006), and the early modern population from France (− 0.0007).

Table 9 The calculated LHSI values for the Greek reference dataset with the anatomically reconstructed stature values (after Raxter et al. 2006)
Table 10 Anthropometric comparison with the application of LHSI between the Greek Reference Dataset (GRD) and the chronologically and geographically clustered Ruff et al. (2012/2018) datasets

Stature secular trends for prehistoric and historic period Greece

We applied the newly generated OLS regressions to the 775 individuals, 402 males and 373 females, from ancient Greece (Table 11, Fig. 3).

Table 11 Secular trends in body height of ancient Greek populations (9,000 BC-842 AD). Estimated by the newly generated OLS equations
Fig. 3
figure 3

Boxplots showing stature in Greece from the Neolithic to the Byzantine period for male (n = 402) and female (n = 373) individuals. Stature is estimated by the newly generated OLS equations. Black horizontal line indicates the median. Interquartile ranges (box) show the dispersion of 50% of scores that are close to the median. Whiskers show the high/low extremes (rest 50% of scores)

The male Mesolithic individual from the Aegean island of Astypalaia was 166.2 cm tall. In the Neolithic period, the mean male height was 159.7 (s.d.: 8.6). The distribution of values is slightly negatively skewed; upper values range between 163.0 and 168.0 cm and lower values 145.0 cm. During the Bronze and the Iron Ages, mean male stature was 163.1 cm (s.d.: 6.3) and 164.3 cm (s.d.: 7.6), respectively. During the Archaic/Classical and Hellenistic periods, mean male stature was 166.1 cm (s.d.: 5.8) and 165.3 cm (s.d.: 5.8). Values are symmetrically distributed during the Bronze Age. During the Iron Age, males show a widening of the interquartile range and a positive skewness, a characteristic rather reversed in the Archaic/Classical times. In the Roman period, mean male stature was 166.2 cm (s.d.: 6.1), and males continued to get taller in the Early Byzantine period showing an average value of 170.6 cm (s.d.: 6.6). In both periods, stature values are symmetrically distributed. Overall, four outliers are observed. One male from the Bronze Age was 180.5 cm tall (almost 17.0 cm taller than the mean), one male from the Hellenistic period was 152.6 cm (almost 12.7 cm shorter than the mean), and two males from the Roman Period were 184.9 and 188.4 cm (almost 22.0 cm taller than the mean).

Neolithic females were on average 155.6 cm tall with a small standard deviation value (2.5 cm) and a very narrow distribution of the data (interquartile range: approximately 3.0 cm). Bronze Age females showed a mean of 153.1 cm (s.d.: 7.2) and a symmetric distribution of the values. Mean female stature in the following periods increased. In the Iron Age, it was 156.9 cm with a high standard deviation (s.d.: 10.6 cm). Distribution of the 50% of the values is wide (approximately 12.0 cm), and the overall spread is rather positively skewed. During the Archaic/Classical period, the average female stature was 157.8 cm (s.d.: 5.4) and decreased during the Hellenistic period, 155.6 cm (s.d.: 4.4). During both periods, the interquartile ranges were rather narrow (approximately 4.0–8.0 cm), and the data were slightly negatively skewed. Females during the Roman period were on average 157.6 cm tall (s.d.: 5.7) and during the Byzantine period 160.4 cm (s.d.: 6.8), whereas the range of the values is large. Outliers were observed during the Bronze Age when one female was 134.0 cm (reported also by Angel (1971)) and during the Roman period when two tall females were 173.0 and 178.0 cm.

Sexual dimorphism in the Neolithic period is 2.6% (4.1 cm) but increases in the Bronze Age to 6.5% (10.0 cm). In the early Iron Age, it decreases to 4.7% (7.4 cm) and increases in the Archaic/Classical and Hellenistic periods, 5.3% (8.3 cm) and 6.2% (9.7 cm), respectively. During the Roman period, it was 5.4% (8.5 cm), and it increased during the early Byzantine period to 6.4% (10.2 cm).


Population-specific regression equations for ancient Greeks: the guidelines

We generated the first OLS and RMA stature estimation regression equations based on skeletal material of prehistoric and historic period Greece. In brief, we present the basic guidelines for their application. For OLS equations, we recommend the following:

  • The equation that includes all long bones should be preferred, followed by those that use only the lower limbs. If lower limbs are not available, the equation that uses the humerus gives the most accurate results.

  • Multivariate regressions based on lower limb bones and especially the F1 and T1 equation provide slightly more accurate estimations than the single bone equations.

  • For the femur and tibia, both F1 or F2 and T1 or T1a measurements are equally accurate.

The better performance of the lower limb multivariate equations is evident in the calculated errors of the RMA regressions. This is a common observation (Ruff 2018; Sládek et al. 2015) and indicates that lower limb bones contribute more in the final body height (Albanese et al. 2016; Sládek et al. 2015; Vercellotti et al. 2009). Both OLS and RMA regressions estimated stature effectively, probably because they are generated by a very large sample (> 100). However, due to small differences (slightly better SEE values, lower errors, and resulting residuals), we recommend the use of OLS equations. Finally, for the pooled sexes, OLS sets were found to produce lower average errors and s.d. values (mean %PE: 0.002, s.d.: 1.91), almost equal to the sex-specific set. Therefore, regressions for combined sexes are preferred (Albanese et al. 2016; Howley et al. 2018) since sex estimation is frequently uncertain.

The Greek reference dataset

Males (n = 70) from the GRD had an average height value of 168.6 cm, and females (n = 59) of 156.6 cm. Compared to other, contemporary populations, they were approximately as tall as the inhabitants of Scandinavia/Finland (males: 171.9 cm, females: 160.9 cm) and the females from Britain (males: 164.14 cm, females: 156.51 cm). Τhey are also taller compared to their European neighbors (France/Italy: males 162.1 cm, females 155.3 cm; Central Europe: males 165.5, females 154.2 cm; Spain: males 163.2 cm, females 151.8 cm; Sardinia: males 164.4 cm, females 152.1 cm) (Holt et al. 2018; Martella et al. 2018; Sládek et al. 2018).

Comparing the generated stature estimations

In a previous study (Koukli et al. 2021, Tables 2 and 4), we compared the stature values between the mathematical and the anatomical methods. We showed that differences between the two approaches lie within 1.0–2.0 cm and that Vercellotti et al. (2009) and Maijanen and Niskanen (2010) were the two most suitable methods.

In the current study, the newly generated regressions perform very good on the Archaic/Classical test population from ancient Greece (Table 6). They produce a very small difference (− 0.33 cm) compared to the anatomically reconstructed stature and maintain the population’s variability (seen in s.d. values).

Compared to other mathematical methods, Maijanen and Niskanen (2010) performed equally good. The stature estimation is really close to the anatomical mean stature (mean res: 0.10 cm), but it produces a smaller standard deviation. When applied to the GRD (Table 1), this method produces slightly higher residuals, but still within the range of millimeters, and smaller errors compared to the other methods.

We also applied the newly generated equations to the “mean values” test population, and we observe that even though our regressions provide quite adequate estimations (mean difference: 1.21 cm, s.d.: 8.63 cm), they are not as close to the actual stature as the estimation provided by the other four methods (Table 7). This indicates that there are more differences between the GRD population and the “mean values” test population and confirms our initial argument that the identification of the most suitable prediction method is a difficult task. In the same line of evidence, the good performance of the four methods is related to the fact that their reference series are included in the “mean values test population” (see legend in Fig. 2). Thus, similarities in anthropometric values between the target population and the reference sample of equations are a significant factor in the efficiency of the stature estimation method. However, to which extent they influence the stature estimation procedure and how we could a priori predict the right formula still remains unknown.

The common geographic origin has been proposed as a fundamental criterion for stature estimation based on the assumption that individuals from different climatic zones exhibit different body proportions (Eveleth and Tanner 1990; Pomeroy et al. 2021; Ruff 1994; Ruff et al. 2012, 2018; Savell et al. 2016). Ruff et al. (2012) composed a wide (temporal and spatial) reference dataset of European origin and generated a North and a South regression equation that could be effectively applied to any population. We applied both North and South regressions to all three test datasets. Mean residuals by the Ruff et al. (2012) universal method were the highest (1.10 to 2.30 cm). Even though our reference sample is of southern European origin, it is the North equation that gives the better estimations (mean res.: − 1.10 cm) than the South (mean res.: − 2.07 cm) (Table 6). Likewise, our newly generated regressions provide better estimations for the North (mean res.: ~ 1.0 cm) rather than the South populations (mean res.: > 2.0 cm) (Table 8).

The comparison between the body proportions of the GRD and the Ruff et al. datasets (shown in Ruff et al. 2019, p. 4, Table 1: lower limb/stature: 52.58, femur/lumbar: 3.15, tibia/femur: 82.28 and Table S7) indicate more similarity of the GRD to Northern than to Southern Europeans (52.89, 3.22, 83.07, respectively). Thus, while the comparison in terms of body proportions explains the better performance of the GRD regressions to the North populations, it fails to explain in depth why the North equation does not produce optimal estimations for the GRD compared to the other two methods of Maijanen and Niskanen (2010) and Vercellotti et al. (2009) (Tables 1 and 8).

This indicates that although body proportions are related to stature estimation, their role is not very clear, and they cannot serve alone as an indicator for the optimal mathematical method (Koukli et al. 2021). The so far widely acknowledged scheme that high similarities in body proportions between reference series and target populations lead to better stature estimations (Holliday and Ruff 2001; Formicola and Franceschi 1996; Ruff et al. 2019) is more complex and not straightforward. As an alternative, the LHSI method compares body size between populations (Table 10) based on the logarithmized values of long bones. As a case study, we compared the GRD (standard) with the different Ruff et al. (2012/2018) datasets.

Combining the LHSI values (Table 10) with the calculated differences in estimations (Table 8), we get the following observations:

  1. 1)

    The GRD is very similar in size with the Neolithic populations from Britain (− 0.006) and Scandinavia (0.0006), and probably because of this, the newly generated equations manage to provide very good estimations for the Neolithic sample from Ruff et al. (2012), overestimating stature for only 0.70 cm.

  2. 2)

    Differences between the GRD and the three populations in the Late Upper Paleolithic cluster are rather small (from − 0.0028 to 0.0070). Again, our equations provide fine estimations for the group with an error below the 1.0 cm (0.96 cm).

  3. 3)

    GRD is similar in size with 3 (n = 151) out of 5 (n = 110) populations included in the Iron/Roman group: almost identical to those from North and Central Europe (− 0.0008) and France (− 0.0001); slightly different from the one from Britain (− 0.0044); and different to those from Scandinavia/Finland (0.0226) and Italy (− 0.0113). As expected, this similarity is translated into good prediction values: GRD estimations deviate from the anatomical values for only 1.06 cm, being the third-best prediction model for this cluster.

Therefore, we observe here a positive relationship between the LHSI and the quality of estimations: The smaller the LHSI value, the better the estimations. For this reason, we propose the use of LHSI as a new way for comparing populations (regardless of ratios and body proportions) and as a secure method to indicate the most suitable stature estimation formula. We are confident though that in the future, the LHSI could be proved as a useful tool for inter-population anthropometric comparisons.

Diachronic body height trends in ancient Greece

The application of population-specific equations to 775 individuals (402 males and 373 females) from Greece provided for the first-time secular changes in stature over the last 10,000 years (Table 11, Fig. 3). The only male Mesolithic (∼10th mill. BC) individual shows a stature of 166.2 cm (Efstathiou 2022; Stravopodi 2022). The so far available data from Mesolithic Greece included one female individual (THE2) from Theopetra cave in Thessaly. She was 157.0 cm tall, estimated after the Bass (1987) method (Stravopodi et al. 1999). According to Formicola and Giannecchini (1999) and Ruff (2018), mean male stature in Mesolithic Europe was quite high (West/East Europe males 160.0 cm/173.2 cm) (Formicola and Giannecchini 1999, Table 5; Ruff 2018, pp. 76–81). The increased stature has been linked to environmental and nutritional factors, i.e., broad-spectrum foragers with adequate diet and less seasonal shortage. Although we refrain from drawing conclusions only by one individual, palaeoclimatic data describe the Aegean island with temperate climates and mild winters (Asouti et al. 2018; Bottema 2003), and sporadic data indicate meat and marine-based nutrition (Nafplioti 20102011; Papathanasiou 2003; Stiner and Munro 2011; Trantalidou 2010).

During the Neolithic period (∼6.500–3.200/3.000 BC), males in Greece exhibited an average body height of 159.7 cm, which is the lowest compared to all other periods, and minimum values reach down to 145.0 cm. Females were on average 155.6 cm tall, and the range of their values is small (approximately 7.0 cm) compared to males (approximately 23.0 cm). However, sexual dimorphism is low (4.1 cm) compared not only to the following chronological periods, but also to other Neolithic populations (in Finland: 13.8 cm, France/Italy: 9.0 cm, Britain: 5.1 cm) (Holt et al. 2018; Niskanen et al. 2018; Ruff 2018). Differences in stature’s variability between males and females might result among other factors, i.e., ancestry from different biological mechanisms of growth. Males are more susceptible to negative changes in living conditions (such as malnutrition) which explains the loss of stature, but at the same time, their growth phase is more prolonged than women who reach their plateau a few years after menstruation (Bogin 2020, pp. 287–289; Cole et al. 2015; Hasibuan et al. 2020; Silventoinen et al. 2001; Stinson 1985). Thus, males are expected to have a reduced stature but also higher diversity in values, compared to females that will simply retain a low mean stature with most of the values restrained close to the mean.

The decrease in stature values from the Mesolithic to the Neolithic period is evident in many European populations (Niskanen et al. 2018 pp. 68; Ruff 2002; Sládek et al. 2018, pp. 337–343). Apart from the anthropometric data, recent ancient DNA analyses indicate the presence of selection for low stature in the Neolithic Iberia (Mathieson et al. 2015). Early farmers who were characterized by low stature during their migration from Anatolia to Central and Western Europe influenced the genetic background of local populations (Mathieson et al. 2015, 2018). The genetic discontinuity between local hunter-gatherers and early farmers has been well acknowledged (Bramanti et al. 2009; Brandt et al. 2013; Hofmanová et al. 2016; Rosenstock et al. 2019; Skoglund et al. 2012). The factors responsible for the low-stature adaptation have been linked to the transition from foraging to agriculture that equaled lack of nutrients and animal protein, overcrowding, and spread of infectious diseases (Cox et al. 2019; Hermanussen 2003; Larsen 2017, pp. 456–474; Rosenstock et al. 2019; Ruff 2002, 2018). The few but informative skeletal analyses of Neolithic skeletons (from Nea Nikomideia, Alepotrypa, Argolis, Lakonia, Theopetra, Perachora) indicate the presence of increased pathological conditions (i.e., osteoarthritis, anemia, porotic hyperostosis, cribra orbitalia) associated to changes in living conditions (Isaakidou et al. 2018; Papathanasiou 2005; Stravopodi et al. 2009; Theodorakopoulou and Karamanou 2020; Triantafyllou and Adaktilou 2014).

The transition to Bronze Age (∼3200–1200 BC) signified an increase of approximately 3.4 cm in male stature and a difference between males and females of 10.0 cm. Female stature decreased on average by 2.0 cm. Despite the loss, females show more variability (137.0–171.0 cm) compared to the Neolithic period. Bronze Age male Europeans became taller, whereas females either slightly decreased or stabilized (Holt et al. 2018, pp. 245–246; Ruff and Garvin 2018, pp. 287; Ruff 2018, pp. 79; Sládek et al. 2018, pp. 323).

Recent ancient DNA findings highlight for Bronze Age Greece: the genetic continuity of the Bronze Age inhabitants to their Neolithic ancestors but also gene flow from the Pontic-Caspian Steppe region to the Aegean during the Middle Bronze Age (2600–2000 BC) (Clemente et al. 2021). Similar gene flow events have been observed in central and western Europe (Clemente et al. 2021; Juras et al. 2018; Racimo et al. 2020; Rosenstock et al. 2019; Saag et al. 2021) and were linked to populations taller than the Neolithic farmers (Mathieson et al. 2015).

However, Bronze Age Greece is characterized by a general improvement of living standards that favors the increase of stature (Bogin et al. 2015; Hermanussen and Scheffler 2016; Rosenstock et al. 2019; Roberts and Cox 2003, 2007; Niskanen et al. 2018).

Cultivation and pastoralism were stabilized signifying a better nutrient diet (crops, wine, oil, dairy products, etc.). Urban revolution was expanding, and as a result, we witness the existence of well-stratified Bronze Age societies (i.e., Minoan, Mycenean, Cycladic civilizations) with an increased mobility between the urban centers (Cline 2012; Harding 2000; Isaakidou et al. 2018; Koukli 2008; Urem-Kotsou et al. 2018a,b; Valamoti 2011a,b; Whelton et al. 2017). Archaeology indicates complex social systems, large necropoles, trade networks, large storage facilities, variability in pottery styles, and craft specialization. From the Iron Age (∼1200–700 c. BC) until the Classical period (4th c. BC), stature marks a stable increase in both sexes (males: 164.3–166.1 cm; females: 156.9–157.8 cm). Variability in stature values is larger in the Iron Age and reduces in the Archaic/Classical periods. It may indicate a diversity of the Iron Age populations as a result of Bronze Age gene flow or diversity in living conditions. Overall, the so-called Dark Ages, which coincide with the Iron Age, did not last long (Coldstream 2003; Dickinson 2007; Koukli 2008; Morris 2007; Malkin 1987; Whitley 2001). Even though some settlements were abandoned, others were established and occupied for a short period of time, whereas some were reinforced with newcomers from various sites of Greece (Eder and Lemos 2020; Hall 2007; Whitley 2001). Additionally, agriculture was gradually revived, metallurgy became well-established, trade networks were restored, and the Phoenician alphabet was adopted (Coldstream 2003; Osborne 2009, pp. 35–152; Papadopoulos 2015). In the following centuries, a new era began. Settlements took the form of komai and poleis (i.e., unwalled settlements, city-states), population increased, Greek literature arose, the Olympic games and a wide network of sanctuaries and temples were established (Hall 2007; Hammond 2000, pp. 345; Martin 2000, pp. 36–50; Morris 2007; Papadopoulos 2015). All of the above signaled a significant rise of the living standards depicted in the increase in stature values. Contemporaneous populations from France, Italy, and Iberia (Siegmund 2010, pp. 81, Table 38, Figs. 7–8; Holt et al. 2018; Ruff and Garvin 2018) show similar stature trends.

For most of the Greek mainland, the Hellenistic period (∼4th–2nd c. BC) includes warfare events and conflicts not only between the local populations but also with foreigners (such as the Illyrians and the Persians) (Boardman and Hammond 2006; Martin 2000, pp. 184–190; McInerney 2018, pp. 272–292; Roberts 2017, pp. 33–90; Roisman 2011, pp. 382–464). We observe this negative change in the living standards on the male and female mean stature decreased (165.3 cm, 155.6 cm, respectively) and the reduced variability (range of the 50% in males: 163.0–168.0 cm, females: 153.0–158.0 cm). However, since the population from the Hellenistic period is rather small, we caution for this interpretation.

During the Imperial Rome and Pax Romana (until 2nd c. BC), Greek city-states recovered, experiencing a long period of peace. Many of them became centers of Hellenic culture, whereas arts, literature, and trade were flourishing. People relocated from Rome to Greece for reasons of education, trade, military support, and secure trade routes (i.e., Roman colonies) or due to manumission (i.e., former slaves that were granted their freedom). Therefore, positive changes in living standards took place such as water supply improvements, sanitation systems, the foundation of new city-states, economic growth, and the application of strategic measures to reduce famine (Allamani-Souri 2003; Fagan 2012; Finlay 1877, pp. 73–90; Garnsey and Saller 2014, p. 150–260; Killgrove 2010; Kron 2005; Millar 2006, pp. 106–200; Morris 2004, 2007; Nilsson 1921; Noy 2000; Oliver 2000, pp. 81–120; Scheidel 2010). All of the above equaled a gradual increase in male (166.1 cm) and female (157.6 cm) mean stature, and we should notice that stature was symmetrically increased in both sexes. Paleopathological and isotopic studies support this hypothesis, showing at the same time that during childhood, boys and girls had the same diet and that health status increased (Bourbou and Garvie-Lok 2015; Kwok 2015; Rife 2012, pp. 457–468).

The steady increase continues towards the early Byzantine period (∼4th–9th c. AD). Men were on average 170.6 cm and females 160.4 cm. The overall range in males is 35.0 cm and in females 30.0 cm. Sexual dimorphism reached similar values to modern times. The solidification of urban centers and improvements in agriculture and diet led to the gradual amelioration of growth patterns.

Finally, according to recent data (World Population Review, twenty-first century), the average male height in modern Greeks (during the last decade) is 179.3 cm and female 165.8 cm. By that, Greeks exhibit an approximately 10.0 cm (males) and 6.0 cm (females) increase from the early Byzantine period onwards. Similar patterns have been observed in other European populations, linked to urbanization, industrialization, and changes in dietary patterns (Ruff 2018, pp. 76–82). However, studies from modern Greece have found a temporal stature loss in men and women (mean: 164.3 cm and 155.0 cm, respectively) during the difficult sociopolitical changes that took place during 1912–1950 (world war I and II, famine, etc.) (Bertsatos and Chovalopoulou 2017; Eliakis et al. 1966; Manolis et al. 1995; Papadimitriou et al. 2008).

There is no easy way to define stature’s plateau and thus to quantify the influence (positive or negative) of living standards to growth. We cannot exclude other parameters, such as genes and migration, that provoke fluctuations in stature trends. But we have two important findings: (a) in both ancient and modern populations, when environmental and social conditions improve, stature is gradually restored; (b) males are always more susceptible (in stature) to outer conditions, whereas females tend to retain a more stable course in stature fluctuations. Finally, we observe that previous studies estimated stature in ancient Greece by the application of the Trotter and Gleser White (1952) equation (Angel 1945, 1971, pp. 128–151; Koukli et al. 2021, Table 9). Comparing our results to these reports, we see that stature was usually underestimated (for approximately 1.0–4.0 cm depending on the period). Therefore, we suggest the use of the newly generated population-specific methods for estimating stature in this important region of the ancient world.