Introduction

In human populations, males and females normally differ in dental size. On average, males have larger teeth than females, and this characteristic can be used in sex estimation (Garn et al. 1964, 1966; Ditch and Rose 1972; Kieser et al. 1985; Hattab et al. 1997; Işcan and Kedici 2003; Hassett 2011; Viciano et al. 2015). The most commonly reported tooth measurements for sex estimation are the maximum mesiodistal and buccolingual crown measurements (Black 1978; Hattab et al. 1997; Kondo and Townsend 2004; Acharya and Mainali 2007; Pereira et al. 2010; Mitsea et al. 2014; Gonçalves et al. 2014; Sharma et al. 2015). These measurements, however, are difficult to obtain in worn teeth or in crowns that are embedded in the jaw. To solve this issue, alternative measurements of cervical tooth diameters were proposed by Hillson et al. (2005). Cervical measurements are useful in studies of prehistoric skeletal remains as they allow for the inclusion of teeth with alterations on the crown due to wear, pathology (e.g. caries), cultural modification or post-mortem damage. This allows a larger dataset to be obtained with a broader range of ages represented.

Studies on sexual dimorphism in cervical mesiodistal and buccolingual measurements have demonstrated that the canine is the most sexually dimorphic tooth (Starp 1990; Ellendt 1993; Alt et al. 1998; Vodanovic et al. 2007; Viciano et al. 2011, 2013, 2015; Hassett 2011; Mujib et al. 2014). This is similar to the conclusion reached in previous studies on crown mesiodistal and buccolingual measurements (Garn et al. 1966; Yuen et al. 1997; Işcan and Kedici 2003; Vodanovic et al. 2007; Cardoso 2008; Zorba et al. 2011; Khamis et al. 2014). In addition, studies on cervical measurements of molars suggest that the first and second molars are also among the most dimorphic teeth (Alt et al. 1998; Starp 1990; Ellendt 1993; Zorba et al. 2012, 2013).

A large number of studies have demonstrated that the degree of sexual dimorphism in teeth varies between populations (Bishara et al. 1989; Ateş et al. 2006; Prabhu and Acharya 2009) as a result of genetic and environmental factors (Kieser 1990; Hughes and Townsend 2013). Therefore, the collection of data from different populations is important for the study of dental sexual dimorphism. As there are currently no reference odontometric data from Iranian archaeological populations, the present study contributes to the development of standards for sex estimation.

The Hasanlu population, which is the focus of this study, represents a distinctive sample of anthropological and archaeological value due to the unusual way in which the sample was formed. Most of the biological remains belong to victims of an instantaneous catastrophe and so the paleodemographic data is synchronous. Moreover, the Hasanlu osteological collection represents one of only a few well-preserved skeletal collections from Iron Age Iran.

The present study aims to examine the level of sexual dimorphism in permanent teeth of Iranian archaeological populations using cervical mesiodistal and buccolingual diameters and to assess the applicability of cervical measurements in sex estimation based on discriminant function analysis.

Material and methods

Material

This study is conducted on the skeletal remains of 107 individuals from the Hasanlu site (West Azerbaijan, Iran) dating from 1450 to 800 bc and 36 individuals from Dinkha Tepe (15 mi west of the Hasanlu site) dating from 1350 to 800 bc. Hasanlu is the largest and most important archaeological site in the Gadar River Valley in north-western Iran. This settlement was destroyed by fire during a battle, probably with Urartians, around 800 bc (Dyson 1965).

Hasanlu and Dinkha Tepe are located 15 mi apart, and they are determined to be of similar age (Muscarella 1988). Dinkha II and III burials (1350-800 bc) contained cultural material paralleling that of Hasanlu period IV and V (1450-800 bc) (Muscarella 1974, 1988), which are the main focuses of this study.

The skeletons of the Hasanlu and Dinkha Tepe individuals are stored in the University of Pennsylvania’s Museum of Archaeology and Anthropology (UPM), USA. In total, 1334 maxillary and mandibular teeth of 143 skeletons (95 males, 48 females) are studied. The sex of the skeletons was estimated using the morphological features of the pelvis (differences in the ventral arch, subpubic concavity, and ischiopubic ramus (Phenice 1969)) and skull (nuchal crest, mastoid process, glabella, supraorbital margin, mental eminence, and shape of orbit (Walker 2008)). All the 143 individuals used in this study are adults aged between 20 and 65 years for both males and females. The age at death of the individuals was estimated based on the degree of dental wear (Miles 2001; Buikstra and Ubelaker 1994), changes in the pubic symphyseal face (Brooks and Suchey 1990) and ilium auricular area (Buckberry and Chamberlain 2002), and the closure of cranial sutures (Meindl and Lovejoy 1985).

Data acquisition

Tooth measurements are taken on loose teeth as well as on teeth intact in the jaw using the Hillson-Fitzgerald Paleotech dental calliper (a modified digital Mitutoyo needlepoint calliper) that was specifically developed to measure the diameters of teeth even if they are in the jaw. Mesiodistal cervical measurements are taken as the greatest mesiodistal dimension parallel to the occlusal and buccal surface measured in the cervical part of the tooth crown (Vodanovic et al. 2007), and buccolingual cervical measurements are taken as the maximum measurement at the cemento-enamel junction from buccal to lingual surface (Hillson et al. 2005).

Dental measurements are taken from the left and right maxillary and mandibular teeth. To avoid the possibility of incorrect measurements, teeth with caries, heavy calculus deposits, or hypoplastic defects along the cemento-enamel junction were excluded. Due to the small number of maxillary third molars, as well as their high degree of variation in size (Townsend et al. 2016), measurements for this tooth were excluded from the discriminant function analysis. In total, 1324 teeth (1007 from Hasanlu and 317 from Dinkha Tepe) are measured (Table 2).

Statistical analysis

Bilateral asymmetry of right and left side in the entire sample is tested using a paired t test.

To assess the intra-observer error, mesiodistal and buccolingual cervical measurements are collected a second time, on a different occasion, from 619 randomly selected teeth from the original sample by a single observer (SMK). The technical error of measurement (TEM), the relative technical error of measurement (rTEM), and the coefficient of reliability (Ulijaszek and Kerr 1999) are used to quantify the differences between the two sets of measurements.
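
The three intra-observer statistics can be illustrated with a minimal Python sketch. It assumes the usual two-session definitions (TEM as the square root of the summed squared differences divided by 2n, rTEM as TEM relative to the grand mean, and R as one minus the ratio of TEM squared to the total variance); the function name and the choice of pooling both sessions for the mean and variance are illustrative assumptions, not taken from the study.

```python
import math
from statistics import mean, variance

def intra_observer_error(first, second):
    """Return (TEM in input units, rTEM in %, R) for two repeated series.

    Hypothetical helper: 'first' and 'second' are the same teeth measured
    on two occasions by one observer.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series of at least 2 values")
    n = len(first)
    # TEM = sqrt(sum(d^2) / 2n), where d is the difference between sessions
    tem = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))
    pooled = list(first) + list(second)
    # rTEM expresses TEM as a percentage of the grand mean
    rtem = tem / mean(pooled) * 100
    # R = 1 - TEM^2 / SD^2; SD^2 is taken here as the sample variance of all
    # measurements (an assumption of this sketch)
    r = 1 - tem ** 2 / variance(pooled)
    return tem, rtem, r
```

For example, `intra_observer_error([5.0, 6.0, 7.0, 8.0], [5.1, 5.9, 7.1, 7.9])` gives a TEM of about 0.07 mm, an rTEM of about 1.1 %, and an R above 0.99.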

A one-way ANOVA is used to compare the mean differences between males and females. An independent-samples t test is carried out to determine whether there are any statistically significant differences between the Hasanlu and Dinkha Tepe collections. The percentage of sexual dimorphism for each measurement is also calculated using the Garn et al. (1967) formula: (male mean − female mean) / female mean × 100.
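
As a small worked illustration, the percentage of sexual dimorphism in the form usually attributed to Garn et al. (1967), i.e. the male-female difference expressed relative to the female mean, can be computed as below. The function name and the example means are hypothetical and are not values from Table 3.

```python
def percent_dimorphism(male_mean, female_mean):
    """Sexual dimorphism in %: (male mean - female mean) / female mean * 100."""
    if female_mean <= 0:
        raise ValueError("female mean must be positive")
    return (male_mean - female_mean) / female_mean * 100

# Hypothetical means in mm, for illustration only
example = percent_dimorphism(5.75, 5.00)  # close to 15 %
```

A positive value indicates that the male mean is larger; a negative value would indicate reverse dimorphism for that measurement.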

Univariate discriminant function analysis is used for each variable. Stepwise discriminant function analysis is used to determine which variables best discriminate between males and females. Discriminant analysis of the cervical measurements is conducted separately by tooth class (incisor, canine, premolar, and molar) and position (maxillary and mandibular). Many studies have shown that canines are the most sexually dimorphic teeth; therefore, we added maxillary and mandibular canines to each function to test whether the classification success would increase. In addition, to increase the applicability of the technique where dentitions are not well preserved, the analysis is also conducted for each tooth separately.

Bootstrapping is used in all cases to account for possible biases due to the small sample size.
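
The text does not specify which statistics were bootstrapped, so the following Python sketch only illustrates the general idea on one quantity, the male-female difference in means, using a percentile interval. The function name, the number of resamples, and the fixed seed are assumptions made for the example and do not describe the SPSS procedure used in the study.

```python
import random
from statistics import mean

def bootstrap_mean_difference(males, females, n_resamples=2000,
                              alpha=0.05, seed=0):
    """Return (observed difference, lower bound, upper bound).

    Each sex is resampled with replacement, keeping its original size,
    and the percentile interval of the resampled differences is reported.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        m = [rng.choice(males) for _ in males]
        f = [rng.choice(females) for _ in females]
        diffs.append(mean(m) - mean(f))
    diffs.sort()
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(males) - mean(females), diffs[lo_idx], diffs[hi_idx]
```

With small groups the interval is typically wide, which is one way of making the uncertainty caused by a small sample visible.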

A leave-one-out classification procedure is also used to compare the accuracy rate obtained for the original sample with that obtained by cross-validation. Posterior probabilities of each individual are also calculated as they reflect the affinity of each case to be reassigned to the original group according to the value of the discriminant score (Kranioti et al. 2008). Statistical analysis is conducted using SPSS 23.
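
The leave-one-out idea can be sketched in Python for the simplest case, a single measurement and two groups. The sketch assumes equal prior probabilities and a common variance, in which case the linear discriminant boundary reduces to the midpoint between the two group means; the function names are hypothetical and the code is not a reimplementation of the SPSS routine.

```python
from statistics import mean

def classify(value, male_values, female_values):
    """Assign 'M' or 'F' using the midpoint between the two group means."""
    m, f = mean(male_values), mean(female_values)
    cut = (m + f) / 2
    if m >= f:
        return "M" if value > cut else "F"
    return "M" if value < cut else "F"

def leave_one_out_accuracy(male_values, female_values):
    """Share of cases classified correctly when each is held out in turn."""
    cases = [(v, "M") for v in male_values] + [(v, "F") for v in female_values]
    correct = 0
    for i, (value, sex) in enumerate(cases):
        # Re-estimate the group means without the held-out case
        rest = cases[:i] + cases[i + 1:]
        males = [v for v, s in rest if s == "M"]
        females = [v for v, s in rest if s == "F"]
        if classify(value, males, females) == sex:
            correct += 1
    return correct / len(cases)
```

Because the held-out case never contributes to the means used to classify it, the resulting accuracy is usually equal to or somewhat lower than the accuracy computed on the full sample. Each group needs at least two cases for the sketch to run.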

Results

Mean differences of all measurements between left and right maxillary and mandibular teeth were not statistically significant (p > 0.05). As a rule, we used measurements from the right maxillary and mandibular teeth for the analysis. In the case of a missing value from the right side, the left antimere was substituted.

Table 1 shows the differences between the mean values, TEM, rTEM, and the coefficient of reliability (R) for mesiodistal and buccolingual measurements of anterior and posterior teeth. The values of the TEM and rTEM were low, varying between 0.02–0.07 mm and 0.26–0.72 %, respectively, with the exception of the rTEM for the mesiodistal measurement of mandibular second incisor (1.81 %). R values ranged between 0.99 and 1.00. The only variable that showed a coefficient of reliability less than 0.99 was the mesiodistal measurement of mandibular incisor (0.96).

Table 1 TEM and rTEM results evaluating intra-observer error in cervical tooth measurements

In the mesiodistal and buccolingual measurements, both maxillary and mandibular teeth showed no statistically significant differences between the mean values of Hasanlu and Dinkha Tepe skeletons (p > 0.05) (Table 2), except for mesiodistal maxillary M3. Accordingly, the two samples were subsequently pooled to increase sample size for analysis.

Table 2 Independent Student’s t test comparing the means between Hasanlu and Dinkha Tepe collections

The results of the one-way ANOVA indicated that the differences between male and female mean values were significant for all measurements (p < 0.001) except for mesiodistal upper M3. Table 3 shows descriptive statistics, the associated univariate F-ratio, and the percentage of sexual dimorphism. The most sexually dimorphic measurement was the mesiodistal diameter of the mandibular canine (14.99 %), followed by the mesiodistal diameters of the maxillary canine (13.93 %) and maxillary second molar (13.40 %) (Table 3).

Table 3 Descriptive statistics, %SD (sexual dimorphism), and F ratio of the differences between the sexes

Discriminant function analysis was carried out for samples of more than 20 individuals that had relatively equal group sizes. Maxillary M3 was removed from the discriminant analysis due to the small number of data, and mandibular M3 was also excluded from functions 5 and 9 to increase the sample size. Tables 4, 5, and 6 show the tooth variables in each function that were used in discriminant function analysis. Functions 1 to 9 (Table 4) demonstrate the results of stepwise discriminant function analysis using cervical measurements of each tooth type. Maxillary and mandibular canines were also added to functions 6 to 9 to examine whether the classification accuracy would increase. Functions 10 to 21 demonstrate the results of discriminant function analysis for each tooth separately (Tables 5 and 6). The F value gives an indication of the contribution of the variables entered in the equation to separating the sexes. The unstandardised (raw) coefficients are used to calculate the discriminant scores. The sex of the individual can be assessed by multiplying each tooth measurement by its respective unstandardised coefficient, summing the products, and adding the constant. The sectioning point was set to zero for all the functions; therefore, if the value obtained is greater than zero, the individual is considered male, and if it is less than zero, the individual is considered female.
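
The scoring rule described above can be sketched in Python as follows. The variable names, coefficients, and constant in the example are hypothetical placeholders chosen for easy arithmetic; the actual values must be taken from Tables 4 to 6.

```python
def discriminant_score(measurements, coefficients, constant):
    """Sum of (unstandardised coefficient * measurement) plus the constant."""
    missing = set(coefficients) - set(measurements)
    if missing:
        raise ValueError(f"missing measurements: {sorted(missing)}")
    return constant + sum(coefficients[k] * measurements[k] for k in coefficients)

def estimate_sex(score, sectioning_point=0.0):
    """Scores above the sectioning point are classed male, below it female."""
    if score > sectioning_point:
        return "male"
    if score < sectioning_point:
        return "female"
    return "indeterminate"

# Hypothetical coefficients and constant, NOT taken from the tables
coefs = {"mand_C_MD": 2.0, "mand_C_BL": 1.0}
score = discriminant_score({"mand_C_MD": 5.5, "mand_C_BL": 7.5}, coefs, -18.0)
print(score, estimate_sex(score))  # 0.5 male
```

A score lying exactly on the sectioning point is reported as indeterminate in this sketch, since the text only defines the outcome for values above or below zero.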

Table 4 Stepwise discriminant function analysis of cervical mesiodistal and buccolingual measurements of all teeth
Table 5 Discriminant function analysis of cervical mesiodistal and buccolingual measurements for each tooth separately
Table 6 Discriminant function analysis of cervical mesiodistal and buccolingual measurements for each tooth separately

Table 7 shows the classification accuracy of all functions. The best classification accuracy (100 %) was achieved with functions 1 and 6, which used the anterior teeth. The combination of maxillary and mandibular M1 and M2 with maxillary and mandibular canines (function 9) gave the next best classification (92.3 %), followed by maxillary and mandibular canines (function 2: 86.4 %). The canines, which showed the highest percentage of sexual dimorphism (Table 3), were added to functions 6 to 9. The classification accuracy significantly improved in all functions, particularly in function 8 (Table 7). Mandibular M3 was removed from the analysis, which increased the sample size to 52 and 39 and also the classification accuracy to 82.7 and 92.3 % for functions 5 and 9, respectively. The best discriminant function for the single tooth measurements used mandibular I2 to differentiate between males and females and reached 87.9 % accuracy, followed by maxillary and mandibular canines (86.9 and 86.3 %, respectively). In all functions, the accuracy in males was significantly greater than in females (Table 7). Cross-validation accuracy was close to the original classification accuracy in all cases (Table 7).

Table 7 Classification accuracy of original and cross-validated samples

Figures 1 and 2 demonstrate the probability levels of correct group assessment according to the discriminant scores of each individual. For example, if a discriminant score based on the stepwise analysis of cervical measurements of mandibular canine (function 15) is 3.26 (x coordinate), the posterior probability of that individual coming from a male group is 99.9 % (y coordinate).
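
The link between a discriminant score and a posterior probability can be illustrated with a short Python sketch. It assumes a single discriminant function whose scores are normally distributed within each sex with unit pooled within-group variance, and it applies Bayes' rule with the group centroids; the centroids and prior used in the example are hypothetical, so the output is not expected to reproduce the values plotted in the figures.

```python
import math

def posterior_male(score, centroid_male, centroid_female, prior_male=0.5):
    """Posterior probability of the male group for one discriminant score.

    Assumes normal scores with unit within-group variance in both sexes.
    """
    if not 0.0 < prior_male < 1.0:
        raise ValueError("prior_male must lie strictly between 0 and 1")
    # Log of prior times normal density, up to a shared constant
    log_m = math.log(prior_male) - 0.5 * (score - centroid_male) ** 2
    log_f = math.log(1.0 - prior_male) - 0.5 * (score - centroid_female) ** 2
    # Equivalent to exp(log_m) / (exp(log_m) + exp(log_f))
    return 1.0 / (1.0 + math.exp(log_f - log_m))

# Hypothetical centroids of +1 (males) and -1 (females), equal priors
p = posterior_male(3.26, 1.0, -1.0)
```

Under these assumptions a score midway between the centroids gives a probability of 0.5, and the probability approaches 1 as the score moves further towards the male centroid and beyond.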

Fig. 1 Probability levels of correct sexing for each individual, single tooth analysis (maxillary teeth)

Fig. 2 Probability levels of correct sexing for each individual, single tooth analysis (mandibular teeth)

Discussion

Dental sexual dimorphism has been long acknowledged (Garn et al. 1964, 1966; Ditch and Rose 1972; Bishara et al. 1989; Hattab et al. 1997; Hillson et al. 2005; Cardoso 2008; Hassett 2011; Zorba et al. 2013; Viciano et al. 2015; Tuttösí and Cardoso 2015), and studies have demonstrated that dental dimensions can be used to accurately assess sex in living individuals and among skeletal remains (Fig. 3).

Fig. 3 Probability levels of correct sexing for each individual, functions 1–9

Considering that odontometric methods for sex estimation are population-specific, different scholars have undertaken studies on tooth measurements to determine specific standards of group assessment for various populations (Bishara et al. 1989; Alt et al. 1998; Işcan and Kedici 2003; Ateş et al. 2006; Acharya and Mainali 2007; Hassett 2011; Khamis et al. 2014). The present study is the first reference study for sex estimation using odontometric data in Iranian archaeological populations. It should be emphasised again that the present study has estimated the sex of the individuals using osteological methods.

All variables analysed here presented statistically significant differences between males and females (p < 0.001) with the exception of the maxillary third molars, which were excluded from the analysis. A comparison between the two sexes showed that the classification accuracy of all functions was higher for males. This result is in agreement with other studies on cervical tooth measurements (Vodanovic et al. 2007; Hassett 2011; Viciano et al. 2011, 2013, 2015; Zorba et al. 2011, 2013; Mujib et al. 2014; Peckmann et al. 2015). This suggests that females have a greater variation in tooth size and are more often misclassified as male.

The greatest difference in percentage of sexual dimorphism was observed in canine mesiodistal measurements. There are few published data against which the amount of sexual dimorphism in the cervical measurements can be compared. Only Vodanovic et al. (2007) and Tuttösí and Cardoso (2015) provided percentages of sexual dimorphism for cervical tooth measurements in other archaeological samples. Vodanovic et al. (2007), however, reported only the SD% for mesiodistal measurements of maxillary canine and mandibular third molar. The SD% for the maxillary canine mesiodistal measurement was 13.93 %, which is similar to the Tuttösí and Cardoso (2015) study (13.83 %) and about 4 percentage points higher than in the Vodanovic et al. (2007) study (9.55 %). The highest percentage of sexual dimorphism was observed in the mandibular canine. The SD% for this tooth was 14.99 % (mesiodistal) and 12.35 % (buccolingual), which are considerably higher than in the Tuttösí and Cardoso (2015) study (4.90 % mesiodistal and 6.87 % buccolingual). In the latter study, the maxillary second incisor showed the highest percentage of sexual dimorphism, contradicting the present and other studies (Cardoso 2008; Zorba et al. 2011). For the molar teeth, the second molar showed the highest percentage of sexual dimorphism, in accordance with the results of Tuttösí and Cardoso (2015) and also those of crown measurement studies (Cardoso 2008; Garn et al. 1979; Zorba et al. 2011).

Discriminant function analysis for single tooth measurements also showed that the cervical measurements of the permanent canines and incisors were the most dimorphic teeth providing classification accuracy between 76.9 and 87.9 %. These results were in accordance with previous studies (Alt et al. 1998; Starp 1990; Ellendt 1993; Hassett 2011; Viciano et al. 2011, 2013, 2015; Mujib et al. 2014). In addition, it was found that second molar dimensions can be a very effective single variable for sex estimation with a classification accuracy of 83 %. A similar result was achieved for a Modern Greek population (Zorba et al. 2012), and some archaeological populations (Starp 1990; Ellendt 1993; Viciano et al. 2015; Tuttösí and Cardoso 2015) also reported a high percentage of correct classification for second molar.

Furthermore, several different discriminant functions were created using different combinations of tooth dimensions. The best discriminant functions for sex classification used maxillary and mandibular incisors (F1) and a combination of maxillary and mandibular incisors and canines (F6). The obtained classification accuracy rates were 100 % for both original and cross-validated data; however, this observation was based on a small sample size (n = 27, n = 22), and despite the fact that functions derived from samples of similar size have been reported (e.g. Viciano et al. 2011), we do not recommend the use of these equations for sex estimation without a follow-up study on a larger sample. The second best discriminant function used a combination of canines, first molars and second molars (F9) with an accuracy rate of 92.3 % for both original and cross-validated data. This was followed by a discriminant function that used maxillary and mandibular canines, providing a correct sex classification of 86.4 % for the original and 84.8 % for the cross-validated data. Although Işcan and Kedici (2003) reported that the majority of the difference between the sexes appears to come from the canines, Garn et al. (1967) suggested that the teeth located adjacent to the canines are more dimorphic than others; however, some studies of crown mesiodistal and buccolingual measurements indicated that incisors are the least sexually dimorphic teeth (Bishara et al. 1989; Ling and Wong 2007). Acharya and Mainali (2007), however, found that central and lateral incisors show significant sexual dimorphism.

In conclusion, sex estimation using dental cervical measurements in an Iranian population has proven to be highly accurate for both original and cross-validated data. It must be stressed, though, that the expression of sexual dimorphism was calculated based on the individuals for which sex could be accurately estimated. This means that the sample may not be totally representative of the population, which introduces a bias in the analysis. Nevertheless, this study is of importance for application to unknown skeletal remains from Iran of around the same period (Iron Age), especially considering that teeth are more likely to survive harsh taphonomic conditions than any other skeletal element (Andersen et al. 1995; Vodanovic et al. 2007; Fereira et al. 2008). We recommend treating as reliable only those estimates with over 95 % probability of correct classification. For estimates with 80–95 % probability, the prediction should be treated with caution, while any estimate with probability lower than 80 % should be considered unreliable and alternative methods should be used. Moreover, the percentage of accuracy may be slightly inflated due to the small sample sizes used in some of the functions. For this reason, the authors aim to carry out additional testing on an expanded data set in future research.