Wearing down the constraints of low magnification tooth microwear analysis: reproducibility and variability of results based on extant ungulates

Low magnification dental microwear analysis is a widespread dietary proxy for palaeoenvironmental analyses. The limitations of the method, such as observer bias or variation of microwear scars between different tooth positions, are still not fully understood. This study aims to show that the reproducibility and variability of low magnification dental microwear are better than previously thought. The main focuses of this study were the differences between results produced by independent observers and the individual variability of wear features on different teeth of the same specimen. To approach these issues, the microwear of 1944 areas of 0.4 × 0.4 mm on every right molar and premolar (144 teeth of 12 extant ungulate specimens) was quantified. Reproducibility and interobserver error were tested by calculating intraclass correlation coefficients for the scores produced by the observers. The microwear features of each tooth were characterized by the mean, median, standard deviation, range, skewness and kurtosis. These statistical parameters were then compared. To test whether observed differences between the microwear patterns of different tooth positions are significant, ANOVA and Dunnett’s post hoc tests were performed. To calculate the minimal number of sampling sites required for characterizing a tooth, a computer-assisted bootstrap method was applied. The results suggest that the low magnification microwear method is quite robust, with low interobserver error. The variance of microwear scars seems uniform throughout the dentition of the examined specimens. Although some differences can be noted between tooth positions, some limitations of the method could be lifted, at least in the case of ungulates.


Introduction
Dental wear analysis is an important and widely utilized proxy for dietary reconstruction of extinct animals (Jiménez-Manchón et al. 2019; Ibáñez et al. in press). The two major proxies of dental wear analysis are mesowear (Kaiser et al. 2000; Kaiser and Fortelius 2003; Kaiser and Solounias 2003) and microwear signals (Solounias and Semprebon 2002; Semprebon et al. 2004; Rivals et al. 2011; Rivals and Athanassiou 2008).
To assess the microwear signal, different low magnification techniques were developed in the past few decades. These techniques utilize either a confocal microscope (Scott et al. 2005; Ungar et al. 2008) or a stereomicroscope (Semprebon 2002; Solounias and Semprebon 2002; Merceron et al. 2004, 2005; Semprebon et al. 2004). These methods quickly became widely used because they allow researchers to evaluate a large number of samples easily and quickly. Microwear scars observed using these methods have already been examined in various taxa, such as conodonts (Purnell 1995; Purnell and Jones 2012; Martínez-Pérez et al. 2014), dinosaurs (Barrett 2006; Williams et al. 2009), and a wide range of herbivorous, omnivorous and carnivorous mammals (Merceron et al. 2004; Peigné et al. 2009; Münzel et al. 2014; Withnell and Ungar 2014).

The present study focuses exclusively on assessing low magnification dental microwear and the possible methodological issues such as observer bias, unreliable diet categorization, variability between teeth and specimens included in analyses.
However widespread the above-mentioned microwear methods have become, they still have drawbacks. These can originate from the condition of the studied remains, from unsatisfactory cleaning and processing of the samples, or from observer bias. Several studies have addressed such methodological issues (e.g. Semprebon et al. 2004; Galbany et al. 2005; Fraser et al. 2009), and new approaches have even emerged that focus on eliminating observer bias in categorising the scars (Strani et al. 2018), but very few have focused on intraindividual differences in the observable microwear pattern (a few exceptions are, e.g., Todd et al. 2007; Rivals et al. 2015; Xafis et al. 2017). Differences in the microwear profiles of different populations of the same species have already been addressed (Rivals and Solounias 2007), supporting the hypothesis that different populations of the same species might have different enamel wear features depending on their habitat.
This study focuses on the microwear of extant ungulates, including roe deer (Capreolus capreolus Linnaeus, 1758; Cervidae), red deer (Cervus elaphus Linnaeus, 1758; Cervidae), domesticated sheep (Ovis aries Linnaeus, 1758; Bovidae) and West Caucasian tur (Capra caucasica Güldenstädt and Pallas, 1783; Bovidae). Interobserver bias was addressed here by comparing the results of two independent investigators (the authors), similar to Todd et al. (2007). Previous studies usually restricted the application of the method to a priori specified tooth positions (namely the M2 and m2 in the case of ungulates) and/or to specific areas on those teeth (namely the paracone and protoconid). The possible expansion of the methodology to other tooth positions and areas was also explored here by comparing several observations taken from each of the premolars and molars of the same individuals, similar to Xafis et al. (2017), but including all molars in the study, not just the premolars.

Materials
The second, third and fourth premolars (P2/p2, P3/p3, P4/p4) and the first, second and third molars (M1/m1, M2/m2, M3/m3) of the right tooth rows of four extant ungulate species (Capreolus capreolus, Cervus elaphus, Ovis aries and Capra caucasica) were studied. All specimens used in the present paper are stored in the Mammalia collection of the Hungarian Natural History Museum in Budapest. For the statistical testing discussed below, we produced two different datasets, each created with a different question in mind.
One was used exclusively for assessing observer reproducibility, examined on three species regardless of their dietary habits, whereas the other was used for testing intraindividual variability in two species with highly differing diets. Dataset "A" contains 96 teeth from 4 Capreolus capreolus, 3 Cervus elaphus, and 1 Capra caucasica skulls, whereas dataset "B" consists of 72 teeth from 3 Capreolus and 3 Ovis aries skulls. For dataset "B", the two species were selected based on their differing diets; C. capreolus is widely considered a browser (Gębczyńska 1980; Storms et al. 2008; Kamler and Homolka 2019), whereas O. aries is thought to be a grazer (Wagner and Peek 2006; La Morgia and Bassano 2009). The list of specimens with all necessary identification data can be found in Table 1.

Molding and counting microwear scars
After thoroughly cleaning the surface of each tooth using cotton swabs soaked in ethanol, a high-resolution polyvinylsiloxane mold of each occlusal surface was made with Coltene Affinis Precious light body fast impression material (ISO 4823, Type 3, Coltene Holding, Altstätten, Switzerland). After creating the molds, transparent casts were made using EPO-TEK 301 resin (Epoxy Technology, Billerica, Massachusetts, USA). These materials allowed us to reproduce features with a resolution of a fraction of a micron, which enables the observation of microscopic scars on the enamel surface.
The microscopic wear features of the teeth were quantified under a Nikon SMZ800 stereomicroscope following Solounias and Semprebon (2002) and Semprebon et al. (2004). Pits and scratches were identified based on their shapes and refraction properties. Pits are scars that have approximately similar widths and lengths, whereas scratches are elongated features with straight, parallel sides. By carefully adjusting the angle at which the light strikes the casts, pits and scratches become separable based on their refractive properties. Small pits and fine and coarse scratches are highly refractive and can appear relatively bright under certain lighting conditions. Large pits and hypercoarse scratches, however, are less refractive and always remain darker. These microwear scars were counted on a 0.4 × 0.4 mm area of the enamel (measured with an ocular micrometer) under 75× magnification. This was used instead of the regular 35× magnification so that the examined 0.16 mm² areas almost entirely filled the field of vision, thus excluding possible errors originating from accidentally counting features outside the designated study area. Increasing the magnification from 35× to 75× could cause problems regarding the quantity and visibility of the counted features; however, previous results indicate that magnification and resolution are not that important as long as all data included in a single analysis are collected under consistent circumstances. Our aim here was not to reconstruct the dietary preferences of the above-mentioned species, but rather to see whether the microwear features on the enamel surfaces are consistent and whether the method itself is reproducible. Small pits and large pits, as well as fine scratches and coarse or hypercoarse scratches, were not differentiated from each other in the present study (see Solounias and Semprebon (2002) and Semprebon et al. (2004) for details).
The number of large pits and coarse or hypercoarse scratches could provide additional information about the alimentary habits of an animal. However, the aim here was not to separate slightly differing diets, but to better understand the distribution and variability of the microwear scars, so the additional information derived from the scar subcategories was not considered in this study. The nomenclature of teeth follows that of Bärmann and Rössner (2011) (Fig. 1).

Interobserver reproducibility
To test the interobserver reproducibility of the method, the microwear features were quantified for each tooth in dataset "A". The counting was done on two randomly selected areas on each replica by the two independent observers (BSz and AV, respectively). The areas were selected by the first observer and then mapped out in detail to help the second observer locate the same regions. To avoid observer bias, blind counting was used. Each cast received a code number from a third, independent researcher, which made later re-identification possible; however, neither of the two observers knew during the counting to which species or specimen a given tooth belonged. In total, the results of 192 areas were compared with each other.
After the quantification of the microwear features, intraclass correlation coefficients (ICC) and their 95% confidence intervals were calculated for the results of the two observers using the "irr" package of the open access R software (R Core Team 2017), based on mean-rating, consistency, two-way mixed effect models. These models result in ICC values between zero and one. Based on the 95% confidence interval of the ICC estimate, values less than 0.5 indicate poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values greater than 0.90 are indicative of excellent reliability (Shrout and Fleiss 1979; Koo and Li 2016).
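The mean-rating, consistency form of the ICC described above can be sketched in a few lines. This is only an illustration in Python (the study itself used the R "irr" package); the function names and the example counts are hypothetical, and the confidence interval is not computed here.

```python
from statistics import fmean


def icc_consistency_mean(ratings):
    """Two-way, consistency, mean-of-k-raters ICC: (MSR - MSE) / MSR.

    `ratings` is a list of rows, one row per counted area, with one
    count per observer in each row (hypothetical layout).
    """
    n = len(ratings)          # number of areas
    k = len(ratings[0])       # number of observers
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


def reliability_label(icc):
    """Verbal categories quoted in the text (Koo and Li 2016)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical scratch counts of two observers on five areas.
counts = [[14, 16], [9, 8], [21, 18], [12, 13], [17, 19]]
icc = icc_consistency_mean(counts)
print(round(icc, 3), reliability_label(icc))
```

Because this is the consistency form, a constant offset between the two observers does not lower the coefficient; only disagreement in the ranking and spacing of the areas does.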
Furthermore, linear major axis regressions (MAR) were also calculated using the scratch and pit counts of the two observers as pairs for each area. This regression was selected because it assumes the same amount of error in both compared variables, unlike the more frequently used ordinary least squares method, which assumes larger variability for the dependent variable on the Y axis than for the predictor variable on the X axis. Linear major axis regression minimizes the sum of squares of the perpendicular distances between each point and the regression line. The strengths of the correlations between the results of the two observers were interpreted following the rule of thumb suggestions of Mukaka (2012). Correlation coefficients between 0.9 and 1 mark a very high correlation, coefficients between 0.7 and 0.9 mark a high correlation, values between 0.5 and 0.7 mark a moderate correlation and values under 0.5 mark a low or a negligible correlation between the two variables.
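The major axis fit can be written in closed form from the sums of squares and cross-products of the two observers' counts. The sketch below is illustrative Python, not the authors' code; the function names and the example data are hypothetical.

```python
import math
from statistics import fmean


def major_axis_regression(x, y):
    """Return (slope, intercept, r) of the major axis line through (x, y).

    The line minimises the summed squared perpendicular distances, so
    x and y are treated symmetrically.
    """
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("major axis is undefined when the covariance is zero")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def correlation_label(r):
    """Rule-of-thumb categories as quoted in the text (Mukaka 2012)."""
    r = abs(r)
    if r >= 0.9:
        return "very high"
    if r >= 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    return "low or negligible"


# Hypothetical pit counts of observer 1 (x) and observer 2 (y).
obs1 = [12, 18, 25, 9, 30, 21]
obs2 = [14, 17, 27, 8, 28, 24]
slope, intercept, r = major_axis_regression(obs1, obs2)
print(round(slope, 3), round(intercept, 3), correlation_label(r))
```

Swapping the two observers inverts the slope exactly, which is the symmetry that ordinary least squares lacks.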

Intra-and intertooth variability
The occlusal surface of each premolar and molar in dataset "B" was divided into multiple regions along the longitudinal and the transversal axes of the teeth (Fig. 1); the labial and lingual enamel ridges of every cone/conid were separated into different regions. In the case of premolars, eight regions were created: the two enamel ridges of the posterolingual cone/conid; the two enamel ridges of the anterolingual cone/anterior conid; the two enamel ridges of the posterolabial cone/conid; and the two enamel ridges of the anterolabial cone/mesolabial conid. Most molars were divided into 16 regions: the 2 enamel ridges of the preparacrista/premetacristid; the 2 enamel ridges of the postparacrista/postmetacristid; the 2 enamel ridges of the premetacrista/preentocristid; the 2 enamel ridges of the postmetacrista/postentocristid; the 2 enamel ridges of the preprotocrista/preprotocristid; the 2 enamel ridges of the postprotocrista/postprotocristid; the 2 enamel ridges of the premetaconulecrista/prehypocristid; and the 2 enamel ridges of the postmetaconulecrista/posthypocristid. Two further regions were created on the lower third molars: the enamel ridges of the entoconulid and the enamel ridges of the hypoconulid.
The microwear features were quantified on two 0.4 × 0.4 mm areas within each subregion, if possible. Where a subregion did not show sufficient microwear features, only one area or none was included from it. In total, 1752 areas were studied; microwear features were observed in 1400 subregions. Areas with extremely low scratch and pit counts were present in every specimen. Discriminating the two dietary categories based on these areas is problematic, because they overlap greatly; areas with such a low number of microwear scars could be present on any specimen with any kind of diet. To avoid this problem, data falling in the far bottom-left part of the dietary morphospace were excluded. This critical area is located below the line connecting the two points that define the bottom-left border of the browser morphospace. This border can be interpreted as a required sum of scratches and pits for further analysis. To exclude the areas in the low-scratch and low-pit range, the limit was set to 12, meaning that at least that many scars had to be counted in a given area for that area to be considered in further analyses. After the exclusion of the areas with low scar numbers, 1184 areas turned out to be suitable for further analysis.
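The exclusion rule amounts to a threshold on the combined scar count of each area. A minimal Python sketch follows; the names and the example counts are hypothetical.

```python
# Minimum combined number of scratches and pits quoted in the text.
MIN_SCARS = 12


def keep_area(scratches, pits, min_scars=MIN_SCARS):
    """True if the area has enough scars to enter further analyses."""
    return scratches + pits >= min_scars


# Hypothetical (scratches, pits) counts for four areas of one tooth.
areas = [(3, 4), (10, 6), (6, 6), (0, 11)]
kept = [area for area in areas if keep_area(*area)]
print(kept)  # [(10, 6), (6, 6)]
```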
The microwear features of each tooth were characterized by the mean, median, standard deviation, range, skewness and kurtosis of the scratches and pits. These basic statistics describe the dispersion and the shape of the distribution of the data points. Additionally, the microwear feature counts of the upper and lower second molars were further analysed. The protoconids of lower molars and the paracones of upper molars (control areas) are usually recommended for low magnification microwear analysis (e.g. Solounias and Semprebon 2002; Merceron et al. 2004; Rivals et al. 2009). To test whether the characteristic microwear pattern of these two areas is similar or equivalent to that observed on any other part of the same teeth, two-sided t tests (with a confidence interval of 95% and an alpha value of 0.05) were performed.
All these aforementioned statistics were calculated separately by the built in functions and the "moments" package of the R software (R Core Team 2017).
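For readers who want to reproduce the per-tooth summary outside R, the six statistics can be sketched as follows. This Python version assumes a sample standard deviation (n − 1 denominator) and moment-based skewness and kurtosis with 1/n normalisation and no excess correction; whether this matches the exact settings used in the study is an assumption. The function name and the example counts are hypothetical.

```python
import math
from statistics import fmean, median, stdev


def describe(counts):
    """Mean, median, SD, range, skewness and kurtosis of one feature."""
    n = len(counts)
    m = fmean(counts)
    # Central moments with 1/n normalisation (assumed convention).
    m2 = sum((c - m) ** 2 for c in counts) / n
    m3 = sum((c - m) ** 3 for c in counts) / n
    m4 = sum((c - m) ** 4 for c in counts) / n
    return {
        "mean": m,
        "median": median(counts),
        "sd": stdev(counts),
        "range": max(counts) - min(counts),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else math.nan,
    }


# Hypothetical scratch counts from eight areas of one tooth.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```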
To evaluate the minimal number of sampling areas necessary to provide reproducible average feature counts, the available data pool for each tooth was randomly rarefied. During the first step of this procedure, only one randomly selected data pair (i.e. the scratch and pit counts of only one area) was used from the whole dataset of a given tooth (16-36 areas depending on its position) to calculate the average scratch and pit counts for that same tooth. This extreme sample rarefaction was repeated 1000 times for each tooth, and each result was registered as if it were the outcome of a separate examination. Then, in the second step, two data pairs (originating from two randomly selected areas) were averaged together. This sampling was also repeated 1000 times for each tooth. The number of randomly selected sampling areas was raised by one in each consecutive step until it reached ten. This resulted in ten sets of data, each including 1000 pairs of average feature counts for each tooth. Finally, the basic statistical parameters of each set (such as the minimum, the mean and the maximum) were compared to the average scratch and pit counts calculated from all available sampling areas on the same tooth.
The aim of this process was to simulate an observer who studies an increasing number of sampling areas on the enamel surface of the teeth and then tries to describe the properties of a given tooth using the average feature counts of the studied areas. Theoretically, the more sampling areas are included in the process, the better the characterization. After a while, however, the time-consuming inclusion of further sampling sites brings little improvement, because the average approaches a plateau close to the average of the background distribution. The minimal number of sample sites necessary to provide reproducible average feature counts was determined as follows: once the spread of the data of a given rarefied set (not considering the outliers) fell within the standard deviation interval calculated using the whole data pool available for a given tooth, the sampling was deemed satisfactory. Any value in a given set outside the range defined by the first quartile minus 1.5 times the interquartile range and the third quartile plus 1.5 times the interquartile range was considered an outlier.
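The rarefaction and the stopping criterion can be sketched as below. This is a simplified Python illustration, not the authors' script: it handles one feature (scratches or pits) at a time rather than resampling the two counts of an area together, it assumes that areas are drawn without replacement within a replicate, and it assumes a particular quartile definition. All names and the example data are hypothetical.

```python
import random
from statistics import fmean, quantiles, stdev


def whisker_range(values):
    """Smallest and largest value that is not an outlier (1.5 x IQR rule)."""
    # Quartile method is an assumption; other definitions differ slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low <= v <= high]
    return min(inside), max(inside)


def minimal_sites(counts, max_sites=10, reps=1000, seed=0):
    """Smallest number of areas whose resampled means stay within one SD.

    Returns None if the criterion is not met with up to `max_sites` areas.
    """
    rng = random.Random(seed)
    centre, sd = fmean(counts), stdev(counts)
    for k in range(1, min(max_sites, len(counts)) + 1):
        # Draw k areas without replacement and average them, `reps` times.
        means = [fmean(rng.sample(counts, k)) for _ in range(reps)]
        low, high = whisker_range(means)
        if centre - sd <= low and high <= centre + sd:
            return k
    return None


# Hypothetical scratch counts from 16 areas of one tooth.
tooth = [4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 9, 11, 13, 15, 17, 7]
print(minimal_sites(tooth))
```

With a fixed seed the result is repeatable; different seeds can shift the answer by a site or so for a small data pool, which is one reason to run many replicates.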
To test for significant differences between the microwear patterns of different tooth positions (from P2/p2 to M3/m3) of individuals, an analysis of variance (ANOVA) and a Dunnett's post hoc test were performed (Dunnett 1964;Holm 1979). For the Dunnett's test, the microwear values of the m2s and M2s were chosen as control groups, to which all other lower and upper tooth positions were compared using the "multcomp" package of the open access R software (R Core Team 2017). These two positions were chosen as control groups, because they are widely utilized in the literature for extracting microwear signals from ungulate teeth (e.g. Gordon 1982;Merceron et al. 2004;Merceron et al. 2004;Rivals et al. 2010). The same procedure was performed with the ratio of scratches and pits of the different tooth positions. The reason for this was to check whether other tooth positions had similar distributions of scratches and pits as the second molars, consequently, whether their data can be used for drawing paleodietary conclusions. The same microwear ratio can be achieved with different numbers of scratches and pits as well, and even though the number of scratches and pits might differ, the ratio might still carry some information about the alimentary habits of an animal (Rivals et al. 2009).
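The omnibus step of this comparison, the one-way ANOVA F statistic across tooth positions, can be sketched as follows. This Python illustration covers only the F ratio; the p value requires the F distribution, and Dunnett's many-to-one comparison against the M2/m2 control requires multivariate t quantiles, neither of which is reproduced here. The function name and the example counts are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group MS / within-group MS."""
    all_values = [v for g in groups for v in g]
    grand = fmean(all_values)
    n_total, n_groups = len(all_values), len(groups)

    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


# Hypothetical scratch counts for three tooth positions of one specimen;
# the first list stands in for the control position (e.g. m2).
positions = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(positions))
```

The same function can be applied to scratch/pit ratios instead of raw counts, as done in the text for the second comparison.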

Interobserver reproducibility
Based on the analysis of 192 sampling areas, the ICC value for the number of scratches falls between 0.761 and 0.894 (with a mean of 0.84) using a 95% confidence interval when comparing the results of the two observers (BSz and AV). A linear major axis regression was also calculated using the scratch counts of the two observers as pairs for each area (Fig. 2a). The correlation coefficient (r²) was 0.565 (p < 0.001).
The minimum difference between the scratch counts of the two observers for the same areas was 0, whereas the maximum difference was 20. The mean of all scratch count differences was 4.49. The results of BSz were higher than those of AV in 57.29% of the cases.
The ICC value for the number of pits falls between 0.915 and 0.962 (with a mean of 0.94) using a 95% confidence interval when comparing the results of the observers. The correlation coefficient of the major axis regression calculated using the pit counts of the two observers as pairs for each area (Fig. 2b) was 0.803 (p < 0.001).
The minimum difference between the pit counts of the two observers for the same areas was 0 as well, whereas the maximum difference was 16. The mean of all pit count differences was 4.64. The pit counts of AV were higher than those of BSz in 54.17% of all cases.

Intratooth and interteeth variability
There is no statistically significant difference between the microwear pattern observable on the protoconids and the paracones of the second molars (control areas), and the rest of the occlusal surface (test areas) of those teeth (Table 2, Fig. 3).
The variability of the pit and scratch counts on all studied areas from the same tooth is relatively high. The range of both features can reach up to 20 when combining the data of all areas from the same tooth (Table 3, Fig. 4).
Based on the results of the experimental procedure, whose aim was to simulate an observer, it can be concluded that each separate average within a set of 1000 resamplings fell inside the standard deviation interval of the mean value based on all available areas from the same tooth, provided that at least five areas were included in the resampling step. A further increase in the number of studied areas on a given tooth does not improve the results significantly (Fig. 5, Supplementary Table 1).
The difference between the scratch counts of the upper and lower second molars in any given specimen is statistically not significant based on the results of the ANOVA (p = 0.851). The same is true in the case of the pit counts of the upper and lower second molars (p = 0.095).
However, major differences can be observed between the average pit and scratch counts when comparing different tooth positions of a given individual. The microwear features of the first and third upper and lower molars and the fourth upper and lower premolars fall closest to the microwear signal observed on the second molars of each individual, regardless of the species. On the other hand, second and third premolars can have quite different microwear signals compared to the second molars. The biggest difference can be observed between the second premolars and the second molars. In general, the further back the premolar in the tooth row, the closer its microwear scar counts are to the counts of the second molar (Table 4). Based on the results of the ANOVA with the Dunnett's post hoc test in the case of the upper teeth, the following tooth positions show microwear signals comparable to those of the second molars: M1 and M3. In the case of the P2s, almost all specimens had different microwear features compared to the M2s. The P3 and P4 counts agreed much better with the M2s than the P2 counts did (Table 4).
The results of the lower teeth are similar to those of the upper ones: p4, m1 and m3 have results similar to m2. The p2 differed from the m2 in almost all cases. The lower third premolars agreed better with the m2 than the lower second premolars did; however, their results still fall further away from the control m2 (Table 4).
In the case of the roe deer, wear features plotted on the dietary morphospaces show that all molars classify the specimens into the same dietary category as the M2/m2, namely as browsers. However, the premolars can give misleading results: in the case of the fourth premolar, the results fall close to the boundary of the browser morphospace, but the third and second premolars suggest a shift into the mixed feeder domain, closer to the grazer morphospace (Fig. 4).
In the case of the grazer sheep the difference between the premolars and the second molar is less well-marked. All dental elements show a low-pit number coupled with moderately higher scratch numbers. Every examined tooth classified the sheep to the mixed-feeder territory, marked with especially low pit numbers (Fig. 4).
The results of the comparisons of the scratch/pit ratios observed on all the upper and lower molars and premolars with the scratch/pit ratio of the M2/m2, respectively, can be found in Table 5. The ratio of the p2 differed significantly from that of the m2 in 67% of the cases, and the ratio of the p3 in 50% of the cases. The ratios of both the p4 and the m1 differed significantly from that of the m2 in 33% of the cases, and the ratios of the m3s did not differ from that of the m2. The upper teeth had similar results. The ratio of the P2 differed significantly from that of the M2 in 33% of the cases, and the ratio of the P3 in 67% of the cases. The P4 and the M3 microwear ratios differed in 17% of all cases, and the ratios of the M1s did not differ from that of the M2. The two dietary categories represented by the two species differed from each other in the S/P ratio, as the ratios of the browser roe deer consistently had lower mean values than those of the theoretically grazer sheep (Fig. 6).

Discussion
One very important question of the low magnification microwear method concerns the subjectivity of the observers. To date, there has been some work on the bias resulting from the registration of microwear scars by independent observers. Most of these studies, however, focused exclusively on the error rates of microwear quantification using scanning electron microscopy (Grine et al. 2002; Galbany et al. 2005). Other studies, which focused on the reproducibility and reliability of the low magnification wear method based on the analysis of digital micrographs, found some differences related to the counting experience of independent observers, but recorded similar interobserver error at all examined resolutions, which suggests that the specific magnification and resolution are not important as long as all data included in a single analysis are collected under consistent circumstances.
For the low-magnification microwear method, observer error was assessed by Semprebon et al. (2004) using the results of 13 ungulate species. They concluded that there were no significant differences between the results of the independent observers. Although DeSantis et al. (2013) reported a somewhat higher interobserver error for the low magnification method in the case of both herbivore and carnivore taxa, our results based on dataset "A" are consistent with Semprebon et al. (2004) and seem to support the reproducibility and reliability of the method in question. Microwear feature recognition and registration can be done reproducibly by any researcher trained in the field of microwear analysis. Before counting, the observer should gain a clear classification notion by learning the definitive parameters of each studied feature. Consequently, the resulting microwear pattern will be comparable with the observations of other researchers who applied the same method.
The resulting numbers of microwear features of both observers (BSz and AV) correlated well with each other. For both scratches and pits, high ICC values were obtained, and the linear models fitted to the data showed high correlation between the results of the two counters. The two observers had slightly different results in many cases, but systematic directional differences were not present. In some cases observer AV returned higher numbers of features, in others observer BSz did. These results support the hypothesis of good reproducibility and robustness of the microwear method.
The microwear results of this study were compared with the earlier established morphospaces of average scratch versus average pit numbers from Solounias and Semprebon (2002) and Semprebon et al. (2004). The aforementioned morphospaces were used for exemplification of the acquired data and for comparing the dietary characterization of the specimens based on the different dental elements. Based on their upper and lower second molars, all C. capreolus specimens studied here fell into the browser morphospace. This dietary categorization agrees well with other studies based on microwear data (Solounias and Semprebon 2002;Merceron et al. 2004), as well as studies based on stomach content analysis and field observation of wild roe deer populations (Cibien and Sempere 1989;Navarre 1993;Tixier and Duncan 1996). Meanwhile, the three O. aries specimens fell between the browser and grazer morphospaces, into the mixed-feeder dietary category, with particularly low number of pits. This shift in a presumably grazer animal could be explained by habitat differences, for example a difference in humidity, vegetation, soil properties or temperature (Lucas et al. 2014). Such difference in feeding habitats was investigated by Mainland (2003), who found significant difference between the wear patterns of sheep populations pastured in open grasslands and in areas of deciduous woodland.
Complete molars or premolars are scarce in the fossil record, which could make microwear analysis much more difficult if we restrict ourselves to the traditionally applied constraints. If the region suggested by, e.g., Solounias and Semprebon (2002), Merceron et al. (2005) or Rivals et al. (2009) is not available or is damaged, then a tooth cannot be further analysed. However, the comparison of the microwear pattern on the protoconids and paracones of the second molars with the rest of the occlusal surface of these teeth showed no statistically significant difference between these regions, despite the differing functions of the different enamel surface areas. If available, the aforementioned areas are still suggested for the analysis; however, in the many cases when there is no other way, any other part of a tooth seems to represent the diet of the animal sufficiently well.
The variability of the microwear structures on the surface of the enamel is relatively high. The average standard deviation on each tooth was around 3.5 for both the pits and scratches. This standard deviation seems to be relatively constant throughout all of the premolars and molars of the animals, making their comparison possible. Similarly high variability of microwear scars was reported by Todd et al. (2007) on elephant molars and by Valli et al. (2012) on horse molars. The high variance of the microwear structures might suggest that results based on them should be treated with caution. However, each tooth can be characterized by an average microwear pattern. By averaging multiple sample sites on the enamel surface of a tooth, it is possible to get closer and closer to the underlying average microwear pattern of a given tooth. As the number of sampling sites is increased, the mean of those sites better approximates the mean of the whole tooth. Including numerous sample sites in the analysis would be time consuming and would undermine the speed and simplicity of the low-magnification microwear method. Consequently, an optimal number of included sites should be determined, which can characterize a tooth sufficiently with as few sites as possible. The results presented in this paper suggest that it is possible to represent a tooth with as few as five randomly selected sampling sites with an adequate number of wear features on them. Including further sites does not improve the results meaningfully, but using fewer than five sites makes the obtained results less and less certain.
The comparison of the scars observed on the different teeth of the same specimens raises other important implications. The similarity of the scratch-pit values of molars and premolars suggests that it is possible to base dietary reconstructions not only on the second upper and lower molars, but also on other dental elements. The work of Xafis et al. (2017) also suggested that apart from the second molars, also premolars could be suitable for such analyses. The scratch and pit numbers of the upper and lower teeth suggest that fourth premolars and first and third molars bear basically the same microwear pattern as the second molars in the case of the studied roe deer and sheep specimens. Other teeth, namely the second and third premolars are less reliable than the aforementioned three. For second and third premolars, it is possible that the microwear scars observed on them classify the specimens into the correct dietary categories, but the average is usually somewhat shifted compared to the results of the second molars.
Fig. 6 Comparison of the scratch/pit ratios of the teeth of the two species. The green violin plots represent the sheep, and the blue plots represent the roe deer. The lower and upper second molars are plotted first on each figure as a reference to which other teeth are comparable
The comparison of the scratch/pit ratios of the teeth suggests that all molars show values similar to those of the second molars, whereas the ratios observed on the premolars, especially on the second and third, are markedly different from those of the second molars in most cases. Nevertheless, in our case the dietary categories could be differentiated based on any given tooth.
Based on these results, it can be concluded that dietary assumptions for an animal can be based not only on the second upper and lower molars, but on other dental elements as well. It should be noted, however, that the second and third premolars should be excluded if possible, because their microwear pattern is far less reliable than that of the molars or the fourth premolars. Furthermore, analyses focusing on the dietary information obtained from scratch/pit ratios can also be conducted confidently on any of the molars, because their results are comparable with those of the second molars. Xafis et al. (2017) suggested that the use of the premolars for dietary reconstructions is acceptable. Our results, however, suggest that in the case of C. capreolus the differences between the second molars and the second and third premolars are greater than the differences between the second molars and the fourth premolars or the first and third molars. Apart from the second and third premolars, the position of the teeth selected for the analysis is seemingly irrelevant, but analysing the microwear features on the second and third premolars could distort the results. Microwear results based on the second and third upper and lower premolars should therefore be treated with caution, at least for the examined group.

Conclusions
The examination of the low-magnification microwear structure of the enamel of four extant ungulate species suggests that some restrictions of the method can be lifted.
One important outcome of the present study is that the results of two independent observers correlated well with each other, which suggests with high confidence that trained but independent observers would obtain similar results. The results of different counters can be compared directly with each other without distortion from observer bias, provided that the observers follow the same protocol during sample preparation and evaluation.
According to the results of the present study, no statistically significant difference in microwear can be seen between the paracone/protoconid and any other part of a given tooth. Furthermore, to sufficiently describe the wear pattern of a given tooth, the wear features of at least five different 0.4 × 0.4 mm areas should be quantified. Fewer sites could be a source of distortion, and the inclusion of more sites per tooth does not improve the results meaningfully.
The other important outcome presented in this paper is the possible expansion of the dental elements usable for microwear analysis. Based on specimens of the two extant ungulate species, the first and third upper and lower molars and the fourth upper and lower premolars could be used for such analyses in addition to the second upper and lower molars. If possible, the use of the second and third premolars should be avoided, because analyses based on them could be misleading.
Funding Open Access funding provided by Eötvös Loránd University.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.