Introduction

In recent years, various groupings of mammalian diets have been used in macroecological and palaeoecological research. These studies show that understanding mammalian diets is important not only for studying present-day ecosystems but also for reconstructing ecosystems of the past (Andrews et al. 1979).

Dietary groups have been used to analyse the relationships between biomes and trophic structures (Mendoza and Araújo 2019; Kissling et al. 2014), predator–prey interactions (Sandom et al. 2013), producer–consumer richness relationships (Jetz et al. 2009), and the relationship between diet and body mass (Pineda-Munoz et al. 2016). In palaeoecology, dietary groups have been used to analyse the evolution of dietary preferences and trophic structures (Price et al. 2012; Cantalapiedra et al. 2014; Grossnickle 2020), palaeoenvironments (Andrews et al. 1979), and palaeoclimate (Semprebon et al. 2004; Janis et al. 2004).

Early groupings of mammalian diets were based on naturalists’ observations rather than on quantitative models. The system developed by Eisenberg (1981) contains 16 dietary categories that cover terrestrial as well as aquatic mammals, but it also includes behavioural aspects, as in the category “Foliage-gleaning insectivores”. This system has been used widely (Robinson and Redford 1989; Fa and Purvis 1997; Ungar 2010).

Building on Eisenberg’s system, Miljutin (2009) defined three main dietary categories: Animalivores, Herbivores, and Frugivores. He preferred the term Frugivores to the widely used Omnivores category. Omnivores, often defined as species that feed on more than one trophic level (Pimm and Lawton 1978; Tanabe and Namba 2005; Yodzis 1984), form a problematic category, since many animals occasionally eat both animal and plant matter, making it hard to establish simple boundaries between herbivores, omnivores and carnivores (Pineda-Munoz and Alroy 2014). Yet many researchers actively use the term (Landry 1970; Yodzis 1984; Diehl 2003; Tanabe and Namba 2005) and continue to refine the category (Lambert 2016; Balestrieri et al. 2019).

While Eisenberg’s system was global and could be used for all mammals, many later prevailing dietary categorisations have been designed for specific geographical areas or groups of mammals—for example, for terrestrial mammals (Pineda-Munoz and Alroy 2014; Kissling et al. 2014), for small mammals (Langer 2002), for ruminants (Hofmann and Stewart 1972), for African bovids (Gagnon and Chew 2000) and marine tetrapods (Kelley and Motani 2015). Most of these categorisations were based on observational studies (observing animals eating) or analyses of the stomach content of dead animals, listing the dominant food items and their rank order.

Raubenheimer and Simpson (1997) and Langer (2002) pioneered quantitative analyses of diets by macronutrient contents. Raubenheimer and Simpson (1997) studied animal nutrition as a multi-dimensional phenomenon, analysing the regulation of macronutrient (lipid, protein, carbohydrate) selection, while Langer (2002) characterised diets by their crude fibre contents. High-quality diets with little plant cell wall material (e.g., nectarivory or carnivory) had low fiber values, and poor-quality diets with a lot of plant cell wall material (e.g., folivory or graminivory) had high fiber values. More recently, Balestrieri et al. (2019) used macronutrient availability and composition to analyse diets.

The number of distinct dietary categories has varied—from three main categories as proposed by Hofmann and Stewart (1972) up to 16, as proposed by Eisenberg (1981), making it challenging to perform systematic analyses between mammals from different geographical areas or groups. The lack of standardised criteria used for classifying the diets has been recognised earlier (Pineda-Munoz and Alroy 2014). Until recently, no large and consistent benchmark datasets of mammalian diets have been available for such analyses.

The challenge was tackled with the classification schemes of Price et al. (2012), Pineda-Munoz and Alroy (2014), Kissling et al. (2014), and Wilman et al. (2014), and with the update of Kissling et al. (2014) by Gainsbury et al. (2018). Price et al. (2012) collected primary data from stomach contents and fecal analysis for 1534 mammals. They assigned the dietary items to four food groups and generated three diet categories: herbivores, omnivores and carnivores. Pineda-Munoz and Alroy (2014) also used primary data, for 139 land mammals, with eight predefined food groups to generate seven diet categories. While Price et al. (2012) and Pineda-Munoz and Alroy (2014) used primary data, other researchers have used more general data sources such as Nowak and Walker (1991). Kissling et al. (2014) used Nowak and Walker (1991) (for 682 species) supplemented with IUCN (2013) (1351 species). They assigned diet items to 12 food groups and arranged them as ordinal data (ranks 1–3) to generate 16 diet categories. They then used the resulting data for 2033 terrestrial mammals to extrapolate genus-level data for 3331 species, resulting in diet categories for 5364 mammals. Wilman et al. (2014) also used Nowak and Walker (1991), together with various other sources, both books and primary literature. They defined a protocol to translate the verbal descriptions into standardised, semi-quantitative information about the relative importance of different food groups. They estimated the relevance of the food items in 10% steps between 0 and 100% within 10 food groups (5397 species).

None of the studies attempted to infer food groups and proportions computationally. Instead, they used predefined food groups and translations of common terms into diet item proportions. Pineda-Munoz and Alroy (2014) concluded, “More realistic classifications should be based on the physical, nutritive and ecological characteristics of food items”. Langer (2002) used such an approach when he quantified the crude fiber contents as an approximation for the nutritive characteristics of the diets; however, he did not specify the diet item proportions computationally while defining these groups.

Here, we propose to quantify dietary categorisation based on the structural (parts eaten) and ecological (taxonomy) characteristics of food items that are mapped onto the macronutrient space. We present a computationally derived framework for characterising mammalian diets. The framework allows mapping diets from different information sources, such as stomach contents studies or lists of food items consumed. The proposed dietary categories result from clustering the diets using the taxonomy of the diet items and structural parts of the diet items consumed. We then characterise the resulting dietary categories in terms of their nutritive properties. Taxonomy of items consumed acts as a proxy for the structural and ecological characterisation of the food items, enabling estimation of their chemical compounds (proximate analysis).

The proposed analytical scheme with the accompanying dataset offers new possibilities for analysing mammalian diets at scale in terms of their evolutionary and environmental contexts. As an illustrative example, in a case study of mammalian dental morphology, we show how the concentrations of some nutrients in the diets are associated with the complexity of mammalian dentition.

To accompany this article and encourage follow-up research, we make the data publicly available as an online updatable database at https://www.mammalbase.net. The source code and data for the analysis can be found at https://version.helsinki.fi/dacs/mammalbase. For the published datasets, see Lintulaakso (2022a, b, c).

Materials and methods

Diets

We compiled the diet information for 4453 mammal species, which covered the majority (82%) of the extant mammal species. In addition, we compiled the dietary group information of Eisenberg (1981) for 1894 species with two revisions: we added the mixed-feeders to the herbivore group, and we combined the aerial insectivores and foliage-gleaning insectivores into one category, insectivores (Lintulaakso and Kovarovic 2016). We matched the species affiliations against the taxonomy of Wilson and Reeder (2005).

Our contributed dataset is publicly available via MammalBase, a database of mammalian attributes and diets compiled and maintained by K.L. (Lintulaakso 2022a, b, c). The data come from published sources and from the dataset compiled by the National Center for Ecological Analysis and Synthesis (NCEAS) Workshop on Mammalian Communities (Badgley et al. 2001; van Dam et al. 2001; Damuth et al. 2002).

We collected diet descriptions from different data sources over the last 15 years. Several textbooks, including Grzimek’s animal life encyclopedia and Walker’s Mammals of the World were used (Nowak and Walker 1991; Hutchins 2003; Reid 2006; Smith et al. 2010), as well as the NCEAS dataset and dozens of primary data sources with stomach contents and scat analysis data.

We did not use diets of captive animals in this study. We also excluded cases in which feeding was forced by hunger, took place in laboratory experiments, or involved trap baits.

We used heterogeneous sources because we aimed to develop an approach that integrates data describing diets at different granularities. Most of the data were collected by the first author, who aimed for consistent treatment by prioritizing broad textbooks over individual, specific information. If several data sources were available for a species, all of them were used. As the diet of a species can vary with the local community composition or other local factors, collecting data from several sources gave us a broader assemblage of dietary data for a species and, therefore, a broader view of the average expected diet. While this approach produced less distinct diets, it took into account the capability of a species to exploit a variety of food types (Rex et al. 2010; Djagoun et al. 2013).

The diet of a species consisted of one or several diet sets, which in turn were lists of food items (such as ‘insects’, ‘leaves’ or ‘acacia pods’). Separate diet sets for a species could be lists of food items grouped by locality, habitat, time of year, sex, individual count, measurement method, cited references and the publication. We estimated the duration of the reported time of year in months and used 12 months as the default for a diet set. For example, if a diet set was for winter, we set its weight to 3 months/12 months = 0.25.

The food items also contained information on which part (leaf, root, ...) or life stage (egg, larvae, ...) of a food item the species was consuming. We standardised these parts to maintain the information from the data sources (Table 1). We distinguished stems from shoots as far as the sources allowed and kept both in the raw data, but we combined them in our clustering data.

Table 1 Standardisation of diet item parts and life phases

Our main assumption was that the food items in the published sources were listed in order from the most to the least consumed (Jernvall and Wright 1998; Wilman et al. 2014). Based on this assumption, we computed the fractions in the diet set by weighting food items according to their listed order. We tested two weighting schemes. The first was a simple linear model in which the weight of the \(i^{\mathrm{th}}\) food item is \(\frac{2(n + 1 - i)}{n(n + 1)}\), where n is the total number of items in the dietary list of a species. For a diet set containing three food items, this resulted in a fraction of \(6/12=0.5\) for the first food item, \(4/12=0.33\) for the second, and \(2/12=0.17\) for the third, summing to 1 for the whole diet set (Jernvall and Wright 1998). The second model was a geometric series (Motomura 1932), which assumes that the weight of the \(i^{\mathrm{th}}\) most common food item in the diet is twice that of the next most common food item, i.e. it is equal to \(0.5^i / N\), where N is the normalization factor chosen so that the fractions sum to 1. For a diet set containing three food items, this resulted in \(0.5/0.875=0.57\), \(0.25/0.875=0.29\), and \(0.125/0.875=0.14\), summing to 1.
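The two rank-based weighting schemes can be sketched in a few lines of Python. This is a minimal illustration of the formulas given above, not the authors' code; the function names are ours.

```python
def linear_weights(n):
    """Linear rank weights: w_i = 2(n + 1 - i) / (n(n + 1)) for i = 1..n."""
    return [2 * (n + 1 - i) / (n * (n + 1)) for i in range(1, n + 1)]


def geometric_weights(n):
    """Geometric rank weights: w_i = 0.5**i / N, where N makes the weights sum to 1."""
    raw = [0.5 ** i for i in range(1, n + 1)]
    total = sum(raw)  # the normalization factor N (0.875 for n = 3)
    return [w / total for w in raw]


print([round(w, 2) for w in linear_weights(3)])     # [0.5, 0.33, 0.17]
print([round(w, 2) for w in geometric_weights(3)])  # [0.57, 0.29, 0.14]
```

Both functions reproduce the three-item examples in the text, and both return weights that sum to 1 for any list length.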

We evaluated these two weighting schemes against reported primary data (food item percentages in analyses of stomach contents or feces) for 60 mammals (102 diet sets having 669 diet items). We calculated the Pearson product–moment correlation coefficient between the reported values and the fractions calculated using the two models. For the linear model, the correlation was \(r(667) = 0.75, p < 0.0001\), and for the geometric model, it was \(r(667) = 0.82, p < 0.0001\). In this study, we therefore used the geometric model for computing the fractions in the diet sets.

If the food item percentage information was reported in the primary data, we used this information only for ordering the data from the most common to the least consumed in our master dataset compilation. If literature reported several diet sets for the same species, all diet sets were included and the sum of all diet sets was normalised to one. For example, two diet sets (one with a default 12 months time of year and one with 3 months) have initial weights of 1 and 0.25 and final weights of \(1/1.25=0.8\) and \(0.25/1.25=0.2\).
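The diet-set weighting can be sketched as follows. The duration-to-weight rule and the normalisation follow the text; the way food-item fractions are pooled across diet sets (a weighted sum) is our assumption, and the function names and example fractions are hypothetical.

```python
from collections import defaultdict


def diet_set_weights(months):
    """Turn diet-set durations in months (default 12) into weights that sum to 1."""
    raw = [m / 12 for m in months]
    total = sum(raw)
    return [w / total for w in raw]


def combine_diet_sets(diet_sets):
    """diet_sets: list of (months, {food_item: fraction}) pairs.

    Assumption: the species diet is the weighted sum of its diet sets.
    """
    weights = diet_set_weights([months for months, _ in diet_sets])
    diet = defaultdict(float)
    for weight, (_, items) in zip(weights, diet_sets):
        for item, fraction in items.items():
            diet[item] += weight * fraction
    return dict(diet)


print(diet_set_weights([12, 3]))  # [0.8, 0.2]
# Hypothetical year-round and winter diet sets for one species.
print(combine_diet_sets([(12, {"insects": 0.57, "fruits": 0.43}),
                         (3, {"fruits": 1.0})]))
```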

To make the data more usable for other researchers and to support future quality improvements, we assessed the quality of the diets by calculating a score from 1 to 10. We scored the taxon level (genus-level diet data 0 points, species or subspecies level 1 point), the presence of a reported citation for the data (0 or 2 points), the source quality (data set 1, book 2, journal article 3), the presence of a described method (0 or 2 points), and the food item taxonomy (kingdom-level data 0 to species-level data 2). The lowest score (1) is for a data set containing a genus-level diet without any citations or method descriptions and describing the diet items only at the kingdom level. The highest score (10) is for a journal article containing a (sub)species-level diet with a citation to the original source (or being one) and a method description, and describing the diet items at the species level. In our data, the quality scores for the 4453 diets ranged from 1.3 to 8.6, with a mean of 3.5 and a mode of 2.5. The total number of diet sets was 5495. The number of diet sets within a diet ranged from 1 to 28, with a mean of 1.64 and a mode of 1.
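The scoring rule can be sketched as a simple sum of its five components. The point values follow the text, but the function, the dictionary names and the label strings are hypothetical. The non-integer scores reported above suggest that some averaging over diet sets or food items takes place; since that step is not described, the sketch scores a single diet set and takes the food item taxonomy points (0 to 2) as an input.

```python
# Point values as given in the text; the label strings are assumptions.
SOURCE_POINTS = {"dataset": 1, "book": 2, "journal-article": 3}
TAXON_POINTS = {"genus": 0, "species": 1, "subspecies": 1}


def quality_score(taxon_level, has_citation, source_type, has_method, item_taxonomy_points):
    """Sum the five quality components for one diet set.

    item_taxonomy_points runs from 0 (kingdom-level items) to 2 (species-level items).
    """
    if not 0 <= item_taxonomy_points <= 2:
        raise ValueError("item_taxonomy_points must lie between 0 and 2")
    return (TAXON_POINTS[taxon_level]
            + (2 if has_citation else 0)
            + SOURCE_POINTS[source_type]
            + (2 if has_method else 0)
            + item_taxonomy_points)


print(quality_score("genus", False, "dataset", False, 0))           # 1 (lowest)
print(quality_score("species", True, "journal-article", True, 2))   # 10 (highest)
```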

For 1939 species, the diets were described at the species level. Diets that were described at the subspecies level were included in the species diet. For 2504 species, the diets were described at the genus level in the data sources. We did not extrapolate the diets to other species within a genus if we had a species-level description for only some of the species of that genus. See Lintulaakso (2022b) for the diet data.

Proximate analysis of food items

The proximate analysis forms the basis for feed analysis (Greenfield and Southgate 2003). This scheme was developed to mimic the animal digestion process and quantify shares of organic and inorganic materials. It estimates the relative amounts of water (moisture), ash, crude fat (ether extract, lipid), crude protein, crude fiber and Nitrogen-free extract (sugars and starches) in a food item (Henneberg and Stohmann 1860).

The first author collected chemical compound data for proximate analysis of food items from primary sources. He evaluated 210 journal articles and selected 132 for this study. The focus was on wild animal diet components and less conventional human foods, avoiding (but not excluding entirely) cultivated crops or bred animals. In particular, we excluded domesticated fruits from the database using the definition and classification of fruit commodities by FAO (1996). Domesticated fruits have less fiber, protein, and calcium but more sugar than wild fruits (Oftedal and Allen 1996; Schwitzer et al. 2009). This provided us with proximate analysis data for 935 food items (Lintulaakso 2022a). Many of the reported values lacked moisture content; for this reason, we discarded water and renormalised the chemical composition on a dry matter basis.
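Renormalising to a dry matter basis amounts to dropping the moisture fraction and rescaling the remaining components so that they sum to 1. A minimal sketch, in which the component keys and the example values are hypothetical:

```python
def to_dry_matter(composition):
    """Drop moisture and rescale the remaining fractions so that they sum to 1."""
    dry = {name: value for name, value in composition.items() if name != "moisture"}
    total = sum(dry.values())
    if total <= 0:
        raise ValueError("no dry matter components to normalise")
    return {name: value / total for name, value in dry.items()}


# Hypothetical fresh-weight composition of a food item (fractions of total mass).
fresh = {"moisture": 0.75, "ash": 0.02, "ether_extract": 0.03,
         "crude_protein": 0.05, "crude_fiber": 0.05, "nfe": 0.10}
dry = to_dry_matter(fresh)
print({name: round(value, 2) for name, value in dry.items()})
```

The same function also works for records that never reported moisture: they are simply rescaled to sum to 1.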

Throughout this study, we use ternary plots for illustrating the chemical composition of the resulting clusters and other data. The axes on these plots consist of nitrogen-free extracts, combined crude fiber+ash, and combined crude protein+ether extract. Although two of the axes combine two chemical categories, we chose to use the ternary plots for their clarity. The principal component analysis (results not reported here) showed that 97.7% of the variance was explained by two principal components. The PCA loadings follow those chemical categories we selected for the ternary plot axes.

Taxonomic standardisation of the food items

We matched all food item and chemical compound data against the taxonomic backbone from the Integrated Taxonomic Information System (ITIS) (http://www.itis.gov). ITIS provides a taxonomic hierarchy, which enabled us to assign chemical compound data to the mammalian diets using the percentages of the food items. In total, 3698 verbatim food items resulted in 1199 standard food items. Of these food items, 4% are between kingdom and superclass level, 30% between class and family, and 66% between subfamily and variety; their respective shares in all diets are 14%, 76%, and 10%. In many cases, the food items were described quite broadly (e.g., leaves, roots), and we therefore had to calculate a general proximate analysis value for these food items based on the hierarchy from ITIS (Plantae/Leaf, Plantae/Root). Some food items were described at the species level, while chemical compound data were available only at higher taxonomic levels (family/order). To see how big this difference was, we counted the number of taxonomic levels we had to move up the proximate analysis taxonomy until the two taxonomies met. For the 1199 food items, this value ranged from 0 to 9, with a mean of 1.96 and a mode of 0.
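The upward walk through the taxonomy can be sketched as a lookup that climbs from the most specific taxon until proximate data are found, counting the levels moved. This is only an illustration: the function name, the abbreviated lineage and the table contents are hypothetical, and the text does not state how values were aggregated at the higher levels.

```python
def nearest_proximate(lineage, part, proximate):
    """Walk up the lineage until proximate data exist for (taxon, part).

    lineage: taxa ordered from kingdom down to the most specific taxon.
    Returns (data, levels_moved_up), or (None, None) if nothing matches.
    """
    for steps, taxon in enumerate(reversed(lineage)):
        key = (taxon, part)
        if key in proximate:
            return proximate[key], steps
    return None, None


# Hypothetical, abbreviated lineage and proximate table (labels stand in for values).
lineage = ["Plantae", "Lilianae", "Poales", "Poaceae", "Festuca"]
proximate = {
    ("Poaceae", "LEAF"): "grass-leaf composition",
    ("Plantae", "LEAF"): "generic leaf composition",
    ("Plantae", "ROOT"): "generic root composition",
}
print(nearest_proximate(lineage, "LEAF", proximate))  # ('grass-leaf composition', 1)
print(nearest_proximate(lineage, "ROOT", proximate))  # ('generic root composition', 4)
```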

Dental morphology

For characterising the dental morphology of the eaters, we used the scoring scheme of Jernvall (1995), which describes structural components of teeth: the molar cusp patterns and their positioning. A tooth is described by five variables: (1) cusp shape (\(R = \hbox {round}\); \(S = \hbox {sharp}\); \(L = \hbox {loph}\)), (2) the number of buccal cusps, (3) the number of lingual cusps, (4) the number of longitudinal lophs and (5) the number of transverse lophs. Lophs here denote crests connecting two or more cusps.

To a first approximation, one can argue that the more components a tooth has, the more complex the chewing demands placed on it. Here, we used the number of components as a proxy for the complexity of dental morphology. For this analysis, we derived two variables: the total number of cusps (variable (2) + variable (3)) and the total number of lophs (variable (4) + variable (5)). The original schema used ‘M’ to denote more than three cusps or more than three lophs; for our analysis, we recorded it as ‘4’.
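The two derived variables can be computed as below. The sketch takes the four count variables as separate arguments rather than parsing a crown-type string, because the exact string encoding is not given in the text; the function names are ours.

```python
def to_count(value):
    """Convert a crown-type entry to a number; 'M' (more than three) is recorded as 4."""
    return 4 if value == "M" else int(value)


def dental_complexity(buccal_cusps, lingual_cusps, longitudinal_lophs, transverse_lophs):
    """Total cusps = variables (2) + (3); total lophs = variables (4) + (5)."""
    return {
        "cusps": to_count(buccal_cusps) + to_count(lingual_cusps),
        "lophs": to_count(longitudinal_lophs) + to_count(transverse_lophs),
    }


print(dental_complexity(2, 2, 0, "M"))  # {'cusps': 4, 'lophs': 4}
```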

We used existing scorings of mammalian teeth at the species and genus level. The bulk of the data came from the original scoring paper (Jernvall 1995) and from Jernvall and Wright (1998). Both list crown types for mammalian families (\(n=2531\) species in this study), supplemented with 21 additional entries from the original author (Jernvall, Jukka, personal communication, February 5, 2021). We complemented these data with data from the NOW database (NOW Community 2021), which provided crown type data for 290 species. NOW is a database of fossil mammals, but it also contains records of living species that have been found as fossils.

Our final set of crown types covered 2843 species, which was 64% of the species for which we had dietary information. See Lintulaakso (2022c) for the dataset.

Clustering

To cluster the species, we need to define a distance between the diets of two species that indicates how similar the diets are. The distance should have two properties: similar diets should produce small values, and diets consisting of similar food items, in terms of the taxonomic hierarchy, should also produce small values.

For simplicity, in this explanation we ignore the information about the part of the food; we address it later. To define the distance, we represent the diet of a single species as a real-valued feature vector f. The ith feature \(f_i\) corresponds to an entry in the taxonomic hierarchy of the diet item. We ignore any information below the family level. We stress that all taxonomic levels of diet items from the family upwards are included, not just the family. The value of \(f_i\) corresponds to the proportion of food item i in the diet. Note that a single food item in a diet results in many features because of the taxonomy. For example, if the diet consists of 15% Poaceae and 5% Cyperaceae, then \(f_{{ Poaceae}} = 0.15\), \(f_{{ Cyperaceae}} = 0.05\) and \(f_{{ Lilianae}} = 0.2\), and so on for higher taxonomy levels. The distance between two taxa, say a and b, is then defined as

$$\begin{aligned} d_{{ nopart}}(a, b) = \sum _{i} \left| f_i - g_i\right| , \end{aligned}$$

where f and g are the respective feature vectors for a and b. This distance is the \(L_1\) distance. Note that d has the properties we desire: similar diets yield small values. Moreover, if food items are close in a taxonomic hierarchy, then the features corresponding to higher taxonomy levels will be the same or similar, again yielding small values. As an extreme example, consider two diets consisting only of one food item. In this case, the distance is equal to path distance in the taxonomic hierarchy.

So far, we have ignored the part of the food item. To incorporate this information, we construct a similar feature vector, but now each feature corresponds to a food-item taxon and a food part; for example, a feature \(f_i\) could correspond to the roots of Poaceae. We use these features with the \(L_1\) distance to define \(d_{{ part}}(a, b)\). Our final distance is then the sum of the two aforementioned distances

$$\begin{aligned} d(a, b) = d_{{ nopart}}(a, b) + d_{{ part}}(a, b). \end{aligned}$$
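The construction of the feature vectors and the combined distance can be sketched as follows. This is our reading of the definitions above, not the authors' code: the lineages are abbreviated and hypothetical, and building a part-specific feature at every taxonomic level (rather than only at the level of the food item) is an assumption.

```python
from collections import defaultdict

# Hypothetical, abbreviated lineages from kingdom down to family.
LINEAGES = {
    "Poaceae": ["Plantae", "Lilianae", "Poales", "Poaceae"],
    "Cyperaceae": ["Plantae", "Lilianae", "Poales", "Cyperaceae"],
}


def feature_vectors(diet, lineages):
    """diet: {(part, food_item): fraction}. Returns (f_nopart, f_part)."""
    f_nopart, f_part = defaultdict(float), defaultdict(float)
    for (part, item), fraction in diet.items():
        for taxon in lineages[item]:
            f_nopart[taxon] += fraction          # taxonomy only
            f_part[(part, taxon)] += fraction    # taxonomy combined with food part
    return f_nopart, f_part


def l1(f, g):
    """L1 distance between two sparse feature vectors."""
    return sum(abs(f.get(k, 0.0) - g.get(k, 0.0)) for k in set(f) | set(g))


def diet_distance(a, b, lineages):
    """d(a, b) = d_nopart(a, b) + d_part(a, b)."""
    fa, pa = feature_vectors(a, lineages)
    fb, pb = feature_vectors(b, lineages)
    return l1(fa, fb) + l1(pa, pb)


grass_leaves = {("LEAF", "Poaceae"): 1.0}
sedge_leaves = {("LEAF", "Cyperaceae"): 1.0}
grass_roots = {("ROOT", "Poaceae"): 1.0}
print(diet_distance(grass_leaves, sedge_leaves, LINEAGES))  # 4.0
print(diet_distance(grass_leaves, grass_roots, LINEAGES))   # 8.0
```

For the two single-item leaf diets, the taxonomy-only term equals 2, the path length between Poaceae and Cyperaceae in this abbreviated hierarchy. Under the stated assumption, the same taxon eaten as a different part differs at every level of the part-specific vector, which is why leaves versus roots of Poaceae come out further apart than leaves of two sister families.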

Finally, we clustered the species with agglomerative clustering using the distance d and the Ward linkage function (Ward 1963). The number of clusters was set to one hundred. We also used the resulting hierarchy to further group and analyze the resulting clusters.
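To illustrate the clustering step, the following is a small pure-Python sketch of agglomerative clustering with the Lance–Williams update for Ward linkage on a precomputed distance matrix. It is not the authors' implementation, which is not described in detail. Ward's criterion is formally derived for Euclidean distances, so applying the update to \(L_1\) distances is a heuristic here, and the cubic-time loop is suitable only for toy inputs.

```python
def ward_clusters(dist, k):
    """Merge items until k clusters remain; dist is a symmetric distance matrix."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and the number of items")
    clusters = {i: [i] for i in range(n)}
    # Squared distances between active clusters, keyed by (smaller id, larger id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        i, j = min(d2, key=d2.get)  # closest pair of clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        updated = {}
        for m, members in clusters.items():
            if m in (i, j):
                continue
            nm = len(members)
            dim = d2[(min(i, m), max(i, m))]
            djm = d2[(min(j, m), max(j, m))]
            # Lance-Williams update for Ward linkage (on squared distances).
            updated[m] = ((ni + nm) * dim + (nj + nm) * djm - nm * d2[(i, j)]) / (ni + nj + nm)
        merged = clusters.pop(i) + clusters.pop(j)
        d2 = {key: v for key, v in d2.items() if i not in key and j not in key}
        clusters[next_id] = merged
        for m, value in updated.items():
            d2[(m, next_id)] = value  # existing ids are always smaller than next_id
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Toy example: four items on a line at positions 0, 1, 10 and 11.
positions = [0, 1, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
print(ward_clusters(dist, 2))  # [[0, 1], [2, 3]]
```

In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used instead of a hand-written loop.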

Comparing the resulting clusters with other datasets and diet schemes

To complement our analysis, we compared the resulting clusters with five existing dietary schemes: Eisenberg (1981), Kissling et al. (2014), Gainsbury et al. (2018), Price et al. (2012) and Wilman et al. (2014). The system of Eisenberg (1981) listed 16 detailed dietary groups for all mammals. We compared these groups with our resulting clusters on a ternary plot of the chemical composition. We then made four two-way table comparisons of our dietary groups with previously published dietary datasets. We compared our four main dietary groups with the three main dietary groups of Kissling et al. (2014), Gainsbury et al. (2018), Price et al. (2012) and with the five main dietary groups of Wilman et al. (2014). Although Wilman et al. (2014) did not directly report the dietary categories for mammals, their research reported five dietary categories for birds and how these categories were calculated.

Results

Nutritional value of food items

Figure 1 depicts the nutritional composition of individual food items. Blood, eggs, whole animals, and larvae are high in protein and fat, whereas nectar, fruits, roots, pollen, and exudates contain high amounts of sugars. The highest amounts of fiber and ash are in bark, bones, feces, stems, and shoots. Other items with relatively high amounts of fiber and ash are leaves, buds, and flowers. Seeds and roots have similar amounts of fiber and ash but differ in the amounts of sugars, protein, and fat.

Fig. 1 Ternary plot of the proximate analysis values for food item parts. Legend: NFE, nitrogen-free extracts; CF+ASH, crude fiber+ash; CP+EE, crude protein+ether extract

Fig. 2 Clustering of species based on the taxonomy and parts of the food that they eat. H1–P8 are the cluster keys. The numbers in parentheses give the number of species in each cluster. The bars on the right side represent the average chemical compound composition of a cluster

Fig. 3 Ternary plots of the proximate analysis values. Legend: NFE, nitrogen-free extracts; CF+ASH, crude fiber+ash; CP+EE, crude protein+ether extract. Each grey dot represents the centroid of one diet

Resulting dietary clusters

The clustering was set to provide one hundred clusters. Some of the clusters were quite similar, like the insectivorous clusters (I4, I5), or very specific, like the piscivore cluster (P1) (see Fig. 2). To name the clusters, we calculated a mean diet for each cluster (see Mean Cluster Diet in Supplementary Material S1). The mean diet consists of all diets of the mammals within a cluster, normalised to one. We then used regular expressions to replace all combinations of food item taxonomy and part with a Qualitative Descriptor (e.g., WHOLE.*Mammalia.* \(\rightarrow\) Carnivore, .*Fungi.* \(\rightarrow\) Fungivore, LEAF.*Lilianae.* \(\rightarrow\) Grazer, see Qualitative Descriptors in Supplementary Material S1) and grouped the Qualitative Descriptors by cluster (see Mean Cluster Descriptors in Supplementary Material S1). Next, we selected a maximum of four of the most important descriptors, which together contributed up to 64% of the diet of a cluster. The final cluster name consisted of one or two words. The first one or two Qualitative Descriptors were used if their share was over 64% of the diet; otherwise, the second word was created from the rest of the descriptors. If these contained descriptors from several kingdoms, ‘Omnivore’ was used as the second word; otherwise, the second word was ‘Herbivore’ or ‘Animalivore’, depending on the kingdom. In total, 72 qualitative cluster descriptions were generated (see Cluster Names in Supplementary Material S1).
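The regular-expression step can be sketched as below. The three patterns are those quoted in the text; the key format ("PART/taxon/taxon/..."), the first-match-wins rule, the "Other" bucket, and our reading of the 64% selection rule are assumptions, and the example mean diet is hypothetical.

```python
import re
from collections import defaultdict

# Patterns quoted in the text; the full list is in Supplementary Material S1.
DESCRIPTORS = [
    (re.compile(r"WHOLE.*Mammalia.*"), "Carnivore"),
    (re.compile(r".*Fungi.*"), "Fungivore"),
    (re.compile(r"LEAF.*Lilianae.*"), "Grazer"),
]


def describe(mean_diet):
    """Sum the diet fractions per Qualitative Descriptor (first matching pattern wins)."""
    shares = defaultdict(float)
    for key, fraction in mean_diet.items():
        for pattern, name in DESCRIPTORS:
            if pattern.match(key):
                shares[name] += fraction
                break
        else:
            shares["Other"] += fraction  # no pattern matched this key
    return dict(shares)


def top_descriptors(shares, limit=4, threshold=0.64):
    """Pick descriptors by decreasing share until the threshold is reached (at most `limit`)."""
    chosen, total = [], 0.0
    for name, fraction in sorted(shares.items(), key=lambda kv: -kv[1]):
        if len(chosen) == limit or total >= threshold:
            break
        chosen.append(name)
        total += fraction
    return chosen


# Hypothetical mean diet of one cluster.
mean_diet = {
    "LEAF/Plantae/Lilianae/Poales/Poaceae": 0.7,
    "WHOLE/Animalia/Chordata/Mammalia": 0.2,
    "FRUIT/Plantae": 0.1,
}
shares = describe(mean_diet)
print(shares)
print(top_descriptors(shares))  # ['Grazer']
```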

Figure 2 shows the hierarchical clustering of species according to the taxonomic composition and parts of their food items. We indicated thirteen main cluster branches by colour. Clusters prefixed with P, CI, C, IV, and I are diets mainly of animal matter. Clusters prefixed with HO and IO are omnivores. Clusters prefixed with R, HF, F, N, G, and H are diets mainly of plant matter, all with some exceptions.

The five main branches of the animal matter clusters are mainly driven by the taxonomic affiliation of food items: invertebrates, vertebrates, or aquatic animals.

Clusters P are Piscivores that may also consume squids or crustaceans (P1–8). Clusters CI are Carnivore–Invertivores that may also consume some plant material (CI1–5). Clusters C are Carnivores (C1–2, C4) that may also consume some fruits (C5). Sanguivores (C3) belong to this branch too. Clusters IV contain mainly invertebrate food items: Vermivores (IV1–5), various Invertivores (IV6–11), Crustacivores (IV12–13), and Planktonivores (IV14). Clusters I are Insectivores. Clusters (I1–2) are Myrmecophages and clusters (I3–5) feed on various insects.

Clusters HO and IO are Omnivores of which (HO1–7) are various Herbivore–Omnivores and (IO1–4) are Insectivore–Omnivores.

The six branches of the plant matter clusters are mainly driven by food item parts: R are Rootivores (R1–3); HF are Herbivore–Frugivores with a higher portion of herbivorous (cellulose-rich) than frugivorous (cellulose-poor) diets (HF1–5); F are Frugivores with a higher portion of frugivorous than herbivorous diets (F1–16) of which some have Xylovorous (F2–3) or Fungivorous (F5) diets; N are Nectarivores (N1–4); G are Granivores (G1–10); H are Herbivores, like the Browser (H2, H4), Grazer (H1), and Mixed-Feeder (H3, H6–7, H12) continuum but also Xylovores (H9); and Herbivores with various combinations of Omnivory, Granivory, and Rootivory (H5, H8, H10–11, H13–14).

Nutritional content of diets

Figure 3a suggests three main dietary groups: animalivores, herbivores, and frugivores (an additional category, omnivores, will be discussed later). Animalivore diets have a high amount of protein and fat (\({>}\,60\%\)) and low amounts of sugars (\({<}\,30\%\)) and of fiber and ash (\({<}\,30\%\)). Herbivore diets have a high amount of fiber and ash (25–50%), a relatively high amount of sugars (40–55%), and a low amount of protein and fat (\({<}\,25\%\)). These diets consist largely of structural carbohydrates, which require symbiotic microbial enzyme systems for their digestion (Eisenberg 1981). Frugivore diets have a high amount of sugars (\({>}\,55\%\)), a low amount of fiber and ash (\({<}\,25\%\)), and a low amount of protein and fat (\({<}\,25\%\)). This group requires less physiological specialisation than the cellulose-rich herbivore diets (Pineda-Munoz and Alroy 2014). The contour shows a ‘bridge’ from animalivores to herbivores/frugivores, representing the conventional omnivorous diets (protein and fat between 25 and 60%, fiber and ash \({<}\,30\%\), and sugars \({>}\,20\%\)). Using the above-mentioned boundary values, we calculated the major dietary group for each species and cluster and commented on possible mismatches (see Species Clusters and Cluster Names in Supplementary Material S1).
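A rule-based assignment using these boundary values might look like the sketch below. The thresholds come from the text, but the order in which the rules are tested, the simplified herbivore rule (fiber and ash of at least 25% with protein and fat below 25%), and the "Unclassified" fallback are assumptions; the text does not spell out how borderline cases were resolved.

```python
def major_group(nfe, cf_ash, cp_ee):
    """Assign a major dietary group from ternary percentages (dry matter, summing to ~100).

    nfe: sugars (nitrogen-free extracts); cf_ash: fiber + ash; cp_ee: protein + fat.
    """
    if cp_ee > 60:
        return "Animalivore"
    if cp_ee < 25 and nfe > 55 and cf_ash < 25:
        return "Frugivore"
    if cp_ee < 25 and cf_ash >= 25:
        return "Herbivore"
    if 25 <= cp_ee <= 60 and cf_ash < 30 and nfe > 20:
        return "Omnivore"
    return "Unclassified"  # falls between the stated boundaries


print(major_group(20, 10, 70))  # Animalivore
print(major_group(70, 15, 15))  # Frugivore
print(major_group(45, 40, 15))  # Herbivore
print(major_group(40, 20, 40))  # Omnivore
```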

Do dietary clusters match the taxonomic orders of mammals?

The answer is yes and no. Figure 4 shows that some taxa are strict animal or plant matter eaters while some of the orders consume both.

Taxa having protein and fat greater than 60% are animalivores (see Fig. 3b). Twelve of the 29 orders belong to this group: Afrosoricida, Cetacea, Cingulata, Dasyuromorphia, Erinaceomorpha, Macroscelidea, Microbiotheria, Monotremata, Notoryctemorphia, Paucituberculata, Pholidota, and Tubulidentata. Taxa having fiber and ash greater than 25% are herbivores. Five orders belong to this group: Hyracoidea, Lagomorpha, Perissodactyla, Proboscidea, and Sirenia.

None of the orders is strictly frugivorous (sugars \({>}\,55\%\)). However, some members of the order Chiroptera belong to this group, while others are animalivorous. The order Pilosa has two main dietary groups: Animalivores and Herbivores. Many of the orders contain a dietary continuum. The Herbivore–Frugivore continuum contains two orders: Artiodactyla, of which some are also omnivores, and Dermoptera, which eat fruit, leaves, and flowers. The Animalivore–Omnivore continuum contains four orders: Carnivora, of which some are herbivores, Didelphimorphia, Scandentia, and Soricomorpha. The orders Diprotodontia, Peramelemorphia, Primates, and Rodentia have the widest spectrum of diets, including Animalivores, Herbivores, Frugivores, and Omnivores.

Fig. 4 Ternary plots of the proximate analysis values of mammalian diets by taxonomic orders. On each plot: bottom left corner, NFE (nitrogen-free extracts); top corner, CF+ASH (crude fiber+ash); bottom right corner, CP+EE (crude protein+ether extract). Each dot represents one diet

Do dietary clusters match Eisenberg’s dietary assignments?

The answer is mostly yes, although the new dietary clusters have a much finer resolution. Figure 3c shows that the groupings made by Eisenberg closely reflect the nutritional composition of diets. Herbivore and frugivore groups appear close to each other. The specialised frugivore groups, nectarivores and gumivores, match the positions of the corresponding food items in Fig. 1.

There are very few dots in the centre of the plot, which would conceptually represent the middle ground between meat eaters and plant eaters. Most of the omnivores are adjacent to specialised diets, slightly offset towards the centre. The omnivore groups, including frugivore-omnivore and insectivore-omnivore, connect animalivores and frugivores/herbivores.

The seven animalivorous groups are in the right corner of the plot. Many of these groups are quite far from each other, suggesting distinct chemical compositions of their diets.

Do dietary clusters match with other datasets and diet schemes?

The answer is also mostly yes. For the two-way table comparison, we matched species names with those in Kissling et al. (2014) (\(n=4072\) species), Gainsbury et al. (2018) (\(n=1064\)), and Price et al. (2012) (\(n=1467\)), each having three dietary groups, and in Wilman et al. (2014) (\(n=4441\), five groups; see Supplementary Material S2). The comparison shows that our herbivorous and animalivorous groups match the published datasets well, while the members of our frugivorous and omnivorous groups are more dispersed.

87–96% of our herbivorous group members are in the published herbivorous groups (Herbivore, PlantSeed), while 45–50% of the published herbivorous group members are in our herbivorous group and 28–42% are in our frugivorous group. 82–88% of our animalivorous group members are in the published carnivorous groups (Carnivore, Invertebrate+VertFishScav), and 88–95% of the published carnivorous group members are in our animalivorous group.

37–73% of our omnivorous group members are in the published omnivorous groups (Omnivore). 41–54% of the published omnivorous group members are in our omnivorous group and 24–30% are in our frugivorous group. \(32\%\) of the omnivores in Wilman et al. (2014) are in our animalivorous group. This indicates that the definition of omnivory varies among studies.

The frugivorous group shows the highest variability against the other datasets. It has only one directly comparable published group, FruiNect of Wilman et al. (2014): \(35\%\) of our frugivorous group members are in this group, and \(78\%\) of the FruiNect group members are in our frugivorous group. 44–71% of our frugivorous group members are in the published herbivorous groups (Herbivore, PlantSeed). As noted above, 28–42% of the published herbivorous group members and 24–30% of the published omnivorous group members are in our frugivorous group. The other publications do not include frugivores as a separate group; therefore, the members of this group are dispersed among the published Herbivore, Omnivore, and PlantSeed groups.
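The two directions of comparison reported above (the share of our group found in a published group, and the share of a published group found in ours) correspond to row and column percentages of a two-way table. A minimal sketch of that computation follows; the function name `two_way_table` and the species assignments are made up for illustration and are not taken from any of the cited datasets.

```python
from collections import Counter


def two_way_table(ours, published):
    """Cross-tabulate two group assignments for the same species.

    ours, published : dicts mapping species name -> dietary group.
    Only species present in both dicts are compared.
    Returns (counts, row_pct, col_pct), each keyed by
    (our_group, published_group).
    """
    shared = sorted(set(ours) & set(published))
    counts = Counter((ours[s], published[s]) for s in shared)
    row_tot, col_tot = Counter(), Counter()
    for (row, col), n in counts.items():
        row_tot[row] += n
        col_tot[col] += n
    # row_pct: % of OUR group falling in each published group
    row_pct = {k: 100.0 * n / row_tot[k[0]] for k, n in counts.items()}
    # col_pct: % of the PUBLISHED group falling in each of our groups
    col_pct = {k: 100.0 * n / col_tot[k[1]] for k, n in counts.items()}
    return counts, row_pct, col_pct


# Toy assignments (hypothetical species and groups)
ours = {"sp1": "Herbivore", "sp2": "Herbivore", "sp3": "Frugivore",
        "sp4": "Animalivore", "sp5": "Frugivore"}
published = {"sp1": "Herbivore", "sp2": "Herbivore", "sp3": "Herbivore",
             "sp4": "Carnivore", "sp5": "Omnivore", "sp6": "Carnivore"}

counts, row_pct, col_pct = two_way_table(ours, published)
for key in sorted(counts):
    print(key, counts[key], round(row_pct[key], 1), round(col_pct[key], 1))
```

In this toy example all of our herbivores fall in the published Herbivore group, while only two thirds of the published herbivores fall in ours; the asymmetry mirrors the pattern described for the herbivorous and frugivorous groups.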

Where in the dietary space are omnivores?

Clusters HO and IO form the majority of the Omnivorous dietary groups (Fig. 2), and on the ternary plot most of them lie between the Animalivores (protein and fat \({>}\,60\%\)) and the Herbivores/Frugivores (protein and fat \({<}\,25\%\), Fig. 3b). These clusters represent various Omnivorous groups: Insectivore–Omnivores, Frugivore–Insectivores, Insectivore–Frugivores, but also Granivore–Insectivores, and Gumivore–Insectivores. Many other clusters plot in this area too: Browser–Omnivores, Granivore–Omnivores, Carnivore–Herbivores, and Frugivore–Omnivores.

Some omnivorous diets follow the chemical composition of the major food items and plot outside the ‘omnivorous’ area. In these clusters, the omnivorous food items form only a minority of the food consumed. Thus, there are different sub-categories of omnivores.

How does dental complexity relate to nutritional content of diets?

Figure 5 shows the associations between dental complexity, as described by Jernvall crown types, and the nutritional contents of diets. We see gradients of increasing number of dental components from the right to the left side of the triangles and slightly upwards. The lower right side represents diets rich in crude protein and fat, the lower left side represents diets rich in sugars, and the upward direction represents diets that contain larger shares of crude fiber.

Figure 5a shows how the number of lophs varies across the nutritional space. There is a decreasing pattern from the upper left of the plot (corresponding to high fiber diets) via the center (a mix of sugars and fat with a low amount of fiber) to the right (corresponding to a high amount of protein and fat diets). An exception to the overall trend is the category (4), representing 4 lophs. In this type of tooth, lophs make a square shape. In this arrangement only two lophs are perpendicular to the direction of chewing, thus in the developmental sense this dental morphology has four lophs, but from the functional perspective it is closer to two lophs. If we considered this category as (2) the diagonal pattern in the figure would be even stronger.

A similar although not identical gradient is visible in Fig. 5b, which shows the counts of cusps. The lower the amount of protein and the higher the amount of fiber, the more cusps molars tend to have. The patterns of cusps fall into two categories: three or fewer cusps associate with diets of high nutritional quality, whereas four or more cusps per tooth associate with lower quality diets. This is sensible in terms of the overall shape of the tooth. Teeth that are made of 1–3 cusps are primarily shaped as blades, while teeth that have 4 cusps or more make cubicles with cutting edges on the surface. While carnivory favors the former, the latter are primarily suitable for herbivory. Category (0) is an exception to this. The majority of this category belongs to the family Ctenomyidae (tuco-tucos), which have kidney-shaped hypsodont teeth that have lophs when in wear. Cusps for this group were not scored in Jernvall’s original scheme (Jernvall 1995) because of the uncertainty about their unworn teeth. If cusps were scored, however, there would have been many, somewhat similarly to proboscideans, and the category would match the nearby context of (4) and (8) well.

We conducted Hotelling’s \(T^2\) tests to see whether animal groups with different numbers of cusps or lophs differed significantly in their proximate values. The results, given in Table 2, indicate that we reject every null hypothesis at the significance level of \(\alpha = 0.01\). The P-values remain significant even after correcting for multiple hypothesis testing using the Bonferroni–Holm method. Moreover, the results agree with the ternary plots in Fig. 5. The largest P-values occur for the groups with 2 and 8 lophs, and for the groups with 2 and 3 cusps. In both cases, the average PAs are close to each other in the ternary plots.
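For readers who want to reproduce this kind of pair-wise testing, the sketch below shows a two-sample Hotelling’s \(T^2\) test followed by a Bonferroni–Holm adjustment. It is an illustration under stated assumptions, not the code used for Table 2: it assumes that each diet is reduced to two coordinates (here, hypothetically, the protein and fat share and the fiber and ash share, since the third share is fixed by the sum to 100%), that the two groups share a covariance matrix, and that the function names and toy values are ours. With two variables, the F-distribution tail has a closed form, so only the standard library is needed.

```python
def _mean(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def _scatter(rows, mu):
    """Centred sums of squares and cross-products of 2-D observations."""
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def hotelling_t2_2d(a, b):
    """Two-sample Hotelling T^2 test for 2-D data; returns (T2, p_value)."""
    n1, n2 = len(a), len(b)
    m1, m2 = _mean(a), _mean(b)
    a_xx, a_xy, a_yy = _scatter(a, m1)
    b_xx, b_xy, b_yy = _scatter(b, m2)
    dof = n1 + n2 - 2
    # Pooled covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx, sxy, syy = (a_xx + b_xx) / dof, (a_xy + b_xy) / dof, (a_yy + b_yy) / dof
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # d^T S^{-1} d using the explicit inverse of a 2x2 matrix
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n1 * n2 / (n1 + n2) * quad
    df2 = n1 + n2 - 3                      # second df of F(2, df2)
    f_stat = df2 / (2 * dof) * t2
    # For F(2, df2): P(F > f) = (1 + 2 f / df2) ** (-df2 / 2)
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
    return t2, p_value


def holm_adjust(pvals):
    """Bonferroni-Holm adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted


# Toy groups of (protein+fat %, fiber+ash %); values are invented
group_a = [(60, 20), (65, 15), (70, 18), (62, 22)]
group_b = [(10, 30), (15, 25), (20, 28), (12, 32)]
t2, p = hotelling_t2_2d(group_a, group_b)
print(t2, p)
print(holm_adjust([0.01, 0.04, 0.03]))
```

Working with two of the three shares avoids the singular covariance matrix that would arise from using all three components of a composition that sums to a constant; other reductions (for example log-ratio coordinates) are equally possible.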

Overall from this analysis, we see an associative pattern—the lower the amount of protein, the more components teeth have. The main spread in both plots is on the horizontal axis quantifying the amount of protein. The patterns of lophs and cusps differ slightly on the vertical axis. The categories with a high number of lophs are located in the upper region of possible diets (indicated by grey dots in the background), suggesting a general association of lophs with high crude fiber. The categories with a high number of cusps are relatively more in the centre vertically, suggesting an association of the number of cusps with both fibrous and sugary diets. Both the number of lophs and cusps peak far away from high protein.

This analysis of dental morphology categories supports existing ecomorphological knowledge that nutritional content and biomechanics of acquisition of that content from food items are associated.

Fig. 5

Ternary plots and box plots of the proximate analysis (PA) values. The numbered points show average PA in the ternary plot for the species with a particular number of cusps/lophs. The corresponding box plots show the dispersion along each axis. a PAs for number of lophs, b PAs for number of cusps

Table 2 Pair-wise P-values of Hotelling’s \(T^2\) tests

Discussion

We quantified mammalian diets and analysed them in an integrative way, from the form via nutritional composition to the macronutrient content. While the sources for our data are heterogeneous, the proposed approach allows mapping and comparing diets at this large scale. Despite the challenge of having many diverse sources, the obtained results are biologically consistent and in line with previous studies (see Supplementary Material S2). It is notable that the scheme by Price et al. (2012) agreed best with our main dietary groups. Their data came from original studies with reported proportions, while our data were more heterogeneous. It seems that the geometric model (Motomura 1932) for estimating the diet item shares performed well. In fact, Hutchinson et al. (2022) found that the ‘hollow-curve’ model, with few common and many rare foods, was the best fit for most of the vertebrate dietary abundance distributions they analysed (1084 of 1130). The first theory attempting to explain the mechanism underlying the hollow curve was proposed by Motomura in 1932 (McGill et al. 2007).
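To make the idea of a geometric model of diet item shares concrete, the sketch below assigns shares to a ranked list of food items so that each item receives a fixed fraction of the share of the item before it. This is only an illustration of a geometric series: the function name `geometric_shares`, the ratio of 0.5, and the example items are assumptions, and the parameterisation used in the study may differ.

```python
def geometric_shares(items, ratio=0.5):
    """Estimate diet shares for food items ranked from most to least consumed.

    items : list of food item names, most consumed first.
    ratio : ASSUMED common ratio of the geometric series (0 < ratio <= 1);
            each item gets `ratio` times the share of the previous one.
    Returns a list of (item, share) pairs whose shares sum to 1.
    """
    if not items:
        raise ValueError("at least one food item is required")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    weights = [ratio ** rank for rank in range(len(items))]
    total = sum(weights)
    return [(item, w / total) for item, w in zip(items, weights)]


# Hypothetical diet description listed from most to least common item
shares = geometric_shares(["grass", "leaves", "fruit"])
for item, share in shares:
    print(item, round(share, 3))
```

The result has the few-common, many-rare shape of a hollow curve, and it also shows why the listing order in a source matters: swapping two items in the list swaps their estimated shares.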

We are aware that some diets contain occasional food items, and some of the diet descriptions and cluster assignments may not be perfectly separable. We tried to locate those occasional food items that bias the outcome when they are treated as a single diet item within a diet set. We also noticed that some of the textbooks may report the diets at very low resolution. For example, ‘green vegetation’ is too general a description to identify a species as a grazer. Also, some of the data sources may not list the food items in order from the most to the least consumed, which will bias the computed fractions of the food items. However, such an order would be expected if a reader is supposed to make a good assumption about a species’ diet. To overcome such bias, original dietary studies with reported fractions should be favoured.

Our intention is to establish a data source that can be updated dynamically by the scientific community. We contribute a database of mammalian diets accompanied by confidence information. We provide a means to update the database by using the form supplied in Lintulaakso (2022b). It is designed to use standard terms like Darwin Core (Wieczorek et al. 2012) and vocabularies like the Ecological Trait-data Standard (Schneider et al. 2019), but we call for a standard that covers animal diets and enables the sharing of dietary data. Ideally, all data used in the future would come from original diet composition studies that include species-level food items and their detailed shares, and non-original studies with only qualitative information would be progressively omitted. While this is not the case yet, our result shows that using multiple heterogeneous data sources for large-scale macroecological and palaeontological studies yields results similar to those of other approaches using similar data sources, while acknowledging the uncertainty linked with the use of textbook data without expertise-based control.

Diets have been studied for centuries (Aristoteles Balme and Peck 1965) and it would be surprising to discover completely new patterns. However, we hope that the consistency of the quantitative treatment here provides a solid reference framework of dietary categories for future studies. A somewhat unexpected result is the relatively low density of species in the middle of the nutritional space. There is little general omnivory; most of the omnivorous categories are somewhat specialised and lie close to their more specialised dietary counterparts.

By clustering mammalian diets and plotting them on a ternary plot (which was independent of the clustered food items), we computationally re-identified four major dietary groups, Herbivores (Plant parts having high amounts of fiber and ash), Frugivores (Non-fibrous plant parts with high amounts of sugars), Animalivores (Invertebrates and Vertebrates with high amounts of protein and fat), and Omnivores (Mixture of plant and animal matter, resulting in medium amounts of sugars, protein and fat) in Fig. 3b. 80% of the species belong to the first three groups, and the remaining 20% are denoted as Omnivores. Wilman et al. (2014) report 19% for Omnivores, while studies that use the classic trophic relationships (herbivore–omnivore–carnivore) for defining the dietary groups report higher percentages (27–33%) for Omnivory (Kissling et al. 2014; Gainsbury et al. 2018; Price et al. 2012). Pineda-Munoz and Alroy (2014) discuss the vagueness and oversimplification of this classification criterion and avoid the use of the term ‘omnivore’, because it does not communicate all the complexity inherent to food choice. Our nutritional analysis clarifies this complexity. There are many kinds of omnivores, and most of their categories in the nutritional space shadow their specialised counterparts.

We identified several distinct omnivorous dietary clusters. In some, fibrous or non-fibrous plant (or fungal) material appeared as the main diet component and invertebrates as the less consumed food items (Herbivore–Omnivores, Frugivore–Omnivores, Granivore–Omnivores, Rootivore–Omnivores), while in others, animal-based food items were the main dietary components and plant material was less consumed (Insectivore–Omnivores, Carnivore–Herbivores).

Many taxonomic orders of mammals plot in distinct areas of the nutritional space in Fig. 4. Herbivorous groups, like Perissodactyla and Lagomorpha, appear in the left upper area, while animalivorous taxa like Cetacea and Macroscelidea appear in the lower right area. The insect-eating and fruit-eating chiropterans form two distinct areas—extreme left, and extreme right at the bottom of the plot. Some taxa display a large spread over the nutritional space: Carnivora, Didelphimorphia, Primates, and Rodentia. These orders have taxa that are often considered omnivores. The large spread on the plot indicates various adaptations for different kinds of foods, from fibrous and non-fibrous plant material to soft tissues of animals.

The nutritional spaces of various taxonomic orders overlap. For example, the non-Australian order Soricomorpha and the Australian animalivorous marsupials, Dasyuromorphia, are taxonomically and geographically distant taxa that prey on taxonomically different animals. Yet they share a similar, animalivorous dietary space, as the ternary plot of the diets’ chemical composition clearly indicates. Also, the mostly herbivorous taxa Artiodactyla and Diprotodontia have similar dietary spectra.

All of the Eisenberg groups are captured by our clusters, which is promising from the point of view of the computational approach. Eisenberg’s system is one of the main classification schemes designed for all mammals, not only for terrestrial ones. In addition to the Eisenberg groups, we identified several finer, distinct dietary groups, like Vermivores, Rootivores, Granivores, and many combinations, like Xylovore–Browsers or Fungivore–Herbivores.

The proposed clustering approach over the geometrically weighted taxonomic space of dietary items and their structural parts can be used to define even more distinct dietary groups within certain taxonomic groups, for use in more specific studies.

We also thought of other use cases for the ternary plots of diets. For example, the dietary niche of a species could be plotted as an area of various diet descriptions (all diets, temporal or spatial variations). The larger or smaller the area, the more generalised or specialised diet a species has. Other use cases could be plotting a heat map of stable isotope values of mammal teeth to see how they reflect the main dietary groups or one could plot various other tooth morphology values, for example, hypsodonty index (Janis 1988), to analyse the relationship with diet. In community analysis, plotting the mammalian community diets from different vegetation categories may show some interesting new patterns.
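One way to turn the dietary niche idea into a number is to convert each diet description to ternary-plot coordinates and measure the area they span. The sketch below uses the convex hull of the points and expresses its area as a fraction of the whole triangle. The choice of the convex hull as the niche measure and all function names are assumptions of this sketch; the corner layout follows the caption of Fig. 4 (NFE bottom left, CF+ASH top, CP+EE bottom right).

```python
import math


def to_xy(cp_ee, cf_ash, nfe):
    """Map a three-part composition to 2-D ternary-plot coordinates."""
    total = cp_ee + cf_ash + nfe
    cp, cf = cp_ee / total, cf_ash / total
    # NFE corner at (0, 0), CP+EE at (1, 0), CF+ASH at (0.5, sqrt(3)/2)
    return (cp + 0.5 * cf, math.sqrt(3) / 2 * cf)


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; fewer than three vertices enclose no area."""
    if len(vertices) < 3:
        return 0.0
    s = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def niche_area_fraction(diets):
    """Hull area of a species' diets as a fraction of the full triangle."""
    hull = convex_hull([to_xy(*d) for d in diets])
    return polygon_area(hull) / (math.sqrt(3) / 4)


# Hypothetical diet descriptions (CP+EE, CF+ASH, NFE) of one species
print(niche_area_fraction([(50, 50, 0), (0, 50, 50), (50, 0, 50)]))
```

A species with a single diet description, or with descriptions that fall on one line, gets an area of zero under this measure, so it is only informative when several distinct descriptions are available.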

Our case study of mammalian tooth morphology showed that dental complexity is associated with the position in the nutritional space: more lophs or cusps are associated with diets of poorer nutritional quality.

While concerns about phylogenetic niche conservatism sometimes arise, it has been repeatedly demonstrated that in mammals dental morphology is correlated with their actual diet (e.g. Pineda-Munoz et al. 2017; Evans et al. 2007), whether or not their diet is at the same time correlated with phylogeny [as discussed e.g. by DeSantis et al. (2018)]. At the same time, distantly related taxa with similar diets have convergent morphology (e.g. Kingston 2011).

Linking dental morphology to diets in this way opens a potential extension to the analysis of fossil mammals and the reconstruction of their diets in more precise ways than has been possible until now, including estimating nutritional characteristics of the diets from teeth. An applied methodology for this purpose remains to be developed and validated.

Conclusion

Our approach to quantifying mammalian diets based on the food item proportions offers a method for defining and analysing dietary groups, which can be used in future studies. The proposed schema allows linking the source and form of food (taxonomic affiliation and structural part of food items) with nutritional contents of diets. We then show how nutritional contents of diets associate with some dental morphology.

The nutritional analysis clarifies the position of omnivory among the dietary groups: there are many kinds of omnivores, and most of their categories in the nutritional space shadow their specialised counterparts. We interpret this to mean that omnivores are of different kinds, and propose that Omnivores is a fair category to use in analyses as long as it is defined more specifically than just eating a mix of animal and plant matter. From the analysis, it appears that omnivory is not common, in contrast to some previous analyses (e.g. Gainsbury et al. 2018; Kissling et al. 2014; Price et al. 2012).

Our case study on dental traits quantitatively links the nutritional space of the diets to dental morphology. The case study confirms that the nutritional properties of diets are associated with the form in which food comes. The contributed dataset offers a means for extending this research to fossil analysis, pending further research.