# Simple MaxEnt models explain food web degree distributions

DOI: 10.1007/s12080-009-0052-6

- Cite this article as:
- Williams, R.J. Theor Ecol (2010) 3: 45. doi:10.1007/s12080-009-0052-6


## Abstract

Degree distributions are widely used to characterize networks, including food webs, and play a vital role in models of food web structure. To date, there have been no mechanistic or statistical explanations for the form of food web degree distributions. Here, I introduce models for food web degree distributions based on the principle of maximum entropy (MaxEnt) and show that the distributions of the number of consumers and resources in 23 (45%) and 35 (69%) of 51 food webs are not significantly different at a 95% confidence level from the MaxEnt distribution. These findings offer a new null model for the most probable degree distributions in food webs and other networks. They suggest that there is relatively little pressure favoring generalist or specialist consumption strategies but that biological drivers or methodological bias may force the consumer distribution away from the MaxEnt form.

### Keywords

Network, Resource distribution, Consumer distribution, Null model

## Introduction

An enormous variety of strategies have evolved by which organisms capture the resources necessary for life and by which organisms avoid being consumed. These strategies range from organisms that are specialized on a single resource species to ones that consume a wide range of resources at multiple trophic levels. Similarly, some organisms have evolved elaborate defensive strategies and are consumed by few species while others are vulnerable to a much wider range of consumers. The nature of the balance between specialization and generality in consumers, the range of vulnerability of resources, and the determination of the biological processes that drive these interrelationships are central problems in food web ecology (Dunne 2006).

Food web degree distributions, which give the fraction of nodes in a network with a particular number of links, provide a description of this balance. Degree distributions play a central role in the description and interpretation of the structure of complex networks (Strogatz 2001; Albert and Barabasi 2002) and have been widely used to characterize biological networks (Jordano et al. 2003; Barabasi and Oltvai 2004; May 2006), including food webs. They also play a vital role in recent models of food web structure (Stouffer et al. 2005). Despite their importance, to date, there has been no mechanistic or statistical explanation for this aspect of food web structure.

A food web is a directed network of *S* nodes connected by *L* links, with links indicating the flow of biomass between nodes, which typically represent species or more coarsely resolved aggregations of species. Previous work on degree distributions in food webs has described their functional form. An early study of three food webs considered the undirected degree distribution, combining incoming and outgoing links, and suggested that degree distributions followed a power law and so are scale-free (Montoya and Sole 2002). This was disputed by a study of seven food webs, which considered the consumer and resource distributions separately, and argued that both followed a single-scale functional form (Camacho et al. 2002). A study of 16 food webs found that the form of the undirected degree distributions varied with network connectance (*C* = *L*/*S*^{2}), with power law distributions at low values of connectance (Dunne et al. 2002). None of these studies provide any explanation as to why these distributions should occur.

In addition to their use in the description of complex networks, degree distributions play an important role in the performance of structural models of complex food webs. In particular, it has been shown recently that the success of the niche model (Williams and Martinez 2000) and its variants (Cattin et al. 2004; Stouffer et al. 2005; Stouffer et al. 2006; Allesina et al. 2008; Williams and Martinez 2008) depends in large part on the form of the resource distribution (Stouffer et al. 2005). While the other components of the niche model, ordering of species in a feeding hierarchy and constraining diets to contiguous niches, are grounded in well-established ecological ideas (Hutchinson 1959; Cohen 1978; Cohen et al. 1990), no justification was given for the choice of the resource distribution in the niche model, and this centrally important choice has simply been copied in more recent models.

Here, I propose simple null models for the consumer, resource, and undirected degree distributions of food webs which help fill this important gap in our understanding of food web structure. It has often been argued (Albert and Barabasi 2002; Montoya and Sole 2002; May 2006) that a random network (Erdős and Rényi 1959) where any link is equally probable is a suitable null model, with deviations in the degree distributions from the sharply peaked binomial distribution of this model requiring explanation. This model assumes that all links occur with equal probability and, therefore, when considering the nodes in the network, it assumes that every node behaves identically; this assumption imposes biologically unlikely constraints on the degree distributions.
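As a rough illustration of the random-network null model, the sketch below assumes that each of the *S* possible resources of a node is linked independently with probability *C* = *L*/*S*^{2}, so that the number of resources per node is binomial. The values of `S` and `L` are hypothetical and the function name is illustrative only; this is not taken from the original analysis.

```python
from math import comb

def binomial_degree_distribution(S, L):
    """P(k links) for k = 0..S, assuming each of the S possible links of a
    node is present independently with probability C = L / S**2."""
    C = L / S**2
    return [comb(S, k) * C**k * (1 - C) ** (S - k) for k in range(S + 1)]

# Hypothetical web size and link count, chosen only for illustration.
S, L = 50, 200
p = binomial_degree_distribution(S, L)

mean = sum(k * pk for k, pk in enumerate(p))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
print(f"mean links per species = {mean:.3f}, variance = {var:.3f}")
```

Under this assumption the variance, *S*·*C*·(1 − *C*), is always smaller than the mean *L*/*S*, which is what makes the binomial distribution sharply peaked.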

According to the principle of maximum entropy (MaxEnt; Jaynes 1957), the probability distribution with the maximum information entropy is the least biased probability distribution which satisfies a set of information-containing constraints. Recently, the MaxEnt theory has been used to explain various macroecological patterns (Dewar and Porté 2008; Harte et al. 2008) but to date has not been applied to network degree distributions. By applying the MaxEnt theory to model the number of connections to or from each node, different nodes have different expected numbers of links, a more biologically realistic scenario than the random network model. Here, I compare observed food web degree distributions to MaxEnt models constrained only by the numbers of species, top or basal species, and links in the food webs. I also test whether the degree distributions of niche model food webs (Williams and Martinez 2000) follow the MaxEnt models and whether deviations from the MaxEnt models were similar in the niche model and the empirical data.

## Materials and methods

The consumer and resource distributions of the trophic species (Cohen et al. 1990) in 51 food webs were analyzed. The data are all the webs with 25 or more trophically distinct taxa (Cohen et al. 1990) from two recent studies (Stouffer et al. 2007; Thompson et al. 2007); details of the data are given in Tables S2 and S3 of the Electronic supplementary material. These are among the largest and best-resolved data available, and while still subject to the many criticisms that food web data have received (Cohen et al. 1993), the many robust patterns found in these methodologically heterogeneous data (Stouffer et al. 2007; Thompson et al. 2007; Williams and Martinez 2008) give confidence that these findings are not the result of consistent bias in the data.

Two resource distributions were considered, termed the “all-species resource distribution” and the “restricted resource distribution.” The “all-species resource distribution” is defined as the distribution of the number of resources of each species, including the basal species, which consume no resources. This model is constrained only by knowledge of *S* and *L*. The “restricted resource distribution” is defined as the distribution of the number of resources of only the consumer species. As such, it includes prior knowledge of the number of basal species *B* and does not attempt to predict the fraction of basal species. Similarly, two consumer distributions are considered, the “all-species consumer distribution” and the “restricted consumer distribution.” The “all-species consumer distribution” is defined as the distribution of the number of consumers of each species, including the top species, which have no consumers. This model is constrained only by knowledge of *S* and *L*. The “restricted consumer distribution” is defined as the distribution of the number of consumers of the resource species, includes prior knowledge of the number of top species *T*, and does not attempt to predict the fraction of top species.

In the “all-species” distributions, the number of consumers or resources of each species can range from 0 to *S* and the mean number of links per species is *L*/*S*. In the “restricted” resource distribution, the number of links to each consumer can potentially range from 1 to *S* and the mean number of links to each consumer is *L*/(*S* − *B*). In the “restricted” consumer distribution, the number of links from each resource can potentially range from 1 to *S* and the mean number of links from each resource is *L*/(*S* − *T*). In general, the problem is to find a discrete distribution on a set of *n* values, here either {0,…,*S*} or {1,…,*S*} but more generally {*x*_{1},…,*x*_{n}}, with mean *μ* that maximizes \( H = -\sum_i p_i \ln p_i \) subject to a set of constraints. This MaxEnt distribution is \( p_i = P(X = x_i) = C e^{\lambda x_i} \) for *i* = 1,…,*n*. The constants *C* and *λ* are determined by the constraints that the probabilities sum to 1 and have mean *μ* (the mean number of links to or from each node): \( \sum_i p_i = 1 \) and \( \sum_i x_i p_i = \mu \) (Jaynes 1957; Cover and Thomas 2006). The derivation, using Lagrange multipliers, is given in the Electronic supplementary material.
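The two constraints can be solved numerically. The following is a minimal sketch, not the author's implementation: it assumes a simple bisection on *λ* (the mean of the distribution increases monotonically with *λ*), and the function name and the example values of `S`, `L`, and `B` are hypothetical.

```python
import math

def maxent_distribution(values, mean, iterations=200):
    """Return (lam, probs) with probs[i] = C * exp(lam * values[i]),
    normalized to sum to 1 and having the requested mean."""
    if not min(values) < mean < max(values):
        raise ValueError("mean must lie strictly inside the support")

    def probs(lam):
        # Subtract the largest exponent before exponentiating to avoid overflow.
        exps = [lam * x for x in values]
        top = max(exps)
        weights = [math.exp(e - top) for e in exps]
        total = sum(weights)
        return [w / total for w in weights]

    def mean_of(lam):
        return sum(x * p for x, p in zip(values, probs(lam)))

    # Expand a bracket around the root, then bisect.
    lo, hi = -1.0, 1.0
    while mean_of(lo) > mean:
        lo *= 2
    while mean_of(hi) < mean:
        hi *= 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_of(mid) < mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, probs(lam)

# Hypothetical web: restricted resource distribution on {1, ..., S}
# with mean L / (S - B).
S, L, B = 50, 200, 5
support = list(range(1, S + 1))
lam, p = maxent_distribution(support, L / (S - B))
print(f"lambda = {lam:.4f}, P(X = 1) = {p[0]:.4f}")
```

For the “all-species” distributions the same function would be called with the support `range(0, S + 1)` and mean `L / S`.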

Finally, I developed a simple model of the undirected (sum of the number of consumer and resource links) distributions by assuming that the number of consumers and resources of each node are independent. Top species have no consumers, so for *T* species, the number of links is drawn from the MaxEnt resource distribution. Similarly, for *B* species, the number of links is drawn from the MaxEnt consumer distribution. For the remaining *S* − *B* − *T* intermediate species, the number of links is the sum of numbers drawn from the consumer and resource distributions.
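One way to read this construction is as a three-component mixture, where the intermediate component is the convolution of the two directed distributions. The sketch below makes that reading explicit; it is an illustration under that assumption, and the toy distributions and species counts are hypothetical rather than taken from any of the 51 webs.

```python
def convolve(p, q):
    """Distribution of the sum of two independent counts.
    p[k] and q[k] are the probabilities of the value k."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def undirected_distribution(p_res, p_con, S, B, T):
    """Mixture over top (T), basal (B), and intermediate (S - B - T) species."""
    both = convolve(p_res, p_con)
    n = len(both)
    res = p_res + [0.0] * (n - len(p_res))  # pad to a common length
    con = p_con + [0.0] * (n - len(p_con))
    intermediate = S - B - T
    return [
        (T * r + B * c + intermediate * b) / S
        for r, c, b in zip(res, con, both)
    ]

# Toy restricted distributions (index = number of links, so index 0 has
# probability 0) and hypothetical species counts.
p_res = [0.0, 0.5, 0.3, 0.2]
p_con = [0.0, 0.6, 0.4]
p_und = undirected_distribution(p_res, p_con, S=10, B=2, T=3)
mean_und = sum(k * pk for k, pk in enumerate(p_und))
print([round(pk, 3) for pk in p_und], round(mean_und, 3))
```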

The consumer, resource, and undirected distributions of the 51 empirical food webs were compared to the MaxEnt distributions derived using the empirical values of *S*, *L*, *B*, and *T*. Two tests of the fit of the MaxEnt models to the empirical data were used. In the first, the likelihood ratio (*G*) statistic (Sokal and Rohlf 1995) is used to compare an observed distribution to some expected (model) distribution. *G* is defined as \( G = 2\sum_i O_i \ln\left( O_i / E_i \right) \) where *O*_{i} is the observed frequency, *E*_{i} the expected frequency, and *i* indexes through all values in the discrete distribution with nonzero expected value. A randomization procedure is used; for each of the 10,000 trials, a sample is drawn from the MaxEnt distribution and its *G* value is compared to the *G* value of the empirical distribution where, in both cases, the expected distribution is the MaxEnt distribution. The goodness of fit, *f*_{G}, is measured by the fraction of trials in which the *G* value of the empirical distribution is greater than the *G* value of the distribution drawn from the MaxEnt distribution. The empirical distribution is considered to be significantly different from the MaxEnt distribution if *f*_{G} > 0.95.
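The randomization procedure can be sketched as follows. This is an illustration only: it assumes that each random sample has the same number of nodes as the empirical web and that empty observed bins contribute zero to *G*. The function names, the toy model distribution, and the toy web are hypothetical.

```python
import math
import random

def g_statistic(observed, expected):
    """G = 2 * sum O_i ln(O_i / E_i) over bins with nonzero expected count.
    Bins with O_i = 0 contribute nothing to the sum."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0 and e > 0
    )

def counts(sample, values):
    """Observed frequency of each value in `values`."""
    index = {v: i for i, v in enumerate(values)}
    out = [0] * len(values)
    for x in sample:
        out[index[x]] += 1
    return out

def goodness_of_fit(observed_degrees, values, probs, trials=10_000, seed=0):
    """Fraction of trials in which the empirical G exceeds the G of a
    sample drawn from the model distribution."""
    rng = random.Random(seed)
    n = len(observed_degrees)
    expected = [n * p for p in probs]
    g_emp = g_statistic(counts(observed_degrees, values), expected)
    exceed = 0
    for _ in range(trials):
        sample = rng.choices(values, weights=probs, k=n)
        if g_emp > g_statistic(counts(sample, values), expected):
            exceed += 1
    return exceed / trials

# Toy model distribution and a toy 16-node "web" that matches it exactly.
values = [1, 2, 3, 4]
probs = [0.5, 0.25, 0.125, 0.125]
web = [1] * 8 + [2] * 4 + [3] * 2 + [4] * 2
f_g = goodness_of_fit(web, values, probs, trials=2000)
print(f"f_G = {f_g:.3f}")
```

A web whose frequencies match the expected frequencies exactly has *G* = 0, so no random sample can have a smaller *G*; a web concentrated on an unlikely value gives *f*_{G} close to 1.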

The goodness of fit, *f*_{G}, does not differentiate between webs with overly broad or narrow degree distributions, a range of variation found in an earlier study of food web degree distributions (Dunne et al. 2002). To measure whether the empirical webs were more broadly or narrowly distributed than the model distributions, I measured the relative width of a distribution \( W = \log\left( \sigma_{\text{O}} / \sigma_{\text{M}} \right) \) where *σ*_{O} is the standard deviation of the observed distribution and *σ*_{M} is the standard deviation of the model distribution. For each empirical web, the distribution of *W* for 10,000 webs drawn from the model distribution was computed. The quantity *W*_{95} is defined as the deviation of the empirical value of *W* from the model median normalized by the width of the upper or lower half of the central interval of the model distribution of *W* at the 95% significance level. This gives the normalized difference in standard deviations of the empirical distribution relative to the median standard deviation of a set of samples drawn from the model distribution and so measures the relative width of the empirical distribution. Webs with *W*_{95} < −1 have distributions that are significantly narrower than the model distributions; *W*_{95} > 1 occurs for distributions significantly broader than the model distributions.
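A possible reading of the *W*_{95} definition is sketched below. Several details are assumptions rather than statements from the text: the central interval is taken to run from the 2.5th to the 97.5th percentile of the simulated *W* values, the observed standard deviation is the population standard deviation, and the logarithm is natural (the base cancels in the ratio). Function names and toy data are hypothetical.

```python
import math
import random
import statistics

def model_sd(values, probs):
    """Standard deviation of the model distribution."""
    mu = sum(v * p for v, p in zip(values, probs))
    return math.sqrt(sum((v - mu) ** 2 * p for v, p in zip(values, probs)))

def relative_width(degrees, sigma_m):
    """W = log(sigma_O / sigma_M); minus infinity if the sample has no spread."""
    sd = statistics.pstdev(degrees)
    return math.log(sd / sigma_m) if sd > 0 else float("-inf")

def w95(observed_degrees, values, probs, trials=10_000, seed=0):
    rng = random.Random(seed)
    n = len(observed_degrees)
    sigma_m = model_sd(values, probs)
    ws = sorted(
        relative_width(rng.choices(values, weights=probs, k=n), sigma_m)
        for _ in range(trials)
    )
    median = statistics.median(ws)
    lower = ws[int(0.025 * (trials - 1))]  # assumed lower edge of the interval
    upper = ws[int(0.975 * (trials - 1))]  # assumed upper edge of the interval
    w_emp = relative_width(observed_degrees, sigma_m)
    # Normalize by the half of the interval on the same side as the deviation.
    half = (upper - median) if w_emp >= median else (median - lower)
    return (w_emp - median) / half

# Toy model distribution on {1, ..., 6} and two toy 40-node webs.
values = [1, 2, 3, 4, 5, 6]
probs = [w / 63 for w in (32, 16, 8, 4, 2, 1)]
narrow_web = [1] * 39 + [2]
broad_web = [1] * 20 + [6] * 20
w95_narrow = w95(narrow_web, values, probs, trials=2000)
w95_broad = w95(broad_web, values, probs, trials=2000)
print(f"narrow: {w95_narrow:.2f}, broad: {w95_broad:.2f}")
```

With this normalization, a value beyond ±1 means the empirical *W* falls outside the assumed central 95% interval of the simulated values.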

## Results

… *f*_{G} > 0.95. Webs with *W*_{95} < −1 or *W*_{95} > 1 are significantly narrower or broader than the model distributions, respectively. The “all-species” models perform consistently worse than the models which are restricted to exclude nodes with zero links. All subsequent results will be for the better-performing “restricted” models which incorporate prior knowledge of the number of top or basal species.

Number and (fraction) of 51 food webs which are not significantly different from the all-species and restricted MaxEnt models and the binomial (random) model based on various criteria

| Criteria | Consumer distribution | Resource distribution |
|---|---|---|
| All-species | | |
| | 25 (0.49) | 21 (0.41) |
| | 28 (0.55) | 41 (0.80) |
| | 21 (0.41) | 20 (0.39) |
| | 36 (0.71) | 28 (0.55) |
| Restricted | | |
| | 28 (0.55) | 42 (0.82) |
| | 31 (0.61) | 40 (0.78) |
| | 23 (0.45) | 35 (0.69) |
| | 39 (0.76) | 47 (0.92) |
| Binomial | | |
| | 1 (0.02) | 4 (0.08) |

In the most conservative evaluation, the restricted consumer and resource distributions are not significantly different from the model distribution at a 95% confidence level if both *f*_{G} < 0.95 and −1 < *W*_{95} < 1. These conditions are satisfied for 23 (45%) and 35 (69%) of the webs, respectively. Thus, there is some asymmetry between the fit of the consumer and resource distributions to their respective MaxEnt distributions (*p* = 0.027, Fisher’s exact test). Many of the poorly fit degree distributions are only marginally significantly different from the MaxEnt model. Of the distributions with *f*_{G} > 0.95, 12 of 23 consumer distributions and four of nine resource distributions’ *f*_{G} fall between 0.95 and 0.99. The table also shows that the random model (Erdős and Rényi 1959) is a very poor predictor of the empirical degree distributions.

In 16 webs, both distributions are well fit by the MaxEnt models; in 19 webs, only the resource distribution is well fit; in seven webs, only the consumer distribution is well fit; and in nine webs, neither distribution is well fit. Fisher’s exact test suggests that the two degree distributions are independent (*p* = 1). Given this result, I created a model for the undirected degree distribution by assuming that each node’s incoming and outgoing links were drawn from independent MaxEnt models. Using the conditions that both *f*_{G} < 0.95 and −1 < *W*_{95} < 1, the undirected degree distributions were well fit in 28 (57%) of the empirical webs. This result is intermediate between the results for the consumer and resource distributions taken separately and further reinforces the idea that the consumer and resource distributions can be treated as independent.
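The independence test can be checked with a small implementation of the two-sided Fisher’s exact test, sketched below using the usual convention of summing the probabilities of all tables that are no more likely than the observed one. The first table uses the counts given in this paragraph. The second table, which compares 23 of 51 well-fit consumer distributions with 35 of 51 well-fit resource distributions, is an assumption about how the asymmetry test was set up.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at most as probable as the observed one
    # (small tolerance for floating-point ties).
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)

# Rows: consumer distribution well fit / not; columns: resource well fit / not.
p_independence = fisher_exact_two_sided(16, 7, 19, 9)

# Assumed layout: rows are consumer and resource distributions,
# columns are well fit / not well fit.
p_asymmetry = fisher_exact_two_sided(23, 28, 35, 16)

print(f"independence: p = {p_independence:.3f}")
print(f"asymmetry:    p = {p_asymmetry:.3f}")
```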

… *W*_{95} of the consumer and resource distributions of the empirical webs against goodness of fit *f*_{G}. Webs with poorly fit consumer and resource distributions (*f*_{G} > 0.95) have a wide range of relative widths, but are generally more broadly spread (positive *W*_{95}) than the MaxEnt model. Webs in Fig. 2 are broken into two groups, the stream webs collected by Thompson and his collaborators (Thompson and Townsend 2003, 2004) and all the other webs. There is a well-defined cluster of Thompson’s stream webs whose consumer degree distributions are relatively broad and poorly fit by the MaxEnt model. The resource distributions of these webs are better fit by the MaxEnt model and, while mostly not significantly different in width from the model webs, they do stand out as having relatively narrow distributions, indicated by their consistently negative values of *W*_{95}.

The goodness of fit (*f*_{G}) and relative width (*W*_{95}) of the resource distribution do not depend on network size (*S*) or mean connectivity (*L*/*S*). There is a weak, marginally significant relationship between the consumer distribution’s *f*_{G} and *S* and a more strongly significant decrease in consumer *W*_{95} with *L*/*S*, as shown in Fig. 3 (details are given in Table S3 of the Electronic supplementary material). This figure also shows the relatively broad consumer distributions of the Thompson stream webs. At higher *L*/*S*, the consumer distributions of the empirical webs tend to be narrower than the MaxEnt model distributions, with a more rapid drop-off at higher link values than predicted by the model. The truncation of the consumer distributions with increasing *L*/*S* is more extreme than the truncation of the MaxEnt distribution at higher *L*/*S* noted earlier. These results, along with the strong correlation between *L*/*S* and *C* in these data, suggest that the truncation of the consumer distribution at higher *L*/*S* drives the truncation of the undirected degree distribution at high *C* noted in an earlier study (Dunne et al. 2002).

… (*f*_{G}) and relative width (*W*_{95}) tests and the mean *f*_{G} and *W*_{95} of the niche model webs. Figure 4c, d shows the same information for the niche model resource distributions. These figures show that, while the niche model resource distributions were always fairly close to the MaxEnt model, the niche model consumer distributions were consistently much more narrowly distributed than the MaxEnt model. As *L*/*S* increases, the empirical webs’ consumer distributions tend to become more narrowly constrained than in the MaxEnt model (Fig. 3), but this trend is far stronger in the niche model (Fig. 4b). Finally, while the niche model resource distribution is reasonably well fit by the MaxEnt model, the fit is consistently worse (higher *f*_{G}) and the distribution is consistently broader (higher *W*_{95}) as the network increases in size (*S*; Fig. 5). No such scale dependence is apparent in the empirical data.

## Discussion

Two important pieces of information characterize the distribution of links in a food web: the total number of links in the system, and hence the mean number of links per species, and the distribution of those links among the species in the food web, here characterized by the various degree distributions. This work does not attempt to explain the mean diet breadth (Beckerman et al. 2006; Petchey et al. 2008) or the number of links per species, but instead addresses the drivers of the distribution about this mean. The relatively close agreement between the degree distributions of the 51 empirical food webs and the MaxEnt models shows that, in many food webs, one does not need to consider detailed ecological processes to be able to predict the consumer, resource, or undirected degree distributions. While many features of food webs are clearly nonrandom and require an ecological explanation, their degree distributions are largely explainable by a simple null model based on statistical rather than ecological theory.

The “all-species” models, which predict the numbers of nodes with zero links (basal and top species in the resource and consumer distributions, respectively), perform consistently worse than the models which are restricted to exclude nodes with zero links. The differences are much larger for resource distributions than for consumer distributions. This suggests that the number of basal species is particularly different from the number predicted by the all-species MaxEnt model, and a biological or methodological basis for their abundance should be sought. Uneven resolution of the basal species due to methodological bias has been noted previously in some of the data used here (Rossberg et al. 2006).

When significant deviations from the MaxEnt distributions occur, other constraints are at work in determining the form of the empirical degree distributions. When this occurs, closer examination of the ecological processes or observational techniques must be carried out to determine what processes are forcing the consumer or resource distribution away from the MaxEnt form.

One example of such a deviation is that the consumer distributions of the empirical webs are often narrower than the MaxEnt model distributions, especially at high *L*/*S* (Fig. 3). These distributions have a shorter tail (Fig. 1c), and so there are fewer taxa with large numbers of consumers than predicted by the MaxEnt model. A possible explanation for this is that the top-down pressure on taxa with large numbers of consumers increases their risk of extinction, and so empirical networks often have fewer highly vulnerable taxa than predicted by the simple MaxEnt null model. This would lead to the frequently observed narrowness of the empirical consumer distributions relative to the MaxEnt distributions.

The opposite pattern of deviation from the MaxEnt distribution is also observed. Previous studies have suggested that some food web degree distributions follow a power law (Dunne et al. 2002; Montoya and Sole 2002), and indeed, some of the data sets examined here have degree distributions that are significantly more fat-tailed than the MaxEnt distribution, which has an exponential cutoff (see, for example, the consumer distribution of the Powder web shown in Fig. 1b).

The Thompson stream webs have consistently broader consumer distributions than the MaxEnt model distributions. These webs comprise the vast majority of the stream webs analyzed, and some features of the ecology of stream habitats might cause this consistent difference in stream food web consumer distributions. It is also possible that the data-gathering techniques used produced food webs that are consistently different from food webs generated using other techniques, as suggested in an earlier study (Stouffer et al. 2007). These webs stand out methodologically, being based on gut content analysis of a relatively small number of individuals of each species, leading to an acknowledged likely undersampling of links (Thompson and Townsend 2003). If rare links tend to be to relatively invulnerable species, increased sampling could make the consumer distributions less broadly distributed by reducing the number of species with very low vulnerability.

The comparison of the degree distributions of niche model food webs with the MaxEnt models shows that there are a number of consistent differences between the degree distributions of the empirical webs and the niche model which need to be addressed by future structural models of food webs. While the niche model resource distributions follow the MaxEnt model fairly closely, its consumer distributions are more narrowly distributed than predicted by the MaxEnt model. This suggests that, while the rules used to assign niche width and, therefore, determine resource distributions are reproducing the empirical data reasonably well, the rules that place these niches on the niche axis are not correct. There is also weak but consistent scale dependence in the fit of the resource distributions, with larger niche model webs having more broadly distributed resource distributions.

Given the methodological variability of the data sets, not only between the Thompson data and the other webs but also across the other webs (Dunne et al. 2004; Stouffer et al. 2007), the degree distributions of complex food webs are remarkably well-described by the simple MaxEnt model presented here. The many questions surrounding data quality mean that it is currently difficult to assess whether deviations from the MaxEnt model are a result of ecological processes or biases in the data.

## Acknowledgments

Thanks to Jen Dunne and Ross Thompson for generously sharing their data sets and to Jen Dunne, Drew Purves, Daniel Stouffer, and Carlos Melián for the helpful discussions and comments on earlier versions of this paper.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.