How optimal foragers should respond to habitat changes: a reanalysis of the Marginal Value Theorem

The Marginal Value Theorem (MVT) is a cornerstone of biological theory. It connects the quality and distribution of patches in a fragmented habitat to the optimal time an individual should spend exploiting them, and thus its optimal rate of movement. However, predictions regarding how habitat alterations should impact optimal strategies have remained elusive, with heavy reliance on graphical arguments. Here we derive the sensitivity of realized fitness and optimal residence times to general habitat attributes, for homogeneous and heterogeneous habitats, retaining the level of generality of the MVT. We provide new predictions on how altering travel times, patch qualities and/or relative abundances should affect optimal strategies, and study the consequences of habitat heterogeneity. We show that knowledge of average characteristics is in general not sufficient to predict the change in the average rate of movement. We apply our results to examine the conditions under which the optimal strategies are invariant to scaling. We prove a previously conjectured form of invariance in homogeneous habitats, but show that invariances to scaling are not generic in heterogeneous habitats. We also consider the relative exploitation of patches that differ in quality, clarifying the conditions under which it is adaptive to stay longer on poorer patches.


Introduction
The Marginal Value Theorem (MVT) is an important and popular tenet of biological theory (Stephens and Krebs 1986), combining high generality and a relatively simple mathematical formulation. When resources are distributed as discrete patches throughout the habitat, the MVT predicts how long an individual should spend exploiting each patch before moving to another, depending on the kinetics of fitness accumulation within patches, and on the time it takes to move between patches (the travel time; Charnov 1976). This question has many applications in evolutionary biology, and beyond (Hayden et al. 2011; Rijnsdorp et al. 2011). The MVT for instance provides a framework to understand the optimal duration of copulation for males (Parker and Stuart 1976), the evolution of animal migration (Baker 1978), clutch-size (Wilson and Lessells 1994), foraging strategies across a broad range of taxa (Danchin et al. 2008), lysis time for bacteriolytic viruses (Bull et al. 2004), or the expected duration of interactions for cooperative cleaner fish (Bshary et al. 2008). In fragmented landscapes, the MVT gives a rationale to determine when individuals should start dispersing (Poethke and Hovestadt 2002), and yields quantitative predictions on the expected rate of movement throughout a habitat (Belisle 2005; Bowler and Benton 2005).
A key question is how optimal strategies should compare between patches or habitats that differ in quality (Stephens and Krebs 1986). However, this is not directly addressed by the MVT. Charnov's 1976 seminal article established the existence of, and characterized, the optimal residence time on each patch, such that the long term average rate of gain, taken to be a predictor of fitness, is maximized. Yet, computing the optimal residence times requires specifying a specific functional form for the accumulation of gains in patches, and, even so, it is usually impossible to solve the equations analytically. This is at best feasible for some simple functions in homogeneous habitats (i.e. if all patches are identical; Stephens and Krebs 1986) or using tractable approximations (Parker and Stuart 1976;McNair 1982;Stephens and Dunbar 1993;Charnov and Parker 1995;Ranta et al. 1995). These difficulties seriously complicate the investigation of how optimal residence times vary with habitat characteristics (Sih 1980;Stephens and Krebs 1986;Charnov and Parker 1995). As an alternative, graphical methods have proven very intuitive and can accommodate arbitrary gain functions (Parker and Stuart 1976), so that even today most discussions of the MVT rely on graphical arguments (e.g. Danchin et al. 2008). But this is not without caveats.
First, the graphical argument is restricted to homogeneous habitats, limiting the scope for predictions in heterogeneous habitats (Stephens and Krebs 1986). Second, the generality and robustness of conclusions is hard to assess, which has sustained some confusion in the literature. For instance, it is commonly claimed, and tested experimentally, that, under the MVT, residence times should be higher on better patches in a given habitat (e.g. Kelly 1990;Wajnberg et al. 2000), or that residence time should increase with patch quality (e.g. Riechert and Gillespie 1986;Astrom et al. 1990;Alonso et al. 1994;Tenhumberg et al. 2001;Corley et al. 2010;Rijnsdorp et al. 2011). However, theoretical investigations of different particular ways to alter patch quality have yielded variable predictions (Sih 1980;Charnov and Parker 1995;Ranta et al. 1995;Danchin et al. 2008). For example, from some simple gain functions, it has been argued that scaling the gain function vertically (a natural way to make a patch better) leaves the optimal residence time unchanged (Charnov and Parker 1995;Ranta et al. 1995;Livoreil and Giraldeau 1997). Even one of the most basic predictions attributed to the MVT, that increasing travel time should increase optimal residence time, may not hold in all generality (Stephens and Krebs 1986). This is a concern, since such predictions are often used as a basis to evaluate the theory (e.g. Nonacs 2001;Wajnberg et al. 2006;Hayden et al. 2011).
In this article, we propose to derive general analytical predictions on the impact of varying habitat attributes under the MVT. By using sensitivity analysis on the implicit definition of optimal strategies, we do not have to specify specific functional forms and thus retain the original generality of the Theorem. This will allow us to refine and clarify existing predictions, and to generate novel predictions. In particular, our approach can deal with the arguably more general case of heterogeneous habitats, allowing for a systematic analysis of the consequences of habitat heterogeneity. We will use our results to reanalyze the main predictions attributed to the MVT, in particular the effect of varying travel time, the consequences of improving quality, the invariance of the optimal strategies upon vertical and horizontal scalings, and the relative time individuals should spend on patches of different qualities.

The Marginal Value Theorem
Consider an individual foraging over many discrete patches that are encountered sequentially, with characteristics drawn randomly from a stationary distribution. Let there be $s$ different types of patches, each with relative frequency $p_i$. Let function $F_i(t)$ be the cumulated gain of an individual that exploits a patch of type $i$ for $t$ time units. Functions $F_i$ should represent net expected gains, discounting costs (Stephens and Krebs 1986; Brown 1988). They must be positive, increasing, and concave for at least some $t$ in order to yield a fitness maximum (Charnov 1976). Let $T_i$ be the travel time it takes to find and move to a patch of type $i$, allowing the possibility for some patches to be more accessible than others.
In a homogeneous habitat, $F_i = F$ and $T_i = T$ for all $i$. The MVT states that an individual should leave a patch after $t^*$ time units, as defined by

$$\frac{dF(t^*)}{dt} = \frac{F(t^*)}{T + t^*} \tag{1}$$

Both sides are then equal to $E_n^*$, the long term average rate of gain in the habitat, which effectively represents fitness and is maximized at the optimal residence time $t^*$ (Charnov 1976). Equation (1) has a well-known graphical solution (Fig. 1).

Fig. 1 Graphical interpretation of the MVT. a In homogeneous habitats, (4) can be solved for $t^*$ by constructing the line tangential to $F$ as shown. The resulting line has slope $E_n^*$, the realized fitness. b In heterogeneous habitats the graphical construct does not work to solve (5). If $E_n^*$ is known, the optimal residence times on each patch-type can be determined by constructing lines tangential to the gain functions with slope $E_n^*$. Here there are three patch-types and the first is not effectively exploited, i.e. $\Omega = \{2, 3\}$. In this case patch-type 3 has higher quality than patch-type 2, and $t_3^* > t_2^*$
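The homogeneous condition (marginal rate of gain equal to the average rate $F(t)/(T+t)$) can also be solved numerically rather than graphically. The following is a minimal sketch, not part of the original analysis: the negative-exponential gain function, the parameter values and the helper names `bisect` and `mvt_homogeneous` are all illustrative assumptions.

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid   # root lies in the upper half
        else:
            hi = mid              # root lies in the lower half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def mvt_homogeneous(F, dF, T, t_max=1e3):
    """Return (t*, E*) where t* solves dF(t) = F(t) / (T + t)."""
    h = lambda t: dF(t) * (T + t) - F(t)   # positive before t*, negative after
    t_star = bisect(h, 1e-9, t_max)
    return t_star, dF(t_star)

# Assumed example: F(t) = mu * (1 - exp(-lam * t)) with unit parameters.
mu, lam, T = 1.0, 1.0, 1.0
F = lambda t: mu * (1.0 - math.exp(-lam * t))
dF = lambda t: mu * lam * math.exp(-lam * t)

t_star, E_star = mvt_homogeneous(F, dF, T)
print(f"t* = {t_star:.4f}, E* = {E_star:.4f}")
```

For this particular gain function the optimality condition reduces to $e^{t^*} = 2 + t^*$, which gives an independent check on the numerical root.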
In heterogeneous habitats, the MVT states that one or more patch-types (whose indices make up the set $\Omega$) should be exploited, while others should be left as soon as entered. We denote the average value of quantity $y$ over the habitat as

$$\langle y_j \rangle = \sum_{j=1}^{s} p_j y_j \tag{2}$$

In some contexts the average should be over the exploited patches only. This will be made clear with a subscript $\Omega$: $\langle y_j \rangle_\Omega = \sum_{j\in\Omega} p_j y_j \,/ \sum_{j\in\Omega} p_j$.
The optimal residence times are then defined by

$$\frac{dF_i(t_i^*)}{dt_i} = \frac{\langle F_j(t_j^*)\rangle}{\langle T_j + t_j^*\rangle} \ \text{ for all } i \in \Omega, \qquad t_i^* = 0 \ \text{ for all } i \notin \Omega \tag{3}$$

For exploited patch-types, both sides of (3) are again equal to $E_n^*$, the fitness of an optimal individual in this habitat. Set $\Omega$ is determined as the set that satisfies (3) while resulting in the highest value of $E_n^*$ (Charnov 1976; Stephens and Krebs 1986). There is no graphical solution in this case, although, once $E_n^*$ has been determined, one can still deduce the optimal residence times on each patch-type (Fig. 1).
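The heterogeneous optimum can be found numerically by searching for the rate $E$ that is consistent with itself: each patch-type is exploited until its marginal gain drops to $E$ (or skipped if it never exceeds $E$), and $E$ must equal average gains over average time. The sketch below is illustrative only. It assumes gain functions of the form $F_i(t) = x_i(1 - e^{-\lambda t})$ with $F_i(0) = 0$, a common travel time, and hypothetical parameter values; `solve_mvt` is not a function from the original article.

```python
import math

def solve_mvt(x, p, T, lam=1.0):
    """Return (E*, [t_i*]) for assumed gains F_i(t) = x_i * (1 - exp(-lam * t))."""
    def times(E):
        # Marginal condition x_i * lam * exp(-lam * t) = E; t = 0 if never met.
        return [max(0.0, math.log(xi * lam / E) / lam) for xi in x]

    def g(E):
        # Average gains minus E times average (travel + residence) time.
        t = times(E)
        gain = sum(pi * xi * (1.0 - math.exp(-lam * ti))
                   for pi, xi, ti in zip(p, x, t))
        time = sum(pi * (T + ti) for pi, ti in zip(p, t))
        return gain - E * time

    # g decreases with E, is positive near 0 and negative at max(x) * lam.
    lo, hi = 1e-12, max(x) * lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    E = 0.5 * (lo + hi)
    return E, times(E)

# Assumed three patch-type habitat, equal frequencies, travel time 1.
x = [0.2, 1.0, 3.0]
p = [1 / 3, 1 / 3, 1 / 3]
E_star, t_star = solve_mvt(x, p, T=1.0)
exploited = [i + 1 for i, t in enumerate(t_star) if t > 0]
print("E* =", round(E_star, 4), "t* =", [round(t, 4) for t in t_star],
      "exploited:", exploited)
```

With these assumed values the poorest patch-type is left as soon as entered and the richest is exploited longest, which mirrors the situation sketched in Fig. 1b.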
In order to determine the consequences of changing habitat characteristics, we introduce an indicator variable $x$ that represents some relevant attribute of patches. Different attributes (e.g. patch size, nutritional value...) can be relevant depending on context (Charnov and Parker 1995; Rita et al. 1997). Attributes of interest would typically impact the shape of the gain function (McNair 1982) and/or travel time (Charnov and Parker 1995). In this context, the homogeneous MVT Eq. (1) can be expressed as

$$\frac{\partial F(x,t^*)}{\partial t} = \frac{F(x,t^*)}{T(x) + t^*} \tag{4}$$

evaluated at $x = x_0$ and $t^* = t^*(x_0)$, and the heterogeneous MVT Eq. (3) as

$$\frac{\partial F_i(x_i,t_i^*)}{\partial t_i} = \frac{\langle F_j(x_j,t_j^*)\rangle}{\langle T_j(x_j) + t_j^*\rangle} \ \text{ for all exploited patch-types } i \in \Omega \tag{5}$$

evaluated at $x_0 = (x_{01}, \dots, x_{0j}, \dots, x_{0s})$ and $t^*(x_0)$. For generality, all functions $F$ and $T$ in (4) and (5) will be assumed to be sufficiently smooth in their arguments. We will also assume that there exists only one MVT optimum in a given habitat. We will study the consequences of slightly varying the $x_0$ values on the MVT optimum defined from (4)/(5). In order to reduce clutter, we will simply recall that expressions are evaluated at the MVT optimum by noting $t_i^*$ in lieu of $t_i$.

Realized fitness, or what is quality under the MVT
The notion of quality is seldom made precise in the context of the MVT. Quality is sometimes equated with accessibility or connectivity (Thompson and Fedak 2001; Belisle 2005; Nolet and Klaassen 2009), so that higher quality implies shorter travel time. On the other hand, better patches are often considered to be those with more resources, and hence higher gains. However, there is no unique way to 'improve' a gain function. In this article, we remark that for an optimal forager, an objective measure of habitat quality is the realized fitness $E_n^*$, i.e. the long-term rate of gain it extracts from its habitat. Hence, we consider that any alteration of the habitat corresponds to improving quality if it increases the realized fitness $E_n^*$. In particular, regarding the choice of $x_i$:
Definition 1 In a given habitat, a patch-attribute $x_i$ is called a metric of quality if and only if $\partial E_n^*/\partial x_i > 0$.
We now proceed to compute $\partial E_n^*/\partial x_i$ from (5), in order to clarify which sorts of patch alterations result in improved quality.

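Proposition 1 A patch attribute $x_i$ is a metric of quality (Definition 1) if and only if

$$\frac{\partial \ln \langle F_j(x_j,t_j^*)\rangle}{\partial x_i} - \frac{\langle T_j(x_j)\rangle}{\langle T_j(x_j) + t_j^*\rangle}\,\frac{\partial \ln \langle T_j(x_j)\rangle}{\partial x_i} > 0$$

where the partial derivatives are taken with residence times held at their optimal values.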
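Proof From the expression of the realized fitness in (5), $E_n^* = \langle F_j(x_j,t_j^*)\rangle / \langle T_j(x_j) + t_j^*\rangle$, we see that it is affected by $x_i$ directly through the effect on $F_j$ and $T_j$, and indirectly through the effect on $t^*$. Writing $E_n$ for the long term rate of gain regarded as a function of attributes and residence times, we can thus express the variation of $E_n^*$ as

$$\frac{\partial E_n^*}{\partial x_i} = \left.\frac{\partial E_n}{\partial x_i}\right|_{t^*} + \sum_{l} \left.\frac{\partial E_n}{\partial t_l}\right|_{t^*} \frac{\partial t_l^*}{\partial x_i}$$

Each derivative with respect to $t_l$ (second term on the r.h.s.) is zero, as the long term average rate of gain $E_n^*$ is maximized at $t^*(x_0)$ under the MVT. Hence, all terms involving variations of the optimal residence times vanish. Expanding the remaining terms yields:

$$\frac{\partial E_n^*}{\partial x_i} = \frac{p_i}{\langle T_j + t_j^*\rangle}\frac{\partial F_i(x_i,t_i^*)}{\partial x_i} - \frac{p_i \langle F_j(x_j,t_j^*)\rangle}{\langle T_j + t_j^*\rangle^2}\frac{dT_i(x_i)}{dx_i}$$

Remembering the expression of $E_n^*$ (from (5)), this simplifies as:

$$\frac{\partial E_n^*}{\partial x_i} = \frac{p_i}{\langle T_j + t_j^*\rangle}\left(\frac{\partial F_i(x_i,t_i^*)}{\partial x_i} - E_n^* \frac{dT_i(x_i)}{dx_i}\right) \tag{7}$$

We divide both sides by $E_n^*$, and remark that

$$p_i \frac{\partial F_i(x_i,t_i^*)}{\partial x_i} = \frac{\partial \langle F_j(x_j,t_j^*)\rangle}{\partial x_i} \quad \text{and} \quad p_i \frac{dT_i(x_i)}{dx_i} = \frac{\partial \langle T_j(x_j)\rangle}{\partial x_i}$$

which, upon taking the logarithms, yields the relative variation of $E_n^*$ in terms of the relative variation of average quantities:

$$\frac{\partial \ln E_n^*}{\partial x_i} = \frac{\partial \ln \langle F_j(x_j,t_j^*)\rangle}{\partial x_i} - \frac{\langle T_j\rangle}{\langle T_j + t_j^*\rangle}\,\frac{\partial \ln \langle T_j(x_j)\rangle}{\partial x_i} \tag{8}$$

Requiring this to be positive yields Proposition 1.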
Equation (8) states that the variation of realized fitness only depends on the relative variations of average absolute gains (first term) and of average travel time (second term). Changing the time-derivative of the fitness function (i.e. the instantaneous rate of gain) has no direct impact on fitness; only absolute gains matter. It is therefore unduly restrictive to assume better patches have steeper slopes with respect to time, as in some earlier analyses (e.g. McNair 1982). The slope of the fitness functions might vary arbitrarily with quality, as will prove important throughout this article.
We also remark that the relative variation of average travel time is weighted by $\langle T_j\rangle / \langle T_j + t_j^*\rangle$. Since this represents the proportion of time an individual spends traveling between patches, it is necessarily smaller than one. Hence, a relative increase in average travel time does not compensate for a similar relative increase in the average gains. In other words, travel time has comparatively less impact than the gain function.
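The statement that only direct effects on gains and travel time matter, with travel time weighted by the fraction of time spent traveling, can be checked numerically in a simple homogeneous case. The sketch below is an illustration under stated assumptions: gains are $F(x,t) = x(1 - e^{-t})$ and travel time is $T(x) = T_0/x$ (a hypothetical choice in which richer patches are also closer). For that choice the predicted relative variation of realized fitness is $(1/x)\,[1 + T/(T + t^*)]$, which is compared with a finite-difference derivative that lets $t^*$ re-optimize.

```python
import math

def t_star(T):
    """Optimal residence time for F(t) = x * (1 - exp(-t)).

    The optimality condition reduces to exp(t) = 1 + T + t (x cancels out).
    """
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.exp(mid) - 1.0 - T - mid < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def E_star(x, T0=1.0):
    """Realized fitness at the optimum, with assumed travel time T(x) = T0 / x."""
    T = T0 / x
    return x * math.exp(-t_star(T))   # marginal rate of gain at t*

x0, h = 2.0, 1e-5
# Finite-difference estimate of d ln E*/dx, with t* free to adjust.
fd = (math.log(E_star(x0 + h)) - math.log(E_star(x0 - h))) / (2 * h)

# Prediction from direct effects only: d ln F/dx = 1/x, d ln T/dx = -1/x.
T = 1.0 / x0
pred = (1.0 / x0) * (1.0 + T / (T + t_star(T)))
print(f"finite difference: {fd:.6f}, predicted: {pred:.6f}")
```

The agreement of the two numbers reflects the fact that variations of the optimal residence time do not contribute to first order.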

Homogeneous habitats
We now show that, in a homogeneous habitat, the effect of varying a patch-attribute x depends on how this changes the time-derivative of F, its height, and travel time. We have the following theorem:

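Theorem 1 Increasing $x$ increases $t^*$ if and only if

$$\frac{\partial \ln\left(\partial F(x,t^*)/\partial t\right)}{\partial x} > \frac{\partial \ln F(x,t^*)}{\partial x} - \frac{T(x)}{T(x) + t^*}\,\frac{\partial \ln T(x)}{\partial x} \tag{9}$$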
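Proof Since the MVT holds irrespective of habitat quality, (4) remains true if both sides are differentiated with respect to $x$, which yields:

$$\frac{\partial^2 F(x,t^*)}{\partial x\,\partial t} + \frac{\partial^2 F(x,t^*)}{\partial t^2}\,\frac{\partial t^*}{\partial x} = \frac{\partial E_n^*}{\partial x}$$

Isolating the derivative of interest:

$$\frac{\partial t^*}{\partial x} = -\frac{1}{\partial^2 F(x,t^*)/\partial t^2}\left(\frac{\partial^2 F(x,t^*)}{\partial x\,\partial t} - \frac{\partial E_n^*}{\partial x}\right)$$

The two terms in the parenthesis can be turned into relative variations by dividing them by $E_n^* = \partial F(x,t^*)/\partial t$:

$$\frac{\partial t^*}{\partial x} = -\frac{E_n^*}{\partial^2 F(x,t^*)/\partial t^2}\left(\frac{\partial \ln\left(\partial F(x,t^*)/\partial t\right)}{\partial x} - \frac{\partial \ln E_n^*}{\partial x}\right) \tag{11}$$

Since function $F$ is concave at an MVT optimum, $\partial^2 F(x,t^*)/\partial t^2 < 0$ and the sign of variation of $t^*$ is that of the parenthesis. Replacing $\partial \ln E_n^*/\partial x$ with its value from (8) (evaluated in the homogeneous case) concludes the proof.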
As was the case for realized fitness (8), travel time has relatively less impact on optimal residence time than the two attributes of the gain function. This follows directly from (9) in which the relative variation of travel time is weighted down by $T(x)/(T(x) + t^*) < 1$.
One consequence of Theorem 1 is that an increase in quality may increase the optimal residence time only if it sufficiently increases the slope of the gain function. This follows directly from (11), since $\partial \ln E_n^*/\partial x$ is positive for any metric of quality $x$ (Definition 1).

Heterogeneous habitats
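In heterogeneous habitats, the optimal residence time on patch-type $i$ is affected not only by the attribute of patch-type $i$, but also by the attributes of all other patches. We have the following result:
Theorem 2 In a heterogeneous habitat, for any $i \in \{1, \dots, s\}$ and any exploited patch-type $k \in \Omega$, increasing $x_i$ increases $t_k^*$ if and only if

$$\frac{\partial \ln\left(\partial F_k(x_k,t_k^*)/\partial t_k\right)}{\partial x_i} - \frac{\partial \ln \langle F_j(x_j,t_j^*)\rangle}{\partial x_i} + \frac{\langle T_j\rangle}{\langle T_j + t_j^*\rangle}\,\frac{\partial \ln \langle T_j(x_j)\rangle}{\partial x_i} > 0$$

where the first term vanishes when $k \neq i$, since $F_k$ depends on $x_k$ only.
Proof For any patch-type $m$ not in $\Omega$, $t_m^* = 0$ and, generically, it does not vary with $x$, i.e. $\partial t_m^*/\partial x_i = 0$ for all $i$. Let us consider the variation of $t_k^*$, $k \in \Omega$, with respect to the attribute of some patch-type $i$. We use Eq. (5), replacing $i$ with $k$, and differentiate both sides with respect to $x_i$, to get

$$\frac{\partial^2 F_k(x_k,t_k^*)}{\partial x_i\,\partial t_k} + \frac{\partial^2 F_k(x_k,t_k^*)}{\partial t_k^2}\,\frac{\partial t_k^*}{\partial x_i} = \frac{\partial E_n^*}{\partial x_i}$$

The same rearrangements as above yield:

$$\frac{\partial t_k^*}{\partial x_i} = -\frac{E_n^*}{\partial^2 F_k(x_k,t_k^*)/\partial t_k^2}\left(\frac{\partial \ln\left(\partial F_k(x_k,t_k^*)/\partial t_k\right)}{\partial x_i} - \frac{\partial \ln E_n^*}{\partial x_i}\right) \tag{13}$$

Replacing $\partial \ln E_n^*/\partial x_i$ with its value from Eq. (8), we get the condition for $t_k^*$ to increase with $x_i$ as expressed in Theorem 2.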
Corollary 1 For any $i \in \{1, \dots, s\}$ and $k \in \Omega$, $k \neq i$, if $x_i$ is a metric of quality, increasing $x_i$ decreases $t_k^*$.
Proof For $k \neq i$, the first term in the parenthesis of (13) vanishes, so that $\partial t_k^*/\partial x_i$ has the sign of $-\partial \ln E_n^*/\partial x_i$, which is negative for a metric of quality (Definition 1).
We remark that, in the absence of further assumptions, Equation (13) includes the homogeneous case studied in the previous section as a special case. It is thus insightful to compare the value of $\partial t_i^*/\partial x_i$, for one patch-type $i$, depending on whether the habitat is homogeneous or heterogeneous. For this, we consider as known all quantities observable at the patch level, i.e. $t_i^*$ (and thus $E_n^*$), $F_i(x_{0i}, t)$ and, if relevant, $T_i(x_{0i})$, but leave the habitat context ($p_i$ and the attributes of other patches) unspecified. When considering attributes of quality, this yields the following proposition:

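Proposition 2 In a habitat of quality $E_n^*$, the variation of $t_i^*$ with a quality metric $x_i$ is always greater if the habitat is heterogeneous rather than homogeneous. In heterogeneous habitats, $\partial t_i^*/\partial x_i$ decreases with $p_i$ and with the gains $F_i(x_i,t_i^*)$ relative to the habitat average.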
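Proof Consider a habitat of quality $E_n^*$ and a focal patch-type $i$ with attributes $F_i(x_{0i}, t)$ and $T_i(x_{0i})$, so that $t_i^*$ is fixed. If the habitat is homogeneous, $\partial t_i^*/\partial x_i$ is given by (11) applied to patch-type $i$, whereas if it is heterogeneous, $\partial t_i^*/\partial x_i$ is given by (13). The two equations are almost identical, differing only in the relative variation of $E_n^*$. In the heterogeneous case, the latter is:

$$\frac{\partial \ln E_n^*}{\partial x_i} = \frac{p_i}{\langle F_j(x_j,t_j^*)\rangle}\frac{\partial F_i(x_i,t_i^*)}{\partial x_i} - \frac{p_i}{\langle T_j + t_j^*\rangle}\frac{dT_i(x_i)}{dx_i}$$

which can be rewritten as

$$\frac{\partial \ln E_n^*}{\partial x_i} = \frac{p_i F_i(x_i,t_i^*)}{\langle F_j(x_j,t_j^*)\rangle}\left(\frac{\partial \ln F_i(x_i,t_i^*)}{\partial x_i} - \frac{\langle F_j(x_j,t_j^*)\rangle}{F_i(x_i,t_i^*)\,\langle T_j + t_j^*\rangle}\frac{dT_i(x_i)}{dx_i}\right)$$

For $E_n^*$ to be the same in the homogeneous and heterogeneous cases, we must have $\langle F_j(x_j,t_j^*)\rangle/\langle T_j + t_j^*\rangle = F_i(x_i,t_i^*)/(T_i + t_i^*)$, so that

$$\frac{\partial \ln E_n^*}{\partial x_i} = \frac{p_i F_i(x_i,t_i^*)}{\langle F_j(x_j,t_j^*)\rangle}\left(\frac{\partial \ln F_i(x_i,t_i^*)}{\partial x_i} - \frac{T_i}{T_i + t_i^*}\,\frac{\partial \ln T_i(x_i)}{\partial x_i}\right)$$

This is the same as in a homogeneous habitat (the value entering Eq. 11), multiplied by $p_i F_i(x_i,t_i^*)/\langle F_j(x_j,t_j^*)\rangle \leq 1$. Hence, for a quality metric, $\partial t_i^*/\partial x_i$ from (13) is no smaller than from (11), with equality in the homogeneous case. The difference decreases in proportion of $p_i$, and in inverse proportion of $\langle F_j(x_j,t_j^*)\rangle$, concluding the proof.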
Intuitively, Proposition 2 means that the habitat acts as a diluting factor, buffering the impact of patch attributes on the overall habitat quality $E_n^*$. The greater the contribution of patch-type $i$ to the overall quality, i.e. the higher $p_i$ and $F_i(x_i,t_i^*)$, the greater the variation of $E_n^*$ with $x_i$, which feeds back negatively on $t_i^*$. Homogeneous habitats represent an ideal case where the feedback of $E_n^*$ on $t_i^*$ has full intensity, maximizing the chances of having a negative $\partial t_i^*/\partial x_i$.

Average residence time
Comparing equations (11) and (13) helped evaluate the consequences of habitat heterogeneity from the perspective of a focal patch-type. From a whole-habitat perspective, a more meaningful comparison is between the behavior of $t^*$ in the homogeneous case and that of $\langle t_j^*\rangle$ in the heterogeneous case. Indeed, $t^*$ and $\langle t_j^*\rangle$ both capture the global rate of movement throughout the habitat. One question of interest is whether heterogeneous habitats behave on average just like an average homogeneous habitat, so that one might just plug average quantities into Eq. (9), or whether heterogeneity changes things qualitatively.
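Theorem 3 In a heterogeneous habitat, for any $i \in \{1, \dots, s\}$, increasing $x_i$ increases the average residence time $\langle t_j^*\rangle$ if and only if

$$-\frac{1}{\partial^2 F_i(x_i,t_i^*)/\partial t_i^2}\,\frac{\partial \ln \langle \partial F_j(x_j,t_j^*)/\partial t_j\rangle_\Omega}{\partial x_i} + \left\langle \frac{1}{\partial^2 F_j(x_j,t_j^*)/\partial t_j^2}\right\rangle_\Omega \frac{\partial \ln E_n^*}{\partial x_i} > 0$$

with $\partial \ln E_n^*/\partial x_i$ given by (8).
Proof We compute the variation of average residence time with $x_i$:

$$\frac{\partial \langle t_j^*\rangle}{\partial x_i} = \sum_{k=1}^{s} p_k \frac{\partial t_k^*}{\partial x_i}$$

For any patch $k$ not in $\Omega$, $\partial t_k^*/\partial x_i = 0$, and using (13) for the others, we get

$$\frac{\partial \langle t_j^*\rangle}{\partial x_i} = -E_n^*\left[\frac{p_i}{\partial^2 F_i(x_i,t_i^*)/\partial t_i^2}\,\frac{\partial \ln\left(\partial F_i(x_i,t_i^*)/\partial t_i\right)}{\partial x_i} - \sum_{k\in\Omega}\frac{p_k}{\partial^2 F_k(x_k,t_k^*)/\partial t_k^2}\,\frac{\partial \ln E_n^*}{\partial x_i}\right] \tag{15}$$

the first term being absent if $i \notin \Omega$. Here we remark that, from (2) and the definition of $\langle y_j\rangle_\Omega$,

$$\sum_{k\in\Omega} p_k y_k = \Big(\sum_{j\in\Omega} p_j\Big)\langle y_j\rangle_\Omega$$

Thus, we have

$$\sum_{k\in\Omega}\frac{p_k}{\partial^2 F_k(x_k,t_k^*)/\partial t_k^2} = \Big(\sum_{j\in\Omega} p_j\Big)\left\langle \frac{1}{\partial^2 F_j(x_j,t_j^*)/\partial t_j^2}\right\rangle_\Omega$$

Also, since all time-derivatives over $\Omega$ are equal to $E_n^*$, we can write

$$p_i\,\frac{\partial \ln\left(\partial F_i(x_i,t_i^*)/\partial t_i\right)}{\partial x_i} = \Big(\sum_{j\in\Omega} p_j\Big)\frac{\partial \ln \langle \partial F_j(x_j,t_j^*)/\partial t_j\rangle_\Omega}{\partial x_i}$$

Using these in Eq. (15) yields:

$$\frac{\partial \langle t_j^*\rangle}{\partial x_i} = E_n^*\Big(\sum_{j\in\Omega} p_j\Big)\left[-\frac{1}{\partial^2 F_i(x_i,t_i^*)/\partial t_i^2}\,\frac{\partial \ln \langle \partial F_j(x_j,t_j^*)/\partial t_j\rangle_\Omega}{\partial x_i} + \left\langle \frac{1}{\partial^2 F_j(x_j,t_j^*)/\partial t_j^2}\right\rangle_\Omega \frac{\partial \ln E_n^*}{\partial x_i}\right] \tag{16}$$

Requiring this to be positive establishes the theorem.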
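Unlike in a homogeneous habitat, one cannot in general obtain a criterion for the sign of $\partial \langle t_j^*\rangle/\partial x_i$ that does not require estimating second time-derivatives. Indeed, the first term in square brackets in (16) is weighted by the inverse of the second time-derivative of $F_i$, whereas the second is weighted by

$$\left\langle \frac{1}{\partial^2 F_j(x_j,t_j^*)/\partial t_j^2}\right\rangle_\Omega = -\frac{1}{H}, \qquad H = \left\langle \left(-\frac{\partial^2 F_j(x_j,t_j^*)}{\partial t_j^2}\right)^{-1}\right\rangle_\Omega^{-1} \tag{17}$$

where $H$ is the harmonic (rather than arithmetic) mean, over exploited patches, of the absolute second time-derivatives. Since, at the optimal residence times, all time-derivatives in $\Omega$ are equal to $E_n^*$, second time-derivatives are effectively proportional to the curvatures w.r.t. time. The harmonic mean (17) is thus the appropriate way to average curvatures in the context of the MVT.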
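Curvatures can be disregarded if all are identical at the MVT optimum, or if the manipulated patch-type $i$ has exactly average curvature. These may seem very contrived situations, but we will encounter an occurrence of the former in Sect. 5.3, which is also found in a broader context of biological relevance (Calcagno et al. in prep.). In these circumstances, minus the harmonic mean is equal to the second time-derivative of $F_i$ so that both can be factored out in (16), yielding the condition for $\partial \langle t_j^*\rangle/\partial x_i$ to be positive as:

$$\frac{\partial \ln \langle \partial F_j(x_j,t_j^*)/\partial t_j\rangle_\Omega}{\partial x_i} - \frac{\partial \ln \langle F_j(x_j,t_j^*)\rangle}{\partial x_i} + \frac{\langle T_j\rangle}{\langle T_j + t_j^*\rangle}\,\frac{\partial \ln \langle T_j(x_j)\rangle}{\partial x_i} > 0 \tag{18}$$

This is equivalent to criterion (9), stated in terms of habitat-level averages (the first only covering $\Omega$).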
In general, however, a given change of average habitat characteristics might have contrasting impacts on the average optimal residence time, depending on the distribution of second time-derivatives, and on which patch-type is altered. If $x_i$ is a metric of quality, Theorem 3 implies that $\langle t_j^*\rangle$ might increase with $x_i$ only if $\partial^2 F_i(x_i,t_i^*)/\partial x_i \partial t_i$ is sufficiently positive. As the latter impacts $\langle t_j^*\rangle$ in inverse proportion of the second time-derivative $\partial^2 F_i(x_i,t_i^*)/\partial t_i^2$, it follows that an increase of $\langle t_j^*\rangle$ is more likely, all else equal, when altering patch-types whose gain functions are relatively less curved.

Manipulating travel time
A graphical argument (corresponding to pushing −T to the right in Fig. 1) is often used to predict that decreasing the travel time should shorten the optimal residence time, and thus increase movement (e.g. Danchin et al. 2008). This is possibly the simplest and most often tested prediction attributed to the MVT (Nonacs 2001;Hayden et al. 2011). However, the graphical argument works only for homogeneous habitats, and assumes that the gain functions are not affected by changes in travel time. This is not the case if there are costs associated with traveling between patches (e.g. energetic locomotory costs). These make the net gain function vary with T , so that the argument cannot be relied on (Stephens and Krebs 1986). We use our results to address this issue formally.
Consider that the cost of moving between patches is given by an increasing function of travel time $C_T(T)$, while foraging costs in a patch are given by an increasing function $C_F(t)$. We thus have to consider the class of gain functions

$$F_i(x_i,t) = F_{0i}(t) - C_F(t) - C_T(T_i(x_i)) \tag{19}$$

where $F_{0i}(t)$ is some function representing the gross gains in patch-type $i$.
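In this context,

$$\frac{\partial F_i(x_i,t_i^*)}{\partial x_i} = -\frac{dC_T(T_i)}{dT}\,\frac{dT_i(x_i)}{dx_i}$$

so that, from Eq. (7):

$$\frac{\partial E_n^*}{\partial x_i} = \frac{p_i}{\langle T_j + t_j^*\rangle}\left(-\frac{dC_T(T_i)}{dT} - E_n^*\right)\frac{dT_i(x_i)}{dx_i}$$

As the term in parenthesis is always negative, $E_n^*$ varies in opposite direction of travel time, and $x_i$ is a metric of quality if and only if $T_i$ is decreasing in $x_i$. From Eq. (19) we further have

$$\frac{\partial^2 F_i(x_i,t_i)}{\partial x_i\,\partial t_i} = 0$$

From Theorem 2, this implies that the sign of $\partial t_k^*/\partial x_i$, for all $i \in \{1, \dots, s\}$ and $k \in \Omega$, is that of $-\partial E_n^*/\partial x_i$. Hence optimal residence times invariably increase with travel time, proving the graphical prediction in a general setting.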
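The effect of travel time with explicit travel and foraging costs can be illustrated numerically in a homogeneous habitat. This is a sketch under assumed forms, not the article's computation: gross gains $1 - e^{-t}$, linear foraging cost `c_forage * t` and linear travel cost `c_travel * T`, with hypothetical cost values chosen so that the net rate of gain stays positive over the range of travel times shown.

```python
import math

def optimum(T, c_forage=0.05, c_travel=0.2):
    """Return (t*, E*) for net gains F(t) = (1 - exp(-t)) - c_forage*t - c_travel*T."""
    F = lambda t: 1.0 - math.exp(-t) - c_forage * t - c_travel * T
    dF = lambda t: math.exp(-t) - c_forage
    # MVT condition dF(t) * (T + t) = F(t); h is positive before t*, negative after.
    h = lambda t: dF(t) * (T + t) - F(t)
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t, dF(t)

travel_times = [0.5, 1.0, 2.0, 3.0]
results = [(T,) + optimum(T) for T in travel_times]
for T, t, E in results:
    print(f"T = {T:.1f}  t* = {t:.3f}  E* = {E:.3f}")
```

In this example the optimal residence time rises and the realized rate of gain falls as travel time increases, even though the net gain function itself shifts with $T$.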

Manipulating patch frequencies
In the previous application, the time-derivatives of the gain functions were unaffected by the habitat modification; the sign of the variation of optimal residence times was thus entirely governed by the variation of $E_n^*$ (Theorem 2). We remark here that a similar situation arises when one manipulates the relative frequency of patch-types (i.e. the $p_i$). Whereas most applications of the MVT have investigated the consequences of changing patch-attributes (Stephens and Krebs 1986), changing the abundance of different patch-types constitutes a general alternative way to alter a habitat.
For clarity, let us omit the dependence of $F_i$ and $T_i$ on $x_i$. If we consider a change in $p_i$, at least one other $p_k$ must also change in order to maintain $\sum_j p_j = 1$. When differentiating Eq. (3) with respect to $p_i$, we thus have to take the total derivatives with respect to $p_i$. We get, for all $i \in \{1, \dots, s\}$ and $k \in \Omega$,

$$\frac{dt_k^*}{dp_i} = \frac{1}{d^2 F_k(t_k^*)/dt_k^2}\,\frac{dE_n^*}{dp_i} \tag{20}$$

Since $d^2 F_k(t_k^*)/dt_k^2 < 0$ at any MVT optimum, this immediately shows that $dt_k^*/dp_i$ has the sign of $-dE_n^*/dp_i$. Hence, improving habitat quality by manipulating relative patch frequencies decreases all patch residence times, i.e. increases the movement rate. This is another illustration that, if the time-derivatives of the gain functions are left unchanged, the optimal residence times on exploited patches invariably decrease with $E_n^*$.

On the scaling invariance of optimal strategies
We now consider two forms of scaling invariance of the optimal strategies that have been attributed to the MVT based on particular functions, the first corresponding to scaling the gain function vertically (i.e. scaling the gains), the second corresponding to scaling time (including travel time). Two particular gain functions are often used to implement these scenarios, namely the negative exponential function

$$F(t) = \mu\left(1 - e^{-\lambda t}\right) \tag{21}$$

and the Michaelis-Menten function

$$F(t) = \frac{v_m t}{k + t} \tag{22}$$

Scaling the gains
A generic way to model an increase in the quality of a patch is to multiply its gain function by some constant greater than one, effectively "stretching" it vertically. This can represent a change in the per-capita value of resource items (such as the sugar concentration in nectar or honeydew; Bonser et al. 1998), a change in their sheer number (Parker and Stuart 1976; Wajnberg et al. 2006), or the increased harvesting rate when more social foragers work together on a patch (Ranta et al. 1995; Livoreil and Giraldeau 1997). This has traditionally been modelled as increasing parameters $\mu$ and $v_m$ in functions (21) and (22). From the latter functions, it has been found that $t^*$ does not vary with $x$ in homogeneous habitats, if travel time is kept constant (Stephens and Dunbar 1993; Charnov and Parker 1995; Ranta et al. 1995). A graphical illustration is given in Fig. 2a. We will here establish this result in a more general setting, and show that this invariance is non-generic when one considers habitat heterogeneity.
We will consider the following class of gain functions:

$$F_i(x_i,t) = x_i\,G_i(t) \tag{23}$$

with $x_i > 0$ and some arbitrary functions $G_i$, and constant travel times, i.e. $dT_i(x_i)/dx_i = 0$. Class (23) includes both (21) and (22), with $x$ taken to be $\mu$ and $v_m$, respectively.

Fig. 2 (partial caption) In this example $x_2$ and $p_2$ were varied in a habitat with two other patch-types, having qualities $x_1 = 1$ and $x_3 = 3$, and relative frequencies $p_1 = p_3 = (1 - p_2)/2$. Remark that increasing $p_2$ increases $t_2^*$ if it decreases $E_n^*$ (low $x_2$ values, below dotted curve) and decreases $t_2^*$ otherwise (high $x_2$ values, above dotted curve). c Except in the homogeneous case (black line), the average residence time varies with $x_2$. It increases for low $x_2$ values and decreases for high $x_2$ values. d In a heterogeneous habitat, if all gain functions are scaled by the same factor, optimal residence times do not vary. In this example, all $x$ values were multiplied by 5/4 (from gray to black). Other parameters: $T = 1$, $\lambda = 1$ (color figure online)
We first remark that $F_i(x_i,t_i^*)$ must be positive at an MVT optimum, so that

$$\frac{\partial F_i(x_i,t_i^*)}{\partial x_i} = G_i(t_i^*) = \frac{F_i(x_i,t_i^*)}{x_i} > 0$$

Hence, from Proposition 1, $x_i$ is indeed a metric of quality. Equation (23) also implies

$$\frac{\partial \ln\left(\partial F_i(x_i,t_i^*)/\partial t_i\right)}{\partial x_i} = \frac{1}{x_i}$$

From Theorem 2, we thus have the condition for $t_i^*$ to increase with $x_i$, $i \in \Omega$, as

$$p_i\,x_i\,G_i(t_i^*) < \langle x_j\,G_j(t_j^*)\rangle \tag{24}$$

By the definition of the average operator (2), this is always true in a heterogeneous habitat, and thus $t_i^*$ always increases with $x_i$. Homogeneous habitats, for which $p_i = 1$ and $\langle x_j G_j(t_j^*)\rangle = x_i G_i(t_i^*)$, correspond to the knife-edge case of equality in Eq. (24), so that $dt^*/dx = 0$ (Eq. 11). This is illustrated, in the context of function (21) and a three patch-type habitat, in Fig. 2c. Only the limit case of homogeneity (dot on the far right) yields invariance. In all other contexts, $t_i^*$ increases with $x_i$. We also observe in the figure that the smaller $p_i$, the steeper the increase of $t_i^*$ with $x_i$, and that the higher $x_i$ (i.e. the richer the patch-type relative to the average), the shallower the increase of $t_i^*$ with $x_i$. Both are illustrations of Proposition 2. Last, we observe that an increase in $p_i$ increases (decreases) $t_i^*$ if it decreases (increases) the overall habitat quality, which illustrates the result (20) obtained in Sect. 5.2.
If we consider the average optimal residence time in a habitat, invariance to $x_i$ is again not observed in heterogeneous habitats (Fig. 2c). If gain functions have identical second time-derivatives at the MVT optimum (an example of this is (21) with one $\lambda$ value; Appendix), we can use (18) to predict the response of $\langle t_j^*\rangle$ to $x_i$. In the context of function (23), condition (18) becomes

$$F_i(x_i,t_i^*) < \frac{\langle F_j(x_j,t_j^*)\rangle}{\sum_{j\in\Omega} p_j}$$

where the right-hand side is the average gain over exploited patch-types, $\langle F_j(x_j,t_j^*)\rangle_\Omega$, whenever unexploited patch-types yield no gains. Hence, average residence time increases (decreases) with patch quality if the manipulated patch-type yields lower (higher) absolute gains than average at the MVT optimum. Invariance only results when the manipulated patch-type yields exactly average absolute gains ($F_i(x_i,t_i^*) = \langle F_j(x_j,t_j^*)\rangle_\Omega$).
From the fact that $x_i$ is a metric of quality and that, as we have just shown, $\partial t_i^*/\partial x_i > 0$, we have

$$\frac{dF_i(x_i,t_i^*(x))}{dx_i} = \frac{\partial F_i(x_i,t_i^*)}{\partial x_i} + \frac{\partial F_i(x_i,t_i^*)}{\partial t_i}\,\frac{\partial t_i^*}{\partial x_i} > 0$$

In addition, $dF_k(x_k,t_k^*(x))/dx_i < 0$ for all $k \in \Omega$, $k \neq i$ (from Corollary 1 and the fact that gain functions increase with time at the optimum). As $x_i$ increases, the gains from patch-type $i$ therefore rise from below to above the habitat average, and the point at which they equal the average thus represents a maximum of $\langle t_j^*\rangle$ with respect to $x_i$. This is illustrated in Fig. 3a. Since, for each patch-type individually, $\langle t_j^*\rangle$ is maximized when the patch-type yields exactly average gains, we can further conclude that $\langle t_j^*\rangle$ is globally maximized when all patch-types have the same $x$ value, i.e. in the homogeneous case. Thus, in a heterogeneous habitat, $\langle t_j^*\rangle$ is smaller than the $t^*$ value one would observe in a homogeneous habitat; heterogeneity always decreases the average optimal residence time. This is visible in Fig. 2d. In the more general case where second time-derivatives do differ at the MVT optimum (an example of this is function (22); Appendix), Theorem 3 implies that these further influence the response of average residence time. This is illustrated in Fig. 3b, in which the maximum of $\langle t_j^*\rangle$ no longer coincides with the point of average gains. In this case, the manipulated patch-type happens to have a gain function less curved than average in the neighborhood of this point so that, according to Theorem 3, an increase of $\langle t_j^*\rangle$ is more likely, all else equal. Consistent with this, the maximum of $\langle t_j^*\rangle$ is shifted toward higher $x_i$ values (Fig. 3b).
Last, if several patch-types are simultaneously altered in the habitat together with patch-type $i$ (i.e. $x_l = x_l(x_i)$), we get, as in (13), for any $k \in \Omega$:

$$\frac{dt_k^*}{dx_i} = -\frac{E_n^*}{\partial^2 F_k(x_k,t_k^*)/\partial t_k^2}\left(\frac{\partial \ln\left(\partial F_k(x_k,t_k^*)/\partial t_k\right)}{\partial x_k}\,\frac{dx_k}{dx_i} - \frac{d \ln E_n^*}{dx_i}\right) \tag{25}$$

The first term in the parenthesis can be simplified as above to yield $(dx_k/dx_i)/x_k = d\ln x_k/dx_i$, so that $t_k^*$ is unchanged if and only if

$$\frac{d \ln x_k}{dx_i} = \frac{d \ln E_n^*}{dx_i} \tag{26}$$

From (8), this means, in the context of (23):

$$\frac{d \ln x_k}{dx_i} = \frac{\langle G_j(t_j^*)\,dx_j/dx_i\rangle}{\langle x_j\,G_j(t_j^*)\rangle}$$

Hence, scaling the gains in one patch-type leaves the optimal residence time unchanged if and only if the scaling is identical to that of the average gain in the habitat. A necessary condition for all optimal residence times $t_k^*$ to be invariant is to have (26) hold for all $k \in \Omega$: all exploited patch-types should thus have their gain functions scaled in exactly the same way, i.e. $d\ln x_k/d\ln x_i = 1$ for all $k \in \Omega$. However, it still remains to be determined whether equality holds in Eq. (26), which also depends on the variation of $x_l$ for non-exploited patches. As shown in Appendix, this imposes an additional constraint on non-exploited patches, which, for instance, is satisfied if $F_l(x_l, 0) = 0$ (as is often assumed). In any case, a sufficient condition for invariance of all residence times is to have all gain functions (even for non-exploited patches) rescaled in exactly the same way. This type of transformation is illustrated in Fig. 2b. Hence, upon scaling the gains in a heterogeneous habitat, one should preserve the habitat heterogeneity (in the sense that the coefficient of variation of $x$ must stay constant over all exploited patches), otherwise invariance is lost.

Fig. 3 The variation of average optimal residence time when the gain function is scaled, in the case of a function (21) and b function (22). As in Fig. 2c, $x_2$ is varied in a three patch-type habitat with $x_1 = 1$, $x_3 = 3$, $p_2 = 0.6$ and $p_1 = p_3 = 0.2$. In the first case, the maximum of $\langle t_j^*\rangle$ (thick curve) occurs when patch-type 2 yields average absolute gains (thin curves). In the second case, the maximum of $\langle t_j^*\rangle$ occurs when patch-type 2 yields greater-than-average absolute gains. The second time-derivatives are also shown (the average was computed as the harmonic mean (17)). Other parameters: $T = 1$, $\lambda = 1$, $k = 1$
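The behavior of optimal residence times under vertical scaling can be explored numerically. The sketch below is illustrative and rests on assumptions: gains follow $F_i(t) = x_i(1 - e^{-\lambda t})$, travel time is common to all patch-types, and the habitat compositions are hypothetical. The helper `solve_mvt` finds the rate $E$ at which average gains equal $E$ times average time when every patch-type is exploited until its marginal gain falls to $E$.

```python
import math

def solve_mvt(x, p, T=1.0, lam=1.0):
    """Return (E*, [t_i*]) for assumed gains F_i(t) = x_i * (1 - exp(-lam * t))."""
    def times(E):
        return [max(0.0, math.log(xi * lam / E) / lam) for xi in x]
    def g(E):
        t = times(E)
        gain = sum(pi * xi * (1.0 - math.exp(-lam * ti))
                   for pi, xi, ti in zip(p, x, t))
        return gain - E * sum(pi * (T + ti) for pi, ti in zip(p, t))
    lo, hi = 1e-12, max(x) * lam      # g(lo) > 0 > g(hi), g decreasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    E = 0.5 * (lo + hi)
    return E, times(E)

# (a) Homogeneous habitat: scaling the gains leaves t* unchanged.
_, t_hom_1 = solve_mvt([1.0], [1.0])
_, t_hom_5 = solve_mvt([5.0], [1.0])

# (b) Heterogeneous habitat: only patch-type 2 is improved.
p = [0.2, 0.6, 0.2]
E_before, t_before = solve_mvt([1.0, 2.0, 3.0], p)
E_after, t_after = solve_mvt([1.0, 2.2, 3.0], p)

# (c) Heterogeneous habitat: all gain functions scaled by the same factor 5/4.
E_scaled, t_scaled = solve_mvt([1.25, 2.5, 3.75], p)

print("homogeneous t*:", t_hom_1[0], t_hom_5[0])
print("before:", t_before, "after improving type 2:", t_after)
print("uniform scaling:", t_scaled)
```

In this example the improved patch-type is exploited longer while the others are exploited for a shorter time, and uniform scaling of all gain functions leaves every residence time unchanged while multiplying the realized rate of gain by the scaling factor.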

Scaling the time
A different form of scaling invariance was proposed by Charnov and Parker (1995), based on an approximation of function (21). They reported that if parameter $\lambda$ is increased, and travel time is simultaneously reduced (so that the product $\lambda T$ stays constant), then $\lambda t^*$ appears to be almost invariant under the MVT. This invariance and the underlying constraint on $\lambda T$ are consistent with data on the duration of copulation in dung-flies (Charnov and Parker 1995). In this context, the relevant patch attribute $x$ is $\lambda$ rather than $\mu$ in (21). Intuitively, increasing $\lambda$ corresponds to accelerating time, and thus the kinetics of gain acquisition, which constitutes another natural way to improve patch quality (Parker and Stuart 1976; Ranta et al. 1995). Remark that decreasing $k$ in function (22) has exactly the same accelerating effect. We are thus led to considering the class of gain functions

$$F_i(x_i,t) = G_i(x_i t) \tag{28}$$

for some $x_i > 0$ and $G_i$, together with having travel time inversely proportional to $x_i$, i.e. $T_i(x_i) = \tau_i/x_i$ for some positive $\tau_i$. Class (28) includes both (21) and (22), with $x$ taken to be $\lambda$ and $1/k$, respectively. Given that $\partial F_i(x_i,t_i^*)/\partial x_i = (t_i^*/x_i)\,\partial F_i(x_i,t_i^*)/\partial t_i \geq 0$ and $dT_i(x_i)/dx_i = -\tau_i/x_i^2 < 0$, $x_i$ is a metric of quality for all $t_i^*$ (Proposition 1), as was the case for (23). Graphically, just like the earlier form of invariance (Sect. 5.3.1) corresponded to scaling the gain function vertically, the present invariance corresponds to scaling it horizontally, together with travel time. This is illustrated in Fig. 4. Invariance of $x_i t_i^*$ in (28) implies invariance of the absolute gains $F_i(x_i,t_i^*)$, as shown in the figure. Using our results, we can prove that this invariance property suggested by Charnov and Parker (1995) holds exactly, not only approximately, in homogeneous habitats. However, in heterogeneous habitats, this invariance is again non-generic. Since the approach is the same as above, we will directly consider the case where several patch-types are simultaneously manipulated in the habitat.
Fig. 4 Scaling the time and invariance. Function (22) is used as an illustration of class (28). In a homogeneous habitat, increasing $x$ while decreasing $T$ in the same proportion leaves the product $x t^*$ unchanged, as was conjectured by Charnov and Parker (1995). This in turn implies that the absolute gains extracted from a patch are also invariant (horizontal green line) (color figure online)

From (25), for any exploited patch-type, we can express $dt^*_k/dx_i$ in terms of $d\ln E^*_n/dx_i$, which, as before, incorporates the effects of all manipulated patches.
Noting that $x_k t^*_k$ stays constant if and only if $d(x_k t^*_k)/dx_i = 0$, i.e. $dt^*_k/dx_i = -(t^*_k/x_k)\, dx_k/dx_i$, we are led to exactly the same condition (26) as for the previous form of scaling invariance.
In the context of (28), the Appendix gives the explicit form taken by this invariance condition. It is immediately seen to hold in homogeneous habitats, proving the invariance property conjectured by Charnov and Parker (1995), not only for function (21) but for any function in class (28). However, just like the previous form of invariance, this one is non-generic in heterogeneous habitats. In particular, global invariance of the $x_k t^*_k$ results if and only if all exploited patch-types are scaled homogeneously, i.e. $d\ln x_j / d\ln x_i = 1$ for all exploited patch-types $j$, and an additional constraint is satisfied for non-exploited patches (Appendix).
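The homogeneous-habitat invariance can be illustrated with a short numerical sketch. It assumes a Michaelis-Menten gain written in time-scaled form, $F(x, t) = v_m x t / (1 + x t)$ with $x = 1/k$, and travel time $T = \tau / x$. The function name `residence_time` and the parameter values are hypothetical choices made for this illustration.

```python
def residence_time(x, tau, v_m=1.0):
    """Optimal residence time in a homogeneous habitat.

    Assumes the gain F(x, t) = G(x * t) with G(u) = v_m * u / (1 + u)
    and a travel time T = tau / x.
    """
    T = tau / x

    def G(u):
        return v_m * u / (1.0 + u)

    def dG(u):
        return v_m / (1.0 + u) ** 2

    def h(t):
        # MVT condition: marginal rate times (T + t) minus accumulated gain.
        # Positive before the optimum, negative after it.
        return x * dG(x * t) * (T + t) - G(x * t)

    lo, hi = 0.0, 1.0
    while h(hi) > 0.0:  # expand the bracket until it contains the optimum
        hi *= 2.0
    for _ in range(200):  # plain bisection on t
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tau = 4.0
qualities = (0.5, 1.0, 2.0, 5.0)
products = [x * residence_time(x, tau) for x in qualities]
print(products)
```

For this particular gain function the MVT condition reduces to $x t^2 = T$, so the product $x t^*$ equals $\sqrt{\tau}$ whatever the value of $x$, which is what the sketch returns up to numerical precision.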

Should one stay longer on better patches?
So far we compared different habitats in the sense that changes in patch attributes caused a change in the overall quality ($E^*_n$). However, one (intuitive) prediction often attributed to the MVT is that, in a given heterogeneous habitat, optimal residence times should rank in the same order as patch qualities, where quality is intended as having a 'better' gain function (Parker and Stuart 1976; Kelly 1990; Wajnberg et al. 2000). This was already suggested graphically in the seminal MVT article (Charnov 1976; see also Fig. 1b).
Let us consider that the gain functions all come from varying a parameter $x$ in some common function $F(x, t)$, so that patch-type $i$ has gain function $F(x_i, t)$. The classes of gain functions (23)/(28) considered in the previous section, with one function $G$, are instances of this scenario. Since we are only interested in the gain functions, we will assume that travel times do not vary with $x_i$. In a given habitat, unexploited patches have null optimal residence times, and all positive optimal residence times are determined from $E^*_n$, as shown in Fig. 1b. Hence, $x_i$ entirely determines $t^*_i$; for all exploited patch-types $i$, $t^*_i$ is given by one function of $x_i$. If $x_{\min}$ is the lowest $x$ value over exploited patch-types, and $x_{\max}$ the highest, greater $x$ values unambiguously represent better patches within the habitat if $x_i$ is a metric of quality (Definition 1) for all $x_i \in (x_{\min}, x_{\max})$. We are interested in determining whether $t^*_i$ is an increasing function of $x_i$, for a given value of $E^*_n$. When varying $x_i$ for some exploited patch-type $i$, ignoring the variation of habitat quality $E^*_n$, the change in $t^*_i$ is obtained from (13) with $\partial \ln E^*_n / \partial x_i$ set to zero. This gives the (total) derivative of $t^*_i$ as:
$$\frac{dt^*_i}{dx_i} = -\,\frac{\partial^2 F(x_i, t^*_i)/\partial x_i\,\partial t_i}{\partial^2 F(x_i, t^*_i)/\partial t_i^2} \qquad (30)$$
The denominator is negative, since gain functions are concave in time at an MVT optimum. The sign of $\partial^2 F(x_i, t^*_i)/\partial x_i\,\partial t_i$ is not constrained by $x_i$ being a quality metric (Proposition 1), so that $dt^*_i/dx_i$ can have any sign, depending on how the time-derivative of $F$ changes with $x_i$. We can immediately conclude from (30) that, in a given habitat, $t^*_i$ is an increasing function of $x_i$ if this cross derivative is positive, and a decreasing function of $x_i$ if it is negative. The generic transformation corresponding to these scenarios is rotating the gain functions, with $x_i$ representing the angle of rotation. If $x_i > 0$, $\partial^2 F(x_i, t_i)/\partial x_i\,\partial t_i > 0$ for all $t_i$, so that $dt^*_i/dx_i > 0$: individuals should spend more time on better patches. If $x_i < 0$, the reverse is true. This is illustrated in Fig. 5a.
Going back to the functions studied in the previous section, it is straightforward to see that varying $x$ in class (23) is an example of the first situation. Indeed, for all $t$ and $x$, we have $\partial^2 F(x, t)/\partial x\,\partial t = (1/x)\,\partial F(x, t)/\partial t$. As $F$ is increasing in $t$ at any MVT optimum, $dt^*_i/dx_i > 0$ by (30), for all $x_i$. Thus, individuals should indeed spend more time on better patches for this class of gain functions. Figure 2a offered an illustration of this in the case of function (21). However, even for very similar and natural ways to model patch quality, the MVT can readily yield the opposite prediction that individuals should stay longer on poorer patches.

Fig. 5 Stay longer on poorer patches? (a) From some gain function (red), increasing quality by rotating the function clockwise makes $t^*_i$ a decreasing function of quality within a habitat (top). Rotating the function anti-clockwise makes $t^*_i$ an increasing function of quality within a habitat (bottom). Dotted gray curves: translating the gain function vertically makes $t^*_i$ constant over patch-types (green vertical line). (b) Two four patch-type habitats are illustrated. In the lower (magenta) habitat, parameter $\lambda$ was varied in gain function (21). In the upper (blue) habitat, parameter $k$ was varied in gain function (22). For function (21), the four optimal residence times (labeled 1 to 4) increase with patch quality if patches are less than 63 % exploited (horizontal dotted line), but decrease with quality if patches are more than 63 % exploited. For function (22), they increase with patch quality if patches are less than 50 % exploited (horizontal dashed line), but decrease if patches are more than 50 % exploited. For clarity, the magenta curves were shifted to the right. Other parameters: $v_m = 1.2$, $\mu = 0.5$ (color figure online)
If we consider instead the class of functions (28), for instance the same two functions (21) and (22), the cross derivatives $\partial^2 F(x_i, t_i)/\partial x_i\,\partial t_i$ can be computed explicitly. It follows that, in both cases, they are positive if $t^*_i < 1/x_i$. Remembering that parameter $k$ in (22) is the half-saturation constant, i.e. the time it takes to obtain gains $v_m/2$, we immediately see that, for function (22), $t^*_i < 1/x_i$ if and only if the patch-type is less than half-depleted. Therefore, $t^*_i$ is an increasing function of $x_i$ if all exploited patches are less than half-depleted, but a decreasing function of $x_i$ if all are more than half-depleted. Similarly, for function (21), $t^*_i < 1/x_i$ implies that the relative exploitation of patches should be no more than $F(x_i, 1/x_i)/\mu = 1 - e^{-1}$, which is about 63 %. These predictions are illustrated in Fig. 5b.
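These two thresholds can be explored numerically. The sketch below is a hedged illustration, not the computation behind Fig. 5b: it assumes the forms $1 - e^{-x t}$ and $x t / (1 + x t)$ for the two gain functions (asymptotes set to 1), a hypothetical habitat with three equally frequent patch-types, and two arbitrary travel times chosen so that patches are either lightly or heavily exploited. The names `FAMILIES` and `optimal_times` are introduced here for illustration.

```python
import math

# Assumed forms of the two gain functions, written in the time-scaled class
# F(x, t) = G(x * t) with asymptote 1 (mu = v_m = 1 is an illustrative choice).
FAMILIES = {
    "exponential": {
        "gain": lambda x, t: 1.0 - math.exp(-x * t),
        # residence time at which the marginal rate x*exp(-x*t) equals E
        "leave": lambda x, E: math.log(x / E) / x if x > E else 0.0,
        "threshold": 1.0 - math.exp(-1.0),  # fraction exploited at t = 1/x
    },
    "michaelis_menten": {
        "gain": lambda x, t: x * t / (1.0 + x * t),
        # residence time at which the marginal rate x/(1 + x*t)**2 equals E
        "leave": lambda x, E: (math.sqrt(x / E) - 1.0) / x if x > E else 0.0,
        "threshold": 0.5,                   # fraction exploited at t = 1/x
    },
}

def optimal_times(family, xs, ps, T):
    """Optimal residence times in a heterogeneous habitat (bisection on E)."""
    gain, leave = family["gain"], family["leave"]

    def surplus(E):
        ts = [leave(x, E) for x in xs]
        total_gain = sum(p * gain(x, t) for p, x, t in zip(ps, xs, ts))
        return total_gain - E * (T + sum(p * t for p, t in zip(ps, ts)))

    lo, hi = 1e-12, max(xs)  # the marginal rate at t = 0 is x in both families
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    E = 0.5 * (lo + hi)
    return [leave(x, E) for x in xs]

xs = [1.0, 1.2, 1.4]        # patch qualities, poorest first
ps = [1 / 3, 1 / 3, 1 / 3]  # relative abundances
results = {}
for name, family in FAMILIES.items():
    for T in (0.05, 20.0):  # short and long travel times
        ts = optimal_times(family, xs, ps, T)
        fracs = [family["gain"](x, t) for x, t in zip(xs, ts)]
        results[(name, T)] = (ts, fracs)
        print(name, T, [round(t, 3) for t in ts], [round(f, 3) for f in fracs])
```

With the short travel time, every patch is exploited below its threshold and residence times increase with quality. With the long travel time, every patch is exploited above its threshold and the ranking is reversed.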
Note that, from Eq. (30), if the time-derivative of $F$ does not vary with $x_i$, then $dt^*_i/dx_i = 0$ and the optimal residence time will be the same on all patch-types, irrespective of their quality. It will thus be the same as $t^*$ in a similar homogeneous habitat. The generic transformation corresponding to this situation is varying quality by translating gain functions vertically, i.e. $F(x_i, t) = F(t) + x_i$ (Fig. 5a). This can describe instant rewards obtained upon entering and/or leaving patches, such as the reward of biting for cheaters in cleaning mutualisms (Bshary et al. 2008). Therefore, just as scaling the gain functions vertically (functions (23) in the previous section) was an identity transformation in homogeneous habitats, translating the gain functions vertically is an identity transformation in heterogeneous habitats, as optimal residence time is invariant to heterogeneity.
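The vertical-translation case can be spelled out in a few lines. This is only a sketch of the reasoning, assuming that the MVT condition on exploited patches equates the marginal rate of gain with $E^*_n$ and that $F'$ is strictly decreasing:

```latex
\begin{aligned}
F(x_i, t) &= F(t) + x_i \\
\frac{\partial F(x_i, t)}{\partial t} &= F'(t),
\qquad
\frac{\partial^2 F(x_i, t)}{\partial x_i\,\partial t} = 0 \\
F'(t^*_i) &= E^*_n \quad \text{for every exploited patch-type } i
\end{aligned}
```

Since the last condition does not involve $x_i$, it has the same solution on every exploited patch-type. The translation parameters $x_i$ affect the residence times only through the common value of $E^*_n$.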
Finally, Eq. (30) reveals an affinity between the sign of variation of optimal residence times with quality (Theorems 1-3) and the ordering of optimal residence times with respect to quality in a habitat. Indeed, in both cases, the key element is the sign of $\partial^2 F/\partial x\,\partial t$. If it is negative for all $x$, optimal residence times are lower on better patches, while, from Theorems 1-3, the $t^*_i$ of exploited patch-types and the average $t^*_j$ all decrease with quality. If it is positive for all $x$, optimal residence times are longer on better patches, while the $t^*_i$ of exploited patch-types and the average $t^*_j$ might increase with quality. This shows that the condition for optimal strategies to decrease with quality is similar to, but less stringent than, the condition for residence times to rank in reverse order of patch qualities within habitats. As an example, in Fig. 5b, while we can be sure that $t^*$ would decrease with $x$ when $t^*_i$ is a decreasing function of $x_i$ within habitats (i.e. when patches are sufficiently depleted), the fact that $t^*_i$ is an increasing function of $x_i$ when patches are little depleted does not guarantee that $t^*$ would increase with $x$ (actually, using the type of construct shown in Fig. 1a, one can visualize that $t^*$ always decreases with $x$, as an application of Theorem 1 would confirm).

Conclusions and perspectives
The Marginal Value Theorem (MVT; Charnov 1976) offers a fairly general theoretical connection between the attributes of patchy habitats and optimal foraging strategies (Stephens and Krebs 1986). However, as it only provides an implicit definition of optimal strategies, general predictions on the consequences of habitat alterations have remained elusive, with strong reliance on graphical arguments. Here we have reanalysed the MVT in order to provide such general predictions on how optimal strategies should vary with habitat characteristics. We found that some existing predictions were indeed robust: we confirmed the effect of increasing travel time in a more general setting (Sect. 5.1) and proved an invariance property conjectured by Charnov and Parker (1995) (Sect. 5.3). However, several predictions sometimes attributed to the MVT did not prove robust.
First, there is no general trend between optimal residence times and quality: the former can increase or decrease with quality, depending on the exact way gain functions are transformed. We have provided general guidelines regarding what sort of transformations would yield one or the other outcome (Theorems 1 and 2). The crucial point is how the time-derivative of the gain function varies with quality: only if it increases sufficiently can optimal residence time go up with quality. Any habitat alteration that does not make gain functions steeper (including changing the relative abundances of patch-types; Sect. 5.2) invariably yields a decrease of optimal residence time with quality. Second, even within a given habitat, optimal residence times do not necessarily rank in the same order as patch qualities, i.e. one should not always spend more time on better patches: the contrary can, counterintuitively, be optimal. The conditions for this are similar to, but more stringent than, those required to observe a lower patch residence time following increased patch quality (Sect. 5.4). Last, the scaling invariances of optimal strategies that were proposed for homogeneous habitats (e.g. Parker and Stuart 1976; Charnov and Parker 1995; Ranta et al. 1995) have been shown to be non-generic in heterogeneous habitats (Sect. 5.3). Interestingly, however, we obtained a prediction that the average rate of movement should always be higher in a heterogeneous habitat than in a homogeneous habitat, in the often-considered scenario where patch quality corresponds to a vertical scaling of the gain function.
Our results help to better understand the consequences of habitat heterogeneity. All else equal, optimal residence time is more likely to increase with patch quality in a heterogeneous, rather than homogeneous, habitat. This is especially true if the focal patch of interest is rare in its habitat, and is poorer than the average patch (Proposition 2). This indicates that predicting the effect of increasing patch quality, in experimental settings where the whole habitat context is not known, is hazardous. The non-genericity of the above-mentioned invariances was a manifestation of this. However, a strong prediction emerges: increasing the quality of some patch-types always decreases the optimal residence time on all other exploited patches (Corollary 1). We also provided a comparison between the average behavior of heterogeneous habitats and that of an "average homogeneous habitat". We have shown that the two behave similarly only if there is no heterogeneity in the curvature of gain functions (at the optimal residence times). Otherwise, a given change in average habitat characteristics might elicit contrasted responses of the average residence time, depending on which patches are altered (Theorem 3). As a consequence, some patches may have a disproportionately stronger impact than one would expect based on mean-field considerations, qualifying as keystones (Mouquet et al. 2013). In practice, determining if we are in this sort of situation necessitates estimating curvatures of gain functions with respect to time, and predictions involve the harmonic mean of curvatures, which is the appropriate mean in this context. These are much more demanding tasks from a statistical perspective, adding to the challenge of prediction in heterogeneous habitats, compared to homogeneous habitats.
The general results we obtained for heterogeneous habitats pave the way for more applications of the MVT at the level of whole habitats, whereas it is traditionally used at the level of specific patches (Stephens and Krebs 1986). Experimental microcosms appear particularly well-suited to test our predictions (e.g. Friedenberg 2003). These new developments on the MVT can be applied to specific gain functions, as we did in the applications, to obtain precise predictions tailored to particular systems or scenarios. They also provide a framework to assess, in all generality, the robustness of other predictions that have been proposed from graphical arguments and tested experimentally, for instance that varying travel times should have a stronger impact on residence time in richer habitats (Muratori et al. 2008).