Understanding the connections between species distribution models

Models for accurately predicting species distributions have become essential tools for many ecological and conservation problems. For many species, presence-background (presence-only) data is the most commonly available type of spatial data. A number of important methods have been proposed to model presence-background (PB) data, and there have been debates on the connection between these seemingly disparate methods. The paper begins by studying the close relationship between the LI (Lancaster&Imbens, 1996) and LK (Lele&Keim, 2006) models, which were among the first developed methods for analysing PB data. The second part of the paper identifies close connections between the LK and point process models, as well as the equivalence between the Scaled Binomial (SB), Expectation-Maximization (EM), partial likelihood based Lele (2009) and LI methods, many of which have not been noted in the literature. We clarify that all these methods are the same in their ability to estimate the relative probability (or intensity) of presence from PB data; and the absolute probability of presence, when extra information of the species' prevalence is known. A new unified constrained LK (CLK) method is also proposed as a generalisation of the better known existing approaches, with less theory involved and greater ease of implementation.


Introduction
Ecologists employ Species Distribution Models (SDMs) to assist in mapping the spatial distribution of a species over its geographic range, despite there being only limited observational data available. These models often analyse presence-background (PB) data (equivalently referred to as presence-only data), which contains a list of 'presences', or locations where individuals have been observed, but typically having no information about absences -sites where species have not been observed. PB data is often plentifully available from so-called "opportunistic surveys" and can be found in museum and herbarium collections, historical database records (Pearce & Boyce, 2006), and is now becoming increasingly available via online repositories such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org). Given such data, one of the key goals of SDMs is to estimate the site-specific probability of presence in the study region. Since SDMs assume that covariates are ultimately responsible for determining species' spatial distributions, SDMs model how covariates affect the local probability of presence. To help estimate the site-specific presence probabilities, SDMs make use of extra background sites at which the information of the environmental covariates (e.g. temperature, altitude, etc.) are available.
More specifically, SDMs estimate the probability p(y = 1|x) that a species of interest is present, y = 1 (versus absent, y = 0) at a particular site, conditional on environmental covariates x at that site. The probability p(y = 1|x) is also referred to as the resource selection probability function (RSPF). A common practice is to assume a parametric structure for modelling p(y = 1|x), for example, the widely used logit form, log p(y=1|x) 1−p(y=1|x) = η(x T β). Here η(x) can be a linear or a nonlinear function of x, and a logit-linear specification follows log p(y = 1|x) 1 − p(y = 1|x) The goal of the SDM is to estimate all of the parameters β i . Also of interest is the species' overall prevalence π, which is the proportion of sites with species' presence in the study region. Thus, π = p(y = 1|x)dF (x), where F (x) is the unknown probability distribution function for x. For PB data, π is generally unknown since there is little or no information about the presence status of the background points.
A number of methods exist for modeling species distributions based on PB data, and in the paper we will focus on the maximum-likelihood based logistic regression methods discussed in Phillips and Elith (2013), the point process models (PPM) (Chakraborty et al., 2011;Warton & Shepherd, 2010), and the widely applied MAXENT method (Phillips et al., 2006). The logistic regression methods include the SC method (Steinberg & Cardell, 1992), LI by Lancaster and Imbens (1996), LK by Lele and Keim (2006) and Royle et al. (2012), the Expectation-Maximisation (EM) of Ward et al. (2009), and the scaled binomial loss model (SB) by Phillips and Elith (2011). We also include the partial likelihood based Lele method by Lele (2009) in our study.
These methods on species distribution modelling have been developed independently using different definitions and framework. The key goal of the paper is to show the equivalence between these seemingly disparate models. This is one of the major contributions of our manuscript. The connections are revealed initially by studying the close link between the LI and LK methods, which were among the first developed methods for analysing PB data. It will be shown for the first time that the LK method is a numerical approximation of the LI method.
Secondly, we examine the analogy between the PPM and the LK model, when the likelihood function of the PPM is approximated by its discrete counterpart. We also show the equivalence between the SB, EM, LI and the Lele methods. These equivalences have not been noted previously in the literature. Along with other findings on relations in the field, such as those done by Baddeley et al. (2010); Warton and Shepherd (2010); Aarts, Fieberg, and Matthiopoulos (2012); Fithian and Hastie (2013); Renner and Warton (2013), we conclude that all these methods are essentially equivalent in their ability to estimate the relative probability of presence. Furthermore, we present a unified constrained LK (CLK) method, which bridges the gaps between these seemingly different approaches. Each of the methods discussed in the paper is shown to be a special case of the unified CLK method.
2 The relationship between LI and LK methods Lancaster and Imbens (1996) proposed a contaminated case control study for representing PB data, in which the set of sites in the study area is divided into two subsets. Subset 1 consists of all those sites in the study area on which the species is present. Subset 0 comprises the whole set of sites in the study area, with no information made available regarding which of these 'background sites' the species is present or not. However, the relevant environmental covariates are known at all background sites.
LI defined a sequence of n Bernoulli trials with the probability h to choose between the presence (case) and background (contaminated controls) points. A binary indicator u was used to denote the stratum, with u = 1 if the observation was drawn from the presence, and u = 0 if it was drawn from the whole population. After the n Bernoulli trials, there are n 1 sites chosen with species presences, and n 0 background sites with unknown status. The background points in analysing the PB data is usually taken as either a uniform sample or a regular grid with a sufficient large set of values of covariates. The distribution of the environmental covariates F (x) can be approximated by a discrete distribution with unknown probabilities α l on L + 1 known points of support x l (Lancaster and Imbens, 1996). An empirical estimator of α l is the fraction of observations taking the value x l in the background data, i.e.,α l = n l /n 0 .
From Bayes theorem, we can derive p(x|y = 1) = p(y=1|x)f (x) π . The joint distribution of stratum u and covariates x is: (Lancaster and Imbens, 1996) , which can be rewritten as The full likelihood function for the contaminated sampling scheme based on the joint distribution of (x, u) is where the total number of sample points is n = n 0 + n 1 . By splitting the full likelihood into three partial likelihoods in (2), the role of each likelihood becomes clear. The partial likelihood function L 3 (h), which is independent of other parts of the likelihood function, is used to estimate the unknown sampling proportion with a binomial type of estimatorĥ = n 1 /n. Similarly, L 2 is relevant to the estimation of the probability distribution function of covariates F (x). It is the partial likelihood L 1 (β, π) that contributes to the estimation of β, and the probability of presence p(y = 1|x, β).
Let's take a further look at the partial likelihood of L 1 . The population prevalence π involves an integral p(y = 1|x)dF (x) , which can be approximated by x l p(y = 1|x l , β) n l n 0 on L + 1 known points of support x l , with F (x) replaced by its empirical estimate ofα l = n l /n 0 . This approximation for π can be rewritten as 1 , when we shift the sample space from the environmental space x(s) to the geographic feature s . Upon this transformation for π, the partial likelihood L 1 (β, π) in the LI method becomes . ( This approximation of the partial likelihood for β is exactly the likelihood of the popular LK method (Lele & Keim, 2006;Royle et al., 2012). Through maximising the likelihood, it is possible to estimate the best fitting parameters β required to determine the probability of presence. The LK method can be viewed as a numerical approximation of the LI method, where the accuracy of the approximation will improve, in a statistical sense, by increasing the size of the background samples. In the following Simulations section, we will show the similarity between the LI and LK estimates, when the number of background points is large.
Can the true probability of presence p(y = 1|x) be estimated from the LI or LK methods? There have been extensive discussions on this topic (Lele & Keim, 2006;Ward et al., 2009;Royle et al., 2012;Phillips & Elith, 2013;Solymos & Lele, 2016). In the Simulation section, we will demonstrate the need for extra information, such as the parametric structure of p(y|x) or prior knowledge of species' prevalence, π, in order to estimate the absolute probability of presence. As both the LI and LK methods require no prior knowledge of π, their successful operation relies on the resource selection probability function (RSPF) conditions, which have been given by Lele and Keim (2006) and further discussed in Solymos and Lele (2016). Loosely speaking, the RSPF condition includes, for example, that the true (actual) function of log p(y = 1|x) is nonlinear, and not all covariates in the model are categorical.

The relationship between LK and PPM
The point process model (PPM) in spatial analysis has recently been proposed as a versatile approach for analysing species presence-background data (Warton & Shepherd, 2010;Chakraborty et al., 2011), because it treats space as continuous, which seems more realistic than discrete space approaches. Poisson point process models, however, have been shown to be closely connected to other popular methods in ecology such as MAXENT (Aarts et al., 2012;Renner & Warton, 2013), logistic regression (Baddeley et al., 2010;Warton & Shepherd, 2010) and resource selection models (Aarts et al., 2012). Hefley, Brost, and Hooten (2017) also showed that the discrete space SDMs can be linked to the PPM using the so-called change of support.
In this section, we will briefly demonstrate how the likelihood of the LK method is associated with the conditional likelihood of the PPM, which is equivalent to MAXENT. We note that Aarts et al. (2012) also observed the equivalence between the LK and the conditional PPM. However they did not provide any formal details of how the equivalence can be reached through a numerical approximation of the PPM, as we show here.
In the PPM framework, PB data consists of a set of locations s 1 , s 2 , · · · , s n 1 , where individuals of a species are observed in a region D. These locations are defined as a realization of a point process that is characterized by the intensity λ(s), which varies spatially according to a parametric function of environmental features x(s). The likelihood proposed for fitting an inhomogeneous Poisson process is L(s 1 , · · · , s n 1 , n 1 ) = exp(− D λ(s)ds)( (Cressie & Wikle, 2011;Renner et al., 2015). This likelihood function was derived as the product of the conditional likelihood, and the marginal likelihood, P (N (D) = n 1 ) = exp(−Λ(D))Λ(D) n 1 /n 1 ! (6) (Møller & Waagepetersen, 2003;Dorazio, 2014), where Λ(D) = D λ(s)ds is the cumulative intensity over the study area of D.
The likelihood (5) involves an integral over a study area that cannot be computed exactly and must be approximated numerically. Berman and Turner (1992) developed a numerical quadrature method for estimating the integral by approximating it as a finite sum using any quadrature rule, i.e., D λ(s)ds ≈ n 0 i=1 λ(s i )w i . In the simplest form, we assign equal weight to each quadrature point, for example w i = |D| n 0 , by partitioning D into n 0 equal rectangular tiles and a single quadrature point selected from each tile (Baddeley, Rubak, & Turner, 2015). |D| represents the total area of the region. With this simple quadrature scheme, the conditional likelihood for the PPM is approximated by The above discretized version of the conditional likelihood of the PPM is the same as the likelihood of MAXENT using the log-linear intensity function, log λ(s) = β x(s) Renner & Warton, 2013). We want to address that this approximation is also the analogy of the likelihood of the LK method in Eq. (3). It is worth noting that the background points used to approximate π in the LK method play exactly the same role as the quadrature points, which are used in the PPM for numerically evaluating the cumulative intensity D λ(s)ds. The choice of different quadrature schemes in approximating the conditional PPM can lead to models very different from the LK method (A discussion of the various quadrature schemes can be found in Chapter 9 of Baddeley et al. (2015)).
The difference between the LK and the approximated version of the conditional PPM lies in the link functions being used for modelling p(y|x, β) and λ(s) respectively, often chosen by consideration of the range of values a probability and an intensity function can take. Nevertheless, we will show through numerical simulations that the choice between the different link functions, for example, the logit, log-linear or the complementary loglog functions, makes little difference in estimating the ratio, p(y=1|x i ,β) π , i.e., the relative probability (or intensity) of presence. In other words, when using either the LK/LI, MAXENT or conditional PPM model in studying the PB data, all of them will yield the same relative probability (or intensity) of presence. It is also worth mentioning that although the PPM has been introduced as a natural framework for modelling PB data, its ability to produce the relative probability of presence (or relative intensity), which is free of grid or transect selection, is the same as other so-called discrete space models.
Can the true intensity of presence be estimated with PPM methods? From the discussions of Fithian and Hastie (2013), Renner et al. (2015) and Dorazio (2012), we know that the PPM can only estimate the intensity of reported presence from its full likelihood function, instead of the intensity of true presence λ(s). It is because an underlying equation Λ(D) = n 1 is derived from the full likelihood function, in addition to the conditional likelihood of the PPM . This additional information of Λ(D) is biased, as the cumulative intensity should equal the number of true presence over the study area, whereas n 1 was only observed by opportunity. One way to correct for this bias is to make use of an appropriate species presence number in the PPM for Λ(D). Ward et al. (2009) proved that the probability of presence is not identifiable from PB data, if there was no information about the structure of the the probability function. Under this circumstance, the knowledge of the population's prevalence is required to estimate the true probability of presence. They used the commonly used logit function to fit the PB data, and proposed the Expectation-Maximisation (EM) algorithm to estimate the parameters of the logistic regression. The EM algorithm was able to estimate the probability of presence accurately at any site, using the species' prevalence as an additional information.

The relationship between EM, SB, Lele and LI methods
Two other successful methods discussed in the literature, the SC (Steinberg & Cardell, 1992) and SB (Phillips & Elith, 2011) method, also require the true species' prevalence to obtain estimates of the site-specific probability of presence. Although the EM and the SB methods work on different likelihood functions, their estimates of the probability of presence are essentially the same. It is also the first time to show the equivalence between the likelihood function of the EM/SB method and the LI method (see details in Appendix A).
Lele (2009) proposed a new method, referred to as the Lele method in our paper, to improve the instability of the the LK method that they identified. The Lele method is a combination of the partial likelihood and data cloning to obtain the maximum likelihood estimator of both β and π. In Appendix B, we show that the likelihood function of the Lele method is the same as that of the LI method, although these two seemingly different approaches were developed independently.

General connections between all methods
The methods discussed in this paper, i.e., LI, LK, Lele, EM, SC, SB, MAXENT, and the PPM, can be divided into three different camps, based on their underlying likelihood functions and type of extra information required. The LI, LK, and Lele methods are sorted into Camp 1, which can estimate the absolute probability of presence, provided that the probability function satisfies the RSPF conditions (Lele & Keim, 2006;Solymos & Lele, 2016). The method of EM, SB and SC fall into Camp 2, and are also able to estimate the absolute site-specific probability of presence, using an additional input of the species' prevalence. The MAXENT, continuous space PPM method and its associated models, are categorized into Camp 3, according to their connection between one another (Warton & Shepherd, 2010;Baddeley et al., 2010;Aarts et al., 2012;Renner & Warton, 2013).
We have discussed some of the pairwise relationships, such as between LI and LK, LK and PPM, SB and LI, and between Lele and LI, respectively. Is there a way to connect together all the methods discussed in the paper? We believe this is in fact possible and a summary of our findings is given in Fig. 1, where the relations inside the same camps and across different camps are first-time presented.
The methods in Camp 1 and Camp 3 (e.g., the LK and the PPM) are shown to share a common conditional likelihood, which has the same structure as the partial likelihood L 1 (β, π) in Eq. 2. The methods in Camp 1 and 2 (e.g., the LI, Lele, EM and SB methods) are constructed on the same likelihood function, i.e., L(β, h, α, π) in Eq. 2, which can be further decomposed as the product of the likelihood L 1 (β, π) and other terms that do not involve both β and π. Therefore, all the methods in the three camps are actually built on the same partial/conditional likelihoods, i.e., L 1 (β, π). In other words, all these seemingly different SDM models are equivalent in their ability to estimate the relative probability of presence for modelling PB data, regardless of their different presentations. The difference between Camp 1 and Camp 2 is that the methods in the latter require a pre-determined value of species' prevalence, π, while the LI and Lele methods in Camp 1 treats π as an unknown parameter. As for the LK method, in order for the LI and Lele methods to identify π, the RSPF conditions listed in Lele and Keim (2006) and Solymos and Lele (2016) need to be satisfied. However, this has led to controversy criticized by data scientists in particular (Ward et al., 2009;Phillips & Elith, 2011;, because the true parametric functions are generally unknown in practice, and the functions used to fit these true functions can be of different structures. Under these circumstances, a revised version of the LK method is proposed in Section 6, where the PB data is augmented with an additional datum on the species' prevalence π. This makes the LI/LK methods comparable to the EM, SB and SC methods.

A unified Constrained LK (CLK) method
From previous studies, we have found that the LI, LK, MAXENT and the conditional PPM share a similar likelihood function i.e., Eq. 3 and Eq. 7, which alone (without extra information) can only provide the relative probability (intensity) of presence. In order to obtain the absolute probability of presence, an extra information of the species' prevalence π can be introduced as a constraint imposed on the optimization of this common likelihood function. In details, the CLK method maximizes the following (LK type of) likelihood function, with the constraint, i.e., 1 n 0 n 0 j=1 p(y = 1|x j , β) = π 0 , where π 0 is the population prevalence that is assumed to be known in advance. Note that this is very different from just maximising the function of log L = n 1 i=1 log p i π 0 , since the constraint reduces the effective parameter space over which the maximization is performed. The statistical mechanism and efficiency underlying the CLK method is provided in Appendix C, where we have proved that the CLK is capable of estimating the true probability of presence, the same as the SB and SC methods.
The LI, LK and the partial likelihood of the PPM (or MAXENT) would intrinsically have identification problems in solving their likelihood functions, if there is no prior knowledge of the species prevalence, and/or the structure of the function of the probability of presence. In other words, these methods in general would generate multiple solutions of the absolute probability of presence, i.e., the relative probabilities of presence. By introducing the constraint, the CLK method forces the estimates from these methods to converge to the unique solution, which is just one of the multiple solutions obtained from the LI, LK and the MAXENT methods. The CLK method provides a unification of the seemingly disparate methods discussed so far (SB, SC, EM, LI, LK, Lele, PPM and MAXENT). Each of these methods can be shown to be either equivalent to, or a special case of, the CLK method.
Firstly, LK, LI and Lele are special cases of the CLK method, when log p(y|x, β) is a nonlinear function and no constraint is used. If the RSPF conditions (Lele & Keim, 2006;Solymos & Lele, 2016) are not satisfied, using the logit-linear or other functions to fit without constraint fails to estimate the probability of presence (Phillips & Elith, 2013). The inclusion of the additional information of π in the CLK method fixes this problem, and enables the LI/LK methods to perform as well as SB, SC or EM method.
Secondly, the CLK method has the same performance as the SB, SC and EM methods, when the logit link function is employed. However, unlike these methods, which were only derived for the logit function, the formulation of the CLK method is much simpler and can easily adapt to any type of link functions.
Next, the PPM can be reviewed as a special case of the CLK method, when the log-linear function is used for p(y|x, β), and a constraint of n 1 |D| is imposed on the denominator of Eq. 8. For a log-linear function function, i.e., log p(y|x, β) = β 0 + β 1 x, estimates of β 1 are the same for both the CLK method and the conditional PPM (equivalently the MAXENT), whereas the ratio of the two methods differ by a constant, i.e., the exponent of the difference between the two β 0 s . It is the constraint that provides the estimate of the intercept in the log-linear model. Similarly, MAXENT model is also a special case of the CLK method, using the logarithm function but without any constraint supplied.
Unlike all of these previous methods, the CLK does not specify any particular link function; instead it can use any of the commonly used link functions, such as the logit, log or the complementary log-log functions. Interestingly, we will show in the Simulation section that using different link functions actually have little difference on estimating both the relative and the absolute probabilities of presence. The proposed CLK method is easy to implement, and users can choose any general-purpose nonlinear constraint optimization package in their preferred programming language. We have implemented the CLK method in R, and used the constraint optimization package 'nloptr' (Ypma, 2014). The code of the new method is included in the Supporting Information.

Simulations
In this section, the performance of the proposed CLK method is evaluated through numerical simulations, using three commonly applied link functions, logit-linear, log-linear, and complementary log-log, denoted separately as CLK logit, CLK log and CLK clog. The CLK method can easily include other link functions. The large sample equiva- We consider eight species, with seven of them having the same probability functions of occurrence used in Phillips and Elith (2013) (see Table 1). The extra species considered in our paper has the exponential distribution. The probability of presence p(y = 1|x) depends only on a single environmental covariate or explanatory variable x, and its value ranges uniformly between [0,1]. Five models were considered, i.e., LI, LK, CLK logit, CLK log and CLK clog. In our simulation, no knowledge is assumed about the parametric structure of the true probability of presence, and we fit the data with the commonly used logit function for both the LI and LK methods.
We plot the logarithm of each probability function in Table 1 to verify the RSPF conditions listed in Lele and Keim (2006) and Solymos and Lele (2016). It is observed from Figure 2 that only Logistic-2, Quadratic and Gaussian distributions exhibit the required nonlinearity and appear to satisfy the RSPF conditions, from the eight species.
For each species, 2,000 presence samples were drawn representing the locations of 2,000 observed individuals. Similarly 20,000 background samples were drawn. For each species, 100 simulations were run, and both the LI and LK methods were used to fit each simulation. The three CLK models were only fitted and plotted for one of the 100 simulations respectively, as all the 100 fits were very similar to each other for each CLK model. The fits were compared both visually (Fig. 3) and using the root mean square (RMS) error (Fig. 4) as the assessment statistics, against the true probability of presence. For the "Quadratic" and "Gaussian" species, quadratic terms of x were added to fit the true probability. As the CLK method requires an estimate of the species' prevalence, we use the true prevalence as the estimate. Sensitivity analysis was also carried out by varying the true prevalence by ±0.1, and the results are reported in Fig. 3 as well.
We note that the numerical results of the LI and LK methods reported in Phillips  Table 1 is plotted, in order to verify the RSPF conditions provided by Lele and Keim (2006) and Solymos and Lele (2016), i.e, log p(y|x, β) being non-linear. Three graphs of the probabilities of presence are nonlinear and thus satisfy the RSPF conditions, and they are Logistic-2, Quadratic and Gaussian distributions.
and Elith (2013) appear different to those reported here, because parameters of these two methods are not identifiable in some of the simulations. In our simulations, the identifiability was assessed by computing the reciprocal of the condition number, the ratio of the largest to the smallest eigenvalues of the Hessian matrix. A ratio very close to zero (not exactly zero using the Hessian matrix as the estimate) indicates an identifiability issue. We arbitrarily chose 0.001 as the threshold to assess the identifiability for each simulation. The summary statistics (means and standard errors of the estimates) were computed for the adjusted interceptβ 0 and slopeβ 1 , after removing unidentifiable simulations. In order to demonstrate the large sample equivalence between the LI and the LK methods, both methods were fitted with logit-linear and log-linear functions, and their summary statistics are shown in Table 2 and Table 3 separately. Only the slope estimates are reported in Table 3, because the intercept of the log-linear model is not identifiable for the LI and LK methods.
We also plotted the ratio of p(y=1|x,β) π for the three CLK methods, as well as the LK method fitted with log-linear and logit-linear functions respectively, shown in Fig. 5. The relative probabilities of the LK method fitted with the log and logit linear link functions were plotted for each of the 100 simulations, while the relative probabilities for the CLK methods were only plotted once due to the high similarities among the 100 replications.
The R code provided by Phillips and Elith (2013) facilitated our programming process. All model-fitting was carried out in R version 3.2.2 (R Core Team, 2016).

Results
Firstly, we see in Fig. 3 that when the true species probability of presence is logit-linear in the case of Logistic-2, the LI/LK methods fit the data well, because the logit-linear function satisfies the RSPF conditions (Lele & Keim, 2006;Solymos & Lele, 2016). In most other cases, both LI/LK methods have a wide spread for their estimates in the plots, which gives an indication of the non-identifiability of LI/LK methods in estimating the probability of presence. These happened because the probability functions for most simulated species do not satisfy the RSPF conditions (see Figure 2 for details). The Quadratic and Gaussian distributions do satisfy the RSPF conditions, however, the performance of the LI/LK model using the logit function were not good for fitting these two distributions. When the PB data is augmented with the species' prevalence, the CLK method closely approximates the true probability of presence, using the loglinear (red lines), logit (red lines) or the complementary loglog link functions (blue lines) (except for the species of Logistic 1).
Secondly we can see a close resemblance between the LI method (yellow lines) and LK method (purple lines) fitted with logit function respectively (Fig. 3). This resemblance is further supported by the RMS errors of the two methods (see two grey column charts in Fig. 4). However, there still exists some discrepancy between the two methods for some simulated species, for example with the constant and quadratic distributions. The mean and standard errors of the estimates for the LI and LK methods are similar to Figure 3: LK and LI methods were fitted for each species with a replication of 100 times (two types of grey dotted lines), with the true probability given by the black line. CLK method was fit with the logit-linear function using the true prevalence (red lines), and logit function using the true prevalence ±10% (red dashed lines). CLK method was fit with the log-linear function using the true prevalence (yellow line), and log-linear function using the true prevalence ±10% (yellow dashed lines). CLK method was fit with the complementary loglog function using the true prevalence (blue lines), and using the prevalence ±10% (blue dashed lines).    Table 2. The similar results of the LI and LK methods can also be seen in Table 3, where both methods were fitted with the log-linear functions. It is obvious that slope estimates between the LI and LK method are nearly the same. As the LK method is just the numerical approximation of the LI method, any difference between the two methods become less obvious as the number of background points increase. Upon examining Table 2, it is hard to see a clear and simple relationship between the LI/LK and the CLK estimates obtained when fitted with the logit-linear function. The estimates from both the LI and LK methods in general have higher standard errors compared to the CLK estimates. For some species such as the Quadratic or the Gaussian distribution, both the intercept and the slope have significantly large standard errors that would lead to possible rejection of the influential covariate, if we were to use the LI and LK methods to make statistical inference. However when all methods were fitted with the log-linear functions in Table 3, not only are the slope estimates of the LI and LK methods nearly the same, but they are also the same for the CLK method. The resulting relative probabilities of presence from these three models are all proportional to the true probability of presence, by a ratio of 1/ logβ 0 , estimated from the CLK method. Meanwhile in most of our simulated species, the estimates fitted by a log-linear function in general have a smaller variation compared to the estimates fitted with either a logit or complementary loglog functions. The performance of the complementary loglog functions is in particular poor, when the true probability is very low as in the species of Logistic 1 (see Fig. 3 and 4).
Although it is hard to see what the LK or LI method have estimated in Table 2, this ambiguity, however, becomes clear when we plot the relative probability of presence, i.e., the ratios p(y=1|x,β) π of the LK estimates fitted with both the logit and log-linear functions (Fig. 5). Comparing these ratios with the CLK estimates, we see that these ratios are all similar to each other, regardless of the functional form of the link function and which type of likelihood (full vs the conditional) have been used to fit the PB data. It further confirms that the LK/LI method can provide a good estimate of the relative probability of presence, when no extra information is available on either the RSPF conditions (Lele & Keim, 2006;Solymos & Lele, 2016) or the species' prevalence. Also there are some 'erratic' curves observed for the LI and LK estimates in Fig. 3 and 5 for the species with a Gaussian distribution. These estimates were again simply caused by the non-identifiability problem in the LI and LK methods.
We have also noticed an overall better fit in our simulations using the log-linear function compared to the logit and the complementary loglog functions. This may be due to the fact that most probability functions in our simulation belong to the exponential family. Meanwhile in our numerical studies, we have also observed that when the predetermined prevalence rate π varies from the true prevalence π 0 , the estimated probabilities of presence at each location from the CLK method using log-linear function still provides a consistent ranking, for the true probabilities of presence at each location. However, when the supplied estimated prevalence rate is too high, this is not true for the logit and complementary loglog functions. The estimated probabilities would level off at a probability close to 1, and therefore not able to give a correct ranking. This may be another reason that the log function is used as the link function in the popular MAXENT method.

Discussion
In this paper we have revisited some commonly used methods for modelling species probability of presence with PB data. These methods include the LI (Lancaster & Imbens, 1996), LK (Lele & Keim, 2006;Royle et al., 2012), Lele (Lele, 2009), MAXENT (Phillips & Dudík, 2008), point process models (Warton & Shepherd, 2010;Chakraborty et al., 2011), EM (Ward et al., 2009), SB (Phillips & Elith, 2011) and SC (Steinberg & Cardell, 1992) methods. It is not fair to compare their performance, because these methods are formulated on different conditions and prior information. When there is no information available on the conditions, we can conclude that all these methods for estimating the relative probability of presence, regardless of being well known or not, are essentially the same. Furthermore, these methods also have similar performance in estimating the absolute probability of presence, when the same additional information is provided. Firstly, we have shown that it is the conditional/partial likelihood that is actually employed for modelling PB data, which alone (without the extra information) can only be Figure 5: The ratio between the estimated probability of presence p(y = 1|x,β) and the estimated population prevalenceπ, are fitted with the LK method using logit (grey 1 line) and log-linear function (grey 2 line) over 100 simulations. The ratio is also fitted with the CLK Logit (red line), CLK Log (yellow line) and CLK Clog methods (blue line), using one randomly selected simulation (due to resemblance among replications).
used to make inference about the relative probability of presence. The LI, LK, Lele, MAXENT and the conditional PPM model, were built upon the conditional/partial likelihood. Other methods, such as the Poisson generalized linear regression, logistic regression and the PPM, can only estimate the probability of reporting (as opposed to the probability of presence), due to the lack of appropriate information on the true population prevalence or number of true presences Dorazio, 2014;Renner et al., 2015).
Another important contribution of the paper is to propose a constraint CLK method. It makes the LI/LK approaches comparable to other well performing methods (SC, EM and SB), when the parametric RSPF conditions are not satisfied. Under this circumstance, the LI, LK and the conditional likelihood of the PPM (or MAXENT) methods intrinsically have identification problem in estimating all the parameters relevant to the true probability of presence (or intensity). The introduction of the constraint in the CLK method guarantees a unique estimate for the probability of presence, which equals the true population prevalence if the supplied prevalence is equal to the true one. This unique estimate is just one of the multiple solutions obtained from the LI, LK and the MAXENT methods.
One may argue that the CLK method requires the population prevalence π which is sometimes hard to obtain or estimate in practice. For our purposes, the CLK method proposed in this paper serves more as a technical generalisation tool to gain insight into modelling presence-background data, and to look at the connection of seemingly different methods. On the other hand, the information of population prevalence can be obtained from either pilot studies or other types of data, for example, the presenceabsence (PA) survey data or the complementary expert map. There have been a few recent studies on the combination of PB and PA data (Dorazio, 2014;Fithian, Elith, Hastie, & Keith, 2015;Koshkina et al., 2017). These combined methods can estimate the absolute probability of presence successfully, by gaining the information of population prevalence from the PA data.
It was shown in the previous section that the PPM cannot estimate the absolute intensity of presence, given n 1 is not a true reflection of the number of presences in study area. In order to estimate the true intensity for the PPM, it can simply maximize the conditional PPM likelihood Eqn 5, with the constraint of Λ(D) = true number of presences over the study area, the same idea behind the CLK method.

Appendix A: Equivalence of EM, SB and the LI methods
In the following, we demonstrate how the Expectation-Maximisation (EM) method (Ward et al., 2009) and the scaled binomial loss model (SB) (Phillips & Elith, 2011) are essentially the same as the LI method through simple mathematical derivations. Ward et al. (2009) let z = 1 and z = 0 denote the observed presences and background data, respectively. Note that when z = 1, we know y = 1. However, when z = 0 we do not know whether y = 0 or y = 1. The EM method proposed the following likelihood for the presence-background data (Ward et al., 2009): Here n 0 is the number of observed presences denoted by z = 1, and n 1 the number of background points denoted by z = 0. The notation s = 1 is a construct of casecontrol modeling and indicates that this observation is in the presence-background data sample. The logit link function is used to model the true probability of presence, i.e., log p(y=1|x) 1−p(y=1|x) = η(x). As the information on the true presence y is missing, direct maximisation of this likelihood is difficult. The expectation-maximization (EM) technique is implemented on the full likelihood of both the true and observed presences, with the missing y imputed with its expectation. Now examine the SB method, in which the probability used in the likelihood function is defined as P U A (s = 1|x) = 1 1+r+exp(−η(x)+ln r) (Phillips & Elith, 2011), where r equals 1−fp fp π through the sampling probability of the presence points f p . The sampling probability f p in the SB method can be rewritten as f p = n 0 n 1 +n 0 , and it therefore gives r = n 0 n 1 π. The probability P U A (s = 1|x) can be re-written as P U A (s = 1|x) (SB) = 1 1 + r + exp(−η(x) + ln r) = 1 1 + πn 0 n 1 + πn 0 n 1 e −η(x) = n 1 πn 0 e η(x) 1 + (1 + n 1 πn 0 )e η(x) .

(A.2)
It is obvious that P U A (s = 1|x) used in the SB method is exactly the same as P (z|s = 1, x) of the EM method. Instead of working indirectly on the likelihood of the observed data (as the EM method), the SB method directly maximizes the observed likelihood function from the outset, by using a modification of the standard binomial loss function.
In the LI method, each observed presence is drawn uniformly with the probability h, the same as f p defined in the SB method. The likelihood function L(β, π, h) is constructed on the probability R 1n through L(β, π, h) = i R 1n (β, π, h) z i (1−R 1n (β, π, h)) 1−z i ,  Lele (2009) and our paper, making them comparable to each other.

(A.3)
Obviously R 1n (LI) = P (z|s = 1, x)(EM ) = P U A (s = 1|x)(SB), i.e. the probabilities on which the likelihood functions were formulated, are the same for these three seemingly different methods. The difference lies in the extra information required: the SB and EM methods need a pre-determined value of π, while the LI method treats π as one of the unknown parameters. In order for the LI method to identify π, the identifiability conditions listed in Lele and Keim (2006) and Solymos and Lele (2016) have to be satisfied. The numerical examples in Lancaster and Imbens (1996) paper work well, because they satisfy these parametric identifiability conditions.
(B.2) Therefore both the Lele (2009) and the LI methods are based on exactly the same likelihood functions.
Here p i is the logit function p i = 1 1+exp(−η(x)) , and P U A = 1 1+r+exp(−η(x)+ln r) = 1 1+r/p i . Taking the derivative of U (p i ) with respect to p i , setting it to zero, and the score function for p i becomes n 0 P U A = n 1 (1 − P U A ), i.e., n 0 1+r/p i = n 1 r/p i 1+r/p i . In Phillips and Elith (2011), r is defined as r = 1−fp fp π 0 , which is equivalent to n 0 n 1 π 0 , given the sampling proportion of the presence only points f p = n 1 n 1 +n 0 . It therefore yieldsp i = n 0 n 1 r = π 0 . As for the SC method (Steinberg & Cardell, 1992), the log-likelihood function is Similarly, solving the score function for p i leads to the estimate ofp i = π 0 . Therefore the proposed CLK method obtains the same estimate as the SC (Steinberg & Cardell, 1992) and SB (Phillips & Elith, 2011) methods for p i , which are all estimated to be equal to the predetermined probability of prevalence π 0 .