
1 Introduction

Wildlife poaching continues to be a global problem as key species are hunted toward extinction. For example, the latest African census showed a 30% decline in elephant populations between 2007 and 2014 [1]. Wildlife conservation areas have been established to protect these species from poachers, and these areas are protected by park rangers. These areas are vast, and rangers do not have sufficient resources to patrol everywhere with high intensity and frequency.

At many sites now, rangers patrol and collect data related to snares they confiscate, poachers they arrest, and other observations. Given rangers’ resource constraints, patrol managers could benefit from tools that analyze these data and provide future poaching predictions. However, this domain presents unique challenges. First, this domain’s real-world data are few, extremely noisy, and incomplete. To illustrate, one of rangers’ primary patrol goals is to find wire snares, which are deployed by poachers to catch animals. However, these snares are usually well-hidden (e.g., in dense grass), and thus rangers may fail to find them and (incorrectly) label an area as not having any snares. Second, poaching activity changes over time, and predictive models must account for this temporal component. Third, because poaching happens in the real world, there are mutual spatial and neighborhood effects that influence poaching activity. Finally, while field tests are crucial in determining a model’s efficacy in the world, the difficulties involved in organizing and executing field tests often preclude them.

Previous works in this domain have modeled poaching behavior with real-world data. Based on a dataset from the Queen Elizabeth Protected Area (QEPA), [6] introduced a two-layered temporal graphical model, CAPTURE, while [4] constructed an ensemble of decision trees, INTERCEPT, that accounted for spatial relationships. However, these works did not (1) account for both spatial and temporal components nor (2) validate their models via extensive field testing.

In this paper, we provide the following contributions. (1) We introduce a new hybrid model that enhances an ensemble’s broad predictive power with a spatio-temporal model’s adaptive capabilities. Because spatio-temporal models require a lot of data, this model works in two stages. First, predictions are made with an ensemble of decision trees. Second, in areas where there are sufficient data, the ensemble’s prediction is boosted via a spatio-temporal model. (2) In collaboration with the Wildlife Conservation Society and the Uganda Wildlife Authority, we designed and deployed a large, controlled experiment in QEPA. In 27 designated areas spread across QEPA, rangers patrolled approximately 452 km over the course of five months; to our knowledge, this is the largest controlled experiment and field test of Machine Learning-based predictive models in this domain. In this experiment, we tested our model’s selectiveness: is our model able to differentiate between areas of high and low poaching activity?

In experimental results, (1) we demonstrate our model’s superior performance over the state-of-the-art [4] and thus the importance of spatio-temporal modeling. (2) During our field test, rangers found over three times more snaring activity in areas where we predicted higher poaching activity. When accounting for differences in ranger coverage, rangers found twelve times the number of findings per kilometer walked in those areas. These results demonstrate that (i) our model is selective in its predictions and (ii) our model’s superior predictive performance in the laboratory extends to the real world.

2 Background and Related Work

Spatio-temporal models have been used for prediction tasks in image and video processing. Markov Random Fields (MRF) were used by [11, 12] to capture spatio-temporal dependencies in remotely sensed data and moving object detection, respectively.

Critchlow et al. [2] analyzed spatio-temporal patterns in illegal activity in Uganda’s Queen Elizabeth Protected Area (QEPA) using Bayesian hierarchical models. With real-world data, they demonstrated the importance of considering the spatial and temporal changes that occur in illegal activities. However, in this work and other similar works with spatio-temporal models [8, 9], no standard metrics were provided to evaluate the models’ predictive performance (e.g., precision, recall). As such, it is impossible to compare our predictive models’ performance to theirs. While [3] was a field test of [2]’s work, [8, 9] do not conduct field tests to validate their predictions in the real world.

In the Machine Learning literature, [6] introduced a two-layered temporal Bayesian Network predictive model (CAPTURE) that was also evaluated on real-world data from QEPA. CAPTURE, however, assumes one global set of parameters for all of QEPA, which ignores local differences in poachers’ behavior. Additionally, the first layer, which predicts poaching attacks, relies on the current year’s patrolling effort, which makes it impossible to predict future attacks (since those patrols have not happened yet). While CAPTURE includes temporal elements in its model, it does not include spatial components and thus cannot capture neighborhood-specific phenomena. In contrast to CAPTURE, [4] presented a behavior model, INTERCEPT, based on an ensemble of decision trees that was demonstrated to outperform CAPTURE. While their model accounted for spatial correlations, it did not include a temporal component. In contrast to these predictive models, our model addresses both spatial and temporal components.

It is vital to validate predictive models in the real world, and both [3, 4] have conducted field tests in QEPA. [4] conducted a one-month field test in QEPA and demonstrated promising results for predictive analytics in this domain. Unlike the field test we conducted, however, that was a preliminary field test and was not a controlled experiment. On the other hand, [3] conducted a controlled experiment where their goal, by selecting three areas for rangers to patrol, was to maximize the number of observations sighted per kilometer walked by the rangers. Their test successfully demonstrated a significant increase in illegal activity detection at two of the areas, but they did not provide comparable evaluation metrics for their predictive model. Also, our field test was much larger in scale, involving 27 patrol posts compared to their 9 posts.

3 Wildlife Crime Dataset: Features and Challenges

This study’s wildlife crime dataset is from Uganda’s Queen Elizabeth Protected Area (QEPA), an area containing a wildlife conservation park and two wildlife reserves, which spans about 2,520 km\(^2\). There are 37 patrol posts situated across QEPA from which Uganda Wildlife Authority (UWA) rangers conduct patrols to apprehend poachers, remove any snares or traps, monitor wildlife, and record signs of illegal activity. Along with the amount of patrolling effort in each area, the dataset contains 14 years (2003–2016) of the type, location, and date of wildlife crime activities.

Rangers lack the manpower to patrol everywhere all the time, and thus illegal activity may be undetected in unpatrolled areas. Patrolling is an imperfect process, and there is considerable uncertainty in the dataset’s negative data points (i.e., areas being labeled as having no illegal activity); rangers may patrol an area and label it as having no snares when, in fact, a snare was well-hidden and undetected. These factors contribute to the dataset’s already large class imbalance; there are many more negative data points than there are positive points (crime detected). It is thus necessary to consider models that estimate hidden variables (e.g., whether an area has been attacked) and also to evaluate predictive models with metrics that account for this uncertainty, such as those in the Positive and Unlabeled Learning (PU Learning) literature [5]. We divide QEPA into 1 km\(^2\) grid cells (a total of 2,522 cells), and we refer to these cells as targets. Each target is associated with several static geospatial features such as terrain (e.g., slope), distance values (e.g., distance to border), and animal density. Each target is also associated with dynamic features such as how often an area has been patrolled (i.e., coverage) and observed illegal activities (e.g., snares) (Fig. 1).

Fig. 1. Photo credit: UWA ranger

Fig. 2. Geo-clusters and graphical model

4 Models and Algorithms

4.1 Prediction by Graphical Models

Markov Random Field (MRF). To predict poaching activity, each target, at time step \(t \in \lbrace t_{1}, ..., t_{m} \rbrace \), is represented by coordinates i and j within the boundary of QEPA. In Fig. 2(a), we demonstrate a three-dimensional network for spatio-temporal modeling of poaching events over all targets. Connections between nodes represent the mutual spatial influence of neighboring targets and also the temporal dependence between recurring poaching incidents at a target. \(a^{t}_{i,j}\) represents poaching incidents at time step t and target ij. Mutual spatial and temporal influences are modeled through first-order neighbors (i.e., \(a^{t}_{i,j}\) connects to \(a^{t}_{i\pm 1,j}\), \(a^{t}_{i,j\pm 1}\), and \(a^{t-1}_{i,j}\)) and second-order neighbors (i.e., \(a^{t}_{i,j}\) connects to \(a^{t}_{i\pm 1,j\pm 1}\)); for simplicity, the latter are not shown on the model’s lattice. Each random variable takes a value in its state space; in this paper, \(\mathcal {L} = \lbrace 0,1 \rbrace \).

To avoid index overload, henceforth, nodes are indexed by serial numbers, \(\mathcal {S} = \lbrace 1,2,...,N \rbrace \), when we refer to the three-dimensional network. We introduce two random fields, indexed by \(\mathcal {S}\), with their configurations: \(\mathcal {A}=\lbrace \varvec{a} = (a_1,...,a_N)|a_i \in \mathcal {L}, i \in \mathcal {S} \rbrace \), which indicates whether an actual poaching attack occurred at each target over the period of study, and \(\mathcal {O} =\lbrace \varvec{o} =(o_1,...,o_N)|o_i \in \mathcal {L}, i \in \mathcal {S} \rbrace \), which indicates whether a poaching attack was detected at each target over the period of study. Due to the imperfect detection of poaching activities, the former represents the hidden variables, and the latter is the known observed data collected by rangers, shown by the gray-filled nodes in Fig. 2(a). Targets are related to one another via a neighborhood system, \({\mathcal {N}}_n\), the set of nodes neighboring node n (where \(n \not \in {\mathcal {N}}_n\)). This neighborhood system considers all spatial and temporal neighbors. We define neighborhood attackability as the fraction of neighbors that the model predicts to be attacked: \(u_{{\mathcal {N}}_n} ={\sum _{n' \in {\mathcal {N}}_n}a_{n'}}/{|{\mathcal {N}}_n |}\).
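For concreteness, the following sketch shows one way the neighborhood system and neighborhood attackability could be computed on the lattice; the grid dimensions, time horizon, and attack labels are illustrative assumptions, not values from the QEPA dataset.

```python
# Sketch: the spatio-temporal neighborhood system N_n and neighborhood
# attackability u_N(n). Grid size, number of time steps, and the attack
# labels `a` are illustrative assumptions.
import numpy as np

T, H, W = 3, 10, 10                                     # time steps, lattice height/width
a = np.random.default_rng(0).integers(0, 2, size=(T, H, W))  # a[t, i, j] in {0, 1}

def neighbors(t, i, j):
    """First-order (4-connected), second-order (diagonal), and one-step
    temporal neighbors of target (i, j) at time t; excludes the node itself."""
    nbrs = []
    for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1),     # first-order spatial
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]:  # second-order spatial
        if 0 <= i + di < H and 0 <= j + dj < W:
            nbrs.append((t, i + di, j + dj))
    if t > 0:
        nbrs.append((t - 1, i, j))                       # temporal neighbor
    return nbrs

def attackability(t, i, j):
    """u_N(n): fraction of n's neighbors labeled/predicted as attacked."""
    nbrs = neighbors(t, i, j)
    return sum(a[m] for m in nbrs) / len(nbrs)

print(attackability(2, 5, 5))
```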

The probability, \(p(a_n|u_{{\mathcal {N}}_n},\varvec{\alpha })\), of a poaching incident at each target n at time step t is represented in Eq. 1, where \(\varvec{\alpha }\) is a vector of parameters weighting the most important variables that influence poaching; \(\varvec{Z}\) represents the vector of time-invariant ecological covariates associated with each target (e.g., animal density, slope, forest cover, net primary productivity, and distance from patrol posts, towns, and rivers [2, 7]). The model’s temporal dimension is reflected not only through the backward dependence of each \(a_{n}\), which influences the computation of \(u_{{\mathcal {N}}_n}\), but also in the past patrol coverage at target n, denoted by \(c^{t-1}_{n}\), which models the delayed deterrence effect of patrolling efforts.

$$\begin{aligned} p(a_{n}=1|u_{{\mathcal {N}}_n},\varvec{\alpha })= \dfrac{e^{-{\varvec{\alpha }} [\varvec{Z}, u_{{\mathcal {N}}_n}, c^{t-1}_{n}, 1]^\intercal }}{1+e^{-\varvec{\alpha } [\varvec{Z}, u_{{\mathcal {N}}_n}, c^{t-1}_{n}, 1]^\intercal }} \end{aligned}$$
(1)

Given \(a_n\), \(o_n\) follows the conditional probability distribution proposed in Eq. 2, which represents the probability of rangers detecting a poaching attack at target n. The first column of the matrix denotes the probabilities of not detecting or detecting an attack if an attack has not happened, which are constrained to 1 and 0, respectively; in other words, it is impossible to detect an attack when an attack has not happened. The second column of the matrix represents the probabilities of not detecting or detecting an attack, in the form of a logistic function, if an attack has happened. Since it is less rational for poachers to place snares close to patrol posts and easier for rangers to detect poaching signs near patrol posts, we assume \(dp_{n}\) (distance from patrol post) and \(c^{t}_{n}\) (patrol coverage devoted to target n at time t) are the major variables influencing rangers’ detection capabilities. Detectability at each target is represented in Eq. 2, where \(\varvec{\beta }\) is a vector of parameters that weight these variables.

$$\begin{aligned} p(o_{n} |a_{n})&= \begin{bmatrix} p(o_n=0|a_n=0) & p(o_n=0|a_n=1, \varvec{\beta }) \\[5pt] p(o_n=1|a_n=0) & p(o_n=1|a_n=1, \varvec{\beta }) \end{bmatrix} = \begin{bmatrix} 1 & \dfrac{1}{1+e^{-\varvec{\beta }[dp_{n}, c^{t}_{n},1]^\intercal }} \\[10pt] 0 & \dfrac{e^{-\varvec{\beta }[dp_{n}, c^{t}_{n},1]^\intercal }}{1+e^{-\varvec{\beta }[dp_{n}, c^{t}_{n},1]^\intercal }} \end{bmatrix} \end{aligned}$$
(2)

We assume the pairs \((o_n, a_n)\) are independent across targets, meaning \(p(\varvec{o},\varvec{a}) = \prod _{n \in \mathcal {S}} p(o_n, a_n)\).
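As an illustration of Eqs. 1 and 2, a minimal sketch of the two probability functions follows; the covariate values and parameter vectors are assumed placeholders.

```python
# Sketch of Eqs. (1) and (2). p_attack is the logistic attack probability with
# features [Z, u_N(n), c^{t-1}_n, 1]; p_detect is the detection probability
# p(o_n = 1 | a_n = 1) with features [dp_n, c^t_n, 1]. All inputs below are
# assumed placeholder values.
import numpy as np

def paper_logistic(v):
    """e^{-v} / (1 + e^{-v}), the form used in Eqs. (1) and (2)."""
    return np.exp(-v) / (1.0 + np.exp(-v))

def p_attack(z, u_nbr, cov_prev, alpha):
    """Eq. (1): p(a_n = 1 | u_N(n), alpha)."""
    x = np.concatenate([z, [u_nbr, cov_prev, 1.0]])
    return paper_logistic(alpha @ x)

def p_detect(dp, cov, beta):
    """Eq. (2): p(o_n = 1 | a_n = 1, beta); detection given no attack is 0."""
    return paper_logistic(beta @ np.array([dp, cov, 1.0]))

z = np.array([0.7, 0.2, 0.1])                # assumed ecological covariates Z
alpha, beta = np.zeros(6), np.zeros(3)       # placeholder parameter vectors
print(p_attack(z, u_nbr=0.25, cov_prev=1.5, alpha=alpha),
      p_detect(dp=3.0, cov=2.0, beta=beta))
```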

EM Algorithm to Infer on MRF. We use the Expectation-Maximization (EM) algorithm to estimate the MRF model’s parameters \(\varvec{\theta }= \lbrace \varvec{\alpha }, \varvec{\beta }\rbrace \). For completeness, we provide details about how we apply the EM algorithm to our model. Given a joint distribution \(p(\varvec{o}, \varvec{a} |\varvec{\theta })\) over observed variables \(\varvec{o}\) and hidden variables \(\varvec{a}\), governed by parameters \(\varvec{\theta }\), EM aims to maximize the likelihood function \(p(\varvec{o} |\varvec{\theta })\) with respect to \(\varvec{\theta }\). To start the algorithm, an initial setting for the parameters \(\varvec{\theta }^{old}\) is chosen. In the E-step, \(p(\varvec{a} \vert \varvec{o}, \varvec{\theta }^{old})\) is evaluated for each node in the MRF model:

$$\begin{aligned} p(a_n \vert o_n, \varvec{\theta }^{old}) = \dfrac{p(o_n |a_n,\varvec{\beta }^{old}) \cdot p(a_n |u_{{\mathcal {N}}_n}^{old},\varvec{\alpha }^{old})}{p(o_n)} \end{aligned}$$
(3)

The M-step calculates \(\varvec{\theta }^{new}\) according to the expectation of the complete log likelihood, \(\log p(\varvec{o}, \varvec{a} |\varvec{\theta })\), given in Eq. 4.

$$\begin{aligned} \varvec{\theta }^{new} = \arg \max _{\varvec{\theta }} \sum _{a_n \in \mathcal {L}} p(\varvec{a} |\varvec{o}, \varvec{\theta }^{old}) \cdot \log p(\varvec{o}, \varvec{a} |\varvec{\theta }) \end{aligned}$$
(4)

To facilitate calculation of the log of the joint probability distribution, \(\log p(\varvec{o},\varvec{a} |\varvec{\theta })\), we introduce an approximation that makes use of \(u_{{\mathcal {N}}_n}^{old}\), represented in Eq. 5.

$$\begin{aligned} \log p(\varvec{o},\varvec{a} |\varvec{\theta }) = \sum _{n \in \mathcal {S}}\sum _{a_n \in \mathcal {L}}\left[ \log p(o_n |a_n, \varvec{\beta })+\log p(a_n |u_{{\mathcal {N}}_n}^{old}, \varvec{\alpha }) \right] \end{aligned}$$
(5)

Then, if the log likelihood has not converged, we set \(\varvec{\theta }^{old} \leftarrow \varvec{\theta }^{new}\) and repeat.
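The following is a minimal, runnable sketch of this EM loop for a simplified, non-spatial variant of the model (the spatial term \(u_{{\mathcal {N}}_n}\) is dropped and the M-step is a single gradient-ascent step on the expected complete log likelihood, i.e., generalized EM); all data are synthetic assumptions.

```python
# Runnable EM sketch for a simplified, non-spatial variant of the model.
# E-step (Eq. 3): posterior q_n = p(a_n = 1 | o_n); note o_n = 1 forces a_n = 1.
# M-step (Eqs. 4-5): one gradient-ascent step on the expected complete log
# likelihood (generalized EM). All data below are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 500
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])   # attack features [Z, 1]
Y = np.column_stack([rng.uniform(0, 5, size=N),              # distance to post dp_n
                     rng.uniform(0, 3, size=N),              # coverage c^t_n
                     np.ones(N)])

def logistic(v):                      # e^{-v} / (1 + e^{-v}) = 1 / (1 + e^{v})
    return 1.0 / (1.0 + np.exp(v))

a_true = rng.random(N) < 0.3                                 # hidden attacks
o = a_true & (rng.random(N) < 0.6)                           # imperfect detection

alpha, beta = np.zeros(3), np.zeros(3)
for _ in range(200):
    pa = logistic(X @ alpha)          # p(a_n = 1), Eq. (1) without spatial term
    pd = logistic(Y @ beta)           # p(o_n = 1 | a_n = 1), Eq. (2)
    # E-step: q_n = 1 if detected; otherwise Bayes' rule over {attacked, not}.
    q = np.where(o, 1.0, pa * (1 - pd) / (pa * (1 - pd) + (1 - pa)))
    # M-step: gradients of the expected complete log likelihood.
    alpha += 0.01 * X.T @ (pa - q)
    beta += 0.01 * Y.T @ (q * (pd - o))

print("alpha:", alpha.round(2), "beta:", beta.round(2))
```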

Dataset Preparation for MRF. To split the data into training and test sets, we divided the real-world dataset into year-long time steps. We trained the model’s parameters \(\varvec{\theta }= \lbrace \varvec{\alpha }, \varvec{\beta }\rbrace \) on historical data sampled through time steps \((t_1,...,t_m)\) for all targets within the boundary. These parameters were used to predict poaching activity at time step \(t_{m+1}\), which represents the test set for evaluation purposes. The trade-off between predictive performance (adding more years of data) and computational cost led us to use three years (\(m=3\)). The model was thus trained over targets that were patrolled throughout the training time period \((t_1, t_2, t_3)\). We examined three training sets, 2011–2013, 2012–2014, and 2013–2015, for which the test sets are from 2014, 2015, and 2016, respectively.
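A sketch of this sliding-window split follows, assuming a per-target, per-year table with illustrative column names.

```python
# Sketch of the sliding-window split: train on three year-long time steps,
# test on the following year. `records` is an assumed per-target, per-year
# table with illustrative columns.
import pandas as pd

records = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016] * 2,
    "target": [1] * 6 + [2] * 6,
    "attacked": [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

def window_split(df, start, m=3):
    train = df[df.year.between(start, start + m - 1)]  # time steps (t_1, ..., t_m)
    test = df[df.year == start + m]                    # time step t_{m+1}
    return train, test

for start in (2011, 2012, 2013):
    tr, te = window_split(records, start)
    print(f"train {start}-{start + 2}: {len(tr)} rows; test {start + 3}: {len(te)} rows")
```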

Capturing temporal trends requires a sufficient amount of data to be collected regularly across time steps for each target. Due to the large amount of missing inspections and uncertainty in the collected data, this model focuses on learning poaching activity only over regions that have been continually monitored in the past, according to Definition 1. We denote this subset of targets as \({\mathcal {S}}_c\).

Definition 1

Continually vs. occasionally monitored: A target ij is continually monitored if all elements of its coverage sequence are positive, i.e., \(c_{i,j}^{t_k}>0, \forall k=1,...,m\), where m is the number of time steps. Otherwise, it is occasionally monitored.
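In code, Definition 1 reduces to checking that every element of a target’s coverage sequence is positive; the coverage values below are assumed for illustration.

```python
# Definition 1 in code: a target is continually monitored if its coverage is
# positive at every time step. The coverage sequences below are assumed.
coverage = {
    (4, 7): [1.2, 0.8, 2.1],   # positive at all m = 3 steps -> continually monitored
    (5, 7): [0.0, 1.5, 0.3],   # one zero-coverage step -> occasionally monitored
}

def continually_monitored(c_seq):
    return all(c > 0 for c in c_seq)

S_c = {ij for ij, seq in coverage.items() if continually_monitored(seq)}
print(S_c)  # {(4, 7)}
```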

Experiments with MRF were conducted in various ways on each data set. We refer to (a) a global model with spatial effects as GLB-S, which consists of a single set of parameters \(\varvec{\theta }\) for the whole QEPA, and (b) a global model without spatial effects (i.e., the parameter that corresponds to \(u_{{\mathcal {N}}_n}\) is set to 0) as GLB. The spatio-temporal model is designed to account for temporal and spatial trends in poaching activities. However, since learning those trends and capturing spatial effects are impacted by the variance in local poachers’ behaviors, we also examined (c) a geo-clustered model which consists of multiple sets of local parameters throughout QEPA with spatial effects, referred to as GCL-S, and also (d) a geo-clustered model without spatial effects (i.e., the parameter that corresponds to \(u_{{\mathcal {N}}_n}\) is set to 0) referred to as GCL.

Figure 2(b) shows the geo-clusters generated by a Gaussian Mixture Model (GMM), which classifies the targets into 22 clusters based on the geo-spatial features, \(\varvec{Z}\), along with the targets’ coordinates, \((x_{i,j},y_{i,j})\). The number of geo-clusters, 22, was chosen to be close to the number of patrol posts in QEPA so that each cluster contains one or two nearby patrol posts. In this way, not only are local poachers’ behaviors described by a distinct set of parameters, but the data collection conditions over the targets within each cluster also remain nearly uniform.
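A sketch of this clustering step using scikit-learn’s GaussianMixture follows; the feature dimensions and coordinates are synthetic stand-ins for \(\varvec{Z}\) and \((x_{i,j},y_{i,j})\).

```python
# Sketch: geo-clustering with a Gaussian Mixture Model over [Z, x, y] using
# 22 components. The covariates and coordinates are synthetic stand-ins.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_targets = 2522
Z = rng.normal(size=(n_targets, 5))             # assumed ecological covariates
xy = rng.uniform(0, 50, size=(n_targets, 2))    # target coordinates (x, y)

gmm = GaussianMixture(n_components=22, random_state=0)
cluster_of = gmm.fit_predict(np.hstack([Z, xy]))  # geo-cluster id q per target
print(np.bincount(cluster_of))                    # targets per geo-cluster
```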

4.2 Prediction by Ensemble Models

Bagging (bootstrap aggregation) is an ensemble learning technique that trains weak learners, such as decision trees, on many bootstrap replicates of the dataset. Each bootstrap replicate is obtained by randomly choosing M observations out of M with replacement, where M denotes the training dataset size. The ensemble’s predicted response is computed by averaging the predictions of its individual decision trees. To learn a Bagging ensemble, we used the fitensemble function of MATLAB 2017a. Dataset preparation for the Bagging ensemble model is designed to find the targets that are liable to be attacked [4]. A target is assumed to be attackable if it has ever been attacked; if any observations occurred at a given target during the entire training period, that target is labeled as attackable. For this model, the best training period contained five years of data.
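While we used MATLAB’s fitensemble, the following scikit-learn sketch illustrates the same Bagging construction; the features and labels are synthetic stand-ins for the per-target attackability data.

```python
# A scikit-learn analogue of the fitensemble call: a Bagging ensemble whose
# default base learner is a decision tree, each trained on M-of-M samples
# drawn with replacement. Features/labels are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                   # static + dynamic features
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 1).astype(int)  # attackable?

bag = BaggingClassifier(n_estimators=100,        # number of decision trees
                        max_samples=1.0,         # M of M observations...
                        bootstrap=True)          # ...drawn with replacement
bag.fit(X, y)
print(bag.predict_proba(X[:3])[:, 1])            # averaged tree predictions
```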

4.3 Hybrid of MRF and Bagging Ensemble

Since the amount and regularity of data collected by rangers vary across regions of QEPA, predictive models perform differently in different regions. As such, we propose using different models to predict over them; first, we make predictions with a Bagging ensemble model, and then improve the predictions in some regions using the spatio-temporal model. For global models, we used MRF for all continually monitored targets. However, for geo-clustered models, our experiments showed that the MRF model’s performance varied widely across geo-clusters, even when restricted to each cluster’s continually monitored subset, \({\mathcal {S}}_c^{q}\), where temporally-aware models can be used practically (q indicates the cluster, \( 1 \le q \le 22 \)). Thus, for each q, if the average Catch Per Unit Effort (CPUE), outlined by Definition 2, is relatively large, we use the MRF model for \({\mathcal {S}}_c^{q}\). In Conservation Biology, CPUE is an indirect measure of poaching activity abundance. A larger average CPUE for each cluster corresponds to more frequent poaching activity and thus more data for that cluster. Consequently, using more complex spatio-temporal models in those clusters becomes more reasonable.

Definition 2

Average CPUE is \(\sum _{n \in {\mathcal {S}}_c^{q}}o_n/\sum _{n \in {\mathcal {S}}_c^{q}}c_n^t\) in cluster q.

To compute CPUE, effort corresponds to the amount of coverage (i.e., 1 unit = 1 km walked) in a given target, and catch corresponds to the number of observations. Hence, for \( 1 \le q \le 22 \), we boost selectively according to the average CPUE value; clusters that are not boosted by MRF use only the Bagging ensemble model for predictions. Experiments on historical data show that selecting the \(15\%\) of geo-clusters with the highest average CPUE results in the best performance for the entire hybrid model (discussed in the Evaluation Section).
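A sketch of this selective-boosting rule follows, with illustrative stand-ins for the cluster assignments, observations, coverage, and the two models’ predictions.

```python
# Sketch of the hybrid's selective boosting: average CPUE per geo-cluster
# (Definition 2), then MRF predictions replace the Bagging ensemble's in the
# top 15% of clusters. All arrays are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n, n_clusters = 2522, 22
cluster_of = rng.integers(0, n_clusters, size=n)  # geo-cluster id per target
o = rng.integers(0, 2, size=n)                    # catch: observations o_n
c = rng.uniform(0.1, 5.0, size=n)                 # effort: coverage c^t_n (km)
bg_pred = rng.random(n)                           # Bagging ensemble scores
mrf_pred = rng.random(n)                          # MRF scores (on S_c^q)

cpue = np.array([o[cluster_of == q].sum() / c[cluster_of == q].sum()
                 for q in range(n_clusters)])
boosted = cpue >= np.percentile(cpue, 85)         # top 15% by average CPUE

hybrid = np.where(boosted[cluster_of], mrf_pred, bg_pred)
print(f"{boosted.sum()} of {n_clusters} clusters boosted by MRF")
```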

5 Evaluations and Discussions

5.1 Evaluation Metrics

The imperfect detection of poaching activities in wildlife conservation areas leads to uncertainty in the negative class labels of data samples [4]. It is thus vital to evaluate prediction results based on metrics which account for this inherent uncertainty. In addition to standard metrics in Machine Learning (e.g., precision, recall, F1), which are used to evaluate models on datasets with no uncertainty in the underlying ground truth, we also use the L&L metric introduced in [5], a metric specifically designed for models learned on Positive and Unlabeled datasets. L&L is defined as \(L\&L = \frac{r^2}{Pr[f(Te)=1]}\), where r denotes the recall and \(Pr[f(Te)=1]\) denotes the probability of a classifier f making a positive class label prediction.
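A direct computation of this metric is straightforward; the label vectors below are assumed for illustration.

```python
# The L&L metric: recall squared over the fraction of positive predictions.
# The label vectors below are assumed for illustration.
import numpy as np

def l_and_l(y_true, y_pred):
    recall = (y_pred & y_true).sum() / max(y_true.sum(), 1)
    frac_pos = y_pred.mean()              # Pr[f(Te) = 1]
    return recall**2 / frac_pos if frac_pos > 0 else 0.0

y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1], dtype=bool)
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1], dtype=bool)
print(l_and_l(y_true, y_pred))            # (3/4)^2 / (4/8) = 1.125
```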

5.2 Experiments with Real-World Data

Evaluations of the models’ attack predictions are presented in Tables 1 and 2. Precision and recall are denoted by Prec. and Rec. in the tables. To compare models’ performances, we used several baseline methods: (i) Positive Baseline, PB, a model that predicts poaching attacks to occur in all targets; (ii) Random Baseline, RB, a model which flips a coin to decide its prediction; (iii) Training Label Baseline, TL, a model which predicts a target as attacked if it has ever been attacked in the training data. We also present results for Support Vector Machines, SVM, and AdaBoost, AD, which are well-known Machine Learning techniques, along with results for the best performing predictive model on the QEPA dataset, INTERCEPT, INT [4]. Results for the Bagging ensemble technique, BG, and RUSBoost, RUS, a hybrid sampling/boosting algorithm for learning from datasets with class imbalance [10], are also presented. In all tables, BGG* stands for the best performing model among all variations of the hybrid model, which will be discussed in detail later. Table 1 demonstrates that BGG* outperformed all other existing models in terms of L&L and also F1.

Table 1. Comparing all models’ performances with the best performing BGG model.

Table 2 provides a detailed comparison of all variations of our hybrid models, BGG (i.e., when different MRF models are used). When GCL-S is used, we get the best performing model in terms of L&L score, which is denoted as BGG*. The poor results of learning a global set of parameters emphasize the fact that poachers’ behavior and patterns are not identical throughout QEPA and should be modeled accordingly.

Table 2. Performances of hybrid models with variations of MRF (BGG models)

Our experiments demonstrated that the performance of the MRF model within \({\mathcal {S}}_c^{q}\) varies across geo-clusters and is related to the CPUE value of each cluster, q. Figure 3(a) displays the improvement in L&L score of the BGG* model over BG as the percentile of geo-clusters used for boosting varies. Experiments with the 2014 test set show that choosing the \(85^{\text {th}}\) percentile of geo-clusters for boosting with MRF, according to CPUE (i.e., selecting the \(15\%\) of geo-clusters with the highest CPUE), results in the best prediction performance. The \(85^{\text {th}}\) percentile is marked by vertical lines in the figures where the BGG* model outperformed the BG model. We used a similar percentile value for experiments with the MRF model on the 2015 and 2016 test sets. Figure 3(b) and (c) confirm the effectiveness of choosing the \(85^{\text {th}}\) percentile value.

Fig. 3. L&L improvement vs. CPUE percentile value; BGG* compared to BG

6 QEPA Field Test

While our model demonstrated superior predictive performance on historical data, it is important to test these models in the field.

Fig. 4. Patrol area statistics

The initial field test we conducted in [4], in collaboration with the Wildlife Conservation Society (WCS) and the Uganda Wildlife Authority (UWA), was the first of its kind in the Machine Learning (ML) community and showed promising improvements over previous patrolling regimes. Due to the difficulty of organizing such a field test, its implications were limited: only two 9-km\(^2\) areas (18 km\(^2\)) of QEPA were patrolled by rangers over a month. Because of its success, however, WCS and UWA graciously agreed to a larger scale, controlled experiment: again with 9-km\(^2\) areas, but with rangers patrolling 27 of these areas (243 km\(^2\), spread across QEPA) over five months; this is the largest field test to date of ML-based predictive models in this domain. We show the areas in Fig. 4(a). Note that rangers patrolled these areas in addition to other areas of QEPA as part of their normal duties.

This experiment’s goal was to determine the selectiveness of our model’s snare attack predictions: does our model correctly predict both where there are and are not snare attacks? We define the attack prediction rate as the proportion of targets (1 km by 1 km cells) in a patrol area (3 by 3 cells) that are predicted to be attacked. We considered two experiment groups that corresponded to our model’s attack prediction rates from November 2016 to March 2017: High (group 1) and Low (group 2). Areas with an attack prediction rate of 50% or greater were assigned to group 1; areas with a lower rate were assigned to group 2. For example, if the model predicted five out of nine targets to be attacked in an area, that area was in group 1. Due to the importance of QEPA for elephant conservation, we do not show which areas belong to which experiment group in Fig. 4(a) so that we do not provide data to ivory poachers.
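The group-assignment rule can be stated compactly in code; the 3-by-3 prediction grid below is an illustrative example, not an actual model output.

```python
# The group-assignment rule: an area is High (group 1) if at least half of
# its 3x3 targets are predicted attacked, else Low (group 2). The prediction
# grid is an illustrative example, not actual model output.
import numpy as np

def group_of(area_preds):
    rate = area_preds.mean()              # attack prediction rate
    return 1 if rate >= 0.5 else 2

area = np.array([[1, 0, 1],
                 [1, 0, 0],
                 [1, 1, 0]])              # 5 of 9 targets predicted attacked
print(group_of(area))                     # -> 1 (High)
```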

To start, we exhaustively generated all patrol areas such that (1) each patrol area was \(3\,\times \,3\) km\(^2\), (2) no point in the patrol area was more than 5 km away from the nearest ranger patrol post, and (3) no patrol area was patrolled too frequently or infrequently in past years (to ensure that the training data associated with all areas was of similar quality); in all, 544 areas were generated across QEPA. Then, using the model’s attack predictions, each area was assigned to an experiment group. Because we were not able to test all 544 areas, we selected a subset such that no two areas overlapped with each other and no more than two areas were selected for each patrol post (due to manpower constraints). In total, 5 areas in group 1 and 22 areas in group 2 were chosen. Note that this composition arose due to the preponderance of group 2 areas (see Table 3). We provide a breakdown of the areas’ exact attack prediction rates in Fig. 4(b); areas with rates below 56% (5/9) were in group 2, and for example, there were 8 areas in group 2 with a rate of 22% (2/9). Finally, when we provided patrols to the rangers, experiment group memberships were hidden to prevent effects where knowledge of predicted poaching activity would influence their patrolling patterns and detection rates.
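A sketch of this generation-and-filtering step follows, with assumed grids for distance-to-post and past patrol frequency; the frequency bounds are illustrative, as the exact thresholds are not specified here.

```python
# Sketch of patrol-area generation: enumerate all 3x3 blocks, keep those
# entirely within 5 km of a patrol post and with moderate past patrol
# frequency. Grids and frequency bounds are assumed; the exact frequency
# thresholds are not specified in the text.
import numpy as np

rng = np.random.default_rng(0)
H, W = 40, 60
dist_to_post = rng.uniform(0, 12, size=(H, W))  # km to nearest patrol post
past_freq = rng.uniform(0, 1, size=(H, W))      # normalized patrol frequency

areas = []
for i in range(H - 2):
    for j in range(W - 2):
        if (dist_to_post[i:i+3, j:j+3].max() <= 5.0
                and 0.2 <= past_freq[i:i+3, j:j+3].mean() <= 0.8):
            areas.append((i, j))                # top-left cell of the area
print(len(areas), "candidate patrol areas")
```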

Table 3. Patrol area group memberships

6.1 Field Test Results and Discussion

The field test data we received was in the same format as the historical data. However, because rangers needed to physically walk to these patrol areas, we received additional data that we have omitted from this analysis; observations made outside of a designated patrol area were not counted. Because we only predicted where snaring activity would occur, we have also omitted other observation types made during the experiment (e.g., illegal cattle grazing). We present results from this five-month field test in Table 4. To provide additional context for these results, we also computed QEPA’s park-wide historical CPUE (from November 2015 to March 2016): 0.04.

Table 4. Field test results: observations

Areas with a high attack prediction rate (group 1) had significantly more snare sightings than areas with low attack prediction rates (15 vs. 4). This is despite there being far fewer group 1 areas than group 2 areas (5 vs. 22); on average, group 1 areas had 3 snare observations whereas group 2 areas had 0.18 observations. It is worth noting the large standard deviation of the mean observation counts; the standard deviation of 5.2 around the mean of 3 signifies that not all areas had snare observations. Indeed, only two of the five areas in group 1 had snare observations. However, this also applies to group 2’s areas: only 3 out of 22 areas had snare observations.

We present Catch per Unit Effort (CPUE) results in Table 4. When accounting for differences in areas’ effort, group 1 areas had a CPUE that was over ten times that of group 2 areas. Moreover, when compared to QEPA’s park-wide historical CPUE of 0.04, it is clear that our model successfully differentiated between areas of high and low snaring activity. The results of this large-scale field test, the first of its kind for ML models in this domain, demonstrated that our model’s superior predictive performance in the laboratory extends to the real world.

7 Conclusion

In this paper, we presented a hybrid spatio-temporal model to predict wildlife poaching threat levels. Additionally, we validated our model via an extensive five-month field test in Queen Elizabeth Protected Area (QEPA), in which rangers patrolled over 450 km across QEPA; this is the largest field test to date of Machine Learning-based models in this domain. On real-world historical data from QEPA, our hybrid model achieves significantly better performance than prior work. On the data collected from our field test, we demonstrated that our model successfully differentiated between areas of high and low snaring activity. These findings demonstrate that our model’s predictions are selective and that its superior laboratory performance extends to the real world. Based on these promising results, future work will focus on deploying these models as part of a software package for UWA to aid in planning future anti-poaching patrols.