## Abstract

Worldwide, conservation agencies employ rangers to protect conservation areas from poachers. However, agencies lack the manpower to have rangers effectively patrol these vast areas frequently. While past work has modeled poachers’ behavior so as to aid rangers in planning future patrols, those models’ predictions were not validated by extensive field tests. In this paper, we present a hybrid spatio-temporal model that predicts poaching threat levels and results from a five-month field test of our model in Uganda’s Queen Elizabeth Protected Area (QEPA). To our knowledge, this is the first time that a predictive model has been evaluated through such an extensive field test in this domain. We present two major contributions. First, our hybrid model consists of two components: (i) an ensemble model which can work with the limited data common to this domain and (ii) a spatio-temporal model to boost the ensemble’s predictions when sufficient data are available. When evaluated on real-world historical data from QEPA, our hybrid model achieves significantly better performance than previous approaches with either temporally-aware dynamic Bayesian networks or an ensemble of spatially-aware models. Second, in collaboration with the Wildlife Conservation Society and Uganda Wildlife Authority, we present results from a five-month controlled experiment *where rangers patrolled over 450 sq km across QEPA*. We demonstrate that our model successfully predicted (1) where snaring activity would occur and (2) where it would not occur; in areas where we predicted a high rate of snaring activity, rangers found more snares and snared animals than in areas of lower predicted activity. These findings demonstrate that (1) our model’s predictions are selective, (2) our model’s superior laboratory performance extends to the real world, and (3) these predictive models can aid rangers in focusing their efforts to prevent wildlife poaching and save animals.

Shahrzad Gholami and Benjamin Ford are both first authors of this paper.

## Keywords

- Predictive models
- Ensemble techniques
- Graphical models
- Field test evaluation
- Wildlife protection
- Wildlife poaching

## 1 Introduction

Wildlife poaching continues to be a global problem as key species are hunted toward extinction. For example, the latest African census showed a 30% decline in elephant populations between 2007 and 2014 [1]. Wildlife conservation areas have been established to protect these species from poachers, and these areas are protected by park rangers. These areas are vast, and rangers do not have sufficient resources to patrol everywhere with high intensity and frequency.

At many sites now, rangers patrol and collect data related to snares they confiscate, poachers they arrest, and other observations. Given rangers’ resource constraints, patrol managers could benefit from tools that analyze these data and provide future poaching predictions. However, this domain presents unique challenges. First, this domain’s real-world data are few, extremely noisy, and incomplete. To illustrate, one of rangers’ primary patrol goals is to find wire snares, which are deployed by poachers to catch animals. However, these snares are usually well-hidden (e.g., in dense grass), and thus rangers may not find these snares and (incorrectly) label an area as not having any snares. Second, poaching activity changes over time, and predictive models must account for this temporal component. Third, because poaching happens in the real world, there are mutual spatial and neighborhood effects that influence poaching activity. Finally, while field tests are crucial in determining a model’s efficacy in the world, the difficulties involved in organizing and executing field tests often preclude them.

Previous works in this domain have modeled poaching behavior with real-world data. Based on data from a Queen Elizabeth Protected Area (QEPA) dataset, [6] introduced a two-layered temporal graphical model, CAPTURE, while [4] constructed an ensemble of decision trees, INTERCEPT, that accounted for spatial relationships. However, these works neither (1) accounted for both spatial and temporal components nor (2) validated their models via extensive field testing.

In this paper, we provide the following contributions. (1) We introduce a new hybrid model that enhances an ensemble’s broad predictive power with a spatio-temporal model’s adaptive capabilities. Because spatio-temporal models require a lot of data, this model works in two stages. First, predictions are made with an ensemble of decision trees. Second, in areas where there are sufficient data, the ensemble’s prediction is boosted via a spatio-temporal model. (2) In collaboration with the Wildlife Conservation Society and the Uganda Wildlife Authority, we designed and deployed a large, controlled experiment to QEPA. Across 27 areas we designated across QEPA, rangers patrolled approximately 452 km over the course of five months; to our knowledge, this is the largest controlled experiment and field test of Machine Learning-based predictive models in this domain. In this experiment, we tested our model’s selectiveness: is our model able to differentiate between areas of high and low poaching activity?

In experimental results, (1) we demonstrate our model’s superior performance over the state-of-the-art [4] and thus the importance of spatio-temporal modeling. (2) During our field test, rangers found over three times more snaring activity in areas where we predicted higher poaching activity. When accounting for differences in ranger coverage, rangers found twelve times the number of findings per kilometer walked in those areas. These results demonstrate that (i) our model is selective in its predictions and (ii) our model’s superior predictive performance in the laboratory extends to the real world.

## 2 Background and Related Work

Spatio-temporal models have been used for prediction tasks in image and video processing. Markov Random Fields (MRF) were used by [11, 12] to capture spatio-temporal dependencies in remotely sensed data and moving object detection, respectively.

Critchlow et al. [2] analyzed spatio-temporal patterns in illegal activity in Uganda’s Queen Elizabeth Protected Area (QEPA) using Bayesian hierarchical models. With real-world data, they demonstrated the importance of considering the spatial and temporal changes that occur in illegal activities. However, in this work and other similar works with spatio-temporal models [8, 9], no standard metrics were provided to evaluate the models’ predictive performance (e.g., precision, recall). As such, it is impossible to compare our predictive models’ performance to theirs. While [3] was a field test of [2]’s work, [8, 9] did not conduct field tests to validate their predictions in the real world.

In the Machine Learning literature, [6] introduced a two-layered temporal Bayesian Network predictive model (CAPTURE) that was also evaluated on real-world data from QEPA. CAPTURE, however, assumes one global set of parameters for all of QEPA, which ignores local differences in poachers’ behavior. Additionally, the first layer, which predicts poaching attacks, relies on the current year’s patrolling effort, which makes it impossible to predict future attacks (since patrols haven’t happened yet). While CAPTURE includes temporal elements in its model, it does not include spatial components and thus cannot capture neighborhood-specific phenomena. In contrast to CAPTURE, [4] presented a behavior model, INTERCEPT, which is based on an ensemble of decision trees and was demonstrated to outperform CAPTURE. While their model accounted for spatial correlations, it did not include a temporal component. In contrast to these predictive models, our model addresses both spatial and temporal components.

It is vital to validate predictive models in the real world, and both [3, 4] have conducted field tests in QEPA. [4] conducted a one-month field test in QEPA and demonstrated promising results for predictive analytics in this domain. Unlike the field test we conducted, however, that was a preliminary field test and was not a controlled experiment. On the other hand, [3] conducted a controlled experiment where their goal, by selecting three areas for rangers to patrol, was to maximize the number of observations sighted per kilometer walked by the rangers. Their test successfully demonstrated a significant increase in illegal activity detection at two of the areas, but they did not provide comparable evaluation metrics for their predictive model. Also, our field test was much larger in scale, involving 27 patrol posts compared to their 9 posts.

## 3 Wildlife Crime Dataset: Features and Challenges

This study’s wildlife crime dataset is from Uganda’s Queen Elizabeth Protected Area (QEPA), an area containing a wildlife conservation park and two wildlife reserves, which spans about 2,520 km\(^2\). There are 37 patrol posts situated across QEPA from which Uganda Wildlife Authority (UWA) rangers conduct patrols to apprehend poachers, remove any snares or traps, monitor wildlife, and record signs of illegal activity. Along with the amount of patrolling effort in each area, the dataset contains 14 years (2003–2016) of the type, location, and date of wildlife crime activities.

Rangers lack the manpower to patrol everywhere all the time, and thus illegal activity may be undetected in unpatrolled areas. Patrolling is an imperfect process, and there is considerable uncertainty in the dataset’s negative data points (i.e., areas being labeled as having no illegal activity); rangers may patrol an area and label it as having no snares when, in fact, a snare was well-hidden and undetected. These factors contribute to the dataset’s already large class imbalance; there are many more negative data points than there are positive points (crime detected). It is thus necessary to consider models that estimate hidden variables (e.g., whether an area has been attacked) and also to evaluate predictive models with metrics that account for this uncertainty, such as those in the Positive and Unlabeled Learning (PU Learning) literature [5]. We divide QEPA into 1 km\(^2\) grid cells (a total of 2,522 cells), and we refer to these cells as targets. Each target is associated with several static geospatial features such as terrain (e.g., slope), distance values (e.g., distance to border), and animal density. Each target is also associated with dynamic features such as how often an area has been patrolled (i.e., coverage) and observed illegal activities (e.g., snares) (Fig. 1).

## 4 Models and Algorithms

### 4.1 Prediction by Graphical Models

**Markov Random Field (MRF).** To predict poaching activity, each target, at time step \(t \in \lbrace t_{1}, ..., t_{m} \rbrace \), is represented by coordinates *i* and *j* within the boundary of QEPA. In Fig. 2(a), we demonstrate a three-dimensional network for spatio-temporal modeling of poaching events over all targets. Connections between nodes represent the mutual spatial influence of neighboring targets and also the temporal dependence between recurring poaching incidents at a target. \(a^{t}_{i,j}\) represents poaching incidents at time step *t* and target *i*, *j*. Mutual spatial influences are modeled through first-order neighbors (i.e., \(a^{t}_{i,j}\) connects to \(a^{t}_{i\pm 1,j}\), \(a^{t}_{i,j\pm 1}\) and \(a^{t-1}_{i,j}\)) and second-order neighbors (i.e., \(a^{t}_{i,j}\) connects to \(a^{t}_{i\pm 1,j\pm 1}\)); for simplicity, the latter is not shown on the model’s lattice. Each random variable takes a value in its state space, in this paper, \(\mathcal {L} = \lbrace 0,1 \rbrace \).

To avoid index overload, henceforth, nodes are indexed by serial numbers, \(\mathcal {S} = \lbrace 1,2,...,N \rbrace \) when we refer to the three-dimensional network. We introduce two random fields, indexed by \(\mathcal {S}\), with their configurations: \(\mathcal {A}=\lbrace \varvec{a} = (a_1,...,a_N)|a_i \in \mathcal {L}, i \in \mathcal {S} \rbrace \), which indicates whether an *actual* poaching attack occurred at targets over the period of study, and \(\mathcal {O} =\lbrace \varvec{o} =(o_1,...,o_N)|o_i \in \mathcal {L}, i \in \mathcal {S} \rbrace \), which indicates whether a poaching attack was *detected* at targets over the period of study. Due to the imperfect detection of poaching activities, the former represents the hidden variables, and the latter is the known observed data collected by rangers, shown by the gray-filled nodes in Fig. 2(a). Targets are related to one another via a neighborhood system, \({\mathcal {N}}_n\), which is the set of nodes neighboring *n*, with \(n \not \in {\mathcal {N}}_n\). This neighborhood system considers all spatial and temporal neighbors. We define neighborhood attackability as the fraction of neighbors that the model predicts to be attacked: \(u_{{\mathcal {N}}_n} ={\sum _{k \in {\mathcal {N}}_n}a_k}/{|{\mathcal {N}}_n |}\).
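As an illustration of neighborhood attackability, the following minimal sketch computes \(u_{{\mathcal {N}}_n}\) for one node of the three-dimensional lattice. The function name, the nested-list layout `a[t][i][j]`, and the boundary handling (off-grid neighbors are simply skipped) are assumptions made for this example; it also assumes that the only temporal neighbor is the same cell at the previous time step.

```python
def neighborhood_attackability(a, t, i, j):
    """Fraction of the neighbors of node (t, i, j) that are predicted attacked.

    a[t][i][j] holds 0/1 attack predictions.
    Neighbors (assumed): the 8 surrounding cells at time t (first- and
    second-order spatial neighbors) plus the same cell at time t-1.
    Neighbors that fall outside the grid are skipped.
    """
    rows, cols = len(a[t]), len(a[t][0])
    values = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # a node is not its own neighbor
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                values.append(a[t][ni][nj])
    if t > 0:
        values.append(a[t - 1][i][j])  # backward temporal neighbor
    return sum(values) / len(values) if values else 0.0


# Two time steps on a 3 x 3 grid.
grid = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[1, 0, 1], [0, 0, 0], [0, 1, 0]],
]
u_center = neighborhood_attackability(grid, 1, 1, 1)
u_corner = neighborhood_attackability(grid, 0, 0, 0)
```

Here the center node at the second time step has nine neighbors (eight spatial, one temporal), while the corner node at the first time step has only three.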

The probability, \(P(a_n|u_{{\mathcal {N}}_n},\varvec{\alpha })\), of a poaching incident at each target *n* at time step *t* is represented in Eq. 1, where \(\varvec{\alpha }\) is a vector of parameters weighting the most important variables that influence poaching; \(\varvec{Z}\) represents the vector of time-invariant ecological covariates associated with each target (e.g., animal density, slope, forest cover, net primary productivity, distance from patrol post, town and rivers [2, 7]). The model’s temporal dimension is reflected through not only the backward dependence of each \(a_{n}\), which influences the computation of \(u_{{\mathcal {N}}_n}\), but also in the past patrol coverage at target *n*, denoted by \(c^{t-1}_{n}\), which models the delayed deterrence effect of patrolling efforts.

Given \(a_n\), \(o_n\) follows a conditional probability distribution proposed in Eq. 2, which represents the probability of rangers detecting a poaching attack at target *n*. The first column of the matrix denotes the probability of not detecting or detecting attacks if an attack has not happened, which is constrained to 1 or 0 respectively. In other words, it is impossible to detect an attack when an attack has not happened. The second column of the matrix represents the probability of not detecting or detecting attacks in the form of a logistic function if an attack has happened. Since it is less rational for poachers to place snares close to patrol posts and more convenient for rangers to detect poaching signs near the patrol posts, we assumed \(dp_{n}\) (distance from patrol post) and \(c^{t}_{n}\) (patrol coverage devoted to target *n* at time *t*) are the major variables influencing rangers’ detection capabilities. Detectability at each target is represented in Eq. 2, where \(\varvec{\beta }\) is a vector of parameters that weight these variables.
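The structure of this conditional probability table can be sketched as follows. The text fixes the first column to (1, 0) and states that the second column is logistic in \(dp_{n}\) and \(c^{t}_{n}\); the exact parameterization below (a weight per variable plus an intercept, and the sign convention of the exponent) is an assumption for illustration, as are the function names.

```python
import math


def detection_probability(dp, c, beta):
    """Probability of detecting an attack, given that one happened.

    dp   : distance from the nearest patrol post
    c    : patrol coverage at the target in the current time step
    beta : (weight for dp, weight for c, intercept), hypothetical layout
    """
    z = beta[0] * dp + beta[1] * c + beta[2]
    return 1.0 / (1.0 + math.exp(-z))


def detection_cpt(dp, c, beta):
    """2 x 2 table p(o | a): rows index o in {0, 1}, columns index a in {0, 1}."""
    d = detection_probability(dp, c, beta)
    return [
        [1.0, 1.0 - d],  # o = 0: certain if no attack, a miss otherwise
        [0.0, d],        # o = 1: impossible if no attack, a detection otherwise
    ]


cpt = detection_cpt(dp=1.0, c=1.0, beta=(0.0, 0.0, 0.0))
```

Each column sums to one, and the zero in the first column encodes that an attack cannot be detected where none occurred.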

We assume that \((\varvec{o},\varvec{a})\) is pairwise independent, meaning \(p(\varvec{o},\varvec{a}) = \prod _{n \in \mathcal {S}} p(o_n, a_n)\).

**EM Algorithm to Infer on MRF.** We use the Expectation-Maximization (EM) algorithm to estimate the MRF model’s parameters \(\varvec{\theta }= \lbrace \varvec{\alpha }, \varvec{\beta }\rbrace \). For completeness, we provide details about how we apply the EM algorithm to our model. Given a joint distribution \(p(\varvec{o}, \varvec{a} |\varvec{\theta })\) over observed variables \(\varvec{o}\) and hidden variables \(\varvec{a}\), governed by parameters \(\varvec{\theta }\), EM aims to maximize the likelihood function \(p(\varvec{o} |\varvec{\theta })\) with respect to \(\varvec{\theta }\). To start the algorithm, an initial setting for the parameters \(\varvec{\theta }^{old}\) is chosen. In the E-step, \(p(\varvec{a} \vert \varvec{o}, \varvec{\theta }^{old})\) is evaluated for each node in the MRF model (Eq. 3).

M-step calculates \(\varvec{\theta }^{new}\), according to the expectation of the complete log likelihood, \(\log p(\varvec{o}, \varvec{a} |\varvec{\theta })\), given in Eq. 4.

To facilitate calculation of the log of the joint probability distribution, \(\log p(\varvec{o},\varvec{a} |\varvec{\theta })\), we introduce an approximation that makes use of \(u_{{\mathcal {N}}_n}^{old}\), represented in Eq. 5.

Then, if convergence of the log likelihood is not satisfied, \(\varvec{\theta }^{old} \leftarrow \varvec{\theta }^{new}\), and repeat.
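To make the E-step concrete, the sketch below applies Bayes' rule at a single node under the detection model described earlier (no detections without an attack). It treats the attack probability and the detection probability as already computed from \(\varvec{\theta }^{old}\); the function name and arguments are hypothetical, and this is one reading of the per-node posterior rather than a reproduction of the paper's equation.

```python
def posterior_attack(o, prior_attack, detect_prob):
    """p(a = 1 | o) at one node.

    o            : 1 if rangers recorded an attack at the node, else 0
    prior_attack : p(a = 1) under the current parameters
    detect_prob  : p(o = 1 | a = 1) under the current parameters
    """
    if o == 1:
        # A detection is only possible if an attack happened.
        return 1.0
    missed = prior_attack * (1.0 - detect_prob)  # attacked but not detected
    not_attacked = 1.0 - prior_attack            # no attack, so nothing to detect
    denom = missed + not_attacked
    # denom is zero only for an observation the model deems impossible.
    return missed / denom if denom > 0 else 0.0


p_detected = posterior_attack(1, 0.3, 0.5)
p_unseen = posterior_attack(0, 0.5, 0.5)
```

With an attack probability of 0.5 and a detection probability of 0.5, a node with no recorded attack still has a one-in-three chance of having been attacked, which is why negative labels cannot be taken at face value.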

**Dataset Preparation for MRF.** To split the data into training and test sets, we divided the real-world dataset into year-long time steps. We trained the model’s parameters \(\varvec{\theta }= \lbrace \varvec{\alpha }, \varvec{\beta }\rbrace \) on historical data sampled through time steps \((t_1,...,t_m)\) for all targets within the boundary. These parameters were used to predict poaching activity at time step \(t_{m+1}\), which represents the test set for evaluation purposes. The trade-off between adding years’ data (performance) vs. computational costs led us to use three years (\(m=3\)). The model was thus trained over targets that were patrolled throughout the training time period \((t_1, t_2, t_3)\). We examined three training sets: 2011–2013, 2012–2014, and 2013–2015 for which the test sets are from 2014, 2015, and 2016, respectively.
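The rolling train/test scheme described above can be written down in a few lines. The helper name `rolling_splits` is hypothetical; the sketch only enumerates which years fall into each training window and which year is held out.

```python
def rolling_splits(first_year, last_year, m=3):
    """Return (training years, test year) pairs for windows of m years."""
    splits = []
    for start in range(first_year, last_year - m + 1):
        train = list(range(start, start + m))
        splits.append((train, start + m))  # test on the year after the window
    return splits


splits = rolling_splits(2011, 2016, m=3)
```

For 2011 through 2016 with \(m=3\), this yields the three training sets listed above with 2014, 2015, and 2016 as their respective test years.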

Capturing temporal trends requires a sufficient amount of data to be collected regularly across time steps for each target. Due to the large number of missing inspections and the uncertainty in the collected data, this model focuses on learning poaching activity only over regions that have been continually monitored in the past, according to Definition 1. We denote this subset of targets as \({\mathcal {S}}_c\).

### Definition 1

**Continually vs. occasionally monitored:** A target *i*, *j* is continually monitored if all elements of the coverage sequence are positive; \(c_{i,j}^{t_k}>0, \forall k=1,...,m\) where *m* is the number of time steps. Otherwise, it is occasionally monitored.
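Definition 1 translates directly into a filter over the coverage data. In the sketch below, the nested-list layout `coverage[k][i][j]` and the function name are assumptions made for illustration.

```python
def continually_monitored(coverage):
    """Return the set of cells (i, j) with positive coverage at every time step.

    coverage[k][i][j] is the patrol effort in cell (i, j) at time step k.
    """
    rows, cols = len(coverage[0]), len(coverage[0][0])
    return {
        (i, j)
        for i in range(rows)
        for j in range(cols)
        if all(step[i][j] > 0 for step in coverage)
    }


# Three time steps (m = 3) on a 2 x 2 grid.
coverage = [
    [[1.0, 0.0], [2.0, 0.5]],
    [[0.3, 1.0], [0.0, 0.2]],
    [[0.1, 0.1], [1.0, 0.4]],
]
s_c = continually_monitored(coverage)
```

Cells missing from the returned set are the occasionally monitored ones.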

Experiments with MRF were conducted in various ways on each data set. We refer to (a) a *global* model with spatial effects as **GLB-S**, which consists of a single set of parameters \(\varvec{\theta }\) for the whole QEPA, and (b) a *global* model without spatial effects (i.e., the parameter that corresponds to \(u_{{\mathcal {N}}_n}\) is set to 0) as **GLB**. The spatio-temporal model is designed to account for temporal and spatial trends in poaching activities. However, since learning those trends and capturing spatial effects are impacted by the variance in local poachers’ behaviors, we also examined (c) a *geo-clustered* model which consists of multiple sets of local parameters throughout QEPA with spatial effects, referred to as **GCL-S**, and also (d) a *geo-clustered* model without spatial effects (i.e., the parameter that corresponds to \(u_{{\mathcal {N}}_n}\) is set to 0) referred to as **GCL**.

Figure 2(b) shows the geo-clusters generated by a Gaussian Mixture Model (GMM), which groups the targets into 22 clusters based on the geo-spatial features, \(\varvec{Z}\), along with the targets’ coordinates, \((x_{i,j},y_{i,j})\). The number of geo-clusters, 22, is intended to be close to the number of patrol posts in QEPA such that each cluster contains one or two nearby patrol posts. With this choice, not only are local poachers’ behaviors described by a distinct set of parameters, but the data collection conditions over the targets within each cluster are also kept nearly uniform.

### 4.2 Prediction by Ensemble Models

**Bagging ensemble model.** **B**ootstrap **agg**regation, or Bagging, is an ensemble learning technique that trains weak learners, such as decision trees, on many bootstrap duplicates of a dataset. Each bootstrap duplicate is obtained by randomly choosing M observations out of M with replacement, where M denotes the training dataset size. The predicted response of the ensemble is then computed by averaging the predictions of its individual decision trees. To learn a Bagging ensemble, we used the *fitensemble* function of MATLAB 2017a.

**Dataset preparation** for the Bagging ensemble model is designed to find the targets that are liable to be attacked [4]. A target is assumed to be attackable if it has ever been attacked; that is, if any observations occurred in the entire training period for a given target, that target is labeled as attackable. For this model, the best training period contained 5 years of data.
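The following is a minimal, library-free sketch of the Bagging procedure and of the attackable labeling rule. It is not the MATLAB implementation used in the experiments: for brevity the weak learner is a depth-one decision tree (a stump), and all function names, the number of learners, and the toy data are assumptions for illustration.

```python
import random


def attackable_labels(observations_by_year):
    """Label a target 1 if it has any observation in the training period."""
    n_targets = len(observations_by_year[0])
    return [int(any(year[n] > 0 for year in observations_by_year))
            for n in range(n_targets)]


def fit_stump(X, y):
    """Weak learner: the single-feature threshold rule with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            err = sum((row[f] > thr) != bool(t) for row, t in zip(X, y))
            # polarity 1: predict 1 above thr; polarity -1: predict 1 at or below thr
            for polarity, e in ((1, err), (-1, len(y) - err)):
                if best is None or e < best[0]:
                    best = (e, f, thr, polarity)
    return best[1:]


def stump_predict(stump, row):
    f, thr, polarity = stump
    above = row[f] > thr
    return int(above) if polarity == 1 else int(not above)


def fit_bagging(X, y, n_learners=25, seed=0):
    """Train one stump per bootstrap duplicate of the dataset."""
    rng = random.Random(seed)
    M = len(X)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(M) for _ in range(M)]  # M draws out of M, with replacement
        learners.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return learners


def bagging_predict(learners, row):
    """Ensemble response: the average of the individual learners' predictions."""
    return sum(stump_predict(s, row) for s in learners) / len(learners)


labels = attackable_labels([[0, 1, 0], [0, 0, 0], [2, 0, 0]])
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
ensemble = fit_bagging(X, y)
score_high = bagging_predict(ensemble, [5.0])
score_low = bagging_predict(ensemble, [0.0])
```

The averaged response lies between 0 and 1 and can be thresholded to obtain a binary attack prediction.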

### 4.3 Hybrid of MRF and Bagging Ensemble

Since the amount and regularity of data collected by rangers varies across regions of QEPA, predictive models perform differently in different regions. As such, we propose using different models to predict over them; first, we used a Bagging ensemble model, and then improved the predictions in some regions using the spatio-temporal model. For global models, we used MRF for all continually monitored targets. However, for geo-clustered models, for targets in the continually monitored subset, \({\mathcal {S}}_c^{q}\), (where temporally-aware models can be used practically), the MRF model’s performance varied widely across geo-clusters according to our experiments. *q* indicates clusters and \( 1 \le q \le 22 \). Thus, for each *q*, if the average Catch Per Unit Effort (CPUE), outlined by Definition 2, is relatively large, we use the MRF model for \({\mathcal {S}}_c^{q}\). In Conservation Biology, CPUE is an indirect measure of poaching activity abundance. A larger average CPUE for each cluster corresponds to more frequent poaching activity and thus more data for that cluster. Consequently, using more complex spatio-temporal models in those clusters becomes more reasonable.

### Definition 2

**Average CPUE** is \(\sum _{n \in {\mathcal {S}}_c^{q}}o_n/\sum _{n \in {\mathcal {S}}_c^{q}}c_n^t\) in cluster *q*.

To compute CPUE, effort corresponds to the amount of coverage (i.e., 1 unit = 1 km walked) in a given target, and catch corresponds to the number of observations. Hence, for \( 1 \le q \le 22 \), we will boost selectively according to the average CPUE value; some clusters may not be boosted by MRF, and we would only use Bagging ensemble model for making predictions on them. Experiments on historical data show that selecting \(15\%\) of the geo-clusters with the highest average CPUE results in the best performance for the entire hybrid model (discussed in the Evaluation Section).
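The selection rule can be sketched as follows. The rounding used to turn 15% of the clusters into a whole number, the function names, and the reading of "boosting" as replacing the Bagging prediction with the MRF prediction on the selected, continually monitored targets are all assumptions for illustration.

```python
def average_cpue(observations, coverage):
    """Definition 2 for one cluster: total observations per unit of coverage."""
    effort = sum(coverage)
    return sum(observations) / effort if effort > 0 else 0.0


def clusters_to_boost(cpue_by_cluster, fraction=0.15):
    """Ids of the top `fraction` of clusters by average CPUE (at least one)."""
    k = max(1, round(fraction * len(cpue_by_cluster)))
    ranked = sorted(cpue_by_cluster, key=cpue_by_cluster.get, reverse=True)
    return set(ranked[:k])


def hybrid_prediction(target, cluster_of, boosted, monitored, mrf_pred, bagging_pred):
    """Use the MRF prediction only for continually monitored targets in boosted clusters."""
    use_mrf = target in monitored and cluster_of[target] in boosted
    return mrf_pred[target] if use_mrf else bagging_pred[target]


cpue = average_cpue(observations=[1, 0, 2], coverage=[10.0, 5.0, 15.0])

# 22 hypothetical clusters with distinct CPUE values.
cpue_by_cluster = {q: q / 100 for q in range(1, 23)}
boosted = clusters_to_boost(cpue_by_cluster)

cluster_of = {"t1": 22, "t2": 22, "t3": 5}
monitored = {"t1", "t3"}
mrf_pred = {"t1": 1, "t2": 1, "t3": 1}
bagging_pred = {"t1": 0, "t2": 0, "t3": 0}
preds = [hybrid_prediction(t, cluster_of, boosted, monitored, mrf_pred, bagging_pred)
         for t in ("t1", "t2", "t3")]
```

In this toy setup only `t1` is both continually monitored and in a boosted cluster, so it is the only target whose prediction comes from the MRF.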

## 5 Evaluations and Discussions

### 5.1 Evaluation Metrics

The imperfect detection of poaching activities in wildlife conservation areas leads to uncertainty in the negative class labels of data samples [4]. It is thus vital to evaluate prediction results based on metrics which account for this inherent uncertainty. In addition to standard metrics in Machine Learning (e.g., precision, recall, F1) which are used to evaluate models on datasets with no uncertainty in the underlying ground truth, we also use the L&L metric introduced in [5], which is a metric specifically designed for models learned on Positive and Unlabeled datasets. L&L is defined as \( L \& L=\frac{r^2}{Pr[f(Te)=1]}\), where *r* denotes the recall and \(Pr[f(Te)=1]\) denotes the probability of a classifier *f* making a positive class label prediction.
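As a worked example of the L&L definition, the sketch below computes recall and the fraction of positive predictions from binary label lists. The function names are hypothetical, and \(Pr[f(Te)=1]\) is estimated as the share of test targets predicted positive.

```python
def recall(y_true, y_pred):
    """Fraction of the known positives that the classifier predicts as positive."""
    positives = sum(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tp / positives if positives else 0.0


def l_and_l(y_true, y_pred):
    """L&L = recall^2 / Pr[f(Te) = 1]."""
    r = recall(y_true, y_pred)
    pr_positive = sum(y_pred) / len(y_pred)
    return r * r / pr_positive if pr_positive > 0 else 0.0


y_true = [1, 1, 0, 0]
score = l_and_l(y_true, [1, 0, 1, 0])          # recall 0.5, half predicted positive
score_all_positive = l_and_l(y_true, [1, 1, 1, 1])
```

A classifier that predicts every target as attacked reaches a recall of 1 but also a positive-prediction probability of 1, so its L&L score is exactly 1; a selective model has to beat that value.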

### 5.2 Experiments with Real-World Data

Tables 1 and 2 report the evaluation of the models’ attack predictions; precision and recall are denoted by Prec. and Rec. in the tables. To compare the models’ performance, we used several baseline methods: (i) Positive Baseline, **PB**, a model that predicts poaching attacks to occur in all targets; (ii) Random Baseline, **RB**, a model that flips a coin to decide its prediction; and (iii) Training Label Baseline, **TL**, a model that predicts a target as attacked if it has ever been attacked in the training data. We also present the results for Support Vector Machines, **SVM**, and AdaBoost methods, **AD**, which are well-known Machine Learning techniques, along with results for the best performing predictive model on the QEPA dataset, INTERCEPT, **INT**, [4]. Results for the Bagging ensemble technique, **BG**, and RUSBoost, **RUS**, a hybrid sampling/boosting algorithm for learning from datasets with class imbalance [10], are also presented. In all tables, **BGG*** stands for the best performing model among all variations of the hybrid model, which will be discussed in detail later. Table 1 demonstrates that **BGG*** outperformed all other existing models in terms of L&L and also F1.
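The three simple baselines and the standard metrics are straightforward to express in code. The sketch below uses hypothetical function names and toy labels; it does not reproduce any value from the tables.

```python
import random


def positive_baseline(n_targets):
    """PB: predict an attack at every target."""
    return [1] * n_targets


def random_baseline(n_targets, seed=0):
    """RB: flip a fair coin for each target."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_targets)]


def training_label_baseline(train_labels_by_year):
    """TL: predict an attack where one was ever recorded in the training data."""
    n = len(train_labels_by_year[0])
    return [int(any(year[i] for year in train_labels_by_year)) for i in range(n)]


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * rec / (precision + rec) if precision + rec else 0.0
    return precision, rec, f1


y_test = [1, 0, 0, 1]
pb_scores = precision_recall_f1(y_test, positive_baseline(len(y_test)))
tl_pred = training_label_baseline([[0, 0, 1, 0], [1, 0, 0, 0]])
tl_scores = precision_recall_f1(y_test, tl_pred)
rb_pred = random_baseline(len(y_test))
```

The Positive Baseline always reaches a recall of 1, which is why recall alone is not informative on this dataset.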

Table 2 provides a detailed comparison of all variations of our hybrid models, **BGG** (i.e., when different MRF models are used). When **GCL-S** is used, we get the best performing model in terms of L&L score, which is denoted as **BGG***. The poor results of learning a global set of parameters emphasize the fact that poachers’ behavior and patterns are not identical throughout QEPA and should be modeled accordingly.

Our experiments demonstrated that the performance of the MRF model within \({\mathcal {S}}_c^{q}\) varies across different geo-clusters and is related to the CPUE value for each cluster, *q*. Figure 3(a) displays the improvement in L&L score for the **BGG*** model compared to **BG** as the percentile of geo-clusters used for boosting varies. Experiments with the 2014 test set show that choosing the \(85^{\text {th}}\) percentile of geo-clusters for boosting with MRF, according to CPUE (i.e., selecting the \(15\%\) of geo-clusters with the highest CPUE), results in the best prediction performance. The \(85^{\text {th}}\) percentile is shown by vertical lines in the figures where the **BGG*** model outperformed the **BG** model. We used a similar percentile value for experiments with the MRF model on the test sets of 2015 and 2016. Figure 3(b) and (c) confirm the effectiveness of choosing an \(85^{\text {th}}\) percentile value.

## 6 QEPA Field Test

While our model demonstrated superior predictive performance on historical data, it is important to test these models in the field.

The initial field test we conducted in [4], in collaboration with the Wildlife Conservation Society (WCS) and the Uganda Wildlife Authority (UWA), was the first of its kind in the Machine Learning (ML) community and showed promising improvements over previous patrolling regimes. Due to the difficulty of organizing such a field test, its implications were limited: only two 9-km\(^2\) areas (18 km\(^2\)) of QEPA were patrolled by rangers over a month. Because of its success, however, WCS and UWA graciously agreed to a larger scale, controlled experiment: also in 9 km\(^2\) areas, but rangers patrolled 27 of these areas (243 km\(^2\), spread across QEPA) over five months; this is the largest to-date field test of ML-based predictive models in this domain. We show the areas in Fig. 4(a). Note that rangers patrolled these areas in addition to other areas of QEPA as part of their normal duties.

This experiment’s goal was to determine the selectiveness of our model’s snare attack predictions: does our model correctly predict both where there are and are not snare attacks? We define attack prediction rate as the proportion of targets (a 1 km by 1 km cell) in a patrol area (3 by 3 cells) that are predicted to be attacked. We considered two experiment groups that corresponded to our model’s attack prediction rates from November 2016 to March 2017: High (group 1) and Low (group 2). Areas that had an attack prediction rate of 50% or greater were considered to be in a high area (group 1); areas with less than a 50% rate were in group 2. For example, if the model predicted five out of nine targets to be attacked in an area, that area was in group 1. Due to the importance of QEPA for elephant conservation, we do not show which areas belong to which experiment group in Fig. 4(a) so that we do not provide data to ivory poachers.
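The grouping rule can be illustrated with a short sketch. The function names and the 3 by 3 list layout are assumptions for this example; the 50% threshold is the one stated above.

```python
def attack_prediction_rate(area_predictions):
    """Proportion of cells in a patrol area that are predicted to be attacked."""
    cells = [p for row in area_predictions for p in row]
    return sum(cells) / len(cells)


def experiment_group(area_predictions, threshold=0.5):
    """Group 1 (High) if the rate is at least the threshold, else group 2 (Low)."""
    return 1 if attack_prediction_rate(area_predictions) >= threshold else 2


five_of_nine = [[1, 1, 0], [1, 0, 0], [1, 1, 0]]
four_of_nine = [[1, 1, 0], [1, 0, 0], [1, 0, 0]]
group_high = experiment_group(five_of_nine)
group_low = experiment_group(four_of_nine)
```

Because a patrol area has nine cells, the possible rates are multiples of 1/9, so the 50% threshold separates areas with five or more predicted attacks from areas with four or fewer.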

To start, we exhaustively generated all patrol areas such that (1) each patrol area was \(3\,\times \,3\) km\(^2\), (2) no point in the patrol area was more than 5 km away from the nearest ranger patrol post, and (3) no patrol area was patrolled too frequently or infrequently in past years (to ensure that the training data associated with all areas was of similar quality); in all, 544 areas were generated across QEPA. Then, using the model’s attack predictions, each area was assigned to an experiment group. Because we were not able to test all 544 areas, we selected a subset such that no two areas overlapped with each other and no more than two areas were selected for each patrol post (due to manpower constraints). In total, 5 areas in group 1 and 22 areas in group 2 were chosen. Note that this composition arose due to the preponderance of group 2 areas (see Table 3). We provide a breakdown of the areas’ exact attack prediction rates in Fig. 4(b); areas with rates below 56% (5/9) were in group 2, and for example, there were 8 areas in group 2 with a rate of 22% (2/9). Finally, when we provided patrols to the rangers, *experiment group memberships were hidden to prevent effects where knowledge of predicted poaching activity would influence their patrolling patterns and detection rates*.

### 6.1 Field Test Results and Discussion

The field test data we received were in the same format as the historical data. However, because rangers needed to physically walk to these patrol areas, we received additional data that we have omitted from this analysis; observations made outside of a designated patrol area were not counted. Because we only predicted where snaring activity would occur, we have also omitted other observation types made during the experiment (e.g., illegal cattle grazing). We present results from this five-month field test in Table 4. To provide additional context for these results, we also computed QEPA’s park-wide historical CPUE (from November 2015 to March 2016): 0.04.

Areas with a high attack prediction rate (group 1) had significantly more snare sightings than areas with low attack prediction rates (15 vs. 4). This is despite there being far fewer group 1 areas than group 2 areas (5 vs. 22); on average, group 1 areas had 3 snare observations whereas group 2 areas had 0.18 observations. It is worth noting the large standard deviation for the mean observation counts; the standard deviation of 5.2, for the mean of 3, signifies that not all areas had snare observations. Indeed, two out of five areas in group 1 had snare observations. However, this also applies to group 2’s areas: only 3 out of 22 areas had snare observations.

We present Catch per Unit Effort (CPUE) results in Table 4. When accounting for differences in areas’ effort, group 1 areas had a CPUE that was over ten times that of group 2 areas. Moreover, when compared to QEPA’s park-wide historical CPUE of 0.04, it is clear that our model successfully differentiated between areas of high and low snaring activity. The results of this large-scale field test, the first of its kind for ML models in this domain, demonstrated that our model’s superior predictive performance in the laboratory extends to the real world.

## 7 Conclusion

In this paper, we presented a hybrid spatio-temporal model to predict wildlife poaching threat levels. Additionally, we validated our model via an extensive five-month field test in Queen Elizabeth Protected Area (QEPA) where rangers patrolled over 450 km\(^2\) across QEPA—the largest field-test to-date of Machine Learning-based models in this domain. On real-world historical data from QEPA, our hybrid model achieves significantly better performance than prior work. On the data collected from our field test, we demonstrated that our model successfully differentiated between areas of high and low snaring activity. These findings demonstrated that our model’s predictions are selective and also that its superior laboratory performance extends to the real world. Based on these promising results, future work will focus on deploying these models as part of a software package to UWA to aid in planning future anti-poaching patrols.

## References

Great Elephant Census: The great elephant census—a Paul G. Allen project. Press Release, August 2016

Critchlow, R., Plumptre, A., Driciru, M., Rwetsiba, A., Stokes, E., Tumwesigye, C., Wanyama, F., Beale, C.: Spatiotemporal trends of illegal activities from ranger-collected data in a Ugandan National Park. Conserv. Biol. **29**(5), 1458–1470 (2015)

Critchlow, R., Plumptre, A.J., Alidria, B., Nsubuga, M., Driciru, M., Rwetsiba, A., Wanyama, F., Beale, C.M.: Improving law-enforcement effectiveness and efficiency in protected areas using ranger-collected monitoring data. Conserv. Lett. **10**(5), 572–580 (2017)

Kar, D., Ford, B., Gholami, S., Fang, F., Plumptre, A., Tambe, M., Driciru, M., Wanyama, F., Rwetsiba, A., Nsubaga, M., et al.: Cloudy with a chance of poaching: adversary behavior modeling and forecasting with real-world poaching data. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 159–167 (2017)

Lee, W.S., Liu, B.: Learning with positive and unlabeled examples using weighted logistic regression. In: ICML, vol. 3 (2003)

Nguyen, T.H., Sinha, A., Gholami, S., Plumptre, A., Joppa, L., Tambe, M., Driciru, M., Wanyama, F., Rwetsiba, A., Critchlow, R., et al.: CAPTURE: a new predictive anti-poaching tool for wildlife protection. In: AAMAS, pp. 767–775 (2016)

O’Kelly, H.J.: Monitoring Conservation Threats, Interventions, and Impacts on Wildlife in a Cambodian Tropical Forest, p. 149. Imperial College, London (2013)

Rashidi, P., Wang, T., Skidmore, A., Mehdipoor, H., Darvishzadeh, R., Ngene, S., Vrieling, A., Toxopeus, A.G.: Elephant poaching risk assessed using spatial and non-spatial Bayesian models. Ecol. Model. **338**, 60–68 (2016)

Rashidi, P., Wang, T., Skidmore, A., Vrieling, A., Darvishzadeh, R., Toxopeus, B., Ngene, S., Omondi, P.: Spatial and spatiotemporal clustering methods for detecting elephant poaching hotspots. Ecol. Model. **297**, 180–186 (2015)

Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: Rusboost: a hybrid approach to alleviating class imbalance. IEEE SMC-A Syst. Hum. **40**(1), 185–197 (2010)

Solberg, A.H.S., Taxt, T., Jain, A.K.: A Markov random field model for classification of multisource satellite imagery. IEEE TGRS **34**(1), 100–113 (1996)

Yin, Z., Collins, R.: Belief propagation in a 3d spatio-temporal MRF for moving object detection. In: IEEE CVPR, pp. 1–8. IEEE (2007)

## Acknowledgments

This research was supported by MURI grant W911NF-11-1-0332 and NSF grant with Cornell University 72954-10598, and was partially supported by a Harvard Center for Research on Computation and Society fellowship. We are grateful to the Wildlife Conservation Society and the Uganda Wildlife Authority for supporting data collection in QEPA. We also thank Donnabell Dmello for her help in data processing.

## Copyright information

© 2017 Springer International Publishing AG

## About this paper

### Cite this paper

Gholami, S. *et al.* (2017). Taking It for a Test Drive: A Hybrid Spatio-Temporal Model for Wildlife Poaching Prediction Evaluated Through a Controlled Field Test. In: Altun, Y., *et al.* Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_24

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-71272-7

Online ISBN: 978-3-319-71273-4

eBook Packages: Computer Science, Computer Science (R0)