Exploring Tracking Data: Representations, Methods and Tools in a Spatial Database
The objects of movement ecology studies are animals whose movements are usually sampled at more-or-less regular intervals. This spatiotemporal sequence of locations is the basic, measured information that is stored in the database. Starting from this data set, animal movements can be analysed (and visualised) using a large set of different methods and approaches. These include (but are not limited to) trajectories, raster surfaces of probability density, points, (home range) polygons and tabular statistics. Each of these methods is a different representation of the original data set that takes into account specific aspects of the animals’ movement. The database must be able to support these multiple representations of tracking data. In this chapter, a wide set of methods for implementing many GPS tracking data representations into a spatial database (i.e. with SQL code and database functions) are introduced. The code presented is based on the database created in Chaps. 2, 3, 4, 5, 6, 7 and 8.
Keywords: Animal movement · Trajectory · Home range · Database functions · Movement parameters
Although some very specific algorithms (e.g. kernel home range) must be run in a dedicated GIS or spatial statistics environment (see Chaps. 10 and 11), a number of analyses can be implemented directly in PostgreSQL/PostGIS. This is possible due to the large set of existing spatial functions offered by PostGIS and to the powerful but still simple possibility of combining and customising these tools with procedural languages for applications specific to wildlife tracking. What makes the use of databases to process tracking data very attractive is that databases are specifically designed to perform a massive number of simple operations on large data sets. In the recent past, biologists typically undertook movement ecology studies in a ‘data poor, theory rich’ environment, but in recent years this has changed as a result of advances in data collection techniques. In fact, in the case of GPS data, for which the sampling interval is usually frequent enough to provide quite a complete picture of the animal movement, the problem is not to derive new information using complex algorithms run on limited data sets (as for VHF or Argos Doppler data), but on the contrary to synthesise the huge amount of information embedded in existing data in a reduced set of parameters.
Complex models based on advanced statistical tools are still important, but the focus is on simple operations performed in near real time on a massive data flow. Databases can support this approach, giving scientists the ability to test their hypotheses and providing managers with the compact set of information that they need to make their decisions. The database can also be used in connection with GIS and spatial statistical software: it can preprocess the data in order to provide more advanced algorithms with the data sets required for the analysis. In the exercise for this chapter, you will create a number of functions to manipulate and prepare data for more complex analysis. These include functions to extract environmental statistics from a set of GPS positions; create (and store) trajectories; regularise trajectories (subsample and spatially interpolate GPS positions at a defined time interval); define bursts; compute geometric parameters (e.g. spatial and temporal distance between GPS positions, relative and absolute angles, speed); calculate home ranges based on a minimum convex polygon (MCP) algorithm; and run and store analyses on trajectories. These are examples that can be used to develop your own tools.
Extraction of Statistics from the GPS Data Set
A first, simple example of animal movement modelling and representation based on GPS positions is the extraction of statistics to characterise animals' environmental preferences (in this case, minimum, maximum, average and standard deviation of altitude, and the number of available GPS positions).
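A minimal sketch of such a query is given below, assuming that the altitude is stored per location in main.gps_data_animals, as in the database built in the previous chapters:

```sql
-- Sketch: per-animal altitude statistics from valid locations only.
SELECT
  animals_id,
  min(altitude)                   AS min_altitude,
  max(altitude)                   AS max_altitude,
  avg(altitude)::numeric(7,1)     AS avg_altitude,
  stddev(altitude)::numeric(7,1)  AS stddev_altitude,
  count(*)                        AS num_locations
FROM main.gps_data_animals
WHERE gps_validity_code = 1
GROUP BY animals_id
ORDER BY animals_id;
```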
The result is
It is also possible to calculate similar statistics for categorised attributes, like land cover classes.
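A possible formulation is sketched below; the land cover column name (corine_land_cover_code) is an assumption based on the layers used earlier in the book:

```sql
-- Sketch: number of valid locations per animal and land cover class.
SELECT
  animals_id,
  corine_land_cover_code,
  count(*) AS num_locations
FROM main.gps_data_animals
WHERE gps_validity_code = 1
GROUP BY animals_id, corine_land_cover_code
ORDER BY animals_id, num_locations DESC;
```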
The result is
A New Data Type for GPS Tracking Data
Before adding new tools to your database, it is useful to define a new composite data type. The new data type combines the simple set of attributes animals_id (as integer), acquisition_time (as timestamp with time zone) and geom (as geometry), and can be used by most of the functions that can be developed for tracking data. With this data type, it becomes easier to write functions to process GPS locations. First create a data type that combines these attributes.
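Given the three attributes listed above, the definition can be sketched as follows (the tools schema is the one used for the custom functions in this chapter):

```sql
-- Composite type shared by the tracking-data functions.
CREATE TYPE tools.locations_set AS (
  animals_id       integer,
  acquisition_time timestamp with time zone,
  geom             geometry
);
```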
You can also create a view where this subset of information is retrieved from gps_data_animals.
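A sketch of such a view is given below. It reproduces the behaviour described in the text: invalid locations keep a NULL geometry, and duplicated acquisition times are excluded (here assumed to be flagged with their own validity code, 2, as in the data quality chapter):

```sql
-- Sketch: reference view with the locations_set structure.
CREATE VIEW main.view_locations_set AS
SELECT
  animals_id,
  acquisition_time,
  CASE WHEN gps_validity_code = 1 THEN geom END AS geom
FROM main.gps_data_animals
WHERE gps_validity_code <> 2
ORDER BY animals_id, acquisition_time;
```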
The result is the complete set of GPS locations stored in main.gps_data_animals with a limited set of attributes. As you can see, for locations without valid coordinates (gps_validity_code != 1), the geometry is set to NULL. Records with duplicated acquisition times are excluded from the data set. This view can be used as a reference for the functions that have to deal with the locations_set data set.
Representations of Trajectories
You can exploit the locations_set data type to create trajectories and permanently store them in a table. For a general introduction to trajectories in wildlife ecology, see Calenge et al. (2009), which is also a major reference for a review of the possible approaches in wildlife tracking data analysis. First, you can create the table to accommodate trajectories.
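A possible table definition is sketched below; apart from the fields implied by the text (animal id, time range, geometry, description), the exact set of descriptive columns is an assumption:

```sql
-- Sketch: table storing one trajectory per analysis run.
CREATE TABLE analysis.trajectories (
  trajectories_id  serial PRIMARY KEY,
  animals_id       integer NOT NULL,
  start_time       timestamp with time zone,
  end_time         timestamp with time zone,
  num_locations    integer,
  description      text,
  insert_timestamp timestamp with time zone DEFAULT now(),
  geom             geometry(LineString, 4326)
);
```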
Then, you can create a function that produces the trajectories and stores them in the table analysis.trajectories. This function creates a trajectory from an SQL statement that selects a set of GPS locations (as a locations_set object), where users can specify the desired criteria (e.g. the id of the animal, start and end time). It is also possible to add a second parameter: a text string that is used to comment the trajectory. A trajectory will be created for each animal in the data set.
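The core of such a function can be sketched without PL/pgSQL, using the ST_MakeLine aggregate to build one line per animal with vertices ordered by time (a simplified version, not the full function with its parameters):

```sql
-- Sketch: build and store one trajectory per animal.
INSERT INTO analysis.trajectories
  (animals_id, start_time, end_time, num_locations, geom, description)
SELECT
  animals_id,
  min(acquisition_time),
  max(acquisition_time),
  count(*),
  ST_MakeLine(geom ORDER BY acquisition_time),
  'example trajectory'
FROM main.view_locations_set
WHERE geom IS NOT NULL
GROUP BY animals_id;
```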
Note that in PostgreSQL, if you want to include a single quote (') in a string, which is usually the character that closes a string, you have to escape it. This can be done by doubling it (''): the result in the string will be a single quote. Here are two examples of use. The first example is reported below.
The second example is reported below.
The outputs are stored in the analysis.trajectories table. You can see the results in tabular format with the following query.
A subset of the fields returned from this query is reported below.
You can compare the length calculated on a 2D trajectory and on a 3D trajectory (i.e. also considering the vertical displacement). This is the code for the 2D trajectory.
The result is
This is the code for the 3D trajectory.
The result is
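Both measurements can be sketched in a single query, assuming the trajectory geometry stores a Z coordinate and using a projected reference system (here UTM zone 32N, EPSG:32632, as elsewhere in the book):

```sql
-- Sketch: planar (2D) vs 3D trajectory length, in metres.
SELECT
  animals_id,
  ST_Length(ST_Transform(geom, 32632))::integer   AS length_2d_m,
  ST_3DLength(ST_Transform(geom, 32632))::integer AS length_3d_m
FROM analysis.trajectories;
```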
You can see how in an alpine environment the difference can be relevant. Many functions in PostGIS support 3D objects; for a complete list, you can check the PostGIS documentation. You can also store points as 3DM objects, where not just the altitude is considered, but also a measure that can be associated with each point. For tracking data, this can be used to store the acquisition time embedded in the spatial attribute. As the timestamp data type cannot be used directly, it can be transformed to an integer using its epoch, i.e. the number of seconds elapsed since 1 January 1970.
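One way to sketch this, with the altitude as the Z coordinate and the epoch as the M measure:

```sql
-- Sketch: 4D (XYZM) points with the acquisition time stored as M.
SELECT ST_SetSRID(
         ST_MakePoint(
           ST_X(geom),
           ST_Y(geom),
           altitude,
           extract(epoch FROM acquisition_time)),
         4326) AS geom_xyzm
FROM main.gps_data_animals
WHERE gps_validity_code = 1;
```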
Regularisation of GPS Location Data Sets
Another useful tool is the regularisation of the location data set. In many cases, the acquisition schedule of the GPS sensor varies over time. The function introduced below transforms an irregular time series into a regular one, i.e. one with a fixed time step. Records that do not correspond to the desired frequency are discarded, while if no record exists at the required time interval, a (virtual) record with the timestamp but no coordinates is created (see the comments embedded in the function for more information on the input parameters). Note that this function does not perform any interpolation, but simply resamples the available locations, adding a record with NULL coordinates where necessary.
You can test the effects of the function by comparing the different results with the original data set. For instance, let us extract a regular trajectory for animal 6 with a time interval of 8 h (i.e. 60 × 60 × 8 s).
The first 15 results (out of a total of 96) are
The same with a time interval of 4 h.
The first 15 results (out of a total of 191) are
And finally, with a time interval of just 1 h.
The first 15 results (out of a total of 762) are
Interpolation of Missing Coordinates
The next function creates the geometry for the records with no coordinates. It interpolates between the positions of the previous and next records, with a weight proportional to the temporal distance. Before you can define the function, you have to create a sequence, i.e. a sequential number generator. This is used to create temporary tables with names that are always unique in the database.
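The sequence itself is a one-line statement (the name is illustrative):

```sql
-- Sketch: number generator for unique temporary-table names.
CREATE SEQUENCE tools.unique_id_seq;
-- Inside a function, a unique name can then be built as:
--   'temp_' || nextval('tools.unique_id_seq')
```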
You can now create the interpolation function. It accepts as input an animals_id and a locations_set (by default, main.view_locations_set). It identifies all locations with NULL geometry to be interpolated. You can also specify a threshold for the allowed time gap between locations with valid coordinates, where the default is two days. If the time gap is smaller, i.e. if you have valid locations before and after the location without coordinates at less than two days of time difference, the new geometry is created; otherwise, the NULL value is kept (it makes no sense to interpolate if the closest points with valid coordinates are too distant in time).
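The geometric core of the function can be sketched in plain SQL: for each empty location, find the previous and next valid points, and place the new point along the segment between them, at a fraction proportional to the elapsed time (here for animal 4, with the two-day threshold):

```sql
-- Sketch: time-weighted linear interpolation of missing geometries.
WITH gap AS (
  SELECT
    t.acquisition_time,
    p.geom AS prev_geom, p.acquisition_time AS prev_time,
    n.geom AS next_geom, n.acquisition_time AS next_time
  FROM main.view_locations_set t
  JOIN LATERAL (
    SELECT geom, acquisition_time FROM main.view_locations_set
    WHERE animals_id = t.animals_id AND geom IS NOT NULL
      AND acquisition_time < t.acquisition_time
    ORDER BY acquisition_time DESC LIMIT 1) p ON true
  JOIN LATERAL (
    SELECT geom, acquisition_time FROM main.view_locations_set
    WHERE animals_id = t.animals_id AND geom IS NOT NULL
      AND acquisition_time > t.acquisition_time
    ORDER BY acquisition_time ASC LIMIT 1) n ON true
  WHERE t.animals_id = 4 AND t.geom IS NULL
)
SELECT
  acquisition_time,
  ST_LineInterpolatePoint(
    ST_MakeLine(prev_geom, next_geom),
    extract(epoch FROM acquisition_time - prev_time) /
    extract(epoch FROM next_time - prev_time)) AS geom
FROM gap
WHERE next_time - prev_time <= interval '2 days';
```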
The locations which were interpolated are not marked. You can identify them by joining the result with the original table and seeing where records originally without coordinates were updated. You can test the function by comparing the results of the next two queries. In the first one, you just retrieve the original data set.
The first 15 rows of the result (1,486 rows including 398 NULL geometries) are
In the second query, you fill the empty geometries using the tools.interpolate function.
The first 15 rows of the result (same number of records, but NULL geometries have been replaced by interpolation) are reported below. You can see that there are no gaps anymore.
You can also use this function in combination with the regularisation function to obtain a regular data set with all valid coordinates. In this query, first you regularise the data set using a time interval of 4 h (for animal 4), and then you fill the gaps in the records with no coordinates.
The first 15 records of the result (now 2,854 records with no NULL geometries) are
In fact, both functions (like many other tools for tracking data) have the same information (animal id, acquisition time, geometry) as input and output, so they can be easily nested.
Detection of Sensor Acquisition Scheduling
Another interesting piece of information that can be retrieved from your GPS data set is the sampling frequency scheduling. This information should be available, as it is defined by the GPS sensors' managers, but in many cases it is not, so it can be useful to derive it from the data set itself. To do so, you have to create a function based on a new data type.
This function returns the 'bursts' for a defined animal. Bursts are groups of consecutive locations with the same frequency (or time interval). It requires an animal id and a temporal buffer (in seconds) as input parameters and returns a table with the (supposed) schedule of acquisition frequency. The output table contains the fields animals_id, starting_time, ending_time, num_locations, num_locations_null and interval_step (in seconds, approximated according to multiples of the buffer value). A location is considered to start a new burst if its time gap differs from the current interval step by more than the defined buffer (the buffer takes into account the fact that small changes can occur because of delays in the reception of the GPS signal). The default value for the buffer is 600 s (10 min). The function is computed directly on main.view_locations_set (locations_set structure) and on the whole data set of the selected animal. Here is the code of the function.
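The composite type on which the function's output is based can be sketched directly from the fields listed above:

```sql
-- Sketch: output structure of the burst-detection function.
CREATE TYPE tools.bursts AS (
  animals_id         integer,
  starting_time      timestamp with time zone,
  ending_time        timestamp with time zone,
  num_locations      integer,
  num_locations_null integer,
  interval_step      integer
);
```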
Here, you can verify the results. You can use the function with animal 5.
The result is
In this case, the time interval is constant (14,400 s, which means 4 h). The second and fourth bursts are made of a single location. This is because there is a gap with no records greater than the temporal buffer, not a real new burst.
Now run the same function on animal 6.
The result is reported below. In this case, a more varied scheduling has been used (1, 2 and 4 h):
Representations of Home Ranges
Home range is another representation of animal movement and behaviour that can be derived from GPS tracking data. A home range is roughly described as the area in which an animal normally lives and travels, excluding migration, emigration or other large, infrequent excursions. There are different ways to define this concept and different methods for computing it. A common approach to modelling home ranges is the delineation of the boundaries (polygons) of the area identified (according to a specific definition) as the home range. The simplest way to create a home range is the MCP approach, and PostGIS has a specific function to compute MCPs (ST_ConvexHull). In this example, you can create a function to produce an MCP using just a percentage of the available locations (in order to exclude the outliers far from the pool of locations), based on a starting and ending acquisition time. First, you can create a table where the results can be stored. This table also includes some additional information that describes the result and can be used both to document it and to run meta-analyses. In this way, all the results of your analysis are permanently stored, accessible, compact and documented.
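A possible table definition is sketched below; the exact set of descriptive columns is an assumption beyond those implied by the text:

```sql
-- Sketch: table storing MCP home range results and their metadata.
CREATE TABLE analysis.home_ranges_mcp (
  home_ranges_mcp_id serial PRIMARY KEY,
  animals_id         integer NOT NULL,
  start_time         timestamp with time zone,
  end_time           timestamp with time zone,
  percentage         double precision,
  num_locations      integer,
  area               double precision,
  description        text,
  insert_timestamp   timestamp with time zone DEFAULT now(),
  geom               geometry(Polygon, 4326)
);
```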
This function applies the MCP algorithm (also called convex hull) to a set of locations. The input parameters are the animal id (each analysis is related to a single individual), the percentage of locations to be considered and a locations_set object (the default is main.view_locations_set). An additional parameter can be added: a description that will be included in the table home_ranges_mcp, where the result of the analysis is stored. The percentage parameter defines how many locations are included in the analysis: if, for example, 90 % is specified (as 0.9), the 10 % of locations farthest from the centroid of the data set will be excluded. If no parameters are specified, a percentage of 100 % is used and the complete data set (from the first to the last location) is considered. The following creates the function.
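The core of the computation can be sketched in plain SQL (here a 90 % MCP for animal 1): rank the locations by distance from the centroid, drop the farthest 10 %, then build the convex hull of what remains:

```sql
-- Sketch: percentage MCP using a window function.
WITH ranked AS (
  SELECT geom,
         percent_rank() OVER (
           ORDER BY ST_Distance(
             geom,
             ST_Centroid((SELECT ST_Collect(geom)
                          FROM main.view_locations_set
                          WHERE animals_id = 1 AND geom IS NOT NULL)))
         ) AS pr
  FROM main.view_locations_set
  WHERE animals_id = 1 AND geom IS NOT NULL
)
SELECT ST_ConvexHull(ST_Collect(geom)) AS mcp
FROM ranked
WHERE pr <= 0.9;
```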
You can create the MCP at different percentage levels.
The output is stored in the table. You can retrieve part of the columns of the table with the following query.
The result is
Note that the last statement generates the MCP for all the animals with a single command.
A further example of a synthetic representation of the GPS location set is illustrated in the view below: for each GPS position, you compute a buffer (a circle of 0.001 degrees, which at this latitude corresponds to about 100 m), and then all the buffers of the same animal are merged together.
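The view can be sketched with the ST_Buffer and ST_Union functions (the view name is illustrative):

```sql
-- Sketch: merged 0.001-degree buffers around each animal's locations.
CREATE VIEW analysis.view_locations_buffer AS
SELECT
  animals_id,
  ST_Union(ST_Buffer(geom, 0.001)) AS geom
FROM main.view_locations_set
WHERE geom IS NOT NULL
GROUP BY animals_id;
```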
Geometric Parameters of Animal Movements
Another type of analytical tool that can be implemented within the database is the computation of the geometric parameters of trajectories (e.g. spatial and temporal distance between locations, speed and angles). As the meaning of these parameters changes with the time step, you will create a function that computes the parameters just for steps that have a time gap equal to a value defined by the user. First, you must create the new data type tools.geom_parameters.
Now you can create the function tools.geom_parameters. It returns a table with the geometric parameters of the data set (reference: previous location): the time gap to the previous point, the time gap to the previous-previous point, the distance to the previous point, the speed of the last step, the distance to the first point of the data set, the absolute angle (from the previous location) and the relative angle (from the previous and previous-previous locations). The input parameters are the animal id, the time gap and a buffer to take into account possible time differences due to GPS data reception. The time gap parameter selects just the locations that have a previous point at the defined time interval (with a buffer tolerance); all the other locations are not taken into consideration. A locations_set class is accepted as the input table. It is also possible to specify the starting and ending acquisition time of the time series. The output is a table with the structure geom_parameters. If you want to calculate the geometric parameters of an irregular sequence (i.e. the parameters calculated in relation to the previous/next location regardless of the regularity of the time gap), you can use plain SQL based on window functions, with no need for customised functions. It is important to note that while a step is the movement between two points, in many cases the geometric parameters of the movement (step) are associated with the starting or the ending point. In this book, we use the ending point as the reference. In some software, particularly the adehabitat package for R (see Chap. 10), the step is associated with the starting point. If needed, the queries and functions presented here can be modified to follow this convention. The code of the function follows.
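For the irregular-sequence case, the window-function approach mentioned above can be sketched as follows (step parameters associated with the ending point, as in this book; the geography cast returns distances in metres):

```sql
-- Sketch: geometric parameters of an irregular sequence via lag().
SELECT
  animals_id,
  acquisition_time,
  extract(epoch FROM acquisition_time
                     - lag(acquisition_time) OVER w) AS deltat_s,
  ST_Distance((lag(geom) OVER w)::geography,
              geom::geography)                       AS dist_m,
  ST_Azimuth(lag(geom) OVER w, geom)                 AS abs_angle_rad
FROM main.view_locations_set
WHERE geom IS NOT NULL
WINDOW w AS (PARTITION BY animals_id ORDER BY acquisition_time);
```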
To test how the function works, you can run and compare the function applied to animal 6 at different time steps. In the first case, you can use 2 h.
A subset of the columns of the first 10 rows returned by the function is
The real results include a longer list of parameters that is not possible to report because of space constraints. To save space, the dates have been transformed into Julian day of the year (DOY, in the range 1–365).
You can apply the function with an interval step of 4 h.
A subset of the result is reported below:
As you can see, there are very few sequences of at least three points at a regular temporal distance of 4 h in the original data set (at least in the first records).
Now apply the function with an 8 h interval step.
The result is reported below. Just 3 records are retrieved because the scheduling of 8 h is not used in this data set.
An Alternative Representation of Home Ranges
In the next example of possible methods to represent and analyse GPS locations using the tools provided by PostgreSQL and PostGIS, you can create a grid surface and calculate an estimate of the time spent (in seconds) by each animal within each 'pixel'. There are many existing approaches to producing this information; in this case, you will use an algorithm that is conceptually similar to a simplified Brownian bridge method (Horne et al. 2007) and to the method proposed in Kranstauber et al. (2012). In this example, you assume that the animal moves along the trajectory described by the temporal sequence of locations and that the speed is constant along each step. You create a grid with the given resolution that is intersected with the trajectory. For each segment of the trajectory generated by the intersection, the time spent by the animal is calculated (considering the time interval of that step and the relative length of the segment compared to the whole step length). Finally, you sum the time spent in all the segments inside each cell. You can implement this method using a view and a function that creates the grid, which is based on a new data type that you create with the following code.
This approach has several advantages:
it is implemented with SQL, which is a relatively simple language to modify/customise/extend;
it is run inside the database, so results can be directly stored in a table, used to run meta-analysis, and extended using other database tools;
it is conceptually simple and gives a ‘real’ measure (time spent in terms of hours);
no parameters with unclear physical meaning have to be set; and
it handles heterogeneous time intervals.
On the other hand, it implicitly relies on a very simplified movement model (the animal moves along the segment that connects two locations with a constant speed).
Dynamic Age Class
While the age class is stored in the animals table with reference to the capture time, it can change over time. If this information must be associated with each location (according to the acquisition time), a dynamic calculation of the age class must be used. We present here an example valid for roe deer. With a conservative approach, we can consider that on 1 April of each year, all the animals that were fawns become yearlings, and all the yearlings become adults; adults remain adults. The function below requires an animal id and an acquisition time as input. It then checks the capture date and the age class at capture. Finally, it compares the capture time with the acquisition time: if 1 April has been 'crossed' once or more, the age class is increased accordingly.
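The core logic can be sketched compactly: shifting every timestamp back by three months moves the 1 April boundary onto 1 January, so the number of 1 April crossings is simply the difference between the shifted years. The function signature and the age class codes (1 = fawn, 2 = yearling, 3 = adult) are assumptions for illustration:

```sql
-- Sketch: dynamic age class for roe deer, capped at adult (3).
CREATE OR REPLACE FUNCTION tools.dynamic_age_class(
  capture_time     timestamptz,
  age_at_capture   integer,
  acquisition_time timestamptz)
RETURNS integer AS
$$
  SELECT least(
    3,
    age_at_capture
    + (extract(year FROM acquisition_time - interval '3 months')::int
       - extract(year FROM capture_time - interval '3 months')::int));
$$ LANGUAGE sql;
```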
Unfortunately, all the animals in the database are adults, so no change in the age class is possible. In any case, as an example of usage, we report the code to retrieve the dynamic age class.
The result is
Generation of Random Points
In some cases, it can be useful to generate a given number of random points inside a polygon (e.g. for resource selection functions, in order to get a representation of the available habitat). This can be done using the database function reported below. It requires a polygon (or multipolygon) geometry and the desired number of points as input. The output is the set of points.
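One way to sketch such a function is rejection sampling in the polygon's bounding box (the function name and signature are illustrative; recent PostGIS versions also offer ST_GeneratePoints for the same purpose):

```sql
-- Sketch: n random points inside a (multi)polygon by rejection sampling.
CREATE OR REPLACE FUNCTION tools.randompoints(
  poly geometry, num_points integer)
RETURNS SETOF geometry AS
$$
DECLARE
  pt      geometry;
  created integer := 0;
BEGIN
  WHILE created < num_points LOOP
    -- Draw a point uniformly in the bounding box, keep it if inside.
    pt := ST_SetSRID(
            ST_MakePoint(
              ST_XMin(poly) + random() * (ST_XMax(poly) - ST_XMin(poly)),
              ST_YMin(poly) + random() * (ST_YMax(poly) - ST_YMin(poly))),
            ST_SRID(poly));
    IF ST_Intersects(pt, poly) THEN
      created := created + 1;
      RETURN NEXT pt;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```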
It can be used in a view to generate a set of points automatically whenever the view is called. In this example, the study area is used as the input geometry to generate 100 random points.
The row_number() function is added to generate a unique integer associated with each point; otherwise, some client applications will not be able to deal with this view. If you visualise the view in a GIS environment (e.g. in QGIS), you will notice that the set of points changes every time you refresh your GIS interface. This is because the view generates a new set of points at every call. If you need to consistently generate the same set of points for reproducibility, you can specify a third parameter that defines the seed (a numeric value in the range from -1 to 1), based on the PostgreSQL setseed function. The seed option allows you to reproduce the same results while keeping the generation process random. Changing the seed will generate another set of random locations. Another option is to make the random points permanent by uploading the result into a permanent table that can then be processed further (e.g. intersected with environmental layers).