Overview
Two primary components were used to build a system of data collection and interpretation capable of identifying specific behaviors: a multidimensional, high-frequency sensor and a computerized analytics model developed to interpret the sensor data.
Data collection
To collect examples of behaviors, dogs were observed and video recorded at two humane society facilities (HS1, HS2) and a dermatology referral practice (DR1). Dogs were chosen at HS1 and HS2 based on availability, good health as reported by caretaker staff, and a temperament that allowed handling, placement of the collar, observation, and video recording. Dogs were chosen at DR1 based on owner report of pruritus of any kind. Information on each subject dog was documented, including name and weight. Breeds were also recorded when known (DR1) or were estimated (HS1, HS2; Table 1).
Table 1 Dogs utilized by breed
Sensor
The wearable sensor used for data collection was an AX3 data logger (Axivity Ltd, United Kingdom; Fig. 1). The sensor includes a micro-electro-mechanical systems (MEMS) 3-axis accelerometer and flash-based on-board memory. The on-board memory can collect and store high-density data (up to 100 Hz) for 14 days, which was later offloaded via the sensor’s micro-USB port. The AX3 sensor was selected because it supports configurable resolution and sampling frequency and collects multidimensional data. This allowed the sensor to be set at sampling rates from as low as 10 Hz (10 samples per second) up to 100 Hz. As illustrated in Fig. 2, collecting data at a higher frequency captures more data points from a single event and so represents the original signal more accurately. Further analysis of spectrograms of the collected behaviors showed a significant difference in high-frequency content relative to lower-frequency data, a difference that would be useful for distinguishing between behaviors (Fig. 3). The AX3 data logger was therefore configured at a 100 Hz sampling rate for data collection and computer algorithm development.
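The effect of sampling rate on high-frequency content can be illustrated with a minimal sketch. The 20 Hz test signal and the helper names below are assumptions chosen for illustration and are not taken from the study data. A signal sampled at rate f can only represent components up to f/2 (the Nyquist frequency), so a 20 Hz component is captured at 100 Hz but is lost entirely at 10 Hz.

```python
import math


def nyquist_hz(rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return rate_hz / 2.0


def sample_sine(freq_hz, rate_hz, duration_s=1.0):
    """Sample a unit-amplitude sine wave of freq_hz at rate_hz samples per second."""
    n_samples = int(round(rate_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n_samples)]


# Hypothetical 20 Hz movement component (illustrative value only).
high_rate = sample_sine(20.0, 100)  # 100 samples in one second
low_rate = sample_sine(20.0, 10)    # 10 samples in one second

# At 100 Hz the oscillation is visible; at 10 Hz every sample lands on a
# zero crossing, so the component disappears from the recorded data.
peak_high = max(abs(v) for v in high_rate)
peak_low = max(abs(v) for v in low_rate)
```

In this sketch `peak_high` is close to 1 while `peak_low` is numerically zero, which is one way to see why the higher sampling rate retains information that may separate behaviors.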
Multidimensional sampling
Captured sensor data can be represented either as single-dimensional data, measuring overall activity (Fig. 4), or as multidimensional data, evaluating the x, y, and z axes separately (Fig. 5). With multidimensional sampling such as that used here, behaviors like running and scratching become much more differentiated and can be more easily identified as distinct behaviors.
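A minimal sketch of why per-axis data can separate movements that a single activity value cannot. It assumes, for illustration only, that the single-dimensional representation is the vector magnitude of the three axes; the text does not state how overall activity was computed, and the two readings are invented.

```python
import math


def magnitude(x, y, z):
    """Collapse a 3-axis reading into one overall-activity value (assumed form)."""
    return math.sqrt(x * x + y * y + z * z)


# Two hypothetical accelerometer readings (in g), dominated by different axes.
reading_a = (1.0, 0.0, 0.0)  # acceleration along x only
reading_b = (0.0, 0.0, 1.0)  # acceleration along z only

same_overall_activity = magnitude(*reading_a) == magnitude(*reading_b)
distinct_per_axis = reading_a != reading_b
```

Both readings yield the same magnitude, so a single-dimensional view cannot tell them apart, while the per-axis values remain distinct.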
Video recording was performed using a Nexus 7 tablet, a Canon VIXIA HF R600, and a GoPro Hero4. Video capture devices were carried by the observer during data collection at HS1 and HS2 and were tripod-mounted on countertops with a view of the entire exam room at DR1. Sensors were attached to standard 1 in. collars before the collars were applied to the dogs. To synchronize sensor data and video documentation, the collar-attached sensor was deliberately shaken five times within the video field of view at the start of each data collection session, before the collar was applied to the dog. The collar was then applied to the dog and tightened as needed to leave a space equivalent to two finger-widths between the collar and the dog’s neck. Collars were rotated to position the sensor at the ventral cervical midline. Video recording continued throughout data collection until the conclusion of the session. To conclude the recording session, the collar was removed from the dog and the sensor was again shaken five times within the video field of view. Recording sessions lasted 10 to 15 min at HS1 and HS2 and 15 to 60 min at DR1. Behaviors observed during the recording sessions included walking, running, resting (sitting, standing), eating, drinking, barking, chewing, urinating, digging, excreting (defecating), head shaking, and scratching, with a preponderance of normal behaviors at HS1 and HS2 and a greater incidence of scratching and head shaking at DR1.
Each video collection segment was imported into ELAN Linguistic Annotator [7] and manually annotated by two observers using a controlled vocabulary (Table 2) while blinded to sensor data. The annotations common to both observers were exported to a single file (Fig. 6). Sensor data was also imported into ELAN and synchronized with video for each data collection session. Once annotated, data from each dog’s recording session was exported into a separate data file containing columns for time, sensor data, and the annotated behavior (Fig. 7). All non-annotated rows were dropped from each file, and each data file was then broken into one-second frames of data, with the label of each frame taken from the annotation. Each frame contained 100 records, each record representing 0.01 s of X, Y, and Z accelerometer axis measurements. As data was aggregated, each frame was also labeled with its file of origin. This ensured that data from a single dog’s collection event was used in only one of an algorithm’s training, testing, or validation sets.
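The framing step described above might be sketched as follows. The function name, the row layout, and the file name are hypothetical. The sketch also makes one assumption the text does not confirm: a frame is kept only if all 100 of its records are consecutive and share one annotation, so partial runs are discarded.

```python
def make_frames(rows, source_file, frame_len=100):
    """Group annotated rows into fixed-length labeled frames.

    rows: iterable of (x, y, z, label); label is None for non-annotated rows.
    Returns a list of dicts with the frame's samples, label, and file of origin.
    """
    frames = []
    current, current_label = [], None
    for x, y, z, label in rows:
        if label is None or label != current_label:
            # Annotation ended or changed: discard any partial frame (assumption).
            current, current_label = [], label
            if label is None:
                continue  # non-annotated rows are dropped
        current.append((x, y, z))
        if len(current) == frame_len:
            frames.append(
                {"source": source_file, "label": current_label, "samples": current}
            )
            current = []
    return frames


# Hypothetical session: 2.5 s of scratching, 0.5 s unlabeled, 1 s of walking at 100 Hz.
rows = (
    [(0.1, 0.2, 0.9, "scratching")] * 250
    + [(0.0, 0.0, 1.0, None)] * 50
    + [(0.3, 0.1, 0.8, "walking")] * 100
)
frames = make_frames(rows, "dog01.csv")
labels = [f["label"] for f in frames]
```

Under these assumptions the session yields two scratching frames and one walking frame, each tagged with `dog01.csv` so that later splits can keep a file's frames together.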
Table 2 Controlled Vocabulary for Video Editing
Prior to separating and cross-folding the data cohort, the population contained more than 110,000 labeled one-second frames of activities (more than 30 h of annotated examples). To our knowledge, no other data set of annotated behaviors of this magnitude exists in the animal health industry to date.
Algorithm development
Algorithms to identify behaviors were created using the Evolutionary Multi-objective Algorithm Design Engine (EMADE) framework, developed at Georgia Institute of Technology [8]. EMADE processes the data files through multiple generations of algorithm development cycles using a genetic programming approach. Genetic programming (GP) is a bio-inspired approach that allows computers to create algorithms, that is, processes or sets of rules to be followed in calculations or problem solving. It uses the concepts of survival of the fittest, mating, and mutation to create a population of candidate solutions. GP is distinguished from the broader category of genetic algorithms by its ability to change the structure of a program in addition to its parameters. To evaluate each candidate algorithm generated by EMADE, three criteria were chosen for simultaneous multi-objective optimization: false negative rate, false positive rate, and complexity of the algorithm. Because the first two are measures of error and simpler algorithms were preferred over complex ones, the goal was to minimize all three objectives.
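Minimizing three objectives at once generally produces a set of trade-off solutions rather than a single best one. The sketch below shows generic Pareto dominance over (false negative rate, false positive rate, complexity) tuples. It is not EMADE's implementation, and the candidate scores are invented for illustration.

```python
def dominates(a, b):
    """True if candidate a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical scores: (false negative rate, false positive rate, complexity).
candidates = [
    (0.10, 0.05, 12),
    (0.20, 0.05, 12),  # worse false negative rate than the first candidate
    (0.05, 0.10, 30),  # fewer misses, but more false alarms and more complex
    (0.10, 0.05, 40),  # same error rates as the first candidate, more complex
]
front = pareto_front(candidates)
```

Here the second and fourth candidates are dominated by the first, while the first and third remain on the front because each is better than the other on at least one objective.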
For the evolutionary machine learning process, the data collection was organized into two groups. The first was the set of data used to train and score the models to select the best candidate. The second set of data was withheld until the final algorithms were chosen and was then used to validate the performance of the algorithms on data to which they had not been previously exposed.
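One way to form the two groups while keeping every frame from a given file of origin in the same group is a split by source file. This is a minimal sketch: the holdout fraction, the seed, and the file names are assumptions, since the text does not report how the groups were sized.

```python
import random


def split_by_source(frames, holdout_fraction=0.2, seed=0):
    """Split frames into (development, validation) so that all frames
    from one source file land in exactly one of the two groups."""
    sources = sorted({frame["source"] for frame in frames})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_holdout = max(1, round(len(sources) * holdout_fraction))
    holdout = set(sources[:n_holdout])
    development = [f for f in frames if f["source"] not in holdout]
    validation = [f for f in frames if f["source"] in holdout]
    return development, validation


# Hypothetical data: five recording files with three labeled frames each.
frames = [
    {"source": f"dog{i:02d}.csv", "label": "walking"}
    for i in range(1, 6)
    for _ in range(3)
]
development, validation = split_by_source(frames)
dev_sources = {f["source"] for f in development}
val_sources = {f["source"] for f in validation}
```

Splitting on the file rather than on individual frames avoids scoring an algorithm on frames from a recording session it has already seen.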
A Pareto front graph for head shaking algorithm development (Fig. 8) displays sample algorithm performance as EMADE ran through 112 generation cycles. The y-axis indicates the false negative rate (1 minus the positive detection rate) of the behavior, and the x-axis indicates the false positive rate of the behavior. The graph illustrates that successive generation cycles produce new algorithm instances that progressively move toward the lower left corner of the graph as false negatives and false positives are minimized. Once the final algorithm was selected, new data was evaluated and scored to test the ability of the system to correctly identify behaviors.
Statistical analysis was performed and reported using the metrics of sensitivity (true positive rate), specificity (true negative rate), positive predictive value (PPV, precision), negative predictive value (NPV) and accuracy. The equations for each are shown below.
$$ \begin{array}{l}
\text{Sensitivity}=\dfrac{\#\,\text{True Positives}}{\#\,\text{True Positives}+\#\,\text{False Negatives}}\\[6pt]
\text{Specificity}=\dfrac{\#\,\text{True Negatives}}{\#\,\text{True Negatives}+\#\,\text{False Positives}}\\[6pt]
\text{PPV}=\dfrac{\#\,\text{True Positives}}{\#\,\text{True Positives}+\#\,\text{False Positives}}\\[6pt]
\text{NPV}=\dfrac{\#\,\text{True Negatives}}{\#\,\text{True Negatives}+\#\,\text{False Negatives}}\\[6pt]
\text{Accuracy}=\dfrac{\#\,\text{True Negatives}+\#\,\text{True Positives}}{\#\,\text{All Negatives}+\#\,\text{All Positives}}
\end{array} $$
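These definitions translate directly into code. The sketch below computes all five metrics from confusion-matrix counts; the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five reported metrics from confusion-matrix counts.

    tp/fp/tn/fn: numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # All positives = tp + fn and all negatives = tn + fp.
        "accuracy": (tn + tp) / (tn + fp + tp + fn),
    }


# Hypothetical counts for one behavior (illustrative only).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

With these example counts, sensitivity is 40/50, specificity is 45/50, and accuracy is 85/100. The sketch assumes every denominator is non-zero; a production version would need to handle empty classes.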