Introduction

Parkinson’s Disease (PD) is a progressive neurological disease with multiple symptoms that can ultimately result in fatal systemic failure. It is estimated that as many as 10 million people worldwide are living with PD (Mhyre et al. 2012). In the United States alone, an estimated 90,000 new cases of PD occur each year. The only neurodegenerative disease more prevalent than PD is Alzheimer’s Disease. The greatest known risk factor for PD is aging: over 90% of cases in the United States occur at age 65 or older (Willis et al. 2022). The second-greatest risk factor is gender, with the incidence rate in males 1.5 times that in females (Willis et al. 2022).

There is no conclusive screening test for PD, and diagnosis is most often made through a combination of clinical assessment of symptoms, dopamine transporter scanning, skin biopsy, and consideration of hereditary and environmental factors. The misdiagnosis rate in PD is estimated to be between 10% and 20%, with a higher rate of misdiagnosis within the first two years after diagnosis (Rizzo et al. 2016). By the time a patient receives a clinical diagnosis, the disease has progressed to the point of a 60% reduction in dopamine in the basal ganglia of the brain. As a result of this reduction in dopamine, tremor, the hallmark diagnostic symptom, becomes present, along with other associated symptoms such as loss of balance, cramped handwriting, sleep disturbance, and altered patterns of speech (Hess and Okun 2016).

The gradual reduction of dopamine in the brain is estimated to begin 5–10 years before the symptoms that lead to a clinical diagnosis appear (Rees et al. 2019). Once a diagnosis has been made, a patient is usually treated with a drug or combination of drugs that cross the blood–brain barrier and deliver a chemical substitute for dopamine, helping to slow, but not stop, progressive dopaminergic loss in the brain (Kobylecki 2020). It is believed that if drugs and other therapeutic interventions could be introduced at the onset of PD, the progression of the disease could be slowed far more than is possible with current methods of detection and treatment. Hence the urgent need for an early, detectable biomarker of PD (Le et al. 2017).

In 2015, Joy Milne, a woman diagnosed with hyperosmia (a condition that gives her an extremely sensitive sense of smell), was shown to be able to distinguish between T-shirts worn by PD-positive sample donors and T-shirts worn by control sample donors (Morgan 2016). In 2020, researchers at the University of Manchester, working with Milne, used paper spray ionization mass spectrometry together with her sense of smell to discover more than 500 organic compounds unique to the sebum of PD-positive patients (Sarkar et al. 2022; Walton-Doyle et al. 2023, preprint). This discovery is being used to develop a laboratory skin test to screen for a biomarker of Parkinson’s (Morgan 2016).

Given evidence that Milne could detect a PD-associated odor by smell, the use of canines, with their superior olfactory ability (Jenkins et al. 2018), for PD detection quickly emerged as an area of interest. Dogs can be trained to detect target odors with a sensitivity that surpasses not only that of humans but also that of most modern instruments (Kokocińska-Kusiak et al. 2021). Dogs can detect odors at concentrations of parts per trillion (Jenkins et al. 2018) and have been successfully trained to use their olfactory ability to identify and distinguish diseases such as lung (Amundsen et al. 2014; Feil et al. 2021; Hackner et al. 2016; Junqueira et al. 2019), breast (Kure et al. 2021; McCulloch et al. 2006), and colorectal cancers (Sonoda et al. 2011), as well as malaria (Guest et al. 2019), COVID-19 (Grandjean et al. 2022; Devillier et al. 2022; Meller et al. 2022; Otto et al. 2023), and diabetes, each with a reported sensitivity and specificity of > 80% (Amundsen et al. 2014; Catala et al. 2019; Feil et al. 2021; Grandjean et al. 2022; Guest et al. 2019; Jenkins et al. 2018; Kokocińska-Kusiak et al. 2021; McCulloch et al. 2006; Sonoda et al. 2011).

The first published findings to support canine detection of PD were reported in 2022 (Gao and Wang 2022). The Gao study presented proof of concept for canine detection of PD in a controlled laboratory setting. The three dogs in the study were Belgian Malinois that had been trained in a laboratory setting for PD detection for two years before the study began. In 2023, a second study (Rooney et al. 2023, preprint) provided further evidence of canine detection of PD, based on two trained dogs in a medical detection program in the U.K.

The first program to train dogs for PD detection was PADs (Parkinson’s Alert Dogs), a nonprofit canine detection program that ran from February 2016 through March 2023. The findings in this paper were compiled from 2021 and 2022, the final two years of the program, which were judged to hold the most relevant information for the scientific community.

Because acquiring and developing an in-house canine detection program, in which the dogs are owned, socialized, and trained for detection from puppyhood, can be costly, PADs opted in 2016 to enlist readily available household pet dogs of varied breeds belonging to volunteer community members. Rather than keeping dogs in a group-kennel environment, both Medical Detection Dogs, U.K. (Medical Detection Dogs, UK) and the Penn Vet Working Dog Center (Penn Vet Working Dog Center) have successfully trained household-maintained dogs for medical detection. The dogs in these two programs are brought from home to the facility for training and return home at the end of each training day. Medical Detection Dogs has successfully trained dogs to detect cancer, diabetes, and malaria (Medical Detection Dogs, UK), and the Penn Vet Working Dog Center has successfully trained dogs to detect ovarian cancer (Kane et al. 2022).

The use of breed as a primary selection criterion for detection dogs has also been questioned in recent years. A study of scent detection dogs (Troisi et al. 2019) explored how a variety of socioenvironmental factors directly influence canine detection performance, and many were found to be independent of breed. In another breed comparison study, German Shepherds, a breed considered desirable for scent detection, were compared with Pugs, a brachycephalic breed considered unsuitable for scent detection because of its shortened nose. The Pugs outperformed the German Shepherds in the detection tasks. A third breed, Greyhounds, was also included in the study but performed poorly in the scent detection tasks, likely due to lack of motivation (Hall et al. 2015).

In addition to the primary objective of determining whether breed-varied household pet dogs could be trained to distinguish between sebum samples provided by PD-positive donors and PD-negative human controls, a secondary objective was to determine whether use of the drug levodopa would emerge as a factor in the dogs’ sensitivity. Levodopa has been the most prescribed drug for the treatment of PD since 1970 (Abbott 2010), so it was possible that the dogs were targeting a volatile organic compound produced as a byproduct of levodopa use, rather than an odor or odors caused by the disease itself.

The following questions were selected for investigation to support study objectives:

  • How would house-raised pet dogs of varied breeds perform on a PD detection task?

  • Would the dogs select previously encountered samples by memory rather than by olfactory distinction?

  • How would canine performance change as training time and exposures increased?

  • How would canine performance compare between samples from levodopa-naïve PD donors and samples from levodopa-positive PD donors?

  • How would canine performance compare between samples from male and female PD-positive donors?

Methods

Study design

This was a handler-blind, randomized study of 23 detection-trained companion dogs of varying breeds, exposures, and training days, designed to evaluate sensitivity and specificity when the dogs were presented with PD-positive and PD-negative samples. The study took place in a controlled laboratory setting within a dedicated facility. The facility building was leased and remodeled for the seven-year study and included a 14 × 20 ft training area, a canine waiting room, a research and scribe observation area, a sample storage room, an office and storage room, and a canine exit hallway. The facility was used exclusively for this study.

The two-year study period included 200 working days (95 in 2021 and 105 in 2022). Twenty-three dogs participated in a total of 4553 individual trials, with an average attendance of 8.7 dogs per daily working session. For a video example of the dogs working, see supplementary materials (online resource 1). For each daily working session in which 10 or more dogs participated, a minimum of 320 data points were recorded.
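As a rough consistency check (a back-of-envelope sketch only, which assumes that attendance on every working day was close to the 8.7-dog average), the reported totals imply roughly two to three trials per dog per working day. This agrees with the two or three consecutive rounds each dog worked per session, as described under Sample presentation.

```python
# Back-of-envelope check using the reported totals.
# Assumption: daily attendance was close to the 8.7-dog average on every working day.
TOTAL_TRIALS = 4553
WORKING_DAYS = 200
MEAN_DOGS_PER_SESSION = 8.7

trials_per_day = TOTAL_TRIALS / WORKING_DAYS                     # about 22.8 trials per day
trials_per_dog_per_day = trials_per_day / MEAN_DOGS_PER_SESSION  # about 2.6 trials per dog

print(f"{trials_per_day:.1f} trials/day, {trials_per_dog_per_day:.1f} trials/dog/day")
```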

PD-positive sample determination

Because the misdiagnosis rate in PD may be 10% or higher, samples obtained from novel donors carried an inherent possibility of misdiagnosis. For these samples, dogs in the first round of presentation were not reinforced for a positive indication, even if the response may have been correct, to reduce the opportunity for any dog to be reinforced for an incorrect response. A novel, PD-questionable sample was categorized as PD-positive only when at least eight of 10 trained dogs gave a positive indication on it when it was first introduced in the first round. Since no conclusion could be drawn until the responses of the first 10 dogs in the round had been analyzed, no dogs were reinforced until the outcome was determined. Other factors considered in support of this canine determination included length of diagnosis, levodopa tolerance, and whether the diagnosis was made by a neurologist, movement disorder specialist, or family physician. Similar to Gao (Gao and Wang 2022), the dogs’ responses were themselves used as further evidence of the PD-positive or PD-negative nature of the sample.

Withholding the positive reinforcer (reward) for a correct indication carried the risk of extinction, or diminishing, of the desired response and directly conflicted with the dogs’ fundamental, learned response to the target odor. To counteract this, the dogs underwent training recovery rounds and recovery days on previously determined PD-positive samples, so that they could regain drive for continued work and maintain their expected levels of sensitivity and specificity.

If the assessment sample (novel PD-donor sample) was deemed PD-positive in the first round, based on positive indications from 80% or more of the 10 dogs, it was presented again in a second round in a new position on the wheel (see Fig. 2), and the dogs were reinforced for a correct response in that round. If the assessment sample was deemed PD-negative in round one, based on positive indications from 79% or fewer of the 10 dogs, the dogs were instead presented with a different, previously assessed PD-positive sample in the second round.
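The assessment rule above can be sketched as a small decision procedure. This is a minimal illustration, not the program's actual tooling: it assumes each first-round response is recorded as True (a 3-s alert on the novel sample) or False, and the function names and sample IDs are hypothetical.

```python
from typing import Sequence

DOGS_REQUIRED = 10      # responses needed before any decision (and before any reinforcement)
POSITIVE_THRESHOLD = 8  # at least 8 of 10 positive indications, i.e. 80% or more


def assess_novel_sample(first_round: Sequence[bool]) -> str:
    """Classify a novel donor sample from first-round indications (True = 3-s alert)."""
    if len(first_round) < DOGS_REQUIRED:
        return "pending"  # no dog is reinforced while the outcome is undetermined
    hits = sum(first_round[:DOGS_REQUIRED])
    return "pd_positive" if hits >= POSITIVE_THRESHOLD else "pd_negative"


def second_round_sample(status: str, novel_id: str, confirmed_positive_id: str) -> str:
    """Choose which PD-positive sample goes on the wheel in round two."""
    if status == "pd_positive":
        # Re-present the novel sample in a new wheel position; correct alerts are reinforced.
        return novel_id
    if status == "pd_negative":
        # Fall back to a previously assessed PD-positive sample.
        return confirmed_positive_id
    raise ValueError("first-round assessment is still pending")
```

For example, `assess_novel_sample([True] * 8 + [False] * 2)` returns `"pd_positive"`, while seven alerts out of ten returns `"pd_negative"`, so round two would use the previously assessed sample.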

Sample participants

PD-positive sample donors were recruited from Parkinson’s support groups, Rock Steady Boxing (a fitness organization for people with PD), physicians, and neurologists, as well as through media coverage and responses via the program website (padsforparkinsons.org). Control sample donors were recruited from activity clubs, friends and family members of program volunteers, businesses, and other events. All samples were collected from donors who provided informed consent, and personal donor information was blinded from all study personnel, affiliates, and media. The informed consent document used to obtain permission for sample inclusion in this research study was prepared and reviewed by legal counsel.

There were 43 PD-positive sample donors and 31 PD-negative sample donors whose samples were used in this study across 2021 and 2022 (Table 1). PD donors were screened for levodopa use, age, gender, onset of symptoms, type of diagnosis, date of diagnosis, and whether the diagnosis was made by a physician, movement disorder specialist, or neurologist. Overall, 24 of the 43 sample donors were diagnosed by a neurologist, one by a movement disorder specialist, and the remaining 18 by a physician. Control donors were screened for age and gender and were excluded for any of the following: a household partner diagnosed with PD, anosmia, bowel disorder, sleep disorder, or changes in gait, speech, or handwriting. When unique samples were presented to the dogs, control and PD samples were matched in age within 10 years and by gender in 80% or more of instances. Both PD and control donor samples came from varied geographic locations within the United States. Table 2 lists the characteristics of the 31 control sample donors used in this study.
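The matching criterion can be expressed as a simple check over PD/control pairs. This sketch assumes one reading of the text: the 10-year age window applies to every pair and the 80% threshold applies to gender matching. The record layout, function name, and example pairs are hypothetical and are not study data.

```python
from typing import List, Tuple

# Hypothetical record layout: (age in years, gender) for each donor
Donor = Tuple[int, str]


def check_matching(pairs: List[Tuple[Donor, Donor]],
                   max_age_gap: int = 10,
                   min_gender_match: float = 0.80) -> Tuple[bool, float, bool]:
    """Return (all ages within gap, gender-match rate, both criteria met) for PD/control pairs."""
    if not pairs:
        raise ValueError("no PD/control pairs supplied")
    # Assumption: every pair must fall within the age window
    ages_ok = all(abs(pd[0] - ctrl[0]) <= max_age_gap for pd, ctrl in pairs)
    # Assumption: the 80% threshold applies to gender matching across pairs
    gender_rate = sum(pd[1] == ctrl[1] for pd, ctrl in pairs) / len(pairs)
    return ages_ok, gender_rate, ages_ok and gender_rate >= min_gender_match
```

With five hypothetical pairs in which four share a gender and all ages differ by at most eight years, the check returns a gender-match rate of 0.8 and passes both criteria.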

Table 1 PD-positive sample donor characteristics by age group and levodopa usage
Table 2 PD-negative sample donor characteristics by age group

For this two-year study, 58% of PD sample donors were selected based on a confirmed diagnosis by a neurologist and the time elapsed since diagnosis and onset of symptoms. The remaining 42% of PD sample donors were selected based on reported symptoms and a clinical diagnosis. Levodopa-negative Parkinson’s donors made up 17 of the 43 sample donors. Control sample donors were of varying age, ethnicity, and gender and reported no symptoms associated with Parkinson’s.
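The 58%/42% split can be reproduced from the diagnosis counts given for the 43 PD donors, provided (an assumption on our part) that the single movement disorder specialist diagnosis is grouped with the neurologist diagnoses; neurologist diagnoses alone would give about 56%.

```python
# Diagnosis counts reported for the 43 PD-positive donors
NEUROLOGIST, MOVEMENT_DISORDER_SPECIALIST, PHYSICIAN = 24, 1, 18
total = NEUROLOGIST + MOVEMENT_DISORDER_SPECIALIST + PHYSICIAN  # 43

# Assumption: the movement disorder specialist diagnosis is counted with the neurologist group
specialist_share = 100 * (NEUROLOGIST + MOVEMENT_DISORDER_SPECIALIST) / total
physician_share = 100 * PHYSICIAN / total
neurologist_only = 100 * NEUROLOGIST / total

print(round(specialist_share), round(physician_share), round(neurologist_only))
```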

Sample material

T-shirts worn overnight by all participants were chosen as the sample material. This selection was based on the Joy Milne test developed by Dr. Tilo Kunath at the University of Edinburgh (Morgan 2016). All T-shirts provided to PD and control participants were ribbed, 100% cotton, tank-style shirts purchased from the same manufacturer, freshly washed with the same brand of detergent in the same machine, and packaged and shipped using the same materials. Along with the T-shirt, each donor was provided with a new, double-walled, vacuum-insulated 24-oz stainless-steel thermos and a new metal lunch pail for packaging the worn T-shirt for return to the program (Fig. 1). The thermos was designed to keep contents hot for 14 h and cold for up to 24 h and was selected as a way to contain volatile organic compounds during shipping and storage. Donors were instructed to pack the T-shirt into the thermos with the neck of the T-shirt at the top.

Fig. 1

Sample kit including T-shirt and canister, with additional materials for swab sample collection. Panel I Kit container, numbered sample container, packaged T-shirt sample to be worn by donor, disposable gloves for caretaker (if assistance is needed), research collection forms. Panel II Sterile swab packet, swab sample container showing removable cap at either end so that heads of swabs are available for sniffing without being handled by a researcher

Sample storage

All thermoses containing samples were stored in a dedicated cooler set at 11 °C, and the samples were presented to the dogs within 30 days of receipt. Samples on which 10 or more dogs in one session achieved an average sensitivity of 80% or higher were used for maintenance training, recovery training, and foundation training of new dogs.

In 2022, during the second year of this reporting period, we learned that Medical Detection Dogs, U.K., had successfully trained dogs to detect PD-associated odor using cotton swabs. Based on this information, in September 2022, the PADs program changed its sample material from T-shirts to cotton swabs, which are less expensive and more efficient to collect and store. Under the swab collection protocol, 10 cotton swab samples were collected from the upper back and neck area of each PD and control participant. The swabs were collected by a gloved PADs researcher, and each set of 10 swabs was placed in a metal tube 10.9 cm (4.3 in.) high and 1.3 cm (1/2 in.) in diameter, capped at both ends. This allowed the swab-tip end of the tube to be opened for sniffing by the dogs without the swabs being handled by a researcher. The tubes containing swab samples were stored and handled under the same protocol as the T-shirt samples (Fig. 1).

Canine participants

A total of 23 canines of various breeds and ages participated. The canines were selected based on age, demonstration of drive, and availability for training, with little consideration given to breed or sex. All dogs completed a minimum of eight months of prior training for PD detection, whereas some had completed as many as five years of prior training. The Parkinson’s canine detection program had been underway since 2016, so all dogs in the study were previously trained to distinguish between T-shirt samples worn by Parkinson’s patients and T-shirt samples worn by healthy human controls with an average of 75% or higher sensitivity and specificity. Breeds represented in the study included commonly utilized breeds for detection purposes, such as Labrador Retriever and Vizsla, and some of the dogs were of less commonly utilized breeds, such as Pomeranian and English Mastiff. Table 3 shows the breed, age, and sex of each dog at the beginning of the study.

Table 3 Characteristics of all canine participants in this study

Foundation detection training

Before study selection and during the study period, all dogs were exposed to operant conditioning training methods in their home environment. Owner/handlers engaged primarily in force-free behavioral training, providing feedback in the form of delivering a reward/reinforcer for a correct response, or withholding a reward/reinforcer for an incorrect response.

All dogs were trained for PD odor detection by a single detection trainer, who did not own or separately handle any of the dogs. The dogs underwent foundation training using methods similar to controlled-substance detection training: first building hunt drive on a primary reinforcer, then pairing, in which the primary reinforcer was available to the dog along with the PD-positive target odor. In the final stage of foundation training, the dogs worked without pairing and received a reinforcer for an indication of the target odor. During foundation training, all dogs underwent a minimum of 300 intermittent paired exposures to their target odor (exposures in which the primary reinforcer, i.e., food or residual or lingering food odor, was available to the dog), at a rate of 10 exposures per training day over a period of six months. The dogs were also exposed to varied background noise in both target and control samples and were presented with distractors (e.g., onion, coffee, spices, soaps) and with rounds containing no PD-positive target odor (clear rounds). In clear rounds, the dogs were trained to go to the exit gate of the facility and were reinforced outside the gate. During the final 20 working days of foundation training, approximately 10% of rounds were clear rounds. Apart from clear rounds, the dogs were presented with equal numbers of human-sourced control samples and PD-positive samples in each round, so that they would learn to distinguish their target odor from the background noise of human scent.
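The text does not say how clear rounds were scheduled, only that about 10% of rounds in the final 20 working days were clear. The sketch below shows one hypothetical scheme: a fixed share of rounds in a block is chosen at random to be clear, using Python's seeded `random.Random` so a plan can be reproduced. The function name and the fixed-count approach are our assumptions.

```python
import random
from typing import List, Optional


def plan_rounds(n_rounds: int, clear_fraction: float = 0.10,
                seed: Optional[int] = None) -> List[str]:
    """Return a round-by-round plan in which about clear_fraction of rounds are clear rounds."""
    rng = random.Random(seed)
    n_clear = round(n_rounds * clear_fraction)
    clear_rounds = set(rng.sample(range(n_rounds), n_clear))
    # "clear"  = no PD-positive sample on the wheel (dog should go to the exit gate)
    # "target" = one PD-positive and one human-sourced control sample on the wheel
    return ["clear" if i in clear_rounds else "target" for i in range(n_rounds)]
```

Choosing a fixed count, rather than flipping a 10% coin for each round, keeps the share of clear rounds stable from block to block while still leaving their positions unpredictable to the dogs.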

Once a dog had detected the target odor, it was trained to hold its body position at the target for at least three seconds before receiving a reinforcer. Because dogs usually cleared a control sample in one second or less, this three-second hold clearly separated an alert from the act of clearing a sample. The dogs were allowed to sit, stand, lie down, or point while holding, but an alert was not confirmed unless the dog held the position for three or more seconds (Essler et al. 2020). If a dog held position at a human-sourced control sample for three seconds and then moved on to the target odor for three seconds, the dog was reinforced for indicating the target odor but scored as 0% sensitive and 0% specific for the round (see Table 4, Scoring of the Dogs).

Table 4 Scoring of the dogs for each round

Of note, the foundation training in this study was designed to let the dogs adopt their own form of communication to indicate the presence of the target odor. The dogs were also trained to adopt a natural behavioral response to the absence of the target odor (leaving the room). As part of foundation training, dogs were allowed to choose their own path in sourcing or eliminating a target odor: some dogs moved clockwise around the room, some moved counterclockwise, and some changed direction during the trial. This form of training allowed the dogs to develop a more natural, self-taught predictive response to the PD-associated target odor and may reduce false-positive or false-negative responses. Trained in this manner, the companion dog’s learned behavioral response (or indication) becomes closely and specifically associated with the environment of the trained target odor, because it is not a behavior the dog commonly shows elsewhere. It should be noted that the dogs were shaped within their natural response to meet tight criteria and were required to hold this response for at least three seconds when indicating the target odor before receiving a reinforcer (reward).

Once foundation training was complete and a dog could distinguish between unique (novel), unpaired samples containing the PD-associated target odor (sensitivity) and human control samples (specificity) at 80% or higher across six working sessions on varied samples, the dog proceeded to the next step. All results reported in this study were obtained after foundation training.
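The completion criterion can be written as a simple check over a dog's session history. The text does not specify whether 80% had to be met in each of six sessions or on average across them; this sketch assumes the stricter reading (each of the six most recent sessions), and the function name and input layout are hypothetical.

```python
from typing import Sequence, Tuple


def passed_foundation(sessions: Sequence[Tuple[float, float]],
                      threshold: float = 0.80, required: int = 6) -> bool:
    """sessions: chronological (sensitivity, specificity) proportions, one pair per working session."""
    if len(sessions) < required:
        return False
    recent = sessions[-required:]
    # Assumption: every one of the last `required` sessions must meet the threshold
    return all(sens >= threshold and spec >= threshold for sens, spec in recent)
```

Under this reading, a single session below 80% on either measure within the last six keeps the dog in foundation training.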

Sample presentation

The donor T-shirt or swab samples inside the stainless-steel canisters were never handled by the researcher or any member of the research team. A gloved researcher placed each of the four sample canisters into fixed canister holders on the wheel (Fig. 2). The wheel was placed on the floor, and the positions of the four canisters were determined by color-coded random selection, which also determined whether the wheel was rotated clockwise or counterclockwise between rounds during a working session. The wheel was rotated into the position indicated by the randomizer; the PD-negative control samples were then taken from the refrigerated storage unit, their canister lids were uncapped, and they were placed on the wheel. The wheel held four samples: one from a PD-positive donor, one from a PD-negative donor, and two distractor T-shirt samples that had not been worn by a human. Once all other samples had been placed, the researcher re-gloved and placed the PD-positive sample on the wheel. To reduce the potential for odor contamination (lingering odor) on the floor matting, the wheel frame was supported by four caster wheels that raised it 4.5 in. above the floor, allowing air to flow freely beneath it. The wheel could easily be rotated in either direction between rounds. The canister holders were routinely wiped with isopropanol on Tuesdays and Thursdays to eliminate lingering or residual odor. Samples were placed in different canister-holder positions before each working session. Canister holders were spaced four feet apart to reduce the potential for false-positive indications caused by overlapping odor pools, drift, or lingering odor (odor footprints) between samples.

Fig. 2

Sample presentation wheel. Sample canister holders were elevated above the flooring to prevent residual odor contamination to the floor. The wheel is designed to be rotated in any direction

For each round, four samples were randomly placed on the sample wheel. To minimize the potential for fringing (an alert given close to the target odor but not at its source) and to reduce odor transfer between samples, the canisters were spaced four feet (1.22 m) apart. In each round, the wheel held one PD-positive sample, one human-sourced control sample, and two unworn T-shirt samples. The wheel position in the room, the sample positions on the wheel, and the order of dogs were all determined by random selection. The dogs worked off-leash, roamed freely as they entered the sample presentation room, and were permitted to work the wheel in either direction: some worked clockwise, some counterclockwise, and a few tended to change direction based on the scent signature in the room. The dogs worked individually, one after the other, for two or three consecutive rounds and kept the same run order for the day. The sample wheel was rotated for each round so that the dogs encountered the samples in a different position on the wheel every round.
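The per-session randomization described above can be sketched in a few lines. This is an illustration under stated assumptions: a seeded pseudorandom generator stands in for the color-coded random selection, the wheel's placement in the room is not modeled, and the function names are hypothetical. Which list shift corresponds to "clockwise" on the physical wheel is an arbitrary convention chosen here.

```python
import random
from typing import Dict, List, Optional, Sequence

# One round's wheel contents, as described in the text
WHEEL_SAMPLES = ["pd_positive", "human_control", "unworn_distractor", "unworn_distractor"]


def plan_session(dogs: Sequence[str], seed: Optional[int] = None) -> Dict[str, object]:
    """Randomize holder positions, rotation direction, and dog run order for one session."""
    rng = random.Random(seed)
    positions = WHEEL_SAMPLES[:]          # copy so the template list is not mutated
    rng.shuffle(positions)                # sample assigned to each of the four holders
    rotation = rng.choice(["clockwise", "counterclockwise"])
    run_order = list(dogs)
    rng.shuffle(run_order)                # order in which dogs work the wheel that day
    return {"positions": positions, "rotation": rotation, "run_order": run_order}


def rotate(positions: List[str], direction: str) -> List[str]:
    """Advance every sample one holder around the wheel between rounds."""
    if direction == "clockwise":
        return positions[-1:] + positions[:-1]
    if direction == "counterclockwise":
        return positions[1:] + positions[:1]
    raise ValueError(f"unknown direction: {direction}")
```

A session for three dogs might be planned with `plan_session(["Sasha", "Bendy", "Velvet"], seed=3)`, after which `rotate(plan["positions"], plan["rotation"])` gives the sample layout for the next round.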

For most of the study period, there were four working days per week. Not all dogs attended every working day, but most dogs attended at least one or two working days per week, so each working day had a different assembly of dogs. The dogs’ run order also varied, so the order in which they encountered the samples on the wheel, and were subsequently scored, was random on each day attended. The dogs also varied in age, breed, sex, environmental and genetic factors, and duration of and experience within the program.

Handler-blind research protocol

All handlers observed the dogs under blind conditions and were instructed to stand motionless in one position outside of the immediate training area. One trainer/researcher observed the dogs in a mirror while standing motionless in the same consistent location, while facing the same direction with eyes looking up to the mirror, through all rounds and all sessions in the study (Fig. 2). While the dogs sniffed samples, no eye contact was made by any handler, researcher, or trainer with the dogs. A data recorder (scribe) sat outside the training room behind a half-wall barrier and was able to observe the dogs from a distance.

Scoring of the dogs

Indications were confirmed between the trainer and scribe. A dog was scored as 100% sensitive for the round if its first 3-s indication (hold) was on the PD participant-sourced sample and it had not held a three-second alert on any other sample on the wheel. This confirmation occurred after the dog had left the training area with its handler (per protocol, handlers were not told whether their dog had been scored as correct or incorrect). If the dog cleared the human-sourced PD-negative control sample (sniffed it and did not hold position) before indicating on the PD-positive sample, the dog was scored as 100% specific for the round. If the dog indicated the PD-positive target sample before encountering the human control sample, specificity for that round was scored as “NA” (not applicable) and was excluded from the daily specificity calculations for that dog and sample. If a dog held for three seconds on a human-sourced control sample at any time in the round, it was scored as 0% specific and 0% sensitive, even if it had not had the opportunity to sniff the PD-sourced sample. If the dog treated both the human control sample and the PD-positive sample as PD-negative, it was scored as 0% sensitive and 100% specific. In general, a dog was always scored as 0% sensitive if it did not indicate the PD-positive sample as positive, even if it had not had the opportunity to sniff that sample (see Table 4 for an explanation of scoring). In this scoring setup, the dogs could leave and return to a sample, but the first three-second alert on any sample was treated as the dog’s final determination, and the trial was concluded at that point. A trial was also concluded if a dog cleared the samples and went to the exit gate, indicating that the room was clear of PD-positive samples.
Additionally, a trial was concluded if a dog made more than one pass around the sample wheel. Although dogs were not restricted from leaving and returning to a sample a second time, in nearly all instances they indicated their final determination on the first sniffed encounter with a sample.
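The scoring rules can be summarized as a function over the ordered list of samples a dog sniffed on its single pass. This is a sketch under assumptions: each encounter is recorded as (sample kind, hold time in seconds), and two cases the text does not address (an alert on an unworn distractor, and a control sample never sniffed before the dog exits) are handled as flagged in the comments. Names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

HOLD_SECONDS = 3.0  # a hold of 3 s or more counts as an alert


def score_trial(encounters: Sequence[Tuple[str, float]]) -> Tuple[int, Optional[int]]:
    """Score one trial from the dog's first-pass encounters, in the order sniffed.

    Each encounter is (sample_kind, hold_seconds), with sample_kind one of
    "pd_positive", "human_control", "unworn_distractor".
    Returns (sensitivity, specificity) as 1 or 0, with specificity None for "NA".
    """
    control_cleared = False
    for kind, hold in encounters:
        if hold >= HOLD_SECONDS:
            # The first 3-s alert is the dog's final determination; the trial ends here.
            if kind == "pd_positive":
                return 1, (1 if control_cleared else None)
            if kind == "human_control":
                return 0, 0
            # Assumption (not specified in the text): an alert on an unworn distractor
            # misses the PD sample; specificity depends only on whether the control was cleared.
            return 0, (1 if control_cleared else None)
        if kind == "human_control":
            control_cleared = True  # sniffed without holding position
    # No alert: the dog went to the exit gate (or completed its single pass).
    # Assumption: specificity is NA if the human control was never sniffed.
    return 0, (1 if control_cleared else None)
```

For example, clearing the control and then holding on the PD sample scores (1, 1); holding on the PD sample before reaching the control scores (1, NA); and any 3-s hold on the control scores (0, 0).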

Sensitivity was calculated as in Trevethan (2017) (see Box 1). Specificity was also calculated as in Trevethan, except that only human-sourced controls were counted as PD-negative samples. Had all negative samples (including the unworn distractors) been included in the specificity calculations, the dogs’ overall average specificity would have been higher; this conservative approach better aligned with our primary study objective. A human-sourced control sample was presented on the wheel in every trial. In this manner, the dogs were scored for sensitivity and specificity in a way that mimics correct and incorrect responses in a simulated screening scenario in which all people, and therefore all samples, carry a human scent signature.
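In Trevethan's terms, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below shows how per-trial scores of the kind described in the previous subsection could be pooled into those proportions, with NA specificity trials dropped from the specificity denominator. Pooling over all trials, rather than averaging per-dog rates, is our assumption, and the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def aggregate(trials: Iterable[Tuple[int, Optional[int]]]) -> Tuple[float, float]:
    """Pool per-trial (sensitivity, specificity) scores into proportions.

    Every trial counts toward sensitivity: TP / (TP + FN) over PD-positive encounters.
    Only trials with a non-NA specificity count toward specificity:
    TN / (TN + FP) over human-control encounters.
    """
    trials = list(trials)
    sens_scores = [s for s, _ in trials]
    spec_scores = [p for _, p in trials if p is not None]
    if not sens_scores or not spec_scores:
        raise ValueError("need at least one scored trial for each measure")
    return sum(sens_scores) / len(sens_scores), sum(spec_scores) / len(spec_scores)
```

This is why, in the Results, specificity rests on fewer encounters than sensitivity: trials in which the dog alerted on the PD sample before reaching the control contribute only to the sensitivity denominator.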

Statistical methods

Sensitivity and specificity data were recorded for each dog on prepared data sheets as the dog exited the training area. These daily rounds data sheets were then manually summarized into daily summaries, and then into monthly canine performance reports. The data from the daily sheets were also entered into individual daily spreadsheets. Daily spreadsheets were then combined and summarized to match the manually summarized reports, then verified against the manually entered canine performance reports, and discrepancies were corrected to match the original recordings made during each working session.

The verified spreadsheet data sets were imported into Microsoft Access database tables, where queries were developed to select and summarize the results. Where confidence limits are presented, the Clopper-Pearson (exact) confidence interval for binomial proportions (Clopper and Pearson 1934) was computed using the confidence interval for a proportion tool (Kohn and Senyak 2024). Where proportions are compared, the N-1 chi-squared test recommended by Campbell (Campbell 2007) and Richardson (Richardson 2011) was performed in MedCalc Software Ltd (MedCalc Software Ltd., 2024). For the comparison of proportions by levodopa usage, a two-proportion z-test was used (Bobbit 2020).
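For readers who want to reproduce these statistics without the online tool or MedCalc, the following is an independent, standard-library sketch of the three methods named above. It is not the software used in the study, and its results may differ from those tools in the last reported digit. The Clopper-Pearson bounds are found by bisection on the binomial CDF (equivalent to the usual beta-quantile form). The N-1 chi-squared statistic is Pearson's 2 × 2 statistic scaled by (N − 1)/N, and the z-test uses the pooled proportion.

```python
import math


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    # log of C(n, k) * p^k * (1 - p)^(n - k); lgamma avoids overflow for large n
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with 0 < p < 1."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return min(1.0, sum(math.exp(_log_binom_pmf(k, n, p)) for k in range(x + 1)))


def _solve(f, target: float, increasing: bool, iters: int = 60) -> float:
    # Bisection on (0, 1) for f(p) = target, where f is monotone in p
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (Clopper-Pearson) two-sided 1 - alpha interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound: P(X >= x) = alpha/2; upper bound: P(X <= x) = alpha/2
    lower = 0.0 if x == 0 else _solve(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _solve(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def n_minus_1_chi_squared(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Campbell's 'N-1' chi-squared test comparing x1/n1 with x2/n2 (2 x 2 table)."""
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    N = n1 + n2
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("degenerate 2 x 2 table")
    stat = (a * d - b * c) ** 2 * (N - 1) / denom
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p_value


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Pooled two-proportion z-test; returns (z, two-sided p)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))
```

For 8 successes in 10 trials, `clopper_pearson(8, 10)` gives approximately (0.444, 0.975). For any 2 × 2 table, the N-1 statistic equals z² · (N − 1)/N, which is a convenient cross-check between the last two functions.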

All sensitivity and specificity values were calculated using only unpaired exposures during post-foundation training.

Results

Overall average performance of dogs in study

Table 5 summarizes the overall average sensitivity and specificity for each dog over the two years of the study (4959 total dog-sample encounters). Assessment days were included only if the sample was found to be PD-positive; on assessment days, specificity was recorded but sensitivity was not. The dogs were ranked by percent sensitivity, excluding assessment days. Ten dogs were “top-tier” (90% or higher in both sensitivity and specificity), although three of them had fewer than 100 rounds.

Table 5 Averaged sensitivity and specificity for 23 dogs, January 2021 to December 31, 2022 (top-tier dogs in bold)

For all 23 dogs in the program, the combined overall average sensitivity was 89.0% (4053 correct/4553 total encounters), and the overall specificity was 86.6% (2592 passed-up human control/2993 total human control encounters). Specificity is based on fewer encounters because the dogs sometimes alerted on the PD sample before reaching the human control sample.

Average sensitivity/specificity for top-tier dogs

For dogs that achieved 90% or higher for both sensitivity and specificity (10 dogs: Sasha, Bendy, Velvet, Penny, Jaden, Hudson, Quil, Russell, River, Scarlett), the overall sensitivity was 93.5% (2163 correct/2313 total encounters) and the overall specificity was 93.3% (1452 passed up human control / 1556 total human control encounters).

Overall average performance for the dogs by AKC breed group

To show the range of breed groups represented, the dogs were categorized as shown in Table 9. These groupings are based on AKC-recognized breed standards (AKC, 2023).

The combined average sensitivity and specificity for each breed group are reported in Table 6. Most breed groups contain too few dogs to support inferences about breed-specific sensitivity or specificity. In addition, each dog was raised in a different environment, and most breed groups represented in the study consist of dogs of dissimilar ages with varied days and duration of program attendance.

Table 6 Combined average percentage sensitivity and specificity for breed groups

Figure 3 shows the overall sensitivity for each dog, grouped by breed and labeled with the dog’s age; Figure 4 shows the overall specificity in the same format. Neither figure shows an apparent breed-related pattern in sensitivity or specificity in this study.

Fig. 3 Overall sensitivity for each dog by age and breed

Fig. 4 Overall specificity for each dog grouped by age and breed

Memorization of samples: sensitivity and specificity results when encountering all novel samples compared to previously encountered samples

The average sensitivity for first-time encounters (first round only with no warm-up round, so all first encounters were cold runs) with a novel PD sample and novel human control sample was 86.3% (139 correct/161 total encounters). The average specificity for first-time encounters (first round only) with a novel PD sample and novel human control sample was 89.0% (121 passed-up human control / 136 total human control encounters). Dogs encountered the novel samples in random order for each round, and the samples were in randomized positions in the room for each round.

Comparing sensitivity and specificity for first-time encounters with a novel PD sample and novel human control sample against the overall sensitivity and specificity for first-round encounters (with novel-sample rounds removed) showed no significant difference (p = 0.95), as shown in Table 7. Specificity was higher for rounds with novel samples, but the difference was not significant even at the 90% confidence level. This indicates that the dogs were unlikely to be identifying PD-positive samples from memory.

Table 7 Comparison of Overall Round 1 encounters (novel encounters removed) with novel Round 1 encounters

The combined sensitivity for the 10 top-tier dogs (90% or higher overall sensitivity and specificity for the entire study period) when they first encountered a unique PD sample and a unique human control sample (first round only) was 87.1% (74 correct / 85 total encounters). The combined specificity for the same group and samples was 90.4% (66 passed-up human control / 73 total human control encounters).

Memorization of samples: comparison of sensitivity and specificity results between first and final rounds in a working session

The overall combined average sensitivity for all dogs in round 1 of each daily session (where both round 1 and round 3 were attended) was 86.2% (1174 correct / 1362 total encounters), whereas the overall combined average sensitivity for all dogs in round 3 of each daily session (where both round 1 and round 3 were attended) was 91.1% (1241 correct / 1362 total encounters).

The overall combined specificity for all dogs in round 1 of each daily session was 83.6% (766 passed up human control / 916 total human control encounters), and the overall combined specificity for all dogs in round 3 of each daily session was 88.1% (749 passed up human control / 850 total human control encounters).

Both sensitivity and specificity increased significantly between round 1 and round 3, by 4.9 percentage points (p = 0.0001) and 4.5 percentage points (p = 0.007), respectively (Table 8). This could indicate that the dogs were becoming more familiar with the PD odor, or with the trial procedure, over the course of the day. Because the location of the PD-positive sample was randomized between rounds, the increase between rounds 1 and 3 was not likely due to memorizing the location of the PD sample. This further suggests that the dogs were not identifying PD-positive samples by memorization.
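The N-1 chi-squared test is Pearson’s chi-squared test on a 2 × 2 table, with the statistic multiplied by (N − 1)/N (Campbell 2007). The following minimal Python sketch treats the two proportions as independent groups; the function name is illustrative, and this is not MedCalc’s implementation. Applied to the round 1 and round 3 counts above, the specificity comparison gives p ≈ 0.007, as reported, and the sensitivity comparison gives p < 0.001.

```python
import math


def n_minus_1_chi_squared(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """N-1 chi-squared test comparing proportions x1/n1 and x2/n2.

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    a, b = x1, n1 - x1          # group 1: successes, failures
    c, d = x2, n2 - x2          # group 2: successes, failures
    total = n1 + n2
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    # Pearson's statistic is total * (ad - bc)^2 / margins; the N-1 version uses total - 1.
    chi2 = (total - 1) * (a * d - b * c) ** 2 / margins
    # With 1 df, P(chi-squared >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Round 1 vs round 3 sensitivity (correct / PD-positive encounters).
chi2_sens, p_sens = n_minus_1_chi_squared(1174, 1362, 1241, 1362)
# Round 1 vs round 3 specificity (passed-up / human-control encounters).
chi2_spec, p_spec = n_minus_1_chi_squared(766, 916, 749, 850)
print(f"sensitivity: chi2 = {chi2_sens:.2f}, p = {p_sens:.5f}")
print(f"specificity: chi2 = {chi2_spec:.2f}, p = {p_spec:.4f}")
```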

Table 8 Comparison of matched rounds 1 and 3 for all dogs

For rounds in which every sample on the wheel was novel, sensitivity and specificity in the first round did not differ significantly from those in the final round of the day at the 95% level (p = 0.3 and p = 0.96, respectively), as shown in Table 9.

Table 9 Comparison of sensitivity and specificity for first and final rounds of the day for all novel samples on the wheel

Together, Tables 8 and 9 suggest that the dogs were not memorizing the target odor by sample or by sample position, whether they were encountering all samples on the wheel for the first time or had encountered them in a prior round.

Cumulative total exposures by dog as compared to performance

Figure 5 plots each dog’s average sensitivity during the study period against that dog’s total exposures to PD-positive odor over the life of the PADs program. One dog had both low total exposures and low average sensitivity; this is likely a special case related to that individual dog. When the dogs were ranked by total program exposures and by percentage sensitivity, Spearman’s rank correlation (Spearman’s rho) showed no significant relationship (rs = 0.1087, two-tailed p = 0.6215).
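The ranking procedure can be sketched as a minimal, standard-library Python function: both variables are ranked (tied values share their average rank), and Spearman’s rho is the Pearson correlation of the two rank lists. The five-dog input is hypothetical, since the per-dog values appear only in the figure; the function names are illustrative.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across any run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (norm_x * norm_y)


# Hypothetical values for five dogs (not study data).
total_exposures = [320, 450, 610, 1200, 95]
avg_sensitivity = [0.91, 0.88, 0.93, 0.90, 0.62]
rho = spearman_rho(total_exposures, avg_sensitivity)
print(f"rho = {rho:.3f}")  # prints rho = 0.500
```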

Fig. 5 Average percent sensitivity vs. total program exposures for each dog

Figure 6 plots each dog’s average specificity during the study period against that dog’s total exposures to PD-positive odor over the life of the PADs program. When the dogs were ranked by total program exposures and by percentage specificity, Spearman’s rank correlation (Spearman’s rho) showed no significant relationship (rs = 0.0247, two-tailed p = 0.9109).
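A common way to obtain a p-value for Spearman’s rho is the t approximation, t = rs √((n − 2)/(1 − rs²)) with n − 2 degrees of freedom. The following standard-library Python sketch assumes that approximation and n = 23 dogs (the number in Table 5); the source does not state which method its software used, and the function names are illustrative. It evaluates the two-tailed t probability through the regularized incomplete beta function (continued-fraction method) and, for the two reported coefficients, closely reproduces the reported p-values of 0.6215 and 0.9109.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 200, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v: float) -> float:
        # Avoid division by (near) zero inside the Lentz recurrences.
        return v if abs(v) > tiny else tiny

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), using the continued fraction on whichever side converges faster."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def spearman_p_value(rs: float, n: int) -> float:
    """Two-tailed p-value for Spearman's rho using the t approximation (n - 2 df)."""
    df = n - 2
    t_squared = rs * rs * df / (1.0 - rs * rs)
    # For Student's t with df degrees of freedom, P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t_squared))


# Reported coefficients, assuming n = 23 dogs.
print(f"sensitivity: p = {spearman_p_value(0.1087, 23):.4f}")
print(f"specificity: p = {spearman_p_value(0.0247, 23):.4f}")
```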

Fig. 6 Average percent specificity vs. total program exposures for each dog

Comparison of performance results between samples collected from levodopa-naïve and levodopa-positive PD sample donors

Overall sensitivity for participating PD-positive sample donors who reported levodopa usage at the time of sample collection was 89.0% (3020 of 3395, 95% CI 87.9–90.0%), and sensitivity for participating PD-positive sample donors who reported no levodopa usage at the time of sample collection was 88.25% (954 of 1081, 95% CI 86.18–90.11%). A two-proportion z-test showed no significant difference between these percentages (z = 0.64, two-tailed p = 0.52).
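A two-proportion z-test compares two independent proportions using a pooled estimate of the common proportion under the null hypothesis. The following minimal Python sketch assumes the pooled-variance form (the source does not specify which form was used); the function name is illustrative. Applied to the levodopa sensitivity counts above, it reproduces the reported z = 0.64 and p = 0.52.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Levodopa users (3020 of 3395) vs non-users (954 of 1081), sensitivity.
z, p = two_proportion_z_test(3020, 3395, 954, 1081)
print(f"z = {z:.2f}, two-tailed p = {p:.2f}")  # prints z = 0.64, two-tailed p = 0.52
```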

Comparing specificity for participating PD-positive sample donors who reported levodopa usage at the time of sample collection with specificity for those who did not use levodopa showed similarly non-significant results. Overall specificity for PD-positive levodopa users was 85.6% (1575 of 1839, 95% CI 84.0–87.2%), and specificity for PD-positive non-levodopa users was 87.1% (600 of 689, 95% CI 84.4–89.5%). A two-proportion z-test showed no significant difference between these percentages (z = −0.69, two-tailed p = 0.49). This suggests that the dogs were not indicating on the basis of levodopa.

Comparison of performance results between samples collected from male and female PD-positive sample donors

Table 10 shows the overall sensitivity and specificity for male PD-positive sample donors compared to those of female PD-positive sample donors. For both sensitivity and specificity, there were no significant differences between male PD-positive donors and female PD-positive donors.

Table 10 Comparison of overall sensitivity and specificity for male PD-positive sample donors and female PD-positive sample donors

Sensitivity/specificity showing the transition from T-shirt samples to swab samples

During September 2022, the samples presented to the dogs transitioned from T-shirts to cotton swabs that had been rubbed over the skin near the back of the lower neck for both human control and PD-positive donors. Figures 7 and 8 show the daily average sensitivity and specificity trends for the transition period, August 1, 2022, to November 17, 2022. Note: The following findings were not included as an objective in the Introduction of this study, since at that time, we had not considered a change in sample material as a factor. The findings are provided here because they are considered relevant to furthering research into canine detection of PD.

Fig. 7 Sensitivity trend during the transition from T-shirt samples to swab samples, August 1, 2022, to November 17, 2022. Comparing the seven days before the start of the transition from T-shirt samples (blue circles) to swab samples (orange squares) with the seven days after the transition was complete, average sensitivity increased from 82.4% (n = 204) to 91.1% (n = 180), p = 0.0124

Fig. 8 Specificity trends during the transition from T-shirt samples to swab samples, August 1, 2022, to November 17, 2022. Comparing the seven days before the start of the transition from T-shirt samples (blue circles) to swab samples (orange squares) with the seven days after the transition was complete, average specificity increased from 81.7% (n = 180) to 88.4% (n = 112), p = 0.1256

When cotton swabs alone were introduced in September 2022, we saw a significant decline in sensitivity, but not in specificity. Once the swabs were paired with a small piece of T-shirt from the same donor, sensitivity was comparable to that obtained with T-shirts alone. By November 2022, the dogs showed confidence and proficiency with cotton swabs alone. The September 2022 decline in sensitivity is included in all reported data. Because specificity did not decline during this period, the dogs appear to have recognized the absence of their target odor regardless of the change in sample material.

Discussion

In this study, we determined that most household companion dogs trained through classical detection methods could distinguish between PD-positive and PD-negative samples with a sensitivity of 85% or greater. The top-tier group of dogs distinguished and selected PD-positive samples with a sensitivity of 94%. This suggests that top-tier dogs (highly motivated for detection work, independent of breed) would be better suited to the detection of PD-associated odor.

One aspect of canine medical detection training that could be called into question is memorization of sample odor (Elliker et al. 2014). The Elliker study suggested that repeatedly presenting the same target samples to the dogs could increase their measured sensitivity. In our study, we deliberately presented the dogs with samples they had never encountered and compared these outcomes with those for previously encountered samples. We repeated this in a large number of instances and under different conditions. For all the dogs, in all instances, sensitivity and specificity differed by less than 4 percentage points between “unique” (novel, or never previously encountered) samples and samples that had been previously encountered. Findings were consistent even when all the samples presented in a round were unique. This suggests that the dogs, as a group, relied primarily on their olfactory sensitivity rather than on prior familiarity with a specific sample. We further suggest that, in determining the sensitivity of canine detection, it would be valuable to report the dogs’ performance when they encounter only unique donor samples.

An additional aspect of canine detection of PD that is frequently called into question is how sample donors’ drug usage affects sensitivity findings. In our study, levodopa usage (levodopa being the most commonly prescribed drug in PD) did not significantly affect the sensitivity or specificity of the dogs. Trivedi et al., who investigated levodopa in sebum secretions, determined that changes in sebum from PD-positive patients were not associated with this medication (Trivedi et al. 2019). Both Gao et al. (Gao and Wang 2022) and Rooney et al. (preprint in 2023) determined that sensitivity and specificity findings were not influenced by the drug usage of sample donors.

We did not find that the dogs’ average sensitivity and specificity increased with additional training beyond 300 exposures. Once the dogs had been in training for one to two years and had accumulated between 300 and 600 exposures, their sensitivity leveled off, with only slight fluctuations. Two of the dogs that had been in the program for more than five years showed a marked decrease in sensitivity as they aged, but both dogs had physical disorders: one had repeated seizures and the other had a malignant tumor.

Observations from prior years of the program showed that not all dogs were able to recover on the same day from rounds in which reinforcement was not delivered. This was especially true when the dogs were likely (80% or higher probability) correct in their indication of PD-associated odor but received no feedback in the form of a reinforcer (reward). These were novel samples, including samples from donors who presented with symptoms but may not have been diagnosed by a neurologist, or whose diagnosis was less than two years old. In these instances, the dogs were simply observed as they indicated on the unique samples on the wheel in the first round, and they were not reinforced for an indication unless eight or more of the ten dogs had indicated on the PD-possible sample.

The decision not to reinforce the dogs for samples in this situation was based on the principles of operant conditioning: behaviors that are reinforced are more likely to be repeated than behaviors that are not. For this reason, the dogs, as a group, were then given several days of recovery training with samples of known status. This was necessary to rebuild canine confidence that may have been diminished by the lack of feedback, in the form of a reinforcer or reward, for a correct response. The reduction in sensitivity occurred in only some dogs and was likely a consequence of the large number of dogs, their varied environmental backgrounds, and their differences in cognitive olfactory experience and interpretation. Without recovery training, averaging these dogs in with the group lowered the group’s average sensitivity. This is likely because companion dogs that are trained with operant conditioning and subjected to increased pressure are more likely to offer behavior that leads to false positives or false negatives. Because the dogs differed in their learning backgrounds, each dog likely experienced a different level of pressure when not reinforced for a correct response. Many of the dogs showed no change in their average performance, but some showed a decline in sensitivity, so we rebuilt confidence for these dogs by presenting all the dogs with known samples for an additional one or two working days. For the dogs that showed no downward trend in sensitivity under these circumstances, these additional working days served as maintenance training.

In the Hackner study (Hackner et al. 2016), which tested dogs trained to detect lung cancer under simulated diagnostic screening conditions, the dogs’ specificity dropped to 34%, indicating a high incidence of false positives. These dogs had been trained and were performing at a sensitivity of 90% or higher before being subjected to the simulated diagnostic screening test (Hackner et al. 2016; Lazarowski et al. 2020).

For this reason, we would not recommend using household companion dogs for production-style diagnostic screening for PD, in the manner of a laboratory instrument. Because there is no readily available diagnostic test for the presence of PD odor, it would not be feasible to reinforce the dogs for every positive indication with confidence that the dogs were correct. A large number of dogs is required to establish a statistically meaningful indication, and until the probability that a sample is PD-positive has been determined, the dogs cannot be reinforced without risking reinforcement of incorrect behavior. This limits the number of samples the dogs can screen, since recovery training with reinforceable samples is needed between presentations of questionable samples. However, the dogs could work in tandem with complementary scientific methods to confirm and analyze the presence of PD odor. This approach has been useful for the development of electronic or bioelectronic noses (Shor et al. 2022), for further research into the nature of PD-specific odor, and possibly for understanding how the odor changes as the disease progresses.

How the training protocol affects outcome remains under investigation. Different training protocols may produce different outcomes, and we, like the dogs, continue to evolve and learn as individual and collective knowledge and experience accumulate. We hope that this study will help other canine detection programs adjust, modify, or build upon our training protocol.

Gao et al. (Gao and Wang 2022) noted that the dogs in their study in China did not indicate on three separate samples from donors with a form of PD attributed to a genetic mutation. Similarly, in our study none of the dogs indicated on a sample provided by a PD-positive donor who reported a GBA gene mutation. We also noted that the dogs did not indicate on a sample from a donor who died within two weeks of providing it, although this donor’s earlier samples had been confirmed through canine assessment. In all previous presentations of this donor’s samples, the dogs had indicated, as a group, that the sample was likely PD-positive. For the sample provided two weeks before death, none of the ten dogs in the round showed a behavioral response. This suggests that the target odor was not detectable in this sample shortly before the donor’s death, which could reflect either an absence of the target odor or a change in its concentration or composition. Both possibilities require further investigation.

Gao et al. (Gao and Wang 2022) classified a sample from a PD donor as “PD-positive” when two of their three dogs indicated on it, and used this classification to assess the dogs’ sensitivity. In our study, no weighting was done based on the performance of the dogs. Where a sample donor had not been diagnosed by a neurologist or had been diagnosed for less than two years, we required that 80% of the dogs indicate on the sample in its first daily working session before it was classified as PD-positive.
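The 80% classification rule for unconfirmed samples can be sketched in Python as follows. This is a hypothetical illustration: the record type, its field names, and the “undetermined” label are assumptions for clarity rather than the program’s data model; only the 80% threshold comes from the text.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class FirstSessionResult:
    """Hypothetical record of an unconfirmed sample's first daily working session."""
    sample_id: str
    dogs_presented: int
    dogs_indicating: int


def classify_unconfirmed(result: FirstSessionResult,
                         threshold: Fraction = Fraction(4, 5)) -> str:
    """Return 'PD-positive' if at least `threshold` of the dogs indicated.

    Fractions keep the comparison exact, so 8 of 10 dogs meets an 80% threshold
    without floating-point rounding. 'undetermined' is an illustrative label.
    """
    if result.dogs_presented <= 0:
        raise ValueError("the sample must be presented to at least one dog")
    if not 0 <= result.dogs_indicating <= result.dogs_presented:
        raise ValueError("dogs_indicating must be between 0 and dogs_presented")
    share = Fraction(result.dogs_indicating, result.dogs_presented)
    return "PD-positive" if share >= threshold else "undetermined"


print(classify_unconfirmed(FirstSessionResult("sample-A", 10, 8)))  # PD-positive
print(classify_unconfirmed(FirstSessionResult("sample-B", 10, 7)))  # undetermined
```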

To the best of our knowledge, this is the first detection program, and the most comprehensive study, to investigate the sensitivity of household companion dogs for the detection of PD. This study systematically considered factors that could affect canine detection outcomes, including breed, training duration, attendance days, age, and environmental background. Despite these differences, the dogs, as a group, consistently distinguished between PD-positive and PD-negative samples, with an overall sensitivity of 89%.

This study serves to advance the field of canine detection of PD as follows:

  • Household-raised companion dogs of varying breeds and backgrounds can be used for PD detection, eliminating the need for a detection program to take on the responsibility and expense of owning dogs.

  • Household-raised companion dogs can be used to assess small sample sizes for PD odor detection provided they are worked within their physical limitations.

  • This study provides additional substantiating evidence for the presence of one or more volatile organic compounds obtained from sebum samples of PD-positive patients.

  • This study provides additional supporting evidence for the viability of canine detection of PD.

This study demonstrates that companion dogs can detect a Parkinson’s-associated target odor, which likely consists of one or more volatile organic compounds (VOCs). Further investigation is needed into whether any of the VOCs that make up the odor of Parkinson’s Disease are also present in the sebum scent signature of other neurological diseases, such as ALS, Huntington’s chorea, multiple sclerosis, and Alzheimer’s Disease. The Manchester study reports the discovery of 500 compounds unique to PD-associated sebum samples (Sarkar et al. 2022; Walton-Doyle et al., preprint in 2023). If any of these compounds are also present in the scent signature of other neurological diseases, dogs trained to indicate on PD odor might generalize their indication to other neurological diseases that share one or more of these VOCs. This could be prevented by isolating one or more VOCs present only in PD-associated odor and training dogs on those compounds alone. With further investigation, this odor, or one or more compounds within it, could become an important biomarker for the early and non-invasive detection of prodromal Parkinson’s disease. If such a biomarker can be isolated and reproduced as a single training aid, companion dogs could become a useful, cost-effective, and widespread method for the early detection of Parkinson’s disease.