1 Introduction and Background

Physical inactivity is listed as one of the major contributors to mortality, resulting in an estimated 3.2 million deaths worldwide [1]. Adults are advised to engage in a minimum of 150 min of moderate-to-vigorous-intensity physical activity (PA) per week to reduce the risk of chronic diseases such as hypertension, diabetes, and obesity [2, 3]. Walking, an inexpensive form of exercise that requires no special equipment, is one way for many adults to reduce physical inactivity [2, 4]. The adoption and use of consumer wearable health devices (CWHDs) for PA tracking is increasing. This is evident in wearable health technologies ranking among the top three global fitness trends every year since 2016, and taking first place in 2016, 2017, and 2019 [5].

CWHDs help individuals take ownership of their personal well-being and keep track of their fitness goals. To do this, CWHDs support continuous monitoring and recording of physiological data (e.g. heart rate, sleep patterns, and blood sugar levels) and PA data (e.g. duration of PA, distance covered, and energy expended) [6, 7]. To promote healthy habits, CWHDs incorporate behavioural change techniques such as goal setting, self-monitoring, feedback, social influence, and rewards [8]. In South Africa, the uptake of CWHDs is rising, especially among health-conscious individuals in urban areas. This increase can partly be attributed to a new practice by health insurers, who use incentives to motivate their members to track their PA with wearable health devices [9].

In addition to being adopted by health-conscious individuals to track PA, wearable health devices are also used for remote monitoring of people with chronic disease conditions [6, 10, 11]. In both cases, it is important that the data collected by the device is accurate. Inaccurate data from wearable health devices could have dire consequences, especially when the device is integrated with healthcare applications [12]. Manufacturers of CWHDs often make strong claims about the accuracy and reliability of their devices [13]. However, there are genuine concerns over the accuracy of the data collected by CWHDs [14]. For example, accelerometer- and pedometer-based CWHDs are known to be inaccurate in their estimation of energy expended (EE), and are unable to accurately track the number of steps in PA such as cycling [6, 7]. Users of CWHDs expect, and increasingly demand, that manufacturers deliver on their promises. The two class action lawsuits filed by users against one of the major manufacturers of CWHDs in 2016 underscore the importance of accurate and reliable data [15]. Hence, it is no surprise that many researchers from developed countries [16,17,18] are focusing on the accuracy of the data collected by CWHDs. As discussed later in Sect. 3, all the papers analyzed in this systematic literature review (SLR) were published by authors from developed countries. This points to an apparent dearth of studies on the accuracy of the data collected by CWHDs by researchers from developing countries, including those in Africa. To address this gap, this research investigates the factors that influence the accuracy of the data collected by CWHDs. More specifically, the research focuses on the accuracy of the data generated from heart rate measurement, PA, and sleep monitoring.
The research question that we address in this paper is: “What are the factors that influence the accuracy of the data collected by consumer wearable health devices?”

The remaining sections of the paper are structured as follows. In Sect. 2, we present the process followed in the SLR. Sect. 3 presents a detailed discussion of our analysis of the papers included in the SLR. In Sect. 4, we discuss the study's contribution and limitations, and the implications for manufacturers of CWHDs.

2 Systematic Literature Review Process

To scope the SLR, research articles were retrieved from the following scientific databases, chosen because they publish high-quality, high-impact journal and conference papers: IEEE, PUBMED-NCBI, ScienceDirect, MDPI, and Springer. To retrieve relevant papers, we used the following search string: (“Consumer wearable health device” OR “wearable health device” OR “wearable health technology” OR “Personal health device”) AND (“Data Accuracy” OR “Reliability”).

Inclusion and Exclusion Criteria.

Candidate papers were screened against the inclusion and exclusion criteria specified in Fig. 1. Only papers that met the inclusion criteria were considered for the SLR, and papers that met the exclusion criteria were excluded.

Fig. 1. Inclusion and exclusion criteria.

Source Selection.

The search period for the SLR was between April and October 2019. An initial search on Google Scholar returned more than 22 000 results. To keep the number of potential papers to screen manageable, we focused specifically on five databases, namely IEEE, PUBMED-NCBI, ScienceDirect, MDPI, and Springer. A total of 1393 papers were retrieved from these five databases. An additional 20 papers were retrieved from other sources (see Sect. 3 for the list of other sources), yielding a total of 1413 candidate papers for screening. Details of all 1413 papers were copied into an Excel worksheet with the following columns: Title, Author, Publication type, DOI, Abstract, Relevance, Included/Excluded 1st Screening, and Included/Excluded 2nd Screening.

In Excel, a vertical lookup (VLOOKUP) on the papers’ Title and DOI was used to identify duplicates, leaving 1214 unique sources. These 1214 sources were then reviewed against the inclusion and exclusion criteria specified in Fig. 1, resulting in 465 papers. We then screened the 465 papers for relevance based on their title, keywords, and abstract. Of the 465 papers, 311 were excluded based on their title and 70 based on their abstract. The remaining 84 papers were marked as relevant and eligible for further screening.
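The VLOOKUP-based duplicate check can also be expressed in a few lines of code. The sketch below is our illustration, not the procedure the authors used. It assumes each extracted row is a dictionary with `Title` and `DOI` keys mirroring the worksheet columns, and it treats two rows as duplicates when their normalised titles or their DOIs match. The function names and the example rows are hypothetical.

```python
def normalise(text):
    """Lower-case and collapse whitespace so formatting differences do not hide duplicates."""
    return " ".join(text.lower().split())


def deduplicate(records):
    """Keep the first record for each title/DOI; drop later records that repeat either."""
    seen_titles, seen_dois = set(), set()
    unique = []
    for record in records:
        title = normalise(record["Title"])
        doi = normalise(record.get("DOI", ""))  # DOIs are case-insensitive
        if title in seen_titles or (doi and doi in seen_dois):
            continue  # duplicate of a record already kept
        seen_titles.add(title)
        if doi:
            seen_dois.add(doi)
        unique.append(record)
    return unique


# Hypothetical worksheet rows (titles and DOIs are made up).
papers = [
    {"Title": "Accuracy of Wrist-Worn Trackers", "DOI": "10.1000/abc"},
    {"Title": "accuracy of wrist-worn  trackers", "DOI": ""},
    {"Title": "Sleep Tracking with Actigraphy", "DOI": "10.1000/xyz"},
    {"Title": "Sleep tracking using actigraphy", "DOI": "10.1000/XYZ"},
]
print(len(deduplicate(papers)))  # 2
```

The second row differs from the first only in case and spacing, and the fourth row repeats the third row's DOI, so both are dropped.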

We carried out a first-level screening of the remaining 84 papers by reading their abstract, findings, and conclusion sections. After reading these three sections, we recorded a priority level in the Relevance column of each paper: ‘high’ (focus is on the accuracy of CWHDs, with comparisons/validation between various devices), ‘medium’ (focus is on the accuracy of CWHDs, but no comparisons/validation between devices), or ‘low’ (focus is on CWHDs, but with emphasis on big data, mobile health apps, smart clothing technologies, etc.). Following the first-level screening, a total of 36 papers were assigned ‘low’ priority and thus excluded from the study. The second level of screening involved a full-text reading of the remaining 47 papers. Figure 2 illustrates the source selection process.
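The three-level relevance rule used in the first-level screening can be written as a small function. This is our sketch, not the authors' tooling. It assumes every paper at this stage is already about CWHDs, and that each has been tagged during reading with two hypothetical yes/no flags: `accuracy`, meaning the paper focuses on accuracy, and `compares`, meaning it compares or validates devices. The paper identifiers are made up.

```python
def assign_priority(focus_on_accuracy: bool, compares_devices: bool) -> str:
    """Map the first-level screening criteria to a relevance priority."""
    if focus_on_accuracy and compares_devices:
        return "high"    # accuracy of CWHDs, compared/validated across devices
    if focus_on_accuracy:
        return "medium"  # accuracy of CWHDs, no cross-device comparison
    return "low"         # about CWHDs, but emphasis on big data, mHealth apps, etc.


def first_level_screening(papers):
    """Keep papers whose priority is 'high' or 'medium'; 'low' papers are excluded."""
    return [p for p in papers if assign_priority(p["accuracy"], p["compares"]) != "low"]


# Hypothetical screening notes for three papers.
notes = [
    {"id": "P1", "accuracy": True, "compares": True},
    {"id": "P2", "accuracy": True, "compares": False},
    {"id": "P3", "accuracy": False, "compares": False},
]
print([p["id"] for p in first_level_screening(notes)])  # ['P1', 'P2']
```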

Fig. 2. Source selection process.

3 Results

In this section, we present the results obtained from the analysis of the 47 papers included in the SLR.

Quantitative analysis of the 47 papers using descriptive statistics showed that 17 were published in IEEE, 16 in PUBMED-NCBI, five in ScienceDirect, and two in MDPI and Springer respectively. The remaining six papers were published in BMC Public Health, PLOS Medicine, Routledge, the Albany Law Journal of Science & Technology, and USENIX. Table 1 shows the distribution of the papers across the databases.

Table 1. Distribution of research papers per database.

Our analysis of the papers by year of publication shows that the majority were published in 2016 (13) and 2017 (12). The number of publications tapered down to eight in 2018, and only one of the papers analyzed was published in 2019. These statistics suggest growing research interest in the accuracy of the data collected by CWHDs in recent years. The limited number of papers from 2019 should not be construed as waning interest in the topic. Rather, it can be attributed to the search period for the SLR (April to October 2019), which covered only part of that year.

Our analysis of the papers included in the SLR shows that all authors are from developed countries. The largest number of papers (13) were published by authors from the United States of America (USA). There were six papers from Australia, five from China, and four from Korea. Authors from Italy and Japan published three papers each, and authors from Denmark, the United Kingdom, and Germany published two papers each. Authors from Argentina, Canada, India, Malaysia, the Netherlands, Portugal, and Spain published one paper each. Based on our analysis, authors from African countries are conspicuously absent from the papers that focus on the accuracy of the data collected by CWHDs.

Following the quantitative analysis, we identified common themes in the papers and grouped them into three categories. In the following sub-sections, we discuss the three main factors that influence the accuracy of the data collected by CWHDs.

3.1 The Tracker and Sensor Types

The type of sensor technology fitted in a CWHD and the body part to which the device is attached can influence the accuracy of the data it collects [10, 11, 19]. Our analysis of the papers included in the SLR showed that CWHDs typically come with one or a combination of the following sensors:

  • Photoplethysmography (PPG) sensors: PPG sensors are used in CWHDs to monitor heart rate. These optical sensors shine light onto the surface of the skin and detect changes in the blood volume of the underlying tissue, which alter the amount of light absorbed as oxygen-rich blood is ‘flushed’ underneath the skin [12, 20].

  • Pedometer and accelerometer: A pedometer is a lightweight device with a sensor that measures the number of steps taken or the distance covered. An accelerometer measures PA by detecting movements along three axes (side-to-side, up-and-down, and forward-and-backward) [21].

  • Actigraphy: Actigraphy uses a non-invasive, wrist-worn device fitted with an accelerometer to measure sleep patterns by unobtrusively distinguishing between wakefulness and sleep. It is based on the premise that limited movement is associated with sleep, while increased movement is linked with wakefulness [22, 23].
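The PPG principle can be made concrete with a short example. The sketch below estimates heart rate from a pulse waveform by detecting the peaks that correspond to heartbeats and averaging the intervals between them. It is a simplified illustration, not any manufacturer's algorithm. The signal is a synthetic, noise-free sine wave at 1.2 Hz (72 beats per minute), and the 50 Hz sampling rate, the 0.5 peak threshold, and the function names are our assumptions.

```python
import math


def detect_peaks(signal, threshold):
    """Return indices of local maxima that rise above `threshold`."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]


def heart_rate_bpm(signal, fs_hz, threshold=0.5):
    """Estimate heart rate from the mean interval between detected pulse peaks."""
    peaks = detect_peaks(signal, threshold)
    if len(peaks) < 2:
        raise ValueError("need at least two pulse peaks")
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs_hz
    return 60.0 / mean_interval_s


# Synthetic, noise-free "PPG" pulse: 10 s sampled at 50 Hz, one pulse every 1/1.2 s.
FS = 50
ppg = [math.sin(2 * math.pi * 1.2 * n / FS) for n in range(10 * FS)]
print(round(heart_rate_bpm(ppg, FS)))  # 72
```

On a real wrist signal, motion artefacts add spurious peaks that a naive detector like this one would count as heartbeats.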

PPG sensors can be attached to various body parts, including the upper arm, earlobe, forehead, wrist, or finger, and the attachment site can influence their accuracy. Finger-based PPG sensors produce stronger signals than sensors at other sites, which makes them more accurate. However, wearing a finger-based PPG sensor can interfere with daily routines, making it less practical than PPG sensors worn elsewhere [19].

Another factor that could influence the accuracy of PPG sensors is the colour of the light emitting diode (LED) in the sensor. The majority of CWHDs that use PPG sensors for monitoring heart rate come with green light PPG (gPPG) [12, 24, 25]. However, red light PPG (rPPG) sensors (i.e. pulse oximeters) are commonly used in clinical environments [12, 25,26,27]. rPPG sensors have a number of advantages over gPPG sensors. Green light has a shorter wavelength and does not penetrate as deeply into the skin. In contrast, red light penetrates deeper into the skin because it is absorbed less by the body’s tissues [25, 26]. This property enables rPPG sensors to detect other biological signals such as arterial oxygen saturation, respiration, and blood pressure [24, 26, 27]. In addition, melanin (the pigment responsible for the colour of human skin) absorbs green light more strongly than red light. Therefore, skin colour has little influence on the accuracy of heart rate measurements taken with rPPG sensors, whereas darker skin colours reduce the accuracy of gPPG sensors [19, 26, 28].

A drawback of rPPG sensors is that they are more susceptible to noise generated by movement of the body part to which the device is attached (for example, hand waving or rubbing), often referred to as ‘motion artefact’. Motion artefacts are known to reduce the accuracy of rPPG sensors. This is not the case for gPPG sensors, which are less vulnerable to the effects of motion artefacts [19, 25, 29].

Sleep, increased PA, and good nutrition are integral to maintaining personal well-being. Before the pervasive adoption and use of CWHDs, sleep could only be monitored and tracked in specialized sleep laboratories using polysomnography (PSG). PSG measures sleep quality by collecting data on eye movements, heart rate, muscle tone, brain activity, and physical movements [13, 22]. The unnatural setting and the need for a sleep technologist to set up PSG equipment make it impractical for home use. Actigraphy, the consumer wearable alternative, uses a non-invasive wrist-worn device with an accelerometer, heart rate monitor, and respiratory monitor to record the wearer’s movements at regular intervals in order to estimate sleep and wakefulness [13, 22, 23, 30, 31].

Actigraphy has been shown to be accurate in detecting sleep, but less so in detecting wakefulness. For example, lying still while awake could be misinterpreted as sleep because of the absence of movement, leading to an overestimation of sleep [13, 23, 32]. This deficiency arises primarily because actigraphy associates reduced movement with sleep. For the same reason, actigraphy is not very effective in distinguishing between the different stages of sleep.
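The movement-based logic behind actigraphy, and the overestimation it can produce, can be illustrated with a simple threshold rule. In this sketch, the per-minute activity counts, the threshold of 10, and the ‘true’ sleep labels are hypothetical values chosen for illustration. Commercial devices use proprietary and more elaborate scoring.

```python
def score_sleep(activity_counts, threshold):
    """Label each epoch 'sleep' if movement is at or below `threshold`, else 'wake'."""
    return ["sleep" if count <= threshold else "wake" for count in activity_counts]


# Hypothetical one-minute epochs: the wearer lies still but awake in epochs 3-4,
# sleeps in epochs 5-8, then gets up in epoch 9.
counts = [120, 80, 5, 3, 0, 0, 2, 4, 90]
truth = ["wake", "wake", "wake", "wake", "sleep", "sleep", "sleep", "sleep", "wake"]

scored = score_sleep(counts, threshold=10)
print(scored.count("sleep"), truth.count("sleep"))  # 6 4 -> sleep is overestimated
misread = [i + 1 for i, (s, t) in enumerate(zip(scored, truth)) if s != t]
print(misread)  # [3, 4]
```

Because the rule sees only movement, the quiet but wakeful epochs 3 and 4 are scored as sleep.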

CWHDs track PA using pedometer or accelerometer sensors. Previous studies show that while CWHDs with pedometer sensors are effective in estimating step counts, they typically underestimate energy expenditure (EE). Accelerometers, on the other hand, do not accurately measure steps taken in PA such as cycling [3, 6, 7, 33]. The placement of a PA sensor and walking speed are among the factors that could influence its accuracy. Pedometers are less accurate when attached to the wrist or hip than when attached to the ankle. Similarly, slower walking speeds and unsteady or uneven gaits reduce accuracy [33,34,35,36]. In the case of accelerometers, Nelson et al. [6] found that wrist-worn accelerometers are more accurate than hip-worn sensors. However, Simpson et al. [36] suggest that better accuracy could be achieved when an accelerometer is placed around the ankle, especially for individuals who walk at slower speeds.
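One way walking speed can affect step counting is shown in the following sketch. Many simple step counters register a step each time the acceleration signal rises above a fixed threshold. This illustration assumes, for the sake of the example, that slower walking produces weaker acceleration peaks. The sine-wave gait, the amplitudes, the 1.5 m/s² threshold, and the function names are all hypothetical, and no specific device is claimed to work this way.

```python
import math


def count_steps(accel, threshold):
    """Count steps as upward crossings of `threshold` by the acceleration signal."""
    steps = 0
    above = False
    for a in accel:
        if a > threshold and not above:
            steps += 1
            above = True
        elif a <= threshold:
            above = False
    return steps


def simulate_walk(steps, step_rate_hz, amplitude, fs_hz=50):
    """Synthetic vertical acceleration (gravity removed): one sine peak per step."""
    n_samples = int(steps / step_rate_hz * fs_hz)
    return [amplitude * math.sin(2 * math.pi * step_rate_hz * n / fs_hz)
            for n in range(n_samples)]


brisk = simulate_walk(steps=20, step_rate_hz=2.0, amplitude=3.0)
slow = simulate_walk(steps=20, step_rate_hz=1.0, amplitude=1.0)
print(count_steps(brisk, threshold=1.5), count_steps(slow, threshold=1.5))  # 20 0
```

With a threshold tuned for brisk walking, every slow step is missed, which is consistent with the reduced accuracy at slower walking speeds reported above.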

3.2 The Algorithm Used in Consumer Wearable Health Devices

The algorithm used by a CWHD to monitor health parameters is another factor that influences the accuracy of the data collected by the device [37]. The built-in algorithms in CWHDs support the measurement and processing of bio-sensory and PA data, and the communication of the measurement outcomes to the user. Manufacturers of CWHDs do not disclose the algorithms used to track and measure bio-sensory and PA data, for proprietary reasons [17, 33, 38, 39]. This makes it difficult for users to compare devices objectively. In this section, we summarize algorithms that could be used to monitor heart rate and PA, limited to those reported in the papers included in the SLR.

Algorithms for detecting and monitoring motion and PA that have been developed or proposed by researchers include pedestrian dead reckoning (PDR) and zero velocity update (ZUPT) algorithms [38, 40, 41]. The PDR algorithm estimates walking distance from the number of steps taken and the length of each step. PDR algorithms are more accurate in estimating distance covered when the tracking device is attached to the foot [40]. The ZUPT algorithm is used to bound the position errors that accumulate when distance is calculated with a PDR algorithm. It does so by detecting the periodic static states in which the foot rests flat on the ground during walking, when the foot’s velocity is known to be zero [40, 41].
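The two ideas can be sketched in a few lines. The code below is our simplified illustration, not the algorithms of [38, 40, 41]. The function `pdr_position` advances a position by each step’s length along its heading. The function `integrate_distance` integrates a foot-mounted accelerometer signal (assumed to have gravity already removed) into distance, optionally applying a zero velocity update whenever the measured acceleration is close to zero, which is taken as the foot resting flat. The stride profile, the 0.05 m/s² sensor bias, the 0.2 m/s² stance threshold, and the function names are hypothetical.

```python
import math


def pdr_position(steps, start=(0.0, 0.0)):
    """PDR: advance the position by each step's length along that step's heading."""
    x, y = start
    for length_m, heading_rad in steps:
        x += length_m * math.cos(heading_rad)
        y += length_m * math.sin(heading_rad)
    return x, y


def integrate_distance(accel, fs_hz, stance_threshold=None):
    """Integrate forward foot acceleration twice to get distance travelled.

    If stance_threshold is given, apply a zero velocity update (ZUPT): whenever
    |acceleration| is below the threshold, the foot is treated as flat on the
    ground and the velocity estimate is reset to zero, discarding accumulated drift.
    """
    dt = 1.0 / fs_hz
    velocity = distance = 0.0
    for a in accel:
        velocity += a * dt
        if stance_threshold is not None and abs(a) < stance_threshold:
            velocity = 0.0
        distance += velocity * dt
    return distance


# One hypothetical stride at 100 Hz: 0.3 s speeding up, 0.3 s slowing down, 0.4 s stance.
STRIDE = [4.0] * 30 + [-4.0] * 30 + [0.0] * 40
true_accel = STRIDE * 10                       # ten strides, 10 s of walking
biased_accel = [a + 0.05 for a in true_accel]  # sensor with a small constant bias

print(round(integrate_distance(true_accel, 100), 2))                          # 3.6
print(round(integrate_distance(biased_accel, 100), 2))                        # 6.1
print(round(integrate_distance(biased_accel, 100, stance_threshold=0.2), 2))  # 3.69

# PDR example: four 0.7 m steps east, then three 0.7 m steps north.
x, y = pdr_position([(0.7, 0.0)] * 4 + [(0.7, math.pi / 2)] * 3)
print(round(x, 2), round(y, 2))  # 2.8 2.1
```

Without ZUPT, the small bias is integrated twice and inflates the distance from 3.6 m to about 6.1 m. Resetting the velocity at each stance phase keeps the error to about 0.09 m.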

Several researchers [27, 41,42,43,44] have proposed algorithms that could improve signals from PPG sensors and thereby their accuracy. As discussed in Sect. 3.1, the PPG sensors used to monitor heart rate are susceptible to noise from ‘motion artefacts’, which could affect the accuracy of heart rate measurements. Yang et al. [27] developed an Adaptive Spectrum Noise Cancellation (ASNC) algorithm that significantly improves accuracy as ‘motion artefacts’ increase. Yousefi et al. [41] also proposed a motion-tolerant algorithm that improves signals from PPG sensors by removing ‘motion artefacts’. Similarly, Tang et al. [44] used the Empirical Mode Decomposition (EMD) and Discrete Wavelet Transform (DWT) algorithms to enhance PPG signals and reduce noise. These authors provide evidence that the algorithms improve the accuracy of heart rates captured by PPG sensors.
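The algorithms cited above are considerably more involved, but the basic idea behind wavelet-based denoising can be sketched briefly. The block below is our illustration and is not the ASNC, motion-tolerant, or EMD/DWT pipeline of [27, 41, 44]. It assumes a multi-level orthonormal Haar DWT with soft thresholding of the detail coefficients, and a synthetic noisy sine wave stands in for a PPG signal. The noise level, the use of the ‘universal’ threshold, the three decomposition levels, and all function names are assumptions.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (zero if |c| <= t)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(signal, threshold, levels=3):
    """Decompose, soft-threshold every detail band, then reconstruct."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([soft_threshold(c, threshold) for c in detail])
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Synthetic stand-in for a PPG trace: a clean sine plus Gaussian noise (sigma = 0.3).
rng = random.Random(0)
N, sigma = 256, 0.3
clean = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]
noisy = [c + rng.gauss(0.0, sigma) for c in clean]
threshold = sigma * math.sqrt(2 * math.log(N))  # "universal" threshold
denoised = wavelet_denoise(noisy, threshold)
print(f"MSE noisy: {mse(noisy, clean):.3f}, denoised: {mse(denoised, clean):.3f}")
```

With a threshold of zero the transform reconstructs the input exactly. Noise removal comes entirely from shrinking the small detail coefficients, where most of the noise, but little of the slowly varying pulse, resides.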

Another element closely linked to the algorithms used to measure bio-sensory and PA data is the firmware installed on CWHDs. Firmware updates are necessary to ensure optimal performance and the security of the data collected by the device. However, CWHDs can become vulnerable to privacy and security threats during firmware updates. Fereidooni et al. [45] and Lin and Sun [46] provide evidence that people with the technical wherewithal can inject arbitrary or malicious code into CWHDs’ firmware during updates. The ability of unauthorized persons to modify firmware can affect the integrity of the data collected by CWHDs. Another concern about firmware updates is that the same CWHD could provide different measurements depending on the firmware applied, thus distorting the measurements even if other variables remain unchanged [39].
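One common safeguard against the tampering described above is for the device to check the integrity of a firmware image before installing it. The sketch below is a simplified illustration, not the mechanism used by any particular manufacturer. It assumes the expected SHA-256 digest reaches the device over a trusted channel, and the firmware bytes and function names are hypothetical. In practice, a digital signature would also be needed to authenticate the source, because a bare digest only detects changes.

```python
import hashlib
import hmac


def firmware_is_untampered(image: bytes, expected_sha256_hex: str) -> bool:
    """Compare the SHA-256 digest of a firmware image with a digest from a trusted source."""
    actual = hashlib.sha256(image).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def install_update(image: bytes, expected_sha256_hex: str) -> str:
    """Hypothetical update step: refuse any image whose digest does not match."""
    if not firmware_is_untampered(image, expected_sha256_hex):
        return "rejected"   # modified or corrupted image: keep the current firmware
    return "installed"      # placeholder for the device-specific flashing step


official = b"\x00hypothetical firmware v2.1\x00"
trusted_digest = hashlib.sha256(official).hexdigest()
tampered = official + b"injected"

print(install_update(official, trusted_digest))  # installed
print(install_update(tampered, trusted_digest))  # rejected
```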

3.3 Limitations in the Design, Energy Consumption, and Processing Capabilities

Based on our analysis of the papers included in this SLR, the third main factor that could influence the accuracy of the data collected by CWHDs relates to inherent limitations in the design, energy consumption, and the processing capability of a device.

CWHDs are increasingly becoming part of the evolving Internet of Things (IoT) ecosystem. IoT-enabled wearable health devices allow continuous monitoring of patients from the comfort of their homes and the transfer of health data to healthcare providers. However, performance, energy consumption, and form factor could influence the success of IoT-enabled CWHDs [47]. The convenience and usefulness of a CWHD depend on the balance between the device’s size and its battery life. Smaller devices are easier to carry, but do not always have long battery life. Conversely, longer battery life is commonly associated with larger devices [47, 48]. The continuous collection and exchange of physiological data between a CWHD, other connected IoT devices, and applications (apps) place additional strain on the energy requirements of CWHDs [49].
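The size-versus-battery-life trade-off can be made concrete with a back-of-the-envelope estimate: runtime ≈ battery capacity ÷ average current draw, where each component’s current is weighted by the fraction of time it is active. All capacities, currents, and duty cycles below are hypothetical round numbers chosen for illustration, not measurements of any CWHD.

```python
def battery_life_hours(capacity_mah, loads):
    """Estimated runtime = capacity / average current.

    `loads` is a list of (current_mA, duty_cycle) pairs, where duty_cycle is the
    fraction of time (0-1) that the component is active.
    """
    average_ma = sum(current_ma * duty for current_ma, duty in loads)
    return capacity_mah / average_ma


# Hypothetical loads: always-on microcontroller, PPG sensor, Bluetooth radio.
light_use = [(0.5, 1.0), (1.5, 0.5), (8.0, 0.05)]  # intermittent sensing and syncing
heavy_use = [(0.5, 1.0), (1.5, 1.0), (8.0, 0.2)]   # continuous sensing, frequent syncing

print(round(battery_life_hours(100, light_use), 1))  # 60.6  (small 100 mAh battery)
print(round(battery_life_hours(100, heavy_use), 1))  # 27.8
print(round(battery_life_hours(300, heavy_use), 1))  # 83.3  (larger, bulkier battery)
```

Under these assumptions, continuous sensing and frequent data exchange more than halve the runtime of a small device, and recovering it requires a larger battery and hence a bulkier device.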

The quality of the components (battery, storage capacity, Bluetooth module, etc.) fitted in CWHDs can influence the accuracy of the data collected by the device. Haghi et al. [50] confirm the influence of high-quality components on the performance of CWHDs: devices with components such as high storage capacity, long-lasting batteries, and Bluetooth and Wi-Fi compatibility performed better and were more accurate than devices with low-quality components.

4 Conclusion

This paper presents an SLR of 47 papers that focus on the accuracy of the data collected by CWHDs. Our analysis showed that the highest number of papers were published in two high-quality databases, namely IEEE (36%) and PUBMED-NCBI (33%). Ten percent of the papers were published in ScienceDirect, while MDPI and Springer accounted for 2% each. The remaining 13% of papers were published in journals such as BMC Public Health and PLOS Medicine. All 47 papers analyzed in the SLR were published by authors from developed countries, with the largest number from the USA, followed by Australia and China. None of the authors are from developing countries, including those in Africa. This points to a gap in studies on the accuracy of the data collected by CWHDs by authors from developing countries. Based on our analysis, there are three main factors that influence the accuracy of the data collected by CWHDs: (i) the tracker and sensor type, (ii) the algorithm used in the CWHD, and (iii) limitations in the design, energy consumption, and processing capability of the device.

The study has a number of limitations. Firstly, the search and extraction of sources were based on specific key phrases, including consumer wearable health device and data accuracy. As a result, papers that could have been relevant were excluded from the study because they did not use our search phrases in their keywords. Secondly, the study focused specifically on the accuracy of data generated from heart rate measurement, PA, and sleep monitoring. Research papers that focused on the accuracy of CWHDs in general were excluded; including such papers could have increased the number of factors beyond the three identified in this study. Finally, because the algorithms used to track and measure bio-sensory and PA data are proprietary, the algorithms reported in this study are those developed or proposed by researchers.

This study contributes to the body of research on the accuracy of the data collected by CWHDs. Given the increasing use of CWHDs across the globe, and the limited number of studies on this topic from developing countries, it is imperative that more research is done to better understand the factors that influence the accuracy of the data collected by CWHDs. The study also has implications for manufacturers of CWHDs, who should take into account the factors that influence data accuracy when designing and developing their devices. This will enable users and healthcare professionals to make meaningful use of the data generated by these devices, thus contributing to improved personal well-being and better healthcare service delivery.