Reliability of accelerometric measurement of physical activity in older adults: the benefit of using the trimmed sum

There is general consensus that physical activity is important for preserving functional capacities of older adults and positively influencing quality of life. While accelerometry is widely accepted and applied to assess physical activity in studies, several problems with this method remain (e.g., low retest reliability, measurement errors). The aim of this study was to test the intra-instrumental retest reliability of a wrist-worn accelerometer in a 3-day measurement of physical activity in older adults and to compare different estimators. A sample of 123 older adults (76.5 ± 5.1 years, 59 % female) wore a uniaxial accelerometer continuously for 1 week. The data were split into two repeated measurement values (week set) of 3 days each. The sum, the 80–99th quantiles and the 80–99th trimmed sums were built for each week set. Retest reliability was assessed for each estimator and graphically demonstrated by Bland–Altman plots. The intraclass correlation of the retest reliability ranged from 0.22 to 0.91. Retest reliability increases when a more robust estimator than the overall sum is used. Therefore, the trimmed sum can be recommended as a conservative estimate of the physical activity level of older adults.


Introduction
There is general consensus that physical activity is important for preserving functional capacities of older adults and for positively influencing quality of life [7,12]. To measure physical activity in studies, a variety of direct (e.g., pedometer) or indirect (e.g., questionnaires) methods is used [18,20]. The measurement methods differ with regard to their quality, criterion validity and retest reliability, costs and acceptance by study participants, and their use depends closely on feasibility within the study design. At present, no gold standard for the assessment of physical activity has been established [18,28].
Among direct methods to measure physical activity, accelerometry is accepted and widely applied. An accelerometer is worn on the body (e.g., at the hip, ankle, or wrist) and measures acceleration in up to three dimensions. In so doing, information on frequency, intensity, and duration of an individual's physical activity is collected, expressed in "counts per minute" (CPM). It is assumed that the amount of CPM is associated with the intensity of physical activity [5,14]. To represent the average physical activity of an individual, a measurement period of at least 3 days is suggested [28,29].
Despite its widespread use, direct measurement of physical activity using an accelerometer remains challenging [18,31]. There is, for example, no consensus on the type of accelerometer to use [1,18,28], nor is there agreement as to the part of the body on which it should be worn [9]. Only recommendations for different target groups exist, e.g., for older adults [4,6,16,21]. Older adults frequently perform physical activity of light to moderate intensity, such as housekeeping, gardening, or walking for leisure [11]. In order to take these activities into account, some authors recommend the use of a wrist-worn uniaxial accelerometer [4,6,16,21], since movements mainly occur in the upper body and arms (e.g., the wrist-worn "Actiband" AB64 uniaxial accelerometer, Cambridge Neurotechnology Ltd., UK).
Despite the wide use of accelerometry-based measurement of physical activity in all kinds of studies, data on the retest reliability are seldom published [16]. This is true for the uniaxial wrist-worn Actiband accelerometer itself, as well as for other accelerometers in general. The only published data on the retest reliability of the Actiband were found in Rowe et al. [19]. They found a high inter-instrumental retest reliability of two Actibands which were worn simultaneously during a test on a treadmill (ICC = 0.98; 95 % CI: 0.91-0.99). However, the study was performed with ten 10- to 11-year-old boys in a laboratory environment, comparing two different Actibands. Therefore, these results cannot directly be adopted for the measurement of activities of daily life in community-dwelling older adults in a non-laboratory situation.
The limited data on accelerometer-based measurement of physical activity may partly be explained by disappointing results in analyses of retest reliability. Usually, the sum of CPM or the mean CPM, collected over a period of a few days and divided by the number of days [26,28,30], is used to express the average amount of physical activity of an individual. The resulting "average counts per day" often show tremendous intra- and inter-individual variability [9]. This variability may be partly explained by outliers of the CPM measured by the accelerometer. Outliers are multiples of reasonable CPM values. These values are defined as measurement errors, since they are clearly due to methodological issues on the manufacturer's side and cannot be achieved by any kind of physical activity. Consequently, using the sum or the mean of CPM that still includes the outliers cannot result in high retest reliability. Unfortunately, a standardized recommendation on how to deal with outliers in accelerometry is lacking. Retest reliability might therefore be low because of the outliers that contribute to the overall sum, and not because the results are generally not reproducible. Orsini et al. [17], for example, defined CPM greater than 20,000 as malfunction of the accelerometer without further explanation of the cut-point they chose. These data were then set as missing and thereby excluded from analyses. Instead of defining a certain cut-point for each accelerometer, we would like to suggest a different approach. In order to enhance the retest reliability, an alternative and more robust estimator that is less sensitive to outliers/measurement errors might be needed. The trimmed (or truncated) sum may be such an estimator. The trimmed sum is obtained by omitting a certain percentage of the most extreme observations (e.g., 5 % of the low and 5 % of the high end) and taking the sum of the rest. Like the trimmed mean, a robust measure of central tendency, it is stable against abnormal extreme values (such as measurement errors/outliers), which get "trimmed" away [2]. Using the trimmed sum to express the average amount of physical activity of an individual instead of the overall sum of CPM may result in higher retest reliability.
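The general (two-sided) trimmed sum described above can be illustrated with a minimal Python sketch. The function name `trimmed_sum`, its parameters, and the CPM readings are hypothetical and chosen only for illustration; rounding the number of dropped observations down is an assumption, not a rule taken from the study.

```python
def trimmed_sum(values, low_pct=5, high_pct=5):
    """Sum of the values left after dropping low_pct % of the smallest
    and high_pct % of the largest observations (counts rounded down)."""
    ordered = sorted(values)
    n = len(ordered)
    drop_low = n * low_pct // 100    # number of smallest values to drop
    drop_high = n * high_pct // 100  # number of largest values to drop
    return sum(ordered[drop_low:n - drop_high])

# Hypothetical CPM readings: 19 plausible values plus one implausible spike.
cpm = [0, 0, 10, 20, 30, 40, 50, 60, 70, 80,
       90, 100, 110, 120, 130, 140, 150, 160, 170, 50000]

overall = sum(cpm)           # dominated by the single spike
trimmed = trimmed_sum(cpm)   # one value dropped at each end
print(overall, trimmed)
```

In this toy example a single spike accounts for almost the entire overall sum, whereas the trimmed sum is unaffected by it.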
The aim of the study was therefore to find a more robust estimator in order to account for outliers that occur when accelerometry is used to measure physical activity. This more robust estimator will then be used to test the intra-instrumental retest reliability of a wrist-worn accelerometer in a 3-day measurement of physical activity in community-dwelling older adults. We hypothesized that using quantiles and the trimmed sum instead of the overall sum (which includes the outliers) of CPM will decrease the measurement error and increase the retest reliability.

Methods
The presented study was part of a validation study of a physical activity questionnaire for older adults [23][24][25]. Participants were recruited via 13 general practitioners in North Rhine-Westphalia, Germany, during springtime. All patients who visited the practice for any reason and fulfilled the inclusion criteria were asked to participate in the study. The inclusion criteria were being 70 years or older, being legally competent and able to cooperate appropriately, and providing written informed consent. The exclusion criteria were a life expectancy of less than 6 months, being in a wheelchair, or being bedridden. Recruitment time within one general practitioner practice was 1 week. Body mass index (kg × m^−2) was computed from height and weight measured with a stadiometer (Seca 214) and a standard balance scale (Seca 862; both: Seca, Germany). The study was approved by the Ethics Committee.
A sample of 123 community-dwelling older adults wore the Actiband AB64 uniaxial accelerometer (Cambridge Neurotechnology Ltd., UK) continuously for 1 week (7 days, 24 h per day). The Actiband is a lightweight device (12 g; size, 35×15×5 mm) that measures and records vertical acceleration with a 1-min epoch. The device is waterproof.
In order to find a more robust estimator than the overall sum of CPM, the retest reliability of selected quantiles and trimmed sums was evaluated and compared to that of the overall sum of CPM. After participants had returned the Actiband to the study center, the complete data of 1 week were split into two repeated measurement values (week sets) of 3 days each. The first set included the CPM data from Monday, Tuesday, and Wednesday; the second set included the data from Thursday, Friday, and Saturday. Sunday was excluded. Assuming that a maximum of 20 % of the CPM values at the high end of the distribution were due to outliers, these values were discarded by building two datasets for each week set: the 80th, 85th, 90th, 95th and 99th quantiles (Q80-Q99), and the 80, 85, 90, 95, and 99 % trimmed sums (TS80-TS99). The CPM value of, e.g., Q80 (80th quantile) is the value for which 80 % of the CPM values are less than or equal to it and 20 % are greater than or equal to it. To build the trimmed sums, only the high end of the CPM values was trimmed, since the low end contained a large number of null values (measured during sleep or physical inactivity), and the remaining values were then summed up. This means that the (1−β) trimmed sum is calculated as $\mathrm{TS}_{1-\beta} = \sum_{i=1}^{\lfloor (1-\beta) n \rfloor} x_{(i)}$, where $x_{(1)} \le \dots \le x_{(n)}$ are the $n$ CPM values of a week set in ascending order and 0≤β≤1. The parameter β defines the percentage of the trimming. For comparison, the special case of the overall sum (β = 0, i.e., nothing is trimmed) will also be described.
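The splitting of the week and the two groups of estimators can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code of the study: the function names, the weekday keys, and the toy data (ten CPM values per day instead of 1,440) are hypothetical, the quantile uses the nearest-rank definition, and the number of retained values is rounded down.

```python
def split_week(days):
    """Split one week of CPM values into two 3-day week sets; Sunday is dropped."""
    first = [v for d in ("Mon", "Tue", "Wed") for v in days[d]]
    second = [v for d in ("Thu", "Fri", "Sat") for v in days[d]]
    return first, second

def quantile(cpm, pct):
    """Nearest-rank quantile: smallest value with at least pct % of values <= it."""
    ordered = sorted(cpm)
    rank = max(1, (len(ordered) * pct + 99) // 100)  # integer ceiling of n * pct / 100
    return ordered[rank - 1]

def upper_trimmed_sum(cpm, keep_pct):
    """Sum of the lowest keep_pct % of values; only the high end is trimmed."""
    ordered = sorted(cpm)
    n_keep = len(ordered) * keep_pct // 100  # floor((1 - beta) * n)
    return sum(ordered[:n_keep])

# Hypothetical toy week: identical days, plus one implausible spike on Tuesday.
week = {day: list(range(0, 100, 10))
        for day in ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")}
week["Tue"][-1] = 60000

first_set, second_set = split_week(week)
for name, data in (("week set 1", first_set), ("week set 2", second_set)):
    print(name, sum(data), quantile(data, 80), upper_trimmed_sum(data, 95))
```

In this constructed example the overall sums of the two week sets differ by a factor of about 45 because of the single spike, while Q80 and TS95 are identical for both sets. Setting `keep_pct=100` corresponds to β = 0 and returns the overall sum.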
A Bland-Altman plot was used as a graphic assessment of the agreement of the two week sets, in which the difference between the two week sets is plotted against their mean for each subject [3]. The 95 % limits of agreement, estimated by mean difference ±1.96×standard deviation of the differences, provide an interval within which 95 % of the differences between measurements of the two week sets are expected to lie.
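The quantities behind such a plot can be computed as in the following minimal Python sketch. The function name `bland_altman` and the four pairs of values are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Per-subject means and differences, bias, and 95 % limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)   # mean difference between the two week sets
    sd = stdev(diffs)    # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits

# Hypothetical estimator values (e.g., TS95) of four subjects in both week sets.
set_1 = [100, 120, 90, 110]
set_2 = [98, 125, 92, 105]

means, diffs, bias, (lower, upper) = bland_altman(set_1, set_2)
print(bias, lower, upper)
```

Plotting `diffs` against `means`, with horizontal lines at `bias`, `lower`, and `upper`, gives the Bland-Altman plot for one estimator.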

Participants
The anthropometric characteristics of the 123 participants (aged 76.5±5.1 years, 59 % female) of the study are shown in Table 1.

CPM
The assessment of physical activity of older adults in this study showed measurement errors (outliers). Figure 1 shows the 1-week measurement of a participant. This activity profile mainly shows CPM between 0 (inactive) and 1,000, whereas five single values clearly exceed the usual level. These values can be defined as outliers. The pattern was similar in the physical activity profiles of other, randomly chosen participants.

Discussion
Accelerometry is a widely applied technique for measuring physical activity in studies. But this method is still afflicted with some peculiarities, e.g., measurement errors. In order to enhance the quality of data, we proposed to use more robust estimators such as quantiles or trimmed sums as a summary measure of physical activity instead of the overall sum. These estimators were then analyzed in terms of retest reliability. As assumed, both quantiles and trimmed sums of CPM increased the retest reliability compared to the overall sum. This was true irrespective of the chosen cut-off point (e.g., 80 or 95 %) above which the values were omitted (trimmed sum). We clearly showed that the retest reliability increases when using an estimator such as the TS95. We prefer to use the trimmed sum rather than a certain quantile. When using a quantile to describe physical activity, it remains unclear whether the CPM value at a predefined quantile was reached because of a single intensive physical activity (e.g., running quickly to catch the bus once during the 3-day measurement period), irrespective of the remaining physical activity, or because the person has a high average level of physical activity in general. In contrast to the quantiles, the trimmed sum accounts for this, and the average physical activity remains in the estimator. The trimmed sum allows a predefined (or estimated) error level β of the device to be selected, and then yields a single number which reflects all activities. Additionally, the trimmed sum is proportional to the weighted/truncated mean, which allows the calculation of average activity [13]. Summing up, the advantage of the trimmed sum is that it includes both the average level of activity and more intense effort, whereas a quantile would only measure the peak of more intense activities.
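The difference between a quantile and the trimmed sum, and the link between the trimmed sum and the truncated mean, can be illustrated with a small Python sketch. All names and CPM values are hypothetical; the nearest-rank quantile and dividing the trimmed sum by the number of retained values are assumptions made for this illustration.

```python
def quantile(cpm, pct):
    """Nearest-rank quantile: smallest value with at least pct % of values <= it."""
    ordered = sorted(cpm)
    rank = max(1, (len(ordered) * pct + 99) // 100)
    return ordered[rank - 1]

def upper_trimmed_sum_and_mean(cpm, keep_pct):
    """Upper-trimmed sum and the truncated mean derived from it."""
    ordered = sorted(cpm)
    n_keep = len(ordered) * keep_pct // 100
    if n_keep == 0:
        raise ValueError("no values left after trimming")
    ts = sum(ordered[:n_keep])
    return ts, ts / n_keep  # truncated mean = trimmed sum / retained count

# Hypothetical profiles: A is inactive apart from a short burst,
# B is moderately active throughout and shows the same burst.
person_a = [0] * 18 + [500, 500]
person_b = [400] * 18 + [500, 500]

q95_a, q95_b = quantile(person_a, 95), quantile(person_b, 95)
ts95_a, mean_a = upper_trimmed_sum_and_mean(person_a, 95)
ts95_b, mean_b = upper_trimmed_sum_and_mean(person_b, 95)
print(q95_a, q95_b, ts95_a, ts95_b)
```

Both hypothetical profiles share the same Q95, but their TS95 values and truncated means differ, because only the trimmed sum retains the background level of activity.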
Even though the TS95 did not show the highest retest reliability, it seems to be the most conservative estimator of the physical activity. Virtually all measurement errors are eliminated, while all background level and intensive activities are still included. Summing up, the TS95 appears to be an appropriate measurement value for demonstrating the physical activity level of older adults.
Besides the recommendation on dealing with outliers introduced above, we would like to compare the retest reliability found in our study with that of other studies conducted in the same target group. With regard to wrist-worn accelerometers of other manufacturers, Gao and Tsang [8] found a high retest reliability of the mean CPM in their study of 3-day measurements in 12 people (ICC = 0.98; 95 % CI: 0.93-0.99). These older participants, aged 79.8±11.2 years, wore the uniaxial Actiwatch accelerometer (Mini Mitter Co., Inc., Bend, OR, USA) on the wrist for the same 3 days during two consecutive weeks. Harris et al. [10] found a comparably high agreement of a 7-day measurement with the Actigraph uniaxial accelerometer (GT1M; Manufacturing Technology Inc), worn on the hip by 20 participants (mean age about 74 years), with Pearson's r = 0.87. They repeated the measurement after 2 months.
These two studies both have a more laboratory character compared to our study, since the measurement was repeated on the same days of two consecutive weeks or for a whole week after 2 months. In contrast, we took the same week and split it in half (Sunday excluded). With our results on retest reliability, we were able to demonstrate that it is not compulsory to measure physical activity on the same half of the week. Moreover, we also included Saturdays in our analysis, even though activities on a Saturday may differ from those during the rest of the week. We assumed that participants would be less active on Saturdays than from Monday to Friday. If this were true, the inclusion of Saturdays would have negatively influenced our results on retest reliability, which would have been even higher had Saturdays been excluded. These two differences in the study design might have relevance for further studies.

Perspectives
Accelerometry is a method widely applied to assess physical activity in studies [17,22,27] and it will often be used in the future. Our recommendation of using the trimmed sum (e.g., 95 %) as a conservative estimate of the physical activity level of older adults would increase the quality of data by controlling for measurement errors.