# New methods for estimating follow-up rates in cohort studies


## Abstract

### Background

The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the validity of a cohort study. A common method for estimating the follow-up rate, the “Percentage Method”, defined as the fraction of all enrollees who developed the event of interest or had complete follow-up, can severely underestimate the degree of follow-up. The median follow-up time, a frequently used alternative, does not indicate the completeness of follow-up, and the reverse Kaplan-Meier based method and Clark’s Completeness Index (CCI) also have limitations.

### Methods

We propose a new definition for the follow-up rate, the Person-Time Follow-up Rate (PTFR), which is the observed person-time divided by total person-time assuming no dropouts. The PTFR cannot be calculated directly since the event times for dropouts are not observed. Therefore, two estimation methods are proposed: a formal person-time method (FPT) in which the expected total follow-up time is calculated using the event rate estimated from the observed data, and a simplified person-time method (SPT) that avoids estimation of the event rate by assigning full follow-up time to all events. Simulations were conducted to measure the accuracy of each method, and each method was applied to a prostate cancer recurrence study dataset.

### Results

Simulation results showed that the FPT has the highest accuracy overall. In most situations, the computationally simpler SPT and CCI methods are only slightly biased. When applied to a retrospective cohort study of cancer recurrence, the FPT, CCI and SPT showed substantially greater 5-year follow-up than the Percentage Method (92%, 92% and 93% vs 68%).

### Conclusions

The person-time methods correct a systematic error in the standard Percentage Method for calculating follow-up rates. The easy-to-use SPT and CCI methods can be used in tandem to obtain an accurate and tight interval for the PTFR. However, the FPT is recommended when event rates and dropout rates are high.

## Keywords

Person-time; Loss to follow-up; Median survival time; Reverse Kaplan-Meier survival curve; Competing risk

## Abbreviations

- CCI: Clark’s completeness index
- EMR: Electronic medical records
- FPT: Formal person-time method
- KM: Kaplan-Meier
- NPMLE: Nonparametric maximum likelihood estimator
- PTFR: Person-time follow-up rate
- SPT: Simplified person-time method

## Background

The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the adequacy of a prospective or retrospective longitudinal cohort dataset for research purposes. In particular, a low follow-up rate raises concerns regarding the possibility of informative censoring, bias and diminishing statistical power [1, 2, 3, 4, 5]; these concerns increase with the extent of participant dropout from the cohort [5, 6, 7, 8, 9, 10, 11, 12]. Common sources of “loss-to-follow-up” include death due to causes other than the endpoint of interest, patient withdrawal, and other reasons for dropout, such as a change in at-risk status (e.g., undergoing a hysterectomy during a study of cervical cancer). For simplicity, in this paper we refer to all loss-to-follow-up and censoring due to any cause other than the event of interest or the end of the study as dropout.

Methods to accurately assess follow-up rates are likely to be of growing importance in the current, expanding era of electronic medical records (EMRs). Hospital and outpatient databases are increasingly being used for research purposes, but they require careful scrutiny to determine whether they are truly adequate for use in scientific studies. Patients in routine clinical practice may be more likely than research volunteers in a prospective cohort to seek care from multiple, unaffiliated providers. This can lead to low follow-up rates at any specific health care facility and raises particular concerns regarding informative censoring. Investigators may therefore need to screen multiple potential clinics or other sources of EMR data to find an appropriate population with adequate follow-up data.

Thus, while there are many sources of potential bias, the follow-up rate provides a quick and easy tool for initially screening potential retrospective clinical cohorts before a more in-depth evaluation of the adequacy of the data. Researchers and journal reviewers should therefore routinely examine the follow-up rate in an EMR-based study over a period of observation relevant to the study question.

The most commonly used method to assess the completeness of follow-up, recommended by the Cochrane Handbook [13] and the CONSORT guidelines [14] and often referred to as the “Percentage Method” [15], simply calculates the proportion of subjects present at baseline (e.g., enrollment) who remained through the end of the study interval or developed the event of interest by the end of the interval [7, 13, 14, 16]. However, this definition is “naïve” in that it does not distinguish subjects who dropped out early in a study from subjects who dropped out late. In effect, the Percentage Method assumes that all subjects who were lost to follow-up were lost at the very beginning of the study; it can therefore severely underestimate the follow-up rate in a cohort, leading to a false conclusion regarding the quality of the data.

Several attempts have been made to improve upon the Percentage Method for assessing the degree of follow-up. For example, the median follow-up time has been used as a measure of the length of follow-up. However, there is disagreement about how the median follow-up time should be calculated (among all subjects, among dropouts only, or other variations), and each variant has its limitations [17, 18, 19, 20]. Further, there is increasing recognition that the median follow-up time does not directly measure the “completeness of the follow-up”: e.g., the median follow-up can be short with excellent follow-up, and it can be long with poor follow-up [18, 20, 21, 22]. While time-to-event studies must have a sufficient length of follow-up to capture enough events for adequate statistical power, poor follow-up, as mentioned earlier, raises concerns about the validity of the study. Thus, to assess the adequacy of follow-up in a cohort study, we need to examine both the length and the completeness of follow-up.

Alternatively, a reverse Kaplan-Meier (KM) survival curve, constructed by reversing the roles of “censor” and “event” [18], has been used to assess both the length and the completeness of follow-up. However, as explained in detail below, because the reverse KM method treats the events of interest as censoring, it exaggerates the cumulative loss-to-follow-up rate. In addition, a measure of follow-up completeness proposed by Clark et al. [21], described further below, fails to account for events that could have occurred among those who were lost to follow-up had they remained in the study. Further, to our knowledge, the accuracy of this method has never been formally examined using simulations.
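To make the exaggeration concrete, the Python sketch below (our illustration, not code from the cited work; function names are hypothetical) applies a reverse KM estimator to a toy cohort of 100 subjects followed for three years, with 10, 5 and 5 outcome events in years 1 to 3 and 40 dropouts either early or late (the cohort used for Fig. 1 below). All events and dropouts are placed at mid-year, a simplifying assumption. Dropouts are the "events" of the reverse KM; outcome events and completers are "censored".

```python
from itertools import groupby


def reverse_km_cumulative_loss(times, is_dropout):
    """Reverse Kaplan-Meier: dropouts are treated as 'events', while outcome
    events and subjects followed to the end of the study are 'censored'.
    Returns the estimated cumulative loss to follow-up, 1 - S(t), at the last time."""
    records = sorted(zip(times, is_dropout))
    at_risk = len(records)
    surv = 1.0
    for _, group in groupby(records, key=lambda r: r[0]):
        group = list(group)
        n_dropouts = sum(1 for _, dropped in group if dropped)
        surv *= 1.0 - n_dropouts / at_risk  # KM step at this time
        at_risk -= len(group)               # everyone observed at this time leaves the risk set
    return 1.0 - surv


def cohort(dropout_time, tau=3.0):
    """Hypothetical Fig. 1-style cohort: 10/5/5 events at mid-year 1/2/3,
    40 dropouts at `dropout_time`, 40 subjects followed to tau."""
    times = [0.5] * 10 + [1.5] * 5 + [2.5] * 5 + [dropout_time] * 40 + [tau] * 40
    is_dropout = [False] * 20 + [True] * 40 + [False] * 40
    return times, is_dropout


loss_a = reverse_km_cumulative_loss(*cohort(0.5))  # dropouts early in year 1
loss_b = reverse_km_cumulative_loss(*cohort(2.5))  # dropouts late, in year 3
print(f"{loss_a:.1%} {loss_b:.1%}")  # 40.0% 47.1%
```

In the late-dropout case, the 15 outcome events that occur before year 3 are removed from the risk set as if they were censored, so the reverse KM reports a 47.1% cumulative loss even though only 40 of the 100 subjects dropped out.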

In this paper, we review major existing methods for estimating follow-up, and propose a new person-time follow-up rate (PTFR) – essentially, the observed person-time divided by the person-time assuming no dropouts – to address the limitations we found with existing methods. We then describe two methods to estimate PTFR. Simulation studies are used to examine the accuracy of the proposed methods and the existing methods, and each method is applied to a real-world prostate cancer recurrence “retrospective cohort” study based on EMR data [23].

### Existing measures of follow-up rates

Consider a cohort of size \( N \), and let \( T_i \) and \( C_i \) denote the time to the event of interest and the censoring time for the \( i \)th subject, respectively, \( i = 1, 2, \ldots, N \). For simplicity, we assume the study ends at a specified time \( \tau \).

#### Standard “percentage method”

The Percentage Method defines the follow-up rate \( \eta_{percentage} \) as

\[ \eta_{percentage} = \frac{1}{N}\sum_{i=1}^{N} I\{C_i \ge \min(T_i, \tau)\} \quad (1) \]

where \( I\{\cdot\} \) is the indicator function; that is, \( \eta_{percentage} \) is the proportion of enrollees who either developed the event of interest or remained under follow-up until \( \tau \). Note that although participants dropped out at different times, the Percentage Method essentially treats their follow-up time as zero no matter how much person-time they contributed to the study, systematically underestimating the true follow-up. To help illustrate these points, Fig. 1 provides a simple example of a hypothetical cohort of 100 subjects who were followed and assessed with annual visits for three years. There were 10, 5 and 5 outcome events in the 1st, 2nd and 3rd years, respectively, with 40 dropouts in the 1st year in scenario (A) and in the 3rd year in scenario (B). The Percentage Method estimates the follow-up rate to be 60% regardless of whether the dropouts occurred at the beginning or late in the study.
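A minimal Python sketch of Eq. (1) makes the blind spot explicit: the dropout timing is passed in per interval, but the method immediately collapses it to a count. The function name and the per-interval input are hypothetical choices for illustration.

```python
def percentage_method(n_enrolled, dropouts_by_interval):
    """Eq. (1): share of enrollees who had the event or were followed to tau.

    Every subject who did not drop out either developed the event or
    completed follow-up, so only the number of dropouts matters.
    """
    n_dropouts = sum(dropouts_by_interval)  # the timing of dropout is discarded here
    return (n_enrolled - n_dropouts) / n_enrolled


eta_a = percentage_method(100, [40, 0, 0])  # scenario (A): dropouts in year 1
eta_b = percentage_method(100, [0, 0, 40])  # scenario (B): dropouts in year 3
print(f"{eta_a:.0%} {eta_b:.0%}")  # 60% 60%
```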

As mentioned above, alternative methods have been developed to address the length of actual observation within a cohort. Two of the most commonly referenced are the reverse KM survival curve and Clark et al.’s completeness index [21].

#### Reverse Kaplan-Meier (KM) survival curve

#### Clark’s completeness index (CCI)

Clark et al. [21] defined the completeness index as

\[ \eta_{CCI} = \frac{PT_{observed}}{PT_{potential}} \quad (2) \]

where \( PT_{observed} \) is the actual total person-time observed in the study and \( PT_{potential} \) is the total potential person-time of follow-up, estimated by assuming that all dropouts would have had the full follow-up time. However, this approach ignores the possibility that dropouts could have developed the event of interest during the study interval. Therefore, it can overestimate the total potential follow-up time and consequently underestimate the completeness of follow-up; the underestimation necessarily grows with higher event and dropout rates. In Fig. 1, \( \eta_{CCI} = 62.3\% \) for scenario (A) and \( \eta_{CCI} = 92.5\% \) for scenario (B), reflecting that the method takes into account the observation time of dropouts. However, if in scenario (A) 5 of the 40 dropouts died shortly after dropping out, \( PT_{potential} \) would be overestimated and \( \eta_{CCI} \) would therefore underestimate the true follow-up rate. To our knowledge, the extent to which this affects the estimates under varying conditions and assumptions has not been examined before.
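The two Fig. 1 values can be reproduced with a short Python sketch. It is our illustration rather than the authors' R code: each event and dropout is placed at the middle of its year (the usual life-table convention), each subject is a hypothetical `(time, status)` pair, and the function names are made up. Crediting every dropout with follow-up to \( \tau \) in the denominator is exactly the step that can overstate \( PT_{potential} \).

```python
def clark_completeness_index(subjects, tau):
    """Eq. (2): PT_observed / PT_potential.

    subjects: list of (time, status) pairs, where time is the observed
    follow-up and status is "event", "dropout" or "complete".
    """
    pt_observed = sum(time for time, _ in subjects)
    # Dropouts are credited with full follow-up to tau; everyone else keeps the observed time.
    pt_potential = sum(tau if status == "dropout" else time for time, status in subjects)
    return pt_observed / pt_potential


def fig1_cohort(dropout_time, tau=3.0):
    """Hypothetical Fig. 1 cohort with events and dropouts placed at mid-year."""
    events = [(0.5, "event")] * 10 + [(1.5, "event")] * 5 + [(2.5, "event")] * 5
    dropouts = [(dropout_time, "dropout")] * 40
    complete = [(tau, "complete")] * 40
    return events + dropouts + complete


cci_a = clark_completeness_index(fig1_cohort(0.5), tau=3.0)  # scenario (A)
cci_b = clark_completeness_index(fig1_cohort(2.5), tau=3.0)  # scenario (B)
print(f"{cci_a:.1%} {cci_b:.1%}")  # 62.3% 92.5%
```

Under this convention, scenario (A) gives 165/265 person-years and scenario (B) gives 245/265.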

## Methods

### A new person-time definition of follow-up rate (PTFR)

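We define the person-time follow-up rate \( \eta_{PTFR} \) as:

\[ \eta_{PTFR} = \frac{PT_{observed}}{PT_{no\text{-}dropout}} = \frac{\sum_{i=1}^{N}\min(T_i, C_i, \tau)}{\sum_{i=1}^{N}\min(T_i, \tau)} \quad (3) \]

where \( PT_{observed} \) is the actual total person-time observed in the study and \( PT_{no\text{-}dropout} \) is the total person-time that would have been observed if there were no dropouts. The denominator describes the hypothetical situation of no dropout, with each subject contributing the time to event \( T_i \) or the time to the end of the study, whichever came first. Note that calculating \( \eta_{PTFR} \) requires that the time to event \( T_i \) be known for all participants, whether they dropped out or not.

\( \eta_{CCI} \) underestimates \( \eta_{PTFR} \) because a dropout's contribution \( W_i \) to \( PT_{no\text{-}dropout} \) follows the distribution of \( T_i \) truncated at \( \tau \), and so cannot exceed the full follow-up time \( \tau \) that the CCI assigns to every dropout. Using the example in Fig. 1, if none of the dropouts became events during the study, \( \eta_{PTFR} = 62.3\% \) for scenario (A) and \( \eta_{PTFR} = 92.5\% \) for scenario (B), i.e., \( \eta_{PTFR} = \eta_{CCI} \); however, if 5 of the dropouts in scenario (A) became events shortly after they dropped out, then \( \eta_{PTFR} = 65.3\% > \eta_{CCI} \).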
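Because Eq. (3) needs every \( T_i \), it can only be evaluated when the dropouts' event times are known, as in a simulation or a toy example. The Python sketch below (hypothetical names; exact times at mid-year; the post-dropout event time 0.51 is an arbitrary stand-in for "shortly after") evaluates it for scenario (A) with and without events among the dropouts.

```python
import math


def ptfr(event_times, censor_times, tau):
    """Eq. (3): sum of min(T_i, C_i, tau) divided by sum of min(T_i, tau).

    Requires every T_i, including those of dropouts, so it is computable
    only when the full event times are known.
    """
    observed = sum(min(t, c, tau) for t, c in zip(event_times, censor_times))
    no_dropout = sum(min(t, tau) for t in event_times)
    return observed / no_dropout


TAU = 3.0
INF = math.inf  # no event (or no dropout) before tau

# Scenario (A): 20 observed events, 40 subjects followed event-free to tau,
# and 40 dropouts at t = 0.5 (events and dropouts placed at mid-year).
event_T = [0.5] * 10 + [1.5] * 5 + [2.5] * 5
complete_T = [INF] * 40
C = [INF] * 20 + [INF] * 40 + [0.5] * 40

# Unobservable in practice: the dropouts' own event times.
T_none = event_T + complete_T + [INF] * 40               # no dropout develops the event
T_five = event_T + complete_T + [0.51] * 5 + [INF] * 35  # 5 do so shortly after dropping out

print(f"{ptfr(T_none, C, TAU):.1%} {ptfr(T_five, C, TAU):.1%}")  # 62.3% 65.3%
```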

Because the PTFR cannot be calculated directly (the event times of dropouts are not observed), we propose two estimation methods.

### A formal method to estimate the person-time follow-up rate (FPT)

Letting \( t_0 = 0 \), we denote the pre-specified visit times as \( (t_1, t_2, \ldots, t_K) \), where \( t_K = \tau \), i.e., the end of follow-up. We assume that, on average, events and censoring occur midway through each interval, consistent with standard practice in life-table analysis [24]. Therefore, the numerator of Eq. (3) (i.e., the actual person-time of follow-up) is estimated to be

\[ \widehat{PT}_{observed} = \sum_{k=1}^{K}(t_k - t_{k-1})\left(N_k + \frac{N_{E_k} + N_{C_k}}{2}\right) \]

where \( N_{k-1} \) is the number of subjects at risk at the beginning of interval \( k \) (i.e., at time \( t_{k-1} \)), \( N_k = N_{k-1} - N_{E_k} - N_{C_k} \), and \( N_{E_k} \) and \( N_{C_k} \) are the numbers of events and dropouts, respectively, that occurred during interval \( k \).
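The life-table numerator translates directly into a loop. In this Python sketch (hypothetical function name), `events[k-1]` and `dropouts[k-1]` hold \( N_{E_k} \) and \( N_{C_k} \), and the Fig. 1 counts with annual visits serve as input.

```python
def pt_observed_life_table(n_start, events, dropouts, visit_times):
    """Life-table estimate of observed person-time with pre-specified visits.

    visit_times: [t_0 = 0, t_1, ..., t_K]; events[k-1] and dropouts[k-1] are
    the counts N_{E_k} and N_{C_k} in interval k. Events and dropouts are
    assumed to occur midway through their interval.
    """
    n_at_risk = n_start  # N_{k-1}
    total = 0.0
    for k in range(1, len(visit_times)):
        width = visit_times[k] - visit_times[k - 1]
        n_e, n_c = events[k - 1], dropouts[k - 1]
        n_next = n_at_risk - n_e - n_c  # N_k
        total += width * (n_next + (n_e + n_c) / 2)
        n_at_risk = n_next
    return total


visits = [0.0, 1.0, 2.0, 3.0]
print(pt_observed_life_table(100, [10, 5, 5], [40, 0, 0], visits))  # 165.0 (scenario A)
print(pt_observed_life_table(100, [10, 5, 5], [0, 0, 40], visits))  # 245.0 (scenario B)
```

These are the same person-years that underlie the Fig. 1 percentages (e.g., 165/265 ≈ 62.3% for the CCI in scenario A).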

While \( PT_{observed} \) can be easily calculated by summing the observed follow-up times of all participants, calculating the denominator \( PT_{no\text{-}dropout} \) of \( \eta_{PTFR} \) requires knowledge of each participant's actual time to the outcome event if it occurred during the study, regardless of whether the participant dropped out. This information is typically not available in a real-world study. In an earlier effort to address this problem, Chen, Wei and Huang used the known event rate of the population from which the cohort was derived to calculate “the maximum person-year”, which in our nomenclature is \( PT_{no\text{-}dropout} \) [15]. However, it is often difficult to specify the population from which a cohort is derived [25], and the event rate is rarely known except for certain general endpoints, such as all-cause mortality. Therefore, this approach is not applicable to most studies.

To estimate \( PT_{no\text{-}dropout} \), we propose estimating the event rate from the observed data. The survival function and the conditional probability of developing the event of interest are estimated using the nonparametric maximum likelihood estimator (NPMLE) proposed by Turnbull [26], the analogue of the Kaplan-Meier estimator for interval-censored observations. To use this approach, each subject's event time must be described by an interval: if a subject experiences an event between the (k-1)th and kth visits, the time to event is described by the interval \( (t_{k-1}, t_k) \); if a subject drops out between the (k-1)th and kth visits, the event time is described by the interval \( (t_{k-1}, t_{K+1}) \), where \( t_{K+1} \) is some large number, such as 100 years (a theoretical bound indicating that a person who dropped out will eventually develop an event, assuming there are no competing risks); and if a subject is free of events at the end of the study \( t_K \), the interval is \( (t_K, t_{K+1}) \). The R package interval [27, 28] can be readily applied to estimate the survival curve and the conditional probability of developing the event of interest during each interval.
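The study relies on the R package interval for this step. For readers who want to see the mechanics, here is a minimal Python sketch of Turnbull's self-consistency (EM) iteration, restricted to the situation described above where every subject's interval is a run of consecutive visit intervals on a common grid. The function names and the cell indexing are hypothetical, the last cell stands for \( (t_K, t_{K+1}) \), and this is not a substitute for a tested implementation.

```python
def turnbull_grid_npmle(intervals, n_cells, tol=1e-12, max_iter=100_000):
    """Self-consistency (EM) iteration for the NPMLE of interval-censored event times.

    Cells are (t_0, t_1], ..., (t_{K-1}, t_K] plus a final cell standing for
    (t_K, t_{K+1}). Each subject is a pair (first, last) of 0-based cell
    indices that may contain its event time. Returns the mass in each cell.
    """
    n = len(intervals)
    p = [1.0 / n_cells] * n_cells
    for _ in range(max_iter):
        expected = [0.0] * n_cells
        for first, last in intervals:
            mass = sum(p[first:last + 1])
            for j in range(first, last + 1):
                expected[j] += p[j] / mass  # E-step: this subject's expected share of cell j
        new_p = [e / n for e in expected]   # M-step
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def conditional_event_probs(p):
    """P_k = Pr(event in interval k | event-free at t_{k-1}), for k = 1..K."""
    probs, surv = [], 1.0
    for mass in p[:-1]:  # the last cell lies beyond the end of the study
        probs.append(mass / surv if surv > 0 else 0.0)
        surv -= mass
    return probs


# Fig. 1, scenario (A): annual visits, K = 3, plus the "beyond t_K" cell (index 3)
data_a = ([(0, 0)] * 10 + [(1, 1)] * 5 + [(2, 2)] * 5  # events observed in years 1-3
          + [(0, 3)] * 40                                # year-1 dropouts: (t_0, t_{K+1})
          + [(3, 3)] * 40)                               # event-free at t_K: (t_K, t_{K+1})
p_hat = turnbull_grid_npmle(data_a, n_cells=4)
print([round(x, 4) for x in conditional_event_probs(p_hat)])  # [0.1667, 0.1, 0.1111]
```

Because the year-1 dropouts are coded as \( (t_0, t_{K+1}) \), they carry no information about the event time, and the estimates reduce to 10/60, 5/50 and 5/45.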

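The expected number of events during the interval \( (t_{k-1}, t_k) \) if there were no loss to follow-up is estimated to be \( N_{k-1}^{\ast}\widehat{P}_k \), where \( N_{k-1}^{\ast} \) is the number of subjects who would remain in the study at time \( t_{k-1} \) if there were no loss to follow-up, \( \widehat{P}_k \) is the conditional probability of an event during the kth interval estimated by the NPMLE method, \( k = 1, \ldots, K \), and \( N_0^{\ast} = N \). The number of subjects remaining in the study at the beginning of interval \( k+1 \) if there were no loss to follow-up is then \( N_k^{\ast} = N_{k-1}^{\ast} - N_{k-1}^{\ast}\widehat{P}_k \). Applying the same midpoint assumption, the expected person-time if there were no dropout is estimated to be

\[ \widehat{PT}_{no\text{-}dropout} = \sum_{k=1}^{K}(t_k - t_{k-1})\left(N_k^{\ast} + \frac{N_{k-1}^{\ast}\widehat{P}_k}{2}\right), \qquad \eta_{FPT} = \frac{\widehat{PT}_{observed}}{\widehat{PT}_{no\text{-}dropout}} \quad (4) \]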

This method relies on the assumption of independent (non-informative) censoring, that is, that the event rate among dropouts is the same as that among subjects who remain under follow-up.
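Given the conditional probabilities, the recursion for \( N_k^{\ast} \) and the expected no-dropout person-time is a short loop. The Python sketch below (hypothetical names) hard-codes, for Fig. 1 scenario (A), the Turnbull estimates obtained with the interval coding described above (10/60, 5/50, 5/45) and the life-table observed person-time of 165 person-years, and places expected events at mid-interval to match the convention used for the numerator.

```python
def fpt_follow_up_rate(pt_observed, n_start, cond_probs, visit_times):
    """Eq. (4): eta_FPT = PT_observed / expected person-time with no dropout.

    cond_probs[k-1] is the estimated conditional event probability P_k for
    interval k; N*_k = N*_{k-1} * (1 - P_k) with N*_0 = N, and expected
    events are placed midway through each interval.
    """
    n_star = float(n_start)  # N*_{k-1}
    pt_no_dropout = 0.0
    for k in range(1, len(visit_times)):
        width = visit_times[k] - visit_times[k - 1]
        expected_events = n_star * cond_probs[k - 1]
        n_star_next = n_star - expected_events  # N*_k
        pt_no_dropout += width * (n_star_next + expected_events / 2)
        n_star = n_star_next
    return pt_observed / pt_no_dropout


# Fig. 1, scenario (A): inputs hard-coded for illustration
eta_fpt = fpt_follow_up_rate(165.0, 100, [10 / 60, 5 / 50, 5 / 45], [0.0, 1.0, 2.0, 3.0])
print(f"{eta_fpt:.1%}")  # 68.3%
```

The result sits above the 62.3% CCI value for this scenario because, under independent censoring, the FPT assumes the 40 early dropouts would have experienced events at the same rate as those who stayed, which shrinks the no-dropout person-time.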

While a prospective epidemiological cohort study may intend to follow participants at serial intervals of approximately equal length (e.g., annual or semi-annual visits), not every participant returns for each visit or does so at the planned time. This leads to varying lengths of time between visits, which can sometimes be quite long. Clinic-based cohort studies that involve ad hoc patient follow-up (e.g., cohorts defined retrospectively from hospital EMRs) often have irregular schedules of clinical visits, with clustering that does not occur at random (e.g., visits motivated by symptoms or an abnormal laboratory test result). To assess the follow-up rate for such data, we extended the approach proposed above to irregular intervals between visits.

For the ith person, let \( t_{K_i} \) denote the time of the last visit, which is either (a) the last visit in the study for that person or (b) the visit at which that person was diagnosed with the event. For (a), we use the time to the last visit as an estimate of the person's censoring time, i.e., \( \widehat{C}_i = \widehat{\min}(T_i, C_i) = t_{K_i} \); for (b), we assume the event occurred midway through the last interval, i.e., \( \widehat{T}_i = \widehat{\min}(T_i, C_i) = (t_{K_i-1} + t_{K_i})/2 \). The actual person-time of follow-up by a specified time, say, \( t_K \), is then estimated by summing the observed follow-up times across subjects, i.e.,

\[ \widehat{PT}_{observed} = \sum_{i=1}^{N}\min\{\widehat{\min}(T_i, C_i),\, t_K\} \]

To estimate \( PT_{no\text{-}dropout} \), if the ith person developed the event at his/her last visit, the event-time interval is \( (t_{K_i-1}, t_{K_i}) \); if the person had not developed the event by the last visit, the event-time interval is \( (t_{K_i}, E) \), where E again represents some large number. The NPMLE method can then be applied to estimate the event rates needed for \( PT_{no\text{-}dropout} \).
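With irregular visits, each subject's record reduces to a sorted list of visit times (starting at entry, time 0) and a flag for whether the event was diagnosed at the last visit. The Python sketch below uses hypothetical names and three made-up subjects; it computes the estimated observed follow-up by a horizon \( t_K = 3 \) years and the interval that would be handed to an NPMLE routine, with E set to an arbitrary 100 years.

```python
def observed_follow_up(visits, event_at_last_visit, horizon):
    """Estimated observed follow-up for one subject with irregular visits.

    visits: sorted visit times since entry, starting with 0 (entry).
    If the event was diagnosed at the last visit, it is placed midway between
    the last two visits; otherwise the subject is censored at the last visit.
    Follow-up is truncated at `horizon` (the specified time t_K).
    """
    last = visits[-1]
    t_hat = (visits[-2] + last) / 2 if event_at_last_visit else last
    return min(t_hat, horizon)


def npmle_interval(visits, event_at_last_visit, large=100.0):
    """Interval for the NPMLE: (t_{K_i - 1}, t_{K_i}) for events,
    (t_{K_i}, E) otherwise, with E = `large` (an arbitrary large number)."""
    if event_at_last_visit:
        return (visits[-2], visits[-1])
    return (visits[-1], large)


subjects = [
    ([0.0, 0.4, 1.1, 2.0, 3.2], True),   # event diagnosed at 3.2, placed at 2.6
    ([0.0, 0.3, 0.9], False),            # last seen event-free at 0.9
    ([0.0, 0.5, 1.4, 2.7, 3.6], False),  # still followed at 3.6, truncated at 3.0
]
pt_obs = sum(observed_follow_up(v, e, horizon=3.0) for v, e in subjects)
print(round(pt_obs, 3))  # 6.5
print([npmle_interval(v, e) for v, e in subjects])
```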

As mentioned above, using the observed data to estimate the event rate relies on the assumption that loss to follow-up is not informative, i.e., that the event rate among those who remained in the study is the same as among those who dropped out, so that the event rates estimated from the observed data apply to the unobserved. However, if the subjects who were lost to follow-up have a different risk of the event than those who remained in the study, the estimated event rates are biased. For example, if the subjects who were lost to follow-up had a higher risk of the event, the event risk is underestimated from the observed data, and the person-time approach underestimates the follow-up rate because \( PT_{no\text{-}dropout} \) is overestimated. Conversely, if the subjects who were lost to follow-up had a lower risk of the event, the event risk is overestimated, and the person-time approach consequently overestimates the follow-up rate. We therefore propose calculating a lower bound for the person-time follow-up rate by assuming that none of the dropouts developed the event of interest during the time interval examined. In this case, \( PT_{no\text{-}dropout} \) reaches its highest possible value, yielding a lower bound for the follow-up rate. Note that in this case \( PT_{no\text{-}dropout} = PT_{potential} \), so that \( \min \eta_{PTFR} = \eta_{CCI} \). The lower bound is important because it provides a conservative estimate of the follow-up rate: an overestimated follow-up rate can lead to over-optimism about the quality of the follow-up.

### A simplified method to estimate the person-time follow-up rate (SPT)

Estimating the event rate in order to calculate the PTFR can be difficult, especially for non-statisticians. Therefore, we also explore a simplified alternative that allows quick estimation of \( \eta_{PTFR} \) without estimating the event rate. The proposed Simplified Person-Time method is a hybrid of the Percentage Method and the person-time method. Specifically, as in the Percentage Method, individuals who developed the event of interest during the study are treated the same as individuals who were followed until the end of the study, i.e., they are treated as having contributed complete follow-up, since they have already provided complete data regarding the factors associated with becoming a case. As in a person-time method, dropouts contribute their partial follow-up time to the numerator.

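In symbols,

\[ \eta_{SPT} = \frac{1}{N\tau}\sum_{i=1}^{N}\left[\delta_i\,\tau + (1 - \delta_i)\min(T_i, C_i, \tau)\right] \quad (5) \]

where \( \delta_i = 1 \) if the event of subject \( i \) was observed during the study (i.e., \( T_i \le \min(C_i, \tau) \)) and \( \delta_i = 0 \) otherwise. In Fig. 1, \( \eta_{SPT} = 66.7\% \) for scenario (A) and \( \eta_{SPT} = 93.3\% \) for scenario (B), remarkably close to but slightly higher than \( \eta_{PTFR} \); the slight overestimation arises because events are given the full length of follow-up in this method. It can be shown that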

Figure 1 also indicates that \( \eta_{CCI} \) and \( \eta_{SPT} \) together provide close bounds on \( \eta_{PTFR} \). In fact, the outcome events can be viewed as a competing risk for loss to follow-up, so methods from the competing-risks framework can be used to compute the cumulative loss-to-follow-up rate [29, 30] and then to obtain the subdistribution reverse KM curve.

Revisiting the reverse KM curve, we instead assign events the full follow-up time, so that the follow-up rate over time is no longer affected by the number and timing of the events. In Fig. 2, scenarios (A) and (B) share the same curve of follow-up rate over time once the competing risk of events is addressed. It can be shown mathematically that the area under this new follow-up rate curve, divided by \( \tau \), equals \( \eta_{SPT} \).
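The relationship between the follow-up curve and \( \eta_{SPT} \) can be checked numerically on the Fig. 1 cohort. In this Python sketch (hypothetical names; events and dropouts at mid-year; common entry at time 0 with no administrative censoring before \( \tau \)), events never leave the curve, so it steps down only at dropouts. This is a check on one toy example, not a proof.

```python
def spt_follow_up_rate(subjects, tau):
    """Eq. (5): events are credited with full follow-up tau, dropouts keep
    their observed time, and the denominator is N * tau."""
    credited = sum(tau if status == "event" else min(time, tau) for time, status in subjects)
    return credited / (len(subjects) * tau)


def follow_up_curve_area(subjects, tau):
    """Area under F(t) = 1 - (dropouts before t) / N on [0, tau].

    Events never leave the curve (they are assigned full follow-up), so only
    dropouts make it step down.
    """
    n = len(subjects)
    drop_times = sorted(time for time, status in subjects if status == "dropout")
    area, level, prev = 0.0, 1.0, 0.0
    for d in drop_times:
        area += level * (d - prev)
        level -= 1.0 / n
        prev = d
    return area + level * (tau - prev)


def fig1_cohort(dropout_time, tau=3.0):
    """Hypothetical Fig. 1 cohort with events and dropouts placed at mid-year."""
    return ([(0.5, "event")] * 10 + [(1.5, "event")] * 5 + [(2.5, "event")] * 5
            + [(dropout_time, "dropout")] * 40 + [(tau, "complete")] * 40)


for label, d_time in [("A", 0.5), ("B", 2.5)]:
    cohort = fig1_cohort(d_time)
    eta_spt = spt_follow_up_rate(cohort, tau=3.0)
    auc_ratio = follow_up_curve_area(cohort, tau=3.0) / 3.0
    print(label, f"{eta_spt:.1%}", f"{auc_ratio:.1%}")
# Expected output:
# A 66.7% 66.7%
# B 93.3% 93.3%
```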

R code for computing each method is provided in Additional file 1.

### Simulation studies

Simulation studies were used to compare follow-up rates computed using the standard Percentage Method, the CCI, the FPT, and the SPT with the true follow-up rate \( \eta_{PTFR} \). For these comparisons, we assumed a range of outcome event rates and dropout rates. Specifically, each simulated cohort included *N* = 1000 subjects, and time to event and time to dropout were generated for each subject from exponential distributions. The event rate was varied from 5% to 50% and the dropout rate from 10% to 50%, covering a wide range of plausible values for these two parameters. In the first simulation scenario, the study lasted five years with annual clinical visits; the second scenario incorporated random variation in the time between clinic visits (from 0.5 to 1.5 years). Results were averaged across 1000 simulated datasets.
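A scaled-down version of this design can be sketched in Python. Several details are assumptions not specified in the text: "event rate" and "dropout rate" are read as cumulative probabilities by \( \tau \) in the absence of the other process, exact event and dropout times are used instead of a visit schedule, only 200 datasets are generated, the seed is arbitrary, and the FPT is omitted because it requires a visit schedule and an NPMLE fit. The output is therefore not expected to reproduce Table 1.

```python
import math
import random
from statistics import mean


def simulate_cohort(rng, n, tau, event_prob, dropout_prob):
    """One cohort with exponential event and dropout times (exact times, no visits).

    Assumption: event_prob and dropout_prob (both strictly between 0 and 1) are
    cumulative probabilities by tau in the absence of the competing process.
    """
    lam_event = -math.log(1.0 - event_prob) / tau
    lam_drop = -math.log(1.0 - dropout_prob) / tau
    pt_obs = pt_no_drop = pt_potential = spt_credit = 0.0
    n_retained = 0
    for _ in range(n):
        t = rng.expovariate(lam_event)
        c = rng.expovariate(lam_drop)
        pt_obs += min(t, c, tau)
        pt_no_drop += min(t, tau)
        if c < min(t, tau):  # dropout: lost before the event and before tau
            pt_potential += tau
            spt_credit += c
        else:                # event observed, or followed event-free to tau
            n_retained += 1
            pt_potential += min(t, tau)
            spt_credit += tau
    return {
        "percentage": n_retained / n,
        "cci": pt_obs / pt_potential,
        "spt": spt_credit / (n * tau),
        "true_ptfr": pt_obs / pt_no_drop,
    }


def run_simulation(n_datasets=200, seed=2024, **cohort_args):
    """Average each follow-up measure over independently simulated cohorts."""
    rng = random.Random(seed)
    runs = [simulate_cohort(rng, **cohort_args) for _ in range(n_datasets)]
    return {key: mean(run[key] for run in runs) for key in runs[0]}


results = run_simulation(n=1000, tau=5.0, event_prob=0.3, dropout_prob=0.5)
print({key: round(value, 3) for key, value in results.items()})
```

Within each simulated cohort the CCI can never exceed the true PTFR, since both share the numerator and the CCI denominator credits every dropout with the full \( \tau \).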

### Application to the prostate cancer clinical cohort study

A retrospective clinical cohort study of time to recurrence of prostate cancer (PrCa) was conducted using the EMRs of patients who underwent robotic-assisted laparoscopic prostatectomy (RALP) performed by a single surgeon at Montefiore Medical Center in the Bronx from October 2005 through December 2012 [23]. We used this dataset as a real-world example with staggered study entry and ad hoc follow-up. The dataset included *N* = 610 PrCa patients. Clinical guidelines held that PrCa patients should have PSA levels measured every 3 to 4 months in the first year following RALP, every 6 months in the second and third years, and annually thereafter. However, PSA measurements were to be conducted more frequently if the post-operative serum PSA value exceeded 0.1 ng/ml. The median number of follow-up serum PSA measurements was 7 (range 1–28). PrCa recurrence was defined as a rise in serum PSA of 0.2 ng/ml or higher. There were 87 (14.3%) recurrence events following RALP. Three-year and five-year recurrence rates were of primary interest.

Note that although no deaths were observed in the study, death is a potential competing risk here. For the purpose of assessing the completeness of follow-up, death should be included as an event when calculating the follow-up rate.

## Results

### Simulation studies

The Percentage Method estimate \( \eta_{percentage} \) systematically underestimated the follow-up rate: the larger the dropout rate, the greater the underestimation. For example, when the event rate was fixed at 10%, the averaged \( \eta_{percentage} \) varied from 91.0% to 46.4%, whereas the true \( \eta_{PTFR} \) varied from 95.3% to 68.4%. In contrast, the formal estimate \( \eta_{FPT} \) consistently provided an accurate estimate of \( \eta_{PTFR} \), with bias less than 2%. The downward bias arises because Turnbull’s NPMLE [26] tends to slightly underestimate the event rate and consequently the follow-up rate. This underestimation of the cumulative incidence function by the NPMLE for interval-censored data has been recognized [31, 32], and more research on alternative estimators is needed.

Table 1. Follow-up rates under varying assumptions estimated using four methods: (i) the standard Percentage Method (Eq. 1), (ii) Clark’s Completeness Index (CCI, Eq. 2), (iii) the Person-Time Method estimated using the formal method (FPT, Eq. 4) and (iv) the Simplified Person-Time Method (SPT, Eq. 5)

| Assumed event rate | True person-time follow-up rate | Percentage Method: average | Percentage Method: %bias | Percentage Method: \( \sqrt{\mathrm{MSE}} \) | Formal method (FPT): average | FPT: %bias | FPT: \( \sqrt{\mathrm{MSE}} \) | Clark’s completeness index (CCI): average | CCI: %bias | CCI: \( \sqrt{\mathrm{MSE}} \) | Simplified person-time method (SPT): average | SPT: %bias | SPT: \( \sqrt{\mathrm{MSE}} \) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5% | 95.0% | 90.4% | −4.90 | .047 | 95.0% | 0.00 | .001 | 94.9% | −0.08 | .001 | 95.1% | 0.05 | .001 |
| | 81.9% | 66.7% | −18.5 | .158 | 82.1% | 0.16 | .002 | 81.7% | −0.26 | .003 | 82.1% | 0.25 | .003 |
| | 68.1% | 44.8% | −34.3 | .233 | 68.3% | 0.45 | .004 | 67.9% | −3.49 | .003 | 68.2% | 0.59 | .005 |
| | 56.7% | 29.3% | −48.2 | .274 | 57.3% | 0.92 | .006 | 56.5% | −0.27 | .003 | 57.3% | 1.09 | .007 |
| 10% | 95.2% | 91.0% | −4.48 | .043 | 95.2% | −0.01 | .002 | 95.1% | −0.29 | .003 | 95.4% | 0.10 | .002 |
| | 82.1% | 67.9% | −17.4 | .142 | 82.3% | 0.17 | .002 | 81.6% | −0.69 | .006 | 82.5% | 0.49 | .006 |
| | 68.5% | 46.5% | −32.2 | .220 | 68.9% | 0.64 | .005 | 67.9% | −1.11 | .008 | 69.2% | 1.07 | .009 |
| | 56.4% | 30.3% | −46.2 | .261 | 57.1% | 1.33 | .008 | 55.6% | −1.36 | .008 | 57.4% | 1.84 | .011 |
| 30% | 94.4% | 90.0% | −4.61 | .044 | 93.4% | −0.95 | .009 | 93.7% | −0.64 | .006 | 94.6% | 0.31 | .004 |
| | 82.7% | 70.7% | −14.5 | .120 | 82.3% | −0.04 | .004 | 81.1% | −1.94 | .016 | 83.6% | 1.06 | .009 |
| | 69.4% | 50.9% | −26.8 | .186 | 69.7% | 0.42 | .005 | 67.1% | −3.33 | .023 | 70.9% | 2.17 | .016 |
| | 53.6% | 31.0% | −42.2 | .226 | 54.8% | 2.17 | .012 | 51.1% | −4.71 | .026 | 55.9% | 4.19 | .023 |
| 50% | 93.2% | 89.5% | −3.97 | .037 | 88.3% | −0.53 | .049 | 91.2% | −2.08 | .020 | 93.9% | 0.80 | .008 |
| | 77.6% | 67.2% | −13.4 | .105 | 74.6% | −3.79 | .030 | 72.5% | −6.58 | .051 | 79.9% | 3.00 | .024 |
| | 65.0% | 51.1% | −2.14 | .140 | 63.6% | 2.03 | .014 | 58.5% | −9.93 | .647 | 68.4% | 5.25 | .035 |
| | 46.6% | 31.3% | −32.8 | .153 | 47.8% | 0.03 | .014 | 40.0% | −14.0 | .066 | 51.3% | 10.0 | .005 |

The \( \eta_{CCI} \) in general provided a good but slightly low estimate of \( \eta_{PTFR} \), except when both the event and dropout rates were high, because it fails to account for events that occur among dropouts. For example, when the event rate was 50% and the dropout rate was 70%, the true \( \eta_{PTFR} = 46.6\% \) while \( \eta_{CCI} = 40.0\% \), a 14% downward bias. The \( \eta_{SPT} \) is also in close agreement with the true person-time follow-up rate \( \eta_{PTFR} \) but slightly higher, because events are given the full length of follow-up. The overestimation is also more apparent when the event and dropout rates are high; in the same example, \( \eta_{SPT} = 51.3\% \), a 10% upward bias. Careful examination of Table 1 shows that, in most scenarios, the easily computed SPT and CCI were as likely to be closest to the true person-time follow-up rate as the more complex and laborious FPT. When \( \eta_{SPT} \) is used in tandem with \( \eta_{CCI} \), the two provide a tight range around the true follow-up rate, so that \( \eta_{FPT} \) is generally not necessary.

Similar results for each of the methods of estimating follow-up rates were obtained when visits were irregular; i.e., allowing the time-intervals between visits to vary within a person and between persons (results not shown).

### Example dataset

Table 2. Follow-up rates at each annual interval among subjects (*N* = 610) in a retrospective cohort study of 3-year and 5-year prostate cancer (PrCa) recurrence risk based on electronic medical record (EMR) data

| Follow-up | N | Percentage Method | Estimated follow-up using the formal method | Clark’s completeness index | Simplified Person-time Method |
|---|---|---|---|---|---|
| 1 Year | 558 | 91.4% | 95.7% | 95.5% | 95.7% |
| 2 Year | 472 | 86.2% | 95.0% | 94.5% | 95.0% |
| 3 Year | 383 | 80.9% | 93.6% | 92.9% | 93.8% |
| 4 Year | 295 | 75.6% | 92.5% | 92.3% | 93.3% |
| 5 Year | 197 | 67.5% | 91.8% | 91.8% | 93.0% |

Under informative censoring, as mentioned in the Methods section, the CCI estimate provides a lower bound for the person-time follow-up rate. Table 2 shows that the lower bounds were very close to the person-time estimates, suggesting that even in the extreme case in which none of the dropouts would have developed the event during the study, the true follow-up rate is not expected to be much lower.

## Discussion and Conclusion

The completeness of follow-up and the length of follow-up are important measures of the adequacy of a cohort dataset for research purposes. The longer the follow-up, the smaller the concern regarding statistical power; the better the follow-up, the smaller the concern regarding the validity of a study. This paper focused on measures of the completeness of follow-up. A commonly used follow-up rate, the naïve Percentage Method, ignores the person-time contributed by subjects who drop out before study completion; other existing measures of the completeness of follow-up, including the reverse Kaplan-Meier survival curve and Clark’s completeness index (CCI), each have their own limitations. Therefore, we defined a new follow-up rate as the total observed person-time of follow-up divided by the total person-time that could have been observed if there were no dropout. This definition corrects the inherent biases in the existing methods.

We then proposed two methods to estimate the proposed person-time follow-up rate. In the formal person-time method, we estimate the event rate from the observed data and use it to estimate the expected number of events if there were no dropouts. Note that non-informative censoring is assumed for the validity of the FPT; that is, the event rate among dropouts is the same as among those who did not drop out. Although this assumption is not verifiable, sensitivity analyses can be conducted to examine the robustness of the follow-up rate estimate, for example, by assuming that dropouts have a higher or lower event rate than those who did not drop out. The second, simplified method (SPT) assigns events the full follow-up time and therefore does not require estimation of the event rate, making it much easier to use.

Our simulations showed that the Percentage Method often underestimates the follow-up rate substantially when dropouts occur later in the study. The FPT performed well, and the CCI and SPT also performed well in most scenarios, with the CCI tending to slightly underestimate and the SPT to slightly overestimate the follow-up rate. Their bias becomes moderate only when both the event rate and the dropout rate are high; otherwise, the SPT used in tandem with the CCI provides an accurate and tight interval estimate of the true person-time follow-up rate, and the more computationally demanding FPT is not necessary. However, the FPT is recommended when event rates and dropout rates are high.

Application of the methods to an example dataset, based on a study of prostate cancer recurrence, helped demonstrate the critical importance of considering person-time prior to dropout when estimating follow-up rates. Briefly, using the standard Percentage Method the 5-year follow-up rate was estimated to be approximately 68%, whereas the CCI, the FPT and SPT all showed the follow-up to be greater than 90%.

Although the CCI method was proposed more than a decade ago, this person-time approach to determining follow-up rates has not been widely adopted, likely because the performance of the CCI has not been fully examined and/or because of the misconception that the median follow-up time and the reverse KM survival curve are sufficient. Thus, the presentation of this work is timely. The availability and ease of calculation of the proposed person-time follow-up rate can represent an important advance in assessing the completeness of follow-up.

Existing guidelines on how much loss to follow-up is problematic have been based primarily on the Percentage Method. New guidelines based on the person-time follow-up rate should be developed to define “acceptable” and “alarming” follow-up rates. Recent work by von Allmen et al. [33] examined the bias in estimating the mortality rate under various levels of the CCI. However, that work did not distinguish among missing-data mechanisms (missing completely at random, missing at random and missing not at random); moreover, research studies are often interested in an unbiased estimate of the exposure-disease association, or the relative risk associated with an exposure, rather than the absolute risk of death or disease. Further studies are therefore needed, including a series of simulations examining the bias and efficiency loss in relative risk estimates under various levels of loss to follow-up, as measured by the proposed person-time follow-up rates, and under various missing-data mechanisms; these will be the primary focus of our future research.

## Notes

### Acknowledgements

Not applicable.

### Funding

This work was supported by Albert Einstein Cancer Center Support Grant *5P30-CA013330–40*.

### Availability of data and materials

The data that support the findings of this study are available from the Montefiore Medical Center (MMC) electronic medical records, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the MMC IRB.

### Authors’ contributions

XX contributed to every aspect of the study, including method development, design of simulations, data analysis, and drafting and reviewing the manuscript; IA contributed to the conception, method development and data interpretation; MK contributed to method development and simulation design; TW contributed to method development; JL contributed to the data analysis; RG contributed to data acquisition and interpretation of the data analysis results; HS made substantial contributions to the conception, method development, and interpretation and presentation of the simulation and data analysis results, helped to draft the manuscript and critically reviewed it in great detail. All authors read and approved the final manuscript.

### Ethics approval and consent to participate

This paper involves a secondary analysis of a dataset obtained from hospital electronic medical records. The original study was approved by the Institutional Review Board of Albert Einstein College of Medicine and Montefiore Medical Center and has been published elsewhere [23].

### Consent for publication

Not applicable.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary material

## References

1. Choi BC, Noseworthy AL. Classification, direction, and prevention of bias in epidemiologic research. J Occup Med. 1992;34(3):265–71.
2. Greenland S. Response and follow-up bias in cohort studies. Am J Epidemiol. 1977;106(3):184–7.
3. Johnson ES. Treatment of subjects lost to follow-up in the analysis of mortality studies. J Occup Med. 1988;30(1):60–2.
4. Johnson ES. Bias on withdrawing lost subjects from the analysis at the time of loss, in cohort mortality studies, and in follow-up methods. J Occup Med. 1990;32(3):250–4.
5. Kristman V, Manno M, Cote P. Loss to follow-up in cohort studies: how much is too much? Eur J Epidemiol. 2004;19(8):751–60.
6. Deeg DJ, van Tilburg T, Smit JH, de Leeuw ED. Attrition in the longitudinal aging study Amsterdam. The effect of differential inclusion in side studies. J Clin Epidemiol. 2002;55(4):319–28.
7. Dettori JR. Loss to follow-up. Evid Based Spine Care J. 2011;2(1):7–10.
8. Kempen GI, van Sonderen E. Psychological attributes and changes in disability among low-functioning older persons: does attrition affect the outcomes? J Clin Epidemiol. 2002;55(3):224–9.
9. Sackett DL. Evidence-based medicine. Semin Perinatol. 1997;21(1):3–5.
10. Touloumi G, Babiker AG, Pocock SJ, Darbyshire JH. Impact of missing data due to drop-outs on estimators for rates of change in longitudinal studies: a simulation study. Stat Med. 2001;20(24):3715–28.
11. Twisk J, de Vente W. Attrition in longitudinal studies. How to deal with missing data. J Clin Epidemiol. 2002;55(4):329–37.
12. Van Beijsterveldt CE, van Boxtel MP, Bosma H, Houx PJ, Buntinx F, Jolles J. Predictors of attrition in a longitudinal cognitive aging study: the Maastricht aging study (MAAS). J Clin Epidemiol. 2002;55(3):216–23.
13. Higgins JPT, Green S, Cochrane Collaboration. Cochrane handbook for systematic reviews of interventions. Chichester, West Sussex; Hoboken NJ: Wiley-Blackwell; 2008.
14. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357(9263):1191–4.
15. Chen R, Wei L, Huang H. Methods for calculation of follow-up rate in a cohort study. Int J Epidemiol. 1993;22(5):950–2.
16. Renquist K, Jeng G, Mason EE. Calculating follow-up rates. Obes Surg. 1992;2(4):361–7.
17. Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials. 1996;17(4):343–6.
18. Shuster JJ. Median follow-up in clinical trials. J Clin Oncol. 1991;9(1):191–2.
19. Altman DG, De Stavola BL, Love SB, Stepniewska KA. Review of survival analyses published in cancer journals. Br J Cancer. 1995;72(2):511–8.
20. Korn EL. Censoring distributions as a measure of follow-up in survival analysis. Stat Med. 1986;5(3):255–60.
21. Clark TG, Altman DG, De Stavola BL. Quantification of the completeness of follow-up. Lancet. 2002;359(9314):1309–10.
22. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89(2):232–8.
23. Agalliu I, Williams S, Adler B, Androga L, Siev M, Lin J, Xue X, Huang G, Strickler HD, Ghavamian R. The impact of obesity on prostate cancer recurrence observed after exclusion of diabetics. Cancer Causes Control. 2015;26(6):821–30.
24. Lawless JF. Some nonparametric and graphical procedures. In: Statistical Models and Methods for Lifetime Data. New York: Wiley; 2002. p. 79–145.
25. Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies. I. Principles. Am J Epidemiol. 1992;135(9):1019–28.
26. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B Methodol. 1976;38(3):290–5.
27. Fay MP, Shaw PA. Exact and asymptotic weighted Logrank tests for interval censored data: the interval R package. J Stat Softw. 2010;36(2):i02.
28. Gentleman R, Geyer CJ. Maximum likelihood for interval censored data: consistency and computation. Biometrika. 1994;81(3):618–23.
29. Lau B, Cole SR, Gange SJ. Competing risk regression models for epidemiologic data. Am J Epidemiol. 2009;170(2):244–56.
30. Satagopan JM, Ben-Porat L, Berwick M, Robson M, Kutler D, Auerbach AD. A note on competing risks in survival data analysis. Br J Cancer. 2004;91(7):1229–35.
31. Pan W, Chappell R. Estimating survival curves with left-truncated and interval-censored data under monotone hazards. Biometrics. 1998;54(3):1053–60.
32. Pan W, Chappell R. A nonparametric estimator of survival functions for arbitrarily truncated and censored data. Lifetime Data Anal. 1998;4(2):187–202.
33. von Allmen RS, Weiss S, Tevaearai HT, Kuemmerli C, Tinner C, Carrel TP, Schmidli J, Dick F. Completeness of follow-up determines validity of study findings: results of a prospective repeated measures cohort study. PLoS One. 2015;10(10):e0140817.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.