Background

Longitudinal studies are vital to understanding disease progression. Chart reviews are a common source of longitudinal data, and can be used to identify the long-term benefits of a medical intervention, risk factors for poor outcomes, and the burden of disease over time. Chart reviews are inexpensive and popular; for example, they are estimated to comprise 25% of all scientific articles published in emergency medicine journals [1]. However, chart reviews often feature irregular follow-up times, i.e. visit times that vary among patients, often to the extent that no two patients share an observation time. If patients visit more often when unwell, this can lead to a biased picture of disease course unless the data are analyzed appropriately [2].

Many analyses of longitudinal data subject to irregular observation use traditional approaches to longitudinal data analysis such as generalized estimating equations (GEEs) [3] and linear mixed models [4]. While these methods can be run on data with irregular follow-up, they will give biased inferences if the visit intensity is related to the outcome [5]. For this reason, methods designed specifically for irregular observation are usually required.

Statistical methods to handle longitudinal data subject to irregular follow-up began to be developed in the 1990s [6, 7]. There is now a substantial literature on these methods, which include inverse-intensity weighting [2, 8,9,10] and semiparametric joint models [11,12,13,14]. Although specifically developed to help medical researchers by addressing the problem of over-representation of certain individuals or certain types of measurements in longitudinal studies with irregular follow-up, their use remains limited. A 2015 citation analysis using the Web of Science revealed that these methods were used only once as the primary analysis [15] and applied twice as a sensitivity analysis [16, 17].

Either these methods are not being used because they are not needed, or there is a knowledge translation gap. This paper aimed to assess whether the lack of use is due to a lack of need. Specifically, we used a systematic review to address the following questions among longitudinal studies published in the medical literature between January 2005 and May 2015 that used data collected as part of patients’ usual care: (1) what proportion reported summary statistics on (a) the number of visits per patient, (b) gaps between visits, and (c) total follow-up time; (2) was there an assessment of predictors of visit times, and if so, was there a need to account for the fact that visit times were irregular; and (3) was a method used that accounted for potential informativeness of visit times? The first question addresses whether the extent of irregularity was reported, the second whether visit times were informative about the outcome, and the third whether an appropriate method was used.

Methods

This review did not include outcomes of direct patient or clinical relevance and was thus not eligible for registration in PROSPERO (International Prospective Register of Ongoing Systematic Reviews, http://www.crd.york.ac.uk/prospero) [18, 19].

Search

We performed a search of the MEDLINE and EMBASE databases to identify studies assessing longitudinal data collected as part of patients’ usual care (see Additional file 1 for search terms). For both databases, the earliest publication date was restricted to January 2005, since several methods for analyzing longitudinal data subject to irregular follow-up were proposed by this time [6, 7], and the latest publication date was May 13, 2015.

Study selection and eligibility criteria

Eligibility criteria were chosen to identify studies in which follow-up would be expected to be irregular, and in which inverse-intensity weighting or semi-parametric joint modelling would be an appropriate method of analysis. Our analysis was limited to articles published in English.

We included studies that used patient-level data collected as part of patients’ usual care with an outcome that was measured on at least three occasions. We excluded studies that met one or more of the following criteria: 1) the outcome was assessed on fewer than three occasions; 2) the outcome was whether or not a visit occurred, or the number of visits; 3) visit times were specified by protocol, or the analysis was restricted to visits at specified times; 4) the analysis was a time-to-event analysis; 5) the outcome was a single binary outcome per patient; 6) the outcome could have occurred only if a visit occurred; 7) the outcome was measured on aggregate data. Systematic reviews, meta-analyses and randomized controlled trials were also excluded.

We combined the searches from MEDLINE and EMBASE, removed duplicates and screened abstracts for eligibility. In the summer of 2016 (May–September) we trained a team of four reviewers (AA, JK, ES, YW), and two reviewers were chosen at random for each paper. These reviewers independently assessed both the abstracts and full-text articles, made eligibility decisions and resolved disagreements by discussion; if necessary, a third party was consulted. As our reviewers were working part time, not all papers were assessed during this period, and the remainder were assessed by DF and EP. The same template was provided to each reviewer to record their results. In the first stage, abstracts were classified as either ineligible, based on the above inclusion and exclusion criteria, or as needing full-text review. In the second stage, full texts were reviewed for those articles whose abstracts had not been excluded. Agreement between reviewers was assessed using Cohen’s kappa [20].

Data extraction

The following data were extracted independently by two reviewers (DF and EP), with discrepancies resolved by consensus: descriptive data on the number of visits per patient (e.g. mean, median, range); descriptive data on gaps between visits; descriptive data on follow-up time (e.g. maximum follow-up time, median follow-up); how the longitudinal data were analyzed (methods used, covariance structure reported, rationale explained); whether participants were enrolled prospectively; whether there was a clearly defined end of the study, and if so, how many participants were followed to the end of the study; whether characteristics of those lost to follow-up were compared with those who were not; whether there was an assessment of predictors of visit times, and if so, how this was assessed (e.g. recurrent event regression); and whether there was a need to account for the fact that visit times were irregular, and if so, whether the statistical analysis accounted for it.

The statistical literature indicates that visit irregularity should be accounted for if it is informative, that is, if the visit and outcome processes are not independent. This can happen if there is a covariate (observed or unobserved) that is associated with both the outcome and the visit times. For example, if the outcome of interest is blood pressure and older patients tend to have both higher blood pressure and more measurements, then the visit scheme is informative. Thus, if an analysis of visit times uncovers a predictor that is also a predictor of the outcome, the visit times are informative and should be accounted for. We distinguished between papers that reported results of an analysis intended to assess whether the visit scheme was informative (i.e. an assessment of predictors of visit times, e.g. through a recurrent event analysis of the visit process), papers where an informative visit scheme could be deduced from other information in the paper (e.g. descriptive statistics on length of follow-up or number of visits, reported separately for certain subgroups), and papers where it was not possible to tell whether the visit scheme was informative because insufficient analysis was reported.

Results were summarized using percentages.

Assessment of study quality

The Newcastle-Ottawa Scale (NOS) [21] was used to assess the quality of the included studies. Each study was evaluated against the NOS criteria for the three components of selection, comparability and outcome. An overall quality score was calculated by summing the stars across categories, for a maximum total of 9.

Results

The search identified 1546 articles, of which 279 proceeded to full-text review and 44 were included in the final analysis (see Fig. 1). The reviewers agreed on the inclusion/exclusion decision for 96% of the 1546 articles, with a kappa of 0.57. We found that the proportions of articles that reported summary statistics on the number of visits per patient, gaps between visits, and total follow-up time were 57% (n = 25), 7% (n = 3) and 57% (n = 25), respectively (Table 1). Twenty-two percent (n = 10) of articles did not provide summary statistics on any of these (see Table 2).

Fig. 1 PRISMA flow diagram

Table 1 Summary statistics on reporting of visit irregularity, predictors of visit times, and methods of analysis
Table 2 Descriptive information and extracted variables of interest for included studies

The majority of articles (93%, n = 41) did not assess predictors of visit times. In 38 articles (86%), there was insufficient analysis to determine whether there was a need to account for informative visit times; in the remaining 6 studies, this need was present. Only one of these 6 studies described an analysis in the methods section intended to check for predictors of visit times (i.e. an informative visit scheme) [22]. In four of the 6 studies, the reviewers inferred that visit times were informative: one study provided results demonstrating that age was a predictor of visiting [23], and a further three studies reported predictors of the total length of follow-up [24,25,26]. In the remaining study, it was known by design that high-risk patients were asked to visit more often [27].

Thirty-one of 44 articles (70%) used mixed models or repeated measures to analyze outcomes. In two cases the data were reduced before using repeated measures (once by taking a mean within pregnancy trimesters, once by using the first three measurements only). Only one study used a method specifically designed to handle informative visit times, namely an inverse-intensity weighted GEE [2, 22].

The mean overall NOS quality score across all included studies was 7.11, with a standard deviation of 1.46. We found that 70%, 59% and 32% of included studies obtained the maximum score in the three NOS subcategories of selection, comparability and outcome, respectively. A histogram of these scores is shown in Fig. 2, and the individual scores are available in Table 3.

Fig. 2 NOS overall quality scores for included studies

Table 3 Newcastle-Ottawa Score for included studies

Discussion

We conducted a systematic review of articles that used longitudinal data collected as part of patients’ usual care. We found that reporting of variability in the number or timing of visits was suboptimal, and reporting on the potential informativeness of visit times was rare. Furthermore, a method specifically designed to account for the informativeness of visit times was used in just one of the 44 studies. When study quality was assessed with the NOS, only 14 studies (32%) reported adequate cohort follow-up.

When visit times are irregular, it is important to investigate whether visit times are informative, that is, whether the visit and outcome processes are dependent [2, 5]. This should also be reported, so that the reader is aware of the scope for bias due to visit irregularity; this is closely analogous to the need to investigate and report missingness mechanisms when missing data are present [28, 29]. Only one study described an analysis in the methods section designed to check for informativeness of the visit times, while in a further five studies informativeness was inferred by the reviewers but neither named as a potential source of bias nor accounted for in the analysis.

Our findings are consistent with an overall context of poor reporting. For example, a recent systematic review of studies using routinely collected health data found that reporting was poor, with 30% reporting study design in the title or abstract, and only 41% providing sufficient information to formulate a research question [30]. In the context of longitudinal prognostic studies in lupus, a systematic review found that 56% of studies had a high risk of bias with regards to attrition [31]. Only 43% of prospective cohort studies were found to have reported the amount of missing data [32], and only half of trials with missing longitudinal data explained the reasons for their choice of missing data method [33]. Given that this occurs despite considerable efforts to improve the reporting of observational studies and missing data (including the widely endorsed STROBE reporting guideline [28]), it is not surprising that few studies report on the degree and informativeness of irregular visits, for which there is no guidance in the literature.

Poor reporting makes it impossible to determine definitively whether the lack of use of methods for longitudinal data with irregular follow-up is due to a lack of need. However, the inclusion/exclusion criteria were designed to capture studies with irregular follow-up, and for such studies the set of circumstances under which a simple GEE or linear mixed model leads to unbiased inferences is extremely narrow. For a GEE, this requires visit times to be independent of both past and future outcomes, which is generally implausible when data are collected as part of usual care, since patients will usually be seen more often when unwell. A linear mixed effects model yields unbiased estimates of regression coefficients in the presence of informative visit times only if the predictors of visit times are included in the mixed model [4]. Moreover, repeated measures analysis requires that the outcome not depend on time if the timings of the visits vary. Some studies attempt to standardize the number of data points per patient used in regression models, e.g. by taking the mean measurement per patient per year. While this ensures that each patient is equally represented, it overlooks the fact that certain types of measurement are likely to be over-represented. For example, if patients visit more often when unwell, then the mean of the observed measurements in any given year over-estimates the patient’s burden of disease for that year. We thus hypothesize that many of the 44 studies identified did in fact need analytic techniques specifically designed to account for an informative visit process.
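To illustrate this last point, the following sketch is a hypothetical simulation in R (not taken from any of the reviewed studies): when the probability of a visit increases with disease activity, the mean of the measurements recorded at visits exceeds the true average burden.

set.seed(1)
n_patients <- 1000
observed_means <- true_means <- numeric(n_patients)

for (i in seq_len(n_patients)) {
  # hypothetical daily disease activity over one year (higher = more unwell)
  activity <- pmax(rnorm(365, mean = 5, sd = 2), 0)
  # probability of a clinic visit on a given day rises with disease activity
  p_visit <- plogis(-4 + 0.4 * activity)
  visited <- rbinom(365, 1, p_visit) == 1
  true_means[i] <- mean(activity)
  observed_means[i] <- if (any(visited)) mean(activity[visited]) else NA
}

mean(true_means)                    # true average burden
mean(observed_means, na.rm = TRUE)  # naive estimate from visit days only (larger)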

In each of the five papers that identified predictors of both visit times and outcomes but that did not use a method to account for the informative visit process, an inverse intensity weighted analysis was feasible. Such analyses could be made more accessible through availability of suitable software. Inverse intensity weighted GEEs can be fitted using PROC GENMOD in SAS or geeglm in R after calculating the intensity separately, but a one-step estimation function would be preferable. Similarly, there is no R package or set of SAS macros for fitting semi-parametric joint models.
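As a rough illustration of the two-step approach just described, the following R sketch fits a visit-intensity model and then a weighted GEE. The data frame dat and the variables id, tstart, time, outcome, x1 and x2 are hypothetical, and the unstabilized weights shown are a simplification of what would be used in practice.

library(survival)  # Andersen-Gill model for the visit intensity
library(geepack)   # weighted GEE for the outcome

# Step 1: model the visit intensity. Each row of 'dat' is an observed visit,
# in counting-process format (tstart = previous visit time, time = this visit).
dat$event <- 1
intensity_fit <- coxph(Surv(tstart, time, event) ~ x1 + x2 + cluster(id),
                       data = dat)

# Inverse-intensity weights: proportional to 1 / estimated relative intensity.
# Stabilized versions are often preferable in practice.
dat$iiw <- 1 / exp(predict(intensity_fit, type = "lp"))

# Step 2: fit the outcome model as a GEE with a working independence
# correlation structure, weighting each observation by its inverse intensity.
outcome_fit <- geeglm(outcome ~ x1 + time, id = id, weights = iiw,
                      corstr = "independence", data = dat)
summary(outcome_fit)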

While a 2015 Web of Science citation analysis suggested that methods that account for informative visit times had been used just three times in the medical literature, this review identified a fourth [22]. This paper was not identified by the citation analysis because its reference to the inverse-intensity weighting method was incorrect (the first and last author names were reversed).

The analysis of longitudinal data subject to irregular follow-up has been an active area of research in the past decade [2, 6, 7, 34, 35]. However, our findings suggest that knowledge of these methods has yet to be translated into medical research. These methods have received less attention than those used in handling missing data [34]. The uptake of biostatistical methods in medical research is facilitated through collaboration and the availability of software to implement these methods [36]. A proactive approach is needed to bridge the knowledge gap with respect to longitudinal data subject to irregular follow-up. There is also a need for standards for reporting longitudinal studies subject to irregular follow-up, both in terms of the extent of irregularity and its informativeness. Improving the quality of reporting and using methods that account for the informative nature of the visit process will reduce the risk of bias and hence improve the quality of evidence in the medical literature.

Recommendations

The best way to avoid bias due to irregular observation is through study design. In a prospective study this can be accomplished by specifying visit times a priori. Some studies, however, follow clinic-based cohorts where visits are on an as-needed basis and vary among patients; adding additional study visits would substantially increase the cost of the study. Likewise, in a retrospective study the visit times are already set. In these cases, analysis should begin with an investigation of the variability of visit times, and by looking at whether there are any factors that predict visit frequency. The former can be accomplished by descriptive statistics on numbers of visits and gaps between visits, and the latter by a recurrent event analysis on the visit times. If important predictors of visit frequency are found, a method that accounts for the informativeness of visit times should be used. Such methods include inverse intensity weighting [2, 8,9,10] and semi-parametric joint models [11,12,13,14]. See Pullenayegum & Lim [5] for a review together with guidance on when to use each method.
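The following R sketch illustrates the diagnostic steps recommended above: descriptive statistics on visit counts and gaps, followed by a recurrent event analysis of the visit process. The data frame dat and its columns id, tstart, time, age and severity are hypothetical.

library(survival)

# Extent of irregularity: visits per patient and gaps between successive visits
visits_per_patient <- table(dat$id)
summary(as.numeric(visits_per_patient))   # number of visits per patient
summary(dat$time - dat$tstart)            # gaps between visits

# Predictors of visit frequency: Andersen-Gill recurrent event model on the
# visit process. Covariates that predict both visits and the outcome indicate
# an informative visit scheme.
dat$event <- 1
visit_model <- coxph(Surv(tstart, time, event) ~ age + severity + cluster(id),
                     data = dat)
summary(visit_model)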

Conclusion

We found a low proportion of studies reporting on the potential informativeness of visit times. There is a need for guidance to researchers on the potential for bias and the reporting of longitudinal studies subject to irregular follow-up.