Background

Transparency is considered fundamental for the reproducibility of any research finding [1]. Initiatives such as SPIRIT, CONSORT, PRISMA, and PROSPERO have contributed to transparent reporting of protocols and findings of randomised clinical trials and systematic reviews [2,3,4,5]. Still, the many decisions taken during the statistical analysis phase of any study have been shown to affect results and conclusions, irrespective of pre-published protocols [6]. While any protocol for a clinical study should include the principal features of the statistical analysis of the data, a statistical analysis plan (SAP) should fully outline the details of all planned analyses, including any additional analyses. Recently, Gamble and colleagues used a Delphi survey to reach consensus and provide recommendations for a minimum set of items that should be addressed in a SAP for a randomised clinical trial [7].

Observational studies are frequently the source of multiple statistical analyses and reports. Reporting guidelines such as STROBE, TRIPOD, and STARD are key to transparent reporting of findings of observational studies [8,9,10], but they do not reduce the number of possible decisions taken during the analysis phase of such studies. As with randomised clinical trials, the validity of conclusions of cohort studies is likely to improve when published SAPs are used to distinguish pre-planned analyses from data-driven exercises [1, 11]. Journals now encourage researchers to preregister observational studies and SAPs [11,12,13,14,15], but there are no guidelines on the required content of the latter.

Therefore, we argue that SAP guidelines should also be developed for observational studies. In the absence of such a guideline, we appraised recently developed recommendations for the content of SAPs for clinical trials and modified them for use in observational studies. This paper reports the applicability of SAP guidelines for clinical trials to our single-centre observational study, whose design is described elsewhere [16].

Main text: recommended content of SAPs in observational studies

We appraised the recommendations for the content of SAPs for clinical trials and assessed whether each section is applicable to an observational study (Table 1). We added the item ‘confounding’ to the recommended list for observational studies: confounding is a far more pronounced issue in observational studies than in clinical trials and should be considered during model building.

Table 1 Applicability of recommended content of statistical analysis plans for clinical trials to observational studies

The SAP of our observational study, the Simple Intensive Care Studies (SICS)-I, is presented as an example document (Additional file 1). This SAP was written as an add-on document to a protocol pre-published on clinicaltrials.gov [NCT02912624]. In the absence of guidelines for observational study protocols, we used the first 20 items of the SPIRIT statement as the backbone for our observational study protocol (Additional file 2).

Section 1: administrative information

The administrative information section of a SAP for a randomised clinical trial applies equally to a SAP for an observational study. Items 1a and 1b were renamed while their content remained the same. For item 1b, a protocol of an observational study can be registered in a dedicated database (e.g. clinicaltrials.gov, researchregistry.com), as for randomised clinical trials [14, 17]. The description of item 4 was rephrased, since interim analyses are usually not planned in observational studies (Table 1). All other items, names, and descriptions were left unchanged.

Section 2: introduction

The introduction section of a SAP for an observational study is identical in content to that of a SAP for a randomised clinical trial.

Section 3: study methods

Sample size

Unlike randomised clinical trials, in which the sample size is calculated to detect an intervention effect with a prespecified power, most observational studies have sample sizes that are determined by other factors (e.g. resources, time restrictions, convenience). Accordingly, most observational studies have a given sample size, which affords sufficient power only if it is large enough. The STROBE guidelines only expect authors to explain how the study size was arrived at [8], which may reduce the incentive to conduct sample size calculations for observational studies.

When the sample size is given or was not specified in the protocol, we suggest providing power considerations for the primary analysis of the observational study to limit random errors. Such considerations require a definition of a minimally important difference or intervention effect for the given sample size. A power calculation quantifies the risk of a type II error (false-negative findings), whereas the minimal detectable difference may be clinically more informative. For example, one can report the minimal relative risk that can be detected with the specified power and sample size at a type I error probability α.
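The minimal detectable relative risk mentioned above can be sketched numerically. The following is a minimal sketch under stated assumptions, not the SICS-I calculation: a binary outcome, two exposure groups of fixed size, a two-sided two-proportion z-test with unpooled variance, and a normal approximation. The function names and the example numbers (baseline risk 0.20, 500 patients per group) are hypothetical.

```python
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(p0: float, p1: float, n0: int, n1: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses the unpooled variance and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    se = (p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(p1 - p0) / se - z_crit)


def minimal_detectable_rr(p0: float, n0: int, n1: int, alpha: float = 0.05,
                          power: float = 0.80, tol: float = 1e-10) -> Optional[float]:
    """Smallest relative risk p1 / p0 > 1 detectable with the requested power.

    Power increases monotonically with p1 on (p0, 1], so bisection applies.
    Returns None if even p1 = 1 cannot be detected with these group sizes.
    """
    lo, hi = p0, 1.0
    if approx_power(p0, hi, n0, n1, alpha) < power:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if approx_power(p0, mid, n0, n1, alpha) >= power:
            hi = mid  # mid already reaches the target power
        else:
            lo = mid
    return hi / p0


# Hypothetical cohort: baseline risk 0.20, 500 unexposed and 500 exposed patients.
rr = minimal_detectable_rr(p0=0.20, n0=500, n1=500)
print(f"Minimal detectable relative risk: {rr:.2f}")
```

Under these assumptions the sketch reports a minimal detectable relative risk of about 1.38 at α = 0.05 and 80% power; stating such a value in the SAP tells readers which effect sizes the fixed cohort could realistically detect.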

Framework

While causality can never be proven in observational studies, observed associations may fuel hypotheses that later can be tested in randomised clinical trials [18]. Although the vast majority of observational studies test for superiority, there are some that address equivalence and non-inferiority hypotheses [19,20,21,22]. Of course, confounding will always be present in any of these frameworks. Nevertheless, a SAP should describe whether the relevant hypothesis was assessed for superiority, equivalence or non-inferiority.

Statistical interim analyses and stopping guidance

In randomised clinical trials, interim analyses typically guide early stopping for benefit, harm, or futility of the tested interventions. Investigators are ethically obliged to conduct interim analyses to reduce study patients’ exposure to an inferior intervention. While observational studies usually have no intervention component that can be halted, there may be incentives to perform interim analyses for early stopping of continued (costly) data collection because of already clear associations or futility. Furthermore, observational studies may be subject to repeated testing of accumulating data, which requires adjustment of significance levels to prevent inflation of the type I error rate (false-positive findings), for example with the boundaries described by O’Brien and Fleming [23]. Such methods should be described in the SAP.
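A common way to apply O’Brien-Fleming-like monitoring when the timing of interim looks is flexible is the Lan-DeMets alpha-spending function, a later technique that approximates the original O’Brien and Fleming boundaries. The minimal sketch below assumes this spending function and a two-sided overall α; it only computes how much of α is spent at each planned look. Turning these amounts into nominal z boundaries requires numerical integration over the joint distribution of the sequential test statistics, which dedicated group-sequential software performs. The function names and the look schedule are hypothetical.

```python
from statistics import NormalDist
from typing import List, Tuple

_STD_NORMAL = NormalDist()  # standard normal distribution


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t.

    Lan-DeMets O'Brien-Fleming-type spending function:
    alpha(t) = 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t)), so alpha(1) = alpha.
    """
    if not 0 < t <= 1:
        raise ValueError("information fraction must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _STD_NORMAL.cdf(z / t ** 0.5)


def spending_schedule(fractions: List[float],
                      alpha: float = 0.05) -> List[Tuple[float, float, float]]:
    """Return (fraction, cumulative alpha, alpha spent at this look) per look."""
    schedule, previous = [], 0.0
    for t in sorted(fractions):
        cumulative = obf_alpha_spent(t, alpha)
        schedule.append((t, cumulative, cumulative - previous))
        previous = cumulative
    return schedule


# Hypothetical plan: looks after 25%, 50%, 75% and 100% of the data collection.
for t, cumulative, increment in spending_schedule([0.25, 0.5, 0.75, 1.0]):
    print(f"t={t:.2f}  cumulative alpha={cumulative:.5f}  spent at this look={increment:.5f}")
```

Very little α is spent at the early looks, so the final analysis retains most of the overall α; this is the property that makes O’Brien-Fleming-type boundaries attractive for monitoring accumulating data.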

Timing of final analysis

A SAP for a blinded clinical trial should be published prior to unblinding the data or, for an open clinical trial, prior to the randomisation of the first participant. Likewise, a SAP for a prospective observational study should be published before the first participant is included or, at the least, before anyone is given access to the database. Randomised clinical trials that include blinding have the natural advantage that the intervention groups can be coded during the statistical analyses. Such coding of interventions is usually not applicable to observational studies, but it should be possible to mask the statistician by coding several covariates (at least the dichotomous and categorical ones). Except for the study monitors, researchers should be unable to read the database before the study is finished or a SAP is written. If all study data are already accessible to the researchers, a detailed SAP may still provide transparency on the intended analytic steps and may prevent ‘fishing’ for statistically significant predictors or other manipulations of the data. Any analysis that was not prespecified in the protocol and/or the SAP can only be exploratory and should be labelled accordingly (i.e. as an exploratory or post hoc analysis).
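Masking the statistician by coding covariates could be organised as in the following minimal sketch. It assumes that someone other than the analysing statistician (for example a data manager) holds the key and the random seed. The function name, the neutral code format, and the example values are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple


def mask_categorical(values: List[Hashable],
                     seed: Optional[int] = None) -> Tuple[List[str], Dict[str, Hashable]]:
    """Replace the labels of a categorical covariate by neutral codes.

    Codes ("C1", "C2", ...) are assigned to the categories in random order,
    so the statistician cannot infer which code stands for which label.
    Returns the masked values and the key mapping each code to its label.
    The seed, like the key, must stay with the key holder.
    """
    categories = list(dict.fromkeys(values))  # unique labels, first-seen order
    codes = [f"C{i + 1}" for i in range(len(categories))]
    random.Random(seed).shuffle(codes)
    label_to_code = dict(zip(categories, codes))
    key = {code: label for label, code in label_to_code.items()}
    return [label_to_code[v] for v in values], key


# Hypothetical dichotomous covariate; only the key holder can unmask it.
masked, key = mask_categorical(["vasopressor", "none", "none", "vasopressor"], seed=2016)
print(masked)
```

The statistician analyses `masked` and sees only neutral codes; the key is applied once the prespecified analyses are complete.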

Section 4: statistical principles

Multiplicity and type I errors

Multiplicity issues are similar in randomised clinical trials and observational studies, but are rarely addressed in the latter. Most observational studies ignore multiplicity by testing multiple analyses at the same conventional P < 0.05 significance level. This inflates the family-wise error rate (FWER), that is, the probability of at least one false-positive finding. Several methods have been suggested to adjust for multiplicity, such as those of Bonferroni or Šidák [24, 25]. Even though the International Conference on Harmonization of Good Clinical Practice guidelines recommend full Bonferroni adjustment [26], such an adjustment may be too conservative for the correlated outcomes of observational studies [27].
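The inflation of the FWER and the two adjustments named above can be computed directly. The minimal sketch below assumes m independent tests, for which the FWER at a per-test level α equals 1 - (1 - α)^m. The Bonferroni level α/m keeps the FWER at or below α under any dependence structure, whereas the Šidák level 1 - (1 - α)^(1/m) gives an FWER of exactly α for independent tests. The function names are hypothetical.

```python
def fwer_independent(per_test_alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - per_test_alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test significance level according to Bonferroni."""
    return alpha / m


def sidak_level(alpha: float, m: int) -> float:
    """Per-test significance level according to Sidak (exact for independent tests)."""
    return 1 - (1 - alpha) ** (1 / m)


m = 13  # e.g. 13 hypotheses, each tested at the conventional 0.05 level
print(f"FWER without adjustment: {fwer_independent(0.05, m):.3f}")
print(f"Bonferroni per-test level: {bonferroni_level(0.05, m):.5f}")
print(f"Sidak per-test level: {sidak_level(0.05, m):.5f}")
```

With 13 independent tests at P < 0.05, the chance of at least one false-positive finding is close to 50%, and both adjusted per-test levels fall below 0.004. When outcomes are positively correlated, both levels are conservative.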

For example, the SICS-I addresses six different primary outcomes spread across 13 hypotheses [16]. Our outcomes cardiac output, acute kidney injury, and mortality are all affected by a patient’s haemodynamic status, so most outcomes will probably be positively correlated. Since the Bonferroni adjustment ignores correlation between outcomes and therefore becomes overly conservative when outcomes are positively correlated, we used an adjustment of our significance level that was pragmatic and probably more accurate. For more details, we refer to the paper by Jakobsen and colleagues [28].

Section 5: study population

Recruitment

The numbers of eligible and included patients of an observational study should be presented in a flow diagram, preferably according to the STROBE recommendations [29].

Potential confounding covariates

Results of observational studies can be seriously biased by confounding covariates. In randomised clinical trials, randomisation is used to reduce imbalance in observed and unobserved confounders between the allocated groups, although success can never be guaranteed [30]. The STROBE guidelines advocate addressing confounding; however, adherence to this recommendation was recently shown to be suboptimal [31]. A SAP could serve to predefine the confounders and how residual confounding is expected to be limited, for example by adjustment or stratification.
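Stratification, one of the approaches named above, can be illustrated with the Mantel-Haenszel pooled odds ratio, which combines the 2 × 2 tables of the strata of a categorical confounder. The sketch below uses hypothetical counts in which the crude odds ratio suggests a strong association while the odds ratio within each stratum is 1, i.e. the crude association is entirely due to the confounder. It illustrates the technique only and is not the analysis planned for SICS-I; regression adjustment would be the usual alternative when several confounders must be handled at once.

```python
from typing import Iterable, Tuple

# One 2x2 table per stratum: (a, b, c, d) =
# (exposed with outcome, exposed without outcome,
#  unexposed with outcome, unexposed without outcome)
Table = Tuple[int, int, int, int]


def crude_odds_ratio(strata: Iterable[Table]) -> float:
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: Iterable[Table]) -> float:
    """Mantel-Haenszel odds ratio: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical data: the confounder is present in stratum 1 and absent in stratum 2.
strata = [(80, 20, 20, 5), (5, 20, 20, 80)]
print(f"Crude OR: {crude_odds_ratio(strata):.2f}")
print(f"Mantel-Haenszel OR: {mantel_haenszel_odds_ratio(strata):.2f}")
```

Here the crude odds ratio is about 4.5, whereas the Mantel-Haenszel odds ratio is 1.00: the exposure only appears harmful because exposed patients are concentrated in the high-risk stratum.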

Addressing confounding variables is key in observational studies. Datasets of observational studies usually include large numbers of variables, many of which are inevitably correlated with each other. For example, the SICS-I database contained 19 clinical examination findings, all of which reflected (part of) the haemodynamic status of a patient. In addition to expected confounding factors, the values of the variables can also be confounded by unmeasured factors such as environmental, genetic, or psychological influences. Therefore, we suggest providing an a priori list of potentially confounding variables (both ‘measured’ and ‘unmeasured’) so that the reader is better able to assess the degree of residual confounding. Prelisting all potential variables and the approach to model building should be a main concern, if not the most important issue, in the SAP of an observational study.

Section 6: analysis

Analysis methods

Analysis methods of clinical trials and observational studies differ, yet both study types are open to suspicion of selective reporting when no SAP has been written [32]. Many decisions are needed during the analysis phase of an observational study, and all that can be foreseen should be prespecified. The planned statistical analyses, all covariates, and all underlying considerations need to be described in detail in advance, which can only be done in a SAP. The usually short statistical analysis section of a manuscript does not allow a detailed explanation, nor can it guarantee the prespecified status of the analysis.

Sensitivity and subgroup analyses

The cost- and time-intensive nature of a randomised clinical trial necessitates a strict protocol in which all sensitivity and subgroup analyses are (usually) specified. In observational studies, these additional analyses are seldom specified beforehand. A SAP gives authors the opportunity to demonstrate that their sensitivity and subgroup analyses were prespecified.

Missing data

Observational studies are particularly prone to missing data, but often do not address the mechanism behind the missing values. Complete case analyses are prone to bias when data are not missing completely at random [33, 34]. Tests to identify the patterns and type of missing data, and the statistical methods to handle missing values, should be described in a SAP. Examples include multiple imputation for data missing at random, or best-worst and worst-best case scenarios for data missing not at random [34, 35].
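For a binary outcome, the best-worst and worst-best case scenarios mentioned above can be sketched as follows. The sketch assumes that the outcome is harmful (e.g. mortality) and compares an exposed with an unexposed group: in the best-worst case, all exposed participants with missing data are assumed not to have had the outcome and all unexposed participants with missing data to have had it; the worst-best case assumes the reverse. The class, function, and counts are hypothetical. Multiple imputation, suggested above for data missing at random, requires a model for the missing values and is not shown here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GroupCounts:
    events: int      # observed participants with the (harmful) outcome
    non_events: int  # observed participants without the outcome
    missing: int     # participants whose outcome is missing

    @property
    def total(self) -> int:
        return self.events + self.non_events + self.missing


def missing_data_scenarios(exposed: GroupCounts,
                           unexposed: GroupCounts) -> Dict[str, float]:
    """Risk ratio (exposed vs unexposed) under three assumptions about missing outcomes."""
    complete_case = ((exposed.events / (exposed.events + exposed.non_events))
                     / (unexposed.events / (unexposed.events + unexposed.non_events)))
    # Best-worst: missing exposed -> no outcome; missing unexposed -> outcome.
    best_worst = ((exposed.events / exposed.total)
                  / ((unexposed.events + unexposed.missing) / unexposed.total))
    # Worst-best: missing exposed -> outcome; missing unexposed -> no outcome.
    worst_best = (((exposed.events + exposed.missing) / exposed.total)
                  / (unexposed.events / unexposed.total))
    return {"complete case": complete_case,
            "best-worst": best_worst,
            "worst-best": worst_best}


# Hypothetical cohort with 20 and 10 missing outcomes, respectively.
results = missing_data_scenarios(GroupCounts(30, 170, 20), GroupCounts(40, 160, 10))
for name, rr in results.items():
    print(f"{name}: risk ratio {rr:.2f}")
```

In this hypothetical example the risk ratio ranges from about 0.57 to 1.19 across the two extreme scenarios, so the apparent protective association in the complete case analysis (0.75) could be explained by the missing data.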

Harms

Randomised clinical trials are costly and therefore often limited in size and length of follow-up, so that rare or late harms (e.g. after decades) remain undetected. Observational studies and post-marketing phase IV randomised clinical trials are much more suitable for detecting rare or late harms [35]; the cardiovascular harms of clarithromycin in patients with stable coronary heart disease and of cyclooxygenase-2 (COX-2) inhibitors are good examples [36, 37]. This item only applies to observational studies with a research question focusing on an intervention effect. Our SICS-I cohort, for example, was not designed to study such associations.

Applicability of SAP guidelines developed for randomised clinical trials to observational studies

Of the 32 items proposed by Gamble and colleagues (Table 1) [7], 30 items (94%) were also more or less directly applicable to a SAP for an observational study (Table 2). Some of these 30 items differ between trials and observational studies, mainly from a methodological point of view. We have enclosed our SAP and study protocol as supplementary files for illustrative purposes, so that they may serve as example documents for developing SAPs for other observational studies.

Table 2 Recommended content of statistical analysis plans for observational studies

The remaining two items (6%) were omitted because they are specific to trials: the description of randomisation and the definition of adherence to the intervention.

Discussion

Preregistration of protocols and SAPs for observational studies has been intensely debated [12,13,14,15, 38,39,40,41,42,43,44,45,46,47,48]. Opponents state that preregistration creates the false assumption that data are of high quality, would discourage publication of important accidental findings, and would delay these publications through bureaucratic procedures [38,39,40,41,42,43,44]. Proponents argue that preregistration of protocols and SAPs distinguishes prespecified hypotheses from data dredging expeditions, ensures that methods can be replicated and findings confirmed, and reduces selective outcome reporting and publication bias [45,46,47,48,49]. Our present recommendations show the considerable similarities between SAPs for randomised clinical trials and observational studies and are in line with our previous recommendations to communicate all aspects of randomised clinical trials as well as observational studies publicly and transparently, from protocol to final results [1].

Observational studies are prone to confounding by indication, residual confounding, and flaws in data collection [50]. We argue that publication of a SAP increases the chance that hypotheses are adequately powered and investigated in the appropriate study population, in which all known confounders, mediators, and covariates are also measured [46, 51]. Since the credibility and replicability of findings in observational studies are a concern to many [11,12,13,14,15, 46, 52], the publication of a SAP allows better validation of findings in independent cohorts in an identical methodological and statistical manner. Furthermore, the concern that important findings will remain unpublished seems less serious than the risk of many chance findings being published, creating confusion as researchers chase hypotheses without real substance. For an ‘eye-catching’ finding to remain credible, it still has to be replicated in a methodologically sound study with an a priori hypothesis and adequate statistical power. Irrespective of its potential benefits, publishing a SAP would at least do no harm and may be seen as an independent, transparent determinant of validity.

Conclusions

A protocol and a SAP in the public domain are equally helpful for observational studies and randomised clinical trials [45]. By applying the guideline for the content of SAPs for clinical trials to our observational study, we conclude that more than 90% of the recommended content, which was based on an extensive Delphi survey, suits an observational study as well. Only a few adjustments are needed to guide a SAP for an observational study compared with a SAP for a randomised clinical trial. In the absence of SAP guidelines for observational studies, we think that the current recommended content of SAPs for clinical trials could serve as a structure for SAPs of observational studies.