Background

Specific research questions are ideally answered through tailor-made studies. Although these ad hoc studies provide more accurate and up-to-date data, designing a completely new project is not always feasible [1, 2]. Clinical and administrative databases used for billing and other fiscal purposes (i.e. “secondary data”) are a valuable alternative to ad hoc methods (i.e. “primary data”), since it is easier and less costly to reuse existing information than to collect it anew [3]. The potential of secondary automated databases for observational epidemiological studies is widely acknowledged; however, their use is not without challenges, and many quality requirements and methodological pitfalls must be considered [4].

Meta-analysis is one of the most valuable tools for assessing drug effects, as it may lead to the best evidence possible in epidemiology [5]. Consequently, its use for making relevant clinical and regulatory decisions on the safety and efficacy of drugs is increasing dramatically [6]. Heterogeneity in a given meta-analysis needs to be carefully described by analyzing the factors that may be responsible for it [7]. In this regard, the results of a recent study [8] show that exploring the origin of the data (primary vs secondary) as a potential cause of heterogeneity may change the conclusions of a meta-analysis because of effect modification [9]. Thus, considering the source of data as a variable in sensitivity and subgroup analyses, or meta-regression analyses, seems crucial to avoid misleading conclusions in meta-analyses of drug effects.

Given the evidence noted [8, 9], we surveyed published meta-analyses in a selection of high-impact journals over a 6-year period, to assess to what extent the origin of the data, either primary or secondary, is explored as a source of heterogeneity in meta-analyses of observational studies.

Methods

Meta-analysis selection and data collection process

General and internal medicine journals with an impact factor > 15 according to the Web of Science were included in the survey [10]. This method has been widely used to assess quality as well as publication trends in medical journals [11,12,13]. The rationale is that meta-analyses published in high impact journals: (1) are likely to be rigorously performed and reported due to the exhaustive editorial process [12, 14]; and (2) in general, exert a higher influence on medical practice due to the major role played by these journals in the dissemination of new medical evidence [14, 15]. We searched MEDLINE in May 2018 using the search terms “meta-analysis” as publication type and “drug” in any field, for articles published between January 1, 2012 and May 7, 2018 in the New England Journal of Medicine (NEJM), Lancet, Journal of the American Medical Association (JAMA), British Medical Journal (BMJ), JAMA Internal Medicine (JAMA Intern Med), Annals of Internal Medicine (Ann Intern Med), and Nature Reviews Disease Primers (Nat Rev Dis Primers).
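As an illustration, the search described above could be written as a PubMed query string along the following lines. The exact query string is not reported here, so the choice of field tags, the abbreviated journal titles and the helper name `build_query` are assumptions made for this sketch, not the query that was actually run.

```python
# Hypothetical reconstruction of the search; the original query string is not reported.
JOURNALS = [
    "N Engl J Med",
    "Lancet",
    "JAMA",
    "BMJ",
    "JAMA Intern Med",
    "Ann Intern Med",
    "Nat Rev Dis Primers",
]


def build_query(journals, start="2012/01/01", end="2018/05/07"):
    """Combine publication type, free-text term, journal and date filters."""
    journal_clause = " OR ".join(f'"{j}"[Journal]' for j in journals)
    return (
        '"meta-analysis"[Publication Type] AND drug[All Fields] '
        f"AND ({journal_clause}) "
        f'AND ("{start}"[Date - Publication] : "{end}"[Date - Publication])'
    )


query = build_query(JOURNALS)
print(query)
```

Keeping the journal list separate from the query template makes it straightforward to rerun the same search for a different set of journals or a different period.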

Two investigators (GP-R, FR) independently assessed publications for eligibility. Abstracts were screened and, if deemed potentially relevant, full-text articles were retrieved. Articles were excluded if they met any of the following conditions: (1) they were not a meta-analysis of published studies; (2) no drug effects were evaluated; (3) only randomized clinical trials were included in the meta-analysis (since our interest was in observational studies); or (4) fewer than two observational studies were included in the meta-analysis (since a pooled measure cannot be calculated from a single study). When a meta-analysis included both observational studies and clinical trials, only the observational studies were considered.
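The four exclusion criteria can be read as an ordered screening rule. The sketch below is only one way of encoding them; the `Article` record, its field names and the function `exclusion_reason` are hypothetical and were not part of the survey, where screening was performed manually by two investigators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    # Hypothetical fields mirroring the four exclusion criteria.
    is_meta_analysis_of_published_studies: bool
    evaluates_drug_effects: bool
    n_observational_studies: int
    n_randomized_trials: int


def exclusion_reason(a: Article) -> Optional[str]:
    """Return the first exclusion criterion met, or None if the article is eligible."""
    if not a.is_meta_analysis_of_published_studies:
        return "not a meta-analysis of published studies"
    if not a.evaluates_drug_effects:
        return "no drug effects evaluated"
    if a.n_observational_studies == 0 and a.n_randomized_trials > 0:
        return "only randomized clinical trials included"
    if a.n_observational_studies < 2:
        return "fewer than two observational studies"
    return None


print(exclusion_reason(Article(True, True, 5, 3)))  # eligible: mixed designs are kept
print(exclusion_reason(Article(True, True, 0, 4)))
```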

A data extraction form was developed in advance to extract information from the articles. Two investigators (GP-R, FR) independently extracted and recorded the information and resolved discrepancies by referring to the original report. If necessary, a third author (AF) was asked to resolve disagreements between the investigators.

When available we extracted the following data from each eligible meta-analysis: first author, publication year, journal, drug(s) exposure and outcome(s); number of individual studies included in the meta-analysis based on each type of data source used (primary vs secondary), for both exposure and outcome assessment; and exposure- and outcome-related variables included in sensitivity, subgroup or meta-regression analyses. We extracted data directly from the tables, figures, text, and supplementary material of the meta-analyses, not from the individual studies.

Assessment of exposure and outcome

We considered as “primary data” the information on drug exposure collected directly by the researchers through interviews (personal or by telephone) or self-administered questionnaires. The origin of the data was also considered primary when objective diagnostic methods were used to determine drug exposure (e.g. blood test). “Secondary data” are data that were originally collected for purposes other than the study at hand and that were included in databases on drug prescription (e.g. prescription registers, medical records/charts) or dispensing (e.g. computerized pharmacy records, insurance claims databases). For the outcome assessment, we considered the data primary when the outcome was supported by objective confirmation (e.g. individual ad hoc medical diagnosis, lab test or imaging results). These criteria are based on those commonly used in risk-of-bias assessment for observational studies [16,17,18,19].
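The exposure definitions above amount to a lookup from the collection method to a source category, followed by a summary across the individual studies of a meta-analysis. The following sketch assumes a small, hypothetical vocabulary of method labels paraphrased from the examples in the text; the actual extraction relied on the investigators' judgement rather than on string matching.

```python
# Method labels are hypothetical and paraphrase the examples given in the text.
PRIMARY_EXPOSURE = {
    "personal interview",
    "telephone interview",
    "self-administered questionnaire",
    "blood test",
}
SECONDARY_EXPOSURE = {
    "prescription register",
    "medical record",
    "pharmacy record",
    "insurance claims database",
}


def classify_exposure_source(method: str) -> str:
    """Map one study's exposure ascertainment method to a source category."""
    m = method.strip().lower()
    if m in PRIMARY_EXPOSURE:
        return "primary"
    if m in SECONDARY_EXPOSURE:
        return "secondary"
    return "not reported"


def classify_meta_analysis(methods) -> str:
    """Summarize the sources across studies; unrecognized methods are ignored."""
    kinds = {classify_exposure_source(m) for m in methods} - {"not reported"}
    if not kinds:
        return "not reported"
    if len(kinds) == 2:
        return "mixed"
    return kinds.pop()


print(classify_meta_analysis(["Blood test", "insurance claims database"]))
```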

Results

MEDLINE search results yielded 217 articles from the major general medical journals (3 from NEJM, 46 from Lancet, 26 from JAMA, 85 from BMJ, 19 from JAMA Intern Med, 38 from Ann Intern Med, and 0 from Nat Rev Dis Primers) (see Fig. 1). A total of 194 articles were excluded (see list of excluded articles with reasons for exclusion in Additional file 1) leaving 23 articles to be examined [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. General characteristics of the 23 included meta-analyses are outlined in Table 1.

Fig. 1 Flow diagram of literature search results

Table 1 Characteristics of the 23 included meta-analyses

Source of exposure and outcome data

Table 2 summarizes the evidence regarding the type of data source included in each meta-analysis, according to the information presented in the data extraction tables of the article. The information was evaluated taking the study design into account. Only eight meta-analyses [21, 24, 26, 31, 32, 34, 38, 41] reported the source of data, three of them [31, 34, 38] reporting mixed sources for both the exposure and the outcome assessment. Five meta-analyses [21, 24, 26, 32, 41] reported only secondary sources for the exposure assessment. Three of these [21, 24, 41] also reported only secondary sources for the outcome assessment, whereas the other two reported only primary sources [26] and mixed sources [32] for the outcome assessment.

Table 2 Reporting of the data source in the data extraction tables of the included meta-analyses

Source of data in the analysis of heterogeneity

All but two [20, 42] of the meta-analyses performed subgroup and/or sensitivity analyses. Three of them [23, 34, 36] used the method of outcome assessment as a stratification variable (type of diagnostic assay used for Clostridium difficile infection, method of venous thrombosis diagnosis confirmation, and type of scale for psychosis symptoms assessment, respectively), but only the second [34] referred to the origin of the data. Only five meta-analyses [22, 28, 33, 35, 39] included meta-regression analyses to describe heterogeneity, none of which considered the source of data as an explanatory variable. Other findings on the inclusion of the data source as a variable in the analysis of heterogeneity are presented in Table 3.

Table 3 Inclusion of the data source as a variable in the analysis of heterogeneity of the included meta-analyses

We finally assessed if the influence of the data origin on the conclusions of the meta-analyses was discussed by their respective authors. We found that only four meta-analyses [21, 31, 32, 34] noted limitations derived from the type of data source used.

Discussion

The findings of this research suggest that the origin of the data, either primary or secondary, is underexplored as a source of heterogeneity and an effect modifier in meta-analyses of drug effects published in general medicine journals with high impact. Few meta-analyses reported the source of data and only one [34] of the articles included in our survey compared and discussed the meta-analysis results considering the different sources of data.

Although it is usual to consider the design of the individual studies (i.e. case-control, cohort or experimental studies) in the analysis of the heterogeneity of a meta-analysis [43, 44], the type of data source (primary vs secondary) is still rarely used for this purpose [9, 45]. In fact, the current reporting guidelines for meta-analyses, such as MOOSE (Meta-analysis Of Observational Studies in Epidemiology) [18] or PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) [46, 47], do not recommend that authors specifically report the origin of the data. This is probably due to the close relationship that exists between the study design and the type of data source used, despite the fact that each criterion has its own basis. Performing this additional analysis is a simple task that involves no additional cost. Failure to do so may lead to diverging conclusions [8].
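To give an idea of how little is involved, the sketch below pools study-level estimates within each data-source stratum and computes a between-subgroup Q statistic. It assumes a fixed-effect, inverse-variance model, which is only one of several possible choices, and the log risk ratios and standard errors are made-up numbers used purely for illustration; none of them come from the surveyed meta-analyses.

```python
import math


def pool_fixed(log_effects, ses):
    """Inverse-variance fixed-effect pooled log effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def q_between(subgroups):
    """Q statistic for differences between subgroup pooled estimates.

    subgroups maps a label to (log_effects, standard_errors).
    Returns (Q, degrees of freedom); Q is referred to a chi-square distribution.
    """
    pooled = {k: pool_fixed(*v) for k, v in subgroups.items()}
    weights = {k: 1.0 / se ** 2 for k, (_, se) in pooled.items()}
    overall = sum(weights[k] * pooled[k][0] for k in pooled) / sum(weights.values())
    q = sum(weights[k] * (pooled[k][0] - overall) ** 2 for k in pooled)
    return q, len(pooled) - 1


# Made-up log risk ratios and standard errors, for illustration only.
studies = {
    "primary": ([math.log(2.0), math.log(2.4)], [0.20, 0.25]),
    "secondary": ([math.log(1.2), math.log(1.4)], [0.15, 0.30]),
}

for label, (effects, ses) in studies.items():
    est, se = pool_fixed(effects, ses)
    print(label, round(math.exp(est), 2))

q, df = q_between(studies)
print("Q between subgroups:", round(q, 2), "on", df, "degree(s) of freedom")
```

A random-effects model or a meta-regression with the data source as a covariate would follow the same logic, with the source of each study as the only additional piece of information required.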

Conclusions about the effects of a drug that are derived from studies based exclusively on data from secondary sources may be questionable because, among other reasons, no information is collected on the consumption of over-the-counter drugs (i.e. drugs that individuals can buy without a prescription) [48] and/or on out-of-pocket expenses for prescription drugs (i.e. costs that individuals pay out of their own cash reserves) [49]. In the health care and insurance context, out-of-pocket expenses usually refer to deductibles, co-payments or co-insurance. Figure 2 shows the model that we propose to describe the relationship between the different data records according to their origin, including the possible loss of information (which can only be captured through primary research).

Fig. 2 Conceptual model of individual data recording. * Never dispensed. Absence of dispensing of successive prescriptions (or self-medication) among patients with primary adherence, or inadequate secondary adherence

Failure to take these situations into account may lead to exposure measurement bias [48, 49]. Consumption of a drug may be underestimated when only prescription data are used as a secondary source without also considering unregistered consumption, such as over-the-counter consumption (e.g. oral contraceptives [34, 50]), which may only be available from a primary database. Underestimation may also occur when dispensing data collected for billing purposes (reimbursement) are used for clinical research and out-of-pocket expenses are not considered (see Fig. 2). The portion of the medical bill that the insurance company does not cover, and that individuals must pay themselves, is unlikely to be recorded. Data on the sale of over-the-counter drugs will not be available in this scenario either.

The reverse situation may also occur: consumption may be overestimated when only prescription data are used and the prescribed drug is not dispensed by the pharmacist, or when dispensing data are used and the drug is not actually consumed by the patient. Primary non-adherence occurs when the patient does not pick up the medication after the first prescription, whereas secondary non-adherence refers to the absence of dispensing of successive prescriptions among patients with primary adherence, or to inadequate secondary adherence (i.e. ≥20% of time without adequate medication) [51] (see Fig. 2). In some diseases medication adherence is very low [52,53,54,55], with percentages of primary non-adherence (never dispensed) that exceed 30% [56]. It should be noted that the impact of non-adherence varies from medication to medication. Therefore, it must be defined and measured in the context of a particular therapy [57].
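The distinction between primary and secondary non-adherence can be made concrete with dispensing records. The sketch below is a simplified reading of the definitions above: it assumes that each dispensing is described only by its days of supply, that supplies do not overlap, and that uncovered time is measured over a fixed observation window. The function name and its parameters are hypothetical.

```python
def classify_adherence(days_supplied, observation_days, max_uncovered=0.20):
    """Classify a patient from dispensing data.

    days_supplied: days of supply for each dispensing (empty if never dispensed).
    observation_days: length of the follow-up window in days.
    max_uncovered: share of time without medication at or above which
        secondary adherence is considered inadequate (20% in the text).
    """
    if observation_days <= 0:
        raise ValueError("observation_days must be positive")
    if not days_supplied:
        # Prescription issued but never picked up.
        return "primary non-adherence"
    # Simplifying assumption: supplies are consecutive and never overlap.
    covered = min(sum(days_supplied), observation_days)
    uncovered = observation_days - covered
    if uncovered >= max_uncovered * observation_days:
        return "secondary non-adherence"
    return "adherent"


print(classify_adherence([], 360))
print(classify_adherence([30] * 9, 360))
print(classify_adherence([30] * 12, 360))
```

Note that a classification of this kind still relies on dispensing as a proxy for consumption, which is precisely the limitation discussed in this section.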

Moreover, failing to take into consideration the portion of consumption due to over-the-counter drugs and/or out-of-pocket expenses may lead to confounding, as that variable may be related to socio-economic level and/or to access to the health system [58], which are independent risk factors for adverse outcomes of some medications (e.g. myocardial infarction [21, 28, 30, 41]). Given the presence of high-deductible health plans and the high co-insurance rate for some drugs, cost-sharing may deter clinically vulnerable patients from initiating essential medications, thus negatively affecting patient adherence [59, 60].

Outcome misclassification may also give rise to measurement bias and heterogeneity [61]. This occurs, for example, in the meta-analysis that evaluates the relationship between combined oral contraceptives and the risk of venous thrombosis [34]. In the studies without objective confirmation of the outcome, women were misclassified regardless of their use of contraceptives. This led to a non-differential misclassification that may have underestimated the drug–outcome relationship, especially when third-generation progestogens are analysed: risk ratio (RR) for primary data = 6.2 (95% confidence interval (CI) 5.2–7.4), RR for secondary data = 3.0 (95% CI 1.7–5.4) [34].
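One simple way to gauge whether the two subgroup estimates quoted above differ by more than chance is a test of interaction on the log scale. The sketch below is not an analysis reported in [34]; it assumes that the two estimates are independent, that the confidence intervals are symmetric on the log scale, and that a normal approximation holds. The function names are hypothetical.

```python
import math


def log_se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of a log ratio recovered from its 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def compare_ratios(rr_a, ci_a, rr_b, ci_b):
    """Ratio of two independent risk ratios with a two-sided z-test."""
    diff = math.log(rr_a) - math.log(rr_b)
    se_diff = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation
    return math.exp(diff), z, p


# Subgroup estimates quoted in the text for third-generation progestogens [34].
ratio, z, p = compare_ratios(6.2, (5.2, 7.4), 3.0, (1.7, 5.4))
print(f"ratio of RRs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

The ratio of risk ratios summarizes how much larger the association is in the primary-data stratum, while the z statistic indicates whether that difference is compatible with sampling variation alone.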

Medical records are often considered the best information source for outcome variables, but they have important limitations in the recording of medications taken by patients [62]. Dispensing records, in contrast, provide more detailed data for measuring drug exposure, but they do not record over-the-counter or out-of-pocket drug consumption at the individual level [48, 49] and offer unreliable data on outcome variables [62, 63].

Limitations

The first limitation of this research is that its findings may not apply to journals not included in our survey, such as journals with a low impact factor. Despite the widespread use of the impact factor metric [64], it has inherent weaknesses [65, 66]. However, meta-analyses published in high impact general medicine journals are likely to be the most rigorously performed and reported, owing to the greater resources and procedures available to these journals [12, 14]. We would therefore expect the overall reporting quality of articles published in other, lesser-known journals to be similar. Another limitation is the restricted search period. Given that the methodology of published meta-analyses has generally tended to improve over time [67, 68], we find no reason to suspect that our unfavourable conclusions would have been different for the period before 2012. Finally, although it exceeds the objective of this research, we were unable to reanalyse the included meta-analyses stratifying by the type of data source. Our study design restricts the conclusions to the published data of the meta-analyses, and either these data were insufficiently reported or the number of individual studies in each stratum was too small to calculate a pooled measure (see Table 2).

Conclusions

Owing to automated capture of data on drug prescription and dispensing that are used for billing and other administration purposes, as well as to the implementation of electronic medical records, secondary databases have generated enormous possibilities. However, neither their limitations, nor the risk of bias that they pose should be overlooked [69]. Thus, researchers should consider the link between administrative databases and medical records, as well as the advisability of combining secondary and primary data in order to minimize the occurrence of biases due to the use of any of these databases.

No source of heterogeneity in a meta-analysis should ever be considered alone but always as part of an interconnected set of potential questions to be addressed. In particular, the origin of the data, either primary or secondary, is insufficiently explored as a source of heterogeneity in meta-analyses of drug effects, even in those published in high impact general medicine journals. Thus, we believe that authors should systematically include the source of data as an additional variable in subgroup and sensitivity analyses, or meta-regression analyses, and discuss its influence on the meta-analysis results. Likewise, reviewers, editors and future guidelines should also consider the origin of the data as a potential cause of heterogeneity in meta-analyses of observational studies that include both primary and secondary data. Failure to do this may lead to misleading conclusions, with negative effects on clinical and regulatory decisions.