Introduction

Health policy on vaping, also known as using e-cigarettes, vapes, vape pens, or electronic nicotine delivery systems (ENDS) [1], should be guided by scientific research. However, the majority of published studies on the topic of vaping are replete with flawed methodologies, misleading discussions, and unreliable conclusions [2]. Misleading literature can misinform well-intended health care practitioners, researchers and policy makers, as well as patients and caregivers. It is, therefore, essential that published literature on the topic of vaping be reviewed to establish whether they are fit for this purpose.

As many journal articles on the topic of vaping or tobacco smoking provide conflicting and unsubstantiated research findings, we undertook a critical appraisal of such research articles. Herein, we delineate our findings, including common flaws in study design, participant recruitment, data analysis, and other methods that undermine the reliability of vaping studies. The purpose of this paper is threefold: (i) to help guide researchers who endeavor to improve the quality of study design and methods; (ii) to prepare readers to critically evaluate the reliability of vaping research and literature; and (iii) to address myths and misconceptions perpetuated by flawed vaping literature.

Methods

We used the Google Scholar search engine (30 November 2020) to obtain the most "popular" journal articles on vaping research. We used the Google algorithm definition of "popular", i.e., the articles most read and most cited in other literature and policy discussions. We searched behavioral human subjects research on causal claims related to vaping. Specifically, we ran the search string: “e-cigarette OR ‘electronic cigarette’ OR vaping OR ‘electronic nicotine delivery system.’" One researcher stepped through the articles in order of search results ranking and identified the ten most frequently cited articles on each of the following topics: (i) the effects of vaping on smoking cessation/reduction; (ii) the effects of vaping on smoking initiation; and (iii) the health outcomes associated with vaping. A second researcher reviewed the ten identified studies for inclusion. Disagreements between the two reviewers were resolved by consensus or by the decision of a third reviewer, and additional studies were stepped through until ten agreed upon studies were identified.

We acknowledge that alternative methods exist to define "popular" and to identify vaping literature, such as a PubMed search. However, we used the Google Scholar algorithm for the purpose of this paper because it better reflects the search methods used by policy makers, advocacy groups, health care providers, and patient populations.

An initial search returned the titles of articles, which, upon manual review, were determined to not truly meet search criteria. Such articles were replaced by continuing through the search. Excluded papers were those that did not meet our intended search criteria, e.g., those that addressed descriptive epidemiology, chemistry and toxicology, acute responses to exposure, and analytic papers that were not empirical.

We conducted a review and critical appraisal of the 24 most popular journal articles on causal claims related to vaping and discuss our findings below. An analysis of each paper includes a discussion of common study design and methodology flaws. In particular, we critically analyzed papers for significant limitations: improper methods; significant flaws in applying potentially useful methods; suboptimal participant recruitment and retention. Specifically, papers meeting inclusion criteria were critically analyzed for the following strengths and limitations:

  1. 1.

    Did the study clearly describe the method of investigating causal pathways? Scientific standards require researchers to specify a causal hypothesis, and describe a study design and data collection methods to investigate that hypothesis. If researchers merely discuss a causal association and present statistical data without establishing causation, we highlight such deficiencies.

  2. 2.

    Were the study design and research methods sufficiently robust to control for confounding factors?

  3. 3.

    Do the results support the stated conclusions, without overstatement?

  4. 4.

    Do the researchers present language or data that is misleading, or fail to acknowledge significant limitations?

While many included papers contained idiosyncratic problems, we did not address such flaws as they fell beyond the scope of our analysis. Instead, we highlight the themes of common flaws that warrant focused attention that will guide future researchers.

One researcher then grouped the studies according to whether they addressed the effects of vaping on smoking cessation and reduction; the effects of vaping on smoking initiation; or of vaping on health outcomes (Table 1) A second researcher critically appraised studies and reported on each study, presenting strengths and limitations (Appendices A, B and C), which were discussed with the other researchers until a consensus was reached on each study.

Table 1 Main characteristics of included studies

Moreover, to avoid any misjudgment of the selected studies, we contacted each corresponding authors and shared the critical appraisal of their papers asking them to identify misinterpretations from our part and to discuss any additional methodological limitations/strengths of their work. In all cases, we received constructive advice that improved the quality of our final analyses and appraisal.

We also analyzed the studies collectively as a body of literature, highlighting common missteps in study design, methodology, and implementation (Table 2). As many included papers demonstrated common research flaws involving confounding factors, causative associations, and the counterfactual analysis [2, 3], these terms are set forth in Fig. 1.

Table 2 Included studies: study design, methodology, and implementation flaws
Fig. 1
figure 1

Terminology related to assessing for causal associations between vaping exposures, behavior, and health outcomes

Results

The 24 journal articles identified by our search methods are listed in Table 1 [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Many papers purporting to be scientific literature contained only subjective information. Exclusions included approximately 28% of search results that addressed cannabinoid vaping (EVALI) or other cannabis use, and approximately 33% of search results that were case studies.

The majority of papers devote their focus to either smoking cessation or initiation, not both; such articles are assigned to one category accordingly, even if the article contains secondary discussion of the other topic (Table 1). One paper included a substantial analysis of both smoking cessation and smoking initiation and is addressed in both sections (Table 1). The remaining papers address the health outcomes associated with vaping (Table 1).

The 24 included papers for all three categories were riddled with flaws as summarized below and discussed in more detail in Table 2 and Appendices A, B, C.

Effects of vaping on smoking cessation and reduction

The ten articles on the effect of vaping on smoking cessation or reduction are described in more detail with regards to strengths and limitations in Appendix A. An individual who smokes cigarettes may engage in vaping as a strategy to aid smoking cessation or reduction. Several research studies purport to assess the effect of vaping on smoking cessation and reduction success. A critical appraisal of these studies revealed numerous flaws. Researchers often evaluate the probability of success for a given quit method, yet mistakenly assume that the number of quit attempts is fixed. In fact, education as to a novel quit strategy may prompt additional quit attempts. Thus, the quit method (e.g., vaping) warrants credit for prompting an additional quit attempt. Research study designs should include a multivariate analysis, and control for confounding, assessing factors such as vaping status, smoking status, cessation and reduction goals, number and method of quit attempts.

Several researchers failed to clearly state the causal pathway they were investigating. For example, if an individual has a successful smoking quit attempt, and would not have successfully quit in the absence of vaping, then vaping caused that cessation. However, other potential causal pathways exist that the researchers did not explore. Consider the individual who would not otherwise have made a quit attempt, yet does so (and succeeds) because the option of vaping motivates the quit attempt. This second pathway includes both intentional quit attempts, and unintentional quitting (dubbed, the “accidental quitter” phenomenon), whereby someone who smokes tries vaping without the intention of switching, but finds it so appealing that they switch.

The researchers also selected flawed inclusion/exclusion criteria. For example, a given population may consist of a many former smokers who have made successful quit attempts by switching to vaping. To conduct a study in such a population, but include only active smokers, researchers will be evaluating only those already less likely to quit by switching (as suggested by the fact that others did, and they did not). Moreover, to exclude former smokers who successfully quit by vaping creates a biased participant population. In our literature review of vaping effects on smoking cessation, many researchers did not account for these trends. The numerous anecdotal reports of those who found vaping to be an effective aid to smoking cessation may inspire future researchers to formulate robust study designs and participant recruitment methodology.

Epidemiological studies assessing population trends may note the incidence and prevalence of vaping and smoking. However, such studies generally lack the specificity needed to establish a causal association between vaping and smoking cessation. Moreover, incidence and prevalence of vaping-related behaviors tend to fluctuate due to a confluence of variables, such as changing technology, marketing, and media coverage. Research on general population trends should include relevant data points in their analysis and not overstate their conclusions.

Effect of vaping on smoking initiation

The effect of vaping on smoking initiation was addressed in 11 of the included papers, with detailed discussion of strengths and limitations in Appendix B.

One risk from vaping investigated by researchers is the possibility that those who initiate vaping are more likely to subsequently initiating smoking. Often dubbed a “gateway effect,” this potential causal association is often asserted as if proven by data, when it is not.

The flaws in studies addressing the “gateway effect” have been discussed at length [28]. The studies we analyzed lacked sound research methods, and as such, could not reliably establish causation or identify a gateway effect. Moreover, health behaviors related vaping and smoking were described with insufficient detail as to the duration, amount, and frequency of the vaping/smoking. This renders participant classification uninformative, and the resulting data unreliable. For example, the phrases “tried vaping” and “was a vaper” may describe two very different levels of vaping exposure, yet these participants may be classified together in a research study. Moreover, researchers should be sufficiently culturally competent to explore causal pathways. For example, “tried vaping, discovered an appreciation for nicotine, and as a result took the opportunity to start smoking when it was presented” is a plausible causal pathway, whereas “became a dedicated vaper and then switched to smoking” might represent someone who would have become a smoker anyway.

Further, the propensity to initiate tobacco use in the absence of vaping is also poorly established, and as such, constitutes a questionable metric. This is particularly troublesome when the claims of a “gateway effect” may be exploited, without support, to create concern regarding other risk taking behaviors, such as illicit drug use. Discussions in health literature should be grounded in data and not be unduly alarmist.

Further, it is important to categorize participants according to nicotine preference: about half the population likes being under the influence of nicotine and half does not. This variation alone guarantees a substantially higher smoking uptake among vapers (and vice versa). This is partially a result of physiology and psychological characteristics, and partially a matter of attitude. Most people who do not use any nicotine product are actively averse to doing so. Thus, people who never vape, smoke, or use any other any tobacco product will inevitably initiate one such product less often than users of tobacco products.

The studies analyzed did not control for confounding or use methods designed to account for heterogeneity among participants. As such, the limitations were significant, and the researchers could not credibly make causal claims. Finally, many studies contained data findings suggesting that vaping behavior may replace would-be smoking. Populations studies reveal trends of increased rates of vaping associated with decreased rates of smoking. However, the role of vaping in preventing smoking initiation has not been fully investigated, and is a meaningful topic for future research.

Epidemiology of smoking, vaping, and health outcomes

The four papers that addressed health outcomes from vaping (Appendix C) also had numerous flaws. For example, researchers attempted to assess the non-acute effects of vaping in a population of former smokers without acknowledging an inherent limitation: the characteristic clinical traits of this population include the consequences of prior smoking, which mask the non-acute effects of vaping. In such a population, it is difficult to determine whether morbidity and mortality outcomes are attributable to vaping or prior smoking. It is a major design flaw to fail to account for current, former and dual use of cigarettes, thereby ignoring that the majority of vapers do so to quit or cut down on cigarette smoking [29,30,31]. Despite up to 70% of e-cigarette users reporting dual use, 32 studies did not routinely account for dual use when investigating risk from vaping, thereby attributing health outcomes to vaping when they may have actually resulted from smoking cigarettes. It is also difficult to design a study and identify potential participants to control for likely confounding factors.

There is ample evidence to suggest that the individual chemical exposures from vaping cause either a fraction of the risk posed by smoking. The plausible range here is an order of magnitude smaller than the variation in the residual health effects from former smoking, which vary based not merely on the existence of former smoking (typically the sole metric), but other factors, e.g., the duration and quantity of former smoking (occasionally measured), time since quitting (occasionally measured), intensity of use and puffing behavior (rarely measured). Classifications of smoking status also lacked granularity. For example, many studies merely classify smoking status generally (e.g., current, former or never) without accounting for duration of smoking, time since quitting, or frequency and quantity of tobacco use.

These flaws were found consistently in each of the four articles analyzed, rendering their conclusions misleading. Long-term prospective studies of appropriately categorized participants would be useful to compare the health outcomes associated with vaping to those of smoking. It is also essential to control for confounding. In conclusion, the use of unrefined definitions and classifications in this population is a serious flaw in study design and methodology.

In Table 2, we provide a summary of our critical appraisal revealing common, preventable flaws, the identification of which may guide future researchers to improve on their ability to design or appraise not only tobacco harm reduction research domain, but any form of epidemiologic research for that matter.

Discussion

A critical review of the included literature revealed numerous flaws, and limitations notably outweighed strengths.

For articles on smoking cessation and reduction, most notably the researchers failed to acknowledge that vaping as a quit strategy may increase the number of quit attempts, thereby increasing the likelihood of success. Further, many studies lacked a robust design with a multivariate analysis that controls for confounding. Moreover, the researchers often failed to articulate a hypothesis or identify a suspected causal pathway. Finally, the researchers used flawed inclusion/exclusion criteria for study participants, such that former smokers who already quit using vaping as a quit strategy are excluded, effectively reducing the number of people who found this method successful.

Many researchers investigating smoking initiation refer to the so-called “gateway effect.” The included papers did not reliably establish a causal association between vaping and smoking initiation. Many papers referred to a so-called "gateway effect" as if supported by data, when it is not. Several such papers had an alarmist tone, lacked meaningful metrics, and lacked relevant descriptions of vaping-related behaviors. As such, the authors' conclusions were unreliable.

Some papers investigating health outcomes as a result of vaping attempted to identify effects of vaping in a cohort of former smokers, with significant pre-existing health conditions that would mask assessment of the outcome of interest.

Study designs should carefully consider the causal pathway under study. Without a longitudinal study design, it is difficult to infer causation, yet cohort studies suffer their own problems when studying vaping due to issues such as the stock-flow problem. Careful attention should be given to account for the numerous confounders. However, in real-life settings, where it is not feasible to conduct research under randomized, blinded, controlled settings, the prospect of residual confounding is very likely.

Overall, the population-based studies lacked granularity and meaningful metrics, and therefore, could not reliably make causal claims.

Some studies contained interesting data points worthy of future research, but lacked generalizability beyond conditions specific to the study. The lessons available from the papers in this review are predominantly negative. There are several papers that are solid workaday building blocks, but their generalizable lesson simple: “don’t overreach.” Most of the included papers offer only errors from which to learn. The questions most researchers address are far more difficult than typical epidemiology questions. We found no studies that employed carefully designed, fit for purpose methods to try to address the particular challenges of answering these difficult research questions. Our analysis provides several specific lessons:

First, none of the included papers proposes a valid hypothesis, and none assesses what associations we should expect to find if truly based on causal pathways. Determining causal association is very complex, particularly in the context of vaping and smoking behavioral research. This research study designs do not account for such complexity.

Second, changing the exposure and result measurements from "vaped at least once in the last 30 days and smoked every day for the last month" to, for example, "vapes daily and smoked at least once in the past week" would be more relevant for public and clinical health purposes.

Third, proposed causal claims must be made precisely, in terms of exposure(s) and outcome(s), and with hypotheses about the various potential causal pathways. The research should then be designed to assess whether the results support the primary hypothesis of interest. Attention to causal pathways would avoid many of the problems noted in the included papers. However, we acknowledge the challenge of addressing multiple causal pathways that would produce a particular association, and the difficulty in distinguishing them, as is the case with the “gateway” studies.

Fourth, it is important to recognize the pathway that vaping inspires additional quit attempts that would not otherwise happen. Overlooking this pathway is a common failure in research design. The stock-flow problem could be avoided by recognizing, for example, that the pathways to smoking plus vaping at a particular point in time include discovering that vaping is not a satisfying complete substitute, while the pathways to being among non-smokers includes discovering that it is.

Fifth, causal pathways are most often considered in research with regard to confounding and identifying which variables should and should not be used as control covariates. The use of causal pathway analysis would also be useful in these areas of research, but was not contemplated in the papers we reviewed.

Finally, using conventional epidemiology methods to assess complicated causal questions is not appropriate for real-world science such as vaping. For example, trying to identify health effects of vaping in a cohort of former smokers is quite challenging, as it is nearly impossible to reliably distinguish what can only be a tiny signal from enormous noise. This is made worse by flawed measures of smoking history and vaping patterns.

One of our aims in preparing this analytical review was to identify common, avoidable methodologic mistakes and to provide simple lessons for conducting more robust research. Perhaps, an important lesson is to identify important relevant research questions. Useful questions are those that are precise, contingent, nuanced, and focused on quantifications that are motivated by externally defined questions rather than what is convenient to do with a dataset.

Another aim was to empower readers of vaping literature to critically analyze the studies, findings, and conclusions of papers they may read. Skepticism as to the validity of conclusions may be warranted, because they are often misleading and unsupported. This is not currently a field where high levels of trust in “the scientific literature” is warranted and where readers would be able to extract reliable information without consideration of the methodological issues pointed out in this review.

Readers, however, should be able to come away from reading the present paper with a better collection of ideas about how to assess what research results really show, and whether the authors' claims are accurate. The findings from the studies included in our review, alongside more general reading, has been formulated in Table 2 into a series of recommendations for future research in the field of vaping and smoking cessation, initiation and health outcomes.

It is also worth noting that due to the slow pace of both data releases and the publishing process, alongside the rapid change in vaping technology, almost all publications in the journal literature are out of date by the time they are published. In an area of study where technology is improving year-to-year, fads and social acceptability have changed multiple times, and dominant messaging can change month-to-month, timing is important. Non-academic literature and pre-prints may then be a useful adjunct to the more formal peer-reviewed journal articles and should be included in the arsenal of evidence base.

The findings of our analysis have implications for researchers, reviewers, and scientific editors; we have distilled out clear recommendations for optimal, if aspirational, study design and analysis strategies for assessing the impact of vaping on the outcomes of interest into a summary table (Table 3). It is the authors’ view that this would further strengthen the paper by providing a useful and readily accessible template for investigators to refer to in planning their own vaping-related studies as well as for readers and policy makers to help them in evaluating the validity of the literature on vaping.

Table 3 Summary table of key study design recommendations

The findings of our review article have implications for policy makers. Our review found that very few studies were sufficiently rigorous to form conclusions on smoking cessation, initiation (the so-called Gateway Effect) and health risks and were not rigorous enough to inform policy. Yet such studies are used for policies (e.g., on tobacco harm reduction regulation). Policies should be informed by better science; this review and two previous systematic reviews on this topic support a more rigorous approach by policy makers in the selection of studies used to inform policy [2, 32, 33].

Conclusion

Our critical appraisal reveals common, preventable flaws, the identification of which may provide guidance to researchers, reviewers, scientific editor, journalists, and policy makers. One striking result of the review is that a large portion of the high-ranking papers came out of US-dominated research institutions whose funders are unsupportive of a tobacco harm reduction agenda.

However, this does not mean there is a trove of good research out there that answers the big questions, but merely did not make the popularity cut. There is not. Notably, papers discussing the effect of vaping on smoking initiation shared common flaws. By contrast, papers addressing the effect of vaping on smoking cessation or reduction demonstrated a broader variety of flaws, yet common themes emerged. Our analysis of common flaws and limitations may guide future researchers to conduct more robust studies and, concomitantly, produce more reliable literature. There are countless sources of good building-block information that can be pieced together to provide knowledge. To provide useful information, research questions should be precise, contingent, nuanced and focused on quantifications that are motivated by externally defined questions. Such research necessitates proactive design, rather than utilizing already existing, but not fit-for-purpose, datasets.