Introduction

Although cigarette smoking among adults in the United States (US) has declined significantly since the release of the 1964 Surgeon General’s Report on Smoking and Health (US Department of Health and Human Services 2014), the use of electronic cigarettes (e-cigarettes) has increased in recent years (King et al. 2015). During 2013–2014, 17.0 % of US adults reported smoking cigarettes every day or some days, while 6.6 % of US adults reported using e-cigarettes every day, some days or rarely, compared with 18.0 and 4.2 % of US adults in 2012–2013, respectively. A rising trend of e-cigarette use has also been found in other countries in Europe (Andler et al. 2016; Filippidis et al. 2016; Goniewicz et al. 2016) and Asia (Lee et al. 2016). Despite a steady increase in e-cigarette use, the long-term health effects of e-cigarettes are not well understood, and research has yielded mixed conclusions regarding the harm of e-cigarettes to public health (Wellman and O’Loughlin 2016).

Advertisement of e-cigarettes on social media platforms could influence social media users to begin using e-cigarette products and might impact the general public’s attitudes toward e-cigarettes and smoking (Dai and Hao 2016a, b). Exposure to e-cigarette advertisements, news, and entertainment media might influence and even reduce public support for e-cigarette regulation (Durkin et al. 2012; Mello et al. 2015; Tan et al. 2015a, b). Over the past several decades, the marketing and promotion of combustible tobacco products have been strictly regulated in order to inform the public about health risks associated with smoking, to protect vulnerable groups from targeted marketing of tobacco products, and to create a healthy breathing environment in public spaces. Ample evidence shows that e-cigarette sales, advertising expenditures, and use have increased exponentially over the last 5 years (Giovenco et al. 2015). Total e-cigarette advertising expenditures through media channels, such as magazines, television (TV), newspapers, and the Internet, rose from $6.4 million in 2011 to an estimated $115 million in 2014 (Centers for Disease Control and Prevention 2016).

Awareness of e-cigarettes has grown significantly, from approximately 4 in 10 adults in 2010 to 8 in 10 adults in 2013 (King et al. 2015). Advertisements often deliver messages that can be misleading, suggesting that e-cigarettes are less harmful than combustible cigarettes and can be used for smoking cessation (Ramamurthi et al. 2015). Repeated exposure to such messages through social media platforms could offset massive efforts and education programs in the public health battle against smoking. Youth, minority populations, and disadvantaged groups in particular could become vulnerable targets of these advertisements (Dai and Hao 2016a).

Recent studies have analyzed e-cigarette commercial Twitter data (Huang et al. 2014; Jo et al. 2015; Kim et al. 2015) and Twitter communications about e-cigarettes and smoking cessation (van der Tempel et al. 2016). These studies provide valuable information regarding e-cigarette brand marketing, price discount promotion strategies, and communication messages on Twitter. However, these studies analyzed e-cigarette commercial Twitter data without geographic information. Mixing tweets from all countries and different regions in the US could lead to inference bias, since tobacco jurisdiction and socio-economic factors vary by geographic region (Dai and Hao 2016b). We hypothesize that the prevalence of e-cigarette advertisements varies by region in the US, which could be affected by smoking prevalence, distribution of vulnerable populations, and perceptions of smoking-related risks. Our study also seeks to confirm whether jurisdictional variations in the regulation of smoking and e-cigarette use could impact e-cigarette advertisements. Understanding these regional variations and socio-economic disparities could help identify communities that are vulnerable to e-cigarette advertisements and assist researchers and policy makers in developing strategies and denormalization campaigns in the public health battles against vaping and smoking in the US specifically, and other nations more generally.

This study links social media data with geographic information systems, social demographic data, and tobacco regulation information at the US state level. The study further examines demographic and socio-economic variables across five categories: age, gender, race, education, and income. State regulations of tobacco were analyzed through regulation impact, cigarette tax, and prevalence of youth smoking.

Methods

Twitter data

Twitter currently has over 400 million active users across the world, and over 500 million “tweets” (short messages of up to 140 characters) are posted on Twitter each day. A previous study has shown that Twitter is used more often by members of minority groups (27 % of African American online adults and 25 % of Hispanic online adults, compared to 21 % of White online adults) and younger generations (37 % of online adults aged 18–29 years, compared to 12 % of online adults aged 54–64 years) (Duggan et al. 2015). Twitter provides public access to its data through an Application Programming Interface (API) that delivers a random sample of approximately 1 % of all tweets in real time.

We used the streamR package for R to retrieve tweet information. The metadata include text, time when the tweet was sent, user’s language, number of favorites and followers, expanded URL if included in the tweet, and the author’s location, along with the geocode of latitude and longitude if the users chose to enable this feature. We further searched keywords including “e-cig”, “e-cigarette”, “e-liquid”, “e-cigarette”, “vape”, “vaping”, “vapor”, and “vaporizer” to collect tweets related to this study. Further text mining techniques were applied to remove duplicate tweets and exclude non-English tweets.
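The keyword search, de-duplication, and language filter described above could be sketched as follows. This is only an illustrative Python sketch, not the pipeline used in the study: the tweet dictionary keys (`text`, `lang`), the case-insensitive substring matching, and the rule that duplicates are tweets with identical text are all assumptions.

```python
import re

# Keywords listed in the text. Matching them case-insensitively as substrings
# is an assumption; the paper does not describe the exact matching rule.
KEYWORDS = ["e-cig", "e-cigarette", "e-liquid", "vape", "vaping", "vapor", "vaporizer"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def filter_tweets(tweets):
    """Keep English tweets that mention a keyword, dropping repeated texts.

    Each tweet is a dict with the hypothetical keys "text" and "lang".
    """
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"].strip()
        if tweet.get("lang") != "en":
            continue  # exclude non-English tweets
        if not PATTERN.search(text):
            continue  # exclude tweets without any keyword
        key = text.lower()
        if key in seen:
            continue  # exclude duplicates (same text, ignoring case)
        seen.add(key)
        kept.append(tweet)
    return kept


# Hypothetical sample records, for illustration only.
sample = [
    {"text": "Buy 2 get 1 free vape pen", "lang": "en"},
    {"text": "buy 2 get 1 free vape pen", "lang": "en"},  # duplicate up to case
    {"text": "Nouveau e-liquid en stock", "lang": "fr"},  # non-English
    {"text": "Lovely weather today", "lang": "en"},       # no keyword
]
filtered = filter_tweets(sample)
print(len(filtered))
```

Substring matching is deliberately permissive here, so “vapor” also captures “vaporizer”; a stricter word-boundary rule would be an equally reasonable choice.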

E-cigarette commercial tweets are messages on Twitter that promote and market e-cigarette products. Some examples are “Win 3 40 ml bottles of Naked Fish e-liquid—http://t.co/NPcSS8Pfxu”, “Mt Baker Vapor brings you the highest quality electronic cigarette e-liquid at the lowest prices. https://t.co/DySY47uQzb”, “buy 2 get 1 free Glass globe Bulb Tank e hookah vape pen atomizer http://t.co/8GpkEBu3WK”, “Buy a vape and get 10 % OFF your order using this special discount code: STONERXPRESS http://t.co/QMG5RPdzbo”. Most of these tweets include a branded promotion message, pricing or discount information, or URLs linking to commercial websites. Commercial tweets were identified using the multi-label Naive Bayes machine learning algorithm, a method that has been used in previous studies to successfully identify commercial tweets on Twitter (Huang et al. 2014; Dai and Hao 2016b).
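To illustrate the general idea behind a Naive Bayes text classifier, a minimal two-class sketch in Python is given below. It is a simplification and not the multi-label classifier used in the study: the tokenizer, the Laplace smoothing, the choice to ignore unseen words, and the tiny training set are assumptions. The commercial examples are quoted from the text above, whereas the organic examples and the test URL are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Collapse every URL into one token, then split into lowercase word tokens.
    text = re.sub(r"https?://\S+", " URLTOKEN ", text)
    return re.findall(r"[a-z0-9%]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    # Commercial examples quoted in the text.
    "Mt Baker Vapor brings you the highest quality electronic cigarette e-liquid at the lowest prices. https://t.co/DySY47uQzb",
    "buy 2 get 1 free Glass globe Bulb Tank e hookah vape pen atomizer http://t.co/8GpkEBu3WK",
    "Buy a vape and get 10 % OFF your order using this special discount code: STONERXPRESS http://t.co/QMG5RPdzbo",
    # Organic examples: hypothetical, written for this sketch.
    "I think vaping in the office is rude",
    "my brother quit smoking and started vaping last year",
    "not sure if e-cigarettes are really safer than smoking",
]
train_labels = ["commercial"] * 3 + ["organic"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Get free e-liquid with your order http://t.co/XXXXXXXX"))
print(model.predict("I think smoking is rude"))
```

With so few training tweets the sketch mostly keys on the presence of a URL and promotional words, which mirrors the features noted above (promotion messages, discounts, and links) but says nothing about the accuracy of the classifier actually used.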

We imputed the geographic locations of tweeters based on their self-reported city, state or country location in the Twitter metadata (Dai and Hao 2016b). With this approach, we were able to identify 38 % of commercial tweets with country information (101,041/267,995) and 94 % of commercial tweets with the state information in the US (72,205/77,202). This approach significantly increases the sample size when analyzing the social media data at the geographic level and enables us to analyze the socio-economic disparities. The sample sizes of commercial tweets with geographic locations are presented in Fig. 1. The prevalence of commercial tweets was calculated using the following formula:
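A rough sketch of how a state might be imputed from a self-reported profile location is shown below. The actual procedure follows Dai and Hao (2016b) and is not reproduced here: the lookup table is deliberately incomplete, and the function name `impute_state` and the parsing rule (split on commas and check the right-most parts first) are assumptions made for illustration.

```python
import re

# A deliberately small, hypothetical lookup; a full table would cover
# all 50 states and the District of Columbia.
STATE_NAMES = {
    "maryland": "MD",
    "nebraska": "NE",
    "california": "CA",
    "kentucky": "KY",
    "new york": "NY",
}
STATE_ABBREVS = set(STATE_NAMES.values())


def impute_state(location):
    """Return a two-letter state code from a free-text location, or None."""
    if not location:
        return None
    parts = [p.strip() for p in location.split(",") if p.strip()]
    # "City, State" is a common pattern, so check the right-most parts first.
    for part in reversed(parts):
        if part.upper() in STATE_ABBREVS:
            return part.upper()
        name = re.sub(r"\s+", " ", part.lower())
        if name in STATE_NAMES:
            return STATE_NAMES[name]
    return None


print(impute_state("Baltimore, MD"))
print(impute_state("Omaha, Nebraska"))
print(impute_state("somewhere over the rainbow"))
```

Locations that match nothing return `None` and would simply be left out of the state-level analysis, which is consistent with only a share of tweets being assigned a location.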

$$\text{Prevalence of commercial tweets per 10,000 persons per day} = \frac{\text{number of commercial tweets in a state}}{\text{state population} \times \text{days of tweet collection} \times 1\,\%} \times 10{,}000,$$

where the state population as of July 1, 2014 is from census data, the tweet collection period is 84 days, and the Twitter API data represent approximately 1 % of all real-time tweets.
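The formula can be written as a small Python function. The function and parameter names are illustrative, and the example state (population 1,000,000 with 840 sampled commercial tweets) is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def prevalence_per_10k(commercial_tweets, state_population, days=84, sample_fraction=0.01):
    """Commercial tweets per 10,000 persons per day.

    Dividing by sample_fraction scales the roughly 1 % API sample
    up to an estimate for all tweets.
    """
    return commercial_tweets / (state_population * days * sample_fraction) * 10_000


# Hypothetical state: 840 sampled commercial tweets, 1,000,000 residents.
# 840 / (1,000,000 x 84 x 0.01) x 10,000 = 10 tweets per 10,000 persons per day.
example = prevalence_per_10k(840, 1_000_000)
print(round(example, 2))
```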

Fig. 1 Study design and sample sizes

State regulation and smoking prevalence data

To evaluate the impact of state regulations and smoking prevalence data in the US, we considered four additional variables (Tax Burden on Tobacco 2014; Bach 2015; Ungar et al. 2015):

  • Tobacco impact score (derived from adult and youth smoking rates and public perception of the risks of smoking; states with higher scores have more effective tobacco control);

  • State cigarette tax rate (state cigarette tax per pack as of October 1, 2015);

  • Adult smoking rate;

  • Youth smoking rate.

Socio-economic data

The American Community Survey (ACS) is a statistical survey conducted by the US Census Bureau and it is sent to approximately 3.5 million addresses each year. The survey gathers information related to demographic, social and economic status from US residents. We selected key variables from the 2014 ACS to investigate the potential socio-economic impacts on e-cigarette commercials. We examined the demographic and socio-economic variables in five categories: age, gender, race, education, and income. These selected variables are:

  • Persons under 18 years, percent, July 1, 2014;

  • Female persons, percent, July 1, 2014;

  • Black or African American alone, percent, July 1, 2014;

  • Bachelor’s degree or higher, percent of persons aged 25 years+, 2009–2013;

  • Per capita income in past 12 months (in 2013 dollars), 2009–2013.

Statistical methods

In the univariate analysis, the association of each variable with the prevalence of e-cigarette commercial tweets was evaluated using the Spearman correlation. We chose this nonparametric test in order to prevent bias from extreme values. Spearman correlation coefficients along with p-values are reported. In the scatter plots with regression lines, the shaded area represents 95 % confidence limits for mean predicted values. One US state with extremely high prevalence of commercial tweets (Maryland: 53.7) was removed for better visualization. Because the Spearman correlation is a robust nonparametric method, results are consistent with and without the extreme value from Maryland. Since state-level regulation, smoking prevalence, and socio-economic status are correlated, we performed multivariate analysis using a general linear model. Statistical analyses were performed using SAS 9.4 (Cary, NC), and a p-value <0.05 was considered statistically significant.
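The robustness of the Spearman correlation to extreme values can be illustrated with a short Python sketch. The analysis itself was run in SAS; the code below is only a plain-Python illustration that computes the coefficient (not the p-value) as the Pearson correlation of average ranks, and the data vectors are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores and prevalences; the last prevalence is an extreme value.
scores = [10, 20, 30, 40, 50]
with_outlier = [0.2, 0.5, 0.9, 3.5, 53.7]
without_outlier = [0.2, 0.5, 0.9, 3.5, 4.0]

print(spearman(scores, with_outlier), spearman(scores, without_outlier))
print(pearson(scores, with_outlier), pearson(scores, without_outlier))
```

Because only the ordering of the values enters the calculation, replacing the extreme value with a moderate one leaves the Spearman coefficient unchanged, whereas the Pearson coefficient on the raw values shifts noticeably.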

Results

Summary of commercial tweets

A total of 757,167 tweets from July 23 to October 14, 2015 were collected. Tweets that were irrelevant to e-cigarettes were removed (126,012) from further analysis. Among 631,155 tweets related to e-cigarettes, 42.5 % (267,995) were classified as commercial tweets and the remaining 57.5 % (363,160) were classified as organic tweets that expressed personal opinions about e-cigarettes. The potential reach of commercial tweets, calculated by summing the total number of followers for each tweet, was enormous (8,304,957 from approximately 1 % of Twitter API feed).

Commercial tweets were sent by 94,573 users, and each user sent out an average of 2.8 tweets. The top ten users sent out 22.6 % of total commercial tweets, which suggests that a small number of highly active users might use an automatic process to tweet on a bulk level. 72.1 % of commercial tweets included a URL.
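The counts reported in this subsection are internally consistent, as the following quick Python check shows. All figures are taken from the text above; only the variable names are introduced here.

```python
total_collected = 757_167
irrelevant = 126_012
commercial = 267_995
organic = 363_160
commercial_users = 94_573

related = total_collected - irrelevant          # tweets related to e-cigarettes
commercial_share = 100 * commercial / related   # percent classified as commercial
organic_share = 100 * organic / related         # percent classified as organic
tweets_per_user = commercial / commercial_users # mean commercial tweets per user

print(related)                     # 631155
print(round(commercial_share, 1))  # 42.5
print(round(organic_share, 1))     # 57.5
print(round(tweets_per_user, 1))   # 2.8
```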

Prevalence of commercial tweets

The prevalence of commercial tweets varied dramatically by state (see note in Table 1). The top five states with the highest prevalence of e-cigarette commercial tweets were Maryland (53.7), District of Columbia (6.8), Tennessee (4.7), Alabama (4.1), and New York (3.5); the five states with the lowest e-cigarette commercial rates were Wyoming (0.14), Mississippi (0.16), West Virginia (0.34), Montana (0.35), and Arkansas (0.44).

Table 1 Tobacco regulation and socio-economic status among US states 2015

States with higher tobacco control impact scores had significantly higher prevalence of e-cigarette commercials (r = 0.54, p < 0.0001) (Fig. 2a). In Table 1, states were divided into quartiles with prevalence ranging from low to high. The top quartile of states had more than three times the number of e-cigarette commercials as the bottom quartile of states [for instance, Maryland (53.7), District of Columbia (6.81), and Tennessee (4.71) versus Wyoming (0.14), Mississippi (0.16), and West Virginia (0.34)]. The tobacco impact score increased along with the prevalence of e-cigarette commercials (15.7 in the 1st quartile states, 20.4 in the 2nd quartile states, 34.3 in the 3rd quartile states, and 34.3 in the 4th quartile states). We also observed that higher prevalence of e-cigarette commercials was associated with lower state youth smoking rates (r = −0.39, p = 0.005) (Fig. 2b).

Fig. 2 Prevalence of e-cigarette commercial tweets in association with state-level tobacco control (a) and youth smoking rate (b), United States, 2015. The state names corresponding to the state abbreviations are listed in the footnote of Table 1

E-cigarette commercial tweets were associated with states with larger minority populations, especially African American populations (r = 0.34, p = 0.01, Fig. 3a). The percentage of African Americans in a state’s population rose with the prevalence of e-cigarette commercials in that state (6.6 % in the 1st quartile states, 11.7 % in the 2nd quartile states, 12.6 % in the 3rd quartile states, and 16.1 % in the 4th quartile states, Table 1).

Fig. 3 Prevalence of e-cigarette commercial tweets in association with socio-economic status, United States, 2015. The state names corresponding to the state abbreviations are listed in the footnote of Table 1

States with higher rates of college education had a higher prevalence of commercial e-cigarette tweets (r = 0.40, p = 0.003, Fig. 3b). This finding is consistent with the previous finding that Twitter users as a group tend to be more highly educated than the general population (Duggan et al. 2015). States with higher per capita income were associated with higher prevalence of e-cigarette commercials (r = 0.40, p = 0.004), suggesting that increases in e-cigarette advertising are aimed at expanding market share among states with higher incomes.

We performed a multivariate analysis to predict the prevalence of commercial tweets, taking all explanatory variables into account (Table 2). The results show that state tobacco control impact is significantly associated with the prevalence of commercial tweets (β = 0.03 ± 0.01, p = 0.02) after adjusting for cigarette tax, youth smoking prevalence, and other socio-economic factors. There is no significant association between cigarette tax and prevalence of e-cigarette commercials.

Table 2 Multivariate analysis of prevalence of e-cigarette commercial tweets, United States, 2015

Discussion

Coinciding with the significant growth in e-cigarette sales and use over the past several years, e-cigarette makers continue to spend heavily on advertising to promote their products, especially on social media sites and other online channels. We observed an increase in e-cigarette commercial tweets on Twitter in 2015 as compared to data collected in May–June 2012 (Huang et al. 2014). Because Twitter is a social network platform with a large population of young users, the increased prevalence of e-cigarette commercial tweets exposes this population to e-cigarette advertising and potentially to distorted information about the health effects of vaping. Past studies of the effects of tobacco advertising on youth smoking have shown that youth are vulnerable to tobacco companies’ advertising promotions and more receptive to emerging tobacco products (Davis et al. 2008). Similar results have been found for teens exposed to e-cigarette advertising (Farrelly et al. 2015; Dai and Hao 2016a, b). Younger users of Twitter and other social media, therefore, might be especially vulnerable to the pervasive influence of e-cigarette advertising. Given this heightened vulnerability, parents and policy makers have a duty to protect younger users from the deleterious effects such influences may have on their health. Our findings speak in favor of increased federal and state-level regulation in the US of e-cigarette advertising content on social media. The findings also provide some basis for concern that other nations might experience a similar increase in e-cigarette advertising to targeted populations through social media.

In 2014, the World Health Organization (WHO) called for regulations on e-cigarette advertising, promotion, and sponsorship to prevent the targeting of youth, non-smokers, or individuals who do not currently use nicotine (WHO 2014). While non-binding, the WHO’s recommendation implies that it views e-cigarettes as potentially harmful to some of these populations. In August 2016, the US Food and Drug Administration began prohibiting the sale of e-cigarettes to youth under 18 years of age, and began to roll out regulations on e-cigarette marketing and sales, which currently include prohibition of modified risk claims in advertising and will eventually require the display of health warnings on e-cigarette packaging. Around the same time (May 2016), e-cigarettes came under new regulation by the revised European Union (EU) Tobacco Products Directive, which prohibits cross-border advertising and promotion of e-cigarettes (European Commission 2016). Despite domestic and international efforts to regulate e-cigarette marketing and sales, concern remains that e-cigarette commercials will continue to proliferate and their potentially misleading content will negatively affect public health, particularly through social media. The addictiveness of nicotine, the potential harm from chemicals in e-cigarettes, incidents of liquid nicotine poisoning, and possible device hazards have been commonly reported as risks in the literature on e-cigarette and vapor product use (Hajek et al. 2014). However, these risks tend to be downplayed or altogether omitted in e-cigarette commercial tweets, and it is not clear how the aforementioned regulations would apply to advertising through social media. The increased prevalence of social media commercials over the past few years, therefore, portends the wide dissemination of misinformation about the risks of e-cigarettes and vapor products. The potentially adverse impact this will have on individual and public health should be taken into consideration when designing regulatory policies on e-cigarette advertising.

Our study is the first of its kind to examine the regional variation of e-cigarette commercial tweets in the US. Understanding regional variations and socio-economic disparities could help develop denormalization campaigns to lessen the negative impact of e-cigarette advertisements (Durkin et al. 2009). We found that e-cigarette advertisements had higher prevalence among states with better tobacco control (r = 0.54, p < 0.0001). For instance, in early 2015 the California Department of Health declared that e-cigarettes are mass-marketed products with serious health consequences. Accordingly, the California Department of Health launched new television and digital ads as a part of an anti-vaping campaign to inform and warn the public about health risks associated with e-cigarette use (Sifferlin 2015). Our results show that the prevalence of commercial tweets in California is more than four times higher than the prevalence in Kentucky (2.53 in California vs. 0.60 in Kentucky). California is ranked first in the US for tobacco control, with a low adult smoking rate (12.5 %), low youth smoking rate (10.7 %), high cigarette tax ($0.87), and high FY2015 funding for state tobacco control programs ($58.9 million), while Kentucky is ranked last for tobacco control, with a high adult smoking rate (26.5 %), high youth smoking rate (12.1 %), low cigarette tax ($0.60), and low FY2015 funding for state tobacco control programs ($2.5 million) (Bach 2015).

The findings from our study are consistent with a recent search query surveillance study that found a positive association between the popularity of electronic nicotine delivery systems and stronger tobacco control (Ayers et al. 2016). The high prevalence of commercial tweets in states with stronger tobacco regulation may be part of a marketing strategy by tobacco companies and e-cigarette manufacturers that is designed to circumvent smoking regulations and restrictions. We suggest this for three reasons. First, in states with better tobacco control, use of conventional cigarettes has declined dramatically, and this might prompt tobacco companies to promote and expand market share in e-cigarettes and vapor products. Second, states with better tobacco control often have strict regulations on tobacco use in public places. However, e-cigarette use in public spaces typically is not subject to the same restrictions as conventional tobacco products. Third, states with better tobacco control often have strong anti-tobacco campaigns and high tobacco tax to motivate smoking cessation. E-cigarettes are often considered to be less harmful than combustible cigarettes and therefore a better alternative among smokers. All these factors may have contributed to the surge in e-cigarette advertising on Twitter among states with better tobacco control.

There is some evidence that e-cigarette use increases the likelihood of eventual use of conventional tobacco products, especially among youth (Primack et al. 2015; Leventhal et al. 2015; Wills et al. 2016). The addictiveness of nicotine and similarity between e-cigarettes and combustible cigarettes could motivate e-cigarette users to try other tobacco products. Thus, the high prevalence of e-cigarette commercial tweets in states that have established effective anti-smoking campaigns could set back decades of advances made by public health initiatives to curb smoking.

Using social media, geographic information mapping, and the American Community Survey, we examined the socio-economic impact of the prevalence of e-cigarette commercials on Twitter. E-cigarette commercial tweets had higher prevalence among states with larger minority populations, especially African American populations. As shown in Table 1, the percentage of African Americans in a state’s population increased along with the prevalence of e-cigarette commercials (6.6 % in the 1st quartile states, 11.7 % in the 2nd quartile states, 12.6 % in the 3rd quartile states, and 16.1 % in the 4th quartile states). Socio-economic disparities in advertising and the use of conventional cigarettes have been identified in the literature (Hiscock et al. 2012), and many efforts have been made to eliminate these disparities (Durkin et al. 2009; Hill et al. 2014). Traditionally, disadvantaged populations composed of lower income and larger minority groups have higher smoking prevalence, higher exposures to secondhand smoke, and lower cessation rates. The coincidence of the adult smoking rate hitting a record low in 2014 with the rapid expansion of the e-cigarette market raises the possibility that e-cigarette commercials will target these populations in order to grow market share and circumvent smoking restrictions. Thus, communication and promotion disparities may further widen health inequalities due to increased e-cigarette and tobacco use over time.

There are some limitations to our findings. First, our research is based on a sample of tweets extracted with selected search keywords related to e-cigarettes. A previous study has shown that Twitter use has grown significantly among specific demographic groups, including Whites (Non-Hispanic), Hispanics, younger generations, and the college-educated (Duggan et al. 2015). Given the sample biases inherent in the Twitter data, the findings from this study should be interpreted with an appropriate degree of caution, and they may reflect the perceptions of certain populations only. Second, we used the self-reported metadata to extract the geo-location information. However, self-reported information might be inaccurate with respect to users’ current locations. Continuous improvement of geo-coding and information extraction techniques could help mitigate these limitations in future studies.

Despite these limitations, our study underscores the need for consideration of mass and social media campaigns to disseminate accurate information about the potential health effects of e-cigarette use and to promote regulatory policies for marketing, distribution, and open-space use of e-cigarettes and vapor products. Additional efforts might include health education about potential harms from vaping, dissemination of research findings about health risks from vaping, and anti-vaping campaigns to guide public perception, knowledge, and attitudes about e-cigarettes. Further, our study suggests that greater action is needed at the federal and state levels to introduce and enact uniform regulatory policies aimed at protecting vulnerable populations from misinformation and harm due to current e-cigarette advertising trends on social media.