Introduction

By damaging general health and increasing the risk of several chronic diseases, smoking remains among the leading causes of mortality worldwide1,2,3,4. Despite the progress in smoking reduction made by the World Health Organization’s Framework Convention on Tobacco Control (WHO FCTC)3,5, a recent worldwide analysis predicted that over one billion individuals will remain smokers in 2025 if current smoking trends remain constant6. To reduce smoking effectively, tobacco control policies and cessation support should be based on a comprehensive characterization of a population’s smoking behaviour and the associated underlying factors. For example, the WHO FCTC includes a requirement for the ‘surveillance of the magnitude, patterns, determinants and consequences of tobacco consumption’5, and in the US, the identification and elimination of tobacco-related disparities is listed as one of the four milestones for comprehensive tobacco control7.

Although a number of studies have identified smoking trajectories and assessed the underlying factors8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29, lifelong designs have been scarce. Smoking is commonly initiated in adolescence or early adulthood4,30, and seems to associate with individual and family characteristics in early life29, which is why most previous studies have focused on these periods. However, more comprehensive approaches extending over the entire life course are necessary in order to detect smoking patterns also in later life. Lifelong designs seem relevant as several previous reports have concluded that the duration of smoking (i.e., the number of years smoked over the life course) predicts the risk of chronic obstructive pulmonary disease and several smoking-related cancers more accurately than the intensity of smoking (i.e., the average number of cigarettes smoked per day) or pack-years (i.e., duration multiplied by intensity)31,32,33,34,35. Recent evidence has associated even low-intensity smoking with increased all-cause mortality in the long term36, indicating that accurate characterization of a population’s lifelong smoking patterns is likely to benefit the assessment of subsequent risk of comorbidity and mortality.

In this study, we aim to provide a detailed characterization of the lifelong smoking trajectories of a large unselected Northern Finnish birth cohort population. Using extensive data on the study population’s smoking behaviour, we first profile each individual’s smoking history from the age of 5 to 47 in a year-by-year manner and conduct a latent class trajectory analysis on the population data. We then characterize the identified smoking trajectory classes in terms of sociodemographic and lifestyle variables collected at three time points over the lifelong follow-up of the cohort. Lastly, we perform a longitudinal multivariable comparison between the heaviest smokers and less intense smokers, in an attempt to reveal specific sociodemographic and lifestyle factors that predict non-smoking and discontinued smoking. We expect future initiatives to benefit from our data by exploiting the identified predictors as direct targets of intervention, or as a means of identifying individuals who may benefit from such interventions.

Methods

Study sample

The material of this study stems from the prospective population-based Northern Finland Birth Cohort 1966 (NFBC1966) study37. In 1965–1966, pregnant women who resided in Northern Finland (provinces of Oulu and Lapland) with expected dates of delivery during 1966 were recruited into the cohort. Initially, the study covered 12 231 children, corresponding to 96.3% of all births in the area at the time. Major follow-ups took place in 1980, 1997–1998 and 2012–2014, when the NFBC1966 members were 14, 31 and 46 years old, respectively. Figure 1 provides a timeline of the follow-ups and a summary of the data used in the present study. The present sample comprised individuals who had responded to the smoking questionnaires at all three follow-ups (n = 5797). Exclusions were solely based on missing data.

Figure 1

Timeline of the study.

Smoking behaviour

The NFBC1966 cohort members self-reported their tobacco smoking behaviour at several time points during the lifelong follow-up of the cohort (Fig. 1). In the 31- and 46-year follow-up questionnaires, all participants were asked whether they had ever smoked tobacco during their lives. Tobacco was defined as filter cigarettes, other cigarettes, pipes and cigars. The individuals who responded positively were asked to report the age(s) at which they had started smoking (i.e., starting age) and, in a subsequent question, the age(s) at which they had possibly quit (i.e., quitting age). Starting and quitting ages were reported to the accuracy of one year. For ages of ≤ 31, the values reported in the 31-year questionnaire were the primary source of data. For ages of > 31, or if the 31-year data were missing, we used the 46-year values. Based on the starting and quitting ages, we created 43 binary variables reflecting the annual smoking status of each individual (smoker/non-smoker) from the age of 5 to 47. At each time point t, an individual was considered a ‘smoker’ if the following conditions were fulfilled: age at initiation ≤ t, and age at cessation ≥ t or missing.
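The smoker rule above can be illustrated with a minimal Python sketch. It assumes a single starting age and a single quitting age per person, whereas the questionnaires allowed several; the function name and the use of `None` for missing values are illustrative choices, not part of the original analysis.

```python
from typing import List, Optional

FIRST_AGE, LAST_AGE = 5, 47  # 43 annual time points, as described in the text


def annual_smoking_status(start_age: Optional[int],
                          quit_age: Optional[int]) -> List[int]:
    """Return 43 binary indicators (1 = smoker, 0 = non-smoker) for ages 5..47.

    start_age is None for never-smokers (assumption of this sketch).
    quit_age is None when no quitting age was reported, which the rule
    treats as still smoking.
    """
    status = []
    for t in range(FIRST_AGE, LAST_AGE + 1):
        started = start_age is not None and start_age <= t
        not_yet_quit = quit_age is None or quit_age >= t
        status.append(1 if started and not_yet_quit else 0)
    return status
```

Under this rule, a person who started at 16 and quit at 30 counts as a smoker at both boundary ages, because both comparisons are inclusive.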

Sociodemographic and lifestyle characteristics

As part of the follow-ups at the ages of 14, 31 and 46, the NFBC1966 members were asked about several background characteristics regarding their sociodemographic status and lifestyle. Figure 1 summarizes these data.

Education

Basic and additional education were elicited at the ages of 31 and 46. We took into account the most recent data on each individual. Basic education (which includes the Finnish compulsory education of nine years and an optional three years of high school leading to the matriculation examination) was reported using the following options: ‘(1) Less than nine years of compulsory school, (2) Compulsory school, (3) Completion of matriculation examination.’ Additional education (defined as the highest level of other education completed) was reported from the following choices: ‘(1) None, (2) Occupational course, (3) Vocational school, (4) Other lower-level institute/academy/college, (5) Polytechnic, (6) University, (7) Other, (8) Not yet completed.’ An individual was considered to have ‘low education’ if they had not completed secondary or tertiary education (i.e., matriculation examination, vocational school, polytechnic, or university).

Employment

Participants self-reported their parents’ employment status at the age of 14 and their own employment status at the ages of 31 and 46. At the age of 14, parental employment was reported by completing two statements: ‘My mother is… (1) Staying at home (housewife), (2) Working outside of home, (3) Currently unemployed, (4) On a sick leave, (5) Retired’ and ‘My father is… (1) Working, (2) Unemployed, (3) On a sick leave, (4) Retired.’ The participants were also asked whether they lived with their mother and/or father. If the participant lived in a family with no employed parent(s) (including stay-at-home mothers and permanently retired parents), the family was considered ‘unemployed’.

At the ages of 31 and 46, the participants were asked to respond to the question: ‘Which of the following describes your current employment status best?’ The response options at the age of 31 were: ‘(1) Permanent full-time employee, (2) Temporary full-time employee, (3) Part-time employee, (4) Self-employed, (5) Entrepreneur, (6) Full-time student, (7) Unemployed, (8) Employed/educated through labour market support, (9) Laid off or reduced working hours, (10) Maternity/paternity leave or parental leave, (11) Retired, (12) Other.’ The response options at the age of 46 were: ‘(1) Permanent full-time employee, (2) Permanent part-time employee, (3) Temporary full-time employee, (4) Temporary part-time employee, (5) Full-time self-employed or entrepreneur, (6) Part-time self-employed or entrepreneur, (7) Full-time student, (8) Part-time student, (9) Unemployed for < 6 months, (10) Unemployed for 6–12 months, (11) Unemployed for > 12 months, (12) Employed/educated through labour market support, (13) Laid off or reduced working hours, (14) Maternity/paternity leave or parental leave, (15) Retired, (16) Caring for my own household, (17) Other.’ At both time points, an individual was considered ‘unemployed’ if they were not working or studying full-time or part-time, self-employed or entrepreneur, employed/educated by labour market support, or on parental leave.

Obesity

Participants underwent objective height and weight measurements according to previously described methods as part of routine growth monitoring in childhood and adolescence, and as part of the clinical examinations at the ages of 31 and 46 (ref. 38). At each follow-up point, body mass index (BMI, kg/m²) was calculated as weight (kg) divided by height (m) squared. In accordance with the definitions of the WHO, ‘obesity’ was defined as a crude BMI of > 26.5 kg/m² among 14-year-old boys, > 27.8 kg/m² among 14-year-old girls, and ≥ 30 kg/m² among adult men and women39,40. For 14-year-olds, we used the BMI cut-offs for individuals aged 14.5 years as the data collections were organized over a longer period of time during the year.
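As a brief illustration of the obesity definition, the following Python sketch computes BMI and applies the cut-offs quoted above. The function names and the `"male"`/`"female"` coding are assumptions made for this example only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def is_obese(bmi_value: float, age: int, sex: str) -> bool:
    """Apply the cut-offs stated in the text.

    Age 14: > 26.5 kg/m2 for boys, > 27.8 kg/m2 for girls.
    Adult follow-ups (ages 31 and 46): >= 30 kg/m2 for both sexes.
    The sex coding ("male"/"female") is an assumption of this sketch.
    """
    if age == 14:
        cutoff = 26.5 if sex == "male" else 27.8
        return bmi_value > cutoff
    return bmi_value >= 30.0
```

Note that the adolescent cut-offs are strict inequalities while the adult cut-off is inclusive, which mirrors the wording of the definition.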

Physical activity

Participants self-reported leisure-time physical activity at the ages of 14, 31 and 46 by responding to the following questions. At the age of 14: ‘How often do you participate in sports outside school hours? (1) Daily, (2) Every other day, (3) Twice a week, (4) Once a week, (5) Every other week, (6) Once a month, (7) Generally not at all.’ At the ages of 31 and 46: ‘How often do you participate in brisk physical activity/exercise [defined as causing at least some sweating and breathlessness] during your leisure time? (1) Daily, (2) 4–6 times a week, (3) 2–3 times a week, (4) Once a week, (5) 2–3 times a month, (6) Once a month or less often.’ At each time point, an individual was considered ‘inactive’ if they participated in sports/physical activity less than once a week.
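The dichotomization into ‘inactive’ can be sketched as a lookup over the response codes listed above. This is one possible reading of "less than once a week" applied to the quoted options; the dictionary and function names are hypothetical.

```python
# Response codes that correspond to "less than once a week" in each
# questionnaire, following the option lists quoted in the text.
INACTIVE_CODES = {
    14: {5, 6, 7},  # every other week, once a month, generally not at all
    31: {5, 6},     # 2-3 times a month, once a month or less often
    46: {5, 6},     # same question and options as at age 31
}


def is_inactive(age: int, response_code: int) -> bool:
    """Return True if the response indicates activity less than once a week."""
    return response_code in INACTIVE_CODES[age]
```

For example, a 14-year-old answering "Once a week" (code 4) is classified as active, whereas "Every other week" (code 5) is classified as inactive.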

Alcohol consumption

Participants self-reported alcohol consumption at the ages of 14, 31 and 46 by responding to the following questions. At the age of 14: ‘Do you consume alcohol [defined as beer or any other drink containing alcohol]? (1) Never, (2) Experimented once, (3) Experimented a few times, (4) Yes, on a monthly basis, (5) Yes, on a weekly basis.’ At the ages of 31 and 46: ‘Do you currently consume any alcoholic beverages (e.g., beer, cider, low-alcohol wines, wine or spirits) even occasionally? (1) I have never consumed alcohol, (2) No, I have quit drinking, (3) Yes, less than once a month, (4) Yes, at least once a month.’ At each time point, an individual was considered a ‘regular drinker’ if they consumed alcohol at least once a week.

Substance addiction

Participants self-reported substance-related behaviour (defined as other than alcohol or tobacco) at the ages of 14, 31 and 46. At the age of 14, the participants were asked to complete the statement: ‘Other substances… (1) Never experimented, (2) Experimented once, (3) Experimented several times, (4) I use regularly.’ Regular users were considered to have ‘substance addiction’. At the ages of 31 and 46, the questionnaires elicited directly whether the participant had a ‘substance addiction’ (yes/no).

Statistical analysis

Modelling of lifelong smoking behaviour

We profiled the lifelong smoking behaviour of the sample by means of latent class growth modelling (LCGM). LCGM is a semi-parametric statistical modelling method that aims to reveal latent groups of individuals (i.e., classes) following a distinct pattern of change (i.e., trajectory) over time41. The LCGM analysis was performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA), the PROC TRAJ macro and the logistic LOGIT model for binary data42. First, models with one to six classes were fitted to the data, after which the most adequate model was chosen according to the following measures of model adequacy: (1) Bayesian information criterion and Akaike information criterion (BIC and AIC, respectively; lower values indicate better fit); (2) Posterior membership probability (averages should exceed 0.70); (3) Absolute and relative class sizes, also taking into consideration the subsequent analyses; and (4) Clinical significance of the models41. Once the most suitable model was selected, each participant was assigned to the class for which their posterior membership probability was highest41. As the smoking trajectories of men and women were highly similar in our data (data not shown), both sexes were modelled together in order to obtain equivalent definitions of each smoking trajectory among men and women. This approach was also supported by the findings of a previous study, which detected no difference in the latent class structure between the multi-decadal smoking trajectory models (age 18 to 50) of men and women26.
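To make the class-assignment step and the posterior probability criterion concrete, the following Python sketch assigns each individual to the class with the highest posterior probability and computes the average posterior probability among those assigned to each class. It does not reproduce PROC TRAJ; the function names and the toy probabilities are invented for illustration.

```python
from typing import Dict, List


def assign_classes(posteriors: List[List[float]]) -> List[int]:
    """Assign each individual to the class with the highest posterior probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def average_posterior_by_class(posteriors: List[List[float]]) -> Dict[int, float]:
    """Mean posterior probability of the assigned class, per class.

    The adequacy criterion in the text expects these averages to exceed 0.70.
    """
    assigned = assign_classes(posteriors)
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for row, k in zip(posteriors, assigned):
        sums[k] = sums.get(k, 0.0) + row[k]
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


# Toy posterior probabilities for three individuals and two classes
# (invented values, not study data).
toy = [[0.90, 0.10], [0.20, 0.80], [0.75, 0.25]]
toy_assignment = assign_classes(toy)
toy_averages = average_posterior_by_class(toy)
```

In this toy example, both class averages are above 0.70, so the posterior probability criterion would be met.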

Profiles of smoking trajectory classes and between-class comparisons

In order to further describe the identified smoking trajectory classes, we presented the distributions of all background variables for each class separately. As the variables were dichotomous, frequencies and percentages were presented.

To address the associations between background variables and smoking trajectory class in a longitudinal, multivariable analysis, we used a generalized estimating equations (GEE) approach. An extension to regression-based methods, GEE is able to correct for correlations within data (such as temporal dependencies due to repeated measurements) by means of a working correlation matrix43,44. Here, we used the binary logistic main-effects GEE model with ‘exchangeable’ working correlation matrix structure. Each background variable had its own model, with smoking trajectory class as the main predictor and the other background variables as covariates. Sex, education, and smoking trajectory class were fixed (i.e., time-invariant) variables, whereas all the other variables were considered to be repeated measurements (i.e., records from three time points) and thus nested within individuals. As we aimed to reveal specific factors which predict non-smoking and discontinued smoking, we chose the heaviest smokers’ class to be compared with the other smoking trajectory classes. Exponentiated regression coefficients (i.e., odds ratios, ORs), their 95% Wald confidence intervals (CIs) and the corresponding P values were documented from the data output.
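The reported quantities can be related to a fitted coefficient as follows. This Python sketch shows only the standard conversion of a logistic regression coefficient and its standard error into an OR, a 95% Wald CI and a two-sided Wald P value; it does not fit a GEE model, and the function name is hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_wald(beta: float, se: float) -> Tuple[float, float, float, float]:
    """Return (OR, CI lower, CI upper, P) for a logistic coefficient.

    OR = exp(beta); the Wald CI is exp(beta -/+ 1.96 * SE); the two-sided
    P value comes from the standard normal distribution of beta / SE.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value
```

With an illustrative coefficient of ln 2 and a standard error of 0.1 (values not taken from the study), the OR is 2 and the interval excludes 1, which corresponds to P < 0.05.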

Analysis of representativeness

To address the potential selection bias associated with the long follow-up and consequent attrition, we studied the differences between the present smoking trajectory sample and the rest of the NFBC1966 population. The background variables were compared between the sample and those excluded by means of chi-square tests.
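For a dichotomous background variable, the comparison between the sample and those excluded reduces to a 2 × 2 table. The following Python sketch computes Pearson's chi-square statistic without continuity correction and its P value for one degree of freedom; the statistical package used in the study may apply different options, and the function name is illustrative.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Rows could be sample vs. excluded and columns characteristic present
    vs. absent. No continuity correction is applied. For one degree of
    freedom, the P value equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

A table with identical proportions in both rows yields a statistic of 0 and a P value of 1, whereas larger differences in proportions yield larger statistics and smaller P values.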

Except for the trajectory modelling, we conducted all the statistical analyses using SPSS version 26 (IBM, Armonk, NY, USA). The threshold for statistical significance was set at P = 0.05.

Ethical considerations

The data were pseudonymized by the NFBC1966 data experts prior to analysis. Informed consent was collected at each stage from the participants and/or their legal guardians. The Declaration of Helsinki was followed, and ethical approvals were obtained from the Ethics Committee of the Northern Ostrobothnia Hospital District (12/2003; 94/2011). The datasets generated and analyzed during the current study are not publicly available due to local privacy regulations but are available from the NFBC Project Center for researchers who meet the criteria for accessing confidential data.

Results

Study sample

The study sample consisted of 5797 individuals, of whom 44.0% were men and 56.0% women. The full study population’s annual prevalence of smoking from the age of 5 to 47 is presented in Fig. 2. After the age of 10, the prevalence steeply increased, reaching its maximum of 52.5% at the age of 20, with a mild but steady decrease thereafter. The cumulative prevalence of smoking (i.e., percentage of ever-smokers) in the sample was 60.8%. Supplementary Table 1 shows the comparison of the present sample to those excluded. The drop-outs were characterized by varyingly higher rates of male sex, low education, unemployment, obesity, physical inactivity, regular drinking, and substance addiction than the present sample.

Figure 2

Overall prevalence of smoking from age 5 to 47 in the study population (n = 5797). Supplementary Table 2 presents the annual smoking prevalences in numerical format.

Table 1 Fit statistics from trajectory models with one to six classes.

Smoking trajectory analysis

Of the LCGM models with one to six smoking trajectories (fit statistics listed in Table 1), we selected the six-class model as the most adequate. It had considerably lower BIC and AIC values than the corresponding models with one to five groups, and importantly, was the only model to make a distinction between never-smokers and youth smokers. Figure 3 presents the six identified smoking trajectories. Each of the trajectories was considered to represent a clear, distinct pattern in terms of lifelong smoking behaviour, and the corresponding classes were subsequently named as follows: never-smokers (relative class size 41.0% of the sample, n = 2376), youth smokers (12.6%, n = 730), young adult quitters (10.8%, n = 627), late adult quitters (10.5%, n = 611), late starters (4.3%, n = 252) and lifetime smokers (20.7%, n = 1201).
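The relative class sizes follow directly from the class counts. As a small arithmetic check, this Python sketch recomputes the percentages from the n values reported above; the variable names are illustrative.

```python
# Class counts as reported in the text
CLASS_SIZES = {
    "never-smokers": 2376,
    "youth smokers": 730,
    "young adult quitters": 627,
    "late adult quitters": 611,
    "late starters": 252,
    "lifetime smokers": 1201,
}

total = sum(CLASS_SIZES.values())

# Relative class size in percent, rounded to one decimal place
shares = {name: round(100 * n / total, 1) for name, n in CLASS_SIZES.items()}

# Share of the two classes still smoking at the end of follow-up
current_smokers_share = round(
    100 * (CLASS_SIZES["lifetime smokers"] + CLASS_SIZES["late starters"]) / total, 1
)
```

The counts sum to the full sample of 5797, and the two classes that were still smoking at the end of follow-up (lifetime smokers and late starters) together account for about a quarter of the sample.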

Figure 3

Six distinct trajectories for lifelong smoking behaviour among the study population (n = 5797). Supplementary Table 2 presents the annual smoking prevalences of each class in numerical format.

Profiles of smoking trajectory classes

Background characteristics of the smoking trajectory classes are presented in Table 2. Generally, the smokers’ classes tended to include more individuals who were male, had a low education level, or were unemployed, obese, physically inactive or regular drinkers. Lifetime smokers and late adult quitters (i.e., long-term smokers) showed the highest contrast to never-smokers.

Table 2 Sociodemographic and lifestyle characteristics among the smoking trajectory classes.

Comparison of lifetime smokers to other smoking trajectory classes

To reveal specific sociodemographic and lifestyle factors that discriminate between heavy smokers and less intense smokers, we compared the heaviest smokers’ class (i.e., lifetime smokers) to the other smoking trajectory classes. The corresponding results from multivariable GEE models are presented in Table 3. There was an obvious distinction between lifetime smokers and never-smokers in the GEE models, as male sex, low education, unemployment, physical inactivity and regular drinking were each independently and significantly associated with increased odds of belonging to the lifetime smokers’ class. Youth smoking was associated with female sex, employment and physical activity. Successful adult quitting was associated with male sex and obesity, whereas unemployed and physically inactive individuals had higher odds of belonging to the lifetime smokers’ class. Late starting was associated with employment and physical activity.

Table 3 Multivariable generalized estimating equations (GEE) analysis addressing the association of sociodemographic and lifestyle characteristics with smoking trajectory.

Discussion

Among 5797 Northern Finns followed up for 46 years, this population-based birth cohort study identified six distinct trajectories for smoking behaviour across the life course: never-smokers, youth smokers, young adult quitters, late adult quitters, late starters, and lifetime smokers. Generally, the smokers’ classes tended to include more individuals who were male, had lower socioeconomic status and unhealthier lifestyle. Multivariable comparisons between lifetime smokers and the other smoking trajectory classes identified unemployment and physical inactivity as significant predictors of lifetime smoking relative to any other class. Female sex increased the odds of never-smoking and youth smoking, whereas male sex increased the odds of adult quitting.

According to worldwide predictions, over one billion individuals will remain smokers in 2025 if current smoking trends remain constant6, leading to nearly 500 million smoking-related deaths between 2000 and 2050 (ref. 45). Although the Nordic countries have a relatively low prevalence of smoking on the global scale4, it has been estimated that up to 15% of health care expenditure in high-income countries is attributed to smoking46. Thus, it is clear that zero smoking is the ideal for both the individual and society1. In 2018, 15% of Finnish working-aged men and 13% of women were current smokers; the prevalence figures have clearly decreased over time47. However, the present study showed that 60% of the now middle-aged Northern Finnish population had smoked at some point in their lives, and that 25% remained smokers at midlife, emphasizing the need for a detailed characterization of smoking behaviour and underlying factors specifically among this population. In this study, we were able to exploit an explicit dataset that enabled the assessment of annual smoking status of each individual from the age of 5 to 47.

The present analysis identified six trajectories for lifelong smoking behaviour. One of these represented never-smokers, two current smokers (lifetime smokers and late starters), and three ex-smokers (youth smokers, young adult quitters, and late adult quitters). As reflected in their names, each of the six trajectories represented a clear, distinct smoking pattern. Each trajectory also showed a reasonable class size and high average posterior membership probability, indicating that the final LCGM model was robust and the identified trajectories were credible. Most previous studies have described similar trajectories, depending on the study population and age period modelled8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29; for example, studies focusing on adolescence and early adulthood have commonly presented more detailed trajectories regarding the initiation of smoking but have correspondingly lacked trajectories representing those who quit in later life.

The present analyses regarding the association of smoking trajectory with sociodemographic and lifestyle characteristics give grounds for several remarks. Firstly, our data confirm the previously known coexistence of low socioeconomic status, unhealthy lifestyle, and smoking; our study setting enabled us to demonstrate this in a longitudinal manner from adolescence to midlife. Secondly, our multivariable GEE models, which compared lifetime smokers with other smoking trajectory classes, revealed statistically significant differences between classes. Youth smoking (i.e., smoking limited to late adolescence and early adulthood) was associated with female sex, employment and physical activity. Young and late adult quitting (i.e., successful quitting in adulthood) were positively associated with male sex and negatively associated with unemployment and physical inactivity. Late starting (i.e., starting in adulthood) was associated with employment and physical activity.

In general, similar differences have been observed in previous studies (e.g., association of smoking with sex10,11,14, low education14,27, unemployment27, low socioeconomic status23, and alcohol or substance use9,28), though mostly among adolescents, young adults, or non-general population samples. Moreover, most previous studies have only been able to assess a limited number of background variables, without a longitudinal multivariable approach concerning both smoking behaviour and background variables. In our lifelong approach, the disparities in sociodemographic and lifestyle characteristics between smoking trajectory classes could be observed in a longitudinal manner, i.e., from adolescence to midlife, indicating that they provide means for early detection of individuals at higher risk of starting and continuing smoking. For each smoking trajectory class, we were able to present at least two sociodemographic or lifestyle-related characteristics that statistically differentiate the class from lifetime smokers.

The identified characteristics should serve as primary targets for future interventions and preventive/supportive measures. First, our data highlight the importance of interventions aimed at adolescents aged 10 to 20 years; in four out of five ever-smoker trajectories, the prevalence of smoking exceeded 50% by the age of 16. Young adolescents should be informed (e.g., by parents, teachers, school nurses and sports coaches) about nicotine dependence and the health effects of short-term and long-term smoking, and older adolescents should also be offered cessation support. Second, unemployed individuals constitute a clear target group for interventions, regardless of age. Unemployment security services should actively promote cessation support, and health checks of the unemployed should routinely include enquiry about smoking history and open discussion regarding the health effects of smoking. Third, physical inactivity was associated with lifetime smoking. Inactivity is typically coupled with increased sedentary time, implying that electronic (e.g., mobile application-, internet-, or television-mediated) interventions may prove effective among these individuals. We expect future initiatives to benefit from our data by exploiting the identified predictors as direct targets of intervention, or as a means of identifying individuals who may benefit from such interventions.

The main strength of this study was its long follow-up period, extending over the life course of the NFBC1966 population. As a population-based birth cohort, the NFBC1966 provided the best available estimate of the general population, with extensive data on smoking behaviour, sociodemographics and lifestyle across the life course. Despite the long follow-up, the sample size remained large (n = 5797), favouring the population-based setting of the study. Importantly, the smoking trajectories were based on annual data on smoking from the age of 5 to 47, which enhanced the accuracy of trajectory estimation. A lifelong approach was considered highly valuable in order to detect the potential trajectories that represent a change in smoking behaviour in later life.

There were also limitations to our study. First, it was based on self-reported smoking data, which raises the question as to whether our dataset was somewhat affected by social desirability or some other type of response bias. Further, despite the large study sample (n = 5797), a significant number of those originally born into the cohort (52.6%) dropped out at some point during the follow-up. The fact that the cohort study was initiated in 1966 and the follow-up extended over five decades explains the high drop-out to some extent, but also suggests that our data may be subject to selection bias. Our analysis of representativeness confirmed mild differences between the sample and those excluded, as the drop-outs were characterized by varyingly higher rates of male sex, lower socioeconomic status, and unhealthier lifestyle. While we fully acknowledge that our data are affected by selection bias, we point out that both response bias and selection bias would primarily cause our data to underestimate the prevalence of smoking such that non-smokers would be overestimated and ever-smokers would be underestimated in the current data. Second, our study setting prevented us from addressing causal relationships between smoking and background variables. As such, we reported mere associations between variables without implying the direction of the association. We also used the term ‘predictor’ in a neutral sense to refer to an independent variable in a statistical model. Third, we did not assess the intensity of smoking, because self-reports of average smoking intensity are typically subject to imprecision31, and because previous reports had concluded that the duration of smoking predicts smoking-related comorbidities more accurately than intensity31,32,33,34,35. Importantly, growing evidence has also associated even low-intensity smoking with increased mortality in the long term36, further emphasizing the predictive value of smoking duration over intensity. Fourth, this study addressed only tobacco smoking, defined as filter cigarettes, other cigarettes, pipes and cigars. Passive exposure to tobacco smoke, as well as exposure to other tobacco products such as snus and electronic cigarettes, were omitted. In this currently middle-aged study population, the use of new tobacco products seemed to be minor and was considered to have a minimal effect on lifelong smoking patterns. The population’s lifelong exposure to tobacco smoke is likely to be considerable, but we lacked annual data on passive smoking exposure and therefore decided to focus our approach on active smoking.

Conclusion

In this birth cohort study of 5797 Northern Finns followed up for 46 years, we identified six trajectories for lifelong smoking behaviour: one trajectory representing never-smokers, two representing current smokers, and three representing ex-smokers. Smoking was generally associated with male sex, lower socioeconomic status and unhealthier lifestyle. Detailed between-class comparisons showed that unemployment and physical inactivity were significant predictors of lifetime smoking relative to any other smoking trajectory class. Female sex increased the odds of never-smoking and youth smoking, whereas male sex increased the odds of adult quitting, relative to lifetime smoking. We expect future initiatives to benefit from our data by exploiting the identified predictors as direct targets of intervention, or as a means of identifying individuals who may benefit from such interventions.