Data
This study uses data from the National Longitudinal Survey of Youth 1997 (hereafter NLSY97), a panel study conducted by the U.S. Bureau of Labor Statistics. Respondents were selected in 1997 at ages 12 to 17 (born 1980–1984) using a multi-stage stratified random sampling design of areas and housing units (Footnote 1), and were interviewed annually until 2013, with the exception of 2012. The NLSY97 contains an oversample of respondents of African American and Latino descent. When weighted, the NLSY97 is nationally representative. The total sample consists of 8984 respondents. However, we included only those respondents who participated in all waves and for whom at least some information on body height and weight is available at (around) age 28. Most exclusions are due to incomplete information on the timing of key events in the transition to adulthood; only a few respondents were excluded because of missing or invalid height and weight. In all, our analysis is based on N = 4688 cases (47% men, 53% women). We use the sample weights designed specifically for the group of respondents who participated in all waves.
Obesity definition
The NLSY97 contains measures of self-reported height in feet and inches and weight in pounds (lbs). BMI is calculated as $\mathrm{BMI} = 703 \times w / h^2$, where $w$ is weight in pounds and $h$ is height converted to inches. Our main dependent variable is a binary indicator of whether the respondent was obese at age 28; this age was chosen partly because all respondents in the survey had reached at least age 28 by the final wave. If respondents did not report height and weight at age 28, their BMI at age 29 was used, and if this was also missing, their BMI at age 27 was used. In line with common practice [13], respondents were classified as obese when their BMI was 30 or higher. Furthermore, following MacMillan and Furstenberg [29], all BMI scores below 12 or above 50 were considered invalid.
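The coding rules for the dependent variable can be sketched in Python as follows. The sketch covers the BMI formula, the validity range of 12 to 50, the obesity cut-off of 30, and the fallback from age 28 to age 29 and then age 27. The dictionary layout `age -> (weight, height)` and the function names are hypothetical. The text also does not say whether an invalid BMI at age 28 triggers the fallback in the same way a missing one does, so the sketch treats the two cases alike as an assumption.

```python
from typing import Dict, Optional, Tuple

# Cut-offs from the text: BMI below 12 or above 50 is invalid; BMI of 30 or more is obese.
MIN_VALID_BMI = 12.0
MAX_VALID_BMI = 50.0
OBESITY_CUTOFF = 30.0


def bmi(weight_lbs: float, height_inches: float) -> float:
    """BMI from weight in pounds and height in inches: 703 * w / h**2."""
    return 703.0 * weight_lbs / height_inches ** 2


def valid_bmi(weight_lbs: Optional[float], height_inches: Optional[float]) -> Optional[float]:
    """BMI, or None when height/weight is missing or the BMI falls outside [12, 50]."""
    if weight_lbs is None or height_inches is None or height_inches <= 0:
        return None
    value = bmi(weight_lbs, height_inches)
    if value < MIN_VALID_BMI or value > MAX_VALID_BMI:
        return None
    return value


def obese_at_28(measures: Dict[int, Tuple[Optional[float], Optional[float]]]) -> Optional[int]:
    """1 if obese, 0 if not, None if no usable BMI at ages 28, 29, or 27 (checked in that order).

    `measures` maps age -> (weight_lbs, height_inches); this input layout is hypothetical.
    Assumption: an invalid BMI is treated like a missing one and triggers the fallback.
    """
    for age in (28, 29, 27):
        weight, height = measures.get(age, (None, None))
        value = valid_bmi(weight, height)
        if value is not None:
            return int(value >= OBESITY_CUTOFF)
    return None
```

Consider a respondent who is 5 feet 10 inches (70 inches) tall and weighs 220 lbs. Their BMI is 703 × 220 / 4900 ≈ 31.6, so they are classified as obese.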
Family background and control variables
The first NLSY97 wave contains a “Parent Questionnaire”, from which we derived family background characteristics such as parental income, education, and family structure. Parental education was coded as the highest education of the mother or father using five categories: lower than high school, high school, some college, 4-year college or higher, and missing. Parental income refers to the household income reported by one of the parents when the respondent was 12 to 16 years old and was coded in quartiles, with an additional missing category. Family structure is the family structure recorded in 1997, coded in four categories: 1) both biological parents, 2) one biological parent and one step-parent, 3) one biological parent, 4) other (no biological parents). For the respondent, gender (Female) was coded 0 for males and 1 for females, and Race was coded in four categories: 1) white (non-Hispanic), 2) black (non-Hispanic), 3) Hispanic, 4) other (mixed). Finally, two controls were included. First, we control for obesity at the end of adolescence, so that we can examine how career-family sequences during the transition to adulthood affect the probability of becoming obese, rather than the reverse (obesity affecting career-family trajectories). We therefore included Obesity age 17 as a dichotomous variable (0 = not obese, 1 = obese), defined using a BMI cut-off of 28 rather than 30, as previous research has shown that a somewhat lower cut-off more accurately captures obesity at younger ages [38]. Second, Pregnant indicates whether the respondent was pregnant (1) or not (0) at age 28.
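A few of the background codings can be sketched in Python, assuming simplified inputs:

- the highest education of either parent;
- income quartiles with a separate missing category;
- the adolescent obesity indicator with a cut-off of 28.

The function names are hypothetical. The text does not say whether the quartile cut-points were weighted, so the sketch assumes unweighted sample quartiles computed with Python's `statistics.quantiles`.

```python
import statistics
from bisect import bisect_left
from typing import List, Optional, Sequence

# Education categories from the text, ordered from lowest to highest.
EDUCATION_LEVELS = ["lower than high school", "high school", "some college", "4-year college or higher"]


def parental_education(mother: Optional[str], father: Optional[str]) -> str:
    """Highest education of mother or father; 'missing' if neither is reported."""
    ranks = [EDUCATION_LEVELS.index(e) for e in (mother, father) if e is not None]
    return EDUCATION_LEVELS[max(ranks)] if ranks else "missing"


def income_quartiles(incomes: Sequence[Optional[float]]) -> List[str]:
    """Code household income as 'Q1'..'Q4' with a separate 'missing' category.

    Assumption: unweighted sample quartiles; needs at least two observed incomes.
    """
    observed = [x for x in incomes if x is not None]
    cuts = statistics.quantiles(observed, n=4)  # three cut-points
    # bisect_left counts how many cut-points lie strictly below x
    return ["missing" if x is None else f"Q{bisect_left(cuts, x) + 1}" for x in incomes]


def obese_at_17(bmi_value: float, cutoff: float = 28.0) -> int:
    """Adolescent obesity indicator using the lower cut-off of 28 described in the text."""
    return int(bmi_value >= cutoff)
```

Keeping "missing" as an explicit category, rather than dropping those cases, keeps respondents with incomplete parent reports in the analytic sample. This matches the five-category education and quartile-plus-missing income codings described above.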
Table 1 shows the proportions of all the categories of the family background variables in the sample and the percentage of obesity within these categories.
Table 1 Descriptive statistics on family background variables (N = 4688)
Analytical strategy
Multichannel analysis of career-family sequences
In NLSY97, respondents reported the year and month in which specific life-course events occurred. In terms of education, in each wave they were asked whether they had entered or exited an educational institution in the previous year. Respondents were also asked to report the level of education in which they enrolled, i.e., secondary school, 2-year college, or 4-year college (including postgraduate study). Regarding employment, respondents were asked to provide the start and end dates of each job they had in the previous year, including the number of hours worked (Footnote 2). With respect to family formation, respondents were asked whether they had started or ended a marriage or cohabiting relationship in the previous year, as well as the year and month of birth of each of their children. In each wave, respondents reported who was living in their household at that time. Furthermore, respondents were asked the month and year in which they first left and returned to the parental home, if applicable (Footnote 3).
We use NLSY97 information to construct a sequence-type life-course dataset, creating for each individual a sequence of 96 consecutive months between ages 17 and 27 along two dimensions: career and family. Constructing a sequence dataset requires defining the ‘state space’, that is, the set of states individuals can occupy at each time point. The career states cover educational enrollment and employment status. Respondents are classified as enrolled in high school, in a 2-year college, in a 4-year college, or not enrolled. When there are gaps between educational episodes, we consider someone continuously enrolled if the gap is shorter than 3 months. Regarding employment, individuals are classified as employed 35 hours per week or more, employed less than 35 hours per week, or not employed (the last category includes people who are not actively seeking employment, for instance stay-at-home mothers). Combining these educational and employment statuses yields 12 (4 × 3) possible career states.
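The construction of the monthly career channel can be sketched in Python. The sketch assumes that enrollment episodes have already been converted to inclusive month indices (0 to 95) within the observation window, and that one weekly-hours value is available per month. Both input layouts and the state labels are hypothetical. The text does not say which enrollment level fills a bridged gap, so the sketch assumes the gap takes the level of the earlier episode.

```python
from typing import List, Sequence, Tuple

N_MONTHS = 96  # sequence length used in the text
NOT_ENROLLED = "not enrolled"
EDUCATION_STATES = ["high school", "2-year college", "4-year college", NOT_ENROLLED]
EMPLOYMENT_STATES = ["employed 35+ h/week", "employed <35 h/week", "not employed"]
# 4 x 3 = 12 combined career states
CAREER_STATES = [f"{e} | {w}" for e in EDUCATION_STATES for w in EMPLOYMENT_STATES]

# Hypothetical layout: (first_month, last_month, level), inclusive month indices.
Episode = Tuple[int, int, str]


def enrollment_sequence(episodes: List[Episode], max_gap: int = 3) -> List[str]:
    """Monthly enrollment status; gaps shorter than `max_gap` months are bridged."""
    seq = [NOT_ENROLLED] * N_MONTHS
    ordered = sorted(episodes)  # sort by start month
    for start, end, level in ordered:
        for m in range(max(start, 0), min(end, N_MONTHS - 1) + 1):
            seq[m] = level
    # Assumption: a bridged gap takes the level of the earlier episode.
    for (_s1, e1, lvl1), (s2, _e2, _lvl2) in zip(ordered, ordered[1:]):
        gap = s2 - e1 - 1
        if 0 < gap < max_gap:
            for m in range(e1 + 1, s2):
                if 0 <= m < N_MONTHS:
                    seq[m] = lvl1
    return seq


def employment_state(hours_per_week: float) -> str:
    """Classify a month by weekly working hours."""
    if hours_per_week >= 35:
        return EMPLOYMENT_STATES[0]
    if hours_per_week > 0:
        return EMPLOYMENT_STATES[1]
    return EMPLOYMENT_STATES[2]


def career_sequence(episodes: List[Episode], weekly_hours: Sequence[float]) -> List[str]:
    """Combine enrollment and employment into one of the 12 career states per month."""
    enrolled = enrollment_sequence(episodes)
    return [f"{e} | {employment_state(h)}" for e, h in zip(enrolled, weekly_hours)]
```

Suppose high school runs from month 0 to month 9 and college starts in month 12. The two-month gap (months 10 and 11) is then counted as enrollment, whereas a gap of three months or more is left as "not enrolled".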
Family states are defined in terms of living arrangements and parenthood status. Four living arrangements are distinguished: living with parents, living alone/independently, living with a partner (cohabiting), and living with a spouse (married). Within each of these arrangements the respondent either has or has not had a child. Entering parenthood is considered irreversible: once respondents have become parents, they are classified as parents for the rest of the sequence, regardless of whether they co-reside with the child. This yields 8 (4 × 2) possible family states.
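The family channel can be sketched in Python in the same way. It combines a monthly living-arrangement label with a parenthood flag that switches on at the first birth and never switches off. The input layout (one label per month plus the month index of the first birth, or `None`) and the state labels are hypothetical.

```python
from typing import List, Optional

LIVING_ARRANGEMENTS = ["with parents", "alone/independent", "cohabiting", "married"]
# 4 x 2 = 8 family states
FAMILY_STATES = [f"{a}, {p}" for a in LIVING_ARRANGEMENTS for p in ("no child", "parent")]


def family_sequence(living: List[str], first_birth_month: Optional[int]) -> List[str]:
    """Combine monthly living arrangement with an absorbing (irreversible) parenthood status."""
    states = []
    for month, arrangement in enumerate(living):
        # Parenthood holds from the first birth onward, whether or not the child co-resides.
        is_parent = first_birth_month is not None and month >= first_birth_month
        states.append(f"{arrangement}, {'parent' if is_parent else 'no child'}")
    return states
```

Parenthood is absorbing, so a respondent who has a child while cohabiting and later lives alone stays in a "parent" state.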
Multichannel sequence analysis has been developed to compare life-course sequences on multiple dimensions [20, 37], such as career and family. In multichannel sequence analysis, sequences are compared on both dimensions simultaneously: the pathways of two individuals are similar if the timing, occurrence, ordering, and duration of states are similar in both the career and the family sequences. To derive a series of ideal-typical pathways in the transition to adulthood, we compute a dissimilarity (distance) matrix and apply cluster analysis to it. We use Optimal Matching Analysis to measure the dissimilarity of sequences [1]. The measure is based on the substitutions, deletions, and insertions of states needed to transform one sequence into another; the more (and the costlier) the operations required, the less similar the sequences. Because some life-course transitions occur more often than others, we follow the literature in basing substitution costs on the transition rates between states [52]. When the transition rate between two states is low, the substitution cost for these states is high, which leads to a larger distance between sequences.
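The distance computation can be sketched in plain Python. This is a simplified illustration, not TraMineR's implementation. The sketch makes the following assumptions:

- Substitution costs follow the transition-rate formula $2 - p(j \mid i) - p(i \mid j)$, which we understand to be TraMineR's "TRATE" option with its default constant of 2.
- Optimal matching is computed by standard dynamic programming.
- The two channels are combined by summing their substitution costs for each aligned month, which is how we understand multichannel costs to be built in TraMineR's `seqdistmc`.
- The indel cost of 1 per channel is an assumption, because the text does not report it.

```python
from collections import Counter
from itertools import chain
from typing import Callable, Dict, Sequence, Tuple

Costs = Dict[Tuple[str, str], float]


def trate_costs(sequences: Sequence[Sequence[str]], cval: float = 2.0) -> Costs:
    """Substitution costs cval - p(j|i) - p(i|j) from month-to-month transition rates."""
    pairs: Counter = Counter()
    origins: Counter = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            origins[a] += 1
    states = sorted(set(chain.from_iterable(sequences)))

    def rate(i: str, j: str) -> float:
        return pairs[(i, j)] / origins[i] if origins[i] else 0.0

    return {(i, j): 0.0 if i == j else cval - rate(i, j) - rate(j, i)
            for i in states for j in states}


def om_distance(x: Sequence, y: Sequence, sub_cost: Callable, indel: float = 1.0) -> float:
    """Optimal matching distance: minimal total cost of indels and substitutions."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel,      # delete x[i-1]
                          d[i][j - 1] + indel,      # insert y[j-1]
                          d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]))
    return d[n][m]


def multichannel_distance(channels_x: Sequence[Sequence[str]], channels_y: Sequence[Sequence[str]],
                          channel_costs: Sequence[Costs], indel_per_channel: float = 1.0) -> float:
    """Align channels jointly: each month is a tuple of states, costs summed over channels."""
    x = list(zip(*channels_x))  # e.g. [(career_state, family_state), ...]
    y = list(zip(*channels_y))

    def sub(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
        return sum(costs[(ai, bi)] for costs, ai, bi in zip(channel_costs, a, b))

    return om_distance(x, y, sub, indel=indel_per_channel * len(channel_costs))
```

Aligning both channels jointly means that a single insertion or deletion shifts the career and the family sequence together. This is what allows the distance to reflect how the timing of career and family events is combined.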
Multichannel sequence analysis is performed using the TraMineR package in R. Based on the distance matrix resulting from the multichannel Optimal Matching procedure, we use weighted (using NLSY97 weights) hierarchical clustering with Ward’s method to group respondents with similar life sequences. An advantage of Ward’s algorithm is that it tends to produce groups of fairly similar size [3].
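Weighted Ward clustering on a precomputed distance matrix can be sketched in Python with the Lance–Williams update. In this update, each case's survey weight is its initial cluster mass. We understand this to correspond to R's `hclust(..., method = "ward.D", members = weights)`, but the text does not specify the exact routine used. SciPy's `linkage` has no case-weight argument, which is why the update is written out here. The naive $O(n^3)$ loop is meant only for illustration on small inputs.

```python
from typing import Dict, List, Sequence


def weighted_ward(dist: Sequence[Sequence[float]], weights: Sequence[float], k: int) -> List[int]:
    """Agglomerate until k clusters remain; return a label in 0..k-1 for each case.

    Lance-Williams update for Ward's criterion with cluster masses w:
    d(o, i+j) = [(w_i + w_o) d(o,i) + (w_j + w_o) d(o,j) - w_o d(i,j)] / (w_i + w_j + w_o)
    """
    n = len(dist)
    d = [list(map(float, row)) for row in dist]  # working copy, assumed symmetric
    mass = list(map(float, weights))
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    active = set(range(n))
    while len(active) > k:
        # Closest pair of active clusters
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: d[p[0]][p[1]])
        for other in active - {i, j}:
            new = ((mass[i] + mass[other]) * d[other][i]
                   + (mass[j] + mass[other]) * d[other][j]
                   - mass[other] * d[i][j]) / (mass[i] + mass[j] + mass[other])
            d[i][other] = d[other][i] = new
        # Cluster j is absorbed into cluster i
        mass[i] += mass[j]
        members[i].extend(members.pop(j))
        active.remove(j)
    labels = [0] * n
    for label, root in enumerate(sorted(active)):
        for case in members[root]:
            labels[case] = label
    return labels
```

Applied to the optimal matching distances, this function would be called once per candidate number of clusters. The resulting label vectors are the candidate cluster solutions that are compared on model fit.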
The optimal number of clusters is chosen on the basis of model fit as measured by the Akaike Information Criterion (AIC) [4]. We estimate multiple logistic regressions that differ in the number of career-family clusters (based on the different cluster solutions) included as dummy variables, in order to test which set of career-family pathway variables best predicts obesity at age 28. Table 2 shows that the 8-cluster solution yields the lowest AIC and therefore the best model fit; we therefore opt for the 8-cluster solution.
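The AIC comparison can be illustrated in Python for the simplest case: a logit of obesity on the cluster dummies alone, with no covariates. Here the maximum-likelihood fitted probability in each cluster is simply that cluster's (weighted) obesity share, so no iterative fitting is needed. The sketch is a simplification. The models in the text may include further covariates, and the text does not say whether the AIC was computed from a weighted (pseudo-)likelihood. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Optional, Sequence


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def cluster_only_aic(clusters: Sequence[int], obese: Sequence[int],
                     weights: Optional[Sequence[float]] = None) -> float:
    """AIC = 2p - 2 lnL for a logit of obesity on cluster dummies only.

    With an intercept plus (K - 1) cluster dummies the model reproduces each cluster's
    (weighted) obesity share exactly, so the maximum-likelihood fit has a closed form.
    """
    if weights is None:
        weights = [1.0] * len(clusters)
    total: Dict[int, float] = defaultdict(float)
    cases: Dict[int, float] = defaultdict(float)
    for c, y, w in zip(clusters, obese, weights):
        total[c] += w
        cases[c] += w * y
    loglik = 0.0
    for c in total:
        p = cases[c] / total[c]
        loglik += _xlogy(cases[c], p) + _xlogy(total[c] - cases[c], 1.0 - p)
    n_params = len(total)  # intercept + (K - 1) dummies
    return 2 * n_params - 2 * loglik


def best_solution(solutions: Dict[int, Sequence[int]], obese: Sequence[int]) -> int:
    """Number of clusters whose solution gives the lowest AIC."""
    return min(solutions, key=lambda k: cluster_only_aic(solutions[k], obese))
```

The term $2p$ penalizes each extra cluster dummy. A finer solution is therefore preferred only if it raises the log-likelihood by more than one unit per added parameter.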
Table 2 Model fit (AIC) of logistic regressions for different numbers of career-family clusters
Analyzing precursors of obesity
We use binary logistic regression to identify the effects of career-family sequences on the risk of obesity at age 28. In addition to the family background and control variables, dummy variables for the set of career-family sequence clusters during the transition to adulthood are included, indicating whether someone is a member of a particular career-family cluster. The career-family cluster variables are interacted with gender in order to examine differences in the influence of each career-family type between men and women. Weights constructed by the NLSY were used to counter any potential selectivity of the sample.
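The cluster-by-gender interaction terms can be constructed as shown in the Python sketch below. It builds one row of the relevant design-matrix columns, with clusters numbered 1 to K and one cluster omitted as the reference category. The choice of cluster 1 as reference, the column names, and the function name are hypothetical. The family-background and control dummies would be added alongside these columns before the rows are passed to a weighted logit routine.

```python
from typing import Dict


def design_row(cluster: int, female: int, n_clusters: int, reference: int = 1) -> Dict[str, int]:
    """Cluster dummies, Female, and cluster x Female interactions for one respondent.

    Clusters are numbered 1..n_clusters; `reference` is the omitted category (an assumption).
    """
    row = {"Female": female}
    for c in range(1, n_clusters + 1):
        if c == reference:
            continue
        dummy = int(cluster == c)
        row[f"cluster{c}"] = dummy
        row[f"cluster{c}:Female"] = dummy * female
    return row
```

Female is coded 1 for women, so the coefficient on `cluster{c}` gives the association for men relative to the reference cluster. For women, the association is the sum of the `cluster{c}` and `cluster{c}:Female` coefficients.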