Sample
The Millennium Cohort Study (MCS) is an ongoing, multi-disciplinary study that follows the lives of approximately 19,000 children born in the United Kingdom between 2000 and 2002. See Plewis (2007a) and https://cls.ucl.ac.uk/cls-studies/millennium-cohort-study/ for full sampling details, which are summarized here in brief. Families were identified as eligible for participation in the MCS using records of child benefit, a universal social security payment made to all families with children. Families were recruited to the study when the children were 9 months old and were subsequently followed up at age 3 years, 5 years, 7 years, 11 years, 14 years, and 17 years. Trained researchers administered surveys and conducted interviews in family homes at each wave of data collection. Parents answered questions about demographic characteristics (e.g., socioeconomic status indicators) and indicators of child health and well-being (e.g., physical activity, cognitive development, socioemotional well-being). During the later stages of the study, young people themselves also completed self-report questionnaires and took part in assessments.
Participants for the current analysis (N = 17,157) were taken from the MCS, which is a population-based cohort study representative of the United Kingdom. The sex of the participants was approximately equally split (52% male). Participant ethnicity was also diverse (81% White, 10% South Asian, 4% Black, 3% Mixed, and 2% Other). For the analyses reported here, only the first child per family was included and those without any siblings at either age 11 or 14 years were excluded.
Measures
Sibling bullying
At age 11 and 14 years, the young people were asked two questions about sibling bullying: “how often do your brothers or sisters hurt you or pick on you on purpose?” (victimization) and “how often do you hurt or pick on your brothers or sisters on purpose?” (perpetration). Responses were re-coded on to a six-point scale (0 = never, 1 = less often, 2 = every few months, 3 = approximately once a month, 4 = approximately once a week, 5 = most days). The correlation between a single-item scale, such as the one used here, and multi-item scales (e.g., Wolke & Samara, 2004) was calculated in an independent sample (Avon Longitudinal Study of Parents and Children; Boyd et al., 2013), and it was shown to be high (victimization: r = 0.91, n = 6,909, p < 0.01; perpetration: r = 0.85, n = 6,856, p < 0.01). There is also evidence to suggest that the prevalence of sibling aggression victimization and perpetration using single-item scales (e.g., Toseeb et al., 2018) is similar to that when using multi-item scales (e.g., Tippett & Wolke, 2015). Thus, there is good evidence for the validity of the single-item scales.
Self-report positive mental health
Adolescents self-reported two aspects of positive mental health when they were 17 years old.
General wellbeing
The seven-item Warwick-Edinburgh mental wellbeing scale (Tennant et al., 2007) was used to measure general wellbeing in the preceding two weeks. Sample questions were “I’ve been feeling optimistic about the future” and “I’ve been thinking clearly”. Responses to the seven questions were coded on a five-point scale (1 = none of the time, 2 = rarely, 3 = some of the time, 4 = often, 5 = all of the time). These were then summed and scaled in line with scoring guidelines so that higher scores indicated higher levels of wellbeing. The internal reliability for the scale was good (α = 0.83).
Self-esteem
The shortened five-item Rosenberg self-esteem scale (Rosenberg, 1965) was used to measure self-esteem. Sample questions were “on the whole, I am satisfied with myself” and “I am a person of value”. Responses to the five questions were recoded on to a four-point scale (0 = strongly disagree, 1 = disagree, 2 = agree, 3 = strongly agree). These were then summed so that a higher score indicated higher levels of self-esteem. The internal reliability for the scale was excellent (α = 0.91).
Self-report negative mental health
Adolescents completed several well-validated measures of mental health difficulties when they were 17 years old.
Internalizing and externalizing problems
The self-report strengths and difficulties questionnaire (SDQ; Goodman, 1997) was completed by the young person with reference to the preceding six months. Responses were given on a three-point scale (0 = not true, 1 = somewhat true, 2 = certainly true). In line with the scoring guidelines (sdqinfo.org), four five-item subscales were created: emotional problems (e.g., “I get a lot of headaches, stomach-aches, and sickness”), peer problems (e.g., “I would rather be alone than with other people”), conduct problems (e.g., “I get very angry and often lose my temper”), and hyperactivity and inattention (e.g., “I am easily distracted, I find it difficult to concentrate”). In line with scoring guidelines and previous literature (e.g., Winsper et al., 2020), the emotional and peer problems subscales were combined to create an internalizing problems subscale and the conduct and hyperactivity subscales were combined to create an externalizing problems subscale. For all subscales, higher scores indicated more mental health difficulties. The internal reliability for both scales was acceptable (internalizing α = 0.74 and externalizing α = 0.75).
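The combination of subscales described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' scoring code: the function name `sdq_broadband_scores` is hypothetical, and the inputs are assumed to be item responses that have already been coded 0 to 2, with any reverse-scored items already reversed.

```python
def sdq_broadband_scores(emotional, peer, conduct, hyperactivity):
    """Sum four five-item SDQ subscales into two broadband scores.

    Each argument is a list of five item responses coded 0, 1, or 2.
    Assumption: reverse-scored items have already been reversed.
    """
    subscales = {
        "emotional": emotional,
        "peer": peer,
        "conduct": conduct,
        "hyperactivity": hyperactivity,
    }
    totals = {}
    for name, items in subscales.items():
        if len(items) != 5 or any(v not in (0, 1, 2) for v in items):
            raise ValueError(f"{name}: expected five responses coded 0-2")
        totals[name] = sum(items)  # each subscale ranges from 0 to 10

    return {
        # internalizing = emotional problems + peer problems (0-20)
        "internalizing": totals["emotional"] + totals["peer"],
        # externalizing = conduct problems + hyperactivity (0-20)
        "externalizing": totals["conduct"] + totals["hyperactivity"],
    }


scores = sdq_broadband_scores(
    emotional=[2, 1, 0, 0, 1],
    peer=[0, 0, 1, 0, 0],
    conduct=[1, 0, 0, 0, 0],
    hyperactivity=[2, 2, 1, 0, 1],
)
```

Each broadband score therefore ranges from 0 to 20, with higher values indicating more difficulties.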
Psychological distress
The Kessler 6 scale (Kessler et al., 2003) was used to measure non-specific psychological distress. The scale consists of six questions relating to symptoms of depression and anxiety that the young person may have experienced in the preceding 30 days. Responses were re-coded on to a five-point scale (0 = none of the time, 1 = a little of the time, 2 = some of the time, 3 = most of the time, 4 = all of the time). Sample items were “during the last 30 days, about how often did you feel so depressed that nothing could cheer you up?” and “during the last 30 days, about how often did you feel nervous?”. Responses were summed so that higher scores were indicative of higher levels of psychological distress. The internal reliability for the scale was good (α = 0.86).
Self-harm
Young people were asked whether they had hurt themselves on purpose in the preceding year. They were shown six types of self-harming behaviours and asked to respond on a binary scale (0 = no, 1 = yes). The behaviours were: “cut or stabbed yourself”, “burned yourself”, “bruised or pinched yourself”, “taken an overdose of tablets”, “pulled out your hair”, and “hurt yourself in some other way”. Responses were summed so that a higher score indicated higher levels of self-harm. The internal reliability for the scale was good (α = 0.81).
Parent-report negative mental health (internalizing and externalizing problems)
The primary caregivers (mostly the biological mother) completed a number of questionnaires about their child. The parent-report SDQ (Goodman, 1997) was completed by the primary caregiver about their child when the adolescent was 11, 14, and 17 years old. The parent-report version of the SDQ has identical questions to the self-report version described previously except the wording reflects its parent-report nature. Parents were asked to reflect on the preceding six months. As with the self-report version, emotional and peer problems items were combined to create an internalizing problems subscale. The conduct and hyperactivity items were combined to create an externalizing problems subscale. For all subscales, higher scores indicated more symptoms of mental health difficulties. The internal reliability for both scales was at least acceptable (internalizing problems α: 0.76 (11 years), 0.77 (14 years), 0.78 (17 years) and externalizing problems α: 0.81 (11 years), 0.81 (14 years), 0.80 (17 years)).
Covariates
A number of covariates were included in the statistical models. These are described in this section.
Pre-existing mental health difficulties
The parent-report strengths and difficulties questionnaire (SDQ; Goodman, 1997) was completed by the primary caregiver when the child was three years old. The measure and its scoring are described above. The internal reliability of the scale was at least acceptable (internalizing α = 0.61 and externalizing α = 0.78).
Sex
At the first wave of data collection, primary caregivers reported their child’s biological sex (0 = female, 1 = male).
Poverty
Primary caregivers reported income from all sources (e.g., government benefits and employment) when the young person was 11 years old, and this was used to calculate overall household income. The OECD-modified scale was then used to standardize this overall household income (Hagenaars et al., 1994). Families whose standardized income was lower than 60% of the median income level were categorized as being in poverty (0 = not in poverty, 1 = in poverty).
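The poverty indicator can be illustrated with a short sketch. It assumes the standard OECD-modified weights (1.0 for the first adult, 0.5 for each additional household member aged 14 years or over, 0.3 for each child under 14). The function names are hypothetical, and taking the median over the supplied households is an assumption made purely for illustration; the study may have used a different reference median and household composition rules.

```python
from statistics import median


def oecd_modified_weight(n_aged_14_plus, n_children_under_14):
    """OECD-modified equivalence weight for one household."""
    if n_aged_14_plus < 1:
        raise ValueError("a household needs at least one member aged 14+")
    # 1.0 for the first adult, 0.5 per additional member aged 14+,
    # 0.3 per child under 14
    return 1.0 + 0.5 * (n_aged_14_plus - 1) + 0.3 * n_children_under_14


def equivalised_income(income, n_aged_14_plus, n_children_under_14):
    """Household income divided by its equivalence weight."""
    return income / oecd_modified_weight(n_aged_14_plus, n_children_under_14)


def poverty_indicators(equivalised_incomes, threshold=0.6):
    """Return 1 (in poverty) or 0 for each household.

    Assumption: the reference median is taken over the incomes supplied.
    """
    cutoff = threshold * median(equivalised_incomes)
    return [1 if inc < cutoff else 0 for inc in equivalised_incomes]


# Hypothetical equivalised incomes for five households
flags = poverty_indicators([100, 200, 300, 400, 1000])
```

Here the median is 300, so the cut-off is 180 and only the first household is flagged as being in poverty.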
Statistical Analyses
Stata/MP version 17.0 (StataCorp, 2021) was used for data analysis. The analyses reported here were preregistered (https://osf.io/63q45). Some preregistered analyses (research question 4 in the preregistration document) were removed from this study and will be reported in a separate article. In addition, 95% confidence intervals, rather than p-values, were used to make inferences from the statistical models.
Missing data
There was some sample attrition over time. In line with the recommended use of the MCS dataset, the data were assumed to be missing at random (Plewis, 2007b). To maximize power, multiple imputation was used to deal with missing data. The proportions of missing data for each variable are shown in Table 1. The “mi impute” command with “chained” equations was used, which generated 50 imputed datasets. The command fills in missing values for multiple different variables with a set of possible values by using chained equations, a sequence of univariate imputation methods with fully conditional specification of prediction equations. Two imputation models were fitted, the first to test hypotheses 1 and 3 and the other to test hypothesis 2 (a single model was attempted but failed). To account for the application of disproportionate stratification and sample attrition, all estimates were weighted to the population level. Weights were applied according to the MCS data handling guide (Agalioti-Sgompou & Johnson, 2020). Where possible, the “mibeta” command was used to calculate estimated standardized β coefficients.
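Estimates from multiply imputed datasets are conventionally combined using Rubin's rules. The text does not spell out the pooling step, so the sketch below is an assumption about how a single coefficient would be pooled across imputed datasets, not a description of the authors' code. The function name `pool_rubin` is hypothetical, and the sketch ignores survey weights and the degrees-of-freedom adjustment needed for confidence intervals.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset
    variances: its squared standard error in each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")

    pooled = mean(estimates)       # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical estimates from three imputed datasets
est, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what makes the pooled standard error larger than it would be in any single completed dataset, reflecting uncertainty about the missing values.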
Table 1 Missing data and imputed values
In addition to the analyses reported in the main body of the article, a set of sensitivity analyses were carried out. For this additional set of analyses, all models were repeated with only participants with complete data on all four sibling bullying questions. This is reported in the supplementary materials (Tables S1–S3).
Testing hypothesis 1
Mutually exclusive sibling bullying groups were created based on established cut-offs (Wolke & Samara, 2004): victim-only: victimized at least once a week but not perpetrated; bully-only: perpetrated at least once a week but not victimized; bully-victim: both perpetrated and victimized at least once a week; uninvolved: does not meet the criteria for any of the other categories. To test whether sibling bullying at age 11 years is associated with self-report positive and negative mental health at age 17 years, six multiple regression models were fitted: internalizing problems (1), externalizing problems (2), psychological distress (3), self-harm (4), general wellbeing (5), and self-esteem (6). For each model, the predictor was entered as sibling bullying group (uninvolved, victim-only, bully-only, bully-victim) as a dummy variable with the uninvolved group as the reference category. The outcome was entered as one of the previously mentioned measures of positive and negative mental health. Sex, poverty, and pre-existing mental health difficulties were entered as covariates in all models. The models were then re-run each time changing the reference group.
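The grouping rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the cut-off of 4 assumes the six-point coding described under Measures, where 4 = approximately once a week and 5 = most days.

```python
WEEKLY = 4  # assumed cut-off: 4 = approximately once a week on the 0-5 scale

GROUPS = ["uninvolved", "victim-only", "bully-only", "bully-victim"]


def sibling_bullying_group(victimization, perpetration, cutoff=WEEKLY):
    """Assign one of four mutually exclusive sibling bullying groups."""
    victim = victimization >= cutoff
    bully = perpetration >= cutoff
    if victim and bully:
        return "bully-victim"
    if victim:
        return "victim-only"
    if bully:
        return "bully-only"
    return "uninvolved"


def dummy_code(group, reference="uninvolved"):
    """Indicator variables for every group except the reference category."""
    return {g: int(group == g) for g in GROUPS if g != reference}


group = sibling_bullying_group(victimization=5, perpetration=1)
dummies = dummy_code(group)
```

Changing the `reference` argument corresponds to re-running the models with a different reference group, which gives the pairwise comparisons among the three involved groups.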
Testing hypothesis 2
To determine whether there is a dose-response effect of sibling bullying victimization at age 11 and 14 years on self-report positive and negative mental health at age 17 years, six multiple regression models were fitted: internalizing problems (1), externalizing problems (2), psychological distress (3), self-harm (4), general wellbeing (5), and self-esteem (6). For each model, the predictor was entered as the victimization frequency (0 = not bullied at least once per week at age 11 or 14 years [uninvolved], 1 = bullied at least once per week at only one of age 11 or 14 years [transient], 2 = bullied at least once per week at both age 11 and 14 years [repeated]). Initially, the reference category in all models was uninvolved, so the model estimates were for uninvolved vs transient and uninvolved vs repeated. Then, the reference category was changed to transient to allow for a comparison of the transient group to the repeated group. Sex, poverty, and pre-existing mental health difficulties were entered as covariates in all models.
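The victimization frequency variable can be sketched as a count of the waves at which the young person was victimized at least weekly. The function name is hypothetical, the cut-off of 4 assumes the six-point coding described under Measures (4 = approximately once a week), and treating a code of 1 as weekly victimization at exactly one of the two ages is an interpretation of the coding described above.

```python
LABELS = {0: "uninvolved", 1: "transient", 2: "repeated"}


def victimization_frequency(vict_age11, vict_age14, cutoff=4):
    """Count the waves (age 11, age 14) with at least weekly victimization."""
    return int(vict_age11 >= cutoff) + int(vict_age14 >= cutoff)


# Hypothetical responses: bullied most days at 11, never at 14
code = victimization_frequency(vict_age11=5, vict_age14=0)
label = LABELS[code]
```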
Testing hypothesis 3
To determine whether the change in mental health difficulties between age 11 and 17 years was different depending on the type of sibling bullying involvement at age 11 years, two multilevel mixed-effects regression models were fitted. The outcome variable was entered as parent-report internalizing (1) or externalizing problems (2). The predictors in the fixed part of the model were the linear effect of age, sibling bullying group (uninvolved, victim-only, bully-only, bully-victim), entered as a dummy variable with the uninvolved group as the reference category, and the interaction between sibling bullying group and the linear effect of age. Anonymized participant number and the linear effect of age were included in the random part of the model. Sex, poverty, and pre-existing mental health difficulties were entered as covariates in all models. The models were then re-run each time changing the reference group to allow for comparisons of the three bullying groups to each other.
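One way to write the model described above is as a growth model with a random intercept for each participant and a random slope for age. The notation below is illustrative and not taken from the source: $y_{ti}$ is the parent-report score for participant $i$ at wave $t$, $G_{gi}$ are the three dummy variables for the bullying groups (uninvolved as reference), and $\mathbf{x}_i$ holds the covariates. The unstructured covariance of the random effects is an assumption.

```latex
\begin{aligned}
y_{ti} &= \beta_0 + \beta_1\,\mathrm{age}_{ti}
        + \sum_{g=1}^{3} \beta_{2g}\,G_{gi}
        + \sum_{g=1}^{3} \beta_{3g}\,G_{gi}\,\mathrm{age}_{ti}
        + \boldsymbol{\gamma}^{\top}\mathbf{x}_i
        + u_{0i} + u_{1i}\,\mathrm{age}_{ti} + \varepsilon_{ti}, \\
(u_{0i},\, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ti} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

Under this formulation, the interaction coefficients $\beta_{3g}$ capture how the rate of change in difficulties between age 11 and 17 years differs for each bullying group relative to the reference group.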