Introduction

In the United States (U.S.), nearly one quarter of annual breast cancer (BC) cases occur in women under 50 years of age, and the incidence is increasing [1, 2]. The etiology of BC varies by age [3, 4] and is poorly understood in young-onset BC [3, 5,6,7,8]. Breast tumors are also now recognized to have different histopathologic and molecular characteristics with heterogeneous etiology, prognosis, and treatment [9,10,11,12]. Tumors in young women are more likely to present at a later stage, have a worse prognosis, and be hormone receptor-negative (HR-) [13,14,15]. Non-Hispanic White (NHW) and non-Hispanic Black (NHB) women have the highest incidences of BC in the U.S. [2], and racial and socioeconomic inequities in BC persist [16,17,18].

Racial inequities exist in the U.S. in overall BC mortality and incidence, particularly in younger women, and there are unequal distributions of tumor subtypes. Overall BC mortality was 40% higher in NHB compared to NHW women during 2013–2017 [17] and this inequity is particularly pronounced among women < 50 years of age, where mortality was 82% higher in NHB compared to NHW women in 2018 [19]. Though overall incidence of BC among NHB women has historically been lower than NHW women, rates are now nearly equal [2], and among the youngest women (aged < 40 years) incidence rates have consistently been higher among Black women [2, 20]. Among women < 50 years of age, NHB women also had a 90% higher incidence of the most aggressive HR-/HER2- (i.e., triple-negative (TNBC)) tumors compared to NHW women in 2012–2016 [2]. Studies examining racial residential segregation have observed that among Black women, both a lower [21] and higher [22] proportion of Black residents in census tracts is associated with a higher odds of TNBC. Everyday experiences of discrimination have also been associated with increased incidence of BC among Black women, particularly among those aged < 50 years [23], potentially contributing to an explanation for observed patterns of racial residential segregation and TNBC [22].

Socioeconomic inequities in BC mortality and incidence also exist. Poorer women have historically had lower mortality from BC at all ages [18]; however, mortality from BC has steadily increased since 1950 among women residing in disadvantaged census tracts and decreased among women in affluent tracts, such that, by 2013, BC mortality in the most disadvantaged tracts was 6% higher than in the most affluent tracts [18]. The incidence of BC overall has also increased more rapidly among women residing in the most disadvantaged counties than among women in the most affluent counties: from 1981–1990 to 2001–2010, incidence increased by 15% in the most disadvantaged and by only 9% in the most affluent counties [24]. Black and White women residing in the most disadvantaged counties (> 20% poverty) also had a higher prevalence of HR- BC relative to women residing in wealthier counties (< 10% poverty) in 2004–2007 [25]; this is most pronounced for NHB compared to NHW women < 50 years old (HR-/HR+ ratio = 1.51, 95% confidence interval (CI): 1.20, 1.90) [25]. Similar patterns are seen at the census-tract level: women residing in tracts with intermediate and low compared to high socioeconomic status index had relative risk ratios for TNBC of 1.81 (95% CI: 1.20, 2.71) and 1.95 (95% CI: 1.27, 2.99), respectively, in 2005–2017 [21].

Few modifiable factors have been identified to inform BC prevention strategies [26], particularly in young women [9, 27,28,29,30,31] and by tumor type [9, 13], or to explain racial and socioeconomic inequities in BC incidence [32,33,34]. We conducted a population-based case–control study of BC risk among NHB and NHW women aged < 50 years old from diverse socioeconomic backgrounds in the US: The Young Women’s Health History Study (YWHHS). Our research is informed by an eco-social theory of health, which situates health outcomes—particularly those between groups—within a complex socio-historical context; eco-social theory seeks to identify the pathways through which that context is embodied [35, 36]. Further, we recognize racism is a potent social determinant that continues to regulate differences in exposures to socioeconomic and other opportunities by race, thereby contributing to racial health inequities in the U.S. [37, 38]. We hypothesize that socio-cultural factors related to race and socioeconomic position determine exposures over the life-course (e.g., reproductive and energy balance factors) that modify biology and, in turn, risk for young-onset BC tumor types (Fig. 1) [22, 36, 38,39,40,41,42,43,44]. In this paper, we document details of the YWHHS study design, life-course measures collected, data collection methods, response and cooperation rates, and provide a description of our final study population.

Fig. 1

YWHHS conceptual framework: socio-historical context, life-course reproductive and energy balance factors, and breast cancer risk among young non-Hispanic Black and White women

Methods

Overall study objectives

The primary objectives of the YWHHS were to provide insight into modifiable early life and life-course factors associated with young-onset (< 50 years) BC risk and to understand racial and socioeconomic inequities in BC risk in the U.S. [40, 44,45,46,47]. We are investigating (1) the association between early life and life-course factors and risk for BC overall and by tumor subtypes among young NHB and NHW women [9, 27,28,29,30,31,32], and (2) the potentially modifying effects of the socio-historic context of race/ethnicity (hereafter “race”) and life-course socioeconomic position (SEP) on BC risk. We have also (3) created a bio-repository of blood (or saliva) and breast tumor tissue for current and future study of the contribution of biomarkers, gene-environment interactions, and gene expression to BC risk in young women.

Overall study design

BC cases diagnosed between 2010 and 2015 were identified from the metropolitan Detroit (Oakland, Wayne, and Macomb counties) and Los Angeles County Surveillance, Epidemiology and End Results (SEER) registries. Controls were identified through area-based sampling from the 2010 Census and matched to cases by study site, age, and race. Primary data collected included: (1) an in-person computer-assisted personal interview (CAPI) conducted with a life history calendar, (2) anthropometric measurements, (3) blood collection (or saliva when blood was not available) and a related questionnaire, (4) SEER tumor type information, including ER, PR, and HER2 status, and (5) breast tumor tissue collected from participants’ BC surgeries. Additional collected data included: (6) an interviewer-completed built environment survey of participants’ neighborhoods, (7) a survey completed by participants’ primary childhood caregiver, and (8) childhood photos of body size. We also requested permission to obtain (9) information from the health department(s) where participants gave birth and (10) where they were born, and (11) most recent mammogram reports from healthcare providers. Participation in the main study questionnaire was necessary for enrollment; all other study components were optional. This study protocol was approved by the Institutional Review Boards at the University of Wisconsin-Milwaukee (UWM); Michigan State University (MSU); Wayne State University (WSU); the Michigan Department of Community Health; University of Southern California (USC); and the California Committee for the Protection of Human Subjects (CPHS). For the Medical College of Wisconsin (MCW), IRB oversight was deferred to UWM. The California Cancer Registry also approved the study.

Study organization

The YWHHS Coordinating Center (initially hosted at MSU, moved to UWM in 2014) was responsible for study design, development, and oversight of the study tracking system. Westat, a research services corporation, and study collaborators developed the control sampling design, oversaw identification and recruitment of control participants, and created final study sample weights. Final recruitment, in-person interviews, and biospecimen collection were conducted at two field sites: Los Angeles County (at USC) and metropolitan Detroit (at WSU). A community advisory panel was assembled and consulted about data collection materials and study methodologies.

Eligibility criteria (see Table 1)

Table 1 Eligibility criteria for cases of breast cancer and controls, Young Women’s Health History Study

Study tracking system

A centralized computer system that tracked all corresponding study data and biospecimens was adapted and managed for YWHHS by the USC Cancer Research Informatics Core (CRIC).

Ascertainment, sampling, recruitment, and screening

Ascertainment, sampling, recruitment, and screening activities for cases and controls are outlined in Fig. 2.

Fig. 2

Control and case sampling, eligibility, and recruitment: Young Women’s Health History Study

Cases

Potentially eligible cases were identified by the Metropolitan Detroit Cancer Surveillance System (MDCSS) SEER registry and the LA County Cancer Surveillance Program (CSP) SEER registry. For both sites, cases were identified through rapid case ascertainment (RCA), which aims to identify cases within 3–6 months after diagnosis.

Case sampling.

Due to budgetary constraints, we sampled from among eligible NHW women aged 45–49 years. Given the paucity of studies among NHB women, the youngest women (< 45 years of age), and women diagnosed with estrogen receptor-negative (ER-) tumors, we retained all NHB women diagnosed at 20–49 years of age and all NHW women diagnosed at 20–44 years of age, and oversampled women with ER- tumors among NHW women aged 45–49 years. Thus, all eligible NHB cases 20–49 years of age and NHW cases 20–44 years of age were included, together with a sample of NHW cases aged 45–49 years (n = 829 of 2,527 Detroit; n = 883 of 2,782 LA), selected as follows: between 09/01/2010 and 08/31/2012, 30.5% of all NHW cases aged 45–49 years; between 08/31/2012 and 08/31/2015, 84.5% of ER- cases and 40.8% of ER+ cases.
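The sampling fractions above imply a selection probability for each eligible case, and the inverse of that probability is the usual starting point for a base sampling weight. The following is a minimal sketch of that logic, not the study's actual procedure: the function names, the use of diagnosis date to assign the sampling period, and the assignment of the shared 08/31/2012 boundary to the first period are all illustrative assumptions.

```python
from datetime import date

# Sampling periods as reported in the text for NHW cases aged 45-49 years.
PERIOD1_START = date(2010, 9, 1)
PERIOD1_END = date(2012, 8, 31)  # assumption: boundary date belongs to period 1
PERIOD2_END = date(2015, 8, 31)


def case_sampling_probability(race: str, age: int, er_negative: bool,
                              dx_date: date) -> float:
    """Probability that an eligible case was retained in the sample."""
    if not 20 <= age <= 49:
        raise ValueError("age outside the 20-49 year range")
    if race == "NHB":
        return 1.0  # all NHB cases aged 20-49 were retained
    if race != "NHW":
        raise ValueError("race must be 'NHB' or 'NHW'")
    if age <= 44:
        return 1.0  # all NHW cases aged 20-44 were retained
    # NHW cases aged 45-49: fraction depends on period and ER status.
    if PERIOD1_START <= dx_date <= PERIOD1_END:
        return 0.305
    if PERIOD1_END < dx_date <= PERIOD2_END:
        return 0.845 if er_negative else 0.408
    raise ValueError("diagnosis date outside the sampling periods")


def base_weight(probability: float) -> float:
    """Inverse-probability base weight (before any non-response adjustment)."""
    return 1.0 / probability
```

Under these assumptions, an ER+ NHW case aged 47 diagnosed in 2013 would carry a base weight of 1/0.408, while every NHB case would carry a base weight of 1.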

Case screener interview.

All sampled cases were screened to determine final eligibility status. Cases not successfully screened by a study site team were checked against the updated SEER Registry to determine eligibility status. Cases initially sampled were considered ineligible for the following reasons: not U.S.-born (n = 373), self-identified as neither White nor Black (n = 153), self-identified as Hispanic (n = 151), previous cancer diagnosis (n = 117), resided outside of the study areas at reference date (see definition of reference date in Table 1; n = 50), tumor had ineligible histology (n = 44), did not speak English (n = 29), updated age or reference date was out-of-range (n = 17), physically or mentally unable to complete the interview (n = 14), or institutionalized at reference date (n = 7). Two percent of cases were ineligible for screening for one or more of these reasons. In Detroit, a letter was sent to each eligible case’s physician before cases were contacted; if the physician did not respond within three weeks the case could be contacted, except for a few Detroit hospitals that required active physician approval.

Controls

YWHHS investigators and the Westat team developed the area-based control sampling strategy and Westat developed the statistical sampling methodology [48, 49]. Westat also oversaw control identification and recruitment, household rostering, screener interviews, and initiated control recruitment efforts. Once potentially eligible controls were identified, their contact information was provided to the YWHHS Coordinating Center to be entered into the study tracking database for recruitment.

Control sampling.

A three-stage area probability sample was drawn to provide coverage of metropolitan Detroit and LA County, the areas from which YWHHS case participants were identified (see Supplemental Materials). The first stage selected primary sampling units (PSUs), each consisting of one or more Census blocks as identified in the 2010 U.S. Census. Within sampled PSUs, the second stage sampled more than 24,000 addresses from listings based on addresses served by the U.S. Postal Service. Households within occupied sampled addresses were rostered to identify members who were potential controls for the study. The third stage randomly selected women from among those potentially eligible. The sampling rates were designed to obtain a set of controls that were frequency matched to the expected case distribution within study site by race (NHB/NHW) and 5-year age intervals.

Control household roster.

A total of 24,612 households were sampled (Table 2) and 21,668 were determined eligible for roster. An introductory letter, brief roster, and a $2 bill were mailed to all sampled residential addresses. Follow-up household contact then used the same recruitment protocol as the National Health and Nutrition Examination Survey [50]. A total of 18,612 households were rostered. The roster asked for the initials/name, age, and race/ethnicity of all adult women 20–50 years old in the household (see Supplemental Materials for additional details).

Table 2 Overall ascertainment numbers by race and site, Young Women’s Health History Study

Control screener interview.

An in-person screener interview was conducted to determine the final eligibility of potentially eligible women identified and sampled from the household roster. Those who completed the screener received $5. Respondents willing to participate or interested in learning more were asked to provide their contact information for a study site (WSU/USC) interviewer to contact them.

Data collection

In-home case–control interview recruitment.

An introductory letter and study brochure were sent to all sampled case and control women. After sending the introductory letter, study staff (WSU/USC) telephoned women to determine (cases) or confirm (controls) eligibility, answer questions, and identify a location and time for an in-person interview. Women not reached by phone were sent follow-up letters and reminder postcards, and, in some cases, in-person visits. Women who declined to participate were asked to complete a brief questionnaire about demographic characteristics to characterize non-respondents.

In-person interview scheduling and informed consent.

Study participants were interviewed at their selected location. Prior to the interview, participants were mailed a confirmation letter and their interviewer’s business card with a photograph. Before the interview, the participant was asked to read and sign a consent form that described the study and participant rights and safeguards; it also requested permission to conduct the interview and each component of the study. Women were informed they could refuse any questions and terminate the interview at any time. Women who had had a mammogram were asked to complete a separate consent form that requested permission to obtain information from their healthcare provider about their last mammogram before reference date. Additionally, case participants were asked to provide consent to obtain tumor tissue sampled at the time of diagnosis or thereafter. A thank you gift of $75, which was later increased to $100, was provided for the main interview.

Main questionnaire.

The YWHHS questionnaire captured information about energy balance factors (e.g., childhood and adult diet, physical activity, and adult body size), factors known to affect life-course energy balance (e.g., food security, sleep patterns, built environment), known risk factors for BC (e.g., reproductive and family history), as well as race/ethnicity and life-course socioeconomic indicators. Collected information related to race/ethnicity includes self-reported race and Hispanic ethnicity, as well as the race/ethnicity others typically ascribe to the participant. We also asked about early life discrimination, experiences of every-day discrimination and the source of discrimination. Life-course socioeconomic indicators include residential history, household percent poverty (HPP), educational attainment, and occupational status [51, 52]. HPP was calculated using household net income adjusted for household size. Other factors associated with social context collected include life-course experiences of adversity (including childhood experiences), financial status and use of governmental subsidies, food insecurity, occupational status, and health insurance status. Other information on factors potentially associated with BC risk include prenatal exposures, medical history, non-steroidal anti-inflammatory medication use, contraceptive use, hormone medication use, fertility history, and life-course personal and secondhand tobacco exposure, as well as alcohol use. Study questions were developed based on existing questionnaires [53,54,55,56,57].

Multiple tools were used throughout the questionnaire to assist participants with recall, including a life history calendar of key life events [58], showcards, which also provided a non-verbal method of responding to sensitive questions, and a photobook of oral contraceptive, hormone, and thyroid medications [58].
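The text states only that HPP was calculated from household net income adjusted for household size. As a hedged illustration, the sketch below assumes that HPP is income expressed as a percentage of a poverty threshold that depends on household size. The threshold values are made-up placeholders, not real poverty guidelines, and the function names and the middle category label are hypothetical; only the < 150 and ≥ 300 cut-points appear in this paper.

```python
from typing import Mapping


def household_percent_poverty(net_income: float, household_size: int,
                              thresholds: Mapping[int, float]) -> float:
    """Income as a percentage of the threshold for the given household size."""
    if household_size not in thresholds:
        raise KeyError(f"no threshold supplied for household size {household_size}")
    return 100.0 * net_income / thresholds[household_size]


def hpp_category(hpp: float) -> str:
    """Group HPP using the cut-points reported in the results (150 and 300)."""
    if hpp < 150:
        return "<150%"
    if hpp < 300:
        return "150-299%"  # assumed label for the middle group
    return ">=300%"


# Placeholder thresholds for illustration only (NOT real poverty guidelines).
EXAMPLE_THRESHOLDS = {1: 10_000.0, 2: 14_000.0, 3: 18_000.0, 4: 22_000.0}

example_hpp = household_percent_poverty(33_000.0, 4, EXAMPLE_THRESHOLDS)
example_group = hpp_category(example_hpp)
```

Passing the thresholds in as an argument keeps the calculation independent of any particular year's guidelines, which would need to be chosen to match each participant's reference date.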

Additional components of the in-person interview

Anthropometric assessment.

Height, weight, waist circumference, and body composition (assessed by Tanita bioelectrical impedance analysis (BIA)) were measured.

Diet.

A modified version of the full 100-item Block Food Frequency Questionnaire (FFQ) was developed by NutritionQuest (Berkeley, CA) with the study PI (Velie) to capture total diet and foods suspected to be associated with BC risk (e.g., cruciferous vegetables) in the 12 months prior to reference date. The FFQ was administered on paper or verbally during the interview; those who did not complete it at the interview returned it via mail or at the phlebotomy visit. Childhood diet was assessed with a food list.

Childhood photographs.

Participants provided photos from “head to toe” at ages 6, 9, 12, 15, and 18 years to validate recalled relative body size (assessed by somatotype); photos were scanned and de-identified by digitally masking the participant’s eyes/face, if requested.

Built environment survey.

Interviewers conducted a survey of neighborhood characteristics, primarily at the time of the interview [59, 60]. Surveys not completed by the end of study recruitment (6.5%) were conducted remotely via Google Maps Street View using photos collected at the date closest to the interview date [61].

Primary caregiver survey.

Participants were asked to mail their primary childhood caregiver a brief survey. Caregivers were given $10. The survey included the respondent’s demographics, the biologic mother’s pregnancy with the participant, and the study participant’s childhood body size, physical activity, and SEP.

Biospecimen collection

Blood.

All study participants were asked to provide a blood sample. Samples were collected by a phlebotomist, generally at the second visit (96%, 4% at first visit). Phlebotomists attempted to obtain 30 mL (approximately 2 tablespoons) collected in four 10-mL vacutainers: two with no additive and two with EDTA. For cases, our protocol indicated samples should not be collected until at least two months after last treatment (average days post treatment = 376 days; 95% CI 353.9, 398.6). Participants who provided blood samples were originally given a $20 thank you gift, which was later increased to $25. Samples were processed at the MSU Cytogenics laboratory and MCW Tissue Bank.

Blood Questionnaire.

Phlebotomists administered a questionnaire to each participant at the time of blood draw. Questions addressed recent medication use; medical history; menstrual, pregnancy, and lactation status; and recent food, beverage, alcohol, and tobacco consumption.

Menstrual calendar.

During the main interview, if a participant reported menstruating within the past year and if they consented to have their blood drawn, they were asked to complete a menstrual calendar that indicated each day they experienced menstrual bleeding until the date their blood was drawn. If participants had not completed this calendar at the time of blood draw, the phlebotomist completed it with the participant for the preceding two months.

Menstrual postcard.

At the end of the blood draw, menstruating participants were given a pre-addressed stamped postcard, and asked to record the date of the first day of their next menstrual cycle and mail it; this information was used to determine the participant’s menstrual phase at the time her blood was drawn.

Saliva.

Participants unwilling or unable to provide a blood sample were asked to provide a saliva sample with the Oragene OG-500 DNA kit. Saliva samples were collected immediately after administration of the main questionnaire, by the phlebotomist at the second visit, or mailed to the participant after the first visit and returned by mail.

Tumor SEER Information.

Tumors were characterized by ER, PR, and HER2 molecular subtypes, and histological grade to differentiate luminal A and luminal B tumors using data from SEER registries [11]. SEER reports also included ICD-O codes, tumor size, laterality, lymph node involvement, and initial treatment and surgical history.
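To illustrate how receptor status and grade can be combined into surrogate subtypes, the sketch below uses one commonly used scheme. It is an assumption for illustration, not the classification rule applied in YWHHS: it treats a tumor as HR+ if either ER or PR is positive, assigns HR+/HER2+ tumors to luminal B, and uses grade 3 as the cut-point separating luminal B from luminal A among HR+/HER2- tumors. The function name and labels are hypothetical.

```python
from typing import Optional


def surrogate_subtype(er_positive: bool, pr_positive: bool,
                      her2_positive: bool, grade: Optional[int]) -> str:
    """Assign a surrogate subtype from ER, PR, HER2, and histologic grade."""
    hr_positive = er_positive or pr_positive  # assumption: either receptor counts
    if not hr_positive:
        # HR-/HER2- is the triple-negative group described in the Introduction.
        return "HER2-enriched" if her2_positive else "triple-negative"
    if her2_positive:
        return "luminal B"  # assumption: HR+/HER2+ grouped with luminal B
    if grade is None:
        return "luminal, grade unknown"
    # Assumption: high grade (3) marks luminal B among HR+/HER2- tumors.
    return "luminal B" if grade >= 3 else "luminal A"
```

Returning an explicit "grade unknown" label, rather than guessing, keeps tumors with incomplete registry data visible in downstream tabulations.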

Tumor Tissue.

To evaluate other tumor characteristics, e.g., Ki-67 status [11], tumor tissue from consenting cases was requested from hospitals or clinics where they were stored; when possible, tumor samples were taken before treatment. When adequate tissue was provided, tumor microarrays (TMAs) were created.

Biospecimen storage.

All blood, saliva, and tumor tissue biospecimens are stored at the MCW Tissue Bank as part of the YWHHS Biorepository. Separate biomarker studies will be conducted with all collected biospecimens.

Interviewer Training and Quality Control Measures

Control recruitment interviewer training.

Control field interviewers were employees of Westat. Interviewers from both study sites were trained together to synchronize data collection. Once they demonstrated adherence to all protocols they were certified for data collection.

Study site interviewer and phlebotomist training.

Training was conducted by the YWHHS Coordinating Center to synchronize data collection. All field staff completed appropriate IRB-mandated training and field safety training and were certified by the YWHHS Coordinating Center once they demonstrated adherence to all protocols and competence in a complete study interview.

Main interview and phlebotomy quality control.

Interviews and phlebotomy visits of consenting participants were audio recorded for quality control. The first five recorded interviews completed by each interviewer and additional interviews as needed based on performance (4.8% in Detroit; 2.6% in LA of completed interviews) were reviewed by a trained evaluator. The evaluator documented discrepancies in recorded responses, deviations from protocol, and appropriate probing, and provided detailed feedback to each interviewer.

Study response and cooperation rate calculations

Response and cooperation rates were calculated using imputation methods in accordance with the American Association for Public Opinion Research (AAPOR) guidelines [62] (see Supplemental Tables 1 and 2).

Sample weights

Sample weights were created for both cases and controls to account for sampling design and non-response. Weights reflect probabilities of selection and adjustments for non-response. Adjustments for non-response were done at the screener and main interview levels. To achieve the frequency matching of controls to cases, a weighted distribution of cases for each study site was established across cells of age and race. The sample weights of controls were then post-stratified to the weighted totals within each of these cells [63]. Additionally, replicate weights were created to develop estimates of variability, including standard errors. Demographic characteristics were obtained for 86% of sampled controls (complete roster information), and 100% of sampled case participants (age, race, site, county, ER status) to inform non-response weights. Replicate weights were created for case–control analyses and case-only analyses. A second set of weights was created for control-only analyses, to weight controls to the source population. Replicate weights were also created for blood sample analyses.
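The post-stratification step described above can be sketched as follows: within each cell, control weights are rescaled so that their sum equals the weighted case total for that cell. This is a simplified illustration under stated assumptions. The function name, the cell labels, and all numbers are hypothetical, and the sketch omits the non-response adjustments and replicate weights that the study also created.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def poststratify(control_weights: Sequence[float],
                 control_cells: Sequence[Hashable],
                 case_totals: Dict[Hashable, float]) -> List[float]:
    """Rescale control weights so each cell sums to the weighted case total."""
    cell_sums: Dict[Hashable, float] = defaultdict(float)
    for weight, cell in zip(control_weights, control_cells):
        cell_sums[cell] += weight
    # Each control's weight is multiplied by (target total / current total).
    return [weight * case_totals[cell] / cell_sums[cell]
            for weight, cell in zip(control_weights, control_cells)]


# Hypothetical cells defined by (site, race, 5-year age group).
cells = [("Detroit", "NHB", "40-44"), ("Detroit", "NHB", "40-44"),
         ("LA", "NHW", "45-49")]
weights = [10.0, 30.0, 20.0]
targets = {("Detroit", "NHB", "40-44"): 80.0, ("LA", "NHW", "45-49"): 50.0}

adjusted = poststratify(weights, cells, targets)
```

After adjustment, the two controls in the first cell keep their relative weights (1:3) but now sum to the target of 80, which is what makes the weighted control distribution match the weighted case distribution across cells.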

Statistical analyses

Primary analyses are conducted using survey weighted multiple logistic regression to account for study design and potential confounding. Where appropriate, potential effect modification by study site, race and/or socioeconomic position are being evaluated. For some analyses, structural equation modeling (SEM) with latent variables is being conducted to evaluate exposures over the life-course [64]. Additionally, for some analyses we are using survey weighted polytomous logistic regression to assess heterogeneity in risk by tumor subtypes.

Operational results

Case participation

A total of 5,309 potentially eligible women were identified through the Detroit (n = 2,527) and LA (n = 2,782) SEER registries (Table 2). Of these, 80% were sampled (see Case Sampling), and 3,326 were determined to be eligible or potentially eligible (Table 2). Among sampled cases, 124 women died before they could be interviewed and 82 could not be contacted because physician or hospital permission was not obtained. Other reasons for non-interview included: 177 could not be located, 70 moved away from the study area, 23 were too ill, and 415 did not respond after maximum contact attempts. Of the 3,326 sampled and potentially eligible participants, study staff had the opportunity to recruit 2,435 participants. Of these, 623 declined to participate, and 1,812 women were interviewed (ER+ n = 1,310; ER- n = 437). The overall cooperation rate was 74.4% (Detroit: 71.9%, LA: 77.2%) and response rate was 59.8% (Detroit: 53.1%, LA: 66.4%) (Supplemental Table 1). Response rates were higher for NHB women (60.2%) than NHW women (59.8%), and for LA (66.4%) than Detroit (53.1%) (Supplemental Table 1), but did not vary significantly by age (Supplemental Table 2).
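The case counts above can be checked with simple arithmetic. The short sketch below reproduces the number of women whom staff had the opportunity to recruit and the overall cooperation rate from the reported figures; the variable names are illustrative. It does not attempt to reproduce the 59.8% response rate, which relies on the AAPOR imputation methods described earlier.

```python
# Counts as reported in the text for sampled, potentially eligible cases.
sampled_potentially_eligible = 3326
no_opportunity_to_recruit = {
    "died before interview": 124,
    "physician or hospital permission not obtained": 82,
    "could not be located": 177,
    "moved away from the study area": 70,
    "too ill": 23,
    "no response after maximum contact attempts": 415,
}
declined = 623
interviewed = 1812

# Women whom study staff had the opportunity to recruit.
opportunity = sampled_potentially_eligible - sum(no_opportunity_to_recruit.values())

# Cooperation rate: interviewed women among those staff could recruit.
cooperation_rate = interviewed / opportunity

print(opportunity)                       # 2435
print(f"{100 * cooperation_rate:.1f}%")  # 74.4%
```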

Control participation

A total of 24,612 households were sampled in Detroit (n = 9,994) and LA (n = 14,618) (Table 2). Of these, 21,668 were eligible or potentially eligible and 18,612 households completed a roster (86% response rate) (Supplemental Table 1). In LA and Detroit, 9% and 1% of potentially eligible households, respectively, could not be rostered because they were in inaccessible gated communities. Of households that completed rosters, 3,414 participants were sampled and 2,720 completed screeners (88% response rate, Supplemental Table 1). Reasons that screeners were not obtained were the following: resided outside the study area (n = 24), was too ill (n = 2), was not reached after maximum attempts (n = 132), or sampled in error (n = 9). Of the 3,247 participants sampled for screening that interviewers had the opportunity to screen, 83.6% were screened. Of these, 1,988 were eligible or potentially eligible and 97.2% agreed to be contacted by study site staff. Thus, Westat provided control participant information for 1,933 women. Of these, study site staff had no opportunity to interview 223 women for the following reasons: 12 were ineligible, 2 died before interview, 6 could not be located, 30 moved away from the study area, 2 were too ill, and 171 were not reached after the maximum number of attempts. Thus, 1,708 participants were confirmed to be eligible and agreed to be contacted by the study site staff. Of these, 327 women refused to participate in the study (4% via proxy) and 1,381 completed the main interview (Table 2). Accounting for the household roster cooperation rate (94%), screener cooperation rate (84%), and study site recruitment cooperation rate (81%), the overall study cooperation rate was 65% (Supplemental Table 1). Similarly, accounting for the household roster response rate (86%), the participant screener response rate (88%), the Westat agreed-to-be-contacted response rate (98%), and the study site recruitment response rate (72%), the overall control response rate was 53% (Supplemental Table 1). Response rates were higher for NHB women (57.9%) compared to NHW women (48.3%), and for LA (58.5%) compared to Detroit (49.3%) (Supplemental Table 1) but did not vary significantly by age (Supplemental Table 2).
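The overall control rates combine the stage-specific rates, and a natural reading is that they are the product of those rates. The sketch below illustrates this with the rounded percentages quoted in the text; the function name is hypothetical, and treating the overall rate as a simple product is an assumption about how the stages were combined. With rounded inputs the product for cooperation comes to about 64%, slightly below the reported 65%, which presumably reflects the unrounded stage rates in Supplemental Table 1.

```python
from math import prod
from typing import Sequence


def overall_rate(stage_rates: Sequence[float]) -> float:
    """Combine stage-specific rates (as proportions) by multiplication."""
    return prod(stage_rates)


# Rounded stage rates quoted in the text.
control_cooperation = overall_rate([0.94, 0.84, 0.81])    # roster, screener, site
control_response = overall_rate([0.86, 0.88, 0.98, 0.72])  # roster, screener,
                                                           # agreed to contact, site

print(f"cooperation: {100 * control_cooperation:.1f}%")
print(f"response: {100 * control_response:.1f}%")
```

Because each stage multiplies in, even high stage-specific rates compound into a noticeably lower overall rate, which is why multi-stage control recruitment tends to yield lower overall response than single-stage case recruitment.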

Main interview

Location of completed interviews

A total of 73.2% and 80.8% of interviews were conducted in-home, 3.4% and 3.0% were conducted at a study site office, and 23.5% and 16.2% were conducted at other locations (e.g., a coffee shop, local library, or healthcare provider’s office) for cases and controls, respectively. Distributions of interview locations were similar across study sites.

Interview timing

The median period between reference date and interview date was 153 days for controls and 378 days for cases (Supplemental Table 3).

Table 3 Weighted demographic characteristics of interviewed participants by site and case–control status, Young Women’s Health History Study (N = 3,193)

Length of main questionnaire

The questionnaire included 639 questions (excluding probing questions and repeat questions about exposures over the life-course). The median administration time of the questionnaire was 130 and 120 min for cases and controls, respectively (Supplemental Table 4). The median duration of the measured anthropometry section was 11 min for both cases and controls (Supplemental Table 4). Interview time for study participants was longer for NHB women (141 min) compared to NHW women (119 min) and for poorer women (HPP < 150%; 132 min) compared to wealthier women (HPP ≥ 300%; 120 min).

Table 4 Completion rates of study materials by case–control status and race, Young Women’s Health History Study

Description of interviewed study population

Table 3 shows the weighted demographic characteristics of interviewed study participants. Cases were more likely to be wealthier than controls (52.0% vs. 46.3% HPP ≥ 300%) and less likely to be unemployed (17.9% vs. 25.9%). Participants were similar across study sites, although both NHB and NHW women were more likely to be poor (HPP < 150%) in Detroit than LA. NHB women across both study sites were also significantly more likely to be poor (35.1% cases; 49.1% controls) compared to NHW women (12.3% cases; 15.8% controls) (Table 3).

Completion of study components

Response rates for all ancillary data collection efforts and for biospecimen collection are reported in Table 4. Nearly all participants completed the main interview (99%) and provided anthropometry measurements (95% of cases and 96% of controls). Most also provided blood samples (75% of cases and controls), or if blood was not provided, saliva (84% of cases and 81% of controls provided blood or saliva). In addition, 60% of women with BC who consented to allow us to retrieve tumor tissue had tissue available for analysis and thus far, of available participant tumor tissue, 58% has been retrieved (n = 660). Nearly all interviewed participants (97%) agreed to be contacted in the future.

Discussion

We successfully conducted the YWHHS: a large population-based case–control epidemiologic study based on the eco-social theory of disease etiology [42] to identify potentially modifiable factors associated with young-onset BC overall and by molecular tumor subtypes, and to investigate racial and socioeconomic inequities in BC among NHB and NHW young women. For the extensive in-person interview (median time 120–130 min), we achieved a 60% response rate among cases and a 53% response rate among controls; the cooperation rate among those we had the opportunity to interview was 74% among cases and 65% among controls. This was achieved through extensive follow-up efforts with the use of a centralized computer tracking system. Subsequently, we achieved a high response rate to our request for blood (75%) or saliva samples when blood was not available (82%). With linkage to NCI SEER cancer registry data, we have valid information on the definition of a breast cancer case and detailed information on tumor subtype. With survey data linked to biospecimen information, we have collected comprehensive data to address this study’s research questions, as well as future studies of breast cancer. This is one of the largest population-based case–control studies of young-onset BC. Additionally, to our knowledge, this is the largest population-based case–control study of BC in young NHB women and the largest where extensive life-course individual-level socioeconomic measures were collected to evaluate racial and socioeconomic inequities in BC risk.

Strengths

Strengths of this study include its exclusive focus on young women (aged < 50 years), its incorporation of information on tumor subtypes [9], and its design, which is intended to shed light on inequities in risk in young NHB compared to NHW women by life-course SEP. Other strengths include its population-based ascertainment of cases and controls and the availability of sample weights. The centralized YWHHS Coordinating Center synchronized data collection across study sites by conducting and overseeing all interviewer and recruitment training, and through the study’s centralized tracking system. Further strengths include the in-depth assessment of social context, including residential history and current built environment. Additionally, biomarkers, inherited genetic factors associated with BC, and gene expression changes can all be evaluated in this population-based study of young-onset BC; all of these are understudied.

Limitations

Limitations of this study include potential residual recall bias for exposures that could not be validated. The study, however, used methods such as a life calendar to minimize these issues [65]; life-course exposures were collected with recall aids, and YWHHS was able to validate recalled responses for key exposures, e.g., using adult measurements and childhood photos to validate recalled anthropometry. The study sample size also limits our ability to examine young-onset BC risk for some rarer tumor subtypes and, within some population subgroups, for small effect sizes and rarer exposures; data from this study can be pooled with other studies to evaluate these questions. The timing of blood sample collection also prohibits examination of factors potentially affected by treatment or “case” status, though extensive information was collected to allow the study of these potential influences. Additionally, information on “race” is ultimately self-reported but was originally based on the SEER registry for cases. SEER registry reports of “race” and “Hispanic ethnicity,” however, are highly correlated with self-report [66, 67].

An additional limitation could be the study response rates; however, complete enumeration of cases in the SEER registry and 86% enumeration of sampled control households enabled us to incorporate non-response sample weights to mitigate this limitation. Declining response rates for national-level surveys, particularly telephone surveys, are well documented over the course of the survey period, and the challenges that caused this decline also contributed to reduced response rates for YWHHS cases and controls [68]. Study response rates are, however, well within ranges reported in the literature [53, 69, 70], particularly given the data collection time period, participants’ ages, and the well-recognized challenges in enrolling disadvantaged populations [71, 72]. We found that women were more willing to participate when interviewers were similar in race and age (data not shown) [71, 73] and that response rates may have been lower among White women in Detroit due to interviewer-participant age incongruence. Recruitment and scheduling were further complicated because women who were juggling childcare, work, other family responsibilities, or challenging cancer treatment regimens often rescheduled interviews. To address these obstacles, dedicated telephone schedulers were hired, targeted letters were mailed to address concerns regarding confidentiality and time constraints, in-person follow-up visits were attempted with controls in Detroit and with cases and controls in LA, and the study incentive was increased.
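
The role of non-response sample weights noted above can be illustrated with a minimal sketch. It shows a generic weighting-class adjustment, in which respondents' base weights are inflated so that, within each adjustment cell, respondents represent everyone who was sampled. This is not a description of the procedure used to construct the YWHHS weights, which is not detailed in this section; the field names (`id`, `race`, `base_weight`), the choice of adjustment cell, and all values below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def nonresponse_weights(sampled, responded_ids, cell_key):
    """Weighting-class non-response adjustment (illustrative only).

    sampled:       list of dicts, each with hypothetical keys "id" and "base_weight"
    responded_ids: set of ids for sampled persons who completed the interview
    cell_key:      function mapping a person to an adjustment cell
    Returns a dict {id: adjusted_weight} for respondents only.
    """
    sampled_total = defaultdict(float)     # sum of base weights, all sampled
    respondent_total = defaultdict(float)  # sum of base weights, respondents

    for person in sampled:
        cell = cell_key(person)
        sampled_total[cell] += person["base_weight"]
        if person["id"] in responded_ids:
            respondent_total[cell] += person["base_weight"]

    weights = {}
    for person in sampled:
        if person["id"] in responded_ids:
            cell = cell_key(person)
            # Inflate each respondent so the cell's respondents sum to the
            # cell's full sampled total.
            factor = sampled_total[cell] / respondent_total[cell]
            weights[person["id"]] = person["base_weight"] * factor
    return weights


# Hypothetical toy sample: two adjustment cells defined by race.
sampled = [
    {"id": 1, "race": "NHB", "base_weight": 2.0},
    {"id": 2, "race": "NHB", "base_weight": 2.0},
    {"id": 3, "race": "NHW", "base_weight": 1.0},
    {"id": 4, "race": "NHW", "base_weight": 1.0},
    {"id": 5, "race": "NHW", "base_weight": 1.0},
    {"id": 6, "race": "NHW", "base_weight": 1.0},
]
responded_ids = {1, 3, 4}

adjusted = nonresponse_weights(sampled, responded_ids, lambda p: p["race"])
```

In this toy example, half of the weighted sample in each cell responded, so each respondent's weight is doubled and the adjusted weights sum to the total base weight of the full sample. An adjustment of this kind reduces non-response bias only to the extent that respondents and non-respondents within a cell are similar with respect to the exposures of interest.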

Future directions

Analyses using collected YWHHS data are in progress. Additional supplemental projects are possible, including pooling of data, particularly to study rarer tumor subtypes; studies to evaluate risk for other BC tumor subtypes; studies of factors associated with mammograms and BC survival; studies of biomarkers, e.g., gene expression; integration of external data with data on geocoded life-course residential histories; and/or evaluation of intermediate biomarkers and BC risk. Results from YWHHS will expand our understanding of potentially modifiable factors associated with BC risk overall and by subtype and should identify sources of racial and socioeconomic inequities in young-onset BC.