Theory, methods, and operational results of the Young Women’s Health History Study: a study of young-onset breast cancer incidence in Black and White women

Purpose. The etiology of young-onset breast cancer (BC) is poorly understood, even though young-onset BC is more likely to be hormone receptor-negative, has a worse prognosis, and shows persistent racial and socioeconomic inequities. We conducted a population-based case-control study of BC among young Black and White women, the Young Women's Health History Study (YWHHS), and here discuss the theory that informed the study, the exposures collected, study methods, and operational results.
Methods. Cases were non-Hispanic Black (NHB) and non-Hispanic White (NHW) women aged 20-49 years diagnosed with invasive BC in 2010-2015 and identified through the metropolitan Detroit and Los Angeles County SEER registries. Controls were identified through area-based sampling from the 2010 U.S. Census and frequency matched to cases on study site, race, and age. An eco-social theory of health informed the life-course exposures collected through in-person interviews, including socioeconomic, reproductive, and energy balance factors. Anthropometric measurements and blood (or saliva) samples were also collected, along with SEER tumor characteristics for cases and tumor tissue for a subset of cases.
Results. Of 5,309 potentially eligible cases identified, 2,720 sampled participants were screened and 1,812 completed interviews (682 NHB, 1,140 NHW; response rate (RR): 60%). Of 24,612 sampled control households, 18,612 were rostered, 2,716 participants were sampled and screened, and 1,381 completed interviews (665 NHB, 716 NHW; RR: 53%). Ninety-nine percent of participants completed the main interview, 82% provided blood or saliva (75% blood only), and SEER tumor characteristics (including ER, PR, and HER2 status) were obtained for 96% of cases.
Conclusions. Results from the successfully established YWHHS should expand our understanding of young-onset BC etiology, overall and by tumor type, and help identify sources of racial and socioeconomic inequities in BC.
Supplementary Information. The online version of this article contains supplementary material available at 10.1007/s10552-021-01461-x.


Introduction
In the United States (US), nearly one quarter of annual breast cancer (BC) cases occur in women under 50 years of age, and incidence is increasing [1,2]. The etiology of BC varies by age [3,4] and is poorly understood for young-onset BC [3,5-8]. Breast tumors are also now recognized to have distinct histopathologic and molecular characteristics, with heterogeneous etiology, prognosis, and treatment [9-12]. Tumors in young women are more likely to present at a later stage, to have a worse prognosis, and to be hormone receptor-negative (HR-) [13-15]. Non-Hispanic White (NHW) and non-Hispanic Black (NHB) women have the highest BC incidence rates in the U.S. [2], and racial and socioeconomic inequities in BC persist [16-18].
Racial inequities exist in the U.S. in overall BC mortality and incidence, particularly among younger women, and tumor subtypes are unequally distributed by race. Overall BC mortality was 40% higher in NHB than in NHW women during 2013-2017 [17], and this inequity is particularly pronounced among women < 50 years of age, for whom mortality was 82% higher in NHB than in NHW women in 2018 [19]. Although overall BC incidence among NHB women has historically been lower than among NHW women, rates are now nearly equal [2], and among the youngest women (aged < 40 years) incidence rates have consistently been higher among Black women [2,20]. Among women < 50 years of age, NHB women also had a 90% higher incidence of the most aggressive HR-/HER2- (i.e., triple-negative breast cancer (TNBC)) tumors than NHW women in 2012-2016 [2]. Studies of racial residential segregation have observed that, among Black women, both a lower [21] and a higher [22] proportion of Black residents in the census tract are associated with higher odds of TNBC. Everyday experiences of discrimination have also been associated with increased BC incidence among Black women, particularly those aged < 50 years [23], which may help explain the observed patterns of racial residential segregation and TNBC [22].
Socioeconomic inequities in BC mortality and incidence also exist. Poorer women have historically had lower BC mortality at all ages; however, BC mortality has steadily increased since 1950 among women residing in disadvantaged census tracts and decreased among women in affluent tracts, such that by 2013 BC mortality in the most disadvantaged tracts was 6% higher than in the most affluent tracts [18]. Overall BC incidence has also increased more rapidly among women residing in the most disadvantaged counties than among women in the most affluent counties: from 1981-1990 to 2001-2010, incidence increased by 15% in the most disadvantaged counties and by only 9% in the most affluent counties [24]. Black and White women residing in the most disadvantaged counties (> 20% poverty) also had a higher prevalence of HR- BC than women residing in wealthier counties (< 10% poverty) in 2004-2007; this difference was most pronounced for NHB compared with NHW women < 50 years old (HR-/HR+ ratio = 1.51, 95% confidence interval (CI): 1.20, 1.90) [25]. Similar patterns are seen at the census-tract level: in 2005-2017, women residing in tracts with an intermediate or low, compared with a high, socioeconomic status index had relative risk ratios for TNBC of 1.81 (95% CI 1.20, 2.71) and 1.95 (95% CI 1.27, 2.99), respectively [21].
Few modifiable factors have been identified to inform BC prevention strategies [26], particularly in young women [9,27-31] and by tumor type [9,13], or to explain racial and socioeconomic inequities in BC incidence [32-34]. We conducted a population-based case-control study of BC risk among NHB and NHW women aged < 50 years from diverse socioeconomic backgrounds in the U.S.: the Young Women's Health History Study (YWHHS). Our research is informed by an eco-social theory of health, which situates health outcomes, particularly differences between groups, within a complex socio-historical context and seeks to identify the pathways through which that context is embodied [35,36]. Further, we recognize racism as a potent social determinant that continues to regulate differences in exposure to socioeconomic and other opportunities by race, thereby contributing to racial health inequities in the U.S. [37,38]. We hypothesize that socio-cultural factors related to race and socioeconomic position determine exposures over the life-course (e.g., reproductive and energy balance factors) that modify biology and, in turn, risk for young-onset BC tumor types (Fig. 1) [22,36,38-44]. In this paper, we document the YWHHS study design, the life-course measures collected, data collection methods, and response and cooperation rates, and we describe the final study population.

Overall study objectives
The primary objectives of the YWHHS were to provide insight into modifiable early-life and life-course factors associated with young-onset (< 50 years) BC risk and to understand racial and socioeconomic inequities in BC risk in the U.S. [40,44-47]. We are investigating (1) the association between early-life and life-course factors and risk of BC overall and by tumor subtype among young NHB and NHW women [9,27-32], and (2) the potentially modifying effects of the socio-historic context of race/ethnicity (hereafter "race") and life-course socioeconomic position (SEP) on BC risk. We have also (3) created a biorepository of blood (or saliva) and breast tumor tissue for current and future study of the contribution of biomarkers, gene-environment interactions, and gene expression to BC risk in young women.

Overall study design
BC cases diagnosed between 2010 and 2015 were identified from the metropolitan Detroit (Oakland, Wayne, and Macomb counties) and Los Angeles County Surveillance, Epidemiology and End Results (SEER) registries. Controls were identified through area-based sampling from the 2010 Census and frequency matched to cases by study site, age, and race. Primary data collected included: (1) an in-person computer-assisted personal interview (CAPI) conducted with a life history calendar, (2) anthropometric measurements, (3) a blood sample (or saliva when blood was not available) and a related questionnaire, (4) SEER tumor type information, including ER, PR, and HER2 status, and (5) breast tumor tissue collected from participants' BC surgeries. Additional data collected included: (6) an interviewer-completed built environment survey of participants' neighborhoods, (7) a survey completed by participants' primary childhood caregiver, and (8) childhood photos of body size. We also requested permission to obtain information from (9) the health department(s) where women gave birth and (10) the health department where they were born, and (11) most recent mammogram reports from healthcare providers. Completion of the main study questionnaire was required for enrollment; all other study components were optional. The study protocol was approved by the Institutional Review Boards at the University of Wisconsin-Milwaukee (UWM); Michigan State University (MSU); Wayne State University (WSU); the Michigan Department of Community Health; University of Southern California (USC); and the California Committee for the Protection of Human Subjects (CPHS); for the Medical College of Wisconsin (MCW), IRB oversight was deferred to UWM. The California Cancer Registry also approved the study.

Study organization
The YWHHS Coordinating Center (initially hosted at MSU and moved to UWM in 2014) was responsible for study design, development, and oversight of the study tracking system. Westat, a research services corporation, and study collaborators developed the control sampling design, oversaw identification and recruitment of control participants, and created final study sample weights. Final recruitment, in-person interviews, and biospecimen collection were conducted at two field sites: Los Angeles County (at USC) and metropolitan Detroit (at WSU). A community advisory panel was assembled and consulted about data collection materials and study methodologies.

Eligibility criteria
Eligibility criteria for cases and controls are summarized in Table 1.

Study tracking system
A centralized computer system that tracked all corresponding study data and biospecimens was adapted and managed for YWHHS by the USC Cancer Research Informatics Core (CRIC).

Ascertainment, sampling, recruitment, and screening
Ascertainment, sampling, recruitment, and screening activities for cases and controls are outlined in Fig. 2.

Cases
Potentially eligible cases were identified by the Metropolitan Detroit Cancer Surveillance System (MDCSS) SEER registry and the LA County Cancer Surveillance Program (CSP) SEER registry. For both sites, cases were identified through rapid case ascertainment (RCA), which aims to identify cases within 3-6 months after diagnosis.

Case screener interview.
All sampled cases were screened to determine final eligibility status. Cases not successfully screened by a study site team were checked against the updated SEER registry to determine eligibility status. Sampled cases were considered ineligible for the following reasons: not U.S.-born (n = 373), self-identified as neither White nor Black (n = 153), self-identified as Hispanic (n = 151), previous cancer diagnosis (n = 117), resided outside the study areas at the reference date (see the definition of reference date in Table 1; n = 50), tumor had ineligible histology (n = 44), did not speak English (n = 29), updated age or reference date was out of range (n = 17), physically or mentally unable to complete the interview (n = 14), or institutionalized at the reference date (n = 7). Two percent of cases were ineligible for screening for one or more of these reasons. In Detroit, a letter was sent to each eligible case's physician before the case was contacted; if the physician did not respond within three weeks, the case could be contacted, except at a few Detroit hospitals that required active physician approval.

Controls
YWHHS investigators and the Westat team developed the area-based control sampling strategy, and Westat developed the statistical sampling methodology [48,49]. Westat also oversaw control identification, household rostering, and screener interviews, and initiated control recruitment. Once potentially eligible controls were identified, their contact information was provided to the YWHHS Coordinating Center and entered into the study tracking database for recruitment.

Control sampling.
A three-stage area probability sample was drawn to provide coverage of metropolitan Detroit and LA County, the areas from which YWHHS case participants were identified (see Supplemental Materials). The first stage selected primary sampling units (PSUs), each consisting of one or more Census blocks as defined in the 2010 U.S. Census. Within sampled PSUs, the second stage sampled more than 24,000 addresses from listings of addresses served by the U.S. Postal Service. Households at occupied sampled addresses were rostered to identify members who were potential controls for the study. The third stage randomly selected women from among those potentially eligible. The sampling rates employed were designed to obtain a set of controls frequency matched to the expected case distribution within study site by race (NHB/NHW) and 5-year age intervals.

Table 1 Eligibility criteria for cases of breast cancer and controls, Young Women's Health History Study
1 For cases, race/ethnicity was initially determined by SEER, derived from medical records or hospital admissions. Participants with "Hispanic" or "Arab American" last names based on SEER last name lists [74] at both study sites, and participants with "Asian" last names based on SEER lists in LA County, were considered ineligible.
2 For controls, race/ethnicity was initially reported on the household roster (potentially by proxy) based on the Census 2010 items "Hispanic or Latina origin" and as many races as applied: "Black/African American, White, Asian, Native Hawaiian/other Pacific Islander, American Indian/Alaska Native, or Other" [51]. Westat also applied SEER Hispanic surname lists in LA. Final race/ethnicity was determined by self-report on the screener. Participants were asked whether they were of "Hispanic or Latina origin," and then to select the race they identified with most: "Black or African American; White; American Indian or Native American or Alaska Native; Arab American or Chaldean; East Asian or Southeast Asian; Asian Indian or South Asian; Native Hawaiian or other Pacific Islander; Some Other Group; Refused; Don't know." Participants who did not identify as being of Hispanic or Latina origin and who identified as "Black or African American" or "White" were considered eligible.
NOTE: We use the terms Black and African American interchangeably [75].
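In a three-stage design like the one described under Control sampling, a woman's overall selection probability is the product of the stage-level probabilities, and her base (design) weight is its inverse before any nonresponse or post-stratification adjustment. The sketch below is illustrative only: the function names `base_weight` and `age_group` and the example probabilities are hypothetical, and Westat's actual sampling methodology [48,49] may differ. It also shows the 5-year age intervals used for frequency matching.

```python
def age_group(age: int) -> str:
    """Map an age in the eligible 20-49 range to a 5-year interval, e.g. 37 -> '35-39'."""
    if not 20 <= age <= 49:
        raise ValueError("age outside the 20-49 eligibility range")
    lower = 20 + 5 * ((age - 20) // 5)
    return f"{lower}-{lower + 4}"


def base_weight(p_psu: float, p_address: float, p_woman: float) -> float:
    """Inverse of the overall selection probability across three sampling stages.

    p_psu: probability that the PSU (one or more Census blocks) was selected
    p_address: conditional probability that the address was selected within the PSU
    p_woman: conditional probability that the woman was selected within the household
    """
    for p in (p_psu, p_address, p_woman):
        if not 0 < p <= 1:
            raise ValueError("selection probabilities must lie in (0, 1]")
    return 1.0 / (p_psu * p_address * p_woman)


# Hypothetical example: PSU selected with p = 0.02, address with p = 0.10,
# and the woman with p = 0.5 within her household.
w = base_weight(0.02, 0.10, 0.5)
print(age_group(37), round(w))
```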

Control household roster.
A total of 24,612 households were sampled (Table 2), and 21,668 were determined to be eligible for rostering. An introductory letter, a brief roster, and a $2 bill were mailed to all sampled residential addresses. Households were then followed up using the same household contact protocol as the National Health and Nutrition Examination Survey [50]. A total of 18,612 households were rostered. The roster asked for the initials or name, age, and race/ethnicity of all adult women 20-50 years old in the household (see Supplementary Materials for additional details).

Control screener interview.
An in-person screener interview was conducted to determine the final eligibility of potentially eligible women identified and sampled from the household roster. Those who completed the screener received $5. Respondents willing to participate or interested in learning more were asked to provide their contact information for a study site (WSU/USC) interviewer to contact them.

Data collection
In-home case-control interview recruitment.
An introductory letter and study brochure were sent to all sampled case and control women. After sending the introductory letter, study staff (WSU/USC) telephoned women to determine (cases) or confirm (controls) eligibility, answer questions, and identify a location and time for an in-person interview.
Women not reached by phone were sent follow-up letters and reminder postcards, and, in some cases, in-person visits.
Women who declined to participate were asked to complete a brief questionnaire about demographic characteristics to characterize non-respondents.

In-person interview scheduling and informed consent.
Study participants were interviewed at a location of their choosing. Before the interview, participants were mailed a confirmation letter and their interviewer's business card with a photograph. At the start of the interview, the participant was asked to read and sign a consent form that described the study and participant rights and safeguards; it also requested permission to conduct the interview and each component of the study. Women were informed that they could refuse any question and terminate the interview at any time. Women who had had a mammogram were asked to complete a separate consent form requesting permission to obtain information from their healthcare provider about their last mammogram before the reference date. Additionally, case participants were asked to consent to the release of tumor tissue sampled at the time of diagnosis or thereafter. A thank-you gift of $75, later increased to $100, was provided for the main interview.

Table 2 Overall ascertainment numbers by race and site, Young Women's Health History Study
Sampled/potentially eligible cases who did not complete a telephone screener and were determined to be ineligible based on SEER information
c Active physician approval required by specific hospitals among a subset of case participants in Detroit
d For efficiency, 60% of households identified by the Westat address list vendor as likely to include at least one "Hispanic" adult were randomly excluded. Information from the other 40% was used to impute adjusted sampling values
e Non-response households include those that refused, that were not reached after maximal contacts, that were in locked buildings staff were unable to enter, or where language barriers existed
f 7% and 18% of in-person rosters were completed by neighbors in Detroit and Los Angeles, respectively
g Households containing more than one potentially eligible and sampled woman (Detroit: 49; LA: 121)
h Of sampled potentially eligible participants, 5 lacked "race" values, 2 reported not knowing their self-selected "race," and 7 refused to report a "race" value
i If participants lacked a self-reported "race" value at the screener level, their reported "race" value from the household roster was used instead
j Of potentially eligible participants who did not complete a screener, 4 were missing "race" values, 1 reported not knowing their self-selected "race," and 6 refused to report a "race" value
k Of participants who completed a screener, 1 reported not knowing their self-selected "race" and 1 refused to report a "race" value
l Includes participants lost (n = 6) or unable to be contacted to schedule an interview (n = 171)
m Households containing more than one eligible and interviewed participant (Detroit: 20; LA: 43)
Main questionnaire. The YWHHS questionnaire captured information about energy balance factors (e.g., childhood and adult diet, physical activity, and adult body size), factors known to affect life-course energy balance (e.g., food security, sleep patterns, built environment), known risk factors for BC (e.g., reproductive and family history), as well as race/ethnicity and life-course socioeconomic indicators.
Collected information related to race/ethnicity includes self-reported race and Hispanic ethnicity, as well as the race/ethnicity others typically ascribe to the participant. We also asked about early-life discrimination, experiences of everyday discrimination, and the sources of discrimination. Life-course socioeconomic indicators include residential history, household percent poverty (HPP), educational attainment, and occupational status [51,52]. HPP was calculated using household net income adjusted for household size. Other collected factors related to social context include life-course experiences of adversity (including childhood experiences), financial status and use of government subsidies, food insecurity, and health insurance status. Other factors potentially associated with BC risk include prenatal exposures, medical history, nonsteroidal anti-inflammatory drug use, contraceptive use, hormone medication use, fertility history, life-course active and secondhand tobacco smoke exposure, and alcohol use. Study questions were developed based on existing questionnaires [53-57].
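HPP expresses household income relative to a poverty threshold for a household of that size. The sketch below assumes HPP = 100 × net income / threshold and uses the cut points that appear in the results (< 150% and ≥ 300%); the 150-299% middle band, the example threshold, and the function names are assumptions, and in practice the threshold would come from the official poverty measure for the relevant year and household size.

```python
def household_percent_poverty(net_income: float, poverty_threshold: float) -> float:
    """Household net income as a percentage of the poverty threshold for its size."""
    if poverty_threshold <= 0:
        raise ValueError("poverty_threshold must be positive")
    return 100.0 * net_income / poverty_threshold


def hpp_category(hpp: float) -> str:
    """Group HPP into bands (assumed cut points: <150%, 150-299%, >=300%)."""
    if hpp < 150:
        return "<150%"
    if hpp < 300:
        return "150-299%"
    return ">=300%"


# Hypothetical household: $45,000 net income, $20,000 threshold for its size.
hpp = household_percent_poverty(45_000, 20_000)
print(hpp, hpp_category(hpp))  # 225.0 150-299%
```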
Multiple tools were used throughout the questionnaire to assist participants with recall, including a life history calendar of key life events [58], showcards, which also provided a non-verbal method of responding to sensitive questions, and a photobook of oral contraceptive, hormone, and thyroid medications [58].
Additional components of the in-person interview
Anthropometric assessment. Height, weight, waist circumference, and body composition (assessed by Tanita bioelectrical impedance analysis (BIA)) were measured.
Diet. A modified version of the full 100-item Block Food Frequency Questionnaire (FFQ) was developed by NutritionQuest (Berkeley, CA) with the study PI (Velie) to capture total diet and foods suspected to be associated with BC risk (e.g., cruciferous vegetables) in the 12 months before the reference date. The FFQ was administered on paper or verbally during the interview; participants who did not complete it at the interview returned it by mail or at the phlebotomy visit. Childhood diet was assessed with a food list.
Childhood photographs. Participants provided full-length ("head to toe") photos taken at ages 6, 9, 12, 15, and 18 years to validate recalled relative body size (assessed by somatotype); photos were scanned and, if requested, de-identified by digitally masking the participant's eyes or face.
Built environment survey. Interviewers conducted a survey of neighborhood characteristics, primarily at the time of the interview [59,60]. Surveys not completed by the end of study recruitment (6.5%) were conducted remotely via Google Maps Street View, using imagery captured on the date closest to the interview date [61].
Primary caregiver survey. Participants were asked to mail a brief survey to their primary childhood caregiver, who received $10. The survey covered the respondent's demographics, the biologic mother's pregnancy with the participant, and the participant's childhood body size, physical activity, and SEP.

Biospecimen collection
Blood. All study participants were asked to provide a blood sample. Samples were collected by a phlebotomist, generally at a second visit (96%; 4% at the first visit). Phlebotomists attempted to obtain 30 mL (approximately 2 tablespoons) in four 10-mL vacutainers: two with no additive and two with EDTA. For cases, the protocol specified that samples not be collected until at least two months after the last treatment (mean time since last treatment = 376 days; 95% CI 353.9, 398.6). Participants who provided blood samples originally received a $20 thank-you gift, later increased to $25. Samples were processed at the MSU Cytogenics laboratory and the MCW Tissue Bank.

Blood Questionnaire.
Phlebotomists administered a questionnaire to each participant at the time of blood draw. Questions addressed recent medication use; medical history; menstrual, pregnancy, and lactation status; and recent food, beverage, alcohol, and tobacco consumption.

Menstrual calendar.
During the main interview, if a participant reported menstruating within the past year and if they consented to have their blood drawn, they were asked to complete a menstrual calendar that indicated each day they experienced menstrual bleeding until the date their blood was drawn. If participants had not completed this calendar at the time of blood draw, the phlebotomist completed it with the participant for the preceding two months.
Menstrual postcard. At the end of the blood draw, menstruating participants were given a pre-addressed stamped postcard, and asked to record the date of the first day of their next menstrual cycle and mail it; this information was used to determine the participant's menstrual phase at the time her blood was drawn.
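One common way to use a next-menses date is backward counting: the number of days between the blood draw and the onset of the next menses places the draw in the follicular or luteal phase. The sketch below assumes a fixed 14-day luteal phase, a frequent simplification rather than the study's documented rule; the function names are hypothetical.

```python
from datetime import date


def days_before_next_menses(blood_draw: date, next_menses_start: date) -> int:
    """Days from the blood draw to the first day of the next menstrual cycle."""
    delta = (next_menses_start - blood_draw).days
    if delta <= 0:
        raise ValueError("next menses must start after the blood draw")
    return delta


def menstrual_phase(blood_draw: date, next_menses_start: date, luteal_days: int = 14) -> str:
    """Classify the phase by counting backward from the next menses (assumed luteal length)."""
    days = days_before_next_menses(blood_draw, next_menses_start)
    return "luteal" if days <= luteal_days else "follicular"


print(menstrual_phase(date(2014, 3, 3), date(2014, 3, 12)))  # luteal (9 days before)
print(menstrual_phase(date(2014, 3, 3), date(2014, 3, 25)))  # follicular (22 days before)
```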

Saliva.
Participants unwilling or unable to provide a blood sample were asked to provide a saliva sample using the Oragene OG-500 DNA kit. Saliva samples were collected immediately after administration of the main questionnaire or by the phlebotomist at the second visit, or the kit was mailed to the participant after the first visit and returned by mail.
Tumor SEER information. Using data from SEER registries, tumors were classified into molecular subtypes by ER, PR, and HER2 status, with histological grade used to distinguish luminal A from luminal B tumors [11]. SEER reports also included ICD-O codes, tumor size, laterality, lymph node involvement, and initial treatment and surgical history.
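A surrogate subtype classification from ER, PR, HER2, and grade can be sketched as below. The exact rules applied in the study are not given here, so the grade cut point (grade 1-2 vs. 3-4 for HR+/HER2- tumors), the labels, and the function name `tumor_subtype` are assumptions; `None` stands for a missing registry value.

```python
from typing import Optional


def tumor_subtype(er: Optional[bool], pr: Optional[bool], her2: Optional[bool],
                  grade: Optional[int]) -> str:
    """Approximate a molecular subtype from receptor status and histological grade.

    The rules are an assumed surrogate classification, not the study's definitions.
    """
    # Hormone receptor (HR) positive if either ER or PR is positive;
    # HR negative only if both are known to be negative.
    if er is True or pr is True:
        hr_positive = True
    elif er is False and pr is False:
        hr_positive = False
    else:
        return "unknown"
    if her2 is None:
        return "unknown"
    if hr_positive and her2:
        return "luminal B (HER2+)"
    if hr_positive:
        # Assumed cut point: low grade (1-2) -> luminal A, high grade (3-4) -> luminal B.
        if grade is None:
            return "HR+/HER2- (grade unknown)"
        return "luminal A" if grade <= 2 else "luminal B (HER2-)"
    if her2:
        return "HER2-enriched"
    return "triple-negative"


print(tumor_subtype(er=True, pr=False, her2=False, grade=1))   # luminal A
print(tumor_subtype(er=False, pr=False, her2=False, grade=3))  # triple-negative
```

Grade only enters for HR+/HER2- tumors, mirroring the text's use of grade to separate luminal A from luminal B.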
Tumor tissue. To evaluate other tumor characteristics (e.g., Ki-67 status [11]), tumor tissue from consenting cases was requested from the hospitals or clinics where it was stored; when possible, tissue obtained before treatment was requested. When adequate tissue was provided, tissue microarrays (TMAs) were created.
Biospecimen storage. All blood, saliva, and tumor tissue biospecimens are stored at the MCW Tissue Bank as part of the YWHHS Biorepository. Separate biomarker studies will be conducted with all collected biospecimens.

Interviewer Training and Quality Control Measures
Control recruitment interviewer training. Control field interviewers were employees of Westat. Interviewers from both study sites were trained together to synchronize data collection. Once they demonstrated adherence to all protocols they were certified for data collection.

Study site interviewer and phlebotomist training.
Training was conducted by the YWHHS Coordinating Center to synchronize data collection. All field staff completed appropriate IRB-mandated training and field safety training and were certified by the YWHHS Coordinating Center once they demonstrated adherence to all protocols and competence in a complete study interview.

Main interview and phlebotomy quality control.
Interviews and phlebotomy visits of consenting participants were audio recorded for quality control. The first five recorded interviews completed by each interviewer, plus additional interviews as needed based on performance (4.8% and 2.6% of completed interviews in Detroit and LA, respectively), were reviewed by a trained evaluator. The evaluator documented discrepancies in recorded responses, deviations from protocol, and the appropriateness of probing, and provided detailed feedback to each interviewer.

Study response and cooperation rate calculations
Response and cooperation rates were calculated using imputation methods in accordance with the American Association for Public Opinion Research (AAPOR) guidelines [62] (see Supplemental Tables 1 and 2).
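The text does not state which AAPOR formulas were used, so the following is only a sketch of the general AAPOR-style definitions: a response rate in the spirit of RR3, in which an estimated fraction e of cases with unknown eligibility is counted as eligible (one place where imputation typically enters), and a cooperation rate in the spirit of COOP1. The class and function names are hypothetical. As a check, entering only the 1,812 interviewed cases and 623 refusals reported under Case participation reproduces the 74.4% case cooperation rate.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """Final counts in AAPOR-style disposition categories (hypothetical container)."""
    complete: int             # I: completed interviews
    partial: int              # P: partial interviews
    refusal: int              # R: refusals and break-offs
    non_contact: int          # NC: eligible but never contacted
    other: int                # O: other eligible non-interviews
    unknown_eligibility: int  # UH + UO: eligibility never determined


def response_rate(d: Dispositions, e: float) -> float:
    """RR3-style rate; e is the estimated share of unknown-eligibility cases that are eligible."""
    denominator = (d.complete + d.partial + d.refusal + d.non_contact + d.other
                   + e * d.unknown_eligibility)
    return d.complete / denominator


def cooperation_rate(d: Dispositions) -> float:
    """COOP1-style rate: completes among all eligible cases that were contacted."""
    return d.complete / (d.complete + d.partial + d.refusal + d.other)


case_coop = cooperation_rate(Dispositions(complete=1812, partial=0, refusal=623,
                                          non_contact=0, other=0, unknown_eligibility=0))
print(f"{case_coop:.1%}")  # 74.4%
```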

Sample weights
Sample weights were created for both cases and controls to account for the sampling design and nonresponse; weights reflect probabilities of selection and adjustments for nonresponse at the screener and main interview levels. To achieve frequency matching of controls to cases, a weighted distribution of cases for each study site was established across cells of age and race, and the control sample weights were then post-stratified to the weighted totals within each of these cells [63]. Replicate weights were also created to estimate variability, including standard errors. Demographic characteristics were available for 86% of sampled controls (complete roster information) and 100% of sampled cases (age, race, site, county, ER status) to inform nonresponse weights. Replicate weights were created for case-control and case-only analyses. A second set of weights was created for control-only analyses, to weight controls to the source population. Replicate weights were also created for blood sample analyses.
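Post-stratifying control weights to weighted case totals within site, race, and age cells amounts to a ratio adjustment per cell. This is a hypothetical minimal version: the function name and inputs are illustrative, it assumes nonresponse-adjusted control weights are already available, and it omits the replicate-weight step the study also used for variance estimation.

```python
from collections import defaultdict


def poststratify(weights, cells, targets):
    """Scale weights so that their sum within each cell equals the target total.

    weights: control weights after nonresponse adjustment
    cells: one cell label per control, e.g. (site, race, age_group)
    targets: dict mapping each cell label to the weighted case total for that cell
    """
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    missing = set(cell_sums) - set(targets)
    if missing:
        raise KeyError(f"no target total for cells: {sorted(missing)}")
    # Every control in a cell is multiplied by the same factor target / current sum.
    return [w * targets[c] / cell_sums[c] for w, c in zip(weights, cells)]


a, b = ("LA", "NHB", "20-24"), ("LA", "NHW", "20-24")
print(poststratify([10, 30, 20, 20], [a, a, b, b], {a: 80, b: 20}))  # [20.0, 60.0, 10.0, 10.0]
```

Because each cell gets a single multiplicative factor, relative weights within a cell (which carry the selection and nonresponse information) are preserved.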

Statistical analyses
Primary analyses use survey-weighted multivariable logistic regression to account for the study design and potential confounding. Where appropriate, potential effect modification by study site, race, and/or socioeconomic position is being evaluated. For some analyses, structural equation modeling (SEM) with latent variables is being used to evaluate exposures over the life-course [64]. Additionally, for some analyses we are using survey-weighted polytomous logistic regression to assess heterogeneity in risk by tumor subtype.
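Survey-weighted logistic regression solves the usual logistic score equations with each observation multiplied by its sampling weight. The numpy sketch below shows only that point-estimation step via Newton-Raphson; it is not the study's software, and in practice standard errors would come from the replicate weights described above (or design-based methods in survey software), which this sketch omits. The function name and example weights are hypothetical.

```python
import numpy as np


def weighted_logistic(X: np.ndarray, y: np.ndarray, w: np.ndarray,
                      max_iter: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Solve the weighted logistic score equations by Newton-Raphson.

    X: (n, p) design matrix including an intercept column
    y: (n,) outcome, 1 = case and 0 = control
    w: (n,) sampling weights
    Returns coefficients (log odds ratios); no standard errors are computed.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                       # weighted score vector
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])   # weighted information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Weighted 2x2 example: rows are exposed case, exposed control, unexposed case,
# unexposed control; the weighted odds ratio is (30 * 40) / (10 * 20) = 6.
X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([30.0, 10.0, 20.0, 40.0])
beta = weighted_logistic(X, y, w)
print(np.exp(beta[1]))  # approximately 6.0
```

In a case-control design the intercept reflects the sampling fractions rather than absolute risk, so the exponentiated slope (the odds ratio) is the quantity of interest.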

Case participation
A total of 5,309 potentially eligible women were identified through the Detroit (n = 2,527) and LA (n = 2,782) SEER registries (Table 2). Of these, 80% were sampled (see Case Sampling), and 3,326 were determined to be eligible or potentially eligible (Table 2). Among sampled cases, 124 women died before they could be interviewed and 82 could not be contacted because physician or hospital permission was not obtained. Other reasons for non-interview included: 177 could not be located, 70 moved away from the study area, 23 were too ill, and 415 did not respond after maximum contact attempts. Of the 3,326 sampled and potentially eligible participants, study staff therefore had the opportunity to recruit 2,435. Of these, 623 declined to participate and 1,812 were interviewed (ER+ n = 1,310; ER- n = 437). The overall cooperation rate was 74.4% (Detroit: 71.9%, LA: 77.2%) and the response rate was 59.8% (Detroit: 53.1%, LA: 66.4%) (Supplemental Table 1). Response rates were slightly higher for NHB women (60.2%) than for NHW women (59.8%), and higher for LA (66.4%) than for Detroit (53.1%) (Supplemental Table 1), but did not vary significantly by age (Supplemental Table 2).

Control participation
A total of 24,612 households were sampled in Detroit (n = 9,994) and LA (n = 14,618) (Table 2). Of these, 21,668 were eligible or potentially eligible and 18,612 households completed a roster (86% response rate) (Supplemental Table 1). Households in inaccessible gated communities, which could not be rostered, accounted for 9% of potentially eligible households in LA and 1% in Detroit. Of households that completed rosters, 3,414 participants were sampled and 2,720 completed screeners (88% response rate, Supplemental Table 1). Screeners were not obtained because the participant resided outside the study area (n = 24), was too ill (n = 2), or was not reached after maximum attempts (n = 132), or because she was sampled in error (n = 9). Of the 3,247 participants sampled for screening whom interviewers had the opportunity to screen, 83.6% were screened. Of these, 1,988 were eligible or potentially eligible, and 97.2% agreed to be contacted by study site staff; Westat therefore provided control participant information for 1,933 women. Of these, study site staff had no opportunity to interview 223 women for the following reasons: 12 were ineligible, 2 died before the interview, 6 could not be located, 30 moved away from the study area, 2 were too ill, and 171 were not reached after the maximum number of attempts. Thus, 1,708 participants were confirmed to be eligible and agreed to be contacted by study site staff. Of these, 327 women refused to participate in the study (4% via proxy) and 1,381 completed the main interview (Table 2). Accounting for the household roster cooperation rate (94%), the screener cooperation rate (84%), and the study site recruitment cooperation rate (81%), the overall study cooperation rate was 65% (Supplemental Table 1).
Similarly, combining the household roster response rate (86%), the participant screener response rate (88%), the Westat agree-to-be-contacted response rate (98%), and the study site recruitment response rate (72%) yielded an overall control response rate of 53% (Supplemental Table 1). Response rates were higher for NHB women (57.9%) than for NHW women (48.3%) and for LA (58.5%) than for Detroit (49.3%) (Supplemental Table 1), but did not vary significantly by age (Supplemental Table 2).
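Because each stage applies only to participants who completed the previous one, the overall rate can be approximated as the product of the stage-specific rates. The minimal Python sketch below assumes the rounded stage rates quoted above (names are illustrative); 0.86 × 0.88 × 0.98 × 0.72 ≈ 0.534, consistent with the reported 53%.

```python
from math import prod

# Stage-specific control response rates quoted in the text (as proportions).
STAGE_RESPONSE_RATES = {
    "household roster": 0.86,
    "participant screener": 0.88,
    "agreed to be contacted": 0.98,
    "study-site recruitment": 0.72,
}


def overall_rate(stage_rates: dict[str, float]) -> float:
    """Multiply sequential stage rates to obtain the overall rate (proportion)."""
    return prod(stage_rates.values())


if __name__ == "__main__":
    print(f"overall control response rate: {overall_rate(STAGE_RESPONSE_RATES):.1%}")
```

Multiplying rounded rates can shift the last digit, so a product computed this way may differ slightly from one computed from unrounded stage estimates.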

Location of completed interviews
For cases and controls, respectively, 73.2% and 80.8% of interviews were conducted in the participant's home, 3.4% and 3.0% at a study site office, and 23.5% and 16.2% at other locations (e.g., a coffee shop, local library, or healthcare provider's office). Distributions of interview locations were similar across study sites.

Interview timing
The median period between reference date and interview date was 153 days for controls and 378 days for cases (Supplemental Table 3).

Length of main questionnaire
The questionnaire included 639 questions (excluding probing questions and repeated questions about exposures over the life course). The median administration time was 130 min for cases and 120 min for controls, and the median duration of the measured anthropometry section was 11 min for both (Supplemental Table 4). Interviews were longer for NHB women (141 min) than for NHW women (119 min) and for poorer women (HHP < 150%; 132 min) than for wealthier women (HHP ≥ 300%; 120 min).

Table 3 shows the weighted demographic characteristics of interviewed study participants. Cases were more likely than controls to be wealthier (HHP ≥ 300%: 52.0% vs. 46.3%) and less likely to be unemployed (17.9% vs. 25.9%). Participants were similar across study sites, although both NHB and NHW women were more likely to be poor (HHP < 150%) in Detroit than in LA. Across both study sites, NHB women were also significantly more likely than NHW women to be poor (cases: 35.1% vs. 12.3%; controls: 49.1% vs. 15.8%) (Table 3).
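Table 3 reports weighted rather than raw percentages. The section does not spell out the estimator, so the Python sketch below only illustrates a standard weighted proportion, in which each participant contributes her sampling weight instead of a count of one. The function name and the toy data are hypothetical and are not YWHHS values.

```python
from typing import Sequence


def weighted_percent(flags: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted percentage of participants whose flag is True.

    Each participant contributes her sampling weight rather than a count of 1,
    so the estimate reflects the source population instead of the raw sample.
    """
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return 100 * sum(w for f, w in zip(flags, weights) if f) / total


if __name__ == "__main__":
    # Hypothetical toy data (NOT study data): an indicator such as HHP >= 300%.
    indicator = [True, False, True, False]
    weights = [1.0, 3.0, 2.0, 2.0]
    print(f"{weighted_percent(indicator, weights):.1f}%")
```

With these toy inputs the weighted estimate is 37.5%, whereas the unweighted share is 50%, which shows how weights can move a percentage away from the raw sample proportion.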

Completion of study components
Response rates for all ancillary data collection efforts and for biospecimen collection are reported in Table 4. Nearly all participants completed the main interview (99%) and provided anthropometric measurements (95% of cases and 96% of controls). Most also provided a blood sample (75% of cases and controls) or, if blood was not provided, a saliva sample (84% of cases and 81% of controls provided blood or saliva). In addition, 60% of women with BC who consented to tumor tissue retrieval had tissue available for analysis, and to date 58% of the available tumor tissue has been retrieved (n = 660). Nearly all interviewed participants (97%) agreed to be contacted in the future.

Discussion
We successfully conducted the YWHHS, a large population-based case–control epidemiologic study grounded in the eco-social theory of disease etiology [42], to identify potentially modifiable factors associated with young-onset BC overall and by molecular tumor subtype, and to investigate racial and socioeconomic inequities in BC among young NHB and NHW women. For the extensive in-person interview (median time 120–130 min), we achieved response rates of 60% among cases and 53% among controls; among those we had the opportunity to interview, cooperation rates were 74% among cases and 65% among controls. This was achieved through extensive follow-up efforts supported by a centralized computerized tracking system. We also achieved high response rates to our request for biospecimens: 75% of participants provided blood, and 82% provided either blood or, when blood was not available, saliva. Through linkage to NCI SEER cancer registry data, we have valid information for defining breast cancer cases and detailed information on tumor subtype. With survey data linked to biospecimen information, we have collected comprehensive data to address this study's research questions, as well as those of future studies of breast cancer. This is one of the largest population-based case–control studies of young-onset BC. Additionally, to our knowledge, it is the largest population-based case–control study of BC in young NHB women and the largest in which extensive life-course, individual-level socioeconomic measures were collected to evaluate racial and socioeconomic inequities in BC risk.

Strengths
Strengths of this study include its exclusive focus on young women (aged < 50 years), its incorporation of information on tumor subtypes [9], and its design to shed light on inequities in risk in young NHB compared with NHW women by life-course SEP. The study also features population-based ascertainment of cases and controls and the availability of constructed sample weights. The centralized YWHHS Coordinating Center synchronized data collection across study sites by conducting all interviewer and recruitment training and oversight and by maintaining the study's centralized tracking system. A further strength is the in-depth assessment of social context, including residential history and the current built environment. Additionally, biomarkers, inherited genetic factors associated with BC, and gene expression changes, all of which are understudied, can be evaluated in this population-based study of young-onset BC.

Limitations
Limitations of this study include potential residual recall bias for exposures that could not be validated. The study, however, used methods such as a life calendar to minimize this bias [65]: life-course exposures were collected with recall aids, and YWHHS was able to validate recalled responses for key exposures, e.g., using measured adult and childhood photos to validate recalled anthropometry. The study sample size also limits our ability to examine young-onset BC risk for some rarer tumor subtypes and within some population subgroups, particularly for small effect sizes and rarer exposures; data from this study can be pooled with other studies to evaluate these questions. The timing of blood sample collection also precludes examination of factors potentially affected by treatment or "case" status, although extensive information was collected to allow the study of these potential influences. Additionally, "race" was ultimately self-reported but, for cases, was initially based on SEER registry data. SEER registry reports of "race" and "Hispanic ethnicity," however, are highly correlated with self-report [66,67]. The study response rates could be an additional limitation; however, complete enumeration of cases in the SEER registry and 86% enumeration of sampled control households enabled us to incorporate non-response sample weights to mitigate this limitation. Declining response rates for national surveys, particularly telephone surveys, have been well documented over the study period, and the challenges that caused this decline also contributed to reduced response rates for YWHHS cases and controls [68]. Study response rates are nonetheless well within ranges reported in the literature [53,69,70], particularly given the data collection period, participants' ages, and the well-recognized challenges in enrolling disadvantaged populations [71,72].
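The paragraph above mentions non-response sample weights but does not describe how the YWHHS weights were constructed. Purely as an illustration, the Python sketch below shows one common approach, a weighting-class adjustment, in which respondents' base weights are inflated so that, within each adjustment class, they also represent nonrespondents. The record layout, class definitions, and names are hypothetical, and this should not be read as the study's actual weighting procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sampled:
    """One sampled person (hypothetical record layout)."""

    adjustment_class: str  # e.g., a site x race x age-group cell (assumed)
    base_weight: float     # inverse of the selection probability
    responded: bool


def nonresponse_adjusted_weights(sample: list[Sampled]) -> list[float]:
    """Weighting-class nonresponse adjustment.

    Within each class, respondents' base weights are multiplied by
    (sum of base weights of all sampled) / (sum of base weights of respondents).
    Nonrespondents receive weight 0. Classes with no respondents would lose
    their weight entirely, so in practice such classes are collapsed first.
    """
    total: defaultdict[str, float] = defaultdict(float)
    responding: defaultdict[str, float] = defaultdict(float)
    for s in sample:
        total[s.adjustment_class] += s.base_weight
        if s.responded:
            responding[s.adjustment_class] += s.base_weight

    adjusted = []
    for s in sample:
        if s.responded:
            factor = total[s.adjustment_class] / responding[s.adjustment_class]
            adjusted.append(s.base_weight * factor)
        else:
            adjusted.append(0.0)
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy sample, NOT study data.
    toy = [
        Sampled("A", 2.0, True),
        Sampled("A", 2.0, False),
        Sampled("B", 1.0, True),
        Sampled("B", 1.0, True),
        Sampled("B", 1.0, False),
    ]
    print(nonresponse_adjusted_weights(toy))
```

In the toy example the adjusted weights sum to the same total as the base weights, so respondents carry the full weight of their class.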
We found that women were more willing to participate when interviewers were similar to them in race and age (data not shown) [71,73], and response rates may have been lower among White women in Detroit because of interviewer–participant age incongruence. Recruitment and scheduling were challenging because women juggling childcare, work, other family responsibilities, or demanding cancer treatment regimens often rescheduled interviews. To address these obstacles, dedicated telephone schedulers were hired; targeted letters were mailed to address concerns about confidentiality and time constraints; in-person follow-up visits were attempted with controls in Detroit and with cases and controls in LA; and the study incentive was increased.

Future directions
Analyses of YWHHS data are in progress. Additional supplemental projects are possible, including pooling data with other studies (particularly to study rarer tumor subtypes); evaluating risk for other BC tumor subtypes; studying factors associated with mammography and BC survival; studying biomarkers, e.g., gene expression; integrating external data with geocoded life-course residential histories; and evaluating intermediate biomarkers and BC risk. Results from YWHHS will expand our understanding of potentially modifiable factors associated with BC risk overall and by subtype and should identify sources of racial and socioeconomic inequities in young-onset BC.
Zhang was supported by the NIH Office of Research on Women's Health and the National Institute of Child Health and Human Development K12HD043488 (Building Interdisciplinary Research Careers in Women's Health, BIRCWH). The ideas and opinions expressed herein are those of the authors and do not necessarily reflect the opinions of the funders.

Availability of data and material
The datasets analyzed during the current study are not publicly available because the main study analyses are still in progress, but they are available from the corresponding author on reasonable request.

Conflict of interest None declared.
Ethical approval This study protocol was approved by the Institutional Review Boards at the University of Wisconsin-Milwaukee (UWM); Michigan State University (MSU); Wayne State University (WSU); the Michigan Department of Community Health; University of Southern California (USC); the California Committee for the Protection of Human Subjects (CPHS); and for the Medical College of Wisconsin (MCW), IRB oversight was deferred to UWM. The California Cancer Registry also approved the study.
Consent to participate Written informed consent was obtained from all participants included in the YWHHS.

Consent for publication
All participants included in the final YWHHS sample consented to having their data published in scientific publications.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.