Theory, methods, and operational results of the Young Women’s Health History Study: a study of young-onset breast cancer incidence in Black and White women



The etiology of young-onset breast cancer (BC) is poorly understood, despite its greater likelihood of being hormone receptor-negative, its worse prognosis, and persistent racial and socioeconomic inequities. We conducted a population-based case–control study of BC among young Black and White women and here describe the theory that informed our study, the exposures collected, study methods, and operational results.


Cases were non-Hispanic Black (NHB) and White (NHW) women aged 20–49 years diagnosed with invasive BC in 2010–2015 and identified through the metropolitan Detroit and Los Angeles County SEER registries. Controls were identified through area-based sampling from the U.S. Census and frequency matched to cases on study site, race, and age. An eco-social theory of health informed the life-course exposures collected in in-person interviews, including socioeconomic, reproductive, and energy balance factors. We also collected measured anthropometry and blood (or saliva) from participants, SEER tumor characteristics from cases, and tumor tissue from a subset of cases.


Of 5,309 potentially eligible cases identified, 2,720 sampled participants were screened and 1,812 completed interviews (682 NHB, 1,140 NHW; response rate (RR): 60%). Of 24,612 sampled control households, 18,612 were rostered, 2,716 participants were sampled and screened, and 1,381 completed interviews (665 NHB, 716 NHW; RR: 53%). Ninety-nine percent of participants completed the main interview, 82% provided blood or saliva (75% provided blood), and SEER tumor characteristics (including ER, PR, and HER2 status) were obtained for 96% of cases.


Results from the successfully established YWHHS should expand our understanding of young-onset BC etiology, overall and by tumor type, and help identify sources of racial and socioeconomic inequities in BC.


In the United States (US), nearly one quarter of annual breast cancer (BC) cases occur in women under 50 years of age, and incidence is increasing [1, 2]. The etiology of BC varies by age [3, 4] and is poorly understood for young-onset BC [3, 5,6,7,8]. Breast tumors are now also recognized to have different histopathologic and molecular characteristics, with heterogeneous etiology, prognosis, and treatment [9,10,11,12]. Tumors in young women are also more likely to present at a later stage, to have a worse prognosis, and to be hormone receptor-negative (HR-) [13,14,15]. Non-Hispanic White (NHW) and non-Hispanic Black (NHB) women have the highest BC incidence rates in the US [2], and racial and socioeconomic inequities in BC persist [16,17,18].

Racial inequities exist in the U.S. in overall BC mortality and incidence, particularly among younger women, and tumor subtypes are unequally distributed. Overall BC mortality was 40% higher in NHB than in NHW women during 2013–2017 [17], and this inequity is particularly pronounced among women < 50 years of age, among whom mortality was 82% higher in NHB than in NHW women in 2018 [19]. Although overall BC incidence among NHB women has historically been lower than among NHW women, rates are now nearly equal [2], and among the youngest women (aged < 40 years) incidence rates have consistently been higher among Black women [2, 20]. Among women < 50 years of age, NHB women also had a 90% higher incidence of the most aggressive HR-/HER2- (i.e., triple-negative breast cancer (TNBC)) tumors than NHW women in 2012–2016 [2]. Studies of racial residential segregation have found that among Black women, both a lower [21] and a higher [22] proportion of Black residents in the census tract is associated with higher odds of TNBC. Everyday experiences of discrimination have also been associated with increased BC incidence among Black women, particularly those aged < 50 years [23], which may help explain the observed patterns of racial residential segregation and TNBC [22].

Socioeconomic inequities in BC mortality and incidence also exist. Poorer women historically had lower BC mortality at all ages [18]; however, since 1950 BC mortality has steadily increased among women residing in disadvantaged census tracts and decreased among women in affluent tracts, such that by 2013 BC mortality in the most disadvantaged tracts was 6% higher than in the most affluent tracts [18]. Overall BC incidence has also increased more rapidly among women residing in the most disadvantaged counties than among women in the most affluent counties: from 1981–1990 to 2001–2010, incidence increased by 15% in the most disadvantaged counties but only by 9% in the most affluent counties [24]. Black and White women residing in the most disadvantaged counties (> 20% poverty) also had a higher prevalence of HR- BC than women residing in wealthier counties (< 10% poverty) in 2004–2007 [25]; this pattern was most pronounced for NHB compared with NHW women < 50 years old (HR-/HR+ ratio = 1.51, 95% confidence interval (CI): 1.20, 1.90) [25]. Similar patterns are seen at the census-tract level: in 2005–2017, women residing in tracts with an intermediate or low socioeconomic status index, compared with a high index, had relative risk ratios for TNBC of 1.81 (95% CI 1.20, 2.71) and 1.95 (95% CI 1.27, 2.99), respectively [21].

Few modifiable factors have been identified to inform BC prevention strategies [26], particularly in young women [9, 27,28,29,30,31] and by tumor type [9, 13], or to explain racial and socioeconomic inequities in BC incidence [32,33,34]. We conducted the Young Women’s Health History Study (YWHHS), a population-based case–control study of BC risk among NHB and NHW women aged < 50 years from diverse socioeconomic backgrounds in the US. Our research is informed by an eco-social theory of health, which situates health outcomes, particularly differences between groups, within a complex socio-historical context and seeks to identify the pathways through which that context is embodied [35, 36]. Further, we recognize racism as a potent social determinant that continues to shape differences by race in exposure to socioeconomic and other opportunities, thereby contributing to racial health inequities in the U.S. [37, 38]. We hypothesize that socio-cultural factors related to race and socioeconomic position determine exposures over the life-course (e.g., reproductive and energy balance factors) that modify biology and, in turn, risk for young-onset BC tumor types (Fig. 1) [22, 36, 38,39,40,41,42,43,44]. In this paper, we document the YWHHS study design, the life-course measures collected, data collection methods, and response and cooperation rates, and we describe our final study population.

Fig. 1. YWHHS conceptual framework: socio-historical context, life-course reproductive and energy balance factors, and breast cancer risk among young non-Hispanic Black and White women


Overall study objectives

The primary objectives of the YWHHS were to provide insight into modifiable early life and life-course factors associated with young-onset (< 50 years) BC risk and to understand racial and socioeconomic inequities in BC risk in the U.S. [40, 44,45,46,47]. We are investigating (1) the association between early life and life-course factors and risk for BC, overall and by tumor subtype, among young NHB and NHW women [9, 27,28,29,30,31,32] and (2) the potentially modifying effects of the socio-historic context of race/ethnicity (hereafter “race”) and life-course socioeconomic position (SEP) on BC risk. We have also (3) created a biorepository of blood (or saliva) and breast tumor tissue for current and future study of the contributions of biomarkers, gene-environment interactions, and gene expression to BC risk in young women.

Overall study design

BC cases diagnosed between 2010 and 2015 were identified from the metropolitan Detroit (Oakland, Wayne, and Macomb counties) and Los Angeles County Surveillance, Epidemiology, and End Results (SEER) registries. Controls were identified through area-based sampling from the 2010 Census and frequency matched to cases by study site, age, and race. Primary data collected included: (1) an in-person computer-assisted personal interview (CAPI) conducted with a life history calendar, (2) anthropometric measurements, (3) a blood sample (or saliva when blood was not available) and a related questionnaire, (4) SEER tumor information, including ER, PR, and HER2 status, and (5) breast tumor tissue collected from participants’ BC surgeries. Additional data included: (6) an interviewer-completed built environment survey of participants’ neighborhoods, (7) a survey completed by each participant’s primary childhood caregiver, and (8) childhood photos of body size. We also requested permission to obtain information (9) from the health department(s) where women gave birth and (10) where they were born, and (11) their most recent mammogram reports from healthcare providers. Completion of the main study questionnaire was required for enrollment; all other study components were optional. The study protocol was approved by the Institutional Review Boards of the University of Wisconsin–Milwaukee (UWM), Michigan State University (MSU), Wayne State University (WSU), the Michigan Department of Community Health, the University of Southern California (USC), and the California Committee for the Protection of Human Subjects (CPHS); for the Medical College of Wisconsin (MCW), IRB oversight was deferred to UWM. The California Cancer Registry also approved the study.

Study organization

The YWHHS Coordinating Center (initially hosted at MSU; moved to UWM in 2014) was responsible for study design, development, and oversight of the study tracking system. Westat, a research services corporation, and study collaborators developed the control sampling design, oversaw identification and recruitment of control participants, and created final study sample weights. Final recruitment, in-person interviews, and biospecimen collection were conducted at two field sites: Los Angeles County (at USC) and metropolitan Detroit (at WSU). A community advisory panel was assembled and consulted about data collection materials and study methodologies.

Eligibility criteria (see Table 1)

Table 1 Eligibility criteria for cases of breast cancer and controls, Young Women’s Health History Study

Study tracking system

A centralized computer system that tracked all corresponding study data and biospecimens was adapted and managed for YWHHS by the USC Cancer Research Informatics Core (CRIC).

Ascertainment, sampling, recruitment, and screening

Ascertainment, sampling, recruitment, and screening activities for cases and controls are outlined in Fig. 2.

Fig. 2. Control and case sampling, eligibility, and recruitment: Young Women’s Health History Study


Potentially eligible cases were identified by the Metropolitan Detroit Cancer Surveillance System (MDCSS) SEER registry and the LA County Cancer Surveillance Program (CSP) SEER registry. For both sites, cases were identified through rapid case ascertainment (RCA), which aims to identify cases within 3–6 months after diagnosis.

Case sampling.

Because of budgetary constraints, we sampled NHW cases aged 45–49 years. Given the paucity of studies among NHB women, the youngest women (< 45 years of age), and women diagnosed with estrogen receptor-negative (ER-) tumors, we retained all eligible NHB cases aged 20–49 years and all eligible NHW cases aged 20–44 years, and, among NHW cases aged 45–49 years, oversampled women with ER- tumors. The sample of NHW cases aged 45–49 years (n = 829 of 2,527 in Detroit; n = 883 of 2,782 in LA) was drawn as follows: between 09/01/2010 and 08/31/2012, 30.5% of all NHW cases aged 45–49 years; between 08/31/2012 and 08/31/2015, 84.5% of ER- cases and 40.8% of ER+ cases.
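The period-specific sampling fractions described above can be expressed as a selection probability for each eligible case, whose inverse is the base (design) weight from which sample weights are built. The Python sketch below is illustrative only: the helper names (`selection_probability`, `base_weight`), keying the periods on diagnosis date, assigning the shared 08/31/2012 boundary to the first period, and giving cases with unknown ER status the ER+ rate are all assumptions rather than documented YWHHS rules.

```python
from datetime import date

# Reported sampling fractions for NHW cases aged 45-49 years.
PERIOD1_START = date(2010, 9, 1)
PERIOD1_END = date(2012, 8, 31)   # shared boundary day assigned to period 1 (assumption)
PERIOD2_END = date(2015, 8, 31)
RATE_PERIOD1 = 0.305              # all NHW 45-49 cases, period 1
RATE_ER_NEG = 0.845               # ER- NHW 45-49 cases, period 2
RATE_ER_POS = 0.408               # ER+ NHW 45-49 cases, period 2


def selection_probability(race: str, age: int, er_status: str, dx_date: date) -> float:
    """Hypothetical helper: probability that an eligible case was sampled."""
    if not 20 <= age <= 49:
        raise ValueError("age outside the 20-49 year study range")
    if race == "NHB" or age <= 44:
        return 1.0  # all eligible NHB cases and NHW cases aged 20-44 were retained
    # Remaining group: NHW cases aged 45-49, sampled by period (keyed on diagnosis date; assumption).
    if PERIOD1_START <= dx_date <= PERIOD1_END:
        return RATE_PERIOD1
    if PERIOD1_END < dx_date <= PERIOD2_END:
        # Cases with unknown ER status receive the ER+ rate here (assumption).
        return RATE_ER_NEG if er_status == "ER-" else RATE_ER_POS
    raise ValueError("diagnosis date outside the sampling periods")


def base_weight(race: str, age: int, er_status: str, dx_date: date) -> float:
    """Design (base) weight: the inverse of the selection probability."""
    return 1.0 / selection_probability(race, age, er_status, dx_date)


print(base_weight("NHB", 47, "ER+", date(2013, 5, 1)))            # 1.0
print(round(base_weight("NHW", 47, "ER-", date(2013, 5, 1)), 3))  # 1.183
print(round(base_weight("NHW", 46, "ER+", date(2011, 2, 1)), 3))  # 3.279
```

Because every eligible NHB case and every eligible NHW case aged 20–44 years was retained, only NHW cases aged 45–49 years carry base weights above 1 in this sketch.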

Case screener interview.

All sampled cases were screened to determine final eligibility. Cases not successfully screened by a study site team were checked against the updated SEER registry to determine eligibility status. Sampled cases were considered ineligible for the following reasons: not U.S.-born (n = 373), self-identified as neither White nor Black (n = 153), self-identified as Hispanic (n = 151), previous cancer diagnosis (n = 117), residence outside the study areas at the reference date (see definition of reference date in Table 1; n = 50), ineligible tumor histology (n = 44), did not speak English (n = 29), updated age or reference date out of range (n = 17), physically or mentally unable to complete the interview (n = 14), or institutionalized at the reference date (n = 7). Two percent of cases were ineligible for screening for one or more of these reasons. In Detroit, a letter was sent to each eligible case’s physician before the case was contacted; if the physician did not respond within three weeks, the case could be contacted, except at a few Detroit hospitals that required active physician approval.


YWHHS investigators and the Westat team developed the area-based control sampling strategy, and Westat developed the statistical sampling methodology [48, 49]. Westat also oversaw control identification, household rostering, and screener interviews, and initiated control recruitment. Once potentially eligible controls were identified, their contact information was provided to the YWHHS Coordinating Center and entered into the study tracking database for recruitment.

Control sampling.

A three-stage area probability sample was drawn to cover metropolitan Detroit and LA County, the areas from which YWHHS cases were identified (see Supplemental Materials). In the first stage, primary sampling units (PSUs), each consisting of one or more blocks from the 2010 U.S. Census, were selected. In the second stage, more than 24,000 addresses were sampled within the selected PSUs from listings of addresses served by the U.S. Postal Service. Households at occupied sampled addresses were then rostered to identify members who were potential controls. In the third stage, women were randomly selected from among those potentially eligible. Sampling rates were designed to yield controls frequency matched to the expected case distribution within study site by race (NHB/NHW) and 5-year age group.
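Under a multistage design such as this one, a woman’s overall probability of selection is the product of the conditional selection probabilities at each stage, and its inverse is her design weight before non-response and post-stratification adjustments. A minimal Python sketch follows; the class name `StageProbabilities` and the stage probabilities in the example are hypothetical placeholders, not YWHHS values, since actual rates varied by site, race, and 5-year age group.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class StageProbabilities:
    """Conditional selection probabilities for one sampled woman (hypothetical values)."""
    psu: float       # stage 1: PSU (one or more 2010 Census blocks) selected
    address: float   # stage 2: address selected within the sampled PSU
    person: float    # stage 3: woman selected among potentially eligible rostered women

    def overall(self) -> float:
        probs = (self.psu, self.address, self.person)
        if not all(0 < p <= 1 for p in probs):
            raise ValueError("each stage probability must be in (0, 1]")
        return prod(probs)  # unconditional probability = product of conditionals

    def design_weight(self) -> float:
        return 1.0 / self.overall()


# Hypothetical example: PSU selected with p = 0.10, address with p = 0.05,
# and the woman with p = 0.5 (the rate for her site/race/age cell).
w = StageProbabilities(psu=0.10, address=0.05, person=0.5)
print(round(w.overall(), 6))        # 0.0025
print(round(w.design_weight(), 6))  # 400.0
```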

Control household roster.

A total of 24,612 households were sampled (Table 2), and 21,668 were determined eligible for rostering. An introductory letter, a brief roster, and a $2 bill were mailed to all sampled residential addresses. Follow-up household contacts then used the same recruitment protocol as the National Health and Nutrition Examination Survey [50]. A total of 18,612 households were rostered. The roster asked for the initials or name, age, and race/ethnicity of all adult women aged 20–50 years in the household (see Supplementary Materials for additional details).

Table 2 Overall ascertainment numbers by race and site, Young Women’s Health History Study

Control screener interview.

An in-person screener interview was conducted to determine the final eligibility of potentially eligible women identified and sampled from the household roster. Those who completed the screener received $5. Respondents willing to participate or interested in learning more were asked to provide their contact information so that a study site (WSU/USC) interviewer could contact them.

Data collection

In-home case–control interview recruitment.

An introductory letter and study brochure were sent to all sampled case and control women. Study staff (WSU/USC) then telephoned women to determine (cases) or confirm (controls) eligibility, answer questions, and schedule a location and time for an in-person interview. Women not reached by phone received follow-up letters, reminder postcards, and, in some cases, in-person visits. Women who declined to participate were asked to complete a brief demographic questionnaire so that non-respondents could be characterized.

In-person interview scheduling and informed consent.

Study participants were interviewed at a location of their choosing. Before the interview, participants were mailed a confirmation letter and their interviewer’s business card with a photograph. At the start of the interview visit, the participant was asked to read and sign a consent form that described the study and participant rights and safeguards and requested permission to conduct the interview and each component of the study. Women were informed that they could refuse any question and end the interview at any time. Women who had had a mammogram were asked to complete a separate consent form requesting permission to obtain information from their healthcare provider about their last mammogram before the reference date. Case participants were also asked for consent to obtain tumor tissue sampled at the time of diagnosis or thereafter. A $75 thank-you gift, later increased to $100, was provided for the main interview.

Main questionnaire.

The YWHHS questionnaire captured information about energy balance factors (e.g., childhood and adult diet, physical activity, and adult body size), factors known to affect life-course energy balance (e.g., food security, sleep patterns, built environment), known risk factors for BC (e.g., reproductive and family history), and race/ethnicity and life-course socioeconomic indicators. Race/ethnicity information included self-reported race and Hispanic ethnicity, as well as the race/ethnicity others typically ascribe to the participant. We also asked about early life discrimination, everyday discrimination, and the sources of discrimination. Life-course socioeconomic indicators included residential history, household percent poverty (HPP), educational attainment, and occupational status [51, 52]. HPP was calculated from household net income adjusted for household size. Other social context factors collected included life-course experiences of adversity (including childhood experiences), financial status and use of government subsidies, food insecurity, and health insurance status. Other factors potentially associated with BC risk included prenatal exposures, medical history, non-steroidal anti-inflammatory drug use, contraceptive use, hormone medication use, fertility history, life-course active and secondhand tobacco exposure, and alcohol use. Study questions were based on existing questionnaires [53,54,55,56,57].
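Household percent poverty is conventionally computed as household income divided by the poverty threshold for a household of that size, multiplied by 100, so that an HPP of 100% corresponds to income at the poverty line. The text does not state which poverty thresholds or reference year the study used; the Python sketch below uses an illustrative threshold table whose dollar values are hypothetical placeholders, and its category cut points (< 150% and ≥ 300%) are the ones reported later in this paper.

```python
# Hypothetical poverty thresholds (USD) by household size; NOT official values.
POVERTY_THRESHOLD = {1: 12_000, 2: 16_000, 3: 20_000, 4: 25_000}
EXTRA_PER_PERSON = 4_500  # hypothetical increment for each person beyond 4


def poverty_threshold(household_size: int) -> float:
    """Look up (or extrapolate) the threshold for a household of the given size."""
    if household_size < 1:
        raise ValueError("household size must be >= 1")
    if household_size in POVERTY_THRESHOLD:
        return float(POVERTY_THRESHOLD[household_size])
    return float(POVERTY_THRESHOLD[4] + EXTRA_PER_PERSON * (household_size - 4))


def household_percent_poverty(income: float, household_size: int) -> float:
    """Household income as a percentage of the size-specific poverty threshold."""
    return 100.0 * income / poverty_threshold(household_size)


def hpp_category(hpp: float) -> str:
    """Group HPP using the cut points reported in this paper."""
    if hpp < 150:
        return "<150%"
    if hpp < 300:
        return "150-<300%"
    return ">=300%"


hpp = household_percent_poverty(50_000, 4)
print(hpp, hpp_category(hpp))  # 200.0 150-<300%
```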

Several tools were used throughout the questionnaire to aid recall: a life history calendar of key life events [58]; showcards, which also offered a non-verbal way to respond to sensitive questions; and a photobook of oral contraceptive, hormone, and thyroid medications [58].

Additional components of the in-person interview.

Anthropometric assessment.

Height, weight, waist circumference, and body composition (assessed by Tanita bioelectrical impedance analysis (BIA)) were measured.

Diet.

NutritionQuest (Berkeley, CA) and the study PI (Velie) developed a modified version of the full 100-item Block Food Frequency Questionnaire (FFQ) to capture total diet and foods suspected to be associated with BC risk (e.g., cruciferous vegetables) in the 12 months before the reference date. The FFQ was administered on paper or verbally during the interview; participants who did not complete it at the interview returned it by mail or at the phlebotomy visit. Childhood diet was assessed with a food list.

Childhood photographs.

Participants provided “head to toe” photos at ages 6, 9, 12, 15, and 18 years to validate recalled relative body size (assessed by somatotype); photos were scanned and, if requested, de-identified by digitally masking the participant’s eyes or face.

Built environment survey.

Interviewers conducted a survey of neighborhood characteristics, primarily at the time of the interview [59, 60]. Surveys not completed by the end of study recruitment (6.5%) were conducted remotely via Google Maps Street View, using imagery captured on the date closest to the interview date [61].

Primary caregiver survey.

Participants were asked to mail a brief survey to their primary childhood caregiver. Caregivers were given $10. The survey covered the respondent’s demographics, the biologic mother’s pregnancy with the participant, and the participant’s childhood body size, physical activity, and SEP.

Biospecimen collection


All study participants were asked to provide a blood sample. Samples were collected by a phlebotomist, generally at a second visit (96%; 4% at the first visit). Phlebotomists attempted to obtain 30 mL (approximately 2 tablespoons) of blood in four 10-mL vacutainers: two with no additive and two with EDTA. For cases, the protocol specified that samples not be collected until at least two months after the last treatment (mean time since last treatment, 376 days; 95% CI 353.9, 398.6). Participants who provided blood samples initially received a $20 thank-you gift, later increased to $25. Samples were processed at the MSU Cytogenics laboratory and the MCW Tissue Bank.

Blood Questionnaire.

Phlebotomists administered a questionnaire to each participant at the time of blood draw. Questions addressed recent medication use; medical history; menstrual, pregnancy, and lactation status; and recent food, beverage, alcohol, and tobacco consumption.

Menstrual calendar.

During the main interview, participants who reported menstruating within the past year and who consented to a blood draw were asked to complete a menstrual calendar recording each day of menstrual bleeding up to the date of their blood draw. If a participant had not completed the calendar by the time of the blood draw, the phlebotomist completed it with the participant for the preceding two months.

Menstrual postcard.

At the end of the blood draw, menstruating participants were given a pre-addressed, stamped postcard and asked to record the first day of their next menstrual period and mail it back; this information was used to determine each participant’s menstrual phase at the time of her blood draw.
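One common way to use such a postcard date is backward counting: the luteal phase is assumed to last roughly 14 days, so the number of days from blood draw to the next menses indicates whether the draw fell in the luteal or follicular phase. The text does not specify the YWHHS algorithm or cut points, so the 14-day window, the phase labels, and the helper `menstrual_phase` below are assumptions for illustration only.

```python
from datetime import date
from typing import Optional


def menstrual_phase(draw: date, next_menses: Optional[date], luteal_days: int = 14) -> str:
    """Hypothetical helper: classify cycle phase at blood draw by backward counting.

    Assumes a fixed luteal phase of `luteal_days` days before the next menses.
    """
    if next_menses is None:
        return "unknown"  # postcard not returned
    days_before = (next_menses - draw).days
    if days_before <= 0:
        raise ValueError("next menses must begin after the blood draw")
    return "luteal" if days_before <= luteal_days else "follicular"


print(menstrual_phase(date(2013, 3, 1), date(2013, 3, 10)))  # luteal (9 days before)
print(menstrual_phase(date(2013, 3, 1), date(2013, 3, 25)))  # follicular (24 days before)
print(menstrual_phase(date(2013, 3, 1), None))               # unknown
```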


Participants unwilling or unable to provide a blood sample were asked to provide a saliva sample using the Oragene OG-500 DNA kit. Saliva samples were collected immediately after the main questionnaire was administered, by the phlebotomist at the second visit, or by mail using a kit sent to the participant after the first visit.

Tumor SEER Information.

Using data from the SEER registries, tumors were classified by ER, PR, and HER2 status, and histologic grade was used to distinguish luminal A from luminal B tumors [11]. SEER reports also included ICD-O codes, tumor size, laterality, lymph node involvement, and initial treatment and surgical history.
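One convention used in epidemiologic studies approximates intrinsic subtypes from registry markers: hormone receptor (HR) positivity means ER+ or PR+; HR+/HER2- tumors are split into luminal A (grade 1–2) and luminal B (grade 3); HR+/HER2+ tumors are also counted as luminal B; HR-/HER2+ tumors are HER2-enriched; and HR-/HER2- tumors are triple-negative. The text does not give the exact YWHHS rule, so the grade cut point and labels below are assumptions, and `surrogate_subtype` is a hypothetical helper that returns `None` when a required marker is missing.

```python
from typing import Optional


def surrogate_subtype(er: Optional[bool], pr: Optional[bool],
                      her2: Optional[bool], grade: Optional[int]) -> Optional[str]:
    """Assign a surrogate subtype from registry markers (assumed rule).

    er, pr, her2: True = positive, False = negative, None = unknown.
    grade: histologic grade (1-3), or None if unknown.
    """
    if er is True or pr is True:
        hr_positive = True
    elif er is False and pr is False:
        hr_positive = False
    else:
        return None                 # HR status cannot be determined
    if her2 is None:
        return None
    if hr_positive and her2:
        return "luminal B"          # HR+/HER2+
    if hr_positive:
        if grade is None:
            return None             # grade needed to split luminal A vs B
        return "luminal A" if grade <= 2 else "luminal B"
    return "HER2-enriched" if her2 else "triple-negative"


print(surrogate_subtype(True, False, False, 2))   # luminal A
print(surrogate_subtype(False, False, False, 3))  # triple-negative
```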

Tumor Tissue.

To evaluate other tumor characteristics (e.g., Ki-67 status [11]), tumor tissue from consenting cases was requested from the hospitals or clinics where it was stored; when possible, tissue obtained before treatment was requested. When adequate tissue was provided, tissue microarrays (TMAs) were created.

Biospecimen storage.

All blood, saliva, and tumor tissue biospecimens are stored at the MCW Tissue Bank as part of the YWHHS Biorepository. Separate biomarker studies will be conducted with all collected biospecimens.

Interviewer Training and Quality Control Measures

Control recruitment interviewer training.

Control field interviewers were Westat employees. Interviewers from both study sites were trained together to synchronize data collection and were certified for data collection once they demonstrated adherence to all protocols.

Study site interviewer and phlebotomist training.

Training was conducted by the YWHHS Coordinating Center to synchronize data collection. All field staff completed appropriate IRB-mandated training and field safety training and were certified by the YWHHS Coordinating Center once they demonstrated adherence to all protocols and competence in a complete study interview.

Main interview and phlebotomy quality control.

Interviews and phlebotomy visits of consenting participants were audio recorded for quality control. A trained evaluator reviewed the first five recorded interviews completed by each interviewer and additional interviews as needed based on performance (4.8% of completed interviews in Detroit; 2.6% in LA). The evaluator documented discrepancies in recorded responses, deviations from protocol, and the appropriateness of probing, and provided detailed feedback to each interviewer.

Study response and cooperation rate calculations

Response and cooperation rates were calculated using imputation methods in accordance with American Association for Public Opinion Research (AAPOR) guidelines [62] (see Supplemental Tables 1 and 2).
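AAPOR’s Standard Definitions express response and cooperation rates in terms of final disposition counts: complete (I) and partial (P) interviews, refusals (R), non-contacts (NC), other non-response (O), and cases of unknown eligibility (U), of which an estimated fraction e is treated as eligible. The specific AAPOR formulas used for YWHHS are documented in the supplemental tables rather than here; the Python sketch below assumes RR3 and COOP1, uses one common estimator of e (the eligible share among cases whose eligibility was resolved), and applies them to hypothetical counts.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    complete: int     # I
    partial: int      # P
    refusal: int      # R
    noncontact: int   # NC
    other: int        # O
    unknown: int      # U: eligibility never determined


def estimated_eligibility(eligible: int, ineligible: int) -> float:
    """e: eligible share among cases whose eligibility was resolved (one common estimator)."""
    return eligible / (eligible + ineligible)


def response_rate_rr3(d: Dispositions, e: float) -> float:
    """AAPOR RR3: I / ((I + P) + (R + NC + O) + e * U)."""
    denom = (d.complete + d.partial) + (d.refusal + d.noncontact + d.other) + e * d.unknown
    return d.complete / denom


def cooperation_rate_coop1(d: Dispositions) -> float:
    """AAPOR COOP1: I / ((I + P) + R + O), among cases actually contacted."""
    return d.complete / ((d.complete + d.partial) + d.refusal + d.other)


# Hypothetical counts, for illustration only.
d = Dispositions(complete=600, partial=0, refusal=200, noncontact=100, other=0, unknown=200)
e = estimated_eligibility(eligible=900, ineligible=100)  # 0.9
print(round(response_rate_rr3(d, e), 3))  # 600 / (600 + 300 + 180) -> 0.556
print(cooperation_rate_coop1(d))          # 600 / 800 -> 0.75
```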

Sample weights

Sample weights were created for both cases and controls to account for the sampling design and non-response. Weights are based on each participant’s probability of selection, with adjustments for non-response at the screener and main interview levels. To frequency match controls to cases, a weighted distribution of cases was established for each study site across age and race cells, and control sample weights were then post-stratified to the weighted totals within each of these cells [63]. Replicate weights were also created to estimate variability, including standard errors. To inform the non-response adjustments, demographic characteristics were available for 86% of sampled controls (complete roster information) and for 100% of sampled cases (age, race, site, county, ER status). Replicate weights were created for case–control and case-only analyses. A second set of weights was created for control-only analyses to weight controls to the source population, and replicate weights were also created for blood sample analyses.
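The post-stratification step rescales control weights so that, within each site × race × age cell, they sum to the weighted number of cases in that cell. The Python sketch below shows only that step, with hypothetical records and weights; it assumes base and non-response-adjusted weights have already been computed, and it does not reproduce Westat’s weighting procedure.

```python
from collections import defaultdict

# Hypothetical records: (site, race, age group, weight after non-response adjustment).
cases = [("Detroit", "NHB", "40-44", 1.0), ("Detroit", "NHB", "40-44", 1.0),
         ("LA", "NHW", "45-49", 3.28), ("LA", "NHW", "45-49", 1.18)]
controls = [("Detroit", "NHB", "40-44", 150.0),
            ("LA", "NHW", "45-49", 90.0), ("LA", "NHW", "45-49", 110.0)]


def poststratify(controls, cases):
    """Rescale control weights so each cell's total equals the weighted case total."""
    case_total = defaultdict(float)
    control_total = defaultdict(float)
    for site, race, age, w in cases:
        case_total[(site, race, age)] += w
    for site, race, age, w in controls:
        control_total[(site, race, age)] += w
    adjusted = []
    for site, race, age, w in controls:
        cell = (site, race, age)
        if case_total[cell] == 0:
            raise ValueError(f"no cases in cell {cell}")
        factor = case_total[cell] / control_total[cell]  # cell-specific ratio adjustment
        adjusted.append((site, race, age, w * factor))
    return adjusted


for row in poststratify(controls, cases):
    print(row)
```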

Statistical analyses

Primary analyses use survey-weighted multiple logistic regression to account for the study design and potential confounding. Where appropriate, potential effect modification by study site, race, and/or socioeconomic position is being evaluated. For some analyses, structural equation modeling (SEM) with latent variables is being used to evaluate exposures over the life-course [64]. For some analyses, we are also using survey-weighted polytomous logistic regression to assess heterogeneity in risk by tumor subtype.
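Survey-weighted logistic regression estimates coefficients by maximizing a weighted (pseudo-)likelihood, and replicate weights yield standard errors by refitting the model with each replicate weight set. The numpy sketch below illustrates both ideas with simulated data; it is not the study’s analysis code, which would typically rely on a survey-analysis package, and because the text does not describe how the replicate weights were formed, the variance scaling factor `scale` is left as a user-supplied assumption.

```python
import numpy as np


def weighted_logistic(X, y, w, max_iter=50, tol=1e-10):
    """Weighted (pseudo-)maximum-likelihood logistic regression via Newton-Raphson.

    X: (n, p) design matrix including an intercept column; y: 0/1 outcomes;
    w: survey weights. Returns coefficients on the log-odds scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))                      # weighted score vector
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def replicate_se(X, y, full_w, replicate_ws, scale):
    """Standard errors from refits with each replicate weight set.

    `scale` depends on how replicates were built (e.g. (R - 1) / R for
    delete-one jackknife replicates); its value is an assumption.
    """
    b_full = weighted_logistic(X, y, full_w)
    b_reps = np.array([weighted_logistic(X, y, rw) for rw in replicate_ws])
    return np.sqrt(scale * ((b_reps - b_full) ** 2).sum(axis=0))


# Illustration with simulated data (not study data).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))
y = (rng.random(n) < true_p).astype(float)
w = rng.integers(1, 4, size=n).astype(float)  # hypothetical integer weights

beta = weighted_logistic(X, y, w)
print("odds ratio per unit x:", np.exp(beta[1]))
```

With integer weights, the weighted fit equals an unweighted fit to data in which each record is repeated w times, which is a quick check that the weighting is implemented correctly.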

Operational results

Case participation

A total of 5,309 potentially eligible women were identified through the Detroit (n = 2,527) and LA (n = 2,782) SEER registries (Table 2). Of these, 80% were sampled (see Case sampling), and 3,326 were determined to be eligible or potentially eligible (Table 2). Among sampled cases, 124 women died before they could be interviewed, and 82 could not be contacted because physician or hospital permission was not obtained. Other reasons for non-interview were that the woman could not be located (n = 177), had moved away from the study area (n = 70), was too ill (n = 23), or did not respond after the maximum number of contact attempts (n = 415). Of the 3,326 sampled and potentially eligible women, study staff had the opportunity to recruit 2,435. Of these, 623 declined to participate and 1,812 were interviewed (ER+ n = 1,310; ER- n = 437). The overall cooperation rate was 74.4% (Detroit: 71.9%, LA: 77.2%), and the response rate was 59.8% (Detroit: 53.1%, LA: 66.4%) (Supplemental Table 1). Response rates were slightly higher for NHB women (60.2%) than for NHW women (59.8%) and higher for LA (66.4%) than for Detroit (53.1%) (Supplemental Table 1), but did not vary significantly by age (Supplemental Table 2).
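The case disposition counts can be checked arithmetically: subtracting the women study staff could not recruit from the 3,326 eligible or potentially eligible cases gives the 2,435 who could be approached, and dividing interviews by that number gives the 74.4% cooperation rate. The 59.8% response rate is not reproduced here, because it uses AAPOR imputation for cases of unknown eligibility, which this simple ratio does not include.

```python
eligible = 3326
not_recruitable = {
    "died before interview": 124,
    "no physician/hospital permission": 82,
    "could not be located": 177,
    "moved away": 70,
    "too ill": 23,
    "no response after maximum contacts": 415,
}
opportunity = eligible - sum(not_recruitable.values())  # women staff could approach
declined, interviewed = 623, 1812
cooperation = interviewed / opportunity

print(opportunity)                  # 2435
print(declined + interviewed)       # 2435
print(f"{100 * cooperation:.1f}%")  # 74.4%
```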

Control participation

A total of 24,612 households were sampled in Detroit (n = 9,994) and LA (n = 14,618) (Table 2). Of these, 21,668 were eligible or potentially eligible, and 18,612 households completed a roster (86% response rate) (Supplemental Table 1). Households in inaccessible gated communities, which could not be rostered, accounted for 9% of potentially eligible households in LA and 1% in Detroit. From households that completed rosters, 3,414 women were sampled and 2,716 completed screeners (88% response rate; Supplemental Table 1). Screeners were not obtained because the woman resided outside the study area (n = 24), was too ill (n = 2), was not reached after the maximum number of attempts (n = 132), or was sampled in error (n = 9). Of the 3,247 sampled women whom interviewers had the opportunity to screen, 83.6% were screened. Of these, 1,988 were eligible or potentially eligible, and 97.2% agreed to be contacted by study site staff; Westat therefore provided contact information for 1,933 women. Study site staff had no opportunity to interview 223 of these women: 12 were ineligible, 2 died before the interview, 6 could not be located, 30 moved away from the study area, 2 were too ill, and 171 were not reached after the maximum number of attempts. Thus, 1,708 women were confirmed to be eligible and agreed to be contacted by study site staff. Of these, 327 refused to participate (4% via proxy), and 1,381 completed the main interview (Table 2). Combining the household roster cooperation rate (94%), the screener cooperation rate (84%), and the study site recruitment cooperation rate (81%), the overall control cooperation rate was 65% (Supplemental Table 1).
Similarly, combining the household roster response rate (86%), the participant screener response rate (88%), the rate of agreement to be contacted by study site staff (98%), and the study site recruitment response rate (72%) yields an overall control response rate of 53% (Supplemental Table 1). Response rates were higher for NHB women (57.9%) than for NHW women (48.3%) and for LA (58.5%) than for Detroit (49.3%) (Supplemental Table 1), but did not vary significantly by age (Supplemental Table 2).
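Because a control had to respond at every stage to be interviewed, the overall control response rate is the product of the stage-specific rates. Multiplying the rounded stage rates reported above reproduces the published 53%:

```python
from math import prod

# Stage-specific control response rates reported in the text.
stage_rates = {
    "household roster": 0.86,
    "participant screener": 0.88,
    "agreed to be contacted": 0.98,
    "study site recruitment": 0.72,
}
overall = prod(stage_rates.values())  # respond at every stage -> multiply
print(f"overall control response rate: {100 * overall:.0f}%")  # 53%
```

The same multiplication can be applied to the stage cooperation rates, although products of rounded inputs may differ from a published overall figure by about a percentage point.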

Main interview

Location of completed interviews

For cases and controls, respectively, 73.2% and 80.8% of interviews were conducted in the participant’s home, 3.4% and 3.0% at a study site office, and 23.5% and 16.2% at other locations (e.g., a coffee shop, local library, or healthcare provider’s office). Distributions of interview locations were similar across study sites.

Interview timing

The median period between reference date and interview date was 153 days for controls and 378 days for cases (Supplemental Table 3).

Table 3 Weighted demographic characteristics of interviewed participants by site and case–control status, Young Women’s Health History Study (N = 3,193)

Length of main questionnaire

The questionnaire included 639 questions (excluding probing questions and repeated questions about exposures over the life-course). The median administration time was 130 min for cases and 120 min for controls (Supplemental Table 4). The median duration of the anthropometric measurement section was 11 min for both cases and controls (Supplemental Table 4). Interviews were longer for NHB women (141 min) than for NHW women (119 min) and for poorer women (HPP < 150%; 132 min) than for wealthier women (HPP ≥ 300%; 120 min).

Table 4 Completion rates of study materials by case–control status and race, Young Women’s Health History Study

Description of interviewed study population

Table 3 shows the weighted demographic characteristics of interviewed study participants. Cases were more likely than controls to be wealthier (HPP ≥ 300%: 52.0% vs. 46.3%) and less likely to be unemployed (17.9% vs. 25.9%). Participants were similar across study sites, although both NHB and NHW women were more likely to be poor (HPP < 150%) in Detroit than in LA. Across both study sites, NHB women were also significantly more likely to be poor (35.1% of cases; 49.1% of controls) than NHW women (12.3% of cases; 15.8% of controls) (Table 3).

Completion of study components

Completion rates for all ancillary data collection efforts and for biospecimen collection are reported in Table 4. Nearly all participants completed the main interview (99%) and provided anthropometry measurements (95% of cases and 96% of controls). Most also provided a blood sample (75% of cases and controls) or, if blood was not provided, saliva (84% of cases and 81% of controls provided blood or saliva). Of women with BC who consented to tumor tissue retrieval, 60% had tissue available for analysis; to date, 58% of the available tissue has been retrieved (n = 660). Nearly all interviewed participants (97%) agreed to be contacted in the future.


We successfully conducted the YWHHS, a large population-based case–control epidemiologic study grounded in the eco-social theory of disease etiology [42], to identify potentially modifiable factors associated with young-onset BC overall and by molecular tumor subtype, and to investigate racial and socioeconomic inequities in BC among young NHB and NHW women. For the extensive in-person interview (median time 120–130 min), we achieved a 60% response rate among cases and a 53% response rate among controls; the cooperation rate among those we had the opportunity to interview was 74% among cases and 65% among controls. These rates were achieved through extensive follow-up efforts supported by a centralized computer tracking system. We also achieved a high response to our request for biospecimens: 75% of participants provided blood, and 82% provided either blood or, when blood was not available, saliva. Linkage to NCI SEER cancer registry data provides a valid definition of breast cancer cases and detailed information on tumor subtype. With survey data linked to biospecimen information, we have collected comprehensive data to address this study’s research questions as well as future studies of breast cancer. This is one of the largest population-based case–control studies of young-onset BC. Additionally, to our knowledge, it is the largest population-based case–control study of BC in young NHB women and the largest in which extensive life-course individual-level socioeconomic measures were collected to evaluate racial and socioeconomic inequities in BC risk.
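Response and cooperation rates differ mainly in their denominators. The study’s reference list includes the AAPOR Standard Definitions [62], but this section does not state which formulas were used, so the sketch below assumes AAPOR RR3 (response rate, counting cases of unknown eligibility in proportion to an estimated eligibility fraction e) and COOP1 (cooperation rate among eligible units that were contacted). The disposition counts are hypothetical and are not meant to reproduce the YWHHS rates.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """Final disposition counts; AAPOR symbols in comments (hypothetical values)."""
    complete: int           # I: complete interviews
    partial: int            # P: partial interviews
    refusal: int            # R: refusals and break-offs
    noncontact: int         # NC: eligible, never contacted
    other: int              # O: other eligible non-interviews
    unknown_household: int  # UH: unknown whether an eligible household/unit
    unknown_other: int      # UO: unknown eligibility, other


def rr3(d: Dispositions, e: float) -> float:
    """AAPOR RR3: unknown-eligibility cases count in proportion e."""
    denominator = (
        (d.complete + d.partial)
        + (d.refusal + d.noncontact + d.other)
        + e * (d.unknown_household + d.unknown_other)
    )
    return d.complete / denominator


def coop1(d: Dispositions) -> float:
    """AAPOR COOP1: completes among eligible units that were contacted."""
    return d.complete / ((d.complete + d.partial) + d.refusal + d.other)


d = Dispositions(complete=600, partial=0, refusal=200, noncontact=150,
                 other=50, unknown_household=60, unknown_other=40)
print(f"RR3 = {rr3(d, e=0.5):.3f}")  # RR3 = 0.571
print(f"COOP1 = {coop1(d):.3f}")     # COOP1 = 0.706
```

Because the cooperation rate drops non-contacts and unknown-eligibility cases from the denominator, it can never be lower than the response rate under these definitions, which is consistent with the pattern reported above (74% vs. 60% for cases; 65% vs. 53% for controls).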


Strengths of this study include its exclusive focus on young women (aged < 50 years), its incorporation of tumor subtype information [9], and its design to shed light on inequities in risk between young NHB and NHW women by life-course SEP. The study also benefits from population-based ascertainment of cases and controls and from the availability of sample weights. The centralized YWHHS Coordinating Center synchronized data collection across study sites by conducting all interviewer and recruitment training and oversight and by maintaining the study’s centralized tracking system. A further strength is the in-depth assessment of social context, including residential history and the current built environment. Finally, biomarkers, inherited genetic factors associated with BC, and gene expression changes, all of which are understudied in young-onset BC, can be evaluated in this population-based study.


Limitations of this study include potential residual recall bias for exposures that could not be validated. The study, however, used methods such as a life calendar to minimize this bias [65]: life-course exposures were collected with recall aids, and YWHHS was able to validate recalled responses for key exposures, e.g., using measured adult and childhood photos to validate recalled anthropometry. The study sample size also limits our ability to detect small effects or associations with rarer exposures when examining young-onset BC risk by some rarer tumor subtypes or within some population subgroups; data from this study can be pooled with other studies to address these questions. The timing of blood sample collection also precludes examination of factors potentially affected by treatment or “case” status, although extensive information was collected to allow the study of these potential influences. Additionally, although “race” was ultimately self-reported, it was initially based on the SEER registry for cases; SEER registry reports of “race” and “Hispanic ethnicity” are highly correlated with self-report [66, 67].

An additional limitation could be the study response rates; however, complete enumeration of cases in the SEER registry and 86% enumeration of sampled control households enabled us to incorporate non-response sample weights to mitigate this limitation. Declines in response rates to national surveys, particularly telephone surveys, over the study’s data collection period are well documented, and the challenges that drove this decline also contributed to reduced response rates among YWHHS cases and controls [68]. Study response rates are, however, well within the ranges reported in the literature [53, 69, 70], particularly given the data collection period, participants’ ages, and the well-recognized challenges in enrolling disadvantaged populations [71, 72]. We found that women were more willing to participate when interviewers were similar to them in race and age (data not shown) [71, 73] and that response rates may have been lower among White women in Detroit because of interviewer–participant age incongruence. A further recruitment and scheduling challenge was that women juggling childcare, work, other family responsibilities, or demanding cancer treatment regimens often rescheduled interviews. To address these obstacles, dedicated telephone schedulers were hired; targeted letters were mailed to address concerns about confidentiality and time constraints; in-person follow-up visits were attempted with controls in Detroit and with cases and controls in LA; and the study incentive was increased.
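This section does not describe how the non-response weights were constructed. As an illustration of the general idea only, using a textbook weighting-class adjustment that is not necessarily the YWHHS procedure, each respondent’s base weight is multiplied by the ratio of the summed base weights of all eligible sampled units to those of respondents within the same adjustment cell, so that respondents stand in for similar non-respondents. The cells, weights, and response statuses below are hypothetical.

```python
from collections import defaultdict


def weighting_class_adjust(units):
    """Weighting-class non-response adjustment (generic sketch).

    units: list of dicts with keys 'cell', 'base_weight', 'responded'.
    Returns one adjusted weight per unit; non-respondents get 0.0 and
    respondents absorb the weight of non-respondents in the same cell.
    """
    eligible = defaultdict(float)
    respondents = defaultdict(float)
    for unit in units:
        eligible[unit["cell"]] += unit["base_weight"]
        if unit["responded"]:
            respondents[unit["cell"]] += unit["base_weight"]

    adjusted = []
    for unit in units:
        if unit["responded"]:
            factor = eligible[unit["cell"]] / respondents[unit["cell"]]
            adjusted.append(unit["base_weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical adjustment cells (e.g., site by race); all values are invented.
units = [
    {"cell": "Detroit-NHB", "base_weight": 10, "responded": True},
    {"cell": "Detroit-NHB", "base_weight": 10, "responded": False},
    {"cell": "LA-NHW", "base_weight": 5, "responded": True},
    {"cell": "LA-NHW", "base_weight": 5, "responded": True},
    {"cell": "LA-NHW", "base_weight": 5, "responded": False},
]
print(weighting_class_adjust(units))  # [20.0, 0.0, 7.5, 7.5, 0.0]
```

The adjusted weights still sum to the total base weight (35 in this example). A cell with no respondents would lose its weight entirely, which is why sparse cells are usually collapsed in practice.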

Future directions

Analyses using collected YWHHS data are in progress. Additional supplemental projects are possible, including pooling data with other studies (particularly to study rarer tumor subtypes); evaluating risk for other BC tumor subtypes; studying factors associated with mammography and BC survival; studying biomarkers, e.g., gene expression; integrating external data with data on geocoded life-course residential histories; and evaluating intermediate biomarkers and BC risk. Results from YWHHS should expand our understanding of potentially modifiable factors associated with BC risk overall and by subtype and identify sources of racial and socioeconomic inequities in young-onset BC.

Availability of data and material

The datasets analyzed during the current study are not publicly available because the main study findings are still being analyzed, but they are available from the corresponding author on reasonable request.

Abbreviations

American Association for Public Opinion Research


Breast cancer


Bioelectrical impedance analysis


Computer-assisted personal main interview


Contraceptive and Reproductive Endpoints


California Committee for the Protection of Human Subjects


Cancer Research Informatics Core


Cancer Surveillance Program


Food Frequency Questionnaire


Hormone receptor


Institutional review board


Medical College of Wisconsin


Metropolitan Detroit Cancer Surveillance System


Michigan State University


Non-Hispanic Black


Non-Hispanic White


Quality control


Rapid case ascertainment


Response rate


Surveillance, Epidemiology, and End Results


United States


University of Southern California


University of Wisconsin—Milwaukee




Wayne State University


Young Women’s Health History Study

References

  1. Ward E et al (2019) Annual Report to the Nation on the Status of Cancer, 1999–2015, Featuring Cancer in Men and Women ages 20–49. J Natl Cancer Inst

  2. DeSantis C et al (2019) Breast cancer statistics, 2019. CA Cancer J Clin 69(6):438–451

  3. Warner ET et al (2013) Reproductive factors and risk of premenopausal breast cancer by age at diagnosis: are there differences before and after age 40? Breast Cancer Res Treat 142(1):165–175

  4. White AJ et al (2015) Overall and central adiposity and breast cancer risk in the Sister Study. Cancer 121(20):3700–3708

  5. Chollet-Hinton L et al (2016) Breast cancer biologic and etiologic heterogeneity by young age and menopausal status in the Carolina Breast Cancer Study: a case-control study. Breast Cancer Res 18(1):79

  6. Assi H et al (2013) Epidemiology and prognosis of breast cancer in young women. J Thorac Dis 5(Suppl 1):S2–S8

  7. Nichols HB, Schoemaker MJ, Wright LB, McGowan C, Brook MN, McClain KM, Jones ME, Adami HO, Agnoli C, Baglietto L, Bernstein L (2017) The premenopausal breast cancer collaboration: a pooling project of studies participating in the National Cancer Institute Cohort Consortium. Cancer Epidemiol Biomarkers Prev 26(9):1360–1369

  8. Johnson KC, Glantz SA (2008) Evidence secondhand smoke causes breast cancer in 2005 stronger than for lung cancer in 1986. Prev Med 46(6):492–496

  9. Barnard M, Boeke C, Tamimi R (2015) Established breast cancer risk factors and risk of intrinsic tumor subtypes. Biochim Biophys Acta 1856(1):73–85

  10. Perou CM et al (2000) Molecular portraits of human breast tumours. Nature 406(6797):747–752

  11. Goldhirsch A et al (2013) Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann Oncol 24(9):2206–2223

  12. Balic M et al (2019) St. Gallen/Vienna 2019: a brief summary of the consensus discussion on the optimal primary breast cancer treatment. Breast Care 14(2):103–110

  13. Shoemaker ML, White MC, Wu M, Weir HK, Romieu I (2018) Differences in breast cancer incidence among young women aged 20–49 years by stage and tumor characteristics, age, race, and ethnicity, 2004–2013. Breast Cancer Res Treat 169(3):595–606

  14. Chen HL et al (2016) Effect of age on breast cancer patient prognoses: a population-based study using the SEER 18 Database. PLoS ONE 11(10):e0165409

  15. Eccles SA et al (2013) Critical research gaps and translational priorities for the successful prevention and treatment of breast cancer. Breast Cancer Res 15(5):R92

  16. Bray F et al (2018) Global Cancer Statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6):394–424

  17. Henley SJ et al (2020) Annual Report to the Nation on the Status of Cancer, Part I: National Cancer Statistics. Cancer 126:2225–2249

  18. Singh GK, Jemal A (2017) Socioeconomic and racial/ethnic disparities in cancer mortality, incidence, and survival in the United States, 1950–2014: over six decades of changing patterns and widening inequalities. J Environ Public Health 2017:2819372

  19. SEER*Explorer Application: Breast Cancer Recent Trends in SEER Age-Adjusted Mortality Rates, 2000–2018 by Race/Ethnicity, Female, Ages <50, SEER, Editor. 2020.

  20. SEER*Explorer Application: Breast Cancer Recent Trends in SEER Age-Adjusted Incidence Rates, 2000–2017 by Race/Ethnicity, Female, Ages 15–39, All Stages, Delay-adjusted Rates, SEER, Editor. 2020.

  21. Qin B et al (2021) Neighborhood social environmental factors and breast cancer Subtypes among Black Women. Cancer Epidemiol Biomarkers Prev 30(2):344–350

  22. Linnenbringer E et al (2020) Associations between breast cancer subtype and neighborhood socioeconomic and racial composition among Black and White women. Breast Cancer Res Treat 180(2):437–447

  23. Taylor TR et al (2007) Racial discrimination and breast cancer incidence in US Black women: the Black Women’s Health Study. Am J Epidemiol 166(1):46–54

  24. Lu G et al (2018) The fluctuating incidence, improved survival of patients with breast cancer, and disparities by age, race, and socioeconomic status by decade, 1981–2010. Cancer Manag Res 10:4899–4914

  25. Andaya AA et al (2012) Socioeconomic disparities and breast cancer hormone receptor status. Cancer Causes Control 23(6):951–958

  26. Colditz GA, Bohlke K, Berkey CS (2014) Breast cancer risk accumulation starts early: prevention must also. Breast Cancer Res Treat 145(3):567–579

  27. Kawai M et al (2014) Height, body mass index (BMI), BMI change, and the risk of estrogen receptor-positive, HER2-positive, and triple-negative breast cancer among women ages 20 to 44 years. Cancer 120(10):1548–1556

  28. Ma H et al (2015) Reduced risk of breast cancer associated with recreational physical activity varies by HER2 status. Cancer Med 4(7):1122–1135

  29. Premenopausal Breast Cancer Collaborative Group et al (2018) Association of Body Mass Index and age with subsequent breast cancer risk in premenopausal women. JAMA Oncol 4(11):e181771

  30. Robinson WR et al (2014) Body size across the life course and risk of premenopausal and postmenopausal breast cancer in Black women, the Carolina Breast Cancer Study, 1993–2001. Cancer Causes Control 25(9):1101–1117

  31. Xue F et al (2016) Body fatness throughout the life course and the incidence of premenopausal breast cancer. Int J Epidemiol 45(4):1103–1112

  32. Millikan RC et al (2008) Epidemiology of basal-like breast cancer. Breast Cancer Res Treat 109(1):123–139

  33. Chollet-Hinton L et al (2017) Biology and Etiology of Young-Onset Breast Cancers among Premenopausal African American Women: Results from the AMBER Consortium. Cancer Epidemiol Biomarkers Prev 26(12):1722–1729

  34. Bandera EV et al (2015) Obesity, body fat distribution, and risk of breast cancer subtypes in African American women participating in the AMBER Consortium. Breast Cancer Res Treat 150(3):655–666

  35. Krieger N (1994) Epidemiology and the web of causation: has anyone seen the spider? Soc Sci Med 39(7):887–903

  36. Krieger N (2020) Measures of racism, sexism, heterosexism, and gender binarism for health equity research: from structural injustice to embodied harm-an ecosocial analysis. Annu Rev Public Health 41:37–62

  37. Omi M, Winant H (1994) Racial formation in the United States: from the 1960’s to the 1990’s. Routledge, New York

  38. Jones CP (2000) Levels of racism: a theoretic framework and a gardener’s tale. Am J Public Health 90:1212–1215

  39. Duster T (2005) Race and reification in science. Science 307(5712):1050–1051

  40. Williams DR, Mohammed SA, Shields AE (2016) Understanding and effectively addressing breast cancer in African American women: Unpacking the social context. Cancer 122(14):2138–2149

  41. Ford CL, Harawa NT (2010) A new conceptualization of ethnicity for social epidemiologic and health equity research. Soc Sci Med 71(2):251–258

  42. Krieger N (2013) Ecosocial theory of disease distribution: embodying societal & ecologic context. In: Epidemiology and the people's health: theory and context, pp 202–235

  43. Williams DR, Mohammed SA (2009) Discrimination and racial disparities in health: evidence and needed research. J Behav Med 32(1):20–47

  44. Linnenbringer E, Gehlert S, Geronimus AT (2017) Black-White disparities in breast cancer subtype: the intersection of socially patterned stress and genetic expression. AIMS Public Health 4(5):526–556

  45. Jones CP (2002) Confronting institutionalized racism. Phylon 50(1/2):7–22

  46. Jones CP (2001) Invited commentary: "race," racism, and the practice of epidemiology. Am J Epidemiol 154(4):299–304; discussion 305–6.

  47. Krieger N (2013) History, biology, and health inequities: emergent embodied phenotypes and the illustrative case of the breast cancer estrogen receptor. Am J Public Health 103(1):22–27

  48. DiGaetano R, Waksberg J (2002) Commentary: trade-offs in the development of a sample design for case-control studies. Am J Epidemiol 155(8):771–775

  49. Brogan DJ et al (2001) Comparison of telephone sampling and area sampling: response rates and within-household coverage. Am J Epidemiol 153(11):1119–1127

  50. National Health And Nutrition Examination Survey III: Field Operations Manual. 1991.

  51. Mesenbourg T et al (2010) Census Summary File 1

  52. Jones CP et al (2008) Using “socially assigned race” to probe white advantages in health status. Ethn Dis 18(4):496–504

  53. Marchbanks PA et al (2002) The NICHD Women’s Contraceptive and Reproductive Experiences Study: methods and operational results. Ann Epidemiol 12(4):213–221

  54. Gammon MD et al (2002) The Long Island Breast Cancer Study Project: description of a multi-institutional collaboration to identify environmental risk factors for breast cancer. Breast Cancer Res Treat 74(3):235–254

  55. Brinton LA et al (1995) Oral contraceptives and breast cancer risk among younger women. J Natl Cancer Inst 87(11):827–835

  56. Resnick M, Bearman P, Blum R (1997) Protecting adolescents from harm. Findings from the National Longitudinal Study on Adolescent Health. JAMA 278(10):823–832

  57. Hamilton C, Strader L, Pratt J (2011) The PhenX Toolkit: get the most from your measures. Am J Epidemiol 174(3):253–260

  58. Wingo P et al (1988) The evaluation of the data collection process for a multicenter, population-based, case-control design. Am J Epidemiol 128(1):206–217

  59. Pebley NSA (2003) Neighborhood and family effects on children’s health in Los Angeles. RAND, Santa Monica

  60. Weir SS Healthy environments partnership neighborhood observational checklist. University of Michigan, Ann Arbor, pp 1–12.

  61. Kelly C et al (2013) Using Google Street View to audit the built environment: inter-rater reliability results. Ann Behav Med 45(Suppl 1):S108–S112

  62. AAPOR (2016) Standard definitions: final dispositions of case codes and outcome rates for surveys. AAPOR, Oakbrook Terrace

  63. Li Y, Graubard B, DiGaetano R (2011) Weighting methods for population-based case-control studies with complex sampling. J R Stat Soc Ser C 60(2):165–185

  64. Bollen K (1989) Structural equations with latent variables. Wiley, New York

  65. Mahabir S et al (2012) Challenges and opportunities in research on early-life events/exposures and cancer development later in life. Cancer Causes Control 23(6):983–990

  66. Gomez SL, Glaser SL (2006) Misclassification of race/ethnicity in a population-based cancer registry (United States). Cancer Causes Control 17(6):771–781

  67. Hamilton A et al (2009) Latinas and breast cancer outcomes: population-based sampling, ethnic identity, and acculturation assessment. Cancer Epidemiol Biomarkers Prev 18(7):2022–2029

  68. Tourangeau R, Plewes TJ (2013) Nonresponse in social science surveys: a research agenda. National Academy of Sciences, Washington, DC

  69. Xu M et al (2018) Response rates in case-control studies of cancer by era of fieldwork and by characteristics of study design. Ann Epidemiol 28(6):385–391

  70. Palmer JR, Ambrosone CB, Olshan AF (2014) A collaborative study of the etiology of breast cancer subtypes in African American women: the AMBER consortium. Cancer Causes Control 25(3):309–319

  71. Pinn V et al (2003) Outreach Notebook: For the inclusion, recruitment and retention of women and minority subjects in clinical research. U.S. Department of Health and Human Services, National Institutes of Health

  72. Bartlett DWR (2013) Recruitment and retention of African American and Hispanic girls and women in research. Public Health Nurs 30(2):159–166

  73. Moorman PG et al (1999) Participation rates in a case-control study: the impact of age, race, and race of interviewer. Ann Epidemiol 9(3):188–195

  74. Schwartz K et al (2013) Enhancement and validation of an Arab surname database. J Registry Manag 40(4):176–179

  75. Williams DR (1997) Race and health: basic questions, emerging directions. Ann Epidemiol 7:322–333

  76. Folsom R, Potter F, Williams S (1987) Notes on a composite size measure for self-weighting samples in multiple domains. American Statistical Association Meeting, pp 792–796

Acknowledgements

We would first like to extend our deep appreciation to the women who contributed as participants to the Young Women’s Health History Study. We would also like to thank the following individuals who contributed to the study design and data collection. Community Advisors: Twyla Griffin; Kommah McDowell; Hope Bradford; Katie Clark; Diana Dyer; Brenda Krentler; Karen Owens; Vernessa Patrick; Karry Samulski; Lori Wesby; Hanna Weber; the Metropolitan Detroit SEER registry and Epidemiology Research Core (Wayne State University/Karmanos Cancer Institute): Dr. Jennifer Beebe-Dimmer, Julie Ruterbusch and Fawn Vigneau; the Michigan State Vital Statistics Registrar: Dr. Glenn Copeland; the Los Angeles County SEER registry: Dr. Dennis Deapen; Justin Cook; Maria Isabel Gaeta; Yaping Wang; YWHHS Los Angeles County Data Collection and Processing Team (University of Southern California): Denise Modjeski; Kashonda Davis; Wendy McGlothlin; Paige Rosenthal; Jennifer Zelaya; Renee Bickerstaff-Magee; Elesa Maxie; Priscilla Gardner; the YWHHS Metropolitan Detroit Data Collection and Processing Team (Wayne State University/Karmanos Cancer Institute): Dr. Gwendolyn Norman; Landa Daniels; Tara Baird; Amanda Bullock; Terry Smith; Mary Beth Kolbicz; Verona Ivory; Arkeshia Barnes; Heloise Glenn; Velma White; Terry Smith; Ernestine Anthony; and Deborah Kimbrough; our Westat Team: Dr. Jeanne Rosenthal; Giannella De Rienzo; Craig Ray; Jane Li; Sabrina Zhang; and the many field interviewers in Detroit and LA; YWHHS Computer Tracking System (University of Southern California Cancer Informatics Core): Aarti Vaishnav; Reed Comire; Vaibhav Bora; Jeet Poonater; Waikeung Louis Lee; and Charanya Ram Kumar; Survey Biostatistical Consultant (National Cancer Institute): Dr. Barry Graubard; Racial Sensitivity Trainer/Field Work Consultants (University of Southern California): Dr. Karen Lincoln; Dr. Rose Monteiro; Questionnaire Development Consultants: Dr. Lorraine Halinka Malcoe; Dr. Christine Erdmann; YWHHS Biospecimen Biorepository Staff (Medical College of Wisconsin Tissue Bank): Dr. Saul Suster; Mary Rau; Janelle Lang-Piette; Ellen Schneider; Matthew Dunham; Whitney Stibb; YWHHS Detroit Tumor Processing Team (Medical College of Wisconsin): Dr. Craig MacKinnon; Dr. Zainab Basir; Kathy Stoll; Los Angeles YWHHS Tumor Collection/Processing: Dr. Wendy Cozen; Dr. Debra Hawes; Jose Aparicio; Dr. Maria Sibug-Saber; Moli Chen; Tumor Subtyping Consultants: Dr. Howard Chang; Dr. Sandra Haslam; Dr. Melissa Troester; Dr. Mark Sherman; YWHHS Biospecimen Laboratory Staff (Michigan State University): Dr. Rachel Schiffman; Alice Schehr; Melanie Adkins; Dr. Sainan Wei; Genetic Biostatistician Consultant: Dr. Goncalo Abecasis; Nutritional Assessment Consultants (NutritionQuest): Tory Block; Dr. Jean Norris; Kinesiology Consultants and Interviewer Trainers: Dr. Emily Guseman (OSU); Dr. Kimbo Yee; YWHHS Interviewer Training and Quality Control (Michigan State University/University of Wisconsin—Milwaukee): Dr. Jeanne Meier; Scientific Advisors: Dr. Otis Brawley; Dr. Lawrence Brody; Dr. Larry Kushi; Dr. Camara Jones; Dr. Julie Palmer; Dr. Mark Sherman; and Dr. Anne Sumner; and our past YWHHS Central Coordinating Center Research team members (Michigan State University/University of Wisconsin, Milwaukee): Kara Mannor; Steven Larmore; Marielle Gagnier; James Dodge; Andrew Jessmore; Olga Prushinskaya; Theresa Kowalaski; Kevin Petersen; Stephan Diljak; Hannah Selig; Cristin McArdle; Dr. Julie Schuppie; Amy Parry; Beneet Pandey; Bethany Canales; Cory Steinmetz; David Strong; Sofia Haile; James Groh; Jenn Woo; Brian Thayer; Dan Sanfelippo; Nicole Carlson and Anamarie LeDuc. Additionally, we would like to thank Drs. Leslie Bernstein and Katie Henderson, City of Hope, for assistance with the development of the study design and obtaining funding, as well as Dr. Karen Klomparens, Dean of the Graduate School, Michigan State University, for her support.
The authors assume full responsibility for analyses and interpretation of these data.

Funding

This work was directly supported by the National Institutes of Health (NIH) National Cancer Institute (NCI) grant R01CA136861 (E. Velie). The collection of cancer incidence data from California used in this study was supported by the California Department of Public Health pursuant to California Health and Safety Code Sect. 103885; Centers for Disease Control and Prevention’s (CDC) National Program of Cancer Registries, under cooperative agreement 5NU58DP006344; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under contract HHSN261201800032I awarded to the University of California, San Francisco, contract HHSN261201800015I awarded to the University of Southern California, and contract HHSN261201800009I awarded to the Public Health Institute. Z. Zhang was supported by the NIH Office of Research on Women’s Health and the National Institute of Child Health and Human Development K12HD043488 (Building Interdisciplinary Research Careers in Women’s Health, BIRCWH). The ideas and opinions expressed herein are those of the authors and do not necessarily reflect the opinions of the funders.

Author information

Authors and Affiliations


Contributions

EMV: Conceptualization, supervision, methodology, funding acquisition, data curation, writing—original draft, writing—reviewing and editing. LRM: Project administration, data curation, formal analysis, writing—original draft, writing—reviewing and editing. DRP: Conceptualization, methodology, funding acquisition, data curation, writing—reviewing and editing. ASH: Conceptualization, supervision, methodology, funding acquisition, data curation, writing—reviewing and editing. RD: Conceptualization, supervision, methodology, funding acquisition, data curation, writing—reviewing and editing. RK: Project administration, supervision, data curation, writing—reviewing and editing. BG: Project administration, supervision, data curation, writing—reviewing and editing. RH: Conceptualization, methodology, funding acquisition, writing—reviewing and editing. NC: Conceptualization, methodology, writing—reviewing and editing. LKO: Conceptualization, methodology, funding acquisition, writing—reviewing and editing. AA: Conceptualization, methodology, writing—reviewing and editing. ZZ: Conceptualization, methodology, data curation, writing—reviewing and editing. DM: Project administration, supervision, data curation, writing—reviewing and editing. GN: Project administration, supervision, data curation, writing—reviewing and editing. DRL: Project administration, formal analysis, writing—reviewing and editing. SG: Project administration, data curation, writing—reviewing and editing. HR: Conceptualization, supervision, methodology, data curation, writing—reviewing and editing. KS: Conceptualization, supervision, methodology, funding acquisition, data curation, writing—reviewing and editing.

Corresponding author

Correspondence to Ellen M. Velie.

Ethics declarations

Conflict of interest

None declared.

Ethical approval

This study protocol was approved by the Institutional Review Boards at the University of Wisconsin—Milwaukee (UWM); Michigan State University (MSU); Wayne State University (WSU); the Michigan Department of Community Health; University of Southern California (USC); the California Committee for the Protection of Human Subjects (CPHS); and for the Medical College of Wisconsin (MCW), IRB oversight was deferred to UWM. The California Cancer Registry also approved the study.

Consent to participate

Written informed consent was obtained from all participants included in the YWHHS.

Consent for publication

All participants included in the final YWHHS sample consented to having their data published in scientific publications.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Electronic supplementary material 1 (DOCX 48 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Velie, E.M., Marcus, L.R., Pathak, D.R. et al. Theory, methods, and operational results of the Young Women’s Health History Study: a study of young-onset breast cancer incidence in Black and White women. Cancer Causes Control 32, 1129–1148 (2021).

Keywords

  • Breast cancer
  • Young-onset breast cancer
  • Epidemiology
  • Life-course
  • Health status disparities
  • Premenopause