
Development and validation of genome-wide polygenic risk scores for predicting breast cancer incidence in Japanese females: a population-based case-cohort study



This study aimed to develop ancestry-specific polygenic risk scores (PRSs) for predicting breast cancer events in Japanese females and to validate them in a longitudinal cohort study.


Using publicly available summary statistics from genome-wide association studies (GWASs) of female breast cancer in Japanese and European ancestries, we developed 31 candidate genome-wide PRSs for each GWAS using the pruning and thresholding (P + T) and LDpred methods with varying parameters. Among the candidate PRS models, the best model was selected as the one with the highest Harrell's C-statistic in a case-cohort dataset (63 breast cancer cases and a subcohort of 2213 Japanese females; median follow-up, 11.9 years). The best-performing PRS for each derivation GWAS was then evaluated in an independent case-cohort dataset (260 breast cancer cases and a subcohort of 7845 Japanese females; median follow-up, 16.9 years).
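As a rough illustration of the P + T idea, the following minimal sketch keeps variants whose GWAS P value falls below a threshold, drops any variant in linkage disequilibrium with an already retained, more significant variant, and scores each person as a weighted sum of effect-allele dosages. It is not the authors' pipeline: the function names, the toy data, and the use of a precomputed pairwise r² matrix (with no physical distance window) are all assumptions made for illustration.

```python
def prune_and_threshold(pvals, ld_r2, p_threshold=0.05, r2_threshold=0.2):
    """Greedy P + T selection (illustrative, hypothetical helper).

    Keeps SNPs with p < p_threshold, skipping any SNP whose r^2 with an
    already kept, more significant SNP is >= r2_threshold.
    """
    # Visit SNPs from most to least significant.
    order = sorted(range(len(pvals)), key=lambda j: pvals[j])
    kept = []
    for j in order:
        if pvals[j] >= p_threshold:
            break  # every remaining SNP is less significant still
        if all(ld_r2[j][k] < r2_threshold for k in kept):
            kept.append(j)
    return sorted(kept)


def polygenic_score(dosages, betas, kept):
    """Weighted sum of effect-allele dosages over the retained SNPs."""
    return [sum(person[j] * betas[j] for j in kept) for person in dosages]


# Toy inputs (made up for this example): 4 SNPs, 3 individuals.
pvals = [0.001, 0.03, 0.20, 0.04]   # GWAS P values
betas = [0.10, 0.05, 0.30, -0.02]   # GWAS effect sizes (log odds ratios)
ld_r2 = [                           # pairwise r^2 from a reference panel
    [1.0, 0.5, 0.0, 0.1],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
]
dosages = [                         # effect-allele counts per person
    [0, 1, 2, 1],
    [2, 0, 0, 0],
    [1, 2, 1, 2],
]

kept = prune_and_threshold(pvals, ld_r2)
scores = polygenic_score(dosages, betas, kept)
print(kept, scores)
```

In this toy run, SNP 1 is dropped because its r² with the more significant SNP 0 exceeds the threshold, and SNP 2 is dropped because its P value is too large. Varying `p_threshold` and `r2_threshold` is one way a set of candidate models could be generated.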


For the best PRS model derived from the Japanese-ancestry GWAS, which involved 46,861 single nucleotide polymorphisms (SNPs; P + T method with a P-value threshold PT = 0.05 and a linkage disequilibrium threshold R2 = 0.2), the Harrell's C-statistic was 0.598 ± 0.018 in the evaluation dataset. The age-adjusted hazard ratio for breast cancer in females in the highest PRS quintile compared with those in the lowest PRS quintile was 2.47 (95% confidence interval, 1.64–3.70). The PRS constructed using the Japanese-ancestry GWAS demonstrated better predictive performance for breast cancer in Japanese females than that constructed using the European-ancestry GWAS (Harrell's C-statistic, 0.598 versus 0.586).
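For readers unfamiliar with the metric, Harrell's C-statistic is the proportion of comparable pairs of individuals in which the one who developed the event earlier also had the higher score, so 0.5 corresponds to chance and 1.0 to perfect ranking. The sketch below computes it for unweighted toy data. It deliberately ignores the sampling weights that a case-cohort design requires, so it illustrates the definition only and does not reproduce the study's estimates; the function name and data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Unweighted Harrell's C-statistic (illustrative, hypothetical helper).

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under follow-up at a later time. The pair is
    concordant when the subject with the earlier event has the higher score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # pairs are anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy follow-up data (made up): years observed, event flag, risk score.
times = [2.0, 5.0, 7.0, 9.0, 11.0]
events = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]

c_stat = harrell_c(times, events, scores)
print(round(c_stat, 3))
```

Here five of the six comparable pairs are concordant, giving a C-statistic of about 0.833.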


This study developed a breast cancer PRS for Japanese females and demonstrated the usefulness of the PRS for breast cancer risk stratification.


Data availability

GWAS summary statistics of Japanese and European ancestry are publicly available. Under the ethical guidelines issued by the Ministry of Health, Labour, and Welfare of Japan and the Ministry of Education, Culture, Sports, Science, and Technology of Japan, the privacy of study participants must be protected, which prevents the public provision of individual data. Additionally, the informed consent we obtained did not include a provision for public data sharing. Researchers interested in verifying or replicating the results of our study can submit a research proposal to the Administration Office of the Japan Public Health Center-based prospective study. The study group board will follow a prescribed process to evaluate the importance of the research proposal in consideration of the study participants' privacy and may provide special permission to access a minimal dataset.



  • Polygenic risk score
  • Genome-wide association studies
  • Single nucleotide polymorphism
  • Japan Public Health Center-based
  • Public health center
  • Quality control
  • Linkage disequilibrium
  • Area under the receiver operating characteristic




We thank the participants and staff involved in this study. The JPHC study members are listed on the study website. We are indebted to the Aomori, Iwate, Akita, Ibaraki, Niigata, Osaka, Kochi, Nagasaki, and Okinawa Cancer Registries for providing the incidence data. We also thank Drs. Hiromi Sakamoto and Teruhiko Yoshida and the staff at the Genetics Division, National Cancer Center Research Institute, and the Department of Clinical Genomics, Fundamental Innovative Oncology Core, National Cancer Center Research Institute, as well as Drs. Yukihide Momozawa and Michiaki Kubo and the staff at the RIKEN Center for Integrative Medical Sciences.


This study was supported by the National Cancer Center Research and Development Fund (Grant Nos. 23-A-31 [toku], 26-A-2, 29-A-4, and 2020-J-4), Grant-in-Aid for Cancer Research from the Ministry of Health, Labour, and Welfare of Japan (Grant No. 19shi-2), and Practical Research for Innovative Cancer Control (Grant Nos. JP16ck0106095, JP19ck0106266, and JP22ck0106551) from the Japan Agency for Medical Research and Development. The funders had no role in the study design, data collection and analysis, decision to publish, or the preparation of the study.

Author information

Authors and Affiliations




HO, TH, and TY conceived the study. TH and TY conducted the genotyping, quality control, and imputation. TH derived genome-wide PRS models by analyzing publicly available GWAS summaries. HO, TH, and TY analyzed the datasets from the JPHC study. SN, YM, YS, YOY, AS, HY, NS, M. Inoue, and ST interpreted data and contributed to the manuscript. M. Iwasaki supervised the study. All authors provided critical feedback on the analysis and the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Tsuyoshi Hachiya or Taiki Yamaji.

Ethics declarations

Conflict of interest

TH is a board member of Genome Analytics Japan Inc.

Ethical approval

The genetic research in the JPHC study, including this study, was approved by the Institutional Review Board of the National Cancer Center, Tokyo, Japan (Approval No. 2011–044).

Consent to participate

Before the initiation of genetic research in the JPHC study, we announced its execution via our website and contacted all living blood donors by mail to provide them with the opportunity to opt out of participation in the research.

Consent for publication

Not applicable.


About this article


Cite this article

Ohbe, H., Hachiya, T., Yamaji, T. et al. Development and validation of genome-wide polygenic risk scores for predicting breast cancer incidence in Japanese females: a population-based case-cohort study. Breast Cancer Res Treat 197, 661–671 (2023).



  • Breast cancer
  • Polygenic risk score
  • Genome-wide association study
  • East-Asian
  • Japanese