
Use of single nucleotide polymorphisms and haplotypes to identify genomic regions associated with protein content and water-soluble protein content in soybean


Key message

Four major SPC-specific loci were identified, and these accounted for 8.5–15.1 % of the phenotypic variation, thus explaining why certain soybean varieties have a high PC but a low SPC.


Water-soluble protein content (SPC) is a critical factor in both food quality and the production of isolated soybean proteins. However, few data are available regarding the genetic control and the mechanisms contributing to elevated SPC. In this study, a soybean collection of 192 accessions from a wide geographic range was used to identify genomic regions associated with soybean protein content (PC) and SPC using an association mapping approach employing 1,536 SNP markers and 232 haplotypes. The diverse panel revealed a large genetic variation in PC and SPC. Association mapping was performed using three methods to minimize false-positive associations. This resulted in 4/8 SNPs and 3/6 haplotypes that were significantly associated with soybean PC/SPC in two or more environments based on the mixed model. An SNP that was highly significantly associated with PC, BARC-021267-04016, was localized 0.28 cM away from a published glycinin gene, G7, and was detected across all four environments. Four major SPC-specific loci, BARC-029149-06088, BARC-018023-02499, BARC-041663-08059 and haplotype 15 (hp15), were stably identified on chromosomes five and eight and explained 8.5–15.1 % of the phenotypic variation. Moreover, a glutelin type-B 2-like gene was identified on chromosome eight and may be related to soybean protein solubility. These markers, which are located in previously reported QTL, reconfirm previous findings and may be important targets for the identification of protein-related genes. These novel SNPs and haplotypes are important for further understanding the genetic basis of PC and SPC. In addition, by comparing the correlation and genetic loci between PC and SPC, we provide new insights into why certain soybean varieties have a high protein content but a low SPC.
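As a rough illustration of what "explained 8.5–15.1 % of the phenotypic variation" means, the sketch below computes the share of trait variance accounted for by a single marker using an ordinary linear regression of phenotype on allele dosage. This is not the mixed model used in the study, which also has to account for relatedness among accessions; the function name, the 0/2 dosage coding and the data values are all hypothetical and chosen only for the example.

```python
from statistics import mean


def variance_explained(genotypes, phenotypes):
    """Return R^2 from a simple linear regression of phenotype on allele dosage.

    Simplified stand-in for illustration only: no correction for population
    structure or kinship is applied here.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")

    g_mean = mean(genotypes)
    p_mean = mean(phenotypes)

    # Sums of squares and cross-products around the means
    sxy = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((p - p_mean) ** 2 for p in phenotypes)

    if sxx == 0 or syy == 0:
        # Monomorphic marker or constant trait: nothing can be explained
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: allele dosage for six inbred accessions (0 or 2 copies)
# and a made-up trait value for each accession.
genotypes = [0, 0, 0, 2, 2, 2]
phenotypes = [40.0, 41.0, 42.0, 43.0, 44.0, 45.0]
r2 = variance_explained(genotypes, phenotypes)
print(f"Fraction of phenotypic variation explained: {r2:.3f}")
```

Multiplying the returned fraction by 100 gives a percentage comparable in form, though not in method, to the values quoted in the abstract.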






This work was supported by the National Basic Research Program of China (973 Program) (2010CB125906), the National Natural Science Foundation of China (31171573, 31201230, 31301336, 31301341) and the Jiangsu Provincial Programs (BE2012328, BK2012768).

Conflict of interest

We declare that we have no conflicts of interest.

Author information

Correspondence to Dan Zhang.

Additional information

D. Zhang and G. Kan contributed equally to this work.

Communicated by Istvan Rajcan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 14 kb)


About this article


Cite this article

Zhang, D., Kan, G., Hu, Z. et al. Use of single nucleotide polymorphisms and haplotypes to identify genomic regions associated with protein content and water-soluble protein content in soybean. Theor Appl Genet 127, 1905–1915 (2014). https://doi.org/10.1007/s00122-014-2348-1



  • Single Nucleotide Polymorphism
  • Association Mapping
  • Single Nucleotide Polymorphism Marker
  • Soybean Protein
  • Soybean Variety