Using unsupervised learning techniques to assess interactions among complex traits in soybeans

Euphytica

Abstract

Soybean yield components and agronomic traits are connected through physiological pathways that impose tradeoffs under genetic and environmental constraints. Our primary aim is to assess the interdependence of soybean traits by using unsupervised machine learning techniques to partition phenotypic associations into environmental and genetic associations. The study was performed at large scale, jointly analyzing 14 quantitative traits in a large multi-parental population designed for genetic studies. We collected phenotypes from 2012 to 2015 from a soybean nested association mapping (NAM) panel with 40 families of approximately 140 individuals each. Pearson and Spearman correlations measured phenotypic associations, and a multivariate mixed linear model provided genotypic and environmental correlations. To evaluate relationships among traits, we applied principal component analysis and undirected graphical models to the phenotypic, genotypic, and environmental correlation matrices. Results indicate that high phenotypic correlation occurs when traits display both genetic and environmental correlations. In genetic terms, length of reproductive period, node number, and canopy coverage play important roles in determining yield potential. Optimal grain yield occurs when the growing environment favors faster canopy closure and an extended reproductive period. Environmental associations found among yield components give insight into the nature of yield component compensation. Unsupervised learning methods provide a useful framework for investigating interactions among quantitative traits and for defining target traits for breeding.
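The analysis steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, all function and variable names are hypothetical, and a hard threshold on partial correlations stands in for a LASSO-type sparse graph estimator. It shows Pearson and Spearman correlation matrices, principal components obtained from a correlation matrix, and an undirected graph read from the inverse correlation matrix.

```python
import numpy as np

def pearson_corr(X):
    """Pearson correlation matrix of the columns of X (rows are observations)."""
    return np.corrcoef(X, rowvar=False)

def spearman_corr(X):
    """Spearman correlation as the Pearson correlation of column-wise ranks.
    Assumes continuous data without ties."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def pca_from_corr(R):
    """Eigen-decomposition of a correlation matrix.
    Returns the share of variance per component and the loadings."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

def partial_corr_graph(R, threshold=0.1):
    """Partial correlations from the precision matrix.
    An edge is kept when |partial correlation| exceeds the threshold
    (a simplification; a LASSO-type estimator would shrink edges instead)."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    adjacency = (np.abs(partial) > threshold) & ~np.eye(len(R), dtype=bool)
    return partial, adjacency

# Simulated stand-in for three traits: c depends on a only through b.
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c])

R_pearson = pearson_corr(X)
R_spearman = spearman_corr(X)
share, loadings = pca_from_corr(R_pearson)
partial, adjacency = partial_corr_graph(R_pearson)
```

In this toy example, traits a and c are strongly correlated marginally, yet the graph keeps only the a-b and b-c edges, because the a-c association vanishes once b is accounted for. The same functions could be applied separately to phenotypic, genotypic, and environmental correlation matrices, assuming those matrices have already been estimated.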

Abbreviations

DAP: Days after planting
LASSO: Least absolute shrinkage and selection operator
MG: Maturity group
NAM: Nested association mapping
PCA: Principal component analysis
QTL: Quantitative trait loci
RIL: Recombinant inbred line
SNP: Single nucleotide polymorphism

Funding

The United Soybean Board funded the SoyNAM experiment from 2012 to 2013. Dow AgroSciences funded the SoyNAM experiment in Indiana from 2014 to 2015, as well as the collection of yield component data from 2013 to 2015.

Author information

Corresponding author

Correspondence to Katy Martin Rainey.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 12424 kb)

About this article

Cite this article

Xavier, A., Hall, B., Casteel, S. et al. Using unsupervised learning techniques to assess interactions among complex traits in soybeans. Euphytica 213, 200 (2017). https://doi.org/10.1007/s10681-017-1975-4

  • DOI: https://doi.org/10.1007/s10681-017-1975-4