
Characterization of soybean genomic features by analysis of its expressed sequence tags

  • Original Paper
Theoretical and Applied Genetics

Abstract

We analyzed 314,254 soybean expressed sequence tags (ESTs): 29,540 generated in our laboratory and 284,714 obtained from GenBank. These ESTs were assembled into 56,147 unigenes. About 76.92% of the unigenes were homologous to genes from Arabidopsis thaliana (Arabidopsis), and their putative products were annotated according to their homology with categorized Arabidopsis proteins. Genes involved in cell growth and/or maintenance, enzymes and cell communication belonged to the slow-evolving class, whereas genes related to transcription regulation, cell, binding and death appeared to be fast-evolving. Soybean unigenes with no match to genes in the Arabidopsis genome were classified as soybean-specific genes; these were mainly involved in nodule development and the synthesis of seed storage proteins. We also identified 61 genes regulated by salicylic acid, 1,322 transcription factor genes and 326 disease resistance-like genes among the soybean unigenes. Simple sequence repeat (SSR) analysis showed that the soybean genome is more complex than the Arabidopsis and Medicago truncatula genomes, whereas the GC content of soybean unigene sequences was similar to that of Arabidopsis and M. truncatula. Furthermore, combined analysis of the EST database and the BAC-contig sequences yielded an estimate of about 63,501 genes in the soybean genome.
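The abstract reports GC content and SSR statistics for the unigene set but does not describe the procedure. The sketch below shows one common way to compute both for a set of assembled sequences. It is an illustration under stated assumptions, not the authors' pipeline. The motif lengths (1 to 6 nt), the minimum copy numbers in the hypothetical `MIN_REPEATS` table, and the SSRs-per-kb density metric are all assumptions. The helper names (`gc_content`, `find_ssrs`, `summarize`) are likewise hypothetical.

```python
"""Sketch: GC content and a perfect-SSR scan for assembled unigene sequences.

All thresholds here are illustrative assumptions, not the paper's settings.
"""

# Hypothetical minimum number of tandem copies required per motif length (nt).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); N etc. are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def is_reducible(motif: str) -> bool:
    """True if the motif is a repeat of a shorter unit, e.g. 'ATAT' -> 'AT'."""
    k = len(motif)
    return any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Extend while the next k-mer is an exact copy of the motif.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_rep and set(motif) <= set("ACGT") and not is_reducible(motif):
                hits.append((i, motif, copies))
                i += copies * k  # skip past the whole repeat tract
            else:
                i += 1
    return hits


def summarize(seqs: list[str]) -> dict[str, float]:
    """Pooled GC fraction and SSR density (SSRs per kb) over a sequence set."""
    total_len = sum(len(s) for s in seqs)
    n_ssr = sum(len(find_ssrs(s)) for s in seqs)
    return {
        "gc": gc_content("".join(seqs)),
        "ssr_per_kb": 1000 * n_ssr / total_len if total_len else 0.0,
    }


if __name__ == "__main__":
    example = "ACGT" + "AG" * 6 + "CCC" + "A" * 10 + "TTG"
    print(summarize([example]), sorted(find_ssrs(example)))
```

The scan checks each motif length separately and discards reducible motifs. This keeps, for example, a poly-A tract from also being counted as an "AA" dinucleotide repeat. Comparing species would then mean running `summarize` on each species' unigene set with identical thresholds.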


Fig. 1
Fig. 2
Fig. 3
Fig. 4



Acknowledgements

This research was supported by National Key Basic Research Special Funds, P.R. China (G1998010209, 2002CB111301).

Author information

Corresponding authors

Correspondence to Jin-Song Zhang or Shou-Yi Chen.

Additional information

A.-G. Tian and J. Wang contributed equally to this work.

Communicated by H.F. Linskens

Electronic Supplementary Material


About this article

Cite this article

Tian, AG., Wang, J., Cui, P. et al. Characterization of soybean genomic features by analysis of its expressed sequence tags. Theor Appl Genet 108, 903–913 (2004). https://doi.org/10.1007/s00122-003-1499-2


  • DOI: https://doi.org/10.1007/s00122-003-1499-2
