Abstract
We analyzed 314,254 soybean expressed sequence tags (ESTs), comprising 29,540 from our laboratory and 284,714 from GenBank. These ESTs were assembled into 56,147 unigenes, of which about 76.92% were homologous to genes from Arabidopsis thaliana (Arabidopsis). The putative products of these unigenes were annotated according to their homology with the categorized proteins of Arabidopsis. Genes corresponding to cell growth and/or maintenance, enzymes and cell communication belonged to the slow-evolving class, whereas genes related to transcription regulation, cell, binding and death appeared to be fast-evolving. Soybean unigenes with no match to any gene in the Arabidopsis genome were classified as soybean-specific genes; these were mainly involved in nodule development and the synthesis of seed storage proteins. We also identified 61 genes regulated by salicylic acid, 1,322 transcription factor genes and 326 disease resistance-like genes among the soybean unigenes. Simple sequence repeat (SSR) analysis showed that the soybean genome was more complex than the Arabidopsis and Medicago truncatula genomes. The GC content of the soybean unigene sequences was similar to that of Arabidopsis and M. truncatula. Furthermore, combined analysis of the EST database and the BAC-contig sequences indicated that the soybean genome contains about 63,501 genes.
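As an illustration of the two sequence statistics mentioned above (GC content and SSR occurrence), the following is a minimal Python sketch. It is not the procedure used in the study: the function names `gc_content` and `find_ssrs`, the motif length range (2 to 6 nt) and the minimum of five tandem copies are assumptions chosen only for the example.

```python
import re

def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

# Assumed thresholds (hypothetical, not taken from the paper):
# motif of 2 to 6 nt, repeated at least 5 times in tandem.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")

def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for each tandem repeat found in seq."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as "AAAAAAAAAA", which match as "AA" x 5.
        if len(set(motif)) == 1:
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

if __name__ == "__main__":
    example = "GGCC" + "AT" * 6 + "GCGC"  # toy sequence, not real EST data
    print(gc_content(example))
    print(find_ssrs(example))
```

Applied to each unigene in turn, functions of this kind would give per-sequence GC fractions and SSR counts that could then be compared across species; the thresholds would need to match whatever definition of an SSR is being used.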
Acknowledgements
This research was supported by the National Key Basic Research Special Funds, P.R. China (G1998010209, 2002CB111301).
Additional information
A.-G. Tian and J. Wang contributed equally to this work.
Communicated by H.F. Linskens
Cite this article
Tian, AG., Wang, J., Cui, P. et al. Characterization of soybean genomic features by analysis of its expressed sequence tags. Theor Appl Genet 108, 903–913 (2004). https://doi.org/10.1007/s00122-003-1499-2