BioEnergy Research, 2:209

pDAWG: An Integrated Database for Plant Cell Wall Genes

  • Fenglou Mao
  • Yanbin Yin
  • Fengfeng Zhou
  • Wen-Chi Chou
  • Chan Zhou
  • Huiling Chen
  • Ying Xu (corresponding author)


We have recently developed pDAWG, a database focused on information related to plant cell walls. pDAWG currently contains seven complete plant genomes and 12 complete algal genomes, along with the following types of computed information for the individual proteins encoded in these genomes: (a) carbohydrate-active enzyme (CAZy) family information, when applicable; (b) phylogenetic trees of cell wall-related CAZy family proteins; (c) protein structure models, if available; (d) physical and predicted interactions among proteins; (e) subcellular localization; (f) Pfam domain information; and (g) homology-based functional prediction. A querying system with a graphical interface allows a user to quickly combine different kinds of information about individual genes or proteins and to display the composite information in an intuitive manner, facilitating comparative analyses and knowledge discovery about cell wall genes. pDAWG can be accessed at


Keywords: Cell wall genes, Biological database, Bioinformatics, Biofuel



This work is supported in part by the BioEnergy Science Center (BESC) grant from the Office of Biological and Environmental Research in the DOE Office of Science, and by the National Science Foundation (DBI-0354771, ITR-IIS-0407204, DBI-0542119).



Copyright information

© Springer Science+Business Media, LLC. 2009

Authors and Affiliations

  • Fenglou Mao (1, 2)
  • Yanbin Yin (1, 2)
  • Fengfeng Zhou (1, 2)
  • Wen-Chi Chou (1, 2)
  • Chan Zhou (1, 2)
  • Huiling Chen (1, 2)
  • Ying Xu (1, 2; corresponding author)

  1. Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
  2. DOE BioEnergy Science Center (BESC), Oak Ridge, USA
