We developed multiple gene expression pipelines and assembled them into a web-based tool called Pop’s Pipes to facilitate the preprocessing and analysis of large poplar gene expression data sets. The input can be spatiotemporal microarray or RNA-seq data from comparable tissues, time points, or treatment-vs-control conditions. Pop’s Pipes can identify differentially expressed genes (DEGs) between one or more pairs of tissues, time points, or treatment-vs-control conditions in a single in silico analysis. The DEGs obtained for each comparison are automatically analyzed to identify significantly enriched gene ontologies and InterPro protein domains, and significantly changed metabolic pathways across all input data sets are also identified. We also integrated a pipeline for constructing any of the three types of gene ontology trees from a short list of gene ontologies belonging to biological processes, molecular functions, or cellular components. The resulting information allows users to build spatiotemporal models and hypotheses about how poplar develops and functions. Pop’s Pipes can analyze a microarray or RNA-seq data set with 10 time points, each containing three treatment replicates and three controls, in 4–10 h. Such a data set usually takes a bioinformatician a few months to a year to analyze. Pop’s Pipes can thus save users substantial research time when many comparative data sets need to be analyzed.
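The enrichment step described above can be illustrated with a generic over-representation test. The abstract does not state which statistical test Pop’s Pipes applies, so the following is only a minimal sketch under stated assumptions: a one-sided hypergeometric test per annotation term, followed by Benjamini-Hochberg adjustment. The function names, gene identifiers, and term labels are hypothetical and are not taken from the tool.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(deg_ids, annotations, background_ids, alpha=0.05):
    """Return (term, p, q) tuples for terms over-represented among the DEGs."""
    background = set(background_ids)
    degs = set(deg_ids) & background
    terms, pvals = [], []
    for term, genes in annotations.items():
        members = set(genes) & background
        overlap = len(members & degs)
        if overlap == 0:
            continue  # nothing to test for this term
        terms.append(term)
        pvals.append(hypergeom_sf(overlap, len(background), len(members), len(degs)))
    qvals = benjamini_hochberg(pvals)
    hits = [(t, p, q) for t, p, q in zip(terms, pvals, qvals) if q <= alpha]
    return sorted(hits, key=lambda row: row[2])


# Toy data (hypothetical identifiers): 20 background genes, 5 DEGs.
background = [f"gene{i}" for i in range(20)]
degs = [f"gene{i}" for i in range(5)]
annotations = {
    "TERM_A": ["gene0", "gene1", "gene2", "gene3"],
    "TERM_B": ["gene4"] + [f"gene{i}" for i in range(10, 17)],
}
results = enrich(degs, annotations, background)
```

In this toy input, all four TERM_A genes are DEGs, so TERM_A passes the 0.05 threshold, while TERM_B (one DEG among eight members) does not. The same routine would apply unchanged to protein domain annotations, since only the gene-to-term mapping differs.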
Poplar; Microarray; RNA-seq data; Differentially expressed genes; Pathway enrichment analysis; Gene ontology enrichment analysis; Protein domain enrichment analysis; Pipeline; Gene ontology tree
We thank James Bialas for setting up and configuring the server. This project was partly supported by a start-up fund to Dr. Wei from the School of Forest Resources and Environment Science, Michigan Technological University; a grant from the Plant Feedstock Genomics for Bioenergy: A Joint Research Program of USDA and DOE (2009-65504-05767, ER65454-1040591-0018445) to V. B.; and the National Science Foundation Plant Genome Research Program Grant (DBI-0922391) to V.L.C.
Data archiving statement
RNA-seq data from Ptr-miR397a transgenic lines that were used to test the Pop’s Pipes pipelines have been deposited in the Gene Expression Omnibus (GEO) database at NCBI (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE54153. The data will be available after this manuscript is accepted. The Populus deltoides microarray data are not our own; they were previously published in the GEO database under accession number GSE24349.