A Comprehensive Analysis Workflow for Genome-Wide Screening Data from ChIP-Sequencing Experiments
ChIP-sequencing is a new technique that generates short DNA sequences for analyzing DNA-protein interactions and carrying out genome-wide studies. Although several studies address the processing and analysis of ChIP-sequencing data, a complete workflow has not yet been reported. The size of the data and the broad range of biological questions are the main challenges in establishing a data analysis workflow for ChIP-sequencing data. In this paper, we present the ChIP-sequencing data analysis workflow that we developed at the Ohio State University Comprehensive Cancer Center Bioinformatics Shared Resources. The pipeline provides 1) a choice of mapping algorithms, such as Eland, MapReads, SeqMap, and RMAP, to align short sequence reads to the reference genome; 2) a novel normalization algorithm to detect significant binding densities and to compare binding densities across experiments; 3) gene database mapping and 3D binding density visualization; and 4) distributed computing and high performance computing (HPC) support.
Keywords: ChIP-seq, workflow, short sequence mapping, parallelization, normalization, visualization
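To make the density-comparison step in the abstract more concrete, here is a minimal Python sketch of one way mapped read positions could be turned into comparable binding densities. It is not the normalization algorithm proposed in the paper, which the abstract does not describe. Plain library-size scaling (reads per million) is swapped in as a common stand-in. The function names, the 1000 bp bin size, and the pseudocount are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def bin_counts(read_starts, bin_size=1000):
    """Count mapped reads per fixed-width genomic bin.

    read_starts: iterable of 0-based start coordinates on one chromosome.
    The bin index is start // bin_size (an illustrative choice).
    """
    return Counter(pos // bin_size for pos in read_starts)


def reads_per_million(counts, total_reads):
    """Scale raw bin counts by library size.

    This is a stand-in normalization (assumption), used so that
    experiments sequenced to different depths can be compared.
    """
    scale = 1_000_000 / total_reads
    return {b: c * scale for b, c in counts.items()}


def log2_ratio(density_a, density_b, pseudocount=1.0):
    """Compare two normalized density maps bin by bin.

    Bins missing from one experiment are treated as zero density;
    the pseudocount (assumption) avoids division by zero.
    """
    bins = set(density_a) | set(density_b)
    return {
        b: math.log2(
            (density_a.get(b, 0.0) + pseudocount)
            / (density_b.get(b, 0.0) + pseudocount)
        )
        for b in bins
    }
```

In such a sketch, each experiment's aligned reads would first pass through `bin_counts` and `reads_per_million`, and the resulting maps would be handed to `log2_ratio`. Detecting which bins are statistically significant would need a background model that this example does not attempt.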