Abstract
RNA-seq, which relies on next-generation sequencing technologies, is widely used for transcriptome profiling in biological analyses. Accurate estimation of gene expression levels is critical for answering biological questions. Here, we show that the commonly used measure of gene expression, fragments per kilobase of transcript per million mapped reads (FPKM), is biased by transcript length, GC content, and dinucleotide frequencies in RNA-seq analyses of marine species. We used a generalized linear model to correct the observed biases in FPKM, and evaluated the correction on RNA-seq data sets from eight species obtained by different sequencing methods. Our work contributes to the understanding of potential technical artifacts in RNA-seq experiments for marine species and presents a means of obtaining more accurate gene expression measures.
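To make the quantities in the abstract concrete, the following is a minimal Python sketch of the standard FPKM definition and of a regression-style bias correction. The FPKM formula is the usual one. The correction is only an illustration under stated assumptions: the abstract does not give the family, link, or exact covariates of the authors' generalized linear model, so this sketch substitutes ordinary least squares on log2(FPKM + 1) with log2 transcript length and GC fraction as covariates, and it omits dinucleotide frequencies. All function names here (`fpkm`, `solve_linear`, `fit_ols`, `correct_log_fpkm`) are hypothetical.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations.

    Each row must already contain the intercept term (1.0).
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def correct_log_fpkm(fpkm_values, lengths, gc_fractions):
    """Remove the fitted length and GC trend from log2(FPKM + 1).

    Assumption: a Gaussian linear model on the log scale stands in for
    the generalized linear model mentioned in the abstract.
    """
    y = [math.log2(v + 1.0) for v in fpkm_values]
    rows = [[1.0, math.log2(n), gc] for n, gc in zip(lengths, gc_fractions)]
    beta = fit_ols(rows, y)
    means = [sum(r[i] for r in rows) / len(rows) for i in range(3)]
    # Subtract each covariate's fitted effect relative to an "average"
    # transcript, so the overall expression level is preserved.
    return [
        yi - sum(b * (x - mx) for b, x, mx in zip(beta[1:], r[1:], means[1:]))
        for yi, r in zip(y, rows)
    ]


# 100 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(100, 2000, 10_000_000))  # 5.0
```

In this sketch the corrected values stay on the log2 scale. If every transcript's expression were fully explained by length and GC content, the corrected values would all collapse to the mean, which is a convenient sanity check for the fitting code.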
Acknowledgements
This research was supported by the National Natural Science Foundation of China (11701546, 31530079, 31572620), the Earmarked Fund for Modern Agro-industry Technology Research System (CARS-48), the Strategic Priority Research Program “Western Pacific Ocean System: Structure, Dynamics and Consequences” (XDA11000000), and the Technological Innovation Project financially supported by Qingdao National Laboratory for Marine Science and Technology (2015ASKJ02-03).
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Electronic Supplementary Material
ESM 1 (DOCX 6986 kb)
Cite this article
Song, K., Li, L. & Zhang, G. Bias and Correction in RNA-seq Data for Marine Species. Mar Biotechnol 19, 541–550 (2017). https://doi.org/10.1007/s10126-017-9773-5
DOI: https://doi.org/10.1007/s10126-017-9773-5