
Bias and Correction in RNA-seq Data for Marine Species

  • Original Article
Marine Biotechnology

Abstract

RNA-seq is a recently developed approach, based on next-generation sequencing technologies, that is now widely used for transcriptome profiling. Accurate estimation of gene expression levels is critical for answering biological questions. Here, we show that the commonly used measure of gene expression, fragments per kilobase of transcript per million mapped reads (FPKM), is biased with respect to transcript length, GC content, and dinucleotide frequencies in RNA-seq data from marine species. We used a generalized linear model to correct these biases in FPKM, and evaluated the correction methods on RNA-seq data sets from eight species generated by different sequencing methods. Our work contributes to the understanding of potential technical artifacts in RNA-seq experiments on marine species and provides a means of obtaining more accurate gene expression measures.
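The abstract names FPKM and a generalized linear model (GLM) correction but does not give the model's form. The following is a minimal Python sketch of the general idea, not the authors' method. It assumes a Gaussian GLM with identity link (equivalent to ordinary least squares) fitted to log2(FPKM + 1), with covariates such as log transcript length, GC content, and CpG frequency. The function names, the pseudocount, the choice of "CG" as the dinucleotide, and the use of summed counts as the library size are all hypothetical choices made for illustration.

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    Assumption: library size is the sum of counts over the supplied
    transcripts, not an aligner-reported total of mapped reads.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    total = counts.sum()
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return counts * 1e9 / (lengths_bp * total)


def gc_content(seq):
    """Fraction of G and C bases in a transcript sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_freq(seq, dinuc="CG"):
    """Frequency of one dinucleotide among all overlapping dinucleotides (needs len >= 2)."""
    seq = seq.upper()
    n = len(seq) - 1
    hits = sum(1 for i in range(n) if seq[i:i + 2] == dinuc)
    return hits / n


def correct_fpkm(fpkm_values, covariates, pseudocount=1.0):
    """Hypothetical GLM correction: Gaussian family, identity link (= OLS).

    Regresses log2(FPKM + pseudocount) on centred covariates and subtracts
    the fitted covariate effects. Returns (adjusted log2 values, coefficients).
    """
    y = np.log2(np.asarray(fpkm_values, dtype=float) + pseudocount)
    Z = np.asarray(covariates, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    Zc = Z - Z.mean(axis=0)  # centre so the intercept equals the mean log2 level
    X = np.column_stack([np.ones(len(y)), Zc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    adjusted = y - Zc @ beta[1:]  # remove length/GC/dinucleotide trends
    return adjusted, beta
```

Centring the covariates keeps the adjusted values on the original log2 scale with an unchanged mean. As with any regression-based correction of this kind, a genuine biological association between expression and these sequence features would be removed along with the technical bias.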


Fig. 1
Fig. 2



Acknowledgements

This research was supported by the National Natural Science Foundation of China (11701546, 31530079, 31572620), the Earmarked Fund for Modern Agro-industry Technology Research System (CARS-48), the Strategic Priority Research Program of “Western Pacific Ocean System: Structure, Dynamics and Consequences” (XDA11000000), and the Technological Innovation Project Financially Supported by Qingdao National Laboratory for Marine Science and Technology (2015ASKJ02-03).

Author information

Corresponding authors

Correspondence to Li Li or Guofan Zhang.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Electronic Supplementary Material

ESM 1

(DOCX 6986 kb).


About this article


Cite this article

Song, K., Li, L. & Zhang, G. Bias and Correction in RNA-seq Data for Marine Species. Mar Biotechnol 19, 541–550 (2017). https://doi.org/10.1007/s10126-017-9773-5



  • DOI: https://doi.org/10.1007/s10126-017-9773-5
