# Identifying atypically expressed chromosome regions using RNA-Seq data

## Abstract

The number of studies analysing RNA-Seq data has grown rapidly in recent years, making this type of gene expression data a strong competitor to DNA microarrays. This paper proposes a Bayesian model to detect under- and overexpressed chromosome regions using RNA-Seq data. The methodology builds on recent work designed to detect highly expressed (overexpressed) regions in microarray data. A hidden Markov model is developed by considering a mixture of Gaussian distributions with ordered means, so that the first and last mixture components accommodate the under- and overexpressed genes, respectively. By assuming a hierarchical Markov dependence structure, the model is flexible enough to deal efficiently with the highly irregular spacing of the data. The analysis of four cancer data sets (breast, lung, ovarian and uterine) is presented. Results indicate that the proposed model is selective in determining the expression status, is robust with respect to prior specifications, and provides tools for a global or local search for under- and overexpressed chromosome regions.
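The role of the ordered means can be illustrated with a minimal sketch. It is not the paper's model: it ignores the hidden Markov dependence and the Bayesian estimation entirely, and simply classifies a single expression value under a three-component Gaussian mixture whose parameters are assumed known. The function names and all numeric values (`WEIGHTS`, `MEANS`, `SDS`) are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def component_probabilities(x, weights, means, sds):
    """Probability that x belongs to each mixture component (Bayes' rule)."""
    if list(means) != sorted(means):
        raise ValueError("means must be in increasing order")
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def expression_status(x, weights, means, sds):
    """Label x by its most probable component.

    Because the means are ordered, the first component plays the role of
    underexpression and the last one the role of overexpression.
    """
    probs = component_probabilities(x, weights, means, sds)
    best = max(range(len(probs)), key=probs.__getitem__)
    if best == 0:
        return "under"
    if best == len(probs) - 1:
        return "over"
    return "typical"


# Hypothetical parameters (assumptions, not estimates from the paper).
WEIGHTS = [0.1, 0.8, 0.1]
MEANS = [-2.0, 0.0, 2.0]
SDS = [0.5, 0.5, 0.5]

if __name__ == "__main__":
    for value in (-2.5, 0.1, 2.4):
        print(value, expression_status(value, WEIGHTS, MEANS, SDS))
```

Ordering the means ties each component to a fixed interpretation, so the labels "under" and "over" always refer to the first and last components. In the paper's setting the component memberships would additionally depend on neighbouring genes through the Markov structure, which this sketch omits.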

## Keywords

Bayesian inference, Mixture model, Gibbs sampling, Gene expression, Cancer

## Notes

### Acknowledgements

The authors would like to thank an anonymous referee for constructive comments to improve this work.
