Abstract
A central task in statistical change-point analysis is estimating the unknown change-point locations under a statistical model imposed on the sample data. This can be done through hypothesis testing, model selection, or a Bayesian approach, among other methods. Change-point analysis has a wide range of applications in fields such as statistical quality control, finance and economics, climate studies, medicine, and genetics. In this paper, we present a change-point analysis motivated by the modeling of genomic data. High-throughput next generation sequencing (NGS) technology is now frequently used to profile tumor and control samples for the study of DNA copy number variants (CNVs). In particular, the ratio of the read count of the tumor sample to that of the control sample is widely used for identifying CNV regions. Identifying CNV regions is equivalent to finding the change-points that potentially exist in the NGS reads ratio data. We present a change-point model and a Bayesian solution for estimating the change-point locations in NGS reads ratio data. Simulation studies indicate that the proposed method is effective in identifying change-point locations. We also apply the proposed change-point model to identify the boundaries of CNV regions using NGS data from breast cancer/tumor cell lines and a lung cancer cell line.
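To make the link between CNV boundaries and change-points concrete, the following is a minimal sketch and not the model proposed in the chapter. It assumes a single change in the mean of the log2 reads ratio, Gaussian noise with known variance, a N(0, tau2) prior on each segment mean, and a uniform prior on the change location. The read counts are synthetic, and all names (`segment_log_marginal`, `changepoint_posterior`, `sigma2`, `tau2`) are hypothetical.

```python
import math

def segment_log_marginal(m, s, q, sigma2, tau2):
    """Log marginal likelihood of m Gaussian points (sum s, sum of squares q)
    with known variance sigma2 and a N(0, tau2) prior on the segment mean."""
    return (-0.5 * m * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(1 + m * tau2 / sigma2)
            - q / (2 * sigma2)
            + tau2 * s * s / (2 * sigma2 * (sigma2 + m * tau2)))

def changepoint_posterior(x, sigma2, tau2):
    """Posterior over k = 1..n-1, where the mean changes after the k-th point.
    Entry j of the returned list is the posterior probability of k = j + 1."""
    n = len(x)
    prefix_sum, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix_sum.append(prefix_sum[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    log_w = []
    for k in range(1, n):
        left = segment_log_marginal(k, prefix_sum[k], prefix_sq[k], sigma2, tau2)
        right = segment_log_marginal(n - k, prefix_sum[n] - prefix_sum[k],
                                     prefix_sq[n] - prefix_sq[k], sigma2, tau2)
        log_w.append(left + right)  # uniform prior on k adds only a constant
    top = max(log_w)                # subtract the max before exponentiating
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# Synthetic read counts per genomic bin (assumed, not real data): the tumor
# copy ratio is 1.0 for the first 60 bins and 1.5 for the remaining 40, with
# a small deterministic wiggle standing in for sampling noise.
control = [200 + (i * 7) % 11 for i in range(100)]
tumor = [round(c * (1.0 if i < 60 else 1.5) * (1 + 0.05 * math.sin(1.7 * i)))
         for i, c in enumerate(control)]
log_ratio = [math.log2(t / c) for t, c in zip(tumor, control)]

posterior = changepoint_posterior(log_ratio, sigma2=0.01, tau2=1.0)
map_k = max(range(len(posterior)), key=posterior.__getitem__) + 1
print("most probable change after bin", map_k)
```

With several CNV regions, a single-change search like this one would need to be applied repeatedly, for example by binary segmentation, or replaced by a multiple change-point model.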
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, J., Li, H. (2016). A Statistical Change-Point Analysis Approach for Modeling the Ratio of Next Generation Sequencing Reads. In: Letzter, G., et al. Advances in the Mathematical Sciences. Association for Women in Mathematics Series, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-34139-2_13
DOI: https://doi.org/10.1007/978-3-319-34139-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34137-8
Online ISBN: 978-3-319-34139-2
eBook Packages: Mathematics and Statistics (R0)