Fast and Parallel Algorithm for Population-Based Segmentation of Copy-Number Profiles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8452)

Abstract

Dynamic programming (DP) based change-point methods have shown very good statistical performance in DNA copy-number analysis. However, the quadratic algorithmic complexity of DP has limited their use on high-density arrays and next-generation sequencing data. This complexity issue is particularly critical for segmentation and calling of segments, and for the joint segmentation of many different profiles. Our contribution is two-fold. First, we provide a DP algorithm for segmentation and calling whose complexity is at worst linear, which allows DP-based segmentation to be used on high-density arrays at a considerably reduced computational cost. Second, for joint segmentation, we provide a parallel version of the cghseg package, which now allows us to analyze more than 1,000 profiles of length 100,000 within a few hours. Our method and software package are therefore adapted to the next generation of computers (multi-core) and experiments (very large profiles).
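To make the quadratic cost mentioned in the abstract concrete, the following is a minimal sketch of the classical textbook DP for least-squares segmentation of a single profile. It is not the at worst linear algorithm contributed by the paper, nor code from the cghseg package; the function names `dp_segmentation` and `backtrack` are hypothetical and chosen for illustration only. The two nested loops over positions are what make the run time grow quadratically with the profile length.

```python
from itertools import accumulate
from math import inf


def dp_segmentation(y, max_segments):
    """Textbook O(max_segments * n^2) DP for least-squares segmentation.

    Illustrative sketch only (hypothetical helper, not the cghseg API).
    best[k][t] is the minimal residual sum of squares when y[:t] is split
    into k segments; last[k][t] is the start of the final segment.
    """
    n = len(y)
    # Prefix sums give the cost of any segment in O(1).
    s1 = [0.0] + list(accumulate(float(v) for v in y))
    s2 = [0.0] + list(accumulate(float(v) ** 2 for v in y))

    def seg_cost(i, j):
        # Residual sum of squares of y[i:j] around its mean (i < j).
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    last = [[0] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for t in range(k, n + 1):          # end of the k-th segment
            for s in range(k - 1, t):      # start of the k-th segment
                c = best[k - 1][s] + seg_cost(s, t)
                if c < best[k][t]:
                    best[k][t] = c
                    last[k][t] = s
    return best, last


def backtrack(last, k, n):
    """Recover the (exclusive) end index of each of the k segments."""
    ends = []
    t = n
    while k > 0:
        ends.append(t)
        t = last[k][t]
        k -= 1
    return ends[::-1]


# Toy profile with three plateaus (made-up data, for illustration).
profile = [0, 0, 0, 5, 5, 5, 1, 1]
best, last = dp_segmentation(profile, 3)
print(backtrack(last, 3, len(profile)))  # [3, 6, 8]
```

On a profile of length 100,000 the inner double loop alone would visit on the order of 10^10 (start, end) pairs per segment count, which is the kind of cost the abstract describes as limiting for high-density arrays.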

Notes

  1. http://cran.r-project.org/web/packages/cghseg

Author information

Corresponding author

Correspondence to Guillem Rigaill.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rigaill, G., Miele, V., Picard, F. (2014). Fast and Parallel Algorithm for Population-Based Segmentation of Copy-Number Profiles. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_18

  • DOI: https://doi.org/10.1007/978-3-319-09042-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09041-2

  • Online ISBN: 978-3-319-09042-9

  • eBook Packages: Computer Science, Computer Science (R0)
