International Conference Image Analysis and Recognition

ICIAR 2012: Image Analysis and Recognition pp 359-366

Compression of Whole Genome Alignments Using a Mixture of Finite-Context Models

  • Luís M. O. Matos
  • Diogo Pratas
  • Armando J. Pinho
Conference paper

DOI: 10.1007/978-3-642-31295-3_42

Volume 7324 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Matos L.M.O., Pratas D., Pinho A.J. (2012) Compression of Whole Genome Alignments Using a Mixture of Finite-Context Models. In: Campilho A., Kamel M. (eds) Image Analysis and Recognition. ICIAR 2012. Lecture Notes in Computer Science, vol 7324. Springer, Berlin, Heidelberg

Abstract

In recent years, advances in DNA sequencing technology have greatly increased the amount of available genomic sequence data. One type of such data is the output of multiple sequence alignments (MSA). In this paper, we propose a method for compressing these data sets using a mixture of finite-context models and arithmetic coding. The method relies on image compression concepts. It was tested on the multiz28way data set and attained a compression rate of around 0.93 bits per symbol on the sequence data, better than the ≈ 1 bit per symbol attained by a recently proposed method.
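
To illustrate the idea of mixing finite-context models, the following is a minimal sketch and not the authors' implementation. It assumes a four-letter DNA alphabet, additive smoothing with a hypothetical parameter `alpha`, model orders chosen arbitrarily, and a weight update of the form w ← w^gamma · p (one common choice for such mixtures; the abstract does not specify the rule). Instead of running an arithmetic coder, it sums −log2 of the mixed probability of each symbol, which approximates the code length an arithmetic coder would produce.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain DNA symbols only, no gaps or unknowns


class FiniteContextModel:
    """Order-k model: counts of each symbol seen after each length-k context."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # hypothetical additive-smoothing parameter
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def probabilities(self, context):
        counts = self.counts[context]
        total = sum(counts) + self.alpha * len(ALPHABET)
        return [(c + self.alpha) / total for c in counts]

    def update(self, context, symbol_index):
        self.counts[context][symbol_index] += 1


def estimate_bits_per_symbol(sequence, orders=(1, 4, 8), gamma=0.9):
    """Estimate the average code length of `sequence` under a model mixture."""
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        s = ALPHABET.index(symbol)
        # Each model conditions on the (up to) `order` preceding symbols.
        contexts = [sequence[max(0, i - m.order):i] for m in models]
        per_model = [m.probabilities(c) for m, c in zip(models, contexts)]
        # Mixed probability of the symbol that actually occurred.
        mixed = sum(w * p[s] for w, p in zip(weights, per_model))
        total_bits += -math.log2(mixed)
        # Favour models that predicted this symbol well, then renormalise.
        weights = [(w ** gamma) * p[s] for w, p in zip(weights, per_model)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Update counts only after coding, so a decoder could stay in sync.
        for m, c in zip(models, contexts):
            m.update(c, s)
    return total_bits / len(sequence)
```

Calling `estimate_bits_per_symbol("ACGT" * 200)` on a highly repetitive sequence yields a value far below the 2 bits per symbol of a uniform code, because the low-order models quickly learn the repeating pattern and the mixture shifts its weight toward them.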

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Luís M. O. Matos
    • 1
  • Diogo Pratas
    • 1
  • Armando J. Pinho
    • 1
  1. Signal Processing Laboratory, IEETA/DETI, University of Aveiro, Aveiro, Portugal