Abstract
Eukaryotic genomes are rich in repetitive DNA, transposable elements (TEs) in particular, and a number of computational methods are accordingly available for identifying TEs in genomic sequences. We present here a survey of two of the most readily available and widely used bioinformatics applications for the detection, characterization, and analysis of TE sequences in eukaryotic genomes: CENSOR and RepeatMasker. For each program, we provide information on availability, input, output, and the algorithmic methods used, and we describe specific examples of its use. CENSOR and RepeatMasker both rely on homology-based methods for the detection of TE sequences. Several other classes of methods are available for the analysis of repetitive DNA sequences, including de novo methods that compare genomic sequences against themselves, class-specific methods that use structural characteristics of specific classes of elements to aid in their identification, and pipeline methods that combine aspects of some or all of the aforementioned methods. We briefly consider the strengths and weaknesses of these different classes of methods, with an emphasis on their complementary utility for the analysis of repetitive DNA in eukaryotes.
Acknowledgments
The authors wish to thank Leonardo Mariño-Ramírez and Jittima Piriyapongsa for comments and technical support. Ahsan Huda and I. King Jordan are supported by the School of Biology at the Georgia Institute of Technology.
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Huda, A., Jordan, I.K. (2009). Analysis of Transposable Element Sequences Using CENSOR and RepeatMasker. In: Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_16
DOI: https://doi.org/10.1007/978-1-59745-251-9_16
Publisher Name: Humana Press
Print ISBN: 978-1-58829-910-9
Online ISBN: 978-1-59745-251-9
eBook Packages: Springer Protocols