Analysis of Transposable Element Sequences Using CENSOR and RepeatMasker

Huda, Ahsan; Jordan, I. King

doi:10.1007/978-1-59745-251-9_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 537)

Abstract

Eukaryotic genomes are full of repetitive DNA, transposable elements (TEs) in particular, and accordingly a number of computational methods can be used to identify TEs in genomic sequences. We present here a survey of two of the most readily available and widely used bioinformatics applications for the detection, characterization, and analysis of TE sequences in eukaryotic genomes: CENSOR and RepeatMasker. For each program, information on availability, input, output, and the algorithmic methods used is provided. Specific examples of the use of CENSOR and RepeatMasker are also described. CENSOR and RepeatMasker both rely on homology-based methods for the detection of TE sequences. Several other classes of methods are available for the analysis of repetitive DNA sequences, including de novo methods that compare genomic sequences against themselves, class-specific methods that use the structural characteristics of specific classes of elements to aid in their identification, and pipeline methods that combine aspects of some or all of the aforementioned methods. We briefly consider the strengths and weaknesses of these different classes of methods, with an emphasis on their complementary utility for the analysis of repetitive DNA in eukaryotes.
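To make the idea of homology-based detection concrete, here is a minimal Python sketch. It is not the algorithm used by CENSOR or RepeatMasker. It assumes a toy repeat library held in a dictionary (the names `toyTE` and `polyC`, the scoring values, and the `min_score` threshold are all hypothetical) and aligns each library consensus against a genomic sequence with a plain Smith-Waterman local alignment. It then replaces the best-scoring hit with N characters, in the spirit of the masked sequence these tools can output. Real tools search both strands, report many hits per consensus, and rely on much faster alignment engines. This sketch does none of that.

```python
"""Toy homology-based repeat masking (illustrative sketch only).

Assumptions: forward strand only, one best hit per library consensus,
hypothetical scoring parameters (match=2, mismatch=-3, gap=-5).
"""


def smith_waterman(query, target, match=2, mismatch=-3, gap=-5):
    """Best local alignment of query vs target with a linear gap penalty.

    Returns (score, q_start, q_end, t_start, t_end) using 0-based,
    half-open coordinates.
    """
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    best_score, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best_score:
                best_score, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell marks the alignment start.
    i, j = best_i, best_j
    while H[i][j] > 0:
        s = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best_score, i, best_i, j, best_j


def mask_repeats(genome, library, min_score=20):
    """Replace the best hit of each consensus scoring >= min_score with N."""
    masked = list(genome)
    hits = []
    for name, consensus in library.items():
        score, _, _, t_start, t_end = smith_waterman(consensus, genome)
        if score >= min_score:
            hits.append((name, t_start, t_end, score))
            masked[t_start:t_end] = "N" * (t_end - t_start)
    return "".join(masked), hits


if __name__ == "__main__":
    # Hypothetical toy library and genome: one TE copy flanked by T and A runs.
    library = {"toyTE": "GGCCATTAGCGGCCTA", "polyC": "CCCCCCCCCC"}
    genome = "T" * 10 + "GGCCATTAGCGGCCTA" + "A" * 10
    masked, hits = mask_repeats(genome, library)
    print(masked)  # TTTTTTTTTTNNNNNNNNNNNNNNNNAAAAAAAAAA
    print(hits)    # [('toyTE', 10, 26, 32)]
```

In this toy run the exact TE copy scores 32 (16 matches at +2) and is masked, while `polyC` scores far below the threshold and is left alone. Filtering hits on alignment score in this way is the basic mechanism that separates genuine repeat matches from chance similarity.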

Acknowledgments

The authors wish to thank Leonardo Mariño-Ramírez and Jittima Piriyapongsa for comments and technical support. Ahsan Huda and I. King Jordan are supported by the School of Biology at the Georgia Institute of Technology.

Author information

Authors and Affiliations

School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
Ahsan Huda & I. King Jordan

Editor information

Editors and Affiliations

Dept. Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, 36310, Spain
David Posada

About this protocol

Cite this protocol

Huda, A., Jordan, I.K. (2009). Analysis of Transposable Element Sequences Using CENSOR and RepeatMasker. In: Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_16

Published: 28 February 2009
Publisher Name: Humana Press
Print ISBN: 978-1-58829-910-9
Online ISBN: 978-1-59745-251-9
eBook Packages: Springer Protocols
