Analysis of Transposable Element Sequences Using CENSOR and RepeatMasker

  • Protocol
Bioinformatics for DNA Sequence Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 537))

Abstract

Eukaryotic genomes are full of repetitive DNA, transposable elements (TEs) in particular, and accordingly there are a number of computational methods that can be used to identify TEs from genomic sequences. We present here a survey of two of the most readily available and widely used bioinformatics applications for the detection, characterization, and analysis of TE sequences in eukaryotic genomes: CENSOR and RepeatMasker. For each program, information on availability, input, output, and the algorithmic methods used is provided. Specific examples of the use of CENSOR and RepeatMasker are also described. CENSOR and RepeatMasker both rely on homology-based methods for the detection of TE sequences. There are several other classes of methods available for the analysis of repetitive DNA sequences including de novo methods that compare genomic sequences against themselves, class-specific methods that use structural characteristics of specific classes of elements to aid in their identification, and pipeline methods that combine aspects of some or all of the aforementioned methods. We briefly consider the strengths and weaknesses of these different classes of methods with an emphasis on their complementary utility for the analysis of repetitive DNA in eukaryotes.

Acknowledgments

The authors wish to thank Leonardo Mariño-Ramírez and Jittima Piriyapongsa for comments and technical support. Ahsan Huda and I. King Jordan are supported by the School of Biology at the Georgia Institute of Technology.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Huda, A., Jordan, I.K. (2009). Analysis of Transposable Element Sequences Using CENSOR and RepeatMasker. In: Posada, D. (eds) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_16

  • DOI: https://doi.org/10.1007/978-1-59745-251-9_16

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-910-9

  • Online ISBN: 978-1-59745-251-9

  • eBook Packages: Springer Protocols
