The techniques commonly used to detect transposable elements (TEs) in nucleic acid sequences rely on sequence similarity with previously characterized elements. However, these methods are likely to miss many elements in various organisms. We tested two strategies for the detection of unknown elements. The first, which we call the “TBLASTX strategy,” searches for TE sequences by comparing the six-frame translations of the nucleic acid sequences of known TEs with the six-frame translations of the genomic sequence of interest. The second, the “repeat-based strategy,” searches genomic sequences for long repeats and clusters them into groups of similar sequences; TE copies from a given family are expected to cluster together. We applied both strategies to the Drosophila melanogaster genomic sequence and to the recently sequenced Anopheles gambiae genome, in which most TEs remain unknown. The “TBLASTX strategy” proved very efficient, detecting at least 332 new TE families in D. melanogaster and 400 in A. gambiae. This was unexpected in Drosophila, as the TEs of this organism have been extensively studied. The “repeat-based strategy” appeared to be very inefficient because of two problems: (i) TE copies are heavily deleted, so few copies share homologous regions, and (ii) segmental duplications are frequent and are not easy to distinguish from TE copies.
Transposable elements; Segmental duplications; Annotations; Bioinformatics; Genomics; Drosophila melanogaster; Anopheles gambiae
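
The “TBLASTX strategy” rests on comparing sequences at the protein level after translating them in all six reading frames. The following minimal Python sketch illustrates only that translation step, assuming the standard genetic code; it does not perform the similarity search itself, and the function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical names chosen for illustration, not part of the method described here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order (assumption:
# the nuclear standard code is appropriate for the sequence being translated).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Comparing translated frames rather than nucleotides allows similarity to be detected between elements whose DNA sequences have diverged but whose encoded proteins remain conserved, which is presumably why a translated search can recover families that nucleotide similarity misses.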