DNA Sequence Search Using Content-Based Image Search Approach
In this work, we investigate a new method for searching DNA sequences based on a multimedia retrieval approach. We try to address issues related to index size and search performance by first transforming the DNA sequences into images and then indexing these images using content-based image indexing techniques. The main goal is to allow users to retrieve similar gene sequences using stored image features rather than the sequences themselves. We propose two conversion algorithms, each of which has been tested to reveal its sensitivity to both sequence length and sequence changes. We have also compared our approach with BLAST, which was used as a reference system. The results of our experiments show that this approach performs well with respect to index size and search speed, but more work is needed to improve its search sensitivity.
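The abstract does not spell out the two conversion algorithms, so the following is only a minimal sketch of the general pipeline (convert a sequence to an image, store image features instead of the sequence, rank by feature distance). It assumes a hypothetical mapping of A, C, G, T to four gray levels, a row-major layout with a hypothetical width of 8, and two generic content-based image features: a gray-level histogram and a horizontal gray-level co-occurrence matrix. The names `sequence_to_image`, `image_features`, `build_index`, and `search` are illustrative and are not the authors' method.

```python
import math

# Hypothetical nucleotide-to-gray-level mapping (an assumption; the paper's
# own conversion algorithms are not described in the abstract).
LEVELS = {"A": 0, "C": 85, "G": 170, "T": 255}


def sequence_to_image(seq, width=8):
    """Lay the sequence out row by row as a grayscale image (a list of rows).
    Characters other than A/C/G/T are skipped in this sketch."""
    pixels = [LEVELS[b] for b in seq.upper() if b in LEVELS]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def image_features(image):
    """Return a fixed 20-dim vector: a normalised 4-bin gray-level histogram
    followed by a normalised 4x4 horizontal co-occurrence matrix."""
    index = {level: i for i, level in enumerate(sorted(LEVELS.values()))}
    hist = [0.0] * 4
    glcm = [0.0] * 16
    for row in image:
        for p in row:
            hist[index[p]] += 1
        # Horizontally adjacent pixel pairs within the same row.
        for left, right in zip(row, row[1:]):
            glcm[4 * index[left] + index[right]] += 1
    n_pix = sum(hist) or 1.0      # avoid division by zero for empty input
    n_pairs = sum(glcm) or 1.0
    return [h / n_pix for h in hist] + [g / n_pairs for g in glcm]


def build_index(sequences, width=8):
    """Store only the feature vectors, keyed by sequence id."""
    return {sid: image_features(sequence_to_image(s, width))
            for sid, s in sequences.items()}


def search(index, query, k=3, width=8):
    """Rank stored sequences by Euclidean distance between feature vectors."""
    q = image_features(sequence_to_image(query, width))
    scored = sorted((math.dist(q, f), sid) for sid, f in index.items())
    return [sid for _, sid in scored[:k]]


if __name__ == "__main__":
    db = {
        "seq1": "ACGTACGTACGTACGT",
        "seq2": "AAAAAAAACCCCCCCC",
        "seq3": "ACGTACGTACGAACGT",
    }
    idx = build_index(db)
    print(search(idx, "ACGTACGTACGTACGT", k=2))  # prints ['seq1', 'seq3']
```

Because every stored entry is a fixed-length vector of 20 numbers regardless of sequence length, an index of this kind stays small and cheap to scan. The same compression discards most positional detail (the co-occurrence counts amount to dinucleotide counts within each row), which hints at why feature-based search of this sort can trade sensitivity for size and speed.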
Keywords: Sequence Length, Index Size, Naive Approach, Search Sensitivity, Search Speed