High Performance Genomic Sequencing: A Filtered Approach
Protein and DNA homology detection systems are an essential part of computational biology applications. These algorithms have evolved over time, from dynamic programming approaches that find the optimal local alignment between two sequences to statistical approaches that use various heuristics to reduce the execution times of the former. However, the continuously increasing size of input datasets is driving the adoption of High Performance Computing (HPC) hardware and software to address this problem. The aim of the research presented in this paper is to propose a new filtering methodology, based on general-purpose graphics processing units (GP-GPUs) and multi-core processors, that removes sequences considered irrelevant in terms of homology and similarity. The proposed methodology is completely independent of the homology detection algorithm. This makes the approach useful for researchers and practitioners, who do not need to learn a new algorithm. The design has been approved by the National Biotechnology Research Center of Spain (CNB).
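The idea of a filter that sits in front of an unchanged homology detection algorithm can be illustrated with a minimal Python sketch. The abstract does not state the filtering criterion, so the shared k-mer count used here is purely an assumption for illustration and is not the method of the paper. The names `kmers`, `prefilter`, `search`, `k`, and `min_shared` are hypothetical, and the sketch runs serially rather than on a GP-GPU or across multiple cores.

```python
from typing import Callable, Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query: str, database: Dict[str, str],
              k: int = 3, min_shared: int = 2) -> Dict[str, str]:
    """Keep only database sequences sharing at least min_shared k-mers with the query.

    ASSUMPTION: shared k-mer count is a stand-in relevance criterion,
    not the criterion used in the paper.
    """
    query_kmers = kmers(query, k)
    return {
        name: seq
        for name, seq in database.items()
        if len(query_kmers & kmers(seq, k)) >= min_shared
    }


def search(query: str, database: Dict[str, str],
           detector: Callable[[str, Dict[str, str]], object],
           **filter_args) -> object:
    """Run any homology detector on the filtered database only.

    The detector is passed in as a callable, so the filter never needs
    to know which detection algorithm is used.
    """
    survivors = prefilter(query, database, **filter_args)
    return detector(query, survivors)
```

Because `search` only shrinks the input before calling `detector`, the detection tool itself can be swapped without touching the filter. Each database sequence is scored independently, which is the property that would make this kind of filter a candidate for GPU or multi-core execution.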
Keywords: Comparison and alignment methods, BLAST, High Performance Computing, GP-GPU