
GenSeeK: A Novel Parallel Multiple Pattern Recognition Algorithm for DNA Sequences

  • Kaliuday Balleda
  • D. Satyanvesh
  • P. K. Baruah
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 243)

Abstract

DNA sequences are huge, and genome databases are growing exponentially every year. Genomic data is one of the key elements of computational biology. Many real-time applications, such as DNA profiling and real-time crime investigation, require the DNA sequences of biological subjects in real time, and retrieving these data in real time requires substantial computational power and resources. Throughput is one of the main bottlenecks for applications such as DNA sequence searching and pattern matching. This paper presents a new multiple pattern recognition algorithm for DNA sequences that computes on compressed space. The algorithm is efficient in terms of both computational complexity and the resources required during real-time computation, mainly because it performs its computations on compressed sequences. It is implemented using an index-based technique, and the sequential code is optimized. The algorithm focuses mainly on achieving a good comparisons-per-character ratio as well as high throughput. The parallel version of the algorithm is implemented on multicore processors to achieve high throughput. The techniques used to develop this algorithm can be directly translated to searching huge DNA databases.
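The abstract combines three ideas: matching directly on a compressed sequence, an index that limits which text positions are examined, and a comparisons-per-character metric. The abstract does not describe the paper's own compression scheme or index layout, so the following is only a minimal sketch under stated assumptions. DNA is packed at 2 bits per base (A=0, C=1, G=2, T=3). The index maps every k-mer to its start positions. Each candidate is then verified base by base, read straight from the packed bytes, without decompressing the whole sequence. The names `pack`, `base_at`, `build_index` and `search`, and the choice of k, are hypothetical. This is not the GenSeeK algorithm itself.

```python
from collections import defaultdict

# Assumed 2-bit code per nucleotide (illustrative; not the paper's scheme).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytearray:
    """Compress a DNA string to 2 bits per base, 4 bases per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i >> 2] |= CODE[base] << (2 * (i & 3))
    return packed


def base_at(packed: bytearray, i: int) -> int:
    """Read the 2-bit code of base i straight from the packed bytes."""
    return (packed[i >> 2] >> (2 * (i & 3))) & 3


def build_index(packed: bytearray, n: int, k: int) -> dict:
    """Map each k-mer code to the ascending list of positions where it starts."""
    index = defaultdict(list)
    mask = (1 << (2 * k)) - 1
    kmer = 0
    for i in range(n):
        # Roll the window: shift in base i and drop the base that fell out.
        kmer = ((kmer << 2) | base_at(packed, i)) & mask
        if i >= k - 1:
            index[kmer].append(i - k + 1)
    return index


def search(packed, n, index, k, patterns):
    """Find every pattern; return (hits per pattern, base comparisons made
    beyond the k-mer index lookup)."""
    results, comparisons = {}, 0
    for p in patterns:
        codes = [CODE[b] for b in p]
        m = len(codes)
        if m < k:
            raise ValueError("pattern must be at least k bases long")
        # Encode the pattern's first k bases exactly as build_index does.
        key = 0
        for c in codes[:k]:
            key = (key << 2) | c
        hits = []
        # Only positions whose first k bases already match are candidates.
        for start in index.get(key, []):
            if start + m > n:
                break  # positions are ascending, so later ones overflow too
            j = k
            while j < m:
                comparisons += 1  # one comparison on a packed base
                if base_at(packed, start + j) != codes[j]:
                    break
                j += 1
            if j == m:
                hits.append(start)
        results[p] = hits
    return results, comparisons


if __name__ == "__main__":
    text = "ACGTACGTTACG"
    packed = pack(text)
    index = build_index(packed, len(text), k=2)
    hits, comparisons = search(packed, len(text), index, 2, ["ACG", "GTT", "TAC"])
    print(hits, comparisons)  # {'ACG': [0, 4, 9], 'GTT': [6], 'TAC': [3, 8]} 7
    print(f"{comparisons / len(text):.3f} comparisons per character")  # 0.583
```

Each candidate start position is verified independently. The candidate lists, or the pattern set itself, could therefore be partitioned across worker processes to approximate a multicore version. That partitioning is likewise an assumption of this sketch, not a description of the authors' parallel implementation.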

Keywords

Pattern matching, Multicore

Notes

Acknowledgments

We would like to dedicate this work to the founder Chancellor of SSSIHL, Bhagawan Sri Sathya Sai Baba. Without His grace, this work would have remained a dream for us. This work was partially supported by an NVIDIA grant under the Professor Partnership Program and by the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575.

Copyright information

© Springer India 2014

Authors and Affiliations

  • Kaliuday Balleda (1)
  • D. Satyanvesh (1)
  • P. K. Baruah (1)
  1. Sri Sathya Sai Institute of Higher Learning, Prashantinilayam, India
