N-Folded Parallel String Matching Mechanism

  • Published in: Annals of Data Science

Abstract

The growing demand for information has made managing enormous amounts of data increasingly important. Fetching the required data from large data stores is a herculean task, as it involves text processing, text mining, pattern recognition, data cleaning, and related operations. Supporting concurrent processing and devising high-performance processing models to extract data remains a challenge for researchers. One solution to this challenge is concurrent string matching on such processing models. Some existing mechanisms perform very well in practice, and many works have been published on this subject; research remains active because the scope and opportunities for developing new techniques are perennial. This paper proposes an N-folded parallel string matching mechanism, which divides the input sequence file into parts and distributes them to the processors. Using this mechanism as a model, experiments were conducted with chloroplast, mitochondria, and genome sequence files from different categories of plants as input, at different sizes and with seven possible patterns. The experimental results show that the N-folded parallel string matching mechanism can reduce processing time on a multiprocessor system.
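
The abstract describes dividing the input sequence into parts and distributing them to processors. The general idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `find_in_fold` and `n_fold_search`, the equal-sized folds, the overlap of m - 1 characters at each fold boundary, and the use of a thread pool are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_fold(text, pattern, start, end):
    """Return absolute positions of matches that begin in text[start:end]."""
    m = len(pattern)
    # Extend the search window by m - 1 characters so that a match
    # straddling the fold boundary is reported by the fold it starts in.
    window_end = min(len(text), end + m - 1)
    positions = []
    i = text.find(pattern, start, window_end)
    while i != -1:
        positions.append(i)
        # Step by one character so overlapping matches are also found.
        i = text.find(pattern, i + 1, window_end)
    return positions


def n_fold_search(text, pattern, n_folds):
    """Split text into up to n_folds parts and search them concurrently."""
    if not pattern or n_folds < 1:
        raise ValueError("pattern must be non-empty and n_folds >= 1")
    # Ceiling division; max(1, ...) keeps the range step valid for empty text.
    fold_size = max(1, -(-len(text) // n_folds))
    bounds = [(s, min(s + fold_size, len(text)))
              for s in range(0, len(text), fold_size)]
    matches = []
    with ThreadPoolExecutor(max_workers=n_folds) as pool:
        futures = [pool.submit(find_in_fold, text, pattern, s, e)
                   for s, e in bounds]
        # Folds are collected in order, so the result is already sorted.
        for future in futures:
            matches.extend(future.result())
    return matches


print(n_fold_search("ACGTACGTAC", "GTA", 3))  # prints [2, 6]
```

Each match is attributed to the fold in which it starts, so no position is reported twice even though adjacent search windows overlap. A thread pool is used here only to keep the sketch short; for CPU-bound searching in CPython, a process pool would likely be needed to see an actual reduction in processing time.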

Author information

Corresponding author

Correspondence to Butchi Raju Katari.

About this article

Cite this article

Katari, B.R., Viswanadha Raju, S. N-Folded Parallel String Matching Mechanism. Ann. Data. Sci. 3, 339–384 (2016). https://doi.org/10.1007/s40745-016-0086-8

  • DOI: https://doi.org/10.1007/s40745-016-0086-8
