Abstract
A massive requirement of information vitalized the importance of managing enormous amount of data. It becomes a herculean task to fetch the anticipated data from large data storage as it includes text processing, text mining, pattern recognition, data cleaning etc., The need for concurrent events and coming up with high performance processing models to extract data is a challenge to the researchers. One of the solutions to this challenge is concurrent process to match string on processing models. While, some of the mechanisms do perform very well in practice. Frequent works have been published on this subject and research is still active in this area as the scope and opportunities to develop the new techniques is perennial. This paper proposes N-folded parallel string matching mechanism. This mechanism would be able to divide the input sequence files into various parts and the same would be distributed to the processors. Considering this mechanism as a model, experiments have been conducted considering chloroplast, mitochondria and different categories of plants genome sequence file as input for different sizes with seven possible patterns. The results of the experiment made evident that N-folded parallel string matching mechanism can reduce the processing time on a multi processor system.
Similar content being viewed by others
References
Crochemore M, Rytter W, Crochemore M (1994) Text algorithms, vol 698. Oxford University Press, New York
Simon Y, Inayatullah M (2004) Improving approximate matching capabilities for meta map transfer applications. In: Proceedings of symposium on principles and practice of programming in java, pp 143–147
Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discret Algorithms 7(4):579–594
Luis Russo L, Navarro G, Oliveira A, Morales P (2009) Approximate string matching with compressed indexes algorithm. Algorithms 2(3):1105–1136
Ilie L, Navarro G, Tinta L (2010) The longest common extension problem, revisited and applications to approximate string searching. J Discret Algorithms 8(4):418–428
Viswanadha Raju S et al. (2011) Recent advancement is parallel algorithms for string matching on computing models—a survey and experimental results. LNCS, Springer, pp 270–278, ISBN: 978-3-642-29279-8
Viswanadha Raju S et al. (2011) PDM data classification from STEP–an object oriented String matching approach. In: IEEE conference on application of information and communication technologies, pp 1–9. ISBN: 978-1-61284-831-0
Boyer RS, Moore JS (1977) A fast string searching algorithm, Carom. Assoc Comput Mach 20(10):262–277
Knuth D, Morris JH Jr, Pratt V (1977) Fast pattern matching in strings. SIAM J Comput 6(2):323–350
Sunday DM (1990) A very fast substring search algorithm. Commun ACM 33(8):132–142
Charras C, Lecrog T, Pehoushek JD (1998) A very fast string matching algorithm for small alphabets and long patterns. In: Combinatorial pattern matching. Springer, Berlin, pp 55–64
Rasool A, Khare N (2013) Performance improvement of BMH and BMHS using PDJ (possible double jump) and MValue (match value). Int J Comput Appl 72(1):1–6
Kim HJ, Lee S-W (2013) A hardware-based string matching using state transition compression for deep packet inspection. ETRI J 35(1):154–157
Dharmapurikar S, Krishnamurthy P, Sproull TS, Lockwood JW (2004) Deep packet inspection using parallel bloom filters. IEEE Micro 24(1):52–61
Al-Mamory SO, Hamid A, Abdul-Razak A, Falah Z (2010) String matching enhancement for snort IDS. In: 2010 5th international conference on computer sciences and convergence information technology (ICCIT), pp 1020–1023
Kun B, Nai-jie G, Kun T, Xiao-hu L, Gang L (2005) A practical distributed string matching algorithm architecture and implementation. World Acad Sci Eng Technol 10:1307–6884
Park JH, George KM (1999) Parallel string matching algorithms based on dataflow. In: IEEE Hawaii international conference on system sciences
Kawulok J (2013) Approximate string matching for searching DNA sequences. Int J Biosci Biochem Bioinform 3(2):145–148
dos Reis CCT, Cruz O (2005) “Approximate String Matching Algorithm Using Parallel Methods for Molecular Sequence Camparisons. In IEEE data compression conference, pp 140–143
Zubair M et al. (2010) Text scanning approach for exact string matching. In: International conference on networking and information technology, pp 118–121
Tomohiro I, Inenaga S, Takeda M (2013) Palindrome pattern matching. Theor Comput Sci 483:162–170
Oh Y, Oh D, Ro WW (2013) GPU-friendly parallel genome matching with tiled access and reduced state transition table. Int J Parallel Prog 41(4):526–551
Nolte J, Horton P (2001) Parallel sequence matching with TACO’s distributed object groups—a case study from molecular biology. Clust Comput 4(1):71–77
Tseng K-K, Lin Y-D, Lee T-H, Lai Y-C (2005) A parallel automaton string matching with pre-hashing and root-indexing techniques for content filtering coprocessor. In: IEEE international conference on application-specific systems, pp 113–118
Dandass YS, Burgess SC, Lawrence M, Bridges SM (2008) Accelerating string set matching in FPGA hardware for bioinformatics research. BMC Bioinform 9(1):1–11
Hart SN, Sarangi V, Moore R, Baheti S, Bhavsar JD, Couch FJ, Kocher JP (2013) SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations. PLoS One 8(12):e83356
Ounit R, Wanamaker S, Close TJ, Lonardi S (2015) CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genom 16(1):1
Alam KK, Chang JL, Burke DH (2015) FASTaptamer: a bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections. Mol Therapy 4(3):e230
Yang X, Liu D, Lv N, Zhao F, Liu F, Zou J, Chen Y, Xiao X, Wu J, Liu P, Gao J (2015) TCRklass: a new k-string–based algorithm for human and mouse TCR repertoire characterization. J Immunol 194(1):446–454
Rao CS, Raju SV (2016) Next generation sequencing (NGS) database for tandem repeats with multiple pattern \(2^{\circ }\)-shaft multicore string matching. Genom Data 7:307–317
Sun D, Wang X (2016) A Q-gram filter for local alignment in large genomic database. Int J Hybrid Inf Technol 9(1):221–232
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Katari, B.R., Viswanadha Raju, S. N-Folded Parallel String Matching Mechanism. Ann. Data. Sci. 3, 339–384 (2016). https://doi.org/10.1007/s40745-016-0086-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s40745-016-0086-8