Searching String in Big-Data: A Better Approach by Applied Machine Learning

  • Original Research
SN Computer Science

Abstract

Over the past five years, very large data sets have become common in many domains. Searching for a string or pattern in a huge record is hard, especially when the data are randomly ordered. This calls for new algorithmic strategies that analyze such data sets and solve optimization tasks using sorting on indexing levels and applied machine learning models. String searching and pattern matching are essential steps in text processing. A word search algorithm finds the first or all occurrences of a word in textual data or ASCII files. A pre-processing phase determines the number of positions by which the pattern is shifted when a mismatch occurs in the matching phase. The fundamental objective of string search or pattern matching algorithms is to increase efficiency by reducing the number of comparisons and increasing the length of shifts after a mismatch. The efficiency of string search algorithms had probably never been taken so seriously until the explosion of content on the web and the task of mining valuable data and information from it. This paper introduces a search algorithm, “Tara–Paras String Search”, that is reported to be faster than conventional Binary Search and Interpolation Search. Indexing levels based on the length of the word, the sequence total of its letters, and its starting letter reduce the size of the input. Two data sets are used for analysis: a dictionary of more than 109,000 English words, and a sorted, uniformly distributed list of more than 2.5 lakh (250,000) numbers (multiples of 7). The models were implemented and executed in Python and compared on time complexity, and applied machine learning selects the faster one.
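The indexing levels mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Tara–Paras implementation, which is not shown on this page. It assumes that "sequence total of alphabets" means the sum of letter positions (a=1 ... z=26), and it uses ordinary binary search inside each bucket. The names `letter_total`, `index_key`, `build_index` and `search` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def letter_total(word):
    """Sum of alphabet positions (a=1 ... z=26); other characters are ignored."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower() if "a" <= ch <= "z")


def index_key(word):
    """Assumed three-level key: (length, letter total, starting letter)."""
    return (len(word), letter_total(word), word[0].lower())


def build_index(words):
    """Group words into buckets by key and keep every bucket sorted."""
    buckets = defaultdict(list)
    for w in words:
        if w:  # skip empty strings, which have no starting letter
            buckets[index_key(w)].append(w)
    for bucket in buckets.values():
        bucket.sort()
    return dict(buckets)


def search(index, word):
    """Binary search restricted to the single bucket that could hold the word."""
    if not word:
        return False
    bucket = index.get(index_key(word), [])
    pos = bisect_left(bucket, word)
    return pos < len(bucket) and bucket[pos] == word


words = ["tara", "paras", "search", "string", "rings", "data"]
index = build_index(words)
print(search(index, "string"))  # True
print(search(index, "strong"))  # False
```

Because the key narrows the candidates before any comparison is made, the final search runs over a much smaller list than the full dictionary, which is the effect the abstract attributes to the indexing levels.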

Author information

Corresponding author

Correspondence to Paras Nath Singh.

Ethics declarations

Conflict of interest

Author Dr. Paras Nath Singh declares that he has conflict of interest. Author Tara P. Gowder declares that she has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Data Science and Communication” guest edited by Kamesh Namudri, Naveen Chilamkurti, Sushma S. J. and S. Padmashree.

About this article

Cite this article

Singh, P.N., Gowdar, T.P. Searching String in Big-Data: A Better Approach by Applied Machine Learning. SN COMPUT. SCI. 2, 192 (2021). https://doi.org/10.1007/s42979-021-00569-w

  • DOI: https://doi.org/10.1007/s42979-021-00569-w
