Abstract
In this paper, an adaptive genetic algorithm is used to study English text background elimination in depth, and a corresponding model is designed. The character curves obtained after initial character processing are transformed with the adaptive genetic algorithm, which reduces the influence that multiple inflection points in curved images have on feature extraction. Then, using the minimum deviation method, the errors between the input characters and the samples in the sample set are calculated in a spatial coordinate system. The deviations in angle and in straight-line segments are used to match each character to the sample with the smallest deviation, that is, the highest degree of match. To improve recognition accuracy, a genetic algorithm iterates over the feature sets of angles and line segments, and the optimal features are ultimately produced through crossover across successive generations. The character library is then divided into equal groups that serve as inputs for the trials, and the resulting feature sets are placed in a position matrix and compared one by one with the samples in the database. The results show that the improved stroke-structure feature extraction algorithm based on a genetic algorithm improves recognition accuracy and performs the recognition task better than the compared methods. Finally, by analyzing the limitations and characteristics of the traditional particle swarm optimization (PSO) algorithm and the differential evolution (DE) algorithm, and exploiting the respective strengths and applicability of each, a new differential evolution particle swarm algorithm with better and more stable performance is proposed.
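The minimum-deviation matching and genetic feature selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the feature layout (`N_ANGLES` stroke angles followed by line-segment lengths), the summed-absolute-deviation score, the binary feature mask, and the Srinivas-Patnaik style adaptive crossover and mutation rule are all our choices, because the abstract does not specify them. `make_demo_data` produces synthetic characters purely for illustration, and every name here is hypothetical.

```python
import math
import random

# Assumed feature layout (illustrative, not the paper's): the first N_ANGLES
# entries of a feature vector are stroke angles in radians; the rest are
# line-segment lengths.
N_ANGLES = 2


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def deviation(x, template, mask):
    """Summed deviation between x and a template over the features kept by mask."""
    total = 0.0
    for i, (xi, ti) in enumerate(zip(x, template)):
        if mask[i]:
            total += angle_diff(xi, ti) if i < N_ANGLES else abs(xi - ti)
    return total


def match(x, templates, mask):
    """Minimum-deviation matching: return the label whose template deviates least."""
    return min(templates, key=lambda label: deviation(x, templates[label], mask))


def accuracy(mask, samples, templates):
    """Share of (features, label) samples matched correctly; 0.0 for an empty mask."""
    if not any(mask):
        return 0.0
    return sum(match(x, templates, mask) == y for x, y in samples) / len(samples)


def tournament(rng, pop, fits):
    """Binary tournament selection: the fitter of two random individuals."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return (pop[i], fits[i]) if fits[i] >= fits[j] else (pop[j], fits[j])


def adaptive_ga(n_bits, fitness, pop_size=20, generations=40,
                k1=1.0, k2=0.5, k3=1.0, k4=0.5, floor=0.01, seed=0):
    """Evolve a boolean feature mask that maximises fitness(mask).

    Crossover/mutation probabilities follow the Srinivas-Patnaik adaptive rule
    (above-average individuals are disrupted less), with a small floor so the
    search does not freeze when all fitness values are equal.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_bits)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        f_max, f_avg = max(fits), sum(fits) / pop_size
        spread = max(f_max - f_avg, 1e-12)
        for ind, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = ind[:], f
        nxt = [best[:]]  # elitism: carry the best mask found so far
        while len(nxt) < pop_size:
            (p1, f1), (p2, f2) = tournament(rng, pop, fits), tournament(rng, pop, fits)
            f_hi = max(f1, f2)
            pc = k1 * (f_max - f_hi) / spread if f_hi >= f_avg else k3
            c1, c2 = p1[:], p2[:]
            if rng.random() < max(pc, floor):
                cut = rng.randrange(1, n_bits)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Each child's mutation rate uses the corresponding parent's fitness.
            for child, f in ((c1, f1), (c2, f2)):
                pm = max(k2 * (f_max - f) / spread if f >= f_avg else k4, floor)
                nxt.append([not b if rng.random() < pm else b for b in child])
        pop = nxt[:pop_size]
    return best, best_fit


def make_demo_data(n_per_class=10, seed=1):
    """Synthetic characters: features 0-3 separate classes, 4-5 are pure noise."""
    rng = random.Random(seed)
    templates = {"A": [0.0, 1.0, 10.0, 20.0, 0.0, 0.0],
                 "B": [1.5, 2.5, 30.0, 40.0, 50.0, 50.0],
                 "C": [3.0, -2.0, 50.0, 60.0, 100.0, 100.0]}
    samples = []
    for label, t in templates.items():
        for _ in range(n_per_class):
            x = [t[0] + rng.uniform(-0.1, 0.1), t[1] + rng.uniform(-0.1, 0.1),
                 t[2] + rng.uniform(-1, 1), t[3] + rng.uniform(-1, 1),
                 rng.uniform(0, 100), rng.uniform(0, 100)]
            samples.append((x, label))
    return templates, samples


if __name__ == "__main__":
    templates, samples = make_demo_data()
    mask, fit = adaptive_ga(6, lambda m: accuracy(m, samples, templates))
    print("selected features:", mask, "accuracy:", fit)
```

Using matching accuracy as the GA fitness ties the two steps together: the genetic search keeps only feature subsets whose minimum-deviation matches are correct. In the synthetic data, features 4 and 5 are noise, so any mask that excludes them and keeps at least one informative feature reaches full accuracy.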
The proposed differential evolution particle swarm algorithm builds on PSO: when the PSO population stops improving and the search space becomes restricted, the crossover and mutation operations of DE are used to perturb the population, which increases its diversity and improves the algorithm's global optimization ability. The algorithm is tested on a common text-mining dataset to verify its effectiveness and feasibility.
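A minimal sketch of the PSO-with-DE-perturbation idea follows. It assumes a global-best PSO with commonly used constriction-style parameters (`w=0.729`, `c1=c2=1.494`) and a DE/rand/1/bin perturbation applied to every particle. "Stagnant" is approximated here as no improvement of the global best beyond `tol` for `patience` iterations; the paper's actual trigger, its parameter values, and its "limited search space" condition are not given in the abstract, so `de_pso` and all of its parameters are hypothetical.

```python
import random


def de_pso(f, dim, bounds, n_particles=30, iters=300, w=0.729, c1=1.494,
           c2=1.494, F=0.5, CR=0.9, patience=10, tol=1e-12, seed=0):
    """Minimise f over [lo, hi]^dim with global-best PSO; when the global best
    has not improved by more than tol for `patience` iterations, perturb the
    swarm with DE/rand/1/bin. Returns (best_x, best_f, n_perturbations)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        return min(max(value, lo), hi)

    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pval = [f(xi) for xi in x]
    g = min(range(n_particles), key=pval.__getitem__)
    gbest, gval = pbest[g][:], pval[g]
    stall, n_perturb = 0, 0

    for _ in range(iters):
        prev = gval
        # Standard PSO velocity and position update.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d] + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] = clip(x[i][d] + v[i][d])
            fx = f(x[i])
            if fx < pval[i]:
                pbest[i], pval[i] = x[i][:], fx
                if fx < gval:
                    gbest, gval = x[i][:], fx

        stall = stall + 1 if prev - gval <= tol else 0
        if stall < patience:
            continue

        # Stagnation: DE mutation plus binomial crossover on every particle.
        n_perturb, stall = n_perturb + 1, 0
        for i in range(n_particles):
            a, b, c = rng.sample([k for k in range(n_particles) if k != i], 3)
            jrand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [clip(pbest[a][d] + F * (pbest[b][d] - pbest[c][d]))
                     if (rng.random() < CR or d == jrand) else x[i][d]
                     for d in range(dim)]
            x[i] = trial  # always move: this is what restores diversity
            ft = f(trial)
            if ft < pval[i]:  # personal and global bests only ever improve
                pbest[i], pval[i] = trial[:], ft
                if ft < gval:
                    gbest, gval = trial[:], ft
    return gbest, gval, n_perturb


if __name__ == "__main__":
    def sphere(z):
        return sum(t * t for t in z)

    best_x, best_f, n = de_pso(sphere, dim=5, bounds=(-5.0, 5.0))
    print("best value:", best_f, "perturbations:", n)
```

Moving each particle to its trial point unconditionally is what injects diversity, while the greedy check keeps the personal and global bests from getting worse, so a perturbation cannot undo progress already made.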
Data availability
The code written to run the simulations presented in this paper is available from the authors upon request.
Funding
This work is supported by the Project of Shandong Province Higher Educational Science and Technology Program: A Corpus-based Study on the Translation of Gaomi Dialect in English Versions of Mo Yan's Novels (No. J18RA238).
Author information
Contributions
TX designed the model, collected the dataset, performed the analysis, validated the results, and wrote and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Xiaohui, T. An adaptive genetic algorithm-based background elimination model for English text. Soft Comput 26, 8133–8143 (2022). https://doi.org/10.1007/s00500-022-07204-7
DOI: https://doi.org/10.1007/s00500-022-07204-7