Abstract
Exploring information retrieval models is currently one of the central concerns of English sentence retrieval research. These models rely on diverse retrieval mechanisms that compute similarity in different ways, and the choice of similarity calculation directly influences the final ranking of results. Despite decades of work, however, technical constraints have made deep semantic analysis difficult. This gap highlights the need for learning approaches that capture the semantics of the retrieved information precisely. Motivated by this, this paper establishes a fast retrieval model for English sentences based on a statistical language model (SLM). First, the proposed method uses the SLM to extract significant feature words from the corpus; these feature words are identified by analyzing co-occurrence patterns and frequency distributions within the corpus. Second, it employs an N-gram model to calculate the probability of each word occurrence from its contextual dependencies. This framework represents the feature words and their associated probabilities in a structured form that captures the nuances of language semantics. Third, the model integrates an ontology that maps natural language expressions to conceptual entities, bridging the gap between human language and machine understanding. Finally, the model retrieves English sentences through semantic matching, combining this semantic framework with ontology-based search. In the experiments, the proposed model achieved a retrieval ratio of 98.5%, outperforming the existing models it was compared with. The results also show that the proposed algorithm outperforms the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, improving accuracy by 7.52% relative to TF-IDF.
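The abstract does not state the N-gram order or the smoothing method used. As a minimal sketch, assuming a bigram model with add-one (Laplace) smoothing and whitespace tokenization (both illustrative choices, not taken from the paper), the probability of a word given its preceding word can be estimated from corpus counts as P(w_i | w_(i-1)) = (C(w_(i-1) w_i) + 1) / (C(w_(i-1)) + |V|), where C counts occurrences in the corpus and V is the vocabulary of tokens that can follow a history. The names `tokenize` and `BigramModel` are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase, split on whitespace, add boundary markers.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    """Add-one (Laplace) smoothed bigram model (illustrative, not the paper's estimator)."""

    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in corpus:
            tokens = tokenize(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Tokens that can follow a history; "<s>" only ever starts a sentence.
        self.vocab = {t for t in self.unigrams if t != "<s>"}

    def prob(self, prev, word):
        # P(word | prev) = (C(prev, word) + 1) / (C(prev) + |V|)
        # C(prev) equals the number of bigrams starting with prev, because
        # every token except "</s>" is followed by exactly one token.
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab))

    def log_prob(self, sentence):
        # Sum of log conditional probabilities over consecutive token pairs.
        tokens = tokenize(sentence)
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


model = BigramModel(["the cat sat", "the dog sat", "a cat ran"])
print(model.prob("<s>", "the"))  # (2 + 1) / (3 + 7) = 0.3
```

Smoothing keeps unseen word pairs from receiving zero probability, so a word order that occurs in the corpus ("the cat sat") scores higher than a scrambled one ("sat cat the") without the latter collapsing to negative infinity.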
When the labelled corpus is very small and the unlabelled corpus is relatively large, the algorithm improves the classifier's performance by 12.6%. This indicates that the algorithm reduces the influence of the synonym processing stage on overall performance while retaining high precision and accurate calculation results.
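For reference, the TF-IDF baseline that the abstract compares against can be sketched as a cosine-similarity sentence ranker. The weighting below (length-normalized term frequency multiplied by log(N / df)) is one common variant and an assumption; the abstract does not state the exact TF-IDF configuration used in the experiments. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Plain idf = log(N / df); smoothed or sublinear variants also exist.
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, sentences):
    """Return sentence indices ordered from most to least similar to the query."""
    docs = [s.lower().split() for s in sentences]
    vectors, idf = tfidf_vectors(docs)
    q = Counter(query.lower().split())
    qlen = sum(q.values())
    # Query terms absent from the collection receive no weight.
    qvec = {t: (c / qlen) * idf[t] for t, c in q.items() if t in idf}
    scores = [cosine(qvec, v) for v in vectors]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


sentences = ["the cat sat on the mat", "dogs chase cats", "the stock market fell"]
print(rank("cat on mat", sentences))
```

Because this baseline matches surface terms only, "cats" in the second sentence contributes nothing for the query term "cat"; this is the kind of synonym and morphology gap that the abstract's semantic matching and synonym processing are meant to address.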
Availability of data and materials
Not applicable.
Funding
No funding was provided for the completion of this study.
Ethics declarations
Conflict of interest
The authors have no financial or proprietary interests in any material discussed in this article. The authors declare that they have no conflict of interest.
Ethical approval
Not applicable.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H. A novel algorithm for the construction of fast English sentence retrieval model using a combination of ontology and advanced machine learning techniques. Soft Comput 27, 18129–18146 (2023). https://doi.org/10.1007/s00500-023-09224-3
DOI: https://doi.org/10.1007/s00500-023-09224-3