A novel algorithm for the construction of fast English sentence retrieval model using a combination of ontology and advanced machine learning techniques

Wang, Haibo

doi:10.1007/s00500-023-09224-3

Application of soft computing
Published: 23 September 2023

Volume 27, pages 18129–18146, (2023)
Soft Computing

Haibo Wang¹

Abstract

Information retrieval models are a central topic in English sentence retrieval research. These models are driven by diverse retrieval mechanisms that compute similarity in different ways and directly influence the final result ranking. However, despite decades of work, deep semantic analysis has remained challenging because of technical constraints. This gap underscores the importance of learning-based approaches that acquire a precise semantic understanding of information. To address it, this paper establishes a fast retrieval model for English sentences based on the statistical language model (SLM). First, the proposed method uses the SLM to extract significant feature words from the corpus. These feature words are identified by analyzing co-occurrence patterns and frequency distributions within the corpus. Second, it employs the N-gram model to calculate the probabilities of word occurrences based on their contextual dependencies. This framework represents feature words and their associated probabilities in a structured manner, capturing the nuances of language semantics. Third, the model integrates an ontology to bridge the gap between human language and machine understanding by mapping natural language expressions to conceptual entities. Finally, the model retrieves English sentences through semantic matching, leveraging the semantic framework and ontology-based search. In the experimental study, the proposed model achieved a retrieval ratio of 98.5%, outperforming the existing models in the comparison. The results also show that the proposed algorithm outperforms the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, improving accuracy by 7.52%. When the labelled corpus is very small and the unlabelled corpus is relatively large, the algorithm enhances the classifier's performance by 12.6%. This shows that the algorithm reduces the influence of the synonym processing stage on the overall performance while retaining the advantages of high precision and accuracy of calculation results.
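
As a rough illustration of the N-gram step described in the abstract, the following Python sketch estimates bigram probabilities from a toy corpus. It is not the paper's implementation, which is not shown in this preview: the function names (train_bigram, bigram_prob), the whitespace tokenizer, the sentence boundary markers, and the add-one smoothing are all assumptions made for this example.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence boundary markers


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {EOS}  # every word that can follow a context, plus end-of-sentence
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = [BOS] + words + [EOS]
        context_counts.update(tokens[:-1])  # each token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing, an assumed choice."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


corpus = ["the cat sat", "the dog sat"]  # toy data, not from the paper
model = train_bigram(corpus)
# "the" occurs twice as a context and is followed by "cat" once;
# the vocabulary has 5 entries, so this is (1 + 1) / (2 + 5).
print(bigram_prob("the", "cat", model))
```

With add-one smoothing, unseen word pairs still receive a small non-zero probability, and the probabilities of all vocabulary words following a given context sum to one. A real system would likely use a stronger smoothing method and a proper tokenizer.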

Availability of data and materials

Not applicable.

Funding

No funding was provided for the completion of this study.

Author information

Authors and Affiliations

Xinyang Vocational and Technical College, Xinyang, 464000, Henan, China
Haibo Wang

Corresponding author

Correspondence to Haibo Wang.

Ethics declarations

Conflict of interest

The authors have no financial or proprietary interests in any material discussed in this article. The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H. A novel algorithm for the construction of fast English sentence retrieval model using a combination of ontology and advanced machine learning techniques. Soft Comput 27, 18129–18146 (2023). https://doi.org/10.1007/s00500-023-09224-3

Accepted: 09 September 2023
Published: 23 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00500-023-09224-3
