A novel algorithm for the construction of fast English sentence retrieval model using a combination of ontology and advanced machine learning techniques

  • Application of soft computing
Soft Computing

Abstract

These days, exploring information retrieval models is one of the most essential aspects of English sentence retrieval research. These models are driven by diverse retrieval mechanisms that use different similarity calculations and directly influence the final ranking of results. However, despite decades of work, deep semantic analysis has remained challenging because of technical constraints. This gap emphasizes the importance of acquiring a precise semantic understanding of information through learning approaches. To address this gap, this paper establishes a fast retrieval model for English sentences based on the statistical language model (SLM). First, the proposed method uses the SLM to extract significant feature words from the corpus. These feature words are identified by analyzing co-occurrence patterns and frequency distributions within that corpus. Second, it employs the N-gram model to calculate the probabilities of word occurrences based on their contextual dependencies. This framework represents feature words and their associated probabilities in a structured manner, capturing the nuances of language semantics. Third, the model integrates an ontology to bridge the gap between human language and machine understanding by mapping natural language expressions to conceptual entities. Finally, the model retrieves English sentences through semantic matching, drawing on this semantic framework and ontology-based search. In the experimental study, the proposed model achieved a retrieval ratio of 98.5%, outperforming the existing models in the comparison. The results also show that the proposed algorithm performs better than the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, improving accuracy by 7.52%. When the labelled corpus is very small and the unlabelled corpus is relatively large, the algorithm enhances the classifier's performance by 12.6%. This shows that the algorithm reduces the influence of the synonym processing stage on overall performance while retaining the high precision and accuracy of the calculation results.
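The abstract gives no implementation details, so the following is only a minimal sketch of two of the steps it names: estimating N-gram (here, bigram) probabilities from a corpus, and retrieving sentences after mapping words to ontology concepts. The names `bigram_model`, `ONTOLOGY`, `concepts`, and `retrieve` are hypothetical, the toy ontology is invented for illustration, add-one smoothing is an assumption, and Jaccard overlap is a simple stand-in for the paper's semantic matching, not the author's method.

```python
from collections import Counter


def bigram_model(sentences):
    """Build p(word | prev) from a corpus, using add-one smoothing (an assumption)."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def prob(word, prev):
        # Smoothed conditional probability of `word` following `prev`.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

    return prob


# Hypothetical toy ontology: surface word -> concept label.
ONTOLOGY = {"car": "Vehicle", "automobile": "Vehicle", "truck": "Vehicle"}


def concepts(sentence):
    """Map each token to its concept; unmapped tokens stand for themselves."""
    return {ONTOLOGY.get(token, token) for token in sentence.lower().split()}


def retrieve(query, sentences):
    """Rank sentences by Jaccard overlap of concept sets (a stand-in for semantic matching)."""
    q = concepts(query)
    scored = []
    for sentence in sentences:
        c = concepts(sentence)
        scored.append((len(q & c) / len(q | c), sentence))
    # Highest score first; ties keep corpus order because the sort is stable.
    return sorted(scored, key=lambda pair: -pair[0])


corpus = ["the car stopped", "the car moved", "the truck stopped"]
p = bigram_model(corpus)
ranking = retrieve("automobile stopped", corpus)
```

In this sketch the query word "automobile" never appears in the corpus, yet "the car stopped" ranks first because both words map to the same concept. A purely term-based baseline such as TF-IDF would not match those two words.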

Availability of data and materials

Not applicable.

Funding

No funding was provided for the completion of this study.

Author information

Corresponding author

Correspondence to Haibo Wang.

Ethics declarations

Conflict of interest

The authors have no financial or proprietary interests in any material discussed in this article. The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H. A novel algorithm for the construction of fast English sentence retrieval model using a combination of ontology and advanced machine learning techniques. Soft Comput 27, 18129–18146 (2023). https://doi.org/10.1007/s00500-023-09224-3

  • DOI: https://doi.org/10.1007/s00500-023-09224-3
