Multi-corpus-Based Model for Measuring the Semantic Relatedness in Short Texts (SRST)

Research Article - Computer Engineering and Computer Science

Abstract

Semantic Relatedness (SR) defines a relation between linguistic items, which may be words, phrases, or documents. It has many applications, such as information extraction, word sense disambiguation, text summarization, and text clustering. Quantifying SR manually is fairly natural and intuitive for humans, whereas doing so automatically is complex, because humans draw on background experience and external domain concepts that are not available to computational methods. This paper focuses on Semantic Relatedness in Short Texts (SRST). A multi-corpus-based Vector Space Model is proposed to measure SRST. Word synonyms and anaphoric information are used to improve the semantic representation of the document. Since the set of verses in the Holy Quran is a precious sample of short texts, it is used as the main case study in this paper, and the degree of relatedness between these verses is measured. Experimental results demonstrate the effectiveness of the proposed model in improving SR measurement: recall improves to 60%, compared with 11.3% in the best previous studies.
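
The general idea of a synonym-enriched vector space representation mentioned above can be illustrated with a minimal sketch. It is not the model proposed in the paper: the term-frequency weighting, the whitespace tokenizer, the toy `SYNONYMS` table, and the function names `expand`, `cosine`, and `relatedness` are all assumptions made for illustration, and the anaphora resolution and multi-corpus aspects are omitted.

```python
import math
from collections import Counter

# Hypothetical toy synonym table (assumption); the paper's actual
# lexical resources are not reproduced here.
SYNONYMS = {"mercy": ["compassion"], "compassion": ["mercy"]}


def expand(tokens, synonyms):
    """Return the tokens plus any known synonyms of each token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(synonyms.get(tok, []))
    return expanded


def cosine(tokens_a, tokens_b):
    """Cosine similarity between raw term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def relatedness(text_a, text_b, synonyms=SYNONYMS):
    """Relatedness of two short texts after synonym expansion."""
    ta = expand(text_a.lower().split(), synonyms)
    tb = expand(text_b.lower().split(), synonyms)
    return cosine(ta, tb)
```

In this sketch, two short texts that share no surface words (for example "mercy" and "compassion") get a zero score from plain cosine similarity but a high score once synonyms are added to both vectors, which is the kind of enrichment the abstract alludes to.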

Keywords

Text similarity; Semantic similarity; Similarity measurement; The Holy Quran; Arabic language; Short texts relatedness

Copyright information

© King Fahd University of Petroleum & Minerals 2018

Authors and Affiliations

  • Reem El-Deeb (1)
  • Aya M. Al-Zoghby (1)
  • Samir Elmougy (1)
  1. Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
