Ontology-based Tamil–English cross-lingual information retrieval system

Sādhanā

Abstract

Cross-lingual information retrieval (CLIR) systems enable users to query for information in one language and retrieve relevant documents in another. In general, a CLIR system translates the query from the source language to the target language and retrieves target-language documents based on the keywords in the translated query. However, ambiguity in the source and translated queries reduces the performance of the system. An ontology can be used to address this problem. Current ontology-based CLIR systems use manually constructed multilingual ontologies, which are expensive to build. Many methods exist to construct an ontology automatically for any domain in English, but not in other languages such as Tamil. To address these issues, we propose a methodology for a Tamil–English CLIR system that translates the Tamil query to English and retrieves pages in English. Our approach uses a word sense disambiguation module to resolve ambiguity in the Tamil query, and an automatically constructed English ontology to address ambiguity in the English query. We have developed a morphological analyser for Tamil, a Tamil–English bilingual dictionary and a named entity database to translate a Tamil query to English. The translated query is reformulated using the ontology, and the reformulated queries are given to a search engine to retrieve English documents from the Internet. We have evaluated our methodology on the agriculture domain, and the results show that our approach outperforms other approaches in terms of precision.
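The query-processing stages named in the abstract (morphological analysis, named entity and dictionary lookup, sense disambiguation, ontology-based reformulation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix list, dictionary entries, named entity entry, domain terms and ontology relations below are all hypothetical toy data, and a simple domain-term filter stands in for the word sense disambiguation module. Only the transliterated query "ManjaLin payankaL" is taken from the appendix.

```python
from typing import Dict, List

# All resources below are toy, hypothetical stand-ins (not the authors' data).
SUFFIXES: List[str] = ["in", "kaL"]                       # assumed genitive and plural suffixes
NAMED_ENTITIES: Dict[str, str] = {"Kaaviri": "Kaveri"}    # assumed named entity entry
BILINGUAL_DICT: Dict[str, List[str]] = {
    "ManjaL": ["yellow", "turmeric"],                     # ambiguous Tamil word
    "payan": ["use"],
}
DOMAIN_TERMS = {"turmeric", "use"}                        # stand-in for agriculture-domain knowledge
ONTOLOGY: Dict[str, List[str]] = {"turmeric": ["rhizome", "spice crop"]}


def find_root(word: str) -> str:
    """Toy morphological analysis: strip one known suffix if the stem is a dictionary entry."""
    if word in BILINGUAL_DICT:
        return word
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in BILINGUAL_DICT:
            return stem
    return word


def translate_word(word: str) -> str:
    """Look up named entities first, then the dictionary; prefer an in-domain sense."""
    if word in NAMED_ENTITIES:
        return NAMED_ENTITIES[word]
    senses = BILINGUAL_DICT.get(find_root(word), [word])
    in_domain = [sense for sense in senses if sense in DOMAIN_TERMS]
    return in_domain[0] if in_domain else senses[0]


def translate_query(tamil_query: str) -> List[str]:
    """Translate a transliterated Tamil query word by word."""
    return [translate_word(word) for word in tamil_query.split()]


def reformulate(terms: List[str]) -> List[str]:
    """Return the base query plus one expanded query per related ontology concept."""
    base = " ".join(terms)
    expanded = [f"{base} {related}" for term in terms for related in ONTOLOGY.get(term, [])]
    return [base] + expanded


terms = translate_query("ManjaLin payankaL")
queries = reformulate(terms)
print(terms)
print(queries)
```

In this sketch each reformulated query would be sent to a search engine separately; how the results are merged and ranked is left open, since the abstract does not specify it.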

Notes

  1. http://www.fallingrain.com/world/IN/25/.

  2. https://en.wikipedia.org/wiki/List_of_rivers_of_Tamil_Nadu.

  3. https://en.wikipedia.org/wiki/List_of_lakes_in_Tamil_Nadu.

  4. https://en.wikipedia.org/wiki/List_of_Tamil_people.

Acknowledgements

We would like to thank the management of SSN Institutions for funding the High Performance Computing (HPC) lab, where this research is being carried out. We also thank the anonymous reviewers for their constructive comments, which helped us to improve the manuscript.

Author information

Corresponding author

Correspondence to D Thenmozhi.

Appendix I

Tables 8–16 show the significance of the domain-specific ontology for retrieval performance by comparing our approach with other search methods.

Table 8 Performance comparison for the user query “Mutkalappai”.
Table 9 Performance comparison for the user query “Uzhudhal Upakaranangkal”.
Table 10 Performance comparison for the user query “Payinkaal”.
Table 11 Performance comparison for the user query “Puussikal”.
Table 12 Performance comparison for the user query “Kalappai”.
Table 13 Performance comparison for the user query “VeNkaaram”.
Table 14 Performance comparison for the user query “EthiruutikaL”.
Table 15 Performance comparison for the user query “Thunai Marunthu PoruL”.
Table 16 Performance comparison for the user query “ManjaLin payankaL”.
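The abstract reports the comparison in terms of precision. As a minimal sketch, precision at a cutoff k could be computed as below; the cutoff, the document identifiers and the relevance judgments are illustrative assumptions, not values from the tables.

```python
from typing import Sequence, Set


def precision_at_k(ranked_docs: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_docs[:k] if doc in relevant)
    return hits / k


# Hypothetical ranked result list and relevance judgments for one query.
ranked = ["d1", "d2", "d3", "d4", "d5"]
judged_relevant = {"d1", "d3", "d6"}
print(precision_at_k(ranked, judged_relevant, k=5))
```

Dividing by k rather than by the number of documents actually returned penalises methods that return fewer than k results, which is one common convention; other conventions exist.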

Cite this article

Thenmozhi, D., Aravindan, C. Ontology-based Tamil–English cross-lingual information retrieval system. Sādhanā 43, 157 (2018). https://doi.org/10.1007/s12046-018-0942-7

  • DOI: https://doi.org/10.1007/s12046-018-0942-7
