Complex Search Queries in the Corpus Management System

Mukhamedshin, Damir; Nevzorova, Olga; Khusainov, Aidar

doi:10.1007/978-3-319-67077-5_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10449)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

This article discusses the advanced features of the newly developed search engine of the “Tugan tel” corpus management system, whose corpus consists of texts written in the Tatar language. The new features are: executing complex queries with arbitrary logical formulas for direct and reverse search; executing complex queries using a thesaurus or a list of word forms or lemmas; and extracting some types of named entities.

Complex queries make it possible to automatically extract and annotate semantic data from a corpus for linguistic applications. These options improve the search process and also make it possible to test the lexicon and collocations in the corpus.
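
To make the idea of a query built from an arbitrary logical formula more concrete, the following is a minimal sketch in Python. It does not reproduce the actual “Tugan tel” search engine, whose query syntax and storage are not described in this abstract. The `Token` structure, the tag names (`N`, `PL`, `LOC`, and so on), the helper functions, and the four-token sample are all assumptions introduced for illustration only. The sketch composes simple conditions on morphological tags and on a lemma list with AND, OR, and NOT, then filters tokens with the resulting formula.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical token record: word form, lemma, and a set of morphological tags.
@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    tags: FrozenSet[str]

# A query is any function that accepts a token and returns True or False.
Predicate = Callable[[Token], bool]

def has_tag(tag: str) -> Predicate:
    """Atomic condition: the token carries the given morphological tag."""
    return lambda t: tag in t.tags

def lemma_in(lemmas: Iterable[str]) -> Predicate:
    """Atomic condition: the token's lemma belongs to a lemma list."""
    allowed = frozenset(lemmas)
    return lambda t: t.lemma in allowed

def all_of(*preds: Predicate) -> Predicate:
    """Logical AND over any number of conditions."""
    return lambda t: all(p(t) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    """Logical OR over any number of conditions."""
    return lambda t: any(p(t) for p in preds)

def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a condition."""
    return lambda t: not pred(t)

def search(tokens: Iterable[Token], query: Predicate) -> List[Token]:
    """Return the tokens that satisfy the formula, in corpus order."""
    return [t for t in tokens if query(t)]

# Toy sample, not taken from the corpus; the tag names are illustrative.
TOKENS = [
    Token("китап", "китап", frozenset({"N", "SG"})),
    Token("китаплар", "китап", frozenset({"N", "PL"})),
    Token("мәктәптә", "мәктәп", frozenset({"N", "SG", "LOC"})),
    Token("килде", "кил", frozenset({"V", "PST"})),
]

# Formula: N AND (PL OR LOC) AND NOT lemma in {"мәктәп"}
QUERY = all_of(
    has_tag("N"),
    any_of(has_tag("PL"), has_tag("LOC")),
    negate(lemma_in(["мәктәп"])),
)

if __name__ == "__main__":
    for token in search(TOKENS, QUERY):
        print(token.form, token.lemma, sorted(token.tags))
```

Representing each condition as a function keeps the formula arbitrarily nestable. A thesaurus-based query could reuse `lemma_in` with a lemma list drawn from a thesaurus entry, again assuming such a list is available.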

Acknowledgment

The reported study was funded by the Russian Science Foundation (research project № 16-18-02074).

Author information

Authors and Affiliations

Institute of Applied Semiotics, Tatarstan Academy of Sciences, Kazan, Russia
Damir Mukhamedshin, Olga Nevzorova & Aidar Khusainov

Corresponding author

Correspondence to Damir Mukhamedshin.

Editor information

Editors and Affiliations

Department of Information Systems, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Department of Computer Science, University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos
Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
Piotr Jędrzejowicz
Department of Information Systems, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński
Department of Information Systems, University of Münster, Münster, Germany
Gottfried Vossen

About this paper

Cite this paper

Mukhamedshin, D., Nevzorova, O., Khusainov, A. (2017). Complex Search Queries in the Corpus Management System. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_39

DOI: https://doi.org/10.1007/978-3-319-67077-5_39
Published: 07 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67076-8
Online ISBN: 978-3-319-67077-5
eBook Packages: Computer Science, Computer Science (R0)
