Abstract
The latest reference corpus of written Slovene, the Gigafida corpus, was created as part of the ‘Communication in Slovene’ project. In the same project, a web concordancer was designed for the broadest possible use and tailored to the needs and abilities of user groups such as translators, writers, proofreaders and teachers. Two years after the corpus was published within the new tool, its features were assessed by the users. With an average rating of 4.36 on a scale from 1 to 5 (1 = I strongly disagree, 5 = I strongly agree), the results indicate that most survey participants agreed or strongly agreed with positive statements about the new implementations (e.g. “The corpus results are displayed in a clear manner”). This is a considerable improvement in user experience over the previous reference corpus of Slovene, i.e. the FidaPLUS corpus within the ASP32 concordancer (rated 3.67). In the user feedback, the simplicity of the search options and the clarity of the interface are highlighted as the main advantages, while advanced visualizations of corpus data and improved search for word-phrases are suggested for future development. The evaluation also highlighted some relevant user habits, such as not taking the time to learn about the tool systematically before starting to use it. The findings will be implemented in future editions of the Gigafida corpus, but are relevant to any project that aims to facilitate a wider use of reference corpora and corpus-based resources.
Notes
Information about the project is available at http://eng.slovenscina.eu/.
Since the Slovene declaration of independence in 1991, five large-scale corpora of written Slovene have been compiled: FIDA (in 2000), FidaPLUS (2006), Gigafida (2012), Beseda (2000), and Nova Beseda (2011). As the names suggest, they represent two different series of corpora: the first three were built by consortia of research institutions, and the last two were compiled by the Fran Ramovš Institute of Slovene Language (for a more detailed description of Gigafida see Logar 2017, also Gorjanc 2006; Logar Berginc and Krek 2012). The most recently published corpora are the ones currently in use.
Gigafida was also made freely available to the linguistic community in the NoSketch Engine corpus analysis tool (https://www.clarin.si/noske/; Erjavec 2013) and under licence in the Sketch Engine software (Kilgarriff et al. 2014). Although no data is available on Gigafida usage within these two specialised tools, the initial log analyses of the SSJ concordancer suggest that the project website http://www.gigafida.net/ is the default corpus entry point for a large number of users. For example, 16,244 queries were recorded in August 2014 (the launch of the survey), i.e. an average of 524 queries per day. Given that queries were only recorded for users who had accepted the cookie consent, the actual number of SSJ concordancer users is presumed to be much higher.
Among the target user groups, ‘Gigafida’ is typically perceived as an entity comprising annotated texts, the corresponding concordancer, and the web user interface. So as not to confuse the participants, only the term ‘corpus’ was used in the survey.
The questionnaire was compiled using the freely available tools at 1KA, OneClick Survey: http://english.1ka.si/.
The estimated time for completing the survey (also stated in its Introduction) was 10–15 min, placing it in the category of medium-long surveys. According to the Basic Recommendations of the 1KA survey tool (How long should my survey be?), this meant that “in addition to interesting topics, respondents require an additional motivation/…/, encouragement or incentive”. In our case, we relied upon the motive of interest, acknowledging that the extensive estimated time may discourage users from participating (we return to this question in Sect. 4). It turned out, however, that the average time for valid responses was 6 min and 5 s. This substantial reduction relative to the estimated time was presumably caused by participants skipping the non-obligatory open-format questions.
In the first evaluation, additional effort was dedicated to promoting the survey among university students, while for the new survey there was intentionally no focused recruiting of any user group.
As a reminder, an additional short explanation of each of the listed features was provided in the survey.
One such example is a user survey on the usefulness of different genres included in the Corpus of Contemporary Arabic, conducted among language teachers and language engineers (Al-Sulaiti and Atwell 2006: 19–25).
References
Agarwal, R., & Venkatesh, V. (2002). Assessing a firm’s web presence: A heuristic evaluation procedure for the measurement of usability. Information Systems Research, 13(2), 168–186.
Al-Sulaiti, L., & Atwell, E. (2006). The design of a corpus of contemporary Arabic. International Journal of Corpus Linguistics, 11(1), 1–36.
Arhar, Š. (2009). Uporabniška evalvacija korpusa FidaPLUS: zasnova vprašalnika, prvi rezultati. In M. Stabej (Ed.), Infrastruktura slovenščine in slovenistike (pp. 19–26). Ljubljana: Znanstvena založba Filozofske fakultete.
Arhar, Š., Gorjanc, V., & Krek, S. (2007). FidaPLUS corpus of Slovenian: The new generation of the Slovenian reference corpus: Its design and tools. In M. Davies (Ed.), Proceedings of the corpus linguistics conference CL2007 (pp. 1–12). Birmingham: University of Birmingham.
Arhar Holdt, Š., Kosem, I., & Gantar, P. (2017). Corpus-based resources for L1 teaching: The case of Slovene. In A. Marcus-Quinn & T. Hourigan (Eds.), Handbook on digital learning for K-12 schools (pp. 91–113). Berlin: Springer.
Bryman, A. (2012). Social research methods. Oxford: Oxford University Press.
Erjavec, T. (2013). Slovene corpora for corpus linguistics and language technologies. In K. Gajdošová & A. Žáková (Eds.), Proceedings of the seventh international conference SLOVKO 2013 (pp. 51–62). Bratislava: Slovenská académia vied.
Erjavec, T., Fišer, D., Krek, S., & Ledinek, N. (2010). The JOS linguistically tagged corpus of Slovene. In N. Calzolari, et al. (Eds.), Proceedings of the 7th international conference on language resources and evaluation (pp. 1806–1809). Paris: ELRA.
Flowerdew, L. (2009). Applying corpus linguistics to pedagogy: A critical evaluation. International Journal of Corpus Linguistics, 14(3), 393–417.
Frankenberg-Garcia, A. (2012). Raising teachers’ awareness of corpora. Language Teaching, 45(4), 475–489.
Gorjanc, V. (2006). Tracking lexical changes in the reference corpus of Slovene texts. In A. Wilson, D. Archer, & P. Rayson (Eds.), Corpus linguistics around the world (pp. 91–100). Amsterdam, New York: Rodopi.
Grčar, M., Krek, S., & Dobrovoljc, K. (2012). Obeliks: statistični oblikoskladenjski označevalnik in lematizator za slovenski jezik. In T. Erjavec & J. Žganec Gros (Eds.), Proceedings of the eighth language technologies conference (pp. 89–94). Ljubljana: Institut “Jožef Stefan”.
Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: Wiley.
Hardie, A. (2012). CQPweb—Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3), 380–409.
Hewson, C., Vogel, C., & Laurent, D. (2016). Internet research methods. Los Angeles: Sage.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., et al. (2014). The sketch engine: Ten years on. Lexicography, 1(1), 7–36.
Kilgarriff, A., Rundell, M., & Dhonnchadha, E. U. (2006). Efficient corpus development for lexicography: Building the New Corpus for Ireland. Language Resources and Evaluation, 40(2), 127–152.
Kosem, I. (2012). User-friendly interfaces for corpora of Slovene. Prace Filologiczne, 63, 167–180.
Krek, S. (2012). The Slovene language in the digital age. Berlin, Heidelberg: Springer.
Logar, N. (2017). Reference corpora revisited: Expansion of the Gigafida corpus. In V. Gorjanc, et al. (Eds.), Dictionary of modern Slovene: Problems and solutions (pp. 96–119). Ljubljana: Ljubljana University Press, Faculty of Arts.
Logar Berginc, N., Grčar, M., Brakus, M., Erjavec, T., Arhar Holdt, Š., & Krek, S. (2012). Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES: Gradnja, vsebina, uporaba. Ljubljana: Trojina, zavod za uporabno slovenistiko, Fakulteta za družbene vede.
Logar Berginc, N., & Krek, S. (2012). New Slovene corpora within the communication in Slovene project. Prace Filologiczne, 63, 197–207.
Pérez-Paredes, P., Sánchez-Tornel, M., & Calero, J. M. A. (2012). Learners’ search patterns during corpus-based focus-on-form activities: A study on hands-on concordancing. International Journal of Corpus Linguistics, 17(4), 482–515.
Renouf, A., & Kehoe, A. (2013). Filling the gaps: Using the WebCorp Linguist’s Search Engine to supplement existing text resources. International Journal of Corpus Linguistics, 18(2), 167–198.
Santos, D., & Frankenberg-Garcia, A. (2007). The corpus, its users and their needs: A user-oriented evaluation of COMPARA. International Journal of Corpus Linguistics, 12(3), 335–374.
Soehn, J.-Ph., Zinsmeister, H., & Rehm, G. (2008). Requirements of a user-friendly, general-purpose corpus query interface. In A. Witt, et al. (Eds.), Proceedings of the LREC 2008 workshop ‘sustainability of language resources and tools for NLP’ (pp. 27–32). Paris: ELRA.
Acknowledgements
The resources described in this paper were funded within the national project ‘Communication in Slovene’ (2008–2013), financed by the European Social Fund and the Slovene Ministry of Education, Science and Sports (Grant No. 3311-08-986003). The evaluation was supported by the infrastructure programme (ARRS-I0-0051) at the Centre for Applied Linguistics (Trojina), and by the reference corpus upgrade funded by the Slovene Ministry of Culture (2015–2018) (Grant No. 33400-15-141007). The authors are also grateful to all reviewers for their very constructive input and comments.
Cite this article
Arhar Holdt, Š., Dobrovoljc, K. & Logar, N. Simplicity matters: user evaluation of the Slovene reference corpus. Lang Resources & Evaluation 53, 173–190 (2019). https://doi.org/10.1007/s10579-018-9429-8
DOI: https://doi.org/10.1007/s10579-018-9429-8