Simplicity matters: user evaluation of the Slovene reference corpus

Arhar Holdt, Špela; Dobrovoljc, Kaja; Logar, Nataša

doi:10.1007/s10579-018-9429-8


Project Notes
Published: 01 October 2018

Volume 53, pages 173–190 (2019)

Language Resources and Evaluation


Abstract

The latest reference corpus of written Slovene, the Gigafida corpus, was created as part of the ‘Communication in Slovene’ project. In the same project, a web concordancer was designed for the broadest possible use and tailored to the needs and abilities of user groups such as translators, writers, proofreaders and teachers. Two years after the corpus was published within the new tool, its features were assessed by its users. With an average rating of 4.36 on a scale from 1 to 5 (1 = I strongly disagree, 5 = I strongly agree), the results indicate that most survey participants agreed or strongly agreed with positive statements about the new implementations (e.g. “The corpus results are displayed in a clear manner”). This is a considerable improvement in user experience over the previous reference corpus of Slovene, i.e. the FidaPLUS corpus within the ASP32 concordancer (rated 3.67). In the user feedback, the simplicity of the search options and the clarity of the interface are highlighted as the main advantages, while advanced visualizations of corpus data and improved search for multi-word phrases are suggested for future development. The evaluation also revealed some relevant user habits, such as not taking the time to learn about the tool systematically before starting to use it. The findings will be implemented in future editions of the Gigafida corpus, but they are relevant to any project that aims to facilitate wider use of reference corpora and corpus-based resources.
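The abstract summarises agreement as a single average on the 1–5 scale. As a minimal sketch of how such a summary can be computed, the Python snippet below pools every individual rating across participants and statements and also reports the share of ratings that are 4 or 5 (“agree” or “strongly agree”). The response values are hypothetical, and pooling all ratings into one mean is an assumption; the survey may have aggregated differently (for example, by first averaging per statement).

```python
from statistics import mean

# Hypothetical ratings (NOT survey data): one list per participant,
# one rating per statement, on the 1-5 scale
# (1 = I strongly disagree, 5 = I strongly agree).
responses = [
    [5, 4, 5, 4],
    [4, 4, 5, 5],
    [3, 5, 4, 4],
]

def pooled_mean(ratings):
    """Mean of all individual ratings, pooled across participants and statements."""
    flat = [r for row in ratings for r in row]
    return mean(flat)

def share_agreeing(ratings, threshold=4):
    """Fraction of ratings at or above `threshold` (agree or strongly agree)."""
    flat = [r for row in ratings for r in row]
    return sum(r >= threshold for r in flat) / len(flat)

print(round(pooled_mean(responses), 2))     # 4.33
print(round(share_agreeing(responses), 2))  # 0.92
```

Reporting the share of 4s and 5s alongside the mean helps check a claim such as “most participants agreed or strongly agreed”, which the mean alone does not establish.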


Notes

Information about the project is available at http://eng.slovenscina.eu/.
Since Slovenia declared independence in 1991, five large-scale corpora of written Slovene have been compiled: FIDA (2000), FidaPLUS (2006), Gigafida (2012), Beseda (2000), and Nova Beseda (2011). As the names suggest, they belong to two different series: the first three were built by consortia of research institutions, and the last two were compiled by the Fran Ramovš Institute of Slovene Language (for a more detailed description of Gigafida see Logar 2017; see also Gorjanc 2006; Logar Berginc and Krek 2012). The most recent corpus in each series is the one currently in use.
Gigafida was also made freely available to the linguistic community in the NoSketch Engine corpus analysis tool (https://www.clarin.si/noske/; Erjavec 2013) and, under licence, in the Sketch Engine software (Kilgarriff et al. 2014). Although no data are available on Gigafida usage within these two specialised tools, initial log analyses of the SSJ concordancer suggest that the project website http://www.gigafida.net/ is the default entry point to the corpus for a large number of users. For example, 16,244 queries were recorded in August 2014 (when the survey was launched), i.e. an average of 524 queries per day. Since queries were only recorded for users who had accepted cookies, the actual number of SSJ concordancer users is presumed to be much higher.
Among the target user groups, ‘Gigafida’ is typically perceived as an entity comprising the annotated texts, the corresponding concordancer, and the web user interface. To avoid confusing participants, only the term ‘corpus’ was used in the survey.
The questionnaire was created with the freely available 1KA (OneClick Survey) tool: http://english.1ka.si/.
The estimated time for completing the survey (also stated in its Introduction) was 10–15 min, placing it in the category of medium-length surveys. According to the Basic Recommendations of the 1KA survey tool (How long should my survey be?), this meant that “in addition to interesting topics, respondents require an additional motivation/…/, encouragement or incentive”. In our case, we relied on the participants’ interest in the topic, acknowledging that the long estimated time might discourage users from participating (we return to this question in Sect. 4). It turned out, however, that the average completion time for valid responses was 6 min 5 s. This substantial reduction relative to the estimated time was presumably due to participants skipping the optional open-ended questions.
The previous questionnaire was very similar to the one described in this paper in terms of content, length and complexity (Kosem 2012; Arhar 2009). Naturally, since each evaluation focused on its own concordancer and interface, some questions differed.
In the first evaluation, additional effort was made to promote the survey among university students, whereas the new survey intentionally did not target any particular user group.
As a reminder, the survey provided a short explanation of each listed feature.
One example is a survey conducted among language teachers and language engineers on the usefulness of the different genres included in the Corpus of Contemporary Arabic (Al-Sulaiti and Atwell 2006: 19–25).
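The per-day figure in the note on the SSJ concordancer logs follows from dividing the monthly total of logged queries by the number of days in August. The short Python sketch below reproduces it using the standard-library `calendar.monthrange`; the function name `average_daily_queries` is hypothetical, introduced only for illustration. Because only users who accepted cookies were logged, the result is a lower bound on actual use, not an estimate of it.

```python
import calendar

def average_daily_queries(total_queries: int, year: int, month: int) -> float:
    """Average number of logged queries per day over one calendar month."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    _, days_in_month = calendar.monthrange(year, month)
    return total_queries / days_in_month

# Logged queries in August 2014, as reported in the note (31 days).
print(average_daily_queries(16_244, 2014, 8))  # 524.0
```

Looking up the month length rather than hard-coding 30 or 31 keeps the calculation correct for any reporting month, including February in leap years.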

References

Agarwal, R., & Venkatesh, V. (2002). Assessing a firm’s web presence: A heuristic evaluation procedure for the measurement of usability. Information Systems Research, 13(2), 168–186.
Al-Sulaiti, L., & Atwell, E. (2006). The design of a corpus of contemporary Arabic. International Journal of Corpus Linguistics, 11(1), 1–36.
Arhar, Š. (2009). Uporabniška evalvacija korpusa FidaPLUS: zasnova vprašalnika, prvi rezultati [User evaluation of the FidaPLUS corpus: Questionnaire design, first results]. In M. Stabej (Ed.), Infrastruktura slovenščine in slovenistike [Infrastructure of Slovene and Slovene studies] (pp. 19–26). Ljubljana: Znanstvena založba Filozofske fakultete.
Arhar, Š., Gorjanc, V., & Krek, S. (2007). FidaPLUS corpus of Slovenian: The new generation of the Slovenian reference corpus: Its design and tools. In M. Davies (Ed.), Proceedings of the corpus linguistics conference CL2007 (pp. 1–12). Birmingham: University of Birmingham.
Arhar Holdt, Š., Kosem, I., & Gantar, P. (2017). Corpus-based resources for L1 teaching: The case of Slovene. In A. Marcus-Quinn & T. Hourigan (Eds.), Handbook on digital learning for K-12 schools (pp. 91–113). Berlin: Springer.
Bryman, A. (2012). Social research methods. Oxford: Oxford University Press.
Erjavec, T. (2013). Slovene corpora for corpus linguistics and language technologies. In K. Gajdošová & A. Žáková (Eds.), Proceedings of the seventh international conference SLOVKO 2013 (pp. 51–62). Bratislava: Slovenská akadémia vied.
Erjavec, T., Fišer, D., Krek, S., & Ledinek, N. (2010). The JOS linguistically tagged corpus of Slovene. In N. Calzolari et al. (Eds.), Proceedings of the 7th international conference on language resources and evaluation (pp. 1806–1809). Paris: ELRA.
Flowerdew, L. (2009). Applying corpus linguistics to pedagogy: A critical evaluation. International Journal of Corpus Linguistics, 14(3), 393–417.
Frankenberg-Garcia, A. (2012). Raising teachers’ awareness of corpora. Language Teaching, 45(4), 475–489.
Gorjanc, V. (2006). Tracking lexical changes in the reference corpus of Slovene texts. In A. Wilson, D. Archer, & P. Rayson (Eds.), Corpus linguistics around the world (pp. 91–100). Amsterdam, New York: Rodopi.
Grčar, M., Krek, S., & Dobrovoljc, K. (2012). Obeliks: statistični oblikoskladenjski označevalnik in lematizator za slovenski jezik [Obeliks: A statistical morphosyntactic tagger and lemmatiser for the Slovene language]. In T. Erjavec & J. Žganec Gros (Eds.), Proceedings of the eighth language technologies conference (pp. 89–94). Ljubljana: Institut “Jožef Stefan”.
Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: Wiley.
Hardie, A. (2012). CQPweb—Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3), 380–409.
Hewson, C., Vogel, C., & Laurent, D. (2016). Internet research methods. Los Angeles: Sage.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., et al. (2014). The Sketch Engine: Ten years on. Lexicography, 1(1), 7–36.
Kilgarriff, A., Rundell, M., & Dhonnchadha, E. U. (2006). Efficient corpus development for lexicography: Building the New Corpus for Ireland. Language Resources and Evaluation, 40(2), 127–152.
Kosem, I. (2012). User-friendly interfaces for corpora of Slovene. Prace Filologiczne, 63, 167–180.
Krek, S. (2012). The Slovene language in the digital age. Berlin, Heidelberg: Springer.
Logar, N. (2017). Reference corpora revisited: Expansion of the Gigafida corpus. In V. Gorjanc et al. (Eds.), Dictionary of modern Slovene: Problems and solutions (pp. 96–119). Ljubljana: Ljubljana University Press, Faculty of Arts.
Logar Berginc, N., Grčar, M., Brakus, M., Erjavec, T., Arhar Holdt, Š., & Krek, S. (2012). Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES: Gradnja, vsebina, uporaba [Corpora of the Slovene language Gigafida, KRES, ccGigafida and ccKRES: Construction, content, use]. Ljubljana: Trojina, zavod za uporabno slovenistiko, Fakulteta za družbene vede.
Logar Berginc, N., & Krek, S. (2012). New Slovene corpora within the Communication in Slovene project. Prace Filologiczne, 63, 197–207.
Pérez-Paredes, P., Sánchez-Tornel, M., & Calero, J. M. A. (2012). Learners’ search patterns during corpus-based focus-on-form activities: A study on hands-on concordancing. International Journal of Corpus Linguistics, 17(4), 482–515.
Renouf, A., & Kehoe, A. (2013). Filling the gaps: Using the WebCorp Linguist’s Search Engine to supplement existing text resources. International Journal of Corpus Linguistics, 18(2), 167–198.
Santos, D., & Frankenberg-Garcia, A. (2007). The corpus, its users and their needs: A user-oriented evaluation of COMPARA. International Journal of Corpus Linguistics, 12(3), 335–374.
Soehn, J.-Ph., Zinsmeister, H., & Rehm, G. (2008). Requirements of a user-friendly, general-purpose corpus query interface. In A. Witt et al. (Eds.), Proceedings of the LREC 2008 workshop ‘Sustainability of language resources and tools for NLP’ (pp. 27–32). Paris: ELRA.


Acknowledgements

The resources described in this paper were funded within the national project ‘Communication in Slovene’ (2008–2013), financed by the European Social Fund and the Slovene Ministry of Education, Science and Sports (Grant No. 3311-08-986003). The evaluation was supported by the infrastructure programme (ARRS-I0-0051) at the Centre for Applied Linguistics (Trojina) and by the reference corpus upgrade project funded by the Slovene Ministry of Culture (2015–2018) (Grant No. 33400-15-141007). The authors are also grateful to all reviewers for their very constructive comments.

Author information

Authors and Affiliations

Faculty of Arts, University of Ljubljana, Aškerčeva 2, 1000, Ljubljana, Slovenia
Špela Arhar Holdt
Institute “Jožef Stefan”, Jamova cesta 39, 1000, Ljubljana, Slovenia
Kaja Dobrovoljc
Faculty of Social Sciences, University of Ljubljana, Kardeljeva ploščad 5, 1000, Ljubljana, Slovenia
Nataša Logar


Corresponding author

Correspondence to Nataša Logar.


About this article

Cite this article

Arhar Holdt, Š., Dobrovoljc, K. & Logar, N. Simplicity matters: user evaluation of the Slovene reference corpus. Lang Resources & Evaluation 53, 173–190 (2019). https://doi.org/10.1007/s10579-018-9429-8


Issue Date: 15 March 2019
