Simplicity matters: user evaluation of the Slovene reference corpus


The latest reference corpus of written Slovene, the Gigafida corpus, was created as part of the ‘Communication in Slovene’ project. In the same project, a web concordancer was designed for the broadest possible use and tailored to the needs and abilities of user groups such as translators, writers, proofreaders and teachers. Two years after the corpus was published within the new tool, users assessed its features. With an average rating of 4.36 on a scale from 1 to 5 (1 = I strongly disagree, 5 = I strongly agree), the results indicate that most survey participants agreed or strongly agreed with positive statements about the new implementations (e.g. “The corpus results are displayed in a clear manner”). This is a considerable improvement in user experience over the previous reference corpus of Slovene, i.e. the FidaPLUS corpus within the ASP32 concordancer (rated 3.67). In the user feedback, the simplicity of the search options and the clarity of the interface are highlighted as the main advantages, while advanced visualizations of corpus data and improved searching for word phrases are suggested for future development. The evaluation also highlighted some relevant user habits, such as not taking the time to learn systematically about the tool before starting to use it. The findings will be implemented in future editions of the Gigafida corpus, but they are relevant to any project that aims to facilitate wider use of reference corpora and corpus-based resources.



  1.

    Information about the project is available at

  2.

    Since the Slovene declaration of independence in 1991, five large-scale corpora of written Slovene have been compiled: FIDA (in 2000), FidaPLUS (2006), Gigafida (2012), Beseda (2000), and Nova Beseda (2011). As the names suggest, they represent two different series of corpora: the first three were built by consortiums of research institutions, and the last two were compiled by the Fran Ramovš Institute of Slovene Language (for a more detailed description of Gigafida see Logar 2017, also Gorjanc 2006; Logar Berginc and Krek 2012). The most recently published corpora are the ones currently in use.

  3.

    Gigafida was also made freely available to the linguistic community in the NoSketch Engine corpus analysis tool (Erjavec 2013) and under licence in the Sketch Engine software (Kilgarriff et al. 2014). Although no data are available on Gigafida usage within these two specialised tools, the initial log analyses of the SSJ concordancer suggest that the project website is the default corpus entry point for a large number of users. For example, 16,244 queries were recorded in August 2014 (when the survey was launched), i.e. an average of 524 queries per day. Given that queries were only recorded for users who had accepted the cookie consent, the actual number of SSJ concordancer users is presumed to be much higher.

  4.

    Among the target user groups, ‘Gigafida’ is typically perceived as an entity comprising annotated texts, the corresponding concordancer, and the web user interface. To avoid confusing the participants, only the term corpus was used in the survey.

  5.

    The questionnaire was compiled using the freely available tools at 1KA, OneClick Survey:

  6.

    The estimated time for completing the survey (also stated in its Introduction) was 10–15 min, placing it in the category of medium-long surveys. According to the Basic Recommendations of the 1KA survey tool (How long should my survey be?), this meant that “in addition to interesting topics, respondents require an additional motivation/…/, encouragement or incentive”. In our case, we relied upon the motive of interest, acknowledging that the long estimated time might discourage users from participating (we return to this question in Sect. 4). It turned out, however, that the average time for valid responses was 6 min and 5 s. This substantial reduction relative to the estimated time was presumably caused by participants skipping the non-obligatory open-format questions.

  7.

    The previous questionnaire was very similar to the one described in this paper in terms of content, length and complexity (Kosem 2012; Arhar 2009). Naturally, as each of the evaluations focused on the corresponding concordancer and interface, certain questions differed.

  8.

    In the first evaluation, additional effort was dedicated to promoting the survey among university students, while for the new survey there was intentionally no focused recruiting of any user group.

  9.

    As a reminder, an additional short explanation of each of the listed features was provided in the survey.

  10.

    One example is a user survey on the usefulness of different genres included in the Corpus of Contemporary Arabic, conducted among language teachers and language engineers (Al-Sulaiti and Atwell 2006: 19–25).




The resources described in this paper were funded within the national project ‘Communication in Slovene’ (2008–2013), financed by the European Social Fund and the Slovene Ministry of Education, Science and Sports (Grant No. 3311-08-986003). The evaluation was supported by the infrastructure programme (ARRS-I0-0051) at the Centre for Applied Linguistics (Trojina), and the reference corpus upgrade was funded by the Slovene Ministry of Culture (2015–2018) (Grant No. 33400-15-141007). The authors are also grateful to all reviewers for their very constructive input and comments.

Arhar Holdt, Š., Dobrovoljc, K. & Logar, N. Simplicity matters: user evaluation of the Slovene reference corpus. Lang Resources & Evaluation 53, 173–190 (2019).

