Our main testing material consisted of four texts of approximately 200 words each, which formed the basis of the cloze test and the picture task, and a list of 100 words, which was used for the word translation task.
The word list was created by choosing the 100 most frequent nouns from the British National Corpus (BNC). Since some of the words were either very polysemous or had homonyms, we also provided the context in the form of a single sentence where the intended meaning was made unambiguous. Also, some of the first 100 words from the original British National Corpus list were synonyms either in English or in other languages (e.g. kind and sort; job and work), so we excluded one of those words and added the next in line from the British National Corpus list.
The texts for the cloze test were selected by first examining the Common European Framework of Reference for languages (Council of Europe 2001) to find the appropriate level. Levels A1 and A2 are quite basic and fairly restricted in terms of syntactic constructions. Since the Common European Framework of Reference for languages is normally used in language learning contexts and our participants are not learners of other Slavic languages, we had to keep in mind that the level should not be too demanding (B2, C1 and C2 mark a significant degree of fluency, enough to study in a foreign language). Therefore, we opted for the B1 level. We started with ten B1 level texts and chose the four with the most appropriate length and the most culturally neutral content. Next, we slightly adapted them for our purpose: some texts were lengthened, others were shortened, and long and complex sentences were split into two simpler ones. In the end, each text contained about 200 words in the English version and consisted of 16–17 sentences.
All the testing material was translated into the six languages of the study using English as the source language to make sure the translations were comparable. The translations were produced by native speakers of Czech, Slovak, Polish, Croatian, Slovene and Bulgarian. The first native speaker would translate all of the words and provide as many alternatives as possible. The second and the third native speakers’ task was to check the list and see if they agreed with the choice of words, as well as to provide any other alternatives they could think of. The translated words that at least two out of three native speakers agreed on were then used as the basis for our testing material, and the alternatives provided were later included as potential solutions.
For the spoken version of all three tasks, the material was recorded by six female native speakers of each language: Croatian, Slovene, Bulgarian, Czech, Slovak and Polish (36 speakers in total). They were instructed to read through the texts first in order to familiarize themselves with them and then to read them out clearly at normal speed. We created an online survey with sample recordings, in which native speakers of each of the six languages were instructed to rate each speaker’s clarity and voice quality. They rated the speakers by answering the question ‘How suitable is this speaker for presenting the news on national television?’ on a 5-point semantic differential scale ranging from ‘not at all suitable’ to ‘very suitable’. The voices of the four best-rated speakers of each language were then used in the experiment, in order to avoid basing our results on the recordings made by one speaker only. In the final version of the tasks, each voice was used for one of the four texts and for 25 words from the list.
The whole experiment was done online through a custom-made web application. Participants started the experiment by selecting their native language; subsequently all the questions and instructions in the application were displayed in the selected language. Participants then completed a background questionnaire, in which we asked about their demographic information, the amount of contact with other Slavic languages and their attitude to them. Next, the participants were randomly assigned a test language and were asked if they had ever studied it and if so, for how long. Finally, they were assigned one of the six possible types of tasks (written word translation task, spoken word translation task, written cloze test, spoken cloze test, written picture task or spoken picture task). This means that each participant did only one written or spoken task in one language. The six task types, multiplied by 30 language combinations (not 36, since we did not test any participants in their native language), resulted in 180 tasks.
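The count of 180 tasks can be reproduced with a short sketch. The language and task labels come from the description above; the variable names are illustrative only and are not taken from the web application itself.

```python
from itertools import permutations, product

LANGUAGES = ["Czech", "Slovak", "Polish", "Croatian", "Slovene", "Bulgarian"]
TASKS = ["word translation", "cloze test", "picture task"]
MODES = ["written", "spoken"]

# Ordered (native language, test language) pairs; permutations() never pairs
# a language with itself, so the six native-language cases are excluded.
language_pairs = list(permutations(LANGUAGES, 2))

# Each language pair is crossed with each task in each mode.
design = list(product(language_pairs, TASKS, MODES))

print(len(language_pairs))  # 30
print(len(design))          # 180
```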
We tried to minimize potential cheating by carefully piloting time limits to be sufficient for participants who type slowly, but not enough for checking words in dictionaries. In addition, the participants were not able to select any of the text in the application, which made the use of online translation tools extremely difficult within the time limits imposed.
Word translation task
In this task, the participants were presented with 50 words, randomly chosen from our 100-word list. They were given 10 seconds to translate each word. If they finished before the 10 seconds were up, they could either click on the ‘Next’ button or press ‘Enter’ on their keyboards. The time limit was piloted with people whose typing speeds varied greatly, and proved to be sufficient for typing any word from our list, but not sufficient for using a dictionary, online translation tools or other forms of help.
In the written version of the task, the participants saw the words on their screen, one by one. In the spoken version, the words were also presented one by one, but each word was played twice. This was designed to approximate a real-life situation in which one can reasonably ask one’s interlocutor to repeat what was said once, but not six or seven times. In order to make sure all participants heard the same input, the space reserved for typing appeared only after a word had been played both times.
The cloze test is a task where a certain number of words are omitted from a text and replaced by a gap. This gap is normally a horizontal line with the mean length of all the words that were deleted from the text in the written version of the test, or a beep of uniform length in the spoken version. The participants’ task is to put the words back into the right ‘gaps’. The cloze test is a well-known task in language learning exercises (Oller 1973; Aitken 1977; Alderson 1979; Abraham and Chapelle 1992; Keshavarz and Salimi 2007), but it has also been used to measure intelligibility, e.g. by van Bezooijen and Gooskens (2005).
In our written version of the cloze test, four nouns, four verbs and four adjectives were deleted from a text and placed above it in a random order. The participants could see the whole text in front of them and they had 10 minutes to move all 12 words to the gaps in the text by dragging and dropping them. A word that had been placed in the text was grayed out in the selection area, in order to help the participants keep track of their choices. If they wanted to change an answer, they could simply drag and drop a different word into the same gap; their original word of choice would then reappear in black in the selection area above the text.
In the spoken cloze test, the gap was a beep of uniform length (one second, with a 30-ms pause before and after it). In order not to strain the working memory of the participants, the spoken cloze test was played in fragments of one or two sentences, where each fragment contained only one gap. Just as in the word translation task, each fragment was played twice and only then would the participants see the 12 words on the screen. A selection had to be made within 30 seconds, or the response was recorded as a blank. Any word used was grayed out, but it could be reused if needed, in the same manner as described above.
In cases in which language combinations were very distant, a cloze test might result in a floor effect. Still, the participants might be able to grasp the basic gist of the content they read or listened to. In order to measure intelligibility on the level of discourse, we created a picture task, in which the participants read or listened to a short text; their task was to select the one picture out of four that best described the text. The texts used for the cloze test were also the basis of the picture task. Each text had two main aspects. We created sets of four pictures in which those aspects were varied: one picture that contained both correct aspects, two pictures that had one correct and one incorrect aspect, and one picture where both aspects were incorrectly represented. An example of two pictures from one of the sets is shown in Fig. 2.
The quality of each set of pictures was tested with a pilot where the participants would listen to a short text in a language they did not understand, and were then asked to choose the picture they thought might best describe what they heard. The purpose of the pilot was to get the participants to choose the picture that made the most sense to them. Ideally, each picture should be equally ‘logical’, so the choice would be completely random, i.e. each picture should roughly be chosen 25 % of the time. In one case the pilot showed that the participants disproportionately favored one of the pictures: this set was adapted to include a more plausible distractor. The two aspects that were varied were having a cold (correct) vs. having a broken leg (incorrect) and eating healthy food and taking medicine vs. eating fast food. The participants favored the correct picture, with a person lying in bed with a cold and healthy food and medicine on the nightstand next to that person. In the adapted version, the semi-correct and the incorrect picture featured a book instead of fast food.
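One way to quantify how far pilot choices depart from the ideal 25 % per picture is a chi-square goodness-of-fit statistic against a uniform expectation. The text does not state that such a test was used in the pilot, so the following is only a sketch of one possible check; the function name and the pilot counts are hypothetical.

```python
def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Critical value of the chi-square distribution for df = 3 at alpha = 0.05.
CRITICAL_DF3_05 = 7.815

# Hypothetical pilot counts for the four pictures of a set (not study data).
balanced_set = [11, 9, 10, 10]
skewed_set = [25, 5, 5, 5]

for name, counts in [("balanced", balanced_set), ("skewed", skewed_set)]:
    stat = chi_square_uniform(counts)
    flagged = stat > CRITICAL_DF3_05
    print(name, round(stat, 2), "revise set" if flagged else "keep set")
```

With pilot samples as small as these, the chi-square approximation is rough, so a flagged set would be a prompt for inspection rather than a verdict.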
For the written picture task, the participants had 5 minutes to read one of the four texts, chosen at random. If they finished it early, they could press ‘Next’ to continue with the task. Then they saw a set of four pictures and had 30 seconds to select the picture they felt best described what they had read or heard. A picture was selected simply by clicking on it. In the spoken version of the task, the participants listened to a text once and then saw the set of four pictures.
The results of the word translation task were manually corrected to allow for typos, synonyms and any words that could be used in place of our target words in certain contexts. All the translations given by our participants were checked by two native speakers of Croatian, Slovene, Bulgarian, Czech, Slovak and Polish and their final scores were then calculated. Each correctly translated word was worth one point and the maximum score was 50. We then converted the 0–50 scores into percentages in order to facilitate comparison with other results.
The cloze test was scored automatically: each correctly placed word was worth one point. Since there were 12 gaps to fill, the results on the cloze test ranged from 0 to 12. The scores were converted to percentages for the sake of comparability.
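The conversion of raw scores to percentages is simple arithmetic; a minimal sketch follows, using the maximum scores given above (50 for word translation, 12 for the cloze test). The function name and the example scores are illustrative.

```python
def to_percentage(score, max_score):
    """Convert a raw score to a percentage of the maximum possible score."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return 100.0 * score / max_score

WORD_TRANSLATION_MAX = 50  # 50 words, one point each
CLOZE_MAX = 12             # 12 gaps, one point each

# Illustrative raw scores, not study data.
print(to_percentage(35, WORD_TRANSLATION_MAX))  # 70.0
print(to_percentage(9, CLOZE_MAX))              # 75.0
```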
The results of the picture task can be broken down into four categories: the participants who selected the correct picture; those who selected one of the semi-correct pictures; those who selected the incorrect picture and a very small number of participants who failed to select anything before the time ran out. For reasons of simplicity, we have only presented the percentage of participants who selected the correct picture. We urge the reader to keep in mind that, since this task represents a choice between four options, the chance of choosing a correct answer is 25 %.
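The four response categories follow directly from the two aspects varied in each picture set. The sketch below shows one way to tabulate them; the data representation, the sample responses, and the decision to keep timed-out participants in the denominator are assumptions made for illustration, not details given in the text.

```python
from collections import Counter

def classify(choice):
    """Map a picture choice to a response category.

    `choice` is a pair of booleans (first aspect correct, second aspect
    correct), or None if the participant ran out of time.
    """
    if choice is None:
        return "no answer"
    correct_aspects = sum(choice)  # True counts as 1, False as 0
    return {2: "correct", 1: "semi-correct", 0: "incorrect"}[correct_aspects]

# Hypothetical responses, not study data.
responses = [(True, True), (True, False), (False, True),
             (False, False), None, (True, True)]

counts = Counter(classify(r) for r in responses)

# Assumption: timed-out participants remain in the denominator.
percent_correct = 100.0 * counts["correct"] / len(responses)
CHANCE_LEVEL = 100.0 / 4  # one correct picture out of four

print(dict(counts))
print(percent_correct > CHANCE_LEVEL)
```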
The issue of Bulgarian Cyrillic
Bulgarian is the only language in our group written exclusively in Cyrillic. This meant that native speakers of the other Slavic languages, most of whom cannot read Cyrillic, were not able to do the written tasks in Bulgarian. Since we did not want to make the task artificial by transliterating Bulgarian, and we still wanted to obtain that data, we decided to only assign written tasks in Bulgarian to those participants who indicated in the background questionnaire that they could read Cyrillic. The opposite problem, i.e. native speakers of Bulgarian not being able to read the Latin script, did not arise. A consequence of this choice is that the results for written Bulgarian might be somewhat biased by the fact that some participants might have learned another Slavic language.
Since we were primarily interested in young adults, we limited the sample to 18- to 30-year-olds. The other filtering criteria were that participants had completed at least their high school education, that one of the six Slavic languages of the study was both their native language and the language they mostly spoke at home, and that they had not learned the test language. A total of 5,965 participants took part in the study. Each of our three methods in the written or the spoken version was used with more than 1,000 participants. The mean age was 23 years and around two-thirds of the participants were female. Each of the 180 individual tasks was performed by at least 15 and at most 61 participants. The mean number of participants per task was 33.14. Table 1 shows the number of participants, their mean age and the percentage of males and females across all six tasks.
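The reported mean number of participants per task follows from the two totals given above, as the following check shows (variable names are illustrative).

```python
TOTAL_PARTICIPANTS = 5965  # total reported in the text
TOTAL_TASKS = 180          # 30 language combinations x 6 task types

mean_per_task = TOTAL_PARTICIPANTS / TOTAL_TASKS
print(round(mean_per_task, 2))  # 33.14
```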