1 Introduction

A Slovak tourist on holiday in Croatia who wants to communicate with the locals has several options at hand (Backus, Marácz, and ten Thije 2011). She can use English and hope that both her own English and that of the locals are good enough for mutual understanding. According to a report by the European Commission (Special Eurobarometer 243), however, only 38 % of EU citizens speak English well enough to hold a conversation, so the odds are that our tourist cannot really communicate that way. She could try German, which is commonly taught in schools in both Slovakia and Croatia and might therefore qualify as a kind of regional lingua franca. The odds of that working are even slimmer, since only 11 % of EU citizens report that they are able to have a conversation in German (Special Eurobarometer 243). If she could speak a bit of Croatian, another option might be code-switching, i.e. speaking a bit of both languages in the hope of creating a mix that enables mutual intelligibility. Since her Croatian is limited to two or three basic phrases, this option is also out.

What is our Slovak tourist to do, then? She could simply speak her native language and let the locals speak Croatian, while trying to understand as much of it as she can. The Croatian speakers would do the same: speak Croatian and actively try to understand Slovak. This type of communication is called receptive multilingualism, and our tourist is quite familiar with it: she talks to Czech speakers in this manner all the time (Nábělková 2007). The same mode of communication is also used in Scandinavia between speakers of Swedish, Danish and Norwegian (Delsing and Lundin Åkesson 2005), as well as between some speakers of German and Dutch (Ribbert and ten Thije 2007) and of Finnish and Estonian (Verschik 2012).

Could receptive multilingualism work between a speaker of Slovak and a speaker of Croatian? The success of their communication would depend on a number of factors, such as the complexity of the topic, the speakers’ willingness to engage in this type of interaction, their previous experience with receptive multilingualism, their cooperativeness and the urgency of the situation. The most important factor, however, would certainly be the actual level of mutual intelligibility between the two languages. If a speaker of French and a speaker of Hebrew were to give receptive multilingualism a try, even with all the good will in the world, they would not get past the most basic interaction, probably one relying on gestures. If the two languages belong to the same language family, as is the case with Slovak and Croatian, the odds are that some level of mutual intelligibility can be established.

2 Aims, research questions and hypotheses

In their 2007 report, the High Level Group on Multilingualism established by the European Commission called for research into receptive multilingualism with regard to the Germanic, Romance and Slavic language families. The Mutual intelligibility of closely related languages (MICReLa) project (see Footnote 1) was set up with the goal of measuring the level of mutual intelligibility within these three language families and of examining the linguistic and extra-linguistic factors that influence intelligibility. The level of intelligibility between two languages reflects how well a speaker of language A can understand a genetically related language B, and vice versa. Related languages share a certain percentage of cognates, i.e. words of common origin that more often than not have a similar form. Their phonological, orthographic, morphological and syntactic systems are also likely to be much more similar than those of unrelated languages. All this could help our speaker of Slovak understand Croatian. Finally, extra-linguistic factors may also play a role. A positive attitude towards a related language could be linked to success in decoding it (Schüppert, Hilton and Gooskens 2015). The amount of contact with the language is also bound to matter: the more a listener is exposed to a related language, the more likely she is to understand it. The ultimate goal of the project is to create a general model of mutual intelligibility that takes all of these factors into account.

The aim of the present paper is to describe the first step towards this model: the empirical testing of the level of mutual intelligibility between six Slavic languages, namely Czech, Slovak, Polish, Croatian, Slovene and Bulgarian. In order to obtain as complete a picture as possible, we tested intelligibility using three different methods, each with both written and spoken language. The idea is to establish the level of receptive mutual intelligibility, for instance how much written or spoken Croatian a person can understand on the basis of their native language and their knowledge of, or exposure to, other languages of the same family.

Our first research question is: what is the level of mutual intelligibility between Czech, Slovak, Polish, Croatian, Slovene and Bulgarian? Since the present study deals with languages from two distinct branches of the Slavic language tree, we hypothesize that in most cases the distinction between West Slavic (Czech, Slovak, Polish) and South Slavic languages (Croatian, Slovene, Bulgarian) will be maintained, i.e. a speaker of Polish will always understand Czech and Slovak better than any South Slavic language. The only exception to this pattern could be Bulgarian, which is characterized by an almost complete loss of case in all declensions, the development of definite articles from demonstrative pronouns, and a complex verbal system in which the infinitive has been lost entirely while a distinction between indicative and renarrative tenses has emerged (Townsend and Janda 1996). A potential consequence of this divergence between Bulgarian and the other Slavic languages is that speakers of Croatian and Slovene might understand Czech and Slovak, which are West Slavic languages, better than Bulgarian.

We hypothesize that the most intelligible language combination will be Czech-Slovak, not only because of the great linguistic similarity between the two languages but also because of the extensive contact between their speakers (Nábělková 2007). Polish belongs to a separate sub-family, Lechitic (Rothstein 1993), so the degree of intelligibility between Czech and Polish and between Slovak and Polish should be only moderate. Since we do not expect a high degree of intelligibility across language sub-families, and since Bulgarian is so dissimilar to both Croatian and Slovene, the second most intelligible combination should be Croatian-Slovene. In addition, Croatia and Slovenia share a border, have some transitional dialects and enjoy some degree of exposure to one another.

Our second research question is whether the level of mutual intelligibility is always symmetrical. The asymmetric intelligibility between Czech and Slovak, whereby speakers of Slovak understand Czech better than vice versa, has been much debated since the breakup of Czechoslovakia (Nábělková 2007). Similarly, while Slovenia was part of Yugoslavia, many native speakers of Slovene were also bilingual in Serbo-Croatian, and anecdotal evidence suggests that native speakers of Slovene can still understand Croatian better than vice versa. We therefore hypothesize that intelligibility will be asymmetric in the Croatian-Slovene and Czech-Slovak language combinations.

Our third research question concerns the reliability and suitability of the three methods we are using: the word translation task, the cloze test and the picture task. We shall examine whether they all give a similar pattern of results.

In Sect. 3, we present previous research into the mutual intelligibility of closely related languages, with a focus on methodology. In Sect. 4, we describe the three methods for measuring intelligibility used in the study. Sect. 5 presents the results, Sect. 6 discusses them with special attention to the three methods, and Sect. 7 offers a conclusion and future directions.

3 Previous research

3.1 Mutual intelligibility in the Slavic language area

Research into mutual intelligibility in the Slavic language area has mainly focused on Czech and Slovak and the specific relationship between the two (Budovičová 1978; Hoffmannová and Müllerová 1993; Berger 2003; Sloboda 2004; Nábělková 2008; Sloboda and Nábělková 2013). Dickins (2009) used opinion testing and compared the results of his survey with the opinion intelligibility study conducted by Tejnor (1971). He found that the percentage of respondents who reported active knowledge of Slovak had increased dramatically, from 12 % in 1971 to 61 % in 2005, and that over 90 % of participants in his study claimed to have a receptive knowledge of Slovak.

3.2 Languages of the study

Slavic languages are traditionally divided into three distinct branches (Sussex and Cubberley 2006): West Slavic, South Slavic and East Slavic. Since the MICReLa project mainly deals with the languages of the European Union, Russian, Belarusian and Ukrainian were not included. We decided to focus on three West Slavic languages (Czech, Slovak and Polish) and three South Slavic languages (Croatian, Slovene and Bulgarian), since two distinct sub-families provide an interesting basis for comparison both within and across the two clusters. The countries where these languages are spoken are shown in Fig. 1.

Fig. 1 Map of Europe with countries where South Slavic languages (dark grey) and West Slavic languages (light grey) are spoken

3.3 How to measure mutual intelligibility?

Methods of measuring mutual intelligibility can generally be divided into opinion testing and functional testing (Gooskens 2013). In opinion testing, participants are asked how well they think they can understand a language (or a speech sample), whereas in functional testing the listener has to prove that s/he recognizes linguistic units (word recognition tasks) or grasps the meaning (speech understanding tasks) of some textual unit (sentence, paragraph, story). Opinion testing can be further divided into testing without speech samples, such as Haugen’s (1966) study of the Scandinavian languages, and testing with speech samples, for example Tang and van Heuven (2007), who tested the mutual intelligibility of Chinese dialects using recordings of the fable The North Wind and the Sun as text samples.

There are many methods of functional intelligibility testing; some commonly used ones are:

  • Recorded text test, in which a speech fragment is played in short sections and the participants are asked to retell what they have heard after each section. This method was first used for Native American languages (Voegelin and Harris 1951; Hickerson, Turner and Hickerson 1952; Olmsted 1954). The task is quite intuitive and close to a real-life situation, but since the participants only retell the content, it is extremely difficult to score in a valid and reliable fashion.

  • A method related to the previous one is the sentence translation task, in which the participants read or listen to a text sentence by sentence and then translate every word they have read or heard (Gooskens, Heeringa and Beijering 2008). Counting the number of correctly translated words partly solves the scoring problem of the retelling task. Nevertheless, a partly correct translation can still be difficult to score.

  • Word translation tasks, where the participants are asked to translate isolated words (Maurud 1976; Lundin-Åkesson and Zola Christiansen 2001; Kürschner, Gooskens and van Bezooijen 2008). This method is quite quick and easy to administer, but it is also not immune to scoring issues. Additionally, since the task is limited to words in isolation, it completely excludes morphology and syntax as factors that potentially influence intelligibility.

  • One way to make scoring more objective and automatic is to rely on multiple-choice questions. The disadvantage of this approach is that constructing the test and finding suitable distractors is difficult. Tang and van Heuven (2009) solved this by using a semantic categorization task, in which the participants classified words into one of ten predetermined semantic categories, for example ‘body parts’, ‘natural phenomena’ etc.

  • The cloze test, in which a number of words in a text are deleted and replaced by gaps (of uniform length). The participants’ task is to insert the correct words into the gaps (van Bezooijen and Gooskens 2005). Alternatively, a list of target words can be presented to the subject, with or without foils. This test captures the understanding of individual words as well as the general context and is relatively easy to score automatically.

For very closely related combinations, in which any of the aforementioned tasks would result in a ceiling effect, tests involving reaction times are an option (Impe 2010). For a more complete overview of the methods of measuring mutual intelligibility, see Gooskens (2013).

What is the best way of measuring mutual intelligibility? In general, there is no exact answer to this question, since every method could be suitable for a particular purpose. Doetjes (2007) compared six different methods of measuring the intelligibility of Swedish for Danish participants: true / false questions, multiple choice questions, open questions, word translation, summary and short summary. The results varied from 93 % for the true / false questions to 66 % for the short summary. Nevertheless, the asymmetry between Swedish and Danish, whereby Danish participants understand Swedish better than vice versa, was kept in all cases. Even though the absolute values vary, the basic assumption is that, as long as the testing conditions are kept constant, different methods should give the same overall pattern of results.

Since our aim was to measure mutual intelligibility between six languages, some of which are very closely related, while others might be quite distant, we needed a method that would capture all the variations in intelligibility. We planned to use a large number of participants, which meant that an automatic scoring of the data was essential. One of the aims of the MICReLa project is to measure the relative influence of linguistic factors on intelligibility; therefore, we needed testing material where such a relationship could be readily observed.

In the end, we opted for three different methods: the word translation task, which focuses on the intelligibility of isolated words and allows the most direct observation of the influence of linguistic factors on intelligibility; the cloze test, which captures the understanding of individual words but also of the wider context; and the picture task, which should measure intelligibility at a more general level, that of the main topics. To our knowledge, the spoken version of the cloze test has never been used to test intelligibility before. The same goes for the picture task, which is our variation of a multiple-choice method at the level of discourse. All three methods are described in detail in Sect. 4.

4 Method

4.1 Testing material

Our main testing material consisted of four texts of approximately 200 words each, which formed the basis of the cloze test and the picture task, and a list of 100 words, which was used for the word translation task.

The word list was created by choosing the 100 most frequent nouns from the British National Corpus (BNC). Since some of the words were highly polysemous or had homonyms, we also provided context in the form of a single sentence in which the intended meaning was unambiguous. In addition, some of the first 100 words on the original BNC list were synonyms, either in English or in other languages (e.g. kind and sort; job and work), so we excluded one word of each such pair and added the next word in line from the BNC list.

The texts for the cloze test were selected by first consulting the Common European Framework of Reference for Languages (Council of Europe 2001) to find the appropriate level. Levels A1 and A2 are quite basic and fairly restricted in terms of syntactic constructions, while B2, C1 and C2 mark a significant degree of fluency, enough to study in a foreign language. Since the Common European Framework of Reference is normally used in language learning contexts and our participants are not learners of other Slavic languages, the level could not be too demanding. We therefore opted for level B1. We started with ten B1-level texts and chose the four with the most appropriate length and culturally neutral content. We then slightly adapted them for our purpose: some texts were lengthened, others shortened, and long, complex sentences were split into two simpler ones. In the end, each text contained about 200 words in the English version and consisted of 16–17 sentences.

All the testing material was translated into the six languages of the study from English as the source language, to make sure the translations were comparable. The translations were produced by native speakers of Czech, Slovak, Polish, Croatian, Slovene and Bulgarian. For each language, a first native speaker translated all of the words and provided as many alternatives as possible. A second and a third native speaker then checked the list, indicated whether they agreed with the choice of words, and added any other alternatives they could think of. The translations that at least two of the three native speakers agreed on were used as the basis for our testing material, and the alternatives provided were later accepted as correct answers.
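The two-out-of-three agreement rule can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the project’s actual tooling: the helper name `consensus`, the idea that each translator contributes one preferred form plus a set of alternatives, and the choice to accept every form that any translator offered are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def consensus(preferred: list[str],
              alternatives: list[set[str]]) -> tuple[Optional[str], set[str]]:
    """Hypothetical helper: pick the target form agreed on by at least two of
    the three translators and pool every other offered form as acceptable."""
    form, votes = Counter(preferred).most_common(1)[0]
    if votes < 2:
        # No majority: such an item would have to be resolved by discussion.
        return None, set()
    accepted = set(preferred)
    for alts in alternatives:
        accepted |= alts
    return form, accepted


# Illustrative input: Czech equivalents of English "time".
target, accepted = consensus(["čas", "čas", "doba"], [{"doba"}, set(), {"chvíle"}])
print(target, sorted(accepted))
```

Returning the accepted set alongside the target mirrors the way the alternatives were later used when scoring answers.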

For the spoken version of all three tasks, the material was recorded by six female native speakers of each language (Croatian, Slovene, Bulgarian, Czech, Slovak and Polish; 36 speakers in total). They were instructed to read through the texts first to familiarize themselves with them and then to read them out clearly at normal speed. We created an online survey with sample recordings, in which native speakers of each of the six languages rated each speaker’s clarity and voice quality by answering the question ‘How suitable is this speaker for presenting the news on national television?’ on a 5-point semantic differential scale ranging from ‘not at all suitable’ to ‘very suitable’. The voices of the four best-rated speakers of each language were then used in the experiment, so as to avoid basing our results on recordings made by a single speaker. In the final version of the tasks, each voice was used for one of the four texts and for 25 words from the list.
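The voice selection and the split of material across voices can be sketched as follows. The text does not say how ratings were aggregated, so ranking by a plain mean is our assumption, and the function names and speaker IDs are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def select_voices(ratings: dict[str, list[int]], k: int = 4) -> list[str]:
    """Rank speakers by mean rating on the 1-5 'news presenter' scale and keep
    the k best (assumption: a plain mean is used for ranking)."""
    ranked = sorted(ratings, key=lambda spk: mean(ratings[spk]), reverse=True)
    return ranked[:k]


def assign_material(voices: list[str], texts: list[str],
                    words: list[str]) -> dict[str, tuple[str, list[str]]]:
    """Give each voice one text and an equal slice of the word list
    (4 voices x 25 words = 100 words)."""
    if len(voices) != len(texts) or len(words) % len(voices):
        raise ValueError("need one text per voice and an evenly divisible word list")
    share = len(words) // len(voices)
    return {v: (texts[i], words[i * share:(i + 1) * share])
            for i, v in enumerate(voices)}


# Hypothetical ratings for the six recorded speakers of one language.
ratings = {"s1": [5, 4], "s2": [2, 2], "s3": [4, 4],
           "s4": [3, 5], "s5": [1, 2], "s6": [5, 5]}
voices = select_voices(ratings)
plan = assign_material(voices, ["t1", "t2", "t3", "t4"], [f"w{i}" for i in range(100)])
```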

4.2 Experiment design

The whole experiment was conducted online through a custom-made web application (see Footnote 2). Participants started the experiment by selecting their native language; all questions and instructions in the application were then displayed in the selected language. Participants next completed a background questionnaire, in which we asked about their demographic information, the amount of contact with other Slavic languages and their attitudes towards them. They were then randomly assigned a test language and asked whether they had ever studied it and, if so, for how long. Finally, they were assigned one of six possible task types (written word translation task, spoken word translation task, written cloze test, spoken cloze test, written picture task or spoken picture task). Each participant therefore did only one written or spoken task in one language. Six task types multiplied by 30 language combinations (not 36, since we did not test any participants on their native language) resulted in 180 tasks.
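The arithmetic behind the 180 tasks, and the two random draws per participant, can be sketched as below. We assume both draws are uniform; the text only says the assignment was random, and `assign` is a hypothetical name.

```python
from __future__ import annotations

import random
from itertools import product

LANGUAGES = ["Czech", "Slovak", "Polish", "Croatian", "Slovene", "Bulgarian"]
TASKS = [f"{mode} {task}"
         for task in ("word translation", "cloze test", "picture task")
         for mode in ("written", "spoken")]


def all_conditions() -> list[tuple[str, str, str]]:
    """Every (native language, test language, task) cell; native == test is excluded."""
    return [(nat, test, task)
            for nat, test, task in product(LANGUAGES, LANGUAGES, TASKS)
            if nat != test]


def assign(native: str, rng: random.Random) -> tuple[str, str]:
    """Draw a test language other than the native one, then one task type
    (assumption: both draws are uniform)."""
    test = rng.choice([lang for lang in LANGUAGES if lang != native])
    return test, rng.choice(TASKS)


conditions = all_conditions()
print(len(conditions))  # 180 = 30 language combinations x 6 task types
```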

We tried to minimize potential cheating by carefully piloting time limits to be sufficient for participants who type slowly, but not enough for checking words in dictionaries. In addition, the participants were not able to select any of the text in the application, which made the use of online translation tools extremely difficult within the time limits imposed.

4.3 Word translation task

In this task, the participants were presented with 50 words, randomly chosen from our 100-word list. They were given 10 seconds to translate each word. If they finished before the 10 seconds were up, they could either click on the ‘Next’ button or press ‘Enter’ on their keyboards. The time limit was piloted with people whose typing speeds varied greatly, and proved to be sufficient for typing any word from our list, but not sufficient for using a dictionary, online translation tools or other forms of help.
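A participant’s item set and the per-word time limit might be handled as in the sketch below. How the application treated input that was still being typed when the 10 seconds ran out is not stated, so treating it as a blank is our assumption; the helper names are hypothetical.

```python
from __future__ import annotations

import random

TIME_LIMIT_S = 10.0  # per-word limit reported in the text


def draw_items(word_list: list[str], n: int = 50,
               rng: random.Random | None = None) -> list[str]:
    """Sample n distinct words from the 100-word list for one participant."""
    rng = rng or random.Random()
    return rng.sample(word_list, n)


def record_response(answer: str, elapsed_s: float) -> str:
    """Keep the typed answer only if it was submitted within the limit
    (assumption: anything later counts as a blank)."""
    return answer.strip() if elapsed_s <= TIME_LIMIT_S else ""
```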

In the written version of the task, the participants saw the words on their screen one by one. In the spoken version, the words were also presented one by one, but each word was played twice. This was designed to approximate a real-life situation, in which one can reasonably ask an interlocutor to repeat something once, but not six or seven times. To make sure all participants heard the same input, the typing field appeared only after a word had been played both times.

4.4 Cloze test

The cloze test is a task where a certain number of words are omitted from a text and replaced by a gap. This gap is normally a horizontal line with the mean length of all the words that were deleted from the text in the written version of the test, or a beep of uniform length in the spoken version. The participants’ task is to put the words back into the right ‘gaps’. The cloze test is a well-known task in language learning exercises (Oller 1973; Aitken 1977; Alderson 1979; Abraham and Chapelle 1992; Keshavarz and Salimi 2007), but it has also been used to measure intelligibility, e.g. by van Bezooijen and Gooskens (2005).

In our written version of the cloze test, four nouns, four verbs and four adjectives were deleted from a text and placed above it in random order. The participants could see the whole text and had 10 minutes to move all 12 words into the gaps by dragging and dropping them. A word that had been used in the text was grayed out in the selection area to help participants keep track of their choices. If they wanted to change an answer, they could simply drag and drop a different word into the same gap; their original choice would then reappear in black in the selection area above the text.
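Constructing a written cloze item as described (uniform gaps whose length is the mean length of the deleted words, and a shuffled word bank) can be sketched as follows. The function name is hypothetical, and the target positions are assumed to be chosen by hand, since the text does not describe an automatic selection procedure.

```python
from __future__ import annotations

import random
from statistics import mean


def make_cloze(tokens: list[str], targets: list[int],
               rng: random.Random) -> tuple[str, list[str], list[str]]:
    """Blank out the words at `targets` (hand-picked; 4 nouns, 4 verbs and
    4 adjectives in the study) and return the gapped text, a shuffled word
    bank, and the answer key in text order."""
    target_set = set(targets)
    key = [tokens[i] for i in sorted(target_set)]
    # Every gap gets the same length: the mean length of the deleted words.
    gap = "_" * round(mean(len(w) for w in key))
    gapped = " ".join(gap if i in target_set else tok
                      for i, tok in enumerate(tokens))
    bank = key[:]
    rng.shuffle(bank)  # words are shown above the text in random order
    return gapped, bank, key


text, bank, key = make_cloze("the old man drives a red car".split(),
                             [1, 3, 5], random.Random(0))
```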

In the spoken cloze test, the gap was a beep of uniform length (one second, with a 30-ms pause before and after it). In order not to strain the participants’ working memory, the spoken cloze test was played in fragments of one or two sentences, each containing only one gap. As in the word translation task, each fragment was played twice, and only then did the participants see the 12 words on the screen. A selection had to be made within 30 seconds, or the response was recorded as a blank. Any word that had been used was grayed out, but it could be reused if needed, in the same manner as described above.

4.5 Picture task

In very distant language combinations, a cloze test might result in a floor effect. Still, the participants might be able to grasp the basic gist of what they read or listened to. In order to measure intelligibility at the level of discourse, we created a picture task, in which the participants read or listened to a short text and then had to select the one picture out of four that best described it. The texts used for the cloze test were also the basis of the picture task. Each text had two main aspects, and we created sets of four pictures in which these aspects were varied: one picture contained both correct aspects, two pictures had one correct and one incorrect aspect, and one picture represented both aspects incorrectly. An example of two pictures from one of the sets is shown in Fig. 2.

Fig. 2 An example of pictures used for the text about driving a car in winter. The correct picture is on the left; a semi-correct one is on the right (it correctly shows driving a car, rather than, e.g., flying a plane, but in summer)

The quality of each set of pictures was tested in a pilot, in which participants listened to a short text in a language they did not understand and were then asked to choose the picture they thought best described what they had heard. The purpose of the pilot was to see which picture made the most sense to participants who could not rely on the text itself. Ideally, each picture should be equally ‘logical’, so that the choice would be completely random, i.e. each picture would be chosen roughly 25 % of the time. In one case, the pilot showed that participants disproportionately favored one of the pictures, so this set was adapted to include a more plausible distractor. In this set, the two varied aspects were having a cold (correct) vs. having a broken leg (incorrect), and eating healthy food and taking medicine (correct) vs. eating fast food (incorrect). The participants favored the correct picture, which showed a person lying in bed with a cold and with healthy food and medicine on the nightstand. In the adapted version, the semi-correct and incorrect pictures featured a book instead of fast food.
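The text states only that each picture should ideally be chosen about 25 % of the time. One common way to flag a set like the cold/broken-leg set is a Pearson chi-square goodness-of-fit test against a uniform distribution; the sketch below swaps in that test as our own illustration, not the procedure the authors report. It uses the standard critical value of 7.815 for 3 degrees of freedom at α = 0.05 and is only trustworthy when each picture’s expected count is at least about 5.

```python
from __future__ import annotations

# Critical value of the chi-square distribution with 3 degrees of freedom
# (4 pictures - 1) at alpha = 0.05.
CHI2_CRIT_DF3_ALPHA05 = 7.815


def chi_square_uniform(counts: list[int]) -> float:
    """Pearson chi-square statistic of observed picture choices against
    equal expected frequencies (25 % each for four pictures)."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


def looks_balanced(counts: list[int]) -> bool:
    """True if pilot choices do not deviate significantly from chance."""
    if len(counts) != 4:
        raise ValueError("expected choice counts for exactly four pictures")
    return chi_square_uniform(counts) < CHI2_CRIT_DF3_ALPHA05
```

With hypothetical pilot counts such as [30, 4, 3, 3], the statistic is 53.4, so the set would be flagged for revision.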

For the written picture task, the participants had 5 minutes to read one of the four texts, chosen at random. If they finished early, they could press ‘Next’ to continue. They then saw a set of four pictures and had 30 seconds to select, simply by clicking on it, the picture they felt best described what they had read or heard. In the spoken version of the task, the participants listened to a text once and then saw the set of four pictures.

4.6 Scoring

The results of the word translation task were scored manually in order to allow for typos, synonyms and any words that could replace our target words in certain contexts. All translations given by our participants were checked by two native speakers of the respective language (Croatian, Slovene, Bulgarian, Czech, Slovak or Polish), and the final scores were then calculated. Each correctly translated word earned one point, so the maximum score was 50. We then converted the 0–50 scores into percentages to facilitate comparison with the other results.

The cloze test was scored automatically—each correctly placed word was one point. Since there were 12 gaps to fill, the results on the cloze test range from 0 to 12. The scores were converted to percentages for the sake of comparability.
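The percentage conversion and the automatic cloze scoring amount to a few lines. The sketch below is ours and the function names are hypothetical; `score_translation` is only a rough automatic stand-in for what was in fact manual scoring by two native speakers, since it accepts listed forms but cannot judge typos.

```python
from __future__ import annotations


def to_percentage(score: int, max_score: int) -> float:
    """Convert a raw score (0-50 for word translation, 0-12 for cloze) to %."""
    if not 0 <= score <= max_score:
        raise ValueError("score out of range")
    return 100.0 * score / max_score


def score_cloze(placed: list[str | None], key: list[str]) -> int:
    """One point per gap filled with the originally deleted word;
    empty gaps (None) score zero."""
    if len(placed) != len(key):
        raise ValueError("one response per gap expected")
    return sum(p == k for p, k in zip(placed, key))


def score_translation(answers: list[str], accepted: list[set[str]]) -> int:
    """Rough stand-in for manual scoring: an answer counts if, after trimming
    and lower-casing, it matches one of the accepted forms for that item."""
    return sum(a.strip().lower() in ok for a, ok in zip(answers, accepted))
```

For example, 9 correctly placed words out of 12 gaps convert to 75 %.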

The results of the picture task can be broken down into four categories: participants who selected the correct picture; those who selected one of the semi-correct pictures; those who selected the incorrect picture; and a very small number of participants who failed to select anything before the time ran out. For simplicity, we only present the percentage of participants who selected the correct picture. The reader should keep in mind that, since this task is a choice between four options, the probability of choosing the correct picture by chance is 25 %.
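The reported measure, plus one way to check a group’s result against the 25 % chance level, can be sketched as below. The exact one-sided binomial test is our illustration; the section itself reports only percentages, and the category labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import comb

CHANCE = 0.25  # one correct picture out of four


def percent_correct(choices: list[str | None]) -> float:
    """Share of participants who picked the fully correct picture; choices
    are 'correct', 'semi', 'incorrect' or None (no selection in time)."""
    return 100.0 * Counter(choices)["correct"] / len(choices)


def p_above_chance(k: int, n: int, p: float = CHANCE) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

With the small group sizes reported later (at least 15 participants per task), such a test could help decide whether a score near 25 % differs meaningfully from guessing.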

4.7 The issue of Bulgarian Cyrillic

Bulgarian is the only language in our group written exclusively in Cyrillic. This means that native speakers of the other Slavic languages, most of whom cannot read Cyrillic, would not be able to complete the written tasks in Bulgarian. Since we did not want to make the task artificial by transliterating Bulgarian, but still wanted to obtain these data, we assigned written tasks in Bulgarian only to participants who indicated in the background questionnaire that they could read Cyrillic. The opposite problem, i.e. native speakers of Bulgarian being unable to read the Latin script, did not arise. A consequence of this choice is that the results for written Bulgarian might be somewhat biased, since some participants who can read Cyrillic might have learned another Slavic language.
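The Cyrillic restriction can be expressed as a filter on the task types a participant may receive. This is a hypothetical helper; how the application combined it with the random draws is not described, so restricting the task draw (rather than redrawing the language) is our assumption.

```python
TASKS = [f"{mode} {task}"
         for task in ("word translation", "cloze test", "picture task")
         for mode in ("written", "spoken")]


def eligible_tasks(test_language: str, reads_cyrillic: bool) -> list[str]:
    """Written Bulgarian tasks are offered only to participants who reported
    in the background questionnaire that they can read Cyrillic."""
    if test_language == "Bulgarian" and not reads_cyrillic:
        return [t for t in TASKS if t.startswith("spoken")]
    return list(TASKS)
```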

4.8 Participants

Since we were primarily interested in young adults, we limited the sample to 18- to 30-year-olds. Further criteria were that participants had completed at least high school, that one of the six Slavic languages of the study was both their native language and the language they mostly spoke at home, and that they had not learned the test language. A total of 5,965 participants took part in the study. Each of our three methods, in its written or spoken version, was used with more than 1,000 participants. The mean age was 23 years, and around two-thirds of the participants were female. Each of the 180 individual tasks was performed by at least 15 and at most 61 participants, with a mean of 33.14 participants per task. Table 1 shows the number of participants, their mean age and the percentage of males and females across all six tasks.
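The inclusion criteria can be written as a single predicate. The field names are hypothetical, and we assume the age bounds are inclusive. The last line reproduces the reported mean number of participants per task from the two totals given above.

```python
from __future__ import annotations

from dataclasses import dataclass

STUDY_LANGUAGES = {"Czech", "Slovak", "Polish", "Croatian", "Slovene", "Bulgarian"}


@dataclass
class Participant:
    age: int
    completed_high_school: bool
    native_language: str
    home_language: str
    learned_test_language: bool


def is_eligible(p: Participant) -> bool:
    """Apply the inclusion criteria listed above (age bounds assumed inclusive)."""
    return (18 <= p.age <= 30
            and p.completed_high_school
            and p.native_language in STUDY_LANGUAGES
            and p.home_language == p.native_language
            and not p.learned_test_language)


# Mean number of participants per task, from the reported totals.
print(round(5965 / 180, 2))  # 33.14
```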

Table 1 The number of participants per task, their mean age and breakdown per sex

5 Results

5.1 What is the level of intelligibility between Czech, Slovak, Polish, Croatian, Slovene and Bulgarian?

In answer to this question, we present the results obtained with all three methods. Since reading and listening are quite different processes, we will also differentiate between the written and spoken version of each test.

5.1.1 The word translation task

Written word translation task

When reporting on the results, we shall refer to language combinations by mentioning the native language first and the test language second, e.g. Slovak-Slovene would be Slovak participants reading or listening to Slovene.

Across most languages, there is a clear distinction between the West and the South Slavic language clusters; e.g. speakers of Croatian could translate more words from Slovene and Bulgarian than from Czech, Slovak or Polish. The only exception is the Polish native speakers, who were more successful at translating words from Bulgarian than from Czech and Slovak.

The West Slavic language cluster seems to be more coherent than the South Slavic one—even in the Czech-Polish and Slovak-Polish language combinations, the participants managed to translate more than 60 % of the words correctly. The highest scores were observed between Czech-Slovak (96.52 %), Slovak-Czech (94.26 %), Slovene-Croatian (80.85 %) and Croatian-Slovene (74.31 %). The overview of the results is provided in Table 2.
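Because the second research question concerns symmetry, a simple signed difference between the two directions of a pair is a convenient descriptive measure. The sketch below uses only the four written word translation figures reported in the paragraph above; it is descriptive arithmetic, with no significance testing, and the helper name is hypothetical.

```python
# Written word translation scores reported above (% correct),
# keyed as (native language, test language).
SCORES = {
    ("Czech", "Slovak"): 96.52,
    ("Slovak", "Czech"): 94.26,
    ("Slovene", "Croatian"): 80.85,
    ("Croatian", "Slovene"): 74.31,
}


def asymmetry(scores: dict, a: str, b: str) -> float:
    """Positive if speakers of `a` understand `b` better than vice versa."""
    return scores[(a, b)] - scores[(b, a)]


for a, b in [("Czech", "Slovak"), ("Slovene", "Croatian")]:
    print(f"{a}-{b} minus {b}-{a}: {asymmetry(SCORES, a, b):+.2f} points")
```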

Table 2 The results of the written translation task broken down per native and test language

A plot created using multidimensional scaling (MDS), a procedure for representing the structure of distance data in a two-dimensional space, can be found in Fig. 3. The closer the dots representing two languages are, the more mutually intelligible those languages are. The clustering of the South Slavic and West Slavic languages can be clearly observed, as can the relative intelligibility levels between the languages.
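The text does not specify which MDS variant was used or how intelligibility scores were turned into distances. The sketch below is therefore an assumption throughout: it uses classical (Torgerson) MDS and defines distance as 100 minus the mean of the two directions of each language pair, with hypothetical function names.

```python
import numpy as np


def intelligibility_to_distance(scores) -> np.ndarray:
    """Turn a native x test matrix of % scores into a symmetric distance
    matrix: 100 minus the mean of both directions (our assumption).
    The diagonal (native = test, never tested) is set to 0."""
    s = np.asarray(scores, dtype=float)
    d = 100.0 - (s + s.T) / 2.0
    np.fill_diagonal(d, 0.0)
    return d


def classical_mds(d: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS: double-center the squared distances and use the top-k
    eigenvectors, scaled by sqrt(eigenvalue), as coordinates."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    b = -0.5 * j @ (d ** 2) @ j             # Gram matrix of the points
    vals, vecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

Symmetrizing discards the asymmetries discussed under the second research question, which is one reason an MDS plot shows only relative closeness rather than direction.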

Fig. 3 The MDS representation of the intelligibility scores on the written word translation task

Spoken word translation task

In the spoken version of the translation task, all the groups performed better for the languages of their sub-family than for the languages of the other sub-family. Once again, the scores are slightly higher within the West Slavic language group. The language combinations with the highest level of intelligibility are Czech-Slovak (97.40 %), Slovak-Czech (93.15 %), Slovene-Croatian (82.26 %) and Croatian-Slovene (71.15 %). The MDS plot is shown in Fig. 4 and the complete overview of the results is given in Table 3.

Fig. 4 The MDS representation of the intelligibility scores on the spoken word translation task

Table 3 The results of the spoken translation task broken down per native and test language

5.1.2 The cloze test

Written cloze test

Native speakers of Czech, Slovak and Polish scored better for the other West Slavic languages than for Croatian or Slovene; however, an interesting pattern emerged in the South Slavic data. Croatian and Slovene participants understood both Czech and Slovak (West Slavic languages) better than Bulgarian, which is also a South Slavic language. The highest scores on the written cloze test were obtained by Slovak participants reading Czech (99.63 %), Czech participants reading Slovak (97.33 %), Slovene participants reading Croatian (94.14 %) and Croatian participants reading Slovene (63.89 %). The MDS plot of the results is shown in Fig. 5, and a matrix with all the results can be found in Table 4.

Fig. 5

The MDS representation of the intelligibility scores on the written cloze test

Table 4 The results of the written cloze test

Spoken cloze test

The results for the spoken cloze test are generally low—most language combinations fall within the 10–30 % range. Once again, there is a higher degree of intelligibility among West Slavic than South Slavic languages. Speakers of Croatian and Slovene had a higher score for Czech and Slovak than they did for Bulgarian, which is the least intelligible language of the group. The MDS plot of the results can be found in Fig. 6.

Fig. 6

The MDS representation of the intelligibility scores on the spoken cloze test

The highest scores were recorded between Slovak-Czech (95.04 %), Czech-Slovak (92.68 %), Slovene-Croatian (79.41 %) and Slovak-Polish (50.69 %). The scores can be found in Table 5.

Table 5 The results of the spoken cloze test

5.1.3 Picture task

Written picture task

When it comes to reading a text and understanding its gist, genetic relationships appear to be less visible (see Fig. 7 for the MDS plot). The Croatian-Slovene language combination seemed to be more intelligible than the Czech-Slovak one. Croatian and Slovene native speakers had their lowest score for Bulgarian, lower than for Czech, Slovak or Polish as test languages. Bulgarian participants were more likely to select the correct picture if they read a text in Czech (68.0 %) or Slovak (52.0 %) than if they read it in Slovene (51.6 %), which is also a South Slavic language. Slovak speakers understood Slovene (63.3 %) just as well as Polish (63.3 %). The highest scores were observed for Slovene-Croatian (100 %), Slovak-Czech (92.6 %), Croatian-Slovene (92.3 %) and Czech-Slovak (85.4 %). A complete overview of the results is provided in Table 6.

Fig. 7

The MDS plot based on the results of the written picture task

Table 6 The results of the written picture task, presented as the percentage of participants who chose the correct picture

Spoken picture task

Once again, Croatian and Slovene participants scored slightly higher for Czech and Slovak than for Bulgarian. Slovak participants were as likely to select the correct picture for Croatian (83.3 %) as for Polish (83.3 %), and almost as likely as for Czech (86.0 %).

The highest scores were for Slovene-Croatian (92.5 %), Polish-Czech (92.3 %), which is quite surprising, and Croatian-Slovene (88.9 %). For the Slovak-Czech combination, which we had expected to be the most intelligible one, only 86 % of participants selected the correct picture. The MDS plot can be seen in Fig. 8, and the full overview of results can be found in Table 7.

Fig. 8

The MDS plot based on the results of the spoken picture task

Table 7 The results of the spoken picture task, presented as the percentage of participants who chose the correct picture

5.2 Is the level of intelligibility between language pairs symmetric or asymmetric?

To answer this question, we focused on the results of the cloze test only, since it represents a middle ground between understanding individual words, which we measured with the word translation task, and understanding the main topics of a text, which we tested with the picture task.

We compared the intelligibility levels in both directions within each language combination (e.g. Czech speakers completing a task in Slovak versus Slovak speakers completing a task in Czech) using two-tailed t-tests. For the written cloze test, we found a significant difference in two language combinations. Slovene speakers scored better for Croatian than vice versa, \(t(66) = -5.021\), \(p < 0.01\); Polish speakers scored better for Croatian than vice versa, \(t(74) = -3.027\), \(p = 0.04\). The results for all the language combinations are shown in Fig. 9.
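The directional comparison can be sketched as an independent two-sample Student's t-test with pooled variance. This variant is our assumption: it matches the reported degrees of freedom (\(n_1 + n_2 - 2\)), but the text does not name the exact procedure. To stay within the standard library, the sketch obtains the two-tailed p-value by integrating the t density numerically with Simpson's rule; in practice `scipy.stats.ttest_ind` performs the same pooled-variance test by default.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the pdf over [0, |t|] (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(group_a, group_b):
    """Independent two-sample t-test with pooled variance (equal variances assumed).

    Returns (t, df, two-tailed p). A negative t means group_a has the lower mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se
    return t, df, two_tailed_p(t, df)
```

The sign of t depends only on which group is passed first, so a negative value such as those reported above simply reflects the order of comparison.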

Fig. 9

The scores for the written cloze test, arranged per language combination. Significant differences are marked with ‘∗’

The only significant difference in the spoken cloze test was found for Slovene and Croatian: Slovene speakers understood spoken Croatian better than Croatian speakers understood spoken Slovene (\(t = -6.561\), \(p < 0.001\)). The full chart is shown in Fig. 10.

Fig. 10

The scores for the spoken cloze test, arranged per language combination. Significant differences are marked with ‘∗’

5.3 Do all three tests give a similar pattern of results?

To answer this question, we used the average score for each combination of native language and test language. These averages were obtained in six different tasks (our three methods, each in a written and a spoken version), and the six tasks were then used as the variables to be correlated. The correlation matrix can be found in Table 8.

Table 8 The correlation matrix for the results of all six tests. All correlations are significant at 0.01 level

As can be seen from the matrix, the written cloze test, spoken cloze test, written translation task and spoken translation task (marked in grey) are all very strongly correlated with one another, so strongly, in fact, that we can assume they measure the same thing. The correlations involving the written and spoken picture tasks are high as well, but markedly lower than all the others (with the exception of the correlation between the written translation task and the written picture task). In Sect. 5.1 we described the unusual results obtained from both picture tasks. These results are not disruptive enough to make the correlations very low or non-significant, but they lead us to assume that the picture task is either somewhat less reliable than the other two methods or measures something qualitatively different.
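A matrix like Table 8 can be computed by treating each task as a variable whose observations are the average scores of the language combinations. The sketch below assumes Pearson's r (the coefficient is not named in the text) and a hypothetical input dictionary mapping task names to lists of averages in a shared order of language combinations. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(task_scores):
    """task_scores: {task name: [average score per language combination]}.

    Every list must give the language combinations in the same order;
    the result maps each (task, task) pair to its correlation.
    """
    names = list(task_scores)
    return {(a, b): pearson_r(task_scores[a], task_scores[b])
            for a in names for b in names}
```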

6 Discussion

We shall begin by discussing the three methods used in the study and whether all three yield a similar pattern of results. The results obtained from the word-translation task and the cloze test are quite similar to each other: the participants were generally more successful at translating words than at completing a cloze test in a closely related language, but the overall pattern is the same. The story is somewhat different for the written and the spoken picture task. Firstly, we observed some peculiar findings: Croatian participants were more successful at understanding Slovene than Czech participants were at understanding Slovak, which we had not predicted. Similarly, Slovene participants doing the task in Croatian were as successful as or more successful than Slovak participants reading or listening to Czech at choosing the correct picture. Secondly, we had not expected Bulgarian to be less intelligible than Polish to Croatian speakers, but the results of the written picture task seem to indicate that this is the case. Slovak participants were almost as good at Croatian in the spoken picture task as they were at Czech, and they had exactly the same score in Croatian as in Polish. Thirdly, the results from the written and spoken picture tasks did not correlate as highly as the results from the two word-translation tasks and the two cloze tests did. Given that the chance of choosing the right picture by guessing was 25 % for this task, and taking into account the unexpected findings and the somewhat lower correlations with the results of the other two tasks, we concluded that the picture task is not the best way of measuring global intelligibility, at least not in the way we conceived it. Therefore, in the remainder of the discussion, we shall focus on the results from the other two methods only.
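To put the 25 % guessing rate in perspective, one common correction for chance (our addition; the study reports uncorrected percentages) rescales an observed proportion \(p_{\text{obs}}\) against the chance level \(c\). It assumes that participants who did not understand the text guessed uniformly among the four pictures:

\[
p_{\text{corr}} = \frac{p_{\text{obs}} - c}{1 - c}, \qquad c = 0.25
\]

Applied to two reported scores, Slovak-Czech in the spoken picture task (86 %) and Bulgarian-Slovene in the written picture task (51.6 %):

\[
\frac{0.86 - 0.25}{0.75} \approx 0.813, \qquad \frac{0.516 - 0.25}{0.75} \approx 0.355
\]

Under this assumption, a raw 51.6 % corresponds to only about 35 % of participants understanding the text beyond chance, which shows that raw picture-task scores should be read relative to the 25 % floor.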

Our first research question was exploratory in nature and was concerned with the level of mutual intelligibility between three West Slavic languages (Czech, Slovak and Polish) and three South Slavic languages (Croatian, Slovene and Bulgarian). The hypothesis that the Czech-Slovak language combination would have the highest level of mutual intelligibility was confirmed by the results of both the word-translation task and the cloze test. As expected, the Croatian-Slovene language combination followed. Such a high level of mutual intelligibility between Czech and Slovak indicates that receptive multilingualism is definitely possible, and since this mode of communication is already known to be practically the norm between speakers of these languages, we conclude that the near-ceiling effect we found is valid. With regard to Croatian and Slovene, our results indicate that receptive multilingualism is generally possible; however, Croatian speakers might have a particularly hard time understanding spoken Slovene. To our knowledge, there is no previous research on Croatian-Slovene mutual intelligibility against which this finding could be checked.

Overall, we found moderate levels of mutual intelligibility between speakers of Polish on the one hand and speakers of Czech and Slovak on the other. Some degree of receptive multilingualism could be possible in these language combinations as well, and anecdotal evidence indicates that such communication does take place, particularly in border areas. Sussex and Cubberley also note that all West Slavs can communicate with one another “to some extent” (2006, p. 3).

Bulgarian proved to be the least intelligible language in our study. In both the written and the spoken cloze test, Croatian and Slovene speakers were more successful when dealing with Czech and Slovak than when dealing with Bulgarian. This result holds despite a possible bias in favor of Bulgarian: all the participants who did the written tasks in Bulgarian could read Cyrillic and might therefore have learned another Slavic language written in Cyrillic, such as Russian.

Bulgarian participants had the lowest overall intelligibility scores of all six native-language groups. For the written tasks, this finding might partly be explained by the fact that, unlike the other participants, native speakers of Bulgarian had to read words and texts in the Latin alphabet, which is not the script of their native language. But since the results persist in the tasks dealing with spoken language as well, we conclude that linguistic distance might be the reason for such levels of (un)intelligibility. Bulgarian has lost most of its nominal case system in favor of a more preposition-based syntax; our study did not include Macedonian or the southern dialects of Serbian, which are also characterized by the loss of cases and, consequently, by a more preposition-based syntax. This means that there is a great discrepancy between Bulgarian on the one hand and the other five languages we looked at on the other. The next step of the present project is to investigate the role of linguistic and extra-linguistic factors in the mutual intelligibility of Slavic languages, which should give a more definite answer to this question. In the meantime, we can conclude that all language combinations in our sample involving Bulgarian can use receptive multilingualism only to a very limited degree.

When it comes to mutual intelligibility across language sub-families, speakers of Czech and Slovak seem to understand Croatian to some extent, particularly in the written mode. With spoken language, however, their abilities are substantially reduced. Since some features of Croatian phonology, morphology and syntax are difficult for a speaker of a West Slavic language to grasp, Czech and Slovak speakers might benefit from a teaching program targeted at those differences, perhaps similar to EuroCom Slav, which is still being developed (Zybatow 2003).

Our second research question concerned potential asymmetries in intelligibility. Our hypothesis was that Slovak speakers would be better at understanding Czech, and Slovene speakers better at understanding Croatian, than the other way around. This hypothesis was only partly confirmed, since we did not find a significant difference in intelligibility levels between Czech-Slovak and Slovak-Czech. While there was indeed a difference in favor of the Slovak participants, both sets of scores were so close to the ceiling that the difference was not significant. The hypothesis concerning asymmetric intelligibility between Slovene and Croatian was confirmed, since we found statistically significant differences in both the written and the spoken cloze test. Future research is needed to show whether this difference arises from linguistic factors, extra-linguistic factors, or a complex interplay of both. Our working hypothesis favors the third option, since there are some transitional dialects between Croatian and Slovene, indicating both a degree of linguistic similarity and at least some language contact in the border area.

Since both the cloze test and the word-translation task yielded a similar and plausible pattern of results, we conclude that both methods are suitable for measuring the mutual intelligibility of closely related languages within a language family as varied as Slavic. The picture task produced some counterintuitive findings, which may indicate that the (mis)understanding of single highly relevant words (e.g. the word for bicycle in a text about riding a bike) played a larger role in the results than the actual ability to understand a related language.

7 Conclusion and future directions

The present study used three different methods of measuring mutual intelligibility: the word translation task, the cloze test and the picture task. The word translation task and the cloze test had the same pattern of results, while the picture task in its present form is probably not a good way of measuring intelligibility. We found that Czech and Slovak have by far the highest level of mutual intelligibility, followed by Croatian and Slovene. In the case of Croatian and Slovene, the intelligibility is asymmetric, since Slovene participants could understand Croatian better than vice versa. The division into West and South Slavic languages is well preserved in the results, except in the case of Bulgarian, which is not very intelligible to the speakers of other South Slavic languages. Given that we did not include Serbian or Macedonian, which share some features with Bulgarian that Croatian or Slovene do not, it might be the case that this discrepancy within the South Slavic language family is a consequence of our language choice, rather than an inherent feature of Bulgarian.

Now that the general levels of mutual intelligibility between Czech, Slovak, Polish, Croatian, Slovene and Bulgarian have been established, we call for more research into the mutual intelligibility of other Slavic languages. Adding the East Slavic languages would enable another set of comparisons, and including smaller languages such as Macedonian or minority languages such as Upper and Lower Sorbian would lead to a more complete picture.

Another interesting line of research, partly covered by the MICReLa project, would be to measure the influence of linguistic and extra-linguistic factors on intelligibility. How much do differences in lexicon, phonology, syntax etc. influence intelligibility, and how large a role does language exposure, for instance, play? Such findings would help us answer the question of whether the asymmetry in our results for Croatian and Slovene is due to differences in language structure or simply due to asymmetric contact between the speakers of the two languages.

Nevertheless, the idea of measuring global intelligibility using pictures shows some promise, and we hope that future studies will arrive at a more successful way of applying the method. We propose a finer-grained task, perhaps with a sequence of pictures describing a story, where some of the details in the pictures contradict the story and the participants’ task is to select the correct sequence of pictures. A successful application of the picture task would be of great use to the field, since it would enable the administration of the task to populations such as small children or illiterate participants or, as in our case, participants who are unable to read the script of one of the test languages.

Finally, there are other approaches to mutual intelligibility besides quantitative ones. Qualitative research into the mutual intelligibility of Slavic languages would reveal what strategies different Slavic speakers use to make themselves understood and how successful inter-Slavic communication really is in a more naturalistic setting. On a more applied note, receptive multilingualism could also be taught: if our Slovak tourist decides from the outset to spend a longer time in Croatia, she might not need to learn Croatian. She could instead focus on learning to understand it and to speak Slovak in a way that native speakers of Croatian can understand. Developing and testing such learning programs is another direction that research on this topic could take. Hopefully, all these new insights will give us a more complete picture of intelligibility in the Slavic language area.