1 Introduction

Technologies and social media have been finding their way not only into business organizations (Yu et al., 2022) but also into education for some years now. During the Covid-19 pandemic, however, as lessons moved from the physical classroom to the online environment, the use of technology and social media boomed and teachers had to adapt promptly to new demands. Innovations in information technologies are emerging rapidly and are being introduced via numerous social media and networking websites (Abbas et al., 2019).

Facing the challenges of emergency online education inspired teachers to be innovative and to use methods they had not previously considered (Naidu, 2021). Teachers have always had to rely on some media to teach listening skills; however, emergency remote teaching forced them to rethink how they used technologies in order to cope with the limitations of video-conferencing platforms, namely their dependence on the quality of the internet connection and the restricted range of resources that can be shared through them.

Listening is a complex process: processing what is heard can impose a heavy cognitive load on listeners, who must instantaneously and simultaneously retain and disregard information, discern grammatical features and pronunciation, and cope with the rate of speech, all of which may cause them to lose concentration quickly (Walker, 2014). The means used in teaching listening comprehension are therefore very important. YouTube, the most popular video-publishing and video-sharing platform, is one such easily accessible and shareable resource, offering videos from various areas of interest, such as entertainment, education, politics, medicine, marketing, or even personal videos captured on mobile phones (Al-Jarf, 2022). Users do not need to register to view its content, and it can be accessed from various kinds of devices, which is why it has been increasingly used as a resource for classroom instruction and is often chosen as a tool in online teaching. In addition, YouTube videos often offer automatic captions (a written transcription of the spoken word, originally intended for hard-of-hearing audiences) to support comprehension. This feature is often viewed with hesitation when videos are used to teach listening comprehension, mainly because it clearly involves reading skills. However, several studies show that captions are beneficial when used in a controlled way (Winke et al., 2013; Danan, 2016; Rodgers & Webb, 2017; Gass et al., 2019).

The benefits of using (YouTube) videos in foreign language teaching have been explored in numerous studies (Çakir, 2006; Watkin & Wilkins, 2011; Silviyanti, 2014; Yagci, 2014; Howlett & Waemusa, 2018; Alhaj & Albahiri, 2020; Yang & Yeh, 2021), yet few of them focus on the effect of captions. The present study therefore attempts to fill this gap in the literature and to support the use of captioned videos in teaching listening comprehension, even though captions are traditionally considered to pertain to reading.

The paper is structured as follows. We present one possible method of teaching listening comprehension remotely and the results obtained with it, starting with a brief review of studies pertaining to our research (Sect. 2), followed by the research objective and hypotheses (Sect. 3), the methodology (Sect. 4) and the main results (Sect. 5), whose implications are discussed in Sect. 6 before the conclusion (Sect. 7).

2 Literature Review

When discussing online teaching, researchers have started to differentiate between emergency remote teaching, a specific term for the type of instruction delivered in the pressing circumstances of COVID-19, and high-quality online education, which requires careful long-term planning, development, and design (Hodges et al., 2020). The new situation required a sudden and rapid shift away from more traditional forms of teaching; the world of education had to adapt quickly, and it did (Naidu, 2021), assisted by innovations introduced in response to the gaps resulting from COVID-19 (Liu et al., 2022). Due to the pandemic, teachers’ use of technologies has increased, and so has their confidence in using them (Winter et al., 2021). However, online teaching is not only about the use of platforms and apps; it is “a way of thinking about delivery modes, methods and media, specifically as they map to rapidly changing needs and limitations in resources” (Hodges et al., 2020). With the help of electronic hardware, digital software and communication tools, the teacher participates in creating a meaningful educational product that is interesting for students (Kudinov et al., 2020). For a successful learning experience, all aspects of the educational process must be considered, much as the overall success of firms and organisations is reflected in their organisational, social and environmental activities (Li, X. et al., 2022).

This also applies to teaching listening comprehension remotely, where the limitations of technology (the quality of the internet connection) and of shareable resources are a major concern. Listening is an important skill for learners to master, as their first experience with a foreign language is often through what they hear. It is a complex and active process in which the listener must discriminate between sounds, understand vocabulary and grammatical structures, interpret stress and intonation, retain the data collected in this process and interpret it within the immediate as well as the larger sociocultural context of the utterance (Vandergrift, 2004, p. 168). Listening skills also influence other language competences, such as vocabulary and grammar (Ratnaningsih & Gumiandari, 2022), which makes them key to learning a language. When teaching listening skills, one of the teacher’s challenges is to provide optimal input that stimulates students without overwhelming them. Krashen (1982) emphasised comprehensibility as the fundamental and necessary feature of optimal input, and while he mostly had the comprehensibility of the content in mind, reliance on technologies in language teaching introduces yet another aspect of input comprehensibility to consider. A poor internet connection can cause audio to lag or cut out, which hinders comprehension, and for students who already have problems with their listening skills, such discomfort adds to their negative feelings about the task (Jones, 2021). A convenient way to avoid such issues is to combine synchronous and asynchronous forms of teaching: instead of streaming video or audio live via the video-conferencing platform, the resources may be shared through a Learning Management System (LMS), such as Moodle (Moodle), Blackboard (Anthology), Google Classroom (Google), or Canvas (Instructure).

When looking for easily shareable, accessible and, at the same time, cost-effective resources for listening, YouTube videos are a popular choice. YouTube is a global gateway that can be accessed anywhere at any time, free of charge and with no limit on the amount watched online (Yagci, 2014). YouTube videos also meet other criteria of optimal input: they are interesting, relevant and not grammatically sequenced (Krashen, 1982). The videos are popular among students, who find them interesting, entertaining and beneficial (Silviyanti, 2014; Howlett & Waemusa, 2018). They help learners (EFL students or translators/interpreters) develop language, intercultural and intracultural knowledge (Yang & Yeh, 2021). Through real-life videos on the YouTube platform, students can become familiar with the culture and language of a foreign country and may even be motivated to discover its history and cultural specifics and to experience its cultural attractions for themselves (Li, Y. et al., 2022). According to Çakir (2006, p. 68), video “brings the real world into the classroom, contextualizes language naturally and enables learners to experience authentic language in a controlled environment”, which makes it a powerful tool in teaching listening comprehension. Students who are generally exposed to “clean” English in learning materials can be shocked when they encounter English in the real world, and YouTube videos provide a great service by introducing them to a variety of dialects and accents (Watkin & Wilkins, 2011). In addition, just as in real life, video enables students to capture not only the spoken word but also paralinguistic features, such as body language, gestures and facial expressions. Perceiving visual cues also helps them connect words to images and makes the language easier to analyse (Harmer, 2001). Combined with the audio input, these visual and non-verbal features of the discourse facilitate understanding of the message (Alhaj & Albahiri, 2020).

Many YouTube videos include automatic captions generated by YouTube to aid comprehension of the content. When videos are shared asynchronously for students to watch independently, it is necessary to establish clear expectations on the use of captions (e.g., no captions on the first viewing, to practise listening) while trusting students to take responsibility for their learning choices (Jones, 2021).

Captioned videos facilitate the recognition of new vocabulary and improve comprehension of the content in general (Winke et al., 2013). In a study by Danan (2016), students found captions helpful in discerning words and phrases that were unknown or incomprehensible due to strong accents or poor articulation by speakers. Where students struggle with segmenting speech, captions provide a visual means of identifying linguistic units (words, morphemes) and grammatical structures (Gass et al., 2019). Jones (2021) suggests that captions serve less as a learning aid than as a means of verifying what was heard; they improve comprehension of audio input by closing the gap between students’ reading competence (usually better developed) and their listening competence (Garza, 2008). Providing learners with captions can thus be an effective method of increasing their comprehension of video (Rodgers & Webb, 2017).

All of the above reinforces the claim by Winke et al. (2013) that language input presented in various modes (video, audio and text) yields better results in language processing, because learners use the modes differently, so that they compensate for and reinforce one another. This claim is grounded in multimedia learning theory, which holds that “people learn more deeply from words and pictures than from words alone” (Mayer, 2014). The features offered by technologies are thus combined to provide comprehensible input, which is what drives progress in language acquisition (Krashen, 1982).

A recent study shows that videos help students get a better idea of how the language works and discern ambiguities in the voices and accents of native speakers, providing input that is usually not available in the traditional classroom (Masrai, 2022). Another study concludes that digital media can help train students’ listening skills (Tan et al., 2020), and Hamad et al. (2019) explore the effect of using YouTube and Audio Tracks Imitation (YATI) on students’ listening performance. Rizkan et al. (2019) found that YouTube videos were more effective than audio recordings, while Alabsi (2020) pointed out the effect of adding text to videos on students’ foreign-language listening comprehension performance.

The literature reviewed above includes a number of studies that examined the benefits of YouTube videos in developing listening comprehension, yet few of them have done so in the context of captions and multimodal input. This study therefore attempts to fill this gap in research, literature and educational practice and to support the use of captions in YouTube videos to facilitate students’ language acquisition and the development of their listening skills.

3 Research hypotheses

For student interpreters, listening and speaking are essential skills, as their ability to understand and produce spoken discourse in various subject fields will influence their ability to transfer meaning from the source to the target language (Al-Jarf, 2022). The objective of the research was to identify the impact of emergency remote teaching of listening comprehension, using the chosen methods of delivery, on the skills of translation and interpreting students, i.e., whether students made any progress in their listening skills when taught remotely and whether the form of delivery influenced their listening comprehension, taking into account their entry proficiency level (B1 or B2).

For this purpose, we have defined the following null hypotheses:

H0: The results of listening comprehension tests do not depend on the time of testing, i.e., test results do not depend on whether the tests were carried out at the beginning, in the middle or at the end of the semester.

H0: The results of the listening comprehension tests do not depend on the time of testing or on students’ proficiency levels, i.e., test results do not depend on whether the tests were carried out at the beginning, in the middle or at the end of the semester, or on whether the students’ proficiency level was B1 or B2.

In the second part of the research, we examined the impact of practising listening comprehension with captioned YouTube videos during remote teaching on students’ progress in listening comprehension. Since students could use captions at their own discretion, we asked how their use of captions during listening comprehension training related to their actual performance.

We have defined the following null hypothesis:

H0: Final listening comprehension test scores do not depend on the students’ use of captions during the course.

In the final part, we were interested in the correlation between students’ perception of improvement in their listening comprehension skills and their actual performance.

The null hypothesis was as follows:

H0: The final listening comprehension test scores do not relate to the students’ perception of having improved their listening comprehension skills during the remote teaching of the course.

4 Materials and methods

The research method consisted of a standard questionnaire-based survey complemented by listening comprehension tests, carried out on a sample of 50 respondents, all of them students of translation studies. The study was conducted within the Listening Comprehension course, with two 45-minute lessons a week over 12 weeks, during which students watched 16 YouTube videos covering 8 different topics (the internet, smart technology, show business, medicine, history, fashion, industry, and the interpreting profession). The students were divided into two groups based on their proficiency levels, and two different sets of YouTube videos were carefully selected to match each group’s listening comprehension level.

4.1 Participants’ Profile

The participants were first-year university students of translation studies, aged between 19 and 21, studying English as a foreign language in combination with another foreign language with the aim of becoming translators/interpreters. While most of the students were from Slovakia, there were also 9 foreign students, and the group included 34 females and 6 males. None of the students had English as their mother tongue or second language. At the beginning of the course, the participants were tested to determine their English proficiency levels. For this purpose, established British Council tests were used (downloaded from its free resource webpage https://learnenglishteens.britishcouncil.org), consisting of a listening comprehension test and a grammar test (only audio recordings were used in the listening comprehension test). The test results revealed that the students entered the translation studies programme with proficiency levels ranging from A2 to C1, so they could not develop their listening comprehension skills using the same materials. Based on the results, the students were divided into two groups, B1 (those testing between A2 and B1) and B2 (those testing between B2 and C1), and received listening assignments accordingly. Seven students who tested between levels B1 and B2 were allowed to choose whether to work with less or more complex and challenging videos, based on their knowledge of themselves and their capacities. In the end, 18 students worked in the B1 group and 32 students in the B2 group.

4.2 Experimental Procedure

Like many other educational institutions, the university switched to remote teaching during the Covid-19 pandemic, and the listening comprehension course had to be adjusted to provide a good-quality learning experience using the video-conferencing system available at the university. Experience with the quality of internet connections during the first year of pandemic-induced remote teaching showed that different methods of delivering listening materials were needed to avoid video and audio lagging and cutting out, which make listening comprehension very difficult. Asynchronous and synchronous forms of online teaching were therefore combined: students received their assignments (links to the videos to watch) and exercises with questions facilitating comprehension through an LMS platform (Moodle) and completed the listening tasks ahead of time; during the lesson itself, the teacher and students met online via a video-conferencing platform (Jitsi Meet) to discuss the assignments, check the answers to questions, review new vocabulary and discuss the topics covered in the assignments.

This approach gave the students greater flexibility and the important opportunity to watch the videos without unnecessary interruptions. At the same time, students were entrusted with more responsibility for their learning choices and they could manage their own learning experience at their own discretion.

Students’ listening and language skills were tested collectively in the classroom at the beginning of the course, as the first lesson did not have to be taught remotely. In the middle and at the end of the course, students took listening comprehension tests again, this time remotely. Audio recordings were used for these tests, with the mp3 files and test sheets shared via the LMS; students were instructed to listen to each recording twice and were given limited time to complete the tests and upload them to the LMS for evaluation.

4.3 Instrument

At the end of the course, the students completed a questionnaire consisting of 17 closed and open-ended questions concerning the evaluation of the course itself and of the impact of the online form of delivery, preferences regarding the use of video or audio, the benefits of YouTube videos, the use of captions, and the students’ perception of their own improvement in listening comprehension.

To measure the students’ progress in listening comprehension, established listening comprehension tests developed by the British Council were used at the beginning, in the middle and at the end of the course.

5 Results

In this section, we first analyse five questions from the questionnaire concerning factors that influenced the development of listening skills among the students of translation studies. The data exploration indicates that most of the students (40) perceived the influence of remote teaching as rather positive or at least not negative (Table 1). Students’ responses show that they mostly preferred videos, or videos combined with audio recordings, for practising their listening skills (Table 2), naming as the main benefits the combination of visual and aural input (41), attractive and entertaining content (20), easier concentration (7), authentic language including various accents and a natural pace of speech (5), the variety of topics (4), and the automatic captions feature (9). Only 4 students did not use captions at all (Table 3), while the others used them with varying frequency (Table 4).

Table 1 Data exploration – Impact of remote teaching
Table 2 Data exploration – Preferences of video, audio or both
Table 3 Data exploration – Use of captions
Table 4 Data exploration – Captions use frequency

Regardless of frequency, students reported turning on captions when they failed to understand speech because of unknown words and phrases (24), strong accents (12), fast speech (10) or incomprehensible connected speech (11), as well as to verify what they had heard (8) or to clarify names (4) and numbers (4).

Table 5 Data exploration – Improvement of listening (students’ perception)

Forty-six students perceived an improvement in their listening skills (Table 5). They reported learning new words and phrases (24), becoming more familiar with authentic language, a variety of accents and fast speech (14), learning spelling and pronunciation (17), and improving in extracting essential information from the discourse (7) and in discerning words in connected speech (5).

Following the above, we sought to determine whether the students could improve their listening competence even when taught remotely. For this purpose, we tested the null hypothesis that the time of testing (at the beginning, mid-term, at the end) had no impact on the test scores.

To test the differences in test scores (repeated measures), we used univariate tests for repeated measures. Since the sphericity assumption of repeated-measures analysis of variance was violated (Mauchly’s test of sphericity was significant, p < 0.001), we adjusted the degrees of freedom using the Greenhouse-Geisser (G-G) and Huynh-Feldt (H-F) corrections (Table 6).

Table 6 Mauchly Sphericity Test - result
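For readers unfamiliar with the correction, the Greenhouse-Geisser procedure multiplies both degrees of freedom of the repeated-measures F test by an estimate ε of how far the covariance structure departs from sphericity (ε = 1 when sphericity holds, with a lower bound of 1/(k − 1) for k test occasions). The following is only an illustrative sketch under stated assumptions: it treats all students as a single group measured on k tests (it ignores the B1/B2 factor of our design), uses hypothetical scores rather than our data, omits the Huynh-Feldt correction, and does not reproduce the statistical software used in the study. The function names are our own.

```python
"""Illustrative sketch: Greenhouse-Geisser epsilon for a one-way repeated-measures design.

Assumptions (not taken from the study): a single group, a complete n x k
matrix of scores (rows = students, columns = test occasions), and
hypothetical function names. The Huynh-Feldt correction is not shown.
"""


def covariance_matrix(scores):
    """Sample covariance (n - 1 denominator) between the k test occasions."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(scores):
    """Box/Greenhouse-Geisser epsilon from the double-centred covariance matrix."""
    s = covariance_matrix(scores)
    k = len(s)
    row_means = [sum(r) / k for r in s]
    grand_mean = sum(row_means) / k
    # Double-centre: subtract row and column means, add back the grand mean.
    # s is symmetric, so its column means equal its row means.
    d = [[s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
         for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    trace_of_square = sum(d[i][j] * d[j][i] for i in range(k) for j in range(k))
    if trace_of_square == 0:
        raise ValueError("epsilon is undefined: no variation in the score differences")
    return trace ** 2 / ((k - 1) * trace_of_square)


def adjusted_df(epsilon, k, n):
    """Greenhouse-Geisser-adjusted (numerator, denominator) degrees of freedom."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


if __name__ == "__main__":
    # Hypothetical entry-level, mid-term and final scores for four students.
    hypothetical = [[61, 71, 81], [61, 69, 79], [59, 71, 79], [59, 69, 81]]
    eps = greenhouse_geisser_epsilon(hypothetical)
    print(f"epsilon = {eps:.3f}", adjusted_df(eps, k=3, n=len(hypothetical)))
```

With these hypothetical scores the covariance matrix is spherical, so ε equals 1 and the degrees of freedom are unchanged. Data that violate sphericity, as ours did, yield ε < 1 and correspondingly smaller degrees of freedom, which makes the F test more conservative.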

Recalling our null hypotheses that test scores do not depend on the time of testing or students’ proficiency levels, we proceeded to test these claims using adjusted univariate tests for repeated measures, as shown in Table 7.

Table 7 Adjusted Univariate Tests for Repeated Measures

The null hypotheses that the time of testing (entry-level, mid-term and final tests) and the combination of time and proficiency level (B1 and B2 groups) do not influence students’ performance were rejected at the 0.1% significance level (p < 0.001).

Rejecting these null hypotheses indicates that both the time of testing and the combination of time and proficiency level significantly influence students’ performance, which directly addresses our initial research questions.

Since these findings showed differences in test scores not only with respect to proficiency level but also with respect to time, we examined in closer detail which test scores differed significantly. Using multiple comparisons, we determined homogeneous groups with respect to the test scores (Table 8).

Table 8 Multiple Comparisons: Homogeneous Groups, p > 0.05

Statistically significant differences were found between the entry-level tests and the other tests (Table 8). In particular, there was a statistically significant difference between the entry-level test and the other tests of the B1 group. Similar results were obtained for the B2 entry-level test scores, except in comparison with the B1 mid-term test, where no statistically significant difference was identified, i.e., the B2 entry-level test and the B1 mid-term test form a homogeneous group with respect to the achieved scores. No statistically significant differences were found between the mid-term and final tests, regardless of proficiency level; these form a homogeneous group, even though the results, also considering proficiency levels, are slightly in favour of the final tests.

The mean plot visualises these results, in particular the overall growth of mean scores over time and with respect to proficiency level (Fig. 1).

Fig. 1 Mean plot – Listening test

The results show that students made progress in listening comprehension even under the circumstances of emergency remote teaching. This suggests that the development of listening comprehension does not depend on the form of education: students can improve their listening skills even when the subject is taught remotely, and the development of listening competence is thus not limited to the classroom or to synchronous education.

In the second part of the research, we examined the impact of practising listening comprehension with captioned YouTube videos during remote teaching on students’ progress in listening comprehension. As the students could use captions at their own discretion, we were interested in how their use of captions during listening comprehension training related to their actual performance, measured by their final listening comprehension test scores. We also compared students’ actual progress with their own perception of their improvement in listening comprehension. To examine the relationship between the development of listening skills and either the use of captions or students’ perception of their improvement, we used a non-parametric (Kendall’s tau) correlation, as the variables were ordinal. “Captions use frequency” was measured on a scale of 1–4, where 1 was never and 4 always, and “students’ perception of improvement in listening” on a scale of 1–5, where 1 was definitely no and 5 definitely yes. In both cases (Table 9), the null hypotheses of independence were rejected at the 0.05 significance level: there is a weak (Kendall’s tau < 0.3) but statistically significant (p < 0.03) positive correlation between the final test scores and the frequency of captions use, as well as between the final test scores and students’ perception of their improvement in listening comprehension.

Table 9 Kendall Tau Correlations
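To illustrate the rank correlation used here, the following sketch computes Kendall’s tau from paired ordinal observations by counting concordant and discordant pairs. The source does not state which variant of Kendall’s tau its statistical software reported. We assume the tie-corrected tau-b, which suits ordinal scales with many tied values, such as the 1–4 captions-use scale. The function name and the data are hypothetical, and the significance test (p-value) is not shown.

```python
"""Illustrative sketch: Kendall's tau-b for two ordinal variables with ties.

Assumption: the tie-corrected tau-b variant; the study does not name the
variant its software used. Names and data below are hypothetical.
"""
from math import sqrt


def kendall_tau_b(x, y):
    """Tie-corrected rank correlation between paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # pair tied on x (including pairs tied on both)
            if dy == 0:
                ties_y += 1  # pair tied on y (including pairs tied on both)
            if dx * dy > 0:
                concordant += 1  # both variables move in the same direction
            elif dx * dy < 0:
                discordant += 1  # the variables move in opposite directions
    n_pairs = n * (n - 1) // 2
    denominator = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Hypothetical data: captions-use frequency (1 = never ... 4 = always)
    # and final listening test scores (%) for six students.
    captions_use = [1, 2, 2, 3, 3, 4]
    final_scores = [70, 65, 80, 75, 90, 85]
    print(round(kendall_tau_b(captions_use, final_scores), 3))
```

The pairwise loop is quadratic in the number of students, which is negligible for a class-sized sample such as ours (n = 50).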

The scatter plot (Fig. 2) visualises the correlations of the final test scores with captions use frequency and with students’ self-evaluation of their listening comprehension. In both cases, the values change together in the same direction. The size of each node represents the number of occurrences of the ordered pair.

Fig. 2 Scatter plot – Listening

In summary, our analysis provided evidence against the null hypotheses, showing significant effects of the time of testing and of its combination with proficiency level on listening comprehension scores, as well as weak positive correlations of the final test scores with the frequency of captions use and with students’ perceived improvement in listening comprehension. These findings contribute to our understanding of how remote teaching affects the development of listening skills.

6 Discussion

We agree with Al-Jarf (2022) that listening, like speaking, is an essential skill that student interpreters must acquire and develop to an advanced level: the ability to understand and produce spoken discourse on various subjects is a prerequisite for learning to transfer the meaning of specialised texts orally from the source language to the target language, both in the interpreting courses that form part of the translation studies programme and later in their profession. Although listening is key to learning a language, it is often underestimated and taught with traditional methods that fall short, whereas more appropriate intermediary tools are needed to practise and develop students’ listening skills (Ratnaningsih & Gumiandari, 2022).

In response to our first research question, whether students made any progress in their listening skills when taught remotely with the form of task delivery described above, we found that, in spite of Covid-19-induced remote teaching, the majority of students felt no negative influence on their ability to learn and improve their listening skills (Table 1). Our results show that both the B1 and B2 groups improved significantly already in the first half of the semester, i.e., by the mid-term tests (Fig. 1), which is in line with the observation that students have an inherent impression of technological progress (Bisena et al., 2021). This may be attributed to several factors, starting with regular exposure to listening tasks and practice, as practice makes perfect.

Another factor that may have played a significant role in the students’ learning and progress, and one frequently mentioned in their comments on the course, was the opportunity to work at their own pace afforded by the chosen methods of task delivery. Given the large differences in proficiency levels at the start of the course, students found learning from home beneficial, highlighting mainly the possibility to work independently, having more time for the assignments, replaying the videos as many times as needed and pausing them whenever needed, using captions when needed, and looking up unknown words on the internet immediately while doing the assignments. This would not have been possible had the course been conducted in a physical classroom, as large language classes commonly face problems such as student discipline and engagement in the learning process, as well as difficulty in assessing students’ understanding and providing individual feedback (Bamba, 2012). We agree with Premana et al. (2021) that learning media serve to overcome students’ obstacles, limitations and passive attitudes in the classroom. There is a valuable lesson here, even for regular classroom education: listening tasks need not always be a collective activity, because students may benefit greatly from developing their listening skills individually, according to their own needs and at their own pace, through homework assignments.

The use of videos for practising listening comprehension may also have contributed to the students’ progress, as their responses imply that multimodal input (audio + video + text/captions) not only facilitated their understanding and extraction of important information from the discourse but also supported the retention of the learned content in memory, which corresponds with the findings of previous studies (Watkin & Wilkins, 2011; Alhaj & Albahiri, 2020). Students found the videos more attractive and fun, which motivated them and kept them engaged in learning, as observed previously by Howlett and Waemusa (2018). Thanks to the “real-life” content of the YouTube videos, students grew more comfortable with authentic language, fast speech and a variety of accents and learned new words, phrases and idioms from various areas of life, which is particularly crucial for student interpreters. This corresponds with Fadhillah et al. (2021), who showed that students are more interested in learning English using YouTube as a learning medium for practising listening skills because they learn directly from native speakers.

Based on the evaluation of the students’ performance, we also found that students made less progress between the mid-term and final tests than between the entry-level and mid-term tests, i.e., most of the improvement happened in the first half of the course. This is especially true for students at the lower proficiency level (B1), who were able to close the gap between their performance and that of the B2 group, which had been significant at the beginning of the course. This brings us back to our earlier claim that students benefited from the possibility to work individually and manage their learning experience at their own discretion. The smaller progress in the second half of the course is, on the one hand, influenced by the ceiling of the test score, as students cannot obtain more than 100%; on the other hand, it reveals an opportunity to reassign students to higher proficiency levels even in the middle of the course in order to provide them with more challenge and opportunity to develop their skills further.

Regarding our second research question, concerning the impact of captions on the students’ progress in listening comprehension, the students’ responses show, interestingly enough, that some saw the captions feature available in YouTube videos as an advantage (9) and others as a disadvantage (10). Yet most of the students used captions (Table 3) at their own discretion, with the majority (36) stating that they usually watched videos without captions and turned them on only when they encountered comprehension difficulties. The students named situations similar to those in real life, such as unknown words or phrases, strong accents, fast speech, failure to understand connected speech, or the need to verify what was heard, i.e. situations in which their listening skills fell short in the listening process as described by Vandergrift (2004). In real communication, a listener who fails to understand can ask the speaker to repeat a phrase, articulate more clearly or explain an unknown term; when watching videos, captions can render a similar service. Our results indicate that students benefited from having this option: the more frequently they used captions, the better they performed in the final tests (Fig. 2). According to the students, captions helped them discern words in unclear or connected speech, which was appreciated mostly by the less proficient students (B1). Thanks to captions, they also acquired new vocabulary, learned the spelling and pronunciation of words, and learned to recognize new accents, which corresponds with the findings of previous studies (Winke et al., 2013; Danan, 2016; Rodgers & Webb, 2017; Gass et al., 2019).
We can thus assume that the potential of captions in teaching listening comprehension should not be dismissed on the grounds that they pertain to reading rather than listening. Captions operate as an additional mode of input, another piece of the puzzle that, together with audio and video, creates the “full picture” for comprehension (Winke et al., 2013). They also serve as a means of clarifying misunderstood parts of a message (Jones, 2021), acting as a stepping stone that takes students to a higher level little by little.

In response to our final research question, we found a positive correlation between the students’ final test scores and their perception of their own improvement in listening comprehension. These results provide positive feedback on the effectiveness of the course, confirmed both by the students’ performance and by their confidence. At the same time, based on their performance as well as their responses, we may assume that they embraced responsibility for their own development and were able to manage their learning experience in a way that led to improvement.

7 Conclusion

Language learning through viewing captioned video is supported by the Multimedia Principle as applied to second language learning (Fletcher & Tobias, 2005). Content is learned and comprehended better when words and pictures are presented together because, when aural and visual inputs are provided together, learners comprehend the content through different channels and create associations between them (Rodgers & Webb, 2017).

The study presents one possible approach to teaching listening comprehension (not only remotely), in which students receive video assignments ahead of time and complete them independently before the lessons, which gives them a certain level of control over their development.

In answer to the first research question, whether the students made any progress in listening comprehension when taught remotely with the forms of delivery used, taking into account their proficiency (B1 and B2), our results show that students made significant progress already in the first half of the semester and benefited greatly from being able to attend to their individual needs while learning from home. They could watch, replay and pause the videos as needed, and consult additional resources, such as dictionaries, at their own discretion. In addition, students found watching videos motivating, and they became more comfortable with authentic language.

Comprehension was easier thanks to the combination of inputs: aural, visual and textual (captions). Our null hypothesis of independence between the test scores and the use of captions was rejected, as we found a positive correlation between the frequency of caption use and the students’ performance in the final listening comprehension tests. Much as a listener in a real-life interaction can ask for clarification when comprehension breaks down, captions in videos can clarify misunderstood content, as they act as an additional mode of input that also allows students to draw on their reading skills (Garza, 2008). By means of captions, language learners acquire new vocabulary, learn to understand accents, discern words even in fast authentic speech, and become better listeners step by step.

Finally, in response to our third research question, whether there is any correlation between the students’ perception of their improvement in listening skills and their actual performance, we found that the correlation is positive: the students recognized and confirmed the progress they made. Their significant improvement, even in circumstances where they had to manage their learning more independently, suggests that students, and university students in particular, can be trusted to make good learning choices (Jones, 2021) even on the other side of the screen.

The lessons we learned are applicable not only to emergency remote teaching but also to regular classroom education, and they confirm that emergency remote teaching was not only a challenge for educational institutions but also an opportunity to learn and to become more creative in our teaching methods.

The limitations of the study include the small sample of 50 students and the short time frame (12 weeks of one semester), which do not allow for broader generalisation of the findings. Following the students’ progress in listening comprehension over a longer period of study, and examining how they can advance to higher proficiency levels in listening through multimodal input, are possible directions for further research.