Speaking, as used in the ESL (English as a second language), or EFL (English as a foreign language) contexts, appears to be the most important of the four language skills (listening, speaking, reading, and writing) (Bailey & Savage, 1994; El-Sakka, 2016; Shabani, 2013). Speaking might be considered the fundamental skill in second language (L2) learning (Lazaraton, 2014). Lazaraton (2014) adds that speaking is used as a vehicle to acquire other skills, such as listening, reading, and writing when the teachers adopt productive approaches to teaching, for example, communicative language teaching, silent way, or even audiolingual method. With regard to the role of speaking in facilitating the acquisition of other language skills, Goh and Burns (2012) also claim that speaking enhances other skills. They illustrated that speaking is often used to examine if we listened to something effectively, or comprehension of reading is verified through speaking. Writing also involves speaking since the student requires feedback from instructors and peers. Speaking apparently appeared as the most important language skill in a survey administered by Shaw (1983). J. C. Alderson and L. F. Bachman in Luoma (2004) explicitly declared that we demonstrate our personality, self-image, our knowledge of the world, and our views of the surroundings and beyond through our oral communication with others in a foreign language. Moreover, success in a career, to a large extent, is determined by how proficiently and effectively we use spoken English to communicate our thoughts and ideas with people in the management of the organization (Pandey & Pandey, 2014).

Mark Robson, Director of English and Exams, British Council, labeled English as the “operating system” of global conversation (Robson, 2013, p. 2). He further commented that people involved in economic affairs, the thought leaders, and decision-makers in business are learning and speaking English more than ever before. Of all languages of the world, English now is most widely used in international trade, popular media and entertainment, global telecommunications, publication of newspapers and books, and, most importantly, in the internationalization of education (Rahman & Singh, 2021; Rao, 2019). Therefore, effective communication in spoken English is most desirable in this globalized world due to its status as a lingua franca (Jenkins, 2007). Those who are proficient in spoken English enjoy numerous opportunities in social, academic, and professional life (Brown, 2001; Cook, 2003; Crystal, 2003). Al-Sobhi and Preece (2018) also claim that the role of spoken English in academia, work, and economy has been significantly recognized. Moreover, spoken English empowers people with the ability to express their feelings and thoughts that influence other people, and various social dynamics (Wierzbicka, 2006). Devi and Rao (2018) argue that people’s advanced proficiency in spoken English enables them to persuade others in synthesizing cultures, a prerequisite for building a collective global society. Finally, mastery in spoken English enables the designated people of a country to negotiate with the competing stakeholders on matters related to bilateral interests (Phillipson, 2012; TEPAV, 2015).

However, mastering proficiency in spoken English is not easy, and students as well as common people spend a long time to acquire the desired proficiency (Luoma, 2004). Speaking is a notoriously complex and perplexing skill (Lazaraton, 2014). According to Brown (2007), segmentation of speech as thought groups, hesitation devices, use of colloquial language, and suprasegmental features, for example, stress, rhythm, and intonation cause complexity in speaking. The dynamic and instantaneous nature of speaking also causes the complexity since speakers need to synchronize a number of factors, such as generating ideas, using knowledge of language, and adopting strategies simultaneously (Johnson, 1996). Pawlak and Waniek-Klimczak (2015) maintain that speaking is a multifaceted skill, and efficient command in speaking requires the speakers to coordinate linguistic resources, pragmatic consciousness, contexts of spoken discourse, and norms of conversation. During speaking, students are expected to orchestrate a) knowledge of language and discourse (grammatical knowledge, phonological knowledge, lexical knowledge, and discourse knowledge), b) core speaking skills (pronunciation, speech function, interaction management, and discourse organization), and c) communication strategies (cognitive strategies, metacognitive strategies, and interaction strategies) (Goh & Burns, 2012). Speakers’ performance conditions, such as time pressure, planning what to talk, self-monitoring the quality of performance, and the variety and volume of supports available, also pose difficulty for the speakers (Nation & Newton, 2009). Besides, complexity in speaking is also engendered by the affective dimensions of the speakers (Mihaljević Djigunović et al., 2008; Imura, 2007; Wei, 2007). Mihaljević Djigunović et al. (2008) argue that users’ success in speaking is more strongly connected to affective factors than success in writing. 
Speaking becomes an impediment for both teachers and students (Bouzar, 2019) since teachers struggle to identify an effective method of instruction while students struggle to adopt an appropriate process to master the skill. This difficulty leaves teachers with a dilemma about how to guide students in learning to talk (Brown & Yule, 1983).

Conceptualization of the barriers to the improvement in spoken English

Adoption and implementation of instructional methods by the teachers have appeared to be a common problem that hindered the development of students’ proficiency in spoken English (Ali & Walker, 2014; Chowdhury & Kabir, 2014; Farooqui, 2007; Ibna Seraj & Habil, 2021; Islam, 2015; Rahman et al., 2015; Rahman & Pandian, 2018; Seraj et al., 2021). Anderson (1993), Chowdhury (2001), Li (1998), and Liu (1998) found that teachers predominantly apply a grammar translation method to teach speaking. Traditional methods of teaching, such as grammar translation and audiolingual methods are still widespread (Carless, 2009; Farooqui, 2007; Littlewood, 2007; Mangubhai et al., 2007; Paul & Liu, 2018; Richards, 2008). These methods exclusively focus on teaching grammar in isolation which hardly creates opportunities for the students to use English for their real-life communication. Globally, in many universities, the spoken English classes are taught mostly through reading aloud or reciting activities (El-Sakka, 2016). Another limitation of the traditional methods is their approach to instruction. They are predominantly teacher centered, and in the teacher-centered approach to speaking instruction, teachers mostly talk and they appear to be the main source of knowledge to the students where the students are the passive recipients of knowledge (Murray & Christison, 2010; Ning, 2011). These methods minimize students’ opportunities to grow as competent and confident speakers in English (Gomleksiz, 2007).

Although teachers apply the CLT approach to teach spoken English in the classroom, they barely promote learner autonomy. Autonomy refers to “the capacity to take control of one’s own learning” (Benson, 2011, p. 58). Supporting learner autonomy is one of the fundamental principles of the CLT approach (Jacobs & Farrell, 2003). Learners’ development of spoken English suffers because they have little opportunity to exercise autonomy (Begum & Chowdhury, 2016; Jamila, 2013; Mehrin, 2017). Consequently, learners’ agency (Duff, 2012) is hardly promoted by the teachers. Agency refers to “people’s ability to make choices, take control, self-regulate, and thereby pursue their goals as individuals, leading potentially to personal or social transformation” (Duff, 2012, p. 417). As a result, the students gradually become dependent on the instructors and institutions (Begum & Chowdhury, 2016).

Teachers also become barriers to students’ improvement in spoken English. For example, the studies by Xie (2010) and Zhang and Head (2010) identified teachers’ controlling behavior as the reason why students hesitated to participate in classroom interaction. When students confront a daunting teacher in the classroom, they feel intimidated and prefer to keep silent. Hossain (2010) identified that teachers’ lack of commitment is responsible for the students’ slow rate of development in speaking proficiency in English since teachers are less available to address students’ individual problems beyond the classroom, and the teachers share their assessment feedback much later. Immediate assessment feedback helps learners repair their mistakes. Even more worrisome is teachers’ low proficiency in spoken English. The survey study of Sadeghi and Richards (2015) reported that students were unable to learn accurate and appropriate spoken English in the class due to teachers’ low proficiency in spoken English. If teachers struggle to continue a conversation, or deliver a speech, how can they help learners? Behroozi and Amoozegar’s (2014) study also found teachers’ low proficiency to be a problem in helping students improve English competence. More specifically, Afshar and Asakereh’s (2016) study reported that teachers’ poor pronunciation made it difficult for the students to comprehend the teachers’ instructions and guidance; moreover, their inaccurate pronunciation failed to motivate the students.

Teachers with low self-efficacy struggle to deliver lessons effectively in spoken English classes (Chen & Goh, 2011). Self-efficacy refers to one’s beliefs about his/her capabilities to perform any task (Bandura, 1986). Teachers’ classroom behavior is formed by how they perceive and understand teaching (Chen & Goh, 2011). Teachers’ self-efficacy influences their commitment to teaching, their classroom practices, their goal-setting, and their motivation (Borg, 2001; Coladarci, 1992; Tschannen-Moran & Hoy, 2001). Teachers with high self-efficacy demonstrate more enthusiastic responses to students’ questions in the classroom and assume more humanistic approaches to facilitate students’ attempts of learning (Coladarci, 1992; Woolfolk et al., 1990). Liu (2007) argues that EFL teachers invariably experience an inferiority complex since they feel that their spoken English is far away from the native speakers’ English.

Besides, students often experience a variety of affective factors, such as anxiety, inhibition, lack of confidence and motivation, and loss of face that affect their natural speaking performance (Burns, 2017; Maleki & Mahammadi, 2009; Wiltse, 2006). These psychological issues reduce students’ willingness to participate in speaking events (Burns, 2017). Students’ low self-esteem leads to low expectancy that eventually causes low achievement in spoken English (Toujani & Hermessi, 2018). Learners’ worries about making mistakes, or fears about criticism disrupt their improvement in spoken English since they develop the tendency of avoidance (Ellis, 2015; Zhang, 2009). Bhattacharjee (2008) also argues that students’ tendency to avoid speaking in English may be attributed to psychological attributes, such as fear or loss of face. In particular, speaking anxiety has been intensively investigated and analyzed by researchers, who found that anxiety is the most common enemy to students’ development of spoken English (Horwitz et al., 1986; Humphries, 2011; MacIntyre, 1999; MacIntyre & Gardner, 1994). Anxiety destroys students’ self-efficacy which is essential for them to express their ideas and thoughts with confidence and motivation (MacIntyre & Gardner, 1994). Young (1990) reports that speaking appears to be the most anxiety-generating experience for the students while Horwitz et al. (1986) find that speaking in English is most worrisome to the students. A systematic review by Ibna Seraj and Habil (2021) identified learners’ shyness, anxiety, self-efficacy, reluctance, emotions, and confidence as barriers to their development in spoken English.

Moreover, peers’ teasing of students who struggle to express themselves accurately and appropriately in spoken English class is perhaps a common phenomenon in EFL contexts (Lin, 2013; Aeni et al., 2017). Researchers also report that many language learners are worried about making mistakes in a spoken class since they experience anxiety about the reception of negative comments from teachers, and mockery from classmates and others (Kitano, 2001; Méndez & Peña, 2013; Yan & Horwitz, 2008). Moreover, EFL contexts being monolingual, learners have little exposure to English. This low exposure to the English-speaking environment has also been widely reported in the literature as a sociocultural barrier to learners’ development of spoken English (Abdalla & Mustafa, 2015; Adhikari, 2011; Bruner et al., 2015; Seraj et al., 2021). Such monolingual EFL context is described as ‘input-poor environments’ by Kouraogo (1993) since learners in such environments barely get any opportunity to develop their competence in English speaking skills, especially outside the classroom. Besides, the lack of an optimal environment deprives the learners of frequent practice of oral communication in English both inside and outside the classroom (Aeni et al., 2017; Abdalla & Mustafa, 2015; Seraj et al., 2021). Students’ families may also be a barrier to their development of spoken English. The study by Forey et al. (2016) found that due to the lack of required support from families, the children of those families were unable to make progress in language learning since parents’ education, involvement, and socioeconomic status influence their children’s language learning achievement (Phillips et al., 1998). Latha and Ramesh’s (2012) study demonstrated that family literacy determined the success or failure in students’ speaking development. The study revealed that the students whose parents were highly educated outperformed those whose parents have low literacy.

In addition to the problems discussed above, another challenge the learners encounter while trying to speak in English is their insufficient linguistic repertoire, i.e., inadequate vocabulary and limited range of grammatical structures. This insufficiency leads the learners to frequent pauses, long hesitations, or sometimes a complete discontinuation while they are attempting to perform a speaking task. A recent survey study conducted by Xie (2020) in a Chinese university identified that the obvious barrier to learners’ improvement in spoken proficiency is their deficient repertoire of grammar and vocabulary. In addition to inadequacy in linguistic resources, learners also suffer from their limited background knowledge on various topics. When they attempt to speak on a topic, they seem to struggle to retrieve information from their long-term memory. Having adequate topical knowledge enables speakers to connect to the context and with this connection, they can continue a conversation, or develop a speech (Bachman & Palmer, 1996). Bachman and Palmer (1996) hold that learners’ speaking performance is significantly influenced by how much knowledge and how many ideas they have on the topic at hand.

Furthermore, many researchers report that learners’ first language (L1) is a barrier to the development of spoken English (Ahmed & Qasem, 2019; Islam, 2009; Jahan & Jahan, 2008; Khan, 2007; Musliadi, 2016; Seraj et al., 2021). They claim that their reliance on L1 minimizes their use of L2 (English). However, a number of studies report that the students find assistance from L1 transfer while they are learning English (Bouangeune, 2009; Jafari & Shokrpour, 2013; Kafes, 2011; Mart, 2013). The finding of Bouangeune’s (2009) study conducted in Laos was in favor of L1 transfer for the improvement of English proficiency among the students of the experimental group.

Probably, the most frequently cited problem in teaching and learning English speaking skills is large class size (Musliadi, 2016; Nuraini, 2016; Seraj et al., 2021). In their study, Chowdhury and Shaila (2011) found that engaging students in various speaking activities in the classroom is difficult because of the large class size. Researchers in Chinese and Indonesian contexts also reported that large class sizes prevented the teachers from implementing their planned lessons (Chen & Goh, 2011; Musliadi, 2016; Nuraini, 2016).

Conceptualization of the remedies to improve spoken English

Currently, popular approaches to instruction offer promising results that might alter the current deplorable situation. One such approach is a shift from a teacher-centered learning model to a learner-centered learning approach that provides the learners with more opportunities to express their voices on instructional and learning issues (Namaziandost, Neisi, et al., 2019). As a learner-centered method of instruction, the cooperative learning (CL) approach has appeared to be effective in spoken English programs (Hall Haley & Ferro, 2011; Namaziandost, Neisi, et al., 2019). With the guidance received from the teachers, the cooperative learners work in heterogeneous groups in solving learning problems, completing projects, or achieving other learning goals (Namaziandost, Neisi, et al., 2019). When the learners are involved in cooperative learning, they get more opportunities for interaction that facilitate their improvement in spoken English (Al-Sohbani, 2013; Ning, 2011). The cooperative learning approach creates an atmosphere of ease and comfort that reduces anxiety among the learners; thus, they gradually develop a positive attitude to interaction and get involved in speaking activities (Namaziandost, Neisi, et al., 2019; Nasri & Biria, 2017; Pattanpichet, 2011; Sühendan & Bengü, 2014; Tahmasbi et al., 2019). In a cooperative learning classroom, the teacher engages learners in debates, dialogues, group discussions, and role-plays (Li, 2015; Lv, 2014; Wang, 2013).

Along with cooperative learning, task-based learning (TBL) can also enable learners to progress in spoken English. In the task-based approach, “students are given functional tasks that invite them to focus primarily on meaning exchange and to use language for real-world, non-linguistic purposes” (Van den Branden, 2006, p. 4). A language task is an activity with a specified objective that learners attain through the use of language (Van den Branden, 2006). Learners benefit from TBL since their involvement in task completion helps them improve coping strategies, negotiating skills, and pronunciation as well as suprasegmental features of the language, such as rhythm, stress, and intonation, and overall speaking proficiency facilitated by peer feedback (Maley & Duff, 2006; Philp et al., 2010; Zusuki, 2018; Zyoud, 2010). A recent study by Safitri et al. (2020) identified that TBL creates a relaxed learning environment, develops negotiating skills among learners, and offers learners opportunities for peer feedback.

Besides, students’ ability to notice the appropriate and accurate usage and use of spoken English is another approach to effective language learning. Noticing, a significant construct in SLA research, refers to “… a cognitive process whereby linguistic exemplars in the input that learners are exposed to are consciously attended to” (Ellis, 2015, p. 348). Noticing, according to Schmidt (1990), “is the necessary and sufficient condition for converting input to intake” (p. 129). Students’ progress in learning to speak is enhanced when they invest conscious attention or noticing to learning. Earlier studies found a favorable influence of noticing on learning English speaking skills (Abdalla, 2014; Baleghizadeh & Derakhshesh, 2012; Bergsleithner, 2007; Mamaghani & Birjandi, 2017). To add, very recent studies by Ögeyik (2017) and Navidinia et al. (2019) also showed a strong effect of noticing on students’ improvement of spoken English.

Moreover, listening to proficient speakers creates opportunities for the prospective speakers since listening provides them with input, and with the input received from listening, speakers internalize information that they use while uttering a speech (Brown & Lee, 2015). Listening and speaking are language skills that most often occur together and are interconnected (Brown & Lee, 2015). Learners’ listening ability enables them to improve speaking (Doff, 1998). While listening, the potential speakers may imitate the actual speakers they are listening to, and thus, they improve speaking since they obtain content and learn pronunciation and grammar (Eun & Lim, 2009). Eun and Lim (2009) also argue that the potential speakers also gain benefits from listening since they understand the sociocultural appropriateness of language from the actual speakers they listen to. The recent study by Mart (2020) finds that for carrying out a conversation in English, speakers require ‘input’ which is provided by listening.

Furthermore, collocations, or words in chunks seem to rescue the EFL learners from their struggle to put words together when they are involved in speaking. Firth (1957) defined collocation as the combination of words associated with each other while Lewis (1997) defined it as predictable co-occurrence of lexical items. Collocational competence is inherently integrated into the processes of second, or foreign language acquisition (Lewis, 1997; Nattinger & DeCarrico, 1992; Richards & Rogers, 2001). Studies also suggest that students who are better equipped with collocations speak more fluently (Ellis & Schmidt, 1997; Nation, 2001; Nattinger & DeCarrico, 1992; Schmitt, 2000). Collocation may come in various combinations, such as verb + noun (perform a surgery, make a decision), adverb + verb (carefully negotiated), verb + preposition (depend on, abstain from), and adjective + noun (high probability, good chance). Students’ improvement in fluency in spoken English is supported by empirical studies that involved collocation instruction. For example, Hsu and Chiu (2008) in their study found that the use of collocation by the subjects helped them improve fluency in speaking. Sung’s (2003) study also found a significant correlation between students’ use of collocation and improvement in speaking proficiency.

In addition, various types of technologies have now been integrated into English language teaching, and all these digital applications have been benefitting the students and the teachers. The use of technology has changed the face of ELT pedagogy. The use of technology in language learning has been quite widespread and diverse in recent years. Kuning (2019) has offered an inventory of various ICT (information communication technology) applications that are widely used in language learning. The inventory includes communication lab, video conferencing, video library, CALL (computer-assisted language learning), TELL (technology-enhanced language learning), pod casting, quick link pen, quicktionary, programs through educational satellites, speech recognition software, YouTube, internet, and blogging. Various studies suggest that learners’ language learning was facilitated when the teaching/learning was ICT integrated. First of all, ICT applications promote learner-centered language learning (Costley, 2014; Riasati et al., 2012). Zhao (2013) claims that ICT integration exposes learners to authentic materials for learning the language skills. Secondly, both teachers and students find language learning engaging, enjoyable, interesting, and interactive when technology is integrated into the learning process (Baytak et al., 2011). Moreover, the use of ICT applications enhances cooperation between the teachers and the students (Sabzian et al., 2013). The study by Parvin and Salam (2015) showed that ICT applications exposed learners to real-life speaking events in meaningful contexts. Alsaleem’s (2014) study on WhatsApp applications using dialogue journals favored students’ improvement in spoken English while the study by Godzicki et al. (2013) found the students were more motivated in learning when they used technology.

Perhaps, the most promising learning approach to the improvement of English skills by the learners is self-regulated learning (Guo et al., 2018; Teng & Zhang, 2018; Tseng et al., 2006). Self-regulated learning (SRL) refers to learners’ independent learning processes that involve them as metacognitively, motivationally, and behaviorally active participants in learning (Zimmerman, 1986). According to Zimmerman (2000), learners operate their self-regulation in learning in three phases: forethought, performance, and self-reflection. Forethought is much like pre-speaking: the learners set their speaking goal and exploit all sources of motivation. During the performance phase, the learners employ a set of strategies, such as self-control, imagery, time management, environmental structuring, help-seeking, and self-monitoring. Self-reflection, the third phase of learning, is crucial since this phase involves learners in self-evaluation and self-reaction which engage the learners in more autonomous actions in learning. When the learners are involved in SRL, they assume more autonomy and their learning becomes self-directed (Aregu, 2013). Aregu (2013) also found that learners are more self-efficacious when they use SRL strategies. The study by El-Sakka (2016) found that learners’ speaking anxiety decreased and their proficiency improved when they received SRL strategy instruction. A significant way that SRL strategies might help students is by assisting them in avoiding distractions and improving persistence. Students in the current age of technology often get distracted from their studies by electronic gadgets, mobile phones, and other addictive tech apps.

Finally, the most desirable intended learning outcomes would be possible through formal instruction if the instructors are facilitated by a robust framework of teacher education. Teachers need education and training to become skilled, expert, knowledgeable, and confident to effectively transform students into successful learners (Safari & Rashidi, 2015). Teaching quality is strongly associated with students’ achievement in learning (Cochran-Smith & Fries, 2005; Goodwyn, 1997; Hagger & McIntyre, 2006). For competent teaching, Shulman (1987) has identified seven types of knowledge: (i) content knowledge, (ii) pedagogical knowledge, (iii) curriculum knowledge, (iv) pedagogical content knowledge, (v) knowledge of the learners and their characteristics, (vi) knowledge of educational contexts, and (vii) knowledge of educational ends. In addition to these pieces of knowledge, teachers’ self-reflection has often been reported in the literature as a critical component of teachers’ growth as professionals (Farrell & Baecher, 2017; Kourieos & Diakou, 2019; Richards & Farrell, 2011).

Bangladesh as the context of the study

In Bangladesh, communicative language teaching (CLT) was introduced in 1996, but improvement in English teaching and learning was hardly observed in the two decades after the implementation of CLT (Ali & Walker, 2014; Chowdhury & Kabir, 2014; Rahman et al., 2019; Rahman & Pandian, 2018). Rahman et al. (2018) identified that several policy errors impeded the implementation of CLT in Bangladesh. As a result, despite the potential of CLT, Bangladesh was unable to take advantage of CLT as a promising approach to facilitate students’ improvement in English proficiency. Reasons reported in the literature are mostly associated with teachers’ lack of literacy in CLT implementation and assessment, and their preference for and comfort with the traditional methods of teaching (Islam et al., 2021; Rahman et al., 2018; Rahman & Pandian, 2018). We have found that teachers invariably follow a grammar translation method (GTM) to teach the students. As a result, the students are recurrently engaged in drills or other mechanical types of activities in the classroom. Moreover, the speaking classes are teacher-centered, and TTT (teacher talking time) is more than STT (student talking time) (Barman et al., 2007; Bhattacharjee, 2008). Although CLT seems to be promising to many, scholars, such as Canagarajah (2005), Kumaravadivelu (2001), Nunan (2003), and Humphries and Burns (2015) are critical about its effectiveness for students’ success in learning English, which is also the case in the context of Bangladesh (see Rahman & Pandian, 2018). Teachers are still invariably the main actors in the classroom, which marginalizes the students in the teaching/learning enterprise. Therefore, the exercise of learner autonomy is at a bare minimum. The study by Jamila (2013) provides evidence that students lack autonomy. Her study reports that setting goals and objectives of the syllabi, adopting and adapting materials, designing teaching-learning methods, and developing assessment methods are beyond learners’ negotiation.

In a recent study conducted in Bangladesh, Seraj et al. (2021) involved both teachers and students of the tertiary level in surveys and semi-structured interviews and identified the teaching method as a remarkable barrier to students’ progress in spoken English. They stated that a traditional lecture-based instruction promotes rote learning and it can barely involve students in oral interaction. Interference of Bangla (L1 in Bangladesh) in English classes is also a major challenge for the improvement of English proficiency in Bangladesh (Islam, 2009). Students’ use of L1 (Bangla language) often decelerates their progress in spoken English since 86% of students enter the universities from a Bangla-speaking background (Bhattacharjee, 2008; Jahan & Jahan, 2008; Khan, 2007). Seraj et al. (2021) in a recent investigation spotted that both students’ and teachers’ use of L1 inside and outside the classroom stymies their progress in oral English communication skills.

With the approval of the Private University Act 1992 (revised in 2010), private universities started to operate in Bangladesh in 1992 (Rahman & Singh, 2019). These universities adopted an English medium instruction (Dearden, 2014) policy to deliver education. Consequently, students of these universities are required to possess high proficiency in English to pursue their education successfully (Farooqui, 2007; Tohura, 2016). High proficiency in spoken English naturally comes first for academic engagements with the teachers and the staff members of the universities (Farooqui, 2007; Rahman et al., 2015; Sultana, 2014; Tohura, 2016). However, the majority of the students have low competence in spoken English that stymies their communication with the faculty members and their peers (Akhter & Ashiquzzaman, 2019; Ibna Seraj & Habil, 2021; Seraj et al., 2021; Sultana, 2014). Since the students struggle to participate in academic activities, such as classroom discussion, questioning for clarification, responding to teachers’ questions, making presentations, presenting seminars, and attending viva, their academic achievements are low (Akhter & Ashiquzzaman, 2019; Chowdhury & Shaila, 2011; Ibna Seraj & Habil, 2021; Seraj et al., 2021; Sultana, 2014). Besides, employers in Bangladesh express dissatisfaction with the students’ competence in spoken English (Hasan, 2019; Sharmi, 2018).

Furthermore, being an English language instructor of a leading private university, the main author of the current study observes that the majority of the students can hardly carry out simple conversations in English for an extended period, and performing speaking at the advanced level as expected at the university context is beyond their ability. The other faculty members of the university also have similar opinions. During informal discussions with the colleagues regarding the students’ level of speaking proficiency, the main author learned from his colleagues that the students struggle to communicate with them.

Considering the existing gap in the context of Bangladesh and the enormous importance of spoken English for academic and professional purposes, this study intends to investigate the reasons behind students’ low proficiency in this skill and to explore remedies. Although researchers have explored the reasons that prevent learners from acquiring proficiency in spoken English in various regions of the world as well as at various educational levels and contexts in Bangladesh, only a few studies have addressed this significant problem in the private university context in Bangladesh. Besides, despite the potential of various approaches to enhance students’ competence in spoken English, the existing literature offers only limited remedial measures. To fill this gap, this study addresses the following research questions:

  1. What is the current situation of the proficiency in spoken English among the students at the private universities in Bangladesh?

  2. What are the barriers to the improvement of students’ proficiency in spoken English?

  3. What initiatives can be taken to help students overcome the barriers?


Given that the initial objective of the study was to determine the current level of proficiency in spoken English of the private university students in Bangladesh, the study used an IELTS-style test to achieve this objective. More than 11,000 organizations across the world accept IELTS scores (IELTS, 2021). The IELTS speaking test is valid in terms of content validity and face validity, and the test is reliable on the content because of the coherent complementation of the three sections, and accessibility of the format and related information about the test (Li, 2019). The tests were followed by semistructured email interviews. Although researchers usually conduct face-to-face interviews (Polit & Beck, 2014), technological development has opened up more convenient and effective methods of interviewing, such as telephone interview, videoconference, email, and text message interview methods (Oltmann, 2016; Redlich-Amirav & Higginbottom, 2014). Email interviews in recent years have provided researchers with an effective alternative (Gibson, 2010; Walker, 2013). Walker (2013) finds that collecting qualitative data using email is both efficient and economical since the researchers can overcome geographical and financial barriers. Email interviews ease scheduling with the potential participants, which ultimately ensures the availability of the participants regardless of where they are geographically located; thus, researchers can conduct asynchronous interviews (Fritz & Vandermause, 2017). Perhaps the biggest advantage of an email interview is its ability to offer the participants an opportunity of “authoring their life experiences” (Gibson, 2010, p. 7). The iterative and reflective nature of email interviews helps the participants compose insightful responses that result in rich data (Fritz & Vandermause, 2017; Gibson, 2010).


Since no data are available on the current state of students’ proficiency in spoken English, we randomly selected twenty-one students from five private universities located in Dhaka (the capital of Bangladesh) with the help of the teachers of those universities to determine their proficiency level in spoken English. These students were given an IELTS-style speaking test. The email interviews, however, engaged twelve of the twenty-one students who took the tests, and eleven teachers from the five universities mentioned above. For the sake of anonymity, the teacher participants have been labelled as T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, and T11 while the student participants are labelled as S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, and S12. These teachers teach English speaking skill courses in those private universities.


The study employed two instruments: IELTS speaking tests and an interview protocol. The IELTS speaking tests were collected from Cambridge IELTS 15 (2020). In addition, semistructured email interviews (Creswell, 2012) were conducted. Semistructured interview protocols (separate protocols for the teachers and the students) were used to gather data on (1) the teacher participants’ evaluation of the test scores, (2) the participants’ perceptions of the barriers to the students’ development of speaking skills, and (3) the participants’ perceptions of the remedies that might help the students overcome the barriers.


The tests were examined by two IELTS trainers. One trainer has 12 years of experience in training candidates for the IELTS speaking test while the other trainer has been training candidates for 10 years. Each student took the same test at two different times on two different days, administered by the two trainers alternately. The scores given by the two trainers to each student were consistent since the inter-rater reliability was within the highly acceptable range (α = .849). For the selection of the students, the main author of the study contacted the teachers of those five universities. The teachers are the main author’s acquaintances. Then, the trainers, who are also the main author’s acquaintances, were recruited for testing the students. Once the tests were administered, the data were analyzed for inter-rater reliability and frequency. Next, the email addresses of the twenty-one students who took the tests and fifteen teachers of those five universities were collected for the purpose of inviting them to the semistructured email interviews. First, emails were sent to them to explain the purposes of the interviews and to seek their consent. Then, twelve students and eleven teachers agreed to participate in the semistructured email interviews. After that, the interview protocols were emailed to them. While responding to the items of the protocols, the participants emailed us whenever they needed clarification, and we replied immediately. Finally, they returned their responses, which were then analyzed for results and reporting.

Ethical considerations

The ethical considerations in the current study were guided by “ethics in research” of the TESOL International Association (2014). Before the study, the participants were informed of the purpose and the potential outcomes of the study. They were assured that participant codes, instead of their names, would be used in order to maintain confidentiality, and that they would not experience any harm if they participated in the research. After their consent was obtained, the participants were informed of the purpose of the semistructured interview and the content of the interview protocol. Then, they were briefed about the procedures of the interview implementation. The participants agreed that they would be cooperative to establish the priority of the study, rather than their own interests. Finally, we shared with the participants that the study findings would be disseminated through research publication to realize its contribution to society.

Data analysis

The test scores were analyzed using SPSS (version 25) for inter-rater reliability and frequency. The alpha value was computed to examine the inter-rater reliability of the test scores given by the two trainers. Since one objective of data collection was to obtain teachers’ opinion on what IELTS speaking band score should be expected for the students to be able to study at the private universities in Bangladesh, the interpretations of the IELTS speaking test, such as expert user (band score 9), very good user (band score 8), good user (band score 7), competent user (band score 6), modest user (band score 5), and limited user (band score 4) were used (Cambridge IELTS 15, 2020). A thematic analysis procedure (Braun & Clarke, 2006) was used to analyze the qualitative data for themes and categories.
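For readers without access to SPSS, the alpha computation described above can be approximated with a short script. The following is a minimal sketch only, assuming that the reported α is Cronbach's alpha with the two trainers treated as the "items"; the function name `cronbach_alpha` and the band scores in the example are hypothetical illustrations and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-rater score lists.

    Each inner list holds one rater's scores, in the same student order.
    Assumption: raters are treated as items, as in a scale reliability analysis.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("at least two raters are required")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same two or more students")

    # Sum of the sample variances of each rater's scores.
    sum_item_var = sum(variance(r) for r in ratings)
    # Sample variance of each student's total score across raters.
    totals = [sum(scores) for scores in zip(*ratings)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical band scores for six students (NOT the study's data).
trainer_1 = [4.5, 5.0, 4.5, 5.5, 6.5, 5.0]
trainer_2 = [4.5, 5.0, 5.0, 5.5, 6.0, 5.0]
alpha = cronbach_alpha([trainer_1, trainer_2])
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the two raters rank and score the students more consistently; the statistic actually produced by SPSS depends on the options chosen in the reliability analysis, so this sketch should be read as illustrative rather than as a reproduction of the reported value.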


The tests were administered to obtain answers to the first research question which intends to investigate the current situation of the proficiency in spoken English of the students in the private universities. The test results revealed that the mean score of the students examined by trainer 1 was 5.04 (M = 5.04) while the mean score examined by trainer 2 was 5.09 (M = 5.09). Moreover, as shown in Table 1, according to the examination of trainer 1, the majority of the students belonged to the range of IELTS band score 4.5 (38.1%) and 5 (33.3%) while only one student scored 6.5.

Table 1 Frequency of students’ scores examined by trainer 1

As shown in Table 2, almost similar results were derived by the examination of trainer 2. Here also, the majority of the students belonged to the range of IELTS band score 4.5 (23.8%) and 5 (42.9%) while only two students scored 6.

Table 2 Frequency of students’ scores examined by trainer 2

In addition to establishing the current situation of proficiency in spoken English, we were also interested in exploring teachers’ evaluation of the test scores. Hence, the teachers were interviewed. The following section presents teachers’ evaluations and comments on the scores.

Teachers’ evaluation of the test scores

With reference to the first research question, teachers’ evaluation of the test scores was obtained. When the nine teacher participants of the current study were asked for their opinion of the current level of students’ proficiency in spoken English, all the teachers expressed their dissatisfaction and stated that only a small number of students possess the required proficiency. T7 said, “If I am to evaluate the level of spoken English proficiency of the students, many students would be within the bracket of 4.5 to 5 band scores IELTS.” In his long response, T3 said:

The level of proficiency in speaking skill of the undergraduate students of my university varies. However, I would level them as pre-intermediate and intermediate level. In general, the level is not satisfactory for the tertiary level of education. It affects students’ learning especially in the classroom, since the students fail to communicate their ideas with the faculty members as their speaking is frequently interrupted by pauses and hesitations.

When the teachers were asked about their expected band score for the students to manage their communication needs in the classroom and beyond at the university level, all nine teachers commented that the students should have at least ‘competent user’ (IELTS equivalent 6) level of proficiency although band score 6.5 would be expected.

With specific focus on the salient comments and opinions of the participants collected by the semistructured interviews (teachers and students), data were analyzed to answer the second and third research questions. Themes that emerged from the analysis have been categorized as (1) reasons for students’ low proficiency in spoken English and (2) remedies to reverse the current morbidity of students’ proficiency in spoken English.

Reasons for students’ low proficiency in spoken English

The reasons for the low proficiency in spoken English among the students of the private universities in Bangladesh that have emerged from data analysis include the complex nature of speaking, inappropriate application of instructional methods, teachers’ low proficiency in spoken English and controlling behavior, students’ psychological factors, sociocultural factors, students’ inadequate linguistic resources, L1 interference, and large class size.

According to most of the teachers and students, the first reason that causes difficulty for the students to master spoken English is the complex nature of speaking. In this regard, T3 said, “Speaking occurs instantaneously; students hardly get much time to think, to produce, and to organize a speech before delivery.” The complex nature of speaking was also shared by T6, “Simultaneously coordinating various aspects of language, such as selecting appropriate vocabulary, constructing a grammatically accurate sentence, being aware of sociocultural issues, and most importantly, crafting a suitable strategy for effective communication is really hard considering the quickness of speech production.” To add, S4, S5, and S9 said, “We really struggle to remember a suitable word and to organize the words in accurate grammatical form in a short time that we get when we are in interaction with others, or make a presentation.”

One striking finding was identified with regard to the instructional methods used by the teachers in the classroom. Although they apply CLT pedagogy and support its applications, the teachers still prioritize the role of the teachers in the classroom. For example, T3 said, “Teachers care for the students, so it’s natural that they attempt to play a pivotal role.” T6 stated, “Traditional methods are not all bad; they also have strengths.” However, T8 and T9 supported CLT and they claim that CLT is an effective approach to learning speaking if it is implemented effectively. T8 said:

The current instructional methods are helpful to the students for improving their spoken English proficiency since we use real-life situations to practise and act out the scenarios. We ask the students to have drama presentation, news presentation, motivational speeches, brochure presentation, etc. The situational dialogues are helpful since they can apply in real life situations and help them to come out of their shyness.

When the students were asked about the types of activities they do in the classroom, they responded that the teachers predominantly involve them in grammar and vocabulary activities. S9 said, “In speaking class, I was tired of doing grammar exercises.” Another student (S2) stated, “I was overburdened with the list of vocabulary items that were given to me as homework; actually, I didn’t learn how to use them in speaking.” Many students think that teachers’ talking time limits their opportunity to practice speaking in the classroom. S10 commented, “If my teacher keeps on lecturing, how we as students of a large class get time to practice speaking.”

Analysis of the responses of the students revealed that teachers were also barriers to students’ progress in spoken English. Students experienced that teachers’ low proficiency in English was a problem for them to improve their proficiency in spoken English. S4 said,

We actually highly regard our teachers. We believe that they are the perfect people to solve all our problems. However, when I asked my teachers to help me with the English vocabulary of certain terms, they could not immediately provide me with the English words. I know … a teacher may not know certain terms in English, but it happened with me on several occasions.

Besides, most of the students found their teachers authoritarian. As a result, they felt scared and preferred to refrain from asking questions since they felt that the teachers might be angry with them. S7 said, “I did have questions, but my teacher’s attitude and behavior prevented me to ask those questions.”

Findings also showed that students struggle with various types of psychological problems while trying to perform speaking tasks. The students experience three psychological problems: anxiety, low confidence, and fear of losing face. S8 said, “I really lack confidence; I feel that I don’t have the ability to speak in English.” Coupled with the lack of confidence, S8 also has the problem of anxiety. He said, “I think my lack of confidence causes anxiety in me; when my teacher asks me to speak in the class, I start sweating and I feel that my head is blank.” S10 admitted that she is often held back from performing in the speaking class because of her fear of losing face in front of her friends in the classroom and in front of her relatives in social settings. She said, “I always feel that it would be very shameful if I pause frequently and can’t continue to finish what I need to say.”

The respondents commented that sociocultural factors impede students’ improvement in spoken English. Three sociocultural factors were reported by the respondents: an input-poor environment, peer teasing, and limited family support. T4 stated, “Our students hardly get suitable environments to improve speaking in English because of the monolingual context in Bangladesh; I guess even all their classes are not conducted in English in the university, let alone speaking in English outside the classroom.” In this connection, S12 said, “Where should I practice speaking? Being in a large class, I hardly get my turn and outside the classroom, it’s a Bangla (learners’ L1) world.” Students are also affected by peer teasing if they are unable to speak well. For example, S3 said:

I have come from a rural background and my English pronunciation has my regional accent. So, when I speak in English, I notice the jeering looks of classmates. I also have grammar and vocabulary problems. I don’t know English of many objects and many behaviors. Therefore, I make mistakes and my classmates laugh at me. For all these reasons, I try not to speak in public.

In response to the question on family support, S5 stated, “I hardly get support from my family to improve my spoken English since both my parents are weak in spoken English, so they don’t speak with me in English; however, most annoying is my elder brother who teases me when I try to speak with him in English” while T3 said, “I have come from a rural family and both my parents are illiterate, so I didn’t get any support from my family as we only used Bangla for all communications.”

In response to the question regarding learners’ repertoire of resources, most of the teachers and students commented that students have a deficiency in their resources necessary to speak. Teachers find that students are unable to talk in English because they lack grammatical and lexical knowledge. T1 said, “Most often I find that students struggle to put words together in accurate grammatical structures.” T5, T6, and T11 observe that students’ limited vocabulary causes a huge barrier to their development of speaking skills. T11 said, “The students frequently stammer since they do not find appropriate English words to express their.” However, T10 thinks that students are epistemologically challenged. He said:

I feel, in general, ELT has failed to address the problem of substance. How can one speak in front of others when one does not know much? Experience and knowledge are both important features of good speakers. In order to speak well, one needs to have a certain degree of knowledge and experience. Our students need to read and know things before we push them to speak. Speaking needs to be preceded by accumulation of information and knowledge.

Students have also reported that they lack the resources required to carry out a conversation or to deliver a speech. S1 said, “I am not confident about my grammar although I have been learning grammar since my primary school; my vocabulary is also poor.” S7 said, “My problem in speaking is my poor knowledge in various topics; after one or two sentences, I feel lost.” Because of their lack of resources, the students struggle to interact with others in English.

Mixed findings have been obtained from the participants with regard to L1 interference as a barrier to students’ progress in spoken English. Most of the teachers think that students tend to mix Bangla (L1) when they try to speak in English while many students find that Bangla helps them understand the meaning of the English words and teachers’ explanation of the grammatical rules. Regarding students’ overuse of Bangla, T2 said, “Students’ practice in spoken English is hindered because they frequently take resort to Bangla.” However, T6 said, “We can’t only blame the students; in classroom observations of spoken English classes, I observed teachers using Bangla as well.” Finding Bangla an aid to understanding English expressions and English grammatical rules, S7 said, “When I use bilingual dictionary (Bangla to English), I understand the meaning of the English words easily and I can remember them for a long time” while S12 said, “When my instructor explains the grammatical structures in Bangla, the structures appear easy; otherwise, they seem abstract.” However, S11 said, “I believe we should stop using Bangla to improve our spoken English; initially, we might face problems but through continuous practice using only English, we can gradually develop the habit of speaking English.”

When the participants were asked if large class size is a problem, all the participants responded affirmatively. All of them stated that teaching and learning are disrupted if the class size is large. T8 commented, “How can I monitor my students whether they are using English while practicing with peers if I teach a class with 45-50 students?” T10 stated:

What I find most problematic in teaching a large class is doing assessment. Assessment involves time, and patience since we need to critically evaluate our students’ performance. And assessing speaking is not very easy. You have to assess so many dimensions of speaking, such as grammar, vocabulary, relevance of the ideas to the topic, coherence, pronunciation, suprasegmental features, non-verbal cues and so forth. To be honest with you, I often experience burnout while teaching a large class. Worst of all, the students get upset as they don’t get the support they need.

Remedies to reverse the current “morbidity” in spoken English

To overcome these problems, the remedies that have been suggested by the participants include: using task-based and cooperative learning, improving family support, encouraging learners to notice English usage and use in spoken English, integrating listening to speaking instruction, teaching speaking through collocation, integrating ICT applications, promoting SRL, and strengthening teacher education.

Regarding the potential of task-based and cooperative learning, the participants provided favorable responses. Both the teachers and the students think that task-based learning and cooperative learning can involve the students more in speaking development. In this connection, T1 said, “Students’ speaking needs may be realized better if they are engaged in real-life speaking tasks, such as, fixing an appointment with the teachers and others, or making a presentation.” T9 also stated:

Task-based learning is more authentic in terms of students’ development of spoken English to function well in the real-world. The tasks may be related to the students’ academic, social, or professional needs. While performing the tasks, they students will be able gain hands-in training on speaking in English. I, actually, find the task-based learning quite fascinating since language as a whole, not as fragments, appears while the students complete a task. Most importantly, task-based learning is learner-oriented, so it minimizes teachers’ involvement and supports learner autonomy.

The students also believe that learning speaking in English through performing tasks is beneficial for them. S3 said, “When my instructor involved me doing a speaking task, I realized where my weaknesses are and what initiatives I should take to overcome my weaknesses.”

With regard to cooperative learning, T11 said, “Since reducing class size is difficult in Bangladesh considering its economic condition, lack of resources and teachers, and infrastructural limitation, cooperative learning might be an effective alternative.” Another teacher (T5) commented, “Cooperative learning might be effective for the students to reduce their psychological problems since they will feel more comfortable to work with their peers.” Cooperative learning seems to be more interesting to the learners as S2 said, “Cooperative learning seems to be fun.”

Noticing was also found effective by the participants of the study. When asked, T3 said, “As a student when I was learning spoken English, I used to notice how the native speakers in English movies spoke, how they pronounced words, how they used their body language, and how they responded to other characters in the movie.” A student (S7) also shared that her mistakes in speaking mostly occur as she often does not notice how competent speakers speak and what linguistic resources they use. She said, “Ya, when I notice carefully, I can note the correct style of talking, but when I don’t pay proper attention, I miss out bulk of it.”

When asked if listening to others helps students’ spoken English, T2, T3, and T5 agreed that students may practically benefit from listening to proficient speakers since the proficient speakers become models for them, and they can imitate them for their progress in spoken English. T5 said, “I always encourage my students to listen to BBC or CNN coz the quality of English of these news broadcasters is good and their exposure to the English on BBC and CNN will help them acquire spoken English proficiency fast.” Students also feel that when they watch English movies, or listen to English songs, their English gets better. S3 said, “To be honest, English movies give me two benefits: fun and good English.” S8, however, got benefit from English music as he said, “I have always been listening to English songs and I learnt some beautiful phrases from the lyrics.”

In response to the question on the use of collocation for speaking development, the participants reported that learning speaking through collocation is an effective method. T1 commented, “Collocations provide the students with authentic native like spoken vocabulary.” Another teacher (T11) also believes that collocations may help students overcome the difficulty of putting words together. When we explained to the students what collocation is, they agreed that collocation helps them learn vocabulary effectively. S2 said, “I struggle to find suitable words to be used before or after another word, but when I learn the chunks, I can use them easily.”

Both the teachers and students responded positively to ICT integration in spoken English programs. They think that students will benefit enormously from ICT-integrated teaching and learning. In this regard, T5 stated, “Since the students are constantly exposed to technology, use of technological applications in instructional methods will facilitate their acquisition of speaking competence as the technological applications invariably use English” while S6 said, “I mostly learnt spoken English from watching movies, listening to BBC Hardtalks on my podcast, and of course, watching English speaking videos on YouTube.”

Initially, the participants of the study were confused about SRL but when we explained to them the theory of SRL, all the participants responded that SRL might be very effective. T10 said, “I think SRL is an excellent approach since most of the time, I find my students losing control on their focus on learning; they are usually irregular in their speaking practice.” Another teacher (T8) stated, “Most of the students hardly set goals; therefore, their speaking practice does not sustain.” Some students shared that when they apply SRL strategies, such as goal setting, planning, seeking motivation, self-monitoring, seeking feedback, and self-evaluation, they feel more empowered and their learning is more successful. S5 said, “Well, goal setting …, you know, guides to achieve my speaking learning outcomes” while S10 claimed, “Motivation, to me, is crucial. To be honest with you, self-motivation is most effective to me as it comes from within.”

Teacher participants in the study stated that teacher education is certainly important for facilitating students’ learning. For example, T2 said, “Although teacher education at the tertiary level in Bangladesh is almost neglected, I personally think that it has immense potential in shaping the quality of students’ learning of spoken English.” T6 also stated, “I, actually, learnt a lot about effective teaching when I did teaching practicum course in my undergrad and grad programs.”


The current study aimed to explore the current situation of the undergraduate students’ proficiency in spoken English in the private universities in Bangladesh, the barriers the students face in improving their proficiency, and the ways in which the students may reduce these obstacles. The results revealed teachers’ evaluation of the students’ level of proficiency in spoken English, a number of challenges the students and teachers experience, and the remedies suggested by the participants of the study, which might be useful in tackling the situation.

Teachers’ evaluation

Teachers’ dissatisfaction with the current level of proficiency in spoken English is a matter of worry since effective learning at the tertiary level largely depends on students’ ability to communicate their questions, or queries during the interaction in the class. If they are unable to share their thoughts and ideas, there is a possibility that they will be deprived of the opportunity to learn from classroom engagement. Moreover, teachers’ opinion on expected spoken proficiency (IELTS band score 6 or above) is supported by Oliver et al. (2012) as they find that students’ competency in spoken English is essential in the tertiary level since without adequate competency, the students will struggle to achieve the intended learning outcomes of the courses which are often negotiated through classroom discussion.

Reasons for students’ low proficiency in spoken English

This section discusses the findings with regard to the second question which investigated the reasons that cause the students to struggle with the progress in spoken English. The reasons include the complex nature of speaking, inappropriate application of instructional methods, teachers’ lack of proficiency in spoken English, students’ psychological factors, sociocultural factors, students’ inadequate linguistic resources, L1 interference, and large class size.

First, the participants of this study consider that speaking is inherently a complex language skill. The complex nature of speaking is probably one reason why students’ progress in spoken English is slow. Since the students struggle with simple expressions, the complex nature of speaking threatens their confidence, which eventually discourages them from participating in speaking events. This finding is supported by a number of scholars. Johnson (1996) finds that the dynamic characteristic of speaking poses problems for the students to master it while Pawlak and Waniek-Klimczak (2015) claim that the multifaceted characteristics of speaking challenge learners to develop it with accuracy and appropriacy. Nation and Newton (2009) also find that the performance conditions of the speakers often puzzle them. Bouzar (2019) claims that the complex nature of speaking poses difficulty for the teachers since they struggle to select suitable methods of instruction. They also find it difficult to identify the problems the students are experiencing since the complexity of speaking underlies students’ speaking performances.

Second, regarding the influence of instructional methods on learning spoken English, mixed findings emerged from the data. The teachers reported that the students explicitly struggle with basic linguistic problems, such as problems with tense system, singular-plural, English articles, passive construction, pronunciation, and lexical resources. Therefore, although they apply the CLT approach, they also use grammar translation and audiolingual methods to help students with basic linguistic problems. This finding is consistent with past studies conducted by Mangubhai et al. (2007), Paul and Liu (2018), Farooqui (2007), Ibna Seraj and Habil (2021), and Seraj et al. (2021). The problem with these traditional methods (grammar translation and audiolingual method) in teaching spoken English is their attention to language as isolated decontextualized fragments. Therefore, students barely get any opportunity to develop their communicative competence at the discourse level. However, teachers’ application of CLT is probably marred by their limited knowledge of the CLT approach. The students reported that they rarely find their teachers using authentic language in the classroom. Teachers’ language appears to be too formal for spoken English. Teachers seem to lack the knowledge of spoken grammar and spoken vocabulary (Hilliard, 2014; LeBarton et al., 2015). If the teachers are not adequately versed in the CLT approach, they will not be able to implement the CLT principles (Jacobs & Farrell, 2003).

Although some teacher participants backed CLT, CLT implementation in Bangladesh is challenged by teachers’ and students’ resistance to many principles of this approach (Biswas et al., 2013; Rahman & Karim, 2015). One form of this resistance is that teachers still prefer to control the classroom, which conflicts with learner autonomy, an important CLT principle. We assume that teachers have reservations about promoting learner agency (Duff, 2012).

Third, students in the current study claimed that teachers’ struggle in spoken English weakens their spirit of learning, and they feel deprived of learning spoken English from teachers in the classroom. Wilden and Porsch (2017) and Enever (2014) emphasized the importance of teachers’ high proficiency in EFL contexts. They argue that teachers’ high proficiency enables them to offer high-quality input to the students, while low proficiency may cause anxiety among teachers, which may eventually affect their management of the class. The study conducted by Sadeghi and Richards (2015) in Iran identified that the success of students’ development of oral English competence largely depends on teachers’ proficiency level in oral interaction in English in the classroom and their spontaneous provision of learning resources to the students. Teachers’ high proficiency in spoken English has several advantages. For one thing, it helps them to conduct a spoken English class smoothly and confidently. Moreover, students receive authentic input from teachers. Most importantly, students often regard teachers as their role models. In an EFL context, students are often unsure about the accuracy and appropriacy of their spoken English. Therefore, they expect that their teachers are always there to help them out; however, when they find that their teachers are struggling, this might weaken their morale, and they might conclude that being unable to speak fluently is natural since their teachers also have this limitation.

The findings also revealed that teachers’ dominating behavior in the classroom reduces students’ participation, which in turn affects their progress in spoken English, since low participation by the students leads to low learning achievement. Xie (2010) and Zhang and Head (2010) also found that teachers’ controlling behavior forestalled students’ improvement in spoken English.

Fourth, students reported that they experience several psychological problems, such as anxiety, low confidence, and fear of losing face, when they attempt to involve themselves in spoken English. Many studies conducted in Bangladesh and outside the country identified these psychological problems as obstacles for the learners to improve spoken English (Bhattacharjee, 2008; Burns, 2017; Ibna Seraj & Habil, 2021; Rahman, 2009; Zhang, 2009). We argue that the worst effect of the psychological factors associated with learners’ unsatisfactory performance in spoken English is their negative effect on learners’ self-efficacy. These factors damage learners’ beliefs in their ability to improve their proficiency.

Fifth, apart from psychological factors, learners’ progress in spoken English, as reported by the participants, is also stymied by several sociocultural factors, such as an input-poor environment, peer teasing, and unfavorable family support. An input-poor environment deprives the students of the opportunity to practice English speaking outside the classroom. This finding is consistent with the studies by Abdalla and Mustafa (2015) and Seraj et al. (2021), which identified that learners were unable to make much progress in their spoken English due to their bare minimum exposure to the target language. Moreover, peer teasing destroys the morale of the students. The students’ self-esteem is hurt when their peers tease them for their struggle in spoken English. Lin (2013) and Aeni et al. (2017) found in their studies that students refrained from participating in spoken English activities because of their worries about peer teasing. In addition, it was found from the responses of the participants that a disadvantageous family background hindered the students’ development as speakers of English. Forey et al. (2016) in their study argued that parents’ education, their involvement in their children’s education, and their socioeconomic condition are significant determinants of their children’s success as language learners.

Sixth, learners’ poor linguistic knowledge, limited vocabulary, and inadequate content knowledge have been reported by the participants as the problems related to learners’ resources that hamper their progress in spoken English. Both the teachers and the students find that learners struggle to organize words in accurate grammatical structures. Learners also struggle to use appropriate words when they attempt to speak in English in a particular situation. The study by Xie (2020) in the Chinese context also found that the students wrestled with grammar and vocabulary while engaged in spoken English. More specifically, Liu and Jackson’s (2008) study showed that lack of vocabulary hindered Chinese students’ development of spoken communication skills in English. Apart from grammar and vocabulary, students’ inadequate topical knowledge caused frequent pauses and breakdowns in their speaking. Bachman and Palmer (1996) argue that when learners lack ideas to talk about a given topic, they are unable to continue talking.

Seventh, with regard to the interference of Bangla (learners’ L1), mixed findings emerged. Teachers think that interference of Bangla limits learners’ use of English during speaking. If learners continue mixing Bangla and English, their development of spoken English competence will be compromised and, consequently, this practice will slow their progress. Researchers such as Musliadi (2016), Ahmed and Qasem (2019), and Seraj et al. (2021) hold a similar opinion. However, students find that Bangla helps them understand the meaning of English words and teachers’ explanations of grammatical rules. Studies by Bouangeune (2009), Mart (2013), Kafes (2011), and Jafari and Shokrpour (2013) support the use of L1 in L2 (second language) learning.

Finally, large class size has most frequently been commented on by the participants as a major problem that hinders students’ progress in spoken English. According to the participants, students get minimal individual attention when they attend spoken English sessions in large classes, while the teachers struggle to manage classes attended by a huge number of students. This finding is supported by a number of studies conducted in Bangladesh and beyond. For example, Chen and Goh (2011), Musliadi (2016), and Nuraini (2016) found in their studies conducted in China and Indonesia that large class sizes obstructed students’ improvement in spoken English. Seraj et al. (2021) also identified large class sizes as a barrier to the improvement in spoken English in their recent study.

Remedies to reverse the current "morbidity" in spoken English

This section discusses the findings with regard to the third research question which explored the remedies that might help the students overcome the barriers discussed above. The remedies entail: using task-based and cooperative learning, improving family support, encouraging learners to notice English usage and use in spoken English, integrating listening to speaking instruction, teaching speaking through collocation, integrating ICT applications, promoting SRL, and strengthening teacher education.

First, the participants have suggested that the use of task-based and cooperative learning might help students succeed in improving spoken English. Evidence of the effectiveness of task-based learning aiding the students in their improvement in spoken English has been recognized by the studies conducted by Philp et al. (2010), Zusuki (2018), and Safitri et al. (2020). They found that TBL practically engages the learners in tasks that they often face in the real world. There is also evidence of the effectiveness of cooperative learning in learners’ progress in spoken English. Nasri and Biria (2017), Tahmasbi et al. (2019), and Namaziandost, Neisi, et al. (2019) found that cooperative learning enhanced learners’ involvement in speaking tasks by creating a congenial learning environment that reduced learners’ anxiety.

Second, the participants reported that students may benefit from paying closer attention to the accurate and appropriate use of spoken English by others. Studies by Abdalla (2014), Baleghizadeh and Derakhshesh (2012), and Mamaghani and Birjandi (2017) concluded that students’ progress in learning to speak improved when they invested conscious attention, or noticing, in learning. Noticing prompts the learners to be aware of how competent users of spoken English use a range of grammatical structures, lexical resources, pronunciation, and discourse features. Those who notice carefully usually learn by themselves, while those who are indifferent to the accurate and appropriate use and usage of spoken English that occurs around them fall behind and continue to use incorrect English.

Third, the participants think that the students should listen to good speakers speaking English around them. Brown and Lee (2015) argue that listening provides the learners with the most valuable resource for learning, which is “input.” Doff (1998) claims that learners’ listening ability enables them to improve speaking. A recent study by Mart (2020) finds that students benefit from listening in improving their English speaking skills. When we listen to others speaking in English, we get exposure to natural English expressions, learn pronunciation, and observe the contexts in which the speakers use their language. When learners notice carefully, they can learn how the speakers around them pronounce words or larger chunks of language, produce grammatically accurate utterances, use lexical resources, connect expressions cohesively and coherently, and maintain fluency.

Fourth, participants have also shared that the learners will be able to improve speaking skills if they learn English collocations. Learning to speak in English through collocations helps the learners use chunks of language that naturally occur in a native context. Learning English through collocations reduces students’ anxiety about how to combine words in accurate grammatical structures since collocations are already grammatically well-formed. Studies by Hsu and Chiu (2008) and Sung (2003) showed how students’ spoken English improved when they used collocations.

Fifth, ICT integration in spoken English programs has been favored by the participants. They have stated that learning spoken English, as well as other English skills and the systems of the language, such as grammar, vocabulary, and pronunciation, seems to be unimaginable without some assistance of ICT in the modern era. Studies by Alsaleem (2014) and Godzicki et al. (2013) showed that students’ improvement in spoken English occurred through their use of ICT applications. Students in the contemporary world of education heavily rely on technology for entertainment and learning.

Sixth, the participants stated that SRL seems to be a very effective approach to learning spoken English since it encourages learner autonomy, as learner agency is embedded in SRL. Instruction on self-regulated learning might enable students to avoid distraction and increase persistence in their learning efforts. When the learners apply SRL strategies, such as setting goals, searching for sources of motivation, exercising self-control during learning, and self-evaluating their speaking performances, they remain more focused on learning. Aregu (2013) in his study found that SRL strategies helped students improve their proficiency in spoken English. El-Sakka (2016) also identified that SRL strategies reduced students’ anxiety during speaking in English and enhanced EFL students’ speaking proficiency. The most significant advantage of equipping EFL students with SRL strategies is that once they become habituated to applying them in their learning endeavors, they move ahead with the possibility of becoming independent lifelong learners.

Finally, teacher education was recognized by the participants as a significant factor for students’ progress in learning. Teacher education plays an important role in helping teachers gain knowledge about instructional methods and materials evaluation and development, design effective tasks and activities, develop assessment methods, and, overall, develop expertise in managing teaching and learning. The goal of teacher education is to enhance the quality of teaching. Teaching quality is strongly associated with students’ achievement in learning (Cochran-Smith & Fries, 2005; Goodwyn, 1997; Hagger & McIntyre, 2006; Karim et al., 2021; Karim & Mohamed, 2019). Athanases and De Oliveira (2008) recommended that EFL instructors should attend teacher education programs at regular intervals since the English language is an evolving phenomenon, and teacher education programs, if designed with state-of-the-art methodologies and materials, will equip the teachers with the latest innovations in ELT pedagogy.

Conclusions and implications

Recognizing the enormous importance of students’ proficiency in spoken English in the private universities in Bangladesh, this study examined the current state of this proficiency, investigated the underlying reasons, and explored possible solutions. The complex nature of speaking, inappropriate application of instructional methods, teachers’ lack of proficiency in spoken English, students’ psychological factors, sociocultural factors, students’ inadequate linguistic resources, L1 interference, and large class size were identified as the barriers that impede students’ development of speaking skills in English. Plagued by these problems, both teachers’ and students’ efforts result in unsatisfactory outcomes.

However, as ways to overcome these problems, the study has offered some solutions that include: using task-based and cooperative learning, improving family support, encouraging learners to notice English usage and use in spoken English, integrating listening to speaking instruction, teaching speaking through collocation, integrating ICT applications, promoting SRL, and strengthening teacher education. If these are implemented effectively, students might be able to surmount the challenges explained earlier.

Critical understanding of the reasons identified by the study and the remedies suggested by the participants will enable the members of the operating trusts of the private universities, members of the curriculum development and revision committees, and the practitioners to adopt practical approaches to ensure students’ effective learning of spoken English. If the study results are taken into account and aligned with the EMI policies of the private universities in Bangladesh, they offer insights that can enable the stakeholders to advance their EMI goals. Moreover, the findings revealed that the causes identified in the study and the remedies recommended by the participants also apply to spoken English in many other regions of the world. Therefore, facilitating teacher education programs on task-based and cooperative learning will take success in learning spoken English a step forward in Bangladesh and beyond. Specifically, private universities may take initiatives to include communities in the advancement of education through strengthening family support programs. Workshops on noticing, listening-oriented instruction of speaking, and collocation in L2 acquisition will build the capacity among teachers and students to take advantage of these procedures. Educators can capitalize on the inclination of the learners towards ICT and can develop an ICT-oriented curriculum, instructional methods, materials, tasks and activities, and online modes of assessment. Most importantly, private universities in Bangladesh and educational institutions in any part of the world may ensure long-term gain if they can implement their spoken English courses through SRL instruction since SRL transforms students into lifelong independent learners.