1 Introduction

Audio learning has a long history in distance education (DE) and is still used in classroom teaching and general radio programs. Even though audio is often termed the unappreciated medium [7], there has been a good deal of research on its unique pedagogical characteristics [7, 31], which have remained remarkably constant over a fairly long period. When used in combination with other media, audio is usually much more impactful, as pointed out by the research on multimedia learning. Nevertheless, [7] rightly pointed out that audio presents several merits on its own.

With the advent of Adaptive Context-Aware Learning Environments (ACALEs), interest in audio-only learning has resurged as a way to address the physical limitations of mobile devices, mobility and networking issues [30], and personalization and personal learning styles [32]. Besides, the COVID-19 pandemic forced educational institutions to deliver online courses overnight to ensure continuity, even where students do not have decent internet connectivity or bandwidth. Fortunately, ACALEs adapt their courses at every moment based on information about the learner profile (learning style, cognitive style, cognitive state) and the learner context (connectivity, mobility, etc.) [21]. Adaptivity deals with taking learners’ situations, educational needs and personal characteristics into consideration when generating appropriately designed learning experiences and content [34]. However, one major challenge is the creation of suitable learning materials which adapt to the context [10, 23, 42].

1.1 Problem statement

ACALEs must sometimes switch to audio-only content when circumstances are not appropriate for conventional multimedia learning, namely: (1) learner characteristics such as illiteracy, learners originating from oral cultures [40] or a verbal learning style [24]; (2) connectivity issues such as no internet, intermittent connectivity or a network that does not support multimedia; (3) smartphone accessibility, where learners resort to basic phones since they cannot afford expensive ones.

Hence, it is important for both theorists and practitioners to understand how to design audio-only content for ACALEs as a substitute for multimedia instructional design. They need prescriptions on how to effectively design instructional audio to compensate for the multimedia instructions and enhance teaching and learning in various settings.

The systematic literature review on effective audio instructional design characteristics by [1] found a striking absence of systematic, holistic and practical instructional audio design guidelines in the existing literature. This is also confirmed by the work of [4, 5, 11, 30], according to which no sustained scholarly attention has been given to instructional audio to enhance students’ learning experience. Practical guidelines on auditory information design are unavailable, leaving many instructional designers to rely on their own experience when designing auditory information. The authors in [1] derived a list of twenty-three (23) characteristics of effective audio-only instruction, grouped under the headings Organisation, Content, Retention, Auditory and Technology (OCRAT), as per Table 1 in Appendix. However, these guidelines were not empirically tested. This leads to our research question:

RQ: Are the proposed OCRAT audio-only instructional design guidelines empirically valid to be prescribed in ACALEs for enhanced teaching and learning?

1.2 Motivation

The motivation driving our research is to provide a groundwork on which interactive audio courses can be methodically developed to suit the ACALEs framework. There is a need to design auditory learning content that adapts to context-aware learning with respect to personal, spatio-temporal and environmental parameters when the context is not appropriate for multimedia learning. Besides, educators need guidance on how to adapt their content to an audio-only medium so as to maintain teaching and learning effectiveness when the context is constrained to a low-bandwidth network.

The purpose of this paper is to empirically validate these derived guidelines and to provide designers with a prescriptive audio-only instructional design framework that guides audio content creation in ACALEs. Audio content was created for ten (10) courses comprising forty (40) lessons/chapters using these guidelines and live tested on groups of 225 and 180 participants drawn from literate and semi-literate populations in both Mauritius and India. Survey data were collected and empirically validated using inferences on population proportions based on the binomial distribution.

1.3 Paper organization

This paper is organized as follows. Section 2 provides a detailed description of the methodology adopted for live testing the OCRAT guidelines in terms of population identifications, design of audio courses and performing the survey. Section 3 validates the guidelines using inferences on population proportions based on the binomial distribution. Section 4 provides a detailed description of the validated guidelines substantiated by additional literature. Finally, Sect. 5 provides the conclusion.

2 Methodology

This section describes the research approach and the corresponding methodology which was adopted to empirically test and validate the proposed OCRAT framework. It responds to the research question by confirming the determinants influencing audio-only instructional design in ACALEs. It discusses the research context and setting, the targeted populations, the testing procedures, the ACALEs platform used, surveys administered and data analysis methods that have been used.

2.1 Testing the framework requirements

The methodology used to test the framework was as follows:

  1. Design audio course materials according to the proposed OCRAT audio instructional guidelines on two identified domains, namely Agriculture and Fisheries;

  2. Identify semi-literate and literate population groups for testing the audio-only courses;

  3. Live test the audio courses on the identified population groups via an ACALEs system;

  4. Administer a survey to the participants and collect their views on the efficacy of the proposed framework;

  5. Validate the guidelines of the OCRAT framework using inferences on population proportions based on the binomial distribution.

2.2 Population identification for testing the framework

To validate the framework and its underlying hypotheses, real-life testing had to be done. In education, it is sometimes difficult to obtain a representative sample [33]. For our test cases, we targeted both the literate and low-literate populations for a wider universal acceptance. Population groups had to be identified. To demonstrate the flexibility and universality of the framework and its accompanying guidelines, we did not limit ourselves to only one population group from a specific sector but targeted several population groups. Planters and fishermen are usually less educated and they were targeted as representative of the low-literate population. University students are those who have excelled in the academic process and hence were chosen as representative of the literate population. Therefore, our testbed considered planters and university students both in Mauritius and the state of Uttar Pradesh (UP), India, and fishermen in Mauritius as illustrated in Fig. 1.

Fig. 1 Mobi-MOOC Platform

2.2.1 Low-literate population

Semi-literate planters who are representative of the population were identified with the help of the extension services in both Mauritius and India to ensure randomness in the sample. In Mauritius, fishermen and planters were identified with the help of the Fisheries Training and Extension Centre (FiTEC) division and the Food and Agricultural Research and Extension Institute (FAREI), respectively. In India, planters were identified through “Kendras” (Extension services of Uttar Pradesh (UP) and Kanpur). The list of planters and fishermen classified by region was used. A call for participation was made three (3) times to those on the list, inviting them to act as volunteers for the live testing. Those who agreed formed part of the sample. We then counter-verified the sample to check for any anomalies and to ensure that it was representative of the various regions across Mauritius and the state of UP, India. The registration of participants was done manually.

2.2.2 Literate population

In India, several agricultural colleges and universities were contacted for participation through either mass emails, leaflets, or noticeboards. In Mauritius, students from the Author’s university were targeted through mass emails and noticeboards. Those interested were asked to register themselves directly through the web portal of our system.

2.3 Audio lesson design

Audio content creation was a challenge for this project due to the lack of systematic practical guidelines for audio-only courses. To meet the expectations of the selected population, the help of the respective extension services (FAREI, FiTEC, and the state of Uttar Pradesh (UP) extension services) was solicited to identify courses which were of interest and relevant to the planters and fishermen, for better engagement and participation. Students, on the other hand, were asked to follow the same courses as the fishermen and planters. Field experts for each course were identified to work on the content as per the OCRAT audio instructional guidelines proposed by [1]. A small workshop on how to develop audio courses based on OCRAT was conducted with the experts so that they could deliver the content in the required format.

Courses had to be designed to maximize the learning experience through the auditory medium only and to captivate the learners’ attention. The local language was used for course recordings, i.e., Creole for Mauritian planters, fishermen and students, and Hindi for Indian planters and students. The following pedagogical features were considered: (a) human attention span, (b) learner cognitive capacity, (c) content and (d) audio recording and voice quality. The course was split into seven (7) chapters based on Miller’s Law [38], which states that people can retain 7 ± 2 objects in working memory. Navigation capabilities such as next, previous and pause buttons catered for the learners’ pace of learning. To promote discrimination of new concepts, end-segment questions were included at the end of each chapter [25]. The lessons were designed to last 120 s on average to fit the human attention span [46]. This is supported by [26], which states that students are much more engaged within the first three minutes of video lectures. Course development was a tedious process which took around three months (part-time). Many revisions were made to align the content with the proposed instructional guidelines. The content was designed to be relevant and concise, and it incorporated clearly defined and realistic objectives as per Keller [28] and Bloom’s Taxonomy [12] for higher-level thinking. The content was vetted by at least three (3) extension officers before the audio recording of the lessons. It is to be noted that courses with homogeneous topics were devised with approximately the same difficulty level.

Once the content was finalized, it was recorded. This was another important process, and we followed the auditory considerations proposed by the OCRAT guidelines. No background music was included so as not to distract the audience. A professional voice artist was engaged and was briefed on proper voice modulation in a conversational style with an emphasis on keywords. To avoid monotony, the voice was modulated in terms of its pace, pitch and volume. To maximize retention of the information presented, we used proper pacing and appropriate pauses between key pieces of information [8]. In total, ten (10) courses consisting of 40 audio lessons with a total duration of about 3 h 13 min were developed using the proposed OCRAT guidelines (Authors, 2020), as shown in Table 2.

2.4 Courses delivered

Table 3 in Appendix shows the various courses held as case scenarios under the Audio-MOOC systems implemented by the authors [1] and their number of registered participants.

2.5 ACALEs platform used

The ACALEs platform used for this study was developed by [1] and is termed the Mobi-MOOC platform, illustrated in Fig. 1. It is geared toward adapting content according to connectivity bandwidth. For instance, during good connectivity, multimedia content (full video stream) is delivered. As connectivity bandwidth diminishes, video quality is reduced. In the event of very poor or intermittent connectivity, content is downgraded to PowerPoint recordings and ultimately to audio-only content in cases where multimedia content cannot be delivered. Besides, it also provides a learning-over-phone feature in case the internet is not available. The Mobi-MOOC platform (https://mobi.mookit.co/) is a mobile-based, contextually aware system developed by our research group at IIT Kanpur Media Lab.
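The downgrade path described above can be sketched as a simple tier selection. This is only an illustrative sketch: the numeric bandwidth cut-offs, the function name `select_tier` and the enum `ContentTier` are hypothetical and are not taken from the Mobi-MOOC implementation, which is not described at this level of detail here.

```python
from enum import Enum


class ContentTier(Enum):
    """Delivery tiers named after the downgrade path described in the text."""
    FULL_VIDEO = "full video stream"
    REDUCED_VIDEO = "reduced-quality video"
    SLIDE_RECORDING = "PowerPoint recording"
    AUDIO_ONLY = "audio-only content"
    PHONE_CALL = "learning over phone"


# ASSUMPTION: hypothetical minimum bandwidths in kbit/s, ordered from best to
# worst tier. The source does not state any numeric thresholds.
THRESHOLDS_KBPS = [
    (2000, ContentTier.FULL_VIDEO),
    (500, ContentTier.REDUCED_VIDEO),
    (150, ContentTier.SLIDE_RECORDING),
]


def select_tier(bandwidth_kbps: float, internet_available: bool = True) -> ContentTier:
    """Pick the richest tier the measured bandwidth can sustain."""
    if not internet_available:
        # No internet at all: fall back to the phone-based feature.
        return ContentTier.PHONE_CALL
    for minimum, tier in THRESHOLDS_KBPS:
        if bandwidth_kbps >= minimum:
            return tier
    # Connectivity too poor for any multimedia content.
    return ContentTier.AUDIO_ONLY


if __name__ == "__main__":
    for kbps in (3000, 800, 200, 40):
        print(kbps, "->", select_tier(kbps).value)
```

Keeping the thresholds in an ordered table means the downgrade order stays explicit and can be tuned without touching the selection logic.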

For this study, the platform was restricted to delivering only audio lessons. A learner is welcomed by an Interactive Voice Response (IVR) menu where they can choose a particular audio course. Upon selection, the course is played. The system is equipped with navigation capabilities such as play, pause, forward, backward and next chapter to cater for the learners’ pace of learning. Besides, assessments such as quizzes and tests can be taken via the mobile phone itself. All of this is done exclusively over the audio medium, via either a smartphone for the literate population or basic phone calls for the illiterate population. An in-built analytics system tracks the learners’ progress on the course. Hence, this platform was deemed ideal for hosting and dispensing audio courses developed using the OCRAT framework.

2.6 Survey design

2.6.1 Survey goals

At the beginning of this research, there were no clear-cut guidelines due to a lack of empirically tested research literature to address our main research question formulated in Sect. 1.1. In the absence of a given direction, in-depth research was made to provide useful guidelines for the framework. The OCRAT framework was developed based on the guidelines and assumptions derived through a systematic research review by the Authors [1] without an empirical test. Hence the aims and objectives of this survey are to validate these proposed initial guidelines and assumptions empirically.

2.6.2 Survey population

The survey population was derived from the population identified for testing the framework, as elaborated in Sect. 2.2. It consisted of planters and university students in both India and Mauritius and fishermen in Mauritius who followed the Audio-MOOC courses on the system either fully or partly. Table 4 shows the different groups of participants who registered for and completed an Audio-MOOC course. All those who followed the course were asked to complete the survey, and the percentage completion for the survey is as per Table 4.

2.6.3 Survey methodology

Due to the nature of the information we wanted to collect, especially the learners' opinions about the way the audio course was designed and delivered on the Audio-MOOC platform, a cross-sectional survey was favored. Figure 2 shows diagrammatically the division of the sampled population into literate and low-literate groups as well as the methodology adopted to survey them. Face-to-face or group interviews were conducted with the low-literate population. Those who were absent from the face-to-face interaction were surveyed by phone. The literate population was sent a link to an online survey.

Fig. 2 Survey Sample and Methodology adopted

2.6.4 Survey questionnaire design

To confirm the acceptability, usefulness and effectiveness of the OCRAT guidelines and the hypotheses made, we designed a questionnaire for our main research question. The survey questions are as per Table 6 and they are cross-referenced with the OCRAT guidelines in Table 1. The questionnaire was refined and vetted by several experts in various fields, as further elaborated in Sect. 2.6.5. It was adapted to the literacy level of the targeted population as described below.

2.6.4.1 Likert-scale range adapted according to population literacy

It is to be noted that the Likert scale varied depending on the target population group. For planters and fishermen having low literacy, a 3-point Likert scale was used because of the difficulty participants had in distinguishing between the items, while a 5-point scale was used for the student population surveyed. Emphasis was on group data rather than individual data. Besides, using more points than the group could understand might have resulted in increased variability, but not necessarily increased validity or reliability [9, 18].

2.6.4.2 Language used adapted according to population literacy

English was used for the literate population group. However, for the low-literate group, native languages were used: Creole for the Mauritian population (Creole-speaking group) and Hindi for the Indian population (Hindi-speaking group).

2.6.5 Expert reviews

For this research work, the expert advice in the following domains was solicited:

  1. Instructional Design: The OCRAT instructional design guidelines by the authors [1] were reviewed by two (2) multimedia instructional experts from the Centre for Innovative and Lifelong Learning (CILL) at the Author’s university. They proposed some minor amendments, but overall they found it novel, comprehensive and fit for the purpose of audio courses.

  2. Surveys and Statistical Analysis: One (1) expert in surveys and statistics from the Author’s university was asked to review and refine the questionnaire.

2.6.6 Survey administration

As discussed in Sect. 2.2, the methodology used to survey the learners depended on their respective literacy groups. The survey was administered to all participants who followed the course. Online surveys were preferred for the literate population, while group, face-to-face and phone interviews were conducted for the low-literate population, as further described in the following.

2.6.6.1 Literate population

For the student population, we used online surveys (in the English language), which were quicker and easier. Around 80% of the student population completed the survey.

2.6.6.2 Low-literate population

For planters and fishermen, since they were low-literate, we conducted group/individual interviews as well as phone surveys in their local language. This was a much more time-consuming process but rather effective. After each course, we arranged a group meeting with the planters and fishermen through their respective extension services. With the help of extension officers, the interviews were conducted in groups or individually depending on the participants. Each question was read out to the group in the local language and the participants were asked to choose their appropriate options. Those who could not follow the pace or required special attention were individually assisted. Phone surveys were conducted for those who could not attend the interview sessions. The survey response rate was around 90%, higher than that of the student population. The individual attention and phone surveys made the difference. Those who did not respond were absent, unavailable via phone or busy with their work schedule.

2.6.7 Data analysis

Data collected were analyzed using IBM SPSS (version 20.0.0) and Kibana. Respective statistical tests that best fit the data were applied depending on the research design and the type of data collected for coherence.

2.6.8 Validity of the instruments

To assess the robustness of the study, we provide evidence to show that the instruments used to measure the variables are valid and reliable, i.e., that they measure what they are supposed to measure and do so with a suitable degree of accuracy [19]. The validity and reliability of the instrument are reported for each of the analyses done. The survey constructs were devised to answer our research question. Content validity was established based on the opinion of experts in the field of instructional design and statistics, as explained in Sect. 2.6.5. The internal consistency and reliability of the questionnaire were also evaluated. Cronbach's alpha was 0.821, which is considered acceptable since it is above 0.7.
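For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch shows how Cronbach's alpha is conventionally computed from a respondents-by-items score matrix, using the standard formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\). The study itself used SPSS; the function name `cronbach_alpha` and the sample responses are hypothetical and are not the survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # HYPOTHETICAL 5-point Likert responses (4 respondents x 3 items).
    sample = [
        [4, 5, 4],
        [3, 3, 4],
        [5, 5, 5],
        [2, 3, 2],
    ]
    print(round(cronbach_alpha(sample), 3))
```

A value above 0.7, as reported for the questionnaire, is the usual rule-of-thumb cut-off for acceptable internal consistency.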

3 Validation of the OCRAT instructional design guidelines

This section validates the OCRAT audio instructional design guidelines by analyzing the survey responses collected to answer our research question using the basis of the population proportion. The results proved to be very conclusive, and these will provide a strong foundation for the development of audio courses in the future.

3.1 Methodology

Data were standardized to values pertaining to a range between zero and one. The inference was made on the basis of the population proportion. Figure 3 depicts both the statistical hypothesis and how Likert-scale data were standardized.

Fig. 3 Diagram illustrating both hypothesis and standardization
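Since the 3-point and 5-point Likert scales must be brought onto a common footing before proportions can be compared, one plausible way to standardize them to the range zero to one is a min-max rescaling followed by a cut at the midpoint. This is an assumption made for illustration only: the exact mapping used in the study is the one depicted in Fig. 3, and the helper names `standardize` and `is_agreeable` are hypothetical.

```python
def standardize(score: int, points: int) -> float:
    """Rescale a Likert score in 1..points onto the interval [0, 1]."""
    if points < 2 or not 1 <= score <= points:
        raise ValueError("score must lie between 1 and the number of scale points")
    return (score - 1) / (points - 1)


def is_agreeable(score: int, points: int) -> bool:
    """ASSUMPTION: a response counts as agreeable if it lies above the midpoint."""
    return standardize(score, points) > 0.5


def count_agreeable(scores, points: int) -> int:
    """Number of agreeable responses, i.e. the observed count used in a proportion test."""
    return sum(is_agreeable(s, points) for s in scores)


if __name__ == "__main__":
    # HYPOTHETICAL responses, not the survey data.
    low_literate = [3, 3, 2, 3, 1]      # 3-point scale
    literate = [5, 4, 3, 2, 4, 5]       # 5-point scale
    print(count_agreeable(low_literate, 3), "of", len(low_literate))
    print(count_agreeable(literate, 5), "of", len(literate))
```

Under this sketch, a neutral answer (2 on the 3-point scale, 3 on the 5-point scale) maps to 0.5 and is not counted as agreeable.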

The two (2) statistical hypotheses are given below. The first hypothesis is for the Likert-scale question items, while the second is for question items having fixed answers such as “just fine”, “too short” or “too long”. We made the following inferential assumption: since the sample sizes from all the different populations were large (at least 30), we approximated the binomial distribution by the normal distribution for our inferences.

3.1.1 Hypothesis for Likert scale items

H0: \({p}_{0}=\frac{1}{2},\)

Ha: \({p}_{0}>\frac{1}{2},\) where \({p}_{0}=\) the population proportion who are agreeable to Likert-scale items tested.

The rejection criterion for the null hypothesis of the right-tailed test is given by Formula 1, with data from students, fishermen and planters from both Mauritius and India combined. Equivalently, the OCRAT guidelines are considered significant if

$$ x_{obs} > \frac{n + z_{\alpha} \sqrt{n}}{2}, $$
(1)

where \(z_{\alpha}\) is the critical value at either the 5% or 10% significance level, \(x_{obs}\) is the observed number of agreeable responses and \(n\) is the sample size.
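Formula 1 follows from the normal approximation to the binomial distribution: under H0 the count of agreeable responses has mean \(n/2\) and standard deviation \(\sqrt{n}/2\), so the right-tailed threshold is \(n/2 + z\sqrt{n}/2\). The sketch below computes that threshold; the function names and the sample size of 100 are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_threshold(n: int, alpha: float) -> float:
    """Critical count from Formula 1: (n + z * sqrt(n)) / 2 under H0: p = 1/2."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value of N(0, 1)
    return (n + z * sqrt(n)) / 2


def is_significant(x_obs: int, n: int, alpha: float = 0.05) -> bool:
    """Reject H0 in favour of p > 1/2 when the observed count exceeds the threshold."""
    return x_obs > right_tailed_threshold(n, alpha)


if __name__ == "__main__":
    n = 100  # HYPOTHETICAL sample size, not a figure from the study
    for alpha in (0.05, 0.10):
        print(f"alpha={alpha}: threshold={right_tailed_threshold(n, alpha):.2f}")
    print(is_significant(59, n), is_significant(58, n))
```

With the hypothetical sample of 100 respondents, at least 59 agreeable responses would be needed for significance at the 5% level.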

3.1.2 Hypothesis for items with specific answer choices

The second hypothesis covers questions with specific answer choices such as “just fine” or “not fine” (“too short” or “too long”). These were tested using a two-tailed test given by

H0: \({p}_{0}=\frac{1}{2}\) (just fine),

Ha: \({p}_{0}\ne \frac{1}{2}\) (not fine [too long + too short]).

The rejection criteria for the null hypothesis, i.e., questions with specific answer choices were significantly “not fine”, were based on Formula 2:

$$ x_{obs} > np_{0} + z_{\alpha /2} \sqrt{np_{0} q_{0}} \quad \text{or} \quad x_{obs} < np_{0} - z_{\alpha /2} \sqrt{np_{0} q_{0}}, $$
(2)

where \(p_{0}\) = the proportion of the population having “just fine” as answers,

\(q_{0}\) = the proportion of the population having “not fine” as answers, i.e. “too short” or “too long”.

Table 5 shows the threshold or Critical Value (CV) for each population group at 5% and 10% significance levels for a one-tailed test. Furthermore, these thresholds at 5% significance level are the corresponding thresholds for a two-tailed test using a 10% significance level.
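The two-tailed criterion of Formula 2, and the remark that the one-tailed 5% thresholds double as two-tailed 10% thresholds, can be checked numerically. The sketch below is illustrative only: the function names and the sample size of 100 are hypothetical and do not reproduce the values in Table 5.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_bounds(n: int, alpha: float, p0: float = 0.5):
    """Lower and upper critical counts from Formula 2."""
    q0 = 1 - p0
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha/2 in each tail
    half_width = z * sqrt(n * p0 * q0)
    return n * p0 - half_width, n * p0 + half_width


def rejects_null(x_obs: int, n: int, alpha: float = 0.10, p0: float = 0.5) -> bool:
    """Reject H0: p = p0 when the observed count falls outside the two bounds."""
    lower, upper = two_tailed_bounds(n, alpha, p0)
    return x_obs < lower or x_obs > upper


if __name__ == "__main__":
    n = 100  # HYPOTHETICAL sample size
    lower, upper = two_tailed_bounds(n, alpha=0.10)
    one_tailed_5pct = (n + NormalDist().inv_cdf(0.95) * sqrt(n)) / 2
    print(f"two-tailed 10% bounds: {lower:.2f}, {upper:.2f}")
    print(f"one-tailed 5% threshold: {one_tailed_5pct:.2f}")
```

The upper two-tailed bound at 10% coincides with the one-tailed threshold at 5% because both use the same critical value of the standard normal distribution.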

Table 6 shows the survey questions asked mapped with the OCRAT guidelines. The results are shown in the form of total frequency. An item highlighted in red signifies that it does not meet the threshold.

The combined survey results from all populations, shown in the total column of Table 6, confirm that our assumptions on the OCRAT audio instructional design guidelines hold true, since all the frequencies were above the threshold at the 5% significance level except for one item. Both Mauritian and Indian students did not find the course engaging and motivating. This is most probably because they were IT students, whereas the course was on agriculture and did not relate to their areas of interest.

3.2 Limitations

Our research used a mixed-method approach to survey the population. The results obtained should be interpreted with care because of some limitations in both approaches.

3.2.1 Quantitative approach

  • This research makes use of hypothesis testing to validate initially assumed guidelines on instructional audio. While interpreting the results, care should be taken in terms of the limitations or assumptions and interpretations or conclusions derived from the hypothesis testing at a reasonable \(\alpha \%\) significance level. Results coming from the significance tests are based on probabilities and hence cannot be expressed with full certainty. Even though our testing has been done on a significant sample size, the statistical inferences are subject to either a Type I or a Type II error.

  • The strength of a statistical measurement relies on the randomness of the sample. To avoid any bias from our side, it was the responsibility of the extension services to identify the targeted population. We counter-verified the sample and their methodology for selection. Hence, the validity of the sample relies on the extent to which the sample selection was rigorous.

  • A major part of the survey questionnaire uses the Likert scale to collect opinions. It may suffer from response biases, which respondents display independent of the content of the questions. Besides, there is a long-running issue on how to consider the data collected whether continuous or discrete [14, 41]. In this research, we have used inferences on population proportions based on the binomial distribution.

  • Since the target population focuses on farmers, fishermen and the Author’s university students, care must be taken while generalizing the results to other sectors or institutions.

3.2.2 Qualitative approach

  • The data collected from the interviews have been processed and analyzed through categorization and frequency analysis. Qualitative analysis through observations would have been too time-consuming. Nevertheless, through face-to-face and group interviews on the open-ended questions, we managed to grasp the participants’ opinions on the system which we corroborate with the previous quantitative analysis.

3.2.3 Population literacy assumption

  • We relied solely on the information provided by the extension services to determine whether a participant was low-literate or not. Also, we assumed that the University students were literate. Ideally, it would be good to have a pre-treatment test to assess literacy and language comprehension.

4 Discussion on the validated OCRAT framework

As a follow-up to the work of [1], and based on the empirical testing carried out in Sect. 3, which led to conclusive results, this section provides a detailed discussion of the findings and components of the OCRAT framework, which can guide instructional designers in developing audio-only courses. The validated design strategies for audio-only learning aim at filling the research gap pertaining to more holistic and prescriptive guidelines for audio content creation, especially in DE. Each instructional guideline under the OCRAT headings is further studied and substantiated with additional literature in this section. Reference is made to Table 6 for the percentage (%) results from the semi-literate and literate populations.

4.1 Organization

Organization emphasizes the planning aspects of the instructional materials to provide a simple, clear and precise direction to learners in their learning process. It groups the guidelines discussed below:

  1. Planning: We considered planning an important component of audio design. It is instrumental in instructional design models like the ADDIE model [39] and [53]. Key characteristics such as knowing the audience and their skill level, the specific aims and objectives of the program, proper content and structure, the audio form and format to be used and any supporting materials to be included were identified from inception. All these contributed to students responding favourably to the survey (94% semi-literate, 85% literate, 94% Creole-speaking and 93% Hindi-speaking).

  2. Clear learning objectives and outcomes: Having clearly defined objectives and learning outcomes at the very start provides learners with a sense of direction, ownership and responsibility for their learning, guiding their instructional experience [28]. We relied on [35] to design our learning objectives to include three components, namely Performance, Conditions and Criteria, and aligned these learning objectives with Bloom’s taxonomy to allow the presentation of ideas and concepts at many different levels. These components seemed to have a positive impact on both the semi-literate (96%) and literate (87%) groups. The same was noticed for the Creole-speaking (96%) and Hindi-speaking (95%) groups.

  3. Segmentation: According to Miller's Theory [38], short-term memory is characterized by limited capacity, and information should be organized into units or “chunks” (7 ± 2 chunks) that an individual can hold in consciousness at a given point in time. This is reaffirmed by the segmenting principle of several authors [36, 44, 50]. Our audio course was therefore broken down into 5 to 9 chunks. This proved to positively influence students’ learning abilities (92% semi-literate, 86% literate, 93% Creole-speaking, 88% Hindi-speaking). It is in line with the research of [37], which states that segmenting may have stronger effects for learners with low rather than high working memory capacity and for low-achieving rather than high-achieving learners.

  4. Structure and sequence content: Once learning objectives and content had been segmented into achievable smaller units, we structured them in a logical sequence, as recommended by the major Instructional System Design models [22, 39, 53]. As more complexity is built up, it is important to relate new materials to previously learned ones (make a recap) and to show how they are linked to the next lessons. This provides a structure and flow to the content which learners can easily relate to. We also used mnemonics, discussed in subsection 1.4, to structure content for easy retention. This explains the positive response obtained (94% semi-literate, 90% literate, 95% Creole-speaking and 93% Hindi-speaking).

  5. Auditory cues/Verbal Map: Thomas [48] suggests making script writing an important process in the absence of visual cues. He suggests “writing for the ear”, i.e., presenting the material so that it can be easily followed and understood by students the first time they hear it. For the audio course design, a verbal map was included to help the audience better follow the course and situate themselves in their learning process [6, 22, 30, 37]. Frequent signposts were introduced to help students locate where they had been, where they were and where they were heading. This allows students to easily schematize and visualize the course content without too much mental effort, as confirmed by the response (92% semi-literate, 80% literate, 93% Creole-speaking, 88% Hindi-speaking).

4.2 Content

Content suggests ways to plan and design audio content for more effective learning and retention, engaging and motivating the learner while minimizing mental effort.

  1.

    Clear and simple language: A clear, simple and local language without technical jargon, as advocated by [15, 30], was used to facilitate auditory processing so that learners need not put extra mental effort into comprehending the meaning of the message. This was confirmed by the positive response obtained from the survey (96% semi-literate, 84% literate, 97% Creole-speaking, 95% Hindi-speaking).

  2.

    Concise Information: The psychology of learning can be summed up as ‘less is more’ [15]. The weeding concept introduced by [36] was used as a load-reduction technique to eliminate extraneous material and reduce incidental processing in the design of the audio materials. The materials were concise, coherent and coordinated for effective learning to take place, as confirmed by the students (87% semi-literate, 80% literate, 93% Creole-speaking and 74% Hindi-speaking).

  3.

    Engage and motivate: Strategies from Keller’s ARCS model of motivation [28] were adapted to the development of engaging audio materials. We focused on well-organized, relevant materials with achievable goals and a gradual, natural flow from the known to the unknown as motivational strategies to engage learners throughout the course and make them realize that they are genuinely expanding their knowledge base. Some 95% of the semi-literate group, 96% of the Creole-speaking group and 93% of the Hindi-speaking group found the course engaging and motivating, compared to only 56% of the literate population. Although the literate group's result did not pass the test at the 5% significance level, it did pass at the 10% level. The lower score arose because the courses on agriculture and fisheries were not aligned with the learning objectives of the university students.

  4.

    Relevant & accurate: Contents were designed to convey the perceived present worth and future usefulness of the knowledge to be acquired. Relevant and accurate information was contextualized to help learners learn faster [8, 28]. This strategy paid off for the semi-literate (94%), Creole-speaking (96%) and Hindi-speaking (93%) groups. However, only 68% of the literate population found the content relevant, since the courses were not aligned with their academic objectives, as with the Engage and motivate guideline above.

4.3 Retention

Retention suggests ways to maximize content retention by the learner, through recall and remembrance.

  1.

    End-segment Questions/Practice: Echoic memory, the only sensory memory engaged in audio learning, is limited, and the information it holds decays rapidly. End-segment questions were therefore included to allow for better information retention and cognitive processing. Questions encourage deeper processing, assess student recall and understanding, provide a basis for lesson sequencing, keep the student attentive and promote discrimination of new concepts [25]. This was confirmed by the survey results (94% semi-literate, 85% literate, 94% Creole-speaking, 93% Hindi-speaking).

  2.

    Assessment: Assessment is an integral part of instruction as it determines whether or not the learning outcomes and objectives have been met [2, 6]. Well-designed assessment can encourage active learning [17] and focus on opportunities to develop students' ability to evaluate themselves, make judgments about their own performance and improve upon it. It reinforces the major course objectives and reactivates newly formed schemas [29]. The inclusion of a test at the end of the course was well accepted by the students (94% semi-literate, 84% literate, 93% Creole-speaking, 95% Hindi-speaking).

  3.

    Rehearsal/playback: Repetition is the most intuitive principle of learning. We designed instruction in segments which can be easily integrated into our ACALEs platform. These segments can be replayed or navigated through the platform, allowing the learner to rehearse via technology. This offloads working memory processing, without the need to use the phonological loop and subvocal rehearsal [4], and facilitates recall. Moreover, playback allows for remediation where learners did not meet the necessary assessment standards. It facilitates content recognition and cued recall. This feature was heavily used according to our tracking analytics, as confirmed by the survey (93% semi-literate, 84% literate, 94% Creole-speaking, 91% Hindi-speaking).

  4.

    Use of worked examples: A worked example is a step-by-step demonstration of how to perform a task or solve a problem [16]. With worked examples, students learn more robustly and efficiently and need less learning time, especially when learning new tasks [15]. Worked examples reduce extraneous load, freeing working memory to allocate resources to the generative learning process and thereby allowing the learner to learn the steps in problem-solving. They also engage and motivate students to self-explain more than they do during problem-solving, as confirmed by the survey results (80% semi-literate, 71% literate, 86% Creole-speaking, 67% Hindi-speaking).

  5.

    MNEMONICS: We also made use of MNEMONICS, a learning strategy for encoding information to make it more memorable and to facilitate information retrieval, summarized as the 3Rs (Recoding, Relating and Retrieving) according to Levin (1993). They are used at the initial stage of knowledge acquisition when learning new, abstract and/or complex concepts. Besides, they also force one to pay attention to relevant features of the material and to ‘process’ the material more deeply than by simply rehearsing it, as confirmed by the survey (91% semi-literate, 82% literate, 94% Creole-speaking, 83% Hindi-speaking).

  6.

    Feedback (Assessment, Q&A): Feedback is typically described as a guidance technique [43]. Different types of feedback provide different kinds of instructional support and influence performance. Feedback given as part of formative assessment helps learners become aware of any gaps between their desired goal and their current knowledge, understanding or skill, and guides them through the actions necessary to attain the goal [49]. We included a Q&A session over the audio itself, whereby a tutor could respond to students’ queries. This was well appreciated by the students (94% semi-literate, 82% literate, 94% Creole-speaking, 93% Hindi-speaking), who stated that the Q&A session enhanced their understanding of the topic and eliminated misconceptions.

4.4 Auditory consideration

Auditory consideration provides guidelines for the effective use of sound in devising audio learning materials to engage and align with the learners’ cognitive consideration and avoid unnecessary distractions in their learning process.

  1.

    Avoid environmental sound: Background music and environmental sounds create unnecessary cognitive load and distract from, rather than increase, learning [3, 15]. Elements such as music, beeps, applause and other ear candy, which entertain rather than educate, were avoided so as not to unintentionally degrade the learning experience. For the development of our audio courses, we ensured that our recording environment was quiet to avoid noise. Post-recording editing was performed to obtain clear, noise-free instruction that avoids distraction and supports concentration and focused attention. The students confirmed this as an important recording criterion (92% semi-literate, 86% literate, 93% Creole-speaking, 90% Hindi-speaking).

  2.

    Voice Modulation: People learn better when words are spoken in a standard-accented human voice than in a machine voice or foreign-accented human voice. The human voice is intended to prime a sense of social presence in learners [37]. People learn better, and their learning experience becomes more personal, when the words are in a conversational rather than formal style, feeling like a dialogue instead of a lecture. Voice emphasis on keywords draws students’ attention to the important material in the lesson and gives indications about how it is organized. It provides signaling cues [36], which can reduce extraneous processing by allowing the learner to concentrate on the relevant aspects of the course. For the development of the audio courses, a professional voice-over artist was hired. This strategy paid off and is reflected in the positive responses obtained (96% semi-literate, 85% literate, 98% Creole-speaking, 93% Hindi-speaking).

  3.

    Speed and pause: The speed at which auditory information is presented affects how effectively it can be processed [8]. If presented very quickly, it is difficult to register, while if presented slowly, the learner can sometimes forget bits and lose the point of what is said. For our audio courses, we asked the voice artist to account for speed and pauses in the delivery. Pauses between key pieces of information help the learner remember each piece presented. They give working memory time to register and digest the information that has been presented and allow auditory processing to schematize the information in long-term memory. The learner also has the time and capacity to organize and integrate information before the next segment [36]. This feature was well appreciated by the learners (94% semi-literate, 85% literate, 95% Creole-speaking, 93% Hindi-speaking).

  4.

    Audio duration and attention span: To our knowledge, there has been no study on the ideal duration of an audio lecture. However, some closely related work exists on videos and human attention span. Research by [20] found that adults can only sustain attention for about 20 min. In the podcast literature, several authors recommend a podcast duration of 5–10 min, not exceeding 15 min, for ease of use and to hold the learner’s attention [13, 47, 50]. A pattern has also been observed in which the first spike in reported attention lapses occurred just 30 s into a lecture segment, likely reflecting a “settling-in” period of disruption. [26] show how production decisions affect student engagement by analyzing 6.9 million video-watching sessions on MOOCs. They found that students were much more engaged within the first three (3) minutes of video lectures and that mean engagement time dropped significantly after 6 min, regardless of total video length. For our audio courses, each audio lesson/segment had a duration of 1–2 min. In their responses, students found this ideal for their learning since the duration was within their attention span (90% literate, 81% semi-literate, 91% Creole-speaking, 88% Hindi-speaking).

  5.

    Audio quality: Audio quality directly affects the concentration and motivation of learners. Good audio quality is pleasant to hear and motivates a learner to continue and focus on the content [3]. Conversely, poor sound quality undermines the perceived professionalism of the course, which may ultimately lead learners to question the trustworthiness of the material. Moreover, it can make the material harder for students to understand and reduce their level of interest and participation. Our audio course content was developed using studio-quality, professional editing and post-production software, and we hired a professional voice-over artist for the recording to obtain better voice modulation. This feature was well appreciated by the learners (95% semi-literate, 87% literate, 96% Creole-speaking, 93% Hindi-speaking).

4.5 Technology

One of the drawbacks of conventional audio learning is the lack of interactivity with the material since it is usually considered uni-directional without control over the pace of learning, and does not cater for individual differences. Nevertheless, technology is often expected to facilitate knowledge construction [37, 45]. It is a key factor in capturing and maintaining learners’ attention, promoting learning and evaluating performance. Below are the technology strategies adopted in our ACALEs platform for building an effective audio learning environment:

  1.

    Interactivity: Interactivity in learning is a necessary and fundamental mechanism for knowledge acquisition and the development of both cognitive and physical skills. It has a significant impact on the effectiveness of the learning process. It is a key element of the course design process, adds outstanding value to distance education courses and has the potential to engage the learner in behavioral activities [51]. For our audio course, we incorporated the five types of interactivity identified in [52], namely dialoguing (through Q&A and feedback sessions), controlling (ACALEs features to control the pace of learning), manipulating (ACALEs features to control aspects of the presentation), searching (searching and selecting audio lesson options) and navigation (ACALEs navigation capabilities as discussed in Sect. 21). These features improved attention, encouraged reflection and reaction, and boosted motivation, engagement and retention, as confirmed by the survey response (96% semi-literate, 86% literate, 97% Creole-speaking, 95% Hindi-speaking).

  2.

    Navigation (Play, Pause, Stop, Back and Next): Navigation is a discrete form of interactivity [52] and incorporates two aspects of learner control, namely control over information (pacing) and control over content [27]. Pacing enables learners to self-pace their learning through the start, stop, pause and replay functions, while control over content involves selecting information units from menus, allowing learners to access information that is appropriate to their prior knowledge and that they need in order to construct a coherent mental model [27, 43]. We used the ACALEs navigation features (Play, Pause, Stop, Back, Next, Searching and Menu options) to interact with audio lessons/segments. This was a heavily used feature according to our tracking analytics, as confirmed by the students (96% semi-literate, 84% literate, 95% Creole-speaking, 97% Hindi-speaking).

  3.

    Universal interface: Technology contributes largely to the effective manipulation of the learning content. Complex technologies with a steep learning curve may inhibit and undermine the learning experience, hence demotivating the learner. The ACALEs platform we used allowed for a simple and universally accepted user interface similar to a mobile phone call: a user dials a number to access the audio course content. We chose this interface so that both semi-literate and literate users could use it easily and intuitively, irrespective of their technology literacy, background and educational level. It included fault-tolerant features, handled errors by providing proper feedback and minimized user-input errors. This explains the positive response from respondents concerning the user-friendliness and ease of use of the system (96% semi-literate, 85% literate, 96% Creole-speaking, 95% Hindi-speaking).

  4.

    Learning Analytics/Tracking: Tracking interaction with content and peers can provide valuable information for measuring the performance and effectiveness of learning. The inclusion of learning analytics/tracking in our audio course led to outcomes such as enhanced study performance, retention and course registration, as well as greater productivity and effectiveness in learning and teaching, as observed by the tutors. Reminders were sent to students who were lagging behind, urging them to follow their course more regularly. This is in line with the research outcomes of [54, 55] (93% semi-literate, 84% literate, 93% Creole-speaking, 93% Hindi-speaking).
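
The navigation and tracking strategies above can be sketched as a small state model. This is a minimal illustration, assuming a hypothetical `SegmentPlayer` class; the actual ACALEs platform interface is not described in the text, and the segment names are made up. Each control (Play, Pause, Stop, Back, Next) updates the playback state and appends an event to a log, which is the kind of data that tracking analytics could draw on.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SegmentPlayer:
    """Toy model of Play/Pause/Stop/Back/Next over audio segments.

    Hypothetical: not the ACALEs platform API, which the text does not specify.
    """
    segments: List[str]
    index: int = 0
    state: str = "stopped"  # one of: stopped, playing, paused
    log: List[Tuple[str, str]] = field(default_factory=list)  # (action, segment)

    def _record(self, action: str) -> None:
        # Every learner action is logged so usage can be analysed later.
        self.log.append((action, self.segments[self.index]))

    def play(self) -> None:
        self.state = "playing"
        self._record("play")

    def pause(self) -> None:
        if self.state == "playing":
            self.state = "paused"
            self._record("pause")

    def stop(self) -> None:
        self.state = "stopped"
        self._record("stop")

    def next(self) -> None:
        # Clamp at the last segment instead of wrapping around.
        self.index = min(self.index + 1, len(self.segments) - 1)
        self._record("next")

    def back(self) -> None:
        # Clamp at the first segment; supports replay and remediation.
        self.index = max(self.index - 1, 0)
        self._record("back")

    def replay_count(self, segment: str) -> int:
        """Number of times the learner navigated back to a given segment."""
        return sum(1 for action, seg in self.log if action == "back" and seg == segment)


player = SegmentPlayer(["intro", "soil", "irrigation"])
player.play()
player.next()
player.back()
player.pause()
print(player.state, player.index, player.replay_count("intro"))
```

Logging at the level of individual controls is a design choice made for this sketch: it lets replay counts per segment be derived afterwards without changing the player itself.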

4.6 Similar results for Creole- and Hindi-speaking group

Similar results were found for the Creole- and Hindi-speaking groups, with some slight variations, because both groups come from a semi-literate background. This confirms that, regardless of geographic and language disparities, the semi-literate groups from both countries behaved and responded similarly to this novel audio learning platform.

4.7 Difference in the results between the literate and the semi-literate population

Looking at the results, it is noticeable that, even though both the semi-literate and literate population groups responded positively to the audio learning guidelines, there is a disparity of around 10% between their results. This can be explained as follows. The literate group is accustomed to learning with multimedia features, which are the norm in ACALEs. Downgrading their learning to a single-sensory audio medium disrupts their usual learning pattern and necessitates adaptation. For them, resorting to audio learning is contextual, for instance in case of network connectivity problems. For the majority, it is not their medium of choice for learning but the second-best alternative.

On the other hand, the semi-literate group perceives audio learning as an opportunity to learn. They have long been digitally excluded from the DE process due to literacy issues, distance and technology. Leveraging the mobile phone call interface to access a repository of audio courses makes it intuitive for them to interact with the ACALEs environment without being tech-savvy. Also, dispensing audio courses in their local language eliminates literacy barriers. The ubiquitous nature of mobile phones allows learning to happen anywhere and anytime, and it also bridges the distance barrier. All these factors have contributed positively to the survey results.

4.8 Low scores from students (literate population) on relevance & accurate, engage & motivation factors

Based on the scores obtained, it can be observed that the students gave low scores on two (2) factors: (i) Relevant & accurate (68%) and (ii) Engage & motivate (56%). These scores were expected since the learning materials were not targeted at this group, who were asked to follow courses on Agriculture and Fisheries on a voluntary basis. However, it is interesting to note that, although the domains presented to them were neither in their field of interest nor aligned with their academic objectives, the two factors passed the significance test at the 5% and 10% levels, respectively. This is encouraging, and we are of the view that if the course materials had been designed explicitly for the students, the scores for the two (2) factors would have been much higher and more conclusive, which suggests new avenues for our research work. These results also demonstrate and confirm the robustness and holistic nature of the OCRAT framework, which automatically highlights any shortcomings in the list of guidelines for corrective action.
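
To make the notion of passing a test at the 5% or 10% level concrete, the following is a minimal sketch of a one-sided exact binomial test on a population proportion. It is an illustration only: the null proportion `p0 = 0.5`, the sample sizes and the counts are assumptions chosen for the example, not values taken from the survey, and the function names are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p0: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p0), with 0 < p0 < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p0) + (n - i) * math.log(1.0 - p0)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def passes(k: int, n: int, p0: float, alpha: float) -> bool:
    """One-sided test of H0: p <= p0 against H1: p > p0 at level alpha."""
    return binomial_upper_tail(k, n, p0) < alpha


# Illustrative numbers only (assumed, not from the survey):
n, k, p0 = 200, 120, 0.5  # 120 of 200 hypothetical respondents agree
p_value = binomial_upper_tail(k, n, p0)
print(f"p-value = {p_value:.4f}")
print("5% level:", passes(k, n, p0, 0.05), "| 10% level:", passes(k, n, p0, 0.10))
```

Under these assumptions, a result can pass at the 10% level while failing at the 5% level whenever its p-value falls between 0.05 and 0.10, which is the situation described for one of the two factors.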

4.9 Visually impaired students

Visual impairment is a disability where a person has low vision or no sight at all. The International Classification of Diseases (11th revision, 2018) classifies vision impairment into two groups, namely distance and near presenting vision impairment. According to WHO [69], at least 2.2 billion people suffer from some sort of vision impairment. One of the consequences of such a disability is restricted access to, and inclusion in, standard education and learning methods. Learners face: (i) difficulties in accessing information, (ii) a lack of books in accessible formats, and (iii) marginalization or total exclusion from policies, classrooms, library services, websites and online learning systems [70]. Education for all is recognized as a fundamental right, and SDG 4 aims to ensure inclusive and equitable quality education and promote lifelong learning opportunities for all. Hence, it is the duty of all stakeholders to provide alternative methods for accessible, affordable and inclusive learning to help visually impaired learners develop their learning and abilities.

Literature on inclusive learning methods for visually impaired people abounds. The common denominator of all the solutions provided is the use of audio, whether in audiobooks, audio supported reading (ASR), accessible education material (AER), assistive technology (AT) or instructional design methods. The use of audiobooks, as recommended by [57, 59, 64], and of ASR and AER [66] is considered to facilitate their learning and inclusion. According to [57], audiobooks have valuable features such as the ability to replay for learning reinforcement and to listen to an unabridged version while following along in the hardcopy of the same book, and they make note-taking easier. Moreover, audiobooks in distance education provide easy access, low cost and swift alteration of the content when necessary. Assistive technologies in classrooms and libraries, as recommended by [56, 58, 60, 65], allow more active participation in the education process and better communication and exchange of information for greater inclusiveness. One of the main features of AT is again audio input, audio output and the Voice User Interface (VUI) [68]. Another important recommendation is the use of proper instructional design to cater to the learning needs of visually impaired people [61, 62, 63, 67], fostering a more universal and inclusive approach.

Based on the above, this research comes at the right time to provide a holistic, systematic, scientific and empirical approach to designing instructional audio for enhanced learning effectiveness among the visually impaired population in addition to the low-literate population.

5 Conclusion

Audio learning is different from multimedia learning. The major challenge is to devise effective learning materials over a single sensory medium while acknowledging that the cognitive capacities of the learner are severely limited. In the absence of any systematic and holistic audio instructional guidelines in the existing literature, the authors of [1] derived 23 guidelines for effective audio design through a systematic literature review. This work is a continuation of that prior work and empirically validates these 23 guidelines, regrouped under the OCRAT framework. To this effect, 40 audio lessons, equivalent to around 3 h, on various topics pertaining to agriculture and fisheries were devised based on the OCRAT guidelines. Groups of 225 literate and 180 semi-literate learners in Mauritius and India were identified. Through a range of ten (10) courses regrouping the respective audio lessons, the OCRAT framework was tested on our ACALEs platform. A survey was conducted among learners who participated in and completed the course. From the collected data, empirical testing using inferences on population proportions based on the binomial distribution was conducted to validate the proposed set of guidelines. The results proved to be very conclusive. The discussion section elaborates further on the validated guidelines and substantiates them with additional literature. The outcome is promising in that it validates the hypotheses we made; henceforth, the OCRAT audio instructional guidelines can be used as a holistic and effective guide to the development of audio courses, not only in ACALEs but also on other audio media such as podcasts and radio programs, for literate, non-literate and visually impaired populations alike. This research ultimately answers our main research question empirically and contributes to the very focused domain of audio-only instructional design, where such guidance is currently almost absent.

The limitations of this research work are the following: (i) Testing was carried out on courses developed in the Agriculture and Fisheries domain only, which explains the disparity in survey results between the semi-literate and literate populations; (ii) The framework has been used to develop courses at the lower levels of Bloom's taxonomy. Hence, as future work: (i) We aim to develop more courses in other domains such as Health, Gender Equality, Language learning and others; (ii) We aim to develop and test courses exclusively for the literate population and compare the results with those of the current research; (iii) We intend to design courses at the higher levels of Bloom’s taxonomy [12] (Analysis, Synthesis and Evaluation) using our framework.