1 Introduction

The development of digital technologies has transformed reading practices, with a proliferation of mobile devices (especially tablets) used by children for reading. E-books offer features that paper-based books do not (e.g. multimedia elements, narration and hyperlinks). Although these features are potentially useful (e.g. they may present content in a way that embeds it in memory more effectively), they may also be distracting, particularly for younger users [2].

Research has broadly focused on the pros and cons of children’s use of e-books for reading. Studies have examined a range of areas, including word learning and comprehension [3,4,5], fatigue [6], efficacy for children with disabilities [7], print and phonological awareness [8, 9], attention and engagement [10], and parent interaction [1, 11, 12]. There is currently no clear consensus on the benefit of e-books versus traditional paper-based books. Indeed, the question is in all likelihood not a simple one: the answers are likely to vary with the measures examined, the age range of the users and the reading context (e.g. with others or alone) (see [13] for a review of key research questions in the field).

This paper explores one key area that is often overlooked in the field of e-books: reading fluency and emotional expression in speech when children read aloud from different media. Much of the research comparing e-books and traditional books has used measures of comprehension and recall to examine how children’s understanding of story content varies by medium, typically with e-books of a passive rather than interactive nature. However, there is still much to learn about how children attend to and engage with stories from different media. This can be achieved through microanalysis of children’s behaviours and speech. Analysis of speech has been particularly neglected because in many previous studies the children did not read aloud: they were read to by a researcher (or occasionally their mother), read silently or listened to audio narration.

Examining children’s speech whilst they read aloud from different book media can tell us much about their engagement with the reading material. We know, for example, that pitch range in spoken English is critical for marking prosody, which in turn cues the speaker’s intent or affect. For instance, a question in English is often indicated by rising intonation at the end of a sentence. Similarly, emotionally rich talk from mothers to their infants is marked by exaggerated pitch contours and a larger pitch range compared to talk directed at other speakers (e.g. [14]). Disfluencies (marked by a slower speaking or articulation rate) may also indicate that speech is more deliberate, and thereby less emotionally engaged and more task-oriented. Both of these areas are useful to explore in developing readers, to determine whether their use of different media has a demonstrable impact on their reading style, which in turn may reflect their level of engagement with the material.

Here we present a preliminary exploration of whether speech rate, articulation rate and pitch range differ in speech samples read by children from different media. On the issue of engagement, our previous data from video observations [1] and other research [10] suggest that children engage more with touch-screen books (an interactive variant of the generic e-book). Hence, we expected more pitch variation when children read from touch-screen books than from paper books. On the issue of speech and articulation rate, previous research [10, 15] suggests that e-books are read more slowly than paper books. On this basis, we also expected a slower rate for the touch-screen versions of the books.

2 Methods

The speech samples used in this paper were a subset of the audio clips extracted from the video files used in [1]. We have initially used only 5 participants from this sample (all female and aged between 6 and 7 years). The usage scenario and data collection are described in detail in [1]; in summary, each child read the books with their mother. It should be noted that, within this sample, two children used a highly interactive touch-screen book (The Fantastic Flying Books of Mr. Morris Lessmore), whereas three used a less interactive touch-screen book (The Prince’s Bedtime).

2.1 Speech Editing

The audio streams were extracted from the video files provided by the Ross et al. [1] study using the ffmpeg command-line utility. Short samples (around 5 s) were chosen semi-randomly under the following constraints: (1) the sample could not fall within the first 20 s of the recording; (2) the sample was free of background noise and of overlapping utterances from the parent; and (3) the sample contained a complete sentence or phrase. The editing into smaller samples was completed using Audacity software.
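The selection constraints above can be illustrated with a short sketch. This is not the procedure used in the study, where extraction and editing were done by hand with ffmpeg and Audacity; the function names, the `noisy_spans` representation of flagged regions and the fixed 5 s window are all assumptions made for illustration. The exact ffmpeg command is not reported, so the one built here (using ffmpeg's `-vn` option to drop the video stream) is only one plausible form. Constraint (3), a complete sentence or phrase, requires human judgement and is not modelled.

```python
import random

def ffmpeg_extract_audio_cmd(video_path, wav_path):
    """Build (but do not run) a hypothetical ffmpeg call.
    -i names the input file; -vn drops the video stream."""
    return ["ffmpeg", "-i", video_path, "-vn", wav_path]

def overlaps(start, end, spans):
    """Return True if [start, end) overlaps any (s, e) span."""
    return any(start < e and s < end for s, e in spans)

def pick_sample_window(duration, noisy_spans, length=5.0, min_start=20.0,
                       rng=None, max_tries=1000):
    """Pick a random window of `length` seconds that starts no earlier than
    `min_start` (constraint 1) and avoids spans flagged as background noise
    or parental speech (constraint 2).

    Returns (start, end) in seconds, or None if no window is found.
    """
    rng = rng or random.Random()
    latest_start = duration - length
    if latest_start < min_start:
        return None  # recording too short to satisfy constraint (1)
    for _ in range(max_tries):
        start = rng.uniform(min_start, latest_start)
        if not overlaps(start, start + length, noisy_spans):
            return (start, start + length)
    return None
```

Rejection sampling keeps the sketch simple; for recordings where most of the timeline is flagged, enumerating the clean intervals first would be more efficient.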

2.2 Speech Analysis

The shorter speech utterances were analysed using Praat software. The utterances were transcribed into syllables using Praat TextGrids. Total phonation time was computed by subtracting pauses from total utterance time. Articulation rate was measured as the number of syllables divided by total phonation time (in seconds). Similarly, speech rate was measured as the number of syllables divided by total utterance time (in seconds). Pitch range was measured over the entire sample, using the recommended pitch settings for children in Praat (150–600 Hz).
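The rate measures defined above reduce to simple arithmetic, sketched below. The function names are illustrative and the example values are made up rather than taken from the study; in practice the syllable count and pause durations would come from the Praat TextGrid annotation.

```python
def phonation_time(total_time, pauses):
    """Total utterance time minus summed pause durations (seconds)."""
    return total_time - sum(pauses)

def articulation_rate(n_syllables, total_time, pauses):
    """Syllables per second of phonation (pauses excluded)."""
    return n_syllables / phonation_time(total_time, pauses)

def speech_rate(n_syllables, total_time):
    """Syllables per second over the whole utterance (pauses included)."""
    return n_syllables / total_time

# Made-up utterance: 16 syllables in 5.0 s with two pauses of 0.5 s each.
n, total, pauses = 16, 5.0, [0.5, 0.5]
print(phonation_time(total, pauses))        # 4.0
print(articulation_rate(n, total, pauses))  # 4.0
print(speech_rate(n, total))                # 3.2
```

Because pauses are excluded from the denominator, articulation rate can never be lower than speech rate for the same utterance.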

3 Results

As this was a very small subset of the total sample, the results are very preliminary: only trends can be noted at this point, and the data were not subjected to statistical analysis. Given the small sample, box plots were deemed more appropriate for exploring the data than graphs of means, which are more susceptible to outliers. We collapsed across the different book titles because of the small sample size. There are some interesting trends to note in the data below.
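The sensitivity of the mean to outliers at n = 5 can be seen in a small sketch. The values below are hypothetical and are not study data; the helper name `box_stats` is likewise illustrative, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the one used by a given plotting package.

```python
import statistics

def box_stats(values):
    """Return (q1, median, q3), the quantities a box plot is built on."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3

# Hypothetical values for five speakers; NOT study data.
values = [200.0, 210.0, 220.0, 230.0, 240.0]
with_outlier = [200.0, 210.0, 220.0, 230.0, 400.0]

# One extreme value shifts the mean but leaves the median unchanged.
print(statistics.mean(values), statistics.median(values))              # 220.0 220.0
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 252.0 220.0
print(box_stats(with_outlier))
```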

3.1 Pitch Range

Pitch range was measured over the entire sample. In general, pitch range tended to be more variable across the sample for the paper-based books than for the touch-screen medium, although the overall median pitch ranges were not that disparate (Fig. 1).
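As an illustration of what a pitch range over a whole sample involves, the sketch below takes a list of fundamental-frequency estimates and reports the span between the lowest and highest voiced values. The paper does not state how the range was derived from Praat's output, so the max-minus-min definition, the use of `None` for unvoiced frames and the additional semitone conversion are assumptions; only the 150–600 Hz bounds come from the Methods.

```python
import math

def pitch_range(f0_hz, floor=150.0, ceiling=600.0):
    """Return (range in Hz, range in semitones) over voiced frames.

    Frames that are None (unvoiced) or outside [floor, ceiling] are ignored.
    Returns None if no usable frames remain.
    """
    voiced = [f for f in f0_hz if f is not None and floor <= f <= ceiling]
    if not voiced:
        return None
    lo, hi = min(voiced), max(voiced)
    return hi - lo, 12.0 * math.log2(hi / lo)

# Made-up contour: 200 Hz rising to 400 Hz is exactly one octave (12 semitones).
print(pitch_range([None, 200.0, 250.0, 400.0, None]))  # (200.0, 12.0)
```

A semitone scale is often preferred when comparing speakers with different baseline pitch, since it expresses the range relative to the speaker's own lowest value.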

Fig. 1. Pitch range for child speech whilst reading with paper-based and touch-screen books

3.2 Articulation Rate

Articulation rate was measured over the entire sample. Overall articulation rate seemed to be slightly higher for the paper-based books than for the touch-screen books, although articulation rate for the touch-screen books seemed to be more variable (Fig. 2).

Fig. 2. Articulation rate (syllables per second for phonated utterance) for child speech whilst reading with paper-based and touch-screen books

3.3 Speech Rate

Speech rate was measured over the entire sample. As with articulation rate, median speech rate seemed to be slightly higher for the paper-based books than for the touch-screen books, but speech rate for the touch-screen books was possibly more variable (Fig. 3).

Fig. 3. Speech rate (syllables per second over entire utterance) for child speech whilst reading with paper-based and touch-screen book formats

3.4 Phonation Ratio

Phonation ratio was also measured over the entire sample. Interestingly, there appeared to be some evidence of a ceiling effect for paper-based utterances: the ratio of voiced utterance over the entire sample was higher for the paper-based books than for the touch-screen books (median = 0.93 vs. 0.84).
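Phonation ratio is not defined in the Methods; the sketch below assumes it is total phonation time divided by total utterance time, which matches the description of a "ratio of voiced utterance over the entire sample". Under that assumption, speech rate is simply articulation rate multiplied by phonation ratio, so the three measures are not independent. The numbers are made up for illustration.

```python
def phonation_ratio(total_time, pauses):
    """Proportion of the utterance spent phonating (0 to 1).
    Assumed definition: (total time - pauses) / total time."""
    return (total_time - sum(pauses)) / total_time

# Made-up values, not study data: 16 syllables, 5.0 s, two 0.5 s pauses.
n, total, pauses = 16, 5.0, [0.5, 0.5]
ratio = phonation_ratio(total, pauses)   # 0.8
art_rate = n / (total - sum(pauses))     # 4.0 syllables/s
sp_rate = n / total                      # 3.2 syllables/s

# speech rate = articulation rate x phonation ratio
assert abs(sp_rate - art_rate * ratio) < 1e-12
print(ratio)  # 0.8
```

A ratio close to 1 leaves little room for further increase, which is the sense in which a ceiling effect could arise for very fluent, pause-free reading.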

4 Discussion

4.1 Summary of Findings

Contrary to initial expectations, our small sample did not show a large difference in overall pitch range (as expressed by approximate median values); instead, there was greater variation in pitch range for the paper-based books. This is a curious finding. Without a more robust dataset (with a larger sample), it is difficult to tell whether this is a genuine effect or a spurious finding. If it is a genuine effect, it could be that greater pitch range variability may be reflective of a less consistent reading style in the touch screen modality that reflects the fact that the children are interacting with an electronic device in different ways. This may stem from effects seen in computer-directed speech, for example, where it has been found that people are more deliberate in their speech register than when interacting with humans [16]. Such a deliberate style may be employed by some children and not others, but more robust analysis would be needed to determine whether this is the source of the effect.

Interestingly, median articulation and speech rates also appeared to be possibly higher for the paper-based books. However, in this case (in contrast to the pitch data), there also appears to be more variability for the touch-screen than for the paper books, although the sample is too small to draw any definitive conclusions. A higher speech and articulation rate may reflect a more ‘natural’ and fluid style of speaking, and hence the touch-screen medium may be eliciting a less fluid rate of speech. The more variable speech and articulation rate for touch screens may also suggest that children are less consistently employing a fluent style, as a result of the more deliberate, computer-directed speech style described earlier. Another possibility is that distractions from the multimedia and interactive content affected their reading (i.e. they were reading while video content was playing and while dealing with task demands from the interactive ‘buttons’). This latter possibility might be explored more effectively by analyzing samples from different touch-screen books with low- and high-level interactivity.

4.2 Limitations and Future Work

In summary, this was an initial foray into exploring children’s reading style with touch-screen books. It must be emphasized that these are very preliminary results from a small sample (n = 5) that could not yet be subjected to full statistical analysis. Future work on the larger dataset will be needed to determine whether these trends hold. It should also be acknowledged that this initial analysis was based on a small selection of short speech samples (approximately 5 s); a more robust approach would include either longer speech samples or further utterances from the same speaker at different time points. It should also be noted that our measure of affective engagement here (pitch range) is a rather crude one, and there may be more fruitful ways to examine affect expressed through pitch (e.g. analyzing the pitch contours themselves, which may be classified by shape as more undulating or flat). Finally, analyses on a much larger sample would be needed to determine whether there are any differences between the less and the highly interactive touch-screen books (the low N in this sample did not allow for any meaningful analysis in this respect). Nonetheless, this is a useful first exploration of the data that encourages us to explore further the vocal indices of children’s reading style across different media.