Introduction

Handwriting is now relatively rare as a means of communication for most adults. However, it remains the dominant writing modality in the vast majority of primary school classrooms. Developing the ability to produce neat, or at least legible, handwriting is therefore important. Handwriting neatness affects subjective ratings of text quality. Studies in which raters assess the compositional quality of texts that are neatly or untidily written but are otherwise identical or matched have consistently found that lower, and often substantially lower, ratings are given to the untidy texts (Bull & Stevens, 1979; Chase, 1979; Klein & Taub, 2005; review by Graham et al., 2011). In most educational settings, untidy writing is also likely to attract teacher criticism. Danna et al. (2016) describe a vicious cycle in which criticism reduces self-esteem, which leads to handwriting avoidance, which in turn reduces opportunities to improve.

Developing handwriting automaticity is also important. As might be expected, the fluency with which children are able to form letters on the page affects their productivity when composing text (Berninger et al., 1997; Graham et al., 2000), and there is some evidence that it also affects the quality of the resulting composition (Alves et al., 2016). Handwriting ability is, therefore, needed not just for the production of aesthetically pleasing text. It is, in most educational contexts, a necessary precursor to effective written communication.

Researchers exploring handwritten production therefore need tools that allow assessment both of the written product – the neatness or at least accuracy with which letters are formed on the page – and also of the fluency with which the pen moves across the page when these letters are produced. Although accuracy and fluency are likely to be correlated, particularly in young children, developing an understanding of the writing process requires that these are assessed independently: Real-time handwriting data need to be analysed in such a way as to be able to distinguish not just fluent, neat writers from inaccurate, disfluent writers, but also writers who are fluent but inaccurate, and those who are disfluent but accurate.

Our aim in the present paper is to describe and illustrate one such tool. We first review existing research-focussed approaches to handwriting assessment. We then give a detailed description of the approach that we have adopted in our own research. By segmenting the handwriting trace into sub-letter features, this approach makes possible a fine-grained analysis of writers’ ability to control their pen movements. In the final section of the paper, we illustrate the use of the tool with a comparison of children and adults forming single letters.

Research-focussed approaches to handwriting assessment

A range of tools has been developed to help researchers and educators identify children whose handwriting is unusually poor and therefore requires remediation (see Feder & Majnemer, 2003, and Rosenblum & Weiss, 2006, for reviews). Most of these include both a measure of the neatness of the handwritten product – the form of the pen-trace as it appears on the page – and a measure of the rate or fluency of the process by which it is produced. In practice, these are inseparable: product accuracy must be interpreted with reference to how fluently the text was produced, and vice versa. For ease of explanation, however, we will first discuss approaches to product assessment, and then approaches to measuring process.

Assessing the handwritten product

Tools for assessing the handwritten product can be described broadly as either holistic or analytic. Holistic assessment involves raters making a global judgement about the legibility (readability) of a handwriting sample. Ayres (1912) described a legibility measure based on how long it takes to read a text, averaging across several readers. Much more recently, Larsen and Hammill (1989) developed the Test of Legible Handwriting, based on matching handwriting samples to benchmark exemplars representing different levels of reading ease and neatness. Other measures elicit holistic ratings of legibility or of characteristics that are assumed to affect legibility. The Children’s Handwriting Evaluation Scale (Phelps & Stempel, 1988) rates samples for form, spacing and general appearance. The Handwriting Legibility scale (Barnett et al., 2018) scores texts for legibility and effort-to-read, and raters also provide a single, global score for how well letters are formed, defined as containing all necessary elements, i.e., having appropriate shape, being neatly closed, and being consistent in size and tilt. Several other similar tools exist (e.g., Amundson, 1995; Molfese et al., 2011; Ziviani & Watson-Will, 1998).

Analytic approaches, by contrast, aim at direct measurement, on a letter-by-letter basis, of the degree to which a letter conforms to a neatly written ideal. Helwig et al. (1976; see also Collins et al., 1980; Jones et al., 1977) described an approach to establishing accuracy when a writer is required to precisely copy model letters. Children copy letters onto paper with four guidelines: a baseline, upper and lower lines to guide maximum and minimum vertical extent, and a midline above the baseline. Inaccuracies are identified, using transparent overlays, where letter components deviate from the model by more than a set tolerance (1, 2 or 3 mm depending on researcher purpose). This provides a binary copying-accuracy measure for each sub-letter unit (accurate or not accurate, separately for, for example, the straight line and the curve in a lowercase h).

The strength of this approach for research contexts is that, in contrast with holistic measures, it provides a very precise diagnosis of which features a writer is unable to form accurately. The pen-control deficit of a child who, for example, struggles to keep letter height within bounds is potentially quite different from the deficit associated with producing malformed curves. The disadvantage, however, is that it necessarily requires precise copying not just of the form of a presented letter, but also of its dimensions. This provides an overly specific definition of what constitutes an accurately formed letter, both in form – there is no possibility for variation in allograph – and in size.

The widely used Minnesota Handwriting Test (MHT; Reisman, 1993) also scores a sample of text on a letter-by-letter basis. However, unlike in the transparent-overlay method, the sample text is a sentence that participants copy without being required to reproduce letter form exactly. In the manuscript version, participants must nevertheless print, rather than use cursive script, and write within three guidelines. For each letter, scorers first determine whether or not it can be identified out of context. If the letter passes this test, it is given a binary score for form (for example, whether gaps or overlaps within the letter are all less than 1.6 mm), for position relative to the printed baseline (must be within 1.6 mm), for size (all letter components must be positioned correctly relative to the guidelines), and for letter and word spacing. Each legible letter is therefore given a score out of 5 representing its neatness.

The approach adopted by the MHT in scoring letter form is conceptually different from that implemented by the transparent-overlay method. The transparent-overlay method specifies a particular form for each letter, whereas the MHT applies the same neatness criteria across all letters. If the researcher requires by-child neatness scores – and this is the aim of the MHT – then this makes sense. However, if researchers need to know whether a specific letter was formed well, as might be the case in an experimental context, then letter-specific form criteria are required. The transparent-overlay method is one way to provide these. It is possible, however, to specify form individually for each letter without constraining size and while allowing at least some flexibility in allograph choice. The “criteria for letter formation” provided by Ziviani and Elkins (1984, Table II), for example, specify necessary (but not sufficient) characteristics for each letter: a lowercase m must comprise a “double smooth curve finishing on aligned base”.

Assessing production fluency

Handwriting fluency measures are, broadly, of two different types, delineated by implicit assumptions about the range of processes that are encompassed by the term “handwriting”. Educationally focussed research, for example on the effects of handwriting ability on the quality of children’s written compositions (e.g., Abbott & Berninger, 1993; Kim & Schatschneider, 2017; Limpo & Alves, 2013; review by Kent & Wanzek, 2016), tends to use tasks that capture a range of skills over and above the motor planning and execution necessary to form a letter. All of the studies reviewed by Kent and Wanzek measured fluency by recording the number of characters children wrote in a fixed period of time when recalling the alphabet or when copying a written sentence or paragraph. Both the MHT and the Detailed Assessment of Speed of Handwriting (Barnett et al., 2009), for example, involve copying an unfamiliar sentence that includes all of the letters in the English alphabet (although necessarily with unrepresentative letter and digraph frequencies). Rate of output on this task will depend on motor planning ability and pen control, but also on reading, short-term memory, attention, and orthographic retrieval.

More direct measures of the speed and fluency with which a writer can form known letters – i.e., of those components of the handwriting process that are directly related to planning and controlling pen movement – can be captured by having participants write on a digitising tablet (or, at lower resolution, with a smart pen). This permits a broad range of measures that describe how the pen moves across the page (see review by Danna et al., 2013). At minimum, measuring pen movement, unlike simply counting characters produced in a fixed period, differentiates between time spent with the pen moving on the page and time spent with the pen lifted or stationary (e.g., Paz-Villagrán et al., 2014; Sumner et al., 2013). In sentence-copying tasks, for example, pen lifts or stops will probably occur when the writer is reading the next words to be copied (but see Alamargot et al., 2007).
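To make this distinction concrete, the sketch below partitions a recording into on-paper movement, on-paper pauses and in-air time. It is a minimal illustration in Python with NumPy, assuming that each sample provides a timestamp (s), x and y coordinates (mm) and pen pressure, that in-air samples are recorded with zero pressure, and that speed below a hypothetical threshold of 5 mm/s counts as stationary. The function name and threshold are ours, not taken from any of the cited studies.

```python
import numpy as np

def partition_time(t, x, y, pressure, stationary_mm_s=5.0):
    """Split a recording into on-paper moving, on-paper stationary and in-air time.

    Assumes t in seconds, x/y in mm, and pressure == 0 for in-air samples.
    stationary_mm_s is a hypothetical speed threshold, not a published value.
    Each inter-sample interval is attributed to the state of its first sample.
    """
    t, x, y, p = (np.asarray(a, dtype=float) for a in (t, x, y, pressure))
    dt = np.diff(t)
    speed = np.hypot(np.diff(x), np.diff(y)) / dt   # mm/s over each interval
    on_paper = p[:-1] > 0
    moving = on_paper & (speed >= stationary_mm_s)
    return {
        "moving": float(dt[moving].sum()),
        "stationary": float(dt[on_paper & ~moving].sum()),
        "in_air": float(dt[~on_paper].sum()),
    }

# Five samples 100 ms apart: move, pause, move, then lift the pen.
print(partition_time([0, 0.1, 0.2, 0.3, 0.4], [0, 1, 1, 2, 3], [0] * 5, [1, 1, 1, 0, 0]))
```

Attributing each interval to its first sample is a simplification; finer-grained analyses might interpolate state changes within an interval.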

Measuring pen movement also permits a direct measure of mean pen-tip speed (Khalid et al., 2010; Kushki et al., 2011; Rosenblum et al., 2006; van Galen et al., 1993). For example, van Galen et al. found that 2nd to 4th graders identified by their teachers as having untidy handwriting moved the pen more quickly than their peers. Kushki et al. (2011) found that 4th graders showed decreasing vertical velocity but increasing horizontal velocity as they progressed through composing a paragraph. Most obviously, competent adult writers show much faster mean pen-movement speed than beginning writers (writing single words: adults around 80 mm/s, Hepp-Reymond et al., 2009; 6-year-olds, around 10 mm/s, Séraphin-Thibon et al., 2019).
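Mean pen-tip speed can be sketched in a similar way. The example below assumes samples with timestamps in seconds and coordinates in millimetres, estimates tangential velocity by central differences with `numpy.gradient`, and averages it over on-paper samples only. The function names are illustrative, and individual studies may differ in how they differentiate, smooth or exclude samples.

```python
import numpy as np

def tangential_speed(t, x, y):
    """Pen-tip speed (mm/s) at each sample, from central differences."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    return np.hypot(np.gradient(x, t), np.gradient(y, t))

def mean_on_paper_speed(t, x, y, pressure):
    """Mean speed over samples where the pen touches the paper (pressure > 0)."""
    speed = tangential_speed(t, x, y)
    on_paper = np.asarray(pressure) > 0
    return float(speed[on_paper].mean())

# A straight stroke drawn at a constant 80 mm/s, sampled every 7.5 ms.
t = np.arange(0, 0.5, 0.0075)
print(mean_on_paper_speed(t, 80 * t, np.zeros_like(t), np.ones_like(t)))
```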

Underlying this variation in speed is the extent to which letter components are formed with smooth single movements. This is illustrated in Fig. 1, which shows the velocity profile and final product for a competent adult producing a lower-case letter h. The upright is formed in a single, ballistic movement. The velocity curve for this feature – the first peak in the speed plot – is smooth, formed by a single acceleration and deceleration. Contrast this with the much less fluent velocity profile for the corresponding feature produced by a child in the lower panel. Whilst the adult produced this feature in a little over 300 ms, the child took over three times longer. This difference in speed and fluency is even more marked in the formation of the curved feature of the h.

Fig. 1

Speed (tangential velocity) of pen tip, omitting in-air movement, and the final product for an adult and a beginning writer producing lowercase h. Solid circles are locations where the pen was either stationary or lifted. Unfilled circles represent velocity peaks. Velocity is smoothed with a 10 Hz Butterworth filter

A range of indices have been suggested for measuring pen-tip movement disfluency (Broderick et al., 2009; Khalid et al., 2010; Rosenblum et al., 2006; Smits-Engelsman & Van Galen, 1997). One relatively straightforward approach is to smooth the velocity trace to some extent, as is the case in Fig. 1, and then simply count the number of times that velocity reaches a local maximum (a velocity peak; e.g., Overvelde & Hulstijn, 2011). Average velocity and velocity peak-count are strongly correlated, at least in beginning writers (Fitjar et al., 2021), but velocity peaks are causally prior to slow production: The child in Fig. 1 produced the two components of the h much more slowly than the adult because their pen accelerated and decelerated multiple times.
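Counting velocity peaks is straightforward once a smoothed speed trace is available. The sketch below assumes that smoothing has already been applied and uses SciPy's `scipy.signal.find_peaks`, which returns interior local maxima, so samples at the very start and end of a trace are never counted. The two synthetic profiles are illustrative stand-ins for the adult and child strokes in Fig. 1, not real data.

```python
import numpy as np
from scipy.signal import find_peaks

def count_velocity_peaks(speed):
    """Number of local maxima in an already-smoothed pen-tip speed trace."""
    peaks, _ = find_peaks(np.asarray(speed, dtype=float))
    return len(peaks)

s = np.linspace(0, 1, 101)
fluent = np.sin(np.pi * s)              # one acceleration, one deceleration
halting = np.sin(3 * np.pi * s) ** 2    # speeds up and slows down three times
print(count_velocity_peaks(fluent), count_velocity_peaks(halting))
```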

Two other characteristics of the velocity profiles shown in Fig. 1 are important to note. First, the disfluency in the child’s pen movement was particularly marked when producing the curve. This is to be expected. The motor planning associated with forming a straight line has two degrees of freedom – length and direction. Curves add the need to manage angular change. This adds considerable complexity to both planning and execution (see Morasso & Mussa Ivaldi, 1982, for a computational model and Habas & Cabanis, 2008, for fMRI evidence). Séraphin-Thibon et al. (2019) found that pseudowords composed of letters that contained more curves were written with a larger number of velocity peaks than otherwise-matched pseudowords with fewer curves.

Second, maximal fluency does not mean zero velocity peaks. Drawing a straight line necessarily involves starting and finishing with the pen stationary and so, at minimum, there must be one velocity maximum between these two points. This is the case for the adult writer in Fig. 1. Similar constraints apply to curved features: Edelman and Flash (1987) showed that both open and closed loops (hook, cup and gamma strokes, in their terminology) necessarily involve two velocity peaks, yet are still produced with maximum fluency. This again can be seen in the adult’s formation of the curve (inverted cup) of the h.
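One way to express this reasoning is to subtract, from the observed peak count, the minimum number of peaks that a maximally fluent production of the feature requires: one for a straight line, and two for the hook, cup and gamma strokes analysed by Edelman and Flash (1987). The sketch below is a hypothetical illustration of that comparison, not an index used in the studies cited here, and the example peak counts are made up for illustration.

```python
# Minimum velocity peaks for maximally fluent production:
# one for a straight line, two for a curved stroke (Edelman & Flash, 1987).
MIN_PEAKS = {"straight": 1, "curved": 2}

def excess_peaks(observed_peaks, feature_kind):
    """Peaks beyond the theoretical minimum for this kind of feature (never negative)."""
    return max(observed_peaks - MIN_PEAKS[feature_kind], 0)

# Illustrative counts for the upright and curve of a lowercase h.
print(excess_peaks(1, "straight"), excess_peaks(2, "curved"))   # 0 0
print(excess_peaks(4, "straight"), excess_peaks(9, "curved"))   # 3 7
```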

Product segmentation for process analysis

Determining the extent to which a specific sample of real-time handwriting data represents fluent production therefore involves comparing the velocity profile for the sample with the theoretical maximally fluent velocity profile for the production of the same text. One approximation to this is simply to compare groups who have, a priori, been identified as poor or good handwriters on the basis of the neatness of their handwriting (e.g., Di Brina et al., 2008; Rosenblum & Werner, 2006; van Galen et al., 1993). It is also possible to make a priori assumptions about differences in the bandwidth of velocity spikes that constitute disfluency and of those that are essential to fluent production (Danna et al., 2013; Meulenbroek & van Galen, 1986). Danna et al., for example, counted velocity peaks after low-pass filtering of pen-tip speed at 10 Hz (as in Fig. 1) and then subtracted a count of velocity peaks after low-pass filtering at 5 Hz, on the grounds that the latter were likely to be a necessary characteristic of competent, fluent production. This permits estimates of pen-movement fluency across extended text.
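The subtraction procedure described for Danna et al. can be sketched as follows. This is an illustration under stated assumptions rather than their implementation: we assume a 4th-order Butterworth low-pass filter applied forwards and backwards with `scipy.signal.filtfilt` (so that peaks are not shifted in time), a 133 Hz sampling rate as in the data reported later in this paper, and a speed trace long enough for the default edge padding of `filtfilt`. The function names are ours.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def lowpass(speed, cutoff_hz, fs_hz, order=4):
    """Zero-phase Butterworth low-pass filter (the order is an assumption)."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs_hz)
    return filtfilt(b, a, np.asarray(speed, dtype=float))

def n_peaks(signal):
    return len(find_peaks(signal)[0])

def disfluency_peaks(speed, fs_hz=133.0):
    """Peaks surviving a 10 Hz filter minus peaks surviving a 5 Hz filter."""
    return n_peaks(lowpass(speed, 10.0, fs_hz)) - n_peaks(lowpass(speed, 5.0, fs_hz))

# A single smooth 1 s stroke should contribute no disfluency peaks.
fs = 133.0
t = np.arange(0, 1, 1 / fs)
print(disfluency_peaks(80 * np.sin(np.pi * t), fs))
```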

A more fine-grained approach is to segment letters into standard features that, in competent, fluent writers, could be produced as a single stroke – i.e., as a pen movement bounded by points where the pen tip is stationary or near-stationary and / or lifted (e.g., Meulenbroek & van Galen, 1990). These features then provide a basic unit of analysis when comparing pen movement across writers or experimental conditions. This is the approach that we took in our discussion of the fluency of production of the letter h shown in Fig. 1: by segmenting the letter h into two features – a straight line and a curve – it was possible to compare the adult and child samples directly.

For an approach based on marking up pen traces into features to be an effective research tool it needs to meet the following criteria:

First, and most obviously, it must be universally applicable: The segment delineation for a specific letter must be applicable across a wide range of different attempts at forming that letter by different writers.

Second, segmentation must be possible on the basis of the written product, without reference to information about how the letter was formed. Automatic segmentation based on process – dividing letters into components based on units that are composed in single strokes – is, of course, possible (see, for example, Rosenblum et al., 2006). However, this will identify different segments depending on whether a letter is produced fluently or disfluently. The reason for this can be seen clearly in Fig. 1 by considering the different locations of the pen stops and lifts in the adult and child letters. If the purpose of segmenting the letter is to establish the fluency with which the segments are produced, then the procedure by which segmentation is achieved must itself be independent of fluency.

Third, because there is potential for a trade-off between speed and precision, the segmentation procedure also needs to take some account of the accuracy with which a segment is formed. To compare like-with-like it is necessary to know whether a feature is well shaped and positioned.

Finally, the segmentation procedure must allow for variation in allograph. Again, this is illustrated in Fig. 1. Although we have been talking as though the curved component of the h is comparable across the adult and child samples, this is arguably not the case: In the classification used by Edelman and Flash (1987), the adult forms a cup whereas the child forms a hook. The production demands of these two features may well be different. Segmentation must therefore differentiate between allographs that represent the same letter but comprise different features. In practice this means that common allographs of the same letter will require their own segment codes, but also, obviously, that the coding scheme must identify these different allographs as representations of the same letter.

The coding scheme that is the focus of this paper aims to meet these criteria. We describe a formal, though we believe intuitive, schema for segmenting Latin lower and uppercase letters into sub-letter features, and for then determining whether or not the feature is formed and positioned with adequate precision. This develops the “criteria for letter formation” (Ziviani & Elkins, 1984) approach to coding the handwritten trace into a rigorous formalism for segmenting letters into sub-letter units that can then be directly compared in terms of kinematics of their production. In the next section of this paper, we give a detailed description of our letter-segmentation and coding scheme. In the final section we provide evidence for its value in comparing production fluency across writers.

A scheme for letter segmentation and accuracy coding

Our coding scheme specifies, for each letter, a set of rules that (a) segment letters into sub-letter features, based on the shape of the pen trace, and (b) provide criteria for deciding whether or not each feature is well formed. These are illustrated for upper-case R in Table 1 and given in full in the appendix. The appendix describes common allographs of both upper- and lower-case printed letters. It is intended to be illustrative rather than definitive, and should be adapted by researchers to suit their local context and research needs.

Table 1 Example of coding scheme with specifications for size and position of each feature

Segmentation

Our strategy for identifying sub-letter features within a particular writer’s output depends only on the pen trace – the shape that the writer forms – and makes no reference to how the writer produced the feature. A feature is identified if it corresponds, within specified tolerances, to a feature as defined in our coding scheme (see example in Table 1). However, decisions about what constitutes a feature in a prototypical letter form (the decision, for example, to identify three distinct features in R) are process-based. In developing the coding scheme, we identified features in a letter as the minimum number of components in an allograph such that, in maximally fluent handwriting, each feature could be produced with a smooth velocity profile and without the pen either stopping or lifting (i.e., as a single pen stroke). Under this definition the letter C comprises a single feature, T comprises two features, N comprises three features, and so forth. Features may be either straight, as is the case for both features in T, or curved, as in feature R2 of R (see Table 1).
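A coding scheme of this kind maps naturally onto a small lookup structure in which each letter owns one or more allographs, and each allograph is an ordered list of labelled straight or curved features. The Python sketch below is a hypothetical encoding for letters mentioned in this paper. Apart from R1 to R3, T2 and h2, which appear in the text, the feature labels and allograph names are our own placeholders rather than the codes used in the appendix.

```python
from dataclasses import dataclass

STRAIGHT, CURVED = "straight", "curved"

@dataclass(frozen=True)
class Feature:
    label: str   # feature code, e.g. "R1"
    kind: str    # STRAIGHT or CURVED

# Hypothetical scheme: letter -> allograph -> features in canonical order.
SCHEME = {
    "A": {"A_straight": (Feature("A1", STRAIGHT), Feature("A2", STRAIGHT), Feature("A3", STRAIGHT)),
          "A_curved": (Feature("A1c", CURVED), Feature("A2c", STRAIGHT))},
    "C": {"C": (Feature("C1", CURVED),)},
    "N": {"N": (Feature("N1", STRAIGHT), Feature("N2", STRAIGHT), Feature("N3", STRAIGHT))},
    "R": {"R": (Feature("R1", STRAIGHT), Feature("R2", CURVED), Feature("R3", STRAIGHT))},
    "T": {"T": (Feature("T1", STRAIGHT), Feature("T2", STRAIGHT))},
    "h": {"h": (Feature("h1", STRAIGHT), Feature("h2", CURVED))},
}

def letter_of(allograph):
    """Return the letter that an allograph represents."""
    for letter, allographs in SCHEME.items():
        if allograph in allographs:
            return letter
    raise KeyError(allograph)

def count_by_kind(letter, allograph):
    """Number of straight and curved features in one allograph."""
    features = SCHEME[letter][allograph]
    return {kind: sum(f.kind == kind for f in features) for kind in (STRAIGHT, CURVED)}

print(letter_of("A_curved"), count_by_kind("R", "R"))
```

Keeping allographs nested under their letter means that different allographs can carry their own feature codes while still being recognised as the same letter.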

Marking up a specific pen trace into segments – identifying feature boundaries – is, as we have said, independent of the process by which that trace was produced. So, although in a skilled writer a feature will normally start and end with a pen stop or lift, this information is not used when deciding where a feature starts and ends in a particular pen trace. We defined features based on the spatial characteristics of acceptable letter forms – i.e., how the letter appears on the page – and then looked for pen-trace segments that, alone or combined, match these characteristics. As discussed above, identifying features independently of how they were produced allows the kinematics of production of the same features to be compared across writers with varying graphomotor ability.

We use MarkWrite v0.4.9 (Simpson et al., 2021) to segment and code handwriting traces. MarkWrite takes as input data captured in real time from a digitising tablet or, at lower resolution, a smart pen. It requires only that the data provide, at minimum, a time and coordinates for each pen-location sample. The MarkWrite interface is illustrated in Fig. 2. Sequences of samples that comprise a feature, as defined by the coding scheme, are selected either by cursor movement or by keyboard shortcut. The feature is then annotated with a feature label and, if it is inaccurate, one or more codes indicating how it deviates from a well-formed feature.

Fig. 2

Screenshot from the MarkWrite program showing selection of feature R1 from a child’s copy of an uppercase R. The upper right panel shows change in y-axis location (upper plot) and pen-tip speed (lower plot) over the period when this feature was produced. The black trace in the spatial view is a selected set of samples that represent a single feature (annotated as R1). The grey trace represents in-air movements

Accuracy

Once features are identified, the coding scheme allows a binary decision about whether or not each handwritten feature was produced accurately. By accurately, we mean that the pen trace corresponds to an acceptable representation of the target feature with regard to shape, size and position. In our coding, this decision is made without regard to aesthetics – we define relatively broad criteria for acceptable feature representation – and, as with segmenting into features, accuracy coding is agnostic about the kinematics of the feature’s production. Decisions about accuracy (and / or neatness) criteria will depend on research purposes, and the criteria we present here are illustrative rather than prescriptive. In our own implementation we applied a general tolerance of 1/6 of letter or feature height or width in determining whether or not features deviated from the shape, proportion or size defined by their allograph. This corresponds approximately to the 1.6 mm tolerance on 9.5 mm ruled paper allowed by the Minnesota Handwriting Test (Reisman, 1993). Our approach differs from the MHT, however, in that we allowed for variation in absolute letter size, and therefore applied proportional rather than absolute tolerance criteria.
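The difference between a proportional and an absolute tolerance can be seen in a few lines. In the hypothetical helper below, a measured deviation is acceptable if it is no larger than 1/6 of a reference dimension (letter or feature height or width, depending on the rule being applied). Applied to the MHT’s 9.5 mm line spacing this gives about 1.58 mm, whereas a letter written twice as large is allowed twice the deviation.

```python
TOLERANCE = 1 / 6   # proportion of the reference dimension

def within_tolerance(deviation_mm, reference_mm, tolerance=TOLERANCE):
    """True if a deviation is no larger than the proportional tolerance."""
    return abs(deviation_mm) <= tolerance * reference_mm

print(round(TOLERANCE * 9.5, 2))     # 1.58
print(within_tolerance(2.5, 9.5))    # False: too large for a 9.5 mm letter
print(within_tolerance(2.5, 19.0))   # True: acceptable for a 19 mm letter
```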

In Fig. 3 we illustrate six versions of the letter R, all of which have at least one inaccurate feature.

Fig. 3

Different inaccurate versions of the letter R

Shape

Shape accuracy depends on straightness for straight features and curvedness for curved features. Decisions about straightness were made relative to the feature’s length and without reference to other features in the letter: a straight feature is coded as inaccurate if the pen trace deviates from the straightest path between the ends of the feature by more than 1/6 of the feature length. In the first R in Fig. 3 (R(I)), feature R1 (highlighted in solid black) is inaccurately shaped. Curvedness requires the pen trace to deviate from the straightest path between its endpoints by more than 1/6 of feature width, without the trajectory crossing itself. Thus, the feature length (measured between the endpoints and the bottom of the curve) must be at least 1/6 of feature width (measured between the two most extreme points to each side of the endpoints). All the curved features in Fig. 3 are sufficiently curved; in our data, lack of curvedness was generally not a problem.
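The straightness and curvedness rules can be expressed in terms of the perpendicular distance of each sample from the chord joining the feature’s endpoints. The sketch below makes two simplifying assumptions that the text leaves open: the length of a straight feature is taken as the endpoint-to-endpoint distance, and the width of an open curve is taken as the trace’s extent along the chord direction. It does not check whether the trajectory crosses itself, and it does not apply to closed curves, whose endpoints may coincide.

```python
import numpy as np

def chord_geometry(x, y):
    """Max perpendicular deviation from the endpoint chord, chord length, extent along chord."""
    p = np.column_stack([x, y]).astype(float)
    chord = p[-1] - p[0]
    length = float(np.hypot(*chord))
    if length == 0:
        raise ValueError("endpoints coincide; use a closed-curve rule instead")
    rel = p - p[0]
    unit = chord / length
    deviation = np.abs(rel[:, 0] * unit[1] - rel[:, 1] * unit[0])  # |2-D cross product|
    along = rel @ unit                                              # position along the chord
    return float(deviation.max()), length, float(along.max() - along.min())

def straight_is_accurate(x, y, tol=1 / 6):
    deviation, length, _ = chord_geometry(x, y)
    return deviation <= tol * length

def curve_is_curved_enough(x, y, tol=1 / 6):
    deviation, _, width = chord_geometry(x, y)
    return deviation > tol * width

xs = np.linspace(0, 10, 11)
slight = 0.5 * np.sin(np.pi * xs / 10)   # 0.5 mm bow over a 10 mm stroke
deep = 3.0 * np.sin(np.pi * xs / 10)     # 3 mm bow over a 10 mm stroke
print(straight_is_accurate(xs, slight), curve_is_curved_enough(xs, deep))
```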

As shown in Table 1, curves can be either open or closed. Open curves need an opening that is at least 1/6 of feature width. Closed curves can have a gap or overlap between endpoints of up to 1/6 of feature width. For letters with only one option, such as U, a gap smaller than 1/6 of feature width means that the letter is not accurately shaped. Likewise, the letter O can only be written with a closed curve, and a gap larger than 1/6 of feature width means the letter is not accurately shaped (Footnote 1). For letters with options, such as R2 in Table 1, the open/closed distinction serves two purposes. First, it makes letter description easier. Second, because this scheme is intended for exploring handwriting fluency, other researchers may have a particular interest in curved features; although we have not pursued this further here, they may find the distinction useful.
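The open/closed distinction can likewise be sketched as a comparison between the endpoint gap and the feature width. In the hypothetical helper below, width is approximated by the horizontal extent of the trace, and gaps and overlaps are both measured simply as the distance between the first and last samples; both are simplifying assumptions, not rules taken from the appendix.

```python
import numpy as np

def closure(x, y, tol=1 / 6):
    """Classify a curved feature as 'closed' or 'open' from its endpoint gap."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    width = x.max() - x.min()                     # assumed: horizontal extent
    gap = np.hypot(x[-1] - x[0], y[-1] - y[0])    # gap or overlap between endpoints
    return "closed" if gap <= tol * width else "open"

def closure_is_accurate(x, y, required):
    """required is 'open' (e.g. U), 'closed' (e.g. O) or 'either' (e.g. R2)."""
    return required == "either" or closure(x, y) == required

theta_o = np.linspace(0, 2 * np.pi, 100)     # a complete loop, as in O
theta_u = np.linspace(0, 1.5 * np.pi, 100)   # three quarters of a loop
print(closure(np.cos(theta_o), np.sin(theta_o)), closure(np.cos(theta_u), np.sin(theta_u)))
```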

Position

Positioning of features refers to the spatial orientation of features as well as to gaps and overlaps between features. For open curves, spatial orientation refers to the direction of the open end – left, right, upwards or downwards – and is described for each letter. Straight lines can be vertical, diagonal or horizontal. In this coding scheme, the tolerance for gaps/overlaps is 1/6 of letter height, or of feature height in the case of curves. In Fig. 3, the top arm of the R2 feature of R(III) does not meet the top of R1 as specified in the table. The horizontal overlap is within the 1/6 tolerance for overlap between features that should meet, but the vertical overlap exceeds this tolerance, and the feature is therefore coded as inaccurate for position. The R3 feature in the same letter does not meet R2 and is therefore also coded as inaccurate for position. The R3 features in R(II) and R(IV) are not diagonal (slanting down to the right) and are therefore coded as inaccurate for position. The R2 and R3 features in R(VI) are both inaccurately positioned, as both are placed to the left of R1.
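The position rules reduce to two kinds of check: whether two feature ends that should meet do so within the tolerance, tested separately along each axis as in the R(III) example, and whether a straight feature has the required orientation. In the sketch below, the 20° band used to separate vertical, diagonal and horizontal lines is a hypothetical choice (the text does not specify one), and the orientation check ignores the direction of slant, which a full implementation would also need to test for R3.

```python
import math

def ends_meet(end_a, end_b, reference_height, tol=1 / 6):
    """True if two feature ends coincide within tol * reference height on both axes.

    reference_height is the letter height, or the feature height for curves.
    """
    limit = tol * reference_height
    return abs(end_a[0] - end_b[0]) <= limit and abs(end_a[1] - end_b[1]) <= limit

def orientation(start, end, band_deg=20.0):
    """Classify a straight feature; band_deg is a hypothetical threshold."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))   # 0 = horizontal, 90 = vertical
    if angle >= 90 - band_deg:
        return "vertical"
    if angle <= band_deg:
        return "horizontal"
    return "diagonal"

# Letter 12 mm high, so ends may differ by up to about 2 mm on each axis.
print(ends_meet((0.0, 12.0), (0.5, 11.2), 12.0))   # True: small overlap
print(ends_meet((0.0, 12.0), (0.5, 9.0), 12.0))    # False: 3 mm vertical miss
print(orientation((4.0, 6.0), (8.0, 0.0)))         # diagonal: an R3-like leg
```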

Size

For a feature to be accurate it also needs to meet relative size criteria. Criteria for size are letter-specific and are described in Appendix 1. To illustrate, in the letter R the vertical straight feature must be proportional to the other two features, and vice versa: the length of R1 should be twice the width of the curve in R2. The length of the curve, R2, must be shorter than or similar to the length of R1, and its width must be half the length of R1. Unless otherwise specified, the tolerance for size difference is 1/6 of the size of the previously produced feature. In Fig. 3, the curved feature R2_1 in R(V) is too big in comparison with the previously produced feature, as the width of the curve is almost the same as the length of R1.
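For R, the size rules described above can be written as a single check. The sketch below is illustrative: it takes the three measurements as already extracted from the trace, applies the default tolerance of 1/6 of the previously produced feature (R1), and treats “shorter than or similar to” as at most R1’s length plus that tolerance, which is our reading rather than a rule stated in the appendix.

```python
def r_sizes_accurate(r1_length, r2_length, r2_width, tol=1 / 6):
    """Relative-size check for uppercase R (all lengths in the same unit)."""
    slack = tol * r1_length                              # tolerance from the earlier feature, R1
    width_ok = abs(r2_width - r1_length / 2) <= slack     # R2 width about half of R1
    length_ok = r2_length <= r1_length + slack            # R2 no longer than about R1
    return width_ok and length_ok

print(r_sizes_accurate(12.0, 7.0, 6.0))    # True: well-proportioned R
print(r_sizes_accurate(12.0, 7.0, 11.0))   # False: curve nearly as wide as R1 is long, as in R(V)
```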

Alternative allographs

All letters have several legal allographs, depending on whether they are written in upper or lower case, and in block or cursive script. In addition, some letters have several allographs within these categories. As Fig. 4 shows, upper-case A is an example of a letter with two allographs: in one, two straight features slant towards each other at the top, while in the other these straight features are replaced by a single curved feature. The scheme is open-ended and may need to be adapted and augmented in specific contexts.

Fig. 4

Two handwritten allographs for the letter A, one has one curved and one straight feature and the other has three straight features

One feature – multiple segments

A feature may be produced with a single stroke (e.g., the h open curve – feature h2 in our coding scheme – produced by the adult in Fig. 1). A feature can also be produced with multiple pen stops, as is the case for the child’s production of the h open curve in Fig. 1. It may even be produced in two or more distinct movements. The bar of the T – feature T2 in our coding scheme – would typically be produced by a skilled writer in a single stroke. Figure 5 shows this feature, T2, being produced by a beginning writer in two distinct movements. The numbers represent the sequence in which these were generated, and arrows indicate the approximate initial direction of the pen trajectory. The two movements are separated by a pen lift and in-air movement, and the pen moves in a different direction to produce each segment. Nor must the movements that make up a feature be consecutive, as the g in Fig. 5 illustrates: feature g1 in our coding scheme is produced in three distinct, non-consecutive movements.
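Because a feature may be assembled from several, possibly non-consecutive, pen movements, kinematic measures have to be aggregated per feature rather than per movement. The sketch below assumes that each segment has already been labelled with its production order, feature code and start and end times. The segment data are illustrative only, loosely modelled on the g in Fig. 5, and the code g2 is a placeholder.

```python
from collections import defaultdict

def summarise_features(segments):
    """segments: iterable of (order, feature, t_start_s, t_end_s) tuples."""
    parts = defaultdict(list)
    for order, feature, t0, t1 in sorted(segments):
        parts[feature].append((order, t1 - t0))
    summary = {}
    for feature, items in parts.items():
        orders = [o for o, _ in items]
        summary[feature] = {
            "n_segments": len(items),
            "duration_s": sum(d for _, d in items),   # pen-down time across segments
            "consecutive": all(b == a + 1 for a, b in zip(orders, orders[1:])),
        }
    return summary

g = [(1, "g1", 0.0, 0.3), (2, "g2", 0.5, 0.6), (3, "g1", 0.8, 1.0),
     (4, "g2", 1.2, 1.4), (5, "g1", 1.6, 1.7)]
print(summarise_features(g)["g1"])
```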

Fig. 5

Handwritten letters in which features are produced with multiple actions. Different line shades represent pen traces bounded by pen lifts. Numbers represent the sequence in which these were generated. Arrows indicate approximate initial direction of pen trajectory. The feature g1, for example, was therefore produced in three separate, non-consecutive movements (segments 1, 3 and 5). Data are digitally sampled pen movements by a Norwegian child who was just starting to learn how to handwrite

Handwriting fluency in adults and beginning handwriters

Our purpose in this section is to illustrate how the segmentation and coding scheme can be used as the basis for a detailed analysis of the kinematics of single-letter production in a sample of very early writers and in competent adults.

Participants and task

Our child sample comprised 176 Norwegian children tested within the first four weeks of first grade (mean age 6.2 years, 86 girls). Early childhood care and education in Norway (Barnehage / Kindergarten) is attended by 97.6% of 5-year-olds (Norwegian Directorate for Education & Training, 2019; see Footnote 2). Children in kindergarten do not, however, follow a set curriculum and, in particular, there is no requirement to learn handwriting before the start of primary (elementary) school. Many of the children in our sample were, therefore, at the very start of learning how to handwrite. Our adult sample comprised faculty and other staff at a Norwegian university (N = 27, 23 women). We did not record age.

Both children and adults copied the letters A M d h T d g R. Letters were displayed on cards presented one at a time by a researcher (Footnote 3). Participants copied these within pre-printed 2.5 cm square boxes. They were instructed to “write the letter as they saw it”, without a requirement to copy its form exactly.

All participants were asked to write with their dominant hand. Adults then completed the task again, writing with their non-dominant hand. This provided a direct motor-control manipulation, holding all other factors that might affect production fluency constant.

Participants wrote with an inking ballpoint stylus on paper overlaid on a Wacom Intuos XL digitising tablet connected to an HP Elitebook i5 laptop. Pen-tip locations were sampled at intervals of around 7.5 ms (133 Hz) with a spatial resolution of at least 330 lines/cm. Software for pen-movement capture and analysis was provided by the OpenHandWrite suite of programs (Simpson et al., 2021), which provides a digitising tablet interface for PsychoPy (Peirce et al., 2019).

Data from the child sample are a subset of data previously reported in Fitjar et al. (2021), although the analyses reported in this paper are new. Adults were recruited specifically for this paper.

Processing handwritten data

Pen traces were segmented and coded according to the segmentation and coding scheme presented in the previous section. If copied accurately using the most common allographs, these 8 letters segment into a total of 20 features. We additionally classified each feature as either straight or curved. The motor plan for producing a curved line is more complex than that for a straight line, and these different motor plans are reflected in different kinematic profiles (Habas & Cabanis, 2008; Morasso & Mussa Ivaldi, 1982). Effects of graphomotor difficulty in writers with impaired or not-yet-developed graphomotor ability are therefore more likely to appear when drawing curves than when drawing straight lines (e.g., Fitjar et al., 2021). Of the 20 features in the present task, 8 were curved and 12 straight.

The digitised pen traces were first segmented into features, with each boundary placed at the first visible (non-zero pressure) sample belonging to the pen trace of an identifiable feature. We then calculated the tangential velocity (speed) of the pen tip at each sample point and filtered the resulting velocity time course with a 4th-order, 10 Hz low-pass Butterworth filter to remove measurement noise. Finally, we counted the velocity maxima remaining in each feature (see, for example, Khalid et al., 2010; Overvelde & Hulstijn, 2011; Smits-Engelsman et al., 2001).
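The velocity-peak measure described above can be sketched in Python as follows. This is a minimal illustration, not the OpenHandWrite implementation. It assumes pen-tip coordinates `x`, `y` (any spatial unit) and timestamps `t` in seconds for a single, already-segmented feature, a sampling rate of about 133 Hz, and zero-phase (forward-backward) filtering with SciPy's `filtfilt`, which the text does not specify; forward-backward filtering doubles the effective filter order. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def tangential_velocity(x, y, t):
    """Pen-tip speed at each sample, from finite differences of position.

    np.gradient uses central differences in the interior and one-sided
    differences at the two ends.
    """
    vx = np.gradient(x, t)
    vy = np.gradient(y, t)
    return np.hypot(vx, vy)


def count_velocity_peaks(x, y, t, fs=133.0, cutoff_hz=10.0, order=4):
    """Count local maxima of low-pass filtered tangential velocity for one feature.

    Assumption: a 4th-order Butterworth low-pass filter applied forward and
    backward (zero phase). With default padding, filtfilt needs more than
    3 * (order + 1) samples, i.e. more than 15 samples (about 0.12 s at 133 Hz).
    Returns the peak count and the smoothed velocity trace.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    speed = tangential_velocity(x, y, t)
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    smoothed = filtfilt(b, a, speed)
    peaks, _ = find_peaks(smoothed)  # interior local maxima only
    return len(peaks), smoothed
```

A single smooth, bell-shaped stroke yields one peak, while two strokes joined at a point where the pen momentarily stops yield two. Real data would also need the segmentation step, so that each call receives only the samples belonging to one feature.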

Results

We present our analysis of these data as follows. We first describe the distribution of velocity maxima for straight and curved features produced by children and by adults writing with their dominant and non-dominant hands. We then give examples of fluency and accuracy for participants producing the three features of an upper-case letter R. Finally, we provide inferential analysis across all stimulus letters and both the adult and child samples to determine the differential effects of handwriting skill on the production of curved and straight features.

Fluency distributions

Figure 6 shows frequency distributions for the three groups producing straight and curved features. As we suggested in our introduction, the modal number of velocity peaks for adults was one when producing straight lines and two when producing curves. For straight lines, and for many curves, these represent the minimum possible number of velocity peaks. Adults writing with their dominant hand tended, as might be expected, to be maximally fluent. Interestingly, although the distribution had a longer tail when adults wrote with their non-dominant hand, the modal number of velocity peaks remained similar to that for dominant-hand writing, and substantially lower than for our child sample. Given that our adult participants are unlikely to have practised handwriting with their non-dominant hand, this finding is consistent with the assumption that the motor plans underlying competent handwriting are effector-independent (Wing, 2000).
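The two summary quantities used in this section, the modal velocity-peak count and the share of productions at the minimum possible count, can be computed as below. This sketch assumes a flat list of `(group, shape, n_peaks)` records, one per produced feature, with hypothetical group labels such as `"child"` or `"adult_dom"`. It also assumes a minimum of one peak for straight features and two for curved features; as noted above, this holds for straight lines and for many, but not all, curves.

```python
from collections import Counter, defaultdict

# Assumed minimum possible peak counts (true for straight lines and many curves).
MIN_PEAKS = {"straight": 1, "curved": 2}


def summarise_peaks(records):
    """Return {(group, shape): (modal_count, share_at_minimum)}.

    records: iterable of (group, shape, n_peaks), one per produced feature.
    Ties for the mode are resolved in favour of the smaller count.
    share_at_minimum counts productions at or below the assumed minimum.
    """
    counts = defaultdict(list)
    for group, shape, n_peaks in records:
        counts[(group, shape)].append(n_peaks)

    summary = {}
    for key, values in counts.items():
        freq = Counter(values)
        top = max(freq.values())
        mode = min(v for v, c in freq.items() if c == top)
        at_min = sum(v <= MIN_PEAKS[key[1]] for v in values) / len(values)
        summary[key] = (mode, at_min)
    return summary
```

The same grouping could be extended to individual features (for example R1 and R2) to ask how many writers reached the minimum on both.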

Fig. 6

Distribution of count of velocity maxima (velocity peaks remaining after 10 Hz low-pass filtering) for adults and children producing straight and curved features

An example: upper-case R

Table 2 gives summary statistics for adults and children producing the three features of the letter R. Adults produced all three features accurately in every case, and inaccuracy was also rare in children. This partly reflects the task design: both adults and children had the letter they were to produce visible in front of them as they wrote. It also reflects the coding scheme, in which the parameters within which a feature must lie are deliberately broad, so as to capture any successful attempt at production. Only a subset of these productions would also be perceived as neat.

Table 2 Descriptive statistics for three features for the letter R

Pen movement in adults was much more fluent than in children, even when adults were writing (accurately) with their non-dominant hand (see also Fig. 9). Of the adults writing with their dominant hand, 74% achieved the minimum possible number of velocity peaks for both features R1 and R2.Footnote 4

Fluency decreased when adults wrote R with their non-dominant hand but, as we have already noted, only slightly. This effect can be clearly seen in the example in Fig. 7. Non-dominant handwriting was clearly slower than normal writing, and mid-curve deceleration was more pronounced. However, the velocity profile kept a similar shape to normal production and came nowhere near the level of disfluency seen in our child sample.

Fig. 7

Examples of velocity profiles for each feature of the letter R – R1 (vertical straight), R2_1 (open curve), R3 (diagonal straight) – for a child, and an adult writing with both non-dominant and dominant hand. Open circles in the trace represent locations of velocity peaks. Filled circles represent stops or lifts. Velocity is smoothed with a 10 Hz Butterworth filter

One question worth asking concerns the relationship between accuracy and fluency. Figure 8 gives velocity plots from four different child writers producing feature R2. It again demonstrates clearly the importance of exploring fluency alongside accuracy. On the basis of their pen traces alone, the children in the top two panels would be identified as skilled handwriters, and the children in the bottom two panels might be identified as needing remedial intervention. This is despite the child in the second panel taking 8 times as long as the child in the first panel to produce the same feature. The bottom two panels show faster production, suggesting an accuracy-fluency trade-off: in older children, inaccuracy is often associated with greater rather than lower fluency (van Galen et al., 1993). However, analysis of just the child data from the present sample (children at an earlier stage of learning to handwrite than those sampled by van Galen et al.), reported in Fitjar et al. (2021), did not find this effect. Instead, inaccurate features were produced with, on average, 3 more velocity peaks than accurately produced features.

Fig. 8

Pen velocity (smoothed with 10 Hz Butterworth filter) for examples of children producing Feature R2 either correctly or incorrectly, and the resulting trace. Filled circles represent pen lifts or stops. Unfilled circles represent velocity maxima

Effects of feature shape on child and adult fluency

The descriptive statistics and illustrations reported so far suggest that, as might be expected, curved features present a greater graphomotor challenge than straight features, particularly in the early stages of learning to handwrite. In this section we test that hypothesis with data from both the adult and child samples. To this end we compared nested linear mixed-effects models (e.g., Baayen et al., 2008) predicting velocity peak count, implemented in the lme4 R package (Bates et al., 2015). Models were compared by likelihood ratio χ2 test. Significance of parameter estimates was evaluated against a t distribution with Satterthwaite approximation for the denominator degrees of freedom (implemented in lmerTest; Kuznetsova et al., 2017). We started with an intercept-only model and then added, as fixed effects, main effects of condition (child, adult dominant hand, adult non-dominant hand) and of feature shape (straight or curved). This model (Model 1) fitted significantly better than the intercept-only model (χ2(1) = 70.4, p < 0.001). We then added the interaction between these factors (Model 2 vs. Model 1, χ2(1) = 49.3, p < 0.001). This final model gave an estimated marginal R2 of 0.17 (Nakagawa & Schielzeth, 2013), and intra-class correlations of 0.26 for random effects of child and 0.16 for random effects of item.
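The model-comparison and variance-partitioning statistics reported here follow standard formulas, sketched below in Python for readers who want to check them by hand. This is not the lme4/lmerTest workflow used in the analysis. It assumes the log-likelihoods of two nested models, fitted by maximum likelihood rather than REML, and the estimated variance components are already available. It also assumes the per-factor intra-class correlation is that factor's variance divided by the total random-effect plus residual variance, which is one common definition. The function names are hypothetical.

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Chi-square likelihood ratio test for two nested models.

    Log-likelihoods should come from maximum-likelihood (not REML) fits;
    df_diff is the number of extra parameters in the full model.
    Returns (chi-square statistic, p value).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2.sf(stat, df_diff)


def icc(random_vars, residual_var, name):
    """Share of non-fixed variance attributable to one random factor (assumed definition)."""
    return random_vars[name] / (sum(random_vars.values()) + residual_var)


def marginal_r2(fixed_var, random_vars, residual_var):
    """Nakagawa & Schielzeth marginal R2: variance explained by fixed effects alone.

    fixed_var is the variance of the fixed-effect linear predictor.
    """
    return fixed_var / (fixed_var + sum(random_vars.values()) + residual_var)
```

The degrees of freedom passed to the test equal the number of fixed-effect parameters added between models; under treatment coding, a three-level factor such as condition contributes two.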

The effects found by the best-fitting model can be clearly seen in Fig. 9. Relative to adults writing with their dominant hand, there was some evidence of reduced fluency when adults wrote with their non-dominant hand (estimated velocity peak increase = 1.0, 95% CI [0.01, 2.0], p = 0.047), with no significant additional effect of the feature being curved. Children were substantially less fluent for straight features (estimated increase = 5.8, 95% CI [4.5, 7.2], p < 0.001), with a substantial additional effect of 7.9 velocity peaks (95% CI [5.7, 9.9], p < 0.001) for curved features.
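Because these estimates are expressed relative to adults writing straight features with their dominant hand, the child-versus-adult difference for curved features combines two terms. Assuming treatment (dummy) coding with that cell as the reference level, and writing $N$, $C$ and $V$ as indicators for non-dominant hand, child and curved feature (notation introduced here for illustration; random effects omitted), the fixed part of the model and the implied child difference for curved features are:

$$
\mathbb{E}[\text{peaks}] = \beta_0 + \beta_N N + \beta_C C + \beta_V V + \beta_{NV}\, N V + \beta_{CV}\, C V
$$

$$
\mathbb{E}[\text{peaks} \mid C{=}1, V{=}1] - \mathbb{E}[\text{peaks} \mid N{=}0, C{=}0, V{=}1] = \beta_C + \beta_{CV} \approx 5.8 + 7.9 = 13.7
$$

For straight features ($V = 0$) the corresponding difference is $\beta_C \approx 5.8$ velocity peaks.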

Fig. 9

Observed velocity peak count, after 10 Hz Butterworth smoothing. Error bars represent one standard deviation

Conclusion

The aim of this paper was to describe and illustrate a method for segmenting handwritten letter pen-traces into sub-letter features that can then form the basis for an analysis of handwriting fluency. The scheme that we have described also necessarily identifies whether or not a feature has been produced with an acceptable degree of accuracy.

The details of our specific implementation of the approach (our 1/6 tolerance principle and, particularly, our set of acceptable allographs) can and should be varied by users to fit specific research questions, populations, and educational contexts. Our contribution reduces to two observations. First, if researchers want to compare handwriting kinematics across writers, then the comparison must be across features that are, to some meaningful extent, spatially equivalent and that are identified independently of how they are produced. Second, it is possible to develop a rigorous approach to identifying these features that allows both for variation in the absolute size of letters and for the fact that writers may represent letters using different allographs.

Combined with analysis of movement fluency based on counts of velocity peaks, our segmentation and coding scheme therefore allows direct measurement of graphomotor performance. This will be of use to researchers who are directly interested in motor control, providing a more systematic and rigorous approach to pen-trace coding than we were able to find in the existing literature. It will also be of use to researchers with a broader interest in the cognitive processes that underlie written production, and to those developing strategies for supporting children learning to write. Our approach contrasts, for example, with measures that count the number of characters produced in a fixed period of time. These necessarily confound handwriting fluency with time spent on other writing-related processing that occurs when the pen is stationary. Applying this method to real-time data from, for example, sentence copying or written alphabet recall would allow the contribution of handwriting fluency per se to a writer’s overall fluency to be separated from the contribution of processes (reading, message processing or stimulus recall, syntactic and orthographic retrieval) that are more likely to occur when the pen is lifted.
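The distinction drawn here, between time the pen spends on the page and time it spends lifted, can be made directly from timestamped samples. The following is a minimal sketch, not part of the method described in this paper. It assumes arrays of sample times in seconds and pen pressure, with zero pressure meaning the pen is lifted (matching the non-zero-pressure criterion used for segmentation), and that the tablet keeps reporting zero-pressure samples while the pen is lifted. Each inter-sample interval is attributed to the pen state at its start. The function names are hypothetical.

```python
import numpy as np


def pen_time_budget(t, pressure):
    """Split total writing time into pen-down and pen-up time (seconds).

    Each interval t[i] -> t[i+1] is attributed to the pen state at t[i].
    Assumes zero-pressure samples are still reported while the pen is lifted.
    """
    t = np.asarray(t, dtype=float)
    down = np.asarray(pressure) > 0
    dt = np.diff(t)
    pen_down = float(dt[down[:-1]].sum())
    pen_up = float(dt[~down[:-1]].sum())
    return pen_down, pen_up


def letters_per_minute(n_letters, seconds):
    """Production rate over a given time base."""
    return 60.0 * n_letters / seconds
```

Dividing a letter count by total time gives the familiar fixed-period measure; dividing by pen-down time alone gives a rate from which pen-lifted processing time has been removed, although pen-down pauses would still need separate treatment.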