Spatial knowledge during skilled action sequencing: Hierarchical versus nonhierarchical representations

Behmer, Lawrence P.; Crump, Matthew J. C.

doi:10.3758/s13414-017-1389-3

Published: 31 July 2017

Volume 79, pages 2435–2448, (2017)

Attention, Perception, & Psychophysics


Lawrence P. Behmer Jr.¹ &
Matthew J. C. Crump¹


Abstract

Typists can type 4 to 5 keystrokes per second at around 95% accuracy, yet they appear to have poor declarative knowledge of key locations. Logan and Crump (2011, Psychology of Learning and Motivation, Vol. 54, pp. 1–27) accounted for this paradox by proposing that typing is hierarchically organized into two loops, with an outer loop that transforms sentences into words and passes each word, one at a time, to an inner loop that transforms each word into its constituent keystrokes; however, the nature of the inner loop’s spatial knowledge is not well understood. Key locations may be learned through the experiences of locating and traversing between keys. In daily life, people tend to type structured language, and, as a consequence, certain keys and key-to-key transitions are experienced more frequently than others. Here, we asked whether or not this knowledge is structured hierarchically. For example, knowledge of key locations may be nested within representations of words, or the inner loop may rely on knowledge that is independent from higher level structures. To test this, we had people type English, English-like, and random strings during normal, partially occluded, and occluded typing. In both partially occluded and occluded typing, error rates were higher while typing random strings compared to English and English-like strings, whereas there was no difference in error rates between English and English-like strings. This suggests that typists’ spatial knowledge of the keyboard is not driven by hierarchical word-level representations, but instead is likely driven by a collection of individual processes, such as knowledge of the sequential structure of language acquired by typing more frequently occurring letters.

Spatial cognition is fundamental for goal-directed behavior across spatial scales, from navigating city streets to locating keys on a computer keyboard. To date, research has focused less on how people navigate in microenvironments, such as typing on a keyboard, and has instead focused more on spatial cognition in macro-environments, such as finding one’s way from one point in town to another, or judging the distances and directions on a physical or mental map (Hintzman, O’Dell, & Arndt, 1981; Ishikawa & Montello, 2006; Maguire, Frackowiak, & Frith, 1997; McNamara, 1986; McNamara, Ratcliff, & McKoon, 1984; Siegel & White, 1975; Thorndyke & Hayes-Roth, 1982; Tversky, 1993, 2000). As a consequence, whether or not principles of spatial cognition in macro-environments apply across spatial scales to microenvironments, such as typing on a QWERTY keyboard, is not well known.

Theories of spatial cognition describe knowledge of spatial relations in terms of the form and function of underlying representations, the structure of how representations are coded with respect to one another, and whether the content of spatial information is embedded in representations or computed by processes operating on representations (McNamara, 1986). We use the first three concepts to guide our questions about how people acquire and use spatial knowledge about the keyboard during typing.

Spatial knowledge could be represented as analog forms such as mental images or memories of environments (Kosslyn, 1975; Kosslyn & Pomerantz, 1977), in propositional forms such as using language to describe object positions (Pylyshyn, 1973), or both (Kosslyn & Shwartz, 1977). Indeed, there is wide agreement that cognitive maps are not “map-like” and are better described as cognitive collages (Tversky, 1993, 2000) that combine multiple representational forms. So, typists may not have a high-fidelity internal map of the keyboard but instead could rely on multiple forms of spatial knowledge to guide their fingers during typing.

Different forms of representation provide different functions. If typists have different forms of spatial knowledge about the keyboard, they may use them in an ad hoc fashion. For example, an analog mental image of the keyboard would be useful for judging distances and angles between keys, and propositional knowledge that a keyboard has a QWERTY layout would be useful for recalling which letter is placed directly to the right of Q.

Spatial knowledge is acquired and developed with experience by processes that structure how new and old spatial knowledge is coded with respect to one another. Spatial knowledge about macro-environments can develop in a scaffolded manner, with initial learning about landmarks enabling learning about individual routes between landmarks, and learning about multiple routes enabling integrated high-level survey or map-like knowledge about an environment (Siegel & White, 1975). However, spatial knowledge does not necessarily progress through the above stages, and people can rely on minimally necessary spatial knowledge to accomplish spatial tasks without developing higher level integrated forms of spatial knowledge (Byrne, 1982). Typists may achieve the ability to navigate the keyboard perfectly well without forming a map-like representation of the keyboard, and instead rely on more local route-like knowledge for moving fingers from key to key.

A central question about processes that structure relations between spatial knowledge is whether relations are represented hierarchically (McNamara, 1986; Shelton & McNamara, 2001; Stevens & Coupe, 1978). Hierarchical representations group details of spatial codes into individual nested objects. For example, a New Yorker might represent New York City, the five boroughs, neighborhoods, and specific streets, routes, and buildings using connected but separate spatial codes. An open question is whether and how people integrate new spatial codes with existing nested representations. For example, Wang and Brockmole (2003) showed that people exposed to a new spatial environment (an experimenter’s lab) within a familiar environment (university campus) can create new spatial representations that are not necessarily integrated with existing spatial knowledge. Similarly, in the present work we ask whether typists’ knowledge of key locations on a QWERTY keyboard is coded hierarchically, by higher level word units that group together routes between key locations. We first review prior work examining the quality of typists’ spatial knowledge of the QWERTY keyboard.

Spatial knowledge of the QWERTY keyboard

Experts can type four to six letters per second and maintain high (94%) accuracy. On the surface, this suggests typists have high-fidelity spatial knowledge about key locations that enables speeded finger movements to individual keys. However, several findings suggest that this knowledge is not represented by an internal analog of the keyboard. First, typists have poor declarative knowledge of key locations on a keyboard. For example, Snyder, Ashitaka, Shimada, Ulrich, and Logan (2014) showed that typists could identify the correct key locations on a blank QWERTY keyboard only 57% of the time in a pen-and-paper task. Accuracy increased to 79% when participants were cued to identify a specific key location but was still significantly worse than during normal typing (94%). Second, typists perform poorly at estimating the distances and angles between keys (Liu, Crump, & Logan, 2010). Typists apparently know where the keys are when they are typing, but not when they are explicitly asked to report on key locations. Finally, error rates during typing dramatically increase as kinematic and tactile cues from the keyboard are removed (Crump & Logan, 2010). If typists had an internal analog map of the keyboard, they should have been able to explicitly identify letter locations, judge angles and distances between keys, and move fingers to appropriate locations without external feedback.

The task of typing is hierarchical

The somewhat paradoxical finding that typists have excellent procedural knowledge and poor declarative knowledge of key locations can be explained by Logan and Crump’s (2011) two-loop theory of typewriting. The theory proposes that fast and accurate typing is controlled by independently nested loops that divide the labor of typing. For example, the outer loop relies on language generation and comprehension to turn ideas into paragraphs, sentences, and words that are sent to the inner loop. The inner loop receives word-level instructions and serially orders keystroke responses to type each letter in a word. The division of labor explains the paradox, because the outer loop does not know the details of how the inner loop executes keystrokes. Similarly, knowledge of key locations is assumed to be housed by the inner loop.

This raises the question of how the inner loop knows where the keys are. If typists do not have a high-fidelity analog map of the keyboard, then what is the form of their spatial knowledge? We propose that knowledge of key locations is learned as individual landmarks or routes over the course of experience with locating and traversing between particular keys. On this view, the inner loop does not have an integrated map of the keyboard but a parceled, memory-based collection of keystroke procedures (Logan, 1988; Rosenbaum et al., 1995).

If the inner loop uses memory-based procedures for traversing between keys, then knowledge about individual key locations should depend on the frequency with which individual keys are typed. Indeed, prior work has established that interkeystroke intervals are shorter for more frequent than less frequent letters, bigrams, and trigrams (Behmer & Crump, 2017b). This suggests that procedures for generating individual keystrokes are tuned by a process sensitive to the frequency of specific keystrokes. We are interested here in determining whether or not spatial knowledge about key locations and individual key transitions is structured in a hierarchical manner. For example, spatial knowledge about individual key locations may be cued by higher level structure at different n-gram levels or at the word level. Alternatively, the inner loop could form knowledge about key locations and transitions that is independent from higher level structure.

We tested these alternatives by comparing error rates for typing English words, English-like strings that approximated the bigram structure of English, and random strings. If spatial knowledge of key locations is hierarchically organized at the word level, then we expect lower error rates for English words compared to English-like and random strings. If spatial knowledge of key locations is hierarchically organized at the bigram level, then we expect lower error rates for English words and English-like strings compared to random strings, because words and word-like strings contain more frequently occurring and potentially well-learned keystroke transitions. If spatial codes are not hierarchically tied to higher level units, then we expect no differences in error rates at the letter level between string type conditions. For example, spatial knowledge of individual key locations could depend on key-specific practice such that key location knowledge is better for more than less frequently occurring letters. The main results sections of Experiments 1 and 2 address the evidence for hierarchical coding at the word and bigram level, and the final combined experiment analysis section addresses evidence for nonhierarchical coding at the letter level.

Experiment 1

Prior work shows that typists are faster when typing normal words compared to random strings and that error rates tend to be stable across string types (Gentner, Larochelle, & Grudin, 1988; Shaffer & Hardwick, 1968). However, differences in error rates may be obscured when typists can freely view the keyboard, because they may switch from relying on inner-loop spatial knowledge to using visual cues when key locations are uncertain. Snyder, Logan, and Yamaguchi (2015) manipulated typists’ access to visual cues by occluding the keyboard and withholding feedback from the monitor. They found that typists were slower and committed more errors when the keyboard and hands were occluded compared to normal typing. We adopted these occlusion manipulations to maximize reliance on inner-loop knowledge of key locations during typing.

In Experiment 1, typists copied normal English words, English-like nonwords that conformed to the bigram structure of the English language, and random strings while their hands and keyboard were visually occluded and typing output was withheld from the computer monitor. The central question of interest was whether error rates were lowest for words compared to other string types, which would indicate hierarchical spatial coding of key locations.

Method

Participants

Fifty people participated in Experiment 1. Some participants had high error rates in the occluded condition. In order to analyze a common set of subjects on all dependent measures, we excluded participants with fewer than four correct RT responses in the occluded typing conditions (Van Selst & Jolicoeur, 1994). As a consequence, 12 participants were removed from the final analysis. This left a total of 38 participants (five males) in the final analysis. Participants’ mean age was 20 (SD = 3.4). They typed an average of 64 words per minute (SD = 14), reported having been typing for 12 years (SD = 3.3), and started typing at 9 years of age (SD = 3.0). Eighteen participants reported that they had received some type of formal typing instruction, either during K–12 or from a computer-based tutorial (mean training time = 25 weeks, SD = 22). Fourteen participants self-reported as touch typists, and 11 self-reported as “hunt and peck” (nine other; four no response). All participants reported having everyday access to a computer. They reported using a computer an average of 4.5 hours a day (SD = 3.0) and spent an average of 41% of their time on a computer typing endogenously generated text (SD = 33%) versus 14% copy typing (SD = 18%).

Stimuli

All the stimuli were five-letter strings. The English words were compiled from the MRC Psycholinguistic Database (Wilson, 1987) with a frequency range of 100 to 500. The English-like strings were generated according to bigram probabilities from a large corpus of text that we compiled for a previous experiment by counting the occurrences of single letters, bigrams, and trigrams from more than 3,000 freely available e-books found on Project Gutenberg (Behmer & Crump, 2016a). The resulting five-letter strings were not actual words but followed the bigram frequencies of our Gutenberg corpus. The random strings were constructed by randomly sampling each letter in the alphabet with replacement, leading to an equal likelihood of occurrence for each letter. Our stimuli controlled for physical constraints. There was an even distribution between single and two-handed bigrams (see Table 1). Additionally, we calculated the correlations between the single-letter and bigram frequency of our stimuli with the single-letter and bigram frequency from our Gutenberg corpus. English (letters: r = .87; bigrams: r = .64) and English-like (letters: r = .92; bigrams: r = .86) strings were strongly correlated with the Gutenberg single-letter and bigram frequency distributions; however, random strings (letters: r = .04; bigrams: r = .03) were not well correlated.
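The two generated string types can be illustrated with a minimal Python sketch. It is not the authors' stimulus code: the counts in `LETTER_COUNTS` and `BIGRAM_COUNTS` are hypothetical toy values standing in for the Gutenberg corpus counts, and the choice to sample the first letter from single-letter counts is an assumption, since the text does not say how the first letter was chosen or whether real words were filtered out.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def random_string(rng, length=5):
    """Sample each letter uniformly from the alphabet, with replacement."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def english_like_string(rng, letter_counts, bigram_counts, length=5):
    """Sample a first letter in proportion to letter_counts, then sample each
    following letter in proportion to the bigram counts that start with the
    previous letter (a first-order Markov chain over letters)."""
    letters = list(letter_counts)
    current = rng.choices(letters, weights=[letter_counts[c] for c in letters])[0]
    out = [current]
    while len(out) < length:
        # Fall back to single-letter counts if no continuation was observed.
        followers = bigram_counts.get(current) or letter_counts
        options = list(followers)
        current = rng.choices(options, weights=[followers[c] for c in options])[0]
        out.append(current)
    return "".join(out)


# Hypothetical toy counts; a real run would use corpus-derived counts.
LETTER_COUNTS = {"t": 9, "h": 6, "e": 12, "a": 8, "n": 7, "r": 6}
BIGRAM_COUNTS = {
    "t": {"h": 5, "e": 2, "a": 1},
    "h": {"e": 6, "a": 2},
    "e": {"r": 4, "n": 3, "a": 1},
    "a": {"n": 4, "t": 3, "r": 2},
    "n": {"e": 2, "t": 3, "a": 1},
    "r": {"e": 4, "a": 2, "t": 1},
}

rng = random.Random(0)  # fixed seed so the example is repeatable
english_like = [english_like_string(rng, LETTER_COUNTS, BIGRAM_COUNTS) for _ in range(3)]
random_strings = [random_string(rng) for _ in range(3)]
print(english_like, random_strings)
```

In this sketch every adjacent letter pair in an English-like string is a bigram that occurs in the counts, whereas a random string can contain any pair with equal probability.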

Table 1 Distribution of unimanual and bimanual bigrams across block and string type in Experiment 1

Design and procedure

All procedures were conducted with the approval of the Brooklyn College Institutional Review Board. After reading and signing an informed consent form, participants were brought into the experiment room to participate in the study. Up to four participants were run during each session, with each participant seated comfortably in a private cubicle under normal fluorescent lighting, within 55–60 cm of a Mac computer and standard IBM-clone QWERTY USB keyboard (see Fig. 1, left panel). During occluded typing, a cardboard box that was 42 cm long × 31 cm wide × 21 cm high covered the keyboard (see Fig. 1, right panel). The box had a 10-cm tall opening in the front that ran the entire length of the box so that participants could comfortably fit their hands inside to type but were unable to see either their hands or the keyboard. During normal typing, the box was removed. Prior to the start of the experiment, participants practiced typing a short paragraph while the keyboard and their hands were occluded beneath the box. If a participant struggled during the occluded practice task, the box was temporarily removed and they were allowed to type a few sentences while being able to see their hands and the keyboard. When they were ready, the box was then put back in place and the participant continued practicing. If participants needed more practice, they were allowed to type the paragraph twice. During the actual occluded trials, the box was not removed until the block was completed.

The experiment was programmed using LiveCode 6.6.2. Prior to each trial, participants were instructed to rest their hands on the home row of the keyboard. Each trial (see Fig. 2) began with a fixation cross that appeared in the center of the screen for 500 ms, followed by a 500 ms blank screen ISI. This was followed by the to-be-typed stimulus that remained on the screen for the duration of the trial. All stimuli were presented in the center of the screen in a 36 point Times New Roman font. The color of the text was black and presented against a gray background. Participants were instructed to respond to the word as quickly and accurately as possible. Participants did not receive correct/incorrect feedback. When participants typed a letter, regardless of whether or not the keystroke was correct or incorrect, the background of the letter changed from gray to green. After five keypresses, the next trial automatically began. During the task, the backspace key was disabled. There was a 2,000 ms pause between trials, so that participants could move their hands back to the home row of the keyboard. The three different types of word strings were presented in random order in each block. In total, participants typed 225 unique words in each block. Since we were concerned that typists might perform better during occluded typing after engaging in normal typing, blocks were not counterbalanced. Participants always performed the occluded typing block first, followed by the normal typing block. Mandatory 30-s breaks occurred after 75 and 150 trials. After the completion of the experimental procedure, participants completed a survey that collected their basic demographic information and typing experience.

Results

We collected error rates, reaction times (RTs; the time in milliseconds to type the first letter of the word), and interkeystroke intervals (IKSIs; the difference in time in ms between the current and previous keypress) in each condition. Correct RTs and IKSIs were submitted to an outlier elimination procedure (nonrecursive; Van Selst & Jolicoeur, 1994) that removed an average of 3% of observations. For each subject, means from each condition were submitted to a 3 (string type: English, English-like, random string) × 2 (keyboard occlusion: normal vs. occluded) repeated-measures ANOVA. Analyses for RTs and IKSIs were restricted to correct responses. Planned comparisons were performed using paired t tests. Planned comparisons for a String Type × Keyboard Occlusion interaction were Bonferroni corrected to p < .006. Means and standard deviations for all measures in Experiment 1 can be found in Table 2.
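The timing measures defined above can be made concrete with a short Python sketch. The timestamps and the typed string are hypothetical, and `score_trial` is an illustrative helper rather than the authors' analysis code; in particular, scoring accuracy letter by letter is an assumption about how errors were counted.

```python
def score_trial(target, typed, press_times_ms, onset_ms=0.0):
    """Score one trial.

    RT is the time from stimulus onset to the first keypress; each IKSI is the
    time between a keypress and the keypress before it.
    """
    rt = press_times_ms[0] - onset_ms
    iksis = [later - earlier
             for earlier, later in zip(press_times_ms, press_times_ms[1:])]
    correct = [t == r for t, r in zip(target, typed)]
    return {"rt": rt, "iksis": iksis, "correct": correct,
            "all_correct": all(correct)}


# Hypothetical trial: the fourth letter is mistyped ("a" instead of "s").
trial = score_trial("house", "houae", [612.0, 745.0, 880.0, 1010.0, 1150.0])
print(trial)
```

A five-letter string yields one RT and four IKSIs, so the two measures separate the time to start a string from the time to move between its keys.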

Table 2 Experiment 1 means and standard errors (in parentheses)

Error rates

There was a main effect of keyboard occlusion, F(1, 37) = 88.38, MSE = 0.094, p < .001, η_p² = 0.70; a main effect of string type, F(2, 74) = 23.03, MSE = 0.0010, p < .001, η_p² = 0.38; and a Keyboard Occlusion × String Type interaction, F(2, 74) = 6.87, MSE = 0.0012, p = .003, η_p² = 0.16. Figure 3 (far left panel) shows the mean error rates for Keyboard Occlusion × String Type. During occluded typing, error rates were lower for English, t(37) = −4.70, p < .001, and English-like strings, t(37) = 4.55, p < .001, compared to random strings. There was no difference in error rates between English and English-like strings during occluded typing, t(37) = −0.25, p = .80. During normal typing, there was no difference between English and English-like, t(37) = −1.27, p = .21, or English-like and random strings, t(37) = 1.84, p = .07. Although mean error rates were numerically higher for random than English strings, they were not statistically different according to the .006 threshold for multiple comparisons, t(37) = −2.90, p = .006. Additionally, error rates were higher during occluded compared to normal typing when typing English, t(37) = −9.25, p < .001; English-like, t(37) = −8.94, p < .001; and random strings, t(37) = −9.67, p < .001.
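As a check on the reported effect sizes, partial eta squared can be recovered from each F ratio and its degrees of freedom using the identity η_p² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it to the three error-rate tests. The Bonferroni line assumes nine planned comparisons (three string-type contrasts within each occlusion condition plus three occlusion contrasts); that count is inferred from the tests reported here and is not stated by the authors.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return alpha / n_comparisons


eta_occlusion = partial_eta_squared(88.38, 1, 37)
eta_string_type = partial_eta_squared(23.03, 2, 74)
eta_interaction = partial_eta_squared(6.87, 2, 74)

# Assumption: nine planned comparisons, inferred from the reported t tests.
threshold = bonferroni_alpha(0.05, 9)

print(round(eta_occlusion, 2), round(eta_string_type, 2),
      round(eta_interaction, 2), round(threshold, 3))
```

Under these assumptions the computed values round to 0.70, 0.38, and 0.16, matching the reported effect sizes, and 0.05 / 9 rounds to the .006 threshold used for the planned comparisons.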

Reaction times

There was a main effect for keyboard occlusion, F(1, 37) = 9.64, MSE = 26863, p = .004, η_p² = 0.21, and string type, F(2, 74) = 163.03, MSE = 6223, p < .001, η_p² = 0.82. There was also a Keyboard Occlusion × String Type interaction, F(2, 74) = 5.07, MSE = 3778, p = .009, η_p² = 0.12. Figure 3 (middle panel) shows the mean RTs for Keyboard Occlusion × String Type. RTs were faster during occluded typing for English compared to English-like, t(37) = −5.41, p < .001, and random strings, t(37) = −8.56, p < .001. RTs were also faster for English-like compared to random strings, t(37) = 6.75, p < .001. Likewise, RTs were faster during normal typing for English compared to English-like, t(37) = −10.61, p < .001, and random strings, t(37) = −15.78, p < .001. RTs were also faster for English-like compared to random strings, t(37) = 13.49, p < .001. Additionally, RTs were faster during normal compared to occluded typing when typing English, t(37) = −4.17, p < .001, and English-like strings, t(37) = −4.08, p < .001; however, there was no difference between normal and occluded typing when typing random strings, t(37) = −1.00, p = .32.

IKSIs

There was a main effect for keyboard occlusion, F(1, 37) = 38.46, MSE = 7981, p < .001, η_p² = 0.51, and string type, F(2, 74) = 292.25, MSE = 3413, p < .001, η_p² = 0.89; however, there was not a Keyboard Occlusion × String Type interaction, F(2, 74) = 2.87, MSE = 1092, p = .07. Figure 3 (far right panel) shows the mean IKSIs for Keyboard Occlusion × String Type. IKSIs were faster during normal compared to occluded typing, t(37) = 6.19, p < .001. IKSIs were also faster when typing English compared to English-like, t(37) = −10.487, p < .001, and random strings, t(37) = −18.25, p < .001, as well as while typing English-like compared to random strings, t(37) = 19.28, p < .001.

Discussion

We were interested in investigating whether the inner loop’s knowledge of the spatial layout of the keyboard was hierarchical. We tested this by having participants type English words, English-like strings constructed from the frequency distributions of bigrams in the English language, and random letter strings during normal typing conditions and when the keyboard and hands were occluded beneath a box. We observed that during occluded typing, error rates did not differ when participants typed English and English-like strings, and both were significantly lower compared to error rates when typing random strings. This suggests that the spatial codes between strings and their constituent key presses are not hierarchically linked to higher level units, such as words. If knowledge of key locations were driven by word-level hierarchies, then we would have expected error rates during occluded typing to be significantly lower for English words than English-like strings. Instead, the finding that error rates were higher for random strings than English words and English-like strings is consistent with hierarchical coding at the level of bigrams or key-to-key transitions, and also with nonhierarchical coding at the level of individual keys. These alternatives are addressed in the combined analysis that follows Experiment 2.

Last, typists were slower during occluded compared to normal typing. These general findings are consistent with prior work from Snyder et al. (2015) who also observed that RTs and IKSIs were slower when feedback from the monitor was withheld and typists’ hands and keyboard were occluded from view, as well as Crump and Logan (2010), who observed that typing became progressively slower as kinematic feedback from the keyboard was manipulated.

Experiment 2

The purpose of Experiment 2 was to reproduce the basic findings of Experiment 1 and control for potential confounds that could have produced higher error rates for the random strings compared to the English and English-like strings. First, in the occluded condition of Experiment 1 typists were not given the opportunity to confirm that their fingers were starting in a home-row position, so they could have committed several proximity errors and inadvertently executed keypresses on adjacent keys that were within one to two key locations of the correct key. We performed an unreported analysis in which we treated nearby incorrect responses as correct responses, and found that although error rates were reduced overall, the pattern of higher error rates for random strings compared to English and English-like strings remained significant. To address the confound directly, in Experiment 2 we instructed participants to visually reorient their hands to the home row between trials during occluded typing. Additionally, we introduced a partial occlusion condition where participants could see the keyboard and their hands, but the identities of the letters on the keyboard were occluded with stickers. Last, following the analysis of Experiment 2 we performed two combined analyses of the occluded conditions from both experiments. The first examines whether the pattern of higher error rates for random strings varies by typing expertise. The second examines evidence for nonhierarchical coding at the letter level by determining whether error rates correlate with letter frequency from the natural English language, and whether error rate distributions at the letter level are similar across the string type conditions.
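One way to picture the lenient scoring used in the unreported proximity analysis is the Python sketch below. It is an illustration only: the keyboard geometry (`ROWS`, `ROW_OFFSETS`), the Euclidean distance measure, and the default cutoff of two key widths are all assumptions, since the text specifies only that nearby responses fell within one to two key locations of the correct key.

```python
import math

# Assumed letter-key layout; offsets approximate the row stagger in key widths.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]

KEY_POS = {
    ch: (col + ROW_OFFSETS[row_index], float(row_index))
    for row_index, row in enumerate(ROWS)
    for col, ch in enumerate(row)
}


def key_distance(a, b):
    """Euclidean distance between two letter keys, in key widths."""
    (xa, ya), (xb, yb) = KEY_POS[a], KEY_POS[b]
    return math.hypot(xa - xb, ya - yb)


def lenient_correct(target, typed, max_distance=2.0):
    """Count a keystroke as correct if it lands within max_distance key
    widths of the target key (the exact key has distance zero)."""
    return [key_distance(t, r) <= max_distance for t, r in zip(target, typed)]


# "j" is next to "h", so the first keystroke is accepted; "p" is far away.
near = lenient_correct("house", "jouse")
far = lenient_correct("house", "pouse")
print(near, far)
```

Rescoring in this way separates errors that reflect a small drift of the hands from errors in which the typist aimed at the wrong region of the keyboard.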