The personal computer has become a ubiquitous tool in the neurobehavioral sciences over the last three decades. Test administration to human participants has particularly benefited from this technological advance because the instructions can readily be presented in a standardized fashion, potentially in multiple languages, large volumes of objective data can be efficiently collected with lower probabilities of experimenter error, and tests can be administered by various individuals after brief training. Relative to their paper-and-pencil counterparts, computerized tests can also be more easily adapted to a neuroimaging environment (Kubo et al., 2009) and require less manual effort for scoring and data analysis. The measurement of executive functions has especially benefited from computerized test developments (Conners, 1985; Greenberg & Waldman, 1993; Gur et al., 2010; Wild, Howieson, Webbe, Seelye, & Kaye, 2008). Executive functions are adaptive, goal-directed actions that allow an individual to override automatic or established behaviors. Tasks that involve set shifting, response inhibition, and working memory, especially those that require solving a novel problem, are thought to provide indices of executive function (Garon, Bryson, & Smith, 2008).

However, two challenges continue to face those interested in adopting computerized behavioral testing. First, although the price for individual tests may be reasonable, many researchers prefer to measure a broad array of sensory, motor, and higher-order cognitive functions using a battery approach (Piper, Acevedo, Craytor, Murray, & Raber, 2010; Wild et al., 2008). Some of the better studied batteries (e.g., the Cambridge Neuropsychological Test Automated Battery [CANTAB]; Fray & Robbins, 1996) not only charge for the initial setup with specialized equipment, but also have substantial annual or per-use license fees to keep the software operational. While this price is partially understandable as a way to offset the resources needed for test development, excessive costs may prevent smaller laboratories or investigators in developing nations from being full participants in the research process. Second, the computer code that underlies the commercial tests either may not be readily available or may be too poorly documented for other researchers to interpret the operations or verify their correctness.

The Psychology Experiment Building Language (PEBL) was developed to overcome both of these limitations. The software is freely downloadable and has been documented so that others with moderate programming skills can modify the individual tests to meet their experimental needs (Mueller, 2010b). The current PEBL battery (version 0.6) consists of approximately 50 tests, including many classic ones in experimental psychology and behavioral neurology (Mueller, 2010a, 2010c). The objective of the present study was to report on findings from a subset of PEBL battery tests that assess several core capacities of executive function across the lifespan, including attention, planning, decision making, and cognitive flexibility.

Characterization of age-related performance profiles can be useful for several purposes, including providing fundamental data to help determine the etiology of these changes, enabling diagnostic tests of specific impairments to be developed, and, eventually, identifying interventions for both younger and older individuals that may optimize neurocognitive functioning. Many neurobehavioral capacities improve in young children as they gain maturity, including fine motor function, reaction time, sustained attention, and working memory (Kail & Salthouse, 1994; Piper, 2011). As age progresses, there is often an age-related loss in processing speed and the emergence of generalized cognitive reduction, with deficits in visuospatial skills, working memory, and executive function (see Mahncke, Bronstone, & Merzenich, 2006; Park, 2000). In the present study, we sought to establish whether these general aging effects would occur across a set of four distinct and complementary behavioral tests. Each of the four tests employed, including their historical antecedents, is described below. Three of these tests are versions of some of the most widely used measures in clinically orientated behavioral neuropsychological research, while the fourth has seen limited prior use. To provide some context, we will first review the historical origins of the tests we completed.

The Trail Making Test (TMT) was originally part of the Army Individual Test Battery (1944) but was later adopted into the Halstead–Reitan Neuropsychological Test Battery (Reitan, 1955). Traditionally, the experimenter used a stopwatch to record how long participants took to connect dots on paper that were either numbered (Trails Part A: 1–2–3–4–5) or alternated between numbers and letters (Trails Part B: 1–A–2–B–3). The TMT is simple to administer and is used as an index of visual attention (Trails A) and cognitive flexibility (Trails B). The TMT has also been employed as a sensitive indicator of brain damage (Davidson, Gao, Mason, Winocur, & Anderson, 2008; Periáñez et al., 2007; Reitan, 1955, 1958; Stuss et al., 2001). In addition, the TMT has been employed with a wide variety of other populations, including those with age-associated memory impairment (Hänninen et al., 1997), alcoholics (O’Leary, Radford, Chaney, & Schau, 1977), and children with temporal lobe epilepsy (Guimarães et al., 2007).

The Wisconsin Card Sorting Test (WCST) is another prevalent measure of executive function in contemporary neuropsychological practice and research. This test was originally conceptualized by Berg (1948; Grant & Berg, 1948). The original design of the task involved physically placing cards in one of four piles on the basis of the characteristics of the stimuli. The rule for correctly sorting the stimuli changes regularly, and the ability to switch strategies based on the shape, color, or number of stimuli is recorded. A response in which the earlier rule is incorrectly employed is a perseverative error. Like the TMT, the WCST has been used with various clinical populations, including individuals with schizophrenia (Shad, Tamminga, Cullum, Haas, & Keshavan, 2006) and children with attention deficit hyperactivity disorder (Tsuchiya, Oki, Yahara, & Fujieda, 2005).

The Tower of London (ToL) is a test of planning in which colored disks or balls on pegs are moved individually from an initial state to match a goal state. Optimal performance involves forming, retaining, and implementing a plan to make as few moves as possible. This test was originally developed as a simplified version of the Tower of Hanoi by Shallice (1982). The cognitive and neurophysiological substrates of ToL performance have been frequently and thoroughly examined (Phillips, Wynn, Gilhooly, Della Sala, & Logie, 1999; Ward & Allport, 1997). Both lesion and neuroimaging findings have identified the prefrontal cortex as integral in performing the ToL, as well as the TMT and WCST (Davidson et al., 2008; Kubo et al., 2009; Schall et al., 2003; Zakzanis, Mraz, & Graham, 2005).

The “Time-Wall” task is a recently developed nonverbal decision-making test modeled after a task originally included in the Unified Tri-Services Cognitive Performance Assessment Battery, which was used by the military of the United States for personnel testing (Perez, Masline, Ramsey, & Urban, 1987; Snyder & Rice, 1990). The objective of the original Time-Wall task was to assess the ability to estimate the time at which a target, moving vertically at a fixed rate, will have traveled a specified distance. Thus, it draws on skills relating to both motion perception (Sekuler, Watamaniuk, & Blake, 2002) and prediction. An interesting aspect of performance on Time-Wall type tasks is that this skill appears to be a stable characteristic that does not improve (Jerison, Crannell, & Pownall, 1957), even with extensive practice (Perez et al., 1987).

Although factor-analytic studies indicate that executive function is not a simple, unitary process (Fisk & Sharp, 2004; Miyake et al., 2000), performance across the lifespan on the aforementioned tests generally exhibits a U-shaped association with age. Table 1 provides an overall framework for the present endeavor by outlining the contributions of age to executive function, using the non-PEBL test versions. The Reitan Neuropsychological Laboratory, the originator and distributor of the Reitan TMT, has recognized this developmental profile and has constructed different versions of the test for preadolescent (ages, 9–14 years) versus older (ages, 15+) participants. Among adults, the time to complete Part B increases with age (Yeudall, Reddon, Gill, & Stefanyk, 1987; Tombaugh, 2004). Similarly, on the WCST, the number of perseverative errors decreases with age in children, is stable from ages 20 to 60, and is elevated at senescence (Chelune & Baer, 1986; Fisk & Sharp, 2004; Hartman, Bolton, & Fehnel, 2001; Huizinga, Dolan, & van der Molen, 2006; Strauss, Sherman, & Spreen, 2006). Young adults were most efficient at solving the CANTAB ToL, relative to either children or older adults (DeLuca et al., 2003).

Table 1 Performance across the lifespan on measures of executive function

To demonstrate the validity of the PEBL executive function tests, we set out to determine whether the tests demonstrate the expected U-shaped relationship between age and behavior, with a progression during childhood, optimal performance (i.e., lowest scores/errors) during late-adolescence/early-adulthood, and then a regression during senescence.


Attendees of the Oregon Museum of Science and Industry were first asked their age and handedness and then were asked to complete a short computer-administered task that typically lasted from 3 to 12 min. The tests were implemented using the PEBL platform and were typically identical to or slightly adapted from the versions distributed in version 0.5 of the PEBL Test Battery (exact methods are described below, and the scripts are available as supplemental materials). Testing was performed on one of ten personal computers running the Microsoft Windows operating system. The minimum age (5–7 years) to participate was based on the complexity of each test and preliminary testing. Children participating in an experiment were tested without the assistance of their parents or guardians, who instead were encouraged to take part in the study on a separate workstation while their children completed the test. Testing was completed in a semiprivate area with partitions between computer stations to limit any visual distractions or viewing of the monitors at the adjacent station. The instructions were displayed on screen and read aloud by the experimenter to each participant. All procedures were approved by the Institutional Review Board of Oregon Health and Science University. Each participant completed only one test, and so, consequently, we have no direct correlational measures between performance on different tests. During the data collection period (May–September 2010), data were obtained using eight different PEBL tests, of which four are reported here and two are available elsewhere (Berteau-Pavy & Piper, 2011; Piper & Miller, 2011).

PEBL Trail Making Test (pTMT)

The participants (N = 384; ages, 5–76 years; 51.3% female; 13.0% left-handed) completed a computerized version modeled generically after the Halstead–Reitan Trail Making Test. The PEBL version uses an automated algorithm to generate the problems, rather than using the specific set of layouts in the Halstead–Reitan test. The test administered was slightly modified from the one contained in version 0.5 of the PEBL battery, so that the same five test forms were used for all participants (instead of being generated randomly). The instructions below were displayed prior to testing:

In this experiment, your goal is to click on each circle, in sequence, as quickly as you can. When you click on the correct circle, its number will change to boldface, and a line will be drawn from the previous circle to the new circle. On some trials, the circles will be numbered from 1 to 25, and you should click on them in numerical order (1–2–3–4). On other trials, the circles will have both numbers (1 to 13) and letters (A through L), and you should click on them in an alternating order (1–A–2–B–3–C). If you click the wrong circle, no line will be drawn. The trial will continue until you have successfully clicked on all of the circles in the correct order. After the display appears, you can examine the circles as long as you want. Timing will not begin until you click on the first circle, which is labeled '1' on every trial.

The pTMT contained ten trials and alternated between five trials with only numbers (Part A) and five trials with alternating numbers and letters (Part B; see Fig. 1a). Each Part A trial had a corresponding Part B trial (an isomorphic problem, mirrored along the vertical axis) with an equal distance to connect all the items. The primary dependent measure was the total time to complete each part. The B:A ratio and accuracy, defined as the minimum number of clicks necessary to complete each trial divided by the number made, were also calculated.
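The three pTMT measures described above can be sketched in a few lines of Python. This is an illustrative sketch, not the PEBL script: the `Trial` record and the `summarize_ptmt` function are hypothetical names, and pooling clicks across the five trials of a part (rather than averaging per-trial accuracies) is an assumption, since the text does not say how trials were combined.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One pTMT trial (hypothetical record layout)."""
    part: str           # "A" (numbers only) or "B" (numbers and letters)
    seconds: float      # time from the first click to the last correct click
    clicks_made: int    # every click, correct or not
    clicks_needed: int  # minimum clicks, i.e., the number of circles (25)


def summarize_ptmt(trials):
    """Return total time and accuracy per part, plus the B:A time ratio."""
    summary = {}
    for part in ("A", "B"):
        rows = [t for t in trials if t.part == part]
        total_time = sum(t.seconds for t in rows)
        # Assumption: accuracy pools clicks across all trials of the part.
        accuracy = sum(t.clicks_needed for t in rows) / sum(t.clicks_made for t in rows)
        summary[part] = {"time": total_time, "accuracy": accuracy}
    summary["ratio_B_to_A"] = summary["B"]["time"] / summary["A"]["time"]
    return summary


if __name__ == "__main__":
    demo = [Trial("A", 20.0, 25, 25), Trial("B", 30.0, 50, 25)]
    print(summarize_ptmt(demo))
```

Because every wrong click adds to `clicks_made` but not to `clicks_needed`, accuracy is 1.0 for an error-free trial and falls toward 0 as errors accumulate.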

Fig. 1

PEBL Trail Making Test behavior in children (ages, 5–12 years; N = 166), adolescents (ages, 13–19 years; N = 84), early adults (ages, 20–49 years; N = 95), and late adults (ages, 50–76 years; N = 39). Screen shot from a trial in Part B (a), completion time (b), and accuracy (c). *p < .0005 versus Part A; a p < .05 versus adolescents; e p < .05 versus early adults; c p < .0005 versus children; l p < .05 versus late adults

PEBL Wisconsin (Berg) Card Sorting Test (pWCST)

Participants (N = 246; ages, 7–89 years; 45.5% female; 10.7% left-handed) completed a card sorting task (Fig. 2a) modeled after Berg (1948) and described more fully elsewhere (Lyvers & Tobias-Webb, 2010). The instructions were as follows:

You are about to take part in an experiment in which you need to categorize cards based on the pictures appearing on them. To begin, you will see four piles. Each pile has a different number, color, and shape. You will see a series of cards and need to determine which pile each belongs to.... The correct answer depends upon a rule, but you will not know what the rule is. But, we will tell you on each trial whether or not you were correct. Finally, the rule may change during the task, so when it does, you should figure out what the rule is as quickly as possible and change with it. Press any key to begin.

Fig. 2

Wisconsin Card Sorting (Berg, 1948) performance in children (ages, 7–12 years; N = 71), adolescents (ages, 13–19 years; N = 63), early adults (ages, 20–49 years; N = 81), and late adults (ages, 50–89 years; N = 30). a Screen shot: The lower card is placed into one of the four piles on the basis of similarity of shape, color, or number. b Percentages of perseverative errors by age (l p < .05 vs. late adults). c Response times on correct and incorrect trials by age (*p < .0005 vs. correct; c p < .05 vs. children; l p < .05 vs. late adults)

After each trial, feedback of “correct!” or “incorrect” was displayed for 500 ms. The maximum number of trials was 128 (i.e., two decks of 64 cards) but could be shorter (100) on the basis of optimal category completions. The rule (color, shape, or number) could switch as quickly as every tenth trial. The primary dependent measure was the percentage of the total number of trials with perseverative errors. A perseverative error was defined as an incorrect response to a shifted or new category that would have been correct for the immediately preceding category. Response time was also obtained for correct and incorrect decisions for each participant, although excessively short (<100 ms) or long (>10 s) trial times were excluded prior to calculating the mean for each participant.
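The scoring rules in this paragraph can be illustrated with a short Python sketch. It is not the PEBL implementation: the function names and the trial representation (a tuple of the current rule, the immediately preceding rule, and the set of dimensions on which the chosen pile matched the card) are assumptions made for illustration.

```python
def percent_perseverative(trials):
    """Percentage of all trials that were perseverative errors.

    Each trial is (rule, prev_rule, matched), where `matched` is the set of
    dimensions ("color", "shape", "number") on which the chosen pile matched
    the card, and `prev_rule` is None during the first category.
    """
    n_perseverative = 0
    for rule, prev_rule, matched in trials:
        correct = rule in matched
        # Perseverative: wrong now, but right under the preceding rule.
        if not correct and prev_rule is not None and prev_rule in matched:
            n_perseverative += 1
    return 100.0 * n_perseverative / len(trials)


def mean_response_time(rts_ms, low=100, high=10_000):
    """Mean RT after dropping trials shorter than 100 ms or longer than 10 s."""
    kept = [rt for rt in rts_ms if low <= rt <= high]
    return sum(kept) / len(kept) if kept else None


if __name__ == "__main__":
    demo = [
        ("color", None, {"color"}),      # correct, first category
        ("shape", "color", {"color"}),   # perseverative error
        ("shape", "color", {"number"}),  # nonperseverative error
        ("shape", "color", {"shape"}),   # correct
    ]
    print(percent_perseverative(demo))
    print(mean_response_time([50, 500, 1500, 20_000]))
```

Note that the denominator is the total number of trials, following the definition of the primary measure above, rather than the number of errors.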

PEBL Tower of London (pTOL)

The participants (N = 325; ages, 6–82 years; 44.0% female; 12.3% left-handed) completed eight trials with stimuli based on set A from Phillips et al. (1999). The instructions were as follows:

You are about to perform a task called the 'Tower of London'. Your goal is to move a pile of disks from their original configuration to the configuration shown on the top of the screen. You can only move one disk at a time, and you cannot move a disk onto a pile that has no more room (indicated by the size of the grey rectangle). To move a disk, click on the pile you want to move a disk off of, and it will move up into the hand. Then, click on another pile, and the disk will move down to that pile.

Notably, unlike some versions of the ToL, the version we tested placed no restrictions on the height of the pegs or the number of moves allowed to solve the problem. The primary dependent measure was the total number of extra moves across the seven trials (moves made minus 48, the minimum necessary to solve the problems), although the total time was also recorded (Fig. 3).

Fig. 3

Tower of London performance in children (ages, 6–12 years; N = 118), adolescents (ages, 13–19 years; N = 51), early adults (ages, 20–49 years; N = 99), and late adults (ages, 50–82 years; N = 56). Screen shot (a), extra moves (b), and time (c). a p < .05 versus adolescents; c p < .01 versus children; l p < .001 versus late adults


The participants (N = 268; ages, 5–79 years; 47.8% female; 12.6% left-handed) completed a time estimation task based on Perez et al. (1987). The females in this sample were older (25.2 ± 1.8 years) than the males (19.9 ± 1.3 years), t(236.6) = 2.42, p < .05, so sex differences were evaluated with the sample stratified into age categories. The instructions, slightly modified from Snyder and Rice (1990), were as follows:

This is an experiment to see how well you can estimate the speed of a moving square target. The target will always start at the top of the screen and descend at a constant rate toward the bottom. After the target is two-thirds of the way down, it will pass behind a wall and become invisible. Your task is to press a button at the exact moment the moving target would pass through the notch marked at the very bottom of the display. In making this judgement, you are not to count or use any other rhythm method to facilitate your judgement. Instead, follow the target with your eyes and imagine it continuing straight down behind the wall to the notch. After you have pressed the button, you will receive feedback as to where the target actually was and whether you over or underestimated the time interval. When you are ready, press a key on the keyboard and the next target shall emerge from the top.

The participants underwent a brief practice, followed by 18 trials on which the correct completion time ranged from 2,000 to 9,200 ms (M = 5,822.4 ms, SEM = 558.4). Figure 4a shows a screenshot from Time-Wall. The primary dependent measure was inaccuracy, defined as the absolute value of the participant’s response time minus the correct time, divided by the correct time for that trial. Since the vast majority of responses on tests of this type are too early (Jerison & Arginteanu, 1958; Jerison et al., 1957), the percentage of trials on which the response time was greater than the correct time was also determined. These two values map roughly onto precision and bias: optimal Time-Wall performance would result in smaller values for inaccuracy (ideally, 0.00), and unbiased performance would result in a proportion of late responses close to 50%.
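As a rough illustration of these two measures, the Python sketch below computes the per-trial relative error and the percentage of late responses. The function name `score_time_wall` is hypothetical, and averaging the per-trial inaccuracy across trials is an assumption; the text defines inaccuracy per trial without stating how trials were aggregated.

```python
def score_time_wall(responses_ms, correct_ms):
    """Return (mean inaccuracy, percentage of late responses).

    Inaccuracy on one trial is |response - correct| / correct.
    A response is "late" when it exceeds the correct time.
    """
    pairs = list(zip(responses_ms, correct_ms))
    relative_errors = [abs(r - c) / c for r, c in pairs]
    # Assumption: per-trial inaccuracy is averaged across trials.
    inaccuracy = sum(relative_errors) / len(relative_errors)
    percent_late = 100.0 * sum(r > c for r, c in pairs) / len(pairs)
    return inaccuracy, percent_late


if __name__ == "__main__":
    # One early, one late, and one exactly on-time response.
    print(score_time_wall([1800, 4400, 9000], [2000, 4000, 9000]))
```

Dividing by the correct time puts short and long trials on the same scale, so a 200-ms miss counts for more on a 2,000-ms trial than on a 9,200-ms trial.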

Fig. 4

Time-Wall screenshot (a, with target, wall, and notch labeled) and performance in children (ages, 5–12 years; N = 105), adolescents (ages, 13–19 years; N = 73), early adults (ages, 20–49 years; N = 59), and late adults (ages, 50–79 years; N = 32). Inaccuracy (b) and percentages of late responses (c) by sex (c p < .05 vs. children; l p ≤ .05 vs. late adults; f p < .05 vs. females)

Statistical analysis

All analyses were conducted using SPSS, version 16.0 (SPSS Inc., Chicago, IL). Mixed ANOVAs were completed where applicable, with age divided into four groups: children (from the test-specific minimum age of 5–7 years to 12 years), adolescents (13–19 years), early adults (20–49 years), and late adults (50+ years). If Mauchly’s sphericity test was significant on repeated measures ANOVAs, Greenhouse–Geisser-corrected results were reported, with the corresponding reduction in the degrees of freedom. Pearson correlation coefficients were computed among measures on the same tests. Mean data are presented with the SEM, and nonlinear regressions depict the 95% confidence intervals. Effect sizes are expressed as partial η² for ANOVAs and Cohen’s d for two-group comparisons.
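Readers who wish to check a reported partial η² against its F ratio can use the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it; the function name is illustrative, and the analyses themselves were run in SPSS, not with this code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # For example, F(1, 380) = 251.7 corresponds to a partial eta squared of about .40.
    print(round(partial_eta_squared(251.7, 1, 380), 2))
```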


PEBL Trail Making Test (pTMT)

The total time to complete the pTMT was first analyzed with a mixed (within: part, A vs. B; between: age group, 5–12, 13–19, 20–49, vs. 50–76) ANOVA that identified main effects of part, F(1,380) = 251.7, p < .0005, η² = .40, and age, F(3,380) = 42.1, p < .0005, η² = .25, and an age × part interaction, F(3,380) = 24.0, p < .0005, η² = .16. Figure 1b shows that, as anticipated, all age groups took longer to complete Part B than Part A. Children took longer to finish each part than did all older groups. Late adults had longer A and B times than did adolescents and early adults. Examination of the ratio of Part B to Part A time showed an age effect, F(3,380) = 9.92, p < .0005, η² = .07, with the ratio for children (1.566 ± 0.029) being higher (p ≤ .01) than that for adolescents (1.396 ± 0.0268, d = 0.54), early adults (1.394 ± 0.0195, d = 0.58), or late adults (1.398 ± 0.047, d = 0.50).
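The Cohen's d values for the B:A ratio can be approximately recovered from the reported means and SEMs. The sketch below is an assumption-laden illustration: the text does not state which standardizer was used, so the code uses the root mean of the two group variances (unweighted by sample size), with each SD obtained as SEM × √N and the group sizes taken from the Fig. 1 caption. The function name is hypothetical.

```python
import math


def cohens_d_from_sem(m1, sem1, n1, m2, sem2, n2):
    """Cohen's d from two means, their SEMs, and the group sizes.

    Assumption: the standardizer is sqrt((SD1^2 + SD2^2) / 2), i.e., the
    two variances are averaged without weighting by sample size.
    """
    var1 = sem1 ** 2 * n1  # SD^2 = (SEM * sqrt(N))^2
    var2 = sem2 ** 2 * n2
    return (m1 - m2) / math.sqrt((var1 + var2) / 2)


if __name__ == "__main__":
    # Children (N = 166) versus adolescents (N = 84) on the B:A ratio.
    print(round(cohens_d_from_sem(1.566, 0.029, 166, 1.396, 0.0268, 84), 2))
```

Under this assumption the three comparisons round to 0.54, 0.58, and 0.50. A sample-size-weighted pooled SD would give somewhat different values, because the children's group is both the largest and the most variable.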

A mixed ANOVA showed main effects of part, F(1,380) = 110.6, p < .0005, η² = .23, and age, F(3,380) = 47.93, p < .0005, η² = .28, and a part × age interaction, F(3,380) = 14.72, p < .0005, η² = .10, on accuracy. For all ages, accuracy was lower for Part B, relative to A. Children had lower accuracy, relative to all the older groups, for both parts. Finally, late adults had greater accuracy than did adolescents (Fig. 1c). The time to complete Part B was strongly correlated with Part A time (Pearson’s R(384) = .88, p < .0005) and Part B accuracy (R(384) = -.61, p < .0005). Similarly, accuracy on Part A was highly associated with that on Part B, R(384) = .77, p < .0005.

PEBL Wisconsin (Berg) Card Sort Test (pWCST)

Overall, the males in this sample were younger (21.8 ± 1.3 years) than the females (29.8 ± 1.9 years), t(207.0) = 3.46, p ≤ .001. However, since there were no appreciable effects of sex, analyses did not incorporate this variable. The percentage of perseverative errors was analyzed with a one-way ANOVA, which revealed an age effect, F(3,242) = 2.86, p < .05, η² = .03, with late adults committing more errors than did early adults (Fig. 2b). Table 2 shows that late adults, but not adolescents or early adults, required fewer trials to complete their first category than did children (d = 0.61). A larger proportion of early adults, as compared with all the other ages, completed the WCST in less than the full 128 trials. Late adults had fewer correct responses than did either adolescents (d = 0.58) or early adults (d = 0.65), but their scores were not significantly different from those for the children.

Table 2 Wisconsin (Berg) Card Sort Test behavior in children (ages, 7–12; N = 71), adolescents (ages, 13–19; N = 63), early adults (ages, 20–49; N = 82), and late adults (ages, 50–89; N = 30)

Response time was analyzed with a mixed (within: response type, correct vs. incorrect; between: age) ANOVA, which revealed main effects of age, F(3,241) = 21.05, p < .0005, η² = .21, and response, F(1,241) = 234.9, p < .0005, η² = .49, and an age × response interaction, F(3,241) = 4.35, p ≤ .005, η² = .05. Figure 2c shows that incorrect responses took longer for all ages. Children’s correct responses were slower than those of adolescents (d = 0.91) and early adults (d = 0.52) but were faster than those of late adults (d = 0.51). Similarly, children’s incorrect responses took longer than those of adolescents (d = 0.58) but were more rapid than those of late adults (d = 0.71).

The percent correct was highly inversely related to the percentage of nonperseverative errors, R(246) = -.81. The percentage of perseverative errors was moderately associated with the percentage of total errors, R(246) = .52, p < .0005, and weakly with percentage of unique errors, R(246) = .23, p < .0005, correct response time, R(245) = .23, p < .0005, and incorrect response time, R(245) = .16, p < .05.

PEBL Tower of London (pTOL)

The number of extra moves showed a main effect of age, F(3,321) = 12.78, p < .0005, η² = .11, with children making more moves than all other groups (d = 0.43–0.80). Furthermore, early adults made fewer moves than did either adolescents (d = 0.35) or late adults (d = 0.37; Fig. 3b). Further analysis noted the same relative age pattern when examining the early (2–4) versus the later (5–8) trials (data not shown). The time to complete all trials was examined with an ANOVA that revealed a main effect of age, F(3,324) = 10.78, p < .0005, η² = .09. Children and late adults each took longer than either adolescents or early adults but were indistinguishable from each other (Fig. 3c). There was a moderate association between extra moves and time, R(324) = .54, p < .0005.


On average, time estimations were 289.1 ± 33.4 ms early, with 25.6% ± 1.1% of the trials having a late response. The average inaccuracy was 12.3% ± 0.8%. A sex × age ANOVA on percentage of late responses revealed a sex effect, F(1,261) = 6.66, p ≤ .01, η² = .03, an age effect, F(3,261) = 3.02, p < .05, η² = .03, and a trend in the interaction, F(3,261) = 2.39, p = .069. Late adult females had more late responses than did either adolescents (d = 0.62) or early adults (d = 0.74). Among males, children had fewer late responses than did late adults (d = 0.63). There was a sex difference favoring males at the adolescent (d = 0.65) and early adult (d = 0.68; Fig. 4c) ages. There was only an age effect, F(3,259) = 10.75, p < .0005, η² = .11, for inaccuracy. Girls were more inaccurate than all the older females (d = 0.89–0.92), but boys showed a difference only relative to the adolescents (d = 0.53). The correlation of the full-length version (18 trials) with an even shorter version (10 trials) was quite high for percentage of late trials, R(269) = .88, p < .005, and inaccuracy, R(267) = .97, p < .0005, indicating that future researchers could adopt the shortened 10-trial version of the test.


Our primary objective was to evaluate the validity of the PEBL tests. The prediction, based on Table 1, was of a nonlinear relationship between age and performance on measures of executive function. This was partially supported, particularly for the pTMT and the pWCST. These findings also suggest that these PEBL tests show substantial similarities with their more traditional test counterparts. The secondary hypothesis of a dissimilar profile across executive function tests was supported, especially at younger ages. Prior research with a college-aged sample completing multiple tests determined that executive function is not a unitary process (Miyake et al., 2000), and the present dataset corroborates that conclusion. The findings with each of these four measures, as well as some comments regarding their predecessor versions, are discussed below.


The original Halstead–Reitan TMT (henceforth referred to as the Reitan TMT for simplicity) and the pTMT, despite substantial procedural overlap, also differ in some key details. First, the Reitan TMT requires that the experimenter be constantly vigilant to notice any participant errors in connecting the dots and to immediately redirect them, whereas the pTMT reduces this source of variance considerably by automatically recording the number of errors and by requiring that the correct target be identified before continuing. Second, the total distance in each pair (e.g., A1 vs. B1) is equivalent in the pTMT. In contrast, the total distance in the Reitan TMT is 57 cm longer for Part B than for A (Gaudino, Geisler, & Squires, 1995), the overall trail pattern follows a distinctly different path, and this longer distance artificially inflates the difference in completion time between Parts B and A. However, despite removing this bias, the B version took significantly longer than the A for each age group on the pTMT. Third, PEBL’s use of algorithmic generation of TMT problems allows multiple tests of each form to be completed, thereby increasing reliability. Fourth, the Reitan TMT includes separate versions for different ages (<15 vs. 15+), whereas a single test was easily completed across a wide age range in the present endeavor. In a separate study (Piper et al., 2010), we discovered that having separate test versions for different ages can be an inadvertent source of variance. Fifth, and perhaps most important, the Reitan TMT is completed using a pencil, whereas the pTMT involves a computer mouse. The possibility certainly exists that individuals who have greater computer experience would complete the pTMT more rapidly, but utilization of the B:A ratio eliminates this factor. More specifically, although a U-shaped function with the B completion time was observed, with an improvement during childhood and a slight, but significant, regression for late adults (Fig. 1b; see Tombaugh, 2004, for similar findings in adults), the B:A ratio removed any differences between late adults and early adults or adolescents. Further efforts at test development could also incorporate touch screen monitors with the PEBL battery to further diminish any preexisting individual differences in computer experience.

A secondary measure on the pTMT was accuracy [correct clicks/(correct + incorrect)]. The finding that all age groups had reduced accuracy on Part B, relative to A, is not unexpected. However, the higher accuracy of the late adults on Part B, relative to both children and adolescents, is remarkable, since this indicates that late adults may adopt a strategy of more careful deliberation prior to responding. This behavioral pattern, coupled with the generalized slowing among the elderly (Mahncke et al., 2006), may have had a differential impact on the performance of the remaining tests (described below).


The primary dependent measure on the pWCST, as well as the traditional (physical cards) WCST, is the percentage of perseverative errors. The contribution of age to perseverative errors was rather subtle, particularly when placed within the context of robust age differences in response times. As compared with early adults, the response time for late adults was approximately 700 ms longer (41.8%) for correct responses and 1,100 ms longer (55.0%) for incorrect responses. Children, adolescents, and early adults had equivalent performance, but late adults made more perseverative errors than did early adults. Examination of the secondary measures (Table 2) revealed that late adults differed from other age groups in a nonuniform fashion. Hartman et al. (2001) have quite clearly documented that working memory, the limited capacity to temporarily store and process information, is essential for an individual to complete the WCST, because they must store information from completed sorts and process new information in order to identify the subsequent rule. The finding that older adults had fewer correct responses than did adolescents or early adults might suggest that the longer response times on each trial, when multiplied by the number of trials necessary to complete each category, would place much higher demands on working memory and that this factor, either independently or in conjunction with cognitive inflexibility, may account for age differences in this group. Overall, the present pattern of age differences with the pWCST is strongly congruent with prior reports (Haaland, Vranes, Goodwin, & Garry, 1987; Strauss et al., 2006).

There is a wide variety of existing computerized card sorting tests that share conceptual similarities. These include a test within the CANTAB (the Intra-Extra Dimensional Set Shift, the instrument utilized by Huizinga et al., 2006) and the Wisconsin Card Sorting Test distributed by Psychological Assessment Resources (PAR, 2003). The PAR WCST and the pWCST are each based on the procedures outlined in Berg (1948). However, there is at least one difference, in that the feedback provided is auditory in the PAR test but visual in the pWCST. A direct comparison in which participants complete the short (64-item) forms of both tests is needed to further clarify the extent to which both instruments measure the same constructs.


The number of extra moves is consistent with the developmental profile hypothesized a priori. More specifically, children made more moves relative to adolescents, who, in turn, made more moves than did early adults but were indistinguishable from late adults. These age differences are even more informative when placed within the context of completion times, with adolescents showing the shortest times of any age group. Together, the uncoupling of moves and time may indicate inferior planning and decision-making efficiency in adolescents. This pattern of responding prior to fully considering all options could also be consistent with elevated adolescent impulsivity on the pToL.

At least two other neuropsychological batteries contain ToL-type tests that are worthy of mention. The ToL freely provided by Davis and colleagues within the Colorado Assessment Tests includes both visual and audio instructions (Davis & Keller, 1998). The CANTAB Stockings of Cambridge is essentially a ToL task; it offers detailed normative data (DeLuca et al., 2003; Luciana, Collins, Olson, & Schissel, 2009) that parallel the present findings, and it is highly engaging for participants. One potential advantage of the pToL is its versatility. In addition to the trial structure used in the present study, nine others of different lengths and complexities (Phillips et al., 1999; Shallice, 1982) are available, and it can be further adapted to test new problem sets with minimal changes.


In comparison with the other measures contained in this report, Time-Wall is a relatively obscure test. However, a few general findings are worthy of mention. An adult-like response style was observed among adolescents for inaccuracy. The general response approach for all ages was to respond too early. Importantly, there was no evidence of inferior performance by late adults. In fact, late adult females were more likely than either adolescents or early adults to have late, rather than early, responses. These findings might suggest that the generalized cognitive slowing among the elderly (Mahncke et al., 2006) may confer some slight benefit on this simple decision-making test, which is quite different from the profile with the pWCST and pTMT.

Summary and conclusions

Overall, many studies examining changes in cognitive skill between childhood and senescence have been completed previously (e.g., Kail & Salthouse, 1994; Luciana et al., 2009; Mahncke et al., 2006; Park, 2000). However, the earlier data are often procured using unreleased special-purpose software and measures or with proprietary or obsolete software that is not available to many behavioral neuroscientists and neuropsychologists, especially those working in settings where costly software licenses are not an easily justifiable expense. This lifespan study may also provide useful normative behavior of age-related changes on standardized tests (Appendix Tables 3, 4, 5, 6). The benefits of characterizing normative behavior are two-fold: First, this provides a previously unknown profile of age-related changes across standardized neurocognitive measures, and second, it does so for tasks whose software is freely available for use and modification by neurobehavioral and clinical researchers.

In conclusion, the studies reported here may serve as a cornerstone for further investigations into executive function across the lifespan, and this represents a first major step in providing norms for a set of data collection tools that are freely available, verifiable, modifiable, and redistributable. Together, these norms both bolster the tenets of the scientific method, by enabling better transparency, replication, and exchange of information, and provide important normative data for using the software in applied testing contexts that proprietary tests have not reached because of their costs.