The data reported and analysed here have been uploaded to Open Science Framework: https://osf.io/g8jh5/?view_only=1b4bd078cb014d2db980ea1dcee5c855. The long appendices which include second by second detail of the two case study participants summarized at the end of the Results section can also be found at this link.
Aggregate performance measures
There were slightly more valid words produced in the high interactivity condition, with participants producing an average of 18.35 words in 5 min (SD = 8.48); in the shuffle condition, participants produced slightly fewer words (M = 17.22, SD = 6.23) and in the low condition they produced the least (M = 17.02, SD = 6.59). However, in a repeated measures analysis of variance (ANOVA), the effect of condition on performance was not significant, F(2, 78) = 1.94, p = .146, ηp² = .048. Similarly, there were no significant differences in word length between the conditions: the mean word length in the high condition was 3.68 (SD = 0.212), 3.64 (SD = 0.171) in the low condition and 3.62 (SD = 0.205) in the shuffle condition, F < 1. The words produced were marginally rarer in the high interactivity condition as indexed with Zipf scores (high: M = 4.172, SD = 0.281; low: M = 4.254, SD = 2.74; shuffle: M = 4.243, SD = 0.234); however, these means did not differ significantly, F(2, 78) = 1.54, p = .220, ηp² = .038.
Individual differences
Table 2 reports descriptive statistics for the measures of individual differences, as well as the matrix of correlations for the measures of performance (number of words produced) and scores on the measures of individual differences (df = 38 unless noted otherwise). As expected, there was a strong positive correlation between verbal fluency and anagram skills, r = .493, p = .001; both were significant predictors of performance in all three conditions (lowest r = .601, p < .001). Extraversion did not correlate with any other measure, nor did openness, with the exception of a negative correlation with word production in the high interactivity condition, r = −.331, p = .037. The direction of this association is a little difficult to interpret given that openness is sometimes associated with higher self-report measures of creativity, and it may be safer to treat the finding with caution.
Table 2 Descriptive statistics and correlations among measures of anagram performance, verbal fluency, openness, extraversion and word production performance in the three experimental conditions
Participant behaviour
We further conducted an analysis of participant behaviour within the conditions. In the low interactivity condition, participant behaviour was controlled, so variation was limited, but in the high interactivity condition participants were invited to move the tiles as they wished, resulting in a wide range of behaviour. In the shuffle condition, while behaviour was controlled between shuffles, the number and timing of shuffles were under the participants’ control.
Time interacting
In the high interactivity condition, the amount of time participants spent moving the tiles was coded using ELAN (https://tla.mpi.nl/tools/tla-tools/elan/). The total time interacting with the tiles was assessed from when a participant touched a tile to when he or she stopped touching it. As there were many moments when a participant touched a tile but did not move it, this was further split into neutral moves (which did nothing to alter the array) and active moves (which changed the array in some way, either deliberate or random). Active moves were considered a reflection of interactivity.
The average time spent interacting with the tokens was 106.4 s (SD = 65.1) out of a possible 300 s. Two people chose not to interact at all and from the remaining 38, the shortest amount of time spent moving the tokens was 2.92 s and the longest was 226.9 s. There was a significant correlation between the amount of time spent actively moving the tiles and the number of words produced in the high interactivity condition, r = .329, p = .038 (see Fig. 1); the correlation was marginally more positive when controlling for fluency, r(37) = .356, p = .026. This indicates that the amount of time spent interacting had a continually additive effect contrary to our hypothesis. Also contrary to our hypothesis, the relationship between the amount of time spent interacting and the measures of individual differences was not significant, verbal fluency: r = .117, p = .472; anagram skills: r = −.021, p = .897. The time spent moving the tiles appears to reflect something beyond individual differences in verbal skills.
To ensure that the total movement time did not reflect an individual difference that would be reflected by an increased performance across all conditions, we examined the correlations between the time spent interacting with the tiles in the high condition and performance in both the low and shuffle conditions. While both were positive, the correlation with words produced in the low condition was not significant, r = .219, p = .175, nor was the correlation with words produced in the shuffle condition, r = .146, p = .365 (and indeed when controlling for fluency these correlations were weaker: low, r(37) = .199, p = .224, and shuffle, r(37) = .093, p = .574). This suggests that the time spent moving in the high condition is a unique predictor of the number of words produced in that condition.
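The correlations "controlling for fluency" reported above are first-order partial correlations. As an illustration only, the sketch below shows one standard way to compute such a coefficient from three zero-order Pearson correlations. The function names and the toy data are hypothetical and are not taken from the study's analysis scripts; with one covariate and n = 40 participants, the degrees of freedom are n − 3 = 37, which matches the r(37) values reported.

```python
from math import sqrt

def pearson_r(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y with the covariate z partialled out."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data (hypothetical, NOT the study data):
# x = seconds spent moving tiles, y = words produced, z = fluency score.
time_moving = [1, 2, 3, 4]
words = [2, 1, 4, 3]
fluency = [1, -1, -1, 1]
print(partial_r(time_moving, words, fluency))
```

In the toy data the covariate is uncorrelated with both variables, so the partial correlation equals the zero-order correlation; in real data the two would generally differ, as they do in the values reported above.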
Shuffling
In the shuffle condition, our hypothesis that the increased time and cognitive cost would lead to a decrease in the number of shuffles from that reported in Kirsh (2014) was upheld by our data. In the shuffle condition the number and timing of the shuffles were also recorded in ELAN. The largest number of shuffles was 3 with a mean of 1.55 (SD = 1.13). Word production performance did not differ as a function of the number of shuffles: 10 participants chose not to shuffle at all (M = 18.9, SD = 6.40), 8 shuffled once (M = 15.87, SD = 6.12), 12 shuffled twice (M = 17.00, SD = 6.50), and 10 shuffled 3 times (M = 16.90, SD = 6.43); a one-way between-subjects ANOVA with number of shuffles as a grouping factor revealed that the number of shuffles did not have a significant effect on the number of words produced, F < 1. Neither verbal fluency, r = .038, p = .841 nor anagram performance, r = −.011, p = .954, correlated with the number of shuffles, supporting the results from the high interactivity condition that suggest that changing the array is not related to those individual differences.
Exploratory analyses
To our knowledge a detailed analysis of behaviour in this task has not been done before. We first took a subsection of the main sample and subjected these participants to a detailed qualitative analysis of the process of word production to generate further exploratory hypotheses to apply quantitatively to the whole sample. Nine participants were selected for this exploratory analysis. They were selected on the basis of the change in performance from high interactivity to low interactivity: Our goal was to use participants who either benefited the most from the ability to move the tiles to generate words, or those who appeared impeded in their ability to generate words in the high interactivity condition. Thus, this sample of nine included two participants with the highest boost from interactivity: These participants showed an increase in the number of words produced of 18 and 10, respectively. The two participants who experienced the greatest negative impact of interactivity were also selected. Total word production by these participants declined by 7 and 6, respectively. Three participants who showed no change between the conditions were also selected. Additionally, two participants were selected who also showed behaviour different to the overall trend of data; that is that the more time spent interacting with the tiles the greater the word count. One spent over 3 min (181.9 s) interacting and yet produced 3 fewer words in the high-interactivity condition, the other only spent 5.37 s of the whole 5 min time period interacting with the tiles and yet produced 9 more words in this condition than in the low interactivity. The detailed scrutiny of these nine participants helped us generate three hypotheses which were then tested across the whole sample. The videos of these nine participants were scrutinized for underlying behaviours that may not be captured by the aggregated means and which indicated underlying strategies in approaching the task.
First, we proposed that lucky shuffles led to greater word production. A shuffle leads to a random change in the array and this change may quickly seed new words or obscure them. As serendipity is the enactment of this environmental luck, we can assess this by examining participant behaviour directly after a shuffle. If the shuffle has been useful, it seems likely that the participant would produce a word directly after. If not, then the shuffle has been less useful in breaking the impasse. We therefore indexed luckiness as the time taken to produce a word after a shuffle: the faster a word has been produced, the luckier the shuffle.
Second, higher physical engagement leads to a higher overall word production. Engagement here is a more fine-grained concept than time spent interacting with the tiles. Engagement with the environment can be measured by the responsiveness of the participant to the clues thrown up by it: That is, the more participants respond to the environment, the more words they would produce and, in contrast, the less they use the environment, the more they will be weighed down by the cognitive cost of movement. We termed this measure the efficiency score, that is, the measure of participants’ leverage of environmental opportunities. We expected this to be a greater predictor of word production in the high interactivity environment than in the conditions where this strategy is not as easily enacted.
Third, it seems likely that a word verbalised while the participant was not moving the tokens was more likely to come from internal processes, while one which the participant spoke during movements would reflect changes wrought by the array. We therefore hypothesised that the proportion of words produced while moving the tiles would predict the number of words produced overall in that condition. We next present how these hypotheses fared when evaluated with the data from the whole sample of participants.
Luckiness of shuffles
The time from the end of the shuffle to the production of a word was calculated. The end of the shuffle was chosen because participants’ behaviour was once again controlled and their engagement with the tiles limited to internal computations. Luckiness was indexed as the time from the end of the shuffle to the production of a new word. As expected, there was a wide range in times. Indeed, it was possible for a participant to generate a word while re-laying the tiles, the change in the letter array presumably triggering an already liminal word. In this case, a negative latency was recorded, which was the time between the participant uttering the word and the final tile from the shuffled set being laid on the work surface. Where participants shuffled more than once, the average of the times was taken. An analysis of the correlation between the number of words produced in the shuffle condition and the luckiness of the shuffle indexed in this way was conducted on the 30 participants who opted to shuffle (see Fig. 2). This revealed a significant relationship between the number of words produced in that condition and the luckiness of the shuffle, r(28) = −.460, p = .011. The shuffle represented an element of nonlinear luck that prompted participants’ performance beyond their individual skills: The luck experienced (as operationalised with the average latency to first word produced) did not correlate significantly with verbal fluency, r(28) = −.316, p = .089, or anagram skill, r(28) = −.190, p = .315.
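The latency index described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the coding procedure actually used with ELAN: each shuffle is assumed to be represented as a hypothetical (start, end) pair in seconds, each word by the time at which it was uttered, and the first word uttered at or after the start of a shuffle is taken as the word following that shuffle, so a word spoken while the tiles were still being re-laid yields a negative latency.

```python
def mean_shuffle_latency(shuffles, word_times):
    """Mean latency (s) from the end of each shuffle to the next word.

    shuffles   -- list of (start, end) times in seconds (assumed format)
    word_times -- times in seconds at which words were uttered
    Returns None if no shuffle was followed by a word.
    """
    latencies = []
    for start, end in shuffles:
        # Words uttered once the shuffle has begun, including mid-shuffle words.
        following = [t for t in word_times if t >= start]
        if following:
            # Negative when the word precedes the laying of the final tile.
            latencies.append(min(following) - end)
    if not latencies:
        return None
    return sum(latencies) / len(latencies)

# Hypothetical participant: two shuffles, the first "lucky" (word mid-shuffle).
shuffles = [(60.0, 70.0), (150.0, 158.0)]
word_times = [10.0, 68.0, 90.0, 170.0]
print(mean_shuffle_latency(shuffles, word_times))
```

In this toy example the first shuffle gives a latency of −2 s and the second 12 s, so the averaged index is 5 s; smaller (or negative) values correspond to luckier shuffles.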
Efficiency score
There are inconsistent benefits to high interactivity. Moving the tiles may not necessarily augment the system’s new word affordances and indeed the additional cost may slow the system down if the benefits are not fully realised. In the word production task where the letters are unchanging, a beneficial strategy is to use the same root and change a single letter. In this way a participant may identify the root -ate in the word set and produce the words d-ate, f-ate, g-ate, h-ate, l-ate and so on depending on the other letters available in the set of seven, a switch between state spaces (Maglio et al. 1999). We hypothesised that an efficient strategy would be easier to follow in a condition where the words are reified physically. Further, we hypothesised it would be a better predictor of performance in the high condition because this strategy could be followed with little working memory cost whereas in a low or shuffle condition the boost from an efficient strategy may be undermined by the cognitive costs required to hold congenial letter arrangements in the head.
We calculated the similarity of each produced word to the word produced immediately before, which we call here the efficiency score. This score assumes that when a participant thinks or sees the word, for example, BREAD, it demonstrates a higher efficiency to remove the B and create READ or remove the A and create BRED than to create an entirely new word. Each word generated by a participant was given an efficiency score. Two scores were calculated: the proportion of letters in the same absolute position (see Footnote 2) and the proportion of letters in the same relative position (see Footnote 3). The resulting two proportions give different measures of the similarity of words: it is possible to have words scoring highly in relative position but low in absolute position; for example, if the word READ follows BREAD it scores 0% for absolute position but 100% on the relative position. We therefore used the higher of the two proportions as the efficiency score for each word. Finally, the efficiency scores were averaged across participants.
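To make the scoring concrete, the sketch below gives one possible reading of the two proportions. The exact definitions are given in the footnotes and may differ: here, as an assumption, the absolute score counts letters of the new word that sit at the same index as in the previous word, the relative score uses the longest common subsequence (letters appearing in the same order), and both are divided by the length of the new word. All function names are hypothetical.

```python
def absolute_position_score(prev, curr):
    """Proportion of letters in curr at the same index as in prev (assumed definition)."""
    matches = sum(1 for a, b in zip(prev, curr) if a == b)
    return matches / len(curr)

def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev_row = [0] * (len(b) + 1)
    for ch_a in a:
        row = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                row.append(prev_row[j - 1] + 1)
            else:
                row.append(max(prev_row[j], row[j - 1]))
        prev_row = row
    return prev_row[-1]

def relative_position_score(prev, curr):
    """Proportion of letters in curr kept in the same order as in prev (assumed definition)."""
    return lcs_length(prev, curr) / len(curr)

def efficiency_score(prev, curr):
    """Higher of the two proportions, as described in the text."""
    return max(absolute_position_score(prev, curr),
               relative_position_score(prev, curr))

def mean_efficiency(words):
    """Average efficiency over consecutive word pairs; None if fewer than two words."""
    scores = [efficiency_score(p, c) for p, c in zip(words, words[1:])]
    return sum(scores) / len(scores) if scores else None

print(absolute_position_score("BREAD", "READ"))  # 0.0
print(relative_position_score("BREAD", "READ"))  # 1.0
```

Under these assumptions the sketch reproduces the worked example in the text: READ following BREAD scores 0% on absolute position and 100% on relative position, so its efficiency score is 1.0.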
Contrary to our prediction, efficiency scores were similar across all the conditions (high: M = .322, SD = .096; low: M = .316, SD = .089; shuffle: M = .303, SD = .090), F < 1. This indicates that the participants were using broadly the same strategy across the three conditions. We then examined whether there was an effect of using the strategy on word production in each of the conditions. The relationships are illustrated in Fig. 3. The relationship between efficiency and the total number of words produced in the high interactivity condition was strongly positive, r = .597, p < .001; however, the level of efficiency was not significantly correlated with word production in either the low condition, r = .223, p = .166, or the shuffle condition, r = .283, p = .076. This indicates that a good strategy is a significant contributor only in the high condition.
Active words
We further hypothesised that, in the high interactivity condition, words produced while the participant was moving the tiles, rather than while contemplating them, would indicate a higher level of engagement with the environment (having been triggered by ongoing environmental changes), whereas words produced after movement would be more likely to have been generated from purely internal processes. We therefore calculated the proportion of words announced mid-movement and assessed the relationship between this proportion and the number of words produced in the high interactivity condition. A relatively high proportion of words was produced mid-movement (M = 43.9%, SD = 25.7%); however, the relationship between this proportion and the number of words produced was not significant, r = .180, p = .266, and our hypothesis was not supported.
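The proportion of words announced mid-movement can be illustrated with the sketch below. It assumes, hypothetically, that movement episodes are available as (start, end) intervals in seconds and that each word has a single utterance time; this representation and the function name are illustrative and do not describe the actual ELAN export.

```python
def proportion_mid_movement(word_times, movement_intervals):
    """Proportion of words uttered during any movement interval.

    word_times         -- utterance times in seconds
    movement_intervals -- list of (start, end) movement episodes in seconds
    Returns None when no words were produced.
    """
    if not word_times:
        return None
    during = sum(
        1 for t in word_times
        if any(start <= t <= end for start, end in movement_intervals)
    )
    return during / len(word_times)

# Hypothetical participant: two of the four words fall inside a movement episode.
words = [5.0, 12.0, 20.0, 33.0]
moves = [(4.0, 6.0), (18.0, 25.0)]
print(proportion_mid_movement(words, moves))  # 0.5
```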
Qualitative analysis
Research in interactivity proceeds from the assumption that including the external world in the cognitive ecosystem augments performance but this implies an optional use of the external world; that a problem solver will recruit the environment when she needs it and rely on her own internal processes to solve the problem when they are adequate. Instead, if we suggest that cognition is always systemic then we must consider moments when the external world disadvantages the problem solver. Furthermore, problem solving in a path-rich environment will yield different routes and strategies, the contingent patterns of which may be masked by aggregate data.
We therefore selected a participant who did not benefit from high interactivity (Participant 20; P20) and one who did (Participant 41; P41); P41 had the greatest boost from interactivity. Overall, P41 was higher than average on the measure of verbal fluency: his score was 113 against an average of 88.3 for the sample (+ 0.98 SD). P20 was lower than average on this measure, scoring 52 (− 1.44 SD). Both participants were above average in anagram skill: P41 got all 12 anagrams correct (+ 1.13 SD) and P20 got 10 correct (+ 0.40 SD).
P41 produced 50 words in the high interactivity condition, 32 in the low interactivity condition and 32 in the shuffle condition. He moved the tiles for an above-average time and spent 193.9 s interacting with the tiles, compared to the sample average of 106.4 s (+ 1.34 SD) with over twice as many episodes of activity as average (56 episodes, M = 24.72). Indeed, the amount of interaction can be seen in the number of words that were produced during a period of activity. Seventy-three percent of the words generated were produced while moving the tiles. He had a higher than average efficiency score in all conditions although it was higher in the high interactivity condition (high = .60, Mhigh = .323, low = .51, Mlow = .316, shuffle = .40, Mshuffle = .303).
P20 produced 12 words in the low-interactivity environment and 6 in the high-interactivity environment (9 in the shuffle). This indicates that for this problem solver the extended ecosystem was not an aid to thinking; rather, it acted as a hindrance to the whole system. She spent less time than average interacting with the tiles, 71.23 s (− .541 SD). However, more of her words were produced during a period of activity (88%). She had a lower than average efficiency score in the high (.26) and low (.29) conditions but a higher than average score in the shuffle condition (.32).
The following analyses contrast how the singular trajectory unfolded for each participant and specifically assess the coordination of the different systemic elements. The unit of analysis here is not reduced to the individual problem solver but instead contrasts the problem-solving systems formed by the problem solver, his or her environment and the unfolding of the problem over time. In the analysis that follows words produced by the participant are identified with capital letters (e.g., BODE), possible words not produced are written in lower case bold (e.g., bed). Each second of the video is presented in the appendix (found here: https://osf.io/g8jh5/?view_only=1b4bd078cb014d2db980ea1dcee5c855) and the most salient events are highlighted here coded with E and the second to which they refer.
Inconsistent effects of reification
After saying the first word (not valid because of the spelling), P20 changed the array to reflect her suggestion. This move was of low utility: the word had already been generated and changing the array did not yield any additional information. Indeed, it is likely to be an impediment to new words because the suggested word is now treated as a unit, potentially blocking new ideas. This was a common strategy for P20, who spelt out every word after saying it, indicating that the direction of cognition was internal to external. Changing the array in this way did not yield any benefit and was only a time and cognitive cost, as well as perhaps stifling the generation of new combinations.
On the other hand, although P41 reifies his announcements, he does not leave them to stagnate. He breaks the initial set-up quickly, creating a circular arrangement (E5); once the word BOD (E6) is identified, this leads quickly to BODE (E9), then BORE (E14), BORED (E17) and BORN (E22), before he hazards a guess at BORNE (E28). This pattern is repeated several times (e.g., ROB [E98] to ROBE [E102] to ROD [E106] to RODE [E110]). This pattern of movements would yield a high efficiency score and would be supported by the high interactivity environment because the congenial collections of letters can be reified into a new candidate word. It is noticeable that this appears to be a useful strategy only if the letter tiles are moved sufficiently to defeat the anchoring paralysis of a static array. Thus, a level of disengagement is also necessary. Indeed, we can see in E52 that the word bed is created by the left-over tiles, but the participant instead incorporates unused letters to keep the array dynamic, so this candidate word opportunity is not banked; rather, he creates BEND (E54).
Microserendipity
The changes to the array yield other left-over collections of letters which are not produced intentionally but are rather an artefact of the movements which have come before. For example, creating the word BODE leaves the letters NRE (E10); later the R and the E become rearranged as artefacts to become ER (E51), which is a much more useful digraph. This movement is not intentional but rather a necessary outcome of a constantly shifting array. For P41, there are several moments when such unplanned arrangements prompt the following word, which are better understood as moments of microserendipity. The word BRO (E34) leaves the digraph EE on the array. The digraph is incorporated in the following word, BREED (E42). In E333 the left-over tiles generate the array illustrated in Fig. 4: this array leads the participant to suggest the word BONER (E338), then BONED (E342) and finally BONE (E349). This efficient word sequence was sparked by an unplanned change in the array.
Missed opportunities
A serendipitous moment can only arise when the environmental luck is capitalised upon by the problem solver. This is something that P20 found particularly difficult. Her lack of movements meant that fewer lucky arrangements were generated, but she also failed to notice others. For example, E19 and E27 offer almost identical letter arrays aside from the rearrangement of I and E (see Fig. 5). The word WET is spelt out in a triangle in E27 and is prompted; however, the same arrangement had previously spelt out the less common word wit in E19, in the same position as WET in E27, and this was not noticed by the problem solver (‘wet’ has a Zipf score of 4.67, ‘wit’ has a Zipf score of 3.67).
This underlines the importance of both internal and external resources: while the environment may yield a word, if it is not noticed by the problem solver, it will remain inert. This happened often throughout the course of the 5 min for P20. It is particularly noticeable when the participant creates tar in E185 but does not say the word. This could be for one of two reasons: either, having the word in front of her, she forgets to say it (the word does not need to be spoken to take form), or she simply does not notice it (Fig. 6).
There are other clear moments of P20 not noticing environmental opportunities. In E71–78 the word ire is spelt vertically down and yet not recognised by the participant. The word war is made several times by the tiles and not recognised. See Fig. 7.
Indeed, any words elicited from the array by this participant were done so in a bottom-up or across way aside from this one last word. Those words which were most obvious and yet missed were those which were spelt downward suggesting an inattentional blindness relating to her habitual gaze trajectory.
Although we assumed that a high level of engagement would necessarily lead to an improved ability to capitalize on serendipitous candidate words, there are several moments when the array created words which were not noticed by P41 despite his overall high engagement. These were most obvious in the creation of bond in E62 and rend in E220, the latter of which is not said until E252, as demonstrated in Fig. 8.
This is in part a function of the kinetics of the task environment, which are hard to capture in the static array presented in the appendix. The tiles were in continual movement and the snapshot we have reproduced here cannot capture the fluid landscape of letter combinations. It also reflects a level of disengagement (whether a deliberate strategy or not), a reversed sunk cost as it were, quickly disassembling and re-assembling arrays of letter tiles, which, combined with high verbal fluency, prevents output inertia. Finally, it emphasises the contingent nature of serendipity and the inherent difficulty in observing it. In these instances, the environment yields the luck, the participant has demonstrated the ability to recognise those words, but the moment does not happen.
The additional complexity of a coupled system in movement makes it hard to create a systematic framework for analysing the number of words missed, and we present the data here as a starting discussion on how such moments can be categorised. A better understanding of microserendipity will likely involve a better understanding of missed opportunities. However, as a reviewer pointed out, missed opportunities for whom? That is, we identified missed opportunities as we saw them in the video data, but we did not provide a formal way to categorize a certain un-named configuration of letters as a missed opportunity. Informally, a horizontal left-to-right or top-to-bottom vertical series of letters that formed a word which was not named counted as a missed opportunity. However, participants demonstrated a considerable degree of flexibility in the selection and combination of letters to produce words: As illustrated in Fig. 5, P20 produced words in a right-to-left bottom-up manner, and as illustrated in Fig. 4, P41 produced a word in a bottom-top-bottom triangular pattern. This open-ended flexibility considerably complicates any a priori effort to capture missed opportunities. It seems likely that the missed opportunities will reflect both the characteristics of the system and the individual differences of the problem solver. Inviting participants to watch a video of their performance on the task might be a means to capture missed opportunities more clearly.