Communication data set
To test the utility of summary measures obtained from CRA, as well as the influence of semantic-space parameters on these measures, we analyzed previously collected team chat data (Mancuso, Finomore, Rahill, Blair, & Funke, 2014). The data were obtained using the Experimental Laboratory for Investigating Collaboration Information-Sharing and Trust (ELICIT), a simulated intelligence, surveillance, and reconnaissance task in which teams work to solve a logic puzzle framed as an investigation of a pending terrorist attack (Finomore et al., 2013; Ruddy & Nissen, 2008). In ELICIT, each team member receives a unique set of 15 complementary text-based statements (factoids) about an expected terrorist attack. These factoids contain information that varies in importance: expert factoids (~ 5%) hold explicit information about the specifics of the pending attack; key factoids (~ 19%) help identify which information is meaningful; support factoids (~ 26%) supplement expert and key factoids; and noise factoids (~ 50%) contain no information needed to solve the puzzle. To succeed, team members must share their information and come to a joint decision on the details of the upcoming attack (i.e., Who will be attacking, What will be attacked, When will the attack take place, and Where will the attack occur). All the information needed to solve the task is dispensed to the team; by sharing information and following the logic of the factoids, teams can arrive at the correct conclusions. The factoids were presented to participants in this task at a fairly quick pace: Every 5 s, a set of four factoids was introduced (one factoid per person), meaning that within 75 s each team received all the factoids.
The purpose of the study conducted by Mancuso et al. (2014) was to analyze the effect of confirmation bias on team decision making. To this end, distractor factoids (nine in total) were introduced in ELICIT that implicated the wrong terrorist group in the pending attack—the factoids implicated the “green”-Incorrect (I) group, though the correct answer was the “blue”-Correct (C) group. The distractor factoids did not make the puzzle unsolvable. Rather, they introduced a counter narrative that potentially named the “green”-I group as the answer to the Who question, but with insufficient evidence to justify that answer. In the experiment, teams of four individuals communicated via Internet relay chat (IRC) to solve the logic puzzle, while the misleading information was presented either early in participants’ information queue, late in the queue, or mixed (both early and late), depending on condition. It is important to note that if misleading information about “green”-I was presented early, then most information related to “blue”-C was introduced late in the queue, and vice versa. This biasing information was expected to complicate decision making and diminish team performance by inducing a confirmation bias, which is ultimately a failure to update beliefs in the presence of new information (Schippers, Edmondson, & West, 2014). Specifically, introducing the “green”-I distractor information early in the queue was expected to reduce conversation about later “blue”-C information and lead to an incorrect conclusion about the Who component of the puzzle.
Additionally, factoids about the “green”-I group were intentionally designed to be noisy (i.e., thematically incoherent), to allow teams to infer disjoint conclusions. For instance, consider the following factoids, and note that the last two were designed to create an implication that the “green”-I group may be behind the impending terrorist attack:
The target is a bank or military general.
Both cabinet members and a military general will be attending a political convention in New Zealand.
Rumor has it that the Green group has stolen uniforms from military bases in New Zealand and Australia.
Security forces for New Zealand’s political convention are providing extensive protection with military personnel only.
Since the target is possibly a military general and security is provided by military personnel only, a plausible inference is that the “green”-I group stole the uniforms to get access to the target. Other disjoint inferences are also possible—for example, regarding the “green”-I group communicating with locals about being oppressed by the military—that could plausibly establish motivation, but are either unrelated or only tangentially related to the other clues provided to the participant teams. Another difference between conditions was that information regarding the “blue”-C group given to specific individuals in the distractor-first condition was not always paired with its supporting information, though it was in the distractor-last condition. Finally, an injected factoid linked “green”-I to the stolen uniforms in the distractor-first condition but not in the distractor-last condition. Thus, the implication is that the biasing factoids would not only reduce the discussion of “blue”-C-relevant information but would also create a noisy semantic profile.
Analyses of the number of exchanges referencing biasing information showed that teams were susceptible to it when it was introduced early in the queue (Mancuso et al., 2014). Specifically, participants in the early and mixed conditions discussed the misleading factoids as often as they discussed those leading to the correct answer, whereas participants who received biasing information late in their queue were more likely to focus on relevant factoids. Additionally, teams who received biasing information early in their queue were less likely to answer the Who part of the puzzle correctly than were teams in the other conditions. Given that the biasing factoids were found to influence team performance and communication in the Mancuso et al. study, we expected that the CRA analyses in the present study would also be sensitive to those experimental manipulations. Importantly, Mancuso and colleagues’ findings regarding the effects of the order of biasing information on communication were obtained from analyses of hand-coded data that quantified how utterances were related to categories of factoids. We predicted that CRA would circumvent the need for such hand-coding and would be able to quantify the influence of the biasing factoids on team discourse. In particular, according to the hypothesis that teams would be susceptible to a confirmation bias, we expected a discounting of information related to “blue”-C, meaning that information semantically related to “blue”-C would not be as prevalent in the distractor-first group.
Participants
The sample in the original study consisted of 64 paid participants (29 females), with ages ranging from 18 to 31 years (M = 24.5, SD = 5.5). Participants completed the study in four-person teams, yielding a total of 16 teams. All participants had normal hearing and normal or corrected-to-normal vision.
Analysis procedure
CRA parameter settings
The process of constructing a semantic space in Discursis (see Fig. 2) depends upon several user-defined settings that include: the upper limit of the number of key terms extracted; the identity of the corpus from which the key concepts are extracted (if the discourse being analyzed is not to be used to generate a set of intrinsic concepts); whether stemming—a preprocessing procedure to remove prefixes and suffixes—is used; and the list of stop words, among other options. For this project, we examined the relative influences of two of these settings: (1) the source of key concepts and (2) the number of key concepts. To do so, we created different sets of conceptual recurrence plots for all groups using two values for both parameters (see the example in Figs. 3 and 4). Each of these plots is summarized with the set of metrics introduced in the next section.
For the source of key concepts, we compared key concepts extracted from the communications themselves (intrinsic) against key concepts extracted from the factoid sets (factoid). Using the factoid sets to create an a priori conceptual space might give insight into how task-specific concepts constrained team conversations. We also believe that testing a priori bases might be a first step toward future implementation of CRA as a tool for real-time analysis of team communication. For the number of key concepts, we compared two methods: automatic identification of the number of key concepts (automatic) and a set upper limit of 100 (N100, the default setting in Discursis). We note here that the term “automatic concept extraction” is potentially misleading, because for both this method and the method that sets a ceiling limit on the number of identified concepts, the concepts are extracted automatically rather than manually. The potential importance of this setting lies in the granularity of the constructed semantic space; larger semantic spaces have more specific terms (Smith & Humphreys, 2006) that can distinguish otherwise similar utterances. In the data we analyzed, the average number of key concepts identified in the N100 analysis was almost three times greater than in the automatic analysis.
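To make the two-by-two structure of these settings explicit, the following minimal sketch (in Python) simply enumerates the analysis grid; it is illustrative only and assumes nothing about the Discursis interface.

```python
from itertools import product

# Enumerate the two user-defined settings compared in this study. This is an
# illustrative sketch of the analysis grid only; Discursis itself is configured
# through its own interface, and no Discursis API is assumed here.
concept_sources = ["intrinsic", "factoid"]   # key concepts from the chats vs. the factoid sets
concept_limits = ["automatic", "N100"]       # automatic concept count vs. 100-concept ceiling

for source, limit in product(concept_sources, concept_limits):
    # Each combination yields one set of conceptual recurrence plots (one per
    # team), later summarized with the metrics defined in the next section.
    print(f"CRA run: concept source = {source}, concept limit = {limit}")
```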
We ran CRA with all concepts or only certain concepts activated to test the influence of order of biasing information on utterances related to task-specific keywords. Specifically, we created exclusive-or filters that passed utterances conceptually similar to the names of either of two ELICIT terrorist groups, “green”-I or “blue”-C, but not both simultaneously. In both cases, we ran the analyses with intrinsic and factoid concept bases and using the N100 and automatic concept extraction methods.
For all analyses, the “merge word variants” option was selected in Discursis, which identified plural and singular words (e.g., “general” and “generals”) or words with different tense suffixes (e.g., “attack” and “attacked”) as the same. Such stemming procedures are commonly used in preparing texts for semantic analysis (Turney & Pantel, 2010), and preliminary evaluations of ELICIT data indicated that this option essentially filters the data and provides a clearer picture of conceptual coordination (although in some cases there may be good reasons to leave this option unchecked; Smith, 2000). As with many data preprocessing procedures, stemming alters the form of the data in ways intended to clarify the signal, but that can also introduce noise (i.e., by mapping words to the same root that should be separated—overstemming—or mapping words to different roots that should be the same—understemming). For instance, the above example regarding the semantic similarity of “general” and “generals” may raise concerns about varying definitions of “general” (e.g., “a commander of an army” vs. “widespread”). In the present context, both words frequently refer to the same type of entity (a commander of an army—e.g., “Generals in New Zealand have private guards” and “The target is a bank or military general”), but it bears noting that such morphological limitations are at the crux of automatic processing of semantic content of language and that no stemming algorithm is perfect (Paice, 1996).
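As a rough illustration of what merging word variants involves, and of how overstemming can arise, consider the toy suffix-stripping function below; it is not the algorithm Discursis uses, only a simplified sketch.

```python
# Toy suffix stripping to illustrate variant merging; NOT the Discursis
# stemming procedure, only a simplified example.
def toy_stem(word: str) -> str:
    for suffix in ("ed", "s"):                      # strip a small set of suffixes
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(toy_stem("generals"))   # -> "general"  (merged with "general")
print(toy_stem("attacked"))   # -> "attack"   (merged with "attack")
print(toy_stem("news"))       # -> "new"      (overstemming: "news" wrongly merged with "new")
```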
Metrics
Our analyses focused on global aspects of team conceptual cohesion, where by “global” we mean average measures of whole conversations. Such measures have been used to quantify team (Gorman et al., 2003, 2016) and dyadic (Babcock, Ta, & Ickes, 2014) communications, but here we show that CRA as implemented in Discursis adds concept-specific and speaker-specific flexibility to the available analyses. We report a set of novel summary statistics based on partitioning similarity matrices into self and shared-with-others similarity, and by similarity with particular concepts. The metrics described below are not those automatically output by Discursis (which does provide its own set of automatically generated measures), but they are, we believe, among the most basic measures that can be computed from the Discursis output of the similarity matrices. One measure in particular—mean similarity—appeared to be the most sensitive to the biasing information experimental manipulation, but other metrics may also prove useful in different circumstances.
Total similarity
Total similarity (TS) is the grand sum of all cells (where each cell contains the conceptual similarity between a pair of utterances) above the diagonal of the similarity matrix, S. The lower triangular part is disregarded due to symmetry, and the similarity of an utterance with itself is ignored. TS serves both as a measure in its own right and as a normalization factor for other measures and ratios. It is defined as
$$ TS = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} S\left(i,j\right). $$
(1)
As with all reported measures, the similarity sum may be divided into self and shared (Angus, Smith, & Wiles, 2012a) by multiplying each entry in the similarity matrix by either a self-matrix,
$$ TS_{self} = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \mathrm{self}\left(i,j\right)\,S\left(i,j\right), $$
(2)
or a shared matrix,
$$ TS_{shared} = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \mathrm{shared}\left(i,j\right)\,S\left(i,j\right), $$
(3)
where an entry in the self-matrix is 1 when the row and column utterances originate from the same speaker, and 0 otherwise; the shared matrix has the opposite coding.
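As a concrete illustration of Eqs. 1–3, the minimal sketch below computes TS and its self/shared partition from a toy similarity matrix; the matrix and speaker labels are invented for illustration and are not Discursis output.

```python
import numpy as np

# Toy utterance-by-utterance similarity matrix S (symmetric, zero diagonal)
# and the speaker of each utterance; both are illustrative, not ELICIT data.
S = np.array([[0.0, 0.4, 0.0, 0.2],
              [0.4, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 0.5],
              [0.2, 0.0, 0.5, 0.0]])
speakers = np.array(["A", "A", "B", "C"])

n = S.shape[0]
upper = np.triu(np.ones((n, n), dtype=bool), k=1)       # cells above the diagonal
same_speaker = speakers[:, None] == speakers[None, :]   # the self-matrix of Eq. 2

TS = S[upper].sum()                         # Eq. 1
TS_self = S[upper & same_speaker].sum()     # Eq. 2
TS_shared = S[upper & ~same_speaker].sum()  # Eq. 3
print(TS, TS_self, TS_shared)               # TS == TS_self + TS_shared
```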
Recurrence
Recurrence (REC) is a count of the number of cells above the diagonal of the similarity matrix S that exceed a threshold of similarity, denoted by ε. REC is a normalization factor for several other metrics, or is itself normalized by the number of utterances to yield a separate metric—percent recurrence. For our purposes, we chose ε to be equal to the smallest nonzero similarity score in the matrix so that all nonzero entries were counted,
$$ \varepsilon =\operatorname{inf}\left\{x\in S:0<x\right\}. $$
(4)
Once the threshold has been chosen, REC is computed as the sum of entries in S above the diagonal with a similarity equal to or greater than the threshold,
$$ REC = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \varTheta\left(S\left(i,j\right)-\varepsilon\right), $$
(5)
where Θ denotes the Heaviside function,
$$ \varTheta(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0. \end{cases} $$
(6)
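A minimal sketch of Eqs. 4 and 5, reusing the toy similarity matrix from the previous sketch, is given below.

```python
import numpy as np

# Toy similarity matrix as in the previous sketch (illustrative only).
S = np.array([[0.0, 0.4, 0.0, 0.2],
              [0.4, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 0.5],
              [0.2, 0.0, 0.5, 0.0]])
n = S.shape[0]
upper = np.triu(np.ones((n, n), dtype=bool), k=1)   # cells above the diagonal

vals = S[upper]
eps = vals[vals > 0].min()        # Eq. 4: smallest nonzero similarity in the matrix
REC = int((vals >= eps).sum())    # Eq. 5: count of cells at or above the threshold
print(eps, REC)                   # here: 0.1 and 4
```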
Overall similarity
Overall similarity (OS) is a measure of the average amount of similarity in cells above the diagonal of S, including zero cells. It is obtained by dividing the total similarity by the total number of utterance pairs (i.e., the number of cells above the diagonal),
$$ OS=\frac{TS}{\frac{N^2-N}{2}}, $$
(7)
where N is the number of utterances. OS indexes the overall average density of similarity between all counted utterances, and can be broken down in terms of utterances that are made by the same person (OSself),
$$ OS_{self}=\frac{TS_{self}}{\frac{N^2-N}{2}}, $$
(8)
and utterances that are conceptually similar but that originated from different speakers (OSshared),
$$ OS_{shared}=\frac{TS_{shared}}{\frac{N^2-N}{2}}. $$
(9)
Mean similarity
Mean similarity (MS) is a measure of the average similarity of utterances similar to at least one key concept. It is the total similarity divided by the number of pairs of utterances that have some minimum amount of similarity between them,
$$ MS=\frac{TS}{REC}. $$
(10)
Conceptually, MS indexes the average degree of alignment between utterances that are projected onto at least one shared dimension in the semantic space. In other words, this value indicates how similar, on average, similar utterances tend to be, and can again be analyzed in terms of utterances that are made by the same person (MSself) and utterances made by different people (MSshared).
Percent recurrence
Percent recurrence (PREC) is the percentage of utterance pairs that are conceptually similar, relative to all utterance pairs,
$$ PREC=\frac{REC}{\frac{N^2-N}{2}}\times 100. $$
(11)
This measure correlates highly, though not perfectly, with OS. It can also be divided into utterances that are similar and repeated by the same person, PREC-Self, and utterances that are similar but made by different people, PREC-Shared.
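Because OS, MS, and PREC are simple ratios of TS and REC, they follow directly from the quantities computed in the earlier sketches; the block below, again based on the toy similarity matrix, is illustrative only.

```python
import numpy as np

# Toy similarity matrix as in the previous sketches (illustrative only).
S = np.array([[0.0, 0.4, 0.0, 0.2],
              [0.4, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 0.5],
              [0.2, 0.0, 0.5, 0.0]])
n = S.shape[0]
upper = np.triu(np.ones((n, n), dtype=bool), k=1)
vals = S[upper]

TS = vals.sum()
REC = int((vals >= vals[vals > 0].min()).sum())
n_pairs = n * (n - 1) / 2        # number of cells above the diagonal

OS = TS / n_pairs                # Eq. 7: average similarity over all utterance pairs
MS = TS / REC                    # Eq. 10: average similarity of the similar pairs
PREC = 100 * REC / n_pairs       # Eq. 11: percentage of pairs that are similar
print(OS, MS, PREC)              # here: 0.2, 0.3, and ~66.7
```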
Filtering by concepts
A conceptual recurrence matrix can be filtered by a subset of concepts (Angus, Smith, & Wiles, 2012b). Furthermore, this process may be extended by evaluating the exclusive appearance of a particular concept with respect to another. For instance, as part of the present analyses, we evaluated the occurrence of utterances similar to a particular concept (i.e., “blue”-C, the correct answer to the Who component of the ELICIT task), but in which similarity to another specific concept was not present (i.e., “green”-I, an incorrect answer that was emphasized in the biasing factoids). We thus obtained an estimate of how tightly the groups focused on the correct or incorrect answer.
To filter a similarity matrix by a given concept means to eliminate all utterances from the analysis save those similar to the specified concept or concepts. The first step is to create a Boolean column vector (of size Utterances × 1) containing a 1 if the specified concept is present in a given utterance, and a 0 otherwise. This vector may be calculated from the Discursis “concepts” output, an exportable file that gives the projection of every utterance onto the set of key concepts. To inclusively select several concepts, it is sufficient to take the Boolean of the sum of the desired concept vectors. To create an exclusive-or filter, we applied the Heaviside function to the difference of the two concept vectors,
$$ U_{concept}=\varTheta \left(U_i-U_j\right), $$
(12)
where Ui is the vector mapping of utterances to concepts to be kept, and Uj is a vector mapping of utterances to concepts to be excluded. The filter is then created by taking the product of the concept vector with its transpose,
$$ M_{concept}=U_{concept}\times {U_{concept}}^T, $$
(13)
resulting in an Utterances × Utterances matrix. The filter is then applied to the original similarity matrix by taking the Hadamard (element-wise) product of the original similarity matrix and the filter,
$$ {S}_{concept}=S\circ {M}_{concept}. $$
(14)
This similarity matrix then forms the basis for obtaining concept-specific derivations of the previously specified metrics (RECconcept, OSconcept, and MSconcept), each of which can be used to extract ratios that quantify the proportion of similarity accounted for by various partitions of the filtered matrix (see Table 1). We note that care should be taken in creating such exclusive-or filters, since utterances that are similar to both concepts are filtered out.
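The following minimal sketch implements Eqs. 12–14 for a toy example; the utterance-by-concept matrix, the column indices chosen for the kept and excluded concepts, and the similarity values are all invented for illustration (in practice, the concept projections would come from the exported Discursis “concepts” file). Consistent with the caution above, the sketch treats utterances similar to both concepts (and to neither) as excluded.

```python
import numpy as np

# Toy utterance-by-concept projections (rows = utterances; column 0 stands in
# for the kept concept, column 1 for the excluded concept) and a toy similarity
# matrix S. All values are illustrative.
U = np.array([[0.8, 0.0],    # similar to the kept concept only
              [0.7, 0.0],    # similar to the kept concept only
              [0.0, 0.6],    # similar to the excluded concept only
              [0.5, 0.4]])   # similar to both (dropped by the exclusive filter)
S = np.array([[0.0, 0.1, 0.3, 0.0],
              [0.1, 0.0, 0.2, 0.0],
              [0.3, 0.2, 0.0, 0.4],
              [0.0, 0.0, 0.4, 0.0]])

U_i = (U[:, 0] > 0).astype(int)              # utterances touching the kept concept
U_j = (U[:, 1] > 0).astype(int)              # utterances touching the excluded concept
U_concept = np.where(U_i - U_j > 0, 1, 0)    # Eq. 12, with ties (both/neither) excluded

M_concept = np.outer(U_concept, U_concept)   # Eq. 13: Utterances x Utterances filter
S_concept = S * M_concept                    # Eq. 14: Hadamard (element-wise) product
print(S_concept)                             # only similarity among kept utterances remains
```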
Table 1 Proportion of similarity accounted for by concept-specific measures
Preprocessing
Although CRA in Discursis can automate many of the steps involved in conceptual analyses of textual data, it cannot easily deal with misspelled words, since each spelling variation would be identified as a unique concept. Furthermore, abbreviations occasionally used in place of semantically identical terms may have a similar effect of inflating the number of concepts. We treated both events as noisy processes that increase the dimensionality of a semantic space without adding meaningful conceptual discrimination. As such, prior to submission to CRA in Discursis, the data were processed by correcting misspellings and by mapping semantically identical abbreviations to common terms. For example, New Zealand—mentioned in some factoids—was referred to as “Zealand,” “New Zealand,” and “NZ” in the participant discussions; these were all changed to the term “Zealand.” We note that the latter step of mapping abbreviations to common terms is not necessary in Discursis but was an a priori decision made in order to minimize noise in a relatively small data set.
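A minimal sketch of this kind of normalization pass is given below; only the mapping of “New Zealand” and “NZ” to “Zealand” comes from the text, and the misspelling entry is a hypothetical example.

```python
import re

# Illustrative normalization pass applied before CRA: correct misspellings and
# map abbreviations to a single canonical term. Only the "Zealand" mapping is
# taken from the text; the misspelling entry is hypothetical.
replacements = {
    r"\bnew zealand\b": "zealand",
    r"\bnz\b": "zealand",
    r"\bgenerel\b": "general",     # hypothetical misspelling correction
}

def normalize(utterance: str) -> str:
    text = utterance.lower()
    for pattern, canonical in replacements.items():
        text = re.sub(pattern, canonical, text)
    return text

print(normalize("The NZ convention is in New Zealand"))
# -> "the zealand convention is in zealand"
```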
Analyses
To test the sensitivity of simple frequency counts to the order of biasing information, the relative frequencies of “blue”-C and “green”-I were calculated from the proportion of utterances produced by each team that contained either term. These results were submitted to a Concept (“blue”-C, “green”-I) × Order of Biasing Information (distractor-first, distractor-last, and distractor-mixed) mixed ANOVA, with concept as a within-groups factor and order of biasing information as a between-groups factor.
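A minimal sketch of this frequency count is shown below; the data frame, team labels, and utterances are invented for illustration.

```python
import pandas as pd

# Hypothetical chat log: one row per utterance, with a team identifier and the
# utterance text. These data are invented for illustration.
chat = pd.DataFrame({
    "team": ["t1", "t1", "t1", "t2", "t2"],
    "text": ["I think blue did it", "what about green?",
             "the target is a bank", "green stole uniforms", "blue has motive"],
})

for term in ["blue", "green"]:
    chat[term] = chat["text"].str.contains(term, case=False)

# Proportion of each team's utterances mentioning each term; in the study,
# these per-team proportions were the values submitted to the mixed ANOVA.
rel_freq = chat.groupby("team")[["blue", "green"]].mean()
print(rel_freq)
```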
As we described previously, it is possible to filter the conceptual recurrence matrix to emphasize or remove the influence of specific concepts in the analysis. In the present analyses, the concepts “blue”-C and “green”-I were of particular interest. As such, we conducted our analyses first by including all key concepts identified by Discursis (all key concepts active), and then by filtering out all key concepts except “blue”-C and “green”-I (selected key concepts active). In addition to deriving key concepts for each individual set of communications, it is possible to supply a separate corpus to Discursis to identify key concepts as a basis for analyzing semantic similarity. This may be of interest for evaluating the extent to which utterances align along predetermined dimensions. For this study, we tested the difference in sensitivity to experimental conditions provided by projecting conversations onto bases that were extracted from the sets of factoids (factoid), or onto bases that were determined individually for each group from their actual communications (intrinsic). Finally, it is possible to set an upper limit on the number of key concepts that Discursis may extract from the corpus. We first verified that this setting made a difference in the number of extracted key concepts for both the intrinsic and factoid bases, and then we evaluated whether allowing the number of concepts to be automatically determined by Discursis (automatic) resulted in output that showed more or less sensitivity to the manipulations than did setting the maximum number of concepts to 100 (N100; the default setting).
To verify that the number of key concepts identified by Discursis was higher in the N100 than in the automatic condition, the number of key concepts identified for each group was submitted to a Number of Concepts Extracted (automatic, N100) × Source of Key Concepts (intrinsic, factoid) × Order of Biasing Information (distractor-first, distractor-last, and distractor-mixed) mixed ANOVA, with number of extracted concepts and source of key concepts as within-groups factors and order of biasing information as a between-groups factor.
The influences of three parameters on CRA similarity measures are reported: (1) the number of active key concepts (all key concepts, selected key concepts), (2) the source of the key concepts (intrinsic, factoid), and (3) the number of concepts extracted (N100, automatic). For each parameter setting, dependent measures of conceptual similarity were submitted to one-way between-groups ANOVAs, with order of biasing information as the factor. Measures included MS, the average degree of alignment of similar utterances; OS, the average amount of semantic similarity; and percent recurrence (PREC), the percentage of semantically similar utterances. When the source of similarity was investigated, similarity type (self, shared) was added as a within-groups ANOVA factor. When the degree of similarity as a function of specific terms was investigated (e.g., PREC-Concept—the proportion of similar utterances exclusively similar to either “green”-I or “blue”-C), term (“blue”-C, “green”-I) was added as a within-groups ANOVA factor. Due to the similarity of the latter analyses to those obtained from the “blue”-C and “green”-I specific proportions of MS, TS, OS, and PREC, results from the measures listed in Table 1 are not reported.
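As one possible way to set up such a mixed design in code (the software used for the original analyses is not specified here, so the pingouin package is an assumption), the sketch below uses a schematic long-format data frame with similarity type as the within-groups factor and order of biasing information as the between-groups factor; all values are illustrative.

```python
import pandas as pd
import pingouin as pg

# Schematic long-format data: one row per team x similarity type. Team labels,
# condition assignments, and MS values are all illustrative, not study data.
df = pd.DataFrame({
    "team":     ["t1", "t1", "t2", "t2", "t3", "t3",
                 "t4", "t4", "t5", "t5", "t6", "t6"],
    "order":    ["first", "first", "last", "last", "mixed", "mixed",
                 "first", "first", "last", "last", "mixed", "mixed"],
    "sim_type": ["self", "shared"] * 6,
    "MS":       [0.31, 0.24, 0.28, 0.27, 0.30, 0.25,
                 0.29, 0.22, 0.27, 0.26, 0.32, 0.23],
})

# Mixed ANOVA: similarity type (self, shared) within teams, order of biasing
# information between teams, with mean similarity (MS) as the dependent measure.
aov = pg.mixed_anova(data=df, dv="MS", within="sim_type",
                     subject="team", between="order")
print(aov)
```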
Prior to conducting planned inferential statistics, outlier analyses were performed using a repeated Grubbs’ test within each condition, separately for the analyses based on intrinsically defined concepts and for the analyses with concepts derived from the factoid lists. If a team was an outlier on more than one metric, that team was dropped from all analyses. Overall, three teams were removed (two from the distractor-first condition and one from the distractor-mixed condition). This resulted in a final sample of nine teams, with three teams in each of the bias conditions.