Introduction

Collaborative skills are highly relevant in many situations, ranging from computer-supported collaborative learning to collaborative problem-solving in professional practice (Fiore et al., 2018). While several broad collaborative problem-solving frameworks exist (OECD, 2017), most of them are situated in knowledge-lean settings. One example of collaborative problem-solving in knowledge-rich domains is collaborative diagnostic reasoning (CDR; Radkowitsch et al., 2022), which aligns closely with medical practice, a prototypical knowledge-rich domain requiring high collaboration skills in daily practice. In daily professional practice, physicians often need to collaborate across specialties and subdisciplines to solve complex problems, such as diagnosing, that is, determining the causes of a patient’s problem. Moreover, research in medical education and computer-supported collaborative learning suggests that the acquisition of medical knowledge and skills is significantly enhanced by collaborative problem-solving (Hautz et al., 2015; Koschmann et al., 1992). For problem-solving and learning, it is crucial that all relevant information (e.g., evidence and hypotheses) is elicited from and shared with the collaboration partner (Schmidt & Mamede, 2015). CDR is not unique to the medical field, however, but is also relevant in other domains, such as teacher education (Heitzmann et al., 2019).

The CDR model has been the basis of empirical studies and describes how individual characteristics and the diagnostic process are related to the diagnostic outcome. However, it has not yet been empirically tested, and the relationships between individual characteristics, diagnostic process, and diagnostic outcome remain mostly unexplored (Fink et al., 2023). The aim of this study is to test the CDR model by analyzing data from three studies with similar samples and tasks investigating CDR in a simulation-based environment. By undertaking these conceptual replications, we aspire to better understand the construct and the processes involved. As prior research has shown, collaboration needs to be performed at a high quality to achieve accurate problem solutions or learning outcomes (Pickal et al., 2023).

Collaborative Diagnostic Reasoning (CDR) Model

Diagnosing can be understood as the process of solving complex diagnostic problems through “goal-oriented collection and interpretation of case-specific or problem-specific information to reduce uncertainty” in decision-making through performing diagnostic activities at a high quality (Heitzmann et al., 2019, p. 4). To solve diagnostic problems, that is, to identify the causes of an undesired state, it is increasingly important to collaborate with experts from different fields, as these problems become too complex to be solved individually (Abele, 2018; Fiore et al., 2018). Collaboration provides advantages such as the division of labor, access to diverse perspectives and expertise, and enhanced solution quality through collaborative sharing of knowledge and skills (Graesser et al., 2018).

The CDR model is a theoretical model focusing on the diagnostic process in collaborative settings within knowledge-rich domains (Radkowitsch et al., 2022). The CDR model is based on scientific discovery as a dual-search model (SDDS; Klahr & Dunbar, 1988) and its further development by van Joolingen and Jong (1997). The SDDS model describes individual reasoning as the coordinated search through hypothetical evidence and hypotheses spaces and indicates that for successful reasoning it is important not only that high-quality cognitive activities within these spaces are performed but also that one is able to coordinate between them (Klahr & Dunbar, 1988). In the extended SDDS model (van Joolingen & Jong, 1997) focusing on learning in knowledge-rich domains, a learner hypothesis space was added including all the hypotheses one can search for without additional knowledge. Although Dunbar (1995) found that conceptual change occurs more often in groups than in individual work, emphasizing the importance of collaborative processes in scientific thinking and knowledge construction, the SDDS model has hardly been systematically applied in computer-supported collaborative learning and collaborative problem-solving.

Thus, the CDR model builds upon these considerations and describes the relationship between individual characteristics, the diagnostic process, and the diagnostic outcome. As in the SDDS model, we assume that CDR involves activities within an evidence space and a hypotheses space; however, unlike in the SDDS model, these spaces are understood in the CDR model as cognitive storages of information, which aligns more closely with the extended dual search space model of scientific discovery learning (van Joolingen & Jong, 1997). In summary, we assume that coordinating between evidence (data) and hypotheses (theory) is essential for successful diagnosing. Further, the CDR model extends from individual to collaborative cognitive activities and describes the interaction of epistemic activities (F. Fischer et al., 2014) and collaborative activities (Liu et al., 2016) to construct a shared problem representation (Rochelle & Teasley, 1995) and collaborate effectively. Thus, we define CDR as a set of skills for solving a complex problem collaboratively “by generating and evaluating evidence and hypotheses that can be shared with, elicited from, or negotiated among collaborators” (Radkowitsch et al., 2020, p. 2). The CDR model also makes assumptions about the factors necessary for successful CDR. First, we look at what successful CDR means, why people differ, and what the mediating processes are.

Diagnostic Outcome: Accuracy, Justification, and Efficiency

The primary outcome of diagnostic processes, such as CDR, is the accuracy of the given diagnosis, which indicates problem-solving performance or expertise (Boshuizen et al., 2020). However, competence in diagnostic reasoning, whether it is done individually or collaboratively, also includes justifying the diagnosis and reaching it effectively. This is why, in addition to diagnostic accuracy, diagnostic justification and diagnostic efficiency should also be considered as secondary outcomes of the diagnostic reasoning process (Chernikova et al., 2022; Daniel et al., 2019). Diagnostic justification makes the reasoning behind the decision transparent and understandable for others (Bauer et al., 2022). Good reasoning entails a justification including evidence, which supports the reasoning (Hitchcock, 2005). Diagnostic efficiency is related to how much time and effort is needed to reach the correct diagnosis; this is important for CDR, as diagnosticians in practice are usually under time pressure (Braun et al., 2017). Both diagnostic justification and diagnostic efficiency are thus indicators of a structured and high-quality reasoning process. So, while in many studies, the focus of assessments regarding diagnostic reasoning is on the accuracy of the given diagnosis (Daniel et al., 2019), the CDR model considers all three facets of the diagnostic outcome as relevant factors.

Individual Characteristics: Knowledge and Social Skills

Research has shown that content knowledge, social skills, and, in particular, collaboration knowledge are important prerequisites for, and outcomes of, computer-supported collaborative learning (Jeong et al., 2019; Vogel et al., 2017). The CDR model has integrated these dependencies into its structure. Thus, it assumes that people engaging in CDR differ with respect to their content knowledge, collaboration knowledge, and domain-general social skills.

Content knowledge refers to conceptual and strategic knowledge in a specific domain (Förtsch et al., 2018). Conceptual knowledge encompasses factual understanding of domain-specific concepts and their interrelations. Strategic knowledge entails contextualized knowledge regarding problem-solving during the diagnostic process (Stark et al., 2011). During expertise development, large amounts of content knowledge are acquired and restructured through experience with problem-solving procedures and routines (Boshuizen et al., 2020). Research has repeatedly shown that having high conceptual and strategic knowledge is associated with the diagnostic outcome (e.g., Kiesewetter et al., 2020; cf. Fink et al., 2023).

In addition to content knowledge, the CDR model assumes that collaborators need collaboration knowledge. A key aspect of collaboration knowledge (i.e., being aware of knowledge distribution in the group; Noroozi et al., 2013) is the pooling and processing of non-shared information, as research shows that a lack of collaboration knowledge has a negative impact on information sharing, which in turn has a negative impact on performance (Stasser & Titus, 1985).

Finally, general social skills influence the CDR process. These skills mainly influence the collaborative aspect of collaborative problem-solving and less the problem-solving aspect (Graesser et al., 2018). Social skills are considered particularly important when collaboration knowledge is low (F. Fischer et al., 2013). The CDR model assumes that, in particular, the abilities to share and negotiate ideas, to coordinate, and to take the perspective of others are relevant for the diagnostic process and the diagnostic outcome (Radkowitsch et al., 2022; see also Liu et al., 2016, and Hesse et al., 2015).

Diagnostic Process: Collaborative Diagnostic Activities

The diagnostic process is thought to mediate the effect of the individual characteristics on the diagnostic outcome and is described in the CDR model using collaborative diagnostic activities (CDAs), such as evidence elicitation, evidence sharing, and hypotheses sharing (Heitzmann et al., 2019; Radkowitsch et al., 2022). One of the main functions of CDAs is to construct a shared problem representation (Rochelle & Teasley, 1995) by sharing and eliciting relevant information, as information may not be equally distributed among all collaborators initially. To perform these CDAs at a high quality, each collaborator needs to identify information relevant to be shared with the collaboration partner as well as information they need from the collaboration partner (OECD, 2017).

Evidence elicitation involves requesting information from a collaboration partner to access additional knowledge resources (Weinberger & Fischer, 2006). Evidence sharing and hypothesis sharing involve identifying the information needed by the collaborator to build a shared problem representation. This externalization of relevant information can be understood as the novelty aspect of transactivity (Vogel et al., 2023). However, challenges arise from a lack of relevant information due to deficient sharing, which can result from imprecise justification and insufficient clustering of information. In particular, research has shown that collaborators often lack essential information-sharing skills, such as identifying information relevant for others from available data, especially in the medical domain (Kiesewetter et al., 2017; Tschan et al., 2009).

It is crucial for the diagnostic outcome that all relevant evidence and hypotheses are elicited and shared for the specific collaborators (Tschan et al., 2009). However, diagnostic outcomes seem to be influenced more by the relevance and quality of the shared information than by their quantity (Kiesewetter et al., 2017; Tschan et al., 2009). In addition, recent research has shown that the diagnostic process is not only an embodiment of individual characteristics but also adds a unique contribution to diagnostic outcome (Fink et al., 2023). However, it remains difficult to assess and foster CDAs.

Collaboration in Knowledge-Rich Domains: Agent-Based Simulations

There are several challenges when it comes to modelling collaborative settings in knowledge-rich domains for both learning and research endeavors. First, many situations are not easily accessible, as they may be scarce (e.g., natural disasters) or too critical or overwhelming to be approached by novices (e.g., some medical procedures). In these cases, the use of simulation-based environments allows authentic situations approximating real-life diagnostic problems to be provided (Cook et al., 2013; Heitzmann et al., 2019). Further, the use of technology-enhanced simulations allows data from the ongoing CDR process to be collected in log files. This enables researchers to analyze process data without the need for additional assessments with dedicated tests. Analyzing process data instead of only product data (the assessment’s outcome) permits insights into the problem-solving processes leading to the eventual outcome (e.g., Goldhammer et al., 2017). Second, when using human-to-human collaboration, the results of one individual are typically influenced by factors such as group composition or motivation of the collaboration partner (Radkowitsch et al., 2022). However, we understand CDR as an individual set of skills enabling collaboration, as indicated by the broader definition of collaborative problem-solving (OECD, 2017). Thus, the use of simulated agents as collaboration partners allows a standardized and controlled setting to be created that would otherwise be hard to establish in collaborations among humans (Rosen, 2015). There is initial research showing that performance in simulations using computerized agents is moderately related to collaborative skills in other operationalizations (Stadler & Herborn et al., 2020). Thus, computerized agents allow for enhanced control over the collaborative process without significantly diverging from human-to-human interaction (Graesser et al., 2018; Herborn et al., 2020). 
Third, in less controlled settings it is hard to ensure that a specific process takes place during collaborative problem-solving. For example, in a human-to-human setting, it is possible that a specific activity we intend to measure or foster (e.g., hypotheses sharing) is not performed by the student. By using an agent-based simulated collaboration partner, we can ensure that all required processes take place while solving the problem (Rosen, 2015).

In summary, by providing a consistent and controlled setting, simulated agents facilitate the accurate measurement and enhancement of collaborative problem-solving. Evidence supporting the use of simulated agents spans a variety of contexts, including tutoring, collaborative learning, knowledge co-construction, and collaborative problem-solving itself, emphasizing their versatility and effectiveness in educational settings (Graesser et al., 2018; Rosen, 2015).

Research Question and Current Study

In computer-supported collaborative learning, a distinction has been made between approaches addressing collaboration to learn and approaches focusing on learning to collaborate. Our study is best understood as addressing the second approach, learning to collaborate. We want to better understand CDR to be able to facilitate collaborative problem-solving skills in learners. Thus, in this paper, we examine what it takes to be able to collaborate in the professional practice of knowledge-rich domains, such as medical diagnosing.

When solving diagnostic problems, such as diagnosing a patient, it is often necessary to collaborate with experts from different fields (Radkowitsch et al., 2022). In CDR, the diagnostic outcome depends on effectively eliciting and sharing relevant evidence and hypotheses among collaborators, who often lack information-sharing skills (Tschan et al., 2009). Thus, the CDR model emphasizes the importance of high-quality CDAs influenced by content and collaboration knowledge as well as social skills to achieve accurate, justified, and efficient diagnostic outcomes (Radkowitsch et al., 2022).

This study reviews the relationships postulated in the CDR model across three studies to test them empirically and to investigate the extent to which these relationships are applicable across studies. By addressing this research question, the current study contributes to a better understanding of the underlying processes in collaborative problem-solving.

We derived a model (Fig. 1) from the postulated relationships made by the CDR model. We assume that the individual characteristics are positively related to the CDAs (Hypotheses 1–3), as well as that the CDAs are positively related to the diagnostic outcome (Hypotheses 4–6). Further, we expect that the relationship between the individual characteristics and the diagnostic outcome is partially mediated by the CDAs (Hypotheses 7–15).

Fig. 1 Visualization of hypothesized relationships between individual characteristics, collaborative diagnostic activities, and diagnostic outcome

We used data from three studies with similar samples and tasks investigating CDR in an agent-based simulation in the medical domain. The studies can therefore be considered conceptual replication studies. Furthermore, we decided to use an agent-based simulation of a typical collaboration setting in diagnostic reasoning, namely the interdisciplinary collaboration between an internist and a radiologist (Radkowitsch et al., 2022).

Methods

Sample

To test the hypotheses, three studies were analyzed. Study A was carried out in a laboratory setting in 2019 and included medical students in their third to sixth years. Study B included medical students in their fifth to sixth years. Data collection for this study was online due to the pandemic situation in 2020 and 2021. In both studies, participation was voluntary, and participants were paid 10 per hour. Study C was embedded as an online session in the curriculum of the third year of medical school in 2022. Participation was mandatory, but permission to use the data for research purposes was given voluntarily. All participants took part in only one of the three studies. All three studies received ethical approval from LMU Munich (approval numbers 18-261, 18-262 & 22-0436). For a sample description of each study, see Table 1. We would like to emphasize that none of the students were specializing in internal medicine, ensuring that the study results reflect the competencies of regular medical students without specialized expertise.

Table 1 Sample description per study

Procedure

Each of the three studies was organized in the same way, with participants first completing a pretest that included a prior knowledge test, socio-demographic questions, and questions about individual motivational-affective characteristics (e.g., social skills, interest, and motivation). Participants then moved on to the CDR simulation and worked on the patient case. The patient case was the same for studies B and C, but was different for study A. The complexity and difficulty of the patient case did not vary systematically between the patient cases.

Simulation and Task

In the CDR simulation, which is also used as a learning environment, the task was to take over the role of an internist and to collaborate with an agent-based radiologist to obtain further information by performing radiological examinations to diagnose fictitious patient cases with the chief symptom of fever. Medical experts from internal medicine, radiology, and general medicine constructed the patient cases. Each case was structured in the same way: by studying the medical record individually, then collaborating with an agent-based radiologist, and finally reporting the final diagnosis and its justification again individually. For a detailed description on the development and validation of the simulation, see Radkowitsch and colleagues (2020).

Before working within the simulation, participants were presented with an instruction for the simulated scenario and informed what they were to do with it. Then, we instructed participants how to access further information in the medical record by clicking on hyperlinks, as well as how they could use the toolbar to make notes for later in the process. Furthermore, we acquainted the students with how they could request further information through collaborating with a radiologist.

During the collaboration with an agent-based radiologist, participants were asked to fill out request forms to obtain further evidence from radiological examinations needed to diagnose the patient case. To effectively collaborate with radiologists, it is crucial for internists to clearly communicate the type of evidence required to reduce uncertainty (referred to as “evidence elicitation”) and share any relevant patient information such as signs, symptoms, and medical history (referred to as “evidence sharing”) as well as suspected diagnoses under investigation (referred to as “hypotheses sharing”) that may impact the radiologists’ diagnostic process. Only when participants shared evidence and hypotheses appropriately for their requested examination did they receive a description and evaluation of the radiologist’s radiologic findings. What was considered appropriate was determined by medical experts for each case and examination in preparation of the cases. Therefore, this scenario involves more than a simple division of tasks, as the quality of one person’s activity (i.e., description and evaluation of the radiologic findings) depends on the collaborative efforts (i.e., CDAs) of the other person (OECD, 2017).

Measures—Individual Characteristics

The individual characteristics were measured in the pretest. The internal consistencies of each measure per study are displayed in Table 4 in the Results section. We want to point out that the internal consistency of knowledge as a construct—determined by the intercorrelations among knowledge pieces—typically exhibits a moderate level. Importantly, recent research argues that a moderate level of internal consistency does not undermine the constructs’ capacity to explain a significant amount of variance (Edelsbrunner, 2024; Stadler et al., 2021; Taber, 2018).

Content knowledge was separated into radiology and internal medicine knowledge, as these two disciplines play a major role in the diagnosis of the simulated patient cases. For each discipline, conceptual and strategic knowledge was assessed (Kiesewetter et al., 2020; Stark et al., 2011). The items in each construct were presented in a randomized way in each study. However, the items for study C were shortened due to the embedding of the data collection in the curriculum. Therefore, items with a very high or low item difficulty in previous studies were excluded (Table 2).

Table 2 Overview of the number of questions in the content knowledge test

Conceptual knowledge was measured using single-choice questions including five options adapted from a database of examination questions from the Medical Faculty of the LMU Munich, focusing on relevant and closely related diagnoses of the patient cases used in the simulation. A mean score of 0–1 was calculated, representing the percentage of correct answers and indicating the average conceptual knowledge of the participant per medical knowledge domain.

Strategic content knowledge was measured contextually using key features questions (M. R. Fischer et al., 2005). Short cases were introduced followed by two to three follow up questions (e.g., What is your most likely suspected diagnosis?, What is your next examination?, What treatment do you choose?). Each question had eight possible answers, from which the learners were asked to choose one. Again, a mean score of 0–1 was calculated, representing the percentage of correct responses, indicating the average strategic content knowledge of the participant per domain.

The measure of collaboration knowledge was consistent across the three studies and specific to the simulated task. Participants were asked to select all relevant information for seven different patient cases with the cardinal symptom of fever (internal medicine). The patient cases were presented in a randomized order and always included 12 pieces of information regarding the chief complaints, medical history, and physical examination of the patient cases. We then assessed whether each piece of information was handled correctly (i.e., whether relevant information was shared and irrelevant information was not shared), assigned 1 point for each correctly handled piece, and divided the sum by the maximum of 12 points to standardize the range of the measure to 0–1. Finally, we calculated the mean of these case scores across all cases, resulting in a range of 0–1 indicating the participants’ collaboration knowledge.
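The scoring rule for collaboration knowledge can be illustrated with a minimal Python sketch. It is not the authors' analysis code; the function names and the representation of information pieces as integer indices are assumptions made for illustration only.

```python
from typing import Iterable, Set, Tuple

N_ITEMS = 12  # pieces of information presented per patient case (from the text)


def case_score(selected: Set[int], relevant: Set[int], n_items: int = N_ITEMS) -> float:
    """Share of information pieces handled correctly in one case.

    A piece counts as correct if it is relevant and was selected, or
    irrelevant and was not selected. Pieces are hypothetical indices 0..n_items-1.
    """
    correct = sum((i in selected) == (i in relevant) for i in range(n_items))
    return correct / n_items


def collaboration_knowledge(cases: Iterable[Tuple[Set[int], Set[int]]]) -> float:
    """Mean of the per-case scores across all cases (range 0-1)."""
    scores = [case_score(selected, relevant) for selected, relevant in cases]
    if not scores:
        raise ValueError("at least one case is required")
    return sum(scores) / len(scores)
```

Under this reading, a participant who selects two of three relevant pieces plus one irrelevant piece handles 10 of the 12 pieces correctly and scores 10/12 for that case.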

The construct of social skills was consistent across the three data collections and was measured on the basis of self-report on a 6-point Likert scale ranging from total disagreement to total agreement. The construct was measured using 23 questions divided into five subscales; for example items, see Table 3. Five questions aimed to measure the overall construct, and the other four subscales were identified using the complex problem-solving frameworks of Liu et al. (2016) and Hesse et al. (2015): perspective taking (four questions), information sharing (five questions), negotiation (four questions), and coordination (five questions). For the final score, the mean of all subcategories was calculated, ranging from 1 to 6, representing general social skills.

Table 3 Example items for each subscale for measuring social skills

Measures—Collaborative Diagnostic Activities (CDA)

We operationalize CDAs in the pretest case in terms of quality of evidence elicitation, evidence sharing, and hypotheses sharing. The internal consistencies of each measure per study are displayed in Table 4 in the Results section.

Table 4 Means, standard deviations, and internal consistency for individual characteristics, collaborative diagnostic activities, and diagnostic outcome per study

The quality of evidence elicitation was measured by assessing the appropriateness of the requested radiological examination for the indicated diagnosis. An expert solution was developed to indicate which radiological examinations were appropriate for each of the possible diagnoses. If participants requested an appropriate radiological examination for the indicated diagnoses, they received 1 point for that request attempt. A mean score across all request attempts (maximum of 3) was then calculated. Because of the categorical nature of the original data and its skewed distribution, with a majority of responses concentrated in a single category, the final mean score was transformed into a binary indicator, with 1 indicating that all requested radiological examinations were appropriate and 0 indicating that at least one inappropriate radiological examination was requested.
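As a rough sketch of this scoring, assuming a hypothetical expert mapping from diagnoses to appropriate examinations (the placeholder labels are not taken from the study materials), the mean score and its dichotomization could be computed as follows. How participants without any request were handled is not stated in the text, so the sketch simply rejects that case.

```python
from typing import Dict, List, Set, Tuple


def evidence_elicitation(
    requests: List[Tuple[str, str]],
    appropriate: Dict[str, Set[str]],
) -> Tuple[float, int]:
    """Return (mean score, binary indicator) for one participant and case.

    requests: (indicated diagnosis, requested examination) per attempt.
    appropriate: hypothetical expert solution, diagnosis -> appropriate exams.
    """
    if not requests:
        # Assumption: the handling of zero requests is not described in the text.
        raise ValueError("at least one request attempt is required")
    points = [1 if exam in appropriate.get(dx, set()) else 0 for dx, exam in requests]
    mean_score = sum(points) / len(points)
    # 1 only if every requested examination was appropriate.
    return mean_score, int(all(points))


# Placeholder labels, for illustration only.
expert = {"diagnosis_a": {"exam_x", "exam_y"}, "diagnosis_b": {"exam_z"}}
mean_score, binary = evidence_elicitation(
    [("diagnosis_a", "exam_x"), ("diagnosis_b", "exam_x")], expert
)
```

In this example one of two requests is appropriate, so the mean score is 0.5 and the binary indicator is 0.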

The quality of evidence sharing was measured using a precision indicator. This was calculated as the proportion of shared relevant evidence out of all shared evidence. Relevant evidence is defined per case and per diagnosis and indicated by the expert solution. The precision indicator was first calculated per radiological request. We then calculated the mean score, summarizing all attempts in that patient case. This resulted in a range from 0 points, indicating that only irrelevant evidence was shared, to 1 point, indicating that only relevant evidence was shared.
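The precision indicator can be sketched in a few lines of Python. The function names are hypothetical, and returning None for a request in which nothing was shared is an assumption, since the text does not say how such requests were treated.

```python
from typing import List, Optional, Set, Tuple


def precision(shared: Set[str], relevant: Set[str]) -> Optional[float]:
    """Proportion of shared items that are relevant; None if nothing was shared."""
    if not shared:
        return None  # assumption: undefined when no evidence was shared
    return len(shared & relevant) / len(shared)


def evidence_sharing_quality(requests: List[Tuple[Set[str], Set[str]]]) -> Optional[float]:
    """Mean precision over all radiological requests of one patient case.

    requests: (shared evidence, relevant evidence per expert solution) per request.
    """
    values = [p for p in (precision(s, r) for s, r in requests) if p is not None]
    return sum(values) / len(values) if values else None
```

Because precision divides by the amount of shared evidence, sharing additional irrelevant evidence lowers the score, whereas omitting relevant evidence does not.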

The quality of hypotheses sharing was also measured using a precision indicator. For each patient case, the proportion of diagnoses relevant for the respective patient case out of all shared diagnoses was calculated. Which diagnoses were considered relevant for a specific case was determined by an expert solution. As with evidence elicitation, and because of the categorical nature of the original data and its skewed distribution, with a majority of responses concentrated in a single category, this score was converted into a binary variable, where 1 indicated that only relevant diagnoses were shared and 0 indicated that irrelevant diagnoses were also shared.

Measures—Diagnostic Outcome

We operationalize diagnostic outcome in the pretest case in terms of diagnostic accuracy, diagnostic justification, and diagnostic efficiency.

For diagnostic accuracy, a main diagnosis was assigned to each patient case as expert solution. After working on the patient case and requesting the radiological examination, participants indicated their final diagnosis. To do this, they typed in the first three letters of their desired diagnosis and then received suggestions from a list of 249 possible diagnoses. Diagnostic accuracy was then calculated by coding the agreement between the final diagnosis given and the expert solution. Accurate diagnoses (e.g., hospital-acquired pneumonia) were coded as 1, correct but inaccurate diagnoses (e.g., pneumonia) were coded as 0.5, and incorrect diagnoses were coded as 0. Because of the categorical nature of the original data and its skewed distribution, with a majority of responses concentrated in a single category, a binary indicator was used for the final diagnostic accuracy score, with 0 indicating an incorrect diagnosis and 1 indicating a diagnosis that was at least correct but inaccurate.

A prerequisite for diagnostic justification and diagnostic efficiency is the provision of at least an inaccurate diagnosis. If a participant provided an incorrect diagnosis (coded as 0), diagnostic justification and diagnostic efficiency were immediately scored as 0.
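The coding of diagnostic accuracy and the prerequisite rule can be summarized in a short Python sketch. The function names are hypothetical; the two example diagnoses are those given in the text, and the incorrect diagnosis is a placeholder.

```python
from typing import Set


def grade_diagnosis(final: str, accurate: str, correct_but_inaccurate: Set[str]) -> float:
    """Graded accuracy: 1 accurate, 0.5 correct but inaccurate, 0 incorrect."""
    if final == accurate:
        return 1.0
    if final in correct_but_inaccurate:
        return 0.5
    return 0.0


def binary_accuracy(graded: float) -> int:
    """1 if the diagnosis is at least correct but inaccurate, else 0."""
    return 1 if graded > 0 else 0


def apply_prerequisite(score: float, graded: float) -> float:
    """Set a justification or efficiency score to 0 for incorrect diagnoses."""
    return score if graded > 0 else 0.0


graded = grade_diagnosis("pneumonia", "hospital-acquired pneumonia", {"pneumonia"})
```

Here `graded` is 0.5, the binary indicator is 1, and a justification score would be retained; for a placeholder incorrect diagnosis, the same justification score would be set to 0.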

After choosing a final diagnosis, participants were asked to justify their decision in an open text field. Diagnostic justification was then calculated as the proportion of relevant reported information out of all relevant information that would have fully justified the final accurate diagnosis. Again, medical experts agreed on an expert solution that included all relevant information to justify the correct diagnosis. The participants’ solution was coded by two independent coders, each coding the full data, and differences in coding were discussed until the coders agreed. We obtained a range from 0 points, indicating a completely inadequate justification, to 1 point, indicating a completely adequately justified final diagnosis.

Diagnostic efficiency was defined as diagnostic accuracy (non-binary version) divided by the minutes required to solve the case.
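Both secondary outcomes reduce to simple ratios, as the following Python sketch illustrates. The function names and the set-based representation of reported information are assumptions; the actual coding of the open text answers was done by human coders as described above.

```python
from typing import Set


def diagnostic_justification(reported: Set[str], required: Set[str]) -> float:
    """Share of the expert-defined relevant information that was reported."""
    if not required:
        raise ValueError("the expert solution must contain at least one item")
    return len(reported & required) / len(required)


def diagnostic_efficiency(graded_accuracy: float, minutes: float) -> float:
    """Graded (non-binary) accuracy divided by the minutes needed for the case."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return graded_accuracy / minutes
```

Two properties follow directly from these definitions. Unlike the precision indicators used for the sharing activities, the justification score divides by the amount of required information, so reporting additional irrelevant information does not lower it. And because graded accuracy is 0 for an incorrect diagnosis, efficiency is then 0 regardless of the time taken, which is consistent with the prerequisite stated above.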

Statistical Analyses

To answer the research question, a structural equation model (SEM) was estimated using MPlus Editor, version 8 (Muthén & Muthén, 2017). We decided to use an SEM, as it is a comprehensive statistical approach widely used in psychology and educational sciences for its ability to model complex relationships among observed and latent variables while accounting for measurement error (Hilbert & Stadler, 2017). SEMs support the development and verification of theoretical models, enabling scholars to refine theories and interventions in psychology and education based on empirical evidence, as they consider not just a single relationship but a system of regressions simultaneously (Nachtigall et al., 2003).

We included all links between the variables and applied a two-step approach, using mean-adjusted and variance-adjusted unweighted least squares (ULSMV, Savalei & Rhemtulla, 2013) as the estimator and THETA for parametrization, first examining the measurement model and then the structural model. The assessment of model fit was based on chi-square (χ2), root mean square error of approximation (RMSEA), and comparative fit index (CFI). Model fit is generally indicated by small chi-squared values; RMSEA values of < 0.08 (acceptable) and < 0.06 (excellent), and CFI values ≥ 0.90. We do not consider standardized root mean squared residual (SRMR), because, according to the definition used in MPlus, this index is not appropriate when the sample size is 200 or less, as natural variation in such small samples contributes to larger SRMR values (Asparouhov & Muthén, 2018). For hypotheses 1–6, we excluded path coefficients < 0.1 from our interpretation, as they are relatively small. In addition, at least two interpretable path coefficients, of which at least one is statistically significant, are required to find support for the hypothesis. For hypotheses 7–15, specific indirect effects (effect of an individual characteristic on diagnostic outcome through a specific CDA) and total indirect effects (mediation of the effect of an individual characteristic on diagnostic outcome through all mediators) were estimated.
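The fit criteria and the decision rule for hypotheses 1–6 can be expressed as a small Python sketch. Several details are assumptions rather than statements from the text: the combination of RMSEA and CFI into a single label, the significance level of .05, the reading that the significant coefficient must be among the interpretable ones, and the application of the 0.1 threshold to the signed coefficient (the hypotheses predict positive relations).

```python
from typing import List, Tuple


def fit_label(rmsea: float, cfi: float) -> str:
    """Label model fit using the RMSEA and CFI cutoffs named in the text."""
    if cfi < 0.90 or rmsea >= 0.08:
        return "not indicated"
    return "excellent" if rmsea < 0.06 else "acceptable"


def hypothesis_supported(
    paths: List[Tuple[float, float]],
    min_beta: float = 0.1,
    alpha: float = 0.05,  # assumption: the text does not state the alpha level
) -> bool:
    """paths: (standardized path coefficient, p value) for each relevant path."""
    interpretable = [(beta, p) for beta, p in paths if beta >= min_beta]
    has_significant = any(p < alpha for _, p in interpretable)
    return len(interpretable) >= 2 and has_significant


# Made-up coefficients for illustration, not results from the studies.
example = hypothesis_supported([(0.25, 0.01), (0.15, 0.20), (0.05, 0.60)])
```

With these made-up values, two coefficients are interpretable and one of them is significant, so the rule would count the hypothesis as supported.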

We reported all measures in the study and outlined differences between the three samples. All data and analysis code have been made publicly available at the Open Science Framework (OSF) and can be accessed at https://osf.io/u8t62. Materials for this study are available by email through the corresponding author. This study’s design and its analysis were not pre-registered.

Results

The descriptive statistics of each measure per study are displayed in Table 4. The intercorrelations between the measures per study can be found in Appendix Table 7.

Overall Results of the SEM

All loadings were in the expected directions and statistically significant, except for conceptual knowledge in internal medicine in study C (λ = 0.241, p = .120), conceptual knowledge in radiology in study A (λ = 0.398, p = .018), and strategic knowledge in internal medicine (λ = 0.387, p = .206) and radiology (λ = -0.166, p = .302) in study B. Standardized factor loadings of the measurement model are shown in Appendix Table 8.

The SEM has a good fit for study A [χ2(75) = 74.086, p = .508, RMSEA = 0.00, CFI = 1.00], study B [χ2(75) = 68.309, p = .695, RMSEA = 0.000, CFI = 1.00], and study C [χ2(75) = 93.816, p = .070, RMSEA = 0.036, CFI = 1.00].
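To illustrate how the RMSEA relates to the chi-square statistic, the following sketch uses the common textbook formula, RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))). This is an assumption for illustration: MPlus applies its own adjustments under ULSMV, so the values it reports need not match this formula exactly. The sample size `n` is a hypothetical placeholder, as the sample sizes are not restated in this section.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate from chi-square, degrees of freedom, and sample size."""
    # The numerator is truncated at zero, so chi2 <= df always yields RMSEA = 0.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# For studies A and B, chi2 is smaller than df, so the estimate is 0
# regardless of the (hypothetical) sample size plugged in.
for n in (50, 200, 1000):
    print(n, rmsea(74.086, 75, n), rmsea(68.309, 75, n))
```

Whenever the chi-square value does not exceed its degrees of freedom, as in studies A and B, the estimate is zero by construction, independent of sample size.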

Paths between Individual Characteristics, CDAs, and Diagnostic Outcome

The standardized path coefficients and hypotheses tests for the theoretical model are reported in Table 5. An overview of the paths supported by the data is shown in Fig. 2.

Table 5 Standardized path coefficients (β) and standard errors (SE) for paths between individual characteristics, collaborative diagnostic activities, and diagnostic outcome per study
Fig. 2 Evidence on supported relationships between individual characteristics, collaborative diagnostic activities, and diagnostic outcome

Overall, the R2 for the CDAs ranged from medium to high for evidence elicitation and evidence sharing, depending on the study, and was consistently low for hypotheses sharing across all three studies. Looking at diagnostic outcome, R2 is consistently large for diagnostic accuracy and medium to large for diagnostic justification and diagnostic efficiency (Table 6).

Table 6 R2 for collaborative diagnostic activities and diagnostic outcome per study

The path from content knowledge to evidence elicitation was positive and > 0.1 in all three studies, as well as statistically significant in two of them; therefore, we consider Hypothesis 1a supported. The path from content knowledge to evidence sharing was positive and > 0.1 in two studies, as well as statistically significant in one of them; therefore, Hypothesis 1b is also supported. In contrast, the path from content knowledge to hypotheses sharing was indeed also positive in two studies, but as neither was statistically significant, we conclude that Hypothesis 1c was not supported. The path from collaboration knowledge to evidence elicitation was positive and > 0.1 in only one study, but also not statistically significant. Thus, we found that Hypothesis 2a was not supported. For the path from collaboration knowledge to evidence sharing, we found relevant positive and statistically significant coefficients in all three studies. Hypothesis 2b is therefore fully supported by the data. This is not the case for Hypothesis 2c, for which we found no coefficient > 0.1 for the path from collaboration knowledge to hypotheses sharing. For the path from social skills to evidence elicitation, we found positive coefficients > 0.1 in two out of three studies, of which one was also statistically significant. Thus, we consider Hypothesis 3a to be supported. For the path from social skills to evidence sharing, we again found one statistically significant positive coefficient, but in the other two studies it was < 0.1. Therefore, we do not consider Hypothesis 3b to be supported by the data. The same applies to the path from social skills to hypotheses sharing, where the coefficient is < 0.1 in two studies. We therefore do not consider Hypothesis 3c to be supported.

The path from evidence elicitation to diagnostic accuracy was statistically significant and large in magnitude in two out of three studies. Hypothesis 4a is therefore supported. The path from evidence elicitation to diagnostic justification was only positive and > 0.1 in one study, which was also not statistically significant. Therefore, we find no support for Hypothesis 4b. In contrast, the path from evidence elicitation to diagnostic efficiency was positive and statistically significant in two out of three studies, with one large effect. Hypothesis 4c is therefore supported. The path from evidence sharing to diagnostic accuracy was only positive and reasonably large in one study. Therefore, we do not find support for Hypothesis 5a. The path from evidence sharing to diagnostic justification was positive and > 0.1 in two studies as well as statistically significant in one of them, so Hypothesis 5b is supported. In contrast, we did not find a positive coefficient > 0.1 for the path from evidence sharing to diagnostic efficiency. Therefore, Hypothesis 5c is not supported by the data. Although we found coefficients > 0.1 in two studies for the path from hypotheses sharing to diagnostic accuracy, we found no support for Hypothesis 6a, as none of these was statistically significant. This is different for Hypothesis 6b, as we found two positive paths from hypotheses sharing to diagnostic justification, one of which was statistically significant and large. Finally, we found two positive paths from hypotheses sharing to diagnostic efficiency across the three studies, one of which was statistically significant. Hypothesis 6c is therefore supported.

Indirect Effects between Individual Characteristics, CDA, and Diagnostic Outcome

Indirect effects of CDAs on the effect of individual characteristics on the diagnostic outcome in CDR were estimated to test hypotheses 7–15. Although we found a mediating effect of all CDAs (β = .31, p = .008), and specifically for evidence elicitation (β = .27, p = .021) from content knowledge on diagnostic accuracy in study C, and some significant overall and direct effects for other relationships (Appendix Table 9), none of these were consistent across all of the studies. Thus, we conclude that there is no consistent support for any of the Hypotheses 7–15.
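For readers less familiar with mediation in SEM, the relation between specific and total indirect effects can be sketched as follows. The example assumes the standard product-of-coefficients definition: a specific indirect effect is the product of the path from the individual characteristic to one CDA (a) and the path from that CDA to the outcome (b), and the total indirect effect is the sum over all CDAs. The function name and all path values are hypothetical and do not reproduce the estimates reported in Appendix Table 9.

```python
from typing import Dict, Tuple


def indirect_effects(
    paths: Dict[str, Tuple[float, float]],
) -> Tuple[Dict[str, float], float]:
    """Return specific indirect effects per mediator and their sum.

    `paths` maps each mediator to (a, b): a is the path from the predictor
    to the mediator, b the path from the mediator to the outcome.
    """
    specific = {mediator: a * b for mediator, (a, b) in paths.items()}
    total = sum(specific.values())
    return specific, total


# Hypothetical standardized paths from one individual characteristic
# through the three CDAs to one facet of the diagnostic outcome.
example_paths = {
    "evidence_elicitation": (0.5, 0.4),
    "evidence_sharing": (0.3, 0.1),
    "hypotheses_sharing": (0.1, 0.2),
}
specific, total = indirect_effects(example_paths)
print(specific, total)
```

The sketch only covers point estimates; significance tests for indirect effects require standard errors for the products, which are not derived here.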

Discussion

The aim of the current study was to investigate the extent to which the relationships specified in the CDR model (Radkowitsch et al., 2022) are applicable across studies, to better understand the processes underlying CDR in knowledge-rich domains. Not only is this exploration crucial for the medical field or collaborative problem-solving in knowledge-rich domains, but it also offers valuable insights for computer-supported collaborative learning research. Despite CDR’s specific focus, the principles and findings have relevant implications for understanding and enhancing collaborative processes in various educational and professional settings.

Specifically, we investigated how individual learner characteristics, the CDAs, and the diagnostic outcome are related. We therefore analyzed data from three independent studies, all from the same context, a simulation-based environment in the medical domain. Our study found positive relationships between content knowledge and the quality of evidence elicitation as well as the quality of evidence sharing, but not for the quality of hypotheses sharing. Furthermore, collaboration knowledge is positively related to the quality of evidence sharing, but not to the quality of evidence elicitation and the quality of hypotheses sharing. Social skills are only positively related to the quality of evidence elicitation. This underscores the multifaceted nature of collaborative problem-solving situations. Thus, effective CDR, a form of collaborative problem-solving, necessitates a nuanced understanding of the interplay between individual characteristics and CDAs.

The relevance of content knowledge for diagnostic competence is well established in research (Chernikova et al., 2020). To develop any diagnostic skills in knowledge-rich domains, learners need to acquire large amounts of knowledge and to restructure it through experience with problem-solving procedures and routines (Boshuizen et al., 2020). In the case of CDR this enables the diagnostician to come up with an initial suspected diagnosis, which is likely to be relevant information for the collaboration partner and to guide the further CDAs effectively. The finding that content knowledge only has a relation to the quality of evidence elicitation but none of the other CDAs can be explained by the fact that evidence elicitation is the least transactive CDA within the collaborative decision-making process. When eliciting evidence, the collaboration partner is used as an external knowledge resource (Weinberger & Fischer, 2006). So, despite being a collaborative activity, evidence elicitation is about what information from the collaboration partner is needed rather than what the collaboration partner needs. Thus, elicitation is less transactive than sharing, which is focused on what the collaboration partner needs.

Not only content knowledge but also collaboration knowledge is related to the quality of evidence sharing. This finding implies that collaboration knowledge may influence the CDR above and beyond individual content knowledge. It also supports the differentiation of knowledge types made in the CDR model (Radkowitsch et al., 2022). Thus, it is important to learn not only the conceptual and strategic medical knowledge that is required for diagnosing but also knowledge about what information is relevant for specific collaboration partners when diagnosing collaboratively. This finding underpins the importance of being aware of the knowledge distribution among collaboration partners and the relevance of the transactive memory (Wegner, 1987). Thus, for collaborative problem-solving in knowledge-rich domains—as for computer-supported collaborative learning more generally—knowledge and information awareness is crucial (Engelmann & Hesse, 2010).

Thus, the relevance of collaboration knowledge in collaborative problem-solving is an important finding of our study, highlighting that it is critical in facilitating effective collaborative processes and outcomes. The current findings emphasize the need for educational strategies that explicitly target the development of collaborative knowledge to ensure that learners have the knowledge and skills necessary to participate in productive collaborative problem-solving and computer-supported collaborative learning processes. In doing so, the CDR model emphasizes the need for learners to master collaborative skills and build shared problem representations to take full advantage of collaborative learning opportunities.

As CDR is conceptualized to be an interplay of cognitive and social skills (Hesse et al., 2015), we also assumed that social skills are related to CDAs. However, we only found evidence of the expected relationship between social skills and CDAs for the quality of evidence elicitation. One explanation could be that collaboration knowledge was relatively high in all three samples, outweighing the influences of general skills. This is consistent with the assumption of the CDR model that the influence of more general social skills is reduced with an increasing level of professional collaboration knowledge (Radkowitsch et al., 2022). When collaboration knowledge is available to the diagnosticians, it becomes more important than social skills. This finding again underlines the importance of collaboration knowledge, which can be seen as a domain- and profession-specific development of social skills. However, another explanation could be that, when collaborating with an agent, the effect of social skills decreases, as the agent was not programmed to respond to social nuances. The design of the simulation would thus buffer against the effect of social skills. Although the study by Herborn et al. (2020) found no differences between human-to-human and human-to-agent collaboration, this does not necessarily invalidate the potential variability in outcomes associated with the social skills incorporated into the agent. For a thorough investigation into the impact of social skills, the agent would need variable social abilities, enabling the variation of the importance of basic social skills for successful collaboration.

Further, we need to conclude that there is no support for a relationship between the individual characteristics and hypotheses sharing, as we found no stable support for the relationship between any of the individual characteristics and the quality of hypotheses sharing. One possible explanation could be that the binary precision measure used to operationalize quality in hypotheses sharing is not sensitive enough or is not capturing the relevant aspect of quality in that activity. Another explanation could be that there is no direct relationship between the individual characteristics and hypotheses sharing, as this relationship is mediated by evidence sharing and thus influenced by the activated knowledge scripts (Schmidt & Rikers, 2007).

Looking at the relationships between CDAs and the diagnostic outcome, the current results highlight the need to distinguish between primary (diagnostic accuracy) and secondary (diagnostic justification and efficiency) outcomes of diagnostic reasoning (Daniel et al., 2019). Achieving diagnostic accuracy, a purely quantitative outcome measure, is less transactive than other aspects of the diagnostic outcome. This is also where we find the link to evidence elicitation, as we consider this to be the least transactive CDA within the collaborative decision-making process. However, the ability to justify and reach this decision efficiently is then highly dependent on evidence sharing and hypotheses sharing, activities that are more focused on transactivity within CDR (Weinberger & Fischer, 2006).

Although individual learner characteristics are found to have an effect on CDAs, and CDAs impact the diagnostic outcome, the effect is not mediated by CDAs across studies. Thus, we assume that, for effective collaborative problem-solving in knowledge-rich domains, such as CDR, it is not enough to have sufficient content and collaboration knowledge; it is also necessary to be able to engage in high quality CDAs to achieve a high-quality diagnostic outcome. This is consistent with research on individual diagnostic reasoning, which shows that diagnostic activities have a unique contribution to the diagnostic outcome after controlling for content knowledge (Fink et al., 2023).

In summary, we explored evidence elicitation, evidence sharing, and hypotheses sharing as crucial CDAs. The findings revealed diverse associations of these CDAs with individual characteristics and facets of the diagnostic outcome, supporting the notion that the CDR process involves a variety of different skills (instead of being one overarching skill). On the basis of these results, we propose categorizing CDAs into activities primarily focused on individual goals and needs (e.g., elicitation) and more transactive activities directly targeted at the collaborator (e.g., sharing). To enhance quality in CDAs, instructional support should be considered. For instance, providing learners with an adaptive collaboration script has been shown to improve evidence sharing quality and promote the internalization of collaboration scripts, fostering the development of collaboration knowledge (Radkowitsch et al., 2021). Further, group awareness tools, such as shared concept maps, should be considered to compensate for deficits in one’s collaboration knowledge (Engelmann & Hesse, 2010). However, what is required to engage in high-quality CDAs remains an open question. One starting point is domain-general cognitive skills. These could influence CDAs, particularly in the early stages of skill development (Hetmanek et al., 2018). Previous research showed that, in diagnostic reasoning, instructional support is more beneficial when it is domain-specific than when it is domain-general (Schons et al., 2022). Thus, there is still a need for further research on what such instructional support might look like.

Future Research

Although we used data from three studies, all of them were in the same domain; thus, it remains an open question whether these findings are applicable across domains. The CDR model claims that the described relationships are not limited to the medical domain, but rather are valid across domains for collaboratively solving complex problems in knowledge-rich domains. Future research should explore generalizability, for example, for teacher education, which is a distinct field that also requires diagnosing and complex problem-solving (Heitzmann et al., 2019).

Regardless of domain, the non-mediating relationship of CDAs between individual characteristics and diagnostic outcomes, as well as the effects of the CDAs found in the current study, suggests that an isolated analysis of CDAs does not fully represent the complex interactions and relationships among activities, individual characteristics, and diagnostic outcomes. Future studies might assess CDAs as a bundle of necessary activities, including a focus on their possible non-linear interactions. We propose to use process data analysis to account for the inherent complexity of the data, as different activities in different sequences can lead to the same outcome (Y. Chen et al., 2019). More exploratory analyses of fine-grained, theory-based sequence data are needed to provide insights into more general and more specific processes involved in successfully solving complex problems collaboratively (Stadler et al., 2020).

As our results have shown, collaboration knowledge and thus awareness of the knowledge distribution among collaboration partners is highly relevant. While a recent meta-analysis showed a moderate effect of group awareness on students’ performance in computer-supported collaborative learning (D. Chen et al., 2024), it has so far not been systematically investigated in collaborative problem-solving. Thus, more research on the influence of collaboration knowledge in collaborative problem-solving is needed.

Further, additional factors associated with success in collaborative problem-solving—not yet incorporated into the model and thus not yet investigated systematically—include communication skills (OECD, 2017), the self-concept of problem-solving ability (Scalise et al., 2016), and positive activating emotions during problem-solving tasks (Camacho-Morles et al., 2019).

Limitations

There are, however, some limitations to be considered. One is that we have only considered CDAs and how they relate to individual characteristics and outcomes. However, the CDR model also introduces individual diagnostic activities, such as the generation of evidence and the drawing of conclusions. These occur before and after the CDAs and may therefore also have an impact on the described relationships. We nevertheless decided to focus on the CDAs within the CDR process because they are particularly relevant for constructing a shared problem representation, which is central to CDR. Future research might consider these individual diagnostic activities, as they could, for example, further explain how content knowledge is related to the diagnostic outcome.

Another limitation of the current analyses is the operationalization of quality for the CDAs. We chose the appropriateness of radiological examination for the indicated diagnosis for quality of evidence elicitation and precision for quality of evidence sharing and hypotheses sharing. However, all of these only shed light on one perspective of each activity, while possibly obscuring others. For example, it may be that content knowledge is not related to the precision of hypotheses sharing, but this may be different when looking at other quality indicators, such as sensitivity or specificity. However, we decided to use the precision aspect of activities, as research shows that collaborators often fail to identify relevant information, and the amount of information is not related to performance (Tschan et al., 2009). Future research may explore a broader variety of quality indicators to be able to assess the quality of CDAs as comprehensively as possible. It should also be noted that in study B a suppression effect (Horst, 1941) between hypotheses sharing and evidence elicitation artificially inflated the observed effect size. This is to be expected with process data that can be highly correlated and needs to be considered when interpreting the effect sizes.

In addition, it should be noted that the omega values obtained for the conceptual and strategic knowledge measures were below the commonly accepted threshold of 0.7. While we chose to use omega values as a more appropriate measure of reliability in our context, given the complex and multifaceted nature of the knowledge constructs, these lower-than-expected values raise important questions about the quality of the data and the robustness of the findings. Thus, it is important to understand that knowledge constructs, by their very nature, may not always exhibit high levels of internal consistency due to the diverse and interrelated components they encompass (Edelsbrunner, 2024; Stadler et al., 2021; Taber, 2018). This complexity may be reflected in the moderate omega values observed, which, while seemingly counterintuitive, do not invalidate the potential of the constructs to account for substantial variance in related outcomes. However, findings related to these constructs should be interpreted with caution, and the results presented should be considered tentative. Future research should further explore the implications of using different reliability coefficients in assessing complex constructs within the learning sciences, potentially providing deeper insights into the nuanced nature of knowledge and its measurement.

Another limitation of this study is related to the agent-based collaboration, as a predictive validation of collaborative problem-solving for later human-to-human collaboration in comparable contexts has not yet been systematically conducted. Although the agent-based collaboration situation used has been validated in terms of perceived authenticity, it still does not fully correspond to a real collaboration situation (Rosen, 2015). This could be an explanation for the low influence of social skills, as the setting might not require the application of a broad set of social skills (Hesse et al., 2015; Radkowitsch et al., 2020). In a real-life collaboration, the effects of social skills might be more pronounced. However, research showed that the human-to-agent approach did not lead to different results in collaborative problem-solving than the human-to-human approach in the 2015 PISA study, and correlations with other measures of collaborative skills have been found (Herborn et al., 2020; Stadler, Herborn et al., 2020). Future studies should specifically test the relevance of social skills for CDR in a human-to-human setting to strengthen the generalizability of our findings.

Conclusion

In conclusion, the current study highlights the importance of individual characteristics and CDAs as independent predictors for achieving good diagnoses in collaborative contexts, at least in the simulation-based settings we used in the studies included in our analysis. Collaboration knowledge emerged as a critical factor, demonstrating its importance over early acquired, general social skills. Therefore, it is imperative to revise the CDR approach by giving higher priority to the proficiency of collaboration knowledge compared with social skills. Furthermore, we conclude that, in simulation-based CDR, content knowledge does not play such a crucial role in predicting diagnostic success compared with many other educational settings, most probably because of the endless opportunities for retrying and revising in simulation-based learning environments.

With respect to CDAs, we suggest refining the perspective on the quality of CDAs and considering a revision of the CDR model that summarizes CDAs as information elicitation and information sharing, with the former being less transactive, and thus less demanding, than the latter. Adequate performance in both types of CDA is presumed to result in a high-quality shared problem representation, resulting in a good diagnostic outcome. Collaborative problem-solving skills are highly relevant in the professional practice of knowledge-rich domains, highlighting the need to strengthen these skills in students engaged in CDR and to provide learning opportunities accordingly. Further, the ability to effectively collaborate and construct shared problem representations is important, not only in CDR but also in collaborative problem-solving and computer-supported collaborative learning more generally, highlighting the need for integrating such skills into curricula and instructional design.

By emphasizing these aspects, we can improve the diagnostic skills of individuals in collaborative settings. Through advancing our understanding of CDR, we are taking a key step forward in optimizing collaborative problem-solving and ultimately contributing to improved diagnostic outcomes in various professional domains beyond CDR in medical education. In particular, integrating collaboration knowledge and skills into computer-supported collaborative learning environments can enrich learning experiences and outcomes in various knowledge-rich domains.