Assessing Transdisciplinary Scholarly Development: A Longitudinal Mixed Method Graduate Program Evaluation

Transdisciplinary (TD) graduate training programs are growing in number, yet little is known about their effectiveness or the development of TD attitudes and behaviors among students over time. This prospective longitudinal mixed methods study compares graduate students participating in a federally funded TD training program with non-participating students from the same disciplines and degree programs (n = 26). The Interdisciplinary Perspectives Index (IPI) and Behavior Change Collaborative Activity Index (BCCAI) were used to assess TD attitudes and behaviors at beginning, middle, and end of an MPH/PhD program. Additionally, a multiple case-based approach was used to further analyze changes among the TD students at three time points (n = 10), including a novel sketch protocol to elicit TD student conceptualizations (mental maps) of TD teams. Four assessments were used to construct an overall TD orientation score. Wilcoxon Signed Rank Tests showed TD behaviors increased over time only among TD students, and favorable TD attitudes were high at baseline and did not change for any group. Generalized Estimating Equations showed that TD behaviors were higher among TD students than traditional students at both mid and endpoint, with no difference at baseline. Visual assessments showed TD students’ mental maps of TD research and team science, elicited under a novel sketch protocol, reflected greater integration and organization by endpoint. Two developmental patterns of increasing overall TD orientation emerged among the TD students. This article reports findings and insights applicable to TD graduate education and curriculum design and introduces a novel visual assessment tool.

Transdisciplinary approaches are widely recognized as essential for addressing complex health challenges (Colditz & Wolin, 2011; Irwin et al., 2021; National Institutes of Health, 2018; Rimer & Abrams, 2012), reducing health disparities (Heinzmann et al., 2019; Kuo et al., 2017), and leveraging innovation in fields such as data science and neuroscience (Galea et al., 2020; Stokols, 2018; Suleiman & Dahl, 2017). The Science of Team Science field has emerged to study the value of TD team science (Hall et al., 2012; Stokols et al., 2008). While outcome studies have largely focused on scholarly productivity in publication quantity and quality (Hall et al., 2018), team science also aims to study team composition and factors that enhance team science processes (Falk-Krzesinski et al., 2011; Irwin et al., 2021; Misra et al., 2015; Patel et al., 2021). One study measured scientific productivity in center- and investigator-initiated federal grants and found that, after three, five, and 10 years, the coordinated TD research initiatives led to higher productivity than traditional research models at all three time points (Hall et al., 2012). A systematic review of 10 years of research on team science concluded that collaborative team science is both effective and impactful (Little et al., 2017). Science of Team Science researchers have developed frameworks and principles for evaluating TD research (Kemp & Nurius, 2015; Klein, 2008), identified features of successful teams (Nash, 2008), and developed instruments to measure collaborative behaviors, TD attitudes, and teamwork (Hall et al., 2018; Mâsse et al., 2008; Misra et al., 2009, 2015). Conceptual and measurement development has demonstrated that TD orientation has at least two dimensions: attitudinal (values, attitudes, and beliefs) and behavioral (research activities and conceptual skills), and higher TD orientation has been correlated with greater TD scientific outputs (Misra et al., 2015; Stokols, 2018).
The push for more integrated approaches to address complex health issues has resulted in steady growth of TD graduate training programs (Vogel et al., 2014). Researchers have assessed student interdisciplinary orientation cross-sectionally (Shandas & Brown, 2016), and trainee satisfaction or productivity after program completion (Irwin et al., 2021; Patel et al., 2021), but few studies have examined the effectiveness of graduate level TD programs or monitored change in students' TD understanding over time compared to students in traditional programs. Little is known about the process of scholarly development among TD students in graduate training programs, from baseline to completion. Questions remain about the role of self-selection: whether individuals with a high aptitude for TD self-select into becoming TD researchers, or if TD orientation can be developed through training.

The literature suggests that individual characteristics, such as openness and respect for multi-disciplinary viewpoints, tend to enhance students' adoption of a TD orientation (Nash, 2008), but empirical studies of students throughout their participation in formal graduate TD training programs are lacking, especially those that include comparison groups. Better understanding is needed about baseline and changes in attitudes, behaviors, and conceptualizations of TD approaches over time among students in TD programs. Finally, qualitative and mixed methods approaches to evaluation, so aptly suited to these inquiries, have been underutilized.
To address these gaps, this study used multiple methods to assess change in doctoral student conceptualization and understanding of TD research over approximately five years at a large public research university with a federally funded TD doctoral training program. This study is a component of a larger evaluation, which also includes objective assessment of student productivity, impact, and collaboration (Keck et al., 2017a, 2017b) at graduation and beyond, and focus group data collected at three time points among students and faculty mentors (Keck et al., 2017a, 2017b). The objectives of this study were to (1) describe growth and development in TD attitudes and practices over time among graduate students in a combined Master of Public Health (MPH)/PhD TD training program compared to traditional MPH and/or doctoral students; (2) assess individual level TD development and changes in TD knowledge, attitudes, and practices over time among TD students using multiple methods; and (3) explore the utility and validity of a novel visual method of assessing TD development over time. Findings are discussed in the context of TD student scholarly development and curriculum planning for higher education.

Program Description
The Illinois Transdisciplinary Obesity Prevention Program (I-TOPP) is a TD joint degree (PhD/MPH) program that integrates education, research, and practice, which was funded by the United States Department of Agriculture. The program enrolled 13 doctoral students and 10 completed the program during the funding period; two transferred out in the first year and one completed coursework only. The TD program includes didactic coursework and practicum experience for the MPH degree, full requirements for a doctoral degree in one of six disciplines, additional coursework in TD approaches to childhood obesity prevention, and applied TD research and mentoring (see Fig. 1). The structure of this TD program supports an environment where graduate students learn the language of other disciplines while maintaining rigor in a primary disciplinary science. As such, students pursue an evolving disciplinary and professional identity, while their development is informed and broadened by exposure to cross-disciplinary models, theories, and methods that promote TD perspectives and learning (Kemp & Nurius, 2015).
Students were recruited from diverse fields of study encompassing basic, behavioral, and social sciences to pursue doctoral training in one of six participating units. They spent the first 2 years completing the MPH program, taking TD courses, engaging in research, and preparing for qualifying exams. By the third year, research was the central activity. Students formulated their dissertation research proposals in consultation with a primary and secondary advisor from two different disciplines, and they were encouraged to seek advice from faculty across campus with needed expertise. Each TD student also selected an advisory committee that included interdisciplinary faculty in consultation with their co-advisors. This committee guided the student in selecting appropriate elective coursework and formulating an obesity-related TD research project. Students met with their advisory committee at least annually to set research and scholarly development goals for the year and completed an annual individual development plan. Most students defended their dissertation within 5 years. This structure is consistent with a multi-component developmental approach to TD scholarly training (Kemp & Nurius, 2015; Stokols et al., 2013).

Participants
All students enrolled in the TD program, the MPH program, or a traditional PhD program in the target disciplines (Fig. 1) who began graduate school in fall 2011, 2012, or 2013 were invited to participate in this study. No incentives were offered, and only students with complete data were included in the analysis. Participants with complete data included TD students (n = 10), a comparison group of traditional PhD students (n = 10) drawn from the same interdisciplinary departments and units as the TD students, and a comparison group of MPH graduate students (n = 6) drawn from the same MPH program the TD students completed (Table 1). The study was approved by the university's Institutional Review Board and informed consent was obtained before participation.

Objective 1: Longitudinal Comparisons across Programs (n = 26)
All participants completed a TD questionnaire at baseline and year 2, and the doctoral students upon graduation (approximately year 5). The three time points corresponded to the time of enrollment, time of completion of the MPH degree or equivalent coursework, and time of completion of the doctoral degree, if applicable. Thus, the doctoral students (traditional and TD) participated at three time points, and MPH students participated at the first two time points. Early in the fall semester of each study enrollment year (2011-2013), lists of newly admitted MPH and doctoral graduate students were obtained from each relevant department. The informed consent, online TD questionnaire, and invitation to participate in this longitudinal study were distributed by email to eligible students. After two years, respondents were invited to complete the TD questionnaire again by email (corresponding to the completion of the 2-year MPH degree). Starting at four years after enrollment, the doctoral student respondents were monitored for progress toward dissertation completion via public and departmental announcements and/or by contact with the department and invited to complete the TD questionnaire at graduation.

Objectives 2-3: Longitudinal TD Development among TD Trained Students (n = 10)
In addition to the TD questionnaire that all participants completed, the 10 TD students participated in a more comprehensive evaluation of the TD training program, which included a visual assessment based on a sketch protocol at baseline (administered within 2 weeks of program entry), at midpoint (after completing their MPH requirements), and at endpoint (at the time of completion of the PhD).
Longitudinal data from the sketch protocols were used to assess student conceptualizations and mental maps of TD over time. The sketch protocol, the metrics derived from the visual assessment, and visual assessment validation procedures are described in detail below.

Sketch Protocol
The TD students participated in a sketch protocol at each of the three time points. They were instructed: "Assuming no art skills whatsoever, please take a few minutes to draw a transdisciplinary research or intervention team doing something that prevents or reduces childhood obesity." Researchers stopped them after five minutes. The next day the TD students completed short, individual audio-recorded interviews in a private setting to share a verbal explanation of their sketch. Each student was shown their sketch and asked to (1) describe the sketch, (2) describe emotions evoked, and (3) ascribe meaning to the sketch. In contrast to written questionnaires, visual tools using sketches or drawings provide an opportunity for implicit attitudes and awareness to be expressed (Pauwels, 2010). Audio recordings were transcribed and used in conjunction with the sketches. This assessment exercise was adapted by the first author from the Kinetic Family Drawing (Burns & Kaufman, 1972), a classic family therapy technique to elicit thoughts, perceptions, and internal schema (i.e., mental maps) about family relationships, including the three questions to stimulate reflection on the drawings. Drawings and sketches have been used in higher educational contexts to assess learning and conceptualize complex research inquiries (Renfro, 2017). Visual assessments elicit imagery, mental maps, and non-verbal insights that reflect perceptions of phenomena or relationships that might otherwise be outside of the individual's awareness (Burns & Kaufman, 1972) or difficult to explain in words alone (Renfro, 2017).

Transdisciplinary Attitudes
The Interdisciplinary Perspectives Index (IPI; Misra et al., 2009) was used to assess TD attitudes. The IPI is a 6-item measure (Cronbach's alpha = 0.93) with response categories from 1 to 5 (Strongly Disagree to Strongly Agree). Items were coded so that higher scores reflected more favorable attitudes toward TD, and a mean score was calculated. Sample items included "I prefer to conduct research independently rather than as part of a group" and "Generally speaking, I believe that the benefits of interdisciplinary research outweigh the inconvenience of such work" (reverse coded).
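The scoring rule described above (recode so that higher means more favorable, then take the item mean) can be sketched as follows. This is an illustrative sketch only: the function name, the 0-based item positions, and the choice of which item is reverse coded in the example are assumptions, not details taken from the published instrument.

```python
def score_ipi(responses, reverse_items=frozenset()):
    """Mean IPI score on the 1-5 response scale.

    responses: six item responses, each an integer from 1 to 5.
    reverse_items: 0-based positions of reverse-coded items
        (hypothetical here; consult the instrument for the real set).
    """
    if len(responses) != 6:
        raise ValueError("the IPI has six items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must lie between 1 and 5")
    # On a 1-5 scale, reverse coding maps 1<->5 and 2<->4: recoded = 6 - raw.
    recoded = [6 - r if i in reverse_items else r
               for i, r in enumerate(responses)]
    return sum(recoded) / len(recoded)


# Example with made-up responses; the last item is treated as reverse coded.
example_score = score_ipi([5, 4, 4, 5, 3, 1], reverse_items={5})
```

Averaging rather than summing keeps the score on the original 1-5 metric, which makes it easier to compare with the response anchors.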

Collaborative Research Behaviors
The Behavioral Change Collaborative Activities Index (BCCAI; Misra et al., 2009) was used to assess collaborative behaviors. The BCCAI is a 7-item measure (alpha = 0.84) with response categories from 1 to 7 (Never to Very Often). Items were averaged/summed for a total BCCAI score of collaborative behaviors. A sample item is "How often do you participate in groups with researchers in other fields with the intent to integrate ideas?".

Levels of Obesity Determinants (6Cs)
To assess TD students' awareness of determinants of child obesity across the six levels of social ecology, a count method was used based on two data sources. The number of levels was derived from the six levels of social ecology (6Cs) model of child obesity (Harrison et al., 2011; Fig. 2), which was the theoretical framework for the TD training grant. The two data sources were: (a) responses to four open-ended questions on the TD questionnaire relevant to obesity determinants and (b) the sketch protocol (sketch and text). Sample open-ended questions included "What determinants of child obesity would you like to understand better?" and "What are the most important components of an effective program in preventing/reducing childhood obesity?" Responses and the sketch protocol data were coded for the mention of obesity determinants at each level of the 6Cs model. Levels of 6Cs from both sources were combined to create the "6Cs" score (range 0-6).
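One way to read the count method is as the number of distinct 6Cs levels mentioned in either data source. The sketch below assumes that reading (a set union), and the level labels are given as recalled from the Six-Cs model rather than from this article; both should be treated as assumptions.

```python
# Level labels assumed from the Six-Cs model (Harrison et al., 2011).
SIX_CS = ("cell", "child", "clan", "community", "country", "culture")


def six_cs_score(questionnaire_levels, sketch_levels):
    """Count distinct 6Cs levels coded in either source (range 0-6)."""
    combined = set(questionnaire_levels) | set(sketch_levels)
    unknown = combined - set(SIX_CS)
    if unknown:
        raise ValueError(f"unrecognized level(s): {sorted(unknown)}")
    # A level mentioned in both sources is counted once.
    return len(combined)


# Hypothetical coding for one student at one time point.
example_6cs = six_cs_score({"child", "clan"}, {"clan", "community"})
```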

TD Attributes and Sketch Protocol
Student awareness of TD attributes was assessed using the sketch protocol (i.e., sketch with transcribed text explanation) described above. Three raters coded each of the 30 sketches/texts. After determining acceptable reliability among raters, the TD attribute score was adopted. The coding process was led by the first author and the coding scheme was refined by the first three authors. First, a multi-step visual-textual coding process was used to analyze the sketch/text, extract themes, and develop coding categories using thematic analysis (Braun & Clarke, 2006). Initially, the sketches were open coded for emerging themes informed by the literature (Stokols et al., 2013). Then, the analysts engaged in an iterative process until distinct and well-defined TD attributes and visual and textual indicators of such were established. Finally, operational definitions were developed for each attribute to further clarify and refine terms and coding categories. The coding refinement process revealed five distinct attributes of TD research: organized complexity, collaboration, long-range view, stakeholder engaged research, and societal impact (Table 2). Second, data were coded again using refined coding categories. Three raters independently scored each sketch/text for the presence of the five TD attributes on a scale of 1-3 as follows: 1 = not present, 2 = partially present or implied, and 3 = present and explicit. To assess the reliability of scoring among the three raters and the appropriateness of including the TD attribute scores as part of a composite overall TD score, Kendall's Coefficient of Concordance (W) was calculated (Gearhart et al., 2013; Howell, 2010). Kendall's W is a non-parametric statistic to assess agreement among two or more judges or raters and is often used when assessing rater agreement on visual data such as for gait analysis (Lord et al., 1998). Kendall's W ranges from 0 (no agreement) to 1 (unanimity or perfect agreement).
Overall, global concordance among independent raters, assessed prior to consensus, was high and significantly so for 13 out of 15 summary observations (five attributes over three time points). Specifically, Kendall's coefficient of concordance was significant at all time points for organized complexity, time, and stakeholder engagement; and at two time points for collaboration and societal impact. Following Lincoln and Guba (1985), we resolved differences through discussion and consensus. After finding acceptable inter-rater agreement, mean TD attribute scores were computed across the five attributes for each of the 30 sketch/texts on a scale of 1-3.
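For readers who want to reproduce this kind of agreement check, a minimal plain-Python sketch of Kendall's W is given below. It uses the standard tie-corrected formula, which matters here because a 1-3 rating scale produces many tied ranks. The function names and the example ratings are hypothetical, and whether the original analysis applied the tie correction is not stated in the text.

```python
from collections import Counter


def midranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W.

    ratings: one list per rater, each holding scores for the same n items.
    """
    m, n = len(ratings), len(ratings[0])
    if m < 2 or n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need >= 2 raters scoring the same >= 2 items")
    rank_rows = [midranks(r) for r in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    # Tie correction: sum of (t^3 - t) over each rater's groups of tied scores.
    ties = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denominator


# Hypothetical 1-3 ratings from three raters for ten sketches on one attribute.
example_w = kendalls_w([
    [1, 2, 3, 3, 2, 1, 2, 3, 1, 2],
    [1, 2, 3, 2, 2, 1, 2, 3, 1, 3],
    [1, 3, 3, 3, 2, 1, 1, 3, 1, 2],
])
```

Values near 1 indicate that raters order the sketches similarly; the coefficient says nothing about whether raters use the same absolute scores.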

Construction of Overall TD Score
To synthesize the multiple methods of assessment, an overall TD score was computed based on the IPI, BCCAI, awareness of obesity determinants (6Cs), and TD attribute scores. Assessments were first standardized to a 1-6 scale, then averaged to yield an overall TD score for each student in the TD program.

Innovative Higher Education (2022) 47:661-681
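The standardize-then-average step can be illustrated with a short sketch. The text does not say how scores were standardized to the 1-6 scale, so the linear (min-max) rescaling used here is an assumption, as are the function names and the treatment of the BCCAI as a 1-7 item mean.

```python
def rescale(value, old_min, old_max, new_min=1.0, new_max=6.0):
    """Linearly map a score from [old_min, old_max] onto [new_min, new_max]."""
    if not old_min <= value <= old_max:
        raise ValueError("value lies outside its original range")
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)


# Original score ranges as described in the measures sections
# (BCCAI assumed to be scored as an item mean on its 1-7 response scale).
RANGES = {
    "ipi": (1, 5),
    "bccai": (1, 7),
    "six_cs": (0, 6),
    "td_attributes": (1, 3),
}


def overall_td_score(raw_scores):
    """Average the four indicators after rescaling each to 1-6."""
    rescaled = [rescale(raw_scores[name], low, high)
                for name, (low, high) in RANGES.items()]
    return sum(rescaled) / len(rescaled)


# Hypothetical raw scores for one student at one time point.
example_overall = overall_td_score(
    {"ipi": 4.2, "bccai": 3.6, "six_cs": 3, "td_attributes": 2.2}
)
```

Equal weighting of the four indicators follows the text's description of a simple average; a different weighting would change the composite.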

Analytic Strategy
Scale analyses and quantitative data were analyzed using SPSS 22.0. Wilcoxon signed rank tests were used to examine aggregate differences over time in the two quantitative measures (IPI and BCCAI) and the composite overall TD scores from baseline to mid, baseline to end, and mid to end point. To examine differences between TD-trained PhD/MPH students versus traditional PhD students and traditional MPH students over time, Generalized Estimating Equations (GEE) analysis was used. Program type (TD, PhD, or MPH) was used as the independent variable and the IPI and BCCAI scores at the end of their programs were the dependent variables. The quasi-likelihood under the independence model criterion (QIC) was used to assess model fit.
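As a rough illustration of the paired comparison, the sketch below implements the Wilcoxon signed rank test with the usual large-sample normal approximation and tie correction in plain Python. The study ran its tests in SPSS; whether the SPSS output used exactly this form (for example, with or without a continuity correction) is an assumption, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, z, two-sided p) for paired scores, normal approximation."""
    # Round to absorb floating-point noise, then drop zero differences.
    diffs = [round(a - b, 6) for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_diffs = [abs(d) for d in diffs]
    ranks = midranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Variance with correction for tied absolute differences.
    ties = sum(t ** 3 - t for t in Counter(abs_diffs).values())
    variance = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p_two_sided


# Hypothetical baseline and endpoint scores for ten students, all increasing.
baseline = [3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8]
endpoint = [b + 0.1 * (i + 1) for i, b in enumerate(baseline)]
w_plus, z, p = wilcoxon_signed_rank(baseline, endpoint)
```

With only ten pairs, an exact permutation p-value would also be reasonable and can differ noticeably from the normal approximation, so the method used should be reported alongside the result.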

Wilcoxon Signed Rank Test
Wilcoxon signed rank tests were performed to test for significant differences in mean scores on repeated measures of the IPI and BCCAI for each group of students. As shown in Fig. 3, significant change was observed from baseline to mid and baseline to end on the BCCAI, but only in the TD student group. Figure 3 shows that scores on the BCCAI (collaborative TD behaviors) rose only among the TD (39% increase) and MPH students (22% increase) in the first two years of training (during the MPH portion of the TD program) and continued to increase among TD students. Overall change in collaborative behaviors from baseline to end of program showed a significant increase by 1.9 points among TD students (53% increase, p = 0.005), with no change for the other groups. In addition, only the TD students showed an increase from baseline to mid (p < 0.01) by 1.4 points. Among the PhD group, BCCAI scores decreased numerically by 0.1 point (2.4% decrease), although not significantly. Figure 3 shows descriptively that the IPI scores rose slightly for all groups over time, but IPI scores were relatively high at baseline for all groups and no changes between timepoints on the IPI were significant for any group.

Comparison Group Findings from GEE Analysis
To compare changes over time between groups, we used GEE to test associations between type of training program (i.e., TD, PhD, or MPH) and within-subject repeated measures of scores on the IPI and on the BCCAI. Differences by group were observed on the BCCAI scores over time (an indicator of collaborative research activity). Based on GEE analysis, membership in the PhD group was associated with significantly lower BCCAI score change from baseline to endpoint compared to the TD student group's score change (estimate = -0.45, 95% CI -0.68, -0.21, p = 0.0002). No group by time interaction was noted for the repeated measure scores on the IPI. In addition, analysis of variance of baseline scores showed no differences among any of the three groups on the IPI or the BCCAI (not shown), indicating no difference in TD attitudes or behaviors between the TD and comparison groups at baseline.
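For readers less familiar with GEE, the marginal model behind a program-by-time comparison can be written out as follows. This is a generic sketch rather than the authors' exact specification: the dummy coding (TD as the reference group), the linear time term, and the symbols are assumptions made for illustration.

```latex
% Marginal mean for student i at time point t
% (assumed coding: TD is the reference group)
\mathbb{E}[Y_{it}] = \mu_{it}
  = \beta_0 + \beta_1\,\mathrm{PhD}_i + \beta_2\,\mathrm{MPH}_i + \beta_3\,t
  + \beta_4\,(\mathrm{PhD}_i \times t) + \beta_5\,(\mathrm{MPH}_i \times t)

% Estimating equations solved for \beta, with working correlation R(\alpha)
\sum_{i=1}^{N} D_i^{\top} V_i^{-1}\,(Y_i - \mu_i) = 0,
\qquad D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad V_i = \phi\, A_i^{1/2} R(\alpha)\, A_i^{1/2}
```

Here $Y_{it}$ is the IPI or BCCAI score, $A_i$ is the diagonal matrix of variance functions, and $\phi$ is a scale parameter. Under this coding, a negative $\beta_4$ would correspond to smaller gains over time in the PhD group than in the TD group, which is the kind of contrast the reported estimate describes.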

Overall TD Score
Eight of the 10 TD students showed increasing TD orientation over the five years of the program based on the overall TD score. Among the other two TD students, one began and sustained a high TD score (mean overall TD = 4.6 out of 6) and the other began with a high score (mean = 4.9) that decreased by endpoint (mean = 3.9; see Fig. 4). The maximum individual increase in overall TD score was 1.4 points and the maximum decrease was 1.0 points (see supplemental table). As a group, the average overall TD score rose from 4.3 to 4.7 at mid and then to 4.8 by endpoint, for a net gain of 0.5 points on average, showing overall growth in TD orientation.

Fig. 4 Overall transdisciplinary (TD) scores at baseline, midpoint, and endpoint by TD student. Notes. The bar graph shows overall TD scores over three time points among TD students (n = 10), grouped by pattern of change: TD increase-increase (increases at both mid and endpoint), and TD increase-stable (increase at mid and stable at endpoint). Change scores in overall TD are shown from (a) baseline to end, (b) baseline to mid, and (c) mid to end by TD student.

Patterns in Overall TD Score Change over Time
The bar chart in Fig. 4 shows students grouped by pattern of overall TD score change. Among the eight students with rising overall TD scores, two patterns emerged: four students had overall TD scores that increased at each time point (TD increase-increase), and four students had overall TD scores that increased then stabilized (TD increase-stable). The graphs in Fig. 5 show all four TD indicators among each of the ten students, as well as a student exemplar of each of the two increasing patterns. (See the supplemental table for numeric values.)
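The grouping into patterns can be expressed as a simple rule on the two change scores (baseline to mid, and mid to end). The sketch below is illustrative only: the text does not report what threshold separated "increase" from "stable", so the tolerance of 0.1 points, the function name, and the example values are all assumptions.

```python
def classify_trajectory(baseline, mid, end, tolerance=0.1):
    """Label a three-point trajectory, e.g. 'increase-stable'.

    tolerance: hypothetical cut-off; changes within +/- tolerance
        are treated as stable.
    """
    def direction(delta):
        if delta > tolerance:
            return "increase"
        if delta < -tolerance:
            return "decrease"
        return "stable"

    return f"{direction(mid - baseline)}-{direction(end - mid)}"


# Made-up overall TD scores (1-6 scale) for three hypothetical students.
patterns = [
    classify_trajectory(4.0, 4.5, 5.0),
    classify_trajectory(4.0, 4.6, 4.6),
    classify_trajectory(4.6, 4.6, 4.6),
]
```

Because the labels depend on the chosen tolerance, any threshold of this kind would ideally be fixed before inspecting the trajectories.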

Change Scores Over Time on Four TD Indicators
Average change scores over time on each of the four indicators are shown in Fig. 5 (see supplemental table). All students reported increased collaborative research behaviors (BCCAI) with an average net gain of 1.6 points. A slight majority, six out of 10 students, reported increased favorable attitudes toward TD (IPI) for an average net gain of 0.2 points. Four out of 10 students reported increases on the TD attributes assessment for an average net increase of 0.6 points, and only three out of 10 students reported increased identification of 6Cs, for an average net decrease of 0.3 points.

Objective 3: Explore Novel Visual Method of Assessing TD Development
Finally, Fig. 6 shows an example of a student's sketches at baseline, midpoint, and endpoint and TD attribute scoring. The sketch/text provide a unique window into students' ability to conceptualize TD research in action from the time of enrollment to doctoral degree (about five years). Details on methods to identify and score TD attributes and establish inter-rater reliability are in the Methods section. Across the 30 sketches, the mean TD attribute score was 3.9 at baseline, 3.8 at midpoint, and 4.5 at endpoint (Fig. 5a). As shown in Fig. 4, four student scores increased, four decreased, and two showed no change (see the supplemental table for numeric values).

Discussion
Overall, compared to traditional PhD training, students in the TD training program displayed greater collaborative TD research behaviors at program completion based on the BCCAI measure. The addition of the TD training components appears to have increased collaborative research behaviors among the TD students; traditional training did not. This is consistent with prior literature showing that participation in team science was associated with higher scores on the Transdisciplinary Orientation (TDO) scale among faculty (Misra et al., 2015); and as expected since the 5-year training program was designed to foster TD approaches to research. Questions remain about the degree to which specific components contributed to student growth. Along with TD seminars throughout the 5 years, the MPH component of training in the first 2 years may have boosted collaborative and interdisciplinary research activities since public health is an inherently interdisciplinary field. Indeed, students in the MPH comparison group showed a numerical increase in the BCCAI score, but it was not significant. However, the BCCAI scores among the TD students continued to rise significantly even after the MPH portion of the TD program was completed (Fig. 3), suggesting that the MPH part of the program was not the sole contributor to students' TD development.
Other possible contributors to TD development included the weekly TD seminars each semester for 5 years, the TD coursework, the interdisciplinary advisory structure (Fig. 1), financial support for students to participate in conferences from a wide array of disciplines, time spent with visiting scholars each semester, relationships with faculty who were also expanding into TD scholarship, the students' own TD research and dissertation projects, and the support and friendly competition among the students to take on new TD challenges. Together, these components created a stimulating learning environment for cross-disciplinary fertilization of ideas and TD development and innovation (Selznick et al., 2021). The multi-year cohort structure offered opportunities for more experienced peers to mentor and model TD concepts for newer cohorts (Mâsse et al., 2008). In this way, students at all levels of understanding of TD were intellectually supported, challenged, and socialized toward TD alongside their traditional disciplinary training. Given the supportive structure, it is not surprising that gains in collaborative behaviors were greater in the TD students than in the comparison groups. Since the TD training program represents significant investment in the students, future studies that can dismantle the effects of specific components of the TD program would be beneficial. However, the impact of TD training milieus and of building a culture of interdisciplinary learning may be difficult to parse.

Fig. 6 Examples of sketches and text at baseline, midpoint, and endpoint. Note. Figure 6 shows an example of student #2's sketches/text. Each sketch/text was coded by three independent raters on a scale of 1-3 on five key TD attributes as follows: 1 = not present, 2 = partially present or implied, and 3 = present and explicit. Mean scores for each sketch/text item were calculated and show descriptive change over time.

Growth and Patterns of TD Development
Of all the assessments, collaborative behaviors (BCCAI) showed the most change, and the increase was significant at each time point among the TD students. Scores on the 6Cs yielded the most "mixed" change (increase and decrease): scores increased from baseline to midpoint, then decreased from midpoint to the end of the program, which may suggest that TD students initially expand their TD thinking in the first 2 years, and then narrow their focus during the dissertation stage. To our knowledge, this is the first study to document developmental fluctuations in TD orientation among students over time. Interestingly, little increase or change was observed on the IPI score for any group, which assesses overall attitudes and receptivity to TD; however, the IPI score was high at baseline and remained high throughout. This suggests that attitudes toward TD may be more stable than TD behaviors, or there may have been a ceiling effect on IPI.
As shown in the patterns of overall TD development among all 10 students (Fig. 4), not all students responded in the same way to the TD program. While all overall TD scores were above the scale midpoint at baseline (range 3.7 to 4.9), four students showed continuous expansion in their TD scores throughout the program, while others leveled off in the second half of the program. This latter pattern is not surprising in TD development, where a phase of expansion followed by stabilization and focus may be beneficial. This would be consistent with the literature showing that TD development takes time (Hall et al., 2012) and requires both depth and breadth (Stokols et al., 2013); and that TD teams can benefit from having both generalists who bridge several disciplines and highly specialized members who provide in-depth knowledge in one field (Stokols et al., 2013). Another possible explanation is the variation in emphasis on breadth vs. specialization among different faculty advisors, which could influence student exposure and research behaviors.

Self-Selection Versus Training
The role of self-selection into TD programs cannot be ignored when attempting to assess the effectiveness of TD training. Team science findings suggest that the greatest gains in productivity occur when TD team members are open and willing to consider new ways of thinking and collaborative approaches to study problems (Nash, 2008), but it is unclear if these qualities are inherent or can be learned by trainees. While certain types of students may be drawn to TD training opportunities, these findings suggest that changes in the TD group were learned in the training program described here rather than simply due to self-selection. First, Table 1 shows strong similarities among all three groups at baseline on demographic characteristics. Second, analysis of variance of baseline scores on both standardized measures (IPI and BCCAI) showed no differences among any of the three groups. Third, baseline scores on the BCCAI were numerically lower among the TD students than the traditional students (Fig. 3), so self-selection of "TD-minded" students into the TD program was not observed in this sample. Fourth, comparison group findings suggest that those in the TD program showed greater collaborative behavior gains than their traditional counterparts. Thus, findings suggest that the observed growth in TD orientation was a function of the TD program rather than self-selection. Overall, findings support the concept that students can learn interdisciplinary competence (Hammons et al., 2020) and TD research orientation during graduate training with sufficient support, mentoring, and structure (Liechty et al., 2009).

Usefulness of Multiple Methods to Assess TD Development
In addition to two quantitative scales of TD attitudes and behaviors, this study drew upon open-ended questions and a novel application of a visual assessment tool to assess students' conceptualization, mental maps, and implicit understanding of TD research and teams over time. The addition of qualitative assessments complemented the standardized quantitative measures typically used (Misra et al., 2009, 2015); and the use of visual data, such as the sketch protocol, elicited non-verbal information about TD that quantitative measures would likely have missed (Burns & Kaufman, 1972; Gearhart et al., 2013). To our knowledge, this is the first time a visual tool has been used to assess student growth within TD training and research team evaluation. In addition, a coding process was developed and deemed to have acceptable inter-rater agreement. The visual tool and longitudinal data allowed us to begin to examine TD growth over time, which these data show is not always a linear path. This study's emphasis on process and development complements other TD evaluation research which has mostly focused on outcomes such as publications (Hall et al., 2012; Patel et al., 2021).

Attributes of TD
Conceptualization of the true attributes of TD research remains a work in progress. In the current study, thematic coding of the sketch protocol and open-ended short answer questions revealed five attributes of TD. In another study, a team of scholars used concept mapping to identify a comprehensive list of discipline-neutral TD topics pertaining to TD teams, methods, and conceptual frameworks (Falk-Krzesinski et al., 2011). Their analysis revealed seven clusters of topics pertaining to the Science of Team Science. These were (a) definitions and models of team science; (b) measurement and evaluation of team science; (c) team disciplinary dynamics; (d) team structure and content; (e) institutional support; and (f) team management and organization (Falk-Krzesinski et al., 2011). In the current study, which used multiple data sources to capture student development as TD scholars, findings highlight student awareness of six of these seven TD clusters of topics. The cluster that did not emerge clearly from the current study was team disciplinary dynamics. Additionally, although leadership is key to TD success (Stokols et al., 2013) and that topic would fall under the team structure cluster, none of the students depicted the role of a TD team leader; instead, they depicted highly egalitarian team participation visually and in text.

Limitations
This study had several limitations, including the small sample size of a doctoral training program, and risk of self-selection bias due to lack of random assignment to the type of educational program. Thus, findings cannot be generalized to all TD training programs and no causal inferences should be made. The standardized measures used tend to be skewed toward higher scores, adding to the potential for ceiling effects; and the measure of TD attitudes (IPI) did not appear to capture change over time in any group, so it was not particularly useful in a longitudinal design. Despite limitations, findings from this study fill a gap and make important contributions to understanding scholarly development and growth in conceptualization of TD over time among doctoral students using multiple methods.

Conclusion
Graduate training programs that teach TD and team science approaches to address grand challenges in health and society are growing in number and importance, and evaluation must keep pace. This study addressed several gaps in this effort. It used a prospective longitudinal design with comparison groups and employed multiple measures to assess graduate students' growth and scholarly development of TD orientation from enrollment to degree completion. Overall, findings suggest that longitudinal TD training offered as an add-on to usual degree requirements can foster greater engagement in collaborative research practices among TD students compared to their traditional doctoral student counterparts. Additionally, TD-trained students can be expected to demonstrate increasing sophistication in conceptualization and frameworks for TD research and team science from start to end of training. Further research and funding are needed to enable rigorous evaluation of TD graduate training programs that use longitudinal and prospective designs, comparison groups, and multiple methods to better understand the developmental process of becoming a TD scholar and team scientist, and strategies to foster such development. By building an evidence base for TD training, institutional investments can be better targeted toward the features and active ingredients of graduate TD training programs that can most effectively cultivate the next generation of diverse and innovative TD scholars and leaders.