Background

As professional identity formation (PIF) of medical trainees is now a great concern to medical educators [1,2,3,4,5,6], evaluation of PIF, especially socialization and professional value formation, has become an important issue for medical education [7,8,9,10,11,12,13,14,15].

Building on theories of human development by Piaget, Kohlberg, Loevinger, Maslow, McClelland, Murray, Erikson and others, Kegan proposed a lifelong developmental framework in which the self develops into a moral and meaning-making entity [16]. Kegan illustrated changes in individual sense, perspectives, values, emotional control and reflection as part of the development of relationships with others and society. Kegan’s model represents a 6-stage helical pathway of evolutionary truces (stages 0 to 5) along which people alternate, in a reciprocal psychological process, between favoring inclusion with others and favoring independence from others as they develop (Fig. 1).

Fig. 1

A modified Kegan’s helix of evolutionary truces (- - ▶) [reference 16], and expected scale direction and areas of evaluation. SAS: stage-specific attribute scale; SAS-2: stage 2-specific attribute scale; SAS-3: stage 3-specific attribute scale; SAS-4: stage 4-specific attribute scale; SAS-h: stage 4 and higher-specific attribute scale

Previous qualitative research indicated that Kegan’s model [16] can explain the development process in dentistry [1], the military [17, 18], and the legal profession [19], and theoretically, the model can also be applied to medical trainees [7, 13]. Prior work suggests that medical trainees are expected to be in stages 2 to 4 [4, 7, 13]. Based on these studies, a single scale, the Developing Scale (DS), was developed to evaluate the overall degree of personal and professional development [20]. This scale evaluates self-control as a professional, awareness of being a medical doctor, reflection as a medical doctor, execution of social responsibility, and external and internal self-harmonization.

Even though the DS could be a useful scale for evaluating PIF, a single scale cannot satisfactorily represent the helical and complex process of PIF. Attributes that characterize people at different developmental phases should therefore be evaluated independently when determining the PIF of individuals and target groups.

The purpose of this study was to develop scales that evaluate Kegan’s stage-specific attributes and to use them to reveal medical trainees’ individual PIF as well as group diversity.

Method

To illustrate the PIF of target individuals and groups, four scales covering different stages of Kegan’s model were developed (Fig. 1). Two assumptions were made: 1) evaluating the specific attributes of Kegan’s stages 2 to 5 would cover the entire process of PIF in medical trainees [4, 7, 13]; and 2) lower stage-specific attributes decrease and higher stage-specific attributes increase as education and clinical experience advance.

Stage-specific attribute scales (SASs) were developed simultaneously along with the DS as follows: 1) an initial item pool with items that cover attributes from Kegan’s stage 2 to 5 (referencing previously reported manifestations of people at stage 2, 3, and 4 in the context of medical training and practice) was created; 2) a pilot questionnaire with essential and common items with medical context using items selected from the initial item pool was created; 3) the pilot questionnaire was administered to medical students, residents, and experienced medical doctors; 4) respondent data from the pilot questionnaire were used to elucidate item sets for proposed SASs using a reliability coefficient, and 5) means of proposed item sets from the different respondent groups were compared to confirm proposed SAS scores indicating development. Following this procedure, four SASs were developed.

Initial item development

To create items for the four SASs, descriptions of medical trainees’ personal characteristics and behaviors or attitudes manifested in a professional context cited in previous studies [13, 17] were used.

To assess stage 2-specific attitudes and behaviors (stage 2-specific attribute scale: SAS-2), items were used that described an individual who took into account the views of others but whose own needs and interests predominated, whose norms were external rules, whose self-reflection was low, and whose emotions could overwhelm reason. To assess the preference for inclusion typically seen at stage 3 (stage 3-specific attribute scale: SAS-3), items were used that described an individual who was able to view multiple perspectives simultaneously and subordinate self-interest and who was concerned about how others perceived him/her. To assess the preference for independence typically seen at stage 4 (stage 4-specific attribute scale: SAS-4), items were used that described an individual who could assume a role and enter into relationships while assessing them in terms of self-authored principles and standards and who could define him/herself independently of others. To assess attributes expected at stage 4 or higher (stage 4 or higher-specific attribute scale: SAS-h), items were used that described an individual who clearly recognized professional roles; whose reason was in full control over needs, desires, and passion; and who did not perceive him/herself as having a single identity and was open to other influences.

After creating and rewriting the items, 31 items to be used for the next round of data collection were selected. They consisted of 11 items for SAS-2, eight items for SAS-3, five items for SAS-4, and seven items for SAS-h. Of these SAS candidate items, 23 items for SAS-2, 4, and h were also used as DS candidates. Fifteen (items 1–15) satisfied the DS criteria and were used in the DS [20].

The questionnaire was self-administered and anonymous. Each item was scored on a 7-point Likert scale ranging from 1 (completely inapplicable) to 7 (greatly applicable), with 4 as neutral. The questionnaire also asked about demographic characteristics (gender, age) and, for instructors, work experience and position.

Data collection

From July 2016 to March 2018, the printed questionnaire was distributed by hand to 4th-year medical students about to start their clinical clerkship courses, 6th-year medical students who had finished all 1.5 years of clinical clerkship courses, and residents in the last month of the 2-year residency program at Kagoshima University. The author did not have a direct relationship (i.e., instructor or supervisor) with any of the medical students and residents participating in this study at the time of data collection. The questionnaire was also distributed by mail to experienced medical doctors working in community hospitals or private clinics in Kagoshima Prefecture who engaged in undergraduate medical education as senior instructors. Questionnaires were anonymous and were returned by postal mail in January 2017 using the return envelope provided with the questionnaire.

Data analysis for scale development

To develop each SAS, the reliability (Cronbach’s alpha) of candidate items was analyzed and reliable item sets were explored.
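
For illustration, Cronbach’s alpha for a candidate item set can be sketched as follows. This is a minimal Python sketch, not the SPSS procedure used in this study; the function names `cronbach_alpha` and `alpha_if_dropped` and the toy data are assumptions introduced here, and all items are assumed to be already coded in the same direction. The second function shows one common way to explore item sets (recomputing alpha with each item removed in turn); the search procedure actually used is not specified in this section.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_dropped(rows):
    """Alpha recomputed with each item removed in turn (needs >= 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


# Hypothetical toy data: three respondents, three items on a 1-7 scale.
toy = [[1, 1, 3], [2, 2, 1], [3, 3, 2]]
alpha_all = cronbach_alpha(toy)
alpha_without = alpha_if_dropped(toy)
```

In this toy data the first two items move together while the third does not, so dropping the third item raises alpha; with real questionnaire data the same comparison would be one way to decide which candidate items to keep.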

After the item sets for all scales were fixed, the average SAS scores of the four respondent groups were compared to confirm whether the lower stage-specific and higher stage-specific attribute scales could differentiate between groups at different levels of development. Furthermore, the information about medical trainees’ PIF that the SASs could provide, such as the stage of the respondent groups and the diversity among groups, was analyzed.

SPSS version 23 (IBM, New York, NY) was used for all data analyses.

Results

Demographic characteristics of the respondents

The same data were used as for the DS development [20]. Prior to the analysis, 14 respondents who chose option 4 (neutral) as the response for 27 items or more (87%) or for 23 sequential items (74%) were excluded as invalid data. Data for a total of 322 respondents (response rate 53.7%), including 118 (response rate 47.8%) 4th-year medical students and 120 (response rate 51.5%) 6th-year medical students at Kagoshima University School of Medicine, 47 (response rate 73.4%) 2nd-year residents at Kagoshima University Hospital, and 37 (response rate 66.1%) medical doctors at community hospitals and private clinics who served as instructors for medical students were included in this research. The mean ages of 4th-year medical students, 6th-year medical students, residents, and instructors were 24.2, 25.4, 29.7, and 55.2 years, respectively. The mean length of clinical experience among instructors was 29.3 years (standard deviation 6.2 years, range 15–40 years).
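
The exclusion rule described above can be sketched as a simple screening function. This is a hypothetical Python illustration rather than the procedure actually run; the names `longest_neutral_run` and `is_invalid` are assumptions, and "23 sequential items" is read here as a run of 23 or more consecutive neutral responses.

```python
NEUTRAL = 4      # midpoint of the 7-point Likert scale
N_ITEMS = 31     # number of items in the pilot questionnaire


def longest_neutral_run(responses):
    """Length of the longest run of consecutive neutral responses."""
    best = run = 0
    for r in responses:
        run = run + 1 if r == NEUTRAL else 0
        best = max(best, run)
    return best


def is_invalid(responses, max_neutral=27, max_run=23):
    """True if neutral was chosen for >= 27 items or >= 23 items in a row."""
    too_many = responses.count(NEUTRAL) >= max_neutral
    too_long = longest_neutral_run(responses) >= max_run
    return too_many or too_long


# Hypothetical respondent: 23 neutral answers in a row, then varied answers.
example = [NEUTRAL] * 23 + [1, 2, 3, 5, 6, 7, 2, 5]
flagged = is_invalid(example)
```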

Development of SASs

Using the items related to key attributes of each stage, item sets with the highest Cronbach’s alpha for SASs were explored. Cronbach’s alpha for the proposed SAS-2, SAS-3, SAS-4, and SAS-h were 0.66 (11 items), 0.53 (six items), 0.61 (three items), and 0.63 (six items), respectively (Table 1, Additional file 1). Items 1 to 14 were identical to the DS items, and the coding direction differed depending on the scales.
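
Because the same item can contribute to different scales with opposite coding directions, scoring can be sketched as below. This Python sketch assumes that a scale score is the mean of its item scores on the 1 to 7 range and that a reverse-coded item is scored as 8 minus the response; the function name `scale_score`, the item numbers, and the choice of which items are reversed are hypothetical and are not taken from Table 1.

```python
def scale_score(responses, items, reversed_items=frozenset()):
    """Mean item score over `items`, flipping reverse-coded items (x -> 8 - x).

    responses: dict mapping item number to a 1-7 Likert response.
    """
    scored = [
        8 - responses[i] if i in reversed_items else responses[i]
        for i in items
    ]
    return sum(scored) / len(scored)


# Hypothetical respondent and item assignments, for illustration only.
responses = {1: 2, 2: 6, 3: 7}
as_answered = scale_score(responses, [1, 2, 3])
all_reversed = scale_score(responses, [1, 2, 3], reversed_items={1, 2, 3})
```

Under these assumptions, reversing every item mirrors the score around the neutral value of 4, so a mean of 5 becomes a mean of 3.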

Table 1 Items used for the four SASs, and mean item scores for each respondent group

Confirmation of SASs by comparing respondent groups

Table 2 shows average scores of proposed SASs as well as total DS scores of the 15 items for each respondent group.

Table 2 Mean scores of DS and the four SASs for each respondent group

Medical students’ SAS-2 mean scores were higher than those of residents and instructors, and instructors’ SAS-4 and SAS-h mean scores were higher than those of students and residents. Instructors had increasing scores from the lowest stage (SAS-2) to highest stage (SAS-h).

Univariate analysis of variance of each SAS score indicated that respondent group was a significant variable for SAS-2 score and SAS-h score (SAS-2 score, p = 0.03; SAS-3 score, p = 0.70; SAS-4 score, p = 0.37; SAS-h score, p < 0.01) whereas gender was not a significant variable in any of the four SASs (SAS-2 score, p = 0.10; SAS-3 score, p = 0.07; SAS-4 score, p = 0.33; SAS-h score, p = 0.87).
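
As a rough illustration of the group comparison, the F statistic of a one-way analysis of variance can be computed as below. This Python sketch is a simplification and an assumption: it treats respondent group as the only factor, whereas the analysis reported here was run in SPSS and also examined gender. The function name `one_way_f` and the toy scores are hypothetical, and no p value is computed.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of scale scores per respondent group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three small groups.
f_stat, df_b, df_w = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```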

Confirmation of SASs using DS score

Since the SASs and DS utilized the same items, SAS-2 scores should negatively correlate and SAS-h scores should positively correlate with DS scores.

To analyze whether all four SAS scores were related to the DS, which evaluates overall maturation and socialization related to PIF, as theoretically expected, I examined SAS scores in the five DS score classifications (Fig. 2). Among respondents with DS scores of 54 or less (n = 7), the SAS-2 mean score was the highest among the four SASs, whereas the SAS-4 and SAS-h mean scores were both lower than 4 (i.e. neutral). Among respondents with DS scores of 85 or more (n = 21), the SAS-4 and SAS-h mean scores were higher than 5.5, SAS-h was the highest, and SAS-2 was the lowest among the four SASs. As DS score increased from 55 to 84, the SAS scores transitioned from a pattern of high SAS-3 mean score to a pattern of high SAS-4 and SAS-h mean scores and low SAS-2 mean score.

Fig. 2

Mean and standard deviation of SAS scores in five DS score classifications and number of respondents

DS: developing scale; SAS: stage-specific attribute scale; SAS-2: stage 2-specific attribute scale; SAS-3: stage 3-specific attribute scale; SAS-4: stage 4-specific attribute scale; SAS-h: stage 4 and higher-specific attribute scale

Comparison of individual SAS scores

To clarify the scale function at the individual level and the appropriateness of applying the scales to groups, I analyzed individual scores for SASs with the same DS scores. Figure 3 shows the four SAS scores of respondents whose DS scores were 60, 65, 70, 75, 80, 85, or 90. If there were three or more respondents with these DS scores in a respondent group, two respondents from that group were randomly selected.

Fig. 3

SAS scores of respondents with DS scores of 60, 65, 70, 75, 80, 85, and 90. If there were three or more respondents with these DS scores in each respondent group, two respondents from each group were randomly selected

DS: developing scale; SAS: stage-specific attribute scale; SAS-2: stage 2-specific attribute scale; SAS-3: stage 3-specific attribute scale; SAS-4: stage 4-specific attribute scale; SAS-h: stage 4 and higher-specific attribute scale

For example, among a total of 16 respondents with a DS score of 70 (four 4th-year medical students, eight 6th-year medical students, three residents, and one instructor), seven respondents’ highest score was in SAS-3 (e.g. ID = 249, 49, and 312), three respondents’ highest score was in SAS-4 (e.g. ID = 214, 183, and 58) and three respondents’ highest score was in SAS-h (e.g. ID = 295) (Fig. 3).

The scores and patterns of SASs varied among individuals with the same DS score, even among respondents in the same group. Independent of DS score, some respondents showed high SAS-3 and SAS-h scores with low SAS-4 scores (inclusion pattern; e.g. ID = 117, 266, 98, 195, and 94), while others showed high SAS-4 scores with low SAS-3 and SAS-h scores (independent pattern; e.g. ID = 214, 163, and 252).

The individual scores also showed the tendencies indicated by the group mean scores. Many respondents with DS scores of 60 or 65 had their highest score on SAS-3, and most respondents with a DS score of 75 or higher had their highest score on SAS-4 or SAS-h.
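
The individual-level comparison above amounts to finding which SAS is highest for each respondent and then counting the patterns. A minimal Python sketch follows; the function name `highest_sas` and the example scores are hypothetical, and returning all tied scales is an assumption, since the handling of ties is not described.

```python
from collections import Counter


def highest_sas(scores):
    """Return the name(s) of the scale(s) with the highest score (ties kept)."""
    top = max(scores.values())
    return [name for name, value in scores.items() if value == top]


# Hypothetical respondents with the same DS score but different SAS profiles.
respondents = {
    "A": {"SAS-2": 3.1, "SAS-3": 5.5, "SAS-4": 4.0, "SAS-h": 5.0},
    "B": {"SAS-2": 2.8, "SAS-3": 4.2, "SAS-4": 6.0, "SAS-h": 4.5},
    "C": {"SAS-2": 3.0, "SAS-3": 5.0, "SAS-4": 4.0, "SAS-h": 5.0},
}

# Count how often each scale is the highest across respondents.
tally = Counter(name for s in respondents.values() for name in highest_sas(s))
```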

Discussion

Previous research [7, 13] indicated that Kegan’s model could explain individual medical trainees’ personal and professional development. One advantage of using this model as a conceptual framework for scale development is its applicability to individuals in any program and specialty, or in any position, because it describes general lifelong human development in relationships with others and society [16]. However, because the model proposes a helical and reciprocal process, a single scale score might not adequately represent the complex and divergent pathway of professional development and might not provide meaningful insight into the PIF of an individual or group. To respond to this issue, I propose four SASs, in addition to the DS, provided in one self-administered questionnaire with a total of 27 items.

Comparing the respondent groups revealed that SAS-2 score decreased and SAS-h score increased as training advanced from students to residents and even more to instructors. This result suggested that SAS-2 and SAS-h might be able to differentiate respondents at low or high stages even though the reliability coefficients were not high.

On the other hand, the stage 3-specific inclusion preference, such as “rely on others”, and the stage 4-specific independence preference, such as “behave according to own values”, are opposite attributes. As a result, group mean scores of SAS-3 and SAS-4 might reflect the ratio of group members at stage 3 and stage 4, might be influenced by a few respondents with extreme attributes, and might indicate neutral characteristics. In fact, individual score analysis indicated that the same respondent group included respondents suspected to have different attributes as well as individuals at different stages. Even though this limitation existed in the group comparison, the mean scores of SAS-3 and SAS-4 in each group suggested that students might have the inclusion preference typically seen at stage 3 more than the independence preference, and that instructors might have the independence preference typically seen at stage 4 more than the inclusion preference.

Individual values and attitudes are complex, and there are transition phases between stages. Each SAS did not correspond exactly to one of Kegan’s stages, but did indicate the tendency of key attributes related to staging. The information provided by the SASs is useful for discussing PIF, and indicates that the four SASs are valid scales for this purpose.

People with low DS scores might be expected to be at a lower stage, and people with high DS scores should be at higher stages. As shown in Fig. 2, respondents with DS scores of less than 55 were expected to be at stage 2. Respondents with DS scores of 75 or higher were expected to be at stage 4 or 5.

The results of this research indicated that the average medical student in this study was at stage 3 and that students ranged from stage 2 to 4. The average instructor in this study was at stage 4 or higher, and few instructors were at stage 2. These results were compatible with theoretical hypotheses on medical trainees [7, 13], and a qualitative interview analysis of law and dental students (from stage 2 to stage 4) [19].

Even though each of the scales developed in this study requires greater validity and higher reliability, the combination of scale scores could be an indicator of PIF that knowledge and skill assessment or behavior observation cannot provide. Analysis of each respondent’s scale scores, which represent individual attitudes and values and actual behaviors, should be investigated in the future.

Limitations

The SASs had low reliability because of the limited number of items under the practical restrictions of the questionnaire compared to the complex and divergent process of PIF. I could not conclude whether the broad score range and differences among the four SAS scores were due to characteristics of the respondents’ PIF or error in an unreliable scale.

All items were written in Japanese and all respondents were located in Kagoshima, Japan. Long-term prospective studies and research in other locations are required to confirm scale sensitivity and improve usability.

Conclusions

Multiple scales evaluating different developmental stage-specific attributes, combined with one scale evaluating degree of maturation and socialization, might provide meaningful information about individual and group PIF. Young medical trainees, such as medical students and residents, were in the process of PIF.