The effects of procedural and conceptual knowledge on visual learning

Even though past research suggests that visual learning may benefit from conceptual knowledge, current interventions for medical image evaluation often focus on procedural knowledge, mainly by teaching classification algorithms. We compared the efficacy of pure procedural knowledge (a three-point checklist for evaluating skin lesions) with that of combined procedural plus conceptual knowledge (the checklist plus histological explanations for each of the three points). All students then trained their classification skills with a visual learning resource that included images of two types of pigmented skin lesions: benign nevi and malignant melanomas. Both treatments produced significant and long-lasting effects on diagnostic accuracy in transfer tasks. However, only students in the combined procedural plus conceptual knowledge condition significantly improved their diagnostic performance in classifying lesions they had already seen in the pre- and post-tests. These findings suggest that the additional conceptual knowledge supported error-correction mechanisms.
Supplementary Information: The online version contains supplementary material available at 10.1007/s10459-023-10304-0.


A. Comparison of iterations
Table A.1 shows that all independent samples t-tests comparing the means of the 2021 and 2022 iterations yield p-values > 0.05. We performed these analyses before outlier removal because the outliers were identified in the data already pooled across both iterations.
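To illustrate the comparison summarised in Table A.1, the minimal sketch below computes an independent samples t statistic for one variable. It assumes Student's pooled-variance version (the appendix does not say whether the pooled or the Welch variant was used); the function name student_t_test and the scores are hypothetical. The p-value would then be read from a t distribution with the returned degrees of freedom; with SciPy installed, scipy.stats.ttest_ind(a, b) returns both the statistic and the p-value.

```python
import math
from statistics import mean, variance


def student_t_test(a, b):
    """Independent samples t-test with pooled variance (Student's version).

    Returns the t statistic and its degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    # statistics.variance uses the sample (n - 1) denominator
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se_diff
    return t, n1 + n2 - 2


# Hypothetical scores for one variable in each iteration (not study data)
scores_2021 = [0.60, 0.70, 0.65, 0.75]
scores_2022 = [0.62, 0.68, 0.66, 0.72]
t, df = student_t_test(scores_2021, scores_2022)
print(f"t({df}) = {t:.3f}")
```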

B. Outlier removal
Outliers were defined as values of the variable "duration of learning" above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR, where Q1 and Q3 are the first and third quartiles and IQR = Q3 − Q1 is the interquartile range. Ten participants met this criterion (7 from 2021 and 3 from 2022); hence, their data were excluded from further analyses.
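The exclusion rule above (often called Tukey's fences) can be sketched as follows, applied to durations pooled across both iterations. Quartiles can be computed in several ways and the appendix does not state which convention was used; the sketch assumes the "inclusive" method of Python's statistics.quantiles. The helper name split_outliers and the durations are hypothetical.

```python
from statistics import quantiles


def split_outliers(values, k=1.5):
    """Split values into (kept, removed) using the fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" is only one of several quartile conventions (assumption)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    removed = [v for v in values if v < lower or v > upper]
    return kept, removed


# Hypothetical learning durations in minutes, pooled across both iterations
durations = [20, 22, 23, 24, 25, 26, 27, 28, 90]
kept, removed = split_outliers(durations)
print(removed)
```

For these hypothetical durations, Q1 = 23 and Q3 = 27, so the fences lie at 17 and 33 and only the value 90 is flagged.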

C. Details on images used in the study
Table C.1 shows the details of the images used to design the skin lesion classification tasks in each of the study parts.
Table C.1: Overview of the images used in the different study parts
Notes: The numbers in the coloured boxes indicate the number of images (the number of images with a measure of confidence is given in parentheses). * The images in the three-point checklist learning activity are not represented in our image groups. ¹ Each image group consists of six images: three of benign lesions and three of malignant lesions (one easy, one medium and one difficult lesion per diagnosis).

D. Analysis of qualitative data on perceived helpfulness
The coding scheme focuses on two aspects: the extent of the perceived helpfulness and the underlying reasons (Table D.1). Table D.2 displays the interrater reliability we observed using this coding scheme.
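Table D.2 is not reproduced here, and the appendix does not name the reliability coefficient. For two observers who code the same answers into nominal categories such as "mentioned" and "not mentioned", Cohen's kappa is a common choice; the sketch below assumes that statistic purely for illustration. The function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters who coded the same items nominally."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal code frequencies
    expected = sum(c1[code] * c2[code] for code in set(c1) | set(c2)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for one category ("m" = mentioned, "n" = not mentioned)
rater_a = ["m", "m", "n", "n"]
rater_b = ["m", "n", "n", "n"]
print(cohens_kappa(rater_a, rater_b))
```

Here the raters agree on three of four answers (observed agreement 0.75), their marginal frequencies imply a chance agreement of 0.5, and kappa = (0.75 − 0.5) / (1 − 0.5) = 0.5.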

Systematics
  mentioned: The three-point checklist is described as systematic or as a guideline and therefore helpful (e.g. "Systemic approach", "structured", "analysis system", "guideline", "list", "framework", "conceptually", "bullet points").
  not mentioned: Systematics, as described above, is not mentioned.
Simplicity
  mentioned: Simplicity/complexity is mentioned as helpful (e.g. "easily explained", "easy to remember/apply").
  not mentioned: Simplicity, as described above, is not mentioned.
Criteria…
  mentioned: The criteria made it helpful (e.g. "It was very helpful to have specific criteria", "specific criteria", "single criteria" (e.g. network criteria)).
  Note: Criteria are coded as mentioned if participants refer to criteria in general at least once (e.g. "It was very helpful to have specific criteria to guide my judgement"). If they also describe which criterion exactly was helpful, the corresponding sub-category (asymmetry, atypical network or blue-white structures) is also coded as mentioned. Hence, whenever asymmetry, atypical network or blue-white structures are mentioned, "criteria" in general must also be coded as mentioned; conversely, "criteria" in general can be mentioned without further specification.
  not mentioned: Criteria, as described above, are not mentioned.
…Asymmetry
  mentioned: It was helpful because "asymmetry" is specifically mentioned/explained (e.g. "Somewhat helpful, especially the asymmetry").
  not mentioned: The criterion asymmetry, as described above, is not mentioned.
…Atypical network
  mentioned: It was helpful because "atypical network" is specifically mentioned/explained.
  not mentioned: The criterion atypical network, as described above, is not mentioned.
…Blue-white structures
  mentioned: It was helpful because "blue-white structures" are specifically mentioned/explained (e.g. "It was somewhat helpful, especially the blue-white veil").
  not mentioned: The criterion blue-white structures, as described above, is not mentioned.
Other reasons
  mentioned: Other reasons, such as subjectivity (e.g. not helpful because "criteria seems subjective", "the colour perception (blue vs bluish) is very subjective"), the amount of detail as helpful ("the rules are rather detailed"), not helpful "because the solution is already given", or "not helpful because you need experience/more training".
  not mentioned: Other reasons, as described above, are not mentioned.
Note: N = 2 observers and 49 pairs for all variables.

E. Details on incoming characteristics, performance in pre-test, duration of learning and gender
To check whether the study groups differed in relevant incoming characteristics, we asked the participants to rate three items on their prior knowledge and two items on their skin-lesion-related behaviour on a 5-point Likert scale ranging from "strongly disagree" to "strongly agree" (Appendices E.1 and E.2). We then calculated the mean value per participant for each of these two variables. We added these questions only after the immediate post-test to avoid raising participants' awareness of their experiences at the beginning of the study. Furthermore, to check whether the study groups differed in the duration of learning, we summed the time that each participant spent on the initial learning treatment and on the shared learning resource.
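The per-participant aggregation described above amounts to two item means and one sum. The sketch below assumes the Likert responses are coded 1 ("strongly disagree") to 5 ("strongly agree") and that times are recorded in minutes; both codings, the field names and the values are hypothetical.

```python
from statistics import mean

# Hypothetical responses coded 1 = "strongly disagree" ... 5 = "strongly agree"
participant = {
    "prior_knowledge_items": [2, 3, 1],  # three prior-knowledge items
    "behaviour_items": [4, 5],           # two skin-lesion-related behaviour items
    "minutes_treatment": 12.5,           # initial learning treatment
    "minutes_resource": 30.0,            # shared learning resource
}


def summarise(p):
    """Aggregate one participant's raw data into the compared variables."""
    return {
        "prior_knowledge": mean(p["prior_knowledge_items"]),
        "behaviour": mean(p["behaviour_items"]),
        "duration_of_learning": p["minutes_treatment"] + p["minutes_resource"],
    }


print(summarise(participant))
```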
Subsequently, we performed independent samples t-tests, which showed no significant differences between the two experimental groups in self-reported prior knowledge, skin-lesion-related behaviour, age, performance in the pre-test or duration of learning (Table E.1). Furthermore, a chi-square test revealed no significant difference in gender distribution between the two study groups (χ²(2, N = 115) = 1.071, p > .05; Table E.2).
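The chi-square statistic can be sketched as a Pearson test of independence on the study group × gender contingency table. With two study groups, df = (rows − 1)(columns − 1) = 2 would correspond to three gender categories; this reading, like the helper names, is an assumption, and the counts of Table E.2 are not reproduced here. For df = 2 the chi-square survival function has the closed form exp(−χ²/2), which lets the sketch compute a p-value with the standard library alone.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # assumes no empty row/column
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_df2(chi2):
    """Upper-tail p-value of the chi-square distribution with df = 2."""
    return math.exp(-chi2 / 2)


print(round(p_value_df2(1.071), 3))
```

Applied to the reported statistic, p_value_df2(1.071) is about 0.585, consistent with the non-significant result above.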

E.1 Prior knowledge
Thinking of yourself before you participated in this study, to what extent do you agree with the …

H. Covariates used in ANCOVAs to analyse long-term performance outcomes
Table H.1 shows the descriptive statistics of the accuracy in the pre-test and of the duration of learning for the subsample used in the ANCOVAs analysing long-term performance outcomes.

I. Long-term performance outcomes in retention tasks
Table I.1 and Table I.2 contain the descriptive statistics and the results of the analyses of covariance regarding long-term performance in retention tasks.

J. Long-term performance outcomes in transfer tasks
Table J.1 and Table J.2 contain the descriptive statistics and the results of the analyses of covariance regarding long-term performance in transfer tasks.

Figure L.1: Development of performance in new tasks, shown separately for easy, medium and difficult tasks. Note: Significant differences are not flagged for reasons of space. Error bars represent standard errors.

Figure M.1: Reasons for helpfulness in active tasks

Table A.1: Descriptive statistics and results of independent samples t-tests comparing the 2021 and 2022 iterations

Table D.1: Coding scheme for the open question on perceived helpfulness

F. Short-term performance outcomes in retention tasks
Table F.1 and Table F.2 contain the descriptive statistics and the results of the analyses of covariance regarding short-term performance in retention tasks.

Table F.2: Analyses of covariance for short-term performance in retention tasks

G. Short-term performance outcomes in transfer tasks
Table G.1 and Table G.2 contain the descriptive statistics and the results of the analyses of covariance regarding the short-term performance in transfer tasks.
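The ANCOVAs in these appendices compare the two groups while adjusting for covariates; Appendix H lists pre-test accuracy and duration of learning as the covariates for the long-term analyses. The minimal sketch below assumes a two-level group factor, no group × covariate interaction and a full-rank design, and compares the residual sums of squares of a model with and without the group indicator. For a single two-level factor without interaction terms, this nested-model comparison gives the usual ANCOVA F test for the group effect. The helper names and data are hypothetical and this is not the authors' analysis script; in practice, statsmodels (statsmodels.formula.api.ols followed by statsmodels.stats.anova.anova_lm) would be a typical choice.

```python
def ols_fit(X, y):
    """Least squares via the normal equations (X'X) b = X'y.

    Uses Gauss-Jordan elimination with partial pivoting; assumes X has full
    column rank. Returns (coefficients, residual sum of squares).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    sse = sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y)
    )
    return beta, sse


def ancova_group_test(y, group, covariates):
    """F test for a 0/1 group indicator, adjusting for the given covariates.

    Compares the full model (intercept, group, covariates) with the reduced
    model (intercept, covariates); no group x covariate interaction is fitted.
    Returns (F, (df_effect, df_error), adjusted group difference).
    """
    full = [[1.0, float(g), *c] for g, c in zip(group, covariates)]
    reduced = [[1.0, *c] for c in covariates]
    beta, sse_full = ols_fit(full, y)
    _, sse_reduced = ols_fit(reduced, y)
    df_error = len(y) - len(full[0])
    f = (sse_reduced - sse_full) / (sse_full / df_error)
    return f, (1, df_error), beta[1]


# Hypothetical data: outcome, group (0/1) and one covariate per participant
y = [1.6, 1.9, 4.6, 5.4, 3.1, 4.0]
group = [0, 0, 1, 1, 0, 1]
covariates = [[1.0], [2.0], [3.0], [5.0], [4.0], [2.0]]
print(ancova_group_test(y, group, covariates))
```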

Table I.2: Analyses of covariance for long-term performance in retention tasks

Table J.2: Analyses of covariance for long-term performance in transfer tasks

For detailed results on the development of the participants' diagnostic performance in track tasks from pre-test to delayed post-test, please consult Tables K.1, K.2, K.3, K.4, K.5 and K.6 and Figure K.1.

Table K.1: Descriptive statistics for participants' performance in track tasks in each of the four tests

Figure K.1: Development of performance in track tasks, shown separately for easy, medium and difficult tasks. Note: Significant differences are not flagged for reasons of space. Error bars represent standard errors.

For detailed results on the development of the participants' diagnostic performance in new tasks from pre-test to delayed post-test, please consult Tables L.1, L.2, L.3, L.4, L.5 and L.6 and Figure L.1.
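The error bars in Figures K.1 and L.1 represent standard errors. A minimal sketch, assuming they are standard errors of the mean of participants' accuracies per group and test (the appendix does not spell out the unit), with hypothetical values:

```python
import math
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample SD (n - 1 denominator) / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    return stdev(values) / math.sqrt(len(values))


# Hypothetical accuracies of one group in one test (not study data)
accuracies = [0.55, 0.60, 0.70, 0.75]
print(round(standard_error(accuracies), 4))
```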

Table L.1: Descriptive statistics for participants' performance in new tasks in each of the four tests