The results presented in the preceding sections were primarily aimed at developing a test of professional competence of pre-service teachers specifically for teaching mathematical modelling. In a first step, the theoretical foundations were laid and, building on these, the structure of professional competence for teaching mathematical modelling was described. Subsequently, the relevant constructs were operationalised and examined empirically.

The results of the analyses of test quality are summarised below and situated within the current state of research in order to discuss possible explanations for the observed findings. To further delineate the scope of the results, key limitations of the test instrument in relation to the present study are addressed. The book concludes with an outlook in which possible implications of this study are derived for didactic research as well as for university teacher education.

The chosen approach to operationalising and empirically describing the structure of professional competence for teaching mathematical modelling addresses the question formulated by Blum (1995) about appropriate structural conceptualisations and empirical underpinnings of essential skills for teaching in application-related contexts. To this end, central constructs, in particular the area-specific competence, were first defined and then measured with sufficient quality using a test instrument. On this basis, a model validity check showed that the data in the field of modelling-specific pedagogical content knowledge can be described by one-dimensional Rasch models after the exclusion of some critical test tasks (see Sect. 4.2).
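
To illustrate what a one-dimensional Rasch model assumes, the following minimal sketch (with hypothetical item difficulties, not values from the study) shows the item response function, in which the probability of solving a task depends only on the difference between person ability and item difficulty:

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability of a correct response for a person
    with ability theta on an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical difficulties for four knowledge items
difficulties = [-1.0, 0.0, 0.5, 1.5]

# For a person of average ability, solution probabilities fall
# monotonically with item difficulty
probs = [round(rasch_prob(0.0, b), 2) for b in difficulties]
print(probs)  # → [0.73, 0.5, 0.38, 0.18]
```

Because only the difference theta − b enters the model, all items discriminate equally, which is precisely what the one-dimensionality checks in Sect. 4.2 put to the test.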

That aspects of professional competence, and in particular pedagogical content knowledge, can be empirically captured as a facet of professional knowledge was already demonstrated in the COACTIV (Kunter et al., 2013) and TEDS-M (Blömeke et al., 2014) studies. The present study adopts a different, more focused perspective, in that no facets of general pedagogical or psychological knowledge were measured. Moreover, the explanatory notes on the individual modelling-specific pedagogical content knowledge facets reveal additional restrictions that were not taken into account in the models in question (see also Sect. 2.4).

Nevertheless, the results of the present work build on the structural analyses of Klock et al. (2019) and Wess et al. (2021), which showed that the dimensional structure envisaged by Borromeo Ferri and Blum (2010), despite the theoretical constraints presented, is empirically separable, so that one-dimensional knowledge facets can be assumed. The one-dimensionality of the constructs under consideration is therefore in line with the theoretical, conceptual and substantive dimensions of the skills and abilities necessary to promote modelling competence among students and, in addition, with the reported homogeneity of subject-related knowledge facets from the aforementioned large-scale studies (Blömeke et al., 2014; Krauss et al., 2013).

Furthermore, beliefs/values/aims and motivational orientations in the form of self-efficacy expectations for mathematical modelling could be adequately captured, and significant interactions between these constructs could be demonstrated. In this sense, the professional competence to teach mathematical modelling can be considered a complex construct (see Sect. 2.1). In the modelling-specific design, beliefs can also be understood from a more transmissive or a more constructivist perspective (cf. Voss et al., 2013). The correlations established are therefore in line with the findings of Schwarz et al. (2008) and Kuntze and Zöttl (2008): both positively correlated constructivist beliefs and negatively correlated transmissive beliefs contribute to the description of beliefs about mathematical modelling.
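
The direction of these relationships can be sketched with a small example; the scale scores below are hypothetical and serve only to show how such Pearson correlations are computed:

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scale scores for five pre-service teachers
competence     = [0.2, 0.8, 1.1, 1.6, 2.0]
constructivist = [2.9, 3.4, 3.8, 4.1, 4.6]  # expected: positive r
transmissive   = [4.2, 3.9, 3.1, 2.8, 2.4]  # expected: negative r

print(round(pearson(competence, constructivist), 2))
print(round(pearson(competence, transmissive), 2))
```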

Before identifying further uses of the test presented here for didactic research and for university teacher education, some limitations of this study are addressed in order to delineate the scope of the results. While the objectivity of the test can be considered very good owing to the item types used, and its reliability can be considered acceptable to good on the basis of the studies conducted, the review of criterion validity in the field of professional competence for teaching mathematical modelling was primarily based on retrospective validity. Following the results of the COACTIV study (Krauss et al., 2008), the focus was primarily on the school-leaving examination grade as a criterion that is not indicative of the specific knowledge facets (see Sect. 4.3). These results replicate the correlations found by Krauss et al. (2008) and thus support the criterion validity of the instrument used. In addition, the possibility of verifying convergent validity in the present work is limited, since no comparable tests were accessible and therefore no instruments other than those described in Sect. 3.5 could be used. Correlations between the competencies considered and with the beliefs and self-efficacy expectations for mathematical modelling were thus calculated, also in line with the results of the COACTIV study (Krauss et al., 2013) (see Sect. 4.3). Significant correlations between the examined aspects were shown which, in almost all cases, are comparable with the COACTIV results in terms of magnitude and significance and thus contribute towards the convergent validity of the test designed. Only the correlations with constructivist beliefs were slightly weaker than in the reference study. However, it can be assumed that ceiling effects in this area prevented stronger correlations from emerging.
These ceiling effects also suggest that the response scales for beliefs (and, where appropriate, self-efficacy expectations) regarding mathematical modelling should be differentiated more finely in follow-up studies, for example by using a seven-point Likert scale instead of a five-point one. In addition to convergent validity, factorial validity is another form of construct validity. To ensure it, various models of professional competence for teaching mathematical modelling were identified in the framework of the structural analyses carried out and examined in replication studies (see Sect. 2.4.4).
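
A ceiling effect of this kind can be made visible with a simple descriptive check; the responses below are hypothetical and merely illustrate the diagnostic:

```python
def ceiling_share(responses, scale_max):
    """Share of respondents choosing the top category of a Likert scale;
    a large share indicates a ceiling effect that compresses variance."""
    return sum(r == scale_max for r in responses) / len(responses)

# Hypothetical responses to one belief item on a five-point scale
item_responses = [5, 5, 4, 5, 5, 3, 5, 4, 5, 5]
share = ceiling_share(item_responses, 5)
print(share)  # → 0.7, i.e. 70% of respondents at the scale maximum
```

When most respondents sit at the scale maximum, observed variance shrinks, which in turn attenuates any correlation with other constructs; a seven-point scale leaves more room to differentiate at the top.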

The use of valid Rasch models also ensured the existence of a sufficient statistic (Bühner, 2011), which provides the basis for a valid transformation rule for test score formation and thus fulfils the quality criterion of scaling. In the course of the model checks, however, some critical items emerged, both in the facet of knowledge about concepts, aims and perspectives and in the facet of knowledge about interventions, which must be discussed further.
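
The sufficiency of the raw score in the Rasch model can be sketched as follows: the maximum likelihood person estimate depends on the responses only through their sum, so two different response patterns with the same raw score yield the same estimate (the item difficulties here are hypothetical):

```python
import math

def rasch_prob(theta, b):
    """Rasch item response function."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def theta_mle(responses, difficulties, lo=-6.0, hi=6.0):
    """ML person estimate under the Rasch model via bisection on the
    score equation: raw score = expected score sum_i P_i(theta)."""
    score = sum(responses)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if sum(rasch_prob(mid, b) for b in difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

b = [-1.0, -0.5, 0.5, 1.0]
# Two different response patterns, both with raw score 2
t1 = theta_mle([1, 1, 0, 0], b)
t2 = theta_mle([0, 1, 0, 1], b)
print(abs(t1 - t2) < 1e-9)  # → True: only the raw score matters
```

Note that extreme raw scores (zero or the maximum) have no finite ML estimate, which is one motivation for the Weighted Likelihood Estimation discussed below.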

In light of the model structures confirmed by the model tests and item characteristics, it was also possible to determine reliability values indicating how accurately the person parameters (EAP or WLE estimators) could be measured (see Sect. 4.2). In this respect, it should be noted that, in all aspects of professional competence for teaching mathematical modelling, values were achieved that are sufficient for group comparisons and, in some cases, indicative of a good measuring instrument (Bühner, 2011; Ebel & Frisbie, 1991).

In view of the evaluation methodology, the probabilistic test theory used to scale the raw data can be considered of decisive importance. The chosen methodological approach is certainly not the easiest way to determine measurement accuracy and to verify the dimensionality of tests. However, it offers decisive advantages while at the same time reducing certain deficits of classical approaches, such as the dependence of item statistics on the sample (cf. van der Linden & Hambleton, 1997). Various methods for estimating the ability parameters were also discussed, ultimately settling on Weighted Likelihood Estimation. Although it yields estimates subject to measurement error, these are the best point estimates of a person's ability (Rost, 2004). They provided the basis for the analyses aimed at answering the question of the quality of the test instrument under consideration.
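
As a minimal sketch of the idea behind Weighted Likelihood Estimation (Warm's bias correction), the following code, again with hypothetical item difficulties rather than the estimation machinery actually used in the study, augments the Rasch score equation by the term J/(2I), where I is the test information and J its derivative; unlike the ML estimate, this remains finite even for extreme raw scores:

```python
import math

def rasch_prob(theta, b):
    """Rasch item response function."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def theta_wle(responses, difficulties, lo=-6.0, hi=6.0):
    """Warm's weighted likelihood estimate under the Rasch model:
    root of sum_i (x_i - P_i) + J/(2I), found by bisection."""
    score = sum(responses)
    def g(theta):
        ps = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1 - p) for p in ps)             # I(theta)
        j = sum(p * (1 - p) * (1 - 2 * p) for p in ps)  # I'(theta)
        return score - sum(ps) + j / (2 * info)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

b = [-1.0, -0.5, 0.5, 1.0]
# A perfect score has no finite ML estimate, but the WLE stays finite
print(round(theta_wle([1, 1, 1, 1], b), 2))
```

The correction corresponds to maximising the likelihood weighted by the square root of the test information, which removes the leading-order bias of the ML estimator and is why the WLE is preferred as a point estimate here.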