Teachers and trainers in adult education–Investigating the dimensionality of their knowledge about methods and concepts of teaching and learning

Knowledge about methods and concepts of teaching and learning is considered an important aspect of the pedagogical-psychological knowledge (PPK) of teachers in adult and continuing education (ACE) for designing helpful situations of teaching and learning. The article poses the theoretical-conceptual and empirical question of the dimensionality of this knowledge using a sample of N = 212 ACE teachers. The results point to the multidimensionality of this knowledge in the ACE field. The findings are critically discussed in terms of their robustness and scope, as well as possibilities for the further development of the test.

Only about one-third of teachers working in ACE have a major or minor degree in educational sciences (Autorengruppe wb-personalmonitor 2016).
Against this double background (the expectations of effectiveness of ACE on the one hand and the wide range of teachers with diverse professional backgrounds on the other), it is understandable that in recent years there has been an increased interest in training and development as well as in assessing the competencies of teaching staff, e.g., with a view to recruiting teachers (Goeze and Schneider 2014). It is no coincidence that numerous initiatives originate from practice or are implemented in close cooperation with practice. For example, standardized surveys, such as those on training needs, or evaluations of the advice literature are used to gain insights into the existing and desired competencies of teachers (Hippel and Tippelt 2009; Schöb et al. 2016; Strauch et al. 2021). Instruments and procedures are being developed to make visible and recognize non-formally or informally acquired competencies (Schläfli and Sgier 2008; Steiner 2010; Strauch et al. 2020; Vinepac-Project 2008). Finally, digital learning platforms have been (www.videofallarbeit.de; www.wb-web.de) and are being (Projekt EULE¹ or KUPPEL²) developed for initial and in-service training for ACE teachers. These address, among other things, the pedagogical-psychological knowledge (PPK) of teachers.
It is striking, however, how slowly the development of empirically based instruments to assess the professional competence of ACE teachers has begun. So far, the focus has been on facets of PPK (Rohs et al. 2017). Instruments (e.g., for self-assessment, trainer selection, or informed choice/assignment of teachers to online learning paths on learning platforms for initial and further training) should be able to capture PPK in its conceptual breadth (Schöb et al. 2016). Researchers also lack a test that captures the PPK of ACE teachers in a comprehensive and differentiated way. Only such a test would make it possible, for example, to verify the widespread assumption in practice that the more experience a teacher has, the more successful he or she will be in implementing continuing education courses. Success in ACE (unlike in school) not only implies the learning success of the participants, but also, for example, the subjective fulfillment of their benefit expectations, the solution of action problems faced by the participants or the clients outside of the training, or also the continuation of the training activities.

¹ EULE (Development of a Web-based Learning Environment for Continuing Education, Acquisition of Competencies, and Professionalization of Adult Education Teachers) is a BMBF-funded research and development project conducted by the German Institute for Adult Education-Leibniz Center for Lifelong Learning (DIE) and the University of Tübingen from 2016 to 2020.

² In the KUPPEL project (AI-supported Cross-platform Professionalization of Adult Educators), which is led by the German Adult Education Association (dvv), the University of Tübingen is currently working together with the DIE, Didactic Innovations GmbH, the Fraunhofer Institute for Applied Information Technology, and the German Research Center for Artificial Intelligence. The aim of the project is to develop a hybrid cloud for the provision of an AI-supported, individualized learning offering on distributed platforms.

This paper continues the development and validation of a test for the assessment of facets of PPK that was started in the project ThinK³ (Voss et al. 2017) and focuses on knowledge about methods and concepts of teaching and learning. This knowledge is assumed to be particularly relevant for designing situations of teaching and learning. Since methods and concepts of teaching and learning for different goals and phases within learning processes exist in large numbers, the question arises to what extent a unidimensional conceptualization seems adequate ("knowledge of methods") or whether theoretical-conceptual considerations and empirical findings argue for a multidimensional conceptualization and measurement ("knowledge of learning processes, communication and interaction with learners, leading groups ..."). Thus, the question of the uni- or multidimensional conceptualizability and measurement of knowledge about methods and concepts of teaching and learning is the subject of this paper. The aim is to gain hints for the further development of the test from the project ThinK.

Research and development on the pedagogical-psychological knowledge of teachers in adult and continuing education
PPK is one aspect of teachers' professional knowledge, alongside content knowledge (CK) and pedagogical content knowledge (PCK). In German adult education research, it is pointed out that in addition to this basic "body of knowledge" (Tietgens 1988, p. 37), there is also a need for both declarative and procedural experiential or professional knowledge (Dewe et al. 2002). "Professional knowledge" in ACE implies a particular form of experiential knowledge that refers to the specific norms, acculturated standards, or routinized practices of the professional field for learning with adults in the context of organized continuing education. Unifying these approaches, PPK is defined here as knowledge for designing and improving situations of teaching and learning in and across different subjects and educational fields. The definition includes declarative and procedural knowledge, as well as experiential knowledge acquired during a professional career (Marx et al. 2014; Voss et al. 2011, 2015). The COACTIV competency model, which was developed for the school context and has been empirically proven there, is increasingly being adapted for the field of ACE as well (Rohs et al. 2017; Strauch et al. 2021), because it addresses generic areas of teachers' professional competence that also seem relevant for ACE. While the model has stimulated empirical research in the school setting, comparable studies in ACE are still lacking. Although the PPK to be applied in concrete situations of teaching and learning is regularly considered in competence models in different conceptualizations (Ziep 1990), most empirical approaches have so far been limited to instruments for self-assessment and/or peer assessment (Vinepac-Project 2008).
In recent years, there has been a growing interest in researching concrete situations of teaching and learning in ACE, their conditions for success, and thus the professional competencies of teachers (Kraft et al. 2009; Lattke and Jütte 2014; Vinepac-Project 2008). Thus, studies have been dedicated to the description of individual professionalization processes, didactical acts, and the metacognitive competencies of teaching staff. Maier-Gutheil (2012) used biographical case studies to identify differentiated processes of the formation of professionalism. Bastian (1997) created qualitative competence profiles of course instructors, to each of which specific knowledge foci and topic profiles were related. Several studies have been devoted to the planning activities of teachers: Hof (2001), for example, asked about the connection between teachers' subjective understanding of knowledge and their teaching concepts, while Stanik (2016) inquired into the decision fields for and the factors influencing microdidactic planning; Haberzeth (2010) identified the planning logics of teachers' actions according to which content is selected for their own courses, and Pachner (2013) investigated teachers' competence in self-reflection. However, these studies did not make use of any competency-theoretical modeling and operationalization of PPK. The instruments developed so far to measure (facets of) PPK were conceptualized for the school context (Beck et al. 2008; König and Blömeke 2009; Lenske et al. 2015; Linninger et al. 2015; Seifert et al. 2009; Voss et al. 2011), not for ACE.
The ThinK project follows on from the COACTIV competence model and the work of Voss et al. (2011) on PPK, with the aim of developing a test to measure PPK for ACE teachers as well. The measurement theory approach used in the ThinK project is Item Response Theory (IRT). A prominent model of IRT is the Rasch model (Rasch 1960). Using the specific objectivity of comparisons of test scores and item scores-one of the central model assumptions of the Rasch model (Rasch 1960)-the importance of conceptualizing the dimensionality of the PPK can be illustrated: Specific objectivity exists when statements about people's ability do not depend on which task they are being compared against. For this purpose, the following example is given: A task A is easier to solve than a task B. The tasks measure the same construct (knowledge domain). Person X has a lower ability than person Y. The probability of solving task A and B is lower for person X than for person Y. Without empirical dimensionality testing of the theoretical ideas about the conceptualization of PPK, it could happen that a test leads to wrong conclusions (e.g., Rost 2004;Strobl 2012)-which in turn can lead to poor decisions in educational practice.
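The worked example can be made explicit with the item response function of the Rasch model; the notation below (θ for person ability, σ for item difficulty, persons p and q) is chosen here for illustration and follows common usage:

```latex
P(X_{pi} = 1 \mid \theta_p, \sigma_i)
  = \frac{\exp(\theta_p - \sigma_i)}{1 + \exp(\theta_p - \sigma_i)}
```

Comparing two persons p and q on any item i in log-odds terms,

```latex
\ln\frac{P(X_{pi}=1)}{1 - P(X_{pi}=1)}
  - \ln\frac{P(X_{qi}=1)}{1 - P(X_{qi}=1)}
  = \theta_p - \theta_q ,
```

the item difficulty σ_i cancels: the person comparison is the same regardless of which task it is based on, which is exactly the specific objectivity described above. This cancellation holds only if all items measure the same unidimensional construct, which is why the dimensionality assumptions must be tested empirically.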
In the ThinK project, PPK is conceptualized across disciplines and educational domains and dimensionalized into eight knowledge facets (Marx et al. 2014; Voss et al. 2015):

1. knowledge about learning processes of learners
2. knowledge about heterogeneity of learners
3. knowledge about methods and concepts of teaching and learning
4. knowledge about objectives of teaching and learning
5. knowledge about classroom/courseroom management
6. knowledge about communication and interaction with learners
7. knowledge about the design of learning environments
8. knowledge about the diagnostics of individuals and their learning processes

The conceptualization was developed following Voss et al. (2011) and is based on a systematic literature review with content analysis of relevant sources and an expert survey. Thus, this conceptualization is compliant with the assumption of the multidimensionality of PPK (Voss et al. 2015). This conceptualization is extended by describing the eight facets using a total of 30 subfacets. The subfacets indicate a possible multidimensionality within the eight facets that could be of practical importance for the purposes addressed above (e.g., for the use of the test in the context of a learning platform).
With this in mind, the following chapter will address the question of how a unidimensional, a multidimensional, and a hierarchical conceptualization can be justified for the knowledge about methods and concepts of teaching and learning.

Unidimensional, multidimensional, and hierarchical conceptualizations of knowledge about methods and concepts of teaching and learning
Methods of teaching and learning result from a specific combination of forms of work (e.g., exercise) and social form (individual, partner, group, or plenary work). The difference, which is often mixed up or misunderstood, can be illustrated by two questions (Jank and Meyer 2008): 1) Who works together with whom? (= social form); and 2) What patterns of action (e.g., giving a lecture, reproducing something) are to be performed? (= forms of work). An investigation of the literature reveals many combinations of social forms with forms of work (which can be expanded almost at will), which can be used in different phases for different goals in situations of teaching and learning.
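The combinatorial point can be illustrated with a small sketch; the listed forms are examples taken from the text, not an exhaustive taxonomy:

```python
from itertools import product

# Social forms and forms of work as named in the text (illustrative, expandable)
social_forms = ["individual work", "partner work", "group work", "plenary work"]
forms_of_work = ["giving a lecture", "reproducing something", "exercise"]

# Each (social form, form of work) pair yields one candidate teaching method
methods = [f"{work} in {social}" for social, work in product(social_forms, forms_of_work)]
print(len(methods))  # 4 x 3 = 12 combinations already from this tiny inventory
```

Adding a single new form of work to the inventory adds four new methods at once, which is why the space of combinations "can be expanded almost at will".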

Unidimensional conceptualization
A unidimensional conceptualization of knowledge about methods and concepts of teaching and learning raises the question of what general knowledge might exist across different combinations of social forms and forms of work. This could be knowledge about the "orchestration" of situations of teaching and learning, i.e., the conditionality of visible and deep structures (Oser and Baeriswyl 2001). In the context of Oser's concept of teaching and learning, visible and low-inferent features of situations of teaching and learning are described by the term visible structures. These include social forms and forms of work as well as learning materials or tasks and thus the content that is the subject of a learning situation. The presence of certain visible structures does not initially allow any conclusions to be drawn about the process quality of teaching and learning, since within the same visible structure the quality of the interaction between teachers and learners can be completely different (Oser and Baeriswyl 2001). Deep structures refer to the learning processes of learners that are not directly observable (Oser and Baeriswyl 2001) and that account for learning success. The deep-structural quality of teaching and learning results from the teacher's knowledge of the necessity to relate two aspects to each other in a pedagogically and didactically judicious way, i.e., aiming at an adequate teaching goal on the one hand and choosing the 'appropriate' visible-structural method to reach it on the other. The knowledge about this conditionality of visible and deep structures then represents on a theoretical-conceptual level the commonality across all tasks of the knowledge facet.
This is because this knowledge could be useful for answering all tasks of the facet: If a teacher knows that visible structure features (e.g., a teacher lecture or group work) are neither effective nor ineffective as such, but can differentiate under which conditions a visible structure is more or less helpful in enabling specific deep structural and thus cognitive processes in learners, then she will have a higher probability of solving all tasks in this facet.

Multidimensional conceptualization
The knowledge about methods and concepts of teaching and learning was subdivided by Marx et al. (2017) into two subfacets: a) "social forms and forms of work as well as their combination and their target-adequate and effective use in situations of teaching and learning"; and b) "concepts for individualized, cooperative, or open forms of learning arrangements and their implementation". This was justified by the fact that methods mostly represent smaller units of (inter)action than the "larger" concepts of teaching and learning. Operationalizing these two subfacets does not seem to make sense because subfacet a) is a subset of subfacet b). A better way to conceptualize knowledge about methods and concepts of teaching and learning in a multidimensional way is to identify key methods that are frequently addressed in the advice literature and conceptualize each method as one subfacet of this knowledge. Such a conceptualization accommodates the idea that this knowledge might be acquired and (re)presented in ACE teachers in a less formally curricular way; it may be more experiential and thus "insulated" and "scattered". The fact that different methods of teaching and learning require different knowledge to implement them will be exemplified by three essential combinations of social forms and forms of work: the teacher lecture, group work, and the feedback method. Knowledge about the design of a teacher lecture requires cognitive-psychological knowledge (e.g., knowledge about the capacity of working memory) in combination with knowledge about methods and concepts of teaching and learning: Through the teacher's lecture, information can be offered on a topic and this can be helpful in subjectively laying out or acquiring a basic cognitive structure (subtasks A and B; see Table 1 below). For enabling more complex cognitive operations (in the concrete situation), however, the method is usually less helpful (subtasks C, D, E).
Knowledge about the teacher's lecture was operationalized in the ThinK project by the task in Table 1.
Table 1 Task to measure the knowledge about the teacher lecture

Please assess which of the following statements are correct or incorrect. Please check one box in each row.

The teacher lecture (also called the lecture method) is appropriate when ... [Incorrect / Correct]
(A) ... the main objective is to convey information
(B) ... an introduction to a particular area is to be given
(C) ... retention of the material over a longer period of time is desired
(D) ... the participation of the learners is an essential prerequisite to achieve a learning goal
(E) ... the material is complex for the learners or contains a lot of details

When designing group work, knowledge of social psychology in combination with knowledge about methods and concepts of teaching and learning is necessary. Here, for example, knowledge about the different types of interdependencies of the learners involved is important and, related to this, knowledge about the (non-)suitability of different task types (disjunctive, conjunctive, or additive tasks) for group work in order to avoid phenomena that are demotivating for the learners, such as the free-rider effect (Wecker and Fischer 2014). This knowledge was operationalized by the task in Table 2, which was semantically adapted from Voss et al.'s (2011) COACTIV test.

Table 2 Task to measure the knowledge about group work

(B) The learners should create a collage in small groups on the topic of environmental protection, for which each learner has to find five examples of environmental "sins" in everyday life, documented by pictures from magazines
(C) The learners are to write a text together and each is responsible for one paragraph
(D) The learners are to work together to find explanations for the outcome of a physics experiment
(E) The learners have to solve a mathematical text problem together
Giving feedback is one of the most frequently used methods of teaching and learning to support learning processes and behavior change (Strijbos and Müller 2014). Giving feedback requires, among other things, knowledge of motivation psychology in combination with knowledge about methods and concepts of teaching and learning. Knowledge about the influence of personal factors (e.g., attributions) on the processing of feedback is relevant here. Attribution of feedback is significant because motivational and emotional effects, as well as beliefs regarding possible individual scopes for action, are associated with the assumed causes of an event. Central here is the question of whether a particular action outcome is judged by a person to be influenceable currently and in the future. This corresponds to an internal, variable, and controllable attribution of causes as conceptualized in Weiner's (1985) classification scheme of reasons for action outcomes. This knowledge was operationalized by the task in Table 3, which was semantically adapted from the COACTIV test of Voss et al. (2011).

Table 3 Task to measure knowledge of the feedback method

A learner shows below-average performance compared to his previous performance. Which feedback after disappointment is less well suited, and which is particularly well suited, for increasing his motivation to engage with the course content? Please check one box in each row.

[Less well suited / Especially well suited]
(A) "You worked too cursorily this time. If you put more effort into it, you can do it."
(B) "The tasks were just too hard this time. I'm sure it will work better next time!"
(C) "That's not bad at all, you were just unlucky this time."
(D) "Don't worry about it, your strengths lie in other areas."

Hierarchical conceptualization
The examples given above show that different knowledge is relevant for different combinations of social forms and forms of work. A third way to conceptualize knowledge about methods and concepts of teaching and learning is a hierarchical conceptualization. A hierarchical conceptualization includes general and specific knowledge for different combinations of social forms and forms of work. General knowledge can be knowledge about the "orchestration" of situations of teaching and learning, i.e., the conditionality of visible and deep structures (Oser and Baeriswyl 2001). Specific knowledge can be, for example, the knowledge presented above about the teacher lecture, the group work, or the feedback method. Considering the idea that knowledge about methods and concepts of teaching and learning might be acquired and (re)presented in ACE teachers mostly in an experiential way and thus "insulated" and "scattered", it seems plausible that general knowledge and the specific subfacets, as well as the specific subfacets among themselves, should be conceptualized as independent of each other.

Research questions and assumptions
The questions addressed in this paper are whether (1) unidimensional, multidimensional, or hierarchical modeling better represents ACE teachers' knowledge about methods and concepts of teaching and learning, and (2) how reliably this knowledge is measured. We assume that a hierarchical conceptualization explains the data best. This is due to the fact that the tasks were developed as a general knowledge facet on the basis of the theoretical-conceptual considerations elaborated in Chap. 3, as well as the fact that in each case concrete methods of teaching and learning and thus specific contents are addressed in the tasks for which no systematic connections are to be expected for the ACE teachers.

Sample
A total of N = 212 ACE teachers participated in the main study of the ThinK project; this is an ad hoc sample. The teachers, aged 24 to 77 years (M = 48.47; SD = 12.16), had between 0.2 and 50 years of teaching experience at the time of the survey (M = 12.71; SD = 10.38) and taught a mean of 12.13 h per week (SD = 11.10). The proportions of university graduates (72.77%), of individuals who are mainly employed part-time by multiple educational institutions (64.34%), and of female teachers (51.89%) indicate that this sample is typical of ACE (Autorengruppe wb-personalmonitor 2016). The teachers in this sample are active in different ACE reproduction contexts (Schrader 2010) ("companies" and "free market": 29%; value- or interest-based "communities": 21%; publicly (co-)funded educational institutions with a legal mandate ("state"): 50%). They offer events on the full range of adult education topics. Only about one-third of teachers working in ACE have a formal major or minor degree in educational sciences (Autorengruppe wb-personalmonitor 2016). This group includes school teachers as well as those trained through educational sciences programs, whose commonality is their training in educational science content that includes PPK (Linninger et al. 2015). A second group consists of teachers without a major or minor in educational sciences, who make up about two-thirds of teachers in ACE and who qualify for a teaching position primarily through their subject expertise (Autorengruppe wb-personalmonitor 2016). Almost exactly corresponding to these proportions, teachers with a degree in educational sciences are included in the sample with N = 54 (25.47%), and teachers without a degree in educational sciences comprise the rest of the sample (N = 158; 74.53%).

Implementation
The sample was recruited with the help of two large adult education centers in Bavaria and North Rhine-Westphalia, as well as the Chamber of Industry and Commerce in North Rhine-Westphalia and various trainer networks. Participation took place individually on a prepared laptop in the premises of the institutions, was voluntary, and was rewarded. 42.45% of the sample was recruited as described; the remaining part of the sample completed the questionnaire online, also voluntarily and with a reward. All teachers received all 67 test items of the entire PPK test. In those cases where the ACE teachers' test time went far beyond the agreed frame of approximately 90 min, they were given the option to skip the remaining test items and complete only the socio-demographic information. The test items were therefore presented in three versions with different item orders.

Instrument
The tasks analyzed in this paper represent a subset of the entire test. The subjects of the analyses are tasks from the ThinK project, some of which were developed following the tasks of Voss et al. (2011) and thus overlap with them, but are not congruent. All five tasks that are the subject of the analyses in this paper share the task format of the tasks in Tables 1, 2 and 3, which comprise four or five subtasks in true-false answer format. The true-false responses share a common task stem and thus resemble testlets (Wainer and Kiely 1987). Each subtask was scored 0 (unsolved) or 1 (solved). A total of eight subtasks were excluded from the analyses reported below. The exclusion was made for empirical reasons (the subtasks were solved by almost all ACE teachers or remained unsolved by a large number of subjects). The five tasks focus in content on different methods of teaching and learning.

Dealing with missing values
Especially in so-called low-stakes assessments, test takers skip items (omitted items) or abort processing (producing not-reached items) for various reasons. Rubin (1976) proposed the distinction between missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) for the classification of missing values, which has become widely accepted in empirical educational research (Lüdtke et al. 2007). In the literature on missing data, MCAR and MAR are also referred to as so-called ignorable nonresponses. The full information maximum likelihood (FIML) estimation procedure and multiple imputation (MI) are currently considered state-of-the-art methods for dealing with these two cases.
Since it is not certain which of Rubin's (1976) types our missing values belong to, missing values are treated as missing (rather than scored as incorrect) in this paper. This is justifiable because simulation studies indicate that IRT models are relatively robust when only a few values are missing (Rose et al. 2010). For the five tasks presented here, 114 of 3180 values are missing, or only 3.6%, which is considered a fairly unproblematic relative proportion (Kline 2016). To avoid having to exclude individuals with missing values on individual variables from the analyses, the FIML estimation procedure was used in Mplus (Muthén and Muthén 1998–2015), as listwise case exclusion can lead to erroneous findings in addition to a loss of efficiency in parameter estimation (Lüdtke et al. 2007).
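The principle behind treating omitted responses as missing rather than as wrong can be sketched in a few lines. This is a minimal numpy illustration of a likelihood that sums only over observed responses, not the FIML implementation used in Mplus; the response pattern and solving probabilities are invented for the example:

```python
import numpy as np

def bernoulli_loglik(responses, p_solve):
    """Log-likelihood of 0/1 responses given solving probabilities p_solve,
    summing only over observed entries (NaN = missing, simply ignored)."""
    r = np.asarray(responses, dtype=float)
    p = np.asarray(p_solve, dtype=float)
    obs = ~np.isnan(r)  # mask of observed responses
    return float(np.sum(r[obs] * np.log(p[obs]) + (1 - r[obs]) * np.log(1 - p[obs])))

# A person answered items 1 and 3; item 2 is missing (omitted or not reached).
resp = [1.0, np.nan, 0.0]
probs = [0.7, 0.5, 0.4]

ll_ignore = bernoulli_loglik(resp, probs)             # missing entry skipped
ll_wrong = bernoulli_loglik([1.0, 0.0, 0.0], probs)   # missing scored as wrong
```

Scoring the omitted item as wrong (`ll_wrong`) changes the likelihood and thus biases the ability estimate downward, which is exactly what treating the value as missing avoids.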

Measurement framework: Multidimensional item response theory (MIRT)
Item response theory (IRT) is often referred to as modern probabilistic test theory, in which the probability of solving a task is modeled by one or more latent variables (constructs). Before fitting an IRT model, the dimensionality of the tasks used should be carefully considered. For categorical data, as is often the case with knowledge tests, MIRT is a good choice for analyzing the dimensionality of a construct (Reckase 2009; Kelava et al. 2020). This paper focuses on the dimensionality analysis of knowledge about methods and concepts of teaching and learning. In doing so, the conceptualizations presented in Chap. 3 are specified and model fit is compared using information-theoretic measures. Further steps, including checking the specific objectivity of the tasks and differential item functioning (see e.g. Mair 2018), are not presented in this paper but were carried out.

Specification of the unidimensional model
The unidimensional conceptualization, as presented in Chap. 3.1, is modeled by a unidimensional model. Under the assumption of unidimensionality, the differences between individuals in the indicators are explained by a common latent variable, and one measurement error is modeled per indicator. We assume that the measurement errors are uncorrelated (Fig. 1a). We further assume that the subtasks of the tasks index the respective thematized knowledge equally, which is specified by equal factor loadings. If person scores are reported, they clearly refer to the one specified latent variable.
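As an illustration of what the unidimensional 1PL specification implies, the following sketch simulates responses from a Rasch model with equal (unit) loadings and recovers one item difficulty by maximum likelihood. Abilities are treated as known purely for simplicity of the sketch; the actual analyses used marginal estimation in Mplus:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Simulate a 1PL (Rasch) data set: one latent ability per person,
# one difficulty per item, unit loadings for all items.
n_persons, true_b = 5000, 0.8
theta = rng.normal(0.0, 1.0, n_persons)          # person abilities
p = 1.0 / (1.0 + np.exp(-(theta - true_b)))      # solving probabilities
x = rng.binomial(1, p)                           # observed 0/1 responses

def neg_loglik(b):
    """Negative log-likelihood of the responses for a candidate difficulty b,
    with the simulated abilities treated as known (illustration only)."""
    q = 1.0 / (1.0 + np.exp(-(theta - b)))
    return -np.sum(x * np.log(q) + (1 - x) * np.log(1 - q))

b_hat = minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x
```

With several thousand simulated persons, the estimate `b_hat` lands close to the true difficulty of 0.8, illustrating that under a correctly specified unidimensional model the item parameters are recoverable from the 0/1 response matrix alone.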

Specification of the multidimensional model
Consistent with the proposed multidimensional conceptualization in Sect. 3.2 is a correlated-factors model, which assumes knowledge dimensions that are separable from one another but correlated (Fig. 1b). In a correlated-factors model, a construct originally conceptualized as unidimensional is often split into different latent variables (Brown 2006). However, in the correlated-factors model, no latent variable is introduced into the model that captures the knowledge unifying the facets across all items (Reise et al. 2010).
We assume that the subtasks of the tasks index the respective thematized knowledge equally, which is specified by equal factor loadings. We also assume that the measurement errors of the indicators are uncorrelated. Which correlations between the subfacets or the individual factors are ultimately revealed is open. However, due to the different content areas addressed in the tasks and the low level of standardized pedagogical training and continuing education of ACE teachers, these are likely to be low. If person scores are reported for individual latent variables of a correlated-factors model, then these are an amalgam of the latent variable that is not co-modeled but is common to all indicators, as well as the specific latent variables modeled in each case (Reise et al. 2010; Reise 2012).

Specification of hierarchical models
Fitting the assumption of a hierarchical conceptualization with general and specific knowledge are models of the bifactor family (Fig. 1c): "Bifactor models are potentially applicable when (a) there is a general factor that is hypothesized to account for the commonality of the items; (b) there are multiple domain specific factors, each of which is hypothesized to account for the unique influence of the specific K domain over and above the general factor; and (c) researchers may be interested in the domain specific factors as well as the common factor that is of focal interest." (Chen et al. 2006). In addition to the intended use outlined by Chen et al. (2006), bifactor models are also used to control for "nuisance factors", e.g., when only the general factor is in focus but not the specific factors due to, for example, different task stimuli (DeMars 2013).
For each subtask, two loading parameters are specified for models of the bifactor family, for which (at least) three different assumptions can be made:

1. All loading parameters are freely estimated (Reise 2012); hereafter referred to as the Bifactor Model.
2. The loading parameters of the subtasks are proportional to each other for the general factor and the specific factor (Bradlow et al. 1999); hereafter referred to as the Testlet Model.
3. The loading parameters are the same for the subtasks of a task stem, respectively for the general factor and the specific factor (Wang and Wilson 2005); hereafter referred to as the Rasch Testlet Model.
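The three assumptions can be written compactly. With a general factor θ_g, a specific factor θ_{s(i)} for the task stem (testlet) containing subtask i, and difficulty b_i, one common way to write the response function of the bifactor family is (notation chosen here for illustration):

```latex
P(X_{pi} = 1)
  = \frac{\exp\bigl(a_i^{g}\,\theta_{pg} + a_i^{s}\,\theta_{p,s(i)} - b_i\bigr)}
         {1 + \exp\bigl(a_i^{g}\,\theta_{pg} + a_i^{s}\,\theta_{p,s(i)} - b_i\bigr)}
```

The three models then differ only in the constraints on the loadings: in the Bifactor Model, a_i^g and a_i^s are freely estimated; in the Testlet Model, a_i^s = γ_{s(i)} · a_i^g, i.e., proportional within a testlet; in the Rasch Testlet Model, a_i^g = a_i^s = 1 for all subtasks.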
We assume that the measurement errors of the indicators are uncorrelated. In accordance with the considerations in Chap. 3.3, the Rasch Testlet Model seems to be the most appropriate, since it is assumed that the subtasks equally index the respective thematic knowledge.
When interpreting the specific factors, note the aspect b) addressed above by Chen et al. (2006): The specific factors explain additional variance to the general factor. If one reports person scores, then it should be noted that the general factor scores would refer to the general knowledge outlined in Sect. 3.1 and the specific factor scores would refer to the specific knowledge about methods of teaching and learning outlined in Sect. 3.2.

Statistical analyses
All analyses were performed with Mplus version 7.4 (Muthén and Muthén 1998-2015) in the MIRT framework; accordingly, the information-theoretic measures AIC, AICc, CAIC, BIC, and SABIC are reported for model comparison. What the penalty function of information-theoretic measures should ideally look like is an open research issue (Rost 2004). Therefore, the two most common measures, AIC and BIC, are reported, as well as variants of these (AICc, CAIC, SABIC) which additionally account for sample size.
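The five measures differ only in their penalty terms. As a self-contained sketch (the article's values come from Mplus, not from this code), they can be computed from the maximized log-likelihood, the number of free parameters k, and the sample size n using their standard definitions:

```python
import numpy as np

def information_criteria(log_lik, k, n):
    """Standard information-theoretic measures for model comparison.
    log_lik: maximized log-likelihood; k: number of free parameters;
    n: sample size. Lower values indicate the preferred model."""
    aic   = -2 * log_lik + 2 * k
    aicc  = aic + (2 * k * (k + 1)) / (n - k - 1)     # small-sample correction
    bic   = -2 * log_lik + k * np.log(n)
    caic  = -2 * log_lik + k * (np.log(n) + 1)        # consistent AIC
    sabic = -2 * log_lik + k * np.log((n + 2) / 24)   # sample-size adjusted BIC
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic, "SABIC": sabic}
```

Because the SABIC penalty ln((n + 2)/24) is much smaller than the BIC penalty ln(n) at n = 212, SABIC tends to favor more heavily parameterized models, which is consistent with it being the one measure below that prefers the correlated-factors model.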

Results
To answer the question "Does a unidimensional, a multidimensional, or a hierarchical model better explain ACE teachers' knowledge about methods and concepts of teaching and learning?", the models specified in Sect. 5.6 were fitted to the data: a unidimensional 1PL model (Fig. 1a), a 1PL correlated-factors model (Fig. 1b), as well as (Fig. 1c) a Bifactor Model (Reise 2012), a Testlet Model (Bradlow et al. 1999), and a Rasch Testlet Model (Wang and Wilson 2005). The majority of the information-theoretic measures (AIC, AICc, CAIC, and BIC) indicate the superiority of the Rasch Testlet Model over the other models (see Table 4); the SABIC indicates the superiority of the 1PL correlated-factors model.
In order to present as complete a picture of the results as possible, item difficulties and standard errors (of the subtasks) as well as reliabilities are reported below for the 1PL correlated-factors model, the unidimensional 1PL model, and the Rasch Testlet Model. For all multidimensional models, the loading structure is reported. For the Bifactor Model and the Testlet Model, item difficulties, standard errors, loading structures, and reliabilities can be found in the Appendix (Tables 10, 11 and 12).

Correlated-factors model and unidimensional model
The loading structure of the correlated-factors model as well as the item difficulties and standard errors of the unidimensional model are shown in Table 5. For the five factors, loading parameters range between 0.475 and 0.720, all of which are statistically significant. Combined with the predominantly low latent correlations between the factors (hypothesized in Sect. 5.6), which range between r = 0.011 and r = 0.391 and of which three out of ten turned out to be statistically significant (see Table 6), this is an indication that the assumption of unidimensionality of knowledge seems questionable and multidimensionality seems likely. In order to assume unidimensionality, or to combine the tasks into an overall score, substantially higher (latent) correlations between the factors, not differing substantially from 1, would have to be shown. For the correlated-factors model, item difficulties range from -4.115 to 0.997 logits; for the unidimensional 1PL model, from -4.708 to 1.091 logits.
For the 1PL correlated-factors model and the unidimensional 1PL model, the Kuder-Richardson Formula 20 (KR-20) was used to determine the internal consistency of the subtasks. KR-20 is equivalent to Cronbach's alpha for dichotomous data, as is the case here (Lienert and Raatz 1998). For the unidimensional model, internal consistency was low at 0.56 (see Table 7). This is another indication that the property of unidimensionality is not present for the test. The internal consistency of factors A-E is also low (between 0.29 and 0.60), but it is important to bear in mind that only three subtasks per factor were included in the analyses, which may contribute to the low internal consistency.
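For readers less familiar with KR-20, the following minimal numpy sketch (not taken from the article) implements the formula for a persons-by-items matrix of dichotomous (0/1) scores:

```python
import numpy as np

def kr20(X):
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1
    scores; equivalent to Cronbach's alpha for dichotomous items.
    alpha = k/(k-1) * (1 - sum(p*q) / var(total score))."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]                          # number of items
    p = X.mean(axis=0)                      # proportion correct per item
    q = 1.0 - p
    var_total = X.sum(axis=1).var(ddof=0)   # variance of the total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)
```

With only three subtasks per factor, the k/(k - 1) correction cannot compensate for the small item pool, which is one reason why short subscales tend to yield low coefficients, as the article notes.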

Rasch testlet model
The loading structure and item difficulties of the Rasch Testlet Model are shown in Table 8. Item difficulties range from -4.066 to 0.987 logits. Statistically significant factor loadings are shown for the general factor (g) for all subtasks, as well as for the five specific factors (A-E). The factor loadings of the specific factors are consistently higher than those of the general factor. These findings, too, indicate that one should not assume unidimensionality, but rather a general factor alongside specific factors.
For the general factor and the specific factors, the question arises as to what proportions these explain the variance in response behavior. To answer this question, the coefficients omega hierarchical (ωh) and omega subscale (ωs) are used (Reise 2012;Revelle and Zinbarg 2009). For ωh, the effects of the specific factors (and measurement error) are controlled; for ωs, the effect of the general factor and the effects of the other specific factors (and measurement error) are controlled. ωh is an important indicator of how reliably a test measures a construct (Revelle and Zinbarg 2009).
For the Rasch Testlet Model, ωh = 0.41 for the general factor, i.e., general knowledge across different methods of teaching and learning, is a comparatively low value (see Table 9). This is another indication that the unidimensionality of the test is questionable. ωs indexes how reliably the specific factors, i.e., the specific knowledge about a method of teaching and learning, are measured. Low to satisfactory values were found, with ωs ranging from 0.29 to 0.67, although the interpretation should take into account that only three subtasks per factor were included in the analyses, which may contribute to low internal consistency. Omega total (ωt) represents the proportion of variance attributable to all factors. ωt = 0.75 indicates that 75% of the differences in response behavior are explained by the total of six factors and 25% of the differences are due to measurement error. Omega (ω) is the reliability of the specific factors, but, unlike ωs, not adjusted for the effect of the general factor (Reise 2012). ω is low to good, with values between 0.46 and 0.75.
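How ωh, ωt, and ωs are obtained from a bifactor solution can be sketched as follows. This is a simplified illustration using standardized factor loadings in the spirit of Reise (2012) and Revelle and Zinbarg (2009), not the article's actual IRT-based computation; the loading values in the test below are invented for demonstration:

```python
import numpy as np

def omegas(gen, spec, groups):
    """Omega coefficients from standardized bifactor loadings.
    gen: general-factor loading per item; spec: specific-factor loading
    per item; groups: index of each item's specific factor. Assumes
    orthogonal factors; uniquenesses follow from standardized loadings."""
    gen, spec, groups = map(np.asarray, (gen, spec, groups))
    theta = 1.0 - gen**2 - spec**2                 # item uniquenesses
    common_spec = sum(spec[groups == s].sum()**2 for s in np.unique(groups))
    denom = gen.sum()**2 + common_spec + theta.sum()
    omega_h = gen.sum()**2 / denom                 # general factor only
    omega_t = (gen.sum()**2 + common_spec) / denom # all factors together
    omega_s = {}
    for s in np.unique(groups):
        m = groups == s
        d = gen[m].sum()**2 + spec[m].sum()**2 + theta[m].sum()
        omega_s[int(s)] = spec[m].sum()**2 / d     # specific over and above g
    return omega_h, omega_t, omega_s
```

The denominators make the logic visible: ωh credits only the general factor with reliable variance, ωt credits all six factors, and each ωs credits a subscale's specific factor after the general factor's contribution to that subscale has been partialled out.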

Summary and discussion of the results
This paper addressed the question of whether a unidimensional, a multidimensional, or a hierarchical modeling of ACE teachers' knowledge about methods and concepts of teaching and learning is more plausible from a theoretical-conceptual perspective and fits the data better empirically. This question arises against the background of a large number of methods and concepts of teaching and learning serving different goals and phases within learning processes, and a presumably more experiential and less standardized knowledge acquisition by the majority of ACE teachers not formally qualified in educational sciences. The aim of this paper is to gain indications for the further development of the present test, focusing on the facet of knowledge about methods and concepts of teaching and learning.
In Chap. 3, a unidimensional, a multidimensional, and a hierarchical conceptualization of knowledge about methods and concepts of teaching and learning were presented. The assumption of a hierarchical structure of this knowledge seems plausible: it posits general knowledge across different methods of teaching and learning and specific knowledge about particular methods of teaching and learning. Empirically, we tested whether a correlated-factors model or a unidimensional model on the one hand, and a Bifactor Model, Testlet Model, or Rasch Testlet Model on the other, fit the data best. Consistent with these assumptions, four out of five information-theoretic measures point to the superiority of the Rasch Testlet Model.
For the Rasch Testlet Model, the general factor, i.e., general knowledge across different methods of teaching and learning, showed a low omega hierarchical (ωh). One explanation may be that only a few tasks were used to measure this knowledge and that heterogeneous topics were addressed. This is an indication that further development of the subscales will be useful. These findings are plausible if one takes into account that, against the background of their mostly unsystematic pedagogical training, ACE teachers have "insulated" and "scattered" knowledge. In view of this "insulated" knowledge, i.e., uncorrelated knowledge of single aspects of specific methods, it seems sensible to consider the subscales and to develop them further into multiple unidimensional tests (or tests and methods of analysis) that already explicitly take within-item multidimensionality into account during construction (Sorrel et al. 2016).
For hierarchical models, the question always arises which constructs are modeled in the general factor and which in the specific factors. Whether the general factor models knowledge about the relational necessity of visible and deep structures or domain-independent cognitive abilities cannot be answered conclusively in this paper. The same applies to the specific factors: whether these actually capture true-score variance or are construct-irrelevant needs to be further substantiated by validation studies. Indications that construct-relevant variance is modeled in the general factor and the specific factors are provided by the study of Marx et al. (2018).
One of the information-theoretic measures indicates the superiority of the correlated-factors model. When modeling the data by a correlated-factors model, out of ten possible correlations between the factors, statistically significant correlations are shown for only three. This can be interpreted as a further indication that the knowledge addressed in the tasks should not be conceptualized and measured in a unidimensional way. In addition, a unidimensional modeling showed a low internal consistency, which indicates a low homogeneity of the tasks and is an indication of the usefulness/necessity of a multidimensional or hierarchical conceptualization and measurement.
A limitation of all our findings is the overall small sample available for the study. One consequence is that only a selection of tasks was included in the analyses in order to keep the number of parameters to be estimated in reasonable proportion to the sample size (see Bentler and Chou 1987). What structure emerges for a larger sample and with the inclusion of additional tasks and facets is an open question. Similarly, it is unknown whether and to what extent the factor loading structure might change if the knowledge test were embedded in a learning situation (Mislevy 2018).
Nevertheless, insights for the further development of the test can be gained from the presented study: In the ThinK project, tasks were adapted based on preliminary work and were not explicitly constructed according to one of the presented measurement models. For future projects, the development of multiple unidimensional tests to capture knowledge about methods and concepts of teaching and learning seems reasonable. Another possibility is to develop a test that explicitly addresses multidimensionality. One way to conceptualize this knowledge multidimensionally for ACE teachers is to model each key method as a separate dimension of that knowledge. A challenge here will be the multitude of different methods and concepts of teaching and learning. The further developed test must then be checked for its dimensionality and, as addressed in Chap. 5, for the specific objectivity of the tasks and for differential item functioning (DIF), among other things. Further validation steps could include, e.g., examining the connection between teachers' knowledge and the (learning) success of ACE participants, as mediated by the quality of teaching situations. In addition, it would be useful to examine whether (facets of) PPK can be distinguished from pedagogical beliefs, and how it might be related to these beliefs, to other aspects of professional knowledge, or to professional experience.
Another challenge in the further development of the test will be to construct more difficult tasks in order to better differentiate between persons with extensive knowledge about methods and concepts of teaching and learning. The comparatively low difficulty of the presented test can be explained both by the answer format and by the omission of technical terms in the test.
Overall, the present work can contribute to the development of tests to assess aspects of ACE teachers' professional competence that are needed in the context of educational policy and practice initiatives. Given a construct-inherent interpretation of the specific factors as well as the general factor, a short scale can be offered to assess ACE teachers' knowledge about methods of teaching and learning. However, until the further validation steps pointed out in this paper have been conducted, we recommend this short scale only for formative assessment and not for summative "high-stakes" assessments.

Funding Open Access funding enabled and organized by Projekt DEAL.

Appendix
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.