Algorithms for the adaptive assessment of procedural knowledge and skills

Procedural knowledge space theory (PKST) was recently proposed by Stefanutti (British Journal of Mathematical and Statistical Psychology, 72(2), 185–218, 2019) for the assessment of human problem-solving skills. In PKST, the problem space formally represents how a family of problems can be solved, and the knowledge space represents the skills required for solving those problems. The Markov solution process model (MSPM) by Stefanutti et al. (Journal of Mathematical Psychology, 103, 102552, 2021) provides a probabilistic framework for modeling the solution process of a task via PKST. In this article, three adaptive procedures for the assessment of problem-solving skills are proposed that are based on the MSPM. Besides execution correctness, they also consider the sequence of moves observed in the solution of a problem, with the aim of increasing the efficiency and accuracy of assessments. The three procedures differ from one another in the assumption underlying the solution process, named pre-planning, interim-planning, and mixed-planning. In two simulation studies, the three adaptive procedures were compared to one another and to the continuous Markov procedure (CMP) by Doignon and Falmagne (1988a), which accounts for dichotomous correct/wrong answers only. Results show that all the MSP-based adaptive procedures outperform the CMP in both accuracy and efficiency. These results were obtained in the framework of the Tower of London test, but the procedures can also be applied to all psychological and neuropsychological tests that have a problem space. Thus, the adaptive procedures presented in this paper pave the way to adaptive assessment in the area of neuropsychological tests. Supplementary Information The online version contains supplementary material available at 10.3758/s13428-022-01998-y.


Introduction
In this article, a novel procedure for the adaptive assessment of human problem-solving is presented, which is suitable for performing the assessment with certain cognitive or neuropsychological tests like, for instance, the Tower of London (ToL) test. The theory on which the procedure is based is named procedural knowledge space theory (PKST; Stefanutti, 2019). It is a specialization of knowledge space theory (KST; Doignon & Falmagne, 1985; 1999; Falmagne, Albert, Doble, Eppstein, & Hu, 2013) and is also situated in the area of the so-called cognitive diagnostic models (CDM; Bolt, 2007; de la Torre, 2009; DiBello & Stout, 2007; Tatsuoka, 1990). Such theories are based on a problem-to-skills relationship which provides the fundamental skeleton of the developed models.
PKST is built upon the notion of a "problem space" (Newell & Simon, 1972), and it is applicable to all and only those problem situations for which a problem space exists and can be given. As such, PKST is at the meeting point between the theory of problem spaces (Newell & Simon, 1972) and that of knowledge spaces (Doignon & Falmagne, 1985).
In the original definition by Newell and Simon (1972), a "problem space" is the internal representation that a problem solver makes of a given task environment. Then, problem-solving consists of exploring this internal representation, in search of a solution. Very often, in the literature (see, e.g., Langley, Magnani, Schunn, & Thagard, 2005; Zhang & Norman, 1994), the term "problem space" also refers to a conceptual structure that can be objectively constructed and displayed (e.g., by a computer program) by repeatedly applying a finite set of transformation rules, starting from the initial configuration of the problem. In this article, the term "problem space" refers to this objectively obtainable structure. A classical example of such a construction is offered by the problem space of the Tower of Hanoi, described by Newell and Simon (1972). Another example, which is extensively described and applied in this article, is the problem space of the Tower of London test, a rather well-known neuropsychological test of executive functions (Shallice, 1982).
In PKST, the problem space represents complete knowledge of the problem. It is all a perfect problem solver needs to know for successfully solving a given set of problems. Such an ideal representation is based on properties that need not be satisfied by the knowledge state of an imperfect problem solver (e.g., a human one). Indeed, at least two sources of "imperfect" answers can occur in practice. The former deals with a sort of intransitivity of the human cognitive capability, in the sense that being able to solve two distinct sub-problems does not necessarily mean being able to solve the problem that concatenates those two sub-problems. The latter deals with incomplete knowledge of the problem domain. In this case, the knowledge state of a problem solver is a strict subset of the whole problem space (a problem subspace). PKST is about the knowledge states of both perfect and imperfect problem solvers, the collection of which is named the procedural knowledge space.
Both the problem space and the procedural knowledge space are deterministic models. As such, they cannot be empirically validated, for instance, by means of standard goodness-of-fit statistics. A probabilistic model that incorporates all the critical deterministic assumptions of PKST has been recently developed by Stefanutti et al. (2021). It is based on the notion of a Markov solution process (MSP), a stochastic process that represents the problem solution behavior of a problem solver.
The MSP model (henceforth MSPM) can be used for uncovering (inferring) the knowledge state of an individual, on the basis of the solution behavior observed in a given subset of problems of the problem space. In this article, a novel adaptive assessment procedure, based on the MSPM, is described. The procedure features many interesting aspects. In the first place, being an adaptive procedure, it minimizes the number of questions and, at the same time, it maximizes the information on the underlying state of knowledge. Problem spaces may be large, containing hundreds or even thousands of different problems and subproblems. To give an example, the problem space of the ToL contains 1260 distinct problems in total, but the test by Shallice (1982) only uses 12 of them. What type of inference can be done from these fixed 12 problems to the remaining 1248, for every single individual, is not immediately obvious. The proposed procedure may be used for making inferences over the whole problem domain on the basis of a reasonably small subset of problems, which is tailored to the individual.
In the second place, existing adaptive assessment procedures in KST are not trivially applicable to response data that, going beyond the correct/incorrect response format, keep track of the whole trail of moves performed in intermediate steps of the problem solution process. The capability of exploiting this surplus of information, which arises naturally in problem-solving, is the most critical and important feature of the proposed procedure.
The third distinctive feature of the procedure is the assessment paradigm on which it is based. In a problem space, the order of difficulty of the problems could fail to be linear (i.e., from the easiest to the most difficult). There is a quite natural assumption for the problems in a problem space that provides a reason for this: If a person can solve a problem by following a specific solution path along the problem space, then, excluding random error, that person will be able to solve all the sub-problems that are encountered along that path. In general, this assumption induces an order of difficulty on the problems which is only partial. In PKST, this assumption is named the "sub-path assumption". Therefore, PKST does not impose any strong measurement requirements on the data. Items do not need to be all aligned along a unidimensional continuum, and there is no need to throw away items that do not conform to this requirement.
The manuscript is organized as follows. Background material is given in "Background", whereas the proposed adaptive assessment procedures are presented in "Adaptive assessment in a problem space". In both "Background" and "Adaptive assessment in a problem space", the theoretical explanations are illustrated with practical examples. In "Simulation study" and "Simulation study based on real data", three MSPM-based procedures are compared in two simulation studies. In "Simulation study", a series of simulation studies were carried out with the aim of testing how different assumptions concerning human planning affect the capability of the procedures to predict the actual planning skills of an individual. In "Simulation study based on real data", some simulations were run by using a pre-existing data set consisting of the responses of 154 participants to a subset of Tower of London problems. A general discussion concludes the article ("General discussion").

Background
Different theoretical frameworks contribute to the state of the art underlying the present research. A section is devoted to each of these topics.

The Tower of London test
Throughout the article, the various concepts of PKST are illustrated with the help of the example of the Tower of London test (Shallice, 1982). In particular, "Simulation study" and "Simulation study based on real data" describe extensive applications of PKST to the ToL test. For these reasons, the ToL is briefly described here.
The ToL was developed by Shallice (1982) for assessing planning deficits in patients with lesions of the frontal lobe. Today, it is used for assessing planning ability in clinical and non-clinical populations (Berg & Byrd, 2002). The ToL consists of three equally spaced pegs with different heights, mounted on a wooden support. An example of the spatial configuration of the ToL is illustrated in Fig. 1.
In total, there are 36 spatial configurations, each of which forms a different problem state. The three balls of different colors can be moved, one at a time, from one peg to another. Each problem consists of transforming a certain initial configuration, named the initial state, into a final configuration, called the goal state. For instance, in Fig. 2, where a portion of the ToL problem space is represented, the pair of problem states (s_4, s_9) can be seen as the initial state and the goal state of a problem, respectively. The task is correctly performed if the goal state is obtained with the minimum number of moves. Thus, to avoid mistakes, the problem solver must plan the sequence of moves in advance. In the original ToL test, developed by Shallice (1982), an indirect measure of the difficulty of a problem is obtained as the minimum number of moves necessary to solve it. However, recent studies (e.g., Berg, Byrd, McNamara, & Case, 2010; Kaller, Unterrainer, Rahm, & Halsband, 2004; Kaller, Rahm, Köstering, & Unterrainer, 2011; McKinlay et al., 2008; Newman & Pittman, 2007) found that other factors affect the difficulty of a problem. Some of them are the number of alternative solutions for the problem, the initial configuration of the balls on the pegs (named "start hierarchy"), and the final configuration (named "goal hierarchy"). As will be seen, the approach proposed in this article goes well beyond the notion of minimum number of moves.
As already mentioned, the problem space of the ToL consists of 6 × 6 = 36 different problem states, obtained as the Cartesian product of the six different permutations of the three colors times the six spatial arrangements of the balls on the pegs. In the sequel, every single problem state in the ToL problem space is uniquely referred to by using a pair ab of numbers, where a stands for one of the six spatial arrangements whereas b stands for one of the six color permutations. The reader is referred to Stefanutti et al. (2021) for the complete list of problem state codings.
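As an illustration of the move-count notion discussed above, the minimum number of moves of a problem can be computed by a breadth-first search over the directed graph of the problem space. The following sketch uses a small hypothetical graph; the state names and adjacencies are illustrative placeholders, not the actual ToL coding.

```python
from collections import deque

def min_moves(graph, start, goal):
    """Breadth-first search over a problem-space graph.

    `graph` maps each problem state to the states reachable in one move.
    Returns the minimum number of moves from `start` to `goal`,
    or None if the goal is unreachable."""
    if start == goal:
        return 0
    frontier = deque([(start, 0)])
    visited = {start}
    while frontier:
        state, dist = frontier.popleft()
        for nxt in graph[state]:
            if nxt == goal:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

# Hypothetical toy fragment of a problem space:
toy = {
    "s1": ["s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1", "s4"],
    "s4": ["s2", "s3", "s9"],
    "s9": ["s4"],
}
print(min_moves(toy, "s1", "s9"))  # 3
```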

Knowledge space theory
The theory of knowledge spaces (Doignon & Falmagne, 1985; 1999; Falmagne & Doignon, 2011) is a mathematical approach to a non-numerical assessment of knowledge. In KST, the domain of knowledge is the nonempty set Q of all the problems in a given field of knowledge (e.g., mathematics, chemistry, statistics, etc.). The knowledge state of a student is the set K ⊆ Q of all the problems that she is able to solve. The knowledge structure is the collection K of all the knowledge states. By definition, K always contains both the empty set and Q. A knowledge structure is named a knowledge space if, for any subfamily F ⊆ K, the union of the subsets in F is still in K.
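The union-closure condition is easy to check mechanically. The following sketch, with a made-up five-state structure that is not taken from the article, tests whether a finite knowledge structure is a knowledge space; for a finite family, closure under pairwise unions is equivalent to closure under arbitrary unions.

```python
from itertools import combinations

def is_knowledge_space(states):
    """Check that a knowledge structure (a collection of frozensets)
    contains the empty set and the full domain, and is closed under
    union (pairwise closure suffices for a finite family)."""
    states = set(states)
    domain = frozenset().union(*states)
    if frozenset() not in states or domain not in states:
        return False
    return all(a | b in states for a, b in combinations(states, 2))

# Illustrative structure over the domain {a, b, c}:
K = [frozenset(), frozenset({"a"}), frozenset({"b"}),
     frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
print(is_knowledge_space(K))  # True
```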
KST was initially developed as a behavioral theory, in the sense that it provided no assumptions or descriptions of cognitive processes, skills, or resources behind the solution of a problem. Later, the theory was extended to the assessment of skills (Doignon, 1994; Düntsch & Gediga, 1995; Falmagne, Koppen, Villano, Doignon, & Johanessen, 1990; Gediga & Düntsch, 2002; Stefanutti & de Chiusole, 2017; Ünlü et al., 2013; Heller, Stefanutti, Anselmi, & Robusto, 2015; Korossy, 1997; Korossy, 1999). Such extension is known as competence-based knowledge space theory (CbKST; Heller, Ünlü, & Albert, 2013; Heller, Augustin, Hockemeyer, Stefanutti, & Albert, 2013; Stefanutti & Albert, 2003). Given a set Σ of skills, the competence state is the set C ⊆ Σ of skills mastered by an individual. The collection C of all the competence states is the competence structure. The problems and the skills are related by a skill map (Doignon, 1994), which is a triple (Q, Σ, τ), where τ : Q → 2^Σ is a function assigning to each problem in Q a non-empty subset of the skills in Σ.

Procedural knowledge space theory
Procedural knowledge space theory (Stefanutti & Albert, 2003;Stefanutti, 2019) generalizes the application of KST and CbKST to the area of human problem-solving and procedural knowledge.
Let Ω be a set of operations. For example, in the ToL there are six operations, each of which moves a ball from one peg to another; naming the three pegs left, center, and right, there is one operation for each ordered pair of distinct pegs. A sequence of operations in Ω is denoted σ = ω_1 ω_2 · · · ω_n. The collection of all the sequences of operations of arbitrary finite length, including the empty sequence ε, is

Ω* = {ω_1 ω_2 · · · ω_n : ω_i ∈ Ω, n ∈ Z+},

where Z+ is the set of the non-negative integers.
A problem space is formally defined as a triple P = (S, Ω, •), in which S is a non-empty set of problem states, Ω is a non-empty set of operations, and • : S × Ω* → S is an operator that satisfies the following properties:

s • ε = s and s • (σπ) = (s • σ) • π,

where s ∈ S and σ, π ∈ Ω*. The operator • is called operation application.
Figure 2 shows the directed graph of a portion of the problem space of the ToL test. Each vertex in the graph corresponds to a problem state in the set S_ToL. The latter contains nine of the 36 problem states of the ToL. The directed edges of the graph are labeled by the moves in Ω. In the running example of the ToL, the pair (s_2, s_9) of problem states in Fig. 2 is a problem because the sequence of operations b ā b ā transforms the initial problem state s_2 into the goal problem state s_9.
The set of all the problems in P is thus

Q = {(s, t) ∈ S × S : s • π = t for some π ∈ Ω* \ {ε}}.

It is worth noticing that the set Q obtained in this way is nothing else than what in KST is named the domain of knowledge. Any pair sπ (without the dot in between) is called a solution path. The solution path sπ solves problem (s, t) ∈ Q if s • π = t. The set of all the solution paths turns out to be Π = S × (Ω* \ {ε}).
In the subsequent example, only a part of the whole set of problems for the problem space in Fig. 2 is considered, namely Q_ToL = {(s_1, s_9), (s_3, s_9), (s_4, s_9), (s_7, s_9), (s_8, s_9)}. Since all the problems in Q_ToL have the form (s_i, s_9), to lighten the notation each of them is represented just by its initial state s_i. To solve a problem, one needs to know at least one of the solution paths of that problem. For instance, problem s_1 has two possible solution paths, namely s_1 ab ā b ā and s_1 ba b ā ā. It is left to the reader to check that the set of all solution paths that solve any one of the problems in Q_ToL is Π_ToL = {s_8 ā, s_7 ā ā, s_4 ā b ā, s_3 a b ā ā, s_1 ab ā b ā, s_1 ba b ā ā}.
Solution paths are partially ordered. Precisely, a solution path sπ is a subpath of another solution path tσ (denoted by sπ ⊑ tσ) if there are α, β ∈ Ω* such that σ = απβ and t • α = s. For instance, in Fig. 2, consider the two solution paths s_4 ā b ā and s_1 ab ā b ā. It is easily seen that the former is a subpath of the latter. In fact, by setting α = ab and β = ε (the empty sequence), it holds that ab ā b ā = α ā b ā β, and s_1 • α = s_4. The cognitive interpretation of the subpath relation is that if an individual knows a solution path, then she will also know all of its subpaths.
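The subpath relation can be checked directly from the definition: scan every position k at which π could occur inside σ, and verify that the prefix α of length k carries t to s. A minimal sketch, where a hypothetical transition table `delta` (made-up states and operation labels) stands in for the operation application •:

```python
def apply_ops(delta, state, ops):
    """Apply a sequence of operations to a problem state.
    `delta[state][op]` gives the next state; returns None if undefined."""
    for op in ops:
        state = delta.get(state, {}).get(op)
        if state is None:
            return None
    return state

def is_subpath(delta, sub, sup):
    """Check whether solution path `sub` = (s, pi) is a subpath of
    `sup` = (t, sigma): sigma = alpha + pi + beta with t . alpha = s."""
    (s, pi), (t, sigma) = sub, sup
    n = len(sigma) - len(pi)
    for k in range(n + 1):
        if sigma[k:k + len(pi)] == pi and apply_ops(delta, t, sigma[:k]) == s:
            return True
    return False

# Illustrative toy transition table:
delta = {"t0": {"a": "t1"}, "t1": {"b": "t2"}, "t2": {"a": "t3"}}
print(is_subpath(delta, ("t1", ["b", "a"]), ("t0", ["a", "b", "a"])))  # True
```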
A solution path can be seen as a kind of "procedural skill" required for solving a problem. Therefore, the collection of all the solution paths solving a certain problem (s, t) ∈ Q is denoted τ(s, t), where τ : Q → 2^Π is a mapping having Q as the domain and the powerset of Π as the codomain. Using the CbKST notation, the triple (Q, Π, τ) is named the skill map derived from the problem space P. In this example, for the sake of simplicity, the mapping τ_ToL is constructed for the subset Π_ToL, instead of deriving the mapping τ for the whole set Π of solution paths. The mapping τ_ToL assigns to each problem in Q_ToL the collection of the solution paths in Π_ToL that solve it. A subset C of solution paths respects path inclusion when the condition "if sπ ∈ C and tσ ⊑ sπ, then tσ ∈ C" is respected for all sπ, tσ ∈ Π. A subset of solution paths respecting path inclusion is named a competence state of the problem space P. The collection C of all the competence states is the competence space. In the running example of the ToL, the competence space C_ToL is the collection of all the subsets of Π_ToL that respect path inclusion. The set of all the problems in Q that can be solved by an individual whose competence state is C ∈ C is given by the problem function, which is defined as

p(C) = {(s, t) ∈ Q : τ(s, t) ∩ C ≠ ∅}.

Thus, p(C) contains all and only those problems (s, t) that can be solved by one or more solution paths among those contained in C. Each such problem satisfies the condition τ(s, t) ∩ C ≠ ∅. The set p(C) is named the knowledge state delineated by the competence state C. The collection K = {p(C) : C ∈ C} of all the knowledge states is the knowledge space derived from the problem space P.
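Given a skill map in the form of a dictionary from problems to sets of solution paths, the problem function and the derived knowledge space can be sketched as follows. The problems, path labels, and competence states below are hypothetical placeholders (in PKST proper, the competence states would additionally have to respect path inclusion):

```python
def problem_function(tau, C):
    """p(C): the set of problems (s, t) with tau[(s, t)] ∩ C nonempty,
    i.e., the knowledge state delineated by competence state C."""
    return frozenset(q for q, paths in tau.items() if paths & C)

def derived_knowledge_space(tau, competence_states):
    """K = {p(C) : C in the competence space}."""
    return {problem_function(tau, C) for C in competence_states}

# Hypothetical skill map: three problems, four solution-path labels.
tau = {
    ("s8", "g"): frozenset({"p1"}),
    ("s7", "g"): frozenset({"p2"}),
    ("s1", "g"): frozenset({"p3", "p4"}),
}
C_states = [frozenset(), frozenset({"p1"}), frozenset({"p1", "p2", "p3"})]
for C in C_states:
    print(sorted(problem_function(tau, C)))
```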

The continuous Markov procedure
Adaptive assessment is one of the most important applications of knowledge space theory. The aim of an adaptive assessment is to uncover the individual knowledge state with a minimal number of questions. Examples of this procedure can be found in fields such as education (see, e.g., ALEKS, www.aleks.com, and Stat-Knowlab, de Chiusole, Stefanutti, Anselmi, & Robusto, 2020) and psychological assessment (Donadello et al., 2017; Granziol et al., 2020). In KST, the standard procedure used for implementing the adaptive assessment is the continuous Markov procedure by Falmagne and Doignon (1988). It is an iterative procedure which uses a likelihood distribution L_m : K → R, with the collection K as the domain and R as the codomain. The likelihood distribution is updated at each step m of the procedure on the basis of the incoming information. Unless prior information is available, the initial likelihood distribution L_0 is the uniform one. At each step m, the procedure: (i) selects a new problem for the student; (ii) updates the likelihood distribution on the knowledge states depending on the student's response; (iii) establishes whether enough information has been collected and, in that case, terminates. Different rules were proposed by Falmagne and Doignon (1988) and Doignon and Falmagne (1999) for each of these three phases. The rules that are relevant to this article are described below.
The questioning rule selects a problem q ∈ Q in order to minimize the total number of questions to be administered before the assessment terminates. One such rule is the so-called half-split (Falmagne & Doignon, 2011), in which any one of the problems q ∈ Q is selected among those that minimize the quantity

Q_m(q) = |2 L_m(K_q) − 1|,    (1)

where K_q = {K ∈ K : q ∈ K} and L_m(K_q) = Σ_{K ∈ K_q} L_m(K). The updating rule updates the likelihood L_m on the basis of the answer collected at step m of the procedure.
Whenever the student's response is correct (incorrect), the likelihood L_m(K) of all K ∈ K such that q ∈ K increases (decreases), whereas the likelihood L_m(K′) of all K′ ∈ K such that q ∉ K′ decreases (increases). The likelihood function is updated at each step m + 1 of the assessment procedure by following a Bayesian updating rule:

L_{m+1}(K) = P(r_q | K) L_m(K) / Σ_{K′ ∈ K} P(r_q | K′) L_m(K′),    (2)

where the parameter P(r_q | K) represents the conditional probability of the observed response r_q to item q, given the knowledge state K. In the procedure by Falmagne and Doignon (1988), two types of probabilities are defined for each item q: a careless error probability β_q and a lucky guess probability η_q. Then, the parameter P(r_q | K) undergoes the following constraints:

P(r_q | K) = β_q if r_q = 0 and q ∈ K; 1 − η_q if r_q = 0 and q ∉ K; 1 − β_q if r_q = 1 and q ∈ K; η_q if r_q = 1 and q ∉ K.    (3)

Equation 3 is known as the response rule.
The procedure continues to select questions and to update the likelihood until a termination criterion is reached. The most used termination criterion consists of fixing a threshold p that has to be reached by the maximum of the likelihood distribution L_m. The minimum value of such a threshold is .50, because this is a sufficient condition for having a unimodal likelihood distribution. In general, the accuracy of the assessment improves when p approaches 1, and this occurs at the expense of efficiency. In fact, the larger p, the larger the expected number of questions that have to be administered.
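The three phases of the CMP described above can be sketched as follows: the half-split rule picks the question, the Bayesian updating rule with the careless-error/lucky-guess response rule revises the likelihood, and the loop stops once the maximum likelihood exceeds the threshold p. The toy domain, knowledge states, and parameter values are illustrative only:

```python
def half_split(L, domain):
    """Half-split questioning rule: pick q minimizing |2*L(K_q) - 1|."""
    return min(domain,
               key=lambda q: abs(2 * sum(p for K, p in L.items() if q in K) - 1))

def bayes_update(L, q, r, beta, eta):
    """Bayesian updating rule with the response rule (careless error beta,
    lucky guess eta); returns the renormalized likelihood distribution."""
    def p_resp(K):
        if q in K:
            return 1 - beta[q] if r == 1 else beta[q]
        return eta[q] if r == 1 else 1 - eta[q]
    new = {K: p_resp(K) * p for K, p in L.items()}
    z = sum(new.values())
    return {K: p / z for K, p in new.items()}

# Illustrative toy structure and parameters (not from the article).
states = [frozenset(), frozenset({"q1"}), frozenset({"q1", "q2"})]
domain = ["q1", "q2"]
beta = {"q1": 0.05, "q2": 0.05}
eta = {"q1": 0.05, "q2": 0.05}
latent = frozenset({"q1"})          # the state to be uncovered

L = {K: 1 / len(states) for K in states}
while max(L.values()) < 0.9:        # termination threshold p = .9
    q = half_split(L, domain)
    r = 1 if q in latent else 0     # error-free respondent, for simplicity
    L = bayes_update(L, q, r, beta, eta)
print(max(L, key=L.get))            # frozenset({'q1'})
```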
An alternative representation of this updating rule, also known as the multiplicative rule, is defined as follows:

L_{m+1}(K) = ζ^K_{q,r_q} L_m(K) / Σ_{K′ ∈ K} ζ^{K′}_{q,r_q} L_m(K′),    (4)

where the parameter ζ^K_{q,r_q} depends on the knowledge state K ∈ K, the problem q ∈ Q, and the observed response r_q. In particular, ζ^K_{q,r_q} is defined as follows:

ζ^K_{q,r_q} = ζ_{q,1} if r_q = 1 and q ∈ K; 1 if r_q = 1 and q ∉ K; 1 if r_q = 0 and q ∈ K; ζ_{q,0} if r_q = 0 and q ∉ K,    (5)

where ζ_{q,0}, ζ_{q,1} > 1 are real parameters of the assessment procedure. Moreover, Falmagne and Doignon (1988) have shown that the Bayesian updating rule is equivalent to the multiplicative rule under the following equalities, for each q ∈ Q:

ζ_{q,1} = (1 − β_q) / η_q and ζ_{q,0} = (1 − η_q) / β_q.    (6)

A latent knowledge state K_0 ∈ K is said to be uncoverable by the stochastic assessment procedure presented above if L_m(K_0) approaches 1 almost surely. Several theoretical results were obtained for the multiplicative updating rule. One of them is important here because it will be used in Section "Adaptive assessment in a problem space".
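The equivalence between the two rules can be verified numerically for a single item: on a correct answer, the Bayesian rule multiplies the likelihood of the states containing q by 1 − β_q and that of the others by η_q, while the multiplicative rule uses the factors ζ_{q,1} = (1 − β_q)/η_q and 1; after normalization the results coincide. A quick check with made-up parameter values:

```python
# Hypothetical numbers for a single item q.
beta_q, eta_q = 0.05, 0.10
zeta1 = (1 - beta_q) / eta_q   # factor for states containing q, correct answer
zeta0 = (1 - eta_q) / beta_q   # factor for states missing q, incorrect answer

# Likelihoods of one state containing q and one missing it.
L_in, L_out = 0.4, 0.6

# Bayesian rule after a correct answer (r = 1):
num_in, num_out = (1 - beta_q) * L_in, eta_q * L_out
z = num_in + num_out
bayes = (num_in / z, num_out / z)

# Multiplicative rule after the same answer:
num_in, num_out = zeta1 * L_in, 1.0 * L_out
z = num_in + num_out
mult = (num_in / z, num_out / z)

print(abs(bayes[0] - mult[0]) < 1e-12)  # True
```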
Proposition 1 A latent knowledge state is uncoverable by a stochastic assessment procedure whose updating rule is multiplicative and whose questioning rule is half-split.

The Markov solution process model
A Markov model of the solution process of a problem-solving task was proposed in Stefanutti et al. (2021). The model provides a stochastic framework for the deterministic models described in Section "Procedural knowledge space theory". It has been empirically validated for the case of the ToL test (Stefanutti et al., 2021), where it obtained a satisfactory goodness-of-fit.
A central notion for the application of the Markov model is that of a goal space, where each step of the solution process of a problem is classified as "correct" or "incorrect". A goal space is a problem space in which there are two special problem states f, g ∈ S, labeled the failure and the goal state, respectively. Every problem in a goal space has the form (s, g), with s ∈ S \ {f}. The formal definition of the goal space is as follows.
Definition 1 A goal space is a 5-tuple (S, f, g, Ω, •) such that (S, Ω, •) is a problem space, f, g ∈ S, and the following two conditions hold: (GS1) s • ω = s for every s ∈ {f, g} and every ω ∈ Ω; (GS2) for every s ∈ S \ {f} there is some σ ∈ Ω* such that s • σ = g.
It follows from Condition (GS1) of Definition 1 that f and g are final states. In particular, whenever the solution process of a problem enters either g or f, the problem is marked as "correct" or "incorrect", respectively, and the solution process terminates. According to Condition (GS2), each problem state different from f has a solution path that terminates in g. The graph represented in Fig. 2 is an example of a goal space, where s_9 is the goal state. The failure state is omitted in the figure, but it could easily be added as a state that can be reached from every non-goal state.
Let (Q, K) be the knowledge space derived from the goal space (S, f, g, Ω, •). The behavior of a problem solver in knowledge state K ∈ K, who is attempting to solve problem (s, g), is modeled as a random process S = {S_n : n ∈ Z+} that satisfies the following Markov property:

P(S_n = s_j | S_{n−1} = s_i, S_{n−2}, . . . , S_0 = s, K) = P(S_n = s_j | S_{n−1} = s_i, S_0 = s, K).    (7)

Property 7 says that, given the last visited problem state S_{n−1}, the knowledge state K of the problem solver, and the initial state s, the next problem state S_n is independent of the past history of the process. For the right-hand term of Eq. 7 the shortcut notation P(s_j | s_i, s, K) is used, which is named the transition probability from state s_i to state s_j.
Even with problem spaces and related knowledge spaces of moderate size, the number of transition probabilities of this type could be huge. The Markov solution process model provides a reasonable assumption that allows one to drastically reduce the number of free parameters of the model by introducing constraints on the transition probabilities. Let E = {(s, t) ∈ S^2 : s • ω = t for some ω ∈ Ω} be the collection of all the elementary problems (i.e., problems each of which can be solved by a single operation). Then the assumption is:

(MSP1) For every problem (s, g) ∈ Q, every pair (i, j) ∈ E, and every knowledge state K ∈ K,

P(s_j | s_i, s, K) = β_ij if (s, g) ∈ K; η_ij if (s, g) ∉ K,    (8)

where β_ij and η_ij are free parameters of the model. In the MSP1 assumption, given any pair (i, j) ∈ E, the value of the transition probability from i to j is either β_ij or η_ij, depending on whether the problem (s, g) belongs to K or not. In particular, if i is a transient problem state, then the parameter β_if is the probability that a problem solver who knows at least one solution path for (s, g) makes a careless error. Similarly, for j ≠ f, the parameter η_ij is the probability that a problem solver who does not know any solution path for (s, g) guesses a correct move from i. Further details of the MSPM are not presented here, since they are not needed in the sequel. For a complete exposition of the model, the reader is referred to Stefanutti et al. (2021).
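Under (MSP1), simulating one step of the solution process amounts to looking up β_ij or η_ij depending on whether the presented problem belongs to the knowledge state, and sampling the next problem state accordingly. A minimal sketch with hypothetical states and parameter values:

```python
import random

def msp1_prob(i, j, s0, g, K, beta, eta):
    """Transition probability from i to j under the MSP1 assumption:
    beta[i, j] if problem (s0, g) is in the knowledge state K, else eta[i, j]."""
    return beta[(i, j)] if (s0, g) in K else eta[(i, j)]

def sample_step(i, s0, g, K, succ, beta, eta):
    """Sample the next problem state from i; succ[i] lists the states
    reachable from i in one move (including the failure state f)."""
    weights = [msp1_prob(i, j, s0, g, K, beta, eta) for j in succ[i]]
    return random.choices(succ[i], weights=weights)[0]

# Hypothetical one-move goal-space fragment: from i, reach the goal g or fail.
succ = {"i": ["g", "f"]}
beta = {("i", "g"): 0.98, ("i", "f"): 0.02}   # careless error rate .02
eta = {("i", "g"): 0.10, ("i", "f"): 0.90}    # lucky guess rate .10
K = {("s0", "g")}                              # the solver knows this problem
print(sample_step("i", "s0", "g", K, succ, beta, eta))  # "g" with probability .98
```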

Adaptive assessment in a problem space
In many psychological tests (e.g., the Tower of London test, the Tower of Hanoi, the mental rotation task), the different tasks are accomplished by performing a sequence of observable moves. The CMP described in "The continuous Markov procedure" is based on dichotomous answers (i.e., correct or incorrect) and has no mechanism for capitalizing on the information provided by the observable solution process. The following example shows the drawbacks of this limitation.
Example 1 Consider the knowledge space K_ToL derived in the running example in "Procedural knowledge space theory". Suppose that the state of a problem solver is {s_1, s_3, s_4, s_7, s_8}, and that the CMP is applied for uncovering it. The beta parameters for the five problems are assumed to be β_s1 = .004, β_s3 = .03, β_s4 = .02, β_s7 = .01, and β_s8 = .007, whereas the eta parameters are assumed to be η_s1 = 10^−6, η_s3 = 5 × 10^−5, η_s4 = 4 × 10^−5, η_s7 = .007, and η_s8 = .08. At the beginning of the assessment (m = 0), all of the knowledge states K ∈ K_ToL have the same likelihood L_0(K) = 1/|K_ToL| (see the second column of Table 1).
At step m = 1, according to the half-split questioning rule, problem s_4 is selected because it minimizes the value of Q_m (see the second column of Table 2). Suppose that a correct response is obtained for this problem. After an application of the updating rule (Eq. 4), the likelihood of every knowledge state K ∈ K_s4 that contains problem s_4 is L_1(K) = .17, whereas that of every state K′ ∈ K \ K_s4 is L_1(K′) = .01 (see the third column of Table 1).
At step m = 2, the problem that minimizes the half-split questioning rule is s_1, as shown in the third column of Table 2. Suppose that the correct solution process (s_1, s_3, s_5, s_7, s_8, s_9) is observed for the problem. An application of the updating rule yields the likelihood distribution L_2, which is shown in the fourth column of Table 1. The knowledge states in the intersection K_s1 ∩ K_s4 have a larger likelihood (i.e., .32) than that of every other knowledge state.
Table 1 Values of the likelihood distribution L_m at each step m of the assessment procedure

It should be noticed that the observed solution path of problem s_1 contains those of both problems s_3 and s_7. According to the sub-path assumption, if s_1 is contained in the knowledge state of the learner, both s_3 and s_7 must be contained in it. Under this assumption, a knowledge state containing all three problems s_1, s_3, and s_7 should have a higher likelihood than a knowledge state that misses any one of them. As can be seen from the fourth column of Table 1, this does not happen in the CMP. For instance, L_2({s_1, s_4, s_8}) = .32 > .01 = L_2({s_1, s_3, s_7, s_8}). This shows that in the CMP there is no mechanism for exploiting the surplus of information that is made available by the observed solution process, and that a new updating rule is needed for this.
To complete the example, one further question is required at step m = 3. The half-split questioning rule selects problem s_3 (see the fourth column of Table 2). Suppose that a correct response is obtained. After the last update of the likelihood, the recovered knowledge state turns out to be {s_1, s_3, s_4, s_7, s_8}.

Updating rules
The assessment procedures proposed in this article are capable of exploiting the whole observable solution process in updating the likelihood of the knowledge states. The assessment procedures consist of two nested loops. The outer loop starts with the presentation of a new problem (s_m,0, g) ∈ Q, where m ≥ 1, whereas the inner loop starts with a new problem state s_m,n ∈ S, with n ≥ 0, in the solution process of (s_m,0, g). For every new problem state s_m,n of the solution process of problem (s_m,0, g), the likelihood distribution L_m,n is updated as follows:

L_m,n+1(K) = P(s_m,n+1 | s_m,n, s_m,0, K) L_m,n(K) / Σ_{K′ ∈ K} P(s_m,n+1 | s_m,n, s_m,0, K′) L_m,n(K′),    (9)

where P(s_m,n+1 | s_m,n, s_m,0, K) is the conditional probability of the transition from s_m,n to s_m,n+1, given knowledge state K ∈ K and problem (s_m,0, g). It should be noted that Eq. 9 is nothing else than an adaptation of Doignon and Falmagne's Bayesian updating rule described in Eq. 2.
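The updating rule of Eq. 9 can be sketched as a reweighting of the likelihood by the probability of each observed transition. The sketch below is schematic: `trans_prob` stands in for whichever parameterization of the transition probabilities is adopted, and the states and numerical values are made up:

```python
def update_on_transition(L, trans_prob, i, j, s0):
    """One application of Eq. 9: multiply the likelihood of each knowledge
    state K by P(j | i, s0, K) and renormalize."""
    new = {K: trans_prob(j, i, s0, K) * p for K, p in L.items()}
    z = sum(new.values())
    return {K: p / z for K, p in new.items()}

# Two hypothetical knowledge states: K1 contains the presented problem, K2 doesn't.
K1, K2 = frozenset({"s0"}), frozenset()

def trans_prob(j, i, s0, K):
    # Correct moves are likely (.95) for a solver who knows the problem,
    # unlikely (.10) otherwise; illustrative values only.
    return 0.95 if s0 in K else 0.10

L = {K1: 0.5, K2: 0.5}
L = update_on_transition(L, trans_prob, "s1", "s2", "s0")
print(round(L[K1], 3))  # 0.905
```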
As stated in Section "The Markov solution process model", specific assumptions can be introduced on the conditional probability P(s_m,n+1 | s_m,n, s_m,0, K) in order to reduce the number of parameters. One of these assumptions is (MSP1), described in Eq. 8. It is recalled that, in this assumption, the dependence of the transition probability from i to j on the knowledge state is through the initial problem state s_0 only. Two new assumptions, denoted by (MSP2) and (MSP3), are presented below.
Assumption (MSP2) differs from (MSP1) in that the transition probability from a problem state i to another problem state j is independent of the initial problem state s_0. Under this assumption, for every problem (s_0, g) ∈ Q, every pair (i, j) ∈ E, and every knowledge state K ∈ K, the transition probability is

P(s_j | s_i, s_0, K) = β_ij if (i, g) ∈ K; η_ij if (i, g) ∉ K.

Such a probability is a β_ij parameter if problem i belongs to the knowledge state K ∈ K, and it is an η_ij parameter otherwise.
According to assumption (MSP3), the transition probability from a problem state i to another problem state j depends on whether both problems s_0 and i belong to the knowledge state K ∈ K. For every problem (s_0, g) ∈ Q, every pair (i, j) ∈ E, and every knowledge state K ∈ K, the transition probability is

P(s_j | s_i, s_0, K) = β_ij if (s_0, g) ∈ K and (i, g) ∈ K; η_ij otherwise.

In particular, the probability of the transition from i to j is a β_ij parameter if the individual knows at least one solution path for both problems s_0 and i. Otherwise, the transition probability is an η_ij parameter.
The three different assumptions are plausible in different situations. The MSP1 assumption is plausible when a problem solver plans the whole solution process of the problem in advance and every single move sticks to the initial plan. For this reason, (MSP1) can be regarded as a pre-planning assumption. On the other hand, the MSP2 assumption allows interim planning. It might well be that an initial plan is built, but that it changes along the way. Thus, the transition from a problem state i to another one depends on problem state i only. For this reason, (MSP2) can be regarded as an interim-planning assumption. Finally, according to assumption MSP3, a correct solution to the problem depends on both the initial (s_0) and the current (i) problem state. In particular, a transition probability is a β_ij parameter if and only if both s_0 and i belong to the knowledge state. In this sense, (MSP3) combines MSP1 and MSP2 like a Boolean "AND" operator on the β_ij. Given this interpretation, (MSP3) can be named a mixed-planning assumption.
Table 3 summarizes the parameters obtained under the three assumptions as a function of the initial and the current problem states (columns 1 and 2 in the table). For instance, if s_0 ∈ K and i ∉ K (row 3 in the table), under assumption MSP1 the transition probability from i to j is a β_ij parameter, whereas under assumptions MSP2 and MSP3 the same transition has an η_ij parameter. It is worth mentioning that other assumptions are possible, for instance one that behaves like a Boolean "OR" operator on the β_ij. However, such assumptions are not considered in this research. When applied to the MSPM, the three assumptions MSP1, MSP2, and MSP3 give rise to three different models, henceforth named MSPM1, MSPM2, and MSPM3, respectively.
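The row logic of Table 3 can be sketched in code. The following is an illustrative rendering, not the authors' implementation; the function name and the Boolean encoding are ours. It returns which parameter family (β or η) governs a transition out of the current problem state i, given whether the initial problem s_0 and the current problem i belong to the knowledge state K.

```python
def transition_param(s0_in_K: bool, i_in_K: bool, assumption: str) -> str:
    """Return which parameter family ('beta' or 'eta') governs the
    transition from the current problem state i, under one of the three
    planning assumptions."""
    if assumption == "MSP1":  # pre-planning: only the initial problem s0 matters
        return "beta" if s0_in_K else "eta"
    if assumption == "MSP2":  # interim-planning: only the current problem i matters
        return "beta" if i_in_K else "eta"
    if assumption == "MSP3":  # mixed-planning: Boolean AND of the two conditions
        return "beta" if (s0_in_K and i_in_K) else "eta"
    raise ValueError(f"unknown assumption: {assumption}")
```

For example, when s_0 ∈ K but i ∉ K (row 3 of Table 3), MSP1 yields a β parameter while MSP2 and MSP3 yield an η parameter.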

Procedures based on the Markov solution process model
In this section, an MSP-based adaptive assessment procedure is presented that is based on the updating rule shown in Eq. 9. It is worth noticing that this procedure can be applied with any of the MSP1, MSP2, and MSP3 assumptions (and it is open to other assumptions).
Figure 3 illustrates the flowchart of the procedure. The assessment procedure consists of two nested loops. The outer loop starts with the presentation of a new problem (s_m,0, g) ∈ Q, where m ≥ 1, whereas the inner loop starts with a new problem state s_m,n ∈ S, with n ≥ 0, in the solution process of the problem (s_m,0, g).
At the beginning of the assessment (i.e., m = 0 and n = 0), the likelihood L_0,0 is set to be a uniform distribution over the knowledge states. Starting from L_0,0, the assessment is carried out iteratively. At each step m, the likelihood is initialized as L_m,0 = L_{m−1,n}, and a problem (s_m,0, g) ∈ Q is selected according to the questioning rule. In this work, the half-split questioning rule presented in Eq. 1 has been used. The updating rule described by Eq. 9 obtains L_m,n+1 from L_m,n, given that the current problem is (s_m,0, g) and the observed problem state in the solution process is s_m,n+1.

Example 2. Consider the problem space depicted in Fig. 2 and the knowledge space K_ToL derived in the running example in Section "Procedural knowledge space theory". Suppose that the MSP-based procedure, with the mixed-planning assumption, is applied to uncover the knowledge state {s_1, s_3, s_4, s_7, s_8} of the same problem solver introduced in Example 1. Table 4 shows the β_ij (third column) and η_ij (fourth column) parameters assumed in this example; each row shows the transition probabilities for the corresponding transition. At the beginning of the assessment (i.e., m = 0 and n = 0), all of the knowledge states K ∈ K_ToL have the same likelihood L_0(K) = 1/|K_ToL| (see the second column of Table 5).
According to the half-split questioning rule, at step m = 1 and n = 0, problem s_4 is selected because it minimizes the value of Q_m (see the second column of Table 2). Suppose that at step m = 1 and n = 3 the correct solution process (s_4, s_6, s_8, s_9) is observed for problem s_4. After three updates of the likelihood distribution (one for each move), the likelihood of every knowledge state K that contains both problems s_4 and s_8 is L_1(K) = .17, whereas that of every knowledge state K′ containing neither s_4 nor s_8 is L_1(K′) = 0 (see the third column of Table 5).
At step m = 2 and n = 0, the half-split questioning rule selects problem s_1, as shown in the third column of Table 2. Suppose that the correct solution process (s_1, s_3, s_5, s_7, s_8, s_9) is observed for the problem. At the last sub-step, n = 5, the likelihood has been updated five times and the knowledge state {s_1, s_3, s_4, s_7, s_8} obtains the largest likelihood, as shown in the third column of Table 5. This was also the last question asked by the procedure, because the maximum likelihood exceeded the termination criterion of .5. Thus, the MSP-based procedure inferred the knowledge state of the problem solver in two questions out of five. Comparing this example with Example 1, it can be noticed that the MSP-based procedure is more efficient than the CMP, even in this trivial example. Indeed, the CMP requires one more question to terminate. This is because the proposed procedure exploits the fact that, according to the sub-paths assumption, if s_1 is contained in the knowledge state of the problem solver, then both s_3 and s_5 must be contained as well.
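One outer-loop step of the procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the half-split rule is rendered as selecting the problem whose total likelihood mass of containing states is closest to .5, and the updating rule of Eq. 9 is rendered generically as a multiplicative (Bayes-like) update in which each observed transition (i, j) multiplies the likelihood of a state K by β_ij or η_ij, with the planning assumption passed in as a predicate.

```python
def half_split(likelihood, problems):
    """Half-split questioning rule (one common form): pick the problem
    whose total likelihood of containing states is closest to .5."""
    def mass(q):
        return sum(p for K, p in likelihood.items() if q in K)
    return min(problems, key=lambda q: abs(mass(q) - 0.5))

def update(likelihood, s0, moves, beta, eta, in_state):
    """Multiply each state's likelihood by beta_ij or eta_ij for every
    observed transition (i, j), then renormalize. The predicate
    in_state(K, s0, i) encodes the planning assumption (MSP1/2/3)."""
    new = {}
    for K, p in likelihood.items():
        for i, j in zip(moves, moves[1:]):
            p *= beta[(i, j)] if in_state(K, s0, i) else eta[(i, j)]
        new[K] = p
    total = sum(new.values())
    return {K: p / total for K, p in new.items()}
```

For instance, with two equally likely states {s_1, s_2} and {s_2}, the half-split rule asks s_1; observing the move s_1 → g then shifts the likelihood toward the states containing s_1.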
To show that a latent knowledge state K_0 ∈ K can be uncovered by the MSP-based procedures, it suffices to show that the updating rule in Eq. 9 is multiplicative.
Theorem 1. The updating rule in Eq. 9 is multiplicative if and only if β_ij > η_ij for all transitions (i, j) ∈ E.

Proof. For i ∈ S, let E(i) = {j ∈ S : (i, j) ∈ E}, and define the indicator function R accordingly. We aim to show that the equality in Eq. 14 holds true. There are four cases. Case 1 is ι_K(s_0) = 1 and R(j) = 1. By applying Eq. 15, the right-hand side of Eq. 14 reduces to the MSP-based updating rule for Case 1. We omit the proof for each of the remaining three cases (Case 2: ι_K(s_0) = 1 and R(j) = 0; Case 3: ι_K(s_0) = 0 and R(j) = 1; Case 4: ι_K(s_0) = 0 and R(j) = 0), because they can be obtained by applying the obvious substitutions.

Simulation study
The aim of the study was to compare the three adaptive procedures, based on the MSP1, MSP2, and MSP3 assumptions, with one another. Moreover, the performance of the three procedures was compared with that of the better-known and more widely used CMP. The comparison was made in terms of efficiency and accuracy.

Goal spaces of the Tower of London
The assessment procedures described in this research are general purpose, as far as the procedural assessment of knowledge is concerned. In the following studies, they are applied to the case of the ToL test. Since a problem is correctly solved only if the solution is obtained with a minimum number of moves, the goal space of the ToL happens to be a special case called shortest path space (SP space; Stefanutti et al., 2021). This type of goal space often arises in applications like the ToL. Further considerations and properties of the SP spaces, as well as an accurate description of the goal spaces and knowledge spaces used in this application, can be found in Stefanutti et al. (2021).
The goal spaces considered in this study were obtained by setting problem state 31 as the goal state (see Fig. 4).
Figure 4 displays the two goal spaces P(1)_g and P(2)_g used in both simulation studies; it is recalled that both goal spaces are shortest-path spaces. The goal space P(1)_g is represented in Fig. 4 using solid lines. It is composed of 12 problem states plus the goal and the failure states, and it involves 12 problems. One of them was removed because of its easiness (only one move was required to solve it). Thus, the domain Q(1)_g of the goal space P(1)_g contains 11 problems, three of them having two alternative solutions. The other goal space, P(2)_g, was obtained from the problem space by setting problem state 31 as the goal state (both dotted and solid lines in Fig. 4). The number of problems involved in this goal space is 35; however, all problems requiring only one move were removed. Thus, the set of problems Q(2)_g of this goal space contains 31 problems, 11 of them having more than one solution. The two goal spaces delineate two knowledge spaces: K_1, with 61 knowledge states, and K_2, with 242,498 knowledge states.

Simulation design and data set generation
Table 6 shows the simulation design used for generating the data sets.
The manipulated variables were: (i) the generative model, which could be the MSPM1, the MSPM2, or the MSPM3; (ii) the true knowledge structure, which could be K_1, composed of 61 states, or K_2, composed of 242,498 states; (iii) the maximum amount of error in the data, which was .01 or .20; and (iv) the sample size N, which could be 155 or 1000 when the considered structure was K_1, and 1000 or 100,000 when the structure was K_2.
Concerning the knowledge structures, the choice was to use one feasible structure (K_1) and one huge structure (K_2). The former was the structure derived from the goal space P(1)_g. This structure has also been considered for collecting the real data used in the study presented in "Simulation study based on real data". The latter structure was derived from the goal space P(2)_g. In Table 6, column 1 displays the condition number, column 2 the assumption underlying the data generation, column 3 the knowledge space that was used, column 4 the maximum amount of error used for generating the data, and column 5 the sample size.

The "amount of error" in the data was manipulated through the two types of parameters, β_ij and η_ij, that are present in all three models. The values of these parameters used for generating the data were exactly the same for all models. They were generated in the following way. For i ∈ S_g \ {f, g}, first the probabilities β_if and η_if were extracted at random from uniform distributions in the intervals (0, x] and (0, 1 − x], respectively, where x ∈ {.01, .20}. These two intervals were chosen in order to have a situation with very small error in the data (the former case) and a situation with large error in the data (the latter case). We recall, in fact, that β_if is interpreted as a careless error probability and that, for j ≠ f, η_ij is interpreted as a lucky guess probability. Then, the probabilities β_ij and η_ij, with j ≠ f, were generated at random from a uniform distribution in the interval (0, 1), and then normalized to sum up to 1 − β_if and 1 − η_if, respectively.
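The parameter-generation scheme just described can be sketched as follows. This is our illustrative rendering under the stated assumptions (names such as generate_params and the successors mapping are ours); for each non-terminal state it draws the failure probabilities and spreads the remaining mass at random over the other outgoing transitions.

```python
import random

def generate_params(states, successors, x, failure="f", rng=None):
    """For each non-terminal state i, draw beta_if in (0, x] and
    eta_if in (0, 1 - x] (endpoint handling approximate), then spread
    the remaining probability mass uniformly at random over the other
    outgoing transitions. successors[i] lists states j with (i, j) in E."""
    rng = rng or random.Random()
    beta, eta = {}, {}
    for i in states:
        out = [j for j in successors[i] if j != failure]
        beta[(i, failure)] = rng.uniform(0, x)       # careless-error probability
        eta[(i, failure)] = rng.uniform(0, 1 - x)    # failure probability when unskilled
        for params in (beta, eta):
            fail_p = params[(i, failure)]
            raw = [rng.random() for _ in out]
            total = sum(raw)
            for j, r in zip(out, raw):
                # normalize the non-failure transitions to sum to 1 - fail_p
                params[(i, j)] = r / total * (1 - fail_p)
    return beta, eta
```

By construction, each row of transition probabilities sums to 1 for both parameter families.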
In all, 3 × 2 × 2 × 2 = 24 different conditions were considered and, in each of them, one sample was generated. The procedure used for generating the samples is described below.
Each simulated response pattern corresponded to a collection of jump matrices J_q, one for each item q ∈ Q_g. Moreover, every single "simulated subject" is represented by a pair (J, K), where K is a knowledge state and J is a response pattern. In the sequel, the response pattern J is referred to as the "response pattern generated by the true state K".
For generating the pair (J, K), the procedure started with the extraction of K from the knowledge structure K with a certain probability. More precisely, for each state K ∈ K, a random number was extracted from a uniform distribution in the (0, 1) interval. The set of values so obtained was normalized to sum up to 1. In this way, a random probability distribution π_K was generated, which determined the extraction probability of each state. The knowledge states extracted at each iteration and the probability distribution π_K were kept fixed across conditions 1 to 12, when the true knowledge structure was K_1, and across conditions 13 to 24, when the true knowledge structure was K_2.
Given the knowledge state K, the response pattern J was obtained as follows. For each item q ∈ Q_g, a sequence of moves J_q = (s_1, s_2, ..., s_i, ..., s_n) was generated iteratively, as explained below. For each i ∈ {1, 2, ..., n − 1}, problem state s_{i+1} was randomly generated under different rules, depending on the generative model. Under model MSPM1, P(s_{i+1} | s_i, s_1, K) = β_{s_i, s_{i+1}} whenever (s_1, g) ∈ K, and P(s_{i+1} | s_i, s_1, K) = η_{s_i, s_{i+1}} otherwise; under models MSPM2 and MSPM3, the transition probabilities were selected according to assumptions (MSP2) and (MSP3), respectively. For each item, the iterations terminated when one of the two problem states f (failure) or g (goal) was entered. It is worth noticing that the termination of each iteration was assured by the fact that P(1)_g and P(2)_g are goal spaces. This procedure was applied iteratively until N pairs (J, K) were obtained for each generative model. In the end, three types of data were obtained: D_1, generated under the MSPM1; D_2, generated under the MSPM2; and D_3, generated under the MSPM3.
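The generation of a single solution process can be sketched as follows, here under the pre-planning assumption (MSP1) only, for brevity. This is an illustrative sketch (function and argument names are ours): starting from the initial state, it repeatedly samples the next state from the β row if the initial problem belongs to K, and from the η row otherwise, until the goal or failure state is entered.

```python
import random

def simulate_path(s0, K, beta, eta, successors, goal="g", failure="f", rng=None):
    """Generate one solution process for problem (s0, g) under the
    pre-planning assumption (MSP1): every transition uses the beta
    parameters if the initial problem s0 belongs to K, eta otherwise."""
    rng = rng or random.Random()
    params = beta if s0 in K else eta
    path, i = [s0], s0
    while i not in (goal, failure):
        nxt = successors[i]
        weights = [params[(i, j)] for j in nxt]
        i = rng.choices(nxt, weights=weights)[0]  # sample the next problem state
        path.append(i)
    return path
```

With degenerate parameters (all mass on one transition), the sampled path is deterministic, which makes the behavior easy to check.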
With the aim of applying the CMP adaptive procedure, the simulated response patterns belonging to D_1, D_2, and D_3 were "dichotomized", obtaining data set D_4. For each problem, only the accuracy (correct vs. incorrect) was considered. In more detail, if (s_1, ..., s_n) represents the observed sequence of moves for problem q, then the "dichotomous" answer to q was marked as "correct" if s_n = g and as "incorrect" if s_n = f.
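The dichotomization step is straightforward; a minimal sketch (illustrative function name):

```python
def dichotomize(move_sequence, goal="g", failure="f"):
    """Collapse an observed sequence of moves into the correct/incorrect
    answer used by the CMP: correct iff the last state is the goal."""
    last = move_sequence[-1]
    if last == goal:
        return 1
    if last == failure:
        return 0
    raise ValueError("sequence must terminate in the goal or failure state")
```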

Methods
The procedures based on MSP1, MSP2, and MSP3 were applied to each of the 24 samples (one sample per simulation condition). Moreover, the dichotomous version of each sample was used with the CMP's adaptive procedure. Thus, each sample was used with four different procedures.
All four adaptive procedures were applied to the simulated response patterns in the following way. Let w ∈ {1, 2, ..., N}, and let J_w denote the w-th simulated subject. For each J_w, each step m of the assessment, with m ∈ {1, 2, ..., |Q_g|}, consisted of updating the knowledge state likelihood L(m). This update depended on the response to the problem q selected by the procedure at that step. Thus, m increased with the number of problems asked and not with state transitions. The response to problem q was read from the simulated samples D_1, D_2, D_3, and D_4 (in which it had been stored in advance), respectively, when the adaptive procedure based on the MSP1, the MSP2, the MSP3, and the CMP was considered.
At each step m of a particular procedure, the modal knowledge state K̂_w,m of the simulated subject J_w was estimated. The estimation procedure consisted of taking the state K ∈ K for which the likelihood L_w,m was maximum. When max(L_w,m) > .50, a unique K̂_w,m existed; otherwise, the modal knowledge state may not be unique. In such a case, the only way to assign a knowledge state to a subject is a random choice among the modal states.
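The estimation of the modal state with a random tie-break can be sketched as follows (illustrative names):

```python
import random

def modal_state(likelihood, rng=None):
    """Return the modal knowledge state: the state maximizing the
    likelihood; ties are broken uniformly at random."""
    rng = rng or random.Random()
    best = max(likelihood.values())
    modes = [K for K, p in likelihood.items() if p == best]
    return modes[0] if len(modes) == 1 else rng.choice(modes)
```

When the maximum likelihood exceeds .50, the mode is necessarily unique, so the random choice only matters below that threshold.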
For each condition of the simulation design, the accuracy and the efficiency of the procedures have been analyzed at each step m of the assessment by using several performance indexes.

Performance accuracy indexes
Concerning the accuracy, two performance indexes were considered for each procedure:

1. The average Hamming distance D̄_m(K_w, K̂_w,m), computed as

D̄_m = (1/N) Σ_{w=1}^{N} |K_w △ K̂_w,m|,

where △ represents the symmetric set difference.
2. The true-positive rate TPR computed at the end of the assessment, that is, the proportion of pairs (J_w, K_w) for which K̂_w,m = K_w, with m = |Q_g|.
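Both accuracy indexes can be computed directly from the pairs of true and estimated states; a minimal sketch (illustrative names), with knowledge states represented as Python sets:

```python
def hamming(K, K_hat):
    """Hamming distance between two knowledge states: the cardinality
    of their symmetric set difference."""
    return len(K ^ K_hat)

def average_hamming(pairs):
    """Average Hamming distance over (true, estimated) state pairs."""
    return sum(hamming(K, K_hat) for K, K_hat in pairs) / len(pairs)

def true_positive_rate(pairs):
    """Proportion of pairs in which the estimated state equals the true one."""
    return sum(K == K_hat for K, K_hat in pairs) / len(pairs)
```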

Performance efficiency indexes
The efficiency of each procedure was measured by three indexes. For each participant w, the number of problems m_w asked until the termination criterion L_w,m(K̂_w,m) > .50 was reached was recorded. This index has a frequency distribution in the simulated data set, with the set {1, 2, ..., |Q_g|} as its support. Two of the three efficiency indexes considered in this study were the mean m̄ of this distribution and its cumulative distribution.
The last index was Shannon's entropy (Shannon, 1948). This metric is used in information theory for quantifying the "amount of information" contained in a variable, in terms of the number of bits it takes to store the variable. In the context of computing the efficiency of an adaptive assessment procedure, this metric informs on how many "bits of information" are missing for having the maximal information on the whole test, where each bit of information is an item of the test. It was computed as

H_w,m = −Σ_{K ∈ K} L_w,m(K) log_2 L_w,m(K).

The average H̄_m of this quantity was computed across all simulated subjects for each number m of questions asked.
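Assuming, as above, that the entropy is taken over the likelihood distribution on the knowledge states, the index can be sketched as:

```python
from math import log2

def entropy(likelihood):
    """Shannon entropy (in bits) of a likelihood distribution over
    knowledge states; the 0 * log2(0) terms are taken to be 0."""
    return -sum(p * log2(p) for p in likelihood.values() if p > 0)
```

A uniform distribution over 2^k states yields k bits; a point mass yields 0, i.e., the state is fully determined.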

Accuracy
Figure 5 shows the results obtained on the accuracy of the procedures when the average Hamming distance is used as the performance index and K_1 is the considered knowledge structure. In the figure, panels to the left refer to conditions in which the maximum amount of error in the data was .01 (named, in the figure, low error conditions). Panels to the right refer to simulation conditions in which the maximum amount of error in the data was .20 (named high error conditions). Row panels refer to the model used for generating the data, which is MSPM1, MSPM2, and MSPM3, respectively, from the top to the bottom of the figure. In each panel, the number m of problems asked by a procedure is along the x-axis, and the average Hamming distance D̄_m(K_w, K̂_w,m) is along the y-axis. The smaller the distance, the better the performance.
As expected of an adaptive assessment procedure, the average Hamming distance decreases as the number of questions asked increases. This is true for all procedures, irrespective of the amount of error in the data and of the generative model. Another quite evident result is that, among the four procedures, the CMP is the one most susceptible to noise. Indeed, the difference in performance between conditions with low error in the data and conditions with high error is the greatest for this model. As for the other models, the effect of the amount of error in the data can be seen in the values of D̄_m(K_w, K̂_w,m) reached by the procedures at each step m of the assessment and, most notably, at the end (m = 11). Indeed, for all procedures, irrespective of the generative model, in conditions with low error in the data (panels to the left), the average Hamming distance is lower than in conditions with high error in the data (panels to the right). It approaches 0 only when the amount of error in the data is very low, to an extent that depends on the generative model.
Interestingly enough, when the generative model is the MSPM1, in the low-error condition both the MSP1-based procedure and the CMP terminate with a distance D̄_11(K_w, K̂_w,11) = 0, whereas the other two procedures have a slightly worse performance. A different result is obtained when the generative model is the MSPM2 or the MSPM3. Indeed, in these conditions, D̄_11(K_w, K̂_w,11) reaches zero with the MSP1-, MSP2-, and MSP3-based procedures, whereas it is higher for the CMP.
The effect of the sample size on the Hamming distance is negligible (see Fig. 1 in the supplementary material of the article).
The results on the Hamming distance between the true state K and the estimated state K̂_m are better understood if considered along with the true-positive rate.
Figure 6 displays the results of the procedures' accuracy in terms of true-positive rate, when the knowledge structure was K_1. Panels to the top refer to conditions in which the sample size was N = 155, and those to the bottom refer to conditions with N = 1000. In each panel, the three generative models are along the x-axis and the true-positive rate is along the y-axis.
What clearly emerges is that the TPR of the CMP-based procedure is almost always lower than that of the MSP-based procedures. Its performance is as good as that of the MSP1, and higher than those of the MSP2 and MSP3, in only two of the 24 conditions of the simulation design, namely when the generative model is the MSPM1 and the amount of error in the data is low. Not surprisingly, these two conditions are very favorable for the CMP.
In conditions with low error in the data (panels to the left in Fig. 6), the MSP2- and MSP3-based procedures perform equally well, reaching a TPR = 1.00 when they are the generative models. Instead, their performances are worse than those of the other two procedures when the generative model is the MSP1. In conditions with high error in the data (panels to the right), the performances of all procedures worsen. In these conditions, the CMP is able to find the true knowledge state of the patterns in fewer than 50% of the cases. Also for the TPR, the effect of the sample size on the procedures' accuracy seems negligible. Indeed, the bottom panels of Fig. 6 are almost the same as those to the top.
Reading the results on the Hamming distance and the TPR jointly, some interesting insights emerge about the performance of the procedures when they are applied in a condition in which they are not the generative model. If the generative model is the MSPM1, both the MSP2 and the MSP3 procedures perform very well in terms of Hamming distance (their performances are very similar to those of the MSP1), but less well in terms of TPR (about 20% worse than the MSP1). Conversely, when the generative model is the MSPM2 or the MSPM3, the performance of the MSP1 is quite good in terms of TPR (about 10% worse than the other two), but worse in terms of Hamming distance. Thus, it seems that although the MSP2 and MSP3 procedures have a lower TPR than the MSP1 (they fail more often), they estimate a knowledge state that is closer to the true one in terms of Hamming distance.
Concerning Conditions 13 to 24, where the knowledge structure K_2 with 242,498 states was used, very similar results to those described above were obtained (panels on the left of Figs. 3 and 6 in the supplementary material). In these conditions, the only obvious differences are in the values of the performance indexes reached by the procedures. In fact, the domain of K_2 was composed of 31 problems (versus the 11 problems belonging to the domain of K_1). The increase in the number of problems necessarily affects both the accuracy and the efficiency of the procedures. Nevertheless, in proportion, the results are almost the same for all the performance indexes.

Efficiency
Figure 7 shows the results on the efficiency of the procedures in terms of the proportion of subjects p_m (y-axis) that reached the termination criterion L_w,m(K̂_w,m) ≥ .50 at a particular step m (x-axis) of the assessment. The results refer to conditions with low error in the data (panels to the left) and with high error in the data (panels to the right), when the sample size is 155 and the structure is K_1.
Interestingly enough, MSP2 and MSP3 perform better than the MSP1 and the CMP in almost all conditions, irrespective of the generative model and the amount of error in the data. In conditions with low error in the data, a proportion of simulated subjects greater than 80% reaches the termination criterion with MSP2 and MSP3 after only five questions, even when they are not the generative model. For the other two models, at least one more question is needed to reach the same proportion of the sample. It is worth noticing that in conditions with high error in the data, the performance of the CMP is markedly worse. Indeed, less than 20% of the sample reaches the termination criterion by the end of the assessment. At the end of the assessment, the other three procedures approach 100% of the sample when the amount of error is small, and a percentage greater than 80% when it is high. The effect of the sample size on this efficiency index is negligible (see Fig. 2 in the supplementary material of the article).
To conclude, the efficiency of the adaptive procedures in terms of average entropy H̄_m is displayed in Fig. 8. The figure is read exactly like Fig. 5, with the only difference that the average entropy H̄_m is displayed along the y-axis.
It can be seen that this index monotonically decreases as the number of problems asked increases. This is true irrespective of the generative model and the amount of error in the data. What emerges very clearly is that, when the amount of error in the data was high (panels to the right), the procedure based on the CMP performed worse than the other three in all conditions. When the amount of error in the data was low (panels to the left), the CMP and the MSP1 performed very similarly to one another, but worse than the MSP2 and the MSP3 procedures. Thus, this statistic also suggests that the MSP2 and the MSP3 procedures are more efficient than the other two.
Concerning Conditions 13 to 24, where the knowledge structure K_2 with 242,498 states was used, the entropy shows acceptable results (Figs. 4 and 7 of the supplementary material); however, the proportion of subjects that reach the termination criterion (p_m ≥ .5) is rather poor when the error is high (right panels in Figs. 5 and 8 of the supplementary material). This could be due to the interaction between two factors, namely the huge size of the knowledge space and the high error level used in the simulation. In these conditions, a likelihood as large as .5 would hardly be reached by any assessment procedure. In such a situation, this criterion may be too strong and could be replaced by a weaker one, such as the following: stop whenever a single modal state is obtained.

Discussion
Compared with the performance of the CMP, those of the MSP1, MSP2, and MSP3 are sharply superior, especially as the amount of error in the data increases. Indeed, the results on both the accuracy and the efficiency showed that the adaptive assessment procedure based on the CMP is more susceptible to noise than the other three.
As for the comparison among the three MSP-based procedures, a clear superiority of one of them did not emerge. Nevertheless, it can be stated that the MSP2 and MSP3 are less affected by the assumptions behind the data. In fact, they performed quite well, both in terms of accuracy and efficiency, even when the generative model was the MSPM1.

Simulation study based on real data
The aim of this study was to test the three adaptive procedures (MSP1, MSP2, and MSP3) on real data. To this aim, a pre-existing data set (Stefanutti et al., 2021) was used, consisting of the responses of 154 subjects to the set Q_g of 31 ToL problems, collected via a computerized version of the ToL. Among the 31 problems, only 11 were used, namely those belonging to the domain Q(1)_g. Thus, only the goal space P(1)_g, and the knowledge structure K_1 delineated by it, were considered here (see "Goal spaces of the Tower of London" for more details). Goal space P(2)_g and the corresponding knowledge space K_2 were not considered in this study because the cardinality of K_2 (242,498) was too large for a sample of size 154 (as resulted from the previous simulation study).

Material and Data
The administration of the ToL is only briefly summarized here, describing its most important features. For details, the reader is referred to Stefanutti et al. (2021).
To each participant, the ToL problems were administered in a randomized order via the computerized version of the ToL developed by the authors. Participants were given the following instructions: (a) solve the problems with a minimum number of moves; (b) plan in advance; (c) be as fast as possible. For every problem, the computerized ToL recorded each move until the participant made an error or correctly solved the problem. A move was considered an error whenever it reached a problem state lying outside a minimum-length solution path.

Figure 9 displays the efficiency of the adaptive procedures applied to the real data. In the upper panels, the efficiency is given in terms of average entropy H̄_m (y-axis) at each step m of the assessment (x-axis). In the lower panels, the efficiency is given in terms of the proportion p_m of subjects reaching the termination criterion (y-axis) at each step m of the assessment (x-axis). In these conditions, the MSP-based procedures outperform the CMP. Regarding the efficiency, the MSP2- and MSP3-based procedures performed better than the other two in almost all the conditions.
In the second simulation study, the procedures were applied to a real data set of 154 individuals to whom a set of the ToL problems was administered. The results were coherent with those obtained in the first simulation study. An exception is the high-error condition, where for the MSP1- and MSP3-based procedures the entropy of the knowledge state likelihood distribution was almost the same and the lowest. This may seem inconsistent with the first simulation study. A tentative explanation is that the participants were instructed to plan the whole solution path in advance; however, some participants could have applied a different strategy.
The main peculiarity of the procedures presented in this article is that the dependencies among problems reflect their structural relations in the problem space, rather than being inferred through the application of statistical procedures to the data. Such a relationship is based on the assumption that, if a solution path includes another solution path, then an individual who knows how to apply the former also knows how to apply the latter. Referring to Example 1 in "Adaptive assessment in a problem space", an individual who knows how to solve problem s_1 by applying a b ā b ā will also be able to solve s_3 by applying b ā b ā. The validity of this assumption seems reasonable, although it needs to be empirically tested in every single context where such procedures are applied. For instance, in the context of the ToL test, the empirical validation of the MSPM by Stefanutti et al. (2021) showed promising results.
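Operationally, the inclusion relation above amounts to the solution path of the easier problem being a suffix of the solution path of the harder one. A minimal check of this relation (the function name and the operation tokens are illustrative, not the paper's notation):

```python
def is_solution_subpath(shorter, longer):
    """True if the operation sequence `shorter` is a suffix of `longer`,
    i.e., solving the longer problem passes through the shorter one."""
    n = len(shorter)
    return n <= len(longer) and tuple(longer)[-n:] == tuple(shorter)
```

With abstract operation tokens, the four-operation sequence solving the easier problem is recognized inside the five-operation sequence solving the harder one.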
The outcome of a PKST-based assessment procedure is a knowledge state rather than a numerical score. The knowledge state is an "estimated" representation of the portion of the problem space that is known to the problem solver, or the portion in which the problem solver can operate successfully. This kind of representation cannot be achieved through a simple numerical score. This seems to be a clear advantage of the proposed approach in the attempt to better capture and explain individual differences.
In the clinical context, many advantages of this representation may be pointed out. In KST, a knowledge state has two well-known properties, named the "inner fringe" and the "outer fringe". Both of these have a very clear and theoretically well-founded interpretation in the educational context (Falmagne et al., 2013). The inner fringe represents the points of strength of the student, whereas the outer fringe represents what a student is ready to learn. Such interpretations can be easily transferred to the clinical and psychological contexts. The inner fringe represents the maximum performance of the individual, which is not the same thing as the number of problems solved correctly. The outer fringe contains the problems that are one step ahead for the individual. In a rehabilitation context, they may be used as training exercises that are at the appropriate difficulty level for the patient.
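Using the standard KST definitions, the two fringes of a state can be computed directly (a minimal sketch with illustrative names; states are frozensets of problems):

```python
def inner_fringe(K, structure):
    """Problems q in K whose removal yields another state of the structure
    (the individual's most advanced mastered problems)."""
    return {q for q in K if frozenset(K - {q}) in structure}

def outer_fringe(K, structure, domain):
    """Problems q outside K whose addition yields another state
    (the problems the individual is ready to learn)."""
    return {q for q in domain - K if frozenset(K | {q}) in structure}
```

In a rehabilitation setting, the outer fringe computed this way would list the candidate training exercises one step ahead of the patient's current state.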
This work focused on comparing the updating rules of the CMP and the MSP-based procedures. However, other aspects of the assessment procedure can be varied to increase the efficiency of an assessment. For instance, Heller and Repitsch (2012) have shown that using an informative initial likelihood distribution on the knowledge states can improve the performance of an assessment procedure; however, an incorrect initial distribution can impair it. In this application, the uniform distribution was used to avoid those issues. Future applications should investigate these aspects to further improve the assessment performance.
From a practical perspective, a field of application for the adaptive procedures proposed in this research is neuropsychological testing. In recent years, the attention of neuropsychology researchers has focused on how modern psychometric theories and advances in technology should be incorporated into neuropsychological assessment (see, e.g., Costa, Dogan, Schulz, & Reetz, 2019; Howieson, 2019; Kessels, 2019; Marcopulos & Łojek, 2019). Some attempts and innovations have been made, such as a recent work by D'Alessandro et al. (2020), which used a computational modeling approach to assess perseverative behavior of healthy and substance-dependent individuals on the Wisconsin Card Sorting Task. Although based on a different approach, the assessment procedures proposed in this article have a similar objective.
Another promising field of application is serious games. The procedures developed in this article can be used as a basis for the definition of educational games and virtual training environments. This sets up an agenda for future research work.

Open Practices Statement
The code and an example of how to run it on an existing simulated dataset are available at the following link: https://osf.io/qa8mg/v?viewonly=8b4e148300de40a6941df4a102067fc1/. None of the experiments was preregistered.

Fig. 1 Problem state 31 of the Tower of London test

Fig. 2 Portion of the problem space of the Tower of London test representing two solution paths of a five-move problem: (a) left to center; (b) center to right; (c) left to right; (ā) center to left; (b̄) right to center; (c̄) right to left. Therefore, in the ToL, the set of operations is O_ToL = {a, b, c, ā, b̄, c̄}.

Fig. 5 Accuracy of the procedures in terms of average Hamming distance between the true and the estimated knowledge state. The results refer to odds conditions 1 to 12 of the simulation study

Fig. 6 Accuracy of the procedures in terms of the true positive rate (TPR) in all conditions 1 to 12 of the simulation study

Fig. 7 Efficiency of the procedures in terms of the proportion of subjects that reached the termination criterion p ≥ .50 at step m. The results refer to odds conditions 1 to 12 of the simulation study

Fig. 8 Efficiency of the adaptive procedures in terms of average entropy H_m at each step m of the assessment. The results refer to odds conditions 1 to 12 of the simulation study

A problem is a pair (s, t) of problem states, with s ≠ t, such that s • π = t for some sequence π of operations in O*. Stated differently, a pair (s, t) is a problem if, by applying the sequence π to the problem state s, the problem state t is obtained. State s is named the initial state of the problem, whereas t is the goal state.
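Under this definition, deciding whether a pair (s, t) is a problem amounts to a reachability question in the problem space. A minimal Python sketch, with a toy adjacency map standing in for the actual ToL problem space (states and moves are illustrative, not the real test):

```python
# Sketch: (s, t) is a problem when some sequence of operations maps
# s to t, i.e. t is reachable from s in the problem-space graph.
from collections import deque

def is_problem(s, t, moves):
    """BFS reachability check from initial state s to goal state t."""
    if s == t:
        return False  # a problem requires distinct initial and goal states
    seen, queue = {s}, deque([s])
    while queue:
        state = queue.popleft()
        for nxt in moves.get(state, ()):
            if nxt == t:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Toy problem space: state -> states reachable in one move.
moves = {1: [2], 2: [1, 3], 3: [2]}
print(is_problem(1, 3, moves))  # True
```

The breadth-first search also yields a shortest solution path, which is how a minimum number of moves can be attached to each problem.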

Table 2
Values of Q_m obtained in Example 1 for each step m

Table 3
Summary of the parameters obtained under the three assumptions. Columns 1 and 2 display, respectively, whether the initial problem and the current problem belong to the considered knowledge state. Columns 3-5 display the resulting parameters under that assumption.

Fig. 3 Diagram of the MSP-based procedure. See text for the details

At step m, the presented problem is (s_{m,0}, g) and the observed problem state in the solution process is s_{m,n+1}. The solution process for problem (s_{m,0}, g) terminates whenever the observed problem state s_{m,n+1} is the goal state g or the failure state f. The termination criterion decides whether an additional problem should be presented or not. The assessment terminates as soon as the likelihood L_{m,n+1}(K) of any knowledge state K ∈ K is greater than a predefined value p ∈ (.5, 1].
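The termination rule described here can be sketched as follows. This is illustrative Python, not the paper's implementation: the distributions and threshold are made up, and entropy is included only as a companion measure of the remaining uncertainty (the H_m reported in the simulations):

```python
# Sketch: stop the assessment as soon as some knowledge state's
# likelihood exceeds the preset threshold p in (.5, 1].
import math

def should_terminate(likelihood, p=0.5):
    """True when any state's likelihood is greater than p."""
    return max(likelihood.values()) > p

def entropy(likelihood):
    """Shannon entropy (bits) of the distribution over states."""
    return -sum(q * math.log2(q) for q in likelihood.values() if q > 0)

# Illustrative distributions over three knowledge states.
early = {"K1": 0.34, "K2": 0.33, "K3": 0.33}  # still uncertain
late = {"K1": 0.90, "K2": 0.05, "K3": 0.05}   # one state dominates

print(should_terminate(early))  # False
print(should_terminate(late))   # True
```

As the likelihood concentrates on a single state, the entropy falls toward zero, which is why average entropy per step is a natural efficiency measure for these procedures.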

Table 4
Values of the β_ij and η_ij parameters used in Example 2

Table 5
Values of the likelihood distribution L_m at each step m of the assessment

Table 6
Design of the simulation study used for generating the data