10.1 Introduction

This chapter provides two main sets of information. First, it describes the development and validation of the tool designed to measure proficiencies demonstrated by adolescents aged 13–17 years in Kenya, Tanzania, and Uganda in the three life skills and one value measured under the Assessment of Life skills and Values in East Africa (ALiVE) initiative. Second, it provides and discusses results from the large-scale assessment program that relied on the tool. This information provides the background to the assessment results needed to inform policy in the participating countries as they seek to include life skills and values in their national curricula.

ALiVE began in August 2020, starting with a contextualisation study designed to explore the nature of life skills and values in the East African community. With outputs from this study, a process of re-framing these ‘constructs’ (a term that is used to represent particular constellations or groups of behaviours or characteristics) was undertaken to ensure appropriate interpretation in the East African community. The activity reflected a well-recognised model of creating psycho-social measurements (Wilson, 2005) in which the process requires clarification of the construct, development of tasks and items, collection and scoring of responses, and production of measures. From the contextualisation study, the conceptual structures of the target constructs were agreed upon, followed by development of assessment frameworks which identified the measurable components of those constructs. Guided by the assessment frameworks, the ALiVE team moved into task and item development in March 2021, which was finalised by August 2022. The large-scale assessments across the three countries were administered to over 45,000 adolescents late in 2022.

Development of tasks and items to measure the constructs was undertaken mindful of the intended mode of data collection and of the intended use of results. Of particular note was that results were to be used to provide broad descriptors of adolescent proficiencies at aggregate levels. There was no assumption that results could or would be used to describe proficiencies at the individual level or for diagnostic purposes. Accordingly, the test and scale development process was focussed on identification of key indicators of the constructs and their subskills rather than a comprehensive and deep evaluation of the demonstration of competencies. The goal guiding the process was to enable reporting of competencies in a way that would provide ministries of education with sufficient information upon which to consider how best to integrate these competencies into mainstream curricula. As documented in Care (2024; Chap. 1, this volume), all three participating countries and their four education systems have either recently incorporated these competencies into their curricula or are in the process of doing so. Several are in the process of delineating how, and at what levels, integration should be managed. The information generated by the ALiVE program therefore provides empirical evidence of current levels of functioning in the 13–17 year old age group that can be factored into decision-making by the national education systems. The constructs for which a tool was created are collaboration, problem solving, respect, and self-awareness.

10.2 Method

10.2.1 Design

The design of the tool was influenced by several factors. The first of these was the use to which the assessments would be put; the second was the intended mode of data collection; the third was principles of test and scale development.

  • ALiVE was designed to capture a glimpse of functioning across life skills and values, as aspired to by ministries of education in the respective educational jurisdictions (Kenya, Tanzania mainland, Uganda, and Zanzibar). The assessments were not designed for diagnosis of individual functioning but rather to establish a basis upon which countries might evaluate their educational goals given their embrace of life skills and values in recent years, and to inform their curricular planning.

  • ALiVE was interested in a representative sample of the participating countries’ adolescents who might be in or out of school or employment, since the targeted competencies have been highlighted by employers and in the technical-vocational sector as well as by ministries of education. This interest, therefore, required household-based assessment. This medium for assessment, in turn, requires manageable interactions in the field distinct from interactions that can be managed at the group level in a formal education environment. Manageability in the field implies assessment forms that can be communicated orally, in time-efficient ways, and through content such as daily life scenarios that are not reliant on school-based learning.

  • ALiVE committed considerable effort to definition and description of the target constructs. This was undertaken both in observance of a well-established test and scale development model (Wu et al., 2016), and in response to the combination of two relatively recent innovations. First, the assessment of twenty-first century competencies remains in its early days (Griffin et al., 2012; Care et al., 2018); second, household-based assessment at large scale has emerged in the past decade as an acceptable and sufficiently stringent approach to collecting data that prompts government action (e.g. Uwezo, 2021) or contributes to monitoring of the Sustainable Development Goals (UNESCO UIS, 2022). The household mode has been used predominantly for measurement of numeracy and literacy rather than competencies associated with socio-emotional functioning. Ensuring common understanding of the target competencies is therefore essential not only to routine test and scale development processes but also to establishing shared understanding when hundreds of Test Administrators and other personnel are involved in data collection.

10.2.2 Sample

The sampling frame used for this study was derived from the Population and Housing Census frames for Kenya, Tanzania (mainland and Zanzibar), and Uganda. This frame included a complete listing of census enumeration areas and households. In each country, a multi-stage sampling approach was used to select households and adolescents for the assessment. The approach involved selection of districts/counties, followed by selection of enumeration areas (the smallest units of population clustering within a country), and then selection of households within each selected enumeration area. The desired sample sizes were determined by considering the degree of precision desired for the study estimates, cost and operational limitations, the efficiency of the design, and a fixed number of households per enumeration area. A total of 45,442 in-school and out-of-school adolescent boys and girls aged 13–17 years from 35,720 households, 1991 enumeration areas, and 85 districts/counties were assessed, as shown in Table 10.1.

Table 10.1 Sampling across districts, enumeration areas, households, and adolescents

The number of enumeration areas reached was just 18 fewer than planned. Approximately 10% fewer households were reached than planned (35,720 of 40,000). Discrepancies in both cases were distributed across the jurisdictions. Sampling weights, calculated from the sampling probabilities at each sampling stage, were used in the analyses to ensure that the results accurately reflect the characteristics of the population being studied. Adjustments for non-response were also made by including a household response rate adjustment factor in the sampling weights. The sampling weights were then used to adjust analysis of the sample data to account for any differences between the sample and the population.
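The weighting logic described above can be sketched as follows. This is an illustrative sketch only, not the ALiVE analysis code; the stage probabilities and household counts are hypothetical.

```python
# Illustrative sketch only (not the ALiVE analysis code): a design
# weight for a multi-stage sample is the inverse of the product of the
# selection probabilities at each stage, with a non-response adjustment.
# All probabilities and counts below are hypothetical.

def design_weight(p_district: float, p_ea: float, p_household: float) -> float:
    """Inverse of the overall selection probability across the stages."""
    return 1.0 / (p_district * p_ea * p_household)

def nonresponse_adjusted_weight(base_weight: float,
                                households_selected: int,
                                households_responding: int) -> float:
    """Inflate the base weight by the inverse of the household response rate."""
    return base_weight * households_selected / households_responding

# A household selected with stage probabilities 0.1, 0.05, and 0.02,
# in an enumeration area where 18 of 20 selected households responded:
w = design_weight(0.1, 0.05, 0.02)               # 10,000
w_adj = nonresponse_adjusted_weight(w, 20, 18)
```

In practice the stage probabilities differ by stratum and jurisdiction; the sketch shows only the arithmetic of combining them into a single analysis weight.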

10.2.3 Ethical Considerations

The research protocols were reviewed and approved by the relevant research regulatory authorities in the four jurisdictions. Practical strategies adopted to safeguard participants’ rights and safety included obtaining informed consent from all participants and, as far as possible, ensuring voluntary participation. Both parent or guardian consent and adolescents’ consent were sought before the start of the assessment. Individuals’ identity was safeguarded through codes in storing and sharing data.

The activities were undertaken in recognition of three risk factors that marked the project as requiring human research ethics attention: the age of the young people was such that they were regarded as a vulnerable group; the area of enquiry included self-awareness and respect, two social functioning aspects of life that can be seen as sensitive; and the scenario form of assessment was contextualised within issues that could occur in daily life, which could reasonably be seen as sensitive topics for some.

Two approaches to risk minimisation were taken. First, the development of the tool included multiple iterations of review of content in order to ensure that this would not confront the participants with issues relating to sensitivities associated with culture, gender, ethnicity, language, or religion. Second, the training of Test Administrators included sensitivity awareness to such issues and provided clear guidelines for how to approach and interact with the participating adolescents and their parents.

10.2.4 Data Collection

The ALiVE assessment events were conducted by 3080 Test Administrators (TA) across the four jurisdictions—with two of these allocated for each adolescent interviewed. These assessors participated in three-day training workshops which introduced them to the purpose of the assessment, the nature of the targeted constructs, the assessment tool and scoring rubrics, and ethical issues and principles. They also participated in a field practice exercise to familiarise themselves with the assessment protocol.

The data collection tools drew on two formats:

  • Scenario-based tasks were used to assess problem solving, self-awareness, and respect. The scenarios were based on elements of daily life with which adolescents could be expected to be familiar. Each scenario, of 1–4 sentences, was read aloud to the adolescent, followed by a series of questions. Either English or the local language was used. Adolescents responded orally, again in English or the local language, as preferred, with TA noting down the responses and coding these based on scoring rubrics. TA were paired so notations and coding could occur in real-time and be reviewed at the day’s end for verification before submission. The number of tasks and associated items are summarised in Table 10.2.

  • Performance-based tasks were used to assess collaboration. The tasks were administered to groups of four adolescents in both single-sex and mixed-sex groups. These groups were formed from adolescents who had already completed the individual scenario-based tasks for problem solving, respect, and self-awareness. The tasks were read aloud to the group in the language preferred by the adolescents. The first two tasks required the group to work with physical materials available within the immediate environment or provided to the groups. TA were again paired so that notation and coding could occur in real time and be reviewed at the day’s end for verification before submission. The number of tasks and associated items are summarised in Table 10.2.

  • A household survey tool was used to collect household and adolescent background data. Information gathered included: household head (gender, education), number of members in the household, types of walls of the main house, main source of lighting, the main source of water, number of meals regularly eaten, number of assets owned, media preference and use, and adolescents’ characteristics such as gender, education level, age, and functioning (hearing, seeing, walking, remembering and self-care). The household tool was responded to by the head of the household.

Recording of responses from all tools was enabled through the TA use of KoboToolbox on mobile devices (tablets or phones) (Kobo Organization, n.d.). Case records were uploaded at the end of each day of testing to the main data repository.

Table 10.2 Numbers of tasks and items in the ALiVE tool set

Table 10.2 makes explicit both the target life skills and one value and their contributing dimensions or subskills. Note that the dimensions and subskills do not fully represent the overall constructs in every case. Additional information about the conceptual structure and assessment framework for each construct is found in each of the three chapters dedicated to these (Scoular & Otieno, 2024; Chap. 6, this volume; Ngina et al., 2024; Chap. 5, this volume; Care & Giacomazzi, 2024; Chap. 4, this volume). As made explicit by these sources, the assessment frameworks which led the design of the tasks and items were guided by the conceptual models but determined by the pragmatics of the planned assessment approach. Responding to the household-based environment, the conditions for data capture were such that the more easily identifiable indicators of the various constructs were prioritised for measurement. Tasks and items were developed to encompass the dimensions and subskills identified in Table 10.2. This chapter reports on the main construct for collaboration, problem solving, and respect, and across the two dimensions for self-awareness.

10.2.5 Structure of the Tools

10.2.5.1 Collaboration

The assessment comprises three tasks with a total of 8 items. Two tasks follow the same pattern, targeting a step approach to collaboration. The items assess the adolescent’s communication, which involves listening (receptive) and speaking (expressive); negotiation, which requires reflecting on other people’s views vis-à-vis one’s own, including accepting feedback and reaching consensus; and working together, to plan and engage in activities. The last task includes only the communication and negotiation phases. The final tool provides 8 data points. Additional information on the construct can be found in Scoular and Otieno (2024; Chap. 6, this volume).

10.2.5.2 Problem Solving

The assessment comprises three contextualised task scenarios, each containing four items. Each task scenario presents a brief description of a situation followed by four items, each targeting a different aspect of an adolescent’s problem-solving proficiency. All three task scenarios follow the same pattern, with the four items targeting a step or process approach to problem solving. The first two items of a task scenario assess the adolescent’s recognition of the problem, followed by gathering of relevant information. The second two items assess the adolescent’s exploration of alternative solutions and selection of the best solution. The final tool therefore provides 12 data points. Additional information on the construct can be found in Care and Giacomazzi (2024; Chap. 4, this volume).

10.2.5.3 Respect

The assessment comprises four task scenarios with a total of 10 items. All four task scenarios follow a similar pattern, from awareness of lack of kindness to recognition of actions betokening lack of respect. The items assess the adolescent’s regard for others and regard for property: awareness of infringing on others’ rights, recognition of one’s wrongdoing, respect for the rights of others, and willingness to make amends for wrongdoing. The final tool provides 10 data points. Additional information on the construct can be found in Ngina et al. (2024; Chap. 5, this volume).

10.2.5.4 Self-Awareness

The assessment comprises five task scenarios with a total of 12 items. Unlike the problem-solving tasks and items, the items across these five tasks follow slightly different patterns. The items assess the adolescent’s self-awareness through two subskills: self-management, that is, managing emotions and stress; and perspective taking, that is, understanding the views and actions of others, adjusting to others’ views and actions, and recognising one’s identity and where one fits in family, society, and community. The final tool provides 12 data points. Additional information on the construct can be found in Ngina et al. (2024; Chap. 5, this volume).

10.2.6 Data Analysis

Initially, classical test theory was used to explore the functioning of the tools across their items. Specifically reviewed were the distribution of responses across items and the patterns of responses for each item by country, gender, age, and education level of the adolescents. Reliability coefficients were then calculated to establish the scales’ coherence.

The Rasch model was used to explore and quantify the participants’ responses. Using the Rasch model provided the means to interpret the skills underpinning the constructs and to develop empirical proficiency levels. Proficiency levels describing increasing competency were developed for the over-arching constructs. Following these test and scale development processes, further analyses were performed to explore the four constructs across jurisdictions and by selected variables: gender, education level, adolescent age, and disability status.
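As a minimal sketch of the underlying model, the dichotomous Rasch model expresses the probability of success as a function of the difference between person ability and item difficulty, both on the logit scale. This is for illustration only; the analyses here used the partial credit extension for polytomous response codes.

```python
# Minimal sketch of the dichotomous Rasch model, for illustration only;
# the ALiVE analyses used the partial credit extension for polytomous
# response codes.

import math

def rasch_probability(theta: float, b: float) -> float:
    """P(success) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty the probability is exactly 0.5,
# which is why an item on a person-ability map sits beside the persons
# who have a 50% chance of demonstrating it.
p_match = rasch_probability(0.0, 0.0)   # 0.5
```

The 0.5 crossing point is what places items and persons on a common scale in the person-ability maps discussed later.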

Finally, based on the Rasch model analysis outputs, patterns were recognised regarding increasing proficiency levels across the four main scales and the subscales for problem solving, self-awareness and collaboration. The results demonstrated the utility of the rubrics used for coding responses.

10.3 Results

10.3.1 Demographic Characteristics of Adolescents

Younger and older adolescents were almost equally distributed across the two main age ranges (13–14 years and 15–17 years) (Table 10.3). Approximately 13% of the adolescents assessed were out of school, that is, not currently studying. For adolescents not currently in school, the highest level of education achieved was recorded as primary or secondary for analysis. Across the full sample, about 30% of adolescents assessed had reached secondary education level, and 65% had reached primary education level. Note that there are some variations across the four jurisdictions in allocation of Grades 7 and 8 to primary versus secondary education.

Disability status of the adolescents was determined using the Washington Group Short Set of Questions. Parents were asked whether their children had any difficulty in vision, hearing, walking, memory, self-care and language/communication and how severe such difficulty was. Overall, parents reported about 12% of the adolescents had some form of difficulty.

10.3.2 Psychometric Properties

10.3.2.1 Reliability Analysis of the Tools

Table 10.4 presents alpha coefficients as a measure of internal consistency in the scales for the over-arching constructs and relevant dimensions for collaboration, problem solving, respect, and self-awareness.

For all scales, the alpha reliability coefficients indicate high homogeneity of content. The average inter-item correlation coefficients for the over-arching constructs ranged from 0.300 (for self-awareness) to 0.448 (for problem solving), suggesting that the target items are reasonably homogeneous and contribute unique variance to their over-arching constructs.

10.3.2.2 Item Fit Statistics

All items across the four constructs demonstrated ‘good’ fit statistics; that is, the weighted mean-square values were all between 0.7 and 1.3 (Wu et al., 2016). See Appendix 1 (a, b, c, d) for the mean-square residual-based item fit statistics, weighted and unweighted.
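A sketch of how a weighted (infit) mean-square can be computed for a single dichotomous item: the sum of squared residuals divided by the sum of model variances, with values near 1.0 indicating that responses match model expectations. The response and ability values below are hypothetical.

```python
# Hedged sketch: weighted (infit) mean-square for one dichotomous item
# under the Rasch model. Values near 1.0 indicate good fit; ALiVE used
# the 0.7-1.3 band as its criterion.

import math

def p_success(theta: float, b: float) -> float:
    """Rasch probability of success for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_mnsq(responses, abilities, b):
    """responses: 0/1 scores; abilities: person logits; b: item difficulty."""
    residual_sq = 0.0   # sum of squared residuals
    model_var = 0.0     # sum of model variances p * (1 - p)
    for x, theta in zip(responses, abilities):
        p = p_success(theta, b)
        residual_sq += (x - p) ** 2
        model_var += p * (1.0 - p)
    return residual_sq / model_var

# Four persons of ability 0 on an item of difficulty 0, scoring 0,1,0,1:
# every expected value is 0.5, so the statistic is exactly 1.0.
fit = infit_mnsq([0, 1, 0, 1], [0.0, 0.0, 0.0, 0.0], 0.0)
```

Values well below 0.7 suggest redundancy (responses too predictable); values well above 1.3 suggest noise relative to the model.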

10.3.2.3 Differential Item Functioning

Use of assessment tools across countries or cultures raises issues of validity of comparisons between groups. Such issues may reside in matters of language, societal norms, religion, ethnicity, as well as age and gender. Test developers make efforts to design assessments in ways that will avoid differential bias among groups. Notwithstanding, it is also necessary to check whether such bias may have occurred after the fact. The results and information from Differential Item Functioning (DIF) analysis provide a rich source of information for exploring the possibility of bias of measurements across groups.

Since the ALiVE tool was used across three countries and four education jurisdictions, each of which varies in both explicit and subtle ways, it was important to check for DIF. Analyses were conducted across the four jurisdictions to provide insights into whether items functioned differently or similarly across the educational jurisdictions. Detection of DIF was undertaken through visual inspection of scatterplots of item thresholds derived from the Rasch model. For each of the three over-arching constructs (collaboration, problem solving, respect) and the two dimensions of self-awareness (self-management and perspective taking), item thresholds for each study jurisdiction were placed on the Y-axis and the regional item thresholds (all four jurisdictions together) on the X-axis. In addition, scatterplots of each study jurisdiction against each other jurisdiction were analysed. Figure 10.1 provides an example showing the smallest differences obtained for a jurisdiction against the region, while Fig. 10.2 provides two examples illustrating the greatest differences found across all comparisons, in this case between two jurisdictions.

Fig. 10.1 Scatterplot of problem solving item thresholds: Zanzibar against region. The thresholds lie close to the identity line (R² = 0.9946)

Fig. 10.2 (a) Scatterplot of self-management item thresholds: Zanzibar against Kenya (R² = 0.9194). (b) Scatterplot of self-management item thresholds: Tanzania mainland against Kenya (R² = 0.9019)

The comparison of thresholds in the Fig. 10.2 scatterplots illustrates how slight the differences are, even in these cases, which are the most extreme. Two scatterplots are provided, both for the self-management dimension of self-awareness, each comparing one of the Tanzania jurisdictions with Kenya. Viewing both scatterplots together demonstrates the stability of the items’ patterning, even though their elevations vary.

As shown, there is negligible differential item functioning in the four constructs across the four jurisdictions. Exploration of the slight differences that do occur makes clear that they are due primarily to group differences in performance rather than bias. Overall, items for all constructs pattern very similarly across the four jurisdictions. Where differences occur, they are differences in levels of proficiency rather than differences in patterning, which might imply bias.

10.3.2.4 Item Spread

‘Person-ability maps’ based on the Rasch model provide a view of how well items cover a range of proficiencies via their physical separation on a graph. Such a map, or graph, shows whether items are separated enough to differentiate between the performance of different respondents. The person-ability map allows for the placement of items and persons on the same scale. Naturally, the higher a person’s ability, the greater the probability of appropriately answering an item or demonstrating a higher level of proficiency.

The patterns demonstrated by the person-ability maps validate the approach to item design, allowing for identifiable differences in responses and informing their coding. The clear separation of response codes across the items for all the constructs demonstrates that the coding rubrics indeed captured similar degrees of discrimination across proficiencies for most items. An ideal test would be characterised by items distributed right across the possible range of person abilities. Such an instance would allow for optimal differentiation of one person’s abilities from another. In these cases, there is sufficient delineation between each coding level to justify the attribution of descriptive scoring statements for the various categories of proficiencies. The task and item design for all constructs was such that three or four levels of response were provided for through scoring rubrics. That the categories of responses in the main adhere to these levels is indicative of the robustness of the design and of the rubrics. The maps indicate how the response categories tend to pattern, and support the creation of the descriptive statements that are strongly aligned with the original scoring rubrics. Hence, both the content of the rubrics and how the resulting category responses are located in the graph space provide confidence in setting cut-offs which are then used to identify ranges of performance within achievement levels with corresponding descriptive statements.

The person-ability maps for the four constructs (see Figs. 10.3, 10.4, 10.5, and 10.6) represent the coded responses of adolescents through low to higher performance levels. The horizontal blue lines on each map identify the approximate location of the cut-offs. Each map provides an illustration of the capabilities of the adolescents in the context of the demands of the tasks. The distribution of adolescents is on the left-hand side of the graph, and on the right are the item numbers and coding according to the partial credit model. Each item and its label appear at a position horizontally parallel to those adolescents who have a 50% chance of demonstrating that particular item’s performance. Adolescents have an increasing probability of being able to demonstrate the proficiency represented by an item-by-category response the lower that response appears on the graph.
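The category locations on the maps derive from the partial credit model. The following sketch, with hypothetical step difficulties (not the estimated ALiVE parameters), shows how the probability of responding in each scoring category is obtained for a given ability.

```python
# Sketch of the partial credit model behind the category labels on the
# maps: probabilities of responding in each scoring category given
# ability theta and item step difficulties. The step values below are
# hypothetical, not the estimated ALiVE parameters.

import math

def pcm_category_probs(theta, step_difficulties):
    """Return P(X = 0), ..., P(X = m) for m step difficulties."""
    # Cumulative sums of (theta - step); category 0 has an empty sum.
    sums = [0.0]
    for step in step_difficulties:
        sums.append(sums[-1] + (theta - step))
    exps = [math.exp(s) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]

probs = pcm_category_probs(0.0, [-1.0, 0.5, 1.5])
# The probabilities sum to 1; for this ability a middle category is most likely.
```

As ability increases, probability mass shifts toward the higher categories, which is why higher-quality category responses sit higher on the maps.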

Fig. 10.3 Person-ability map for collaboration (respondent distribution alongside item-by-category locations, Cat 1 to Cat 3, on the logit scale)

Fig. 10.4 Person-ability map for problem solving

Fig. 10.5 Person-ability map for respect

Fig. 10.6 Person-ability map for self-awareness (including both self-management and perspective taking)

Along the right-hand side of each graph is the ‘logit’ scale, showing the numeric equivalents of the graphed locations. Each item identified along the bottom of the graph is represented at the different levels of quality at which adolescents respond to that item (shown as Cat1, Cat2, Cat3). For example, in Fig. 10.3, the lowest level response to item CT11, Cat1 (at a logit of about −1.8), represents the lowest level of quality response for the item, while Cat3 (at a logit of about 2.3) represents the highest quality response. The graph also illustrates how different items vary in difficulty. For example, item CT13 at the far right of the graph is the ‘easiest’ item for adolescents to demonstrate overall. Information about the coding rubrics for the constructs can be found in this volume (Scoular & Otieno, 2024; Chap. 6, this volume; Ngina et al., 2024; Chap. 5, this volume; and Care & Giacomazzi, 2024; Chap. 4, this volume).

The figures provide additional information about how items contribute to subskills, and how these subskills are variably easier or more difficult for adolescents to demonstrate. For example, CT13 and CT63 are items that target working together in collaboration. These are the activities that more adolescents find easy to engage in, as distinct from communication (CT11, CT41, CT61) and negotiation (CT12, CT42, CT62), which cover a broader and more complex range of performance.

Figure 10.4 reveals a less even distribution of proficiencies within each item for problem solving. Items contributing to the dimension of defining the problem (PS1b, 3b, 4b, 1c, 3c, 4c) cover a wide range of proficiencies due to the highest level of responses presenting considerable difficulty for respondents. Ideally the task and item design could include additional features which would allow for more nuanced responses between the highest and middle categories. The remaining items which contribute to exploring the solution cover a relatively narrow range of proficiencies. Again, the task and item design needs some improvement in order to capture more differentiated abilities for this dimension.

Figure 10.5 illustrates the expressions of respect demonstrated by the adolescents. It should be kept in mind that the measurement of respect was limited to respect for self and others, and did not include respect for property and the environment (see Ngina et al., 2024; Chap. 5, this volume). Similarly, the mode of assessment did not allow for identification of the subskills that would separate respect for self from respect for others.

Figure 10.6 shows location of items across the range of responses to the self-awareness tasks and items. The graphic divides items that contribute to the two dimensions for ease of reference. The pattern of items hypothesised to assess adolescent’s self-awareness indicates how these two dimensions—self-management and perspective taking—draw on adolescents’ capacities differently, notwithstanding their strength of association (r = 0.668). Although the two scales demonstrate high reliability, contributing robustly to the over-arching skill, perspective taking (SA1b, SA1e, SA3a, SA3c, SA4c, and SA7d) appears slightly more difficult to demonstrate than self-management (SA1d, SA4b, SA6b, SA6c, SA7b, and SA7c). Therefore, reporting results for each scale separately provides useful information to inform teaching and learning in the education context.

10.3.2.5 Summary of Approach

The Rasch model was used to calibrate the participants’ response data and establish person abilities and item difficulties on a common scale. This approach ensured that item parameters were independent of the participants being measured and that estimates of participant abilities were independent of the items used to measure the underlying trait. Fit statistics were used to investigate how well the data met the model requirements, and misfitting items or persons were identified and addressed to ensure the validity of the results. Differential item functioning analysis was used to detect differences in item performance for individuals with the same underlying ability but belonging to different subgroups, such as gender, education level, and adolescent age. The presence of DIF was explored to establish the scope of use and the generalisability of the results across and within countries, and to distinguish between bias and real differences in ability distribution. The Rasch model analysis outputs provided the means to construct empirical proficiency levels, which describe increasing competency for the over-arching constructs.

10.3.3 Proficiency Distributions across Jurisdictions

ALiVE reports adolescents’ demonstration of their life skills and values through brief statements. These statements make explicit how the adolescents actually respond—as opposed to reporting scores, for example. Depending on the construct, three or four statements describe increasing levels of development or proficiency. These descriptions are based on analysis of the items that fall within the various cut-offs shown in Figs. 10.3, 10.4, 10.5, and 10.6. All items were designed such that adolescents could respond to them at varying levels of proficiency or quality, which were captured through the scoring rubrics. Reviewing the behaviours targeted by those rubrics against the placement of the items in the person-item maps confirms the increasing levels of quality. The proportions of adolescents performing within each of the descriptive levels are shown in Tables 10.5, 10.6, 10.7, 10.8, 10.9, and 10.10.
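To illustrate how descriptive levels follow mechanically from cut-offs on the common scale, the sketch below assigns a level from a person’s ability estimate. The cut-off values shown are hypothetical, not the ALiVE cut-offs.

```python
def proficiency_level(ability_logit, cut_offs):
    """Assign a descriptive proficiency level (1 = lowest) from a
    person's ability estimate and an ascending list of cut-off
    points on the logit scale."""
    level = 1
    for cut in cut_offs:
        if ability_logit >= cut:
            level += 1
    return level

# Hypothetical cut-offs yielding four levels for one construct
cuts = [-1.0, 0.4, 1.6]
```

For example, an ability estimate of -2.0 logits falls below all cut-offs and is reported at Level 1, while an estimate of 0.7 exceeds the first two cut-offs and is reported at Level 3.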

In this section the distributions of adolescent proficiencies for each construct are shown in the context of jurisdictional differences. As pointed out in the presentation of the DIF results, there are no differences across Kenya, Tanzania (mainland and Zanzibar), and Uganda in terms of how items contribute to the scales—this is uniformly consistent. Some differences in distributions of the proficiencies are highlighted.

10.3.3.1 Collaboration

Four levels of performance describe what adolescents were able to demonstrate during the collaboration tasks (Table 10.5). Overall, most adolescents were attentive to the discussions; they queried the views of others and engaged actively in the performance tasks. Relatively few either did not engage visibly (9.5%) or prompted others to engage (10.0%).

10.3.3.2 Problem Solving

Four levels of performance describe what adolescents were able to do during the problem-solving assessments (Table 10.6). A reasonably large proportion of adolescents (32.9%) struggled to identify possible solutions to a problem, while nearly half (49.1%) were able to recognise the existence of a problem from one perspective and act on that to identify a possible solution. Relatively few were able to justify solutions or identify multiple approaches to solving a problem. There is a slight skew in the distribution of the Zanzibar adolescents, with fewer than expected at the lowest level and more than expected at the higher levels.

10.3.3.3 Respect

Four levels describe how adolescents expressed respect in terms of regard for others and self (Table 10.7). Overall, a large proportion of the adolescents were aware of poor behaviour (34.4%), and able to interpret this as a lack of respect for others or self that calls for conciliatory steps (50.2%). However, very few were able to act respectfully in defence of others and self. It is noteworthy that a larger proportion of Zanzibar adolescents were among these few.

10.3.3.4 Self-Awareness: Self-Management

Three levels of performance describe how adolescents demonstrated self-management (Table 10.8). Overall, the majority of adolescents (50.7%) were able to demonstrate control of self in a negative or stressful situation through repression of emotion or avoidance. They were less able to respond adaptively when presented with situations in which they might be directly confronted or attacked.

10.3.3.5 Self-Awareness: Perspective Taking

Three levels of performance describe how adolescents demonstrated perspective taking (Table 10.9). Most (64.7%) of the adolescents were aware that others may be impacted by multiple factors. They were, however, less able to see views of self from the perspective of others. Again, there is a slight skew in results from Zanzibar, with fewer than expected performing at the lowest level and more at the highest level. The lower proportion of adolescents performing at the highest level is a clear indication of the greater complexity of this skill, which perhaps requires more experience or maturation.

Review of the proficiency levels across each of the constructs reveals some differences in distributions across the jurisdictions. In the main, these differences involve shifts between proximate categories. This finding supports the patterns revealed by the person-item maps: adolescents across the four jurisdictions responded in very similar ways, differing only slightly in actual performance levels (as demonstrated in particular by the smallest group, Zanzibar). These similarities provide confidence in the sequence of incremental learning steps in the life skills and the value.

10.3.4 Proficiencies by Age and Education

Gender had no impact at any level for any of the constructs—in other words, males and females performed similarly across the board, regardless of jurisdiction. The disability index information presented in Table 10.3 was derived from parent data in which parents identified that their adolescents had ‘at least some difficulty’. Based on the data collected on disability, there are no associations with performance on the life skills and value tasks. Accordingly, detailed information on performance by disability is not reported here. Information on age and education status is provided since these two factors appear to be associated with performance. In brief, as adolescents age, and as they move through education grades, their performance improves (Table 10.10).

Table 10.3 Gender, age distribution and education status of adolescents
Table 10.4 Summary of reliability coefficients
Table 10.5 Collaboration: descriptive statements and adolescents’ proficiencies
Table 10.6 Problem solving: descriptive statements and distributions of proficiencies
Table 10.7 Respect: descriptive statements and distributions of proficiencies
Table 10.8 Self-management: descriptive statements and distributions of proficiencies
Table 10.9 Perspective taking: descriptive statements and distributions of proficiencies
Table 10.10 Proficiency levels of adolescents by age and education

Age has an influence on the demonstrated proficiencies of adolescents across all the constructs. Older adolescents demonstrate higher proficiencies than younger adolescents. For example, in the case of collaboration, 13.3% of adolescents aged 15 to 17 years, compared with 6.4% of adolescents aged 13 to 14 years, collaborated by taking positions and contributing ideas, prompting others, and being attentive to others’ inputs (Level 4). At Level 1, 7.4% of adolescents aged 15 to 17 years, compared with 11.7% of adolescents aged 13 to 14 years, did not engage either by being attentive to discussion, by speaking, or through action.

Education level is also associated with increasing proficiencies. More educated adolescents demonstrated higher proficiencies than less educated adolescents. For example, in self-management, 34.5% of adolescents who have reached secondary level of education, compared with 18.7% of those who have attained a primary level of education, are sufficiently self-aware and confident to respond adaptively even when directly confronted or attacked (Level 4). At Level 1, 15.8% of adolescents with a secondary level of education, compared with 29.8% of those with a primary level of education, are unable to regulate negative emotions or responses.

Although there are obvious associations between age and education level, it is the latter that appears more strongly associated with increasing proficiencies, as illustrated in Fig. 10.7. There is no doubt that adolescents currently studying in secondary school demonstrate higher proficiencies than those still in the primary years. Whether this is due to the effect of schooling or to other, uncontrolled-for factors that characterise the two groups is not known. The primary school attendees account for more than double the sample size of the secondary attendees (see Table 10.3), but both groups are very sizable, so the difference cannot be attributed to the variability that might be associated with small sample sizes. There is also a general flattening effect over age for the primary school attendees, and for the self-management dimension of self-awareness across the full sample.

Fig. 10.7 Associations between age, education, and proficiencies. Two line graphs with error bars, for primary and secondary education respectively, plot mean proficiency against adolescent age for collaboration, problem solving, respect, self-management, and perspective taking. In the primary panel the lines decline after age 16, except for problem solving; in the secondary panel the lines increase.

10.4 Discussion

ALiVE developed an assessment of three life skills and one value, creating a tool that gathered responses from adolescents to a variety of scenario-based and performance tasks. The open-ended responses of the adolescents were coded according to rubrics that allowed for evaluation of levels of quality in those responses. The coded data were then analysed according to their hypothesised contributions to over-arching constructs, and in some cases to dimensions and subskills. The aim was to develop a measure that would generate information about what adolescents are able to do in terms of collaboration and problem solving, and how they perceive themselves and others around them in terms of respect and self-awareness. Scale reliabilities and person and item fit statistics calculated from the collected data support the validity of the assessment for its intended purpose. Given the comprehensive and systematic sampling, generalisability of the results can reasonably be claimed. To date, no other study has collected evidence of life skills and values at large scale through household-based assessment. The initiative demonstrates that robust and useful tools can be developed for use outside the formal classroom space to generate data that is useful within that space.

There are limitations to note. These include the impact of the chosen medium of assessment (oral one-to-one administration in the household), the use of coding rubrics, the training of Test Administrators, and the use of local-language translations. The impact of some of these limitations is unknown and can only be ascertained in more controlled assessment environments. However, the range of targeted proficiencies and the training of the Test Administrators are briefly discussed in this section.

The tools are simple in structure, and the approach to coding of adolescent responses is not highly nuanced. Scoring rubrics were sufficiently behaviour-focussed to enable Test Administrators to code with a high degree of reliability. Decisions about the nature of the tasks themselves, and the decision to code across only three or four levels, were made in the light of the logistics and realities of household-based assessment in low-resourced environments. These decisions have had the consequence that more finely delineated differences in proficiencies between adolescents are not accommodated in the current descriptive statements. Development of more assessment tasks and more detailed rubrics could be modelled on what has been used to date in order to increase the range of information that can be captured. In ALiVE, all adolescent respondents engaged with at least three of the constructs. This reduced the assessment time available for any one construct and so led to pragmatic decisions about the complexity of tasks and coding.

The training of Test Administrators was undertaken with a similar approach across jurisdictions. However, there were some differences in the criteria used to recruit and select them, and the actual training events were particular to each jurisdiction. Due to this potential variability, and also to general concerns wherever ratings of performance are required, it is possible that assessor bias influenced the assessment process, either on a case-by-case basis or through slightly different jurisdictional approaches to the training. The interaction between Test Administrators and adolescents could also influence the confidence of the adolescent, who typically was engaging in an unfamiliar process—notwithstanding that the content of the assessment tasks themselves was familiar, being based in daily life. The extent to which this could have led to under-estimates of real proficiencies is not known. Further research with the ALiVE tools could focus on investigating issues of bias and interaction modes.

Another potential influence on adolescents’ capacity to respond at their most proficient level was language. The tools were translated into several local languages commonly spoken in the sampled districts, and adolescents were given the choice of language in which to interact. During the assessment, Test Administrators would use the language with which the adolescent was most comfortable, typically either English or the local language. Having elected one language, some adolescents requested to switch to another to comprehend the tasks better. Another aspect of the language issue pertains to the actual translations of the tool. These were not subjected to back translation as a standard process, and different quality assurance approaches were used. Further research under more controlled conditions could explore the influence of language on assessment results.

The creation, development, and use of ALiVE’s tool was accomplished through the efforts of teams across the three participating countries (Turner et al., 2024; Chap. 11, this volume). One outcome of these efforts was the assessment of over 45,000 adolescents, generating data on life skills and a value to inform curricular and assessment needs of the four educational jurisdictions involved. Another outcome was the confirmation that life skills and values can be captured through a household-based assessment model, providing a template for the future.