Current practice of assessing students’ sustainability competencies: a review of tools

While there is growing agreement on the competencies sustainability professionals should possess as well as the pedagogies to develop them, the practice of assessing students’ sustainability competencies is still in its infancy. Despite growing interest among researchers, there has not yet been a systematic review of how students’ sustainability competencies are currently assessed. This review article responds to this need by examining what tools are currently used for assessing students’ sustainability competencies to inform future practice. A systematic literature review was conducted for publications through the end of 2019, resulting in 75 relevant studies that detail the use of an assessment tool. We analyzed the described tools regarding their main features, strengths and weaknesses, as well as potential improvements. Based on this analysis, we first propose a typology of eight assessment tools, which fall into three meta-types: self-perceiving, observation, and test-based approaches, providing specific examples of practice for all tools. We then articulate strengths and weaknesses as well as potential improvements for each tool (type). This study structures the field of sustainability competency assessment, provides a criteria-based overview of the currently used tools, and highlights promising future developments. For the practice, it provides guidance to sustainability (science) instructors, researchers, and program directors who are interested in using competencies assessment tools in more informed ways.


Introduction
The world is in urgent need of competent professionals to contribute to societal transformations towards sustainability (Gordon et al. 2019), and educational institutions ought to prepare students for these roles (Barth 2016; Franco et al. 2019). In response to this challenge, there has been a proliferation of sustainability science programs (O'Byrne et al. 2015), which increasingly define the learning objectives for their students in terms of sustainability competencies (Salovaara et al. 2020). Competencies are "complex combination[s] of knowledge, skills, understanding, values, attitudes and desire which lead to effective, embodied human action in the world" (Crick 2008). There is increasing agreement on the set of key competencies in sustainability (Redman et al. 2020), namely, systems-thinking, futures-thinking, values-thinking, strategic-thinking, and interpersonal competencies (Wiek et al. 2011). Similarly, scholars and educators have started to converge on effective and efficient pedagogies to develop these competencies (Brundiers et al. 2010; Frisk and Larson 2011; Barth and Michelsen 2013).
Yet, the practice of assessing students' sustainability competencies is still in its infancy. A broad range of assessment tools are currently in use for both research and instructional purposes (Cebrián Bernat et al. 2019). However, these tools are rarely selected with clear and informed intention, largely due to a lack of guidance in the literature. Despite a growing body of research describing innovative pedagogies (Hallinger and Chatpinyakoop 2019), there is a shortage of empirical evidence of whether and in what ways these pedagogies are successful in developing students' sustainability competencies (Osagie et al. 2016; Mindt and Rieckmann 2017; Garrecht et al. 2018). Meanwhile, course instructors, curriculum designers, and program directors lack the means to effectively assess whether or not they are successfully educating sustainability professionals through their courses and programs, which is a core purpose of assessment (Kuh et al. 2014). This is a significant gap when it comes to constructive alignment (Biggs 1996) and putting all critical components of sustainability (science) education in place (Fig. 1). As this figure illustrates, reliable and valid tools for assessing competencies, which is the focus of this article, fulfill an important function in supporting structured teaching efforts and student learning for sustainability.
Handled by Tatsuya Kusakabe, Hiroshima University Center for the Study of International Cooperation in Education, Japan.
Education science researchers have called out traditional methods of assessment as inadequate for measuring multidimensional and performance-oriented competencies (Frey and Hartig 2009). Traditional assessments are already challenging for experts to create and apply properly (Reckase 2017), and adequate assessment of competencies is even more so (Leutner et al. 2017). Nonetheless, much exploratory work on assessing competencies has begun (Hartig et al. 2007), though a review found that progress on competency assessment was limited, particularly in the non-cognitive dimensions (Zlatkin-Troitschanskaia et al. 2015). For sustainability competencies in particular, Barth (2009) provided a conceptual framing, and sporadic if increasing efforts to develop tools have been undertaken by individual instructors and researchers around the world (Cebrián Bernat et al. 2019). This growing body of research has yet to be brought together in a systematic review that compares the existing tools and provides guidance to instructors, researchers, and program directors.
This review article examines what tools are currently used for assessing students' sustainability competencies, as documented in the literature through the end of 2019. We conducted an in-depth analysis of a comprehensive sample of peer-reviewed publications (N = 75) and distilled a typology of assessment tools for sustainability competencies. We also evaluate strengths and weaknesses of these tools and offer avenues for improvement. The article provides guidance to instructors, researchers, and program directors who are interested in using competencies assessment tools in more informed ways.

Research design
To review the literature on assessing students' sustainability competencies thus far, we systematically collected publications from SCOPUS, Web of Science, ERIC, and Google Scholar, published in English through 2019, resulting in a first pool of 3908 publications. Following Moher et al.'s (2009) and Fink's (2014) systematic review approaches, we then iteratively excluded publications by first reviewing the titles, then the abstracts, and finally the full text. This yielded 75 publications focused on sustainability competencies assessments (see appendix for a full description of procedures). For this sample, Fig. 2 shows the steady growth of publications on sustainability competencies assessments over the last 10 years. However, they still represent less than 7% of the sustainability (science) education research field as reviewed in 2017 (Grosseck et al. 2019). The publications come from 35 outlets, yet research took place almost exclusively in OECD countries (93%) and at higher education institutions (87%). Sustainability/environmental degree programs, teacher training, general education, and business/management education were the most frequent focus areas of the studies. Research on assessment in sustainability (science) education appears to be in its emergent growth phase, trailing the pattern of research growth in sustainability science by about 15 years (Fang et al. 2018).
In reviewing the sampled literature, we identified 121 total tools in use (many of the 75 reviewed studies used more than one tool), which we classified into eight distinct types of tools currently being used to assess students' sustainability competencies. To be clustered into a type, a tool had to have a record of several documented applications. We disregarded terminological differences in cases where authors used different names for the same tool. We first generalized the descriptions to cover all specific tools under each type and then standardized the descriptions to make the tools comparable (Table 1). We then analyzed each tool (type) independently and in contrast to the others using a set of common attributes (Table 2). Finally, we appraised the strengths and weaknesses of each tool (type) and explored potential improvements (Table 3). This appraisal was informed by insights on competencies assessments gleaned from the broader educational literature.

Typology of tools for competencies assessment
Instructors use a wide variety of tools for assessing students' sustainability competencies (121 in total were identified from this sample). They can nonetheless be clustered into eight major tools (types) currently in use (Table 1). Some of these types are quite broad (e.g., reflective writing), while others are narrower but also more refined (e.g., concept mapping). Many studies used more than one tool (n = 31), with scaled self-assessment being disproportionately represented among these (80%) when compared to the overall sample (56%). Generally, there were only a few cases where a single tool was developed over multiple publications. The exception was the scenario/case test type, where four tools were iteratively developed over 14 publications.
We first present examples of each tool (Table 2). These examples were chosen based on three criteria: (1) representativeness of the tool, (2) clarity of description in the publication (a frequent deficiency), and (3) use of the competency framework articulated by Wiek et al. (2011). We purposefully selected examples that use the same key competencies, so that comparability between tools is enhanced. In our sample, the Wiek et al. (2011) framework was the only one used across enough studies to make this possible; it is also highly influential in the broader field of sustainability (science) education, as noted in other reviews (Grosseck et al. 2019). However, it is not possible to conduct a comprehensive meta-analysis of assessment results due to the diversity of what is being assessed, i.e., the specific sustainability competencies targeted.
The examples are drawn from a single source for each tool. They are described by two sets of characteristics: one for the tool itself and one for its application. The table can be read horizontally to give an overview of each example or vertically to enable comparison between tools for each characteristic. The different tools were each fairly widely applied (as represented by the captured characteristics). The scope of applications described in Table 2 well represents those within the overall sample. For each tool, there was also quite a variety of application settings.

Table 3 (excerpt). Cluster 1: self-perceiving-based assessment procedures
Tool: Scaled self-assessment
Current practice:
• Students are asked individually to rate their agreement with pre-defined competency statements on a 4- to 9-point Likert scale
• Before and after the course
• Quantitative data analysis
Strengths:
• Easy to administer, analyze, and scale (Cebrián Bernat et al. 2019)
• Integrated with other survey-based data collection
• Produces quantitative data to which statistical analysis and modeling can be applied
• Is an effective tool for formative assessment (Andrade 2019), and practice improves student self-awareness
Weaknesses:
• Results are based on the unknowable way in which each student (inconsistently) interprets the prompt and the scale or understands the competency
• Distance between items on scales cannot be assumed to be linear (Bishop and Herron 2015)
• Students are unlikely to have the ability to rate their own capacity in an activity they have never practiced
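Because the distance between Likert scale points cannot be assumed to be linear, one cautious way to analyze before/after self-ratings is to use only the direction of each student's change rather than mean differences. The following is a minimal sketch of an exact two-sided sign test, not a method used in the reviewed studies; the function name and the ratings are invented for illustration.

```python
from math import comb


def sign_test(pre: list[int], post: list[int]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired ordinal ratings.

    Returns (n_increased, n_decreased, p_value). Ties are dropped,
    so only the direction of each student's change is used.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    up = sum(b > a for a, b in zip(pre, post))
    down = sum(b < a for a, b in zip(pre, post))
    n = up + down
    if n == 0:
        return up, down, 1.0
    k = min(up, down)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return up, down, min(1.0, 2 * tail)


# Invented ratings on a 5-point scale for eight students (illustration only).
pre_course = [2, 3, 3, 2, 4, 3, 2, 3]
post_course = [4, 4, 3, 3, 5, 4, 3, 2]
up, down, p = sign_test(pre_course, post_course)
print(f"increased: {up}, decreased: {down}, p = {p:.3f}")
```

A test of this kind makes no claim about how large the change is, which is the price of not treating the scale as interval data.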
Having identified eight distinct assessment tools (types), we reviewed each of the studies again (full list in the "Appendix"), particularly with respect to the research methods used, and conducted an analysis for each tool. The first result of this analysis was that the eight tools can be further clustered into three meta-types: self-perceiving-based assessment procedures, observation-based assessment procedures, and test-based assessment procedures (see Table 3). The critical characteristic of the tool that determines the cluster is who is doing the assessment of the students' competencies. For self-perceiving-based procedures (e.g., reflective writing), students assess their own competence level and/or development. In applying observation-based procedures, instructors or experts assess students' competencies. The test-based assessment procedures use a predefined set of criteria (or "correct" answers) to evaluate students' competencies. Because of this distinction in who assesses students' competencies, the tools within each cluster share much in common in terms of strengths and weaknesses.
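The clustering rule above (the meta-type follows from who performs the assessment) can be written down as a small lookup. This is only an illustrative sketch: the class, the assessor labels, and the function name are hypothetical, and only the two tools that the text explicitly places in the self-perceiving cluster are listed.

```python
from enum import Enum


class MetaType(Enum):
    SELF_PERCEIVING = "self-perceiving-based"
    OBSERVATION = "observation-based"
    TEST = "test-based"


# Who judges the student's competence decides the cluster (hypothetical labels).
ASSESSOR_TO_META_TYPE = {
    "student": MetaType.SELF_PERCEIVING,
    "instructor_or_expert": MetaType.OBSERVATION,
    "predefined_criteria": MetaType.TEST,
}


def classify(assessor: str) -> MetaType:
    """Return the meta-type for a tool, given who performs the assessment."""
    try:
        return ASSESSOR_TO_META_TYPE[assessor]
    except KeyError:
        raise ValueError(f"unknown assessor: {assessor!r}") from None


# Two tools the text places in the self-perceiving cluster.
tools = {
    "scaled self-assessment": "student",
    "reflective writing": "student",
}
clusters = {name: classify(who) for name, who in tools.items()}
for name, meta_type in clusters.items():
    print(f"{name}: {meta_type.value}")
```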
Based on the analysis of the sample articles and review of broader education science literature, we compiled a distilled set of strengths, weaknesses, and best practices for each tool (Table 3). An exemplary citation was provided for each point whenever possible, typically representing many other sources. The column on current practice in Table 3 offers a generic description of the tool based on the full scope of examples, in contrast to the detailed, but specific examples offered in Table 2.

Discussion
We conducted a systematic review of the growing body of published research on the assessment of sustainability competencies. This review identified a wide range of assessment tools currently in use (more than 120 specific tools). Yet, despite this diversity on the surface, we argue for a typology containing eight major tool types that can be further grouped into three clusters of assessment procedures (Table 3). The tool types we specify overlap meaningfully with those utilized by Nicolaou and Constantinou (2014) in their systematic review of assessing a competence closely related to sustainability (modeling in science). In-depth insights into the tools come via the examples included in Table 2 and through the appraisal summarized in Table 3. There are clear signs of substantial investment in model and tool building, multi-methodological triangulations, and the piloting of innovative assessment tools (see Box 1, below). However, this appraisal also reveals flaws in the current assessment practice in sustainability (science) education: there is too little connectivity across studies, in particular regarding agreement on outcomes; an over-reliance on scaled self-assessment; and a general insufficiency of actual tool development. The implications of these flaws can be seen in Fig. 1: unclear learning objectives (1) or the lack of a baseline assessment (2) undermine the effectiveness of even well-developed assessment tools.
Box 1. Novel assessment tools use in-vivo simulated professional situations to assess students' sustainability competencies, following a model from medical and social work education programs. A recently published study (Foucrier and Wiek 2020) presents the results of testing such an assessment tool for an interdisciplinary graduate course in sustainability entrepreneurship at Arizona State University (several graduate programs involved). The students were provided with material and asked to prepare as sustainability consultants for a simulated city council meeting on infusing sustainability into the local economy. The tool was tested in two different settings: one deployed with four of the graduate students at the local city hall with actual professionals (city council member, local government administrator, local business association representative), and one with five of the graduate students at the university with "actors" (sustainability graduates and researchers). Student performances were evaluated against a set of 22 criteria. The test results indicate that the tool is valid/reliable against a number of these criteria and provided an assessment of student performance very close to actual practice. Such an in-vivo assessment proved both resource and time intensive, but there are guidelines on the conditions under which this assessment tool seems most effective and a worthy investment.
Other than the studies where the same research group builds on its previous work (scenario/case test type), there are no obvious connections (e.g., citations) made across research efforts. Even in cases where the same competencies are assessed (e.g., Wiek et al. 2011) and the same assessment tool is applied (e.g., scaled self-assessment), new studies do not build on the tool previously used. The reviewed competency-like constructs that are currently used in assessments are often described so differently that a comparison across assessments is impossible. Besides drawing on Wiek et al. (2011), a handful of studies explicitly proposed "new" competencies such as sustainability and social responsibility (SSR) (Albareda Tiana and Alférez Villarreal 2016); others leave it quite unclear what competencies were actually being assessed. Apart from making comparisons across assessments impossible, this ambiguity of learning outcomes undermines recognition and career trajectories of graduates from sustainability (science) programs.
Scaled self-assessment was by far the most commonly chosen assessment tool (56% of cases); yet, only rarely has the tool choice been justified. In their descriptive review, Cebrián Bernat et al. (2019) hypothesize that this type of tool is often selected because "it is less time-consuming, easy to distribute amongst a larger number of students, and in turn it provides a larger amount of information." Several authors make the case for its pedagogical uses in sustainability science, in line with educational scholars who have advocated for self-reflection as a tool for formative assessment (Andrade 2019). However, as a tool for robust, reliable, and valid measurement of sustainability competencies, self-assessment falls much too short to warrant such popularity.
As Metzler and Kurz (2018, p. 8) conclude in their report on educational assessment procedure, "data gleaned from easy measurement tell us little about the student learning that matters most." Even among the assessment studies carefully selected for inclusion in this review, the development of assessment tools tends to be an apparent afterthought. The main topics of the studies are the pedagogical approach, case description, or programmatic innovation. Assessment as such is used to produce some empirical evidence to validate those initiatives' success. Little effort goes into tool development ahead of time or reflection afterwards. But there are many studies from the educational sciences (Barth and Michelsen 2013) that have rigorously developed assessment tools, which the practice of sustainability competencies assessment should adopt going forward. Some, such as the recent work of Mehren et al. (2018) on assessing systems thinking in geography, are highly relevant, yet sustainability science is not learning from them. We recommend four steps: first, developing a clear set of learning objectives/outcomes to be assessed, properly operationalized for the given context; second, providing a theoretical and empirical basis for selecting a particular assessment tool to be used; third, articulating a psychometric model which links the learning outcomes to the tool to be used; fourth, pilot testing the tool with a relevant sample population.
Many disciplines have adopted some form of sustainability (science) education, and instructors ought to look for assessment tools that fit their specific teaching situation. The experiences so far suggest that combining assessment tools may be the best way to address the shortcomings of any particular assessment tool. For example, assessment tools with reasonable validity due to narrow learning objectives will likely have low reliability across contexts and content (Schuwirth and Van Der Vleuten 2011). Each assessment tool has inherent weaknesses even with proper development (which the typology helps to foresee); thus, triangulation should happen on two levels: within the clusters and between them. For example, combining scaled self-assessment with reflective writing (within a cluster) provides a more complete and meaningful picture of the students' views of their own competencies, while triangulating these results with a testing approach (between clusters) checks the validity of students' self-perception against an objective (if typically narrower) measure.
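As one hedged illustration of between-cluster triangulation, self-ratings can be compared with scores from a test-based tool using a rank correlation, which treats both measures as ordinal. The sketch below implements Spearman's rank correlation with tie handling in plain Python; the function names and all data are invented, and it is not the procedure of any reviewed study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented data for six students (illustration only).
self_rating = [4, 5, 3, 4, 2, 5]       # scaled self-assessment, 5-point scale
test_score = [55, 60, 70, 65, 40, 90]  # score on a scenario/case test
rho = spearman(self_rating, test_score)
print(f"Spearman rho = {rho:.2f}")
```

A weak or moderate correlation in such a comparison would suggest that self-perception and tested performance capture different things, which is the reason for triangulating in the first place.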
As mentioned above, individual cases of developing assessment tools seem quite promising. Beyond the increase in the quantity of publications, some tools have been developed with rigor, along the lines of the four steps outlined above. Additionally, it is critical to plan for ultimate deployment on a scale sufficient to the needs of sustainability (science) education (Arima 2009), a topic that Holdsworth et al. (2019b) have explicitly grappled with over a series of articles. Yet, for all the innovation that sustainability (science) education purports to offer pedagogically, the field has so far had little to offer in terms of assessment. Inspiration could be drawn from many other educational fields (Leutner et al. 2017), in particular from medical education, with its innovative approaches to competency assessment (Lockyer et al. 2017). This is in line with other intriguing parallels between medical and sustainability (science) education. The recent in-vivo assessment described in Box 1 drew its inspiration from the long and established practice of competencies assessment in medical education. Sustainability (science) education researchers and practitioners would do well to find inspiration in such corners.

Conclusions
This article offers a typology which provides guidance for instructors, researchers, and program directors interested in assessing students' competencies in sustainability. This typology, based on a systematic review and synthesis of the academic literature through the end of 2019, goes beyond description to offer an appraisal of eight types of assessment tools. The analysis of their strengths, weaknesses, and best practices distills the key lessons from the 75 peer-reviewed publications included.
Reflective of the rest of the field of sustainability (science) education, there is a lack of explicit agreement on what is being assessed. This makes comparison of results impossible and also complicates comparisons of the process of assessment (i.e., the tools themselves). Perhaps because assessment is not the topic of primary research interest, the assessment tools are typically not well developed and are often inappropriately used. This is particularly true of scaled self-assessment, whose weaknesses are well documented, yet which continues to dominate current assessment practice. In response to the lack of robust assessment tools, many instructors, researchers, and program directors have chosen to apply more than one, an approach which is likely to have value even when the tools used are extensively developed. The proposed typology provides a structure of the field as it is today. As more tools are developed and refined, we would expect to distinguish more specific tools such as concept mapping (specific to systems-thinking competence) within each of the broader categories. Ultimately, it would be the meta-types (e.g., self-perceiving) which would form the critical organizing structure. Despite a bumpy beginning, current trends are quite positive, as more rigor is being applied in combination with meaningful innovations.
Considering the need for broad sustainability (science) education, efforts ought to be accelerated. If education is going to contribute to the needed global transformations, the scholarly community needs to generate more evidence about "what works" for teaching and learning (evidence-supported practices), and this requires robust assessment tools. As we briefly touched on, sustainability (science) education researchers need to draw much more heavily on work being done in other education research fields. These efforts should extend beyond just the research perspective to include coordination across the relevant parties. Researchers, for example, need to focus on linking outcomes to the actual learning processes, while instructors may emphasize the formative aspect, and program directors may be concerned about objective and comparable measures for reporting. In these efforts, there is a need for innovative assessment approaches that more directly prepare students for their professional paths and the challenges they will be facing.
Funding Open Access funding provided by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix
Synthesizing a growing body of research, such as that on sustainability competency assessment, is best done through a literature review (Snyder 2019). For this study, we conducted a literature review following the procedures laid out by Fink (2014). This appendix describes how we followed Fink's (2014) approach to be systematic, explicit, comprehensive, and reproducible. We sought to identify all articles that were published on assessing sustainability competencies. To be sure that definitional differences did not accidentally exclude relevant articles, we searched for synonyms of competencies and did not include assessment in the search procedures (it is used in many other ways in sustainability fields, e.g., LCA), rather using it as a screening criterion. We drew from as broad a pool of publications as possible, so we conducted our search on Web of Science, SCOPUS, ERIC, and Google Scholar. Based on other reviews, we expected these databases to provide comprehensive coverage. The following search strings were used:
a. Scopus
i. Search the title, abstract and keywords; English; Through 2019
ii. TITLE-ABS-KEY ("competency" OR "competence" OR "competencies" OR "competences" OR "attribute" OR "attributes" OR "capability" OR "capabilities" OR "learning outcome" OR "learning outcomes") AND TITLE-ABS-KEY ( education) AND KEY ( "sustainable development" OR "sustainability") AND LANGUAGE ( english) AND PUBYEAR < 2019 AND ( EXCLUDE ( SUBJAREA, "MEDI") OR EXCLUDE ( SUBJAREA, "NURS") OR EXCLUDE ( SUBJAREA, "PHAR") OR EXCLUDE ( SUBJAREA, "HEAL") OR EXCLUDE ( SUBJAREA, "DENT") OR EXCLUDE ( SUBJAREA, "IMMU"))
iii. 1398 results
b. Web of Science
i. Topic search (TS); English; Through 2019
ii. TS = (("competency" OR "competence" OR "competencies" OR "competences" OR "attribute" OR "attributes" OR "capability" OR "capabilities" OR "learning outcome" OR "learning outcomes") AND "education" AND ("sustainable development" OR "sustainability"))
iii. 1198 results
c. ERIC (ProQuest)
i. Search Anywhere; 2 separate command lines; English; Through 2019
ii. "competency" OR "competence" OR "competencies" OR "competences" OR "attribute" OR "attributes" OR "capability" OR "capabilities" OR "learning outcome" OR "learning outcomes" | "sustainable development" OR "sustainability"
iii. 830 results
d. Google Scholar search
i. Used the software Harzing's Publish or Perish (https://harzing.com/resources/publish-or-perish), which searches and downloads up to 1,000 citations but has a character limit on searches
ii. Through 2019 | Sustainability, education | Competencies: 750; Attributes: 250; Capabilities: 250; "Learning Outcomes": 250
iii. 1,000 results
After duplicates were removed, 3898 publications constituted the first sample. Following the structured review approaches of Moher et al. (2009) and Fink (2014), we then iteratively excluded publications. We excluded irrelevant publications first based on titles (1747), abstracts (1241), and other content (108). Of the remainder, the full text was downloaded (except for 52 which could not be) and reviewed for a final exclusion (559). A detailed reading of each article was carried out, resulting in a few more exclusions (64) and a final sample of 75 articles. At the title stage, only the most obviously unfit publications were excluded. An example title to remove was: "What attributes do Australian midwifery leaders identify as essential to effectively manage a Midwifery Group Practice?" The abstracts and full text were given more than one critical reading to determine inclusion or exclusion. The selection of articles was carried out primarily by the first author, with checks done by the co-author. Other experts in the field were consulted for missing publications.
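The appendix does not say how duplicates across the four databases were identified. As a hedged illustration only, the sketch below merges exported records and treats two records as duplicates when they share a DOI or a normalized title. The field names, function names, DOIs, and records are all hypothetical and do not come from the study.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen = set()
    unique = []
    for rec in records:
        keys = {("title", normalize_title(rec["title"]))}
        if rec.get("doi"):
            keys.add(("doi", rec["doi"].lower()))
        is_duplicate = bool(keys & seen)
        seen |= keys  # remember every identifier, even for duplicates
        if not is_duplicate:
            unique.append(rec)
    return unique


# Invented records standing in for database exports (illustration only).
records = [
    {"source": "Scopus", "doi": "10.0000/example.1",
     "title": "Assessing Systems Thinking in a Sustainability Course"},
    {"source": "Web of Science", "doi": "10.0000/EXAMPLE.1",
     "title": "Assessing systems thinking in a sustainability course."},
    {"source": "ERIC", "doi": None,
     "title": "Learning outcomes of a teacher training module"},
    {"source": "Google Scholar", "doi": None,
     "title": "Learning Outcomes of a Teacher-Training Module"},
    {"source": "Scopus", "doi": "10.0000/example.2",
     "title": "Capabilities for sustainable development in business education"},
]
unique = deduplicate(records)
print(f"{len(records)} records, {len(unique)} after duplicate removal")
```

Automated matching of this kind would normally be followed by a manual check, since title variants and missing DOIs can hide or create false duplicates.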
The criteria used to include publications (i.e., not put them in the exclusion group at each step) were:
• English
• Published or in press by the end of 2019
• Education type (any level) of the following domains:
  o Sustainability-focused education
  o Adding sustainability focus to other degrees/programs/general etc.
  o Environmental education with a strong sustainability-related focus
• Includes specific learning objectives (e.g., competencies, capabilities, learning outcomes, attributes)
• Includes an evaluation or assessment of impact of a program on learning objectives
See Table 4.