Scenario Evaluation with Relevance and Interest (SERI): Development and Validation of a Scenario Measurement Tool for Context-Based Learning

Context-based learning (CBL) approaches have been recommended and expanded in science education to make science more relevant to students by connecting science content with students’ daily life. Subsequently, in order to implement CBL at school, a group of scenarios has been produced by several stakeholders. However, there is a lack of resources to measure effectively what makes a good scenario. Thus, this study aims to develop and validate a scenario evaluation instrument to examine students’ perspectives on science career-related scenarios through the lens of relevance and interest. For this purpose, 25 science career-related scenarios and a measurement tool, Scenario Evaluation with Relevance and Interest (SERI), were developed by a team of researchers for the EU funded MultiCO project. Then, lower secondary school students from three different countries, Estonia, Finland, and the UK, were asked to respond to the newly developed instrument after reading the scenarios, and their responses were analyzed by factor analyses and multivariate analysis of variance. According to the result, this instrument has good construct validity and reliability. However, it indicates one issue of discriminant validity between two factors, individual dimension and societal dimension. Also, significant gender differences were found in the Estonian sample regarding students’ perspectives on the scenarios. Possible interpretations of the results and implications of the suggested measurement tool are discussed.


Introduction
In the science education community, the decline in students' interest in science has been an issue in recent decades, especially at the secondary school level. As interest has been shown to be a sound predictor of students' future career aspirations (Kang & Keinonen, 2017; Schoon, 2001; Tai, Liu, Maltese & Fan, 2006), young pupils' decreasing interest in science has contributed to low enrolment in advanced science courses in upper secondary school and college (Fouad et al., 2010; Simon & Osborne, 2010). It has been indicated that one of the major reasons for this phenomenon is that science at school is irrelevant to students' everyday life (Gilbert, 2006; Holbrook, 2008; Stuckey, Hofstein, Mamlok-Naaman & Eilks, 2013). This issue has resonated so much with educators, policymakers, and researchers that many projects have been initiated in order to make school science more relevant to students' daily life, with the aim of drawing students' attention to science and STEM (Science, Technology, Engineering, and Mathematics) careers. One such attempt is the context-based science curricula movement, which was initiated in the 1970s and has developed in various forms (Taconis & den Brok, 2016). In particular, since contexts such as socio-scientific issues have been designed to relate science content to students' lives, context-based learning (CBL) in science education is distinctive in that it develops and uses scenarios relating to those contexts (Lubben, Bennett, Hogarth & Robinson, 2004; Pilot & Bulte, 2006). Accordingly, a variety of scenarios has been produced by multinational projects to foster the relevance and meaningful learning of science (e.g. PROFILES (Bolte, Holbrook, Mamlok-Naaman & Rauch, 2014) or PARSEL (Nielsen et al., 2008)).
However, the important term “relevance” has been interpreted and understood in several ways in the science education community, and thus there has been a lack of consensus on its meaning (Stuckey et al., 2013). Likewise, it can be assumed that there is little consensus on how to develop “relevant” scenarios for CBL. Inspired by the study of Newton (1988), Stuckey et al. (2013) identified three dimensions of relevance in science education (individual, societal, and vocational), each spanning a present-future and an intrinsic-extrinsic range. This model offers critical insight into the meaning of relevant science education and its relation to science interest. Notably, in designing scenarios, the dimensions of this model can serve as categories for measuring, for instance, the relevance of the topics and contents of the scenarios. However, there is insufficient empirical research developing measurement tools for scenarios that consider diverse aspects of relevance (e.g. Sormunen, Hartikainen-Ahia & Jäppinen, 2017). Accordingly, this study aims to develop and validate a measurement tool for scenarios in order to support the design of relevant scenarios for CBL in science education.

Definitions and Dimensions of Relevance in Science Education
As mentioned, making science classes more relevant to students' daily life has been emphasized in the last few decades because of students' low levels of interest in science. The concerns relating to science interest have been raised not only for students who are planning to become future scientists but also for those who are expected to become scientifically literate citizens. Science interest has been related positively to students' involvement in socio-scientific issues that demand scientific understanding of both content and process (Pilot & Bulte, 2006). Particularly since the 1980s were called “years of crisis in science education,” this perspective has constantly been discussed within the science education community. Consequently, various national and international projects have been conducted in an attempt to make science education more relevant. Among them, the Relevance of Science Education (ROSE) project was one of the representative cases (Schreiner & Sjøberg, 2004) that brought the issue of relevant science education to the fore. In that project, the term relevance was defined widely as “meaningful, motivating, interesting, engaging, important, etc.” (p. 21), and thus the researchers investigated a variety of students' motivations for learning science in respect of relevance without any operational definition of the term.
In the last 20 years, issues relating to relevant science education have been discussed rigorously in chemistry education (e.g. King, 2012; Pilot & Bulte, 2006). However, it can be argued that the discussion applies to all other science subjects, as chemistry is one of the sciences. Van Aalsvoort (2004a, b) examined the lack of relevance in chemistry education through the lens of logical positivism and activity theory. As a result of the analysis, she suggested four distinct meanings of relevance (personal, professional, social, and personal-social relevance) and their implications for teaching chemistry. For instance, in terms of personal relevance, chemical education should make connections between the contents and students' lives because “chemical objects, events, and concepts do not have a life of their own, but are connected with products we all use” (Van Aalsvoort, 2004a, p. 1648). In the end, her studies aimed to point out that, with regard to personal-social relevance, students should be supported to become responsible citizens by chemical education that embeds the clear contribution of chemistry professions and social needs. In line with Van Aalsvoort, Stuckey et al. (2013) introduced a three-dimensional model of relevance in science education comprising individual, societal, and vocational dimensions. Specifically, they specified the meaning of each dimension along two spectra: timeline (present and future) and motivation (intrinsic and extrinsic). As an example, regarding the intrinsic individual dimension, they suggested that science education should consider students' interest (present) and useful skills needed for personal life (future). In terms of the extrinsic individual dimension, they indicated that good marks in school (present) and acting responsibly (future) can be considered to make science lessons more relevant.
However, as they indicated, many concepts overlap and there are no clear distinctions between the dimensions. In particular, since the term interest can be understood differently and widely in science education, it may be inadequate to place it in one dimension of relevance (the individual dimension); indeed, the concept of interest has been related to all other aspects of relevance in their three-dimensional model as well. In other words, if individual interest is placed within one dimension of relevance, it may conflict with the initial idea that relevance in science education should be promoted with the aim of increasing students' interest (Hulleman & Harackiewicz, 2009). Therefore, these two constructs, relevance and interest, need to be distinguished in order to gain a clear understanding of the relationship between relevant science education and the student interest catalyzed by that relevance, as suggested by Kotkas, Holbrook and Rannikmäe (2017).

The Construct of Science Interest
Interest has been introduced with several theories in educational studies (e.g. Alexander, 2004; Krapp, 2002; Silvia, 2001). However, in contrast to other motivational variables, those theories have related interest to an object or specific content such as a particular science content or context, subject, area of knowledge, or activity, and this idea has been dominantly used in the science education community (Potvin & Hasni, 2014).
Regarding the construct of interest, Hidi and Renninger (2006) used three components of interest (affect, knowledge, and value) to introduce the four-phase model of interest development, which describes the sequential growth of interest in terms of the amounts of those three components. Similar to Hidi and Renninger (2006), Krapp (2007) introduced three general characteristics of interest: cognitive aspects (knowledge), emotional characteristics (affect), and value-related characteristics (value). Cognitive aspects or knowledge refer to the readiness to acquire new knowledge related to the person's interest since “a person who is interested in a certain subject area is not content with his or her current level of knowledge or abilities in that interest domain” (p. 10). That is, the person with interest may attempt to expand knowledge and learn more about the topic. Emotional characteristics or affect refer to positive emotions such as enjoyment connected with an interest-triggered activity; this positive emotion is often expressed as “I like something or doing something.” The value-related characteristics or value refer to a positive personal evaluation of the object of interest since “a person shows a high subjective esteem for the objects and actions in his or her areas of interest” (p. 11). According to one PISA report (Organization for Economic Cooperation and Development [OECD], 2007), students' personal value of science is clearly distinguished from their social value of science. Moreover, students often see science as an important contributor to societal development; students indicate general support for science needs, but they do not relate science to their lives (OECD, 2007; Salonen, Kärkkäinen & Keinonen, 2018). However, as is known, this personal value of science indicated a strong positive correlation with science performance (OECD, 2007).
Interestingly, one of the questions used in PISA to measure personal value was “science is very relevant to me”; that is, this value-related characteristic of interest may be closely related to the concept of relevant science education.

Context-Based Learning in Science Education
Given the notion that the use of context in science education is to emphasize students' “learning as relevant to some aspect of their lives” (Gilbert, 2006, p. 960), the movement of context-based learning (CBL) has been closely in line with relevant science education. King (2012) defined CBL specifically in chemistry as “when the 'context' or 'application of the chemistry to a real-world situation' is central to the teaching of the chemistry…that is, when the students require the concepts to understand further the real-world application” (p. 53). Thus, the CBL movement ultimately aims to bring science to students' lives (Bennett, Lubben & Hogarth, 2007).
In the 1970s, a large-scale CBL project, the Dutch Physics Curriculum Project (PLON: Project Leerpakket Ontwikkeling Natuurkunde), was started at the University of Utrecht in the Netherlands with the aim of connecting physics with students' daily life so as to help students see science as more attractive and relevant (King, 2012). To this end, the units of the PLON program started with an orientation concerning social and societal domains of the learner, using a basic question that was ultimately linked to relevant science concepts. Following PLON, several other large-scale initiatives have tried to enhance CBL, and Pilot and Bulte (2006) introduced and analyzed five representative CBL-related major projects in different educational systems: Chemistry in Context in the USA, Salters in the UK, Industrial Science in Israel, Chemie im Kontext in Germany, and Chemistry in Practice in the Netherlands. According to their results, all five approaches were developed in order to provide personal and societal relevance to students, and four of them reported that participants enjoyed CBL more than traditional approaches and indicated higher interest in science and more positive opinions about the relevance of science content after CBL experiences. Regarding students' understanding of science concepts, Pilot and Bulte (2006) compared several studies investigating students' conceptual understanding using samples from those five projects and concluded that CBL students indicated a deeper understanding of science concepts than those who participated in more traditional approaches. Although some criticism has remained regarding the effects of CBL, there is much evidence reporting positive effects of CBL on students' attitude to and understanding of science (Bennett et al., 2007).
Regarding CBL curricula development, Bennett and Holman (2002) stated that “learner motivation has been the strongest driving force in the development of 'relevant' curriculum materials” since the aim of CBL is to make science more attractive and meaningful to students. For this, in contrast to traditional teaching, the contexts are generally used as the starting point of science lessons to attain and retain students' attention and to develop their scientific ideas (Bennett et al., 2007). However, as Gilbert (2006) pointed out, it has been challenging to relate science content to students' personal relevance in developing context-based curricula. Subsequently, in order to support relevant curricula development, numerous scenarios have been developed and disseminated to the science education community via many large-scale international projects. In the PROFILES (Professional Reflection Oriented Focus on Inquiry-based Learning and Education through Science) project, for instance, a three-stage model approach was introduced consisting of the introduction of a relevant socio-scientific scenario as the first stage, inquiry-based learning as the second stage, and students' engagement in a decision-making process relating to the socio-scientific scenario as the last stage (Bolte et al., 2014). During the 5-year project, scenarios relating the content to the contexts were produced and practiced by 21 partner countries, and the scenarios have been made publicly available on the website.
In sum, scenarios in CBL play a pivotal role in connecting school science content with students' daily life so that students can have a positive attitude to learning science. In order to achieve this goal, scenarios should be perceived as relevant and interesting by students, and each construct (relevance and interest) may consist of three components, as shown in Fig. 1. However, although much effort has been made to assess the effect of CBL, there is a lack of studies developing tools that measure the scenario itself in terms of the relevance and interest of its contents. Since scenarios are located at the center of CBL, it is important to evaluate whether a scenario meets students' needs in terms of relevant and interesting science education. Accordingly, this study aimed to develop and test a measurement tool for scenarios through the lens of the three-dimensional model of relevance and the three characteristics of interest. We also took gender and cultural differences into account since these two variables are important in developing the tool and disseminating it to multinational studies and environments. This study is an extended follow-up of Kotkas et al. (2017), which examined the scenario evaluation instrument with one particular scenario and one particular national sample.

Sample and Questionnaire
The sample of this study is from three countries, Estonia, Finland, and the UK, participating in the EU project “Promoting Youth Scientific Career Awareness and Its Attractiveness through Multi-Stakeholder Co-operation” (MultiCO). The MultiCO project aims to make science and science careers more attractive to students; for this, career-based, science-related scenarios concerning diverse global challenges, such as energy, water, waste, food, or health issues, have been developed and introduced in order to increase students' awareness of science careers and the role of science in society. During the project, five successive interventions using scenarios are being implemented in each country. An instrument, the Scenario Evaluation with Relevance and Interest (SERI) survey for measuring students' perceptions of career-related scenarios, was developed. Before the first intervention, a large-scale survey was carried out with the SERI instrument to evaluate the scenarios in all participating partner countries; overall, 25 scenarios were evaluated by students from each country. Students evaluated scenarios as a group or individually; 574 responses were gathered from 320 girls (or groups of girls) and 253 boys (or groups of boys), and 1 respondent did not indicate his or her gender. Most of the participants were aged 13 (see Table 1).
The instrument initially consisted of 28 questions measuring students' perception of scenarios. In the small-scale study of Kotkas et al. (2017) using the same instrument, six factors were indicated: learning value, vocational value, scenario attributes, career awareness, social value, and like and interest. However, before the large-scale study, we conducted a rigorous literature review again, built a new conceptual framework, and decided to select 21 questions that were related to the three dimensions of relevance (individual, societal, and vocational) and the three characteristics of interest (affect, knowledge, and value). In addition, since we found two different aspects of the vocational dimension of relevance in the questionnaire, knowledge gain and future aspiration, we categorized them separately. Each subcategory of relevance consists of three to six questions. Regarding the interest domain, we found four questions reflecting the three characteristics of interest and general interest. Finally, we had two categories with eight subcategories as shown in Table 2. While questions relating to relevance were asked using a four-point Likert scale from totally disagree to totally agree, interest-related questions were asked with a three-point Likert scale considering the nature of the questions and the age of participants (Kotkas et al., 2017). However, since we mostly investigated the correlations of variables in our analyses, these different scales did not affect any of the results.

Fig. 1 The role of scenarios in context-based learning

Data Analysis
We conducted three main statistical analyses to achieve our research goals: exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and multivariate analysis of variance (MANOVA). In the first phase, we conducted EFA and CFA in order to find constructs related to the different dimensions of relevance and interest with the 21 questions presented in Table 2. For the factor analyses, the data from each country were split randomly into two halves. The first half (n = 286) was used to conduct EFA while the other half (n = 288) was used to perform CFA. EFA was conducted with the SPSS 23 software in order to reduce the items to a smaller number and to investigate the underlying structure of the variables, since EFA is used when the factorial structure of a measuring instrument is unknown (Wang & Wang, 2012). Before conducting EFA, the Kaiser-Meyer-Olkin (KMO) and Bartlett's tests were conducted to check whether the sample was suitable for EFA; a KMO index above .80 was taken to indicate that the sample was appropriate. The principal axis factoring (PAF) extraction method and varimax rotation with the standard criterion of an eigenvalue greater than 1 were used in EFA. Based on the result of EFA and previous literature, CFA was performed to assess construct validity using the Mplus version 7.4 software (Muthén & Muthén, 1998-2010). Traditional cutoff values were applied in order to assess the goodness of fit: RMSEA (Root Mean Square Error of Approximation) and SRMR (Standardized Root Mean Square Residual) below .08, and CFI (comparative fit index) and TLI (Tucker-Lewis Index) above .90 (Wang & Wang, 2012). Then, we assessed the internal consistency of each factor by checking whether Cronbach's alpha and the composite reliability (CR) coefficient were above .70. In addition, convergent and discriminant validity were examined based on Fornell and Larcker's (1981) criterion.
According to the criterion, the average variance extracted (AVE) value should be higher than .50 to ensure convergent validity, but for a newly developed scale, .45 is viewed as reasonable (Netemeyer, Bearden & Sharma, 2003); the square root of the AVE for each latent construct should be higher than that construct's highest correlation with any other latent construct to confirm discriminant validity. After assessing validity and reliability, we estimated measurement invariance between genders by means of a multiple group confirmatory factor analysis (MGCFA). Following Byrne (2010), we constrained factor loadings and item intercepts across groups systematically in a hierarchical order. To begin, configural invariance was examined as a baseline model of the invariance tests. This invariance model tests whether the same number of factors with the same number of items exists across gender with no equality restrictions on other parameters. Thereby, we could confirm that the suggested variables in this study measure the same constructs across gender. Then, we measured metric invariance by constraining factor loadings across gender and compared the model fit of the metric invariance model to the configural invariance model. If there is no significant difference between the two models, “measures across groups are considered to be on the same scale” (Byrne, 2010, p. 209). Finally, this metric invariance model was compared to a scalar invariance model constraining both factor loadings and item intercepts across groups. No significant difference between the metric and scalar invariance models allows factor means to be compared across groups. In these model comparisons, we first checked chi-square (χ²) differences to see whether the differences between the models were significant. However, since the χ² test has been criticized for its sensitivity to sample size (Chen, 2007), we also examined changes in CFI (ΔCFI), RMSEA (ΔRMSEA), and SRMR (ΔSRMR) based on the suggestions by Cheung and Rensvold (2002) and Chen (2007). The recommended criteria for invariance were ΔCFI ≤ .01, ΔRMSEA ≤ .015, and ΔSRMR ≤ .015 (metric invariance) or .03 (scalar invariance).
After testing and fulfilling the measurement invariance, we conducted the MANOVA in order to find group differences between gender and countries.

Table 2 (items for the individual and societal dimensions)
Individual dimension
IND2  The knowledge I gain from the scenario may be useful in the future.
IND3  I can put knowledge gained from the scenario into practice, to solve problems.
IND4  I find this scenario topic important for me personally.
IND5  I find this scenario topic important to my family.
IND6  I find this scenario topic important for learning school subjects.
Societal dimension
SOC1  I find this scenario topic important for appreciating the work of our local community (town, country).
SOC2  I find this scenario topic important for the whole world.
SOC3  The scenario presents a scientific problem, which is socially relevant.
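The reliability and validity checks described in the data analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the loadings and correlations in the example are invented rather than taken from the study, and composite reliability and AVE are computed with the commonly used formulas for standardized loadings with uncorrelated errors.

```python
import math
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list of respondent scores per item (same respondent order)."""
    k = len(item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]  # total score per respondent
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

def composite_reliability(loadings):
    """CR from standardized loadings, assuming uncorrelated error terms."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def convergent_ok(ave, cutoff=0.50):
    """Convergent validity: AVE above the cutoff (.45 for a newly developed scale)."""
    return ave > cutoff

def discriminant_ok(ave, correlations_with_other_constructs):
    """Fornell-Larcker: sqrt(AVE) must exceed the construct's highest correlation."""
    return math.sqrt(ave) > max(abs(r) for r in correlations_with_other_constructs)

# Hypothetical construct with three items (values are NOT from the study)
loadings = [0.7, 0.7, 0.7]
ave = average_variance_extracted(loadings)
cr = composite_reliability(loadings)
print(round(ave, 2), round(cr, 2))
print(convergent_ok(ave), convergent_ok(ave, cutoff=0.45))
print(discriminant_ok(ave, [0.6, 0.4]))
```

With these invented loadings the AVE falls just below .50 but above the relaxed .45 threshold, which shows why the choice of cutoff matters for a newly developed scale.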

Exploratory Factor Analysis
The KMO index was .92, above the .80 threshold, and Bartlett's test was significant (χ² = 4554.50, p < .001), indicating that our data were adequate for EFA; thus, we further investigated the other results of EFA. As shown in Table 3, 20 variables with loadings greater than .5 remained and formed four factors. However, contrary to our assumption, the “individual dimension” and “societal dimension” loaded on one factor; that is, there was a high correlation between these constructs. Also, the four variables relating to the three characteristics of interest formed one factor. On the other hand, the vocational dimension separated into two factors, knowledge gain and future aspiration, as predicted.
In order to examine the correlation between the individual dimension and the societal dimension, which loaded on one factor, we conducted EFA for each country separately to see whether every country indicated a similar pattern. For this country-level comparison, we used the full sample (n = 574) because the number of samples from each country became too small to conduct EFA if we only used half of them. Specifically, the number of Finnish samples dropped under 70 while the number of variables remained 21; this small sample size may not be adequate for EFA with 21 variables (Everitt, 1975).
According to the EFA comparison shown in Table 4, students from Estonia and Finland indicated factor loadings on the individual and societal dimensions similar to those in Table 3; only the UK sample presented two distinct factors for these two dimensions. Based on the EFA results, we decided to exclude IND6 and VFUT5 from further analyses.
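The random split of each country's data into an EFA half and a CFA half can be illustrated with a short sketch. The authors' exact procedure is not stated; the version below assumes a simple shuffle within each country with the smaller part going to the first half, uses the per-country response counts reported later in the discussion (177, 133, and 264), and invents the respondent identifiers. The function name is hypothetical.

```python
import random

def split_half_by_group(ids_by_group, seed=0):
    """Shuffle each group's ids and split them into two halves (floor / remainder)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    first, second = [], []
    for ids in ids_by_group.values():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        cut = len(shuffled) // 2
        first.extend(shuffled[:cut])
        second.extend(shuffled[cut:])
    return first, second

# Per-country response counts as reported in the discussion; ids are placeholders
counts = {"Estonia": 177, "Finland": 133, "UK": 264}
ids_by_group = {c: [f"{c}-{i}" for i in range(n)] for c, n in counts.items()}

efa_half, cfa_half = split_half_by_group(ids_by_group)
print(len(efa_half), len(cfa_half))
```

Under this assumption the halves contain 286 and 288 responses, matching the sizes reported for the EFA and CFA samples, and the Finnish part of either half is below 70.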

Confirmatory Factor Analysis
Based on the results of EFA and previous literature, we constructed a CFA model with the samples of Estonia, Finland, and the UK as shown in Fig. 2. CFA was performed with five latent constructs: the individual dimension, the societal dimension, two vocational dimensions (knowledge gain and future aspiration), and interest. As shown in Fig. 2, the factor loadings ranged from .54 to .91, exceeding the criterion of .50 (Hair, Black, Babin, Anderson & Tatham, 2006). The measurement model indicated an appropriate model fit (CFI = .931, TLI = .916, RMSEA = .058 (90% C.I. .048 and .068), SRMR = .059). Concerning the convergent validity, the [...] (Table 5); there can be an issue of discriminant validity between these two constructs in our sample. Regarding the constructs of relevance and interest, they fulfilled the convergent validity and discriminant validity while they also indicated a significant correlation with each other. With these five factors, internal consistency was examined based on values of Cronbach's alpha and composite reliability (CR) greater than .70. As presented in Table 5, the Cronbach's alpha and CR coefficients for each scale suggested that all of our scales had satisfactory reliability (> .70) in measuring students' perception of scenarios in terms of relevance and interest. In order to confirm the two dimensions of relevance and interest, we briefly compared three CFA models, the one-factor, five-factor (measurement model), and second-order models, as presented in Fig. 3.
According to the result, there was a significant difference between the one-factor model and the five-factor measurement model, and the fit of the one-factor model was unacceptable. On the other hand, while no significant difference was found between the five-factor model and the second-order model, both indicated proper model fit; the second-order model presented slightly better model fit than the five-factor model; thus, overall, the [...] (Table 6).
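As a small worked check, the fit indices reported for the five-factor measurement model can be compared against the traditional cutoffs named in the data analysis. The sketch below is illustrative only; the dictionary layout and function name are assumptions, while the cutoffs and the reported values come from the text.

```python
# Cutoffs from the data analysis section: RMSEA and SRMR below .08, CFI and TLI above .90
CUTOFFS = {
    "RMSEA": ("max", 0.08),
    "SRMR": ("max", 0.08),
    "CFI": ("min", 0.90),
    "TLI": ("min", 0.90),
}

def check_fit(fit):
    """Return, per index, whether the value satisfies its cutoff."""
    results = {}
    for name, (kind, cutoff) in CUTOFFS.items():
        value = fit[name]
        results[name] = value < cutoff if kind == "max" else value > cutoff
    return results

# Values reported for the five-factor measurement model
reported = {"CFI": 0.931, "TLI": 0.916, "RMSEA": 0.058, "SRMR": 0.059}
print(check_fit(reported))
```

All four indices pass, which is consistent with the description of the measurement model as fitting appropriately.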

Measurement Invariance Across Gender
As shown in Table 7, the configural model indicated an acceptable fit on all the goodness-of-fit values, providing adequate support for configural invariance of the suggested measurement. Regarding the model comparisons between the configural and metric models and between the metric and scalar models, χ² tests indicated statistically significant differences for both comparisons. However, most of the goodness-of-fit values remained the same; that is, ΔCFI, ΔRMSEA, and ΔSRMR were much smaller than the suggested criteria (ΔCFI ≤ .01, ΔRMSEA ≤ .015, and ΔSRMR ≤ .015). Therefore, this finding indicated that none of these constraints resulted in a meaningful decrease in model fit, and thus the measurement invariance of the SERI survey across gender was fulfilled.
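The change-in-fit rule used for the invariance comparisons can be expressed as a short function. This is a sketch, not the authors' procedure: the function name is hypothetical, the fit values in the example are invented (the Table 7 values are not reproduced here), and the thresholds are those stated in the data analysis section, with the SRMR limit depending on whether metric or scalar invariance is being tested.

```python
def invariance_holds(less_constrained, more_constrained, level):
    """Compare two nested models using the delta-CFI / RMSEA / SRMR criteria.

    level: "metric" (configural vs metric) or "scalar" (metric vs scalar).
    """
    srmr_limit = {"metric": 0.015, "scalar": 0.030}[level]
    cfi_drop = less_constrained["CFI"] - more_constrained["CFI"]
    rmsea_rise = more_constrained["RMSEA"] - less_constrained["RMSEA"]
    srmr_rise = more_constrained["SRMR"] - less_constrained["SRMR"]
    return cfi_drop <= 0.01 and rmsea_rise <= 0.015 and srmr_rise <= srmr_limit

# Hypothetical fit values for illustration only
configural = {"CFI": 0.931, "RMSEA": 0.058, "SRMR": 0.059}
metric = {"CFI": 0.929, "RMSEA": 0.058, "SRMR": 0.062}
scalar = {"CFI": 0.925, "RMSEA": 0.059, "SRMR": 0.064}

print(invariance_holds(configural, metric, "metric"))
print(invariance_holds(metric, scalar, "scalar"))
```

Because the rule looks only at changes in approximate fit, it can support invariance even when the χ² difference test is significant, as happened in this study.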

Multivariate Analysis of Variance
Finally, we investigated group differences via MANOVA by examining the mean differences in relevance and interest towards scenarios by gender and countries (Estonia, Finland, and the UK) and the results were derived based on estimated marginal means. As shown in Table 8, regarding 2.5 as a central value, students indicated that presented scenarios were moderately relevant individually, socially, and vocationally. However, students responded that science careers introduced in the scenarios were not likely to be related or relevant to their future career aspiration. Table 9 presents group differences in scenario evaluations concerning relevance and interest. There were no significant gender differences with regard to the relevance of scenarios while significant gender differences were found in accordance with interest in scenarios. On the other hand, statistical group differences were found at the country level among three countries, Estonia, Finland, and the UK, regarding both relevance and interest in scenarios, except the vocational dimension in future aspiration (p = .055). Thus, we conducted an independent t test and a post hoc test to investigate where these gender and country-level differences were derived from.
In order to check the gender gap with regard to interest in scenarios, an independent t test was conducted for each country. As shown in Table 10, while there were no gender differences in Finland and the UK, Estonian students showed a significant gender difference in interest in scenarios (.78, p < .05): boys' interest was very low compared with that of girls and of students in the other countries. Table 11 presents the mean differences between Estonia, Finland, and the UK. Regarding the individual dimension, Estonia indicated a significant difference from Finland and the UK; regarding the societal dimension, Estonia indicated a significant difference from the UK; regarding the vocational dimension of knowledge gain, all three countries differed from each other; regarding the vocational dimension of future aspiration, there were no significant differences between countries; lastly, with regard to interest in scenarios, Estonia again indicated a significant difference from Finland and the UK, probably owing to the low interest of Estonian boys shown in Table 10.
Overall, Estonian students presented some differences in regard to their perspectives on relevance and interest in the introduced scenarios while Finnish and the UK students presented similarities. In addition, as shown in Fig. 4, these differences resulted from Estonian students' negative responses on the questionnaire.
Thus, we further conducted an independent t test with the Estonian sample for all five factors to measure the gender differences. The mean comparisons using t test with Bonferroni correction indicated that boys presented such negative responses to all five aspects that overall value of Estonian sample indicated a significant difference with Finnish and the UK samples (see Table 12). That is, Estonian students' gender differences in their perspectives on scenarios caused country-level differences with Finland and the UK eventually.

Discussion
While CBL has been recommended and expanded in science education (Gilbert, 2006; King, 2012; Pilot & Bulte, 2006) and scenarios have been developed and introduced as the starting point of science lessons to relate science content to daily contexts in CBL approaches (Bolte et al., 2014; Bennett et al., 2007; Nielsen et al., 2008), there is a lack of resources for evaluating, in terms of relevance and interest, the scenarios that sit at the center of CBL. Therefore, this study aimed to develop and validate an instrument to examine students' perspectives on science career-related scenarios reflecting three dimensions of relevance and three characteristics of interest. Based on the previous literature, the survey, and the data analyses, we found four relevance-related factors (individual, societal, vocational knowledge gain, and vocational future aspiration) and one interest-related factor. This newly developed instrument was tested with samples from three countries (Estonia, Finland, and the UK) in order to measure gender and country differences. According to the results, this instrument has good construct validity and reliability. In addition, students perceived relevance and interest as distinct constructs, although the two constructs were significantly correlated with each other. Also, gender differences were found in the Estonian sample. The instrument, the Scenario Evaluation with Relevance and Interest (SERI) survey for measuring students' perceptions of career-related scenarios, was designed based on three dimensions of relevance by Stuckey et al. (2013) and three characteristics of interest by Krapp (2007) and Hidi and Renninger (2006). According to the statistical analyses based on the samples from three countries, this instrument performs well with respect to reliability and validity.
However, we found one issue of discriminant validity between two constructs, the individual dimension and the societal dimension of relevance, in the Estonian and Finnish samples. This issue in these two countries may result from the small number of participants. There are no specific guidelines regarding minimum sample size for conducting factor analyses; however, a ratio of 10 respondents per variable has been recommended (Everitt, 1975). In this study, we used 21 variables; that is, each country needed a minimum of 210 respondents according to the recommendation, but Estonia and Finland had 177 and 133 respectively, while the UK had 264 responses. Interestingly, the UK sample showed different factor loadings from the other two countries, and the individual and societal dimensions were grouped in different factors. However, the recommended sample size is not only about this ratio but also "is related to the number of variables, the number of factors, the number of variables per factor, and the size of the communalities" (Mundfrom, Shaw & Ke, 2005, pp. 160-161), and the question remains inconclusive. Another possible explanation concerns the students' age group, since "the individual dimension might be more important for younger children, but this importance will shift towards societal relevance as the child grows and matures" (Stuckey et al., 2013, p. 24). Students participating in this survey were mostly 13-year-olds, an age regarded as a transition period to adolescence. Therefore, participants may have seen the individual and societal dimensions of relevance as similarly important. For further research, it is recommended to investigate different age groups, such as students in upper secondary school, with the same instrument to see whether the pattern of factor loadings is different.
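The sample-size arithmetic above can be checked with a short sketch. The 10:1 ratio, the 21 variables, and the three sample sizes come from the text; the function and variable names are illustrative assumptions only.

```python
# Minimal sketch of the 10-respondents-per-variable check (Everitt, 1975).
# Figures are those reported in the text; names are illustrative.

N_VARIABLES = 21
SAMPLES = {"Estonia": 177, "Finland": 133, "UK": 264}


def minimum_sample(n_variables, ratio=10):
    """Recommended minimum sample size for a given respondents-per-variable ratio."""
    return n_variables * ratio


def shortfall(samples, required):
    """Number of respondents each country is short of the recommendation (0 if met)."""
    return {country: max(0, required - n) for country, n in samples.items()}


required = minimum_sample(N_VARIABLES)
gaps = shortfall(SAMPLES, required)
for country, gap in gaps.items():
    status = "meets" if gap == 0 else f"is {gap} short of"
    print(f"{country}: n = {SAMPLES[country]} {status} the recommended {required}")
```

Under this rule of thumb only the UK sample reaches the recommended 210 respondents, which is consistent with the discriminant validity issue appearing in the Estonian and Finnish samples.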
Concerning the constructs of relevance and interest, in contrast to Stuckey et al. (2013), who suggested interest as a part of the individual dimension of relevance, this study found that students clearly distinguished relevance and interest as different constructs in evaluating the scenarios, and these two constructs showed a moderate but statistically significant relationship. This result is in accord with studies indicating that when students were encouraged to relate science courses to their lives, their interest in science increased (Bennett & Holman, 2002; Hulleman & Harackiewicz, 2009). As Kotkas et al. (2017) argued, although the concepts of relevance and interest are closely related and overlap each other, it is not justified to consider these constructs as synonyms or interest as a part of relevance. Rather, our result suggests that it would be beneficial to measure students' perspectives on scenarios in terms of interest and relevance separately and to examine how strongly the two are correlated, since scenarios may be relevant but not interesting and vice versa.
Regarding the group differences, we found that country-level differences in students' perspectives on the relevance of and interest in scenarios resulted from low scores given by Estonian boys. While the UK and Finnish students and Estonian girls gave moderate scores in evaluating scenarios, Estonian boys showed a very negative trend in their responses, and all of their scores were significantly lower than those of the other participants. According to a previous study by Teppo and Rannikmäe (2003) from a different project, Estonian students showed gender differences in their preference for scenarios: girls preferred scenarios that dealt with health and outlook, while boys were more interested in scenarios linked to economic and environmental problems.
Since the aim of this study was not to investigate students' preference for different scenarios, we did not examine gender differences in each developed scenario. However, it would be valuable to evaluate how scenarios on different topics affect students' perceptions of relevance and interest.
With respect to the MultiCO project, the scenarios were perceived as moderately relevant and interesting by participating students. In particular, the MultiCO scenarios were highly valued in terms of their societal relevance and vocational knowledge gain. On the other hand, with respect to vocational future aspiration, students indicated that these scenarios were not relevant to their future careers. Although, in general, less than 25% of students expect to work in an occupation requiring science-related skills (OECD, 2016), given that the project aims to increase students' interest in science careers, this result may make it difficult to achieve the goal of the MultiCO project with these scenarios. Considering the high correlation of vocational relevance with the individual dimension, it is recommended to modify the scenarios, guided by the questions suggested in the SERI survey, so as to increase students' individual relevance scores as well as their scores on the vocational dimension of future aspiration.

Conclusion and Limitation
In creating, developing, and modifying scenarios for CBL, educators should evaluate the scenarios with a proper measurement tool. This newly designed instrument, SERI, may be valuable for educators such as teachers or scenario developers as a tool to measure students' perceptions of scenarios when implementing CBL in science education. Knowing students' attitudes towards scenarios may help educators modify their instruction so as to draw students' attention to science and to make their classes successful. In addition, since relevance and interest are the most important indicators in designing scenarios and preparing CBL, this tool will help teachers keep these constructs in mind in their science lessons. Notably, this tool can be used in multinational projects that develop and use scenarios. Although there are attempts to design worldwide scenarios in multinational projects like MultiCO or PROFILES, students live in different social, cultural, and environmental situations, so interest in and understanding of scenarios may differ across countries and cultures. Thus, it is important to measure beforehand whether the scenarios show gender and cultural differences, as we have done with the SERI instrument. Indeed, results from different cultures may require revision of this tool. We therefore recommend conducting research using this tool with samples from other cultural backgrounds in order to generalize or modify the results.
This study is not without limitations. One limitation is that this study did not measure whether highly rated scenarios ultimately lead to students' higher achievement or interest in science, since we only measured students' interest in science-related scenarios, not their interest in science. Moreover, the effects of scenarios will vary depending on how instructors present them. Nevertheless, as previous studies indicated, it could be assumed that if students see the scenarios as more relevant to their lives, their interest in science will increase, and the increased interest in science might affect their achievement as well (Hulleman & Harackiewicz, 2009; OECD, 2007). In addition, as we discussed, this tool showed an issue of discriminant validity between two factors, the individual and societal dimensions of relevance, in two participating countries. Therefore, it is again recommended to test this tool with samples from different countries and age groups. Finally, regarding sample collection, although we used multinational samples, they came from one specific project. Therefore, in order to generalize the results, unbiased large-scale sampling, as has been done by PISA or TIMSS, should be conducted.