Fun is important in several aspects of life. Besides being a defining element of leisure and play, it is increasingly understood to play a key role in learning, work and social interactions. Researchers and developers in the fields of educational technology and child-computer interaction often inject fun elements into their systems, just as educationalists often aspire to make learning activities enjoyable. Research has demonstrated that the promise of fun has an inviting effect (Long 2007), that fun increases engagement with learning technologies and with learning activities (Iten and Petko 2016; Long 2007; Rambli et al. 2013; Vieira and da Silva 2017) and that having fun has a positive effect on learning outcomes (Chan et al. 2019; Elton-Chalcraft and Mills 2015; Long 2007; Lucardie 2014; Rambli et al. 2013). Additionally, neuroscience provides evidence at the biochemical level for the positive effects of fun in the learning environment and on learning (Willis 2007).

Although “fun” is often mentioned, the concept behind the term and its measurement are not always clearly defined. The Cambridge Dictionary defines “fun” as an informal expression for pleasure, enjoyment or entertainment, and as behaviour or activities that are not serious (games or jokes). In the academic literature, the terms “fun” and “enjoyment” are frequently used interchangeably (Elton-Chalcraft and Mills 2015; Fowler 2013; Iten and Petko 2016; Long 2007; Mellecker et al. 2013; Romero 2014). In other cases, the two notions are treated as close relatives that are complementary to some degree (Dismore and Bailey 2011). This also invites a scientific debate concerning the distinction between the two concepts (Dismore and Bailey 2011). Nonetheless, in recent years, it has been common for papers discussing fun not to elaborate on their understanding of the concept (Iten and Petko 2016; Knowles et al. 2017; Tokuhisa et al. 2015). In our conceptualization, we understand “enjoyment” as a term that describes positive emotions, while we consider “fun” a more extensive, nuanced and complex notion, which is more difficult to grasp and define. Specifically regarding children, we agree with the viewpoint of Bengoechea et al. (2004) that children generally seek positive experiences and label those as “fun”.

Despite the growing interest in measuring the fun experience, especially in relation to learning, there is currently a lack of reliable measurement tools. Where they exist (mostly in the field of human-computer interaction), they measure product liking (acceptance or preference) with young, preliterate children (Fun Toolkit, Read 2008; Fun Semantic Differential Scales, Yusoff et al. 2011; This or That, Zaman et al. 2013). For adults, the most widely known instruments (EGameFlow, Fu et al. 2009; UES, O’Brien et al. 2018; GEQ, Poels et al. 2013) measure game enjoyment and engagement or the gaming experience along several dimensions. Remarkably, there is a gap in research with adolescents (age 11–18). The aim of the current study is twofold. First, it aims to investigate the notion and the meaning behind the term “fun”. Second, it aims to create a tool for measuring experienced fun that is psychometrically and theoretically sound, comprehensive yet parsimonious, practical and child appropriate, developed specifically for adolescents, and usable in the learning environment across various fields of research.

The paper is structured in three main parts. In the first part, we argue why experiencing fun during learning is important. We review the related literature in order to define and summarize the web of related terms. Thereafter, we discuss the methodological challenges of developing questionnaires for children, and we introduce existing tools for measuring fun, including their suitability for our target audience and scope of research (adolescents in the learning environment), and their possible pitfalls.

In the second part, we describe the development of the FunQ. We start with the construction of the initial item pool, continue with the results of the think-aloud study, and finish by introducing the final version of the instrument, including its psychometric properties. The final version of the FunQ consists of 18 items along six dimensions (Autonomy, Challenge, Delight, Immersion, Loss of Social Barriers and Stress) and shows appropriate validity and reliability (ω_overall = 0.875 and ω_partial = 0.864; RMSEA = 0.052 and SRMR = 0.072).

In the third and last part, we discuss our research findings, the possible applications of the developed measurement tool, and summarize our scientific contribution.

Theoretical Background

Fun and Learning

Research into the role of fun in the learning process is found mainly in the areas of serious games and gamification in education (Chu et al. 2017) and thus mainly concerns non-formal and informal learning environments. In these fields, there is growing evidence in support of the importance of fun for learning; we review this body of research below.

Bisson and Luckner (1996) elaborated on the pedagogical benefits of fun. They saw fun as a powerful tool to enhance motivation and create a safe learning environment. They summarized that fun is beneficial as (a) it evokes intrinsic motivation, (b) it facilitates the suspension of the social reality, (c) it reduces stress and (d) it creates a state of relaxed alertness where “learners feel safe to take risks, be creative, make mistakes, and most importantly, keep trying” (Bisson and Luckner 1996, p. 111).

Rambli et al. (2013) stated that fun and interactive learning are among the most powerful pedagogical factors for creating an interactive and engaged learning environment, and they added that this environment facilitates learners’ memorization while keeping their attention, which ultimately enhances learning. Based on the PISA test (N > 400,000 15-year-old students), Ainley and Ainley (2011) concluded that the sense of fun and excitement is highly important for science learning.

While investigating the learning outcomes of an educational game, Long (2007) found that 87.5% of the participants joined the activity in the first place because of the promise of fun (Fun in programming games) and it improved the skills of 79.8% of the participants. Moreover, she found that computer games “lead to positive results in long-term learner retention by improving learning interest and more focused attention because the students enjoy the approach” (Long 2007, p. 280)—although she used the terms “fun” and “enjoyment” interchangeably. She also demonstrated that fun has a significant effect on the learning effort.

In a recent systematic review on emotions in design-based learning (DBL), Zhang et al. (2020) found that design-based learning had an overall positive effect on students’ interest and motivation to learn. Accordingly, enjoyment was among the most frequently mentioned emotions in DBL; however, this review also reflected that fun and enjoyment are barely distinguished or measured separately.

Vieira and da Silva (2017, p. 130) stated that “fun is an important element of life because it satisfies curiosity and fosters learning”. They encouraged designers to “make their artefacts fun” in order to stimulate users to use them. In their understanding, fun consisted of attention, flow, immersion and emotion. Tews and Noe (2019) went even further and said that “fun is an important component of high quality learning experiences” (Tews and Noe 2019, p. 226).

Chan et al. (2019) investigated the role of perceived fun in a collaborative learning scenario and the learning performance while using personal response systems (PRSs). Their results suggested that “the level of fun students experienced using PRSs was found to promote collaborative learning and learning performance” (Chan et al. 2019, p. 99).

Iten and Petko (2016) studied whether having fun while playing an educational game was a predictor of learning success. They used the terms “fun” and “enjoyment” interchangeably. They found that the experienced enjoyment and flow during the game had a significant effect on gaining motivation, on increasing interest in the subject matter and on choosing to play the game again. However, their study could not demonstrate any association between the experienced enjoyment and the learning gains, which is in contrast with previous findings. They also questioned “whether ‘fun’ and ‘enjoyment’ are adequate constructs to grasp meaningful motivational processes in serious game experiences” (Iten and Petko 2016, p. 161). They referred to other authors who proposed instead “student engagement” to analyze positive emotions when learning with serious games. Similarly, Sim et al. (2006) found no significant correlation between either the observed or the reported fun and the learning outcome.

In contrast, Tews et al. (2017) found that fun had a significant impact on informal learning in the working environment (i.e. learning from others and learning from non-interpersonal sources). They stressed that “researchers should not necessarily focus on fun as a unidimensional construct” (Tews et al. 2017, p. 52). Additionally, their findings suggested that the managers’ support for fun had a significant influence on learning (learning from oneself) as well.

Similarly with adults, but in the learning environment, Lucardie (2014) in her qualitative research found that “both adult learners and their teachers also believed that fun and enjoyment impacted on adults learning and they were able to articulate the role that fun plays in adult learning programs” (Lucardie 2014, p. 445).

Elton-Chalcraft and Mills (2015) summarized their study results as follows: “Learning which is enjoyable (fun) and self-motivating is more effective than sterile (boring) solely teacher-directed learning” (Elton-Chalcraft and Mills 2015, p. 482). This finding is supported by Aoki et al. (2004), who investigated how the education of children with type-1 diabetes could be improved. They developed three edutainment tools and tested them. Their findings suggested that child patients found the games fun (compared to the researchers’ previous study on traditional learning methods), that 91.4% of the respondents showed more interest in the edutainment method, and that more than 60% of them found that this approach would be useful as an initial education for children with type-1 diabetes. They thus concluded that “edutainment systems could have a significant potential for healthcare education especially for children” (Aoki et al. 2004, p. 859).

Willis (2007) wrote about the neuroscience of joyful education. “When students are engaged and motivated and feel minimal stress, information flows freely through the affective filter in the amygdala and they achieve higher levels of cognition, make connections, and experience ‘aha’ moments. Such learning comes not from quiet classrooms and directed lectures, but from classrooms with an atmosphere of exuberant discovery” (Willis 2007, p. 1). She added that “when classroom activities are pleasurable, the brain releases dopamine, a neurotransmitter that stimulates the memory centers and promotes the release of acetylcholine, which increases focused attention” (Willis 2007, p. 2). Additionally, she claimed that although “some schools have unspoken mandates against these valuable components of the classroom experience” (Willis 2007, p. 3), no neuroimaging or brain wave analysis data exist that would demonstrate any downshifting effect of “joy” (a term she used interchangeably with “fun”) in the classroom.

Thus, in sum, previous research provides a growing support for the view that fun has positive effects on learning. However, the above-introduced studies do not define and measure fun, and moreover, none of these studies is a controlled experiment which would compare the effects of introducing fun elements versus not. Only one study—of Iten and Petko (2016)—comes close by studying the effects of enjoyment but in that study, no distinction is made between fun and enjoyment. In order to be able to make such claims precise regarding the effect of fun on learning, or to evaluate such activities, we need to define clearly what fun is and have a reliable instrument to measure it.

Characteristics of Fun

Bisson and Luckner (1996) synthesized earlier scholarly attempts to define fun into four characteristics that were inherent to it. They argued that “fun” is a relative, situational, voluntary experience and natural/essential to all human beings. By relative and situational, they meant that fun depends on many factors, e.g. what one person finds fun is not necessarily fun for another, nor is it necessarily fun on another day. Fun is voluntary as “to experience fun one must consciously or unconsciously accept to feel good, to relax, to let go and to let the situation be perceived as enjoyable” (Bisson and Luckner 1996, p. 109). Fun cannot be “forced”, so, for an activity to be perceived as fun, the participation must be intrinsically motivated.

Glasser (1986) argued that fun is one of the five most essential human needs and emphasized its importance while learning and especially for child development. He stated that fun “is like a catalyst that makes anything we do better and worth doing again and again” (Glasser 1986, p. 28). Accordingly, Read et al. (2002) discussed returnance as a facet of fun in child-computer interaction; in this context, returnance meant the desire to do an enjoyable activity again and again.

Based on a 3-year-long study on attitudes towards physical education, Dismore and Bailey (2011) argued that the meaning attributed to the concept of fun changed in merit as children approached teenage years. While for younger children (7–11 years) fun was a critical factor for an activity to be enjoyable, teenagers (11–14 years) described fun in terms of a learning challenge rather than in relation to hedonic responses while playing games (Dismore and Bailey 2011). Along with the aforementioned properties of fun, its stress-reducing effect has to be mentioned as well (Caine and Caine 1991).

Fun and Play

During childhood, “fun” and “play” often concur. Although the two concepts are closely related, they should be distinguished from each other. Sutton-Smith (2011, p. 3) defined play as “an activity that is voluntary, intrinsically motivated, fun, incorporates free will/choices, offers escape, and is fundamentally exciting”. Gajadhar, de Kort and IJsselsteijn considered play as an “intrinsically motivated, physical or mental leisure activity that is undertaken only for enjoyment or amusement and has no other objective” (Gajadhar et al. 2008, p. 105). Especially in early childhood, “fun” and “play” are overlapping notions. However, as indicated already (Dismore and Bailey 2011), with adolescence, the hedonistic character of fun that is purely present during play gives way to challenge, which is a well-established dimension of fun among adults (Fu et al. 2009; IJsselsteijn et al. 2008; Ryan et al. 2006).

Intrinsic Motivation and Social Aspect

Bisson and Luckner argued that the promise of fun “can motivate learners to engage in activities with which they have little or no previous experience” (Bisson and Luckner 1996, p. 110). Therefore, fun is not only an experience but can itself be a strong, intrinsic motivating factor that encourages children to try new challenges. As early as the 1980s, Malone and Lepper (Malone 1981; Malone and Lepper 1987) emphasized the importance of intrinsic motivation, which, according to their theory, could be evoked by the optimal level of challenge, curiosity and fantasy. Bisson and Luckner (1996) also suggested that the combination of “fun” and “play” can act as a catalyst to eliminate inhibiting factors inherent to our socialization. In their opinion, “the more genuine and intense the fun is, the greater the suspension of reality will be. Consequently, fun can transform social insecurity into trust and camaraderie, and a restrictive self-image into the freedom of expression” (Bisson and Luckner 1996, p. 110). This property of fun is closely related to the concept of Flow.

Flow

The concept, or rather the experience of Flow, was defined by Csikszentmihalyi (1990) as an optimal experience, where the following characteristics were present: (a) an intrinsically rewarding experience, (b) a loss of reflective self-consciousness, (c) a distorted experience of time, (d) an intense and focused concentration on the present, (e) a merging of actions and awareness, (f) an optimal balance between challenge and skills, (g) a sense of control over the situation and (h) clear goals and immediate feedback. Considering the definitions above, it appears that experiencing fun and being in the psychological state of Flow overlap substantially.

Abbasi et al. (2019) argued based on structural modelling that the theoretical constructs “experience” (a.k.a. Flow) and “engagement” should not be used interchangeably to investigate the subjective experience of video gameplay. Rather, they proposed a model of playful-consumption experience, which consisted of different types of experience (emotional and sensory) and different types of engagement (cognitive, affective and behavioural), and they discussed “enjoyment” as one of the emotional experience factors, which, in their terminology, were used interchangeably with the term “fun”. Thus, they considered enjoyment or fun as a part of the emotional experience.

As Tasci and Ko (2016, p. 167) described, “A distorted sense of time, in general, is taken as an indicator of engagement, desire, enjoyment, excitement and thus, having fun; when it feels as if time went more quickly than it actually did, this is a sign of fun and vice versa”. Rodríguez-Ardura and Meseguer-Artola (2017) also stated that when one feels that the activity is going smoothly and is fun, then time flies, and one undergoes a distortion of the experience of time.

Caine and Caine (1991) showed how learning is maximized when combining fun and challenge, which they called a state of relaxed alertness, and suggested that a major goal for educators should be to challenge students in a natural way so that conceptual mapping (i.e. intellectual connections) could happen without evoking a downshifting response. Mellecker, Lyons and Baranowski also found while evaluating video game design with children that “an engrossing story in which a player faces increasing challenges and can increase skills quickly enough to overcome the challenges, but not so quickly as to get bored by the challenges, appears to provide an important game design structure for enhancing fun or enjoyment” (Mellecker et al. 2013, p. 144). Chu et al. (2017) described how, during a curriculum-based making activity, children had the most positive feelings and were most engaged when the level of challenge matched their skills. It has been mentioned already how adolescents attributed experiencing fun to challenge (Dismore and Bailey 2011), which corresponds to the challenge aspect of the Flow experience. Rodríguez-Ardura and Meseguer-Artola (2017, p. 902) explained the effect of challenge on Flow in terms of the cognitive evaluation theory (Ryan et al. 1983). They suggested that as long as individuals have a psychological need to feel competent, activities that trigger positive challenge can lead to optimum experience and intrinsic motivation, because they satisfy the individual’s need for competence.

Distinction Between Young Children and Adolescents

Approaching childhood from the perspective of cognitive and psychological development, both Piaget and Erikson account for the shift that occurs on the edge of adolescence. According to Piaget (1964), the formal operational stage begins approximately at the age of 11–12 and lasts into adulthood; in this stage, children develop the ability to think about abstract objects and to logically test hypotheses. In the theory of Erikson (1950), stage 5 is approximately between age 12 and 18, during which children search for a sense of self and personal identity, exploring their own personal values, beliefs and goals. Without delving further into the characteristics of teenagers, it can be summarized that during this age children’s identity is formed and the way they understand the world changes. They begin to shape their opinion and they learn how to express it as well. Their cognitive abilities (such as memory capacity, language skills and concentration span) are quickly approaching the level of an adult’s. De Leeuw (2011, p. 15) argued that the “age of 11 is seen as a turning point in memory capacity when children appear to function as well as adults”. From the beginning of this age, therefore, they are less prone to the typical response biases that are common for younger children (see section Attention Span). Ultimately, this means that when they are asked about their opinion, the answers will be more differentiated than at younger ages (this is also shown by Read 2008, 2012; Read and MacFarlane 2006) and are generally more valid (Mellor and Moore 2014). Moreover, Dismore and Bailey (2011) showed that the meaning and the content of the concept of “fun” altered during childhood. This shift was found to be around age 11 as well, which is in synchrony with the psychological and cognitive changes while approaching adolescence. On the basis of these arguments, the definition and measurement of fun will address children over the age of 11.

Surveying Children

In designing measurement tools for children, their competencies and their differences from adults have to be taken into account. Hall et al. (2016) emphasized that, in general, children preferred Likert-type scales over similar simple response items and that free-recall questions were useful especially in spoken surveys (Read and MacFarlane 2006). De Leeuw (2011) found that in general, the older the child, the more reliable the answers, and that children were better informants on topics directly related to them such as their feelings and other subjective phenomena. Furthermore, de Leeuw (2011, p. 6) stated that “Below the age of 7 children do not have sufficient cognitive skills to be effectively and systematically questioned” and added that individual (semi-) structured interviews were more suitable than questionnaires for children between 7 and 12 (see also Bell 2007). Mellor and Moore (2014) found that children below the age of 12 had difficulties in answering questions about abstract concepts such as their own behaviours, bodily states or emotional states. They related it to the theory of Piaget about the formal operational stage of development. Regarding adolescents, de Leeuw (2011) suggested that questionnaires could be used as with adults; however, special attention should be devoted to avoiding ambiguity in item wording. Therefore, using simple language and formulating items as precisely as possible is a must, and ensuring language appropriateness through readability testing is highly recommended.
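
As an illustration of what such readability testing could look like, the following minimal sketch estimates the Flesch-Kincaid grade level of a questionnaire item. It is not the procedure used in any of the cited studies. The function names are hypothetical, and the syllable counter is a rough vowel-group heuristic that is assumed here for simplicity, so the result should be read as an approximation only.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumed heuristic): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in words ending in "le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate US school grade needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    # Hypothetical item wordings, not taken from any existing instrument.
    simple_item = "I had fun. It was easy."
    complex_item = "The activity facilitated considerable intellectual stimulation."
    print(flesch_kincaid_grade(simple_item))
    print(flesch_kincaid_grade(complex_item))
```

A lower score suggests wording that younger readers are more likely to understand; comparing candidate wordings of the same item is probably more informative than interpreting a single score in isolation.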

Read (2008) and de Leeuw (2011) discussed the challenges of designing measurement tools for children, and Mellor and Moore (2014) wrote specifically about the use of Likert-type scales with children. The main caveats which can jeopardize the reliability of surveys involving children are discussed below.

Attention Span

Children’s attention span (or sustained attention) is crucial for directly measuring children (i.e. not observing their behaviour). Attention span is defined as the time a person is able to selectively attend to relevant information, such as listening to a teacher and persisting on a task (McClelland et al. 2013, p. 315). Scientists have been studying children’s attention span for a long time. In a literature review dating back to the 1950s, Moyer and von Haller Gilmer (1954) reviewed nineteen studies which found that the attention span of young children ranged from 1 to 25 min. Additionally, they found that the attention span was lower in a group situation and that there was a difference “between work tasks, such as reading, and the activities in which a child engages in playing with a toy” (Moyer and von Haller Gilmer 1954, p. 466). Moreover, Sousa (2011) suggested that motivation had an effect on the attention span, and Bradbury (2016) even called this effect crucial.

Within the field of educational psychology, a generally accepted and frequently cited rule of thumb for the length of the average student’s attention span is 10 to 15 min during lectures (Goss Lucas and Bernstein 2005; McKeachie and Svinicki 2006, and Wolvin 1983 in Davis 2009). Although formulas are available for approximating the length of the attention span per age, no supporting empirical evidence exists. Lin et al. (1999) showed that sustained attention developed between ages 6 and 15. Nonetheless, since they measured sustained attention with the Continuous Performance Test (CPT), they reported only the hit and false alarm rates and did not provide the length of the attention span in minutes.

In contrast to the generally accepted 10–15 min, a study conducted at Microsoft (2015) claimed that the average attention span was only 12 s in 2000 and 8 s in 2013, and that it is ever decreasing. Although the validity of these numbers has been contested (Bradbury 2016), it appears that people nowadays, and especially the younger generation, get distracted easily and hold their attention for less time than was the case in the past.

To safeguard the reliability of answers, it is important to adjust the length of the survey to the child’s attention span. Based on the above, a survey, which is not a particularly engaging task for a child, should not require sustained attention from adolescents for more than 10–15 min.

Bias

When working with children, the risk of introducing bias is high and different types of bias can be manifested compared to those concerning adults. The most common bias types concerning children are discussed below.

Suggestibility pertains to the influence of the researcher on the way the respondent encodes, stores, retrieves and reports events. This effect is due to a range of social and psychological factors. Social desirability bias occurs when the respondent provides the answer that s/he thinks the examiner asking the question wants to hear. Satisficing is a tendency of the respondent to select a good enough option, instead of the very best one. In the case of surveys, this phenomenon could be manifested as giving a superficial response that appears to be reasonable but without thoroughly considering all answer possibilities. Acquiescence bias is the tendency of the respondent to agree or respond positively. Extreme responding is the type of response bias in which the respondent mainly selects the most extreme options/answers available. Straight-lining is the tendency of the respondent to provide answers in a way that the responses form a line or rather a visual pattern. Extreme responding is a sort of straight-lining; however, whereas in extreme responding it can be assumed that the respondent reads the question and considers the responses, in straight-lining this assumption cannot be made.
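
Some of these response patterns can be screened for in collected Likert-type data. The sketch below is only illustrative and is not drawn from the cited literature: the function names, the 1 to 5 scale and the scale midpoint are assumptions, and straight-lining is reduced to its simplest case (identical answers on every item), although other visual patterns are possible. Such indicators can flag a response set for closer inspection, but they cannot by themselves establish that a bias occurred.

```python
from typing import Sequence


def is_straight_lining(responses: Sequence[int]) -> bool:
    """Simplest case only (assumption): the same answer on every item."""
    return len(responses) > 1 and len(set(responses)) == 1


def extreme_share(responses: Sequence[int], low: int = 1, high: int = 5) -> float:
    """Share of answers on either endpoint of the (assumed 1-5) scale."""
    if not responses:
        return 0.0
    return sum(r in (low, high) for r in responses) / len(responses)


def agreement_share(responses: Sequence[int], midpoint: int = 3) -> float:
    """Share of answers above the midpoint, a crude acquiescence indicator."""
    if not responses:
        return 0.0
    return sum(r > midpoint for r in responses) / len(responses)


if __name__ == "__main__":
    # Hypothetical answers of one respondent to six items.
    answers = [5, 5, 1, 5, 4, 5]
    print(is_straight_lining(answers))
    print(extreme_share(answers))
    print(agreement_share(answers))
```

Because agreement with every item is only suspicious when some items are negatively worded, an indicator such as `agreement_share` is most meaningful for questionnaires that mix positively and negatively phrased items.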

Around the age of 11, the suggestibility of children decreases while the importance of peers increases. Therefore, peer pressure can be a serious issue with early adolescents (12–16 years) (de Leeuw 2011). However, contrary to adults, the item non-response appears not to be a problem with children and adolescents (Bell 2007). That is, the error size in responses by children and adolescents is approximately stable across different conditions and not dependent on the content of the question. This is assumed to be in relation to their cognitive abilities, namely that they cannot fully apply an optimizing strategy (Borgers 2003); therefore, they will not skip difficult questions. It has, however, a downside. The difficult or vague questions will not be indicated by a missing value pattern, but the quality of those responses remains doubtful. Therefore, the importance of simple and short questionnaire items is stressed and the application of think-aloud interviews to check whether any item is problematic is highly recommended.

Earlier research has shown that children are particularly prone to the above-described bias types (Bell 2007; de Leeuw 2011; Hall et al. 2016; Read 2008; Read and MacFarlane 2006) to very different degrees for different ages, which has to be considered when developing measurement instruments for any specific age group.

Existing Measurement Tools

Methods for evaluating fun derive primarily from the domain of human-computer interaction and especially child-computer interaction, where fun is seen as an essential component of children’s interactive experience whether they approach technology as users (e.g. of an application or consumer device), learners or players (Markopoulos et al. 2008).

There are a few existing measurement tools that have been designed to gather opinions on the “funness” of an experience or product. Where these exist, they target either young children or adults. Some studies report the use of a survey or a list of questions to be asked of children within the narrow scope of the study; however, these are neither intended for further use nor validated. A list including the most-known tools for measuring (aspects of) fun is shown in Table 1.

Table 1 Measurement tools for evaluating preference/engagement/experienced fun.

The This or That (Zaman et al. 2013) method examines preference. Despite being a validated measurement tool, it is constrained by its comparative structure: it is only suitable when measuring the preference of one product/experience over the other, which is particularly suitable for its targeted age group of 2 to 7.

The Fun Semantic Differential Scales (Yusoff et al. 2011) is a measurement tool for evaluating games with nursery-aged children based on choosing between photos of a child expressing different emotions (love-do not know-hate). While it has been shown to work well for the target age group, it has not been psychometrically validated, and it addresses fun as a unidimensional construct and is not sufficiently refined for teenagers.

The Fun Toolkit (Read 2008) is a set of tools that targets a wide age range up until teenage to measure the “funness” (Smileyometer) and preference of products (Fun sorter and Again-again table). However, it handles fun as a unidimensional construct, and it faces a problem that younger children tend to use mainly the higher values of the Smileyometer. Despite being widely used, it has not been psychometrically validated.

The Five Degrees of Happiness (Hall et al. 2016) was introduced to address the extreme response bias of the Smileyometer discussed above. Its target audience is children between age 9 and 11, and, like its predecessor, it handles fun as a unidimensional construct and has not been validated psychometrically. Furthermore, the emphasis on positive emotions makes it less suitable for assessing less pleasant experiences that might include frustration or disappointment.

A recent study in this domain proposed a list of Likert scales for the evaluation of a game and attitudes towards learning games (Iten and Petko 2016), which, however, has not been validated. Despite being suitable for teenagers and measuring multiple dimensions, those dimensions are not linked to fun and refer to the serious game rather than to the personal experience (i.e. How is the game instead of How does one feel while playing the game).

This limitation is avoided in the Physical Activity Enjoyment Scale (Kendzierski and DeCarlo 1991) which measures the personal experience of a physical activity, rather than the activity as such. However, it does not help conceptualize enjoyment which is measured as a unidimensional construct across bipolar scales.

The PENS (Ryan et al. 2006), UES and UES-SF scales (O’Brien et al. 2018) are validated measurement tools, made for adults and have a strong focus on the evaluation of games (usability, aesthetics, novelty, intuitive controls, in-game competence, etc.) rather than on the personal experience (flow, intrinsic motivation, etc.). This limits their applicability in different contexts and does not contribute to our purpose of defining and measuring fun as a psychological construct.

The GEQ (Poels et al. 2007) has the focus on how one feels while playing a game and measures enjoyment as a multidimensional construct; however, it has been validated only with adults in a gaming environment, which limits its applicability in different contexts. Besides, given that the scale is designed for adults, its vocabulary is quite advanced, so it is questionable whether children would be able to comprehend and rate the scale reliably.

In the same domain, a recent study (Abbasi et al. 2019) proposed a Playful-consumption experience questionnaire for the assessment of consumer video game engagement for adolescents. However, they measured enjoyment as a unidimensional construct and as a subdimension of emotional experience, and the questionnaire has not yet been validated.

The EGameFlow (Fu et al. 2009) measures the enjoyment of an e-learning game across the dimensions of Csikszentmihalyi’s Flow theory; thus, it equates enjoyment to the Flow experience, and it is validated for adults. Given the nature of the scale, its usability in different contexts with different ages is limited.

The FUN scale (Tasci and Ko 2016) handles fun as a multidimensional construct; however, it is validated to measure the fun value of a touristic destination as a product among adults, which is reflected in its vocabulary. Additionally, the focus of the scale is to evaluate whether a place, a hotel or a restaurant is fun and not to assess the personal experience.

The EmoForm (Zhang et al. 2019) is a recently developed tool for the assessment of emotions during design-based learning. Although the instrument is designed for adolescents and examines various emotions, it does not examine fun, only enjoyment, which is measured with a single item, and it has not yet been validated.

From this review of earlier work, we can conclude that there is currently no psychometrically validated inventory targeting adolescent respondents that is theoretically grounded and that treats fun as a multidimensional construct. The necessity of having multiple dimensions is not only shown by the theoretical review; it also helps to conceptualize and define fun rather than treating it as an opaque descriptor or an umbrella term. The present study introduces an instrument designed to fill this gap by providing a tool that is psychometrically and theoretically sound, comprehensive yet parsimonious, practical and adolescent appropriate, and that can be used in the learning environment across various fields of research.

For the development of the FunQ, only a fraction of the referred scales was relevant. It is important to mention that several items of different measurement tools overlap with each other (e.g. items measuring the Flow experience, the perceived competence, the enjoyment). Selected questionnaire items from the EGameFlow (Fu et al. 2009), the Evaluation of- and attitudes towards learning games list (Iten and Petko 2016), the FUN scale (Tasci and Ko 2016), the GEQ (Poels et al. 2007), the Physical Activity Enjoyment Scale (Kendzierski and DeCarlo 1991) and the Intrinsic Motivation Inventory (Ryan 1982) have been included in the initial item pool of FunQ, albeit rephrased to be adolescent appropriate and to reflect a personal experience instead of evaluating the activity (e.g. I had fun instead of It’s [the activity] a lot of fun). Additionally, the pool of items was extended by further ones that reflect the underlying factors, and the adopted items were organized in the FunQ by the factorial structure proposed in this paper and not by the dimensions of the original instrument.

Development of the FunQ

The development of the FunQ consisted of four main phases applying a deductive scale development approach (Adèr et al. 2008). During the first phase, the dimensions of the instrument were constructed based on the above-described theories. Then, a pool of possible items was created according to the previously defined dimensions and based on the cited theories and existing measurement tools. Second, the initial item pool of the instrument was tested with 75 children. Consequently, a comprehensive yet parsimonious model was created. Third, we conducted think-aloud interviews with six children to assess possible pitfalls relating to the questionnaire items. Fourth, the questionnaire was administered to another 150 children and the validity and reliability measures were calculated.

Methodology of the Survey Design

In the development of the FunQ, we paid attention to a number of key issues as follows:

First, to gauge relevant abstract concepts (such as challenge, fun, flow, stress and autonomy), questions were formulated by asking about the emotional and behavioural reflections of these concepts. Then, statements were derived as responses to the questions based on the underlying theories. For example, How do I feel when I am enjoying an activity? I feel delighted, and How do I feel when I’m in Flow? I feel that time flies. Then, the emotional and behavioural reflections were filtered, and the wording was adapted to be child appropriate, e.g. I feel delighted was transformed into I feel good and I feel happy. This approach aligns with the recommendation by de Leeuw (2011) that the items should be worded to focus on how the child felt during the activity. Not only is it easier for children to identify themselves in such items, but this phrasing also avoids asking them to judge the activity, thus reducing the risk of social desirability bias, which could arise if they had to evaluate an activity designed by the teacher or another adult. Additionally, we can expect that they will be more likely to use the full range of the scale rather than only its extremes. This latter issue has been extensively addressed by previous studies (Hall et al. 2016; Read 2012; Read and MacFarlane 2006; Read 2008) on scales developed specifically for children.

Second, the questionnaire uses contra-indicative statements to measure factors that could indicate one is not having fun. The sentences within these factors are phrased in a non-negating, thus positive way; however, their content is anticipated to be contra-indicative for experiencing fun (e.g. During the activity I felt bad). Such factors are the experienced stress and tension, which are represented in the questionnaire by the initial factors Fear of Damage, Pressure and Stress. Having contra-indicative items among the statements of the questionnaire is intended to prevent, or highlight, acquiescence bias and straight-lining (see section Bias) and thus serve as a control for the reliability of the answers.

Third, the questionnaire items were phrased very briefly and in simple language, so that children would find them easy to comprehend and evaluate. Language appropriateness was checked by several measures, of which the Flesch Reading Ease score (Flesch 1948) is 77.9 and the Flesch-Kincaid grade level (Kincaid et al. 1975) is 3.7 for the initial 50-item version of FunQ, and 84.7 and 2.7, respectively, for the 18-item final version. This indicates that the text of the 50-item questionnaire is fairly easy to read and is understandable for an average end-of-third-grade student (age 9), and the text of the 18-item final version is easy to read and is understandable for an average end-of-second-grade student (age 8). This is in agreement with de Leeuw’s (2011) suggestion that the readability level of items should be about two grades lower than that of the target group.
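To make the two readability measures concrete, the following Python sketch applies the standard Flesch Reading Ease and Flesch-Kincaid grade level formulas. It is an illustration only: the source does not state which tool produced the reported scores, and the helper names (`count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions made here, so the output should not be expected to reproduce the reported values.

```python
import re


def count_syllables(word):
    """Crude syllable estimate (assumption): count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (words, sentences, syllables) for a plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Two item-like statements taken from the examples in the text.
    w, s, syl = text_counts("I feel good. I feel happy.")
    print(round(flesch_reading_ease(w, s, syl), 1))
    print(round(flesch_kincaid_grade(w, s, syl), 1))
```

Very short texts such as single items can yield scores outside the usual ranges, which is why readability is typically computed over the full item set.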

Fourth, the appearance of the survey was designed with the particular characteristics of the target group in mind. Therefore, based on previous findings (Bernard et al. 2001), the text was presented in Comic Sans type at 12 pt. size, which was found to be the most preferred font type and size among 9–11-year-old children. In addition, the items were highlighted with alternating colours, to make it easier to keep track of the responses, and a colourful design was created to make the questionnaire inviting for children (see Appendix B). The idea of adding cartoons to the design was considered but was abandoned for fear of the questionnaire appearing too childish for adolescents.

Fifth, there was special attention devoted to the type of response format. Based on the findings of Mellor and Moore (2014) and de Leeuw (2011), it was decided to use a 5-point Likert scale where the points are based on words that reflect the frequency of behaviour/thoughts (i.e. never/rarely/sometimes/often/all the time).

Sixth, the questionnaire was designed so that the response time should stay within the anticipated average concentration span of 10–15 min. The questionnaire items once finalized were randomly mixed.

Construction of the FunQ

The factorial structure of the FunQ was created by adopting a deductive scale development approach (Adèr et al. 2008) following similar steps as in previous research (Di Malta et al. 2020; Polo et al. 2018). That is, the factors were established strictly on the above-referred theories. Based on those theories, we made the assumption that the experienced fun is a multidimensional construct. The importance and the interrelation of the relevant concepts, such as control over the activity (Csikszentmihalyi 1990), challenge (Chu et al. 2017; Csikszentmihalyi 1990; Dismore and Bailey 2011; Fu et al. 2009; IJsselsteijn et al. 2008; Mellecker et al. 2013; Ryan et al. 2006), enjoyment (Bisson and Luckner 1996; Glasser 1986; Read et al. 2002), engagement and immersion (Caine and Caine 1991; Csikszentmihalyi 1990; Tasci and Ko 2016), intrinsic motivation (Bisson and Luckner 1996; Csikszentmihalyi 1990; Glasser 1986; Malone 1981; Malone and Lepper 1987), social connectivity (Bisson and Luckner 1996; Csikszentmihalyi 1990) and stress (Caine and Caine 1991), have been argued in the earlier sections. Based on this reasoning, the following factors were established that covered all the referred theories and were expected to be defining for the experienced fun: Autonomy, Challenge, Delight, Fear of Damage, Immersion, Loss of Social Barriers, Pressure and Stress.

Then, the referred frequently used measurement tools were scrutinized to determine whether they measure any of the dimensions of the FunQ. Consequently, some items from other measurement tools (Fu et al. 2009; Iten and Petko 2016; Kendzierski and DeCarlo 1991; Poels et al. 2007; Ryan 1982; Tasci and Ko 2016) were considered for inclusion in the FunQ. For this, the above-detailed protocol for rephrasing and adjustment was followed. Thereafter, the number of questionnaire items was further expanded by items based on the emotional and behavioural reflections of the factor-defining concepts, while keeping the number of items limited with regard to the attention span of the target respondent population. The item pool was evaluated and adjusted in several consecutive steps according to topic experts’ recommendations.

In the initial version of the FunQ, the Experienced Fun is measured across eight dimensions. The Autonomy factor (4 items) measures whether the child experienced control over the activity. The Challenge factor (10 items) assesses whether the child felt challenged during the activity. The Delight factor (9 items) targets the positive emotions experienced during the activity. The Fear of Damage factor (4 items) aims to control whether the child experienced fear of hurting someone or causing damage. The Immersion factor (8 items) intends to indicate whether the child became immersed in the activity by losing the sense of time and space. The Loss of Social Barriers factor (4 items) aims to monitor the social connectivity of the child. The Pressure factor (5 items) investigates whether the child experiences his/her own participation as voluntary or as obligatory. Finally, the Stress factor (5 items) measures the negative emotions experienced during the activity. In total, the initial version of the questionnaire consisted of 50 items, of which 16 were reverse statements (see Appendix A for the items and Appendix B for the design of the questionnaire). The items were evaluated by the children on a 5-step Likert-type scale.

Analysing the Structure of Initial FunQ Item Pool

In order to assess the fit of the FunQ for its intended purpose, the initial 50-item version of the FunQ was administered to children after they visited an interactive exhibition about the Dutch Delta Works. Based on the statistical analyses, a comprehensive yet parsimonious model was created which contains 18 items across 6 factors. Some items of the final model were slightly adjusted according to the subsequent think-aloud study (see section Think-Aloud Evaluation of Initial Item Pool); this slightly modified 18-item version of the FunQ was used for the third study.

Think-Aloud Evaluation of Initial Item Pool

Besides statistically testing the fitness of the FunQ, we conducted think-aloud interviews (Ericsson and Simon 1993) with six children, for which we used the initial 50-item version of the instrument. The think-aloud interview is a commonly used method for assessing participants’ thought processes, especially when they are confronted with a new situation or artefact. During the interview, the interviewee is asked to verbalize his/her thoughts on the subject of the usability testing while being actively engaged with it. With the think-aloud interviews, we aimed to gain insight into (a) whether there are problematic or easily misunderstood items and (b) whether the children understand the questionnaire items as intended by the researchers. In our case, the procedure was as follows. To start with, the interviewer explained the method and gave examples to the child of what was expected from him/her. Thereafter, the child was asked to read the FunQ items aloud and verbalize any thoughts that came into his/her mind. The role of the interviewer in this situation was mainly to observe and, if needed, to facilitate the verbalization and eventually to ask for clarification, applying a more relaxed approach to think-aloud than the recommendations of Ericsson and Simon, in order to create a friendly and safe environment for the child participants (Markopoulos et al. 2008).

Psychometric Properties of FunQ

As a final step in the herein described study, we collected further data from 150 children visiting a museum with their school to test the validity and reliability of the final 18-item version of FunQ on a new data set.

Statistical Analyses and Measures

The think-aloud interviews were analyzed qualitatively (see section Think-Aloud Evaluation of Initial Item Pool). For the assessment of the psychometric properties of the instrument, the statistical analyses and measures are detailed below.

We applied confirmatory factor analysis (CFA) and second-order hierarchical latent variable modelling. Our choice for CFA is supported by the deductive scale development methodology we followed. That is, the FunQ factors were established strictly on the above-referred theories, and according to previous papers (Hu and Li 2015; Suhr 2006), CFA is the appropriate choice when “the researcher uses knowledge of the theory, empirical research, or both, postulates the relationship pattern a priori and then tests the hypothesis statistically” (Suhr 2006, p. 1).

Internal Consistency

For measuring internal consistency, Cronbach’s alpha (α) and the Omega (ω) coefficients were calculated. These statistics indicate whether the items measure the same underlying construct. Although Cronbach’s alpha is the most widely known internal consistency measure, it has been the subject of considerable criticism (Dunn et al. 2014; Morrison et al. 2017; Peters 2014; Revelle and Zinbarg 2009). Thus, we adopt the Omega coefficient as the main indicator for internal consistency, as it has been shown to be more reliable (Peters 2014; Revelle and Zinbarg 2009). Given that reported thresholds for acceptable internal consistency vary widely (starting from 0.45 (Taber 2018)), for the reported study we regarded Omega values above 0.6 as acceptable (Hair Jr et al. 2014; Tasci and Ko 2016).
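As an illustration of what the two coefficients capture, the sketch below computes Cronbach’s alpha from raw item scores and a simplified Omega from standardized loadings of a single factor. It is not the semTools routine used in the study: the Omega function assumes a one-factor model with uncorrelated residuals, so each residual variance is taken as 1 minus the squared loading, and all function names and input values are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows = one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def omega_from_loadings(loadings):
    """Simplified Omega for one factor with standardized loadings.

    Assumption: residuals are uncorrelated, so each residual
    variance equals 1 - loading**2.
    """
    common = sum(loadings) ** 2
    residual = sum(1 - l ** 2 for l in loadings)
    return common / (common + residual)


if __name__ == "__main__":
    # Hypothetical answers of four respondents to three items (1-5 scale).
    answers = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
    print(round(cronbach_alpha(answers), 3))
    # Hypothetical standardized loadings of the same three items.
    print(round(omega_from_loadings([0.7, 0.6, 0.8]), 3))
```

Unlike alpha, the Omega computation weights each item by its loading, which is one reason it is preferred when items do not contribute equally to the factor.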

Model Fit

For assessing model fit (whether the factorial structure found in the data is in agreement with the proposed one), we considered a variety of model fit indexes and addressed parameter estimates and their magnitudes. Hu and Bentler (1999) suggested relying on a combination of indexes that have different measurement properties (e.g. CFI and SRMR). We used the selected indexes listed below based on the recommendations of Jackson, Gillaspy and Purc-Stephenson and of Kline (Jackson et al. 2009; Kline 2015).

  • χ2 value. The χ2 value is a general, commonly used maximum likelihood approximation for the overall model fit. It tests whether the model implied covariance matrix differs significantly from the measured values. However, it is known that the χ2 value is affected by the sample size and is mostly significant when N > 75.

  • Comparative Fit Index. The Comparative Fit Index (CFI) is an incremental fit index which ranges between 0 and 1, with higher values indicating better model fit. The cutoff value for the CFI proposed by Hu and Bentler (1999) is 0.95 or higher, which indicates a good model fit.

  • Root Mean Squared Error of Approximation. The Root Mean Squared Error of Approximation (RMSEA) is a parsimony-based index. The index value typically ranges between 0 and 1, but values higher than 1 are also possible. An index value of 0 indicates a perfect model fit. When the associated p-value is ≥ 0.05, the hypothesis of close fit is justified. The cutoff value for the RMSEA proposed by Hu and Bentler (1999) is 0.06 or lower, which indicates a good model fit.

  • Standardized Root Mean Squared Residual. The Standardized Root Mean Squared Residual (SRMR) is an absolute fit index. The index value ranges between 0 and 1, where 0 marks the perfect fit. Thus, the lower the value, the better the fit is. The cutoff value for the SRMR proposed by Hu and Bentler (1999) is 0.08 or lower, which indicates a good model fit.
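The indexes above can be related to the χ2 statistics with a short Python sketch. It uses the common textbook formulas for RMSEA and CFI and compares the results with the Hu and Bentler (1999) cutoffs quoted in the list. The function names and all input numbers are hypothetical, and software packages differ in details (for example, some use N instead of N − 1 in the RMSEA), so this is not a reproduction of the lavaan output.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA from the model chi-square, its degrees of freedom and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI from the tested model and the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def check_cutoffs(cfi_value, rmsea_value, srmr_value):
    """Compare indexes with the cutoffs of Hu and Bentler (1999)."""
    return {
        "CFI >= 0.95": cfi_value >= 0.95,
        "RMSEA <= 0.06": rmsea_value <= 0.06,
        "SRMR <= 0.08": srmr_value <= 0.08,
    }


if __name__ == "__main__":
    # Hypothetical chi-square values, not results from the study.
    r = rmsea(chi2=160.0, df=129, n=128)
    c = cfi(chi2_model=160.0, df_model=129, chi2_base=700.0, df_base=153)
    print(round(r, 3), round(c, 3), check_cutoffs(c, r, srmr_value=0.07))
```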

For modelling, second-order hierarchical latent variable models were fitted. For the data analysis, we used the RStudio 1.1.453 (RStudio Team 2016) software, for modelling the lavaan package (Rosseel 2012), and the semTools package (Jorgensen et al. 2018) to calculate Cronbach’s alpha and Omega coefficients.

Results

Analysing the Structure of Initial FunQ Item Pool

Data

Data were collected at the beginning of October 2018 from 75 students in the first year of a Dutch secondary school with an English-speaking specialization (39 boys, 33 girls, 3 missing, μage = 11.78, SDage = 0.45). Consent was obtained according to the Dutch regulations: both parents and the child had to sign the informed consent form. The questionnaire was administered on paper after the children had visited Deltapark Neeltje Jans, an interactive exhibition and information center about the Delta Works, and had designed and built their own dams. The data come from three groups.

Missing Data

The proportion of missing values both for the whole sample and the questionnaire items is 1.8%. Since the normality of the data cannot be assumed, we used the non-parametric test of homoscedasticity to check whether data is missing completely at random (MCAR). The test resulted in a nonsignificant p-value (p = 0.259); thus, it was assumed that the values are missing completely at random. Hence, to handle missing data, full information maximum likelihood (FIML) estimation was used.

The Initial 50-Item Pool

To begin with, the construct validity of the proposed factorial structure was assessed by means of confirmatory factor analysis (CFA) and second-order latent variable modelling. The contra-indicative items and factors that are marked with an (R) (see Appendix A) were reversed for the data analysis and for reporting the results. For details about the calculation of the coefficients, see the referred package (Jorgensen et al. 2018). The computation of the overall internal consistency at the first-order level (ωoverall = 0.916) and for the second-order factor Experienced Fun (ωpartial = 0.909) revealed high values of the Omega coefficient.
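Reversing the contra-indicative items before analysis amounts to mirroring each answer on the 5-point scale. A minimal Python sketch of this step follows; the item identifiers and the helper names are hypothetical and only illustrate the transformation, not the authors’ actual data preparation.

```python
def reverse_score(value, low=1, high=5):
    """Mirror an answer on a low..high scale; missing answers stay missing."""
    if value is None:
        return None
    return low + high - value


def reverse_marked_items(response, reversed_items):
    """Return a copy of one response with the (R)-marked items reversed."""
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in response.items()
    }


if __name__ == "__main__":
    # Hypothetical item identifiers; "S1" and "P2" stand for (R)-marked items.
    response = {"D1": 5, "S1": 1, "P2": None, "I3": 4}
    print(reverse_marked_items(response, reversed_items={"S1", "P2"}))
```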

The Final 18-Item Model

Since we aimed to create a comprehensive and parsimonious model, the factor loadings of the first-order factors and the second-order factor Experienced Fun were examined. First, it was investigated whether all proposed factors contribute equally well to the second-order factor Experienced Fun. The analysis revealed that the Fear of Damage factor had no significant effect on the Experienced Fun (standardized factor loading = 0.027, p = 0.891); therefore, it was removed from the model. Then, the number of the questionnaire items was reduced based on the factor loadings. In case the standardized factor loading of an item was < 0.3, it was considered not to be substantial (Hair Jr et al. 2014) for the given factor and in several consecutive steps, the non-substantial items were removed. Additionally, based on the modification indexes and factor covariances provided by the lavaan package, the Autonomy and the Pressure factors were merged. During the model fitting process, the internal consistency of the modified factors and the model fit was continuously monitored.
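The item-reduction rule described above can be sketched as a simple filter on standardized loadings. In the study the model was refitted between consecutive removal steps, which this illustration does not do: it only shows a single screening pass against the 0.3 threshold, with hypothetical item identifiers and loadings.

```python
def screen_items(loadings, threshold=0.3):
    """Split items into kept and dropped by their standardized factor loading.

    loadings: dict mapping an item identifier to its standardized loading.
    Items with a loading below the threshold are treated as non-substantial.
    """
    kept = {item: l for item, l in loadings.items() if l >= threshold}
    dropped = {item: l for item, l in loadings.items() if l < threshold}
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical loadings for one factor, not values from the study.
    kept, dropped = screen_items({"C1": 0.62, "C2": 0.18, "C3": 0.41})
    print(sorted(kept), sorted(dropped))
```

In practice each pass would be followed by refitting the model, since removing an item changes the loadings of the remaining ones.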

Comparing the final 18-item model to the initial 50-item pool, the Fear of Damage factor was completely removed as it appeared not to be related to the second-order factor Experienced Fun. Additionally, the Pressure and Autonomy factors were combined into one factor that measures the free choice/voluntary participation of the child, named Autonomy. The final 18-item version of FunQ including the factor loadings is presented in Fig. 1 and Appendix A. The descriptive statistics of the items are presented in Appendix C.

Table 2 summarizes the statistics of the final 18-item model. The internal consistency of the majority of the remaining factors is above the cutoff value (ω > 0.6), which indicates that the items are measuring the same underlying constructs. Although the internal consistency of the Challenge (ω = 0.477) and Immersion (ω = 0.488) factors is below the cutoff value, they both appear to have a significant effect on the Experienced Fun (pChallenge < 0.001, pImmersion < 0.001) and a standardized factor loading well above the 0.3 margin (0.719 and 1.022, respectively); therefore, the factors were kept in the model. In fact, the Immersion factor has a standardized factor loading above 1, which is unusual but acceptable, suggesting high correlations among the factors (Jöreskog 1999), which is desirable for second-order latent variable modelling.

Table 2 Statistics of the final 18-item model on the first data set. The internal consistency coefficients and the standardized factor loadings of the factors on Experienced Fun

The standardized factor loadings of the first-order variables on the second-order variable Experienced Fun in the final model are all above the cutoff value of 0.3; therefore, they are considered substantial.

The internal consistency of the second-order factor Experienced Fun is presented in Table 3. The Omega values are found to be above the cutoff value (ω > 0.6). This finding suggests that the Autonomy, Challenge, Delight, Immersion, Loss of Social Barriers and Stress factors measure with high reliability the same underlying construct, the Experienced Fun.

Table 3 Statistics of the final 18-item model on the first data set. The internal consistency coefficients of Experienced Fun as a second-order factor

The model fit indexes of the final 18-item model are presented in Table 4. The values are borderline. However, given the relatively small sample size and the sufficient factor loadings, p-values and internal consistency values, and in order to avoid overfitting the model to the data, we decided to adhere to this model and test it on a new, larger data set.

Table 4 Fit indexes of the final 18-item model on the first data set

Think-Aloud Evaluation of Initial Item Pool

Data

The think-aloud interviews were conducted on 14 February 2019 at an international school in the Netherlands with six 11-year-old children after they had participated in a playful learning activity during which they prototyped a robot. Consent was obtained according to the Dutch regulations: both parents and the child had to sign the informed consent form. To further strengthen the voluntary character of participation in the interview, of the fifteen children who returned a completed consent form, six who were willing were invited for the think-aloud interviews.

Analysis

For the think-aloud interviews, the initial 50-item pool of the FunQ was used for the sake of completeness. For the evaluation of the interviews, and thus, for the usability testing of the FunQ, the following aspects were considered:

  • Misreading of words

  • Difficulty with reading the item (when no reading difficulty was observed in general)

  • Asking clarification about the item

  • Adding comments that imply that the item is not relevant

  • Interpreting the item in a way that does not align with the intended meaning

In general, a good usability of FunQ was found. Specifically, the suitability of the design, the language and the scale labelling was supported by the interviews. It appeared that children, even those at the lowest end of the target age group, went through most of the questions smoothly. This implies that FunQ is user-friendly: the appearance supports the evaluation of the items, which are readable (font type and size) and understandable (language). Also, the labelling of the steps of the scale (chosen based on the suggestions of de Leeuw (2011)) appeared to be adequate, as it helped children to think back and identify themselves with the statements.

According to the above-established criteria, based on the interviews, eight items emerged as problematic. Those are detailed in Table 5.

Table 5 FunQ items and the usability issue based on the think-aloud interviews

The items indicated by the interviews and by the statistical analyses of the initial item pool largely overlapped, except for the items of the Fear of Damage factor. None of those items appeared to be problematic during the interviews. Therefore, we can conclude that the items of the Fear of Damage factor are understandable and comprehensible to children. Hence, the nonsignificant effect on Experienced Fun is not due to the quality of the items.

According to the results of the think-aloud interviews, slight modifications were applied to the questionnaire in general and to item D4 specifically. Namely, item D4 was slightly modified from I want to do the activity again to I want to do something like this again. Additionally, since most of the children misread the activity as this activity, it was corrected accordingly in the whole questionnaire. Consequently, the questionnaire was used in this modified form during the next data collection and thus in the validation of the FunQ.

Psychometric Properties of FunQ

Data

Data were collected between 8 and 17 March 2019 during the British Science Week at the Science Museum. The reason for collecting the second data set in the UK was to collect responses from native English-speaking adolescents in order to ensure the quality of the instrument. The questionnaire responses were obtained from eight school classes who attended the interactive Wonderlab: The Equinor Gallery program during this period. In total, 150 responses were collected; however, the quality of 22 responses was questionable. They showed signs of typical response bias (straight-lining; see section Bias), or many responses were missing (e.g. the second half of the questionnaire), which called the reliability of those responses into question. For the sake of data quality, the data of those 22 respondents were completely removed before the analysis started.
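The kind of screening described here can be illustrated with a small Python sketch that flags straight-lining (the same raw answer on every item) and responses with a large share of missing answers. The source does not report the exact exclusion criteria, so the 50% missing-data threshold and the function names below are assumptions chosen only for illustration.

```python
def is_straight_lined(answers):
    """True if every given answer is the same raw scale point."""
    given = [a for a in answers if a is not None]
    return len(given) > 1 and len(set(given)) == 1


def missing_share(answers):
    """Proportion of unanswered items (None) in one response."""
    return sum(a is None for a in answers) / len(answers)


def screen_responses(responses, max_missing=0.5):
    """Split responses into kept and removed.

    max_missing is an assumed threshold, not a value from the study.
    """
    kept, removed = [], []
    for answers in responses:
        if is_straight_lined(answers) or missing_share(answers) > max_missing:
            removed.append(answers)
        else:
            kept.append(answers)
    return kept, removed


if __name__ == "__main__":
    # Hypothetical raw answers (before reversing contra-indicative items).
    data = [[4, 2, 5, 1], [3, 3, 3, 3], [5, None, None, None]]
    kept, removed = screen_responses(data)
    print(len(kept), len(removed))
```

Because the questionnaire mixes indicative and contra-indicative items, identical raw answers across all items are implausible for an attentive respondent, which is what makes this check informative.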

The validation was conducted on the data of the remaining 128 respondents (64 boys, 45 girls, 19 missing, μage = 12.15, SDage = 1.079). The questionnaire was administered on paper after the children had participated in the activity. Consent was obtained through the class teachers according to the British regulations.

Missing Data

The proportion of missing values for the whole sample is 2% and for the questionnaire items is 1.7%. Since the normality of the data cannot be assumed, we used the non-parametric test of homoscedasticity to check whether data is missing completely at random (MCAR). The test resulted in a nonsignificant p-value (p = 0.093), thus it was assumed that the values are missing completely at random. Hence, to handle missing data, full information maximum likelihood (FIML) estimation was used.

The Model Fit of the 18-Item Model

To assess the validity of the previously established final 18-item model, we fitted it to the second data set. Table 6 presents the statistics of the 18-item model on the second data set; Fig. 1 depicts the model with the related factor loadings. The internal consistency of the majority of the remaining factors is above the cutoff value (ω > 0.6), suggesting that the items are measuring the same underlying constructs. Although the internal consistency of the factor Challenge (ω = 0.425) is below the cutoff value, it appears to have a strong significant effect (std. factor loading = 0.990, p = 0.002) on the Experienced Fun; therefore, keeping the factor in the model is justified. The standardized factor loadings of the first-order variables on the second-order variable Experienced Fun of the final model on the validation data set are all above the cutoff value of 0.3; therefore, they are considered substantial.

Table 6 Statistics of the 18-item model on the second data set. The internal consistency coefficients and the standardized factor loadings of the factors on Experienced Fun
Fig. 1 The second-order hierarchical model results of the final 18-item model on the second data set. Standardized factor loadings are shown. All of the related p-values are below the 0.05 margin

The internal consistency of the second-order factor Experienced Fun is presented in Table 7. The Omega values are found to be above the cutoff value (ω > 0.6). This finding suggests that the Autonomy, Challenge, Delight, Immersion, Loss of Social Barriers and Stress factors measure with high reliability the same underlying construct, the Experienced Fun.

Table 7 Statistics of the 18-item model on the second data set. The internal consistency coefficients of Experienced Fun as a second-order factor

The model fit indices of the 18-item model on the second data set are introduced in Table 8. Evaluating the model fit indices and taking into account the limitations of the χ2 test, we can conclude that based on the RMSEA and SRMR values, the model fit is sufficient.

Table 8 Fit indexes of the 18-item model on the second data set

Discussion

In recent years, it has become common practice to address “fun” as a common sense notion instead of precisely defining the meaning of the concept (Iten and Petko 2016; Knowles et al. 2017; Tokuhisa et al. 2015), which makes its measurement difficult. Therefore, the aim of the paper was twofold: (a) defining the concept of fun and (b) creating a tool for the multidimensional measurement of the experienced fun that is psychometrically and theoretically sound, comprehensive yet parsimonious, practical and child appropriate, specially developed for adolescents and that can be used in the learning environment across various fields of research. To this end, we have adopted a deductive scale development approach, which is widely used in the field of industrial and organizational psychology (Tay and Jebb 2017). Accordingly, the conceptualization of the construct of fun was theory driven, based on a thorough review of literature related to fun. We examined a network of related concepts, thereby contributing a theoretically founded conception of fun for our targeted demographic. We concluded that for adolescents to experience an activity as fun they need (a) to feel in control of the activity and be intrinsically motivated for participation (Autonomy); (b) to experience an optimal level of challenge matching their level of skills (Challenge); (c) to feel well during the activity (Delight) and (d) to not feel bad (Stress, contra-indicative); (e) to be immersed in the activity, losing their perception of time and space (Immersion) and (f) to let go of social inhibitions (Loss of Social Barriers). The FunQ is put forward as a tool for testing how a learning activity maps on its different dimensions.

Our conception of fun was tested through statistical analysis of the resulting instrument. The final model consists of 18 items across six dimensions. In addition to the statistical testing, the comprehensibility and appropriateness of the instrument for the target age group were checked through think-aloud interviews, and the questionnaire was adapted accordingly. The final version of the FunQ has been shown to have reliable internal consistency at both the first- and second-order levels. Since the two data sets (for testing the initial item pool and for validating the final 18-item version) were collected in two different countries (the Netherlands and the UK), from eleven groups of adolescents who participated in three different kinds of learning activities, we assume that the revealed model is not activity specific. Additionally, given that the FunQ items are phrased in a general way, we anticipate that the instrument will be applicable in a broad range of contexts to assess the experienced fun of an activity among adolescents.

The data analysis confirms our initial expectation that fun is a multidimensional construct. Among the dimensions of fun examined, Fear of Damage appears to have no significant effect on whether adolescents experience an activity as fun. The existence of the remaining proposed factors is confirmed, with the note that the Autonomy and Pressure factors were merged, as they appeared to measure the two extremes of the same dimension.

With the largest standardized factor loading, the Delight factor makes the greatest contribution to the Experienced Fun. This factor focuses on positive emotions and the related desires. It seems natural that fun is a positive experience and that, as such, it implies the desire for repetition (Bisson and Luckner 1996; Glasser 1986; Read et al. 2002). This aspect is captured by the Delight factor, which our findings indicate to be an organic part of the Experienced Fun.

To maintain engagement, and therefore to stay in the activity while continuing to experience it as fun, an optimal level of challenge is required. While previous research with children investigated challenge and fun as separate constructs (Caine and Caine 1991; Chu et al. 2017), our model treats challenge as a facet of the experienced fun. This idea appears in measurement tools designed for adults (Fu et al. 2009; Poels et al. 2007); for adolescents, however, the association has only been highlighted in relation to physical education activities by the qualitative study of Dismore and Bailey (2011). Our findings suggest that Challenge is the second most important factor of the Experienced Fun, though it is left for future investigations to establish the suitability of challenge as a dimension of fun for children of different ages.

The Immersion factor measures the loss of the sense of time and space. When one is deeply engaged, immersion in the activity leads to a loss of the sense of time and space (Csikszentmihalyi 1990; Rodríguez-Ardura and Meseguer-Artola 2017; Tasci and Ko 2016). In the FunQ, this aspect is captured by the Immersion factor, which was found to have the third-highest factor loading on the Experienced Fun.

The Autonomy factor bears the fourth-highest standardized loading and assesses whether the child feels in control of his/her participation as well as of the activity itself. As summarized above, fun is a voluntary experience (Bisson and Luckner 1996); therefore, intrinsic motivation is seen as a key factor for participation (Mellecker et al. 2013). Additionally, applying Flow theory (Csikszentmihalyi 1990), it was expected that feeling in control of the situation is related to motivation as well. Our findings support this expectation, as the Autonomy factor, which refers to the experienced control over the participation and the activity itself, appeared to have a significant effect on the Experienced Fun.

Compared to the research instruments currently used to measure fun, e.g., in evaluations of interactive systems and educational (serious) games, this study also includes items and factors that are contra-indicative of the construct of Experienced Fun. These enhance the validity of the tool and also allow assessing whether an activity that is intended to be fun unintentionally causes any distress to the participants. The antagonism between stress and fun has previously been taken as obvious. Caine and Caine (1991) mentioned the stress-reducing effect of fun but did not test it statistically. Our findings provide supportive evidence that negative emotions are contra-indicative of experiencing fun, as the effect of the Stress factor was found to be significant.

According to Flow theory (Csikszentmihalyi 1990), immersion should result in social barriers being largely removed. That is, while a person is experiencing fun and is immersed in the activity, the suspension of reality is triggered, which, in turn, leads to a loss of self-consciousness. Once the person is less self-conscious, he/she becomes less preoccupied with him/herself, less afraid of rejection and more open to others, which ultimately results in the breakdown of social barriers. Bisson and Luckner (1996) indicated that in the case of children, the combination of fun and play could act as a catalyst to eliminate inhibitions inherent to our socialization. Our findings support this theory, as the Loss of Social Barriers factor made a significant contribution to the Experienced Fun. In other words, while having fun, children could connect with each other more easily than usual.

Regarding the psychometric properties, the internal consistency measures (Cronbach’s alpha and omega) for the second-order factor Experienced Fun provide evidence that the questionnaire reliably measures the underlying construct, and the model fit indices suggest a sufficient model fit. It is therefore proposed that the FunQ is suitable and valid for measuring the experienced fun of adolescents. However, the role of challenge in the experienced fun among adolescents should be investigated further, especially as the internal consistency of the Challenge factor did not meet the criterion level (ω > 0.6).
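To make the two internal consistency measures more concrete, the following Python sketch computes Cronbach’s alpha from raw item scores and an omega coefficient from standardized factor loadings. It is a simplified illustration under stated assumptions: the omega formula assumes a single factor with uncorrelated errors, which is not necessarily the computation used for the second-order factor in this study, and the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def omega_single_factor(loadings):
    """Omega from standardized loadings, assuming one factor and
    uncorrelated errors (error variance of each item = 1 - loading**2)."""
    common = sum(loadings) ** 2
    error = sum(1 - loading ** 2 for loading in loadings)
    return common / (common + error)


def meets_criterion(omega, threshold=0.6):
    """Check a coefficient against the criterion level used in the text."""
    return omega > threshold
```

A factor would be flagged for further investigation when `meets_criterion` returns `False`, which corresponds to the situation described above for the Challenge factor.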

Compared with other instruments, FunQ covers similar ground to the This or That (Zaman et al. 2013), the Fun Semantic Differential Scales (Yusoff et al. 2011), the Fun Toolkit (Read 2008) and the Five Degrees of Happiness (Hall et al. 2016) instruments; however, FunQ is a theoretically founded instrument that treats fun as a multidimensional rather than a unidimensional construct, and it is designed for and validated with adolescents instead of young children. The PENS (Ryan et al. 2006), the UES and UES-SF (O’Brien et al. 2018), the GEQ (Poels et al. 2007), the Playful-consumption experience questionnaire (Abbasi et al. 2019) and the EGameFlow (Fu et al. 2009) scales, as well as the list of Likert scales for the evaluation of a game and attitudes towards learning games (Iten and Petko 2016), are all designed for the gaming environment; hence, their usability is limited in the learning environment for which FunQ has been created. Additionally, these scales mainly target adults and mostly focus on the evaluation of a product or game, whereas FunQ is designed for adolescents and focuses on the personal experience of being engaged in a learning activity. The FUN scale (Tasci and Ko 2016) is validated to measure the fun value of a touristic destination as a product, while the Physical Activity Enjoyment Scale (Kendzierski and DeCarlo 1991), as its name suggests, evaluates a physical activity; thus, both target a different field than FunQ. FunQ and the EmoForm (Zhang et al. 2019) are both designed for adolescents and for the learning environment, but the former focuses on the experienced fun as a multidimensional construct, while the latter investigates a broader range of emotions, handles enjoyment unidimensionally and has not yet been validated. Therefore, we conclude that FunQ is a much-needed instrument measuring fun as a multidimensional construct covering playful learning activities involving adolescents.

In sum, this paper contributes (a) a review of the literature regarding the concept of fun; (b) a conception of fun as a multidimensional, theoretically motivated concept; (c) a multidimensional instrument for assessing experienced fun, the FunQ, which specifically targets adolescents in its design, content and response format; and (d) a psychometric evaluation and validation of the proposed instrument.

Limitations and Future Work

The study presented here is limited to the general population of adolescents in the learning environment. Additionally, as mentioned above, particular attention should be paid to further investigating the role of challenge in the experienced fun for different ages and settings. To further expand the potential of the questionnaire, follow-up studies should investigate its psychometric properties for different ages and examine its scope of application: whether the FunQ can be applied to evaluate fun not only in relation to learning but also in other activities in which fun can play a useful role, such as participation in experimental studies, child-computer interaction, and playful activities and experiences. The validation of the instrument in other languages is already in progress. With the latter, we also aim to investigate intercultural differences in the concept and experience of fun.