Introduction

Engaging with material by generating “how” and “why” questions and seeking answers underpins effective learning (Day, 1982; Dewey, 1913). Motivational forces driving information seeking have commonly been described in terms of our curiosity and interest (Berlyne, 1949, 1950). Highlighting the potential value for learning, studies show that we recall information better when we are more curious about, or interested in it (Berlyne, 1954; Fastrich et al., 2018; Garner et al., 1991), and that curiosity elicits activation in brain areas associated with learning and memory consolidation (Gruber et al., 2014; Kang et al., 2009; Marvin & Shohamy, 2016; Mullaney et al., 2014). Accordingly, pedagogic researchers emphasise the need to nurture students’ curiosity and interest (Muis et al., 2018), in light of positive effects on academic achievement and engagement (Harackiewicz et al., 2012; Hulleman & Harackiewicz, 2009; Shah et al., 2018).

However, researchers studying curiosity and interest note a lack of consensus in how these important concepts are distinguished from one another (e.g. Peterson & Hidi, 2019). In early research the terms were often used interchangeably or synonymously (Berlyne, 1949, 1950; Day, 1982; though Berlyne’s position changed in subsequent publications). This initial lack of conceptual clarity led to divergence in how they were investigated, conceptualised and represented that persists to the present day (Murayama, 2019). While some maintain that the terms can be used interchangeably (there is no evidence to suggest they represent distinct psychological processes; Silvia, 2006; see also Litman & Silvia, 2006 for discussion), others argue that they should not (representing distinct processes; Grossnickle, 2016; Hidi & Renninger, 2019; Markey & Loewenstein, 2014). Practically, it is non-trivial to provide much-needed conceptual clarity of terminology, particularly in educational contexts (Marsh et al., 2003). If curiosity and interest represent different underlying psychological processes, they may affect learning outcomes differently, impacting on academic interventions and wider pedagogical practice.

Complicating the task of delineating the terms, modern frameworks outline dimensions of curiosity (for reviews, see Grossnickle, 2016; Loewenstein, 1994) and interest (Renninger & Hidi, 2016) that describe similar concepts labelled “curiosity” or “interest” (e.g. epistemic curiosity and situational interest), or treat interest as a dimension of curiosity (e.g. interest-type curiosity; Litman, 2019). Further, neuroscientists and computational modellers tend to focus on curiosity alone (e.g. Gottlieb et al., 2013; Kang et al., 2009; Kidd & Hayden, 2015; Lau et al., 2020), while educational researchers tend to focus on interest (e.g. Renninger & Hidi, 2016). This “conceptual gap” may simply result from siloed research traditions preferring different terminology (resulting from neuroscientists’ agnosticism on differences; see Kidd & Hayden, 2015) rather than from critical theoretical distinctions.

In the current paper, we aim to provide a basis for conceptual clarity to complement theoretical work and facilitate future empirical work. Following an in-depth theoretical review, we investigate whether there is a consensus amongst non-experts that curiosity and interest are distinct, and what the nature of these distinctions is (Study 1). We then determine to what extent this non-expert consensus is shared by expert researchers (Study 2), assessing whether the consensus produces meaningful distinctions that can facilitate future empirical investigations.

Curiosity and Interest: Review of the Theoretical Literature

Major Theoretical Frameworks

Existing theoretical frameworks emerge from two broadly siloed lines of research; one into curiosity, the other into interest. Despite obvious relationships between them, the two lines have been developed independently with independent theoretical frameworks (Murayama, 2019). In separately describing the main theoretical frameworks, we simply reflect their historical separation: we do not presuppose that such frameworks necessarily describe different psychological processes.

Curiosity

Curiosity is characterised as a multidimensional concept (e.g. diversive/specific, see Loewenstein, 1994), with scope for broad definition (e.g. encompassing all information-seeking behaviour, from seeking answers to trivia questions to infants’ bias towards attending to high contrast images, motion onset and faces; see Kidd & Hayden, 2015). One key dimension concerns trait/state distinctions: state curiosity refers to momentary experiences triggered in particular situations by one’s environment (e.g. curiosity is piqued) whereas trait curiosity refers to a general tendency towards engaging in those experiences (e.g. a curious person, Grossnickle, 2016). A prominent conceptualisation of state curiosity is the information-gap theory (Loewenstein, 1994; Markey & Loewenstein, 2014), where someone becomes aware of a gap between what they know and do not know, and experiences a strong desire for closure, resulting in information-search. This theory holds that curiosity arises from incongruity, and is a drive like hunger or the sex drive. Underscoring drive-based theories, food and information stimulate the same neurological reward centres (Lau et al., 2020). Analogous to sex drive, curiosity can dissipate if people are distracted, and lack of satiation does not cause death (Shin & Kim, 2019). Like other drives, curiosity may not be rational, potentially driving people to seek information with negative consequences (FitzGibbon et al., 2021).

Another framework proposes two types of curiosity: a feeling of deprivation when information gaps are present (D-type) and a feeling of interest in learning something new (I-type; Litman & Jimerson, 2004). Using psychometric measures, this framework emphasises relatedness between trait forms of I- and D-type curiosity as both are linked to desiring and seeking information (see Litman, 2019 for a review). However, while I-type curiosity corresponds with exploring new things for pleasure, D-type corresponds with urgent acquisition of specific knowledge. Trait curiosity levels (i.e. how people “generally feel” a certain way) are self-reported in response to items, e.g. “I enjoy exploring new ideas” (I-type, from the Epistemic Curiosity scale; Litman & Spielberger, 2003) or “don’t like not knowing/try to learn about complex topics” (D-type, from the Curiosity as a Feeling-of-Deprivation scale; Litman & Jimerson, 2004). Following state-trait theory (Spielberger, 1972), individuals high in I/D-type curiosity traits experience the related I/D-type state curiosity more intensely than individuals with low trait levels (Litman et al., 2005). Although this theoretical distinction captures two distinct aspects of information-seeking behaviour, some researchers question whether I-type curiosity should simply be considered interest, and D-type considered curiosity, instead of both being labelled as sub-types of curiosity (Hidi & Renninger, 2019; Renninger & Hidi, 2016).

Interest

A second line of research focuses on interest, with prominent frameworks also positing momentary and long-term forms (Hidi & Renninger, 2006; Krapp, 2000). Situational interest (e.g. focused attention triggered by environmental stimuli) is distinguished from individual interest (e.g. a predisposition to reengage with a subject; Renninger & Hidi, 2016). Theories of interest often propose developmental accounts of how a person moves from situational to individual interest in a topic, and the relationships between situational and individual interest (e.g. Krapp, 2007; Schiefele, 2009). The most prominent theory proposes a four-phase model of interest development (Hidi & Renninger, 2006). Under this model, situational interest is initially environmentally triggered (e.g. on noticing incongruous information/recognising personal relevance), then maintained by focused attention towards that subject. Individual interest can emerge from situational interest when developing predispositions to reengage with a subject become fully-fledged. However, situational interest can occur after individual interest has developed; they are phases not unidirectional developmental stages.

Other theories of interest (see Renninger & Hidi, 2011 for a review) distinguish one’s interest (i.e. part of an emotional experience and momentary motivation) from one’s interests (i.e. part of personality, individual differences, and idiosyncratic hobbies; Silvia, 2006). This conception treats interest as a basic motivating emotion (like happiness, fear or anger) but one that is adapted for engagement with a stimulus/topic (Silvia, 2001). Notably, under this account, interest is considered separate from simple enjoyment, while other accounts equate interest directly with enjoyment of activities (Wigfield et al., 2007).

Proposed Distinctions Between Curiosity and Interest

Disentangling curiosity from interest is complicated because frameworks describe heterogeneous and multidimensional concepts. Accordingly, lines demarcating certain dimensions may not demarcate others. For example, distinguishing long-term forms (proposed by above frameworks) is straightforward; while trait curiosity refers to one’s general disposition to experience state curiosity, individual interest refers to one’s disposition to engage with information in particular domains (Ainley, 2019; Grossnickle, 2016). In contrast, situational interest and state curiosity (including I/D-type state curiosity) are clearly closely related (both are motivated searches for information triggered by environmental stimuli) and are perhaps indistinguishable (Silvia, 2006). However, the simplicity of this particular trait-level distinction does not necessarily mean that all trait-level distinctions are straightforward. When traits are conceived of as tendencies to experience corresponding state-level forms (i.e. to experience state curiosity/situational interest), this requires state-level distinctions. In this section, we outline five distinctions between state-level interest and curiosity proposed by researchers who maintain that the concepts are meaningfully separable.

Triggers

Some researchers argue that while triggering curiosity requires only what Berlyne (1960) termed “collative variables”, disequilibrium-inducing stimuli that lead a person to perceive an information gap (e.g. stimuli involving novelty, complexity, conflict/incongruity, surprise and uncertainty), interest is triggered by a broader range of variables (Ainley, 2019; Grossnickle, 2016; Renninger & Hidi, 2016). Interest triggers are not necessarily universal, and could be subjective (Renninger et al., 2019), though death, sex and power could be universally interesting (for a review of text-based triggers, see Renninger & Hidi, 2016). Shin and Kim (2019) suggest that well-organised informationally-complete material including availability of choice, relevance, praise and social interaction can trigger interest, while incomplete information rife with information gaps can trigger curiosity.

Characterisation of Information Seeking

The character of information-search following triggering of curiosity and interest may differ, as differing search goals influence how the search is conducted. Curiosity is goal-directed towards closing an information gap, ceasing on acquisition of the specific information required to close it, while interest is goal-directed towards engagement with information more generally, and therefore does not necessarily cease on information acquisition (Grossnickle, 2016; Markey & Loewenstein, 2014; Shin & Kim, 2019). The I/D-type curiosity framework makes a similar distinction; D-type curiosity (information-gap curiosity) represents “need to know” information, whereas I-type (feeling of interest) represents more relaxed “take it or leave it” approaches (Litman, 2005).

Due to differing goals, interest states may last longer. Renninger and Hidi (2016) argue that while curiosity is short-lived (yoked to information gaps) interest has unlimited duration, as the primary goal is engagement. Curiosity, as an urgent desire to close information gaps and resolve feelings of deprivation, necessarily involves briefer information seeking as a result of the urgency combined with “stopping rules” (i.e. curiosity ceases when specific information is gained). When experiencing interest, people do not urgently want resolution, and lack stopping rules for information seeking.

Knowledge States

Curiosity and interest may be distinguished by a person’s beliefs about their existing knowledge relating to the topic of the information search. Curiosity, based on recognition of information gaps, may only be triggered when someone knows enough about a topic to recognise a relevant gap, but not so much that they think there is no gap (Loewenstein, 1994; Metcalfe et al., 2020; Shin & Kim, 2019). Studies propose an inverted U-shaped curve in people’s feelings of curiosity predicted by their perceived knowledge about a topic (Gruber et al., 2014; Kang et al., 2009; Litman et al., 2005). This does not apply to interest (but see Fastrich & Murayama, 2020), so curiosity may be distinguished from interest by how much knowledge someone believes they have regarding a specific stimulus. If information seeking is in response to a stimulus that someone believes that they know nothing or lots about (e.g. it is one of their individual interests) perhaps they are experiencing interest, whereas if in response to a stimulus that they believe they know something but not everything about, they could be experiencing curiosity or interest (Grossnickle, 2016; Hidi & Renninger, 2019). This facet also distinguishes I/D-type curiosity (Litman, 2019; Litman et al., 2005); when people believe that they “don’t know” an answer to a question, the intensity of their curiosity is predicted by I-type trait measures (i.e. their disposition to search for information for pleasure). However, when they feel that the answer is on the tip of their tongue, intensity is predicted by D-type trait measures (i.e. their disposition to seek gap-closing information).

Affect

Curiosity and interest might be affectively distinct. While curiosity is initially aversive (due to feelings of deprivation), then pleasant when resolved (due to feelings of reward; Jepma et al., 2012), interest is generally thought to be pleasant from the point of triggering (Markey & Loewenstein, 2014; Schiefele, 2009; Shin & Kim, 2019; Silvia, 2006). Information-search motivated by individual interests is more likely to be pleasant as it arises from pursuit of information in areas that people already find rewarding (Renninger & Hidi, 2016). D-type trait curiosity measures are positively associated with anxiety, depression and anger, and with measures indicating discomfort or frustration, while I-type trait measures are unrelated to (or negatively associated with) negative affect traits and positively associated with enjoyment (Litman, 2008; Litman & Jimerson, 2004). Given that trait-intensity predicts state-intensity, D-type curiosity (feeling of deprivation) is more aversive, while I-type curiosity (feeling of interest) is more enjoyable.

Incentive Salience

Different neurological processes could underpin curiosity and interest (Shin & Kim, 2019). Under the incentive-salience system, “wanting”, linked to mesolimbic dopamine receptors and characterised by behavioural approach and the experience of desire, is distinct from “liking”, linked to opioid receptors and characterised as experiences of pleasure (and further, both are distinct from a predictive learning component; Berridge, 2012). While the “wanting” and “liking” processes often work together (e.g. wanting and liking the same reward), they can be dissociated by manipulation of dopamine, which impacts the wanting, but not the liking, system (see Berridge & Robinson, 1998 for a review). Litman et al. (2005) and Litman (2019) proposed that while both I- and D-type curiosity involve “liking”, experiencing pleasure when information is obtained (albeit less intensely for I-type), D-type involves higher levels of initial “wanting”. FitzGibbon et al. (2020) argue that incentive-salience is a purely motivational urge, accounting for the seductive lure of curiosity, i.e. the desire to irrationally seek information with negative consequences (FitzGibbon et al., 2021; Hsee & Ruan, 2016; Oosterwijk, 2017).

How to Empirically Test Theoretical Distinctions?

According to proposed theoretical distinctions (reviewed above) curiosity and interest may be triggered by different types of stimuli, characterised by different forms of information search (with differing durations), distinguished by a person’s beliefs about their own knowledge, affectively distinct, and underpinned by different neurological processes. But how can we complement this theoretical analysis and empirically examine whether distinguishing these concepts is useful or redundant? Although empirical research is lacking, we identify two approaches by which evidence might be obtained: top-down and bottom-up.

Top-down approaches initially define curiosity and interest based on a certain theoretical perspective (or convenience) and then either measure people’s curiosity and interest (e.g. using self-report) or implement a manipulation based on these definitions. Researchers then examine distinctiveness by testing whether the two differently predict some outcome variable. One recent example of this approach found distinct predictive validity for curiosity and interest on recall (McGillivray et al., 2015), providing evidence consistent with the claim that curiosity and interest represent separate psychological processes. In this study, a measure of curiosity (self-reported on a 10-point scale from “not at all curious” to “extremely curious”) was taken prior to the answer to a trivia question being revealed, while an interest measure (“not interesting at all” to “extremely interesting”) was taken after revealing the answer (see also Fandakova & Gruber, 2021). Measures of interest (not curiosity) predicted recall for trivia question answers in adults after delays of an hour and a week.

While top-down approaches are promising, distinctions between curiosity and interest are considerably constrained by the theoretical perspective taken and assessment methods used. Top-down approaches can suffer from construct underrepresentation of the two broad concepts (Downing, 2002; Messick, 1995; Spurgeon, 2017). For example, McGillivray et al. (2015) assessed curiosity and interest using single items simply asking how curious/interested participants were about an answer. While this study demonstrated potential differential predictive utility of these measures on recall, such an assessment cannot reveal broad, impactful distinctions between curiosity and interest pertinent to the broader discussion about distinguishing the terms (which was also not the aim of this study). The evidence for separation may simply reflect a distinction based on linguistic use of the terms (curiosity referring to future knowledge gain, interest referring to knowledge gained; see Silvia, 2006). Another problem is that the same assessment may be interpreted differently, depending on researchers’ theoretical perspective. Measures referred to as curiosity and interest (Fandakova & Gruber, 2021; McGillivray et al., 2015) are elsewhere called pre-answer and post-answer interest (Fastrich et al., 2018). As we lack consensus on how we should define curiosity and interest separately, top-down approaches risk underrepresenting the rich and broad concepts of curiosity and interest, and overlooking important aspects suggested by other theoretical perspectives.

Bottom-up approaches represent another way to examine distinctions. These do not impose top-down definitions, instead allowing definitions to be data-driven, analysing how participants’ responses to different measures cluster together (i.e. using factor-analytic approaches). For example, one can analyse whether specific items from existing survey measures (of curiosity, interest, or information seeking more broadly) cluster together onto one or more factors. If they form two separable factors that could meaningfully be labelled “curiosity” and “interest” then the concepts should be meaningfully distinguished. However, if they form only one factor then they may be practically indistinguishable. Litman (2008) applied this approach to trait-level survey measures of curiosity and interest (the 10-item Epistemic Curiosity scale and 15-item Curiosity as a Feeling-of-Deprivation scale, see above), identifying two factors across 25 items, providing evidence consistent with there being two separable processes (one related to pleasure associated with discovery of new ideas, the other to spending time/effort to seek specific information). More recently, Schmidt and Rotgans (2020) applied this approach to state-level measures. They informed students (aged 12-13) that they would be taught about a topic, and collected their responses to 10 items measuring epistemic curiosity (e.g. “I would like to explore this topic in depth”) and situational interest (e.g. “I enjoy working on this topic”) constructed through content analysis of previously published measures. These state-level measures of curiosity and interest were best explained by a single factor (the constructs were near-perfectly correlated in a two-factor solution).

While also promising, these bottom-up approaches are limited in three ways. Firstly, the lack of prior conceptual clarity in labelling factors inherent in this approach makes it difficult to interpret the results. For example, Litman (2008) described both factors as curiosity (interest-type and deprivation-type), whereas others might simply describe them as interest and curiosity. Secondly, these approaches are mainly based on between-person analyses of the relation between measures of curiosity and interest, not within-person analyses (Murayama et al., 2017). Between-person analyses show whether individuals who score highly on measures of epistemic curiosity also score highly on measures of situational interest compared to other people. However, this is independent of questions asked by within-person analyses, namely how curiosity and interest might covary over time for a person (i.e. to what extent they are psychologically distinct for a person). Put simply, if factor analysis suggests some factor structure for measures of curiosity and interest based on individual differences, it is not necessarily the case that the same structure accounts for mental categories within individuals (see also Borkenau & Ostendorf, 1998 for similar arguments relating to personality). Finally, bottom-up approaches are limited by item design. For example, Schmidt and Rotgans (2020) used 10 items, three of which simply asked whether students were curious/interested (or lacked curiosity) in the topic, offering no clarification on the nature of differences between curiosity and interest. Therefore, there is still an element of top-down design in bottom-up approaches, with associated risks of construct underrepresentation, limiting the range of distinctions that can be tested to those captured by the items used.

Current Research

The purpose of the current research is to provide a much-needed basis for conceptual clarity by extending bottom-up approaches to address limitations of previous investigations. Specifically, we use novel techniques to seek a “consensus view” on what makes curiosity and interest different, one that can plausibly underpin empirical investigations into whether they represent separate psychological processes, and complement prior theoretical analysis of distinctions. We therefore present a new empirical approach to advance theoretical conceptualisations about curiosity and interest.

This idea is motivated by the reward-learning framework of knowledge acquisition, which argues that curiosity and interest represent distinct experiences emerging from the knowledge acquisition process (Murayama, 2019; Murayama et al., 2019). Under this account, curiosity and interest are commonsense (also termed naïve or folk) psychological concepts, intuitively and subjectively constructed by people to describe feelings resulting from underlying psychological processes to which they lack true introspective access. Thus, while we cannot currently make hard distinctions between curiosity and interest at the level of specific psychological or neural processes (see Hidi & Renninger, 2019 for further discussion on this issue), the framework supposes that people can define them due to distinct experiences of knowledge acquisition they label “curiosity” and “interest”. The framework therefore posits that curiosity and interest can be distinguished in terms of a consensus view.

To address this hypothesis, we investigate whether there is a detectable consensus on how curiosity and interest differ. We look for distinguishing characteristics of free-text definitions of curiosity and interest provided by a large sample of non-expert participants. Instead of scale responses to survey items, we use free-text responses to open questions (e.g. “Define curiosity/interest”). Participants can respond in their own words and appeal to any aspects of curiosity or interest they deem relevant, allowing us to sample a wide range of potential distinctions. Our approach therefore extends previous bottom-up approaches. Bottom-up approaches in principle represent a good starting point for the current project because, unlike top-down approaches, they do not initially rely on prior definition of the terms. This is important to the current work, because of the lack of widely agreed-upon distinctions between the terms (especially across the two research domains). However, we extend these bottom-up approaches by addressing their limitations in answering the current question (described above). Firstly, we avoid the problem of researchers’ subjective labelling of the constructs by defining the concepts according to a consensus view. Secondly, we avoid the problem of distinctions based on between-subject comparisons by not basing our approach on differences between individuals (participants provide both curiosity and interest definitions). Finally, we address the problem of construct underrepresentation by not using restrictive survey items.

Of course, simply demonstrating that people define curiosity and interest differently is not evidence that they are underpinned by separate psychological processes. However, this approach provides an attractive conceptual starting point (i.e. by providing agreed-upon definitions of curiosity and interest) that facilitates empirical investigation into whether there are distinct psychological processes in knowledge acquisition. For example, if our extended bottom-up approach distinguishes people’s feelings of curiosity from interest, then this can facilitate future top-down approaches. One could use conceptualisations established through our bottom-up approach to design different stimuli that elicit feelings of curiosity and feelings of interest. Then one could test whether these types of stimuli have different predictive validity for recall, or perhaps even for neuroscientific or other physiological measures, e.g. pupil dilation (Brod & Breitwieser, 2019). To ensure that the results obtained in the current approach are applicable to scientific research, we conduct a second study (Study 2) examining whether non-expert consensus definitions of curiosity and interest reasonably capture distinctions made by experts in psychological sciences. This way, we hope to provide conceptual clarity on the terms (determining the nature of the consensus view on how they are distinct) that facilitates future work on curiosity and interest.

This approach is related to, but distinct from, work that investigates how non-expert commonsense (naïve or folk) understanding of scientific phenomena aligns with expert scientific understanding (see Gelman & Noles, 2011 for a review). For example, work investigating students’ commonsense understanding of how objects remain still demonstrates common misconceptions about gravity requiring attention from instructors (Minstrell, 1982). Relatedly, other work demonstrates that commonsense concepts about motion hamper students learning about Newtonian physics (Halloun & Hestenes, 1985; see also McCloskey, 1983). Under this approach, commonsense understanding is a useful tool for researchers to investigate knowledge development; commonsense understanding can be contrasted against an agreed-upon gold standard of understanding, e.g. children’s intuitive understanding of contamination can be compared to biologically correct explanations of contamination (Legare et al., 2009). However, our approach is different, as there are no agreed-upon expert definitions of curiosity and interest in the literature with which to compare commonsense understanding. In contrast, the reward-learning framework proposes that, because curiosity and interest are labels that people assign to their own experiences of information seeking, people’s commonsense view could instead serve as a basis for an agreed-upon definition that can help the field to advance. In this sense, it is more appropriate to determine whether non-expert definitions are shared by experts (i.e. use the non-expert view as the reference category) to determine if the non-expert consensus can plausibly be used by experts in research.

While free-text responses provide richer data than scale responses, manually processing them is problematic. Critically, in seeking aspects of definitions delineating curiosity and interest, researchers may introduce their own bias, based on personal conceptions of the terms. Instead, we avoid manual coding, employing algorithm-based (Naïve Bayes classifier) machine learning techniques. Naïve Bayes classifiers have been used to automatically categorise texts in diverse contexts, determining authorship (Airoldi et al., 2006; Clement & Sharp, 2003; Malyutov, 2005; Mosteller & Wallace, 1963; Thisted & Efron, 1987), sentiment (Go et al., 2009; Greaves et al., 2013; Hawkins et al., 2016; Pang et al., 2002; Wang et al., 2012), author’s mental health (Al-Mosaiwi & Johnstone, 2018a, 2018b), and in early detection of pandemics (Alemi et al., 2012; Chapman et al., 2004, 2005; Wilcox & Hripcsak, 1999).
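As a reminder of the general technique (not a statement of the exact model variant used here), a Naïve Bayes text classifier can be written as follows. The notation is an assumption introduced for illustration: c is a class (curiosity or interest), w_1, ..., w_n are the words (or word stems) in a definition, and the "naïve" step is treating words as conditionally independent given the class.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\text{curiosity},\,\text{interest}\}}
\left[ \log P(c) \;+\; \sum_{i=1}^{n} \log P(w_i \mid c) \right]
```

Because the score is a sum of per-word terms, the contribution of any single word to the decision can be read off directly as the log-ratio \log P(w \mid \text{curiosity}) - \log P(w \mid \text{interest}), which is what makes this family of classifiers straightforward to inspect.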

Using a Naïve Bayes classifier, we look for distinguishing characteristics of people’s definitions of curiosity and interest. In Study 1, we train a classifier to accurately recognise characteristics (i.e. words used) of free-text definitions of curiosity and interest (provided by participants online). On demonstrating the classifier’s generalisability (through cross-validation), we determine what the distinctions are, inspecting words used by the classifier to distinguish curiosity and interest definitions. We then determine if these are meaningful and make inferences about how people delineate the terms. Although Naïve Bayes classifiers are not highly sophisticated algorithms, their simplicity, and ability to “show their workings” makes them especially well-suited to the current task.
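The logic of this pipeline (train, cross-validate, then inspect the words that drive classification) can be illustrated with a minimal sketch. This is not the analysis code used in the study, which was written in R: it assumes a multinomial model with add-one smoothing, a simple interleaved k-fold split, and six invented toy definitions, and all function names are hypothetical.

```python
import math
from collections import Counter

def train(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model (assumed variant) with add-alpha smoothing."""
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d})
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def predict(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

def cross_validate(docs, labels, k=3):
    """k-fold cross-validation: each document is held out exactly once."""
    correct = 0
    for fold in range(k):
        held_out = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_docs, train_labels)
        correct += sum(predict(model, docs[i]) == labels[i] for i in held_out)
    return correct / len(docs)

def distinguishing_words(model, target, other, n=3):
    """Rank words by log P(w | target) - log P(w | other), largest first."""
    _, log_lik = model
    ratio = {w: log_lik[target][w] - log_lik[other][w] for w in log_lik[target]}
    return sorted(ratio, key=ratio.get, reverse=True)[:n]

# Invented toy definitions (already tokenised); NOT participant data.
toy_docs = [
    "want know answer question".split(),
    "enjoy topic hobby time".split(),
    "need know unknown answer".split(),
    "enjoy hobby like topic".split(),
    "want find answer unknown".split(),
    "like spend time hobby".split(),
]
toy_labels = ["curiosity", "interest"] * 3

accuracy = cross_validate(toy_docs, toy_labels, k=3)
full_model = train(toy_docs, toy_labels)
top_curiosity = distinguishing_words(full_model, "curiosity", "interest", n=1)
top_interest = distinguishing_words(full_model, "interest", "curiosity", n=1)
```

The final ranking step is the "show their workings" property in practice: the words with the largest log-ratios are the ones the classifier relies on most, so they can be read and interpreted directly.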

Study 1

The reward-learning framework holds that the terms curiosity and interest are not interchangeable, as people have different definitions, constructed subjectively to describe different experiences of the knowledge acquisition process. If this is correct, then a classifier should be able to distinguish their definitions of the terms at above chance levels. However, if they are interchangeable, classifier performance should not exceed chance.
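One simple way to make "above chance" concrete is an exact one-sided binomial test on the number of correctly classified definitions. The sketch below is illustrative only: the function name is hypothetical, chance is assumed to be 0.5 (two equally frequent classes), and it treats classifications as independent trials, which is a simplification given that each participant contributes two definitions. It should not be read as the test used in the study.

```python
from math import comb

def binomial_p_above_chance(correct, total, chance=0.5):
    """Exact one-sided p-value: P(X >= correct) if accuracy were at chance."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical example: 60 of 100 held-out definitions classified correctly.
p_value = binomial_p_above_chance(60, 100)
```

A small p-value here would indicate that the observed accuracy is unlikely if the two sets of definitions were interchangeable.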

Method

Participants

Participants were recruited via Prolific Academic for the main data (N = 351 participants, N = 702 definitions; collected August 2019 by ED & KM), and via Mturk for an additional dataset (N = 120 participants, N = 240 definitions; June-August 2018 by SA, GF & KM; see Aslan et al., under review). All were over 18 years old and paid £2.00 GBP (Prolific) or $1.00 USD (Mturk). According to pre-registered exclusion criteria (https://osf.io/r538u), participants were excluded when they reported learning English after 12 years of age (main, n = 11; additional n = 2), reported checking the internet or consulting others about their responses (main, n = 30; additional, n = 8), for not responding in English (main, n = 5) and for responses wholly unrelated to curiosity or interest (main, n = 8; additional, n = 1). After exclusions, main data included n = 297 participants (Female = 170, Male = 126, Described differently = 1; Age: M = 29.0, SD = 9.9; Ethnicity: Asian = 14, Black = 3, Describe differently = 4, Mixed ethnicity = 14, Prefer not to say = 3, White = 259), and additional data included n = 109 (Female = 61, Male = 48; Age M = 38.8, SD = 10.3; Ethnicity: African = 7, Asian/Pacific = 20, Caucasian = 77, Hispanic = 3, Native American = 1, Other = 1).
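The retained sample sizes follow directly from the exclusion counts reported above. The short check below reproduces that arithmetic; it assumes each excluded participant is counted under exactly one criterion, and the dictionary keys are labels chosen here for illustration.

```python
# Recruited sample sizes and exclusion counts as reported in the text.
recruited = {"main": 351, "additional": 120}
exclusions = {
    "main": {"late_english": 11, "outside_help": 30, "not_english": 5, "unrelated": 8},
    "additional": {"late_english": 2, "outside_help": 8, "unrelated": 1},
}

# Retained n = recruited n minus the sum of all exclusions for that dataset.
retained = {
    dataset: recruited[dataset] - sum(exclusions[dataset].values())
    for dataset in recruited
}
```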

Procedure

Participants completed an online survey (implemented using jsPsych; de Leeuw, 2015), producing two free-text definitions, one of curiosity and one of interest (minimum 80 words each). Participants were informed we were interested in the similarities/differences in these terms and there were no right/wrong answers. In the main data, participants were prompted simply to “Define curiosity” or “Define interest”. In the additional data, participants were given an example after this prompt, i.e. “being interested in X” or “being curious about X”. Question order (curiosity/interest) was counterbalanced for the main data (curiosity first, n = 155); in the additional data, all participants defined curiosity first. All participants produced separate free-text responses describing differences and similarities between the terms (not analysed in this study). Participants in the main data reported how similar curiosity and interest were (5-point Likert scale: “completely different”, “mostly different”, “somewhat similar, somewhat different”, “mostly similar”, “completely similar”). All participants provided demographic information: age, gender, highest level of education, ethnicity, age at which they began learning English.

Analysis

Software

Analysis was conducted using RStudio 1.2.1335 (RStudio Team, 2015) running R 3.6.0 (R Core Team, 2015). We used the hunspell (Ooms, 2018) and quanteda (Benoit et al., 2018) packages for text-processing, caret (Kuhn, 2019) and groupdata2 (Olsen, 2019) for feature selection, and the naïve Bayes classifier from naivebayes (Majka, 2019). Regression analyses were conducted using lavaan.survey (Oberski, 2014; which utilises lavaan; Rosseel, 2012).

Data Pre-processing

Pre-processing involved converting participants’ free-text definitions to vectors of words (more specifically word stems, see below), as the unit of analysis. Prior to analyses, free-text definitions were manually edited to remove unrelated terms. We removed portions of text (retaining the rest of the definition) when interest was used in the financial/banking sense (definitions edited in the main data, n = 46; additional data, n = 2) and where text was a placeholder (e.g. “I don’t know what else to write”; main: curiosity n = 12, interest n = 8; additional: curiosity n = 2, interest n = 3). We removed phrases deriving from “curiosity killed the cat”, as this would potentially bias the classifier towards using these terms, which would not be informative (main, n = 27; additional, n = 7). To ensure consistent classification, we corrected spelling to British English.

After manual pre-processing, definitions were automatically processed using natural language processing techniques. Numbers, punctuation, symbols and common stop words (e.g. “I”, “and”) were removed (Bollen et al., 2011; Conover et al., 2011) and text made lowercase. Words were reduced to word stems, e.g. “questioning”, “question” and “questions” stemmed to “question” (Al-Mosaiwi & Johnstone, 2018a; Gibbons et al., 2017), and word stems associated with “curiosity” and “interest” (“interest”, “curio”, “curios” and “curious”) were removed (Go et al., 2009; Ong et al., 2010). Definitions were converted to a document-feature matrix; each definition (document) was represented as a vector with each element indicating presence (1) or absence (0) of a word stem (feature) within it.
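The automatic steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline (which used hunspell and quanteda in R): the stop-word list, the toy suffix-stripping stemmer `crude_stem` and the example definitions are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list (the study used a standard list).
STOP_WORDS = {"i", "and", "the", "a", "to", "is", "of", "it", "on", "in"}
# Stems of the target terms, removed so the labels cannot leak into the features.
TARGET_STEMS = {"interest", "curio", "curios", "curious"}


def crude_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ity", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_stems(text):
    """Lowercase, drop numbers/punctuation/stop words, stem, drop target terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = set()
    for token in tokens:
        stem = crude_stem(token)
        if token in STOP_WORDS or token in TARGET_STEMS or stem in TARGET_STEMS:
            continue
        stems.add(stem)
    return stems


def document_feature_matrix(definitions):
    """Return (vocabulary, rows); rows[i][j] is 1 if stem j occurs in definition i."""
    stem_sets = [to_stems(d) for d in definitions]
    vocabulary = sorted(set().union(*stem_sets))
    rows = [[1 if stem in s else 0 for stem in vocabulary] for s in stem_sets]
    return vocabulary, rows


definitions = [
    "Curiosity is asking questions and questioning the world.",
    "Interest is spending time on hobbies.",
]
vocabulary, rows = document_feature_matrix(definitions)
```

Each row records only whether a stem is present, not how often it occurs, matching the presence/absence coding described above.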

Overview of Machine Learning Analysis

Here we provide an accessible overview of our analysis for those unfamiliar with machine learning methods. We report the detailed method in the Supplementary Online Materials (SOM). Additionally, our analysis plan was pre-registered (https://osf.io/r538u) and analysis code is available at https://osf.io/49ue2/.

Following pre-processing (see above), we trained a Naïve Bayes algorithm to recognise what word stems were most likely to be present (and absent) in curiosity and interest definitions, thus learning what a curiosity or interest definition is likely to contain (see “Classification” in SOM). To do this, the algorithm used labelled definitions (i.e. labelled “curiosity” or “interest”) to compute the conditional probabilities of word stems being present (or absent) in curiosity compared to interest definitions (these are inversely related). Through feature selection (see “Classifier training” in SOM), we identified only word stems most clearly related to one definition type over the other, i.e. those with the highest conditional probability of appearing in curiosity compared to interest definitions (and vice versa). This process identified a manageable list of word stems (from all words that participants used) that facilitated reliable discrimination between curiosity and interest definitions.
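To make the training step concrete, the following sketch estimates, for each class, the probability that each word stem is present. It is an illustrative Bernoulli Naïve Bayes estimator rather than the naivebayes package used in the study; the smoothing constant `alpha` and the toy data are assumptions.

```python
def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and P(f_k = 1 | class) with additive smoothing.

    rows   : list of binary feature vectors (one per definition)
    labels : list of class labels, e.g. "curiosity" or "interest"
    alpha  : smoothing constant (an assumption; avoids probabilities of 0 or 1)
    """
    classes = sorted(set(labels))
    n_features = len(rows[0])
    priors, cond = {}, {}
    for c in classes:
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        n_c = len(class_rows)
        priors[c] = n_c / len(rows)
        cond[c] = [
            (sum(r[k] for r in class_rows) + alpha) / (n_c + 2 * alpha)
            for k in range(n_features)
        ]
    return priors, cond


# Toy data: columns are the stems ["hobbi", "question"].
rows = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
labels = ["interest", "interest", "curiosity", "curiosity", "interest", "curiosity"]
priors, cond = train_bernoulli_nb(rows, labels)
```

In this toy example “hobbi” is far more probable in interest definitions (0.8) than in curiosity definitions (0.2), which is the kind of contrast feature selection looks for.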

Simply reporting the computed probabilities for each word stem tells us the unique characteristics of our data. However, this is less valuable than demonstrating that probabilities learned from some data can be applied to new data (not used in training) to predict whether it is a curiosity or interest definition. This can be achieved through a machine-learning technique called cross-validation (which is analogous to replication). In this instance, cross-validation is a technique whereby a portion of data is used to train an algorithm to accurately classify that data, before testing it on separate data to see if it achieves similar accuracy. Cross-validated results, like replicated findings, are more robust than those observed only once (which may simply accommodate data exactly and not generalise).
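As a rough illustration of the idea, the sketch below partitions item indices into k folds so that every item is held out exactly once. The number of folds, the seed and the function name `k_fold_indices` are assumptions; the partitioning actually used is described in the SOM.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train_idx, test_idx) pairs so each item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_items=10, k=5))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`; when both definitions from one participant are in the data, they would normally be kept in the same fold, which this simple sketch does not enforce.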

In the current study, we split the main data into a training and test set for cross-validation (see Fig. 1: Data partition). The training set was used to train the algorithm to classify curiosity and interest definitions as accurately as possible using a list of word stems of manageable length (note: feature selection also used a type of cross-validation during training within the training set; see “Classifier training” in SOM and Fig. 1: Testing). The test set served as new unseen data to validate (or replicate) that the characterisation of curiosity/interest definitions learned by the algorithm was applicable to data not used in training. Testing the algorithm involved determining whether it accurately predicted the definition type (“curiosity” or “interest”) for unseen definitions in the test set, based on the conditional probabilities of word stems computed across the training set. To make its prediction, the algorithm computes the probability that each definition is of either type (inversely related); the type with a probability > .5 is the predicted type (see “Classification” in SOM). Thus, accuracy can be absolute (i.e. did the algorithm correctly predict the definition type), or relative (i.e. the probability computed for the correct definition type; effectively the certainty of the prediction).
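The prediction rule and the two accuracy measures can be sketched as follows. The parameter values are hypothetical, and the function is a generic Bernoulli Naïve Bayes posterior rather than the study's code.

```python
import math


def posterior(features, priors, cond):
    """Return P(class | features) under the naive independence assumption."""
    log_joint = {}
    for c in priors:
        total = math.log(priors[c])
        for present, p in zip(features, cond[c]):
            total += math.log(p if present else 1.0 - p)
        log_joint[c] = total
    # Normalise in a numerically stable way (subtract the maximum first).
    top = max(log_joint.values())
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Hypothetical learned parameters for two stems: ["hobbi", "question"].
priors = {"curiosity": 0.5, "interest": 0.5}
cond = {"curiosity": [0.2, 0.6], "interest": [0.8, 0.4]}

true_label = "interest"
post = posterior([1, 0], priors, cond)            # contains "hobbi", lacks "question"
predicted = max(post, key=post.get)               # the class with probability > .5
absolute_accuracy = int(predicted == true_label)  # 1 if correct, else 0
relative_accuracy = post[true_label]              # probability of the true class
```

With two classes the posteriors sum to one, so the class with the larger probability is necessarily the one exceeding .5.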

Fig. 1 Process for data partition, training, and testing the classifier

Furthermore, as a more stringent test of the algorithm’s generalisability, we also tested the algorithm on an additional data set (see Fig. 1: Data). Binomial tests assessed if classifier absolute accuracy exceeded chance (50% of definitions correctly classified) for both the test set from the main data, and the additional data (see Fig. 1: Testing).
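A binomial test of this kind can be sketched with the standard library alone. The counts below are hypothetical, and the one-sided form is an assumption, since the text does not state whether the test was one- or two-sided.

```python
from math import comb


def binomial_p_greater(successes, n, p0=0.5):
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1)
    )


# Hypothetical counts: 80 of 100 definitions classified correctly.
p_value = binomial_p_greater(80, 100)
```

A small p-value indicates that accuracy this high would be very unlikely if the classifier were guessing at the 50% chance level.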

Predicting Classifier Accuracy from Judgements of Similarities/Differences

We were interested in whether participants’ ratings of how similar/different curiosity and interest were (Likert scale) predicted classifier accuracy (test data only). We considered accuracy as absolute (binary, i.e. correct classification = 1, incorrect = 0), and relative (continuous, i.e. the probability of a definition belonging to the correct type, given the word stems the definition contained). We constructed probit regression models for absolute accuracy, and linear regression for relative accuracy, with definitions (not participants) as the unit of analysis. To account for data dependency, we corrected standard errors using the lavaan.survey package (Oberski, 2014).

The important predictor was how similar/different participants rated the terms on the Likert scale. We also included potentially confounding variables as fixed effects, including demographic variables (age, gender, education and age at which the participant learnt English), and definition order (curiosity/interest first). Additionally, we included definition type (curiosity/interest) and computed the interaction effects between definition type and all other fixed effects. To aid interpretation, Likert data and age were treated as continuous and mean-centred. We effect-coded gender (male = -1, female = 1), question order (curiosity first = -1, interest first = 1) and definition type (curiosity = -1, interest = 1). Additionally, the age at which participants learnt English and education were treated as binary categorical variables and effect-coded (English learnt from birth = -1, later = 1; A levels/college as highest educational level = -1, higher education = 1). Note, models exclude two participants with missing data (they declined to report education).
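The coding scheme described above can be illustrated with a short sketch. The records are hypothetical and the helper names `mean_centre` and `effect_code` are ours; the regression models themselves were fitted in R with lavaan.survey.

```python
def mean_centre(values):
    """Subtract the sample mean so that zero corresponds to the average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def effect_code(values, low, high):
    """Map a two-level variable to -1 (low level) and +1 (high level)."""
    mapping = {low: -1, high: 1}
    return [mapping[v] for v in values]


# Hypothetical records for four definitions.
similarity_rating = [2, 3, 3, 4]  # 5-point Likert, treated as continuous
definition_type = ["curiosity", "interest", "curiosity", "interest"]

rating_c = mean_centre(similarity_rating)
type_e = effect_code(definition_type, low="curiosity", high="interest")
# Interaction term: product of the centred rating and the effect-coded type.
interaction = [r * t for r, t in zip(rating_c, type_e)]
```

With this coding, each main effect is interpreted at the average of the other predictors rather than at an arbitrary reference level.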

Collocations

For descriptive purposes, we extracted word collocations (the frequency with which words were collocated with one another) for words deriving from word stems. We limited the search to two-word collocations that had >10 instances across definitions.
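A minimal sketch of two-word collocation counting is given below. It counts all adjacent word pairs rather than only those involving classifier word stems, and the example texts and the threshold of 2 are assumptions chosen to keep the toy example small (the study used >10).

```python
from collections import Counter
import re


def bigram_counts(definitions):
    """Count adjacent word pairs across all definitions."""
    counts = Counter()
    for text in definitions:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(words, words[1:]))
    return counts


def frequent_collocations(definitions, min_count):
    """Keep only two-word collocations occurring more than min_count times."""
    return {pair: n for pair, n in bigram_counts(definitions).items() if n > min_count}


definitions = ["I want to know more", "a desire to know", "to learn and to know"]
frequent = frequent_collocations(definitions, min_count=2)
```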

Results

The classifier achieved high accuracy during training and in testing. In training, feature selection identified 42 word stems which accurately distinguished curiosity and interest definitions in the training data (accurately classifying 87.13% of definitions). Importantly, the classifier was also highly accurate in predicting unseen data (Table 1), accurately predicting the correct definition type significantly above chance for both the main test data (79.29%; 95% CI: 72.98–84.71%, p < .001) and additional test data (77.06%; 95% CI: 70.91–82.47%, p < .001). This demonstrates that the classifier accurately distinguished between definitions of curiosity and interest and this ability was generalisable to unseen definitions, i.e. not overfit to training data.

Table 1 Correct classification across datasets and definition type

Table 2 shows the conditional probabilities for word stems used by the classifier (calculated from training data). Table 2 is ranked by the absolute difference (largest to smallest) between P(C = “curiosity” | f_k = 1) and P(C = “interest” | f_k = 1), with larger differences indicating more valuable word stems for discriminating between definitions of curiosity and interest. The most discriminative word stem was “hobbi”; definitions containing “hobby/hobbies” were more likely to be interest definitions.
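The ranking criterion can be sketched as follows. The probabilities are hypothetical (not values from Table 2), and the sketch assumes equal class priors, so that the probability of a class given a single present stem is that stem's class-conditional probability divided by the sum over both classes.

```python
def rank_stems(stems, p_present_curiosity, p_present_interest):
    """Rank stems by |P(curiosity | present) - P(interest | present)|.

    Assumes equal class priors (an assumption made for this illustration).
    """
    ranked = []
    for stem, pc, pi in zip(stems, p_present_curiosity, p_present_interest):
        post_c = pc / (pc + pi)
        post_i = pi / (pc + pi)
        ranked.append((stem, abs(post_c - post_i)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical probabilities of each stem being present in each definition type.
ranking = rank_stems(["question", "hobbi", "know"], [0.6, 0.2, 0.5], [0.4, 0.8, 0.4])
```

Stems that appear about equally often in both definition types end up near the bottom of the ranking, however frequent they are.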

Table 2 Conditional probabilities of word stems used by the classifier

Table A1 (Supplementary Online Materials) shows frequent collocations of words derived from word stems used by the classifier (see Table A2 for complete lists for each word stem, and prevalence). Inspecting collocations of words gives contextual information about how they tended to be used in definitions. The most frequent collocations involving word stems used by the classifier were “to know” (two-word phrase appearing in 52.2% of curiosity definitions, and 33% of interest) and “to learn” (33.3% curiosity, 21.9% interest).

Table 3 shows word stems that were not selected, but appeared in >10% of definitions (see Table A3 for words derived from these stem words, and Table A4 for collocations). The most common word stems (except derivatives of “curios”, “curious” and “interest”) were “can”, “want”, “thing” and “feel”. These have no discriminative value despite frequent occurrence; they can be considered “common features” of curiosity and interest definitions.

Table 3 Proportion of definitions containing word stems used in >10% of definitions (main data: n = 594), but not used by the classifier (i.e. common words that represent both curiosity and interest)

Participants mostly reported that curiosity and interest were “somewhat similar, somewhat different” (median response, 50.17% of participants, Fig. 2). Fewer reported they were “mostly similar” (35.02%), and fewer still “mostly different” (12.46%). Fewest reported they were “completely similar” (1.01%) or “completely different” (1.35%). We investigated whether this rating predicted absolute (correct classification or not) and relative accuracy (predicted probability of the correct class) in the main test data (Table 4), also including potential confounding variables (see Table A5 for descriptive information). The rating had no effect on classifier accuracy for either definition type (no main effect on absolute, b = 0.007, SE = 0.038, z = 0.186, p = .852; relative, b = -0.011, SE = 0.028, z = -0.411, p = .681; no interaction with definition type on absolute, b = 0.037, SE = 0.048, z = 0.770, p = .441; relative, b = 0.030, SE = 0.034, z = 0.871, p = .384).

Fig. 2 Participants reporting how different/similar curiosity and interest are. Top: Non-expert participants from the main data in Study 1 (n = 297). Bottom: Expert participants from Study 2 (n = 47)

Table 4 Summary of fixed effects from regression models fitting classifier absolute accuracy (probit; 1 = accurate, 0 = not) or relative accuracy (linear; 0-1) ~ (how similar or different participants think curiosity and interest are [D/S] + participant age + gender + education + age at which participant learnt English [EA] + definition order [DO]) * definition type (e.g. curiosity or interest: CI) for the main test data (n = 194 definitions), with SE corrected to account for data dependency (definitions come from n = 97 participants)

Discussion

We trained a classifier to distinguish participants’ definitions of curiosity and interest, which performed with above-chance accuracy when tested on unseen data. This demonstrated that the classifier successfully distinguished most definitions, and moreover that the distinctions used were generalisable. It also indicates that there are words commonly used by participants that reliably distinguish definitions of curiosity and interest. Additionally, some words commonly used by participants to describe curiosity and interest did not reliably distinguish them. Our results suggest that a shared consensus on the distinctions between curiosity and interest exists. The exact nature of these distinctions, and their meaningfulness are discussed in the General Discussion.

Study 2

Study 1 established that non-expert descriptions of curiosity and interest can be distinguished by a classifier. The purpose of Study 2 is to determine whether experts (i.e. psychologists and neuroscientists studying curiosity and interest) distinguish between them in line with the non-expert consensus derived in Study 1, to demonstrate whether this consensus can plausibly underpin future empirical work. We test how the classifier (trained on non-expert definitions in Study 1) classifies experts’ definitions. Furthermore, we explore what measures of expertise predict classifier accuracy, to determine if certain research traditions distinguish the terms more in line with the non-expert consensus than others. This is motivated by Hidi and Renninger’s (2019) proposal that subscription to the idea that there are distinctions between curiosity and interest differs across domains; i.e. researchers studying interest (mostly educational psychologists) are more concerned about distinctions, whilst researchers studying curiosity (mostly neuroscientists, cognitive psychologists and computational modellers) remain agnostic.

Method

Participants

We contacted academics whom we considered to be experts, defined as academics (PhD/graduate students or in more advanced positions, e.g. lecturers, professors) who had either published papers on curiosity or interest (or motivation science more broadly), or attended relevant recent conferences. We contacted those for whom we could find publicly available, up-to-date contact information (N = 288). Fifty-one experts completed the survey (response rate = 17.71%; collected November 2019-January 2020). Participants were excluded when they reported that they had checked the internet or consulted others about their responses (n = 1), or reported that they did not conduct research on a topic related to curiosity or interest (n = 3). After exclusions, data included n = 47 experts (Female = 28, Male = 19; Age: M = 43.8 years, SD = 10.3; Ethnicity: Asian = 3, Describe differently = 2, Prefer not to say = 1, White = 41).

Procedure

The procedure was similar to Study 1; experts produced free-text definitions of curiosity and interest (minimum 80 words each). However, we asked them to respond in their capacity as an expert, as we were interested in their professional opinion. Experts were asked to “Define curiosity/interest” (counterbalanced: curiosity first, n = 27). Experts reported how similar/different curiosity and interest were (5-point Likert scale). All provided demographic information (age, gender, ethnicity) and information on their expertise; their research domain (choosing all domains that applied from: cognitive, computational modelling, developmental, educational, neuroscience, organisational, social/personality or other), time since completing their PhD, whether they conducted research in curiosity/interest, how many papers they had published in this area, and what term they used to most frequently represent this research topic (“curiosity”, “interest”, “used to the same extent/interchangeably”, “rarely or never used”). Experts reported on a 5-point Likert scale to what extent they considered themselves an expert in curiosity or interest (“only in curiosity”, “more in curiosity”, “similar expertise in both curiosity and interest”, “more in interest”, “only in interest”).

Analysis

Data was pre-processed as in Study 1. We removed portions of placeholder text (curiosity n = 4, interest n = 5) and corrected spelling. The classifier trained in Study 1 was tested on the expert data (note: two participants declined to give curiosity definitions as it was outside their expertise, so we only included their interest definitions).

Predicting Classifier Accuracy from Judgements of Similarities/Differences and Expertise

As in Study 1, we computed absolute and relative classifier accuracy. In addition to expert ratings of how similar/different curiosity and interest were, we were interested in whether measures of expertise predicted classifier accuracy. We constructed models using probit (absolute) and linear regression (relative), with SE corrected for data dependency. The important predictors were the rating of how similar/different curiosity and interest were, and measures of expertise described above. We also aimed to include potentially confounding demographic variables (age and gender). Some measures were too highly correlated to include in the same model (see Table A6). Age, number of papers published and time since PhD were all highly correlated (age/PhD time r = .90; PhD time/papers r = .58; age/papers, r = .47); as such, we only included papers published as a fixed effect, as this was of most theoretical interest as an expertise measure. The term experts used to represent the research topic (treated as continuous, 1 = curiosity, 2 = interchangeable, 3 = interest; none responded that they did not use these terms) was highly correlated with expertise reported on the Likert scale (1 = “only in curiosity” to 5 = “only in interest”, r = .90), so we only included the Likert scale as a fixed effect, as this measure had more variance. We were primarily interested in comparing researchers who study curiosity (neuroscientists, cognitive psychologists and computational modellers) and researchers who study interest (educational psychologists), and so inspected correlations between experts’ reported research domain and expertise in either curiosity or interest (reported on the Likert scale) to confirm this approach.
Expertise more in curiosity (signified by negative correlation) was strongly associated with domains of neuroscience (rpb = -.48) and computational modelling (rpb = -.32), while expertise more in interest (signified by positive correlation) was strongly associated with educational psychology (rpb = .55). While there was only a weak association between expertise in curiosity and cognitive psychology (rpb = -.06), cognitive psychology was positively associated with expertise in neuroscience (rφ = .34) and computational modelling (rφ = .33), and negatively associated with educational psychology (rφ = -.21). We therefore did not include research domain measures, confirming that the categories of interest were indicated by the expertise Likert scale. To aid interpretation, the number of papers published and the two Likert scales (similar/different and expertise) were treated as continuous variables and mean-centred. Gender, definition order and definition type were effect-coded as in Study 1.
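For readers unfamiliar with the point-biserial correlation (rpb), it is simply a Pearson correlation in which one variable is binary, as the sketch below shows. The data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: 1 = works in educational psychology, 0 = does not;
# expertise on the 1 ("only in curiosity") to 5 ("only in interest") scale.
educational = [0, 0, 0, 1, 1, 1]
expertise = [1, 2, 3, 3, 4, 5]
r_pb = pearson_r(educational, expertise)
```

A positive value here means that membership of the domain goes with expertise towards the “interest” end of the scale; applying the same function to two binary variables gives the phi coefficient (rφ).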

Comparing Accuracy Across Datasets

We compared classifier accuracy on the expert data to the two test data sets from Study 1. We considered both absolute and relative accuracy, constructing two different models, with SE corrected for data dependency. Fixed effects were definition type and dataset (main test, additional and expert data), and we also included the interaction between fixed effects. To aid interpretation, definition type was effect-coded (see Study 1), and dataset was orthogonally-coded (expert vs other data: main = -1, additional = -1, expert = 2; additional versus main: main = -1, additional = 1, expert = 0).
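The two contrasts can be checked with a few lines of arithmetic. The sketch below only verifies properties of the code vectors given in the text; the variable names are ours.

```python
# Contrast codes as described in the text.
expert_vs_other = {"main": -1, "additional": -1, "expert": 2}
additional_vs_main = {"main": -1, "additional": 1, "expert": 0}

datasets = ["main", "additional", "expert"]

# Each contrast sums to zero across the three datasets...
sum_first = sum(expert_vs_other[d] for d in datasets)
sum_second = sum(additional_vs_main[d] for d in datasets)
# ...and the two contrasts are orthogonal (their dot product is zero).
dot = sum(expert_vs_other[d] * additional_vs_main[d] for d in datasets)
```

The first contrast therefore compares experts with the average of the two non-expert datasets, and the second compares the two non-expert datasets with each other.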

Results

The classifier accurately predicted expert definitions of curiosity and interest (Table 1), predicting the correct definition type significantly above chance (72.83%; 95% CI: 62.55–81.58%, p < .001). This demonstrates that the classifier trained on non-expert data could accurately distinguish expert definitions.

We investigated whether how similar/different experts reported curiosity and interest were, the number of papers they had published in curiosity/interest research and their expertise in curiosity or interest predicted absolute and relative accuracy (Table 5, see Table A7 for descriptive information on predictors). There was a significant main effect of the number of papers published (absolute, b = 0.008, SE = 0.002, z = 3.585, p < .001; relative, b = 0.004, SE = 0.002, z = 2.020, p = .043). This indicates that the classifier was more accurate for experts who had published more papers on the topics. There was also a main effect of definition type (absolute, b = 0.103, SE = 0.047, z = 2.186, p = .029; relative, b = 0.116, SE = 0.032, z = 3.646, p < .001), and an interaction between expertise and definition type (relative: b = 0.049, SE = 0.023, z = 2.125, p = .034; though absolute only borderline, b = 0.060, SE = 0.031, z = 1.916, p = .055). This indicates that the classifier was less accurate for curiosity definitions (compared to interest) from experts whose expertise is more exclusively in interest (Fig. 3).

Table 5 Summary of fixed effects from regression models fitting classifier absolute accuracy (probit; 1 = accurate, 0 = not) or relative accuracy (linear; 0-1) ~ (how similar or different experts think curiosity and interest are [D/S] + papers published + expertise + gender + definition order [DO]) * definition type (e.g. curiosity or interest: CI) for n = 92 definitions, with SE corrected to account for data dependency (definitions come from n = 47 expert participants)

Fig. 3 Relative classifier accuracy for definitions (n = 92) from experts (n = 47) by expertise. Lines represent linear coefficients (with 95% confidence intervals) extracted from relative accuracy model in Table 5

There was no significant effect of how different/similar experts reported that curiosity and interest were (main effect on absolute, b = -0.022, SE = 0.055, z = -0.410, p = .682; relative, b = -0.028, SE = 0.044, z = -0.639, p = .523; interaction with definition type on absolute, b = -0.060, SE = 0.065, z = -0.923, p = .356; relative, b = -0.036, SE = 0.053, z = -0.676, p = .499). Experts reported that curiosity and interest were “somewhat similar, somewhat different” (median response, 68.09% participants, Fig. 2). Fewer reported they were “mostly different” (14.89%) or “mostly similar” (10.64%). Fewest reported they were completely different (4.26%) or completely similar (2.13%). Compared to non-expert participants (Study 1; response on 5-point Likert scale, M = 3.22, SD = 0.72), experts were significantly less likely to report that the terms were similar (M = 2.91, SD = 0.72), t(342) = 2.70, p = .009 (Welch’s two sample t-test).

To examine whether classifier performance differed between experts’ and non-experts’ definitions, we combined the expert data with the two datasets from Study 1, and compared accuracy (Table 6). Absence of significant main effects of dataset indicated that the classifier was not significantly more or less accurate for any dataset overall; however, there were significant interaction effects between dataset and definition type. The classifier was significantly less accurate for experts’ curiosity definitions compared to their interest definitions (significant for relative, b = 0.039, SE = 0.012, z = 3.224, p = .001; non-significant for absolute accuracy, b = 0.032, SE = 0.017, z = 1.861, p = .063). The classifier was significantly more accurate for curiosity definitions compared to interest definitions for participants in the additional data (absolute, b = -0.060, SE = 0.022, z = -2.752, p = .006; relative, b = -0.040, SE = 0.016, z = -2.522, p = .012).

Table 6 Summary of fixed effects from regression models fitting classifier absolute accuracy (probit; 1 = accurate, 0 = not) or relative accuracy (linear; 0-1) ~ dataset (expert vs others [Expert], additional versus main [Add vs main]) * definition type (e.g. curiosity or interest: CI) for n = 508 definitions, with SE corrected to account for data dependency (definitions come from n = 255 participants)

Discussion

The classifier trained on non-expert definitions accurately distinguished expert definitions of curiosity and interest, performing with above-chance accuracy. This indicates that the words used to distinguish definitions of curiosity and interest by non-experts also reliably distinguished expert definitions. This suggests that experts may share the non-expert psychological consensus on how these terms are distinguished, and demonstrates that the non-expert consensus view can be used to distinguish between curiosity and interest in academic research. Experts agreed with non-experts that curiosity and interest were somewhat similar and somewhat different (but not completely similar or different) but were less likely to report that they were more similar.

Classification was less accurate for curiosity definitions, particularly for experts who specialised more in interest research (compared to curiosity research). This could be because these researchers are more likely to describe curiosity in terms of interest, using similar language across both definitions, perhaps because they conceptualise curiosity as a special case of interest. However, how similar or different experts thought the terms were did not predict classifier accuracy, and there was no correlation between this and area of expertise (see Table A6).

General Discussion

We constructed a classifier that distinguished between descriptions of curiosity and interest (from both experts and non-experts) with high accuracy and generalisability. Here, we interpret the distinctions made by the classifier and discuss whether the distinctions are meaningful. This is followed by a discussion of similarities that the classifier identified. We then discuss implications for the reward-learning framework and other theories, and limitations and future directions.

Interpreting Classifier Distinctions

By inspecting word stems (along with associated words and collocations) used by the classifier to classify definitions, we can identify and evaluate distinctions the classifier made between curiosity and interest to determine whether such distinctions are meaningful. To do so, we thematically group features together, and interpret whether they plausibly represent a consensus view on coherent, meaningful distinctions between curiosity and interest. As making inferences could be subjective, we limit discussion to reasonably identifiable themes relating to distinctions made by previous theories (reviewed in the introduction). This has the advantage of further validating how the non-expert consensus can provide a useful conceptual basis for future academic research. Along with distinctions between trait-level forms of curiosity and interest, we argue that the consensus view appeals to distinctions based upon the characterisation of the experience of information seeking (including duration), some aspects of distinctions around differing knowledge states, affect and elements of the incentive salience system, but not to distinctions around triggers. That is to say, there is evidence to suggest a consensus exists in how feelings of curiosity differ from feelings of interest.

Long-Term and Momentary Dimensions

The classifier demonstrated that distinguishing trait-level forms of curiosity and interest is more straightforward than distinguishing momentary or state-level forms (as discussed in the introduction). Interest was distinguished from curiosity most strongly by “hobby”, “spend”, “time” and “subject”. This reflects people’s individual interests, that is, specific topics to which people return over time (Hidi & Renninger, 2006; Silvia, 2006). “Hobby/hobbies” in particular was often used synonymously with interest (e.g. “someone’s interest is a hobby”) and references to spending time were common (e.g. “if you have interest in something you spend time pursuing that interest”). Participants more often described interest in domain-specific terms; “subject” was commonly collocated as “particular/certain subject” (although feature selection did not select “topic” nor “object” despite frequent use). In contrast, curiosity was distinguished from interest by “world”, “around” and “us” (often collocated together or with other information-seeking terms, e.g. “we feel driven to explore and learn about the world around us”, “questioning what goes on around us”). This reflects appeals to the domain-general trait of inquisitiveness. “Human”, “nature”, “children”, and “anim” were linked to curiosity, reflecting definitions of trait curiosity, e.g. participants suggested that curiosity is part of an intrinsic natural endowment, appealing to evolutionary (e.g. “it is a part of human nature to be curious”, “natural behaviour, shared by human beings and animals”) and developmental processes (“children are really curious”).

Triggers

While novelty was related uniquely to curiosity, there was no further consensus that curiosity and interest were distinguished by different triggers. Contrasting claims that domain-general topics trigger interest (e.g. Ainley, 2019), curiosity was linked more to domain-general topics (e.g. “world around us”). The range of interest triggers could be so broad that no critical mass of participants used similar definitions, meaning our method failed to detect them. However, it is more likely that there is no detectable consensus on different triggers for curiosity and interest resulting from people’s lack of conscious awareness of what tends to trigger their curiosity or interest, or because causes are not typically used in definitions. It is worth noting that previous distinctions based on triggers are complicated by determining when someone experiences an information gap. While Shin and Kim (2019) propose that well-organised information triggers interest, it could be argued that it also triggers curiosity, precisely because such organisation makes information gaps more salient. Furthermore, proposed universal interest triggers (e.g. death), could involve information gaps (death involves a plethora of unknowns) so may pique curiosity (for a review of potentially different triggers of curiosity and interest see Hidi & Renninger, 2020).

Characterisation of Information Seeking

The consensus distinguished curiosity and interest through different characterisations of information seeking. Specifically, curiosity was distinguished by words relating to active information seeking. Interrogative terms (i.e. “ask”, “question”, “explore”, and “see”; often collocated together, e.g. “ask questions”) along with terms relating to specific informational gain (i.e. “answer”, “inform”, “understand”) were more related to curiosity. Further, curiosity was distinguished by “desire” and “satisf”, characterising curiosity as a drive for information (questioning) that needed satisfying (with answers). While information-seeking terms were more associated with curiosity, they were not absent from interest definitions. “Know”, “learn” and “find” related more to curiosity, but appeared frequently in interest definitions in similar collocations (e.g. “to know/learn/find”, “know/learn more”, “know/learn about” and “find out”). “Knowledg” was not distinctive, suggesting that knowledge is a key component of both curiosity and interest. While both involve knowledge acquisition (consistent with the reward-learning framework), the consensus is that people’s experience of curiosity is characterised as more active, urgent information seeking, requiring satisfaction with specific answers. This fits with previous research that uniquely links curiosity (or D-type curiosity) with information gaps (Grossnickle, 2016; Lau et al., 2020; Litman, 2019; Loewenstein, 1994; Markey & Loewenstein, 2014; Shin & Kim, 2019).

The consensus may distinguish interest from curiosity as an in-depth, less momentary experience. “Spend”, “time” (commonly collocated as “spend time”), “attention” and “involv” were predictive of interest (e.g. “you want to be more involved with the subject and spend time on it to understand it even further”, “holds your attention”). This fits with the theory that while state curiosity is normally short-lived, situational interest can be momentary or more protracted (Peterson & Cohen, 2019; Renninger & Hidi, 2016). However, this interpretation perhaps confounds situational interest with individual interest (as discussed above), meaning state-level distinctions are more ambiguous.

Knowledge States

Curiosity was distinct in that it involved novel information seeking. “New” related to curiosity, and was commonly collocated as “learn/discover/something new” (e.g. “it is about learning new things”). “Know” was frequently collocated in curiosity definitions as “don’t know” (e.g. “curiosity is when you don’t know very much about a particular subject”). In contrast, “already” related to interest, and was commonly collocated as “already know” (e.g. “when you say you’re interested it means you already know something about it”). This suggests that people’s experience of curiosity involves a feeling of not knowing some information, and a search for new information, whereas interest involves the feeling of already knowing something about that information. This supports the view that curiosity (or D-type curiosity) involves plausibly closeable information gaps (i.e. believing that information is unknown but obtainable; Litman, 2019; Loewenstein, 1994; Muis et al., 2018; Shin & Kim, 2019), and that interest does not (i.e. that information is likely to be linked to their existing knowledge base; Grossnickle, 2016; Hidi & Renninger, 2019).

The consensus does not explicitly incorporate the U-shaped curve of knowledge (Gruber et al., 2014; Kang et al., 2009), nor the difference between not knowing and tip-of-the-tongue states (Litman et al., 2005), which holds that interest (or I-type curiosity) should be associated with both zero- and complete-knowledge states.

Affect

The consensus suggested that interest is pleasurable. “Enjoy” (e.g. “enjoy doing”), “excit” (“excited/exciting”) and “like” (“like to”) related to interest, suggesting interest is more enjoyable than curiosity. This fits with previous linking of interest (or I-type curiosity) with positive affect (Litman, 2008; Litman & Jimerson, 2004; Markey & Loewenstein, 2014; Renninger & Hidi, 2016; Schiefele, 2009; Silvia, 2006). However, the consensus did not support previous theories about the affective experience of curiosity; namely, initial aversiveness (deprivation) then pleasure (satiation). This could be because aversiveness is mild (Litman & Jimerson, 2004), or variable (Noordewier & van Dijk, 2017), or because people underestimate how much positive affect they experience after uncertainty (Wilson et al., 2005). Alternatively, this may result from complexity over ascribing positive or negative affect to curiosity, i.e. whether the characterisation of curiosity relates to initial aversiveness or pleasure accompanying satiation.

Incentive Salience

The consensus did not clearly reflect incentive salience. While “liking” related to interest, neither “wanting” nor “motiv” (e.g. “motivation”, “motivated”) related to curiosity (despite frequent use). This contrasts with proposed distinctions that curiosity involves strong feelings of wanting, motivating information seeking even when cognitive desire for information is low (FitzGibbon et al., 2020). However, curiosity was distinctly related to “bad”, i.e. curiosity may lead to bad outcomes (e.g. “through curiosity we can learn some bad things”, “you can make bad decisions wanting to know about something that might not be a good idea”). Thus, the consensus may support distinguishing curiosity as involving a motivational urge (i.e. “wanting”) when negative outcomes are likely (FitzGibbon et al., 2021; Hsee & Ruan, 2016; Oosterwijk, 2017).

Common Features of Curiosity and Interest

Up until now, our discussion has focused on potential theoretical differences between curiosity and interest. However, a sizeable minority of non-experts (36.03%) believed the terms were mostly or completely similar. Amongst experts, this tendency was less pronounced (12.77%), but there are those in the literature who prefer to emphasise their similarities (e.g. Berlyne, 1950; Silvia, 2006). Indeed, the classifier identified common features of curiosity and interest. The consensus suggests that both curiosity and interest are related to “want/wanting”, suggesting that strong “wanting” is not exclusive to curiosity, despite people being more likely to describe curiosity in terms of “desire” (see above). Both were described as “feelings”, reflecting that they are experiential states. Both were described as motivators of action; both “makes” (e.g. “makes you”, “what makes”) and “motiv” (e.g. “motivation”, “motivates”) were common features. Finally, the consensus suggests that both are related to knowledge acquisition, with “knowledge” and “topic(s)” appearing frequently at similar rates in both definitions.

A Basis for Conceptual Clarity

The consensus uncovered here distinguishes curiosity as active information seeking directed towards specific and previously unknown (novel) information. In contrast, interest was more pleasurable, in-depth, less momentary information seeking towards information in domains where people already had knowledge. However, the consensus suggests that the concepts share many similarities, namely they are both feelings of wanting, motivators, and relate to knowledge acquisition. Consensus distinctions should be contextualised in the light of this and the finding that a sizeable minority of non-experts reported that they were more similar than different. This consensus underpinned expert distinctions of curiosity and interest; the classifier based on non-expert definitions accurately classified experts’ definitions. This suggests that non-expert distinctions between curiosity and interest map well onto experts’ theoretical distinctions. Thus the consensus revealed here is plausibly a useful foundation on which to base future empirical and theoretical work. Further, the substantial consensus between experts and non-experts provides reassurance that the current debate over distinctions between curiosity and interest is not a purely theoretical and logistic one couched in academic jargon. To some extent, this accords with findings that some commonsense understandings of concepts are not simply supplanted by expert conceptions, but may exist in parallel (Gelman & Noles, 2011). However, here we argue further that the commonsense understanding of the nature of curiosity and interest underlies expert conceptions too, validating our approach of using commonsense understandings as the basis for conceptual clarity.

Note that we are not arguing that people’s beliefs precisely reflect the true psychological processes underlying the knowledge acquisition process (Murayama et al., 2019). People lack the capacity to accurately understand the psychological mechanisms that produce subjective experiences through introspection (Kihlstrom, 1987; Nisbett & Wilson, 1977), and so introspective reports are not appropriate evidence for delineating the psychological processes underlying curiosity and interest. However, as the basis for investigating whether curiosity and interest represent meaningfully distinct processes, the shared consensus provides a clear starting point. Our bottom-up approach to distinguishing curiosity and interest provides definitions that are open to revision, and should prompt work to expedite revision. We hope that our work facilitates the top-down creation of task-based measures of curiosity and interest using shared consensus distinctions as a starting point. For example, one could test whether stimuli that elicit active, more urgent information seeking for specific information (i.e. eliciting curiosity) predict recall differently to stimuli that elicit more relaxed, pleasurable, more in-depth information seeking (i.e. eliciting interest). In turn, pending availability of task-based curiosity and interest measures, further bottom-up methods, such as data-driven ontology approaches (see Eisenberg et al., 2019 on self-regulation; and Frey et al., 2017 on risk preference), can refine the precise nature of distinctions (or the lack thereof) between curiosity and interest.

Curiosity researchers (experts specialising in curiosity) more closely shared the non-expert consensus than interest researchers; the non-expert consensus did not distinguish interest researchers’ definitions as clearly as curiosity researchers’. This suggests that curiosity researchers are not agnostic to differences between curiosity and interest, despite using “curiosity” to represent the entire research domain. However, interest researchers may have strongly differentiated the terms, but did not describe curiosity in line with non-expert distinctions, using different terminology altogether. Alternatively, they may have described curiosity using language associated with interest, perhaps explicitly referring to interest comparatively to define curiosity.

Practically, our findings support the face validity of self-report items measuring curiosity or interest. This has implications for their use in both research and educational practice. When people are asked to report their own levels of curiosity or interest or rate the curiosity or interest of others (e.g. when educators rate their students’ interest/curiosity levels), it is likely that their conceptions of curiosity and interest are in line with their commonsense understanding. For example, when students rate their interest in a topic, they are likely to report their level of relaxed, pleasurable engagement with the topic, whereas if they are rating their curiosity, they may be rating a more urgent feeling to search for information. If one is designing educational interventions based on promoting curiosity or interest, these can be designed with the commonsense understanding underlying both students’ and educators’ conceptions of the terms in mind. This can allow for better targeting of interventions, and better communication by researchers of precisely what is to be promoted.

Complementing Theoretical Distinctions

This empirically derived consensus complements previous theoretical distinctions made in the literature, especially the theoretical analyses provided in the 2019 special issue of Educational Psychology Review. Along with Ainley (2019), we agree that trait-level distinctions (based on individual differences) are far more straightforward than state-level experiential distinctions. We note that there are many common experiential features of curiosity and interest (see “Common Features of Curiosity and Interest” above), which accords well with the view of Ainley (2019) and Murayama et al. (2019), but we suggest that there is some basis to distinguish state curiosity and situational interest as representing different experiential states. The consensus also fits well with Peterson and Cohen’s (2019), Pekrun’s (2019), Shin and Kim’s (2019) and Alexander’s (2019) distinction that curiosity is singularly and urgently focused on closing an information gap and so is short-lived, while interest is not focused on information gaps and can be both short-lived (e.g. situational interest) and long-term (e.g. individual interest). Along with Shin and Kim (2019) and Hidi and Renninger (2019), we argue that there are some grounds to distinguish curiosity and interest on the basis of affective experience (and to a lesser extent incentive salience), but along with Alexander (2019), that it is not necessarily the case that curiosity must have an aversive component. Unlike Hidi and Renninger (2019) and Shin and Kim (2019), we did not find evidence to suggest that the non-expert consensus holds that curiosity and interest have different triggers (as discussed above).

Limitations and Future Directions

Although our method yielded meaningfully interpretable distinctions between definitions of curiosity and interest, there were exceptions. Some words (e.g. “happen”, “sometime(s)”, “live”, “other”, “without”, “go”, “get”, “person”, “may”, “good”, “something”) did not provide straightforward theory-driven meaningful distinctions, despite being utilised for classification. These perhaps reflect linguistic differences in how people describe curiosity and interest (Silvia, 2006). This underlines the importance of conducting empirical work, rather than simply assuming distinctions exist because the terms are used in different contexts.

Our work should complement contextually rich qualitative work. We use quantitative methods (based on word counts) to complement qualitative methods such as thematic analysis (e.g. Aslan et al., under review), providing a convergent basis for consensus. These methods represent different trade-offs between bias-reduction and interpretative richness; our quantitative analysis reduces bias but provides less rich interpretation, whereas qualitative approaches increase bias but provide richer interpretation (e.g. contextual information). Word collocations provide some contextual information (as a proxy for qualitatively establishing their context), but we acknowledge that word counts are not contextually rich. An alternative approach could be to conduct in-depth structured interviews to provide contextually rich information on people’s beliefs about curiosity and interest, such as those used in studies of children’s naïve theories about biology (e.g. Legare et al., 2009). Other complementary quantitative techniques from natural language processing research may also help shed light on people’s understanding of the concepts. In particular, latent semantic analysis could assess the ways in which participants believe curiosity and interest are similar (see Forster & Dunbar, 2009; Laham, 1997).

Despite variation in how different both experts and non-experts rated curiosity and interest to be, these ratings did not predict classifier accuracy; higher accuracy was not associated with people rating the concepts as less interchangeable. The descriptive consensus on distinctiveness could be dissonant from people’s beliefs about distinctions, i.e. people can describe differences, but do not believe the concepts are distinct. Alternatively, there may be core distinctions, but also a wider set subscribed to by a minority who believe the terms are more distinct. Plausibly, less widely subscribed distinctions were not used in classification (i.e. core distinctions were more reliable), thus accuracy would be immune to beliefs about wider differences. Furthermore, it is possible that because participants were informed that we were interested in similarities and differences between the terms, and provided responses in a relatively short amount of time (compared to structured interviews), they may have emphasised differences between the concepts in their definitions (which were picked up by the classifier), even while believing them to be broadly similar. Future work is needed to investigate the relation between people’s explicit beliefs about differences, and how this aligns with tendencies to distinguish curiosity and interest.

To avoid construct underrepresentation, we simply prompted participants to “define curiosity/interest”, and did not ask about experiences. The classifier therefore may be biased towards using trait-level over state-level distinctions. However, in practice our classifier was not overly sensitive to trait-level distinctions, accurately classifying definitions from participants asked to define more momentary experiences (additional data; “being interested in/curious about X”) as well as those who were not (main test data and expert data). Nevertheless, the list of word stems used by the classifier may not represent the full range of state-level distinctions forming the consensus, due to a combination of two factors: (1) the relative ease of using trait-level distinctions to distinguish curiosity from interest, and (2) feature selection optimising accuracy with fewer word stems. Therefore, the classifier may prefer trait-level distinctions (as more valuable for discrimination) over state-level distinctions (as less valuable) to meet requirements for parsimony. While state-level distinctions were present (discussed above), further state-level distinctions inherent in the shared consensus may have been overlooked by feature selection in favour of trait-level ones. Future studies should therefore determine whether the consensus about experiential distinctions between curiosity and interest is broader than outlined.