Scholars have investigated individuals’ “pathological propensity to buy,” suggesting the possibility of problematic shopping behavior (PSB; Aboujaoude, 2011; Aboujaoude, 2014; Andreassen et al., 2015; Georgiadou et al., 2021; Kyrios et al., 2018; Moulding et al., 2017; Müller et al., 2019; Müller et al., 2021a, 2021b; Rahman et al., 2018; Rigby, 2011; Uzarska et al., 2021). Many terms have been used to describe PSB (e.g., “compulsive buying,” “compulsive spending,” “shopping addiction,” “shopaholism,” “problematic shopping”), suggesting that such problematic behaviors are associated with an inability to regulate emotions and/or excessive impulsivity (Christenson et al., 1994). However, recent research suggests that, much like psychoactive substance addictions, PSB and other problematic behaviors are best understood from an addiction perspective, given that internal factors (e.g., distress) and external factors (e.g., environmental cues) precipitate cue reactivity, providing the basis for craving and anticipation of rewards (Gomez et al., 2022; Starcke et al., 2018).

Previous research has identified negative consequences associated with PSB (e.g., threats to financial solvency, compromised social relationships, psychological distress), supporting the recognition of shopping addiction as a distinct behavioral addiction in psychopathology classification manuals such as the DSM-5 or ICD-11 (Andreassen et al., 2018; APA, 2013; Dittmar, 2005; Griffiths, 1996, 2017, 2018; Hartston, 2012; Uzarska et al., 2021; WHO, 2019; Zarate et al., 2022; Zhao et al., 2017). However, concerns have also been raised regarding the potential risk of over-pathologizing common behaviors (e.g., work, exercise, sex), suggesting that problematic behaviors are likely to manifest when individuals engage in subjectively enjoyable activities (Kardefelt-Winther et al., 2017; Niedermoser et al., 2021). Given these concerns and the relatively recent conceptualization of behavioral addictions, it is important to evaluate which symptoms (if any) may be problematic or indicative of impaired wellbeing due to problematic shopping, thereby providing conceptual clarification and empirical validity to PSB.

The lack of consistency surrounding the recognition of PSB as a formal diagnosis has raised questions concerning prevalence rates and individual differences (Andreassen et al., 2018; Georgiadou et al., 2021; Granero et al., 2016; Otero-Lopez et al., 2021; Potenza, 2014; Uzarska et al., 2021). For example, the reported prevalence of PSB has ranged between 4.9 and 16.2% (Black, 2001; Dittmar, 2005; Duroy et al., 2014; Maraz et al., 2016), with a hypothesized ascending trend due to consumerism and the recent COVID-19 pandemic (among other factors; Georgiadou et al., 2021; Granero et al., 2016; Niedermoser et al., 2021). Additionally, studies investigating gender differences in PSB have reported mixed findings, with some showing higher prevalence among females (Dittmar, 2005; Maraz et al., 2016; Otero-Lopez & Villardefrancos, 2014) and others reporting no gender differences (Jiang & Shi, 2016; Müller et al., 2010). These observed discrepancies could be partially attributed to the lack of solid psychometric understanding of the instruments used to assess PSB (Georgiadou et al., 2021).

Past research has employed multiple psychometric instruments encompassing a variety of definitions/conceptualizations of PSB (e.g., Compulsive Buying Measurement Scale, Valence et al., 1988; Online Shopping Addiction Scale, Zhao et al., 2017; Compulsive Online Shopping Scale, Manchiraju et al., 2017). Within this broader context of measurement inconsistencies and dearth of solid psychometric findings, the seven-item Bergen Shopping Addiction Scale (BSAS; Andreassen et al., 2015) has been used more consistently because of its versatility (i.e., online shopping and in-person shopping), promising psychometric performance, and sound theoretical approach (i.e., components model of addiction; Griffiths, 1996, 2005; Kaur et al., 2019; Tanoto & Evelyn, 2019; Uzarska et al., 2019, 2021; Zhao et al., 2017). It has been proposed that PSB includes seven core symptoms, comprising (i) excessive preoccupation with shopping (salience), (ii) shopping to change mood state (mood modification), (iii) inability to fulfill daily obligations due to shopping (conflict), (iv) increased amount of shopping over time to obtain satisfaction (tolerance), (v) return to excessive shopping after a period of controlled shopping (relapse), (vi) irritability and frustration in the absence of shopping (withdrawal), and (vii) impaired wellbeing due to excessive shopping (problems; Andreassen et al., 2015; Griffiths, 2005). Despite these advantages in using the BSAS, to the best of the present authors’ knowledge, there is limited evidence evaluating the scale at the item level employing advanced approaches such as item response theory (IRT). Such work would add clarity to the assessment of PSB and the appropriate estimation of its prevalence rates.

Item Response Theory

It has been proposed that IRT outperforms classical test theory (CTT) approaches due to its ability to (i) assess relationships between item(s) and constructs, and therefore (ii) produce generalizable and sample-independent results (Hambleton et al., 2010; Kircaburun et al., 2020). More specifically, IRT uses a logit function and logistic parameters (discrimination, α; difficulty, β; and pseudo-guessing, c) to assess item behavior at different levels of a latent trait θ (Embretson & Reise, 2013). In IRT, α evaluates how well an item “discriminates” between different θ levels (PSB), β examines the probability of endorsing an item at different θ levels, and c represents the probability of guessing “the correct response” to an item (De Ayala, 2008). Accordingly, IRT models can be estimated based on research needs, including “Rasch” models assuming equality constraints on α (Mellenbergh, 1994) or the graded response model assuming different α across items (GRM; Samejima, 1996). Although other models could be estimated (e.g., generalized partial credit, GPC; Muraki, 1992; nominal model for nominal and ordinal responses), the present study focuses on the Rasch model and the GRM due to their suitability for ordered polytomous items (Gomez et al., 2019; Marmara et al., 2021; Zarate et al., 2021).
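As an illustration of how α and β shape responses to an ordered polytomous item, the following minimal sketch computes GRM category probabilities from the standard two-parameter logistic cumulative form. The function name and all parameter values are hypothetical and chosen only for illustration; they are not estimates from the present study.

```python
import math

def grm_category_probs(theta, alpha, betas):
    """Category probabilities for one ordered item under the graded response model.

    theta: latent trait level; alpha: item discrimination;
    betas: ascending thresholds (four thresholds for a five-category item).
    """
    # Cumulative probabilities P(X >= k), bracketed by P(X >= 0) = 1 and P(X >= K + 1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(betas) + 1)]

# Hypothetical parameters (illustration only).
probs = grm_category_probs(theta=0.0, alpha=1.5, betas=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

Under a Rasch-type constraint the same function would be called with one common alpha for every item, whereas the GRM allows alpha to differ by item.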

Additionally, IRT provides three attractive features. Firstly, it can produce conditional precision indices (i.e., increased information produces lower standard errors increasing precision) to determine the reliability of a given instrument at different θ levels (Culpepper, 2013; Thomas et al., 2018). Secondly, it enables the estimation of prevalence rates via the employment of Summed Scores Expected a Posteriori (SSEAP [θ|x]) based on participants’ response patterns (i.e., raw scores ± 2 SD beyond the mean; Cai et al., 2011; Thissen, 1995). Thirdly, it can provide differential item functioning (DIF) statistics to investigate the equivalence of psychometric properties across groups (e.g., males and females; Meade & Wright, 2012).
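The link between information and precision mentioned above can be sketched as follows. This is a minimal illustration that assumes the usual IRT relations (standard error equals the inverse square root of information, and conditional reliability equals one minus the inverse of information when θ is scaled to unit variance); the information values are hypothetical.

```python
import math

def standard_error(information):
    """Conditional standard error of the trait estimate at a given theta."""
    return 1.0 / math.sqrt(information)

def conditional_reliability(information):
    """Conditional reliability, assuming theta is scaled to unit variance."""
    return 1.0 - 1.0 / information

# Hypothetical information values (illustration only).
for info in (2.0, 5.0, 10.0):
    print(info, round(standard_error(info), 3), round(conditional_reliability(info), 3))
```

Higher information therefore corresponds to a smaller standard error and higher reliability at that trait level.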

The Present Study

The present study adds to the extant literature by (i) investigating the psychometric properties of the BSAS, including items’ discrimination (α) and difficulty (β), (ii) proposing an optimal raw cut-off score, and (iii) examining DIF statistics at the item level across males and females. These considerations are important since they may help identify items to be prioritized in clinical assessments based on the severity of the different PSB presentations. Additionally, they may enhance clarity concerning PSB prevalence rates and gender differences.

Method

Participants

The initial sample comprised 1097 English-speaking individuals from the general community. However, 129 responses were removed due to being invalid (e.g., spam, incomplete responses). Therefore, a final sample of 968 individuals aged between 18 and 64 years participated (Mage = 29.5 years, SD = 9.36; 315 females, 32.5%). The sample used in the present study exceeds the suggested minimum sample size for IRT analysis (number of items × 15; 7 × 15 = 105; Sahin & Anil, 2017). Table 1 provides descriptive statistics, and Supplementary Table 1 provides demographic statistics. Gender groups showed homogeneity of variance (Levene’s F = 1.306, p = 0.254) and females scored significantly higher on the BSAS than males (t [895] = 3.949, p < 0.001).

Table 1 Addictive behaviors descriptive statistics (N = 968)

Instrument

Bergen Shopping Addiction Scale (BSAS): The BSAS (Andreassen et al., 2015) assesses the risk of shopping addiction using seven items rated on a five-point Likert scale ranging from 0 (strongly disagree) to 4 (strongly agree). Originally, item scores ranged from 1 (strongly disagree) to 5 (strongly agree); they were converted to start from 0 to serve the purposes of the present IRT analyses. Each item relates to an element of the “components model of addiction” including salience, mood modification, tolerance, withdrawal symptoms, conflict, relapse, and presenting problems (Griffiths, 2005; Kim & Hodgins, 2018). Examples of items include “I think about shopping/buying things all the time.” Total possible scores range from 0 to 28, with higher scores indicating a higher risk of shopping addiction. The scale’s internal reliability in the present study was excellent (Cronbach’s α = 0.88, McDonald’s ω = 0.88).
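The rescoring step and the internal consistency coefficient can be sketched as below. This is a minimal illustration using the textbook formula for Cronbach's α; the response matrix is hypothetical toy data, not data from the present study, and the function names are illustrative.

```python
from statistics import variance

def rescore(rows):
    """Shift 1-5 Likert responses to the 0-4 range."""
    return [[x - 1 for x in row] for row in rows]

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: five respondents, seven items, scored 1-5.
raw = [
    [1, 2, 1, 1, 1, 1, 1],
    [3, 4, 3, 2, 3, 2, 3],
    [5, 5, 4, 5, 4, 4, 5],
    [2, 3, 2, 2, 1, 1, 2],
    [4, 4, 3, 4, 3, 3, 4],
]
scored = rescore(raw)
print(round(cronbach_alpha(scored), 3))
```

Because α depends only on variances, shifting every response by a constant leaves the coefficient unchanged; only the range of the total score changes (0 to 28 instead of 7 to 35).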

Procedure

The study was advertised via email (on the Victoria University student platform) and social media (Twitter, Reddit, Facebook, Instagram) after obtaining approval from the research team’s university Ethics Committee. Individuals over 18 years were eligible to participate and invited to complete an online survey including demographic questions and the BSAS. A Plain Language Information Statement was available upon accessing the link to ensure participant eligibility criteria were met (i.e., being adults), obtain informed consent, and ensure participation was voluntary. Data were collected between November 2020 and January 2021.

Statistical Analyses

Statistical analyses followed a sequential process. First, IRT models were estimated with IRT-PRO (Cai et al., 2011). Model fit was concurrently determined by: (i) traditional fit indices (χ2Loglikelihood); (ii) marginal likelihood information statistics M2 (one and two-way marginal tables to correct for potentially sparse information); (iii) RMSEA (< 0.06 = sufficient fit; Hu & Bentler, 1999; Gomez et al., 2021a); and (iv) estimation of error prediction based on Akaike information criterion (AIC; Akaike, 1974) and Bayesian information criterion (BIC; Schwarz, 1978). Given the potential sensitivity of M2 to large samples (N > 900), emphasis was placed on RMSEA to assess model fit (De Ayala, 2008). Subsequently, the best fitting model was determined based on Δχ2Loglikelihood (Gomez et al., 2021b). Secondly, following past recommendations (Zarate et al., 2021), DIF statistics using Wald tests were obtained for all items, with p < 0.05 taken as an indication of non-invariance. Subsequently, to avoid inflating Type I error, invariant items were anchored, and only non-invariant items were assessed. Thirdly, the conversion of the BSAS raw scores into addictive shopping risk levels was conducted based on SSEAP [θ|x] to classify participants exceeding ± 2SD as high risk (Cai et al., 2011; Embretson & Reise, 2013).
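For readers unfamiliar with the information criteria named above, the following minimal sketch shows their standard definitions. The −2 log-likelihood values are hypothetical, and the parameter counts (29 for an equal-α model and 35 for the GRM with seven five-category items) are assumptions about the parameterization rather than figures taken from the software output.

```python
import math

def aic(neg2_loglik, n_params):
    """Akaike information criterion from a -2 log-likelihood."""
    return neg2_loglik + 2 * n_params

def bic(neg2_loglik, n_params, n_obs):
    """Bayesian information criterion from a -2 log-likelihood."""
    return neg2_loglik + n_params * math.log(n_obs)

# Hypothetical -2 log-likelihoods; parameter counts are assumptions.
models = {"equal-alpha": (12300.0, 29), "GRM": (12150.0, 35)}
for name, (neg2ll, k) in models.items():
    print(name, round(aic(neg2ll, k), 2), round(bic(neg2ll, k, n_obs=968), 2))
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes additional parameters more heavily than AIC once the sample exceeds about eight observations.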

Results

Missing values showed no discernible pattern (MCAR; Little’s χ2 = 23.9, p = 0.247; Little, 1988), and ranged between 1.50 and 2.60%, satisfying the maximum recommended threshold (< 5%; Schafer, 1999). Therefore, IRT assumptions were tested. Firstly, the lavaan package in RStudio (Rosseel, 2012) was used to fit a confirmatory factor analysis (CFA) and test BSAS unidimensionality employing the weighted least squares means and variance adjusted (WLSMV) estimator due to its ability to deal with polychoric matrices and asymptotic distributions (Enders & Bandalos, 2001). Following the cut-off values outlined in Li (2016), goodness-of-fit indicators suggested sufficient fit to the data (χ2[14] = 50.39, p < 0.001; CFI = 0.989; TLI = 0.984; RMSEA = 0.05 [CI 0.04, 0.07]), with all items loading saliently on one factor (standardized λ = 0.672–0.828; see Fig. 1). Secondly, pairwise residual correlations (LDχ2 statistics; Chen & Thissen, 1997) showed that items were locally independent (i.e., LDχ2 < 10; see Supplementary Table 2). Finally, the BSAS showed monotonicity (i.e., raw score continuously increased with increments in θ), as demonstrated by the test characteristic curve (TCC).
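As a rough check on the CFA fit, the conventional point estimate of RMSEA can be computed from the reported χ2, degrees of freedom, and sample size. The sketch below uses the common formula with N − 1 in the denominator; this is an assumption, since the robust (scaled) variants produced under WLSMV can differ slightly from this unscaled version.

```python
import math

def rmsea(chi2, df, n):
    """Conventional RMSEA point estimate (unscaled formula, N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Values reported for the one-factor CFA in the text.
value = rmsea(chi2=50.39, df=14, n=968)
print(round(value, 3))
```

The result is close to the reported RMSEA of 0.05, which falls below the 0.06 threshold used in the present study.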

Fig. 1

Factorial structure of the Bergen Shopping Addiction Scale (BSAS) showing standardized factor loadings

IRT models were estimated using the Bock-Aitkin marginal maximum likelihood algorithm with expectation–maximization (Bock & Aitkin, 1981). Both the Rasch (M2[656] = 1569.37; p < 0.001; χ2Loglikelihood = 12,159.35; RMSEA = 0.04; BIC = 12,303.35; AIC = 12,649.68) and the GRM (M2[669] = 1831.48; p < 0.001; χ2Loglikelihood = 12,331.56; RMSEA = 0.04; BIC = 13,256.55) demonstrated sufficient fit (Hu & Bentler, 1999). However, when α was constrained to be equal across items, there was a significant drop in fit (Δ χ2loglikelihood[7] = 172.21, p < 0.01), indicating that the GRM provided superior fit (Gomez et al., 2019). All items showed appropriate fit except Item 2 (mood modification); results should therefore be interpreted with caution (see S-χ2 diagnostic statistics in Supplementary Table 3).
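The nested-model comparison can be sketched as a likelihood-ratio check. The sketch below takes the two reported log-likelihood statistics, confirms that their difference reproduces the reported Δ of 172.21, and compares it with the tabulated χ2 critical value for 7 degrees of freedom at the 0.01 level (18.475). Taking the absolute difference is an assumption made here so that the check does not depend on which model each value is attributed to.

```python
# Log-likelihood statistics as reported in the text.
STAT_A = 12331.56
STAT_B = 12159.35

# Tabulated chi-square critical value for df = 7 at the 0.01 level.
CRITICAL_CHI2_DF7_P01 = 18.475

delta = abs(STAT_A - STAT_B)
exceeds_critical = delta > CRITICAL_CHI2_DF7_P01

print(round(delta, 2), exceeds_critical)
```

A difference this far above the critical value is consistent with the reported p < 0.01.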

Item Parameters and DIF Statistics

Considering α, all items were in the high to very high range (0 = non-discriminative; 0.01–0.34 = very low; 0.35–0.64 = low; 0.65–1.34 = moderate; 1.35–1.69 = high; > 1.70 = very high; Baker, 2001). The descending sequence of the items’ α is Item 4 (tolerance), Item 7 (presenting problems), Item 3 (conflict), Item 5 (relapse), Item 6 (withdrawal), Item 2 (mood modification), and Item 1 (salience; see Table 2 and Fig. 2). Considering β, there were fluctuations between the different thresholds across the seven items. For example, while the ascending item sequence of β for the first threshold (β1, strongly disagree) was Items 1, 2, 6, 5, 4, 3, and 7, the ascending sequence for the fourth threshold (β4, strongly agree) was Items 2, 1, 4, 5, 7, 3, and 6. Nonetheless, β values gradually increased for all items as the “difficulty” of endorsing an item increased, indicating that all items performed as expected. Considering c, values progressively decreased with increments in Likert categories (i.e., from c1, strongly disagree, to c4, strongly agree), suggesting that participants’ pseudo-guessing diminished with more “difficult” options.
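The discrimination bands cited from Baker (2001) can be expressed as a simple lookup. This is a minimal sketch; the handling of values that fall between the published bands (e.g., exactly 1.70) is an assumption, and the function name is illustrative.

```python
def baker_label(alpha):
    """Map a discrimination value to the verbal bands cited in the text (Baker, 2001)."""
    if alpha <= 0:
        return "non-discriminative"
    if alpha < 0.35:
        return "very low"
    if alpha < 0.65:
        return "low"
    if alpha < 1.35:
        return "moderate"
    if alpha < 1.70:  # boundary handling at 1.70 is an assumption
        return "high"
    return "very high"

# Hypothetical discrimination values (illustration only).
for a in (0.5, 1.0, 1.5, 2.2):
    print(a, baker_label(a))
```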

Table 2 Bergen Shopping Addiction Scale item discrimination (α), difficulty (β), and pseudo-guessing (c) parameters
Fig. 2

BSAS item characteristic curves (ICCs) and item information functions (IIFs). Here, theta (θ) represents latent trait levels, and probability indicates the likelihood of endorsing an item at different Likert categories. For example, 0 represents strongly disagree and 4 represents strongly agree. The dotted lines represent conditional reliability indices, with increased levels of information obtained as standard error measurement decreases

Wald tests were employed to identify potentially significant DIF across gender groups. Interestingly, DIF statistics showed that most BSAS items were invariant across gender groups, suggesting that the BSAS captures the risk of shopping addiction similarly among males and females. However, Item 2 (mood modification) demonstrated non-invariance in β across all thresholds (χ2c|a[4] = 12.5, p = 0.014; Table 3 and Fig. 3). More specifically, the β thresholds for males were β1 = −1.06, β2 = −0.01, β3 = 0.53, and β4 = 2.14, whereas those for females were β1 = −1.25, β2 = −0.34, β3 = 0.10, and β4 = 1.76. This indicates that males require a higher risk of shopping addiction to endorse this item when compared to female participants.
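To make the size of the Item 2 threshold differences concrete, the sketch below computes the male-minus-female gap at each threshold and, for illustration, the cumulative endorsement probabilities at θ = 0. The thresholds are those reported above; the common discrimination value of 1.5 is hypothetical, because the group-specific α for Item 2 is not reported in this section.

```python
import math

# Item 2 thresholds as reported in the text.
MALE_BETAS = [-1.06, -0.01, 0.53, 2.14]
FEMALE_BETAS = [-1.25, -0.34, 0.10, 1.76]

def cumulative_probs(theta, alpha, betas):
    """P(X >= k) at each threshold under a two-parameter logistic cumulative form."""
    return [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]

# Threshold gaps (male minus female) in theta units.
gaps = [round(m - f, 2) for m, f in zip(MALE_BETAS, FEMALE_BETAS)]

# Hypothetical common discrimination, used only to illustrate the direction of the effect.
ALPHA = 1.5
male_cum = cumulative_probs(0.0, ALPHA, MALE_BETAS)
female_cum = cumulative_probs(0.0, ALPHA, FEMALE_BETAS)

print(gaps)
print([round(p, 2) for p in male_cum])
print([round(p, 2) for p in female_cum])
```

At the same latent trait level, every female cumulative probability exceeds the corresponding male one, which is the pattern the DIF result describes.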

Table 3 Bergen Shopping Addiction Scale differential item functioning (DIF) across male and female participants
Fig. 3

IIF for non-invariant items. Here, Item 2 (mood modification) shows significantly higher β for males (group 2) than females (group 1) suggesting that males require higher risk of shopping addiction to endorse this item

Considering item information, interesting fluctuations across items and θ levels were observed. More specifically, Item 4 provided the highest level of information between −0.5SD and +2.5SD, Item 7 between 0SD and +2.5SD, Items 3 and 5 between −0.5SD and +2.5SD, and Items 1, 2, and 6 provided very limited information across θ levels (see the item information function, IIF, shown as a dotted line in Fig. 2). This indicates that Items 4, 3, 7, and 5 should be prioritized when assessing individuals above mean risk of shopping addiction levels, and more specifically Items 3 and 7 should be emphasized when assessing individuals with extremely high (+2SD) risk of shopping addiction scores (see Table 4).
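Item information under the GRM can be sketched with Samejima's formula, in which the information at θ is the sum over categories of the squared derivative of the category probability divided by that probability. The parameter values below are hypothetical and serve only to show how a higher α concentrates more information around the thresholds.

```python
import math

def grm_item_information(theta, alpha, betas):
    """Item information for an ordered polytomous item under the graded response model."""
    # Cumulative probabilities, bracketed by 1 (lowest) and 0 (beyond the highest category).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    cum += [0.0]
    # Derivative of each cumulative probability with respect to theta (zero at both ends).
    dcum = [alpha * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(betas) + 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = dcum[k] - dcum[k + 1]
        if p_k > 0.0:  # guard against numerical underflow at extreme theta
            info += dp_k ** 2 / p_k
    return info

# Hypothetical thresholds and discriminations (illustration only).
betas = [-1.0, 0.0, 1.0, 2.0]
for theta in (-2.0, 0.5, 2.5):
    low = grm_item_information(theta, 1.0, betas)
    high = grm_item_information(theta, 2.0, betas)
    print(theta, round(low, 3), round(high, 3))
```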

Table 4 Item information function (IIF) values for θ levels ranging from −2.8 to 2.8 on the Bergen Shopping Addiction Scale

IRT Properties at Scale Level and Prevalence

Considering the performance of the scale, the BSAS demonstrated good properties. More specifically, the test characteristic curve (TCC; Fig. 4 left panel) demonstrated a steep increase of BSAS raw scores as θ (PSB) increases. Similarly, the test information function (TIF; Fig. 4 right panel) indicated that the BSAS provided increased information for θ levels between −0.5SD and +2.5SD. However, the scale may not provide such high information above +2.5SD or below −0.5SD of the risk of shopping addiction.

Fig. 4

BSAS test characteristic curve (TCC; left panel) and test information function (TIF; right panel). The TCC illustrates the appropriate performance of the Bergen Shopping Addiction Scale as a scale, with risk of shopping addiction increasing as scores increase. The TIF illustrates the conditional effect of the standard error of measurement (SEM; dotted line) on reliability indices, with increased reliability for reduced SEM

Considering raw BSAS scores, the SSEAP [θ|x] identified scaled scores of 5 = 0SD, 14 = +1SD, and 23 = +2SD based on participants’ responses to all seven BSAS items (Table 5). Therefore, a score of 23 could be recommended as a conditional diagnostic cut-off point (prior to clinical assessment confirmation). Based on this cut-off point for risk of shopping addiction, 8% of participants (n = 75) in the sample exceeded it with no significant differences between males and females (χ2[1] = 0.289, p = 0.519). Additionally, raw BSAS scores between 14 and 23 could be used to identify medium risk of shopping addiction.
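The proposed raw-score bands can be applied with a small helper such as the one sketched below. The band labels and the treatment of a score of exactly 23 as high risk are assumptions made for illustration; any such classification would remain provisional until confirmed by clinical assessment.

```python
def risk_band(raw_score, medium_cutoff=14, high_cutoff=23):
    """Classify a BSAS raw score (0-28) into the risk bands proposed in the text."""
    if not 0 <= raw_score <= 28:
        raise ValueError("BSAS raw scores range from 0 to 28")
    if raw_score >= high_cutoff:  # assumption: the cut-off score itself counts as high risk
        return "high"
    if raw_score >= medium_cutoff:
        return "medium"
    return "low"

# Share of the sample above the cut-off, from the counts reported in the text.
prevalence = 75 / 968

print(risk_band(9), risk_band(17), risk_band(25))
print(round(100 * prevalence, 1))
```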

Table 5 Summed Bergen Shopping Addiction Scale score to scale score conversion based on expected a posteriori distribution

Discussion

The present study used IRT to (i) investigate the psychometric properties of the BSAS, (ii) assess its differential functioning across males and females, and (iii) estimate the proposed cut-off score for risk of shopping addiction in an adult English-speaking sample. The results demonstrated the BSAS to be a unidimensional measure for the risk of shopping addiction. All seven items showed sufficient discrimination (α), difficulty (β), and precision, indicating that the BSAS is a psychometrically sound instrument. Additionally, while six items assessed the risk of shopping addiction among males and females similarly, Item 2 (mood modification) required significantly higher latent trait levels in males to endorse the item. Finally, a BSAS score of 23 was identified as a proposed cut-off for risk of shopping addiction, with 8% of participants exceeding it, and no significant differences were observed between male and female prevalence rates.

BSAS Structure

In line with previous studies, the BSAS demonstrated a unidimensional factorial structure and good psychometric properties (Fig. 1; Andreassen et al., 2015). The seven-item BSAS is based upon the components model of addiction (salience, mood modification, tolerance, withdrawal, conflict, and relapse) with the additional inclusion of “presenting problems” (Griffiths, 2005). As such, this instrument represents each component with one item, allowing it to maintain its theoretical basis while providing a practical and succinct instrument (Voss et al., 2013). Given the current debate in the field of behavioral addictions, it is important to employ sound theoretical models with clinically identifiable symptoms to contribute to a cohesive body of empirical evidence supporting the recognition of diagnoses and/or disorders such as PSB.

IRT Properties and DIF Statistics

Considering the BSAS, the risk of shopping addiction rose sharply as the total score increased, demonstrating a positive relationship between BSAS scores and shopping addiction presentation. Thus, the BSAS can be an adequate instrument to measure the risk of shopping addiction among individuals with differing levels of problematic shopping presentation. Moreover, all IRT parameters showed interesting variability across items when considering different levels of shopping addiction risk. In line with previous literature, variations in α indicated that the GRM provided the optimal solution to fit the data (Marmara et al., 2021; Zarate et al., 2021). Much like previous literature investigating behavioral addictions, tolerance demonstrated the highest α, highlighting the item’s ability to detect subtle changes in the risk of shopping addiction (Gomez et al., 2019; Kircaburun et al., 2020; Primi et al., 2021). For example, Gomez and colleagues (2019) indicated that components relating to tolerance often show higher discrimination power concerning disordered gaming. Tolerance is characterized by a progressively higher engagement in the problematic behavior over time to derive the same pleasure or satisfaction as originally felt when engaging in the behavior, and it may lead to addiction or disordered behaviors (James & Jowza, 2019). Therefore, clinical questions related to tolerance may be prioritized.

Considering β, all items showed a gradual increase between the first and last point of the Likert scale. However, the sequence of β changed depending on the items’ threshold. For example, while salience showed the lowest β1 (strongly disagree), mood modification showed the lowest β4 (strongly agree). Mood modification showed a combined low α and β compared to other items suggesting that this item may indicate less severe problems and may not accurately detect changes in risk of shopping addiction. In line with the self-medication model of addiction and the Interaction of Person-Affect-Cognition-Execution (I-PACE; Brand et al., 2016), mood modification may result in individuals engaging in disordered shopping as a maladaptive coping mechanism and to actively seek physiological stimulation with direct and observable effects on mood state and reducing psychological distress (Kovacs et al., 2022). However, results of the present study suggest that mood modification may initially attract individuals to engage in the disordered behavior and may not represent chronic addiction-like symptoms or severe problematic behavior (such as tolerance). Moreover, in line with previous behavioral addiction studies (Gomez et al., 2019; Lin et al., 2017; Primi et al., 2021), withdrawal showed the highest β4, indicating that this item could be indicative of severe risk of shopping addiction. Interestingly, presenting problems showed both high α and β, suggesting that this newly added item is useful in detecting different levels of risk of shopping addiction.

In addition, DIF statistics confirmed that Items 1 and 3–7 assess shopping addiction risk in the same way for males and females. However, endorsing β thresholds in Item 2 (mood modification) required a significantly higher risk of shopping addiction among males. In other words, endorsing the mood modification item may indicate a less severe risk of shopping addiction among females than among males, suggesting that females may be more prone to engage in shopping activities to modify their mood. While previous research has reported mixed findings concerning PSB prevalence rates across males and females (Maraz et al., 2016), theoretical perspectives suggest that females may be conditioned to engage more frequently with shopping activities to modify their mood or stress levels than males (Dittmar, 2005). However, this assertion should be approached with caution considering that gender constructs are being constantly challenged, producing fundamental changes in the social fabric of Western societies (Van Droogenbroeck & Van Hove, 2020).

Item and Scale Precision

Considerable variations in precision were observed across BSAS items. More specifically, conflict, tolerance, and presenting problems increased precision between −0.5SD and +2.5SD. Conversely, salience, mood modification, and withdrawal demonstrated limited precision across latent trait levels, suggesting that they may be less accurate compared to other items. Additionally, none of the items provided sufficient information to reliably identify individuals with significantly low levels of shopping addiction (−3SD to −2SD). Indeed, the test information function (TIF, Fig. 4 right panel) demonstrated a marked decrease in precision at the scale level for these trait levels, reflecting the items’ behavior. Nonetheless, the scale provides excellent precision between −0.5SD and +2.5SD, suggesting that the BSAS is an accurate and reliable instrument to capture the risk of shopping addiction within this range.

Cut-off Scores and Prevalence of Risk of Shopping Addiction

Based on the present sample, raw BSAS scores’ translation into scaled scores indicated that a cut-off of 23 or above (out of 28) represents scores +2SD above the mean, and thus indicates a high risk of shopping addiction (Embretson & Reise, 2013; Thissen, 1995). Accordingly, following this suggested cut-off score, 8% of participants (n = 75) were considered at risk of shopping addiction. Additionally, respondents recording BSAS raw scores between 14 and 23 are suggested to be at medium risk of shopping addiction (+1SD to +2SD), and those scoring below 14 are less likely to experience the risk of shopping addiction. Prevalence rates showed no significant differences between males and females, suggesting a possible bias effect due to social desirability (Biolcati, 2017).

Limitations, Further Research, and Conclusion

Despite robust findings, there are several limitations in the present study. Firstly, the findings here may not be generalizable to other cultures or languages given that the sample used in this study only comprised English-speaking participants. Secondly, the convenience sampling used to recruit participants may have attracted individuals from the online community and, therefore, may not represent the larger community. This may explain the BSAS’ limited functionality in extremely low scores below the mean. Thirdly, the self-reporting nature of the scale may have enabled social desirability to operate as a confounding factor attenuating potential differences between males and females (Fisher & Katz, 2000). Fourthly, considering that the recruited sample had a large percentage of male participants, further studies with more balanced samples may be needed to replicate the preliminary findings reported here. These limitations may be addressed in future research. Additionally, it may be interesting to investigate shopping addiction by age, as different age groups are likely to have different propensities to develop such behavior.

Despite these weaknesses, the present study provides further evidence of the seven-item BSAS as a valuable and psychometrically sound instrument for assessing the risk of shopping addiction. Overall, the findings observed here demonstrate meaningful differences in item discrimination, difficulty, and precision, which can be used to assess the risk of shopping addiction. Considering IRT item parameters, tolerance (Item 4) appears to be the item with the highest discrimination power, while mood modification (Item 2) appears to perform differently across the two genders.