Introduction

Over the past few decades, the term resilience has been used in many disciplines, generating several theories and definitions. As a psychological construct, resilience has been highly valued due to its close relationship with the ability to moderate the negative effects of stress and promote adaptation to the environment under adverse circumstances (Ahern et al., 2006; Ng Deep & Leal, 2012; Wagnild & Young, 1993). It denotes the capacity for positively facing adversity and “bouncing back” (Windle et al., 2011, p. 2) from a perspective of health and well-being promotion, as well as of quality of life (Ng Deep & Leal, 2012).

Some authors have described that resilient individuals possess self-esteem, self-confidence, belief in self-efficacy, and control over the environment, which enables them to succeed in spite of stressors (Beardslee, 1989; Caplan, 1990; Rutter, 1987; Wagnild & Young, 1993). It is, however, important to note that for some authors resilience is considered a transactional process mediated between the person and the environment, and while this interaction keeps changing throughout life, so does the individual’s ability to be resilient (Pinheiro et al., 2015; Reppold et al., 2012; Windle, 2010). Resilience represents the mitigation of risk factors and the enhancement of protective factors, as well as the interaction between the two (Ahern et al., 2006). It is a dynamic process with a multidimensional nature, and, as such, there has long been a question of whether those dimensions are a product of underlying personality traits which give rise to resilience or whether resilience is itself a distinct trait or state.

The debate of “state vs. trait” according to the Resilience Scale (RS) authors (Wagnild & Young, 1993) is difficult to resolve due to the complex nature of resilience. According to interdisciplinary studies involving biosciences and behavioral sciences, there is a contribution of genetics as well as interactions with the environment that play a role in building resilience (Cicchetti & Blender, 2006; Feder et al., 2009; Haglund et al., 2007; Plomin & Spinath, 2004). Based on this reasoning, the authors believe that resilience is a result of both state and trait and, as such, resilience can be strengthened (Wagnild, 2009). Part of the problem with this argument is that it circumvents any specific details of what characteristics (in concrete psychological terms) help to build resilience. This is reflected in the confusing terminology used in the measure (more details in the overview section), which deviates from the terms used to describe it in the literature that most of the work in this area has been based upon.

The origins of the concept of resilience can be found in two main bodies of literature: the physiological aspects of stress and the psychological aspects of coping (Tusaie & Dyer, 2004). Resilience has evolved from a variety of earlier concepts including hardiness (e.g., Kobasa, 1979), adaptability to change (e.g., Rutter, 1985), and the concept of ego-resilience incorporated in early personality inventories such as the MMPI (Hathaway & McKinley, 1943). Perhaps the core constructs which typically emerge across definitions are self-efficacy, adaptability, and problem-solving; none of which are used to describe the proposed dimensions within the RS. Applying new labels to pre-existing constructs is generally unhelpful in psychological measurement. For these to be acceptable in measurement terms, they must be empirically based.

It is evident that interest in the concept of resilience is growing. However, due to the recognized complexity of the construct and the lack of consensus among researchers on its definition and measurement, it has been a challenge to develop a single operational definition of resilience (Luthar & Cicchetti, 2000; Luthar et al., 2000; Wagnild, 2009; Windle et al., 2011). To tackle this issue, authors and work programs have conducted reviews of the literature and concept analyses to provide a benchmark for operationalizing and measuring the concept (Windle et al., 2011).

The ability to measure the mind has been a subject of research and debates within the field of psychology for many years (Thurstone, 1928). Psychometric tests are, essentially, a standard and scientific method used by professionals to measure individuals’ mental capabilities, behavioral styles, attitudes, and beliefs. These are grounded in classical and modern measurement theory (e.g., Kline, 1986, 2000) which requires that they meet a scientific standard in terms of measurement properties (reliability), but that they also have a clear basis in robust theory that supports the construct (validity). And, as the need for reliable and valid instruments to assess resilience increased, so did the need to ensure data quality (Ahern et al., 2006). One way to warrant this quality is to exclusively use measures that have undergone a validation procedure, demonstrating that they accurately measure the intended construct, independently of who responds, when they do it, and to whom (Windle et al., 2011). The items should also reflect the concepts and theory they are proposing to measure and the meaning must be instantiated within the test items (e.g., Barrett, 2005; McGrath, 2005).

A number of tests have been developed to assess resilience, and these have been informed by the same literature exploring characteristics of resilient people, with an emphasis on Rutter’s (1985) work. In Rutter’s view, resilience is the capacity to ‘resist’ psychiatric disorder, and so it includes protective factors such as psychological traits, social organization and dynamics surrounding an individual. This acknowledges that resilience is a product of a person’s underlying dispositions, mediated by learning and interpersonal situations which precipitate resilient responses. Any test which purports to measure resilience needs to be grounded in a theoretical framework of this kind in order to be valid.

Although the focus of this paper is the Resilience Scale (Wagnild & Young, 1993), several subsequent scales have been created over the years to measure resilience. Ahern et al. (2006) conducted a review of instruments measuring resilience with specific application to adolescent populations. Using inclusion and exclusion criteria, six psychometric instruments underwent a full review (Baruth & Carroll, 2002; Connor & Davidson, 2003; Friborg et al., 2003; Oshio et al., 2003; Sinclair & Wallston, 2004; Wagnild & Young, 1993). While the authors reported that all six instruments had limitations in terms of psychometric properties, they determined the RS to be the best instrument to study resilience in the adolescent population due to its psychometric properties and applications in a range of age groups (Ahern et al., 2006). The Adolescent Resilience Scale (ARS; Oshio et al., 2003), the Connor–Davidson Resilience Scale (CD-RISC; Connor & Davidson, 2003) and the Resilience Scale for Adults (RSA; Friborg et al., 2003) also demonstrated acceptable credibility but needed further application with adolescent populations (Ahern et al., 2006). The Baruth Protective Factors Inventory (BPFI; Baruth & Carroll, 2002) and the Brief-Resilient Coping Scale (BRCS; Sinclair & Wallston, 2004) lacked evidence of appropriateness for administration to the adolescent population (Ahern et al., 2006).

Windle et al. (2011) conducted a methodological review of resilience measurement scales developed for use in general and clinical populations. They included 15 measures of resilience, including their own, and criticized the review conducted by Ahern et al. (2006) because, in their view, it lacked explicit criteria for defining good measurement properties (Windle et al., 2011). In order to address this flaw, Windle et al. (2011) used published quality assessment criteria (Terwee et al., 2007) to score each measure’s psychometric properties. The criteria addressed content, criterion and construct validity, internal consistency, reproducibility, responsiveness, interpretability, as well as floor and ceiling effects (Terwee et al., 2007). Like Ahern et al. (2006), Windle et al. (2011) found it difficult to assess the measures because they all had missing information regarding psychometric properties. However, based on their review, the CD-RISC, the RSA, and the Brief Resilience Scale (Smith et al., 2008) received the highest ratings, obtaining a rating of moderately good (Windle et al., 2011). The RS was also well positioned within the quality assessment ratings, achieving the maximum score for content validity and construct validity and acceptable scores for internal consistency and interpretability (Windle et al., 2011).

Notwithstanding the above, none of the tests scored highly in the quality assessment, suggesting that there may not be a gold standard against which to judge the quality of resilience scales. It is, however, important to note that there were substantial omissions in the ratings despite available evidence. Given this, and the fact that anywhere from one to twelve dimensions were suggested across measures as the basis for resilience, it seems unlikely that content or construct validity has been established in a sufficiently meaningful way to enable this to be scored within a quality assessment.

The aim of this paper is to critically evaluate the Resilience Scale. It will achieve this by first providing an overview of the measure with reference to how the measure was developed, including some initial critique points. The review will then progress to focus more specifically on the psychometric properties of the measure. This will include information regarding the level of measurement, the self-report nature of the measure, and the norms and populations used in the development of the measure. The reliability and validity of the measure will then be discussed. The article will conclude by providing suggestions regarding the clinical application of the measure with reference to the limitations outlined in the critique. Recommendations for further research will then be made.

Overview of the Resilience Scale

The Resilience Scale is a 25-item self-report assessment tool published in 1993 that measures the degree of individual resilience (Wagnild, 2009, 2017; Wagnild & Young, 1993). The scale was originally developed based on a qualitative study conducted in 1987 with a sample of 24 older women who had adapted successfully following a major life event and a qualitative study of 39 caregivers of spouses with Alzheimer’s disease (Wagnild, 2016; Wagnild & Young, 1993). The authors then proceeded to validate and clarify the construct of resilience through a comprehensive review of the literature available on the topic at the time and designed an initial RS, which consisted of 50 items; each a verbatim statement from their study (Wagnild & Young, 1990). The RS was reviewed and analyzed by two psychometricians and two nurse researchers prior to further testing, and this resulted in some changes in the wording of the items (Wagnild & Young, 1993). The authors then reduced the scale to 25 items, deemed representative of five interrelated components of resilience. The RS was further developed to focus on positive psychological characteristics instead of deficits (Wagnild, 2016). This part of the process was, arguably, glossed over as a psychometrically valid one, when in reality, it was fairly qualitative. It is not a typical psychometric approach to determine the structure a priori, but rather to analyze the data factorially to determine any underlying factors. Reducing each proposed factor to five items also greatly weakens the potential scale as there may be insufficient bandwidth fidelity (Cronbach & Gleser, 1957) to adequately measure the constructs.

Five characteristics have been proposed for underpinning the RS, namely, equanimity, perseverance, meaningfulness, self-reliance, and existential aloneness (Wagnild & Young, 1993). Equanimity represents a balanced perspective that people can have on their lives and experiences and implies the ability to “sit loose and take what comes”, consequently regulating extreme reactions to adversity (Beardslee, 1989; Wagnild & Young, 1993, p. 167), often with a sense of humor (Wagnild, 2009). Perseverance represents the act of persistence despite hardship or discouragement, implying a willingness to remain involved, keep going and continue the struggle to rebuild one’s life despite setbacks (Wagnild, 2009). Meaningfulness characterizes the realization that there is something to live for — a life purpose (Caplan, 1990; Wagnild, 2009). Self-reliance denotes the capacity people have to believe in themselves and their capabilities; being able to depend on themselves while recognizing their strengths and limitations (Caplan, 1990; Wagnild & Young, 1993). Existential aloneness (subsequently labelled authenticity) characterizes a sense of individuality and the awareness that each person is unique and that people have to go through some experiences by themselves, even if other experiences can be shared. This characteristic also denotes a sense of freedom (Wagnild, 2009; Wagnild & Young, 1990, 1993).

The authors’ five underlying components, although reasonable, essentially redefined constructs already used in the literature to describe the phenomenon. At its core, resilience is a product of emotional and cognitive fortitude, although the specific psychological features which usefully describe these characteristics could be labelled in a variety of alternative ways, including self-efficacy, which is far more tangible.

The pilot form of the RS was pretested for readability, clarity of items, initial reliability, and specificity of directions in a sample of 39 undergraduate nursing students, achieving an internal consistency reliability coefficient of 0.89 (Wagnild & Young, 1993). Five other studies were conducted using the RS, the results of which supported the internal consistency and test–retest reliabilities as well as construct and concurrent validity of the scale, prior to the development and validation study (results of these are available in the psychometric properties section) (Wagnild & Young, 1993). The scale was made available in 1988 and was developed to be used not only with female participants but also with male participants and across a range of ages (Lundman et al., 2007; Wagnild & Young, 1990).

In 1993, the scale was further tested on a large sample of middle-aged and older adults (n = 810) in the Pacific Northwest, obtaining a 54% response rate after the authors mailed 1500 surveys (Wagnild & Young, 1993). In addition to the RS and sociodemographic questions, participants were asked to complete measures of life satisfaction, morale, and depression, and to provide a self-report of health status ranging from poor to excellent (Wagnild, 2016; Wagnild & Young, 1993). The sample ranged in age from 53 to 95 years with a mean of 71.1 years (SD = 6.5) (Wagnild & Young, 1993). The majority of the participants were female (62.3%), 61.2% were married, 66.2% educated beyond high school, 79% were retired, 59.4% lived with a spouse, and about 47% reported very good to excellent health (Wagnild, 2016; Wagnild & Young, 1993).

During the development and evaluation study of the scale, principal component analysis (PCA) was conducted followed by oblimin rotation and Kaiser normalization (Wagnild & Young, 1993). Even though the theoretical definitions of resilience supported a multidimensional construct and the RS items were selected to reflect the five characteristics, the PCA suggested a substantial primary factor underlying the data and the scree test criterion resulted in a two-factor solution (Wagnild & Young, 1993). This was most likely an artefact of the approach given that such analyses tend to be overinclusive. This would, inevitably, lead to a mass of shared variance suggesting a single component. The resultant main factor was, thus, a heterogeneous mass of contributory features which span a range of conceptual themes. This does not mean that there are not a number of more precise constructs underpinning resilience, but rather that they have not been adequately tapped within the test. There were simply insufficient numbers of items to resolve multiple factors using a restrictive linear model such as PCA, and the authors could, for example, have applied multidimensional scaling as an alternative. A further difficulty arises from the fact that when higher factor solutions were examined to test the five-factor model, it was clear, by the authors’ own admission (i.e., Wagnild & Young, 1993), that there was substantial cross loading, indicating a high level of ambiguity within some of the items. This is problematic and is evident from the item content, given that items in equanimity (e.g., “I usually do not dwell on things that I can’t do anything about”), perseverance (e.g., “when I make plans, I follow through with them”), and authenticity (e.g., “I am able to depend on myself more than anyone else”) could all just be viewed as self-efficacy. Given that this suggests a slightly repetitious test, it would also explain the high reliability, which could be an artefact.
Therefore, despite suggesting a two-factor solution, the RS continued to be presented as a unitary construct measured as a single scale. Arguably there were two potential scales with differential meanings that warranted different scores. Conceptually, there could have potentially been even more. Indeed, in other studies where the RS was adapted and tested, authors found five-factor solutions using confirmatory factor analysis (e.g., Konaszewski et al., 2021) as well as using explorative factor analyses (e.g., Lundman et al., 2007).
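The point that overlapping items tend to produce one dominant component can be illustrated with a minimal sketch. It uses simulated data, not the RS validation data, and assumes 25 items that all load 0.6 on a single latent factor; the function names, sample size, loading, and seed are hypothetical choices made only for illustration.

```python
import numpy as np


def pca_eigenvalues(responses):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(responses, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order for symmetric matrices
    return np.linalg.eigvalsh(corr)[::-1]


def simulate_one_factor(n_people=500, n_items=25, loading=0.6, seed=0):
    """Simulated item scores sharing one latent factor (assumed model)."""
    rng = np.random.default_rng(seed)
    factor = rng.normal(size=(n_people, 1))
    noise = rng.normal(size=(n_people, n_items))
    return loading * factor + np.sqrt(1 - loading**2) * noise


data = simulate_one_factor()
eigvals = pca_eigenvalues(data)

# Proportion of total variance attributed to each component
explained = eigvals / eigvals.sum()

print("first five eigenvalues:", np.round(eigvals[:5], 2))
print("variance explained by first component:", round(float(explained[0]), 3))
```

In data of this kind the first eigenvalue dwarfs the rest, so a scree plot would show one steep drop followed by a flat tail. This does not show that resilience is unidimensional; it only shows what a principal component analysis returns when every item shares a large common source of variance.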

Factors I and II contained factor loadings at 0.40 or higher, explaining a total of 44.0% of the variance (Wagnild & Young, 1993). Factor I, labeled Personal Competence, comprised 17 items and suggested self-reliance, determination, independence, mastery, invincibility, resourcefulness, and perseverance. Factor II, labeled Acceptance of Self and Life, encompassed 8 items and suggested adaptability, flexibility, balance and a well-adjusted perspective of life. Both factors reflected, according to the authors, definitions of resilience and provided support to the construct validity of the scale. Arguably, both factors could have been labeled in various ways and the list of suggested constructs does not appear to be supported by the item content (e.g., invincibility). Subsequent analyses (albeit not their own) suggested that the RS items constitute a unitary construct (Wagnild, 2016). It would appear that the psychometric evaluation did not support the constructs in the way the authors proposed. Given that resilience is the product of various internal and external influences, it is not surprising that a factor analysis would yield one general factor which likely reflected the core of resilience, while the second factor could be more of a trait construct showing elements of both extraversion and low neuroticism. This would be aligned with a number of studies that have shown resilience to be fundamentally related to these two core features of personality (e.g., Oshio et al., 2018). The items and their factor loadings can be seen in Table 1.

Table 1 Resilience Scale factors with item loadings

The RS is applicable to almost any age group with a number of studies containing participants ranging from adolescents to the elderly (Wagnild, 2016). For example, in a recent study (Shi et al., 2021), the RS was considered to be a reliable and valid tool for assessing resilience among adolescent survivors after disasters. Moreover, Cosco et al. (2016) suggested, in their systematic review, that the RS was a suitable measure for use in older adults based on the evidence gathered. According to the author of the scale, it is essential to have the Resilience Scale User’s Guide, which can be acquired when people who want to use the scale purchase a licensing agreement (Wagnild, 2017).

The Resilience Scale has been translated and validated in several languages, including, but not limited to, Chinese, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Tamil, Turkish, and Urdu. Since 2006, more than 6000 researchers have requested permission to use the RS, administering it to over 3 million people around the world in 150 countries, making it the most widely used resilience measure (Wagnild, 2016, 2017). According to the author, at the time that the RS was developed there was no validated resilience measure, so the RS was the first scale to measure the resilience construct (Wagnild, 2017).

A further 14-item version of the Resilience Scale (RS14) was developed to fulfil the preferences of some researchers for shorter instruments in order to reduce participant burden and increase response (Wagnild, 2017). After conducting a series of large surveys, the creator of the tool made the instrument available as this proved to be strongly correlated (r = 0.97, p < 0.001) with the original Resilience Scale (Wagnild, 2017). The author also developed the True Resilience Scale for Children (RS10), which is intended to measure individual resilience in children aged 7–11 years (Wagnild, 2017). While these two adapted versions have been used by practitioners, they have received less attention from researchers in terms of the reliability and validity of the scales. However, it can be argued that as these scales are both developed from the original RS, the drawing together of findings relating to the reliability and validity of the RS (as provided in the current review) will also be potentially relevant to the RS14 and RS10. The difficulty in developing even shorter versions of tests which already have an uncertain number of dimensions is that the actual quality of the measurement is reduced regardless of whether or not it correlates well. It would be of interest to look at differences and similarities around the reliability and validity of the three scales (i.e., RS, RS14, and RS10); however, this is not within the remit of the current paper and, as such, these two versions of the scale will not be examined. The following section will examine the psychometric properties of the RS in more depth, particularly in terms of its level of measurement, the self-report nature of the measure, and the norms on which it was based. The reliability and validity of the measure will then be discussed in more detail.

Psychometric Properties

Kline (1986, 2000) suggests that a good psychological test requires the following characteristics: 1) needs at least an interval scale (although this is not always achievable within psychological measurement as scores often represent a construct that is ordinal), 2) needs to be reliable, 3) needs to be valid, 4) needs to be discriminating, and 5) needs to have appropriate normative data. Essentially, the test should measure the intended construct both accurately and steadily (Kline, 1986).

Level of Measurement

The level of measurement used in the RS is inevitably ordinal given that psychological measurement can only approximate true measurement, but item format and scoring are potentially problematic. Using a 7-point Likert item, participants are required to rate how much they agree or disagree with the statements and how much they identify with them. The response options are as follows: 1 = strongly disagree; 2 = mostly disagree; 3 = somewhat disagree; 4 = neither agree nor disagree; 5 = somewhat agree; 6 = mostly agree; and 7 = strongly agree. All items are worded positively and possible scores range from 25 to 175, with higher scores reflecting higher levels of resilience (Wagnild & Young, 1993). Therefore, numerical differences between participants are possible to establish, benefiting the analysis of data (Field, 2009). This is, however, slightly atypical for Likert scales, which are normally averaged instead of using the total and are better suited to attitudinal measurement rather than state or trait constructs. After repeated administrations of the RS with different samples, it was concluded that scores greater than 145 indicate moderately high to high levels of resilience, 121–145 indicate moderately low to moderate levels of resilience, and scores of 120 and below indicate low levels of resilience (Wagnild, 2009). It is not entirely clear how the authors reached these cut-offs, except that they based them loosely around the mean. The scoring clearly needs some consideration, given that the 1 to 3 responses are all simply disagreeing, and it is almost unimportant to what extent people disagree. It would have made sense to have had an ordinal response of 1 disagree, 2 slightly agree, and 3 strongly agree. Given this, the scores only become meaningful when respondents have scored at least 5 on every item, with the somewhat problematic “neither agree nor disagree” represented by 4.
Essentially, scores in the range of 25 to 125 are almost meaningless as this is a great range to have to denote low resilience. While 145 and above is suggested as denoting high resilience, the mean is approximately 145 (SD 15), so, by definition, this is the average score. The SD should be considered when denoting high and low scores, which would suggest that anything less than 130 is low, 130 to 160 is average, and only scores above 160 are probably high. All of this has implications for the way the scores are interpreted. Moreover, the issue with using Likert items for personal attributes is that they tend not to work as well given the item content. Typically, Likert items have been used to assess attitudes wherein a person can disagree or agree with a particular viewpoint. When the statements are characteristics, the endorsement does not always fit with agreement versus disagreement. Rather, people would typically answer in terms of never, sometimes, regularly, always. Considering the items “I follow through with plans” or “I have self-discipline”, it would likely be challenging for people to answer these by agreeing or disagreeing, which could contribute to the responses being skewed.
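The difference between the published cut-offs and bands based on the standard deviation can be made concrete with a short sketch. The total score and the published bands follow the description above (Wagnild, 2009); the SD-based bands assume a mean of roughly 145 and an SD of 15, as stated in this critique. The function names are hypothetical and are not part of the RS or its user's guide.

```python
def rs_total(item_responses):
    """Sum 25 responses on the 1-7 scale (possible range 25-175)."""
    if len(item_responses) != 25:
        raise ValueError("the RS has 25 items")
    if any(r not in range(1, 8) for r in item_responses):
        raise ValueError("each response must be an integer from 1 to 7")
    return sum(item_responses)


def published_band(total):
    """Bands as reported in the text (Wagnild, 2009)."""
    if total > 145:
        return "moderately high to high"
    if total >= 121:
        return "moderately low to moderate"
    return "low"


def sd_band(total, mean=145.0, sd=15.0):
    """Bands one SD either side of an assumed mean of 145 (SD 15)."""
    if total < mean - sd:
        return "low"
    if total <= mean + sd:
        return "average"
    return "high"


# A respondent answering "mostly agree" (6) to every item
total = rs_total([6] * 25)
print(total, published_band(total), sd_band(total))
```

A respondent who answers "mostly agree" throughout totals 150, which the published cut-offs place in the highest band but which sits within one SD of the assumed mean. A respondent who answers "neither agree nor disagree" throughout totals 100 and is classed as low under both schemes.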

Self-Report

The RS is a self-report assessment, which simplifies the administration of the scale. However, this can result in limitations to the instrument such as response set bias, especially social desirability and acquiescence (Paulhus, 1984, 2017). Wagnild acknowledges that responses to the RS tend to be negatively skewed, with most participants scoring in the upper range of the scale (i.e., the maximum achievable is 175, and the average for most samples is between 140 and 148). Moreover, it is acknowledged that the most desirable/adequate responses to the RS may be obvious to most participants. Arguably, these limitations may not be so much due to social desirability as to a flaw in the item response format, given that 1 to 3 are all disagreement, and for the items in the test it appears that the levels of disagreement are basically viewed the same by respondents with no clear discrimination between the lower levels. There are likely no plausible degrees of disagreement for the test items, and half the Likert response scale could be considered redundant. The other issue is that all items are worded positively and keyed in the same direction, which means the scale is particularly vulnerable to the effects of an acquiescence response bias (Paulhus, 2017; Wagnild, 2009).

In order to overcome these biases, Wagnild (2009) suggests the rewording of statements and negatively keying some of the current items. Wagnild further advises that revising the current response set to enable a forced-choice format might also minimize some of these response biases. For example, instead of allowing seven options (including a neutral response), there could be only four possible options to each statement, thus forcing the participant to endorse one side only of a particular statement. Lastly, it is necessary to ensure anonymity to reduce the likelihood of social desirability bias occurring (Wagnild, 2009).
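The effect of negatively keying some items can be sketched as follows. This is a hypothetical illustration only: the published RS has no negatively keyed items, and the choice of which item positions to reverse, as well as the function names, are assumptions made for the example.

```python
def reverse_key(response, scale_min=1, scale_max=7):
    """Reverse a response on a 1-7 scale, so 7 becomes 1 and 4 stays 4."""
    return scale_max + scale_min - response


def score_with_reversed_items(responses, reversed_positions):
    """Total score after reversing the items at the given zero-based positions."""
    return sum(
        reverse_key(r) if i in reversed_positions else r
        for i, r in enumerate(responses)
    )


# A purely acquiescent respondent who answers "strongly agree" to everything
yea_sayer = [7] * 25

all_positive = score_with_reversed_items(yea_sayer, set())
half_reversed = score_with_reversed_items(yea_sayer, set(range(12)))
print(all_positive, half_reversed)
```

With every item keyed in the same direction, indiscriminate agreement yields the maximum total of 175. If twelve items were reworded and reverse scored, the same response pattern would total 103, so acquiescence would no longer pass for high resilience.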

Norms and Populations

In the development of psychometric measures, normative populations or references are useful for researchers and practitioners to interpret the meaning of the individual scores. Moreover, the norms describe the range of scores that should be expected from the population being tested (Kline, 2000). Without norms, the interpretation at individual and group levels becomes meaningless (Kline, 2000).

While the RS was not norm referenced per se, the authors state that it can be seen as norm referenced “…in that it measures individual resilience in such a way that there is discrimination among the level of resilience among the subjects” (Scoloveno, 2017, p. 3). In addition, Wagnild and Young (1993) note that norms can be obtained by comparing individual scores with mean scores. Mean scores of RS items are available in the user’s guide, as well as a detailed analysis of samples divided into those scoring low and high on the RS.

In a review of 12 studies using the RS, the author of the scale concluded that the scale had been used with a variety of age groups ranging from adolescents to the elderly (16 to 103 years old) (Wagnild, 2009). She reported that, in all studies, there were no age-related differences in RS scores and that the predominant group being studied was European American, highlighting the need to study the RS with respect to race and ethnicity (Wagnild, 2009).

In the reviewed studies, the sample with the lowest average RS score (average score = 111.9) was homeless adolescents (Rew et al., 2001). Average RS scores for other samples in the same review were moderate to moderately high with most scores ranging from 140 to 148 (Wagnild, 2009). The studies included in this review were conducted in the USA, Canada, Australia, Sweden, and Germany and did not present particular mean score differences based on the countries. Studies that report the adaptation of the RS to other languages and cultures have also presented mean scores providing an idea of the norms for the specific countries (e.g., Felgueiras et al., 2010). This information can be found in the specific articles pertaining to the adaptation, rather than as a whole in the RS user’s guide.

The author presents an example of the RS results and mean scores of a study (n = 1061) conducted over nine months in 2009 and 2010 through the RS website (www.resiliencescale.com) (Wagnild, 2016). The mean RS score was 135.5, which, in comparison to other studies, falls in the range of moderately low to moderate levels of resilience. The average age of participants was approximately 36 years, with the majority (64%) between 20 and 40 years of age (Wagnild, 2016). Furthermore, 77% of the participants were female and only 23% were male (Wagnild, 2016). A summary of the results can be seen in Table 2.

Table 2 Resilience Scale results

Reliability

Internal Consistency

Internal consistency is defined as the degree to which scores or answers are free from random error, implying homogeneity of content in tests with many items and internal consistency among the responses to test items (SAC-MOS, 2002). Cronbach’s alpha coefficient estimates the reliability of a measure based on its internal consistency (Kline, 2000). Accepted minimal standards for reliability coefficients are 0.70 for group comparisons and 0.90–0.95 for individual comparisons (SAC-MOS, 2002).
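For readers less familiar with the coefficient, a minimal sketch of the standard computation is given below: alpha equals k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score, where k is the number of items. The function name and the small data set are illustrative assumptions and are not taken from any RS study.

```python
import numpy as np


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items array of scores."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]                          # number of items
    item_vars = x.var(axis=0, ddof=1)       # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Illustrative responses: five respondents, four items on a 1-7 scale
example = [
    [6, 7, 6, 5],
    [5, 5, 6, 5],
    [7, 7, 7, 6],
    [3, 4, 3, 4],
    [6, 6, 5, 6],
]
print(round(float(cronbach_alpha(example)), 3))
```

Because alpha rises with both the number of items and the average inter-item correlation, a long scale with similarly worded items can reach a high value without this demonstrating that a single well-defined construct is being measured.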

The five studies using the RS conducted prior to the validation and evaluation study supported the internal consistency of the scale achieving respectable reliability with Cronbach’s alpha coefficients from 0.76 to 0.90 (these can be seen in Table 3).

Table 3 Coefficient alpha for the Resilience Scale (preliminary studies)

Although preliminary studies supported the reliability of the RS, the validation study (Wagnild & Young, 1993) was a necessary next step to explore the psychometric properties of the scale. In this study, the scale achieved a high internal consistency with a coefficient alpha of 0.91. Item-to-total correlations ranged from 0.37 to 0.75, with the majority falling between 0.50 and 0.70, and all were significant at p < 0.001 (Wagnild & Young, 1993).
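Item-to-total correlations of the kind reported above can be sketched as follows. This is an illustrative sketch under stated assumptions: the source does not say whether the authors used corrected correlations (item removed from the total) or uncorrected ones, so the hypothetical function `item_total_correlations` exposes both, and the data are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(rows, corrected=True):
    """Correlate each item with the scale total.

    corrected=True removes the item from the total before correlating
    (an assumption; the source does not state which variant was used).
    """
    k = len(rows[0])
    totals = [sum(r) for r in rows]
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [t - v for t, v in zip(totals, item)] if corrected else totals
        result.append(pearson_r(item, rest))
    return result


# Hypothetical responses (4 respondents x 3 items), for illustration only.
example_rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
correlations = item_total_correlations(example_rows)
```

The corrected variant is generally the more conservative choice, since an item cannot inflate its own correlation with the total.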

Over the years, many studies (including both genders, all ages and ethnic groups) have attested to the strong internal consistency of the RS (some examples can be seen in Table 4).

Table 4 Coefficient alpha for the Resilience Scale (newer studies)

Test–Retest Reliability

Test–retest reliability examines the consistency of the test over time by correlating the scores from a set of participants who take the test on two occasions (Kline, 2000). This type of reliability assumes that the quality and the construct measured will remain the same at both points in time (Kline, 2000).
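As a minimal sketch of this procedure, the test–retest coefficient is simply the Pearson correlation between the scores that the same participants obtain on the two occasions. The function name `retest_correlation` and the scores below are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt


def retest_correlation(first, second):
    """Pearson correlation between two administrations of the same test.

    first, second: total scores for the same participants, in the same order.
    """
    if len(first) != len(second) or len(first) < 3:
        raise ValueError("need paired scores for at least three participants")
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cov / sqrt(ss1 * ss2)


# Hypothetical total scores for five participants at two time points.
time_1 = [120, 135, 150, 160, 142]
time_2 = [118, 140, 148, 158, 139]
r = retest_correlation(time_1, time_2)
```

Note that a high coefficient shows that participants keep their relative ordering; a uniform shift in everyone's scores between occasions would leave it unchanged.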

At the time that the validation study (Wagnild & Young, 1993) was published, test–retest reliability of the RS was being assessed in an ongoing study of pregnant and post-partum women (Killien & Jarrett, 1993). Women in the study were administered the RS over an 18-month period, during pregnancy and at 1, 4, 8, and 12 months post-partum. The correlations ranged from 0.67 to 0.84 (p < 0.01), which were considered acceptable and suggestive of resilience being stable over time (Wagnild & Young, 1993).

The authors have not published more test–retest results, which suggests that this type of reliability needs further evaluation. Recommendations made by the author include longitudinal studies to measure how resilience changes over time (Wagnild, 2009). In spite of this, there have been some examinations of test–retest reliability in the RS when the scale was validated in different countries. For example, in their study with 215 participants using the RS, Felgueiras et al. (2010) conducted a test–retest correlation, in a sub-group of 30 participants, and achieved 0.72 (p < 0.001), showing good stability over time.

Validity

Face Validity

Face validity concerns the extent to which a test appears to be measuring what it claims to measure (Kline, 2000). Clear wording (designed to be easy to understand for the intended population to be tested) can improve the face validity of a test. In contrast, if items are too complex, participants may be discouraged and disengage from completing the measure (Kline, 2000).

Based on the extensive application of the RS, the author claims that the test is easy to use, is readable at the 6th-grade level (12–13 years), and takes most people only 5 to 7 min to complete (Wagnild, 2016, 2017). Moreover, the author reports that the scale has been used effectively, operationalizing the theoretical construct of resilience, with a range of samples from adolescents to the elderly, which suggests that people from different backgrounds and ages easily understand the items (Wagnild, 2016).

Content Validity

Content validity examines whether a measure includes all possible aspects pertaining to the construct under investigation (Windle et al., 2011). Kline (2000) suggests that content validity should be supported by concurrent validity.

The authors of the scale suggest that the RS possessed a priori content validity given that, during the construction of the items, they selected generally accepted definitions of resilience from the literature and drew definitions from interviews of persons who characterized the construct (Wagnild & Young, 1993). The authors reported that five themes were derived from these interviews and that these were further validated with research literature (Wagnild & Young, 1993). Finally, the authors recognize that the use of all positively worded items may have led to a response set bias; however, they opted not to change the statements as they were concerned that reversing the items would change their original meaning (Wagnild & Young, 1993).

Although Windle et al. (2011) awarded the maximum score for content validity to the RS in their methodological review of resilience measures, they criticized the fact that the authors did not outline the analytical approach they used to derive the five themes that serve as a foundation to their scale. Moreover, they criticized the fact that the authors claim they used generally accepted definitions of resilience from the literature, yet these are not specified in the article which makes it unclear how comprehensive the sampled items are (Windle et al., 2011).

Construct Validity

Construct validity is considered the crucial form of validity as it ensures that the test operates well as a construct, measuring what it intends to measure with clearly defined items. The construct validity essentially embraces validity of every type (Kline, 2000). It can be explored through correlations between the construct under investigation and variables that are known to be connected (Campbell & Fiske, 1959).

Multiple methods have been used to assess the construct validity of the RS and, according to the author, the accumulation of this evidence over the years supports the construct validity of the scale (Wagnild, 2017). Methods include content analysis, known groups, convergent/discriminant studies, correlation studies, factor analysis, among others.

The resilience construct, as measured by the RS, appears to be positively associated with many positive qualities, including self-esteem and optimism (Lee et al., 2008), psychological well-being (Christopher & Kulig, 2000), sense of community, social support, spiritual well-being, and goal achievement (Wagnild, 2003, 2017). In contrast, the RS appears to be inversely related to issues such as hopelessness and stress (Rew et al., 2001), passive coping, depression, anxiety, compassion fatigue, burnout, and employee turnover (Wagnild, 2017). One difficulty with the correlational approach is that relatively high scores on the RS can indicate disagreement as much as they can indicate agreement, which would inevitably lead to correlational artefacts and a tendency for the measure to correlate positively with almost anything that is conceptually positive.

A study conducted by March (2004) examined the relationships between life adversity and resilience in late life development. The author found that Resilience Scale scores were significantly negatively correlated with life stress, measured using the Holmes–Rahe Stress Inventory (Holmes & Rahe, 1967), and with the number of stressful events (r = −0.43 and −0.40, respectively, both p < 0.01), and suggested that even though life stresses lower resilience, resilience upholds its buffering effects on well-being (March, 2004).

Furthermore, items from the Health Promoting Lifestyle Profile (HPLP) (Walker et al., 1987) were used to document convergent and discriminant validity of the RS in a sample of middle-aged to older adults (n = 707). The HPLP is considered a reliable and valid measure of health promotion behaviors, comprising six domains: stress management, health responsibility, nutrition, exercise, self-actualization, and interpersonal support (Wagnild, 2016). To support the convergent validity of the RS and the RS14, it was anticipated that scores on both would show moderate to high (r > 0.45) correlations with the HPLP subscales, as they tap into similar constructs (Wagnild, 2016). Higher correlations (convergent) were anticipated between the RS/RS14 and domains in the HPLP including self-actualization and stress management (Wagnild, 2016). Lower correlations (discriminant) were anticipated between the RS/RS14 and the domains of exercise and nutrition (Wagnild, 2016). It can be seen in Table 5 that correlations were as expected, in the hypothesized directions.
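The logic of this convergent/discriminant check can be sketched as a simple pattern test. The sketch below rests on two assumptions that go beyond the text: that "higher" means every convergent correlation exceeds the 0.45 threshold, and that "lower" means every discriminant correlation falls below the smallest convergent one. The function name `matches_hypothesis` is hypothetical, and the correlation values are placeholders, not the Table 5 results.

```python
def matches_hypothesis(r_by_domain, convergent, discriminant, cutoff=0.45):
    """Check a hypothesized convergent/discriminant correlation pattern.

    Assumed decision rule (not stated in the source):
      1. every convergent domain correlates above `cutoff`, and
      2. every discriminant domain correlates below all convergent domains.
    """
    conv = [r_by_domain[d] for d in convergent]
    disc = [r_by_domain[d] for d in discriminant]
    return all(r > cutoff for r in conv) and min(conv) > max(disc)


# Placeholder correlations for illustration only; NOT the published values.
placeholder_r = {
    "self-actualization": 0.60,
    "stress management": 0.50,
    "exercise": 0.20,
    "nutrition": 0.25,
}
pattern_holds = matches_hypothesis(
    placeholder_r,
    convergent=["self-actualization", "stress management"],
    discriminant=["exercise", "nutrition"],
)
```

A stricter analysis would test whether the convergent and discriminant correlations differ significantly, rather than only comparing their magnitudes.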

Table 5 Pearson correlations between RS and RS14, and HPLP domains

While it is promising that the test overall shows robust associations to hypothetically related constructs, there is a fundamental issue with the underlying constructs proposed within the test. The spectrum of items broadly taps a range of features that are considered to represent resilience, but it remains unclear what those features are, and how they should be labeled or defined with reference to established psychological constructs.

Concurrent Validity

Concurrent validity examines the relationship between the test and other associated theoretically relevant criteria. It is measured through correlations between the test and other measures aimed at assessing the same construct (Kline, 2000).

Concurrent validity was demonstrated by high correlations of the RS with well-established and valid measures of the constructs linked with resilience and outcomes of resilience, during the validation study (Wagnild & Young, 1993). It was hypothesized that resilience would be positively related to measures of adaptation to stress such as life satisfaction and morale and negatively correlated with a measure of depression (Wagnild & Young, 1993). Furthermore, it was assumed that physical health, as an indicator of adaptation to stress, would be positively correlated with higher scores on the scale (Wagnild & Young, 1993). As demonstrated in Table 6, all correlations were significant in the expected directions at p < 0.001.

Table 6 Correlations among the Resilience Scale and depression, health status, morale, and life satisfaction

While the authors suggest the concurrent validity to be good, some studies have found it to be fairly weak (e.g., Abiola & Udofia, 2011). Gender differences also pose a problem for the measure as it is either interpreted differently by males and females, or the items are biased towards male self-aggrandizement, which itself may be a White Northern Hemisphere phenomenon. Lower scores in other ethnic groups also suggest that the items may not be viewed in the same way across cultures.

Furthermore, studies that compare multiple measures of resilience, exploring how well they correlate with each other and how their more specific sub-dimensions correlate with similar dimensions in other tests, appear to be lacking.

Predictive Validity

Predictive validity examines the extent to which the results of the test can predict some criterion (Kline, 2000). The author suggests that resilience, measured with the Resilience Scale, might be used to predict outcomes. Given the five characteristics underpinning the scale, individuals who score higher on resilience might be expected to self-manage chronic disease more successfully than those who score lower (Wagnild, 2009). Furthermore, those with higher Resilience Scale scores might be more likely to succeed in a weight loss or smoking cessation program (Wagnild, 2009).

A study conducted in Japan to examine the psychometric properties of the Resilience Scale (Hasui et al., 2009) confirmed its predictive validity by the finding that the RS scores predicted depressive mood two weeks later. This was still the case after controlling for depressive mood and the stressful life events, which occurred in the previous week (Hasui et al., 2009). In spite of these findings, this particular type of validity would benefit from more research.

Conclusions

The evaluation process of the psychometric properties of instruments is a complex and laborious venture (Ahern et al., 2006). When measures have reports or manuals available, this task can be easier in that it allows researchers to gain a better understanding of the measure, its norms, standardizations, reliabilities, and validities (Ahern et al., 2006). However, when this information is not available, or at least not centralized in a manual, it can be extremely complicated to understand the measure.

Although, according to some authors, there is currently no gold standard measure of resilience, the majority of studies on resilience have used self-report assessments, and one of the most accepted and well-established measures is the RS (Dias et al., 2016). Its adaptation to approximately 40 languages makes it one of the most widely published measures and one of the most extensively validated across cultures and age groups (Ahern et al., 2006; Coelhoso et al., 2017; Dias et al., 2016; Ospina Muñoz, 2007; Windle et al., 2011).

Over the years, the creators of the scale have recognized a number of limitations within the scale, particularly in terms of potential bias due to all the items being worded positively and keyed in the same direction. However, this has not been changed to date. The inclusion of low resilience items, as well as negatively worded items, could be piloted to address these limitations (Wagnild & Young, 1993), as could changes to the Likert items that would match the questions in a more useful way.

Overall, this critique has found that the RS has a number of good psychometric properties, particularly in terms of internal consistency, construct validity and concurrent validity. The scale would benefit from more research on its test–retest reliability, as well as its predictive validity, for example through its application in longitudinal studies. In addition, although research suggests the RS to have good reliability and validity across a range of populations, it is unclear whether the underlying constructs of the measure are the same across cultures. It has been established that community and cultural factors contextualize how resilience is defined by different populations and manifested ordinarily (Ungar, 2008). As such, it would be useful for future research to explore the construct of resilience put forward by Wagnild and Young across a range of cultures and populations and assess whether the construct remains stable. Furthermore, examining whether the two factors identified within the test operate as distinct components of resilience would be helpful given that the construct is clearly not unitary. Since the domains suggested as underpinning resilience in the RS appear to have little empirical support, it would seem imperative that the tool is reframed to reflect the structural properties identified, with appropriate sub-scales scored differentially. Theoretically, the content should also reflect more established constructs within the literature, instead of using somewhat vague labels to mask heterogeneous concepts.

The RS appears to be effective for use in large populations and in its intended field; however, it is likely to be more valuable when used in combination with other instruments (depending on the researchers’/practitioners’ needs) in order to achieve more comprehensive results. It would be interesting to further assess the capacity of the RS as a measure to be used in the recruitment of individuals going into challenging professions such as the police, army, and fire brigade. These populations have not, thus far, been extensively examined using the RS, with only a small number of studies reporting its use in these contexts (e.g., Gupta et al., 2012). As such, further research could contribute to these areas while also increasing the already large scope of the RS.