Introduction

Several hundred thousand people live in German care facilities. At the end of the first half of 2020, around 731,000 people were cared for in Germany’s full inpatient care facilities. According to this, the number of residents in the facilities has risen steadily over the past 10 years: from 620,249 in 2010 to 676,584 in 2015 and now around 731,000. Among 80- to 85-year-olds, one in four people already need care (Statistisches Bundesamt, 2020). If the number of older adults in the population increases, the number of people in need of care and assistance is also likely to rise. Almost two-thirds of people in Germany suffer from three or more chronic diseases associated with physical and cognitive limitations in old age (RKI, 2015). However, there is no linear relationship between such limitations and subjectively perceived quality of life (QoL) and well-being (WB); QoL and WB can be subjectively experienced as high despite physical or cognitive limitations (Megari, 2013). Therefore, the maintenance and improvement of QoL are increasingly seen as essential tasks for health care.

Numerous studies and surveys show that the desires and expectations of people in need of care and of their relatives focus on good care that includes WB factors such as consideration of individuality, participation, and respect for human dignity (García & Ramírez, 2018). Over the years, this has been recognized, and WB and QoL have become central to the care of older people and an essential topic of research (van Leeuwen et al., 2019). While the concept of QoL has its scientific origins in various disciplines, especially philosophy, sociology, and medicine, the term WB originates from psychology (Schumacher, Klaiberg, & Brähler, 2003). Earlier research focused mainly on social and economic indicators of QoL, such as financial security, social equity, and health care. More recently, subjective indicators of QoL, such as subjective WB and life satisfaction, have been increasingly considered (Diener & Suh, 1997).

Although research on QoL and WB has a long history, there is considerable debate in the literature about what is meant by QoL and WB. Attempts to describe and define QoL and WB in a universally valid way usually failed because they are individual and influenced by age and related circumstances (Oppikofer & Mayorova, 2016). Moreover, a clear distinction between the two concepts is hardly possible; sometimes they are used synonymously, or aspects of WB are used to define QoL; although they are associated with different theoretical concepts (e.g., Ryff’s Psychological Well-being, Ryff & Keyes, 1995): eudaimonic well-being emphasizing the realization of a person’s potential (Ryan & Deci, 2001); Diener’s (2000) hedonic approach of Subjective Well-Being; PERMA model (Seligman, 2018); Wilson & Cleary Model of HRQoL (1995) (for an extensive discussion see Das, Jones-Harrell, & Fan, 2020; and Skevington & Böhnke, 2018). Subjective WB and subjective QoL are overarchingly described as key concepts that describe experiences, abilities, states, behaviors, appraisals, and emotional responses to circumstances. Perhaps most widely used is the concept of QoL as a multidimensional construct that encompasses physical, emotional, mental, social, and everyday functional factors (Bullinger, 2014; Koller et al., 2009). Health-related QoL is defined as how subjective QoL is influenced by physical and mental health (Karimi & Brazier, 2016). Subjective WB (SWB) is composed of general life satisfaction, satisfaction with individual life domains, the presence of positive feelings, and a low level of negative feelings (Diener, 2000). 
In a recent study using network analysis, Skevington and Böhnke (2018) proposed an integrated model of SWB and QoL through testing its overlapping and exclusive dimensions, which resulted in 14 facets: energy, sleep (physical domain); positive feelings, self-esteem (psychological); dependence on medication, & treatment (independence); personal relationships, social support, sex-life (social relationships); home environment, financial resources, health & social care, recreation & leisure (environment); wholeness, inner peace (spiritual). In addition, they showed that health-related variables are closely linked to physical QoL facets on medication, activity, and mobility. Testa and Simonson (1996) proposed an interesting conceptual framework of the subjective and objective domains to unravel the different components of QoL and WB. Each domain of QoL can be viewed from two dimensions: as an objective examination of a person’s health status, based on an assessment of their physical, mental, and social functioning, and as a subjective perception of that status, in terms of physical, mental, and social well-being. Figure 1 shows a modified conceptual model of their QoL framework supplemented by socioeconomic (objective) factors and the classification into physical, psychological, and social well-being. The x‑axis represents the subjective, the y‑axis the objective dimensions of the three central domains of the concept of QoL. The concept of QoL spans between the two axes. The objective dimension of QoL is to be understood as a gauge of the state of health in physical, mental, and social terms, which is, however, only transformed into the concept of QoL by the person’s subjective judgment. Using the example of the physical domain, this means that the frequency and severity of limitations and their relevance from the individual’s point of view are important (Fig. 1).

Fig. 1

Conceptual model for quality of life and well-being. QoL Quality of Life, ADL Activities of Daily Living, IADL Instrumental Activities of Daily Living. (Modified after Testa & Simonson, 1996)

However, there remains ongoing debate about how QoL and WB should be conceptualized and measured, and a variety of approaches therefore exist to measure QoL and WB in surveys. Lindert, Bain, Kubzansky, and Stein (2015) identified 60 unidimensional and multidimensional instruments to survey subjective WB alone. Some of these are generic, i.e., appropriate for the general population, and can be applied to various conditions (e.g., diabetes, cancer; Ryff’s Scales of Psychological Well-being (Ryff & Keyes, 1995); WHO (Five) Well-being Index (Bech, 1996); WHOQOL-100 (WHOQOL, 1998)), while others are disease-specific and relate to a particular pathology (e.g., depression, Parkinson’s disease; Beck Depression Inventory II (Beck, Steer, Ball, & Ranieri, 1996); Parkinson’s disease questionnaire (Hagell & Nygren, 2007)). However, few of these instruments have been developed specifically for older “healthy” adults; instead, there are mainly questionnaires for individuals with dementia (e.g., Qualidem). Also, no instruments are available specifically for the nursing home setting.

Previous studies on older adults in a nursing home setting (Ballmer, Wirz, & Gantschnig, 2019; van der Wolf, van Hooren, Waterink, & Lechner, 2019) often use questionnaires designed for young and middle-aged adults to evaluate WB or QoL, such as the SF-36 and its short form, the SF-12, by Bullinger and Kirchberger (1998), the World Health Organization Quality of Life 100 (WHOQOL-100), or the 5‑item World Health Organization Well-Being Index (WHO-5). These questionnaires cover medical and physical aspects, while social and psychological factors are only marginally addressed. Moreover, they do not address individual priorities and needs arising from different life experiences (Meyer, Drewniak, Hovorka, & Schenk, 2019). Another problem is that these questionnaires, which are also used for community-dwelling individuals, often do not correspond to the living environment of nursing home residents. For example, the SF-36 and SF-12 ask whether current health status limits activities such as running fast, lifting heavy objects, or engaging in strenuous sports. One of the few questionnaires developed for nursing home residents is the Qualidem (Ettema, Dröes, de Lange, Mellenbergh, & Ribbe, 2007). In this questionnaire, residents are not interviewed directly; instead, people close to them, such as relatives or professional caregivers, assess their situation. However, considering the resident’s own perspective is important because third-party assessments and self-assessments can differ considerably (Oppikofer & Mayorova, 2016).

The Laurens Well-Being Inventory for Gerontopsychiatry (LWIG; van der Wolf, van Hooren, Waterink, & Lechner, 2018) is a self-rated 30-item well-being measure (the items refer to the last 7 days) for individuals residing in a nursing home; its development was guided by both theoretical and empirical considerations. In creating their scale, van der Wolf et al. (2018) followed the WHO view that at least the physical, social, and psychological dimensions should be included (WHOQOL Group, 1995; see Fig. 1). Two models that, when used together, cover and explain these dimensions are the Social Production Function (SPF) model (Lindenberg, 1986; for both the physical and social domains) and Ryff’s model of psychological well-being (Ryff & Keyes, 1995; for the psychological domain). Based on previous matching question instruments, interviews with geriatric professionals, and focus groups, the authors created an item pool from which selections were made based on their chosen theoretical models. Further decisions about whether to retain or reject items in this process were based on an empirical study of nursing home residents, reducing the initial pool of more than 300 possible items to 30 items. The authors of the original scale (van der Wolf et al., 2018) analyzed the proposed 3‑factor structure of well-being through confirmatory factor analysis, but only on the preliminary questionnaire with 53 variables. Overall, the fit statistics did not indicate a well-fitting model (CFI = 0.688, TLI = 0.675, RMSEA = 0.063). In addition, they examined whether one or more factors were present within the three different dimensions. 
The final instrument contains six subscales in three dimensions: ‘physical well-being’ (6 items: 1, 7, 13, 19, 24, 27), ‘social well-being’ (subscales positive social experience, 6 items: 4, 10, 16, 22, 26, 29; negative social experience, 4 items: 5, 11, 17, 23; communal living, 3 items: 6, 12, 18), and ‘psychological well-being’ (subscales affect, 7 items: 2, 8, 14, 20, 25, 28, 30; self-worth, 4 items: 3, 9, 15, 21). The LWIG has demonstrated adequate reliability and validity (van der Wolf et al., 2018). Reliability of the dimensions and their underlying factors was assessed using McDonald’s ω and Cronbach’s α: all subscales except “negative social experience” had values > 0.70. Although self-rated and observer-rated WB are generally only weakly correlated, the LWIG subscales were significantly related to most of the observer-rated Qualidem subscales and the Cantril Ladder (Ettema et al., 2007; van der Wolf et al., 2018). In addition, older adults with depressive symptoms scored significantly lower on all subscales of the LWIG than individuals without depression.

Purpose of this study.

The LWIG is a practical and reliable WB assessment tool that, to our knowledge, has never been validated or implemented in German nursing home populations. Therefore, the purpose of our study was to (1) translate and cross-culturally adapt the LWIG to a German context and (2) test the reliability and validity of the German LWIG in a group of older nursing home residents using classical test theory and the Rasch model.

Following the Standards for Educational and Psychological Testing (Frey, 2018), the present study addressed evidence related to dimensionality, reliability, and construct validity (Boateng, Neilands, Frongillo, Melgar-Quinonez, & Young, 2018). Dimensionality refers to the structure of a specific phenomenon; examining it means assessing the extent to which the internal components of a scale match the defined constructs and how homogeneous the items are. We examined dimensionality using both exploratory and confirmatory factor analytical procedures as well as Rasch analysis. Reliability refers to the consistency and relative freedom from error of an instrument. This study assessed internal consistency using the reliability coefficients Cronbach’s alpha (α), composite reliability (CR), and McDonald’s omega (ω). Finally, construct validity refers to how well a scale measures the construct it is intended to measure and is based on, among other things, the relationships of the construct to other variables. There are two subsets of construct validity: convergent construct validity and discriminant construct validity. Convergent construct validity tests the relationship between the construct and a similar measure; it shows that constructs that are supposed to be related are in fact related. Discriminant construct validity tests the relationship between the construct and an unrelated measure; it shows that the construct is not associated with something unexpected. Previous studies have found that physical and cognitive activities as well as physical function are positively related to well-being in nursing home residents (Brett, Traynor, & Stapley, 2016; Grönstedt et al., 2011; Saadeh, Welmer, Dekhtyar, Fratiglioni, & Calderón-Larrañaga, 2020), whereas impaired cognition, fear of falling, and depression are negatively related to it (Morsch, Shenk, & Bos, 2015; Smalbrugge et al., 2006). Therefore, these constructs were selected to assess convergent construct validity using correlational analyses.

Methods

Study design

This cross-sectional study was designed to examine the psychometric properties (dimension, reliability, validity) of the German version of the LWIG (LWIG-GER) in nursing home residents.

Transcultural translation and adaptation of the instrument

The translation was performed according to the translation guidelines of the World Health Organization (WHO, 2010). Five phases were followed: (1) the initial translation from English to German by two independent bilingual translators; (2) the synthesis of the first two translations to provide the first version of the translated questionnaire; (3) the backward translation by two different bilingual translators blinded to the original English version and having German as their first language; (4) an expert committee review to compare the backward translations with the original questionnaire and agree on a second version of the translated questionnaire; (5) the pretest of the second version of the LWIG-GER to ensure good comprehension of each question of the questionnaire and conclude with the third version, the final version of the German LWIG.

Two geriatric nursing experts were convened for the consensus process, in which they identified and addressed discrepancies between the different language versions to achieve conceptual, idiomatic, and semantic equivalence until they agreed on the preliminary German LWIG version. Last, a pretest with 16 older adults within the selected nursing homes was performed using face-to-face cognitive interviews to evaluate the comprehension and clarity of the preliminary version. No changes were made to the applied version because the items were judged as straightforward and easy to understand. The questionnaire has three different answer scales with four answering options each: (1) “not, sometimes, often, always”, (2) “not, seldom, sometimes, often”, (3) “completely disagree, mostly disagree, mostly agree, completely agree”. The total score of the original scale with 30 items ranges from 30 to 120, with a higher score indicating a higher level of well-being. The final version obtained in this way is reported in the Appendix.
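The stated score range of 30–120 follows from 30 items that each contribute 1 to 4 points. The following minimal sketch illustrates this arithmetic; the function name and the handling of reverse-coded items are assumptions made for illustration and are not taken from the LWIG manual.

```python
def lwig_total_score(responses, reversed_items=()):
    """Sum 30 item responses coded 1-4.

    `reversed_items` holds 1-based item numbers that are reverse-coded
    (hypothetical; which items are reversed is not specified here).
    """
    if len(responses) != 30:
        raise ValueError("expected 30 item responses")
    total = 0
    for item_no, value in enumerate(responses, start=1):
        if value not in (1, 2, 3, 4):
            raise ValueError(f"item {item_no}: response must be 1, 2, 3, or 4")
        # A reverse-coded item maps 1->4, 2->3, 3->2, 4->1.
        total += (5 - value) if item_no in reversed_items else value
    return total


lowest = lwig_total_score([1] * 30)   # every item at the lowest category
highest = lwig_total_score([4] * 30)  # every item at the highest category
```

With all items at the lowest category the sum is 30, and with all items at the highest category it is 120, which matches the range given above.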

Participants

Between March and September 2019, a convenience sample of 104 long-term nursing home residents was recruited from five different nursing homes in Baden-Württemberg, Germany. Inclusion criteria for the study were (i) the consent to participate, (ii) ≥ 60 years, (iii) the ability to comprehend and carry out simple instructions, and (iv) no communication deficits (due to chronic conditions such as sensory loss, dementia, and stroke or inability to speak German). Subjects were recruited in cooperation with the nursing staff based on the mentioned criteria. In the first stage, participants completed the LWIG-GER and the Montreal Cognitive Assessment (MoCA); one week after completing the questionnaire and test, all respondents were approached with an invitation to participate in further tests. As a result, 64 respondents agreed to participate in this second stage. There were no significant differences in age, gender, and MoCA between participants and nonparticipants in stage 2. All nursing home residents or their legal caregivers provided written informed consent before participating in the study. The experimental procedure was explained in detail to participants. The study was carried out according to the institute’s ethical standards and the 1964 Helsinki declaration and its later amendments (WMA, 2015).

Instruments

Timed-up-and-go test and walking speed

The TUG (timed-up-and-go test; Podsiadlo & Richardson, 1991) is a widely used assessment of balance and gait speed and a predictor of the risk of falling (Allison, Painter, Emory, Whitehurst, & Raby, 2013; Herman, Giladi, & Hausdorff, 2011; Shumway-Cook, Brauer, & Woollacott, 2000). The TUG measures the time it takes a person to stand up from a chair, walk 3 m, turn around, walk back, and sit down again. Subjects independently selected the walking speed that was appropriate for them. Individuals who require less than 10 s are considered freely mobile, and times of 10–20 s indicate independent mobility. If the task is completed in 20–29 s, the individual has variable mobility, and if it takes more than 29 s, the individual is mobility impaired (Podsiadlo & Richardson, 1991).
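The TUG categories described above can be expressed as a small classification rule. This is a sketch only: the function name is hypothetical, and because the text gives overlapping boundaries at 20 s, the assignment of exactly 20 s to "variable mobility" is an assumption.

```python
def classify_tug(seconds):
    """Map a TUG time in seconds to the mobility category described in the text."""
    if seconds <= 0:
        raise ValueError("TUG time must be positive")
    if seconds < 10:
        return "freely mobile"
    if seconds < 20:  # assumption: exactly 20 s falls into the next category
        return "independently mobile"
    if seconds <= 29:
        return "variable mobility"
    return "mobility impaired"


categories = [classify_tug(t) for t in (8.5, 15.0, 25.0, 31.0)]
```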

In addition to the TUG, walking speed was measured over a distance of 5 m (Cromwell & Newton, 2004). The number of steps was counted, and the time was measured in seconds. Each subject performed two trials with the task of completing the distance as fast as possible. The better value was included in the analysis. Walking speed is used to assess lower extremity mobility and strength. It is considered a reliable measure of physical functioning in older adults and of their ability to perform the so-called ADLs (Activities of Daily Living) (Buchner, Larson, Wagner, Koepsell, & de Lateur, 1996).
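As an illustration of how the gait-speed value can be derived from the two timed trials, the sketch below divides the 5 m distance by the trial time and keeps the faster trial. Interpreting "the better value" as the faster trial is an assumption, and the function names are hypothetical.

```python
def gait_speed(distance_m, seconds):
    """Walking speed in metres per second."""
    if seconds <= 0:
        raise ValueError("trial time must be positive")
    return distance_m / seconds


def best_trial_speed(trial_times_s, distance_m=5.0):
    """Return the highest speed across trials (assumed meaning of 'better value')."""
    return max(gait_speed(distance_m, t) for t in trial_times_s)


speed = best_trial_speed([6.25, 5.0])  # two trials over 5 m
```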

Depression in Aging Scale

The DIA‑S (Depression in Aging Scale; Heidenblut & Zank, 2010) is a self-assessment instrument for screening for depressive disorders. The scale consists of ten short statements about depression to be evaluated as true or false with a simple yes/no answer format. The items were constructed to be brief, easy to use, and easy to interpret. Care was taken to select context-free items so that the instrument could be used in various health settings (Heidenblut & Zank, 2014). The scale is based on the current WHO definition of depressive disorders and addresses the demands of everyday clinical practice in geriatric institutions. In addition, administration and evaluation should require little time and overburden neither the examiner nor the respondent. Regarding the interpretation, a total score of 0–3 points is considered normal, whereas 4 points or more indicate a depressive disorder (Heidenblut & Zank, 2010).
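The cut-off rule for the DIA‑S can be sketched as follows. The input is assumed to be the ten item points (0 or 1) after keying; how individual yes/no answers are keyed is not described here, so that step is deliberately left out. The function names are hypothetical.

```python
def dias_total(item_points):
    """Sum ten already-keyed item points (each 0 or 1)."""
    if len(item_points) != 10:
        raise ValueError("expected 10 item points")
    if any(p not in (0, 1) for p in item_points):
        raise ValueError("each item point must be 0 or 1")
    return sum(item_points)


def dias_screen_positive(total, cutoff=4):
    """True when the total reaches the cut-off of 4 points stated in the text."""
    return total >= cutoff


total = dias_total([1, 0, 1, 0, 1, 0, 0, 1, 0, 0])
positive = dias_screen_positive(total)
```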

Montreal Cognitive Assessment

Global cognitive status was assessed using the MoCA (Montreal Cognitive Assessment; Nasreddine, Phillips, Bédirian, & Charbonneau, 2005). It is a brief screening tool to detect mild cognitive impairment. The test consists of 30 items categorized into the following cognitive domains: executive functions, visuospatial abilities, language, short-term memory, attention, concentration, working memory, and temporal and spatial orientation. MoCA scores range between 0 and 30, with a score of 26 or above indicating no cognitive impairment and a score of 14 or less indicating cognitive impairment (Nasreddine et al., 2005).

Demographic data, activities, and fear of falling

In addition to sociodemographic data (age, gender, height, weight, and education), weekly activities (e.g., church service, game sessions, gymnastics, singing and making music, and memory training) were collected. Additionally, the subjects were asked if they were afraid of falling in general (yes/no) to determine their fear of falling (Cameron et al., 2000).

Statistical analysis

Descriptive statistics, correlations, and independent t‑tests were computed using IBM SPSS Statistics ver. 27.0 (IBM Corp., Armonk, NY, USA). Exploratory factor analysis and the calculation of omega were performed using JASP (v0.14.1); AMOS 27.0 was used for the confirmatory factor analysis. All Rasch analyses were performed using Winsteps ver. 4.8.0 (Winsteps, Beaverton, OR, USA).

There are two main approaches for assessing a scale’s psychometric characteristics and reducing items: classical test theory (CTT) and modern test theory (item response theory, Rasch Measurement; Ellis & Mead, 2002). For the CTT evaluation of the 30-item LWIG-GER, we examined the data quality, the scaling assumptions, and the reliability. For the modern test theory, we used the Rasch Measurement approach. The deletion of items is discussed in terms of both Rasch Measurement properties and the impact on the content of the final instrument, taking into account the importance of the items. The most meaningful and psychometrically sound solution was retained to create the final version of the LWIG-GER questionnaire.

Classical test theory (CTT).

The 30 items of the original LWIG were subject to item analysis using standard statistical procedures. The distributional properties of each item were examined by inspecting the skewness and kurtosis of the item’s distribution and the pattern of response frequency. Excess kurtosis (SPSS-specific) greater than 4 indicates substantial deviance from the normal distribution; an absolute skew value > 2 represents a substantial departure from normality (West, Finch, & Curran, 1995).
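The distributional screening described above can be sketched in a few lines. The sketch uses the usual bias-adjusted sample formulas for skewness and excess kurtosis; whether these match the package output used in the study in every detail is an assumption, and the function names are hypothetical.

```python
import math


def _mean_sd(x):
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return mean, sd


def sample_skewness(x):
    """Bias-adjusted sample skewness (requires n >= 3 and non-constant data)."""
    n = len(x)
    mean, sd = _mean_sd(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in x)


def sample_excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis (requires n >= 4 and non-constant data)."""
    n = len(x)
    mean, sd = _mean_sd(x)
    fourth = sum(((v - mean) / sd) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def flag_item(x, max_abs_skew=2.0, max_kurtosis=4.0):
    """Flag an item whose |skew| > 2 or excess kurtosis > 4 (thresholds from the text)."""
    return abs(sample_skewness(x)) > max_abs_skew or sample_excess_kurtosis(x) > max_kurtosis


ceiling_item = [4] * 28 + [1, 2]   # made-up responses piled up at the top category
balanced_item = [1, 2, 3, 4] * 5   # made-up responses spread evenly
flags = (flag_item(ceiling_item), flag_item(balanced_item))
```

The first made-up item is strongly skewed toward the highest category and is flagged; the evenly spread item is not.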

The initially hypothesized 6‑factor model did not fit the data, and thus we followed the initial confirmatory factor analysis (CFA) with an exploratory factor analysis (EFA). The EFA was conducted with JASP, using weighted least-squares estimation as the factor extraction method and geomin rotation as the factor rotation method, to examine the construct validity of the LWIG-GER. Only items with factor loadings above 0.40 were regarded as loading on a factor and were carried forward to the CFA. The CFA was used to confirm the factor structure determined by the EFA. The mean- and variance-adjusted weighted least squares (WLSMV) estimator was employed for the CFA, as it is superior to other estimation methods for Likert-type rating scales (Li, 2016). Model fit was evaluated with the χ2 test, the comparative fit index (CFI) and the Tucker–Lewis index (TLI), with values ≥ 0.90 indicating acceptable fit, and the root mean square error of approximation (RMSEA), with values ≤ 0.06 indicating acceptable fit (Hu & Bentler, 1999). Finally, following the basic assumptions of CTT (Ellis & Mead, 2002), a summary score of the reduced questionnaire was obtained by summing and averaging the scores of its component dimensions.
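The fit criteria named above can be written as a simple decision rule. The sketch below only encodes the thresholds from the text (CFI and TLI ≥ 0.90, RMSEA ≤ 0.06); the function name is hypothetical, and the χ2 test is not part of the rule.

```python
def acceptable_fit(cfi, tli, rmsea, min_cfi=0.90, min_tli=0.90, max_rmsea=0.06):
    """True when all three indices meet the thresholds stated in the text."""
    return cfi >= min_cfi and tli >= min_tli and rmsea <= max_rmsea


# Indices reported for the preliminary original scale (van der Wolf et al., 2018).
original_ok = acceptable_fit(cfi=0.688, tli=0.675, rmsea=0.063)
```

Applied to the indices reported for the preliminary version of the original scale, the rule returns False, in line with the statement that the model did not fit well.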

Internal consistency was evaluated to determine the reliability of the LWIG-GER. Internal consistency, which indicates the degree to which all items in the instrument refer to the same construct, can be assessed by several coefficients, each with its strengths and limitations (Deng & Chan, 2017). Cronbach’s α, composite reliability (CR), and McDonald’s ω coefficient were calculated for this study.
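For orientation, two of these coefficients can be computed by hand. The sketch below implements Cronbach’s α from raw item scores and composite reliability from standardized factor loadings; under a single-factor model with uncorrelated errors, the latter formula coincides with McDonald’s ω. The data and loadings are made up, and the statistical software used in the study may differ in detail.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite_reliability(loadings):
    """CR from standardized loadings: (sum l)^2 / ((sum l)^2 + sum(1 - l^2))."""
    squared_sum = sum(loadings) ** 2
    error_var = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_var)


# Made-up example: three items that rank four respondents identically.
alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
cr = composite_reliability([0.7, 0.7, 0.7])
```

Perfectly consistent items give α = 1, while three items with loadings of 0.7 give a CR of about 0.74, just above the 0.70 benchmark mentioned earlier for the original subscales.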

Convergent and divergent validity were explored via correlations with other scales. The LWIG-GER scores (total score with 19 items [after EFA, CFA] and subdimensions) were correlated with age, body mass index (BMI), the DIA, the MoCA score, 5 m gait speed, and the TUG. It was hypothesized that the LWIG-GER would show higher correlations with motor performance, whereas lower correlations were expected with the DIA and the MoCA. Known-group validity was studied by comparing LWIG-GER scores between groups formed by transforming the DIA score and gait speed into categorical variables.

Rasch measurement.

To provide additional information on the psychometric properties of the LWIG-GER items, Rasch analysis (Boone, Staver, & Yale, 2014) was conducted on the original item set of 30 questions; INFIT and OUTFIT statistics, reliability, and separation indices were calculated. First, validity was examined through fit statistics, reported in unstandardized form (mean square [MnSq]); item and person measures are expressed in log-odds units (logits). We applied an iterative procedure to identify the persons causing an item to misfit: starting with the worst-fitting person, we assigned a PWEIGHT of 0 and repeated this process until the item fit. Next, person and item reliability and separation were calculated. The item–person map, often called a Wright map, displays both persons (in terms of their well-being) and items (in terms of their difficulty to agree with) along a common vertical axis (Fig. 3). It thus shows whether there are redundant items, too many items of the same difficulty, or gaps between items.
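To make the fit statistics more concrete, the sketch below computes INFIT and OUTFIT mean squares for a single item under the dichotomous Rasch model. This is a deliberate simplification for illustration: the LWIG-GER items have four response categories and therefore call for a polytomous Rasch model, and the sketch does not reproduce the Winsteps estimation or the PWEIGHT procedure. Person and item measures are assumed to be known, and all names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """P(endorse) for person measure theta and item difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit MnSq, outfit MnSq) for one dichotomous item.

    Outfit is the mean squared standardized residual; infit weights the
    squared residuals by the model variance (information).
    """
    z2_sum = 0.0       # sum of squared standardized residuals
    resid2_sum = 0.0   # sum of squared raw residuals
    var_sum = 0.0      # sum of model variances
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        var = p * (1.0 - p)
        resid2 = (x - p) ** 2
        z2_sum += resid2 / var
        resid2_sum += resid2
        var_sum += var
    return resid2_sum / var_sum, z2_sum / len(responses)


# Made-up example: four persons located exactly at the item difficulty.
infit, outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], b=0.0)
```

Values near 1 mean the observed responses are about as variable as the model expects; values well above 1 indicate unexpected responses, which is the situation the person weighting described above is meant to address.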

Results

Participants

A total of 104 long-term nursing home residents (57 women, 47 men) aged 60–99 years (M = 79.5 ± 9.1, participant characteristics shown in Table 1) completed the translated LWIG-GER. While the mean age is comparable to the population living in nursing homes in Germany, as presented in epidemiological studies, the percentage of men in our sample is higher than in similar studies (von Renteln-Kruse & Ebert, 2003). The MoCA total score across all residents was 19.03 (±5.24) points, which is just above the cut-off value (19 points) for differentiating mild cognitive impairment and Alzheimer’s disease (Roalf et al., 2013). In addition to the sample’s sociodemographic information, Table 1 shows the descriptive results on mobility, depression, the number of regular activities, fear of falling, and cognitive function. However, these data were collected from only a subsample (n = 64).

Table 1 Sample characteristics

Psychometric properties of the LWIG-GER

Item analysis

Nonresponse, item difficulty, skewness, and kurtosis were computed to give descriptive information and distribution properties (Supplementary material: Table S2). Items 5, 22, and 23 showed unacceptable values for skewness and kurtosis and a low item difficulty. Therefore, these three items were excluded from the factor analyses (EFA, CFA).

Exploratory factor analysis

The Kaiser–Meyer–Olkin (KMO) test and Bartlett’s test of sphericity were conducted to assess the dataset’s suitability for factor analysis. In this study, the KMO value was 0.76, and Bartlett’s test of sphericity was significant (χ2 = 1130, p < 0.001), which means that the data were appropriate for factor analysis. The number of factors was explored based on parallel analysis, a scree plot, and Eigenvalues. A three-factor model was determined as the final factor structure of the LWIG-GER (Eigenvalues 3.30, 3.34, 2.01). In the EFA, all items except nine (items 2, 3, 8, 16, 17, 18, 21, 24, 29) loaded (> 0.40) on at least one factor (Table 2). Two items (16, 17) loaded on two factors and did not maximize McDonald’s ω; item 29 led to an improved McDonald’s ω and was therefore retained. The short LWIG-GER total scale thus comprises 19 items. The three-factor solution with ‘Psychological well-being’ (7 items; affect and self-worth), ‘Social well-being’ (8 items; positive and negative social experience, communal living), and ‘Physical well-being’ (4 items; absence of physiological needs, feeling fit, being cared for physically) accounted for 39.4% of the variance. McDonald’s ω values were 0.827, 0.795, and 0.685, respectively.

Rasch analysis

Participants who had unexpected values for a given item were given a weight of 0 using the PWEIGHT function so that the fit statistics (INFIT and OUTFIT MnSq) fell within the intended range of 0.5–1.5, which is considered productive for measurement (Boone et al., 2014). Weights of 0 were applied to four individuals. After this step, all items fell within the designated range and thus fit the Rasch model (Table 2). The mean INFIT MnSq across all items was 1.04 (P.SD = 0.21), and the mean OUTFIT MnSq was 1.01 (P.SD = 0.22).

Table 2 Means, internal consistency reliability, factor loadings, and item fit statistics (N = 104)

Item measures show that item 12 (1.20 logits) is the most difficult item to agree with, and items 5, 22, and 23 are the easiest to agree with (−1.03 to −1.97 logits; Table 2; Fig. 2; these items are shown in red). The combined analysis of the 30 items showed a good separation index for both the participants studied (2.27; reliability 0.84) and the 30 items (3.89; reliability 0.94).

Fig. 2

Item measures of the LWIG-GER (If the measure is positive, the ability is higher than difficulty, and the probability of solution is higher than 50%. The most difficult and the easiest items to agree with are shown in red)

Rasch person reliability and Rasch item reliability values range from 0 to 1 and can be interpreted similarly to Cronbach’s α. Separation indices can range from 0 to infinity, with higher values indicating better separation. The separation statistics for the individuals indicated that two groups with different ability levels could be reliably distinguished based on the scale (in terms of person separation, an index of 1.50 is acceptable, 2.00 is good, and 3.00 is excellent; Duncan, Bode, Lai, & Perera, 2003). In addition, the item separation statistics show that three item groups that differ in item difficulty can be distinguished from each other (Linacre, 2012, suggests that item separation indices of 3 or greater are desirable).
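Separation and reliability are two expressions of the same information: in Rasch measurement, reliability R and separation G are related by R = G² / (1 + G²). The short sketch below applies this standard relation to the separation indices reported above; the function names are hypothetical.

```python
import math


def reliability_from_separation(g):
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def separation_from_reliability(r):
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    return math.sqrt(r / (1.0 - r))


person_reliability = round(reliability_from_separation(2.27), 2)
item_reliability = round(reliability_from_separation(3.89), 2)
```

The results, 0.84 for persons and 0.94 for items, agree with the reliabilities reported alongside the separation indices.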

Figure 3 shows how the individuals with their different ability levels and the difficulty of the 30 items can be represented on a common metric. Here, one can see some items that are psychometrically redundant because they assess the same level of difficulty on the construct (e.g., WB10_SW, WB16_SW, WB18_SW, WB24_PHW, WB9_PSW or WB15_PSW, WB19_PHW, WB20R_PS, WB28R_PS, WB4_SW_P). Moreover, gaps between items can be observed, for example, between WB12_SW and WB6_SW_6 and, especially, between WB22_SW and WB23R_SW. If individuals fall into such a gap, as is the case for the former gap, a researcher would not be able to differentiate these individuals.

Fig. 3

Wright item–person map of the LWIG-GER (Items that are harder to agree with are plotted toward the top of the Wright map. Items that are easier to agree with are plotted toward the base of the Wright map)

Confirmatory factor analysis

Subsequently, the three-factor structure obtained with the EFA was verified with the CFA, excluding redundant items from the Rasch analysis and items with low factor loadings. As modification indices suggested that model fit would improve if correlated error terms were included, we added one theoretically plausible correlated error term (between item 6: “How often did you experience a sociable atmosphere when with the other residents?” and item 12: “How often did you feel you fit in with the other residents?”). Results indicated that the three-factor model was appropriate with an acceptable fit (χ2 (147) = 190; CFI = 0.937, TLI = 0.919, and RMSEA = 0.045).

Convergent, divergent, and known group validity

Convergent, divergent, and known group validity data are presented in Table 3. A higher score on the DIA scale was associated with lower psychological and physical well-being. Better functional status in mobility and balance was associated with significantly better social and physical well-being scores. Higher education was also reflected in better social and physical well-being. No significant associations were present for the LWIG-GER and age, sex, BMI, number of medications, and MoCA. As far as the known-group validity was concerned, higher gait speed and lower DIA scores were significantly associated with higher well-being scores in almost all subdimensions and the total score (based on 19 items).

Table 3 Convergent, divergent, and known-group validity of the LWIG-GER scale with DIA and gait speed

Discussion

This study translated the LWIG, a questionnaire specially designed for nursing home residents and reflecting their environment, into German (i.e., LWIG-GER) and examined its psychometric properties using both factor analysis (EFA, CFA) and Rasch modeling. The CFA results supported the validity of the three-factor model of the original LWIG (psychological well-being, social well-being, physical well-being) but not the six-factor model. Based on classical test theory, the analysis produced a shorter, 19-item version of the LWIG-GER with acceptable reliability and validity and moderate to strong correlations with measures of depression and functional performance. Rasch modeling suggested that the LWIG-GER was unidimensional, with acceptable INFIT and OUTFIT values, good person separation, and high person reliability. It also yielded a reduced version of the LWIG-GER that excluded items similar to those removed by the classical test theory procedure.

In this convenience sample from different nursing homes, responses to almost all LWIG items tended toward high well-being, leading to left-skewed score distributions. This should not necessarily be regarded as unfavorable; the values of other instruments measuring QoL or WB show a similar pattern (Lindert et al., 2015). However, both classical test theory and item response theory analysis (using Rasch) clearly show that items 5, 22, and 23 have unacceptable values for skewness and kurtosis (in contrast to factor analysis, Rasch analysis does not require a normal distribution) and a low item difficulty (item analysis: 0.92, 0.93, 0.94; Rasch measure: −1.03, −1.28, −1.97). Therefore, we omitted these three items, which addressed “be bothered by other residents”, “relationship with nurses”, and “be bullied”.

Factor analysis demonstrated the appropriateness of a three-factor structure for the LWIG. The structure for all 30 items was identified for the first time because the original LWIG development study in the Netherlands did not conduct a factor analysis on all items simultaneously (van der Wolf et al., 2018). Subscales were labeled to reflect the theory guiding the original LWIG item pool. In the EFA, nine items had factor loadings below 0.40 or loaded significantly on more than one dimension. For example, according to van der Wolf et al. (2018), item 3 (“I think I am worth the effort”) belongs to the psychological dimension, but in our results this item loads at 0.337 on the social WB dimension. In contrast, item 18 (“How often did you enjoy the communal mealtimes?”; assigned to the social dimension) loads at 0.373 on the psychological dimension. These differences in the assignment of items to dimensions can be attributed to a different interpretation of the questions. Undeniably, short questionnaires improve assessment by saving response time and effort, increasing the response rate, minimizing burden, and decreasing fatigue. Therefore, based on the preceding subscale results, items 2, 3, 8, 16, 17, 18, 21, and 24 were excluded. In the CFA, the 19-item LWIG-GER performed well on all psychometric indicators.

Item fit analysis was performed to ensure that the responses to each item matched those expected under the Rasch model. INFIT and OUTFIT MnSq statistics for each item were within the acceptable range, indicating a satisfactory fit to the underlying global trait, i.e., the residents’ well-being. Nevertheless, our results also indicate that the LWIG-GER contained some redundant or poorly constructed items. In this context, six items can be considered redundant (e.g., items 8, 16, 24, and 28). The Wright map shows that most of the items that are too easy to agree with belong to the social WB dimension (e.g., items 5, 22, and 23), which corresponds with the item analysis (skewness and kurtosis of the item distributions and the pattern of response frequencies).

A widely used way to measure a questionnaire’s reliability is to assess internal consistency. We preferred McDonald’s ω to the widely used Cronbach’s α because Cronbach’s α is based on the assumption of equal factor loadings (for comparison, however, we also reported Cronbach’s α). Values of Cronbach’s α or McDonald’s ω above 0.7 can be interpreted as indicating good internal consistency. For the LWIG-GER, we obtained McDonald’s ω of 0.685, 0.795, and 0.827, indicating moderate to good internal consistency for the three dimensions of the LWIG-GER. Both reliability measures were comparable to those of the Dutch instrument (van der Wolf et al., 2018).
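The difference between the two coefficients can be made concrete with a minimal sketch for a single factor. It assumes standardized items, so that each unique variance equals one minus the squared loading; the function names and the loadings are hypothetical and are not taken from the LWIG-GER analysis.

```python
def omega_from_loadings(loadings):
    """McDonald's omega for one factor, assuming standardized items
    (each unique variance equals 1 - loading**2)."""
    common = sum(loadings) ** 2
    unique = sum(1.0 - l ** 2 for l in loadings)
    return common / (common + unique)

def alpha_from_loadings(loadings):
    """Cronbach's alpha from the correlation matrix implied by the same
    one-factor model (off-diagonal entries are products of two loadings)."""
    k = len(loadings)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum of all matrix entries: k ones on the diagonal plus every
    # off-diagonal product of loadings.
    total = k + sum(loadings) ** 2 - sum(l ** 2 for l in loadings)
    return k / (k - 1) * (1.0 - k / total)

equal = [0.7, 0.7, 0.7]      # hypothetical equal loadings
unequal = [0.9, 0.5, 0.3]    # hypothetical unequal loadings
for name, lam in (("equal", equal), ("unequal", unequal)):
    print(name, round(alpha_from_loadings(lam), 3), round(omega_from_loadings(lam), 3))
```

With equal loadings the two coefficients coincide; with unequal loadings α falls below ω, which is why ω is preferred when the equal-loadings assumption is doubtful.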

In terms of convergent, divergent, and known-group validity, the LWIG-GER was significantly correlated with education, depression, functional performance (TUG, gait speed), and fear of falling (measured with a one-item question). Like van der Wolf et al. (2018, 2019), we observed a negative correlation between the LWIG-GER total score and the depression scale (DIA-S). Our results are also consistent with those of Meeks and Murrell (2001), who showed that higher education levels are related to lower levels of negative affect in older adults. Similarly, Subaşı and Hayran (2005) demonstrated that educational level is a statistically significant independent predictor of life satisfaction among nursing home residents. We also found that fear of falling was associated with the LWIG-GER total score and with physical WB, and that mobility (TUG and 5 m gait speed) was associated with the LWIG-GER total score and with almost all of its dimensions. A known-group comparison examined whether the LWIG-GER total and subdimension scores could discriminate between three groups defined by depression and by gait speed. Our results confirm this for DIA-S (p < 0.001 to p = 0.003) and, for almost all subdimensions, for 5 m gait speed (p < 0.001 to p = 0.121). In this context, Painter et al. (2012) examined the relationships among fear of falling, depression, anxiety, activity level, and activity limitation and found strong correlations for most of these constructs. This illustrates that these are multifaceted constructs that are reciprocally related to one another and to WB. Taken together, these analyses indicate satisfactory convergent and known-group validity.

The present study has several limitations. One is the difficulty of achieving a sufficient sample size for an EFA or CFA, since recruiting a sample in a clinical setting is not easy. Although our sample size was smaller than that recommended for traditional factor analysis (minimum subject-to-item ratio of at least 5:1, Hatcher, 1994; or more recently 3:1, Bujang, Ghani, Soelar, & Zulkifli, 2012), the Rasch rating scale model functions with a minimum of 10 observations per category (Boone et al., 2014). As a four-category Likert-type scale was used, the minimum sample size for Rasch analysis of this instrument was n = 40. By this criterion, the sample of n = 104 in the present study was sufficient to analyze the LWIG-GER. However, larger samples are needed to replicate our findings and draw reliable conclusions.
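The sample size rules of thumb cited above reduce to simple arithmetic, sketched below. The function names are hypothetical, and applying the subject-to-item ratios to the full pool of 30 items is an assumption made for illustration; the item count, the four response categories, and n = 104 are taken from the text.

```python
def min_n_by_ratio(n_items, subjects_per_item):
    """Minimum sample size under a subject-to-item ratio rule of thumb."""
    return n_items * subjects_per_item

def min_n_rasch(n_categories, observations_per_category=10):
    """Minimum sample size for the rating scale model when each response
    category should receive a given number of observations."""
    return n_categories * observations_per_category

N_ITEMS = 30        # full item pool (assumed basis for the ratio rules)
N_CATEGORIES = 4    # four-category Likert-type response scale
SAMPLE_SIZE = 104   # sample size reported in the text

thresholds = {
    "5:1 ratio": min_n_by_ratio(N_ITEMS, 5),
    "3:1 ratio": min_n_by_ratio(N_ITEMS, 3),
    "Rasch, 10 per category": min_n_rasch(N_CATEGORIES),
}
for rule, minimum in thresholds.items():
    print(rule, minimum, SAMPLE_SIZE >= minimum)
```

Under these assumptions the 5:1 rule requires 150 participants and the 3:1 rule 90, while the Rasch criterion requires 40.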

We were confronted with the same attendance problem mentioned by van der Wolf et al. (2018). Accordingly, it can be assumed that residents who did not attend are more impaired, either physically (e.g., gait speed, TUG) or mentally (e.g., MoCA), which should negatively affect WB. In addition, future studies should include a broader range of facilities, particularly with respect to socioeconomic status.

Regarding reliability, the questionnaire can be assumed to be internally consistent based on correlations, Cronbach’s α, McDonald’s ω, and the calculated Rasch reliability. However, no conclusions can be drawn about test–retest reliability and interrater reliability, which future studies should examine. Moreover, even though we did not include items 5, 22, and 23 in the factor analysis, and most of the items that are too easy to agree with belong to the social dimension, we would not recommend removing all of these items. This is because the present data were collected in only five different facilities, and thus it is not certain that the full range of facilities was covered. Therefore, future studies should replicate our results with larger samples to revisit the reasonable shortening of the questionnaire. In addition, alternative wordings for the items that are too “easy” should be tested with residents to avoid social desirability in response behavior.

Conclusion

Taken together, the short German version of the LWIG (LWIG-GER) with 19 items provides a valid and reliable measurement tool to evaluate well-being in nursing home residents. Although there is no consistent definition of WB, it seems to be a multidimensional construct (Lindert et al., 2015; Su et al., 2014). Based on the CFA and comparable to the results of van der Wolf et al. (2018), we can verify a three-factor structure (psychological, social, and physical well-being) with an acceptable fit.

We see great potential in this questionnaire because, on the one hand, there have been few specific questionnaires on WB in healthy nursing home residents. On the other hand, the questionnaire is meaningful, cost-effective, time-efficient, and easy to use. However, considerably more data are needed to determine the sensitivity, specificity, and cut-off values of the LWIG-GER. Nevertheless, the LWIG-GER provides future studies with an instrument that measures WB validly and reliably.