Worldwide, conflict situations and the resultant number of refugees continue to increase, with over 40 million forcibly displaced people recorded in 2008 [1], nearly half of them originally from Iraq or Afghanistan. As many will eventually be resettled elsewhere, their long-term health and settlement concerns are of continuing relevance, and the high prevalence of mental health problems among these groups makes them a likely focus for research [2]. Since Australia and New Zealand have both accepted refugees for many years and have dedicated but distinctly different settlement policies, a study was proposed to compare the resettlement of two discrete refugee groups, Afghans and Kurds living in Christchurch or Perth, by assessing their health and subjective well-being (SWB). The main findings of the study will be reported separately. However, a major challenge was the selection of standardised, validated instruments in appropriate languages to measure the outcomes of interest with these ethnic groups. The aim of this article is therefore to describe the instrument selection criteria, taking into consideration language requirements and a review of instruments previously used with refugees and groups from Afghanistan and the Middle East region. The three instruments eventually used for the study are briefly outlined, and participant language preferences, instrument reliability and baseline descriptive statistics for the 193 former refugees are presented to assist other researchers planning studies or working in this area.


Study design

A mixed methods approach was used, combining qualitative interview data on resettlement experiences with quantitative assessment of psychological distress, general perceived self-efficacy and subjective well-being in a sample of adult Kurdish and Afghan former refugees settled for up to twenty years. The study was approved by the Human Research Ethics Committee, Curtin University of Technology in Perth.

Statistical analysis

Quantitative data were analysed using SPSS 12.0 (SPSS Inc.). Frequency distributions for each language version by demographic variables and baseline descriptive statistics were calculated for each instrument. Kruskal-Wallis and Mann-Whitney U tests were performed to assess differences between groups. Significant results from the Kruskal-Wallis test were further analysed by pairwise comparison using the Mann-Whitney test, with the Bonferroni correction applied to determine the significance level. Cronbach's alpha was calculated to assess the reliability of the instruments.
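For readers without access to SPSS, the reliability coefficient can be reproduced with a few lines of code. The following is a minimal sketch of the standard Cronbach's alpha formula, not the procedure used in this study; the function name `cronbach_alpha` and the example responses are hypothetical, and only complete cases are assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for complete-case data.

    responses: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same k >= 2 items")

    # Variance of each item (column) across respondents
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from four people to a three-item scale
example = [[1, 2, 2], [2, 3, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate greater internal consistency among the items.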

Criteria for Instrument selection

Language considerations were a major concern. Although many former refugees, especially those who have been settled for several years, have a good understanding of English, there is an ethical imperative to ensure participants fully understand the implications of and reasons for the research, and study validity could be compromised if instrument concepts are poorly understood. For this reason, the availability of pre-translated instruments in appropriate languages for the selected target populations was a key criterion in their selection. Instruments needed to be available in Farsi (Persian), as it is a national language of Afghanistan (Dari) and is also understood by many Kurdish refugees, as well as in Arabic and English. No instruments were identified in any of the Kurdish dialects; however, as most Kurds have been educated in the relevant state languages, Farsi and Arabic were considered a compromise choice. In addition to availability in appropriate languages, we also required questionnaires to have reported adequate validity and reliability with comparable populations, to measure the constructs of interest, and ideally to have comparative national or local population data sets available.

Because of our Afghan and Kurdish focus, articles describing research with refugee and migrant participants from the Middle East were reviewed to identify instruments that had been selected by other authors (summarised in Table 1). This revealed no clear consensus on which to base the choice of instruments, as many authors did not discuss language details.

Table 1 Published studies of health and wellbeing in Afghan and Middle Eastern refugees and migrants

Following extensive database and internet searching, three instruments were selected for the study, with the format, scoring, website information and comparative data sources summarised in Table 2. All are freely available from the website links listed, in a selection of languages suitable for use with groups from the Middle East region.

Table 2 Summary of instrument characteristics

• Kessler-10 Psychological Distress Scale (K-10)

The Kessler-10 scale is a population screening tool for psychological distress and has been used in New Zealand and Australian national health and state surveys [3, 4]. The K-10 consists of ten questions designed to measure psychological distress over the previous four weeks, each scored with five response categories on a Likert scale. The sum of all ten items gives a total score with a range from 10 to 50. Variations in cut-off levels have been noted; however, the NZ and Australian health surveys use the following criteria: scores of 10-15.9 indicate a low risk of psychological distress; scores of 16-21.9 indicate that an individual may be experiencing moderate levels of distress consistent with a diagnosis of moderate depression and/or anxiety disorder; scores of 22-29.9 suggest a high level of distress; and scores of 30 or more indicate the possibility of very high or severe levels of distress. An additional four questions, which do not contribute to the final score, are included to assess the impact or degree of disability associated with the identified level of distress. Only people scoring above the minimum are asked to complete these. The questionnaire has been translated into Farsi, Arabic and Turkish and validated by the Transcultural Mental Health Centre in NSW, Australia.

One recent study assessed the psychometric properties of the instrument with Moroccan and Turkish respondents, concluding that it is a reliable and valid screening instrument for anxiety and depression among groups from the Middle East [5]. The K-10 compares favourably with diagnostic interviews (World Health Organization Composite International Diagnostic Interview (CIDI)) and also with the General Health Questionnaire-12 (GHQ-12) [6].
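The K-10 scoring rules described above can be expressed as a short sketch. The function names `k10_total` and `k10_category` are hypothetical, and the assumption that each item is coded 1 to 5 follows from the stated total range of 10 to 50; the category bands are those quoted for the NZ and Australian health surveys.

```python
def k10_total(item_scores):
    """Sum the ten K-10 items (each assumed to be coded 1 to 5)."""
    if len(item_scores) != 10:
        raise ValueError("the K-10 has exactly ten items")
    if any(not 1 <= score <= 5 for score in item_scores):
        raise ValueError("each item must be scored from 1 to 5")
    return sum(item_scores)


def k10_category(score):
    """Map a total (or group mean) score to the survey risk bands."""
    if not 10 <= score <= 50:
        raise ValueError("K-10 scores range from 10 to 50")
    if score < 16:
        return "low"
    if score < 22:
        return "moderate"
    if score < 30:
        return "high"
    return "very high"


# Hypothetical respondent answering 2 on every item
total = k10_total([2] * 10)       # 20
category = k10_category(total)    # "moderate"
```

Accepting non-integer scores lets the same function classify group means as well as individual totals.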

• General Perceived Self-Efficacy Scale (GPSE)

The GPSE aims to assess an individual's general sense of self-efficacy, reflecting their ability to cope with daily hassles and their flexibility to adapt after experiencing stressful life events. It correlates positively with self-esteem and optimism and negatively with anxiety, depression and physical symptoms. Efficacy beliefs control levels of motivation, perseverance and resilience to adverse situations; they also affect an individual's vulnerability to stress and depression and influence life choices [7]. Measurement of generalised self-efficacy has been subject to debate, although recent studies have confirmed it as a global construct [8, 9]. The scale consists of ten questions in which respondents rate how well each statement describes their approach to problem situations on a four-point Likert scale. A sum score, with a range from 10 to 40 points, can be calculated by adding all responses, or alternatively a mean score may be used. Higher scores represent higher perceived self-efficacy. If there are more than three missing values, scores are not calculated. The scale is available in 30 languages from the website listed in Table 2, which also provides links to comparative data sets.
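As an illustration of the GPSE scoring rule and its missing-value limit, a minimal sketch follows. The function names are hypothetical, items are assumed to be coded 1 to 4 with `None` marking an unanswered item, and the choice to average only the answered items when up to three are missing is an assumption rather than something specified here.

```python
def gpse_mean(responses, max_missing=3):
    """Mean GPSE score, or None if too many items are missing.

    responses: ten values, each 1-4 or None for an unanswered item.
    Assumption: with up to `max_missing` gaps, average the answered items.
    """
    if len(responses) != 10:
        raise ValueError("the GPSE has exactly ten items")
    answered = [r for r in responses if r is not None]
    if any(r not in (1, 2, 3, 4) for r in answered):
        raise ValueError("each answered item must be scored from 1 to 4")
    if len(responses) - len(answered) > max_missing:
        return None  # more than three missing values: no score calculated
    return sum(answered) / len(answered)


def gpse_sum(responses):
    """Sum score (10-40), returned only for fully completed scales."""
    if gpse_mean(responses) is None or None in responses:
        return None
    return sum(responses)


complete = [3, 3, 4, 2, 3, 3, 4, 3, 2, 3]
partial = [3, None, 4, None, 3, 3, 4, 3, 2, 3]
too_sparse = [None, None, None, None, 3, 3, 4, 3, 2, 3]
```

Here `gpse_sum(complete)` gives 30, `gpse_mean(partial)` averages the eight answered items, and `gpse_mean(too_sparse)` returns `None` because four items are missing.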

• Personal Well-Being Index (PWI)

The Australian Unity Well-Being Index (Personal Well-Being Index) was selected to measure subjective wellbeing through eight domains representing the first-level deconstruction of the global question 'How satisfied are you with your life as a whole?' [10]. The domains comprise standard of living, health, life achievement, personal relationships, personal safety, feeling part of the community, future security and spirituality/religion. The optional spirituality/religion domain was included as this is an important component of subjective wellbeing for groups from the Middle East [11]. Questions are scored using an 11-point Likert scale with the anchors 0 'Completely dissatisfied' and 10 'Completely satisfied'. The domains can be analysed as separate variables, or aggregated to give an average percentage score representing subjective well-being, with higher values representing greater satisfaction. The questionnaire has validated Farsi and Arabic versions showing acceptable sensitivity between different demographic groups, and normative datasets for Australian and international populations are also available for comparison from the developer's website (Table 2).
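The aggregation of PWI domains into an average percentage score can be sketched as follows. This assumes that the percentage is obtained by rescaling the mean 0-10 domain rating to a 0-100 range (multiplying by 10); the function name `pwi_score` and the example ratings are hypothetical.

```python
PWI_DOMAINS = (
    "standard of living",
    "health",
    "life achievement",
    "personal relationships",
    "personal safety",
    "feeling part of the community",
    "future security",
    "spirituality/religion",
)


def pwi_score(ratings):
    """Average the eight domain ratings (0-10) and rescale to 0-100.

    ratings: dict mapping each domain name to a 0-10 satisfaction rating.
    Assumption: percentage score = mean domain rating * 10.
    """
    missing = [d for d in PWI_DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    values = [ratings[d] for d in PWI_DOMAINS]
    if any(not 0 <= v <= 10 for v in values):
        raise ValueError("each rating must lie between 0 and 10")
    return sum(values) / len(values) * 10


# Hypothetical respondent
example_ratings = dict(zip(PWI_DOMAINS, [8, 6, 7, 9, 10, 5, 6, 5]))
score = pwi_score(example_ratings)  # mean rating 7.0, so 70.0 on a 0-100 scale
```

Keeping the ratings keyed by domain name also makes it straightforward to analyse each domain as a separate variable.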

As all instruments were directly downloaded in suitable languages, further translation prior to use was not necessary. We offered participants pre-translated and validated Farsi and Arabic versions, in addition to English. Turkish versions of the K-10 and GPSE were also obtained (although not needed) due to anticipated variations in the demographic profile of Kurdish groups in Australia; however, the PWI was unavailable in this language.

We had participant information sheets outlining our study objectives and procedures, as well as consent forms professionally translated into Farsi and Sorani (Kurdish dialect) using a standard back-translation procedure for the benefit of participants and interpreters. During this process, the translators also checked the original translated instruments and prepared them so that they were well presented and their format was consistent.

Open-ended questions were included in the interview to provide qualitative feedback and personal perspectives on participants' resettlement experiences. These explored differences between home and host countries, resettlement difficulties and suggestions for improvement, assessment of support, and strategies for dealing with stress and ill health. Respondents were also given the opportunity to raise any other issues of concern or interest. The qualitative results will be reported separately.


Participants were adults of Afghan or Kurdish ethnicity, aged 18 years or older and resident in either Perth or Christchurch at the time of the study, who had arrived in New Zealand or Australia as refugees or asylum seekers between 1988 and 2008. A link methodology (chain referral) sampling approach was used to overcome some of the sampling challenges with socially invisible groups, including invisibility in national data sets (a particular issue for people of Kurdish ethnicity), difficulties with access and trust, and concerns about research motives. Multiple access points into each of the four refugee groups helped reduce selection bias while improving the representativeness of the sample [12, 13]. At least six discrete snowball initiation points were used with each group, with a variety of people recruited from each entry point, giving a good cross-section of each community.


The sample consisted of 193 former refugees living in Christchurch (n = 98) and Perth (n = 95); 47% were Afghan and 53% Kurdish, and 48% of the sample was female. Participants' ages ranged from 18 to 70 years, with time since resettlement ranging from several months to 20 years. Although sixteen had been minors at the time of arrival, all except two had been of school age, mostly teenagers, and had clear recollections of the resettlement experience. Most participants (86%) reported having functional English ability, and everyone settled for over ten years was able to speak it. Despite this, many people still preferred to use Farsi versions of the questionnaires, as outlined in Table 3.

Table 3 Questionnaire language version selected by participants (n = 193)

There were significant differences in the language chosen between Afghans and Kurds (with Afghans more likely to choose Farsi versions), between those settled in Christchurch and Perth, and based on English language ability. Variations in language choice between locations were mainly due to differences in resettlement time, with participants in Perth settled longer overall. No gender differences were observed. The length of time settled influenced the language version completed. Using the Mann-Whitney test for each paired combination of categories with the Bonferroni-corrected significance level (p = .008), significant results were observed between groups settled for 1-2 years and 11-20 years (U = 404.0, Z = -4.406, p < .001), 3-5 years and 11-20 years (U = 536.0, Z = -4.973, p < .001), and 6-10 years and 11-20 years (U = 1121.0, Z = -4.431, p < .001). This indicates that people settled 11 years or longer were more likely to complete English questionnaires.
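The corrected significance level can be checked with a short calculation. The sketch below assumes four resettlement-time categories (1-2, 3-5, 6-10 and 11-20 years) and a family-wise level of .05, which together are consistent with the reported threshold of .008; the function names and the example data are hypothetical, and the U statistic is computed by direct pair counting rather than by the SPSS routine used in the study.

```python
from itertools import combinations


def bonferroni_alpha(n_groups, family_alpha=0.05):
    """Per-comparison significance level for all pairwise comparisons."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed")
    n_comparisons = len(list(combinations(range(n_groups), 2)))
    return family_alpha / n_comparisons


def mann_whitney_u(a, b):
    """Mann-Whitney U by pair counting (ties count one half).

    Returns the smaller of the two U values, as commonly reported.
    """
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


categories = ["1-2 years", "3-5 years", "6-10 years", "11-20 years"]
pairs = list(combinations(categories, 2))      # six pairwise comparisons
threshold = bonferroni_alpha(len(categories))  # 0.05 / 6, approximately 0.0083

# Hypothetical coded responses for two groups (not study data)
u = mann_whitney_u([1, 3, 5], [2, 4, 6])
```

A pairwise comparison is treated as significant only if its p-value falls below `threshold`.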

Most participants self-completed questionnaires in their chosen language and discussed responses to open-ended questions in English, with interpreter help as needed. No one requested Turkish copies and only a few people wanted Arabic copies as a cross-reference for English. Likert formats proved easy to understand, even for pre-literate participants.

All instruments showed good reliability when tested with our data using separate English and Farsi versions, and also when results were combined for the entire sample of 193 participants, as shown in Table 4.

Table 4 Reliability testing of instruments - Cronbach's alpha

Descriptive findings from the study, split by gender, refugee community and the questionnaire language version completed, as well as the total scores for the combined sample of 193 participants, are presented in Table 5. A full analysis of the results will be reported separately (article in preparation). As shown, statistically significant differences in mean scores were noted by gender for each instrument, by refugee group for the PWI, and between language versions for the K-10 and GPSE.

Table 5 Participant descriptive statistics for each instrument


Conflicts in the Middle East have led to large numbers of refugees from Afghanistan and Iraq seeking resettlement through the United Nations. Both conflict and globalisation have increased the movement of people between countries with very different cultural backgrounds, posing a number of methodological and ethical challenges for research with such groups. In particular, quantitative measures are needed that allow comparison between groups and monitoring of trends related to resettlement.

The validity of a study using standardised instruments may be compromised if concepts are poorly understood by participants, so provision of validated instruments in suitable languages is necessary. Many instruments have been used in previous studies with refugee groups [14], and some, such as the Harvard Trauma Questionnaire (HTQ) or Vietnamese Depression Scale, were specifically developed for refugee research; however, many of these instruments focus on pre-migration traumatic experiences or were developed only for use with specific groups. Hollifield and colleagues [14], in a review of 183 articles describing trauma and health status in refugees, identified 12 specific refugee instruments, but none met all their evaluation criteria for definition of purpose, construct definition, design, development and testing with refugee groups, nor were any of the instruments available in the published literature. Instruments such as the Hopkins Symptom Checklist (HSCL) and Beck Depression Inventory meet Hollifield's evaluation criteria, provide measures of general health status and have been adapted for use with forced migrants. These adapted instruments seemed a possibility for use, but many were not available in languages spoken by immigrants and refugees who come from the regions of our concern and have few traditional linkages to western academic or health care institutions. As the emphasis of our study was on a general overview of health and quality of life to reflect the daily realities associated with resettlement, specialised diagnostic trauma instruments were not selected.

As described and summarised in Table 1, the next step in selecting suitable instruments was to identify instruments previously used with participants from the Middle East or Afghanistan. We included studies of refugees, asylum seekers or migrants living in resettlement countries. Of these, twenty instruments were used in ten quantitative studies, but no consensus on the suitability of different questionnaires emerged. In many cases it was unclear which language versions were used, as this was rarely discussed, with the focus of most articles being on results and analysis. Of the well-known instruments, the HSCL-25 and HTQ were each used twice, and the General Health Questionnaire-30 (GHQ-30) was used once for assessment of mental health status. Although translations into Arabic and Farsi have been reported for some of these instruments, we could not locate them through searching published articles and the internet. In contrast, the instruments eventually chosen, although not specialised refugee tools, were freely available in translation, easy to find, did not require administration by specialist personnel and, with the exception of the GPSE, were commonly used or developed in Australasia, so comparison with local national data sets was possible.

Ideally, translated instruments should have been validated with the community in question, or with groups from similar cultural backgrounds, to reflect conceptual variations and different explanatory models [15, 16]. If translations are not available, questionnaires need to undergo a standard translation/back-translation process, taking care to ensure semantic and conceptual equivalence [17, 18] and to avoid culturally sensitive material, and then need validation with each cultural group, a requirement beyond the scope of this and many other studies. The selection of previously translated versions of the instruments helped address these issues.

In practice, nearly 59 percent of study participants chose English language versions, with the remainder selecting Farsi questionnaires. We found significant differences in language choice by ethnic group, resettlement location, English language ability and resettlement time. People from Afghanistan were more likely to choose Farsi even many years after arrival, as it is their first language, while Kurdish respondents mainly chose English. No instruments were available in any of the Kurdish dialects; however, most Kurds are educated in the languages of their state of origin, mainly Farsi, Arabic or Turkish, and may not be literate in Kurdish, which adds an extra layer of complexity and limitation for research with these groups. In a small number of cases, mainly for pre-literate participants, questionnaires were completed with interpreter assistance, so the language version used depended on the interpreter. Overall, participants in Perth had been settled longer, which accounted for some variation in English ability between the two locations and was also reflected in the language versions chosen.

Questionnaires showed good reliability (Table 4) when tested for each language version and with combined results. Amongst our participants, the PWI presented no problem for completion; however, a few people had trouble with some of the GPSE questions, with seven (three English, four Farsi) failing to complete the required number for inclusion. These questions asked participants to rate how well each statement described their approach to various situations, for example 'I am certain that I can accomplish my goals'. For those with strong religious beliefs (97 percent, mostly Muslim), the relevance of these concepts to their personal lives was not apparent. As one woman stated, "It doesn't matter what I think, God decides". Question 10 in the K-10, which asks if participants felt 'worthless', was culturally problematic for some Kurdish respondents as it challenged their ideal of human dignity; however, they understood the reason for the question and responded accordingly. Despite these minor concerns, the instruments were easy to understand, with the format and Likert scales presenting no difficulties for participants, including those with limited literacy.

Researchers need to be cautious with interpretation of results, and aware that response biases have been reported in cross-cultural surveys with other instruments. In particular, acquiescent responses to personally relevant items have been more commonly observed in collectivist cultures [19, 20]. Cut-off points for each instrument and population norms, preferably with existing result databases to allow meaningful comparisons and conclusions to be drawn, should also be available if possible. Determination of cut-off values normally involves comparison with other instruments or interviews as the 'gold standard' to assess the validity of the instrument, and should ideally be carried out for each cultural group surveyed. For example, a high prevalence of anxiety and depression is commonly reported in Afghanistan; however, a comparison of standard mental health questionnaires with psychiatric interviews indicated differences in optimal cut-off points [21]. In particular, gender disparities have been noted, with recommended cut-off points lower than normal for men and higher for women, suggesting that some studies may have under- or over-estimated prevalence rates respectively. Although it was beyond the scope of the present study to determine this, mean K-10 scores were 21.8 for females and 18.5 for males, so even if these were adjusted accordingly they would still fall within the mild/moderate risk range for psychological distress.

Our study was exploratory, as comparative assessment of similar ethnic groups in Australia and New Zealand has not previously been attempted; however, there are limitations that need to be acknowledged. Firstly, our desire to use pre-translated, culturally validated instruments in Farsi considerably limited the choice of instruments available. None of the specialised refugee instruments was available in Farsi, nor were we able to locate any other commonly used tools in that language when the study was developed in 2006-7. Some authors prohibit independent translation, and as it was beyond the scope of our study to undertake a full translation/validation procedure, a key selection criterion was availability of the instrument in suitable languages. However, because these instruments have not commonly been used with refugee groups, comparison data are limited. Although validation should ideally be carried out with each target group, we had to rely on the validation undertaken by those who translated the instruments, who used Farsi-speaking groups that served as a proxy for our Afghan and Kurdish participants. The chain referral sampling method used also limits generalisability of our results to a wider population, although the personal endorsements that characterise this method helped break down barriers, providing reassurance to potentially suspicious participants. This proved particularly helpful for recruitment of female participants and helped ensure a large enough sample for a valid study.


Overall, our experience with these three instruments, the Kessler-10 Psychological Distress Scale, the General Perceived Self-Efficacy Scale and the Personal Well-Being Index, suggests they are suitable for use with former refugees from the Middle East and Afghanistan. They were easy to obtain in appropriate languages and scripts, generally presented no significant problems for participant completion, have population datasets available for comparison and showed good reliability when tested with our sample. The majority of Afghan participants completed Farsi language versions, most Kurdish participants preferred English questionnaires to Farsi, and no participant chose Arabic or Turkish versions. Participants settled 11 years or longer were more likely to complete English versions than those settled ten years or less, so provision of study materials in suitable language translations for participants within this time frame is important.

Despite predictions of an increasing number of refugees in the future, at present there are limited methodological articles available to assist researchers planning studies with ethnic minority groups. Reviews of suitable instruments that allow collection of consistent and comparable data from refugees are needed. As our societies become increasingly multicultural, there is an imperative to ensure research with diverse ethnic groups is robust and conceptually sound, so instrument evaluation, cross-cultural and linguistic preferences, and interpretation of results should all be taken into consideration as part of the research process.