Introduction

Auditory vocal hallucinations (AVH) are prevalent in children, adolescents, and adults, both in clinical settings and in the general population [1,2,3,4]. AVH are common in psychotic illnesses and in other mental disorders such as depression and bipolar, dissociative, and substance use disorders [5]. AVH severity may predict, among other outcomes, social problems [6, 7], suicidal ideation [8], or substance abuse [9]; reliable assessment of AVH is therefore important. Given that voice-hearing is an internal experience that cannot be directly observed or measured, investigating AVH relies on individuals’ reports of their experiences. The most reliable way to obtain these reports is through structured interviews and self-report instruments.

In 2012, Bartels-Velthuis and colleagues validated the AVHRS, a structured interview that gives insight into the characteristics of voices [10] and from which a severity measure of voice-hearing can be derived. The AVHRS distinguishes itself from other measures of AVH because, besides the qualitative characteristics and severity of AVH, it also assesses the form and content of voices (in contrast to the BAVQ-R; [11]) and the number of voices (in contrast to the PSYRATS; [12]) (see the validation paper [10] for a more elaborate description). Given that there has been a shift from interview measures to self-report measures of AVH [13], a questionnaire version of the AVHRS was warranted and has now been developed. Self-report measures have the benefit of being inexpensive and time-efficient, and they do not require training of assessors. This is especially useful when quick or frequent assessment of AVH is required. For example, in clinical practice this may be necessary for routine outcome monitoring (ROM) or for clinical intakes. Self-report measures may also be useful in research on clinical therapies, to examine to what extent, or at which time-point, the occurrence, characteristics, and severity of AVH change.

A number of self-report measures for AVH are available [13]. These questionnaires are usually tailored to measure a specific aspect of AVH, for example beliefs about AVH (e.g., BAVQ-R; [11]), interpretations and attitudes towards AVH (e.g., VPD; [14]), coping with AVH (RAHQ; [15]), and mindfulness of AVH (SMVQ; [16]). There are some questionnaires on AVH that have a wider focus and are also quite brief (the 13-item Hamilton Program for Schizophrenia Voices Questionnaire (HPSVQ), [17]; the ten-item delusion and voices self-assessment (DV-SA), [18]). However, these questionnaires do not incorporate items on the form of address (1st, 2nd or 3rd person), the location of voices, separate or simultaneous voices, severity of negative content, or whether the voices make the voice-hearer anxious. The DV-SA, specifically, does not enquire about the duration or loudness of voices, or whether negative voices are present. Overall, compared to previous measures, the AVHRS-Q offers a comprehensive assessment of AVH, encompassing multiple qualitative aspects of AVH (e.g., negative voices, distress, interference with thinking and daily functioning) in a set of 17 items.

The aim of this study is to validate a self-report version of the auditory vocal hallucination rating scale (AVHRS; [10, 19]), called the AVHRS-Q(uestionnaire). In this validation study, the internal consistency, convergent validity and divergent validity of the AVHRS-Q will be examined. It is expected that the AVHRS-Q will correlate highly with the interview version (AVHRS; [10]), demonstrating good convergent validity. As greater severity of AVH is related to both increased psychological distress and a lower quality of life [20], it is expected that the severity measure of the AVHRS-Q will correlate with measures of psychological distress (the outcome questionnaire, OQ-45; [21] and the symptom checklist, SCL-90; [22]) and quality of life (the Manchester Short Assessment of Quality of Life, MANSA; [23]). However, given that the AVHRS-Q specifically measures AVH characteristics and severity, and not general psychological distress or quality of life, the correlations between these measures are expected to be no more than moderate, indicative of divergent validity.

Methods

Participants and procedures

For the current study, data from two clinical samples (for demographics see Table 1) were used. Inclusion criteria were receiving treatment for AVH, being between 18 and 65 years old, and having a good command of the Dutch language. The exclusion criterion was having an organic brain disorder. Approval for the study with sample I was obtained from the Medical Ethics Committee of the University Medical Center Groningen (ref: M13.146159). The sample size for sample I was calculated a priori by a statistician. It was determined that at least 31 people were required to obtain a two-sided confidence interval with minimum length of 0.1 for a correlation of 0.9. Thirty-two patients with AVH were recruited for sample I at the Voices Outpatient Department of the University Medical Center Groningen (The Netherlands). Patients were approached by their therapist or by the research coordinator of the Voices Outpatient Department, and received both verbal and written information about the research study, including an informed consent form. Upon providing written informed consent, participants were contacted by the researcher, completed the AVHRS-Q, and were interviewed with the AVHRS. During the study, participants alternately started with the self-report version of the AVHRS (AVHRS-Q) or with the interview (AVHRS) to rule out selective memory biases for one of the measurements. Data collection for sample I took place from February 2011 until December 2015.
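The text does not state which method the statistician used for the a priori sample size calculation. As an illustration only, the sketch below shows one common way to relate sample size to the width of a confidence interval for a correlation, the Fisher z approximation. The choice of this approximation and the function name `fisher_ci` are assumptions, and the output is not claimed to reproduce the reported figure of 31 participants.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate two-sided 95% CI for a Pearson correlation (Fisher z method).

    Assumption: this is a generic textbook approximation, not necessarily
    the procedure used in the study.
    """
    z = math.atanh(r)                # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Interval width for an expected correlation of 0.9 at several sample sizes
for n in (20, 31, 50):
    lower, upper = fisher_ci(0.9, n)
    print(f"n={n}: CI=({lower:.3f}, {upper:.3f}), width={upper - lower:.3f}")
```

The interval is asymmetric around 0.9 because the transformation is non-linear, so "length" can be read as the full width or as the distance to one bound; the source does not say which was intended.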

Table 1 Demographics of sample I and II

Data for sample II were retrospectively retrieved from a pseudonymised routine outcome monitoring (ROM) database collected in the context of mental healthcare at the Voices Outpatient Department of the University Medical Center Groningen (The Netherlands). All patients who are referred for treatment to the University Medical Center Groningen take part in ROM assessments and are informed that their data may be used for research purposes, with the option to opt out. Given that these data were collected in the context of treatment and not for research purposes (and therefore did not require patients to change their behavior for the research), no additional ethical approval for the data is required according to Dutch legislation. Sample II consisted of 82 patients with AVH receiving treatment at the Voices Outpatient Department of the University Medical Center Groningen (The Netherlands). At the start of their treatment, they completed the AVHRS-Q, the MANSA, and the OQ-45 (n = 62) and/or the SCL-90 (n = 24) (depending on which instrument their therapist selected) through the ROM service. The requested ROM data were collected from October 2011 until February 2017. As the current study took place in The Netherlands, all questionnaires and interviews were completed in Dutch.

Measures

The AVHRS and development of the AVHRS-Q

The AVHRS [10, 19] is a structured 16-item interview, administered by an experienced therapist to evaluate AVH during a period of 1 month. The AVH are rated on four- and five-point scales in terms of frequency, duration, loudness, negative content, distress, anxiety, control, and interference with thinking and daily life. Scores range from 0 (not applicable) to 3 or 4 (most applicable).

The AVHRS-Q [24] is the self-report version of the AVHRS, designed to be administered without the presence of an interviewer, therapist or researcher. A full version of the AVHRS-Q can be downloaded at https://www.rgoc.nl/downloads (see Table 2 for a summary of the items). The AVHRS-Q has 17 items, 15 of which are assessed on a four- or five-point scale and two on a ten-point scale. For the four- and five-point scales, scores range from 0 (not applicable) to 3 or 4 (most applicable). For the ten-point scales, scores range from 1 (not at all/never) to 10 (extremely/always). The items of the AVHRS-Q were based on the items of the AVHRS, but adapted somewhat for the purpose of self-report administration. The first version of the AVHRS-Q was evaluated by ten patients with AVH. Based on their feedback and input from experts in the field, ten questions from the original AVHRS were refined and one item was expanded into two items. Specifically, some items of the AVHRS-Q received more answer options than in the interview version (see Table 2, items 3, 4, 7, 8, 10, and 15). For example, the item assessing duration of voices has four answer options in the interview version (seconds, minutes, 1 h, and several hours to continuously) compared with five options in the questionnaire (see Table 2, item 4). Moreover, the wording of some items was reformulated to be simpler and less ambiguous (see Table 2, items 12 and 13). Additionally, the AVH frequency and intensity of suffering in the AVHRS-Q (see Table 2, items 16 and 17) are rated on a ten-point scale instead of a five-point scale as in the AVHRS, so as to be more sensitive to subtle changes over time.

Table 2 Summary of individual items of the AVHRS-Q and construction of the severity index

In accordance with the AVHRS and previous publications with this measure [4, 7, 25, 26], a severity index can be composed from the individual items of the AVHRS-Q. Items regarding the number of voices, localization of voices and hypnagogic/hypnopompic hallucinations are not included in the severity index (see previous publications; [10]). The answers to individual items are recoded to ‘0’ (none to mild consequences) or ‘1’ (considerable to severe consequences). Subsequently, a sum score of the recoded items is created, ranging from 0 to 14. In addition to the AVHRS-Q providing an overall severity measure of AVH, the individual items can also be used to yield specific information on characteristics of AVH (see Table 2).
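The recode-and-sum procedure described above can be sketched as follows. This is a minimal illustration, not the official scoring rule: the per-item cut-offs are defined in Table 2 and are not reproduced here, so the item names and the cut-off values in `HYPOTHETICAL_CUTOFFS` are placeholders chosen only to make the example run.

```python
# Hypothetical cut-offs: a response at or above the cut-off is recoded to 1
# ('considerable to severe consequences'), otherwise to 0.
# The real per-item cut-offs are given in Table 2 and may differ per item.
HYPOTHETICAL_CUTOFFS = {f"item_{i}": 2 for i in range(1, 15)}  # 14 scored items

def severity_index(responses, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Recode each scored item to 0 or 1 and sum the result (range 0 to 14)."""
    total = 0
    for item, cutoff in cutoffs.items():
        total += 1 if responses[item] >= cutoff else 0
    return total

# Invented responses: five items answered with 3, the remaining nine with 0
example = {f"item_{i}": (3 if i <= 5 else 0) for i in range(1, 15)}
print(severity_index(example))
```

Keeping the cut-offs in a dictionary allows a different threshold per item, which would be needed because the items use four-, five-, and ten-point scales.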

Quality of life

Quality of life was assessed with the Manchester Short Assessment of Quality of Life (MANSA; [23]), a 16-item self-report measure consisting of four objective items and 12 subjective items [satisfaction with life, accommodation, housemates (or living alone), leisure activities, physical health, psychological health, personal safety, friendships, relationship to family, (absence of) romantic relationship, sex life, and financial circumstances]. Items are rated on a seven-point Likert scale, ranging from 1 ‘could not be worse’ to 7 ‘could not be better’. The summary score consists of the mean of the twelve subjective items, with higher scores indicating better quality of life.

Psychological distress

Psychological distress was assessed with either the outcome questionnaire (OQ-45) or the symptom checklist (SCL-90). Given that the data were collected through ROM assessments for treatment purposes, the therapist was free to choose which measure was administered to the patient on the basis of the therapist’s own preference and familiarity with the instrument. For the current study, both questionnaires were selected, as using only one would have led to a loss of information. The OQ-45 [21] is a 45-item self-report measure assessing clinical outcome in terms of symptom distress, interpersonal relations and social role performance. For this study, the symptom distress subscale, consisting of 25 items, was used. Each item is scored on a five-point rating scale, from ‘0’ (never) to ‘4’ (almost always). A sum score denoting psychological distress was computed by adding up all items, with higher scores indicating more distress. The SCL-90 [22] is a 90-item self-report measure assessing a range of psychopathology. Each item is rated on a five-point rating scale, from ‘1’ (never) to ‘5’ (almost always). The items are clustered in nine dimensions: somatization, obsessive–compulsive, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, and psychoticism. A sum score denoting psychological distress was computed by adding up all items. Higher scores suggest a lower level of psychological and physical functioning.

Statistical analyses

Analyses were carried out using SPSS version 23 for Windows [27]. In sample I, two severity groups were created separately for both the AVHRS and AVHRS-Q: those with ‘severe AVH’ (scoring in the highest quartile of the severity index, i.e., in our study 10 or higher) and those with ‘mild AVH’ (scoring 0–9).

To examine convergent validity in sample I, Pearson correlation coefficients between total severity scores and separate items of the AVHRS and AVHRS-Q were computed. A paired-samples t test was performed to examine the differences in the mean AVH severity score between the AVHRS-Q and the AVHRS. An exact McNemar’s test was used to examine the distribution of mild and severe AVH groups between the two measures. Internal consistency of both instruments was determined by calculating Cronbach’s alpha [28].
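For readers who want to see what two of these analyses compute, the sketch below implements Cronbach's alpha and the exact McNemar test in plain Python. The study itself used SPSS, so this is an illustrative re-implementation under stated assumptions. All scores are invented, the function names are hypothetical, and only the severity cut-off of 10 is taken from the text.

```python
from math import comb
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                                      # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])      # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    # Binomial(n, 0.5) probability of a count at most min(b, c), doubled
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def severity_group(score, cutoff=10):
    """Cut-off of 10 as reported for this study's highest quartile."""
    return "severe" if score >= cutoff else "mild"

# Invented (questionnaire, interview) severity scores for five participants
paired_scores = [(12, 11), (4, 5), (10, 9), (8, 10), (3, 3)]
groups = [(severity_group(q), severity_group(i)) for q, i in paired_scores]
b = sum(1 for g in groups if g == ("severe", "mild"))   # severe on Q only
c = sum(1 for g in groups if g == ("mild", "severe"))   # severe on interview only
print("discordant pairs:", b, c, "p =", mcnemar_exact(b, c))

# Invented recoded item scores (0/1) for four respondents on three items
item_scores = [[0, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
print("alpha =", round(cronbach_alpha(item_scores), 2))
```

The exact McNemar test uses only the discordant pairs, that is, participants classified as severe by one instrument and mild by the other, which is why the concordant pairs do not enter the calculation.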

To examine divergent validity in sample II, Pearson correlation coefficients were computed between the total severity score of the AVHRS-Q and the total scores of the MANSA, the SCL-90, and the OQ-45.
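As a reminder of what the Pearson coefficient measures, a minimal sketch in plain Python follows. The study used SPSS, the function name `pearson_r` is hypothetical, and the scores are invented for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented AVH severity scores and quality-of-life means for six patients
severity = [3, 7, 10, 12, 5, 9]
quality_of_life = [5.1, 4.0, 3.2, 3.9, 4.4, 4.6]
print(round(pearson_r(severity, quality_of_life), 2))
```

A negative coefficient would be expected between severity and quality of life, because higher MANSA scores indicate better quality of life.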

Results

Descriptives

In sample I, the AVHRS-Q took an average of 5.8 min to complete (SD: 2.72, range: 2–15), whereas the AVHRS took an average of 14.3 min to administer (SD: 4.69, range: 8.3–27).

Internal consistency

In sample I, Cronbach’s alpha of both the AVHRS-Q and the AVHRS was 0.87. In sample II, Cronbach’s alpha of the AVHRS-Q amounted to 0.78.

Convergent validity

The average severity scores and severity groups for both the AVHRS-Q and AVHRS for sample I are given in Table 3. The severity measures of the AVHRS and AVHRS-Q were highly correlated. This correlation did not differ for participants who started with the AVHRS (r = 0.90, p < 0.01) and those who started with the AVHRS-Q (r = 0.89, p < 0.01). The Pearson correlation coefficients between individual corresponding items of both measures ranged from 0.44 (moderate) to 0.82 (high), with a median of 0.72 (see Table 4). The mean AVH severity measure and the distribution of severity groups did not differ significantly between the AVHRS-Q and the AVHRS.

Table 3 Average AVH severity score and distribution of severity groups for the AVHRS-Q and AVHRS (sample I, N = 32)
Table 4 Correlations between individual items of the AVHRS-Q and AVHRS

Divergent validity

Descriptives of sample II are given in Table 5. In sample II, the AVHRS-Q severity score was moderately correlated with the psychological distress (OQ-45 and SCL-90) and the quality of life (MANSA) scores. AVH severity was not significantly different between those who did and did not complete the OQ-45 [t(80) = 0.46, p > 0.05] and SCL-90 [t(80) = − 0.48, p > 0.05].

Table 5 Comparison of average AVH severity scores (AVHRS-Q) with measures of quality of life (MANSA) and psychological distress (OQ-45 and SCL-90) (sample II)

Discussion

The current study shows that the auditory vocal hallucination rating scale questionnaire (AVHRS-Q) [24] is a reliable and valid self-report instrument to assess the characteristics and severity of auditory vocal hallucinations (AVH). The findings demonstrate that the AVHRS-Q converges highly with the interview measure on which it was based (the AVHRS; [10]). In addition, the AVHRS-Q is shown to be a specific measure of AVH and not a general measure of psychological distress (OQ-45; [21] and SCL-90; [22]) or quality of life (MANSA; [23]). Internal reliability of the AVHRS-Q was found to be good and comparable to the reliability of the AVHRS.

The AVHRS-Q severity scores correlate highly with the corresponding severity scores of the interview version (AVHRS). In addition, the AVHRS-Q and the AVHRS did not identify a different proportion of patients as having ‘mild’ or ‘severe’ AVH. This implies that the already validated and widely used AVHRS [10, 29,30,31] can now also be used in its self-report version for the same (research or clinical) purposes. Importantly, the individual items of the AVHRS-Q also corresponded highly to the items of the AVHRS, with the exception of four moderately correlating items. Given that the AVHRS-Q had to be short and not (too) cognitively demanding, explanations and examples of items were not included in the self-report questionnaire. This may have resulted in discrepancies (and therefore moderate correlations) between four specific items of the AVHRS-Q and AVHRS. It is therefore important to keep in mind that whilst the AVHRS-Q can be used to reliably obtain a quick overall severity measure of AVH (similar to the interview-based AVHRS), one should be cautious when interpreting single items of the AVHRS-Q in isolation (specifically the items on form of address, severity of negative content, interference with daily functioning, and interference with thoughts). Moreover, the AVHRS-Q severity scores were only moderately related to measures of quality of life and psychological distress, which indicates that the AVHRS-Q specifically measures characteristics and severity of AVH. Overall, the AVHRS-Q demonstrates good convergent and divergent validity in this study.

An important feature of the AVHRS-Q is that it takes only 6 min on average to complete. This makes it exceptionally suitable for quick and frequent assessments, for instance in research on the effectiveness of treatments or for frequent monitoring in a clinical context, such as routine outcome monitoring (ROM) assessments. Currently, ROM assessments for patients with psychosis often consist of more global outcome measures for positive symptoms, such as the positive and negative syndrome scale (PANSS) [32,33,34]. One PANSS item assesses the severity of hallucinations, but does not inquire about, for example, whether the patient has separate or simultaneous voices, whether the patient has negative or positive voices, or even how AVH interfere with daily functioning. All these aspects may be potentially relevant for treatment or in signifying the nature of distress. Currently, the AVHRS-Q is being utilized in the ROM protocol of the Voices Outpatient Department of the University Medical Center Groningen.

The current study has some limitations. First, in contrast to the HPSVQ [17] and the DV-SA [18], the AVHRS-Q does not enquire about the social circumstances of AVH, or whether command hallucinations are obeyed. However, the AVHRS-Q does enquire about interference with daily functioning and the presence of command hallucinations, which can be further explored during therapy. Second, similar to the validation study on the AVHRS interview [10], we did not measure sensitivity to change. As all patients were in therapy for their voices and the AVHRS-Q is incorporated in treatment, retest data would likely be confounded with therapeutic effects. To assess this in an unbiased manner, a control group not receiving treatment for their AVH should have been included. However, given that all patients had had quite severe AVH for a substantial number of years, this was deemed unethical. Third, the current study recruited two reasonably chronic patient samples, implying that the current findings may be less generalizable to healthier populations. However, the AVHRS-Q has already been administered in a general population sample, supporting its use in less chronic samples [25].

One important strength of the current study is that the AVHRS-Q is based on an existing measure, the AVHRS, which has already been shown to have good psychometric properties [10] and has been used in multiple research projects [26, 30]. A second strength is that the AVHRS-Q was evaluated by patients with AVH and that their feedback was used to improve the AVHRS-Q into its current form. Third, given that AVH are prevalent in multiple disorders and are therefore a trans-diagnostic symptom, it is a strength that the AVHRS-Q was tested in patients with different disorders.

To conclude, the AVHRS-Q is a quick self-report version of a validated interview on auditory hallucinations already in use, the AVHRS. The current study demonstrates that the AVHRS-Q has good internal consistency, convergent validity and divergent validity. The AVHRS-Q can very well be applied in both clinical practice and research, where it is required to assess AVH in a quick and reliable manner.