Key Summary Points

Why carry out this study?

A review of existing clinical outcome assessments (COAs) available for use in amblyopia populations determined that none were fit-for-purpose, according to regulatory guidelines that describe standards for developing and/or evaluating instruments used to support endpoints in clinical trials.

Given the lack of suitable instruments available for use in amblyopia trials, this study aimed to conduct qualitative research to support development and evaluate the content validity of new COA instruments assessing visual function symptoms, impacts on vision-dependent activities, and broader health-related quality of life (HRQoL) in adult and pediatric populations, namely the three versions of the Amblyopia Quality of Life (AmbQoL).

What was learned from the study?

This study provides evidence to support the face and content validity of the 23-item adult/adolescent patient-reported outcome (PRO), 24-item child PRO, and 12-item observer-reported outcome (ObsRO).

Further research is required to evaluate the psychometric measurement properties of the three AmbQoL versions to support their use in amblyopia treatment trials.

Introduction

Amblyopia is a neurodevelopmental vision disorder defined as unilateral or bilateral reduction of visual acuity due to impaired development of the visual pathway, affecting 1.36% of the global child population [1, 2]. Primary causes of amblyopia include differences in refractive error between the eyes (anisometropia), ocular misalignment resulting in disrupted binocular interaction (strabismus), or stimulus deprivation (e.g., cataract) [3]. In addition to loss of visual acuity, amblyopia affects a range of visual functions, including depth perception (stereopsis) and oculomotor function [3, 4], resulting in considerable impacts on vision-dependent activities, mobility, and broader health-related quality of life (HRQoL) [4,5,6,7].

The gold standard of care in children is correction of the underlying refractive error (e.g., via glasses), followed by patching or atropine penalization of the fellow eye [8,9,10]. More recently, novel binocular approaches to managing amblyopia have emerged [11, 12], which aim to mitigate some of the psychosocial and mobility impacts (e.g., bullying, feelings of embarrassment, and physical discomfort) associated with traditional treatments [5, 6, 13] that often lead to poor treatment compliance and suboptimal clinical outcomes [13].

Clinical Outcome Assessments (COAs) can help to evaluate the efficacy of new treatments. For COAs to be considered “fit-for-purpose” to support endpoints in clinical trials, regulatory guidance highlights that COAs should measure concepts that are clinically relevant and important to patients, as well as show evidence of validity, reliability, and the ability to detect change in the target population [14, 15]. A review of existing COAs for use in amblyopia, or similar ocular conditions, demonstrated that none were adequately fit-for-purpose according to current regulatory guidance [14]. For example, some instruments lacked evidence of content validity in an amblyopia population (i.e., were developed without direct input from patients with amblyopia, or not formally tested with patients with amblyopia, in terms of understanding and concept relevance via cognitive debriefing interviews), e.g., the Functional Ability Quality of Vision (faVIQ) instrument [16]. Conceptual coverage of instruments was often limited. Specifically, instruments lacked coverage of visual function symptoms; some primarily included coverage of proximal impacts (mobility and conducting daily activities), e.g., the Impact of Vision Impairment Scale (IVI), whereas others focused on broader HRQoL domains (emotional well-being and social functioning), e.g., the Pediatric Eye Questionnaires (PedEyeQ). Existing COAs tended to employ recall periods that are deemed unsuitable for the context of a clinical trial; recall periods are either not specified or longer than accepted by the Food and Drug Administration (FDA) (i.e., “past month”), which could introduce recall bias, particularly in children [14]. Evidence of responsiveness, the ability to detect change, and interpretation of change in amblyopia populations was also not reported for any of the COAs, which are vital to ensure the instrument is suitable to assess treatment benefit in a clinical trial context. 
Finally, amblyopia affects a wide age range, with onset from birth to approximately 7 years, which can continue into adulthood [17], yet no single instrument had sufficient age-appropriate versions to target the entire age span.

Given the lack of appropriate instruments available for use in amblyopia trials, this study aimed to conduct qualitative research to support development and evaluate the content validity of new COA instruments assessing visual function symptoms, impacts on vision-dependent activities, and broader HRQoL for use in global adult and pediatric populations, in line with current regulatory guidance [14].

Methods

Study Design

This was a qualitative study to develop and evaluate new COA instruments suitable for amblyopia treatment trials. The new COA instruments were drafted on the basis of the findings of a targeted qualitative literature review. Following this, combined concept elicitation and cognitive debriefing qualitative interviews were conducted with individuals who had been diagnosed with amblyopia, and caregivers of children with amblyopia, to assess the content validity of the new COAs. Interviews were also conducted with ophthalmologists experienced in treating amblyopia, to obtain feedback on the instruments. To limit cultural bias and ensure global perspectives, participants were recruited from the USA, France, and Germany; thus, the instruments underwent a forward–backward translation into French and German. Interviews were conducted in two rounds. In round 1, feedback was obtained on the initial version of the instruments developed on the basis of the literature review findings. Based on this feedback, modifications were implemented and subsequently tested in the second round of interviews. A translatability assessment was conducted in parallel to round 1 interviews, to ensure cultural appropriateness of the wording and concepts included for future translations and use across countries. FDA feedback was also sought on the instruments between interview rounds. To ensure clinical insight, three additional clinical experts in the field of amblyopia, including a pediatric optometrist, formed an “expert panel” that provided input throughout the study; the experts were based in the UK, the USA, and Australia, to further ensure global perspectives. Figure 1 provides an overview of the study design.
Note: this article describes the cognitive debriefing interview findings only, given the cognitive debriefing focused on exploring participants’ understanding of the instrument (including items, response scale, and recall period) and relevance of the concepts assessed; the literature review and concept elicitation findings are reported elsewhere [18]. The study was approved and overseen by Western Copernicus Group Independent Review Board (WCG IRB, reference 20204313) in the USA. Written informed consent and/or assent was obtained prior to conducting any study-related activities. The study was performed in accordance with the Helsinki Declaration of 1964, and its later amendments [19].

Fig. 1 Overview of qualitative study design

Instrument Development

Development of the draft conceptual model in amblyopia and draft COA instruments to assess visual function symptoms, impact on vision-dependent activities, and wider HRQoL domains was informed by a targeted qualitative literature review (discussed elsewhere) [18], along with input from the expert panel. The concepts most frequently reported in the literature informed the initial instrument content to be tested during the combined concept elicitation and cognitive debriefing interviews. Given that amblyopia affects a wide age range, it was deemed that multiple age-appropriate versions would be needed. Three versions of the Amblyopia Quality of Life (AmbQoL) were developed, with recommended age bands for children and adolescents guided by ISPOR’s task force report [20]: a PRO for adults and adolescents (patients aged 13 years and older), a PRO for children (patients aged 9–12 years, inclusive), and an ObsRO for caregivers of patients aged 4–8 years, inclusive. The instruments were developed with a response scale to capture frequency and severity, depending on the concept. The visual function symptoms section of the adult and adolescent, and child PROs employed a 4-point Verbal Response Scale capturing frequency (“never,” “sometimes,” “most of the time,” and “all of the time”). The vision-dependent activities section of the adult and adolescent PRO used a 4-point Verbal Response Scale capturing the degree of difficulty (“not difficult,” “a little difficult,” “difficult,” and “very difficult”). Where relevant, additional response options were provided for instances where the item was not applicable. A simpler, but equivalent, response scale was selected for the child PRO (“not hard,” “a little hard,” “hard,” “very hard”). The ObsRO employed two, 3-point Verbal Response Scales, one assessing severity/degree of difficulty (“no, not at all,” “yes, a little difficulty,” “yes, a lot of difficulty”) and one assessing frequency (“never,” “sometimes,” “often”). 
Additional “not applicable” response options were provided, where relevant. A 7-day recall period was selected to reflect the relatively stable nature of amblyopia [21], but also considered long enough for patients to have conducted the activities assessed, and short enough to maximize recall ability. Instructions were included for each instrument to ensure that respondents completed the instruments thinking about their vision/conducting the activity using both eyes (i.e., not when patching), and when wearing glasses/contact lenses, if usually worn. This ensured the true impact of amblyopia was assessed, rather than impacts of treatment or a comorbid eye condition.
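The age bands and draft response scales described above can be summarized in a small lookup. This is a minimal sketch for illustration only: the function name `ambqol_version`, the dictionary `DRAFT_RESPONSE_SCALES`, and its keys are hypothetical and do not come from the instruments themselves; the age cut-offs and response wording are those stated in this section for the initial drafts.

```python
from typing import Optional

# Draft verbal response scales as described in this section (before later revisions).
# The dictionary name and keys are illustrative assumptions.
DRAFT_RESPONSE_SCALES = {
    ("adult/adolescent PRO", "frequency"): ["never", "sometimes", "most of the time", "all of the time"],
    ("adult/adolescent PRO", "difficulty"): ["not difficult", "a little difficult", "difficult", "very difficult"],
    ("child PRO", "frequency"): ["never", "sometimes", "most of the time", "all of the time"],
    ("child PRO", "difficulty"): ["not hard", "a little hard", "hard", "very hard"],
    ("ObsRO", "difficulty"): ["no, not at all", "yes, a little difficulty", "yes, a lot of difficulty"],
    ("ObsRO", "frequency"): ["never", "sometimes", "often"],
}


def ambqol_version(age_years: int) -> Optional[str]:
    """Map a patient's age to the AmbQoL version intended for that age band."""
    if age_years >= 13:
        return "adult/adolescent PRO"   # self-report, 13 years and older
    if 9 <= age_years <= 12:
        return "child PRO"              # self-report, 9-12 years inclusive
    if 4 <= age_years <= 8:
        return "ObsRO"                  # completed by the caregiver, 4-8 years inclusive
    return None                         # under 4 years: no version described


if __name__ == "__main__":
    for age in (5, 10, 16, 40):
        version = ambqol_version(age)
        print(age, version, DRAFT_RESPONSE_SCALES[(version, "frequency")])
```

Keeping the scales keyed by version and scale type makes it easy to see that the ObsRO uses 3-point scales while both PROs use 4-point scales.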

While the aim was to ensure alignment across the different age-appropriate versions, key differences exist. Specifically, the ObsRO only assesses concepts/behaviors that are observable to the caregiver, as such, does not assess visual function symptoms (e.g., blurry vision). Moreover, the child PRO includes images of filled boxes and smiley faces alongside the response scales, to aid understanding and engagement (see Fig. S1 and Fig. S2 in the supplementary materials) [22], and some of the wording used in the instrument has been simplified (e.g., “difficult” to “hard”).

Alongside each AmbQoL version, global impression of severity and global impression of change items were developed in line with regulatory guidance [23] to support any future psychometric validation pursuits.

Qualitative Interviews

Recruitment

Participants were identified by recruitment agencies via referring optometrists/ophthalmologists, advertisements on social media, and patient databases. To be eligible to participate, patients had to be ≥ 4 years of age with a current clinician-confirmed diagnosis of amblyopia (i.e., Snellen Acuity score of 6/9 or worse OR LogMAR 0.2 or worse) due to strabismus, anisometropia, deprivation, isometropia, or combined mechanism. Caregivers had to be ≥ 18 years of age and a parent/caregiver of a child aged 4–12 years with clinician-confirmed amblyopia. Participants were excluded if they/their child had previous intraocular or refractive surgery, or if they had another ocular comorbidity in either eye that may have reduced best corrected visual acuity. However, prior surgery to correct strabismus was allowed. Target sampling quotas were employed to ensure a range of clinical and demographic characteristics were represented. Ophthalmologists who had several years’ experience treating amblyopia patients were also recruited via third-party recruitment agencies. All participants were compensated for participation.
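As an illustration of how the patient eligibility criteria above combine, the following sketch encodes them as a single check. It is not the study's screening procedure: the class `Candidate`, its field names, and the function `is_eligible_patient` are hypothetical, and the sketch assumes acuity is supplied as a LogMAR value (the Snellen 6/9 alternative is not modeled).

```python
from dataclasses import dataclass

# Amblyopia causes listed in the eligibility criteria; the labels are illustrative.
ELIGIBLE_CAUSES = {"strabismus", "anisometropia", "deprivation", "isometropia", "combined"}


@dataclass
class Candidate:
    age_years: int
    logmar: float                                   # clinician-confirmed acuity (assumed LogMAR)
    cause: str
    prior_intraocular_or_refractive_surgery: bool
    other_ocular_comorbidity: bool                  # comorbidity that may reduce best corrected acuity
    prior_strabismus_surgery: bool = False          # permitted, so it does not affect eligibility


def is_eligible_patient(c: Candidate) -> bool:
    """Return True if the candidate meets the patient criteria described above."""
    return (
        c.age_years >= 4
        and c.logmar >= 0.2                          # "LogMAR 0.2 or worse" (higher is worse)
        and c.cause in ELIGIBLE_CAUSES
        and not c.prior_intraocular_or_refractive_surgery
        and not c.other_ocular_comorbidity
    )


if __name__ == "__main__":
    child = Candidate(10, 0.3, "anisometropia", False, False, prior_strabismus_surgery=True)
    print(is_eligible_patient(child))
```

Prior strabismus surgery is carried as a field but deliberately left out of the check, mirroring the statement that it was allowed.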

Target Sample

Overall, 192 participants from the USA (n = 96), France (n = 48), and Germany (n = 48) were targeted for the patient/caregiver interviews. The target sample comprised adults (aged ≥ 18 years), adolescents (aged 13–17 years), older children (aged 9–12 years), and younger children (aged 4–8 years) with amblyopia, and their caregivers (for those aged 4–12 years). On the basis of current guidance, children aged 4–8 years were interviewed together with their caregiver as a “dyad” for the concept elicitation section of the interview, as self-report is often unreliable in younger children [20]. The cognitive debriefing section of the interview was conducted with the caregivers only. Given that reliability improves with age, children aged 9–12 years and their caregivers were interviewed separately to ensure the child could independently discuss their experiences. Based on previous research, a target of 160 interviews, with at least 24 interviews conducted in each age group, was expected to be sufficient for assessing content validity of the COAs [24]. Ten ophthalmologists from the USA, France, and Germany were also targeted for the interviews.

Interview Procedure

All interviews lasted 60 min and were conducted via telephone in the participant’s native language by trained qualitative interviewers, using a semi-structured interview guide.

For the patient/caregiver interviews, adults and adolescents completed the adult/adolescent PRO, older children (aged 9–12 years) and their caregivers completed the child PRO, and caregivers of younger children aged 4–8 years completed the ObsRO. Most participants completed a paper version of the instrument; however, US participants in round 2 interviews completed an electronic version, which was administered on a 10.23″ × 6.33″ × 0.31″ tablet device with a 10.5″ screen.

Participants completed the instrument using a “think aloud” approach, where they shared their thoughts as they read each instruction/item and selected their response [25]. This helped identify any aspects that were understood/interpreted incorrectly. Participants were then asked detailed questions about their interpretation and understanding of instructions/item wording, the relevance of concepts, and the appropriateness of the response options and recall period. If a participant lacked sufficient visual functioning, the interviewer read each instruction, item, and response option out loud to the participant. Usability of the electronic device was also explored where relevant.

The ophthalmologist interviews were conducted to obtain clinical perspectives on the suitability of the instruments for use in amblyopia. Ophthalmologists were provided with a copy of the instruments and asked to provide feedback on the clinical relevance of items, whether any key concepts were missing, and the appropriateness of the item wording, response options, and recall period. Due to time constraints, it was not possible to gain feedback from every ophthalmologist on all three instrument versions.

Data Analysis

All interviews were audio recorded and transcribed verbatim; French and German interviews were further translated to English. Interview transcripts were analyzed in ATLAS.ti (Version 8) [26] using a framework approach [27]. For the patient/caregiver interviews, dichotomous codes were assigned to each item, instruction, response option(s), and recall period indicating whether it was understood, relevant, and/or appropriate, and why. Suggested changes were also coded. Where the electronic version was completed, usability feedback was coded. For the ophthalmologist interviews, codes were assigned to instrument feedback (e.g., likes/dislikes, missing/relevant concepts).
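To make the dichotomous coding concrete, the sketch below tallies per-item codes into the kind of understanding and relevance percentages summarized in the results tables. It is an assumption-laden illustration rather than the analysis actually run in the qualitative software: the function `summarize`, the tuple layout, and the example codes are hypothetical, and `None` is used here to stand for a response coded as “unclear.”

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (item label, understood?, relevant?); None stands for an "unclear" code.
Code = Tuple[str, Optional[bool], Optional[bool]]


def summarize(codes: Iterable[Code]) -> Dict[str, Dict[str, float]]:
    """Aggregate dichotomous codes into per-item counts and percentages."""
    counts = defaultdict(Counter)
    for item, understood, relevant in codes:
        counts[item]["n"] += 1
        counts[item]["understood"] += understood is True
        counts[item]["relevant"] += relevant is True
        counts[item]["relevance_unclear"] += relevant is None
    return {
        item: {
            "n": c["n"],
            "pct_understood": round(100 * c["understood"] / c["n"], 1),
            "pct_relevant": round(100 * c["relevant"] / c["n"], 1),
            "n_relevance_unclear": c["relevance_unclear"],
        }
        for item, c in counts.items()
    }


if __name__ == "__main__":
    # Invented example codes, not study data.
    example = [
        ("blurry vision", True, True),
        ("blurry vision", True, False),
        ("blurry vision", None, None),
        ("double vision", True, True),
    ]
    for item, stats in summarize(example).items():
        print(item, stats)
```

Percentages here use all participants debriefed on an item as the denominator, so “unclear” codes count against both understanding and relevance; a different denominator choice would be equally defensible.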

Translatability Assessment

As the instruments may be used globally in future studies, a translatability assessment was conducted to ensure that the US-English items could be translated and culturally adapted into other languages (French, German, Italian, Spanish, Portuguese, Japanese, and Chinese) while maintaining conceptual equivalence [28]. The translatability assessment was conducted in parallel to round 1 interviews, to allow any modifications to be implemented and tested in round 2 interviews.

Results

Cognitive Debriefing and Translatability Assessment of Newly Drafted Instruments

Sample Characteristics

Overall, 133 participants were recruited to the study across both rounds. Seventy-five participants were recruited in round 1 from the USA (n = 36, 48.0%), France (n = 25, 33.3%), and Germany (n = 14, 18.7%); 51 (68.0%) were patients and 24 (32.0%) were caregivers of children aged 4–12 years. In round 2, 58 participants were recruited from the USA (n = 41, 70.7%), France (n = 5, 8.6%), and Germany (n = 12, 20.7%); 35 (60.3%) were patients and 23 (39.7%) were caregivers of children aged 4–12 years.
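The country breakdown above can be checked with simple arithmetic. The short sketch below recomputes the totals and percentages from the reported counts; the helper names `pct` and `breakdown` are illustrative only.

```python
def pct(n: int, total: int) -> str:
    """Format n/total as a percentage with one decimal place."""
    return f"{100 * n / total:.1f}%"


def breakdown(counts: dict) -> tuple:
    """Return the total and each group's share of that total."""
    total = sum(counts.values())
    return total, {group: pct(n, total) for group, n in counts.items()}


# Participant counts by country as reported in the text.
ROUND_1 = {"USA": 36, "France": 25, "Germany": 14}
ROUND_2 = {"USA": 41, "France": 5, "Germany": 12}

if __name__ == "__main__":
    total_1, shares_1 = breakdown(ROUND_1)
    total_2, shares_2 = breakdown(ROUND_2)
    print(total_1, shares_1)      # 75 participants in round 1
    print(total_2, shares_2)      # 58 participants in round 2
    print(total_1 + total_2)      # 133 participants overall
```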

The demographic and clinical characteristics for the patient sample are presented in Table 1; there was broad representation of different education levels (including current school grades from preschool to grade 12), ensuring that each instrument was assessed by participants with low and high education levels. Caregivers (Table 2) were well educated, and most held at least a college or university degree (n = 14/21, 66.7%), although there was representation of lower education levels too. Ten ophthalmologists, including two pediatric ophthalmologists, experienced in treating patients with amblyopia, were also interviewed from the USA (n = 4), France (n = 3), and Germany (n = 3) (Table 3).

Table 1 Demographic and clinical characteristics of patients, split by age group and interview round
Table 2 Demographic characteristics of caregivers (of children aged 4–8 years)
Table 3 Demographic characteristics of ophthalmologists

Cognitive Debriefing and Translatability Assessment Findings

Instructions

The instructions were well understood across all instruments; however, the instructions throughout were modified to include “amblyopia”/“lazy eye” to clarify that the items ask about symptoms and impacts of amblyopia, rather than other ocular conditions. Based on FDA feedback, the instructions were also updated to include text reminding respondents to consider their/their child’s vision using both eyes, and when using vision correction (if relevant), as they respond to each item. The emotional well-being domain instructions were also updated to ensure respondents consider the impacts of their vision rather than visible signs of their amblyopia (e.g., wearing a patch, or having a wandering eye) when responding to these items.

Items

Adult/adolescent PRO

The 25-item adult/adolescent PRO was debriefed with 27 participants (16 adults, 11 adolescents) in round 1, and 12 participants (7 adults, 5 adolescents) in round 2. The items were well understood by participants (Table 4). There were a number of items where up to a third of participants’ responses were deemed “unclear”; however, they still selected a response that indicated they had understood but just did not elaborate further on their understanding of the item wording. As such, there were no major concerns raised with item understanding. Some modifications were made due to feedback from the interviews, the translatability assessment, the expert panel, and the FDA. Modifications were made to enhance clarity, particularly where instances of misunderstanding occurred during the interviews, e.g., “when looking forward” was revised to “when looking straight ahead” for the “peripheral vision” item, and “judging distances” was added to the “depth perception” item, given that some participants used this wording when discussing the item. Based on FDA feedback, “double vision” was further clarified with the addition of “overlapping images” to the item wording. Some items were updated to broaden applicability; for example, the “video games” item was updated to include “or games on a tablet or smartphone.” For some items, additional examples were included to add clarity; e.g., “reading the board in class” was added to the “difficulty reading things far away” item.

Across both rounds, most items (n = 20/25, 80.0%) were relevant to ≥ 40% of the sample (Table 4). Following round 2, the “social activities” item was removed due to low relevance and poor response distributions, and the “typing on a computer keyboard” item was removed due to low relevance, potential conceptual overlap with the “using touchscreen” item, and participant feedback suggesting that this item was exploring typing skills more than amblyopia impacts. This resulted in a 23-item adult/adolescent PRO. Some other items were also considered less relevant by a number of participants and, in some cases, relevance could not be established (denoted as “unclear” in Table 4). However, items that were relevant to some participants and had reasonable response distributions were retained so that their relevance could be explored further during psychometric testing. This approach was taken for each instrument.

Table 4 Adult/adolescent PRO: overview of item understanding and relevance

Child PRO

The 25-item child PRO was debriefed with 22 individual participants (11 children aged 9–12, 11 caregivers of these children) in round 1, and the 24-item child PRO was debriefed with 30 participants (15 children aged 9–12, 15 caregivers of these children) in round 2. The items were generally well understood by the children (see Table 5), although some modifications were made to enhance clarity and ensure the items were interpreted as intended. For example, the impaired depth perception item was updated from “see how close things are” to “see how close or far away things are,” given that participants used such language in the interviews. Also, the self-confidence item was updated from “felt unsure about yourself” to “shy or not confident” because participants had difficulty understanding the original wording. Wording modifications were also made to broaden applicability; e.g., the “pay attention in class” item was updated to “pay attention to one thing (e.g., at home or in class).” Following feedback from the translatability assessment, “do schoolwork” was broadened to “your work at school or your homework” to enhance translatability.

In round 1, most items (n = 20/25, 80.0%) were relevant to ≥ 30% of the children interviewed (see Table 5). The “holding things without dropping them” item was removed following round 1, due to all participants selecting “not hard” as their response. In round 2, 22/24 items (91.7%) were relevant to ≥ 40% of the sample (see Table 5). The “using touchscreen” item was removed due to low relevance across both rounds, and poor response distributions. This resulted in a 24-item child PRO.

Table 5 Child PRO: overview of item understanding and relevance

ObsRO

The 16-item ObsRO was debriefed with 13 caregivers in round 1, and 8 caregivers in round 2. The items were very well understood (see Table 6). The context of all items in the ObsRO was modified by replacing the phrase “have you seen” with “has your child had difficulty” to mitigate the chance of the incorrect response option being selected. Specifically, there were instances in round 1 where participants had selected the incorrect response option (“no, not at all” rather than “I have not seen my child do this in the last 7 days”) if the activity had not been observed. Some wording modifications were also made to increase clarity and to align with updates made to the other instrument versions. For example, “reading words on paper” replaced “difficulty reading books” to clarify that the item assesses ability to see and read text on paper, rather than assessing the child’s reading ability. To broaden applicability, “educational games” was included as an example to the “video games/games on a tablet/smartphone” item, following clinical experts’ feedback indicating that young children are more likely to play educational games than video games.

Table 6 ObsRO: overview of item understanding and relevance

Most items (n = 13/16, 81.3%) were considered relevant by ≥ 30% of the caregiver sample across both rounds (see Table 6). The “using touchscreen,” “dropping things,” “near vision,” and “social activities” items were removed following round 2 due to low relevance. This resulted in a 12-item ObsRO.

The conceptual framework for the final AmbQoL versions following round 2 interviews is presented in Table S1 in Supplementary Materials.

Response Options and Recall Period

The response options were generally well understood and considered appropriate across all instrument versions. Some modifications were made to aid understanding and ensure correct use of response options, e.g., “I did not do this in the past 7 days” (or equivalent for the Child PRO and ObsRO) was moved to be the first response option (instead of last) to mitigate risk of “Not difficult at all/Not hard/No difficulty” being incorrectly selected for instances where the item was not applicable. The response option “I cannot do this” (or equivalent for the Child PRO and ObsRO) was removed due to participant feedback that amblyopia did not completely limit patients’ ability to conduct activities. Additionally, the “smiley face” image corresponding to “a little hard” was changed to a “straight face”, rather than a “small smile” in the child PRO, on the basis of FDA feedback indicating a disconnect between the faces and the wording of the responses. The 7-day recall period was well understood and considered appropriate by participants for each instrument.

Ophthalmologist Feedback

Ophthalmologists provided feedback on the adult/adolescent PRO (n = 8), child PRO (n = 9), and ObsRO (n = 7). The item wording and response options were deemed appropriate for the intended age group of each instrument, particularly the “smiley faces,” which were considered child-friendly. However, for the adult/adolescent PRO and child PRO, items assessing impaired depth perception and impaired peripheral vision were highlighted as potentially difficult to understand, although this did not appear to be the case in the patient interviews.

The majority of concepts assessed were considered relevant to amblyopia, although some were considered less relevant (e.g., “double vision,” “peripheral vision,” “writing,” and “using stairs”). One ophthalmologist suggested removing “sad” or “upset” from the corresponding child PRO item due to those being different concepts, and given that such feedback was also noted in the translatability assessment and the patient/caregiver interviews, “upset” was removed. One ophthalmologist suggested combining the items on “bumping into things,” “tripping or falling over,” and “going down the stairs;” however, patient/caregiver interview findings supported assessing these concepts separately. One ophthalmologist also suggested combining the “upset” and “frustrated” items on the ObsRO, but feedback from the caregiver interviews did not support this.

No additional items were implemented on the basis of ophthalmologist feedback; suggested impacts were either considered to be covered by the current items, or were not considered relevant for the purpose of the instruments (e.g., impacts of patching/glasses). Feedback on the 7-day recall period was mixed for all instrument versions; some (n = 3) suggested that a longer recall period would be more appropriate as impacts may not be captured with the 7-day timeframe. Others (n = 2) suggested removing the recall period and using the present tense. However, patient/caregiver interviews confirmed the recall period was understood and appropriate.

Electronic COA Device Usability Findings

The instruments were debriefed using an electronic COA (eCOA) tablet device with 37 US participants in round 2. Of the participants who commented on usability, most found the device easy to use (n = 32/34, 94.1%), most commented that the screen size was appropriate (n = 33/34, 97.1%), and most liked the layout of the questions (n = 32/33, 97.0%).

Global Impression of Severity and Change Item Findings

The adult/adolescent, child, and caregiver global items and their response options were well understood. Following round 1, instructions were added to align with the AmbQoL versions and clarify that respondents should think about their vision using both eyes (i.e., not when patching) and when wearing glasses/lenses (if relevant) when responding to the items.

Discussion

Existing COAs used in amblyopia populations were not considered adequate for use in amblyopia clinical trials. As such, this study sought to address this gap by developing new COA instruments that would be fit-for-purpose in amblyopia trials across the age span, as well as in wider real-world use—namely the AmbQoL.

This study provides evidence to support the face and content validity of the 23-item adult/adolescent PRO, 24-item child PRO, and 12-item ObsRO, per regulatory guidance [14]. Each AmbQoL version employs a 7-day recall period to reflect the relatively stable nature of amblyopia, where symptoms are not expected to fluctuate day to day [21]. Patients/caregivers understood the recall period well and considered it appropriate; although some ophthalmologists suggested changing or removing the recall period, a 7-day recall period was deemed most appropriate for the instruments’ context of use. Some of the existing instruments used in amblyopia or similar conditions do not employ recall periods appropriate for a clinical trial context [i.e., they employ a lengthy recall period such as “one month” (PedEyeQ), or do not specify one (faVIQ)] [16, 29,30,31,32]. Regulatory guidelines identify the recall period as an important consideration, and the developed instruments address this gap. The FDA advocates shorter recall periods where respondents are asked to recall their current or recent state [14]. ISPOR’s task force report explains that instrument development should consider developmental differences and determine age-based criteria for COA administration; the task force authors offer guidance on four key age groups under the age of 18 years old [20], which was used to guide preliminary decisions on age-based criteria for each AmbQoL version. Feedback from patients/caregivers, ophthalmologists, and the expert panel supported the target age groups for each instrument version. While the PedEyeQ has multiple self-report/proxy versions for individuals aged 0–17 years, they may not be appropriate for younger children or applicable for adults.
Parent/caregiver reports of observable behaviors are generally more reliable than self-report for children aged 5–7 [22], meaning the child self-report version of the PedEyeQ may not be appropriate for completion by children as young as 5–7 years. Further, there is no version of the PedEyeQ that is suitable for adults, which was a key consideration for the AmbQoL, given that amblyopia can affect individuals from birth to adulthood [17].

While each version of the AmbQoL was well understood by patients/caregivers, ophthalmologist feedback did suggest that the concepts of “impaired depth perception” and “impaired peripheral vision” could be potentially challenging to understand among patients; however, patient feedback indicated the contrary—patients were able to understand and respond to these items without issues. In terms of the concepts assessed, it was critical that the AmbQoL assesses concepts that are clinically relevant and important to patients with amblyopia [15]. As such, the instrument content was initially informed by a targeted literature review and in-depth qualitative concept elicitation interviews with patients with amblyopia and caregivers, and ophthalmologists experienced in treating amblyopia [18]. Following drafting of the instrument, patient/caregiver and ophthalmologist feedback generally supported that the concepts included were relevant to their (or their child’s) experience of amblyopia; least relevant items were removed, and no new items were deemed necessary to add.

A key strength of this study was the global, multi-stakeholder approach employed to ensure a robust COA development process. This included a large sample of patients and caregivers from the USA, France, and Germany who provided evidence for the instruments’ content, ophthalmologists who provided feedback on the instruments, and an expert panel of clinical experts from the USA, the UK, and Australia (including a pediatric optometrist) who provided clinical insight throughout the study. A translatability assessment was also conducted in several languages to confirm that the instruments were culturally appropriate in target languages and countries. Importantly, regulatory feedback from the FDA was implemented between interview rounds, further strengthening the instrument development process.

However, the findings should be interpreted considering limitations of the study. Although the interview sample included patients with a range of amblyopia diagnoses and severities, there was less representation of patients with a diagnosis of amblyopia due to combined mechanism (strabismic and anisometropic), and those with severe amblyopia. It could also be argued that patients may have discussed the relevance of the items in relation to strabismus, rather than amblyopia, as it would be difficult to distinguish between them. However, those with strabismic or combined strabismic and anisometropic amblyopia accounted for only 43% of the sample, and there were no key differences in the concepts relevant to patients with differing amblyopia types, supporting content validity of the instruments across various amblyopia types. While the instruments have demonstrated strong face and content validity, the next step in the validation process is to evaluate the psychometric measurement properties of the AmbQoL and develop a scoring algorithm, and score interpretation guidelines. As part of this, items would be reviewed to minimize redundancy and optimize the balance between conceptual coverage and burden of completion. It would be ideal to conduct these analyses in a larger sample of patients with a variety of different amblyopia diagnoses and severities.

Conclusion

The AmbQoL instruments were developed in accordance with best practice scientific standards [33] and regulatory guidelines [14, 15], with qualitative input from children, adolescents, and adults with amblyopia, caregivers of children with amblyopia, and ophthalmologists. Each AmbQoL version has documented evidence of face and content validity for the assessment of amblyopia symptoms and/or HRQoL impacts in amblyopia populations aged ≥ 4 years. Before the instruments can be used in amblyopia treatment trials, further research is required to evaluate their psychometric measurement properties.