A Systematic Review of the Use and Quality of Qualitative Methods in Concept Elicitation for Measures with Children and Young People

Background: Qualitative research is recommended in concept elicitation for patient-reported outcome measures to ensure item content validity, and those developing measures are encouraged to report qualitative methods in detail. However, in measure development for children and young people, direct research can be challenging due to problems with engagement and communication.
Objectives: The aim of this systematic review was to (i) explore the qualitative and adapted data collection techniques that research teams have used with children and young people to generate items in existing measures and (ii) assess the quality of qualitative reporting.
Methods: Three electronic databases were searched with forward citation and reference list searching of key papers. Papers included in the review were empirical studies documenting qualitative concept elicitation with children and young people. Data on qualitative methods were extracted, and all studies were checked against a qualitative reporting checklist.
Results: A total of 37 studies were included. The quality of reporting of qualitative approaches for item generation was low, with information missing on sampling, data analysis and the research team, all of which are key to facilitating judgements around measure content validity. Few papers reported adapting methods to be more suitable for children and young people, potentially missing opportunities to more meaningfully engage children in concept elicitation work.
Conclusions: Research teams should ensure that they are documenting detailed and transparent processes for concept elicitation. Guidelines are currently lacking in the development and reporting of item generation for children, with this being an important area for future research.
Electronic supplementary material: The online version of this article (10.1007/s40271-020-00414-x) contains supplementary material, which is available to authorized users.


Introduction
The process of healthcare decision making, specifically measuring and comparing the clinical and cost effectiveness of healthcare technologies, interventions or services, can be facilitated through the development and use of patient-reported outcome measures (PROMs). PROMs are questionnaires designed to capture the clinical and broader outcomes of treatments from the perspectives of patients [1]. They comprise items that should be designed to represent the concepts and outcomes most important to the population in which a measure will be used. Empirical work to develop measure items will be referred to here as 'concept elicitation' [2] but can also be known as conceptual attribute development [3][4][5]. Patients are asked to complete PROMs before and after receiving an intervention to record any differences in their outcomes as a result. The focus of a measure's items will vary according to whether a measure has been developed for use in a specific disease area (condition-specific) or for generic use, with the latter facilitating the comparison of patient outcomes across a broad range of health and social care conditions [1].
An important consideration for all PROMs is to ensure that the contained items are relevant and sensitive to changes in aspects such as the health or well-being of that population [6]. Guidance on PROM development from the US Food and Drug Administration (FDA) [5] and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) task force [7] suggests that qualitative, empirical research with the target population is essential to establishing a measure's content validity, that is, whether it adequately captures the items of interest [8]. The goal of qualitative research is typically to understand a phenomenon from the perspectives of those who are knowledgeable, experienced or involved [9], and qualitative data are most commonly generated through listening to the views and experiences of participants. The FDA emphasise the importance of reaching data saturation for items, that is, ensuring that they achieve full coverage of all aspects important to a population and decision-making context. The importance of clear reporting of the qualitative development of these measures is also emphasised (e.g., [5,10]) to allow users (i.e., clinicians, researchers, decision makers etc.) to decide on a measure's content validity and how suitable it is for use.
The FDA give specific advice on PROM development in children and adolescents, centred around content validity and ensuring that measures can be understood and completed by children and young people (CYP) [10]. However, direct research with CYP can prove challenging for PROMs development [11]. This is because traditional qualitative methods are typically very adult-orientated and less appropriate for use with children, particularly with young children and those not able to articulate their opinions using formal or language-based methods [12][13][14]. Arbuckle and Abetz-Webb [11] suggest that further challenges include engaging children in research activities and finding methods that are appropriate to meet the different age and developmental abilities of CYP. Rowen and colleagues (2020) note similar issues with asking CYP to provide values for items for preference-based measures, with concerns around their understanding and ability to address the complexity of elicitation tasks [15]. This raises questions around whether and how researchers are developing items for PROMs with the CYP population, including how they are overcoming issues with involving CYP in direct research and how they are ensuring the generation of sensitive and valid measures. This paper presents a systematic review of empirical studies documenting the development of measures using qualitative methods with CYP. The review has two aims: (i) to explore the qualitative methods that research teams have used with CYP to develop measure items, and whether methods have been adapted to suit the age and developmental needs of the population; and (ii) to explore the quality of the reporting of these methods. The discussion section of the paper synthesises the main findings from the retrieved studies and makes comparisons between what is being carried out in practice and the limited guidance available on CYP PROM development, as well as reporting standards in qualitative research generally.

Key Points for Decision Makers
The use of qualitative research for concept elicitation is important to ensuring the content validity of patient-reported outcome measures.
The quality of the reporting of qualitative concept elicitation for child and young person measures was generally poor, making judgements around the content validity of measure items challenging.
Few measures reported adapting their data collection techniques to be more suitable for children and young people, potentially missing opportunities to more meaningfully engage this population in item development, particularly younger children.
Those developing measures for children and young people would benefit from clear guidelines on how to undertake and report qualitative methods for concept elicitation.

Search Strategies for Studies
With a focus on exploring the qualitative approaches taken with CYP for concept elicitation, the search was designed to retrieve a breadth of papers, including condition-specific and generic measures. The search combined electronic database searching, reference list and forward citation searching of key papers and using existing systematic reviews of CYP measures to identify whether any of the measures featured had reported the use of qualitative methods in item development [16][17][18].
Three relevant electronic databases were searched: PubMed (includes MEDLINE), EMBASE and EconLit, with no limits on dates. The search was updated in November 2019. Search terms were developed in PubMed and adapted slightly to maximise sensitivity within each database. The terms used combined the population of interest (children and young people) with variations on the possible focus and outcomes of the developed measures (i.e., an economic, quality-of-life or well-being focus), with alternative terms for the methodological approach taken to measure development, centred around the language used in the FDA PROM development guidance (i.e., qualitative, qualitative research). The search terms developed for use in the electronic databases are detailed in Appendix 1 (see electronic supplementary material [ESM]). The 'find citing articles' feature of electronic journals was used to identify other studies that had cited key papers. Key papers for forward citation and reference list searching were studies that included a higher level of detail on the qualitative methods for item development, in anticipation that other papers may have followed and cited their work [19][20][21][22][23][24].

Selection Method
The lead reviewer (SH) screened the titles and abstracts of all papers identified through the search. If an abstract did not contain enough information to make a judgement on its relevance, the full-text version of the paper was downloaded. All duplicate articles were excluded. An independent reviewer (PMM) screened a proportion (5%) of all paper abstracts in one electronic database (PubMed) against the study inclusion and exclusion criteria to ensure agreement and consistency in the papers included. The independent screening of the abstracts encouraged the authors to clarify which studies were and were not considered relevant against the inclusion and exclusion criteria.

Study Inclusion and Exclusion Criteria
Studies were included in the review if they were (i) empirical studies documenting the development of the items of a measure using qualitative research with CYP and (ii) developing a measure for use with CYP aged between 0 and 18 years. Excluded studies included non-English language articles, review articles, methodological guidelines and research protocols. Studies were excluded if they only reported using qualitative methods for validation of items (rather than development) or if they only briefly cited or discussed linked and already existing/published item development work, although any linked articles were then searched (via Google Scholar) for possible inclusion in the review. Excluded studies extended to those that were found to be superseded by papers with more detail available on the qualitative concept elicitation work, if existing papers focused on the development of the same measure and no information important to the review was sacrificed. Finally, studies were excluded if they also involved those over the age of 18 years or if the qualitative research was undertaken with parents/guardians or families only, that is, no CYP were directly involved in the concept elicitation.

Data Extraction and Quality of Reporting of Qualitative Methods
Data were extracted from each article into a data extraction form (see Appendix 2 [in ESM]) to ensure that the same information was captured for all studies [25]. Details recorded for all articles were the author(s) and paper characteristics (i.e., year, title and paper objective). Information was also recorded on the measure name, the type of measure (i.e., condition-specific, generic), the age of the CYP the measure was developed for and whether parents/guardians had been involved in development work. Information was documented on the qualitative methods used and studies were assessed for quality using principles from the 32-item 'Consolidated criteria for reporting qualitative research (COREQ)' tool [26], which focuses on the adequacy of reporting provided on the research team and reflexivity (i.e., reflections on how a researcher's personal and professional biases may affect research processes and outcomes [27]), study design and the analysis of findings. Details on the qualitative research in the data extraction form were collected under the following headings: information available on sampling, qualitative methods used, approach to analysis and positive and negative reflections on the methods (both the authors' and the reviewer's [SH]). The form also collected details on whether any other methods were used (aside from qualitative) to develop the items. Data extraction was completed independently by a second author (PMM) for 20% of publications, as was the quality check using the COREQ checklist.

Synthesis of Results
Microsoft Excel was used to tabulate the extracted data. The data were then summarised and collated into a narrative report to describe the findings. After a summary of the paper characteristics, information from the articles was synthesised under two themes: (i) an overview of the qualitative approach used in CYP concept elicitation and (ii) the quality of reporting in concept elicitation for CYP.

Search Results
The search strategy retrieved 5072 papers; nine duplicates were removed. After screening of article titles and abstracts, and of the full-text versions of the 70 articles retrieved, a total of 37 studies met the inclusion criteria and were included in the review. Of these, 29 were identified through electronic databases and eight through other means. One study retrieved in the review [28] was found to have a 'sister' paper that contained additional detail on the qualitative item development work but predated any specific CYP measure development [29]. Information from both studies was used to inform the review but, for clarity, the two were treated as one record [28]. The search process is documented in Fig. 1 and the full paper characteristics for the included papers are in Table 1.
The independent review by two reviewers of a proportion of all abstracts screened (n = 251) resulted in agreement on whether to include or exclude 99.6% of abstracts (kappa statistic for inter-rater agreement of 0.67, rated as 'good' [30]). There was no disagreement between SH and PMM regarding the accuracy and completeness of data extracted in the selected proportion of papers, including completion of the COREQ checklists.
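To illustrate how percentage agreement and a kappa statistic relate when almost all abstracts are excluded by both reviewers, the following is a minimal sketch of Cohen's kappa for two raters and a binary include/exclude decision. The 2 x 2 counts used here are hypothetical: they are not reported in the review and were chosen only because they sum to 251 and give 99.6% raw agreement. The function name and argument names are likewise illustrative.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Return (observed agreement, Cohen's kappa) for two raters, binary decision."""
    n = both_include + only_a + only_b + both_exclude
    # Proportion of abstracts on which the two raters made the same decision.
    observed = (both_include + both_exclude) / n
    # Each rater's marginal proportion of 'include' decisions.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance, given those marginal proportions.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (an assumption, not data from the review):
# 1 abstract included by both, 1 by rater A only, 249 excluded by both.
observed, kappa = cohens_kappa(both_include=1, only_a=1, only_b=0, both_exclude=249)
print(f"agreement = {observed:.1%}, kappa = {kappa:.2f}")
```

With these assumed counts, raw agreement is 99.6% while kappa is roughly 0.66, close to the reported value. The gap arises because chance agreement is very high when nearly every abstract is excluded, so a single disagreement lowers kappa considerably.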

Characteristics of Included Studies
All included studies had a similar aim: to document the development of a measure for children and/or young people. However, the studies differed in terms of how much of a focus there was on reporting the methods for, and results of, the development of the items. Two thirds of the papers discussed the quantitative psychometric validation and development of items, although this was in varying detail, and only seven focused solely on item development.
Most studies aimed to develop a condition-specific measure (31/37), with many for use with specific diseases but some also designed for use generically across disease areas, for example, chronic conditions [31,32]. Six studies reported on the development of generic measures for quality of life or health-related quality of life of CYP [19,33–37]. Although most studies focused on measuring quality of life in CYP, others also aimed for the measure to be suitable for use in cost-effectiveness analyses and as a preference-based measure [19–21,32]. Almost two-thirds of the studies (22/37) used other approaches in addition to qualitative methods to develop items. These studies mostly used literature searches, searches for existing relevant measures and consultations with experts. The exceptions were two studies that used the experience of the research team/authors to decide on the factors important to include [38,39]. Five of the 22 studies suggested that the findings of these other methods were used to inform the direction of questioning or analysis framework for the qualitative inquiry. However, in most studies these additional methods appeared to be used alongside qualitative methods to either support or add information to the developed items, although it was often not clear how this synthesis of information worked. Two of the 15 studies using qualitative methods only suggested that they thought it optimal for the items to be informed solely by direct research with CYP [19,23].
The largest group of measures reported in the papers had been developed for adolescents (11/37), with the next most common being those developed for older primary school-aged children to adolescents (i.e., those aged 8-18 years) (7/37) or for CYP aged 0-18 years (6/37). The remaining measures were developed for primary school-aged CYP aged 5-12 years (4/37), secondary school-aged CYP aged 10-15 years (3/37), all school-aged children aged 5-15 years (1/37) or for use across childhood but excluding very young children aged 0-4 years (3/37). Two papers [33,39] included unclear information on the age of CYP that their measures had been developed with and for, stating their population as 'high school students' and 'adolescents' respectively.
Most papers explicitly specified that their measures should only be used with the population that the items had been developed with through empirical work. However, six studies implied that the developed measures could potentially be useful in age groups outside of this. As an example, Varni et al. [34], Ronen et al. [23], McMillan et al. [40] and Gilchrist et al. [28] did not involve any CYP from the upper range of their stated age groups in item development, and Khadra et al. [41] had very little representation from CYP at the lower end. Graham et al. [36] suggested that their measure could potentially be suitable for completion by children (or parent proxies) as young as 5 years, despite the youngest child in their concept elicitation sample being 9 years old. This raises questions around how representative the items in these measures might be for these 'missing' age groups, although this is likely to depend on the context and focus of each measure.
Nineteen of the measures involved CYP's parents/guardians or carers in item development either alongside CYP in paired interviews or focus groups, or in separate data collection. Four papers gave justification for involving parents or guardians, stating that their perspectives can offer additional valuable and valid insight into CYP's quality of life [24,42–44]. Others also mentioned practical reasons for involving them, namely to act as proxies in instances where CYP are not able to participate [20,43,45]. One third of the 19 measures involved CYP and parents/guardians separately in data collection where possible, with authors suggesting that this was important to allow CYPs' individual opinions to emerge [23,24,43,46,47].

Data Collection Methods
The majority (n = 21) of included studies used either in-depth or semi-structured qualitative interviews. Eight studies used focus group methods, and six used a combination of interviews and focus groups. One paper used the nominal group technique, where the aim was for participants to present ideas to the group relevant to the factors important to the quality of life of CYP with heart disease [48]. Participants were asked to rank the shared ideas in order of importance. This method differs from focus groups because members do not discuss (the importance of) research themes between themselves, but instead make judgements independently [49]. In the remaining study [33], the methods for data collection were not explicitly stated; however, it was implied that a qualitative approach (most likely focus groups) was used, as the authors described undertaking 'group meetings' with high school pupils for instrument development. Several papers offered justification for their choice of method. Oluboyede et al. [21] discussed using interviews with adolescents to gather individual perspectives on how being obese/overweight affected their quality of life, with the authors suggesting that adolescents felt more confident discussing this on a one-to-one basis. A further four papers suggested that they selected interviews because it either allowed CYP a more comfortable environment to discuss issues, or because it encouraged them to reflect on how their own lives were affected by their condition [19,24,35,36,38]. Markham et al. [22] and Ronen et al. [23], however, suggested that they used focus groups with CYP because they provided a supportive and social setting that encouraged CYP to share ideas and experiences.

The Use of Adapted Data Collection Techniques with CYP
Only five of the 37 papers reported adapting data collection methods to make them more suited to CYP, which in all cases involved using traditional qualitative methods alongside other techniques designed to involve/engage CYP in research. In the case of Stevens [19], this was setting up a warm-up activity for the children, asking them to decorate name badges to help them to relax prior to being interviewed. The author decided against using props or activities during interviews as they thought it would distract from data collection. However, the remaining four papers used adapted techniques during data collection, including the use of pre-set picture cards [22], drawings [21] and statements [47] aimed at prompting discussion about aspects potentially relevant to CYP's quality of life. For example, Oluboyede et al. [21] used body shape drawings with adolescent focus groups to encourage participants to consider how individuals with bigger body shapes might be affected by their size. Two of the papers reported using creative/participatory methods with CYP, asking them to use modelling clay [23] and 'life maps' [47] to express ways in which their quality of life is affected by their conditions. In the latter study, CYP were asked to create a character who had a foot or ankle problem and think about and map how that character's life would be affected by their condition at different times of the day (morning, school, home, weekends). Two studies discussed adapting techniques to the different age groups of CYP [22,47], with younger CYP in the former study drawing rather than writing about their experiences, and younger children in the latter study taking part in games to select topics for discussion, rather than choosing topics at random as with the older children.
The studies suggested that those using creative and participatory methods were able to engage their respective CYP populations for a longer time period. For example, Markham et al. [22], Morris et al. [47] and Ronen et al. [23] undertook focus groups with those aged as young as 6 years that lasted from 45 up to 90 min. In contrast, focus groups with 5- to 13-year-olds in the study by Gilchrist et al. [28] lasted only 12-14 min. In studies using interviews, Gilchrist et al. [28] carried out interviews lasting 6-16 min, Khadra et al. [41] did interviews with adolescents lasting 18 min on average and Stevens [19], who used warm-up activities with CYP but avoided creative methods during data collection, undertook interviews with 7- to 11-year-olds lasting from 4 to 26 min. A summary of the qualitative methods and perceived quality of retrieved papers is in Table 2.

The Quality of Reporting in Concept Elicitation for CYP
The retrieved papers varied in terms of the number of COREQ checklist criteria met; however, almost half of the papers reported on none or very few of the 32 quality indicators.

Reporting on Data Analysis
Papers tended to miss reporting information on data analysis, with 15/37 not including any information on the approach to qualitative analysis used. An additional four papers included only very brief information on analysis, including the technique used (e.g., content analysis or constant comparison) but with little or no information on the process of data analysis, that is, how codes were developed and applied to the data and how themes were identified. In terms of findings, only eight of the 37 papers included quotations from the data to support the themes that had informed the items of their measures.

Reporting on Sampling
Seven papers included no information on sampling at all. A further seven studies included very basic information on either the sampling strategy (e.g., convenience or purposive sampling) or where participants were identified. The papers generally lacked information on the methods for initially contacting participants (e.g., through face-to-face consultation or postal invite) and information on those who had declined to participate. Two papers also lacked basic information on the age of the CYP included in their study [33,39].

Reporting on Data Collection
More information was generally available on data collection, with all but one paper [33] making clear which data collection method they had used. Just under one third of the papers gave an indication of the average duration of focus groups or interviews, and a similar number mentioned reaching saturation of the themes identified to inform items. However, only nine papers included an interview/focus group topic guide or examples of the questions that were asked of participants. The papers also tended not to include information on where data collection took place and who was present.

Reporting on Research Team and Reflexivity
The most common area in which information was lacking was on research team and reflexivity, with only eight [22,28,38,41,43,46,50,51] of the 37 papers including any sort of background information on the researchers (including gender and academic background). Of these eight papers, only two provided reflections on how the backgrounds of the authors may have influenced data collection or the nature of research findings. For example, Gilchrist et al. [28] commented on the potential impact of the researcher's role as a dentist when exploring the consequences of dental caries on children's quality of life. The authors reflected that due to the researcher not being the children's personal dentist, it would have been unlikely to have inhibited children's interview responses; further, because the researcher was not aware of the children's dental history until after interviews had been undertaken and transcripts analysed, it was unlikely to have affected the nature of this researcher's questioning or analysis. In contrast, Davis et al. [50] reported that the comprehensiveness of their findings on the impact of cerebral palsy on adolescents may have been impacted by both the researchers being female, with the possibility that male adolescent participants may not have felt comfortable discussing more sensitive issues (such as relationships) with female researchers. Markham et al. [22] acknowledged that his professional and academic background would have potentially biased data collection and analysis but suggested that this potential had been "mitigated by the facilitator's reflexivity, whereby a priori preconceptions were consciously noted and attempted to be bracketed from the study" [p. 753]. However, the author gave no indication of what these biases might have been, and how they had been avoided.

Strengths in Reporting
Despite many of the papers meeting limited quality criteria on the COREQ checklist, there were strengths to some of the studies reviewed. Eleven met 15 or more of the 32 checklist criteria, including greater coverage of information on sampling, data collection and analysis than other papers. Four studies (three of these being those identified as meeting a high number of criteria on the COREQ) reported following FDA guidelines for measure development [19,21,46,52] and a further study (also highly detailed) mentioned following the COREQ guidelines for reporting [20]. Twenty of the 37 papers stated that they had ethical approval for the qualitative study, with twelve mentioning gaining informed consent (or assent) from research participants. It is important for researchers to show that they have thought about ethical issues, particularly when conducting research with CYP who may be vulnerable to pressure to take part in studies or who may not fully understand what they are being invited to participate in [53,54]. However, despite the acknowledgement of ethical procedures within many of the papers, only two of these mentioned developing study information sheets specifically for CYP's understanding, which if not developed, may have limited CYP's ability to give informed assent for their participation in research [14].

Discussion
The review retrieved a total of 37 papers, featuring condition-specific and generic measures to record changes in the quality of life of CYP. Most studies had developed measures for adolescent populations and had used either interviews or focus groups for item generation, with those choosing interviews seemingly because the method provided a more comfortable environment for CYP to discuss individual and potentially sensitive issues. This fits with previous recommendations made for PROM development in paediatric populations, which suggest that focus groups might lead to social desirability bias, as CYP could feel inhibited to express their own opinions and more likely to agree with previously raised themes in group situations [11]. Therefore, the use of focus groups in this context could potentially cause problems around the representation of all CYP's views in item generation. However, similar issues could conceivably arise in interviews, in situations where CYP might feel compelled to answer questions in a manner that they think will be viewed favourably by the interviewer. A relatively low number of studies discussed adapting methods to be more suitable for the CYP population, with only four using creative and participatory methods alongside interviews and focus groups. Several PROM guidance papers recommend the use of such approaches with CYP to keep their attention [11] and to help overcome anxiety and encourage discussion [55]. Further, studies in the child methodology literature recommend these methods to allow CYP more time and freedom to express themselves, and to address power imbalances between CYP and adult researchers, by giving CYP more control over the topic and direction of research [12,13,56,57]. 
Those using creative and participatory methods in the studies collected here appeared to engage their CYP population for a longer period, and although length of data collection is not necessarily an indication of quality, relatively short data collection periods might suggest that aspects important to a population may not have been discussed fully or in depth. The suggestion from the literature and this review therefore is that participatory and creative methods can be beneficial in helping CYP to engage in concept elicitation work in a more meaningful way, potentially helping to enhance the coverage and validity of included items.
However, the literature suggests that these methods are particularly relevant for engaging and keeping the attention of younger age groups [11,55], with Arbuckle and Abetz-Webb recommending the use of creative approaches in research with 6- to 11-year-olds, with traditional qualitative methods becoming more appropriate in adolescents aged 12 years and over [11]. Indeed, several studies in this review appeared to carry out successful concept elicitation work with very young children (as young as 6 years), and the increased use of such methods in this area may help with the development of further measures for younger children, which at the moment are less common than those for adolescents.
In terms of reporting quality, although there were strengths, none of the 37 papers met all criteria outlined on the COREQ checklist for qualitative research, and almost half of the papers met two, one or zero. Further, many of those meeting criteria did so in very little detail. Detail was most lacking on qualitative data analysis, sampling and the research team, with these missing details making it difficult for the reader/user to make judgements about content validity and whether the items in the measures had achieved full coverage. For example, evidence of a robust sampling strategy is crucial in ensuring that important characteristics of a population have been captured (i.e., purposive sampling) [58] and, in several of the studies retrieved in the review, there was no representation in the empirical work from specific age groups within their stated population. This is particularly important in light of guidance from the FDA and others [5,11], which states that measures should be developed and saturation of items achieved in narrow age groupings of CYP, due to the rapid changes that take place in their developmental and cognitive abilities during childhood and into adulthood [59].
Details on the processes of qualitative data analysis and the research team are important to allow judgements around the robustness of the authors' interpretations of collected data. Reflexivity, in the form of the authors acknowledging how their own personal characteristics and assumptions may have influenced findings, is essential to judgements around validity [9,60], and this review found that only a small number of papers had disclosed and discussed this information. Qualitative quality guidance states that researchers should be explicit about how final themes and concepts are developed from data and provide evidence in quotations from participants to support these [27]. This review has demonstrated that very few studies had a high level of detail on the analysis process, and under a quarter of the retrieved studies included any quotations to support the items generated, leaving measure content without a clear evidence base.
Many studies used other methods with qualitative data collection to inform measure items, such as literature reviews, expert opinion and even the expertise of the authors. Although these are potentially valuable sources of information [7], it is ambiguous in many of these papers as to how far final measure content was informed by CYP's own opinions and experiences of what is important. An important quality indicator is transparency in the reporting of research processes and how research conclusions are generated [61] and this review has indicated that reporting of qualitative concept elicitation for CYP measures appears to be generally lacking in this respect. This mirrors findings of a systematic review of condition-specific preference-based measures (PBMs) by Brazier et al. [62], who found that measures using qualitative analysis in item development had reported their methods in very little detail, with the authors describing this as a 'barrier' to this aspect of measure development being better understood and becoming more scientifically rigorous.
To the authors' knowledge, this is the first review to summarise and critically analyse the qualitative methods used for concept elicitation for measures for children and young people. Existing reviews of generic paediatric measures have tended to summarise and critically analyse the items contained within the measures (e.g. [17,18]) or review the usage of the measures in practice (e.g. [16]), with condition-specific measure reviews tending to summarise the measures available in particular disease areas. The strength of this review is that it has focused on how researchers have reported concept elicitation with CYP [5,7], and has importantly highlighted where more transparency is needed to allow judgements around content validity.
Although research teams are clearly recognising the value of having direct input from CYP into item development, the poor quality of reporting in these studies raises questions around how far the content of these measures is truly sensitive to what is important to these populations.
Although this review critiques the quality of reporting for concept elicitation in CYP measures, it is important to note that researchers may well have followed robust research processes; the issue is that these processes have not been described clearly and in a high level of detail. For example, some of the research teams also went on to perform further validation tests with CYP on the developed items, which may have strengthened content validity (i.e., using qualitative cognitive interviews with the relevant population to check their coverage). It is also important to acknowledge that these studies have followed recommendations to use qualitative methods in item generation. Given that the focus of this review has only been to retrieve studies using qualitative methods for concept elicitation, we are unable to calculate the number of studies not using qualitative research, but we know that in economics, for example, the vast majority of PBMs for child economic evaluation have not included CYP in item development [16]. The measures included here have therefore been successful in facilitating the inclusion of the 'patient voice' in content development, which is particularly important given that children and young people have often been excluded from research [63].
This review only searched for papers in peer-reviewed journals and it is possible that further papers may have been retrieved if the grey literature had also been searched. Further, a few more relevant papers may have been picked up if the search terms had been expanded slightly, for example to include 'health measures' in the 'focus and outcomes of developed measures' criterion of the search. However, the authors used additional techniques such as searching in relevant systematic reviews and forward citation and reference list searching to encourage a more comprehensive and targeted search. It is unlikely that the inclusion of additional studies would have changed the overall message of this review, as the reporting quality was low or lacking in most included studies. It is possible that the authors of this review could have contacted the authors of the retrieved studies for further information on concept elicitation, but in practice this would not be helpful to the users of measures who need to make judgements around content validity using the (published) information that is readily available to them. Having said this, it is also important to note that authors are often restricted by manuscript length limits and the need to report other aspects of measure development. The development of detailed guidelines on how to undertake qualitative concept elicitation work with CYP [7], and particularly on what to prioritise when reporting measure development, may help to overcome issues around poor reporting and content validity, and therefore should be considered an important area for future research.

Conclusion
This systematic review has summarised the qualitative methods and, where relevant, the adapted data collection techniques used to develop the conceptual items in measures for children and young people. We found that very few of the retrieved studies had used creative and participatory methods for item development, despite these approaches being potentially beneficial for engaging children and generating more meaningful data for concept elicitation, particularly with younger populations. The review identified important gaps in terms of the quality and transparency of reporting for item generation, with many studies not reporting information central to establishing content validity. This review recommends that research teams report concept elicitation work with children and young people in greater detail, with the development of methodological and reporting guidelines in this area being key to facilitating this.