Measurement properties of patient-reported outcome measures (PROMs) in hyperhidrosis: a systematic review

Purpose To critically appraise, compare and summarize the quality of all existing PROMs that have been validated in hyperhidrosis to at least some extent by applying the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology. Thereby, we aim to give a recommendation for the use of PROMs in future clinical trials in hyperhidrosis. Methods We considered studies evaluating, describing or comparing measurement properties of PROMs as eligible. A systematic literature search in three large databases (MEDLINE, EMBASE and Web of Science) was performed. We assessed the methodological quality of each included study using the COSMIN Risk of Bias checklist. Furthermore, we applied predefined quality criteria for good measurement properties and, finally, graded the quality of the evidence. Results Twenty-four articles reporting on 13 patient-reported outcome measures were included. Three instruments can be further recommended for use. They showed evidence for sufficient content validity and moderate- to high-quality evidence for sufficient internal consistency. The methodological assessment showed existing evidence gaps for eight other PROMs, which therefore require further validation studies to make an adequate decision on their recommendation. The Hyperhidrosis Disease Severity Measure-Axillary (HDSM-Ax) and the short-form health survey with 36 items (SF-36) were the only questionnaires not recommended for use in patients with hyperhidrosis due to moderate- to high-quality evidence for insufficient measurement properties. Conclusion Three PROMs, the Hyperhidrosis Quality of Life Index (HidroQoL), the Hyperhidrosis Questionnaire (HQ) and the Sweating Cognitions Inventory (SCI), can be recommended for use in future clinical trials in hyperhidrosis. Results obtained with these three instruments can be seen as trustworthy. Nevertheless, further validation of all three PROMs is desirable.
Systematic review registration PROSPERO CRD42020170247


Background
Hyperhidrosis is characterized by excessive sweating beyond physiological needs. This disorder can be generalized, involving the whole body, or focal, involving specific areas of the body such as the axillae (axillary hyperhidrosis), the hands and feet (palmar and plantar hyperhidrosis) or the face (cranio-facial hyperhidrosis) [8,49,50]. A recent study found a prevalence of 4.8% in the USA, with more than half of the affected individuals suffering from axillary hyperhidrosis [8,42]. The severity of hyperhidrosis can range from mild sweating to dripping. Therefore, those affected often report negative impacts on their quality of life (QoL), including, for example, limited daily activities, fewer social relationships, impairments in their study or work life, and generally reduced emotional well-being [8,14,19].
Measurement instruments that try to capture what is reported by affected individuals are called "patient-reported outcome measures (PROMs)". PROMs are self-completed questionnaires reflecting the patient's perspective and measuring, e.g. severity or QoL. PROMs can foster the involvement of patients in both clinical research and routine health care [28,51].
Several PROMs that cover diverse constructs have been developed and reported in the literature for patients with hyperhidrosis, for instance, the Hyperhidrosis Quality of Life Index (HidroQoL) [20], the Hyperhidrosis Disease Severity Scale (HDSS) [47] or the Axillary Sweating Daily Diary (ASDD) [34]. In clinical research and practice, not only hyperhidrosis-specific PROMs are used but also skin-specific or more generic PROMs such as the Dermatology Life Quality Index (DLQI) [9] or the short-form health survey (SF-36) [18,38].
Especially in clinical research, it is important to select measurement instruments with sufficient measurement properties in the population of interest. PROMs should be reliable, valid, responsive and feasible. The selection of instruments should be based on complete information regarding these measurement properties and the quality of the underlying research.
In preparation for the development of the HidroQoL, Kamudoni rated the psychometric properties of several instruments used in measuring QoL in hyperhidrosis. For this rating, he used literature-based standard quality criteria [18]. Wade et al. [50] also conducted a review of the most commonly used QoL measurement instruments in hyperhidrosis, but explicitly refrained from using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), as this would have been beyond the scope of their review given the high level of detailed information required and the level of expertise needed in the application of the COSMIN checklist.
We intend to fill this gap by performing a systematic comparison of all existing PROMs in hyperhidrosis (not just of those measuring QoL) and an assessment of the quality of these PROMs using the established COSMIN methodology.

Objectives
Our main objective was to critically appraise, compare and summarize the quality of all existing PROMs that have been validated in hyperhidrosis to at least some extent by applying COSMIN methodology.
More specifically, our objectives were 1. to systematically assess the measurement properties of PROMs in hyperhidrosis and 2. to identify PROMs in hyperhidrosis a. that meet the predefined criteria to be recommended in future hyperhidrosis trials; b. that have the potential to be recommended in the future depending on the results of further validation studies; c. that do not meet the predefined criteria to be recommended and therefore should not be used anymore.

Protocol and registration
The methods of this systematic review were developed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) statement [40]. The corresponding study protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO): CRD42020170247 and published in Systematic Reviews [10].

Literature search
A systematic, librarian-assisted literature search was performed in the bibliographic databases MEDLINE (via Ovid, 1946-02 June 2020, database code "medall"), EMBASE (via Ovid, 1974-02 June 2020, database code "oemezd"), Science Citation Index Expanded (1965-02 June 2020, database code "SCI-EXPANDED") and Social Sciences Citation Index (1990-02 June 2020, database code "SSCI") (the latter two simultaneously via Web of Science) on 02 June 2020 with a last update on 11 June 2021. The search strategy comprised the following search elements [32]: A. Target population: Hyperhidrosis. In order to reach maximal sensitivity, a broad compilation of controlled vocabulary and free text terms was used. The search strategy for this element was not peer reviewed. B. Construct of interest: All patient-reported outcome measures regardless of the underlying construct. For optimal sensitivity, the search strategy of this search element was based on a combination of the PubMed filter "Quality of life (QoL)" of Vissers and de Vries [48], the PubMed filter "Patient reported outcome measures (PROMs)" of Jansma and de Vries [17], and additional search terms from the "PROM group construct & instrument type filter" of Mackintosh et al. [27]. "Patient-reported outcome measures" is a broad term that includes measures of QoL or health status [12,28]. The search elements were combined as follows in order to identify all articles on the measurement properties or the feasibility of PROMs in hyperhidrosis, with the exclusion filter removing irrelevant publication types as well as animal-only studies from these records: ((A AND B AND (C OR D)) OR (C AND E)) NOT F, or in words: ((population AND construct AND (measurement properties OR feasibility)) OR (individual PROMs AND measurement properties)) NOT (exclusion filter).
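The Boolean combination of search elements can be illustrated with set operations. The following is only a sketch: each element is modelled as a set of record identifiers, the function name and the example identifiers are hypothetical, and the mapping of the letters C to F (measurement properties, feasibility, individual PROMs, exclusion filter) is read off the verbal form of the formula above.

```python
# Illustrative sketch: search elements modelled as sets of record IDs.
# The record IDs are invented; only the Boolean structure follows the text.

def combine_search_elements(a, b, c, d, e, f):
    """Return records matching ((A AND B AND (C OR D)) OR (C AND E)) NOT F.

    a: population, b: construct, c: measurement properties,
    d: feasibility, e: individual PROMs, f: exclusion filter.
    """
    population_branch = a & b & (c | d)   # A AND B AND (C OR D)
    instrument_branch = c & e             # C AND E
    return (population_branch | instrument_branch) - f  # ... NOT F


population = {1, 2, 3, 4}
construct = {1, 2, 3, 7}
measurement_properties = {1, 5, 6}
feasibility = {2}
individual_proms = {5, 6, 8}
exclusion_filter = {6}

hits = combine_search_elements(
    population, construct, measurement_properties,
    feasibility, individual_proms, exclusion_filter,
)
print(sorted(hits))
```

In this toy example, records 1 and 2 are retrieved through the population branch, record 5 through the individual-PROM branch, and record 6 is removed by the exclusion filter.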
Search strategies for MEDLINE, EMBASE and Web of Science were developed. The initially developed MEDLINE search strategy was translated to the other databases choosing appropriate syntax and index terms. The full, reproducible search strategies are included in Appendix 1 (supplementary files). A PRISMA-S checklist is in Appendix 2 (supplementary files) [37].
In addition, databases specific for PROMs were searched for records relevant to the target population: PROQOLID (https://eprovide.mapi-trust.org/about/about-proqolid), the COSMIN database of systematic reviews of outcome measurement instruments (http://www.cosmin.nl/databaseof-systematic-reviews.html), the Test Archive of Leibniz Institute for Psychology Information (https://www.testarchiv.eu/) and the PubPsych search engine (https://pubpsych.zpid.de/pubpsych/). In addition to the electronic search, handsearching was conducted by perusing reference lists of the studies included and by searching key articles on this topic. No study registries were searched due to the study designs eligible for this review. We did not contact persons or institutions in order to seek additional studies.
Subsequently, the bibliographic databases and the databases specifically on PROMs were searched again with the names of hyperhidrosis-specific PROMs found during the initial search.
There were no restrictions regarding publication date. Only papers in English, German, French or Italian were included. After deduplication in EndNote X9 following the method of Bramer et al. [2], titles and abstracts were screened in EndNote. No further software was used for the full-text review. Data were extracted using Excel sheets.

Eligible studies
The eligibility criteria are in agreement with the COSMIN guideline for systematic reviews of patient-reported outcome measures [36]. The population of interest was patients with hyperhidrosis. At least 50% of the study sample had to consist of hyperhidrosis patients to fulfil the eligibility criteria. The evaluation of measurement properties, the development of a PROM or the evaluation of the interpretability of the PROMs of interest had to be the principal aim of selected studies. Studies that only used the PROM to measure the outcome or in which the PROM was used for the validation of another instrument were excluded. Only full-text articles were included because abstracts or posters often provide very limited information on the design of a study. Studies that concern the development ("development paper") and/or the evaluation of the measurement properties ("validation paper") of PROMs were included as well (Table 1).

Study selection
Titles and abstracts found in the literature search were independently judged by two reviewers. For the remaining titles

Assessment of measurement properties and adequacy of the PROMs
Measurement properties were evaluated in the following order: a. Evaluation of the content validity. b. Evaluation of internal structure including structural validity, internal consistency and cross-cultural validity/measurement invariance. c. Evaluation of remaining measurement properties including reliability, measurement error, criterion validity, hypotheses testing for construct validity and responsiveness.
All measurement properties were evaluated following three substeps, except for the measurement property "criterion validity" since no gold standard for PROMs in hyperhidrosis exists. For construct validity and responsiveness, we formulated hypotheses to evaluate the results against.
First, the methodological quality of the included studies was evaluated by two independent reviewers using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) Risk of Bias checklist, which was developed exclusively for systematic reviews of PROMs [30]. Both reviewers had a psychological and therefore also psychometric background and were familiar with the COSMIN methodology. The COSMIN Risk of Bias checklist consists of 10 boxes, one for each measurement property (Table 2). Only those boxes for the measurement properties that were assessed in an article were filled in.
All measurement properties of the COSMIN Risk of Bias checklist are clearly defined [33]. Content validity is considered the most important measurement property because the items of a PROM have to be relevant, comprehensive and comprehensible regarding the population and construct of interest [45]. If there was high-quality evidence for insufficient content validity, the PROM was not further assessed and directly categorized as C, i.e. the PROM should not be recommended for use. Each standard in a box was rated on a 4-point rating scale (that is, "inadequate", "doubtful", "adequate", "very good"). The overall quality of a study was determined by the lowest rating of any standard in the box, i.e. "the worst score counts" principle [30]. Each study on a measurement property was assessed separately and all measurement properties of each study were rated as either very good, adequate, doubtful or inadequate [31].
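The "worst score counts" principle can be expressed compactly. The sketch below is illustrative only: the function name and the example ratings are hypothetical, and only the four rating labels and the take-the-lowest rule come from the text.

```python
# Ratings ordered from worst to best, as named in the text.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_box_rating(standard_ratings):
    """Return the lowest rating among the standards of one checklist box."""
    if not standard_ratings:
        raise ValueError("at least one rated standard is required")
    # min() with the position in RATING_ORDER as key picks the worst rating.
    return min(standard_ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of a single box in one study.
example = ["very good", "adequate", "doubtful", "very good"]
print(overall_box_rating(example))
```

A single "doubtful" standard is therefore enough to rate the whole box as "doubtful", regardless of how many standards were rated "very good".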
Second, we extracted relevant data on characteristics of the included PROMs and the included study populations and summarized them in evidence tables [31]. Interpretability and feasibility which are also important for a recommendation were described after the evaluation of the measurement properties. Interpretability means the degree to which qualitative meaning can be assigned to a PROM's quantitative score. Feasibility contains aspects of the ease of application (e.g. costs, length, ease of administration) [31].
Furthermore, we applied quality criteria. We used updated criteria for good measurement properties recommended by the COSMIN group [36]. The result of each single study was rated as either sufficient (+), insufficient (−) or indeterminate (?) [31].
Third, we aimed to summarize the evidence per measurement property per PROM, rate the overall result against criteria for good measurement properties and grade the quality of the evidence by the GRADE approach. Here, the focus was no longer on the single studies, but on the PROM [31].
The third substep included several further substeps: (1) we looked at the consistency of our results, searched for explanations if inconsistency occurred or downgraded for inconsistency if no explanation was found; (2) we pooled or summarized the results in Summary of Findings (SoF) tables, each measurement property per PROM in one table; (3) we rated each pooled or summarized result again against the quality criteria to obtain an overall rating for the pooled or summarized result as either sufficient (+), insufficient (−), inconsistent (±) or indeterminate (?); and (4) we graded the quality of the evidence to define whether the pooled or summarized result was trustworthy [31]. The recognition of the quality of evidence can help to prevent misguided recommendations [13]. Using the GRADE approach, we determined whether confidence in estimates of true measurement properties is given. We used a GRADE approach with four GRADE factors (risk of bias, inconsistency, imprecision and indirectness) and four levels of quality of evidence (high, moderate, low or very low) (Table 3). If the results did not seem trustworthy, the quality of evidence was downgraded. Each PROM was graded separately [36]. If the overall rating for a measurement property was inconsistent (±) or indeterminate (?), the quality of evidence was not graded [36].
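The grading logic described above (start at high quality, downgrade per GRADE factor, no grading for inconsistent or indeterminate ratings) might be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the number of levels to subtract per factor is supplied by the caller (the actual downgrading rules are in the COSMIN user manual [31] and are not reproduced here), and ASCII symbols stand in for the rating signs.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]
GRADE_FACTORS = ("risk of bias", "inconsistency", "imprecision", "indirectness")


def grade_quality(overall_rating, downgrades):
    """Return the quality of evidence, or None if the result is not graded.

    overall_rating: "+", "-", "±" or "?" for the pooled/summarized result.
    downgrades: dict mapping a GRADE factor to the number of levels to
                subtract (values are assumptions made by the caller).
    """
    if overall_rating in ("±", "?"):
        return None  # inconsistent or indeterminate results are not graded
    unknown = set(downgrades) - set(GRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown GRADE factor(s): {sorted(unknown)}")
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrades must be non-negative")
    total = sum(downgrades.values())
    # Start at "high" (index 0) and never go below "very low".
    return GRADE_LEVELS[min(total, len(GRADE_LEVELS) - 1)]


print(grade_quality("+", {"indirectness": 1}))
print(grade_quality("?", {}))
```

The first call mirrors a case where a sufficient result is downgraded by one level for indirectness; the second shows that an indeterminate result receives no grade.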

Generating recommendations for the use of PROMs in patients with hyperhidrosis
Each assessed instrument was assigned to a recommendation category according to its methodological quality and adequacy. We used three categories of recommendation that were proposed by the COSMIN group [36]: A. PROMs with evidence for sufficient content validity (any level) and at least low-quality evidence for sufficient internal consistency. B. PROMs categorized not in A or C. C. PROMs with high-quality evidence for an insufficient measurement property. PROMs of category A can be recommended for use and results obtained with these PROMs can be seen as trustworthy. For PROMs of category B, further validation is needed; however, they still have the opportunity to be recommended for use. PROMs of category C should not be recommended for use. If only PROMs of category B are found, the PROM with the best evidence for content validity can be preliminarily recommended for use, until further evidence is given [31].
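The three categories can be read as a simple decision rule. The sketch below is an illustration, not the COSMIN procedure itself: the function name, the dictionary layout and the example PROM are hypothetical, ASCII "+" and "-" stand in for the rating symbols, and checking category C before category A is an assumption about how to resolve a PROM that would satisfy both descriptions.

```python
QUALITY_LEVELS = ["very low", "low", "moderate", "high"]


def recommendation_category(properties):
    """Assign category "A", "B" or "C" from summarized results.

    properties: dict mapping a measurement property name to a tuple
                (overall_rating, quality_of_evidence), e.g. ("+", "moderate").
                quality_of_evidence may be None if the result was not graded.
    """
    # C: high-quality evidence for any insufficient measurement property.
    for rating, quality in properties.values():
        if rating == "-" and quality == "high":
            return "C"

    # A: sufficient content validity (any level) and at least low-quality
    #    evidence for sufficient internal consistency.
    cv_rating, _ = properties.get("content validity", ("?", None))
    ic_rating, ic_quality = properties.get("internal consistency", ("?", None))
    ic_at_least_low = (
        ic_quality in QUALITY_LEVELS
        and QUALITY_LEVELS.index(ic_quality) >= QUALITY_LEVELS.index("low")
    )
    if cv_rating == "+" and ic_rating == "+" and ic_at_least_low:
        return "A"

    # B: everything else; further validation is needed.
    return "B"


# Hypothetical summarized results for an invented instrument.
example_prom = {
    "content validity": ("+", "moderate"),
    "internal consistency": ("+", "high"),
    "reliability": ("?", None),
}
print(recommendation_category(example_prom))
```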
Our aim was to identify the best (currently available) PROM(s) in hyperhidrosis.

Results
Searching the bibliographic databases yielded 6691 records of which 3922 remained after deduplication and were moved into the screening. We found 188 studies to be included in the full-text screening, 19 of which were considered for qualitative and quantitative synthesis (Fig. 1). Three further relevant articles were found in the reference lists of the included studies, resulting in 22 relevant studies for data extraction. One study by Kamudoni contained data on the content validity of the HidroQoL, but did not formally meet the inclusion criteria [19]. Nevertheless, supplementary information on content validity was extracted to assess the methodological quality of the PROM development. The PhD thesis of Paul Kamudoni was also included as it provided complementary information to the paper by Kamudoni et al. published in 2015 [18, 20]. The development study of Amir et al. [1] was added since it provided preliminary work for the development of the Hyperhidrosis Quality of Life Questionnaire (HQLQ) by de Campos et al. [6]. In total, we identified 24 studies reporting on 13 different PROMs.
Five included studies reported on the DLQI [3,26,41,43,46]; three each on the HQLQ [1,6,35], the HidroQoL [9,18,20] and the HDSS [39,46,47]; two each on the SF-36 [38,46], the Hyperhidrosis Disease Severity Measure-Axillary (HDSM-Ax) [16,22] and the Hyperhidrosis Scale (HS) [21,35]; and one each on the ASDD and its child version (ASDD-C) [34], the Hyperhidrosis Impact Questionnaire (HHIQ) [25], the Illness Intrusiveness Rating Scale (IIRS) [5], the Hyperhidrosis Questionnaire (HQ) [24], the Sweating Cognitions Inventory (SCI) [52] and the SES [23]. The questionnaires of Amir et al. [1] and de Campos et al. [6] were also found as two independent questionnaires. Since the de Campos questionnaire is a further development based on the main content of the study of Amir et al. [1], studies on these two questionnaires were summarized in this paper under the de Campos questionnaire. Additionally, we found several designations for the questionnaires of de Campos et al. [6] and Kuo et al. [24]. In the following, we adopted the terms used by Wade et al. [50], namely the HQLQ and HQ.
Table 3 GRADE approach for grading the quality of evidence [36]. Starting point: assumption that the evidence is of high quality. Information on how to downgrade is described in the COSMIN user manual [31]. Definitions were adapted from the GRADE approach [11].

Data extraction
Regarding the data extraction using the COSMIN Risk of Bias checklist, the two reviewers had an agreement of 80.35%. Consensus was mostly found between the two reviewers. Some major disagreements were discussed with a third reviewer having expertise with the COSMIN methodology.

Evaluation of content validity
An 'inadequate' PROM development rating was found for three PROMs: the HQ, the HQLQ as well as the SCI.
Regarding the HQLQ, the omission of a cognitive interview study or pilot test to assess the comprehensibility and comprehensiveness in a sample representing the target population was the reason for the 'inadequate' rating. The evaluation of the HQ's and the SCI's PROM development was inadequate since the PROM development studies were based on a literature search only or performed in a group of clinicians and researchers rather than in a sample representing the target population for which the PROM was developed, as required. All content validity studies, when conducted, were of doubtful quality as information on the number of researchers involved in the data analysis was mostly lacking (Table 4).
The quality of evidence of the HidroQoL was moderate since at least one content validity study of doubtful quality was available [18,20]. Copies of the HQ and SES were not available. Only the general design and the structure of the response options were known, but no complete version of the questionnaires and their wording was available. Therefore, some aspects were judged as "?" in the reviewers' rating. In case of an inconsistent or indeterminate overall rating, there was no grading of the quality of the evidence (Table 5).
We could not find high-quality evidence that the content validity of any PROM was insufficient; thus, the remaining measurement properties of every PROM were further assessed.
Fig. 1 PRISMA flow diagram [29]. For more information, visit www.prisma-statement.org. PROM patient-reported outcome measure

Characteristics of the included PROMs and study populations
A complete overview of all included PROMs is presented in Appendix 3 (supplementary files). Characteristics of the included study populations are shown in Appendix 4 (supplementary files). Sample sizes ranged from 8 to 665 patients. Only four [1,34,43,46] of the 21 studies included children < 16 years. On average, slightly more women participated in the studies, accounting for 64% of the study populations. The studies were classified into development, validation and intervention studies and were conducted in more than 15 countries. The lowest number of items in a questionnaire was one, the highest 41 with an optional 10-item follow-up module. Only two questionnaires partially use a dichotomous response format, whilst the predominant Likert scale format is applied in various forms (3- to 11-point Likert scale) in all PROMs.
Most of the PROMs are disease-specific measurement instruments for hyperhidrosis. The ASDD(-C) and the HDSM-Ax are also site-specific for axillary hyperhidrosis. The same applies to the HS and the SES for palmar hyperhidrosis. The remaining PROMs, the DLQI, the IIRS and the SF-36, are not hyperhidrosis-specific. The SF-36 is a measurement instrument of generic health-related QoL. The DLQI is applied to patients with skin conditions and measures the impact of the condition on the patient's QoL. A child version is also available. The IIRS is validated for patients with moderate to severe chronic disabling and/or life-threatening diseases. The construct of intrusiveness represents the disruptive effects on various aspects of life due to the disease. As general questionnaires, these PROMs are applicable in a large population, which is also shown by the high number of translated versions. The DLQI is available in more than 110 languages (https://www.cardiff.ac.uk/medicine/resources/quality-of-life-questionnaires/dermatology-life-quality-index). There are various validated translations for the SF-36 (https://www.rand.org/health-care/surveys_tools/mos/36-item-short-form.html), such as German, French or Japanese, and the IIRS has also been translated into various languages, e.g. French or Chinese [7].

Information on interpretability and feasibility
Information on the distribution of scores in the study population was only given for the HidroQoL and the SCI. The results in the thesis of Kamudoni [18] showed a positive skew for the items towards higher response categories, whereas Gabes et al. [9] indicated negative skewness and evidence that the data were not normally distributed. Both reports showed ceiling effects for most of the items. According to Gabes et al. [9], 26-91% of the patients chose the highest response category. Floor effects were found by Kamudoni [18] for 13 items of the HidroQoL. Wheaton et al. [52] compared the distribution of scores in patients with hyperhidrosis and in a control group, showing a normal distribution for patients and positively skewed scores for controls. Small floor and ceiling effects were observed for the HDSM-Ax [16]. Further ceiling effects were reported only by Nelson et al. [34], who found some ceiling effects for Item 4 of the ASDD. The distribution of missing items was analysed for the HidroQoL, showing an increase in missing data towards the end of the questionnaire. However, no further structure in the missing data was apparent. Minimal important difference (MID) values of 3 to 4 were proposed for the HidroQoL by Kamudoni [18] and Gabes et al. [9]; differences in values are likely due to a more homogeneous study population in the latter study. Hobart et al. [16] estimated meaningful change scores of the HDSM-Ax and stated that a change in the HDSM-Ax total score of one point represents a clinically meaningful change in axillary hyperhidrosis severity. Nelson et al. [34] did not calculate an MID value but referred to patients who achieved a reduction in weekly average scores on ASDD Item 2 of ≥ 4 points as responders to treatment. No information on response shift could be extracted from the included studies.
All PROMs are self-administered. No problems were reported regarding comprehensibility for patients or administration. Campanati et al. [3] explicitly mentioned that the DLQI does not require any specific intellectual abilities. For most questionnaires, it was stated that they have a short completion time, ranging from only a few minutes for the DLQI to approximately 8-10 min for the HQ. The HDSS can also be used by non-specialists. The HDSS and the SES are only single-item instruments. For the DLQI, HDSM-Ax, HidroQoL, HQLQ, IIRS and SCI, a simple summary score can be obtained by adding up all item scores. For the ASDD(-C), consisting of 2 to 4 questions, scores for the individual items can be calculated. For the HQ and SF-36, scores for their individual domains can be obtained. Only for the HS, a normalized score can be used by dividing the total score by the number of completed items. No information on the scoring system was given for the HHIQ. The DLQI is copyrighted, but can be used without further permission for routine clinical purposes (https://www.cardiff.ac.uk/medicine/resources/quality-of-life-questionnaires/dermatology-life-quality-index). Two other questionnaires are copyrighted, the ASDD(-C) and the HidroQoL, but no information about their terms of use was found. The SF-36 is freely available on the RAND homepage and, except for a credit line, requires no further permission for use (https://www.rand.org/health-care/surveys_tools/mos/36-item-short-form.html). A copy of the HHIQ was kindly provided by the responsible company for an evaluation within this paper. For the remaining PROMs, no information could be retrieved regarding their accessibility.
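The two scoring schemes mentioned above, a simple summary score and a normalized score as described for the HS, can be sketched as follows. The function names and the example item scores are hypothetical, and representing a missing item as None (and skipping it in the sum) is an assumption for illustration; the instruments' own scoring manuals may handle missing items differently.

```python
def summary_score(item_scores):
    """Add up all item scores; items marked None (missing) are skipped."""
    return sum(score for score in item_scores if score is not None)


def normalized_score(item_scores):
    """Divide the total score by the number of completed items."""
    completed = [score for score in item_scores if score is not None]
    if not completed:
        raise ValueError("no completed items to score")
    return sum(completed) / len(completed)


# Hypothetical responses to a four-item questionnaire, one item left blank.
responses = [3, None, 1, 2]
print(summary_score(responses))     # total over the completed items
print(normalized_score(responses))  # total divided by 3 completed items
```

Normalizing by the number of completed items keeps scores comparable between respondents who skipped different numbers of items, which a plain sum does not.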

Summary of findings (SoF) tables and recommendation
The summarized results per measurement property per PROM are presented in Table 7. The overall ratings for reliability of the HDSS and for hypotheses testing for construct validity of the ASDD-C and the SCI were inconsistent since not all studies reported ICCs ≥ 0.7 for the HDSS and only around half of the a priori hypotheses could be confirmed for the ASDD-C and the SCI. Structural validity and internal consistency of the SCI were downgraded due to indirectness since one relevant study was partly performed in another population of interest (student population). The HQLQ and the SF-36 showed inconsistent results regarding hypotheses testing for construct validity. We decided to base the overall rating on the majority of the results and therefore downgraded the quality of evidence by one level due to inconsistency. For items 3 and 4 of the ASDD, we found an inconsistent overall rating since only one-third of the a priori hypotheses could be confirmed with the data extracted. There was one study in which test-retest reliability was assessed twice for the HidroQoL, with and without an intervention between the two measurements. Results were presented for both subgroups separately. The insufficient reliability rating found for the intervention subgroup might be explained by treatment effects and should not be overestimated. Furthermore, for the HidroQoL and the HDSM-Ax, very few hypotheses for construct validity and responsiveness could not be confirmed. However, the corresponding correlations were only 0.03-0.09 above the fixed threshold and therefore not classified as "inconsistent". The results of the SoF tables were used to recommend the most appropriate PROM. The final recommendations according to the COSMIN guidelines [31] for all 13 PROMs are presented in Table 8.

Discussion
This systematic review provides a first synthesis of the methodological assessment of the measurement properties of 13 PROMs used in patients with hyperhidrosis following an established methodology. As Wade et al. [50] have already stated in their review in 2017, a high level of information about the development and the validation of PROMs is necessary to be able to appropriately judge them. In this study, three PROMs, the HidroQoL, the HQ and the SCI, showed evidence for content validity and moderate- to high-quality evidence for internal consistency and therefore can be further recommended for use according to the COSMIN criteria. Results obtained with these PROMs can be seen as trustworthy. These PROMs assess different constructs. The HidroQoL and the HQ both measure health-related QoL, whereas the SCI measures sweating cognitions, i.e. types of dysfunctional negative beliefs in hyperhidrosis.
The construct validity of the SCI in particular should be further assessed since we found an inconsistent overall rating for this measurement property. An evaluation of the responsiveness and reliability of the SCI is also needed, as there are still gaps in evidence. In addition, further validation studies should be conducted within the target population (patients with hyperhidrosis) to strengthen the quality of evidence and avoid downgrading due to indirectness. Regarding the two QoL-PROMs, the HidroQoL currently seems to be more convincing than the HQ. This is based on a higher quality of evidence of the HidroQoL regarding content validity and internal consistency as well as a larger study population on which these results are based. Moreover, the HidroQoL lacked evaluations of only three measurement properties: measurement error, criterion validity and cross-cultural validity. Many other measurement properties can already be considered sufficient with a high quality of evidence. The HQ met the requirements for a recommendation according to the COSMIN criteria; however, evidence gaps remain, for instance with regard to structural validity with an indeterminate overall rating. Especially in clinical trials, PROMs should be reliable, valid, responsive and feasible and therefore, a comprehensive assessment of the measurement properties is crucial.
Many other PROMs such as the ASDD(-C), DLQI, HDSS, HHIQ, HQLQ, HS, IIRS and the SES could still possibly be recommended for use, but further validation studies are needed. The HDSM-Ax and the SF-36 cannot be recommended for use in patients with hyperhidrosis since we found moderate- and high-quality evidence for insufficient measurement properties (structural validity for the HDSM-Ax and construct validity and responsiveness for the SF-36). The HDSM-Ax did not fit the Rasch model, which was shown in two independent studies and led to an insufficient structural validity rating. The poor performance of the SF-36 could also be a consequence of the fact that generic as well as skin-specific PROMs do not comprehensively reflect the specific needs of patients with hyperhidrosis. This assumption is also reflected, for instance, in the insufficient content validity rating of the DLQI. Importantly, future validation studies should look at the interpretability and feasibility of PROMs since only little information was available for the currently included PROMs.

Strengths and limitations of this systematic review
This systematic review has several strengths: an a priori registered protocol, the use of a comprehensive and sensitive search filter, the search in three large databases (MEDLINE, EMBASE and Web of Science), several smaller databases and reference lists of the included studies, the application of predefined eligibility criteria and the use of the COSMIN Risk of Bias checklist to assess the methodological quality of the included studies. Two independent reviewers (MG and GK) carried out every step of the review process to ensure consistency. One eligible paper where two of the authors (MG and CA) were conflicted was evaluated by two unconflicted reviewers (GK and CT). Discrepancies were discussed and resolved within the whole research team. A potential limitation of this systematic review is the fact that not all reference lists of relevant full-texts were searched for further eligible studies (backward search). We have not performed a forward search either.

Conclusion
This systematic review suggests that currently three PROMs, the HidroQoL, the HQ and the SCI, can be recommended for use in patients with hyperhidrosis. To strengthen and extend the evidence of those measurement instruments, future validation studies should focus on those PROMs.
Funding Open Access funding enabled and organized by Projekt DEAL. The study was funded by a grant from Dr. August Wolff GmbH & Co. KG Arzneimittel.

Declarations
Conflict of interest Christian Apfelbacher has received institutional funding from Dr. August Wolff GmbH & Co. KG Arzneimittel for research on the measurement properties of the HidroQoL and consultancy fees from Dr. August Wolff GmbH & Co. KG Arzneimittel, Sanofi Genzyme and LEO Pharma. MG is lead author and CA is senior author of a validation paper of the HidroQoL which is included in this systematic review.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.