Quantification of breast lymphoedema following conservative breast cancer treatment: a systematic review

Purpose Breast lymphoedema is a possible side effect of breast conserving surgery, but it is poorly understood. This is due, in part, to difficulty assessing the breast. This systematic review described outcome measures that quantify breast lymphoedema signs and symptoms and evaluated the measurement properties for these outcome measures. Method Seven databases were searched using terms in four categories: breast cancer, lymphoedema and oedema, clinician reported (ClinROM) and patient reported outcome measures (PROM) and psychometric and measurement properties. Two reviewers independently reviewed studies and completed quality assessments. The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) methodology was used for studies including measurement property evidence. Results Fifty-six papers were included with thirteen questionnaires, eight patient-reported rating scales, seven physical measures, seven clinician-rating scales and four imaging techniques used to quantify breast lymphoedema. Based on COSMIN methodology, one ClinROM had sufficient reliability, ultrasound measuring dermal thickness. Tissue dielectric constant (TDC) measuring local tissue water had promising reliability. Four questionnaires had sufficient content validity (BLYSS, BLSQ, BrEQ and LYMQOL-Breast). Conclusions Ultrasound is recommended to reliably assess breast lymphoedema signs. No PROM can be recommended with confidence, but BLYSS, BLSQ, BrEQ and LYMQOL-Breast are promising. Further research is recommended to improve evidence of measurement properties for outcome measures. Implications for Cancer Survivors There are many approaches to assess breast lymphoedema, but currently, only ultrasound can be recommended for use, with others, such as TDC and questionnaires, showing promise. Further research is required for all approaches to improve evidence of measurement properties.


Introduction
Breast conserving surgery with adjuvant radiotherapy is a common treatment regimen for women with early breast cancer as it leads to better quality of life [1] and improved survival compared with that of women undergoing mastectomy [2,3]. Unfortunately, breast lymphoedema can be a painful and distressing complication of the breast conserving treatment regimen [4,5]. Breast lymphoedema is not well understood and is poorly addressed by health professionals [6]. The reported incidence of breast lymphoedema varies considerably across studies, ranging from 0 to 90%, owing to variation in the definitions and tools used to diagnose and quantify breast lymphoedema [7,8].
Assessments of lymphoedema in the limbs have been validated [9][10][11][12][13]; however, it is unknown if those tools can be used in the assessment of breast lymphoedema. Measurement of lymphoedema in the breast differs from that in the arm as the breast is the direct recipient of the surgical and radiotherapy treatment. These treatments change the volume and tissue architecture of the affected breast, reducing the usefulness of measuring the breast pre-operatively or measuring the contralateral breast as a direct comparator. Changes to the breast caused by surgery and radiotherapy may also make it more difficult to distinguish between treatment impacts and those changes caused by the presence of breast lymphoedema. Furthermore, self-reported questionnaires for lymphoedema have tended to focus on and be tested with people with limb lymphoedema rather than on populations with breast or midline lymphoedema [11,13].
This systematic review describes what outcome measures are available to quantify breast lymphoedema signs and symptoms following breast conserving surgery and evaluates the evidence underpinning the measurement properties for these assessment tools or approaches, where available.

Methods
The systematic review was registered with the International Prospective Register of Systematic Reviews on 05 July 2020 (PROSPERO registration no: CRD42020183851).
The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14] and Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) guideline for systematic reviews [15][16][17].

Database search
Five electronic databases were searched (Medline, Embase, CINAHL, Web of Science and Scopus), as well as Trove and ProQuest Dissertations & Theses Global for theses that explored breast lymphoedema measurement. Searches were conducted with support from a librarian at the University of Sydney. Search terms were grouped into four categories relating to (i) breast cancer; (ii) lymphoedema and oedema; (iii) clinician-reported (ClinROM) and patient-reported outcome measures (PROM); and (iv) psychometric and measurement properties. The full Medline search strategy is described in Online Resource 1. The initial search was conducted on 19th April 2020 and repeated on 19th August 2021 and 14th February 2022 to check for recently published articles. There was no restriction on date of publications, but only articles published in English were included.

Selection criteria
Studies were included in which an assessment was used to quantify breast lymphoedema and related symptoms (e.g. peau d'orange, induration, hardness, heaviness, discomfort, skin redness) in adult women following breast conserving surgery (lumpectomy/wide local excision) for breast cancer. Women may have been treated with chemotherapy, radiotherapy and/or immunotherapy. Theses were included when publicly available online or provided by authors following request. Studies with men, women under 18 years old, women treated with mastectomy and/or reconstruction and assessment for lymphoedema in areas of the body other than the breast were excluded. Studies only using toxicity or cosmesis rating scales (e.g. CTCAE, LENT SOMA, National Cancer Institute Canada-Common Toxicity Criteria 2, Harvard Breast Cosmesis Scale, Outcome by American Society for Radiation Oncology (ASTRO) Consensus Panel (CP) group and acute and late RTOG scales) were also excluded.

Study selection
Duplicates were removed using electronic and manual review in EndNote (version X9), with additional duplicates identified when titles were imported to Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia; available at www.covidence.org). Titles and abstracts, followed by full text papers, were independently screened by two reviewers (NF, SK, CL). Reference lists of included full text papers were examined to identify additional appropriate studies. When disagreements on study eligibility occurred, consensus was reached through discussion as a team.

Data extraction and analysis
Two reviewers independently extracted data (SK and NF or CL and NF) using the Covidence data extraction template (version 1). Information extracted included study design, participant demographics, treatment history and the stage at which the assessments took place in the participants' cancer treatment timeline (e.g. time since diagnosis, surgery and/or radiotherapy). The purpose of the assessment (e.g. assessing treatment side effects, quality of life or measuring outcomes from an intervention to treat breast lymphoedema) and details pertaining to the measurement properties of the tools were also extracted where available. If data were missing, or if data from participants following breast conserving surgery or with breast lymphoedema were not presented separately, authors were contacted to request these data.

Quality assessment
Several tools were used to assess the quality and risk of bias of included papers due to the variety of study designs included in this review. All assessments were completed by two reviewers (NF, SK, CL) independently, and all disagreements were resolved through discussion until agreement was reached. Included papers that had been authored by SK were assessed by other team members (NF and CL) to prevent potential bias in quality assessment.
The Cochrane tool for assessing risk of bias (RoB) in randomised trials, version 2 (RoB 2) [18], was used for the randomised controlled trials, and the quality assessment for cohort or non-randomised experimental studies was completed using the National Heart, Lung and Blood Institute (NHLBI) Quality Assessment Tool for before-after (pre-post) studies with no control group (URL: www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools; accessed 26 October 2020).
The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) Risk of Bias checklist adapted for clinical measures (ClinROMs) [15] was completed for studies including measurement properties for clinician rating scales, measurement devices or imaging tools. The COSMIN RoB checklist for patient-reported outcome measures (PROMs) [16] was used for studies including measurement properties for PROMs. The studies were assessed separately against each standard using the four-point scale (very good, adequate, doubtful or inadequate), and then quality was rated using the "worst-score-counts method" [17]. The quality of PROM development was evaluated first, followed by the quality of content validity studies, and these results were combined to rate the content validity overall based on relevance, comprehensiveness and comprehensibility for breast lymphoedema measurement in women following conservative breast cancer treatment. Next, the other eight measurement properties were evaluated. Finally, the overall quality of evidence for each tool was graded using the modified GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach, incorporating the assessment of risk of bias, inconsistency, imprecision and indirectness, to grade the quality of evidence as high, moderate, low or very low [17].
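The "worst-score-counts" rule described above can be illustrated with a minimal Python sketch. The function name and the example ratings are hypothetical and are not taken from the COSMIN materials or from any included study; the sketch assumes only that the overall rating for a study equals the lowest rating given to any individual standard.

```python
# Four-point COSMIN quality scale, ordered from best to worst.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(standard_ratings):
    """Return the overall rating: the lowest rating given to any standard."""
    if not standard_ratings:
        raise ValueError("at least one standard rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(standard_ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of one study (illustrative only).
example_ratings = ["very good", "adequate", "doubtful", "very good"]
overall = worst_score_counts(example_ratings)
print(overall)
```

Under this rule a single doubtful standard caps the study at doubtful, however many standards were rated very good.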
Recommendations for the use of tools or approaches were categorised, based on the evidence, as (A) recommended (PROM, evidence of sufficient content validity and internal consistency; ClinROM, evidence of sufficient face validity and reliability), (B) promising (additional validation studies required, not categorised as A or C) or (C) insufficient (high quality evidence of insufficient measurement property) [17].
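The categorisation rule above can be sketched in Python as follows. This is a simplified illustration, not the COSMIN procedure itself: the function name, the dictionary layout and the choice to check Category C first are assumptions, and the sketch ignores how the quality of evidence feeds into each rating.

```python
# Properties that must be rated sufficient ('+') for Category A, by tool type.
REQUIRED_FOR_A = {
    "PROM": ("content validity", "internal consistency"),
    "ClinROM": ("face validity", "reliability"),
}


def recommendation_category(tool_type, ratings, high_quality_insufficient=False):
    """Return 'A' (recommended), 'B' (promising) or 'C' (insufficient).

    ratings maps a measurement property to '+', '-' or '?'.
    high_quality_insufficient flags high quality evidence of an
    insufficient measurement property (assumed here to take precedence).
    """
    if high_quality_insufficient:
        return "C"
    required = REQUIRED_FOR_A[tool_type]
    if all(ratings.get(prop) == "+" for prop in required):
        return "A"
    # Anything that is neither A nor C needs further validation studies.
    return "B"


# Illustrative inputs only; missing properties are treated as not sufficient.
print(recommendation_category("ClinROM", {"face validity": "+", "reliability": "+"}))
print(recommendation_category("ClinROM", {"face validity": "?"}))
```

Category B is the default in this sketch, which mirrors the definition of "promising" as anything not categorised as A or C.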

Data synthesis
A narrative synthesis of the findings from the included studies was performed for the breast lymphoedema measurement tools or approaches and available measurement properties. Meta-analysis was not conducted as the review was of the assessment tools, not treatment outcomes or efficacy of treatment.
Fourteen studies used a combination of ClinROM and PROM, including all but one [52] of the breast lymphoedema intervention studies. ClinROMs alone were used in 23 studies, and PROMs alone in 20 studies. Most studies (62.5%) used at least two different tools, with one study using seven [59].
Measurement locations for the different tools and techniques varied: measurements were taken of the entire breast, of breast quadrants or at one or two selected locations on the breast. The entire breast was assessed for dermal backflow (ICG), volume (3D-SI, anthropomorphic, and MRI) and clinical rating scales. Ultrasound (dermal thickness), BIS, tissue resistance and TDC measures were performed in breast quadrants [35, 36, 53, 57-60, 62, 73], at two breast sites [21-24] or at a single measurement site [52,54,55,61,72,74]. TDC breast quadrant measures were also combined and reported as averages [19,20,51,53,59,61], with the unaffected breast also assessed to determine ratios. BIS measures were also reported as a ratio for the affected breast compared to the unaffected breast [54,74].
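As an illustration of how quadrant measures might be combined into an inter-breast ratio, the Python sketch below averages the quadrant readings for each breast and divides the affected mean by the unaffected mean. The function name and the readings are hypothetical, and taking the ratio of the means (rather than the mean of quadrant-wise ratios) is an assumption; the included studies may have computed their ratios differently.

```python
from statistics import mean


def breast_ratio(affected_quadrants, unaffected_quadrants):
    """Ratio of the mean affected-breast reading to the mean unaffected-breast reading."""
    if not affected_quadrants or not unaffected_quadrants:
        raise ValueError("both breasts need at least one reading")
    return mean(affected_quadrants) / mean(unaffected_quadrants)


# Hypothetical quadrant readings (e.g. percentage water content), illustrative only.
affected = [40, 42, 44, 46]
unaffected = [30, 32, 34, 36]
print(round(breast_ratio(affected, unaffected), 2))
```

A ratio above 1 indicates a higher average reading in the affected breast; reporting only this ratio hides differences between individual quadrants.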

COSMIN summary: ClinROMs
The COSMIN Risk of Bias tool adapted for clinical measures (ClinROMs) [15] was completed for five clinical assessment tools [31,57,59-63], from eight studies that evaluated measurement properties. Evaluation of face validity, reliability and measurement error was conducted for all tools (Table 2). Criterion validity was not evaluated as there is no gold standard for measurement of breast lymphoedema. Measurement properties for dermal thickness measured with mammography as well as imaging signs and breast volume measured using anthropomorphic or MRI techniques were not reported in the studies meeting the inclusion criteria for this study. Only a single study presented measurement properties for clinician rating scales [31]. Face validity was evaluated as sufficient for dermal thickness measurement using ultrasound [57,59,60], local tissue water measured with TDC [59,61], breast volume measurement using 3D-SI [63], extracellular fluid measured with BIS [62], tissue resistance measured with pitting [61] and clinician rating scales of breast lymphoedema signs [31]. Tonometry face validity was evaluated as indeterminate [59,62], and its structural validity as insufficient, as this tool could not detect a difference between affected and unaffected breasts or lymphoedematous and non-lymphoedematous breasts [59]. Structural validity was not described for any other ClinROMs.
Reliability was rated as sufficient for measurement of dermal thickness for both the image capture [60] and image measurement [57,59] using ultrasound, with a GRADE rating of moderate quality of evidence, due, in part, to low combined sample size (< 100) of studies that investigated it. Reliability was also evaluated as sufficient for TDC measuring percentage water content (PWC) ratio (affected:unaffected breasts) [59,61]; however, it received a GRADE rating of low quality evidence due to imprecision (combined sample size for two studies < 50). Reliability of a clinician rating scale was indeterminate from a single study with GRADE rating downgraded to low quality due to risk of bias [31]. Pitting test reliability was rated as insufficient based on results from a single study with a GRADE of low quality due to small sample size (< 50) [59]. Reliability results were not available for tonometry, BIS, ICG or breast volume measurement.
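The effect of sample size on the GRADE ratings reported above can be sketched in Python. The thresholds (one level down for a combined sample below 100, two levels down for a combined sample below 50) are inferred from the ratings in this paragraph and should be treated as an assumption; the function names and the `other_downgrades` parameter (standing in for risk of bias, inconsistency and indirectness) are hypothetical.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def imprecision_downgrade(total_n):
    """Number of levels to downgrade for imprecision, given the combined sample size."""
    if total_n < 50:
        return 2
    if total_n < 100:
        return 1
    return 0


def grade_quality(total_n, other_downgrades=0):
    """Start at 'high' and downgrade; the result cannot fall below 'very low'."""
    steps = imprecision_downgrade(total_n) + other_downgrades
    return GRADE_LEVELS[min(steps, len(GRADE_LEVELS) - 1)]


# Hypothetical combined sample sizes, illustrative only.
print(grade_quality(80))   # combined n below 100
print(grade_quality(40))   # combined n below 50
```

In this sketch a tool with sufficient reliability can still end up with low quality evidence purely because the pooled sample is small, which is the pattern described for TDC.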
Measurement error for all assessment tools was graded as indeterminate as minimally important change (MIC) has not been defined for any breast lymphoedema tools. Both dermal thickness assessed by ultrasound [57,59,60] and TDC [59,61] had values for standard error of measurement and limits of agreement to allow some interpretation of results, with quality of evidence for measurement error graded as moderate for dermal thickness assessed by ultrasound and low for TDC, each downgraded for the same reasons described for its reliability rating. Coefficient of variation was reported for tonometry [62], BIS [62] and breast volume measured by 3D-SI [63], with the quality of evidence for these tools graded as very low. There was no measurement error information for the pitting test [59,61] or clinician rating scales [31].
Based on the information provided, dermal thickness measurement assessed by ultrasound is recommended (Category A) for the assessment of breast lymphoedema as it has both sufficient face validity and evidence for sufficient reliability with moderate quality of evidence. The other assessment tools, including TDC, BIS, tonometry, 3D-SI, clinician rating scales and the pitting test, are categorised as promising (Category B).
Cut-off values for ICC: < 0.4 weak; 0.4-0.75 moderate; 0.75-0.9 strong; > 0.9 very strong (McDowell 1996). *Significant: p value < 0.05. Cut-off values for Cronbach alpha coefficients: < 0.5 unacceptable; 0.5-0.6 weak; 0.6-0.7 acceptable; 0.7-0.9 good; > 0.

COSMIN-PROM
COSMIN for PROM [16] was used to evaluate nine PROMs from eleven papers [38,58,59,64-71] meeting the inclusion criteria for this systematic review as well as two additional original validation papers in mixed breast cancer populations [75,76] (Table 3). All included PROMs were evaluated as having adequate face validity; however, all PROM development studies lacked the detail required by the COSMIN methodology to score above a rating of doubtful quality. The BLYSS, BrEQ and EORTC-BR23 received a doubtful rating for quality of PROM design and the pilot study and a doubtful rating overall for PROM development. These three questionnaires consulted patients for concept elicitation using a qualitative approach but lacked detail on the interview and analysis process. Authors for the BLSQ and LYMQOL-Breast involved patients using quantitative methods for concept elicitation and to confirm comprehensibility and comprehensiveness but only performed this with a small sample of women (n = 20), resulting in an inadequate quality rating for PROM development.
The BCTOS-22/12/13 and LSIDS-T were all rated as inadequate for PROM design as they did not involve patients, either relying on literature review and experience of the authors (BCTOS), or only involving professionals in PROM design and pilot testing (LSIDS-T). A single exception was the pilot study testing the BCTOS-13 [70]. This study involved patients to rate comprehensibility and relevance, resulting in a doubtful rating for quality for this pilot study.
The content validity studies for the nine PROMs similarly only achieved a maximum rating of doubtful quality. The BLYSS, LYMQOL-Breast, BLSQ, EORTC-BR23 and BrEQ were all rated as doubtful quality for relevance, comprehensiveness and comprehensibility due to lack of detail on the conduct and analysis of patient or professional interviews (BLYSS/EORTC-BR23/BrEQ) or only surveys being used (BLSQ, LYMQOL-Breast). The original studies for BCTOS were rated as inadequate quality for content validity. However, the German [69] and Brazilian-Portuguese [68] translations of BCTOS-22 did ask patients about comprehensibility but were rated as doubtful quality due to limited information on analysis of this process. LSIDS-T content validity was also rated as inadequate due to lack of patient involvement in the content validity study.
Overall, BLYSS had sufficient quality of evidence with moderate GRADE evidence for content validity. The BrEQ, BLSQ and LYMQOL-Breast were also rated as sufficient with moderate GRADE evidence, with downgrading due to either lack of input from professionals (BrEQ) or use of quantitative methods in only a small sample for patient feedback (BLSQ/LYMQOL-Breast). The EORTC-BR23 had sufficient quality of evidence with low GRADE evidence due to indirectness of the sample used. LSIDS-T and BCTOS were both rated as indeterminate with very low GRADE evidence for content validity.
Six of the nine measurement properties were reported for the included PROMs (Table 3). Construct validity was evaluated for all questionnaires with a sufficient rating for six questionnaires (BrEQ, BCTOS-12, BLSQ, LYMQOL-Breast, EORTC QLQ BR23 and LSIDS-T). Internal consistency was evaluated for seven questionnaires (BrEQ, BCTOS-22, BCTOS-12, BCTOS-13, LYMQOL-Breast, EORTC-BR23, LSIDS-T) with two measures receiving a sufficient rating (BCTOS-22, BCTOS-12) with high GRADE evidence. Reliability was evaluated for seven questionnaires (BrEQ, BLYSS, BCTOS-22 [Brazilian-Portuguese], BCTOS-13, BLSQ, LYMQOL-Breast, EORTC-BR23), with five achieving a sufficient rating (BrEQ, BLYSS, BCTOS-22, BCTOS-13, BLSQ), but the GRADE was low or very low for all. Structural validity was evaluated for four questionnaires (BCTOS-22, BCTOS-13, BCTOS-12, LSIDS-T), but no measure achieved a sufficient rating for this measurement property. The BCTOS-22 had an insufficient rating with high GRADE evidence. Responsiveness was only available for two questionnaires (BCTOS-22 and EORTC-BR23 [Spanish and Dutch versions]), with the EORTC-BR23 [Spanish and Dutch versions] achieving a sufficient rating with low GRADE evidence, and the BCTOS-22 rated as insufficient with very low GRADE evidence. Measurement error was presented for just one questionnaire (LYMQOL-Breast) and was rated as indeterminate with low GRADE evidence. Cross-cultural validity, criterion validity and measurement invariance were not presented for any questionnaires. There was no gold standard to assess criterion validity for PROMs.

Discussion
The signs and symptoms of breast lymphoedema were quantified using a variety of approaches, including 13 patient-reported questionnaires, eight patient-reported rating scales, seven types of physical measures, seven clinician rating scales and four imaging techniques. Dermal thickness measured with ultrasound is recommended for assessment of breast lymphoedema, but further studies are required to establish the minimal clinically important difference (MCID) and responsiveness (the validity of a change score). A breast lymphoedema PROM, however, cannot be recommended at this time as the reported details for development and measurement properties were lacking for all questionnaires. Nevertheless, the symptom-based PROMs, BLYSS, BLSQ and BrEQ (Dutch), and the QOL PROM LYMQOL-Breast are promising, with sufficient content validity. However, all tools require additional appropriately powered studies with women with, or at risk of, breast lymphoedema to improve the measurement property evidence.
To fully assess the impact of breast lymphoedema, more than one assessment tool is suggested [7,54,77]. Breast lymphoedema is complex, with no agreed upon definition of the condition and with the presence of oedema in the breast influenced by treatment factors including surgery, radiotherapy and chemotherapy [78]. Measurement of signs and symptoms of breast lymphoedema, including both clinician- and patient-reported outcomes, would provide a comprehensive assessment of the underlying changes occurring. Forty-six percent of the included studies in this systematic review assessed more than one measurement outcome to quantify breast lymphoedema, with 14 reporting both patient-reported and clinician-reported outcomes [31, 36, 40, 41, 47, 51, 53-56, 58, 59, 72, 73]. Inclusion of both ClinROMs and PROMs can also highlight the discord between patient- and clinician-reported outcomes, such as has been found in arm lymphoedema [12,79]. For example, measurements of dermal thickness provided information on the secondary tissue changes that can occur within the oedematous breast, but this does not necessarily relate to symptoms experienced by women [60]. Furthermore, due to the lack of a gold standard to assess the tools, we are unable to determine which tool is the best. Therefore, use of multiple tools, including those tools with the best available measurement property evidence, is recommended.
The practicality and expense of tools to quantify breast lymphoedema are considerations for clinical usefulness. Questionnaires are the least expensive option, but responsiveness has only been established in EORTC-BR23 in non-English speaking samples. Ultrasound is readily available in hospitals and imaging centres but may be less accessible in private clinics where lymphoedema therapists often treat patients with lymphoedema. By comparison, TDC is a small, portable tool that could prove useful in clinical settings, but cost may still be prohibitive for small clinics at approximately $6000 AUD for a unit. Unfortunately, two reliable approaches that are widely used for limb lymphoedema, volume measurement and BIS [12], do not currently have sufficient evidence for breast lymphoedema assessment.
This review highlighted the need for standardised assessment protocols for the ClinROMs as there was heterogeneity across many of the studies in the measurement locations on the breast, with some studies reporting individual quadrant results while others reported only overall means/ratios. For example, findings for dermal thickness measured with ultrasound and TDC may have been influenced by the location at which the measurement was taken. In healthy breasts, dermal thickness is greater in the inferior and medial breast quadrants [57,59]; similarly, TDC varied across location in healthy breasts as well as unaffected breasts [59,73]. Other factors such as age and menopausal status [80] and scar tissue [81] may also impact on breast signs but have yet to be investigated in the context of women with breast lymphoedema. Inclusion of these data may become important in the future in interpreting the findings.
This review identified significant gaps for the measurement properties of breast lymphoedema tools. The COSMIN framework for determining ratings for measurement properties is very comprehensive and relies on studies thoroughly reporting the study design to avoid poor ratings. Nevertheless, overall, there was a lack of high-quality evidence of measurement properties for breast lymphoedema tools. Dermal thickness measured with ultrasound had the most evidence but still lacked interpretable evidence of measurement error because no MCID has been established for this or any other breast lymphoedema tool. Four questionnaires (BrEQ, BLYSS, BLSQ, LYMQOL-Breast) were promising but require further investigation and larger sample sizes to improve the overall quality of evidence for their measurement properties. Only after those investigations can recommendations be made about their usefulness in assessment of breast lymphoedema.

Conclusion
The findings from this systematic review reveal that ultrasound has the best measurement properties, including information on measurement error, but MIC has not yet been established. Of the PROMs, BLYSS, BrEQ and BLSQ for symptom severity and LYMQOL-Breast for measurement of QOL are promising tools to assess women following conservative breast cancer treatment. Well-designed and reported studies on measurement properties for all tools are required to improve quality of evidence in this emerging area of assessment. Based on the current level of evidence, a combination of objective and subjective measurements is recommended to quantify the full manifestation of breast lymphoedema signs and symptoms.

Systematic Review Registration
This systematic review was registered in PROSPERO (CRD42020183851).
Author contribution
All authors contributed to the systematic review conception and design. Titles, abstract and full paper review, data extraction and analysis were performed by NF, SK and CL. SK, ED and KS are PhD supervisors of NF. The first draft of the manuscript was written by NF, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. This work was supported by the Joyce Anderson and Betty Schofield Grant, awarded to KS and Westmead Breast Cancer Institute. NF received a PhD scholarship from this grant.

The tools categorised as promising (Category B) do not have sufficient evidence for reliability, and the studies are of low or very low quality. No tools were categorised as insufficient (Category C).

Table 3
Patient Reported Outcome Measure (PROM) COSMIN ratings
BrEQ, Breast Edema Questionnaire; BLYSS, Breast Lymphoedema Symptom Severity; BCTOS, Breast Cancer Treatment Outcome Scale; BLSQ, Breast Lymphoedema Symptom Questionnaire; LYMQOL-Breast, Lymphoedema Quality of Life tool-Breast; EORTC-QLQ BR23, European Organization for Research and Treatment of Cancer Breast Cancer-Specific Quality of Life Questionnaire; LSIDS-T, Lymphedema Symptom Intensity and Distress Survey-Trunk
** Indirect mixed breast cancer population; a, Spanish and Dutch versions only
COSMIN ratings: +, sufficient rating; -, insufficient rating; ?, indeterminate rating; (), grading of overall quality of evidence based on modified GRADE approach: H, high; M, moderate; L, low; VL, very low; NR, not reported. Bold denotes measurement criteria that have a sufficient rating and a GRADE of moderate to high