Urinary Leukotriene E4 as a Biomarker in NSAID-Exacerbated Respiratory Disease (N-ERD): a Systematic Review and Meta-analysis

Purpose of Review
Non-steroidal anti-inflammatory drug (NSAID)-exacerbated respiratory disease (N-ERD) currently requires aspirin challenge testing for diagnosis. Urinary leukotriene E4 (uLTE4) has been investigated extensively as a potential biomarker in N-ERD. We aimed to assess the usefulness of uLTE4 as a biomarker for the diagnosis of N-ERD.
Recent Findings
N-ERD, formerly known as aspirin-intolerant asthma (AIA), is characterised by increased leukotriene production. uLTE4 reflects cysteinyl leukotriene production and is therefore a potential biomarker in N-ERD. Although several studies have examined the relationship between uLTE4 and N-ERD, the usefulness of uLTE4 as a biomarker in a clinical setting remains unclear.
Findings
Our literature search identified 38 unique eligible studies, of which 35 were included in the meta-analysis. We performed a meta-analysis (pooled standardised mean difference (SMD) with 95% confidence intervals (95% CI)) and assessed risk of bias using the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Cochrane DTA). Data from 3376 subjects were analysed (1354 N-ERD, 1420 aspirin-tolerant asthma (ATA), and 602 healthy controls (HC)). uLTE4 was higher in N-ERD than in ATA (n = 35 studies, SMD 0.80; 95% CI 0.72–0.89). uLTE4 increased following aspirin challenge in N-ERD (n = 12, SMD 0.56; 95% CI 0.26–0.85) but not in ATA (n = 8, SMD 0.12; 95% CI −0.08 to 0.33). This systematic review and meta-analysis showed that uLTE4 is higher in N-ERD than in ATA or HC, and that people with N-ERD have greater increases in uLTE4 following aspirin challenge. However, because uLTE4 measurement and result reporting practices vary, the clinical utility of these findings is limited. Future studies should be standardised to improve the clinical relevance and interpretability of their results.
Supplementary Information
The online version contains supplementary material available at 10.1007/s11882-022-01049-8.


Introduction
NSAID-exacerbated respiratory disease (N-ERD), also termed aspirin-exacerbated respiratory disease (AERD) and formerly known as aspirin-intolerant asthma (AIA) or Samter's triad, is a phenotype of asthma characterised by increased leukotriene production and leukotriene-driven inflammation [1]. The term N-ERD is used henceforth, as it is the term accepted in current clinical practice [2••].
N-ERD is clinically characterised by asthma, chronic rhinosinusitis with nasal polyposis, and exacerbation of respiratory symptoms on exposure to substances with cyclo-oxygenase-1 (COX-1)-inhibiting activity [1, 3•]. The reported prevalence of N-ERD is 7% among asthmatics overall and approximately 15% among those with severe asthma [4], rising to 30–40% among those with asthma and nasal polyposis [5]. Accurate diagnosis of this asthma phenotype requires provocation testing, which involves nasal, oral, or inhaled challenge with aspirin [6,7]. These procedures, although clinically validated, carry inherent risks, including significant bronchospasm, and are therefore not recommended for patients with severe airways disease. For these patients, diagnosis of N-ERD has typically relied on medical history alone, which increases the risk of misdiagnosis and of inappropriate management, for example by withholding this class of medication from individuals who do not have N-ERD [2••]. A robust, accessible, and safe biomarker of N-ERD is therefore highly desirable.
Malcolm Marquette and Bhavesh V. Tailor contributed equally to this work.
This article is part of the Topical Collection on Asthma.
Given that leukotriene production is increased in N-ERD, there is considerable interest in establishing leukotrienes as candidate biomarkers for diagnosis and for disease and treatment monitoring in N-ERD. In particular, urinary leukotriene E4 (uLTE4) excretion has been identified as a surrogate marker of in vivo leukotriene production and is preferred to other leukotrienes (e.g. leukotrienes B4, C4, and D4), which have short half-lives and are difficult to measure [8,9]. In this context, Hagan et al. [10] reviewed the role of uLTE4 in the diagnosis of N-ERD in 2016. Theirs is the only previous systematic review; it included 10 studies and supported uLTE4 as a biomarker for N-ERD. However, the inclusion criteria for that review [10] required primary-level data to carry out the necessary analysis, and some full-text manuscripts were not available to the authors.
In the present study, we therefore sought to update the work of Hagan et al. [10] by reviewing and analysing the broader literature to compare baseline uLTE4 levels in patients with N-ERD, aspirin-tolerant asthma (ATA), and healthy control (HC) subjects. In addition, we aimed to determine the effect of aspirin challenge testing on uLTE4 concentration in N-ERD and ATA individuals, and the diagnostic accuracy of baseline uLTE4 measurements for predicting aspirin intolerance in patients with asthma. In keeping with Hagan et al. [10], we analysed the different assays separately, given the variation between these techniques.

Literature Search
The protocol for the review was published in the PROSPERO database (CRD42021228674) and developed with reference to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 2020 guidelines [11]. A systematic search of MEDLINE, EMBASE, EMCARE, CINAHL and PsycINFO was undertaken by a medical librarian in conjunction with one reviewer (B.V.T.) from database inception to 31st December 2021. In contrast to the previous review, a comprehensive search strategy was implemented which captured all studies reporting baseline uLTE4 levels in N-ERD and ATA groups, irrespective of whether these studies reported primary level data to answer our primary research question. No filters were used. The strategies were peer reviewed by a second reviewer (M.M.) prior to final execution of the search. Reference lists from included studies and review articles that were identified through the database searches were hand searched to identify additional articles for possible inclusion. Both Healthcare Databases Advanced Search (HDAS) and Rayyan were used to identify duplicate records and additional duplicates were manually removed before screening for inclusion. Articles were screened by two independent reviewers (B.V.T., M.M.). Disagreements between the reviewers were resolved through discussion. The full search strategy can be found in Online Resource 1.

Study Eligibility
The diagnostic terms N-ERD/AERD, Samter's triad, and AIA have been used interchangeably in the literature to describe the population of interest, and all were included in the search criteria to ensure complete data capture and synthesis.
Original research studies recruiting human subjects with asthma and using uLTE4 as a biomarker (index test) to differentiate N-ERD from ATA were considered for inclusion. Diagnosis of N-ERD (reference standard) required at least one of the following two criteria: (a) a positive aspirin challenge, either historic (case-control design) or performed prospectively (single-gate design); (b) an unequivocal history of asthma exacerbation following ingestion of aspirin and/or other NSAIDs. There were no age restrictions.
The following exclusion criteria were applied: publication types other than primary studies (review articles, case reports, conference abstracts, book chapters, and letters to the editor), and papers published in languages other than English for which a translation could not be found. Studies concerning aspirin challenge testing of asthmatic patients were excluded if baseline (pre-challenge) uLTE4 data were not reported in the published article or supplementary material and could not be obtained on request from the corresponding author.

Study Outcomes
The primary outcome was whether baseline uLTE4 concentration in N-ERD differs from that in ATA and (non-asthmatic) HC subjects, using a between-group comparison. Secondary outcomes were (a) the diagnostic accuracy of baseline uLTE4 measurements for predicting aspirin intolerance in patients with asthma; and (b) the change in uLTE4 concentration in N-ERD and ATA following aspirin challenge testing.

Data Extraction
Two reviewers (B.V.T., M.M.) independently extracted the following data from included studies: author(s); year of publication; country of origin; source of funding; demographic characteristics (n, sex, age); clinical characteristics (inclusion/exclusion criteria, co-morbidities, definition of asthma, baseline pulmonary function); index test (method of uLTE4 analysis, original units, nature of urine collection); reference standard (clinical history, aspirin challenge, or both; criteria for N-ERD); mean and standard deviation (SD) of baseline uLTE4 for N-ERD, ATA, and HC; diagnostic test accuracy (if reported: area under the curve, cut-off value, sensitivity, specificity, positive predictive value, negative predictive value); and mean and SD of uLTE4 following aspirin challenge testing for N-ERD and ATA (if performed). Two e-mail requests for missing data were made to the corresponding authors of included studies. Disagreements in data extraction were resolved through discussion.
If relevant baseline and/or post-challenge uLTE4 data were presented in published figures but not given as summary data in the accompanying text or supplementary materials, the underlying numerical values were extracted from the figures using WebPlotDigitizer (v4.4, California, USA), a web-based semi-automated extraction tool [12].
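Digitising a plot amounts to calibrating each axis against two labelled tick marks and then mapping pixel positions to data values. The sketch below illustrates that calibration step for a single axis under our own assumptions; it is not WebPlotDigitizer's code, and the function name `calibrate_axis` and the pixel positions in the example are hypothetical.

```python
import math


def calibrate_axis(p1, v1, p2, v2, log_scale=False):
    """Return a function mapping a pixel coordinate on one axis to a data value.

    p1, p2: pixel positions of two labelled ticks; v1, v2: their printed values.
    Set log_scale=True for logarithmic axes (tick values must then be > 0).
    """
    if p1 == p2:
        raise ValueError("calibration ticks must be at different pixel positions")
    if log_scale:
        if v1 <= 0 or v2 <= 0:
            raise ValueError("log-axis tick values must be positive")
        a, b = math.log10(v1), math.log10(v2)
    else:
        a, b = v1, v2
    slope = (b - a) / (p2 - p1)

    def to_value(pixel):
        # Linear interpolation between the ticks (in log10 space if log_scale)
        t = a + slope * (pixel - p1)
        return 10 ** t if log_scale else t

    return to_value


# Hypothetical linear y-axis: tick "0" at pixel row 400, tick "1000" at pixel row 100
to_ulte4 = calibrate_axis(400, 0, 100, 1000)
bar_mean = to_ulte4(250)    # top of the bar
bar_upper = to_ulte4(190)   # top of the error bar
sd_if_error_bar_is_sd = bar_upper - bar_mean
```

If an error bar shows the standard error of the mean rather than the SD, the SD is SEM × √n, so the figure legend has to be checked before digitised values are pooled.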

Risk of Bias Assessment
A modified version of the QUADAS tool from the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy was used to assess the methodological quality of included studies [13]. This was performed independently by two reviewers (B.V.T., M.M.), with disagreements resolved through discussion.

Data Synthesis and Meta-analysis
A descriptive synthesis of included studies was performed and structured around the review objectives. Studies reporting the mean and SD of uLTE4 at baseline (with or without post-challenge values) for N-ERD, ATA, and HC were included in the meta-analysis. Where extracted data were reported as the median with range or the median with interquartile range, they were converted to mean and SD using established approximation methods [14]. Data presented in separate subgroups were combined using established formulae from the Cochrane Handbook for Systematic Reviews of Interventions [15]. Pooled standardised mean differences (SMDs) with 95% confidence intervals (CIs) were calculated.
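The two conversion steps described above can be sketched as follows. Reference [14] is not reproduced here, so as an assumption we use the widely cited Wan et al. (2014) estimators for median-based summaries; the subgroup formula is the standard Cochrane Handbook expression for merging two groups (apply it repeatedly for more than two). Function names are illustrative only.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function


def mean_sd_from_range(minimum, median, maximum, n):
    """Estimate mean and SD from median, min and max (Wan et al. style; n > 1)."""
    mean = (minimum + 2 * median + maximum) / 4
    xi = 2 * _z((n - 0.375) / (n + 0.25))
    return mean, (maximum - minimum) / xi


def mean_sd_from_iqr(q1, median, q3, n):
    """Estimate mean and SD from median and quartiles (Wan et al. style; n > 1)."""
    mean = (q1 + median + q3) / 3
    eta = 2 * _z((0.75 * n - 0.125) / (n + 0.25))  # tends to about 1.35 for large n
    return mean, (q3 - q1) / eta


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two subgroups into one group (Cochrane Handbook formula).

    The (m1 - m2) ** 2 term equals m1**2 + m2**2 - 2*m1*m2 in the Handbook.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
                + n1 * n2 / n * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(variance)


# Hypothetical study reporting median 150 (IQR 90-260) in 24 patients
est_mean, est_sd = mean_sd_from_iqr(90, 150, 260, 24)
```

The subgroup formula is exact for sample SDs, whereas the median-based conversions are approximations that assume roughly normal data; skewed uLTE4 distributions make them less reliable.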
We assessed statistical heterogeneity among included studies using the I² statistic. A random-effects model was used if heterogeneity was substantial (I² > 50%); otherwise, a fixed-effect model was used to combine the results. To explore possible sources of heterogeneity, meta-regression analysis was performed with the following covariates: publication year, country of study origin, sample size, percentage of male participants, and baseline lung function. P values < 0.05 were considered statistically significant.
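A minimal sketch of this pooling rule, assuming Hedges' adjusted g as the SMD (the formulation documented for RevMan 5) and the DerSimonian-Laird estimator for the random-effects model. The original analysis may differ in detail, and the class, field, and function names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoGroupStudy:
    """Summary data for one study (hypothetical field names)."""
    n_nerd: int
    mean_nerd: float
    sd_nerd: float
    n_ata: int
    mean_ata: float
    sd_ata: float


def hedges_g(s):
    """Return (SMD, variance) using Hedges' adjusted g as in RevMan 5."""
    n = s.n_nerd + s.n_ata
    sd_pooled = math.sqrt(((s.n_nerd - 1) * s.sd_nerd**2
                           + (s.n_ata - 1) * s.sd_ata**2) / (n - 2))
    g = (s.mean_nerd - s.mean_ata) / sd_pooled * (1 - 3 / (4 * n - 9))
    var = n / (s.n_nerd * s.n_ata) + g**2 / (2 * (n - 3.94))
    return g, var


def pool(effects, variances, i2_threshold=50.0):
    """Inverse-variance meta-analysis; switches to DerSimonian-Laird random
    effects when I² exceeds the threshold (50% in this review's protocol)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    model = "fixed"
    if i2 > i2_threshold:
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)  # between-study variance
        w = [1 / (v + tau2) for v in variances]
        model = "random"
    estimate = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {"model": model, "smd": estimate, "i2": i2,
            "ci95": (estimate - 1.96 * se, estimate + 1.96 * se)}
```

Under this rule, the observed I² of 42% for the baseline N-ERD vs ATA comparison would select the fixed-effect branch.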
As a deviation from the data synthesis plan registered in PROSPERO, summary receiver operating characteristic (SROC) modelling was not performed because individual data points were unavailable for most included studies. Evaluation of diagnostic test accuracy was therefore not possible.
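SROC modelling needs, for each study, a 2×2 table obtained by applying a cut-off to individual patient values, which is why group means and SDs alone were insufficient. The sketch below shows that step with hypothetical values; `accuracy_at_cutoff` and the example data are illustrative only.

```python
def accuracy_at_cutoff(values, has_nerd, cutoff):
    """Build the 2x2 table for 'uLTE4 >= cutoff means test positive' and
    return sensitivity, specificity, PPV and NPV (NaN if undefined)."""
    pairs = list(zip(values, has_nerd))
    tp = sum(1 for v, d in pairs if v >= cutoff and d)
    fp = sum(1 for v, d in pairs if v >= cutoff and not d)
    fn = sum(1 for v, d in pairs if v < cutoff and d)
    tn = sum(1 for v, d in pairs if v < cutoff and not d)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical individual-level data: baseline uLTE4 and reference-standard result
ulte4 = [50, 120, 300, 80, 400, 60]
nerd = [False, True, True, False, True, False]
for cut in (70, 100):
    print(cut, accuracy_at_cutoff(ulte4, nerd, cut))
```

Sweeping the cut-off traces one study's ROC curve; an SROC model then combines such per-study pairs of sensitivity and specificity.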
All data were extracted and stored in an Excel data file (Microsoft Excel for Mac; Microsoft Corporation, USA). Review Manager version 5.4 (The Cochrane Collaboration, Copenhagen, Denmark) and R software version 4.0.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for conducting the meta-analysis.

Study Selection
A total of 660 articles were identified (December 2021), and 547 titles and abstracts were screened after de-duplication. Of these, 491 articles were excluded before full-text review. Full-text review identified 38 eligible articles (Fig. 1), each describing a unique study. We performed a qualitative synthesis of all included studies (n = 38) and a meta-analysis of 35 studies; the remaining three studies did not report the data required to calculate an effect size.

Study Characteristics
Included studies (n = 38) were published between 1991 and 2021 across 8 countries: Japan (n = 13), Poland (n = 11), USA (n = 5), South Korea (n = 3), Sweden (n = 2), United Kingdom (n = 2), Italy (n = 1), and Switzerland (n = 1). A total of 1354 N-ERD, 1420 ATA, and 602 HC subjects were represented across the included studies, of whom 1010 (36.5%) were male. In 19 studies, the aspirin-intolerant group was explicitly defined as N-ERD and/or co-morbid chronic rhinosinusitis and/or nasal polyposis status was clearly documented. In the remaining 19 studies, the term AIA was used without reference to nasal polyposis status. The main characteristics of included studies are summarised in Table 1.
Twenty-seven studies used a positive aspirin challenge alone (inhaled, intravenous, nasal, or oral) as the reference standard for diagnosing N-ERD, two studies used a convincing clinical history of asthma exacerbation following aspirin ingestion alone, and the remaining nine studies used either a positive challenge or a convincing clinical history. Further details of the aspirin challenge criteria and the methodology for uLTE4 measurement are given in Table 2.

Key Findings
Studies with different uLTE4 measurement methodologies were combined. Across 35 studies including 1127 N-ERD and 1191 ATA subjects, baseline uLTE4 concentration was significantly higher in N-ERD than in ATA (SMD 0.80, 95% CI 0.72 to 0.89; I² = 42%; Fig. 2).
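For readers reusing these pooled estimates, an approximate standard error and P value can be recovered from a reported 95% CI with the normal approximation, SE = (upper − lower)/3.92. The sketch below applies this to the pooled SMDs reported in the abstract; it is a back-of-envelope consistency check under that assumption, not part of the original analysis, and the helper names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_ci(lower, upper, level=0.95):
    """Approximate SE from a symmetric CI: (upper - lower) / (2 * z)."""
    z = _STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return (upper - lower) / (2 * z)


def two_sided_p(estimate, se):
    """Two-sided P value for estimate / SE under the normal approximation."""
    return 2 * (1 - _STD_NORMAL.cdf(abs(estimate) / se))


# Pooled SMDs as reported in the abstract: (estimate, lower, upper)
reported = {
    "baseline N-ERD vs ATA": (0.80, 0.72, 0.89),
    "pre vs post challenge, N-ERD": (0.56, 0.26, 0.85),
    "pre vs post challenge, ATA": (0.12, -0.08, 0.33),
}
for label, (smd, lo, hi) in reported.items():
    se = se_from_ci(lo, hi)
    print(f"{label}: SE ~ {se:.3f}, p ~ {two_sided_p(smd, se):.2g}")
```

The recovered P values agree with the text: clearly significant for N-ERD, non-significant for the ATA post-challenge change.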

Meta-regression and Risk of Bias
Heterogeneity between studies in this meta-analysis was low (I² = 42% for the baseline N-ERD vs ATA comparison). Nevertheless, we performed meta-regression to assess the contribution of several covariates to effect size across the studies pooled for this comparison. Meta-regression showed that country of study had an impact on effect size (I² = 13.05%). Furthermore, when individual study sites were identified and included in a multiple regression analysis, site accounted for an I² of 100%, suggesting that heterogeneity across studies in this meta-analysis is related to site. The other covariates analysed by meta-regression (publication year, percentage of male participants, baseline lung function, and methodology for uLTE4 measurement) had no significant impact on effect size, and hence no significant impact on between-study heterogeneity. Risk of bias, assessed using the QUADAS tool from the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [13], was acceptable across all studies; however, 37.8% of quality assessment items were unfulfilled (Figs. 6 and 7). The following risk of bias items were poorly reported across all studies (reported in < 30% overall): spectrum of representative patients (10.5%) and independent interpretation of index and reference standard tests (0%).
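The simplest form of meta-regression regresses study effect sizes on a covariate using inverse-variance weights. The sketch below is a fixed-effect version for one continuous covariate (for example publication year). The authors' R analysis may have used a mixed-effects model, and categorical covariates such as country or site would need dummy coding; all names here are hypothetical.

```python
import math


def meta_regression(effects, variances, covariate):
    """Fixed-effect meta-regression of effect size on one continuous covariate
    by inverse-variance weighted least squares."""
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)  # SE assuming known within-study variances
    # Residual Cochran's Q: heterogeneity left unexplained by the covariate
    q_resid = sum(wi * (y - intercept - slope * x) ** 2
                  for wi, x, y in zip(w, covariate, effects))
    return {"slope": slope, "intercept": intercept, "se_slope": se_slope,
            "q_residual": q_resid, "df_residual": len(effects) - 2}


# Hypothetical SMDs regressed on publication year (centred on 2000 for stability)
years = [1995, 2003, 2010, 2018]
fit = meta_regression([0.9, 0.8, 0.85, 0.7], [0.04, 0.03, 0.05, 0.02],
                      [y - 2000 for y in years])
```

Comparing the residual Q with the total Q from an intercept-only model indicates how much heterogeneity the covariate explains.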

Discussion
Our meta-analysis of 35 studies demonstrated a statistically significantly higher baseline concentration of uLTE4 in patients with N-ERD compared with those with ATA and HC. These findings corroborate current knowledge regarding the importance of leukotrienes in N-ERD and again identify uLTE4 as a potential biomarker for N-ERD diagnosis and disease monitoring. In the subset of studies reporting uLTE4 measurements before and after aspirin challenge testing, a significant rise in uLTE4 was seen in patients with N-ERD but not in those with ATA. This is the first meta-analysis to evaluate the change in uLTE4 concentration following aspirin challenge in N-ERD compared with ATA, and the results are consistent with previous literature showing that the magnitude of nasal and/or respiratory reactions to provocative aspirin challenge in asthmatics is associated with both the degree of baseline uLTE4 elevation and the rise in uLTE4 during challenge [51,52].
This study has a number of limitations. Because individual data points were missing from most studies, sensitivity and specificity could not be analysed. Four studies provided some data of interest [8,9,16,38], but these were insufficient for this analysis. The corresponding authors of the remaining included studies were contacted by e-mail to request these data, but none responded. Included studies were published between 1991 and 2021, a span of 30 years, which inevitably entails variation in uLTE4 measurement practice. Although our meta-regression did not identify year of publication as contributing to heterogeneity, four different methodologies were used to measure uLTE4 across the included studies. To account for this, studies using each method were first analysed separately and then combined.
This analysis showed that, despite the different methodologies, there was no significant heterogeneity across studies (Fig. 2); that is, methodology was not shown to have a significant impact on effect size. Nevertheless, a large number of methodologies and data-reporting formats were used. Country of study origin contributed to heterogeneity, but not once study site was included in the multiple regression. This suggests that site was responsible for the heterogeneity, presumably through a combination of methodology, definition of N-ERD, and population sampled. Greater standardisation of procedures and reporting is required in both clinical research and clinical practice.
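Whether subgroup results (here, the four uLTE4 assay methodologies) differ can be tested by comparing the subgroup-pooled estimates: Q_between is referred to a chi-square distribution with (number of subgroups − 1) degrees of freedom. This is a sketch of the fixed-effect version of that test using only the Python standard library; the subgroup labels and data in the example are hypothetical, and the authors' software may have computed it differently.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    h = x / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                               for j in range(1, (df - 1) // 2 + 1))
    return tail


def fixed_pool(effects, variances):
    """Inverse-variance pooled estimate and its variance."""
    w = [1 / v for v in variances]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w), 1 / sum(w)


def subgroup_difference(subgroups):
    """subgroups: mapping of assay name -> (effects, variances).
    Returns (Q_between, df, p) for a fixed-effect test of subgroup differences."""
    pooled = [fixed_pool(e, v) for e, v in subgroups.values()]
    overall, _ = fixed_pool([p[0] for p in pooled], [p[1] for p in pooled])
    q_between = sum((y - overall) ** 2 / v for y, v in pooled)
    df = len(pooled) - 1
    return q_between, df, chi2_sf(q_between, df)


# Hypothetical SMDs and variances grouped by assay type
example = {
    "assay A": ([0.9, 0.7, 0.8], [0.05, 0.04, 0.06]),
    "assay B": ([0.75, 0.85], [0.03, 0.05]),
}
print(subgroup_difference(example))
```

A non-significant Q_between is consistent with the observation that assay methodology did not materially change the pooled effect.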
There was also variation in how asthma was defined across studies: American Thoracic Society (ATS) criteria, Global Initiative for Asthma (GINA) guidelines, National Heart, Lung, and Blood Institute criteria, and physician diagnosis were all used, and 17 studies did not specify a definition of asthma. This matters because the definition determines the characteristics of the population studied. Similarly, the definition of aspirin intolerance varied across studies. Although most studies performed aspirin challenge testing (either retrospectively or prospectively), there was considerable variation in the challenge agent used and in the diagnostic cut-off for a positive test (i.e., fall in FEV1 relative to baseline). Approximately half of the studies included in the meta-analysis (18/35) clearly documented co-morbid chronic rhinosinusitis and/or nasal polyposis status or defined the aspirin-intolerant cohort as N-ERD; the remaining studies did not provide these population characteristics. In several studies, summary uLTE4 data were not stated in the published text or supplementary materials and had to be derived from figures using a web-based extraction tool, which necessarily yields estimates. Similarly, where data were reported as median with range or interquartile range, conversion to mean and SD using published approximation methods was required. Both steps can affect the accuracy of the extracted values and the weight given to individual studies, and therefore the overall results. We therefore recommend that result reporting also be standardised.
One of the most important features of this meta-analysis is the necessary use of the standardised mean difference. This summary statistic is used when the measurement scales of different studies are too diverse to be pooled directly, so each study's difference in means is expressed in units of its standard deviation. Consequently, the SMD does not reveal the absolute difference between groups, nor can it be used to define a diagnostic cut-off. This is particularly important when developing future study protocols aimed at establishing sensitivity and specificity, and it highlights the need to standardise such protocols to move closer to clinical significance. Our results show that all of the methodologies employed to measure uLTE4 yielded comparable results across studies. Mass spectrometry (MS) has been described in a number of publications as the gold standard for measuring leukotrienes in biological fluids [53,54]; however, access and cost may limit its availability in the clinical setting, whereas enzyme immunoassays may be more readily available. These are important considerations for future protocol development in this area. Standardised measurement would allow calculation of the absolute mean difference in clinically useful units, rather than the more abstract standardised mean difference. The current heterogeneity in methods and measurement makes it impossible to make clinically relevant recommendations on the use of this diagnostic technology.
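The loss of absolute information can be seen in a toy example: two hypothetical assays reporting on scales that differ by a factor of four yield the same standardised difference but very different absolute differences. The values below are invented for illustration and do not come from any included study; Cohen's d (without the small-sample correction) is used for simplicity.

```python
import math


def smd_and_raw_difference(m1, sd1, n1, m2, sd2, n2):
    """Return (Cohen's d, raw mean difference) for two independent groups."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled, m1 - m2


# Hypothetical assay A (e.g. values in pg/mg creatinine)
d_a, diff_a = smd_and_raw_difference(180, 100, 30, 100, 100, 30)
# Hypothetical assay B: same group pattern on a scale four times larger
d_b, diff_b = smd_and_raw_difference(720, 400, 30, 400, 400, 30)

print(f"assay A: d = {d_a:.2f}, raw difference = {diff_a}")
print(f"assay B: d = {d_b:.2f}, raw difference = {diff_b}")
```

Both assays give d = 0.80, yet a cut-off chosen on one scale is meaningless on the other, which is why a pooled SMD cannot yield a diagnostic threshold.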
It should also be noted that most studies have been conducted in specialist centres and excluded participants with uncontrolled asthma or participants reporting a respiratory tract infection or asthma exacerbation in the preceding 6 weeks. While this provides a well-defined cohort for research purposes, our findings may not be generalisable to patients undergoing testing in routine clinical practice, especially since N-ERD is most prevalent among patients with severe asthma.
Overall, the risk of bias was acceptable across all studies. However, no included study reported whether investigators were blinded to baseline uLTE4 data (index test) when performing aspirin challenge testing or obtaining a clinical history of aspirin intolerance (reference standard). This may reflect the fact that the primary aim of many included studies was not to determine diagnostic test accuracy. It is also unclear how much a lack of blinding could affect the interpretation of aspirin challenge testing, since challenges are normally undertaken following a set protocol with a predetermined diagnostic cut-off.
The finding of a significant rise in uLTE4 following aspirin challenge testing is in keeping with the central role of leukotriene release in causing upper and lower airway symptoms [44]. Daffern et al. showed that the rise in uLTE4 following challenge was related to the severity of post-challenge airflow obstruction. Interestingly, however, the rise does not seem to be attenuated by inhibition of 5-lipoxygenase, which would be expected to reduce leukotriene production [51,55].
provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.