Introduction

Clonality testing is widely accepted as a valuable tool in the routine diagnosis of lymphoid malignancies [1, 2]. The vast majority of lymphoid malignancies arise from the unconstrained expansion of a single transformed B- or T-cell and are accordingly characterized by clonal rearrangements of the immunoglobulin (IG) or T-cell receptor (TR) genes, which makes these genes the most widely applied targets for clonality testing.

In 2003, the European BIOMED-2 consortium (which continued under the name “EuroClonality consortium”) designed a standardized multiplex PCR assay for nearly all IG/TR targets [3], which showed a high rate of clonality detection in common B- and T-cell malignancies [4,5,6,7,8,9,10,11,12]. To date, several commercial kits are available to run the multiplex PCR assays.

Due to the technical standardization and commercialization (PCRs, protocols, and readouts), clonality assays can be performed in routine diagnostics [13]. However, reporting of clonality assays is still considered a complex task, because molecular clonality testing reflects immunobiology and comprises the integrated interpretation of multiple multiplex PCR results. These multiplex PCRs use primers of potentially different efficiencies, annealing to highly homologous genes. Although there are basic rules for the interpretation of the molecular patterns [13], extensive knowledge of IG and TR gene rearrangement patterns and of the PCR design is needed. Interpretation should also consider the pathology and the clinical question, as the presence of a small clone in a reactive lesion has a different implication from its presence in a full-blown malignancy.

Laboratories performing molecular pathology tests are advised to participate in external quality assessment (EQA) schemes [14], preferably an accredited scheme providing samples mimicking routine cases as closely as possible [15]. It is essential that the EQA participants read the final reports with feedback on errors made by all participants, act on recommendations made, and ensure that their own errors are corrected rapidly [14].

The EuroClonality consortium organized five EQA rounds between 2008 and 2011 [16], using capillary electrophoresis (GeneScan, GS) or polyacrylamide heteroduplex (HD) gel analysis [3]. The schemes aimed to (i) assess laboratory performance and (ii) develop a uniform scoring system for the interpretation of IG/TR clonality testing. To render interpretation less subjective, algorithms have been introduced, especially in the USA [17,18,19]. However, such algorithms may lead to false-negative or false-positive interpretations [13], and the need for guidelines on the interpretation and reporting of clonality data is apparent for IG/TR routine diagnostics and EQA schemes. This prompted the development of the EuroClonality (BIOMED-2) guidelines, which describe the technical scoring of the individual IG and TR PCR target results and the scoring of the final molecular conclusion, based on the integration of the different PCR results. During validation of the EuroClonality uniform description and reporting system, the majority of cases were scored appropriately, and only 3.1% of 1150 cases were identified as difficult to score. In these cases, the final scorings “minor clone with polyclonal background” and “polyclonal with a minor background” actually describe the same phenomenon, and the choice between them may reflect the personal preference of the clinical scientist [13].

Several other providers offer IG/TR EQA schemes [20,21,22,23,24]. Both small sample sets with frequent distribution and larger sample sets are currently used in different EQA programs [25, 26] (supplemental Table 1). Previously, performance improvement upon EQA participation or other quality improvement projects has been reported in schemes for testing of oncological biomarkers [27, 28], but not yet for clonality testing.

The aim of this paper was to investigate the effect of repeated EQA participation on the laboratories’ performance for complex clonality analysis. Important parameters such as the participant group, the different final molecular interpretations (clonal, polyclonal, oligoclonal; without evaluation of the more detailed molecular interpretation), sample information, and the analysis method were integrated in these analyses. The data are based on the results of five EQA rounds for IG and TR rearrangement analysis in suspected lymphoproliferations between 2014 and 2018.

Materials and methods

EQA scheme set-up

The schemes were organized by the EuroClonality consortium [29] in collaboration with the Biomedical Quality Assurance Research Unit of KU Leuven as the coordination center, accredited according to ISO/IEC 17043:2010 [30]. Each EQA round comprised a total of 10 clinical cases, including the analysis of extracted DNA samples and the interpretation of clonality patterns from paper-based cases. In addition to the EuroClonality laboratories that were involved in the development of the EuroClonality/BIOMED-2 primer sets and protocols [3,4,5] and are members of the EuroClonality consortium, non-EuroClonality laboratories could also register. Enrolled laboratories could opt to participate in IG testing, TR testing, or both. The EQA scheme process is depicted in Fig. 1.

Fig. 1

Overview of the EuroClonality EQA scheme process. EQA, external quality assessment; IG, immunoglobulin gene; TR, T-cell receptor gene

Sample selection

Participants received five cases for IG clonality testing and/or five cases for TR clonality testing. With the exception of 2014 (only paper-based cases), these cases alternated yearly between three DNA samples and two paper-based cases in one year and two DNA samples and three paper cases in the next year (Table 1). As both hemato-oncology and pathology laboratories perform clonality analysis, cases of different sample types (e.g., peripheral blood, fresh tissue, FFPE tissue) were included, reflecting routine clinical diagnostics. Wet cases were selected based on the availability of samples with sufficient DNA yield for testing in an EQA, the representation of the tube patterns, and the results from previous EQA rounds. Paper-based cases were selected based on the representation of the tube patterns, the evaluation of rearrangement patterns of separate tubes into an integrated conclusion, and the results from previous EQA rounds. Only wet and paper cases with a consensus overall molecular interpretation during pretesting were included.

Table 1 Performance parameters for IG/TR clonality testing over time of the EQA participants combined

Wet samples consisted of color-coded tubes containing 40 μL DNA at a concentration of 25–50 ng/μL. Paper-based cases focused on the interpretation of IG/TR GS patterns, created by duplicate fragment analysis (GS) of PCR products on various Genetic Analyzer Systems (Life Technologies, Beckman Coulter). The quality of the DNA samples was assessed with the EuroClonality/BIOMED-2 quality control-gene PCR (100, 200, 300, and 400 bp amplicons), and the largest detectable amplicon was reported. Participants received information on the PCR targets (e.g., FR1-JH for IGH tube A), fluorochromes (FAM or HEX), and size standards (e.g., LIZ500, ILS600). Patterns were provided per BIOMED-2 tube, including a full view of the tube patterns and a zoomed view per sample to aid visualization of the case’s overall GS profile. All samples (paper and wet) were presented as clinical cases, with relevant clinical details (sample type, age/sex of patient, suspected diagnosis, and request) and flow cytometry, histomorphology, and/or immunostaining data.
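As a quick worked check, the total amount of DNA supplied per wet-sample tube follows from the stated volume and concentration range:

```latex
m_{\mathrm{DNA}} = V \times c = 40~\mu\mathrm{L} \times (25\text{--}50)~\mathrm{ng}/\mu\mathrm{L} = 1000\text{--}2000~\mathrm{ng} = 1\text{--}2~\mu\mathrm{g}
```

This derived figure is for orientation only; how much of it each laboratory consumed depends on its own routine protocol and on the number of PCR tubes tested.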

Results of clonality analysis: the individual tests as well as the final molecular interpretation

Participants were asked to analyze all cases using their routine protocols and to interpret the results according to the published guidelines [13]. Results were entered in an electronic datasheet (Formdesk) and included (i) the overall molecular interpretation, (ii) an optional, more detailed interpretation, and (iii) a technical description per PCR tube (with or without peak size(s)). Additionally, information about the detection technique (HD or GS) and the test assay used (only in 2018) was requested.

Evaluation and feedback

In 2014 and 2015, a consensus (overall) molecular interpretation and result per PCR tube was reached based on the concerted discussion of the participants’ data. From 2016 up to 2018, consensus scoring was established by the EQA committee experts for the wet and paper cases prior to distribution.

In all scheme years, a maximum score of 1 point could be obtained per sample for a correct final molecular interpretation (Table 1). As the results of the different multiplex PCRs formed the basis for the final molecular interpretation, the individual PCR tube results were not scored. For particular cases, a more detailed molecular interpretation was required (out of scope for this paper). In 2016 and 2017 only, an additional half point was deducted for an incorrect or suboptimal detailed interpretation. In 2016, half a point was also deducted for discrepancies between individual tubes and the final conclusion or for an incorrect identification of clonal peaks.
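The per-case scoring rules above can be sketched as a small Python function. This is a hypothetical illustration, not the scheme's actual scoring software: the names `CaseResult`, `case_score`, and `round_score` are invented here, and because the text does not say how deductions combine when the final interpretation is already wrong, the sketch assumes that the case score is floored at zero.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CaseResult:
    """Hypothetical record of one participant's answer for one EQA case."""
    final_correct: bool            # overall molecular interpretation matches the consensus
    detailed_ok: bool = True       # detailed interpretation correct and optimal (if required)
    tubes_consistent: bool = True  # tube results consistent with the conclusion, clonal peaks correct


def case_score(result: CaseResult, year: int) -> float:
    """Score one case using the year-specific rules described in the text.

    Assumption (not stated in the text): deductions are subtracted from the
    case score, and the score cannot drop below 0.
    """
    score = 1.0 if result.final_correct else 0.0
    # 2016 and 2017 only: half a point off for an incorrect or suboptimal detailed interpretation
    if year in (2016, 2017) and not result.detailed_ok:
        score -= 0.5
    # 2016 only: half a point off for tube/conclusion discrepancies or wrongly identified clonal peaks
    if year == 2016 and not result.tubes_consistent:
        score -= 0.5
    return max(score, 0.0)


def round_score(results: Iterable[CaseResult], year: int) -> float:
    """Total score for one round (five cases per target, so at most 5 points)."""
    return sum(case_score(r, year) for r in results)


# Example: five correct final interpretations in 2016, one with a suboptimal detailed interpretation
example = [CaseResult(True)] * 4 + [CaseResult(True, detailed_ok=False)]
print(round_score(example, 2016))
```

The same five answers would score 5.0 in 2015 or 2018, because the detailed-interpretation deduction applied only in 2016 and 2017.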

In the yearly EuroClonality meeting, the results of the EQA scheme were discussed, starting with a plenary presentation of the results, followed by detailed small-group discussions involving the expert EQA committee members and the EuroClonality consortium participants, and ending with a summarizing plenary presentation. The analysis of the EQA data was described in detail in an educational EQA report provided to all scheme participants. The general scheme summary included detailed information about the molecular conclusion and per-tube PCR results, an assessment table with scores per case, and a participation certificate [25]. The criterion for successful participation was a performance rate of ≥80% in the respective scheme year, corresponding to at least 4 out of 5 correct final molecular interpretations. Laboratories with a score of ≤4.5/5 received a warning because of a possible risk of unsatisfactory performance after two EQA rounds. Laboratories with a score of at least 90% (9/10) [25] in two subsequent EQA rounds were listed on the EuroClonality website [29, 31].
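The success, warning, and website-listing criteria can likewise be expressed as a short sketch. The function names are hypothetical, and reading “two subsequent EQA rounds” as two consecutive scheme years is an assumption made for illustration.

```python
from typing import Dict, Tuple


def round_outcome(score: float, max_score: float = 5.0) -> Tuple[bool, bool]:
    """Return (successful, warning) for one scheme year.

    Successful: at least 80% of the maximum score (>= 4 of 5 points).
    Warning: a score of 4.5/5 or lower.
    """
    successful = score >= 0.8 * max_score
    warning = score <= 4.5
    return successful, warning


def listed_on_website(scores_by_year: Dict[int, float]) -> bool:
    """True if two subsequent rounds together reach at least 9 of 10 points.

    Assumption: "subsequent" is read as consecutive scheme years.
    """
    years = sorted(scores_by_year)
    return any(
        later == earlier + 1 and scores_by_year[earlier] + scores_by_year[later] >= 9.0
        for earlier, later in zip(years, years[1:])
    )


print(round_outcome(4.5))                          # successful, but with a warning
print(listed_on_website({2016: 4.5, 2017: 5.0}))   # 9.5 of 10 over two consecutive years
```

Note that a score of 4.0 in one round still allows listing if the next round is perfect (4.0 + 5.0 = 9.0), which is why the 4.5 warning flags a risk rather than a failure.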

Statistics

Statistics were performed with IBM SPSS Statistics v25 (IBM, Armonk, NY, USA), with the significance level set at α = 0.05. Mann-Whitney U (MWU) tests were performed to evaluate differences in average analysis scores between groups, and Kruskal-Wallis (KW) tests to assess improvement upon repeated participation within a given group.
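The analyses themselves were run in SPSS. To show what the two rank-based tests measure, the following pure-Python sketch computes the Mann-Whitney U statistic and the tie-corrected Kruskal-Wallis H statistic from average ranks. It deliberately stops at the test statistics; library routines such as `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` also return p-values. Tie handling matters here because many laboratories reached the maximum score, so analysis scores are heavily tied. Conventions differ between packages as to which of the two U values is reported.

```python
from collections import Counter
from itertools import chain
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for x versus y: rank sum of x minus its smallest possible value."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic across two or more groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for group in groups:
        group_ranks = ranks[start : start + len(group)]
        h += sum(group_ranks) ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are tied; H is undefined")
    return h / (1 - tie_term / (n ** 3 - n))
```

For a p-value, H is conventionally compared with a chi-square distribution with (number of groups − 1) degrees of freedom, for example groups defined by the number of EQA participations.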

Results

General overview

Over all schemes between 2014 and 2018, 84 unique laboratories from 17 countries participated, of which 27 (32.1%) were EuroClonality members, resulting in 279 and 272 scheme registrations for IG and TR analysis, respectively. Results were returned for 98.9% (276/279) of IG and 98.9% (269/272) of TR participations (Table 1).

Average scores were high with minimum values of 4.72/5 (2015, IG) and 4.55/5 (2016, TR) (Table 1). The percentage of laboratories obtaining the maximum score (5/5 points) varied between 75.9–97.9% for IG and 33.3–89.1% for TR analysis.

For IG and for TR, a total of 10 wet cases and 15 paper-based cases each were distributed to the registered laboratories (Table 1). In summary, 94.9% (506/533) of wet and 97.9% (829/847) of paper tests were correct for IG (MWU, p=0.144), versus 96.8% (507/524) of wet and 93.2% (765/821) of paper tests for TR (MWU, p=0.686). For all sample types and diagnoses, 87.5% or more of the participants provided the correct consensus outcome (Table 2). This excludes a paper-based case in 2016 (peripheral blood with relapsed T-cell prolymphocytic leukemia), a difficult case with oligoclonality/multiple clones detected, which only 29.4% (15/51) of participants interpreted correctly (Supplemental Table 2).
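The proportions above can be reproduced directly from the reported counts, which also confirms that the wet and paper test totals add up to the 1380 IG and 1345 TR tests listed in Table 3. The dictionary below simply transcribes the counts from this paragraph; it adds no new data.

```python
def pct(correct: int, total: int) -> float:
    """Percentage of correct tests, rounded to one decimal as in the text."""
    return round(100 * correct / total, 1)


# (correct, total) test counts transcribed from the Results text
COUNTS = {
    ("IG", "wet"): (506, 533),
    ("IG", "paper"): (829, 847),
    ("TR", "wet"): (507, 524),
    ("TR", "paper"): (765, 821),
}

for (target, case_type), (correct, total) in COUNTS.items():
    print(f"{target} {case_type}: {pct(correct, total)}% ({correct}/{total})")

# Wet plus paper tests per target, to compare with the totals reported for Table 3
totals = {
    target: sum(n for (tgt, _kind), (_ok, n) in COUNTS.items() if tgt == target)
    for target in ("IG", "TR")
}
print(totals)
```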

Table 2 Performance related to sample origin and clinical diagnosis for the different cases distributed in the EQA schemes

Improvement related to repeated EQA participation

We evaluated performance at the individual laboratory level according to the number of EQA participations, independently of the general average score for that scheme year. The average analysis scores were significantly higher for individual laboratories that participated in multiple EQA scheme rounds (KW, p=0.001) for IG and TR. For IG testing, the average performance was 90.6% for first-time participants (n=16 laboratories) versus 98.7% for laboratories that participated five times (n=30) (Fig. 2, panel A), although this difference was not significant for IG alone (KW, p=0.133). For TR analysis, first-time participants reached an average score of 90.0% (n=15), whereas fifth-time participants reached on average 98.6% (n=29) (KW, p=0.011).

Fig. 2

Improvement upon repeated EQA participation for the different targets, sample, and participant types. #, number; IG, immunoglobulin gene; SEM, standard error of the mean; TR, T-cell receptor gene. Bar numbers represent the number of unique laboratories for which the average analysis score is calculated. No standard errors are shown for bars with only one participant

Overall, laboratories performed better for the paper-based cases than for the wet cases, although the difference was not significant (MWU, p=0.466) (Fig. 2, panel B). During a first EQA participation, a score of 85.9% was reached for wet cases, compared to 92.5% for paper-based cases (IG and TR combined). The difference between the two sample types decreased with repeated EQA participation, ultimately reaching scores of 98.0% (wet cases) and 99.4% (paper cases).

Both EuroClonality and non-EuroClonality laboratories benefited from repeated EQA participation. EuroClonality laboratories performed significantly better than non-EuroClonality participants (MWU, p=0.008) (Fig. 2, panel C). Better performance by EuroClonality laboratories was also observed for IG and TR testing and for paper and wet cases separately, although the difference was significant only for the paper cases (MWU, p=0.007 for paper cases, p=0.149 for wet cases, p=0.057 for IG, p=0.068 for TR) (Supplemental Figure 1).

Evaluation of the different final molecular interpretations

In total, 1380 tests (on both wet and paper-based cases) were performed for IG rearrangements (Table 3). Of the tests with a clonal or polyclonal consensus outcome, 97.5% (1021/1047) and 96.0% (265/276), respectively, correctly assigned the final interpretation. Note that the more detailed molecular interpretations were evaluated, but no points were deducted when the more detailed interpretation was not correct. Only 12.3% (7/57) of tests with a consensus outcome of oligoclonality/multiple clones were correct, as 78.9% (45/57) were reported as clonal. For TR analysis, 1345 tests were performed (Table 3), including one paper-based case in 2015 that was considered educational because no consensus outcome was reached. Similar to IG, the majority of clonal (893/913, 97.8%) and polyclonal (251/269, 93.3%) TR tests were correct, whereas the majority of oligoclonal tests were incorrectly assigned as clonality detected (76/107, 71.0%). Incorrect interpretations were more often observed for wet samples than for paper cases, especially for IG analysis (except for oligoclonality, which only included paper-based cases).

Table 3 Percentage of correct and incorrect final molecular interpretations in wet and paper-based cases

Evaluation of the analysis methods used by the EQA participants

For IG and TR analysis of the wet samples, 90.8% (n=533) and 92.2% (n=524) of tests, respectively, were analyzed by GS. Most of the remaining tests were analyzed by HD (8.3% for IG and 7.3% for TR), which is also described as a preferred analytical technique for some multiplex PCR tubes [3]. For IG analysis, the majority of the participants tested the IGH-A (94.6%), IGH-B (98.7%), and IGH-C (98.9%) tubes (Supplemental Table 3). These three tubes were mainly tested with EuroClonality Invivoscribe reagents (30/55 participants), followed by EuroClonality laboratory-developed tests (LDT) (23/55), non-EuroClonality LDT (1/55), and Invivoscribe next-generation sequencing (NGS) reagents (1/55).

For wet TR testing, the most frequently tested tubes were TRG-A and TRG-B (97.5%). For these two tubes, EuroClonality Invivoscribe reagents were mainly used (29/55 participants), followed by EuroClonality LDT (22/55) and non-EuroClonality LDT (2/55). One participant used Invivoscribe NGS reagents, and one other laboratory did not test these targets. Reagents for the other tubes are shown in Supplemental Table 3.

Discussion

The BIOMED-2/EuroClonality assays are widely used for clonality testing of suspected lymphoproliferations. Clonality testing is not a stand-alone test but an integral part of the diagnosis of lymphoid malignancies. Correct analysis, evaluation, and result reporting are indispensable and contribute to a correct diagnosis. In particular, the appropriate interpretation of clonality assays requires study, learning, and on-the-job training. This can be facilitated by participation in EuroClonality educational workshops, or by submitting difficult cases via the EuroClonality website for online support. In this paper, we show that participation in the EuroClonality EQA schemes significantly contributes to improving diagnostic interpretation.

The overall performance scores for both IG and TR analysis were high, with more than 90% of participants being successful each year. Individual participants had significantly higher scores when participating in more EQA rounds, although there was no obvious overall improvement between 2014 and 2018. No difference in performance was observed based on sample origin or final clinical diagnosis. The methodology used could not be linked to the improvement for a specific laboratory or sample type.

The final molecular interpretations such as “clonal” or “polyclonal” were generally scored well. Truly challenging cases were also included, such as clonal cases with a polyclonal background, cases with bi-allelic rearrangements, bi-clonal cases, and cases with multiple clonal (IGK or TRB) rearrangements that still belong to one clone. Thus far, our schemes have evaluated the more detailed molecular interpretations, such as bi-allelic or bi-clonal, but no points were deducted when the more detailed interpretation was incorrect. In the next EQA schemes, we intend to include the more detailed interpretations in the evaluation. Based on the previous EQA schemes, we then expect lower performance for clonal samples. The scoring of oligoclonality was clearly difficult. Oligoclonality is defined as the reproducible detection of three or more clones. As for the interpretation of clonality (including bi-allelic or bi-clonal cases), this requires the appropriate interpretation of the individual tube results as well as an understanding of the IG and TR loci and the PCR design. Due to the non-quantitative nature of the PCR and the potentially preferential amplification of some rearrangements, distinguishing true clonal rearrangements from minor peaks in an irregular polyclonal background is especially difficult for oligoclonal cases. Because true oligoclonal cases are scarce, experience with these cases is limited. Only three oligoclonal cases could be included in the paper-based EQA: one IG and one TR case in 2015, and one TR case in 2016. Of the total 164 molecular interpretations of these cases (Table 3), EuroClonality laboratories correctly assigned the cases as oligoclonal in 24/73 (32.9%) tests, compared to 10/91 (11.0%) for non-EuroClonality laboratories. Of the 56 laboratories interpreting the first oligoclonal TR case in 2015, 39 participated again in 2016. Twenty-eight of these 39 participants incorrectly interpreted the oligoclonal case in 2015; of these, 21 made the same mistake in 2016 and 7 improved. The other 11 participants were correct in 2015: 7 were also correct in 2016, whereas 4 were incorrect in 2016.
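The paired outcomes of the 39 laboratories that interpreted both oligoclonal TR cases can be tabulated as a 2×2 transition table. The per-laboratory pairs below are rebuilt from the aggregate counts in the text (no individual laboratory data are used), and the labels are illustrative. A paired test such as McNemar's would compare the discordant cells (7 improved versus 4 worsened); no such test is reported in this paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Illustrative labels for (correct in 2015, correct in 2016) outcome pairs
LABELS = {
    (False, False): "incorrect both years",
    (False, True): "improved in 2016",
    (True, True): "correct both years",
    (True, False): "correct in 2015 only",
}


def transition_table(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tally paired outcomes of laboratories that interpreted both cases."""
    return Counter(LABELS[pair] for pair in pairs)


# Pairs rebuilt from the aggregate counts reported for the 39 repeat participants
pairs = ([(False, False)] * 21 + [(False, True)] * 7
         + [(True, True)] * 7 + [(True, False)] * 4)
table = transition_table(pairs)

incorrect_2015 = table["incorrect both years"] + table["improved in 2016"]
correct_2016 = table["improved in 2016"] + table["correct both years"]
print(table)
print(f"incorrect in 2015: {incorrect_2015}/39, correct in 2016: {correct_2016}/39")
```

The marginal totals show the net effect among repeat participants: 11 of 39 were correct in 2015 and 14 of 39 in 2016.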

A better performance (although not significant) was observed for paper-based tests, in which only the interpretation of results was evaluated, than for wet cases, in which the technological approach, the performance of the test, and the interpretation of the results were evaluated. During a laboratory’s first or second EQA participation, paper cases were already interpreted adequately, whereas wet cases were more error-prone (Fig. 2, panel B). This is not surprising, given that wet sample analysis includes extra (pre-)analytical processes that potentially impact the results, compared to solely interpreting GS results according to guidelines. However, paper-based cases may also be perceived as difficult by the laboratories, as cases with complex rearrangement patterns were included.

Both EuroClonality and non-EuroClonality laboratories improved their performance upon repeated participation, which is in line with results for biomarker analysis in colorectal cancer [28]. The overall scores were significantly better for EuroClonality laboratories (Fig. 2, panel C). The EuroClonality-affiliated participants may have benefited from the annual meeting and the feedback provided in the group discussions. Feedback has been shown to be an important parameter in learning [32]. However, the better performance should be interpreted with caution, as the majority of the EuroClonality-affiliated laboratories also participated more frequently in the EQA schemes, and repeated participation significantly improved performance. In addition, the difference between EuroClonality and non-EuroClonality laboratories is smaller within individual scheme years. We expected a larger score difference within a single scheme, given the longstanding experience of the EuroClonality participants, who were involved in the design and testing phase of the BIOMED-2/EuroClonality assays and in the preparation of the EuroClonality guidelines. The question remains how non-EuroClonality laboratories educated themselves. Most likely, on-the-job training within an expert environment and/or attending dedicated training courses resulted in the translation of theoretical knowledge into diagnostic practice and in competence building. In addition, the feedback given in the extended EQA report may also have contributed to learning and good performance [33, 34]. In the end, it remains the responsibility of the participants to implement the necessary corrective actions to improve performance.

This EQA scheme, with five cases per target (IG or TR) in each round, allowed us to evaluate successful performance over two EQA rounds, as it was previously estimated that at least 10 samples are needed to allow a reliable performance estimate [25, 26]. Accredited laboratories have to demonstrate their performance, but not all laboratories participated in all EQA rounds, as the frequency of participation is not specified by ISO 15189 [15] or equivalent national accreditation standards. Recent recommendations from the Belgian Molecular Diagnostics working group now state that laboratories should perform a risk analysis to determine their ideal participation frequency [35].

The various international clonality EQA providers each (i) evaluate a different number of IG/TR targets, (ii) distribute different numbers of samples per annum, and (iii) apply different criteria for successful participation (Supplemental Table 1). Although not all providers include paper-based cases, the majority assess the laboratory’s interpretation of the rearrangement patterns according to the guidelines. Indeed, EQA providers should assess the complete analysis process, from the pre-analytical to the post-analytical phase [25]. As the participants received pre-extracted DNA samples, the DNA extraction and preparation steps were not evaluated in this EQA scheme, although they could impose additional difficulties in a routine setting. The cut-offs of 80% for successful performance and 90% over two participations in these EQAs were based on the requirements for EQA programs, which recommend a cut-off of 90% assessed on a total of 10 samples [25]. Similar to the harmonization efforts in molecular oncology, increased harmonization between providers is advisable for clonality analysis in Europe, to define a uniform scope, scoring system, criteria for successful participation, and actions following unsatisfactory performance [36].

In summary, we observed high performance for IG and TR analysis, which increased with participation in more EQA rounds. Performance was higher for paper-based cases than for wet cases, and for EuroClonality than for non-EuroClonality laboratories. There was no difference related to the EQA scheme year, sample origin, or clinical diagnosis. The observed difficulties in interpreting oligoclonal cases highlight the need for continued education via meetings and EQA schemes.