Background

PGP9.5 immunostaining of intra-epidermal nerve fibers in 50 µm sections is widely accepted as the gold standard for the diagnosis of small fiber neuropathy. The assay is used to visualize the number and morphology of the somatic, small caliber, intra-epidermal nerve fibers, and supported by the European Federation of Neurological Societies (EFNS) [1]. The use of skin biopsies as a diagnostic tool in peripheral neuropathies has increased in the last decades, and is regarded as a reliable and standardized tool [17]. The Polyneuropathy Task Force [3] concluded that the gold standard was diagnostically efficient at distinguishing polyneuropathy patients [including small fiber neuropathy (SFN)] from normal subjects as controls. Skin biopsy immunostaining can help to detect aberrations in somatic nerves in neuropathies that were formerly classified as autonomic, such as Ross syndrome [8, 9]. It was [4] concluded that the intra-epidermal nerve fiber density (IENF) assessment has proven its value as a measure for treatment success and for follow-up in clinical trials [1013]. The EFNS Task Force developed guidelines for the diagnostic use of skin biopsies in peripheral neuropathies, published in 2005 [1]. The recommendations include the use of 3-mm punch skin biopsy from the distal leg, fixed in Zamboni solution and quantified for linear nerve fiber density in at least three consecutive 50 µm sections, after PGP9.5 immunohistochemical or immunofluorescence staining. In the counting rules it was emphasized to only include IENF crossing the dermal-epidermal junction, while excluding secondary/tertiary branching from the quantification [1]. From the perspective of use of skin biopsies in clinical trials, the EFNS-advised method is time-consuming and challenging for large scale batch testing in a standardized manner. One way of approaching this challenge is to fully automate the staining procedure.

For each laboratory developed test (LDT), which aims to identify disease and is intended for accreditation and use in clinical trial settings, assay performance characteristics must be established as required by CAP/CLIA [14] and local accreditation boards [15]. Good laboratory practice requires that accuracy, precision, analytical sensitivity, analytical specificity, reportable range and reference intervals are established [16, 17]. Biopsies from subjects clearly showing intra-epidermal nerve fiber reduction, as confirmed by conventional diagnostic tools, were selected from two studies conducted earlier by Ragé and colleagues [18, 19]. In the first study, the experimental model of reversible capsaicin-induced small SFN [10, 11, 20] was examined in healthy subjects using the laser evoked potentials (LEPs) in comparison to the linear nerve fiber density in skin. The second study aimed at investigating the diagnostic performance of LEPs in the assessment of small nerve fiber loss in asymptomatic diabetic neuropathy. From this study, diabetic subjects presenting with distal polyneuropathy were included. In addition, skin biopsies of healthy subjects were studied to indicate achievable reference ranges, and reveal possible discrepancies and quantitative caveats between the LDT and the gold standard method.

In this study we developed an automated PGP9.5 LDT, to complement the gold standard, and validated its potential for use in large scale clinical studies.

Methods

Human samples

Fourteen skin punch biopsies are obtained from healthy volunteers (n = 14, age group 33–52 years). Five biopsy specimens, from diabetic subjects presenting poly-neuropathy (n = 5), obtained from the study described by Ragé and coworkers [19] were included in the study. In addition, two biopsy specimens obtained from healthy volunteers who received topical capsaicin application on three consecutive 24-h cycles were examined as well [18]. Studies were approved by the Local Ethics Committee; informed consent was obtained for all participants.

Skin punch biopsy processing

Skin punch biopsies (diameter 2–4 mm) were performed under local anesthesia at the lateral aspect of the distal leg in a clinical unit. Skin biopsies were fixed in cold Zamboni fixative solution (60 min at room temperature), cryopreserved in sucrose 30 % and frozen in OCT compound (Sakura TissueTek Europe, The Netherlands) as recommended by the EFNS guidelines [1, 21]. After freezing, tissue-blocks were stored at −80 °C prior to sectioning.

Accuracy confirmation by western blot analysis and double immunofluorescence histochemistry

Western Blot analysis was performed by SDS-page on cell lysates from A549 (lung carcinoma) and U87 (glioblastoma) cell lines exhibiting high PGP9.5 expression, to verify the accuracy of the rabbit polyclonal anti-human PGP9.5 antibody (RA95101, UltraClone Ltd., UK). Subsequently, western transfer of proteins was performed onto a transfer membrane (Immobilon FL, Li-Cor, Germany). After incubation of the rabbit polyclonal anti-human PGP9.5 antibody (1/300), visualization was performed using IRDye® 800CW Conjugated Goat (polyclonal) anti-rabbit IgG antibody on the Odyssey® Infrared Imaging System (Li-Cor).

For double immunofluorescence histochemistry, 16 µm-thick sections were incubated overnight at room temperature with a mix of primary anti-PGP9.5 (1/2000) and anti-βIII-Tubulin (clone 5G8, Promega Corp., USA; 1/250) antibodies to be subsequently visualized by Cy3-conjugated goat anti-rabbit and Alexa-488-conjugated goat anti-mouse antibodies (Jackson Immunoresearch Laboratories, Inc., USA; 1/500 and 1/100 respectively). Counterstain was performed using Hoechst 33342 (Invitrogen Molecular Probes, USA; 1/2000).

Gold standard PGP9.5 immunofluorescence according to EFNS (GS-EFNS)

Three consecutive 50 µm cryosections were produced and collected in PBS-azide. To enhance penetration of immunoreagents into 50 µm-thick sections, tissue was permeabilized using TritonX100 (Sigma-Aldrich, Belgium; 0.1 %). Manual staining was initiated on free-floating tissue sections. Incubation was performed using the rabbit polyclonal anti-human PGP9.5 primary antibody (UltraClone Ltd.; RA95101; 1/2000) overnight at room temperature as prescribed [1, 22]. Subsequently, the visualization was established using a Cy3-labeled secondary goat anti-rabbit antibody (Jackson Immunoresearch Laboratories; 1/500; 2 h) and counterstained with Hoechst 33342 as described above.

Automatization of PGP9.5 Immunofluorescence staining on Ventana discovery XT®

Six consecutive 16 µm cryosections were produced and collected on dry ice. The staining protocol was custom-programmed using RESEARCH IHC QD Map XT software of the Discovery XT® (Ventana, USA). After loading the slides into the instrument, incubation was performed using the rabbit polyclonal anti-human PGP9.5 primary antibody for 2 h. Subsequently, the visualization was established using a Cy3-labeled secondary goat anti-rabbit antibody and counterstained with Hoechst 33342 as described above.

Quality control

For all specimens, internal controls, i.e. autonomic nerve fibers innervating sweat glands and m. arrector pili, were evaluated for positive staining and subsequent acceptance of each individual section. As a negative control, rabbit immunoglobulins (confirm negative control rabbit, Ventana) were used to replace the primary antibody.

Imaging and quantification

The quantification of the linear density of nerve fibers was performed using a conventional fluorescence microscope (20×–40×; Zeiss, Germany) by two readers blinded for treatment. Virtual images were generated using the Axiovision Mozaik Imaging Software (Axiovision Rel 4.8®) on an Axioplan 2 imaging microscope equipped with a motorized stage and z-stack features (Zeiss, Germany) and used to perform length measurements. Individual counts were divided by the length of the epidermis and expressed as mean numbers/mm (± SD).

Gold standard counting rule

The EFNS guidelines [1, 3, 23] prescribe to count each nerve fiber crossing the dermal-epidermal junction as a single unit. Nerve fibers that are approaching the junction without crossing it or nerve fiber fragments lying free in the epidermis are not enumerated, nor are branches.

LDT modifications to the gold standard counting rule

To critically evaluate the LDT, and the effect of reducing the section thickness, the following additions to the gold standard counting rule were implemented on both the LDT and gold standard stained slides. The epidermal nerve fibers, traditionally named IENF, include all nerve fibers crossing the dermal—epidermal junction (Fig. 1) as per the gold standard counting rule. In order to accurately define existing discrepancies in the LDTs ability of visualizing the 3D-IENF network, incidence of secondary branching of IENF was taken into account. Therefore, intra-epidermal nerve fiber crossing dermal-epidermal junction as single fiber (IENF_Si) (Fig. 1: 1A) and intra-epidermal nerve fiber crossing dermal-epidermal junction as fiber showing branching (IENF_Br) (Fig. 1: 1B) as being epidermal nerve fibers crossing the dermal-epidermal junction as single or branching fibers are reported. Since the LDT uses thinner tissue sections, and a higher incidence of IENF fragments is to be expected, intra-epidermal nerve fiber fragments that do not cross the dermal-epidermal junction free intra-epidermal nerve fiber fragments (IENF_F, Fig. 1: 2A, 2B) were included. These fragments were not included in the EFNS guidelines but have been proven to be valuable [24]. Fragments are considered as such based on morphological characteristics and an approximate minimal length of 5 µm. For IENF_F, the incidence of secondary branching of IENF was taken into account as well, resulting in IENF_FSi (Fig. 1: 2A) and IENF_FBr (Fig. 1: 2B). The actual number of branches on each nerve fiber was recorded separately. The total number of epidermal nerve fibers (total ENF, Fig. 1) represented the sum of IENF and IENF_F.

Fig. 1
figure 1

Counting rules according to the GS-EFNS (1) and LDT (2). Modifications to the gold standard counting rule for intra epidermal nerve fibers (IENF) are depicted to include IENF fragments (_F) detected as single fibers (_Si) or branching (_Br). Red PGP9.5, blue Hoechst, dashed line dermal-epidermal junction. 1 IENF; 1A IENF_Si; 1B IENF_Br; 2 IENF_F; 2A IENF_FSi; 2B IENF_FBr. (Scale bar 20 µm)

LDT method validation

Besides the confirmation that the anti-human PGP9.5 antibody accurately detects its target, we established reference intervals and defined LDTs discrepancies in reference to the GS-EFNS (LDT modifications to the GS-EFNS counting rule) on biopsies obtained from healthy subjects. Diagnostic yield and diagnostic performance (analytical sensitivity and specificity) were explored by plotting the plausibility of false positives (specificity) and true positives (sensitivity). The closer the ROC curve approaches the true positive axis, the better the performance of the LDT (see “Statistics” section). The examination of the inter-slide stability of the GS-EFNS and LDT was included since this could highlight the need for a minimum number of serial slides to be examined. For each subject and each staining method three serial slide measurements were performed by one observer (M1, M2, and M3). In order to estimate the reliability of results obtained by independent observers, individual counts for the different parameters were compared after automated staining of 12 randomly selected samples.

Statistics

Method comparison of the LDT and GS-EFNS was performed using Bland–Altman analysis [25]. To prove a good agreement between the two techniques the values should be lumped near the 0-difference line. Statistical analysis was performed using a rank sum test for paired samples (Wilcoxon) for which p values below 0.05 were considered significant. Statistical analysis for assessing the diagnostic yield of the LDT compared to conventional diagnostic tools was performed as described before [19]. To explore if data obtained using the EFNS advised method can serve as the gold standard to define the diagnostic performance of the LDT on the selected biopsies, a one-way analysis of variance was performed (ANOVA). This allowed confirming that mean values were significantly different between control and SFN groups. The diagnostic performance of the LDT was estimated by the area under the receiver operating characteristic curve (ROC) with 95 % confidence interval for sensitivity and specificity using De Long’s test. Inter-observer agreement was evaluated by determining intra-class correlation (ICC) for all parameters using the same raters for all measurements and consistency as type. All analyses were performed using MedCalc® v12.3.0.0 statistical software. Finally, a power calculation was carried out to determine the statistical power of this study and the optimal sample size for a future study using software package R, version 3.1.2 [26]. We performed this analysis using data for the total linear density of epidermal nerve fibers.

Results

Accuracy of the anti-human PGP9.5 antibody

The accuracy of the rabbit polyclonal anti-human PGP9.5 antibody was confirmed by western blot analysis on cell lysates from U87 and A549 cell lines. For both cell lines the antibody showed a band at a molecular weight of approximately 25 kDa (Fig. 2a), confirming the recognition of PGP9.5 (27 kDa). In addition, βIII-tubulin, a nerve and langerhans cell-specific marker, co-localized in all PGP9.5 immunoreactive structures of the epidermis (Fig. 2b), confirming the accuracy of the antibody.

Fig. 2
figure 2

Evaluation of antibody accuracy. a Western Blot analysis of A549 and U87 cell lines using the rabbit polyclonal anti-human PGP9.5 antibody (green). b Immunofluorescent staining of PGP9.5 (red) and ß-III-Tubulin (green) in the epidermis of a human skin biopsy, blue Hoechst. Asterisk Nerve fiber; Arrow head Langerhans-cells. (Scale bar 20 µm)

Assessment of IENF and nerve fiber branching density in skin biopsies of healthy volunteers**

A significant difference existed in the ability of the LDT assessing IENF compared to the gold standard (p < 0.001, mean difference 5.8 IENF/mm), especially with regard to IENF_Br (p < 0.001, mean difference 5.7 IENF_Br/mm) (Table 1). For IENF_Si equal results were obtained for the GS-EFNS and LDT (p = 0.95, mean difference 0.1 IENF_Si/mm). Enumeration of nerve fiber fragments lying free in the epidermis showed a significantly higher number of IENF_F for the LDT compared to GS-EFNS (p = 0.01, mean difference −2.8 IENF_F/mm). The IENF_F were mainly present as single nerve fiber fragments (IENF_FSi; p = 0.002, mean difference −2.5 IENF_FSi/mm). Overall, for the total population of epidermal nerve fibers (IENF and IENF_F) a good agreement was reached between the GS-EFNS and LDT (p = 0.24, mean difference 3.0 Total ENF/mm) (Fig. 3; Table 1). Based on the mean difference and SD of both measurements, we determined that the difference between the groups equals 0.68 SD units. A power calculation showed that, to detect a difference in means of 0.68 SD units, with a power of 80 %, at alpha level of 0.05, sample size should be at least 35 in both groups.

Table 1 Comparing GS-EFNS and LDT for assessing nerve fiber density in skin biopsies of healthy volunteers
Fig. 3
figure 3

PGP9.5 Immunofluorescence in skin biopsy of a healthy volunteer. PGP9.5 immunofluorescence using the gold standard (GS-EFNS: a1) method and laboratory developed test (LDT: b1) was performed as described. a2 and b2 show the single channel image of PGP9.5 immunofluorescence (red). Red PGP9.5, Blue Hoechst. (Scale bar 50 µm)

A significantly lower number of IENF branches was observed using the LDT (p < 0.001, mean difference 18.5 Branches/mm). Minor differences were seen for IENF fragment branches (p = 0.04, mean difference 0.2 Branches/mm) (Fig. 3; Table 1).

Diagnostic performance of the LDT for SFN

The LDT was able to distinguish between healthy and SFN groups (p < 0.001) when assessing IENF linear density (Table 2). Importantly, the LDT reached good concordance with the gold standard as determined by Bland–Altman after assessing IENF (p > 0.42, mean difference 0.23 IENF/mm); IENF_F (p > 0.25, mean difference −0.29 IENF/mm) and thus total ENF (p > 0.42, mean difference −0.06 ENF/mm) in skin biopsies of SFN subjects. This clearly indicates that the LDT and GS-EFNS are equivalent in detecting SFN (Fig. 4). A lower number of nerve fiber fragments lying free in the epidermis (IENF_F) was observed when the gold standard was applied (Table 2).

Table 2 Comparing GS-EFNS and LDT for assessing nerve fiber density in skin biopsies of SFN patients
Fig. 4
figure 4

PGP9.5 Immunofluorescence in skin biopsy of a SFN patient. PGP9.5 immunofluorescence using the gold standard (GS-EFNS: a) and laboratory developed test (LDT: b) was performed as described. Red PGP9.5, Blue Hoechst. (Scale bar 50 µm)

When secondary branching of nerve fibers was taken into account, a good agreement was found for all parameters evaluated, showing mean differences less than 0.5/mm. Nevertheless, lower scoring for IENF without secondary branching (mean difference of −0.16 IEFN_Si/mm) (p = 0.42) at one hand and a higher assessment of IENF showing secondary branching (mean difference of 0.39 IENF_Br/mm) (p = 0.09) was seen using the gold standard, as was the case in the biopsies from the healthy volunteers (Table 2).

Diagnostic yield of the LDT for SFN

ANOVA analysis confirmed that GS-EFNS can serve as gold standard for the diagnosis of SFN when performed in our lab. Data were grouped using the knowledge of the disease states of the different subjects. A significant difference was observed for IENF between healthy and SFN groups (p < 0.001, F-ratio 87.5) (Fig. 5a). Therefore, GS-EFNS results could be used to determine the expected diagnosis to be obtained using the LDT (ROC, Fig. 5b).

Fig. 5
figure 5

Diagnostic performance of the LDT for SFN. a Multiple comparisons graphic, retrieved after one-way analysis of variance (ANOVA) of IENF data obtained using gold standard staining (GS-EFNS) of skin biopsies from healthy volunteers and SFN patients (*** p < 0.001). b Diagnostic performance of LDT for variable Total ENF/mm: area under the ROC curve of 0.72 and p = 0.031 with a sensitivity of 80 % and specificity of 64 %. The lower average Total ENF linear density value in the tested control group, 11.02 Total ENF/mm, was used as cut-off

As IENF density, determined using the LDT, is significantly lower compared to the GS-EFNS, the total population of epidermal nerve fibers counted (Total ENF/mm) was used as cut-off. The average linear density in the tested control group was 15.45 ± 4.43 Total ENF/mm (Table 1), therefore the lower cut-off used was 11.02 Total ENF/mm. Using this cut-off to classify data obtained with the LDT, a sensitivity of 80 % and specificity of 64 % was reached with an area under the ROC curve of 0.72 and p = 0.031.

Inter-slide stability and robustness

For all parameters and both staining methods, mean differences between repeated measures (M1, 2, 3) observed were very small, not exceeding 1 nerve fiber/mm as determined by Bland–Altman analysis (Table 3). Both techniques showed a success rate exceeding 95 % with no failure and thus no exclusion of samples when the LDT is applied.

Table 3 Assessment of the inter-slide stability for scoring PGP9.5 using Bland–Altman analysis for the comparison of GS-EFNS and LDT

Inter-observer agreement

Overall, the inter-reader agreement can be considered excellent with ICC values ranging from 0.93 to 0.99 (Table 4) for all linear density values enumerated on LDT stained samples.

Table 4 Inter-observer agreement when determining nerve fiber density in skin biopsies of SFN patients using LDT

Discussion

As proven by multiple investigators, skin biopsies are excellent tools to investigate the nerve fiber endings in the epidermis [17]. Since the early nineties, the neuronal biomarker PGP9.5 has been regarded as the most accurate for the visualization of epidermal nerves [22, 27]. As the use of this technique grew for the diagnosis of SFN, the need for guidelines and standardization grew accordingly. In 2005, the members of the EFNS indicated the advised procedure for the assessment of epidermal nerve fiber density [1]. The gold standard was described in detail [23], and implemented successfully in specialized laboratories over the world. Other investigators used the IENF density as a reference in experimental disease progression studies [10, 11], and early detection of SFN in diabetics [12, 13], and compared its accuracy to that of conventional diagnostic tools. Nevertheless, this method is a labor intense, manual staining procedure, requiring high methodological skills and training, and is therefore prone to human error when applied in conventional diagnostic laboratories. The use of uniform outcome measures in peripheral neuropathies has been commended recently, to enable comparison between studies [28].

In this study, we explored whether a gold standard-related LDT could be developed, with decreased labor and increase of standardization as primary goals, by automating the staining method. Automated slide staining use implied a significant decrease in section thickness of the skin biopsies. The choice of using 16 µm-thick sections was driven by the work of Torres and colleagues [29] and Hedreen [30], both describing that thicker sections mounted on glass slides can present suboptimal penetration of immuno-reagents into the tissue, leading to a lost-cap phenomenon. The use of thinner sections for subsequent IENF assessment has been proven valuable by different groups, not only for SFN detection, but also the study of multiple nerve markers including ion channels and receptors in the same biopsies and the progression and/or follow up of disease [6, 22, 3136].

The design of the LDT and its subsequent validation was based on requirements of CAP, CLIA and a local accreditation board (BELAC) to be able to introduce the LDT in large scale batch testing required for clinical trial testing. We explored whether the LDT could meet the needs for both an accredited laboratory and the pharmaceutical industry, while maintaining a good concordance with the gold standard. In this study, fluorescence detection was preferred to allow future advanced imaging as described recently [37, 38]. Published data showing good concordance between fluorescent and bright-field assessment of IENF in the diagnosis of SFN justifies this choice [39]. Implementation of fluorescence in IENF assessment additionally helps with the ease of studying multiple targets in a single tissue section.

Accuracy of the PGP9.5 antibody was confirmed in our hands, being concordant with results described by groups of Brichory, Krimm, Tokumaru and Barrachina [4043].

Once good accuracy was confirmed, the automated user-defined assay performed on the Discovery XT® was challenged against the gold standard using skin biopsies of healthy volunteers. We first demonstrated that performance of the assay in our accredited laboratory according the EFNS advised gold standard method meets all criteria. A good concordance was reached with published normative values (9.8–12.4 IENF/mm for female subjects in the age group of 33–52) as described by Lauria and co-workers [23] when assessing IENF in skin biopsies of predominantly female subjects included in this study. The gold standard could distinguish between healthy and SFN subjects with high significance and served as a reference.

When exploring the IENF density as assessed by the LDT, results clearly indicated that the LDT’s ability to visualize IENF in skin of healthy subjects is considerably lower compared to the gold standard, whereas a good concordance was reached when both IENF and IENF fragments were considered (total ENF). A higher number of IENF and visualization of their branches using the gold standard is in line with the increased occurrence of nerve fiber fragments observed in the thinner sections of the LDT. Considering the threefold decrease in section thickness, one would expect to have a threefold decrease in linear densities of the evaluated parameters. As we found an approximate twofold decrease for healthy subjects and 1.25-fold decrease in SFN subjects compared to the GS, one must consider that nerve fibers might be included more than once in the counting strategy when the LDT is applied. Nevertheless, the diagnostic performance of the LDT was equal to that of the gold standard in discriminating healthy from SFN subjects with high significance when IENF was assessed.

Additionally, an excellent agreement between both staining methods was found for all parameters quantified in biopsies from subjects with (experimental) SFN. Both the number of IENF, IENF fragments and their secondary branching could be equally visualized. In addition, previously published work documented concordance of the LDT’s outcome with TRPV1 staining performed in a different laboratory. Results describing a functional recovery, prior to morphological recovery in a time course study after capsaicin was applied, were consistent with findings of other groups [18].

When the diagnostic yield of the LDT was measured, we concluded that IENF and IENF fragments are both mandatory to be included in enumeration when the LDT is applied. IENF assessment alone lacked concordance with the gold standard method. When the total number of nerve fibers (IENF and IENF fragments) was considered, the LDT reached relatively good concordance with the gold standard, with strong sensitivity and specificity. Finally, the LDT showed an excellent robustness and an excellent inter-observer concordance.

Conclusions

To conclude, in this proof of principle study we evaluated a standardized, on-slide, automated staining procedure for the assessment of the linear density of epidermal nerve fibers. This method demonstrated an equal performance to the gold standard in distinguishing SFN from healthy subjects. By automation and on-slide application of staining thinner sections, labor, variation of results induced by human manipulation and differential penetration of immunoreagents are drastically reduced. The decreased sensitivity, however, for detecting the complex three-dimensional branching network of epidermal nerve fibers is important to consider, as depletion of more distal axons is likely to be an early feature of dying back neuropathies. In this perspective, the appearance of collateral sprouting, typical for regenerating nerve fibers [44], and the LDT’s capability of discriminating that from normal epidermal nerve fibers, needs further investigation. The same applies for the risk of missing the SFN status in older patients. Effectiveness in detection of important cutaneous nerve abnormalities, such as axonal swellings, crawlers and sprouts, by the LDT need to be examined as well [45]. Formation of a larger cohort, minimally 35 per group as determined by power calculation and inclusion of a solid age and gender distribution is required to determine reference values and to determine whether the dynamic range of the LDT is acceptable. Power calculation showed that the current study design only offers 29 % power to detect a difference in means of the calculated SD units. Since epidermal nerve fiber fragments need to be included in the now more laborious counting strategy, compatibility of this technique with the semi-automated analysis recently presented [38] should be investigated as well. This technique could lead to high reproducibility of counting within and between neuropathological institutes. Introducing proficiency testing and dermal nerve quantification [46] could benefit in standardization.

Full automation of the staining procedure could be valuable and lead to accessible, stable and reliable testing in clinical trials and diagnostics for clear SFN detection. A number of limitations however exist for this LDT which require expert-evaluation in a larger cohort.