Introduction

The human mammary gland is composed of discrete ductal–alveolar systems that originate at the nipple and branch through the surrounding stroma toward the chest wall. The majority of breast carcinomas arise from the epithelial cells that line the terminal ducts. As a renewal tissue, the breast epithelium exfoliates cells and secretes fluid into the luminal compartment of the gland, and this fluid can be collected noninvasively by nipple aspiration. In nipple aspiration, a simple handheld suction cup is placed on the nipple and used to quickly obtain concentrated fluid droplets at the nipple openings. The procedure is successful in most nonpregnant, nonlactating women [1], with a typical yield of several to several hundred microliters of fluid containing several to several hundred micrograms of protein [2–5].

Compared to serum, breast fluids potentially offer a superior source of biomarkers for breast cancer: the proteins are sampled directly from the tumor microenvironment and are highly concentrated, yet the protein composition is much less complex. The potential drawback is the limited fluid volume. Pooled samples are often required to afford comprehensive proteomics analysis [6, 7]. Using liquid chromatography and tandem mass spectrometry (LC-MS/MS), Varnum et al. [8] reported the identification of 64 proteins in nipple aspiration fluid (NAF) pooled from healthy volunteers, the largest number of proteins identified so far in NAF.

The unique feature of the breasts as paired organs allows a direct comparison between the affected and unaffected breasts of the same cancer patient. This could substantially minimize the biological variability between individuals and within the same woman at different times of her physiological cycle. For this reason, sample pooling should be avoided, provided that the proteomics platform is sufficiently sensitive for protein samples at the microgram level. To avoid pooling of samples, we and several other groups have explored the utility of surface-enhanced laser desorption/ionization (SELDI), a protein chip array-based mass spectrometry method. Compared to the classical LC-MS/MS method, this platform has the advantage of a minimal sample requirement (1–4 µg) but the disadvantage that it cannot provide immediate protein identifications. As protein identification requires a separate, time-consuming effort, the number of candidate proteins identified through this approach is small. In fact, among the several studies that described differentially expressed protein peaks [9–11], we are the only group reporting protein identification (human neutrophil peptide 1–3 [4], fragment of alpha-antitrypsin inhibitor (unpublished result)). Protein identification is important not only for understanding the underlying mechanism of a potential biomarker but also for developing antibody-based immunoassays, the test format commonly adopted for tumor biomarkers in clinical practice.

Limited by the number of proteins we could characterize using SELDI, we turned to a targeted proteomics approach that allows the direct analysis of unpooled NAF specimens for biomarker discovery: the high-content Antibody Microarray developed by Clontech. The array in its current format consists of 512 distinct monoclonal antibodies covalently coupled to a glass slide; each antibody serves as bait for its specific target antigen (the list of antibodies is available at http://www.clontech.com/images/brochures/FL943083_Antibodies_List_FL.pdf).

The initial selection of these antibodies from a large pool of available antibodies was based on their antigen specificity, sensitivity (limit of detection), linear response (the ability of the antibody to detect changes in antigen abundance), and antibody function when printed on a glass slide. At present, this is the highest-content antibody array commercially available. Designed as a screening tool for correlating proteins with a physiological or pathological process, the Antibody Microarray targets proteins that represent a broad range of biological functions, including signal transduction, cell-cycle regulation, gene transcription, apoptosis, cell growth, oncogenesis, and neuron development.

The Clontech Antibody Microarray first became available in 2003, but it was not suitable for the analysis of NAF specimens until the recent development of the nanoscale labeling kit. The regular labeling protocol requires approximately 1 mg of total protein, whereas the yield of protein in NAF specimens is often in the range of several to several hundred micrograms. The new nanoscale kit uses FluorTrap Matrix Technology to purify fluorescently labeled proteins from as few as 1,000 cells (microgram level). No other technology would allow us to obtain the relative abundance of such a wide variety of proteins using so little sample.

Taking advantage of this technological advance, we were able to compare on the same antibody array the relative abundance of 512 proteins in matched NAF from cancer patients, using 1 µg of protein from each specimen. The reciprocal labeling of paired samples allowed us to select candidate biomarkers with higher confidence; the potential biomarkers identified through this approach are presented here.

Materials and Methods

Patients

We obtained NAF samples from a clinical trial that involved four local breast centers: Breast Center at Johns Hopkins, Breast Imaging Center at Johns Hopkins, The Hoffberger Breast Center at Mercy Hospital, and Breast Imaging Center at Johns Hopkins Greenspring Station. The eligibility criteria were as follows: age 18–55 (we have found that our success in obtaining NAF decreases dramatically after age 55), no previous cancer other than nonmelanoma skin cancer, not currently pregnant, at least 12 months since the patient ceased breast-feeding, not taking hormone replacement therapy, no previous breast reduction or augmentation surgery, and capable of providing informed consent. Controls were enrolled from the two breast imaging clinics after a normal mammogram.

We performed antibody microarray analysis on the matched samples of 20 individuals, including patients with invasive cancer (stage I/II, n = 13), ductal carcinoma in situ (DCIS, n = 2), atypical ductal hyperplasia (ADH, n = 3), and normal controls (n = 2) (Table 1). Additional clinical information (when applicable) not listed in the table includes age, weight, height, education, race, family history, medical history of breast-related disease, tumor size, Elston Grade, mBR score, pT, pN, pM, node status, and status of ER, PR, and Her2.

Table 1 Selected candidate biomarkers

Nipple Aspiration Procedure

The Research Coordinator obtained NAF from patients in the above clinics using well-established methods [12–14] that we have successfully employed in previous studies. A modified breast pump was used, consisting of a finely polished clear plastic cup with a Luer-lok end attached directly to a 20-ml syringe. The cup has an adhesive pad that helps it stay on the breast prior to application of suction (FirstCyte, Inc., Boxborough, MA, USA).

Before aspiration was attempted, the nipple was cleansed with Cerumenex, a solvent used to dissolve earwax (which, like NAF, is an apocrine secretion). This helped to dissolve the keratin plugs that normally block the duct openings. A warming pad heated in a microwave was placed over the breast while the questionnaire was filled out, which helped increase the flow of NAF. The pad was then removed, and the woman was instructed to massage her breast (using hand lotion on her hands) for 5 min, starting at the base of the breast, encircling it with both hands, and massaging toward the nipple (taking care not to get lotion on the nipple). The cup was placed over the cleansed nipple, and the woman compressed her breast with both hands while the plunger of the syringe was withdrawn to 10 ml and held for 8–10 s. Droplets of fluid that appeared at any duct opening on the nipple were collected into capillary tubes. Three attempts of 8–10 s each were made to obtain fluid. NAF was immediately blown into an Eppendorf tube containing 100 μl phosphate-buffered saline (pH 7.4), and the tube was vortexed. The process was repeated on the other breast, and the samples were stored in separate labeled tubes. The volume of NAF was recorded for each patient and each breast. All samples were kept on ice after collection and were transferred to a −80°C freezer within 4 h of sampling.

Antibody Array Analysis

Nanoscale reciprocal labeling and array analysis were performed according to the manufacturer’s instructions (Clontech). Modifications to the original protocol included adjustments to the amount of NAF protein, the amount of FluorTrap Matrix used for dye removal, and the length of the incubation and washing steps, to achieve an optimal signal-to-noise ratio. Briefly, 1 µg of each of the paired NAF samples (A and B) was labeled with Cy3 (green fluorescent dye) and, separately, with Cy5 (red fluorescent dye). The labeling mixes were combined (A-Cy3 with B-Cy5, A-Cy5 with B-Cy3) and mixed with FluorTrap Matrix to remove the unbound dye, and the labeled proteins were recovered by passing the mixture through a 0.22-µm spin filter. After incubation of the labeled proteins with the antibody arrays, the unbound proteins were washed away, and the arrays were scanned using a GenePix 4000 scanner.

Data Analysis

The data were analyzed using the Nanoscale workbook provided by Clontech. The ratios A-Cy3sum/B-Cy5sum from the first slide and B-Cy3sum/A-Cy5sum from the second slide were used to normalize each slide for differences due to inherent variation in dye labeling. After normalization, the relative signal intensity between Cy3 and Cy5 on each spot represents the relative abundance of the target antigen in the paired samples. For each antibody, four data points were available: duplicate spots on each slide, and two slides for each sample pair. The quality of the data was evaluated by the degree of consistency between duplicate spots and between slides. Differential analysis between samples A and B was performed at different stringency settings. The recommended low stringency setting represents a >10% difference observed on one of the two slides, the medium stringency setting a >10% difference (same trend) observed on both slides, and the high stringency setting a >15% difference (same trend) observed on both slides.
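The normalization and stringency rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Clontech workbook: the function names and intensities are hypothetical, each antibody is assumed to have already been reduced to one intensity per channel (for example by averaging its duplicate spots), and a "difference of more than 10%" is assumed to mean a normalized ratio above 1.10 or below 1/1.10.

```python
def normalized_ratios(cy3, cy5):
    """Return per-antibody Cy3/Cy5 ratios after sum normalization.

    cy3, cy5: dicts mapping antibody name -> spot intensity on one slide.
    Dividing by sum(Cy3)/sum(Cy5) removes a global dye-labeling bias.
    """
    dye_factor = sum(cy3.values()) / sum(cy5.values())
    return {ab: (cy3[ab] / cy5[ab]) / dye_factor for ab in cy3}


def trend(ratio, threshold):
    """+1 if up by more than threshold, -1 if down, else 0 (assumed rule)."""
    if ratio > 1 + threshold:
        return 1
    if ratio < 1 / (1 + threshold):
        return -1
    return 0


def stringency(a_over_b_slide1, a_over_b_slide2):
    """Highest stringency setting passed by one antibody, or None."""
    # High and medium both require the same trend on both slides.
    for label, threshold in (("high", 0.15), ("medium", 0.10)):
        t1 = trend(a_over_b_slide1, threshold)
        t2 = trend(a_over_b_slide2, threshold)
        if t1 != 0 and t1 == t2:
            return label
    # Low only requires a >10% change on one of the two slides.
    if trend(a_over_b_slide1, 0.10) or trend(a_over_b_slide2, 0.10):
        return "low"
    return None


# Hypothetical intensities. Slide 1: A-Cy3 and B-Cy5. Slide 2: B-Cy3 and A-Cy5.
slide1 = normalized_ratios({"ab1": 120, "ab2": 100, "ab3": 100, "ab4": 80},
                           {"ab1": 100, "ab2": 100, "ab3": 100, "ab4": 100})
slide2 = normalized_ratios({"ab1": 100, "ab2": 100, "ab3": 100, "ab4": 100},
                           {"ab1": 120, "ab2": 100, "ab3": 100, "ab4": 80})
for ab in slide1:
    # Slide 2 measures B/A, so invert it to express both slides as A/B.
    print(ab, stringency(slide1[ab], 1 / slide2[ab]))
```

Requiring the same trend on both dye-swapped slides is what lets the reciprocal design reject dye-specific artifacts: a spot that is bright only because of one dye will point in opposite directions on the two slides.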

Western Blot Analysis

The antibody array is a multiplexed platform, which might be more susceptible to interference than single immunoassays. To affirm the validity of our findings, we performed western blot analysis on one selected antigen, cathepsin D, to see if we could reproduce the expression pattern observed on the antibody array. Monoclonal antibody against cathepsin D was obtained from BD Biosciences (CA, USA), the same provider that supplies the antibodies for the Clontech Antibody Microarrays.

NAF protein (3 µg) was separated on a precast 4–12% SDS Bis–Tris gel along with a protein size standard with molecular weights ranging from 3 to 185 kD (Invitrogen, CA, USA). The protein was transferred onto a PVDF membrane using an XCell blotting module (Invitrogen, CA, USA) for 1 h. Membranes were blocked overnight at 4°C in PBS-T buffer (1× PBS with 0.05% Tween-20) containing 5% nonfat milk. After blocking, the membranes were incubated at room temperature for 1 h with the primary antibody obtained from BD Biosciences. The membranes were then washed with PBS-T three times for 5 min each and incubated with secondary antibody conjugated with horseradish peroxidase (rabbit anti-mouse IgG, Pierce, WI, USA) for 1 h at room temperature. The membranes were washed again in PBS-T, and the bound antibody was visualized with a SuperSignal West Femto Maximum kit (Pierce, WI, USA). The blot image was recorded using a ChemiDoc XRS (Bio-Rad, CA, USA).

Results and Discussion

Determination of Experimental Variability and Biological Variability Between Paired Breasts of Assumed Healthy Women

Prior to the analysis of cancer specimens, we performed self-to-self and left-to-right comparisons using control samples. These two experiments were designed to establish the experimental variability (self-to-self) and the biological variability between normal paired breasts (left-to-right).

The same sample (self-to-self) was labeled with green and red dye to determine experimental variability. Eleven proteins were listed as differentially expressed at low stringency (>10% change on one slide), one at medium stringency (>10% change on reciprocal slides, same trend), and zero at high stringency (>15% change on reciprocal slides, same trend). The experimental noise can be filtered using the 15% stringency filter.

The normal biological variability in the matched fluids of the same healthy individual was determined by a left-to-right comparison. Two normal subjects were studied. Using the 15% stringency filter, three proteins were listed as differentially expressed in one pair of samples and two proteins in the other pair.

Based on these two experiments, we set the selection criterion for “difference” as >15% change on reciprocal slides, same trend. At this cutoff, none of the 512 proteins was found to be differentially expressed in the self-to-self experiment, and only a minimal number of proteins (three and two, respectively) were flagged as differentially expressed in the two left-to-right experiments. Although a more stringent setting could potentially reduce the normal biological variation between paired breasts to zero, we chose this moderate setting so as not to eliminate too many candidate biomarkers at the initial screening step.

Differential Analysis Between Affected-To-Unaffected Breasts of the Same Cancer Patient

We analyzed paired NAF samples from 18 patients (13 invasive cancers, two DCIS, and three ADH). An example of the array images is shown in Fig. 1. C3e8 and C3f8 are duplicate spots of antibody against protein kinase C, delta. C3g8 and C3h8 are duplicate spots of antibody against protein tyrosine kinase 2 beta (PTK2B). Two slides were generated from each pair of NAF by dye-swap (subject 30, healthy control, slides A and B; subject 50, invasive cancer in the right breast, slides C and D). No difference was observed for either antigen between the left and right breasts of the control subject; spots appear yellow. Higher expression of PTK2B was observed in the affected breast of the cancer patient; spots appear red on one slide and green on the other.

Fig. 1

Magnified array images showing differentially expressed antigens. C3e8 and C3f8 are duplicate spots of antibody against protein kinase C, delta. C3g8 and C3h8 are duplicate spots of antibody against protein tyrosine kinase 2 beta (PTK2B). Two slides were generated from each pair of NAF by dye-swap (subject 30, healthy control, slides A and B; subject 50, invasive cancer in right breast, slides C and D). No difference was observed on either antigen between left-to-right breasts of the control subject; spots appear yellow. Higher expression of PTK2B was observed in the affected breast of the cancer patient; spots appear red on one slide, green on the reciprocal slide

In contrast to the relatively constant pattern of protein expression between normal paired breasts, a much greater number of changes was observed in the tumor-associated NAF samples. Using the selected stringency filter of 15%, the number of differentially expressed antigens varied between 10 and 72 in the tumor-associated NAF samples. There is also an apparent patient-to-patient variability in the lists of differentially expressed proteins, suggesting that the tumor microenvironment varies considerably from patient to patient.

Selection of High Rank Antigens that are Prevalent Among Patients

To identify antigens that are prevalent across patients, we compiled all candidate antigens from the 18 patients and ranked them by a calculated probability score, PS = N+ − N−, where N+ is the number of patients with elevated antigen in the affected breast and N− is the number of patients with reduced antigen in the affected breast. The ranking was done automatically using an in-house computer program developed for this application.

As presented in Table 2, the best-ranked antigen is NME. This protein has an N+ of 6 (elevated in the affected breast of six patients) and an N− of 0 (no patient showed the reverse expression pattern), resulting in a probability score of 6. Further down the list, the likelihood that a candidate antigen is a useful biomarker decreases, as reverse expression patterns for the same protein are observed in different patients. There are 14 antigens with a probability score ≥ 0; among them, eight have a score of 4 or higher. These eight antigens were selected as candidate biomarkers for further evaluation.
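The ranking by probability score can be illustrated with a short Python sketch. It is not the in-house program mentioned above; the function name, the input layout (one dictionary of +1/−1 calls per patient), and the patient and antigen labels are all hypothetical and chosen only to show the arithmetic of PS = N+ − N−.

```python
from collections import Counter


def rank_by_probability_score(calls):
    """Rank antigens by PS = N+ - N-.

    calls: {patient_id: {antigen: +1 (elevated) or -1 (reduced)}}, listing
    only antigens flagged as differentially expressed in that patient.
    Returns a list of (antigen, N+, N-, PS), best score first.
    """
    n_plus, n_minus = Counter(), Counter()
    for antigens in calls.values():
        for antigen, direction in antigens.items():
            if direction > 0:
                n_plus[antigen] += 1
            elif direction < 0:
                n_minus[antigen] += 1
    rows = [(ag, n_plus[ag], n_minus[ag], n_plus[ag] - n_minus[ag])
            for ag in set(n_plus) | set(n_minus)]
    # Sort by descending PS; break ties alphabetically for a stable order.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


# Hypothetical calls for three patients and two antigens.
example_calls = {
    "patient_1": {"antigen_X": 1, "antigen_Y": 1},
    "patient_2": {"antigen_X": 1, "antigen_Y": -1},
    "patient_3": {"antigen_X": 1},
}
for antigen, up, down, score in rank_by_probability_score(example_calls):
    print(antigen, up, down, score)
```

In this toy input, antigen_X is elevated in all three patients and never reduced, so it scores 3, while antigen_Y is elevated once and reduced once, so its opposing calls cancel to 0.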

Table 2 Summary of candidate biomarkers

Differential Expression Pattern of Cathepsin D Detected by Antibody Array Analysis can be Validated by Western Blot

The validity of the protein expression patterns detected by the antibody array analysis was tested by western blot. Limited by the volume of each NAF sample and the detection sensitivity of the western blot analysis, we decided to use cathepsin D as a model protein for this study. Among all antigens tested on the antibody array, cathepsin D is the most abundant based on its fluorescence signal. We obtained monoclonal antibody against cathepsin D from BD Biosciences (CA, USA), the same antibody provider for the production of Clontech Antibody Microarrays.

Based on the antibody array analysis, cathepsin D is elevated in the affected breast of patients 48, 50, and 72 and shows the reverse pattern in patients 39, 80, and 84 (Table 2). Consistent with this pattern, we detected cathepsin D bands in the three tumor-associated samples with elevated fluorescent signals, but no appreciable band in their contralateral controls (Fig. 2). Also as expected, we could not detect any appreciable band in the paired samples of the normal control or of patient 80, whose affected breast showed a lower cathepsin D signal on the antibody arrays.

Fig. 2

Expression pattern of cathepsin D detected by antibody array can be validated by western blot. Equal amounts of NAF protein (3 µg) were separated on a precast 4–12% SDS Bis–Tris gel. Cathepsin D was detected using monoclonal antibody from BD Biosciences. Samples from four patients and one control were studied. Single asterisk indicates tumor-associated NAF, triple plus signs indicate elevated expression detected by antibody array, broken line indicates decreased expression detected by antibody array

Four of the Selected NAF Markers are Supported by Published Immunohistochemistry Studies on Breast Cancer Tissues

We performed an extensive literature search to obtain more information on these candidate proteins. The databases searched included the Human Protein Reference Database (www.hprd.org), the Human Protein Atlas (www.proteinatlas.org), and PubMed.

The subcellular localizations of the candidate proteins are diverse. NM23A and HSP60 can be found extracellularly, GFRA1 is primarily membrane bound, and PTK2B, ARRB1, MRIP, and SLP76 are primarily cytoplasmic. As proteins in NAF can be either secreted or released by cell lysis, the observation of intracellular, membrane-bound, and extracellular proteins in NAF is not surprising and has been previously reported by others [8].

Several of the candidate proteins have been studied by immunohistochemistry in breast cancer tissues. NM23A, ARRB1, GFRA1, and HSP60 [15–17] are elevated in breast cancer tissue, supporting the observed elevated protein level in cancer-associated NAF (Table 2). Additionally, HSP60 has been identified as a tumor-associated antigen: autoantibodies directed against HSP60 were present in 18/58 (31%) early stage breast cancer and 16/49 (32.6%) DCIS patients, compared to 4/93 (4.3%) healthy subjects [17]. The protein candidates without existing tissue data will be validated in future studies.

Concluding Remarks

Nipple aspiration fluid is a rich source of breast cancer biomarkers. The unique anatomical feature of the breast as a paired organ also makes it a well-matched case and control system that is not confounded by interindividual variability. However, because of the limited fluid production from each breast, comprehensive proteomics analysis is difficult to achieve using a conventional top-down approach. To overcome the limitation on sample quantity, we explored the utility of a targeted approach that takes advantage of advances in nanoscale two-color labeling and high-content antibody microarrays. This approach proved effective; we were able to perform differential analysis on 512 proteins in matched NAF samples, the largest number of proteins ever surveyed in NAF.

Based on results obtained from 18 patients with early stage breast cancer and precancerous lesions, we made the following observations: (1) There are appreciable protein changes in NAF at the very early stage of tumor development. (2) The differentially expressed proteins differ from patient to patient, suggesting a highly variable tumor microenvironment. (3) We did not find a single biomarker that could recognize all 18 patients. The best marker identified in this pilot study has a sensitivity of 33% (6/18). Given that breast cancer is highly heterogeneous, this result is expected, and we anticipate finding a combination of several of these biomarkers that jointly offers a much more desirable sensitivity.
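The idea of combining markers can be made concrete with a small Python sketch. It assumes, purely for illustration, that a panel counts a patient as detected when at least one of its markers is elevated in the affected breast; the function name, marker labels, and patient sets are hypothetical and are not data from this study. Only the single-marker figure of 6/18 mirrors the sensitivity quoted above.

```python
def panel_sensitivity(positives_by_marker, panel, n_patients):
    """Fraction of patients flagged by at least one marker in the panel.

    positives_by_marker: {marker: set of patient ids with elevated marker}
    panel: iterable of marker names to combine (assumed "any positive" rule)
    n_patients: total number of patients in the cohort
    """
    detected = set()
    for marker in panel:
        detected |= positives_by_marker[marker]
    return len(detected) / n_patients


# Hypothetical cohort of 18 patients, numbered 1 to 18.
positives = {
    "marker_1": {1, 2, 3, 4, 5, 6},   # alone: 6/18
    "marker_2": {5, 6, 7, 8},         # overlaps marker_1 in two patients
}
print(panel_sensitivity(positives, ["marker_1"], 18))
print(panel_sensitivity(positives, ["marker_1", "marker_2"], 18))
```

Because the two hypothetical markers overlap in two patients, the panel detects 8 of 18 rather than 10 of 18; the gain from adding a marker depends on how many new patients it captures. This sketch also ignores specificity, which a real panel would need to evaluate in control samples.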

We provided initial validation of our findings. The validity of the antibody array findings was affirmed by a single immunoassay (western blot), and the elevated expression of four of the selected markers in NAF is supported by published immunohistochemistry studies on breast cancer tissues. With this, we are confident in moving on to the next stage of tissue validation studies, as well as testing the markers in a larger cohort of NAF samples. As the scope of this biomarker screening is limited to the content of the antibody arrays, we also anticipate that more protein abnormalities can be discovered when new generations of the arrays become available.