Skip to main content

Deep Learning Enables Individual Xenograft Cell Classification in Histological Images by Analysis of Contextual Features

Abstract

Patient-Derived Xenografts (PDXs) are the preclinical models which best recapitulate inter- and intra-patient complexity of human breast malignancies, and are also emerging as useful tools to study the normal breast epithelium. However, data analysis generated with such models is often confounded by the presence of host cells and can give rise to data misinterpretation. For instance, it is important to discriminate between xenografted and host cells in histological sections prior to performing immunostainings. We developed Single Cell Classifier (SCC), a data-driven deep learning-based computational tool that provides an innovative approach for automated cell species discrimination based on a multi-step process entailing nuclei segmentation and single cell classification. We show that human and murine cell contextual features, more than cell-intrinsic ones, can be exploited to discriminate between cell species in both normal and malignant tissues, yielding up to 96% classification accuracy. SCC will facilitate the interpretation of H&E- and DAPI-stained histological sections of xenografted human-in-mouse tissues and it is open to new in-house built models for further applications. SCC is released as an open-source plugin in ImageJ/Fiji available at the following link: https://github.com/Biomedical-Imaging-Group/SingleCellClassifier.

Introduction

Most of our understanding of mammary gland development and breast carcinogenesis stems from experiments with animal models. Mice are by far the most widely used experimental system due to their size, ease of use, and most importantly, the ability to establish genetically-engineered mouse models (transgenic mice are one subtype of genetically-engineered mouse models). However, approximately 90% of potential oncology drugs fail in clinical trials [1, 2], partly because of the lack of adequate preclinical models, raising concerns on how representative of the human physiology and disease data derived from mice are.

Patient-Derived Xenografts (PDXs), namely preclinical models developed by transplanting human-derived cells into immunosuppressed or humanized mice, currently best recapitulate the complexity of human tissues and are increasingly employed for translational research [3,4,5]. Classically, mammary xenografts are generated by transplantation of pieces of primary breast tissues to the mammary fat pad of recipient immunosuppressed mice [6]. However, under these settings, the PDXs growth and their HR expression are dependent on estradiol supplementations which, resulting in serum E2 equivalent to mid-menstrual cycle levels [6, 7], alter the physiological relevance of this preclinical model. Recent advances in this field were achieved with the Mouse INtraDuctal (MIND) model, which entails the injection of primary human-derived breast cells directly into the mouse mammary ductal tree via cleaved teat [8, 9]. In the intraductal microenvironment, primary HBECs and breast cancer cells grow independently of exogenous hormone supplementations while retaining their HR expression and hormone responsiveness, making the MIND model an appealing preclinical tool [8, 10,11,12,13,14]. Hence, the MIND model provides the unprecedented opportunity to study the role that individual HRs play in the luminal compartment of the human breast epithelium by, for instance, histological techniques [9]. However, molecular analyses in xenograft models are hindered by the presence of both human and murine cells, which can lead to data misinterpretation due to contamination of different cell species. Therefore, prior to performing specific immunostaining to assess levels of proteins of interest in the xenografted cells, paraformaldehyde-fixed paraffin-embedded sections are usually stained by Haematoxylin and Eosin (H&E) in order to obtain a rough inference of the abundance of human cells within the tissue of interest based on morphological features. Although human cells appear usually bigger in size and more elongated than their murine counterparts, manual evaluation is error-prone, time-consuming and subject to inter-personal variability. This warrants the need for better tools to reveal species-specific features.

Machine learning techniques have been effectively applied in a number of different fields and emerged as valuable resources to decipher the content of biological images [15, 16]. While some methods have already been developed to analyze H&E stained human histological sections, their aim was mainly set on tissue segmentation [17,18,19] and more sophisticated supervised learning-based tools have been created to perform nuclei segmentation [19]. Here, we hypothesized that deep learning could be employed in order to automate human-mouse cell discrimination in intraductal xenografts, a challenging task due to high biological and technical heterogeneity. We developed Single Cell Classifier (SCC), a data-driven machine learning-based approach capable of classifying either normal or malignant individual xenografted cells from murine cells in the same histological section according to specific features rather than images. Upon evaluation of a total of 484 cell-intrinsic or contextual features, SCC was proven to reach up to 96% of classification accuracy, with contextual information playing a major role on the classification performance. SCC is supplied as a publicly available plugin in ImageJ/Fiji [20] and can be downloaded at the following link: https://github.com/Biomedical-Imaging-Group/SingleCellClassifier.

Methods

Intraductal Injections

Single cells readily isolated from reduction mammoplasty specimens or invasive breast cancers were lentivirally infected with firefly luciferase (luc2) and GFP (Lenti-ONE CMV-GFP(2A)Luc2 WPRE VSV, ref = V621004001 or pR980 Luc2GFP, GEG Tech), allowing to track the growth of patient-derived cells in vivo, and promptly xenografted into 7-to-12 week old NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice. Animals were anesthetized by intraperitoneal injection with 10mg/kg xylazine and 90mg/kg ketamine (Graeub) and intraductally injected with 10 \(\mu\)L of PBS containing 500’000 cells per gland, as previously described [9]. At sacrifice, xenografted mammary glands were harvested for histological analyses.

Histology

Tissues were fixed in 4% paraformaldehyde and paraffin-embedded for histological analyses. Paraffin blocks were cut and 4 \(\mu\)m-thick sections were mounted onto 75 x 25mm Superfrost Plus microscope slides (Thermo Scientific, USA, ref = J1800AMNZ). H&E staining was performed according to standard protocol. For immunostaining, sections were deparaffinized in xylene and re-hydrated [10]. Antigen retrieval was carried out in 10mmole sodium citrate (pH 6.0) at 95 \(^\circ\)C for 25min. Blocking was performed with 1% BSA for 60 minutes. Sections were incubated overnight with primary antibodies, followed by 1-hour incubation with secondary antibodies. For fluorescence microscopy, nuclei were counterstained with DAPI (Sigma) and then mounted with Fluoromount-GTM (cat# 4958-02, Invitrogen). E-cadherin antibody: clone G10; sc-8426; dilution 1:100; Santa Cruz. Cytokeratin 7 antibody: clone SP52; ab183344; dilution 1:500; Abcam. CD45 antibody: clone 30F-11; 14-0451-82; dilution 1:200; eBioscience. Secondary anti-mouse antibody: Alexa 488-conjugated; clone A-11029; dilution 1:700; Thermo Fisher Scientific. Secondary anti-rabbit antibody: Alexa 488-conjugated; clone A-21206; dilution 1:700; Thermo Fisher Scientific.

Image Acquisition

Slides were scanned with Olympus VS120-L100 slide scanner using a 20x/0.75 objective connected to a Pike F505 C Color camera. Because of the size of the resulting data (Figs. 1a and 2a), images were loaded into QuPath [21] using the BioFormats extensionFootnote 1. The slide images are publicly available on ZenodoFootnote 2.

Fig. 1
figure1

Characterization of histological images of H&E-stained mouse mammary gland intraductally xenografted with human breast cells. a Complete slide image, image size of approximately 1 billion pixels. Boxed areas mark human and mouse cell clusters. Scale bar, 5000 \(\mu\)m. b Magnified area of a human cluster. Scale bar, 100 \(\mu\)m. c Magnified area of human cells. Scale bar, 20 \(\mu\)m. d Manual annotations of the human nuclei associated with the cells of interest randomly colored for distinction purpose. Scale bar, 20 \(\mu\)m. e Magnified area of a murine cluster. Scale bar, 100 \(\mu\)m. f Magnified area of murine cells. Scale bar, 20 \(\mu\)m. g Manual annotations of the murine nuclei associated with the cells of interest randomly colored for distinction purpose. Scale bar, 20 \(\mu\)m

Data Extraction

Due to the pyramidal nature of the whole slide scanner images, a version of each image was extracted to define areas likely to contain ducts. Because ducts can be defined as relatively large, densely packed cell regions having a dense Eosin signal, we can use this information to extract them using Fiji’s [20] Color Deconvolution with the built-in H&E DAB vectors. First, a 4-fold downsampled version of each whole slide image was sent from QuPath [21] to ImageJ [20]. The extracted signal was then filtered with a Gaussian kernel of \(\sigma ={2}{px}\) before thresholding with ImageJ’s Default method. Finally, connected components analysis (AnalyzeParticles) was used to obtain ROIs. The bounding boxes of these ROIs were extracted and enlarged slightly to ensure no ducts were touching the edge of the image. Resulting bounding boxes were reimported into QuPath [21] as annotations and used to perform the export of the full resolution ducts as .tif images.

Nuclei Detection

As defining precise boundaries between neighboring cells can be challenging, we segmented the nuclei as a first step using StarDist [22, 23] (Fig. 2b), a state-of-the-art method outperforming classical approaches to detect star-convex objects for 2D images using a neural network. Such networks can be trained from various image types and the detection is represented as a labeled image where each nucleus is associated with a label.

Fig. 2
figure2

SCC pipeline. a Source example for the pipeline, an H&E-stained “humanized” mouse mammary duct. b Label image of the nuclei detected by StarDist randomly colored for distinction purposes. c Label image of the cells estimated from the nuclei randomly colored for distinction purpose. d Features are extracted from the nuclei and the cells. e Input image overlaid with the cell class computed from their associated features with a neural network. Green color defines human cells and red murine cells. f Diagram of the cell delineation with respect to two criteria: cells should not overlap with each other and the cell thickness (distance to the nuclei) should not exceed 2 \(\mu\)m. g Source example for the contextual features, an H&E-stained “humanized” mouse mammary duct. h The Delaunay graph in red over the detected nuclei associated with the source, randomly colored for distinction purpose. i. The cell chain graph over the detected nuclei associated with the source, randomly colored for distinction purpose, red segments correspond to cell chain. Scale bars, 100 \(\mu\)m

A StarDist model was trained with a set of 24 images of size 320x320 pixels extracted from H&E-stained sections of normal human breast xenografted mammary glands (Supp. Fig. 7). The training data is publicly available on the GitHub of SCC. The nuclei of these images were manually annotated for a total of approximately 2500 nuclei (Fig. 1b-g). The training was performed in Python using TensorFlow on Google Colab with GPU on 100 epochs for 40 minutes. The StarDist model was configured to detect 48 rays objects and a dropout of 0.5 was added to the network to avoid overfitting.

Cell Delineation

The estimation of the cells (Fig. 2.c,f) was computed according to two criteria: (i) cells should not overlap with each other and (ii) their thickness, defined as distance to the nuclei, should not exceed \(\Delta ={2}{\mu m}\). This maximal value is an estimation of the expected thickness for the type of cells we aim to classify, and can be edited by the users. The mask of the first criterion was computed using a Voronoi diagram on the nuclei labels and of the second criterion by thresholding, at \(\Delta\), a distance map of the nuclei labels. Hence, each detected cell will be associated with its nucleus. These criteria can be represented as masks that can be integrated to delineate each cell.

Measurement

For classification, cells needed to be properly described (Fig. 2d). To do so, 484 features were extracted from the detected nuclei and cells. These features can be related to the object itself (i.e. cell-intrinsic features), or related to their neighbors and their organization (i.e. contextual features). Out of these, 47 concern cell-intrinsic features, 376 contextual features are related to the cell-intrinsic features of the neighboring cells and the remaining 61 contextual features describe the organization of the neighbors.

Shape and Size Features

Cell-intrinsic features related to the shape and the size of both nuclei and cells were measured by fitting an ellipse on the targeted object to extract its elongation, minor axis, and major axis as features. Additionally, the area, and in particular the area ratio between the nuclei and the cell, was exploited for the analysis.

Textural Features

Cell-intrinsic textural features were extracted from the pixels of the source images at specific areas for each cell. The cells were divided into two areas: (i) their nuclei and (ii) their estimated cytoplasms, defined as the cell area without the nucleus. Such features can be divided between features related to the color and features related to the texture itself, i.e. the local pattern and spatial organisation of pixel intensities. For the color-related features, the mean and the variation of the values of each color channel were computed. For the features related to the texture, the Haralick texture features [24, 25] were calculated on a gray level version of the source image. The Haralick texture features provide 13 statistical parameters related to the pixels such as their entropy or their contrast. The gray level image was computed from a principal component analysis (PCA) on the channels of the image, with the first component corresponding to the factors to apply on the channels, thereby maximizing their variance, so that the maximum amount of information can be measured for the texture.

Contextual Features

Our contribution was to propose an additional set of features based on spatial arrangement of each cell and its neighboring cells, hereafter referred to as contextual features.

To efficiently determine the closest neighbors of a cell, we computed a neighborhood graph using the Delaunay Triangulation algorithm [26] (Fig. 2h). The cells are represented as nodes on this graph, with each node corresponding to the centroid of the cells. We identified the cluster of closed cells using the shortest-path algorithm (Djikstra).

In sake of versatility, several kinds of neighboring structures were inspected:

  1. (i)

    The neighbors directly connected to the cell on the Delaunay graph give information about the position of a given cell in the cluster given the mean and the variance of their distance. A cell described by high distance variance from its direct neighbors will have a high probability to be located at the border of a cluster.

  2. (ii)

    The lateral neighbors are the two cells located on the left and the right of a given cell given the orientation of the nucleus ellipse. Let’s define a line N normal to the major axis of the cell \(c_1\) and passing by the center of \(c_1\). Then a cell \(c_2\) is considered to be the lateral neighbor of \(c_1\) if the closest distance between the center of \(c_2\) and the line N is lower than half of the major axis of \(c_1\). Features like the alignment, the distance, and the difference in orientation were measured. When two cells are mutually lateral neighbors, they can be iteratively connected to create a chain of cells (Fig. 2i). Features related to these chains, which are relatively abundant in human cell clusters, are their tortuosity and size.

  3. (iii)

    To cover different types of cell aggregates, 4 sizes of vicinity were analyzed represented by K equal to 5, 10, 20, and 40 cells. These K neighbors are the set of the K closest neighbors to the current cell. The simultaneous use of a range of K values allows the extraction of both local (K small) and global (K large) features. It is possible to extract distance features (mean and variance) between the K neighbors and the current cells and, in particular, the distance of the current cell to the centroid of the set of K neighbors. The distance between the current cell and the centroid of the set provides information about the homogeneity in the set since, in a homogeneous cluster, it is expected that the current cell is very close to the position of the centroid as the neighbors should be spread around the current cell homogeneously.

  4. (iv)

    The K connected neighbors are the set of the K closest neighbors to the cells that are physically connected to each other. Like the K neighbors, the K connected neighbors provide at the same time both local and global information of the neighborhood of the cell taken into account, by considering the same K values. Moreover, a peculiar evidence is that cells that are physically connected share common features. This property motivated the extraction of the cell-intrinsic features of the neighbors through means and variances. It is also relevant to observe the shape of the cluster formed by the neighbors, which can be performed by computing an ellipse in a similar fashion than what was previously performed for the nuclei and cells shape features.

Classification

To classify the cells based on their features (Fig. 2e), a neural network model was trained by supervised learning in Python using TensorFlow. A set of 174 images of various sizes were extracted from H&E-stained xenografted mouse mammary glands (Supp. Fig. 8). Among these images, 96 mostly contain normal human cells, whereas 78 contain only mouse cells. To ensure that an image contains normal human cells, a human-specific E-cadherin (E-CAD) antibody was used in order to uniquely probe for xenografted cells (Supp. Fig. 10). This set of images represented about 60’000 cells after detection, of which 26’000 were human. Thereby, 26’000 mouse cells were considered to balance the number of cells between the two classes of interest such that their influence on the training is equivalent.

To classify tumor-derived PDXs, 17 images that contain mostly human tumor cells were added to the previous set. In order to circumvent interpretation problems due to variable E-cadherin protein expression levels in tumor cells [27], tumor cells were uniquely detected exploiting a human-specific cytokeratin 7 (CK7) antibody (Supp. Fig. 11). The final set represented about 80’000 cells with about 40’000 mouse cells, 26’000 normal human cells, and 14’000 human tumor cells.

To classify PDXs in fluorescence single channel DAPI-stained images, 12’000 cells in each class were extracted from DAPI and E-CAD stained xenografted mouse mammary glands. The E-CAD channel was used as control and the DAPI channel as source for the classification.

To define the species of a cell, classes masks were manually annotated for each of the images based on their fluorescent controls, namely they were compared to adjacent sections that were probed for xenografted cells by means of the above described human-specific antibodies (Supp. Fig. 10, 11).

A neural network adapted to the problem was design which takes as input the 484 features and returns the probability of a cell to belong to both human or mouse class as output.

Training for Further Applications

Our ImageJ/Fiji [20] plugin named “Single Cell Classifier” has been implemented to allow users to perform cell classification with our method using built-in or custom models. The parameters of the methods such as the K values for the neighbors, the factors to convert in gray value, or the cell thickness \(\Delta\) are also editable by the user.

As most deep learning models, the provided pre-trained models can only be used with images that are similar to the one used in their training (i.e. same modality of microscope, tissue, cell, staining, image contrast,...) [28, 29]. To use our method with other images, new models need to be trained, our GitHub provides a set of Python scripts and explanation to help the user performing such task. The nuclei detection and the classification models are trained independently which allows detecting other classes with the same kind of images by re-training only the classification model.

This plugin depends on three other plugins: StarDist [22, 23] for the nuclei detection, MorphoLibJ [30] for its morphological and analysis tools used in cells delineation and features extraction, and finally CSBDeep [31] that executes our classification neural network.

Results

Nuclei Detection

The nuclei detection performed by our StarDist model reaches a detection accuracy of \(74.74\%\) (Table 1), remaining accurate to distinguish nuclei even under challenging conditions (Fig. 3). A detailed description of all computed object detection metrics is available in Supp. Note 1.

Table 1 Segmentation result statistics
Fig. 3
figure3

Nuclei segmentation performance. Examples of nuclei segmentation, with detection outlined in yellow. Scale bars, 50 \(\mu\)m

Features Analysis

Next, we performed a comparison of distribution and correlation between human and mouse features to characterize the impact of each feature on the neural network to assess the morphology of the engrafted human cells in the intraductal environment. Our correlation index corresponds to the absolute value of the Pearson correlation, as the correlation direction is not relevant to our problem. The analysis of the features distribution and their correlation with the classes revealed that shape and size of individual cells poorly discriminate the two classes, with a correlation index lower than 0.2, suggesting that they do not help the discrimination task (Fig. 4a, 4b). However, the contextual features for shape and size have a better correlation ranging between 0.3 and 0.4, highlighting the importance of the context for this purpose (Fig. 4a, 4c). Interestingly, textural features seem to offer better help in the cell species discrimination than shape and size, as highlighted by the resulting small degree of overlap between the two cell species and their global better correlation with the classes, concentrated between 0.15 and 0.35 but reaching a maximum of 0.5. The context seems to decrease the correlation for most of the features, concentrated between 0.05 and 0.3 (Fig. 4a, 4d). However, some exceptions in the contextual texture features still reach a high correlation of 0.5, making them appealing for such analysis.

Fig. 4
figure4

Features analysis. a Violin plot of the correlation between the features, grouped by categories, and the classes. The features number in each category is indicated in parenthesis. The other contextual features refer to the contextual features related to the organization of the neighbors and not related to cell-intrinsic features. b Violin plots of shape and size features. Area ratio as assessed by ratio between cytoplasms and nuclei areas. px=pixels. c Violin plots of some textural features. d Violin plots of some contextual features. The value 0 (or 1 for the left plot) are assigned to cells that do not have enough neighbours

Classification of Normal Breast PDXs in H&E-stained Histological Sections

SCC performs normal human breast cell discrimination reaching an accuracy of 96.51% (Fig. 5), as assessed by quantifying the number of accurate calls upon manual annotation of humanized mouse mammary ducts based on fluorescence labeling by human-specific E-cadherin antibody. Both sensitivity and precision are higher than 96% for the classification of both classes taken into consideration (Table 2). The Area Under the Curve of the Receiver Operating Characteristics (AUC ROC) was used as a standard to estimate the predictive power and revealed that SCC efficiently predicts both human and mouse cells with probabilities associated with each of the analyzed cells close to the extrema 0 and 1, suggesting high confidence of our model. Interestingly, the feature importance previously discussed impinges on the classification task (Table 3). In line with our predictions, we observed that the contextual features without any texture information allow for a very good classification with \(89.02\%\) of accuracy, compared to the shape and size features alone achieving an accuracy of \(66.84\%\), highlighting the importance of contextual features for such classification task. In order to dispel the possibility that our analysis may be confounded by host-derived tissue-resident immune cells [32,33,34,35,36], we performed a co-immunostaining using anti-CD45, antigen expressed on all leucocytes, and anti human-CK7 antibodies that revealed the lack of host-derived immune cells in intraductal xenografts (Supp. Fig. 12).

Table 2 Classification result statistics
Table 3 Features impact on classifier accuracy
Fig. 5
figure5

Cell species classification of normal H&E human breast-derived PDXs. a Example of a cluster of murine cells. Green and red dots correspond to human and mouse class, respectively. For the fluorescent controls, the green signal corresponds to a human-specific E-cadherin, blue signal corresponds to DAPI. The source images are on the left, the classification result in the middle and the fluorescent control on the right. b Example of a cluster of human cells. Green and red dots correspond to human and mouse class, respectively. For the fluorescent controls, the green signal corresponds to a human-specific E-cadherin, blue signal corresponds to DAPI. The source images are on the left, the classification result in the middle and the fluorescent control on the right. c Example of a cluster containing both murine and human cells. Green and red dots correspond to human and mouse class, respectively. For the fluorescent controls, the green signal corresponds to a human-specific E-cadherin, blue signal corresponds to DAPI. The source images are on the left, the classification result in the middle and the fluorescent control on the right

Classification of Breast Cancer-derived PDXs in H&E-stained Histological Sections

As patient-derived xenografts are mainly used in the context of cancer research, we went on to assess whether SCC was also able to classify cell species on tumor-derived PDXs. Whereas the segmentation model remained unchanged compared to the one developed for normal human breast epithelial cells, we prepared a new classification model in order to discriminate human cancer cells from mouse cells by integrating new tumor intraductal xenografts-derived images to the previous model. Estimation of accuracy in both Invasive Lobular Carcinoma (ILC) and Non-Special Type (NST) breast cancer histological subtypes based on 20 images representing a range of 5’000 to 10’000 cells per type revealed that SCC successfully classified more than \(90\%\) of xenografted cells (Table 4). Overall, this new model reaches an accuracy of 96.21% when both ILC and NST breast tumors were taken into consideration (Fig. 6). The analysis of how the features impinged on the classification was performed and showed that the textural features have a higher impact compared to the normal human cells, reaching an accuracy of 95.50% (Table 3).

Table 4 Tumor type classification accuracies
Fig. 6
figure6

Cell species classification results on H&E breast cancer-derived PDXs. a Clusters of human tumor cells: source image (left), with its classification result (middle) and its fluorescent control (right). At the top, a cluster of human-derived cells from No Special Type (NST) tumor growing in the intraductal microenvironment. At the bottom, an instance of Invasive Lobular Carcinoma (ILC). Green and red dots correspond to human and mouse, respectively. For the fluorescent controls, the green signal corresponds to human-specific CK7, blue signal corresponds to DAPI. Scale bars, 100 \(\mu\)m

Classification of Normal Breast PDXs in fluorescence DAPI-stained Images

Having established the accuracy of SCC in H&E-stained histological sections, we went on to test its performance on fluorescence single-channel DAPI-stained images (Supp. Fig. 9), representing a more challenging task because of the lack of any contextual features available. For nuclei detection, a built-in model named “Versatile” was used and a new model for classification of human and mouse cells was trained. Noticeably, SCC reached an accuracy of 94.78% in these settings. The analysis of the features impact revealed similar tendencies than the ones previously observed for the H&E counterparts (Table 3). While on one hand, shape and size features allowed to reach an accuracy of 70.91%, suggesting that these features are sufficient to perform the classification task under these settings, on the other hand textural features appeared less relevant, in line with cytoplasms providing fewer information in single-channel DAPI-stained images than H&E.

Comparison with an Image-based Single-stage Method

Finally, we investigated whether both the nucleus segmentation task and cell classification task could be performed jointly by a single-stage model. To that end, we extended the model architecture of StarDist [22] and added a dedicated classification head that predicts the probability of a nucleus belonging either to a human or a mouse cell. We then annotated nucleus outlines and cell types (human/mouse) for 16 images of size 330x320 containing in total 1’450 human and 300 mouse nuclei. After training, the extended model achieved a classification accuracy of 85.90% across all matched nucleus instances, well below the accuracy achieved by SCC. This suggests that decoupling the segmentation and classification tasks is beneficial in our case possibly due to the availability of only few annotated training data, and that SCC outperforms standard image-based single-stage methods.

Discussion

PDXs are innovative preclinical models and important for translational research. However, their usage is hampered by difficulty in data interpretation arising from the presence of host cells and no methods are currently available for species discrimination of individual cells in PDX models. To fill this gap, we developed SCC, a publicly available deep learning-based tool distributed as an ImageJ/Fiji plugin aiming at classifying human and mouse cells in PDX-derived histological sections. For the first time, SCC elaborates a comprehensive set of information to efficiently classify the species of individual cells based on both cell-intrinsic and contextual features proper of the neighboring tissue. The set of features taken into account by our tool is very general and allows any cell classification task within its reach. Surprisingly, features analysis revealed that contextual features were more important than cell-intrinsic ones for accurate cell classification in H&E-stained histological sections, suggesting that xenografted human cells retain their typical morphologies while creating clusters characterized by distinctive conformations in the host. Although SCC was initially designed for detection of normal breast epithelial cells in H&E-stained sections, we show that our method equally extends to cancer cell discrimination and to fluorescence DAPI-stained images because of the completeness of the set of features taken into consideration yielding, on average, up to 96% of discrimination accuracy for both normal and malignant xenografted cells derived from different breast cancer histological subtypes. The automated classification of individual human and mouse cells performed by SCC can facilitate the work of researchers dealing with PDXs-derived histological sections, making the subsequent image analyses faster and more reproducible, and will enable quantification of xenografted cells to assess grafting efficiency or cell growth. Moreover, SCC can be employed to input new in-house models in order to perform classifications between any cells of interest. For instance, to discriminate between normal human breast epithelial cells, hyperplastic and breast cancer cells, as well as to distinguish different tumor grades are just some of the potential future applications of SCC. Finally, SCC is an easy-to-use and dynamic software that can be employed to perform classifications between any cells of interest, making it an appealing resource in the field of image analysis and breast cancer research.

Data availability

Archived images as at time of publication: https://doi.org/10.5281/zenodo.3960270.

Code availability

The plugin source is accessible onlineFootnote 3 and open-source.

Notes

  1. 1.

    https://github.com/qupath/qupath-bioformats-extension

  2. 2.

    https://zenodo.org/record/3960270

  3. 3.

    https://github.com/Biomedical-Imaging-Group/SingleCellClassifier

Abbreviations

SCC:

Singe Cell Clasifier

H&E:

Haematoxylin and Eosin

HBECs:

Human Breast Epithelial Cells

HR:

Hormone Receptor

MIND:

Mouse INtraDuctal

PDX:

Patient-Derived Xenograft

References

  1. 1.

    Arrowsmith J. Phase ii failures: 2008-2010. Nat Rev, 2011. https://doi.org/10.1038/nrd3439.

  2. 2.

    Dimasi J, Reichert J, Feldman L, Malins A. Clinical approval success rates for investigational cancer drugs. Clin Pharmacol Ther. 2013;94:06. https://doi.org/10.1038/clpt.2013.117.

    CAS  Article  Google Scholar 

  3. 3.

    Dobrolecki L, Airhart S, Alferez D, Aparicio S, Behbod F, Bentires-Alj M, Brisken C, Bult C, Cai S, Clarke R, Dowst H, Ellis M, Gonzalez-Suarez E, Iggo R, Kabos P, Li S, Lindeman G, Marangoni E, McCoy A, Lewis M. Patient-derived xenograft (pdx) models in basic and translational breast cancer research. Cancer Metastasis Rev. 2016;35:12. https://doi.org/10.1007/s10555-016-9653-x.

    CAS  Article  Google Scholar 

  4. 4.

    Eirew P, Steif A, Khattra J, Ha G, Yap D, Farahani H, Gelmon K, Chia S, Mar C, Wan A, Laks E, Biele J, Shumansky K, Rosner J, McPherson A, Nielsen C, Roth A, Lefebvre C, Bashashati A, Aparicio S. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature. 2014;518:11. https://doi.org/10.1038/nature13952.

    CAS  Article  Google Scholar 

  5. 5.

    Hidalgo M, Amant F, Biankin A, Budinsk E, Byrne Phd A, Caldas C, Clarke R, Jong S, Jonkers J, Mlandsmo G, Roman-Roman S, Seoane J, Trusolino L, and Villanueva A. Patient-derived xenograft models: An emerging platform for translational cancer research. Cancer discov. 2014;4:998–1013. https://doi.org/10.1158/2159-8290.CD-14-000.

  6. 6.

    Haricharan S, Lei J, Ellis M. Mammary ductal environment is necessary for faithful maintenance of estrogen signaling in er+ breast cancer. Cancer Cell. 2016;.29:249–250. https://doi.org/10.1016/j.ccell.2016.02.017430.

  7. 7.

    Kratz A, Ferrare M, Sluss P, and Lewandrowski K. Laboratory reference values. N Engl J Med. 2004;351:1548–1564.

  8. 8.

    Behbod F, Kittrell F, Machado H, Edwards D, Kerbawy S, Heestand J, Young E, Mukhopadhyay P, Yeh HW, Allred D, Hu M, Polyak K, Rosen J, and Medina D. An intraductal human-in-mouse transplantation model mimics the subtypes of ductal carcinoma in situ. Breast Cancer Res: BCR, 2009;11:R66. https://doi.org/10.1186/bcr2358.

  9. 9.

    Sflomos G, Dormoy V, Metsalu T, Jeitziner R, Battista L, Scabia V, Raffoul W, Delaloye JF, Treboux A, Fiche M, Vilo J, Ayyanan A, Brisken C. A preclinical model for era-positive breast cancer points to the epithelial microenvironment as determinant of luminal phenotype and hormone response. Cancer Cell. 2016;29:1–16. https://doi.org/10.1016/j.ccell.2016.02.002.

  10. 10.

    Ataca D, Aouad P, Constantin C, Laszlo C, Beleut M, Shamseddin M, Rajaram RD, Jeitziner R, Mead TJ, Caikovski M, Bucher P, Ambrosini G, Apte SS, and Brisken C. The secreted protease adamts18 links hormone action to activation of the mammary stem cell niche. Nat Com. 2020;11(1):1571. https://doi.org/10.1038/s41467-020-15357-y.

  11. 11.

    Koch C, Kuske A, Joosse SA, Yigit G, Sflomos G, Thaler S, Smit DJ, Werner S, Borgmann K, Grtner S, Mossahebi Mohammadi P, Battista L, Cayrefourcq L, Altmller J, Salinas-Riester G, Raithatha K, Zibat A, Goy Y, Ott L, Bartkowiak K, Tan TZ, Zhou Q, Speicher MR, Mller V, Gorges TM, Jcker M, Thiery JP, Brisken C, Riethdorf S, Alix-Panabi C, and Pantel K. Characterization of circulating breast cancer cells with tumorigenic and metastatic capacity. EMBO Mol Med. 2020;e11908. https://doi.org/10.15252/emmm.201911908.

  12. 12.

    Richard E, Grellety T, Velasco V, MacGrogan G, Bonnefoi H, Iggo R. The mammary ducts create a favourable microenvironment for xenografting of luminal and molecular apocrine breast tumours. J Pathol. 2016;240(3):256–61. https://doi.org/10.1002/path.4772.

    CAS  Article  Google Scholar 

  13. 13.

    Russell TD, Jindal S, Agunbiade S, Gao D, Troxell M, Borges VF, and Schedin P. Myoepithelial cell differentiation markers in ductal carcinoma in situ progression. Am J Pathol. 2015;185(11):3076–3089. https://doi.org/10.1016/j.ajpath.2015.07.004.

  14. 14.

    Siersbk R, Scabia V, Nagarajan S, Chernukhin I, Papachristou EK, Broome R, Johnston SJ, Joosten SE, Green AR, Kumar S, Jones J, Omarjee S, Alvarez-Fernandez R, Glont S, Aitken SJ, Kishore K, Cheeseman D, Rakha EA, D’Santos C, Zwart W, Russell A, Brisken C, Carroll JS. Il6/stat3 signaling hijacks estrogen receptor \(\alpha\) enhancers to drive breast cancer metastasis. Cancer Cell. 2020. https://doi.org/10.1016/j.ccell.2020.06.007.

  15. 15.

    Danuser G. Computer vision in cell biology. Cell, 2011;147:973–8. https://doi.org/10.1016/j.cell.2011.11.001.

  16. 16.

    McKinney S, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, Back T, Chesus M, Corrado G, Darzi A, Etemadi M, Garcia-Vicente F, Gilbert F, Halling-Brown M, Hassabis D, Jansen S, Karthikesalingam A, Kelly C, King D, and Shetty S. International evaluation of an ai system for breast cancer screening. Nature. 2020;577:89–94, 01. https://doi.org/10.1038/s41586-019-1799-6.

  17. 17.

    Chen H, Qi X, Yu L, Dou Q, Qin J, Heng P-A. Dcan: Deep contour-aware networks for object instance segmentation from histology images. Med Image Anal. 2016;36:11. https://doi.org/10.1016/j.media.2013.07.007.

    CAS  Google Scholar 

  18. 18.

    Janssens T, Antanas L, Derde S, Vanhorebeek I, Berghe G, and Guiza F. Charisma: An integrated approach to automatic h&e-stained skeletal muscle cell segmentation using supervised learning and novel robust clump splitting. Med Image Anal. 2013;17:1206–1219. https://doi.org/10.1016/j.media.2013.07.007.

  19. 19.

    Salvi M, Molinari F. Multi-tissue and multi-scale approach for nuclei segmentation in h&e stained images. Biomedical Eng Online. 2018;17:12. https://doi.org/10.1186/s12938-018-0518-0.

    Article  Google Scholar 

  20. 20.

    Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez J-Y, White DJ, Hartenstein V, Eliceiri K, Tomancak P, Cardona A. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012. https://doi.org/10.1038/nmeth.20192019.

  21. 21.

    Bankhead P, Loughrey MB, Fernandez JA, Dombrowski Y, McArt DG, Dunne PD, McQuaid S, Gray RT, Murray LJ, Coleman HG, James JA, Salto-Tellez M, and Hamilton PW. Qupath: Open source software for digital pathology image analysis. Sci Rep. 2017. https://doi.org/10.1038/s41598-017-17204-5.

  22. 22.

    Schmidt U, Weigert M, Broaddus C, and Myers G. Cell detection with star-convex polygons. Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II. 2018;265–273. https://doi.org/10.1007/978-3-030-00934-2_30.

  23. 23.

    Weigert M, Schmidt U, Haase R, Sugawara K, and Myers G. Star-convex polyhedra for 3d object detection and segmentation in microscopy. The IEEE Winter Conference on Applications of Computer Vision (WACV). 2020.

  24. 24.

    Haralick RM, Shanmugam K, and Dinstein I. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 1973;SMC-3(6):610–621.

  25. 25.

    Miyamoto E, Jr T. Fast calculation of haralick texture features. 2008.

  26. 26.

    Chew LB. Constrained delaunay triangulations. Algorithmica. 1989;97–108. https://doi.org/10.1007/BF01553881.

  27. 27.

    Lamouille S, Xu J, and Derynck R. Molecular mechanisms of epithelialmesenchymal transition. Nat Rev Mol Cell Biol. 2014;15:178–196. https://doi.org/10.1038/nrm3758.

  28. 28.

    Caie PD, Schuur K, Oniscu A, Mullen P, Reynolds PA, and Harrison DJ. Human tissue in systems medicine. The FEBS Journal, 2013;280(23):5949–5956. https://doi.org/10.1111/febs.12550.

  29. 29.

    Dimitriou N, Arandjelovic O, Caie PD. Deep learning for whole slide image analysis: An overview. Front Med. 2019;6:264. https://doi.org/10.3389/fmed.2019.00264.

    Article  Google Scholar 

  30. 30.

    Legland D, Arganda-Carreras I, and Andrey P. MorphoLibJ: integrated library and plugins for mathematical morphology with ImageJ. Bioinformatics. 2016;32(22):3532–3534. https://doi.org/10.1093/bioinformatics/btw413.

  31. 31.

    Weigert M, Schmidt U, Boothe T, Mller A, Dibrov A, Jain A, Wilhelm B, Schmidt D, Broaddus C, Culley S, Rocha-Martins M, Segovia-Miranda F, Norden C, Henriques R, Zerial M, Solimena M, Rink J, Tomancak P, Royer L, Jug F, and Myers EW. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nat Methods, 2018;15(12):1090–1097. https://doi.org/10.1038/s41592-018-0216-7.

  32. 32.

    Dawson C, Pal B, Vaillant F, Gandolfo L, Liu Z, Bleriot C, Ginhoux F, Smyth G, Lindeman G, Mueller S, Rios A, and Visvader J. Tissue-resident ductal macrophages survey the mammary epithelium and facilitate tissue remodelling. Nat Cell Biol. 2020;22:1–13, 05. https://doi.org/10.1038/s41556-020-0505-0.

  33. 33.

    Nguyen AV, Pollard JW. Colony stimulating factor-1 is required to recruit macrophages into the mammary gland to facilitate mammary ductal outgrowth. Dev Biol. 2002;247:11–25. https://doi.org/10.1006/dbio.2002.0669.

    Article  Google Scholar 

  34. 34.

    O’Brien J, Durand-Rougely HMC, Schedin P. Macrophages are crucial for epithelial cell death and adipocyte repopulation during mammary gland involution. Development. 2012;139:269–75. https://doi.org/10.1242/dev.071696.

    Article  Google Scholar 

  35. 35.

    Pollard JW, Hennighausen L. Colony stimulating factor 1 is required for mammary gland development during pregnancy. Proceedings of the National Academy of Sciences. 1994;91(20):9312–6. https://doi.org/10.1073/pnas.91.20.9312.

    CAS  Article  Google Scholar 

  36. 36.

    Ingman WV, Wyckoff J, Gouon‐Evans V, Condeeli J, Pollard JW. Macrophages promote collagen fibrillogenesis around terminal end buds of the developing mammary gland. Dev Dyn. 2006;235:3222–3229. https://doi.org/10.1002/dvdy.20972.

Download references

Acknowledgements

The authors wish to acknowledge the support of the Phenogenomics Center, Histology and the Bio Imaging & Optics Core Facility at EPFL for technical assistance, G. Sflomos and P. Aouad for revising the manuscript, A. Ayyanan, G. Sflomos and L. Battista for material collection.

Funding

Open Access funding provided by EPFL Lausanne. This work was supported by the EPFL Open Science Fund “Reproducible and Reusable Imaging Workflows” from the Imaging@EPFL initiative and the Swiss Data Science Center (SDSC). Q.J. was supported by the EPFL Center for Imaging under the fund OSF RENKU 563229. F.D.M. was supported by SNF (310030_179163/1 Exploring key steps of the metastatic cascade in estrogen receptor positive breast cancer in vivo). M.W. was supported by a generous donor represented by CARIGEST SA.

Author information

Affiliations

Authors

Contributions

Conceptualization: Fabio De Martino. Data Curation: Quentin Juppet, Fabio De Martino, Elodie Marcandalli. Formal Analysis: Quentin Juppet, Martin Weigert. Funding Acquisition: Daniel Sage, Cathrin Brisken. Investigation: Fabio De Martino. Methodology Development: Quentin Juppet, Daniel Sage, Olivier Burri. Project Administration: Fabio De Martino. Resources: Fabio De Martino, Cathrin Brisken. Software: Quentin Juppet, Daniel Sage. Supervision: Fabio De Martino, Michael Unser, Cathrin Brisken, Daniel Sage.Writing - original draft: Fabio De Martino, Quentin Juppet, Martin Weigert, Olivier Burri. Writing Review & Editing: Fabio De Martino, Quentin Juppet, Martin Weigert, Michael Unser, Cathrin Brisken, Daniel Sage.

Corresponding authors

Correspondence to Cathrin Brisken or Daniel Sage.

Ethics declarations

Confic of Interest/Competing Interest

The authors declare that they have no conflict of interest.

Ethical Approval

Animal experiments were performed in accordance with protocol approved by the Service de la Consommation et des Affaires vtrinaires of Canton de Vaud (VD 1541.4 and VD 1865.3). NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ mice (NSG) breeders were purchased from Jackson Laboratories

Consent to participate

The cantonal ethics committee approved the study on patient samples (183/10). Informed consent was obtained from all subjects.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Juppet, Q., De Martino, F., Marcandalli, E. et al. Deep Learning Enables Individual Xenograft Cell Classification in Histological Images by Analysis of Contextual Features. J Mammary Gland Biol Neoplasia 26, 101–112 (2021). https://doi.org/10.1007/s10911-021-09485-4

Download citation

Keywords

  • Deep learning
  • Patient-Derived Xenografts
  • MIND model
  • Histology
  • Cell Classifier