Introduction

Histological assessment of tissue is central to the diagnosis and classification of malignancy, and critically informs patient management. Pathologists routinely report visible alterations in nuclear morphology. Altered nuclear features are ubiquitous in cancer, and changes in nuclear size, shape, coloration, texture, nucleoli, and nuclear-cytoplasmic ratio, as well as their intratumoral variance, are important features of histologic grade, which has prognostic relevance independent of disease stage1,2. The enumeration and morphologic features of mitoses also inform pathologist assessment of malignancy3. Furthermore, specific nuclear morphologies can be important diagnostic features of certain cancers, such as the nuclear clearing (“Orphan Annie Eyes”) and pseudoinclusions of papillary thyroid carcinoma4.

A complex interplay exists between nuclear morphology and the genetic, epigenetic, and transcriptomic milieu of cancer cells, reflecting the importance of the nucleus to the process of oncogenic transformation. Distorted nuclei can indicate dysregulated replication processes, aneuploidy, genomic instability, and genetic mutations that affect stability and function of the nuclear envelope5. Indeed, many cancers have altered expression of nuclear envelope components, resulting in nuclear rupture and micronuclei formation, further increasing genomic instability5,6. In addition, components of the nuclear envelope are known to bind to both chromatin and transcription factors, providing a spatial regulation to gene transcription and expression5,7. Therefore, the visual appearance of cancer cell nuclei has the potential to reveal key information about the biology of a tumor.

The quantitation of nuclear morphology has been a long sought-after goal8. Early studies used semi-quantitative approaches to enumerate features such as nuclear size and shape; these works revealed relationships between increased nuclear area and altered nuclear shape with poor prognosis and advanced disease in breast cancer and prostate cancer, respectively9,10,11. The use of computational approaches in pathology image analysis to identify and quantify nuclear changes has gained traction as modern computer vision methods have allowed for rapid, reproducible and cost-effective quantification of nuclear morphology. Using these methods, nuclear morphometric features have been shown to correlate with relevant clinical and pathological metrics, such as oligodendroglioma component in glioblastoma12, as well as stage13, disease aggressiveness14, recurrence15,16,17, and outcome18 in other cancer subtypes. In addition, increased nuclear size has been correlated with whole genome duplication19,20, and nuclear morphometric features have allowed for the prediction of relevant molecular information, such as ER status21 and Oncotype DX risk scores22,23 in breast cancer. Most recently, Nimgaonkar et al. described an AI-derived histologic signature, the main component of which was variance in nuclear morphology in cancer cells, that predicted response to gemcitabine in patients with pancreatic adenocarcinoma24.

Digitized whole slide images (WSIs) have enhanced the degree to which nuclear morphology can be studied in histological specimens12,13. However, the large size of WSIs—up to billions of pixels and containing thousands of nuclei—makes exhaustive manual annotation infeasible; thus studies have relied on manually-selected subregions of interest rather than entire slides20,25. Automated methods are, therefore, needed to fully quantify nuclear features in WSIs. We recently described a cell- and tissue-level computational pathology pipeline using WSIs for the automated computation of human interpretable features (HIFs), distinctive features with tangible methods for validation26. This pipeline allows the use of HIFs to predict treatment-relevant molecular phenotypes and allows for integration with current pathological methods. Given that morphological analysis of histology features is central to pathology workflows, we sought to extend this work to identify nuclear human interpretable features (nuHIFs) in multiple cancer types.

In this paper, we present a multi-tissue model for the exhaustive detection, segmentation, and classification of nuclei from entire hematoxylin and eosin (H&E)-stained WSIs, allowing for the exhaustive analysis of slide-level descriptors of nuclear size, shape, texture, and staining intensity. Furthermore, we demonstrate that these nuHIFs are predictive of clinically-relevant information in multiple cancer types.

Results

Model development, performance, and nuclear feature extraction

We collected annotations and trained a machine learning (ML) model to detect and segment nuclei in H&E-stained WSIs as described in the “Methods” and shown in Fig. 1. The model is not limited to sampling regions of interest from tissue samples, but rather can be utilized to exhaustively annotate WSIs. Application of the model to our held-out test data, including held-out tissue and disease types, demonstrated performance (mean Dice score = 0.818, aggregated Jaccard index (AJI) = 0.619) comparable to models reported previously in the literature27,28. Importantly, model speed was adequate to apply to multi-gigabyte WSIs at full resolution (approximately 0.25 μm/pixel; roughly 30 min per slide). Examples of model performance in mesothelioma, head and neck squamous cell carcinoma, and stomach adenocarcinoma are shown in Fig. 2.
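As an illustration of the segmentation metric reported above, the following is a minimal sketch of the Dice score for a pair of binary masks. It is not the evaluation code used in this study: the function name `dice_score`, the nested-list mask representation, and the convention for two empty masks are assumptions made for the example, and the aggregated Jaccard index is not covered.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled as nucleus in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data): 2 shared pixels, 3 pixels in each mask.
predicted = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
example_dice = dice_score(predicted, ground_truth)  # 2 * 2 / (3 + 3)
```

A mean Dice score such as the one reported would then be the average of this quantity over evaluation patches, under whatever averaging scheme the evaluation defines.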

Fig. 1: Machine learning model annotation collection, training, and application.

a Model workflow. Briefly, pathologists trained expert annotators to perform exhaustive annotations of nuclei on H&E slide patches from diverse tissue sources. These were used to train a pan-H&E nucleus detection and segmentation model, which was subsequently evaluated on held-out patches and applied to exhaustively segment nuclei in three WSI datasets. b Features extracted from the model. Mean and standard deviation values were calculated for these features at the whole-slide level for cancer cells, lymphocytes, and fibroblasts.

Fig. 2: Example of model performance.

Representative WSI patches from mesothelioma, head and neck squamous cell carcinoma (HNSCC), and stomach adenocarcinoma stained with H&E are shown in the left-most panel. Ground truth nuclei identified manually and nuclei predicted by the model are shown in the middle and right-most panels, respectively. Each color represents a nucleus instance.

We selected clinical samples from two additional datasets, designated OOD-Test-1 and OOD-Test-2, characterized in Table 2. We collected additional annotations on these datasets and characterized model performance. Performance was numerically comparable or superior to that on our initial held-out test data, despite the different sample origins and the inclusion of non-cancer tissue samples in one dataset (OOD-Test-1 mean Dice = 0.818, AJI = 0.628; OOD-Test-2 mean Dice = 0.826, AJI = 0.649).

Having evaluated our model’s performance, we deployed the resulting model on primary diagnostic (DX1) H&E slides from the breast cancer (BRCA; N = 892), prostate adenocarcinoma (PRAD; N = 392), and lung adenocarcinoma (LUAD; N = 426) TCGA cohorts (Fig. 3); model performance was visually assessed to be consistent with test data. The distribution of pixel sizes (microns per pixel; MPP) of these three cohorts is shown in Supplementary Fig. 4. The median MPPs were 0.248, 0.252, and 0.252 for the BRCA, LUAD, and PRAD datasets, respectively. We extracted interpretable features describing the shape, size, staining intensity, and texture of every nucleus on each WSI (Supplementary Table 3). We performed further analysis on nuHIFs specific to cancer cells, fibroblasts, and lymphocytes, as these three cell classes are common across all cancer types and have been implicated in clinical outcomes.
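The slide-level summarization described above (mean and standard deviation of each nuclear feature per cell class) can be sketched as follows. This is an illustrative outline only: the record layout, the function name `slide_level_features`, the output key format, and the use of the population standard deviation are assumptions and do not describe the actual feature extraction pipeline.

```python
from collections import defaultdict
from statistics import mean, pstdev


def slide_level_features(nuclei):
    """Summarize per-nucleus measurements into slide-level features.

    `nuclei` is a list of dicts, each with a 'cell_type' key and numeric
    per-nucleus features (hypothetical layout chosen for this sketch).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for nucleus in nuclei:
        cell_type = nucleus["cell_type"]
        for name, value in nucleus.items():
            if name != "cell_type":
                grouped[cell_type][name].append(value)

    features = {}
    for cell_type, per_feature in grouped.items():
        for name, values in per_feature.items():
            features[f"MEAN[{cell_type}_{name}]"] = mean(values)
            # Population standard deviation is an assumption of this sketch.
            features[f"STD[{cell_type}_{name}]"] = pstdev(values)
    return features


# Hypothetical nuclei from a single slide (areas in square microns).
example_nuclei = [
    {"cell_type": "CANCER", "NUCLEUS_AREA": 40.0},
    {"cell_type": "CANCER", "NUCLEUS_AREA": 60.0},
    {"cell_type": "FIBROBLAST", "NUCLEUS_AREA": 30.0},
]
example_features = slide_level_features(example_nuclei)
```

Grouping by cell class before aggregating is what makes the resulting features cell-type-specific: one row of features per slide, with separate columns for cancer cells, fibroblasts, and lymphocytes.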

Fig. 3: Nuclear segmentation and cell-type identification in multiple cancer types.

Representative H&E images of (a) breast cancer (TCGA BRCA), (b) lung adenocarcinoma (TCGA LUAD), and (c) prostate adenocarcinoma (TCGA PRAD) are shown at ×40 magnification. d–f Nuclear segmentation and cell-type identification masks are overlaid onto H&E images shown in (a–c). g–i High-magnification images of BRCA (g), LUAD (h), and PRAD (i). Magnified regions are indicated by dashed boxes in (d–f). Scale bars indicate a distance of 50 μm.

nuHIFs show within- and between-cancer-type variation

To assess whether nuHIFs differ between cancer types, we performed UMAP to compare the nuHIFs from cancer cells (Fig. 4a), fibroblasts (Fig. 4b), and lymphocytes (Fig. 4c) in BRCA, LUAD, and PRAD datasets. We observed notable inter- and intra-dataset variation in nuHIFs. For cancer cells, nuclear morphology was distinct between PRAD and LUAD datasets, while BRCA dataset cancer cells showed nuclear features similar to both PRAD and LUAD (Fig. 4a). Unsupervised hierarchical clustering of z-scored features revealed specific nuHIFs differentially exhibited in these three cancer subtypes (Fig. 4b). For example, features associated with nuclear size were higher in LUAD cancer nuclei relative to PRAD. Assessment of the distribution of three size-related features in BRCA, LUAD, and PRAD confirmed these observations: cancer nuclei in PRAD were smaller in area and major axis length than cancer nuclei in BRCA and LUAD (Supplementary Fig. 5A), while fibroblast area and major axis length were larger in BRCA than in LUAD and PRAD (Supplementary Fig. 5B). Only minute variation in minor axis length was observed between the three cancer types for cancer cells and fibroblasts (Supplementary Fig. 5C). In contrast, lymphocyte nucleus size parameters did not appear to differ between cancer types (Supplementary Fig. 5). In addition to size features, features associated with nuclear staining differed between cancer types. In particular, notable differences in features relating to nucleus stain intensity, color, and shape were observed between PRAD and LUAD (Fig. 4). The clearest distinction between cancer subtypes was discerned through nuHIFs of fibroblasts in BRCA, LUAD, and PRAD (Fig. 4b, d). Unsupervised hierarchical clustering revealed specific features enriched in fibroblasts from these cancer subtypes. Interestingly, lymphocyte nuHIFs also differed between cancer types.

Fig. 4: nuHIFs show variation within and between cancer types.

Uniform manifold approximation and projection (UMAP) visualization of BRCA, LUAD, and PRAD defined by nuclear human interpretable features (nuHIFs) for (a) cancer cells, (b) fibroblasts, and (c) tumor-infiltrating lymphocytes. Clustered heatmaps of median Z-scores for all 30 nuHIFs are shown for each cell type. d Receiver operating characteristic (ROC) curves for binary classification between paired cancer types using nuHIFs from each of cancer, fibroblast, or lymphocyte nuclei. ROCs are shown for the five held-out validation splits, and mean area under the ROC curve (AUROC) is shown for each classification problem and each class of nuHIF. In particular, fibroblast and lymphocyte nuclear features differentiate strongly between cancer types.

To ensure that the observed differences in nuclear features between cancer types were not biased by scanned image pixel size, we measured the Pearson correlation between nuclear size (using major axis length as a representative feature) and MPP for each cell type within BRCA, LUAD, and PRAD datasets individually, to remove the potential effect of possible between-cancer-type variation in nuclear size. The within-cancer-type variation in mean nuclear major axis length between slides at the same MPP is large for cancer cells (Supplementary Fig. 6A), fibroblasts (Supplementary Fig. 6B), and lymphocytes (Supplementary Fig. 6C). In addition, the magnitude of the within-cancer-type Pearson correlations is low, although some rise to the level of significance, perhaps due to the high power of the large dataset. The within-cancer-type Pearson correlations also show an inconsistent sign, ranging from 0.206 to −0.151. Generally, these results suggest that there is an inconsistent directional effect of MPP on nuclear size, and other factors are likely driving the observed differences.
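The within-cancer-type check described above can be sketched as a Pearson correlation computed separately for each cohort. The tuple layout, function names, and toy values below are assumptions for illustration; they are not the study's data or code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def within_cohort_correlations(slides):
    """Correlate MPP with mean major axis length separately per cohort.

    `slides` is a list of (cohort, mpp, mean_major_axis_um) tuples
    (hypothetical layout chosen for this sketch).
    """
    by_cohort = {}
    for cohort, mpp, axis_um in slides:
        mpps, axes = by_cohort.setdefault(cohort, ([], []))
        mpps.append(mpp)
        axes.append(axis_um)
    return {cohort: pearson_r(m, a) for cohort, (m, a) in by_cohort.items()}


# Toy values only: one cohort with a rising trend, one with a falling trend.
example_slides = [
    ("A", 0.25, 10.0), ("A", 0.26, 11.0), ("A", 0.27, 12.0),
    ("B", 0.25, 12.0), ("B", 0.26, 11.0), ("B", 0.27, 10.0),
]
example_r = within_cohort_correlations(example_slides)
```

Computing the correlation within each cohort, rather than on the pooled slides, keeps genuine between-cancer-type differences in nuclear size from being mistaken for an MPP effect.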

Because of the apparent association between nuclear morphology and cancer type, we hypothesized that nuHIF-quantified nuclear morphology could be a distinguishing feature of cancer types. To test this, we constructed a simple random forest binary classification model for differentiating between each pair of cancer types (BRCA, PRAD, LUAD) using cancer, fibroblast, or lymphocyte nuclear HIFs. We performed five-fold cross-validation to estimate the extent to which cancer types may be differentiated by nuclear morphology. We found consistently strong performance for differentiating between cancer types using nuclear morphology (Fig. 4d). Although lymphocyte nuclear morphology was less distinct between cancer types when visualized with UMAP, supervised analysis indicated that lymphocyte morphology differed between cancer types.
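The scoring step of this cross-validated evaluation can be sketched as below: AUROC is computed on each held-out split and then averaged. The sketch covers scoring only and assumes that predicted scores for each held-out split are already available from a classifier trained on the remaining splits; it does not implement the random forest itself, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one, with ties counted as 1/2.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auroc(held_out_splits):
    """Average AUROC over (labels, scores) pairs, one pair per held-out split."""
    return sum(auroc(l, s) for l, s in held_out_splits) / len(held_out_splits)


# Toy held-out predictions (hypothetical): per-split AUROCs are 0.75 and 1.0.
example_splits = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0], [0.8, 0.2]),
]
example_mean_auroc = mean_auroc(example_splits)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on large cohorts.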

Cancer nuclear morphology is associated with metrics of genomic instability in multiple cancer types

Cancer nuclear atypia is used clinically as a marker of malignancy. We therefore hypothesized that underlying levels of genomic instability may partially explain the observed heterogeneity in cancer nuclear morphology within cancer subtypes, as well as between cancer types with known differences in malignancy. We tested this hypothesis by assessing the relationship between cancer nuclear morphology and genomic instability in LUAD, BRCA, and PRAD cohorts using aneuploidy score and homologous recombination deficiency (HRD) score as metrics of genomic instability. Indeed, using the standard deviation of cancer nuclear area as a metric of nuclear atypia, we detected significant correlation between this nuHIF and both aneuploidy score (Fig. 5a) and HRD score (Fig. 5b). When assessed in a pan-cancer manner, the overall correlation increased, and the pattern observed in cancer nuHIF UMAP analysis persisted: PRAD displayed a lower level of genomic instability across both metrics compared to LUAD, while BRCA showed a wide range of genomic instability, with similarities to both PRAD and LUAD. These results confirm that cancer nuclear morphology, especially variability in nuclear size, is associated with the level of genomic instability.

Fig. 5: Variation in cancer nuclear size correlates with metrics of genomic instability.

Standard deviation of cancer cell nuclear area was compared to (a) aneuploidy score and (b) homologous recombination deficiency (HRD) score for BRCA, LUAD, and PRAD. c Receiver operating characteristic (ROC) curves for prediction of whole-genome doublings in BRCA, LUAD, and PRAD. ROCs are shown for the five held-out validation splits; mean AUROC is shown for each cancer type.

Because aneuploidy score was correlated with variation in cancer nuclear area, we posited that cancer nuclear morphology was predictive of whole-genome doubling (WGD). To address this hypothesis, we trained random forest (RF) models for predicting binarized WGD using cancer nuHIFs from each of the BRCA, LUAD, and PRAD cancer types. We found that cancer nuclear morphology was predictive of WGD for each cancer type, with the strongest predictive power in BRCA and, as expected, more variation in performance for PRAD, where WGD occurs less frequently (Fig. 5c). The mean RF importance across the five splits is reported for the top five features for each cancer type in Supplementary Table 4. Briefly, variation in cancer nuclear dimensions was most important for predicting WGD in BRCA, mean cancer nuclear dimensions were most important for predicting WGD in LUAD, and a mix of color and shape features was found to be most important for PRAD.

Nuclear morphology enables prediction of breast cancer molecular subtype

We hypothesized that nuclear morphology would differ in subtle but meaningful ways between molecular subtypes of breast cancer, and that these differences might enable classification of molecular subtypes of breast cancer from H&E images. To test this hypothesis, we trained nuHIF-based classification models for predicting breast cancer subtype in a one-vs.-all manner (Fig. 6). Briefly, we found that cell-type-specific nuclear morphology enabled classification of some but not all breast cancer molecular subtypes. The ability to predict subtype varied by subtype as well as by the cell type used to make the inference. Cancer nuclear morphology (Fig. 6a) or lymphocyte nuclear morphology (Fig. 6c) enabled moderate prediction (AUROC > 0.7) of luminal A and basal-like breast cancer subtypes. Cancer nuclear morphology, but not lymphocyte or fibroblast nuclear morphology, enabled moderate prediction of the HER2-like breast cancer subtype. Interestingly, fibroblast nuclear morphology alone was a poor predictor of molecular subtype (Fig. 6b). When aggregating cell types (Fig. 6d), luminal A and basal-like prediction AUROC increased further to ≥0.80. These results suggest that altered nuclear morphology is a possible histological presentation of breast cancer molecular subtypes.

Fig. 6: Cell-type-specific nuclear morphology enables classification of breast cancer molecular subtypes.

One-vs.-all binary classification of breast cancer molecular subtypes (luminal A, luminal B, HER2-like, basal-like, and normal-like)46 was performed using random forest classification on nuHIFs derived from (a) cancer cells, (b) fibroblasts, (c) lymphocytes, and (d) aggregated cell types. Five-fold stratified cross-validation was used, and mean AUROC for each of the iteratively held-out test sets is reported here.

Fibroblast nuclear morphology is associated with survival and gene expression patterns in breast cancer

The interplay between fibroblasts and cancer cells is complex and prognostically relevant, as associations between cancer-associated fibroblasts (CAFs) and cancer progression have been recently described29,30,31,32. Notably, in breast cancer, CAFs have been shown to contribute to prognosis33, while CAF subset heterogeneity correlates with metastasis34. We therefore hypothesized that fibroblast nuHIFs in BRCA would be clinically prognostic, independent of further molecular testing. We sought to identify fibroblast nuHIFs that are associated with progression-free survival (PFS) and/or overall survival (OS). We performed regression between each fibroblast nuHIF and PFS and OS using Cox proportional hazards models with patient age and ordinal cancer stage as regression covariates. After FDR correction, multiple fibroblast nuHIFs were significantly prognostic of PFS (Supplementary Table 5) and OS (Supplementary Table 6). Features quantifying the same general attribute, e.g., nuclear area and nuclear axis length as measures of size, were indeed found to be correlated with one another (mean pairwise Pearson r = 0.90 for fibroblast nuclear area, major axis length, minor axis length, and perimeter). We selected the mean fibroblast nucleus area (“MEAN[FIBROBLAST_NUCLEUS_AREA]_H & E”) for further evaluation, and show Kaplan–Meier survival curves for PFS and OS for the population binarized at the median value of this feature. High nuclear area was prognostic of worse outcomes (Fig. 7, PFS HR = 1.81, 95% CI [1.32–2.48], p = 0.0002; OS HR = 1.77, 95% CI [1.22–2.56], p = 0.002).
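The median split and Kaplan–Meier estimate described above can be sketched as follows. This covers only the binarization and the survival curve; the hazard ratios reported here come from Cox proportional hazards models, which this sketch does not implement. The function names and toy follow-up data are assumptions for illustration.

```python
from statistics import median


def split_by_median(values):
    """Flag samples at or above the median (the 'high' group)."""
    cut = median(values)
    return [v >= cut for v in values]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    `events[i]` is 1 if the event was observed at `times[i]` and 0 if the
    sample was censored. Returns [(time, survival)] at each event time.
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        observed = 0
        leaving = 0
        # Collect all samples sharing this time point.
        while i < len(ordered) and ordered[i][0] == t:
            observed += ordered[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored samples leave the risk set.
        at_risk -= leaving
    return curve


# Toy follow-up data (hypothetical): the sample at time 2 is censored.
example_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
example_groups = split_by_median([10.0, 20.0, 30.0, 40.0])
```

In use, `split_by_median` would be applied to the per-slide feature values, and `kaplan_meier` would then be run separately on the high and low groups to produce the two curves being compared.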

Fig. 7: Association between fibroblast nuclear area and survival in breast cancer.

Increased fibroblast nuclear area (≥50th percentile) corresponds to poor PFS (HR = 1.8163, 95% CI [1.3119–2.4823], p = 0.0002) and OS (HR = 1.7753, 95% CI [1.2206–2.5620], p = 0.0022).

Having identified this relationship between fibroblast nuclear size and prognosis, we sought to assess whether mean fibroblast nuclear area was associated with differences in bulk gene expression in breast cancer. We computed the rank-based (Spearman) correlation between fibroblast mean nuclear area and each gene in TCGA bulk gene expression to identify genes associated with this nuHIF (see “Methods” for details). Fibroblast nuclear area was significantly, albeit weakly (absolute r > 0.15), associated with expression of numerous individual genes (Supplementary Table 7). In contrast to the weak associations observed at the individual gene level, gene set enrichment analysis performed on the genes associated with morphology revealed significant relationships between fibroblast nuclear size and levels of several previously identified expression pathways. Notably, larger fibroblast nuclear size showed positive association with gene expression in pathways associated with degradation and remodeling of the extracellular matrix (Supplementary Table 8) indicating higher fibroblast activity. Meanwhile, larger fibroblast nuclear size showed negative association with the expression of genes in pathways relating to immune response to the tumor, such as B cell receptor signaling and lymphoid cell interactions with non-lymphoid cells (Supplementary Table 9). Taken together, these results suggest that fibroblast nuclear morphology is indicative of underlying patterns of gene expression and is thus biologically grounded.

Discussion

In this study, we have presented a pan-tissue approach for nucleus segmentation, classification, and featurization on entire whole-slide pathology images. This method enabled the construction of predictive models and the identification of features linking nuclear morphology with quantitative biomarkers across BRCA, PRAD, and LUAD. These results highlight the potential of ML-enabled quantification of nuclear morphometry as a prognostic feature of many cancer types and a potential biomarker to be used by pathologists. Furthermore, this approach enables the quantitative testing of hypotheses and numerical quantification of histological relationships proposed by pathologists (e.g., by establishing a numerical relationship between nuclear atypia and disease metrics). In addition, our approach enables the data-driven identification of sub-visual changes that may be clinically meaningful.

One particular strength of our approach is the ability not only to measure morphologic features associated with nuclei in a cancer specimen, but also to assign a cell class to each nucleus. To our knowledge, this work provides the first characterization of nuclear morphologies of specific cell types in different cancers at scale. As such, we were able to assess the associations of cancer cell nuclear morphology with clinically-relevant metrics and also to examine these relationships using nuclear features of fibroblasts and lymphocytes. For example, fibroblast nuHIFs provided a clear separation of cancer types in both unsupervised and supervised analyses, indicating that the nuclear morphologies of fibroblasts differ in breast, lung, and prostate cancers. Given recent observations that CAFs can be classified into multiple functional subtypes based on gene expression35, the distinctive nuclear morphologies seen in fibroblasts of breast, lung, and prostate cancers suggest that fibroblasts may contribute to cancer progression differently in these three cancer types. Importantly, we cannot distinguish between the multiple known subtypes of intratumoral fibroblasts using the approach described herein. This caveat is particularly relevant to the associations of fibroblast nuclear morphology with gene expression in breast cancer. Increased nuclear size was positively associated with an extracellular matrix remodeling gene expression profile and negatively associated with the expression of genes relating to anti-tumor immune response (Supplementary Tables 8 and 9). Interestingly, single-cell analysis of fibroblasts in breast cancer has revealed several disparate populations, including an immunosuppressive population characterized by the expression of genes involved in collagen production and extracellular matrix remodeling and a separate class with an inflammatory gene expression profile36. While our model cannot directly predict the presence of these fibroblast sub-populations, given the prognostic associations of nuclear morphology in our dataset and sc-RNAseq expression36, it will be of interest to test whether specific nuclear features of CAFs associate with functional subtypes.

Furthermore, nuclear features derived from our model were associated with PFS and OS in breast cancer. It is worth noting that this analysis, while incorporating patient age and clinical stage as regression covariates, was conducted on a large cohort of patients across study sites for whom relevant clinical information (e.g., treatment history) was not readily available. Therefore, while our result linking fibroblast nuclear morphology to prognosis in breast cancer is intriguing, further study in more controlled patient cohorts is needed to confirm this observation.

Herein, we observed that nuclear morphology differed between cancers as assessed using nucleus segmentation models. This result was observed not only for cancer epithelial cells and fibroblasts, but also, surprisingly, for lymphocytes. Caution is warranted in interpreting this finding: batch effects between slides from different tumor groups could plausibly drive variation in nuclear presentation, especially through differences in pre-analytic variables such as slide preparation and staining. However, it is also plausible that this finding reflects the differences in genetic and epigenetic landscapes between tumor types, levels of genomic instability, and overall differences in cancer evolution between these cancer types that may manifest as disparate nuclear morphologies.

The observed relationship between greater variation in cancer nuclear area and genomic instability was consistent across cancer types, indicating a quantitative link between nuclear pleomorphism and genomic instability pertinent to numerous cancer histologies. Prior analyses have noted an association between increased variation in nuclear size and whole genome doubling, suggesting a direct link between variation in nuclear size and genomic instability19,20. Additional work has noted a correlation between nuclear morphology and HRD in luminal and triple-negative breast cancer37. Given that nuclear size reflects DNA content, variation in nuclear size features between cells may be linked to underlying genomic instability. Similarly, recent work identified a histologic signature based on variability in nuclear morphology in pancreatic cancer cells that was associated with improved response to gemcitabine but was not associated with a previously defined gene expression-based disease subtype24. Pancreatic cancer patients with BRCA1/2 mutations, associated with increased genomic instability, are known to respond more favorably to therapy regimens involving gemcitabine38; thus, our result that nuclear variation is associated with genomic instability may explain this recent finding. Our observation regarding variability in nuclear size (measured here by the standard deviation of cancer cell nuclear area) is consistent with these prior hypotheses and allows them to be tested on a larger scale for each case (all cells of each cell type in the WSI). While the biological result linking nuclear morphology with genomic instability is not novel, recovering this expected result through analysis of our novel model-derived nuclear features supports the technical robustness and biological applicability of our approach.

One mitigation to potential batch effects is to analyze nuclear morphology within a single cancer type, and additionally to focus on size and shape features that are more likely to be robust to tissue preparation variabilities. For example, in breast cancer, we observed a clear relationship between fibroblast nuclear size, prognosis, and gene expression patterns. In breast cancer, increased fibroblast nuclear area was positively correlated with gene expression in extracellular matrix remodeling pathways and negatively correlated with genes in anti-tumor immune response pathways. The CAF subtypes present in a breast cancer sample may impact the tumor immune microenvironment35. While it would be interesting to posit that fibroblast nuclear morphology could reflect these subtypes, the ability to explore this is precluded by the use of bulk RNAseq data, since fibroblast nuclear features and the bulk expression profiling reflect a summarization of a whole slide. However, because nuclear morphology is quantified at single-cell resolution, this approach could be tied directly to single-cell expression analysis. Further work is necessary to delineate the functional relevance of nuclear morphology changes in fibroblasts in cancer.

As noted, batch effects have the potential to influence the interpretation of model outputs because data are aggregated across different sites, sources, and preparation laboratories. Pixel size variability, due to slide scanning at different MPP resolutions, is one way in which these differences may manifest, but there are others to consider as well: differences in stain reagents, sample preparation, sample storage, or other pre-analytical variables. For the analyses described herein, the median MPP values were highly similar across the three indications, with the BRCA MPP slightly lower than that of LUAD and PRAD (Supplementary Fig. 4). That said, to further guard against differences in pixel dimension contributing to bias, the size-related features of the nuclei are reported here in units of microns or square microns, which are obtained by multiplying the size of the mask in pixels by the appropriate MPP conversion factor. Thus, differences in MPP should not propagate into length features, and the slide scan characteristics should not bias the features. Furthermore, we measured the Pearson correlation between nuclear size (using major axis length as a representative feature) and MPP for each cell type within BRCA, LUAD, and PRAD datasets individually to eliminate the potential effect of possible inter-cancer-type variation in nuclear size (Supplementary Fig. 6). While the within-cancer-type variation in mean nuclear major axis length between slides at the same MPP is large, the magnitude of the within-cancer-type Pearson correlations is low, although some rise to the level of significance (likely due to the high power of the large datasets). Lastly, it is worth noting that cancer-type differences in nuclear size appear to be of larger magnitude than would be expected if MPP bias were the primary driving factor (Supplementary Fig. 5). Thus, we are confident that the observations noted in this study regarding nuclear size features are not biased by scan-specific metrics.
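The unit conversion described above can be illustrated with a short sketch, assuming square pixels so that a single MPP value applies in both directions. The function names are hypothetical and the numbers are toy values; lengths scale with MPP and areas with its square.

```python
def pixels_to_microns(length_px, mpp):
    """Convert a length in pixels to microns given microns per pixel (MPP)."""
    return length_px * mpp


def pixel_area_to_square_microns(area_px, mpp):
    """Convert a mask area in pixels to square microns.

    Area scales with the square of the linear pixel size (square pixels
    are assumed in this sketch).
    """
    return area_px * mpp ** 2


# The same hypothetical nucleus scanned at two resolutions: it covers four
# times as many pixels at 0.25 MPP as at 0.5 MPP, yet the physical area agrees.
area_fine_scan = pixel_area_to_square_microns(1600, 0.25)
area_coarse_scan = pixel_area_to_square_microns(400, 0.5)
major_axis_um = pixels_to_microns(40, 0.25)
```

Because both scans yield the same physical area, a difference in scanner resolution alone would not be expected to shift size features reported in microns.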

The approach that we undertook for nucleus segmentation and morphometry analysis in this paper has several key strengths. First, the ability to compute human-interpretable nuclear features at scale enables testing quantitative biological hypotheses, rather than relying on by-eye estimation of parameters such as variation in nuclear morphology. The ability to perform these analyses on WSIs of H&E-stained cancer tissue additionally obviates the need to hand-select regions of interest, which may contribute to biased analyses. In addition, we were able to train and deploy our model on tissues from diverse cancer types, suggesting that the model can be readily deployed on samples from varied cancer indications39.

A particular strength of this approach is the interpretability of the predictions made. While HIF-based predictive clinical models are inherently less flexible than end-to-end black-box approaches (and, thus, can yield lower performance), they benefit from the lower dimensionality of features as a method of regularization, as the HIFs used herein directly map to low-dimensional representations of the tissue image. Furthermore, HIF-based models allow researchers and clinicians to learn from the features and generate novel hypotheses without discarding the wealth of known biology.

Although our results point to the potential of nuclear segmentation, classification, and feature analysis as a clinical screening tool, our study is limited in that our biomarker analysis was focused on academically curated datasets. These datasets were selected due to their size, completeness, and rich genomic and transcriptomic profiling data. Construction and validation of generalizable predictive machine-learning models requires the inclusion of a broad range of training and validation data, and future efforts should focus on validating these hypotheses in additional cohorts. The technical approaches we describe here have been validated by their application to other clinical datasets, showing the generalizability of this methodology and the robustness of these models (data not shown).

In sum, this work highlights the power of ML-driven quantitative nuclear morphometry in multiple cancer types. The models and resulting features described herein have the potential not only to aid pathologists and research teams in discerning novel biomarkers but to provide meaningful prognostic information for cancer patients. The ability to measure these features robustly and consistently at scale may enable the development of improved clinical tools for advancing precision medicine.

Methods

Study design

Manually collected annotations were used to train and validate an object detection and segmentation model to detect and segment nuclei from H&E-stained tissue slides. Training data variation and number of annotations were selected to exceed previously used standards in the field27 and exhibit wide variation in tissue morphology as subjectively assessed by study pathologists (MGD and LY). This model was deployed on whole-slide H&E images from The Cancer Genome Atlas (TCGA) to extract features from each nucleus in each slide, and the resulting features were used to analyze the relationship between nuclear morphology and underlying molecular markers of cancer, and patient outcomes. Inclusion of TCGA slides was performed in accordance with literature norms (e.g. as by Saltz et al.40): TCGA slides were selected to be the DX1 (primary diagnostic) slide for each case in TCGA and no outlier exclusion was performed, to conservatively reflect real-world conditions where same-case replicates may not be available. Where multiple hypotheses were tested, all reported statistics were corrected to control false discovery rate as described below.

Dataset description and annotation collection

Over 29,000 manual annotations of cell nuclei were collected from H&E images from 21 tumor types at ×40 and ×20 magnification from TCGA41. Additional H&E-stained tissue biopsies of skin, liver non-alcoholic steatohepatitis, colon inflammatory bowel disease, and kidney lupus were also utilized. These samples were commercially acquired from Precision for Medicine (Frederick, Maryland) or Inform Diagnostics (formerly Miraca Life Sciences, Irving, Texas) or were generously provided by Dr. Fabio Tavora (Argos Laboratory, São Paulo, Brazil) or Dr. Robert Najarian (University Gastroenterology, Portsmouth, RI). Samples were provided by Drs. Tavora and Najarian under sample acquisition agreements approved by the Institutional Review Board, Independent Ethics Committee, or equivalent authority at Argos Laboratory and University Gastroenterology, respectively. Board-certified pathologists (MGD and LY) selected 1000 × 1000 pixel patches that were exemplary of varied tissue and nuclear morphology from the training slides and trained collaborators to perform exhaustive manual annotation of nuclei in the patches. Annotations were checked for quality, adjusted, and confirmed by MGD and LY. This process resulted in 67 WSI patches exhaustively annotated for nuclei. These patches were split into training, validation, and held-out test datasets to ensure distribution of tissue types (Table 1).

Table 1 Samples used for training and evaluating the segmentation model

Following model training and initial testing, two further data sources were used to collect additional annotations for model testing. H&E-stained slides of ulcerative colitis were obtained from BioIVT (Westbury, NY), and H&E-stained breast cancer slides were generously provided by Cleveland Clinic Foundation (CCF; Cleveland, OH) under a data licensing arrangement approved by the CCF Institutional Review Board. An additional 14 patches of 512 × 512 pixels were identified from these data sources (seven patches from each source), and an additional 2647 manual, exhaustive nucleus annotations were collected for model evaluation (Table 2).

Table 2 Samples used for out-of-distribution (OOD) evaluation of model performance

This work complied with all relevant ethical regulations, including the Declaration of Helsinki. Samples utilized for this study were procured from clinical sources, public databases, or biobanks. For all cohorts used herein, patients provided informed consent for their tissue being used for research purposes, with few exceptions: in some instances, the consent for patients whose tissues were obtained from biobanks was considered “waived” due to the length of time that had passed since the tissue was collected.

Nuclear segmentation model architecture

A Mask-RCNN-style architecture was selected for nuclear segmentation. A ResNet50 backbone pretrained on the ImageNet dataset was used to produce the feature pyramid network. The first two of five modules that comprise ResNet50 were frozen during training to preserve the pretrained weights of early layers. Model development was performed using the PyTorch library42.

Nuclear segmentation model training

The manually-collected annotations were used to train the model for detecting and segmenting cellular nuclei (Fig. 1a). During training, the annotated patches were augmented by crops, flips, rotations, and affine deformations.

Cell classification

Following nuclear segmentation, the cell class of each nucleus was assigned using PathExplore™ (PathAI, Boston, MA)43 models specific to breast cancer (BRCA), lung adenocarcinoma (LUAD), and prostate adenocarcinoma (PRAD); PathExplore is for research use only and is not for use in diagnostic procedures. Cancer epithelial cells, fibroblasts, macrophages, lymphocytes and plasma cells were predicted for all three cancer types, while additional cell classes were predicted for LUAD (granulocytes and normal cells) and PRAD (smooth muscle cells, endothelial cells, and normal epithelial cells). Model performance for the prediction of cell types was assessed by comparing model predictions to pathologist annotations in nested pairwise fashion44. Model performance metrics for BRCA, LUAD, and PRAD are shown in Supplementary Figs. 1–3, respectively, and Supplementary Tables 1 and 2. Example prediction results are shown in Fig. 3. The five pan-indication cell classes (cancer epithelial cells, fibroblasts, macrophages, lymphocytes, and plasma cells) were used for analyses assessing the biological implications of nuclear feature differences in BRCA, LUAD, and PRAD.

Deployment dataset and feature extraction

The nuclear segmentation model was deployed on publicly available images of H&E slides from the BRCA (N = 886), PRAD (N = 392), and LUAD (N = 426) TCGA cohorts; a summary of clinicopathologic features of each cohort is shown in Table 3. Model performance was qualitatively assessed by board-certified pathologists and determined to be consistent with performance on the held-out test dataset. The features computed for each individual nucleus were: area, circularity, eccentricity, major and minor axis length, perimeter, solidity, and the mean and standard deviation of pixel grayscale intensity, pixel saturation, and pixel A and B channels in LAB colorspace. The mean and standard deviation of each feature from each nucleus class on the slides were used to summarize the nuclear morphology on each slide. This yielded 30 slide-level nuHIFs for each cell type, e.g. the mean area of cancer nuclei, the standard deviation of fibroblast nuclear eccentricity, or the mean pixel grayscale intensity of lymphocyte nuclei. Attributes and features described by nuHIFs are included in Fig. 1b. Thus, the total number of features summarizing the morphology on each slide was 30 times the number of cell classes.
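The slide-level summarization described above can be sketched as follows. This is an illustrative example only: the input layout, the feature-naming scheme, and the use of the population standard deviation are assumptions, as the text does not specify them. With the 15 per-nucleus features listed above, taking the mean and standard deviation would give the 30 nuHIFs per cell type.

```python
from collections import defaultdict
from statistics import mean, pstdev


def slide_level_features(nuclei):
    """Summarize per-nucleus features as mean and std for each cell class.

    `nuclei` is a list of dicts, each holding a "cell_class" key plus
    numeric per-nucleus features (hypothetical layout).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for nucleus in nuclei:
        for name, value in nucleus.items():
            if name != "cell_class":
                grouped[nucleus["cell_class"]][name].append(value)

    summary = {}
    for cell_class, features in grouped.items():
        for name, values in features.items():
            summary[f"mean_{name}_{cell_class}"] = mean(values)
            # Assumption: population standard deviation (ddof = 0).
            summary[f"std_{name}_{cell_class}"] = pstdev(values)
    return summary


# Hypothetical nuclei from one slide (areas in square microns).
example_nuclei = [
    {"cell_class": "cancer", "area": 40.0, "eccentricity": 0.5},
    {"cell_class": "cancer", "area": 60.0, "eccentricity": 0.7},
    {"cell_class": "fibroblast", "area": 30.0, "eccentricity": 0.9},
]
example_summary = slide_level_features(example_nuclei)
print(sorted(example_summary))
```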

Table 3 Characteristics of patients in TCGA cohorts

Exploring cancer type and nuclear morphology

To compare the nuHIFs quantifying cancer cell, fibroblast, and lymphocyte morphology, uniform manifold approximation and projection (UMAP) analysis was performed. Nuclear HIFs were z-scored across all cancer types for standardization. UMAP was parameterized with 100 neighbors, an embedding dimension of 2, and the Euclidean distance metric. Features characteristic of each cancer type were evaluated by averaging each feature across the samples of each cancer type and z-scoring for visualization; hierarchical clustering (using Euclidean distance with average linkage) identified features that varied across cancer types.
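The standardization step above can be sketched as a pooled z-score, in which each feature is centered and scaled using all slides from all cancer types together rather than within each cancer type. The function name and the handling of constant features are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each feature (column) across all samples (rows) pooled together."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        # Assumption: a constant feature is mapped to 0.0 instead of dividing by zero.
        [(x - mu) / sd if sd > 0 else 0.0 for x, mu, sd in zip(row, mus, sigmas)]
        for row in rows
    ]


# Hypothetical slides (rows) by nuHIFs (columns), pooled over cancer types.
example_rows = [[1.0, 10.0], [3.0, 10.0]]
example_z = zscore_columns(example_rows)
print(example_z)
```

Pooling keeps between-cancer-type differences visible in the standardized features, which is what a cross-cancer embedding is meant to compare.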

Classifying cancer type from nuclear morphology

Random forest (RF) binary classification models were trained and applied to each cell-type-specific nuHIF set to differentiate between pairs of cancer types. RF classification models were trained using 5-fold stratified cross-validation with balanced class weighting. The performance of each model was assessed using the area under the receiver operating characteristic curve (AUROC) on each held-out validation split. The mean AUROC on the held-out validation splits is reported. RF model training was performed in scikit-learn with default hyperparameters (100 trees)45.
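The evaluation metric described above can be illustrated with a small sketch that computes AUROC on each held-out split and averages across splits. It does not train a random forest; the labels and scores below are hypothetical stand-ins for held-out predictions, and the helper names are assumptions.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(folds):
    """Average the AUROC over held-out folds given as (labels, scores) pairs."""
    return mean(auroc(labels, scores) for labels, scores in folds)


# Hypothetical held-out predictions from two validation splits.
example_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 1, 0], [0.2, 0.9, 0.6, 0.3]),
]
print(mean_auroc(example_folds))
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a sketch; library implementations typically use a sort-based computation.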

Classifying breast cancer subtype from nuclear morphology

Characteristics of breast cancer molecular subtypes (luminal A, N = 457; luminal B, N = 159; HER-2, N = 66; normal-like, N = 31; basal-like, N = 161) were obtained from a prior study by Berger et al.46. Random forest (RF) binary classification models were trained and applied to each cell-type-specific nuHIF set to differentiate between subtypes in a one-vs.-all manner. RF classification models and cross-validation schemes were identical to cancer-type classification.

Statistical analysis

Spearman (rank-based) correlation was used to find the association between variation in cancer nuclear morphology and metrics of genomic instability. Variation in size was captured by the nuHIF “standard deviation of cancer cell nuclear area” for each slide. For metrics of genomic instability, previously published metrics were selected: aneuploidy score47 and homologous recombination deficiency (HRD) score48. RF binary classification models were trained in scikit-learn with default hyperparameters45 using 5-fold stratified cross-validation with balanced class weighting, and applied to the cancer nuHIF set from each cancer type to predict binarized whole-genome doubling (WGD; 1-2 doublings = 1; no doublings = 0). The performance of each model was evaluated using AUROC on each held-out validation split, and the mean AUROC is reported. The mean RF Gini importance (also called the mean decrease in impurity) across the five splits is reported for the top five features for each cancer type. Cox proportional hazard models were utilized to explore the relationship between BRCA fibroblast nuHIFs and overall and progression-free survival (OS and PFS, respectively). Ordinal tumor stage (1–4) and patient age were included as clinical covariates; 17 subjects missing tumor stage and one subject missing survival data were excluded. Robust z-scoring (i.e. using the median and scaled interquartile range) of each nuHIF was performed before modeling to simplify interpretation of the hazard ratios (HRs). The p values associated with each nuHIF were corrected for false discovery rate (FDR) by the Benjamini–Hochberg procedure. Survival analyses were performed using the Lifelines library49. Gene expression data were acquired from the Genomic Data Commons (GDC)-processed TCGA BRCA cohort (release 18.0) from the UCSC Xena data portal50. Gene expression samples were paired to case-matched slides in our dataset, yielding 868 expression-nuHIF pairs.
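The Benjamini–Hochberg correction mentioned above can be sketched in a few lines. This is a generic illustration of the standard step-up procedure, not the code used in the study; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four features.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

A feature would then be called significant at a 5% false discovery rate when its adjusted value falls below 0.05.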
For each gene, Spearman (rank-based) correlation was used to quantify the association between bulk RNAseq expression and the mean fibroblast nucleus area nuHIF, and the resulting p values were corrected for FDR via the Benjamini–Hochberg procedure. Genes with corrected p < 0.05 and Spearman correlation greater than 0.15 or less than −0.15 were selected to comprise the significant positively and negatively associated gene sets, respectively, for gene set enrichment analysis (GSEA). GSEA51 was performed using the Molecular Signatures Database (MSigDB)52 and the REACTOME pathway database53, and the ten most significant pathway overlaps, with FDR-corrected p < 0.05, are reported.
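The gene selection step above can be sketched as follows, assuming that the correlation and FDR-corrected p value for each gene are already available. Spearman correlation is computed here as the Pearson correlation of average ranks; the gene identifiers, function names, and example values are hypothetical placeholders.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (undefined, and raises, if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def select_gene_sets(results, rho_cutoff=0.15, alpha=0.05):
    """Split genes into positive and negative sets; results maps gene -> (rho, adjusted p)."""
    positive = [g for g, (rho, q) in results.items() if q < alpha and rho > rho_cutoff]
    negative = [g for g, (rho, q) in results.items() if q < alpha and rho < -rho_cutoff]
    return positive, negative


# Hypothetical per-gene results: (Spearman rho, FDR-corrected p value).
example_results = {
    "GENE_A": (0.30, 0.001),
    "GENE_B": (-0.20, 0.01),
    "GENE_C": (0.40, 0.20),
    "GENE_D": (0.05, 0.001),
}
example_positive, example_negative = select_gene_sets(example_results)
print(example_positive, example_negative)
```

Applying both the significance and effect-size thresholds excludes genes whose correlations are statistically detectable only because of the large sample size.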

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.