Introduction

In this report, we survey the history and role of machine learning and artificial intelligence in anatomic pathology along with the tools and methods that are used to visualize and manage the data generated by algorithms. Digital pathology utilizes computational, deep learning-based image analysis methods to extract the information embedded in high-resolution whole-slide images (WSIs) of tissue sections and obtain quantitative data. We use the term Pathomics to embody the wide variety of data captured from image analyses to generate quantitative features that characterize and describe the diverse phenotypic features of tissue samples in WSIs. Image analysis of tissues is based on the ability to (1) detect, (2) segment, (3) label, and (4) classify regions of tissue in terms of architectural elements, cells, nuclei, and other histologic features. Digital pathology image analysis methods are being developed to study several types and subtypes of cancer in order to correlate quantified features with various phenotypic characteristics and complement traditional histopathologic evaluation performed by pathologists. The ultimate goal of these higher-order analyses is to combine pathomics with patient management, radiologic, laboratory testing, and genomic data to analyze and predict clinical outcomes and treatment response.

Even though digital pathology is gaining momentum and becoming more widely adopted in clinical research, it is currently still limited to single-hospital deployments in the USA due to the availability of only one FDA-approved digital slide scanner. Current clinical applications that utilize WSIs include intradepartmental and outside consultation, improving workflow by electronically transferring WSIs from the histology laboratory to pathologists, displaying images for interdisciplinary tumor boards, performing pathologic review for clinical trials, archiving glass tissue slides, and teaching. As the technology becomes more available, the demand for readily available digital pathology image analysis tools appears inevitable. Thus, there is considerable active research and development of image analysis tools combined with machine learning algorithms that extract, calculate, and analyze a wide variety of histopathologic features in an effort to improve diagnostic evaluation of tissue samples and quantitatively characterize the biological behavior of many different types of cancer. Currently, there are large collections of digitized glass slides containing tissue sections stained with hematoxylin and eosin (H&E), since pathologists routinely utilize H&E for diagnostic examination.

WSIs typically contain hundreds of thousands to millions of objects within a heterogeneous histologic landscape that includes different types of structures, various types of cells, cell morphologies, and spatial relationships that underlie cellular interactions and signaling. Considerable research efforts are being devoted to identifying and quantifying these histologic features in tissue samples in order to further explore the intrinsic variability of the biological behavior of different phenotypes across the spectrum of cancer in various anatomic sites. Digital pathology can empower pathologists to observe and quantitatively assess diagnostic features of cancer during histopathologic examination by providing quantitative data about different types of cells and tissue structures across the entirety of a WSI: enhanced measurements of the tumor and improved delineation of the invasive border, calculated percentages of glandular or solid patterns of growth, counts and distributions of different types of cells, calculated nuclear features such as size, area, color, chromatin density, and mitotic activity, and calculated percentage of necrosis. Common examples of pathomics applications include spatial characterization of tumor and stromal regions, shapes and textures of nuclei, classification of cell types, quantitative characterization of lymphocytic infiltration, and efforts to quantitatively estimate the number of cells labeled with different kinds of biomarkers in immunohistochemistry (IHC) testing.

Tissue-level, cellular-level, and nuclear-level histologic features are evaluated and documented by pathologists in diagnostic reports for every case, after which the slides are typically filed away in storage. With the advent of digital pathology, these slides are immediately available for further examination and image analyses that quantify these kinds of features in large-scale collections of WSIs to identify various types of relationships and interactions between malignant and non-tumor cells. In this scenario, pathomics can improve the ability of pathologists to predict biological behavior and clinical outcomes and to guide treatment by comparing the features of a particular WSI to their own collections of WSIs or to those from other studies and clinical trials. Over the past 20 years, the field of digital pathology has developed several types of applications to provide valuable quantitative data in a variety of correlative and prognostic studies. Thus, the ultimate goal is to develop and implement deep learning pathomics tools that provide real-time clinical decision support (CDS), support quality assurance and control (QA/QC), and actively integrate various types of other data alongside routine histopathologic examination of tissue sections on glass slides.

In addition, digital pathology image analysis methods are being developed to utilize pathomics to explore tumor heterogeneity, since varying degrees of disease progression, clinical outcomes, and treatment response correspond to the range of histologic features and genomic variability in different populations of tumor cells. Pathomics applications are also being designed to teach trainees and to help reduce pathologic interpretation bias, thereby improving diagnostic accuracy and reducing intra- and inter-observer variability. Digital pathology, pathomics, and machine learning methods are positioned to revolutionize our current level of knowledge about the various types and subtypes of cancer by allowing pathologists to extract and incorporate tremendous amounts of data per WSI. This capability is compounded by the scale of the studies that can be performed through analyses of archived WSIs within and across institutions, enabling the discovery of features that have not been readily apparent in smaller studies without these kinds of tools.

This report will focus on a description of traditional histopathologic evaluation by pathologists, image analysis tasks and machine learning, segmentation and classification, WSI analytic systems, and data and metadata management.

The Role of Pathomics in Traditional Histopathology

Surgical pathologists evaluate patterns of tumor growth and tumor cell morphology in tissues from biopsies and surgical resections, in conjunction with ancillary laboratory testing and radiology, to classify tumor types and subtypes. Pathologists comprehensively report diagnoses that identify tumor type and subtype, size, location, invasive growth pattern, mitotic rate, presence of tumor at surgical margins, and metastases to provide insight into the biological behavior of different tumors in order to guide patient management and select treatment. However, countless nuances, semantics, and observation biases are intrinsic to microscopic examination and the formulation of diagnoses.

Traditional histopathologic evaluation of cancer is performed at multiple scales of magnification and resolution in order to categorically examine various kinds of complex phenotypic features, as shown in Fig. 1. These phenotypic features are based on a wide variety of molecular events that lead to the presence and progression of disease. Even though pathologists examine the entirety of tissue samples, diagnosis, prognosis, and patient management are based on the phenotypic features of the foci of cancer that appear the most aggressive. At low magnification (typically 20× to 40×), pathologists examine (1) tumor cellularity, (2) architectural distortion, (3) tumor growth pattern, (4) tumor-associated stromal characteristics, (5) inflammatory responses, (6) presence or absence of necrosis, and (7) the invasive border with respect to adjacent surrounding normal tissues. Intermediate and high magnifications (typically 100× to 400×) are used to examine (8) distribution of cell types; (9) cell shapes; (10) relative cell sizes (compared with lymphocytes or endothelial cells); (11) cytoplasmic appearance and features; (12) distribution of the size of malignant nuclei; (13) nuclear color; (14) shapes of malignant nuclei; (15) irregularity and contour of the nuclear membrane; (16) chromatin texture, density, and color; (17) presence of nucleoli; and (18) mitotic activity.

Fig. 1
figure 1

Common appearances of histologic samples at low magnification and intermediate-high magnification. Left: Low magnification features include architectural distortion, increased tumor cellularity, solid tumor growth pattern, no overt areas containing dense inflammatory infiltrates, no overt areas of necrosis, and an irregular border with adjacent surrounding normal tissues. Right: Intermediate-high magnification features include predominantly medium-sized and round to ovoid nuclei with abundant eosinophilic cytoplasm. Nuclei contain irregular nuclear contours with open chromatin and variable numbers of small nucleoli. No mitoses are present in this microscopic field

Pathologists typically calibrate histologic examination at multiple scales of magnification by evaluating the aforementioned features in cells and nuclei in non-tumor tissues (normal epithelial tissues, connective tissues, immune cells, etc.) in order to identify irregular or abnormal features of tumor cells. For example, large and hyperchromatic (dark purple-blue) nuclei with abnormal shapes and irregular nuclear borders in a focal area of cancer can indicate high-grade malignancy. In contrast, large and euchromatic (grayish-purple) nuclei are most commonly seen in benign reactive cells in association with inflammatory, regenerative, infectious, and degenerative processes. However, pathologists do not and cannot routinely count or characterize every single cell out of the hundreds of thousands of cells in every tissue section. This basic type of information remains unknown and could be valuable: the number of tumor cells in a given cancer, the proportion of tumor cells at the leading invasive border, and the distribution of tumor cell sizes in different areas of the tumor. Evaluating mitoses is another common example of limited numerical assessment: pathologists typically examine 10 high-power fields (hpf, typically 400×) and provide an estimated mitotic rate based on the focus of tumor with the highest number of mitoses instead of counting the total number of mitoses present in the tumor.

A typical WSI of almost any tissue section that contains cancer also contains variable amounts of high-grade tumor cells, low-grade tumor cells, normal epithelial structures and cells, stromal connective tissues, adipose tissue, lymphovascular structures, immune infiltrates, and necrosis. Pathomics can be very useful to provide a quantitative assessment of many of these structures and the aforementioned features at multiple magnifications to complement traditional histopathologic evaluation by pathologists. Deep learning image analysis tools have been developed with the capacity to automatically detect regions of cancer in WSIs to complement low-magnification assessment of tumor growth pattern, tissue architecture distortion, and increased cellularity. Since pathologists also rely on the dark blue-purple color associated with high cellularity and increased density of chromatin in large malignant nuclei, cancer detection algorithms can be further combined with pathomics analyses that detect and count nuclei and provide quantitative information about the size, shape, texture, color, and other properties of the objects in that region. Additional analyses can be applied to identify and extract quantitative data describing the features of glands and vessels, lymphoplasmacytic infiltrates, and regions of necrosis in order to parallel and complement the histologic features that pathologists use to diagnose and grade cancer.
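To make the kind of per-nucleus measurement described above concrete, the following minimal Python sketch uses scikit-image to threshold the hematoxylin channel of an H&E tile and compute simple size, shape, and intensity features for each detected object. The thresholding step and the feature list are deliberate simplifications for illustration; production pipelines typically rely on the deep learning segmentation methods discussed later.

```python
import numpy as np
from skimage import color, filters, measure, morphology

def nuclear_features(rgb_tile):
    """Crude per-nucleus size/shape/intensity features from an H&E tile (NumPy RGB array)."""
    # Channel 0 of the HED stain-separation transform approximates hematoxylin (nuclei).
    hematoxylin = color.rgb2hed(rgb_tile)[..., 0]
    # Simple global threshold to obtain a rough nuclear mask.
    mask = hematoxylin > filters.threshold_otsu(hematoxylin)
    mask = morphology.remove_small_objects(mask, min_size=30)
    labels = measure.label(mask)
    features = []
    for region in measure.regionprops(labels, intensity_image=hematoxylin):
        features.append({
            "area": region.area,                      # nuclear size in pixels
            "eccentricity": region.eccentricity,      # shape irregularity (0 = circle)
            "mean_intensity": region.mean_intensity,  # proxy for staining/chromatin density
        })
    return features
```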

Since these various tasks are computationally intensive and may take several hours per algorithm to analyze hundreds of thousands of objects in WSIs, image analysis methods typically divide WSIs into tiles that can be as small as 50 by 50 pixels or as large as 2000 by 2000 pixels to circumvent constraints in computer memory and decrease processing time. Therefore, pathomics tools are usually based on the ability to detect, segment, label, and classify regions of tissue, tissue structures, cell types, and nuclei in tiles, which are then combined to represent the analyses for entire WSIs. This, too, is very similar to how pathologists evaluate numerous regions of interest (ROIs) by physically moving the slide through the field of view under microscope objectives to evaluate the entirety of a tissue section at various scales of magnification and resolution. Digital pathology image analysis methods are either global, attempting to simultaneously identify all the structures in a specific ROI, or localized, detecting and segmenting physiologic structures based on the number of objects, cells, and nuclei.
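The tiling step itself is straightforward to express with an open-source WSI reader. The sketch below, which assumes the OpenSlide Python bindings and reads at the base resolution (level 0), only illustrates the general pattern of iterating over fixed-size tiles; real pipelines add tissue-versus-background filtering, overlap handling, and parallel execution.

```python
import openslide

def iter_tiles(wsi_path, tile_size=1024):
    """Yield (x, y) offsets and RGB tiles from the base (level 0) resolution of a WSI."""
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.level_dimensions[0]
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            # read_region takes level-0 coordinates and returns an RGBA PIL image;
            # tiles at the right/bottom edge may extend past the slide and are padded.
            tile = slide.read_region((x, y), 0, (tile_size, tile_size)).convert("RGB")
            yield (x, y), tile

# Example usage: analyze() is a hypothetical per-tile method whose results are stitched later.
# for (x, y), tile in iter_tiles("example.svs"):
#     results[(x, y)] = analyze(tile)
```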

The addition of quantitative deep learning image analysis and pathomics to descriptive traditional histopathologic evaluation by experienced pathologists is a very exciting frontier that can be immediately useful in routine surgical pathology workflow in many subspecialties of pathology. Pathomics is also well positioned to quantitatively define known salient features across thousands to hundreds of thousands of WSIs of cancer and to discover unknown features and relationships, providing insight into many different types of cancer in an unprecedented manner by improving our ability to characterize disease through improved diagnostic accuracy and decreased inter-observer variability. For example, if there is an unusually aggressive clinical course with treatment failure or tumor metastasis, digital pathology can be utilized to explore and capture quantitative information about the phenotypic features of specific cells, cellular structures, and focal areas of tumor that are beyond the scope of traditional histopathologic diagnosis in order to possibly provide an explanation for this kind of biological behavior.

Thus, there is considerable excitement about digital pathology image analyses and pathomics due to the opportunities to integrate the classification of histopathologic features of many different types of cancer with applicable diagnostic endpoints.

Image Analysis Tasks and Machine Learning

In this section, we outline whole slide image analysis tasks, which include the following: (1) assignment of a classification to a collection of WSIs arising from a given biopsy or resection (e.g., a set of WSIs might be classified using established classification guidelines such as Gleason grade for prostate cancer); (2) classification of a region of a whole slide image (e.g., an ROI classified as cancer, as infiltrated by lymphocytes, or as having a particular tissue morphology, such as a local Gleason pattern assigned to a tissue tile); and (3) detection and/or segmentation of specific microanatomic structures, such as specific types of cells or nuclei, with subsequent classification (e.g., mitotic figures, lymphocytes, or cancer cells). Accomplishing these tasks encompasses segmentation and classification. As shown in Fig. 2, segmentation extracts the locations and boundaries of objects (nuclei, cells, or regions) in an image, whereas classification groups objects and assigns class labels.

Fig. 2
figure 2

Examples of nuclear segmentation. The boundaries of the nuclei are highlighted in yellow based on differences in color, contrast, and texture

Earlier approaches in WSI analysis often employed statistical techniques to detect and delineate object boundaries based on the statistics of intensity and texture variations, clustering methods, binary classifiers, and probabilistic/non-probabilistic machine learning methods to classify regions and images [1,2,3,4,5]. In recent years, deep machine learning has significantly grown in popularity in the computer vision and image analysis communities, driven by the increased uptake of deep learning methods in Big Data and Internet companies and increased computing capacity through specialized computing devices (GPUs) and cloud-based computing [1, 5, 6]. Since there are many research and engineering projects that target the development of efficient and reliable image analysis techniques, software tools, and infrastructure, we describe recent work in segmentation and classification in selected current projects with the goal of providing an overview of the evolving digital pathology landscape.
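To make the contrast with deep learning concrete, the sketch below illustrates the flavor of these earlier approaches: gray-level co-occurrence texture statistics computed per patch and fed to a binary support vector machine. The training data (labeled_patches, labels) are placeholders for a manually annotated patch set; this is a generic illustration rather than any specific published pipeline.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def texture_features(rgb_patch):
    """Simple gray-level co-occurrence (GLCM) texture statistics for one image patch."""
    gray = (rgb2gray(rgb_patch) * 255).astype(np.uint8)
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, prop)[0, 0]
            for prop in ("contrast", "homogeneity", "energy", "correlation")]

# labeled_patches and labels are placeholders for an annotated training set.
X = np.array([texture_features(p) for p in labeled_patches])
y = np.array(labels)
classifier = SVC(kernel="rbf").fit(X, y)  # binary classifier, e.g., tumor vs. non-tumor patches
```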

Changes in subcellular tissue structure can function as valuable biomarkers that can be used to assess onset and progression of disease. The use of digital pathology data in clinical and research settings has been studied and validated in several studies [7,8,9,10,11,12,13]. Availability of tissue images can facilitate multi-institutional and national-level studies with large cohorts of patients. The Cancer Genome Atlas (TCGA) contains approximately 31,000 WSIs from diagnostic and frozen tissue samples spanning over 30 different cancer types collected from 11,000 patients. This resource has led to numerous digital pathology studies both in terms of novel methodology development and scientific inquiry [10, 12•, 14,15,16,17].

Another national consortium, the Surveillance, Epidemiology, and End Results (SEER) Program, collects tissue specimens from large cohorts of patients (about 500,000 patients per year) and is investing in digitizing glass tissue slides as WSIs. Imaging studies at this scale can reveal novel biomarkers and phenotypes that are common within cohorts of these patients and can eventually lead to more effective diagnoses and treatment strategies. Even in smaller studies, data from WSIs enable quantitative, objective, and reproducible characterizations of tissue that are not possible with histopathologic evaluation of glass slides. These kinds of capabilities have great potential for improving the prediction of clinical outcomes and treatment response as we continue to evaluate and refine workflows and systems that can play crucial roles in precision medicine.

Qaiser et al. [18, 19] designed a tumor segmentation method that uses persistent homology profiles (PHPs) and deep convolutional networks, where PHPs map a given image patch into one-dimensional statistical distributions that represent the degree of nuclear connectivity; segmentation is based on a combination of the PHPs and features learned by a convolutional neural network (CNN). Lin et al. [20] proposed a fast histopathology image inference network for cancer metastasis detection in whole slide tissue images, taking advantage of fully convolutional architectures for speed while using deep convolutional and pooling layers to improve prediction accuracy. Cruz-Roa et al. [21•, 22] proposed a set of CNNs for detection of invasive cancer regions that uses an adaptive sampling method to select the most relevant image patches, instead of densely scanning the entire whole slide image, to predict regions of tissue with invasive cancer.

Hou et al. [23•] developed a sparse convolutional autoencoder that enables the detection and encoding of nuclei in an image patch into a sparse feature map, which is then processed to segment the boundaries of nuclei. Zheng et al. [24] devised a CNN method to extract and characterize distributions of nuclei in tissue images, whereas Janowczyk et al. [25] devised an adaptive deep hierarchical approach to nuclear segmentation that leverages deep learning models at lower image resolutions to increase the speed of computations while preserving accuracy. Al-Milaji et al. [26] developed a CNN-based pipeline to classify regions of tissue in H&E WSIs into stromal and epithelial regions. A related approach was proposed by Yu et al. [27] with a machine-learning model that learned class dictionaries to classify tissue images.

Mobadersany et al. [12•] implemented a method that combines image analysis by CNNs with genomic markers in a unified machine learning model to predict the survival of patients with glioma. The deep learning architecture consists of convolutional layers that are trained to predict image patterns associated with survival, fully connected layers that further transform the image features from the convolutional layers, and a Cox proportional hazards layer that models survival data. Peikari and Martel [28] proposed a color transformation step that maps the red-green-blue (RGB) color space by computing eigenvectors of the RGB space and then performs cell segmentation on the color-mapped image. Sirinukunwattana et al. [29] employed a deep learning method to detect and classify nuclei in H&E-stained color cancer tissue images by implementing a spatially constrained CNN for nucleus detection, followed by a predictor coupled with a CNN for classification. Deep learning-based analysis frameworks have also been used for analyses of prostate cancer cases with Gleason scores [29]. The framework implements methods that combine data from whole slide tissue images and genomic data to identify computational biomarkers for recurrence and survival analysis.
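For readers unfamiliar with survival modeling, the Cox proportional hazards layer mentioned above is essentially a linear output whose training objective is the negative Cox partial log-likelihood computed over a batch of patients. The PyTorch sketch below (ignoring tied event times) illustrates that general idea only; it is not the published implementation of [12•].

```python
import torch

def cox_partial_likelihood_loss(risk_scores, times, events):
    """Negative Cox partial log-likelihood (no tie handling).
    risk_scores: (N,) predicted log-hazards; times: (N,) follow-up times (float);
    events: (N,) 1.0 if the event was observed, 0.0 if censored."""
    # Sort by descending time so each sample's risk set is a cumulative prefix.
    order = torch.argsort(times, descending=True)
    risk, observed = risk_scores[order], events[order]
    # log of the summed hazards over all patients still at risk at each event time
    log_risk_set = torch.logcumsumexp(risk, dim=0)
    return -torch.sum((risk - log_risk_set) * observed) / observed.sum().clamp(min=1.0)
```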

Ensembles of support vector machines (SVMs) were used by Manivannan et al. [30] to detect and classify cellular patterns. Peikari et al. [28, 31] designed an analysis pipeline in which a clustering operation is first executed on the input data to detect the structure of the data space, and a semi-supervised learning method then carries out classification using the clustering information. Chen et al. [32] developed a deep learning framework for segmentation that implemented a multi-task learning approach through the use of multi-level CNNs. A pipeline of clustering, segmentation, and classification operations for identification and categorization of breast regions is described by Dundar et al. [33]; the analysis pipeline segments cells, extracts imaging features from them, and uses the features in a binary classifier. Beck et al. [34•] use a binary classifier for epithelial vs. stromal classification in breast cancer cases, trained with morphological imaging features extracted from H&E images.

Saltz et al. [10] developed a deep learning-based patch classification workflow to characterize patterns of lymphocytes in whole slide tissue images. The workflow utilizes a CNN to classify image patches as lymphocyte-positive (i.e., containing lymphocytes) or lymphocyte-negative, in combination with a CNN that segments necrotic regions in order to eliminate false lymphocyte-positive classifications. The lymphocyte-detection pipeline was applied to 5200 images from 13 cancer types in the TCGA repository to quantitatively characterize tumor infiltrating lymphocytes (TILs), since TILs have become increasingly important in precision medicine with the growth of cancer immunotherapy. These kinds of characterizations of WSIs will become increasingly significant to understanding the immune response associated with cancer in each patient in various clinical scenarios. High densities of TILs correlate with favorable clinical outcomes, including longer disease-free survival and/or improved overall survival (OS), in multiple cancer types, and recent studies further suggest that the spatial context and the nature of cellular heterogeneity within the tumor microenvironment, in both the main bulk of the tumor and the invasive margin, are important in cancer prognosis.
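At a high level, the patch-level outputs of such a workflow can be assembled into per-WSI maps, with the necrosis model used to suppress spurious lymphocyte calls. The NumPy sketch below assumes the per-patch probabilities have already been computed on a regular grid and illustrates only that combination step; it is not the published pipeline.

```python
import numpy as np

def combine_maps(lymph_probs, necrosis_probs, lymph_thresh=0.5, necrosis_thresh=0.5):
    """Binary lymphocyte map over a grid of patches, suppressing necrotic patches.
    lymph_probs, necrosis_probs: 2D arrays of per-patch probabilities on the same grid."""
    lymph_positive = lymph_probs >= lymph_thresh
    necrotic = necrosis_probs >= necrosis_thresh
    # Patches classified as necrotic are removed to avoid false lymphocyte-positive calls.
    return lymph_positive & ~necrotic
```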

One of the challenges in machine learning analysis of WSIs is the lack of large training datasets that contain ground truth, due to the intense labor and time necessary for manual segmentation and classification of tissue regions and micro-anatomic structures such as nuclei and cells. Several research projects have investigated the generation of synthetic datasets to overcome this limitation. Mahmood et al. [35] proposed a conditional generative adversarial network (GAN) for nucleus segmentation that is trained with synthetic and real data to address the scarcity of real training data and the ongoing challenge of separating overlapping and clumped nuclei; a cycle GAN method is used to generate the synthetic image patches and segmentation masks. Hou et al. [36] proposed an unsupervised GAN architecture for the generation of synthetic tissue images and segmentation masks to serve as training data for deep learning segmentation methods; it consists of multiple CNNs that generate initial synthetic images and then refine them toward reference styles in order to train a segmentation model. Senaras et al. [37] developed a GAN method that generates synthetic image datasets with known amounts of positive and negative nuclei in images from IHC-stained tissue specimens for tumor grading.

Machine learning and deep learning methods have also been used for quality assessment in whole slide images. Senaras et al. [38] applied deep learning methods to detect out-of-focus regions in whole slide tissue images that can be avoided in segmentation and classification operations. Wen et al. [39, 40] utilized machine learning classifiers (SVM, random forest, and CNN) that operate on texture and intensity features extracted from image patches to evaluate the quality of nuclear segmentation results.

Several research groups have also developed methods for 3D reconstruction of morphological structures and image volumes from histopathology images [41, 42]. Kather et al. [43•] employed image analysis pipelines to generate topographic mappings of multiple immune cells in immunohistochemistry stained images. These various methodologic approaches from nuclear segmentation and characterization to 3D reconstruction are continuing to be refined and developed for a wide variety of clinical applications that can be very useful for pathologists in the near future.

WSI Visual Analytics Systems

There are currently several virtual microscope applications that facilitate the visualization of WSIs for pathomics analyses, including open-source and commercial software tools; a few examples are caMicroscope (Emory University, Atlanta, Georgia, USA), QuPath (University of Edinburgh, Edinburgh, UK), HALO (Indica Labs, Corrales, New Mexico, USA), Aperio GENIE (Vista, California, USA), HistoRx AQUA Analysis (Branford, Connecticut, USA), and Visiopharm (Hoersholm, Denmark). These viewers are designed to give users the ability to freely explore any part of the image by scrolling and zooming in and out of WSIs, thereby functioning as a virtual microscope. These software applications also provide interfaces that (1) gather, store, and support interaction with large collections of WSIs; (2) annotate and measure specific histologic features at multiple scales (from 0.0001 cm for tumor size and distance to the surgical resection margin to 0.1 μm for nuclear and cell size to calculate the nuclear to cytoplasmic (N/C) ratio); and (3) view results from image analysis and deep learning methods, such as nuclear segmentation, tumor identification, and lymphocyte detection.

Even though there has been a lot of progress in the development of software and infrastructure to perform WSI analyses and to produce and store pathomics data with various methods and approaches, there is a need for visual analytic systems that can ultimately integrate and represent these various forms of large-scale data in a biologically interpretable manner. This is even more important when we consider integrating pathomics with correlative data from molecular studies, radiomics, and clinical data. For example, Fig. 3 shows the wide spectrum of features that can be calculated from nuclear segmentation to provide heat maps, which depict the relationships between salient image-based features that can be further correlated with relevant histopathologic features.

Fig. 3
figure 3

Featurescape example showing the relationship between calculated features in a WSI based on nuclear segmentation

Even though Featurescape provides a powerful and interactive view into the WSI from an image analysis perspective, the relationships between these numerous and different kinds of extracted image features from segmented objects are still not readily interpretable by pathologists. Therefore, pathomics analyses can also be displayed as an image-based representation, a multilayer Featuremap, to show how features can be readily interpreted within a histologic context. The combination of these views into the data can be utilized to address the challenging problem of identifying which of these features are relevant and how they are related to the histopathologic features of disease.

Therefore, significant efforts have been dedicated to developing interfaces that permit easy navigation of large collections of WSIs combined with Featuremaps that permit interactive assessment of how image analysis-based features can be used to quantitatively identify heterogeneous structural and textural tissue characteristics in different types of tissues and tumors. As shown in Fig. 4, H&E WSIs can be analyzed with tumor and lymphocyte detection deep learning methods, and the results combined to provide a view into the quantitative spatial assessment of TILs in terms of TIL%, location, and the overall pattern of the immune infiltrate with respect to intra- and peritumoral TILs. These maps can also display the calculations in a similar manner to help interpret the image-based features in the context of histology at multiple scales of magnification. In the near future, these tools will be used to correlate specific image-based features with various histologic features to predict the expression of various proteins on the cell surface and subcellular structures in conjunction with immunohistochemistry (IHC) for applications ranging from biomarker discovery and pharmaceutical research to systems biology and treatment planning.

Fig. 4
figure 4

Deep learning image analysis pipeline to generate a Featuremap. a Low magnification H&E image. b Automated tumor detection displayed as a probability distribution from 0 to 1. Non-tumor tissue in solid blue. c Automated lymphocyte detection displayed as a probability distribution from 0 to 1. Non-lymphocyte tissue in solid blue. d Combined tumor and lymphocyte detection with tumor depicted in yellow, lymphocytes in red, and non-tumor tissue in gray. This permits the ability to see lymphocyte detection in the context of cancer to identify TILs and calculate TIL% in intra- and peri-tumoral areas. The parameters shown in Featurescape (Fig. 3) can also be depicted in this manner to permit the visual interpretation of features within a histologic context in combination with tumor and lymphocyte detection
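The TIL% shown in Fig. 4 reduces to simple counting once patch-level tumor and lymphocyte maps are available. Below is a minimal sketch of one plausible definition (lymphocyte-positive tumor patches as a fraction of all tumor patches); actual studies may define intra- and peritumoral compartments differently, for example by morphologically dilating the tumor mask to obtain a peritumoral band.

```python
import numpy as np

def til_percentage(tumor_map, lymphocyte_map):
    """Percentage of tumor patches that also contain lymphocytes (one possible TIL% definition).
    tumor_map, lymphocyte_map: boolean 2D arrays on the same patch grid."""
    tumor_patches = tumor_map.sum()
    if tumor_patches == 0:
        return 0.0
    return 100.0 * np.logical_and(tumor_map, lymphocyte_map).sum() / tumor_patches
```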

Data and Metadata Management

Pathology images are captured from glass tissue slides by digital microscopy scanners. Currently, there are no standard or widely accepted community formats for pathology image files, whereas radiology images are stored and shared in the DICOM format [44]. Each digital pathology imaging vendor has its own file format, which stores metadata about images at varying levels of detail in vendor-specific metadata fields; this limits the types and amount of metadata that can be directly extracted from a pathology image. There are open source libraries, such as OpenSlide [45] and Bio-Formats [46], that can parse some vendor pathology image formats and read image metadata, but they are limited by whether they can recognize specific vendor metadata fields. Similarly, there are no community-accepted or standard data models to represent metadata about the results from analyses of digital pathology image data.
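For example, the OpenSlide library exposes whatever vendor metadata it can recognize through a simple key-value properties dictionary; standardized keys such as the microns-per-pixel resolution are populated only when the vendor format supplies them. The sketch below shows the general access pattern (the file path is a placeholder).

```python
import openslide

slide = openslide.OpenSlide("example.svs")  # path is a placeholder

# Standardized properties are available only if the vendor file provides them.
mpp_x = slide.properties.get(openslide.PROPERTY_NAME_MPP_X)    # microns per pixel (x)
vendor = slide.properties.get(openslide.PROPERTY_NAME_VENDOR)  # scanner vendor, if recognized

# Everything else is exposed as vendor-specific key-value pairs.
for key, value in slide.properties.items():
    print(key, value)
```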

The lack of imaging standards in digital pathology has led to the development of a wide variety of data models and databases. There are a number of commercial software systems developed by digital microscopy vendors and pathology image analysis companies and a growing number of open source and freely available software systems. The OME (Open Microscopy Environment) project has developed a data model for representation of image acquisition parameters, image experiment metadata, and image analysis results [46]. The OME model is supported by a relational database for storage, indexing, and querying and is realized in an XML file format for the exchange of image metadata between software systems and research teams. The OMERO (OME Remote Objects) extension of the OME platform is designed to serve as a central repository of pathology image data [47]. Wang et al. [2, 48] developed the PIDB (Pathology Image Database System) and PAIS (Pathology Analytical Imaging Standards) models to represent pathology image metadata and image analysis results and metadata. These models draw from the AIM (Annotation and Image Markup) model [49] and extend it with data elements for pathology image experiments, acquisition parameters, and analysis results, and they are implemented on top of a relational database management system.

Gutman et al. [50] implemented a software system for management and analysis of pathology images by using Girder [51] as the backend database management system. Martel et al. [52] developed a desktop application, PIIP (Pathology Image Informatics Platform), which extends a freely available tool, Sedeen, for the management, annotation, and viewing of pathology image data. Bisque [53] is a web-based platform that provides support for organization and analysis of image data and uses metadata representation based on tag documents (i.e., nested sets of (key, value) tags). Williams et al. [54] developed a prototype system, IDR (Image Data Resource), to integrate and share image data from multiple modalities, including digital pathology and multi-dimensional microscopy data. Image and annotation metadata are represented in tabular forms, drawing from the MAGE-TAB and ISA-TAB models. The QuIP platform [55] implements a fully containerized software system for the management, analysis, viewing, and sharing of digital pathology imaging data and image analysis results by supporting a GeoJSON-compliant [56] data model to represent image analysis results in the FeatureDB database, which is built on top of a NoSQL document store. Cytomine [57] is a web-based software platform designed to support sharing of histology and molecular imaging data for proteomics preprocessing. The ImageMiner system [58] provides support for management and analysis of tissue microarray datasets by using a relational database backend for management of image data and annotations. A prototype system for content-based image retrieval to search and retrieve pathology images was developed by Zheng et al. [59]. The caTIES project, led by Crowley et al. [60], developed methods and software infrastructure to support analysis and coding of surgical pathology reports. The coded results could then be used to search and retrieve specific cancer cases and tissue specimens.
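As an illustration of how a GeoJSON-compliant model can encode an analysis result, the snippet below represents one segmented nucleus as a GeoJSON Feature whose polygon holds the boundary in image (pixel) coordinates and whose properties hold computed feature values. The property names are illustrative assumptions, not the exact FeatureDB schema.

```python
import json

nucleus_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # Boundary vertices in image (pixel) coordinates; first and last points coincide.
        "coordinates": [[[1202, 884], [1210, 880], [1218, 889], [1209, 897], [1202, 884]]],
    },
    "properties": {  # property names are illustrative, not a fixed schema
        "object_type": "nucleus",
        "area_pixels": 142,
        "mean_hematoxylin": 0.63,
    },
}

print(json.dumps(nucleus_feature, indent=2))
```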

In addition to these commercial and open source software systems, there are ongoing efforts for the standardization of the representation and storage of pathology image data and analysis results. Even though there are presently very few digital microscopy vendors who have adopted DICOM, there is an increasing push for use of the DICOM WSI format [61, 62]. The DICOM standards body is also working on additional supplements for the capture and representation of digital pathology image and analysis metadata [63] and DICOM Structured Reporting [64–66]. As these standards are refined and implemented, we expect that open source and commercial software products will adopt these formats as their default data models for image analysis results to enable interoperability across different imaging and software systems in order to facilitate easier development and integration of new data management capabilities.

Conclusions

Artificial intelligence has arrived in anatomic pathology. While the technology is still primarily in the hands of researchers, emerging methods will transform the landscape and workflow of surgical pathology by allowing identification, analysis, and classification of every cell and microanatomic structure found in tissue sections. In our opinion, the deluge of data will increase our collective insight into cancer and provide pathologists with tools that allow them to augment their visual interpretation in unprecedented ways. Since we have already begun to see how the relationships and patterns present in these new types of data can be utilized to steer patient treatment and predict outcomes, we believe that it is the right time to introduce the wider audience of pathologists and cancer researchers to established and emerging pathomics methods.