Automated prostate tissue referencing for cancer detection and diagnosis
The current practice of histopathology review is limited in speed and accuracy, and the current diagnostic paradigm does not fully describe the complex patterns of cancer. To address these needs, we develop an automated and objective system that facilitates comprehensive and easy information management and decision-making. We also develop a tissue similarity measure to broaden our understanding of tissue characteristics.
The system includes a database of previously evaluated prostate tissue images, clinical information, and a tissue retrieval process. In the system, a tissue is characterized by its morphology. The retrieval process seeks the cases that most closely match the tissue of interest. Moreover, we define 9 morphologic criteria by which a pathologist arrives at a histomorphologic diagnosis. Based on the 9 criteria, true tissue similarity is determined and serves as the gold standard of tissue retrieval. Here, the system retrieved at least 4 matching cases out of 5 for ~80 % of the queries when a match was defined as a tissue similarity score ≥5, and at least 3 matching cases out of 5 for ~60 % of the queries when a match was defined as a score ≥6. The tissue similarity scoring system also allowed us to examine the relationship between tissues beyond the Gleason grading system.
Providing pathologists with the closest matching cases and their clinical information will help them reach consistent and reliable diagnoses. Thus, we expect the system to facilitate quality maintenance and quality improvement of cancer pathology.
Keywords: Prostate cancer, Database, Tissue morphology, Tissue retrieval, Infrared imaging, Decision support
Quality assurance in diagnostic histopathology plays a critical role in the development of a treatment plan for patients with prostate cancer. Methods to integrate quality development, maintenance, and improvement of diagnostic accuracy are, hence, critical to cancer management in any setting. In diagnostic prostate pathology, Gleason grading is the most commonly used grading system and is based upon the structural patterns of the tumor. The Gleason grade is a primary determinant in treatment planning. However, it is well known that the grading of prostate tissues suffers from intra- and inter-pathologist variability [4, 5, 6]; for example, exact intra-pathologist agreement was achieved in 43–78 % of instances, and exact inter-pathologist agreement was reported in 36–81 %. It is also known that the variability of the grading can be reduced with focused retraining. There are many ways to educate pathologists, such as meetings, courses, and online tutorials, but these are not time- or cost-effective and are rarely implemented. Therefore, building an automated, fast, and objective method to aid pathologists in evaluating prostate tissue can improve prostate cancer diagnosis.
When a pathologist evaluates a tissue sample, they look at a stained tissue, mentally compare it against a fund of knowledge and experience, and may consult publications when needed. In essence, the pathologist is matching structural patterns with samples seen earlier and recalling the diagnosis made then, so that the same diagnosis can be made in the specific test case. Despite training, intra- and inter-observer variation and controversial areas still exist. To aid and improve the diagnostic process, there have been several research efforts to develop automated systems for the detection and grading of prostate cancer. The majority of the previous methods have used morphological features [9, 10, 11, 12, 13, 14, 15, 16] to characterize and classify tissue samples into correct classes, and others have also used the Fourier transform, the wavelet transform [13, 18, 19], and fractal analysis [13, 20] to extract texture features. Though these methods claim to be accurate, the information that pathologists obtain from them may be limited, since they generally provide only the predicted grade. The prediction also relies on the conditions of the training and testing datasets, such as acquisition settings [15, 19] and staining.
Alternatively, content-based image retrieval (CBIR) systems [22, 23, 24] have been proposed to aid cancer pathology. The main objective is to manage an enormous amount of image data effectively and efficiently and to provide cases similar to a new test case under examination. In addition to clinical usage, CBIR systems can help medical research, education, and training [22, 24]. Similar cases can be defined as those having the same grade [25, 26, 27, 28] or the same sub-structures [29, 30]; for instance, single-lumen glands, multi-lumen glands, blood vessels, and lymphocytes in prostate. The basic premise of such systems in diagnostic histopathology is that tissue samples that share the grade, characteristics, or patterns of the sample of interest will afford useful information to pathologists and improve the decision-making process. As in cancer detection and grading systems, tissue is represented by several quantitative features such as morphology [26, 32, 33], histogram, color [28, 34], and texture [27, 28, 29, 32, 33, 34, 35]. Similar samples can be retrieved by computing distance metrics or similarity scores between a new case and the previously diagnosed or examined cases. To improve tissue representation and retrieval, features are often post-transformed and/or their weights are adjusted in an implicit or explicit manner, for example by a kernel function, the simplex method, manifold learning [26, 36], boosting [25, 27], or a self-organizing map (SOM).
Previous retrieval systems have been measured against a gold standard of diagnostic category and tumor grade, as defined by a pathologist. Prostate cancer, in particular, is a multifactorial disease and a mixture of heterogeneous growth patterns, and hence tissues belonging to the same Gleason grade may possess different cellular, nuclear, or glandular sub-patterns. A number of histological variants, in fact, exist in prostate carcinoma, and some of the variants cannot be addressed by the Gleason grading system. Moreover, the Gleason grading system results in a tumor grade that correlates with overall outcomes (survival) but fails to provide information on the risk of metastasis and correlates poorly with the clinical decision-making process. Further, the Gleason grading system has gone through several refinements over time [8, 39, 40, 41] and may undergo further changes [42, 43]. These changes result in variations among pathologists in practice and hinder the development of robust automated grading and retrieval systems.
The rest of the paper is organized as follows. In the Methods section, we begin with a description of the dataset and the data preparation process. In the following subsections, we describe the three key components of our new system: the tissue similarity measure, tissue morphological feature extraction, and the tissue retrieval function. Then, feature selection and balanced training are described. In the Results section, the experimental results obtained via cross-validation, including the tissue similarity measure and tissue retrieval, are presented. In the Discussion section, the implications and limitations of our study are discussed. Finally, we conclude in the Conclusions section.
Samples and data preparation
This study and its protocols were approved by the University of Illinois Institutional Review Board (IRB), and the study was conducted as per the permission of the IRB in accordance with relevant guidelines and regulations. We obtained 114 prostate cancer tissue samples (Tissue Array Research Program, National Cancer Institute and Clinomics Inc.), composed of 19 (Gleason 6), 26 (Gleason 7; 16 Gleason 3 + 4, 10 Gleason 4 + 3), 22 (Gleason 8), 10 (Gleason 9; 1 Gleason 4 + 5, 9 Gleason 5 + 4), and 37 (Gleason 10) samples. Both hematoxylin and eosin (H&E) stained and FT-IR images are available for the samples. Tissue samples were first cut into ~5 μm thick sections, with one section placed on a standard glass slide and a serial section on an IR-transparent BaF₂ slide. Images of the H&E-stained tissue were acquired on a standard optical microscope at 40x magnification, with a pixel size of 0.963 μm × 0.963 μm. On the IR-transparent BaF₂ slides, FT-IR images were acquired at a spatial pixel size of 6.25 μm × 6.25 μm and a spectral resolution of 4 cm⁻¹ at an undersampling ratio of 2 using a Perkin-Elmer Spotlight imaging system. The spectral profile of a pixel was truncated to a spectral range of 4000–720 cm⁻¹. A detailed description of sample preparation and data acquisition for FT-IR imaging is available in Fernandez et al. Clinical information (Gleason grade, age, surgery type, etc.) for the samples was prepared by pathologic review, and 308 morphological features were also extracted. The database we build here, therefore, contains 114 tissue images (of two different modalities), their clinical information, and 308 morphological features.
Morphologic criteria and tissue similarity measure
Description of 9 Morphologic criteria
Gland tightness and cohesiveness
Roundness of external perimeter of gland
Serrated contours or spindle shaped contours
Swollen, plump cells in stroma and splayed collagen fibers
Prominent nucleoli, variation in nuclear diameter and amount of chromatin
Some prominent nucleoli, moderate variation
Many prominent nucleoli, large variation
Cleft formation or retraction artifact around cancer glands
Ratio between lumen area and total gland area
Continuous sheets of cells
Individual cells separated by stroma
Predominant and secondary Gleason score
6 – 10a
Morphological feature extraction
Feature selection is the step in which the retrieval algorithm examines all available features (308 in our case) with respect to the training samples and selects a subset to use on test data. This selection is generally based on the criterion of high accuracy on training data, but it also strives to ensure generalizability beyond the training data. We adopt a two-stage feature selection approach here. In the first stage, we order the features by their individual retrieval performance and then, following that order, add one feature at a time to the feature set, measuring the retrieval performance of the set after each addition. In the second stage, feature selection continues from the feature set that gave the best retrieval performance in the first stage, following the sequential floating forward selection (SFFS) method. This method sequentially adds new features, each addition followed by conditional deletion(s) of already selected features.
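The two stages described above can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: the `score` callback stands in for the cross-validated retrieval performance of a feature subset, `toy_score` is an invented example, and the second stage is a simplified floating search that stops as soon as no addition improves the score (full SFFS keeps searching and tracks the best subset per size).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: maps a feature subset to a retrieval score (higher is better).
Score = Callable[[Sequence[int]], float]


def ranked_prefix_selection(n_features: int, score: Score) -> Tuple[List[int], float]:
    """Stage 1: rank features individually, grow the set in that order, keep the best prefix."""
    order = sorted(range(n_features), key=lambda f: score([f]), reverse=True)
    best_set: List[int] = []
    best_score = float("-inf")
    current: List[int] = []
    for f in order:
        current.append(f)
        s = score(current)
        if s > best_score:
            best_set, best_score = list(current), s
    return best_set, best_score


def floating_forward_selection(
    start: Sequence[int], n_features: int, score: Score
) -> Tuple[List[int], float]:
    """Stage 2 (simplified SFFS): add the best feature, then drop features while that helps."""
    selected = list(start)
    best = score(selected) if selected else float("-inf")
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        f_add = max(candidates, key=lambda f: score(selected + [f]))
        s_add = score(selected + [f_add])
        if s_add <= best:  # assumption: stop when no addition improves the score
            break
        selected.append(f_add)
        best = s_add
        # Conditional deletion: remove a feature only if the reduced set scores higher.
        while len(selected) > 1:
            f_rm = max(selected, key=lambda f: score([g for g in selected if g != f]))
            s_rm = score([g for g in selected if g != f_rm])
            if s_rm <= best:
                break
            selected.remove(f_rm)
            best = s_rm
    return selected, best


def toy_score(subset: Sequence[int]) -> float:
    """Toy stand-in for retrieval performance: features 1 and 3 help, every feature costs 0.1."""
    return sum(1.0 for f in subset if f in (1, 3)) - 0.1 * len(subset)


stage1_set, stage1_score = ranked_prefix_selection(5, toy_score)
final_set, final_score = floating_forward_selection(stage1_set, 5, toy_score)
print(sorted(final_set), round(final_score, 2))
```

Starting the floating search from the stage-one result, rather than from an empty set, mirrors the description above and reduces the number of subset evaluations needed when there are hundreds of candidate features.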
Ranking-SVM tries to learn an overall ranking of the training dataset. When trained on a biased or unbalanced training dataset, Ranking-SVM may be biased towards the dominant portion of the dataset, and thus its retrieval capability may be limited. To prevent this, we took roughly balanced sub-samples of the training dataset and trained Ranking-SVM on those sub-samples. To obtain the roughly balanced training dataset, we first divide the total TMS score range into P equal-width partitions. Then, NP pairs of samples are randomly selected from each partition. We set NP to the smallest number of pairs of samples found in any partition.
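The balanced sub-sampling step can be illustrated with a short sketch. The function name, the explicit `score_min`/`score_max` arguments, and the choice to skip empty partitions (so that NP is never zero) are assumptions made for this example; the text does not specify how empty partitions are handled.

```python
import random
from typing import List, Sequence


def balanced_pair_sample(
    pair_scores: Sequence[float],
    n_partitions: int,
    score_min: float,
    score_max: float,
    seed: int = 0,
) -> List[int]:
    """Return indices of sample pairs drawn equally from equal-width score partitions."""
    if score_max <= score_min or n_partitions < 1:
        raise ValueError("need score_max > score_min and n_partitions >= 1")
    width = (score_max - score_min) / n_partitions
    bins: List[List[int]] = [[] for _ in range(n_partitions)]
    for idx, s in enumerate(pair_scores):
        # Clamp so that out-of-range scores and the top edge fall into a valid bin.
        p = max(0, min(int((s - score_min) / width), n_partitions - 1))
        bins[p].append(idx)
    non_empty = [b for b in bins if b]  # assumption: empty partitions are skipped
    if not non_empty:
        return []
    n_p = min(len(b) for b in non_empty)  # NP: size of the smallest non-empty partition
    rng = random.Random(seed)
    chosen: List[int] = []
    for b in non_empty:
        chosen.extend(rng.sample(b, n_p))
    return chosen


# Invented pair scores: three low, two mid-range, one high.
scores = [0.5, 0.6, 0.7, 5.1, 5.2, 9.9]
picked = balanced_pair_sample(scores, n_partitions=10, score_min=0.0, score_max=10.0)
print(len(picked))
```

Because NP is tied to the smallest partition, the sub-sample can be much smaller than the full set of pairs; drawing several sub-samples with different seeds is one way to use more of the data.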
Tissue morphologic similarity measure
Tissue retrieval system provides good matching cases
To evaluate the tissue retrieval system, we performed K-fold cross-validation (K = 10; maintaining a sufficient number of tissues in the database). The entire dataset was divided into K roughly equal-sized partitions. One partition was left out as “test data” (or queries), and the union of the remaining K – 1 partitions (the “training data”) was used to build the database from which the top-T similar samples are retrieved for each query (T = 5). This was repeated K times with different choices of the left-out partition. In each repetition, the two-stage feature selection was carried out on the training data via 5-fold cross-validation. The average NDCG at rank position T of the tissue retrievals for the queries, across all K repetitions, was computed to measure the performance of the retrieval. To handle the imbalance of TMS scores in the dataset, a roughly balanced training dataset was formed by dividing the entire score range into P equal-width partitions (P = 10; allocating a sufficient number of tissues per partition with regard to the number of retrieved samples) and randomly taking an equal number of samples from each partition. The method was implemented in IDL (tissue segmentation and morphological feature extraction) on a 1.67 GHz Intel Core Duo machine running Windows 7 with 2 GB memory, and in C++ (feature selection and tissue retrieval) on a 2.5 GHz Intel Core 2 Duo machine running Redhat Linux 4 with 2 GB memory. The average processing time for tissue segmentation and morphological feature extraction is ~8 min per sample, and the tissue retrieval time is ~1 s. The Ranking-SVM training and the feature selection took ~3 s and ~90 min, respectively.
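As an illustration of the evaluation measure, the sketch below computes NDCG at rank position T for a single query, using one common formulation (linear gain with a log2 rank discount); the exact variant used in the study is not given here, so treat the formula as an assumption. The helper `count_matches` reflects the abstract's definition of a match as a similarity score at or above a threshold. All scores in the example are invented.

```python
import math
from typing import Sequence


def dcg_at(relevances: Sequence[float], t: int) -> float:
    """Discounted cumulative gain over the first t items (rank 1 has discount log2(2) = 1)."""
    return sum(r / math.log2(rank + 2) for rank, r in enumerate(relevances[:t]))


def ndcg_at(retrieved: Sequence[float], database: Sequence[float], t: int) -> float:
    """DCG of the retrieved ranking divided by the DCG of the best possible ranking."""
    ideal = dcg_at(sorted(database, reverse=True), t)
    return dcg_at(retrieved, t) / ideal if ideal > 0 else 0.0


def count_matches(retrieved: Sequence[float], threshold: float) -> int:
    """Number of retrieved samples whose similarity score reaches the threshold."""
    return sum(1 for r in retrieved if r >= threshold)


# True similarity scores between one query and every database sample (invented values).
database_scores = [7, 6, 6, 5, 5, 4, 3, 2]
# Scores of the T = 5 samples actually returned, in retrieval order.
retrieved_scores = [6, 7, 5, 5, 3]

print(round(ndcg_at(retrieved_scores, database_scores, 5), 3))
print(count_matches(retrieved_scores, 5), count_matches(retrieved_scores, 6))
```

Averaging `ndcg_at` over all queries in a fold, and then over the K folds, gives the summary figure described above.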
Statistical significance of tissue retrieval
Tissue retrieval performance (table values; row and column labels not recoverable): 0.35 ± 0.13, 0.75 ± 0.06, 0.29 ± 0.14, 0.68 ± 0.06
TMS score reveals the complicated relationship between tissues
Herein, a tissue retrieval system has been developed and tested for prostate cancer. This approach is particularly well suited for cancer and other diagnostic situations in which multiple parameters are applied to define a grade. In the system, a database allows pathologists to easily manage and maintain previous cases and outcomes, and an efficient retrieval algorithm gives immediate access to them. Accordingly, the performance of tissue retrieval relies on both the database and the retrieval process. Hence, further study of the matching algorithm, performance measure, and data handling (e.g., data normalization) is necessary, and a large-scale validation study should be conducted to optimize and stabilize the system for various queries, tasks, and users’ demands.
The size of the database may substantially affect the performance of the retrieval system. In tissue retrieval, it is assumed that the database contains a sufficient number of samples similar to any kind of query. That is, the retrieval system will benefit from a large-scale database that includes a variety of tissue patterns from multiple institutions. A retrieval system with a large-scale database will not only serve various queries and tasks but also improve and stabilize TMS scores. The similarity score for a criterion between two samples depends on the number of samples that lie between them according to that criterion. With more samples, the distribution of the samples will become more realistic, leading to a more accurate and reliable similarity measure. Moreover, having multiple pathologists score the tissue samples will further improve TMS scores. However, with a database of limited size, the distribution of TMS scores for one query differs from that for another (Fig. 3a). Some queries may have many high-scoring sample pairs, while others may have few. In the latter case, the retrieval system may return the most similar samples available, i.e., the retrieval is valid and useful, yet it appears to be a bad retrieval because of the relatively low TMS scores. The overall distribution of TMS scores also affects the retrieval. In our study, a limited number of tissue sample pairs show a high or low TMS score (Fig. 3b), i.e., the system is likely to retrieve tissue samples with mid-range TMS scores. In fact, when we trained Ranking-SVM on the entire training dataset, i.e., without balanced training, fewer samples with higher TMS scores (for example, TMS score ≥6) were retrieved for the queries (Additional file 1: Figure S1). Accordingly, taking a roughly balanced subset of the training dataset is a valid decision and helps to provide a more effective and robust retrieval process.
Gleason grades in the dataset are not evenly distributed. A lack of a sufficient number of samples per grade may result in a loss of information of certain patterns in prostate cancer. However, the imbalance of the distribution in this study is not likely to have a significant impact on the retrieval system. The system is still able to retrieve matching cases from the database. A high TMS score does not indicate that a sample pair has the same grade. The effect of each grade on the retrieval system may be further studied to improve and stabilize the retrieval system.
We retrieved only the 5 samples closest to a query. The more samples we retrieve, the higher the probability that the system provides pathologists with well-matched cases. However, retrieving many samples (e.g., >10) will be a burden to pathologists because of the additional time and effort needed to decide which samples are relevant and useful. Hence, providing only the most similar samples would be more helpful and effective: it demands little time and work from pathologists to judge the retrieved samples, yet still delivers good matches. We note that if a pathologist would like to retrieve more or fewer samples from the database, then the retrieval system (Ranking-SVM) should be re-trained with the adjusted number of retrievals. If more samples are added to the database, then the whole system should be re-trained (or updated) by computing TMS scores and morphological features and constructing a new Ranking-SVM. Moreover, if one or more morphological properties are of particular interest to a pathologist, the similarity score can be re-computed and used to train the retrieval system. The pathologist may indicate that certain matches were better than others, resulting in an update of the database (e.g., changing TMS scores) and of the matching algorithms as needed. The updating may be conducted in real time. Therefore, the system is potentially adaptable to users’ demands and purposes.
The 9 morphological criteria were manually scored by a pathologist and used to compute the TMS score. Like Gleason grading, this is still a qualitative measure. Based on the qualitative measure, the pathologist categorizes (or scores) tissue samples per criterion. It is well known that such a qualitative measure is subject to inter- and intra-observer variability, i.e., tissue samples are likely to be mis-scored (or mis-classified), in particular in borderline cases. Poor scoring (or mis-scoring), in our study, will disrupt the similarity measure. However, the impact of mis-scoring on the retrieval system may not be as significant as that of mis-scoring in Gleason grading, which may give rise to a totally different pattern and outcome prediction. In contrast, the TMS score is a combined measure of 9 different properties and varies in a continuous fashion. Some mis-scorings of the 9 criteria clearly affect the similarity measure but may not cause a complete change in the tissue similarity. Nevertheless, a follow-up study is desirable to examine the influence of mis-scorings among the 9 criteria on the similarity measure and the tissue retrieval performance.
We have presented an efficient and effective tissue management and decision-support system. TMS score offers an alternate means of assessing tissue characteristics and similarities as well as developing and testing computerized methods. Next steps in development would be the validation and application of this system with additional users. The system can be applied to a diversity of diagnostic entities in histopathology. The approach is adaptable in scale, including reference dataset, scoring metrics and matches presented to the pathologist. We anticipate that this approach will open a new direction for the development of automated methods for cancer pathology.
This work was supported by National Institutes of Health – National Cancer Institute via grant R01CA138882.
Availability of data and material
The source code, datasets, and supplementary information are available through the following link: chemimage.illinois.edu.
Conception and design: JK, SMH, SS, RB. Development of methodology: JK, AK-B, SS, RB. Acquisition of data: JK, AK-B, SMH. Analysis and interpretation of data: JK, AK-B, SS, RB. Writing, review, and/or revision of the manuscript: JK, AK-B, SMH, SS, RB. Study supervision: SS, RB. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
This study was performed on diagnostic specimens with information that neither identified the subjects directly nor indirectly through identifiers linked to the subjects. It was approved by and performed in accordance with the University of Illinois at Urbana-Champaign Institutional Review Board. The approved project is entitled “Optical spectroscopy and imaging of archival fixed tissue,” case number 06684, and consisted only of secondary analysis performed on anonymized archival tissue and, as such, according to the University of Illinois at Urbana-Champaign IRB policy, is exempt from written, informed consent.
- 1. Humphrey PA. Prostate pathology. Chicago: American Society for Clinical Pathology; 2003.
- 4. Montironi R, Mazzucchelli R, Scarpelli M, Lopez-Beltran A, Fellegara G, Algaba F. Gleason grading of prostate cancer in needle biopsies or radical prostatectomy specimens: contemporary approach, current clinical significance and sources of pathology discrepancies. BJU Int. 2005;95(8):1146–52.
- 10. Wetzel AW, Crowley R, Kim SJ, Dawson R, Zheng L, Joo YM, Yagi Y, Gilbertson J, Gadd C, Deerfield DW, et al. Evaluation of prostate tumor grades by content based image retrieval. P Soc Photo-Opt Ins. 1999;3584:244–52.
- 11. Doyle S, Hwang M, Shah K, Madabhushi A, Feldman M, Tomaszewski J. Automated grading of prostate cancer using architectural and textural image features. I S Biomed Imaging. 2007. p. 1284–7.
- 12. Naik S, Doyle S, Feldman M, Tomaszewski J, Madabhushi A. Gland segmentation and computerized Gleason grading of prostate histology by integrating low-, high-level and domain specific information. In: MIAAB workshop. 2007. p. 1–8.
- 14. Arif M, Rajpoot N. Classification of potential nuclei in prostate histology images using shape manifold learning. International Conference on Machine Vision 2007, Proceedings. 2007. p. 113–8.
- 24. Wei C-H, Li C-T, Wilson R. A content-based approach to medical image database retrieval. Database Modeling for Industrial Data Management: Emerging Technologies and Applications. 2005. p. 258–90.
- 25. Naik J, Doyle S, Basavanhally A, Ganesan S, Feldman MD, Tomaszewski JE, Madabhushi A. A boosted distance metric: application to content based image retrieval and classification of digitized histopathology. In: SPIE Medical Imaging. Lake Buena Vista, USA: International Society for Optics and Photonics: 72603F-72603F-72612; 2009.
- 26. Sparks R, Madabhushi A. Out-of-Sample Extrapolation Using Semi-Supervised Manifold Learning (OSE-SSL): Content-Based Image Retrieval for Prostate Histology Grading. 2011 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. 2011. p. 734–7.
- 27. Sridhar A, Doyle S, Madabhushi A. Boosted Spectral Embedding (BoSE): Applications to Content-Based Image Retrieval of Histopathology. 2011 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. 2011. p. 1897–900.
- 31. Mehta N, Alomari RS, Chaudhary V. Content Based Sub-Image Retrieval System for High Resolution Pathology Images Using Salient Interest Points. In: Engineering in Medicine and Biology Society. Minneapolis, USA: Annual International Conference of the IEEE; 2009. p. 3719–3722.
- 36. Doyle S, Hwang M, Naik S, Feldman M, Tomaszewski J, Madabhushi A. Using manifold learning for content-based image retrieval of prostate histopathology. In: MICCAI 2007 Workshop on Content-based Image Retrieval for Biomedical Image Archives: Achievements, Problems, and Prospects. Heidelberg, Germany: Citeseer; 2007. p. 53–62.
- 39. Gleason DF, Mellinger GT. Prediction of Prognosis for Prostatic Adenocarcinoma by Combined Histological Grading and Clinical Staging. J Urology. 1974;111(1):58–64.
- 46. Joachims T. Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining: 2006. ACM: 217–226.
- 47. Guyon I, Elisseeff A. An introduction to variable and feature selection. The Journal of Machine Learning Research. 2003;3:1157–82.
- 48. Yu H, Kim S. SVM Tutorial: Classification, Regression and Ranking. In: Handbook of Natural Computing. Heidelberg, Germany: Springer; 2012. p. 479–506.
- 50. Veltri RW, Partin AW, Miller MC. Quantitative nuclear grade (QNG): A new image analysis-based biomarker of clinically relevant nuclear structure alterations. J Cell Biochem. 2000;79:151–57.
- 51. Kavantzas N, Agapitos E, Lazaris AC, Pavlopoulos RM, Sofikitis N, Davaris P. Nuclear/nucleolar morphometry and DNA image cytometry as a combined diagnostic tool in pathology of prostatic carcinoma. J Exp Clin Canc Res. 2001;20(4):537–42.
- 55. Khamis ZI, Sahab ZJ, Byers SW, Sang QXA. Novel Stromal Biomarkers in Human Breast Cancer Tissues Provide Evidence for the More Malignant Phenotype of Estrogen Receptor-Negative Tumors. J Biomed Biotechnol. 2011;2011:1–7.
- 58. Iczkowski KA, Torkko KC, Kotnis GR, Wilson RS, Huang W, Wheeler TM, Abeyta AM, La Rosa FG, Cook S, Werahera PN, et al. Digital Quantification of Five High-Grade Prostate Cancer Patterns, Including the Cribriform Pattern, and Their Association With Adverse Outcome. Am J Clin Pathol. 2011;136(1):98–107.
- 59. Epstein JI, Netto GJ. Biopsy interpretation of the prostate. Philadelphia, USA: Lippincott Williams & Wilkins; 2008.
- 61. Kwak JT, Sinha S, Bhargava R. A New Segmentation Framework for Infrared Spectroscopic Imaging Using Frequent Pattern Mining. 2011 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. 2011. p. 452–5.
- 63. Järvelin K, Kekäläinen J. IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval: 2000. ACM: 41–48.
- 64. Scheel C, Lommatzsch A, Albayrak S. Performance Measures for Multi-Graded Relevance. In: SPIM. 2011. p. 54–65.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.