Real-time interactive data mining for chemical imaging information: application to automated histopathology
Vibrational spectroscopic imaging is now used in several fields to acquire molecular information from microscopically heterogeneous systems. Recent advances have led to promising applications in tissue analysis for cancer research, where chemical information can be used to identify cell types and disease. However, recorded spectra are affected by the morphology of the tissue sample, making identification of chemical structures difficult.
Extracting features that can be used to classify tissue is a cumbersome manual process which limits this technology from wide applicability. In this paper, we describe a method for interactive data mining of spectral features using GPU-based manipulation of the spectral distribution.
This allows researchers to quickly identify chemical features corresponding to cell type. These features are then applied to tissue samples in order to visualize the chemical composition of the tissue without the use of chemical stains.
Vibrational spectroscopic imaging, or chemical imaging, data is composed of a series of absorption or scattering measurements taken across the electromagnetic spectrum. Materials exhibit a characteristic spectral signature that is indicative of their molecular composition. The speed and versatility of vibrational techniques offer the most potential for label-free microscopy by providing micron-scale resolution and significant molecular detail.
In this paper we focus on mid-infrared spectroscopic imaging, which is a form of vibrational spectroscopy with a data rate that is currently viable for clinical applications. We will also show that our methods are applicable to other techniques, such as Raman spectroscopy, which show promise for future in vivo imaging. Recent advances in optical detector technology allow the rapid acquisition of spectral signals that are spatially specific, producing multidimensional images that can be extensive in size (several tens of gigabytes). These techniques have shown promise in biomedicine for cell type classification [2, 3] and cancer analysis. However, the interpretation of a spectral signature is a complex task, requiring a significant amount of pre-processing before identifying spectral features that correspond to chemical information.
While the acquisition of data is relatively rapid, there are limited options at present to assist researchers in visualizing data. In this paper, we propose a method for interactively exploring the chemical composition of a tissue sample. The proposed software allows a user to quickly identify spectral features corresponding to specific tissue types within a sample. The results are then applied to other samples in order to identify tissue types without the use of histological labels. This allows us to overcome several disadvantages of current histology methods by using quantitative information that is collected non-destructively. This makes our methods repeatable and provides an array of useful quantitative information that can be used by a pathologist to aid in diagnosis.
In Section ‘Mid-infrared spectroscopy’, we provide an overview of mid-infrared spectroscopic imaging. Section ‘Morphological effects’ describes the coupling between tissue morphology and chemistry, which makes the characterization of tissue difficult. Our method and implementation details are described in Section ‘Methods’ and Section ‘Visualization’. Section ‘Results and discussion’ provides validation using chemical phantoms and demonstrates our results on breast cancer biopsies.
Our SpecVis software is open-source, and 32-bit Windows binaries are available online (http://www.chemimage.illinois.edu/software).
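The absorbance at each wavenumber $\bar{\nu}$ is computed from the detected intensities as

$$A(\bar{\nu}) = -\log_{10}\frac{I(\bar{\nu})}{I_0(\bar{\nu})}$$

where $A$ is the absorbance, $I$ is the light detected through the sample, and $I_0$ is the light detected without the sample present. Molecular bonds are identified by their characteristic pattern of wavenumbers where $A$ is non-zero.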
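As a concrete illustration of the absorbance calculation described above, the following minimal Python sketch converts a sample measurement and a background measurement into an absorbance spectrum using the standard base-10 definition. The function name `absorbance` and the `floor` clamp are assumptions introduced here for illustration; they are not part of SpecVis.

```python
import math


def absorbance(sample, background, floor=1e-12):
    """Return A = -log10(I / I0) for each band.

    sample     : detected intensities I through the sample, one per band
    background : detected intensities I0 without the sample, one per band
    floor      : hypothetical clamp so that noisy values at or below zero
                 do not raise a math domain error (an assumption)
    """
    if len(sample) != len(background):
        raise ValueError("sample and background must have the same number of bands")
    result = []
    for i, i0 in zip(sample, background):
        ratio = max(i, floor) / max(i0, floor)
        result.append(-math.log10(ratio))
    return result


# Three bands: no absorption, 10% transmission, 50% transmission.
spectrum = absorbance([100.0, 10.0, 50.0], [100.0, 100.0, 100.0])
```

A band that transmits all of the light has zero absorbance, while a band that transmits one tenth of the light has an absorbance of one.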
Recent advances in FT-IR spectrometry allow the use of focal plane arrays (FPAs) for acquiring spatially-resolved mid-infrared absorbance spectra at high speeds. This allows the facile collection of hyperspectral images, where each pixel provides the corresponding spatially resolved absorption as a function of wavenumber.
The increasing availability of imaging systems now permits the analysis of non-homogeneous samples, where structural changes accompany changes in chemical composition. However, these samples introduce additional spectral characteristics due to the sample morphology. These effects can dramatically affect the ability to differentiate between chemical constituents. Morphological characteristics can affect the spectrum in two ways: (a) increased absorbance as a function of density and thickness, and (b) scattering effects. Changes in tissue thickness and density result in well-understood spectral changes characterized by the Beer-Lambert Law, while scattering is significantly more difficult to characterize.
Light transmitted through non-homogeneous samples is subject to scattering as it transitions between material interfaces exhibiting different indices of refraction. These effects are prominent in mid-infrared spectroscopy, where the re-direction of light is indistinguishable from absorbance [7-9], resulting in wavelength-dependent changes in the absorbance spectrum that make the determination of the actual tissue absorbance extremely difficult. The study of scattering effects in FT-IR imaging is an active area of research [8-10]. Current work suggests that a significant portion of scattering through tissue samples is the result of interaction with microscopic structures, such as cell bodies and nuclei. Empirical methods have been proposed for correcting spectra based on Mie theory; however, these techniques are time-consuming and do not provide interactive feedback. In addition, Mie theory is not generally applicable to scattering effects, and the prior information required for these estimations is not always available. While computational methods have been proposed for eliminating scattering effects for known structures such as spheres, no automated techniques have been proposed that compensate for spectral features introduced by elastic scattering, which accounts for a large amount of variance in mid-infrared spectroscopic images.
However, the process requires manual specification of points that represent b as well as either detailed knowledge of the chemical compound under consideration or extensive exploration of the data set. This makes the selection of spectral features representing chemical information extremely time-consuming, particularly when the chemical composition of the tissue sample is unknown.
Here, $A(\bar{\nu})$ is the scattering-corrected absorption as a function of the wavenumber $\bar{\nu}$, $\kappa(\bar{\nu})$ is the absorption coefficient, $\ell$ is the path length through the specimen (thickness), and $N$ is the molecular density. The absorption coefficient $\kappa$ is the desired material property that defines the chemical composition of the specimen.
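For reference, the Beer-Lambert relation among the quantities defined above can be written as shown below. The notation is an assumption made here for illustration: $A$ denotes the scattering-corrected absorption and $\bar{\nu}_r$ denotes a hypothetical reference wavenumber. The second expression holds for a single absorbing species and shows why dividing by a reference feature is useful: the thickness $\ell$ and density $N$ cancel, leaving a ratio that depends only on the absorption coefficient.

```latex
A(\bar{\nu}) = \kappa(\bar{\nu})\,\ell\,N
\qquad\Longrightarrow\qquad
\frac{A(\bar{\nu})}{A(\bar{\nu}_r)}
  = \frac{\kappa(\bar{\nu})\,\ell\,N}{\kappa(\bar{\nu}_r)\,\ell\,N}
  = \frac{\kappa(\bar{\nu})}{\kappa(\bar{\nu}_r)}
```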
While a single reference feature is useful, it is not generally applicable to samples containing several unique components; therefore, several references are used for classifying multiple chemical species.
In this section, we describe our methods for interactive exploration of hyperspectral data. The chemical composition of a sample is identified by finding features, such as absorbance peaks, that can be used as references and to differentiate individual compounds. For example, the biological compound collagen exhibits several absorbance peaks between 1235 cm$^{-1}$ and 1265 cm$^{-1}$. However, these peaks are difficult to distinguish in a raw spectrum, due to morphological characteristics of the sample (Section ‘Morphological effects’).
where $\bar{\nu}_r$ is the location of a spectral feature used for normalization and $b(\bar{\nu})$ is an estimate of the spectral contribution due to scattering.
For an unknown sample, both the normalization feature and baseline function must be estimated. In addition, if the sample is composed of several unique chemical constituents, multiple normalization features and baseline functions must be utilized for classification. This is an extremely difficult problem to solve computationally, requiring detailed knowledge of both the sample and the relationship between spectral bands and molecular characteristics.
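The correction discussed above, subtracting a baseline estimate and then normalizing to a reference feature, can be sketched for a single spectrum as follows. This is a minimal Python illustration under stated assumptions, not the SpecVis implementation: the baseline is taken to be piecewise linear through user-chosen points, it is held constant outside the outermost points, and the reference is the corrected value at a single band. The function names are hypothetical.

```python
def nearest_band(wavenumbers, target):
    """Index of the sampled band closest to the requested wavenumber."""
    return min(range(len(wavenumbers)), key=lambda k: abs(wavenumbers[k] - target))


def linear_baseline(wavenumbers, spectrum, baseline_points):
    """Piecewise-linear baseline through the spectrum at the chosen points.

    Assumption: outside the outermost baseline points the baseline is
    held constant at the nearest endpoint value.
    """
    knots = sorted({nearest_band(wavenumbers, p) for p in baseline_points})
    if len(knots) < 2:
        raise ValueError("at least two distinct baseline points are required")
    baseline = []
    for k in range(len(wavenumbers)):
        if k <= knots[0]:
            baseline.append(spectrum[knots[0]])
        elif k >= knots[-1]:
            baseline.append(spectrum[knots[-1]])
        else:
            lo = max(j for j in knots if j <= k)
            hi = min(j for j in knots if j >= k)
            if lo == hi:
                baseline.append(spectrum[k])
            else:
                t = (wavenumbers[k] - wavenumbers[lo]) / (wavenumbers[hi] - wavenumbers[lo])
                baseline.append((1.0 - t) * spectrum[lo] + t * spectrum[hi])
    return baseline


def correct_spectrum(wavenumbers, spectrum, baseline_points, reference):
    """Subtract the baseline, then divide by the corrected reference value."""
    baseline = linear_baseline(wavenumbers, spectrum, baseline_points)
    corrected = [a - b for a, b in zip(spectrum, baseline)]
    ref_value = corrected[nearest_band(wavenumbers, reference)]
    if ref_value == 0:
        raise ValueError("reference feature has zero corrected absorbance")
    return [c / ref_value for c in corrected]


# Synthetic spectrum: a sloped baseline plus peaks at 1100 and 1200 cm^-1.
nu = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
raw = [0.1, 0.4, 0.7, 0.4, 0.5]
normalized = correct_spectrum(nu, raw, baseline_points=[1000.0, 1400.0], reference=1200.0)
```

In this synthetic case the sloped offset is removed and the peak at 1200 cm$^{-1}$ becomes the unit reference, so the peak at 1100 cm$^{-1}$ is reported as half its height.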
The SpecVis software addresses this problem by allowing interactive visualization of the data set. Our software allows a spectroscopist to specify the baseline function b (Section ‘Scattering’), dynamically select reference features (Section ‘Beer-Lambert law’), and visualize the chemical characteristics. This is done by allowing the user to specify changes in the distribution of spectra, as reflected using a dynamic 2D histogram. The user then selects chemical features in this histogram, exploring the results through an interactive 2D visualization of the tissue sample. Computing the changing distribution of spectra as well as visualization of user-selected features is computationally expensive, making interactive feature selection impossible on current CPU-based desktop systems. We therefore demonstrate that this problem can be well-formulated for implementation on programmable graphics hardware. In the following sections, we first describe the types of metrics used to identify features in hyperspectral images. We then discuss how spectra in an image are adjusted to remove scattering artifacts and other distortions, based on a histogram describing the distribution of spectral information.
We first identify measures of spectral features, or metrics, that quantify, for example, characteristics in forensic spectroscopy and differentiate cell types in biological samples by acting as features for more complex classification systems [2, 17]. The user specifies parameters for these measurements in the spectral domain. Once specified, a metric is immediately applied to all pixels in the image. The metrics described in this paper include the peak height, which is the most basic measure of chemical composition, as well as the peak integral and centroid. These are not the only possible metrics; we limit our study to them because metric design is an active area of research and because they allow us to identify several important chemical differences in biological tissue samples. Additional types of metrics can be readily added, as our approach is general. For each metric, the function $S(x, y, \bar{\nu})$ represents a generic spectral value, where $(x, y)$ is the spatial position and $\bar{\nu}$ is the wavenumber. In the case of absorbance measurements, $S$ can be either the raw or corrected absorbance spectrum.
This metric is highly sensitive to peak shifts. This is particularly noticeable for the Amide I peak at 1650 cm$^{-1}$, which is narrow and composed of multiple chemical contributions that make it prone to shift. Absorbance is particularly useful for detecting broad peaks and the density of well-known and localized molecular bonds.
This metric provides a robust measure of absorbance within a spectral region and is relatively insensitive to noise and peak shifts. This robustness makes it an ideal candidate for use as a spectral reference. It is insensitive, however, to subtle spectral changes that are often found in biological tissue samples.
The resulting value is the wavenumber for the center of mass in the specified region. This metric is useful for measuring shifts in single peak positions as well as the distribution of multiple species’ absorption among multiple neighboring peaks. This metric is dependent on the distribution of absorption and therefore does not require a reference. However, it is incapable of detecting the height and is of limited utility for peaks that do not shift as a function of spatial position.
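The three metrics discussed above (peak height, peak integral, and centroid) can be sketched for a single pixel's spectrum as follows. This is a minimal Python illustration, not the SpecVis code: it assumes the wavenumbers are sampled in ascending order, uses the trapezoidal rule for integration, and takes the centroid to be the absorbance-weighted mean wavenumber over the region. All function names are hypothetical.

```python
def _trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys sampled at xs."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k]) for k in range(len(xs) - 1))


def _region(wavenumbers, spectrum, lo, hi):
    """Samples whose wavenumber lies inside the closed interval [lo, hi]."""
    pairs = [(nu, s) for nu, s in zip(wavenumbers, spectrum) if lo <= nu <= hi]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def peak_height(wavenumbers, spectrum, position):
    """Spectral value at the sampled band nearest to the requested position."""
    k = min(range(len(wavenumbers)), key=lambda j: abs(wavenumbers[j] - position))
    return spectrum[k]


def peak_integral(wavenumbers, spectrum, lo, hi):
    """Area under the spectrum between lo and hi."""
    xs, ys = _region(wavenumbers, spectrum, lo, hi)
    return _trapezoid(xs, ys)


def peak_centroid(wavenumbers, spectrum, lo, hi):
    """Center of mass (in wavenumber units) of the spectrum between lo and hi."""
    xs, ys = _region(wavenumbers, spectrum, lo, hi)
    area = _trapezoid(xs, ys)
    if area == 0:
        raise ValueError("centroid is undefined for a region with zero area")
    moment = _trapezoid(xs, [x * y for x, y in zip(xs, ys)])
    return moment / area


# A symmetric synthetic peak centered at 1650 cm^-1.
nu = [1600.0, 1625.0, 1650.0, 1675.0, 1700.0]
s = [0.0, 0.5, 1.0, 0.5, 0.0]
height = peak_height(nu, s, 1650.0)
area = peak_integral(nu, s, 1600.0, 1700.0)
center = peak_centroid(nu, s, 1600.0, 1700.0)
```

Because the centroid is a ratio of two integrals over the same region, a uniform scaling of the spectrum (for example, from a thicker sample) leaves it unchanged, which is consistent with the statement that this metric does not require a reference.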
The first step in visualization is to display the distribution of spectra in the image using a 2D histogram. This allows the user to identify chemical compounds by approximately removing morphological effects and selecting metrics. Chemical features are selected using the joint histogram of wavenumber and absorbance for all spectra in the image. The user selects features in this domain that correspond to the structural and chemical components of the tissue sample. However, the data processing required to separate structural and chemical features is currently time-consuming and the results are difficult to visualize. We implement a dynamic approach that allows the user to explore the data set via interactive feedback in both the spatial and spectral domains, facilitating meaningful feature selection.
Our framework allows the user to interactively adjust points for baseline correction and select reference features. Data processing is performed dynamically on the GPU using CUDA and provides interactive feedback for multi-gigabyte data sets. The unprocessed data set is stored on the GPU as a three-dimensional texture map of 32-bit floating point values. The user explores the data in two ways: (a) the insertion of baseline points to build the scattering approximation and (b) the selection of a reference metric.
The histogram is recomputed interactively as the user changes parameters for the baseline and reference. This is done using a CUDA device kernel. A block of threads is assigned to process each band (wavenumber). We specify a 2D block size of $\sqrt{w} \times \sqrt{w}$ threads, where $w$ is the maximum warp size supported by the GPU. Therefore, each block consists of one warp that executes in a single-instruction multiple-data (SIMD) fashion. Each block b is responsible for computing the complete one-dimensional histogram for a single band.
The threads within each block are responsible for evaluating a spatially coherent square of pixels, initially positioned at the upper left-hand corner of the image. Each thread then iterates across the image to the lower-right corner at intervals of $\sqrt{w}$ pixels. Note that all threads within the block are part of the same (SIMD) warp; therefore, they remain spatially coherent at each iteration across the spatial domain of the image. This spatial coherence is used to perform faster fetches using texture units, which is particularly useful for fast evaluation on GPUs with compute capability lower than 2.0.
Computing the processed spectrum at each spatial location within a band requires a maximum of four memory fetches: (1) the raw data value at the current band, (2) the raw values at the two neighboring baseline points, and (3) the reference value r(x,y). Computing the reference value also requires the raw values at its own two neighboring baseline points. However, an image of the reference values at each spatial location is pre-computed whenever the reference is changed.
As each thread traverses the image at its assigned band, the resulting histogram is accumulated in shared memory allocated for the block. Accurate computation of the histogram, however, requires the use of atomic operations for incrementing the counters for each bin. This can reduce performance when multiple threads encounter similar absorbance values, a common occurrence given the close spatial proximity of all threads in the image. In addition, the SIMD execution within a warp will cause the entire block to pause while atomic adds are resolved. We address this issue by allocating a separate shared histogram for each thread and summing the results when the entire image has been processed. The resulting histogram is then displayed using a log-scale intensity filter.
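The accumulation strategy described above, one private histogram per thread that is summed once the image has been traversed, can be emulated sequentially on the CPU. The Python sketch below is only an illustration of the idea and is not the CUDA kernel used by SpecVis: the `workers` parameter stands in for the threads of a block, the round-robin assignment of pixels to workers is an assumption, and the `log1p` mapping is one possible reading of the log-scale intensity filter.

```python
import math


def band_histogram(values, bins, lo, hi, workers=4):
    """Histogram of one band, accumulated in private per-worker copies.

    Each emulated worker increments only its own copy, so no atomic
    updates are needed; the copies are summed at the end.
    """
    private = [[0] * bins for _ in range(workers)]
    scale = bins / (hi - lo)
    for n, v in enumerate(values):
        if not (lo <= v <= hi):
            continue  # ignore values outside the displayed absorbance range
        b = min(int((v - lo) * scale), bins - 1)
        private[n % workers][b] += 1
    return [sum(column) for column in zip(*private)]


def joint_histogram(image, bins, lo, hi):
    """One row per band; image[band] is a flat list of pixel values."""
    return [band_histogram(band, bins, lo, hi) for band in image]


def log_scale(hist):
    """Map counts to [0, 1] on a logarithmic scale for display."""
    peak = max(max(row) for row in hist)
    if peak == 0:
        return [[0.0] * len(row) for row in hist]
    denom = math.log1p(peak)
    return [[math.log1p(count) / denom for count in row] for row in hist]


# Two bands of a four-pixel image, binned into two absorbance bins.
image = [[0.0, 0.1, 0.9, 1.0], [0.5, 0.5, 0.5, 0.5]]
hist = joint_histogram(image, bins=2, lo=0.0, hi=1.0)
display = log_scale(hist)
```

The trade-off is memory: every worker needs its own copy of the histogram, which is why the per-band histogram has to be small enough to replicate.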
The chemical composition of the tissue sample is visualized by building a 2D image based on user-selected metrics. Each metric is assigned a color value, where the intensity is based on the value of the metric. The metrics are then evaluated for each pixel, and the resulting colors are combined to create a spatially-resolved chemical visualization of the sample. This color-mapping technique is similar to a transfer function, which is commonly used in volumetric visualization. We first provide an overview of this technique and describe how it is applied to our algorithm.
Transfer functions are used in volume visualization to assign color and opacity values to pixels based on features defined in a separate domain [19, 20]. These techniques generally use spatial features such as gradient magnitude, curvature, size, and orientation to assign color values. While these techniques have been applied to gigabyte-scale data sets [25, 26], they are difficult to generalize to spectroscopic images since each pixel represents a spatially-resolved absorbance function. The principle behind using spectra to apply transfer functions has been previously explored through contour spectra, where spatial characteristics are used to highlight geometric features in the data set. More flexible methods using spatially local statistics have also been proposed. Both contour spectra and spatial statistics may be applicable for extracting spatial characteristics in heterogeneous samples. In this paper, we focus on visualizing chemical features, and therefore each pixel is considered to be an independent component with a corresponding chemical signature.
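Because each pixel is treated independently, the color mapping can be sketched as a per-pixel function of its metric values. The Python sketch below is an assumption-laden illustration rather than the SpecVis implementation: each metric is given a hypothetical `(lo, hi, color)` entry, the metric value is mapped linearly to an intensity in [0, 1], and the per-metric colors are combined by clamped addition.

```python
def map_metric(value, lo, hi, color):
    """Scale an RGB color by the metric value mapped linearly onto [0, 1]."""
    if hi == lo:
        raise ValueError("metric range must have non-zero width")
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return tuple(t * channel for channel in color)


def blend(colors):
    """Combine colors by per-channel addition, clamped to 1.0 (an assumption)."""
    return tuple(min(sum(channel), 1.0) for channel in zip(*colors))


def render_pixel(metric_values, transfer):
    """Color of one pixel given its metric values and one transfer entry per metric."""
    return blend([
        map_metric(value, lo, hi, color)
        for value, (lo, hi, color) in zip(metric_values, transfer)
    ])


# Two metrics: the first drawn in red over [0, 1], the second in green over [0, 2].
transfer = [(0.0, 1.0, (1, 0, 0)), (0.0, 2.0, (0, 1, 0))]
pixel = render_pixel([0.5, 2.0], transfer)
```

Applying `render_pixel` to every pixel yields the spatially resolved chemical image; other blending rules, such as taking the maximum per channel, would fit the same structure.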
Very few techniques currently exist for visualizing spectroscopic images. Li et al. propose a technique for visualizing astrophysical data imaged at various wavelengths. They propose the use of transfer functions for defining opacity when rendering the image stack volumetrically. However, the number of bands in a single image is small and the samples are discontinuous, which limits the amount of chemical information in the data set. More recent work demonstrates a visualization framework for near-infrared spectroscopic images of historical documents sampled regularly in the spectral domain. This technique allows relighting of images, interactive selection of individual bands, and metrics for evaluating similarity between user-specified spectra. However, similarity is difficult to measure for vibrational spectra because of the coupled structural and chemical contributions to the spectrum. Unsupervised techniques, such as principal component analysis (PCA) and vertex component analysis (VCA), have been proposed to perform spectral unmixing. However, these techniques assume homogeneous samples and also make specific assumptions about the data, such as the existence of orthogonal chemical signatures, which are not generally applicable. Comprehensive Data Maps (CDM) have been proposed to examine data, but do not provide imaging visualizations.
rather than the integral. This allows the widget to be placed in the vicinity of the selected spectra.
Results and discussion
Rendering time is dominated by the computation of the histogram as the user changes processing and visualization parameters. However, the frame rate is interactive for our largest sample image (700 × 700 × 491, ≈1 GB), requiring <32 ms for complete evaluation of the histogram (<147 ms with atomic writes to shared memory) using a GeForce GTX 580 with 1.5 GB of global memory. We use the developed software to identify characteristics in mid-infrared spectroscopic images. We first demonstrate the visualization of structural and chemical components from images of synthetic polymer targets that are often used to assess image quality. We then show how these techniques can be used to extract similar information from mid-infrared images of tissue biopsies, including the visualization of chemical components useful for breast cancer diagnosis. Finally, we demonstrate that these techniques can be extended to other forms of spectroscopy by visualizing Raman images of tissue samples.
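The figures quoted above can be checked with a short calculation. The Python sketch below assumes 4 bytes per value, following the 32-bit floating point storage described earlier, and treats the reported times as upper bounds, so the derived frame rates are lower bounds.

```python
# Image dimensions quoted in the text: width x height x bands.
width, height, bands = 700, 700, 491
bytes_per_value = 4  # 32-bit floating point values

total_bytes = width * height * bands * bytes_per_value
gigabytes = total_bytes / 1e9       # decimal gigabytes
gibibytes = total_bytes / 2 ** 30   # binary gibibytes


def frames_per_second(milliseconds):
    """Histogram updates per second for a given evaluation time."""
    return 1000.0 / milliseconds


fps_private = frames_per_second(32.0)   # per-thread histograms
fps_atomic = frames_per_second(147.0)   # atomic writes to shared memory
```

The image occupies about 0.96 GB, which fits within the 1.5 GB of global memory on the stated GPU. The per-thread histogram approach updates at more than 31 frames per second, compared with roughly 7 frames per second when atomic writes are used.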
Infrared images of tissue
The ability to acquire spatially resolved information using hyperspectral imaging is emerging as a promising avenue across a variety of areas, especially biomedical analyses. Vibrational spectroscopy has the potential to provide quantitative histology for disease diagnosis in an objective and automated manner, without the use of chemical stains. Recent work has shown that mid-infrared spectroscopic imaging provides sufficient chemical detail for differentiating between tissue types. Researchers have reported very high accuracy after rigorous scattering correction and classification algorithms [2, 3] are applied. These studies demonstrate the potential for applying vibrational spectroscopy to cancer diagnosis.
However, computational methods must be developed to visualize the data quickly and reliably. The size and complexity of the data contained in hyperspectral images make this a difficult problem, requiring the separation of physical and chemical characteristics from underlying spectra. In this paper, we demonstrate an interactive method for building transfer functions for visualizing hyperspectral images. Our method allows users to dynamically assimilate large collections of spectra using algorithms designed to separate structural and chemical features in real time. These features are then selected using transfer functions, which allow the visualization of these characteristics in a spatial image of the sample. To our knowledge, the reported methods are the first to allow interactive processing and visualization of hyperspectral images at this level of spectral and structural detail, and we have demonstrated their usefulness in biological samples. Applying these features to tissue provides a method for label-free identification of tissue types that is quantitative, non-destructive, and can be performed in a time frame that is clinically viable.
Future directions include applying these techniques to three-dimensional samples, which can be acquired using Raman spectroscopy in combination with a confocal microscope, for example.
Our proposed technique also has several advantages over unsupervised methods, such as PCA and VCA. In particular, the metrics that we use for visualization have finite support, requiring only a narrow band of information within the spectrum. Once useful metrics are identified, the number of collected bands can then be reduced, allowing faster imaging, for example using narrow-band filters for IR. Finally, since the separation of structural and chemical characteristics from an IR image is so difficult, many algorithms for the classification of hyperspectral images rely on the use of user-defined metrics. Our approach may provide an efficient means of selecting features for use in more complex classifiers for IR-based clinical histology.
This work was funded in part by the Beckman Institute for Advanced Science and Technology, the National Institutes of Health (NIH) via grant number 1R01CA138882, the National Science Foundation (NSF) Division of Chemistry (CHE) via 0957849, and the Congressionally Directed Medical Research Program Postdoctoral Fellowship via BC101112.
- 3. Kallenbach-Thieltges A, Großerüchkamp F, Mosig A, Diem M, Tannapfel A, Gerwert K: Immunohistochemistry, histopathology and infrared spectral histopathology of colon cancer tissue sections. J Biophotonics. 2013, 6: 88-100. 10.1002/jbio.201200132. [http://onlinelibrary.wiley.com/doi/10.1002/jbio.201200132/abstract]
- 7. Bhargava R, Wang SQ, Koenig JL: FT-IR Imaging of the interface in multicomponent systems using optical effects induced by differences in refractive index. Appl Spectrosc. 1998, 52 (3): 323-328. 10.1366/0003702981943653. [http://www.opticsinfobase.org/as/abstract.cfm?URI=as-52-3-323]
- 10. Reddy R, Mayerich D, Walsh M, Schulmerich M, Carney PS, Bhargava R: Optimizing the design of FT-IR spectroscopic imaging instruments to obtain increased spatial resolution of chemical species. IEEE Int Symp Biomed Imaging. 2012, 354-357.
- 11. Bassan P, Byrne HJ, Bonnier F, Lee J, Dumas P, Gardner P: Resonant Mie scattering in infrared spectroscopy of biological materials - understanding the ‘dispersion artefact’. Analyst. 2009, 134 (8): 1586-1593. 10.1039/b904808a. [http://pubs.rsc.org/en/Content/ArticleLanding/2009/AN/B904808A]
- 12. Bassan P, Kohler A, Martens H, Lee J, Jackson E, Lockyer N, Dumas P, Brown M, Clarke N, Gardner P: RMieS-EMSC correction for infrared spectra of biological cells: Extension using full Mie theory and GPU computing. J Biophotonics. 2010, 3 (8-9): 609-620. 10.1002/jbio.201000036.
- 16. Deming SN, Michotte Y, Massart DL, Kaufman L, Vandeginste BGM: Chemometrics: A Textbook. 1988, Elsevier Science.
- 17. Levin IW, Bhargava R: Fourier transform infrared vibrational spectroscopic imaging: Integrating microscopy and molecular recognition. Ann Rev Phys Chem. 2005, 56: 429-474. 10.1146/annurev.physchem.56.092503.141205. [http://www.annualreviews.org/doi/abs/10.1146/annurev.physchem.56.092503.141205] [PMID: 15796707]
- 18. Nvidia C: Compute unified device architecture programming guide. NVIDIA: Santa Clara, CA. 2007, 83: 129.
- 19. Drebin RA, Carpenter L, Hanrahan P: Volume rendering. Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, Volume 22. 1988, 65-74.
- 21. Kniss J, Kindlmann G, Hansen C: Interactive volume rendering using multi-dimensional transfer functions and direct manipulation widgets. IEEE Conference on Visualization. 2001, 255-262.
- 22. Kindlmann G, Whitaker R, Tasdizen T, Moller T: Curvature-based transfer functions for direct volume rendering: methods and applications. IEEE Conference on Visualization. 2003, 513-520.
- 25. Crassin C, Neyret F, Lefebvre S, Eisemann E: GigaVoxels: Ray-guided streaming for efficient and detailed voxel rendering. ACM Symposium on Interactive 3D Graphics and Games (I3D). 2009, 15-22.
- 27. Bajaj C, Pascucci V, Schikore D: The contour spectrum. IEEE Conference on Visualization. 1997, 167-173.
- 28. Tenginakai S, Lee J, Machiraju R: Salient iso-surface detection with model-independent statistical signatures. Proceedings of the 12th Annual Conference on Visualization. 2001, 231-238. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=964516]
- 31. Nascimento J, Dias J: Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans Geoscience Remote Sensing. 2005, 43 (4): 898-910. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1411995]
- 33. Beck AH, Sangoi AR, Leung S, Marinelli RJ, Nielsen TO, van de Vijver MJ, West RB, van de Rijn M, Koller D: Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med. 2011, 3 (108): 108ra113. [http://stm.sciencemag.org/content/3/108/108ra113.abstract]
- 35. Schulmerich MV, Finney WF, Fredricks RA, Morris MD: Subsurface Raman spectroscopy and mapping using a globally illuminated non-confocal fiber-optic array probe in the presence of Raman photon migration. Appl Spectrosc. 2006, 60 (2): 109-114. 10.1366/000370206776023340. [http://as.osa.org/abstract.cfm?URI=as-60--2-109]
- 36. Schulmerich MV, Dooley KA, Morris MD, Vanasse TM, Goldstein SA: Transcutaneous fiber optic Raman spectroscopy of bone using annular illumination and a circular array of collection fibers. J Biomed Optics. 2006, 11 (6): 060502. 10.1117/1.2400233. [http://spiedigitallibrary.org/jbo/resource/1/jbopfo/v11/i6/p060502_s1]
- 37. Schulmerich MV, Cole JH, Kreider JM, Esmonde-White F, Dooley KA, Goldstein SA, Morris MD: Transcutaneous Raman spectroscopy of murine bone In Vivo. Appl Spectrosc. 2009, 63 (3): 286-295. 10.1366/000370209787599013. [http://as.osa.org/abstract.cfm?URI=as-63--3-286]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.