Quantitative identification of teratoma tissues formed by human embryonic stem cells with TeratomEye
- Cite this article as:
- Oh, S.K.W., Chua, P., Foon, K.L. et al. Biotechnol Lett (2009) 31: 653. doi:10.1007/s10529-009-9928-1
An automated vision system, TeratomEye, was developed for the identification of three representative tissue types, muscle, gut and neural epithelia, which are commonly found in teratomas formed from human embryonic stem cells. Muscle tissue, a common structure, was identified with an accuracy of 90.3%, with specificity and sensitivity both greater than 90%. Gut epithelia were identified with an accuracy of 87.5%, with specificity and sensitivity greater than 80%. Neural epithelia, which were the most difficult structures to distinguish, gave an accuracy of 47.6%. TeratomEye is therefore useful for the automated identification of differentiated tissues in teratoma sections.
Keywords: Human embryonic stem cells; Identification; Stem cells; Teratoma
Materials and methods
Human embryonic stem cells were injected into SCID mice and, after 10 weeks, teratomas were harvested and sections were prepared and stained with haematoxylin and eosin (H&E) as described previously (Choo et al. 2005). Images were collected at 10× and 20× objective magnification using a Carl Zeiss AxioVert microscope and examined with the imaging software Axio Vision, Release 4.5. All images were sized at 1300 × 1030 pixels and consolidated into a database of 93 images.
All software was written using Matlab version 7.3. The GETmuscle algorithm for muscle segmentation comprises three stages. Firstly, the image is converted from RGB color space to L*a*b* color space, where L*, a* and b* refer to the luminosity value, the chromaticity value on the red-green axis and the chromaticity value on the blue-yellow axis respectively. This is followed by K-means clustering (Duda et al. 2000) to classify the image into four distinct components. The distinct red/pink coloration of muscle tissue generally enables it to be separated from the background and other tissues. The muscle segment is extracted by calculating the Euclidean distance between the mean a* and b* values of each cluster and a threshold/color-marker. Finally, the identified muscle segment is converted to a binary image to remove trace elements of the background, after which the outline is created.
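The clustering and color-marker steps of GETmuscle can be illustrated with a minimal sketch. It is written in Python rather than the authors' Matlab, and it is not their implementation. It assumes the pixels have already been converted to (a*, b*) pairs, uses made-up pixel values and a made-up color marker, and assumes that the cluster whose mean lies closest to the marker is taken as muscle. All function names are hypothetical, and the deterministic farthest-point initialisation is a choice made here for reproducibility.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=50):
    """Plain K-means on 2-D points; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical (a*, b*) values for 12 pixels: background, muscle-like pink,
# nuclei-like purple, and one other tissue colour (three pixels each).
pixels = [(0, 0), (1, -1), (-1, 1),
          (30, 5), (32, 6), (28, 4),
          (15, -30), (14, -28), (16, -32),
          (-5, 20), (-4, 22), (-6, 18)]
marker = (31, 5)  # hypothetical red/pink color marker in (a*, b*)

centroids, labels = kmeans(pixels, k=4)          # four components, as in the text
muscle_cluster = nearest(marker, centroids)      # cluster mean closest to the marker
mask = [1 if lab == muscle_cluster else 0 for lab in labels]  # binary muscle mask
print(mask)
```

On a real image the binary mask would be reshaped to the image dimensions before the clean-up and outlining step.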
The GETgut algorithm for gut segmentation is effected by a series of morphological operations (Mathworks Inc 1997) which are divided into two stages. The primary aim of the first stage is to eliminate the background and remove or mask other small elements in the image; this helps to reduce the possibility of neural structures being included in the segments. Gut tissues can be identified by their lumen, which constitutes a maximum in the image. This is followed by conversion to a binary image via thresholding to obtain gut markers. The threshold level was determined by trial and error with the images from the training set. This results in a shortlist of possible gut structures.
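The thresholding and shortlisting part of this first stage can be sketched as follows. This is a simplified Python illustration and not the authors' Matlab code. It replaces their morphological operations with a plain intensity threshold followed by 4-connected region labelling and a minimum-size filter. The threshold level (0.7), the minimum size (3 pixels), the toy image and the function names are all hypothetical.

```python
def threshold(gray, level):
    """Convert a grayscale image (values in [0, 1]) to a binary image."""
    return [[1 if v > level else 0 for v in row] for row in gray]

def label_regions(binary):
    """Group foreground pixels into 4-connected regions (lists of (row, col))."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                stack, region = [(r, c)], []
                seen[r][c] = True
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                regions.append(region)
    return regions

def shortlist(gray, level, min_size):
    """Bright regions of at least `min_size` pixels: candidate lumen markers."""
    return [reg for reg in label_regions(threshold(gray, level)) if len(reg) >= min_size]

# Toy image: one bright 2x2 "lumen" and one isolated bright speck.
gray = [
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.9, 0.9, 0.2, 0.2, 0.2],
    [0.2, 0.9, 0.9, 0.2, 0.95, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
]
candidates = shortlist(gray, level=0.7, min_size=3)
print(len(candidates), sorted(candidates[0]))
```

The size filter plays the role of removing small elements, so that only the larger bright region survives as a candidate for the second stage.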
Each candidate in the shortlist is then evaluated individually in the second stage using partial least squares discriminant analysis (PLSDA) (Wise et al. 2004). Thirteen sub-images depicting gut epithelium and 25 sub-images depicting non-gut epithelium structures were used to build the PLSDA model. Each sub-image was compressed to a standard size of 60 × 64 pixels, then unfolded to form a row of an array X. Mean centering was used to pre-process X. Y then contains the corresponding class membership of each row in X, where class 1 denotes gut epithelium and class 2 denotes non-gut. The PLSDA model thus developed is used for classifying new candidates.
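The data preparation described above (unfolding, mean centering and class labelling) can be sketched in a few lines. The example is in Python, not the authors' Matlab, and covers only the construction of X and Y, not the PLSDA fit itself. It uses tiny 2 × 2 sub-images with made-up values in place of the 60 × 64 sub-images; the function names are hypothetical. With the sizes given in the text, each unfolded row would hold 60 × 64 = 3840 values and X would have 13 + 25 = 38 rows.

```python
def unfold(sub_image):
    """Flatten a 2-D sub-image row by row into a single list of pixel values."""
    return [v for row in sub_image for v in row]

def mean_center(X):
    """Subtract each column's mean; also return the means so that new
    candidates can be centered with the training means."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    centered = [[v - m for v, m in zip(row, means)] for row in X]
    return centered, means

# Hypothetical 2x2 grayscale sub-images (real ones would be 60 x 64).
gut_images = [[[1, 2], [3, 4]], [[3, 4], [5, 6]]]
non_gut_images = [[[5, 6], [7, 8]]]

X = [unfold(img) for img in gut_images + non_gut_images]
Y = [1] * len(gut_images) + [2] * len(non_gut_images)  # class 1 = gut, class 2 = non-gut

X_centered, column_means = mean_center(X)
print(column_means)
print(X_centered)
print(Y)
```

A new candidate sub-image would be unfolded in the same way and centered with `column_means` before being passed to the fitted model.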
The same segmentation process used for gut tissues was adopted for neural tissues in the GETneural algorithm, with thresholds set at different values. Because neural structures are smaller, the structuring element for image reconstruction has to be reduced (size 7) to create the neural tissue maxima, and a higher binary threshold (0.8196) is needed to isolate the neural tissues. The development of the PLSDA model for neural epithelium classification is similar to that for gut. Twenty-one sub-images depicting neural epithelium and 35 sub-images depicting non-neural epithelium structures were used to build the PLSDA model for neural classification. Both the neural and gut identification algorithms have two phases, selection and identification; both run automatically, and no user intervention is needed for the second phase.
TeratomEye graphical user interface
Results and discussion
Number of images tested, predictive accuracy, specificity and sensitivity of GETmuscle, GETgut and GETneural algorithms for the identification of muscle, gut and neural tissues in TeratomEye
Columns: No. of images tested; Predictive accuracy (%)
A series of morphological operations was used to isolate gut/neural epithelial structures which were then assigned into grayscale sub-images. PLSDA models were developed to recognize sub-images containing gut/neural epithelia. Fifty test images for gut epithelia gave a predictive accuracy of 87.5% with specificity and sensitivity of greater than 80%. GETgut was able to correctly identify all five gut epithelia as shown in Fig. 3b. Gut structures with large lumen maxima were thus relatively easy to identify.
Neural epithelia were the most difficult structures to distinguish: GETneural required 20 training images and 58 test images, and gave an accuracy of 47.6%. It is possible that identification using lumen maxima results in some neural epithelium cells, which do not have a distinctive lumen, being excluded during segmentation. The structural diversity of neural structures, which are sometimes elongated and sometimes more rosette-like, further added to the difficulty in prediction. In particular, irregularly shaped neural structures failed to be identified. Examples of the variety of neural structures can be seen in Fig. 3c; in this case, only three out of eight neural epithelia, those with well-defined borders, were correctly highlighted. We are therefore exploring a wavelet image analysis approach (Misiti et al. 1996) to further improve this program. It may also be necessary to stain with antibodies that specifically highlight neural epithelia prior to identification with TeratomEye to increase the accuracy of this process.
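For reference, the three performance measures quoted here can be computed from true/false positive and negative counts using their standard definitions, as in the short Python sketch below. The source does not report the underlying counts, so the counts in the second example are purely hypothetical. The per-image detection rates use the numbers stated for Fig. 3b (five of five gut epithelia) and Fig. 3c (three of eight neural epithelia), on the assumption that the fraction of true structures highlighted corresponds to sensitivity for that image.

```python
def sensitivity(tp, fn):
    """Fraction of true structures that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-target structures correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Per-image detection rates from the counts given in the text.
fig3b_gut = sensitivity(tp=5, fn=0)     # five of five gut epithelia found
fig3c_neural = sensitivity(tp=3, fn=5)  # three of eight neural epithelia found
print(fig3b_gut, fig3c_neural)

# Hypothetical counts, for illustration only (not from the paper).
tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```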
Currently, the classical method of measuring the pluripotency of hESC is by qualitative visualization followed by choosing only one representative image of each tissue from the three germ layers found in teratomas of SCID mice models as shown in Fig. 1a. Examples of such tissues can be found in recent publications characterising hESC pluripotency (International Stem Cell Initiative 2007; Cooke et al. 2006; Przyborski 2005). This traditional method may require expert help from a trained pathologist and provides no quantitative data on the numbers of differentiated tissues found in teratomas. The International Stem Cell Banking Initiative has accepted teratoma formation as a measure of pluripotency (http://www.stemcellforum.org/forum_initiatives/international_stem_cell_banking_initiative.cfm) and it has been suggested recently that teratoma formation may provide a window to study developmental biology (Aleckovic and Simon 2008).
The TeratomEye program is therefore aimed at providing embryonic stem cell researchers with a means to more objectively identify, and potentially quantify, common structures such as muscle, gut and, to a lesser extent, neural epithelia. For many stem cell researchers who are not trained as pathologists, this software could provide an automated and easy means of qualitatively identifying the three common structures found in teratoma tissues. As TeratomEye can also count the numbers of muscle, gut and neural structures in tissue sections, different hESC lines could potentially be compared for their propensity to form the three types of differentiated tissues in teratomas. TeratomEye has not yet been tested on different hESC lines, but there is anecdotal evidence that some hESC lines are more likely to form cystic structures, which are indicative of poorer teratoma formation. Other structures such as bone and cartilage, which are also found in teratomas, may be added to this program in the future. Cartilage may be easy to resolve as it has a distinctive round shape with spotted nuclei. While other commercially available software can identify tissue sections for oncology, ophthalmology and diabetes research applications, for example that provided by Aperio (www.aperio.com), none is available for the identification of the variety of complex tissues found in teratomas.
In summary, we have developed an automated vision system, TeratomEye, which identified muscle with an accuracy of 90.3%, with specificity and sensitivity greater than 90%. Gut epithelia were identified with an accuracy of 87.5%, with specificity and sensitivity greater than 80%. Neural epithelia, which were the most difficult structures to distinguish, gave an accuracy of 47.6%. With further refinements, TeratomEye can be a useful tool for the automated identification of tissues in teratoma sections, enabling a quantitative measure of the pluripotency of human embryonic stem cells injected into SCID mouse models.
We thank the Agency for Science Technology & Research (A*STAR) for generous funding of our research and Dr. Jeremy Crook for critically reviewing this manuscript.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.