Background

Introduction

The vertebrate retina contains two or more subtypes of photoreceptors and dozens of interneuron types; it is thus organized for effective operation at different light levels and in different bands of the sunlight spectrum. Regional shifts in the densities and proportions of photoreceptor and interneuron subtypes across the retina have been studied intensively, as they are assumed to reflect both the evolution of species and specific adaptations to their lifestyles [1, 2]. Deviations of cell density and mosaic regularity from their normal variability are of particular interest for research on developmental control or on the progressive loss of photoreceptors and other cells due to degenerative diseases and other pathological processes in the visual system.

Obtaining such data implies the acquisition and analysis of a large number of samples, which is often a time-consuming task requiring persons with appropriate training and experience. Given, moreover, the approximately planar organization of the retinal layers, the problem of detecting and counting photoreceptor cells is a promising candidate for at least partial automation. However, while various papers have developed options for mapping and analyzing retinal cell mosaic data once they are digitized [3–7], approaches to full automation of the time-consuming procedure of actual identification and counting have been rare (we mention [8–10]). Besides the time and computing power required for acquiring two- and three-dimensional representations of the tissue of interest, the heterogeneity of the tissue itself remains a major challenge. Many parameters, such as tissue thickness and transparency or preparatory and manipulatory distortions, change across the retina, and the cells of interest themselves vary in size, shape and spacing. Consequently, highly adaptive algorithms are required for reliable detection of the targets and their differentiation from other items such as debris, other cells, local damage or blood vessels.

The present approach focuses on addressing these problems for the segmentation and counting of photoreceptors. We propose an automatic detection and counting procedure based on Rudin-Osher-Fatemi total variation (TV) denoising [11] with subsequent segmentation. As the comparison with manually collected data shows, this method is able to detect the targets with comparable reliability. In the view of the authors, the main advantage of the method is the complete traceability of all data processing steps and its reproducibility independent of a particular software platform. This opens the possibility of standardizing and directly comparing automatic counts for samples obtained in different environments.

In the present study, the method has been implemented as a MATLAB module and applied to single frames (and montages). The method, however, is not limited to the processing of two-dimensional data and can equally be implemented for the analysis of three- or multidimensional data stacks. It could also be integrated into more comprehensive motorized acquisition setups for stereological sampling or complete mapping of retinal populations.

Retinal image data

Retinal samples from the following species have been used: Orangutan (Pongo pygmaeus (L. 1760)), Domestic cat (Felis silvestris catus (L. 1758)), Manul (Felis manul (Pallas 1776)), Eurasian Lynx (Lynx lynx (L. 1758)), Cheetah (Acinonyx jubatus (Schreber 1775)), Jaguar (Panthera onca (L. 1758)), Long-tailed Pangolin (Manis tetradactyla (L. 1766)) and Black-rumped Agouti (Dasyprocta prymnolopha (Wagler 1831)) (see Table 1 and Figure 1).

Table 1 Retinal image data
Figure 1

Examples of retinal micrographs used in the experiments. The scale, indicated by the red bar, is 500 μm in (A) and 50 μm in (B)–(F). (A) Orangutan, No. 1, S-cones and vessels labeled. (B) Domestic cat, No. 4 (clip), S-cones labeled. (C) Jaguar, No. 11 (clip), S-cones and horizontal cells labeled. (D) Pangolin, No. 12 (clip), M-/L-cones labeled. (E) Agouti, No. 13, S-cones labeled. (F) Agouti, No. 16, S-cones labeled. For more details, see Table 1.

Most of the eyes were obtained from animals delivered to veterinary pathology from zoos and animal parks; some originate from collaborations for other studies [12, 13]. Post mortem times were between 0.5 and 12 hours. Eyes were enucleated and immersed in 0.01 M phosphate-buffered saline (PBS, pH 7.4) with 4% paraformaldehyde; some were opened with a cut along the corneal limbus beforehand for faster penetration of the fixative. Retinal wholemounts were prepared in PBS and flattened by radial cuts placed so as to preserve the horizontal and vertical meridians. Cone photoreceptors sensitive to medium/long wavelengths (M-/L-cones) were labeled in the isolated Pangolin retina using the JH492 antibody; in all other retinas, S-cones were labeled using JH455 (both antibodies provided by J. Nathans [14]). In the peripheral Jaguar retina, horizontal cells are (unintentionally) co-labeled by JH455 in addition to S-cones. Retinal vessels in the Orangutan were labeled with rabbit anti-mouse collagen IV (AbD Serotec, 2150-1470). After incubation in the primary antisera overnight or for up to 3 days, visualization was done using goat anti-rabbit IgG-biotin (Sigma, B7389), ExtrAvidin-peroxidase conjugate (Sigma, E2886) and the diaminobenzidine (DAB) reaction. After washing in PBS, retinas were gradually transferred up to 90% glycerol, mounted photoreceptor-side up on a glass slide, and cover-slipped.

Manual counting of labeled cones was done within sampling frames of 150×150 μm or 300×300 μm using an online video overlay system consisting of Canvas 5 software (ACD Systems, USA) on a Macintosh computer connected to a Hamamatsu 2400 analog camera attached to a Nikon Eclipse microscope. This system allows a dual live view of the specimen: through the microscope's optics or on the video image overlaid with the sampling frames. The option to change focus and illumination supports optimized online identification of cells and exclusion of artifacts by position, form, color and other details.

The images (8 or 10 bit grey scale) used for computer-assisted cell counting were obtained with a Photometrix camera, model CH250/A, connected to a Nikon Eclipse E600 microscope (magnification factors 200× to 600×), using QED Imaging Software (QED Imaging Inc., Pittsburgh, PA) on a Macintosh computer. In most cases, a projection image (maximum density or sum) was composed from a stack of images at relevant focus levels using the public-domain NIH Image program (developed at the U.S. National Institutes of Health; available at http://rsb.info.nih.gov/nih-image/ (accessed 11.02.2013)).
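For readers who wish to reproduce the projection step in MATLAB rather than NIH Image, a maximum-intensity projection over the focus levels can be formed as sketched below; the file pattern is a placeholder, not part of the original workflow.

% Illustrative maximum-intensity projection of a focus stack
% (the projections in this study were made with NIH Image).
files = dir('stack_*.tif');                   % placeholder file pattern
stack = [];
for i = 1:numel(files)
    stack(:, :, i) = im2double(imread(files(i).name));
end
Iproj = max(stack, [], 3);                    % per-pixel maximum over the stack
% Isum = sum(stack, 3);                       % alternative: sum projection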

For testing the computer-assisted cell counting method, images were chosen by the following criteria: retinas from different species, differences in morphological quality (varying amounts of cell debris, broken cones, etc.) and in contrast between labeled cells and background (see Table 1). Images with additional labeled structures such as horizontal cells or blood vessels were analyzed as well. The respective objects of interest were then manually counted by an experienced coauthor (C.S.), and the results were tabulated for comparison with the outcome of the automatic counting method (see below).

Description of the detection and counting method

For the reasons mentioned in the introduction, the immediate segmentation of the retinal image data I^(0) by intensity thresholding leads in many instances to poor results (see below). Therefore, we carry out two processing steps before segmentation. In a first step, we generate from the original image I^(0), cf. Figure 2A, a median-filtered version Ĩ using window size m and subtract it from I^(0), which removes a considerable part of the brightness fluctuations in the retinal background, cf. Figure 2B. Subsequently, we subject the image I^(1) = I^(0) − Ĩ to a Rudin-Osher-Fatemi TV denoising procedure, cf. [11]. This method, a well-established standard in mathematical image processing, may be understood as a kind of filtering that generates a coarsened, cartoon-like version of the input data, cf. Figure 2C. Nevertheless, during this procedure the images of the dyed retinal cells are conserved as spots. In mathematical terms, TV denoising means solving a continuous optimization problem, namely

\[
F(x) = \int_\Omega \big( x(s) - I^{(1)}(s) \big)^2 \, ds + \frac{\alpha}{2}\, |x|_{TV} \;\longrightarrow\; \text{Min!}
\tag{1}
\]
Figure 2

The counting procedure. (A) Original data (Pangolin, No. 12, upper left part). (B) Output after Step 1 (subtraction of median), grey values divided by a factor of 1.05. (C) Output of TV denoising after Step 2, grey values divided by a factor of 1.05. (D) Output of Step 4, superimposed on the original image (counted features marked in red). (E) Result of direct segmentation, superimposed on the original image (counted features marked in red). (F) Result of manual counting (counted cells tagged with green dots).

for an unknown function x(s): Ω → [0, 1], which represents the converted ("filtered") image. For more details, we refer to the Appendix. Let us only remark that the number α > 0 within (1) remains fixed from the outset. For the numerical solution of problem (1), highly efficient methods are available by now. In our present study, a recently published solver (by Chambolle/Pock, cf. [15]) has been implemented as a MATLAB subroutine. The segmentation of the output x = (x_ij) of the TV denoising step is then performed by application of the following rule: after calculating the expectation E(x) and variance Var(x), we declare all pixels x_ij with

\[
x_{ij} < E(x) - c \cdot \mathrm{Var}(x)
\tag{2}
\]

as “black enough” to belong to images of photoreceptor cells. Finally, all “black” features comprising at least f connected adjacent pixels, for a given threshold size f, are automatically counted, cf. Figure 2D.

In this method, no more than three parameters remain to be adjusted manually: the window size m of the median filter, the parameter c in (2), which controls the contrast differentiation between photoreceptor cells and the background, and the minimal size f of a connected feature to be recognized as a photoreceptor cell.

Our algorithm can be summarized as follows:

Algorithm 1 Automatic segmentation after TV denoising
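Since the algorithm box of the published version is rendered as a figure, the four steps may, for illustration, be summarized in the following MATLAB-style sketch; the function and helper names (count_cells, tv_denoise_pd) are our own and not part of the distributed tool.

% Sketch of Algorithm 1, reconstructed from the description above.
% count_cells and tv_denoise_pd are illustrative names only.
function n = count_cells(I0, m, c, f, alpha, N)
    I0 = im2double(I0);
    % Step 1: background homogenization by subtraction of the median image
    I1 = I0 - medfilt2(I0, [m m], 'symmetric');
    % Step 2: Rudin-Osher-Fatemi TV denoising of I^(1), cf. (1) and (3)
    x = tv_denoise_pd(I1, alpha, N);
    % Step 3: intensity thresholding according to rule (2)
    mask = x < mean(x(:)) - c * var(x(:));
    % Step 4: count connected features comprising at least f pixels
    [L, num] = bwlabel(mask);
    sizes = histc(L(L > 0), 1:num);    % pixel count per labelled feature
    n = nnz(sizes >= f);
end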

Implementation

Implementation as a MATLAB tool

Algorithm 1 has been implemented as a MATLAB tool with a graphical user interface (cf. Figure 3A), which allows for batch processing of multiple images. It has been tested on MATLAB 7.14.0.739 (R2012a) and requires the MATLAB Image Processing Toolbox (documented at http://www.mathworks.com/products/matlab and http://www.mathworks.com/products/image (accessed 11.02.2013)). In the following, details regarding the implementation of the procedure are given. In Step 1, the background homogenization, the median filtering is realized in a straightforward manner by calling the MATLAB procedure medfilt2(image, [m,m], 'symmetric'), which is part of the Image Processing Toolbox. For the TV denoising in Step 2, the primal-dual algorithm from [15] is utilized. It is realized by performing N steps of the iteration

Figure 3

Screenshots of the MATLAB software tool. (A) Main graphical user interface. (B) The configuration dialog.

\[
\begin{aligned}
\begin{pmatrix} p^{(1),k+1} \\ p^{(2),k+1} \end{pmatrix}
&= P_\alpha\!\left( \begin{pmatrix} p^{(1),k} \\ p^{(2),k} \end{pmatrix}
+ \tau_d \begin{pmatrix} \partial_{s_1}^{+} \bar{x}^{k} \\ \partial_{s_2}^{+} \bar{x}^{k} \end{pmatrix} \right),
\qquad
P_\alpha(v)_{ij} = \frac{\alpha}{\max\!\left( \alpha,\ \sqrt{\big(v_{ij}^{(1)}\big)^2 + \big(v_{ij}^{(2)}\big)^2} \right)}
\begin{pmatrix} v_{ij}^{(1)} \\ v_{ij}^{(2)} \end{pmatrix} \\
x^{k+1} &= \frac{1}{1+\tau_p}\, x^{k} + \frac{\tau_p}{1+\tau_p} \left( I^{(1)} + \partial_{s_1}^{-} p^{(1),k+1} + \partial_{s_2}^{-} p^{(2),k+1} \right) \\
\bar{x}^{k+1} &= 2\, x^{k+1} - x^{k}
\end{aligned}
\tag{3}
\]

with step sizes τ_p, τ_d > 0 such that τ_p · τ_d ≤ 0.125 and the forward and backward finite difference operators

\[
\big(\partial_{s_1}^{+} \bar{x}^{k}\big)_{ij} =
\begin{cases} \bar{x}^{k}_{i+1,j} - \bar{x}^{k}_{ij} & \text{if } 1 \leq i < n, \\ 0 & \text{else,} \end{cases}
\qquad
\big(\partial_{s_2}^{+} \bar{x}^{k}\big)_{ij} =
\begin{cases} \bar{x}^{k}_{i,j+1} - \bar{x}^{k}_{ij} & \text{if } 1 \leq j < r, \\ 0 & \text{else,} \end{cases}
\tag{4}
\]
\[
\big(\partial_{s_1}^{-} p^{(1),k+1}\big)_{ij} =
\begin{cases} p^{(1),k+1}_{1j} & \text{if } i = 1, \\ p^{(1),k+1}_{ij} - p^{(1),k+1}_{i-1,j} & \text{if } 1 < i < n, \\ -\,p^{(1),k+1}_{n-1,j} & \text{if } i = n, \end{cases}
\tag{5}
\]
\[
\big(\partial_{s_2}^{-} p^{(2),k+1}\big)_{ij} =
\begin{cases} p^{(2),k+1}_{i1} & \text{if } j = 1, \\ p^{(2),k+1}_{ij} - p^{(2),k+1}_{i,j-1} & \text{if } 1 < j < r, \\ -\,p^{(2),k+1}_{i,r-1} & \text{if } j = r. \end{cases}
\tag{6}
\]

The initializations are x^0 = x̄^0 = I^(1) and p^(1),0 = p^(2),0 = 0, and the output is given by (x_ij) = (x^N_ij). As default values for the regularization parameter and the number of iterations, we use α = 0.05 and N = 50. Note that, in principle, other minimization algorithms could be employed for solving the TV denoising problem; for instance, we mention the generalized TV approach from [16] and the optimal control method described in [17], which is based on the interior-point solver IPOPT [18, 19].
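For illustration, a self-contained MATLAB sketch of iteration (3) with the difference operators (4)–(6) could read as follows; the function name and the concrete step sizes τ_p = τ_d = 0.35 (so that τ_p · τ_d = 0.1225 ≤ 0.125) are our own assumptions and need not coincide with the distributed code.

% Sketch of the primal-dual iteration (3)-(6); the step sizes are one
% admissible choice satisfying tau_p * tau_d <= 0.125.
function x = tv_denoise_pd(I1, alpha, N)
    [n, r] = size(I1);
    tau_p = 0.35;  tau_d = 0.35;
    x = I1;  xbar = I1;                       % x^0 = xbar^0 = I^(1)
    p1 = zeros(n, r);  p2 = zeros(n, r);      % p^(1),0 = p^(2),0 = 0
    for k = 1:N
        % forward differences (4) of the extrapolated iterate xbar
        d1 = [diff(xbar, 1, 1); zeros(1, r)];
        d2 = [diff(xbar, 1, 2), zeros(n, 1)];
        % dual update with projection P_alpha, cf. (3)
        v1 = p1 + tau_d * d1;
        v2 = p2 + tau_d * d2;
        scale = alpha ./ max(alpha, sqrt(v1.^2 + v2.^2));
        p1 = scale .* v1;
        p2 = scale .* v2;
        % backward differences (5) and (6) of the dual variables
        b1 = [p1(1, :); diff(p1(1:n-1, :), 1, 1); -p1(n-1, :)];
        b2 = [p2(:, 1), diff(p2(:, 1:r-1), 1, 2), -p2(:, r-1)];
        % primal update and overrelaxation, cf. (3)
        xnew = (x + tau_p * (I1 + b1 + b2)) / (1 + tau_p);
        xbar = 2 * xnew - x;
        x = xnew;
    end
end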

The thresholding in Step 3 is realized with the help of the built-in MATLAB functions mean and var. Finally, for Step 4, the labelling and counting procedure, the function bwlabel is utilized, which is again part of the Image Processing Toolbox. It yields a labelled image in which each connected component is identified by a positive integer. With this information, the identification and counting of those connected components comprising at least f pixels is easily realized.
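A minimal sketch of Steps 3 and 4, which additionally extracts the feature positions reported by the tool (see Usage below), could look as follows; regionprops is used here as an equivalent alternative to raw label counting, and all variable names are illustrative.

% Steps 3 and 4 with feature positions; x is the TV denoising output.
mask  = x < mean(x(:)) - c * var(x(:));        % thresholding rule (2)
L     = bwlabel(mask);                         % label connected components
stats = regionprops(L, 'Area', 'Centroid');    % size and position per feature
keep  = [stats.Area] >= f;                     % discard features below f pixels
count = nnz(keep);
positions = reshape([stats(keep).Centroid], 2, []).';   % one [x y] row per cell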

Usage

Usually, in order to analyze the topography of items in a retinal preparation, a considerable number of image files has to be generated, each showing a segment. Our software tool was especially designed to cope with multiple files showing similar structures. In this situation, it is advisable to start with a manual count within one or two typical images in order to calibrate the parameters m, c and f. This can be done by starting the program and adding a single image to the file list. In most cases, the default parameters give a good starting point, so one can perform a segmentation and decide whether the results are satisfactory. Otherwise, the parameters m, c and f can be adjusted in the configuration dialog (cf. Figure 3B) before trying again. Once the parameters have been adjusted, they can be used for the analysis of the whole image set. The batch processing feature of the software allows this analysis to be performed without further user interaction: simply add the remaining files to the list and start the segmentation procedure. Finally, a report listing, for each file in the batch, the number and mean density of detected cells as well as their positions can be generated automatically. As default values for the parameters m, c and f, we hard-coded m = 30, c = 2.5 and f = 5. For convenience and reproducibility, the tool also provides saving and loading of the file list as well as of a list of all parameters.
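The same calibrate-then-batch workflow can also be scripted; the following sketch reuses the hypothetical helper count_cells from the pipeline sketch above, with a placeholder file pattern.

% Scripted batch analysis with previously calibrated parameters
% (the GUI tool performs the equivalent steps internally).
m = 30;  c = 2.5;  f = 5;                     % calibrated on one or two test images
alpha = 0.05;  N = 50;                        % TV denoising defaults
files = dir('retina_*.tif');                  % placeholder image set
counts = zeros(numel(files), 1);
for i = 1:numel(files)
    counts(i) = count_cells(imread(files(i).name), m, c, f, alpha, N);
    fprintf('%s: %d cells\n', files(i).name, counts(i));
end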

Results and discussion

The results of automatic counting are documented in Tables 2, 3 and 4. In order to justify the application of a counting procedure, it is essential that manually counted data be reliably matched. Consequently, the counts generated by TV denoising and subsequent segmentation are compared with carefully performed manual counts. The results are listed in Table 2. Furthermore, for every automatic count, the number of recognized cells that had also been marked in the manual counting procedure (Table 3), as well as the number of artifacts (Table 4), were identified. For comparison, we performed automatic counts by direct segmentation without preceding TV denoising, using the same parameters c and f as in Algorithm 1.

Table 2 Overall cell counts by different methods
Table 3 Number of correctly recognized cells within the automatic counts
Table 4 Artifacts within the automatic counts produced by the different methods

The quality requirements for automatic counts depend on the specific goal of interest. When assessing general distribution patterns or gradients for certain types of receptors, a relative error of 10% may be acceptable, while for the analysis of degeneration or proliferation phenomena or the detection of initial points of density changes, an error of about 5% or less is desirable. Table 2 shows that the latter goal has been achieved by the TV/segmentation method in 12 of 18 cases, while the error is below 10% in 16 of 18 cases; the mean error amounts to 5.9%. Moreover, Table 3 shows that the method correctly recognizes more than 90% of the manually marked cells in 15 of 18 cases (93.0% on average). The proportion of artifacts contained in the counts, as listed in Table 4, amounts to 8.7% on average. Thus, given that the TV/segmentation method makes no use of additional information about the shape of the cells or the variation of their size, these results are quite satisfactory.

As was to be expected, our results show that automatic counting by the TV/segmentation method is superior to direct segmentation in every respect. The mean relative error produced by the latter amounts to as much as 23.6%, and the relative error falls below 10% in only 8 of 18 cases. The loss of precision is mostly caused by the fact that direct segmentation recognizes a significantly smaller number of marked cells than the TV/segmentation method (cf. Figure 2D–F), while the ratio of artifacts produced by the two methods is comparable. The superiority of the TV/segmentation method can be seen as well by comparing the standard deviations of the indicators. Let us remark that, during our experiments, we further observed that the TV/segmentation method is superior even to direct segmentation after subtraction of the median (Steps 1, 3 and 4 of Algorithm 1).

As the patterns within the Orangutan retina closely resemble the organization of the human retina, similar results are to be expected for human samples.

Let us briefly compare the proposed method with other approaches pursued in the literature. In [8], the authors perform the image processing steps with a commercial software package which, unfortunately, comes as a "black box" without documentation of the internally used algorithms. In [9, 10, 20, 21], after certain presmoothing/denoising steps, watershed segmentation is employed. Additionally, before segmenting, an illumination correction is performed in [20], while the authors of [10] interpose a contrast enhancement step. A common feature of all approaches is the necessity for the experimenter to select a number of parameters, including the (expected or minimal) feature size.

Although preprocessing steps, particularly denoising or smoothing of the raw image data, are crucial for the quality of the subsequent segmentation, they have not been thoroughly documented in the cited references (if at all), and their dependence on additional, manually tuned parameters remains unclear. Moreover, any denoising method generates artifacts, modifying fine structures within the images in a specific way. In contrast, the preprocessing steps involved in our method (median filtering and TV denoising) are traceable and reproducible, including the manual setting of the single parameter m. For the TV denoising method, the characteristic artifacts are well investigated (see e.g. the discussion in [16], pp. 519 ff.); in fact, this method has been deliberately selected in order to take advantage of its well-known "cartooning" effect. As a further difference, we segment by intensity thresholding instead of watershedding. The latter approach is well suited for the analysis of large, clumpy cell aggregates, while our method is better suited for the detection of single cells to be differentiated from a more or less blotchy background, which may contain additional structures like vessels or different cell types. The segmentation depends on no more than two further parameters, c and f, which have to be selected on the basis of the observed contrast as well as a reasonable guess of the feature size. A possible improvement could be the introduction of an additional upper bound for the size of the recognized features; this would, however, reduce and possibly underestimate the number of cells, since larger aggregates formed by merged cell images would then be excluded.
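Such an upper bound could be realized, for instance, by one additional comparison in Step 4; in the following sketch, f_max is a hypothetical parameter that does not exist in the present tool.

% Possible variant of Step 4 with an upper size bound f_max (hypothetical);
% aggregates of merged cell images larger than f_max would be excluded.
stats = regionprops(bwlabel(mask), 'Area');
areas = [stats.Area];
count = nnz(areas >= f & areas <= f_max);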

The reliability of our method, when evaluated by the mean relative error of automatic counting in relation to manual counts (5.9% as documented above), fits well within the range of errors documented in the cited references: [9], p. 1969, Figure one: ≤ 10% in 11 of 23 cases; [10], p. 641, Figure two: ≤ 5% in 31 of 40 cases; [20], p. R100.7, Figure two(A): 6% and 17%; [21], p. 589, Figure three: ≤ 10% in roughly half of the cases.

Due to its traceability, the TV denoising method offers the further advantage of easy reimplementation. On the one hand, this is possible with non-proprietary software; on the other hand, the method may be carried over (after optimization of the code as necessary) to the analysis of three- or multidimensional data stacks. Concerning the runtime behaviour, the analysis of a 1024×1024-pixel image typically takes less than 40 s (on a mini-PC equipped with a four-processor Intel(R) Core(TM) i3 CPU M380 @ 2.53 GHz), where approximately half of the time is consumed by the median filtering procedure. No particular attempts at performance tuning have been made.

The limitations of the proposed method are exemplified in Figure 4, which shows typical errors within the automatic counting that would be avoided by a human examiner. Adjacent cells are merged and counted as a single feature (Figure 4A and B). Another typical error occurs when a single photoreceptor cell does not lie exactly in the image plane: the resulting blurred image may be "broken" into two or more features, resulting in double or multiple counts (Figure 4C and D). Also, cell debris and background spots may not be recognized as such (Figure 4E and F). Consequently, for heavily contaminated samples such as Nos. 16 and 18, a reduction of the quality of the automatic count is to be expected (in Table 2, No. 16 is the outlier). If background structures such as horizontal cells or vessels are present in the samples, parts of them may be counted as cells as well (Figure 4G–J). Normally, however, the TV denoising method is well able to differentiate horizontal cells or vessels and even to recognize target cells positioned immediately above a horizontal cell or a vessel. This is also exemplified in Figure 4G–J.

Figure 4

Typical errors in automatic counting. Sample clips with manually counted photoreceptors (tagged with green or blue dots, respectively) (A,C,E,G,I) are compared with automatic counts obtained with Algorithm 1 (counted features marked in red) (B,D,F,H,J). In the latter, the arrows point to occasional artifactual errors. (A,B) Lynx, No. 8: adjacent cells have been merged and counted as a single one. (C,D) Pangolin, No. 12: single cells are counted twice. (E,F) Agouti, No. 16: background features (clustered melanin granules) and outer segment fragments are counted as cells. (G,H) Jaguar, No. 11: parts of horizontal cells in the background are counted as S-cones. Note, however, that S-cones and horizontal cells are generally well differentiated, and even S-cones positioned directly above a horizontal cell are correctly recognized (middle left). (I,J) Orangutan, No. 2: parts of vessels are counted as S-cones. In general, however, S-cones and vessels are well differentiated, and even S-cones positioned directly above a vessel are correctly recognized (left and middle part of the clip).

Conclusions

In conclusion, the presented approach provides a reproducible method for the segmentation of labeled cellular objects, in particular retinal photoreceptors, from grey-scale micrographs. Relatively simple outlines and little overlap between the targets in the projection plane are clearly important preconditions for the current implementation, but within these limitations the procedure has been shown to deliver robust results well comparable to those of manual counts by experienced observers. The method is well documented and can be transferred to other tasks, including setups with more automation of image acquisition and further refinements such as 2D or 3D object recognition criteria.

Availability and requirements

The software, which is licensed under the GNU General Public License v.2, is available at http://www.meduniwien.ac.at/counttool/ within the archive cacount_tool_2013_02_08.zip. Its execution requires a current version of MATLAB (e.g., v. 7.14.0.739 (R2012a) or higher) together with the MATLAB Image Processing Toolbox (e.g., v. 8.0 (R2012a) or higher). Within this environment, the tool runs platform-independently. The code has been written in MATLAB; its listing is provided as Additional file 1. To use the tool, extract the zip archive, add its location as well as the location of the image data to the MATLAB path, and type the command main. Further details on the usage can be found in the accompanying online documentation.
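For illustration, a typical setup session could look as follows; the directory names are examples only.

% Typical setup after downloading the archive (directory names are examples):
unzip('cacount_tool_2013_02_08.zip', 'cacount_tool');
addpath('cacount_tool');                      % location of the tool
addpath('my_retina_images');                  % location of the image data
main                                          % start the graphical user interface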

Appendix

Rudin-Osher-Fatemi TV denoising: mathematical background

The Rudin-Osher-Fatemi TV denoising procedure fits into a framework in which greyscale images are modeled by "continuous" rather than "discrete" mathematical objects. More precisely, a greyscale image is identified with a function x(s): Ω → ℝ, which is at least bounded and measurable. The commonly used model for the capture of an original scene is the equation

\[
I(s) = S\big( x(s) \big) + N(s)
\tag{7}
\]

where x(s): Ω → ℝ is an "ideal" image of the scene, S is an operator encoding the known systematic errors of the imaging device, and N(s) is a noise term, cf. [22], pp. 60 ff., and [23], pp. 7 ff. Due to the presence of N(s), the error in the formal solution of (7),

\[
x(s) = S^{-1}\big( I(s) \big) - S^{-1}\big( N(s) \big),
\tag{8}
\]

cannot be controlled by the possible deviations within the captured data I(s) alone. In mathematical terms, the reconstruction of the "denoised" or "smoothed" image x(s) via (8) thus represents an "ill-posed problem", which calls for regularization. In large parts of the literature, consequently, image denoising is performed by minimizing functionals of the type

\[
F(x) = \int_\Omega \big( I(s) - S(x(s)) \big)^2 \, ds + \frac{\alpha}{2} \cdot R(x)
\tag{9}
\]

over suitable function spaces, e.g. spaces of Sobolev functions or functions of bounded variation. The first term within F, the data fidelity term, aims for a least-squares approximation of the captured data I(s), while the second, the so-called regularization term, is introduced in order to ensure existence as well as uniqueness of a minimizing solution x(s). The influence of the second term is weighted by a number α > 0, called the regularization parameter. Note that the regularization term depends on the gradient ∇x(s), thus favoring a certain edge structure within the minimizing solution x(s). A rigorous mathematical development of this idea relies on a closer analysis of the Euler-Lagrange equation, which must be satisfied, as a second-order PDE, by the minimizers of (9) as a necessary condition, cf. [22], pp. 64–66.

In the Rudin-Osher-Fatemi TV denoising problem, the operator within the data fidelity term is the identity, S(x(s)) = x(s). The regularization term, which favors a minimizing solution x(s) with a fairly accentuated edge structure, is taken as R(x) = |x|_TV with

\[
|x|_{TV} = \sup\left\{ \int_\Omega x(s) \left( \frac{\partial \psi_1}{\partial s_1}(s) + \frac{\partial \psi_2}{\partial s_2}(s) \right) ds \;\middle|\;
\begin{array}{l}
\psi_1(s), \psi_2(s)\colon \Omega \to \mathbb{R} \text{ are continuously differentiable} \\
\text{test functions, taking zero boundary values and} \\
\text{satisfying } \psi_1(s)^2 + \psi_2(s)^2 \leq 1 \text{ everywhere on } \Omega
\end{array}
\right\}
\tag{10}
\]

and the minimization is performed over all functions of bounded variation on Ω (for more details, cf. [24], pp. 355 ff., and [25], pp. 175 ff.). Loosely speaking, |x|_TV is a measure of the oscillation of the function x(s) and depends on the (generalized) derivatives of x(s) through a duality expression. A closer mathematical analysis reveals that the solution x(s) of the Rudin-Osher-Fatemi TV denoising problem largely preserves the edge structure of the input data I(s). After a suitable discretization, the particular structure of |x|_TV allows for the application of highly efficient primal-dual numerical solvers like the one utilized in our present study, cf. again [25], pp. 179 ff.
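For completeness, we note the standard isotropic discretization of |x|_TV on an n×r pixel grid, which underlies solvers like (3); this is a textbook formula and not quoted from the cited references:
\[
|x|_{TV} \approx \sum_{i=1}^{n} \sum_{j=1}^{r} \sqrt{ \big(\partial_{s_1}^{+} x\big)_{ij}^{2} + \big(\partial_{s_2}^{+} x\big)_{ij}^{2} }\,,
\]
with the forward difference operators from (4).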