1 Introduction

This paper describes an application of model-based methods for the algorithmic correction of segmentation errors in digitised histological images. This is a real-world application where qualitative spatial reasoning (QSR) and constraint-satisfaction programming methods have been integrated with classical image processing methods to develop context-based histological imaging algorithms (Footnote 1). The context here arises from: (i) taking an ontological stance whereby regions rather than pixels in digitised images are deemed to be the main carriers of histological content, and (ii) highlighting the importance of, and explicitly representing, topological (and in particular relational) information encoded in digitised histological images. The topological analysis and constraints are provided by the spatial logic Discrete Mereotopology (DM) [1, 2], which we use to augment the operations of classical Mathematical Morphology (MM) by explicitly encoding a set of binary relations such as contact, overlap and the part-whole relation on pairs of regions. These mereotopological relations are used both to model the domain and to guide algorithms for correcting segmentation results so that they conform to the requirements for a valid histological model.

In three-dimensional reality a cell consists of a nucleus surrounded by a body of cytoplasm; the two are in contact but do not overlap, the nucleus exactly filling a cavity within the cytoplasm. In a correctly-formed H&E (haematoxylin and eosin)-stained two-dimensional image, on the other hand, based on the differential staining of cellular components, the cytoplasm region appears as a simply connected whole, lacking a cavity, and the nucleus forms a proper part of this (see the idealised image in Fig. 1(a)). In practice, however, cell and nuclear segmentations are most often achieved independently of each other, which can result in imperfect relations, where, e.g., the nucleus partially overlaps its cytoplasm, as in Fig. 1(b). Such errors can often be corrected by a process of resegmentation, whereby one or both of the cell component images are manipulated using morphological or other operators so that the expected spatial relation between them holds. In Fig. 1(c) this is accomplished by eroding the nucleus; in (d), by dilating the cytoplasm. If the changes required to achieve this are too extreme, then the presence of an imaging artefact or other anomaly may be indicated.

Fig. 1. Idealised image of an H&E-stained nucleated cell, with pink-stained cytoplasm and blue-stained nucleus; overlap between cytoplasm and nucleus is shown in magenta. In (a) the nucleus forms a proper part of the cytoplasm, as expected; (b) shows an anomalous image in which the nucleus partially overlaps the cytoplasm; (c) and (d) show possible morphological corrections of (b), with dotted lines marking the original extent of nucleus and cytoplasm respectively. (Colour figure online)

Our approach to resegmentation uses a set of directed graphs defined on region-region relations. Each graph details transformations to the relations arising from application of specified changes to their relata. These changes use discrete topological and set-theoretic operators defined on regions and mapped to their MM equivalents. The graphs enable transitions between relations (e.g., from partial overlap to proper part) to be implemented as sequences of operations on the regions themselves. This correspondence between relations and operations underpins the model-based correction of segmentation described in this paper.

This paper builds on a series of papers that apply DM to the interpretation and segmentation of histological images [3,4,5,6]. While the mereotopological interpretation of digitised images is not new (see e.g., [7]), what is novel is its role in enabling systematic context-based manipulation of regions and their relations in quantitative histological image processing. DM first appeared in [1], which arose out of the mereotopological theory RCC and its well-known eight-element relation set RCC8 [8]. The GRCC (Generalised Region Connection Calculus) [9] offers another way to model discrete domains using mereotopology. As a formalism for modelling cellular processes, mereotopology was used to model phagocytosis and exocytosis in [10]. A motivating example of DM used in a constraint-based graph traversal problem is given in [11]. In general, though, while RCC8 has been used extensively in the development of efficient constraint-based algorithms (for deciding the consistency of a set of constraints) the algorithms are typically restricted to operations on symbolic state-state models and not grounded in interpreted segmented digitised images as done here. For a good introduction to QSR literature and its methods, see [12].

This paper extends the framework that was originally reported in [6]. Specifically, the four directed graphs originally described in [6] have been extended to include a set of eight set-theoretical as well as topological operations on regions. The method is generic and does not favour any one particular image segmentation method over another. All that is assumed is that the segmentation of images into regions forming relations with other regions can ultimately be represented as binary regions, e.g. as masks. These relations can be used to test whether or not the segmented regions conform to a histological model. Where they fail to conform to the segmentation model they may either be rejected completely, or corrected by applying one of several operations on the regions in question. For this reason, we do not necessarily require a set of gold-standard segmented images with which to compare and analyse the results (though of course this could be done; we discuss this further below). In its place we consider the set of topological segmentation solutions satisfying our assumed histological model and work with those. Reasons of space and the scope of this paper mean we cannot adequately compare and contrast other segmentation methods and the various validation methods used in the literature here, though we acknowledge there is a large body of literature on the subject, e.g. [13, 14]. However, none of these frameworks adopts the topological framework assumed here, nor do they address systematic methods of resegmentation to bring the results of initial segmentation into line with the expectations of histological theory, which is the main focus of the present paper.

2 Discrete Mereotopology (DM)

Of the different variants of DM that have been proposed, the version we adopt is the one described in [1, 6]. The domain is a set of possibly empty regions which are defined as sets of pixels. We denote regions by lower-case letters (\(x, y, \ldots \)); individual pixels are denoted \(\hat{x}, \hat{y}, \ldots \). Set inclusion is defined in the standard way, the part-whole relation (P) defined as inclusion restricted to non-null regions, and overlap (O) between regions restricted to regions sharing a part in common. Adjacency between pixels is axiomatised as a reflexive, symmetric relation. This is used to define the contact relation (C) between regions. The set of all pixels is defined as the universal region u and the null set as \(\emptyset \). Singleton pixel-sized regions are treated as atoms. The neighbourhood \(N(\hat{x})\) of pixel \(\hat{x}\) is defined as the set of pixels in u that are adjacent to \(\hat{x}\). In our application domain u is a rectangular pixel array, and \(N(\hat{x})\) a 3 \(\times \) 3 pixel array centred on \(\hat{x}\). Implemented in MM, the function \(N(\hat{x})\) maps to a morphological 3 \(\times \) 3-pixel, 8-connected structuring element which we now assume by default.
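The primitives just described can be sketched in a few lines. This is only an illustrative sketch, assuming that a region is represented as a Python `frozenset` of `(row, column)` pixel coordinates and that adjacency is 8-connected; the function names (`universe`, `neighbourhood`, `part`, `overlap`, `contact`) are hypothetical and are not taken from the cited papers.

```python
from itertools import product

def universe(rows, cols):
    """The universal region u: every pixel of a rows x cols array."""
    return frozenset(product(range(rows), range(cols)))

def neighbourhood(p, u):
    """N(p): the 3x3 block centred on pixel p, clipped to u (p itself included)."""
    r, c = p
    block = {(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
    return frozenset(block) & u

def part(x, y):
    """P(x, y): set inclusion restricted to non-null regions."""
    return bool(x) and x <= y

def overlap(x, y):
    """O(x, y): x and y share a part, i.e. at least one pixel."""
    return bool(x & y)

def contact(x, y, u):
    """C(x, y): some pixel of x is adjacent to (or identical with) a pixel of y."""
    return any(neighbourhood(p, u) & y for p in x)

u = universe(4, 4)
x = frozenset({(0, 0), (0, 1)})
y = frozenset({(1, 2)})   # diagonal neighbour of pixel (0, 1)
z = frozenset({(3, 3)})   # far corner of the image

print(contact(x, y, u), overlap(x, y), contact(x, z, u))
```

Because adjacency is reflexive, overlap implies contact in this representation, while two regions can be in contact without sharing a pixel (as `x` and `y` are here).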

As is common practice in QSR when developing these spatial logics and algebras, subsets of dyadic relations forming jointly exhaustive and pairwise disjoint (JEPD) sets are singled out. In DM two JEPD relation sets are used: first, the eight-element set RCC8 comprising DC, EC, PO, TPP, NTPP, TPPi, NTPPi and EQ, respectively disconnected, externally connected, partial overlap, tangential proper part, non-tangential proper part, with their inverses, and equals, and second, the five-element relation set RCC5 comprising the relations DR (= \(\mathsf{DC}|\mathsf{EC}\)), PO, PP (= \(\mathsf{TPP}|\mathsf{NTPP}\)), PPi (= \(\mathsf{TPPi}|\mathsf{NTPPi}\)), and EQ, respectively disjoint, partial overlap, proper part, inverse proper part, and equals. Here the symbol ‘|’ signifies disjunction, e.g., \(\mathsf{DR}(x,y)\leftrightarrow \mathsf{DC}(x,y)\vee \mathsf{EC}(x,y)\). These respectively form the eight and five base relations of two relational subsumption lattices with top and bottom elements interpreted as the universal and null dyadic relations. Given that RCC8 is predicated on a continuous embedding space and DM on a discrete one, the relations sharing the same name are not strictly identical. For this reason, DM’s JEPD relation sets are identified in the text by the suffix ‘D’ as in ‘RCC8D’.

In [6] the discrete interior (\({\textsf {int}}_{D}\)) and closure (\({\textsf {cl}}_{D}\)) are pseudo-topological operators defined on regions; they share some but not all of the usual properties of the interior and closure operators in standard treatments of topology. A map between these operators and the MM operations erosion and dilation [5] enables one to define a notion of approximate equality that underpins transitions encoded in DM’s conceptual neighbourhood graphs, and it also enables the RCC8D relation set to be easily implemented in any image-processing program featuring standard MM libraries. Other properties of regions can be defined in DM, e.g., regions with or without an interior, regular regions (i.e., those without pixel-wide spikes and fissures), self-connected regions, connected components.
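To make the link between the discrete operators and the RCC8D relations concrete, the following is a minimal sketch rather than the authors' implementation. It assumes regions are `frozenset`s of `(row, column)` pixels, that the closure is a 3 \(\times \) 3 dilation and the interior a 3 \(\times \) 3 erosion (both clipped to the image), and that a proper part is non-tangential exactly when its closure still lies inside the containing region. That last reading, and all function names, are assumptions made for illustration.

```python
from itertools import product

def neighbourhood(p, u):
    """3x3 block centred on pixel p, clipped to the image u."""
    r, c = p
    return frozenset((r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) & u

def cl_d(x, u):
    """Discrete closure: union of the neighbourhoods of the pixels of x (dilation)."""
    return frozenset(q for p in x for q in neighbourhood(p, u))

def int_d(x, u):
    """Discrete interior: pixels of x whose whole neighbourhood lies in x (erosion)."""
    return frozenset(p for p in x if neighbourhood(p, u) <= x)

def rcc8d(x, y, u):
    """Name of the RCC8D relation holding between non-null regions x and y."""
    if not x or not y:
        raise ValueError("RCC8D is only defined here for non-null regions")
    if x == y:
        return "EQ"
    if not (x & y):                       # no shared pixel
        return "EC" if cl_d(x, u) & y else "DC"
    if x < y:                             # x is a proper part of y
        return "NTPP" if cl_d(x, u) <= y else "TPP"
    if y < x:                             # y is a proper part of x
        return "NTPPi" if cl_d(y, u) <= x else "TPPi"
    return "PO"

u = frozenset(product(range(5), range(5)))
block = frozenset(product(range(1, 4), range(1, 4)))   # 3x3 block in the middle
centre = frozenset({(2, 2)})
corner = frozenset({(1, 1)})

print(rcc8d(centre, block, u), rcc8d(corner, block, u), rcc8d(block, centre, u))
print(rcc8d(frozenset({(0, 0)}), frozenset({(1, 1)}), u))
print(rcc8d(frozenset({(0, 0)}), frozenset({(4, 4)}), u))
```

In an MM library the same tests would typically be written with binary erosion and dilation on masks; the set-based form is used here only to keep the sketch free of dependencies.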

In the histological domain, cells and their parts, groups of cells forming tissues and compartments, and the background of a digitised histological preparation can all be modelled using DM. Simple regions and arbitrary sets of pixels forming regions which may or may not be spatially contiguous all yield potential models. If a histological preparation is thresholded as a single binary image, then segmented regions of interest will form connected components, so that in any one image, the only possible relations between pairs of regions are DC and EQ. The methods presented here, however, assume that two or more independent imaging or segmentation modalities are used, e.g., separating out the contribution of the different dyes in stained sections (as in Fig. 1) or confocal microscopy channels. In this case, pairs of regions, segmented from each channel, can be compared, all RCC5D and RCC8D relations being now possible.

3 Conceptual Neighbourhood Graphs, Continuity and Change, and Composition Tables

In [1, 6] a set of conceptual neighbourhood diagrams (CNDs), or graphs, were defined on dyadic relations. In RCC8, relation \(R'\) is a conceptual neighbour of relation R if some pair of regions related by R can be continuously deformed so that R changes to \(R'\) with no other relation holding during that deformation. In the discrete setting of DM, continuous deformation is recast in terms of minimal change. In [1], this was defined using the discrete interior (\({\textsf {int}}_{D}\)) and closure (\({\textsf {cl}}_{D}\)) operators; here we extend this to include changes to regions produced using the set-theoretic operators union (\(\mathsf {sum}\)), intersection and difference (\(\mathsf {diff}\)) on region pairs. The universal region u, representing the image, is assumed to be self-connected, i.e. SC(u).

In general, an RCC8D conceptual neighbourhood of a binary relation R can be defined as

\[\mathcal {N}_{\langle \alpha ,\beta \rangle }(R) =_\mathrm{def} \{R' \in \mathsf{RCC8D}\ |\ \exists x, y\, (R(x,y) \wedge R'(\alpha (x,y),\beta (x,y)))\} \qquad (1)\]

where \(\alpha \) and \(\beta \) are designated functions of the region variables x, y. The elements of \(\mathcal {N}_{\langle \alpha ,\beta \rangle }(R)\) are called the \(\langle \alpha ,\beta \rangle \)-neighbours of R. They are all the possible relations that can hold after regions x and y are modified in accordance with \(\alpha , \beta \). Given a segmented image, by a resegmentation we understand the replacement of a set S of regions in the image by a new set \(S'\) defined from S using some sequence of conceptual neighbourhood transitions. Such a resegmentation is chosen in order to correct anomalous relations in the original segmentation so that it satisfies the constraints of the domain being modelled.

Figure 2 shows a set of graphs that encode the conceptual neighbourhood relations defined in terms of the set-theoretic operators and the discrete topological operators \({\textsf {int}}_{D}\) and \({\textsf {cl}}_{D}\). In the first graph, for example, showing

\[\mathcal {N}_{\langle x,\mathsf {sum}(x,y)\rangle }(R) \qquad (2)\]

the arrow from DC to TPP represents the case \(\mathcal {N}_{\langle x,\mathsf {sum}(x,y)\rangle }(\mathsf{DC}) = \{\mathsf{TPP}\}\), meaning that if \(\mathsf{DC}(x,y)\) holds then we must have \(\mathsf{TPP}(x,\mathsf {sum}(x,y))\). In this case only one resegmentation exists; other cases, such as \(\mathcal {N}_{\langle x,\mathsf {sum}(x,y)\rangle }(\mathsf{PO}) = \{\mathsf{TPP},\mathsf{NTPP}\}\), may allow more than one. Loops in the CND indicate where a change to a region of a designated pair does not necessarily result in a corresponding change of relation. Isolated nodes or nodes without outgoing edges arise where an operator returns null (e.g., \(\mathsf {diff}(x,y)\) where \(\mathsf{PP}(x,y)\)).
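The notion of \(\langle \alpha ,\beta \rangle \)-neighbours can be explored empirically by exhaustive enumeration on a very small image. The sketch below is an illustration only: it assumes the set-based RCC8D reading used for illustration (proper parts are non-tangential when their closure stays inside the containing region), it reads the first graph as \(\alpha (x,y) = x\), \(\beta (x,y) = \mathsf {sum}(x,y)\), and all function names are hypothetical. A tiny grid can only show which transitions are realisable on that grid; it is no substitute for the mechanical proofs referred to in the text.

```python
from itertools import combinations, product

def neighbourhood(p, u):
    """3x3 block centred on pixel p, clipped to the image u."""
    r, c = p
    return frozenset((r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) & u

def cl_d(x, u):
    """Discrete closure (3x3 dilation clipped to u)."""
    return frozenset(q for p in x for q in neighbourhood(p, u))

def rcc8d(x, y, u):
    """RCC8D relation between non-null regions x and y (assumed definitions)."""
    if x == y:
        return "EQ"
    if not (x & y):
        return "EC" if cl_d(x, u) & y else "DC"
    if x < y:
        return "NTPP" if cl_d(x, u) <= y else "TPP"
    if y < x:
        return "NTPPi" if cl_d(y, u) <= x else "TPPi"
    return "PO"

def regions(u):
    """All non-null regions (non-empty pixel subsets) of u."""
    pixels = sorted(u)
    for k in range(1, len(pixels) + 1):
        for combo in combinations(pixels, k):
            yield frozenset(combo)

def neighbours(rel, alpha, beta, u):
    """Relations R' realised on u by (alpha(x, y), beta(x, y)) for some x, y with rel(x, y)."""
    found = set()
    all_regions = list(regions(u))
    for x, y in product(all_regions, repeat=2):
        if rcc8d(x, y, u) != rel:
            continue
        a, b = alpha(x, y), beta(x, y)
        if a and b:                      # skip cases where an operator returns null
            found.add(rcc8d(a, b, u))
    return found

u = frozenset(product(range(2), range(3)))          # a 2x3 pixel image
graph1 = (lambda x, y: x, lambda x, y: x | y)       # <x, sum(x, y)>

print(sorted(neighbours("DC", *graph1, u)))
print(sorted(neighbours("PO", *graph1, u)))
```

On this grid the enumeration agrees with the two cases discussed in the text: DC leads only to TPP, whereas PO can lead to either TPP or NTPP.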

For graph n, the outgoing edges from the vertex labelled with relation R are designated with the mnemonic \(n_R\); for example, in the case of the pair of outgoing edges from PO to TPP and NTPP in graph 1, this is notated as \(1_\mathsf{PO}\). Note that four of the RCC8D relations are self-inverse, i.e., \(R(x,y)\) implies \(R(y,x)\); these are DC, EC, PO, and EQ. The other four relations form two mutually inverse pairs: \(\mathsf{TPPi}(x,y)\) if and only if \(\mathsf{TPP}(y,x)\), and \(\mathsf{NTPPi}(x,y)\) if and only if \(\mathsf{NTPP}(y,x)\). These inverse relations will sometimes be exploited in our reasoning. In graph 4, for example, we see that \(\mathsf{PO}(x,y)\) implies ; sometimes we will find it more convenient to rewrite this as , in which case we would cite the graph operation as \(4'_\mathsf{PO}\). An example of this is seen later in Table 1.

Fig. 2. Directed graphs encoding set-theoretic and discrete topological operators. In each case the regions and the resulting operation on them are non-null.

We also use RCC8D’s composition table (RCC8D-CT). The notion of composition is well-known in AI as it provides an efficient inference mechanism for many QSR constraint satisfaction programs, where it is typically implemented as a simple look-up table. Following [15], weak composition of DM’s JEPD relation sets is defined as follows. Given relation set \( {\Sigma }\), the weak composition RCC8D-CT(R,S), where \(R,S\in {\Sigma }\), is defined to be the smallest subset \(\{T_i\}\subseteq {\Sigma }\) such that \(\mathrm{{DM}}\,\models \,\forall x,y,z ((R(x,y)\wedge S(y,z)) \rightarrow T_1(x,z)\vee \cdots \vee T_n(x,z))\). The entries of RCC8D-CT defined on non-null regions are identical to those of RCC8’s composition table entailed by RCC. This was mechanically proved in two parts: the sorted theorem prover SPASS [16] was used to verify that all entailments of the above form were included in the composition table, and a set of graphical models was constructed satisfying each \(T_i(x,z)\) disjunct. The same method was used to verify the sets of directed edges of the graphs depicted in Fig. 2.

Fig. 3. (a) Original RGB image; (b) Haematoxylin and (c) Eosin channels; binary segmented (d) nuclei and (e) cytoplasm; (f) RGB colour merge: magenta = cytoplasm, green = nucleus, white = cytoplasm/nucleus overlap; (g,l) cropped details of (f); (h–k, m–p) resegmentations of g,l respectively, satisfying PP(nucleus,cytoplasm). See text for explanation. (Colour figure online)

4 Example: Segmenting Cells in Culture

In Fig. 3, image (a) depicts an H&E-stained culture of H400 cells grown on glass. Several image pre-processing operations are applied. First, a Gaussian filter (kernel radius 2) is applied to the original image to remove noise and reduce the fragmentation artefacts near the region boundaries. Next, colour deconvolution [17] is used to unmix the dye contributions and separate cell nuclei (H-stain, image (b)) from the rest of the cell bodies (E-stain, image (c)). Several standard image processing operations then follow (k-means clustering on the H,E stain images using 3 clusters, Boolean compositions of thresholded clusters, binary watershed separation), which are used to generate the two binary images of cell nuclei and their associated cells (images (d) and (e) respectively). The colour composite merge depicted in (f) illustrates the extent of conformity to the assumed histological constraints; binary segmented nuclei (d) are mapped to green, cytoplasm (e) to magenta, and overlap between the two to white. Where a nucleus forms a proper part (PP) of its cytoplasm, the latter appears as magenta surrounding a white nucleus; in less common cases where EQ holds, the whole cell appears white. First we test the RCC8D relation between cell nuclei and cytoplasm, where each typed set of spatially disjoint regions is treated as a mereological whole (regions n and c respectively; see Footnote 2). In the case illustrated, we obtain \(\mathsf{PO}(n,c)\), as indicated by the presence of green regions in image (f). As this fails the test of conformity (which requires nuclei to fall within their associated cytoplasm), the task is to repair the segmentation. In (f) seven candidate nuclei partially overlap (PO) cytoplasm regions and six form proper parts (PP). But these include several small ‘slivers’ adhering closely to the image boundary. Typically, regions bordering the edge of the image frame (so that their true extent is not known) are removed from the analysis.

Using the directed graphs, we look for resegmentation operations on candidate nucleus/cytoplasm pairs that take us from PO to PP. Consider the enlarged detail shown in Fig. 3(g), where candidate nucleus n is PO to cytoplasm component c. One possible solution (image (h)) successively erodes n (i.e., replaces it by its discrete interior) until it becomes PP to c; this requires three successive erosions. Another (image (i)) replaces c by its discrete closure (dilation) until the same result is achieved. In (j) we extend c to cover all of n so that once again the nucleus is PP to its cytoplasm. In (k) we achieve the same result by subtracting from n the part that lies outside c.
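The four corrections described for detail (g) can be sketched as operations on pixel sets. This is an illustrative sketch under stated assumptions, not the code used to produce Fig. 3: regions are `frozenset`s of `(row, column)` pixels, erosion and dilation use a 3 \(\times \) 3 neighbourhood clipped to the image, and the helper names and the `max_steps` cut-off are hypothetical. The cut-off reflects the remark that overly extreme changes may indicate an artefact rather than a correctable error.

```python
from itertools import product

def neighbourhood(p, u):
    """3x3 block centred on pixel p, clipped to the image u."""
    r, c = p
    return frozenset((r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) & u

def cl_d(x, u):
    """Discrete closure (dilation)."""
    return frozenset(q for p in x for q in neighbourhood(p, u))

def int_d(x, u):
    """Discrete interior (erosion)."""
    return frozenset(p for p in x if neighbourhood(p, u) <= x)

def is_pp(x, y):
    """PP(x, y): x is a non-null proper subset of y."""
    return bool(x) and x < y

def erode_until_pp(n, c, u, max_steps=5):
    """As in (h): erode n until PP(n, c). Returns (new_n, steps) or None on failure."""
    for steps in range(max_steps + 1):
        if is_pp(n, c):
            return n, steps
        n = int_d(n, u)
        if not n:                 # the nucleus vanished before fitting inside c
            return None
    return None

def dilate_until_pp(n, c, u, max_steps=5):
    """As in (i): dilate c until PP(n, c). Returns (new_c, steps) or None on failure."""
    for steps in range(max_steps + 1):
        if is_pp(n, c):
            return c, steps
        c = cl_d(c, u)
    return None

u = frozenset(product(range(8), range(8)))
c = frozenset(product(range(1, 6), range(1, 6)))   # cytoplasm: 5x5 block
n = frozenset(product(range(3, 7), range(3, 7)))   # nucleus: 4x4 block, PO to c

new_n, erosions = erode_until_pp(n, c, u)
new_c, dilations = dilate_until_pp(n, c, u)
extended_c = c | n        # as in (j): extend c to cover all of n
trimmed_n = n & c         # as in (k): remove from n the part lying outside c

print(erosions, dilations, is_pp(n, extended_c), is_pp(trimmed_n, c))
```

Counting the erosions or dilations needed gives a simple measure of how far a correction departs from the original segmentation, which could be used to rank or reject candidate resegmentations.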

Image (l) shows a second enlarged detail of (f), in which another candidate nucleus \(n'\) partially overlaps two cytoplasm components \(c_1\) and \(c_2\). One correction (image (m)) splits the nucleus into two by taking its intersections with \(c_1\) and \(c_2\). Each nuclear component is now PP to one cytoplasm component. In (n), the lower cytoplasm component (\(c_2\)) is extended to cover the whole of \(n'\). This, though, has the effect of merging two cytoplasm components; to compensate for this, the upper component (\(c_1\)) is reduced by subtracting from it the closure of \(n'\) (image (o)). Finally, in (p), the nucleus is completely surrounded by cytoplasm; this is achieved by extending \(c_2\) to cover not just \(n'\) but its closure; the compensatory reduction of \(c_1\) must now subtract the closure of the closure of \(n'\) in order to ensure complete separation from the extended \(c_2\).

These resegmentations, associated graphs and inferences used to generate them are summarised in Table 1.

Table 1. Resegmentation details for Fig. 3. Here CT refers to the RCC8D composition table.

In detail, the steps for \(c_1\) in (p) are as follows:

  1. Start with \(\mathsf{PO}(n',c_1)\).

  2. By 11\(_\mathsf{PO}\) this gives \(\mathsf{PO}|\mathsf{NTPPi}|\mathsf{TPPi}({\textsf {cl}}_D(n'),c_1)\).

  3. By 1\(_\mathsf{PO|NTPPi|TPPi}\) this gives \(\mathsf{EQ}|\mathsf{NTPP}|\mathsf{TPP}({\textsf {cl}}_D(n'), \mathsf {sum}({\textsf {cl}}_D(n'),c_1))\).

  4. Next, from \(\mathsf{EQ}(n',n')\) and 12\(_\mathsf{EQ}\) we have \(\mathsf{EQ}|\mathsf{NTPP}(n',{\textsf {cl}}_D(n'))\).

  5. The RCC8D weak composition \(\mathsf{EQ}|\mathsf{NTPP}\circ \mathsf{EQ}|\mathsf{NTPP}|\mathsf{TPP}\) is \(\mathsf{EQ}|\mathsf{NTPP}|\mathsf{TPP}\).

  6. Hence from 3 and 4 using 5 we have \(\mathsf{EQ}|\mathsf{NTPP}|\mathsf{TPP}(n',\mathsf {sum}({\textsf {cl}}_D(n'),c_1))\).

In step 3, we apply graph 1 to each disjunct of the relation \(\mathsf{PO}|\mathsf{TPPi}|\mathsf{NTPPi}\) obtained in step 2 separately to generate the new disjunction \(\mathsf{EQ}|\mathsf{TPP}|\mathsf{NTPP}\): here PO and TPPi (NTPPi) are mapped to \(\mathsf{TPP}|\mathsf{NTPP}\) and \(\mathsf{EQ}\) by \(1_\mathsf{PO}\) and \(1_\mathsf{TPPi}\) (\(1_\mathsf{NTPPi}\)) respectively; the combined operation is notated \(1_\mathsf{PO|TPPi|NTPPi}\) (Footnote 3). The soundness and completeness of the inference procedures ensure not only that steps 1–6 encode the DM theorem \(\mathsf{PO}(x,y)\rightarrow \mathsf{EQ}|\mathsf{TPP}|\mathsf{NTPP}(x,\mathsf {sum}({\textsf {cl}}_D(x),y))\), but also that the disjunctive relation \(\mathsf{EQ}|\mathsf{TPP}|\mathsf{NTPP}\) is the strongest obtainable. Space limitations preclude similarly detailed analysis of the resegmentation of \(c_2\).
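The weak composition used in step 5 can be checked with a small look-up. The sketch below is illustrative only: the dictionary `PARTIAL_CT` lists just the six standard RCC8 composition-table entries needed for this step (not the full 8 \(\times \) 8 table), and the names `PARTIAL_CT` and `compose` are hypothetical. Composition of disjunctive relations is taken, as usual, to be the union of the compositions of their disjuncts.

```python
# Only the composition-table entries needed for step 5; the full table has 64 cells.
PARTIAL_CT = {
    ("EQ", "EQ"): {"EQ"},
    ("EQ", "NTPP"): {"NTPP"},
    ("EQ", "TPP"): {"TPP"},
    ("NTPP", "EQ"): {"NTPP"},
    ("NTPP", "NTPP"): {"NTPP"},
    ("NTPP", "TPP"): {"NTPP"},
}

def compose(rs, ss, table):
    """Weak composition of two disjunctive relations, given as sets of base relations."""
    result = set()
    for r in rs:
        for s in ss:
            result |= table[(r, s)]   # KeyError if the partial table lacks an entry
    return result

# Step 4 relates n' to cl_D(n'); step 3 relates cl_D(n') to sum(cl_D(n'), c_1).
step4 = {"EQ", "NTPP"}
step3 = {"EQ", "NTPP", "TPP"}

print(sorted(compose(step4, step3, PARTIAL_CT)))
```

The result is the disjunction \(\mathsf{EQ}|\mathsf{NTPP}|\mathsf{TPP}\) stated in step 5.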

5 Discussion

The 12 graphs reveal six resegmentation operations that take us directly from PO to PP (i.e., \(\mathsf{TPP}|\mathsf{NTPP}\)), hence guaranteeing the transition to PP (Footnote 4), and another four that merely allow that possibility (Footnote 5). The number (and complexity) of potential resegmentations increases when several graphs are combined and node-node paths through these networks of length \(n>2\) are considered, as in the segmentation operations used to generate the cell depicted in Fig. 3(n).

Strategies for selecting optimal resegmentations would be relatively straightforward if the segmented cells were widely separated from each other, but in Fig. 3 this is not so, since several cells are separated by a pixel-width distance and in (f) several nuclei overlap more than one cytoplasm component. Hence when applying \(12_\mathsf{PO}\) to (g) to generate (i), or applying \(1_\mathsf{PO}\) to (l) to generate (o), hitherto separated cytoplasm components are merged. One way to avoid this is to restrict the discrete closure operation so as to prevent merging with a neighbouring component: \({\textsf {cl}}_D^-(x,y) =_\mathrm{def} \{\hat{x}\ |\ \mathsf{O}(N(\hat{x}),x) \wedge \lnot \mathsf{O}(N(\hat{x}),y)\}\). This picks out those pixels whose immediate neighbourhoods overlap x but are disjoint from y, giving the largest subset of \({\textsf {cl}}_D(x)\) not connected to y. It cannot be applied carte blanche, however, since histological domain constraints may require some regions (e.g., fragmented nuclei) to be merged.
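The restricted closure \({\textsf {cl}}_D^-\) translates almost literally into code. The following is a minimal sketch, assuming regions are `frozenset`s of `(row, column)` pixels and an 8-connected 3 \(\times \) 3 neighbourhood clipped to the image; the function names are hypothetical.

```python
from itertools import product

def neighbourhood(p, u):
    """3x3 block centred on pixel p, clipped to the image u."""
    r, c = p
    return frozenset((r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) & u

def cl_d(x, u):
    """Ordinary discrete closure (dilation)."""
    return frozenset(q for p in x for q in neighbourhood(p, u))

def contact(x, y, u):
    """C(x, y): some pixel of x is adjacent to (or identical with) a pixel of y."""
    return any(neighbourhood(p, u) & y for p in x)

def restricted_closure(x, y, u):
    """cl_D^-(x, y): pixels whose neighbourhood overlaps x but is disjoint from y."""
    return frozenset(
        p for p in u
        if neighbourhood(p, u) & x and not neighbourhood(p, u) & y
    )

u = frozenset(product(range(3), range(7)))
x = frozenset(product(range(3), [2]))   # column 2
y = frozenset(product(range(3), [4]))   # column 4, one pixel-width away from x

plain = cl_d(x, u)                       # grows into column 3 and touches y
restricted = restricted_closure(x, y, u) # grows only away from y

print(contact(plain, y, u), contact(restricted, y, u))
print(sorted({col for _, col in restricted}))
```

In this example the ordinary closure of `x` comes into contact with `y`, whereas the restricted closure extends `x` only on the side facing away from `y`, so the two components stay separate.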

Within DM proper, different classes of regions operate as filters, e.g., atoms, regions without interiors, regular regions (lacking spikes or fissures) or connected components; and all these give rise to varying constraints on the mereotopological relations defined on them. At the image scales typically used in digital microscopy, most segmented regions mapping to histological objects have interiors, and very few are atomic, but restricting these topological properties will constrain the possible segmentation models and relations that can be defined on them. Constraints other than those defined directly within DM can be also used to reduce the number of segmentation models. A simple example is where the range of sizes of histological objects can be used as a filter, either using MM granulometry methods or filtering by morphological thickness [1], enabling us to rule out the segmentation models depicted in Fig. 3(i), (n) (cell bodies too large), and (h) (cell nucleus too small).

Another extra-logical constraint that can be used exploits empirical information about the histological stains and their known selectivity in dye take-up with respect to targeted tissues and their parts. Given that the H-stain offers better segmentations for nuclei than the eosin counter-stain does for cytoplasm, resegmentations can be ranked so as to favour those that minimise the changes to nuclei. Other assumed empirical and ontological dependencies can also be exploited. For example, depending on microscope resolution, each cell nucleus should fall wholly within some cytoplasm component, whether this is segmentable from the original image or not; in DM this can be captured by adding the histological domain axiom \(Nuc(x) \rightarrow \exists y(Cell(y) \wedge P(x,y))\). This constraint also justifies the assumption underlying the transition \(1_\mathsf{PO}\), where a cytoplasm component partially overlapping a nucleus is extended to cover the nucleus so that it forms part of that cell. We can also add the correspondence between histological features and stain selectivity: given a PO relation defined on a poorly segmented nucleus and its host cytoplasm, we favour \(12_\mathsf{PO}\) (dilating the cytoplasm) over \(9_\mathsf{PO}\) (eroding the nucleus). Also worthy of note is that the boundaries of histological objects in a greyscale image exhibit intensity gradients. This means that when binary thresholding candidate regions, subsequent erosion and dilation-based resegmentations are more likely to track gray-scale intensity levels in the original image than blind set-theoretic segmentation operations on binary images. This observation highlights a limitation of the underlying method, namely, that once the segmentation mask of the target histological object is provisionally segmented, changes subsequently made and translated back to the images may not conform to all available information in the image. 
However, in some cases restricting oneself to information in an image may not, even in principle, lead to a histological model. For example, uneven staining may fail to reveal the full extent of the cytoplasm in the sample, in which case a model-based resegmentation solution can be used to factor out those regions in an image that need to be treated differently from the rest.

It is perhaps also useful here to give at least some indication of the empirical methods and metrics envisaged for quantifying and validating our segmentation solutions. For example, given that cell nuclei are easier to segment than their associated bodies of cytoplasm, the nuclei can be used as a gold-standard reference for measuring how well the segmented cytoplasm conforms to the histological model. In this case a necessary condition is that segmented nuclei form part of some overlapping body of cytoplasm, so one possible measure is the number of region pairs that form a part-whole relation divided by the number of all overlapping pairs. These simple examples show (i) that the underlying physical model should guide the abstraction and (ii) the danger of abstracting and working with generic cases too quickly, where empirical constraints restricting valid resegmentations may be missed.
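One possible reading of the ratio suggested above is sketched below. It is an assumption-laden illustration rather than a validated metric: nucleus and cytoplasm components are taken to be already separated into individual regions (`frozenset`s of `(row, column)` pixels), every overlapping nucleus/cytoplasm pair counts once in the denominator, and the function name `conformity` is hypothetical.

```python
from itertools import product

def conformity(nuclei, cytoplasms):
    """Fraction of overlapping (nucleus, cytoplasm) pairs in which the nucleus
    is part of the cytoplasm. Returns None when no pair overlaps."""
    overlapping = [(n, c) for n in nuclei for c in cytoplasms if n & c]
    if not overlapping:
        return None
    conforming = sum(1 for n, c in overlapping if n <= c)
    return conforming / len(overlapping)

# Two cytoplasm components separated by a one-pixel gap (column 4).
c1 = frozenset(product(range(4), range(0, 4)))
c2 = frozenset(product(range(4), range(5, 9)))

n1 = frozenset({(1, 1), (1, 2)})            # lies wholly inside c1
n2 = frozenset({(2, 3), (2, 4), (2, 5)})    # straddles the gap: PO to c1 and to c2

score = conformity([n1, n2], [c1, c2])
print(score)
```

Here three pairs overlap and only one of them is a part-whole pair, so the score is one third; a fully conforming segmentation would score 1.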

6 Conclusions and Future Work

We have shown how DM provides the means to model cellular and tissue structure in digitised histological images. Segmentation and resegmentation satisfying a histological model can be achieved by a set of operations on regions that satisfy a set of constraints on pairs of regions. These constraints can be encoded as a set of graphs in which topological and set-theoretic operators lead from one vertex to another. The method is generic and can be applied to any domain where it is required to segment digitised images into regions satisfying specific sets of mereotopological relations.

Several directions for future work can be suggested. First, the set of operators and associated graphs can be extended to cover all the standard topological and set-theoretic operators, including the discrete exterior, boundary, the (absolute) complement and the symmetric difference. Second, various different metrics can be defined on the conceptual neighbourhoods and their graphs, allowing optimisation of segmentation models and prediction of the most likely path to take through the graphs from a given state to a segmentation goal. These could be based on what proportion of JEPD relations reached at each step can lead to a valid model, or probability measures determined from a statistical analysis of the data sets, taking into account a priori and empirically derived properties such as tissue type and morphological shape and size.