1 Introduction

Multi-class segmentation problems are common in the analysis of biomedical images. A typical solution is to train a neural network pixel classifier. Commonly, these networks predict a probability distribution over all classes in each pixel, which can be thresholded to obtain a final segmentation. These predictions often contain holes, partial misclassifications, shrinkage of small classes and rough borders between classes, resulting in errors in the final segmentation. To improve the segmentation, post-processing is often used to close holes, reclassify uncertain pixel labels based on proximity, grow objects and smooth rough boundaries.

Mathematical morphology is a powerful framework for post-processing binary and grayscale images. Binary and grayscale morphology are special cases of morphology on complete lattices [1]. A complete lattice is a partially ordered set (poset), where each non-empty subset has an infimum and a supremum. For complete lattices, the core operators, dilation and erosion, can be defined using supremum and infimum: for binary morphology using set union and intersection; and for grayscale morphology using maximum and minimum under the standard total ordering of the reals. See [1] for an in-depth treatment of the theoretical foundations of mathematical morphology.

For general multi-class images, there is no natural ordering of the classes, and hence, they do not form a complete lattice. For example, for a segmentation of microscope images of cells into cell membrane, mitochondria and background, any ordering of the classes is task-dependent and not given by the images themselves. A natural representation of this kind of data is the categorical distribution, which can represent both crisp segmentation masks and uncertainty as encountered in prediction images. In the remainder of this work, we will use the term “categorical” instead of “multi-class.”

In this work, we provide a thorough review of previously proposed approaches to morphology on categorical images. We then propose two approaches for morphology on categorical distributions: an indirect approach where we operate on Dirichlet distributions that are then transformed to categorical distributions and a direct approach where we operate on the categorical distributions themselves. We then define protected variants of the direct operations that allow finer control over the processing. Finally, we illustrate the utility of the proposed approach on two tasks: fixing misclassified mitochondria and modeling annotator bias.

2 Background and Related Work

In this section, we briefly restate morphology on complete lattices and on binary and grayscale images, before we review the most relevant literature [2,3,4,5,6,7,8]. What we refer to as categorical images have various names in the literature: color-coded images, label images and n-ary images. In the sections below, we will use the original names in the section titles, but otherwise we will refer to categorical images and categorical morphology.

In the literature, there are three main approaches for extending morphology to images with values that do not have a natural ordering: impose an order on the values, which is the common approach for color images; operate on all categories simultaneously [2, 5]; and operate on a single category at a time [6, 7].

Morphology on color images has received a lot of attention, with the primary focus on ordering colors by exploiting the relationship between dimensions of color spaces. See, for example, [9] for an overview of approaches for defining an ordering of colors. Our focus is on categorical images, where such approaches are less relevant.

2.1 Morphology on Complete Lattices

Let \(\Gamma \) be a set with the partial order \(\le \). The poset \((\Gamma , \le )\) is a complete lattice if every subset of \(\Gamma \) has an infimum \(\wedge \) and a supremum \(\vee \). We define an image as a function f from pixel-coordinates \(\mathbb {D}= \mathbb {Z}^d\) to \(\Gamma \) and a structuring element B as a subset of \(\mathbb {D}\)

$$\begin{aligned} f&\in \mathcal {F} = \left\{ g \mid g : \mathbb {D}\mapsto \Gamma \right\} , \end{aligned}$$
(1)
$$\begin{aligned} B&\subseteq \mathbb {D}. \end{aligned}$$
(2)

The dilation \((\delta )\) and erosion \((\epsilon )\) of f by B are then defined as the supremum and infimum over the local neighborhoods in f given by B

$$\begin{aligned} \delta (f;B)(x)&= \bigvee \limits _{\{y\mid (y-x) \in B\}} f(y), \end{aligned}$$
(3)
$$\begin{aligned} \epsilon (f;B)(x)&= \bigwedge \limits _{\{y\mid (y-x) \in B\}} f(y). \end{aligned}$$
(4)

Opening (\(\gamma \)) and closing (\(\phi \)) are the compositions of dilation and erosion

$$\begin{aligned} \gamma (f;B)(x)&= \delta (\epsilon (f;B);B)(x), \end{aligned}$$
(5)
$$\begin{aligned} \phi (f;B)(x)&= \epsilon (\delta (f;B);B)(x). \end{aligned}$$
(6)

2.2 Binary and Grayscale Morphology

We define a grayscale image as in (1) with \(\Gamma = [0,1]\). Let \(\le \) be the usual ordering of the reals, then the poset \(([0,1], \le )\) is a complete lattice, where the \(\min \) function gives the infimum and the \(\max \) function the supremum. Let B be defined as in (2). Dilation and erosion can then be obtained from (3) and (4) as

$$\begin{aligned} \delta (f;B)(x)&= \max \limits _{\{y\mid (y-x) \in B\}} f(y), \end{aligned}$$
(7)
$$\begin{aligned} \epsilon (f;B)(x)&= \min \limits _{\{y\mid (y-x) \in B\}} f(y). \end{aligned}$$
(8)

If we restrict \(\Gamma \) to \(\{0,1\}\), we obtain binary morphology.
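As a concrete illustration of (7) and (8), the following minimal sketch applies flat dilation and erosion to a one-dimensional signal. The function names, the representation of B as a list of integer offsets, and the choice to ignore neighbors that fall outside the signal are assumptions made for this example only.

```python
def dilate(f, B):
    """Flat dilation (7): maximum of f over the offsets in B."""
    n = len(f)
    return [max(f[x + b] for b in B if 0 <= x + b < n) for x in range(n)]


def erode(f, B):
    """Flat erosion (8): minimum of f over the offsets in B."""
    n = len(f)
    return [min(f[x + b] for b in B if 0 <= x + b < n) for x in range(n)]


def closing(f, B):
    """Closing (6): dilation followed by erosion."""
    return erode(dilate(f, B), B)


# Assumption: B is symmetric and contains the origin, so no window is empty.
B = [-1, 0, 1]

gray = [0.0, 0.2, 0.9, 0.3, 0.0, 0.0]
gray_dilated = dilate(gray, B)
gray_eroded = erode(gray, B)

# Restricting the values to {0, 1} gives binary morphology with the same code.
binary = [0, 0, 1, 1, 0, 1, 1, 0, 0]
binary_closed = closing(binary, B)  # the one-pixel hole at index 4 is filled
```

The same functions serve both cases because the binary lattice is the restriction of the grayscale lattice to the values 0 and 1.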

2.3 Morphology on Color-Coded Images

In [2], the authors propose a framework for categorical morphology where pixels have a set of categories. Let \(C = \{c_1, c_2, \dots , c_n\}\) be a set of n categories. The powerset of C, \(\mathbb {P}_{C}\), is the set of all subsets of C, including the empty set. An image f is then defined as in (1) with \(\Gamma = \mathbb {P}_{C}\). In this framework, the value of a pixel can be any element of \(\mathbb {P}_{C}\), e.g., \(\{c_1\}\), \(\{c_1, c_n\}\) or \(\{\}\). Let \(\subseteq \) be the usual subset relation, then the poset \((\mathbb {P}_{C}, \subseteq )\) is a complete lattice where set intersection is the infimum and set union is the supremum. In [2], the authors propose to use structuring elements that are themselves images, that is, \(B \in \mathcal {F}\). For the sake of comparison, we first consider the simpler case where B is defined as in (2). Dilation and erosion can then be obtained from (3) and (4) as

$$\begin{aligned}&\delta (f;B)(x) = \bigcup \limits _{\{y\mid (y-x) \in B\}} f(y), \end{aligned}$$
(9)
$$\begin{aligned}&\epsilon (f;B)(x) = \bigcap \limits _{\{y\mid (y-x) \in B\}} f(y). \end{aligned}$$
(10)

An example of these operations is shown in Fig. 1a.
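To make (9) and (10) concrete, the sketch below runs them on a one-dimensional image whose pixel values are sets of categories. The helper names, the category labels "a" and "b", and the border handling (neighbors outside the image are ignored) are assumptions for illustration and are not taken from [2].

```python
def dilate_sets(f, B):
    """Dilation (9): union of the category sets in the neighborhood."""
    n = len(f)
    return [
        frozenset().union(*(f[x + o] for o in B if 0 <= x + o < n))
        for x in range(n)
    ]


def erode_sets(f, B):
    """Erosion (10): intersection of the category sets in the neighborhood."""
    n = len(f)
    return [
        frozenset.intersection(*(f[x + o] for o in B if 0 <= x + o < n))
        for x in range(n)
    ]


B = [-1, 0, 1]  # assumed to contain the origin, so no window is empty
only_a, only_b, empty = frozenset({"a"}), frozenset({"b"}), frozenset()
f = [only_a, only_a, only_b, empty, only_b]

dilated = dilate_sets(f, B)  # pixels 1 and 2 now hold both categories
eroded = erode_sets(f, B)    # only pixel 0 keeps a category
```

Note how dilation produces pixels that carry several categories at once, while erosion quickly empties pixels near any interface.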

Let \(B \in \mathcal {F}\). Under this scheme, an operation is only performed when one or more categories in the structuring element match a category in the image, and the result depends on the categories in both image and structuring element. Several variations of dilation and erosion are proposed in [2]; here, we only consider the “transparent” operations. Let \(\mathbb {D}_f\) be the domain of f and \(\mathbb {D}_B\) the domain of B. A specified reference point, \(y_0 \in \mathbb {D}_B\), is used to determine whether B matches f and could, for example, be the center of a ball-shaped \(\mathbb {D}_B\). Dilation and erosion are then defined as

$$\begin{aligned}&\delta (f;B)(x) = f(x) \cup \bigcup \limits _{\{y \in \mathbb {D}_B \mid f(x+y) \cap B(y_0) \ne \emptyset \}} B(y)\end{aligned}$$
(11)
$$\begin{aligned}&\epsilon (f;B)(x) \nonumber \\&\quad = {\left\{ \begin{array}{ll} f(x), &{} \text {if}\,f(x) \cap B(y_0) = \emptyset \\ f(x)\setminus B(y_0), &{} \text {if}\,[\exists y \in \mathbb {D}_B](f(x+y) \cap B(y_0) = \emptyset ),\\ f(x) \cup B(y_0), &{} \text {otherwise}\end{array}\right. } \end{aligned}$$
(12)

An example of these operations is shown in Fig. 1b using a cross-shaped structuring element with \(y_0\) in the center.

2.4 Morphology on Label Images

In [5], the authors propose a framework for categorical morphology where pixels have no category (\(\bot \)), a unique category or conflicting categories (\(\top \)). Let \(C = \{c_1, c_2, \dots , c_n\}\) be a set of n categories and let \(C_* = C \cup \{\bot , \top \}\). An image f is then defined as in (1) with \(\Gamma = C_*\). The poset (\(C_*, \le \)) where \(\le \) satisfies \([\forall c \in C](\bot \le c \le \top )\) is a complete lattice. Let B be defined as in (2) and let \(V(x) = \{f(x-y) \mid y \in B\}\). Dilation and erosion are then defined as

$$\begin{aligned} \delta (f;B)(x)&= {\left\{ \begin{array}{ll} \top , &{} \quad \text {if}\,\top \in V(x)\\ \top , &{} \quad \text {if}\,\vert V(x) \cap C\vert > 1\\ V(x) \cap C, &{} \quad \text {if}\,\vert V(x) \cap C \vert = 1\\ \bot , &{} \quad \text {otherwise}\end{array}\right. }\end{aligned}$$
(13)
$$\begin{aligned} \epsilon (f;B)(x)&= {\left\{ \begin{array}{ll} \bot , &{} \quad \text {if}\,\bot \in V(x)\\ \bot , &{} \quad \text {if}\,\vert V(x) \cap C\vert > 1\\ V(x) \cap C, &{} \quad \text {if}\,\vert V(x) \cap C\vert = 1\\ \top , &{} \quad \text {otherwise}\end{array}\right. } \end{aligned}$$
(14)

An example of these operations is shown in Fig. 1c. In the context of categorical distributions, where we have detailed information about label uncertainty, this approach is unsuitable due to the loss of information.
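The loss of information can be seen in a small sketch of (13) and (14) on a one-dimensional label image. The string markers used for \(\bot \) and \(\top \), the function names and the border handling are assumptions made for this example.

```python
BOT, TOP = "bot", "top"  # stand-ins for the "no category" and "conflict" values


def neighborhood(f, x, B):
    """V(x) = {f(x - y) | y in B}, ignoring positions outside the image."""
    return {f[x - y] for y in B if 0 <= x - y < len(f)}


def dilate_labels(f, B):
    """Dilation (13)."""
    out = []
    for x in range(len(f)):
        V = neighborhood(f, x, B)
        cats = V - {BOT, TOP}
        if TOP in V or len(cats) > 1:
            out.append(TOP)
        elif len(cats) == 1:
            out.append(next(iter(cats)))
        else:
            out.append(BOT)
    return out


def erode_labels(f, B):
    """Erosion (14)."""
    out = []
    for x in range(len(f)):
        V = neighborhood(f, x, B)
        cats = V - {BOT, TOP}
        if BOT in V or len(cats) > 1:
            out.append(BOT)
        elif len(cats) == 1:
            out.append(next(iter(cats)))
        else:
            out.append(TOP)
    return out


B = [-1, 0, 1]
f = ["a", "a", BOT, "b", "b"]
dilated = dilate_labels(f, B)
eroded = erode_labels(f, B)
```

In the dilated image, the pixel between the two regions only records that a conflict exists; which categories were in conflict, and to what degree, is no longer available.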

2.5 N-ary Morphology

In [6], the authors propose a framework for categorical morphology where pixels have a unique category. Let \(C = \{c_1,c_2,\dots ,c_n\}\) be a set of n categories. An image f is then defined as in (1) with \(\Gamma = C\). Instead of operating on all categories simultaneously, the authors propose to operate on a single category at a time. Let B be defined as in (2) and let i be the category we operate on. We use subscripts to distinguish single category operations from standard operations. Dilation and erosion are then defined as

$$\begin{aligned} \delta _i(f;B)(x)&= {\left\{ \begin{array}{ll} f(x), &{} \quad \text {if}\,[\forall y \in B](f(x+y) \ne i)\\ i, &{} \quad \text {otherwise}\end{array}\right. }\end{aligned}$$
(15)
$$\begin{aligned} \epsilon _i(f;B)(x)&= {\left\{ \begin{array}{ll} f(x), &{} \quad \text {if}\,f(x) \ne i\\ i, &{} \quad \text {if}\,[\forall y \in B](f(x+y) = i)\\ \theta (x,f), &{} \quad \text {otherwise}\end{array}\right. } \end{aligned}$$
(16)

where \(\theta \) is a function that assigns a value in the case where there are different categories in the neighborhood of x. A natural choice for \(\theta \), which is also suggested in [6], is to pick the value of the closest pixels. However, this does not help when the closest pixels have different values, which is a fundamental problem when pixel values cannot represent uncertainty. This is solved by ranking the categories and using the ranking to break ties. In general, there is no obvious way of ranking categories based on the image alone, and as the number of multi-category interfaces increases, it becomes more difficult to understand how one particular ranking influences the outcome.

Without ranking categories a priori, the above definition implies an ordering \(\le _i\), which is not a partial order, and thus, \((C, \le _i)\) is not a complete lattice. In [7], the authors show that \(\le _i\) is a preorder, and formalize constraints for choosing \(\theta \) such that dilation and erosion form an adjunction and their compositions are an opening and a closing. However, this does not help decide which category to choose when multiple categories are closest, as the constraints on \(\theta \) do not yield a unique rule for breaking ties. An example of these operations is shown in Fig. 1d, where the question marks highlight two pixels that cannot be assigned a value without a method for breaking ties.
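The tie problem can be reproduced with a few lines of code. The sketch below implements the erosion in (16) on a one-dimensional image with \(\theta \) chosen as the closest other category; the function names and the "?" marker for an unresolved tie are assumptions for illustration, not part of [6] or [7].

```python
def closest_categories(f, x, i):
    """Categories other than i found at minimal distance from pixel x."""
    best, cats = None, set()
    for y, c in enumerate(f):
        if c == i:
            continue
        d = abs(y - x)
        if best is None or d < best:
            best, cats = d, {c}
        elif d == best:
            cats.add(c)
    return cats


def erode_nary(f, B, i):
    """Erosion (16) of category i; theta picks the closest other category."""
    n = len(f)
    out = []
    for x, c in enumerate(f):
        neigh = [f[x + y] for y in B if 0 <= x + y < n]
        if c != i:
            out.append(c)
        elif all(v == i for v in neigh):
            out.append(i)
        else:
            cats = closest_categories(f, x, i)
            # Without a ranking, a tie between categories cannot be resolved.
            out.append(next(iter(cats)) if len(cats) == 1 else "?")
    return out


B = [-1, 0, 1]
no_tie = erode_nary(["a", "a", "i", "i", "b"], B, "i")
tie = erode_nary(["a", "i", "b"], B, "i")
```

In the second call, the middle pixel is equally close to "a" and "b", so the result depends entirely on an external tie-breaking rule.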

Fig. 1

Comparison of categorical morphologies from the literature. From left to right: image, structuring element, dilation, closing

2.6 Fuzzy n-Ary Morphology

In [6], the authors also propose an extension of n-ary morphology to images of categorical distributions. Let \(C = \{c_1,c_2,\dots ,c_{n+1}\}\) be a set of \(n+1\) categories. The categorical distribution of \(n+1\) categories is completely determined by a point in the n-simplex \(\Delta ^n = \{ \pi \in \mathbb {R}^{n+1} \mid \pi _k \ge 0, \sum \pi _k = 1\}\), where \(\pi _k\) is the probability of \(c_k\). An image f is then defined as in (1) with \(\Gamma = \Delta ^n\). Operations are again defined on a single category at a time. Let \(B_r\) be a closed ball of radius r centered at the origin and let i be the category we operate on. Let \(f_k(x) = f(x)_k\) be the probability of observing category \(c_k\) in pixel x and let \(\omega _k(x) = 1 - f_k(x)\). Dilation is then defined as

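$$\begin{aligned} \delta _i(f;B_r)(x)_k&= {\left\{ \begin{array}{ll} \delta (f_k;B_r)(x), &{} \quad \text {if}\,k = i\\ \left[ 1 - \delta (f_i;B_r)(x)\right] \frac{f_k(x)}{\omega _i(x)}, &{} \quad \text {otherwise}\end{array}\right. } \end{aligned}$$
(17)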

where \(\delta (f_i;B_r)(x) = 1 \implies [1 - \delta (f_i;B_r)(x)]\frac{f_k(x)}{\omega _i(x)} = 0\).

Two variations on erosion are proposed in [6], neither of which we find satisfactory. The first requires that we pick a ranking of all categories and does not yield idempotent opening and closing

$$\begin{aligned}&\epsilon _i(f;B_r)(x)_k\nonumber \\&\quad = {\left\{ \begin{array}{ll} \epsilon (f_k;B_r)(x) &{} \text {if}\,k = i\\ f_k(x) + f_i(x) - \epsilon (f_i;B)(x) &{} \text {if}\,k = \min ( \arg \min \limits _{j\ne i}(\delta (f_j;B)) )\\ f_k(x) &{} \text {otherwise}\end{array}\right. } \end{aligned}$$
(18)

The second assumes that the image is restricted to the edges of the simplex (at most two categories are nonzero in any pixel) and opening and closing are again not idempotent

$$\begin{aligned}&\epsilon _i(f;B_r)(x)_k \nonumber \\&\quad = {\left\{ \begin{array}{ll} \epsilon (f_k;B_r)(x) &{} \quad \text {if}\,k = i\\ \frac{1 - \epsilon (f_i;B)(x)}{1 - f_i(x)}f_k(x) &{} \quad \text {if}\,f_i(x) \le 0.5 \vee \max \limits _{j\ne i}(\delta (f_j;B)(x) < 0.5)\\ 1 - \epsilon (f_i;B)(x) &{} \quad \text {if}\,k = \min (\arg \max \limits _{j\ne i}\delta (f_j;B)(x))\\ 0 &{} \text {otherwise}\end{array}\right. } \end{aligned}$$
(19)

We refer the reader to [6] for the motivation for these formulations and their properties.

2.7 Fuzzy Pareto Morphology

In [3], the authors propose fuzzy Pareto morphology for color images. An RGB color image can be seen as a three-dimensional fuzzy set, where the membership function for each set corresponds to the value of each color channel. This can equivalently be seen as a point in the half-open unit cube. An image f is then defined as in (1) with \(\Gamma = (0,1]^d\). For each \(a \in \Gamma \), we can associate a hyperrectangle defined by the vector from the origin to a. Fuzzy Pareto morphology is based on the idea of dominance. For \(a,b \in \Gamma \), let \(a \cap b = \{\min (a_i,b_i)\}_{i=1\dots d}\) be the intersection of a and b. Let \(A(a) = \prod _i a_i\) be the area function, yielding the area of the hyperrectangle of a. The degree to which a dominates b is then

$$\begin{aligned} \mu _D(a,b) = \frac{A(a \cap b)}{A(b)}, \end{aligned}$$
(20)

which measures how much of the hyperrectangle of b is contained in the hyperrectangle of a.
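As a small worked example of (20), the following sketch computes the degree of dominance for two points in \((0,1]^2\). The function name and the example values are assumptions chosen for illustration.

```python
from math import prod


def dominance(a, b):
    """Degree to which a dominates b, following (20): A(a ∩ b) / A(b)."""
    intersection = [min(ai, bi) for ai, bi in zip(a, b)]
    return prod(intersection) / prod(b)


a = (0.8, 0.4)
b = (0.4, 0.2)
a_over_b = dominance(a, b)  # the rectangle of b lies entirely inside that of a
b_over_a = dominance(b, a)  # only a quarter of the rectangle of a is covered by b
```

The measure is not symmetric: here a fully dominates b, while b dominates a only to degree 0.25.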

Let \(B(x) = \{x+y \mid y \in B\}\). Dilation and erosion are then defined as

$$\begin{aligned} \delta (f;B)(x)&= f\left( \arg \min \limits _{y \in B(x)}\left\{ \max \limits _{z \in B(x) \wedge z \ne y}\mu _D(f(z), f(y))\right\} \right) ,\end{aligned}$$
(21)
$$\begin{aligned} \epsilon (f;B)(x)&= f\left( \arg \max \limits _{y \in B(x)}\left\{ \min \limits _{z \in B(x) \wedge z \ne y}\mu _D(f(z),f(y))\right\} \right) . \end{aligned}$$
(22)

Although not directly applicable to categorical distributions, it could easily be extended by either restricting \(\Gamma \) to \(\{v \in (0,1]^d \mid \sum _i v_i = 1\}\) or by considering it in the context of the Dirichlet distribution. However, (21) and (22) are not guaranteed to yield a unique solution, requiring us to come up with an arbitration rule.

2.8 Morphology on the Unit Circle

In [10], the authors propose morphology on the unit circle for processing the hue space of color images. The idea is to use structuring elements from the hue space and define an ordering based on the shortest distance along the unit circle between values in the image and values in the structuring element. Although not directly applicable to categorical images, it could be relevant to consider structuring elements that are themselves categorical distributions and base morphology on distance between distributions.

Morphology on the unit circle is also considered in [4] where the authors propose three approaches: using difference operators (e.g., gradient), using grouped data and using “labeled openings.” It is the labeled openings that are most relevant in our context. Let f be an image as defined in (1) with \(\Gamma = [0, 2\pi ]\). In a labeled opening, the unit circle is partitioned into segments \(S(\omega ) = \{[0,\omega )\), \([\omega , 2\omega )\),\(\dots \), \([2\pi -\omega , 2\pi )\}\) and each segment \(s \in S(\omega )\) gives rise to a binary image \(f(x;s) = f(x) \in s\). A labeled opening is then the union of the binary openings of all segments, \(\gamma _\omega (f)(x) = \cup _{s \in S(\omega )}\gamma (f(\cdot ;s);B)(x)\), indicating for each pixel if it is present in at least one of the opened segments. The resulting image highlights areas of uniform direction and the inverse of that image highlights areas with change in direction. Categorical images have a natural partitioning based on the categories. A labeled opening of a categorical image would then be a binary image indicating those pixels where at least one category was preserved after opening each category. The resulting image highlights areas where at least one category is uniformly present, and the inverse of that image highlights areas without a category after the opening, similarly to \(\bot \) in Sect. 2.4.

2.9 Morphology on Component Graphs

In [8], the authors propose a framework for morphology on multi-valued images based on component graphs. Let an image be defined as in (1) with \(\Gamma = \mathbb {R}^d\). The component graph is constructed from the connected components of the threshold sets of an image. For example, for \(d=2\) and \(f(x) \in \{0,1\}^2\) the levels of f are \(\{(0,0), (0,1), (1,0), (1,1)\}\) and the level set of (0, 1) is \(\{x \,\mid \, f(x) = (0,1)\}\). Let \(\le \) be a partial order on \(\{0,1\}^2\). The threshold set of (0, 1) is then \(T_{(0,1)} = \{ x \,\mid \, (0,1) \le f(x)\}\). We can represent the threshold set as a binary image and find the connected components in this image. For two levels \(l_i\) and \(l_j\), we have \(l_i \le l_j \implies T_i \supseteq T_j\), so any connected component in \(T_j\) must be contained in a connected component in \(T_i\). The component graph is then constructed by adding a node for each connected component and an edge from node u to node v if the connected component of v is contained in the connected component of u. In order to construct the component graph, it is required that \(\Gamma \) allows a minimum, e.g., \(\{0\}^d\), such that the graph will be connected. For categorical images, this would require that we have a special background category as in Sect. 2.3 and Sect. 2.4. Further, it requires that each pixel can have multiple categories; otherwise, no component will be nested inside another and the graph will be the root with all connected components as children.
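The nesting of threshold sets that underlies the component graph can be checked on a toy example. The sketch below assumes the componentwise partial order on \(\{0,1\}^2\) and a one-dimensional image; both choices, as well as the function names, are assumptions for illustration and it does not build the graph itself.

```python
def leq(a, b):
    """Componentwise partial order: a <= b iff every component of a is <= b."""
    return all(ai <= bi for ai, bi in zip(a, b))


def threshold_set(f, level):
    """T_level = {x | level <= f(x)}."""
    return {x for x, value in enumerate(f) if leq(level, value)}


f = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
levels = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = {level: threshold_set(f, level) for level in levels}

# Comparable levels give nested threshold sets; (0, 1) and (1, 0) are not
# comparable, and neither of their threshold sets contains the other.
nested = all(T[lj] <= T[li] for li in levels for lj in levels if leq(li, lj))
```

If every pixel carried exactly one category, no level other than the minimum would be comparable to another, which is why no component would be nested inside another.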

Because the component graph directly exposes the spatial relationship between differently valued regions, it is possible to apply morphological filters, e.g., noise reduction, by pruning some nodes and reconstructing the image from the pruned component graph. Directly pruning the component graph can lead to ambiguity in the reconstruction when a node with two non-comparable parents is removed. The authors propose to solve this by building a component tree of the component graph, pruning the tree, and then reconstructing the graph from the tree and the image from the graph. In order to construct the component tree, it is necessary to impose a total order on the nodes of the component graph, for example, by using a shape measure on the connected components in the component graph.

Because the component graph only captures spatial relationships when connected components overlap for different level sets, some common post-processing operations, such as closing holes in segmentations, are challenging to perform.

3 Morphology on Categorical Distributions

In this section, we propose two approaches for morphology on categorical distributions. In Sect. 3.1, we show how to operate on all categories simultaneously by operating on Dirichlet distributions. The limitations of this approach will then motivate single category operations that work directly on categorical distributions, which we will define in Sect. 3.2.

3.1 Morphology on Dirichlet Distributions

Let \(\mathbb {R}_+\) be the positive real line. We consider the Dirichlet distribution of order \(n \ge 2\) with parameters \(\alpha \in \mathbb {R}_+^n\), written as \(\textrm{Dir}(\alpha )\), as a distribution over the \((n-1)\)-simplex \(\Delta ^{n-1} = \{ \pi \in \mathbb {R}^{n} \mid \pi _k \ge 0, \sum \pi _k = 1\}\) with density function

$$\begin{aligned} \textrm{pdf}(\pi ) = \frac{1}{\textrm{Beta}(\alpha )}\prod \limits _{k=1}^n \pi _k^{\alpha _k-1} \end{aligned}$$
(23)

where \(\textrm{Beta}(\cdot )\) is the multivariate Beta function defined with the Gamma function as

$$\begin{aligned} \textrm{Beta}(\alpha ) = \frac{\prod _{k=1}^n \textrm{Gamma}(\alpha _k)}{\textrm{Gamma}(\sum _{k=1}^n\alpha _k)}. \end{aligned}$$
(24)

Let \(X^\alpha \sim \textrm{Dir}(\alpha )\), with \(\alpha \in \mathbb {R}_+^n\). A realization of \(X^\alpha \) is a point in the \((n-1)\)-simplex, which can be taken as parameters of the categorical distribution with n categories. The expectation of \(X^\alpha \) is

$$\begin{aligned} \mathbb {E}[X^\alpha _k] = \frac{\alpha _k}{\sum \alpha }, \end{aligned}$$
(25)

which maps each Dirichlet distribution to a specific categorical distribution. Note that \(0< \alpha _k < \infty \) implies that we can only represent categorical distributions in the open simplex. In practice, this is not a problem as we can get arbitrarily close to the boundary of the simplex.

Let \(f_k\) be the kth category in f. An image f is defined as in (1) with \(\Gamma = \mathbb {R}_+^n\). If we equip f with the ordering \(f \le g \iff [\forall k](f_k(x) \le g_k(x))\), we obtain a complete lattice. Dilation and erosion are then defined as their grayscale counterparts applied to each category independently

$$\begin{aligned} \delta (f;B)(x)_k&= \delta (f_k;B)(x), \end{aligned}$$
(26)
$$\begin{aligned} \epsilon (f;B)(x)_k&= \epsilon (f_k;B)(x). \end{aligned}$$
(27)

An example of these operations is provided in Fig. 2. It is interesting to compare the images of entropy (uncertainty) and \(\alpha \) parameter magnitude for dilation and erosion. We can see that entropy and magnitude are positively correlated for dilation, and negatively correlated for erosion. Try to think of dilation as “increasing the probability of everything” and erosion as “decreasing the probability of everything.” It is, of course, impossible to change the “probability of everything”; all we can do is shuffle probability around between categories. Nevertheless, the idea captures our intent and the magnitude image reflects this. High entropy and low magnitude can be interpreted as uncertainty due to a lack of confidence, whereas high entropy and high magnitude can be interpreted as uncertainty due to overconfidence. Opening and closing appear to be more straightforward as they, respectively, decrease and increase uncertainty at the boundaries between overlapping categories.
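The relationship between entropy and magnitude can be reproduced on a toy example. The sketch below applies (26) and (27) to a one-dimensional image of two-category \(\alpha \) vectors and maps the result to probabilities with (25). The function names, the example values and the border handling are assumptions for illustration.

```python
from math import log


def channelwise(op, alpha, B):
    """Apply max (dilation) or min (erosion) to each category independently."""
    n, n_cat = len(alpha), len(alpha[0])
    return [
        tuple(op(alpha[x + b][k] for b in B if 0 <= x + b < n) for k in range(n_cat))
        for x in range(n)
    ]


def mean(a):
    """Expectation (25): the categorical distribution associated with alpha."""
    total = sum(a)
    return tuple(ak / total for ak in a)


def entropy(p):
    return -sum(pk * log(pk) for pk in p if pk > 0)


B = [-1, 0, 1]
alpha = [(9.0, 1.0), (9.0, 1.0), (1.0, 9.0), (1.0, 9.0)]

dilated = channelwise(max, alpha, B)
eroded = channelwise(min, alpha, B)

# At the boundary pixel x = 1 both operations give a uniform distribution,
# but the l1 magnitude of alpha goes up under dilation and down under erosion.
x = 1
magnitudes = (sum(alpha[x]), sum(dilated[x]), sum(eroded[x]))
entropies = (entropy(mean(alpha[x])), entropy(mean(dilated[x])), entropy(mean(eroded[x])))
```

At the boundary pixel, the dilated and eroded \(\alpha \) vectors map to the same probability vector, so only the magnitude distinguishes the two kinds of uncertainty.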

Fig. 2

Morphology on Dirichlet distributions. The top left image is an RGB representation of an image f with three categories, where the colors red, green, and blue correspond to points very close to the vertices of \(\Delta ^2\) and the remaining colors are mixtures of these three colors. The first row is the Dirichlet distribution. The second row is the probability vectors obtained from (25). The third row is entropy of the probability vectors, and the fourth row is the magnitude (\(l_1\) norm) of the parameter vectors. We can see that dilation increases both entropy and magnitude, whereas erosion decreases magnitude and increases or decreases entropy depending on the local distribution (Color figure online)

We can easily extend these operators to operate on a subset of categories S by updating only those categories

$$\begin{aligned} \delta (f;B\vert S)(x)_k&= {\left\{ \begin{array}{ll} \delta (f_k;B)(x), &{}\quad \text {if}\,k \in S\\ f_k(x), &{}\quad \text {otherwise}\end{array}\right. }\end{aligned}$$
(28)
$$\begin{aligned} \epsilon (f;B\vert S)(x)_k&= {\left\{ \begin{array}{ll} \epsilon (f_k;B)(x), &{} \quad \text {if}\,k \in S\\ f_k(x), &{} \quad \text {otherwise}\end{array}\right. } \end{aligned}$$
(29)

An example of these operations is provided in Fig. 3 where we operate on the green category. Consider the gray/blue region surrounded by green that is indicated with a white ellipse in the left image of the second row. When we dilate the green category, we would expect this region to become green in the probability image, but in the Dirichlet space these pixels already have the same green value as the green region, so they are unaffected by the dilation. We could partly solve this by carefully setting the \(\alpha \) values, e.g., setting the pixels with only green to have very large green values. However, if our goal is to work on categorical distributions, this becomes too large a burden to be practical and we now turn our attention to morphological operators that work directly on categorical distributions.
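The limitation described above can be reproduced numerically. In the sketch below, a mixed pixel is surrounded by green pixels with the same green \(\alpha \) value, so dilating the green category with (28) leaves it unchanged; inflating the green values of the pure green pixels is the workaround mentioned in the text. The category order (red, green, blue), the \(\alpha \) values and the function names are assumptions for illustration.

```python
def dilate_subset(alpha, B, S):
    """Dilation (28): dilate only the categories in S, copy the others."""
    n = len(alpha)
    return [
        tuple(
            max(alpha[x + b][k] for b in B if 0 <= x + b < n) if k in S else alpha[x][k]
            for k in range(len(alpha[x]))
        )
        for x in range(n)
    ]


def mean(a):
    """Expectation (25)."""
    total = sum(a)
    return tuple(ak / total for ak in a)


B = [-1, 0, 1]
GREEN = 1
green = (0.1, 10.0, 0.1)
mixed = (10.0, 10.0, 10.0)  # gray pixel with the same green value as its neighbors

alpha = [green, green, mixed, green, green]
unchanged = dilate_subset(alpha, B, {GREEN})

# Workaround: give the pure green pixels a much larger green parameter.
strong_green = (0.1, 100.0, 0.1)
boosted = [strong_green, strong_green, mixed, strong_green, strong_green]
dilated = dilate_subset(boosted, B, {GREEN})
green_probability = mean(dilated[2])[GREEN]
```

The workaround succeeds only because the \(\alpha \) values were tuned by hand, which is the burden the text refers to.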

Fig. 3

Morphology on Dirichlet distributions using a subset of categories, in this case the green category \(\{g\}\). See also Fig. 2 (Color figure online)

3.2 Morphology on Categorical Distributions

Recall from Sect. 2.6 that for a set of \(n+1\) categories, \(C = \{c_1,c_2,\dots ,c_{n+1}\}\), the categorical distribution over these categories is completely determined by a point in the n-simplex \(\Delta ^n = \{ \pi \in \mathbb {R}^{n+1} \mid \pi _k \ge 0, \sum \pi _k = 1\}\), where \(\pi _k\) is the probability of \(c_k\). An image f is then defined as in (1) with \(\Gamma = \Delta ^n\). Operations are again defined on a single category at a time. Let \(B_r\) be a closed ball of radius r centered at the origin and let i be the category we operate on. Let \(f_k(x) = f(x)_k\) be the probability of observing category \(c_k\) in pixel x and let \(\omega _k(x) = 1 - f_k(x)\).

3.2.1 Dilation

For the dilated category i, the operation is the same as standard grayscale dilation. For the remaining set of categories, the operation is a rescaling to ensure that the probabilities sum to one, while the conditional probabilities \(\textrm{Pr}(k = j \vert x, k \ne i)\), for \(j \ne i\) are unchanged

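$$\begin{aligned} \delta _i(f;B_r)(x)_k&= {\left\{ \begin{array}{ll} \delta (f_k;B_r)(x), &{} \quad \text {if}\,k = i\\ \left[ 1 - \delta (f_i;B_r)(x)\right] \frac{f_k(x)}{\omega _i(x)}, &{} \quad \text {otherwise}\end{array}\right. } \end{aligned}$$
(30)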

If \(\delta (f_i;B_r) = 1\), then the conditional probabilities are not defined and we simply set the probabilities to \(1 - \delta (f_i;B_r) = 0\). This definition is the same as (17) and equivalent to the definition from [6].
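A minimal sketch of this dilation on a one-dimensional image of probability vectors is given below: category i is dilated as a grayscale image and the remaining categories are rescaled by \([1 - \delta (f_i;B_r)(x)]/\omega _i(x)\), so that their conditional probabilities are preserved. The function name, the discrete ball \(B_r = \{-r,\dots ,r\}\) and the border handling are assumptions for illustration.

```python
def dilate_category(f, i, r):
    """Dilate category i with B_r = {-r, ..., r} and rescale the other categories."""
    n = len(f)
    out = []
    for x in range(n):
        # Grayscale dilation of the i-th probability channel.
        d = max(f[y][i] for y in range(max(0, x - r), min(n, x + r + 1)))
        omega = 1.0 - f[x][i]
        out.append(tuple(
            d if k == i
            # If omega is 0 then f_i(x) = 1 and d = 1, so the others are set to 0.
            else (0.0 if omega == 0.0 else (1.0 - d) * fk / omega)
            for k, fk in enumerate(f[x])
        ))
    return out


f = [(0.6, 0.3, 0.1), (0.2, 0.6, 0.2), (0.0, 0.5, 0.5)]
dilated = dilate_category(f, i=0, r=1)
```

In the middle pixel the ratio between the second and third category is 3 to 1 both before and after the dilation, and every output vector still sums to one.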

3.2.2 Erosion

Erosion is defined similarly to dilation, with the exception of the case when \(f_i(x) = 1\) where we cannot rescale the remaining categories because \(\omega _i(x) = 0\)

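$$\begin{aligned} \epsilon _i(f;B_r)(x)_k&= {\left\{ \begin{array}{ll} \epsilon (f_k;B_r)(x), &{} \quad \text {if}\,k = i\\ \left[ 1 - \epsilon (f_i;B_r)(x)\right] \frac{f_k(x)}{\omega _i(x)}, &{} \quad \text {if}\,k \ne i \wedge f_i(x) < 1\\ \left[ 1 - \epsilon (f_i;B_r)(x)\right] \frac{\theta (f_k,B_r)(x)}{\sum _{j \ne i} \theta (f_j,B_r)(x)}, &{} \quad \text {otherwise}\end{array}\right. } \end{aligned}$$
(31)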

The function \(\theta \) must only depend on the neighborhood defined by \(B_r\) and must be defined such that \(\epsilon (f_i;B_r)(x) < 1 \implies [\exists k \ne i]\left( \theta (f_k,B_r)(x) > 0\right) \). In addition, we require that, when disregarding discretization issues, eroding with \(B_{r+\rho }\) is equivalent to first eroding with \(B_{r}\) and then eroding with \(B_\rho \)

$$\begin{aligned} \epsilon _i(\epsilon _i(f,B_r),B_{\rho })(x) = \epsilon _i(f, B_{r+\rho })(x) . \end{aligned}$$
(32)

Since \(\theta \) is only used in the case where \(f_i(x) = 1\), we must have that

$$\begin{aligned}{} & {} \epsilon (f_i;B_r)(x) < 1 \implies \frac{\theta (f_k,B_{r+\rho })(x)}{\sum _{j \ne i} \theta (f_j;B_{r+\rho })(x)} \nonumber \\{} & {} \quad = \frac{\theta (f_k;B_r)(x)}{\sum _{j \ne i} \theta (f_j;B_r)(x)} \end{aligned}$$
(33)

So \(\theta \) must only depend on the smallest possible neighborhood \(B_{r^*}\) where \(\epsilon (f_i; B_{r^*}) < 1\), leading to

$$\begin{aligned} \theta (f_k,B_r)(x)&= \delta (f_k;B_{r^*})(x)\\ r^*&= \arg \min \limits _{r' > 0} r',\; \mathrm {s.t.} \; \epsilon (f_i; B_{r'})(x) < 1. \nonumber \end{aligned}$$
(34)

This amounts to picking the closest category as suggested for crisp categorical images in [6, 7], although without the need for breaking ties since multiple closest categories are now handled by rescaling. In Appendix 1, we show that these definitions have the same properties as the definitions in [7] for operating on n-ary images.

An example of the proposed operations is provided in Fig. 4, where we operate on the green category. Compared to morphology on Dirichlet distributions using subsets in Fig. 3, the operations now work directly on the probabilities, making it much easier to understand and control.
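The following sketch shows one possible discrete reading of this erosion on a one-dimensional image. It assumes that, when \(f_i(x) < 1\), the other categories are rescaled as for dilation, and that, when \(f_i(x) = 1\), they receive the mass \(1 - \epsilon (f_i;B_r)(x)\) in proportion to \(\theta \) from (34), normalized as in (33). The function name, the integer radii and the border handling are assumptions for illustration.

```python
def erode_category(f, i, r):
    """Erode category i with B_r = {-r, ..., r} (one reading of the erosion rule)."""
    n, n_cat = len(f), len(f[0])

    def window(x, rad):
        return range(max(0, x - rad), min(n, x + rad + 1))

    out = []
    for x in range(n):
        e = min(f[y][i] for y in window(x, r))  # grayscale erosion of channel i
        omega = 1.0 - f[x][i]
        if omega > 0.0:
            # Rescale so that conditional probabilities of the others are kept.
            others = [(1.0 - e) * fk / omega for fk in f[x]]
        elif e == 1.0:
            # Nothing to erode: the pixel stays entirely in category i.
            others = [0.0] * n_cat
        else:
            # f_i(x) = 1: use theta from the smallest radius r* where erosion < 1.
            r_star = next(rad for rad in range(1, r + 1)
                          if min(f[y][i] for y in window(x, rad)) < 1.0)
            theta = [max(f[y][k] for y in window(x, r_star)) for k in range(n_cat)]
            total = sum(t for k, t in enumerate(theta) if k != i)
            others = [(1.0 - e) * t / total for t in theta]
        out.append(tuple(e if k == i else others[k] for k in range(n_cat)))
    return out


# The middle pixel is equally close to the second and the third category.
f = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
eroded = erode_category(f, i=0, r=1)
```

Where a crisp image would require a tie-breaking rule, the eroded middle pixel here is split evenly between the two closest categories.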

Fig. 4

Morphology on categorical distributions. Here we operate on the green category g (Color figure online)

4 Protected Morphological Operations

In [2], the authors introduce the concept of protected morphological operations, where a subset of categories are protected from being updated. Here we adapt the idea of protected morphological operations to categorical distributions and define protected dilation and erosion.

Let L be a set of categories; we then write \(\epsilon _i(f;B_r\vert L)\) for an erosion of i that protects L. Let \(J = C{\setminus } (\{i\} \cup L)\) be the set of categories that are neither protected nor operated on. Let \(f_K(x) = \sum _{k \in K} f_k(x)\) be the sum over a set of categories \(K \subset C\). If L is empty, or \([\forall x](f_L(x) = 0)\), protected operations reduce to their non-protected counterparts. Because L can change the topology of the domain, we cannot just define operations based on Euclidean distance. Instead, we introduce a distance function \(d_\Omega (x,y)\), which computes the distance from x to y on the domain \(\Omega \). If \(\Omega = \mathbb {Z}^d\), then \(d_\Omega (x,y)\) is the Euclidean distance. Computing exact Euclidean distance on a Euclidean domain with holes is non-trivial. Here we use the simplified fast marching method (FMM) from [11] with the update rule defined in [12], which results in a small approximation error. For brevity, when possible we leave out function application and write f instead of f(x) in the following.

Fig. 5

Protected morphology on categorical distributions. The red category \(\{r\}\) is protected while we operate on the green category g (Color figure online)

4.1 Protected Dilation

Let \(\Omega _p = \{x \in \mathbb {D}\mid f_L(x) \le 1-p\}\); this is the part of the domain of f where it is possible to set \(f_i = p\). Protected dilation is then defined as

$$\begin{aligned}&\delta _i(f;B_r\vert L)(x)_k = \nonumber \\&{\left\{ \begin{array}{ll} f_k &{} \quad \text {if}\,k \in L\\ \min \left( 1-f_L, \max \limits _{p \in (0,1]}\max \{f_i(y) \mid d_{\Omega _p}(x,y) \le r\}\right) &{} \quad \text {if}\,k = i\\ \left[ 1 - f_L - \delta _i(f;B_r\vert L)_i\right] \frac{f_k}{f_J} &{} \text {otherwise}\end{array}\right. } \end{aligned}$$
(35)

4.2 Protected Erosion

Protected erosion is defined similarly to protected dilation, with the added complication of normalization

(36)

The first case ensures that all protected categories are unchanged. The second case ensures that a pixel x is not updated, unless there is a path, not blocked by \(f_L\), to a pixel y with \(f_J(y) > 0\). The importance of this is easily seen by considering the case where \(f_i\) varies in a region, but \(f_i + f_L = 1\) in the region. The third case states that if there is such a path, then it can be eroded. The fourth and fifth cases handle normalization. The \(\theta \) function is defined in a similar manner as for non-protected erosion in (34),

$$\begin{aligned} \theta (f_k)(x)&= \max _{p \in (0,1]}\max \{f_k(y) \mid d_{\Omega _p}(x,y) \le r^*\}\\ r^*&= \arg \min \limits _{r'> 0} r' \;,\; \mathrm {s.t.} \; [1 - f_L - \epsilon _i(f;r'\vert L)_i(x)] > 0. \nonumber \end{aligned}$$
(37)

An example of these operations is provided in Fig. 5, where the red category is protected while we operate on the green category. Compared to the non-protected operations in Fig. 4, we can see that changes are restricted to the green and blue categories.

5 Examples

The first example illustrates how morphology on categorical distributions (Sect. 3.2) can be used to remove noisy predictions. The second example illustrates how protected morphology on categorical distributions (Sect. 4) can be used to model annotator bias.

5.1 Removing Noisy Predictions

Despite the impressive performance of neural networks for segmentation, the results are rarely perfect. Figure 6 shows part of an electron microscopy image of the hippocampus, along with multi-class predictions and segmentations obtained from [13]. Notice the noisy mitochondria predictions resulting in misclassifications highlighted in Fig. 6a. We can remove these misclassifications by opening the mitochondria class before the final classification. Figure 7 shows the opened predictions along with the final classifications. Notice in particular how the errors in circle 2 in Fig. 7a are fixed, such that the vesicle (teal) and the endoplasmic reticulum (yellow) are separated by cytosol. This would have been very difficult to achieve by working directly on the final segmentations. That the vesicle and endoplasmic reticulum are probably misclassified just illustrates that not all things should be fixed in post-processing.
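The idea of opening one class before the final classification can be sketched on a one-dimensional toy profile. The sketch makes several simplifying assumptions: the image is a 1D row of pixels, the structuring element is an interval of radius 1 rather than the disk \(B_{12}\), the probability of the opened class is processed with a grayscale opening, and the other classes are rescaled proportionally. The case where the other classes have zero mass is handled in this work by the \(\theta \) function; here it is replaced by a uniform split, which is an assumption of the sketch only. All function names are hypothetical.

```python
def grey_erode(v, r):
    """Sliding-window minimum with radius r (window truncated at the borders)."""
    n = len(v)
    return [min(v[max(0, x - r):min(n, x + r + 1)]) for x in range(n)]


def grey_dilate(v, r):
    """Sliding-window maximum with radius r (window truncated at the borders)."""
    n = len(v)
    return [max(v[max(0, x - r):min(n, x + r + 1)]) for x in range(n)]


def open_class(f, i, r):
    """Open category i in a row of categorical distributions and renormalize."""
    opened = grey_dilate(grey_erode([px[i] for px in f], r), r)
    out = []
    for px, new_i in zip(f, opened):
        rest = 1.0 - px[i]
        if rest > 0:
            # Rescale the other categories so that the pixel sums to 1 again.
            scale = (1.0 - new_i) / rest
            q = [new_i if k == i else px[k] * scale for k in range(len(px))]
        else:
            # Assumption: no other mass to rescale, so split the freed mass uniformly.
            share = (1.0 - new_i) / (len(px) - 1)
            q = [new_i if k == i else share for k in range(len(px))]
        out.append(q)
    return out


def classify(f):
    """Final segmentation: most probable category in each pixel."""
    return [max(range(len(px)), key=px.__getitem__) for px in f]


# Categories: 0 = cytosol, 1 = mitochondria, 2 = membrane.
# A one-pixel mitochondria spike (noise) and a three-pixel wide mitochondrion.
mito = [0.0, 0.9, 0.0, 0.0, 0.8, 0.9, 0.8, 0.0]
f = [[(1 - m) * 0.75, m, (1 - m) * 0.25] for m in mito]

labels_before = classify(f)
labels_after = classify(open_class(f, i=1, r=1))
print(labels_before)
print(labels_after)
```

The isolated spike is narrower than the structuring element and is reassigned to cytosol, while the wider mitochondrion keeps its label. Because the opening acts on the probabilities, the freed mass goes to whichever classes were already plausible in that pixel, which is not possible when working on the final labels.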

Fig. 6
Electron microscopy image of the hippocampus with predictions of five classes: cytosol (white), membrane (blue), mitochondria (purple), endoplasmic reticulum (yellow) and vesicle (teal). By examining neighboring slices, the areas 1–3 have been confirmed to wrongly contain mitochondria predictions (Color figure online)

Fig. 7
Fixing mitochondria misclassifications by opening the mitochondria predictions with \(B_{12}\)

5.2 Modeling Annotator Bias

Expert annotation is the gold standard in most clinical practice as well as for evaluating computer methods. However, annotation tasks are inherently subjective and prone to substantial inter-rater variation [14, 15]. When investigating the influence of this variation on statistics and decisions, it can be interesting to consider specific hypotheses regarding the variation. Consider the brain tumor annotation in Fig. 8. The annotation is derived from the QUBIQ challenge brain tumor dataset, where three annotators each annotated whole tumor, tumor core and active tumor. From this, we obtain an image with four categories: background, edema, active core and inactive core. Although the annotators have a high level of agreement, there is still substantial variation in the extent of edema and in how much of the tumor core is active.
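One plausible way to obtain such a categorical image from the three annotations is sketched below. It assumes that the three masks are nested (active tumor inside tumor core inside whole tumor), so that each annotator assigns exactly one of the four categories to every pixel, and that the merged annotation is the per-pixel fraction of annotators voting for each category. Both the nesting and the averaging are assumptions made for illustration, as are all names in the code.

```python
BACKGROUND, EDEMA, INACTIVE_CORE, ACTIVE_CORE = range(4)


def to_category(whole, core, active):
    """Map one annotator's three nested binary masks to a single category."""
    if active:
        return ACTIVE_CORE
    if core:
        return INACTIVE_CORE
    if whole:
        return EDEMA
    return BACKGROUND


def merge_annotations(annotations):
    """Per-pixel categorical distribution over the four categories.

    annotations -- one list per annotator; each entry is a
                   (whole, core, active) triple of 0/1 flags for one pixel
    """
    n = len(annotations)
    merged = []
    for votes in zip(*annotations):          # all annotators' flags for one pixel
        counts = [0, 0, 0, 0]
        for flags in votes:
            counts[to_category(*flags)] += 1
        merged.append([c / n for c in counts])
    return merged


# Three annotators, four pixels: full agreement on the first and last pixel,
# disagreement on the extent of edema and on how much of the core is active.
a1 = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
a2 = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (1, 1, 1)]
a3 = [(0, 0, 0), (0, 0, 0), (1, 1, 0), (1, 1, 1)]

merged = merge_annotations([a1, a2, a3])
for px in merged:
    print(px)
```

Pixels with full agreement become crisp distributions, while the second and third pixels carry mass on two categories, which corresponds to the color mixing used to display variation.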

Fig. 8
Inter-rater variation in annotation of brain tumors. White is background, blue edema, yellow inactive core and purple active core. Variation is indicated by color mixing. The black circles highlight two regions with large variation (Color figure online)

Using protected dilation, we can, for example, hypothesize how the merged annotation would appear under the assumption that the tumor core is oversegmented, but the active part is undersegmented. Figure 9 shows the results where we first dilate the active core while protecting edema and background, then dilate edema while protecting background. This would allow us to easily investigate if statistical differences in a case–control study could be explained by biased annotations.

Fig. 9
What could the annotation look like if the core was oversegmented, but the active part undersegmented? Dilation of active core while protecting edema and background, followed by dilation of edema while protecting background using \(B_1, B_2, B_3\)

6 Discussion and Conclusion

We have provided a thorough review of morphology on categorically valued images. Based on this, we have defined morphology on Dirichlet distributions and morphology on categorical distributions. Inspired by [2], we have further defined protected morphology on categorical distributions. We have demonstrated the behavior of the proposed operations and shown how they can be used in real-world applications such as noise removal in multi-class predictions and modeling annotator bias.

The definition of dilation is straightforward and no obvious alternatives present themselves. This is not so for erosion. In our definition, erosion corresponds to conditioning on a change in probability of the eroded category. An equally valid approach would be to also condition on where this change came from. Instead of simply rescaling the categories with nonzero mass, we could include information from the neighborhood. For example, when eroding i we would fill the difference \(f_i(x) - \epsilon (f_i;B_r)(x)\) based on the pixels that contribute to the difference, that is, those with minimum mass for i. This would result in smoother boundaries, which could be a better representation of uncertainty. A downside is that categories can leak into each other, leading to undesirable results.

In this work, we have focused on the basic morphological operations, dilation and erosion, and their compositions, closing and opening. A logical next step is to investigate more complex morphological operations, such as the morphological gradient, which may be used to investigate spatial relationships between categories by measuring the change in one category as a function of change in another category.

We have defined protected versions of dilation and erosion. From these, we could define opening and closing in the standard way. Alternatively, by changing which categories are protected for dilation and erosion we get more control over how a category is opened or closed. In [2], the authors explore similar ideas for the so-called tunneling and bridging operations on their set-based morphology, which would be interesting to consider in the context of categorical distributions.

Our aim in this work was to bring morphological operations to probabilistic representations of categorical images. These representations can be considered as generative processes that can be sampled. Naive sampling will result in noisy and unrealistic samples. Combining the sampling process with the proposed morphological operations could be an easy approach to obtain smoother and more realistic samples.
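A minimal sketch of what naive sampling means here, assuming that each pixel is drawn independently from its own categorical distribution; the function name and the toy row of pixels are hypothetical.

```python
import random


def sample_labels(f, rng):
    """Draw one label per pixel, independently of all other pixels."""
    categories = range(len(f[0]))
    return [rng.choices(categories, weights=px, k=1)[0] for px in f]


# A row with two certain regions and an uncertain boundary zone in between.
f = [[1.0, 0.0]] * 2 + [[0.5, 0.5]] * 6 + [[0.0, 1.0]] * 2

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = sample_labels(f, rng)
print(sample)
```

Because the uncertain pixels are sampled independently, labels in the boundary zone can alternate from pixel to pixel instead of forming a single boundary, which is the kind of noise that a subsequent morphological operation could smooth.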

In summary, morphology is an indispensable tool for post-processing segmentations. Extending morphology to categorical images and their probabilistic counterparts presents a particular problem since there is no inherent ordering of categories. In this paper, we have proposed to view categorical images as images of categorical distributions and defined morphological operations that are consistent with this view.