1 Introduction

The process of image understanding, as discussed by Tadeusiewicz and Ogiela (2004), Tomczyk and Szczepaniak (2007) and Tadeusiewicz et al. (2009), is usually more complex than finding the segments that represent semantically relevant objects. It can be regarded as a bottom-up process in which the objects are organized in terms of increasing complexity. When this process is implemented in a machine, both the relevant image information and additional (expert) knowledge about the analyzed image need to be considered. In the general case, the expert's life experience and even the cultural context may be necessary to solve the task of image content interpretation or understanding. Such an approach has already been advocated by Tadeusiewicz et al. (2009).

Active contours is a group of methods developed originally for image segmentation. Since the basic image element considered in the analysis is a pixel, the method is regarded as a low-level one. The term active contours was first introduced with reference to the snakes method by Kass et al. (1988). In that work, the contour, defined as a parameterized curve, evolved until the desired object was identified in the image. The purpose of the contour's evolution was defined by an energy function, which was the objective function of the optimization process. The optimization was carried out using the calculus of variations, which led to an iterative solution of a set of partial differential equations and, as a result, to iterative changes of the contour itself. The literature abounds in modifications and improvements of this basic method. Cohen (1991) introduced additional pressure forces able to compress or expand the contour, which was further elaborated by Ivins and Porrill (1994), who introduced region energy and region forces. Both of these changes allowed additional knowledge to be taken into account in order to prevent the process from getting stuck in local minima of the energy function, which happened in the case of wrong initialization of the contour. Similar reasons can be ascribed to the introduction of gradient vector flow by Xu and Prince (1998) and distance potential by Cohen and Cohen (1991). Other significant modifications can also be found in Amini et al. (1990), McInerney and Terzopoulos (1995), and Delingette and Montagnat (2000).

The snakes method was the first but not the only variation of active contours to have emerged in the literature. The most significant of them are geometric active contours, which differ from the snakes method in that they do not take into account the information about the parameterization of the curves representing the contour (the snakes method is regarded as a parametric active contour method). Such an approach was introduced simultaneously by Caselles et al. (1993) and by Malladi et al. (1995) and optimized using the level-set method, earlier applied by Osher and Sethian (1988) to the solution of the front propagation problem. The main advantage of this method was the possibility to change the contour's topology easily during its evolution. A variation of this technique was geodesic active contours, introduced in different ways by Caselles et al. (1997), Yezzi et al. (1997) and Kichenassamy et al. (1995). Here, similarly to the snakes method, the purpose of the evolution was formulated in the form of an energy function, and the evolution itself was driven by forces that influence the contour. This enabled Xu et al. (2000, 2001) to indicate the dependencies between those methods, which made it possible to share experience between them.

Other types of active contours include: active shape models proposed by Cootes and Taylor (1992, 1994), where the contour is described by a set of landmark points and an appropriately trained point distribution model imposes additional limitations on the evolving contour; Brownian strings introduced by Grzeszczuk and Levin (1997), where the contour is described linguistically and a simulated annealing algorithm is used for its evolution; the approaches discussed by Jacob et al. (2001) and by Schnabel and Arridge (1995), in which the contour is represented by splines; the approach of Staib and Duncan (1989), in which Fourier descriptors are used; and finally the active ray approach described by Denzler and Niemann (1996). Although substantially different with regard to the contour description model, the above methods share a few characteristics, namely: the application of the notion of a contour, the specification of expectations concerning the image object (directly or indirectly) in the form of a quality index called the energy function, and the optimization of the contour shape. In this paper, the analysis of medical images lies in the center of consideration. Although the roots of the method lie in the idea of active contours, the approach is more sophisticated. Within the presented process of image content interpretation, a number of iteratively performed stages of image partitioning takes place (from pixels to complex objects), with each part being ascribed a semantic meaning in such a way that, after each iteration, newly understood objects constitute additional knowledge about the image. The new knowledge can be used in the next step, and so on. The method, called cognitive hierarchical active partitions (CHAP for short) (Tomczyk et al. 2010), uses external knowledge and expert experience, here the medical one. The application of internet technology makes it possible for images to be analyzed instead of merely being collected (e.g. in a medical radiology center), while the physician supports the image understanding process remotely. This is possible not only because of the use of computer networks but also because of the construction of the CHAP algorithm.

The paper is organized as follows. First, the CHAP method is presented. In Sect. 3, the pixel-, line- and circle-based approaches are described. Section 4 deals with the description of a system organization for remote semantic image analysis. The summary closes the paper. It should be emphasized that the intention of the authors is not to present details of the system implementation but to present a new concept of an image analysis system in which humans and computers can collaborate to understand image content.

2 Cognitive hierarchical active partitions

As shown by Tomczyk and Szczepaniak (2005) and by Tomczyk et al. (2007), the classic contour can be regarded as a classifier of image points and, therefore, a classifier of pixels, which assigns each pixel one of two labels: \(l^o\) or \(l^b\). The labels denote the object and the background, respectively. The classifier partitions the set of image points and, consequently, the set of pixels, into those composing the object:

$$ {\mathcal{O}}^{l^o} = \{ o\in {\mathcal{O}} : k(o) = l^o \} $$
(1)

and those composing the background:

$$ {\mathcal{O}}^{l^b} = \{ o\in {\mathcal{O}} : k(o) = l^b \} $$
(2)

where \(\mathcal{O}\) represents the whole set of pixels. Of course, \(\mathcal{O}^{l^o}\cup\mathcal{O}^{l^b}=\mathcal{O}\) and \(\mathcal{O}^{l^o}\cap\mathcal{O}^{l^b} = \emptyset\). The new object detected in the image by the active contour technique is thus represented by the set of all points corresponding to pixels labeled with \(l^o\).

The classic active contour approach can be generalized, given that image analysis generally consists in indicating those image fragments that carry a certain semantic meaning, which does not necessarily have to be (and most often is not) performed at the level of individual image points. For the purpose of generalization, let us assume that P represents a set of patches that have already been assigned a semantic meaning. An individual patch \(p\subseteq\mathbb{R}^2\) can denote, in the simplest case, a set containing a single point that has been assigned a color, as in the active contour method, but also a division line between two regions of different characteristics, a region of uniform color, or a region representing an object of higher-level semantic meaning significant for the class of images analyzed. It is worth emphasizing that in the methodology presented further, every considered patch \(p\in P\) represents in the image an object \(o\in \mathcal{O}\) with a known semantic meaning, e.g. at the beginning of the analysis each image point represents a pixel. Moreover, although the patches correspond to separate objects, in general they do not have to be disjoint regions. Additionally, let us assume that k is not a binary classifier, but one that assigns each object one of L labels, denoted as \(l_1, \ldots, l_L\). Then the partition defined by such a classifier consists of the parts:

$$ {\mathcal{O}}^l = \{o\in {\mathcal{O}} : k(o) = l \} $$
(3)

for l being one of those labels which, as previously, are mutually exclusive and together cover the whole set \(\mathcal{O}\). In this case, the new object corresponding to label l is naturally represented in the image by a patch being the union of the patches assigned to objects with label l by classifier k. If the optimal partition is found in the same way as in active contours, that is, by selecting a model of the classifier and then iteratively finding its optimal parameters for a given energy function, then one can speak of a new image analysis method referred to as active partitions. The name of the method reflects the evolution of the partition that results from the classifier's evolution.
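To make the notation concrete, the following minimal sketch (in Python, not taken from the cited works) shows the partition induced by a classifier k over a set of objects, as in Eqs. (1)-(3); the pixel representation and the brightness threshold used in the example are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable


def partition(objects: Iterable, k: Callable[[object], Hashable]) -> dict:
    """Group objects by the label assigned by classifier k: O^l = {o in O : k(o) = l}."""
    parts = defaultdict(list)
    for o in objects:
        parts[k(o)].append(o)
    return dict(parts)


# Example: a binary pixel classifier (object vs. background), as in Eqs. (1)-(2);
# here a pixel is an (x, y, brightness) tuple and the threshold is arbitrary.
pixels = [(0, 0, 10), (1, 0, 200), (1, 1, 180)]
labels = partition(pixels, lambda p: "object" if p[2] > 128 else "background")
# labels == {"background": [(0, 0, 10)], "object": [(1, 0, 200), (1, 1, 180)]}
```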

As mentioned, the process of automatic image content interpretation can be regarded as a bottom-up process in which the objects are organized in terms of increasing complexity. This complex process should include both image information and external (human) knowledge about the analyzed image. What is also important for image content understanding is the experience of the person analyzing the image, which refers to expert knowledge. Since the approach requires the retrieval of semantically meaningful objects, one can make use of the active partitions approach described above. For this reason, the process of image content interpretation can be presented as an iterative image partitioning, with each part being ascribed a semantic meaning in such a way that, after each iteration, newly understood objects constitute additional knowledge about the image. The new knowledge can be used in the next step.

Using the notation described above, the presented concept of image analysis can be described in the following way: at the beginning, assume that the set \(\mathcal{O}\) contains all the pixels in the image (the patch corresponding to each pixel contains a single image point) and iteratively perform steps that add new elements to the set \(\mathcal{O}\) (i.e. extend the knowledge about the image content), remembering the patches corresponding to those objects. Each step is composed of: the selection of a subset of objects from \(\mathcal{O}\), the construction of a proper classifier k of those objects, and the extraction of new objects based on the partition determined by that classifier.
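The iterative scheme itself can be sketched as a simple loop; the helper callables select_subset, build_classifier and extract below are hypothetical placeholders standing for the problem-specific parts of each CHAP step.

```python
from collections import defaultdict


def chap(initial_pixels, steps):
    """Iteratively extend the set O of semantically known objects."""
    O = list(initial_pixels)                      # at the start, every pixel is a known object
    for select_subset, build_classifier, extract in steps:
        subset = select_subset(O)                 # 1. select a subset of objects from O
        k = build_classifier(subset)              # 2. construct (optimise) a classifier of those objects
        parts = defaultdict(list)                 # 3. partition induced by k, as in Eq. (3)
        for o in subset:
            parts[k(o)].append(o)
        O.extend(extract(parts))                  #    new objects = new knowledge usable in the next step
    return O
```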

Due to its structure and the fact that partition finding, especially in more complex cases, requires an algorithm that uses external knowledge and the expert's experience, the described method has been called Cognitive Hierarchical Active Partitions. Although it seems natural to use active partitions during the construction of the proper classifier (or active contours at the pixel level), the partitioning can be performed by means of any segmentation algorithm, or by any supervised or unsupervised classification algorithm. Moreover, these approaches can be mixed.

A possible example of the CHAP approach is depicted in Fig. 1. The analysis of individual pixels in Fig. 1a allows one to find only regions of approximately the same color, i.e. structures that look alike in the image. However, such regions can correspond to different anatomical structures, which are impossible to distinguish without medical knowledge. In this example, it is particularly visible in the attempt to segment the articular cartilage in a knee. Even though (due to contrast application) the connective tissue is partially distinguishable from the neighboring structures, there are still multiple similar objects that look the same in the study. Nevertheless, anatomical knowledge combined with accumulated knowledge about the location of other, more specific structures in the knee (e.g. bones), gathered in the earlier stages of CHAP, can lead to a proper analysis of the entire image. Moreover, it demonstrates that in one CHAP step it is sometimes necessary to use different kinds of objects found so far. Localization of the articular cartilage is possible when the patches representing the bones (Fig. 1b, c) and the joint cavity (Fig. 1d) are available.

Fig. 1

The concept of CHAP approach in a task of articular cartilage segmentation: a original image, b, c bones, d joint cavity approximation, e result

The above discussion refers to traditional 2D images, in which patches are subsets of \(\mathbb{R}^2\). However, CHAP can also be applied to image sequences, both spatial and video ones. In that case, patches are subsets of the corresponding space \(\mathbb{R}^n\) for \(n\in \mathbb{N}\). Therefore, instead of patches, a new term is used, namely spatches (a blend of spatial and patch).

3 Sample spatch approaches

In order to understand an image, it is essential to combine the knowledge contained in the image with the external knowledge and experience of an expert. This makes the methods of knowledge representation in CHAP systems important.

Knowledge about an image is contained in the set \(\mathcal{O}\), which includes the semantically defined objects visible in the image (the locations of those objects are described by the corresponding spatches from the set P). Because they carry semantic meaning, each of them can be described by such features as the location and color of a pixel, the coordinates of a circle's center and the circle's radius, the coordinates of the end points of a segment, the coordinates of a centroid, the color and shape descriptors of a region, etc. Such information constitutes full knowledge about the objects, but it may not always be useful in that form, because it differs from the human way of description, which is a statement in natural language.

Analogously, in order to find an object in an image, it is necessary to use the experience of an expert. However, to be used in a computer system, such experience has to be expressed in an appropriate form. Unfortunately, the form of expression that is convenient for recognition algorithms is not convenient for a human expert, especially one from a non-technical domain. The problem is usually solved in two ways: the expert provides either a set of examples or natural-language expressions reflecting their knowledge. Although the first approach is often applied, it requires accuracy and a considerable amount of work (the set should be representative and consistent). It is also time-consuming (gathering a sufficient number of examples may take years). The second approach, though it also needs a proper translation of the natural-language statements into a form that can be used by computers, seems to be more convenient for the experts. It may require a knowledge engineer to express it as a mathematical formula, or it can be done in a semi-automatic way, e.g. using a fuzzy controller with quasi-natural language rules.
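As a rough illustration of the second option, a quasi-natural linguistic term such as "bright" can be modelled by a fuzzy membership function; the breakpoints below are assumptions, and the rule itself is borrowed from the statements used later in Sect. 3.1.

```python
def bright(brightness: float, low: float = 120.0, high: float = 200.0) -> float:
    """Degree (0..1) to which a pixel satisfies the linguistic term 'bright'."""
    if brightness <= low:
        return 0.0
    if brightness >= high:
        return 1.0
    return (brightness - low) / (high - low)


# A rule such as "contrast-enhanced blood is represented by bright pixels"
# (cf. Sect. 3.1) can then contribute an energy term rewarding contours that
# cover pixels with a high membership value, e.g.
#   E = -sum(bright(b) for b in brightness_of_pixels_inside_contour)
```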

To sum up, the CHAP methodology can easily benefit from all kinds of knowledge described above, including quasi-natural language rules, which can be automatically transformed into the form of an energy function, for example by using machine learning techniques. This makes the method more humanized than traditional methods of image analysis. Further, sample CHAP approaches with different types of spatch representations are presented.

3.1 Pixel spatch

In this example, the CHAP approach is used to detect the shape of the left heart ventricle in the CT images depicted in Fig. 2. Its localization is crucial, since the analysis of the ventricle shape can be used in the diagnostic process of pulmonary embolism.

Fig. 2

Localization of the left ventricle interior using pixel spatch: a analysed image, b circle circumscribing both ventricles, c parabola identifying the localization of interventricular septum and the left part of the heart, d left heart ventricle interior

The proposed approach is composed of three CHAP steps, where in every step the whole set of pixel spatches is partitioned and where, after each step, a new object is added to the set \(\mathcal{O}\). The expert knowledge used during the CHAP steps can be expressed in natural language using the following statements:

  • The contour describing the interior of the ventricle should contain all the pixels representing blood inside that ventricle.

  • The contour should be as small as possible but smooth, since the interior of the ventricle can contain not only blood but also fragments of the heart muscle.

  • The blood with injected contrast is represented by bright pixels.

  • The interventricular septum is a part of the heart muscle separating both ventricles.

  • The interventricular septum is represented by darker pixels between the interiors of the left and right ventricles.

The first CHAP step uses potential active contours (PAC), a variant of active partitions with pixel spatches described by Tomczyk (2007) and Tomczyk et al. (2009), to find the localization of both ventricles, which is shown in Fig. 2b. During the second step, using that information about the ventricles, a parabola approximating the interventricular septum is sought. The result is shown in Fig. 2c. The parabola partitions the image into two parts: one containing the left and one containing the right ventricle. Finally, in the third step, PAC are used again to determine the contour circumscribing the left ventricle. In this example those steps must be executed in sequence, because each step depends on the results of the previous one. In general, however, if there are no such constraints, the steps can be computed in parallel. A sample resulting spatch is presented in Fig. 2d. Of course, those three steps utilize the knowledge mentioned above. In particular, the energy function used during the potential contour evolution in the first and third steps is composed of the following two components (a rough sketch of their combination follows the list):

  • \(E_o\): outside energy, corresponding to the expert's first statement; its value depends on the number of pixels representing blood inside the ventricles that lie outside of the contour (this component differs in the first and the third CHAP step, as in the latter it considers only pixels that lie on the right side of the interventricular septum and consequently depends on the result of the second CHAP step).

  • \(E_a\): area energy, corresponding to the expert's second statement; its value depends on the area of the region circumscribed by the contour.
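A minimal sketch of how these two components might be combined is given below; the weights, the brightness threshold used to detect contrast-enhanced blood, and the contour representation (a point-membership predicate) are assumptions rather than the formulation used in the cited works.

```python
def ventricle_energy(contour_contains, pixels, w_o=1.0, w_a=0.01, blood_threshold=180):
    """E = w_o * E_o + w_a * E_a for a candidate contour (a sketch, lower is better).

    contour_contains(x, y) -> bool : membership test for the candidate contour
    pixels                         : iterable of (x, y, brightness) tuples
    """
    pixels = list(pixels)
    # E_o: blood pixels (bright, contrast-enhanced) left outside the contour (first statement)
    e_outside = sum(1 for (x, y, b) in pixels
                    if b >= blood_threshold and not contour_contains(x, y))
    # E_a: area of the region circumscribed by the contour (second statement)
    e_area = sum(1 for (x, y, b) in pixels if contour_contains(x, y))
    return w_o * e_outside + w_a * e_area
```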

The detailed description of the methodology used during the automatic interpretation of those images, together with a study of its effectiveness, can be found in the works by Tomczyk et al. (2009) and Tomczyk and Szczepaniak (2009, 2011).

3.2 Line spatch

Here, the application of the active partitions method to the detection of spicular lesions in mammograms is presented. Spicular lesions are pathological changes in the breast with irregular centers and numerous fibrous spicules. Due to the spicules, a lesion is shaped like a star in radiological images, as shown in Fig. 3. Detection of those changes is crucial, since they are often indicative of breast cancer.

Fig. 3

Sample results for line spatches: a analysed image, b all the considered segments representing edges between regions of different brightness and bright lines, c the localized spicular lesion

In the proposed method, we look for the segments that create star-shaped forms. For this reason, all visible lines in the mammogram should be detected. Attempts to automate this process have not brought satisfactory results due to the character of mammographic images: the detected segments are numerous and very small. Therefore, the first step of CHAP has been performed manually and, as a result, the set \(\mathcal{O}\) contains pixels described by their coordinates and lines described by the coordinates of their end points. Of course, all those objects are reflected in spatches, which, for a line, is shown in Fig. 3b.

In active partitions, it is necessary to use an energy function which applies the knowledge about the object that we look for. In the example presented, the function evaluates the classifier k by taking into account two kinds of knowledge in a form that is readable for a human expert:

  • \(E_s\): shape energy, reflecting the following statement: the segments composing a spicular lesion should create a star-shaped form, and they should intersect at a right angle a circle whose center is the centroid of those segments and whose radius equals the average distance between the circle's center and the segments' midpoints.

  • \(E_t\): center energy, reflecting the following statement: the center of a spicular lesion in the image is an area of high brightness. Its value is a reward equal to the square of the average brightness of the pixels inside the circle found in the above energy component.

The second component is necessary because, without it, another set of segments that merely resembles a star could be found. A more detailed description of this approach can be found in the work by Tomczyk et al. (2010).
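A rough sketch of both components is given below, assuming the energy is minimized over candidate sets of segments; the exact penalty and reward formulas are illustrative and not the authors' formulation.

```python
import math


def star_energy(segments, brightness_at, w_s=1.0, w_t=1.0):
    """segments: list of ((x1, y1), (x2, y2)); brightness_at(x, y): image intensity lookup."""
    mids = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for (x1, y1), (x2, y2) in segments]
    cx = sum(mx for mx, my in mids) / len(mids)        # circle centre = centroid of the midpoints
    cy = sum(my for mx, my in mids) / len(mids)
    r = sum(math.hypot(mx - cx, my - cy) for mx, my in mids) / len(mids)

    # E_s: segments of a star should cross the circle at a right angle, i.e. point
    # along the radial direction from the centre towards their midpoint.
    e_shape = 0.0
    for ((x1, y1), (x2, y2)), (mx, my) in zip(segments, mids):
        seg_dir = math.atan2(y2 - y1, x2 - x1)
        radial = math.atan2(my - cy, mx - cx)
        diff = abs((seg_dir - radial + math.pi / 2) % math.pi - math.pi / 2)
        e_shape += diff                                # 0 when the segment is perfectly radial

    # E_t: reward a bright lesion centre (square of the average brightness inside the circle)
    samples = [brightness_at(cx + dx, cy + dy)
               for dx in range(-int(r), int(r) + 1)
               for dy in range(-int(r), int(r) + 1)
               if dx * dx + dy * dy <= r * r]
    e_center = -(sum(samples) / max(len(samples), 1)) ** 2

    return w_s * e_shape + w_t * e_center
```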

3.3 Circle spatch

The present section focuses on the problem of automatic localization of the ventricular system in CT images of the brain. Exact recognition is important from the diagnostic point of view, since changes in the system, particularly deformities of shape, asymmetry, contraction or expansion, are indicative of pathological changes in the central nervous system.

The CHAP approach requires the selection of the components of the set \(\mathcal{O}\), which is indirectly determined by the selection of the type of classifier k. Since the retrieval of meaningful objects causes difficulties typical of such an analysis, we look for an easier solution, one that reduces the granularity problem (departs from pixel analysis) and helps to avoid the problem of complex object retrieval.

The approach described in the present section is based on an image partition performed by connecting points of similar color into circular regions, which constitute the set \(\mathcal{O}\). As a result, the number of objects is reduced drastically, which enables the application of more complex analysis methods. Moreover, the very definition of a circle carries information about the size and location of a region of approximately uniform color, while the set of circles contains information about the neighboring regions. This knowledge, which is absent if the image has not been pre-processed, is used and structured by the methods presented below.

Similarly to pixels, in the context of image analysis it is the neighborhood of the circles that determines their meaning and importance. The description is recorded as a whole in the circle graph, also referred to as the graph of linguistic description. The choice of graph languages as a tool of linguistic description is not accidental. Being formal and semantically precise, it is also extensible and expressive enough to carry information about both the object's structure and the external knowledge.

A fundamental characteristic of a linguistic description is its information content. In the case of a circle graph, the circles defined for each node encode information about subareas that are cohesive color-wise. The edges carry information about the neighborhood of such subareas and about the possibility of their composition into larger areas, which may constitute potential candidates in the recognition process. Above all, however, the edges carry information about the shape of an area, supplementing the surface information contained in the set of vertices and the associated circles.

The energy function of the presented process was the sum of three components, based on the suggestions of a human expert:

  • \(E_m\): shape energy, reflecting knowledge about the anatomical shape of the structure sought; it is given by an expert in the form of a prototype graph and can be computed using any similarity measure between a candidate subset of the linguistic description and the prototype.

  • \(E_b\): blackness energy, reflecting the following statement: the profile of the study dictates that the region sought is expected to be black, as it is filled with cerebrospinal fluid. Consequently, it can be computed on the basis of the color of the circles of the candidate graph.

  • \(E_c\): centrality energy, based on the average distance between the centers of the circles of the candidate graph and the center of the image; it reflects the following information: head positioning and orientation during the study place the ventricular chambers close to the center of the images in which they are visible.

With a function constructed in this manner, one should remember that both blackness and centrality have a supplementary meaning, promoting well-located solutions, whereas the linguistic matching cost is meant to promote areas of the expected shape, encoded in the prototype. Importantly, the prototype need not exist in a literal sense (e.g. in the form of a circle graph); in this context, the term prototype should be understood as the knowledge of the system's designer.
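Assuming the energy is minimized, the three components could be combined as sketched below; the graph-similarity callback, the colour scale and the weights are assumptions, not the authors' formulation.

```python
import math


def circle_graph_energy(candidate, prototype_similarity, image_center,
                        w_m=1.0, w_b=1.0, w_c=1.0):
    """candidate: list of circles, each a dict with keys "x", "y", "r", "brightness"."""
    cx, cy = image_center

    # E_m: dissimilarity to the prototype shape; prototype_similarity is a hypothetical
    # callback standing for any candidate-to-prototype similarity measure.
    e_shape = -prototype_similarity(candidate)

    # E_b: the sought region is filled with cerebrospinal fluid, so dark circles are preferred
    e_black = sum(c["brightness"] for c in candidate) / len(candidate)

    # E_c: ventricular chambers lie close to the image centre, so central circles are preferred
    e_central = sum(math.hypot(c["x"] - cx, c["y"] - cy) for c in candidate) / len(candidate)

    return w_m * e_shape + w_b * e_black + w_c * e_central
```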

Various approaches to computing the candidate-prototype similarity can be considered. Two groups of methods can be distinguished on the basis of how much of the knowledge encoded in the linguistic description is actually used:

  • Non-linguistic prototype matching: in this approach, the selected candidate graph is recoded into a set-of-pixels representation, and the same is done with the prototype after it has been placed onto the image. The surface similarity between the candidate and the prototype can then be computed using various measures, e.g. measures for evaluating supervised and unsupervised classification, in particular those presented in Strehl (2002): purity, entropy, precision, recall and their combinations (e.g. the F1 measure), mutual information, etc. Proper placement and scaling of the prototype are crucial for maximizing the resulting similarity measure (a minimal sketch of such a surface measure follows this list).

  • Linguistic prototype matching: this approach utilizes all the information explicitly encoded in the image description, namely the vertices, the edges and their parametric description. In the work presented, an important observation is used: similar shapes are likely to have similar circle structures and similar spatial relations between the circles and their groups. A measure based on graph homeomorphism has been defined and used to evaluate candidate-to-prototype graph similarity.
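A minimal sketch of the surface (non-linguistic) comparison is given below; it assumes that the candidate and the placed prototype have already been recoded to sets of pixel coordinates and uses the F1 measure mentioned above.

```python
def surface_f1(candidate_pixels: set, prototype_pixels: set) -> float:
    """F1 measure between two sets of (x, y) pixel coordinates."""
    overlap = len(candidate_pixels & prototype_pixels)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_pixels)
    recall = overlap / len(prototype_pixels)
    return 2 * precision * recall / (precision + recall)
```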

The results shown in Fig. 4, produced by an algorithm that uses linguistic matching, demonstrate a significant resemblance of shape between the ventricular system and the prototype. Again, emphasis is put on the shape localization problem, which is solved by adding two components to the energy function: blackness and centrality. A detailed description of this methodology can be found in the works by Tomczyk et al. (2010) and Jojczyk et al. (2010).

Fig. 4

Sample results for circle spatches: a analyzed image, b all the circles representing regions of uniform color, c the localized part of ventricular system

4 System architecture and its functions

The rapid advance of imaging techniques and, consequently, the growing number of images that should be examined require perfectly trained specialists who are able to analyze those images. Although this development brings such benefits as better medical diagnostic procedures, it also causes new problems that must be overcome.

First of all, the growing complexity of the images sometimes requires highly specialized experts to obtain a proper interpretation of the image content. In consequence, in many cases the process of image analysis can be difficult, especially if the experts are not present where the image was taken. Such a situation can occur, for example, during a diagnostic process in small provincial medical centers. The other problem is the number of images that must be examined by a single expert. If that expert is the only specialist in a given region, or if many images are taken in medical screening programs, there is no possibility to interpret all those images correctly, either because there is not enough time or because of human fatigue. Those problems can be alleviated if computers are used to support the interpretation process. However, such systems are still very imperfect. Not only are they imprecise and still require an expert to verify their results, but they also need high computational power to achieve satisfactory results. In the latter case, a problem similar to the shortage of highly specialized experts arises, namely the lack of proper computing units at the place where the image acquisition is made. For all these reasons, computer-aided distributed and collaborative analysis of images seems to be a reasonable solution to the described problems.

The CHAP method is well suited for distributed and collaborative systems, because each CHAP step can be performed in a different location where human experts or computers identify successive image elements. As an example of such a system, the architecture presented in Fig. 5 can be considered. Note that this architecture is flexible and can be adapted to the available human and hardware resources. Of course, for human experts a proper user interface must be provided to allow easy acquisition of expert knowledge. The image analysis starts in node A, where either a computer or an expert takes the image that should be interpreted. The initial knowledge about this image is the information about its pixels. Node B (a computer) performs the first step of the analysis, enlarging the set of recognized image elements. This step requires high computational power, so it must be executed in a separate node located in a proper computational center. The next step is performed by node C (a computer), but in this case the results must be verified by the expert in node D before the identified objects can be added to the knowledge base. Further, node E (an expert) and node F (a computer) perform additional analysis based on the currently found objects. The expert in node E cannot be replaced by a machine, because there are no automatic algorithms able to complete this analysis. Finally, node A uses the gathered knowledge to give the final interpretation of the image content. It is worth mentioning that this example reveals an additional advantage of the presented approach. In certain cases, some elements of the image interpretation can be done simultaneously (for example in nodes E and F), which can decrease the time of the whole analysis process. This is possible if different objects can be found based on the same set of previously identified image elements.

Fig. 5

A sample architecture of a distributed system of image interpretation based on the discussed CHAP approach. A initial node; B automated node extracting initial knowledge; C automated object extractor; D human verifier of objects recognised by C; E and F additional nodes, human and automated, respectively
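The kind of record that a node might append to the knowledge base and pass to the next node could, for instance, look as follows; the field names and the JSON transport are purely illustrative assumptions and are not part of the described system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RecognisedObject:
    label: str        # semantic meaning, e.g. "left_ventricle"
    spatch: list      # geometry of the corresponding (s)patch, e.g. a list of pixels or circle parameters
    produced_by: str  # identifier of the node that found the object (A-F in Fig. 5)
    verified: bool    # True once a human expert (e.g. node D) has accepted the object


def to_message(obj: RecognisedObject) -> str:
    """Serialise newly gained knowledge so it can be sent to the next CHAP node."""
    return json.dumps(asdict(obj))


# Example: node C extracts an object that node D will later verify.
msg = to_message(RecognisedObject("left_ventricle", [[120, 88], [121, 88]], "C", False))
```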

Of course, the presented architecture is only a sample distributed application of the CHAP approach. In practice, any other arbitrarily chosen, simpler or more complex architecture can be used. For example, in Fig. 6a the first and the third step of the approach described in Sect. 3.1 can be performed on the same machine (node A), whereas the second one can be delegated to a remote computer (node B). Similarly, in Fig. 6b the first step of the methods presented in Sects. 3.2 and 3.3 can be assigned to a remote machine (node B), the obtained results can be verified by an expert in the original node (node A), and the final analysis can be performed in yet another location with sufficient computational power (node C).

Fig. 6

Sample architectures of the distributed CHAP approach for the described examples of pixel, line and circle spatches, with information on where the successive CHAP steps are performed and what kind of knowledge is sent between the nodes (numbers and letters refer to the images presented in Sect. 3)

5 Conclusions

The CHAP method applied to image content interpretation has been presented. The approach draws upon the active contours method and employs not only the knowledge contained in the image but also any other knowledge that might be useful, or sometimes indispensable, for the proper localization of objects in an image. Since the full utilization of this knowledge can be complicated or even impossible, the task of image understanding has to be divided into steps, with each step contributing new semantic knowledge. This helps to avoid low-level pixel analysis and to focus on high-level spatch set analysis, which in many cases makes it easier to use expert knowledge. The usage of expert knowledge, which is a key element of the presented approach, should be organized in such a way as to let an expert express the knowledge in the most natural way. That is why this work presents three approaches based on knowledge described using natural language and examples. This makes the proposed methodology more humanized. The examples show that the approach is promising and can be applied in many domains at different analysis levels.

The presented method brings two possible enhancements to image analysis systems and processes. The first of them is connected with deployment. The logical separation and transparency of the analysis steps allow remote node deployment, bringing flexibility in both computational and human resource management. What is more, the method facilitates allocating human experts to subtasks of various granularity and levels of complication, which enables effective use of their knowledge and of abilities that are currently unattainable for automated agents. As a result, if properly implemented, the presented method may boost both the quality and the speed of the analysis process even with limited resources.

It is worth emphasizing that distance and distributed medicine can also benefit from the construction and features of the described methodology. As shown, each CHAP step can easily be performed in a different location. This is a great advantage in a world where more and more images must be analyzed.