VoxLogicA: a Spatial Model Checker for Declarative Image Analysis (Extended Version)

Spatial and spatio-temporal model checking techniques have a wide range of application domains, among which large scale distributed systems and signal and image analysis. We explore a new domain, namely (semi-)automatic contouring in Medical Imaging, introducing the tool VoxLogicA which merges the state-of-the-art library of computational imaging algorithms ITK with the unique combination of declarative specification and optimised execution provided by spatial logic model checking. The result is a rapid, logic based analysis development methodology. The analysis of an existing benchmark of medical images for segmentation of brain tumours shows that simple VoxLogicA analysis can reach state-of-the-art accuracy, competing with best-in-class algorithms, with the advantage of explainability and replicability. Furthermore, due to a two-orders-of-magnitude speedup compared to the existing general-purpose spatio-temporal model checker topochecker, VoxLogicA enables interactive development of analysis of 3D medical images, which can greatly facilitate the work of professionals in this domain.


Introduction and Related Work
Spatial and Spatio-temporal model checking have gained an increasing interest in recent years in various domains of application ranging from Collective Adaptive Systems [14,10,17] and networked systems [27], to signals [32] and images [26,13]. Research in this field has its origin in the topological approach to spatial logics, dating back to the work of Alfred Tarski, who first recognised the possibility of reasoning on physical, continuous, space using topology as a mathematical framework for the interpretation of modal logic (see [8] for a thorough introduction). More recently these early theoretical foundations have been extended to encompass reasoning about discrete spatial structures, such as graphs and images, extending the theoretical framework of topology to (quasi discrete) closure spaces (see for instance [23,24,1]). That framework has subsequently been taken further in recent work by Ciancia et al. [12,13,16] resulting in the definition of the Spatial Logic for Closure Spaces (SLCS), temporal extensions (see [11,35,32]), and related model checking algorithms and tools.
The main idea of spatial (and spatio-temporal) model checking is to use specifications written in a suitable logical language to describe spatial properties and to automatically identify patterns and structures of interest in a variety of domains (see e.g., [5,17,15]). In this paper we focus on one such domain, namely medical imaging for radiotherapy, and brain tumour segmentation in particular, which is a important and currently very active research domain of its own. One of the technical challenges of the development of automated (brain) tumour segmentation is that lesion areas are only defined through differences in the intensity (luminosity) in the (black & white) images that are relative to the intensity of the surrounding normal tissue. A further complication is that even (laborious and time consuming) manual segmentation by experts shows significant variations when intensity gradients between adjacent tissue structures are smooth or partially obscured [31]. Moreover, there is a considerable variation across images from different patients and images obtained with different Magnetic Resonance Images (MRI) scanners. Several automatic and semi-automatic methods have been proposed in this very active research area (see for example [29,36,19,34,20,22]).
In this paper, continuing the research line of [7,3,6], we present the free and open source tool VoxLogicA (acronym for Voxel-based Logical Analyser ) 3 , catering for a novel approach to image segmentation, namely a rapid-development, declarative, logic-based method, supported by spatial model checking, tailored to identify 'regions of interest' in medical images. This approach is particularly suitable to reason at the "macro-level", by exploiting the relative spatial relations between tissues or organs at risk. VoxLogicA is similar, in the accepted logical language, and functionality, to the spatio-temporal model checker topochecker 4 , but specifically designed for the analysis of (possibly multi-dimensional, e.g. 3D) digital images as a specialised image analysis tool. It is tailored to usability and efficiency by employing state-of-the-art algorithms and open source libraries, borrowed from computational image processing, in combination with efficient spatial model checking algorithms.
We show the application of VoxLogicA on BraTS 2017 [31,2], a publicly available set of benchmark MRI images for brain tumour segmentation, with an associated yearly challenge. For each image in the benchmark a manual segmentation of the tumour by domain experts is available, enabling rigorous and objective qualitative comparisons via established similarity indexes. We show that our simple, high-level logical specifications for glioblastoma segmentation are directly competitive with the state-of-the-art techniques submitted to the BraTS 2017 challenge, some of which based on machine learning. Our approach to segmentation has the unique advantage of explainability and replicability.
Logically specified procedures can be understood by humans and improved to encompass new observations by domain experts. The fact that our segmentation results are of very high quality indicates, in our opinion, that in the medical domain, expert knowledge -that is central in our approach -plays a fundamental role. However, machine learning can still be used, for instance, to fine-tune numeric parameters for the analysis, and will constitute exciting future work.
The outline of the paper is as follows. Section 2 briefly recalls the spatial logic framework on which VoxLogicA is based. Section 3 describes the architecture of the VoxLogicA and an example of its use, whereas in Section 4 the results of the application of VoxLogicA on a publicly available set of benchmark MRI images are presented. Finally, in Sect. 5 we conclude and provide an outline for future research.

The Spatial Logic Framework
In this section, we briefly recall the logical language ImgQL (Image Query Language) proposed in [3], which is based on the Spatial Logic for Closure Spaces SLCS [12,13] and which forms the kernel of the framework we propose in the present paper. We then show two examples of operators specifically designed for digital image analysis, which we express as additional logic operators. In Section 4 we will show how the resulting logic can be used for actual analysis via spatial model checking.

Foundations: Spatial Logics for Closure Spaces
The logic for closure spaces we use in the present paper is closely related to SLCS [12,13] and, in particular, to the SLCS extension with distance-based operators presented in [3]. As in [3], the resulting logic constitutes the kernel of a solid logical framework for reasoning about texture features of digital images, when interpreted as closure spaces.
In the context of our work, a digital image is not only a 2-dimensional grid of pixels, but, more generally, a multi-dimensional (very often, 3-dimensional) grid of hyper-rectangular elements that are called voxels ("volumetric picture elements"). When voxels are not hypercubes, images are said to be anisotropic; this is usually the case in medical imaging. Furthermore, a digital image may contain information about its "real world" spatial dimensions, position (origin) and rotation, permitting one to compute the real-world coordinates of the centre and edges of each voxel. In medical imaging, such information is typically encapsulated into data by machines such as MRI scanners. In the remainder of the paper, we make no dimensionality assumptions, and we therefore refer to picture elements as voxels. We recall the main definitions below referring the reader to the afore mentioned papers for details, examples and discussion.

Definition 1.
A closure space is a pair (X, C) where X is a non-empty set (of points) and C : 2 X → 2 X is a function satisfying the following axioms: y R x} satisfies the axioms of Definition 1 thus making (X, C R ) a closure space. Whenever a closure space is generated by a relation as above, it is called a quasi-discrete closure space. A quasi-discrete closure space (X, C R ), can be used as the basis for a mathematical model of a digital image. X represents the finite set of voxels and R is the reflexive and symmetric adjacency relation between voxels [25]. We note in passing that several different adjacency relations can be used. For instance, in the orthogonal adjacency relation only voxels which share an edge count as adjacent, so that, in the 2D case, each pixel is adjacent to (itself and) four other pixels; on the other hand, in the orthodiagonal adjacency relation voxels are adjacent as long as they share at least either an edge or a corner, so that, again in the 2D case, each pixel is adjacent to (itself and) eight other pixels. A closure space (X, C) can be enriched with a notion of distance, i.e. a function d : X × X → R ≥0 ∪ {∞} such that d(x, y) = 0 iff x = y, leading to the distance closure space ((X, C), d). 5 It is sometimes convenient to equip the points of a closure space with attributes; for instance, in the case of images, such attributes could be the color or intensity of voxels. We assume sets A and V of attribute names and values, and an attribute valuation function A such that A(x, a) ∈ V is the value of attribute a of point x. Attributes can be used in assertions α, i.e. boolean expressions, with standard syntax and semantics. Consequently, we abstract from related details here and assume function A extended in the obvious way; for instance, is the closure space of natural numbers with the successor relation: (n, m) ∈ Succ ⇔ m = n + 1. Informally: the ordering in the path imposed by N is compatible with relation R, i.e. π(i) R π(i + 1). For given set P of atomic predicates p, and interval of R I, the syntax of the logic we use in this paper is given below: Informally, it is assumed that space is modelled by the set of points of a distance closure model; each atomic predicate p ∈ P models a specific feature of points and is thus associated with the points that have this feature 6 . A point x satisfies N Φ if a point satisfying Φ can be reached from x in at most one (closure) step, i.e. if x is near (or close) to a point satisfying Φ; x satisfies ρ Φ 1 [Φ 2 ] if x may reach a point satisfying Φ 1 via a path passing only by points satisfying Φ 2 ; it satisfies D I Φ if its distance from the set of points satisfying Φ falls in interval 5 We recall that for In addition, as the definition of d might require the elements of R to be weighted, quasi-discrete distance closure spaces may be enriched with a R-weighting function W : R → R assigning the weight W(x, y) to each (x, y) ∈ R. In the sequel we will keep W implicit, whenever possible and for the sake of simplicity. 6 In particular, a predicate p can be a defined one, by means of a definition as p := α, meaning that the feature of interest is characterized by the (boolean) value of α.
I. Finally, the logic includes logical negation (¬) and conjunction (∧). In the following we formalise the semantics of the logic, noting that all definitions are independent from the nature of the paths (e.g. being quasi-discrete or not) or of the closure space.
where ((X, C), d) is a distance closure space, A : X × A → V an attribute valuation, and V : P → 2 X is a valuation assigning to each atomic predicate the set of points where it holds. • is defined by induction on the structure of formulas: ⇔ there exist path π and index such that: π(0) = x and M, π( ) |= Φ 1 and for all indexes j : where, whenever p := α is a definition for p, we assume x ∈ V(p) if and only if A(x, α) yields the truth-value true. • In the logic proposed in [12,13], the "may reach" operator is not present, and the surrounded operator S has been defined as basic operator as follows: x satisfies Φ 1 S Φ 2 if and only if x belongs to an area of points satisfying Φ 1 and one cannot "escape" from such an area without hitting a point satisfying Φ 2 . Several types of reachability predicates can be derived from S. However, reachability is in turn a widespread primitive, implemented in various forms (e.g., flooding, connected components) in programming libraries. Thus, in this work we prefer to use reachability as a basic predicate of the logic 7 . In the sequel we show that S can be derived from the operators defined above, employing a definition patterned after the model-checking algorithm of [12]. This change simplifies the definition of several derived connectives, including that of touch (see below), and resulted in notably faster execution times for analyses using such derived connectives. We first recall the formal definition of S: for all paths πand indexes the following holds: Proposition 1. For all closure models M = ((X, C), A, V) and all formulas Φ 1 , Φ 2 the following holds: Definition 4. We define a number of derived operators that are of particular use in medical image analysis: is satisfied by points that satisfy Φ 1 and that are on a Φ 1 -path that can reach a point satisfying Φ 2 . The formula grow (Φ 1 , Φ 2 ) is satisfied by points that satisfy Φ 1 and by points that satisfy Φ 2 which are on a Φ 2 -path that can reach a point satisfying Φ 1 . The formula flt(r, Φ 1 ) is satisfied by points that are at a distance of less than r from a point that is at least at distance r from points that do not satisfy Φ 1 . This operator in practice works as a filter where only contiguous areas of points satisfying Φ 1 that have a minimal diameter of at least 2 * r are preserved (or taken into consideration), while also smoothening those areas in case they have an irregular shape (e.g. having protrusions of less than the indicated distance).
We conclude this section with an illustration of the derived spatial operators touch, grow and the surrounded operator in combination with the distance operator. In the top row of Fig. 1 some sample input images (bitmaps) are shown of 100 by 100 points that are red, blue or black. In the second row of Fig. 1 the results of some spatial properties are shown, where points in the original image that satisfy the spatial formula φ are shown as white points. Fig. 1e shows the points that satisfy Φ = blue S red, i.e. blue points that are surrounded by red points in Fig. 1a. Fig. 1f shows the points that satisfy Φ = touch(red, blue), i.e. red points that are on a path, consisting of only red points, that reaches a blue point in Fig. 1b. There are no points that satisfy f lt(5.0, red) as the red area has a minimal diameter which is less than 5.0 (result not shown). Fig. 1g shows the points that satisfy Φ = grow(red, blue) of those in Fig. 1c. Fig. 1h shows the results of a formula combining the surround operator with a distance operator where the white points are those corresponding to red points in Fig. 1d that are surrounded by points that are less than 11 points (in distance) away from points that are blue. In Fig. 1d the area with red points is 30 by 30 points, the black area separating the red and the blue areas is 10 points wide.

Digital Image Analysis Specific Operators
In the sequel, we provide some details on a logical operator, first defined in [3], that we use in the context of Texture Analysis (see for example [28,30,9,18]) for defining a notion of statistical similarity between image regions. The statistical distribution of an area Y of a black and white image is approximated by the histogram of the grey levels of points (voxels) belonging to Y , limiting , with k bins and m, M min and max values respectively. The mean h of a histogram h with k bins is the quantity 1 k k i=1 h(i). The cross correlation between two histograms h 1 , h 2 with the same number k of bins is defined as follows: The value of r is normalised so that −1 ≤ r ≤ 1; r(h 1 , h 2 ) = 1 indicates that h 1 and h 2 are perfectly correlated (that is, We embed statistical similarity in the logic by adding it to the grammar defined by (1) and extending the definition of the satisfaction relation (Def. 3) with the following equation, with m, M, k as above: 1], ∈ {<, ≤, =, ≥, >} and S(x, r) = {y ∈ X|d(x, y) ≤ r} is the sphere of radius r centred in x. Note that, differently from topochecker that was used in [3], in VoxLogicA, for efficiency reasons, r is actually the hypercube with edge size 2 * r, which is discretely approximated by a hyperrectangle after computing the number of voxels corresponding to r in each dimension. So c m M k r a b Φ compares the region of the image constituted by the sphere (hypercube) of radius r centred in x against the region characterised by Φ. The comparison is based on the cross correlation of the histograms of the chosen attributes of (the points of) the two regions, namely a and b and both histograms share the same range ([m, M ]) and the same bins ( [1, k]). In summary, the operator allows to check to which extent the sphere (hypercube) around the point of interest is statistically similar to a given region (specified by) Φ.
Finally, we introduce the percentiles operator; percentiles(φ 1 , φ 2 ) takes a number-valued image φ 1 and a boolean-valued mask φ 2 and returns an image in which each point is associated to the percentile rank of its intensity in φ 1 with respect to the population of voxels that are true in φ 2 . For example, a point x with percentile rank 0.8 is a point with an intensity such that 80% of the points in the image, that correspond to the mask, have an intensity below that of point p. This quantitative operator is essential to be able to use the same segmentation specification on images that may vary in intensity distribution (i.e. a similar phenomenon as variations in slightly under or over exposed photographs). It makes it possible to avoid the use of absolute values in constraints on the intensity of points, as will be illustrated in Sect. 4. The percentile ranking of a point is defined as pr(p) = c l (p)+0.5 * fi(p) N * 100%, where c l (p) is the count of all intensities that are below the intensity of p, f i (p) is the frequency of the intensity of p and N is the total number of points in the considered part of the image (namely that part corresponding to the mask).

The Tool VoxLogicA
Functionality-wise, VoxLogicA specialises topochecker to the case of spatial analysis of multi-dimensional images. It interprets a specification written in the ImgQL language, using a set of multi-dimensional images 8 as models of the spatial logic, and produces as output a set of multi-dimensional images representing the valuation of user-specified expressions. For logical operators, such images are Boolean-valued, that is, regions of interest in medical imaging terminology, which may be loaded as overlays in medical image viewers. Non-logical operators may generate number-valued images. VoxLogicA augments ImgQL with file loading and saving primitives, and a set of additional commodity operators, specifically aimed at image analysis, that is destined to grow along with future developments of the tool. The main execution modality of VoxLogicA is batch execution. A (currently experimental) graphical user interface is under development. A planned future development is interactive execution, in particular for semi-automated analysis, by letting a domain expert calibrate numeric parameters in real-time, while seeing the intermediate and final results.
Implementation-wise, the tool achieves a two-orders-of-magnitude speedup with respect to topochecker. Such speedup has permitted the rapid development of a novel procedure for automatic segmentation of glioblastoma that, besides being competitive with respect to the state-of-the-art in the field (see Section 4), is also easily replicable and explainable to humans, and therefore amenable of improvement by the community of medical imaging practitioners.

Functionality
We provide an overview of the tool functionality, starting from the syntax of analysis session files. For space reasons, we largely omit details on parsing rules (e.g. the syntax of identifiers and infix operators, which is delegated to the tool documentation). In the following, f, x1,..., xN, x are identifiers, "s" is a string, and e1, ..., eN, e are expressions (to be detailed later). A VoxLogicA specification consists of a text file containing a sequence of (side-effecting) commands (see Specification 1 in Sect. 4 as an example). Five commands are currently implemented: let f(x1,...,xN) = e is used for function declaration, also in the form let f = e(constant declaration), and with special syntactic provisions to define infix operators; declarations have no imperative side effects, and only affect name bindings, so that after execution of the command, name f is bound to a function or constant that evaluates to e -with the appropriate substitutions of expressions (to be detailed later) for parameters; load x = "s", loads an image from file "s" and binds it to x for subsequent usage; save "s" e, stores the image resulting from evaluation of expression e to file "s"; print "s" e prints to the log (in batch mode, the console, with prints decorated by elapsed time and other debug information)the string s followed by the numeric, or boolean, result of computing e; import "s" imports a library of declarations from file "s"; subsequent import declarations for the same file are not processed; furthermore, such imported files can only contain let or import commands.
VoxLogicA comes equipped with a set of built-in functions, among which arithmetic operators, logic primitives as described in Section 2, and imaging specific operators, stemming from basic tasks such as computing the gray-scale intensity of a colour image, or getting the separate colour components, to advanced operations such as computing cross correlation of histograms (see Section 2.2). An exhaustive list of the available built-ins is provided in the user manual. Furthermore, a "standard library" is provided containing short-hands for commonly used functions, and for derived operators. An expression may be a numeric literal (no distinction is made between floating point and integer constants), a named constant (e.g. x), a function application (e. g. f(x1,x2)), an infix operator application (e.g. x1 + x2), or a parenthesized (sub-)expression (e.g. (x1 + x2)).
The language features strong dynamic typing, that is, types of expressions are unambiguously checked and errors are precisely reported, but such checks are only performed at "run time", that is, when evaluating closed-form expressions with no free variables. No function or operator overloading is possible in the current version, although this is a planned improvement to the type system, as well as some form of static typing. However, it is not the case that a type error may waste a long-running analysis. Type checking occurs after loading and parsing, but before analysis is run.
Actual program execution after parsing is divided into two phases. First (usually, in a negligible amount of time), all the "save" and "print" instructions are examined to determine what expressions actually need to be computed; in this phase, name binding is resolved, all constant and function applications are substituted with closed expressions, types are checked and the environment associating names to expressions is discarded. Finally, the set of closed expressions to be evaluated is transformed into a set of tasks to be executed, possibly in parallel, and dependencies among them. After this phase, no further syntax processing or name resolution are needed, and it is guaranteed that the program is free from type errors. The second phase simply runs each task -in dependency order -parallelising execution on multiple CPU cores when possible.
Each built-in logical operator has an associated type of its input parameters and output result. The available types are inductively defined as Number, Bool, String, Model, and Valuation(t), where t is in turn a type. The type Model is the type assigned to x in load x = "f"; operations such as the extraction of RGB components take this type as input, and return as output the only parametric type: Valuation(t), which is the type of a multi-dimensional image in which each voxel contains a value of type t. For instance, the red component of a loaded model has type Valuation(Number), whereas the result of evaluating a logic formula has type Valuation(Bool) 9 .
The design of VoxLogicA, and in particular its simple type system, and the low number of basic constructs, has been tailored to usability by an audience of users with diverse backgrounds. For example, the tool abstracts from technical aspects such as the number of bits and representation (signed/unsigned integer or floating point) of numeric values, in favour of an extremely declarative and automated approach. In this respect, VoxLogicA expressions should be thought of as the equivalent, in the application domain, of SQL queries over a relational database. In this line, a very important aspect of the execution semantics of VoxLogicA specifications is the fact that expressions have no side effects and are amenable to memoization, that is, intermediate results are cached and reused when computing equal sub-expressions. In VoxLogicA, memoization is not just an optimization technique, but rather the core of the execution engine (see Section 3.3), used to achieve maximal sharing of subformulas along execution. In other words, in VoxLogicA, no expression is ever computed twice 10 . This frees the user from having to worry about how many times a given function is called, and makes execution of complex macros and logical operators feasible. To see this, consider that a function that uses one of its parameters twice, when called with an expression e as an argument, should in principle (e.g., when side-effects are allowed) evaluate e twice; this would lead to non-linear growth of the computation time with respect to the size of the input, which is avoided by memoization. Without using such technique, in our experiments, the number of subformulas to compute could easily reach numbers in the order of one million. With maximal sharing, this reduces to the order of one hundred.

Efficiency and Comparison with topochecker
The evaluation of VoxLogicA in Sect. 4 involves sets of 3D images of size 240 × 240 × 155 (about 9 million voxels), and uses features of VoxLogicA, that are not present in topochecker. On the other hand, the example specification in [3], and its variant aimed at 3D images, can be readily used to compare the performance of VoxLogicA and topochecker. The specifications consist of two humanauthored text files of about 30 lines each; the largest logic formula in the experiments is the one identifying the oedema, generating a syntax tree (when all macros are expanded) of depth about 80, and about one milion subformulas. The machine used for testing is a desktop computer equipped with a 7th generation Intel Core I7 processor and 16GB of RAM. In the 2D case (image size: 512 × 512), topochecker takes 52 seconds to complete the analysis, whereas VoxLogicA takes 750 milliseconds. In the 3D case (image size: 512 × 512 × 24), topochecker takes about 30 minutes, whereas VoxLogicA takes 15 seconds. As we mentioned already, such a huge improvement is due to the combination of a specialised imaging library, new algorithms (e.g., for statistical similarity of regions), parallel execution and other optimisations. One could get into more details by designing a specialised set of benchmarks, some of which can also be run using topochecker; however, for the purposes of the current paper, the performance difference is so large that we do not deem such detailed comparison necessary.

Implementation details
VoxLogicA is implemented in the functional, object-oriented programming language FSharp, using the .NET Core implementation of the .NET specification 11 . This permits a single code base with minimal environment-dependent setup to be cross-compiled and deployed as a standalone executable, for the major desktop operating systems, namely Linux, macOS, and Windows. Despite .NET code is compiled for an intermediate machine, this does not mean that efficiency of VoxLogicA is somehow "non-native". There are quite a number of measures in place to maximise efficiency. First and foremost, the execution time is heavily dominated by the time spent in native libraries (more details below), and VoxLogicA acts as a higher-level, declarative front-end for such libraries, adding a logical language, memoization, parallel execution, and abstraction from a plethora of technical details that a state-of-the-art imaging library necessarily exposes. In our experiments, parsing, memoization, and preparation of the tasks to be run may take a fraction of a second; the rest of the execution time (usually, several seconds, unless the analysis is extremely simple) is spent in foreign function calls.
The major performance boosters in VoxLogicA are: a state-of-the-art computational imaging library (ITK); the optimised implementation of the may reach operator; a new algorithm for statistical cross-correlation; an efficient memoizing execution engine; parallel evaluation of independent tasks, exploiting modern multi-core CPUs. Besides, special care has been put in making all performancecritical loops allocationless. All used memory along the loops is pre-allocated, avoiding the risk to trigger garbage collection during computation. We will address each of them briefly in the following.
ITK library. VoxLogicA is heavily based on the state-of-the-art imaging library ITK, exploiting the SimpleITK glue library 12 . Most of the logical and non-logical operators of VoxLogicA are implemented directly by a library call (notable exceptions include the reaches logical connective, and statistical cross-correlation).
Novel algorithms. The two most relevant operators that are not simply based on an ITK function are mayReach and crossCorrelation, implementing, respectively, the logical operator ρ, and statistical comparison described in Section 2.2. The computation of the voxels satisfying ρ φ 1 [φ 2 ] can be implemented either using flooding or by exploiting the connected components of φ 2 as a flooding primitive; both solutions are available as primitives in SimpleITK, and in our experiments, connected components perform better using this library than plain flooding, for large input seeds. More precisely, it is easy to see that the voxels satisfying ρ φ 1 [φ 2 ] are those belonging either to N φ 1 , or to N ψ, where ψ contains each voxel x such that there is a connected component C of φ 2 , with x ∈ C and C containing at least one voxel satisfying N φ 1 . The surrounded logical connective is defined in terms of mayReach; also other frequently-used operators such as touch are derived from may reach. Therefore, an optimised algorithm for mayReach is a key performance improvement. CrossCorrelation is a resource intensive operation, as it requires one to compute the histogram of a multi-dimensional hyperrectangle at each voxel. We note that it is not straightforward to apply pre-computation methods, such as the integral histogram [33], since the cross-correlation operator in VoxLogicA can be applied to images that are created along the computation by sub-formulas, so these are not known a priori. In this work, we have designed a specialised parallel algorithm exploiting additivity of histograms. Given two sets of values P 1 , P 2 , let h 1 , h 2 be their respective histograms, and let h 1 , h 2 be the histograms of P 1 \ P 2 and P 2 \ P 1 . For i a bin, we have h 2 (i) = h 1 (i) − h 1 (i) + h 2 (i). This property leads to a particularly efficient algorithm when P 1 and P 2 are two hyperrectangles centred over adjacent voxels, as P 1 \ P 2 and P 2 \ P 1 are hyperfaces, having one dimension less than hyperrectangles. Thus, our algorithm first divides a multi-dimensional image into as many partitions as the number of processors in the machine running the program, and then computes a Hamiltonian path h p for each partition p, passing by each voxel of p exactly once (such paths are cached along program execution, too). After computing the histogram of a hypercube for each h p (namely the one centred on its first element), in parallel, each partition p is visited in the order imposed by h p , the histogram is computed incrementally as described above, and cross-correlation is also computed and stored in the resulting (quantitative) image, which can then be further manipulated, for example, by applying thresholds.
Memoizing execution semantics. As we already mentioned, no expression is computed twice in VoxLogicA. Sub-expressions are by construction identified up-to syntactic equality and assigned a number, representing a unique identifier (UID). UIDs start from 0 and are contiguous, therefore admitting an array of all existing sub-formulas to be used to pre-computed valuations of expressions without further hashing.
Dereferentiation of memory pointers. In performance-critical functions that do not have a direct counterpart in ITK, the interoperability-oriented functionality of FSharp (and the dotnet platform) is exploited to access the underlying memory of an image directly, even with array bounds checking turned off where possible, to achieve C-like efficiency in declarative, functional code.

Design and data structures
The design of VoxLogicA defines three implementation layers. The core execution engine implements the concurrent, memoizing semantics of the tool. The interpreter is responsible for translating source code into core library invocations. These two layers only include some basic arithmetic and boolean primitives. Operators can be added by inheriting from the abstract base class Model, which is part of the core layer, defining appropriate methods, and tagging them with the identifier and type to be used in the interpreter, using so-called .NET attributes. Upon instantiation, all such methods are found via the reflection capabilities of FSharp, and added to the ones available in the interpreter. The third implementation layer is the instantiation of the core layer to define operators from ImgQL, and loading and saving of graphical models, using the ITK library. We provide some more detail on the design of the core layer, which is the most critical part of VoxLogicA. At the time of writing, the core consists of just 350 lines of FSharp code, that has been carefully engineered not only for performance, but also for ease of maintenance and future extensions.
A number of classes are essential to make this possible. The central ones are named ModelChecker, FormulaFactory, Formula, and Operator, of which Constant is a subclass. Class Operator is used for evaluating an operator with a provided list of arguments (via the method Eval). Its attributes include a string name, and the type of arguments and return value. The class Formula is a symbolic representation of a syntactic sub-expression. Each instance of Formula has a unique numeric id (UID), an instance of Operator, and (inductively) a list of Formula instances, denoting its arguments. The UID of a formula is determined by the operator name (which is unique across the application), and the list of parameters. Therefore, by construction, it is not possible to build two different instances of Formula that are syntactically equal. The class FormulaFactory manages UID assignment. It features a Create method, that given an instance of Operator and a list of UIDs, returns either a fresh instance of Formula or an existing one, using a hash table for the purpose. In the current implementation, there is only one instance of FormulaFactory, but nothing prevents one to use more instances in the same program for different instances of the core layer. UIDs are contiguous and start from 0. This permits all created formulas to be inserted into an array. Furthermore, UIDs are allocated in such a way that the natural number order is a topological sort of the dependency graph between subformulas (that is, if f 1 is a parameter of f 2 , the UID of f 1 is greater than the UID of f 2 ). This is exploited in class ModelChecker; internally, the class uses an array to store the results of evaluating each Formula instance, implementing memoization. The class ModelChecker turns each formula in a task to be executed. Whenever formula with UID i is an argument of the formula with UID j, a dependency is noted between the associated tasks. The class ModelChecker then makes use of the high-level, lightweight concurrent programming library Hopac 13 and its abstractions to maximise CPU usage on multi-core machines, by queuing tasks for parallel evaluation whenever their dependencies have already been evaluated. Hopac takes care of scheduling tasks to processor cores. This guarantees high CPU efficiency 14 . The interpreter uses one instance of the class ModelChecker, which is parameterised by one instance of Model. In VoxLogicA, only one instance of Model is defined, namely the "third layer" specialised to image processing that we already mentioned.

Experimental Evaluation: Segmentation of Glioblastoma in 3D Medical Images
We have evaluated the performance of VoxLogicA in two ways. The first considers the use of VoxLogicA for the segmentation of Glioblastoma in 3D medical images obtained as MRI scans in NIfTI-1 format from patients in preparation for radiotherapy. In radiotherapy the clinical target volume (CTV) of the whole tumour is considered, which is an extension of the gross tumour volume (GTV), corresponding to what can actually be seen on an image. For glioblastomas this margin is a 2-2.5 cm isotropic expansion of the GTV volume within the brain. In addition, the performance of VoxLogicA has been evaluated on the Brain Tumor Image Segmentation Benchmark (BraTS) of 2017 [31,2] for what concerns the GTV of the whole tumour. This evaluation provides information on the quality of the proposed approach compared to the most advanced techniques that are currently being developed. The 2017 version of this benchmark contains 211 multi contrast MRI scans of low and high grade glioma patients that have been obtained from multiple institutions and were acquired with different clinical protocols and various scanners. All the imaging data sets provided by BraTS 2017 have been segmented manually and were approved by experienced neuroradiologists. In our evaluation we have used the T2 Fluid Attenuated Inversion Recovery (FLAIR) type of scans, which is one of the four provided modalities in the benchmark.

ImgQL segmentation procedure
Specification 1 shows the tumour segmentation procedure that we used for the evaluation. The syntax is that of VoxLogicA, namely: |,&,! are boolean or, and, not; distlt(r,a) is the set of points having distance less than r from the points that are true in a (similarly, distgt; distances are in millimiters); crossCorrelation(r,a,b,phi,m,M,k) is an intermediate function yielding a cross-correlation coefficient for each voxel, to which a predicate c may be applied to obtain the statistical similarity function of Section 2.2; the > operator performs thresholding of an image; border is true on voxels that lay at the border of the image; other operators should be self-explaining. The various intermediate phases of the segmentation procedure, for axial view of one specific 2D slice of an example 3D MRI scan of the BraTS 2017 data set, are shown in Fig 2. We briefly discuss the main parts of this specification. Line 1 imports the definitions of some basic derived operators. Lines 2-4 define a few specific operators for the segmentation (see Def. 4). Lines 4-5 load the 3D MRI scan in .nii format as well as the manually segmented one used for calculating the indexes. Line 9 defines the background i.e. all voxels having intensity less than 0.1 which are part of an area that touches the border of the image. Points that are part of the brain are all those points that do not satisfy background. The application of percentiles in line 11 assigns to each point of the brain the percentile rank to which it belongs considering the range of intensities of points that are part of the brain in the FLAIR image. Based on these percentiles hyper-intense and veryintense points are identified that satisfy hI and vI, respectively (Lines 12-13).
Hyper-intense points have a very high likelihood to belong to tumour tissue and the very-high intensity points are likely to belong to the tumour as well, or to the oedema that is usually surrounding the tumour. However, not all hyper-intense and very-intense points are part of a tumour. The idea is to identify the real tumour using further information. First of all, the hyper-intense points should form an area of certain dimensions and moreover they should be close to the area covered by the oedema. Such characteristics can be easily specified in the ImgQL language. In lines 14-15 the hyper-intense and very-intense points are filtered such that noise is removed (i.e. single points or very small areas of hyper-intense points) and only areas of a certain relevant size are considered. The points that satisfy hyperIntense and veryIntense are shown in red in Fig. 2a and in Fig 2b, respectively. In line 16 the areas of hyper-intense points are extended via the grow operator with those points that are very intense and part of a contiguous area of very intense points (the oedema that accompanies the tumour) that in turn touches the hyper-intense areas (likely tumour tissue). The points that satisfy growTum are shown in red in Fig. 2c. In line 17 the previously-defined (line 8) similarity operator is used to assign to all voxels a texture-similarity score with respect to growTum. In line 18 this operator is used to find those voxels that have a high cross correlation coefficient and likely part of the tumour. The result is shown in Fig. 2d. Finally (line 19), the voxels that are identified as part of the whole tumour are those that satisfy growTum extended with those that are statistically similar to it via the grow operator. Points that satisfy tumFinal are shown in red in Fig. 2e and points identified by manual segmentation are shown for comparison in blue in the same figure (so that the overlapping areas are shown in purple).
Interesting aspects of the ImgQL specification are its relative simplicity and abstraction level, fitting that of neuro-radiologists, its explainability, its timeefficient verification, admitting a rapid development cycle, and its independence of normalisation procedures through the use of percentiles rather than absolute values for the intensity of voxels.

VoxLogicA evaluation results
In this domain results of tumour segmentation are evaluated based on a number of indexes commonly used to compare the quality of different techniques. These indexes are based on the true positive (TP) voxels (voxels that are identified as part of a tumour in both manual and VoxLogicA segmentation), true negatives (TN) voxels (those that are not identified as part of a tumour in both manual and VoxLogicA segmentation), false positives (FP) voxels (those identified as part of a tumour by VoxLogicA but not by manual segmentation) and false negatives (FN) voxels (those identified as part of a tumour by manual segmentation but not by VoxLogicA). Based on these four types the following indexes are defined:  Table 1 shows the mean values of the above indexes for both GTV and CTV volumes for Specification 1 applied to the BraTS 2017 training phase collection. For comparison, we report also the mean and standard deviation of the 20 techniques reported in the BraTS 2017 Segmentation Challenge [21] that were applied to at least 100 cases of the same training phase data. Full data for these techniques is available in the leaderboard 15 . Table 1 shows that our results are well in line with the state-of-the-art  in this domain, which is very encouraging. The evaluation of each case study takes about 10 seconds on a desktop computer equipped with a 7th generation Intel Core I7 processor and 16GB of RAM, paving the way to interactive use and visualisation by neuro-radiologists, which is one of the longer term aims of our work. For the CTV a Dice value of 0.9 is sufficiently accurate for use in preparation of radiotherapy. Further inspection of results of individual cases may provide ideas for further improvements of Specification 1. Furthermore, in this evaluation we have focused on the segmentation of the tumour as a whole, leaving further segmentation of the tumour in various other types of tissue (e.g. necrotic and non-enhancing parts) for future research.

Conclusions and Future Work
We presented VoxLogicA, a spatial model checker designed and optimised for the analysis of multi-dimensional digital images. The tool has been successfully evaluated on 211 cases of an international brain tumour 3D MRI segmentation benchmark. The obtained results are well-positioned w.r.t. the performance of state-of-the-art segmentation techniques, both efficiency-wise and accuracy-wise. Concerning efficiency, the ImgQL specification that has been evaluated provides results on 3D MRI images in less than 15 seconds on a desktop machine. The results are well inline with the current state-of-art for accuracy based on commonly accepted indexes, which is exceptional for the simple specification provided (less than 30 lines including loading, saving and printing indexes), that is open for further improvement. Future research work based on the tool will focus on further benchmarking (e.g. various other types of tumours and tumour tissue such as necrotic and non-enhancing parts), and clinical application. On the development side, planned future work includes a graphical (web) interface for interactve parameter calibration (for that, execution times will need to be further improved, possibly employing GPU computing); improvements in the type-system (e.g. operator overloading); turning the first two design layers into a reusable library available for other projects. Finally, the (currently small, albeit useful) library of logical and imaging-related primitives available will be enhanced, based on input from case studies. Experimentation in combining machine-learning technologies with the logic-based approach of VoxLogicA are also a worthwile research line to explore.

A Proof of Proposition 1
Proof. We prove that where by M, x |= Φ we mean that M, x |= Φ does not hold: