1 Motivation

In the mathematical literature, discrepancy theory is devoted to problems related to irregularities of distributions. In this context the term discrepancy refers to a measure that evaluates the extent to which a given distribution deviates from total uniformity in measure-theoretic, combinatorial, and geometric settings. This theory goes back to Weyl [39] and is still an active field of research, see, e.g., [3, 12, 19]. Applications can be found in the field of numerical integration, especially for Monte Carlo methods in high dimensions, see, e.g., [28, 36, 40], and in computational geometry, see, e.g., [1, 9, 21]. For applications to data storage problems on parallel disks, see [10, 13], and for halftoning images, see [31].

This paper is motivated by [24], which applies Weyl’s discrepancy concept in order to derive an ordering-dependent norm for measuring the (dis-)similarity between patterns. In this context the focus lies on evaluating the auto-misalignment that measures the deviation of some function f(⋅) from its translated version f(⋅−T) with respect to the lag T. The function f represents a signal, the intensity profile of a line of an image, an image, or volumetric data. The interesting point is that, based on Weyl’s discrepancy concept, distance measures can be constructed that guarantee the desirable registration properties: (R1) the measure vanishes if and only if the lag vanishes, (R2) the measure increases monotonically with an increasing lag, and (R3) the measure obeys a Lipschitz condition that guarantees smooth changes also for patterns with high frequencies. As proven in [26], properties (R1)–(R3) are not satisfied simultaneously by commonly used measures in this context like mutual information, the Kullback–Leibler distance or the Jensen–Rényi divergence measure, which are special variants of f-divergence and f-information measures, see, e.g., [4, 20, 29, 30, 37], nor are they satisfied by the standard measures based on p-norms or by the widely used correlation measures due to Pearson or Spearman, see [8, 17, 34].

From the point of view of applications, properties (R1)–(R3) are relevant for a variety of problems whenever the misalignment of a pattern with its shifted version has to be evaluated. Such problems are encountered, for instance, as autocorrelation in signal processing, see [5]. In computer vision they arise particularly in stereo matching as a point correspondence problem, see, e.g., [32] and [26], in template matching, e.g., for the purpose of print inspection, see, e.g., [7] and [23], in superpixel matching [16], and in defect detection in textured images, see [6, 25, 35]. In these cases, for high-frequency patterns, the discrepancy norm leads to cost functions with fewer local extrema and a more distinctive region of convergence in the neighborhood of the global minimum compared to commonly used (dis-)similarity measures.

A further promising field of future application is related to measuring the similarity between event-based signals as encountered in neuroscience due to the all-or-none characteristics of neural signals, see, e.g., [38] and, closely related, in event-based imaging, see, e.g., [18] and [22]. In this context it is interesting to point out that the asynchronicity of neighboring sensor elements can lead to misaligned response sequences of events in time. Figure 1 illustrates a sequence of all-or-none events and its auto-misalignment functions induced by the normalized cross-correlation on the one hand and the discrepancy norm on the other hand. Due to properties (R1)–(R3), the discrepancy norm induces a topology in the space of such signals which is compatible with the asynchronicity effect. This means that slightly shifted versions of a sequence of events are still recognized as similar.

Fig. 1

Figure (a) shows a time series with 0 or 1 values in the depicted interval and 0 outside. Figure (b) shows its misalignment functions induced by the normalized cross-correlation (dotted line) and by the discrepancy norm (solid line)

The question addressed in this paper therefore is what makes the discrepancy norm so special when applied to differences of index-shifted sequences. This paper provides a geometric analysis that makes clear that the discrepancy norm is inherently related to measuring the distance between index-shifted sequences.

The paper first recalls Weyl’s definition [39] in Sect. 2, formulates it as a norm on \(\mathbb{R}^{n}\), and recalls some of its properties from [23]. As the main result of this paper, in Sect. 3 its unit ball is revealed as a special zonotope. Section 4 focuses on geometric properties of this zonotope like the number of k-dimensional faces in Sect. 4.1 and its volume in Sect. 4.2.

2 The Discrepancy Norm

In [39] Weyl introduces a concept of discrepancy in the context of pseudorandomness of sequences of numbers from the unit interval. He proposes the formula

$$ D_N = \sup_{0 \leq a < b \leq 1}\biggl \vert \frac{N(a,b)}{N} - (b-a) \biggr \vert $$
(1)

to measure the deviation of a sequence \((y_{k})_{k\in\{1,\ldots,N\}}\subset(0,1)\) from a uniformly distributed sequence, where \(N(a,b)=|\{k\in\{1,\ldots,N\}\mid y_{k}\in(a,b)\}|\), \(a,b\in(0,1)\), \(b>a\). As a generalization, the discrepancy of measures μ and ν is defined as

$$ D(\mu,\nu)= \sup_{A\in \tilde{A} \subset\mathcal{A}} \bigl|\mu(A) - \nu(A)\bigr|, $$
(2)

where \(\mathcal{A}\) is a σ-algebra of measurable sets over the domain \(\mathcal{U}\), and μ, ν are signed measures defined on the measure space \((\mathcal{U}, \mathcal{A})\).

For linear combinations of Dirac measures \(\delta_{\{i\}}\) on ℤ given by \(\mu = \sum_{i=1}^{n} x_{i} \delta_{\{i\}}\) and \(\nu = \sum_{i=1}^{n} y_{i} \delta_{\{i\}}\), \(x_{i},y_{i}\in\mathbb{R}\), and \(\tilde{A}\) a set of index intervals, Definition (2) yields

$$D(\mu,\nu) = \sup_{1\leq n_1 \leq n_2 \leq n} \Biggl|\sum_{i=n_1}^{n_2} (x_i - y_i)\Biggr|. $$

Therefore, for a summable sequence of real values \(\mathbf{x}=(x_{i})_{i\in\mathbb{Z}}\), \(\sum_{i\in\mathbb{Z}}|x_{i}|<\infty\), Weyl’s discrepancy concept leads to the definition

$$ \|x\|_D = \sup_{n_1, n_2 \in \mathbb{Z}: n_1 \leq n_2} \Biggl|\sum_{i=n_1}^{n_2} x_i\Biggr|, $$
(3)

which induces a norm, see Appendix. Applications of the norm (3) can be found in pattern recognition [27] and in print inspection in the context of pixel classification [2]. In contrast to the p-norms \(\|\cdot\|_{p}\), \(\|\mathbf{x}\|_{p}=(\sum_{i}|x_{i}|^{p})^{1/p}\), the norm \(\|\cdot\|_{D}\) strongly depends on the sign and also on the ordering of the entries, as illustrated by the examples \(\|(-1,1,-1,1)\|_{D}=1\) and \(\|(-1,-1,1,1)\|_{D}=2\).
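For illustration, the following Python sketch (our own addition; the function name is hypothetical) evaluates Definition (3) by a brute-force scan over all index intervals and reproduces the two examples above:

```python
def discrepancy_norm_naive(x):
    """Discrepancy norm via Definition (3): the maximum absolute sum
    over all contiguous index intervals (O(n^2) many)."""
    best = 0
    for n1 in range(len(x)):
        s = 0
        for n2 in range(n1, len(x)):
            s += x[n2]                     # sum of x[n1..n2]
            best = max(best, abs(s))
    return best

# ordering dependence of the discrepancy norm:
assert discrepancy_norm_naive([-1, 1, -1, 1]) == 1
assert discrepancy_norm_naive([-1, -1, 1, 1]) == 2
```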

Generally, \(\mathbf{x}=(x_{i})_{i}\) with \(x_{i}\geq0\) entails \(\|\mathbf{x}\|_{D}=\|\mathbf{x}\|_{1}\), and \(\mathbf{x}=((-1)^{i})_{i}\) the equality \(\|\mathbf{x}\|_{D}=\|\mathbf{x}\|_{\infty}\), respectively, indicating that the more the signs of consecutive entries alternate, the lower the value of the discrepancy norm. Observe that \(\|\mathbf{x}\|_{\infty}\leq\|\mathbf{x}\|_{D}\leq\|\mathbf{x}\|_{1}\); hence, due to Hölder’s inequality, \(n^{-1/p}\|\mathbf{x}\|_{p}\leq\|\mathbf{x}\|_{D}\leq n^{1-1/p}\|\mathbf{x}\|_{p}\). For convenience, let us consider a sequence \((x_{i})_{i}\) with \(i\in I_{n}\), \(x_{i}=0\) for \(i\notin I_{n}\), and denote by \(\Delta_{\mathbf{x}}(k)=\|(x_{i+k}-x_{i})_{i}\|_{D}\) the misalignment function of \(\mathbf{x}\) with respect to \(\|\cdot\|_{D}\). For the proof of the following properties, see Appendix:

(P1) \(\|(x_{i})_{i\in I_{n}}\|_{D}\) induces a norm on \(\mathbb{R}^{n}\).

(P2) \(\Delta_{\mathbf{x}}(0)=0\) for all summable real sequences \(\mathbf{x}\).

(P3) \(\| (x_{i})_{i\in I_{n}}\|_{D} = \max\{0,\max_{k\in I_{n}} \sum_{i = 1}^{k} x_{i}\} - \min\{0, \min_{k \in I_{n}} \sum_{i = 1}^{k} x_{i}\}\).

(P4) \(\Delta_{\mathbf{x}}(k)\leq|k|\cdot L\), where \(L=\max_{i} x_{i}-\min_{i} x_{i}\), \(k\in\mathbb{Z}\), and \(x_{i}\geq0\).

(P5) \(\Delta_{\mathbf{x}}(k)=\Delta_{\mathbf{x}}(-k)\) for \(\mathbf{x}=(x_{i})_{i}\) with \(x_{i}\geq0\) and \(k\in\mathbb{Z}\).

(P6) For \(\mathbf{x}=(x_{i})_{i}\) with \(x_{i}\geq0\), the function \(\Delta_{\mathbf{x}}(\cdot)\) is monotonically increasing on \(\mathbb{N}\cup\{0\}\).

The equation of (P3) allows us to compute the discrepancy norm of a sequence of length n with O(n) operations instead of the \(O(n^{2})\) operations resulting from the original Definition (3). Especially the monotonicity (P6) and the Lipschitz property (P4) are interesting properties for applications in the field of signal analysis. It is instructive to point out that the Lipschitz constant in (P4) does not depend on frequencies or other characteristics of the sequence x. Properties (P4), (P5), and (P6) are illustrated in Figs. 1(a) and 1(b), which demonstrate the behavior of the misalignment function of a sequence of all-or-none events. While the misalignment function induced by the normalized cross-correlation (dotted line in Fig. 1(b)) shows typical local minima, the misalignment function induced by the discrepancy norm (solid line) visualizes the symmetry property (P5), the monotonicity property (P6), and the boundedness of its slope due to the Lipschitz property (P4).
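A minimal sketch (our own; the function names are hypothetical) of the O(n) evaluation via (P3) and of the misalignment function \(\Delta_{\mathbf{x}}\), together with spot checks of (P4)–(P6) on an all-or-none sequence similar to the one of Fig. 1(a), could look as follows:

```python
from itertools import accumulate

def discrepancy_norm(x):
    """||x||_D via property (P3): running maximum/minimum of the
    prefix sums, i.e. O(n) instead of the O(n^2) interval scan."""
    partial = list(accumulate(x))
    return max(0, max(partial)) - min(0, min(partial))

def misalignment(x, k):
    """Delta_x(k) = ||(x_{i+k} - x_i)_i||_D for a sequence x that is
    zero outside the given range (x is zero-padded on both sides)."""
    n = len(x)
    p = [0] * n + list(x) + [0] * n
    shifted = (p[k:] + [0] * k) if k >= 0 else ([0] * (-k) + p[:k])
    return discrepancy_norm([a - b for a, b in zip(shifted, p)])

x = [1, 0, 1, 1, 0, 0, 1, 0]                  # all-or-none events
L = max(x) - min(x)                           # Lipschitz constant of (P4)
assert discrepancy_norm(x) == sum(x)          # nonnegative entries: ||x||_D = ||x||_1
for k in range(-5, 6):
    assert misalignment(x, k) <= abs(k) * L   # (P4)
    assert misalignment(x, k) == misalignment(x, -k)  # (P5)
vals = [misalignment(x, k) for k in range(0, 9)]
assert all(a <= b for a, b in zip(vals, vals[1:]))    # (P6)
```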

3 The Unit Ball of the Discrepancy Norm as Convex Polytope

In this section we consider the unit ball of the discrepancy norm in dimension n∈ℕ, \(B_{D}^{(n)} = \{\mathbf{x} \in \mathbb{R}^{n}| \|\mathbf{x}\|_{D} \leq 1 \}\), as a geometric object. Definition (3) immediately leads to the representation

$$ B_{D}^{(n)} = \Biggl\{ (x_i)_i \in \mathbb{R}^n \bigg \vert -1 \leq \sum_{i=1}^n 1_{I}(i) x_i \leq 1, I \in \mathcal{I}_n \Biggr\}, $$
(4)

where \(\mathcal{I}_{n}\) denotes the set of subintervals of {1,…,n}, and \(1_{I}(\cdot)\) the indicator function given by \(1_{I}(i)=1\) if and only if \(i\in I\). Equation (4) represents the unit ball \(B_{D}^{(n)}\) as a bounded intersection of a set of half-spaces, which shows that the unit balls of the discrepancy norm are convex polytopes. Figures 2(a) and 2(b) illustrate the unit balls \(B_{D}^{(n)}\) for n=2 and n=3. Lemma 1 shows a first relationship between \(B_{D}^{(n)}\) and the (n+1)-hypercube.

Fig. 2

Illustration of the unit balls of the discrepancy norm in ℝ2 and ℝ3

Lemma 1

Let \(\mathbf{x}=(x_{i})_{i}\in[-1,1]^{n}\) with \(\|\mathbf{x}\|_{D}\leq1\). Then,

$$ (c ,c + x_1, c + x_1 + x_2, \ldots,c + x_1 + \cdots + x_n) \in [0,1]^{n+1} $$
(5)

if and only if

$$ - \min_{k= 1}^n\Biggl\{0, \sum _{j= 1}^k x_j\Biggr\} \leq c \leq 1 - \max_{k= 1}^n\Biggl\{0, \sum _{j= 1}^k x_j\Biggr\}. $$
(6)

The constant c is uniquely determined if and only if \(\|\mathbf{x}\|_{D}=1\).

Proof

Note that if \(\min_{k= 1}^{n}\{0, \sum_{j= 1}^{k} x_{j}\} <0\), then

$$ \min_{i=1}^n \Biggl\{- \min_{k= 1}^n\Biggl\{0, \sum_{j= 1}^k x_j\Biggr\} + \sum_{j=1}^i x_j \Biggr\} = 0, $$
(7)

and that if \(\max_{k= 1}^{n}\{0, \sum_{j= 1}^{k} x_{j}\} >0\), there holds

$$ \max_{i=1}^n \Biggl\{1 - \max_{k= 1}^n\Biggl\{0, \sum_{j= 1}^k x_j\Biggr\} + \sum_{j=1}^i x_j \Biggr\} = 1. $$
(8)

According to property (P3), the assumption \(\|\mathbf{x}\|_{D}\leq1\) implies

$$ 0 \leq - \min_{k= 1}^n\Biggl\{0, \sum _{j= 1}^k x_j\Biggr\} + \sum _{j=1}^i x_j \leq 1 - \max_{k= 1}^n\Biggl\{0, \sum_{j= 1}^k x_j\Biggr\} + \sum_{j=1}^i x_j \leq 1, $$
(9)

which shows that condition (6) implies formula (5). Formulas (7) and (8) reveal that the bounds 0 and 1 in inequality (9) are attained, showing the necessity of condition (6). □

Given a sequence \(\mathbf{x}=(x_{1},\ldots,x_{n})\) with \(\|\mathbf{x}\|_{D}\leq1\), Lemma 1 reveals that x can be represented as a sequence of differences \(y_{i+1}-y_{i}\), \(i\in I_{n}\), with \(y_{i}\in[0,1]\), and that such a representation is uniquely determined if \(\|\mathbf{x}\|_{D}=1\). This observation motivates Lemma 2, which points out a fundamental relationship between the discrepancy and the maximum norm.
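As a small numeric illustration of Lemma 1, the following sketch (ours; the helper name is hypothetical) computes the admissible interval (6) for the constant c and verifies that the resulting point (5) lies in the hypercube:

```python
from itertools import accumulate

def c_range(x):
    """Admissible constants c of Lemma 1, i.e. condition (6),
    for a vector x with ||x||_D <= 1."""
    partial = list(accumulate(x))
    lo = -min(0, min(partial))
    hi = 1 - max(0, max(partial))
    return lo, hi          # lo <= hi; lo == hi exactly when ||x||_D = 1

x = (0.5, -0.25, 0.5)      # ||x||_D = 0.75 < 1, so a whole interval of c works
lo, hi = c_range(x)
for c in (lo, hi):
    y = [c] + [c + s for s in accumulate(x)]   # the point (5) of Lemma 1
    assert all(0.0 <= yi <= 1.0 for yi in y)
```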

Lemma 2

Let \(\mathbf{x}=(x_{i})_{i}\in\mathbb{R}^{n+1}\), n∈ℕ. Then,

$$ \bigl\|(x_{i+1} - x_{i})_{i \in I_n}\bigr\|_D = \bigl\| \bigl(x_i - \min\{x_i\}\bigr)_{i \in I_{n+1}}\bigr\|_{\infty}. $$
(10)

Proof

If \(\|\mathbf{x}-\min_{i}\{x_{i}\}\|_{\infty}=0\), then x is constant and both sides of (10) vanish. Therefore, consider an x with \(\|\mathbf{x}-\min_{i}\{x_{i}\}\|_{\infty}>0\) and set \(\tilde{x}_{i} = \frac{x_{i} - \min_{i}\{x_{i}\}}{\| \mathbf{x} - \min_{i}\{x_{i}\}\|_{\infty} } \in [0,1]\). Then, by the Lipschitz property (P4), we get

$$ \bigl\|( \tilde{x}_{i+1} - \tilde{x}_i)_i\bigr\|_D \leq 1. $$
(11)

Since \(\max_{i}\{\tilde{x}_{i}\}= 1\) and \(\min_{i}\{\tilde{x}_{i}\}= 0\), there are indices \(i_{0}\) and \(i_{1}\) such that \(\tilde{x}_{i_{0}}=1\) and \(\tilde{x}_{i_{1}}=0\). Without loss of generality, let us assume that \(i_{0}<i_{1}\). Then \(\|( \tilde{x}_{i+1} - \tilde{x}_{i})_{i}\|_{D} \geq | \tilde{x}_{i_{1}} - \tilde{x}_{i_{1}-1} + \dots + \tilde{x}_{i_{0}+1} - \tilde{x}_{i_{0}} | = 1\), which, together with (11), yields \(\|( \tilde{x}_{i+1} - \tilde{x}_{i})_{i}\|_{D} = 1\), and hence, by homogeneity, \(\|(x_{i+1}-x_{i})_{i}\|_{D}=\|\mathbf{x}-\min_{i}\{x_{i}\}\|_{\infty}\). □
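Identity (10) can also be spot-checked numerically. The following sketch (ours) compares both sides of (10) for random vectors, using the (P3) formula for the left-hand side:

```python
import random

def discrepancy_norm(x):
    """||.||_D via (P3): running max/min of the prefix sums."""
    s = m_max = m_min = 0.0
    for xi in x:
        s += xi
        m_max = max(m_max, s)
        m_min = min(m_min, s)
    return m_max - m_min

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(7)]        # x in R^{n+1}, n = 6
    diffs = [b - a for a, b in zip(x, x[1:])]             # (x_{i+1} - x_i)_i
    rhs = max(xi - min(x) for xi in x)                    # ||x - min_i x_i||_inf
    assert abs(discrepancy_norm(diffs) - rhs) < 1e-12     # identity (10)
```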

For convenience, let us define that for a sequence \(\mathbf{x} = (x_{i})_{i\in I_{n}}\in \mathbb{R}^{n}\), the index interval \(C\subseteq I_{n}\) is called a core discrepancy interval with respect to x if and only if for any subset \(\tilde{C} \subseteq C\) with \(|\sum_{i\in \tilde{C}} x_{i}| = \|\mathbf{x}\|_{D}\), it follows that \(\tilde{C} = C\). Note that for any x, due to the definition of the discrepancy norm, there is at least one core discrepancy interval. Further, for convenience, let \(\mathbf{0}=(0,\ldots,0)^{T}\) and \(\mathbf{1}=(1,\ldots,1)^{T}\).

With these prerequisites we come to the central result of this paper that characterizes the vertices \(\mathrm{vert}(B_{D}^{(n)})\) of \(B_{D}^{(n)}\) in terms of vertices of the hypercube of dimension (n+1).

Lemma 3

\(\mathbf{x}\in\mathbb{R}^{n}\) is a vertex of the convex polytope \(B_{D}^{(n)}\) if and only if \(\|\mathbf{x}\|_{D}=1\) and \(\mathbf{x}\in\{-1,0,1\}^{n}\).

Proof

First of all we show that \(B_{D}^{(n)}\) equals the convex hull of \(\mathcal{D}^{(n)} = \{\mathbf{c}\in \{-1,0,1\}^{n} | \|\mathbf{c}\|_{D}=1\}\).

Observe that \(\mathrm{conv}(\mathcal{D}^{(n)}) \subseteq B_{D}^{(n)}\) follows immediately from definition (3) and the representation (4) as \(\mathbf{x} \in \mathcal{D}^{(n)}\) implies \(\sup_{n_{1}, n_{2} \in \mathbb{Z}: n_{1} \leq n_{2}} |\sum_{i=n_{1}}^{n_{2}} x_{i}| \leq 1\).

What remains to be shown is that an arbitrary \(\mathbf{x} \in B_{D}^{(n)}\) can be represented as a convex combination of elements from the set \(\mathcal{D}^{(n)}\). It suffices to consider the case \(\|\mathbf{x}\|_{D}=1\), since every x with \(\|\mathbf{x}\|_{D}<1\) is a convex combination of \(\mathbf{0}\) and a vector of discrepancy norm one, and \(\mathbf{0} = \frac{1}{2} (1,-1, 0, \dots,0)^{T} + \frac{1}{2} (-1,1, 0, \dots,0)^{T} \in \mathrm{conv}(\mathcal{D}^{(n)})\).

Therefore, suppose that \(\mathbf{x}\notin\{-1,0,1\}^{n}\) with \(\|\mathbf{x}\|_{D}=1\). Let \(C=\{n_{1},\ldots,n_{2}\}\subseteq I_{n}\) be a core discrepancy interval with respect to x. Without loss of generality, we may assume that \(\sum_{i\in C} x_{i}>0\).

Let us consider the cases \(n_{1}>1\) and \(n_{1}=1\). For the case \(n_{1}>1\), let us set

$$ \alpha_i := \sum _{j=i}^{n_1-1} x_j. $$
(12)

Observe that \(\alpha_{i^{*}} > 0\) for some index \(i^{*}\in\{1,\ldots,n_{1}-1\}\) entails \(\sum_{i= i^{*}}^{n_{2}} x_{i} = \alpha_{i^{*}} + \sum_{i = n_{1}}^{n_{2}} x_{i} > \sum_{i = n_{1}}^{n_{2}} x_{i} = \|\mathbf{x}\|_{D} \) and, therefore, contradicts the fact that C is a core discrepancy interval. From this it follows that

$$ \alpha_i \leq 0 $$
(13)

for all indices \(i\in\{1,\ldots,n_{1}-1\}\).

Now, arrange the partial sums \(\alpha_{i}\), \(i\in\{1,\ldots,n_{1}-1\}\), in increasing order: \(0 \leq -\alpha_{r_{1}} \leq -\alpha_{r_{2}} \leq \dots \leq -\alpha_{r_{n_{1}-1}} \), and set (\(k\in\{1,\ldots,n_{1}-1\}\))

$$ \lambda_{k} := \left \{ \begin{array}{l@{\quad}l} -\alpha_{r_{1}} & \mbox{if } k = 1, \\ \alpha_{r_{k-1}} - \alpha_{r_{k}} & \mbox{if } k \in \{2,\ldots,n_{1}-1\}. \end{array} \right . $$
(14)

Then, we have \(\lambda_{i}\geq0\) for \(i\in\{1,\ldots,n_{1}-1\}\), and due to \(-\alpha_{r_{1}} = \lambda_{1}\) and \(-\alpha_{r_{k}} = \lambda_{1} + \dots + \lambda_{k}\), we obtain \(\sum_{i= 1}^{n_{1}-1} \lambda_{i} \leq \max_{i= 1}^{n_{1}-1} \{-\alpha_{i}\} \leq 1 \). Consequently,

$$ \lambda_0 := 1 - \sum _{i = 1}^{n_1-1} \lambda_{i} \in [0,1]. $$
(15)

Finally, we get the representation

$$\alpha_{r_{k}} = \lambda_{1}\cdot v(\alpha_{r_k}, \lambda_{1} ) + \dots + \lambda_{n_1-1}\cdot v( \alpha_{r_k}, \lambda_{n_1-1} + \dots + \lambda_{1}), $$

where

$$v(\alpha, \lambda) = \left \{ \begin{array}{l@{\quad}l} -1 & \mbox{if } \lambda \leq -\alpha \\ 0 & \mbox{else. } \end{array} \right . $$

Next let us define the auxiliary vectors \(\mathbf{s}^{(0)}, \mathbf{s}^{(1)},\ldots, \mathbf{s}^{({n_{1}-1})} \in \{-1,0\}^{n_{1}-1}\) given by

$$ \mathbf{s}^{(0)} := \mathbf{0}, \qquad \mathbf{s}^{(j)} := \biggl( v\biggl(\alpha_{1}, \sum_{l=1}^{j}\lambda_{l}\biggr), \ldots, v\biggl(\alpha_{n_{1}-1}, \sum_{l=1}^{j}\lambda_{l}\biggr) \biggr)^{T}, $$
(16)

where \(j\in\{1,\ldots,n_{1}-1\}\). Observe that the vectors (16) and the scalars (14) and (15) yield

$$\left ( \begin{array}{c} \alpha_{1} \\ \vdots\\ \alpha_{n_1-1} \end{array} \right ) = \sum_{j=1}^{n_1-1} \lambda_{j} \mathbf{s}^{(j)} + \lambda_0 \mathbf{s}^{(0)}. $$

Hence,

$$\left ( \begin{array}{c} x_{1} \\ x_{2} \\ \vdots\\ x_{n_1-1} \end{array} \right ) = \left ( \begin{array}{c} \alpha_{1}-\alpha_{2} \\ \vdots\\ \alpha_{n_1-2} - \alpha_{n_1-1} \\ \alpha_{n_1-1} - 0 \end{array} \right ) = \sum _{j=1}^{n_1-1} \lambda_{j} \tilde{\mathbf{g}}^{(j)} + \lambda_0 \tilde{\mathbf{g}}^{(0)}, $$

where \(\tilde{\mathbf{g}}^{(0)} = \mathbf{0}\) and

$$\tilde{\mathbf{g}}^{(j)} := \left ( \begin{array}{c} v(\alpha_{1},\sum_{l=1}^{j}\lambda_{l} ) - v(\alpha_{2},\sum_{l=1}^{j}\lambda_{l} ) \\ \vdots\\ v(\alpha_{n_1-2}, \sum_{l=1}^{j}\lambda_{l} ) - v(\alpha_{n_1-1}, \sum_{l=1}^{j}\lambda_{l} ) \\ v(\alpha_{n_1-1},\sum_{l=1}^{j}\lambda_{l} ) - 0 \\ \end{array} \right ) \in \{-1,0,1 \}^{n_1-1}. $$

Note that \(\tilde{\mathbf{g}}^{(1)},\ldots, \tilde{\mathbf{g}}^{({n_{1}-1})} \in \mathcal{D}^{(n_{1}-1)} \) because of

$$\bigl\|\tilde{\mathbf{g}}^{(j)}\bigr\|_D = \underbrace{ \max_{k}\Biggl\{0, v\Biggl(\alpha_{ k}, \sum _{l = 1}^{j} \lambda_l\Biggr)\Biggr \}}_{0} - \min_{k}\Biggl\{0, v\Biggl(\alpha_{ k}, \sum_{l = 1}^{j} \lambda_l\Biggr) \Biggr\} = 1 $$

for j∈{1,…,n 1−1}. Note that also \(\tilde{\mathbf{g}}^{(0)}\) can be represented as a convex combination of vectors of \(\mathcal{D}^{(n_{1}-1)}\), e.g., \(\tilde{\mathbf{g}}^{(0)} = \frac{1}{2} (1,-1, 0, \dots,0)^{T} + \frac{1}{2} (-1,1, 0, \dots,0)^{T}\). This proves that \((x_{1}, \ldots, x_{n_{1}-1})\) can be represented as a convex combination of elements

$$ {\mathbf{g}}^{(i)}\in \mathcal{D}^{(n_1-1)}. $$
(17)

For the other case that \(n_{1}=1\), let us set \(\beta_{i} := \sum_{j=n_{1}}^{i} x_{j} \), where \(i\in\{n_{1},\ldots,n_{2}\}\). If \(n_{1}=n_{2}\), the core discrepancy interval property of C entails that \(x_{n_{1}} = 1\). Therefore, let us consider the case \(n_{2}>n_{1}\). Then, the assumption \(\beta_{i^{*}} < 0\) for some index \(i^{*}\in\{n_{1},\ldots,n_{2}-1\}\) leads to \(\sum_{i=n_{1}}^{n_{2}} x_{i} = \beta_{i^{*}} + \sum_{i=i^{*}+1}^{n_{2}} x_{i} \), implying that \(\sum_{i=i^{*}+1}^{n_{2}} x_{i} > \sum_{i=n_{1}}^{n_{2}} x_{i}\). This contradicts the core discrepancy interval property of C; hence, in analogy to the case \(n_{1}>1\) and formula (13), we get that \(\beta_{i}\geq0\) for all \(i\in\{n_{1},\ldots,n_{2}\}\). Now, reasoning steps analogous to the case \(n_{1}>1\) can be applied in order to show that \((x_{n_{1}}, \ldots, x_{n_{2}})\) can be represented as a convex combination of elements

$$ {\mathbf{s}}^{(j)}\in \mathcal{D}^{(n_2-n_1 + 1)}. $$
(18)

If \(n_{2}=n\), we are done, and if \(n_{2}<n\), then let us consider \(\gamma_{i} := \sum_{j=n_{2}+1}^{i} x_{j} \), \(i\in\{n_{2}+1,\ldots,n\}\), which, in analogy to the case \(n_{1}>1\), satisfies \(\gamma_{i}\leq0\), so that the same reasoning as in the case \(n_{1}>1\) can be applied, showing that \((x_{n_{2}+1}, \ldots, x_{n})\) can be represented as a convex combination of elements

$$ {\mathbf{l}}^{(k)}\in \mathcal{D}^{(n-n_2)}. $$
(19)

Putting everything together, formulas (17), (18), and (19) show that x can be represented as a convex combination of elements

$$\left ( \begin{array}{c} {\mathbf{g}}^{(i)} \\ {\mathbf{s}}^{(j)} \\ {\mathbf{l}}^{(k)} \end{array} \right ) \in \mathcal{D}^{(n)} $$

showing that \(\mathrm{conv}(\mathcal{D}^{(n)}) = B_{D}^{(n)}\).

Finally, we show that all elements of \(\mathcal{D}^{(n)}\) are vertices of \(\mathrm{conv}(\mathcal{D}^{(n)})\). Suppose that \(\mathbf{v}_{0} \in \mathcal{D}^{(n)}\) can be represented as a convex combination of elements \(\mathbf{v}_{i} \in \mathcal{D}^{(n)}\backslash \{\mathbf{v}_{0}\}\), \(i \in \{1, \ldots, |\mathcal{D}^{(n)}|-1\}\), i.e.,

$$ \mathbf{v}_0 = \sum _i \lambda_i \mathbf{v}_i, $$
(20)

\(\sum_{i} \lambda_{i}=1\), \(\lambda_{i}\geq0\). Then, due to Lemma 1 and the fact that \(\|\mathbf{v}_{i}\|_{D}=1\), \(i \in \{0, \ldots, |\mathcal{D}^{(n)}|-1\}\), there are constants \(c_{i}\) such that \(\overline{\mathbf{v}}_{i} = (c_{i}, c_{i} + v_{1}^{i}, \ldots, c_{i} + v_{1}^{i} + \cdots + v_{n}^{i}) \in [0,1]^{n+1}\), where \(\mathbf{v}_{i} = (v_{1}^{i}, \ldots, v_{n}^{i})\). Since \(v_{j}^{i} \in \{-1,0,1\}\), Lemma 1 shows that \(c_{i} = 1 - \max_{k=1}^{n}\{0, \sum_{j=1}^{k} v_{j}^{i}\} \in \{0,1\}\). From this it follows that \(\overline{\mathbf{v}}_{i} \in \{0,1\}^{n+1}\). Further, \(\|\mathbf{v}_{i}\|_{D}=1\) implies \(\overline{\mathbf{v}}_{i} \in \{0,1\}^{n+1}\backslash \{\mathbf{0}, \mathbf{1}\}\). Note that \(\mathbf{v}_{i}\neq\mathbf{v}_{j}\) implies \(\overline{\mathbf{v}}_{i} \neq \overline{\mathbf{v}}_{j}\). Now, let us consider the linear mapping

$$ \mathcal{D}\bigl((x_i)_{i\in I_{n+1}}\bigr) = (x_{i+1}-x_i)_{i\in I_n}, $$
(21)

where \((x_{i})_{i\in I_{n+1}} \in \{0,1\}^{n+1}\backslash \{\mathbf{0}, \mathbf{1}\}\). Equation (20) expressed in terms of (21) means that \(\mathcal{D}(\overline{\mathbf{v}}_{0}) = \sum_{i} \lambda_{i} \mathcal{D}(\overline{\mathbf{v}}_{i})\), which entails \(\mathcal{D}(\overline{\mathbf{v}}_{0}) = \mathcal{D}( \sum_{i} \lambda_{i} \overline{\mathbf{v}}_{i})\). Since \(\mathcal{D}(\overline{\mathbf{v}}) = \mathcal{D}(\overline{\mathbf{w}})\) implies \(\overline{\mathbf{v}} = \overline{\mathbf{w}} + c \cdot \mathbf{1}\) for some c∈ℝ, we obtain

$$ \sum_i \lambda_i \overline{\mathbf{v}}_i = \overline{\mathbf{v}}_0 + c \cdot \mathbf{1}. $$
(22)

Since \(\{\overline{\mathbf{v}}_{0} + c\cdot \mathbf{1}| c \in \mathbb{R}\} \cap [0,1]^{n+1} = \{\overline{\mathbf{v}}_{0}\}\), we obtain c=0 in Eq. (22) and hence \(\sum_{i} \lambda_{i} \overline{\mathbf{v}}_{i} = \overline{\mathbf{v}}_{0}\); the same argument shows that \(\mathcal{D}(\overline{\mathbf{v}}) = \mathcal{D}(\overline{\mathbf{w}})\) implies \(\overline{\mathbf{v}} = \overline{\mathbf{w}}\), which proves the injectivity of the mapping (21).

Note that \(\overline{\mathbf{v}}_{0}\) is a vertex of the hypercube and \(\sum_{i} \lambda_{i} \overline{\mathbf{v}}_{i}\) is an element of the hypercube [0,1]n+1. But \(\overline{\mathbf{v}}_{0}\) as a vertex of the hypercube cannot be represented as a convex combination of other vertices of the hypercube [0,1]n+1 different from \(\overline{\mathbf{v}}_{0}\), which by means of the injectivity of (21) shows that assumption (20) cannot be true. Consequently, we get \(\mathcal{D}^{(n)} = \mathrm{vert}(\mathrm{conv}(\mathcal{D}^{(n)}))\), and together with the first part of the proof, \(\mathrm{conv}(\mathcal{D}^{(n)}) = B_{D}^{(n)}\), we conclude that \(\mathrm{vert}(B_{D}^{(n)}) = \mathcal{D}^{(n)}\), which ends the proof. □

Next, the main result that characterizes the unit ball of the discrepancy norm by means of the mapping (21) is stated.

Theorem 1

Let \(B_{D}^{(n)}\) denote the n-dimensional unit ball of the discrepancy norm, n≥1.

  1. (a)

    The mapping \(\mathcal{D}:\{0,1\}^{n+1}\backslash{\{\mathbf{0},\mathbf{1}\}} \mapsto \mathrm{vert}(B_{D}^{(n)})\) given by \(\mathcal{D}((x_{i})_{i\in I_{n+1}}) = (x_{i+1}-x_{i})_{i\in I_{n}}\) is a one-to-one correspondence.

  2. (b)

    \(B_{D}^{(n)} = \mathrm{conv}(\{(y_{i+1}-y_{i})_{i=1}^{n} | y_{i} \in \{0,1\}\})\).

Proof

The injectivity of the mapping (21) can be shown by induction. In order to prove the surjectivity of the mapping (21), let us consider \(\mathbf{x} \in \mathrm{vert}(B_{D}^{(n)})\), which by Lemma 3 is equivalent to \(\|\mathbf{x}\|_{D}=1\) and \(\mathbf{x}\in\{-1,0,1\}^{n}\). Due to Lemma 1, there is a uniquely determined integration constant \(c = 1 - \max_{k \in I_{n}}\{0, \sum_{j=1}^{k} x_{j}\} \in [0,1] \) such that \((c,c+x_{1},\ldots,c+x_{1}+\cdots+x_{n})\in[0,1]^{n+1}\). The assumption \(x_{i}\in\{-1,0,1\}\) and \(\|\mathbf{x}\|_{D}=1\) therefore implies \(\mathbf{y}=(c,c+x_{1},\ldots,c+x_{1}+\cdots+x_{n})\in\{0,1\}^{n+1}\backslash\{\mathbf{0},\mathbf{1}\}\); hence there is a sequence \(\mathbf{y} = (y_{i})_{i \in {I_{n+1}}} \in \{0,1\}^{n+1}\backslash{ \{\mathbf{0}, \mathbf{1}\}}\) such that \((x_{i})_{i\in I_{n}} = (y_{i+1} - y_{i})_{i\in I_{n}}\). Part (b) of Theorem 1 directly follows from the bijectivity of the mapping (21). □

Figure 3 illustrates the bijectivity of the mapping (21) for n=2.

Fig. 3

Illustration of the bijection between the set of vertices of \(B_{D}^{(n)}\) and the vertices of \([0,1]^{n+1}\) without the diagonal elements 0 and 1 by means of (21) for n=2
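Theorem 1(a) also yields a direct way to enumerate \(\mathrm{vert}(B_{D}^{(n)})\). The following Python sketch (our own illustration; the function names are hypothetical) generates the vertices as difference vectors of the non-diagonal vertices of \([0,1]^{n+1}\) and checks the characterization of Lemma 3 for small n:

```python
from itertools import product

def discrepancy_norm(x):
    """||.||_D via (P3): running max/min of the prefix sums."""
    s = m_max = m_min = 0
    for xi in x:
        s += xi
        m_max = max(m_max, s)
        m_min = min(m_min, s)
    return m_max - m_min

def vertices(n):
    """vert(B_D^(n)) via Theorem 1(a): difference vectors of the
    vertices of [0,1]^(n+1) other than 0 and 1."""
    return {tuple(y[i + 1] - y[i] for i in range(n))
            for y in product((0, 1), repeat=n + 1)
            if any(y) and not all(y)}

for n in (2, 3, 4):
    V = vertices(n)
    assert len(V) == 2 ** (n + 1) - 2                  # injectivity of (21)
    assert all(set(x) <= {-1, 0, 1} for x in V)        # Lemma 3
    assert all(discrepancy_norm(x) == 1 for x in V)    # Lemma 3
```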

4 Geometric Characteristics of the n-Dimensional Unit Ball of the Discrepancy Norm

In this section Theorem 1 is applied in order to determine geometric characteristics of \(B_{D}^{(n)}\) like the number of k-dimensional faces and its volume.

4.1 Number of k-Dimensional Faces

The following corollary relates the number of k-dimensional faces of the n-dimensional unit ball of the discrepancy norm to the number of corresponding faces of the (n+1)-dimensional hypercube.

Corollary 1

Let \(D_{k,n}\) denote the number of k-faces of \(B_{D}^{(n)}\), n∈ℕ, \(0\leq k<n\). Then

$$ D_{k, n} = 2 \binom{n+1}{k} \bigl( 2^{n-k}-1 \bigr). $$
(23)

Proof

First of all, let us denote by \(H_{k,n}\) the number of k-dimensional faces of the n-dimensional unit hypercube [0,1]n. As is well known from the theory of regular polytopes, see, e.g., [11], we have \(H_{k,n} = 2^{n-k}\binom{n}{k}\). Observe that k-faces of the (n+1)-hypercube cannot contain the elements 0 and 1 together if 0≤k<n+1, n≥1. Therefore, for 0≤k<n+1, there are \(Z_{k,n+1} = 2\binom{n+1}{k}\) k-faces of the (n+1)-hypercube that contain either 0 or 1. Further, observe that for k=n, all k-faces contain either 0 or 1, which can also be seen from the identity \(Z_{n,n+1}=H_{n,n+1}\).

Now we consider k<n and apply the mapping \(\mathcal{D}\) of Theorem 1 to the (n+1)-hypercube \([0,1]^{n+1}\). Observe that the elements \(\mathbf{0},\mathbf{1}\in[0,1]^{n+1}\) are mapped to the inner point 0 of \(B_{D}^{(n)}\). Note that a k-face F of the (n+1)-hypercube can be represented by means of unit vectors as \(F = \{\mathbf{e}_{i_{0}} + \sum_{l=1}^{k} \lambda_{l} \mathbf{e}_{i_{l}} | \lambda_{l} \in [0,1]\}\), where \(\mathbf{e}_{i}\) denotes the ith unit vector. We show that a k-dimensional face of the (n+1)-hypercube that contains neither 0 nor 1 is mapped to a k-dimensional face of \(B_{D}^{(n)}\).

For this, let us consider the set of linearly independent vectors \(\{\mathbf{e}_{i_{0}} - \mathbf{e}_{i_{1}}, \ldots, \mathbf{e}_{i_{0}} - \mathbf{e}_{i_{k}} \}\). The linear independence of the mapped vectors \(\{\mathcal{D}(\mathbf{e}_{i_{0}} -\mathbf{e}_{i_{1}}),\ldots, \mathcal{D}(\mathbf{e}_{i_{0}} -\mathbf{e}_{i_{k}}) \}\) follows from the observation that \(\sum_{l=1}^{k} \lambda_{l} \mathcal{D}(\mathbf{e}_{i_{0}} -\mathbf{e}_{i_{l}})= \mathbf{0}\), i.e., \(\mathcal{D} (\sum_{l=1}^{k} \lambda_{l} (\mathbf{e}_{i_{0}} -\mathbf{e}_{i_{l}}))= \mathbf{0}\), can only be satisfied if there is a constant c∈ℝ such that \(\sum_{l=1}^{k} \lambda_{l} (\mathbf{e}_{i_{0}} -\mathbf{e}_{i_{l}}) = c \mathbf{1}\). Since k<n and \(\mathbf{e}_{i_{l}} \in \{0,1\}^{n+1}\), there is an index \(k^{*}\in\{1,\ldots,n+1\}\) for which the corresponding coordinate is zero for all vectors \(\mathbf{e}_{i_{l}}\), l∈{0,…,k}. This implies c=0 and hence \(\lambda_{l}=0\) for l∈{1,…,k} due to the assumption that the set of vectors \(\{ \mathbf{e}_{i_{0}} - \mathbf{e}_{i_{1}},\ldots, \mathbf{e}_{i_{0}} - \mathbf{e}_{i_{k}}\}\) is linearly independent. From this and from Theorem 1 it follows that there is a one-to-one mapping between the set of k-faces of the (n+1)-hypercube that contain neither 0 nor 1 and the set of k-faces of \(B_{D}^{(n)}\). This implies \(D_{k,n}=H_{k,n+1}-Z_{k,n+1}\), which equals (23). □

In particular, \(B_{D}^{(n)}\) has \(D_{0,n}=2^{n+1}-2\) vertices, \(D_{1,n}=(n+1)(2^{n}-2)\) edges, and \(D_{n-1,n}=n(n+1)\) facets \(\mathcal{D}(F_{ij})\), \(i\neq j\), of dimension (n−1), where \(F_{ij}=\{\mathbf{e}_{i}+\sum_{k\neq i,j}\lambda_{k}\mathbf{e}_{k} \mid 0\leq\lambda_{k}\leq1\}\) and \(i,j\in I_{n+1}\). Note that \(F_{ij}=F_{ji}+(\mathbf{e}_{i}-\mathbf{e}_{j})\).
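For illustration, the following sketch (ours; the function name is hypothetical) tabulates the face numbers (23) for small n and checks them against the Euler–Poincaré relation \(\sum_{k=0}^{n-1}(-1)^{k}D_{k,n}=1-(-1)^{n}\), which holds for every convex n-polytope:

```python
from math import comb

def D_faces(k, n):
    """Number of k-faces of B_D^(n) according to formula (23)."""
    return 2 * comb(n + 1, k) * (2 ** (n - k) - 1)

for n in (2, 3, 4, 5):
    f = [D_faces(k, n) for k in range(n)]
    # Euler-Poincare relation for convex n-polytopes
    assert sum((-1) ** k * fk for k, fk in enumerate(f)) == 1 - (-1) ** n
    print(n, f)    # e.g. n = 3: [14, 24, 12] (vertices, edges, facets)
```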

4.2 Volume of \(B_{D}^{(n)}\)

Using the terminology of the theory of convex polytopes, see, e.g., [14, 41], the volume can be obtained by regarding \(B_{D}^{(n)}\) as a zonotope generated by a projection of the hypercube [0,1]n+1 followed by a product of shearing transformations. The (n+1) unit vectors e 1,…,e n+1 are mapped to the Minkowski sum generators g 1,…,g n+1. By the standard volume formula for zonotopes, \(V(B_{D}^{(n)}) = \sum_{1\leq i_{1}<\cdots<i_{n}\leq n+1} |\det(\mathbf{g}_{i_{1}},\ldots,\mathbf{g}_{i_{n}})|\), where each choice of n generators out of the (n+1) possibilities spans an n-dimensional subbody of \(B_{D}^{(n)}\) of volume one. Hence we obtain Corollary 2.

Corollary 2

\(V(B_{D}^{(n)}) = n+1\), n∈ℕ.

It is interesting that the volume \(V(B_{D}^{(n)}) = n+1\) of the n-dimensional unit ball of the discrepancy norm increases linearly with its dimension n, while for p-norms with 1≤p<∞, it can be shown that \(r^{n} V(B_{\|\cdot\|_{p}}^{(n)}) \stackrel{n\rightarrow \infty}{\longrightarrow} 0 \) for any r>0, see, e.g., [15, 33].
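Since \(|x_{i}|\leq\|\mathbf{x}\|_{D}\) for every i, the unit ball satisfies \(B_{D}^{(n)}\subseteq[-1,1]^{n}\), so Corollary 2 can also be illustrated by a simple Monte Carlo estimate. The following sketch (ours) does this for small n:

```python
import random

def discrepancy_norm(x):
    """||.||_D via (P3): running max/min of the prefix sums."""
    s = m_max = m_min = 0.0
    for xi in x:
        s += xi
        m_max = max(m_max, s)
        m_min = min(m_min, s)
    return m_max - m_min

random.seed(1)
trials = 200_000
for n in (2, 3, 4):
    hits = sum(discrepancy_norm([random.uniform(-1, 1) for _ in range(n)]) <= 1
               for _ in range(trials))
    # rescale the hit rate by the volume 2^n of the sampling cube [-1,1]^n
    print(n, 2 ** n * hits / trials, "expected", n + 1)
```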

5 Conclusion

In this paper Weyl’s discrepancy norm was studied from a geometrical point of view by considering its unit ball. Thinking of sequences and differentiation in the sense that two consecutive entries are subtracted, it was shown that the unit ball of Weyl’s discrepancy norm of dimension n results from differentiating the unit hypercube of dimension (n+1). It was shown how this interpretation helps to derive and prove properties of the discrepancy norm, as, for example, that the volume of the unit ball of dimension n equals n+1. This paper was motivated by considering the discrepancy norm as dissimilarity measure for pattern analysis. In the near future, it is planned to investigate the relevance of the discrepancy norm in various fields of pure and applied mathematics. Particularly, research will be dedicated to the determination of the distribution of the diameter of a random walk, the discrete mathematical foundation of event-based image processing, and the improvement of stereo matching and related algorithms.