1 Introduction

Shape descriptors are numbers computed from a two-dimensional shape. In some cases, the set of numbers is complete in the sense that the original shape can be reconstructed from the shape descriptors [1, 2], but even then only a subset of the shape descriptors is typically used in practical applications. The shape descriptors can thus be regarded as an approximate description of the shape such that shape similarity corresponds, at least roughly, to similarity of the shape descriptors. Consequently, they can be used for object recognition and object similarity detection.

There are two main categories of shape descriptors: volume-based and contour-based. Volume-based descriptors use all pixels of the object and include descriptors like geometric moments[3] or Zernike moments[4]. Contour-based descriptors compute the descriptors only from the shape boundary and include descriptors like curvature scale space[5] or Fourier descriptors[6]. Which approach is better depends on the application, especially whether the internal content of the shape or the boundary is more important.

The term Fourier descriptors covers a wide variety of shape descriptors, which have in common that they compute the discrete Fourier transform of some representation of a closed contour. They vary in the representation (or 'signature' [6]) of the contour and in the additional manipulations used to achieve invariance under certain geometric transformations. Typically, invariance under translation, scaling, and rotation is achieved, but there are also Fourier descriptors that are invariant to shearing [7]. As Fourier descriptors require the extraction of a closed contour from the shape, they are restricted to connected shapes and cannot be applied to possibly broken objects.

To overcome this restriction, we define a new three-dimensional contour signature function, based upon the convex hull of the shape and the distance of the convex hull to the closest point of the shape. As the convex hull is defined for arbitrary shapes, including broken shapes, the new shape signature no longer requires connectivity of the shape. Based upon this shape signature, we derive different invariant Fourier descriptors and compare their performance on different data sets.

This paper is organised as follows: In Section 2 we give an overview of the different Fourier descriptors described in the literature for closed contours of unbroken shapes. In Section 3 we describe our new method for a contour representation of broken shapes and define different methods to obtain invariant Fourier descriptors from this representation. Sections 4 and 5 contain the results of a comparative evaluation of the different Fourier descriptors and the conclusions drawn from it.

2 Fourier descriptors

Every connected object has a closed contour that can be represented as a sequence of pixel coordinates $(x(t), y(t))$, where $t = 0,\ldots,N-1$. A popular algorithm for contour extraction can be found in [8]. The coordinates can be considered to be sampling values

\[
(x(t), y(t)) = f\!\left(\frac{2\pi}{N}\,t\right) \tag{1}
\]

of a continuous, closed curve $f: [0,2\pi] \to \mathbb{R}^2$ such that $f$ can be extended continuously to a $2\pi$-periodic function. When the function $f$ (or, more generally, any signature function $g(x(t), y(t))$ derived from the coordinates) is expanded into a Fourier series, a fixed number of discrete Fourier coefficients approximately represents the contour shape. This allows for data reduction.

This is the basic idea underlying all Fourier descriptors suggested in the literature. They vary however in the contour representation g(x(t),y(t)) used as the starting point for the Fourier expansion. The representations typically fall into one of the following categories:

  • Complex representation: $x(t) + j \cdot y(t) \in \mathbb{C}$

  • Multidimensional representation: $(x(t), y(t)) \in \mathbb{R}^2$

  • Scalar representation: $(x(t), y(t)) \mapsto g(t) \in \mathbb{R}$.

Depending on the contour representation, the resulting Fourier coefficients will behave differently under the geometric transformations scaling, rotation, and start point shift. When the contour points are transformed by any of these operations, the Fourier coefficients change according to simple rules, which can be used to define invariant descriptors.

In the following subsections, we give an overview of the three categories and the normalisation approaches proposed in the literature to achieve invariance under these geometric transformations.

2.1 Complex contour representation

When the two-dimensional plane is interpreted as a complex plane, the contour is represented by a sequence of complex numbers $z(t) = x(t) + j \cdot y(t)$, which has the discrete Fourier expansion ($t = 0,\ldots,N-1$)

\[
z(t) = \sum_{k=0}^{N-1} c_k \exp(j 2\pi k t / N) \tag{2}
\]

with discrete Fourier coefficients

\[
c_k = \frac{1}{N} \sum_{t=0}^{N-1} z(t) \exp(-j 2\pi k t / N). \tag{3}
\]

When the coefficients $c_k$ are interpreted as numerical approximations of the Fourier coefficients $\hat{f}(k)$ of the continuous curve $f(\tau) = x(\tau N/2\pi) + j\,y(\tau N/2\pi)$ in (1),

\[
\hat{f}(k) := \frac{1}{2\pi} \int_0^{2\pi} f(\tau) \exp(-jk\tau)\,d\tau, \quad k \in \mathbb{Z},
\]

then the connection between $\hat{f}(k)$ and the discrete Fourier coefficients (3) is given by

\[
\begin{aligned}
\hat{f}(k) &\approx c_k && \text{for } 0 \le k < N/2,\\
\hat{f}(-k) &\approx c_{N-k} && \text{for } 1 \le k < N/2.
\end{aligned} \tag{4}
\]

According to the Riemann-Lebesgue lemma (p. 45 in [9]), we have

\[
\lim_{|k| \to \infty} \hat{f}(k) = 0,
\]

so that coefficients for large values of $|k|$ are small and only describe less important details. Cutting off higher frequencies $|k|$ in (4) is thus equivalent to omitting coefficients in the middle of the vector $(c_0,\ldots,c_{N-1})$. An example can be seen in Figure 1.
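The truncation can be sketched in a few lines of Python. This is only an illustration: the helper names `dft`, `idft` and `lowpass` and the square test contour are assumptions made for this example, and the transforms are the direct $O(N^2)$ sums of Equations (2) and (3) rather than a fast Fourier transform.

```python
import cmath

def dft(z):
    """Discrete Fourier coefficients c_k as in Equation (3)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def idft(c):
    """Inverse transform as in Equation (2)."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
            for t in range(n)]

def lowpass(c, m):
    """Keep c_0..c_m and c_{N-m}..c_{N-1}; zero the middle of the vector."""
    n = len(c)
    return [ck if (k <= m or k >= n - m) else 0 for k, ck in enumerate(c)]

# Hypothetical test contour: a square with 64 boundary points.
side = 16
square = ([complex(i, 0) for i in range(side)] +
          [complex(side, i) for i in range(side)] +
          [complex(side - i, side) for i in range(side)] +
          [complex(0, side - i) for i in range(side)])

# Reconstruction from the coefficients with k <= 8 and k >= N - 8 only.
smooth = idft(lowpass(dft(square), 8))
```

The reconstructed points in `smooth` follow the square closely but round off its corners, which is the kind of detail carried by the discarded middle coefficients.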

Figure 1

Absolute values of Fourier coefficients $c_k$ and reconstruction of a contour from only the coefficients with $k \le 8$ and $k \ge N - 8$.

The zeroth coefficient $c_0$ is the centre of gravity of the contour. As the smallest period of the contour curve $f$ is the length $2\pi$ of the parameter interval, we can assume that at least one of $c_1 \neq 0$ or $c_{N-1} \neq 0$ holds. Which of these two coefficients is guaranteed to be non-zero depends on the orientation of the contour path: for Pavlidis' algorithm [8], for example, it is $c_1 \neq 0$. In general, however, it is not guaranteed that both coefficients are non-zero, as can be seen from the unit circle $f(t) := (\cos(t), \sin(t))$, which has coefficients $c_1 = 1$ and $c_{N-1} = 0$ for $N \ge 3$.

Based on elementary properties of the discrete Fourier transform, simple rules for the change of the coefficients c k under translation, scale, and rotation immediately follow. Let c k be a coefficient calculated before any of the following operations. Then the geometric operations have the following effects:

  • Translation. Adding the same complex number u to all points z(t) leads to new coefficients (c 0 + u,c 1,…,c N-1)

  • Scale. Multiplying all points z(t) with the same real factor d > 0 leads to new coefficients d · c k

  • Rotation. Rotation in the complex plane is the same as multiplying all points z(t) with a factor exp(j φ), where φ is the angle of rotation. This leads to new coefficients exp(j φ)c k

  • Start point shift. Starting the contour at a different point results in a cyclic shift of the vector $(z(t))_{t=0}^{N-1}$. If the index shift is $m$, then the new coefficients are $\exp(j k m\, 2\pi / N)\, c_k$
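The four rules above can be checked numerically. The following sketch is an illustration only: the test contour, the transformation parameters `u`, `d`, `phi`, `m` and the helper names are assumptions made for this example.

```python
import cmath

def dft(z):
    """Discrete Fourier coefficients c_k as in Equation (3)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def same(a, b, tol=1e-9):
    """True when two coefficient vectors agree up to rounding errors."""
    return all(abs(p - q) < tol for p, q in zip(a, b))

# Hypothetical test contour with N = 32 points (an ellipse plus a third harmonic).
N = 32
w = cmath.exp(2j * cmath.pi / N)
z = [3 * w**t + w**(-t) + 0.5 * w**(3 * t) for t in range(N)]
c = dft(z)

u, d, phi, m = 2 - 1j, 1.5, 0.7, 5      # arbitrary transformation parameters
rules = {
    "translation": same(dft([p + u for p in z]), [c[0] + u] + c[1:]),
    "scale": same(dft([d * p for p in z]), [d * ck for ck in c]),
    "rotation": same(dft([cmath.exp(1j * phi) * p for p in z]),
                     [cmath.exp(1j * phi) * ck for ck in c]),
    "start point shift": same(dft([z[(t + m) % N] for t in range(N)]),
                              [cmath.exp(2j * cmath.pi * k * m / N) * ck
                               for k, ck in enumerate(c)]),
}
```

Every entry of `rules` is `True` for this contour: translation only changes $c_0$, scaling and rotation multiply all coefficients by a common factor, and a start point shift multiplies $c_k$ by a unit-modulus phase factor.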

To achieve translation invariance, the first coefficient c 0 can be discarded because it is the only one that depends on translation. For scale invariance, all coefficients can be divided by the absolute value of a non-zero coefficient |c r |, r > 0. Usually, a fixed coefficient r is chosen, e.g. r = 1, but our experiments in Section 4.3 have shown that it is better to always choose the coefficient c r with the largest absolute value.

Since $|\exp(j\varphi)| = |\exp(j k m\, 2\pi/N)| = 1$, a simple approach to obtain a rotation and shift invariant descriptor is to drop the phase information completely and to use only the absolute values of the Fourier coefficients. This approach was used, for example, by Zhang and Lu in their comparative study [6]. The resulting absolute value descriptors are

\[
|l_k| := \frac{|c_k|}{|c_r|} \quad \text{for } k = 1, N-1, 2, N-2, \ldots \tag{5}
\]
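A minimal sketch of Equation (5) in Python, assuming that $c_r$ is chosen as the coefficient with the largest absolute value among $r > 0$ (the choice recommended in Section 4.3). The function name `abs_descriptors` is an assumption made for this example.

```python
import cmath

def dft(z):
    """Discrete Fourier coefficients c_k as in Equation (3)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def abs_descriptors(z, n_desc):
    """First n_desc values |c_k| / |c_r| in the order k = 1, N-1, 2, N-2, ..."""
    c = dft(z)
    n = len(c)
    norm = max(abs(ck) for ck in c[1:])     # largest |c_r| with r > 0
    order, i = [], 1
    while len(order) < n - 1:               # build the index order 1, N-1, 2, N-2, ...
        order.append(i)
        if n - i != i:
            order.append(n - i)
        i += 1
    return [abs(c[k]) / norm for k in order[:n_desc]]
```

Because $c_0$ is skipped and only ratios of absolute values are used, the result does not change when the contour is translated, scaled, rotated, or started at a different point. The sketch assumes a non-degenerate contour, so that at least one $c_r$ with $r > 0$ is non-zero.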

It is also possible to define invariant Fourier descriptors that still keep the phase information, as already observed by Dimov and Laskov [10]. Let $c_r \neq 0$ and $c_s \neq 0$, $r \neq s$, be two non-zero coefficients with polar angles $\alpha_r = \arg(c_r)$ and $\alpha_s = \arg(c_s)$. Rotation invariance is achieved by multiplying each coefficient with $\exp(-j\alpha_r)$, and shift invariance is established by replacing the polar angle $\alpha_k = \arg(c_k)$ with $s\alpha_k - k\alpha_s$. Combining these phase normalisations yields the invariant descriptors

\[
l_k := \frac{|c_k| \exp\big(j(s-r)[\alpha_k - \alpha_r]\big)}{|c_r| \exp\big(j(k-r)[\alpha_s - \alpha_r]\big)}
     = \frac{|c_k|}{|c_r|} \exp\big(j[(s-r)\alpha_k + (k-s)\alpha_r + (r-k)\alpha_s]\big) \tag{6}
\]

for $k = 1, N-1, 2, N-2, \ldots$. The two normalisation coefficients $c_r$ and $c_s$ should be chosen as the coefficients with the largest and second largest absolute values, respectively (see endnote a).
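The phase normalisation of Equation (6) might be sketched as follows. This is an illustration under stated assumptions: $c_r$ and $c_s$ are searched among the indices $1,\ldots,N-1$, the contour has at least two clearly non-zero coefficients with distinct magnitudes, and the function name `phase_descriptors` is hypothetical.

```python
import cmath

def dft(z):
    """Discrete Fourier coefficients c_k as in Equation (3)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def phase_descriptors(z, m):
    """Phase-normalised l_k of Equation (6) for k = 1, N-1, 2, N-2, ..., m, N-m."""
    c = dft(z)
    n = len(c)
    # r and s: indices of the largest and second largest |c_k| with k > 0
    ranked = sorted(range(1, n), key=lambda k: abs(c[k]), reverse=True)
    r, s = ranked[0], ranked[1]
    a_r, a_s = cmath.phase(c[r]), cmath.phase(c[s])
    out = []
    for i in range(1, m + 1):
        for k in (i, n - i):
            angle = (s - r) * cmath.phase(c[k]) + (k - s) * a_r + (r - k) * a_s
            out.append(abs(c[k]) / abs(c[r]) * cmath.exp(1j * angle))
    return out
```

The three integer factors in the exponent sum to zero, which cancels a common rotation angle, and the start point shift terms cancel in the same way. The sketch requires $m < N/2$ so that the indices $k$ and $N-k$ are distinct.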

Granlund [11] proposed different descriptors as

\[
d_k := \frac{c_{1+k}\, c_{N+1-k}}{c_1^{2}} \quad \text{for } k = 2,\ldots,N-2. \tag{7}
\]

These descriptors include the phase information, but because $d_k = d_{N-k}$, there is a considerable loss of information compared to (6). Granlund also defined the $(N-1)(N-2)$ (sic!) descriptors $c_{1+k}^{\,i}\, c_{N+1-i}^{\,k} / c_1^{\,i+k}$, but these have a lot of redundancy and it is not clear how to select a small subset therefrom.

2.2 Multidimensional contour representation

Instead of interpreting the contour coordinates as complex numbers, the x and y coordinates can alternatively be transformed separately:

\[
\begin{aligned}
c_k^{(x)} &= \frac{1}{N} \sum_{t=0}^{N-1} x(t) \exp(-j 2\pi k t / N)\\
c_k^{(y)} &= \frac{1}{N} \sum_{t=0}^{N-1} y(t) \exp(-j 2\pi k t / N)
\end{aligned} \tag{8}
\]

or, split into real and imaginary parts:

\[
\begin{aligned}
a_k^{(x)} &= \frac{1}{N} \sum_{t=0}^{N-1} x(t) \cos(2\pi k t / N), &
b_k^{(x)} &= \frac{1}{N} \sum_{t=0}^{N-1} x(t) \sin(2\pi k t / N),\\
a_k^{(y)} &= \frac{1}{N} \sum_{t=0}^{N-1} y(t) \cos(2\pi k t / N), &
b_k^{(y)} &= \frac{1}{N} \sum_{t=0}^{N-1} y(t) \sin(2\pi k t / N).
\end{aligned} \tag{9}
\]

Half of these components are redundant because $b_0^{(x)} = b_0^{(y)} = 0$ and, for $0 < k \le N/2$,

\[
\begin{aligned}
a_k^{(x)} &= a_{N-k}^{(x)}, & b_k^{(x)} &= -b_{N-k}^{(x)},\\
a_k^{(y)} &= a_{N-k}^{(y)}, & b_k^{(y)} &= -b_{N-k}^{(y)}.
\end{aligned} \tag{10}
\]

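The inverse formula then reads

\[
\begin{pmatrix} x(t)\\ y(t) \end{pmatrix}
= \begin{pmatrix} a_0^{(x)}\\ a_0^{(y)} \end{pmatrix}
+ 2 \sum_{k=1}^{\lceil (N-1)/2 \rceil}
\begin{pmatrix} a_k^{(x)} & b_k^{(x)}\\ a_k^{(y)} & b_k^{(y)} \end{pmatrix}
\cdot
\begin{pmatrix} \cos(2\pi k t / N)\\ \sin(2\pi k t / N) \end{pmatrix} \tag{11}
\]

where $\lceil (N-1)/2 \rceil$ is the smallest integer $i$ with $i \ge (N-1)/2$. For the sake of simplicity, let $N$ be an odd number throughout the rest of this section.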

The real coefficients (9) of the multidimensional representation are connected to the complex coefficients (3) by

\[
c_k = c_k^{(x)} + j\,c_k^{(y)} = a_k^{(x)} + b_k^{(y)} + j\,\big[a_k^{(y)} - b_k^{(x)}\big].
\]

In contrast to $c_k$, higher frequencies directly correspond to higher indices $k$ of the real coefficients $a_k^{(x/y)}$, $b_k^{(x/y)}$ because of the symmetry relations (10).
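The connection between the real coefficients (9) and the complex coefficients (3) stated above follows in one step from Euler's formula, $\exp(-j\theta) = \cos\theta - j\sin\theta$, applied to (8). The intermediate identities below are a worked derivation added for illustration; they use only the definitions (3), (8) and (9).

```latex
\begin{aligned}
c_k^{(x)} &= \frac{1}{N}\sum_{t=0}^{N-1} x(t)\,\bigl[\cos(2\pi k t/N) - j\,\sin(2\pi k t/N)\bigr]
           = a_k^{(x)} - j\,b_k^{(x)},\\
c_k^{(y)} &= a_k^{(y)} - j\,b_k^{(y)},\\
c_k &= c_k^{(x)} + j\,c_k^{(y)}
     = \bigl(a_k^{(x)} - j\,b_k^{(x)}\bigr) + j\,\bigl(a_k^{(y)} - j\,b_k^{(y)}\bigr)
     = a_k^{(x)} + b_k^{(y)} + j\,\bigl[a_k^{(y)} - b_k^{(x)}\bigr].
\end{aligned}
```

The first line uses $z(t) = x(t) + j\,y(t)$ and the linearity of the transform (3).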

Kuhl and Giardina [12] interpreted each summand

\[
\begin{pmatrix} x_k(t)\\ y_k(t) \end{pmatrix}
= 2 \begin{pmatrix} a_k^{(x)} & b_k^{(x)}\\ a_k^{(y)} & b_k^{(y)} \end{pmatrix}
\cdot
\begin{pmatrix} \cos(2\pi k t / N)\\ \sin(2\pi k t / N) \end{pmatrix}
\]

in (11) as a parameterisation with parameter $t \in [0,2\pi]$ of an ellipse that visualises the $k$th Fourier coefficients. Therefore they called the resulting Fourier descriptors elliptic features. Based upon this idea, Lin and Hwang [13] proposed the following translation, rotation, and shift invariant (but not scale invariant) Fourier descriptors:

\[
\begin{aligned}
I_k &:= \big(a_k^{(x)}\big)^2 + \big(b_k^{(x)}\big)^2 + \big(a_k^{(y)}\big)^2 + \big(b_k^{(y)}\big)^2\\
J_k &:= \det \begin{pmatrix} a_k^{(x)} & b_k^{(x)}\\ a_k^{(y)} & b_k^{(y)} \end{pmatrix}
      = a_k^{(x)} b_k^{(y)} - b_k^{(x)} a_k^{(y)}\\
K_k &:= \operatorname{sgn}\Big( \big(a_k^{(x)} a_k^{(y)} + b_k^{(x)} b_k^{(y)}\big)\big(|c_i^{(y)}|^2 - |c_i^{(x)}|^2\big)
      + \big(a_i^{(x)} a_i^{(y)} + b_i^{(x)} b_i^{(y)}\big)\big(|c_k^{(x)}|^2 - |c_k^{(y)}|^2\big) \Big)\\
    &\quad \times \Big( |c_i^{(x)}|^2 |c_k^{(x)}|^2 + |c_i^{(y)}|^2 |c_k^{(y)}|^2
      + 2 \big(a_i^{(x)} a_i^{(y)} + b_i^{(x)} b_i^{(y)}\big)\big(a_k^{(x)} a_k^{(y)} + b_k^{(x)} b_k^{(y)}\big) \Big)
\end{aligned} \tag{12}
\]

where $i \in \{1,\ldots,(N-1)/2\}$ is a fixed index and sgn denotes the signum function. The properties of the real Fourier coefficients imply

\[
I_{N-k} = I_k, \quad J_{N-k} = J_k, \quad K_{N-k} = K_k.
\]

To make these features also scale invariant, they additionally need to be divided by a normalisation factor, e.g. $I_1$, which leads to the invariant descriptors $I_k/I_1$, $J_k/I_1$, and $K_k/I_1^2$.

A nice property of the features (12) is that they have a geometric interpretation: $I_k$ is the sum of two semi-axis lengths of the $k$th ellipse, $|J_k|$ is proportional to the area of the $k$th ellipse, and $K_k$ contains the phase difference between ellipses $i$ and $k$ for the fixed $i$. Compared to the $N - 2$ complex features (6), however, the $\frac{3}{2}(N-1)$ real elliptic Fourier features contain considerably less information about the shape. Lin and Hwang tried to compensate for this with additional features, which, however, were not rotation invariant.

An interesting generalisation of Fourier descriptors from two-dimensional curves to n-dimensional closed curves was made by Badreldin et al. [14]. They transformed each component separately and then built a vector containing the $\ell^2$-norms of all Fourier coefficients of a given index. In two dimensions this is equivalent to the descriptors by Shridhar and Badreldin [15] ($k = 0,\ldots,(N-1)/2$):

\[
\sqrt{|c_k^{(x)}|^2 + |c_k^{(y)}|^2}
= \sqrt{\big(a_k^{(x)}\big)^2 + \big(b_k^{(x)}\big)^2 + \big(a_k^{(y)}\big)^2 + \big(b_k^{(y)}\big)^2} \tag{13}
\]

Rotation and start point shift invariance are obtained because absolute values are used, and translation invariance results from discarding $k = 0$. For scale invariance, the descriptors additionally need to be divided by a normalisation factor, e.g. $\sqrt{|c_1^{(x)}|^2 + |c_1^{(y)}|^2}$. However, the Fourier descriptors (13) discard a considerable portion of the shape information: not only is the phase information lost, but the x- and y-components are also decoupled. Therefore, for example, a shift of x-coordinates without a change to y-coordinates cannot be detected.

2.3 Scalar contour representation

A two-dimensional contour can also be represented in one dimension by mapping it to a one-dimensional signature function: $(x(t),y(t)) \mapsto f(t)$. The signature function $f$ can already be invariant under translation, scaling and rotation, like Zahn and Roskies' cumulative angular function [16], or the invariance normalisation can be applied after the Fourier transform. Mapping two dimensions onto one generally leads to some loss of shape information (see Figure 2), but the hope is that the essential features are still captured for most shapes. For an overview of possible signature functions, see [6].

Figure 2

Two different shapes (grey) with the same signature function 'centroid distance' $r(t)$ defined in Equation (14).

In the comparative study [6], the centroid distance performed best. Let $(x_0, y_0) := \frac{1}{N} \sum_{k=0}^{N-1} (x(k), y(k))$ be the centre of gravity of the contour. Then the centroid distance is defined as

\[
r(t) := |(x(t), y(t)) - (x_0, y_0)| = \sqrt{(x(t) - x_0)^2 + (y(t) - y_0)^2}. \tag{14}
\]

These values are already rotation and translation invariant. Let $(c_k)_{k=0}^{N-1}$ be the (complex) Fourier transform of $(r(t))_{t=0}^{N-1}$, i.e.

\[
c_k = \frac{1}{N} \sum_{t=0}^{N-1} r(t) \exp\!\left(-j\,\frac{2\pi k t}{N}\right). \tag{15}
\]

Descriptors $R_k$ that are also scale and start point shift invariant can be obtained with the phase normalisation

\[
R_k = \exp(j s \alpha_k - j k \alpha_s) \cdot \frac{|c_k|}{|c_0|}, \tag{16}
\]

where $\alpha_k = \arg(c_k)$ denotes the polar angle of coefficient $c_k$, and $\alpha_s$ is the polar angle of the coefficient $c_s$, $s \ge 1$, with the second largest absolute value. Note that $|c_0| = c_0$ is always the largest absolute value because

\[
c_0 = \frac{1}{N} \sum_t r(t)
    = \frac{1}{N} \sum_t \left| r(t) \exp\!\left(-j\,\frac{2\pi k t}{N}\right) \right|
    \ge \left| \frac{1}{N} \sum_t r(t) \exp\!\left(-j\,\frac{2\pi k t}{N}\right) \right|
    = |c_k| \quad \text{for all } k.
\]

An alternative simpler normalisation is to discard the phase information and use the absolute value |R k |. This normalisation was used by Zhang and Lu.
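The absolute value variant $|R_k|$ can be sketched as follows. The function name `centroid_descriptors` is an assumption, and the sketch starts at $k = 1$ because $|R_0| = 1$ carries no information; since $r(t)$ is real, $|c_k| = |c_{N-k}|$ and only indices up to $N/2$ are meaningful.

```python
import cmath
import math

def dft(v):
    """Discrete Fourier coefficients as in Equation (15)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def centroid_descriptors(points, n_desc):
    """|R_k| = |c_k| / |c_0| for k = 1..n_desc, from the centroid distance (14)."""
    n = len(points)
    x0 = sum(x for x, _ in points) / n       # centre of gravity of the contour
    y0 = sum(y for _, y in points) / n
    r = [math.hypot(x - x0, y - y0) for x, y in points]
    c = dft(r)
    return [abs(c[k]) / abs(c[0]) for k in range(1, n_desc + 1)]
```

By the inequality shown above, all returned values lie in the interval $[0, 1]$.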

3 Application to broken shapes

All Fourier descriptors described in Section 2 start from a closed contour description of the shape and are therefore not applicable when the shape is broken, i.e. consists of more than one connected component. In this section we first present a method to describe the contour of an arbitrary (broken or unbroken) shape by a periodic three-dimensional curve and then derive different Fourier descriptors for this curve which are invariant under translation, scale, rotation, and start point shift.

3.1 Contour representation of broken shapes

A simple solution to circumvent the problem of broken shapes would be to replace the shape parts with a single closed curve that contains all parts and to compute the Fourier descriptors from this curve instead. An obvious candidate for such a curve is the convex hull, i.e. the smallest convex polygon that contains all points of the shape. There are efficient algorithms for computing the convex hull from a set of points [17]. As can be seen in Figure 3, replacing a contour with its convex hull loses a considerable amount of information because very different shapes can have the same convex hull. To encode more shape information, we therefore compute for each point $(x,y)$ on the convex hull its closest Euclidean distance $d$ to the shape $S$:

\[
d = \min\{\, |(x,y) - (u,v)| : (u,v) \in S \,\}. \tag{17}
\]

Figure 3

Two different shapes (grey) with the same convex hull (solid black).

Instead of a two-dimensional contour $(x(t),y(t))$, we then obtain a three-dimensional parametric curve $(x(t),y(t),d(t))$ representing the shape, as shown in Figure 4.

Figure 4

Representation of a broken shape (grey) by a three-dimensional curve $(x(t), y(t), d(t))$. $(x,y)$ are the coordinates of the convex hull and $d$ is the distance between convex hull and shape.

When implementing an algorithm for computing the contour representation $(x(t),y(t),d(t))$, two questions arise: how the convex hull should be sampled and how the distances $d(t)$ can be computed efficiently. The vertices of the convex hull polygon can be obtained, for example, with Graham's scan algorithm [17]. These vertices are obvious sampling points, but the distance between adjacent vertices can be arbitrarily large, so that the edges need to be sampled as well. As the image sampling distance is one pixel, it is natural to compute the edge length $l$ and to add $\lfloor l - 1 \rfloor$ equidistant sampling points on each edge.
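The edge sampling rule can be sketched as follows. The function name `sample_hull` is an assumption, and the hull vertices are taken as given (in order around the polygon) rather than computed with Graham's scan.

```python
import math

def sample_hull(vertices):
    """Sample a closed convex polygon: every vertex plus floor(l - 1)
    equidistant points on each edge of length l."""
    points = []
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        length = math.hypot(x1 - x0, y1 - y0)
        extra = max(0, math.floor(length - 1))   # no extra points on short edges
        points.append((x0, y0))
        for q in range(1, extra + 1):
            f = q / (extra + 1)                  # fraction along the edge
            points.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return points
```

For an edge of integer length $l$, this yields a sampling distance of exactly one pixel; for other lengths the sampling distance lies between one and two pixels.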

To compute the distance d(t) for each sampling point x(t),y(t), two efficient approaches are possible:

  • Compute the distance transform image[18] of the original shape and approximate d(t) by linear interpolation of the distance image at the real point x(t),y(t).

  • Store all shape contour points in a kd-tree[19] and compute (17) for each sampling point x(t),y(t) with a nearest neighbour search in the kd-tree.

To estimate the runtime complexity of both algorithms, let us first observe that a shape with an $n \times n$ bounding box has $O(n^2)$ volume pixels, but only $O(n)$ contour points. As the fastest algorithms for computing the distance transform require two runs over all image pixels [18], the first algorithm requires $O(n^2)$ operations to compute all contour distances. The second algorithm requires $O(n \log n)$ operations for building the kd-tree and $O(n \log n)$ operations for querying all nearest neighbours, resulting in a total runtime of $O(n \log n)$. The second approach is thus faster, and we have implemented it with the kd-tree library shipped with the Gamera framework [19].
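The nearest neighbour search of the second approach can be illustrated with a minimal two-dimensional kd-tree in pure Python. This is not the Gamera kd-tree library, whose interface is not shown here; the functions `build` and `nearest_dist` are written for this example only, and the build step sorts at every level, so it costs $O(n \log^2 n)$ rather than $O(n \log n)$.

```python
import math

def build(points, depth=0):
    """Build a 2-d tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % 2                         # alternate between x and y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def nearest_dist(node, q, depth=0, best=math.inf):
    """Distance from q to its nearest neighbour in the tree, i.e. d in Equation (17)."""
    if node is None:
        return best
    point, left, right = node
    best = min(best, math.dist(point, q))
    diff = q[depth % 2] - point[depth % 2]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_dist(near, q, depth + 1, best)
    if abs(diff) < best:                     # the far side may still hold a closer point
        best = nearest_dist(far, q, depth + 1, best)
    return best

# Hypothetical shape contour points and two hull sampling points.
shape = [(2, 2), (3, 5), (7, 4)]
tree = build(shape)
d = [nearest_dist(tree, q) for q in [(0, 0), (7, 7)]]
```

The pruning test `abs(diff) < best` is what keeps the average query cost logarithmic: a subtree is only visited when the splitting line is closer to the query point than the best distance found so far.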

3.2 Broken shape Fourier descriptors

For the derivation of invariant Fourier descriptors for the three-dimensional point sequence $(x(t), y(t), d(t))_{t=0}^{N-1}$, we propose three different approaches. Our first Fourier descriptor is built upon the techniques in Section 2.1. It builds a complex number by taking the centroid distance $r(t) := |(x(t),y(t)) - (x_0,y_0)|$ (see Equation (14)) as real part and the distance $d(t)$ as imaginary part. The sequence $(r(t) + j\,d(t))_{t=0}^{N-1}$ is already invariant under translation and rotation. Scale and start point shift invariance of the Fourier coefficients

\[
c_k := \frac{1}{N} \sum_{t=0}^{N-1} [r(t) + j\,d(t)] \exp(-j 2\pi k t / N) \tag{18}
\]

is achieved either with the phase normalisation (compare Equation (16))

\[
A_k := \exp(j s \alpha_k - j k \alpha_s)\, \frac{|c_k|}{|c_r|} \tag{19}
\]

or, simply, by using the absolute values $|A_k|$, where $\alpha_k = \arg(c_k)$ denotes the polar angle of coefficient $c_k$, and $c_r$ and $c_s$ are the two coefficients with the largest and second largest absolute values ($r \ge 0$, $0 < s < N/2$), and $\alpha_s = \arg(c_s)$ is the polar angle of $c_s$. For a fixed number of $n$ descriptors, the first $n$ values in the sequence $A_0, A_{N-1}, A_1, A_{N-2}, \ldots$ should be selected.

The second Fourier descriptor under investigation follows the multidimensional approach by Badreldin et al. as described in Section 2.2. Let $c_k^{(x)}$, $c_k^{(y)}$, and $c_k^{(d)}$ be the complex Fourier coefficients of the three dimensions $x(t)$, $y(t)$, $d(t)$ according to (8). Invariant Fourier descriptors are then obtained as ($k = 1, 2, \ldots, \frac{N-1}{2}$)

\[
B_k := \frac{\sqrt{|c_k^{(x)}|^2 + |c_k^{(y)}|^2 + |c_k^{(d)}|^2}}{\sqrt{|c_1^{(x)}|^2 + |c_1^{(y)}|^2}}. \tag{20}
\]

For a fixed number of n descriptors, the first n values B 1,B 2,…,B n should be selected.

The third descriptor uses the scalar representation $r(t) - d(t)$, which is already invariant under translation and rotation. It is an approximation to the local radius of the shape and leads to the Fourier coefficients

\[
c_k = \frac{1}{N} \sum_{t=0}^{N-1} [r(t) - d(t)] \exp(-j 2\pi k t / N). \tag{21}
\]

As the values $r(t) - d(t)$ are real, the Fourier coefficients at $k$ and $N-k$ are complex conjugates: $c_k = \overline{c_{N-k}}$, $1 \le k < N$. Therefore, only the values for $0 \le k \le \frac{N-1}{2}$ are relevant. Again, the coefficients (21) can be made scale and start point shift invariant either with the phase normalisation

\[
C_k := \exp(j s \alpha_k - j k \alpha_s)\, \frac{|c_k|}{|c_r|} \tag{22}
\]

or, simply, by using the absolute values |C k |, where α k  = arg(c k ) denotes the polar angle of coefficient c k , and c r and c s are the two coefficients with the largest and second largest absolute values (r ≥ 0, 0 < s < N/2). For a fixed number of n descriptors, the first n values C 0,C 1,C 2,… should be selected.
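The three descriptors of this subsection can be sketched together in their absolute value variants. This is a sketch under stated assumptions rather than the toolkit implementation: the function name `broken_descriptors` is hypothetical, the sequences `x`, `y`, `d` are assumed to be already sampled as described in Section 3.1, $c_r$ is taken as the coefficient with the largest absolute value, and broken B uses Euclidean norms in numerator and denominator.

```python
import cmath
import math

def dft(v):
    """Discrete Fourier coefficients of a real or complex sequence."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def broken_descriptors(x, y, d, n_desc):
    """Absolute value variants |A_k|, B_k and |C_k| of Equations (19), (20), (22)."""
    n = len(x)
    x0, y0 = sum(x) / n, sum(y) / n
    r = [math.hypot(xi - x0, yi - y0) for xi, yi in zip(x, y)]

    # broken A: complex signature r + j d, index order 0, N-1, 1, N-2, ...
    ca = dft([complex(ri, di) for ri, di in zip(r, d)])
    order = []
    for i in range((n + 1) // 2):
        order.append(i)
        if n - 1 - i != i:
            order.append(n - 1 - i)
    norm_a = max(abs(v) for v in ca)
    a = [abs(ca[k]) / norm_a for k in order[:n_desc]]

    # broken B: x, y and d transformed separately, k = 1, 2, ...
    cx, cy, cd = dft(x), dft(y), dft(d)
    denom = math.hypot(abs(cx[1]), abs(cy[1]))
    b = [math.sqrt(abs(cx[k]) ** 2 + abs(cy[k]) ** 2 + abs(cd[k]) ** 2) / denom
         for k in range(1, n_desc + 1)]

    # broken C: real signature r - d, k = 0, 1, 2, ...
    cc = dft([ri - di for ri, di in zip(r, d)])
    norm_c = max(abs(v) for v in cc)
    c = [abs(cc[k]) / norm_c for k in range(n_desc)]
    return a, b, c
```

The sketch expects `n_desc` to be at most $(N-1)/2$, so that the index ranges of broken B and broken C stay within the non-redundant coefficients.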

4 Evaluation

We have evaluated our new Fourier descriptors on two different data sets, the MPEG-7 database of unbroken shapes and a new real-world data set with broken shapes from scans of 19th century chant books in the Eastern neumatic notation[20]. Both data sets are described in detail in Section 4.2. Apart from a performance comparison of the broken shape descriptors (Section 4.5), we have also investigated the effect of different normalisation schemes (Section 4.3) and the number of descriptors needed for similarity-based retrieval (Section 4.4).

We have implemented all Fourier descriptors as a toolkit for the Gamera framework for document analysis and recognition (endnote b) [21]. The toolkit is published under a free license together with the new data set of broken neumes in the 'Addons' section of the Gamera website (endnote c). For convenience, a brief summary of all Fourier descriptors under investigation is given in Table 1.

Table 1 Names and symbols used for the Fourier descriptors in the present study

4.1 Performance measures

As evaluation criteria for shape based image retrieval, we have used two different performance measures, the precision/recall curve and the leave-one-out error rate of a k-nearest neighbour (k-NN) classification. For a single query image belonging to class ω, precision and recall are defined as follows: let n ω be the number of all images of class ω, and let k ω be the number of images of class ω among the k-nearest neighbours of the query image; then k ω /k is the precision and k ω /n ω is the recall for this query. The precision of all test images is averaged to yield a single precision value. Typically, k is not fixed, but the precision is measured for a given recall rate. When the recall is increased, the precision will generally decrease, but less so for better similarity measures. The decrease of the precision/recall curve can thus serve as a performance measure for similarity-based retrieval.

To evaluate the classification performance of the different Fourier descriptors, a natural criterion is the cross-validation or leave-one-out error rate of a k-NN classifier because it is an unbiased estimator of the expected error rate[22]. A k-NN classifier assigns a test sample to the majority class among its k-nearest training samples. The leave-one-out error rate is the average error rate when each sample is classified with a k-NN classifier that has been trained with the remaining n-1 samples, thereby yielding a single performance measure.
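Both performance measures can be sketched in a few lines. The following is an illustration under assumptions that the text does not fix: neighbours are ranked by Euclidean distance between feature vectors, the query itself is excluded from the ranking and from $n_\omega$, and ties in the majority vote are resolved in favour of the class encountered first. The function names are hypothetical.

```python
import math
from collections import Counter

def leave_one_out_error(features, labels, k=1):
    """Fraction of samples misclassified by k-NN trained on the other n-1 samples."""
    errors = 0
    for i, f in enumerate(features):
        ranked = sorted((math.dist(f, g), labels[j])
                        for j, g in enumerate(features) if j != i)
        votes = Counter(lab for _, lab in ranked[:k])
        if votes.most_common(1)[0][0] != labels[i]:
            errors += 1
    return errors / len(features)

def precision_recall(features, labels, query, k):
    """Precision k_w / k and recall k_w / n_w for a single query index."""
    ranked = sorted((math.dist(features[query], g), labels[j])
                    for j, g in enumerate(features) if j != query)
    k_w = sum(1 for _, lab in ranked[:k] if lab == labels[query])
    n_w = sum(1 for j, lab in enumerate(labels)
              if lab == labels[query] and j != query)
    return k_w / k, k_w / n_w
```

Averaging the precision over all queries at a fixed recall level then gives one point of the precision/recall curve.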

4.2 Data sets

A data set that has already been used for the evaluation of different Fourier descriptors in the study [6] is part B of the MPEG-7 CE-Shape-1 database (endnote d) [23]. It consists of 1,400 shapes that have been classified into 70 classes with 20 similar items in each class. Figure 5 shows sample shapes from this data set. As pointed out by the authors of the data set, a 100% retrieval rate is impossible because some shapes are more similar to the shapes from different classes than to their own class so that 'it is not possible to group them into the same class using only shape knowledge' [23]. In some images, there are noise pixels which form additional small random shapes. In order to ignore this noise, we have computed the contour of the largest connected component only for each image.

Figure 5

Example shapes from the MPEG-7 CE-1 part B data set. Samples in each row belong to the same class.

The MPEG-7 data set does not contain any broken shapes and thus allows for a performance comparison of the new descriptors with the ordinary Fourier descriptors described in Section 2. To also test the new descriptors from Section 3.2 on actual broken shapes, we have created a data set of broken glyphs from the four 19th century music prints in Byzantine neume notation that have also been used in [20] (sources HA-1825, HS-1825, AM-1847, and MP1-1850). This 'NEUMES' data set consists of 640 images from 40 different classes with 16 items in each class. Due to varying print quality, some glyphs are connected while others are randomly broken into up to eight fragments. As can be seen in Figure 6, some neumes are mirrored or elongated versions of different neumes. It is thus important that the shape descriptors used for discrimination are not invariant to axial mirroring or arbitrary affine transformations. The sample images in Figure 6 are not rotated; to make rotational invariance of the shape descriptors mandatory, we have rotated the 16 items in each class in steps of 22.5°.

Figure 6

Unrotated example shapes from our newly created NEUMES data set. Samples in each row belong to the same class.

4.3 Normalisation schemes

The Fourier descriptors from Section 2 can be normalised (i.e. made invariant) in different ways. There are generally two degrees of freedom:

  • Phase normalisation versus absolute values

  • The index choices s and r for the normalisation coefficients c r and c s

Figure 7 shows the effect of the different normalisation schemes on the leave-one-out recognition rate on the MPEG-7 data set for the complex position Fourier descriptor. Both for the absolute values and the phase-normalised descriptors, it is better to normalise not with a fixed coefficient, but with the coefficient with the largest absolute value (which may vary from shape to shape). This normalisation limits the numeric range of the descriptors to a fixed interval, a feature normalisation scheme that is known to improve the recognition rate in many cases [24].

Figure 7

Impact of different normalisations on the complex position Fourier descriptor from Equation (6). Recognition rates have been measured by leave-one-out with a k-NN classifier (k = 1) on the MPEG-7 data set.

The observation that the phase normalisation performs worse than the absolute values is surprising because the phase-normalised descriptors carry information that is lost in the absolute values. It turned out, however, that the phase angles of the descriptors are much less robust with respect to small changes in the contour coordinates. To demonstrate this phenomenon, we did a small Monte Carlo experiment. We added normally distributed random noise independently to the x and y coordinates of a sample contour and measured the deviation $\Delta$ of the resulting descriptors $\tilde{l}_k$ and $|\tilde{l}_k|$ as

\[
\begin{aligned}
\Delta_l &:= \frac{1}{L} \sum_{k=1}^{m} \Big( |l_k - \tilde{l}_k| + |l_{N-k} - \tilde{l}_{N-k}| \Big)\\
\Delta_{|l|} &:= \frac{1}{L} \sum_{k=1}^{m} \Big( \big||l_k| - |\tilde{l}_k|\big| + \big||l_{N-k}| - |\tilde{l}_{N-k}|\big| \Big)
\end{aligned}
\]

where $l_k$ are the undisturbed descriptors and $L := \sum_{k=1}^{m} \big(|l_k| + |l_{N-k}|\big)$. Figure 8 shows these deviations, averaged over 10,000 random experiments, as a function of the variance $\sigma^2$ of the random noise. The phase normalisation obviously is much less robust. The same phenomenon already occurs for the Fourier coefficients $c_k$ due to $\big||c_k| - |\tilde{c}_k|\big| \le |c_k - \tilde{c}_k|$, and this is amplified by the phase normalisation (6) because the phase angles are multiplied with large integer values ($s-r$, $k-s$, and $r-k$).
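A scaled-down version of such a Monte Carlo experiment might look as follows. It is a sketch only: the function names, the default of 200 trials (instead of 10,000, to keep the direct $O(N^2)$ transform fast) and the fixed random seed are assumptions, and `sigma` is the standard deviation of the noise, i.e. the square root of the variance $\sigma^2$ used in the text.

```python
import cmath
import random

def dft(z):
    """Discrete Fourier coefficients c_k as in Equation (3)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def phase_descriptors(z, m):
    """l_k of Equation (6) for k = 1, N-1, ..., m, N-m (requires m < N/2)."""
    c = dft(z)
    n = len(c)
    ranked = sorted(range(1, n), key=lambda k: abs(c[k]), reverse=True)
    r, s = ranked[0], ranked[1]
    a_r, a_s = cmath.phase(c[r]), cmath.phase(c[s])
    out = []
    for i in range(1, m + 1):
        for k in (i, n - i):
            angle = (s - r) * cmath.phase(c[k]) + (k - s) * a_r + (r - k) * a_s
            out.append(abs(c[k]) / abs(c[r]) * cmath.exp(1j * angle))
    return out

def deviations(z, sigma, m=4, trials=200, seed=0):
    """Average deviations (delta_l, delta_abs_l) under Gaussian coordinate noise."""
    rng = random.Random(seed)
    ref = phase_descriptors(z, m)
    total = sum(abs(v) for v in ref)                 # the normaliser L
    d_phase = d_abs = 0.0
    for _ in range(trials):
        noisy = [p + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for p in z]
        cur = phase_descriptors(noisy, m)
        d_phase += sum(abs(a - b) for a, b in zip(ref, cur)) / total
        d_abs += sum(abs(abs(a) - abs(b)) for a, b in zip(ref, cur)) / total
    return d_phase / trials, d_abs / trials
```

By the reverse triangle inequality, the second returned value can never exceed the first; how large the gap is depends on the contour and on the noise level.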

Figure 8

Impact of random disturbances on the complex position Fourier descriptors $l_k$. Disturbances have been simulated as Gaussian random noise with variance $\sigma^2$.

Further evidence for the instability of the phase angles can be derived from Figure 9, which shows the recognition rates for different normalisations of the Fourier descriptor broken A on the NEUMES data set. When the phase normalisation coefficient $c_s$ is chosen as the largest coefficient for $0 < s < N$, this often results in a high value $s \approx N - 1$. This amplifies the phase angle error of $\alpha_k = \arg c_k$ because $\alpha_k$ is multiplied with $s$ in the phase normalisation (19), and the recognition rate becomes even lower than with a fixed normalisation with $s = 1$. When the maximum coefficient $c_s$ is only searched for small $s$ (in most cases this led to $s = 2$), the recognition rate is considerably better. Nevertheless, the absolute values performed better still in all cases.

Figure 9

Impact of different normalisations on the broken A Fourier descriptor from Equation (19). Recognition rates have been measured by leave-one-out with a k-NN classifier (k = 3) on the NEUMES data set. Note that the curves for $r = 0$ and $r = \arg\max |c_r|$ are identical.

We therefore conclude that it is generally better to use the absolute values instead of the phase-normalised coefficients and that the scale invariance normalisation should be done with the coefficient c r with the largest absolute value rather than with fixed r.

4.4 Number of descriptors

Figures 7 and 9 show that a small number of Fourier descriptors is sufficient for shape retrieval. In our experiments, this behaviour was universal, as can be seen in Figure 10: using more than 20 descriptors generally does not increase the recognition rate any further. In our experiments in the following subsection, we have therefore limited the number of descriptors to 60, to be on the safe side. It is interesting to note that these numbers, which are derived from the leave-one-out recognition rates, are much lower than the numbers derived by Zhang and Lu from the absolute magnitude of the descriptors [6]. The reason for this difference is that a criterion based on the magnitude does not take the discriminative power of the coefficients into account.

Figure 10

Leave-one-out recognition rates of a k-NN classifier (k = 1) on the MPEG-7 data set. All descriptors have been normalised with the largest coefficient $c_r$ and the absolute values have been taken.

4.5 Descriptor comparison

Figure 11 shows the precision/recall curves for all investigated Fourier descriptors on the MPEG-7 data set. The recommendations from the preceding subsections have been applied to all descriptors, i.e. they have been normalised with the largest coefficient and by taking the absolute value, and the first 60 descriptors have been used.

Figure 11

Precision/recall curves for all Fourier descriptors on the MPEG-7 data set. In all cases, 60 descriptors have been used.

The best performing Fourier descriptor was the complex position, which seems to contradict the experiments by Zhang and Lu [6], who found the centroid distance to be best performing. This discrepancy can, however, be explained by the different choice of the normalisation coefficient $c_r$, as shown in Figure 12: when the complex position Fourier descriptor is normalised with a fixed coefficient, e.g. $r = 1$, it performs worse than the centroid distance, as was the case in the study by Zhang and Lu.

Figure 12

Comparison between complex position and centroid distance. The complex position Fourier descriptor is only better than the centroid distance when the normalisation coefficient $c_r$ is chosen as the largest coefficient ('max r').

On the MPEG-7 data set, the broken shape Fourier descriptors did not perform as well as the best single shape Fourier descriptor (complex position), but the precision/recall curve of our broken C descriptor is almost identical to that of the centroid distance Fourier descriptor, with broken A and broken B performing only slightly worse. That the broken C and centroid distance descriptors behave very similarly is hardly surprising because for single closed shapes the signature $r(t) - d(t)$ is simply an approximation of the signature (14).

On the NEUMES data set, only our new Fourier descriptors are applicable, and the resulting precision/recall curves are shown in Figure 13. On this data set, there is a more distinct difference between the three broken Fourier descriptors. In fact, the ranking is in ascending order of the information loss of the descriptor: mapping the complex number $r + j\,d$ (broken A) onto the real number $r - d$ (broken C) loses some information, and the broken B descriptor loses even more shape information because it decouples the x, y, and d coordinates. On the MPEG-7 data set, this has a smaller impact on the recognition performance because the shapes within a single class vary considerably. On the NEUMES data set, in contrast, this is of importance because detailed shape information is required for class discrimination; see e.g. the two bottom rows in Figure 6.

Figure 13

Precision/recall curves for the broken shape Fourier descriptors on the NEUMES data set. In all cases, 60 descriptors have been used.

5 Conclusions

The new Fourier descriptors for broken shapes have shown retrieval performances that were comparable to common closed contour shape descriptors like the 'centroid distance’ Fourier descriptor. As the new descriptors have the benefit of being applicable to arbitrary shapes (connected or broken), they can serve as a general replacement for other Fourier descriptors lacking this flexibility.

Our experiments have shown that it is generally better to use the absolute values rather than the phase-normalised descriptors and that scale invariance normalisation should be done with the largest coefficient, rather than with a fixed coefficient. For practical applications on real-world data, we recommend using the 'broken A' Fourier descriptor.

Endnotes

a When only one coefficient is non-zero, this can be chosen as c r and no phase normalisation is necessary.

b http://gamera.sf.net/.

c http://gamera.informatik.hsnr.de/addons/fd/.

d http://www.dabi.temple.edu/~shape/MPEG7/.