1 Introduction

Visual lip reading is a technology that combines machine vision and language perception. A visual lip reading system first locates the face region in images or videos using machine vision, then extracts features describing the speaker's mouth movements and maps these features to pronunciations with a recognition model, thereby recognizing the speech content. Such systems have received increasing attention in human-computer interaction (HCI), pattern recognition (PR), and artificial intelligence (AI) in recent years, because they address the dramatic drop in recognition rate that audio speech recognition suffers under interference or noise. Lip segmentation is fundamental to visual lip reading systems, because the accuracy of the segmentation result directly affects the recognition rate.

There are various methods of lip segmentation, such as a method based on the MAP-MRF framework [1], a clustering approach that does not require the number of segments to be known [2], and the active contour model (ACM) used in this paper. ACM is one of the most popular models for image segmentation and has several advantages over classical segmentation methods [3]. First, ACM can locate object boundaries with sub-pixel accuracy [3, 4]. Second, the model can be formulated within the framework of the energy minimization principle. Third, it yields smooth, closed contours that can serve further applications such as feature extraction and shape analysis [3].

Existing ACMs fall into two categories: edge-based models and region-based models. Edge-based models adopt the image gradient as the constraint that drives the initial contour toward the object boundary [5]. Researchers have proposed many improvements to edge-based models [4, 6,7,8], but these models still suffer from incomplete convergence when the object boundary is fuzzy or weak or the image is noisy. Region-based models are far less sensitive to image noise. They use image statistics as the constraint, which makes their performance superior to edge-based models: they can segment the object even when the boundary is weak or absent [5]. Papers [9,10,11,12,13,14] proposed many improvements to region-based models. In the Mumford-Shah formulation, an image is segmented by minimizing an energy function defined over both the boundary and the image regions [15]. Chan and Vese [9] built an active contour model on this variational formulation and implemented it with Osher's level set method [16]. The Chan-Vese algorithm performs well and is easy to implement, so it has been widely used in image segmentation [15]. However, global region-based models may fail when the texture is heterogeneous or differs between object and background, so some researchers use local regional information to constrain the active contour [5]. The localized active contour model (LACM) proposed by Lankton and Tannenbaum [6] is one of the most widely used of these methods. It constructs a local region around each point along the contour, yielding a set of local energies centered on those points.

Li and Cheung and Chin et al. [17,18,19] proposed several lip segmentation methods based on grayscale images, while Talea and Yaghmaie [20], Kim et al. [21], Hulbert and Poggio [22], Canzler and Dziurzyk [23], and Leung et al. [24] worked directly on color images. In [22], a pseudo hue defined as a ratio of RGB values is proposed for lip detection. Canzler and Dziurzyk [23] suggested that segmentation quality can be improved by suppressing the blue component, since blue is subordinate in the lip region. Leung et al. [24] applied fuzzy color clustering. Other color spaces have also been used to emphasize the difference between lip and background color, such as YCbCr [25], NTSC [26], CIE-Lab [27,28,29,30], and the bi-color space [31].

In this paper, we propose an LACM-based method that uses two initial contours in a combined color space. First, illumination equalization is applied to the original RGB image to reduce interference from uneven illumination. Then, we construct a combined color space consisting of the U component of the CIE-LUV color space and the sum of the C2 and C3 components of the discrete Hartley transform (DHT). This color space retains more lip detail and highlights the differences between the lip and other parts such as skin, teeth, and the dark cavity inside the mouth. Finally, we use a rhombus as the initial contour for a closed mouth and combined semi-ellipses as the outer and inner initial contours for an open mouth, so that the segmentation result excludes the interior of the mouth.

This paper is organized as follows: we review the LACM and Chan-Vese model in Section 2; the proposed method is described in Section 3; experimental results are shown in Section 4; and lastly, the conclusion is provided in Section 5.

2 Overview of LACM and Chan-Vese model

The core of a classical ACM is to evolve a curve within the image under constraints so that it detects the target feature by minimizing its energy and length [15]. LACM was proposed by Lankton and Tannenbaum in 2008 [6]. The model constructs a local region at each point along the contour; these regions lead to a set of local energies centered on those points. As the local energies are minimized, the curve gradually converges to the object boundary. Let I denote a given image defined on the domain Ω, and let C be a closed contour represented as the zero level set of a signed distance function ϕ, i.e., C = {x | ϕ(x) = 0} [6]. The interior of C is specified by a smoothed approximation of the Heaviside function:

$$ H\phi (x)=\left\{\begin{array}{cc}1,& \phi (x)<-\varepsilon \\ 0,& \phi (x)>\varepsilon \\ \frac{1}{2}\left\{1-\frac{\phi (x)}{\varepsilon }-\frac{1}{\pi}\sin \left(\frac{\pi \phi (x)}{\varepsilon}\right)\right\},& \mathrm{otherwise}\end{array}\right. $$
(1)

Similarly, the exterior of C is given by (1 − Hϕ(x)). The smoothed Dirac delta function δϕ(x), the derivative of Hϕ(x), is used to specify the narrow band around the curve C:

$$ \delta \phi (x)=\left\{\begin{array}{cc}0,& \left|\phi (x)\right|>\varepsilon \\ \frac{1}{2\varepsilon}\left\{1+\cos \left(\frac{\pi \phi (x)}{\varepsilon}\right)\right\},& \mathrm{otherwise}\end{array}\right. $$
(2)

In order to calculate the local energy, LACM introduces a function β(x, y) to define the local region in terms of parameter r:

$$ \beta \left(x,y\right)=\left\{\begin{array}{l}1,\kern0.5em \left\Vert x-y\right\Vert <r\\ {}0,\kern0.5em \mathrm{otherwise}\end{array}\right. $$
(3)

If the point y lies in the local region centered on x, the function value is 1; otherwise, it is 0. The evolving curve splits the local region into an interior area and an exterior area, as shown in Fig. 1.

Fig. 1

Model of LACM: the red part is the outer area of the local region and the blue part is the inner area of the local region

The energy function can be defined as:

$$ E\left(\phi \right)=\underset{\varOmega }{\int}\delta \left(\phi (x)\right)\underset{\varOmega }{\int}\beta \left(x,y\right)\cdot F\left(I(y),\phi (y)\right)\, dy\, dx+\lambda \underset{\varOmega }{\int}\delta \left(\phi (x)\right)\left\Vert \nabla \phi (x)\right\Vert\, dx $$
(4)

where I is an image and Ω is its domain, λ is the weight of the smoothing term, and F is a local force function.

The mean values of the local interior region, u, and the local exterior region, v, are defined as:

$$ u=\frac{\underset{\varOmega }{\int}\beta \left(x,y\right)\cdot I(y)\cdot H\left(\phi (y)\right)\, dy}{\underset{\varOmega }{\int}\beta \left(x,y\right)\cdot H\left(\phi (y)\right)\, dy} $$
(5)
$$ v=\frac{\underset{\varOmega }{\int}\beta \left(x,y\right)\cdot I(y)\cdot \left(1-H\left(\phi (y)\right)\right)\, dy}{\underset{\varOmega }{\int}\beta \left(x,y\right)\cdot \left(1-H\left(\phi (y)\right)\right)\, dy} $$
(6)

The form of F depends on the chosen energy model. In this paper, we adopt the Chan-Vese model, so F is as follows:

$$ F=H\left(\phi \right){\left(I-u\right)}^2+\left(1-H\left(\phi \right)\right){\left(I-v\right)}^2 $$
(7)

and its derivative with respect to ϕ is:

$$ {\nabla}_{\phi }F=\delta \left(\phi \right)\left({\left(I-u\right)}^2-{\left(I-v\right)}^2\right) $$
(8)

Substituting Eq. (8) into Eq. (4) yields the local curvature flow:

$$ \frac{\partial \phi }{\partial t}=\delta \left(\phi (x)\right)\underset{\varOmega }{\int}\beta \left(x,y\right)\cdot \delta \left(\phi (y)\right)\left({\left(I(y)-u\right)}^2-{\left(I(y)-v\right)}^2\right)\, dy+\lambda\, \delta \left(\phi \right)\operatorname{div}\left(\frac{\nabla \phi }{\left|\nabla \phi \right|}\right) $$
(9)

The local force F changes as the mean values u and v vary, according to Eq. (7). From Eq. (9), F drives the curvature flow toward the energy minimum. The curve stops moving when the curvature flow approaches 0 after a number of iterations, and only the object boundary satisfies this condition: the curve has converged to the object boundary once the local energies no longer change between iterations.
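Continuing the sketch, one explicit update step of Eq. (9) can reuse the helpers above. The data integral expands algebraically as (I − u)² − (I − v)² = (v − u)(2I − u − v), so it, too, reduces to convolutions; the step size dt and λ below are illustrative choices, and in practice ϕ is initialized from the contours of Section 3.4 and updated for a fixed number of iterations (300 in our experiments).

```python
def lacm_step(I, phi, r=9, eps=1.5, lam=0.2, dt=0.45):
    """One explicit step of the local curvature flow, Eq. (9)."""
    B = disk_kernel(r)
    u, v = local_means(I, phi, r, eps)
    delta = np.where(np.abs(phi) <= eps,
                     (1 + np.cos(np.pi * phi / eps)) / (2 * eps), 0.0)
    # Data force: the windowed sum of beta * delta(phi_y) * ((I-u)^2 - (I-v)^2)
    # expands to (v - u) * (2*conv(delta*I) - (u + v)*conv(delta)).
    force = (v - u) * (2 * convolve(delta * I, B)
                       - (u + v) * convolve(delta, B))
    # Smoothing term: curvature div(grad(phi) / |grad(phi)|).
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx**2 + gy**2) + 1e-8
    kappa = np.gradient(gy / mag, axis=0) + np.gradient(gx / mag, axis=1)
    return phi + dt * delta * (force + lam * kappa)
```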

3 Method

3.1 Mouth localization

The focus of this paper is lip segmentation, so we take only the lip area as the region of interest. To reduce redundant information, the mouth area must first be extracted. In previous studies, researchers have proposed a variety of methods to extract the lip area [32, 33], but these methods still retain redundant information or adversely affect subsequent processing. In this paper, we adopt a method that segments the mouth area according to the general structure and proportions of the face [34]:

$$ \frac{1}{4}{W}_{\mathrm{face}}<{W}_{\mathrm{mouth}}<\frac{3}{4}{W}_{\mathrm{face}} $$
(10)
$$ \frac{2}{3}{H}_{\mathrm{face}}<{H}_{\mathrm{mouth}}<\frac{9}{10}{H}_{\mathrm{face}} $$
(11)

where Wface and Hface are the width and height of the face, and Wmouth and Hmouth are the width and height of the mouth. Using these proportions, we obtain the lip area shown in Fig. 2. This method yields satisfactory and effective results against the simple backgrounds assumed in this paper.

Fig. 2

Obtained lip areas
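As a concrete sketch, Eqs. (10) and (11) amount to a fixed-proportion crop; the function below assumes the input image is already tightly cropped to the face and takes the full admissible band in each direction.

```python
def mouth_roi(face_img):
    """Crop the mouth search region by the face proportions of Eqs. (10)
    and (11): columns in [W/4, 3W/4], rows in [2H/3, 9H/10]."""
    H, W = face_img.shape[:2]
    return face_img[int(2 * H / 3):int(9 * H / 10),
                    int(W / 4):int(3 * W / 4)]
```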

3.2 Illumination equalization of RGB image

Illumination is a main factor affecting the appearance of an image [35]. Lighting from different directions causes uneven illumination, which often leads to intensity heterogeneity, so illumination equalization plays a significant role in image analysis and processing [36]. Liew et al. [35] introduced an effective way to reduce the impact of uneven illumination in the vertical direction; however, that method processes single points on the image border, so its effect can be degraded by image noise. In this paper, we adopt an improved version of the equalization method of [35] that analyzes local regions along the image border, making it more robust to noise and adaptive to multiple illumination directions. The method handles two illumination directions, horizontal and vertical, and models the uneven illumination as varying linearly along each direction.

Assume the size of a given image is m × n. Let L(x, y) denote the luminance before illumination equalization, and let Lh(x, y) and Lv(x, y) denote the luminance after equalization in the horizontal and vertical directions, respectively. The size of the local region is (2p + 1) × (2q + 1); we take the average intensity of this local area instead of the intensity of a single border point. In this paper, we set the size of the local region to 5 × 7. The equalization formulas for the horizontal and vertical directions are as follows:

$$ {L}_h\left(x,y\right)=\left\{\begin{array}{cc}L\left(x,y\right)+\frac{\left(n-2j+1\right)\cdot \left(r(p)-l(p)\right)}{2\left(n-1\right)},& i\in \left[1,p\right),\\ {}L\left(x,y\right)+\frac{\left(n-2j+1\right)\cdot \left(r\left(m-p\right)-l\left(m-p\right)\right)}{2\left(n-1\right)},& i\in \left(m-p,m\right],\\ {}L\left(x,y\right)+\frac{\left(n-2j+1\right)\cdot \left(r(i)-l(i)\right)}{2\left(n-1\right)},& i\in \left[p,m-p\right];\end{array}\right. $$
(12)
$$ {L}_v\left(x,y\right)=\left\{\begin{array}{cc}L\left(x,y\right)+\frac{\left(m-2i+1\right)\cdot \left(b(q)-t(q)\right)}{2\left(m-1\right)},& j\in \left[1,q\right),\\ {}L\left(x,y\right)+\frac{\left(m-2i+1\right)\cdot \left(b\left(n-q\right)-t\left(n-q\right)\right)}{2\left(m-1\right)},& j\in \left(n-q,n\right],\\ {}L\left(x,y\right)+\frac{\left(m-2i+1\right)\cdot \left(b(j)-t(j)\right)}{2\left(m-1\right)},& j\in \left[q,n-q\right].\end{array}\right. $$
(13)

where l(i) and r(i) denote the mean intensity over a local region of size (2p + 1) × (2q + 1) at the left and right borders of the ith row, respectively. Similarly, t(j) and b(j) denote the mean intensities at the top and bottom borders of the jth column. The formulas are as follows:

$$ l(i)=\frac{1}{\left(2p+1\right)\cdot \left(2q+1\right)}\sum \limits_{k=-p}^p\sum \limits_{l=1}^{2q+1}L\left(i+k,l\right) $$
(14)
$$ r(i)=\frac{1}{\left(2p+1\right)\cdot \left(2q+1\right)}\sum \limits_{k=-p}^p\sum \limits_{l=n-2q}^nL\left(i+k,l\right) $$
(15)
$$ t(j)=\frac{1}{\left(2p+1\right)\cdot \left(2q+1\right)}\sum \limits_{k=1}^{2p+1}\sum \limits_{l=-q}^qL\left(k,j+l\right) $$
(16)
$$ b(j)=\frac{1}{\left(2p+1\right)\cdot \left(2q+1\right)}\sum \limits_{k=m-2p}^m\sum \limits_{l=-q}^qL\left(k,j+l\right) $$
(17)

In our work, we apply the illumination equalization separately to the R, G, and B components of the original RGB image and then merge the equalized components into a new RGB image with uniform illumination. The method is applied first in the horizontal direction and then in the vertical direction on the intermediate result. Figure 3 shows an example lip image and its components before and after illumination equalization. The intensity values change even where the difference is not visible to the eye. In this way, we obtain an image with reduced interference from uneven illumination, which benefits the subsequent segmentation.

Fig. 3

(a) From left to right: original RGB image, R component, G component, B component. (b) From left to right: new RGB images after illumination equalization, R component after illumination equalization, G component after illumination equalization, B component after illumination equalization
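A sketch of the horizontal pass, Eqs. (12), (14), and (15), is given below; the vertical pass of Eqs. (13), (16), and (17) follows by transposing the image and swapping p and q. The formulas are 1-based while the code is 0-based, and rows too close to the border are clamped to the first fully in-bounds window, which is our pragmatic reading of the boundary cases. With p = 2 and q = 3, the window is the 5 × 7 region used in this paper.

```python
import numpy as np

def equalize_horizontal(L, p=2, q=3):
    """Horizontal illumination equalization, Eq. (12)."""
    m, n = L.shape
    out = L.astype(float).copy()
    j = np.arange(1, n + 1)                    # 1-based column index
    ramp = (n - 2 * j + 1) / (2.0 * (n - 1))   # +1/2 at left, -1/2 at right
    for i in range(1, m + 1):                  # 1-based row index
        ic = min(max(i, p + 1), m - p)         # clamp near the borders
        rows = slice(ic - 1 - p, ic + p)       # rows ic-p .. ic+p
        l_i = L[rows, :2 * q + 1].mean()       # left-border mean, Eq. (14)
        r_i = L[rows, n - 2 * q - 1:].mean()   # right-border mean, Eq. (15)
        out[i - 1, :] += ramp * (r_i - l_i)
    return out

def equalize_vertical(L, p=2, q=3):
    """Vertical equalization, Eq. (13), via transposition."""
    return equalize_horizontal(L.T, p=q, q=p).T

def equalize_rgb(img):
    """Equalize each RGB channel horizontally, then vertically."""
    return np.dstack([equalize_vertical(equalize_horizontal(img[..., c]))
                      for c in range(3)])
```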

3.3 Color space and key points

3.3.1 Color space

In previous research, grayscale images have been used most frequently for lip segmentation. Some researchers process the three components of an RGB image separately and then add the results together; others use newer color spaces such as the bi-color space [31] or methods based on the HSI and RGB color models [37]. In our study, we choose a combined color space built from the CIE-LUV color space and the DHT.

CIE-LUV color space

The CIE-LUV color space is derived from the CIE-XYZ space. L denotes luminance, which is separated from the other components, while U and V are chrominance components. Compared with other color spaces, CIE-LUV is more robust to changes in image luminance. Our experiments show that the U component exhibits a strong brightness difference between the lip region and the background, which makes separating the lip from the background more convenient; we therefore consider the lip's color characteristics to rely largely on the U component. However, the U component has drawbacks: for instance, the lip edge appears somewhat fuzzy. To compensate for this shortcoming, we adopt the DHT.

Discrete Hartley transform

One advantage of the DHT is its high computational efficiency: when the input signal is real, the transform involves only real numbers. The DHT also has good symmetry properties. Because of these advantages, the DHT is increasingly used in image processing. In this paper, we use it to compensate for the deficiency of the U component of CIE-LUV.

In our experiments, the sum of the C2 and C3 components shows a distinct difference between the lip region and the background region, so we choose these two components to retain more lip detail.

We combine the U component of the CIE-LUV color space with the sum of the C2 and C3 components of the DHT to extract the color feature. This combined color space highlights the object region against the background more distinctly, retains more useful lip edge information, and provides a more favorable basis for the subsequent lip segmentation. The components and the final result are shown in Fig. 4.

Fig. 4

(a) RGB images after illumination equalization. (b) U component of the CIE-LUV color space. (c) C2 component of DHT. (d) C3 component of DHT. (e) Combined color space. (f) Key points
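To make the construction concrete, the sketch below computes the combined map. It assumes, as is common for DHT-based color features, that a 3-point DHT is applied to each pixel's (R, G, B) triple (with C1 the zero-frequency component) and that U and C2 + C3 are summed with equal weight; both are our reading of the text, not quoted formulas.

```python
import numpy as np
from skimage.color import rgb2luv

def combined_color_space(rgb):
    """U of CIE-LUV plus (C2 + C3) of a pointwise 3-point DHT."""
    U = rgb2luv(rgb)[..., 1]
    # 3-point DHT basis: cas(theta) = cos(theta) + sin(theta).
    k = np.arange(3)[:, None]
    t = np.arange(3)[None, :]
    cas = np.cos(2 * np.pi * k * t / 3) + np.sin(2 * np.pi * k * t / 3)
    C = np.tensordot(rgb.astype(float), cas, axes=([-1], [1]))
    feat = U + C[..., 1] + C[..., 2]           # U + C2 + C3
    # Normalize to [0, 1] for the subsequent level-set evolution.
    return (feat - feat.min()) / (feat.max() - feat.min() + 1e-8)
```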

3.3.2 Key point localization

Determining key points is a critical stage of lip segmentation. In this paper, we define key points in the combined color space. Four points are used: the two lip corners, one point on the upper lip, and one on the lower lip. The horizontal coordinates of the latter two points are set to the median of the corners' horizontal coordinates.

The two lip corners are detected first, and the other two points are then defined from the corner locations. Lip corner detection is a mature problem with many published algorithms: Rao and Mersereau used the gray-level information of lip-area pixels to locate the corners, while other researchers found corners by lip edge detection [34, 38, 39]. To reduce inaccuracy, we search a small area rather than a single point for each corner. Briefly, taking the mouth corners as an example: first, we find the row with the maximum pixel response and use it as the baseline; then, within 10 rows above and below the baseline, we detect the columns of the points with maximum pixel values; finally, we take the pixels in the minimum and maximum columns as the left and right corners, respectively. The other two points are obtained by the same approach. Figure 5 shows the positions of these points on the lip.

Fig. 5

Position of key points
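The corner search described above can be sketched as follows on the combined color map F (normalized to [0, 1]). The 95% threshold used to decide which pixels count as "maximum" within the band is our own choice, and the points Q1/Q2 are found the same way by scanning the mid-column between the corners.

```python
import numpy as np

def lip_corners(F, band=10):
    """Left/right lip corner search: baseline = row with the largest
    mean response; within +/-band rows of it, the leftmost and
    rightmost strongly responding pixels are taken as corners."""
    baseline = int(np.argmax(F.mean(axis=1)))
    top = max(baseline - band, 0)
    strip = F[top:baseline + band + 1]
    ys, xs = np.nonzero(strip >= 0.95 * strip.max())
    left = (top + ys[np.argmin(xs)], int(xs.min()))
    right = (top + ys[np.argmax(xs)], int(xs.max()))
    return left, right
```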

3.4 Initial contour

After transforming the color space and locating the key points, the initial lip contour is determined from the key point coordinates in the new color space. We divide lip shapes into two classes according to the ordinates of Q1 and Q2: if the absolute difference between the ordinates of Q1 and Q2 is greater than half the height of the lip image, we consider the mouth open; otherwise, we consider it closed. Each case is explained in the rest of this section.

3.4.1 Closed mouth

If the absolute difference between the ordinates of Q1 and Q2 is less than half the height of the lip image, we assume the mouth is closed. In this case, we choose a rhombus as the initial lip contour, since a rhombus approximates the shape of a closed mouth. The vertices of the rhombus are the key points obtained above. Figure 6 shows the model.

Fig. 6

Initial contour model of a closed mouth
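A sketch of the closed-mouth initialization: a rhombus through the four key points, encoded as a signed distance function with ϕ < 0 inside, matching the convention of Section 2. Taking the rhombus axis-aligned about the key-point midpoints is our simplification.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt as edt

def rhombus_phi(shape, P1, P2, Q1, Q2):
    """Signed-distance initial level set for a closed mouth.
    P1/P2 are the corners, Q1/Q2 the upper/lower lip points (row, col)."""
    rows, cols = np.mgrid[:shape[0], :shape[1]]
    cy = (Q1[0] + Q2[0]) / 2.0
    cx = (P1[1] + P2[1]) / 2.0
    a = abs(P2[1] - P1[1]) / 2.0 + 1e-8   # half width (corner to corner)
    b = abs(Q2[0] - Q1[0]) / 2.0 + 1e-8   # half height (Q1 to Q2)
    inside = np.abs(cols - cx) / a + np.abs(rows - cy) / b <= 1.0
    return edt(~inside) - edt(inside)     # negative inside, positive outside
```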

3.4.2 Open mouth

When the absolute difference between the ordinates of Q1 and Q2 is greater than half the height of the lip image, we consider the mouth open. The shape of an open mouth resembles an ellipse rather than a rhombus; however, when the mouth opens wide, a single ellipse may be inaccurate. Mark et al. observed that the upper and lower lip shapes differ, so two semi-ellipses can be used as the initial contour [35, 40], as shown in Fig. 7.

Fig. 7

Initial contour model of an open mouth: the outer contour is blue and the inner contour is red

We use the key points to build the combined semi-ellipses. In previous studies, LACM segmentation produced only the outer contour, leaving the inner contour unknown. To improve the lip segmentation, we derive an inner initial contour from the shape of the outer contour. The equations are formulated as follows:

$$ {O}_x=\frac{1}{2}\left(P{1}_x+P{2}_x\right),{O}_y=\frac{1}{2}\left(P{1}_y+P{2}_y\right) $$
(18)
$$ {a}_o=\frac{1}{2}\sqrt{{\left(P{1}_x-P{2}_x\right)}^2+{\left(P{1}_y-P{2}_y\right)}^2},\kern1em {a}_{\mathrm{i}}=\frac{3}{4}{a}_{\mathrm{o}} $$
(19)
$$ {b}_{\mathrm{oup}}=\sqrt{{\left(Q{1}_x-{O}_x\right)}^2+{\left(Q{1}_y-{O}_y\right)}^2},\kern0.75em {b}_{\mathrm{iup}}=\frac{1}{2}{b}_{\mathrm{oup}} $$
(20)
$$ {b}_{\mathrm{olow}}=\sqrt{{\left(Q{2}_x-{O}_x\right)}^2+{\left(Q{2}_y-{O}_y\right)}^2},\kern0.75em {b}_{\mathrm{ilow}}=\frac{1}{2}{b}_{\mathrm{olow}} $$
(21)
$$ \theta =\arctan \left(\frac{P{2}_y-P{1}_y}{P{2}_x-P{1}_x}\right) $$
(22)
$$ \left[\begin{array}{c}X\\ Y\end{array}\right]=\left[\begin{array}{cc}\cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{array}\right]\cdot \left[\begin{array}{c}x-{O}_x\\ y-{O}_y\end{array}\right] $$
(23)
$$ \frac{X_{\mathrm{oup}}^2}{a_{\mathrm{o}}^2}+\frac{Y_{\mathrm{oup}}^2}{b_{\mathrm{oup}}^2}=1,\kern1em \frac{X_{\mathrm{olow}}^2}{a_{\mathrm{o}}^2}+\frac{Y_{\mathrm{olow}}^2}{b_{\mathrm{olow}}^2}=1 $$
(24)
$$ \frac{X_{\mathrm{iup}}^2}{a_{\mathrm{i}}^2}+\frac{Y_{\mathrm{iup}}^2}{b_{\mathrm{iup}}^2}=1,\kern1em \frac{X_{\mathrm{ilow}}^2}{a_{\mathrm{i}}^2}+\frac{Y_{\mathrm{ilow}}^2}{b_{\mathrm{ilow}}^2}=1 $$
(25)

where O is the center of the model, with abscissa Ox and ordinate Oy; ao and ai are the semi-major axes of the outer and inner contours, respectively; boup and bolow are the upper and lower semi-minor axes of the outer contour, and biup and bilow are those of the inner contour. θ is the inclination angle, positive in the counterclockwise direction. The local region radius r is the same for both contours, r = boup/4; this value ensures that the local region contains lip area without causing over-convergence. In our experiments, these parameters are applicable to most lip images.
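A sketch of Eqs. (18)-(25), reusing numpy and the edt helper imported in the closed-mouth snippet; setting inner=True applies the shrink factors of Eqs. (19)-(21) to obtain the inner contour. Points are (row, col), so the "upper" semi-ellipse is where the rotated Y coordinate is negative.

```python
def semi_ellipse_phi(shape, P1, P2, Q1, Q2, inner=False):
    """Open-mouth initialization: two semi-ellipses sharing the rotated
    corner axis, returned as a signed distance function (phi < 0 inside)."""
    Oy, Ox = (P1[0] + P2[0]) / 2.0, (P1[1] + P2[1]) / 2.0    # Eq. (18)
    a = 0.5 * np.hypot(P2[1] - P1[1], P2[0] - P1[0]) + 1e-8  # Eq. (19)
    b_up = np.hypot(Q1[1] - Ox, Q1[0] - Oy)                  # Eq. (20)
    b_low = np.hypot(Q2[1] - Ox, Q2[0] - Oy)                 # Eq. (21)
    if inner:                               # inner axes, Eqs. (19)-(21)
        a, b_up, b_low = 0.75 * a, 0.5 * b_up, 0.5 * b_low
    theta = np.arctan2(P2[0] - P1[0], P2[1] - P1[1])         # tilt, Eq. (22)
    rows, cols = np.mgrid[:shape[0], :shape[1]]
    X = np.cos(theta) * (cols - Ox) + np.sin(theta) * (rows - Oy)   # Eq. (23)
    Y = -np.sin(theta) * (cols - Ox) + np.cos(theta) * (rows - Oy)
    b = np.where(Y < 0, b_up, b_low)     # rows grow downward: Y < 0 is upper
    inside = (X / a) ** 2 + (Y / b) ** 2 <= 1.0              # Eqs. (24)-(25)
    return edt(~inside) - edt(inside)
```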

4 Results and discussion

4.1 Results

The experiments are implemented in MATLAB R2013a on 500 face images from the AR face database. We apply the initial contours to each component of the combined color space and then merge the per-component results to obtain the final convergence result.

For an open mouth, the outer contour first converges to the outer lip boundary; we denote this result Ro, which still contains the interior of the mouth. The inner contour then converges to the inner lip boundary; this result, denoted Ri, covers the parts inside the mouth that are not lip. The final result is R = Ro − Ri. It is clear that the teeth and the dark cavity are removed cleanly by the inner contour. Figure 8 shows the initial contours and results for a closed mouth, and Fig. 9 shows those for an open mouth.

Fig. 8

a RGB image after illumination equalization. b Initial contour in combined color space. c Final result. d Convergence result of U component. e Convergence result of C2 component. f Convergence result of C3 component. g Segmentation result of U component. h Segmentation result of C2 component. i Segmentation result of C3 component

Fig. 9

a From left to right: RGB images after illumination equalization, outer initial contours, inner initial contours, results of outer boundary, results of inner boundary, final segmentation results. b Convergence results of outer and inner initial contours in the U, C2, and C3 components. c Segmentation results of the images in b
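The combination step itself is simple boolean mask arithmetic on the converged level sets; a minimal sketch:

```python
def open_mouth_lip(phi_outer, phi_inner):
    """Final open-mouth segmentation R = Ro - Ri: the inner result
    removes teeth and the dark cavity from the outer result."""
    Ro = phi_outer < 0    # region enclosed by the converged outer contour
    Ri = phi_inner < 0    # region enclosed by the converged inner contour
    return Ro & ~Ri
```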

To demonstrate the proposed method, we use LACM with a circular initial contour and with the proposed initial contours to segment the green component of the RGB images, grayscale images, and images in the combined color space. The number of iterations is set to 300 for all methods, except 100 for the inner contour. Figure 10 compares the circular initial contour on green-component images against our method; Fig. 11 makes the same comparison on grayscale images; Fig. 12 does so on combined color space images; and Fig. 13 compares the proposed initial contours on grayscale images against the full proposed method.

Fig. 10

a RGB images after illumination equalization. b Results of the method using circular initial contour to segment green component images. c Results of the proposed method

Fig. 11

a RGB images after illumination equalization. b Results of the method using circular initial contour to segment gray images. c Results of the proposed method

Fig. 12

a RGB images after illumination equalization. b Results of the method using circular initial contours to segment combined color space images. c Results of the proposed method

Fig. 13

a RGB images after illumination equalization. b Results of the method using the proposed initial contours to segment gray images. c Results of the proposed method

The results of the proposed method are more accurate, more complete, and smoother than those of the compared methods. Our method preserves more detail because it uses a combined color space consisting of the U component of CIE-LUV and the sum of the C2 and C3 components of the DHT. For an open mouth, our method segments only the lip area, excluding teeth and the mouth cavity. Thanks to illumination equalization, the proposed method avoids the partial convergence or over-convergence caused by non-uniform illumination. Furthermore, the initial contour derived from our key points prevents the local region from excluding lip area or from including too much of it, even the interior of the mouth. Hence, our method converges precisely to the true lip boundary.

4.2 Discussion

To evaluate the accuracy of our algorithm, we use two measures [41]. The first measure is defined as

$$ \mathrm{OL}=\frac{2\left({A}_1\cap {A}_2\right)}{A_1+{A}_2}\times 100\% $$

It determines the percentage of overlap (OL) between the segmented lip region A1 and the ground truth A2. The second measure is the segmentation error (SE), defined as

$$ \mathrm{SE}=\frac{\mathrm{OLE}+\mathrm{ILE}}{2\times \mathrm{TL}}\times 100\% $$

where OLE is the number of non-lip pixels classified as lip pixels (outer lip error), ILE is the number of lip pixels classified as non-lip (inner lip error), and TL denotes the number of lip pixels in the ground truth. The ground truth for each image is obtained by manual segmentation.
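Both measures are straightforward to compute from boolean masks; a minimal sketch, where seg is the algorithm's lip mask and gt the manually segmented ground truth:

```python
import numpy as np

def ol_se(seg, gt):
    """Overlap OL = 2|A1 ∩ A2| / (|A1| + |A2|) and segmentation error
    SE = (OLE + ILE) / (2 TL), both returned as percentages."""
    inter = np.logical_and(seg, gt).sum()
    ol = 200.0 * inter / (seg.sum() + gt.sum())
    ole = np.logical_and(seg, ~gt).sum()   # non-lip pixels labeled lip
    ile = np.logical_and(gt, ~seg).sum()   # lip pixels labeled non-lip
    se = 100.0 * (ole + ile) / (2.0 * gt.sum())
    return ol, se
```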

We divide the images into two groups, open mouth and closed mouth, and randomly split each group into two equal parts: parts 1 and 2 contain closed-mouth images, and parts 3 and 4 contain open-mouth images. The average values for each part are shown in Table 1. In the closed-mouth group, the OL values of the proposed method are higher than those of the compared methods. In the open-mouth group, the OL values of the proposed method are clearly higher, and its SE values clearly lower, than those of the compared methods.

Table 1 Overlap and segmentation error values of two methods

5 Conclusions

In this paper, we have proposed an LACM-based method that uses two initial contours to segment the lip area: combined semi-ellipses as the initial contours for an open mouth and a rhombus for a closed mouth. Before determining the initial contour, we perform several preparatory steps. First, we apply illumination equalization to the RGB image to reduce interference from uneven illumination. Then, we adopt a combined color space built from the U component of CIE-LUV and the sum of the C2 and C3 components of the DHT. Finally, we determine the shape of the initial contours from the positions of four key points in the combined color space. The experiments show that this method obtains accurate, smooth, and robust results, and that our results are closer to the true lip boundary than those obtained using a circular initial contour on grayscale or combined color space images. Nevertheless, like many other methods, ours cannot produce fully satisfactory results on images containing gums, because gums resemble lips in color and texture. We will address this problem in future research.