Fast 2D/3D object representation with growing neural gas
Abstract
This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.
Keywords
Minimum description length, Self-organising networks, Shape modelling, Clustering
1 Introduction
Images of hand gestures, which are effectively 2D projections of a 3D object, can become very complex for any recognition system. Systems that follow a model-based method [1, 32] require an accurate 3D model that efficiently captures the hand’s high Degrees of Freedom (DOF) articulation and elasticity. The main drawback of this method is that it requires massive calculations, which makes it unrealistic for real-time implementation. Since this method is too complicated to implement, the most widespread alternative is the feature-based method [16], where features such as the geometric properties of the hand can be analysed using either Neural Networks (NNs) [34, 36] or stochastic models such as Hidden Markov Models (HMMs) [6, 35].
However, for the accurate analysis of the hand’s properties, a suitable segmentation that separates the object of interest from the background is needed. Segmentation is a preprocessing step in many computer vision applications, including visual surveillance [5, 10, 18, 20] and object tracking [15, 17, 26]. While a lot of research has focused on efficient detectors and classifiers, little attention has been paid to efficiently labelling and acquiring suitable training data. Existing approaches to minimise the labelling effort [19, 21, 24, 30] use a classifier which is trained on a small number of examples. The classifier is then applied to a training sequence, and the detected patches are added to the previous set of examples. Levin et al. [21] start with a small set of hand-labelled data and generate additional labelled examples by applying co-training of two classifiers. Nair and Clark [24] use motion detection to obtain the initial training set. Lee et al. [19] use a variant of eigentracking to obtain the training sequence for face recognition and tracking. Sivic et al. [30] use boosting orientation-based features to obtain training samples for their face detector. A disadvantage of these approaches is that either a manual initialisation [19] or a pre-trained classifier is needed to initialise the learning process. Given a sequence of images, this can be avoided by using an incremental model.
We decided to use NNs to represent the geometric properties of objects, and more specifically the self-organising maps (SOMs), due to their incremental nature. One of these SOM-based methods is the growing cell structures (GCS) algorithm [8], which is a model formed incrementally. However, it constrains the connections between the nodes, so any model produced during the training stage is always topologically equivalent to the initial topology. The Topology Representing Networks (TRN) approach, proposed by Martinez and Schulten [22], does not have a fixed structure and does not impose any constraint on the connections between the nodes. On the other hand, this network has a pre-established number of nodes and, therefore, is not able to generate models with different resolutions. The algorithm is also known as Neural Gas (NG) due to the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. However, as the NG has a fixed number of nodes, some a priori information about the input space is needed to pre-establish the size of the network. This model was extended by Fritzke [9], who proposed the Growing Neural Gas (GNG) network, which combines the flexible structure of the NG with a growing strategy. Moreover, the learning adaptation step was slightly modified. This extension enabled the neural network to use the already detected topological information while training in order to conform to the geometry. This approach has the capability to add neurons while preserving the topology of the input space.
Although the use of the SOM-based techniques of NG, GCS or GNG for various data inputs has already been studied and successful results have been reported [4, 13, 14, 27, 31, 32], there are some limitations that still persist. Most of these works assumed noise-free environments and low complexity distributions. Therefore, applying these methods on challenging real world data obtained using noisy 2D^{1} and 3D^{2} sensors is our main study. These particular non-invasive sensors have been used in the associated experiments and are typical, contemporary technology.
In this work, we extend the method presented in [2] for object representation using the GNG algorithm. This work extends the already proposed method by considering elimination of noisy connections during the learning process and by applying it to 3D datasets. The method is used for the representation of two-dimensional outlines of hands and ventricles, which is extended to 3D. Furthermore, we are interested in the minimisation of the user intervention in the learning process; thus, we utilise an automatic criterion for maximum node growth based on topological parameters. We achieve that by taking into consideration that human skin has a relatively unique colour and the complexity or simplicity of the proposed model is decided by information-theoretic measures.
The remainder of the paper is organised as follows. Section 2 introduces the framework for object modelling using topological relations. Section 3 proposes an approach to minimise the user intervention in the termination of the network using knowledge obtained from information-theoretic considerations. In Sect. 4 a set of experimental results is presented that includes 2D and 3D representations before conclusions are drawn in Sect. 5.
2 Characterising 2D objects with modified GNG
GNG [9] is an unsupervised incremental self-organising network independent of the topology of the input distribution or space. It uses a growth mechanism inherited from the Growing Cell Structures [8] together with the Competitive Hebbian Learning (CHL) rule [22] to construct a network of the input data set. In the GNG algorithm [9], the growing process starts with two nodes, and new nodes are incrementally inserted until a predefined condition is satisfied, such as the maximum number of nodes or available time. During the learning process, local error measures are gathered to determine where to insert new nodes. New nodes are inserted near the node with the highest accumulated error, and new connections between the winner node and its topological neighbours are created.
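The growth loop described above can be sketched in a few lines. This is a minimal illustration of the standard GNG procedure, not the modified version evaluated in this paper; the function name `gng`, the `ring` input distribution and the dictionary-based bookkeeping are assumptions made for the example, and the default rates simply mirror the parameter values listed in the experiments.

```python
import math
import random


def gng(sample, max_nodes=30, lam=100, eps_b=0.1, eps_n=0.005,
        alpha=0.5, beta=0.0005, max_age=125, steps=4000, seed=0):
    """Return (nodes, edges) after `steps` GNG adaptation steps.

    sample(rng) must return one input vector as a tuple of floats.
    """
    rng = random.Random(seed)
    nodes = {0: list(sample(rng)), 1: list(sample(rng))}  # start with two nodes
    error = {0: 0.0, 1: 0.0}
    edges = {}            # frozenset({i, j}) -> age
    next_id = 2

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def move(i, target, rate):
        nodes[i] = [w + rate * (t - w) for w, t in zip(nodes[i], target)]

    for step in range(1, steps + 1):
        xi = sample(rng)
        # Winner s1 and second winner s2 for this input.
        s1, s2 = sorted(nodes, key=lambda i: dist2(nodes[i], xi))[:2]
        error[s1] += dist2(nodes[s1], xi)       # accumulate local error
        move(s1, xi, eps_b)
        for e in list(edges):
            if s1 in e:
                edges[e] += 1                   # age edges leaving the winner
                (n,) = e - {s1}
                move(n, xi, eps_n)
        edges[frozenset((s1, s2))] = 0          # Competitive Hebbian Learning
        # Drop edges that are too old, then nodes left without edges.
        edges = {e: a for e, a in edges.items() if a <= max_age}
        linked = set().union(*edges)
        for i in [i for i in nodes if i not in linked]:
            del nodes[i], error[i]
        # Every lam steps, insert a node where the accumulated error is highest.
        if step % lam == 0 and len(nodes) < max_nodes:
            q = max(error, key=error.get)
            nbrs = [next(iter(e - {q})) for e in edges if q in e]
            f = max(nbrs, key=error.get)
            r, next_id = next_id, next_id + 1
            nodes[r] = [(a + b) / 2 for a, b in zip(nodes[q], nodes[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = 0
            edges[frozenset((r, f))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[r] = error[q]
        for i in error:
            error[i] *= 1.0 - beta              # global error decay
    return nodes, edges


def ring(rng):
    """Hypothetical input distribution: points on a noisy unit circle."""
    t = rng.uniform(0.0, 2.0 * math.pi)
    r = 1.0 + rng.gauss(0.0, 0.02)
    return (r * math.cos(t), r * math.sin(t))


nodes, edges = gng(ring, max_nodes=20)
```

The stopping condition here is a fixed number of steps plus a node cap; either could be replaced by a time budget, as the text suggests.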
Identifying the points of the image that belong to objects allows the GNG network to obtain an induced Delaunay triangulation of the objects, that is, an approximation of the geometric appearance of the object. Let an object \(\mathbf {\textit{O}} = [\mathbf {\textit{O}}_{G}, \mathbf {\textit{O}}_{A}]\) be defined by its geometry and its appearance. The geometry provides a mathematical description of the object’s shape, size, and parameters such as translation, rotation, and scale. The appearance defines a set of the object’s characteristics such as colour, texture, and other attributes.
Given a domain \(\mathbf {S}\subseteq \mathbb {R}^2\), an image intensity function \(\mathbf {I}(x,y)\in \mathbb {R}\) such that \(\mathbf {I} : \mathbf {S} \rightarrow [0, \mathbf {I}_{\max }]\), and an object \(\mathbf {\textit{O}}\), its standard potential field \(\varPsi _{T} (x,y) = f_{T}(I(x,y))\) is the transformation \(\varPsi _{T}: \mathbf {S} \rightarrow [0, 1]\) which associates with each point \((x,y)\in \mathbf {S}\) the degree of compliance with the visual property T of the object \(\mathbf {\textit{O}}\) by its associated intensity \(\mathbf {I}(x,y)\).
 The input distribution as the set of points in the image:
$${\mathbf {A}}= {\mathbf {S}} \tag{1}$$
$$\xi _{w}= (x,y)\in {\mathbf {S}} \tag{2}$$
 The probability density function according to the standard potential field obtained for each point of the image:
$$p(\xi _{w}) = p(x,y) = \varPsi _{T} (x,y) \tag{3}$$
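Equations (1) to (3) amount to drawing input signals from the image with probability proportional to the potential field. A minimal sketch follows, assuming the field is given as a nested list `psi[y][x]` of values in \([0, 1]\); the function name `sample_inputs` and the toy field are hypothetical, and the field is normalised implicitly by the weighted draw.

```python
import random


def sample_inputs(psi, n, seed=0):
    """Draw n pixel coordinates (x, y) with probability proportional to psi[y][x]."""
    rng = random.Random(seed)
    coords = [(x, y) for y, row in enumerate(psi) for x, _ in enumerate(row)]
    weights = [v for row in psi for v in row]
    return rng.choices(coords, weights=weights, k=n)


# Hypothetical 3x4 potential field: 1 inside the object, 0 in the background.
psi = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
signals = sample_inputs(psi, 1000)
```

Pixels with zero potential are never selected, so the network only receives signals from points that comply with the visual property T.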
Table 1 Topology preservation measures of the original vs. modified GNG with respect to frames per second (fps)
Shape  Nodes  Original GNG fps  Original GNG QE  Original GNG TE  Modified GNG fps  Modified GNG QE  Modified GNG TE
Star4  71  1.16  2.7551  0  7.30  2.6375  0
Star6  74  1.11  2.9564  0  6.06  2.9073  0.0014
Cloud  97  0.61  2.7275  0  5.26  2.6561  0
Heart  70  1.38  2.9337  0  5.24  2.9347  0
Lightning  71  1.04  2.9391  0  6.99  2.8138  0
As reflected in Table 1, the modified GNG provided lower quantisation and topology preservation errors in most cases owing to the deletion of wrong edges. However, in a few cases, wrong edges provide a shorter distance between the input space and the Delaunay triangulation obtained (see Star6 TE).
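The quantisation error (QE) is not defined in this section. As a point of reference, a common definition is the mean Euclidean distance between each input and its nearest node; the sketch below assumes that definition, which may differ from the exact measure used for the table, and the point sets are made up for illustration.

```python
import math


def quantization_error(inputs, nodes):
    """Mean Euclidean distance from each input to its nearest node."""
    return sum(min(math.dist(x, w) for w in nodes) for x in inputs) / len(inputs)


# Hypothetical data: four corner points represented by one or two nodes.
inputs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
coarse = [(0.5, 0.5)]
fine = [(0.0, 0.0), (1.0, 1.0)]
qe_coarse = quantization_error(inputs, coarse)
qe_fine = quantization_error(inputs, fine)
```

Under this definition, adding well-placed nodes lowers the error, which is why QE is usually read together with the node count.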
3 Adaptive learning
\(e_{T}\) is a similarity threshold and defines the accuracy of the map. If \(e_{T}\) is low, the topology preservation is lost and more nodes need to be added. On the contrary, if \(e_{T}\) is too big, then nodes have to be removed so that Voronoï cells become wider. For example, let us consider an extreme case where the total size of the image is \(I = 100\) pixels and only one pixel represents the object of interest. If we use \(e_{T} = 100\), then the object can be represented by one node. In the case where \(e_{T} \ge I\), overfitting occurs since twice as many nodes are provided.
In our experiments, the numerical value of \(e_{T}\) lies in the range \(100\le e_{T} \le 900\), and the accuracy depends on the size of the objects’ distribution. The difference between manually choosing the maximum number of nodes and selecting \(e_{T}\) as the similarity threshold is the preservation of the object independently of scaling operations. Algorithm 3 shows the steps of the automatic criterion added to the modified GNG algorithm to minimise user intervention in the learning process.

 1. For all K, \((K_{\min}< K <K_{\max })\):
   (a) Maximise the likelihood \(L(X \mid W_{K},\varTheta _{K})\) using the EM algorithm to cluster the nodes based on the similarity thresholds applied to the dataset.
   (b) Calculate the value of MDL(K) according to Eqs. 9 and 10.
 2. Select the model parameters \((W_{K},\varTheta _{K})\) that correspond to the minimisation of the MDL(K) value.
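The selection step can be illustrated with a small sketch. Eqs. 9 and 10 are not reproduced in this section, so the code assumes the common two-part form \(\text{MDL}(K) = -\ln L + \tfrac{1}{2} M_{K} \ln N\), with \(M_{K}\) the number of free parameters of a K-component Gaussian mixture with full covariances and N the number of samples. The function names and the log-likelihood values are hypothetical, standing in for the output of the EM step.

```python
import math


def mdl(log_likelihood, n_params, n_samples):
    """Two-part description length: data cost plus model cost (in nats)."""
    return -log_likelihood + 0.5 * n_params * math.log(n_samples)


def select_k(log_likelihoods, n_samples, dim=2):
    """Return (best K, all scores) for the smallest MDL value.

    log_likelihoods maps K -> maximised log-likelihood from EM.
    Assumed parameter count for K full-covariance Gaussians:
    K-1 weights, K*dim means, K*dim*(dim+1)/2 covariance terms.
    """
    def n_params(k):
        return (k - 1) + k * dim + k * dim * (dim + 1) // 2

    scores = {k: mdl(ll, n_params(k), n_samples)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores


# Hypothetical maximised log-likelihoods for K = 1..5 on 500 samples.
fits = {1: -2400.0, 2: -2150.0, 3: -2100.0, 4: -2095.0, 5: -2093.0}
best_k, scores = select_k(fits, n_samples=500)
```

In this toy example the likelihood keeps improving with K, but the penalty term outweighs the gain beyond three components, which is the overfitting guard the criterion is meant to provide.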
4 Experiments
In this section, different experiments are shown validating the capabilities of our extended GNG method to represent 2D and 3D hand models. By eliminating noisy connections during the learning process, the proposed method is able to define an optimal number of nodes using the MDL criterion. The method has also been applied to 3D datasets. First, a quantitative study is performed by adding different levels of noise to the ground truth models (datasets). Using the ground truth models and the noisy ones generated from them, we are able to measure the error produced by our method. In addition, our method is compared against the state-of-the-art algorithms Active Shape Models and Poisson surface reconstruction.
All methods have been developed and tested on a desktop machine with a 2.26 GHz Pentium IV processor. These methods have been implemented in MATLAB and C++. The Poisson surface reconstruction method has been implemented using the PCL library^{3} [29].
4.1 Benchmark data
We tested our modified GNG network on a dataset of hand images recorded from 5 participants, each performing different gestures (Fig. 7) that frequently appear in sign language. To create this dataset, we recorded images over several days using a simple webcam with image resolution \(800 \times 600\). In total, we recorded over 12000 frames, and for computational efficiency, we resized the images from each set to \(300 \times 225\), \(200 \times 160\), \(198 \times 234\), and \(124 \times 123\) pixels. We obtained the dataset from the University of Alicante, Spain and the University of Westminster, UK. We also tested our method with 49 images from Mikkel B. Stegmann’s^{4} online dataset. In total, we have run the experiments on a dataset of 174 images. Since the background is unambiguous, the network adapts without occlusion reasoning. For our experiments, only complete gesture sequences are included. There are no gestures with partially or completely occluded regions, which means that we do not model multiple objects that interact with the background.
Topology preservation and processing time using the quantisation error and the topology preservation error for different variants
Variant  Number of nodes  Time (s)  QE  TE 

\(\text {GNG}_{\lambda = 100, K=1}\)  23  0.22  8.932453  0.4349 
\(\hbox {GNG}_{\lambda = 100, K=9}\)  122  0.50  5.393949  −0.3502 
\(\hbox {GNG}_{\lambda = 100, K=18}\)  168  0.84  5.916987  −0.0303 
\(\hbox {GNG}_{\lambda = 300, K=1}\)  23  0.90  8.024549  0.5402 
\(\hbox {GNG}_{\lambda = 300, K=9}\)  122  2.16  5.398938  0.1493 
\(\hbox {GNG}_{\lambda = 300, K=18}\)  168  4.25  4.610572  0.1940 
\(\hbox {GNG}_{\lambda = 600, K=1}\)  23  1.13  0.182912  −0.0022 
\(\hbox {GNG}_{\lambda = 600, K=9}\)  122  2.22  0.172442  0.3031 
\(\hbox {GNG}_{\lambda = 600, K=18}\)  168  8.30  0.169140  −0.0007 
\(\hbox {GNG}_{\lambda = 1000, K=1}\)  23  1.00  0.188439  0.0750 
\(\hbox {GNG}_{\lambda = 1000, K=9}\)  122  12.02  0.155153  0.0319 
\(\hbox {GNG}_{\lambda = 1000, K=18}\)  168  40.98  0.161717  0.0111 
The topology preservation error for gestures (a–d)
Image (a) Nodes  Image (a) TE  Image (b) Nodes  Image (b) TE  Image (c) Nodes  Image (c) TE  Image (d) Nodes  Image (d) TE
26  −0.0301623  26  −0.021127  24  −0.017626  19  −0.006573 
51  −0.030553  51  −0.021127  47  −0.047098  37  −0.007731 
77  0.04862  77  0.044698  71  0.046636  56  0.027792 
102  0.048256  102  0.021688  95  0.017768  75  0.017573 
128  0.031592  128  0.011657  119  0.014589  94  0.018789 
153  0.038033  153  0.021783  142  0.018929  112  0.016604 
179  0.047636  179  0.017223  166  0.017465  131  0.017755 
205  0.038104  205  −0.013525  190  0.017718  150  0.007332 
230  0.037321  230  0.017496  214  −0.007543  168  0.007575 
Error measurements for modified GNG, Kohonen and GCS
Gestures  Method  Nodes  RMS  TE 

Gesture-three fingers (sigma = 0)  Modified GNG  21  0.2558  0.055554 
Gesture-three fingers (sigma = 0)  Kohonen  25  1.6410  0.172629 
Gesture-three fingers (sigma = 0)  GCS  30  0.5494  0.159913 
Gesture-three fingers (sigma = 0.25)  Modified GNG  21  1.4189  0.083485 
Gesture-three fingers (sigma = 0.25)  Kohonen  25  2.6578  0.237586 
Gesture-three fingers (sigma = 0.25)  GCS  30  1.6134  0.241429 
Gesture-thumb (sigma = 0)  Modified GNG  25  0.2440  0.046621 
Gesture-thumb (sigma = 0)  Kohonen  30  0.5376  0.194685 
Gesture-thumb (sigma = 0)  GCS  31  0.3144  0.176336 
Gesture-thumb (sigma = 0.25)  Modified GNG  25  0.3844  0.058153 
Gesture-thumb (sigma = 0.25)  Kohonen  30  0.6956  0.242131 
Gesture-thumb (sigma = 0.25)  GCS  31  0.3956  0.239292 
Gesture-open hand (sigma = 0)  Modified GNG  23  0.9660  0.048011 
Gesture-open hand (sigma = 0)  Kohonen  25  3.4727  0.146884 
Gesture-open hand (sigma = 0)  GCS  27  2.3790  0.150354 
Gesture-open hand (sigma = 0.25)  Modified GNG  23  1.4025  0.059658 
Gesture-open hand (sigma = 0.25)  Kohonen  25  3.5340  0.240014 
Gesture-open hand (sigma = 0.25)  GCS  27  2.4599  0.112732 
4.2 Variability and comparison with the snake model
Parameters and performance for snake
Hand  Constants  Iterations  Time (s)
Sequence (a)  \(\alpha = 0.05\), \(\beta = 0\), \(\gamma = 1\), \(\kappa = 0.6\), \(D_{\min} = 0.5\), \(D_{\max } = 2\)  40  15.29
Sequence (b)  \(\alpha = 4\), \(\beta = 1\), \(\gamma = 2\), \(\kappa = 0.6\), \(D_{\min} = 0.5\), \(D_{\max } = 2\)  50  15.20
Sequence (c)  \(\alpha = 4\), \(\beta = 1\), \(\gamma = 3\), \(\kappa = 0.6\), \(D_{\min} = 0.5\), \(D_{\max } = 2\)  40  12.01
Sequence (d)  \(\alpha = 4\), \(\beta = 1\), \(\gamma = 3\), \(\kappa = 0.6\), \(D_{\min} = 0.5\), \(D_{\max } = 2\)  20  5.60
Convergence and execution time results of modified GNG and snake
Method  Convergence (iteration times)  Time (s)
Snake  20  5.60
Snake  40  12.01
Snake  50  15.20
Snake  40  15.29
Modified GNG  2  0.73
Modified GNG  2  1.22
Modified GNG  3  2.17
Modified GNG  5  4.88
4.3 3D reconstruction
This section shows the result of applying an existing approach proposed by Orts-Escolano et al. [25] for performing 3D surface reconstruction using the GNG algorithm. In this work, we focused on applying the above-mentioned method to the reconstruction of human hands and faces acquired using the Kinect sensor. Moreover, some experiments were performed using synthetic data.
In [25], the original GNG algorithm is extended to perform 3D surface reconstruction. Furthermore, it considers surface normal information during the learning process. It modifies the original Competitive Hebbian Learning process, which only considered the creation of edges between neurons and therefore produced wireframe 3D representations. It is thus necessary to modify the learning process in order to create triangular faces during network adaptation.
The edge creation, neuron insertion and neuron removal stages were extended to consider the creation of triangular faces. Algorithm 5 describes the extended CHL that produces triangular faces during the adaptation process.
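Algorithm 5 is not reproduced in this section. To make the idea concrete, the sketch below shows one simple way an edge-creation step could also emit triangular faces: when the winner and second winner are connected, a face is added for every neighbour they share. This is an illustrative assumption rather than the procedure of [25], and the function name `connect_with_faces` is hypothetical.

```python
def connect_with_faces(s1, s2, edges, faces):
    """Link s1 and s2 and add a triangle for every neighbour they share."""
    def neighbours(i):
        return {j for e in edges if i in e for j in e if j != i}

    edges.add(frozenset((s1, s2)))
    for n in neighbours(s1) & neighbours(s2):
        faces.add(frozenset((s1, s2, n)))


# Hypothetical chain of four neurons: 0 - 1 - 2 - 3.
edges = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3)]}
faces = set()
connect_with_faces(0, 2, edges, faces)   # 0 and 2 share neighbour 1
connect_with_faces(1, 3, edges, faces)   # 1 and 3 share neighbour 2
```

A full implementation would also have to delete faces when one of their edges expires and rebuild them on neuron insertion and removal, as the text indicates.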
Moreover, it can also be appreciated that the generated representation is accurate and implicitly it performs some typical computer vision preprocessing steps such as filtering, downsampling and 3D reconstruction.
In all our experiments, the parameters of the network are as follows: \(\lambda = 100\) to 1000, \(\epsilon _x = 0.1\), \(\epsilon _n = 0.005\), \(\Delta x_{s_{1}} = 0.5\), \(\Delta x_{i} = 0.0005\), \(\alpha _{\max } = 125\).
While 3D downsampling and reconstruction methods like Poisson or Voxelgrid are not able to deal with noisy data, the GNG method is able to avoid outliers and obtain an accurate representation in the presence of noise. This ability is due to the Hebbian learning rule used and its random nature, which updates vertex locations based on the average influence of a large number of input patterns.
5 Conclusions and future work
Based on the capabilities of GNG to readjust to new input patterns without restarting the learning process, we developed an approach to minimise the user intervention by utilising an automatic criterion for maximum node growth. This automatic criterion for GNG is based on the object’s distribution and the similarity threshold (\(e_{T}\)) which determines the preservation of the topology. The model is then used for the representation of motion in image sequences by initialising a suitable segmentation. During testing we found that for different shapes there exists an optimum number that maximises topology learning versus adaptation time and MSE. This optimal number uses knowledge obtained from informationtheoretic considerations. Furthermore, we have shown that the low dimensional incremental neural model (GNG) adapts successfully to the high dimensional manifold of the hand by generating 3D models from raw data received from the Kinect. Future work will aim at improving system performance at all stages to achieve a natural user interface that allows us to interact with any object manipulation system. Likewise, the acceleration of the whole system should be completed on GPUs.
Footnotes
 1. Webcam with image resolution \(800\times 600\).
 2. Kinect for XBox 360: http://www.xbox.com/kinectMicrosoft.
 3. The Point Cloud Library (or PCL) is a large scale, open project for 2D/3D image and point cloud processing.
 4.
Notes
Acknowledgments
This work was partially funded by the Spanish Government DPI2013-40534-R Grant.
References
 1. Albrecht I, Haber J, Seidel H (2003) Construction and animation of anatomically based human hand models. In: Proceedings of the 2003 ACM SIGGRAPH/eurographics symposium on computer animation, pp 98–109
 2. Angelopoulou A, García J, Psarrou A, Gupta G, Mentzelopoulos M (2013) Adaptive learning in motion analysis with self-organising maps. In: The 2013 international joint conference on neural networks (IJCNN), pp 1–7
 3. Brown G, Pocock A, Zhao MJ, Luján M (2012) Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res 13:26–66
 4. Cretu A, Petriu E, Payeur P (2008) Evaluation of growing neural gas networks for selective 3D scanning. In: Proceedings of IEEE international workshop on robotics and sensors, environments, pp 108–113
 5. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol I, pp 886–893
 6. Eddy S (1996) Hidden Markov models. Curr Opin Struct Biol 6(3):361–365
 7. Fernandez A, Ortega M, Cancela B, Penedo MG (2012) Contextual and skin color region information for face and arms location. In: Proceedings of the 13th international conference on computer aided systems theory - EUROCAST 2011, vol 6927/2012, pp 616–623
 8. Fritzke B (1994) Growing cell structures - a self-organising network for unsupervised and supervised learning. J Neural Netw 7(9):1441–1460
 9. Fritzke B (1995) A growing neural gas network learns topologies. In: Advances in neural information processing systems 7 (NIPS’94), pp 625–632
 10. García-Rodríguez J, Angelopoulou A, García-Chamizo JM, Psarrou A, Escolano SO, Giménez VM (2012) Autonomous growing neural gas for applications with time constraint: optimal parameter estimation. Neural Netw 32:196–208
 11. Govindaraju V (1996) Locating human faces in photographs. Int J Comput Vision 19(2):129–146
 12. Gschwandtner M, Kwitt R, Uhl A, Pree W (2011) BlenSor: blender sensor simulation toolbox. Advances in visual computing, volume 6939 of lecture notes in computer science, chap 20. Springer, Berlin
 13. Gupta G, Psarrou A, Angelopoulou A, García J (2012) Region analysis through close contour transformation using growing neural gas. In: Proceedings of the international joint conference on neural networks, IJCNN2012, pp 1–8
 14. Holdstein Y, Fischer A (2008) Three-dimensional surface reconstruction using meshing growing neural gas (MGNG). Vis Comput Int J Comput Graph 24(4):295–302
 15. Kakumanu P, Makrogiannis S, Bourbakis N (2007) A survey of skin-color modeling and detection methods. Pattern Recogn 40(3):1106–1122
 16. Koike H, Sato Y, Kobayashi Y (2001) Integrating paper and digital information on enhanced desk: a method for real time finger tracking on an augmented desk system. ACM Trans Comput Hum Interact 8(4):307–322
 17. Kruppa H (2004) Object detection using scale-specific boosted parts and a Bayesian combiner. PhD Thesis, ETH Zürich
 18. Kruppa H, Santana C, Sciele B (2003) Fast and robust face finding via local context. In: Proceedings of the IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, pp 157–164
 19. Lee M, Nevatia R (2005) Integrating component cues for human pose estimation. In: Proceedings of the IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, pp 41–48
 20. Leibe B, Seemann E, Sciele B (2005) Pedestrian detection in crowded scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol I, pp 878–885
 21. Levin A, Viola P, Freund Y (2003) Unsupervised improvement of visual detectors using co-training. In: Proceedings of the IEEE international conference on computer vision, vol I, pp 626–633
 22. Martinez T, Schulten K (1994) Topology representing networks. J Neural Netw 7(3):507–522
 23. Mignotte M (2008) Segmentation by fusion of histogram-based k-means clusters in different color spaces. IEEE Trans Image Process 5(17):780–787
 24. Nair V, Clark J (2004) An unsupervised, online learning framework for moving object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol II, pp 317–324
 25. Orts-Escolano S, Garcia-Rodriguez J, Morell V, Cazorla M, Perez J, Garcia-Garcia A (2015) 3D surface reconstruction of noisy point clouds using growing neural gas: 3D object/scene reconstruction. Neural Process Lett 43:1–23
 26. Papageorgiou C, Oren M, Poggio T (1998) A general framework for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 555–562
 27. Rêgo R, Araújo A, de Lima Neto F (2007) Growing self-organizing maps for surface reconstruction from unstructured point clouds. In: Proceedings of the international joint conference on artificial neural networks, IJCNN’07, pp 1900–1905
 28. Rissanen J (1978) Modelling by shortest data description. Automatica 14:465–471
 29. Rusu R, Cousins S (2011) 3D is here: point cloud library (PCL). In: Proceedings of the IEEE international conference on robotics and automation, ICRA, Shanghai, China, May 9–13, 2011
 30. Sivic J, Everingham M, Zisserman A (2005) Person spotting: video shot retrieval for face sets. In: International conference on image and video retrieval, pp 226–236
 31. Stergiopoulou E, Papamarkos N (2009) Hand gesture recognition using a neural network shape fitting technique. Eng Appl Artif Intell 22(8):1141–1158
 32. Sui C (2011) Appearance-based hand gesture identification. Master of Engineering, University of New South Wales, Sydney
 33. Uriarte EA, Martín FD (2005) Topology preservation in SOM. Int J Appl Math Comput Sci 1(1):19
 34. Vamplew P, Adams A (1998) Recognition of sign language gestures using neural networks. Aust J Intell Inf Process Syst 5(2):94–102
 35. Wong S, Ranganath S (2005) Automatic sign language analysis: a survey and the future beyond lexical meaning. In: IEEE transactions on pattern analysis and machine intelligence, pp 873–891
 36. Yang J, Bang W, Choi E, Cho S, Oh J, Cho J, Kim S, Ki E, Kim D (2009) A 3D hand-drawn gesture input device using fuzzy ARTMAP-based recognizer. J Syst Cybern Inform 4(3):1–7
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.