After the data is acquired, whether from sensors or in the form of images, the most important task is to analyse it. The data must be clear, unambiguous and mutually exclusive. This requires a detailed study of the kind of data available, because the correct pre-processing method has to be applied before any feature extraction technique is used.
Feature extraction
Quite often the collected data contains noise or repeated information, is unlabelled, or has high dimensionality. In images, noise can be caused by improper illumination, inefficiencies in the technology used to collect the data, or improper capture methods. A huge amount of data means longer processing time and larger memory requirements, as well as a risk of over-fitting. Such raw data has to be processed before any feature extraction is applied to it. Feature extraction is the process of reducing the dimension of the raw data so that it becomes easier to process further. It combines variables into a smaller set of important features that still describe the original data set, removing redundant data in the process. Feature extraction techniques studied in the literature include the wavelet transform, the Gray Level Co-occurrence Matrix, fractals, Principal Component Analysis and CNNs.
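Of the techniques listed above, Principal Component Analysis is the simplest to illustrate. The following is a minimal sketch of PCA-style dimensionality reduction using NumPy; the toy data matrix and the choice of two components are assumptions for illustration only:

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    X: (n_samples, n_features) raw data matrix.
    Returns the reduced (n_samples, k) representation.
    """
    Xc = X - X.mean(axis=0)                      # centre each feature
    # SVD of the centred data gives the principal directions in Vt
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # project onto the first k directions

# Toy data: 4 samples with 3 correlated features, reduced to 2 dimensions
X = np.array([[2.0, 4.1, 1.0],
              [1.0, 2.0, 0.9],
              [3.0, 6.2, 1.1],
              [0.5, 1.1, 1.0]])
X_reduced = pca_reduce(X, 2)
```

The reduced matrix keeps the directions of greatest variance, which is exactly the sense in which PCA discards redundant variables while still describing the original data.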
Features classification
The extracted features have to be classified accurately into a particular class. Feature classification is a technique used to categorize a large amount of data into different classes. Popular techniques for feature classification include the Support Vector Machine, random forest, decision tree, K-means and CNN.
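The common idea behind these classifiers is mapping a feature vector to one of several classes. As a minimal, hedged illustration (a nearest-centroid rule, simpler than any of the classifiers named above; the feature values and class labels are invented for the example):

```python
def nearest_centroid_fit(features, labels):
    """Compute one mean feature vector (centroid) per class."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], x))

# Hypothetical 2-D feature vectors for two classes
feats = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.2]]
labs  = ["A", "A", "B", "B"]
model = nearest_centroid_fit(feats, labs)
prediction = nearest_centroid_predict(model, [1.1, 0.9])  # → "A"
```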
Traditional and Convolutional Neural Network
Neural networks are data-driven AI techniques used to find the relationship between the input data and the output classes. In traditional neural network models, feature extraction and feature classification are handled by separate tools. Deep learning integrates feature extraction, feature selection and feature classification into one model. The filters (kernels) are selected automatically by the model depending upon the feature: for example, if a feature is a plus (+) sign, the filter corresponding to ‘+’ is selected; if a feature is a backslash (\), the filter corresponding to ‘\’ is selected. The filters learn from the image, tuning their parameters until an optimized result is reached. The traditional technique extracts features by transforming the data into the spatial or frequency domain, performs feature selection to remove redundant features, and feeds the result to a classifier, which then assigns the data to the given classes.
Architecture of CNN
In a basic NN model, each neuron in the (n − 1)th layer is connected to all the neurons in the nth layer, forming a complex structure of neurons as shown in Fig. 1. CNN, a deep learning tool, is a feature-based approach that uses multiple layers to process data, similar to neural networks. A typical CNN consists of a convolution layer, a pooling layer and a fully connected layer between the input layer and the output layer, as shown in Fig. 2.
The CNN approach of deep learning can extract textural features automatically. It has been designed in such a way that the kernels adjust themselves to suit the data the model learns from. The CNN approach has been successfully applied to raw data, and its classification accuracy is high.
The overall training process of the Convolution Network may be summarized as below:
Step 1 The chosen filters and weights are assigned random values.
Step 2 The network then takes an input image. Three operations (convolution, ReLU and pooling) are performed on this image and a feature map is generated.
Step 3 The obtained feature map is then fed to another layer of the network, where convolution, ReLU and pooling are performed again and another feature map is generated. The number of feature maps depends upon the number of layers; each feature map is stacked over the previous one.
Step 4 The set of feature maps is fed to the fully connected layers, which find a probability for each class. Suppose the probabilities found are [0.3, 0.2, 0.2, 0.3]. Since the weights are randomly assigned for the first training example, the output probabilities are also random. For example, if images of a cat, dog, tiger and leopard belong to four different classes and the input image is of a cat belonging to Class-A, then the output probabilities should be [1, 0, 0, 0].
Step 5 The network then calculates the error between the predicted output and the target output. This error (if any) is back-propagated and all the filter weights are updated. Steps 2 to 4 are then performed again, and the output probabilities might now be [0.8, 0.1, 0.0, 0.1]. Thus the network learns to adjust its weights and filters and to classify the image into the correct class.
Step 6 The above steps are then repeated for all the images in the training data set.
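The weight-update loop described in the steps above can be sketched in miniature. Here a tiny linear classifier with a softmax output stands in for the convolutional layers (the convolution/ReLU/pooling stages are replaced by fixed "extracted features"); the features, targets and learning rate are illustrative assumptions:

```python
import math
import random

def softmax(z):
    """Turn raw scores into class probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(0)
# Step 1: the weights start with random values (here a 2x2 weight matrix)
W = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]

# Toy "extracted features" with one-hot targets standing in for two classes
data = [([1.0, 0.0], [1, 0]),
        ([0.0, 1.0], [0, 1])]

lr = 0.5
for _ in range(200):                              # Step 6: repeat over the data set
    for x, target in data:
        # Steps 2-4: forward pass producing class probabilities
        probs = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
        # Step 5: the error (probs - target) is back-propagated
        # as a gradient-descent update of every weight
        for i in range(2):
            for j in range(2):
                W[i][j] -= lr * (probs[i] - target[i]) * x[j]

# After training, the first sample is assigned to its class with high probability
probs = softmax([sum(w * v for w, v in zip(row, [1.0, 0.0])) for row in W])
```

As in Step 5 of the text, the probabilities start out near-random and move toward the one-hot target as the error is repeatedly back-propagated.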
An image given as input to the CNN is processed in turn by the convolution layer, the ReLU function, the pooling operation and the activation function.
Convolution layer The primary purpose of convolution in a ConvNet is to extract features from the input image. Convolution preserves the spatial relationship between pixels by learning image features using small squares of input data. The input to the convolution layer, which performs a linear operation, is the raw image with a known class, and the output is a feature map.
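The sliding-window operation can be sketched as follows; the 4×4 image and the 2×2 diagonal kernel are hypothetical values chosen only to show the mechanics (as in most CNN libraries, the kernel is not flipped, i.e. this is technically cross-correlation):

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and
    sum the element-wise products at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

image = [[1, 1, 1, 0],
         [0, 1, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical 2x2 kernel responding to a '\'-like diagonal
kernel = [[1, 0],
          [0, 1]]
feature_map = convolve2d(image, kernel)
```

Each output value is large where the image patch matches the kernel pattern, which is how the spatial relationship between pixels is preserved in the feature map.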
ReLU function The feature map so obtained is passed through the nonlinear ReLU (Rectified Linear Unit) operation. Because real-world data is nonlinear, ReLU is applied to each pixel, replacing all negative pixel values in the feature map with zero. The output of ReLU is a rectified feature map.
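The element-wise rectification is a one-line operation; the sample feature map below is an assumed value for illustration:

```python
def relu(feature_map):
    """Replace every negative value in the feature map with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

fm = [[-2, 3],
      [4, -1]]
rectified = relu(fm)  # → [[0, 3], [4, 0]]
```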
Pooling This operation reduces the dimensionality of each feature map while retaining the important information in the input image. The step is also called sub-sampling or down-sampling. Pooling applies a filter to the rectified feature map and reduces the size of the image by selecting those pixels which best describe it. Pooling controls over-fitting and produces an almost scale-invariant representation of the image.
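Max pooling, the most common variant, keeps only the strongest response in each window; the 4×4 rectified feature map below is a made-up example:

```python
def max_pool(feature_map, size=2):
    """Max pooling with stride equal to the window size:
    keep only the largest value in each size-by-size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 2],
      [2, 2, 1, 3]]
pooled = max_pool(fm)  # → [[4, 2], [2, 5]]
```

Halving each spatial dimension quarters the data volume while keeping the dominant responses, which is why pooling both reduces processing cost and helps control over-fitting.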
Fully connected layer This is a classification layer in which every neuron in the previous layer is connected to every neuron in the next layer. The output from the pooling layers is used to classify the input image into the various classes defined by the training dataset.
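A fully connected layer followed by a softmax can be sketched as below; the flattened features, weights and biases are hypothetical stand-ins for values a trained network would have learned, and two classes are used instead of the four in the earlier example:

```python
import math

def fully_connected(features, weights, biases):
    """Each output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn raw scores into class probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Flattened pooled feature map and hypothetical learned weights for 2 classes
features = [4.0, 2.0, 2.0, 5.0]
weights  = [[0.5, 0.1, 0.1, 0.2],
            [0.1, 0.3, 0.3, 0.1]]
biases   = [0.0, 0.0]
probs = softmax(fully_connected(features, weights, biases))
```

The resulting probability vector is exactly the kind of per-class output described in Step 4 of the training procedure.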