Hybrid Model Based on ReliefF Algorithm and K-Nearest Neighbor for Erythemato-Squamous Diseases Forecasting

Machine learning (ML) techniques have been used to solve real-world problems for decades. In medical science, they have proved useful in the diagnosis and prognosis of a variety of disorders. However, conventional ML approaches often fail to deliver the expected results on voluminous, inconsistent, and high-dimensional data. Researchers have therefore proposed hybrid solutions, which have been found to be more effective than conventional methods because they combine the merits of their component techniques while reducing their drawbacks. This article presents a hybrid model that couples feature optimization with a prediction approach. The proposed hybrid model has two stages: the first applies the ReliefF algorithm to select the optimum features for erythemato-squamous diseases, and the second applies k-nearest neighbor (KNN) to classify records using those selected features. The experiments were carried out on a benchmark dataset for erythemato-squamous diseases. The hybrid model was also compared with the conventional KNN approach on classification accuracy, kappa coefficient, recall, precision, and F-score.


Introduction
Erythemato-squamous disease refers to skin disorders that leave the skin irritated, blocked, or inflamed. It produces symptoms such as rashes, lesions, macules, swelling, burning, and itching. It may be caused by inflammation, acne medications, asthma, solar radiation, photosensitization, acute radiation syndrome, or bacterial, viral, or fungal infection, any of which may induce dilation of the capillaries, resulting in redness [1]. There are several types, and the treatment depends on the type of erythema. It may occur in numerous patterns and color variations in different areas of the body, and several types cause blisters that erupt and leave sores. The severity of this disorder varies from moderate to life threatening. Mild conditions resolve without treatment within a few days, whereas more extreme cases require medication or emergency treatment [2].
According to the Global Burden of Disease report, skin disorders rank as the 4th most common cause of non-fatal diseases across the globe [3]. According to [4], around 7.5 million people have been diagnosed with psoriasis, 1 million with melanoma, and 16 million with rosacea. Erythemato-squamous disease is difficult to diagnose without the assistance of a specialist, since there are many possible skin conditions and these diseases also share histopathological features; it is therefore best to consult a doctor about any changes. A biopsy is recommended in order to diagnose these illnesses effectively [5].
This domain therefore calls for further research, and many researchers are working in this direction. In recent years, researchers have applied machine learning approaches to diagnose these disorders effectively. These techniques have been found to be efficient, as they help dermatologists make their diagnoses more accurate [6].
However, traditional ML techniques have performance issues when dealing with non-linear, inconsistent, and high-dimensional data. To address these issues, researchers have begun to introduce hybrid and ensemble models, either by combining conventional machine learning methods with other soft computing approaches or by using bagging/boosting methodology. Hybridizing two or more soft computing techniques improves efficiency by incorporating the benefits of each technique. These hybrid methods have proven to be very successful for medical diagnosis in cases where the data is large, multi-dimensional, and non-linear [7]. This motivated the authors to implement a hybrid system based on feature optimization and a prediction approach for erythemato-squamous disease diagnosis, so that the disease can be predicted with better accuracy.
This paper presents a hybrid model that couples the ReliefF algorithm with the K-Nearest Neighbor (KNN) approach for the prediction of erythemato-squamous disorders. The proposed hybrid model works in two stages: the first stage applies the ReliefF algorithm to select the optimum features for erythemato-squamous diseases, and the second stage applies KNN to classify records using those selected features. The experiments were conducted on a benchmark dataset of erythemato-squamous disorders. This dataset comprises six erythemato-squamous diseases, namely psoriasis, seboreic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris.
For the current study, the ReliefF algorithm was selected as the feature optimization technique because it has been found to perform better with multi-class output and categorical input attributes [8]. The KNN approach was selected as the prediction technique because it performs better on heterogeneous data, i.e., data sets with both numeric and categorical input attributes. KNN can also model highly convoluted decision boundaries in multidimensional data [9]. The collected dataset of erythemato-squamous disease consists of 32 categorical and 2 numeric input attributes, and six output classes.
This research article is organized as follows: after the introduction, Sect. 2 sheds light on the literature relevant to the present work. Section 3 describes the methodology adopted to implement the proposed hybrid system. Section 4 analyzes and discusses the experimental outcomes, and Sect. 5 concludes the paper.

Literature Review
In order to diagnose skin diseases, several ML-based approaches have been implemented by researchers in recent years. These techniques include regression, classifiers, association rule mining, decision trees, and neural networks. These approaches assist dermatologists from diagnosis to customized treatment [6].
Guvenir et al. [10] proposed a voting feature interval (VFI5) based method for the identification of erythemato-squamous disorders. The approach was implemented with tenfold cross validation and achieved a classification accuracy of 99.2%.
A model for scaling segmentation in 2-D (two-dimensional) digital images was presented by Lu et al. [11] for the evaluation of psoriasis severity. The proposed approach integrates the Markov random field (MRF) approach with a support vector machine (SVM). The experimental results showed a sensitivity of 81.79% with a specificity of 87%.
Pomponiu et al. [12] applied a pre-trained deep neural network for skin lesion identification. The experiments were carried out on the DermIS dataset consisting of 399 images of skin mole lesions. The proposed system achieved an accuracy of 93.64% along with a recall of 92.10% and a specificity of 95.18%.
The following year, Gustafson et al. [13] presented a machine learning based system for the diagnosis of atopic dermatitis in adults. The proposed system coupled natural language processing (NLP) with the lasso logistic regression technique for the prediction of atopic dermatitis from electronic health records (EHR). The system achieved a sensitivity of 75% with a positive predictive value (PPV) of 84%.
An automated system was developed by Wei et al. [14] for the detection of herpes, dermatitis, and psoriasis from image color and texture features. The system first pre-processes the input image to remove noise and irrelevant background using a median filter and transformation approaches, and then applies the gray-level co-occurrence matrix (GLCM) for feature extraction. Finally, the extracted features are fed to an SVM to identify the diseases considered in the study. The presented system achieved a maximum accuracy of 95%.
An image processing based approach was presented by Alenezi [15] for the recognition of three types of skin ailments: melanoma, eczema, and psoriasis. The approach uses the AlexNet deep neural network to extract features from images and a support vector machine (SVM) for categorization. The experimental results showed a classification accuracy of 100% for the proposed system.
In the same year, Jamian et al. [16] presented an EHR-based approach for the prediction of systemic sclerosis by incorporating billing codes and the random forest technique with clinical data. The presented system achieved a positive predictive value of 84% with a recall of 92% and an F-score of 88%.
Padmavathi et al. [17] applied deep neural networks, namely a convolutional neural network (CNN) and a residual network (ResNet), for the prediction of dermatological diseases. The experiments were performed on a dataset of 10,015 dermatoscopic images categorized into seven different classes. The experimental results showed a classification accuracy of 77% for CNN and 68% for ResNet.
A hybrid model was presented by Rajasekaran et al. [18] for the identification of skin diseases. The proposed hybrid model comprises four steps: pre-processing, segmentation, feature extraction, and classification. The pre-processing step involves gray scale conversion, application of a median filter, and a binary mask. Segmentation is performed by Otsu's method and a histogram chart approach. Feature extraction applies the wavelet transform to extract features from the image. The last stage employs a CNN to classify the extracted features in order to identify the skin disease.
George et al. [19] introduced a framework based on the bag-of-visual-words approach, k-means clustering, and SVM for evaluating psoriasis severity in skin images. The presented framework achieved an accuracy of 80.81%.
The related literature shows that machine learning algorithms can be used to predict skin diseases. Dermatological disorders have been diagnosed from imaging data using deep learning techniques, and traditional ML approaches have been found to be successful when dealing with clinical data. However, when dealing with voluminous, inconsistent, and high-dimensional data, conventional ML approaches have failed to achieve the expected accuracy. Hybrid systems have been suggested by researchers as a means of resolving these problems. These hybrid models have been found to be more effective than conventional methods because they combine the advantages of their component techniques while minimizing their disadvantages. This prompted the authors to work on a hybrid method for diagnosing skin disease from clinical data, with the aim of improving the efficiency of a traditional machine learning technique.

Materials and Methodology
This research work presents a hybrid system for the prediction of erythemato-squamous ailments. The proposed hybrid approach integrates the ReliefF algorithm with the K-nearest neighbor (KNN) approach and works in two stages. The first stage applies the ReliefF algorithm to select the optimum features for erythemato-squamous diseases, and the second stage applies KNN to classify records using those selected features in order to predict the form of erythemato-squamous disorder. The experiments were performed in MATLAB 2018.

Dataset for Erythemato-Squamous Diseases
In this research study, the experiments were conducted on the benchmark dataset of erythemato-squamous disorders. This dataset was obtained from the University of California, Irvine (UCI) Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/dermatology). It contains 366 records, each with 34 input attributes and 1 output attribute. Of the 34 input attributes, 12 are clinical attributes and 22 are histopathological attributes. The 'Age' attribute represents the patient's age and takes an integer value. The 'Family History' attribute is nominal: 1 means the disease has occurred in the family and 0 means it has not. The remaining 32 input attributes take integer values from 0 to 3 (0: feature is absent, 1: mild value, 2: moderate value, 3: severe value). The output attribute represents six classes, namely lichen planus, psoriasis, pityriasis rosea, seboreic dermatitis, chronic dermatitis, and pityriasis rubra pilaris. The particulars of the dataset attributes are presented in Tables 1 and 2.

Techniques used for Implementation of Hybrid Model
In this research work, a hybrid model has been developed by coupling the ReliefF algorithm with K-nearest neighbor (KNN) for the prediction of erythemato-squamous disorders. The background of these techniques is presented below.

ReliefF Algorithm
This algorithm computes feature weights when the response (Class) is a multi-class categorical variable. It penalizes predictors that give different values to neighbors of the same class and rewards predictors that give different values to neighbors of different classes. At the outset, the algorithm sets the weights of all predictors to zero. It then iteratively selects an observation $x_r$ at random and finds the k nearest observations to $x_r$ for each class. For each nearest neighbor $x_q$, the weight of a given predictor $F_j$ is then updated: if $x_r$ and $x_q$ belong to the same class, the weight is decreased according to how much their values of $F_j$ differ; if they belong to different classes, the weight is increased, scaled by the class probabilities. Here $W_j^i$ is the weight of the attribute $F_j$ at the ith iteration, $P_{y_r}$ is the prior probability of the class to which $x_r$ belongs, $P_{y_q}$ is the prior probability of the class to which $x_q$ belongs, and m is the number of iterations.
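The two update rules described above can be written out explicitly. The following is a commonly used form of the ReliefF update, given as a sketch consistent with the symbols defined in the text rather than as the authors' exact equations. The symbols $\Delta_j(x_r, x_q)$ (the difference between the values of predictor $F_j$ for the two observations, scaled to [0, 1]) and $d_{rq}$ (a neighbor weighting term, equal to $1/k$ when all k neighbors count equally) are assumptions introduced here and are not defined in the text.

```latex
% Same class: differing values of F_j are penalized
W_j^{i} = W_j^{i-1} - \frac{\Delta_j(x_r, x_q)}{m}\, d_{rq}

% Different classes: differing values of F_j are rewarded,
% weighted by the class prior probabilities
W_j^{i} = W_j^{i-1} + \frac{P_{y_q}}{1 - P_{y_r}} \cdot \frac{\Delta_j(x_r, x_q)}{m}\, d_{rq}
```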
ReliefF belongs to the Relief family of algorithms, which has proven to be a very successful family of feature estimators. These algorithms take conditional dependencies among the features into account and offer a unified view of feature estimation in classification and regression problems. They are widely known as feature selection techniques and are also used in pre-processing stages before the model is trained [20]. In this study, the ReliefF algorithm was run with the input attributes as predictor variables and the output attribute as the response vector, using seven nearest neighbors per class. The rank values and weights determined by the ReliefF algorithm for each input feature are presented in Table 3. Feature no. 15 (Fibrosis of the Papillary Dermis) was found to be the most important attribute, whereas feature no. 3 (Definite Borders) was found to be the least significant attribute in the erythemato-squamous disorder dataset. Out of the 34 input attributes, the 20 most significant attributes were selected for the classification performed in the next phase of the proposed hybrid model. The details of these selected input features are presented in Table 4.
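The feature-ranking stage described above can be sketched in a few lines of Python. This is a minimal illustration and not the MATLAB routine used in the study: the function names `relieff_weights` and `top_features`, the use of Manhattan distance to rank neighbors, the equal weighting of the k neighbors, and the assumption that features are already scaled to [0, 1] are all choices made for this sketch.

```python
import random


def relieff_weights(X, y, k=7, n_iter=None, seed=0):
    """Estimate one ReliefF weight per feature (sketch).

    X: list of rows, features assumed already scaled to [0, 1].
    y: list of class labels, one per row.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = n_iter if n_iter is not None else n
    priors = {c: y.count(c) / n for c in set(y)}  # class prior probabilities
    weights = [0.0] * d  # all weights start at zero

    def distance(a, b):
        # Manhattan distance, used only to rank neighbours (assumption)
        return sum(abs(u - v) for u, v in zip(a, b))

    for _ in range(m):
        r = rng.randrange(n)  # randomly selected observation
        for c in priors:
            # k nearest observations of class c, excluding the sample itself
            candidates = [q for q in range(n) if y[q] == c and q != r]
            candidates.sort(key=lambda q: distance(X[r], X[q]))
            nearest = candidates[:k]
            if not nearest:
                continue
            if c == y[r]:
                scale = -1.0  # same class: penalize differing values
            else:
                scale = priors[c] / (1.0 - priors[y[r]])  # reward, prior-weighted
            for q in nearest:
                for j in range(d):
                    diff = abs(X[r][j] - X[q][j])
                    weights[j] += scale * diff / (m * len(nearest))
    return weights


def top_features(weights, n_select=20):
    """Indices of the n_select highest-weighted features, best first."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return ranked[:n_select]
```

With the settings reported in the text, one would call `relieff_weights(X, y, k=7)` on the normalized data and keep `top_features(weights, n_select=20)`; the resulting ranking need not match Table 3 exactly, since the sampling and neighbor weighting here are simplified.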

K-Nearest Neighbor
KNN is a non-parametric supervised machine learning approach that is applied to both classification and regression. The classification of a given input record into an output class label depends on the features of its neighboring records and on the value of K. KNN operates by measuring the distance between data points to capture their similarity. Commonly used distance measures include Manhattan (city block), Euclidean, and Hamming. To find the nearest neighbors, the distance metric is used to identify the training records at minimum distance from the test instance. The test instance is then assigned the class held by the majority of its k nearest neighbors. Many research publications use the Euclidean measure to compute the distance between two data points in the dataset, which is given by
$$d(V, W) = \sqrt{\sum_{i=1}^{n} (V_i - W_i)^2}$$
where V and W correspond to the records in the training set and the test set, respectively, with n input attributes in the data [21]. Here, the KNN model is implemented as a classifier in the second phase of the hybrid model. The presented hybrid model was trained via the KNN approach with the 20 significant input parameters (selected by the ReliefF algorithm) and 1 output parameter with 6 classes from the dataset under study. The KNN approach was applied with tenfold cross validation and the Euclidean distance metric. The most favorable number of nearest neighbors (k) was found to be 5. The KNN approach classifies the input data into one of the six classes of erythemato-squamous diseases.
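The classification rule described above (Euclidean distance, majority vote among the k nearest training records) can be illustrated with a short Python sketch. The function names `euclidean` and `knn_predict` and the tie-breaking behavior are assumptions of this sketch; it is not the MATLAB implementation used in the study.

```python
import math
from collections import Counter


def euclidean(v, w):
    """Euclidean distance between two records with the same n attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def knn_predict(train_X, train_y, query, k=5):
    """Assign `query` the majority class among its k nearest training records."""
    # Rank training records by distance to the query, nearest first
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tie, the label encountered first (nearest neighbour) wins
    return votes.most_common(1)[0][0]
```

For example, with k = 5 as reported in the text, `knn_predict(train_X, train_y, record, k=5)` would return one of the six class labels for a record described by the 20 selected features.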

Implementation of Proposed Hybrid Architecture
In order to implement and evaluate the proposed hybrid architecture, the benchmark dataset was collected. Data pre-processing was then carried out to handle missing values through imputation. The outliers present in the data were identified using a statistical technique. To bring all the input attributes into the range [0, 1], the Min-Max normalization method was applied, as given in the following equation:
$$M' = \frac{M - \mathrm{minimum}(A)}{\mathrm{maximum}(A) - \mathrm{minimum}(A)}$$
where M corresponds to the true value of the attribute A, M′ corresponds to the normalized value of the respective attribute, minimum(A) refers to the smallest value, and maximum(A) is the largest value of the attribute A. After normalization, the proposed hybrid model was implemented. This hybrid model works in two phases. In the first phase, the ReliefF algorithm was applied to the pre-processed data to select the most significant attributes in the dataset. In a higher-dimensional dataset with fewer instances, majority of the features are not important
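The Min-Max normalization step described in this section can be sketched as follows. The function name `min_max_normalize` and the handling of a constant attribute (mapped to 0.0 to avoid division by zero) are assumptions of this sketch, since the text does not say how that case was treated.

```python
def min_max_normalize(column):
    """Rescale the values of one attribute to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant attribute: map every value to 0.0 (assumption)
        return [0.0 for _ in column]
    return [(m - lo) / (hi - lo) for m in column]
```

Applied to an attribute graded 0 to 3, the values 0, 1, 2, 3 map to 0, 1/3, 2/3, and 1, so attributes with different ranges such as age contribute on a comparable scale to distance computations.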

Results and Discussion
In the current research work, a hybrid model based on the ReliefF algorithm and the KNN approach has been presented for the prediction of erythemato-squamous disorders. The experimental findings indicate that the proposed hybrid model can be used to predict the diseases under study. In medical diagnosis, doctors need a diagnostic system to produce accurate results and to avoid both false positives and false negatives, so that the disease is not misclassified. For this reason, the hybrid model was validated on accuracy, recall, precision, kappa coefficient, and F-score. Accuracy describes the overall performance of the model, precision reflects false positives, and recall reflects false negatives. The F-score combines precision and recall into a single value and so balances the two.
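The validation metrics named above can all be derived from a confusion matrix. The following Python sketch shows one way to do so; the function name `metrics_from_confusion`, the row/column orientation (rows are true classes, columns are predicted classes), and the use of Cohen's kappa for the kappa coefficient are assumptions of this sketch, as the text does not specify them.

```python
def metrics_from_confusion(cm):
    """Compute accuracy, class-wise precision/recall/F-score and kappa.

    cm[i][j] = number of records of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    precision, recall, f_score = [], [], []
    for i in range(n):
        actual = sum(cm[i])                          # records truly in class i
        predicted = sum(cm[r][i] for r in range(n))  # records predicted as class i
        rec = cm[i][i] / actual if actual else 0.0
        prec = cm[i][i] / predicted if predicted else 0.0
        f = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        recall.append(rec)
        precision.append(prec)
        f_score.append(f)

    # Cohen's kappa: agreement beyond what chance alone would give
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / (total * total)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score, "kappa": kappa}
```

Class-wise recall values such as those reported for the hybrid model correspond to the `recall` list returned here, one entry per class.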
The experimental results are presented in Table 5, and these results revealed that the proposed hybrid model outperformed the traditional KNN approach. The confusion matrices for the proposed hybrid model and traditional KNN are presented in Fig. 3a and b, respectively. The confusion matrices show that traditional KNN gave reasonably good results for Class 1 (psoriasis), Class 3 (lichen planus), and Class 5 (chronic dermatitis), but performed poorly for Class 2 (seboreic dermatitis) and Class 4 (pityriasis rosea). For Class 1, the performance of the hybrid model and traditional KNN was the same. For Class 2, the traditional approach achieved a precision of 44.4% and a recall of 50%. For Class 4, it achieved a recall of 50% and a precision of 55.6%. For Class 6, it achieved a recall of 83%. The proposed hybrid model matched the traditional approach on Class 1 and achieved better results on the other classes. The hybrid model achieved class-wise recall of 100%, 75%, 100%, 100%, 93%, and 100%.
Similarly, the Receiver Operating Characteristic (ROC) curves for the proposed hybrid model and traditional KNN are presented in Fig. 4a and b, respectively. They also show that the traditional KNN approach performed poorly for Class 2 and Class 4, where the true positive rate of the hybrid model is better than that of traditional KNN.
According to the experimental results, the hybrid model outperformed conventional KNN in terms of accuracy, recall, precision, kappa coefficient, and F-score. The hybrid model achieves a higher true positive rate than conventional KNN, and the conventional approach produced more false positives and false negatives than the hybrid model. The explanation is that conventional KNN used all of the features in the disease dataset to classify the disease; as a result, the data is higher-dimensional and includes features that may be less relevant. This raises the computational cost, reduces the efficiency of the trained model, and can cause overfitting.
In the proposed hybrid model, only the significant attributes present in the data were taken into account, which reduced the dimensionality of the data and improved the performance of the learned model.

Conclusion
In this paper, a hybrid system has been presented for the forecasting of erythemato-squamous disorders from clinical data. The experimental findings indicate that the proposed model can be used effectively for the diagnosis of the disease under study. The model improved on the performance of traditional KNN by eliminating less contributing features from the dataset, thus increasing the efficiency of the trained model. The main limitation of this model is that it cannot be used on imaging data or biomedical signals, as it has been trained on clinical data. In the future, the same hybrid model may be used to diagnose other medical conditions. The model could also be extended to diagnose medical conditions using imaging data or biomedical signals.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.