Multi-objective rain optimization algorithm with WELM model for customer churn prediction in telecommunication sector

Customer retention is a major challenge in several business sectors, and many companies regard customer churn prediction (CCP) as an important process for retaining customers. CCP in the telecommunication sector has become essential owing to the rise in the number of telecommunication service providers. Recently, machine learning (ML) and deep learning (DL) models have been used to develop effective CCP models. This paper presents a new improved synthetic minority over-sampling technique (ISMOTE) combined with an optimal weighted extreme learning machine (OWELM), called the ISMOTE-OWELM model, for CCP. The presented model comprises preprocessing, balancing of the unbalanced dataset, and classification. The multi-objective rain optimization algorithm (MOROA) is used for two purposes: determining the optimal sampling rate of SMOTE and tuning the parameters of the WELM. Initially, the customer data undergo normalization and class labeling. Then, ISMOTE is employed to handle the imbalanced dataset, with the rain optimization algorithm (ROA) applied to determine the optimal sampling rate. Finally, the WELM model is applied to determine the class labels of the data. Extensive experiments are carried out to validate the ISMOTE-OWELM model on the CCP telecommunication datasets. The simulation outcomes show that the ISMOTE-OWELM model is superior to the other models, with accuracies of 0.94, 0.92, and 0.909 on datasets I, II, and III, respectively.


Introduction
Customer retention plays an important role in churn prediction, and in telecommunication organizations it is directly tied to financial performance. Hence, many industries take different actions to develop a robust relationship with users and to reduce user defections. To establish the best customer retention, it is essential to understand how user behavior changes and to find the causes of churn. In recent times, competition and cost-cutting pressure have pushed firms to make the most of customer relationship management (CRM). The subsequent behavior of a customer has to be turned from unknown into known. However, predicting a user's decision is highly complex, and early detection helps prevent churn [1]. Such prediction is a mandatory input for customer retention models, and special offers like gifts, incentives, discounts, and promotions can improve user satisfaction. The impact of customer satisfaction on retention in the telecommunication industry is essential, and expert knowledge of the related aspects helps enhance user satisfaction and limit churn, provided that CRM is employed proficiently [2].
In CRM, customer retention is one of the important tasks, and it directly affects a firm's profit and production. For instance, even a small increase in the retention rate can raise profits considerably. Switching costs arise from users' switching behavior and are incurred by customers, firms, or both. Such costs are not limited to monetary factors; they also relate to psychological aspects that influence brand loyalty. Moreover, various switching costs exist, such as transaction costs, compatibility with a product, learning to use new services, uncertainty regarding the quality of new services, and search costs for new providers. These costs matter for large corporations such as those in the telecommunication sector [3]. Because of switching costs, firms have to spend large sums to attract and retain customers. Thus, retaining existing users is preferable to acquiring new customers, which is expensive and complicated. Retention in the telecommunication sector can be influenced by many factors, including network quality, the cost of a plan, the image of a competitor, and mobile number portability [4]. This literature concentrates on identifying the prominent aspects that affect the quality of CRM by applying machine learning (ML) methodologies.
Telecommunication firms have applied an extensive range of tactics, such as developing bundles of services, providing various plans to satisfy user demands, and offering discounts to retain user loyalty and attract new clients, so that profit can be enhanced gradually. Ironically, these tactics intensify competition among firms, and predicting user behavior is a complex operation. Firms therefore prefer to offer customized services to retain users, prevent customer churn, and keep up with the increased competition. The term churn means the loss of users who change their supplier within a limited duration [5]. Customer retention is performed to reduce customer churn, which is a major problem in the telecommunication industry. As user demands grow, competition among firms in the telecommunication sector also intensifies, and organizations apply an extensive range of procedures and tactics.
One of them is the CRM model, which concentrates on building net profit together with key stockholders and users. The deployment of digital systems, information technologies, state-of-the-art ML, and statistical methods offers better opportunities for telecommunication firms to learn user requirements, which is essential for effective CRM [6]. The finances of a telecommunication company depend on its users, so even a small deviation from a financial objective has to be watched and monitored. Here, customer churn is caused by changes in user behavior. However, detecting what customers think is not possible even by applying tactics and systematic ideas. It is essential to study customer behavior afresh to find new factors of customer churn, and applying CRM would be highly advantageous in minimizing the churn rate. Retaining existing users is more profitable than attracting new customers [7]. Attracting new users is expensive, which is why telecommunication service providers try to retain their existing customers. Thus, developing a precise churn prediction method and finding the users likely to churn are highly difficult operations.
In recent times, ML methods have been employed in CRM applications. For instance, [8] identified users at the stage where they begin to display a downward trend and a tendency to churn, using data gathered by actively reaching out to recent users and hearing their feedback. First, the authors defined and predicted the downward-trend propensity of users with a Decision Tree (DT) framework. In addition, semi-supervised learning and causal inference were employed for identifying silent sufferers and a golden set. [9] applied a supervised ML model to find the users who can be retained in a mobile app-based business in which users subscribe to e-learning applications. The central premise of that study is to discover the factors that help retain loyal customers and to plan marketing procedures accordingly.
This paper presents a new improved synthetic minority over-sampling technique (ISMOTE) combined with an optimal weighted extreme learning machine (OWELM), called the ISMOTE-OWELM model, for CCP. The presented model includes preprocessing, balancing of the unbalanced dataset, and classification. First, the customer data undergo normalization and class labeling. Afterward, the ISMOTE technique is employed to handle the imbalanced dataset. Finally, the OWELM model is applied to compute the class labels of the data. The multi-objective rain optimization algorithm (MOROA) is used for two purposes: determining the optimal sampling rate of SMOTE and tuning the parameters of the WELM. ROA is chosen in this study because it is a simple and efficient search technique that can identify an optimal solution in a large search space within a reasonable amount of time. A series of simulations is carried out to validate the ISMOTE-OWELM model on the CCP telecommunication datasets.

Related works
Numerous models have been employed for predicting churn in the telecom industry. ML and data mining (DM) methodologies have been applied by most researchers in recent years. Several related works have concentrated on using a single DM method for knowledge extraction, while others have focused on combining several methods for churn prediction. Brandusoiu et al. [10] presented an advanced DM approach to detect churn among existing customers, using a dataset with call details of a large number of customers and a dependent churn variable with two values, yes and no. Additionally, some features record input and output values and voicemail data for each user. The researchers used principal component analysis (PCA) for dimensionality reduction. Three ML approaches were then applied to predict churn: neural networks (NN), Support Vector Machine (SVM), and Bayes Networks. The area under curve (AUC) was employed to estimate the models' efficiency, and the best values obtained from the three ML models were reported. However, the dataset applied in this work is small and does not have any missing values. He et al. [11] developed an approach for churn prediction based on the NN model and addressed the CRM issues of a large-scale Chinese telecom company with a huge number of users. As a result, remarkable prediction accuracy was attained.
Idris et al. [12] presented a framework based on genetic programming with AdaBoost and developed a churn prediction approach for telecommunications. The scheme was tested on two standard datasets, Orange Telecom and cell2cell, and achieved higher accuracy on the cell2cell dataset than on the Orange Telecom dataset. Huang et al. [13] analyzed the problem of customer churn in a big data application. The key objective of the authors was to show that big data can maximize churn prediction ability based on the 3Vs, namely the Volume, Variety, and Velocity of data. The Random Forest (RF) method was applied and evaluated with the help of AUC. Makhtar et al. [14] proposed an approach for churn prediction in telecom services using rough set theory. The Rough Set classification approach surpassed previous models such as Linear Regression, DT, and Voted Perceptron NN. Numerous authors have examined the problem of imbalanced datasets, in which the churned customer class is tiny compared with the active customer class; this is the main problem in the churn prediction process. Amin et al. [15] performed a comparison among six sampling methods for oversampling in telecom CCP. The outcomes showed that rule generation based on a genetic algorithm (GA) performed better than the oversampling approaches. Burez and den Poel [16] studied the issues of imbalanced datasets in CCP schemes. Their simulation outcomes indicated that undersampling frameworks surpassed all the previous methodologies significantly.

Figure 1 depicts the working process of the proposed ISMOTE-OWELM model. As shown in the figure, the input customer churn data undergo data preprocessing, dataset balancing, and classification. The MOROA is used for two purposes: determining the optimal sampling rate of SMOTE and tuning the parameters of the WELM.

Data preprocessing
During the preparation step, the values of every attribute in the dataset are discretized by size, and each discrete group is then assigned a particular label, e.g., one of the feasible values zero to nine (0-9). Discretizing by size converts the numerical attributes to nominal attributes and groups their values into bins of a particular size. The number of bins is obtained by dividing the total count of values in an attribute by the size of the bin. The result is, for each attribute, a list of values divided into several groups.
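The discretization step can be illustrated with a minimal Python sketch. It assumes that "discretizing by size" means sorting the values of one attribute and placing a fixed number of values in each bin; the function name `discretize_by_size`, the `bin_size` parameter, and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np

def discretize_by_size(values, bin_size):
    """Map one numeric attribute to integer bin labels, `bin_size` values per bin."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")      # indices from smallest to largest value
    labels = np.empty(len(values), dtype=int)
    # The value with rank r receives label r // bin_size, so the number of
    # bins equals (count of values) / bin_size, rounded up.
    labels[order] = np.arange(len(values)) // bin_size
    return labels

# Hypothetical attribute with 10 values; 2 values per bin gives 5 bins labeled 0-4.
minutes = [12.0, 250.5, 90.1, 33.3, 410.0, 5.2, 180.0, 75.4, 300.9, 60.0]
labels = discretize_by_size(minutes, bin_size=2)
```

With ten bins per attribute, this scheme would yield the labels 0-9 mentioned in the text. Note that in this sketch tied values can fall into neighboring bins.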

SMOTE for unbalanced dataset
SMOTE is an over-sampling model [17] that creates new minority class samples by interpolating between original minority class samples. SMOTE randomly selects one of the k-nearest neighbors (kNN) of a minority class instance and performs a random interpolation between the two instances to create a new synthetic instance. In particular, a synthetic sample is produced as follows: take the difference between an actual instance and its nearest neighbor, multiply this difference by a random number between 0 and 1, and add the result to the actual instance. This makes the decision region of the minority class more general. Hence, over-fitting issues are reduced, and the decision boundary for the minority class spreads further into the majority class space. Based on SMOTE, numerous methods have been presented for classifying imbalanced data [18]. The previous SMOTE methods suffer from a severe issue: they apply the same sampling rate to all instances of the minority class. However, different samples play distinct roles in the sampling and classification processes. The sampling rate should therefore depend on the individual instance, and the proposed MOROA model is used to identify and apply the best sampling rates for different instances.
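The interpolation rule described above can be sketched in a few lines of NumPy. This is a minimal illustration of standard SMOTE with a single sampling budget, not the authors' implementation; the function name `smote_sample` and its parameters are hypothetical.

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=100, seed=0):
    """Create `n_new` synthetic minority samples by kNN interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)                               # cannot have more neighbors than other points
    # Pairwise squared distances among minority samples; a sample is not its own neighbor.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]       # indices of the k nearest neighbors
    base = rng.integers(0, n, size=n_new)           # actual instance for each new sample
    nb = neighbors[base, rng.integers(0, k, size=n_new)]  # one random neighbor of it
    gap = rng.random((n_new, 1))                    # random number in [0, 1)
    # new = actual + gap * (neighbor - actual)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.4, 1.0]])
synthetic = smote_sample(X_min, k=3, n_new=20)
```

Every synthetic point lies on the segment between a minority sample and one of its neighbors, so the new points stay inside the region already occupied by the minority class.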
Choosing samples from the minority class for over-sampling and fixing the sampling rates depend on the degree of imbalance of the dataset, the overall distribution of samples, the internal distribution of minority class samples, the count of samples, the count of sample attributes, and the classes of attributes. It is a complex optimization problem and is resolved by a numerical computation method. To reach the maximum accuracy rate of minority class classification as well as the optimal overall classification accuracy rate, the instances in the actual training set are associated with individual sampling rates. ROA is applied to identify the sampling rates for the different instances by solving:

$$\max f(X) \quad \text{s.t.} \quad X = (N_1, N_2, \dots, N_M), \quad \mathrm{min}N \le N_i \le \mathrm{max}N, \quad i = 1, 2, \dots, M \tag{1}$$

where $f(X)$ is the objective function, which covers the accuracy rate of minority class classification and the overall classification accuracy on the dataset; $X$ denotes the decision vector, i.e., the sampling rates; $M$ is the dimension of the decision space; $N_i$ is the sampling rate of minority class sample $x_i$; and $\mathrm{min}N$ and $\mathrm{max}N$ are the minimum and maximum bounds, respectively, of the sampling rate $N_i$.
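To make the role of the decision vector concrete, the following sketch applies a separate sampling rate to each minority sample. It assumes that the rate N_i is the number of synthetic samples generated from minority sample x_i and that rates outside the bounds are clipped; the function name `ismote_oversample` and its parameters are hypothetical, and the optimizer that would propose `rates` is not shown.

```python
import numpy as np

def ismote_oversample(X_min, rates, k=5, min_rate=0, max_rate=10, seed=0):
    """Generate rates[i] synthetic samples from the i-th minority sample."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Enforce the bounds min_rate <= N_i <= max_rate on every sampling rate.
    rates = np.clip(np.rint(rates), min_rate, max_rate).astype(int)
    n = len(X_min)
    k = min(k, n - 1)
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]
    synthetic = []
    for i, n_i in enumerate(rates):
        for _ in range(n_i):
            j = neighbors[i, rng.integers(k)]       # random one of the k nearest neighbors
            gap = rng.random()                      # random number in [0, 1)
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])

X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
# The last rate (20) exceeds max_rate and is clipped to 10: 0 + 3 + 1 + 10 = 14 samples.
synthetic = ismote_oversample(X_min, rates=[0, 3, 1, 20])
```

In an optimization loop, each candidate vector of rates would be scored by training the classifier on the resulting balanced set and measuring the minority class and overall accuracy.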

WELM-based classification
The extreme learning machine (ELM) is utilized to classify balanced datasets, whereas the WELM is utilized to classify imbalanced datasets. The structure of ELM is shown in Fig. 2. This section describes the establishment of WELM [19]. The training dataset has $N$ distinct instances $(x_j, t_j)$, $j = 1, \dots, N$. The single hidden layer NN with $L$ hidden layer nodes can be represented as:

$$\sum_{i=1}^{L} \beta_i \, l(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \dots, N \tag{2}$$

where $w_i$ is the input weight of the single hidden layer, $l(\cdot)$ is the activation function, $\beta_i$ denotes the output weight, and $b_i$ represents the bias of the single hidden layer. Here, Eq. (2) can be written in matrix form as:

$$H\beta = T \tag{3}$$

where $H$ is the hidden layer output matrix, whose $j$-th row is the hidden layer feature map $s(x_j)$, and $T$ is the target matrix. Based on the Karush-Kuhn-Tucker theory, a Lagrangian factor is established for transforming the training of ELM into a dual problem. The output weight is computed as follows:

$$\beta = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T \tag{4}$$

where $C$ is the regularization coefficient. The output function of the ELM classification model is then:

$$f(x) = s(x)\,H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T = \left[K(x, x_1), \dots, K(x, x_N)\right]\left(\frac{I}{C} + \Omega\right)^{-1} T \tag{5}$$

where $\Omega$ is the kernel matrix, which is computed as:

$$\Omega = HH^{T}, \quad \Omega_{ij} = s(x_i) \cdot s(x_j) = K(x_i, x_j) \tag{6}$$

It is seen from (5) that the ELM classification result does not depend on the explicit hidden layer feature map $s(x)$ but only on the kernel function $K(x, y)$. Because $K(x, y)$ takes the form of an inner product, the number of hidden layer nodes has no effect on the output. Thus, ELM does not require setting the input weights and offsets of the hidden layer. The kernel function of the kernel ELM (KELM) is the RBF function, defined as:

$$K(x, y) = \exp\left(-\gamma \lVert x - y \rVert^{2}\right) \tag{7}$$

Hence, KELM classification is governed by two parameters: the kernel function parameter $\gamma$ and the penalty parameter $C$. Building on the advantages of ELM, the WELM allocates weights to the samples and can thereby manage the imbalanced classification problem. Its output function is computed as:

$$f(x) = \left[K(x, x_1), \dots, K(x, x_N)\right]\left(\frac{I}{C} + W\Omega\right)^{-1} W T \tag{8}$$

where $W$ is a weight matrix. WELM has 2 weighting methods, i.e.,
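A kernel-based WELM of the kind described in this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: labels are encoded as -1 and +1, the RBF kernel is taken as exp(-gamma * ||x - y||^2), and each sample is weighted by the inverse of its class size, which is a common choice in the WELM literature and is assumed here because the text breaks off before listing its weighting methods. The names `rbf_kernel`, `fit_kernel_welm`, and `predict_kernel_welm` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for all rows of A and B (assumed form)."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kernel_welm(X, y, C=10.0, gamma=1.0):
    """Solve (I/C + W * Omega) a = W * T for the dual coefficients a."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)                  # labels in {-1, +1}
    # ASSUMPTION: diagonal weight = 1 / (number of samples in the sample's class).
    w = np.where(y > 0, 1.0 / (y > 0).sum(), 1.0 / (y <= 0).sum())
    omega = rbf_kernel(X, X, gamma)                 # kernel matrix Omega
    A = np.eye(len(X)) / C + w[:, None] * omega     # I/C + W * Omega, W diagonal
    return np.linalg.solve(A, w * y)

def predict_kernel_welm(X_train, coef, X_new, gamma=1.0):
    """f(x) = [K(x, x_1), ..., K(x, x_N)] a, thresholded at zero."""
    K = rbf_kernel(np.asarray(X_new, dtype=float), np.asarray(X_train, dtype=float), gamma)
    return np.sign(K @ coef)

# Toy imbalanced data: 8 majority samples (label -1) and 2 minority samples (label +1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 0.5], [0.5, 0], [1, 0.5],
              [3, 3], [3.5, 3]], dtype=float)
y = np.array([-1] * 8 + [1] * 2)
coef = fit_kernel_welm(X, y, C=10.0, gamma=1.0)
preds = predict_kernel_welm(X, coef, [[0.2, 0.2], [3.2, 3.1]], gamma=1.0)
```

The two values that an optimizer such as ROA would tune in this sketch are `C` and `gamma`.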

ROA for sampling rate selection and parameter tuning
In this section, ROA is applied for the selection of optimal sampling rates of SMOTE and parameter tuning of the WELM model.

Inspiration
When rainfall starts, droplets fall onto the earth's surface. After some time, droplets join one another, merge, and move over the surface according to their weight. Some droplets move toward neighboring droplets and merge with them, while others evaporate or are absorbed by the soil, depending on soil features such as surface texture, porosity, permeability, and wettability. In addition, some soil dissolves in the water. Droplets that fall on a flat region are absorbed completely by the soil and vanish, whereas those on an inclined region link with other droplets to form a stream. Some streams connect with one another and become a river. Where a barrier lies in the path of streams or rivers, lakes develop, and the quantity of water in a lake represents the significance of its droplets. Once the rain stops, streams and rivers discharge into local lakes. Afterward, tiny lakes disappear because their water evaporates or is absorbed by the soil. Thus, only the important lakes persist, depending on the topology of the earth's surface and the features of the soil. Such lakes represent the local minima of the ground surface, and the deepest lake represents the global minimum. When the type of rain changes, the procedure described above changes slightly. For example, in heavy rain with massive droplets, the droplets link with one another strongly, with no absorption or evaporation, and cause a flood. In this case, the global minimum is found, and the local minima are connected to one another by the rainstorm. In contrast, in light rain with small droplets, every droplet is absorbed by the soil, so no stream develops. Hence, parameter tuning is of vital significance when applying ROA [20].
The movement of a particle in the presented model resembles that in gradient-related optimization approaches and traditional single-point models such as hill-climbing (HC), gradient-descent (GD), and the rainfall optimization algorithm (RFO). These approaches modify a single parameter in each iteration to identify whether the adjustment of this parameter improves the cost function. In contrast, ROA applies a collection of candidate solutions that move toward the best one at the same time, and the features of these solutions change in every iteration.

Algorithmic steps of ROA
Here, the rain behavior is simulated as described in the previous section. Every solution to a problem is referred to as a raindrop. Accordingly, a few points in the answer space are chosen at random, like raindrops falling on the earth's surface. The major characteristic of a raindrop is its radius. The radius of a raindrop may shrink as time passes, and it grows when the raindrop links with other drops. When the initial answer population is generated, the radius of every droplet is assigned randomly within a limited range. Each droplet then checks its neighborhood, whose extent depends on the droplet's size. A single droplet that is not yet linked to others only checks the end limits of the region it covers. For a problem in n-dimensional space, each droplet is composed of n variables. Thus, in the first step, the lower and upper limits of the first variable are checked, where the limits are determined by the radius of the droplet. Next, the two endpoints of the following variable are sampled, and this is repeated until the final variable is reached. Afterward, the cost of the first droplet is updated by moving it in the downward direction. This is performed for each droplet, and the cost and position of every droplet are assigned. The radius of a droplet is modified in two ways.

When two droplets with radii $r_1$ and $r_2$ are close to one another and share a common area, they connect to form a larger droplet of radius $R$:

$$R = \left(r_1^{n} + r_2^{n}\right)^{1/n}$$

where $n$ is the number of variables of each droplet.

When a droplet with radius $r_1$ does not move, part of its water is absorbed by the soil, depending on the soil features represented by $\alpha$:

$$R = \left(\alpha\, r_1^{n}\right)^{1/n}$$

Here, $\alpha$ reflects the volume percentage of a droplet that is absorbed in each iteration, from 0 to 100 percent. Moreover, a lower bound on the droplet radius, $r_{\min}$, is defined, and droplets with a radius smaller than $r_{\min}$ vanish.
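The two radius updates and the vanishing rule can be written as small helper functions. This sketch assumes the forms R = (r1^n + r2^n)^(1/n) for merging and R = (alpha * r1^n)^(1/n) for absorption, as we understand the published ROA formulation; the function names are hypothetical.

```python
def merged_radius(r1, r2, n):
    """Radius of the droplet formed when two droplets in n dimensions connect."""
    return (r1 ** n + r2 ** n) ** (1.0 / n)

def absorbed_radius(r1, alpha, n):
    """Radius of a stationary droplet after one absorption step with factor alpha."""
    return (alpha * r1 ** n) ** (1.0 / n)

def survives(radius, r_min):
    """A droplet whose radius falls below r_min is removed from the population."""
    return radius >= r_min

# In two dimensions, droplets of radius 3 and 4 merge into one of radius 5.
big = merged_radius(3.0, 4.0, n=2)
# A droplet of radius 2 with alpha = 0.25 shrinks to radius 1.
small = absorbed_radius(2.0, alpha=0.25, n=2)
```

Merging conserves the n-dimensional "volume" r^n of the two droplets, which is why the merged radius grows more slowly than the sum r1 + r2.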
As mentioned above, the population size decreases after some iterations, and larger droplets with a larger domain of exploration are formed. As the search proceeds, the local search ability of the drops increases in proportion to the diameter of the droplets. Thus, as the number of iterations grows, weaker droplets vanish or link with stronger drops that have a larger domain of exploration, and the initial population shrinks rapidly toward the correct answer(s). ROA differs in a few respects from the existing search model known as the Rain Fall Algorithm (RFA), which can be summarized as follows:
• In ROA, the population size changes after every iteration because neighboring drops connect. This enhances the search capability of the model and reduces the optimization cost significantly.
• The size of a droplet changes when it links with nearby droplets or is absorbed by the soil. This changes the search potential of every droplet and differentiates the droplets.
• In RFA and other search models, every member of the population is surrounded by neighbor points, and the droplet is advanced one step at random. In ROA, every member of the population identifies the best path to a lower point. Once the path is found, the droplet is shifted downward step by step, so that the cost function is reduced repeatedly within a single iteration.
The rain method is described on the basis of the approximations and idealizations of this model. In detail, the tuning parameters of the method are the initial number of raindrops (population size), the initial raindrop radius, and so forth. A value is then assigned to every droplet on the basis of the cost function, and every droplet is moved downward. Nearby droplets combine with one another, which leads to better results. When a droplet stops at a minimum point, its radius begins to decrease progressively, which increases the accuracy of the answer. In this way, the method can identify the extrema of the objective function.
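The steps above can be combined into a compact, simplified sketch of ROA for minimizing a cost function. It is an illustration under assumptions rather than the authors' implementation: all droplets start with the same radius, each droplet probes the two endpoints of every variable at a distance equal to its radius, droplets merge when their neighborhoods overlap, and stationary droplets shrink by the absorption rule. The function name `rain_optimize`, its parameters, and their default values are hypothetical.

```python
import numpy as np

def rain_optimize(cost, lower, upper, n_drops=30, r_init=0.5, alpha=0.5,
                  r_min=1e-6, n_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n = len(lower)
    pos = rng.uniform(lower, upper, size=(n_drops, n))   # random initial raindrops
    rad = np.full(n_drops, r_init)
    best_x, best_c = None, np.inf
    for _ in range(n_iter):
        # 1) Each droplet probes both endpoints of every variable and moves downhill.
        for d in range(len(pos)):
            c, moved = cost(pos[d]), False
            for j in range(n):
                for step in (-rad[d], rad[d]):
                    trial = pos[d].copy()
                    trial[j] = np.clip(trial[j] + step, lower[j], upper[j])
                    c_trial = cost(trial)
                    if c_trial < c:
                        pos[d], c, moved = trial, c_trial, True
            if not moved:                                 # stationary droplet is absorbed
                rad[d] = (alpha * rad[d] ** n) ** (1.0 / n)
            if c < best_c:
                best_x, best_c = pos[d].copy(), c
        # 2) Droplets with overlapping neighborhoods merge at the lower-cost position.
        keep = np.ones(len(pos), dtype=bool)
        for a in range(len(pos)):
            for b in range(a + 1, len(pos)):
                if keep[a] and keep[b] and np.linalg.norm(pos[a] - pos[b]) < rad[a] + rad[b]:
                    if cost(pos[b]) < cost(pos[a]):
                        pos[a] = pos[b]
                    rad[a] = (rad[a] ** n + rad[b] ** n) ** (1.0 / n)
                    keep[b] = False
        # 3) Droplets whose radius fell below r_min vanish.
        keep &= rad >= r_min
        pos, rad = pos[keep], rad[keep]
        if len(pos) == 0:
            break
    return best_x, best_c

def sphere(v):
    """Simple test cost with its minimum of 0 at the origin."""
    return float(np.sum(v ** 2))

best_x, best_c = rain_optimize(sphere, lower=[-5, -5], upper=[5, 5])
```

For the model in this paper, `cost` would wrap the training and evaluation of the classifier, with the decision variables being the SMOTE sampling rates or the WELM parameters.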

Experimental validation
The presented model is simulated using Python 3.6 with the keras and tensorflow packages. The performance of the presented ISMOTE-OWELM model is examined on three benchmark datasets, namely datasets I, II, and III. The first dataset includes 3333 samples with 21 features and 2 classes. The second dataset comprises 7043 instances with 21 features and 2 class labels. The third dataset holds 100,000 samples with 100 features and 2 class labels. A detailed simulation analysis is carried out to assess the ISMOTE-OWELM model in terms of different performance measures. The details of the datasets are provided in Table 1. For experimentation, a tenfold cross-validation process is employed.

Table 2

Also, the PCPM model has realized even better results, with an accuracy of 0.818 and F-measure of 0.808. Concurrently, the WELM model has shown manageable results, with an accuracy of 0.869 and F-measure of 0.852. Likewise, the OWELM model has reported better results than the earlier methods, with an accuracy of 0.887 and F-measure of 0.879. Though the SMOTE-OWELM model has outperformed the previously compared methods, the ISMOTE-OWELM model has shown the maximum predictive performance, with an accuracy of 0.909 and F-measure of 0.908. The proposed model achieves superior results owing to the inclusion of ISMOTE to handle the class imbalance problem and the efficiency of ROA in determining the optimal sampling rate. The presented model can be employed in real-time e-commerce sites.
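The evaluation protocol can be illustrated with a short sketch that assigns samples to ten folds and computes accuracy and F-measure. The paper does not state how the folds were built or how the F-measure was averaged, so this sketch assumes stratified folds and the binary F-measure with churn as the positive class; the function names are hypothetical.

```python
import numpy as np

def stratified_folds(y, n_folds=10, seed=0):
    """Assign each sample a fold index so that class proportions are kept per fold."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))  # shuffle samples of this class
        fold_of[idx] = np.arange(len(idx)) % n_folds        # deal them out round-robin
    return fold_of

def accuracy_and_f_measure(y_true, y_pred, positive=1):
    """Accuracy and F-measure (harmonic mean of precision and recall)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(accuracy), float(f_measure)

# 20 non-churners and 10 churners: every fold receives 2 non-churners and 1 churner.
y = np.array([0] * 20 + [1] * 10)
fold_of = stratified_folds(y, n_folds=10)
# Hypothetical predictions for one fold-sized example.
acc, f1 = accuracy_and_f_measure([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
```

In a full run, each fold would serve once as the test set, with over-sampling applied only to the training folds so that synthetic samples do not leak into the evaluation.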

Conclusion
This paper has developed a novel ISMOTE-OWELM model to determine the churners in the telecom sector. The presented model includes preprocessing, balancing of the unbalanced dataset, and classification. Initially, the customer data undergo normalization and class labeling. Then, the ISMOTE technique is employed to handle the imbalanced dataset. Finally, the WELM model is applied to determine the class labels of the data. The MOROA is used for two purposes: determining the optimal sampling rate of SMOTE and tuning the parameters of the WELM. A series of simulations was carried out, and the ISMOTE-OWELM model showed the maximum predictive performance, with accuracies of 0.94, 0.92, and 0.909 on datasets I, II, and III, respectively. The experimental outcomes showed that the ISMOTE-OWELM model outperforms the compared methods. As a future extension, the performance can be further enhanced using different feature selection methodologies.