1 Introduction

In developed countries, the telecom industry has become one of the leading industrial sectors, with competition at its peak in terms of both technological advancement and the growing number of service providers [1]. Companies pursue various approaches to ensure their endurance in this competitive field; three primary approaches to increasing turnover are (i) acquiring new customers, (ii) upselling to existing customers, and (iii) extending the customer retention period. Assessing all three approaches by their Return on Investment (RoI), the third strategy proves the most beneficial [2]: retaining an existing customer costs less than acquiring a new one [3] and requires less effort than upselling [4]. To enforce this approach, companies must diminish the possibility of customer churn, namely "the customer movement from one provider to another" [5]. Churn prediction therefore plays a crucial role in the telecom industry, as it puts operators in a position to retain their precious customers and organize their Customer Relationship Management (CRM) [6, 7]. CRM is a wide-ranging system that regulates the process of structuring, administering, and reinforcing reliable and enduring customer relationships; it is a widely accepted strategy implemented in sectors such as retail, insurance, banking, and telecommunications. This strategy is primarily intended to retain customers, and the necessity of that intention is apparent: the expenditure on obtaining new customers is extensively higher than the expense of customer retention (up to 20 times higher in some instances).
Hence, efficient tools are necessary to develop and enforce customer retention systems (churn models) and to support Business Intelligence (BI) applications. Churn can be caused by competitive market factors such as service that fails to fulfil customer expectations, competitors' strategies, attractive products, and new regulations. Churn strategies are intended to spot the initial indications of churn and identify the customers most likely to leave; classifying customers who are susceptible to churn requires a churn prediction system to be created. To procure the optimal forecast of customer churn, existing studies have employed several data mining strategies, such as Neural Networks (NNs), clustering, Decision Trees (DTs) [8], regression [9], Support Vector Machines (SVMs) [10], and hybrid ensemble methods [11]. With the help of technical advancements, companies have realized more vigorous tactics to guarantee that the maximum number of potential churners remains with them. Researchers focus on discriminating customers to find those who are likely to churn to other operators [12]. Deregulation of the telecom industry has led to heavy competition, and consumers have many options to choose from; this demands that telecom operators absorb and respond to customer requirements, which has become a primary objective for keeping their customers away from competitors [13]. The key intention of present studies is to utilize telecom data in huge quantities to classify the precious churn-prone customers. Nonetheless, current approaches face many constraints and challenges in real-world environments.
Because the telecom industry generates huge quantities of data, the derived data often contains missing values, which in turn yields inefficient results from prediction systems. Data preprocessing approaches are therefore developed to resolve these problems: they eradicate data noise and enable the data to be categorized properly and, eventually, with better efficiency.

Even when feature selection is employed, numerous informative features may be ignored during the development of the system model [14]. Statistics-based approaches have been largely applied in different fields, but they can source inefficient outcomes from a predictive strategy. Current research verifies such approaches on standard datasets [15, 16], yet often lacks an appropriate depiction of the data, which limits its value for decision making. In churn prediction, feature selection methods such as information gain and the correlation attribute ranking filter aid in choosing the essential features [17]. Two massive telecom datasets have been processed for churn and non-churn classification by applying several machine learning strategies [17]. In this research work, Information Gain (IG) and Fuzzy Particle Swarm Optimization (FPSO) techniques are used to enhance feature selection efficiency. Churn customer data is classified by applying the Divergence Kernel-based Support Vector Machine (DKSVM) algorithm. Besides, the Hybrid Kernel Distance-based Possibilistic Fuzzy Local Information C-Means (HKD-PFLICM) method divides customers into three clusters (Low, Medium, and Risky) on the basis of customer activities, through which customer profiling is executed. Evaluation metrics such as F-measure, True Positive Rate (TPR), Precision, False Positive Rate (FPR), Recall, and Receiver Operating Characteristic (ROC) are utilized to evaluate the performance of the proposed churn prediction approach. The rest of the paper is arranged as follows: Sect. 2 discusses related methods proposed earlier; Sect. 3 illustrates the proposed methodology in detail; Sect. 4 evaluates the performance of the proposed approach in comparison with the methods discussed in Sect. 2.
Section 5 concludes the paper with insights into future research directions.

2 Literature review

Researchers have applied a number of techniques, such as machine learning, data mining, and hybrid methods, to churn prediction. This research has produced many customer churn prediction models, which in turn aid companies' decision making and CRM in the identification, prediction, and retention of churning customers.

Huang and Kechadi [14] established customer behaviour prediction by incorporating supervised and unsupervised techniques in a novel hybrid model-based learning system. The technique comprises a modified k-means clustering algorithm and the classic First-Order Inductive Learner (FOIL), and takes into account factors such as customer behaviour, usage patterns, and network operations. Since the detection of outliers and redundant data examples is a challenging concern in data mining preprocessing, this hybrid model helps achieve better clustering results. Tsai and Lu [18] integrated back-propagation Artificial Neural Networks (ANNs) and Self-Organizing Maps (SOM) to create a hybrid model that offers good prediction accuracy compared with a single-NN baseline model, with lower Type I and Type II errors over three kinds of test sets.

Jahromi et al. [19] utilized a dual-step model-building methodology comprising a clustering phase and a classification phase. RFM-related features play a key role in grouping the customers into four clusters, from which a logical definition of churn is obtained; classification algorithms then extract a churn prediction model within each of the clusters formed. A gain-measure evaluation differentiates the performance of the employed algorithms and suggests that a multi-algorithm methodology offers more advantages than a single algorithm. This research was extended to prepaid mobile phone operators [19] with the intention of determining responses regarding customer loyalty. Verbeke et al. [20] utilized a data mining methodology for customer churn prediction modelling, where the comprehensibility and intuitiveness of the churn prediction models are the most important factors in decision making. Novel data mining techniques applied to churn prediction modelling outperform the traditional rule induction classifiers C4.5 and Repeated Incremental Pruning to Produce Error Reduction (RIPPER). The novel techniques AntMiner+ and the Active Learning Based Approach (ALBA) offer comparatively good accuracy. AntMiner+, based on Ant Colony Optimization (ACO), is one of the most promising data mining techniques and has the option of encompassing domain knowledge by stating monotonicity constraints on the final rule-set. ALBA exhibits the high predictive accuracy of a non-linear support vector machine model combined with the unambiguousness of a rule-set format. Lu et al. [2] proposed boosting as a strategy for enhancing a customer churn prediction model, categorizing customers on the basis of the weights the algorithm assigns, which helps determine the higher-risk customer cluster.
The base learner in that research is Logistic Regression (LR); a churn prediction model is constructed on each cluster, and the results are contrasted with a single LR model, exhibiting good separation of churn data, which is useful for churn prediction analysis. Zhang et al. [21] suggested a prediction methodology based on interpersonal influence that incorporates the propagation process and customers' personalized characteristics. Due to this integration, the novel technique exhibits better prediction accuracy than the traditional method. The data mining algorithm is a key concern in customer history analysis and prediction. Qureshi et al. [4] projected a data mining technique for churn customer identification, in which patterns for determining probable churners are obtained from historical data. The significant algorithms used in that work are regression analysis, Decision Trees (DTs), and ANNs.

Mishra and Reddy [22] utilized a Convolutional Neural Network (CNN) for churn prediction, which exhibits better accuracy. The experimental results give an accuracy of 86.85%, precision of 91.08%, error rate of 13.15%, recall of 93.08%, and F-score of 92.06%. The TensorFlow package could still be utilized to enhance the model's performance with respect to time and accuracy. Amin et al. [23] extracted significant decision rules associated with customer churn and non-churn by utilizing a Rough Set Theory (RST) based decision-making technique. The process classifies churn and non-churn customers, in addition to customers who may conceivably churn in the proximate future. RST based on a Genetic Algorithm (GA) [24] is considered the most efficient system for extracting implicit knowledge, in the form of decision rules, from a publicly available benchmark telecom dataset.

Mitrović et al. [25] involved Pareto multi-criteria optimization in determining optimal feature-type combinations, a reusable scheme that helps bridge the gap between predictive performance and operational efficiency. Caigny et al. [26] presented a new hybrid algorithm, the Logit Leaf Model (LLM), for improved classification of data. The LLM has two stages, a segmentation phase and a prediction phase: in the first stage a decision tree identifies the decision rules, and a model is then created for every leaf of this tree. It is considered a proficient hybrid technique due to its predictive performance and comprehensibility in contrast with DTs, LR, RF, and LR trees; the comprehensibility feature also confers key benefits over decision trees or logistic regression alone. Stripling et al. [27] introduced ProfLogit, which maximizes the Expected Maximum Profit measure for customer Churn (EMPC) in the training step using a GA; its interior model structure resembles a lasso-regularized model. The EMPC basis yields threshold-independent recall and precision measures based on the expected-profit-maximizing fraction.

Agrawal et al. [28] utilized a deep learning approach for churn prediction on a telco dataset. A non-linear classification model is constructed by means of a multilayered neural network, taking into account customer, support, usage, and contextual features. The determining parameters along with the churn probability are estimated, and the trained model's final weights on these features are utilized for churn prediction. Amin et al. [29] designed a new Cross-Company Churn Prediction (CCCP) model explicitly for the telecommunication sector, systematically exploring CCCP model performance with respect to data transformation techniques and state-of-the-art classifiers.

A novel fuzzy particle swarm optimization (NFPSO) is explained by Tian and Li [30]. In this work, the learning coefficient and inertia weight are adjusted adaptively during the search process, based on control information translated from a Fuzzy Logic Controller (FLC). Canonical Particle Swarm Optimization (CPSO) is extended with a two-input, two-output FLC. Three benchmark functions are used in the experimentation, and the proposed NFPSO's effectiveness is demonstrated with the related metrics.

A modified particle swarm optimization (MPSO) with robust update mechanisms and chaos-based initialization is introduced by Tian and Shi [31]. A logistic map is used to generate uniformly distributed particles, enhancing the quality of the initial population. A sigmoid-like inertia weight is formulated to make PSO adapt the inertia weight adaptively. To enhance swarm diversity, wavelet mutation is applied to particles whose fitness is below the average fitness value. In addition, an auxiliary velocity-position update mechanism is applied exclusively to the global best particle; this mechanism effectively guarantees the MPSO's convergence. Extensive experimentation on the CEC'13/15 test suites and standard image segmentation validates the MPSO algorithm's efficiency and effectiveness.

A new technique to enhance PSO is implemented by Neshat [32]. In each iteration of the proposed FAIPSO technique, the acceleration coefficients of every particle are adjusted adaptively, controlled by a fuzzy inference system.

Ullah et al. [17] introduced a churn prediction model for classifying and recognizing churn customers while exposing the factors behind churn in the telecom sector. Information gain and the correlation attribute ranking filter are the key factors in feature selection, and the rules created by expending the attribute-selected classifier algorithm are an equally important factor in characterizing churning customers. The present work is motivated by this study but presents a new feature selection method and classifier: a swarm-based algorithm is introduced in place of filter-based feature selection. From the analysis of the above existing approaches, it is clear that in today's competitive world, keeping customers satisfied is a key to success for telecommunication companies.

3 Proposed methodology

The proposed work concentrates on presenting a churn prediction model. Initially, data preprocessing is accomplished to eliminate noise and imbalanced data features and to normalize the data. The next step is significant feature selection, carried out by Information Gain (IG) and Fuzzy Particle Swarm Optimization (FPSO). Thirdly, a Divergence Kernel based Support Vector Machine (DKSVM) classifier is proposed for customer churn prediction; it classifies the telecom sector dataset into churn and non-churn customers and recognizes the factors used by the clustering algorithm in the subsequent step. Next, Hybrid Kernel Distance based Possibilistic Fuzzy Local Information C-Means (HKD-PFLICM) performs the customer profiling operation, segregating customers into three groups (Low, Medium, and Risky) based on their behaviour [33]. The concluding step recommends retention strategies for every category of churn customer. Churn prediction accuracy is estimated using metrics such as True Positive Rate (TPR), Precision, False Positive Rate (FPR), Recall, Receiver Operating Characteristic (ROC) area, and F-measure. Figure 1 shows the proposed model for customer churn prediction.

Fig. 1

Proposed model for customer churn prediction

3.1 Call detail record (CDR) datasets

This research utilizes two datasets. The customer churn prediction problem is first analysed with the telecom churn (cell2cell) dataset, acquired from the Teradata Center for Customer Relationship Management at Duke University. The Cell2Cell dataset is preprocessed, and a balanced version comprising 71,047 instances and 58 attributes is delivered for the analysis process. This first dataset is publicly available at "https://www.kaggle.com/jpacse/datasets-for-churn-telecom/version/2". The second dataset is the publicly accessible churn-bigml dataset at "http://github.com/caroljmcdonald/mapr-sparkmlchurn/tree/master/data". It contains 3333 instances and 21 features; the targeted churn customer class, represented as "T", constitutes 14.5% of the total data, while the remaining 85.5% are non-churn customers, represented as "F". Table 1 describes both datasets.
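As a quick sanity check on the class distribution described above, the fraction of "T"/"F" labels can be computed as follows. The rows here are synthetic stand-ins, not the actual churn-bigml data:

```python
# Sketch: computing the class balance reported for the churn-bigml dataset
# (14.5% churners labelled "T", 85.5% non-churners labelled "F").
# The label list below is synthetic; real labels come from the CSV linked above.

from collections import Counter

def class_balance(labels):
    """Return the fraction of each label value in the target column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

labels = ["T"] * 29 + ["F"] * 171   # 200 synthetic rows, 14.5% churn
balance = class_balance(labels)
print(balance["T"])  # 0.145
```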

3.2 Noise removal

Noise in data is a significant issue, since it renders the data useless and thereby affects the results. The telecom dataset comprises many missing values and incorrect values such as "Unknown" attributes. The second dataset involves 21 features. The dataset and its features are filtered to ensure that only useful features remain: features possessing an unknown range are removed from the dataset, and samples with unknown values are simply discarded.
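A minimal sketch of this filtering step, assuming rows are represented as dictionaries and the string "Unknown" marks an invalid value (the field names are illustrative, not taken from the actual datasets):

```python
# Sketch of the noise-removal step: samples containing "Unknown" values are
# dropped, matching the filtering described above.

def drop_unknown(rows):
    """Keep only rows with no 'Unknown' entries."""
    return [r for r in rows if "Unknown" not in r.values()]

rows = [
    {"state": "KS", "intl_plan": "no",      "churn": "F"},
    {"state": "OH", "intl_plan": "Unknown", "churn": "F"},
    {"state": "NJ", "intl_plan": "yes",     "churn": "T"},
]
clean = drop_unknown(rows)
print(len(clean))  # 2
```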

3.3 Feature selection—information gain (IG) and fuzzy particle swarm optimization (FPSO)

Selecting the relevant features from a dataset on the basis of domain knowledge is a critical step known as feature selection. Various studies have investigated feature selection for churn prediction [34, 35]. In this research, Information Gain (IG) and Fuzzy Particle Swarm Optimization (FPSO) are utilized for feature selection; together, the information gain scores and the accuracy of the FPSO algorithm aid in extracting 39 of the total 58 features in the first dataset.

3.4 Information gain (IG)

Information Gain (IG) is an entropy-based feature evaluation method from the machine learning area. Adopted here for feature selection, it is defined as the amount of information a feature delivers for churn prediction: it measures how prominent a term is for classifying the data and estimates the relevance of an attribute A to a class C. The higher the mutual information between class C and attribute A, the higher their relevance. The information gain is expressed in Eq. (1),

$${\text{IG}}\left( {{\text{C}},{\text{A}}} \right) = {\text{H}}\left( {\text{C}} \right) - {\text{H}}\left( {\text{C|A}} \right)$$
(1)
$${\text{H}}\left( {\text{C}} \right) = - \mathop \sum \limits_{{{\text{c}} \in {\text{C}}}} {\text{p}}\left( {\text{c}} \right)\log {\text{p}}\left( {\text{c}} \right)$$
(2)

where \({\text{ H}}\left( {\text{C}} \right)\) is the entropy of the class, given in Eq. (2), and \({\text{H}}\left( {\text{C|A}} \right)\) is the conditional entropy of the class given the attribute, given in Eq. (3),

$${\text{H}}\left( {\text{C|A}} \right) = - \mathop \sum \limits_{{{\text{c}} \in {\text{C}}}} {\text{p}}\left( {\text{C|A}} \right)\log {\text{p}}({\text{C}}|{\text{A}})$$
(3)
$${\text{IG}}\left( {{\text{C}},{\text{A}}} \right) = 1 - {\text{H}}\left( {\text{C|A}} \right)$$
(4)

If attribute A and class C are not connected in any way, i.e. if and only if \({\text{H}}\left( {\text{C|A}} \right)\) = 1, then IG(C, A) takes its least value. Conversely, preference goes to an attribute A that probably occurs in only one class C, whether as positive or negative; that is to say, the optimum features are the attributes that occur solely in one class. When the value of P(A) equals P(A|\({\text{C}}_{1}\)), the highest \({\text{IG}}\left( {{\text{C}},{\text{A}}} \right)\) is reached, with both P(\({\text{C}}_{1} |\) A) and H(\({\text{C}}_{1} |\) A) being 0.5; in that case P(A|\({\text{C}}_{2}\)) = 0 and H(\({\text{C}}_{1} |{\text{A}})\) = 0. The IG(C, A) value thus ranges from 0 to 0.5. The advantage of the information gain ratio is that it biases the decision tree against attributes with a large number of distinct values, thereby addressing a drawback of plain information gain: applied to attributes that can take on a large number of distinct values, information gain may learn the training set too well.
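The computation in Eqs. (1)–(3) can be sketched in plain Python; the toy attribute and labels below are illustrative, not drawn from the paper's datasets:

```python
# Minimal sketch of IG(C, A) = H(C) - H(C|A) for a discrete attribute,
# following Eqs. (1)-(3).

import math
from collections import Counter

def entropy(labels):
    """H(C) = -sum_c p(c) log2 p(c)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attr, labels):
    """IG(C, A) = H(C) - sum_v P(A=v) * H(C | A=v)."""
    n = len(labels)
    h_c_given_a = 0.0
    for v in set(attr):
        subset = [c for a, c in zip(attr, labels) if a == v]
        h_c_given_a += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_c_given_a

attr   = ["yes", "yes", "no", "no"]        # e.g. an international-plan flag
labels = ["churn", "churn", "stay", "stay"]
print(information_gain(attr, labels))      # 1.0 — attribute fully determines class
```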

3.5 Fuzzy particle swarm optimization (FPSO)

To render a probable solution for feature selection in the churn prediction process, the Fuzzy Particle Swarm Optimization (FPSO) algorithm is applied. The purpose and advantage of the proposed FPSO is that fuzzy set theory is used to adjust the PSO acceleration coefficients adaptively, thereby improving the accuracy and efficiency of the search. Feature selection traverses a search space in which every feature's position is pursued by a particle position; the position is revised utilizing both the individual and the social experience of the particles [36, 37].

Expression (5) updates each particle's position for feature selection.

$${\text{x}}_{{\text{i}}} \left( {{\text{t}} + 1} \right) = {\text{x}}_{{\text{i}}} \left( {\text{t}} \right) + {\text{ve}}_{{\text{i}}} \left( {{\text{t}} + 1} \right)$$
(5)

where \({\text{x}}_{{\text{i}}} \left( {{\text{t}} + 1} \right)\) represents the position of particle i at time (t + 1) for feature selection in churn prediction, \({\text{x}}_{{\text{i}}} \left( {\text{t}} \right)\) represents the position of particle i at time t, and \({\text{ve}}_{{\text{i}}} \left( {{\text{t}} + 1} \right)\) expresses the velocity of particle i at time (t + 1). The gbest PSO version is used throughout this study, in which each particle's neighbourhood experience is received from the overall swarm. Equation (6) computes the velocity of each particle i,

$${\text{ve}}_{{{\text{ij}}}} \left( {{\text{t}} + 1} \right) = {\text{ve}}_{{{\text{ij}}}} \left( {\text{t}} \right) + {\text{ac}}_{1} {\text{ra}}_{{1{\text{j}}}} \left( {\text{t}} \right)\left[ {{\text{y}}_{{{\text{ij}}}} \left( {\text{t}} \right) - {\text{x}}_{{{\text{ij}}}} \left( {\text{t}} \right)} \right] + {\text{ac}}_{2} {\text{ra}}_{{2{\text{j}}}} \left( {\text{t}} \right)\left[ {{\hat{\text{y}}}_{{{\text{ij}}}} \left( {\text{t}} \right) - {\text{x}}_{{{\text{ij}}}} \left( {\text{t}} \right)} \right]$$
(6)

where \({\text{ve}}_{{{\text{ij}}}} \left( {\text{t}} \right)\) expresses the velocity of particle i in dimension j at time t, \({\text{x}}_{{{\text{ij}}}} \left( {\text{t}} \right)\) signifies the particle's location at time t, \({\text{y}}_{{{\text{ij}}}} \left( {\text{t}} \right)\) represents the particle's own optimal position (feature position), \({\hat{\text{y}}}_{{{\text{ij}}}} \left( {\text{t}} \right)\) depicts the overall optimal position (feature position) of all particles, \({\text{ac}}_{1}\) and \({\text{ac}}_{2}\) are the acceleration constants for the best chosen local and global features in churn prediction, and \({\text{ra}}_{{1{\text{j}}}}\), \({\text{ra}}_{{2{\text{j}}}}\) are random numbers in the interval [0, 1].
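A minimal sketch of the gbest updates in Eqs. (5) and (6) for a single particle in one dimension; all numeric values are illustrative:

```python
# One velocity update (Eq. 6) followed by one position update (Eq. 5)
# for a single particle in one dimension.

import random

random.seed(0)  # fixed seed so the sketch is reproducible

def pso_step(x, ve, y_best, g_best, ac1=2.0, ac2=2.0):
    """Return the new (position, velocity) of one particle."""
    ra1, ra2 = random.random(), random.random()           # ra1j, ra2j in [0, 1]
    ve_next = ve + ac1 * ra1 * (y_best - x) + ac2 * ra2 * (g_best - x)
    return x + ve_next, ve_next                           # Eq. (5)

x, ve = pso_step(x=0.5, ve=0.1, y_best=0.8, g_best=1.0)
print(round(x, 3))  # particle moves towards the personal and global bests
```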

In recent times, some research has aimed to augment the convergence of standard PSO and derive more effective solutions, for instance via velocity improvement and the inertia weight. The inclusion of an inertia weight improves the gbest PSO update; apart from managing the particles' velocity and direction, the inertia weight enables control over the swarm's exploration and exploitation. Equation (7) represents the modified gbest PSO update:

$${\text{ve}}_{{{\text{ij}}}} \left( {{\text{t}} + 1} \right) = {\text{iwve}}_{{{\text{ij}}}} \left( {\text{t}} \right) + {\text{ac}}_{1} {\text{ra}}_{{1{\text{j}}}} \left( {\text{t}} \right)\left[ {{\text{y}}_{{{\text{ij}}}} \left( {\text{t}} \right) - {\text{x}}_{{{\text{ij}}}} \left( {\text{t}} \right)} \right] + {\text{ac}}_{2} {\text{ra}}_{{2{\text{j}}}} \left( {\text{t}} \right)\left[ {{\hat{\text{y}}}_{{{\text{ij}}}} \left( {\text{t}} \right) - {\text{x}}_{{{\text{ij}}}} \left( {\text{t}} \right)} \right]$$
(7)

The fuzzy system adapts the PSO inertia weight \({\text{iw}}\) and acceleration constants \({\text{ac}}_{1}\) and \({\text{ac}}_{2}\). The acceleration constants influence a particle's overall velocity: the local acceleration constant \(({\text{ac}}_{1}\)) draws it towards the optimal feature of the local samples (population), whereas the global acceleration constant \(\left( {{\text{ac}}_{2} } \right)\) draws it towards the optimal feature of the overall swarm (datasets). To convert the inertia weight and acceleration constants, the fuzzy system involves the following factors:

Two inputs: the number of iterations (N) for which the optimal fitness has remained unchanged, and the current inertia weight value (\({\text{iw}}\)).

Three outputs: the change in inertia weight (\({\text{ciw}}\)) and the changes in the acceleration constants, \({\text{cac}}_{1}\) and \({\text{cac}}_{2}\).

Figure 2 shows the Flowchart of Fuzzy Particle Swarm Optimization (FPSO) algorithm.

Fig. 2

Flowchart of fuzzy particle swarm optimization (FPSO) algorithm

Figure 3 depicts the three membership functions for the inputs, as well as for the outputs of the fuzzy system. As shown, low, medium, and high are represented by the left, middle, and right triangles, respectively. Correspondingly, y = trimf(x, params) returns fuzzy membership values computed using the triangular membership function of Eq. (8).

$$f\left( {x;a,b,c} \right) = \left\{ {\begin{array}{*{20}c} {0, x \le a} \\ {\frac{x - a}{{b - a}}, a \le x \le b} \\ {\frac{c - x}{{c - b}}, b \le x \le c} \\ {0, c \le x} \\ \end{array} } \right\}$$
(8)
Fig. 3

Triangular membership functions for fuzzy system

The parameters a, b, and c are specified via params. Membership values are computed for each input parameter, such as the inertia weight \({\text{iw}}\) and the acceleration constants \({\text{ac}}_{1}\) and \({\text{ac}}_{2}\) of the PSO algorithm. The implementation follows https://www.mathworks.com/help/fuzzy/trimf.html.
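For reference, Eq. (8) can be written as a small plain-Python function equivalent in behaviour to MATLAB's trimf(x, [a b c]):

```python
# Triangular membership function of Eq. (8): rises linearly on [a, b],
# falls linearly on [b, c], and is 0 outside [a, c].

def trimf(x, a, b, c):
    """Return the membership degree of x for the triangle (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

print(trimf(0.5, 0.0, 0.5, 1.0))   # 1.0 (peak of the triangle)
print(trimf(0.25, 0.0, 0.5, 1.0))  # 0.5 (halfway up the rising edge)
```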

The fuzzy system architecture consists of two inputs (the number of iterations for which the best fitness has not changed (N) and the actual inertia weight (\({\text{iw}}\))) and three outputs (the change in inertia weight (ciw) and the changes in the acceleration constants \({\text{cac}}_{1}\) and \({\text{cac}}_{2}\)), as shown in Fig. 4. This research utilizes nine fuzzy rules for deriving the new \({\text{ciw}}\), \({\text{cac}}_{1}\), and \({\text{cac}}_{2}\) values. As an example of one fuzzy rule: a medium value of N with a high \({\text{iw}}\) value corresponds to high \({\text{ac}}_{1}\), high \({\text{ac}}_{2}\), and low ciw. Table 2 shows the rules, acquired through empirical experimentation. The range of N is [1, 20], the value of \({\text{iw}}\) lies in 0.5 ≤ \({\text{iw}}\) ≤ 2.5, and the values of \({\text{ac}}_{1}\) and \({\text{ac}}_{2}\) lie in 1.0 ≤ \({\text{ac}}_{1}\), \({\text{ac}}_{2}\) ≤ 2.0. The arithmetical expressions for the acceleration constants \({\text{ac}}_{1}\) and \({\text{ac}}_{2}\) are given in Eqs. (9) and (10) below:

$${\text{ac}}_{1} = \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{ra}}_{{{\text{ac}}_{1} }} }} {\upmu }_{{\text{i}}}^{{{\text{ac}}_{1} }} \left( {{\text{ac}}_{{1{\text{i}}}} } \right)}}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{ra}}_{{{\text{ac}}_{1} }} }} {\upmu }_{{\text{i}}}^{{{\text{ac}}_{1} }} }}$$
(9)

where \({\text{ac}}_{1}\) is the local acceleration of particle i, \({\text{ra}}_{{{\text{ac}}_{1} }}\) is the number of fuzzy rules fired for \({\text{ac}}_{1}\), \({\text{ac}}_{{1{\text{i}}}}\) is the output of fuzzy rule i for \({\text{ac}}_{1}\), and \({\upmu }_{{\text{i}}}^{{{\text{ac}}_{1} }}\) is the membership function value of fuzzy rule i for \({\text{ac}}_{1}\).

$${\text{ac}}_{2} = \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{ra}}_{{{\text{ac}}_{2} }} }} {\upmu }_{{\text{i}}}^{{{\text{ac}}_{2} }} \left( {{\text{ac}}_{{2{\text{i}}}} } \right)}}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{ra}}_{{{\text{ac}}_{2} }} }} {\upmu }_{{\text{i}}}^{{{\text{ac}}_{2} }} }}$$
(10)

where \({\text{ac}}_{2}\) is the global acceleration of particle i, \({\text{ra}}_{{{\text{ac}}_{2} }}\) is the number of fuzzy rules fired for \({\text{ac}}_{2}\), \({\text{ac}}_{{2{\text{i}}}}\) is the output of fuzzy rule i for \({\text{ac}}_{2}\), and \({\upmu }_{{\text{i}}}^{{{\text{ac}}_{2} }}\) is the membership function value of fuzzy rule i for \({\text{ac}}_{2}\). The features chosen by both the IG and FPSO algorithms are regarded as the foremost features.
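The weighted-average defuzzification of Eqs. (9) and (10) can be sketched as follows; the rule outputs and membership degrees below are illustrative, not the paper's actual nine rules:

```python
# Sketch of the defuzzification in Eqs. (9)-(10): the new acceleration
# constant is the membership-weighted mean of the fired rules' outputs.

def defuzzify(rule_outputs, memberships):
    """Weighted average: sum(mu_i * out_i) / sum(mu_i)."""
    num = sum(mu * out for mu, out in zip(memberships, rule_outputs))
    den = sum(memberships)
    return num / den

# Three fired rules with illustrative outputs in [1.0, 2.0] and degrees in [0, 1].
ac1 = defuzzify(rule_outputs=[1.2, 1.8, 2.0], memberships=[0.2, 0.5, 0.3])
print(round(ac1, 3))  # 1.74
```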

Fig. 4

Fuzzy system to adjust the parameters (\({\text{iw}},{\text{cac}}_{1} ,{\text{cac}}_{2} )\)

Table 1 Dataset description
Table 2 Fuzzy rules to adjust the parameters (\({\text{iw}},{\text{ac}}_{1} ,{\text{ac}}_{2} )\)

3.6 Customer classification and prediction

In the telecom industry dataset, the customers are classified into two categories: the first comprises customers who stay faithful to their operators and are only occasionally influenced by competitors, namely non-churn customers, whereas the second comprises churn customers. The suggested strategy is intended to identify the churn customers and the motivation for their departure, and it conceives retention approaches to conquer the problem of migration. To categorize the churn datasets, this research applies a Divergence Kernel based Support Vector Machine (DKSVM) classifier, which classifies the customers and predicts churn through the Kullback–Leibler divergence [38]. A kernel function is applied to map the selected-feature input datasets of churn and non-churn customers, assuming the telecom dataset has a proper distribution, into a high-dimensional selected feature space. In the DKSVM classifier, the input telecom dataset with selected features, containing non-churn and churn customers, is mapped through the new kernel function into this higher feature space, and a decision plane is finally created. The generalization ability of the DKSVM classifier is enhanced by maximizing the margin distance and by integrating the input telecom dataset with selected features into the high-dimensional space, through which a nonlinear classification function is determined; this is the primary advantage of the algorithm. The churn prediction problem is resolved by the DKSVM classifier as expressed in Eqs. (11)–(13).

$$\phi \left( {{\text{w}},{\upxi }} \right) = \frac{1}{2}\left( {{\text{w}} \cdot {\text{w}}} \right) + {\text{C}}\mathop \sum \limits_{{{\text{l}} = 1}}^{{\text{L}}} {\upxi }_{{\text{l}}}$$
(11)
$$\forall {\upxi }_{{\text{l}}} \ge 0,\quad {\text{w}} \cdot {\text{x}}_{{\text{l}}} + {\text{bias}} \ge 1 - {\upxi }_{{\text{l}}} \quad {\text{if }}{\text{y}}_{{\text{l}}} = + 1$$
(12)
$${\text{w}} \cdot {\text{x}}_{{\text{l}}} + {\text{bias}} \le - 1 + {\upxi }_{{\text{l}}} \quad {\text{if }}{\text{y}}_{{\text{l}}} = - 1$$
(13)

where ‘w’ denotes the unknown separating plane, \({\upxi }_{{\text{l}}}\) denotes the soft-margin slack, \({\text{f}}_{{\text{l}}}\) specifies a training telecom sample with selected features, \({\text{y}}_{{\text{l}}}\) signifies the class of \({\text{f}}_{{\text{l}}}\), which is churn or non-churn, L denotes the number of churn prediction training samples, and ‘C’ denotes a constant. The Lagrange method plays a vital role in the churn prediction error minimization problem: it identifies the parameter vector by maximizing the function given by Eqs. (14)–(15).
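A minimal sketch of the soft-margin primal objective of Eq. (11) together with the slack constraints of Eqs. (12)–(13); the toy weight vector, bias, and 2-D samples below are assumptions for illustration only.

```python
# Soft-margin SVM primal objective from Eq. (11):
#   phi(w, xi) = 0.5 * (w . w) + C * sum(xi_l)
# where each slack satisfies xi_l >= max(0, 1 - y_l * (w . x_l + bias)),
# the tight form of the constraints in Eqs. (12)-(13).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_objective(w, bias, C, X, y):
    """Objective value and per-sample slacks for a candidate separating plane."""
    slacks = [max(0.0, 1.0 - yl * (dot(w, xl) + bias)) for xl, yl in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks

# Illustrative data: two well-separated points and one inside the margin.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [+1, -1, +1]
obj, slacks = primal_objective(w=[1.0, 0.0], bias=0.0, C=1.0, X=X, y=y)
```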

$${\text{w}}\left( {\upalpha } \right) = \mathop \sum \limits_{{{\text{l}} = 1}}^{{\text{L}}} {\upalpha }_{{\text{l}}} - \frac{1}{2}\mathop \sum \limits_{{{\text{i}},{\text{j}} = 1}}^{{\text{L}}} {\upalpha }_{{\text{i}}} {\upalpha }_{{\text{j}}} {\text{y}}_{{\text{i}}} {\text{y}}_{{\text{j}}} {\text{K}}\left( {{\text{f}}_{{\text{i}}} ,{\text{ f}}_{{\text{j}}} } \right)$$
(14)
$$\mathop \sum \limits_{{{\text{l}} = 1}}^{{\text{L}}} {\upalpha }_{{\text{l}}} {\text{y}}_{{\text{l}}} = 0,{ }0 \le {\upalpha }_{{\text{l}}} \le {\text{C}}$$
(15)

\({\text{K}}\left( {{\text{f}}_{{\text{i}}} ,{\text{ f}}_{{\text{j}}} } \right)\) denotes the kernel function. The Kullback–Leibler divergence [39] estimates the difference between two probability distributions. For discrete probability distributions P and Q over \({\text{f}}_{{\text{i}}} ,{\text{ f}}_{{\text{j}}}\), the Kullback–Leibler divergence from Q to P [39] is given by Eq. (16):

$${\text{D}}_{{{\text{KL}}}} ({\text{P}}|{\text{|Q)}} = \mathop \sum \limits_{{{\text{f}}_{{\text{i}}} ,{\text{f}}_{{\text{j}}} }} {\text{P}}\left( {{\text{f}}_{{\text{i}}} } \right)\log \frac{{{\text{P}}\left( {{\text{f}}_{{\text{i}}} } \right)}}{{{\text{Q}}\left( {{\text{f}}_{{\text{j}}} } \right)}}$$
(16)

The Lagrange method is then applied to the churn prediction error minimization problem with the new, updated kernel, yielding the objective in Eq. (17):

$${\text{w}}\left( {\upalpha } \right) = \mathop \sum \limits_{{{\text{l}} = 1}}^{{\text{L}}} {\upalpha }_{{\text{l}}} - \frac{1}{2}\mathop \sum \limits_{{{\text{i}},{\text{j}} = 1}}^{{\text{L}}} {\upalpha }_{{\text{i}}} {\upalpha }_{{\text{j}}} {\text{y}}_{{\text{i}}} {\text{y}}_{{\text{j}}} {\text{D}}_{{{\text{KL}}}} ({\text{P}}||{\text{Q}})$$
(17)

At this stage, the customers are classified into churn and non-churn by means of the MATLAB R2014 tool.
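The divergence of Eq. (16), and the pairwise divergence matrix that the kernel of Eq. (17) consumes, can be sketched as below. This is an illustrative Python sketch, not the authors' MATLAB implementation; the smoothing constant `eps` and the sample feature vectors are assumptions.

```python
import math

# Kullback-Leibler divergence D_KL(P || Q) for discrete distributions, Eq. (16).
def kl_divergence(p, q, eps=1e-12):
    # eps guards against log(0) and division by zero; an assumption of this sketch.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def normalize(v):
    """Turn a non-negative feature vector into a probability distribution."""
    s = sum(v)
    return [x / s for x in v]

# Pairwise divergence matrix over normalised customer feature vectors, as used
# by the divergence-based kernel of Eq. (17). The features are illustrative.
customers = [normalize([3.0, 1.0, 1.0]),
             normalize([1.0, 3.0, 1.0]),
             normalize([1.0, 1.0, 3.0])]
D = [[kl_divergence(p, q) for q in customers] for p in customers]
```

The diagonal of `D` is zero (a distribution has no divergence from itself), and off-diagonal entries grow with how different two customers' feature distributions are.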

3.7 Customer profiling and relationship

Customer clustering partitions the complete customer data into groups based on behavioural information and customer relationships. The telecom data are segmented into groups using Hybrid Kernel Distance based Possibilistic Fuzzy Local Information C-Means (HKD-PFLICM), which suits partitional clustering. The resulting classification is Low, Medium and Risky customers. PFLICM integrates the Possibilistic Fuzzy C-Means algorithm (PFCM) [40] and the Fuzzy Local Information C-Means (FLICM) algorithm [41]. PFLICM assigns the customers in every segment to the nearest cluster based on kernel distance, and offers advantages such as noise immunity and freedom from artificial parameters. The combined PFLICM approach minimizes the objective function given in Eq. (18):

$${\text{J}} = \mathop \sum \limits_{{{\text{c}} = 1}}^{{\text{C}}} \mathop \sum \limits_{{{\text{n}} = 1}}^{{\text{N}}} {\text{au}}_{{{\text{cn}}}}^{{\text{m}}} \left( {\left| {\left| {{\text{x}}_{{\text{n}}} - {\text{c}}_{{\text{c}}} } \right|} \right|_{2}^{2} + {\text{G}}_{{{\text{cn}}}} } \right) + {\text{bt}}_{{{\text{cn}}}}^{{\text{q}}} \left| {\left| {{\text{x}}_{{\text{n}}} - {\text{c}}_{{\text{c}}} } \right|} \right|_{2}^{2} + \mathop \sum \limits_{{{\text{c}} = 1}}^{{\text{C}}} {\upgamma }_{{\text{c}}} \mathop \sum \limits_{{{\text{n}} = 1}}^{{\text{N}}} \left( {1 - {\text{t}}_{{{\text{cn}}}} } \right)^{{\text{q}}} ,{\text{ u}}_{{{\text{cn}}}} \ge 0,{ }\forall {\text{n}} = 1, \ldots {\text{N}},{ }\mathop \sum \limits_{{{\text{c}} = 1}}^{{\text{C}}} {\text{u}}_{{{\text{cn}}}} = 1$$
(18)

where xn is a \({\text{d}} \times 1\) column vector signifying the nth telecom record, C is the number of groups being estimated (Low, Medium and Risky customers), cc is the \({\text{d}} \times 1{ }\) vector of the cth cluster center, ucn is the membership value of the nth record in the cth cluster, tcn is the typicality value of the nth telecom record in the cth cluster, a, b, and \({\upgamma }_{{\text{c}}}\) are fixed parameters used to balance the terms of the objective function for each customer group, and m and q are fixed “fuzzifier” parameters that govern, respectively, the degree of sharing across clusters and the degree to which points may be characterized as outliers. Also,

$${\text{G}}_{{{\text{cn}}}} = \mathop \sum \limits_{{{\text{ke}} \in {\text{Ne}}_{{\text{n}}} }} \frac{1}{{{\text{ked}}_{{{\text{nk}}}} + 1}}\left( {1 - {\text{u}}_{{{\text{ck}}}} } \right)^{{\text{m}}} \left| {\left| {{\text{x}}_{{\text{k}}} - {\text{c}}_{{\text{c}}} } \right|} \right|_{2}^{2}$$
(19)

In Eq. (19), \({\text{ked}}_{{{\text{nk}}}}\) signifies the kernel distance between the data points \({\text{x}}_{{\text{n}}}\) and \({\text{x}}_{{\text{k}}}\), and Nen signifies the neighborhood around the center point. The kernel distance \(\left( {{\text{ked}}_{{{\text{nk}}}} } \right)\) is computed from the hyperbolic tangent kernel and the Gaussian kernel.

The hyperbolic tangent kernel, also termed the sigmoid or tanh kernel, is defined as in Eq. (20):

$${\text{ke}}_{1} \left( {{\text{x}}_{{\text{n}}} ,{\text{x}}_{{\text{k}}} } \right) = {\text{tanh}}\left( {{\text{v}} + {\text{x}}_{{\text{n}}} {\text{x}}_{{\text{k}}} } \right)$$
(20)

The Gaussian kernel is stated as in Eq. (21):

$${\text{ke}}_{2} \left( {{\text{x}}_{{\text{n}}} ,{\text{x}}_{{\text{k}}} } \right) = {\text{exp}}\left( { - \frac{{\left| {\left| {{\text{x}}_{{\text{n}}} - {\text{x}}_{{\text{k}}} } \right|} \right|^{2} }}{{2{\upsigma }^{2} }}} \right)$$
(21)

The function \({\text{ked}}_{{{\text{nk}}}}\) combines the two kernel outcomes, as given by Eq. (22):

$${\text{ked}}_{{{\text{nk}}}} = \frac{{{\text{ke}}_{1} \left( {{\text{x}}_{{\text{n}}} ,{\text{x}}_{{\text{k}}} } \right) + {\text{ke}}_{2} \left( {{\text{x}}_{{\text{n}}} ,{\text{x}}_{{\text{k}}} } \right)}}{2}$$
(22)
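The hybrid kernel distance of Eqs. (20)–(22) can be sketched as follows; the offset `v` and width `sigma` are illustrative choices, not values from the paper.

```python
import math

# Hybrid kernel distance of Eqs. (20)-(22): the average of a hyperbolic
# tangent kernel and a Gaussian kernel over two customer vectors.

def tanh_kernel(xn, xk, v=1.0):
    # Eq. (20): tanh(v + <xn, xk>)
    return math.tanh(v + sum(a * b for a, b in zip(xn, xk)))

def gaussian_kernel(xn, xk, sigma=1.0):
    # Eq. (21): exp(-||xn - xk||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(xn, xk))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance(xn, xk, v=1.0, sigma=1.0):
    # Eq. (22): average of the two kernel outcomes
    return 0.5 * (tanh_kernel(xn, xk, v) + gaussian_kernel(xn, xk, sigma))

same = kernel_distance([1.0, 0.0], [1.0, 0.0])  # identical customers
far = kernel_distance([1.0, 0.0], [0.0, 5.0])   # very different customers
```

Identical vectors score higher than dissimilar ones, so \({\text{ked}}_{{{\text{nk}}}}\) behaves as a similarity measure that the clustering step can threshold.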

Finally, the overall clustering for each customer is achieved by applying this outcome to Eq. (19). The clusters in each segment signify the different patterns and behaviours of churn customers. The characteristics of the three clusters, each representing a unique collection of behaviour, may help enhance the scope of decision making for churners. Figure 5 shows the overall flowchart of the HKD-PFLICM algorithm.

Fig. 5
Flowchart of hybrid kernel distance based possibilistic fuzzy local information C-means (HKD-PFLICM)

3.8 Retention strategy uses recommendation system

To conclude the retention approaches, churn customers require appropriate monitoring and must be handled throughout the promotion cycle. The top-down approach falls short in targeting specific consumers, while bottom-up and adapted methods are unsuccessful as marketing strategies. To build tailored retention practices, a similarity-based strategy is used; this is the most familiar formulation of a recommender system. Based on the expectations and past actions evaluated from the clusters, it can propose acceptable policies or other customized collections of deals to each churn customer. The comparison measure for targeting identical consumers relies on the following approaches.

Content-based: focuses on a customer's own past behavior and proposes similar suggestions and products to that same consumer.

Collaborative: focuses on the past behavior and preferences of other customers who have an identical inclination and comparable behavior.

Hybrid: combines the content-based and collaborative strategies.

The collaborative strategy is proposed here for customers who have identical past behaviours and identical preferences. Every customer is classified into the Low, Medium, or Risky category, and these approaches are executed using the kernel-based distance function. In this sense, CRM is associated with personalizing the interaction with consumers and reacting to consumer feedback across the company, for example by creating a collective management network and expanding front offices so that all staff, collaborators and vendors can communicate with consumers via email, mobile, web sites, calls, fax, etc. Consumer behaviour is tracked and recorded through segmentation to define patterns and usage. From this pattern and behavior, guidelines are rendered so that suggestions are made only to the relevant customers.
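A minimal collaborative-filtering sketch consistent with the description above: recommend the retention offers accepted by the most similar peers in the same cluster. Cosine similarity stands in for the kernel-based distance here, and all customer names, behaviour vectors and offers are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two behaviour vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def recommend(target_vec, peers, k=2):
    """Offers accepted by the k peers most similar to target_vec.

    peers: {name: (behaviour_vector, accepted_offer)}
    """
    ranked = sorted(peers.items(),
                    key=lambda kv: cosine(target_vec, kv[1][0]),
                    reverse=True)
    return [offer for _, (_, offer) in ranked[:k]]

# Hypothetical peers in the same churn-risk cluster:
peers = {
    "c1": ([10.0, 2.0, 1.0], "data top-up"),
    "c2": ([9.0, 3.0, 1.0], "loyalty discount"),
    "c3": ([1.0, 9.0, 8.0], "device upgrade"),
}
offers = recommend([10.0, 2.5, 1.0], peers, k=2)
```

The target customer resembles `c1` and `c2`, so their accepted offers are proposed, while the dissimilar `c3`'s offer is not.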

4 Experiments and results

Throughout this section, numerous trials of the proposed DKSVM strategy have been carried out by applying machine learning techniques such as Random Forest (RF), the J48 algorithm, and Decision Stump [17] on two datasets, namely the telecom churn (cell2cell) and churn-bigml datasets. The outcomes acquired through the various machine learning strategies are provided in the following segments.

4.1 Performance evaluation metrics – churn prediction

The performance of the proposed churn prediction approach has been evaluated through metrics such as accuracy, precision, f-measure, recall and ROC area. Equation (23) measures accuracy, which is the proportion of instances that were properly classified.

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
(23)

Here "TN" means True Negative, "TP" means True Positive, "FN" means False Negative and "FP" means False Positive. TP Rate is called sensitivity, too. It indicates that part of the data is appropriately marked as positive. The TPR has to be higher for every classifier. Equation (24) is applied to compute True Positive Rate (TPR).

$${\text{TPR}} = \frac{{\text{True Positives}}}{{\text{Actual Positives}}}$$
(24)

The False Positive Rate (FPR) shows the portion of data that is falsely marked as positive; for every classifier the FP rate should be minimal. The estimate is rendered using Eq. (25):

$${\text{FPR}} = \frac{{\text{False Positives}}}{{\text{Actual Negatives}}}$$
(25)

Precision, also regarded as positive predictive value (PPV), shows which portion of the positive predictions is actually positive. The calculation is performed using Eq. (26).

$${\text{Precision}} = \frac{{\text{True Positive}}}{{{\text{True Positive}} + {\text{False Positive}}}}$$
(26)

Recall is a further measurement of completeness, i.e. the algorithm's ability to pick out all the appropriate instances. A low recall value means many false negatives. The measurement is done using Eq. (27).

$${\text{Recall}} = \frac{{\text{True Positive}}}{{{\text{True Positive}} + {\text{False Negative}}}}$$
(27)

The F-measure is a trade-off between classifying all data points accurately and guaranteeing that each predicted class contains points of only one class; it is the harmonic mean of precision and recall. The measurement is performed using Eq. (28).

$${\text{F}} - {\text{measure}} = 2{*}\left( {\frac{{\text{Precision*Recall}}}{{{\text{Precision}} + {\text{Recall}}}}} \right)$$
(28)
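The metrics of Eqs. (23)–(28) can be computed directly from confusion-matrix counts, as the sketch below shows; the counts are illustrative, not results from the paper.

```python
# Evaluation metrics of Eqs. (23)-(28) computed from confusion-matrix counts.
# The counts below are illustrative only.

def churn_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (23)
    tpr = tp / (tp + fn)                                       # Eq. (24), sensitivity
    fpr = fp / (fp + tn)                                       # Eq. (25)
    precision = tp / (tp + fp)                                 # Eq. (26)
    recall = tpr                                               # Eq. (27)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (28)
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr,
            "precision": precision, "recall": recall, "f_measure": f_measure}

m = churn_metrics(tp=80, tn=90, fp=10, fn=20)
```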

The ROC area reflects average output against all feasible FP and FN cost ratios. A ROC area of 1.0 denotes perfect prediction, while values of 0.5, 0.6, 0.7, 0.8 and 0.9 signify random, bad, moderate, good and superior prediction, respectively. The overall accuracy of the DKSVM algorithm on the prediction problem is higher than that of the other prediction algorithms, as its efficiency is great on both datasets (see Tables 3 and 4). Tables 3 and 4 reveal the accuracy and the prediction-model building time for both the cell2cell and bigml datasets.

Table 3 Performance measure of various classification algorithms with tenfold cross-validation on cell2cell dataset
Table 4 Performance measure of various classification algorithms with tenfold cross-validation on churn-bigml dataset

Figure 6a and b shows the precision comparison results of the four different classifiers on the two benchmark datasets. The proposed DKSVM classifier has the highest precision results of 89.00% and 85.7532% for the cell2cell and bigml datasets respectively (see Tables 5 and 6). On the bigml dataset, the proposed DKSVM gives 7.6822%, 5.7211% and 3.4122% higher precision than the Decision Stump, J48 and RF classifiers respectively. Decision Stump gives the worst precision results for the cell2cell dataset compared to the other methods. As shown in Tables 5 and 6, RF and the proposed classifier have high precision for correctly predicting churners in both datasets.

Fig. 6
Precision results comparison vs. classifiers (benchmark datasets)

Table 5 Performance metrics of various classification algorithms on cell2cell dataset
Table 6 Performance metrics of various classification algorithms on bigml dataset

Figure 7a and b shows the recall comparison results of the four different classifiers on the two benchmark datasets. The proposed DKSVM classifier has the highest recall results of 90% and 86.7532% for the cell2cell and bigml datasets respectively (see Tables 5 and 6). This means the algorithm produced the fewest false negatives and can correctly identify churn customers, indicating that DKSVM and J48 are good classifiers for prediction. The proposed DKSVM gives 7.3232%, 5.6632% and 3.2972% higher recall than the Decision Stump, J48 and RF classifiers respectively on the second dataset. Compared to the other methods, Decision Stump and RF give the worst recall results, since these tree-based models produce more false negatives.

Fig. 7
Recall results comparison vs. classifiers (benchmark datasets)

Figure 8a and b shows the four different classifiers on the two benchmark datasets with respect to f-measure. The proposed DKSVM classifier has the highest f-measure results of 89% and 86.2503% for the cell2cell and bigml datasets respectively (see Tables 5 and 6). On the bigml dataset, the proposed DKSVM gives 8.7073%, 5.4643% and 3.0123% higher f-measure than the Decision Stump, J48 and RF classifiers respectively. Compared to the other methods, Decision Stump gives the worst f-measure result of 73.9945% for the cell2cell dataset; the other methods have higher f-measures on both datasets, as shown in Tables 5 and 6.

Fig. 8
F-measure results comparison vs. classifiers (benchmark datasets)

Figure 9a and b shows the accuracy comparison of the four different classifiers on the two benchmark datasets. The proposed DKSVM classifier has the highest accuracy results of 90% and 87.75% for the cell2cell and bigml datasets respectively (see Tables 5 and 6). This means the algorithm found the maximum number of true positives in the dataset and can correctly identify churn customers. The proposed DKSVM gives 7.86%, 5.66% and 3.285% higher accuracy than the Decision Stump, J48 and RF classifiers respectively. Random Forest and the proposed DKSVM algorithm have performed very well compared to all the other algorithms, with 87.65% and 89.76% ROC area under the curve for the first and second datasets; DKSVM is thus an outstanding classifier for prediction. According to the ROC value scale discussed earlier, RF and DKSVM also performed better, as shown in Fig. 10 (see Tables 5 and 6).

Fig. 9
Accuracy results comparison vs. classifiers (benchmark datasets)

Fig. 10
ROC area curve vs. classifiers (benchmark datasets)

4.2 Performance evaluation metrics – clustering

Customer clustering is employed to partition the entire customer data into groups on the basis of their behavioural and interaction details. To measure the results of the clustering methods, accuracy and error metrics are measured on the telecom churn (cell2cell) and churn-bigml datasets (Table 7). The results of the proposed HKD-PFLICM are compared to K-means, Flexible K-Medoids, PFLICM, Fuzzy Local Information C-Means (FLICM), and Entropy Weighting FLICM (EWFLICM). The accuracy metric indicates how similar one clustering (i.e., set of clusters) is to the ground-truth clustering, while the error metric indicates how dissimilar it is.

Table 7 Clustering metrics comparison vs. clustering algorithms

The resulting cluster threshold values of churn customers for the churn-bigml dataset are listed in Table 8.

Table 8 Threshold values of churn customers (churn-bigml dataset)

The clustering behaviour of churners with three major attributes, namely communication calls, uncommunication calls and minutes, forms three clusters: Low (Cluster 3), Medium (Cluster 2) and Risky (Cluster 1) for the cell2cell dataset. The corresponding cluster threshold values are given in Table 9.

Table 9 Threshold values of churn customers (Churn- Cell2cell Dataset)

Figure 11 shows the clustering accuracy and error comparison among the various clustering methods on the two benchmark datasets. The proposed clustering algorithm gives the highest accuracy results of 94.11% and 95.41% for the cell2cell and bigml datasets. Because the proposed work uses kernel functions rather than plain distances between customer behaviours, its clustering results improve over the other methods.
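One common way to realise the accuracy and error metrics used here is best-match accuracy over cluster label permutations, since cluster ids are arbitrary relative to the ground truth. This is a hedged sketch of that idea, not necessarily the exact measure used in the paper; the label sequences are illustrative.

```python
from itertools import permutations

# Clustering accuracy against a ground-truth labelling: take the best
# agreement over all mappings of cluster ids to truth labels; the error
# metric is its complement.

def clustering_accuracy(predicted, truth):
    labels = sorted(set(truth))
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        hits = sum(1 for p, t in zip(predicted, truth) if mapping[p] == t)
        best = max(best, hits / len(truth))
    return best

# Illustrative labels: 7 of 8 customers land in the right cluster.
pred  = [0, 0, 1, 1, 2, 2, 2, 1]
truth = [1, 1, 0, 0, 2, 2, 2, 2]
acc = clustering_accuracy(pred, truth)
error = 1.0 - acc
```

Exhaustive permutation matching is fine for three clusters; larger label sets would use the Hungarian assignment instead.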

Fig. 11
Accuracy and error results comparison between clustering methods

From this pattern and behaviour, rules are made for recommending to only similar customers in the future. Figures 12, 13 and 14 show the behaviour of churners with three major attributes, namely total calls, total minutes and total charge, forming three clusters: Low (Cluster 3), Medium (Cluster 2) and Risky (Cluster 1). The corresponding cluster threshold values are given in Table 8.

Fig. 12
Total_Calls behavior in each cluster (churn-bigml dataset)

Fig. 13
Total minutes behavior in each cluster (churn-bigml dataset)

Fig. 14
Total charge behavior in each cluster (churn-bigml dataset)

5 Conclusion and future work

Churn prediction has always been a major concern of Customer Relationship Management (CRM), in order to retain and grow the number of precious customers by recognizing comparable sets of new customers and furnishing them with considerable discounts, services and offers. This paper presents an efficient customer churn prediction strategy, an optimal choice to help telecom service providers identify consumers who are likely to churn. To perform this task, several data mining steps are taken into account: noise removal, feature selection, customer classification and prediction, and finally customer profiling. Information Gain (IG) and Fuzzy Particle Swarm Optimization (FPSO) are proposed for feature selection; FPSO enriches the global-best version of PSO by adapting the inertia weight and acceleration coefficients. To categorise churn and non-churn patrons, the Divergence Kernel-based Support Vector Machine (DKSVM) classifier has been applied, with a metric based on the Kullback–Leibler divergence. Finally, the customer characteristics procured from customer profiling are aggregated into clusters by the Hybrid Kernel Distance-based Possibilistic Fuzzy Local Information C-Means (HKD-PFLICM) clustering method, enabling the CRM to retain customers. The analysis reveals the similarities among customers, enriching the company's productive potential so that it can eventually reach competent marketing promotions. The implementation outcomes show that the proposed DKSVM churn approach, together with the accompanying classifier approaches, is capable of delivering efficient performance.
For instance, the proposed clustering algorithm gives higher accuracy results of 94.11% and 95.41% for the cell2cell and bigml datasets. Since the proposed work uses kernel functions instead of plain distances between customer behaviours, its clustering results are observed to be better than those of the other existing methods. In future, further exploration of this approach will focus on eager- and lazy-learning strategies for optimal churn prediction. In addition, this research can be augmented to scrutinize the changing character patterns of churn customers by enforcing Artificial Intelligence (AI) strategies intended for trend analysis and forecasting. Apart from SVM, other prediction approaches such as Genetic Algorithms (GAs) and Neural Networks (NNs) can be used. Moreover, various other churn prediction datasets could be used and their results analyzed in future.