1 Introduction

The Electrical Discharge Machining (EDM) is a state of the art machining process. EDM is chosen over the conventional machining processes which are unable to cut the high strength to weight ratio and toughened conductive materials. In the EDM process a series of electric sparks is established between the tool and the workpiece causing the removal of material through controlled erosion. The thermal energy produced due to spark melts and vaporizes the workpiece material. The EDM process finds diversified applications in aerospace, ordnance, and automobile industries [1]. The complexity of the EDM process results in difficulty to establish the relation between the process parameters and their responses. However, the process performance measures such as material removal rate (MRR), tool wear rate (TWR), surface roughness \(\left({\mathrm{R}}_{\mathrm{a}}\right)\), and circularity \(\left({r}_{1}/{r}_{2}\right)\) have been considered for performance analysis [2]. Many process parameters influence the above-mentioned responses in differnt ways. Parameters such as discharge current \(\left({\mathrm{I}}_{\mathrm{P}}\right)\), pulse-on-time \(\left({\mathrm{T}}_{\mathrm{on}}\right)\),duty factor \((\uptau )\), and flushing pressure \(\left({\mathrm{f}}_{\mathrm{P}}\right)\) significantly affect the EDM process [3].

In this work, four control parameters \({\mathrm{I}}_{\mathrm{P}}\),\({\mathrm{T}}_{\mathrm{on}}\), \(\uptau\), and \({\mathrm{f}}_{\mathrm{P}}\) at three levels were considered for the investigation. The experimental runs were reduced by utilizing the Box-Behnken design of response surface methodology (RSM) without losing the accuracy of data [4, 5]. According to Box-Behnken design (four-factor three-level) twenty-seven experimental runs were conducted where the runtime of each experiment was one hour. In most of the earlier applications of RSM, optimization was carried for a single response. However, it can be extended for multiple response optimization [6, 7]. The responses can be of three types namely larger-the-best, nominal-the-best and smaller-the-best. Most of the engineering applications or physical processes encounter multiple responses which are often conflicting in nature [8].

In the multiple response optimization problems, it is a usual practice to convert multiple responses into an equivalent single unique response via multi-attribute decision-making process. The fuzzy decision-making approach and fuzzy with Taguchi experimental design was used to deal with the multi-response optimization problems [9, 10]. In the fuzzy theory the concept of membership function used to deal with inexact information obtained through the experimental process instead of crisp variables. It is difficult to find out the most suitable membership functions for the experimental data and developing the rule base required for the inference engine. In this research work, the neuro-fuzzy system was employed on the experimental data for obtaining a single unique equivalent response for four performance characteristics of the EDM process. The AISI D2 tool steel was considered as work material, which is hard to machine using conventional machine tools. Brass was considered as tool material, due to good conductivity and less erosion property. The proposed neuro-fuzzy system produces a single response, known as Multiple Performance Characteristic Index (MPCI). The relation between the MPCI and controlling process parameters were articulated mathematically by considering MPCI as a single response. A regression analysis was carried out using RSM design to obtain the regression equation. Further, PSO has been utilized to retrieve the most suitable parametric combination that maximizes the MPCI [11].

The performance enhancement in EDM is a well-researched area. However, finding the best suitable machining parameters for finest performance in EDM is a challenging task. Lee and Li [12] studied the influence of parameters such as gap voltage, discharge current, pulse duration, pulse interval, and flushing pressure on the rate of material removal, wear rate of the tool, and surface roughness of EDM process. They used tungsten carbide (WC) as workpiece and copper tungsten (CuW) as electrode material. Tebin et al. [13] experimented on EDM to study the effect of discharge current, the pulse-on duration, the pulse-off duration, the tool electrode gap, and the tool material on MRR and TWR using steel 50CrV4 as workpiece, copper, and graphite as a tool. The effect of EDM machining parameters such as pulse current, gap voltage, and pulse-on-time on MRR and TWR was investigated by Habib [14] using RSM design. The MRR and TWR values were increased with increasing values of process parameters. Singh et al. [15] conducted experiments on EDM to obtained MRR, TWR, Ra, and analyzed the variation in responses with peak current, gap voltage, pulse-on-time, and duty cycle. RSM design was used to conduct experiments on EDM by Pradhan and Biswas [16]. They investigated the effect of four controllable input variables such as discharge current, pulse duration, pulse-off-time, and voltage on machining performance of AISI D2 steel and copper as work piece-tool combination. It is observed that the discharge current and pulse-on-time have a significant effect on surface roughness. Helmi et al. [17] investigated the surface roughness and MRR in the electro discharge grinding process employing the Taguchi method when tool steel is machined with brass and copper electrodes. They found that, peak current and pulse-on-time were the important parameters influencing the performance characteristics. Chattopadhyay et al. [18] proposed a design of experiment (DOE) method to experiment on rotary EDM with EN8 steel and copper as work piece-tool pair. The relations between performance characteristics (MRR and Electrode Wear Rate) and process parameters (peak current, pulse-on-time and rotational speed of electrode) was established. The peak current and rotational speed of the tool have a significant effect on both the responses. Regression models and DOE approaches have been employed extensively in the EDM process to obtain the best machining parameters [7, 19, 20]. The DOE approaches are well suited to obtain optimal parametric combinations for a single response problem. The method breaks down when multiple responses are simultaneously optimized. In this direction, a fuzzy TOPSIS approach was proposed to convert multi-responses into a single response [9].

A combination of Genetic algorithm (GA) and artificial neural network (ANN) was employed to retrieve best suited process parameters to improve performance in EDM process by utilizing graphite as a tool and nickel-based alloy as a work piece [21]. A similar approach has been considered by Su et al. in [22] from the rough cutting to the finish cutting stage. In most of the studies, multiple objectives are transformed into a unique single objective and attempt to find optimal parameters. To solve the single response optimization problem of Taguchi approach, Liao [23] proposed an effective method named as PCR-TOPSIS which is a combination of process capability ratio (PCR) and TOPSIS theories. Singh et al. [24] used the adaptive neuro-fuzzy inference system for predictive analysis of EDM responses. The MRR and \(\mathrm{Ra}\) in the EDM process were optimized using ANN with PSO [25]. Last few decade researchers are attracted towards artificial intelligence (AI) techniques such as ANN, GA, and fuzzy logic, to model and optimize the manufacturing processes. These new techniques can minimize the few limitations of the traditional process modeling approaches. To optimize multiple responses a neuro-fuzzy system was used by Antony et al. [8]. A combination of back propagation neural network (BPNN) with Levenberg Marquardt (LM) algorithm was used by Panda and Bhoi [26] to predict MRR. The AI technique such as simulated annealing (SA), ANN, genetic algorithm (GA) were utilized to optimize EDM process parameters [27, 28].

PSO is a computational simulation technique based on the movement of organisms such as gather of birds and groups of fish used to solve optimization problems [29]. PSO has a populace of search points to investigate the search space which is referred to as a ‘particle’ and the particles represent a potential solution. The best solution is known as a fitness function value [30]. Particles having the best global value is determined by the fitness function value in the current swarm (gbest), and also determines the best position of each particle over time, pbest, i.e. in current and all previous moves. In the search domain, particles move their position and the velocity updated according to its own flying experience toward its pbest and gbest locations [31]. The application of PSO is found out in many engineering optimization problems due to its certain characteristic, such as fast convergence capability, less number of controlled parameters used for optimization and convergence guaranteed [32,33,34,35]. Ali et al. optimized EDM process by machining AISI2312 hot worked steel alloy using neural network and PSO [43]. Muthuramalingam et al. investigated the white layer thickness EDM processed silicon steel. They also analyse the process parameter using a fuzzy interference system (ANFIS, fuzzy TOPSIS and fuzzy VIKOR) [44, 45]. The fuzzy-based AI also used for optimizing EDM and ECM process [46,47,48]. Marichamy et al. cast brass to perform EDM machining. They observed that the current is the most significant parameter which affects MRR and TWR [49,50,51].

It is evident from the literature review that the EDM process has been extensively studied by researchers. However, in this paper the effort was made to study the EDM in the context of multi-objective optimization of four responses. The effect of four controllable parameters namely discharge current \(({\mathrm{I}}_{\mathrm{P}})\), pulse-on-time \(({\mathrm{T}}_{\mathrm{on}})\), duty factor \((\uptau )\), and flushing pressure \(({\mathrm{f}}_{\mathrm{P}})\) were investigated on four EDM performance responses such as MRR, TWR, \({\mathrm{R}}_{\mathrm{a}}\), and \({r}_{1}/{r}_{2}\). This combination of input and output parameters were rarely investigated. The multi-response were converted to a single response and further optimized to get the set of input parameters for best output performance.

2 Material and method

Experiments were carried out in a die-sinking EDM machine (Electronica Electra plus PS 50ZNC) in the dielectric medium as shown in Fig. 1. The controllable factors such as flushing pressure, pulse-on-time, discharge current, and duty factor \((\tau )\) where τ is defined as \(\uptau ={\mathrm{T}}_{\mathrm{on}}/\left({\mathrm{T}}_{\mathrm{on}}+{\mathrm{T}}_{\mathrm{off}}\right)\) in percentage (\({\mathrm{T}}_{\mathrm{off}}\) denotes pulse-off–time) were considered for the investigation. The workpiece used for EDM was AISI D2 steel, which is an air-hardened high carbon, high chromium tool steel. AISI D2 tool steel possess material composition such as Carbon 1.55%, Manganese 0.6%, Silicon 0.6%, Chromium 11.8%, Molybdenum 0.8%, Vanadium 0.8% and rest is iron. The density of D2 tool steel is 7.7 × 103 kg/m3. In EDM large amount of heat evolves during the spark. Therefore, a good conductive tool is required which have an adequate melting point. Pure brass in a cylindrical shape (diameter 25 mm) was considered a tool. The workpiece used in the EDM operation was 6 mm thick plate, which was sliced from a long bar of 85 mm diameter. The weight of the tool and workpiece was measured before and after the machining process in a high precision electronic weight measuring machine (least count = 0.001 g). The machining operation was carried out for one hour for each experimental run. This time has been desided from pilot experiments conducted to get satisfactory responses. Exclusively for circularity calculation a perfect round shape was necessary, which was obtained after 1 h machining.

Fig. 1
figure 1

EDM machine set-up

The response surface methodology (RSM) was used to collect data, developing mathematical expression, improving, and optimizing processes. This deals with the situation where several input variables potentially influence the performance measure or quality of the product or process. The performance measure or quality is known as the response. The second-order model is widely used in response surface methodology due to its flexibility and it can take a wide variety of functional forms given in Eq. 1 [16].

$${\text{Y}} = {\upbeta }_{0} + \mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{k}}} {\upbeta }_{{\text{i}}} {\text{X}}_{{\text{i}}} + \mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{k}}} {\upbeta }_{{{\text{ii}}}} {\text{X}}_{{\text{i}}}^{2} + \mathop \sum \limits_{{{\text{i}},{\text{j}} = 1,{\text{i}} \ne {\text{j}}}}^{{\text{k}}} {\upbeta }_{{{\text{ij}}}} {\text{X}}_{{\text{i}}} {\text{X}}_{{\text{j}}} + {\upvarepsilon }$$
(1)

where \({\text{Y}}\) is the corresponding response of input variables \({\mathrm{X}}_{\mathrm{i}}\), \({\mathrm{X}}_{\mathrm{i}}^{2}\) and \({\mathrm{X}}_{\mathrm{i}}{\mathrm{X}}_{\mathrm{j}}\) are the square and interaction terms of parameters respectively. \({\upbeta }_{0}\), \({\upbeta }_{\mathrm{i}}\), \({\upbeta }_{\mathrm{ii}}\) and \({\upbeta }_{\mathrm{ij}}\) are the unknown regression coefficients and \(\upvarepsilon\) is the error.

The experimental design is made as per the Box-Behnken design of response surface methodology (RSM) with four factors because it is capable of generating a satisfactory prediction model with few experimental runs [5, 7]. The parametric levels are coded using the relation shown below [7]:

$${\text{Coded Value }}\left( {\text{Z}} \right) = \frac{{{\text{X}} - \frac{{{\text{X}}_{{{\text{max}}}} - {\text{X}}_{{{\text{min}}}} }}{2}}}{{\frac{{{\text{X}}_{{{\text{max}}}} - {\text{X}}_{{{\text{min}}}} }}{2}}}$$
(2)

where \(Z\) is a coded value (− 1, 0, 1), Xmax and Xmin is the maximum and minimum value of the actual variable, and \(\mathrm{X}\) is the actual value of the corresponding variable. The value of parameter setting in coded form are shown in Table 1.

Table 1 Factors and their levels

In the Box-Behnken experimental design 2 k = 16 factorial points (k = 4 is the number of process parameters), eight axial points, and three center points are considered. The layout of the experimental design is shown in Table 2.

Table 2 Experimental design

Each experiment was run for one hour and the responses are calculated as follows [18]:

  1. i.
    $${\text{MRR}} = \frac{{1000 \times \Delta {\text{W}}_{{\text{W}}} }}{{{\uprho }_{{\text{W}}} \times {\text{T}}}}\;{\text{mm}}^{{3}} /{\text{min}}$$
    (3)
  2. ii.
    $${\text{TWR}} = \frac{{1000 \times \Delta {\text{W}}_{{\text{t}}} }}{{{\uprho }_{{\text{W}}} \times {\text{T}}}}\;{\text{mm}}^{{3}} /{\text{min}}$$
    (4)

    where \({\Delta W}_{{\text{W}}}\) is the weight of material removed from the workpiece during machining and \(\Delta W_{t}\) is the weight of material removed from the tool during machining, \(\rho_{W}\) and \(\rho_{t}\) are densities of workpiece and tool respectively, \(\mathrm{T}\) is the time of machining.

  3. iii.

    Roughness was measured by a portable stylus type profilometer (Taylor Hobson, Surtronic 3 + talysurf).

  4. iv.

    Circularity was measured as the ratio of minimum to maximum Feret’s diameter. Where Feret’s diameter is the shortest distance between the parallel tangents drown on two opposite sides of the hole as shown in Fig. 2 [36]. The machined area was magnified (45-X magnification) by Samsung camera attached to a RADIAL INSTRUMENT microscope and Feret’s diameter was measured by drawing tangents as shown in Fig. 2.

Fig. 2
figure 2

Feret’s diameter

3 Methodology

This paper presents a structured and generic methodology that includes both RSM as well as AI tools to minimize the uncertainty in decision-making. The investigation is made to convert multiple responses into a single performance characteristic index via a neuro-fuzzy based model. The relationship between MPCI and process parameters is developed through a statistically valid regression equation. PSO is used to find out the best parameter setting using the developed process model. Assume \(n\) experiments were conducted utilizing RSM and responses obtained as MRR, TWR, \({\mathrm{R}}_{\mathrm{a}}\), and Circularity. The responses were divided into three main categories: the smaller-the better (STB), the nominal-the-best (NTB), and the larger the-better (LTB) responses. In practice, all the responses are not of the same category. Therefore, characteristic responses are converted to respective S/N ratios as follows [37]:

The larger-the-best performance characteristic can be expressed as:

$${\text{S}}/{\text{Nratio}} = - 10\log_{10} \left( {\frac{1}{{\text{n}}}\sum \frac{1}{{{\text{Y}}_{{\text{i}}}^{2} }}} \right)$$

The smaller-the-best performance characteristic can be expressed as:

$${\text{S}}/{\text{Nratio}} = - 10\log_{10} \left( {\frac{1}{{\text{n}}}\sum {\text{Y}}_{{\text{i}}}^{2} } \right)$$

where \({\text{Y}}_{{\text{i}}}\) is the ith experimental data of response.

All the S/N ratio responses \(\left({\mathrm{X}}_{\mathrm{ij}}\right)\) are normalized to obtain a normalized response \(\left({\mathrm{Z}}_{\mathrm{ij}}\right)\) so that they lie in the range, \(0 \le \mathrm{ Zij}\le 1\). Normalization is carried out to avoid the scaling effect and minimize the variation of the S/N ratio obtained at different scales. For responses of smaller-the-better type and larger-the-better normalization is carried out using Eq. 5 [16].

$${\text{Z}} = \frac{{{\text{X}} - \min \left\{ {{\text{X}},{\text{J}} = 1,2, \ldots .,n} \right\}}}{{\max \left\{ {{\text{X}},{\text{j}} = 1,2, \ldots ..,n} \right\} - \min \left\{ {{\text{X}},{\text{j}} = 1,2, \ldots .,n} \right\}}}$$
$$Z_{ij} = \frac{{X_{ij} - {\text{min}}\left\{ {X_{ij} , j = 1,2,3, \ldots \ldots ..,n} \right\}}}{{{\text{max}}\left\{ {X_{ij} , j = 1,2,3, \ldots \ldots ..,n} \right\} - {\text{min}}\left\{ {X_{ij} , j = 1,2,3, \ldots \ldots ..,n} \right\}}}$$
(5)

e fuzzy logic process can be applied effectively, where the input–output have nonlinear relation. The fuzzy logic can classify the input and output data sets broadly into different fuzzy classes. The membership function can be assign to crisp data sets by inference, intuition and AI tools. The common data clustering techniques are hard c mean clustering (HCM), and fuzzy c-mean clustering (FCM). HCM is used to assign a single membership in anyone, and only one, data cluster [22, 28]. However, the FCM extends crisp classification idea into a fuzzy classification notion. Therefore, the membership to the various data points can be assigned in each fuzzy set (fuzzy class, fuzzy cluster), along with the restriction (analogous to the crisp classification) that the sum of all membership values for a single data point in all of the classes has to be unity. FCM minimizes the membership function assigning uncertainty of crisp data into different fuzzy classes. The data points are grouped into c clusters by calculating the fuzzy partition matrix using fuzzy c-means algorithm. Therefore, the centers (centroids) are clustered, that minimize dissimilarity function \({\text{J}}_{{\text{m}}}\) [37].

$$J_{{m{\text{ }}}} \left( {U^{*} ,v^{*} } \right) = \min {\text{ }}\left[ {{\text{ }}J_{{m{\text{ }}}} \left( {U{\text{ }},{\text{ }}v} \right)} \right] = \min {\text{ }}\left[ {\sum\limits_{{k = 1}}^{n} {\sum\limits_{{i = 1}}^{c} {\left( {\mu _{{ik}} } \right)^{{m^{\prime } }} } } \left( {d_{{ik}} } \right)^{2} } \right]$$
(6)

The membership value for kth data point in the ith cluster is \({\upmu }_{{{\text{ik}}}}\), \({\text{m'}}\) is the weighting parameter varying in the range [1, ∞], the matrix of fuzzy partition is U, viz cluster center matrix, and similarity matrix is d, given in Eq. 7. Utilizing the Euclidean distance measure to characterize the similarity, the elements of d are calculated by [37]:

$${\text{d}}_{{{\text{ik }}}} = {\text{d }}\left( {{\text{x}}_{{\text{k}}} - {\text{ v}}_{{\text{i}}} } \right) = {\text{ }}\left\| {{\text{x}}_{{\text{k}}} } \right. - \left. {{\text{v}}_{{\text{i}}} } \right\| = \sqrt {\sum\limits_{{{\text{j}} = 1}}^{{\text{m}}} {\left( {{\text{x}}_{{{\text{kj}}}} - {\text{v}}_{{{\text{ij}}}} } \right)^{2} } } \left( {{\text{for }}i\; = \;1\;{\text{to}}\;c\;{\text{and}}\;k\; = \;1\;{\text{to}}\;n} \right)$$
(7)

where m is the number of features, \({\text{x}}_{{\text{k}}}\) is kth data point and vi is the centroid of ith cluster that can be presented by

$${\text{V}}_{{\text{i}}} = \left\{ {{\text{V}}_{{{\text{i}}1}} ,{\text{ V}}_{{{\text{i}}2}} ,{ } \ldots .,{\text{ V}}_{{{\text{im}}}} } \right\}$$

(for i = 1 to c) and the cluster centers are calculated using the following formulation [37]

$$v_{{ij}} = \frac{{\sum\nolimits_{{k = 1}}^{n} {\mu _{{ik}}^{{m'}} x_{{kj}} } }}{{\sum\nolimits_{{k = 1}}^{n} {\mu _{{ik}}^{{m'}} } }}\;({\text{for}}\;{\text{i = 1}}\;{\text{to}}\;{\text{c}}\;{\text{and}}\;{\text{j = 1}}\;{\text{to}}\;{\text{m}})$$
(8)

where \(x\) is a fuzzy variable describing data point. In essence, fuzzy partitioning is performed through an iterative optimization utilizing the following formulation:

$$u_{{ik}} \left( {s{\text{ }} + 1} \right) = \frac{1}{{\left[ {\sum\nolimits_{{j = 1}}^{c} {\left( {\frac{{d_{{ik}} \left( s \right)}}{{d_{{jk}} \left( s \right)}}} \right)^{{\frac{2}{{m' - 1}}}} } } \right]}}$$
(9)

It should be noted that sum of membership values fora cluster must be equal to 1 i.e.

$$\mathop \sum \limits_{{{\text{i}} \in {\text{I}}_{{\text{k}}} }} {\upmu }_{{{\text{ik}}}} = 1$$

Finally, the best available solution within a predefined accuracy criterion is determined by:

$$\left\| {{\text{U}}^{{\left( {{\text{r}} + 1} \right)}} } \right. - {\text{ }}\left. {{\text{U}}^{{\left( {\text{r}} \right)}} } \right\| \le \in$$

where \({\upvarepsilon }\) is error level for the termination of iteration which varies between 0 and 1. In detail, this iterative procedure converges to a local minimum of \({\mathrm{J}}_{\mathrm{m}}\). Algorithmically, fuzzy c-means methodology can be explained by a flowchart given in Fig. 3.

Fig. 3
figure 3

Flow Chart for fuzzy c-mean clustering

The membership function is repeatedly updated when the system parameters continuously change in a non-deterministic fashion. A neural network (NN) is used for these types of systems, as it is capable of modifying itself by adapting the weights. Neural network learns by modifying its structure rather than adding new rules to its knowledge base [26]. In this work, a backpropagation neural network (BPN) is implemented in output data to generate the fuzzy membership function for fuzzy classes. The relationship between the training data and the corresponding membership values (from FCM) is established by simulating it in BPN. The membership value is assigned to each training-data in different fuzzy classes. The performance of NN is checked by testing-data. After training of NN, it is used to determine the membership values of any input data in the different fuzzy classes (Fig. 4).

Fig. 4
figure 4

Training in neural network

A neural network use models that simulate the working model of the neurons in the human brain. It consists of two fixed layers an input layer and an output layer and one or more hidden layers. In the input layer, several neurons are equal to the number of input data to the neural network and in the output layer, the number on the neuron is equal to the number of output, but in the hidden layer, the number of neurons is optimized to minimize the error between the input and output predicted values [28]. Among the available neural network models, supervised learning neural networks are used to solve the parameter design problem with multiple responses and to establish a functional relationship between control factors and quality characteristics. In a neural network, a set of training input data along with a corresponding set of output data is trained to adjust the weights in a network. Then, the well-trained network is used to predict the membership functions to different fuzzy classes (clusters). The m-h-n neural network architecture indicates a basic three layered BPN represented by the m-h-n neural model, where parameters m, h, and n are the total number of neurons in input, hidden and output layers, respectively. For a multiple input and multiple output system, the data set for input and output comprising of vectors {(x1; x2; x3...xn); (y1; y2; y3.. yn)} are used. A weight wi as path joiner is randomly assigned in different layers. Then, an input x from the training data set is passed through the neural network, corresponding to which an output y is computed and compared with the desired output. The error (e) is computed as:

$${\text{e}} = {\text{Y}}_{{{\text{acctual}}}} - {\text{Y}}_{{{\text{desired}}}}$$
(10)

Error \({\text{e}}\). is distributed to the neurons in the hidden layer using a technique called back-propagation. The different weights \({\text{W}}_{{\text{i}}}\). connecting different neonsn the network are updated as:

$$W_{i} \left( {{\text{new}}} \right) = W_{i} \left( {{\text{new}}} \right) + \alpha ex_{i}$$
(11)

where \(\alpha\) is the learning rate, \(\mathrm{e}\) is associated with error, \({\mathrm{x}}_{\mathrm{i}}\) input to the ith neuron.

The learning rate is defined as the rate by which a neural network updates its weight to minimize the error. It should be kept low to escape the local optima. The input value \({\mathrm{x}}_{\mathrm{i}}\) is again passed through the neural network with updated weights, and the errors are computed. This iteration technique is continued until the error value of the final output is within the prescribed limit. This procedure is continued for all data in the training data set. Then, a testing-data set is used to verify the efficiency of the neural network to simulate the nonlinear relationship. When the network attains a satisfactory level of performance, a relationship between input and output data established, and the weights are used to recognize the new input patterns.

It is advantageous to use the FCM as it minimizes the uncertainty in assigning the membership function of crisp data into various fuzzy classes. The BPN as training data set used each data and its corresponding membership values from FCM. The membership values are assigned to training data by simulating input data and membership values in different fuzzy classes. Then the membership values are determined for input data. The defuzzification method is the conversion of a fuzzy quantity to a specific quantity. Among the various methods, the COA method is used for defuzzifying the fuzzy output function into crisp data [38, 39]. In this method, the fuzzy output \({\upmu }_{{\text{A}}} \left( {\text{y}} \right)\). transform into a crisp value \({\text{y}}\). It is given by the expression as in Eq. 12 [37]

$${\text{y = }}\frac{{\int {\mu _{A} } \cdot {\text{y}}\;{\text{dy}}}}{{\int {\mu _{A} } (y)\;dy}}$$
(12)

4 Particle swarm optimization

PSO is a stochastic optimization algorithm that was originally motivated by the thinking model of an individual of the social organism such as birds, fish, etc. by Kennedy and Eberhart [15]. The PSO has particles driven from a natural group with communications based on evolutionary computation and it combines self-experiences with social experiences. Here a contestant is considered as a particle and the objective is to get a global optimum. In PSO algorithm uses some flying particles from the search area as well as the movement towards a promising area. The flying particle is compared with changing solutions and the search area is compared with current and possible solutions. After each iteration, the particles update its position to a goal (fitness), and local neighborhood particles share memories of their “best” positions. The particle velocities and their subsequent positions are adjusted by these memories [32, 40]. In the standard PSO with ‘Z’ particles in the D dimensional search space, the potential solution can be represented by the particle’s position vector \({\mathrm{X}}_{\mathrm{i}}(\mathrm{t})\). The position \({\mathrm{X}}_{\mathrm{i}}(\mathrm{t})\), of the ith particle is adjusted by a stochastic velocity \({\mathrm{V}}_{\mathrm{i}}(\mathrm{t})\). Thus, the particle moves according to the following equation [32]:

$${\text{V}}_{{\text{i}}} \left( {\text{t}} \right) = {\text{WX}}_{{\text{i}}} \left( {\text{t}} \right) + {\text{C}}_{1} {\text{r}}_{1} \left( {{\text{pbest}}_{{\text{i}}} \left( {\text{t}} \right) - {\text{X}}_{{\text{i}}} \left( {\text{t}} \right)} \right) + {\text{C}}_{1} {\text{r}}_{1} \left( {{\text{gbest}}_{{\text{j}}} \left( {\text{t}} \right) - {\text{X}}_{{\text{i}}} \left( {\text{t}} \right)} \right)$$
(13)
$${\text{X}}_{{\text{i}}} \left( {{\text{t}} + 1} \right) = {\text{X}}_{{\text{i}}} \left( {\text{t}} \right) + {\text{V}}_{{\text{i}}} \left( {{\text{t}} + 1} \right)$$
(14)

where i = 1, 2, ….., N, \({\text{pbest}}_{{\text{i}}}\) \(\left(\mathrm{t}\right)\) is the best solution that particle \(\mathrm{i}\) s obtained until iteration generation t, and \({\mathrm{gbest}}_{\mathrm{i}}\left(\mathrm{t}\right)\) is the best solution obtained from \({\mathrm{pbest}}_{\mathrm{i}}\left(\mathrm{t}\right)\) in the whole swarm at iteration t. w is inertia weight, \({\mathrm{C}}_{1}\) is the cognition learning factor and \({\mathrm{C}}_{2}\) is the social learning factor; \({\mathrm{r}}_{1}\) and \({\mathrm{r}}_{2}\) are the random numbers uniformly distributed in (0,1) [29, 30, 41]. The procedure for implementing the PSO is given by the following steps.

Step 1: Initialization of swarm positions and velocities: A uniform probability distribution function is used in the D dimensional problem space to initialize a population (array) of particles with random positions and velocities.

Step 2: Evaluation of fitness of particle: The fitness function is maximized in evaluation of fitness value of each particle.

Step 3: Comparison to \(\mathbf{p}\mathbf{b}\mathbf{e}\mathbf{s}\mathbf{t}\) (personal best): Each particle’s fitness is compared with the particle’s \(\mathrm{pbest}\). If the obtained current value is superior than the exsisting \(\mathrm{pbest}\), then the \(\mathrm{pbest}\) value is replaced with the current value and the \(\mathrm{pbest}\) location is updated by current location in a D-dimensional space.

Step 4: Comparison to \(\mathbf{g}\mathbf{b}\mathbf{e}\mathbf{s}\mathbf{t}\) (global best): The fitness is compared with the previous \(\mathrm{pbest}\) population. If the obtained current value is superior than \(\mathrm{gbest}\), then \(\mathrm{gbest}\) is resetted to the array index and value of current particle.

Step 5: Updating of each particle’s velocity and position: Modify the velocity \({\mathrm{V}}_{\mathrm{i}}\), and particle’s position, \({\mathrm{X}}_{\mathrm{i}}\), according to Eq. 13 and Eq. 14 respectively.

5 Result and discussion

The experiments have been conducted as per the experimental plan shown in Table 2. Four responses are measured as explained in Sect. 3. Out of four responses, two responses such as MRR and circularity are to be maximized whereas two responses EWR and Ra are to be minimized. Since the responses are contradicting in nature, they were converted to S/N ratio to make them into the same characteristic nature as explained in Sect. 4. The S/N ratios of responses are shown in Table 3. The S/N ratios exhibit large variation as evident from Table 3. Therefore, they were normalized using Eq. 5 and normalized values are presented in Table 3. Then, a supervised learning BPN is modeled to find the membership function. These normalized data sets have been clustered by using fuzzy clustering into four fuzzy classes \({\mathrm{R}}_{1}\), \({\mathrm{R}}_{2}\), \({\mathrm{R}}_{3}\), and \({\mathrm{R}}_{4}\). There are twenty seven numbers of data sets as listed in Table 3, each of them comprising four responses or coordinates. The matrix \(\mathrm{U}\) as shown in Table 4, gives the value of the membership of each data into four fuzzy classes. This occurs when the objective function of fuzzy c-means (FCM) is converged after thirty four iterations. The numbers for each cluster indicate the experiment number or run number.

Table 3 Signal-to-noise ratio and Normalized value
Table 4 Membership values from FCM

\({\mathrm{R}}_{1}\) = 4 8 10 12 14 16 18 20 22 24.

\({\mathrm{R}}_{2}\)= 5 6 7 23 25 26 27.

\({\mathrm{R}}_{3}\)= 2.

\({\mathrm{R}}_{4}\) = 1 3 9 11 13 15 17 19 21.

Since four input and four output parameters were included in the experiment, similar numbers of neurons in input layer and output later have been chosen. The hidden layer neurons were determined by various Backpropagation Neural Networks (BPN) model. The models have been chosen to achieve performance error equal to 0.001. Six BPN models 4–5–4, 4–6–4, 4–7–4, 4–8–4, 4–9–4, 4–10–4 have been selected. Data set 1–18 are selected as training data and data set 19–27 have been used to test the performance of the selected neural network. Finally, BPN architecture 4–8-4 showed minimum root mean square error (RMSE). Learning and momentum parameters were set at 0.12 and 0.50. The number of epochs the BPN was run was 31,250. In spite of higher number of iterations to converge at a final value, low learning rate was used to ensure the neural network to escape from local optima.

Initially quasi-random weights have been assigned for four layers. Thereafter, the data serial number 1 with input co-ordinates \({\mathrm{X}}_{1}=0.125995\) 0, \({\mathrm{X}}_{2}=0.698949\), \({\mathrm{X}}_{3}=1\), and \({\mathrm{X}}_{4}=0.4145741\) corresponding to the output membership 0.0546 (\({\mathrm{R}}_{1}\)), 0.1098 (\({\mathrm{R}}_{2}\)), 0.0028 (\({\mathrm{R}}_{3}\)) and 0.8328 (\({\mathrm{R}}_{4}\)) was entered. The output of the network was computed and compared with the desired output to calculate the error. The initial assigned weights were repeatedly adjusted to minimize the error, until the target achieved to 0.001 as shown in Fig. 5. Similarly, data set (2–18) are entered and weights are readjusted. The data set (19–27) are used to test the performance of the network. It has been observed that after 31,250 iterations, the network achieves a satisfactory level of error as shown in Fig. 4. The obtained adjusted membership values are shown in Table 5.

Fig. 5
figure 5

a Regression plot for training data b Regression plot for testing data

Table 5 Adjusted membership value

Regression curves are plotted as shown in Fig. 5 and Fig. 6, between actual membership function and predicted membership function via neuro-fuzzy model for training data and testing data respectively. Figure 5 and 6 shows that the data are well fitted because a high degree of correlation coefficient (R) = 0.99897 for training and (R) = 0.99854 for testing data. This indicate that the data are well trained and can be predicted. Figure 6 shows the membership functions for output predicted by neural network. After getting fuzzified value, it is needed to defuzzify them to get a crisp value containing the combined quality characteristic which can be used as higher the best criteria. This has done by center of area (COA) method. These defuzzified data are called MPCI, listed in Table 6.

Fig. 6
figure 6

Membership function plot

Table 6 Multi-response Performance Characteristic Index (MPCI)

The MPCI values are considered as single response and analysis of variance shown in Table 7, it is observed that the factors \({\mathrm{I}}_{\mathrm{P}}\), \({\mathrm{T}}_{\mathrm{on}}\), and \(\uptau\) and square terms \({\mathrm{I}}_{\mathrm{P}}\times {\mathrm{I}}_{\mathrm{P}}\) and \({\mathrm{T}}_{\mathrm{on}}\times {\mathrm{T}}_{\mathrm{on}}\), and interaction \(\uptau \times {\mathrm{F}}_{\mathrm{p}}\) are statistically significant. The highest effect was observed in \({\mathrm{I}}_{\mathrm{P}}\) followed by \({\mathrm{T}}_{\mathrm{on}}\), \({\mathrm{T}}_{\mathrm{on}}\times {\mathrm{T}}_{\mathrm{on}}\), \({\mathrm{I}}_{\mathrm{P}}\times {\mathrm{I}}_{\mathrm{P}}\), \(\uptau \times {\mathrm{F}}_{\mathrm{p}}\). \({\mathrm{I}}_{\mathrm{P}}\) has highest effect because it directly contribute to the heat generation. As \({\mathrm{I}}_{\mathrm{P}}\) increases the spark become stronger and more erosion arise. The process model is obtained by regression analysis as given in Eq. 15 and coefficient of determination (R2) was found to be 91.72%.

$$\begin{aligned} {\text{MPCI}} = & 0.404 + 0.200 \times {\text{I}}_{{\text{P}}} + 0.104 \times {\text{T}}_{{{\text{on}}}} + 0.036 \times {\uptau } - 0.039 \\ { }\quad \times {\text{F}}_{{\text{p}}} + 0.120 \times ({\text{I}}_{{\text{P}}} \times {\text{I}}_{{\text{P}}} ) + 0.128 \times ({\text{T}}_{{{\text{on}}}} \times {\text{T}}_{{{\text{on}}}} ) + 0.004 \times { }\left( {{\uptau } \times {\uptau }} \right) + 0.029{ } \times \left( {{\text{F}}_{{\text{p }}} \times {\text{F}}_{{\text{p}}} } \right) \\ \quad - 0.062{ } \times \left( {{\text{I}}_{{\text{P}}} \times {\text{T}}_{{{\text{on}}}} } \right) - 0.029 \times \left( {{\text{I}}_{{\text{P}}} \times {\uptau }} \right) + 0.082 \times \left( {{\text{I}}_{{\text{P}}} \times {\text{F}}_{{\text{p}}} } \right) + 0.001 \times \left( {{\text{T}}_{{{\text{on}}}} \times {\uptau }} \right) \\ \quad + 0.033 \times \left( {{\text{T}}_{{{\text{on}}}} \times {\text{F}}_{{\text{p}}} } \right) + 0.039 \times ({\uptau } \times {\text{F}}_{{\text{p}}} )\left( {{\text{in}}\;{\text{coded}}\;{\text{form}}} \right) \\ \end{aligned}$$
(15)
Table 7 ANOVA for MPCI

Figure 7 shows the response surface for MPCI in relation to the process parameters of discharge current and pulse on time. It can be seen from the figure that the MPCI tends to increase rapidly with an increase in peak current for any value of pulse-on-time. Figure 7 also indicates that maximum MPCI value is obtained at high peak current (7 A) and high pulse on time (300 μs). This is due to their principal control over the input spark energy. As the discharge current increases, it generates a strong spark which produces a higher temperature as a result of more material was eroded from the work piece. Figure 8 shows the response surface for MPCI in relation to the process parameters of discharge current and duty factor. It can be observed from the figure that MPCI increases as \({\mathrm{I}}_{\mathrm{P}}\) increases for any value of \(\uptau\).

Fig. 7
figure 7

Surface plot of MPCI vs \({\mathrm{I}}_{\mathrm{P}}\), \({\mathrm{T}}_{\mathrm{on}}\)

Fig. 8
figure 8

Surface plot of MPCI vs \({\mathrm{I}}_{\mathrm{P}}\), \(\uptau\)

PSO technique was used to determine the optimal parameter setting using the model shown in Eq. 15. The PSO algorithm was run for 100 iterations but converged at 59 iterations as shown in Fig. 9. The optimal value of MPCI is obtained as 0.946 at parametric values of \({\mathrm{I}}_{\mathrm{P}}\)=0.96, \({\mathrm{T}}_{\mathrm{on}}\)=0.99, τ = 0.11, \({\mathrm{F}}_{\mathrm{p}}\)=0.105 in coded form. These values are decoded using Eq. 2 and actual values of factors are found to be \({\mathrm{I}}_{\mathrm{P}}\)=6.92 A, \({\mathrm{T}}_{\mathrm{on}}\)=298.5 µs, \(\uptau =85.5\mathrm{ \%}, {\mathrm{F}}_{\mathrm{p}}\)= 0.3 kg/cm2 respectively. The results obtained for peak current and pulse on time can be validated by comparing it with results obtained in similar work presented in [13], and [15]. The conformation experiment was conducted with the closet value setting available in EDM machine (\({\mathrm{I}}_{\mathrm{P}}\)= 7 A, \({\mathrm{T}}_{\mathrm{on}}\)=300 µs, \(\uptau =85\mathrm{ \%}, {\mathrm{F}}_{\mathrm{p}}\)= 0.3 kg/cm2). The corresponding MRR was obtained as highest value 7.037 (mm3/min), TWR = 3.265 (mm3/min), Ra = 0.855 µm and circularity = 7.59. These values are the optimal responses in the conducted experiments.

Fig. 9
figure 9

The convergence curve

The machined surface of experiment numbers 22 (\({\mathrm{I}}_{\mathrm{P}}\) = 5 A, \({\mathrm{T}}_{\mathrm{on}}\) = 300 µs, \(\uptau\) = 80%, \({f}_{P}\)=0.3 kg/cm2) and 23 (\({\mathrm{I}}_{\mathrm{P}}=5\mathrm{A}, {\mathrm{T}}_{\mathrm{on}}=100\upmu \mathrm{s}\), \(\uptau =90\mathrm{\%}\), \({f}_{P}\)=0.3 kg/cm2) was analyzed under a scanning electron microscope at 1000 magnification (Model JEOL JSM-6480LV). Figure 10 shows pores and micro-cracks for experiment number 22 and Fig. 11 shows a similar observation for experiment number 23. It can be observed that a few larger pores and small number of micro-cracks are present in Fig. 10. In Fig. 11, more number of small pores and more number of micro cracks are present. The increase of pulse-on-time establish more heat between tool and workpiece. The higher thermal gradient may give rise to high residual stress causing more pores and larger cracks.

Fig. 10
figure 10

SEM micrographs showing pores and cracks for experiment number 22

Fig. 11
figure 11

SEM micrographs showing pores and cracks for experiment number 22

6 Conclusions

EDM is widely used by researchers as well as manufacturers for different objectives such as quality manufacturing. Experiments were modeled using RSM design and the four performance characteristics were estimated. The multiple responses were converted into a single equivalent response (MPCI) via a neuro-fuzzy approach. A regression model was developed on MPCI and PSO was used to find the optimal setting. The best set of parameters were achieved at Ip = 6.92 A, Ton = 298.5 μs, τ = 85.5%, Fp = 0.3 kg/cm2 for combined optimization of four responses. The confirmation experiment was conducted with the closet value setting available in EDM machine (Ip = 7 A, Ton = 300 µs, τ = 85%,Fp = 0.3 kg/cm2). The input current has a more significant effect on the EDM responses. The increase of pulse-on-time developed larger pores and micro-cracks on workpiece surface (from SEM image). The combined approach of design of experiment with neuro-fuzzy system and PSO technique holds good for solving multiple response problem. This methodology, being simple and robust, can be applied in other situations where multiple conflicting objectives are desired to be optimized simultaneously.