Fault diagnosis for PV system using a deep learning optimized via PSO heuristic combination technique
Faculty of Engineering, Electrical Power Department, Cairo University, Giza, Egypt

A heuristic combination of particle swarm optimization and a back propagation neural network (BPNN-PSO) is proposed in this paper to improve the convergence and prediction accuracy of fault diagnosis in photovoltaic (PV) array systems. The technique combines the classification and prediction ability of deep learning with the ability of particle swarm optimization to find the best solution in the search space. Parameters extracted from the output of the PV array are used to identify the faults of the system. The results of the back propagation neural network alone are compared with those of the proposed back propagation heuristic combination technique. In the training phase, the back propagation algorithm converges after 350 steps, while the proposed BP-PSO algorithm converges after only 250 steps. The prediction accuracy of the BP algorithm is about 87.8%, while the proposed BP-PSO algorithm achieves 95% correct predictions. The results clearly show that the back propagation heuristic combination technique improves both the convergence of the simulation and the accuracy of fault diagnosis in the PV system.


Introduction
With the rapid development of renewable energy technology, renewable energy has become a basic source of electricity in many countries; according to the latest reports, it supplied 18.18% of the world's electricity in 2016 [1]. With the recent growth and worldwide use of PV generation, PV faults have attracted considerable attention, because they strongly affect the reliability and performance of the PV system. These faults may be caused by partial shading, temperature, module aging, cell damage, and short or open circuits of the PV modules [2][3][4][5][6]. Temperature faults arise from the high surface temperature of the PV panels after they absorb sunlight. Partial shading faults occur due to clouds, fallen leaves or dust. Short-circuit and open-circuit faults occur after long-term operation of the PV system due to module aging. Detecting faults in a PV system helps prevent system degradation and maintain its reliability. Faults in PV systems reduce the operating efficiency, may damage system components, and can also create fire and other safety hazards.
Bilal Taghezouit et al. [7] presented a fault detection strategy based on a double exponential technique. This method proved efficient and applicable for detecting different faults. However, it also had drawbacks: it works on a single scale only, so it is not suitable for multiple scales. As stated, deep learning techniques have a high potential for improving such results. Bilal Taghezouit et al. [8] also designed an efficient fault detection method using a principal component analysis model and multivariate monitoring schemes. Despite its good results, this method is suitable only for detection on one (time) scale, not for multiscale systems. Zhicong Chen et al. [9] proposed the random forest (RF) ensemble learning algorithm for the detection and diagnosis of early PV faults. The RF method takes the real-time operating voltage and string currents as fault features, and a grid search is used to optimize the RF model parameters. The fault types studied are degradation, open circuit, line-line fault and partial shading. This method reaches a high prediction accuracy, which makes it a good method. In the technique proposed here, different types of faults are predicted.
Haizheng Wang et al. [10] applied uncertainty analysis to PV fault diagnosis. A probabilistic modeling method for the distribution of PV array parameters is presented, which can handle both the nonlinearity and the uncertainty of the PV output interval. This method needs further verification because different faults have diverse characteristics.
Yuanliang Li et al. [11] proposed a fault diagnosis method for PV arrays based on fault parameter identification. It can recognize faults and describe them quantitatively by identifying the fault parameters from the I-V curve of the PV array. Its drawback is that it is suited to good irradiance conditions; under partial shading, it does not perform as well.
Ling Chen et al. [12] presented a fault diagnosis method for PV modules using a back propagation neural network trained with the Levenberg-Marquardt (L-M) algorithm. The diagnosis system is designed for long-distance wireless fault diagnosis using Zigbee technology. This method was able to detect four kinds of faults: short circuit, partial shading, open circuit and abnormal degradation.
Qiang Zhao et al. [13] proposed a PV fault diagnosis method in which fuzzy C-means clustering is used to cluster the PV fault samples and a fuzzy membership algorithm performs the final fault diagnosis. It has the advantage of separating fault data from normal data without prior knowledge.
Jingna Pan et al. [14] suggested a fault diagnosis method using an uncertainty analysis based on nonparametric statistical modeling, together with a method for acquiring the fault diagnosis threshold. This introduced the new idea of a dynamic threshold for fault diagnosis.
Artificial neural networks [15][16][17][18][19] imitate the way the human brain solves problems. The BP neural network [20][21][22][23][24][25] is a commonly used approach for fault diagnosis; it is a multilayer feed-forward network composed of three or more layers. The network is trained with the back propagation algorithm, which propagates the output error backward through the network. Neurons in adjacent layers are connected by weights, and the input and output layers are connected via the hidden layer. The connection weights are revised until they give the least error between the actual and the expected outputs; as the error is corrected, the accuracy of the network's response improves. A disadvantage of the BP neural network is that it requires a large database and a long training time to converge. Such long training times, as well as improperly chosen samples, may lead to low accuracy in predicting faults in PV systems. This is why combining particle swarm optimization with the BP neural network was proposed to overcome these disadvantages.
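As an illustration of the training procedure described above, the following is a minimal Python sketch of a three-layer BP network. It is not the implementation used in this paper. The network has a sigmoid hidden layer and a linear output layer, and its weights are corrected by gradient descent on the squared output error. The class name `TinyBPNet`, the bias terms, the layer sizes, the learning rate and the XOR-style toy data are all illustrative assumptions. The weight names C (input to hidden) and W (hidden to output) follow the notation used for the BP-PSO network in Sect. 3.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNet:
    """Three-layer feed-forward network: sigmoid hidden layer, linear output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # C[i][j]: weight from input j to hidden neuron i; last entry is a bias (assumption).
        self.C = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # W[k][i]: weight from hidden neuron i to output neuron k; last entry is a bias.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(c * v for c, v in zip(row, xb))) for row in self.C]
        hb = h + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in self.W]
        return xb, h, hb, y

    def mse(self, data):
        total = 0.0
        for x, t in data:
            y = self.forward(x)[3]
            total += sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / len(t)
        return total / len(data)

    def train_step(self, x, t, lr):
        xb, h, hb, y = self.forward(x)
        # Error signal of the linear output layer (gradient of 0.5 * squared error).
        d_out = [yi - ti for yi, ti in zip(y, t)]
        # Propagate the error back through W and the sigmoid derivative h * (1 - h).
        d_hid = [h[i] * (1.0 - h[i]) * sum(d_out[k] * self.W[k][i] for k in range(len(d_out)))
                 for i in range(len(h))]
        # Correct both weight matrices against the gradient.
        for k, row in enumerate(self.W):
            for i in range(len(row)):
                row[i] -= lr * d_out[k] * hb[i]
        for i, row in enumerate(self.C):
            for j in range(len(row)):
                row[j] -= lr * d_hid[i] * xb[j]


def train(net, data, epochs, lr):
    """Repeat per-sample back propagation updates and return the final MSE."""
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, lr)
    return net.mse(data)


data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
net = TinyBPNet(n_in=2, n_hidden=4, n_out=1)
before = net.mse(data)
after = train(net, data, epochs=500, lr=0.2)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Per-sample (online) updates are used here for brevity. A batch update over all samples per epoch is an equally valid choice.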
The main motivation for this work is the large effect of faults in a PV system, which directly degrade its performance and can lead to malfunction. This motivated applying a new technique for fault diagnosis in PV systems. The contributions of this work include studying PV performance under various faults using recognition features such as the short-circuit current (I_SC), the open-circuit voltage (V_OC), the voltage at the maximum power point (V_m) and the maximum power (P_m). Using these features increases the running speed and decreases the execution time of the diagnostic method. The performance of the proposed algorithm, a back propagation neural network combined with PSO, is evaluated for fault diagnosis in PV systems. The method combines the global search ability of the PSO algorithm with the local search ability of the back propagation neural network. The BPNN-PSO technique improves the convergence of the diagnostic method and increases the accuracy of fault prediction in photovoltaic systems. Compared with previous works, which can predict only one or two types of these faults [26], this work can efficiently predict several types of faults that occur in the PV system and cause its degradation. Another main contribution is the combination of a heuristic optimization technique with a deep learning neural network, which provides a better learning method for predicting faults correctly.
Many methods based on different techniques have been proposed for the same purpose of fault diagnosis. Considering in particular previous methods based on different deep learning techniques, a summarized comparison with the present work is given in Table 1, which provides a comparative analysis of the state of the art. This paper is organized as follows: Sect. 2 describes the fault diagnosis system used and the parameters for recognizing the faults. Section 3 presents the proposed hybrid algorithm combining the back propagation neural network and particle swarm optimization (BP-PSO) for fault detection in PV systems. Section 4 describes the data used to apply this technique. Fault prediction and diagnosis were studied using MATLAB, and the results are shown in Sect. 5. Section 6 concludes the paper. Figure 1 shows the schematic diagram of the diagnosed system. It consists of a 4 × 3 PV array (four modules in series and three in parallel), a DC load, system alarms and modules for recording the PV system states. A BP-PSO network is the diagnosis tool.

The proposed system configuration
The change of the PV parameters under different faults is analyzed by simulating a PV module in MATLAB/SIMULINK based on the mathematical model given in [30]. This model is selected because it can be applied in practice, so the outcome can easily be shown in real time.
Table 2 gives some specifications of the PV module under standard test conditions: irradiance = 1000 W/m², temperature = 298 K. These conditions are selected because they are the optimum conditions available in the external environment of the PV system. Other studies have pursued the same idea of fault diagnosis for PV systems with nearly the same parameters but different techniques, such as voltage and current observation and evaluation [31] and string-level monitoring [32]. Different faults were considered during the simulation: a) partial shading, b) temperature faults and c) cell aging faults, which are represented by using different series resistances. The I-V (current-voltage) and V-P (voltage-power) curves are shown in Figs. 2, 3 and 4. As the parallel resistance of the PV cells decreases, V_oc and I_sc change only slightly, while V_m and P_m decrease, as shown in Fig. 2(a) and (b).
Fig. 3(a) and (b) illustrate the effect of changing the series resistance. When the series resistance decreases, V_oc and I_sc change only slightly, while V_m and P_m increase.
As the cell temperature increases, as shown in Fig. 4(a) and (b), I_sc increases while V_m, P_m and V_oc decrease. These changes occur because the bandgap is negatively correlated with temperature: as the ambient temperature increases, the Fermi level of the PV cell gradually approaches the center of the forbidden band. The diffusion coefficient of the PV cell is positively related to the Fermi energy and to I_sc.
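The series resistance trend of Fig. 3 can be reproduced qualitatively with the widely used single-diode PV model, sketched below in Python. This is an assumption for illustration: it is not necessarily the model of [30]. The module parameters (photocurrent 6 A, saturation current 1e-9 A, ideality factor 1, 60 cells, 298.15 K, shunt resistance 300 Ω) are hypothetical values chosen only to show the effect of the series resistance.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def module_current(v, iph, i0, rs, rsh, n, ns, temp_k):
    """Solve I = Iph - I0*(exp((V + I*Rs)/a) - 1) - (V + I*Rs)/Rsh for I (Newton's method)."""
    a = n * ns * K_B * temp_k / Q_E  # modified ideality factor (V)
    i = iph  # the residual is <= 0 here for v >= 0, so Newton approaches the root from above
    for _ in range(200):
        e = math.exp((v + i * rs) / a)
        f = iph - i0 * (e - 1.0) - (v + i * rs) / rsh - i
        df = -i0 * rs / a * e - rs / rsh - 1.0
        step = f / df
        i -= step
        if abs(step) < 1e-12:
            break
    return i


def iv_curve(rs, rsh, iph=6.0, i0=1e-9, n=1.0, ns=60, temp_k=298.15, v_max=36.0, points=361):
    """Sample the I-V curve from 0 V to v_max (hypothetical module parameters)."""
    vs = [v_max * k / (points - 1) for k in range(points)]
    return [(v, module_current(v, iph, i0, rs, rsh, n, ns, temp_k)) for v in vs]


def max_power(curve):
    """Largest sampled V * I product, i.e. an estimate of P_m."""
    return max(v * i for v, i in curve)


for rs in (0.2, 0.8):
    curve = iv_curve(rs=rs, rsh=300.0)
    print(f"Rs = {rs} ohm: Isc = {curve[0][1]:.3f} A, Pm = {max_power(curve):.1f} W")
```

The residual is concave and decreasing in I. Starting Newton's method at I = Iph therefore moves the iterates monotonically down to the root, which keeps the solver simple without a bracketing step.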
These results indicate that I_sc, V_oc, V_m and P_m can be used to determine whether a fault has occurred and, if so, its type. Their values are therefore used as identification parameters and form the input matrix
$$X = \begin{bmatrix} I_{sc} \\ V_{oc} \\ V_m \\ P_m \end{bmatrix},$$
where each row holds the values of one parameter over the samples. This paper is concerned with six kinds of faults, as listed in Table 3.
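The following is a minimal Python sketch of how the four identification parameters could be read off a sampled I-V curve. It assumes the samples are sorted by increasing voltage and start at V = 0, so the first current is I_sc, and it estimates V_oc by linear interpolation at the zero crossing of the current. The function name `extract_features` is hypothetical.

```python
def extract_features(curve):
    """Return (I_sc, V_oc, V_m, P_m) from (voltage, current) samples sorted by voltage.

    Assumes the first sample is at V = 0, so its current is the short-circuit current.
    """
    isc = curve[0][1]
    # V_oc: interpolate linearly where the current first reaches zero;
    # fall back to the last sampled voltage if it never does.
    voc = curve[-1][0]
    for (v0, i0), (v1, i1) in zip(curve, curve[1:]):
        if i0 > 0.0 >= i1:
            voc = v0 + i0 * (v1 - v0) / (i0 - i1)
            break
    # Maximum power point: the sample with the largest product V * I.
    pm, vm = max((v * i, v) for v, i in curve)
    return isc, voc, vm, pm


samples = [(0.0, 2.0), (1.0, 1.5), (2.0, 0.5), (3.0, -0.5)]
print(extract_features(samples))  # (2.0, 2.5, 1.0, 1.5)
```

Finer voltage sampling makes the V_m and P_m estimates closer to the true maximum power point, because P_m is taken from the best sampled point rather than from a fitted curve.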

Back propagation-particle swarm optimization algorithm
The BP-PSO technique used in this paper has the major advantage of combining the local search ability of the BP neural network with the global search ability of PSO [33][34][35]. This hybrid technique reaches solutions for PV array fault prediction more quickly and predicts fault types with high efficiency, which is an important contribution, as is the idea of merging PSO with the deep learning technique. The test data are first normalized and fed to the input layer. Training then takes place, with the sigmoid function applied in the training layer, where learning and classification are performed. The PV fault data are then passed to the PSO layer for fault classification, and the optimization results are obtained at the output layer. The methodology therefore first trains the BP neural network with the sigmoid function and then applies the PSO algorithm for fault classification. The schematic diagram of this fault diagnosis method is shown in Fig. 5.
Table 3 (partially recovered): Temperature and partial shading combination: 5; Temperature and cell aging combination: 6.
The steps of the proposed BPNN-PSO method are as follows. a) Normalize the recorded fault data, since they have different magnitudes.
Because the identification parameters I_sc, V_oc, V_m and P_m have different magnitudes, the input matrix X is normalized by a linear transformation [36]:
$$z_{ij} = \frac{(y_{\max} - y_{\min})(x_{ij} - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}$$
where x_ij is an element of the initial input matrix X, z_ij is the corresponding element of the normalized input matrix, x_min and x_max are the minimum and maximum values of each row of X, and y_min and y_max are the minimum and maximum values of each row of the normalized matrix.
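The linear transformation of step a) can be sketched in Python as follows. It is applied row by row, so each identification parameter is scaled independently. The default target range [-1, 1] mirrors the default of MATLAB's `mapminmax`, but the range used in this paper is not stated, so it is an assumption. Mapping a constant row to y_min (to avoid division by zero) is also an assumption.

```python
def normalize_rows(X, y_min=-1.0, y_max=1.0):
    """Map every row of X linearly onto [y_min, y_max]:
    z_ij = (y_max - y_min) * (x_ij - x_min) / (x_max - x_min) + y_min
    """
    Z = []
    for row in X:
        x_min, x_max = min(row), max(row)
        if x_max == x_min:
            # Constant row: avoid division by zero (assumption: map it to y_min).
            Z.append([y_min] * len(row))
            continue
        scale = (y_max - y_min) / (x_max - x_min)
        Z.append([scale * (x - x_min) + y_min for x in row])
    return Z


print(normalize_rows([[0.0, 5.0, 10.0]]))  # [[-1.0, 0.0, 1.0]]
```

The same x_min and x_max found on the training data should be reused when normalizing test samples, so that both sets are mapped consistently.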
b) Apply the sigmoid function to the normalized fault data and use the optimal results as the particles in the PSO search space.
Using a suitable number of neurons and suitable activation functions for the BP-PSO neural network makes the training converge faster and take less time [37][38][39][40].
Fig. 5 The schematic diagram of the proposed BP-PSO fault diagnosis method in PV system
The sigmoid function used is
$$\operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}},$$
the linear activation function is
$$\operatorname{linear}(x) = x,$$
and the required number of hidden neurons is calculated by
$$n = \operatorname{floor}\left(\sqrt{n_i + n_0}\right) + a,$$
where floor(y) rounds down (e.g., floor(3.2) = 3), n is the number of neurons in the hidden layer, n_i is the number of neurons in the input layer, n_0 is the number of neurons in the output layer, and a is a constant. C_ij is the connection weight between neuron i of the hidden layer and neuron j of the input layer of the back propagation neural network, which uses the sigmoid function. W_ij is the connection weight between the hidden layer and the output layer, which uses the linear function. Experiments showed that increasing the number of neurons in the hidden layer increased the training accuracy of the back propagation neural network.
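The hidden-layer sizing rule can be evaluated with a few lines of Python. The sketch assumes the common empirical form n = floor(sqrt(n_i + n_0)) + a, reconstructed from the variables defined above, and lists candidate sizes for a range of values of the constant a. The four inputs correspond to I_sc, V_oc, V_m and P_m. The single output neuron and the range a = 1, ..., 10 are illustrative assumptions, since the output-layer size and the chosen a are not stated here.

```python
import math


def hidden_neurons(n_i, n_o, a):
    """Empirical rule n = floor(sqrt(n_i + n_o)) + a for the hidden-layer size."""
    return math.floor(math.sqrt(n_i + n_o)) + a


# Four inputs (I_sc, V_oc, V_m, P_m); one output neuron is an assumption.
candidates = [hidden_neurons(4, 1, a) for a in range(1, 11)]
print(candidates)  # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Since training accuracy was observed to rise with more hidden neurons, such a list gives a natural set of sizes to compare against training time.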
The output matrix of the output layer is obtained as follows:
$$\begin{aligned} Z_{out} &= \operatorname{sigmoid}(C\,Z_{in}) \\ P_{in} &= Z_{out} \\ P_{out} &= f_1(P_{in}) \\ L_{in} &= P_{out} \\ L_{out} &= \operatorname{linear}(W\,L_{in}) \end{aligned}$$
where Z_in is the normalized input matrix, Z_out is the output matrix after applying the sigmoid function, P_in is the input matrix of the particle swarm optimization layer, P_out is the output matrix after applying particle swarm optimization, f_1 is the processing method of particle swarm optimization, L_in is the input matrix of the output layer (after applying PSO), and L_out is the output matrix of the output layer, processed by the linear function.
c) Update the position and velocity of the particles, obtained after applying the sigmoid function, using the particle swarm optimization algorithm.
The particles of the PSO algorithm are initialized by taking the optimal training results as the initial particles in the search space. Each particle has an initial position x_i and an initial velocity v_i. The best local position is denoted p_i, and the best global position of the whole swarm is denoted p_g [41][42][43]. The velocity and position of the particles are updated using
$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \left(p_i^{t} - x_i^{t}\right) + c_2 r_2 \left(p_g^{t} - x_i^{t}\right)$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
where w ∈ (0, 1) is the inertia weight factor, t is the iteration number, c_1 and c_2 are the cognitive and social components, respectively, r_1 and r_2 are independent random numbers between 0 and 1, p_i^t is the best previous (local) position of the ith particle at iteration t, p_g^t is the best previous (global) position among all particles at iteration t, x_i^t and x_i^{t+1} are the current and next positions of the ith particle, and v_i^t and v_i^{t+1} are its current and next velocities.
d) Evaluate the fitness function of the particles in the swarm.
The fitness function is the mean square error of the neural network:
$$MSE_j = \frac{1}{N} \sum_{i=1}^{N} \left(y_{ij,\mathrm{des}} - y_{ij,\mathrm{out}}\right)^2$$
where j = 1, 2, ..., 6 indexes the fault states of the PV system, MSE_j is the mean square error for fault state j, y_ij,des is the desired output value, y_ij,out is the actual output value of the jth neuron for sample i, and N is the number of samples.
If a particle's new position is better than its current local best position, the local best position is updated. If it is also better than the global best position, the global best position is updated to the particle's new position.
e) Stop the algorithm when the maximum number of iterations is reached or a sufficiently small error is achieved.
If the maximum number of iterations is reached or the error is sufficiently small, stop the algorithm and output the results; otherwise, return to step c) and repeat until one of these conditions is met.
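Steps c) to e) amount to a global-best PSO loop with the mean square error as the fitness. The Python sketch below shows such a loop. It is a generic sketch, not the implementation used in this paper. The parameter values (w = 0.7, c_1 = c_2 = 1.5, 200 iterations), the zero initial velocities, updating the global best immediately after each particle, and the toy fitness (fitting a line to four points) are assumptions. In the paper, the initial positions come from the BP training results; `init_positions` stands in for them.

```python
import random


def mse(desired, actual):
    """Mean square error between desired and actual outputs (the fitness to minimize)."""
    return sum((d - y) ** 2 for d, y in zip(desired, actual)) / len(desired)


def pso_minimize(fitness, init_positions, w=0.7, c1=1.5, c2=1.5,
                 max_iter=200, tol=1e-8, seed=0):
    """Global-best PSO; returns (best position, best fitness, best-fitness history)."""
    rng = random.Random(seed)
    x = [list(p) for p in init_positions]
    v = [[0.0] * len(p) for p in x]  # zero initial velocities (assumption)
    p_best = [list(p) for p in x]
    p_fit = [fitness(p) for p in x]
    g = min(range(len(x)), key=lambda k: p_fit[k])
    g_best, g_fit = list(p_best[g]), p_fit[g]
    history = [g_fit]
    for _ in range(max_iter):
        if g_fit < tol:  # step e): stop once the error is small enough
            break
        for k in range(len(x)):
            for d in range(len(x[k])):
                r1, r2 = rng.random(), rng.random()
                # Step c): velocity update, then position update.
                v[k][d] = (w * v[k][d]
                           + c1 * r1 * (p_best[k][d] - x[k][d])
                           + c2 * r2 * (g_best[d] - x[k][d]))
                x[k][d] += v[k][d]
            # Step d): evaluate the fitness and update the local and global bests.
            f = fitness(x[k])
            if f < p_fit[k]:
                p_best[k], p_fit[k] = list(x[k]), f
                if f < g_fit:
                    g_best, g_fit = list(x[k]), f
        history.append(g_fit)
    return g_best, g_fit, history


# Toy fitness: fit y = w0 + w1 * x to points generated with w0 = 1, w1 = 2.
xs = [0.0, 1.0, 2.0, 3.0]
desired = [1.0 + 2.0 * x for x in xs]


def line_fitness(params):
    w0, w1 = params
    return mse(desired, [w0 + w1 * x for x in xs])


init_rng = random.Random(1)
init = [[init_rng.uniform(-5.0, 5.0), init_rng.uniform(-5.0, 5.0)] for _ in range(20)]
best, best_fit, hist = pso_minimize(line_fitness, init)
print(best, best_fit)
```

In the BP-PSO setting, `line_fitness` would be replaced by a function that loads a particle's position into the network and returns the MSE defined above.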

Computational complexity
The complexity of the proposed BP-PSO algorithm is an important consideration. If the hidden layer has M neurons, the BP-PSO algorithm requires approximately 5M + 3 multiplications and 5M + 2 additions. The number of iterations also increases the computational complexity: with P iterations, the complexity is multiplied by approximately P. For example, with M = 3 and P = 2, the algorithm requires only 2 × (5 × 3 + 3) = 36 multiplications and 2 × (5 × 3 + 2) = 34 additions.
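The operation counts above can be checked with a short Python helper. It simply encodes the approximate per-iteration costs stated in the text (5M + 3 multiplications and 5M + 2 additions) and scales them by P. The function name is hypothetical.

```python
def bp_pso_operation_count(m_hidden, p_iterations):
    """Approximate (multiplications, additions) for M hidden neurons and P iterations."""
    per_iter_mults = 5 * m_hidden + 3
    per_iter_adds = 5 * m_hidden + 2
    return p_iterations * per_iter_mults, p_iterations * per_iter_adds


print(bp_pso_operation_count(3, 2))  # (36, 34)
```

Both counts grow linearly in M and in P, so doubling either the hidden-layer size or the iteration budget roughly doubles the arithmetic cost.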

Data analysis
The reliability of the PV array model is verified by building it in MATLAB/Simulink and simulating it under standard conditions. The dataset used is from [30]; it is taken from the Sunpower SPR-X20-250-BLK module, whose parameters are obtained from the National Renewable Energy Laboratory (NREL). The BP-PSO neural network uses 300 sets of data samples with irradiance ∈ [100 W/m², 2000 W/m²] and temperature ∈ [273.15 … Table 4.

Results and discussion
The collected data were applied to both the BPNN and the BPNN-PSO in MATLAB. Fig. 6 shows the mean square error over the 240 training samples for fault diagnosis of the PV system.
The solid blue line in Fig. 6 is the mean square error during training, and the dotted line is the target mean square error. The BP neural network converges after 350 steps, whereas the BPNN-PSO converges after only 250 steps. The mean square error of the BPNN-PSO is also lower than that of the BP neural network. This indicates that the BPNN-PSO gives faster convergence, a more efficient training process and higher fault diagnosis accuracy. However, the method's complexity shows in the time the algorithm takes to reach its results: although this time is not very long in theory, identifying the fault types will take longer in practice.
Fig. 7 shows the error histogram with 20 bins for the BPNN-PSO fault diagnosis of the PV system. The bins are the vertical bars of the graph, and the height of each bar is the number of samples from the data set whose error falls in that bin. The histogram shows the error between the target and predicted values just after training the neural network. The zero-error line marks the zero value on the error axis; here it falls in the bin centered at -0.00292.
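The construction of such an error histogram can be sketched in Python. This sketch assumes that errors are defined as target minus output and that the 20 bins span the observed error range with equal widths. Both are assumptions about the plotting tool, not details given in this paper.

```python
def error_histogram(targets, outputs, bins=20):
    """Return (bin centers, counts) for errors = targets - outputs in equal-width bins."""
    errors = [t - y for t, y in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for e in errors:
        # The largest error would land one past the end, so clamp it into the last bin.
        k = min(int((e - lo) / width), bins - 1)
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    return centers, counts


centers, counts = error_histogram([0.0, 0.0, 0.0, 0.0], [0.0, 0.5, 1.0, -1.0], bins=4)
print(centers, counts)  # [-0.75, -0.25, 0.25, 0.75] [1, 1, 1, 1]
```

Because the bin edges follow the minimum and maximum errors, zero error generally falls inside a bin rather than on a bin center. This is why the zero-error line in Fig. 7 sits in a bin centered slightly away from zero.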
Ten test samples were selected to compare the fault-type prediction ability of BP and BPNN-PSO; the results are shown in Table 5. To emphasize this comparison and show that BPNN-PSO performs better than BP, Fig. 8 compares the performance of both algorithms. The last two columns of Table 5 show whether each prediction is correct (√) or wrong (×) for both algorithms. One prediction among the samples was wrong: sample 7 was predicted as fault index 4 instead of 5. Such errors are common in deep learning, since the learning process cannot give 100% accuracy all the time, and they should be taken into account during system design.
Comparing the accuracy achieved in this work with previous work, the prediction accuracy for faults in the PV array system described here is 95%, while that of "Random forest based intelligent fault diagnosis for PV arrays using array voltage and string currents" by Zhicong Chen et al. [9] is 85%. The prediction accuracy in "Assessment of machine learning and ensemble methods for fault diagnosis of photovoltaic systems" by Adel Mellit et al. [44] is 81.73%, and that in "Cost-effective fault diagnosis of nearby photovoltaic systems using graph neural networks" by Jonas Van Gompel et al. [45] is 87.5%.
Compared with all of the above works, the proposed BPNN-PSO algorithm shows higher classification performance. This shows the superiority of BPNN-PSO over the random forest (RF) method proposed in [9], the ensemble learning (EL) method proposed in [44] and the graph neural networks (GNN) in [45]. Overall performance is typically measured by the success rate, defined as the ratio of correctly classified instances to all instances. Another such measure is the F-score. The weighted F-score, used here as a reference, is defined as the average of the F-scores of the individual classes (F-inspect, F-monitor and F-running). The F-score of a class is calculated by
$$F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},$$
where precision is
$$\mathrm{precision} = \frac{A}{A + C},$$
A is the number of correctly classified instances and C is the number of instances incorrectly classified into the class, while recall is calculated by
$$\mathrm{recall} = \frac{A}{A + B},$$
where B is the number of instances of the class that were not classified into it. The F-score of the presented BP-PSO neural network was found to be 0.973, indicating high performance in classifying fault occurrences.
Other metrics that can measure the performance of the proposed BP-PSO algorithm are precision and recall. Precision is the ratio of correctly predicted positive observations to all observations predicted as positive. Recall is the ratio of correctly predicted positive observations to all observations in the actual class.
The precision and recall of the proposed BP-PSO neural network were found to be 90.75% and 88.56%, respectively, indicating high fault diagnosis accuracy.
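The metrics above can be computed from true and predicted fault labels as in the Python sketch below. A, B and C follow the definitions in the text (correctly classified, missed, and wrongly assigned instances of a class). Weighting each class F-score by the number of true instances of that class is an assumption, since the text defines the weighted F-score only as an average. The example labels are made up for illustration.

```python
def class_counts(true_labels, predicted, cls):
    """A: correctly classified, B: missed (true cls predicted otherwise), C: wrongly assigned to cls."""
    pairs = list(zip(true_labels, predicted))
    a = sum(1 for t, p in pairs if t == cls and p == cls)
    b = sum(1 for t, p in pairs if t == cls and p != cls)
    c = sum(1 for t, p in pairs if t != cls and p == cls)
    return a, b, c


def precision_recall_f(a, b, c):
    """Precision A/(A+C), recall A/(A+B) and their harmonic mean (the F-score)."""
    precision = a / (a + c) if a + c else 0.0
    recall = a / (a + b) if a + b else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def summary(true_labels, predicted):
    """Success rate (accuracy) and support-weighted F-score over all true classes."""
    accuracy = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
    weighted_f = 0.0
    for cls in set(true_labels):
        a, b, c = class_counts(true_labels, predicted, cls)
        support = a + b  # number of true instances of this class
        weighted_f += support * precision_recall_f(a, b, c)[2]
    return accuracy, weighted_f / len(true_labels)


true_idx = [1, 1, 2, 2, 3]
pred_idx = [1, 2, 2, 2, 3]
print(summary(true_idx, pred_idx))  # accuracy 0.8, weighted F-score about 0.787
```

Replacing the support weights with equal weights gives the unweighted (macro) average, the other common reading of "average of all F-scores".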
The applied algorithm shows outstanding results for fault detection. This has practical implications for PV systems, as fault detection greatly improves the efficiency, reliability and safety of the whole system. If these faults are not detected, the resulting power loss from the PV module incurs high costs. The staff responsible for

Conclusion
The parameters I_sc, V_oc, V_m and P_m were chosen as the identification parameters for the system's fault diagnosis after analyzing the PV output. The proposed BP-PSO neural network algorithm was applied to predict the type of fault that occurs in the PV system. These types include temperature faults, cell aging, partial shading faults, combined temperature and partial shading faults, and combined temperature and cell aging faults. The simulation results show that the proposed algorithm significantly improves convergence and achieves higher prediction accuracy for the fault type. In the training phase, the back propagation algorithm converges after 350 steps, while the proposed BP-PSO algorithm converges after only 250 steps. The prediction accuracy of the BP algorithm is about 87.8%, while the proposed BP-PSO algorithm achieves 95% correct predictions. The algorithm can intelligently predict the fault type in real time without additional hardware. Applying it to various PV systems could have a large impact: fault detection with this accuracy increases the lifetime, reliability and safe operation of the system. Although many methods have been introduced previously, this method offers high accuracy, good classification and quick detection. Notably, this fault detection capability could make PV system maintenance easier, especially for large-scale systems, since no effort or time is wasted in determining the fault type. Consequently, the technique can help address the sudden power reductions that occur due to unexpected failures. These results may encourage energy and power societies to recommend wider use of AI techniques for fault classification and detection.
This could substantially increase power production in energy systems by avoiding failures, and it makes maintenance much easier. Governments should also support this goal by increasing investment in developing monitoring techniques, so that data can be gathered quickly and with high accuracy. The PV solar industry, specifically large-scale PV systems, would benefit from the proposed algorithm. The promising fault detection solution achieved here could yield much better optimization of cost, time and maintenance effort. A limitation is that the prediction was not correct in some cases. This work can be applied to other PV systems to test its performance. In the future, the PSO algorithm could be modified so that c_1 and c_2 are not constant but vary according to a certain equation, which could change the optimum results and may increase the prediction accuracy. The algorithm could also be applied in practice as future work.
Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). The funding for this study was a personal fund only, without any other outside help.
Availability of data and materials All data used in the paper are cited in the references.

Conflict of interests
The authors declare that they have no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.