Introduction

Testing is essential for every software project, whether simple or complex. Every software product must pass through the software development life cycle to achieve a successful outcome and straightforward maintenance, and much of that time is spent identifying and fixing bugs and errors. There are numerous types of testing, including functionality testing, system testing, unit testing, and integration testing. Such testing contributes to the enhancement of the proposed model, ensuring that no crashes or problems are encountered during its execution.

Ad-hoc testing combined with test-driven development can also be very beneficial in reducing the number of defects [1]. Data classification is one of the most common tasks in machine learning, and machine learning is a key technique for mining information from massive databases of enterprise operational records. Machine learning in medical health care is a rapidly growing field with enormous potential for prediction and for a deeper understanding of medical data. Most machine-learning methods rely on a set of features that define the learning algorithm’s behaviour and, in turn, influence the performance and complexity of the resulting models. Over the past decade, heart disease has been the leading cause of death worldwide. Several machine-learning techniques have previously been applied to the diagnosis of heart disease; neural networks and logistic regression are among the techniques that have shown some success [2].

For the application, we accessed the UCI machine-learning repository of datasets. We consider patient data from a Cleveland hospital containing approximately 400 records, with the patients’ names removed to protect their privacy. The application under consideration falls under the umbrella of machine learning, which numerous researchers, businesses, and organizations use to build systems that improve without being explicitly programmed [3, 4]. This approach relies on patterns and inference rather than explicit instructions to complete a task. There are three primary types of learning: supervised, unsupervised, and reinforcement.

In preprocessing, supervised learning considers labelled data, whereas unsupervised learning uses unlabelled data. For application modelling, we use an artificial neural network with a multilayer perceptron. The multilayer perceptron is a very useful approach for making the model learn; for the given dataset of around 400 patient records, we used ReLU as the activation function during forward propagation [5]. By comparing this approach with various other approaches, we observed an improvement in the accuracy of the results.

We discovered that the model’s accuracy improved once the developer rectified the defects found during testing, and that backpropagation in the multilayer perceptron, using the Adam optimizer, improved the weights.

We tested different epoch values until the difference in accuracy stabilized. The results show increased accuracy, precision, recall, and sensitivity. Only by testing with various test scenarios can the problems that will be encountered be analyzed and understood, which is a challenge in itself [6].

The remainder of the paper is organized as follows: the literature review discusses the work proposed in this research field; the methodology presents the proposed model and the procedure followed; the results and comparison are then reported; and finally the conclusion and future enhancements are given.

Literature Review

Models for the heart disease dataset have undergone extensive research and development over time. Most studies investigate several algorithms for the diagnosis of heart disease, including neural networks, nearest neighbours, Naive Bayes, and logistic regression, as well as hybrid strategies combining these algorithms [7]. The work in [8] presents a learning-driven approach that analyzes the dataset and learns the patterns needed to identify the results; logistic regression was applied first, achieving 77% accuracy. Other authors [9, 10] applied Naive Bayes to the dataset and reached a classification accuracy of 81.48%, a good improvement. Another study compared different data-mining techniques for diagnosing heart disease patients, including Naive Bayes, decision trees, and, for the first time on this dataset, a neural network; the results showed that Naive Bayes achieved the best precision in diagnosing heart disease patients [11].

K-means clustering [12] is one of the most well-known clustering techniques; however, initial centroid selection is a critical issue that strongly affects its results. That paper investigates applying various methods for initial centroid selection, such as range, inlier, outlier, random attribute, and random row strategies, to k-means clustering for the diagnosis of heart disease patients. Indira [13] used a probabilistic artificial neural network, a class of radial basis function (RBF) networks useful for automatic pattern recognition, non-linear mapping, and estimation of class-membership probabilities and likelihood ratios. The data used in the experiments were taken from the Cleveland heart disease database, with a total of 576 records and 13 medical attributes; the best accuracy achieved was 94.60%. K-nearest neighbours is one of the most widely used data-mining techniques for classification problems [14, 15]. Its simplicity and relatively fast convergence make it a popular choice. However, a main drawback of KNN classifiers is the large memory requirement needed to store the whole sample; when the sample is large, response time on a sequential computer is also large [16].

Software testing is a fundamental activity of software engineering: the software is executed in order to detect mistakes or bugs. The testing methods and software testing strategies are discussed in detail in this body of work. Designing test cases is an important phase of the testing process, and numerous test methods have to be developed for evaluation, since not all the bugs in the software can be found.

Himanshi Babbar [17] discussed the significance of software testing in the software development life cycle, where roughly 60% of resources are used for testing, which may be manual or automated. Software testing is an activity that evaluates the program’s capability and verifies that it genuinely achieves quality outcomes. Testing is broadly divided into three levels: unit testing, integration testing, and system testing, and it is used to produce bug-free software. The most common test cases and testing technologies for error detection are defined there, and many test cases assist in the detection of bugs.

Isis Cabral et al. [18] applied software product line (SPL) engineering to improve the effectiveness and quality of the software produced. With SPL engineering, engineers can methodically create families of products with clearly defined and managed reusable asset sets. The results of case studies on two software product lines are documented; with both SPLs, the same error-detection outcomes are achieved as when all products are tested. Further analysis shows that the grouped version of the basis path algorithm can also be used for testing subfamilies of the SPL identified by alternatives. In this technique, the feature model is used to control the selection of test cases, and the required test effort can be decreased by using a graph-based selection algorithm while retaining the detection capability. Using the FIG basis path method, errors are identified while testing only 6% to 24% of the SPL products and, in the best case, only 10% of the test cases. The covering array method was the most efficient non-graph-based technique, requiring between 13 and 54% of the products to be tested under the same schemes. This method of failure identification can be economically adaptable when the subject has only optional features.

Naresh et al. [19] argued that, in addition to an elevated level of process maturity, implementing a defect prevention approach is also a most valuable investment. Detecting mistakes early in the development life cycle prevents them from propagating from design into coding. Analyses conducted across three businesses demonstrate the significance of using defect avoidance methods to deliver a high-quality product. Quality-cost investment should focus on the correct defect prevention (DP) activities rather than on the rework caused by uncaught failures. There are several defect avoidance methods, techniques, and procedures, and software inspection has proven to be the most effective defect detection and avoidance method. The objective of consistently achieving 99% defect-free software relies heavily on efficient methods for defect avoidance.

Antonia Bertolino [20] observed that there are many fruitful connections between software testing and other fields of study. By addressing only particular issues of software testing, numerous interesting possibilities at the boundary between testing and other disciplines are ignored; some are discussed there, for example model-checking methods, the use of search-based methods to generate test inputs, and the use of test methods to evaluate efficiency characteristics. A more holistic approach to testing research by the software community could uncover numerous fresh and exciting synergies across the software engineering research fields.

Rasneet Kaur Chauhan et al. [21] emphasized software testing strategies, methodologies, principles, and tools. A software testing strategy integrates different test case design techniques in a well-scheduled sequence of steps leading to effective software testing; therefore, strategies for software testing are essential. The strategy is usually created by experts, project managers, and software engineers, and it involves four methodologies and some rules to be followed accurately. Several software testing tools are available on the market; some have been in use for a very long period, and new tools with many new features have also been created [22]. Every software project focuses primarily on quality, and software testing techniques are methods of quality measurement. Software testing research is the driving force of growth and application; it is also essential to constantly synthesize new accomplishments and features and to suggest ideas that support research on software testing in systems engineering and facilitate rapid developments in the field [23].

Shah et al. [24] gathered various surveys on testing that can be useful before starting the testing process and for overcoming the issues faced. Zain Amin and Ali [25] applied a multilayer perceptron to a large-scale healthcare dataset of daily-recorded information, which can help save lives; their study uses an MLP with backpropagation to decide whether a caesarean section is required for an expectant mother. Some of the main caesarean emergencies are an immediate threat to the child or the mother, or cases with no maternal threat but where early delivery is required. The authors considered a dataset of 80 pregnant patients, and the case study achieved 95% accuracy, which could be improved by enlarging the dataset and including more significant attributes.

Proposed Methodology

In the proposed model, we use a neural-network approach, a branch of artificial intelligence that is widely used to make models behave more intelligently; pattern recognition, optimization, associative memory, and prediction can all be performed with such algorithms. The dataset needs to be a supervised, labelled set with features that support analysis and learning in the hidden layers of the network. The overall flow diagram of the proposed model is shown in Fig. 1.

Fig. 1 Overview of the proposed model

“Shallow learning architecture” describes machine-learning classifiers that typically have just one layer of non-linear feature transformation. Support vector machines, logistic regression, decision trees, XGBoost, and similar systems are examples of shallow structured architectures. Although they have demonstrated efficient performance in straightforward or well-constrained scenarios, their limited modelling and representation capabilities make it difficult for them to handle vast training data sets. Due to their restricted capacity for learning, machine-learning models with shallow structures perform poorly as the training data grow increasingly large. An alternative that addresses this is the artificial neural network (ANN); because of their effective learning capacity, ANN-based prediction models are proposed in this work.

An artificial neural network is a computational model that works like a human brain: it consists of artificially structured neurons that provide a gateway for data to be transferred. With the help of many simple, highly connected elements, such a computational system can produce linear and non-linear responses; these elements process the information to produce a dynamic state response based on external inputs.

Figure 2 shows the different hidden layers along with the input and output layers; this is a general representation of an artificial neural network. The type of neural network used for our model is a multilayer perceptron. A multilayer perceptron can learn non-linear functions and has more than one hidden layer, so the approximation accuracy can be increased by adjusting the number of neurons in each layer. At each hidden-layer neuron, a computation takes place: the node receives the inputs from the previous layer along with associated weights, which are initially assigned randomly and reflect the relative importance of each input. At the node, a function f is applied; we used ReLU (rectified linear unit) as the activation function, which takes a real-valued input and thresholds it at zero, replacing negative values with zero. The weighted sum computed at a neuron is

$$\text{Neuron} = (\text{weight}_1 \cdot x_1 + \text{weight}_2 \cdot x_2 + \cdots + \text{bias}).$$
(1)
Fig. 2 Neural network general structure

The expression above computes the weighted sum at a neuron: the numerical inputs x1, x2, and so on are multiplied by the weights associated with each connection and summed, the activation function is applied to this sum, and the result serves as input to the next hidden layer, if present, or to the output layer. The bias adds a trainable constant to the input values received by the neuron:

$$f(x) = \max(0, x).$$
(2)

The ReLU equation above is used as the activation function in our model; its general graphical representation is shown in Fig. 3. ReLU is generally less expensive to compute than tanh or sigmoid because it involves simpler operations.

Fig. 3 ReLU graphical representation
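As a minimal sketch of Eqs. (1) and (2), the computation at a single hidden-layer neuron can be expressed in NumPy as follows; the input, weight, and bias values are illustrative placeholders rather than values taken from the trained model.

```python
import numpy as np

def relu(x):
    # Eq. (2): rectified linear unit, negative values are replaced by zero
    return np.maximum(0.0, x)

def neuron_output(inputs, weights, bias):
    # Eq. (1): weighted sum of the inputs plus the bias, then the activation
    z = np.dot(weights, inputs) + bias
    return relu(z)

# Illustrative values only
x = np.array([0.5, -1.2, 3.0])   # inputs x1, x2, x3
w = np.array([0.8, 0.1, -0.4])   # randomly initialised weights
b = 0.2                          # trainable bias
print(neuron_output(x, w, b))    # prints 0.0, since the weighted sum is negative
```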

The dataset is divided into training and testing sets; the training predictions are compared with the actual results, and the error between the predicted and actual values is used to update the model so that the error is reduced and the model learns. We use the Adam optimizer for backward propagation. The ANN pipeline involves a correlation-based feature selection approach, which measures the correlation between each feature and the target variable; features with a high correlation with the target are considered more important and are selected for training the ANN. This approach helps identify the most relevant features in the proposed work.
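The split and the correlation filter could be implemented as in the following sketch; the file name cleveland_heart.csv, the column name target, the 0.1 correlation threshold, and the 80/20 split are assumptions made for illustration, as the exact settings are not specified here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the Cleveland data from the UCI
# repository have 13 clinical attributes plus a binary "target" label.
df = pd.read_csv("cleveland_heart.csv")

# Correlation-based feature selection: keep attributes whose absolute
# correlation with the target exceeds a chosen threshold (0.1 is arbitrary).
correlations = df.corr()["target"].drop("target")
selected = correlations[correlations.abs() > 0.1].index.tolist()

X = df[selected]
y = df["target"]

# Hold out a test set; the 80/20 split and the fixed seed are assumed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```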

The proposed framework calculates the output error and propagates it back through the model using backpropagation to compute the gradients used to update the model parameters. To achieve this, we use the Adam optimizer, which is similar to classical stochastic gradient descent but adapts iteratively to the dataset, generally giving faster and better results. Ordinary gradient descent maintains a single learning rate for all weights, and this rate does not change during training; in contrast, Adam maintains a learning rate for each model weight and adapts it individually as learning unfolds.
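A minimal Keras sketch of such an MLP with ReLU hidden layers and the Adam optimizer is shown below; the number of hidden layers, the unit counts, and the learning rate are assumed values, since the exact architecture is not reported, and X_train refers to the training features from the previous sketch.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal MLP sketch: ReLU hidden layers, sigmoid output for the binary
# disease / no-disease decision. Layer sizes (16 and 8 units) are assumed.
model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Adam keeps a per-parameter learning rate, unlike plain gradient descent,
# which uses a single fixed rate for all weights.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```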

Figure 4 shows the complete model implementation of the artificial neural network, including forward propagation and backward propagation.

Fig. 4 Multilayer perceptron model

This analysis guided the training of the model, which was run for various numbers of epochs to obtain better accuracy. Training started with 500 epochs, and by increasing the epoch count in increments of 100 we observed that the accuracy increased.
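A possible form of this procedure, retraining the network at increasing epoch counts and recording the test accuracy, is sketched below; the upper limit of 1000 epochs and the batch size of 16 are assumptions, and model, X_train, X_test, y_train, and y_test come from the earlier sketches.

```python
from tensorflow import keras

# Retrain from scratch at increasing epoch counts (500, 600, ..., 1000)
# and record the test accuracy for each run.
for epochs in range(500, 1100, 100):
    run = keras.models.clone_model(model)   # same architecture, fresh weights
    run.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
    run.fit(X_train, y_train, epochs=epochs, batch_size=16, verbose=0)
    _, acc = run.evaluate(X_test, y_test, verbose=0)
    print(f"epochs={epochs}  test accuracy={acc:.4f}")
```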

For the testing part, functionality testing of the application was performed; the testing phase revealed a few errors and bugs, and once these were rectified the results improved, helping to produce better predictions. For predicting a new patient, a GUI is used that indicates whether the patient is likely to have heart disease.
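As a hedged illustration of functionality testing around the prediction routine, the pytest sketch below checks a hypothetical predict_patient wrapper; the function name, the stand-in model fixture, and the sample record are assumptions introduced only for this example.

```python
import numpy as np
import pytest
from tensorflow import keras
from tensorflow.keras import layers

N_FEATURES = 13  # number of clinical attributes in the Cleveland dataset

def predict_patient(model, features):
    """Return 1 (disease likely) or 0 (disease unlikely) for one record."""
    features = np.asarray(features, dtype=float).reshape(1, -1)
    if features.shape[1] != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} attributes, got {features.shape[1]}")
    return int(model.predict(features, verbose=0)[0, 0] >= 0.5)

@pytest.fixture
def model():
    # In the real application this would load the trained model; an
    # untrained stand-in with the same interface is enough for these checks.
    return keras.Sequential([
        layers.Input(shape=(N_FEATURES,)),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])

def test_prediction_is_binary(model):
    # Any valid 13-attribute record should map to a 0/1 decision.
    sample = [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]
    assert predict_patient(model, sample) in (0, 1)

def test_rejects_wrong_feature_count(model):
    # A record with missing attributes should raise, not silently predict.
    with pytest.raises(ValueError):
        predict_patient(model, [63, 1, 3])
```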

Results

The experimental findings for the ANN algorithm employed in this work are reported in this section. The ANN algorithm’s effectiveness was evaluated using classification datasets and benchmark functions, which make up the two parts of this section. Experiments were executed on a system with the following specifications: Intel i5-3470 at 3.20 GHz, Windows 10, and 6 GB of memory.

With the proposed model, accuracy increased: compared to other supervised learning algorithms, the ANN learns the model and predicts the results more accurately. The learning takes some time and yields measurements such as sensitivity, recall, precision, and accuracy, which help in understanding the success of the model.

Figure 5 shows the confusion matrix for the dataset, reporting the true positives, true negatives, false positives, and false negatives:

$$\begin{aligned}
\text{Sensitivity} &= \frac{\text{no. of cases correctly predicted as positive}}{\text{total no. of actual positive cases}} \times 100 = \frac{\text{TP}}{\text{TP} + \text{FN}}, \\
\text{Precision} &= \frac{\text{TP}}{\text{TP} + \text{FP}}, \\
\text{Recall} &= \frac{\text{TP}}{\text{TP} + \text{FN}}, \\
\text{F1-score} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},
\end{aligned}$$
where TP, FP, and FN denote true positives, false positives, and false negatives, respectively.
Fig. 5 Confusion matrix
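A sketch of how these measurements can be computed from the confusion matrix with scikit-learn follows; model, X_test, and y_test are assumed to come from the earlier sketches, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import confusion_matrix

# Hard 0/1 predictions from the sigmoid outputs (0.5 threshold assumed).
y_pred = (model.predict(X_test, verbose=0).ravel() >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

sensitivity = tp / (tp + fn)            # identical to recall
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"sensitivity={sensitivity:.3f} precision={precision:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
```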

Table 1 shows the results obtained for the various measurements; the overall average for the model after functional testing and rectifying the errors is 96.69%. Figure 6 shows the area under the curve, which measures how well the prediction discriminates, that is, how well the test separates the group being tested into those with and without heart disease.

Table 1 Results obtained from various measurements
Fig. 6 The area under the curve
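The area under the curve can be computed and plotted as in the following sketch, again assuming the model and held-out test data from the earlier sketches.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# The ROC curve needs probability scores rather than hard 0/1 labels.
y_score = model.predict(X_test, verbose=0).ravel()
auc = roc_auc_score(y_test, y_score)

fpr, tpr, _ = roc_curve(y_test, y_score)
plt.plot(fpr, tpr, label=f"ANN (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```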

Finally, Fig. 7 shows an example of the GUI used to predict whether a new patient has heart disease.

Fig. 7 Application showing the result

Conclusion and Future Work

This work contributes to a better understanding of the fundamentals of learning with artificial neural networks by applying them to labelled datasets. After functionality testing of the application, its accuracy reaches 96.69%, a higher percentage than that obtained by the compared approaches. From a future perspective, researchers can improve the results by using additional machine-learning algorithms or by undertaking further end-to-end testing, such as system testing, which can uncover integration issues or flaws. In addition, enlarging the dataset with heart disease information collected from hospitals can produce a steadier outcome and deliver a more accurate prediction.