1 Introduction

Internet of Things (IoT) technology has attracted the attention of many researchers in recent years and is used in various fields. In the IoT, every object is intelligent, can connect to the network, and can exchange data with the other objects in it. Using their sensors, devices collect environmental information and send it to a base station. Smart devices have two types of sensors: communication sensors and data-collection sensors. Through these sensors, devices communicate with their surroundings. IoT devices collect data and transmit it to other devices through the network layer [18].

Today, the Internet of Things is used in various applications such as smart homes [10] and smart cities [23]. With this communication technology, the different infrastructures of a smart city, such as transportation, health, and industry, are connected. In most cases, heterogeneous devices in the Internet of Things are interconnected and managed by application programs.

The smart city has many intelligent devices, so a large amount of data, that is, big data, is generated [31]. Big data in smart cities needs to be processed in distributed computing systems so that it can be analyzed in real time. Distributed computing systems such as cloud computing [16] are used to process smart-city data. In recent years, a new technology for big data processing based on edge computing has emerged. In edge computing, the capabilities of the sensor layer and of the data-generating objects are used for processing; this model is called fog computing [1]. Together with the cloud computing layer, fog computing provides a high capacity for processing big data in the smart city. Big data is not the only challenge for smart cities, which also face vital challenges in data security. A great deal of data is transmitted in smart cities today, and it needs to be protected so that its confidentiality is not compromised. One of the vital security challenges is medical data security: the data exchanged in the smart city includes health-related data, which must be transmitted confidentially over the network and its infrastructure.

The security of medical data and its transmission in the smart city is an important issue because, if this data is intercepted along the transmission path, information will be leaked. During transmission, medical data may lose its confidentiality and may also be manipulated. Manipulation of medical data in smart cities disrupts treatment and, in some cases, can even cost patients their lives. For these reasons, encryption of medical data is vital in smart cities. One of the new treatment techniques in smart cities is sending patient records over the Internet of Things and blockchain to other medical centers [11]. Each medical center can then give its doctors’ opinions after analyzing the patients’ files. These opinions are a vital factor in improving the diagnosis and treatment of patients.

The data and information sent between medical centers need to be analyzed. Data analysis is usually performed with data mining and machine learning methods. For example, in [24], new methods for diagnosing COVID-19 with deep learning are presented. In [19], artificial neural network-based analysis was used to diagnose heart disease. In [15], the support vector machine is used to diagnose diabetes so that patients can be distinguished from healthy individuals. In [20], the random forest was used to diagnose lung cancer. In [32], a fuzzy algorithm is used to diagnose colorectal cancer. Analysis and evaluation with data mining methods reveal the latent knowledge in raw data for diagnosing disease.

A challenge in diagnosing disease is that test results should be made available only to medical centers so that other people cannot see the patient’s private information. The purpose of the method proposed in this article is to maintain the privacy of patients’ data using blockchain technology and to analyze the content of patients’ files with data mining and machine learning methods.

Protecting patients’ confidential information is one of the priorities of specialized hospitals and clinics. Patients do not want the nature of their illness to be revealed to others. In many cases, public figures such as politicians strictly protect their personal information and information related to their illness. In some cases, disclosing the illnesses of people who are active in business can cause their company’s stock to fluctuate. In addition to these cases, the most important motivation of the research is the preservation of human values and the non-disclosure of patients’ information. Another motivation is the strong security that blockchain offers. A further motivation is to use consultation among medical centers, decided by majority vote, to suggest appropriate treatment for the patient.

The authors’ contribution is a two-tier architecture that analyzes patient records and provides treatment recommendations with data mining methods in one layer. The contents of patients’ records are distributed among the medical centers using blockchain technology. Patient information exchanged between medical centers (for consultation on treatment) is kept private by passing it through the blockchain. The key issues and our contributions in this study are as follows:

  • Maintaining the confidentiality of information and records of patients with blockchain.

  • Presenting a binary version of the HHO algorithm for feature selection.

  • Machine learning with a feature selection mechanism in each node participating in the blockchain to detect the pattern of heart disease.

  • Using majority-vote learning across the participating nodes of the blockchain to determine a better treatment.

The importance of the proposed method lies in its use of blockchain to maintain the confidentiality of patients’ data and information. In the next layer, the content of patients’ information and records is analyzed by data mining and machine learning methods to discover the hidden patterns in the records. The novelty of this article is the optimal analysis of patients’ file information with modern swarm intelligence methods such as the Harris Hawks Optimization (HHO) algorithm [12], in combination with machine learning. Another innovation is a secure framework on the blockchain platform for the confidential sending of patient information. This article is organized in several sections. The first section introduced the paper. The second section examines research-related issues such as the blockchain and related studies. The third section presents the proposed method for maintaining confidentiality and analyzing patient data.

2 Related work

This section reviews work related to the research topic. It first introduces blockchain technology and then provides an overview of studies on blockchain, health, and data privacy.

2.1 Centralized and decentralized authentication

In decentralized authentication, a copy of the information is sent to all nodes on the network. In centralized authentication, only one system on the network authenticates users and information. In the distributed mode, all systems participate in encryption and authentication. Because all participating systems in a distributed scheme are queried simultaneously, this type of scheme is more robust and more stable. Compromising a centralized method requires only that the central system be hacked, whereas in the decentralized, distributed type all participating systems need to be hacked, which is far harder in practice. Blockchain is a decentralized authentication technology. It contrasts with centralized technology, and this difference is shown in Fig. 1 [27].

Blockchain is a technology for encryption, authentication, and validation across several distributed nodes in a network. It distributes information and queries a group of nodes to validate data and information. Centralized and decentralized methods each have advantages and disadvantages. Centralized authentication stores the authentication data in only one system and does not consume much memory. In decentralized authentication such as blockchain, all nodes need to store the information, so memory consumption is high [27].

Fig. 1

The difference between centralized encryption (left image) and decentralized encryption (right image) [27]

2.2 Blockchain

Blockchain is a distributed database of information and data. A main feature of blockchain is its distributed nature and the way information is distributed. Each node has a copy of the information or data. If a system wants to modify this data, it must obtain the agreement of all nodes to the update. Blockchain technology runs on a single shared network in which each node can exchange data and information directly with other nodes. There is no central database or system that controls the exchange of information and data over the network. A change to the information is presented as a request, or transaction, which is sent to all network nodes. If all nodes receiving the transaction confirm it, then in the next step the data is added to the blockchain, that is, to every node’s copy of the information. The process of creating a blockchain for patient information and records is shown in Fig. 2 [26].

Fig. 2

Creating a chain of blocks in the blockchain [26]
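
The transaction flow described above, in which a change is broadcast and committed only if every receiving node confirms it, can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: the `Node` class, the `broadcast` function, and the validation rule (a transaction must carry a patient identifier and a payload) are hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One participant holding its own copy of the shared ledger."""
    name: str
    ledger: list = field(default_factory=list)

    def approves(self, transaction: dict) -> bool:
        # Hypothetical validation rule (an assumption of this sketch):
        # a transaction needs a patient id and a payload.
        return bool(transaction.get("patient_id")) and "payload" in transaction


def broadcast(nodes: list, transaction: dict) -> bool:
    """Append the transaction to every copy only if every node approves it."""
    if not all(node.approves(transaction) for node in nodes):
        return False
    for node in nodes:
        node.ledger.append(transaction)
    return True


nodes = [Node("center_a"), Node("center_b"), Node("center_c")]
accepted = broadcast(nodes, {"patient_id": "P-001", "payload": "record update"})
rejected = broadcast(nodes, {"payload": "missing patient id"})
```

After the two calls, each node’s ledger holds only the first transaction, because the second one fails the validation rule on every node.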

Each block in the blockchain authentication network has different sections and components. Structural information related to transactions is placed inside the block. A timestamp is defined for mining new blocks and validating them before they are added to the blockchain. Encrypted information, or hashes, are associated with each block address in the blockchain. In blockchain technology, the data is stored inside the blocks, and each block has a section for the hash.

The encrypted part is called the hash code. The hash code is typically generated by a cryptographic hash function, which is a one-way function and is distinct from asymmetric encryption. The blocks of the blockchain are connected to one another by these hashes. Blocks are created by a search process known as mining: a block is mined and then connected to the blockchain. In general, each blockchain method has a fixed target time for mining a block, and this time differs among the digital currencies that use blockchain. How the blocks are connected to the blockchain is a vital issue. Each block contains the data to be hashed. Each block also has a section that holds the address of the block to which it is connected; this address is the hash code of the previous block in the blockchain. A blockchain and the way its blocks are connected are shown in Fig. 3:

Fig. 3

Connecting the blocks of a blockchain with hashes

Hashes link the blocks into a chain. Each block indicates its position in the chain by storing the hash of the preceding block. In some cases, the blockchain branches; this happens when two blocks are created within a short interval of each other. Branching chains are reviewed once their length grows beyond a threshold value. Each transaction in the blockchain generates a 64-character hash code. This code is combined with the previous hash code to create a new block, which is added to the chain. A new block is always attached to the prior block to form a link in the chain. In Fig. 3, block_1 is the genesis block, and its hash acts as a key for computing the hash of block_2. Similarly, the hash of block_2 acts as input to block_3, and so on [26].
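
The linking described above can be illustrated with a short sketch. It assumes SHA-256, whose hexadecimal digest is 64 characters long; the field names, the all-zero placeholder used as the predecessor of the genesis block, and the helper functions `block_hash`, `build_chain`, and `is_valid` are illustrative assumptions, not details taken from [26].

```python
import hashlib
import json


def block_hash(index: int, data: str, previous_hash: str) -> str:
    """SHA-256 over the block's fields; the hex digest has 64 characters."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: list) -> list:
    """Link blocks so that each one stores the hash of its predecessor."""
    chain = []
    previous_hash = "0" * 64  # assumed placeholder predecessor of the genesis block
    for index, data in enumerate(records, start=1):
        current = block_hash(index, data, previous_hash)
        chain.append({"index": index, "data": data,
                      "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain


def is_valid(chain: list) -> bool:
    """Recompute every hash and check every link back to the previous block."""
    previous_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["previous_hash"])
        if block["previous_hash"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain(["block_1 data", "block_2 data", "block_3 data"])
```

Because each hash depends on the previous one, changing the data of any block makes `is_valid` fail for that block, and recomputing its hash would then break the link from its successor.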

2.3 Review

Blockchain was introduced as a mechanism for securing Bitcoin and has since evolved into a core technology for several decentralized applications. It is a useful technology for managing sensitive data, especially in the healthcare, medical research, and insurance sectors. Health care involves several main components: research, insurance, treatment, medical and health service stakeholders, and patients. In the field of health care, privacy and security breaches are increasing every year. More than 300 violations were reported in 2017, and 37 million medical records were affected between 2010 and 2017. The digitization of health care has raised concerns about the safe storage, ownership, and sharing of patient records and medical data [29]. Blockchain serves as a way to address these fundamental challenges, in particular the secure sharing of medical records and compliance with data privacy rules.

The healthcare sector is evolving day by day. Advances in healthcare are due to various technologies such as the Internet of Things and blockchain. In the field of health, blockchain is used for privacy. Several studies that take a blockchain privacy approach are reviewed below. The study [8] reviews the use of the Internet of Things and blockchain in health programs. It presents a review of various health care programs that integrate the Internet of Things and blockchain. It examines six programs in medical services, including remote patient monitoring, electronic medical record management, disease prediction, patient tracking, drug tracking, and the fight against infectious diseases, particularly COVID-19.

This study also examines the challenges associated with blockchain technology in IoT-based systems and some available solutions. Some guidelines have been put forward for possible new research that could revolutionize the healthcare sector using other technologies such as artificial intelligence, big data, fog computing, and cloud computing.

The study [13] reviews the use of blockchain technology in the field of healthcare. This study discusses potential challenges such as scalability and storage capacity, blockchain size, global interoperability, and standardization. This study highlights perspectives on health data and the sharing process, clinical trials, the pharmaceutical industry, big data, artificial intelligence, security, and privacy.

In the study [9], a framework for assessing readiness to adopt blockchain in health care is introduced. The framework covers the complex interaction of various factors, social structures, and institutional mechanisms and includes all major stakeholders. It is applied to the UAE health care sector. The findings show the multifaceted importance of government readiness in the blockchain initiative and show that large companies are more willing to take advantage of the opportunities offered by blockchain. Lack of clarity in blockchain rules and regulations, together with matters of privacy and trust, affects the readiness of all stakeholders. The proposed framework and findings will be useful in guiding policy interventions and developing support mechanisms to strengthen blockchain adoption.

In the study [25], a health care system with blockchain technology is introduced to diagnose diabetes. Diabetes is a rapidly growing chronic disease that has increased the mortality rate worldwide. This study provides a framework for diagnosing diabetes with blockchain cryptography. In the proposed method, different machine learning classification algorithms are used for the early diagnosis of this disease. The proposed framework manages electronic health records and stores patient information safely. The shared framework combines symptom-based disease prediction, blockchain, and interdepartmental file systems. In this context, patient health information is collected through wearable sensors. The collected data is eventually sent to the system administrator, who applies a machine learning model for further processing. The results, along with the physiological parameters, are stored in the data blocks of the blockchain with the approval of the patient and the patient’s physician. The proposed system will help the health community to safely store, process, and share patient health information.

In the study [2], the role of blockchain technology in telehealth and telemedicine has been investigated. Telehealth and telemedicine provide remote health services to reduce the spread of diseases, including COVID-19. These practices can be effective in managing scarce health resources for the care of COVID-19 patients in hospitals. The research showed that blockchain technology could improve telehealth and telemedicine by delivering remote health care services in a decentralized, tamper-resistant, transparent, traceable, reliable, trustworthy, and secure manner. These technologies enable health professionals to detect fraudulent medical records and fraudulent medical test kits of the kind commonly used for home-based diagnosis. The results show that the widespread deployment of blockchain in telehealth and telemedicine is still in its infancy.

In the study [14], a blockchain-based integration method and a reliable veterinary clinic information management system using predictive analysis were presented. Recent advances in information management systems combined with machine learning algorithms have paved the way for a dramatic revolution in the healthcare industry. However, data in such systems suffers from various challenges such as security, reliability, and convenience. A new solution is needed to increase data accessibility, and security policies must be adjusted appropriately. The purpose of this study is to use machine learning algorithms for prediction and blockchain for authentication.

In the study [28], a blockchain-based health program for predicting diabetes with fog computing was proposed. In this method, patient health information is collected from fog nodes and stored in a blockchain. A new rule-based clustering algorithm is first used to cluster patient health records. Diabetes and heart disease are then diagnosed using an adaptive feature-based fuzzy neural inference system. Experimental results show that the proposed work diagnoses the disease effectively, with an accuracy of about 81%, which is higher than that of the neural network algorithm.

In the study [5], a secure model for the transmission and diagnosis of medical images with the help of blockchain in the Internet of Things was proposed. The research presents a deep learning approach that combines blockchain-based image transfer with a secure diagnosis model for the Internet of Things environment. The proposed model includes several processes, namely data collection, secure exchange, hash value encryption, and data classification. First, elliptic curve cryptography (ECC) is applied, and the optimal ECC key is generated using a hybridization of the grasshopper and fruit fly optimization (GO-FFO) algorithms. A deep belief network is then used in the classification process to diagnose disease.

In the study [30], an intelligent IoT-based healthcare framework using blockchain technology with an optimal deep-learning-based secure blockchain (ODLSB) model is presented. The proposed model includes three main processes: secure transaction, hash value encryption, and medical diagnosis. The method uses an orthogonal particle swarm optimization (OPSO) algorithm for the secret sharing of medical images. The hash value encryption process is performed using the neighborhood indexing sequence (NIS) algorithm. An optimal deep neural network (ODNN) is used as the classification model to diagnose diseases. Experiments showed that the OPSO-DNN model in medical diagnosis has sensitivity, diagnosis, and accuracy of 92.75%, 91.42%, and 93.68%, respectively.

In the study [22], blockchain-based image steganography with the PSO algorithm is introduced. The study models a new way to ensure the updating and sharing of COVID-19 data among decentralized hospitals. Updating and securely sharing large amounts of health care information between hospitals is challenging. Two issues concern the confidentiality and integrity of health data, and network security vulnerabilities may also threaten data availability. According to the authors, no earlier study provides a solution for safely updating and sharing large amounts of health care information over the communication channels between hospitals. Therefore, the study proposes and discusses a new method based on steganography. In the first step, before hiding, the embedding capacity of each image is estimated. The second step hides the COVID-19 data using the PSO algorithm. The third step transfers the images using blockchain technology.

A review of the studies shows that most recent research has used blockchain to transmit medical data and to store patient data and records in a distributed network. One limitation of the studies in this area is that the blockchain nodes themselves do not process the patient data they share. In other words, in the reviewed studies, if a patient file needs to be analyzed by medical centers, these centers must share their analyses with the other nodes. Our contribution is a data analysis approach based on machine learning, feature selection, and blockchain technology. The innovation of the proposed method is to diagnose a person as ill or healthy based on all nodes participating in the blockchain. If a node wants to analyze a patient file and comment on it, it needs the approval of the other nodes in the blockchain. The rest of this article describes the proposed method, its structure, and its phases, such as data storage in the blockchain, feature selection, and learning based on the majority vote.

3 The proposed method

In most cases, it is necessary to use the advice of several medical centers or hospitals to treat a person. One way to improve the treatment process is to share patients’ records with other treatment centers. Each medical center should apply its own analysis to the data and then provide that analysis to the other medical centers. A medical center can use these opinions to find the optimal treatment for its patients and ultimately adopt the majority opinion on the type of treatment. In the proposed method, each treatment center uses a learning algorithm to diagnose the type of disease.

For analysis in medical centers, machine learning algorithms such as the decision tree, random forest, support vector machine, multilayer neural network, AdaBoost, and Bayesian network are used. The machine learning algorithm in each medical center plays the role of simulating the opinions of that center’s doctors in diagnosing the disease. These opinions and consultations are shared among the medical centers. In the proposed method, each hospital uses the blockchain to store and send information in order to increase data confidentiality. The proposed framework for diagnosing the disease and maintaining data confidentiality has several steps, as follows:

  • Storage and transmission of patient data by blockchain.

  • Analysis of the blocks, that is, the patient information in the blockchain, by machine learning and feature selection.

  • Diagnosis of the disease in a medical center by a majority vote, which will be explained in the next section.

In the proposed framework, heart disease is used as the case for analysis. Patients’ records contain information related to heart patients, and this data is used for analysis. The phases of the proposed algorithm for diagnosing heart disease and sending the result in the form of a blockchain are described below. First, the Harris Hawks Optimization (HHO) algorithm is explained, because it is used for feature selection in the proposed method. Next, the framework of the proposed method is described, and then the steps of the proposed method and data analysis with machine learning and blockchain are discussed.

3.1 Harris Hawks Optimization (HHO) algorithm

The Harris Hawks Optimization (HHO) algorithm is a meta-heuristic algorithm with a swarm intelligence approach, modeled on the behavior of Harris hawks in nature. The algorithm exhibits swarm intelligence behaviors with a hunting approach. Harris hawks can adopt various chase patterns based on the dynamic nature of hunting scenarios and the escape patterns of the prey. The hunting behavior of Harris hawks is shown in Fig. 4 [12]:

Fig. 4

Mechanism of swarm intelligence hunting in Harris Hawks Optimization (HHO)

Harris hawks are considered candidate solutions to the problem, and the prey is considered the best solution. In this algorithm, the prey, here a rabbit, is identified first, and the current optimal solution is taken as the rabbit’s position. One of the behaviors in the HHO algorithm is the quiet, or soft, siege, in which the hawks move slowly towards the prey and search around it. In a hard siege, any hawk can go straight to the rabbit and dive towards it. Hard siege modeling is shown in Fig. 5 [12]:

Fig. 5

Hard siege behavior in the HHO algorithm

In a soft siege, each Harris hawk flies fast around the prey and moves towards it at a suitable opportunity. In this type of behavior, the hawk dives from a height and flies away from the prey; then, as it loses height, it moves slowly towards the prey. In the HHO algorithm, each hawk can determine its flight path according to the gathering center of the other hawks. The behavior of a hawk moving with respect to this mean point can be seen in Fig. 6. Using the average position of the population and its optimal position, the hawk approaches the prey with a soft siege [12]:

Fig. 6

Rapid dive behavior in the HHO algorithm

With each iteration of the HHO algorithm, the positions of the hawks and of the rabbit, that is, the current optimal solution, are updated. After the last iteration, the prey position is extracted as the optimal solution. Experiments reported in [12] show that the HHO algorithm finds more accurate solutions than the GA, PSO, FA, BA, BBO, CS, and DE algorithms.
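
The loop described in this subsection can be sketched in simplified form. The sketch below is an assumption-laden illustration, not the full algorithm of [12]: it keeps the exploration step, the soft siege, and the hard siege, selected by the magnitude of the prey’s escaping energy, and it omits the rapid-dive phases. The `sphere` objective, the parameter defaults, and the function names are hypothetical choices made for this example.

```python
import random


def sphere(x):
    """Toy objective to minimize (an assumption of this sketch)."""
    return sum(v * v for v in x)


def simplified_hho(objective, dim=5, hawks=10, iterations=50,
                   lower=-10.0, upper=10.0, seed=0):
    rng = random.Random(seed)

    def clip(v):
        return max(lower, min(upper, v))

    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(hawks)]
    rabbit = min(population, key=objective)[:]  # best solution found so far
    history = [objective(rabbit)]

    for t in range(iterations):
        mean = [sum(h[d] for h in population) / hawks for d in range(dim)]
        for i, hawk in enumerate(population):
            # Escaping energy of the prey shrinks as the iterations progress.
            energy = 2 * rng.uniform(-1, 1) * (1 - t / iterations)
            if abs(energy) >= 1:
                # Exploration: perch relative to a random hawk or the group mean.
                if rng.random() >= 0.5:
                    other = rng.choice(population)
                    r1, r2 = rng.random(), rng.random()
                    new = [other[d] - r1 * abs(other[d] - 2 * r2 * hawk[d])
                           for d in range(dim)]
                else:
                    r3, r4 = rng.random(), rng.random()
                    new = [(rabbit[d] - mean[d])
                           - r3 * (lower + r4 * (upper - lower))
                           for d in range(dim)]
            elif abs(energy) >= 0.5:
                # Soft siege: circle the prey, which can still jump away.
                jump = 2 * (1 - rng.random())
                new = [(rabbit[d] - hawk[d])
                       - energy * abs(jump * rabbit[d] - hawk[d])
                       for d in range(dim)]
            else:
                # Hard siege: move straight onto the exhausted prey.
                new = [rabbit[d] - energy * abs(rabbit[d] - hawk[d])
                       for d in range(dim)]
            population[i] = [clip(v) for v in new]
            if objective(population[i]) < objective(rabbit):
                rabbit = population[i][:]
        history.append(objective(rabbit))
    return rabbit, history


best, history = simplified_hho(sphere)
```

Here `history` records the objective value of the rabbit after each iteration; because the rabbit is replaced only by a better hawk, the sequence never increases.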

3.2 Framework of the proposed method

The framework of the proposed method for diagnosing heart disease and sending data confidentially with blockchain is shown in Fig. 7. Its purpose is to send patients’ medical information and records to other medical centers in the smart city. With this mechanism, the opinions of doctors in different medical centers are collected and taken into account in choosing the best treatment. One option is to send the information to other medical centers without maintaining confidentiality. This option is not acceptable because preserving the confidentiality of medical information is a principle of treatment. A better way is to use blockchain technology.

In the proposed method, blockchain technology is used to send patients’ files and confirm the type of treatment. Patient information and records are placed in a block and sent through the blockchain to other medical centers. Each treatment center can specify the type of treatment, attach it to the block, and send the data block to all treatment centers. On receiving the patient information and records, each medical center predicts the appropriate treatment using a data mining method.

This article is about maintaining privacy with blockchain in medical applications. The innovations used in this article are as follows:

  • Most studies use the blockchain to maintain information confidentiality. This study uses the blockchain to maintain the confidentiality of patients’ files that are transferred between hospitals.

  • The proposed method is a medical advisory system. Patient file data and information are sent to physicians in other hospitals for consultation, so that the opinions of other doctors can be used. In sending and receiving this information, the proposed method uses the blockchain to increase the confidentiality of the data.

  • The opinions of the medical centers are shared in the proposed method, and the treatment finally selected is the one approved by the medical centers and their doctors.

  • Machine learning and feature selection methods have been used to analyze and evaluate the data in patients’ files.

  • A binary version of the HHO algorithm for feature selection is developed. The role of the HHO algorithm is to select important features in patients’ records for more detailed analysis in medical centers.

  • The appropriate type of treatment is determined by majority-vote learning across the nodes participating in the blockchain.
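
How a continuous optimizer such as HHO can drive feature selection may be easier to see with a small sketch. The version below assumes a sigmoid (S-shaped) transfer function, a common way of binarizing continuous positions; the transfer function, the fallback that keeps at least one feature, and the names `binarize` and `select_features` are assumptions of this illustration and are not taken from the description above.

```python
import math
import random


def sigmoid(value: float) -> float:
    # Clamp the input so that math.exp cannot overflow for extreme positions.
    value = max(-60.0, min(60.0, value))
    return 1.0 / (1.0 + math.exp(-value))


def binarize(position, rng):
    """Map a continuous hawk position to a 0/1 feature mask."""
    mask = [1 if rng.random() < sigmoid(v) else 0 for v in position]
    if not any(mask):
        # Assumed fallback: keep the feature with the largest position value
        # so that a classifier can still be trained.
        mask[max(range(len(position)), key=lambda d: position[d])] = 1
    return mask


def select_features(row, mask):
    """Keep only the columns whose mask bit is 1."""
    return [value for value, keep in zip(row, mask) if keep]


rng = random.Random(0)
position = [2.5, -3.0, 0.1, 4.0]   # hypothetical continuous hawk position
mask = binarize(position, rng)
reduced = select_features([63, 1, 3, 145], mask)
```

In a complete pipeline, each mask would be scored, for example by the accuracy of a classifier trained on the reduced features, and that score would serve as the objective the hawks optimize.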

In the proposed method, data mining techniques are used to analyze information in the treatment centers. To discover the best treatment for patients, the following steps are performed, as shown in Fig. 7:

  • Patient information and records are collected from patients.

  • Patient records are pre-processed; an important step in data preprocessing is normalization.

  • The data is sent as a block in the blockchain to the medical centers in the smart city, and each medical center receives the data.

Fig. 7

Framework of the proposed method for diagnosing heart disease and maintaining the confidentiality of patients’ records

  • The data of each block is received by a healthcare system and analyzed with machine learning and data mining to predict disease progression.

  • The patterns discovered by machine learning at each treatment center are added to the block and sent to the other treatment centers through the blockchain.

  • To increase the accuracy of the machine learning methods, a feature selection phase is used in each medical center.

  • The HHO algorithm is adapted for feature selection. This algorithm is used in many applications, and its accuracy in finding the optimal solution is notable. Its role here is the optimal selection of the important features of heart patients.

  • Each medical center confirms and authenticates a block and sends it to the primary treatment center.

  • The primary treatment center receives the block with the recommendations of the treatment centers and uses the majority voting mechanism to decide on the type of treatment.

  • Each treatment center can use its own machine learning technique, and ultimately the majority vote is used. On receiving the data of each block, each medical center can apply its doctors’ opinion to the data or analyze it with its machine learning method.
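
The final decision step in the list above, in which the primary treatment center combines the recommendations of the other centers, can be sketched as a simple majority vote. The center names, the labels, and the alphabetical tie-breaking rule are assumptions made for this illustration; the text does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions: dict) -> str:
    """Return the label recommended by the largest number of medical centers."""
    counts = Counter(predictions.values())
    top = max(counts.values())
    winners = sorted(label for label, n in counts.items() if n == top)
    # Assumption: ties are broken alphabetically.
    return winners[0]


predictions = {
    "center_a": "heart disease",   # hypothetical output of one center's model
    "center_b": "healthy",
    "center_c": "heart disease",
}
decision = majority_vote(predictions)
```

With an odd number of centers and two labels, a tie cannot occur, which is one reason to prefer an odd number of voters in a binary diagnosis.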

3.3 Steps of the proposed method

The steps of the proposed method for diagnosing heart disease are described below: the structure of the blocks used in the blockchain, preprocessing, feature selection, and majority-based learning.

3.3.1 The structure of each block

Patient information needs to be put into a specific format for submission to the blockchain. According to Fig. 8, each block contains at least the unique number of the patient, the unique number of the hospital, the patient information and data, the time the block was created, the hash of the block’s information, and the block’s order of placement in the chain.

Fig. 8 Coding of patient information in a block

The following components are used in each block belonging to the blockchain:

  • A block number, which must be unique.

  • A timestamp label that records the date on which the block was created; it can be generated with the TimeStamp operation.

  • A hash computed over the contents of the block. If even a single bit of the block is manipulated, the hash changes, and the block is removed from the blockchain.

  • A hash index that specifies which block of the chain this block connects to.

  • Data related to the patient record, part of which is the prediction of the patient’s status obtained through data mining methods.
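
The components listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`index`, `patient_id`, `hospital_id`, `previous_hash`) and the JSON serialization are assumptions made for illustration, and only the use of SHA-256 comes from the text.

```python
import hashlib
import json
import time


def block_hash(block):
    """SHA-256 over every field of the block except the stored hash itself."""
    body = {key: value for key, value in block.items() if key != "hash"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(index, patient_id, hospital_id, data, previous_hash, timestamp=None):
    """Build a block; all field names are hypothetical."""
    block = {
        "index": index,                      # unique block number
        "patient_id": patient_id,            # unique number of the patient
        "hospital_id": hospital_id,          # unique number of the hospital
        "data": data,                        # patient record and predicted status
        "timestamp": time.time() if timestamp is None else timestamp,
        "previous_hash": previous_hash,      # link to the preceding block
    }
    block["hash"] = block_hash(block)
    return block


def chain_is_valid(chain):
    """Check every stored hash and every link to the preceding block."""
    for position, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if position > 0 and block["previous_hash"] != chain[position - 1]["hash"]:
            return False
    return True


genesis = make_block(0, "P-001", "H-01", {"age": 63, "prediction": 1}, "0" * 64)
second = make_block(1, "P-002", "H-02", {"age": 41, "prediction": 0}, genesis["hash"])
chain = [genesis, second]
print(chain_is_valid(chain))      # True for the untouched chain
genesis["data"]["age"] = 64       # manipulate one field of the first block
print(chain_is_valid(chain))      # the stored hash no longer matches
```

Because each block stores the hash of its predecessor, changing one field invalidates the stored hash of that block, which is how the manipulation described above is detected.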

Blockchain technology is a distributed cryptographic method. After a block is created, it is sent to the other members of the blockchain. Each medical center analyzes the medical data with its own data mining method, sends the final analysis to all groups in the blockchain, and shares its medical opinion with the other nodes. The proposed method uses SHA-256, a member of the SHA-2 family of cryptographic hash functions. For patient-related information, a heart patient data set [17] is used. Its features (the features that are placed in the data block of the blockchain) are as follows:

  • Age: One of the vital features of this data set; its value is an integer.

  • Gender: This feature takes two values, zero and one, which denote men and women, respectively.

  • Type of angina: Angina (chest pain) is caused by a partial obstruction of the coronary arteries of the heart. In angina, the heart tissue does not receive enough blood, especially during exercise, heavy activity, or exposure to stress. If chest pain occurs at rest or does not improve within a few minutes, the risk of heart attack increases, and the patient should be taken to the nearest medical center. Angina is usually temporary but is sometimes a chronic, persistent pain. In this data set, the angina field takes the values Typical angina, Atypical angina, Non-anginal pain, and Asymptomatic angina, which are indicated by the numbers 1, 2, 3, and 4, respectively.

  • Resting blood pressure: This feature measures a person’s blood pressure at rest, usually in a hospital. The doctor makes sure the person is not physically active before measuring this feature.

  • Serum cholesterol (chol): Serum cholesterol is a measure of the amount of cholesterol in the blood, including HDL, LDL, and some other blood fats. In a healthy person without other cardiovascular risk factors, serum cholesterol is less than 200 mg/dL. This feature is continuous and numerical and is expressed in milligrams per deciliter.

  • Fasting blood sugar: This feature is measured in milligrams per deciliter. If its value is more than 120 mg/dL, the feature is set to one; otherwise, it is set to zero.

  • Electrocardiography results: This feature takes the values 0, 1, and 2 depending on the shape and curve of the electrocardiogram.

  • Maximum heart rate achieved.

  • Exercise-induced angina: Angina pectoris is a condition in which a patient develops chest pain that originates in the coronary arteries. It is caused by a lack of oxygen in the heart muscle. The pain is mostly felt in the middle of the left chest and can spread to the left arm and sometimes to both arms, the jaw, and the area between the shoulders. Exercise-induced angina pectoris occurs when a patient develops this pain after relatively intense exercise. Depending on the presence or absence of this pain, this feature takes the value one or zero.

  • ST depression induced by exercise relative to rest.

  • The slope of the peak exercise ST segment, which has three modes (upsloping, flat, and downsloping), indicated by the values 1, 2, and 3, respectively.

  • Number of major vessels (Ca) seen in color imaging: a number between 0 and 3.

  • The type of thalassemia: the last input feature of this data set, which takes one of three values: 3, 6, or 7.

3.3.2 Pre-processing blocks

A vital part of any health care system is the data collected from patients. The data must have a suitable structure and format for machine learning and for transmission over the network. The following steps are performed in the preprocessing of the data related to the treatment system:

  • Records with empty values are ignored, or the empty values are filled with the average value of that attribute or field.

  • All data are converted to numerical data.

  • The collected data are normalized to be ready for machine learning.

  • The data used in machine learning and data mining have a set of features, each of which has a specific range.

Some features vary over a small interval and others over a large interval. Using features with widely different ranges can reduce the accuracy of learning, so in this study the medical data are normalized before analysis. To normalize patient-related data, a normalization range [a, b] is chosen, and the data are normalized according to Eq. (1):

$$normal\left({F}_{i}\right)=a+\frac{{F}_{i}-min}{max-min}(b-a)$$
(1)

In this equation, \({F}_{i}\) is the unnormalized value of a feature, and \(normal\left({F}_{i}\right)\) is the normalized value of the feature \({F}_{i}\). The max and min values are the maximum and minimum values of that feature over its column of the data set, respectively.

If the normalization interval is considered equal to [0,1], then normalization is performed as Eq. (2):

$$normal\left({F}_{i}\right)=\frac{{F}_{i}-min}{max-min}$$
(2)
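
The preprocessing steps and Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, missing values are represented by `None` as an assumption, and a constant column is mapped to `a` to avoid division by zero, a case the text does not discuss.

```python
def fill_missing_with_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    known = [value for value in column if value is not None]
    mean = sum(known) / len(known)
    return [mean if value is None else value for value in column]


def min_max_normalize(column, a=0.0, b=1.0):
    """Eq. (1): map a feature column to [a, b]; with a=0, b=1 this is Eq. (2)."""
    low, high = min(column), max(column)
    if high == low:
        # Assumption: a constant column carries no information, map it to a.
        return [a for _ in column]
    return [a + (value - low) / (high - low) * (b - a) for value in column]


cholesterol = fill_missing_with_mean([100, None, 200, 150])
print(min_max_normalize(cholesterol))             # [0.0, 0.5, 1.0, 0.5]
print(min_max_normalize([100, 150, 200], -1, 1))  # [-1.0, 0.0, 1.0]
```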

3.3.3 Proposed flowchart

The feature selection flowchart using the HHO algorithm to diagnose heart disease in each treatment center is shown in Fig. 9. In the proposed method, a feature vector with n components, as in Eq. (3), is defined as a member of the HHO population. Each component of the feature vector is zero or one, indicating that the corresponding feature is not selected or is selected, respectively.

$${X}_{i}=\ll {X}_{i}^{1},{X}_{i}^{2},{X}_{i}^{3},\dots ,{X}_{i}^{n}\gg$$
(3)

In this equation, \({X}_{i}\) is a feature vector, and \({X}_{i}^{j}\) is component j of feature vector i. Each feature vector is a binary vector with n components. The feature vector in the current iteration t is denoted by X (t), and in the new iteration it is denoted by X (t + 1). To evaluate a feature vector, it is mapped to the data, and the result is used as the input of an artificial neural network (classifier). Each feature vector is evaluated based on the diagnostic error of the network and the ratio of the number of selected features to the number of all features. The objective function for feature selection is defined in Eq. (4):

$$f=\alpha \cdot \frac{1}{n}\sum\nolimits _{i=1}^{n}\left|{\bar{Y}}_{i}-{Y}_{i}\right|+\beta \cdot \frac{F}{A}$$
(4)

In the objective function, the actual value and the predicted value of a sample are denoted by \({Y}_{i}\) and \({\bar{Y}}_{i}\), respectively. Here the parameter n is the number of samples. The values F and A are the number of selected features and the total number of features, respectively. The coefficients α and β are two random numbers between zero and one whose sum is equal to one.
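
As a rough illustration of Eq. (4), the objective value of one binary feature vector can be computed as follows. The function name and the default weights `alpha=0.9` and `beta=0.1` are assumptions chosen for the example; the text only requires that the two coefficients lie between zero and one and sum to one.

```python
def objective(y_true, y_pred, mask, alpha=0.9, beta=0.1):
    """Eq. (4): weighted sum of the mean absolute error and the feature ratio.

    y_true, y_pred: actual and predicted class numbers of the samples.
    mask: binary feature vector (1 = feature selected, 0 = not selected).
    alpha, beta: hypothetical weights; they must sum to one.
    """
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to one")
    n = len(y_true)
    error = sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n
    feature_ratio = sum(mask) / len(mask)   # F / A
    return alpha * error + beta * feature_ratio


# One of four samples is misclassified, and two of five features are selected.
value = objective([1, 0, 1, 0], [1, 0, 0, 0], [1, 0, 1, 0, 0])
print(round(value, 3))  # 0.9 * 0.25 + 0.1 * 0.4 = 0.265
```

A smaller value is better: the first term rewards a low diagnostic error, and the second term rewards vectors that select fewer features.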

Fig. 9 Feature selection in blockchain-related blocks

The goal is to find the feature vector that minimizes the value of the objective function f, and the HHO algorithm is used for this minimization. In each iteration, the algorithm updates the feature vectors and selects the best feature vector, that is, the one with the smallest objective value. In the proposed method, several random feature vectors are first generated as the population of the HHO algorithm and then evaluated by the objective function. The best feature vector in each iteration is denoted by \({X}_{rabbit}\left(t\right)\). Equation (5) is used to update feature vectors with random motions:

$$X(t+1)=\left\{\begin{array}{ll}{X}_{rand}\left(t\right)-{r}_{1}\left|{X}_{rand}\left(t\right)-2{r}_{2}X\left(t\right)\right| & rand\ge 0.5\\ \left({X}_{rabbit}\left(t\right)-{X}_{M}\left(t\right)\right)-{r}_{3}\left(LB+{r}_{4}\left(UB-LB\right)\right) & rand<0.5\end{array}\right.$$
(5)

In this equation, X (t) is the current position of a feature vector in iteration t, and X (t + 1) is its position in the new iteration. The value of \({X}_{rand}\left(t\right)\) is a random position of a feature vector in the problem space. The value of \({X}_{M}\left(t\right)\) is the center of gravity, that is, the mean of the feature vectors, and \({r}_{1}\), \({r}_{2}\), \({r}_{3}\), and \({r}_{4}\) are uniform random numbers between zero and one. The LB and UB parameters are the lower and upper bounds of the solutions in the problem space, respectively. In the proposed method, LB and UB are zero and one, respectively, and therefore Eq. (5) becomes Eq. (6):

$$X(t+1)=\left\{\begin{array}{ll}{X}_{rand}\left(t\right)-{r}_{1}\left|{X}_{rand}\left(t\right)-2{r}_{2}X\left(t\right)\right| & rand\ge 0.5\\ {X}_{rabbit}\left(t\right)-{X}_{M}\left(t\right)-{r}_{3}{r}_{4} & rand<0.5\end{array}\right.$$
(6)

After the feature vectors are updated by this random search, they are updated in subsequent iterations by another type of search called soft besiege, which is shown in Eq. (7):

$$X\left(t+1\right)=({X}_{rabbit}\left(t\right)-X\left(t\right))-E\left|J.{X}_{rabbit}\left(t\right)-X\left(t\right)\right|$$
(7)

In this equation, J is a random value between zero and two. The coefficient E, called the energy coefficient, decreases as the iterations proceed. Another type of update models the dives of the Harris hawks and can also be used to update feature vectors, as shown in Eq. (8):

$$X\left(t+1\right)={X}_{rabbit}\left(t\right)-E\left|{X}_{rabbit}\left(t\right)-X\left(t\right)\right|$$
(8)

In the HHO algorithm, each feature vector can also be updated based on the average position of the population, that is, its center of gravity, as in Eq. (9):

$$X\left(t+1\right)={X}_{rabbit}\left(t\right)-E\left|J.{X}_{rabbit}\left(t\right)-{X}_{M}\left(t\right)\right|$$
(9)

By applying these equations, the feature vectors are updated in each iteration to diagnose the disease. In the last iteration, the best feature vector is used to reduce the diagnostic error of the disease. In the proposed method, each Harris hawk is a feature vector whose components are zero or one, indicating that a feature is not selected or is selected, respectively. The rabbit, in turn, is the best feature vector found so far, and the objective function evaluates each feature vector by the error of the disease diagnosis and the number of selected features.
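
Two of the update rules above, Eq. (6) and Eq. (8), can be sketched in Python as follows. This is an illustrative fragment rather than a full HHO implementation: the function names are hypothetical, the rule for switching between the update equations is omitted, and the 0.5 threshold used to turn the real-valued result back into a binary feature vector is an assumption, since the text does not state how the positions are binarized.

```python
import random


def population_mean(population):
    """Center of gravity X_M(t): component-wise mean of all feature vectors."""
    size = len(population)
    return [sum(column) / size for column in zip(*population)]


def explore(x, x_rand, x_rabbit, x_mean, rng):
    """Eq. (6): random-motion update of one feature vector (real-valued result)."""
    r1, r2, r3, r4 = (rng.random() for _ in range(4))
    if rng.random() >= 0.5:
        return [xr - r1 * abs(xr - 2 * r2 * xi) for xr, xi in zip(x_rand, x)]
    return [xb - xm - r3 * r4 for xb, xm in zip(x_rabbit, x_mean)]


def hard_besiege(x, x_rabbit, energy):
    """Eq. (8): move toward the best vector, scaled by the energy coefficient E."""
    return [xb - energy * abs(xb - xi) for xb, xi in zip(x_rabbit, x)]


def binarize(position, threshold=0.5):
    """Assumed thresholding step: components >= threshold become selected (1)."""
    return [1 if value >= threshold else 0 for value in position]


rng = random.Random(0)
population = [[1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
rabbit = population[0]                      # assume the first vector is the best
candidate = explore(population[1], rng.choice(population), rabbit,
                    population_mean(population), rng)
print(binarize(candidate))                              # a new binary feature vector
print(binarize(hard_besiege([0, 1], [1, 1], 0.5)))      # [1, 1]
```

In a full run, each binarized candidate would be scored with the objective function of Eq. (4), and the rabbit would be replaced whenever a candidate obtains a smaller value.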

3.3.4 Majority voting

Figure 10 shows the framework for using voting-based learning in the proposed method in each blockchain:

Fig. 10 Block information extraction and prediction based on the majority vote

In the proposed blockchain method, data are stored and sent in encrypted form. In the proposed framework, each treatment center uses a machine learning technique to analyze the data. Sharing the analysis of disease-related data allows one health center to access the analyses of all other centers. Each hospital and treatment center should make the final analysis of the data based on the majority vote: if a class number is produced by most of the learning methods, it is selected as the final output. For example, if five machine learning algorithms are used and three of them classify a person as sick while two classify the person as healthy, then the person is classified as sick by the vote.

In this article, several machine learning methods, namely artificial neural network, support vector machine, decision tree, random forest, Bayesian network, and AdaBoost, are used for majority voting. A patient-related block is first selected, and its information, such as patient records and the opinions of other medical centers about the patient, is extracted from the blockchain. At this stage, the required preprocessing is performed on the block. The disease-related prediction class number (this prediction number is added by the medical centers) is determined by a majority vote, and the results are presented based on this vote.
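
The voting rule described above can be sketched with a short Python function. The function name is hypothetical, and the tie-breaking behavior (the class that appears first among the tied classes wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class number predicted by the largest number of methods.

    predictions: one class number per learning method (or treatment center).
    Ties are broken in favor of the class that appears first (an assumption).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Five methods: three predict "sick" (1) and two predict "healthy" (0).
print(majority_vote([1, 0, 1, 1, 0]))  # 1
```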

4 Analysis and evaluation

In this section, the proposed method for diagnosing heart disease is analyzed. MATLAB version 2019 is used to evaluate the learning methods and the blockchain analysis. The heart patient data set is taken from the UCI database.

4.1 Evaluation criteria

To evaluate the proposed method, the parameters of true positive, true negative, false positive, and false negative are used, which are explained below:

  • TP: The number of people who have atherosclerosis and whom the proposed method correctly classifies as patients.

  • TN: The number of people who do not have atherosclerosis and whom the proposed method correctly classifies as healthy.

  • FP: The number of people who do not have atherosclerosis and whom the proposed method incorrectly classifies as heart patients.

  • FN: The number of people who have atherosclerosis and whom the proposed method incorrectly classifies as healthy.

The accuracy, sensitivity (recall), and precision criteria, shown in Eqs. (10), (11), and (12), respectively, are used in the evaluations. Many studies on classification and data mining use these three criteria to evaluate the performance of a classification algorithm [4, 7, 21].

$$Acc=\frac{TP+TN}{TP+TN+FP+FN}\times 100\%$$
(10)
$$Recall=\frac{TP}{TP+FN}\times 100\%$$
(11)
$$Precision=\frac{TP}{TP+FP}\times 100\%$$
(12)
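
The three criteria can be computed from the four counts as in the following sketch. Precision is computed here with its conventional definition, TP/(TP+FP). The function name and the counts in the example are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall (sensitivity), and precision as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    recall = tp / (tp + fn) * 100
    precision = tp / (tp + fp) * 100   # conventional definition of precision
    return accuracy, recall, precision


# Hypothetical counts for a test set of 81 people.
accuracy, recall, precision = classification_metrics(tp=36, tn=32, fp=6, fn=7)
print(round(accuracy, 2), round(recall, 2), round(precision, 2))
# 83.95 83.72 85.71
```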

4.2 Analysis

The rest of this section analyzes the proposed method for diagnosing heart disease in terms of various numerical and qualitative indicators. The output feature of this data set has five different classes that indicate the possibility of clogged arteries in the heart vessels. The values of this feature are 0, 1, 2, 3, and 4, where 0 indicates a healthy person and 4 indicates a very high risk of coronary heart disease.

4.2.1 Classification analysis

The proposed method has two main phases: classification by machine learning and transmission using blockchain. In the classification phase, each node (treatment center) uses one learning method. A node does not run multiple learning methods simultaneously because each treatment center simulates a single diagnosis, which provides a suitable platform for implementing the real model in the future. The blockchain nodes implement and analyze the data with the artificial neural network, support vector machine, decision tree, random forest, AdaBoost, and Bayesian network methods. Running the methods simultaneously on different blockchain nodes also reduces the analysis time. To evaluate the methods, 70% of the data set is used for training and 30% for evaluation. In the feature selection phase, the population of hawks (feature vectors) is 20, the number of iterations is 50, and the experiments are repeated 30 times and averaged. The artificial neural network used to evaluate the feature vectors has two layers, each with 20 artificial neurons. One way to evaluate the proposed method is to use the RMSE and MAE error indices, which are formulated according to Eqs. (13) and (14) [6][3]:

$$RMSE=\sqrt{\frac{1}{N}\sum\nolimits _{i=1}^{N}{({y}_{i}-{\widehat{y}}_{i})}^{2}}$$
(13)
$$MAE=\frac{1}{N}\sum\nolimits _{i=1}^{N}|{y}_{i}-{\widehat{y}}_{i}|$$
(14)

In these relationships, \({y}_{i}\) is the actual class number of a person in terms of illness or health, and \({\widehat{y}}_{i}\) is the class number estimated by the proposed method of that person’s condition, which can be healthy or sick. On the other hand, N is the number of samples used in the evaluation. Table 1 shows the RMSE and MAE index values of machine learning methods and the proposed method in diagnosing heart disease.
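
A minimal Python sketch of Eqs. (13) and (14) is given below. The function names and the example class numbers are hypothetical and serve only to show how the two indices are computed.

```python
import math


def rmse(y_true, y_pred):
    """Eq. (13): root mean squared error between actual and estimated classes."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Eq. (14): mean absolute error between actual and estimated classes."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


actual = [0, 1, 1, 0]      # hypothetical actual class numbers
estimated = [0, 1, 0, 0]   # hypothetical estimated class numbers
print(mae(actual, estimated))   # 0.25
print(rmse(actual, estimated))  # 0.5
```

For binary labels the squared error equals the absolute error, so the RMSE is the square root of the MAE, as in this example.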

Table 1 Comparison of RMSE and MAE error of the proposed method and other methods

The experiments show that the MAE of the artificial neural network, support vector machine, decision tree, random forest, AdaBoost, Bayesian network, proposed voting, and proposed voting with the HHO algorithm is 0.165, 0.148, 0.253, 0.267, 0.161, 0.169, 0.128, and 0.116, respectively. The proposed method with feature selection has the lowest MAE in diagnosing heart disease. The RMSE of the same methods is 0.347, 0.385, 0.427, 0.348, 0.345, 0.341, 0.294, and 0.267, respectively. The proposed method with the feature selection mechanism again has the lowest error among the compared methods. The voting mechanism without feature selection ranks second on both the MAE and RMSE indices.

Among the learning methods without a voting mechanism, the support vector machine has the lowest error on the MAE index. On the RMSE index, the Bayesian network has the lowest error among the non-voting methods. In addition to error indicators, classification indicators such as accuracy are of particular importance in diagnosing heart disease and analyzing data blocks in the blockchain. Table 2 compares the accuracy, sensitivity, and precision of the proposed method and the other methods in diagnosing heart disease.

Table 2 Comparison of accuracy, sensitivity, and precision of the proposed method and other methods in diagnosing heart disease

The experiments show that the accuracy of the artificial neural network, support vector machine, decision tree, random forest, AdaBoost, Bayesian network, proposed voting, and proposed voting with feature selection in diagnosing heart disease is 83.95%, 85.18%, 77.78%, 81.48%, 86.21%, 86.42%, 91.87%, and 92.75%, respectively. Among the methods compared, the voting-based methods are more accurate, and when feature selection with the HHO algorithm is added, the accuracy increases from 91.87 to 92.75%. The sensitivity of the artificial neural network, support vector machine, decision tree, random forest, AdaBoost, Bayesian network, proposed voting, and proposed voting with HHO feature selection is 83.7%, 85.7%, 62.8%, 74.4%, 85.7%, 83.7%, 91.67%, and 92.15%, respectively.

The proposed voting method with feature selection has the highest sensitivity in diagnosing heart disease. Feature selection increases the sensitivity of the proposed method from 91.67 to 92.15%. The precision of the artificial neural network, support vector machine, decision tree, random forest, AdaBoost, Bayesian network, proposed voting, and proposed voting with feature selection is 85.7%, 89.7%, 93.1%, 88.9%, 89.7%, 90%, 93.14%, and 95.69%, respectively. The analysis and evaluation show that the accuracy, sensitivity, and precision of the proposed method for diagnosing heart disease are higher than those of the non-voting methods.

Our proposed method has several steps. Because the method uses blockchain, its security is taken to be that of the blockchain itself, which is regarded here as 100%. In the machine learning stage, the accuracy of diagnosing the disease is higher than that of other methods for the following two reasons:

  • Our method has a smart feature selection with the HHO algorithm, so learning focuses on the important features.

  • Our method uses majority voting in machine learning. Majority-based learning uses several learning methods simultaneously to correct each other’s errors, and therefore the accuracy of our method at this stage is also remarkable.

4.2.2 Execution time analysis

One of the important indicators in evaluating blockchain technology for encrypting and sending patient records is execution time. In the experiments, ten clients are used, and each client has 100, 200, 300, 400, or 500 blocks in the blockchain. Their execution time is compared with that of the centralized method in Fig. 11. Each receiving center of the blockchain has a longer calculation time and delay than in the centralized case. The execution time of the centralized method for 100, 200, 300, 400, and 500 blocks is 0.132, 0.214, 0.274, 0.384, and 0.842 s, respectively. In distributed mode with blockchain technology, the validation time increases to 1.124, 1.674, 2.263, 3.105, and 5.684 s for 100, 200, 300, 400, and 500 blocks, respectively. Execution time is longer with blockchain technology, but its security is far better than that of the centralized method. This delay serves the security of the blockchain, because it ensures that validation does not face a security challenge.

Fig. 11 Comparison of the execution time of the proposed and centralized system

The comparison of execution times for the validation and evaluation of patient records shows that when the number of blocks increases from 100 to 500, the execution time of the centralized method increases from 0.132 to 0.832 s, an increase of more than six times. In the proposed method, the execution time increases from 1.124 to 5.684 s, an increase by a factor of about 5.06.

4.2.3 Qualitative comparison

The blockchain method contrasts with the symmetric cryptography method used in the centralized approach. In the centralized method, one system is used to maintain confidentiality, but in the blockchain, several systems are used in a distributed manner. The centralized method does not consume much memory because all documents are kept in a central system. In centralized authentication mode, all information can be manipulated if the central system is hacked. In the distributed or blockchain mode, all systems must be hacked at the same time, which is difficult in practice. Table 3 gives a qualitative comparison between the symmetric and blockchain authentication methods.

Table 3 Qualitative comparison of the proposed method and the centralized method

5 Conclusions

Privacy is a vital challenge in using big data in health and processing it in smart cities. Our study shows that blockchain technology is highly distributed and can be integrated well with the Internet of Things. One approach is to use the blockchain to maintain the confidentiality of medical data; with blockchain technology, this challenge can be solved to a large extent. In this paper, a method for maintaining patient privacy based on blockchain technology is developed.

Each node connected to the blockchain first analyzes the information or patient records by machine learning and data mining. Next, this analysis is placed in the blockchain blocks with the approval of the other medical centers. In the phase of sending the information and records of heart patients, the proposed method uses blockchain to maintain the confidentiality of patients’ data. Machine learning with a voting approach is used to analyze the data blocks, and in the machine learning phase, the HHO algorithm is used to increase the accuracy of classification. Security in the proposed decentralized approach is high due to the use of the blockchain, which is theoretically indecipherable. Unfortunately, distributed systems such as blockchain consume more power than other approaches, such as the centralized approach, because they use all system components for decoding.

Because it uses the blockchain, the proposed decentralized approach is theoretically impenetrable to attacks. In contrast, approaches such as centralized medicine are prone to attack and intrusion. Scalability is another strength that the proposed decentralized method gains from the blockchain: it can be implemented at any scale and dimension. Patient data and records in the proposed method cannot be tampered with because they are handled within the blockchain. Like any blockchain, the proposed approach consumes a lot of memory because it is distributed. In blockchain and decentralized approaches such as the proposed method, the time needed to authenticate patients is significantly longer than in centralized medical systems. The experiments show that the memory consumption of the proposed method is higher than that of centralized methods because the blockchain must be maintained.

Each system needs more time to create blocks in distributed computing mode, so blockchain-based methods need more time for authentication than centralized methods, but their security is regarded as 100%, whereas the security of data stored in a centralized system is low. The analyses show that if the voting method and the feature selection mechanism are used together in the proposed method, then its accuracy, sensitivity, and precision are 92.75%, 92.15%, and 95.69%, respectively. On the three classification indicators and on classification error, the proposed method diagnoses heart disease more accurately than the artificial neural network, support vector machine, decision tree, random forest, AdaBoost, and Bayesian network.

This research, like other research on the blockchain, has its limitations. One important limitation of the blockchain is the limited number of transactions in a given period. There is a deliberate delay between requests sent to the blockchain in order to maintain security, and this delay slows down the submission and verification of medical records. Blockchain requires more memory, and each node participating in the proposed system must store a copy of all information. Another limitation of the proposed method is that medical centers and hospitals must agree on the use of the blockchain. Despite these disadvantages, the proposed system has its advantages. The first is that the confidentiality of the proposed method is theoretically unhackable. The second is that it uses the vital features of patient records, chosen by the feature selection method based on the HHO algorithm.

Because medical centers generate a large amount of data, which qualifies as big data, big data processing platforms such as Apache Spark and Hadoop will be used in the future to process medical data.