1 Introduction

Artificial intelligence (AI) and big data technologies are already widely applied in the medical field to reduce labor costs and human error [1, 2]. However, current intelligent medical systems remain immature and are questioned for giving inadequate treatment recommendations [3,4,5]. Several factors account for this inadequacy. The first and most prominent is the difficulty of collecting sufficient data with rich features, which limits how comprehensively a model can characterize a disease. In addition, the relevant machine learning models generally perform poorly [6]. Effective and secure collection and processing of medical data from medical institutions worldwide has therefore become the bottleneck of current intelligent medical systems [7, 8].

To break this bottleneck, medical institutions unite and agree to share medical data under privacy protection regulations, and models trained on the resulting larger datasets perform much better than those trained on data from a single institution [9, 10]. Federated learning (FL) [11] is a promising solution in this setting, since all participants cooperate to train a shared model without disclosing or exchanging any private data. Despite its preliminary success in medical practice, the basic FL convolutional neural networks (CNNs) [12, 13] still have drawbacks that diminish overall system performance. A case in point is robustness: adding a small amount of noise or making a minor change to an input sample, imperceptible to the human eye, can cause the same sample to receive different predictions from a client's network model. In addition, the lack of incentives for client participation in federated learning reduces model training efficiency.

A major route to improving model robustness during training is to design better loss functions [14]. This paper proposes a loss (the DPL loss function) that combines a distance-based cross-entropy loss with a prototype loss, so that all prototypes are learned directly from the data. At the bottom of the DPL image classification framework, convolutional layers are still employed to extract features; at the top, multiple prototypes per class represent the different classes. To classify an image, the Euclidean distance is used to find the nearest prototype in the feature space. Inspired by prototype learning [15], the paper introduces the prototype loss (PL) into DPL to reduce the distance between a feature vector and its corresponding prototype. By preventing overfitting, PL further improves DPL classification performance, making the model more discriminative and robust.

It is widely accepted that medical data are the prerequisite for model training. Unfortunately, traditional federated learning fails to attract owners of high-quality datasets, making it infeasible to train strong global models [16, 17]. As a remedy, blockchain technologies have been integrated into federated learning frameworks: clients are attracted to participate by leveraging the integrity and traceability of blockchain transactions and by combining its incentive mechanism with other technologies [18]. Inspired by these studies, this paper introduces an incentive mechanism into the federated learning framework to attract more high-quality medical datasets, enlarge the training data, and improve classification performance.

As mentioned above, this paper proposes an Incentive Mechanism for Federated Learning of Medical Data Classification (FedIn-MC) and makes the following chief contributions:

  1. This project introduces an improved federated learning framework that encourages multi-party medical institutions to cooperate. Secure cross-institutional data sharing guarantees the comprehensiveness of the trained model, while medical data remain in their respective local sites.

  2. With the introduction of the prototype-based DPL loss, all prototypes learn directly from the data, which markedly improves the poor classification robustness observed on complex datasets.

  3. The introduction of a blockchain incentive mechanism markedly improves the framework's attractiveness to medical institutions and encourages owners of high-quality datasets to participate in training. Medical institution clients receive token rewards in proportion to their training dataset contributions.

The rest of this paper is organized as follows: Sect. 2 introduces related work; Sect. 3 presents the FedIn-MC framework; Sect. 4 describes the experiments and performance analysis of FedIn-MC; and Sect. 5 draws conclusions and outlines future research.

2 Related work

Widely used in machine learning tasks, prototype learning is a classic and representative method in pattern recognition [19]. Yu et al. [15] apply prototype learning to image classification, where a prototype represents a class and is computed as the mean of the feature vectors within that class.

The method of He et al. [20] relates to the contrastive loss for unsupervised visual representation learning, which guides CNNs to learn more discriminative representations. Wen et al. [21] adopt a center loss to improve the performance of softmax-based CNNs, but the centers are not learned directly from the data; they are updated according to predetermined rules and cannot be learned synchronously with the CNN. Huang et al. [22] train a CNN-based encoder to extract visual representations, converting image features into coherent semantics and aggregating similar visual semantics into the same image features. Lakshmanaprabu et al. [23] use aggregated local features as descriptors for image retrieval, improving the model's retrieval ability. Wieting et al. [24] use the average word embedding as a sentence representation and achieve competitive performance on multiple NLP benchmarks. Furthermore, Hoang et al. [25] adopt prototype learning to represent task-irrelevant information in distributed machine learning. All of these studies leverage a fusion paradigm that integrates related prototypes to generate new models for new tasks, and they widely apply prototype learning to tasks with limited training samples to ensure better discrimination of the learned representations [26]. Consequently, this paper combines federated learning with prototype learning to integrate feature representations from different dataset distributions effectively, projecting samples to a specific region in the feature space, i.e., near their prototype.

The goal of federated learning is to train a global model on a centralized server while all data remain distributed across multiple local clients, owing to privacy or communication concerns [27, 28]. These advantages make federated learning a promising solution for smart healthcare, breaking data barriers between institutions and stimulating collaborative training [29, 30]. Lim et al. [31] outline federated learning application scenarios in biomedicine, confirming its feasibility in smart medical care. Kan et al. [32] show that local model training under the federated learning mechanism yields experimental results with high accuracy and reliability, and that the reduced training time ultimately improves the learning effect. Looking toward future digital health, Rieke et al. [33] explore how federated learning can solve current problems in smart medical care. However, traditional federated learning frameworks still lack user incentives and fail to attract medical clients with high-quality datasets [34, 35], which hinders the training of an efficient model.

At present, blockchain technology is used in all walks of life [36, 37]. More specifically, the combination of federated learning and blockchain is widely used in the medical field [38, 39] and generally overcomes the lack of incentives in traditional federated learning frameworks. Kang et al. [40] apply an incentive mechanism in a reliable federated learning scheme to protect client data and achieve efficient computation. Nishio et al. [41] filter client data sources and combine various blockchain incentive mechanisms, focusing on model training efficiency and federated learning verification. Kim et al. [42] add a mechanism to federated learning for verifying contributions and providing corresponding rewards, and also discuss the discrepancies introduced by update delays. Liu et al. [43] adopt the Shapley value to calculate contribution degrees and introduce game-theoretic principles into the incentive mechanism, so that all clients participating in training are allocated a fair token value. Rehman et al. [44] introduce fine-grained reputation awareness to encourage more edge computing servers to participate in model training, improving the final training effect. These papers introduce blockchain incentive mechanisms into federated learning to address the small quantity and low quality of training datasets.

However, the above studies do not consider practical application. Despite potentially high accuracy in theory on specific datasets, actual performance may degrade significantly. Moreover, although computer vision tasks are executed on top of CNN models, these works generally ignore the robustness defects of the models themselves. The proposed FedIn-MC framework is meant to be applied in real medical settings. First, it addresses the inefficiency and limited robustness of medical image classification. Second, it greatly improves medical image data sharing and data quality. In sum, the framework realizes local medical data storage and safe sharing while encouraging more participants to join.

3 Methodology

3.1 DPL loss function

Features learned automatically from data usually yield a better classification effect. In this paper's framework, a CNN is adopted as the feature extractor, denoted \(f(x,\theta )\), where x and \(\theta \) represent the original input and the CNN parameters, respectively. Whereas traditional CNNs use a softmax layer to classify the learned features linearly, our model learns prototypes on the features of each class for classification. Denote a learned prototype as \(m_{ij}\), where \(i \in \{1,2,3,\ldots ,I\} \) is the class index and \(j \in \{1,2,3,\ldots ,J\} \) is the prototype index within each class. The set of prototypes is \(M=\left\{ m_{ij}\big | i=1,2,\ldots ,I;j=1,2,\ldots ,J\right\} \), and each class is set to have the same number of prototypes. At the classification stage, a sample is matched to a prototype: the nearest prototype is found via the Euclidean distance, and the sample is assigned the class of that prototype, as shown in Fig. 1 (a small sketch follows the figure).

Fig. 1
figure 1

DPL function example
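
To make this matching step concrete, the following minimal NumPy sketch assigns each sample to the class of its nearest prototype. The function name and array shapes are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def predict_class(features, prototypes):
    """Assign each sample to the class of its nearest prototype.

    features:   (B, D) array of CNN feature vectors f(x).
    prototypes: (I, J, D) array, J prototypes for each of I classes.
    Returns:    (B,) array of predicted class indices.
    """
    I, J, D = prototypes.shape
    flat = prototypes.reshape(I * J, D)
    # Euclidean distance from every sample to every prototype
    dists = np.linalg.norm(features[:, None, :] - flat[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)      # flat index of the closest prototype
    return nearest // J                 # flat prototype index -> class index
```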

First, the distance-based cross-entropy (DCE) loss is used to measure the similarity between a sample and a prototype. The probability \(p\left( x\in m_{ij}\big | x\right) \) that a sample (x, y) belongs to the prototype \(m_{ij}\) can therefore be measured by distance:

$$p\left( x\in m_{ij}\big | x\right) \propto -\Vert f(x)-m_{ij}\Vert $$
(1)
$$p\left( x\in m_{ij}\big | x\right) =\frac{e^{-\lambda d(f\left( x\right) ,m_{ij})}}{\sum _{a=1}^{I}\sum _{b=1}^{J}e^{-\lambda d(f\left( x\right) ,m_{ab})}}$$
(2)

where \(d(f\left( x\right) ,m_{ij})\) denotes the distance between f(x) and \(m_{ij}\), and \(\lambda \) is a hyper-parameter that controls the hardness of the probability assignment. From this definition, the DCE loss \(l(\left( x,y\right) ;\theta ,M)\) is obtained as follows:

$$l\left( \left( x,y\right) ;\theta ,M\right) =-\log p\left( y\big | x\right) $$
(3)
$$p\left( y\big | x\right) =\sum _{j=1}^{J}{p\left( x\in m_{yj}\big | x\right) }$$
(4)
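
As a minimal NumPy sketch of Eqs. (2)-(4), batch-averaged and taking the Euclidean norm of Eq. (1) as \(d\) (function names and shapes are illustrative):

```python
import numpy as np

def dce_loss(features, prototypes, labels, lam=1.0):
    """Distance-based cross-entropy loss, Eqs. (2)-(4).

    features:   (B, D) feature vectors f(x).
    prototypes: (I, J, D) prototypes m_ij.
    labels:     (B,) true class indices y.
    lam:        the hyper-parameter lambda of Eq. (2).
    """
    B = features.shape[0]
    I, J, D = prototypes.shape
    flat = prototypes.reshape(I * J, D)
    d = np.linalg.norm(features[:, None, :] - flat[None, :, :], axis=-1)
    logits = -lam * d
    # Softmax over all I*J prototypes gives p(x in m_ij | x), Eq. (2)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p = p.reshape(B, I, J)
    # p(y|x) sums the probabilities of the true class's prototypes, Eq. (4)
    p_y = p[np.arange(B), labels].sum(axis=1)
    return -np.log(p_y + 1e-12).mean()  # Eq. (3), averaged over the batch
```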

Minimizing this loss thus maximizes the probability of correct classification: it minimizes the distance between a sample's features and the prototypes of its true class while maximizing the distance between those features and the prototypes of all wrong classes. However, directly minimizing the classification loss can lead to overfitting, so the prototype loss is introduced as a regularizer to improve the generalization performance of DPL. It is defined as:

$$pl\left( \left( x,y\right) ;\theta ,M\right) ={\Vert f(x)-m_{yj}\Vert }^2_2$$
(5)

Combining the above losses to train the classification model, the total loss \(\text{ DPL }\left( \left( x,y\right) ;\theta ,M\right) \) is defined as:

$$\text{ DPL }\left( \left( x,y\right) ;\theta ,M\right) = l\left( \left( x,y\right) ;\theta ,M\right) +\alpha \cdot pl\left( \left( x,y\right) ;\theta ,M\right) $$
(6)

where \(\alpha \) is a hyper-parameter that controls the weight of the prototype loss. Using PL further improves the performance of the DPL framework. On the one hand, PL pulls sample features toward their corresponding prototypes, reducing intra-class distances and thereby indirectly increasing inter-class distances. On the other hand, the classification loss emphasizes the separation of representations while the prototype loss emphasizes their compactness, so combining the two enhances the robustness of the framework.
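
Continuing the sketch above and reusing its dce_loss, Eqs. (5) and (6) might be implemented as follows. Pulling toward the nearest prototype of the correct class in Eq. (5), and the default value of \(\alpha \), are our assumptions:

```python
def prototype_loss(features, prototypes, labels):
    """Prototype loss (PL), Eq. (5): squared Euclidean distance to a
    prototype of the true class (here, the nearest one)."""
    own = prototypes[labels]                               # (B, J, D)
    d2 = ((features[:, None, :] - own) ** 2).sum(axis=-1)  # (B, J)
    return d2.min(axis=1).mean()

def dpl_loss(features, prototypes, labels, lam=1.0, alpha=1.0):
    """Total DPL loss, Eq. (6): DCE plus alpha-weighted PL."""
    return (dce_loss(features, prototypes, labels, lam)
            + alpha * prototype_loss(features, prototypes, labels))
```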

3.2 Incentive mechanism

The proposed incentive mechanism is based on the contributions of medical institution clients: each institution is rewarded with tokens according to the size of the dataset it contributes to model training. First, when the FedIn-MC framework starts a round of training, the central server selects a fixed number of medical institution servers and stores their blockchain addresses in Clist. At the same time, the central server defines the Training structure, which stores the training tasks completed by each medical institution server, including the training set size and the status of the training task.

The selected client servers then train the model locally, and the information of the current training round is uploaded to Training and stored synchronously in Contrib. Finally, according to the data in Contrib, tokens are distributed to each medical institution server address in Clist: each client's reward is its training set size multiplied by a constant rate r, and the total payout is the sum of these products. The incentive mechanism encourages more medical institutions to participate in FedIn-MC training, thereby reducing the bias of the classification model. The procedure is given in Algorithm 1, followed by a minimal sketch.

Algorithm 1
figure a

Incentive mechanism
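
As a plain-Python illustration of the bookkeeping in Algorithm 1 (in the deployed system this logic would run on the Ethereum chain), the sketch below follows the Clist/Training/Contrib description above; the field layout, type names, and sample values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Training:
    """One client's record of a completed training task."""
    address: str         # blockchain address stored in Clist
    dataset_size: int    # size of the local training set
    completed: bool = False

def distribute_tokens(contrib, r):
    """Reward each medical institution in proportion to its
    training set size: tokens = dataset_size * r."""
    return {rec.address: rec.dataset_size * r
            for rec in contrib.values() if rec.completed}

# Example round with reward rate r = 0.01
contrib = {
    "c1": Training("0xA1", 5000, True),
    "c2": Training("0xB2", 3000, True),
    "c3": Training("0xC3", 8000, False),  # task not finished: no reward
}
print(distribute_tokens(contrib, r=0.01))  # {'0xA1': 50.0, '0xB2': 30.0}
```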

3.3 FedIn-MC framework

Since multiple medical institutions jointly participate in model training, during each round of federated learning the central server selects a specific number of medical institutions to join the training according to predefined conditions (such as client idle time and charging status). For the medical institution servers, let each medical client's dataset be \({\text{ data }}_{n,i}=\left\{ {x_{n,i},y_{n,i}}\right\} \) and the total dataset participating in the training be \(\text{ Data }=\sum _{n\in N}\sum _{i\in I}{\text{ data }}_{n,i}\), where \(i\in I\) indexes samples and \(n \in N\) indexes medical institutions. During local training on the client side, the objective function of medical institution n's server is denoted \(F_{n}\), the loss is computed as \(\text{ DPL }\left( \left( x,y\right) ;\theta ,M\right) \), and the model parameters \(w_{n}^{k}\) are obtained by minimizing it:

$$\begin{aligned} F_n \triangleq \min {\left( {\frac{1}{N}}\cdot \sum _{n\in N} \text{ DPL }\left( \left( x,y\right) ;\theta ,M\right) \right) } \end{aligned}$$
(7)

Then, each medical institution server performs k rounds of stochastic gradient descent (SGD) and uploads the updated parameters \(w_n^k\) to the central server. Finally, the central server computes the weighted average of the \(w_n^k\) according to the training dataset sizes:

$$w^k=\sum _{n=1}^{N}\frac{\left| {\text {data}}_n\right| }{\left| \text {Data}\right| }\cdot w_n^k$$
(8)
$$\nabla f(w^k)=\frac{1}{N}\cdot \sum _{n=1}^{N}\nabla f(w^k,x_n,y_n)$$
(9)

In FedIn-MC, the model is computed iteratively until the inequality \(\left| f\left( w^k\right) -f(w^{k-1})\right| \le \delta \) is satisfied, where \(\delta \in R\) is the accuracy tolerance. Different medical institutions act as the system's clients, each holding a local dataset that never leaves its premises. Meanwhile, a trusted third party with sufficient computing power acts as the central server and provides the computing support for aggregating the global model. The FedIn-MC procedure is given in Algorithm 2, followed by a minimal sketch.

Algorithm 2
figure b

FedIn-MC
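
A minimal Python sketch of one FedIn-MC communication round (cf. Algorithm 2), assuming each client exposes a local training routine and that model weights are single NumPy arrays:

```python
def fedin_mc_round(global_w, clients, k_local, delta, prev_loss):
    """One communication round of FedIn-MC.

    clients: list of (local_train_fn, dataset_size) pairs; each
             local_train_fn runs k_local SGD steps on the DPL loss,
             starting from global_w, and returns (w_n, local_loss).
    """
    total = sum(size for _, size in clients)
    updates, weighted_loss = [], 0.0
    for train_fn, size in clients:
        w_n, loss_n = train_fn(global_w, k_local)    # local SGD, Eq. (7)
        updates.append((w_n, size))
        weighted_loss += loss_n * size / total
    # Dataset-size-weighted average of the client models, Eq. (8)
    new_w = sum(w_n * (size / total) for w_n, size in updates)
    # Stopping rule: |f(w^k) - f(w^{k-1})| <= delta
    converged = abs(weighted_loss - prev_loss) <= delta
    return new_w, weighted_loss, converged
```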

4 Experiments and results

4.1 Experimental setting

This project uses Ethereum as the underlying blockchain network with Proof of Work (PoW) consensus, and MySQL 5.7.1 to store the off-chain data. The federated learning framework is deployed on 7 servers, of which 1 is the central server and 6 are client servers. Assuming each client server belongs to one medical institution, each medical institution server runs a four-layer neural network for medical data classification. During training, the initial learning rate is 0.005, the momentum is 0.5, and the hyper-parameter in DPL is set to 1.0.
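
For reference, the stated settings can be summarized in a configuration sketch; the key names are illustrative, and parameters the paper does not report (optimizer details, batch size, epochs) are omitted:

```python
# Experimental setup of Sect. 4.1 (illustrative key names)
config = {
    "blockchain": "Ethereum",
    "consensus": "PoW",
    "offchain_db": "MySQL 5.7.1",
    "num_servers": 7,            # 1 central server + 6 clients
    "num_clients": 6,            # one per medical institution
    "model": "4-layer neural network",
    "learning_rate": 0.005,      # initial
    "momentum": 0.5,
    "dpl_hyperparameter": 1.0,
}
```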

The experiments use a CT image dataset of COVID-19 [45] and the benchmark dataset MNIST. The COVID-19 dataset is collected from several public databases and recently published articles. After screening, the entire dataset is divided into training, test, and validation sets; the division is summarized in Table 1.

Fig. 2
figure 2

Incentive registry

Table 1 Introduction of COVID-19 dataset

The experiment stores the status of each federated training round in the incentive registry. As shown in Fig. 2, the registry records the training status of each medical institution client in that round, including the client's data size and the number of tokens obtained.

The evaluation indicators are the accuracy rate (Accuracy), the false rejection rate (FRR), and the false acceptance rate (FAR). Accuracy reflects the classification accuracy of the model, while FRR and FAR verify its stability. The formulas are as follows:

$$\text {Accuracy} = \frac{\text {TP}_n + \text {TN}_n}{\text {TP}_n + \text {FP}_n + \text {TN}_n + \text {FN}_n}$$
(10)
$$\text {FRR} = \frac{\text {FN}_n}{\text {TP}_n + \text {FN}_n}$$
(11)
$$\text {FAR} = \frac{\text {FP}_n}{\text {FP}_n + \text {TN}_n}$$
(12)

where TP\(_n\), TN\(_n\), FP\(_n\), and FN\(_n\) represent the True Positives, True Negatives, False Positives, and False Negatives of the corresponding medical institution client n, respectively.
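
A small Python helper shows how Eqs. (10)-(12) follow from one client's confusion counts; the numbers in the usage example are made up:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, FRR, and FAR from client n's confusion counts, Eqs. (10)-(12)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    frr = fn / (tp + fn)  # genuine samples wrongly rejected
    far = fp / (fp + tn)  # abnormal samples wrongly accepted
    return accuracy, frr, far

print(metrics(tp=990, tn=980, fp=20, fn=10))  # (0.985, 0.01, 0.02)
```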

4.2 Results and discussion

4.2.1 Analysis of classification accuracy

As shown in Figs. 3 and 4, the FedIn-MC framework outperforms the traditional FedAvg framework in classification accuracy: with the DPL loss, model accuracy on the COVID-19 and MNIST datasets increases by 1.12 and 0.3 percentage points, respectively. This indicates that the DPL loss minimizes the distances between sample features and the correct-class prototypes while maximizing the distances to the wrong-class prototypes, thereby maximizing the probability of correct image classification. The FedIn-MC framework performs well on different datasets, showing promise for better training efficiency and model universality, and a clear advantage in medical data training.

Fig. 3
figure 3

Accuracy values and loss values for the COVID-19 dataset

Fig. 4
figure 4

Accuracy values and loss values for the MNIST dataset

Table 2 The FAR and FRR values for methods

As shown in Table 2, to test the robustness of the FedIn-MC framework, this project trains the model on the MNIST dataset and evaluates the network with two test sets (MNIST and COVID-19). Since the COVID-19 test samples are not digits, their results are abnormal and the model should reject them; by contrast, the MNIST test data come from the same domain as the training data, and the model should accept the corresponding results. To evaluate the classification effect fairly, the experiments use two measurement indicators, FAR and FRR: here FAR denotes the percentage of MNIST samples that are accepted, while FRR denotes the percentage of COVID-19 samples that are rejected. A comparison of the results in the table shows that the traditional softmax-based model fails to achieve high FAR and FRR simultaneously, whereas the DPL-based model in this paper achieves both, demonstrating stable abnormality detection. The model rejects more than 99% of the abnormal COVID-19 samples while accepting more than 99% of the normal MNIST samples, confirming the significant robustness advantage of the FedIn-MC framework.

Table 3 The accuracy of classification on added category

To demonstrate the superiority of the FedIn-MC framework in medical data classification, this project conducts an incremental experiment on the COVID-19 dataset. The experiment uses the COVID-19 samples (Normal, Viral, and COVID-19 classes) as known-class data and treats the ten MNIST categories as unknown classes: in each trial, one MNIST category is selected and combined with the COVID-19 test samples to form a new test set. As Table 3 shows, the FedIn-MC framework maintains high classification accuracy on the incremental test samples. No part of the model is retrained; the classification test is simply added on top of the originally trained model, proving that the framework has a robust classification advantage.

4.2.2 Analysis of communication cost

This project compares the communication overhead of FedAvg (softmax) and FedIn-MC (DPL). In each round of the experiment, N medical institution clients are randomly selected to participate, and the communication overhead of 100 rounds of system training is measured.

As shown in Table 4 and Fig. 5, an incremental experiment measured the communication overhead of exchanging model parameters between the clients and the central server during global model training. The results show that the communication overhead of softmax-based federated learning is much higher than that of the DPL-based FedIn-MC. In particular, as the number of medical institutions increases and the dataset grows, the proposed framework effectively reduces communication overhead.

Table 4 Comparison of communication cost
Fig. 5
figure 5

The communication cost for different amounts of clients

Given the above experimental results, the FedIn-MC framework can motivate data owners of all parties to join model training, break the "data islands" situation while ensuring data privacy, and improve data classification, which is a positive sign for the development of intelligent medical care.

5 Conclusions and future work

This paper adopts federated learning to address the insufficiency of medical data and the difficulty of sharing it, so as to promote cooperation among medical institutions. Drawing on prototype learning, the traditional softmax function is replaced with the DPL function, which directly learns multiple prototypes per class in the convolutional feature space and employs a prototype loss to shorten intra-class distances, improving model robustness. Meanwhile, to increase the quantity and quality of datasets, a blockchain incentive mechanism is introduced into the framework to attract more medical institution participants. In the future field of smart medical care, federated learning can be combined with other new technologies to increase model accuracy, reduce data-sharing risks, and provide more innovative ideas for secure medical data sharing. In subsequent studies, we plan to conduct more detailed experiments combining blockchain and federated learning, improving the model's adaptability to changing environments and the credibility of its aggregation process.