Introduction

Most countries in the world are currently facing a severe shortage of healthcare workers and medical resources [1]. Studies have shown that more than 10 million people die of cancer each year worldwide, with developing countries accounting for more than 80% of these deaths [2]. One of the major reasons for this problem is the shortage of healthcare professionals and the low level of diagnostic automation in developing countries [3]. In addition, the equipment needed to diagnose cancer is expensive and resource intensive, which makes it difficult for hospitals in developing countries or less developed regions to afford such tests [4].

Magnetic resonance imaging (MRI) is one of the common methods used in hospitals to examine tumors [5,6,7]. Cancer diagnosis relies on manual recognition of images for screening [8]. However, the volume of patient image data is huge, and much of it is redundant [9]. Inadequate MRI equipment and insufficient MRI data for osteosarcoma can delay diagnosis and treatment [10]; inefficient manual identification causes similar delays. In addition, cancer diagnosis lacks a uniform definition and requires solid expertise [11]. The resulting subjectivity increases the rate of misdiagnosis by inexperienced physicians [12]. For example, the average accuracy of manual segmentation by physicians is about 90%, whereas the accuracy for less experienced physicians is only 85%.

At this stage, research on automatic cancer diagnosis in developing countries is of great significance [15]. Because they delineate tumor region boundaries precisely and preserve shape information, deep learning-based segmentation networks facilitate the comparison of tumor regions before and after medication [16]. However, tumors vary widely in location, shape, and scale across images, resulting in poor model interpretability [17]. Meanwhile, osteosarcoma image data contain noise, some images have low resolution, and the accuracy of models such as convolutional neural networks (CNNs) is still unsatisfactory [18]. The emergence of artificial intelligence algorithms has effectively increased the accuracy of image segmentation [19]. However, implicit features in an image cannot be extracted manually, and training a multi-feature classifier is time-consuming [20]. Many studies have used more complex and deeper networks to improve segmentation. This improves the generalization ability of the model to some extent, but performance is still far from the clinical standard [21]. Moreover, a complex network increases training time and reduces training efficiency [22, 23]. The existing techniques therefore do not achieve the expected results.

Although existing algorithms have mitigated the shortcomings of traditional methods to a certain extent, new problems have emerged, which are summarized in the following three points:

(1) The features that intelligent algorithms learn from must be extracted manually, yet the implicit features in an image cannot be extracted manually.

(2) Feature extraction from osteosarcoma MRI images is complex because the images are noisy and of low resolution, which can easily lead to overfitting of the model.

(3) Networks that achieve high segmentation accuracy through complex structures not only reduce training efficiency but also place high demands on equipment.

To address these problems, this paper proposes a deep convolutional neural network segmentation model based on noise reduction and super-resolution reconstruction (NSRDN). We first pre-screen the initial image dataset, using a Differential Activation Filter (DAF) to separate valid from invalid images. This reduces computational cost and wasted resources and improves segmentation accuracy. In addition, some osteosarcoma MRI images were acquired early and were limited by the acquisition equipment, so their resolution may be low and the texture of fine details unclear. We therefore improved the IMDN algorithm and combined it with an adaptive cropping method to super-resolve the images, adding plausible image detail while ensuring that the original details are not lost. We then designed the segmentation model with the complexity of tumor shape and the uncertainty of tumor location in mind, using a high-resolution network (HRNet) to segment tumor images of different sizes. To ensure the robustness and stability of the model, we rotated the input data by different angles before segmentation to expand the training data. The input image is also binarized, which improves segmentation efficiency, alleviates blurred segmentation results, and yields a more accurate tumor segmentation image that is convenient for doctors' diagnosis and treatment. NSRDN is the first method to improve the accuracy of osteosarcoma MRI segmentation from the perspective of improving image resolution; it can play an important role in the auxiliary diagnosis, treatment, and prognosis of osteosarcoma and has good prospects and application value.

The specific contributions of this study are as follows:

(1) Computer technology can assist doctors in qualitative and even quantitative analysis of lesions and other areas of interest, thereby greatly improving the accuracy and reliability of medical diagnosis. According to the characteristics of osteosarcoma MRI images, we propose a new osteosarcoma-assisted segmentation method that provides doctors with intuitive analysis results. We improve segmentation accuracy from the perspective of improving image resolution, which is innovative compared with traditional medical image segmentation schemes.

(2) The original datasets are relatively coarse and mix useless with useful data, so we process the segmentation data according to the characteristics of the dataset. We first perform coarse initial denoising on each image and then screen the images using a differential activation filter (DAF). Detecting whether a lesion area exists within each image window block reduces the amount of computation and improves segmentation accuracy.

(3) The osteosarcoma MRI dataset is noisy and of low resolution, and tumors differ greatly in shape and position. This paper addresses the low resolution of osteosarcoma MRI. Based on the size and shape characteristics of the osteosarcoma MRI dataset, we improve the IMDN algorithm and use the adaptive cropping-based IMDN_AS algorithm for super-resolution reconstruction. This enhances image detail while preserving image features and avoids adding invalid features, so that the segmentation network achieves better performance and segmentation results.

(4) Because the processed images have high resolution and low noise, we do not blindly pursue network depth and complexity. Instead, we design a high-resolution network (HRNet) suited to the high-resolution characteristics of the processed input. We use this network for image segmentation, which preserves the high-resolution features of the image. To extract more refined semantic features, parallel convolution and feature fusion are used; these accurately identify the tumor region and its boundary with high segmentation efficiency.

(5) More than 4000 samples were collected from the Artificial Intelligence Research Center for experimental analysis. The experimental results show that the proposed method is superior to other existing methods in terms of segmentation accuracy and effectiveness. Clinicians can use our segmentation results to help determine the tumor area and lesion location, which greatly reduces their workload and improves the accuracy of diagnosis and treatment.

Related work

To enhance the segmentation of tumor images, a series of deep learning-based solutions have been proposed. These solutions aim to automatically predict the segmentation maps of tumor images, thereby assisting medical professionals in clinical examination.

Convolutional neural networks (CNNs) use techniques such as local connectivity, weight sharing, multi-level feature extraction, data augmentation, and regularization to extract image features efficiently. This makes it possible to tackle complex image tasks, and CNNs have become the primary choice in this field due to their outstanding performance and accuracy [24]. One approach models semantic segmentation with CNNs as follows: a convolutional encoder generates a latent representation in the form of feature maps, which are then passed to a fully connected layer to generate pixel-level segmentation predictions. However, the fully connected layers discard spatial relationships and destroy spatial information. Fully convolutional networks (FCNs) were proposed as an alternative; they also predict a segmentation map based on pixel classification [25]. FCNs capture semantic representations through several convolution blocks, composed of convolutional layers, activation function layers, and pooling layers, on the encoder path and provide pixel-level predictions using convolutional layers and upsampling operations on the decoder path. Most importantly, FCNs have inspired many medical image segmentation models that adopt similar ideas, the most important of which is the U-Net model [26]. Previous studies have extended this model to the segmentation of multi-class pathological images with good results [27]. However, FCNs cannot capture global information. Literature [28] proposes a context module, while literature [29] proposes a hybrid dilated convolutional framework, but both approaches still fall short in extracting global features. Recently, transformer-based architectures have become prominent in semantic segmentation tasks and have demonstrated better performance than FCN-based alternatives.

Researchers have recently applied transformers, which have achieved notable success in natural language processing, to computer vision tasks. Chen et al. [30] employed a sequence transformer for pixel recursive prediction. Compared to CNNs, sequence transformers can more effectively utilize relationships between pixels and can handle images of arbitrary sizes. The advantage of this method lies in its ability to predict each pixel in the image, resulting in a more detailed image. ViT, another popular vision transformer model, directly applies a pure transformer architecture to the sequence of image patches, achieving image recognition through classification of the entire image. It does not require predefined convolutional kernels but instead uses self-attention mechanisms to learn representations of the image. This approach was proposed by Dosovitskiy et al. [31] and has achieved state-of-the-art performance on various image recognition benchmarks. Beyond image classification, transformers have also been widely used for other visual problems, such as object detection [32] and semantic segmentation [33], which assigns each pixel in the image to a specific object or region and thereby achieves more fine-grained image understanding and analysis. Transformers can also be applied to image processing [34], such as image enhancement, denoising, and deblurring, to improve image quality and enhance image details. These models not only demonstrate excellent computational efficiency and accuracy but also showcase their versatility and flexibility for image processing tasks. As a result, an increasing number of researchers have proposed transformer-based models to enhance various visual tasks.

Through global attention, the transformer can extract features with a global receptive field from the first layer of the architecture. It has strong learning capabilities that allow it to capture long-range dependencies and contextual information from the input sequence. One notable example is medical image segmentation, where researchers have developed hybrid architectures that combine FCNs with transformers or attention mechanisms [35]. Despite their promise, these integrated frameworks do not outperform the leading pure FCN and Transformer models in this task, especially HarDNet-DFUS [36] and SSFormer [37], which respectively represent the most advanced FCN and Transformer architectures. FCBFormer [38], as one way of integration, is effective at establishing global dependencies and obtaining contextual information and achieves a high level of overall performance. However, for complex boundaries in tumor areas, it is often difficult to preserve complete shape information and achieve good segmentation boundary accuracy.

Owing to the intricate morphology of neoplasms, the boundaries of some small local areas are often blurred. In such cases, boundary enhancement becomes a crucial measure to preserve more boundary information. Meng Zhao [39] put forth an edge sharpening technique grounded on the mathematical morphology and Canny operator, which selectively enhances edges by extracting edge information as weights. This method performs well for segmentation edge blurring caused by weak contrast, but is less suitable for more complex situations, such as the blurring of segmentation boundaries due to difficulty in segmenting small local areas. Yang-Mao [40] proposed a mean vector disparity intensifier to quell stochastic gradients and amplify the gradients of entity outlines. This method requires a certain understanding of the segmentation target and a clever parameter setting for the color intensity of edge points. Additionally, it also relies on enhancing contrast for edge enhancement and does not break free from the limitations of the previous method. Therefore, a universal and well-performing edge enhancement method remains a challenge.

In addition, the use of deep learning algorithms against COVID-19 has also been applied in recent years. Badshah et al. introduced a variety of machine learning algorithms for the detection of COVID-19 in X-ray images, and the results showed that the decision tree achieves a good prediction [41]. Hybrid approaches between metaheuristics and machine learning have also been the focus of many researchers. Such studies have successfully combined machine learning and swarm intelligence algorithms and have achieved outstanding results in many fields. For example, to address the problem that deep models are easy to overfit, Bacanin et al. [42] proposed an improved firefly algorithm, which compensates for the shortcomings of the original firefly algorithm by showing the exploration mechanism and chaotic local search strategy. The method performs better in preventing overfitting of CNNs. Meanwhile, Bacanin et al. [43] also proposed an automated framework based on the hybrid sine–cosine algorithm to solve the problem of overfitting deep models, which is mainly achieved by choosing suitable values for the regularization parameter loss. Malakar et al. [44], on the other hand, proposed a hierarchical feature selection model (HFS) based on genetic algorithms to address the problems of redundancy and irrelevance in feature selection in deep learning algorithms. The HFS model is able to improve the performance of handwritten word recognition techniques. Zivkovic [45] tuned the hybrid CNN and XGBoost model to improve the recognition accuracy of the model. This method also performs well in the COVID-19 diagnosis of X-ray images.

It can be seen that a large number of disease diagnoses and prognostic processing have been delivered to the intelligent medical system for completion. Triggering mechanisms for medical decision-making systems have thus received attention. Song, Xiaona et al. [46] proposed a dual-event-triggered control algorithm based on a spatio-temporal sampled data scheme to reduce the consumption of transmission resources; Peng [47] designed a novel point-by-point controller to reduce the number of sensors required; Wang and Rui et al. [48] adapted the controller to minimize the occurrence of faults through Q-learning.

However, because MRI images have low resolution and are susceptible to noise, conventional processing methods often struggle to achieve high segmentation accuracy. We therefore propose a deep convolutional neural network segmentation model based on noise reduction and super-resolution reconstruction for the segmentation of osteosarcoma images. This method improves the segmentation of osteosarcoma MRI images through strategies such as data preprocessing and image super-resolution reconstruction, making it more suitable for clinical applications.

Materials and methods

With advances in medical and computational disciplines, an increasing number of complex medical conditions have been solved by computer-based means [49,50,51]. However, healthcare resources are still inequitably distributed and this poses a huge challenge for developing countries in the area of automated cancer diagnosis [52]. Traditionally, cancer diagnosis has relied on the manual examination of medical images by medical professionals with specialist knowledge. However, the large number of images generated for each patient, each containing a vast amount of information, makes it extremely inefficient for doctors to manually identify and diagnose these images [53]. Furthermore, the complex background of osteosarcoma MRI and the difficulty in distinguishing tumour boundaries make it easy for doctors to make misjudgements when performing manual screening tests [54, 55]. Therefore, our goal is to use computer vision techniques to filter key information for doctors, reducing their workload and improving diagnostic efficiency, which is important for the automated diagnosis of cancer in developing countries. The synthesized configuration of the proposed image segmentation method is shown in Fig. 1.

Fig. 1 System design frame diagram

This section is divided into three parts. In Sect. “Dataset preprocessing”, we perform data optimization and preprocessing on osteosarcoma MRI images to screen useful data slices. In Sect. “Image super-resolution reconstruction”, we perform super-resolution reconstruction on the processed data. Section “Image segmentation network design” deals with segmentation of osteosarcoma MRI images. By marking the suspicious location of the tumor, the method helps doctors accurately judge the degree of soft tissue infiltration and the treatment effect.

Table 1 lists some of the symbols used in this paper.

Table 1 Some symbols and their meanings

Dataset preprocessing

We examined the initial MRI dataset and found that not all of it was suitable for training. Some images had tiny tumor areas, and some had no tumor areas at all. Directly using all images for segmentation would increase the cost of segmentation and degrade the segmentation results. However, manual screening of image data is time-consuming and labor-intensive, and its accuracy is not high. Therefore, we need an automated way to separate useful data from useless data.

We use Differential Activation Filters (DAF) [56] to partition the dataset to address this issue. Due to the influence of noise on the screening accuracy, DAF firstly uses discrete Fourier transform (DFT) [57] to initially denoise the high-frequency noise of osteosarcoma MRI images, then it calculates and determines the lesion area. The specific process is:

Let \(h\left(x\right)=f\left(x\right)+g\left(x\right)\) be the pixel value observed by MRI of osteosarcoma, where \(f\left(x\right)\) is the actual image pixel and \(g(x)\) is the inherent noise value. Here \(x\in X\) denotes a coordinate in the two-dimensional image space, \({W}_{x}\) denotes the image window block of size \({a}_{1}\)×\({a}_{1}\), and \(x\) is the center position of the window block.

$$Q\left(x\right)=q\left(\left|\frac{\sum_{i=1}^{{a}_{1}}\sum_{j=1}^{{a}_{1}}\left(\overline{{W }_{x}}\left(i,j\right)-{\lambda }_{T}\right)}{{a}_{1}^{2}}\right|\right)$$
(1)

where \({\lambda }_{T}\) is the screening threshold, a hyperparameter of the method. If \(k<0\), the activation function \(q=1\); otherwise, \(q=0\). During window block scanning of MRI images, rough initial noise reduction is performed on the scanning area \({W}_{x}\). The denoised result \(\overline{{W }_{x}}\) is defined as:

$$\overline{{W }_{x}}={\Upsilon }_{2D}^{-1}\left({\text{t}}\left({\Upsilon }_{2D}\left({W}_{x}\right),{\lambda }_{fft2}\upmu \sqrt{2{\text{log}}\left({a}_{1}^{2}\right)}\right)\right)$$
(2)

\({\Upsilon }_{2D}\) represents the two-dimensional discrete Fourier transform (DFT) operator and \({\Upsilon }_{2D}^{-1}\) its inverse, both computed here with the fast Fourier transform (FFT) [58]; \({\lambda }_{fft2}\) is a fixed threshold parameter. The adjustment factor \(\mu \) is set according to the amount of noise contained in the specific dataset and can be fine-tuned based on the differences between our osteosarcoma MRI images. We define \(t\) as:

$$t\left(l,\lambda \right)=\left\{\begin{array}{c}\lambda , if\quad \left|l\right|>\lambda \\ 0,\quad otherwise\end{array}\right.$$
(3)
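As a minimal sketch of the threshold in Eq. (2) and the function \(t\) in Eq. (3), the following Python code computes \({\lambda }_{fft2}\mu \sqrt{2\log ({a}_{1}^{2})}\) and applies \(t\) to a list of transform coefficients. The function names and the sample values of \({\lambda }_{fft2}\), \(\mu \), and \({a}_{1}\) are illustrative assumptions rather than settings reported in this study, and the transform itself is omitted.

```python
import math


def universal_threshold(a1, lam_fft2, mu):
    """Threshold lam_fft2 * mu * sqrt(2 * log(a1^2)) for an a1 x a1 window (Eq. 2)."""
    return lam_fft2 * mu * math.sqrt(2.0 * math.log(a1 ** 2))


def t(l, lam):
    """Threshold function written exactly as in Eq. (3)."""
    return lam if abs(l) > lam else 0.0


# Hypothetical settings: 8 x 8 window, lam_fft2 = 1.0, mu = 0.5.
lam = universal_threshold(8, 1.0, 0.5)

# Coefficients may be complex after a Fourier transform; abs() gives the magnitude.
coeffs = [0.1, -3.0, 2.5 + 1.0j, 0.0]
shrunk = [t(c, lam) for c in coeffs]
```

The sketch follows Eq. (3) as written; a conventional hard-thresholding operator would instead return the coefficient \(l\) itself when \(|l|>\lambda \).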

During window block scanning, rough initial noise reduction is performed on the captured part of the MRI image, further calculation follows, and the result is obtained from the activation function. When \(q=1\), there is a tumor area at the corresponding position of the DAF window block, and the image is marked as "valid data." Otherwise, the window block keeps moving until the DAF has traversed the entire osteosarcoma MRI. As shown in Fig. 2, when a tumor is detected in a window block, the image is a "lesion image," which is valid data. Otherwise, it is a normal image, which is invalid data.
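The window scan can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the DAF implementation of [56]: the image is a list of rows that has already been denoised, \(k\) is read as the window mean minus \({\lambda }_{T}\) (without the absolute value, so that the sign test is meaningful), and the `stride` parameter and function names are hypothetical. The sign convention follows the text literally and may need to be inverted depending on how intensities are normalized.

```python
def window_mean(image, top, left, a1):
    """Mean intensity of the a1 x a1 window whose top-left corner is (top, left)."""
    total = 0.0
    for i in range(top, top + a1):
        for j in range(left, left + a1):
            total += image[i][j]
    return total / (a1 * a1)


def daf_is_valid(image, a1, lam_t, stride=None):
    """Return True as soon as one window activates (q = 1), else False."""
    stride = stride or a1  # assumption: non-overlapping windows by default
    height, width = len(image), len(image[0])
    for top in range(0, height - a1 + 1, stride):
        for left in range(0, width - a1 + 1, stride):
            # k: mean of (W_x - lam_t) over the window, i.e. window mean - lam_t
            k = window_mean(image, top, left, a1) - lam_t
            if k < 0:  # activation rule as stated in the text: q = 1 when k < 0
                return True
    return False


# Hypothetical 4 x 4 images with intensity 10 everywhere, one with a 2 x 2 patch of zeros.
flat = [[10.0] * 4 for _ in range(4)]
patched = [row[:] for row in flat]
for i in range(2):
    for j in range(2):
        patched[i][j] = 0.0
```

With `a1=2` and `lam_t=5.0`, `daf_is_valid(flat, 2, 5.0)` stays `False` because every window mean lies above the threshold, while `daf_is_valid(patched, 2, 5.0)` stops at the first window.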

Fig. 2 Data set preprocessing process

Image super-resolution reconstruction

Compared with CT images, magnetic resonance imaging (MRI) can better depict ligaments, tendons, tumors, and other tissue abnormalities. High-resolution MRI images can provide more details for the subsequent segmentation network, resulting in better prediction results. However, obtaining high-resolution images requires longer scan times and higher signal-to-noise ratios. The images in the original dataset are not of high resolution, and a small number have particularly low resolution, which makes it more challenging to train the network to extract useful features. Therefore, we need a simple method to reconstruct the original data at high resolution, increasing image detail and improving the adaptability of the segmentation network.

We use IMDN_AS to perform super-resolution reconstruction of MRI, as shown in Fig. 4. IMDN_AS is improved based on the architecture of IMDN [59], and the super-resolution of any scale is obtained by adaptive cropping. For the original data, we use the following steps to complete the super-resolution of the image:


(1) Adaptive cropping.


To obtain MRI super-resolution images of any scale without increasing the amount of computation, we replaced the original convolution of the IMDN network with two down-sampling convolutions to construct the IMDN_AS structure, which takes the preprocessed MRI as input. Together, the two convolutions perform (× 4) downsampling so that subsequent operations run at a small size. Finally, the upsampler performs (× 4) upsampling so that the input and output have the same size. For the downsampling to work, however, we must also handle images whose height and width are not divisible by 4. We therefore introduce adaptive image cropping (ACS) to ensure that the input width and height are divisible by 4. Specifically, we cut the osteosarcoma image into four parts and feed them into the IMDN_AS network, so that images of unusual size can also be processed. ACS yields four overlapping image patches; the cropping details are given in Fig. 3. Each patch must satisfy:

$$\left\{\begin{array}{c}\left([\frac{Y}{2}]+\Delta {l}_{Y}\right)\%4=0\\ \left([\frac{X}{2}]+\Delta {l}_{X}\right)\%4=0\end{array}\right.$$
(4)

where \(\Delta {l}_{Y}\), \(\Delta {l}_{X}\) are the increments of the height and width of the MRI image, which the following formula can calculate:

$$\left\{\begin{array}{c}\Delta {l}_{Y}={padding}_{Y}-\left([\frac{Y}{2}]+{padding}_{Y}\right)\%4\\ \Delta {l}_{X}={padding}_{X}-\left([\frac{X}{2}]+{padding}_{X}\right)\%4\end{array}\right.$$
(5)

\({padding}_{Y}\) and \({padding}_{X}\) represent the extra length of the MRI image. To ensure universality, we take the values as:

$${padding}_{Y}={padding}_{X}=4k,k\ge 1$$
(6)

where \(k\) is an integer greater than or equal to 1. The four cropped MRI patches have the same size. They are then processed by the network, the outputs are mapped back to their original positions in sequence, and finally the redundant increments \(\Delta {l}_{X}\) and \(\Delta {l}_{Y}\) are discarded to obtain the reconstructed MRI image of osteosarcoma. This operation guarantees that images whose height and width are not divisible by 4 can also be processed.

Fig. 3 Schematic diagram of adaptive cropping process
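The arithmetic of Eqs. (4) to (6) can be checked with a short Python sketch. The increment follows Eq. (5) directly; the placement of the four patches at the image corners, the use of floor division for \([Y/2]\), and the function names are assumptions made for illustration.

```python
def crop_increment(size, k=1):
    """Increment from Eq. (5) with padding = 4k (Eq. 6); size is Y or X."""
    if k < 1:
        raise ValueError("k must be >= 1")
    padding = 4 * k
    half = size // 2  # [size / 2], taken here as the floor
    return padding - (half + padding) % 4


def patch_boxes(height, width, k=1):
    """Four overlapping patches as (top, left, bottom, right) boxes.

    Assumption: the patches are anchored at the four image corners.
    """
    ph = height // 2 + crop_increment(height, k)
    pw = width // 2 + crop_increment(width, k)
    if ph > height or pw > width:
        raise ValueError("image too small for this padding")
    tops = (0, height - ph)
    lefts = (0, width - pw)
    return [(t, l, t + ph, l + pw) for t in tops for l in lefts]


# Hypothetical 101 x 203 image: neither side is divisible by 4.
boxes = patch_boxes(101, 203)
```

For this example each patch measures 52 × 104, so both sides satisfy Eq. (4), and neighbouring patches overlap by a few pixels along each axis.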


(2) IMDN_AS network training


The processing of an MRI image by IMDN_AS is shown in Fig. 4. First, simple feature extraction is performed on the image using two 3 × 3 convolutions, and the result then passes through multiple stacked information multi-distillation modules (IMDB). In addition, all intermediate features extracted from MRI images are fused using a 1 × 1 convolutional layer. The main function of the IMDB module is to extract effective features gradually.

Fig. 4 IMDN_AS architecture diagram and IMDB structure

We extract features through the IMDB module and obtain the most useful features by continuous refinement. As shown in Fig. 4, in the progressive refinement module (PRM), refined features are extracted from the osteosarcoma MRI at each step. The whole process goes through four convolutional layers and three channel-split operations. Each channel split, applied after a convolutional layer, produces refined and coarse features. The coarse features are input to the next computing unit for further refinement, while all the refined features are concatenated and passed through the CCA layer and the 1 × 1 convolution layer in turn, which completes an IMDB module. The parameters of the PRM structure are shown in Table 2.

Table 2 PRM structure hyperparameter values

(3) Loss function setting.


For a given input low-resolution (LR) osteosarcoma image and target high-resolution (HR) osteosarcoma image, the mapping is:

$${T}^{SR}={H}_{IMDN\_AS}\left({T}^{LR}\right)$$
(7)

where \({H}_{IMDN\_AS}\) is our IMDN_AS module. The training set \({\left\{{T}_{i}^{LR},{T}_{i}^{HR}\right\}}_{i=1}^{N}\) provides \(N\) pairs of training data. We optimize with mean absolute error (MAE) as the loss, so our loss function can be expressed as:

$$L\left(\sigma \right)=\frac{1}{N}\sum_{i=1}^{N}{\Vert {H}_{IMDN\_AS}\left({T}_{i}^{LR}\right)-{T}_{i}^{HR}\Vert }_{1}$$
(8)

where \(\sigma \) denotes the updatable parameters of the model.
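A minimal Python sketch of the loss in Eq. (8) is given below. The `model` argument is a hypothetical callable standing in for the IMDN_AS mapping, and images are represented as nested lists of equal size; the identity model in the example is only a placeholder.

```python
def l1_norm_diff(pred, target):
    """L1 norm of the difference between two images given as lists of rows."""
    return sum(
        abs(p - t)
        for row_p, row_t in zip(pred, target)
        for p, t in zip(row_p, row_t)
    )


def mae_loss(model, lr_images, hr_images):
    """Average over the N training pairs of ||model(T_LR) - T_HR||_1 (Eq. 8)."""
    if not lr_images or len(lr_images) != len(hr_images):
        raise ValueError("need N >= 1 matching LR/HR pairs")
    total = sum(l1_norm_diff(model(lr), hr) for lr, hr in zip(lr_images, hr_images))
    return total / len(lr_images)


# Placeholder model that returns its input unchanged (illustration only).
identity = lambda image: image

lr_set = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]
hr_set = [[[1, 1], [3, 6]], [[1, 0], [0, 0]]]
loss = mae_loss(identity, lr_set, hr_set)
```

Here the two pairs contribute L1 norms of 3 and 1, so the loss averages to 2.0.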

Image segmentation network design

After this data processing, we use a segmentation network to perform semantic segmentation on the result. This network is based on a high-resolution convolutional neural network (HRNet) [60]. As shown in Fig. 5, traditional networks gradually reduce the resolution of the feature maps and then restore it to the original value, either gradually or in one step, to obtain the segmented image. HRNet instead keeps a high-resolution feature map of the original osteosarcoma image while down-sampling other feature maps to lower resolutions, and the streams progress in parallel. Finally, the osteosarcoma feature maps of each resolution are fused to realize the segmentation function. Since reducing the resolution causes loss of information, HRNet limits this loss so that it can segment objects more accurately. The structure of HRNet is shown in Fig. 5, and its process is roughly divided into two stages.

Fig. 5 Schematic diagram of HRNet network segmentation process

Parallel multi-resolution convolution

After the osteosarcoma image is fed into the backbone, a high-resolution convolutional stream is maintained throughout the process. Before entering the next stage, the high-resolution convolutional stream of the osteosarcoma image is retained and simultaneously down-sampled, and the resulting low-resolution convolutional stream is used as a new branch in parallel with the high-resolution stream. This process is repeated each time a new stage is entered, so that the parallel streams of each stage consist of the streams of the previous stage plus one additional lower-resolution stream obtained by downsampling. As the number of stages increases, so does the number of parallel branches.

The process can be represented by:

$$\begin{array}{llll}{Branch}_{11}& \to {Branch}_{21}& \to {Branch}_{31}& \to {Branch}_{41}\\ & \searrow {Branch}_{22}& \to {Branch}_{32}& \to {Branch}_{42}\\ & & \searrow {Branch}_{33}& \to {Branch}_{43}\\ & & & \searrow {Branch}_{44}\end{array}$$

where \({Branch}_{pq}\) represents the \(q\)-th parallel branch of the \(p\)-th stage. The resolution index \(q\) indicates the resolution of each branch: the resolution of the \(q\)-th parallel branch is \(\frac{1}{{2}^{q-1}}\) that of the first parallel branch.
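As a small illustration of this indexing, the Python sketch below lists the spatial size of every branch per stage, assuming that stage \(p\) holds \(p\) parallel branches (as the branch diagram suggests) and that the size of the first branch is given. The function name and the 256 × 256 example size are hypothetical.

```python
def branch_resolutions(height, width, stages=4):
    """Map each stage p to the (height, width) of its branches q = 1..p.

    Branch q has 1 / 2**(q - 1) of the resolution of the first branch.
    """
    table = {}
    for p in range(1, stages + 1):
        table[p] = [
            (height // 2 ** (q - 1), width // 2 ** (q - 1))
            for q in range(1, p + 1)
        ]
    return table


# Hypothetical first-branch feature map of 256 x 256.
resolutions = branch_resolutions(256, 256)
```

In this example the fourth stage holds branches of size 256, 128, 64, and 32 along each side.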

Repeated multi-resolution fusion

The above process yields multiple parallel branches of the MRI at different resolutions, one of which carries the original high-resolution representation. The fusion module exchanges information between the representations of the different branches, and a fusion is required before entering each subsequent stage. The fusion process can be seen in Fig. 5.

When the input of the fusion process is composed of \(\left\{{R}_{p}^{i},p=1,2,3,\dots ,n\right\}\), its output is the sum of the transformed representations of these inputs, namely:

$${R}_{p}^{0}={f}_{1p}\left({R}_{1}^{i}\right)+{f}_{2p}\left({R}_{2}^{i}\right)+{f}_{3p}\left({R}_{3}^{i}\right)+\cdots +{f}_{np}\left({R}_{n}^{i}\right)$$
(9)

where \(p\) is the resolution index. In the final fusion stage, the fusion process has an additional output:

$${R}_{n+1}^{0}={f}_{1(n+1)}\left({R}_{1}^{i}\right)+{f}_{2(n+1)}\left({R}_{2}^{i}\right)+{f}_{3(n+1)}\left({R}_{3}^{i}\right)+\cdots +{f}_{n(n+1)}\left({R}_{n}^{i}\right)$$
(10)

The transformation function \({f}_{xp}\left(\cdot \right)\) depends on the input resolution index \(x\) and the output resolution index \(p\). If \(x=p\), then \({f}_{xp}\left(R\right)=R\). If \(x<p\), then \({f}_{xp}\left(R\right)\) downsamples the input \(R\). Conversely, if \(x>p\), then \({f}_{xp}\left(R\right)\) upsamples the input \(R\).
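The fusion rule can be sketched in plain Python on single-channel feature maps stored as nested lists. This is only an illustration under simplifying assumptions: downsampling is approximated by keeping every second row and column and upsampling by nearest-neighbour repetition, both standing in for the learned operations of the actual network, and all function names are hypothetical.

```python
def downsample(fmap):
    """Halve the resolution by keeping every second row and column."""
    return [row[::2] for row in fmap[::2]]


def upsample(fmap):
    """Double the resolution by nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def transform(fmap, x, p):
    """f_xp: identity if x == p, downsample if x < p, upsample if x > p."""
    for _ in range(p - x):  # empty when p <= x
        fmap = downsample(fmap)
    for _ in range(x - p):  # empty when x <= p
        fmap = upsample(fmap)
    return fmap


def fuse(inputs, p):
    """Output at resolution index p: elementwise sum of f_xp(R_x), x = 1..n."""
    mapped = [transform(r, x, p) for x, r in enumerate(inputs, start=1)]
    rows, cols = len(mapped[0]), len(mapped[0][0])
    return [[sum(m[i][j] for m in mapped) for j in range(cols)] for i in range(rows)]


# Two hypothetical branches: a 4 x 4 map of ones and a 2 x 2 map of twos.
r1 = [[1] * 4 for _ in range(4)]
r2 = [[2] * 2 for _ in range(2)]
high = fuse([r1, r2], 1)   # fused at the resolution of branch 1
low = fuse([r1, r2], 2)    # fused at the resolution of branch 2
extra = fuse([r1, r2], 3)  # additional lower-resolution output
```

Calling `fuse` with an index one larger than the number of inputs produces the additional output described for Eq. (10).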

Considering the particular characteristics of the data, we first use a binarization algorithm to classify the pixel values of the input MRI image, which reduces the amount of network computation [61]. Sufficient data are also required to ensure robustness, so we rotate each image by 90°, 180°, and 270° and feed the rotated images into the segmentation network together with the original. To ensure that the result is reasonable, we use the weighted average of the four segmentations as the final segmentation result.

$$Avg= \sum_{i=0}^{h}\sum_{j=0}^{w}\left({c}_{0}{p}_{0,ij}+{c}_{1}{p}_{1,ij}+{c}_{2}{p}_{2,ij}+{c}_{3}{p}_{3,ij}\right)$$
(11)

The weights must satisfy:

$${c}_{0}+{c}_{1}+{c}_{2}+{c}_{3}=1$$
(12)

The weights are obtained as \({c}_{0}=0.4\), \({c}_{1}={c}_{2}=0.25\), and \({c}_{3}=0.1\).
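A minimal sketch of the weighted combination of the four segmentations is given below. Several details are assumptions rather than statements from the paper: `predict` is a hypothetical callable returning a per-pixel probability map of the same shape as its input, \({c}_{k}\) is paired with a rotation of \(k\times 90°\), and each prediction is rotated back to the original orientation before it is weighted.

```python
import numpy as np

# c0..c3 as given in the text; they sum to 1 as required by Eq. (12).
WEIGHTS = (0.4, 0.25, 0.25, 0.1)


def rotation_weighted_average(predict, image, weights=WEIGHTS):
    """Combine predictions on the 0/90/180/270 degree rotations of one image.

    `predict` is a hypothetical segmentation function (assumption).
    """
    fused = np.zeros(image.shape, dtype=float)
    for k, c in enumerate(weights):
        rotated = np.rot90(image, k)          # rotate by k * 90 degrees
        prob = predict(rotated)               # segment the rotated image
        fused += c * np.rot90(prob, -k)       # undo the rotation, then weight
    return fused


# Toy check with an identity "network": the fused map equals the input.
image = np.arange(12, dtype=float).reshape(3, 4) / 11.0
fused = rotation_weighted_average(lambda x: x, image)
print(np.allclose(fused, image))
```

Because the weights sum to 1, the fused value at each pixel stays within the range of the four individual predictions.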

The proposed method not only produces accurate segmentation results but also provides high-resolution auxiliary images. It performs better in terms of segmentation accuracy, DSC, and IOU. For example, after processing and training on the MRI data, we obtained segmentation results for the patient's transverse, coronal, and sagittal MRI images. In clinical application, doctors are particularly concerned about the location of the lesion, its condition, and the occurrence of distant metastasis. After processing by NSRDN, the results can assist clinicians in delineating the tumor area, quickly determining the location of the lesion, and improving consultation efficiency. This auxiliary diagnosis and treatment method balances precision and efficiency, and it can also be widely used in developing countries that lack technology and equipment.

Results

Data set

The data in this article were provided by the Research Center for Artificial Intelligence [62]. We collected over 4000 MRI images of osteosarcoma from 204 patients. We rotated each original image by 90°, 180°, and 270° to enrich the data and fed the results into the segmentation network for training. Some of the patient information is shown in Table 3.

Table 3 Some information about the patient

Experiment details

Contrasting Models: We compare the NSRDN algorithm with the FCN [63], PSPNet [64], MSRN [31], MSFCN [30], FPN [65], and U-Net [26] algorithms through comparative experiments and analyze the experimental results.


Evaluation Metrics: To evaluate model performance, we use accuracy, precision, recall, F1-score, Intersection over Union (IOU), and Dice Similarity Coefficient (DSC) as evaluation indicators [66]. These are computed from the confusion matrix, which is often used to evaluate the performance of deep learning networks. It consists of four parts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP denotes a region predicted to be tumor that is indeed tumor, and TN denotes a region predicted to be normal that is indeed normal. FP denotes a region predicted to be tumor that is actually normal, and FN denotes a region predicted to be normal that is actually tumor [67]. We also use the number of parameters (params) as an evaluation index, reflecting the amount of space occupied by the model. In osteosarcoma image segmentation, these indicators should be improved as much as possible to ensure the segmentation effect of the model.
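As an illustration, the indicators can be computed from the four confusion-matrix counts using their standard definitions. The sketch below assumes binary masks in which 1 marks tumor pixels; the function name and the toy masks are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Compute pixel-wise metrics for binary masks (1 = tumor, 0 = normal)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = int(np.sum(pred & truth))      # predicted tumor, actually tumor
    tn = int(np.sum(~pred & ~truth))    # predicted normal, actually normal
    fp = int(np.sum(pred & ~truth))     # predicted tumor, actually normal
    fn = int(np.sum(~pred & truth))     # predicted normal, actually tumor

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "dsc": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Toy 2 x 2 masks giving TP = TN = FP = FN = 1.
metrics = segmentation_metrics([[1, 1], [0, 0]], [[1, 0], [1, 0]])
print(metrics)
```

Guarding each ratio against a zero denominator avoids errors on images that contain no tumor pixels in either mask.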


Hyperparameter Settings: The network was trained for a total of 150 epochs. Adam was used as the optimizer during training, and the initial learning rate was set to 0.001. After 100 epochs, training continued with a learning rate of 0.0001 so that training could reach the optimal value and ensure the segmentation effect.
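The learning-rate schedule described above amounts to a single step decay. A framework-independent sketch is shown below; it assumes zero-based epoch indices and that the switch happens exactly at epoch index 100, and the function and parameter names are hypothetical.

```python
def learning_rate(epoch, initial_lr=1e-3, reduced_lr=1e-4, switch_epoch=100):
    """Return the learning rate for a zero-based epoch index (assumed convention)."""
    return initial_lr if epoch < switch_epoch else reduced_lr


# Learning rate for each of the 150 training epochs.
schedule = [learning_rate(epoch) for epoch in range(150)]
print(schedule[0], schedule[99], schedule[100], schedule[149])
```

In a deep learning framework the same behaviour would normally be obtained with a built-in step scheduler attached to the Adam optimizer.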

Segmentation effect evaluation

We used DAF to screen the initial data, dividing the images into valid and invalid ones. Figure 6 shows the result of the division. The image on the left is high-quality data with clear tissue boundaries. The image on the right contains few lesions and has blurred tissue boundaries. Such image data is not only ineffective for model training but may also have adverse effects, so it is classified as invalid data. The original data contains more than 80,000 osteosarcoma MRI images. After screening, we divided these into 63,783 valid lesion images and 17,543 invalid training images, and we used the valid lesion data for subsequent processing. The invalid training data, however, include some images containing tiny tumor areas. Therefore, we conducted follow-up comparative experiments to verify the effectiveness of the proposed model in segmenting these images.

Fig. 6 Division of the dataset

We present the images obtained after super-resolution reconstruction in Fig. 7 and compare them with the original images. Figure 7 shows the super-resolution results for osteosarcoma MRI images in three different orientations, where the first row contains the original low-resolution images and the second row contains the reconstructed images. We zoomed in on the tumor site so that the differences in detail can be seen clearly. The reconstructed MRI images of osteosarcoma have clearer textures and richer details. In addition, the clearer boundary information provides more detail for the subsequent segmentation network, effectively improving its accuracy.

Fig. 7 Comparison of original image and super-resolution reconstruction results

Figure 8 compares the model segmentation effect before and after preprocessing the original data. There are three rows of images in the figure. The first row shows the ground truth, in which tumor areas are shown in contrasting colors and some surrounding tissue structures are shown in other colors. The second row shows the segmentation results when the data are preprocessed before being fed into the training network, and the third row shows the segmentation results without data preprocessing.

Fig. 8 Comparison of segmentation effects before and after data processing

The comparison shows that some images without data preprocessing have poor segmentation results: the boundary segmentation is inaccurate, and the tumor area is not entirely predicted. Although the tumor regions can be roughly distinguished, the segmentation of details is not precise enough. After the complete processing pipeline, the segmentation effect is better and comes close to the actual labels. In summary, the processing pipeline in this paper effectively improves the segmentation accuracy and further improves the segmentation effect at a level outside the segmentation model itself.

Figure 9 shows the original images together with the segmentation results obtained by the different comparison algorithms. The data used by the comparison methods are all preprocessed. The first column shows the original MRI image of osteosarcoma, and the second column shows the ground truth. Columns 3–8 show the segmentation results of the different algorithms on part of the dataset. We marked the lesion area in the original MRI image for a more intuitive display. In terms of the segmentation effect, NSRDN gives better results. For images that are difficult to segment, NSRDN handles segmentation details more thoroughly, as shown in the fourth row of the figure. Compared with the other algorithms, its predicted segmentation is closest to the actual label.

Fig. 9 Segmentation results of different algorithms on the dataset

To evaluate the algorithm's performance quantitatively, we computed the evaluation indicators of each comparative algorithm in our experiments and compared them with those of the proposed method. Table 4 lists the performance of the various methods on the osteosarcoma dataset, including the proposed algorithm. As can be seen from Table 4, the proposed method outperforms the other models in terms of accuracy, precision, recall, F1-score, and DSC. The experiments show that data pre-screening and coarse denoising help improve the segmentation effect. In addition, introducing super-resolution reconstruction into the osteosarcoma segmentation task proved beneficial. Since IMDN_AS is a lightweight super-resolution network, it also reduces resource consumption and model training time. Based on the data analysis, the proposed model ensures accuracy while taking robustness into account. The model does not have a large number of parameters, which is more conducive to clinical application and avoids the need for hospitals to purchase and adapt expensive hardware. In addition, we carried out statistical hypothesis testing on the data [68]. Taking the IOU score as an example, the null hypothesis is "the test set has the same distribution as the training set". At a significance level of 0.05, the test yields a p-value of 0.17, which is larger than 0.05, so the null hypothesis cannot be rejected; that is, the data provide no evidence that the test set and the training set follow different distributions.
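The text does not name the statistical test that was used. Purely as an illustration of comparing training-set and test-set IOU scores at a 0.05 significance level, the sketch below runs a two-sided permutation test on the difference of means. This is a swapped-in technique, it compares means rather than full distributions, and the IOU values are hypothetical.

```python
import numpy as np


def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference of means (illustrative choice)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)          # random relabelling of the scores
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            count += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return (count + 1) / (n_permutations + 1)


# Hypothetical per-image IOU scores, for illustration only.
train_iou = [0.84, 0.86, 0.85, 0.88, 0.83, 0.87]
test_iou = [0.85, 0.84, 0.86, 0.87, 0.82, 0.88]
p_value = permutation_p_value(train_iou, test_iou)
print("reject null hypothesis" if p_value < 0.05 else "cannot reject null hypothesis")
```

A p-value above the chosen significance level means only that the null hypothesis cannot be rejected; it does not prove that the two sets share a distribution.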

Table 4 Performance of different algorithms on the dataset

Furthermore, our method discards some images containing tiny tumor regions at the very beginning, which effectively improves model training speed. However, the tumor sites in these images are often more difficult to identify manually in actual diagnosis. To evaluate the performance of our model on these images, we segmented them with our method and compared the results with those of other methods. As shown in Fig. 10, we calculated the DSC after segmenting each image. The experiments show that our method can effectively segment images in which the osteosarcoma is inconspicuous, which can help prevent misdiagnosis and reduce the cases in which patients do not receive timely treatment because of diagnostic errors.

Fig. 10 Comparison of segmentation results of various algorithms for images containing tiny tumors

Figure 11 compares the results of the initial model and the improved model. It shows that the data processing performed before model training is necessary. The IMDN operation and the optimization of the data set effectively improve the segmentation results and refine the segmentation boundary. Using IMDN improves the DSC and also greatly improves Re and IOU. The Pret operation further improves the segmentation performance. Apart from a small improvement in Acc, Pre and Re increase by about 1.4% on average, and IOU, F1, and DSC increase by 1.7%, 0.6%, and 0.3%, respectively.

Fig. 11 Comparison of results after model improvement

Figure 12 compares the DSC and IOU of the various contrasting algorithms in the experiment. In addition, we compare the experimental results of different segmentation networks combined with the data preprocessing methods. The figure shows that our proposed osteosarcoma segmentation model has the best segmentation effect, which is determined by the adaptability of the network structure to image features. Among the comparison networks, IMDN and DAF bring the largest improvements in segmentation accuracy for FPN and FCN-16S: DSC improves by about 3% for FPN and by 3.6% for FCN-16S. For MSRN, the preprocessing leads to a decrease in segmentation performance, which indicates that our proposed preprocessing method is not suitable for the MSRN network. In contrast, HRNet gains more from our preprocessing algorithm and has the best segmentation performance and segmentation accuracy among all methods.

Fig. 12 Values of DSC and IOU for different segmentation network training results under preprocessing operations

We selected five segmentation models for comparison, trained them on the same preprocessed dataset, and compared their precision values. We trained for 150 epochs and selected the first 80 epochs for analysis. As shown in Fig. 13, the precision of our method is relatively stable during training. Among the five algorithms, the precision of MSRN fluctuates greatly in the first 80 epochs and gradually stabilizes in the last stage of training. Our method has the best training effect, with a precision of more than 0.95, indicating that the method predicts tumors well and has a low misdiagnosis rate. MSFCN has a higher precision than MSRN but is still lower than the other algorithms under the same conditions. The precision values of UNet and FPN are close, both above 0.9, second only to HRNet and nearly 7.4% higher than MSFCN. Our method achieves higher precision with more robust segmentation performance.

Fig. 13 Precision values of different models under corresponding epochs

We also plotted the recall values of these five algorithms for comparison. Because the recall of our method is low in the early stage of training, 70 epochs of the training process are selected for display here. As shown in Fig. 14, the recall of MSRN fluctuates greatly, while the recall of UNet is stable and finally settles at around 0.93, which can reduce missed diagnoses to a certain extent. The recall of our method keeps increasing during training and finally stabilizes at around 0.97, indicating that the method is able to identify essentially all tumor regions. Around the 100th epoch, the recall of MSFCN is lower than its mid-training value, and it eventually stabilizes. Compared with the other algorithms, the recall of our method is higher. Although it fluctuates in the early stage, it is stable in later training, which can effectively reduce missed diagnoses.

Fig. 14 Recall values of different models under certain epochs

Finally, we selected the first 80 epochs of training to compare and evaluate the DSC values of these models. As shown in Fig. 15, our method maintains a high and stable DSC throughout this part of training. In the later stage of training, its DSC is close to 0.96, which is higher than that of the other comparison methods and indicates a higher segmentation similarity. In the same stage, the DSCs of U-Net, MSFCN, and FPN are all close to 0.9, and under the same training conditions all three are higher than that of MSRN. Our model has the best segmentation performance. This shows that the method in this paper can better segment the boundary on the osteosarcoma MRI data set and can provide a helpful reference for clinicians.

Fig. 15 DSC values of different models under certain epochs

Discussion

At present, medical image processing technology has been popularized in developing countries. In the diagnosis of MRI images of osteosarcoma, FCN, PSPNet, and the other comparison methods improve the segmentation accuracy by improving the network structure. However, the shape and position of osteosarcoma in MRI images vary greatly, and some MRI images collected in the early stage are limited by the acquisition equipment, resulting in low resolution and high noise. All of these factors affect the segmentation effect, and simply using a deeper and more complex network does not improve the segmentation accuracy much. Because of the low resolution of osteosarcoma MRI images, the texture features of the detailed parts are not clear. We therefore use the information multi-distillation network (IMDN) to perform super-resolution reconstruction of the images, which adds plausible detail while ensuring that the original details are not lost. We then design the segmentation model accordingly. Considering the complexity of the tumor shape and the uncertainty of its location, we use a high-resolution network to segment tumor images of different sizes. Finally, for these characteristics of osteosarcoma MRI images, this paper proposes a deep convolutional neural network osteosarcoma image segmentation system (NSRDN) based on noise reduction and super-resolution reconstruction. Simultaneous improvements in both image properties and segmentation network design significantly improve the segmentation performance of the model. The system greatly reduces the burden of clinical diagnosis and treatment of osteosarcoma in developing countries and helps improve the survival rate of patients.

As shown in Table 4, our method performs better than existing related algorithms on the MRI data of osteosarcoma and has the best segmentation performance. It also balances the resources consumed by segmentation, and its advantages can be seen directly in Figs. 7 and 8. Because of this, the proposed method has been applied as a clinical aid in diagnosis and treatment with excellent results.

The reasons for the excellent performance of our method can be summarized as follows:

(1) Targeted image processing methods.


Related studies focus more on improving segmentation networks. MSFCN, for example, introduces multiple supervised input layers to guide multi-scale feature learning and thereby improve the segmentation accuracy of MRI images of osteosarcoma. In this study, we processed the segmentation data in a new way based on the pathological features and image properties of osteosarcoma. To address image noise, we propose a denoising method for MRI images. We also introduce, for the first time, super-resolution reconstruction into the osteosarcoma MRI image segmentation task to process the data and improve the segmentation accuracy, which solves the problem of low resolution in some osteosarcoma MRI data. In addition, considering the computational load brought by the high-resolution network, we propose an adaptive cropping method to improve the network and reduce the consumption of resources on invalid data.

(2) Adaptability of network structures to image features.


We use a simple and efficient segmentation network rather than blindly pursuing network depth and complexity. Because the processed images have high resolution and low noise, we use a high-resolution network that suits these properties for segmentation, which better maintains the high-resolution representation of the image. To extract more refined semantic features, parallel convolution and feature fusion are used to improve the precision of boundary segmentation.

In summary, the osteosarcoma image segmentation auxiliary diagnosis and treatment system proposed in this paper based on noise reduction and super-resolution reconstruction can achieve an excellent segmentation effect in osteosarcoma MRI image data, which is far superior to other methods. The experimental results confirm that our automated segmentation system can obtain the best performance.

Conclusions

Based on about 80,000 osteosarcoma MRI images, this paper proposes a deep convolutional neural network osteosarcoma image segmentation system based on noise reduction and super-resolution reconstruction (NSRDN). The method includes optimization of osteosarcoma MRI image data, image super-resolution reconstruction, and semantic segmentation. We introduce super-resolution reconstruction into the field of osteosarcoma image processing to improve the segmentation accuracy and the segmentation effect. We compare our method with classical segmentation methods. The experimental results show that the proposed method has good segmentation performance on the osteosarcoma MRI data set, effectively addressing the problems of low segmentation accuracy and overfitting in existing learning methods. Resource consumption is reduced while the segmentation effect is maintained. We also visualized the segmentation effect and directly compared the osteosarcoma segmentation images produced by different methods.

However, our approach has some limitations. Medical images are costly and difficult to acquire and annotate, which can easily lead to a lack of data. In theory, it is difficult for the model to extract deeper features when data are scarce, and model training is limited for the same reason. In the future, with the improvement of computing power and equipment, we will improve the generalization ability of the model on small amounts of data by improving the data preprocessing method. In addition, future research will aim to address the segmentation errors caused by differences between tissue and tumor grayscale values. This is expected to be achieved by focusing on the boundary information of the tumor and the texture information of the tissue in medical images.