Introduction

Genu Varum and Valgum are deformities of the lower limbs in which, when the patient stands naturally with the legs straight, either the ankles or the knees touch each other, but the knees and ankles cannot be brought together at the same time. Early symptoms are difficult to detect, but severe deformities can lead to osteoarthritis, chondromalacia patellae, and other diseases because the weight-bearing line of the lower extremities is altered [1]. Early measurement of the hip–knee–ankle angle (HKA) is of great significance for improving the prognosis of Genu Varum and Valgum. Varus malalignment has been reported in 53–76% of individuals with knee osteoarthritis [2]. HKA measured on a full-length lower limb radiograph is one of the gold standards for diagnosing knee malalignment.

For the diagnosis of Genu Varum and Valgum, the most common method is to measure the hip–knee–ankle angle (HKA) on X-ray images. In Genu Valgum, the patient’s medial malleoli cannot be brought together and the lower limbs are X-shaped; in Genu Varum, the patient’s knees cannot be brought together and the lower limbs are O-shaped. HKA is a measure of lower limb alignment, defined as the angle between the mechanical axes of the femur and the tibia, and is measured on a full-length lower limb radiograph [3]. In addition, HKA is commonly used to evaluate the anatomical structure of the lower extremities, diagnose pathology, plan operations, and evaluate the success of surgery [4].

Currently, HKA is drawn and measured manually by surgeons on X-ray images. However, hospitals produce so many full-length lower limb X-ray images every day that it is difficult for orthopedic surgeons to keep up. In addition, doctors in some underdeveloped areas have limited training in diagnosis. Therefore, there is an urgent need for a convenient and effective method to measure HKA. Traditional HKA measurement methods [5,6,7] rely on doctors to calculate the angle, and no automatic measurement system has yet emerged.

Artificial intelligence can automatically perform segmentation, classification and registration in medical images. Computer-aided diagnosis using deep learning is increasingly applied in medical image analysis [8]. Kang Zhang et al. [9] developed an AI system for accurate diagnosis of COVID-19. Deep learning is also increasingly applied in medical image segmentation: Yun Pei et al. [10] proposed a novel network with a dual attention module to segment colorectal tumors. Google Health developed an AI system for breast cancer screening [11], and Feng Shi et al. [12] reviewed artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19.

We propose a novel method to measure HKA automatically. Unlike previous studies on HKA measurement, we use segmentation neural networks to assist the angle measurement. The method is effective in predicting the angle and greatly reduces the time orthopedic surgeons spend on angle measurement.

Materials and methods

This study was approved by the Ethics Review Board of Second Hospital of Jilin University.

Datasets

We selected 398 patients (112 males and 286 females, aged 5 to 85 years) who visited the Second Hospital of Jilin University between October 2018 and August 2020. These patients underwent X-ray examinations of both lower limbs on equipment from Philips Medical System. The window center was 2047 and the window width was 4095.

First, the DICOM files, which contained the original X-ray images and header information, were converted into JPG images while preserving the aspect ratio. Then, the bilateral lower limb images were cut into unilateral lower limb images, giving a dataset of 796 images. The exclusion criteria were as follows: (1) hip replacement; (2) severe developmental dysplasia of the hip; (3) knee replacement; (4) artificial limb; and (5) poor image quality. The criteria were applied per organ. An image with a knee replacement but no hip replacement could be included in the dataset for segmenting the head of hip and was excluded from the dataset for segmenting the knee. An image with an artificial limb was excluded from the datasets for segmenting the knee and the ankle; however, it could be included in the dataset for segmenting the head of hip.
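The conversion step can be illustrated with a minimal sketch of linear windowing. The paper reports only the window center and width; the linear mapping to 8-bit gray levels and the function names `apply_window` and `window_image` are assumptions made for illustration, not the authors' conversion code. The image shape is left untouched, so the aspect ratio is preserved.

```python
def apply_window(value, center=2047.0, width=4095.0):
    """Map one raw pixel value to 0..255 with a simple linear window.

    The linear mapping is an assumption; the paper gives only center and width.
    """
    low = center - width / 2.0
    high = center + width / 2.0
    clamped = min(max(value, low), high)  # values outside the window saturate
    return int(round((clamped - low) / (high - low) * 255))


def window_image(rows, center=2047.0, width=4095.0):
    """Apply the window to a 2D list of raw values; the shape is unchanged."""
    return [[apply_window(v, center, width) for v in row] for row in rows]


# Hypothetical raw pixel values, for illustration only.
raw = [[0, 1024, 2047], [3000, 4095, 5000]]
print(window_image(raw))
```

Raw values below or above the window saturate at 0 and 255, so the choice of window determines how much of the bone contrast survives in the JPG image.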

We randomly selected 676 images to develop and validate three deep neural networks. Specifically, 80% of these images were used to train the models and the remaining 20% were used for validation. The remaining 120 images formed the testing dataset, which was used to test the accuracy of the segmentation results and the performance of the HKA calculation. The numbers of images are detailed in Fig. 1.
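The split described above can be sketched as follows. This is a minimal illustration, assuming a single shuffled list of 796 image identifiers; the identifiers, the seed, and the function name `split_dataset` are hypothetical, and the per-organ exclusions mean that the actual counts for each organ may differ.

```python
import random


def split_dataset(image_ids, n_test=120, train_fraction=0.8, seed=0):
    """Hold out n_test images for testing, then split the rest into train/validation."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    development = shuffled[n_test:]
    n_train = int(round(train_fraction * len(development)))
    return development[:n_train], development[n_train:], test


# Hypothetical identifiers standing in for the 796 unilateral images.
ids = ["img_%03d" % i for i in range(796)]
train, validation, test = split_dataset(ids)
print(len(train), len(validation), len(test))
```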

Fig. 1 Study file

Angle measurement methods

To measure HKA, clinicians first determine three points on the X-ray, located at the head of the femur, the knee, and the ankle. The proposed method uses deep neural networks to segment the three organs separately and locates the central point of each organ using a novel method. From the coordinates of the central points, HKA can be determined automatically.

Deep neural network

The deep neural network is fine-tuned based on U-Net [13]. It consists of an encoder module, which extracts abstract semantic information, and a decoder module, which restores the feature map from the encoder module to the original size of the input image.

The encoding part is made up of convolution layers, rectified linear unit (ReLU) layers, batch normalization layers, and max pooling layers. The convolution operation extracts local features with a 3 × 3 kernel, while the max pooling operation reduces the spatial size of the feature maps with a 2 × 2 kernel.

The decoding part consists of deconvolution layers, which enlarge the feature map, together with convolution layers, batch normalization layers, and ReLU layers. Between the encoder module and the decoder module, the network uses skip connections to fuse feature maps of the same size from the encoding part and the decoding part.

Unlike in traditional neural networks, the input images of this model do not all have the same size. We keep the original sizes of the X-rays instead of resizing them to a common size, so that the HKA is not altered.

In a skip connection, the fusing operation requires the fused feature maps to have the same size. Because of the convolution and max pooling operations in the encoding part, the height and width of the feature maps vary with the input. Therefore, before each skip connection, the network uses bilinear upsampling to bring the feature map from the decoding module to the same size as the feature map with the same number of channels from the encoding module. The structure of the deep neural network is shown in Fig. 2, and its parameters are listed in Table 1.
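The need for this resizing step can be seen from simple size arithmetic. The sketch below assumes that the 3 × 3 convolutions are padded so that they keep the spatial size and that each 2 × 2 max pooling halves the size with floor division; the input size 2021 × 805 and the depth of four pooling stages are hypothetical values chosen for illustration, not taken from the paper.

```python
def pooled_size(size, kernel=2):
    """Spatial size after max pooling with stride equal to the kernel (floor division)."""
    return size // kernel


def encoder_sizes(height, width, depth=4):
    """Feature map sizes at each encoder level, assuming size-preserving convolutions."""
    sizes = [(height, width)]
    for _ in range(depth):
        height, width = pooled_size(height), pooled_size(width)
        sizes.append((height, width))
    return sizes


# Hypothetical, non-square input whose sides are not multiples of 16.
sizes = encoder_sizes(2021, 805)
for (h, w), (ph, pw) in zip(sizes, sizes[1:]):
    doubled = (ph * 2, pw * 2)  # what a plain x2 upsampling would produce
    print((h, w), "-> pooled", (ph, pw), "-> doubled", doubled, "match:", doubled == (h, w))
```

Whenever a side length is odd, doubling the pooled size no longer reproduces the encoder size, so the decoder feature map has to be resized to the exact encoder size before fusion. In PyTorch such a resize is commonly written with `torch.nn.functional.interpolate` using an explicit `size` and `mode="bilinear"`; whether the authors used this call is not stated in the paper.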

Fig. 2 Structure of the proposed automatic HKA angle measurement system

Table 1 The parameters of deep neural network for segmenting organs

Angle measurement

The neural networks segment the three organs effectively, but the shapes of the segmentation results are irregular. Calculating HKA requires the central point of each organ, so we defined the following algorithm to locate it.

The points of the edge contour are defined as \(C\left({x}_j^{BDY},{y}_j^{BDY}\right)\), \(j\in \left[1,n\right]\); the internal points of the segmentation area are defined as \(I\left({x}_i,{y}_i\right)\), \(i\in \left[1,m\right]\); and the distance from internal point i to boundary point j is defined as \({d}_{i,j}\), as shown in Fig. 3. Here n is the total number of boundary points and m is the total number of internal points of the segmentation result.

Fig. 3 The process of calculating the central point of an organ

First, calculate the distances from \(I\left({x}_1,{y}_1\right)\) to every boundary point \(C\left({x}_j^{BDY},{y}_j^{BDY}\right)\), giving \({d}_{1,1},{d}_{1,2},\dots ,{d}_{1,n}\). Then, calculate the mean squared error (MSE) of \({d}_{1,1},{d}_{1,2},\dots ,{d}_{1,n}\) about their mean. Repeat this operation to obtain the MSE of every internal point. Finally, compare all the MSEs and select the internal point with the smallest MSE as the central point.

$${d}_{i,j}=\sqrt{{\left({x}_i-{x}_j^{BDY}\right)}^2+{\left({y}_i-{y}_j^{BDY}\right)}^2}$$
$$MSE_{i}=\frac{1}{n}\sum_{j=1}^{n}{\left[{d}_{i,j}-\frac{1}{n}\sum_{k=1}^{n}{d}_{i,k}\right]}^2$$
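The search defined by the two equations above can be sketched in a few lines. This is a brute-force illustration under stated assumptions, not the authors' implementation: the points are plain (x, y) tuples, the function name `central_point` is hypothetical, and the toy shape is a 5 × 5 square rather than a segmented organ.

```python
import math


def central_point(internal_points, boundary_points):
    """Return the internal point whose distances to the boundary vary the least."""
    best_point, best_score = None, float("inf")
    n = len(boundary_points)
    for (xi, yi) in internal_points:
        # d_{i,j}: distance from this internal point to every boundary point
        distances = [math.hypot(xi - xb, yi - yb) for (xb, yb) in boundary_points]
        mean = sum(distances) / n
        # MSE_i: mean squared deviation of the distances from their mean
        score = sum((d - mean) ** 2 for d in distances) / n
        if score < best_score:
            best_point, best_score = (xi, yi), score
    return best_point


# Toy example: the boundary is the outer ring of a 5 x 5 grid.
boundary = [(x, y) for x in range(5) for y in range(5) if x in (0, 4) or y in (0, 4)]
internal = [(x, y) for x in range(1, 4) for y in range(1, 4)]
print(central_point(internal, boundary))
```

The cost is proportional to m × n, which is acceptable for a single organ mask but may call for subsampling the internal points on very large masks. Because only internal points are candidates, the selected point always lies inside the segmented region.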

After the central points of the three organs are obtained, the law of cosines is used to calculate HKA. The testing data are divided into left lower limbs and right lower limbs. A horizontal reference line is drawn from the central point of the knee; it points left when the image is a left lower limb X-ray and right when the image is a right lower limb X-ray. We denote the central point of the head of hip as \(A\left({x}_A,{y}_A\right)\), the central point of the knee as \(B\left({x}_B,{y}_B\right)\), the central point of the ankle as \(C\left({x}_C,{y}_C\right)\), and the end of the horizontal line away from B as \(D\left({x}_D,{y}_D\right)\), as shown in Fig. 4. The HKA is the sum of α and β.

$$\alpha =\operatorname{arccos}\left(\frac{{\left| AB\right|}^2+{\left| BD\right|}^2-{\left| AD\right|}^2}{2\times \mid AB\mid \times \mid BD\mid}\right)$$
$$\beta =\operatorname{arccos}\left(\frac{{\left| BC\right|}^2+{\left| BD\right|}^2-{\left| CD\right|}^2}{2\times \mid BC\mid \times \mid BD\mid}\right)$$
$$angle=\alpha +\beta$$
Fig. 4 The method of calculating the HKA angle. (a) The right lower limb X-ray. (b) The left lower limb X-ray
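The angle calculation can be sketched as follows, with each of α and β computed at the knee vertex B by the standard law of cosines. The sketch assumes image coordinates in pixels with x increasing to the right, so that "left" means a smaller x; the length of the reference line (100 pixels) is arbitrary, and the function names and example coordinates are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees, from the law of cosines."""
    a, b, c = distance(vertex, p), distance(vertex, q), distance(p, q)
    cos_value = (a * a + b * b - c * c) / (2.0 * a * b)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))


def hka_angle(hip, knee, ankle, side):
    """HKA = alpha + beta, with D on a horizontal line through the knee."""
    offset = -1.0 if side == "left" else 1.0  # reference line points left or right
    d = (knee[0] + offset * 100.0, knee[1])
    alpha = angle_at(knee, hip, d)    # angle A-B-D
    beta = angle_at(knee, ankle, d)   # angle C-B-D
    return alpha + beta


# Hypothetical central points of a perfectly straight limb (pixels).
print(hka_angle((100.0, 0.0), (100.0, 500.0), (100.0, 1000.0), "left"))
```

For a perfectly straight limb both angles are 90°, so the sum is 180°. When the hip or ankle is displaced sideways, the result depends on which side the reference line points to, which is why the left and right limbs are handled separately.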

Evaluation indexes for segmentation

In a segmentation task, both the outline of the segmented area and the overlap between the prediction and the ground truth are crucial indicators of accuracy. To evaluate the performance of the segmentation networks, three indexes are used in this paper. (In this study, pixels inside the segmented organs are defined as positive pixels; all others are defined as negative pixels.)

The Dice coefficient reflects the overlap between the prediction and the ground truth. P and G denote the sets of positive pixels in the prediction and in the ground truth, respectively.

$$Dice=\frac{2\times \mid P\cap G\mid }{\mid P\mid +\mid G\mid }$$

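Recall is the proportion of actual positive pixels that are correctly predicted as positive. TP, FN and FP denote the numbers of true positive, false negative and false positive pixels, respectively.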

$$R=\frac{TP}{TP+ FN}$$

Precision is the proportion of predicted positive pixels that are truly positive.

$$P=\frac{TP}{TP+ FP}$$
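The three indexes can be computed from a pair of binary masks as in the sketch below. The masks are represented as nested lists of 0 and 1, the function name `segmentation_metrics` and the toy masks are hypothetical, and the handling of empty masks is an assumption, since the paper does not discuss that case.

```python
def segmentation_metrics(prediction, ground_truth):
    """Return (dice, recall, precision) for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(prediction, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1          # positive in both masks
            elif p and not g:
                fp += 1          # predicted positive, actually negative
            elif g and not p:
                fn += 1          # missed positive pixel
    # |P| + |G| = (TP + FP) + (TP + FN)
    total = 2 * tp + fp + fn
    dice = 2.0 * tp / total if total else 1.0          # assumption: empty vs empty = 1
    recall = tp / float(tp + fn) if tp + fn else 0.0
    precision = tp / float(tp + fp) if tp + fp else 0.0
    return dice, recall, precision


# Toy masks, for illustration only.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(segmentation_metrics(pred, truth))
```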

Statistical analysis

IBM SPSS Statistics 24.0 was used for the statistical analysis. 95% confidence intervals (CIs) were calculated for continuous estimated parameters. Statistical significance was set at p < 0.01. Kendall’s W and univariate analysis were performed to examine the measurement consistency among the three orthopedic surgeons. Student’s t-test and the intraclass correlation coefficient (ICC) were used to evaluate the agreement between the predicted and ground truth values.

Experimental settings

The experimental platform was equipped with one NVIDIA GeForce RTX 2080 graphics processor with 16 GB of memory and an Intel Core i7-9700K CPU. The networks were trained and tested on Windows 10. The algorithms were developed in Python 3.6 with PyTorch 0.4.1 (https://pytorch.org/) as the framework.

The three segmentation networks were trained with the same parameters, using Adam as the optimizer. The learning rate was set to 0.001 and the batch size was 1. The early stopping patience was set to 30 epochs: if the loss on the validation dataset does not decrease for 30 consecutive epochs, training stops to avoid overfitting.
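The early stopping rule can be sketched independently of any training framework. The sketch interprets the rule as a patience counter that resets whenever the validation loss reaches a new minimum, which is one common reading of the description and an assumption here; the function name and the loss values are hypothetical.

```python
def train_with_early_stopping(validation_losses, patience=30):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0   # new minimum: reset the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch     # stop: no improvement for `patience` epochs
    return len(validation_losses), best_epoch


# Hypothetical loss curve: improves for three epochs, then stagnates.
losses = [1.0, 0.8, 0.7] + [0.9] * 40
print(train_with_early_stopping(losses))
```

In practice the model weights from `best_epoch` would be the ones kept, since later epochs did not improve the validation loss.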

Results

Segmentation performance evaluation

The same network architecture was trained three times, once for each of the three organs. We used fivefold cross-validation to evaluate the deep learning model for segmenting the organs. First, the dataset was randomly divided into five groups with no sample repeated. One of the five groups was selected as the validation dataset, and the remaining four groups were used as the training dataset. These two steps were repeated five times, so that each group served once as the validation dataset. The results on the validation datasets were averaged to evaluate the performance of the segmentation model. The Dice coefficients of the fivefold cross-validation on the validation datasets are shown in Table 2. The average Dice coefficient is 0.8244 for the head of hip, 0.9251 for the knee, and 0.8988 for the ankle bone. We chose the parameters of the third fold for the head of hip, of the first fold for the knee, and of the first fold for the ankle bone as the final model parameters. The segmentation results on the testing data are shown in Table 3. Compared with the ground truth, the Dice, recall, and precision of the deep neural network were 83.18%, 81.20%, and 86.74% for the head of hip; 93.01%, 90.75%, and 95.69% for the knee; and 89.83%, 90.30%, and 89.79% for the ankle, respectively. The models for segmenting the head of hip, the knee, and the ankle were trained for 150 epochs in each fold.

Table 2 The dice coefficients of fivefold cross-validation
Table 3 Three organs’ segmentation performance of deep learning
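The fivefold procedure can be sketched as a generator of training and validation splits. This is an illustration under assumptions: the identifiers, the seed and the function name `k_fold_splits` are hypothetical, and the way the authors assigned images to folds is not described beyond being random and without repetition.

```python
import random


def k_fold_splits(sample_ids, k=5, seed=0):
    """Yield (training, validation) lists so that each sample is validated exactly once."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint groups of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


# Hypothetical identifiers standing in for the 676 development images.
ids = ["img_%03d" % i for i in range(676)]
for fold_index, (training, validation) in enumerate(k_fold_splits(ids), start=1):
    print(fold_index, len(training), len(validation))
```

The Dice coefficient would then be computed on each validation fold and averaged over the five folds, as reported in Table 2.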

The sky blue area in Fig. 5a shows the ground truth for segmentation, and the sky blue area in Fig. 5b shows the segmentation result. The organs used to determine the central point coordinates are segmented accurately by the deep neural networks.

Fig. 5 Visualization of segmentation result and positioning central points. (a) The ground truth for segmentation. (b) Segmentation result

In the testing dataset, the segmented head of hip, knee and ankle mostly coincide with the correct positions of the organs.

Evaluation results

To validate the method, we compared the predictions with the HKA measured manually on the testing dataset. Three orthopedists (with 13, 10 and 7 years of experience) measured each image independently using Biomet Orthosize Templating (Warsaw, Indiana, USA, https://www.orthosize.com/). The three sets of measurements were analyzed statistically to evaluate their consistency.

We used Kendall’s W to quantify the agreement. Kendall’s W is 0.999 with a p value of less than 0.001, which indicates that the three orthopedists’ angle measurements are highly consistent. We took the average of the angles measured by the three orthopedists as the ground truth.
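For reference, the coefficient of concordance can be computed from ranks as in the sketch below. It uses the textbook formula without a correction for tied values, which is a simplifying assumption; statistical packages such as SPSS may apply a tie correction. The angle values are invented for illustration and are not the study data.

```python
def ranks(values):
    """Rank values from 1 to n, assuming no ties (an assumption of this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def kendalls_w(ratings):
    """ratings: one list per rater, each holding n measurements of the same subjects."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [ranks(r) for r in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2.0
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical HKA readings (degrees) from three raters on four limbs.
rater_1 = [172.1, 176.4, 180.2, 185.0]
rater_2 = [172.5, 176.0, 180.9, 184.6]
rater_3 = [171.8, 176.9, 180.0, 185.3]
print(kendalls_w([rater_1, rater_2, rater_3]))
```

A value of 1 means that all raters rank the limbs in exactly the same order, and a value of 0 means no agreement in ranking.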

Fig. 6 compares the distributions of the manual measurements and the predictions. The maximum, the minimum, the upper quartile and the lower quartile of the manual measurements and of the predictions fall in the same range.

Fig. 6 Boxplot of the readers’ measurements and the prediction

The angle was measured on 120 X-ray images. The statistical analysis is shown in Table 4. The mean ± standard deviation is 176.90° ± 12.22° for the ground truth and 176.41° ± 12.08° for the prediction. The ICC between the ground truth and the prediction indicates high consistency: its value with 95% CIs is 0.999 (0.996, 0.999), with a p value of less than 0.001. There is no significant difference between the two groups. The prediction differs from the ground truth by 0.49° on average. The proportion of calculated angles that deviate from the ground truth by less than 1.5° is 89.17%; it falls to 69.17% for a deviation of less than 1.0° and to 39.17% for a deviation of less than 0.5°.

Table 4 Comparison and verification between the prediction and ground truth

The average of the measurements from the three surgeons is taken as the ground truth. A Bland–Altman plot with three reference lines shows the difference between the prediction and the ground truth (Fig. 7). The solid line denotes the mean of all differences between the predicted angle and the ground truth, which is −0.4905, and the dashed lines lie 1.96 standard deviations, which is 1.4792, away from the mean.

Fig. 7 Bland–Altman plot of the prediction and ground truth
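The quantities behind the Bland–Altman plot and the deviation proportions can be sketched as follows. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the paper does not state which estimator was used, and the angle values are invented for illustration rather than taken from the study.

```python
import statistics


def bland_altman(predictions, references):
    """Return (bias, lower_limit, upper_limit) of agreement."""
    differences = [p - r for p, r in zip(predictions, references)]
    bias = statistics.mean(differences)               # solid line of the plot
    spread = 1.96 * statistics.stdev(differences)     # sample standard deviation
    return bias, bias - spread, bias + spread         # dashed lines of the plot


def fraction_within(predictions, references, tolerance):
    """Proportion of predictions deviating by less than `tolerance` degrees."""
    hits = sum(1 for p, r in zip(predictions, references) if abs(p - r) < tolerance)
    return hits / float(len(predictions))


# Hypothetical predicted and ground truth angles (degrees).
predicted = [176.0, 180.5, 172.0, 185.0]
reference = [176.5, 181.0, 173.0, 185.0]
print(bland_altman(predicted, reference))
print(fraction_within(predicted, reference, 1.0))
```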

Discussion

Deep learning-based HKA measurement remains underdeveloped. Conventional deep learning networks require input images of the same size, but the height and width of X-rays differ from image to image, and the aspect ratio of the images cannot be changed if the angle is to be measured correctly. In addition, deep learning is generally used for segmentation, detection or classification tasks rather than for measuring angles, so a deep learning method must be combined with complex post-processing to do so. Applying deep learning to HKA measurement is therefore a challenge. To determine HKA automatically, we developed and validated an end-to-end artificial intelligence system.

The new method for measuring HKA does not rely on a physician; it uses deep neural networks and a novel algorithm for locating the central points of organs to calculate the angle automatically. The predictions are highly consistent with the orthopedists’ measurements: the ICC between the two groups reached 0.999 (p < 0.001). The new method also saves doctors’ time. The Bland–Altman plot shows narrow limits of agreement between the ground truth and the prediction.

The measurement method proposed in this study is similar to the way doctors measure the angle; it uses a computer algorithm to imitate the doctor’s workflow. Because the femoral head, the knee and the ankle have recognizable outlines, deep learning algorithms are suitable for segmenting them. Among the three segmentation tasks, the knee is segmented better than the other organs. This is because the edge contour of the knee is not surrounded by other organs or tissues, so the deep neural network can extract the features of the knee correctly. The head of hip is surrounded by the pelvis and the ankle lies close to the bottom of the tibia, which makes their outlines hard for the deep neural networks to find, so some pixels are falsely predicted. These mispredicted pixels are concentrated at the edge contour of the segmentation. When the system determines the central point of an organ, the coordinates of the central point can be predicted accurately as long as the central area of the organ is segmented.

We did not use the center of mass as the central point of an organ. Instead, we proposed a novel algorithm to search for it, because we found that some segmentation results are not continuous regions, as shown in Fig. 8. For such discontinuous regions, the center of mass is not a reliable central point. The proposed method ensures that the central points are located inside the organs and are as close as possible to the points manually marked by doctors. Therefore, our method can effectively reduce the influence of segmentation noise on the central point coordinates. In addition, the images in the testing dataset were randomly selected from all the data, so the testing data included images with poor contrast and endoprostheses, as shown in Fig. 9. This suggests that our angle measurement system is robust.

Fig. 8 The discontinuous regions of segmentation results

Fig. 9 The segmentation results of bad contrast and endoprostheses

In clinical diagnosis, orthopedic surgeons need to determine the central points of the three organs manually, which takes a great deal of their time. Our method achieves automatic measurement. However, this study has some limitations. Deep learning algorithms rely on large volumes of data. The current system was developed with data from a single centre; to improve its robustness, data from different medical centres need to be collected in the future.

Among related studies on deep learning for HKA measurement, Thong Phi Nguyen et al. [14] used a detection algorithm to determine the positions of the organs. A detection algorithm surrounds each organ with a bounding box, whereas a segmentation algorithm can accurately determine the contour of the organ, so that the central points of the organs are closer to the points marked by doctors. In addition, our test set was larger and contained images with poor contrast and endoprostheses. Severe malalignment or rotational deformities of the lower extremity and patient positioning during imaging can influence the accuracy of two-dimensional (2D) HKA measurement [15]. To solve this problem, three-dimensional (3D) lower limb reconstruction can be used to determine the positions of the organs. This technology requires two X-ray acquisitions: the patient is first positioned in the cabin in a free standing position with parallel feet, and the second acquisition is performed with one leg slightly shifted relative to the other [16, 17]. The patient is therefore irradiated twice, which increases radiation exposure. By contrast, open source datasets such as the Osteoarthritis Initiative (OAI) and routine clinical practice involve only a single exposure, so HKA cannot be measured with 3D reconstruction technology in those settings. Researchers have also observed a correlation between HKA and the femur-tibia angle (FTA) on knee radiographs. Calculating FTA only requires a knee X-ray, which is cost-effective and involves minimal radiation exposure [18]. In future research, we will study an automatic FTA measurement algorithm.

Conclusion

We proposed a novel automatic HKA measurement method using deep learning algorithms. The method employs deep neural networks to segment the head of hip, the knee, and the ankle, and then searches each organ for the central point with the minimum MSE of the distances between itself and the boundary of the organ. HKA is then calculated from the coordinates of the three central points by the law of cosines. With the new method, only a small difference was observed between the prediction and the ground truth, and the ICC reached 0.999. The accuracy of the angle values predicted by the system is similar to that of orthopedic surgeons, and the system saves orthopedic surgeons’ time.