1 Introduction

X-ray imaging is the most common imaging technique used by the medical community to find bone pathologies. It is also one of the oldest techniques and can image almost all bones of the body, such as the wrist, knee, elbow, shoulder, pelvis, and spine. X-ray imaging helps in diagnosing fractures, joint dislocations, bone injuries, abnormal bone growth, infections, and even arthritis. Bone fractures are usually accidental, but they can also be pathological, that is, caused by the weakening of bones due to osteoporosis, cancer, or osteogenesis. Osteoporosis is the leading bone pathology, causing millions of fractures worldwide [29], and women are more affected [30]. Osteoporosis is related to age, as bones become weaker with advancing age, but it sometimes occurs at younger ages as well [9]. It is also termed the silent disease because its symptoms are not visible in the early stages and only become apparent at a very advanced stage, when bones are susceptible to fracture from even a minor fall. Fracture fixation and other treatment costs of osteoporosis consume a large share of healthcare budgets [4, 46]. To reduce the treatment cost, osteoporosis therefore needs to be diagnosed in its early stages.

Clinically, osteoporosis is diagnosed with Dual Energy X-ray Absorptiometry (DXA) [13], which determines the bone mineral density (BMD) in terms of the T-score and Z-score values approved by the WHO for the different stages of osteoporosis [65]. However, DXA has limitations: it gives only areal measurements, and it is costly and less widely available. Other imaging modalities used for osteoporosis detection are the Quantitative Ultrasound System (QUS) [19, 21], Computed Tomography (CT) [6, 28], and Magnetic Resonance Imaging (MRI) [7, 18]. MRI at 3 T offers improved imaging of bone microarchitecture but is very costly and has lower spatial resolution [7, 18]. CT provides 3D geometric imaging with volumetric measurements but involves a high dose of radiation and does not meet the WHO's definition for osteoporosis detection [6, 28]. QUS is simple, non-invasive, portable, and cost-effective, and uses sound waves to study bones, but it is site-specific and lacks strong empirical evidence [19, 21]. Considering these limitations, a cost-effective, readily available, and accurate detection system is required. This has led researchers to take advantage of recent advances in imaging technology and analyze medical images with computer algorithms, forming computer-aided diagnostic (CAD) systems.

In recent years, among CAD systems for medical image analysis, deep learning-based convolutional neural network (CNN) techniques have gained popularity [38, 66] due to their state-of-the-art results in tasks such as brain tumor detection [41], breast cancer detection [8], pneumonia detection [47], cancer detection [48, 49], human activity recognition [2], and multiple sclerosis detection [3]. CNNs such as AlexNet, ResNet-50, VGG-16, VGG-19, and GoogleNet [23, 33, 51, 56] have shown state-of-the-art results in the classification of medical images. The main challenge in using CNN classifiers is that they need a huge amount of labeled data for training, whereas large datasets are difficult to obtain in the medical field. To address this issue, researchers have turned to transfer learning [60]. In transfer learning, a CNN trained on a huge dataset is retrained on a smaller dataset from a new problem; the CNN uses the knowledge gained from the large dataset to learn the features of the small dataset more easily and thus classify its images effectively.

Many CAD systems, including deep learning-based ones, have been proposed for osteoporosis diagnosis at various bone sites such as the hip, spine, hand, and teeth, but little work has been done on detecting knee osteoporosis [62, 63]. The knee is the most stressed joint: it bears the weight of the body and is responsible for mobility. As the aged population grows, the incidence of osteoporotic fractures around the knee increases, with women at greater risk of tibial and fibular fractures [10]. It is estimated that around half of knee fractures occur in patients older than 50 years, and among elderly patients who sustain femoral fractures, reduced function, a lower quality of life, and a high 1-year mortality rate of 22% are noted [53]. An early detection system is therefore needed to detect osteoporosis in the knee bone, prevent fractures, and reduce treatment costs [1, 44, 61].

In this paper, we combine the power of CNN architectures with the cost-effectiveness of X-ray imaging to build an early detection system for knee osteoporosis. Our model uses the prominent CNNs AlexNet, VggNet-16, ResNet, and VggNet-19 to classify knee X-ray images.

The main contributions of our study are summarised below:

  • A labelled dataset of knee X-rays classified as normal, osteopenic, and osteoporotic according to the T-score measured by the QUS system is presented.

  • Four prominent CNN networks (AlexNet, VggNet-16, ResNet, and VggNet-19) are considered for experimentation using Fastai, a library built on PyTorch [23].

  • Transfer learning is applied to all CNN networks, and the results are compared to find the most appropriate network for use in clinical practice for osteoporosis detection.

  • To the best of our knowledge, this is the first study to detect osteoporosis in the knee bone with a labeled dataset covering all three classes, i.e., normal, osteopenia, and osteoporosis.

The rest of the paper is organized as follows. Related work is discussed in Section 2. Section 3 describes the dataset used in the study and the methods applied to it. Section 4 presents the experiments and results, and Section 5 discusses the results and compares them with the existing literature. Finally, Section 6 concludes the paper and presents the limitations of the study and future directions.

2 Related work

Machine learning approaches, especially deep convolutional neural networks (DCNNs), have shown state-of-the-art results in disease detection [15]. Many researchers have successfully used machine learning approaches to build osteoporosis diagnosis systems from different types of images [63]. In this section, we discuss the latest work on osteoporosis diagnosis using deep convolutional neural networks.

Computed radiography images were utilized in [22] in 2016 to detect osteoporosis from phalanges with DCNN. They used three-fold cross-validation for evaluation and achieved a good diagnosis ratio.

Naoufami et al. [59] proposed a DCNN to detect osteoporotic vertebral fractures (VF). Computed tomography images of vertebrae were used to extract logical features, and the performance of the system was comparable to that of practicing radiologists. Derkatch et al. [12] used a DCNN to detect vertebral fractures from DXA images with good accuracy. Krishnaraj et al. [32] used CT scans of vertebrae to distinguish osteoporotic from non-osteoporotic subjects; they segmented the CT images with a U-net CNN and achieved good accuracy. Fang et al. [16] also used vertebral CT scans for osteoporosis detection, classifying normal and osteoporotic vertebrae with a DenseNet-121 CNN classifier. Zhang et al. [71] employed a DCNN to detect osteoporosis and osteopenia in lumbar spine radiographs, but their dataset contained images of only women aged ≥50. Lee et al. [36] extracted features from spine X-rays with CNN architectures and passed them to machine learning classifiers; they achieved a maximum classification accuracy of 71% with VGG for feature extraction and random forest for classification. Yasaka et al. [68] used a CNN to predict the BMD of lumbar vertebrae from computed tomography images of the abdomen and found a good correlation between the BMD predicted by the CNN and the DXA BMD. Sollmann et al. [52] studied computed tomography scans of the spine and assessed the volumetric bone mineral density with a CNN. They compared the results with the volumetric bone mineral density obtained from routine CT and found that the CNN gives high diagnostic accuracy.

Dental Panoramic Radiographs (DPRs) were utilized by Lee et al. [35] to diagnose osteoporosis from teeth with the help of a convolutional neural network. This DCNN surpassed the results of oral and maxillofacial radiologists. DPRs were also used in [37] for osteoporosis detection, where transfer learning was employed in a VGG-16 CNN classifier to improve its classification performance. Yu et al. [70] used the AlexNet CNN to detect osteoporosis from dental panoramic radiographs. They classified the DPRs as osteoporotic or non-osteoporotic with good accuracy but did not include osteopenia as a separate class. Sukegawa et al. [54] also studied DPRs with CNNs to detect osteoporosis and reported good performance. Adding clinical covariates further improved their classification performance.

The magnetic resonance images of the proximal femur were studied by Deniz et al. [11] for osteoporosis detection. They used DCNN to segment the proximal femur for measuring the quality of bone and assessment of fracture.

Two CNN models, MS-Net (Mark-Segmentation-Network) and BCC-Net (Bone-Conditions-Classification Network), were proposed by Tang et al. [57] for osteoporosis diagnosis: MS-Net selects the ROI, and BCC-Net determines the bone type on the basis of features extracted from the ROI. The approach achieved 76.65% accuracy.

Liu et al. [40] diagnosed osteoporosis from X-ray images of the pelvis. They calculated an energy function from the softmax of their proposed U-net model, which uses the deep features of the medullary joint from X-rays to detect osteoporosis. This study performs poorly on images of the bone mass reduction group and the osteoporosis group. Yamamoto et al. [67] detected osteoporosis from hip radiographs using CNNs. They combined clinical covariates with the images and found that this improved performance; the best performance was achieved by the EfficientNet CNN.

The AlexNet classifier was used by Tecle et al. [58] for the diagnosis of osteoporosis. They used X-ray images of the hand and classified osteoporotic and non-osteoporotic images from the segmented second metacarpal region.

He et al. [24] analysed the knee X-rays and proposed to use two radiographic parameters namely cortical bone thickness and distal femoral cortex for bone quality assessment. These parameters were found to have a significant correlation with BMD and T-score.

From the related literature above, we find that knee osteoporosis is under-studied compared with other sites such as the vertebrae and teeth. Detecting osteoporosis from the knee can protect vital organs such as the kidneys and pancreas from exposure to harmful radiation during image acquisition. X-rays are also the cheapest form of medical imaging available and can help build a cost-effective system. We have used knee X-ray images classified as normal, osteopenic, and osteoporotic on the basis of T-score values obtained from the QUS system to train the CNN networks. Transfer learning helps the CNNs perform well even when trained on a small dataset.

3 Materials and methods

3.1 Dataset

The knee X-ray images were collected at the BMD camp organized by the Unani and Panchkarma Hospital, Srinagar, J&K, India, and its sister branches from 21-12-2019 to 31-12-2019 in central Kashmir, north Kashmir, and south Kashmir. The camp was organized on the hospital premises and was open to participants of all age groups and genders from different regions of Kashmir, India. The dataset consists of X-ray images as well as osteoporosis-related clinical factors for each participant. Each participant first went through a personal interview in which they were informed about the procedure of the QUS BMD test, and various clinical factors such as age, gender, height, previous history of fracture or any other pathology, lifestyle habits, and medications were documented. Written consent was obtained from each participant for using their data in the research study without personal details such as name and address. The BMD was then measured just below the knee with the peripheral bone assessment QUS system known as the Sunlight Omnisense 7000S with simulation software from Pegasus Prestige (Osteomed, DMS, France). The QUS system was chosen because it is radiation-free, multisite, easy to use, affordable, accurate, and portable, and fits the osteoporosis diagnosis criteria of the WHO [17]. The report generated by this QUS system contains the Z-score value, the T-score value, the diagnosis (normal, osteopenia, or osteoporosis), and the area of assessment used for measuring the BMD. After the BMD measurement, knee X-rays in anteroposterior (AP) view were taken from the participants who consented to undergo an X-ray. Of the 932 participants who went through the BMD test, only 240 consented to the X-ray scan. The X-rays obtained were then assigned to the different BMD classes on the basis of the T-score values obtained from the QUS system, using the thresholds recommended by the WHO. The BMD tests confirmed that among the 240 participants, 37 had normal BMD (18 males and 19 females), 154 were osteopenic (59 males and 95 females), and 49 were osteoporotic (31 males and 18 females). The dataset is available at [43]. The demographic information of the 240 participants is given in Table 1, and sample knee X-rays from the normal, osteopenic, and osteoporotic classes are shown in Fig. 1 [64].
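The labelling rule described above can be sketched in a few lines. This is a minimal illustration that assumes the commonly cited WHO T-score cut-offs (normal at or above -1.0, osteoporosis at or below -2.5, osteopenia in between); the function name and the example T-scores are hypothetical and are not taken from the study dataset.

```python
def who_bmd_class(t_score: float) -> str:
    """Map a T-score to a WHO bone-density category (assumed cut-offs)."""
    if t_score >= -1.0:
        return "normal"
    if t_score > -2.5:
        return "osteopenia"
    return "osteoporosis"


# Hypothetical T-scores for illustration only, not values from the dataset.
example_t_scores = [0.3, -1.0, -1.7, -2.5, -3.1]
labels = [who_bmd_class(t) for t in example_t_scores]
print(labels)
```

The two boundary values (-1.0 and -2.5) are included to show which side of each threshold they fall on under this assumed convention.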

Table 1 Demographic information like lifestyle factors, clinical factors, and no. of samples in QUS classified Classes for the dataset. BMI: body mass index
Fig. 1 Sample images from the database, from top to bottom: (a) normal X-ray, (b) osteopenia X-ray, (c) osteoporosis X-ray

Some of the X-rays collected from the camp contained scans of both knees, so the left and right knee X-rays were separated and all X-rays were brought to the same dimensions, giving a final total of 381 knee X-rays. The region of interest containing the knee joint and some area of the limb above and below it was extracted from each X-ray for further processing. In this study, we have used only the X-ray images from the database to build a vision-based classification system with CNNs. The image dataset was further split into training and validation sets. The CNNs AlexNet, ResNet-18, VggNet-16, and VggNet-19 are trained on the training set, and the accuracy of each classifier is then validated on the validation set.

3.2 Proposed methodology

Figure 2 shows the block diagram of the proposed model for detecting osteoporosis from knee X-rays. First, the knee X-rays are collected as described in Section 3.1 to form an image dataset, which is then split into training data (used to train the CNN classifier) and test data (used to test the trained classifier). The training data is augmented to increase the number of images in the training set, as CNNs work better with more data. The image set is then passed to the CNN model for training. Finally, the prediction ratio on the training and test data is analysed, and the performance of the classifier in classifying images as normal, osteopenic, or osteoporotic is evaluated.

Fig. 2 Block diagram of the proposed methodology

3.3 CNN architectures

A CNN is a variant of deep neural networks whose intermediate layers are based on the principle of convolution. Convolution is a mathematical operation in which one function is combined with another to produce a new function with modified features. In image processing, the image is convolved with a filter that is smaller than the image in height and width, which reduces the size of the image while maintaining the basic information it contains. Compared with other deep learning architectures, CNNs have received more interest from researchers because they can utilize both the configural and the spatial information of 2D as well as 3D images [34]. The strength of a CNN is that it learns directly from the image data, without the separate feature extraction [27] or object segmentation [55] steps required by other machine learning methods. Many CNNs have been developed to solve various types of problems; they differ from one another in various aspects, but the basic components are the same. A CNN consists of three types of layers: convolutional, pooling, and fully connected. The convolutional layer learns feature representations of the input images using a set of filters. The pooling layer downsamples these representations, which reduces the computation and the number of parameters and provides shift-invariance. It is usually placed between two convolutional layers. A network can contain any number of convolutional and pooling layers, and stacking them properly yields feature maps containing higher-level representations. One or more fully connected (FC) layers follow the stacked convolutional and pooling layers, before the output layer, to perform the task of reasoning.
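To make the convolution and pooling steps concrete, the following is a minimal pure-Python sketch on a toy 5×5 image. It is for illustration only: the helper names, the toy image, and the 2×2 edge filter are assumptions, and real CNN layers learn their filter weights and run on optimized tensor libraries.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, non-zero only at the edge
pooled = max_pool2d(feature_map)      # 2x2 map after downsampling
print(feature_map)
print(pooled)
```

The filter output is non-zero only where the edge lies, and pooling keeps that response while halving each spatial dimension, which is the size reduction with preserved information described above.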

In our study, we have employed the popular pretrained CNN architectures namely AlexNet, ResNet-18, VggNet-16, and VggNet-19.

3.3.1 AlexNet

AlexNet, proposed by Krizhevsky et al., is known for its breakthrough in classifying the 1.2 million high-resolution (HR) images of the ImageNet LSVRC contest into 1000 different classes; it won ILSVRC-2012 with a top-5 error rate of 15.3% and outperformed the previous state-of-the-art architectures. The network consists of 5 convolutional layers, some of which are followed by max-pooling layers, and then three fully connected layers with a 1000-way softmax classifier at the end. The basic architecture of AlexNet from [33] is shown in Fig. 3. AlexNet has been used in many applications to classify different types of images. In disease detection from medical images, AlexNet has shown efficient results and outperformed medical experts in many applications such as brain tumor detection from brain MRI [42], skin lesion detection [25], and COVID-19 detection [50]. AlexNet was chosen for this comparison because its training speed is reported to be 5 times faster than that of other DL architectures, it works with any GPU without extra hardware requirements, and it uses the ReLU activation function, which accelerates the convergence of stochastic gradient descent [25].

3.3.2 ResNet-18

The ResNet CNN architecture, proposed by He et al. [23], won the ILSVRC challenge of 2015, bringing the top-5 error rate as low as 3.6%. It was an extremely deep network with 152 layers. ResNets are built from multiple stacks of residual blocks. Residual blocks feed the activation of one layer to a layer deeper in the network by using skip connections, which helps the network train faster. ResNet has many variants, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, named after the number of layers in the network. In our study, we have used the ResNet-18 architecture because our dataset is not large. The network architecture of ResNet-18 is given in Fig. 3. In medical image classification, ResNet has shown very promising results in detecting brain pathology [39], classifying thyroid ultrasound images [20], detecting breast cancer [69], etc.
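The skip connection described above can be written compactly. The expression below follows the standard residual formulation for a two-layer block, with biases omitted; the symbols are introduced here for illustration and do not appear elsewhere in this paper.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\mathcal{F}(\mathbf{x}, \{W_1, W_2\}) = W_2\,\sigma(W_1 \mathbf{x})
```

Here \mathbf{x} and \mathbf{y} are the input and output of the block, W_1 and W_2 are the weights of its two layers, and \sigma is the ReLU activation. Because the input \mathbf{x} is added back unchanged, the block only has to learn the residual \mathcal{F}, and gradients can flow directly through the identity path.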

Fig. 3 Basic network architecture of AlexNet, ResNet-18, VggNet-19, and VggNet-16

3.3.3 VggNet

The VggNet CNN architecture was proposed by Simonyan et al. [51] of the Visual Geometry Group of Oxford University and was the first runner-up in the ILSVRC challenge of 2014. The main aspect of VggNet is its cascading network architecture: it uses small 3×3 convolution filters and a pooling layer after every 2 or 3 convolutional layers. The network has two variants, VggNet-16 and VggNet-19, named after their 16 or 19 weight layers (convolutional plus fully connected). The general architecture of the VggNets is given in Fig. 3. In medical diagnosis, VggNet has shown state-of-the-art results in detecting many diseases from medical images, such as diabetic retinopathy [31], Alzheimer's disease [14], and malaria [45].

The model architecture layers and basic features of AlexNet, VggNet-16, ResNet, and VggNet-19 are given in Table 2.

Table 2 The table depicts the number of convolution, pooling layers, FC layers, and basic features of AlexNet, VggNet-16, ResNet-18, and VggNet-19 architectures

Fastai [26] is a layered application programming interface for deep learning. It provides high-level components that help standard deep learning architectures reach state-of-the-art results quickly and easily, as well as low-level components that can be altered or extended to build new approaches. The library combines the dynamism of the Python language with the flexibility of the PyTorch library, which makes it a good choice for implementing deep learning architectures.

3.4 Transfer learning

Transfer learning is the reuse of a pre-trained model on a new problem. In transfer learning, a machine exploits the knowledge gained from a previous task to improve generalization on another. It is currently very popular in deep learning because it allows deep neural networks to be trained with comparatively little data. This matters in the medical field, where obtaining the millions of labelled images required to train a convolutional neural network from scratch is a great challenge. Its benefits include reduced training time, better performance of neural networks (in most cases), and a reduced need for data.

4 Experimentations and results

For experimentation, we have compared the performance of four CNN architectures, namely AlexNet, ResNet-18, VggNet-16, and VggNet-19, in classifying the three stages of osteoporosis in knee X-rays. These architectures have been used successfully to classify medical images of other diseases and can therefore be applied to osteoporosis detection from knee X-rays. They classify the images by extracting feature maps of the content of a knee X-ray. The architectures differ from one another in the number of layers (convolution, pooling, or FC) and in some other units. The basic architecture details and features of the CNN architectures used are given in Table 2.

The CNN architectures were loaded from the Fastai library using the cnn_learner function. CNN architectures are data-hungry networks, and we have only 381 knee X-ray scans (60 in the normal, 245 in the osteopenia, and 76 in the osteoporosis class), so data augmentation was applied to increase the number of images by using the Dataloader() function from the Fastai library. Because of the small number of images, the dataset was divided into training and validation sets in a 95:5 ratio. To check whether transfer learning improves the classification performance, all four CNN architectures (AlexNet, ResNet-18, VggNet-16, and VggNet-19) were first trained only on the training set of knee X-ray images for 10 epochs, and then the same CNNs pretrained on the ImageNet dataset, which contains millions of images, were trained on the knee X-ray dataset. The performance of both types of CNNs was measured in terms of accuracy, error rate, and validation loss, which are shown below in table form as well as graphically. Tables 3, 4 and 5 show the results obtained when the CNNs were not pretrained on the ImageNet dataset, and the corresponding graphs are displayed in Figs. 4, 5 and 6 respectively. Results for the pretrained CNNs are shown in Tables 6, 7 and 8 and graphically displayed in Figs. 7, 8 and 9.
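The arithmetic behind the 95:5 split can be sketched as follows. This is an illustrative sketch, not the procedure used in the study: it assumes a stratified split (the text does not say whether the split preserved class proportions), and the function name, seed, and image identifiers are hypothetical. Only the class counts are taken from the text.

```python
import random

# Class counts reported for the 381 knee X-rays.
CLASS_COUNTS = {"normal": 60, "osteopenia": 245, "osteoporosis": 76}


def stratified_split(class_counts, valid_frac=0.05, seed=0):
    """Split placeholder image ids per class (assumed stratified 95:5 split)."""
    rng = random.Random(seed)
    train, valid = [], []
    for label, n in class_counts.items():
        ids = [(label, k) for k in range(n)]  # hypothetical image identifiers
        rng.shuffle(ids)
        n_valid = round(n * valid_frac)
        valid.extend(ids[:n_valid])
        train.extend(ids[n_valid:])
    return train, valid


train, valid = stratified_split(CLASS_COUNTS)
total = sum(CLASS_COUNTS.values())

# Accuracy of a trivial classifier that always predicts the largest class.
majority_baseline = max(CLASS_COUNTS.values()) / total

print(len(train), len(valid))
print(round(majority_baseline, 3))
```

Under these assumptions the validation set holds about 19 images, so a single image changes the validation accuracy by roughly 5 percentage points, and a classifier that always predicts osteopenia would already reach about 64% accuracy. Both points are worth keeping in mind when reading the accuracy tables.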

Table 3 The table depicts the error rates achieved by AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs
Table 4 The table depicts the validation losses of AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs
Table 5 The table depicts the accuracies of AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs
Fig. 4 The classification accuracies for AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs

Fig. 5 Error rate of AlexNet, VggNet-19, ResNet, and VggNet-16 achieved for 10 epochs

Fig. 6 Validation loss of AlexNet, VggNet-19, ResNet, and VggNet-16 achieved for 10 epochs

Table 6 The table depicts the error rates achieved by AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs
Table 7 The table depicts the validation losses of AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs
Table 8 The table depicts the accuracies of AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs
Fig. 7 The classification accuracies for AlexNet, VggNet-19, ResNet, and VggNet-16 for 10 epochs

Fig. 8 Error rate of AlexNet, VggNet-19, ResNet, and VggNet-16 achieved for 10 epochs

Fig. 9 Validation loss of AlexNet, VggNet-19, ResNet, and VggNet-16 achieved for 10 epochs

Figure 10 shows the confusion matrices of the AlexNet, ResNet, Vgg-19, and Vgg-16 models obtained on the validation set for the pretrained CNNs. The osteopenia group has the highest classification accuracy, and the osteoporosis images have the lowest. Neither variant of the VggNet CNN was able to classify the osteoporotic images, even though their highest overall accuracies were 86.3% and 84.2% for Vgg-16 and Vgg-19 respectively. The poor classification performance on the osteoporosis group, and then on the normal group, is due to the low number of images in these classes: the collected dataset of knee X-ray images had more images in the osteopenia group than in the normal and osteoporosis groups. The results from all CNN architectures suggest that X-ray images can be used to detect osteoporosis from the knees.

Fig. 10 The confusion matrices for the validation set of the dataset: (a) Vgg-19, (b) Vgg-16, (c) ResNet, (d) AlexNet
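The way overall accuracy can hide a weak class is easy to see from a confusion matrix. The sketch below uses a hypothetical 3×3 matrix for a 19-image validation set; the counts are invented for illustration and are not the values shown in Fig. 10, and the helper names are assumptions.

```python
CLASSES = ["normal", "osteopenia", "osteoporosis"]


def accuracy(cm):
    """Overall accuracy: correctly classified images over all images."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Recall per class: diagonal entry divided by the row (actual) total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Hypothetical counts, rows = actual class, columns = predicted class.
cm = [
    [2, 1, 0],    # actual normal
    [0, 12, 0],   # actual osteopenia
    [0, 4, 0],    # actual osteoporosis, all predicted as osteopenia
]

acc = accuracy(cm)
error_rate = 1 - acc
recall = dict(zip(CLASSES, per_class_recall(cm)))
print(round(acc, 4), round(error_rate, 4))
print(recall)
```

In this invented example the overall accuracy is 14/19 while the recall for the osteoporosis class is zero, which is the pattern described above for the VggNet variants: per-class figures are needed alongside accuracy when classes are imbalanced.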

5 Discussion

This is the first study that aims to detect osteoporosis from knee X-rays classified into disease groups (osteopenia and osteoporosis) and a normal group on the basis of BMD values obtained from the medical diagnostic test QUS. We have used the power of CNN networks to classify X-ray images automatically by learning the differences between the image groups. The performance of well-known CNNs was compared in order to find the best-performing CNN for detecting osteoporosis from knee X-rays. The study included participants of all genders and ages. Deep learning architectures have previously been used to detect osteoporosis from other sites such as hand, spine, or hip scans.

From Figs. 4 and 7 we can observe that the best classification accuracy is achieved by AlexNet, and the lowest performance is obtained by Vgg-19 and Vgg-16 for the normal CNNs and pretrained CNNs respectively. Table 9 summarises the best values of all metrics for both types of CNNs. The best classification accuracies achieved by AlexNet, ResNet, VggNet-16, and VggNet-19 are 78.95%, 74.3%, 78.9%, and 73.68% for the normal CNNs and 91%, 86.4%, 86.3%, and 84.2% for the pretrained CNNs. From Table 9, the lowest error rates achieved by AlexNet, ResNet, VggNet-16, and VggNet-19 are 0.21, 0.257, 0.21, and 0.263 for the normal CNNs and 0.09, 0.136, 0.181, and 0.157 for the pretrained CNNs. The lowest validation losses of AlexNet, ResNet, VggNet-16, and VggNet-19 are 0.544, 0.138, 0.671, and 0.685 for the normal CNNs and 0.325, 0.694, 0.625, and 0.692 for the pretrained CNNs. These results suggest that although the CNNs trained only on the knee X-ray dataset showed good classification accuracy, the pretrained CNNs showed improved accuracy when trained on the same dataset. This implies that transfer learning improves the overall performance of the system without building a new CNN from scratch or adding or deleting any layer. The highest classification accuracy of 91%, achieved by AlexNet, supports the use of CNNs for the classification of knee X-ray images. Previous deep learning models for osteoporosis detection from other sites showed good performance but had some limitations. For example, Zhang et al. [71] detected osteoporosis from lumbar spine X-rays using a deep learning model, but their dataset contained images of only women aged ≥50. Lee et al. [36] achieved a maximum classification accuracy of 71% with VGG for feature extraction and random forest for classification. Yasaka et al. [68] studied CT images of vertebrae and found a good correlation between the BMD predicted by the CNN and the DXA BMD. The study of Liu et al. [40] performed poorly on images of the bone mass reduction group and the osteoporosis group. The AlexNet CNN used by Yu et al. [70] detected osteoporosis from DPRs with good accuracy but did not include osteopenia as a separate class. He et al. [24] found that radiographic parameters from knee X-rays have a significant correlation with BMD and T-score. Bortone et al. [5] used artificial neural network and support vector machine classifiers to classify subjects as having osteopenia, osteoporosis, or normal bone function on the basis of lifestyle factors and the previous history of fractures, collected from participants through questionnaires. Tang et al. [57] used a CNN model for bone type determination with an accuracy of 76.65%. Table 10 presents the comparison of our work with existing state-of-the-art works.

Table 9 Comparison of different metrics of normal CNN and pretrained CNN
Table 10 Comparison with existing state-of-the-art works

Our dataset consists of image data as well as numerical data containing clinical, lifestyle, and other important factors. In this study, however, we devised a system that detects osteoporosis directly from X-ray images. The images are grouped into three classes, namely normal, osteopenia, and osteoporosis, on the basis of the T-score calculated by the QUS system, unlike many other computer-aided systems, which are built on binary classification. The images include X-rays from both males and females, and the participants range from 18 to 107 years of age. A deep learning-based detection system for osteoporosis can help medical experts identify patients at risk of osteoporosis and osteoporotic fractures at very early stages. A deep learning model trained on labelled X-ray images can not only help diagnose osteoporosis in the early stages but also prove to be a cost-effective and easily available tool in low-income economies with large populations, such as India. The clinical factors can also help the medical practitioner make a sound decision for a patient in addition to the classification from a deep learning system. The CNN systems are completely automatic, as they do not require any additional effort for feature extraction, selection, or classification. The inability of the VggNets to classify the osteoporotic class may be the result of having fewer images in this class. Most participants were diagnosed with osteopenia by the QUS system, and all four CNNs were able to detect the knee X-rays of the osteopenia class very efficiently. The efficiency of the CNNs in classifying the normal and osteoporotic X-rays could be increased by adding more images to each class. The main outcomes of our study are summarised below:

  1. Most studies work on a particular age group or gender. The X-ray images included in our study were collected from different age groups and all genders.

  2. Our study covers all three diagnostic categories, i.e., normal, osteopenia, and osteoporosis, and the labels are validated by QUS, a medical test that calculates the T-score by measuring bone mineral density.

  3. Classification of X-rays with a CNN is fully automatic. It does not involve separate methodologies for feature extraction, selection, or classification.

  4. We have compared the performance of well-known CNN architectures, viz. AlexNet, VggNet-16, VggNet-19, and ResNet-18, in classifying knee X-rays.

  5. To overcome the problem of the small number of images in the dataset, we have used data augmentation and transfer learning.

  6. The comparison with existing state-of-the-art works (Table 10) shows that our proposed model performs well and can be used for osteoporosis detection.
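The three-class labelling described in outcome 2 amounts to thresholding the T-score. The following is a minimal sketch of that step, assuming the conventional WHO cut-offs (normal at T ≥ -1.0, osteopenia for -2.5 < T < -1.0, osteoporosis at T ≤ -2.5). The exact cut-offs applied to the QUS readings are not stated here, and the function name `label_from_t_score` is hypothetical.

```python
def label_from_t_score(t_score: float) -> str:
    """Map a T-score to one of the three classes used in the study.

    Assumption: conventional WHO cut-offs. The thresholds actually
    applied to the QUS readings are not given in the text.
    """
    if t_score <= -2.5:
        return "osteoporosis"
    if t_score < -1.0:
        return "osteopenia"
    return "normal"


if __name__ == "__main__":
    # Illustrative T-scores only, not measurements from the dataset.
    for t in (0.3, -1.8, -2.9):
        print(t, label_from_t_score(t))
```

Under these assumed cut-offs, a subject exactly at a boundary is assigned to the more severe class at -2.5 and to the normal class at -1.0.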

Our study has some limitations. Firstly, the performance of the CNNs was affected by the small number of images in the dataset, especially in the normal and osteoporosis classes. We believe that increasing the number of images in each class will improve the performance of the networks. Secondly, the T-score was calculated with the QUS system, a cost-effective technique that assesses fracture risk by examining the calcaneus (heel bone), but it gives unstable bone parameters and its validation database differs from that of DXA BMD. We can therefore further validate our dataset by measuring BMD with DXA. Thirdly, the clinical and other factors collected from the participants can also help in predicting the bone condition of a patient, but they were not used in the classification process. We will try to incorporate these features together with the image data for better diagnosis. Despite these limitations, our comparison could help to identify the best CNN architecture for clinical use in diagnosing osteoporosis at early stages, reducing the risk of fractures, which in turn will decrease the testing and treatment costs of osteoporosis.
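The effect of the first limitation is easier to see in per-class recall than in overall accuracy, because a network can score well overall while missing most images of a small class. The sketch below computes both from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not results from this study, and the helper names are our own.

```python
from typing import Dict, List

CLASSES = ["normal", "osteopenia", "osteoporosis"]


def accuracy(confusion: List[List[int]]) -> float:
    """Overall accuracy; rows are true classes, columns are predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


def per_class_recall(confusion: List[List[int]]) -> Dict[str, float]:
    """Recall of each class: correct predictions divided by the row total."""
    recalls = {}
    for i, name in enumerate(CLASSES):
        row_total = sum(confusion[i])
        recalls[name] = confusion[i][i] / row_total if row_total else 0.0
    return recalls


# Hypothetical confusion matrix (NOT the study's results): the majority
# osteopenia class dominates, the osteoporosis class is small.
example = [
    [8, 2, 0],   # true normal
    [1, 28, 1],  # true osteopenia
    [0, 6, 4],   # true osteoporosis
]

if __name__ == "__main__":
    print("accuracy:", accuracy(example))
    for name, recall in per_class_recall(example).items():
        print(name, round(recall, 3))
```

In this made-up example the overall accuracy is 0.8 even though fewer than half of the osteoporosis images are recognised, which is why per-class figures are worth reporting alongside accuracy when classes are imbalanced.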

In summary, simple knee X-ray scans taken for any reason can be passed through a CNN-based system and assessed for the risk of osteoporosis or osteopenia without any extra cost or screening. These deep learning systems have not yet gained medical acceptance, but using artificial intelligence systems to give a first indication of possible disease can be very helpful in modern medicine.

6 Conclusion and recommendation

In our study, we have evaluated and compared the performance of popular CNN architectures, namely ResNet-18, VggNet-16, AlexNet, and VggNet-19, in diagnosing osteoporosis from knee X-ray images. The X-ray images were taken from a custom dataset that was classified into normal, osteopenia, and osteoporosis groups with the help of a medically accepted BMD test, the Quantitative Ultrasound system, which calculates the T-score by measuring the BMD of bone. The custom dataset contained a total of 381 knee X-ray scans. The results show that the best performance was achieved by AlexNet, with an accuracy of 91%, and the lowest, 84.2%, by VggNet-19. All CNNs showed good diagnostic performance, which suggests that diagnosing osteoporosis from knee X-rays using transfer learning with CNNs can serve as a cost-effective and readily available diagnostic tool.

In the future, more data can be collected, especially from normal and osteoporotic subjects. Secondly, we can investigate the relationship between knee osteoporosis and osteoporosis at other sites in order to build a universal diagnostic system for osteoporosis. Thirdly, a system can be built that detects osteoporosis from clinical factors in combination with images.