1 Introduction

The “Video Surveillance” category covers major events, analyses, and other information essential to security experts who install or use video surveillance systems in corporations, educational institutions, and government agencies [1, 2]. Gender recognition in video surveillance is challenging: gait is a dynamic, complex target for gender detection research [3], and the biggest problems are concealed camera angles, poor lighting, and low-resolution security footage. Under such conditions, gait analysis is the method of choice for identifying people.

Because the human gait differs from other biometric traits, computer vision researchers are increasingly focusing on gait detection [4]. Psychophysical research also shows that gait can differentiate persons. Human gait is a pattern of motion, and gait recognition is ideal for video surveillance systems that cannot capture high-resolution facial or other biometric data. Gait classification uses model-based and model-free methods [1, 5]. Model-free gait algorithms compute gait features from silhouette images, whereas model-based gait methods use body components to model each person’s body shape before measuring features. Gait identification has various uses, including gender and age estimation, diagnosis, action recognition, and security footage monitoring.

Investigations using gait analysis typically aim first and foremost at person identification and authentication. Gender detection, on the other hand, is a type of soft biometric identification with great application potential [6]. Furthermore, some application domains may benefit from the ability to estimate a person’s gender from their gait. Most of the available research on gait-based gender recognition uses the gait feature extracted from the silhouette sequence of a gait cycle, known as the gait energy image (GEI) [7, 8].

GEI has been used in the vast majority of research on gender detection. The GEI algorithm merges the silhouettes of a single walking cycle into one image, and the brightness of each pixel in a GEI reflects the motion pattern over the entire gait cycle [9]. We tackle gait-based gender detection by employing machine and deep learning: features are extracted from GEIs with the assistance of pre-trained models, and a classifier is then used to classify gender. Moreover, adjusting the classifier’s parameters can further improve the proposed system’s efficiency.

2 Analysis on biometric gait and feature extraction

Biometric identification uses a person’s unique physical or behavioral features. Biometrics such as fingerprints, facial features, iris patterns, voice, and gait are used for secure and reliable identification. Biometric gait recognition identifies people by their walking motion, measuring the stride, stance, and swing phases of the gait cycle.

2.1 Analysis on biometric gait

The walking motion of humans is referred to as their “gait” [1, 4, 5]. The term “gait cycle” refers to the period of time, or the sequence of events, between one foot striking the ground and the next strike of the same foot. The three most important aspects of gait are the stride, the stance phase, and the swing phase [5, 8]. The term "stride" describes the distance between two consecutive ground contacts made by the same foot. The stance phase, during which the foot is in contact with the ground and which accounts for about 60% of the gait cycle, has four components: (1) initial contact, (2) loading response, (3) mid-stance, and (4) terminal stance. The swing phase, during which the same foot is not in contact with the ground and which accounts for the remaining 40%, includes (1) pre-swing, (2) initial swing, (3) mid-swing, and (4) terminal swing.

2.2 Motivation

Following are some of the reasons why gait analysis is useful for recognizing and identifying individuals:

  • Gait recognition allows remote monitoring and can be carried out with small, inexpensive equipment. Simple devices such as cameras, smartphones, and floor sensors can capture human gait.

  • Gait analysis is feasible on low-resolution footage, where poor video quality may defeat face recognition, because gait recognition relies on silhouettes and motion. In addition, gait identification does not require the participant’s cooperation.

2.3 Feature extraction

Gait recognition research focuses on silhouette and GEI images for spatial and temporal feature analysis of human walking.

2.3.1 Silhouette images

Feature extraction was carried out on the video to enable object recognition with a higher degree of precision; as a consequence, the visual sequence was represented as silhouettes. Reconstructing a body’s silhouette requires thresholding and background removal, after which a 3 × 3 median filter is applied to eliminate any isolated pixels [9, 10].
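
A minimal OpenCV sketch of these silhouette-extraction steps, assuming a static background frame is available; the threshold value, function name, and dummy frame sizes are illustrative assumptions, not the exact pipeline used in the cited work:

```python
import cv2
import numpy as np

def extract_silhouette(frame, background, thresh=30):
    """Rough silhouette extraction: background subtraction, thresholding, and a
    3 x 3 median filter to remove isolated pixels (threshold value is illustrative)."""
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_bg = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)

    # Difference between the current frame and a static background model
    diff = cv2.absdiff(gray_frame, gray_bg)

    # Threshold the difference image to obtain a binary silhouette
    _, silhouette = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)

    # 3 x 3 median filter to remove isolated noise pixels
    return cv2.medianBlur(silhouette, 3)

# Illustrative call on dummy colour frames
frame = (np.random.rand(128, 88, 3) * 255).astype(np.uint8)
background = (np.random.rand(128, 88, 3) * 255).astype(np.uint8)
mask = extract_silhouette(frame, background)
```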

2.3.2 Gait energy images (GEI)

GEI uses the weighted average method to depict the gait sequence of a gait cycle [7, 11]. From the silhouette images Bt(x, y) captured during the gait cycle, the GEI is derived as:

$$G\left( {x,y} \right) = \frac{1}{N}\sum\nolimits_{t = 1}^{N} {B_{t} \left( {x,y} \right)}$$
(1)

where N is the number of frames taken from the gait cycle, t is the frame index, and x and y are the spatial coordinates of each silhouette image Bt.
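
A short NumPy sketch of Eq. (1), computing a GEI as the per-pixel mean of the binary silhouettes of one gait cycle; the frame count and image size in the example call are illustrative:

```python
import numpy as np

def compute_gei(silhouettes):
    """Gait Energy Image as the per-pixel average of the N binary silhouettes
    of one gait cycle, i.e. G(x, y) = (1/N) * sum_t Bt(x, y) from Eq. (1)."""
    frames = np.asarray(silhouettes, dtype=np.float32)   # shape (N, height, width)
    return frames.mean(axis=0)

# Illustrative call: ten random binary 128 x 88 silhouettes from one gait cycle
cycle = np.random.randint(0, 2, (10, 128, 88)) * 255
gei = compute_gei(cycle)
```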

2.4 GEI based application for gait analysis

The following are GEI-based applications of gait analysis that can be useful for recognizing and identifying individuals:

  • The GEI image lets us assess a person’s stance and swing frequency to predict their ideal stride frequency, and also to estimate angles and their ordering. As a result, it helps to capture a more precise motion pattern of an individual’s arms, head, and legs.

  • Systems based on gender can also be built with it; gender-dependent entry and access restrictions for colleges and hostels are examples of applications that fall within this category.

3 Related work

Kwon and Lee [12] investigated JSE for three-dimensional gender detection and recognition. JSE is a kinematic gait feature that measures the distance of body joints from anatomical planes as a model skeleton walks. The anatomical planes are derived from static positions rather than dynamic motion. In the first stage, walking skeleton-model gait sequences generate median, transverse, and frontal body-centered coordinates. In the second stage, the JSEs of the three planes are extracted and merged into a feature vector. Labelled JSE data is used to train the classification model, which then classifies unlabeled JSE data by gender. Four datasets were used: A and B were captured with Kinect v1 and v2, while C and D used UPCVgaitK1 and UPCVgaitK2. They explored different machine learning gender classification methods; on dataset B, JSE with SVM scored 98.08%.

Upadhyay and Gonsalves [13] addressed viewing angles that prevent the camera from seeing an individual’s body-part movement, using gait-based features for gender detection. After applying a DCT feature vector, an XGBoost classifier classifies gender. The proposed system’s performance is assessed on the OU-MVLP dataset, and the experiment uses fourteen viewing angles to determine gender with a CCR of 95.33%.

Khabir et al. [14] showed how to classify gender from an inertial-sensor-based gait dataset drawn from the massive OU-ISIR dataset. They found that SVM achieved the highest gender classification accuracy at 84.76%.

Xu et al. [15] examined the probability of identifying gender from an individual frame rather than a full sequence of gait data, which is more practical for real-world applications. They utilized OU-MVLP, which includes 10,307 subjects. The SVM algorithm’s CCR for gender classification from a single image is 94.27%.

Bei et al. [10] investigated optical-camera-based gender recognition. Sub-GEIs extracted from the gait cycle provide temporal movement cues that are fused with body appearance for more accurate gait analysis in difficult settings. A two-stream CNN represents the GEI and handles the motion information of the sub-GEIs better than single-stream CNN models, evaluated on the CASIA-B gait dataset. The investigations used a four-layer CNN, Inception-V3, and VGG16 in the first stream and three sub-GEIs (TL1, TL2, and TL3, built from four, six, and eight image frames, respectively) in the second stream. The experiment analyses 11 viewing angles, with the 90° angle yielding the best gender classification accuracy of 95% when applying the Inception-V3 algorithm on the TL-2 sub-GEI.

Gillani et al. [16] examined gender detection based on attributes collected from inertial data. Inertial Measurement Unit sensors containing triaxial gyroscopes and accelerometers were used to obtain the inertial signals. They employed the OU-ISIR dataset and divided it into sequences, with participants instructed to walk a predetermined path twice: once on first reaching the route (seq 0) and once while returning along the same route (seq 1). The logistic regression classifier achieved the highest CCR for gender classification, 68.2% for sequence 0 and 65% for sequence 1.

Chen et al. [17] developed a customized dataset using insole pressure mats that measured the center of pressure for feature extraction, capturing 960 steps from each of the 24 younger and older subjects in the study. For applying SVM, they split the 30 features into four stages: the initial contact stage, forefoot contact stage, foot flat stage, and forefoot push-off stage. Using 13 steps from each participant, they obtained an accuracy of 95% for gender classification.

4 Proposed method

In comparison to previous research efforts in gait recognition, which have employed various methodologies and datasets, this study introduces a novel approach centered on the OULP-Age dataset and gait energy images (GEIs) for gender prediction. By utilizing pre-trained models for feature extraction from GEIs and fine-tuning the parameters of the XGBoost classifier, this research presents a distinctive perspective on gait analysis and gender prediction.

Unlike prior studies, which focused on different datasets and methodologies such as Joint Symmetry Estimation (JSE) or Discrete Cosine Transform (DCT), our approach offers a unique dataset and methodology for gender prediction, showcasing the effectiveness of DenseNet pre-trained models and optimized XGBoost parameters. This contribution adds value to the field by addressing the need for improved gender prediction based on gait analysis and provides insights into optimizing feature extraction and classification techniques for gender prediction in gait analysis.

Sections 4.1 to 4.4 explain the proposed method for predicting a person’s gender.

4.1 Flow of proposed method

The purpose of the gait analysis method is to recognize and identify individuals’ genders using their unique gait patterns. Figure 1 depicts the flow of the proposed method. First, gait analysis is performed on the OULP-Age dataset. The dataset is then separated into training and testing sets, and a range of pre-trained models is used to extract gait features. Once the features have been extracted, a classifier is used to classify them, and the classifier’s parameters are tuned to improve the accuracy of the results. Finally, the performance of the classifier is compared before and after parameter tuning to analyze the results for predicting a person’s gender based on gait analysis.

Fig. 1 Flow of the proposed method

In the training and testing phase, the dataset is split evenly into 50% for training and 50% for testing. This balanced allocation exposes the model to enough data during training to learn the patterns and relationships within the dataset, while an equal share for testing enables a rigorous evaluation of the model’s performance on unseen data and of its ability to generalize to new examples. The approach mitigates the risks of overfitting or underfitting and simplifies evaluation by allowing a clear comparison between predicted outcomes and actual observations, enhancing the reliability and efficacy of the machine learning model.
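
As a minimal sketch of such a balanced split (the array shapes, the 0/1 label encoding, and the use of scikit-learn’s train_test_split are illustrative assumptions; in this study the OULP-Age protocol in fact provides fixed training and testing subject lists), the 50–50 allocation could look like:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for the extracted GEI feature vectors and gender labels
features = np.random.rand(100, 1920)    # e.g. one DenseNet201 feature vector per GEI (assumed shape)
labels = np.random.randint(0, 2, 100)   # 0 = female, 1 = male (assumed encoding)

# 50% training / 50% testing, stratified so both splits keep the gender balance
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.5, stratify=labels, random_state=42)
```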

4.2 Pre-trained models

The term “pre-trained” is used in machine learning when parts of a previously trained model are incorporated into new models or used to tackle a different problem [18,19,20]. This way of constructing machine learning models requires less time, money, and labeled data to produce accurate results. Among the pre-trained models, we achieved the most satisfying results with the DenseNet family, the details of which are given in the following section.

4.2.1 DenseNet

Our main proposed method relies on DenseNet. It is available through TensorFlow 2.0 (TF 2.0) and Keras in several versions, namely DenseNet-121, DenseNet-169, and DenseNet-201. Besides the standard convolutional and pooling layers, DenseNet has two other crucial blocks. The DenseNet versions share the same convolution block, pooling layer, transition layer, and classification layer architectures; however, each version has its own set of four dense blocks with different repetition counts [21].

The first convolutional block consists of 64 filters of size 7 × 7 with a stride of 2, followed by a max pooling layer with a 3 × 3 window and a stride of 2. Within each convolutional block, the input is processed by Batch Normalization, ReLU activation, and Conv2D layers. Each dense block performs a pair of convolutions using kernel sizes of 1 × 1 and 3 × 3.

  • In DenseNet-121, dense blocks 1, 2, 3, and 4 are repeated 6, 12, 24, and 16 times, respectively.

  • In DenseNet-169, dense blocks 1, 2, 3, and 4 are repeated 6, 12, 32, and 32 times, respectively.

  • In DenseNet-201, dense blocks 1, 2, 3, and 4 are repeated 6, 12, 48, and 32 times, respectively.

Each transition layer halves the number of channels; it uses a 1 × 1 convolution followed by 2 × 2 average pooling with a stride of 2. The classification layer uses 7 × 7 global average pooling followed by a 1000-dimensional fully connected softmax layer.
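
As a hedged illustration of how such a pre-trained DenseNet can serve as a fixed feature extractor for GEIs (the 224 × 224 input size, three-channel replication, ImageNet weights, and dummy data are assumptions for the sketch; the paper’s raw GEIs are 88 × 128 grayscale images):

```python
import numpy as np
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.applications.densenet import preprocess_input

# ImageNet-pretrained DenseNet201 without the classification head; global
# average pooling turns each input image into a fixed-length feature vector.
extractor = DenseNet201(weights="imagenet", include_top=False, pooling="avg")

def extract_features(gei_batch):
    """gei_batch: float array of shape (n, 224, 224, 3) -- GEIs resized and
    replicated to three channels (an assumption; the raw GEIs are 88 x 128)."""
    x = preprocess_input(gei_batch.astype("float32"))
    return extractor.predict(x, verbose=0)

# Illustrative call on dummy data shaped like pre-processed GEIs
dummy_geis = np.random.rand(4, 224, 224, 3) * 255.0
features = extract_features(dummy_geis)   # shape (4, 1920) for DenseNet201
```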

4.2.2 VGG16

VGG16 includes 13 convolutional and three fully connected layers. The first two convolutional layers each have 64 filters of size 3 × 3, followed by two layers with 128 filters, three with 256 filters, three with 512 filters, and three more with 512 filters. Three fully connected layers with 4096 neurons each follow the convolutional layers, followed by a 1000-neuron classification layer [22].

4.2.3 VGG19

VGG19 includes 19 layers: 16 convolutional and three fully connected. The first two convolutional layers have 64 filters of size 3 × 3, followed by two layers with 128 filters, four with 256 filters, four with 512 filters, and four more with 512 filters. Three fully connected layers with 4096 neurons each follow the convolutional layers, followed by a 1000-neuron classification output layer [23, 24].

4.2.4 ResNet50

ResNet50 is so called because it has 50 layers. It accepts 224 × 224 × 3 images. A first 7 × 7 convolutional layer with stride 2 compresses the input image to 112 × 112 × 64, followed by batch normalisation and ReLU activation. Four groups of convolutional blocks with different numbers of filters follow: block 1 uses 3 × 3 convolutional layers with 64 filters, repeated 3 times; block 2 uses 128 filters, repeated 4 times; block 3 uses 256 filters, repeated 6 times; and block 4 uses 512 filters, repeated 3 times. ResNet50 uses 1 × 1 convolutional layers in its shortcut connections to map the block input to the block output. After the convolutional layers, a global average pooling layer reduces the output spatial dimensions to 1 × 1, followed by a fully connected layer with 1000 units and a softmax activation function representing the 1000 ImageNet classes [24].

4.2.5 NASNetMobile

NASNetMobile has 22 layers grouped into repeating cells. The input layer receives image data, which is processed through a series of layers: a 3 × 3 convolutional layer with 32 filters and a stride of 2, a batch normalisation layer, a ReLU activation layer, a 3 × 3 separable convolutional layer with 32 filters and a stride of 1, another normalisation layer, a second ReLU activation layer, and a 3 × 3 max pooling layer [25].

4.2.6 NASNetLarge

NASNetLarge is an advanced image classification architecture based on NASNetMobile. Its 87-layer network design is cell-based. Image data enters at the first layer, followed by a 3 × 3 convolutional layer with 96 filters and a stride of 2, a batch normalisation layer that normalises the output of the previous layer to speed up training, and a ReLU activation layer that introduces non-linearity. A 5 × 5 separable convolutional layer with 256 filters and a stride of 1 follows, together with another batch normalisation layer and ReLU activation layer. A 3 × 3 max pooling layer with stride 2 finishes the cell [26].

4.2.7 Xception

Xception (short for “Extreme Inception”) is a deep convolutional neural network architecture. Xception accepts 299 × 299 × 3 images (where 3 is the number of channels for RGB images). Instead of typical convolutions, Xception uses depthwise separable convolutions, in which a depthwise convolution is followed by a pointwise convolution. After the convolutional layers, a global average pooling layer reduces the output spatial dimensions to 1 × 1 [27].

4.2.8 InceptionResNetV2

InceptionResNetV2 combines the Inception and ResNet architectures. Skip connections enhance gradient propagation during training across its 164 layers. The network consists of a stem module, numerous Inception-ResNet-A, B, and C modules, and a classification layer. The stem module processes images via convolutional, pooling, and activation layers, while the Inception-ResNet-A, B, and C modules use 1 × 1 convolutions and max pooling to collect features at different scales. Finally, a global average pooling layer and a fully connected layer produce class predictions in the classification layer [26].

4.2.9 InceptionV3

The input layer of InceptionV3 accepts images of shape (299, 299, 3). Two convolutional layers, max pooling, and batch normalisation reduce the input image’s spatial dimensions and extract basic features. Seventeen Inception modules containing convolutional filters and pooling operations follow, with each module receiving the concatenated outputs of the previous one. Two auxiliary classifiers are added after Inception modules 5 and 11; each has a global average pooling layer, two ReLU-activated fully connected layers, and a softmax classification layer. The network ends with a global average pooling layer, a fully connected layer, and a softmax classification output layer [26].

4.3 Evaluation metrics for the proposed method

The accuracy of gender prediction can be assessed using the evaluation metrics listed below.

Confusion Matrix The confusion matrix is a useful tool for evaluating classifiers and works for both binary and multiclass classification. It exposes algorithmic errors by comparing model output to actual output and provides the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

TP: the number of correct positive predictions made by the model.

FP: the number of incorrect positive predictions made by the model.

TN: the number of correct negative predictions made by the model.

FN: the number of incorrect negative predictions made by the model.

The following statistical measures can also be computed given the gender distribution of the underlying data: true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), and false negative rate (FNR).

TPR: The ratio of correctly classified positive instances to the total actual positive instances.

$$TPR = \frac{TP}{{TP + FN}} * 100$$
(2)

FPR: The ratio of incorrectly classified positive instances to the total actual negative instances.

$$FPR = \frac{FP}{{FP + TN}} * 100$$
(3)

TNR: The ratio of correctly classified negative instances to the total actual negative instances.

$$TNR = \frac{TN}{{TN + FP}} * 100$$
(4)

FNR: The ratio of incorrectly classified negative instances to the total actual positive instances.

$$FNR = \frac{FN}{{FN + TP}} * 100$$
(5)

Precision: it measures the quality of a model's positive predictions; precision is the fraction of true positive predictions (i.e., correctly predicted positive samples) over the model's total positive predictions.

$$Precision = \frac{TP}{{TP + FP}} * 100$$
(6)

Recall: commonly called sensitivity or the true positive rate, it assesses a model's ability to identify actual positives; recall is the number of true positives divided by the sum of true positives and false negatives.

$$Recall = \frac{{TP}}{{TP + FN}}*100$$
(7)

F1-Score: it combines precision and recall and is used to evaluate binary classification models.

$$F1 - Score = 2*\frac{{Precision*Recall}}{{Precision + Recall}}*100$$
(8)

Correct Classification Rate (CCR): It estimates the proportion of samples correctly classified. It is determined by dividing correct predictions by total predictions.

$$CCR = \frac{TP + TN}{{TP + TN + FP + FN}} * 100$$
(9)
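
A small sketch that computes Eqs. (2)–(9) from a binary confusion matrix with scikit-learn; the 0/1 label encoding and the example call are assumptions for illustration:

```python
from sklearn.metrics import confusion_matrix

def gender_metrics(y_true, y_pred):
    """Compute the rates and scores of Eqs. (2)-(9) from a binary confusion matrix
    (assumes a 0/1 label encoding, e.g. 0 = female, 1 = male)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn) * 100                # Eq. (2)
    fpr = fp / (fp + tn) * 100                # Eq. (3)
    tnr = tn / (tn + fp) * 100                # Eq. (4)
    fnr = fn / (fn + tp) * 100                # Eq. (5)
    precision = tp / (tp + fp) * 100          # Eq. (6)
    recall = tpr                              # Eq. (7), identical to TPR
    # Eq. (8): precision and recall are already percentages, so no extra factor
    f1 = 2 * precision * recall / (precision + recall)
    ccr = (tp + tn) / (tp + tn + fp + fn) * 100   # Eq. (9)
    return {"TPR": tpr, "FPR": fpr, "TNR": tnr, "FNR": fnr,
            "Precision": precision, "Recall": recall, "F1": f1, "CCR": ccr}

# Tiny illustrative example
print(gender_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))
```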

4.4 XGBoost classification

“XGBoost” is the abbreviation for “extreme gradient boosting.” XGBoost is an implementation of gradient-boosted decision trees. The application of weights is an essential component of XGBoost: each independent variable receives a weight before being fed into the decision tree that predicts the result [28]. Variables whose outcomes were incorrectly forecast by the first decision tree are given added weight before moving on to the second tree. These separate classifiers and predictors are then combined to produce a model that is both reliable and precise.
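
A minimal, hedged sketch of training an XGBoost classifier on pre-extracted gait features (the feature dimensionality, dummy data, and label encoding are illustrative assumptions, not the study’s exact configuration):

```python
import numpy as np
from xgboost import XGBClassifier

# Illustrative stand-ins for pre-extracted gait feature vectors and gender labels
X = np.random.rand(200, 1920)            # assumed feature dimensionality
y = np.random.randint(0, 2, 200)         # 0 = female, 1 = male (assumed encoding)

# Gradient-boosted decision trees on the extracted features (default-style settings)
clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
clf.fit(X[:100], y[:100])                # train on the first half
pred = clf.predict(X[100:])              # predict gender for the second half
```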

5 Experiments and results

Our experiment was designed to assess the accuracy of gender prediction using a gait analysis dataset. To achieve this, we applied a variety of pre-trained models and adjusted classifier parameters. We then measured performance using a range of evaluation metrics to obtain comprehensive results.

5.1 Software and setup tools

We utilized Anaconda and PyCharm. For implementation, we adopted TensorFlow 2.9.1 and imported various Python libraries, including pandas, NumPy, scikit-learn, Keras, XGBoost, and OpenCV-Python.

5.2 Experiment dataset

In this study, we utilized the OULP-Age subset of the OU-ISIR Gait Database. It consists of 63,846 gait images of people walking along a path, with participants ranging in age from 2 to 90. GEIs of 88 × 128 pixels were generated from a side view of each participant’s gait. Following the predefined protocol, the database was divided into a training set of 31,923 subjects (16,327 women and 15,596 men) and a testing set of 31,923 subjects (16,426 women and 15,407 men) [29].

5.3 Training and testing phase

For both the training and testing sets, we first used pre-trained models (from TensorFlow 2.0 (TF 2.0) and Keras) to extract features from the GEI images of the OULP-Age dataset. The XGBClassifier (from XGBoost) was then used to classify gender, and prediction accuracy was measured using the different evaluation metrics.

5.4 Tuning classifier

This research fine-tunes the XGBoost classifier by adjusting boosting parameters including max depth, learning rate, and the number of estimators. The learning rate was varied between 0.01 and 1.0, max depth between 4 and 6, and the number of estimators up to 1000.
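
One possible way to perform this tuning is a grid search over the stated ranges; the specific grid values, cross-validation setup, and dummy data below are illustrative assumptions rather than the exact configuration used in the study:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Grid covering the stated ranges; the exact values and CV setup are assumptions
param_grid = {
    "learning_rate": [0.01, 0.1, 0.5, 1.0],
    "max_depth": [4, 5, 6],
    "n_estimators": [100, 500, 1000],
}

# Dummy stand-ins for the extracted GEI features and gender labels
X = np.random.rand(60, 50)
y = np.random.randint(0, 2, 60)

search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                      param_grid, scoring="accuracy", cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)   # boosting settings that maximise cross-validated accuracy
```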

5.5 Result comparison and discussion

Table 1 highlights the prediction of gender classification in the training phase, where the highest CCR is achieved by DenseNet201 with a value of 93.88%. This model also has the highest precision and F1-score for both the female and male classes, with precision values of 95.31% and 94.01% and F1-scores of 94.32% and 93.75%, respectively. The recall for both classes is also high, at 92.74% for female and 92.43% for male. On the other hand, the lowest CCR is achieved by NASNetMobile with a value of 88.80%. This model also has the lowest precision and F1-score for both classes, with precision values of 89.17% and 88.73% and F1-scores of 88.57% for both, and comparatively low recall of 89.02% for female and 88.41% for male.

Table 1 Prediction of gender detection in training phase

Table 2 shows that the DenseNet201 model has the highest TPR of 92.74% and the highest TNR of 95.11%, demonstrating a high degree of accuracy in correctly identifying positive and negative cases. It also has the lowest FPR of all models (4.89%), meaning that it produces fewer false positives than the other models. The NASNetMobile model, in contrast, has the highest FPR, indicating that it generates more false positive predictions than the other models, and the highest FNR (11.14%), showing that it labels more genuinely positive cases as negative. Overall, according to the metrics in the table, DenseNet201 appears to be the best-performing model and NASNetMobile the worst-performing one in the training phase.

Table 2 TPR, FPR, TNR and FNR of gender detection in training phase

Table 3 presents the prediction of gender classification in the testing phase. DenseNet169 achieved the highest CCR of 93.90%. It also had the highest precision and recall scores for both classes, with precision values of 94.70% and 93.30% and recall values of 94.00% and 93.09% for females and males, respectively; its F1-scores were likewise high, at 94.53% for females and 93.81% for males. The NASNetMobile model achieved the lowest CCR of 88.05%. Although it had a reasonable precision score for males (88.76%), its precision for females was relatively low (88.08%); its recall scores, 88.42% for females and 88.03% for males, were also lower than those of most other models, as were its F1-scores of 87.31% for females and 87.66% for males.

Table 3 Prediction of gender detection in testing phase

Table 4 illustrates that DenseNet169 has the highest TPR at 93.30%, meaning it correctly identifies a high proportion of positive cases, and the highest TNR at 94.53%, meaning it also correctly identifies a high proportion of negative cases. DenseNet121 has the lowest FPR at 6.18%, making the fewest incorrect positive identifications, while InceptionResNetV2 has the highest FNR at 12.10%, incorrectly identifying a high proportion of positive cases as negative. Overall, the DenseNet models perform well on this task, while InceptionResNetV2 may be the weakest performer.

Table 4 TPR, FPR, TNR and FNR of gender detection in testing phase

The results from Table 5 demonstrate the prediction accuracy of gender detection using DenseNet models with tuned parameters of the XGBoost classifier. These findings highlight significant improvements in accuracy achieved through parameter tuning, shedding light on the enhanced performance of the models after tuning the classifier.

Table 5 Prediction of gender detection using DenseNet with tuned parameters of the XGBoost classifier

Before parameter tuning, DenseNet-121, DenseNet-169, and DenseNet-201 exhibited comparable levels of accuracy throughout the training and testing phases for gender detection using the XGBoost classifier. However, after fine-tuning the classifier, notable improvements in accuracy were observed across all models. For instance, DenseNet-201, which initially performed well with an accuracy of 93.88% during the training phase, experienced a further enhancement in accuracy, reaching an impressive 95.41% after tuning the classifier. Similarly, DenseNet-169, which achieved the highest accuracy of 93.90% during the testing phase before tuning, saw its accuracy improve to 95.13% after parameter optimization.

Comparing the performance of DenseNet models before and after tuning the classifier underscores the effectiveness of this optimization process in boosting prediction accuracy. The results demonstrate that parameter tuning enhances the capabilities of the XGBoost classifier, leading to more accurate gender detection outcomes across all DenseNet models.

In summary, the findings from Table 5 underscore the importance of parameter tuning in improving the accuracy of gender detection models based on gait analysis. By optimizing the parameters of the XGBoost classifier, significant enhancements in prediction accuracy can be achieved, ultimately contributing to the effectiveness of DenseNet models in gender prediction tasks.

6 Conclusion

The intention of this research is to demonstrate an approach for predicting gender based on gait analysis that is both cost-effective and adaptive. The presented system uses GEIs as the human gait representation for feature extraction, and XGBoost is then used for gender classification. For gait recognition, GEI is superior to binary silhouette sequences in its ability to reduce both storage space and computing time.

The proposed system uses an appearance-based approach to gait analysis for gender detection. The system is evaluated on the OULP-Age dataset provided by OU-ISIR. Comparison of the experimental results across the training, testing, and tuned-classifier settings demonstrates that the proposed system performs best using the DenseNet models with the XGBoost classifier, and that using the tuned parameters of the XGBoost classifier leads to superior accuracy. The proposed system achieved 95.41% accuracy for gender detection.

However, the proposed system does not take into consideration other issues associated with gait-based analysis, such as occlusion of the gait under different conditions of carried baggage and loose-fitting attire, or analysis from a variety of viewing angles. Future work will address walking occlusion and varied viewpoint angles, and will build an approach resilient to limited access to gait data.