A Novel Functional Link Network Stacking Ensemble with Fractal Features for Multichannel Fall Detection

Falls are a major health concern and result in high morbidity and mortality rates in older adults, with high costs to health services. Automatic fall classification and detection systems can provide early detection of falls and timely medical aid. This paper proposes a novel Random Vector Functional Link (RVFL) stacking ensemble classifier with fractal features for the classification of falls. The fractal Hurst exponent is used as a representative of fractal dimensionality to capture the irregularity of accelerometer signals for falls and other activities of daily life. The generalised Hurst exponents, along with wavelet transform coefficients, are leveraged as the input feature space for a novel stacking ensemble of RVFLs combined by an RVFL neural network meta-learner. Novel fast selection criteria for the base classifiers are presented, founded on a proposed diversity indicator obtained from overall performance values during the training phase. The proposed features and stacking ensemble provide the highest classification accuracy, 95.71%, compared with other machine learning techniques such as Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The proposed ensemble classifier is 2.3× faster than a single Decision Tree and achieves training-time speedups of 317.7× and 198.56× compared with a highly optimised ANN and an RF ensemble, respectively. These improvements in training time, of the order of 100×, together with high accuracy, demonstrate that the proposed RVFL ensemble is a prime candidate for real-time, embedded fall detection systems based on wearable devices.


Introduction
Falls are a major health hazard for older adults and result in high mortality and injury rates [55]. A large percentage of fall incidents, up to 62%, result in immobility [9,34]. Falls also result in high costs to the national health service [55]. Early detection of falls and immediate medical aid can save lives and reduce deaths by 80% [39]. Fall Detection Systems (FDSs) play an important role in the timely provision of medical aid through early detection of falls [39]. FDSs can be sensor-based [38,41,64] or camera-based [15,22,30,37]. Sensor-based systems may use wearable [26,41] or smartphone-based [21,46] accelerometers and gyroscopes, while environmental sensing frequently relies on infrared [7,14], pressure sensors [59] and WiFi-based sensing devices [17,54]; the latter utilise fluctuations in the channel state information amplitude at the WiFi receiver to sense activities. The readings from these sensors are used to detect falls and to distinguish them from Activities of Daily Life (ADL).
The sensor signals are processed with signal processing algorithms to extract features for classification. Machine learning and neural network algorithms running on a processing device are then frequently used to detect and classify falls from the extracted features [21,26,62]. In recent years, machine learning techniques and neural network models have placed significant focus on randomised algorithms, owing to their asymptotically faster runtimes and computationally efficient models [48,56]. The main idea behind randomised learning for neural networks is to assign random weights and biases to the network inputs and to compute the output parameters by solving a linear system [57]. Random Vector Functional Link (RVFL) neural networks, introduced by Pao et al. [42], use randomness for a subset of the weights and biases between the input and a single hidden layer, which are kept fixed during training. Unlike a single-hidden-layer Multi-Layer Perceptron (MLP), RVFL networks have direct links between the inputs and the output, and the output weights can be computed with a closed-form least-squares method. RVFL networks are therefore computationally efficient and fast learners compared with traditional neural networks [43], which makes them prime candidates for fast ensemble techniques.
Fractal dynamics are an essential feature of complex nonlinear dynamic systems that are chaotic in nature, and they appear in the state-space representations of such systems as time-evolving trajectories [35]. Human movements arise from complex nonlinear interactions representative of a complex nonlinear dynamic system [45] and can be analysed as a chaotic system exhibiting fractal dynamics; however, current work in nonlinear dynamics is limited to the analysis of human movement during walking [52]. Other activities, such as falls, can also be analysed with fractal dynamics. Real-world fractals are statistically self-similar patterns and signals, in which the whole is statistically similar to its components. The generalised Hurst exponent is related to the fractal characteristics of a signal and is used for the fractal analysis of time-varying biomedical signals [44]. According to Mandelbrot [33], the fractal characteristics of a signal are positively correlated with its irregularity, so the generalised Hurst exponent can be used as a measure of signal irregularity.
We use RVFL neural networks as the base classifiers for our proposed ensemble method, each trained in a highly optimised feature space to achieve high classification accuracy. We propose fractal-feature-based classification of falls, combined with Discrete Wavelet Transform (DWT) coefficient features. Hurst exponent values are used as fractal features to represent the accelerometer signals for falls and ADL. Fall signals typically consist of a single high-magnitude spike, as opposed to the continuous, lower-magnitude variations of other activities [23], and therefore potentially have different irregularity characteristics that can be exploited as features for classification. The Hurst exponents are calculated with the Signal Summation Conversion (SSC) method [12]. The multilevel DWT is computed in parallel with the SSC computations, and the resulting DWT coefficients and Hurst exponents are used to train various classifiers for fall detection.
Furthermore, our work proposes a novel ensemble of RVFL neural networks combined by an RVFL network meta-learner, as illustrated in Fig. 1, for the final classification of falls versus ADLs, achieving low latency and fast training for the ensemble learner. The responses of the base RVFL classifiers can be diverse because each RVFL network uses a random subset of parameters. The proposed technique introduces further heterogeneity by choosing from a set of different kernel functions for the networks and by selecting classifiers from different folds of the same k-fold training procedure. This allows a larger number of base classifiers to be generated within the same training procedure at a lower runtime cost. However, the speed advantage of the RVFL ensemble can be lost if the RVFL base classifiers are not selected efficiently, so we also propose an efficient scheme for selecting individual classifiers.
Our work determines the diversity of models from overall performance measures of the base classifiers, namely the total True Positives (TP) and True Negatives (TN). The insight is based on the observation that two models with similar accuracies can be inherently different in the way they classify positives and negatives. A model with high TP and low TN values may have an accuracy similar to that of a model with low TP and high TN values, since accuracy depends on the aggregate sum of the two counts; however, the two models are inherently diverse. The technique therefore derives a diversity indicator from overall performance measures (TP and TN values) computed while training the model, instead of finding diversity amongst the models from individual inputs and their corresponding classification outputs, which is not computationally feasible when comparing a large number of models. An Aggregate Performance as Diversity Indicator (APDI) is constructed from the differences in these TP and TN values, and the concept is applied to models with the same or different accuracies. The proposed FDS, based on the RVFL ensemble and the selection algorithm, is illustrated in Fig. 1, while an overview of the fall detection process is shown in Fig. 2. As illustrated, the accelerometer signals from a wearable sensor device are transmitted through a WiFi router to a local processing system for fall classification. On detection of a fall event, the nearest medical aid centre is notified for timely medical assistance. The contributions of our work are summarised as follows:
- We propose the use of the generalised Hurst exponent for fall classification as a metric that characterises the irregularity of a signal. Mandelbrot [33] demonstrated that the fractal dimension of a one-dimensional curve increases with its irregularity, i.e. it is positively correlated with signal irregularity.
The generalised Hurst exponent is related to the fractal dimension of a signal and is used to determine the fractal dimension of one-dimensional signals [44]; for a self-affine one-dimensional signal, the fractal dimension D and the Hurst exponent H satisfy D = 2 − H, so a lower H indicates a more irregular signal. The generalised Hurst exponent is therefore leveraged as a discriminating feature representing the irregularity characteristics of a signal to train machine learning algorithms for fall classification. The technique is based on the observation that fall accelerometer signals consist of a spike and have irregularity characteristics different from those of ADL signals, which possess higher irregularity.
The remainder of the paper is organised as follows. The "Related Work" section discusses related work, and the "Mathematical Techniques" section explains the mathematical techniques used. The proposed algorithm and ensemble technique are described in "Proposed Algorithm". The "Methodology" section presents the methodology, and the "Results and Discussion" section provides results and discussion.

Related Work
There have been a number of recent publications on detecting fall events with wearable sensors using traditional machine learning techniques. Hsieh et al. [19] proposed a fall detection algorithm that combines machine learning and threshold-based techniques to detect falls from accelerometer signals with high accuracy, above 98%. Sukor et al. [51] leveraged time- and frequency-domain features, including the energy and power spectrum of accelerometer signals, for fall detection. Principal Component Analysis (PCA) was performed on the feature space to select the principal components, and various machine learning classifiers, including DT and SVM, were used for fall detection. Ramon et al. [47] utilised a multiple-sensor body area network with a smartphone for sensing, processing and classification of falls and ADLs. A number of classifiers, including SVM, KNN, Naive Bayes and DT, were applied, and analysis of variance was used to validate the different algorithms.
Ensemble techniques have also been utilised for fall detection. Recent work by Chelli et al. [5] uses Ensemble Bagged Trees (EBT) with a number of time and frequency features to classify falls with an accuracy of 97.7%. In [58], the authors use convolutional layers to extract features from images, and a bagged tree ensemble is then used for fall classification. Nguyen et al. [36] proposed RF for the detection of falls with a number of time features, including signal energy, and achieved an accuracy of 94.37%. However, generating a large number of trees is computationally expensive, and we show that our proposed method is faster than a single DT. Yang et al. [63] combined tree classifiers through a diversity-based technique for an RF ensemble using weights for each sample, with the tree weights learned through convex quadratic programming. In contrast, our method utilises RVFL neural networks for fast training and is 2.3× faster than a single decision tree. Moreover, our proposed method uses a simple selection procedure based on aggregate performance metrics obtained from each model, which improves the overall accuracy and reduces the selection time cost.
Ensemble techniques for combining neural networks have also been leveraged for fall classification. Recently, Chen et al. [6] proposed an ensemble of stacked AutoEncoders (AE) along with One-Class Classification based on the Convex Hull (OCCCH). The authors used a two-stage ensemble method, with majority voting in the first stage and a weighted ensemble in the second. Khan et al. [27] proposed an AE ensemble for processing accelerometer and gyroscope signals to classify falls, using a majority voting scheme to combine the results of the classifiers. Wen et al. [60] presented an ensemble of CNNs, where each CNN outputs a probability for each class. The probabilities are then combined with a probability-based fusion method, and the maximum probability gives the final classification. However, deep neural networks such as AEs and CNNs are computationally expensive, and generating deep base classifiers incurs high runtime costs. Our proposed technique offers a fast RVFL ensemble with an RVFL meta-learner to combine the outputs, along with an aggregate-performance-based diversity indicator for selecting base learners, resulting in high accuracy and low runtime costs.
Randomised algorithms [32] have received significant attention in recent years for large-scale computing applications, owing to their asymptotically faster runtimes and efficient numerical implementations. Neural networks and machine learning models have also exploited randomised algorithms for faster training [48,56]. To the best of our knowledge, this is the first use of randomised-weight RVFL neural networks for fall detection. RVFL neural networks use a subset of randomised weights and biases and were proposed by Pao et al. [42], while their generalisation ability and learning characteristics were discussed in [43]. Zhou et al. [67] presented an online version of RVFL with sequential learning for modelling dynamic, time-varying complex systems and applied it to the prediction of quality indices for an industrial furnace process. Xu et al. used RVFL networks to learn spatio-temporal processes [61]. Maeda et al. [31] used a convolutional-coding-based deep RVFL neural network for road distress classification. Tian et al. [53] used RVFL networks to recognise intrusion signals in an optical fiber warning system. Cecotti et al. [4] used deep RVFL neural networks to recognise handwritten characters. Scardapane et al. [49] presented Bayesian inference techniques for data modelling with RVFL networks, while Dai et al. [11] used RVFL networks for the diagnosis of Alzheimer's disease and to determine the progression of the disease. Katuwal et al. [25] proposed an ensemble of an RVFL neural network and DTs, in which the RVFL network performs an initial division of the data into classes and DTs are applied to the resulting classes for the final classification. However, the DTs have a higher runtime cost and take away the speed advantage of RVFL networks; in our work, the proposed RVFL ensemble is 2.3× faster than a single DT. Katuwal et al. [24] also proposed an ensemble based on a deep RVFL network, which uses all the hidden layers of a single deep network to obtain a separate output from each layer and computes the ensemble output by averaging or majority voting. However, deep RVFL ensembles have high computational complexity and do not provide the speed advantage of our proposed RVFL ensemble.
Fractal features have also been used in biomedical systems with machine learning techniques, but their use has been limited to the diagnosis of anomalies and to human gait analysis. The fractal dynamics of walking and human gait have been analysed in [18,52] and [50]. Various anomalies in biological systems have been detected using fractal dimensions [28,66]. Koutsiana et al. [28] detected fetal heart sounds by computing the fractal dimensions of wavelet-transformed signals, and Zhang et al. [66] detected anomalies in the human brain using fractal dimensions. However, to the best of our knowledge, the generalised Hurst exponent has not previously been used as an irregularity measure of the signals obtained from activities in general, or from falls in particular.

Discrete Wavelet Transform
The accelerometer signals along the x, y and z axes of motion can be represented as $a_x = \{a_x(n)\}$, $a_y = \{a_y(n)\}$ and $a_z = \{a_z(n)\}$, where $n = 1, \ldots, N$ and $N = 128$ samples for the evaluated window size. The tri-axis accelerometer signals $a_x$, $a_y$ and $a_z$ are illustrated in Fig. 3 in red, orange and blue, respectively. The DWT of each tri-axis accelerometer signal is its projection onto a family of wavelet basis functions $\phi_{i,k}(n)$ and $\psi_{i,k}(n)$, obtained from dilations and translations of the scaling function $\phi(n)$ and the mother wavelet $\psi(n)$:
$$\phi_{i,k}(n) = 2^{-i/2}\,\phi\left(2^{-i}n - k\right) \tag{1}$$
$$\psi_{i,k}(n) = 2^{-i/2}\,\psi\left(2^{-i}n - k\right) \tag{2}$$
where $k$ denotes discrete translations and $2^i$ represents dyadic dilations. The DWT coefficients of each accelerometer signal $a_{dim}(n)$, where $dim \in \{x, y, z\}$ denotes the axis of motion, are given by:
$$A_{i,dim}(k) = \sum_{n} a_{dim}(n)\,\phi_{i,k}(n) \tag{3}$$
$$D_{i,dim}(k) = \sum_{n} a_{dim}(n)\,\psi_{i,k}(n) \tag{4}$$
where $A_{i,dim}$ is the wavelet approximation coefficient vector and $D_{i,dim}$ is the wavelet detail coefficient vector for axis $dim$, and $k$ is the shifting index of the scaling and mother wavelet functions. The low-pass wavelet coefficients are also known as approximations. The level-1 approximations $A_{1,dim}$ are used as input signals in Eqs. 3 and 4 to generate the level-2 approximations $A_{2,dim}$ and details $D_{2,dim}$. The level-2 approximations are in turn used to generate the level-3 approximations $A_{3,dim}$ and details $D_{3,dim}$, and, similarly, the level-3 approximations are used to generate the final level-4 approximations $A_{4,dim}$ and details $D_{4,dim}$.
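The level-by-level filter bank above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' Matlab code: it assumes periodic (circular) signal extension rather than the zero padding mentioned in the feature processing steps, and it hard-codes the 8-tap Daubechies 4 decomposition filters as tabulated by PyWavelets ("db4"). The helper names `dwt_level` and `dwt_approx` are hypothetical.

```python
import numpy as np

# Daubechies 4 (8-tap) low-pass decomposition filter, as tabulated by PyWavelets ("db4").
DB4_LO = np.array([-0.010597401784997278, 0.032883011666982945,
                   0.030841381835986965, -0.18703481171888114,
                   -0.02798376941698385, 0.6308807679295904,
                   0.7148465705525415, 0.23037781330885523])
# Quadrature-mirror high-pass filter: g[m] = (-1)^(m+1) * h[L-1-m]
DB4_HI = DB4_LO[::-1] * (-1.0) ** (np.arange(len(DB4_LO)) + 1)


def dwt_level(x, lo=DB4_LO, hi=DB4_HI):
    """One DWT level (periodic extension assumed): circular convolution, then downsample by 2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # idx[i, m] = (i - m) mod n, so (x[idx] * h).sum(axis=1) is the circular convolution x * h
    idx = (np.arange(n)[:, None] - np.arange(len(lo))[None, :]) % n
    approx = (x[idx] * lo).sum(axis=1)[1::2]   # low-pass branch -> approximations
    detail = (x[idx] * hi).sum(axis=1)[1::2]   # high-pass branch -> details
    return approx, detail


def dwt_approx(x, levels=4):
    """Feed each level's approximations into the next level; return A_levels and all details."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        a, d = dwt_level(a)
        details.append(d)
    return a, details
```

With periodic extension, a 128-sample window yields 64, 32, 16 and finally 8 level-4 approximation coefficients per axis; three axes of 8 coefficients plus three Hurst exponents give 27 features, which is one reading consistent with the r = 27 used later. Zero padding changes the boundary coefficients and, depending on the implementation, the coefficient counts.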

Signal Summation Conversion
The signal summation conversion method for computing the Hurst exponent [12] is also applied to each of the tri-axis accelerometer signals $a_x = \{a_x(n)\}$, $a_y = \{a_y(n)\}$ and $a_z = \{a_z(n)\}$, where $n = 1, \ldots, N$ and $N = 128$ samples along the three axes of motion x, y and z. For each accelerometer signal $a_{dim}(n)$, where $dim \in \{x, y, z\}$ denotes the axis of motion, the steps are:
1. Compute the cumulative sum signal $s_{dim}(n)$ of the accelerometer signal $a_{dim}(n)$:
$$s_{dim}(n) = \sum_{m=1}^{n} a_{dim}(m) \tag{5}$$
2. Partition the $N$ samples of the cumulative sum signal $s_{dim}(n)$, for each of the three axes $dim \in \{x, y, z\}$, into $N/w$ non-overlapping windows of size $w \in \{2, 4, \ldots, N/2, N\}$.
3. Detrend the signal obtained from the previous step. In our work, we use bridge detrending [3], which computes, for each partition, a separate line that connects the first and last points of the window. Given a window size $w$ and window partition index $j \in \{1, 2, \ldots, N/w\}$, the indices of the first and last points of the $j$th window are $jw - w + 1$ and $jw$. The slope $sl_{dim,j}$ and the signal-magnitude-axis intercept $b_{dim,j}$ of each window partition $j$ are computed for each of the three accelerometer axes as:
$$sl_{dim,j} = \frac{s_{dim}(jw) - s_{dim}(jw - w + 1)}{w - 1} \tag{6}$$
$$b_{dim,j} = s_{dim}(jw - w + 1) \tag{7}$$
The line $d_{dim,j}$ for each accelerometer axis and each window partition $j$ is:
$$d_{dim,j}(k_w) = sl_{dim,j}\,(k_w - 1) + b_{dim,j} \tag{8}$$
Each line is then subtracted from the signal in the respective partition, giving the detrended signal $\acute{s}_{dim,j}$:
$$\acute{s}_{dim,j}(k_w) = s_{dim}(jw - w + k_w) - d_{dim,j}(k_w) \tag{9}$$
where $k_w \in \{1, \ldots, w\}$ is the signal index within window partition $j$, relative to its start.
4. Compute the mean $\bar{\acute{s}}_{dim,j}$ and standard deviation $\sigma_{dim,j}$ of each detrended window $j$, for each of the three accelerometer axes:
$$\bar{\acute{s}}_{dim,j} = \frac{1}{w}\sum_{k_w=1}^{w} \acute{s}_{dim,j}(k_w) \tag{10}$$
$$\sigma_{dim,j} = \sqrt{\frac{1}{w}\sum_{k_w=1}^{w}\left(\acute{s}_{dim,j}(k_w) - \bar{\acute{s}}_{dim,j}\right)^2} \tag{11}$$
where $j \in \{1, 2, \ldots, N/w\}$ for a given window size $w$.
5. Compute the mean of the standard deviations over all windows $j \in \{1, 2, \ldots, N/w\}$ for each window size $w \in \{2, 4, 8, \ldots, N\}$:
$$\sigma_{dim,w} = \frac{w}{N}\sum_{j=1}^{N/w} \sigma_{dim,j} \tag{12}$$
6. The Hurst exponents $H_{dim}$ for the three accelerometer axes $dim \in \{x, y, z\}$ are related to the mean standard deviation $\sigma_{dim,w}$ for each window size $w$ as given in Eq. 13, where $\rho$ is the constant of proportionality. $H_{dim}$ is computed from the slope of the least-squares regression line of $\log \sigma_{dim,w}$ versus $\log w$, according to Eq. 14:
$$\sigma_{dim,w} = \rho\, w^{H_{dim}} \tag{13}$$
$$\log \sigma_{dim,w} = H_{dim}\log w + \log \rho \tag{14}$$
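The six SSC steps can be sketched as a single NumPy function. It is an illustration under stated assumptions, not the R implementation used in this work: the signal length is assumed to be a power of two, the per-window standard deviation uses the population (1/w) normalisation, and the smallest window size is 4 rather than 2, because under bridge detrending both samples of a 2-sample window lie on the bridge line, so every residual is zero and the logarithm in Eq. 14 is undefined. The function name `hurst_ssc` is hypothetical.

```python
import numpy as np


def hurst_ssc(a, min_window=4):
    """Sketch of the SSC Hurst estimate with bridge detrending.

    Assumes len(a) is a power of two. Window size 2 is skipped by default:
    both samples of a 2-sample window lie on the bridge line, so the
    detrended residual is identically zero and log(0) is undefined.
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    s = np.cumsum(a)                                    # step 1: cumulative sum signal
    sizes, sigmas = [], []
    w = min_window
    while w <= n:
        win = s[: (n // w) * w].reshape(-1, w)          # step 2: N/w non-overlapping windows
        k = np.arange(w)                                # k_w - 1: index relative to window start
        slope = (win[:, -1] - win[:, 0]) / (w - 1)      # step 3: bridge slope per window
        bridge = win[:, [0]] + slope[:, None] * k       #         line through first/last points
        resid = win - bridge                            #         detrended signal
        sigmas.append(resid.std(axis=1).mean())         # steps 4-5: per-window std, then mean
        sizes.append(w)
        w *= 2
    # step 6: H is the least-squares slope of log(sigma_w) against log(w)
    h, _ = np.polyfit(np.log(sizes), np.log(sigmas), 1)
    return h
```

Applied to one 128-sample window per axis, the function returns $H_x$, $H_y$ and $H_z$. As a sanity check on long signals, uncorrelated noise gives an estimate of roughly 0.5, while a random walk (cumulatively summed noise) gives a markedly larger value, reflecting its smoother, less irregular trajectory.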
The Hurst exponents $H_x$, $H_y$ and $H_z$ are the fractal features for the three axes of motion, $dim \in \{x, y, z\}$, and are used, together with the level-4 wavelet approximation coefficient vectors $A_{4,x}$, $A_{4,y}$ and $A_{4,z}$, as input features for the RVFL neural network.

RVFL Neural Network
The RVFL neural network is a single-hidden-layer network, first introduced by Pao et al. in [42] and [43]. It is characterised by direct links between the input and output layers, in addition to the conventional connections between the input and hidden layers, as illustrated in Fig. 4. The hidden layer of an RVFL neural network is also known as the enhancement layer. The weights and biases between the input and enhancement layers are randomly initialised and remain constant during the training phase, while the weights that connect the input and enhancement nodes to the output layer are learnt.
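Consider inputs $x_{di} \in \mathbb{R}^r$ and target outputs $y_{di} \in \mathbb{R}$, where $di = 1, \ldots, N_t$ is the data index and $r = 27$ is the number of wavelet and fractal input features. The RVFL network has $r$ input neurons and 1 output neuron. Let $G$ be the number of enhancement nodes and $\boldsymbol{\alpha}_g$ the random weights between the input and enhancement nodes, where $g = 1, \ldots, G$ is the index of the enhancement nodes. Then $\boldsymbol{\alpha}_1 = [\alpha_{1,1} \cdots \alpha_{1,r}]$ are the random weights initialised between the first enhancement node $g = 1$ and all $r$ input nodes, whose number equals the number of features. The output $y_g$ of the $g$th enhancement node for the $di$th data input is:
$$y_g(x_{di}) = f_{act}\left(\boldsymbol{\alpha}_g\, x_{di} + b_g\right) \tag{15}$$
where $f_{act}$ is the activation function of the neural network and $b_g$ is the random bias of the $g$th enhancement node. In matrix form, the overall input matrix $X$ for the output node of the RVFL network is the concatenation of two matrices, $X_1$ with the inputs from the input layer and $X_2$ with the inputs from the enhancement layer:
$$X = [X_1 \;\; X_2] \tag{16}$$
$$X_1 = \begin{bmatrix} x_1^{\top} \\ \vdots \\ x_{N_t}^{\top} \end{bmatrix} \tag{17}$$
$$X_2 = \begin{bmatrix} y_1(x_1) & \cdots & y_G(x_1) \\ \vdots & \ddots & \vdots \\ y_1(x_{N_t}) & \cdots & y_G(x_{N_t}) \end{bmatrix} \tag{18}$$
Let $\boldsymbol{\beta}$ be the weights of the direct links to the output node from both the input and the enhancement nodes. The outputs $\mathbf{t}$ of the RVFL network are:
$$X\boldsymbol{\beta} = \mathbf{t} \tag{19}$$
where $\mathbf{t}$ is the target output vector and $\boldsymbol{\beta}$ is partitioned into the input-node weights $\boldsymbol{\beta}_x$ and the enhancement-node weights $\boldsymbol{\beta}_g$:
$$\mathbf{t} = [y_1 \; \cdots \; y_{N_t}]^{\top} \tag{20}$$
$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_x \\ \boldsymbol{\beta}_g \end{bmatrix} \tag{21}$$
From Eq. 19, the output weights $\boldsymbol{\beta}$ can be calculated directly by the Moore-Penrose pseudoinverse [20], $\boldsymbol{\beta} = X^{\dagger}\mathbf{t}$, or by ridge regression [1,65] with regularisation parameter $\lambda$, as presented in Eq. 22:
$$\boldsymbol{\beta} = \left(X^{\top}X + \lambda I\right)^{-1} X^{\top}\mathbf{t} \tag{22}$$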
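A minimal RVFL training and prediction sketch in NumPy follows Eqs. 15 to 22. The uniform weight range [−1, 1], the tanh activation, 100 enhancement nodes, the ridge parameter λ = 1e-3 and the 0.5 decision threshold for 0/1 targets are illustrative assumptions, not the settings used in this work; the function names are hypothetical.

```python
import numpy as np


def train_rvfl(X, t, n_hidden=100, lam=1e-3, act=np.tanh, seed=0):
    """Fit an RVFL network: random fixed hidden layer, closed-form ridge output weights."""
    rng = np.random.default_rng(seed)
    r = X.shape[1]
    alpha = rng.uniform(-1.0, 1.0, size=(r, n_hidden))  # random input->enhancement weights (fixed)
    bias = rng.uniform(-1.0, 1.0, size=n_hidden)         # random enhancement biases (fixed)
    X2 = act(X @ alpha + bias)                           # enhancement outputs y_g (Eq. 15)
    D = np.hstack([X, X2])                               # X = [X1 X2]: direct links + enhancement nodes
    # Ridge solution (Eq. 22) via a linear solve instead of an explicit matrix inverse
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ t)
    return {"alpha": alpha, "bias": bias, "act": act, "beta": beta}


def rvfl_scores(model, X):
    """Raw RVFL output X @ beta for new inputs."""
    X2 = model["act"](X @ model["alpha"] + model["bias"])
    return np.hstack([X, X2]) @ model["beta"]


def rvfl_predict(model, X, threshold=0.5):
    """Binary decision (1 = fall, 0 = ADL), assuming the network was trained on 0/1 targets."""
    return (rvfl_scores(model, X) >= threshold).astype(int)
```

Because only β is learnt, and it comes from a single linear solve of size (r + G) × (r + G), no iterative back-propagation is needed; this is the source of the fast training discussed above.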

Proposed Algorithm
The proposed algorithm takes as input the tri-axis accelerometer signals for all three axes of motion. The signals are divided into window segments of 128 samples, which are processed to extract wavelet low-pass coefficients and fractal features. The selected Daubechies 4 wavelet coefficients and the generalised Hurst exponents computed for all three axes of motion are used to train RVFL models, and the selected RVFL models are then combined into an RVFL ensemble. The proposed algorithm is divided into two parts, feature processing and the RVFL ensemble classifier, which are described next and illustrated in Figs. 5 and 6, respectively.

Feature Processing
1. Divide each tri-axis accelerometer signal $a_{dim}$, where $dim$ denotes one of the three axes of motion x, y or z, into window segments of $N = 128$ samples.

2. Compute the level-4 DWT approximation coefficients $A_{4,dim}$ of the accelerometer signal $a_{dim}$ for each of the three axes of motion, $dim \in \{x, y, z\}$.
(a) Zero-pad each tri-axis accelerometer signal $a_{dim}$, convolve it with the Daubechies 4 wavelet filter coefficients $h_{db4}$ (i.e. compute $a_{dim} * h_{db4}$) and downsample by 2 to obtain the level-1 approximation coefficients $A_{1,dim}$ for each of the x, y and z axes.
where the function $p_{l,c_k}(x_{di})$ is the performance score returned by each classifier $M_l$ for input $x_{di}$ and target class $c_k \in c$. We use the accuracy measure for each classifier, and the performance score can be given as: where $t$ is the target output, $f_{act,l}$ is the activation function of model $l$, $y_g$ is given in Eq. 15, $\boldsymbol{\beta}_{x,l}$ and $\boldsymbol{\beta}_{g,l}$ can be obtained from Eq. 21, and $l$ indexes the unique weight vector of each base classifier, given as: Equation 23 can now be represented as: The RVFL ensemble algorithm seeks an RVFL meta-learner $M_c : X_c \rightarrow Y_{out}$, where $X_c$ is the output space of the base classifiers, $\{M_1(x_{di}), M_2(x_{di}), \ldots, M_L(x_{di})\}$; the meta-learner takes the class outputs of the base models as its input features. The final base models are selected from the available models based on accuracy and a pairwise diversity indicator, which is used as a heuristic function.
Selecting diverse RVFL base classifiers requires an efficient technique in order to retain the fast-learning advantage of the base classifiers, and finding diversity amongst the models from individual inputs and their corresponding classification outputs is not computationally feasible when comparing a large number of models. We therefore propose an ensemble selection method that measures model diversity from the overall performance measures of the base classifiers, namely their TP and TN values. The RVFL models for the ensemble are selected using a diversity indicator computed from the differences in TP and TN values. The insight is based on the observation that two models with the same accuracy can be inherently diverse, depending on whether that accuracy has been achieved through higher TP or higher TN values.
The high accuracy of one model may be attributed to its higher TP values (or lower false negative values), while another model with comparable accuracy may have comparatively higher TN values (or lower false positive values), since accuracy depends on the sum of the TP and TN values. Such models complement each other in an ensemble, because different TP and TN values imply that they differ in which instances they classify correctly or misclassify. The difference in TP and TN values can therefore serve as an indicator of diversity, and a model with a relatively lower accuracy may be chosen on the basis of a higher TP or TN value.
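As a worked illustration of this argument (hypothetical counts, not results from this work), consider two models A and B evaluated on the same N = 100 windows, 50 of which are falls and 50 ADLs:

$$\mathrm{Acc}_A = \frac{TP_A + TN_A}{N} = \frac{45 + 35}{100} = 0.80, \qquad \mathrm{Acc}_B = \frac{TP_B + TN_B}{N} = \frac{35 + 45}{100} = 0.80$$

Although the accuracies are identical, $TP_A - TP_B = 10$, so A correctly classifies at least 10 falls that B misclassifies, and $TN_B - TN_A = 10$, so B correctly classifies at least 10 ADLs that A misclassifies. The two models therefore disagree on at least 20 of the 100 windows, which is the kind of complementary behaviour a stacking meta-learner can exploit.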
The models in $M = \{M_1, M_2, \ldots, M_L\}$ are sorted by accuracy, and a search using the diversity indicator as a heuristic function is performed. The resulting set of models, $E = \{M_1, M_2, \ldots, M_{md}\}$, consists of $md$ diverse models obtained by a heuristic search with a pairwise diversity heuristic. The base model selection algorithm for choosing $md$ RVFL base classifiers is illustrated in Fig. 7 and is given as follows:
5. Compute the diversity indicator APDI, as given in Eq. 30, to check whether the next-highest-accuracy model has a TP or TN value greater than that of the model most recently added to the ensemble set.
6. If APDI is greater than zero, add the model to the ensemble; otherwise, test the next model for diversity.
7. If no model satisfies the APDI criterion, add the next-highest-accuracy model to the ensemble set and repeat the procedure, comparing the other models with this newly added model.
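The selection loop in steps 5 to 7 can be sketched in Python. Eq. 30 and steps 1 to 4 are not reproduced in this section, so two parts are assumptions: the `apdi` function below is a hypothetical stand-in that is positive exactly when the candidate has a higher TP or a higher TN count than the most recently added model, and the ensemble is assumed to start from the most accurate model. `ModelStats` and `select_diverse` are illustrative names.

```python
from typing import List, NamedTuple


class ModelStats(NamedTuple):
    name: str
    accuracy: float
    tp: int  # total true positives recorded during training
    tn: int  # total true negatives recorded during training


def apdi(candidate: ModelStats, reference: ModelStats) -> int:
    """Hypothetical stand-in for Eq. 30: positive iff the candidate has a
    higher TP or a higher TN count than the reference model."""
    return max(candidate.tp - reference.tp, candidate.tn - reference.tn)


def select_diverse(models: List[ModelStats], md: int) -> List[ModelStats]:
    """Greedy APDI-based selection of md base models (steps 5-7)."""
    ranked = sorted(models, key=lambda m: m.accuracy, reverse=True)  # highest accuracy first
    ensemble = [ranked.pop(0)]              # assumed start: the most accurate model
    while len(ensemble) < md and ranked:
        last = ensemble[-1]                 # most recently added model
        for i, cand in enumerate(ranked):   # steps 5-6: scan candidates in accuracy order
            if apdi(cand, last) > 0:
                ensemble.append(ranked.pop(i))
                break
        else:                               # step 7: no diverse candidate found
            ensemble.append(ranked.pop(0))
    return ensemble


if __name__ == "__main__":
    pool = [ModelStats("A", 0.95, 50, 45), ModelStats("B", 0.93, 49, 44),
            ModelStats("C", 0.92, 47, 45), ModelStats("D", 0.90, 44, 46)]
    print([m.name for m in select_diverse(pool, 3)])  # ['A', 'D', 'B']
```

In the example pool (hypothetical counts), D is chosen second despite its lower accuracy because its TN count exceeds that of A, whereas B and C are not higher than A on either count; only the TP and TN totals are compared, so each check is constant time.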
where the function $p_{c_k}(M(x_{di}))$ is the performance score returned by the RVFL meta-learner $M_c$ for input $M(x_{di})$ and target class $c_k \in c$, and $M(x_{di})$ represents the outputs of the base classifiers. We use the accuracy measure for the RVFL meta-learner, and the performance score can be given as: where $t$ is the target output, $f_{act,c}$ is the activation function of the meta-learner, $y_g$ is given in Eq. 15, and $\boldsymbol{\beta}_{x,c}$ and $\boldsymbol{\beta}_{g,c}$ are the weight vectors of the RVFL meta-learner, defined similarly to Eqs. 25 and 26.
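The stacking arrangement, in which base RVFLs produce class outputs that become the input features of an RVFL meta-learner, can be sketched end to end in NumPy. Several details are assumptions made only for illustration: the pool of activation functions (tanh, sigmoid, sine), the hidden-layer sizes, training the base models and the meta-learner on separate data splits, and omitting the APDI selection step. The names `fit_rvfl`, `fit_stack` and `ACTIVATIONS` are hypothetical.

```python
import numpy as np


def fit_rvfl(X, t, n_hidden, act, seed, lam=1e-3):
    """Compact RVFL: fixed random enhancement layer, ridge-regression output weights."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    bias = rng.uniform(-1.0, 1.0, n_hidden)
    D = np.hstack([X, act(X @ alpha + bias)])
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ t)
    return lambda Z: np.hstack([Z, act(Z @ alpha + bias)]) @ beta  # raw output scores


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))  # clipped to avoid overflow


ACTIVATIONS = [np.tanh, sigmoid, np.sin]  # hypothetical pool of activation functions


def fit_stack(X_base, t_base, X_meta, t_meta, n_base=6):
    """Level-0 RVFLs trained on one split; an RVFL meta-learner trained on their
    class outputs for a second split. Returns a 0/1 predictor."""
    bases = [fit_rvfl(X_base, t_base, 50, ACTIVATIONS[i % len(ACTIVATIONS)], seed=i)
             for i in range(n_base)]

    def base_outputs(Z):
        # Class outputs M_l(x) in {0, 1} become the meta-learner's input features
        return np.column_stack([(m(Z) >= 0.5).astype(float) for m in bases])

    meta = fit_rvfl(base_outputs(X_meta), t_meta, 20, np.tanh, seed=100)
    return lambda Z: (meta(base_outputs(Z)) >= 0.5).astype(int)
```

Training the meta-learner on data the base models did not see is a common precaution in stacking, so that the meta-learner does not learn to trust base outputs that are over-fitted to their own training data.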

Dataset
A dataset by Kwolek et al. [29], consisting of accelerometer signals for falls and various ADLs including walking, sitting down, sitting on a chair, lying down, lying on a bed, picking up objects, standing up and sitting down, was used for the analysis and experimental verification of the proposed scheme. A total of 40 fall activities were recorded. The dataset was acquired with a motion sensing platform consisting of an Inertial Measurement Unit (IMU) mounted on the pelvis of 5 volunteers. The IMU consists of two sensors, a 16-bit three-axis gyroscope and a 12-bit three-axis accelerometer, with a total sampling rate of 256 Hz. The accelerometer was used for the analysis and detection of fall activity in this work. The three-axis accelerometer measured the acceleration of body movements along all three axes of motion in units of G-force (g), with values ranging from −8 to 8 g. All three axes of motion are used for classification and are divided into windows of 128 samples each. During training, each 128-sample window overlapped the previous window by 64 samples, i.e. a 50% overlap between consecutive windows. Figure 8 shows 128-sample segments for each of the three axes of motion for a fall activity. Each axis was processed separately by the algorithm, and wavelet and fractal features were computed for each motion axis. The features for each axis were then concatenated and used to train the RVFL ensemble.
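The windowing and feature assembly described above can be sketched as follows. The 128-sample windows with a 64-sample hop (50% overlap) follow the text; the concatenation order of the per-axis features and the assumption of 8 level-4 approximation coefficients per axis (128/2^4, as obtained with a periodic DWT), which together with the three Hurst exponents gives the 27 features used by the RVFL network, are illustrative choices. The function names are hypothetical.

```python
import numpy as np


def sliding_windows(acc, size=128, hop=64):
    """Split an (n_samples, 3) tri-axis signal into windows of `size` samples,
    advancing by `hop` samples (hop = size / 2 gives 50% overlap)."""
    acc = np.asarray(acc, dtype=float)
    if acc.shape[0] < size:
        raise ValueError("signal shorter than one window")
    starts = range(0, acc.shape[0] - size + 1, hop)
    return np.stack([acc[s:s + size] for s in starts])  # shape (n_windows, size, 3)


def assemble_features(a4_per_axis, hurst_per_axis):
    """Concatenate per-axis level-4 approximations with the three Hurst exponents.
    The order [A4_x, A4_y, A4_z, H_x, H_y, H_z] is a hypothetical choice."""
    return np.concatenate([*a4_per_axis, np.asarray(hurst_per_axis, dtype=float)])
```

For a recording of 256 tri-axis samples, this yields three windows starting at samples 0, 64 and 128, each sharing half of its samples with the next.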

Experimental Specifications
This section describes the experimental specifications, including the tools and the system used for processing. The fractal analysis was performed in the R language with RStudio, using several R packages for fractal analysis, including "fractal", "fracdiff" and "tseries". The fractal features computed in R were stored in data files and imported into Matlab for ensemble classification. The classification and machine learning experiments were performed with the Statistics and Machine Learning Toolbox in Matlab R2019a, on a system with a quad-core Intel Core i5-6500 processor at 3.2 GHz with 6 MB cache and 8 GB of main memory. A number of classifiers were used for comparison, including DT, Linear Discriminant Analysis (LDA), KNN, SVM, RF and ANN. The classifiers are explained in the "Classifiers" section, while the training and testing strategy, based on 5-fold partitioning, is explained in the "k-Fold Partitioning" section. Execution runtimes were measured with the Matlab commands "tic" and "toc": the timer was started before and read after the execution of the algorithm, and the elapsed time was taken as its execution time. Five measurements were taken, one for each of the 5 train/test combinations of the 5-fold partitioning strategy, and an average training runtime was calculated for each algorithm. Furthermore, the system is compared with current state-of-the-art ensemble techniques for fall detection in the "Results and Discussion" section and Table 7.

k-Fold Partitioning
The datasets for training all the classifiers were divided into 5 folds, with 4 folds used for training and 1 fold for testing, i.e. an 80/20% split between training and testing. All classifiers were trained on 4 of the 5 folds each time in a round-robin fashion, and the test accuracies were averaged over all folds. Similarly, the TP/TN and FP/FN counts and the precision, sensitivity, specificity and F1-measure were calculated for each fold and averaged. The same strategy was followed for the training time, which was measured for training on 4 folds each time and then averaged. The parameters of each classifier are detailed in the next section.
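The 5-fold protocol and the per-fold metrics can be sketched in NumPy. Random shuffling before splitting, the treatment of falls as the positive class (label 1) and returning 0 when a ratio's denominator is zero are assumptions; the text does not state whether the folds were shuffled or stratified. All names are hypothetical.

```python
import numpy as np


def kfold_splits(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set exactly once."""
    idx = np.random.default_rng(seed).permutation(n)  # shuffling is an assumption
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def binary_metrics(y_true, y_pred):
    """Confusion counts and derived metrics, with label 1 = fall (positive class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    def ratio(a, b):
        return a / b if b else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
    }


def average_over_folds(per_fold):
    """Average every metric over the folds, as in the round-robin protocol."""
    return {key: float(np.mean([m[key] for m in per_fold])) for key in per_fold[0]}
```

Training time can be measured per fold in the same loop, for example with Python's `time.perf_counter()` as an analogue of Matlab's tic/toc, and averaged in the same way.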

Classifiers
A number of classifiers using the proposed set of features, namely the level-4 Daubechies 4 wavelet coefficients and the generalised Hurst exponents, were compared with the proposed RVFL stacking ensemble. The classifiers and their parameters are given below:

Decision Tree
The DT algorithm was tested on the Daubechies 4, level-4 wavelet coefficients and generalised Hurst exponent features of the accelerometer signal. At each node, a feature is compared against a constant, and the tree is split according to whether the feature value lies below or above that constant. Leaf nodes give the final classification as a fall or an ADL. The DT in this work uses the CART algorithm to select the best split feature at each node from the fractal features and the 4th-level wavelet coefficients. Gini's Diversity Index (G.I.) in Eq. 33 is used as the split criterion, with a maximum of r − 1 splits, where r is the feature set (input sample) size of the fall and activities dataset:

$$G.I. = 1 - \sum_{c} pr(c)^{2} \qquad (33)$$

where pr(c) represents the probability (observed fraction) of class c at a node. Leaves originating from the same parent node are merged, and the classification tree is grown by estimating an optimal sequence of pruned subtrees. In the testing phase, a test feature vector is routed down the tree according to its feature values, which are compared against the constants at each node, and the final classification is obtained on reaching a leaf node associated with a fall or an ADL class.
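To make the split criterion concrete, the sketch below computes Gini's diversity index for a set of labels and scores every candidate threshold on a single feature by the size-weighted Gini index of the two children. It assumes NumPy; `gini_index` and `best_threshold` are hypothetical helpers, and placing candidate thresholds at midpoints between distinct feature values is an assumption of this sketch, not a detail taken from the text or from Matlab.

```python
import numpy as np


def gini_index(labels):
    """Gini's diversity index: 1 - sum_c pr(c)^2 over the classes present."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    pr = counts / counts.sum()
    return float(1.0 - np.sum(pr ** 2))


def best_threshold(feature, labels):
    """Best 'x < t' split on one feature, scored by size-weighted Gini index."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    values = np.unique(feature)
    best_t, best_g = None, np.inf
    # Candidate thresholds: midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2:
        left, right = labels[feature < t], labels[feature >= t]
        g = (left.size * gini_index(left) + right.size * gini_index(right)) / labels.size
        if g < best_g:
            best_t, best_g = float(t), g
    return best_t, best_g


if __name__ == "__main__":
    # Hypothetical 1-D feature with labels 0 = ADL, 1 = fall.
    h = np.array([0.10, 0.20, 0.80, 0.90])
    cls = np.array([0, 0, 1, 1])
    print(gini_index(cls))          # 0.5: an evenly mixed node
    print(best_threshold(h, cls))   # threshold 0.5 with weighted Gini 0.0
```

CART repeats this search over all features at every node and keeps the split with the lowest weighted index.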

Linear Discriminant Analysis
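LDA finds the maximum separation between classes by maximising the variance between classes and minimising the variance within each class. Consider a set of fractal and wavelet transform features as inputs $\mathbf{x}_{di} \in \mathbb{R}^{r}$, $di \in \{1, \ldots, N_t\}$, in an r = 27 dimensional input space with K classes labelled $\{\mathcal{C}_1, \ldots, \mathcal{C}_K\}$. The kth class contains $N_k$ inputs, with $\mathbf{x}_{di} \in \mathcal{C}_k$ in the feature space. LDA finds the basis vector $\boldsymbol{\theta}$ in terms of the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ as:

$$\boldsymbol{\theta} = \arg\max_{\boldsymbol{\theta}} \frac{\boldsymbol{\theta}^{T} S_B \boldsymbol{\theta}}{\boldsymbol{\theta}^{T} S_W \boldsymbol{\theta}}, \qquad S_B = \sum_{k=1}^{K} N_k (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^{T}, \qquad S_W = \sum_{k=1}^{K} \sum_{\mathbf{x}_{di} \in \mathcal{C}_k} (\mathbf{x}_{di} - \boldsymbol{\mu}_k)(\mathbf{x}_{di} - \boldsymbol{\mu}_k)^{T}$$

where $\boldsymbol{\mu}$ is the mean vector of all $N_t$ inputs and $\boldsymbol{\mu}_k$ is the mean vector of the $N_k$ inputs in class k, given as:

$$\boldsymbol{\mu} = \frac{1}{N_t} \sum_{di=1}^{N_t} \mathbf{x}_{di}, \qquad \boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{\mathbf{x}_{di} \in \mathcal{C}_k} \mathbf{x}_{di}$$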
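For the two-class case (fall vs ADL), the basis vector maximising the Fisher ratio has a closed form proportional to the inverse within-class scatter times the difference of class means. The sketch below assumes NumPy, labels 0 = ADL and 1 = fall, an invertible S_W, and a midpoint decision threshold (which implicitly assumes equal class priors). `fisher_lda` and `lda_predict` are hypothetical names, and this is not Matlab's `fitcdiscr` implementation.

```python
import numpy as np


def fisher_lda(X, y):
    """Two-class Fisher LDA. Returns the basis vector theta and a decision threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter S_W: scatter of each class about its own mean, summed.
    S_W = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # For two classes, the theta maximising (theta^T S_B theta) / (theta^T S_W theta)
    # is proportional to S_W^{-1} (mu1 - mu0).
    theta = np.linalg.solve(S_W, mu1 - mu0)
    # Midpoint of the projected class means (implicitly assumes equal priors).
    threshold = theta @ (mu0 + mu1) / 2
    return theta, threshold


def lda_predict(X, theta, threshold):
    """Project onto theta and assign class 1 (fall) above the threshold."""
    return (X @ theta > threshold).astype(int)


if __name__ == "__main__":
    X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])
    theta, thr = fisher_lda(X, y)
    print(theta, thr, lda_predict(X, theta, thr))
```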

K-Nearest Neighbour
The KNN classifier is based on the insight that an unknown instance is likely to belong to the same class as its neighbours. The K nearest neighbours are chosen based on their Euclidean distance from the unknown instance, and the classification decision is the majority vote of these neighbours. With r denoting the number of level-4 wavelet coefficients and generalised Hurst exponent features, the dataset of falls and activities can be represented in an r-dimensional space. The Euclidean distance dist between two points, an unknown activity $\mathbf{a}_x$ and a known activity $\mathbf{b}_x$, in the r-dimensional feature space is given by Eq. 38:

$$dist(\mathbf{a}_x, \mathbf{b}_x) = \sqrt{\sum_{i=1}^{r} (a_{x,i} - b_{x,i})^{2}} \qquad (38)$$

where r = 27 for our feature space. The Euclidean distances between the unknown point $\mathbf{a}_x$ and all labelled instances are calculated, the K instances with the smallest distances are selected, and their labels are majority voted to classify the unknown activity as a fall or an ADL. We tested K = 1, 3, 5 and 7; the highest classification accuracy was achieved with K = 1.
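The distance-then-vote procedure can be sketched as follows, assuming NumPy; `euclidean` and `knn_predict` are hypothetical helpers, and ties in the vote are broken by whichever label `Counter.most_common` returns first, which is why odd values of K are convenient.

```python
import numpy as np
from collections import Counter


def euclidean(a, b):
    """dist(a, b) = sqrt(sum_i (a_i - b_i)^2) over the r features."""
    return float(np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2)))


def knn_predict(X_train, y_train, x, k=1):
    """Classify x by majority vote among its k nearest training instances."""
    dists = np.array([euclidean(x, xt) for xt in X_train])
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
    y_train = np.array([0, 0, 1, 1])          # 0 = ADL, 1 = fall
    print(knn_predict(X_train, y_train, [0.2, 0.1], k=1))
    print(knn_predict(X_train, y_train, [5.5, 5.0], k=3))
```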

Support Vector Machine
SVM [2,10] finds the hyperplane with the largest margin that separates the two classes, falls and ADLs. Given a set of training input vectors $\mathbf{x}_{di} \in \mathbb{R}^{r}$, $di \in \{1, \ldots, N_t\}$, for the r = 27 dimensional fractal and wavelet transform feature space, with outputs $y_{di} \in \{1, -1\}$, classification by the hyperplane is given by Eq. 39:

$$f(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{T}\mathbf{x} + b) \qquad (39)$$

where $\mathbf{x}$ and $\mathbf{w}$ are column vectors of the input variables and of the constants in the hyperplane equation, respectively, b is the bias, and sign() is the signum function with ±1 output. The training input vectors $\mathbf{x}_{di}$ are the fractal and wavelet transform features. We use a soft-margin SVM (soft-SVM) in our Matlab implementation, since it also applies to data that are not linearly separable. The objective in soft-SVM is to minimise Eq. 40:

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^{2} + C_b \sum_{di=1}^{N_t} \xi_{di} \quad \text{subject to} \quad y_{di}(\mathbf{w}^{T}\mathbf{x}_{di} + b) \ge 1 - \xi_{di}, \;\; \xi_{di} \ge 0 \qquad (40)$$

where $\xi_{di}$ is the slack variable, which penalises the objective function for data points that cross the margin boundary of their class, and $C_b$ is the box constraint. We used the Sequential Minimal Optimisation (SMO) [13] solver in Matlab, with a linear kernel function and a box constraint of 1, to train the soft-SVM.
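The soft-margin objective can be illustrated with a linear kernel. Matlab's SMO solver is not reproduced here; as a plainly swapped-in and much simpler technique, the sketch runs full-batch subgradient descent on the primal objective, using the slack ξ = max(0, 1 − y(w·x + b)). It assumes NumPy and labels ±1; the learning rate and epoch count are arbitrary assumptions, and the final iterate only approximates the optimum.

```python
import numpy as np


def soft_svm_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum of slacks, with slack xi = max(0, 1 - y(w.x + b))."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * xi.sum()


def train_soft_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Full-batch subgradient descent on the primal soft-margin objective."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin or misclassified (xi > 0)
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


def svm_predict(X, w, b):
    """Signum of the hyperplane function, giving +1 or -1."""
    return np.sign(X @ w + b)


if __name__ == "__main__":
    X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
    y = np.array([1, 1, 1, -1, -1, -1], dtype=float)   # +1 = fall, -1 = ADL
    w, b = train_soft_svm(X, y)
    print(w, b, svm_predict(X, w, b), soft_svm_objective(w, b, X, y))
```

The box constraint C plays the same role as $C_b$: a larger value penalises slack more heavily and pushes towards a hard margin.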

Random Forest
RF is an ensemble learning technique that generates a number of DTs at training time and outputs the mode of their class predictions as the final classification. Given a set of fractal and wavelet transform features as inputs $\mathbf{x}_{di} \in \mathbb{R}^{r}$, $di \in \{1, \ldots, N_t\}$, with outputs $y_{di} \in \{0, 1\}$, RF trains each classification tree on inputs selected at random with replacement, and also selects a random subset of features at each split. The splitting criterion is based on either the information gain or Gini's index given in Eq. 33.
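The three ingredients of RF (bootstrap sampling, a random feature subset per split and a majority vote) can be shown in a deliberately simplified form. This sketch uses depth-1 trees (decision stumps) instead of fully grown CART trees, so each tree has a single split and the per-split feature subset is drawn once per tree. It assumes NumPy, integer labels 0/1, and √d features per split as a common default; all function names are hypothetical, and this is not Matlab's `TreeBagger`.

```python
import numpy as np


def gini(labels):
    """Gini's diversity index of an integer label array."""
    if labels.size == 0:
        return 0.0
    pr = np.bincount(labels) / labels.size
    return 1.0 - np.sum(pr ** 2)


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    majority = np.bincount(y).argmax()
    # Fallback: a constant predictor (x < inf is always true, so the left label is used).
    best = (y.size * gini(y), feature_ids[0], np.inf, majority, majority)
    for f in feature_ids:
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2:      # midpoints between distinct values
            left, right = y[X[:, f] < t], y[X[:, f] >= t]
            score = left.size * gini(left) + right.size * gini(right)
            if score < best[0]:
                best = (score, f, t, np.bincount(left).argmax(), np.bincount(right).argmax())
    return best[1:]  # (feature, threshold, left label, right label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bootstrap the rows and draw a random feature subset for each tree's split."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(np.sqrt(d)))                       # features considered per split
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)             # sampling with replacement
        feats = rng.choice(d, size=m, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """Mode (majority vote) of the trees' 0/1 predictions."""
    votes = np.array([np.where(X[:, f] < t, lab_left, lab_right)
                      for f, t, lab_left, lab_right in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)


if __name__ == "__main__":
    x0 = np.column_stack([np.linspace(0, 0.9, 10), np.linspace(0.9, 0, 10)])
    X = np.vstack([x0, x0 + 5])
    y = np.array([0] * 10 + [1] * 10)                 # 0 = ADL, 1 = fall
    forest = fit_forest(X, y)
    print(predict_forest(forest, np.array([[0.5, 0.5], [5.5, 5.5]])))
```

An odd number of trees avoids tied votes in the binary case.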

Artificial Neural Network
The ANN used is a classic multilayer perceptron (MLP). For a given input vector $\mathbf{x}$ with elements $x_i$, the output of each neuron j is computed as:

$$o_j = f_{sig}\Big(\sum_{i} w_{ji} x_i + b_j\Big), \qquad f_{sig}(z) = \frac{1}{1 + e^{-z}}$$

where $f_{sig}$ represents the sigmoid activation function, $w_{ji}$ are the connection weights and $b_j$ is the bias. The ANN has a single hidden layer between one input and one output layer. It was trained and tested with different numbers of neurons in the hidden layer, using several learning algorithms, including Stochastic Gradient Descent (SGD), Rprop and Levenberg-Marquardt (LM).
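The forward pass of such a single-hidden-layer network is short enough to write out. The sketch assumes NumPy, 27 inputs, 42 hidden neurons (the size of the best LM network reported below) and a single sigmoid output; the output activation, the random weights and the names `sigmoid` and `mlp_forward` are assumptions for illustration, and training with LM, Rprop or SGD is not shown.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid f_sig(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer MLP: sigmoid hidden layer, then sigmoid output layer."""
    h = sigmoid(W1 @ x + b1)      # hidden neuron outputs
    return sigmoid(W2 @ h + b2)   # output neuron(s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 27, 42, 1   # 27 features, 42 hidden neurons, 1 output
    W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
    b2 = np.zeros(n_out)
    x = rng.normal(size=n_in)           # stand-in for one feature vector
    print(mlp_forward(x, W1, b1, W2, b2))  # a value in (0, 1)
```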

Results and Discussion
The proposed fractal features, used together with the DWT features, are good discriminants for the classification of falls and provide high classification accuracies with a number of classifiers, as shown in Table 6. The proposed RVFL ensemble has a significant speed advantage, of the order of 100×, with a training time of 1.76 ms, which has implications for real-time, embedded implementation on low-end processing cores in terms of runtime cost. This would enable real-time detection and immediate notification of a medical aid centre for a medical response. The proposed stacking ensemble of RVFLs combined with an RVFL meta-learner, along with the proposed ensemble selection algorithm, provides the best results with the proposed features. The RVFL neural networks are first trained with different numbers of neurons and activation functions to determine the best parameters for the fall classification problem. Five activation functions, namely hardlim, sign, sine, tribas and radbas, were initially tested. Three of them, sine, tribas and radbas, provide the best results, as shown in Tables 1 and 2. The number of neurons required for 27 inputs is also modest: good results are achieved with half, or less than half, of the total number of input and output neurons. The table rows with accuracy values highlighted in bold give the best accuracy results. The RVFL networks with the highest accuracies are chosen for the ensemble; however, among networks with similar accuracy, a network with a higher TP or a higher TN is preferred. For example, the three rows highlighted in italic in Tables 1 and 2 show networks that complement each other with either a higher TP or a higher TN (the same is not true for the ANNs in Tables 3 and 4). The ensemble is composed of three RVFL networks and provides the highest accuracy of 95.71%, as shown in the comparison of results in Tables 5 and 6.
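As a rough illustration of the building blocks, the sketch below implements a minimal RVFL with the sine, tribas and radbas activations (defined as in Matlab's `tribas` and `radbas`) and a stacking arrangement in which a meta-RVFL is trained on the concatenated class scores of three base RVFLs. It is a sketch under stated assumptions, not the authors' implementation: hidden weights uniform in [−1, 1] and biases in [0, 1], ridge-regularised least-squares output weights, one-hot targets, hypothetical hidden-layer sizes, and a meta-learner trained on in-sample base outputs (a careful stacking implementation would use out-of-fold predictions). The diversity-based selection of base classifiers is not reproduced.

```python
import numpy as np

# Activation functions named in the text.
ACTIVATIONS = {
    "sine": np.sin,
    "tribas": lambda z: np.maximum(0.0, 1.0 - np.abs(z)),  # triangular basis
    "radbas": lambda z: np.exp(-z ** 2),                     # radial basis
}


class RVFL:
    """Minimal RVFL: fixed random hidden layer, direct input links, ridge output layer."""

    def __init__(self, n_hidden, activation="sine", reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.g = ACTIVATIONS[activation]
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        H = self.g(X @ self.W + self.b)               # random nonlinear features
        ones = np.ones((X.shape[0], 1))
        return np.hstack([X, H, ones])                # direct links + hidden + bias

    def fit(self, X, y):
        # Hidden weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(0.0, 1.0, size=self.n_hidden)
        self.classes = np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        D = self._design(X)
        # Only the output weights are learned, in closed form (ridge least squares).
        A = D.T @ D + self.reg * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ T)
        return self

    def scores(self, X):
        return self._design(X) @ self.beta

    def predict(self, X):
        return self.classes[np.argmax(self.scores(X), axis=1)]


def fit_stack(X, y, base_specs, meta_hidden=5):
    """Train base RVFLs, then a meta-RVFL on their concatenated class scores."""
    bases = [RVFL(h, act, seed=s).fit(X, y) for s, (h, act) in enumerate(base_specs)]
    Z = np.hstack([m.scores(X) for m in bases])
    meta = RVFL(meta_hidden, "sine", seed=len(base_specs)).fit(Z, y)
    return bases, meta


def predict_stack(bases, meta, X):
    Z = np.hstack([m.scores(X) for m in bases])
    return meta.predict(Z)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2, 0.5, (30, 4)), rng.normal(2, 0.5, (30, 4))])
    y = np.array([0] * 30 + [1] * 30)
    specs = [(10, "sine"), (8, "tribas"), (12, "radbas")]  # hypothetical sizes
    bases, meta = fit_stack(X, y, specs)
    print("training accuracy:", np.mean(predict_stack(bases, meta, X) == y))
```

Because only the output weights are solved, and in closed form, training cost is dominated by one small linear solve per network, which is consistent with the short training times RVFLs are known for.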
The proposed scheme is compared with the most optimised ANN (MLP) with sigmoid activation functions. The ANN is optimised over the number of neurons in the hidden layer and over several learning algorithms, of which three stand out for better performance: LM, Rprop and SGD. The best training accuracy of 90% is achieved with the LM learning algorithm and 42 hidden neurons, followed closely by an ANN with 56 hidden neurons, as shown in Table 4, with the corresponding TP, TN, FP and FN values in Table 3.
The proposed RVFL ensemble provides the highest accuracy, precision, sensitivity and specificity, by virtue of the largest improvement in TP. Its accuracy is 5.71 percentage points higher than that of RF and ANN, 7.14 points higher than that of SVM and 11.42 points higher than that of DT and LDA, as shown in Table 6.
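The metrics compared here follow the standard confusion-matrix definitions, with a fall as the positive class. The small sketch below uses hypothetical counts, not results from the paper. It makes explicit that, for a fixed number of actual falls (TP + FN), each additional TP is one fewer FN, so sensitivity and accuracy rise together.

```python
def fall_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics with 'fall' as the positive class."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for one test fold (not results from the paper).
    print(fall_metrics(tp=30, tn=35, fp=2, fn=3))
```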
The proposed scheme gives the lowest training time of 1.76 ms, as shown in the logarithmic plot in Fig. 9. The   [36,58] and [27]. The presented FDS has 8% higher accuracy and 19% higher sensitivity than the camera-based FDS proposed by Wang et al. [58]. Moreover, the proposed technique gives 2.3% and 1.66% higher sensitivity than [36] and [27], respectively. Our work has performance comparable to Chen et al. [6] and 2% lower performance than Chelli et al. [5]. However, the proposed system has the lowest runtime cost of all the recent works listed in Table 7. The AE ensemble suffers from higher runtime costs due to the complexity of deep AE networks. The work in [6] has a significant runtime cost of 1810.20 s, compared with only 1.76 ms for the training and generation phase of our RVFL stacking ensemble. Although the runtime cost of the AE ensemble in Khan et al. [27] is not given, deep AE ensembles are computationally expensive, whereas RVFL networks are computationally fast and efficient learners in shallow networks [43]. The proposed scheme is 2.3× faster than a single DT and tree-based ensemble techniques such as RF, Bagged Trees or EBT require a

Conclusion
In this paper, we proposed a novel algorithm for the classification of falls through the use of fractal features and an ensemble of RVFLs combined with an RVFL neural network. The fractal Hurst exponent is computed with the SSC method and provides an irregularity measure of the signal. The proposed features based on fractal analysis provide high classification accuracy with DT, LDA, KNN, SVM, RF and ANN, as well as with the proposed ensemble. The proposed ensemble utilises a novel and fast selection methodology for base classifiers based on a diversity indicator obtained from the overall performance measures of TP and TN values determined during training. The novel RVFL ensemble classifier proposed in our work gives the highest accuracy of 95.71% compared with other classifiers on the same set of features, an improvement of 5.71 percentage points over RF and ANN, and 7.14 points over the SVM. The proposed classifier also achieves a large gain in runtime: the speedup in training time of the proposed RVFL ensemble is 317.7× compared with an ANN and 198.56× compared with an RF ensemble, and it is 2.3× faster than a single DT. Furthermore, the proposed scheme has accuracy higher than or comparable to most of the latest ensemble methods and provides the lowest runtime cost of 1.76 ms. The proposed ensemble and ensemble selection algorithm are independent of the application and the features used. The speedup advantage of the RVFL ensemble can lead to real-time implementation on low-end cores, enabling on-device training, real-time detection and immediate notification for a medical response to a fall event.

Table 7 (partial; the names of the first three compared works were lost in extraction)
Work | Year | Dataset | Sensor
(name not recoverable) | 2017 | Datasets [16], [40] | Tri-axes Acc., Gyro.
(name not recoverable) | 2018 | Self-simulated | Tri-axes Acc., Gyro.
(name not recoverable) | 2019 | Dataset [8] | Camera
Chelli et al. [5] | 2019 | Public datasets | Tri-axes Acc., Gyro.
Chen et al. [6] | 2019 | Self-generated | Tri-axes Acc.
Proposed FDS | 2020 | Dataset [29] | Tri-axes Acc.

Compliance with Ethical Standards
Conflict of Interest The authors declare that there are no conflicts of interest.
Ethical Approval This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.