First-order Layer in Artificial Pain Pathway

The neural mechanisms involved in pain perception consist of a pathway that carries signals from the periphery to the cerebral cortex. First-order pain neurons transduce potentially damaging stimuli detected at the sensory extremities into long-ranging electrical signals, which are transmitted to higher-order neurons where the organisation is more heterarchical, especially in the cerebral cortex. However, the first-order neurons, as their name states, have a degree of branching which clearly identifies them as hierarchical elements in the arrangement of the pain pathway. This research aims to develop an artificial neural pain pathway that mimics this biological process, in particular the first-order neurons. First, the research applies the periodogram method to condition monitoring data containing a minor malfunction and operational damage. As pain is associated with actual or potential tissue damage, using such data from a machinery system can provide insights that improve computational effectiveness. Then, a one-dimensional convolutional neural network model is introduced to represent the second and third orders of the pain pathway. The findings support studying the similarities between the major components of biological information processing of tissue damage and statistical signal processing for damage estimation.


Introduction
Research on artificial neural networks (ANN) has become one of the most significant developments in the machine learning field of the modern digital age. The structure of an ANN is often reported to be somewhat analogous to the biological neural networks that constitute a multi-layer mammalian brain. Admittedly, the analogy is vague and biological systems are substantially more complex than our current understanding. However, translating the overall function of neurons into computational processes has provided a number of distinguished models with record-shattering success, from recurrent neural network variants [1][2][3][4][5], to convolutional neural network-based deep learning classes [6][7][8][9][10], and the alternative physical energy-based model translations for use in cognitive science [11][12][13][14][15], along with the unsupervised autoencoders [16][17][18][19][20].
Another rapidly emerging field with a similar trend to ANN is system health management through the use of historical data, current conditions and future states [21]. For most contemporary complex systems embedded with peripheral sensors, deploying a machine learning model is certainly a demanding task. This might be even more challenging in systems where the authorities impose meticulous regulations for the sake of safe and secure practices [22][23][24].
Bio-inspired computing algorithms processing condition monitoring data can overcome this issue with a new ANN design that is comparable in certain respects to the evolution of the brain and its robust systems to handle the damage in the body. In this respect, it is important to comprehend the damage and its nature as well as how to adapt it in computing algorithms.
According to the International Association for the Study of Pain (IASP) terminology, pain is "an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage" [27,28]. Based on this partial similarity, one can hypothesise that damage (or an operational anomaly) in a system is analogous to the pain phenomenon in the neural system. As the knowledge underlying these choices is mostly absent, this research aims to alter available neural network classes by introducing a novel first-order transformation layer to represent the pain pathways. The goal is to give systems specific cognitive abilities and coordination mechanisms of pain processing in a brain-inspired manner, and ultimately to achieve a level of damage regulation that is analogous to biological pain. Because of the hierarchy in "the degree of branching" of first-order neurons and the heterarchy across the subsequent complex brain parts, this research examines whether conventional network structures, based on a heterarchical collection of connected nodes transmitting signals to each other, can be fed by a feature computing method rather than by forming probability-weighted associations between different units. Such a method can convert a signal from its original domain (such as time) to a desired feature representation. Accordingly, this study proposes a one-dimensional convolutional neural network model to represent the higher orders of the pain pathway, and Welch's method (also known as the periodogram method) to transform a sequence of time samples of a signal by estimating its power spectral density.
The rest of the paper is structured as follows: First, the study reviews the extant scientific background and related works on the pain pathways, convolutional neural networks and the periodogram method. Then, a definition of the methods and techniques used in the article is given. A case study follows the methodology section and conducts an in-depth exploration of the success of the first order transformation layer within the operational damage context. The results of the case study are then discussed. In the final part, the article concludes with a discussion of implications and potentials for further research.

General Pathways of Pain Sensation
Pain is an unpleasant somatic sensation associated with tissue damage, and it is a physiologically critical function for survival and evolutionary fitness in sustainable habitats. Within the general somatosensory warning mechanism, there are major known components of the ascending pain pathway designed to alarm against real or impending hazardous stimuli with the potential for tissue damage [25,26,29,30,31,32,33,34]. First-order pain afferents enter the spinal cord via the dorsal root [29,34]. While terminating superficially at this location, the first-order neuron synapses with the second-order neuron, after which the signal may ascend to higher centres that relay the information to cortical centres [29,30,34]. Figure 1 gives an outline diagram of the ascending pain pathway: the first-order neurons run from the periphery to the dorsal horn, the second-order neurons cross over in the spinal cord and run from the dorsal horn to the thalamus, and finally the third-order neurons synapse in the thalamus and project to the frontal cortex [25,26,30].
Fig. 1 Simplified depiction of three sensory neurones carrying pain information through the ascending signalling pathway (adapted from [25, 26])

Modelling the Pain Sensation
By the very nature and architecture of this pain pathway depiction, the model in this paper simulates the way the brain processes damage signals in the depicted neural orders. The analogy here is closer to a shared abstraction: the corresponding elements share not only a loose relation but also an inspiration. Resembling the activity of first-order neurons, which respond to noxious stimulation and send incoming signals up the spinal cord [29], the framework starts with an approach that uses the Fast Fourier Transform (FFT) for power spectrum estimation, introduced by [35]. The spectral density characterises the frequency content of a signal, and it can represent and convert the properties and features of that signal. A series of recent studies has indicated that spectral analysis can play a vital role in a better understanding of damage characterisation [36][37][38].
When the condition monitoring case is related to vibration or any rotating equipment (as in the case study section below), it would be difficult to picture the analysis without the FFT, which is used in almost all spectrum methods, including Welch's method [36]. The research therefore uses the Fast Fourier Transform for power spectrum estimation to represent the transmission of the pain signal from nociceptors to the spinal cord. Second-order neurons resemble the layer-by-layer structure of the convolutional and sub-sampling (also known as pooling) layers, the introduction of which propelled the field of deep learning. Therefore, the research further processes the information from the first-order layer with a Convolutional Neural Network (CNN) and examines whether these frequency features can provide a strategic advantage in understanding damage.
CNNs, proposed by [6], have repeatedly proven their success in various deep learning applications, including the damage-related topics of fault detection, diagnostics and prognostics along with systems health management [39]. The work of [40] presented a vibration-based structural damage detection system using a 1D CNN. The model could automatically extract damage-sensitive features from acceleration signals measured under random excitation, and output a binary classification label (damaged or undamaged). Likewise, vibration and acoustic data play a major role in damage detection for rotating machinery and have become a focal point in the field of prognostics and health management (PHM) [39]. Current strategies have been used in various domains.
Ince et al. [41] implemented an adaptive 1-D CNN into an early motor fault-detection system with an inherent adaptive design fusing the feature extraction and classification phase into a single learning body. To autonomously extract useful features for bearing fault detection, [42] used a CNN based fault detection system as an end-to-end machine learning model. One important part of the damage degradation is to construct an effective health indicator that can expose quantifiable characteristics of system deterioration. [43] proposed a convolution feature learning based model to construct such indicators in bearing systems. [44] extended the use of CNN to wind turbine monitoring applications by processing complex vibration signal inputs from both rotor and gear box bearings. Their results demonstrate that the deep learning systems can outperform both the so-called shallow-learning models and human experts.
With an alternative design, [45] processed raw vibration signals through a deep CNN with wide first-layer kernels. The wide kernels in the first convolutional layer extract the features from the raw data and suppress the high-frequency noise. In a similar manner, [46] proposed a hybrid model based on the continuous wavelet transform and a CNN. The wavelets constructed the time-frequency grey-scale images used as the input of the CNN. The trained function could extract the features from the fault signals and detect the faulty feeder at the same time. In another hybrid model, a fusion layer was annexed to integrate a multi-layer perceptron with a CNN [47]. Features were extracted from time-domain statistical features by the annexed feed-forward ANN and combined with the local vibration signal features from the CNN so that different faulty states could be identified.
Extending the fusion concept to parallel computing of deep learning models, [48] employed a long short-term memory neural network (LSTM) for temporal feature extraction and simultaneously used a CNN for spatial features. A fusion centre then combines these two paths to estimate the remaining useful lifetime of aero-propulsion systems. Overall, these studies demonstrate the significant performance of CNNs in damage detection, particularly when fused with an accompanying approach. In accordance with the success of the hybrid approaches, this research fuses the periodogram method with the convolutional layers that represent the second-order neurons. The outputs are then converted (or flattened) into a 1-dimensional array to be passed to the last layer. The cell bodies of third-order neurons lie within a region of the primary cortex. This is where the convolutional layers and a classical feed-forward network meet, as the latter is included in the form of a "Fully Connected" layer. The specific region is represented as a dropout layer. The term refers to randomly "dropping out" units, both hidden and visible ones along with their connections, from the neural network during the training process [49,50] so that over-fitting in training is avoided. This theoretical foundation is a simplistic model of an artificial pain pathway (APP), and it is shown in Fig. 2.

Methodology
This section provides information on the proposed periodogram method and CNN procedures for the application of damage diagnosis. The framework is a series of three layers that help extract the anomaly-related features and feed the final feed-forward network for classification. The details of these layers are defined in the following parts.

Welch's Method
The estimation of the power spectral density, or power spectrum, of a given signal is one of the most well-known practices in digital signal processing [51]. It measures the signal's power content falling within given frequency bins and is denoted by $P_{xx}(f)$.
A periodogram is such a power spectrum estimate for a finite time series

$$x(n), \quad n = 0, 1, \ldots, N-1.$$

Using this finite time series, the periodogram can be composed by the method of [52], with the following formulation [53]:

$$P_{xx}(f) = \frac{1}{N}\,\lvert X(f) \rvert^{2},$$

where the Fourier transform of the sample sequence, $X(f)$, is defined as

$$X(f) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi f n}.$$

In this way, the time series is transformed into a spectrum of frequencies.

Welch's method is regarded as an adaptation of this conventional periodogram method [35,51] and, accordingly, is often called the periodogram method. It is implemented by dividing the time series into sequential segments, applying the periodogram formula to each segment, and averaging the results [54]. Let the segment length be $L$ and the starting points of the segments be $D$ steps apart [35]. The first segment is given by

$$x_1(n) = x(n), \quad n = 0, \ldots, L-1,$$

the second segment is

$$x_2(n) = x(n + D), \quad n = 0, \ldots, L-1,$$

so that the total number of data points is

$$N = L + (K-1)D,$$

where $K$ is the total number of segments, and the $k$-th segment is defined as

$$x_k(n) = x\big(n + (k-1)D\big), \quad n = 0, \ldots, L-1.$$

Figure 3 shows this segmentation of the time series and illustrates the above-mentioned parameters in the coordinate system. The second modification in Welch's method is to weight the segments by a window [35], formulated as

$$w(n), \quad n = 0, \ldots, L-1.$$

Before the periodogram is computed, the segments form the sequences

$$x_1(n)\,w(n), \; \ldots, \; x_K(n)\,w(n).$$

When these sequences are applied to the periodogram formula, the result becomes a "modified" periodogram,

$$\tilde{P}^{(k)}_{xx}(f) = \frac{1}{LU} \left\lvert \sum_{n=0}^{L-1} x_k(n)\, w(n)\, e^{-j 2\pi f n} \right\rvert^{2}, \quad k = 1, \ldots, K,$$

where $U$ is a normalisation factor for the power in the window function [53], defined as

$$U = \frac{1}{L} \sum_{n=0}^{L-1} w^{2}(n).$$

In the final step, the Welch estimate is the average of the modified periodograms:

$$P^{W}_{xx}(f) = \frac{1}{K} \sum_{k=1}^{K} \tilde{P}^{(k)}_{xx}(f).$$

After applying these spectral density estimation steps, the data are transformed into a new form that is used for the first layer of the CNN. This is also known as the input layer, and it is formed by artificial input neurons that bring the transformed data into the network structure for the subsequent layers.
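The segment, window and average steps of Welch's method can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the Hann window, the 50% overlap and the function name `welch_psd` are assumptions, and no one-sided or sampling-rate scaling is applied, so values may differ from those of library routines.

```python
import numpy as np

def welch_psd(x, seg_len=100, step=50, fs=100.0):
    """Average of windowed (modified) periodograms.

    seg_len plays the role of L and step the role of D. The Hann window
    and the 50% overlap are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < seg_len:
        raise ValueError("signal is shorter than one segment")
    window = np.hanning(seg_len)                  # w(n), n = 0..L-1
    u = np.mean(window ** 2)                      # U = (1/L) * sum of w(n)^2
    n_segments = (len(x) - seg_len) // step + 1   # K, from N = L + (K-1)D
    psd = np.zeros(seg_len // 2 + 1)
    for k in range(n_segments):                   # k is zero-based here
        segment = x[k * step : k * step + seg_len]
        spectrum = np.fft.rfft(segment * window)  # FFT of the windowed segment
        psd += np.abs(spectrum) ** 2 / (seg_len * u)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return freqs, psd / n_segments                # average over the K segments

# Hypothetical test signal: a 10 Hz sine sampled at 100 Hz for 10 s.
t = np.arange(1000) / 100.0
freqs, psd = welch_psd(np.sin(2 * np.pi * 10 * t))
```

With a segment length of 100 the estimate has 51 frequency bins, and for this test signal the largest value falls in the 10 Hz bin.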

Convolution
A convolution operation is the first building block in a CNN architecture. The feature detectors here serve as the filters. The proposed framework employs a 1D layer that forms a convolution kernel, which is convolved with the corresponding input layer over a single dimension (see Fig. 4). The kernel slides over a spatial (or sometimes temporal) dimension to produce a tensor of outputs. At each position, the kernel is multiplied element-wise with the corresponding input section and the products are summed into a single value (Fig. 4, dark green square), which is placed in another 1D feature vector (the green grid).
Fig. 4 Visual demonstration of the convolution operation: the blue grid is the input map, the kernel is the highlighted area and the green grid is the convolved output
For the mathematical definition, let the input vector (the one arriving from the periodogram method) be $f$ and the kernel be $g$. The convolution of $f$ and $g$ is defined as

$$(f * g)(i) = \sum_{j} f(j)\, g(i - j),$$

where $i$ is the index of the result vector.
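As a rough illustration of the sliding multiply-and-sum operation, the sketch below applies a kernel to a short vector. The helper name `conv1d_valid` and the sample values are hypothetical. As in most CNN libraries the kernel is not flipped, so strictly speaking this is a cross-correlation; it equals a true convolution with the reversed kernel.

```python
import numpy as np

def conv1d_valid(f, g):
    """Slide kernel g over input f and sum the element-wise products.

    Only positions where the kernel fully overlaps the input are kept.
    The kernel is not flipped (CNN convention).
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    n_out = len(f) - len(g) + 1
    return np.array([np.dot(f[i : i + len(g)], g) for i in range(n_out)])

# Hypothetical input vector and kernel.
features = conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1])
```

Each output entry is one placement of the kernel, so a five-point input and a three-point kernel give three values.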

Non Linearity -ReLU Layer
An additional activation function, the Rectified Linear Unit (ReLU), is applied to the convolutional output. ReLU is an essential unit of CNNs, widely recognised in the literature and reported to outperform other activation functions [55]. It is a non-linear operation and its output is given as

$$f(x) = \max(0, x).$$

It is applied element-wise, replacing all negative values in the feature map with zero.
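A one-line sketch of this activation, assuming the feature map is held in a NumPy array (the function name `relu` and the sample values are illustrative):

```python
import numpy as np

def relu(feature_map):
    """Element-wise max(0, x): negative entries become zero."""
    return np.maximum(0.0, np.asarray(feature_map, dtype=float))

activated = relu([-2.0, -0.5, 0.0, 1.5])
```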

Sub-sampling -Pooling
The following part of the CNN is max-pooling, a form of down-sampling (also called sub-sampling). It partitions the convolved data into a set of non-overlapping sub-regions and outputs the maximum from each, so it reduces the dimensionality of the representation while retaining the most useful information. Down-sampling decreases the number of features and the network computation, and it can therefore reduce the optimisation complexity and control over-fitting. Figure 5 illustrates max-pooling (and also average pooling) with a 1×4 window. Each operation covers 4 numbers and discards 75% of the content. After the max-pooling layer is included, the artificial pain pathway is ready for the third-order neurons.
Simulating the primary thalamus function of relaying signals to the cerebral cortex, the output of the preceding pooling layer is flattened into a single column to be inserted into an ANN classification layer in the following step.
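The pooling and flattening steps can be sketched as follows. The function name `max_pool1d` and the array contents are hypothetical; the (51, 6) shape is chosen only to mirror the layer sizes quoted elsewhere in the text.

```python
import numpy as np

def max_pool1d(x, pool_size=2):
    """Non-overlapping max pooling along the first (step) axis.

    x has shape (steps, channels). Trailing steps that do not fill a
    complete window are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_windows = x.shape[0] // pool_size
    trimmed = x[: n_windows * pool_size]
    return trimmed.reshape(n_windows, pool_size, x.shape[1]).max(axis=1)

# Hypothetical feature map with 51 steps and 6 channels.
feature_map = np.arange(51 * 6, dtype=float).reshape(51, 6)
pooled = max_pool1d(feature_map, pool_size=2)   # 51 steps -> 25 steps
flat = pooled.reshape(-1)                       # single vector for the dense layers

# A window of 4 keeps one value out of every four.
small = max_pool1d([[1], [5], [2], [3], [9], [0], [4], [4]], pool_size=4)
```

Because 51 is odd, the last step is dropped when pooling with a window of 2, which is why 51 steps reduce to 25.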

Fully Connected Layer-ANN Classification
After the previous layers and flattening, the high-level classification in the model is done via fully connected layers, in which the neurons connect to all outputs of the prior layer and also to those of the next layer, as in regular (non-convolutional) ANNs. A basic single-input neuron is defined by an input ($x$), a weight ($w$) and a bias ($b$) [56]. To find its output, the arriving input signals are adjusted by the connection weights and summed, and a bias term is added. The result is passed through a transfer function ($f$) to produce the output $y$:

$$y = f(wx + b).$$

In a fully connected neural network, multiple such units are organised in multiple layers; each unit simply associates input elements with an output, while in aggregate they perform complex computations (see Fig. 6).
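A fully connected layer is the same weighted sum computed for several neurons at once. The sketch below is illustrative only: the weights, biases and input are made-up numbers, and a sigmoid (which the text uses in the output layer) stands in for the transfer function.

```python
import numpy as np

def sigmoid(z):
    """S-shaped transfer function with outputs between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, weights, bias, transfer=sigmoid):
    """Fully connected layer: weight the inputs, sum, add bias, apply f."""
    return transfer(weights @ x + bias)

# Hypothetical values: 3 inputs feeding 2 neurons.
x = np.array([0.5, -1.0, 2.0])
w = np.array([[0.2, 0.4, 0.1],     # one row of weights per neuron
              [-0.3, 0.1, 0.5]])
b = np.array([0.1, -0.2])
y = dense(x, w, b)
```

Stacking several such layers, each taking the previous output as its input, gives the classification stage described above.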

Dropout
Over-fitting is a critical problem in ANN training, caused either by a large number of parameters or by complex co-adaptations that fail to generalise. An over-fitted model corresponds too closely to the input data, learns its detail and noise, and accordingly fails to fit additional samples accurately. Applying dropout to the inputs of a layer is an efficient way of addressing this problem [49]. The term refers to randomly "dropping out" neurons, along with their connections, during the training process [49,50]. Fig. 7 illustrates how the neurons and their incoming and outgoing links are temporarily removed from the training process. Here, the layer randomly sets input units to 0 with a frequency given by the dropout rate at each training step, while the remaining units are scaled up by 1/(1 − rate).
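The zero-and-rescale behaviour can be sketched as below. This is a simplified stand-in for a library dropout layer; the function name, the random seed and the rate of 0.5 are assumptions made for the example.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    x = np.asarray(x, dtype=float)
    if not training or rate == 0.0:
        return x                            # dropout is inactive outside training
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)     # survivors scaled by 1 / (1 - rate)

rng = np.random.default_rng(0)              # fixed seed for a repeatable example
out = dropout(np.ones(10_000), rate=0.5, rng=rng)
```

The rescaling keeps the expected sum of the inputs unchanged, so the layer needs no adjustment when dropout is switched off at inference time.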

Sigmoid Function
A sigmoid activation function is applied in the output layer. With its characteristic S-shaped curve, it always returns a value between 0 and 1 and is defined by

$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$

The final structure of the framework is visualised in Fig. 8, where the methodology steps are carried out in order. This representation follows the proposed Artificial Pain Pathway workflow: segmentation of the vibration data into smaller segments, formation of the input layer from the segmented data, the first-order transformation layer, feature learning in the convolutional layers, and a fully connected layer for classification.

Flight Data
The methodology is tested on vibration data monitored by sensors mounted on a T50 rotorcraft from Aveox, a turbine-powered unmanned aircraft with dual rotors. The data set was recorded on September 17, 2019, at a small airfield in Denmark. There were four flights. From previous operations, the flight team had observed that, when fully loaded, the aircraft had severe vibrations beyond the acceptable level. With this anomaly in mind, one of the flights carried a high payload to understand how severe this issue was. After consulting the manufacturer, the problem was mostly remedied by a change to the rotor blade and rotor hub configuration. That being the case, the vibration data from this flight indicate a minor malfunction (or damage) that might have led to a more serious issue of structural degradation and perhaps ultimate disintegration.
All test flights started from the same take-off position and reached an altitude of around 25 metres, with the exception of the last flight, which was flown at a higher altitude to increase the wind speed exerted on the aircraft. The detailed differences between the flights are as follows:
To record the vibration data during these flights, a series of identical IMU sensors (SparkFun 9DoF Razor IMU) were fixed to the aircraft at the locations shown in Fig. 9. They were mounted in random orientations, since only the magnitude was of interest. Sensor recordings include angular velocity, translational acceleration and the magnetic field at a rate of 100 Hz.
During the test flights, the constantly changing wind speed and direction had an obvious effect on the aircraft, which might have caused aerodynamic effects on the rotor and changed the vibration pattern. Flight videos can be accessed at the UAS-ability YouTube channel [57]. They are named with the date, i.e. "19.09.17", and "FL00x", referring to the flight number. There were four camera views available: an overview camera, a manual tracking camera, an on-board camera, and a ground station camera (called MGCS). During the third flight, there was also an additional video of the aircraft taken by another drone.
Fig. 9 The photo shows the T50 aircraft with the location of the seven IMU sensors, the wind sensors, and the payloads for flights 2 and 3. The payload for flight 2 is not mounted, but its location is shown

Signal Pre-processing
Flight data were uploaded to a common repository and timestamped. These raw sensor data contain instrumental errors: some extreme measurements lie beyond the expected range and are unlike the other readings. Once captured, the primary data need to be cleaned of these outliers to allow a better analysis and observations that best illustrate the case.
To find the outlier points that differ significantly from the distribution, the data are normalised based on the mean and standard deviation. Any standardised point that lies significantly above the mean is treated as an outlier and replaced by linear interpolation between its two neighbouring points. Figure 10 shows the raw data after the outliers have been removed. Of the 7 IMUs on the drone, the one mounted directly onto the autopilot (called VECTOR in this aircraft) is the most representative of what would be available on a normal unmanned aircraft. This is mainly because it is located in the ideal position, where it can accurately detect anomalies or changes in the drone environment and provide the related information. On account of this, the data from this IMU are used in the model testing.
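The cleaning step can be sketched as follows. The text does not state the cut-off, so the threshold of 3 standard deviations and the use of the absolute z-score are assumptions, as is the function name `replace_outliers`.

```python
import numpy as np

def replace_outliers(signal, z_threshold=3.0):
    """Standardise, flag |z| > z_threshold, and interpolate over flagged points.

    The threshold value and the two-sided test are assumptions for this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    std = signal.std()
    if std == 0:
        return signal.copy()                 # constant signal: nothing to flag
    z = (signal - signal.mean()) / std
    good = np.abs(z) <= z_threshold
    idx = np.arange(len(signal))
    cleaned = signal.copy()
    # Linear interpolation between the nearest retained neighbours.
    cleaned[~good] = np.interp(idx[~good], idx[good], signal[good])
    return cleaned

# Hypothetical reading with one spike.
raw = np.ones(20)
raw[10] = 100.0
cleaned = replace_outliers(raw)
```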
To apply the proposed framework, the data in Fig. 10 are divided sequentially into segments so that the APP can classify each segment with the proper label: "regular" or "faulty" operation. These segments are shown in Fig. 11, each with a window size of 100. The earliest and latest portions are removed to exclude the take-off and landing data, which include relatively less vibration than the rest of the operation and can therefore confuse the model. After the proposed periodogram method is applied to each segment, its size is reduced and it can provide distinguishing damage-related features. Figure 12 indicates how the power spectral density of each segment varies with frequency. Each spectral density is estimated by dividing the data into overlapping parts, calculating a modified periodogram for each part and averaging them.
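A minimal sketch of the segmentation step is given below. The recording length, the number of channels and the number of windows skipped at each end are hypothetical; only the window size of 100 comes from the text.

```python
import numpy as np

def segment_signal(data, window_size=100, skip_first=0, skip_last=0):
    """Cut (samples, channels) data into consecutive non-overlapping windows.

    skip_first / skip_last drop whole windows at the start and end,
    for example to exclude take-off and landing.
    """
    data = np.asarray(data, dtype=float)
    n_windows = data.shape[0] // window_size
    windows = data[: n_windows * window_size].reshape(
        n_windows, window_size, data.shape[1]
    )
    return windows[skip_first : n_windows - skip_last]

# Hypothetical recording: 1050 samples from 6 channels.
recording = np.zeros((1050, 6))
segments = segment_signal(recording, window_size=100, skip_first=1, skip_last=1)
```

Here the 50 samples that do not fill a complete window are discarded, and the first and last windows are removed, leaving 8 segments of shape (100, 6).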
Even at first glance, it is clear that the second flight exhibits higher power spectrum values than the other three regular flights. The features extracted from these periodograms are potential indicators of flight regularity or anomaly. When the data are transformed as in Fig. 12, the results become the output of the first-order transformation layer in the APP and are then passed to a 1D CNN. The results indicate relatively small differences in both the accelerometer and gyro values, except for the second flight, which displays an anomaly in the operation as expected due to the extra payload. From this, it can be argued that Flight 2 shows appreciably more disorder than the other flights conducted by the same drone.
The findings may also hint at a threshold level marking the boundaries of the acceptance region for each sensor reading. When a significant number of critical values surpass the threshold level, a flight anomaly can be regarded as having occurred.
However, some sensor readings during Flight 4 may surpass the threshold value, probably due to the higher operating altitude. A threshold model may not distinguish these from a flight anomaly, which suggests that some measurements at a certain altitude or in a certain direction might fail to reflect overall performance. On the other hand, the other directions of the same sensor in Fig. 12 perform as expected. For efficient classification, the CNN takes all 6 sensor vectors as input and outputs the probability that a segment falls under the label "regular" or "anomaly". Table 1 gives details of the configuration, or architecture, which specifies what layers the framework contains and in what order they are connected. In the input layer, 6 different vibration signals are segmented and passed to the transformation layer. This layer outputs (51,6) values to the second-order layer, where the convolutional layer creates a kernel with a window length of a single point. To produce a tensor of outputs, the kernel is convolved with the first-order layer output over a single spatial dimension. There are 32 output filters in the convolution, and the input consists of 6-length vectors with 51 time-steps. The following pooling layer down-samples the incoming representation to (25,6) by taking the maximum value over a window of 2. Then, these data are flattened and processed by 3 regular densely connected neural network layers. Finally, an output layer with the sigmoid function produces the final classification.
With these steps, the model is compiled and fitted to the segmented vibration data. The number of epochs to train the model is set to 25; beyond this number, further epochs do not noticeably affect the training result. An epoch is a full iteration over the samples, i.e. the entire input and output data provided. During the training process, the iterations are required to update the network's parameters so that the algorithm can reach an optimal point where the classification is accurate enough. In compiling, an optimiser based on the Adam algorithm is used to optimise the parameters. This is a stochastic gradient-based optimisation method, based on adaptive estimation of first-order and second-order moments [58]. Adam optimisation allows a straightforward implementation with little memory requirement and good computational efficiency, and the method is also suited to cases that are large with regard to data and parameters [58]. To compute the loss that the ANN seeks to minimise during training, the built-in loss function computes the "mean squared error" between the known output labels and the model estimations. The metric used to judge the model performance is "accuracy", which calculates how often the model estimations match the actual output labels.
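The two quantities named above can be computed by hand as a sanity check. The sketch assumes two-column indicator labels and takes the larger of the two estimates as the predicted class; the sample values are invented for illustration and are not taken from the experiment.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between labels and estimates."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def accuracy(y_true, y_pred):
    """Share of rows whose largest estimate sits in the labelled column."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Hypothetical labels and model outputs for four segments.
y_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y_pred = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.1, 0.9]])

loss = mean_squared_error(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

In this toy case the second segment is misclassified, so the accuracy is 0.75 while the mean squared error is 0.105.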

Results
The experimental findings in this section come from the model testing that evaluates the framework in the form of descriptive statistics. The section provides graphs and plots as well as the results of the model analysis. In light of the estimations and model performance, the section shows the model's ability to perform the classification task accurately not only with training data but also with validation data, so that the model can actually be deployed in real time with run-time data. The outcomes are reported in detail so that one can justify the proposed configurations. The output labels take the value [1,0] for regular operations and [0,1] for anomaly, converting the categorical classification into indicator variables. The full data set is divided into two subsets: a set to train the model and a set to test the trained model. This partition allows an unbiased evaluation of the final model fitted on the training data. Provided that the test set results agree with the preceding training results, it can be concluded that the model generalises well to new data, and the test set serves as a proxy for evaluating the loss and metrics at the end of each epoch.
In Fig. 13, the model accuracy for both subsets is plotted. An accuracy above 95% is achieved for both, which indicates that the model does not over-fit the training data. In both cases, the model could fit the parameters and improve over the iterations without corresponding too closely to the training data or focusing excessively on its idiosyncrasies. The use of dropout layers, as presented in Table 1, is a major reason why the model could fit the test data efficiently.
There is also a similar trend in the training and validation loss over epochs (see Fig. 14). The minimal difference between the two lines and their common decreasing trend indicate that the model is computationally inexpensive and has an effective regularisation process along with improved generalisation. The low loss, along with the high accuracy, is a promising finding hinting that the proposed model can perform well at deciding between regular and anomaly conditions. Figure 15 displays an approximate representation of the distribution of error (|e| > 0.01) for both label columns. In the histogram, the bar groups show that much of the training error falls into the acceptable range between −0.5 and 0.5. By returning the indices of the maximum values along each row of the estimated targets returned by the classifier, the estimates in the acceptable range are regarded as successful estimations. By comparing these returned indices with the corresponding second column of the ground truth (correct) output labels, a report showing the main classification metrics is built, and the results are shown in Table 2. The report includes precision, recall and F1 score for both labels, and also the overall accuracy of the model. The reported averages include the macro average (the unweighted mean per prediction class) and the weighted average (the support-weighted mean per prediction class). Table 2 also provides a confusion matrix to evaluate the classification accuracy. In the binary classification estimation, the count of true negatives (regular) is 1293, false negatives is 27, true positives (faulty) is 330 and false positives is 25.
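The standard metrics follow directly from the four confusion-matrix counts quoted above. The short sketch below treats the faulty class as positive; the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }

# Counts as quoted in the text, with "faulty" as the positive class.
metrics = classification_metrics(tp=330, fp=25, tn=1293, fn=27)
```

For these counts the faulty class has a precision of about 0.93 and a recall of about 0.92, and the overall accuracy is about 0.97.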
Even though the results are desirable, data scarcity is a major bottleneck and the model performance relies heavily on the window size used to segment the data. A segment with more data can provide better features to process, but this also reduces the number of available samples. After the segments were updated with a new window size of 200, the network was re-configured as in Table 3 and re-run with the same setup, and the results were refreshed.
The later training provided slightly better results. Figure 16 illustrates the classification performance. Both the training and validation accuracy rates confirm that more data for each label output can provide better classification results. Additionally, the results in Fig. 17 lead to a similar conclusion, with fewer errors beyond the acceptable range (between −0.5 and 0.5). Even though only slightly superior results are achieved, the effects revealed by the larger window size are worth discussing. They might even cast new light on using more data from additional sensors or data types.
With a similar comparison between the indices of the maximum of the estimated targets and the ground truth labels, a further classification report and a confusion matrix are given in Table 4. Together, these results tie in well with the previous ones. When comparing the label accuracies, it must be pointed out that there are more false positives than false negatives even though there are more regular samples. This might be regarded as a negligible bias or a potential limitation but, in a real-case scenario, it would be more important to correctly estimate and classify the regular operations without causing any undesired interruption.

Conclusion
This work proposed and demonstrated an artificial pain (damage) mechanism which processes vibration signals from four different drone operations. The model integrates the periodogram method with a 1D CNN and multiple dense layers. The trained model is able to extract the fault-related features and to detect the operational anomaly with high accuracy. The results showed that the artificial pain pathway (APP) was able to successfully classify shuffled data, i.e. the segmented windows from both "regular" and "anomaly" flights. Furthermore, compared with the initial setting, an increase in the window size (more input data per classification label) yields a slightly higher model performance. As the classification results indicate that the proposed framework is reliable and robust for vibration-related damage, an interesting direction would be to consider other data types and signal conversion methods for use in the first-order layer, so that the use of the APP can be extended to further prognostics and health management applications.