1 Introduction

In recent years, with the development of mobile communication equipment and the Internet of Things, wireless transmission technology has played an increasingly important role in daily life [1]. Compared with a wired network, a wireless network is more convenient and simpler to deploy, and it largely eliminates wiring and installation costs. However, intrusive attacks on electronic devices are growing rapidly. Wireless signals are often used as the basis of large-scale malicious attacks, and the broadcast nature of wireless transmission makes the problem worse. Guaranteeing the safety of information transmission is therefore essential, which calls for more attention to security problems and new countermeasures. Physical layer security is the most basic part of wireless transmission security. Many attackers break into security systems by copying a device and imitating its signal. For example, a thief can get into a car by imitating the signal of the car key, and an intruder can enter a confidential system by imitating a license signal with a cloned device. Countering RF device cloning is therefore an urgent problem. Fortunately, because of slight variations in production, even nominally “same” devices differ slightly, although the differences are hard to observe directly. These differences give us the opportunity to identify individual RF devices and to detect cloned equipment used for malicious attacks. RF-DNA fingerprinting is an emerging technology adopted to counter risks such as device cloning. Biological DNA is an intrinsic attribute of an organism, and different individuals have different DNA. Similarly, we consider that each RF device has its own intrinsic physical attributes, called its RF-DNA fingerprint. In this paper, we obtain the “RF-DNA” of each device by calculating statistical features of many signals emitted from that device. In other words, an RF-DNA fingerprint is a set of discriminating features extracted from an RF device [2], and any two RF devices differ in these features. The differences arise from equipment noise and hardware production error [3] and are reflected in the output signals.

The main task of RF-DNA fingerprint technology is to distinguish signals and counter cloning, which can be summarized as identification and authentication. In identification, there are several known RF devices and an unknown signal, and we need to determine which device the unknown signal came from. In authentication, an unknown signal claims to come from a particular RF device, and we need to determine how credible that claim is. Authentication prevents two different devices from using the same RF-DNA.

Similar to biological DNA recognition, RF-DNA fingerprint technology can identify individual machines, and it has potential applications in many fields such as information security, criminal investigation, and even military command. Once this technology becomes mature, malicious cloned devices will in all likelihood be caught, and wireless information transmission will become more secure. Some related research has already been carried out in this field.

Over the past two decades, RF-DNA fingerprinting has seen many development opportunities [4] and physical layer challenges [5]. At present, the mainstream method is to classify amplitude, frequency, and phase features using multiple discriminant analysis (MDA) [2, 6–9]. First, research [2, 6] addressed both device identification and verification and extended the process from the three-class problem to the general N-class problem. By assuming a multivariate Gaussian prior distribution, the posterior probability can be calculated [2, 10] to simulate authentication. Previous studies showed the impact of the number of dimensions [7, 11] and the number of sub-regions [8] on classification accuracy; more feature dimensions generally give better classification results. Additionally, a signal can be divided into a midamble region and a near-transient region. Using the near-transient region, or classifying mobile communication devices from different manufacturers (the inter-manufacturer case), can lead to higher classification accuracy [1, 9]. Besides MDA, the decision tree algorithm [12] and other classifiers are also effective for distinguishing signals, and classifier selection is a crucial part of RF-DNA technology [13, 14]. An SVM classifier [15] was applied to kernel independent component nonlinear feature extraction. Research [16, 17] proposed a probabilistic neural network based on Bayesian classification as the classifier. Generalized relevance learning vector quantization-improved [12, 18, 19] is a supervised machine learning algorithm based on MDA, which shows better performance. As for feature extraction, the fast Fourier transform [20, 21], the short-time Fourier transform [22], and the discrete Gabor transform [10] can also be used. Previous research compared their performance on time domain, wavelet domain [23], and spectral domain [24] features. Some novel approaches such as least squares estimation [25] and phase characteristics [26] were proposed to extract transient fingerprints. Also, research [20] separated the features and identified the more important ones. Furthermore, some physiological electrocardiogram signals [15, 17] were classified directly by emerging artificial neural networks [20, 27], especially recurrent neural networks [28], an approach worth learning from.

2 Experimental process

In general, RF-DNA fingerprint technology is divided into the following four steps:

a) Signal collection. Command the RF devices to send out a series of unintentional signals, which the receiver collects. Repeat this process many times to collect a large number of signals. These signals are regarded as security signals and are used as the training set for our classifiers.

b) Feature extraction. Each signal has its own statistical features in the time domain, the frequency domain, and other domains. On the one hand, extracting features reduces dimensionality. On the other hand, the features may describe the signal more accurately. This is the core step of RF-DNA fingerprint technology, and good feature selection often means good classification accuracy.

c) Database setup. After feature extraction, the feature sequence of each signal is stored in the database and labeled with the device it came from. These features are called the RF-DNA fingerprint.

d) Classification. The main task is to determine the label of an unknown signal sample by comparing its features with the features of the known samples in the database.

2.1 Signal collection

There are four RF devices embedded with the NRF24LE1 chip, as shown in Fig. 1 (Fig. 1b is an enlarged view of Fig. 1a). The only difference among these RF devices is the date of manufacture: the 11th week of 2011, the 31st week of 2011, the 24th week of 2014, and the 48th week of 2015, respectively. The signals studied in this paper were collected in May 2018.

Fig. 1

Four RF device chips and one enlarged view. a The four RF devices used for sampling and their base, from which the original signals were collected. b An enlarged view of the RF device produced in the 24th week of 2014

The experimental signal acquisition process is shown in Fig. 2. A personal computer (PC) controlled the RF device to emit unintentional signals, and the detector received the signals, which were displayed on the oscilloscope. The detector recorded the original amplitude signal from each of the four RF devices once the waveform was stable. Then, the signals were pre-processed with Microsoft Decoder Sample. All signals were obtained in the 2.4 GHz band, and the sampling frequency is fs=25.6 MHz. For each RF device, the valid signal lasts about 9 s and contains about 230,000,000 sampling points in total.

Fig. 2

Experimental signal acquisition process. The PC controlled the RF device to emit unintentional signals, and the detector received the signals, which were displayed on the oscilloscope. The distance between the RF device and the detector was 5 m

2.2 Additive white Gaussian noise (AWGN)

The signals were collected in a closed basement, which can be considered a relatively low-noise environment. The distance between the RF device and the detector was only 5 m, and outside noise was limited as much as possible. However, such an experimental scene is specific and may not be representative of practical use, so data from the laboratory environment alone cannot give a convincing picture of performance. Therefore, AWGN was added to evaluate performance under less ideal conditions and make our results more general. The SNR was calculated as formula (1).

$$ \mathrm{SNR} = 10 \times \log_{10}\left(\frac{\text{Signal power}}{\text{Noise power}}\right) $$
(1)

The noise power is controlled through the added AWGN, while the signal power is calculated from the original amplitude signal. Through analysis and calculation, the SNR of the original sampled signal is 30 dB. That means the signal power is \(10^{3}\) times the noise power, so the sampling can be considered almost noise-free. After adding different levels of AWGN, we obtained environments with SNR={0, 1, 3, 5, 7, 10, 15, 20, 25, 30} dB.
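As an illustration of formula (1), the following minimal Python sketch scales white Gaussian noise so that a signal reaches a requested SNR. It is a sketch under stated assumptions, not the procedure used in the experiment: the function names are hypothetical, the input is treated as a noise-free reference, and a toy sinusoid stands in for a recorded signal.

```python
import math
import random


def signal_power(x):
    """Mean squared amplitude of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


def add_awgn(x, target_snr_db, rng=None):
    """Return x plus white Gaussian noise scaled for the target SNR in dB.

    Assumption: x is treated as noise-free, so the noise power follows
    directly from formula (1): P_noise = P_signal / 10^(SNR/10).
    """
    rng = rng or random.Random(0)
    noise_power = signal_power(x) / (10 ** (target_snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


def measured_snr_db(clean, noisy):
    """SNR in dB computed from the clean signal and the noise actually added."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


# Toy sinusoid standing in for a recorded amplitude signal (illustrative only).
clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(20000)]
for snr in (0, 10, 30):
    noisy = add_awgn(clean, snr)
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

The measured SNR differs slightly from the target because the noise power is estimated from a finite number of random draws.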

2.3 Sample generation

A sample that is too short leads to poor classification accuracy, and a sample that is too long is less convincing; thus, taking L=\(2^{18}\)=262,144 sampling points as one sample is a plausible choice. Since the sampling frequency is fs=25.6 MHz, each sample lasts about 0.01 s, which is a relatively fine time resolution.

For each RF device, we divided the original signal into N=2000 samples. In order of production date, the labels of the four RF devices are M1, M2, M3, and M4. Signal samples from the same device are given the same label. To ensure adequate training, we randomly take T=1600 samples from each RF device as training samples. The remaining 400 samples are used as testing samples to assess performance.

During signal collection, there are many sources of bias that we could not fully observe. The system should work directly on the original data with minimal pre-processing. In practice, to reduce potential signal collection bias and remove the scale of the data, the original sampled sequence x(n) is normalized. We apply linear (min-max) normalization to every original sample before feature extraction, as formula (2).

$$ a(n) = \frac{x(n) - \min (x(n))}{\max (x(n)) - \min (x(n))} $$
(2)

where x(n) is original amplitude sampling signal and a(n) is the normalized signal.
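Formula (2) can be sketched in a few lines of Python. The function name and the toy input are illustrative assumptions, and the guard for a constant sample is an addition that the text does not discuss.

```python
def min_max_normalize(x):
    """Linearly rescale a sample to [0, 1] as in formula (2)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        # Assumption: a constant sample has no defined normalization.
        raise ValueError("constant sample cannot be min-max normalized")
    span = hi - lo
    return [(v - lo) / span for v in x]


raw = [2.0, 3.5, 5.0, 4.25]      # made-up amplitudes
print(min_max_normalize(raw))    # [0.0, 0.5, 1.0, 0.75]
```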

Then, the amplitude of each sample is normalized to the range [0, 1], as shown in Fig. 3. The four panels show 500 signal points from each of the four devices. The unstable leading part of the signal was discarded. To the naked eye, the signals from the four RF devices are very similar, and it is almost impossible to tell the amplitude profiles apart from the figure. Therefore, we need to extract RF-DNA features to identify the differences.

Fig. 3

Five hundred normalized signal points of four RF devices. We show 500 original sampling points from each of the four RF devices, normalized to the range [0, 1]. The unstable leading part of the signal was discarded. It is almost impossible to tell the amplitude profiles apart visually. Therefore, we need to extract RF-DNA features to identify the differences

3 Statistic fingerprint generation

3.1 Divide sample into sub-regions

Figure 4 illustrates the whole fingerprint generation process. The first two boxes were introduced in Section 2: there were k=4 RF devices, and we collected N=2000 signal samples for each RF device. Features are ideally extracted from a steady signal. Hence, we divided each signal sample into NR equal-length sub-regions, so that each region can be considered comparatively more stable. An additional benefit is that this increases the dimension of the fingerprint features; Bihl et al. [7] showed that increasing the feature dimension may increase classification accuracy. Figure 5 demonstrates the sub-region allocation process. We thus obtained NR sub-regions plus one region covering the complete sample, (NR+1) regions in total, and extracted features separately in each of them. Cobb et al. [8] analyzed the effect of the value of NR on performance, and we take NR=16 as a reasonable trade-off.

Fig. 4

The whole process of fingerprint generation. The RF-DNA fingerprint generation process includes five steps: (1) four RF devices, (2) collect N=2000 samples, (3) divide each sample into NR=16 equal-length sub-regions, (4) calculate NF=3 features, and (5) extract NS=4 statistical features

Fig. 5

One sample signal divided into NR equal-length sub-regions. Schematic of the equal division of a signal sample. Each sub-region can be considered comparatively more stable, and the division also increases the dimension of the fingerprint features

3.2 Feature extraction using Hilbert transform

The most straightforward approach is to use the original amplitude signal directly as the features. However, our results show that the classification accuracy is less than 40% when only the amplitude feature is used without any transform. Hence, we need more feature dimensions. Using the Hilbert transform, NF=3 features can be extracted from the given real-valued time domain signal: instantaneous amplitude (IA), denoted a(n); instantaneous phase (IP), denoted φ(n); and instantaneous frequency (IF), denoted f(n).

First, the IA signal was converted into the complex I-Q sample \(s_{C}(n)=H(a(n))=s_{I}(n)+j\,s_{Q}(n)\). Next, the IP φ(n) and the IF f(n) were calculated as formula (3).

$$ \varphi (n) = \tan^{-1}\left[ \frac{s_{Q}(n)}{s_{I}(n)} \right], \quad f(n) = \frac{1}{2\pi}\left[ \frac{d\varphi (n)}{dt} \right] $$
(3)
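The computation in formula (3) can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the experiment. It rests on several assumptions: the analytic signal is built with a naive DFT (practical only for short inputs), the four-quadrant `atan2` replaces the plain arctangent, the phase is unwrapped before differencing, and the derivative is approximated by a first difference multiplied by fs. All function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short demo inputs."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(a):
    """Complex I-Q sequence s_I(n) + j*s_Q(n) of a real sequence a(n).

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    which is the usual discrete construction of the Hilbert analytic signal.
    """
    n = len(a)
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    spectrum = dft(a)
    return dft([g * s for g, s in zip(gain, spectrum)], inverse=True)


def instantaneous_features(a, fs):
    """Return (phase, frequency) sequences following formula (3)."""
    s = analytic_signal(a)
    phase = [math.atan2(z.imag, z.real) for z in s]
    # Unwrap so that consecutive phase steps stay within [-pi, pi].
    unwrapped = [phase[0]]
    for p in phase[1:]:
        step = p - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        unwrapped.append(unwrapped[-1] + step)
    # First difference approximates d(phi)/dt; one element shorter than a(n).
    freq = [(unwrapped[i + 1] - unwrapped[i]) * fs / (2 * math.pi)
            for i in range(len(unwrapped) - 1)]
    return phase, freq


# Toy check: a 4 Hz cosine sampled at 64 Hz should give IF close to 4 Hz.
n, fs, f0 = 64, 64.0, 4.0
a = [math.cos(2 * math.pi * f0 * m / fs) for m in range(n)]
phase, freq = instantaneous_features(a, fs)
print(round(min(freq), 6), round(max(freq), 6))
```

For samples as long as those in this paper, an FFT-based routine would be needed in place of the naive DFT.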

3.3 Calculate statistical fingerprint

Compared with previous methods, we propose adding the mean (μ) to the statistical RF-DNA fingerprint. Taking the IA feature a(n) as an example, the mean, standard deviation (σ), skewness (γ), and kurtosis (κ) were calculated as formulas (4)–(7). That is, NS=4 statistics were calculated in each of the (NR+1) regions for each of the NF feature sequences. The heat map of the normalized statistical fingerprint is shown in Fig. 6, averaged over 2000 signals for each RF device. It can be seen from the figure that M4 differs more from the other three RF devices, which makes M4 easier to identify and is consistent with our results.

$$\begin{array}{*{20}l} {\rm{Mean:\; }}\mu &= \frac{{\rm{1}}}{L}\sum\limits_{{\rm{n}} = {\rm{1}}}^{L} {a(n)} \\ {\rm{Variance:\; }}{\sigma^{\rm{2}}} &= \frac{{\rm{1}}}{L}\sum\limits_{{\rm{n}} = {\rm{1}}}^{L} {{{(a(n) - \mu)}^{2}}} \\ {\rm{Skewness:\; }}\gamma &= \frac{{\rm{1}}}{{L{\sigma^{\rm{3}}}}}\sum\limits_{{\rm{n}} = {\rm{1}}}^{L} {{{(a(n) - \mu)}^{\rm{3}}}}\\ {\rm{Kurtosis:\; }}\kappa &= \frac{{\rm{1}}}{{L{\sigma^{\rm{4}}}}}\sum\limits_{{\rm{n}} = {\rm{1}}}^{L} {{{(a(n) - \mu)}^{\rm{4}}}} \end{array} $$
(4) (5) (6) (7)
where a(n) denotes the normalized sample signal sequence, L denotes the number of sampling points, and the standard deviation σ is \(\sqrt{{\sigma^{2}}}\).

Fig. 6

Average of 2000 fingerprints for each of the four RF devices at SNR = 30 dB. The heat map shows the normalized statistical fingerprint, that is, our RF-DNA fingerprint, for the four RF devices at SNR = 30 dB
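The four statistics in formulas (4)–(7) can be computed directly from their definitions. The sketch below uses the population forms (division by L) exactly as written in the formulas. The function name is hypothetical, and the error raised for a zero-variance input is an assumption about how to handle a case the text does not cover.

```python
import math


def statistical_features(a):
    """Return (mean, standard deviation, skewness, kurtosis) of a sequence."""
    L = len(a)
    mu = sum(a) / L
    sigma = math.sqrt(sum((v - mu) ** 2 for v in a) / L)
    if sigma == 0.0:
        # Assumption: skewness and kurtosis are undefined for constant input.
        raise ValueError("zero variance: skewness and kurtosis are undefined")
    gamma = sum((v - mu) ** 3 for v in a) / (L * sigma ** 3)
    kappa = sum((v - mu) ** 4 for v in a) / (L * sigma ** 4)
    return mu, sigma, gamma, kappa


print(statistical_features([0.0, 0.0, 1.0, 1.0]))  # (0.5, 0.5, 0.0, 1.0)
```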

Overall, for one sample i (i=1,2,…,N; N=2000), fingerprint generation can be summarized in three steps. (a) Divide the original signal sample into NR=16 equal-length sub-regions plus one total region, giving the vector in formula (8). (b) Calculate the NF=3 feature signals within the (NR+1)=17 regions, as formula (9). (c) Extract NS=4 statistics and generate the 1×204-dimensional fingerprint of the single sample, as formula (10).

$$\begin{array}{*{20}l} F_{R^{i}} &= \left[ F_{R_{1}^{i}} \,\vdots\, F_{R_{2}^{i}} \,\vdots\, F_{R_{3}^{i}} \,\vdots\, \cdots \,\vdots\, F_{R_{(N_{R}+1)}^{i}} \right]_{1 \times (N_{R}+1)} \\ F_{i}^{x} &= \left[ (F_{R^{i}})^{a} \,\vdots\, (F_{R^{i}})^{\varphi} \,\vdots\, (F_{R^{i}})^{f} \right]_{1 \times (N_{R}+1)\cdot N_{F}} \\ F_{i} &= \left[ \mu (F_{i}^{x}) \,\vdots\, \sigma (F_{i}^{x}) \,\vdots\, \gamma (F_{i}^{x}) \,\vdots\, \kappa (F_{i}^{x}) \right]_{1 \times (N_{R}+1)\cdot N_{F}\cdot N_{S} \,=\, 1 \times 204} \end{array} $$
(8) (9) (10)

Finally, for each RF device, the training matrix composed of the T=1600 individual training fingerprints is

$$ Tr = {\left[ {F_{1}},{F_{2}},\ldots,{F_{T}} \right]'}_{T \times 204} $$
(11)
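The assembly of one fingerprint in formulas (8)–(10) can be sketched as follows. This is a minimal illustration under stated assumptions: the three feature sequences are random placeholders rather than real IA, IP, and IF data, the helper names are hypothetical, samples whose length is not a multiple of NR simply drop the remainder from the last sub-region, and a constant region is given zero skewness and kurtosis.

```python
import math
import random

N_R = 16  # sub-regions per sample (value used in the text)
N_S = 4   # statistics per region: mean, std, skewness, kurtosis


def four_statistics(seq):
    """Mean, standard deviation, skewness, and kurtosis of one region."""
    n = len(seq)
    mu = sum(seq) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in seq) / n)
    if sigma == 0.0:
        return [mu, 0.0, 0.0, 0.0]  # assumption: constant region
    gamma = sum((v - mu) ** 3 for v in seq) / (n * sigma ** 3)
    kappa = sum((v - mu) ** 4 for v in seq) / (n * sigma ** 4)
    return [mu, sigma, gamma, kappa]


def split_regions(seq, n_regions=N_R):
    """N_R equal-length sub-regions followed by the whole sequence."""
    size = len(seq) // n_regions
    subs = [seq[r * size:(r + 1) * size] for r in range(n_regions)]
    return subs + [seq]


def fingerprint(ia, ip, inst_freq):
    """Concatenate statistics over (N_R + 1) regions of the 3 features."""
    rows = [four_statistics(region)
            for feature in (ia, ip, inst_freq)
            for region in split_regions(feature)]
    # Statistic-major ordering, matching the layout of formula (10).
    return [row[s] for s in range(N_S) for row in rows]


rng = random.Random(1)
# Random placeholders standing in for the IA, IP, and IF sequences.
ia, ip, inst_freq = ([rng.random() for _ in range(1600)] for _ in range(3))
fp = fingerprint(ia, ip, inst_freq)
print(len(fp))  # 204
```

Stacking T such vectors row by row gives the T×204 training matrix of formula (11).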

4 Classification methods

Previous research [1, 2, 6–9] mostly used MDA as the classifier for fingerprint recognition. MDA extends Fisher’s linear discriminant from multivariate statistical analysis to the case where more than two RF devices need to be classified. It reduces the dimensionality of the input data by projecting it into a lower-dimensional space. We need to find the projection vector b=(b1,b2,...,b204). After projection, the aim is to maximize λ, the ratio of the between-group to the within-group sum of squares, defined as formula (12).

$$ \lambda {=}{\frac{{{b'}{S_{b}}b}}{{{b'}{S_{w}}b}}} $$
(12)

where Sb is the between-group scatter matrix and Sw is the within-group scatter matrix. It can be shown that the projection vector b is an eigenvector of \(S_{w}^{-1}S_{b}\), and λ is the associated eigenvalue, which reflects group separation.
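The text does not write out the two scatter matrices. Assuming the standard definitions from discriminant analysis, with \(m_{c}\) the mean fingerprint of device class c, m the overall mean, and \(T_{c}\) the number of training fingerprints of class c (these three symbols are introduced here for illustration only), they would read:

$$ S_{w} = \sum\limits_{c = 1}^{k} \sum\limits_{F_{i} \in c} (F_{i} - m_{c})(F_{i} - m_{c})', \qquad S_{b} = \sum\limits_{c = 1}^{k} T_{c}\,(m_{c} - m)(m_{c} - m)' $$

Under these definitions \(S_{b}\) has rank at most k−1, so with k=4 devices at most three eigenvalues λ in formula (12) are nonzero, and the 204-dimensional fingerprint is projected onto at most three discriminant directions.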

Based on the extracted 204-dimensional fingerprint, we apply two statistical methods, SVM and LR, as classifiers on Hilbert transform features for the first time. A limitation of previous research is that the classifiers can only identify which device an unknown sample belongs to. LR gives the probability of belonging to each class, which can be used for the RF authentication task.

4.1 Support vector machine

A traditional SVM solves only the two-class problem. The training fingerprint samples have been extracted as formula (11). The set of training samples and their labels is S={(F1,y1),(F2,y2),...,(FT,yT)}, with \(F_{i}\in \mathbb{R}^{204}\) and yi∈{+1,−1}, where T=1600 is the number of training samples for each RF device and yi is the class label. By maximizing the margin, or equivalently solving formula (13), we can find the separating hyperplane \(\omega' F+b=0\).

$$ \begin{aligned} &{\mathop {\min }\limits_{\omega,b,\xi} \frac{1}{2}\omega'\omega + C\sum\limits_{i = 1}^{T} {\xi_{i}} }\\ \text{s.t.}\hspace{0.4cm}&{y_{i}}(\omega ' \cdot {F_{i}} + b) \ge 1 - \xi_{i} \\ &{\xi_{i}} \ge 0, \hspace{1cm}i=1,2,\ldots,T \end{aligned} $$
(13)

where C is the penalty coefficient, set to C=100 by tenfold cross-validation, and ω and b are the parameters of the separating hyperplane. ξi is the slack variable of fingerprint sample Fi, which measures by how much the sample violates the margin. Finally, we substitute the unknown sample fingerprint into the hyperplane expression and classify the sample according to the sign of the resulting value.

However, SVM is designed for binary classification. In this experiment, we used the one-against-one method to extend SVM to k classes. A sub-classifier is designed between every pair of classes, giving k(k−1)/2=6 sub-classifiers (k=4). For example, consider the SVM sub-classifier for class cα and class cβ. If the unknown sample is classified into class cα, then class cα scores one point; otherwise, class cβ scores one point. After the six classifications, the unknown signal sample is assigned to the class with the highest score.
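The one-against-one voting scheme can be sketched as follows. The pairwise decision function here is a hypothetical stand-in (a one-dimensional nearest-centre rule with made-up centres), not a trained SVM; only the voting logic reflects the description above. Ties, which the text does not discuss, are resolved in favor of the class that received its first vote earliest.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(sample, classes, pairwise_decision):
    """Vote over all k(k-1)/2 class pairs and return (winner, votes).

    pairwise_decision(c_a, c_b, sample) must return a positive value when
    the sub-classifier prefers c_a and a non-positive value for c_b.
    """
    votes = Counter()
    for c_a, c_b in combinations(classes, 2):
        winner = c_a if pairwise_decision(c_a, c_b, sample) > 0 else c_b
        votes[winner] += 1
    return votes.most_common(1)[0][0], votes


# Hypothetical stand-in for the six trained SVM sub-classifiers.
centres = {"M1": 0.0, "M2": 1.0, "M3": 2.0, "M4": 3.0}


def toy_decision(c_a, c_b, x):
    """Positive when x is closer to the centre of c_a than to that of c_b."""
    return abs(x - centres[c_b]) - abs(x - centres[c_a])


label, votes = one_vs_one_predict(2.2, list(centres), toy_decision)
print(label, dict(votes))
```

In this toy run the six votes split three for M3, two for M4, and one for M2, so the sample is assigned to M3.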

4.2 Logistic regression

Traditional logistic regression also solves the two-class problem. Similarly, we extend logistic regression to k classes. Since there are four RF devices, let P(y=cα|F) (α=0,1,2,3) denote the probability that a sample belongs to class cα. We set class c0 as the reference group, and the covariate vector is F=[F(1),F(2),…,F(204)]. We then set up the unordered multinomial logistic regression model.

$$ \begin{aligned} {g_{\alpha} }(F)&=\ln \left[ {\frac{{P({y=c_{\alpha} }|F)}}{{P({y=c_{0}}|F)}}} \right]\\ &={w_{\alpha,0}} + {w_{\alpha,1}}{F^{(1)}} + \ldots + {w_{\alpha,204}}{F^{(204)}} \end{aligned} $$
(14)

where α=0,1,2,3 and, by definition, g0(F)=0. Equivalently, the conditional probability of label y is:

$$ P(y =c_ \alpha |{F}) = \frac{{{{\mathop\mathrm{e}\nolimits}^{{g_{\alpha} }(F)}}}}{{1 + {\sum\nolimits}_{j = 1}^{3} {{{\mathop\mathrm{e}\nolimits}^{{g_{j}}(F)}}} }} $$
(15)
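Formulas (14) and (15) can be sketched as a small function that maps a feature vector to the four class probabilities. The weights and the two-dimensional feature vector below are made-up illustrative values, not fitted parameters, and the function name is hypothetical. Subtracting the largest score before exponentiating is a numerical-stability step that leaves the probabilities of formula (15) unchanged.

```python
import math


def class_probabilities(F, weights):
    """Class probabilities with class 0 as the reference group.

    weights[alpha - 1] = [w_alpha0, w_alpha1, ..., w_alphad] for each
    non-reference class alpha; g_0(F) is fixed at 0 as in formula (14).
    """
    g = [0.0] + [w[0] + sum(wj * fj for wj, fj in zip(w[1:], F))
                 for w in weights]
    shift = max(g)  # stability only; cancels in the ratio
    exps = [math.exp(v - shift) for v in g]
    total = sum(exps)
    return [e / total for e in exps]


F = [1.0, 2.0]                      # toy 2-dimensional fingerprint
weights = [[0.0, 1.0, 0.0],         # g_1(F) = 1
           [0.0, 0.0, 1.0],         # g_2(F) = 2
           [1.0, -1.0, 0.0]]        # g_3(F) = 0
probs = class_probabilities(F, weights)
print([round(p, 3) for p in probs])
print("identified class index:", probs.index(max(probs)))
```

Picking the index of the largest probability corresponds to the identification rule, while comparing a single probability with a threshold corresponds to authentication.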

In the identification task, we infer that the unknown fingerprint sample belongs to the class cα with the largest probability, that is

$$ \begin{array}{*{20}{c}} {P({y=c_{\alpha} }|{F}) > P({y=c_{\beta} }|{F})}\hspace{0.7cm}{\forall \beta \ne \alpha };\hspace{0.3cm}\alpha, \beta=0,1,2,3 \end{array} $$
(16)

In the authentication task, a signal claims to be emitted from a security RF device. We authenticate this signal by setting a probability verification threshold P0.

$$ P({y=c_{\alpha} }|{F}) \ge {P_{0}}\hspace{0.7cm}\alpha =0,1,2,3 $$
(17)

where cα is the RF device class to which the unknown sample claims to belong. The decision in the authentication task is binary. If the probability P(y=cα|F) meets the threshold P0 as in formula (17), we accept the fingerprint and deem it a security signal. Otherwise, we reject it and deem it an imposter. For example, if the probability that an unknown sample belongs to a security device is the largest among all classes but is still less than the threshold, we do not regard it as a security signal.

Two quantities are used to evaluate the choice of threshold: the true positive (TP) rate and the true negative (TN) rate. TP denotes the probability that a security signal is accepted, and TN denotes the probability that an imposter signal is rejected. The larger the two values, the better the authentication performance.
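The threshold rule of formula (17) and the two rates can be sketched as follows. The probability lists are made-up illustrative numbers, not experimental data, and the function name is hypothetical.

```python
def authentication_rates(genuine_probs, imposter_probs, p0):
    """Return (TP, TN) for a verification threshold p0.

    genuine_probs: P(y = claimed class | F) for signals from the security device.
    imposter_probs: the same probability for signals from cloned devices.
    """
    tp = sum(p >= p0 for p in genuine_probs) / len(genuine_probs)
    tn = sum(p < p0 for p in imposter_probs) / len(imposter_probs)
    return tp, tn


# Made-up probabilities for illustration only.
genuine = [0.9, 0.7, 0.45, 0.3]
imposter = [0.1, 0.35, 0.5, 0.05]
for p0 in (0.2, 0.4, 0.8):
    print(p0, authentication_rates(genuine, imposter, p0))
```

Raising the threshold lowers TP and raises TN, which is the trade-off that a threshold sweep is meant to expose.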

5 Results and discussion

Table 1 shows the classification confusion matrices of the three classifiers. At SNR=30 dB, the test accuracy of the MDA and SVM algorithms exceeds 94% on average. Therefore, the time domain feature extraction method can be considered effective. When SNR=0 dB, the noise power equals the signal power, which is a very noisy environment. In this simulated environment, the classification accuracy is significantly reduced and any two of the four RF devices may be confused. Environmental noise thus has a great influence on discrimination.

Table 1 Classification accuracy of different classifiers at SNR=30 dB and SNR=0 dB

Figure 7, created with MATLAB R2016a, shows the trend of classification accuracy at different SNRs. As the SNR increases, the classification accuracy increases. When the SNR is below 5 dB, the classification accuracy is less than 80% and declines significantly. Both MDA and SVM show better classification performance than LR.

Fig. 7

Classification accuracy under different classifiers. Trend of classification accuracy under MDA, SVM, and LR as the SNR changes

Then, we list the classification accuracy of each of the four RF devices under the MDA classifier in Fig. 8. M2 and M4 maintain a relatively high accuracy. The differences between RF devices are uncertain and cannot be observed directly; we can only infer indirectly that M2 and M4 have more distinctive fingerprint features and hence are easier to classify. Similarly, the difference between M1 and M3 is small; thus, they are more easily confused and have relatively lower classification accuracy.

Fig. 8

Classification accuracy of each RF device individually under MDA. The per-device classification accuracy indicates how similar the four RF devices are

Because LR outputs class probabilities, we can simulate authentication, as shown in Table 2. In our experiment, three RF devices were designated as cloned devices that send malicious attack signals, and one was the security RF device. We set the threshold P0 from 0.2 to 0.8 and calculated TP and TN in different SNR environments. Take the values 13.0 and 99.9 in the upper left corner as an example. With P0=0.8 and SNR=30 dB, because of the high threshold, only 13.0% of security signals are accepted, but 99.9% of imposter signals are rejected. As the threshold decreases, more security signals are accepted and fewer imposter signals are rejected. Besides, as the SNR increases, both TP and TN increase, so external noise has a great effect on RF authentication. For the three possible cloned RF devices and one security RF device in our experiment, the best probability threshold lies between 0.4 and 0.5, where the sum of TP and TN is relatively high.

Table 2 Authentication mission threshold decision at different SNR

In addition, we extracted the mean as an extra feature of the RF-DNA fingerprint. The performance is shown in Fig. 9, where +Mean refers to the fingerprint with the mean feature and −Mean refers to the fingerprint without it. In the high-SNR environment, because the classification accuracy is already high, the mean feature plays only a small role. In the low-SNR environment, however, the mean feature yields a significant improvement in classification accuracy. Therefore, it is worthwhile to include the mean feature in the RF-DNA fingerprint.

Fig. 9

Effect of the mean feature under MDA. Adding the mean as an extra RF-DNA fingerprint feature improved the recognition accuracy. +Mean, with mean feature; −Mean, without mean feature

Furthermore, previous studies considered only the full feature set of frequency, phase, and amplitude and did not study which features carry more discriminant information. We propose extracting the frequency, phase, or amplitude sub-feature alone, which reduces the number of feature dimensions from 204 to 204/3=68. Figure 10 shows the classification accuracy of the single features under the MDA classifier. The red line “All” means that all three features, 204 dimensions in total, are used.

Fig. 10

Classification accuracy for single sub-features under MDA. The frequency, phase, or amplitude sub-feature is extracted alone to find which one gives the highest classification accuracy. The red line “All” means that all three features, 204 dimensions in total, are used

We found that the frequency sub-feature has the highest classification accuracy. In other words, frequency information has the largest effect on classification in our experiment, while amplitude and phase information play an auxiliary role. However, there is still a large gap in classification accuracy between the red line “All” and the blue line “Frequency”, and when SNR ≤5 dB the frequency feature also loses its discriminating ability. That is, amplitude and phase information do contribute substantially to classification and are crucial for improving accuracy. In summary, the successful classification of RF-DNA fingerprints results from the joint action of all three features.

6 Conclusions

Recently, the use of cloned equipment to obtain illegal access has seriously affected the security of information transmission. RF-DNA fingerprinting is an emerging concept for marking every RF device and can thus be used to identify cloned RF devices used in malicious attacks. In our experiment, signals in the 2.4 GHz band from four RF devices were collected. The results show that the best classification accuracy reaches 94%. We achieved satisfactory results because the RF-DNA fingerprint of each RF device is unique, just like DNA in living beings, so the differences among similar RF devices can be discovered. This paper demonstrates that the extracted fingerprint can successfully distinguish RF devices. We also analyzed performance under less favorable conditions: as the SNR decreases, the classification accuracy also decreases. This analysis makes our experimental results more general and persuasive for real applications. Although the accuracy of LR is not as good as that of SVM or MDA, LR can perform the authentication task and supports a reasonable threshold setting. Besides, adding the mean feature and separating the sub-features are also contributions of this paper, which may have special applications in practice.

In this paper, we used the NRF24LE1 chip as the RF device in a closed basement scenario. Other scenarios could be implemented through different simulators, and our experimental results could be extended to many scenarios, such as mobile phone signals [9], remote sensing signals, and military radar signals. Even human signals such as the electrocardiogram, electroencephalogram, or electromyogram could use RF-DNA fingerprint technology to assess human health.

A limitation of this paper is that relatively few features are extracted, only 204 dimensions; extracting more features may improve identification accuracy significantly. Additionally, only four RF devices are classified in our experiment, and more advanced methods would be needed to deal with a large number of cloned RF devices. Future work could also focus on the fingerprint extraction method and the choice of classifier. For example, recent research used the short-time Fourier transform [22] and the discrete Gabor transform [10] to generate RF-DNA fingerprints, and some neural network models [17, 20] could also be used as classifiers. Combining appropriate statistical algorithms with meaningful RF-DNA fingerprint features can improve recognition accuracy markedly. Furthermore, our research extracts only one kind of fingerprint, and combining multiple fingerprints could be a promising area of future research.