Readback Error Classification of Radiotelephony Communication Based on Convolutional Neural Network

  • Fangyuan Cheng
  • Guimin Jia
  • Jinfeng Yang
  • Dan Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10996)


Readback errors in radiotelephony communication pose a serious potential risk to air transportation safety. It is therefore essential to establish a suitable model that identifies and classifies readback errors automatically, so as to improve flight safety. In this paper, a new scheme for readback error classification is proposed. It is based on a one-layer convolutional neural network (CNN) and has two channels that process the instructions and the readbacks (I-R pairs) respectively. The semantics of the I-R pairs are learned by the one-layer CNN encoder, and the classification decision is then made from a matching vector of the I-R pairs. A new input strategy is also tested. Extensive experiments show that the proposed scheme is effective for automatic readback error classification, and its average classification accuracy on a Chinese civil radiotelephony communication dataset reaches 95.44%.


Keywords: Radiotelephony communication · One-layer convolutional neural network · Semantic vector · Readback error classification

1 Introduction

According to reports from the Aviation Safety Reporting System (ASRS) and the Federal Aviation Administration (FAA), radiotelephony communication errors are a main factor in aviation incidents, and readback errors account for about half of all radiotelephony communication errors [1, 2]. Readback errors are incorrect readbacks that pilots make when communicating with air traffic controllers during flight. Common readback errors can usually be divided into five types: heading information error, runway information error, call sign information error, altitude information error and partial information loss [3, 4, 5, 6]. To reduce the incidents caused by readback errors, the error tendencies of readbacks must be analyzed so that communication between air traffic controllers and pilots can be improved. The conventional approach is to analyze communication voice tapes manually [4, 5, 6], which requires considerable effort in listening to the tapes. It is therefore worthwhile to establish a model for automatic readback error classification. Unfortunately, few suitable models for this problem exist in the aviation field so far.

The task of readback error classification is similar to sentence matching in natural language processing (NLP), which has long been studied. Sentence matching aims to identify the relationship between two sentences by matching their semantics. Recently, deep neural networks have been widely used, and CNNs have achieved remarkable performance in computer vision and sentence matching [7, 8, 9, 10, 11, 12, 13]. Some CNNs, such as ARC-I [8] and CNTN [9], first represent each sentence of a pair as a sentence-level semantic vector and then identify the relationship from these vectors. Others, such as Bi-CNN-MI [10] and MultiGranCNN [11], take multiple granularities into account; however, these two models have complicated architectures with many parameters to tune during training. These works commonly model sentences with square convolution kernels. In contrast, recent work has shown that a one-layer CNN whose kernel width equals the dimensionality of the word vectors is more effective for semantic modeling [12, 13].

Based on the analysis above, a novel scheme built on a one-layer CNN is proposed to classify the readback errors of I-R pairs. It mainly consists of semantic modeling and semantic matching. For semantic modeling, the instruction and the readback of each pair are processed separately by a two-channel model in which both channels use the same one-layer CNN encoder, producing one semantic vector per sentence. In addition, a doubling strategy is applied to the input of the one-layer CNN to improve the representations of the I-R pairs. For semantic matching, a matching vector containing the semantic vectors of the I-R pair and their similarity is generated, and the classification decision is then made using the softmax function.

2 The One-Layer Convolutional Neural Network

The architecture of the one-layer CNN is described in this section. As illustrated in Fig. 1, it is composed of an input layer, one convolutional layer and one pooling layer.
Fig. 1. The architecture of one-layer CNN

The input of the one-layer CNN is a sentence matrix. The sentence matrix \( {\mathbf{S}} \in {\mathbb{R}}^{n \times m} \) is formed by concatenating word vectors of all words in the sentence, where n is the (zero-padded) sentence length and m is the dimension of the word vector. Then, the sentence matrix is fed into the convolutional layer.
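A minimal sketch of how S could be assembled in plain Python, assuming the word vectors are stored in a dictionary and that short sentences are zero-padded to n rows. Truncating sentences longer than n is our own assumption; the paper only mentions zero-padding. The function name and the toy vectors are hypothetical.

```python
from typing import Dict, List


def sentence_matrix(tokens: List[str], word_vectors: Dict[str, List[float]], n: int) -> List[List[float]]:
    """Stack the m-dimensional word vectors of a sentence row by row into an n x m matrix S.

    Sentences shorter than n are zero-padded; longer ones are truncated (an assumption).
    """
    m = len(next(iter(word_vectors.values())))
    rows = [list(word_vectors[t]) for t in tokens[:n]]
    rows.extend([0.0] * m for _ in range(n - len(rows)))  # zero rows for padding
    return rows


# Hypothetical 3-dimensional word vectors, for illustration only.
vecs = {"climb": [1.0, 0.0, 0.0], "to": [0.0, 1.0, 0.0], "8400": [0.0, 0.0, 1.0]}
S = sentence_matrix(["climb", "to", "8400"], vecs, n=5)  # 5 x 3 matrix
```

Each row of S is one word, so the word vector plays the role of a single "pixel" for the rectangular kernel described next.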

In the convolutional layer, a rectangular kernel of size h × m is applied to the sentence matrix S. The kernel treats each word vector as a single pixel, so it always spans whole word vectors and covers h consecutive words at each position. For example, a feature \( c_{i} \) is generated from the window of words \( {\mathbf{S}}_{i:i + h - 1} \) by Eq. (1)
$$ c_{i} = \sigma ({\mathbf{W}} \cdot {\mathbf{S}}_{i:i + h - 1} + b)\quad i = 1,2, \ldots ,n - h + 1, $$
where h denotes the ‘height’ of the kernel, \( {\mathbf{W}} \in {\mathbb{R}}^{h \times m} \) is the weight matrix of the convolutional layer, \( {\mathbf{W}} \cdot {\mathbf{S}}_{i:i + h - 1} \) is the sum of the element-wise products of the kernel and the window, \( \sigma \) is an activation function such as ReLU, and \( b \in {\mathbb{R}} \) is the bias. The kernel is applied to every possible window of words in S, producing the feature map of Eq. (2)
$$ {\mathbf{c}} = \left[ {c_{1} ,c_{2} , \ldots ,c_{n - h + 1} } \right], $$
where \( {\mathbf{c}} \in {\mathbb{R}}^{n - h + 1} \). To enrich the extracted features, multiple kernels can be used to learn complementary features over the same or different region sizes. The convolutional layer thus produces a set of feature maps, each of dimension (n − h + 1) × 1 for a kernel of height h.
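Eqs. (1) and (2) can be sketched in plain Python as below. This is a minimal illustration, assuming stride 1, no padding inside the convolution and ReLU as σ; `conv_feature_map` and `conv_layer` are hypothetical names, not the authors' code. Because the kernel is as wide as a word vector, it only slides along the sentence and yields n − h + 1 features per kernel.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv_feature_map(S: Matrix, W: Matrix, b: float) -> List[float]:
    """Eqs. (1)-(2): slide an h x m kernel W down the n x m sentence matrix S.

    Each feature is ReLU(sum of element-wise products of W and the window, plus b);
    the resulting feature map has n - h + 1 entries.
    """
    n, h = len(S), len(W)
    feature_map = []
    for i in range(n - h + 1):
        window = S[i:i + h]  # words i .. i + h - 1 (0-based)
        z = sum(w * s for w_row, s_row in zip(W, window) for w, s in zip(w_row, s_row)) + b
        feature_map.append(max(0.0, z))  # ReLU activation
    return feature_map


def conv_layer(S: Matrix, kernels: List[Tuple[Matrix, float]]) -> List[List[float]]:
    """Several kernels of the same height give several complementary feature maps."""
    return [conv_feature_map(S, W, b) for W, b in kernels]


# Toy example: n = 5 words, m = 3 dimensions, kernel height h = 2.
S = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
maps = conv_layer(S, [(W, -1.5), (W, 0.0)])  # two feature maps of length 5 - 2 + 1 = 4
```

Looping in Python keeps the indexing of Eq. (1) visible; a practical implementation would use a framework convolution with kernel size (h, m) instead.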

The feature maps are then fed into the pooling layer, which reduces the dimensionality and helps prevent overfitting. Many pooling strategies can be used, such as max, k-max and average pooling. Finally, the pooled features are concatenated into a semantic vector, which is the output of the model.

3 The Proposed Method for Readback Error Classification

The architecture of the readback error classification model based on the one-layer CNN is shown in Fig. 2. It consists of two modules: a semantic modeling module and a semantic matching module.
Fig. 2. The architecture of readback error classification model based on one-layer CNN

3.1 Semantic Modeling

In the semantic modeling module, we propose an augmented one-hot encoding to represent the words in each sentence, and a doubling strategy is applied to obtain the sentence matrix S.

In the readback verification task for radiotelephony communication, one-hot word vectors have been reported to outperform word2vec [14, 15]. However, these works do not take into account the abbreviated forms of some keywords, examples of which are listed in Table 1. To increase the correlation between standard keywords and their abbreviations, flags are appended to the one-hot vector, as listed in Table 1. In this way, the correlation between a standard keyword and its abbreviation is increased, which reduces misclassification.
Table 1. The examples of abbreviation words
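The paper does not spell out the flag layout, so the sketch below is only one plausible reading, stated as an assumption: each standard keyword and its abbreviation share one extra flag bit appended after the one-hot part. The English vocabulary ("runway"/"rwy", "heading"/"hdg") is a hypothetical stand-in for the Chinese entries of Table 1. It shows why the flags raise the correlation: with plain one-hot vectors the cosine similarity of any two different words is 0, whereas here a keyword and its abbreviation reach 0.5.

```python
import math
from typing import Dict, List


def augmented_one_hot(word: str, vocab: List[str], flag_of: Dict[str, int], n_flags: int) -> List[float]:
    """One-hot vector over vocab, followed by n_flags flag bits.

    Hypothetical scheme: a standard keyword and its abbreviation set the same flag bit.
    """
    vec = [0.0] * (len(vocab) + n_flags)
    vec[vocab.index(word)] = 1.0
    if word in flag_of:
        vec[len(vocab) + flag_of[word]] = 1.0
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical English stand-ins for keyword/abbreviation pairs.
VOCAB = ["runway", "rwy", "heading", "hdg", "climb"]
FLAG_OF = {"runway": 0, "rwy": 0, "heading": 1, "hdg": 1}
N_FLAGS = 2

full = augmented_one_hot("runway", VOCAB, FLAG_OF, N_FLAGS)
abbr = augmented_one_hot("rwy", VOCAB, FLAG_OF, N_FLAGS)
print(round(cosine(full, abbr), 3))  # 0.5
```

Words without an abbreviation (such as "climb" here) keep a plain one-hot pattern and remain orthogonal to every other word.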

To build the sentence matrices, three input strategies are compared in order to find the best semantic representation of the sentences: Original Sentences, Doubling Instruction and Doubling Readback. The doubling strategy has been used to strengthen the semantics of the words in [16, 17]. An example of Original Sentences is listed in Table 2. In the doubling strategy, a sentence is written twice in S when its length is shorter than the max-length of the pairs. This emphasizes the information in short sentences so that the model can extract better semantic representations. For Doubling Readback, the readback is written twice in its sentence matrix while the instruction matrix is unchanged, as listed in Table 2; Doubling Instruction is defined analogously.
Table 2. The example of I-R pairs
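The three input strategies can be sketched at the token level as below. This is a hedged illustration, not the authors' code: we assume "max-length of the pairs" means the longer of the two sentences in a pair, that the padding length n is large enough to hold a doubled sentence (truncation otherwise is our assumption), and `PAD`, `build_inputs` and the strategy strings are hypothetical names. The tokens would then be mapped to word vectors to form S.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token; becomes a zero row in S


def pad_to(tokens: List[str], n: int) -> List[str]:
    """Pad (or truncate, an assumption) a token list to exactly n positions."""
    return (tokens + [PAD] * n)[:n]


def double_if_short(tokens: List[str], max_len: int) -> List[str]:
    """Write the sentence twice when it is shorter than max_len."""
    return tokens + tokens if len(tokens) < max_len else list(tokens)


def build_inputs(instruction: List[str], readback: List[str], strategy: str, n: int) -> Tuple[List[str], List[str]]:
    max_len = max(len(instruction), len(readback))  # assumption: max length within the pair
    if strategy == "doubling_instruction":
        instruction = double_if_short(instruction, max_len)
    elif strategy == "doubling_readback":
        readback = double_if_short(readback, max_len)
    elif strategy != "original":
        raise ValueError(f"unknown strategy: {strategy}")
    return pad_to(instruction, n), pad_to(readback, n)


# Hypothetical I-R pair (English stand-in for a Chinese radiotelephony exchange).
instr = "CCA1234 climb to 8400 meters".split()
rb = "climb 8400 CCA1234".split()
i_tokens, r_tokens = build_inputs(instr, rb, "doubling_readback", n=8)
```

In this example only the shorter readback is doubled; under "doubling_instruction" the instruction stays unchanged because it is already the longer sentence.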

In the convolutional layer, multiple kernels with an identical region size are used to learn complementary features. The output of the convolutional layer is fed into a k-max pooling layer, which keeps the k most important (largest) features of each feature map. These features are then concatenated into a semantic vector. The semantic vectors of the instruction and the readback are denoted \( {\mathbf{x}}_{I} \) and \( {\mathbf{x}}_{R} \), respectively.
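A minimal sketch of the pooling-and-concatenation step, assuming the usual order-preserving definition of k-max pooling (the k largest values are kept in their original positional order). The feature maps are taken as given, so the same function would serve both channels when they share one encoder; `k_max_pool` and `semantic_vector` are hypothetical names. The semantic vector has (number of feature maps) × k entries.

```python
from typing import List


def k_max_pool(c: List[float], k: int) -> List[float]:
    """Keep the k largest values of one feature map, preserving their original order."""
    top = sorted(range(len(c)), key=lambda i: c[i], reverse=True)[:k]
    return [c[i] for i in sorted(top)]


def semantic_vector(feature_maps: List[List[float]], k: int) -> List[float]:
    """Apply k-max pooling to every feature map and concatenate the results."""
    vec: List[float] = []
    for c in feature_maps:
        vec.extend(k_max_pool(c, k))
    return vec


# Two toy feature maps from two kernels of the same height.
maps = [[0.5, 0.1, 0.9, 0.3, 0.7], [0.2, 0.8, 0.0, 0.4, 0.6]]
x = semantic_vector(maps, k=3)  # 2 maps x 3 values = 6 entries
```

Max pooling is the special case k = 1; keeping more values retains more positional information but enlarges the fully connected layer, which is the tradeoff discussed in Sect. 4.3.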

3.2 Semantic Matching

The matching vector, which represents the relationship within an I-R pair, is generated by aggregating \( {\mathbf{x}}_{I} \), \( {\mathbf{x}}_{R} \) and the cosine similarity between them. The aggregation is defined in Eq. (3)
$$ {\mathbf{x}}_{input} = \left[ {{\mathbf{x}}_{I} \oplus sim \oplus {\mathbf{x}}_{R} } \right], $$
where \( \oplus \) is the concatenation operator and sim is the cosine similarity between the semantic vectors, defined in Eq. (4)
$$ sim({\mathbf{x}}_{I} ,{\mathbf{x}}_{R} ) = \frac{{{{\mathbf{x}}_{I}}^{T} \cdot {\mathbf{x}}_{R} }}{{\left\| {{\mathbf{x}}_{I} } \right\| \cdot \left\| {{\mathbf{x}}_{R} } \right\|}}, $$
where \( \left\| \cdot \right\| \) denotes the \( l_{2} \) norm. The matching vector is then fed into a fully connected layer, and the softmax function outputs a vector giving the probability that the input I-R pair belongs to each matching category, as expressed in Eq. (5)
$$ p(y = j|{\mathbf{x}} ) = \frac{{e^{{{{\mathbf{x}}}^{T} \theta_{j} }} }}{{\sum\nolimits_{m = 1}^{M} {e^{{{{\mathbf{x}}}^{T} \theta_{m} }} } }}\quad \;j = 1,2, \ldots ,6, $$
where \( \theta_{m} \) is the weight vector of the m-th label, M = 6 is the number of categories (correct readback plus five error types), and \( p(y = j|{\mathbf{x}}) \) is the probability that the input \( {\mathbf{x}} \) belongs to label j. Here \( {\mathbf{x}} \) can be regarded as the final representation of the I-R pair. Finally, the category of the input I-R pair is given by the index of the maximum value in the output vector.
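Eqs. (3) to (5) and the final arg-max decision can be sketched as below. This is a minimal illustration under stated assumptions: the softmax weights `theta` are given (no training shown), a zero-norm vector is assigned similarity 0 (our choice; the paper does not cover that case), and the label order simply follows Table 4. The maximum score is subtracted before exponentiation for numerical stability, which leaves the probabilities unchanged. All names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical label order, following the rows of Table 4.
LABELS = ["correct", "heading", "runway", "call sign", "altitude", "partial loss"]


def cosine(u: List[float], v: List[float]) -> float:
    """Eq. (4); returns 0.0 for a zero vector (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def matching_vector(x_i: List[float], x_r: List[float]) -> List[float]:
    """Eq. (3): concatenate x_I, sim(x_I, x_R) and x_R."""
    return x_i + [cosine(x_i, x_r)] + x_r


def softmax_probs(x: List[float], theta: List[List[float]]) -> List[float]:
    """Eq. (5): p(y = j | x) = exp(x^T theta_j) / sum_m exp(x^T theta_m)."""
    scores = [sum(a * b for a, b in zip(x, t)) for t in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x_i: List[float], x_r: List[float], theta: List[List[float]]) -> Tuple[str, List[float]]:
    p = softmax_probs(matching_vector(x_i, x_r), theta)
    j = max(range(len(p)), key=p.__getitem__)  # index of the maximum probability
    return LABELS[j], p


# Toy example: identical semantic vectors, and weights that reward high similarity for "correct".
theta = [[0.0, 0.0, 5.0, 0.0, 0.0]] + [[0.0] * 5 for _ in range(5)]
label, p = classify([1.0, 0.0], [1.0, 0.0], theta)
```

Keeping sim as an explicit entry of the matching vector gives the classifier a direct scalar signal of agreement alongside the two full semantic vectors.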

4 Experiments

4.1 Dataset

Because no Chinese civil radiotelephony communication (CCRC) dataset was available, we built the CCRC dataset from recordings of radiotelephony communication between air traffic controllers and pilots, also consulting training books for radiotelephony communication. The dataset contains six types of readback pairs: correct readback, heading information error, runway information error, call sign information error, altitude information error and partial information loss. There are 1300 correct I-R pairs and 500 I-R pairs for each of the five error types, i.e., 2500 incorrect I-R pairs whose instruction and readback are semantically inconsistent.

4.2 Parameter Settings and Test Protocol

Here, the dimension of the augmented one-hot vector is 1005. The other parameters of the proposed model are as follows: in the convolutional layer, the height h of the rectangular kernel is set to 14 and the number of feature maps is 50; in the k-max pooling layer, k is set to 7. The model is trained with mini-batch gradient descent with a batch size of 100, a learning rate of 0.1 and a dropout rate of 0.5.
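The settings above fix several sizes that the text leaves implicit. The sketch below collects them in a small configuration object and derives those sizes, assuming that the two channels share one encoder, that each kernel has one scalar bias as in Eq. (1), and that the softmax layer has no bias as in Eq. (5). The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadbackCNNConfig:
    embed_dim: int = 1005       # m: size of the augmented one-hot vector
    kernel_height: int = 14     # h
    num_maps: int = 50          # number of kernels / feature maps
    k: int = 7                  # k in k-max pooling
    num_classes: int = 6        # correct readback + five error types
    batch_size: int = 100
    learning_rate: float = 0.1
    dropout: float = 0.5

    def semantic_dim(self) -> int:
        """Length of x_I or x_R: k pooled values per feature map."""
        return self.num_maps * self.k

    def matching_dim(self) -> int:
        """Eq. (3): x_I, one similarity value and x_R."""
        return 2 * self.semantic_dim() + 1

    def conv_params(self) -> int:
        """h x m weights plus one bias per kernel (shared by both channels, an assumption)."""
        return self.num_maps * (self.kernel_height * self.embed_dim + 1)

    def softmax_params(self) -> int:
        """Eq. (5) weights theta_1..theta_M, with no bias term."""
        return self.matching_dim() * self.num_classes

    def min_padded_length(self) -> int:
        """Smallest n with n - h + 1 >= k, so k-max pooling has k values to choose from."""
        return self.kernel_height + self.k - 1


cfg = ReadbackCNNConfig()
```

With these values each semantic vector has 350 entries, the matching vector has 701, and the padded sentence length n must be at least 20.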

The training set contains 3100 I-R pairs: 1100 correct I-R pairs and 400 I-R pairs for each error type. The test set consists of the remaining 700 I-R pairs. To verify the accuracy and stability of the proposed model, the experiment is repeated thirty times with a random sampling protocol. The evaluation metrics are the average test accuracy (Ave.) over the thirty tests, the mean square error (MSE) and the F-value (F):
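The per-class counts above imply a stratified split: 1100 of 1300 correct pairs and 400 of 500 pairs per error type for training, leaving 200 + 5 × 100 = 700 for testing. A minimal sketch of thirty such random splits follows, assuming uniform random sampling within each class with one seed per run; the placeholder ids stand in for real I-R pairs, and all names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (pair id, label)


def stratified_split(pairs_by_class: Dict[str, List[str]], train_per_class: Dict[str, int],
                     seed: int) -> Tuple[List[Sample], List[Sample]]:
    """Randomly draw a fixed number of training pairs from each class; the rest form the test set."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label, pairs in pairs_by_class.items():
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        cut = train_per_class[label]
        train.extend((p, label) for p in shuffled[:cut])
        test.extend((p, label) for p in shuffled[cut:])
    return train, test


LABELS = ["correct"] + [f"error_type{t}" for t in range(1, 6)]
SIZES = {lbl: (1300 if lbl == "correct" else 500) for lbl in LABELS}
TRAIN = {lbl: (1100 if lbl == "correct" else 400) for lbl in LABELS}
# Placeholder ids stand in for the actual I-R pairs.
data = {lbl: [f"{lbl}-{i}" for i in range(SIZES[lbl])] for lbl in LABELS}

splits = [stratified_split(data, TRAIN, seed) for seed in range(30)]  # thirty random runs
```

Fixing the per-class counts keeps the class balance identical across the thirty runs, so differences in accuracy reflect the sampled pairs rather than shifts in class proportions.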
$$ {\text{test}}\;{\text{accuracy}} = \frac{{{\text{Correct}}\;{\text{classification}}\;{\text{numbers}}\;{\text{of}}\;{\text{test}}\;{\text{samples}}}}{{{\text{Total}}\;{\text{numbers}}\;{\text{of}}\;{\text{test}}\;{\text{samples}}}}, $$
$$ F = \frac{2RP}{R + P}, $$

where P and R are the precision and recall of a given matching category. MSE indicates the stability of the model, and F is the harmonic mean of the precision and recall.

4.3 Analysis of Parameters h and k

The original sentences are used as the input to analyze the influence of the parameters h and k. First, h is varied from 6 to 18 with k fixed to 1 in the k-max pooling layer. Fig. 3(a) shows that performance improves as h increases, and we set h to 14. Fig. 3(a) also shows that the augmented one-hot encoding yields higher test accuracy than the plain one-hot encoding. Then, k is varied from 1 to 14. As illustrated in Fig. 3(b), performance improves with larger k, but the improvement is limited for k above 7. The more features k-max pooling retains, the more nodes the final fully connected layer needs, so the model takes longer to train and test. In this paper, k is set to 7 as a tradeoff between computation time and test accuracy.
Fig. 3. Analysis of the parameters h and k

4.4 Classification Performance

First, to evaluate both the augmented one-hot word representation and the proposed model, ARC-I [8] is used as a baseline. The experimental results are listed in Table 3; here, the input strategy is fixed to Original Sentences.
Table 3. Experimental results on CCRC

Method | Ave. (%) | MSE (%)
ARC-I (one-hot) | |
ARC-I (augmented one-hot) | |
Ours (one-hot) | |
Ours (augmented one-hot) | |

Table 3 shows that the proposed method outperforms the baseline, ARC-I, which we attribute mainly to the rectangular kernel and k-max pooling. Furthermore, models using the augmented one-hot encoding achieve higher test accuracy and lower MSE, because the augmented one-hot word vector better represents the relations between standard keywords and their abbreviations. The augmented one-hot vector is therefore used in the following experiments.

Next, the three input strategies are compared. The test accuracy of the thirty random validations is shown in Fig. 4, and the F-value and MSE of each readback type are listed in Table 4. "Correct" denotes the correct readback type, and Error type1 to Error type5 denote the five error types: heading information error, runway information error, call sign information error, altitude information error and partial information loss.
Fig. 4. Test accuracy of the proposed model with three input strategies

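Table 4. F-value and MSE of each kind of errors

Type | Original sentences F (%) | Original sentences MSE (%) | Doubling instruction F (%) | Doubling instruction MSE (%) | Doubling readback F (%) | Doubling readback MSE (%)
Correct | | | | | |
Error type1 | | | | | |
Error type2 | | | | | |
Error type3 | | | | | |
Error type4 | | | | | |
Error type5 | | | | | |
All samples | | | | | |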

Fig. 4 and Table 4 show that the doubling strategies improve the test accuracy and the F-value of each error type. The reason is that doubling emphasizes the information of short sentences and yields better representations, so the relationship within an I-R pair is expressed better. Doubling Readback also outperforms Doubling Instruction. We believe this is because, when a readback error is subtle, doubling the readback lets the model emphasize the readback's semantics and its difference from the instruction; that is, it provides more class-discriminative information.

5 Conclusions and Future Work

In this paper, a new scheme has been proposed for classifying readback errors in radiotelephony communication. A series of experiments on the CCRC dataset was conducted to evaluate it. The results show that the proposed method is effective for automatic readback error classification and provides a new solution to this problem. Several improvements remain for future work: kernels with different region sizes could be used to extract the semantics of the I-R pairs, and the way the matching vector is generated is also key to improving the model's performance.



This work is supported by National Natural Science Foundation of China (No. U1433120, No. 61502498, No. 61379102) and the Fundamental Research Funds for the Central Universities (No. 3122017001).


  1. National Transportation Safety Board: Review of U.S. Civil Aviation Accidents, Calendar Year 2010. Annual Review NTSB/ARA-12/01, Washington, DC (2012)
  2. Billings, C.E., Cheaney, E.D.: Information transfer problems in the aviation system. Technical report 1875, National Aeronautics and Space Administration (1981)
  3. Cardosi, K., Falzarano, P., Han, S.: Pilot-controller communication errors: an analysis of Aviation Safety Reporting System (ASRS) reports. Aviat. Saf. 119, S518–S519 (1998)
  4. US Federal Aviation Administration: Altitude deviation study: a Descriptive Analysis of Pilot and Controller incidents. Final report, October 1992
  5. Morrow, D., Lee, A., Rodvold, M.: Analysis of problems in routine controller-pilot communication. Int. J. Aviat. Psychol. 3(4), 285–302 (1993)
  6. Pope, J.A.: Research identifies common errors behind altitude deviations. Flight Saf. Digest 12, 1–13 (1993)
  7. Luan, S.Z., Chen, C., Zhang, B.C., Han, J.G., Liu, J.Z.: Gabor convolutional networks. IEEE Trans. Image Process. 27, 4357–4366 (2018)
  8. Hu, B.T., Lu, Z.D., Li, H., Chen, Q.C.: Convolutional neural network architectures for matching natural language sentences. In: International Conference on Neural Information Processing Systems, pp. 2042–2050 (2015)
  9. Qiu, X., Huang, X.: Convolutional neural tensor network architecture for community-based question answering. In: AAAI Conference on Artificial Intelligence, pp. 1305–1311 (2015)
  10. Yin, W., Schütze, H.: Convolutional neural network for paraphrase identification. In: Conference of the North American Chapter of the Association for Computational Linguistics, pp. 901–911 (2015)
  11. Yin, W., Schütze, H.: MultiGranCNN: an architecture for general matching of text chunks on multiple levels of granularity. In: International Joint Conference on Natural Language Processing, pp. 63–73 (2015)
  12. Kim, Y.: Convolutional neural networks for sentence classification. In: Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, pp. 1746–1751 (2014)
  13. Zhang, Y., Roller, S., Wallace, B.: MGNC-CNN: a simple approach to exploiting multiple word embeddings for sentence classification. arXiv preprint arXiv:1603.00968 (2016)
  14. Lu, Y.J.: Semantic Representation and Verification of Aviation Radiotelephony Communication Based on RNN. Civil Aviation University of China (2017)
  15. Jia, G.M., Lu, Y.J., Lu, W.B., et al.: Verification method for Chinese aviation radiotelephony readbacks based on LSTM-RNN. Electron. Lett. 53(6), 401–403 (2017)
  16. Zaremba, W., Sutskever, I.: Learning to execute. arXiv preprint arXiv:1410.4615 (2014)
  17. Liu, Y., Sun, C.J., Lin, L., Wang, X.L.: Learning natural language inference using bidirectional LSTM model and inner-attention. arXiv preprint arXiv:1605.09090 (2016)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Fangyuan Cheng (1)
  • Guimin Jia (1)
  • Jinfeng Yang (1)
  • Dan Li (1)
  1. Tianjin Key Lab for Advanced Signal Processing, Civil Aviation University of China, Tianjin, China
