Abstract
Using a BP neural network model for data classification can replace traditional manual classification, which suffers from low efficiency and subjective interference. Based on the principle of BP, this paper determines the relevant parameters of the network structure and establishes an optimized BP model. The BP model is used to analyze the chemical composition data of tobacco leaves in order to determine their grade. Experiments show that this model achieves better recognition accuracy than KNN and random forest models. It effectively improves the efficiency of classification and reduces the interference of subjective factors in classification.
1 Introduction
Tobacco leaf is an important raw material of the tobacco industry. Its grade purity directly affects the quality and taste of the cigarettes produced, so the classification of tobacco leaf grades is of great significance [1]. In the traditional grading process, professionals evaluate the tobacco grade comprehensively, identifying it through vision, touch, smell and other senses. Manual grading is therefore highly subjective and depends closely on the experience of the graders. Different experts may assign the same leaves to different grades; the process is inefficient, its accuracy is difficult to guarantee, and it consumes considerable human and material resources [2]. In view of these limitations, several technical schemes have been proposed in the literature. Literature [3] proposed using different light-source bands and light intensities to classify tobacco leaf grades. Literature [4] proposed tobacco classification based on clustering and weighted k-nearest neighbors (KNN), classifying tobacco according to infrared spectroscopy. Reference [5] used the entropy method to weight sample features, introduced these feature weights into the calculation of sample distances, and applied the KNN algorithm to tobacco leaf chemical composition data. If the tobacco data contain a lot of noise, however, KNN classification cannot eliminate its interference, and accuracy suffers. Literature [6] applied the random forest algorithm to tobacco grade classification, which achieves good results when the data set contains many samples; however, random forest cannot show its advantages on the small-sample data set used in this paper. 
Literature [7] proposed an automatic classification method for tobacco leaves based on machine vision, which classifies leaves through feature extraction and recognition on tobacco images. However, this image recognition process does not consider practical situations such as folded leaves in the images or mixing of the front and back sides of the leaves. Literature [8] proposed classifying tobacco grades by near-infrared spectroscopy combined with partial least squares discriminant analysis (PLS-DA). However, near-infrared spectroscopy equipment is expensive and cannot be deployed on a large scale. To address the above problems, this paper studies tobacco grade recognition based on a BP model. A BP network has strong nonlinear mapping ability and associative memory for external stimuli and input information, and therefore strong recognition and classification ability for input samples [9]. BP achieves high accuracy in classifying the tobacco leaf chemical composition data set and overcomes the low efficiency and strong subjectivity of manual grading.
2 Data Acquisition and Analysis of Tobacco Grade
The chemical composition of tobacco leaf is one of the important factors affecting the taste and quality of cigarettes [10]; it includes reducing sugar, total alkaloids, total sugar, potassium, total nitrogen, starch and other components. The experimental data in this paper come from different flue-cured tobacco bases in Guangxi, Yunnan, Chongqing and Hunan, China. The flue-cured tobacco leaves are divided into four grades: B2F, C2F, C3F and X2F. The BP model is introduced to identify grades from the tobacco chemical composition data set: when tobacco needs to be graded, the predicted grade is obtained by inputting the chemical composition of the leaves. Table 1 lists part of the database records of tobacco leaf chemical composition and grade.
Table 2 summarizes the proportions of chemical components in the B2F, C2F, C3F and X2F tobacco grades.
As Table 2 shows, among the chemical components of the B2F grade, total sugar accounts for the highest proportion and chlorine for the lowest. Total sugar and reducing sugar fluctuate the most: total sugar ranges from a minimum of 15.6% to a maximum of 44.6%, and reducing sugar from 11.5% to 35.6%.
For the C2F grade, Table 2 shows that the overall trend of the chemical composition is consistent with that of the other grades: total sugar has the highest proportion, followed by reducing sugar. The difference is that the minimum proportion of total sugar (24.9%) and the minimum proportion of reducing sugar (20%) are higher than those of the other grades. The proportion of chlorine ranges from 0.2% to 0.62%, which is much higher than in the other grades.
For the C3F grade, Table 2 shows that the trend of the chemical composition proportions is generally consistent with that of the other grades. However, the proportion of potassium in C3F can reach 4.97%, higher than the 3.79% and 2.93% reached in B2F and C2F respectively. The maximum proportion of starch, 13.18%, is also higher than in the other three grades.
According to Table 2, for the X2F grade, the proportions of total sugar and reducing sugar are much higher than those of the B2F and C3F grades, second only to C2F. The overall trend of the component proportions, however, is similar to that of the B2F grade.
Table 2 shows that the chemical composition within each tobacco grade varies widely, while the composition proportions of different grades are also highly similar. When the chemical composition proportions of leaves from two different grades are close, it is difficult even for professionals to determine which grade each belongs to. Because the component proportions within a grade do not stay in a narrow range but fluctuate widely, the ranges of different grades may overlap. Grading that relies only on the experience and subjective judgment of professionals is therefore flawed.
3 Establishment of Tobacco Grade Recognition Model Based on BP
Judging the grade of tobacco leaves from their chemical composition is a classification problem in machine learning. To classify multi-dimensional data, a BP network uses a layered structure consisting of an input layer, a hidden (middle) layer and an output layer, with all neurons in adjacent layers fully connected. Input signals propagate forward through the weighted connections to produce the network output. The error between the desired output and the actual output is then propagated backwards from the output layer through each hidden layer to the input layer, and the connection weights are corrected layer by layer to reduce this error. The process is repeated until the global error of the network falls to a given minimum value [11].
3.1 Input Data Preprocessing
The grade is determined mainly by the chemical composition. Seven features of the tobacco chemical composition data set, namely total sugar, reducing sugar, total alkaloids, potassium, chlorine, total nitrogen and starch, are used as the BP input layer data and denoted \(x_{1}, x_{2}, \ldots, x_{7}\). The tobacco grade is used as the BP output layer data and denoted Y.
The tobacco data are normalized, which can effectively improve prediction accuracy and speed up model convergence. The network inputs \(x_{1}, x_{2}, \ldots, x_{7}\) are linearly normalized according to Formula (1).
The BP output layer data are encoded as follows: 1 represents the B2F grade, 2 the C2F grade, 3 the C3F grade and 4 the X2F grade.
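Formula (1) itself is not reproduced in this text, so the sketch below assumes the common min-max form x' = (x - x_min)/(x_max - x_min), which maps each feature to [0, 1]. The target range is left as a parameter because other linear schemes (for example MATLAB's `mapminmax`, which maps to [-1, 1] by default) are equally plausible. The feature values and labels in the usage lines are hypothetical and are not taken from Table 1; the function names are illustrative.

```python
import numpy as np

# Grade codes used for the output layer (Sect. 3.1)
GRADE_CODES = {"B2F": 1, "C2F": 2, "C3F": 3, "X2F": 4}

# Assumed feature order (x1 ... x7): total sugar, reducing sugar, total alkaloids,
# potassium, chlorine, total nitrogen, starch


def fit_min_max(X):
    """Return per-feature minima and maxima computed on the training data."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)


def apply_min_max(X, col_min, col_max, lo=0.0, hi=1.0):
    """Linearly map each feature from [col_min, col_max] to [lo, hi]."""
    X = np.asarray(X, dtype=float)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard constant columns
    return lo + (X - col_min) / span * (hi - lo)


def encode_grades(labels):
    """Map grade names to the integer codes 1..4."""
    return np.array([GRADE_CODES[g] for g in labels], dtype=int)


# Hypothetical samples (percentages), for illustration only
X_train = np.array([
    [30.1, 25.0, 2.5, 1.8, 0.3, 1.9, 5.0],
    [20.0, 15.0, 3.1, 2.2, 0.4, 2.1, 3.0],
    [40.0, 35.0, 1.9, 1.5, 0.2, 1.7, 7.0],
])
col_min, col_max = fit_min_max(X_train)
X_scaled = apply_min_max(X_train, col_min, col_max)
y = encode_grades(["C2F", "X2F", "B2F"])
print(X_scaled.round(3))
print(y)
```

Computing the minima and maxima on the training set only and reusing them for the test set keeps test information out of training; a test sample outside the training range simply maps slightly outside the target interval.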
3.2 BP Network Structure Design
(1) Input and output layer design
The input of the BP model is the chemical composition of tobacco leaves and the output is the tobacco grade. The input layer therefore has 7 nodes and the output layer has 1 node.
(2) Hidden layer design
With enough hidden layer nodes, a BP network can approximate a nonlinear function to arbitrary accuracy [12], so a three-layer BP model is adopted in this paper. Too many hidden layer neurons, however, increase computational complexity and cause overfitting [13], while too few reduce the accuracy of the output. The number of hidden layer nodes is generally determined by Formula (2).
$$ h = \sqrt {m + n} + a $$(2)
where h, m and n are the numbers of hidden layer, input layer and output layer nodes respectively, and a is a constant in [1, 10]. With m = 7 and n = 1, Formula (2) gives between 3 and 13 hidden layer neurons. In this paper, the number of BP hidden layer neurons is set to 6. The BP design is shown in Fig. 1.
(3) Activation function selection
The activation function of the hidden layer in a BP network must be nonlinear [14]: a composition of linear functions is itself linear, so adding layers would not let the network compute more complex functions. Common activation functions include ReLU, Sigmoid and Tanh, given by Formulas (3), (4) and (5) respectively.
$$ f(x) = \max (0,x) $$(3)
$$ f(x) = \frac{1}{1 + e^{-x}} $$(4)
$$ f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $$(5)
Research shows that ReLU is generally used in hidden layers, while for the output layer of a classification task the Sigmoid function is used [14], since its output lies in (0, 1) and can represent a probability. Tobacco grade prediction is realized by feeding the relevant attribute values through the input, hidden and output layers.
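The design choices in items (2) and (3) can be checked numerically. The short sketch below evaluates Formula (2) for m = 7 input nodes and n = 1 output node at both ends of the range of a, and implements the three activation functions with NumPy; the function names are illustrative, not taken from the paper.

```python
import math

import numpy as np


def hidden_node_range(m, n, a_min=1, a_max=10):
    """Formula (2): h = sqrt(m + n) + a, evaluated at both ends of a's range."""
    base = math.sqrt(m + n)
    return base + a_min, base + a_max


def relu(x):
    """ReLU: max(0, x), applied elementwise."""
    return np.maximum(0.0, x)


def sigmoid(x):
    """Sigmoid: 1 / (1 + exp(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    """Tanh: (e^x - e^-x) / (e^x + e^-x); output lies in (-1, 1)."""
    return np.tanh(x)


low, high = hidden_node_range(m=7, n=1)
print(f"h in [{low:.2f}, {high:.2f}] -> {math.floor(low)}..{math.ceil(high)} nodes")

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x).round(3), tanh(x).round(3))
```

The first printed line reports the real-valued interval [3.83, 12.83]; rounding its ends outward gives the range of 3 to 13 neurons, which contains the chosen value of 6.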
3.3 BP Network Training
Training a BP model consists of forward propagation of the data and back propagation of the error. In forward propagation, each sample is represented as (x, y), where x is the chemical composition of the tobacco leaf and y its grade, and is input into the BP model; using the current weights and thresholds (those from the last iteration), the outputs of the neurons are calculated layer by layer. In error back propagation, the gradients of the total error with respect to the weights and thresholds of the last layer and the preceding layers are determined, and the weights and thresholds are modified to minimize the target error. The network training process is as follows.
(1) Initialize the network model. The data set contains the chemical composition of tobacco leaves and the corresponding grades. The input data are the chemical composition x, with p input features, and the hidden layer has m nodes. The output is the tobacco grade y; since there is a single output, the output layer has 1 node.
(2) Compute the hidden layer output R. Let \(x_{i}\) denote the i-th chemical composition feature, \(\omega_{ij}\) the weight between input node i and hidden node j, and \(a_{j}\) the threshold of hidden node j. The hidden layer output R is calculated by Formula (6).
$$ R_{j} = f\left( {\sum\nolimits_{i = 1}^{p} {\omega_{ij} x_{i} - a_{j} } } \right),j = 1,2, \ldots ,m $$(6)
(3) Compute the tobacco grade prediction L from the hidden layer output R, the weights \(\omega_{j}\) between the hidden layer and the output layer, and the output layer threshold b, as in Formula (7).
$$ L = g\left( {\sum\nolimits_{j = 1}^{m} {R_{j} \omega_{j} - b} } \right) $$(7)
Here f is the hidden layer activation function (ReLU) and g is the output layer activation function (Sigmoid). After the prediction L is obtained, the BP prediction error E with respect to the expected output Y is calculated using Formula (8) and measured by the mean squared error (MSE); the smaller the MSE, the more accurate the prediction model.
Based on the error E, the weights \(\omega_{ij}\) and thresholds \(a_{j}\) between the input layer and the hidden layer are updated, as are the weights \(\omega_{j}\) and the threshold b between the hidden layer and the output layer; \(\eta\) denotes the learning rate.
(4) Training ends when the target error is reached or the maximum number of iterations is exceeded; otherwise, return to step (2).
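Steps (1) to (4) can be condensed into the following NumPy sketch of a 7-6-1 network. It is not the authors' implementation: it uses plain full-batch gradient descent on the MSE instead of Traingdx, a hypothetical random initialisation, and it assumes (the text does not say how) that grade codes 1 to 4 are rescaled to t = (y - 1)/3 so that a single Sigmoid output in (0, 1) can represent them, with predictions decoded by rounding 3L + 1. Because the thresholds enter Formulas (6) and (7) with a minus sign, their gradients have the opposite sign of an ordinary bias gradient.

```python
import numpy as np


def forward(X, W1, a, w2, b):
    """Formulas (6) and (7) for a batch X of shape (N, p)."""
    Z = X @ W1 - a                            # hidden net input: sum_i w_ij x_i - a_j
    R = np.maximum(0.0, Z)                    # f = ReLU, Formula (6)
    L = 1.0 / (1.0 + np.exp(-(R @ w2 - b)))   # g = Sigmoid, Formula (7), shape (N,)
    return Z, R, L


def train_bp(X, y, m=6, eta=0.02, epochs=6000, target_error=1e-6, seed=0):
    """Steps (1)-(4) with full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    t = (np.asarray(y, dtype=float) - 1.0) / 3.0   # assumed rescaling of codes 1..4
    # Step (1): hypothetical initialisation of weights and thresholds
    W1 = rng.normal(0.0, 0.5, size=(p, m))
    a = np.zeros(m)
    w2 = rng.normal(0.0, 0.5, size=m)
    b = 0.0
    history = []
    for _ in range(epochs):                        # step (4): iteration limit
        Z, R, L = forward(X, W1, a, w2, b)         # steps (2) and (3)
        E = float(np.mean((t - L) ** 2))           # MSE
        history.append(E)
        if E <= target_error:                      # step (4): target error reached
            break
        d_out = -2.0 * (t - L) * L * (1.0 - L) / N   # dE / d(output net input)
        d_hid = np.outer(d_out, w2) * (Z > 0)        # back through ReLU, uses old w2
        W1 = W1 - eta * (X.T @ d_hid)
        a = a + eta * d_hid.sum(axis=0)              # dE/da_j = -sum(d_hid)
        w2 = w2 - eta * (R.T @ d_out)
        b = b + eta * d_out.sum()                    # dE/db = -sum(d_out)
    return (W1, a, w2, b), history


def predict_grade(X, params):
    """Decode the Sigmoid output back to the nearest grade code 1..4."""
    _, _, L = forward(X, *params)
    return np.clip(np.rint(3.0 * L + 1.0), 1, 4).astype(int)


# Usage on synthetic data (not the paper's data set)
rng = np.random.default_rng(42)
X_demo = rng.random((40, 7))
y_demo = 1 + np.minimum((X_demo[:, 0] * 4).astype(int), 3)
params, hist = train_bp(X_demo, y_demo, eta=0.2, epochs=1000)
print(f"MSE: {hist[0]:.4f} -> {hist[-1]:.4f} after {len(hist)} epochs")
print(predict_grade(X_demo[:5], params), y_demo[:5])
```

On real data, X would be the normalized feature matrix from Sect. 3.1 and y the encoded grades; the synthetic data only demonstrate that the loop runs and the error decreases.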
4 Simulation Experiment
4.1 Experimental Setup
The parameters of the BP network are set as follows. The activation functions of the hidden and output layers are ReLU and Sigmoid respectively, the training function is Traingdx (gradient descent with momentum and an adaptive learning rate), and performance is evaluated by MSE. The input, hidden and output layers have 7, 6 and 1 nodes respectively. The maximum number of iterations (epochs), the expected error e and the learning rate η are set to 6000, 0.000001 and 0.02 respectively.
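Traingdx adapts the learning rate during training. The sketch below illustrates the rule as we understand it from MATLAB's documentation: if a step raises the error by more than a ratio max_perf_inc, the step is discarded and the learning rate is multiplied by lr_dec; if the error falls, the step is kept and the rate is multiplied by lr_inc. The default values used here (1.04, 0.7, 1.05) are assumptions based on that documentation, the momentum term is omitted, and the one-dimensional quadratic is a toy problem rather than the tobacco network. Only the starting rate of 0.02 comes from the settings above.

```python
def gdx_adapt(lr, old_perf, new_perf, lr_inc=1.05, lr_dec=0.7, max_perf_inc=1.04):
    """Traingdx-style learning-rate adaptation (momentum omitted).

    Returns the new learning rate and whether the tentative step is kept.
    """
    if new_perf > old_perf * max_perf_inc:
        return lr * lr_dec, False   # error grew too much: discard step, shrink rate
    if new_perf < old_perf:
        return lr * lr_inc, True    # error fell: keep step, grow rate
    return lr, True                 # small increase tolerated: keep step, same rate


def minimize_quadratic(w=0.0, lr=0.02, steps=200):
    """Minimize perf(w) = (w - 3)^2 with gradient descent and the rule above."""
    perf = (w - 3.0) ** 2
    for _ in range(steps):
        w_new = w - lr * 2.0 * (w - 3.0)   # gradient step: d perf / dw = 2 (w - 3)
        new_perf = (w_new - 3.0) ** 2
        lr, keep = gdx_adapt(lr, perf, new_perf)
        if keep:
            w, perf = w_new, new_perf
    return w, lr


w_opt, lr_final = minimize_quadratic()
print(f"w = {w_opt:.6f}, final learning rate = {lr_final:.4f}")
```

Starting from a small rate, the rule lets the step size grow while the error keeps falling and cuts it back after an overshoot, which is why a fixed initial η such as 0.02 is less critical with Traingdx than with plain gradient descent.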
4.2 Analysis of Experimental Results
Figures 2, 3, 4 and 5 show the system's prediction results for the four tobacco grades. The ordinate represents the tobacco grade: 1 for B2F, 2 for C2F, 3 for C3F and 4 for X2F. Orange dots indicate the actual grade and blue dots the predicted grade. When the predicted grade matches the actual grade the two points coincide, so the prediction is best when all points lie on the line of the corresponding grade. On the test set, the predicted grades of most tobacco samples coincide with their actual grades, showing that the model correctly predicts the grade of most samples. A few samples are still misidentified. This may be related to the tobacco data themselves, since the chemical component proportions of different grades are highly similar, and also to the model itself: the number of hidden layers, the number of hidden layer neurons and the choice of activation function all affect prediction accuracy.
70% of the data set is used as the training set to build the model with the BP neural network algorithm, and the remaining 30% is used as the test set. The predicted grades of the test samples are compared with their actual grades and displayed on the front end of a web page; the results are shown in Figs. 2, 3, 4 and 5, and the prediction results are listed in Table 3. The recognition rate reached 90.09% for B2F, 90.47% for C2F, 90.77% for C3F and 91.38% for X2F, with an overall average recognition rate of 90.67%.
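The evaluation protocol, a 70/30 split followed by per-grade recognition rates, can be sketched as below. The random shuffle and the function names are assumptions, and because the text does not define the "overall average recognition rate", the sketch reports both the plain mean of the per-grade rates and the accuracy over all test samples; the two generally differ when the grades are unevenly represented in the test set. The labels and predictions in the usage lines are hypothetical.

```python
import numpy as np


def split_70_30(n_samples, seed=0):
    """Shuffle sample indices and split them 70% / 30% into train / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    return idx[:n_train], idx[n_train:]


def recognition_rates(y_true, y_pred, grades=(1, 2, 3, 4)):
    """Per-grade recognition rate, their plain mean, and overall accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Fraction of each grade's test samples that were predicted as that grade
    per_grade = {g: float(np.mean(y_pred[y_true == g] == g))
                 for g in grades if np.any(y_true == g)}
    mean_rate = float(np.mean(list(per_grade.values())))
    accuracy = float(np.mean(y_true == y_pred))
    return per_grade, mean_rate, accuracy


train_idx, test_idx = split_70_30(100)
print(len(train_idx), len(test_idx))

# Hypothetical test labels and predictions (1 = B2F, 2 = C2F, 3 = C3F, 4 = X2F)
y_true = [1, 1, 2, 2, 2, 3, 3, 4]
y_pred = [1, 2, 2, 2, 2, 3, 3, 1]
per_grade, mean_rate, accuracy = recognition_rates(y_true, y_pred)
print(per_grade, round(mean_rate, 4), round(accuracy, 4))
```

Here the plain mean of the per-grade rates (0.625) is lower than the sample accuracy (0.75) because the poorly recognized grades have few test samples, which is why the definition of an "average" rate matters when comparing models.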
As mentioned in the literature review, KNN and random forest have been applied to tobacco grade recognition; Table 4 compares these two algorithms with BP. The data set in this paper is small and noisy. Because BP is nonlinear and fits the patterns of the input data through multiple layers of neurons, it can suppress noise and fit small-sample data, and thus obtains higher classification accuracy.
5 Conclusion
As customers' requirements for tobacco leaf quality keep rising, current manual grading shows clear limitations, such as strong subjectivity and high consumption of human and material resources. In this paper, the chemical composition data of tobacco leaves are used as the training set to establish a BP model, and a BP-based tobacco grade classification technique is developed to overcome the low efficiency and high subjectivity of manual grading. Experiments show that the proposed algorithm achieves better recognition accuracy than KNN and random forest. Deep neural networks outperform traditional neural networks and have been widely used [15]; in future work, we will use deep neural networks to predict tobacco grades.
References
[1] Tan, X., Yunlan, T., Yingwu, C.: Intelligent classification method of flue-cured tobacco based on rough set. J. Agric. Mach. 06, 169–174 (2009)
[2] Shuangyan, Y., Zigang, Y., Siwei, Z., et al.: Automatic tobacco classification method based on near infrared spectroscopy and PSO-SVM algorithm. Guizhou Agric. Sci. 46(12), 141–144 (2018)
[3] Zhiqian, Q.: Effects of different light sources and light intensity on tobacco classification. Guizhou University, Guiyang, China (2020)
[4] Hang, L.: The research on tobacco classification based on clustering and weighted KNN. Zhengzhou University, Zhengzhou, China (2017)
[5] Hui, Z., Kaihu, H., Zhou, Z.: Application of EM-KNN algorithm in classification of re-dried tobacco leaves. Software 39(06), 96–100 (2018)
[6] Hari, S., Maria, P.A.: Prediction of tobacco leave Grades with ensemble machine learning methods. In: International Congress on Applied Information Technology, pp. 1–6 (2019)
[7] Zhenzhen, Z.: Method for automatic grading of tobacco based on machine vision. Southwest University, Chongqing, China (2016)
[8] Guo, T., Kuangda, T., Zuhong, L., et al.: Classification of tobacco grades by near-infrared spectroscopy and PLS-DA. Tobacco Sci. Technol. 309(04), 60–62 (2013)
[9] Qing, C., Wei, L., Kejun, Z.: A neural network recognition model based on aroma components in tobacco. J. Hunan Univ. 33(02), 103–105 (2006)
[10] Guiting, H., Chengchao, Z., Weijun, Z., Zhengjiang, Z.: Application of BP neural network based on model identification in photovoltaic system MPPT. Comput. Meas. Control 25(10), 213–216 (2017)
[11] Lin, W., Zhihong, L., Zicheng, X.: Study on relationship between acid aroma with polyphenol content, chemical composition and taste characteristics of flue-cured tobacco. J. Agric. Sci. Technol. 21(05), 159–169 (2019)
[12] Qiyi, Q., Chengxiang, G., Shuai, W., Xuyi, Y., Ningjiang, C.: On BP neural network optimization based on particle swarm optimization and cuckoo search fusion. J. Guangxi University (Nat Sci Ed) 45(04), 898–905 (2020)
[13] Runa, A.: Research on text classification based on improved convolutional neural network. Inner Mongolia University for Nationalities, Tongliao, China (2020)
[14] Xiao, Q., Chengcheng, H., Shi, Y., et al.: Research progress of image classification based on convolutional neural network. Guangxi Sci. 27(6), 587–599 (2020)
[15] Konovalenko, I., Maruschak, P., Brezinová, J., et al.: Steel surface defect classification using deep residual neural network. Metals 846(10), 1–15 (2020)
Acknowledgments
This research is funded by the Guangxi Science and Technology Planning Project (GX[2016] No. 380), and the Science and Technology Planning Project of Guangxi China Tobacco Industry Co., Ltd. (No. GXZYCX2019E007).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this paper
Cite this paper
Nong, Y., Chen, Z., Huang, C., Pan, J., Liang, D., Lu, Y. (2022). Recognition Model Based on BP Neural Network and Its Application. In: Qian, Z., Jabbar, M., Li, X. (eds) Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications. WCNA 2021. Lecture Notes in Electrical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-19-2456-9_31
DOI: https://doi.org/10.1007/978-981-19-2456-9_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2455-2
Online ISBN: 978-981-19-2456-9
eBook Packages: Engineering, Engineering (R0)