Abstract
This paper focuses on concrete crack detection in infrastructure maintenance, highlighting its importance for structural integrity, cost-effectiveness, and environmental protection. It reviews existing detection methods and introduces an improved VGG-16-based deep learning model that adds batch normalization, uses the P-ReLU activation function, and trains with the Adam optimizer to obtain better training outcomes. In experiments on the MendeleyData-CrackDetection dataset, the enhanced model outperforms the original VGG-16. The study underscores the importance of hyperparameter tuning and the choice of optimization algorithm in deep learning.
1 Introduction
Against the background of intelligent manufacturing in China, intelligent construction technology is becoming an increasingly important development direction in construction and engineering. Intelligent construction technologies combine artificial intelligence, big data, the Internet of Things, and advanced sensing to improve the efficiency, quality, and sustainability of building construction, operation, and maintenance.
Crack detection on concrete surfaces is crucial for maintaining infrastructure and ensuring structural safety: detecting cracks in concrete structures early allows them to be repaired before they compromise safety and reliability. Concrete structures such as bridges, buildings, and roads are part of the infrastructure of modern society. Cracks that are not detected or repaired in time may cause serious structural damage and even threaten life and property [1]. Early detection and treatment of cracks also reduces maintenance and repair costs, whereas long-term neglect may lead to more expensive repair and recovery work [2]. In addition, cracks can let moisture and harmful substances penetrate the concrete and thus harm the surrounding environment, so effective detection helps reduce environmental pollution [3]. Existing concrete surface crack detection methods fall into four broad directions: visual inspection [4], deep learning methods [5, 6], image processing techniques [7], and sensor technology [8].
The application of deep learning and image processing offers new promise for crack detection, improving detection accuracy and efficiency and reducing maintenance costs. Convolutional neural networks (CNNs) are one of the key factors behind deep learning's breakthroughs in image analysis. LeNet-5 [9], proposed by LeCun et al. in 1998, was an early CNN that laid the foundation for handwritten digit recognition. AlexNet [10], which won the ImageNet competition by a significant margin in 2012, subsequently sparked widespread deep learning research in image recognition. Deep learning has since been widely used in image classification and object detection; important work includes Faster R-CNN [11], YOLO (You Only Look Once) [12], and Mask R-CNN [13], which achieved significant performance improvements in object detection through end-to-end training. Among these developments, the VGG-16 network, proposed by Karen Simonyan and Andrew Zisserman [14] in 2014, is an important milestone. It is a deep convolutional neural network with 16 weight layers (13 convolutional layers and 3 fully connected layers), interleaved with pooling layers. VGG-16 has shown broad potential in image processing and is widely used for image classification, achieving excellent performance on large-scale datasets such as ImageNet.
This paper proposes an improved deep learning model for the automatic detection and identification of concrete surface cracks. The model is based on a convolutional neural network and an attention mechanism and can effectively extract and classify features of the concrete surface. Its advantage is that it requires neither manual annotation of crack locations nor complex image preprocessing, which saves considerable time and resources. We verify the validity and robustness of the model through experiments on a publicly available concrete surface crack dataset. The contribution of this paper is a new approach to the crack problem in infrastructure maintenance and construction engineering, thereby improving safety and reducing maintenance costs.
2 Crack Recognition of the Concrete Surface Based on an Improved Convolutional Neural Network
In construction research, accurate automatic identification and timely diagnosis of concrete surface cracks have attracted researchers' attention. To improve crack recognition accuracy, reduce the number of network parameters, and improve training efficiency, we modified the VGG-16 network structure. We used the MendeleyData-CrackDetection concrete crack dataset [15], which contains crack and no-crack images covering drying shrinkage, plastic shrinkage, temperature, and external-load cracks. The dataset consists of 40,000 RGB images of 227 × 227 pixels divided into negative (no crack) and positive (crack) classes, with 20,000 images per class. The images vary in surface finish and illumination, and no data augmentation such as random rotation, flipping, or tilting was applied. The dataset was split 7:1 into a training set (35,000 images) and a validation set (5,000 images).
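The 7:1 split above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the split is stratified (each class shuffled and divided separately), which the paper does not state, and the function name `split_7_to_1` and the file names are hypothetical placeholders.

```python
import random

def split_7_to_1(paths_by_class, seed=0):
    """Hypothetical stratified split: shuffle each class, keep 7/8 for training, 1/8 for validation."""
    rng = random.Random(seed)
    train, val = [], []
    for label, paths in paths_by_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_train = len(shuffled) * 7 // 8  # a 7:1 ratio means 7/8 of each class goes to training
        train += [(p, label) for p in shuffled[:n_train]]
        val += [(p, label) for p in shuffled[n_train:]]
    return train, val

# Hypothetical file names standing in for the 20,000 images per class.
data = {
    "negative": [f"neg_{i:05d}.jpg" for i in range(20000)],
    "positive": [f"pos_{i:05d}.jpg" for i in range(20000)],
}
train, val = split_7_to_1(data)
print(len(train), len(val))  # 35000 5000
```

Splitting each class separately keeps the validation set balanced (2,500 images per class); a purely random split over all 40,000 images would only be balanced on average.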
For concrete surface crack detection, the original VGG-16 network suffers from long training times and only moderate accuracy. We therefore introduce batch normalization layers [16], which normalize the activations of each layer to zero mean and unit variance and help mitigate the vanishing-gradient problem. Batch normalization speeds up network convergence and reduces the number of iterations needed to reach the same accuracy [17]. Its basic equations are as follows.
For a mini-batch of m input samples \(x_{1}, \dots, x_{m}\), the mean is given by (1):
\[\mu_{B} = \frac{1}{m}\sum_{i=1}^{m} x_{i} \quad (1)\]
The variance is given by (2):
\[\sigma_{B}^{2} = \frac{1}{m}\sum_{i=1}^{m} \left(x_{i} - \mu_{B}\right)^{2} \quad (2)\]
The normalized result is given by (3):
\[\hat{x}_{i} = \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \varepsilon}} \quad (3)\]
After batch normalization, the data have a mean of 0 and a variance of approximately 1. The small constant ε is added to prevent division by zero when the batch variance is 0. However, this operation may alter the feature distribution of the image, so a scale transformation and an offset are applied to recover the original feature distribution, as given in (4)–(6):
\[y_{i} = \gamma \hat{x}_{i} + \beta \quad (4)\]
\[\gamma = \sqrt{\mathrm{Var}[x]} \quad (5)\]
\[\beta = E[x] \quad (6)\]
The scale γ and offset β are learned during training; E denotes the mean and Var the variance function, and (5) and (6) give the values of γ and β at which the original distribution is recovered.
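The batch-normalization steps above (batch mean, batch variance, normalization with ε, then a scale γ and offset β) can be sketched for a single scalar feature as follows. This is an illustrative sketch, not the authors' implementation; the helper name `batch_norm` and the value ε = 1e-5 are assumptions. In a convolutional layer the same statistics are computed per channel over the batch and all spatial positions, and at inference time running averages replace the batch statistics.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Hypothetical helper: batch-normalize one scalar feature over a mini-batch of m samples."""
    m = len(xs)
    mu = sum(xs) / m                                        # batch mean
    var = sum((x - mu) ** 2 for x in xs) / m                # batch variance (divides by m)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]   # zero mean, roughly unit variance
    return [gamma * xh + beta for xh in x_hat]              # learnable scale and offset

batch = [2.0, 4.0, 6.0, 8.0]              # mean 5.0, variance 5.0
out = batch_norm(batch)                   # gamma=1, beta=0: normalized values only
out_mean = sum(out) / len(out)
out_var = sum((y - out_mean) ** 2 for y in out) / len(out)

# Choosing gamma = sqrt(var + eps) and beta = mean undoes the normalization.
restored = batch_norm(batch, gamma=math.sqrt(5.0 + 1e-5), beta=5.0)
```

The last line shows why the learnable γ and β matter: the network can recover the unnormalized distribution if that turns out to be useful.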
2.1 LeakyReLU Activation Function and the P-ReLU Activation Function
Most convolutional neural network models use the ReLU function as the activation after each convolutional layer. However, ReLU outputs 0 for every negative input and its gradient there is also 0, so a neuron whose inputs stay negative stops receiving gradient updates in subsequent training; this is the phenomenon of neuron “death”. For this reason, some researchers use the LeakyReLU activation function in practice to improve convolutional neural network models. For example, Chen Mianshu et al. [18] adopted LeakyReLU as the nonlinear activation function in a CNN-based image classification task. The mathematical expression of the P-ReLU activation function is given in formula (7).
This function behaves like ReLU for positive inputs, passing them through unchanged, but multiplies negative or zero inputs by a coefficient C instead of setting them to zero. In the Leaky ReLU function, C is a small positive constant fixed before training begins. This keeps neurons with negative inputs active, avoiding neuron death, while also increasing the diversity of neuron outputs. A typical default value of C is 0.01 [18].
However, Leaky ReLU also has drawbacks in network training. Because the coefficient applied to negative inputs is fixed, it is not necessarily optimal for training, and finding the best value of C requires many experiments. To address this issue, this study replaces the original ReLU activation in the convolutional layers with the P-ReLU activation function [19], a learnable activation that automatically adjusts the negative-input coefficient based on the training data.
In the negative interval, the output is scaled by a parameter \(a_{i}\), where i indexes the channels. Unlike the fixed slope of LeakyReLU, this parameter is learned and dynamically adjusted during training. P-ReLU can either share one parameter across all channels or learn a separate parameter for each channel; the default initial value is 0.25. Because P-ReLU has a nonzero derivative \(a_{i}\) for x < 0 (as long as \(a_{i} \neq 0\)) and is non-saturating, gradients do not vanish in the negative interval. It therefore effectively addresses neuron death, which improves network performance and accelerates convergence to some extent. Figure 1 compares the ReLU and P-ReLU activation functions [20] (see Fig. 1).
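The difference between Leaky ReLU and P-ReLU can be sketched for a single channel as below. This is a minimal scalar illustration under stated assumptions: the function names, the upstream gradient, and the learning rate are hypothetical, and the initial slope 0.25 is the default quoted above. With P-ReLU the derivative of the output with respect to the slope equals the input x when x ≤ 0, which is what lets backpropagation learn the slope.

```python
def leaky_relu(x, c=0.01):
    """Leaky ReLU: the negative-side slope c is fixed before training."""
    return x if x > 0 else c * x

def prelu(x, a):
    """P-ReLU for one channel: the negative-side slope a is a learnable parameter."""
    return x if x > 0 else a * x

def prelu_grads(x, a):
    """Return (d out / d x, d out / d a) for one input, as used in backpropagation."""
    if x > 0:
        return 1.0, 0.0
    return a, x  # slope a with respect to x; the input x itself with respect to a

# One illustrative gradient-descent step on the slope a.
a = 0.25          # default initial value quoted in the text
x = -2.0
upstream = 0.5    # hypothetical dL/d(out) arriving from the next layer
lr = 0.1          # hypothetical learning rate
_, d_a = prelu_grads(x, a)
a -= lr * upstream * d_a   # a moves from 0.25 to about 0.35
```

In a real network the same update is accumulated over every position in the channel (or over all channels when a single slope is shared).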
This paper builds on the VGG-16 convolutional neural network and preserves its overall structure. A batch normalization (BN) layer is added after each group of convolution operations, and the original activation function is replaced with P-ReLU to address neuron death. The last fully connected layer of the original model has 1000 outputs (one per ImageNet class), but there are only four categories in this paper, so we changed the final Softmax classifier to four outputs and reduced the number of fully connected layers to two: the first has 4096 units and the second has 4, corresponding to the four categories of concrete crack images. These adjustments simplify the network structure and improve recognition efficiency. The structure of the improved CNN model is shown in Fig. 2 (see Fig. 2).
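To see why trimming the classifier head reduces the parameter count, the fully connected parameters can be counted directly. The sketch assumes the last pooling layer outputs a 7 × 7 × 512 feature map (a 227 × 227 input shrinks to 7 × 7 after five 2 × 2 max-pooling steps with floor rounding) and counts weights plus biases; the helper `fc_params` is hypothetical, and the convolutional, BN, and P-ReLU parameters are not included.

```python
def fc_params(sizes):
    """Hypothetical helper: weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

flat = 7 * 7 * 512  # assumed flattened size of the last VGG-16 pooling output (25088)

original_head = fc_params([flat, 4096, 4096, 1000])  # VGG-16: three FC layers, 1000 classes
modified_head = fc_params([flat, 4096, 4])           # modified model: two FC layers, 4 classes

print(original_head, modified_head, original_head - modified_head)
# 123642856 102780932 20861924
```

Under these assumptions the head shrinks by about 20.9 million parameters, most of which come from removing the second 4096 × 4096 layer; the first 25088 × 4096 layer remains by far the largest.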
The key parameters of each layer in the modified VGG-16 model are listed in Table 1 (see Table 1).
After the loss is computed, the network parameters are updated with the Adam optimizer [21]. Unlike traditional stochastic gradient descent (SGD), Adam uses estimates of both the first and second moments of the gradient, which lets the model converge faster and improves training efficiency. The Adam update rules are given in (8)–(12):
\[m_{t} = \mu m_{t-1} + (1 - \mu) g_{t} \quad (8)\]
\[n_{t} = v\, n_{t-1} + (1 - v) g_{t}^{2} \quad (9)\]
\[\hat{m}_{t} = \frac{m_{t}}{1 - \mu^{t}} \quad (10)\]
\[\hat{n}_{t} = \frac{n_{t}}{1 - v^{t}} \quad (11)\]
\[\theta_{t+1} = \theta_{t} - \eta \frac{\hat{m}_{t}}{\sqrt{\hat{n}_{t}} + \varepsilon} \quad (12)\]
Here \(\theta_{t+1}\) denotes the weights of the neural network after iteration t + 1, \(g_{t}\) is the gradient of the loss at iteration t, and η is the learning rate. \(m_{t}\) and \(n_{t}\) are the first and second moment estimates of the gradient, and \(\hat{m}_{t}\) and \(\hat{n}_{t}\) are their bias-corrected versions. μ, v, and ε are hyperparameters, usually set to μ = 0.9, v = 0.999, and ε = \({10}^{-8}\).
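The Adam update can be sketched for a single scalar parameter as follows, using μ and v for the two decay rates as in the text. This is a minimal illustration of the standard Adam algorithm [21] applied to the toy objective f(θ) = θ², not the authors' training code; the function name `adam_minimize` and the learning rate 0.01 are assumptions.

```python
import math

def adam_minimize(grad, theta, lr=0.01, mu=0.9, v=0.999, eps=1e-8, steps=1):
    """Hypothetical helper: run Adam on one scalar parameter for a number of steps."""
    m = n = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = mu * m + (1 - mu) * g                       # first-moment estimate
        n = v * n + (1 - v) * g * g                     # second-moment estimate
        m_hat = m / (1 - mu ** t)                       # bias-corrected first moment
        n_hat = n / (1 - v ** t)                        # bias-corrected second moment
        theta -= lr * m_hat / (math.sqrt(n_hat) + eps)  # parameter update
    return theta

def grad(theta):
    """Gradient of the toy objective f(theta) = theta ** 2."""
    return 2.0 * theta

first = adam_minimize(grad, 1.0, steps=1)     # about 0.99: the first step has size close to lr
final = adam_minimize(grad, 1.0, steps=2000)  # ends close to the minimum at 0
```

Thanks to the bias correction, the very first step has magnitude close to the learning rate regardless of the gradient's scale. Deep-learning frameworks apply the same updates element-wise to every weight tensor.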
2.2 Setting of the Hyperparameters of the Network Model
The choice of learning rate is crucial in neural network training. A learning rate that is too large can cause oscillation and prolong training, whereas one that is too small leads to slow convergence and can leave training stuck in a local optimum. Rational selection and adjustment of the hyperparameters is therefore a critical task. After repeated training and tuning, the hyperparameters of the original VGG-16 and the modified model were set as listed in Table 2 and Table 3 (see Table 2 and Table 3):
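The effect of the learning rate described above can be seen on a toy problem. The sketch runs plain gradient descent on f(x) = x², where each step multiplies x by (1 − 2η); the objective, the helper name `gd_on_quadratic`, and the learning rates tried are illustrative assumptions, unrelated to the values in the tables. A convex quadratic has no local optima, so it only illustrates slow convergence, oscillation, and divergence.

```python
def gd_on_quadratic(lr, x0=1.0, steps=50):
    """Hypothetical helper: plain gradient descent on f(x) = x**2, whose gradient is 2x."""
    x = x0
    history = [x]
    for _ in range(steps):
        x -= lr * 2 * x  # each step multiplies x by (1 - 2 * lr)
        history.append(x)
    return history

for lr in (0.001, 0.1, 0.9, 1.1):
    h = gd_on_quadratic(lr)
    print(f"lr={lr}: x after 50 steps = {h[-1]:.3g}")
```

With η = 0.001 the iterate is still about 0.9 after 50 steps (slow convergence); η = 0.1 shrinks it steadily; η = 0.9 also shrinks it but flips its sign on every step (oscillation); and η = 1.1 diverges.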
3 Experimental Results and Analysis
The evaluation focuses on training accuracy, the rate at which the loss function decreases, the model's convergence rate, and the presence of oscillation. We analyze the effect of the improvement strategy on the experimental results starting from training accuracy. Accuracy reflects the model's ability to identify concrete surface cracks, the loss value reflects the model's error level in identifying cracks, and the degree of oscillation reflects training stability, including whether gradients explode. The training accuracy of the original VGG-16 model and of the improved convolutional neural network is shown in Fig. 3 and Fig. 4, respectively (see Fig. 3 and Fig. 4).
As Figs. 3 and 4 show, after 120 rounds of training the improved model is clearly better than the unmodified model in both convergence rate and training accuracy. This is because the batch normalization modules and the P-ReLU activation function effectively speed up convergence, avoid large oscillations, reduce the risk of overfitting, and improve the generalization ability of the model. In contrast, the unmodified VGG-16 model, with its large number of parameters, achieves lower training accuracy. In terms of convergence rate, the original model begins to converge at about round 15, whereas the improved model approaches convergence earlier, at about round 10. Training accuracy and convergence speed alone do not give a complete picture, however, so the change in the loss function must also be examined.
The loss curves of the original VGG-16 model and of the improved convolutional neural network during training are shown in Fig. 5 and Fig. 6 (see Fig. 5 and Fig. 6). For both models the loss decreases rapidly at the beginning of training and eventually drops to nearly zero, but the loss of the improved model decreases faster. Although the improved model shows some local oscillation early in training, indicating slightly worse stability, it performs better overall when the final loss value and the rate of loss reduction are taken into account.
When the network is trained with stochastic gradient descent (SGD) instead of Adam, convergence is markedly slower: in our experiment the network had only begun to converge by the end of training. The model therefore did not fully converge, and more training rounds, and thus considerably longer training time, would be needed. The final training accuracy with SGD was 91.1%, lower than the 98.7% achieved by the improved model, and the loss kept fluctuating within a band until the end of training without approaching zero. The optimization algorithm should therefore be chosen to suit the network model and the dataset at hand.
In summary, we built an improved deep convolutional neural network based on VGG-16 to identify four types of concrete surface cracks, and the improved model reached a training accuracy of 98.7%. The improvements include adding a batch normalization module after each convolutional layer to speed up convergence, and replacing the ReLU activation in the original network with P-ReLU to mitigate the vanishing-gradient (dead-neuron) problem that ReLU causes during training.
4 Conclusion
For China, a country with extensive infrastructure, timely and effective monitoring of surface cracks in buildings is crucial because it bears directly on the safety of life and property. Traditional methods, however, are often time-consuming and laborious and fail to meet practical needs. Deep learning techniques offer a way to address this problem: convolutional neural network models are widely used to identify and classify images of concrete surface cracks, providing reference data for improving construction processes and maintenance. LeNet-5 was one of the earliest convolutional neural network models, followed by AlexNet, VGG, GoogLeNet, ResNet, and others. As these models have grown deeper, they have also introduced problems: training accuracy tends to saturate, and storage requirements and parameter counts keep increasing. When selecting a network model, therefore, one needs to weigh network structure, training time, storage footprint, and number of parameters to find the model best suited to the task. In this paper, we improve the original VGG-16 structure by adding batch normalization layers and the P-ReLU activation function and by replacing the SGD optimizer with the Adam algorithm, which increases convergence speed.
References
Jin, Y., Ma, W., Li, J., Kovacevic, A.: Electrical resistance–based monitoring of concrete cancer. Constr. Build. Mater. 244, 119516 (2020)
Ma, Y., Zhang, Y., Wang, Z., Wu, J.: Experimental study on the preparation and mechanical properties of laser welded high-strength steel sheet for automobile body. Mater. Sci. Eng. A 764, 140699 (2021)
Matthews, R.D., Fowler, H.: The use of waste tyres in cementitious materials: a review. J. Clean. Prod. 278, 122756 (2020)
Li, J., Li, Z., Wang, X., Xie, Z.: Transfer learning for visual object recognition: a survey. Pattern Recogn. 74, 59–78 (2019)
Park, H., Lee, H., Lee, Y.: Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. Pattern Recogn. 88535–88549 (2019)
Tóth, Á., Pálsson, S., Borbála, T., Fülöp, L.: The use of deep learning techniques in the field of image processing: a review. Pattern Recogn. Lett. 109048–109066 (2021)
Li, Y., Wang, H., Li, Z.: Bridge crack detection using deep learning: a review. J. Struct. Eng. 147(1), 04019186 (2021)
Wang, H., Li, Y., Li, Z.: Bridge crack detection using deep learning: a comparative study. J. Bridg. Eng. 26(5), 04020049 (2021)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS) (2012)
Pang, J., Chen, K., Shi, J., et al.: Libra R-CNN: towards balanced learning for object detection. arXiv preprint arXiv:1904.02701v1 (2019)
Redmon, J., et al.: YOLOv3: an accurate, fast, and robust real-time object detection system. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Li, Y., Wang, H., Li, Z.: Bridge crack detection using deep learning and image processing: a review. J. Autom. Control Eng. 5(3), 46–53 (2021)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Ozgenel, C.F.: Concrete crack image for classification. MendeleyData, v2 (2019). https://doi.org/10.17632/5y9wdsg2zt.2
Youjun, Y., Bokai, T., Hongjun, W., et al.: Improved application of VGG model in apple appearance classification. Sci. Technol. Eng. 20(19), 7787–7792 (2020)
Hinton, G., Van Camp, D., Hinton, R.: RMSProp: divide the gradient by a running average of its recent magnitude. arXiv preprint arXiv:1212.5701 (2012)
Chen, M., Yu, L., Sang, A., et al.: Multi-label image classification based on a convolutional neural network. J. Jilin Univ. (Eng. Edn.) (3), 1077–1084 (2020)
He, K., Zhang, X., Ren, S., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
Thakur, R.S., Yadav, R.N., Gupta, L.: PReLU and edge-aware filter-based image denoiser using convolutional neural network. IET Image Process. 14(13), 3869–3879 (2020)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference for Learning Representations, pp. 1–12 (2015)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this paper
Cite this paper
He, Z., Zhang, H. (2024). A Method of Concrete Surface Crack Detection Using an Improved Convolutional Neural Network (CNN) Model. In: Xiang, P., Zuo, L. (eds) Novel Technology and Whole-Process Management in Prefabricated Building. PBSFTT 2023. Lecture Notes in Civil Engineering, vol 382. Springer, Singapore. https://doi.org/10.1007/978-981-97-5108-2_36
DOI: https://doi.org/10.1007/978-981-97-5108-2_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5107-5
Online ISBN: 978-981-97-5108-2
eBook Packages: Engineering, Engineering (R0)