1 Introduction

Intersection markings play a vital role in providing road users with guidance and information. Maintaining an accurate inventory of intersection markings is essential for effective transportation management. According to the Federal Highway Administration’s (FHWA’s) program on the Model Inventory of Roadway Elements (MIRE), roadway data, including intersection elements, are critical to data-driven highway safety management (Lefler et al., 2017). Specifically, MIRE’s gap analysis identified large gaps in existing roadway inventories for intersection descriptors such as the type and number of exclusive left-turn lanes, right-turn channelization, and the presence of crosswalks (Mallela et al., 2012). Meanwhile, intersection markings gradually degrade due to vehicular traffic, rain, and/or snowplowing. Degraded markings can confuse drivers, increasing the risk of traffic crashes. Obtaining timely, high-quality information on intersection markings lays a foundation for informed decisions in safety management and maintenance prioritization.

However, many states do not possess up-to-date statewide inventory and condition information on traffic assets because the high cost of data collection offsets the benefit of having such information (Balali et al., 2015). Traffic asset data are generally collected either by field investigation or by computer-based manual extraction from aerial images, street views, and video logs, and both approaches are cost prohibitive (Proulx et al., 2015). Current labor-intensive and high-cost data collection practices make it very challenging to gather intersection data on a large scale (Fiedler et al., 2013). Road markings are among the traffic assets that deteriorate easily over time, making it even more costly to keep track of their latest conditions. The need to collect statewide marking data and to prioritize replacement has created a demand for a cost-effective and scalable tool that can efficiently and accurately track the classifications, geographic locations, and conditions of road markings.

This study aims to develop an automated and scalable system powered by artificial intelligence (AI) for urban infrastructure data collection. The system can fully automate the processes of marking data collection and condition assessment on a large scale with almost zero cost and short processing time (e.g., in a preliminary test, the processing time per intersection is less than 2 s). Urban science is a multidisciplinary domain centered around leveraging data, technology, and analytical methods to tackle complex urban challenges. In this context, the study holds significant potential for advancing urban science by introducing innovative methodologies for collecting urban infrastructure data. The system's ability to generate extensive datasets in a cost-effective manner can profoundly impact urban science in multiple critical areas:

1.1 Improves the inventory of roadway data elements

The system offers a highly cost-effective tool to enhance current roadway inventory databases while supplying fundamental data elements crucial for advancing urban science.

1.2 Advances intersection safety management

The system can provide transportation agencies with in-demand data for the Highway Safety Improvement Program (HSIP). The availability of large-scale intersection marking data (e.g., presence of crosswalks and dedicated left-turn lanes) enables agencies to use the analytic methods provided in the American Association of State Highway and Transportation Officials’ (AASHTO’s) Highway Safety Manual (HSM). It helps bridge the gaps in current modeling practices by offering critical data to support safety decision making in hotspot identification and before-after safety evaluation.

1.3 Enables infrastructure maintenance prioritization

It is estimated that state agencies spend more than $1 billion annually in maintaining road markings in the United States and Canada (Zhang & Ge, 2012). The developed system can allow agencies to monitor the conditions of a large number of markings for better allocation of resources and timely maintenance.

1.4 Augments intelligent transportation systems (ITS)

The developed system can produce detailed intersection profiles for supporting ITS applications such as the development of high-resolution digital maps, driver-assistance systems, and safety warning systems.

1.5 Supports transportation planning modeling

The generated intersection data can help transportation planners develop more accurate planning models by incorporating detailed information on intersection configurations.

2 Literature review

Although road marking data are generally collected manually in practice, research efforts have been devoted to automating the process. Image processing techniques such as image segmentation (Senlet & Elgammal, 2012), geometric parameter optimization (Foucher et al., 2011), and edge detection (Ahmetovic et al., 2015) have been widely used to identify road markings. Template matching (Liu et al., 2012; Wu & Ranganathan, 2012) has also been used for road marking recognition. Although image processing and template matching methods are fast, their decisions rely on empirical functions, which are difficult to generalize to changing environments (Chen et al., 2015; Vokhidov et al., 2016). Learning-based methods are more adaptive, such as k-nearest neighbors (KNN) (Rebut et al., 2004), support vector machine (SVM) (Greenhalgh & Mirmehdi, 2015; Sukhwani et al., 2014), random forest (Smith et al., 2013), and artificial neural network (ANN) (Máttyus et al., 2016; Yamamoto et al., 2014).

More recent advances exploit deep learning methods, which can autonomously learn discriminative features from image data. For instance, Vokhidov et al. (2016) found that a convolutional neural network (CNN) could better recognize lane-use arrows in various environments. Wen et al. (2019) also used a CNN to classify different types of road markings with considerable differences. R-CNN (Region-based Convolutional Neural Network), proposed by Girshick et al. (2014), can not only recognize what objects are present but also determine their precise locations by drawing bounding boxes around them. It combined selective search for region proposals with a CNN for feature extraction. R-CNN achieved impressive accuracy but was computationally expensive due to its sequential processing of regions, making it impractical for real-time applications. R-CNN was utilized by Tian et al. (2020) to detect lane-use arrows and white/yellow lane lines. Their results showed that R-CNN could robustly extract road markers in various complex traffic scenes. Fast R-CNN (Girshick, 2015) addressed the computational inefficiency of R-CNN by introducing the concept of region-of-interest (ROI) pooling. It allowed feature extraction from the entire image in a single forward pass, significantly speeding up the process. Fast R-CNN demonstrated improved accuracy and efficiency over its predecessor, making it more practical for real-world applications. Qian et al. (2016) employed Fast R-CNN to detect road surface traffic signs, including lane-use markings, to assist automated driving.

Compared with marking recognition, much less research focused on the automatic assessment of marking conditions. Burrow et al. (2000) determined the extent of erosion by comparing present road markings with the “ideal” ones. Both Zhang and Ge (2012) and Lin et al. (2016) used image processing techniques to capture characteristics of markings such as geometric deformity, colors and edge lines and then to determine the quality level of markings.

Existing studies have several limitations. First, most learning-based methods for marking recognition are customized for driving assistance rather than inventory management, so they use small, local datasets and are not suitable for large-scale data collection. Second, most existing approaches for marking recognition remain sensitive to noise on road markings such as occlusion, illumination variations, and worn-out conditions. Third, condition assessment of markings is still under-examined. Existing methods rely on image processing techniques, and more robust and adaptive methods are needed. Fourth, previous studies focus on either marking recognition or condition assessment; no integrated method is available that can optimize the whole data collection process and reduce computation time. Thus, there is an immediate need for a more efficient and economical solution for marking data collection on a large scale.

3 Methodology

3.1 An overview of the system

This section presents an overview of the AI-powered system for intersection marking data collection. You can find a demonstration of the system at the following link: https://youtu.be/fvHf1H7i8Wo. Figure 1 illustrates the conceptual design of the system. The system focuses on two types of markings at intersections, lane-use arrows and crosswalks, and it can be extended to cover other road markings as well. The system economically utilizes roadway geographic information systems (GIS) data and aerial images as inputs, which are commonly available from transportation agencies or open sources. The use of GIS data enables fast indexing and identification of intersections and accelerates the extraction of aerial image data, making the proposed approach scalable and computationally efficient. The synthesis process matches geographic coordinates between the intersection GIS data and the aerial images, allowing the corresponding intersection images to be extracted automatically. The extracted intersection image data were used to train a novel computer vision model for detection, characterization, and condition assessment of intersection markings. Emerging AI techniques were harnessed to improve the accuracy, robustness, and computational efficiency of the system. This system will be the foundation of future expansions to collect other roadway features, such as medians and driveways, to support additional data needs.

Fig. 1

Conceptual illustration of the automated system for intersection marking data collection
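As a rough illustration of the coordinate-matching step, the sketch below converts an intersection's latitude and longitude into Web Mercator tile indices, which is one common way to locate the aerial image tile that covers a point. The function name `latlon_to_tile` and the use of the standard XYZ tiling scheme are assumptions made for illustration; the paper does not specify how the system indexes its aerial imagery.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) indices of the Web Mercator tile containing a point.

    Hypothetical helper: assumes the standard XYZ ("slippy map") tiling,
    which may differ from the imagery indexing used by the actual system.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y
```

With such an index, each row of the intersection GIS table could be mapped to the image tile that contains it, so only tiles containing intersections would need to be retrieved.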

The system has innovatively addressed the limitations of existing data collection approaches from the following aspects:

1. Seamless integration of spatial analytics with AI techniques. With intersection locations already known, AI techniques can be applied directly to what they do well: recognizing visual patterns. Spatial analytics helps pinpoint intersections in the target area and auto-extract their aerial images. Incorporating the spatial information greatly reduces the effort in image segmentation and object recognition, which makes the data collection process scalable and computationally efficient.

2. Smart application of deep learning for condition assessment. Humans are sensitive to visual impairments of markings, but it is very costly to apply subjective assessment on a large scale. The system leverages deep learning to generate quality scores consistent with human viewers. The multi-scale deep features of markings are fed into a regression sub-network to produce quality scores that indicate their degradation conditions.

3. Multi-task learning for higher accuracy and computational efficiency. The system performs the joint tasks of intersection marking detection, characterization, and condition assessment in an end-to-end deep learning model. A model can better learn a new task by transferring the knowledge it has acquired from learning a related task. Accomplishing multiple tasks simultaneously supports computational efficiency and inference performance in large-scale data collection.

4. Enhanced system accessibility and reproducibility. Despite its advanced spatial analytics and AI components, the system requires no prior knowledge of or skills in image processing and GIS tools, and therefore enables more users to access it. In addition, it provides objective measurements for reproducible data collection.

3.2 Annotation of intersection aerial images

The Computer Vision Annotation Tool (CVAT) was tested and used to manually label the types of lane-use arrows (i.e., left, right, left & straight, right & straight, and straight) and crosswalks (i.e., transverse, zebra, and ladder) and their degradation conditions (i.e., low-quality and high-quality). Markings are categorized as high-quality if they are intact without any visible damage. Conversely, if a marking exhibits any form of damage or deterioration, it is classified as low-quality. Prior to data collection, all assessors underwent a thorough training session to become well acquainted with both the annotation tool and the data collection protocol. An initial evaluation was conducted to ensure the integrity and consistency of the collected data. An example of annotation results is shown in Fig. 2.

Fig. 2

Overall annotated image with lane-use arrows, crosswalks, and degradation conditions labeled

3.3 Lane-use arrow detection

The Faster R-CNN (Ren et al., 2015) model is an object detection model that improves on Fast R-CNN by pairing a region proposal network (RPN) with the CNN model. The RPN shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals. It is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are then used by Fast R-CNN for detection. As a whole, Faster R-CNN consists of two modules: a deep fully convolutional network that proposes regions, and the Fast R-CNN detector that uses the proposed regions. The Faster R-CNN model was used to detect and classify lane-use arrows (five categories: Left, Left & Straight, Straight, Right, Right & Straight) in the satellite images. The network structure is shown in Fig. 3. The backbone used to extract image features is the convolutional neural network with 16 layers (VGG16) (Simonyan & Zisserman, 2014).

Fig. 3

Faster RCNN model for lane-use arrow detection (modified based on Ren et al. (2015))

3.4 Crosswalk detection

The crosswalks were classified into three types according to the Manual on Uniform Traffic Control Devices (MUTCD) standards as shown in Fig. 4.

Fig. 4

Examples of crosswalk markings (T1—Transverse Crosswalk: crosswalk marker with two parallel solid white lines; T2—Zebra Crosswalk: crosswalk marker with a series of closely spaced solid white lines; T3 – Ladder Crosswalk: crosswalk marker with solid white lines between two parallel solid white lines) (FHWA, 2009)

Because most crosswalk markings are arbitrarily oriented, the horizontal bounding boxes used for detecting lane-use arrows are no longer suitable, and a deep learning model capable of detecting rotated objects is needed. The Box Boundary-Aware Vectors (BBAVectors) model (Yi et al., 2021), designed for oriented object detection in aerial images, was therefore used to detect crosswalk markings. BBAVectors achieved outstanding performance on DOTA, a large-scale dataset for object detection in aerial images (Xia et al., 2018) and a benchmark for oriented object detection in computer vision. The model is built upon CenterNet (Duan et al., 2019) and extends it to the oriented object detection task. It uses a simple yet effective strategy to describe the oriented bounding box (OBB): the box boundary-aware vectors are measured in the same Cartesian coordinate system for all arbitrarily oriented objects, which achieves better performance than the baseline method that learns the width, height, and angle of the OBBs. The model is single-stage and anchor-free, which makes it fast and accurate. The network structure of BBAVectors is shown in Fig. 5.

Fig. 5

BBA Vectors network structure (modified based on Yi et al. (2021))
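To make the OBB description more concrete, the following minimal sketch rebuilds the four corners of an oriented box from a center point and four boundary vectors (top, right, bottom, left). It assumes the corner construction reported by Yi et al. (2021), in which each corner is the center plus the sum of two adjacent boundary vectors. The function names `obb_corners` and `obb_side_lengths` are hypothetical, and the side lengths are in pixels; the paper does not state how the exported crosswalk widths and lengths are computed.

```python
import math


def obb_corners(center, t, r, b, l):
    """Return [tl, tr, br, bl] corners of an oriented bounding box.

    center is (cx, cy); t, r, b, l are (dx, dy) vectors pointing from the
    center to the midpoints of the top, right, bottom, and left edges.
    Illustrative only: assumes corner = center + sum of two adjacent vectors.
    """
    cx, cy = center

    def corner(u, v):
        return (cx + u[0] + v[0], cy + u[1] + v[1])

    return [corner(t, l), corner(t, r), corner(b, r), corner(b, l)]


def obb_side_lengths(corners):
    """Return the lengths (in pixels) of two adjacent sides of the box."""
    tl, tr, br, _ = corners
    return math.dist(tl, tr), math.dist(tr, br)


# Axis-aligned example in image coordinates (y grows downward).
corners = obb_corners((10.0, 10.0), (0.0, -1.0), (2.0, 0.0), (0.0, 1.0), (-2.0, 0.0))
print(corners, obb_side_lengths(corners))
```

Because all four vectors live in the same Cartesian coordinate system, a rotated crosswalk simply yields rotated vectors, and the same corner construction applies without an explicit angle.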

3.5 Degradation condition assessment

Degradation conditions of markings were first manually annotated into two quality classes, i.e., low-quality and high-quality. If a marking (a lane-use arrow or crosswalk) is complete without any visible damage, it is classified as high-quality; otherwise, it is classified as low-quality. Examples of low-quality and high-quality markings are presented in Fig. 6.

Fig. 6

Examples of low-quality (a) and high-quality (b) degradation conditions

Kang et al. (2014) used a convolutional neural network for image quality assessment. In a similar manner, a deep convolutional neural network model, VGG16 (Simonyan & Zisserman, 2014), was developed here for marking quality assessment. The quality score, which represents the estimated probability of a marking belonging to the high-quality category as determined by VGG16, was used to assess marking conditions. Quality scores range from 0 (indicating the lowest quality) to 1 (indicating the highest quality), providing a measure of marking degradation levels. VGG16 has great flexibility to learn the perception of human viewers on degradation conditions. The structure of VGG16 is presented in Fig. 7.

Fig. 7

The VGG16 network structure for image quality assessment (modified based on Simonyan and Zisserman (2014))
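The quality score described above can be sketched as follows, assuming the classifier ends in a two-logit head (low-quality, high-quality) whose outputs are normalized with a softmax. The function names, the two-logit layout, and the 0.5 decision threshold are assumptions made for illustration; the paper states only that the score is the estimated probability of the high-quality class.

```python
import math


def quality_score(logit_low, logit_high):
    """Softmax probability of the high-quality class from two logits.

    Hypothetical helper: the two-logit head is an assumption. Subtracting
    the maximum logit keeps the exponentials numerically stable.
    """
    m = max(logit_low, logit_high)
    e_low = math.exp(logit_low - m)
    e_high = math.exp(logit_high - m)
    return e_high / (e_low + e_high)


def quality_label(score, threshold=0.5):
    """Map a score in [0, 1] to a class label (threshold is an assumption)."""
    return "high-quality" if score >= threshold else "low-quality"


print(quality_label(quality_score(-1.2, 2.3)))
```

A score near 1 then suggests an intact marking, and a score near 0 suggests visible damage, which matches the 0-to-1 range described in the text.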

4 System development

4.1 System structure

Figure 8 illustrates the architecture of the system, which consists of two main components: the backend and the frontend. The backend is responsible for deploying a system that facilitates the transmission of results from the vision component to the frontend. Conversely, the frontend is designed to display the outcomes and provide a user interface for seamless interaction.

Fig. 8

System structure

4.2 Backend

The FastAPI (Lathkar, 2023) framework was selected as the foundation for constructing the backend system. FastAPI is a modern, high-performance web framework for building Application Programming Interfaces (APIs) with Python 3.6+ based on standard Python type hints. In the backend, intersection images serve as input and are processed through a computer vision module and an output module. The computer vision module performs the detection of lane-use arrows and crosswalks while assessing their degradation conditions. The resulting outputs consist of labeled intersection images and .csv files containing comprehensive marking information.

4.3 Frontend

The frontend of the web-based system was developed to provide users with a graphical user interface (GUI) for viewing and interacting with the system. JavaScript was utilized to create dynamic elements on static Hyper Text Markup Language (HTML) web pages. The Mapbox API was employed to retrieve aerial images of intersections based on the coordinates provided by users. The interface features four buttons: Input, Start, End, and Output. The Input button allows users to enter the location of the intersections, the Start button initiates the processing, the End button halts the process, and the Output button enables the export of data.

4.4 Input, graphical user interface, and output

The input data contain intersection coordinate information and are tabulated in the common .csv format. An example of the input data derived from LRS Road Intersections (VDOT, 2017) is shown in Table 1. There are three columns: Intersection_ID, Latitude, and Longitude.

Table 1 Input data format
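A minimal sketch of reading an input file with the three columns named above is given below. The function name `read_intersections`, the range check, and the sample rows are illustrative assumptions; the sample values are made up and are not taken from the VDOT dataset.

```python
import csv
import io


def read_intersections(csv_text):
    """Parse rows with Intersection_ID, Latitude, and Longitude columns.

    Hypothetical helper: returns a list of (id, latitude, longitude) tuples
    and rejects coordinates outside the valid geographic range.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        lat = float(record["Latitude"])
        lon = float(record["Longitude"])
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(
                f"Coordinates out of range for {record['Intersection_ID']}"
            )
        rows.append((record["Intersection_ID"], lat, lon))
    return rows


# Made-up sample rows for illustration only.
sample = (
    "Intersection_ID,Latitude,Longitude\n"
    "A001,37.5407,-77.4360\n"
    "A002,38.0293,-78.4767\n"
)
print(read_intersections(sample))
```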

The graphical user interface of the system prototype is shown in Fig. 9. You can find a demonstration of the system at the following link: https://youtu.be/fvHf1H7i8Wo.

Fig. 9

Graphical user interface of the system prototype

A sample output file in .csv format is presented in Fig. 10, with its field description listed in Table 2. Users have the option to output labeled image data for verification purposes, as shown in Fig. 11.

Fig. 10

Sample of an exported .csv file

Table 2 Field description of the exported .csv data

Fig. 11

Sample of an exported .jpg file (Numbers outside of parentheses are quality scores; numbers within parentheses are widths and lengths of crosswalks)

4.5 Programming packages and analytical tools

For programming packages, the vision algorithms used PyTorch (Paszke et al., 2019), a popular deep learning framework, to build and train the AI models. PyTorch provides a flexible and efficient platform for developing neural networks and conducting deep learning tasks. Additionally, other essential tools such as NumPy, pandas, and JavaScript (JS) were employed. NumPy facilitated numerical computations, pandas enabled efficient data manipulation and analysis, and JS was used for creating dynamic elements in the frontend GUI.

4.6 Experiment setting

For the experiment settings, both the Faster RCNN (Ren et al., 2015) and BBAVectors (Yi et al., 2021) networks were trained for 100 epochs using a learning rate of 1e-4. A confidence threshold of 0.2 was used to decide whether a detection was retained. Additionally, the quality model was trained for 120 epochs to reach convergence. For computational resources, the system is deployed on the Ubuntu 22.04 operating system with an NVIDIA GeForce RTX 3090 graphics card.
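The role of the confidence threshold can be illustrated with a short sketch that discards detections scoring below 0.2. The dictionary layout of a detection (`label`, `score`, `box`), the function name, and the use of an inclusive comparison are assumptions for illustration; they are not taken from the system's code.

```python
def filter_detections(detections, threshold=0.2):
    """Keep detections whose confidence score reaches the threshold.

    Hypothetical helper: each detection is assumed to be a dict with a
    "score" key; whether the system uses >= or > is not stated in the paper.
    """
    return [d for d in detections if d["score"] >= threshold]


# Made-up detections for illustration only.
raw = [
    {"label": "Left", "score": 0.91, "box": (120, 340, 160, 420)},
    {"label": "Straight", "score": 0.15, "box": (300, 350, 330, 430)},
    {"label": "Right", "score": 0.20, "box": (500, 345, 540, 425)},
]
print([d["label"] for d in filter_detections(raw)])
```

A low threshold such as 0.2 tends to favor recall, keeping faint or partially occluded markings at the cost of more false positives.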

5 Results

5.1 Lane-use arrow detection

The aerial images downloaded from Mapbox were divided into a training set to train the computer vision models and a testing set to evaluate the performance of the trained models. Each aerial image is a 3-channel red, green, and blue (RGB) color image with an approximate resolution of 1354 × 967 pixels. The lane-use arrows in each image were also manually annotated. Table 3 presents the distributions of the lane-use arrows in the training and testing datasets.

Table 3 Lane-use arrow data distribution and detection performance

After training the Faster RCNN model on the training set, the detection performance was evaluated on the testing set. Examples of correctly detected and incorrectly detected (e.g., misclassification, missing) lane-use arrows are presented in Fig. 12. Average precision (a.k.a., Area Under the Precision-Recall Curve) was used to evaluate the performance of each lane-use arrow class. Average precision can indicate whether the model can correctly identify all the positive examples without accidentally marking too many negative examples as positive. The mean average precision reaches 85% on the testing set as shown in Table 3.
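For readers unfamiliar with the metric, the sketch below computes average precision for one class as the area under a monotonically smoothed precision-recall curve, and the mean average precision as the unweighted mean over classes. The all-point interpolation used here is one common convention and is an assumption; the paper does not state which interpolation or matching rule was applied, and the function names are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth marking.
    num_ground_truth: number of annotated markings of this class.
    """
    if num_ground_truth == 0 or not scores:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing in recall (all-point interpolation).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_by_class):
    """Unweighted mean of per-class average precision values."""
    return sum(ap_by_class.values()) / len(ap_by_class)
```

Missed markings lower the final recall and false detections lower precision, so both kinds of error reduce the reported value.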

Fig. 12

Example of correctly detected and incorrectly detected lane-use arrows (Numbers indicate confidence levels)

5.2 Crosswalk detection

Over 3,000 aerial images of intersections with crosswalks were collected and subsequently divided into a training set to develop the deep learning model and a testing set to evaluate the model performance. All the crosswalks in these images were manually annotated. Table 4 presents the distributions of the crosswalks in the training and testing datasets.

Table 4 Crosswalk data distribution and detection performance

After developing the BBA Vectors model on the training set, the detection performance was evaluated on the testing set. Examples of correctly detected and incorrectly detected (e.g., misclassification, missing) crosswalks are presented in Fig. 13. A mean average precision of 89% was achieved as shown in Table 4.

Fig. 13

Examples of correctly detected and incorrectly detected crosswalks (Numbers indicate confidence levels)

5.3 Assess the degradation conditions of markings

A total of 6,396 lane-use arrows and 5,031 crosswalks were annotated by trained reviewers. Tables 5 and 6 present the distributions of degradation conditions for lane-use arrows and crosswalks. The majority of markings (85.4% for lane-use arrows and 69.4% for crosswalks) are in the high-quality category.

Table 5 Degradation conditions of lane-use arrows and condition assessment performance
Table 6 Degradation conditions of crosswalks

After training the VGG16 model, the classification performance was evaluated on the testing sets of both lane-use arrows and crosswalks. Examples of correctly classified and incorrectly classified markings are presented in Fig. 14. Accuracy (number of correctly classified instances divided by the total number of instances) was used to evaluate the performance of condition assessment, as reported in Tables 5 and 6. The overall accuracies for lane-use arrows and crosswalks reached 91% and 83%, respectively.

Fig. 14

Examples of correctly classified and incorrectly classified markings based on degradation conditions

6 Conclusions

This paper develops an automated system that utilizes advanced AI techniques to detect intersection markings and assess their condition. The system that has been developed holds immense potential for driving the progress of urban science by offering essential urban infrastructure data in a cost-effective manner, which serves as a foundation for analysis and decision-making processes. A summary of the investigation results is as follows:

1. A Faster RCNN model was developed to detect lane-use arrows. The mean average precision reached 85% on the testing set.

2. A BBAVectors model, which can capture rotated objects, was developed to detect crosswalks and achieved a mean average precision of 89%.

3. A VGG16 model was developed to assess the degradation conditions of markings. The overall accuracies for lane-use arrows and crosswalks reached 91% and 83%, respectively.

From the investigation, it is found that emerging AI techniques (e.g., deep learning) could deliver satisfactory data products in terms of detection, characterization, and condition assessment of intersection markings. The model performance could be further enhanced when additional data are used for model development. The seamless integration of spatial analytics and advanced computer vision techniques makes the system truly cost-effective, scalable, and computationally efficient. The system harnesses emerging AI techniques such as multi-task deep learning to enhance its robustness, accuracy, and computational efficiency. The system is very accessible to users of different technical skills through its graphical user interface.

Existing intersection marking data are generally collected either by field investigation or by computer-aided manual extraction from aerial images, street views, and/or video logs. These approaches are cost prohibitive and feasible only for very limited data collection needs. In addition, their inherently subjective nature requires extensive training to reduce human errors. The system offers distinct advantages over current practices: (a) extremely low cost, (b) extraordinary scalability, (c) timeliness and consistency, and (d) objectivity and a high degree of reproducibility. The system can automate statewide intersection marking data collection at almost zero cost and with machine-based objective measurements. It can enhance the timeliness and consistency of roadway inventory data by periodically and rapidly processing the latest aerial image data. It eliminates the exposure of surveyors to hazards in field data collection. Unlike manual data collection, the system also provides objective measurements and a high degree of reproducibility of collected data.

The system can generate data elements highly expected by transportation agencies to support the Model Inventory of Roadway Elements (MIRE) program and to advance Highway Safety Improvement Programs (HSIP). Current data collection practices require transportation agencies to invest millions of dollars in contracting very time-consuming data collection services each year. By economically providing large-scale intersection marking data, this system will enable transportation agencies to empower analytic methods for data-driven safety management. The system can also assess the degradation condition of identified markings, and thus timely assist maintenance prioritization for reinforcing intersection safety.

Although the system demonstrates promising performance, it is essential to acknowledge the potential limitations and challenges associated with utilizing aerial photo data in certain geographic contexts. In rural and mountainous regions, the resolution of aerial data might be insufficient, leading to potential impacts on the accuracy of detection and quality assessment outcomes. Furthermore, the less frequent updates of aerial data in these areas can result in outdated information, posing challenges in accurately capturing the current conditions of the markings. It is important to remain cognizant of these factors when implementing the proposed system for data collection in such areas. Additionally, the performance of computer vision models can be further improved by including more data for training.