1 Introduction

Up-to-date traffic data is required every day to maintain a safe and efficient flow of traffic and to identify and apprehend lawbreakers. License plate recognition has been used to gather information about vehicles in many settings, such as security control [1,2,3], automatic toll collection [4], unattended parking lots [5,6,7], and traffic safety enforcement [8,9,10,11,12]. As technology advances, more and more locations are equipped with traffic surveillance cameras. As the amount of data collected by these cameras grows, so does the need to process it in real time and to run it through machine learning algorithms. To address this challenge, we developed a Docker container-based framework built on the Apache Kafka node ecosystem that not only allows users to query the license plate number of the vehicle they want to track, but also offers a unique feature: "Plate Specific Video Reconstruction". This feature enables users to generate video clips of specific vehicles from the accumulated data, providing customized insight into individual vehicle activity.

Our framework leverages microservices for flexibility and scalability. Each microservice is made up of several components, each with its own responsibility. In practice, this keeps the services loosely coupled: when one of the services is unavailable, the rest of the application still functions. Real-time processing on a single node with a single GPU runs out of memory and is therefore not feasible, so the processing must be distributed across several nodes. Once jobs have been divided into smaller tasks, communication must be established between them.

Message queue middleware provides a practical way to address the issues described above. Available message queueing systems include RabbitMQ [15], ActiveMQ [14], and ZeroMQ [13]. Research shows that Apache Kafka outperforms other solutions in some situations [16].

Apache Kafka [17] is a distributed publish-subscribe messaging system built for massive data streaming. Stream computation makes it possible to handle real-time data in motion, which industrial applications based on sensors, log files, and web streams require. Kafka enables real-time data processing by connecting several microservices.

This study proposes a real-time processing pipeline, built on an open-source framework and evaluated using Apache Kafka, that analyzes feeds from multiple surveillance cameras. The remainder of the paper is structured as follows. Section 2 presents related work on license plate detection and on designs that combine license plate recognition with distributed systems. Section 3 describes the proposed approach. Section 4 presents the experimental results, and Section 5 concludes the paper.

2 Related Work

2.1 Vehicle Detection and License Plate Recognition Systems

Dong et al. [18] proposed a Convolutional Neural Network (CNN) that classifies vehicle types from the frontal view of the vehicles. The accuracy of their approach reached 96.1% during the day and 89.4% at night. Kul et al. [19] used the Support Vector Machine (SVM), AdaBoost, and Artificial Neural Network (ANN) algorithms to classify vehicles and obtained accuracies of 87.5%, 81.6%, and 85.4%, respectively. Kul et al. [20, 21] proposed a two-stage vehicle classification system. Tashiev et al. [22] developed a vehicle classification system that uses the YOLO real-time object detection framework and tested it on the BIT-Vehicle dataset. Their work reached an accuracy of 90.35%. They used Tiny-YOLO [23] and two CNNs, one for vehicle detection and the other for vehicle classification. Their method with Tiny-YOLO reached 89.19% in terms of the Intersection over Union (IoU) metric. Kul et al. [24] showed that vehicle classification is a challenging problem because vehicles differ in their dimensions.

Goncalves et al. [25] built an optical character recognition system for plate recognition based on a convolutional neural network. Their detection approach reached an accuracy of 79.3%, and their recognition approach reached an accuracy of 85.6%. Laroca et al. [26] proposed a license plate recognition system based on the YOLO [23] object detector. They trained the YOLOv2 architecture, which is built on the Darknet [27] library, with their own labeled data set. This model does not perform well when the frames are captured at inclined angles. Their system achieved an accuracy of 78.33%. Yonetsu et al. [28] proposed a two-stage YOLOv2 approach for license plate detection. They extracted cars and license plates from images containing cars, and they also built a database of Japanese license plates. Their license plate detection method reached an accuracy of 87% in clear weather, 74% at nightfall, and 11% in the dark. In [29], Nam et al. used the background modeling and subtraction (BGS) model for vehicle detection. To make their classifier robust, they applied the Gaussian functions of the OpenCV library. Z. Selmi and colleagues achieved an accuracy of 92.7%. Selmi et al. [30] proposed a deep learning license plate recognition system. Before recognizing the characters on a license plate, they built a CNN model that classifies image regions as plates or non-plates. Their method reached 94.8% accuracy on the Caltech dataset, but it fails in some cases where an image contains multiple license plates. D. Pu et al. [31, 32] presented a real-time CNN-based license plate detection system, but their approach did not work well for small license plates. Abdullah et al. [33] presented a real-time license plate recognition system for Bangladesh. Their data set is public and includes photos taken under different environmental conditions. They used the YOLOv3 algorithm, and their method achieved 85% in terms of IoU for digit recognition. Silva et al. [34] developed a system that includes vehicle detection, license plate detection, and plate reading steps. Their approach allows license plates to be read from different angles, and they also created a synthetic data set.

2.2 Distributed Messaging Platform

Distributed messaging systems are based on reliable message queueing. Messages are transmitted asynchronously between applications through the system. Applications can select messages with two types of filters: topic-based and content-based. There are several distributed messaging systems: Apache Kafka [35], RabbitMQ [36], JMS (Java Message Service) [37], ActiveMQ [38], ZeroMQ [39], and Kestrel [40]. Apache Kafka is a messaging platform designed to write more than 10 million messages per day, at an average of 172,000 messages per second [41]. Although Kafka was originally intended for log processing, it is also used in other scenarios. Several studies in the literature use Apache Kafka for frame transmission. Yoon-Ki Kim et al. [42] suggested that the color channels of the images to be transmitted over Kafka should be separated and collected under separate topics. Other studies combine plate recognition systems with distributed messaging platforms. S. Jung et al. [43] developed a vehicle tracking system in which captured plates are read with Tesseract OCR after grayscaling, singularizing, and filtering steps. They used Apache Kafka to transmit the images. H. Chen et al. [44] proposed a video processing framework based on Spark Streaming and Kafka. They used Apache Kafka to acquire the camera data in real time. Kul et al. [45] proposed a real-time vehicle classification system based on the Java Message Service. They separated vehicle types into three classes: small, medium, and big. Each class has its own topic, and after classification the vehicles are published to the topic that matches their label. They used the ANN, AdaBoost, and SVM algorithms, and the ANN algorithm achieved the best accuracy.

Table 1 Comparison Table

3 Proposed Method

Our proposed method consists of three steps: plate detection with YOLOv3 [46], plate recognition with Tesseract OCR, and transmission of the detected plates through Apache Kafka. To show how our methodology compares with existing systems, Table 1 lists the features and capabilities of our proposed system relative to others. The detailed flow chart that illustrates the steps and processes of our method is shown in Fig. 1.

Fig. 1 System Architecture

Fig. 2 Vehicle Detection and License Plate Recognition

3.1 Vehicle Detection and License Plate Detection

In plate recognition systems, plate detection is usually performed first, and character recognition is then performed on the detected area. In our study, we first performed plate detection with YOLOv3 based on Darknet-53. Darknet is an open-source neural network framework written in C and CUDA. We set the batch size to 64 and the number of subdivisions to 8. As shown in Fig. 2, the license plate is positioned differently on different vehicle types. Therefore, we trained the YOLOv3 network accordingly and used the BIT-Vehicle data set for this. This data set contains 6 classes of vehicles, namely buses, microbuses, minivans, sedans, SUVs, and trucks. These classes include 558, 883, 476, 5,922, 1,392, and 822 vehicles, respectively.
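To make the training parameters and the class imbalance concrete, the following minimal sketch computes how many images Darknet places on the GPU per forward pass (the batch divided by the subdivisions) and the share of each vehicle class. The class counts are those listed above; the dictionary keys and function names are illustrative assumptions, and the divisibility check is a simplification of this sketch rather than a Darknet requirement.

```python
from typing import Dict

# Class counts for the BIT-Vehicle data set as listed in the text.
# The key names are illustrative, not the data set's official labels.
CLASS_COUNTS: Dict[str, int] = {
    "bus": 558,
    "microbus": 883,
    "minivan": 476,
    "sedan": 5922,
    "suv": 1392,
    "truck": 822,
}

BATCH = 64        # images contributing to one weight update
SUBDIVISIONS = 8  # number of chunks the batch is split into


def images_per_forward_pass(batch: int, subdivisions: int) -> int:
    """Return the number of images processed on the GPU at once."""
    if batch % subdivisions != 0:
        # Simplification for this sketch: require an even split.
        raise ValueError("batch must be divisible by subdivisions")
    return batch // subdivisions


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's fraction of the total number of vehicles."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("images per forward pass:", images_per_forward_pass(BATCH, SUBDIVISIONS))
    print("total vehicles:", sum(CLASS_COUNTS.values()))
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"{name:9s} {share:6.1%}")
```

With these settings, 8 images are processed per forward pass, and sedans make up roughly 58.9% of the 10,053 vehicles, so the data set is strongly imbalanced toward one class.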

3.2 Distributed Publish Subscribe Message Platform

With the developed system, users enter the license plate number they want to query. Because a topic is created with each detected plate number, the user sees only images of that vehicle.

Apache Kafka is an open-source stream processing platform developed by LinkedIn [35], in which producers publish messages to topics and consumers receive messages from the topics they subscribe to. A topic is a category of messages and may be stored in one or more partitions in the Apache Kafka cluster. The cluster consists of brokers, also known as Kafka servers, across which all partitions and their replicas are distributed.
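The relationship between producers, topics, partitions, and consumers can be illustrated with a small in-memory model. This is only a toy stand-in written for illustration, not the Kafka client API: the class `ToyBroker`, its methods, the byte-sum partitioner, and the plate and camera names are all assumptions.

```python
from typing import Dict, List


class ToyBroker:
    """In-memory stand-in for a Kafka cluster (illustration only)."""

    def __init__(self, partitions_per_topic: int = 2) -> None:
        self.partitions_per_topic = partitions_per_topic
        # topic name -> list of partitions, each an append-only message list
        self._topics: Dict[str, List[List[bytes]]] = {}

    def publish(self, topic: str, key: str, value: bytes) -> int:
        """Append a message to a partition chosen from the key.

        Returns the index of the partition that received the message.
        """
        partitions = self._topics.setdefault(
            topic, [[] for _ in range(self.partitions_per_topic)]
        )
        # Deterministic toy partitioner; real Kafka clients hash the key differently.
        index = sum(key.encode("utf-8")) % len(partitions)
        partitions[index].append(value)
        return index

    def read_all(self, topic: str) -> List[bytes]:
        """Return every message stored under a topic, partition by partition."""
        return [msg for part in self._topics.get(topic, []) for msg in part]


if __name__ == "__main__":
    broker = ToyBroker(partitions_per_topic=2)
    # One topic per plate number, as in the proposed system (plates are made up).
    broker.publish("PLATE-A", key="camera-1", value=b"frame-001")
    broker.publish("PLATE-B", key="camera-1", value=b"frame-002")
    broker.publish("PLATE-A", key="camera-2", value=b"frame-003")
    print(broker.read_all("PLATE-A"))
```

A consumer that reads the topic named after one plate receives only that vehicle's frames, which is the property the query feature relies on. Note that ordering is preserved only within a partition, so a consumer that needs a global order has to sort the messages itself.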

Fig. 3 Avro Metadata

Fig. 4 Constructed video clip of given plate

As shown in Fig. 1, we used 3 brokers, 2 topics, 2 partitions, and a replication factor of 1. We use the Apache Avro format to create the metadata that is sent over the cluster. The metadata contains three pieces of information: the date the image was saved, the camera ID, and the vehicle image in byte array format. Figure 3 shows how Apache Avro was used to serialize these JSON records before publishing them to Kafka.
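As a rough illustration of such metadata, the sketch below writes a possible Avro record schema for the three fields and checks a record against it using only the standard library. The record name, namespace, field names, and the choice of `string` for the date are assumptions made for this example; the schema actually used is the one shown in Fig. 3, and real serialization would be done with an Avro library.

```python
import json
from typing import Any, Dict

# Hypothetical Avro record schema for the three metadata fields in the text.
VEHICLE_FRAME_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "VehicleFrame",       # assumed name
    "namespace": "example.lpr",   # assumed namespace
    "fields": [
        {"name": "date", "type": "string"},       # date the image was saved
        {"name": "camera_id", "type": "string"},  # ID of the capturing camera
        {"name": "image", "type": "bytes"},       # vehicle image as a byte array
    ],
}

# Python types accepted for the Avro primitive types used above.
_PYTHON_TYPES = {"string": str, "bytes": bytes}


def conforms(record: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Check that a record has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(
        isinstance(record[field["name"]], _PYTHON_TYPES[field["type"]])
        for field in fields
    )


if __name__ == "__main__":
    print(json.dumps(VEHICLE_FRAME_SCHEMA, indent=2))
    sample = {"date": "2023-01-15", "camera_id": "camera-1", "image": b"\xff\xd8"}
    print("sample conforms:", conforms(sample, VEHICLE_FRAME_SCHEMA))
```

Declaring the image as `bytes` lets the frame travel in the same message as its date and camera ID, so a consumer can filter and order frames without a separate lookup.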

3.3 Vehicle Trace Video Construction

Our proposed method introduces a clear strategy for managing large volumes of video data. It improves the user experience by allowing users to access relevant video clips directly, without browsing through long recordings. The system is also designed with scalability in mind: as more surveillance feeds or vehicles are added, it remains manageable because new Kafka topics are easy to add.

When initiating a query via the web interface, users specify the start date, end date, and license plate number to be queried. Each query is then assigned a UUID. The server subscribes to the Kafka topic named after the specified license plate number, using the assigned UUID as the consumer group ID, and begins reading the messages under this topic asynchronously. Because each query uses a fresh group ID, its consumer does not share offsets with any earlier query, which ensures that all messages are retrieved without any loss of data. Upon receipt, each message is decoded and then filtered by date. Messages whose date headers fall outside the user-defined range are ignored and not retained in memory. The selected messages are sorted first by date and then by the timestamp (in milliseconds) of the captured image.
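The filtering and ordering steps described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the system's actual code: the `FrameMessage` type, its field names, the helper names, and the choice to treat both range endpoints as inclusive are hypothetical, and the messages are assumed to be already decoded.

```python
import uuid
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List


@dataclass(frozen=True)
class FrameMessage:
    """A decoded message (hypothetical shape)."""
    captured_at: datetime  # capture time with millisecond resolution
    camera_id: str
    image: bytes


def new_group_id() -> str:
    """Return a fresh UUID string to use as the consumer group ID of one query."""
    return str(uuid.uuid4())


def select_frames(
    messages: Iterable[FrameMessage], start: date, end: date
) -> List[FrameMessage]:
    """Keep messages dated within [start, end] and order them for playback."""
    # Out-of-range messages are skipped and never stored.
    in_range = (m for m in messages if start <= m.captured_at.date() <= end)
    # Sort by date first, then by the full capture timestamp.
    return sorted(in_range, key=lambda m: (m.captured_at.date(), m.captured_at))


if __name__ == "__main__":
    frames = [
        FrameMessage(datetime(2023, 1, 16, 8, 0, 0, 250000), "camera-2", b"c"),
        FrameMessage(datetime(2023, 1, 15, 9, 30, 0, 500000), "camera-1", b"b"),
        FrameMessage(datetime(2023, 1, 15, 9, 30, 0, 120000), "camera-1", b"a"),
        FrameMessage(datetime(2023, 2, 1, 12, 0, 0), "camera-1", b"late"),
    ]
    print("group id:", new_group_id())
    for frame in select_frames(frames, date(2023, 1, 15), date(2023, 1, 31)):
        print(frame.captured_at.isoformat(timespec="milliseconds"), frame.camera_id)
```

Filtering with a generator before sorting means that frames outside the requested range are never held in a list, which mirrors the requirement that they are not retained in memory.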

Once the data are sorted, we use the OpenCV library to compile the frames into a video, which is then saved on the server in MP4 format, as seen in Fig. 4.

4 Experimental Results

All our experiments were performed on a machine with an Intel® Core i9-9900KF CPU @ 3.60 GHz × 16 and GeForce RTX 2080 Ti/PCIe/SSE2 graphics. The software and build settings were cuDNN 7.6.5, CUDNN_HALF = 1, GPU count 2, and OpenCV 4.1.2.

4.1 Plate Detection and Recognition

We used the F1 score and the IoU metric to track the accuracy of the YOLOv3 model. The model was trained by labeling the plates of different types of vehicles separately, and the ground-truth labels were created with the LabelImg [47] tool. Our trained model achieved IoU = 79.01%, precision = 0.96, recall = 0.98, F1 score = 0.97, and average precision = 98.28%, as shown in Table 2.
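For reference, the two metrics can be computed as in the sketch below, which uses the standard definitions of IoU for axis-aligned boxes and of the F1 score as the harmonic mean of precision and recall. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions of this example, not the evaluation code used in the experiments.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed format.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall reported in the text.
    print(f"F1 = {f1_score(0.96, 0.98):.2f}")
    # Two made-up boxes that overlap in a 5 x 5 square.
    print(f"IoU = {iou((0, 0, 10, 10), (5, 5, 15, 15)):.3f}")
```

Plugging in the reported precision of 0.96 and recall of 0.98 gives an F1 score that rounds to 0.97, which agrees with the reported value.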

Table 2 YOLOv3 Performance on BIT-Vehicle Dataset

4.2 Apache Kafka

Apache Kafka can run both locally and in a Docker container. A Docker-based microservice architecture provides many advantages, but its performance may differ from that of a local installation. Fig. 5 shows the application throughput (messages/second). Messages were sent in batches of 10, 100, 1,000, 10,000, and 100,000.

Fig. 5 Throughputs of platform-based distributed stream processing

The primary objective of this experiment was to measure the throughput (messages per second) of an application using Kafka, comparing its performance in a Docker environment against a local setup. Throughput was observed by transmitting messages in different batch sizes: 10, 100, 1,000, 10,000, and 100,000.

As depicted in Fig. 5, the performance varies with the size of the message batch and the environment. For smaller batch sizes, the throughput in Docker and in the local setup remains relatively close, but as the number of records increases, a noticeable difference in throughput is observed.

For a batch size of 100,000 records, Kafka’s performance on Docker reached a throughput of approximately 163,666.12 messages/second, while locally it achieved around 148,439.97 messages/second.

This shows that in our setup, Docker slightly outperformed the local environment when dealing with larger record numbers.
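The size of this difference can be worked out directly from the two reported throughputs. The short sketch below does so; the function names are illustrative, and the transfer-time estimate assumes that the whole batch of 100,000 messages is sent at the reported rate.

```python
# Throughputs reported in the text for a batch of 100,000 records.
DOCKER_MSGS_PER_S = 163_666.12
LOCAL_MSGS_PER_S = 148_439.97
BATCH_SIZE = 100_000


def relative_gain_percent(candidate: float, baseline: float) -> float:
    """Percentage by which candidate exceeds baseline."""
    return (candidate - baseline) / baseline * 100.0


def seconds_for(messages: int, throughput: float) -> float:
    """Time needed to send the messages at a constant throughput."""
    return messages / throughput


if __name__ == "__main__":
    gain = relative_gain_percent(DOCKER_MSGS_PER_S, LOCAL_MSGS_PER_S)
    print(f"Docker vs. local: {gain:+.2f}%")
    print(f"Docker: {seconds_for(BATCH_SIZE, DOCKER_MSGS_PER_S):.3f} s")
    print(f"Local:  {seconds_for(BATCH_SIZE, LOCAL_MSGS_PER_S):.3f} s")
```

Under these assumptions, the Docker throughput is about 10.26% higher than the local one, which corresponds to roughly 0.61 s versus 0.67 s for the full batch.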

5 Conclusion

In this study, we developed a license plate-based vehicle recognition system using container-based Apache Kafka. Each node processed vehicle license plates and sent them to their topics. Memory capacity has been the biggest limitation of monolithic applications. Although GPUs are widely used, they are still expensive and have limited capacity. With the developed system, we aim to overcome the capacity problem and give users flexibility of use.

In future work, we will focus on vehicles in Turkey and create a data set that contains Turkish license plates. We will also develop a smart search system with Apache Kafka.

6 List of Abbreviations

Abbreviation    Definition
LPR             License Plate Recognition
YOLO            You Only Look Once
YOLOv3          You Only Look Once version 3
OCR             Optical Character Recognition
CNN             Convolutional Neural Network
SVM             Support Vector Machine
ANN             Artificial Neural Network
IoU             Intersection over Union
BGS             Background Modeling and Subtraction
JMS             Java Message Service