1 Introduction

YOLO (You Only Look Once) is an algorithm for detecting objects in an input image or video. Its main objective is to classify the type of each object present in the image and to draw a bounding box around it. This method proves more efficient than earlier methods such as CNN, R-CNN and FR-CNN [1]. A Python script is then developed to count the number of items in each class and display them in detail [4]. Using the concept of machine learning with the YOLO algorithm, we have developed a program to identify the type and number of objects fed as input to the computer. Applications such as traffic control and self-driving cars can then operate without manual errors [2].

2 Proposed System

The objective of the proposed system is to detect the type of objects and display the count of each object in detail. The YOLO algorithm is used for detecting and bounding the objects in the images. Thousands of traffic-signal images are fed to the algorithm for classification and for analyzing the image components. Algorithms such as R-CNN and YOLO have been developed to find these objects in a faster way.

2.1 Literature Survey

Shaif Choudhury describes a vehicle detection technique that can be used for traffic surveillance systems; a traffic monitoring system is proposed with a Haar-based cascade classifier [6]. Rashmika Nawaratne demonstrates a video surveillance system with monitoring, hazard detection and amenity management, noting that existing methods fall short in learning from available videos [7]. Nirmal Purohit proposed a technique for identifying and classifying enemy vehicles in a military defense system, in which HOG and SVM are utilized [3]. Jia-Ping Lin shows that image recognition with YOLO can be applied in many applications of Intelligent Transportation Systems [5].

2.2 Training Process

Training on a dataset ensures that the machine recognizes the input provided to the camera at the time of processing. LabelImg is a manual image-annotation tool used for defining regions in an image and creating a textual description of those regions; this process is called annotation. The training process is illustrated in Fig. 1. First, we added a thousand images for the purpose of training. The images are split in the ratio 1:9 for validation and training, respectively. The trained images are finally checked against the images saved for validation.

Fig. 1.
figure 1

Illustration of the training process
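The validation/training split described above can be sketched in a few lines of Python. This is a minimal illustration, assuming roughly 90% of the images go to training and the rest to validation; the file names are hypothetical.

```python
import random

def split_dataset(image_paths, val_fraction=0.1, seed=42):
    """Shuffle the image paths and split them into training and validation sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a reproducible split
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]  # (training set, validation set)

# A thousand hypothetical image file names, as in the training setup above.
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val = split_dataset(images)
print(len(train), len(val))  # 900 100
```

The fixed random seed makes the split reproducible, so the same validation images are held out on every run.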

In the LabelImg software, annotation is done for each of the thousand images, i.e., all these images undergo a process called annotation in which the coordinates of the objects in each image are marked and named according to their class. After annotation, an .xml file is created; it contains the names of the classes with their coordinates inside the image. The annotated files are arranged serially according to the time of detection. The .xml files are then converted into text files by a text generator; a Python script and a shell script are used to execute this process. Because the training process understands only text files, the text-generator scripts are run before training. The main principle of the annotation process is to make the system learn to name and detect the objects in an image as in the original. The schema chart for the training process is illustrated in Fig. 2.

Fig. 2.
figure 2

Schema chart of training process
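The .xml-to-text conversion step above can be sketched as follows. This assumes Pascal VOC-style annotation files (the format LabelImg writes) and YOLO's normalized `class x_center y_center width height` text format; the class list and coordinate values are illustrative, not taken from the actual dataset.

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_text, class_names):
    """Convert one Pascal VOC-style annotation into YOLO text lines:
    '<class_id> <x_center> <y_center> <width> <height>', all normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        xc = (xmin + xmax) / 2 / img_w   # box center, normalized
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w        # box size, normalized
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines

# Hypothetical single-object annotation for a 640 x 480 image.
xml = """<annotation><size><width>640</width><height>480</height></size>
<object><name>car</name><bndbox><xmin>100</xmin><ymin>120</ymin>
<xmax>300</xmax><ymax>240</ymax></bndbox></object></annotation>"""
print(voc_to_yolo(xml, ["person", "car"]))
# ['1 0.312500 0.375000 0.312500 0.250000']
```

In practice such a script is run over every .xml file, writing one .txt file per image for the trainer to consume.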

2.3 Prediction Process

At the time of processing, a raw image is captured by the camera and sent to the server, as shown in Fig. 3. The server is set up to make the process faster; it contains the program for the YOLO algorithm along with our pre-trained models. The pre-trained models are built from the images we trained after the validation split in the ratio 1:9. The raw image obtained from the camera is now compared with the pre-trained models using YOLO. The algorithm divides the picture into N × N grids and checks each grid for the number of objects inside it. If there is more than one object in a grid, the grid is further divided into multiple grids. Once the unique objects are detected, the boundaries for them are declared.

Fig. 3.
figure 3

Illustration of the prediction process
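The grid step described above amounts to assigning each detected object's center to one cell of the N × N grid. A minimal sketch, where the grid size N and the example coordinates are assumptions for illustration (the original YOLO paper uses a 7 × 7 grid):

```python
def grid_cell(x_center, y_center, img_w, img_h, n=7):
    """Return the (row, col) of the N x N grid cell containing an object's center.

    Coordinates are in pixels; the min() clamp keeps centers that lie exactly on
    the right or bottom edge inside the last cell.
    """
    col = min(int(x_center / img_w * n), n - 1)
    row = min(int(y_center / img_h * n), n - 1)
    return row, col

# A hypothetical object centered at (200, 180) in a 640 x 480 image.
print(grid_cell(200, 180, 640, 480, n=7))  # (2, 2)
```

The cell that an object's center falls into is the cell responsible for predicting that object's bounding box.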

An image with bounded objects and their corresponding names is then produced as the output. According to the code, the number of times each object is detected is counted and the count is displayed. The schema chart of the prediction process is shown in Fig. 4.

Fig. 4.
figure 4

Schema chart of prediction process
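The per-class counting script mentioned above can be sketched in Python. The detection records and the confidence threshold here are hypothetical, standing in for whatever the YOLO pipeline actually emits.

```python
from collections import Counter

def count_detections(detections, conf_threshold=0.5):
    """Count how many times each class appears among sufficiently confident detections."""
    labels = [d["label"] for d in detections if d["confidence"] >= conf_threshold]
    return Counter(labels)

# Hypothetical detections from one frame.
detections = [
    {"label": "car", "confidence": 0.92},
    {"label": "person", "confidence": 0.81},
    {"label": "car", "confidence": 0.67},
    {"label": "car", "confidence": 0.31},  # discarded: below the threshold
]
counts = count_detections(detections)
for label, n in counts.items():
    print(f"{label}: {n}")
```

This is the kind of summary printed in the terminal alongside the annotated output image.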

3 Experimental Results

Once the prediction process is completed, the predicted output image is obtained, with the objects bounded and named in the image. The confidence level of the object detection is increased. The output in the terminal contains the accuracy level and the total count of each object.

Annotation

Annotation is the initial stage of the training process. Fig. 5 shows the objects being marked and named according to their respective classes. A thousand images are trained in this manner for prediction. For more accurate prediction, further images are trained in a similar way.

Fig. 5.
figure 5

Annotating the image

Image After Prediction

The training process should have created a weight file with the details of the similar images. YOLO compares the present image with the details available in the generated weight files and delivers an accurate predicted output. The algorithm then predicts the type of object based on the dataset fed into the training process.
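At prediction time, the network loaded from the weight file produces, for each grid cell, raw output vectors that must be decoded into boxes and class labels before they can be drawn. A rough, framework-agnostic sketch of that decoding step follows; the `[x, y, w, h, objectness, class scores...]` row layout is an assumption modelled on Darknet-style YOLO outputs, and the example values are invented.

```python
def decode_detection(row, img_w, img_h, conf_threshold=0.5):
    """Decode one YOLO output row [x, y, w, h, objectness, class scores...]
    (all normalized to [0, 1]) into a pixel-space box, or None if too weak."""
    scores = row[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = row[4] * scores[class_id]  # objectness x best class score
    if confidence < conf_threshold:
        return None
    cx, cy = row[0] * img_w, row[1] * img_h  # center in pixels
    w, h = row[2] * img_w, row[3] * img_h    # size in pixels
    x, y = int(cx - w / 2), int(cy - h / 2)  # top-left corner
    return {"class_id": class_id, "confidence": confidence,
            "box": (x, y, int(w), int(h))}

# One hypothetical output row for a 640 x 480 image, two classes.
det = decode_detection([0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.95], 640, 480)
print(det)
```

Rows whose combined confidence falls below the threshold are dropped, which is what raises the overall confidence level of the displayed detections.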

The objects can be identified at distances up to 150 mm using the camera; this range varies with the camera. Objects belonging to the same class are marked by the same colour boundary, as shown in Figs. 6, 7 and 8.

Fig. 6.
figure 6

Predicted output with bounded class and names (sample 1)

Fig. 7.
figure 7

Predicted output with bounded class and names (sample 2)

Fig. 8.
figure 8

Predicted output showing the count of each class and increased confidence level (sample 1)

The predicted output consists of objects bounded by a box and named correspondingly, as shown in Fig. 9.

Fig. 9.
figure 9

Predicted output showing the count of each class and increased confidence level (sample 1 & sample 2)

The predicted output consists of objects bounded by a box and named correspondingly, as shown in Fig. 10.

Fig. 10.
figure 10

Predicted output with bounded class and names (sample 3)

4 Conclusion

In this work we have detected and classified objects such as vehicles and human beings from the obtained images. The use of the YOLO algorithm decreased the processing time and produced a more optimised output than the existing methodologies. By fixing a certain area in a captured image, people who violate traffic rules can also be identified and penalised in a digital way. The number of vehicles in the image is found, which can be used for traffic control. YOLO looks at the whole image when predicting boundaries and enforces spatial diversity in its predictions. Self-driving (unmanned) cars can use the proposed technique for detecting the objects in front of them and driving without collisions. The results show that this process is more efficient and faster than the existing methods.