Modified Object Detection Method Based on YOLO

  • Xia Zhao
  • Yingting Ni
  • Haihang Jia
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 773)


YOLO (You Only Look Once), a 2D object detection method, is extremely fast because a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. However, it makes more localization errors and trains relatively slowly. Drawing on the idea of cluster centers in super-pixel segmentation and of anchor boxes in Faster R-CNN, in this paper we propose a modified method based on YOLO (M-YOLO for short). First, we replace YOLO's last fully connected layer with a convolutional layer, on which the cluster boxes (anchor boxes centered on cluster centers) completely cover the whole image at the beginning of training. As a result, the new structure speeds up the training process. Second, we increase the number of divided grids, i.e. cluster centers, from \( 7\times 7\) to a maximum of \(17\times 17\), as well as the number of predicted bounding boxes, i.e. anchor boxes, from 2 to a maximum of 9 for each grid cell. This measure improves IOU performance. We also put forward a new kind of NMS (non-max suppression) to solve a problem introduced by M-YOLO. Experimental results show that M-YOLO improves localization accuracy by about 10% and also speeds up convergence during training.
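The abstract does not specify the proposed NMS variant, so as a point of reference, here is a minimal sketch of the standard greedy NMS (the baseline that M-YOLO modifies) together with the IoU computation it relies on. All names (`iou`, `nms`, the threshold value) are illustrative and not taken from the paper.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    any remaining box that overlaps it beyond the IoU threshold."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep
```

At the maximum \(17\times 17\) grid with 9 anchor boxes per cell, the detector emits \(17\times 17\times 9 = 2601\) candidate boxes per image, which is why suppressing overlapping detections is a necessary post-processing step.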


Keywords: Deep learning · Object detection · Cluster center · Anchor box


References

  1. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer Vision and Pattern Recognition, pp. 580–587 (2014)
  2. Girshick, R.: Fast R-CNN. In: International Conference on Computer Vision, pp. 1440–1448 (2015)
  3. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
  4. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Computer Vision and Pattern Recognition, pp. 779–788 (2016)
  5. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: single shot multibox detector. In: European Conference on Computer Vision (ECCV), pp. 21–37 (2016)
  6. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. School of Electronics and Information Engineering, Tongji University, Shanghai, China
