
CornerNet: Detecting Objects as Paired Keypoints


Abstract

We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
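
To make the corner pooling operation concrete, the following is a minimal sketch of top-left corner pooling in PyTorch, the framework the authors cite (Paszke et al. 2017). It is an illustrative reading of the operation as described here, not the authors' implementation: for each location it takes the maximum over all features to its right and the maximum over all features below it, and sums the two; the bottom-right variant scans in the opposite directions. The function name and the use of torch.cummax/torch.flip are assumptions of this sketch.

    import torch

    def top_left_corner_pool(x):
        # x: feature map of shape (N, C, H, W).
        # Running maximum over all columns to the right of each location
        # (j' >= j): flip along width, take a cumulative max, flip back.
        horiz = torch.flip(torch.cummax(torch.flip(x, dims=[3]), dim=3).values, dims=[3])
        # Running maximum over all rows below each location (i' >= i).
        vert = torch.flip(torch.cummax(torch.flip(x, dims=[2]), dim=2).values, dims=[2])
        # Corner pooling sums the two directional maxima.
        return horiz + vert

    # Example usage: pooled = top_left_corner_pool(torch.randn(1, 256, 128, 128))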


Notes

  1. Unless otherwise specified, our convolution module consists of a convolution layer, a BN layer (Ioffe and Szegedy 2015), and a ReLU layer; a minimal code sketch of this module follows these notes.

  2. We use the best model publicly available on https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md.
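
As a concrete companion to Note 1, here is a minimal sketch of such a convolution module (convolution, then BN, then ReLU) in PyTorch, the framework the authors cite (Paszke et al. 2017). The function name and the kernel size, stride, and padding defaults are illustrative assumptions, not values taken from the paper.

    import torch.nn as nn

    def conv_bn_relu(in_channels, out_channels, kernel_size=3, stride=1):
        # Convolution layer -> BN layer (Ioffe and Szegedy 2015) -> ReLU layer,
        # matching the module composition described in Note 1.
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      stride=stride, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )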

References

  • Bell, S., Lawrence Zitnick, C., Bala, K., & Girshick, R. (2016). Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2874–2883).

  • Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Soft-NMS—Improving object detection with one line of code. In 2017 IEEE international conference on computer vision (ICCV) (pp. 5562–5570). IEEE.

  • Cai, Z., Fan, Q., Feris, R. S., & Vasconcelos, N. (2016). A unified multi-scale deep convolutional neural network for fast object detection. In European conference on computer vision (pp. 354–370). Springer.

  • Cai, Z., & Vasconcelos, N. (2017). Cascade R-CNN: Delving into high quality object detection. arXiv preprint arXiv:1712.00726.

  • Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., & Feng, J. (2017). Dual path networks. In Advances in neural information processing systems (pp. 4470–4478).

  • Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. arXiv preprint arXiv:1605.06409.

  • Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. (2017). Deformable convolutional networks. CoRR, arXiv:1703.06211.

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, CVPR 2009 (pp. 248–255). IEEE.

  • Erhan, D., Szegedy, C., Toshev, A., & Anguelov, D. (2014). Scalable object detection using deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2147–2154).

  • Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.


  • Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., & Berg, A. C. (2017). DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659.

  • Girshick, R. (2015). Fast R-CNN. arXiv preprint arXiv:1504.08083.

  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In European conference on computer vision (pp. 346–361). Springer.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026–1034).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In IEEE CVPR.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448–456).

  • Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. In Computer Vision–ECCV 2018 (pp. 816–832). Springer.

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

  • Kong, T., Sun, F., Yao, A., Liu, H., Lu, M., & Chen, Y. (2017). RON: Reverse connection with objectness prior networks for object detection. arXiv preprint arXiv:1707.01691.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., & Sun, J. (2017). Light-head R-CNN: In defense of two-stage object detector. arXiv preprint arXiv:1711.07264.

  • Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2016). Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144.

  • Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. arXiv preprint arXiv:1708.02002.

  • Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Springer.

  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21–37). Springer.

  • Newell, A., & Deng, J. (2017). Pixels to graphs by associative embedding. In Advances in neural information processing systems (pp. 2168–2177).

  • Newell, A., Huang, Z., & Deng, J. (2017). Associative embedding: End-to-end learning for joint detection and grouping. In Advances in neural information processing systems (pp. 2274–2284).

  • Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European conference on computer vision (pp. 483–499). Springer.

  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in PyTorch.

  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788).

  • Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242.

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91–99).

  • Shen, Z., Liu, Z., Li, J., Jiang, Y.-G., Chen, Y., & Xue, X. (2017a). DSOD: Learning deeply supervised object detectors from scratch. In The IEEE international conference on computer vision (ICCV) (Vol. 3, p. 7).

  • Shen, Z., Shi, H., Feris, R., Cao, L., Yan, S., Liu, D., Wang, X., Xue, X., & Huang, T. S. (2017b). Learning object detectors from scratch with gated recurrent feature pyramids. arXiv preprint arXiv:1712.00886.

  • Shrivastava, A., Sukthankar, R., Malik, J., & Gupta, A. (2016). Beyond skip connections: Top-down modulation for object detection. arXiv preprint arXiv:1612.06851.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Singh, B., & Davis, L. S. (2017). An analysis of scale invariance in object detection - SNIP. arXiv preprint arXiv:1711.08189.

  • Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI (Vol. 4, p. 12).

  • Tychsen-Smith, L., & Petersson, L. (2017a). DeNet: Scalable real-time object detection with directed sparse sampling. arXiv preprint arXiv:1703.10295.

  • Tychsen-Smith, L., & Petersson, L. (2017b). Improving object localization with fitness NMS and bounded IoU loss. arXiv preprint arXiv:1711.00164.

  • Uijlings, J. R., van de Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.


  • Wang, X., Chen, K., Huang, Z., Yao, C., & Liu, W. (2017). Point linking network for object detection. arXiv preprint arXiv:1706.03646.

  • Xiang, Y., Choi, W., Lin, Y., & Savarese, S. (2016). Subcategory-aware convolutional neural networks for object proposals and detection. arXiv preprint arXiv:1604.04693.

  • Xu, H., Lv, X., Wang, X., Ren, Z., & Chellappa, R. (2017). Deep regionlets for object detection. arXiv preprint arXiv:1712.02408.

  • Zhai, Y., Fu, J., Lu, Y., & Li, H. (2017). Feature selective networks for object detection. arXiv preprint arXiv:1711.08879.

  • Zhang, S., Wen, L., Bian, X., Lei, Z., & Li, S. Z. (2017). Single-shot refinement neural network for object detection. arXiv preprint arXiv:1711.06897.

  • Zhu, Y., Zhao, C., Wang, J., Zhao, X., Wu, Y., & Lu, H. (2017). CoupleNet: Coupling global structure with local parts for object detection. In Proceedings of international conference on computer vision (ICCV).

  • Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European conference on computer vision (pp. 391–405). Springer.


Acknowledgements

This work is partially supported by a grant from the Toyota Research Institute and DARPA grant FA8750-18-2-0019. This article solely reflects the opinions and conclusions of its authors.

Author information

Corresponding author

Correspondence to Hei Law.

Additional information

Communicated by Vittorio Ferrari.



About this article


Cite this article

Law, H., Deng, J. CornerNet: Detecting Objects as Paired Keypoints. Int J Comput Vis 128, 642–656 (2020). https://doi.org/10.1007/s11263-019-01204-1

