Parcel Tracking by Detection in Large Camera Networks

  • Sascha Clausen
  • Claudius Zelenka
  • Tobias Schwede
  • Reinhard Koch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11269)


Inside parcel distribution hubs, some of the up to 100,000 parcels processed each day get lost. Human operators must then tediously recover these parcels by searching through large amounts of video footage from the installed large-scale camera network. We want to assist these operators and work towards an automatic solution. The challenge lies both in the size of the hub, with its high number of cameras, and in the adverse imaging conditions. We describe and evaluate an industry-scale tracking framework based on state-of-the-art methods such as Mask R-CNN. Moreover, we adapt Siamese-network-inspired feature vector matching with a novel feature improver network, which increases tracking performance. Our calibration method exploits a calibration parcel and is suitable for both overlapping and non-overlapping camera views. It requires little manual effort and needs only a single drive-by of the calibration parcel for each conveyor belt. With these methods, most parcels can be tracked from start to end.
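The abstract's tracking-by-detection pipeline hinges on associating per-frame detections with existing tracks via appearance feature vectors. The following is a minimal sketch of such an association step, not the authors' implementation: it assumes L2-normalised embeddings (e.g. from a Siamese network) and uses the Hungarian method via SciPy; the function name and the distance threshold are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_detections(track_feats, det_feats, max_cost=0.5):
    """Match new detections to existing tracks by appearance.

    track_feats, det_feats: arrays whose rows are L2-normalised
    appearance feature vectors (e.g. Siamese-network embeddings).
    Returns (track_idx, det_idx) pairs whose cosine distance stays
    below max_cost; unmatched detections would start new tracks.
    """
    # Cosine distance matrix: 0 = identical appearance, 2 = opposite.
    cost = 1.0 - track_feats @ det_feats.T
    # Globally optimal assignment (Hungarian method, cf. Kuhn 1955).
    rows, cols = linear_sum_assignment(cost)
    # Reject assignments that are too dissimilar to be the same parcel.
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if cost[r, c] < max_cost]
```

The threshold turns the forced one-to-one assignment into a gated one, so a parcel leaving one camera's view does not get glued to an unrelated detection.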


Keywords: Multi-object tracking · Tracking by detection · Instance segmentation · Camera network calibration



This work was supported by the Central Innovation Programme for SMEs of the Federal Ministry for Economic Affairs and Energy of Germany under grant agreement number 16KN044302.

Supplementary material

Supplementary material 1 (mp4 6877 KB)

Supplementary material 2 (mp4 11745 KB)

Supplementary material 3 (txt 1 KB)


  1. Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Process. 2008 (2008)
  2. Bewley, A., Ge, Z., Ott, L., Ramos, F.T., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing, pp. 3464–3468 (2016)
  3. Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, pp. 2544–2550 (2010)
  4. Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a Siamese time delay neural network. In: Advances in Neural Information Processing Systems, vol. 6, pp. 737–744 (1993)
  5. Chahyati, D., Fanany, M.I., Arymurthy, A.M.: Tracking people by detection using CNN features. Proc. Comput. Sci. 124, 167–172 (2017)
  6. Danelljan, M., Khan, F.S., Felsberg, M., van de Weijer, J.: Adaptive color attributes for real-time visual tracking. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1090–1097 (2014)
  7. Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F.J., Marín-Jiménez, M.J.: Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recogn. 47(6), 2280–2292 (2014)
  8. Grabner, H., Grabner, M., Bischof, H.: Real-time tracking via on-line boosting. In: Proceedings of the British Machine Vision Conference 2006, pp. 47–56 (2006)
  9. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
  10. Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016)
  11. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 702–715. Springer, Heidelberg (2012)
  12. Kalal, Z., Mikolajczyk, K., Matas, J.: Forward-backward error: automatic detection of tracking failures. In: 20th International Conference on Pattern Recognition, pp. 2756–2759 (2010)
  13. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012)
  14. Kang, K., Ouyang, W., Li, H., Wang, X.: Object detection from video tubelets with convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–825 (2016)
  15. Karaca, H.N., Akınlar, C.: A multi-camera vision system for real-time tracking of parcels moving on a conveyor belt. In: Yolum, I., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 708–717. Springer, Heidelberg (2005)
  16. Kroeger, T., Timofte, R., Dai, D., Van Gool, L.: Fast optical flow using dense inverse search. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 471–488. Springer, Cham (2016)
  17. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2, 83–97 (1955)
  18. Leal-Taixé, L., Canton-Ferrer, C., Schindler, K.: Learning by tracking: Siamese CNN for robust target association. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 418–425 (2016)
  19. Li, Y., Qi, H., Dai, J., Ji, X., Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4438–4446 (2017)
  20. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014)
  21. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016)
  22. Lukezic, A., Vojír, T., Zajc, L.C., Matas, J., Kristan, M.: Discriminative correlation filter tracker with channel and spatial reliability. Int. J. Comput. Vis. 126(7), 671–688 (2018)
  23. Matterport: Mask R-CNN for object detection and segmentation
  24. Milan, A., Leal-Taixé, L., Reid, I.D., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking (2016)
  25. Milan, A., Rezatofighi, S.H., Dick, A.R., Reid, I.D., Schindler, K.: Online multi-target tracking using recurrent neural networks. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 4225–4232 (2017)
  26. Radke, R.J., Andra, S., Al-Kofahi, O., Roysam, B.: Image change detection algorithms: a systematic survey. IEEE Trans. Image Process. 14(3), 294–307 (2005)
  27. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018)
  28. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
  29. Shin, I.S., Nam, S.H., Yu, H.G., Roberts, R.G., Moon, S.B.: Conveyor visual tracking using robot vision. In: Proceedings of 2006 Florida Conference on Recent Advances in Robotics, pp. 1–5. Citeseer (2006)
  30. Tang, Z., Miao, Z., Wan, Y.: Background subtraction using running Gaussian average and frame difference. In: Ma, L., Rauterberg, M., Nakatsu, R. (eds.) ICEC 2007. LNCS, vol. 4740, pp. 411–414. Springer, Heidelberg (2007)
  31. Tomasi, C., Kanade, T.: Detection and tracking of feature points. Technical Report CMU-CS-91-132, Carnegie Mellon University (1991)
  32. Wang, X., Türetken, E., Fleuret, F., Fua, P.: Tracking interacting objects optimally using integer programming. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 17–32. Springer, Cham (2014)
  33. Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: Advances in Neural Information Processing Systems, vol. 18, pp. 1473–1480 (2005)
  34. Zeiler, M.D.: ADADELTA: an adaptive learning rate method (2012)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Kiel University, Kiel, Germany
