Collaborative Track Analysis, Data Cleansing, and Labeling

  • George Kamberov
  • Gerda Kamberova
  • Matt Burlick
  • Lazaros Karydas
  • Bart Luczynski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6938)


Tracking output is an attractive source of labeled data sets that could, in turn, be used to train other systems for tracking, detection, recognition, and categorization. Long tracking sequences are particularly important in this context because they provide richer information, multiple views, and a wider range of appearances. This paper addresses two obstacles to using tracking data for training: noise in the tracking data, and the unreliability and slow pace of hand labeling. The paper introduces a criterion for detecting inconsistencies (noise) in large data collections and a method for selecting typical representatives of consistent collections. These are used to build a pipeline that cleanses the tracking data and employs instantaneous (shotgun) labeling of vast numbers of images. The shotgun-labeled data is shown to compare favorably with hand-labeled data when used in classification tasks. The framework is collaborative: it involves a human in the loop, but it is designed to minimize the burden on the human.
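
The abstract does not spell out the consistency criterion or the representative-selection method, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each track is a list of Euclidean feature vectors, uses the medoid as a stand-in for a "typical representative", and treats a track as consistent when no vector lies farther than a hypothetical threshold `max_spread` from that medoid. The `ask_human` callback is likewise hypothetical: it stands for the single human judgment whose answer is then applied to every image in the track at once.

```python
from math import dist


def medoid_index(vectors):
    """Index of the vector with the smallest total distance to all others."""
    totals = [sum(dist(v, w) for w in vectors) for v in vectors]
    return min(range(len(vectors)), key=totals.__getitem__)


def is_consistent(vectors, max_spread):
    """Assumed criterion: every vector lies within max_spread of the medoid."""
    center = vectors[medoid_index(vectors)]
    return max(dist(v, center) for v in vectors) <= max_spread


def shotgun_label(tracks, ask_human, max_spread):
    """Label whole tracks from one human answer per consistent track.

    tracks: dict mapping a track id to a non-empty list of feature vectors.
    ask_human: hypothetical callback (track_id, representative_index) -> label.
    Returns (labels per track, ids of tracks set aside as inconsistent).
    """
    labeled, rejected = {}, []
    for track_id, vectors in tracks.items():
        if not is_consistent(vectors, max_spread):
            rejected.append(track_id)  # left for cleansing, not labeled
            continue
        representative = medoid_index(vectors)
        label = ask_human(track_id, representative)
        labeled[track_id] = [label] * len(vectors)  # one answer, many images
    return labeled, rejected


# Toy data: track "a" is tight, track "b" contains an outlier.
tracks = {
    "a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "b": [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)],
}
labeled, rejected = shotgun_label(
    tracks, lambda track_id, index: "person", max_spread=0.5
)
```

In this toy run the human is consulted once, for track "a" only, and track "b" is held back because of its outlier. A real pipeline would presumably try to repair or split such a track rather than discard it, which this sketch does not attempt.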


Keywords: Feature Vector; Tracking Data; Surveillance Video; Label Data; Tracker Output
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • George Kamberov (1)
  • Gerda Kamberova (2)
  • Matt Burlick (1)
  • Lazaros Karydas (1)
  • Bart Luczynski (1)

  1. Department of Computer Science, Stevens Institute of Technology, Hoboken, USA
  2. Department of Computer Science, Hofstra University, Hempstead, USA
