Interactive Feature Growing for Accurate Object Detection in Megapixel Images

  • Julius Schöning
  • Patrick Faion
  • Gunther Heidemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9913)


Automatic object detection in megapixel images is inaccurate and expensive in both time and memory, even with feature detectors and descriptors such as SIFT, SURF, ORB, and KAZE. In this paper, we propose an interactive feature growing process that draws on the efficiency of the users’ visual system. Because the performance of the visual system in search tasks is not affected by pixel density, the users’ gaze is used to boost feature extraction for object detection.

Experimental tests of the interactive feature growing process show a \(50\,\%\) increase in processing speed for object detection in 20 megapixel scenes at an object detection rate of \(95\,\%\). Based on this method, we discuss the prospects of interactive features, possible use cases, and further developments.
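To illustrate why gaze guidance can cut the cost of feature extraction, the following minimal sketch restricts processing to square windows around gaze fixations. It is not the authors' implementation: the function names, the window size, and the 5472 × 3648 image size (about 20 megapixels) are assumptions made for illustration only. A feature detector would then be run on the windows rather than on the full image.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive. Hypothetical type alias.
Box = Tuple[int, int, int, int]


def gaze_window(fixation: Tuple[int, int],
                image_size: Tuple[int, int],
                half_size: int) -> Box:
    """Return a square window around a fixation, clipped to the image bounds."""
    x, y = fixation
    width, height = image_size
    left = max(0, x - half_size)
    top = max(0, y - half_size)
    right = min(width, x + half_size)
    bottom = min(height, y + half_size)
    return (left, top, right, bottom)


def box_area(box: Box) -> int:
    """Pixel count of a window; empty or inverted windows count as zero."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def searched_fraction(boxes: List[Box], image_size: Tuple[int, int]) -> float:
    """Upper bound on the fraction of image pixels covered by the windows.

    Overlapping windows are counted more than once, so the result is an
    upper bound, capped at 1.0.
    """
    width, height = image_size
    total = sum(box_area(b) for b in boxes)
    return min(1.0, total / (width * height))


if __name__ == "__main__":
    image_size = (5472, 3648)                  # assumed ~20 megapixel image
    fixations = [(3000, 2000), (100, 100)]     # assumed gaze positions in pixels
    windows = [gaze_window(f, image_size, 256) for f in fixations]
    print(windows)
    print(searched_fraction(windows, image_size))
```

Under these assumptions, a few windows of this size cover only a small percentage of the image, which is the intuition behind using gaze to narrow the search region.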


Feature growing, Interactive object detection, Eye tracking, Multivariate detectors



This work was funded by the German Research Foundation (DFG) as part of the Priority Program “Scalable Visual Analytics” (SPP 1335).

Supplementary material

Supplementary material 1: 431902_1_En_39_MOESM1_ESM.mp4 (6188 KB)
Supplementary material 2: 431902_1_En_39_MOESM2_ESM.pdf (2698 KB)



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Julius Schöning (1)
  • Patrick Faion (1)
  • Gunther Heidemann (1)
  1. Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
