Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

  • Mehdi Noroozi
  • Paolo Favaro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)


We propose a novel unsupervised learning approach to build features suitable for object detection and classification. The features are pre-trained on a large dataset without human annotation and later transferred via fine-tuning on a different, smaller and labeled dataset. The pre-training consists of solving jigsaw puzzles of natural images. To facilitate the transfer of features to other tasks, we introduce the context-free network (CFN), a siamese-ennead convolutional neural network. The features correspond to the columns of the CFN and they process image tiles independently (i.e., free of context). The later layers of the CFN then use the features to identify their geometric arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. We pre-train the CFN on the training set of the ILSVRC2012 dataset and transfer the features on the combined training and validation set of Pascal VOC 2007 for object detection (via fast RCNN) and classification. These features outperform all current unsupervised features with \(51.8\,\%\) for detection and \(68.6\,\%\) for classification, and reduce the gap with supervised learning (\(56.5\,\%\) and \(78.2\,\%\) respectively).


Unsupervised learning Image representation learning Self-supervised learning Feature transfer 



We thank Philipp Krähenbühl for his assistance with the experiments on Pascal VOC 2007 and for kindly evaluating our CFN weights on his configuration for classification with Pascal VOC 2007.


  1. 1.
    Agrawal, P., Girshick, R., Malik, J.: Analyzing the performance of multilayer neural networks for object recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 329–344. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10584-0_22 Google Scholar
  2. 2.
    Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. In: ICCV (2015)Google Scholar
  3. 3.
    Barlow, H.B.: Unsupervised learning. Neural Comput. 1, 295–311 (1989)CrossRefGoogle Scholar
  4. 4.
    Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)CrossRefzbMATHGoogle Scholar
  5. 5.
    Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. PAMI 35(8), 1798–1828 (2013)CrossRefGoogle Scholar
  6. 6.
    Boulard, H., Kamp, Y.: Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 59, 291–294 (1988)MathSciNetCrossRefzbMATHGoogle Scholar
  7. 7.
    Chen, D.M., Baatz, G., Koser, K., Tsai, S.S., Vedantham, R., Pylvanainen, T., Roimela, K., Chen, X., Bach, J., Pollefeys, M., Girod, B., Grzeszczuk, R.: City-scale landmark identification on mobile devices. In: CVPR (2011)Google Scholar
  8. 8.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)Google Scholar
  9. 9.
    Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: ICCV (2015)Google Scholar
  10. 10.
    Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: a deep convolutional activation feature for generic visual recognition. In: ICML (2014)Google Scholar
  11. 11.
    Everingham, M., Eslami, S.M.A., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. IJCV (2014)Google Scholar
  12. 12.
    Freeman, H., Garder, L.: Apictorial jigsaw puzzles: the computer solution of a problem in pattern recognition. IEEE Trans. Electron. Comput. EC–13, 118–127 (1964)CrossRefGoogle Scholar
  13. 13.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)Google Scholar
  14. 14.
    Girshick, R.: Fast r-cnn. In: ICCV (2015)Google Scholar
  15. 15.
    Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. PAMI (2015)Google Scholar
  16. 16.
    Hinton, G.E., Sejnowski, T.J.: Learning and relearning in boltzmann machines. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1 (1986)Google Scholar
  17. 17.
    Hinton, G.E., Zemel, R.S.: Autoencoders, minimum description length and helmholtz free energy. NIPS (1993)Google Scholar
  18. 18.
    Hooper, H.: The Hooper Visual Organization Test. Western Psychological Services, Los Angeles (1983)Google Scholar
  19. 19.
    Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)Google Scholar
  20. 20.
    Jason, Y., Jeff, C., Anh, N., Thomas, F., Hod, L.: Understanding neural networks through deep visualization. In: Deep Learning Workshop, ICML (2015)Google Scholar
  21. 21.
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. ACM-MM (2014)Google Scholar
  22. 22.
    Krähenbühl, P., Doersch, C., Donahue, J., Darrell, T.: Data-dependent initializations of convolutional neural networks. In: ICLR (2016)Google Scholar
  23. 23.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. NIPS (2012)Google Scholar
  24. 24.
    Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., Ng, A.: Building high-level features using large scale unsupervised learning. In: ICML (2012)Google Scholar
  25. 25.
    Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR (2015)Google Scholar
  26. 26.
    Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: a strategy employed by v1? Vision Research (1997)Google Scholar
  27. 27.
    Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: feature learning by inpainting. In: CVPR (2016)Google Scholar
  28. 28.
    Pomeranz, D., Shemesh, M., Ben-Shahar, O.: A fully automated greedy square jigsaw puzzle solver. In: CVPR (2011)Google Scholar
  29. 29.
    Pomeranz, D.: Solving the square jigsaw problem. Ph.D. thesis, Ben-Gurion University of the Negev (2012)Google Scholar
  30. 30.
    Richardson, J., Vecchi, T.: A jigsaw-puzzle imagery task for assessing active visuospatial processes in old and young people. Behavior Research Methods, Instruments, & Computers (2002)Google Scholar
  31. 31.
    Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science (2000)Google Scholar
  32. 32.
    Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. In: ICLR (2014)Google Scholar
  33. 33.
    Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. NIPS (2014)Google Scholar
  34. 34.
    Smolensky, P.: Information processing in dynamical systems: Foundations of harmony theory. Parallel Distributed Processing (1986)Google Scholar
  35. 35.
    Tybon, R.: Generating Solutions to the Jigsaw Puzzle Problem. Ph.D. thesis, Griffith University (2004)Google Scholar
  36. 36.
    Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: ICCV (2015)Google Scholar
  37. 37.
    Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? NIPS (2014)Google Scholar
  38. 38.
    Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10590-1_53 Google Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. 1.Institute of InformaticsUniversity of BernBernSwitzerland

Personalised recommendations