
Spatially-Consistent Feature Matching and Learning for Heritage Image Analysis


Progress in the digitization of cultural assets has produced online databases that are too large for humans to analyze. Moreover, some analyses can be challenging even for experts. In this paper, we explore two applications of computer vision to historical data: watermark recognition and one-shot repeated pattern detection in artwork collections. Both problems pose computer vision challenges that we believe are representative of those encountered in cultural heritage applications: supervision is limited, the tasks require fine-grained recognition, and the data come in several modalities. Both applications are also highly practical: recognizing watermarks makes it possible to date and locate documents, while detecting repeated patterns makes it possible to explore visual links between artworks. On both tasks, we demonstrate the benefits of relying on deep mid-level features. More precisely, we define an image similarity score based on geometric verification of mid-level features and show how spatial consistency can be used to fine-tune out-of-the-box features for the target dataset with weak or no supervision. This paper connects and extends our previous works (Shen et al. in Discovering visual patterns in art collections with spatially-consistent feature learning, 2019; Shen et al. in Large-scale historical watermark recognition: Dataset and a new consistency-based approach, 2020). Our code and data are available at
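The similarity score mentioned above rests on geometric verification of matched mid-level features. The sketch below illustrates only that general idea and is not the paper's implementation. It assumes L2-normalized local descriptors laid out on a feature-map grid, keeps mutual nearest-neighbor matches, and treats each match as a vote for a pure translation on the grid; the score is the summed similarity of the largest translation-consistent set of matches. The translation-only model, the `tol` parameter, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch: translation-only geometric verification of
# mutual nearest-neighbor feature matches. Not the paper's code.


def l2_normalize(feats: np.ndarray) -> np.ndarray:
    """Scale each local descriptor (last axis) to unit length."""
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / np.maximum(norms, 1e-12)


def mutual_nn_matches(feat_a: np.ndarray, feat_b: np.ndarray):
    """Return (idx_a, idx_b, sim) for mutual nearest neighbors.

    feat_a: (Na, C) and feat_b: (Nb, C), both L2-normalized.
    """
    sim = feat_a @ feat_b.T                  # cosine similarities, shape (Na, Nb)
    nn_ab = sim.argmax(axis=1)               # best position in B for each position in A
    nn_ba = sim.argmax(axis=0)               # best position in A for each position in B
    idx_a = np.arange(feat_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a           # keep only A -> B -> A consistent pairs
    idx_a = idx_a[mutual]
    idx_b = nn_ab[mutual]
    return idx_a, idx_b, sim[idx_a, idx_b]


def spatially_consistent_score(fmap_a: np.ndarray, fmap_b: np.ndarray,
                               tol: float = 1.0) -> float:
    """Score two (H, W, C) feature maps by their best translation-consistent match set."""
    ha, wa, c = fmap_a.shape
    hb, wb, _ = fmap_b.shape
    fa = l2_normalize(fmap_a.reshape(-1, c))
    fb = l2_normalize(fmap_b.reshape(-1, c))
    idx_a, idx_b, sims = mutual_nn_matches(fa, fb)
    if idx_a.size == 0:
        return 0.0
    # Grid coordinates (row, col) of the matched positions in each map.
    pos_a = np.stack(np.unravel_index(idx_a, (ha, wa)), axis=1)
    pos_b = np.stack(np.unravel_index(idx_b, (hb, wb)), axis=1)
    shifts = (pos_b - pos_a).astype(float)   # each match votes for one translation
    best = 0.0
    for shift in shifts:                     # test every match's translation hypothesis
        inliers = np.linalg.norm(shifts - shift, axis=1) <= tol
        best = max(best, float(sims[inliers].sum()))
    return best
```

For example, a crop of a random feature map scores close to its number of grid positions when compared with the full map, because every crop position has an exact, consistently shifted match, while an unrelated random map scores lower. The translation-only model keeps the sketch short; a richer transformation model (for instance an affine model fitted with RANSAC) may be preferable when scale or rotation differ between images.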




References

  • Aubry, M., Russell, B. C., & Sivic, J. (2014). Painting-to-3d model alignment via discriminative visual elements. ACM Transactions on Graphics (ToG).

  • Belongie, S., Malik, J., & Puzicha, J. (2001). Shape context: A new descriptor for shape matching and object recognition. In NeurIPS.

  • Bender, K. (2015). Distant viewing in art history. A case study of artistic productivity. International Journal for Digital Art History (1).

  • Bounou, O., Monnier, T., Pastrolin, I., Shen, X., Benevent, C., Limon-Bonnet, M. F., et al. (2020). A web application for watermark recognition. Journal of Data Mining and Digital Humanities.

  • Brendel, W., & Bethge, M. (2019). Approximating cnns with bag-of-local-features models works surprisingly well on imagenet. In ICLR.

  • Briquet online.

  • Briquet, C. M. (1907). Les filigranes.

  • Brueghel family: Jan Brueghel the Elder. The Brueghel family database. University of California, Berkeley. Accessed October 16, 2018.

  • Castellano, G., Lella, E., & Vessio, G. (2021). Visual link retrieval and knowledge discovery in painting datasets. Multimedia Tools and Applications.

  • Crowley, E. J., & Zisserman, A. (2013). Of gods and goats: Weakly supervised learning of figurative art. In BMVC.

  • Crowley, E. J., & Zisserman, A. (2016). The art of detection. In ECCV.

  • Crowley, E. J., Parkhi, O. M., & Zisserman, A. (2015). Face painting: Querying art with photos. In BMVC.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR.

  • Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In NeurIPS.

  • Doersch, C., Gupta, A., & Efros, A. A. (2014). Context as supervisory signal: Discovering objects with predictable context. In ECCV.

  • Doersch, C., Gupta, A., & Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. In ICCV.

  • Dutta, A., Gupta, A., & Zisserman, A. (2016). VGG image annotator (VIA).

  • Elezi, I., Vascon, S., Torcinovich, A., Pelillo, M., & Leal-Taixé, L. (2020). The group loss for deep metric learning. In ECCV.

  • Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). CAN: Creative adversarial networks, generating "art" by learning about styles and deviating from style norms. arXiv

  • Frauenknecht, E., & Stieglecker, M. (2015). WZIS – Wasserzeichen-Informationssystem: Verwaltung und Präsentation von Wasserzeichen und ihrer Metadaten. Kodikologie und Paläographie im Digitalen Zeitalter 3: Codicology and Palaeography in the Digital Age 3.

  • Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In CVPR.

  • Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In ICLR.

  • Gidaris, S., & Komodakis, N. (2018). Dynamic few-shot visual learning without forgetting. In CVPR.

  • Ginosar, S., Haas, D., Brown, T., & Malik, J. (2014). Detecting people in cubist art. In Workshop at ECCV.

  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.

  • Gonthier, N., Gousseau, Y., Ladjal, S., & Bonfait, O. (2018). Weakly supervised object detection in artworks. arXiv

  • Gordo, A., Almazan, J., Revaud, J., & Larlus, D. (2017). End-to-end learning of deep visual representations for image retrieval. In IJCV.

  • Grauman, K., & Darrell, T. (2005). Pyramid match kernels: Discriminative classification with sets of image features. In ICCV.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  • Hertzmann, A. (2018). Can computers create art? In Arts. Multidisciplinary Digital Publishing Institute.

  • Hertzmann, A., Jacobs, C. E., Oliver, N., Curless, B., & Salesin, D. H. (2001). Image analogies. In SIGGRAPH.

  • Hiary, H. (2008). Paper-based watermark extraction with image processing. Ph.D. thesis.

  • Hiary, H., & Ng, K. (2007). A system for segmenting and extracting paper-based watermark designs. International Journal on Digital Libraries.

  • Honig, E. (2016). Jan Brueghel and the Senses of Scale. University Park: Pennsylvania State University Press.


  • Jabri, A., Owens, A., & Efros, A. A. (2020). Space-time correspondence as a contrastive random walk. In NeurIPS.

  • Jenicek, T., & Chum, O. (2019). Linking art through human poses. In ICDAR.

  • Karayev, S., Trentacoste, M., Han, H., Agarwala, A., Darrell, T., Hertzmann, A., & Winnemoeller, H. (2013). Recognizing image style. arXiv

  • Kim, S., Kim, D., Cho, M., & Kwak, S. (2020). Proxy anchor loss for deep metric learning. In CVPR.

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv

  • Loing, V., Marlet, R., & Aubry, M. (2018). Virtual training for a real application: Accurate object-robot relative localization without calibration. In IJCV.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. In IJCV.

  • Mao, H., Cheung, M., & She, J. (2017). Deepart: Learning joint representations of visual arts. In ACM Multimedia.

  • Massa, F., Russell, B. C., & Aubry, M. (2016). Deep exemplar 2d–3d detection by adapting from real to rendered views. In CVPR.

  • Mensink, T., & Van Gemert, J. (2014). The rijksmuseum challenge: Museum-centered visual recognition. In ICMR.

  • Noroozi, M., & Favaro, P. (2016). Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV.

  • Paumard, M. M., Picard, D., & Tabia, H. (2018). Jigsaw puzzle solving using local feature co-occurrences in deep neural networks. In ICIP.

  • Picard, D., Gosselin, P. H., & Gaspard, M. C. (2015). Challenges in content-based image indexing of cultural heritage collections. IEEE Signal Processing Magazine.

  • Picard, D., Henn, T., & Dietz, G. (2016). Non-negative dictionary learning for paper watermark similarity. In ACSSC.

  • Piccard, G. (1977). Die Wasserzeichenkartei Piccard im Hauptstaatsarchiv Stuttgart: Wasserzeichen Buchstabe P.

  • Pondenkandath, V., Alberti, M., Eichenberger, N., Ingold, R., & Liwicki, M. (2018). Identifying cross-depicted historical motifs. arXiv

  • Qi, H., Brown, M., & Lowe, D. G. (2018). Low-shot learning with imprinted weights. In CVPR.

  • Rad, M., Oberweger, M., & Lepetit, V. (2018). Feature mapping for learning fast and accurate 3d pose inference from synthetic images. In CVPR.

  • Radenović, F., Tolias, G., & Chum, O. (2016). Fine-tuning cnn image retrieval with no human annotation. In TPAMI.

  • Rauber, C., Tschudin, P., & Pun, T. (1997). Retrieval of images from a library of watermarks for ancient paper identification. In Electronic Visualisation and the Arts.

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS.

  • Rocco, I., Cimpoi, M., Arandjelović, R., Torii, A., Pajdla, T., & Sivic, J. (2018). Neighbourhood consensus networks. In NeurIPS.

  • Said, J., & Hiary, H. (2016). Watermark location via back-lighting modelling and verso registration. Multimedia Tools and Applications.

  • Seguin, B., diLenardo, I., & Kaplan, F. (2017). Tracking transmission of details in paintings. In DH.

  • Shen, X., Darmon, F., Efros, A. A., & Aubry, M. (2020). Ransac-flow: Generic two-stage image alignment. In ECCV.

  • Shen, X., Efros, A. A., & Aubry, M. (2019). Discovering visual patterns in art collections with spatially-consistent feature learning. In CVPR.

  • Shen, X., Pastrolin, I., Bounou, O., Gidaris, S., Smith, M., Poncet, O., & Aubry, M. (2020). Large-scale historical watermark recognition: Dataset and a new consistency-based approach. In ICPR.

  • Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. In SIGGRAPH ASIA.

  • Singh, S., Gupta, A., & Efros, A. A. (2012). Unsupervised discovery of mid-level discriminative patches. In ECCV.

  • Sivic, J., & Zisserman, A. (2003). Video google: A text retrieval approach to object matching in videos. In ICCV.

  • Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In NeurIPS.

  • Strezoski, G., & Worring, M. (2017). Omniart: Multi-task deep learning for artistic data analysis. arXiv

  • Su, H., Qi, C. R., Li, Y., & Guibas, L. J. (2015). Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views. In ICCV.

  • Sun, B., Feng, J., & Saenko, K. (2016). Return of frustratingly easy domain adaptation. In AAAI.

  • Tan, W. R., Chan, C. S., Aguirre, H. E., & Tanaka, K. (2016). Ceci n’est pas une pipe: A deep convolutional network for fine-art paintings classification. In ICIP.

  • Teh, E. W., DeVries, T., & Taylor, G. W. (2020). Proxynca++: Revisiting and revitalizing proxy neighborhood component analysis. In ECCV.

  • timemachine.

  • Úbeda, I., Saavedra, J. M., Nicolas, S., Petitjean, C., & Heutte, L. (2019). Pattern spotting in historical documents using convolutional models. In Proceedings of the 5th international workshop on historical document imaging and processing (pp. 60–65).

  • Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching networks for one shot learning. In NeurIPS.

  • Wallraven, C., Caputo, B., & Graf, A. (2003). Recognition with local features: The kernel recipe. In ICCV.

  • Wang, X., Jabri, A., & Efros, A. A. (2019). Learning correspondence from the cycle-consistency of time. In CVPR.

  • Westlake, N., Cai, H., & Hall, P. (2016). Detecting people in artwork with cnns. In ECCV.

  • Wilber, M. J., Fang, C., Jin, H., Hertzmann, A., Collomosse, J., & Belongie, S. J. (2017). Bam! the behance artistic media dataset for recognition beyond photography. In ICCV.

  • Yin, R., Monson, E., Honig, E., Daubechies, I., & Maggioni, M. (2016). Object recognition in art drawings: Transfer of a neural network. In ICASSP.

  • Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. In IJCV.

  • Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.


Author information

Corresponding author

Correspondence to Xi Shen.

Additional information

Communicated by Katsushi Ikeuchi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shen, X., Champenois, R., Ginosar, S. et al. Spatially-Consistent Feature Matching and Learning for Heritage Image Analysis. Int J Comput Vis 130, 1325–1339 (2022).



Keywords

  • Feature learning
  • Self-supervised learning
  • Artwork analysis
  • Watermark recognition