Multimodal metadata assignment for cultural heritage artifacts

Rei, Luis; Mladenic, Dunja; Dorozynski, Mareike; Rottensteiner, Franz; Schleider, Thomas; Troncy, Raphaël; Lozano, Jorge Sebastián; Salvatella, Mar Gaitán

doi:10.1007/s00530-022-01025-2

Regular Paper
Published: 21 November 2022

Volume 29, pages 847–869, (2023)

Multimedia Systems

Luis Rei¹,²,
Dunja Mladenic¹,
Mareike Dorozynski³,
Franz Rottensteiner³,
Thomas Schleider⁴,
Raphaël Troncy⁴,
Jorge Sebastián Lozano⁵ &
Mar Gaitán Salvatella⁵


Abstract

We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are image, text, and tabular data. The image classifier is based on a ResNet convolutional neural network architecture, and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by gradient tree boosting. We also show how we leveraged a specific data model and taxonomy in a knowledge graph to create the dataset and to store classification results.
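
The late fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the country vocabulary, the toy samples, and the choice to encode a missing modality as zeros plus a presence flag are all assumptions made for illustration. A production system would likely use a library such as XGBoost; here a tiny least-squares booster built from decision stumps stands in for it so that the example has no dependencies, and it scores a single binary target ("velvet" versus the rest).

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical label set and tabular vocabulary, for illustration only.
CLASSES = ["damask", "velvet", "embroidery"]
COUNTRIES = ["ES", "FR", "IT"]

Stump = Tuple[int, float, float, float]  # feature index, threshold, left value, right value


def fuse(image_probs: Optional[Sequence[float]],
         text_probs: Optional[Sequence[float]],
         country: Optional[str]) -> List[float]:
    """Concatenate per-modality class probabilities with one-hot tabular data.

    Assumption: a missing modality becomes zeros plus a 0.0 presence flag.
    """
    features: List[float] = []
    for probs in (image_probs, text_probs):
        features += list(probs) if probs is not None else [0.0] * len(CLASSES)
        features.append(0.0 if probs is None else 1.0)
    features += [1.0 if country == c else 0.0 for c in COUNTRIES]
    return features


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Exhaustively search for the one-split tree with the lowest squared error."""
    best_error, best = float("inf"), None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            error = (sum((r - l_mean) ** 2 for r in left)
                     + sum((r - r_mean) ** 2 for r in right))
            if error < best_error:
                best_error, best = error, (j, threshold, l_mean, r_mean)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best


def fit_boosting(X: List[List[float]], y: List[float],
                 rounds: int = 30, learning_rate: float = 0.5):
    """Gradient boosting for the loss 0.5 * (target - prediction) ** 2."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        # For this loss the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, l_val, r_val = fit_stump(X, residuals)
        stumps.append((j, thr, l_val, r_val))
        preds = [p + learning_rate * (l_val if row[j] <= thr else r_val)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps


def predict(model, row: List[float]) -> float:
    """Score one fused feature vector; higher means more likely 'velvet'."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if row[j] <= t else r for j, t, l, r in stumps)


# Toy records: (image probabilities, text probabilities, country, is_velvet).
SAMPLES = [
    ([0.1, 0.8, 0.1], [0.2, 0.7, 0.1], "ES", 1.0),
    (None,            [0.1, 0.8, 0.1], "IT", 1.0),
    ([0.2, 0.7, 0.1], None,            "IT", 1.0),
    ([0.7, 0.2, 0.1], [0.6, 0.2, 0.2], "FR", 0.0),
    ([0.1, 0.2, 0.7], None,            "ES", 0.0),
    (None,            [0.1, 0.1, 0.8], "FR", 0.0),
]
X = [fuse(image, text, country) for image, text, country, _ in SAMPLES]
y = [label for *_, label in SAMPLES]
model = fit_boosting(X, y)
scores = [predict(model, row) for row in X]
```

The point of the design is that the fusion model only sees the outputs of the unimodal classifiers, never their internals, so a record lacking an image or a description can still be scored from whatever modalities are available.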

Acknowledgements

This work was supported by the Slovenian Research Agency and the European Union’s Horizon 2020 research and innovation program under SILKNOW grant agreement No. 769504.

Author information

Authors and Affiliations

Department for Artificial Intelligence, Jožef Stefan Institute, Jamova Cesta 39, 1000, Ljubljana, Slovenia
Luis Rei & Dunja Mladenic
Jožef Stefan Institute International Postgraduate School, Jamova Cesta 39, 1000, Ljubljana, Slovenia
Luis Rei
Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, Nienburger Straße 1, D-30167, Hannover, Germany
Mareike Dorozynski & Franz Rottensteiner
EURECOM, Campus SophiaTech, 450 Route des Chappes, 06410, Biot, France
Thomas Schleider & Raphaël Troncy
Departamento de Historia del Arte, Universitat de València, Av. Blasco Ibáñez 28, 46010, Valencia, Spain
Jorge Sebastián Lozano & Mar Gaitán Salvatella

Corresponding author

Correspondence to Luis Rei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rei, L., Mladenic, D., Dorozynski, M. et al. Multimodal metadata assignment for cultural heritage artifacts. Multimedia Systems 29, 847–869 (2023). https://doi.org/10.1007/s00530-022-01025-2

Received: 30 May 2022
Accepted: 11 November 2022
Published: 21 November 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00530-022-01025-2
