Mining exoticism from visual content with fusion-based deep neural networks

  • Andrea Ceroni
  • Chenyang Ma
  • Ralph Ewerth
Regular Paper


Exoticism is the charm of the unfamiliar or of something remote. It has received significant interest in different kinds of arts, but although visual concept classification in images and videos for semantic multimedia retrieval has been researched for years, the visual concept of exoticism has not yet been investigated from a computational perspective. In this paper, we present the first approach to automatically classify images as exotic or non-exotic. We have gathered two large datasets that cover exoticism in a general as well as a concept-specific way. The datasets were annotated via crowdsourcing; to reduce the influence of cultural differences on the annotations, only North American crowdworkers were employed for this task. Two deep learning architectures for learning the concept of exoticism are evaluated. Besides deep learning features, we also investigate the usefulness of hand-crafted features, which are combined with deep features in our proposed fusion-based approach. Different machine learning models are compared with the fusion-based approach, which performs best, reaching an accuracy of over 83% and 91% on the two datasets, respectively. Comprehensive experimental results provide insights into which features contribute the most to recognizing exoticism. The estimation of image exoticism could be applied in fields such as advertising and travel suggestions, as well as to increase the serendipity and diversity of recommendations and search results.
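The fusion idea described above — concatenating deep features with hand-crafted descriptors before training a classifier — can be sketched as follows. This is a generic early-fusion illustration, not the authors' exact pipeline: the feature dimensions and the SVM classifier are assumptions, and random vectors stand in for features that a real system would extract from images.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-ins for the two feature types: deep features (e.g. activations
# of a CNN layer) and hand-crafted descriptors (e.g. color and texture
# statistics). A real pipeline would compute these from the images;
# random vectors are used here purely to illustrate the fusion step.
n_images = 200
deep_features = rng.normal(size=(n_images, 128))
handcrafted_features = rng.normal(size=(n_images, 32))
labels = rng.integers(0, 2, size=n_images)  # 1 = exotic, 0 = non-exotic

# Early fusion: concatenate both feature vectors per image.
fused = np.hstack([deep_features, handcrafted_features])

# Train any standard classifier on the fused representation.
X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.25, random_state=0
)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("fused feature dimension:", fused.shape[1])
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

With real image features and labels, the same concatenate-then-classify pattern lets the classifier weigh deep and hand-crafted cues jointly.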


Keywords: Image retrieval · Visual concept classification · Exoticism




Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
  2. Visual Analytics Research Group, Leibniz Information Centre for Science and Technology (TIB), Hannover, Germany
