1 Introduction

The number of digital images is constantly growing. However, it is not only the sheer volume of images that makes it impossible for the individual researcher to gain an overview of the whole art world; the different relationships among artworks make for even more complexity. Therefore, art experts such as Grasskamp and Lange (2016, radio interview) and Lange and Voss (2015) claim that the increasing number of artworks and the growing complexity of the relations among artworks lead to a situation where the original tasks of museums (preserving, collecting, researching, and teaching (Standard Deutscher Museumsbund, 2006)) can no longer be fulfilled in conventional ways. W. Grasskamp stated in an interview that an art expert would have to visit nine exhibitions a day to see all exhibitions held in Germany alone. Nobody can examine such a number of images and their relationships by viewing the images one after another. This is why we introduce LadeCA.View. LadeCA.View enables experts to describe a collection (here and in the following, the term collection refers to both a collection and an exhibition) of visual art in such a way that other experts can get an overview of the intention, content, and structures of the collection within a short period of time, without looking at each image individually. However, LadeCA.View can also be used as an interface to probe more deeply into a collection.

LadeCA.View is part of LadeCA, a language used to analyze, describe, and explore collections of visual art. The concept of LadeCA was developed and described by Pflüger (2021). LadeCA.View visualizes collections of visual art in an interactive manner and is based on a collection of visual art paired with a controlled vocabulary (Harpring, 2010). Each item (a LadeCA word) of the vocabulary represents a unit of meaning and is assigned a classifier that determines whether an image is labeled by the specific word. LadeCA.View uses the classifiers of the words to create a set of images for each word—the images of the collection that are labeled with the corresponding word. The sizes of these subsets indicate how prominent the words of a vocabulary are in the collection, and since the image sets of the words can overlap, they also indicate correlations and structures within the collection. In this way, LadeCA.View visualizes not only the content of a given collection/vocabulary pair but also the relationships of the words referring to the given collection; that is, LadeCA.View visualizes the content, characteristics, and structures of a collection. This information is not only shown formally but directly via the images at hand. With LadeCA.View, the images of a collection are always shown in the context of the entire collection and can thus be experienced in the context of the entire collection.

As described above, LadeCA.View enables a user to survey, use and understand a collection of images as a whole, and a user can select individual images for their research based solely on their own criteria while maintaining an awareness of the relationship between the selection and the entire collection. In contrast, a recommendation system strongly influences the selection of images that a user chooses and uses; therefore, the user loses direct reference to the entire collection.

With collections containing significantly more than 100,000 images, the interactivity and the visualizations of LadeCA.View become insufficient. If a recommendation system is used for such large collections, then the system can significantly increase the number of recommended images by using LadeCA.View; therefore, the connection between the presented images and the entire collection is not lost as much as with the conventional presentation of images.

LadeCA.View uses a LadeCA vocabulary to structure the collection to be displayed. The LadeCA vocabulary is created and applied using algorithms with considerable input from expert knowledge. A high level of efficiency is achieved through automation, while the integration of expert knowledge leads to a quality comparable to that achieved with manual indexing. LadeCA can be used for all types of images. However, the creation of a LadeCA vocabulary is optimized for images from the field of visual art. For images from other areas, LadeCA can create a vocabulary based on information taken from other systems. Any system that structures a collection by forming subsets of related images (e.g., recommendation systems or indexing systems) can be used. In such cases, LadeCA automatically creates a LadeCA vocabulary from the given image subsets, which is then used for the presentation with LadeCA.View.

The paper is structured as follows. With regard to the possibilities already in existence or currently under discussion, we will show in Section 2 the background of LadeCA.View in order to highlight LadeCA.View’s contribution to the research in the field of working with and presenting visual art. Section 3 and Section 4 show the way LadeCA words are created, their properties, and the way they are used in LadeCA.View. Section 5 shows LadeCA.View in the context of related systems. The functions and the design of LadeCA.View are shown and justified in Section 6. Section 7 outlines three case studies and thus LadeCA.View’s scope of applicability. In Section 8 we conclude with some final remarks.

2 Related work

If only the images of a collection are to be presented, there are basically two options. One option is to show the images one after the other (a slide show); the other is to show the images side by side. It is obvious that neither visualization is suitable for large collections. That is why large collections are usually broken down into smaller subject areas, which are then shown separately according to user requests. Researchers are shown predefined subject areas or they can select subject areas by means of metadata. Nowadays, most museums offer access to their collections via digital databases in the manner just described; among them are the British Museum (www.britishmuseum.org), the Rijksmuseum (www.rijksmuseum.nl), and the MET (www.metmuseum.org). Other institutions of various kinds also offer their digital image databases (which are sometimes considerably larger than those of museums) for research purposes (Getty Images (www.gettyimages.de), akg-images (www.akgimages.de), or Prometheus (www.prometheus-bildarchiv.de)). In all of these image databases, a user can search for images with the aid of metadata (e.g., an artist, theme, or object type). In the museums mentioned above, many of the images can be downloaded as open-access images in high resolution. Often, users are given additional images related to a particular image; these are usually images with the same metadata. In this way, a user can obtain information about individual images and put together small samples; that is the current situation with publicly accessible image databases used by art historians. This type of image search has the decisive disadvantage that only images that are somehow known to the researcher can be found; that means that some of the associated metadata must be already known to the researcher. Far more advanced methods of searching for images are offered by recommendation systems, which are outlined next.

2.1 Recommendation systems

A simple description of recommendation systems is reflected by the slogan “if you like that, you will like this” (Gibney, 2014). The slogan points to the basic principle of such systems; the system determines the user’s preference and searches the database for additional items that could be of interest to the user, e.g., items that are strongly related to items that have already been viewed by the user. The objective of the above slogan is simple, but the realization of a recommendation system is very complex and depends heavily on both the objects and the objectives of the user (as well as the objectives of the operator of the system). It is very hard for algorithms to determine a user’s interests on the basis of keywords and item samples and to transform these interests into database queries. In order to determine such database queries, a system has to generate relations between items (and keywords) that consider the changing interests of the current user. Jafarkarimi (2012) describes a simple procedure (collaborative filtering) that calculates such relations with the help of a statistical analysis of user behavior. Other algorithms use the content of items to calculate relations between the items (content-based recommendation); such algorithms have been described, e.g., by Pazzani and Billsus (2007). Nowadays there are very sophisticated systems for extracting information relevant to a user from databases. These systems differ with regard to the type of objects sought, the functions offered, the size of the databases and the way the objects are represented (Easley and Kleinberg, 2010, Section 14). The areas of application for such systems are very diverse. Be it literature, scientific reports, product sources, videos, images or web pages, the number of digitally accessible items in all these areas is now so large that it can no longer be surveyed by a single person without a recommendation system.
There are various functions for recommendation systems (from the points of view of the user and the operator) (Ricci et al., 2011; Baran, 2017): “find some good items; find all good items; annotation in context; recommend a sequence; recommend a bundle; platform-independent; scalable; compliance with copyright rules; increase the number of items sold; sell more diverse items; increase user satisfaction; increase user fidelity; better understand what the user wants.” The methods and algorithms of the recommendation systems that dominate the Internet are highly optimized and strongly adapted to the respective tasks and object types.

Recommendation systems enable users to find the data that are relevant to them, even in large databases, and this is sufficient for many research questions. However, the goal of LadeCA.View is to convey to a user the properties and potential of an unfamiliar collection as a whole. Thus, the focus is not on specific user interests or questions, but rather on the potential of a collection and its underlying intention.

2.2 Modeling

A single person cannot view all the individual images of a large collection within a feasible period of time. In large collections, it is in particular the relations between the individual images, which give rise to the potential and the intention of the collection, that are too complex to be grasped (within a feasible period of time) by a single person viewing solely the individual images. Here, a model of a collection helps to describe the collection’s content, potential, and intention. The model can be informal, usually a descriptive text. With LadeCA, however, there is a formal language to create a model of an image collection using a controlled vocabulary (Harpring, 2010).

Using models to describe and handle complex systems is an old and common practice. Standardized construction plans for buildings, for example, have been used for decades. The standards for such plans implement a language for models of buildings to document them in detail and in all their complexity so that they can be built. In software engineering, an ER (Chen, 2002) or RDF model (Online document Wikipedia, 2022b) is commonly used to represent complex systems. Such a model describes data structures, integrity constraints, operations, and relations between the entities, and can be implemented in a database and visualized with the help of standardized graphic elements. Object-oriented programming not only helps to design and implement complex software systems; the result also implies a model with which the resulting system can be described at different levels of complexity.

With LadeCA, a user can create controlled vocabularies. An image set is divided into (not necessarily disjoint) subsets. The subsets then form the words (items) of the controlled vocabulary. Relationships between the words are given by the affinity of the images associated with the words. This approach is inspired by the Mnemosyne Atlas conceived by Aby Warburg (1866–1929) (Michaud, 2004). Aby Warburg formed units of meaning (words) by grouping images and arranged these words in such a way that their spatial proximity to each other reflected the relationships of the words. This led to an atlas of related words (Mnemosyne Atlas), which roughly corresponds to a controlled vocabulary. A controlled vocabulary can also be thought of as a semantic network (Online document Wikipedia, 2022a) or an indexing system (e.g., ICONCLASS (Couprie, 1983) or AAT (Petersen, 1990)). We use the term “controlled vocabulary” to emphasize that it allows for descriptions (of sets of images), somewhat comparable to descriptive text. LadeCA implements a controlled vocabulary. An extension of the usual controlled vocabularies (in the field of visual arts) is that sample images (prototypes) are given for each word and, of particular importance, a classifier automates the assignment of images to the individual words. Applying a LadeCA vocabulary to a collection creates a model of the collection that includes the content, focus, and structures (synonymy, hyponymy, and antonymy) of the collection. LadeCA.View consists of interactive visual interfaces that make the model accessible and editable.

Relations between the objects are also determined in recommendation systems, resulting (in turn) in the creation of structures (as in a model). The reasons for using a model instead of a recommendation system for working with a collection of artworks are, first, the assumption that a collection of artworks has a known underlying intention and proposition and, second, that the focus when interacting with users via a model is on conveying that intention and proposition.

2.3 Excerpts and accumulations of image databases

If only part of a collection is of interest to a user, or if the operator of the collection only wants to present part of the collection, it makes sense to only present this part; that is the idea underlying recommendation systems (Section 2.1). With LadeCA, the part that is to be presented can be restricted during the creation of the vocabulary. This makes sense if only part of the collection is to be presented. However, users can also restrict the selection according to their own criteria while using LadeCA.View. The zoom function of LadeCA.View allows a user to deliberately narrow down the area to be examined. Selecting individual words from a vocabulary, the user can limit the range that is presented. This has the advantage that not only the limited part of the collection is being made visible, but also that the user has an overview of what is not presented through the deliberate choice of (LadeCA) words. If the user wants an extension of the subset, LadeCA.View adds additional words to the subset. This recommendation is based on the (content-based) relationships between individual words.

There are platforms that can pool different image databases. Prometheus (www.prometheus-bildarchiv.de), for example, is one such platform. Various types of image databases are made available here, but the user only sees the Prometheus interface and does not have to think about which database is actually used to deliver the desired images. Such a platform has the advantage that new image databases can be added while the owner maintains full control over the added database. Such constructions are not possible with LadeCA, but different (LadeCA) databases can be automatically merged, even if the databases have different metadata. However, if there are different metadata, the performance of the classifiers degrades.

3 Words in LadeCA

Although the idea behind LadeCA words has already been described by Pflüger (2021), this description is worth reiterating here because it is also essential for understanding LadeCA.View.

A LadeCA word (only referred to as a “word” if the meaning as a LadeCA word is given by the context) is determined based on related images. The types of relationships between the images of a word can be very different. Figure 1 shows three different types of relationships. In the top row, all the images show the same object – a seated person – and all the images are sketches. In the middle row, all the images are related by virtue of their style. In the bottom row, all the images are by the artist Weiran Wang.

Fig. 1
Different types of relatedness. (Artists from top to bottom: Weiran Wang, Christine Gläser, and Weiran Wang, respectively)

3.1 Semiotic triangle

In LadeCA the semiotic triangle (Fig. 2) for LadeCA words is seen as follows:

  • A word is a sign expressed by an image or ideogram.

  • A word’s descriptive meaning is a set of image samples (possibly with metadata and text) which describe the meaning of the word at hand. In this sense, the descriptive meaning is as manifold as the descriptive meaning in natural language, and we assume that usage or explicit stipulations also lead to a common descriptive meaning for each word.

  • The denotation of a word is the set of potential images that can be labeled by the given word (extensional definition). Each word is associated with a unique classifier (intensional definition) that decides for each given image whether it can be labeled by the corresponding word or not.

Fig. 2
The semiotic triangle for a content word in general (Löbner, 2013)

3.2 Word formation in LadeCA

The semantic concept of LadeCA words follows prototype theory (Löbner, 2013). In prototype theory, the concept of a given word has a real-world example (a prototype) that best represents this concept. Following this theory, a word can be defined by a prototype. Concepts in art include works of art that serve as real-world examples. In this manner, LadeCA words are described through sample images which represent the concepts underlying the words; these are also called prototypes. In the field of art, the concepts that can be assigned to a category are not restricted and therefore vary. In most cases, it is necessary to describe a category using several prototypes, with each single prototype defining a concept of the category. The classifiers and associated comparison functions are generated automatically by LadeCA during the word formation process (Fig. 3). The word formation process starts with an existing LadeCA word or a set of positive examples (prototypes; together with metadata) that describe, by way of example, the category of the word to be formed. Then, for each positive example, LadeCA generates a classifier that decides whether a given image is related to the positive example or not. In the next step, all images of the basis set are classified, and the result is presented to the user for correction. Figure 4 shows the interface for the correction process. The left-hand side of the interface shows the images that are classified as positive; the order in which the images are shown corresponds to the strength of their relatedness to the category, and the images framed in white are the given positive examples. The right-hand side shows the images that are classified as negative; the order corresponds to the strength of their relatedness to the category, and the images framed in white are the negative examples.
The user can make corrections; if there are images that are not related to the category on the left-hand side (images classified as positive), the user can mark them as not related (with a red frame); and if there are images that are related to the category on the right-hand side (images classified as negative), the user can mark them as related (with a green frame). The correction does not have to be exhaustive, i.e., it is sufficient to correct only a few incorrectly classified images. In the next iteration, LadeCA uses the corrections to derive new classifiers. Because the images are sorted according to their degree of relatedness to the category, only the top rows need to be looked at when making corrections on the right-hand side because the classification is uncertain only for these images. It is the responsibility of the user to decide whether all concepts that should be taken into account when creating a category are represented by the sample set. If a category is not fully represented by the positive examples, and if examples are not shown in the top rows on the right-hand side, further images that represent the missing concept – found by other means, e.g., via metadata – can be inserted into the sample set. After a word has been confirmed as correct by a user, LadeCA optimizes the classifier. Positive and negative examples that are not required for a correct classification are no longer used for classification.

  • If all images (from the base set) of a word are given, then LadeCA can use them to automatically create the word (i.e., the classifiers for it).

  • If the images of a word are given in order (e.g., frames of a video clip, or the images of a narrative sequence) then a word can be defined as a sequence. The order of the images of a sequence can be determined interactively by the user with the word formation interface. A sequence has all the properties of a word, but the images of the sequence are all defined as positive image samples and are not removed by LadeCA (during the final optimization).
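The iterative word formation process described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LadeCA's actual implementation: the source does not specify how the classifiers are built, so the feature vectors, the cosine-similarity score, the threshold of 0.5, and all function names (`cosine`, `score`, `classify`, `form_word`) are hypothetical. The sketch treats each positive example as its own small classifier (an image is related if it is similar enough to at least one prototype), sorts both sides by relatedness, and feeds the user's corrections into the next iteration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score(image, positives, negatives, features, threshold):
    """Hypothetical relatedness score: similarity to the closest positive
    example minus the larger of the threshold and the similarity to the
    closest negative example. A score >= 0 counts as related."""
    pos = max(cosine(features[image], features[p]) for p in positives)
    neg = max((cosine(features[image], features[n]) for n in negatives), default=-1.0)
    return pos - max(neg, threshold)

def classify(features, positives, negatives, threshold=0.5):
    """Split the basis set into accepted and rejected images,
    each list sorted by descending relatedness to the category."""
    scores = {i: score(i, positives, negatives, features, threshold) for i in features}
    ranked = sorted(features, key=scores.get, reverse=True)
    accepted = [i for i in ranked
                if i in positives or (i not in negatives and scores[i] >= 0)]
    accepted_set = set(accepted)
    rejected = [i for i in ranked if i not in accepted_set]
    return accepted, rejected

def form_word(features, initial_positives, ask_user, max_rounds=10):
    """Iterate classification and user correction until no corrections remain.
    Assumes at least one positive example is always present."""
    positives, negatives = set(initial_positives), set()
    accepted, rejected = classify(features, positives, negatives)
    for _ in range(max_rounds):
        wrongly_accepted, wrongly_rejected = ask_user(accepted, rejected)
        if not wrongly_accepted and not wrongly_rejected:
            break  # the user confirms the word as correct
        negatives = (negatives | set(wrongly_accepted)) - set(wrongly_rejected)
        positives = (positives | set(wrongly_rejected)) - set(wrongly_accepted)
        accepted, rejected = classify(features, positives, negatives)
    return accepted, positives, negatives

# Toy basis set with made-up two-dimensional features.
features = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (0.7, 0.7)}

def ask_user(accepted, rejected):
    """Stand-in for the correction interface: the user rejects image 'd'."""
    return [i for i in accepted if i == "d"], []

accepted, positives, negatives = form_word(features, {"a"}, ask_user)
print(accepted, negatives)
```

In this toy run, image "d" is first accepted, the simulated user marks it as not related, and the second iteration keeps only "a" and "b". The final optimization step (dropping examples that are not needed for a correct classification) is not modeled here.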

Fig. 3
Schematic representation of the formation process of a word. Set of images: Basis set used for the creation of classifiers. Initial set: One or more related images specified by the user or suggested by LadeCA based on an analysis or clustering of the given set of images (or an already existing LadeCA word). I: From the given set of positive and negative examples that describe a category (or the initial set), LadeCA creates a classifier, which in turn determines images potentially belonging to the category. II: LadeCA separates the images generated in step I into belonging to the category and not belonging to it, and the user makes corrections if necessary. III: If corrections have been made, a further iteration step in the process of forming a word category is carried out

Fig. 4
Interface for correcting a LadeCA word category

4 Semantic relations of LadeCA words

Labeling an object/word does not clarify its meaning. Grasping the meaning of an object/word is seeing it in relation to other objects/words (Löbner, 2013; Dewey, 1910; Miller and Johnson-Laird, 1976; Harrison, 1972). Therefore, a language about images has to be able to indicate the relatedness between words. In LadeCA there are two fundamentally different methods of (automatically) examining relatedness between words. One possibility (word-based) determines relatedness between words directly via the word properties, by using the classifiers and prototypes of the words. The second possibility (set-based) is to use a set of images (an exhibition, collection, or corpus) as reference. In LadeCA.View, it is set-based relatedness that is mainly used since only relatedness within the given collection is analyzed; word-based relatedness is used for arranging the reference images of the words in the main view and for the zoom function (Section 6), since here the general word context is more important.

The relatedness R between the words A and B is not symmetrical (in general, \(R_{A,B} \neq R_{B,A}\)). This allows an order (<) to be formulated, particularly to identify the property “A is a subset of B” (A ⊂ B), which is necessary to determine hierarchies.

In the set-based approach, the relatedness \(R_{A,B}\) between words A and B is determined by set theory as follows:

\[ R_{A,B} = \frac{|I_A \cap I_B|}{|I_A|}, \qquad R_{B,A} = \frac{|I_A \cap I_B|}{|I_B|} \]

where \(|I_A|\) (\(|I_B|\)) is the number of images of word A (B) in the reference set, and \(|I_A \cap I_B|\) is the number of images in the intersection of the images of the words A and B in the reference set.
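As a small illustration of the set-based relatedness, the following Python sketch computes both directions for two words whose image sets overlap. The function name `relatedness`, the image identifiers, and the two example words are made up for this example; raising an error for a word without images in the reference set is likewise an assumption, since that case is not defined in the text.

```python
def relatedness(images_a, images_b):
    """Set-based relatedness R_{A,B} = |I_A ∩ I_B| / |I_A|:
    the share of A's images that are also labeled with B."""
    images_a, images_b = set(images_a), set(images_b)
    if not images_a:
        # Assumption: the ratio is undefined for a word without images.
        raise ValueError("word A has no images in the reference set")
    return len(images_a & images_b) / len(images_a)

# Hypothetical words with overlapping image sets.
portraits = {"img1", "img2", "img3", "img4"}
sketches = {"img3", "img4", "img5", "img6", "img7", "img8"}

print(relatedness(portraits, sketches))  # 2 of 4 portraits are sketches: 0.5
print(relatedness(sketches, portraits))  # 2 of 6 sketches are portraits: about 0.33
```

The two values differ because the same intersection is divided by sets of different size, which is exactly the asymmetry used to derive hierarchies.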

Synonymy: The words A and B are considered strictly synonymous if both values \(R_{A,B}\) and \(R_{B,A}\) are equal to 1. This is the case if the words A and B are identical with respect to the reference set, i.e., the images of word A in the reference set are the same as those of word B. Figure 5 shows an example of partial synonymy; the upper row is an image set partially synonymous with the image set in the lower row.

Fig. 5
Example of two sets of images that are partially synonymous. (Artist: Frieder Kühner)

Hyponymy: The word B is a strict hyponym of A (A is a hypernym of B) if \(R_{B,A}\) is equal to 1 and \(R_{A,B}\) is less than 1. This is the case if the images of word B are a strict subset of the images of word A. Figure 6 shows examples of hyponymy; the sets of images in the middle row are hyponyms of the set in the bottom row, while they are hypernyms of the corresponding sets in the top row.

Fig. 6
Examples of hyponymy. (Artist: Weiran Wang)
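The definitions of strict synonymy and strict hyponymy translate directly into comparisons of the two relatedness values. The following sketch shows one possible way to write this down; the function names and the returned strings are illustrative choices, and both image sets are assumed to be non-empty.

```python
def relatedness(images_a, images_b):
    """Set-based relatedness R_{A,B} = |I_A ∩ I_B| / |I_A| (I_A non-empty)."""
    return len(images_a & images_b) / len(images_a)

def strict_relation(images_a, images_b):
    """Name the strict set-based relation between words A and B, if any."""
    r_ab = relatedness(images_a, images_b)
    r_ba = relatedness(images_b, images_a)
    if r_ab == 1 and r_ba == 1:
        return "A and B are strict synonyms"
    if r_ba == 1:  # and r_ab < 1: every image of B is an image of A
        return "B is a strict hyponym of A"
    if r_ab == 1:  # and r_ba < 1: every image of A is an image of B
        return "A is a strict hyponym of B"
    return "no strict relation"

# Hypothetical image sets.
drawings = {"img1", "img2", "img3", "img4"}
seated_figures = {"img2", "img3"}

print(strict_relation(drawings, seated_figures))  # B is a strict hyponym of A
print(strict_relation(drawings, set(drawings)))   # A and B are strict synonyms
```

Partial synonymy and partial hyponymy would replace the comparison with 1 by a comparison with some lower bound; the text does not give such a bound, so none is assumed here.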

Co-hyponymy: The words \(B_1\), \(B_2\), \(B_3\), etc., are strict co-hyponyms of A if \(R_{B_1,A}\), \(R_{B_2,A}\), \(R_{B_3,A}\), etc., are equal to 1, \(R_{A,B_1}\), \(R_{A,B_2}\), \(R_{A,B_3}\), etc., are less than 1, and the relatedness \(R_{B_i,B_j}\) (with i ≠ j) is 0. This is the case if the words \(B_i\) have no images in common and the images of word A are exactly the union of the images of the words \(B_i\). Figure 7 shows an example of co-hyponymy; the images are divided into two independent parts: colored and gray-scale.

Fig. 7
Example of co-hyponymy
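Strict co-hyponymy amounts to the statement that the words \(B_i\) partition the images of A into at least two non-empty parts. A minimal sketch of this check, with a hypothetical function name and made-up image identifiers, could look as follows.

```python
from itertools import combinations

def are_strict_cohyponyms(parent_images, child_image_sets):
    """Check whether the child words are strict co-hyponyms of the parent word."""
    parent = set(parent_images)
    children = [set(c) for c in child_image_sets]
    # R_{Bi,A} = 1 and R_{A,Bi} < 1: each child is a non-empty strict subset of A.
    strict_subsets = all(len(c) > 0 and c < parent for c in children)
    # R_{Bi,Bj} = 0 for i != j: the children share no images.
    disjoint = all(not (x & y) for x, y in combinations(children, 2))
    # The images of A are exactly the union of the images of the children.
    covering = set().union(*children) == parent
    return strict_subsets and disjoint and covering

# Hypothetical example following Fig. 7: colored versus gray-scale images.
all_images = {"img1", "img2", "img3", "img4", "img5"}
colored = {"img1", "img2", "img3"}
gray_scale = {"img4", "img5"}

print(are_strict_cohyponyms(all_images, [colored, gray_scale]))           # True
print(are_strict_cohyponyms(all_images, [colored, {"img3", "img4"}]))     # False
```

The second call fails on two counts: the children share "img3", and "img5" is not covered.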

While strict synonymy, strict hyponymy, and strict co-hyponymy are rare or trivial, cases of partial synonymy, partial hyponymy, and partial co-hyponymy are more significant (Löbner, 2013). Figure 8 shows the difference between strict and partial synonymy, hyponymy, and co-hyponymy for the set-based approach; these differences apply analogously to the word-based approach.

Fig. 8
Euler diagrams to show synonymy, hyponymy, and co-hyponymy for the set-based approach

Opposition/antonymy: When words B1 and B2 are strict co-hyponyms or partial co-hyponyms of a word A, they are opposites/antonyms or gradable antonyms, respectively.

5 LadeCA in the context of related systems

A LadeCA vocabulary is a controlled vocabulary; the items of LadeCA are units of meaning denoted by a classifier that recognizes images that can be labeled with the respective word. The semantic relations, and thereby the semantic structure of a given vocabulary, are described in Section 4. A LadeCA vocabulary is created using LadeCA Interfaces (Section 3); however, it can also be created using any other system that structures a collection by forming subsets of related images. In such cases, LadeCA automatically creates a LadeCA vocabulary (with all its semantic relationships) using the given image subsets.

5.1 LadeCA as a semantic network

Objects:

  • Images

  • LadeCA words

Semantic relations (A, B: LadeCA words; I: An image):

  • \(L_{I,A}\): Image I is labeled with the word A.

  • \(R_{I,A}\): Degree of relatedness between image I and word A as a number between -1 and +1, where -1 means “not related” and +1 means “strongly related” (a threshold defines whether an image is labeled with the corresponding word).

  • \(R^{S}_{A, B}\): Degree of relatedness (set-based; Section 4) between A and B as a number between -1 and +1, where -1 means “not related” and +1 means “equals”.

  • \(R^{W}_{A, B}\): Degree of relatedness (word-based; Section 4) between A and B as a number between -1 and +1, where -1 means “not related” and +1 means “strongly related”.

  • \(H_{A,B}\): A is a hyponym of B (Section 4).

In principle, it is possible to transfer the LadeCA vocabulary to other models. The concrete definition of the objects and relations depends on the model to be used (e.g., ER or RDF models).
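To make the transfer to a triple-oriented model more tangible, the following sketch derives subject-predicate-object triples for two of the relations listed above: the labeling relation (obtained by applying a threshold to the image-word relatedness) and the hyponymy relation (obtained from strict subset tests on the resulting image sets). The predicate names `labeledWith` and `hyponymOf`, the threshold value of 0.0, and the example scores are assumptions made for illustration; they are not part of LadeCA or of any RDF vocabulary.

```python
def build_triples(image_word_scores, threshold=0.0):
    """Turn image-word relatedness values in [-1, 1] into simple triples.

    image_word_scores maps (image, word) pairs to a relatedness value.
    """
    triples = []
    labeled = {}  # word -> set of images labeled with it
    for (image, word), value in sorted(image_word_scores.items()):
        if value >= threshold:  # the threshold decides the labeling
            triples.append((image, "labeledWith", word))
            labeled.setdefault(word, set()).add(image)
    for a in sorted(labeled):
        for b in sorted(labeled):
            # A is a hyponym of B if A's images are a strict subset of B's.
            if a != b and labeled[a] < labeled[b]:
                triples.append((a, "hyponymOf", b))
    return triples

# Hypothetical relatedness values for two images and two words.
scores = {
    ("img1", "sketch"): 0.8,
    ("img1", "seated person"): 0.6,
    ("img2", "sketch"): 0.4,
    ("img2", "seated person"): -0.7,
}

for triple in build_triples(scores):
    print(triple)
```

Such triples could then be mapped onto the entities and relationships of an ER schema or serialized with an RDF library; that mapping is deliberately left open here.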

5.2 LadeCA as an indexing system

A LadeCA vocabulary can be viewed as an indexing system. The words of the vocabulary correspond to the indexes/classes of indexing systems. Structures such as word fields or hierarchies result from the semantic relationships between the words. The difference from the commonly used indexing systems (e.g., AAT or ICONCLASS) is that with LadeCA, there is a classifier for each word, i.e., for each index/class, which automatically decides whether an image is labeled with the corresponding word. This allows indexing to be done automatically; a change in the vocabulary can then be automatically transferred to image sets that have already been indexed. A LadeCA vocabulary is created by specifying sample images in a process supported by algorithms and interfaces, in contrast to the textual description of the individual indexes/classes in conventional systems. The advantage of a LadeCA vocabulary over conventional systems is its greater efficiency and greater flexibility.

For a set of images indexed with a conventional indexing system, LadeCA can automatically create a LadeCA vocabulary so that each indexed set of images can be presented with LadeCA.View. A LadeCA vocabulary created by using an indexed image set can be used for automatically indexing an unindexed image set with the original indexing system. The quality of this process of indexing can be checked and optimized with the appropriate LadeCA interfaces.
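The starting point for creating a vocabulary from an indexed image set is the collection of image subsets that the index induces. A brief sketch of this first step is given below; the class names are placeholders (not real ICONCLASS or AAT notations), the function name is hypothetical, and the subsequent generation of classifiers by LadeCA is not modeled.

```python
from collections import defaultdict

def subsets_from_index(index):
    """Invert an image -> classes index into class -> set of images.
    Each resulting subset would serve as the image set of one word."""
    words = defaultdict(set)
    for image, classes in index.items():
        for index_class in classes:
            words[index_class].add(image)
    return dict(words)

# Hypothetical index with placeholder class names.
index = {
    "img1": ["class_portrait", "class_drawing"],
    "img2": ["class_drawing"],
    "img3": ["class_portrait"],
}

vocabulary = subsets_from_index(index)
print(sorted(vocabulary["class_drawing"]))   # ['img1', 'img2']
print(sorted(vocabulary["class_portrait"]))  # ['img1', 'img3']
```

Because an image may carry several classes, the resulting subsets can overlap, which is what allows the semantic relations of Section 4 to be computed for them.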

5.3 LadeCA.View combined with recommendation systems or search engines

LadeCA.View does not have the functionality of a recommendation system or a search engine; however, LadeCA.View can display the results of such systems. The advantage of this capability is that LadeCA.View can convey images in the context of the entire image base. This makes it easier for the user to experience the connection between a selection of images and the image base.

The procedure is as follows. The first step is specifying the images to be presented by the recommendation system or the search engine; LadeCA.View then determines the LadeCA words associated with these images and their structures. Next, these words (with their semantic structures) are presented via the zoom function of LadeCA.View. A user can reduce the number of presented words or expand them using the corresponding LadeCA.View function. When expanding the number of words, LadeCA.View supplements the current selection with words that are related—a kind of recommendation.
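The procedure can be sketched as two small functions: one that determines the words associated with a set of externally supplied images, and one that expands a word selection by related words. The sketch rests on several assumptions: the function names are hypothetical, the minimum relatedness of 0.5 is arbitrary, and set-based relatedness is used as a stand-in for the relatedness measure that LadeCA.View actually applies when expanding a selection.

```python
def words_for_images(images, vocabulary):
    """Words whose image sets contain at least one of the given images."""
    return {word for word, word_images in vocabulary.items() if word_images & images}

def expand_selection(selection, vocabulary, min_relatedness=0.5):
    """Add every word that is sufficiently related to some selected word.
    Relatedness of a selected word W to a candidate C is |I_W ∩ I_C| / |I_W|."""
    expanded = set(selection)
    for candidate, candidate_images in vocabulary.items():
        if candidate in selection:
            continue
        for word in selection:
            word_images = vocabulary[word]
            if word_images and len(word_images & candidate_images) / len(word_images) >= min_relatedness:
                expanded.add(candidate)
                break
    return expanded

# Hypothetical vocabulary (word -> image ids) and recommender output.
vocabulary = {
    "portrait": {1, 2, 3},
    "sketch": {2, 3, 4},
    "landscape": {5, 6},
}
recommended_images = {1}

selection = words_for_images(recommended_images, vocabulary)
print(selection)                                # {'portrait'}
print(expand_selection(selection, vocabulary))  # 'portrait' and 'sketch'
```

Here two of the three portrait images are also sketches, so "sketch" is added, while "landscape" shares no images with the selection and stays out.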

6 LadeCA.View

The target group of LadeCA.View consists of art experts (art historians, curators, art dealers, artists) who want to obtain information about digitally accessible collections of visual art, examine them, and/or create or edit a digital presentation of a collection. LadeCA automatically calculates the semantic relations among the words of a given collection/vocabulary pair, and LadeCA.View makes the content and the relations between LadeCA words and between images accessible. LadeCA.View is integrated into the system LadeCA; while using LadeCA.View, a user can also use all the functions of LadeCA, and this opens up extensive possibilities for editing a collection. Words can be changed, new words can be inserted, words can easily and quickly be combined into new words, and words can easily and quickly be split into multiple words.

The functionality of LadeCA.View cannot be regarded as complete. Extended analyses make further functions and visualizations necessary, e.g., if the interaction with indexing systems is implemented. At the stage described here, LadeCA.View is focused on the case studies described in Section 7. This results in goals for functions, interactions, and visualizations of LadeCA.View and the resulting design decisions. The intended applications are as follows.

A1. Abstracts. Describing a collection to enable users to get a quick overview of a collection of visual art – like having a tool to create and show an abstract for collections of visual art.

A2. Explore. Enabling users to quickly familiarize themselves with a large collection of visual art.

A3. Describe/analyze. Making it possible to describe existing findings about a collection of visual art in such a way that users can understand and further investigate these findings based on the images of the collection.

We drew on our experience from previous projects (Pflüger et al., 2020; Pflüger, 2021) and on interviews with members of the target group to identify the central issues these experts encounter when working with large image collections. The results led to further requirements for the design of the interactions and visualizations of LadeCA.View:

  • Users cannot be assumed to have any knowledge about setting parameters of the used algorithms. In particular, it must be possible to intuitively create LadeCA words using the given image database.

  • Working with LadeCA.View should be possible in an intuitive way, without time-consuming intermediate steps, and should take place directly via the image material.

  • Properties and relationships between images and (LadeCA) words must be shown directly using the images at hand. This means that the visualizations should not use complex types of representation such as extensive color codes or abstract visualizations.

  • Art historians regard working with originals as essential. Thus, in order to maintain an unobstructed relation with and close resemblance to the originals, the images should not be visually distorted or overlaid with artificial visual elements during handling.

The requirements lead to some basic techniques for visualization and interactions in LadeCA.View.

The user’s focus is derived from the position of the mouse pointer. When the focus has been steady for a while, LadeCA.View interprets this as a request for further information on the focused object and presents more information in a separate view. Explicit requests for information are made with a mouse click. The object clicked on and the context determine the type of information that is required. The desired information is then depicted by groups of images – with the exception of the presentation of descriptive text or metadata. Options for displaying groups of images were discussed by Gleicher (2018). Due to the requirements mentioned above, only the side-by-side representation is an option. Different ways of presenting supplementary information about a focal object were discussed by Cockburn et al. (2009). Since the images should not be visually distorted or overlaid with artificial visual elements during handling, only separate representations are used in LadeCA.View. There is an exception for the words’ (pictorial or symbolic) reference images. These can be marked, and the transparency or the background of the reference images can be used to display information. This is acceptable since the reference images are not seen as works of art. Relationships between individual words are shown by the spatial arrangement of the reference images in the main view; special relationships are rendered by showing and hiding images (which saves markings).
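The implicit request triggered by a steady focus can be illustrated with a small dwell detector. This is a sketch under assumptions: the text does not state how long "a while" is, so the 0.8 s threshold, the sample format, and the name `dwell_requests` are all hypothetical.

```python
def dwell_requests(samples, dwell_seconds=0.8):
    """Yield (time, object_id) once per steady focus period.

    samples is an iterable of (timestamp_seconds, object_id) pairs in
    chronological order; object_id is None while the pointer is over no
    object. The default threshold is an assumed value.
    """
    current = None    # object currently under the pointer
    since = None      # time at which the pointer entered that object
    reported = False  # whether this focus period was already reported
    for t, obj in samples:
        if obj != current:
            # Focus moved: restart the dwell timer for the new object.
            current, since, reported = obj, t, False
        elif obj is not None and not reported and t - since >= dwell_seconds:
            reported = True
            yield t, obj


# Hypothetical pointer trace: word "a", briefly word "b", empty space, "a" again.
trace = [(0.0, "a"), (0.5, "a"), (0.9, "a"), (1.0, "b"), (1.2, "b"),
         (1.3, None), (3.0, None), (3.1, "a"), (4.0, "a"), (5.0, "a")]
print(list(dwell_requests(trace)))
```

Each focus period is reported at most once, so a pointer resting on the same object does not trigger repeated requests.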

The following sections describe the interface of LadeCA.View using the applications A1 – A3 as examples. Due to the high level of interactivity, individual screenshots illustrate LadeCA.View’s functionality only to a limited extent. The interface can be better assessed using the video in the supplementary material.

6.1 LadeCA.View – abstracts

We use the collection from case study 1 (Section 7.2; 700 of 5,000 images) to show how the abstract of a collection is displayed with LadeCA.View. In the top view (see Fig. 9), reference images of dominant words have white backgrounds and opaque reference images. If, on the other hand, only a few images in a collection are labeled with a word, this word is shown with a gray background and the reference image is shown transparently. Intermediate levels then form a scale for the frequency with which individual words are represented in the collection. The main themes are easy to identify: on the one hand, the collection consists of works by Frieder Kühner, Weiran Wang, and to a lesser extent Christine Gläser; on the other hand, the art concepts included in the collection are abstract, landscape/architecture, and figural painting. The reference images are automatically arranged in such a way that words that belong together (word fields) are placed close to each other. In this manner the words that contain images by the individual artists are easy to identify – they are each grouped around the word that contains the totality of the works of art of the respective artist. Information on a word is displayed in the bottom left view as soon as the mouse pointer hovers over a reference image. The reference image is then presented opaquely and is underlined with a thin red line; a text field in the lower left view provides the word ID, the number of images, and a descriptive text (if available); and there are sample images of the word, i.e., the images which the word classifier is based on and which are therefore typical examples of the word.

Fig. 9

LadeCA.View showing the example used in case study 1
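The frequency scale of the top view (white background and opaque reference image for dominant words, gray background and transparent reference image for rare ones) can be sketched as a simple mapping. The text does not specify the actual scale; the linear interpolation, the constants, and the name `reference_image_styles` below are assumptions made for illustration only.

```python
def reference_image_styles(word_images, min_opacity=0.3, dark_gray=155):
    """Map each word to (opacity, background_gray) by relative frequency.

    The most frequent word gets opacity 1.0 on a white background (255);
    rarer words fade linearly toward min_opacity on a dark_gray
    background. Scale and constants are assumed, not taken from LadeCA.
    """
    largest = max((len(images) for images in word_images.values()), default=0)
    styles = {}
    for word, images in word_images.items():
        share = len(images) / largest if largest else 0.0
        opacity = min_opacity + (1.0 - min_opacity) * share
        gray = round(dark_gray + (255 - dark_gray) * share)
        styles[word] = (round(opacity, 2), gray)
    return styles


# Hypothetical vocabulary: one dominant word, one medium word, one empty word.
demo = {"artist": set(range(10)), "abstract": set(range(5)), "rare": set()}
print(reference_image_styles(demo))
```

Normalizing by the largest word rather than by the collection size keeps the full range of the scale in use even when no single word covers most of the collection; this, too, is a design assumption.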

More information is shown as soon as a word is clicked on (Fig. 10; a user can also select several words at the same time). Then the reference image of the word (top view) is marked with a red frame; all images that are labeled with the word are shown fitted into the lower right view; in the top view the reference images of the words that are not related to the selected word are hidden. When the mouse pointer hovers over an image in the lower right view, this image is shown fitted into the left view. If the mouse pointer hovers over a reference image in the top view (Fig. 11), all images that are not labeled with the indicated word are hidden in the lower right view – then the intersection of the marked word (framed red) and the indicated word is visible. In this case, the basic information of the indicated word is shown in the lower left view. Detailed information about an individual image (e.g., metadata) is displayed in a pop-up window if an image is clicked on in the lower left or right view. Multiple words can be automatically combined into a new word. In the example at hand, the words that encompass all images of an artist have each become a new word (the words with the name of the artist as reference images). The words that have been combined are strict co-hyponyms if they are disjoint; otherwise they are partial co-hyponyms. Such structures can be determined and displayed with LadeCA.View. Figure 12 shows all hypernyms of the collection in the top view. Figure 13 shows information on a selected hypernym.

Fig. 10

LadeCA.View after clicking on a word

Fig. 11

LadeCA.View while a user explores words related to a clicked-on word

Fig. 12

LadeCA.View showing co-hypernyms in the top view. Information on the co-hyponym underlined with the thin red line is given in the bottom left view

Fig. 13

LadeCA.View showing a selected co-hypernym (bold, framed red) and its co-hyponyms (bold, underlined red) in the top view. Information on the co-hyponym underlined with the thin red line is given in the views below
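The selection and hover behavior described in this section reduces to set operations on the image sets of the words. The sketch below assumes that two words count as related when they share at least one image, which the text does not state explicitly; all function names and the toy data are hypothetical.

```python
from itertools import combinations


def related_words(selected, word_images):
    """Words sharing at least one image with the selected word (assumed criterion)."""
    base = word_images[selected]
    return {w for w, images in word_images.items()
            if w != selected and images & base}


def visible_images(selected, hovered, word_images):
    """Images shown while hovering: the intersection of both words."""
    return word_images[selected] & word_images[hovered]


def cohyponym_kind(words, word_images):
    """Return 'strict' if the words are pairwise disjoint, else 'partial'."""
    for a, b in combinations(words, 2):
        if word_images[a] & word_images[b]:
            return "partial"
    return "strict"


# Hypothetical toy vocabulary.
vocab = {
    "artist_A": {1, 2, 3, 4},
    "abstract": {3, 4, 5},
    "landscape": {1, 6},
    "still_life": {7},
}
print(related_words("artist_A", vocab))
print(visible_images("artist_A", "abstract", vocab))
print(cohyponym_kind(["abstract", "landscape", "still_life"], vocab))
```

In this toy vocabulary, "still_life" would be hidden after clicking "artist_A" because the two words have no image in common.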

6.2 LadeCA.View – explore

We use the collection that we examined in case study 2 (Section 7.3) as an example to show advanced functions of LadeCA.View. This collection was explored by users having no prior knowledge of the collection. Therefore, the words were created intuitively rather than systematically. This sometimes resulted in related words. LadeCA.View recognizes synonymous words (word fields) and can display these word fields on request. While strict synonymy is rare or trivial, cases of partial synonymy are more significant. With LadeCA.View, the user can set the degree of relatedness that is accepted as synonymous. A useful function when exploring extensive vocabularies is the zoom function. If a user marks some words in the top view (Fig. 14) and then zooms in, the current vocabulary is reduced to these words (Fig. 15), and the new, reduced vocabulary can be examined in the same way as the original one. A reduced vocabulary can be expanded (zoom out) (Fig. 16); this adds words related to a word in the current vocabulary and can be repeated. It is also possible to go back to the original vocabulary in one step.

Fig. 14

Top view of LadeCA.View showing selected words (framed red) for zooming

Fig. 15

Top view of LadeCA.View showing the selected words as a new vocabulary

Fig. 16

Top view of LadeCA.View showing the expanded vocabulary of Fig. 15
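Zooming and the user-defined degree of relatedness can be sketched as follows. The text does not name the relatedness measure used by LadeCA.View; the Jaccard overlap of the image sets is swapped in here purely as an assumption, and the function names, the threshold values, and the toy data are hypothetical.

```python
def relatedness(a, b):
    """Jaccard overlap of two image sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def zoom_in(marked, word_images):
    """Reduce the vocabulary to the marked words."""
    return {w: word_images[w] for w in marked}


def zoom_out(current, full, threshold=0.2):
    """Add every word that is related to some word of the current vocabulary."""
    expanded = dict(current)
    for word, images in full.items():
        if word in current:
            continue
        if any(relatedness(images, own) >= threshold for own in current.values()):
            expanded[word] = images
    return expanded


# Hypothetical vocabulary with two overlapping words and one unrelated word.
full = {
    "parade": {1, 2, 3, 4},
    "procession": {2, 3, 4, 5},
    "circus": {4, 6, 7},
    "dog": {20, 21},
}
reduced = zoom_in(["parade"], full)
print(sorted(zoom_out(reduced, full)))
```

Lowering the threshold makes the expansion more generous; going back to the original vocabulary in one step simply means discarding the reduced dictionary in favor of `full`.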

6.3 LadeCA.View – describe/analyze

We use the collection that we examined in case study 3 (Section 7.4) as an example to show advanced functions of LadeCA.View. This collection is characterized by words that contain variants of events (birth, wedding, death, etc.), scenes (parade, amusement, circus, etc.), as well as certain figures (dog, donkey, bird, etc.) and objects (ladder, gnocchi pot, stone cube, etc.). This leads to a hierarchical structure that LadeCA.View automatically recognizes and is able to display (Fig. 17). The top node of the hierarchy, the word that includes all the words that illustrate variants, is highlighted by a thick red frame. The nodes/words that form sub-hierarchies, or words that contain variants and do not belong to a sub-hierarchy, are underlined in bold red. If the mouse pointer hovers over a word (in Fig. 17 it is the word with rural scenes), its information is displayed in the views below. If a sub-hierarchy is clicked on, the structure of this hierarchy is displayed analogously (Fig. 18). The words of the Tiepolo collection form a construction kit with which potential Tiepolo life stories can be illustrated. There is no preferred life story; rather, an almost infinite number of life stories and variants is plausible – sequences of images in a fixed order. Such sequences can be defined in LadeCA.View (Section 3.2). In the collection described here, three sample sequences are inserted; these are the words whose reference images are marked with five small squares (Fig. 19). With LadeCA.View, a user can not only examine already existing sequences; with simple mouse clicks, a user can also create new sequences with images from the individual categories (scenes, events, figures, objects) in a defined order and insert the new sequences into the existing vocabulary.

Fig. 17

LadeCA.View showing a hierarchical structure of the collection at hand. The top node/word of the hierarchy has a bold red frame. The nodes/words of the next lower level are underlined in bold red. The node/word indicated by the mouse pointer is described in the views below and is underlined thinly in red

Fig. 18

LadeCA.View showing the structure of a sub-hierarchy of the collection at hand. The top node/word of the hierarchy has a thin red frame. The top node/word of the sub-hierarchy has a bold red frame. The nodes/words of the next lower level are underlined in bold red. The node/word indicated with the mouse pointer is described in the views below and is underlined thinly in red

Fig. 19

LadeCA.View with a selected sequence. The mouse pointer hovers over the sequence that is shown in the bottom left view in the defined order
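One way to picture how a hierarchy like the one in Figs. 17 and 18 can be derived from the words is via subset relations between their image sets. The sketch below assumes that a word is a hypernym of another word if it contains all of that word's images and more; this criterion, the function name `direct_hypernyms`, and the toy data are assumptions, not a description of LadeCA's algorithm.

```python
def direct_hypernyms(word_images):
    """Map each word to its smallest proper supersets.

    Word A counts as a hypernym of word B if B's image set is a proper
    subset of A's (an assumed criterion). Only direct parents are kept,
    i.e., hypernyms with no other hypernym of B lying in between.
    """
    parents = {}
    for child, child_images in word_images.items():
        supersets = [w for w, images in word_images.items()
                     if child_images < images]
        parents[child] = {
            p for p in supersets
            if not any(word_images[q] < word_images[p] for q in supersets)
        }
    return parents


# Hypothetical toy vocabulary with one top node and two sub-levels.
tree = {
    "variants": {1, 2, 3, 4, 5, 6},
    "events": {1, 2, 3},
    "birth": {1},
    "wedding": {2, 3},
    "scenes": {4, 5, 6},
}
for word, parent_set in direct_hypernyms(tree).items():
    print(word, "->", sorted(parent_set))
```

A word with an empty parent set is a top node; clicking on a word with children would correspond to displaying the sub-hierarchy rooted at that word.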

6.4 Usability of LadeCA.View

LadeCA.View is currently used in several projects. The interface is constantly being optimized and, if a project requires special visualizations or functionality, expanded. From our point of view, a large, meaningful user study to assess usability would not make sense. However, for a brief evaluation of the interface described here, we conducted a small user study with 10 art history students who had not previously worked with LadeCA.

Applied to three collections, the tasks were, one, specifying the main topics, two, examining one of the main topics in more detail, and, three, detecting and describing word fields and hierarchies. During the examination of the first collection, carried out by the instructor, the tasks of the study and the basic functions of LadeCA.View were explained and demonstrated. The second and third collections were examined by the participants themselves, who were able to ask questions at any time. The plausibility of the results, the processing time, the participants’ own subjective assessments of their results and the usability and usefulness of the interface were then evaluated. The examination of the results showed that all participants were able to solve the given tasks correctly and that the process could be carried out by all participants in approximately 20 minutes per collection. At the end of the study, all participants were able to work independently and rated working with the interface and their results positively.

7 Case studies

The following sections describe three case studies to outline LadeCA.View’s scope of applicability. The presentation of these case studies with the help of LadeCA.View was described in Sections 6.1 – 6.3 and is shown in the video of the supplementary material.

7.1 Basic workflow

LadeCA automatically creates a semantic network for a given collection/vocabulary pair. The LadeCA words form the nodes; the relationships between the nodes are determined automatically. LadeCA.View makes the semantic network accessible. The basic workflow of creating a semantic network consists of defining a vocabulary for a given collection, which can be done in different ways.

  • If a suitable vocabulary is already available – e.g., from a previous project with a similar image basis and goals – this vocabulary can be used. If necessary, it can be adapted to the current project.

  • The collection can be split up into subsets by the user with the support of LadeCA (case 1). LadeCA then automatically creates the corresponding words, and these words can then easily and quickly be combined or subdivided by means of supporting LadeCA interfaces and algorithms.

  • If the task is to explore an unknown collection, words can be created intuitively (case 2). This means a user creates the words of a vocabulary by determining some related images as examples of a word (e.g., with the help of metadata search or similarity search) with which LadeCA automatically generates a word which can then be optimized with the word formation interface. In addition to that, the word formation interface creates a word that contains the images that are not labeled with any word of the given vocabulary. The images of this word point to the part of the collection that is not accounted for by the current vocabulary, which helps a user to decide whether a vocabulary is already sufficient or needs to be expanded.

  • If a collection has already been examined and groups of related images have been created (case 3), words can be determined for these groups (again, automatically, by means of LadeCA). LadeCA.View then shows these groups and their relationships, thus giving a description of the collection and the results of the existing analysis, and enables further investigations. A similar case exists when the collection is already indexed with a suitable indexing system.
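The extra word mentioned for case 2, which collects all images not labeled with any word of the vocabulary, amounts to a set difference. The following sketch assumes words are stored as sets of image IDs; the names `remainder_word` and `coverage` and the toy data are hypothetical.

```python
def remainder_word(collection, word_images):
    """Images of the collection that no word of the vocabulary labels."""
    covered = set().union(*word_images.values())
    return set(collection) - covered


def coverage(collection, word_images):
    """Share of the collection labeled by at least one word."""
    collection = set(collection)
    if not collection:
        return 1.0
    return 1.0 - len(remainder_word(collection, word_images)) / len(collection)


# Hypothetical collection of ten images and a two-word vocabulary.
images = range(1, 11)
vocabulary = {"a": {1, 2, 3}, "b": {3, 4, 5, 6}}
print(sorted(remainder_word(images, vocabulary)))
print(coverage(images, vocabulary))
```

An empty remainder word would indicate that every image is covered by the current vocabulary; a large one suggests that the vocabulary needs to be expanded.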

The three following case studies outline the range of applications that LadeCA.View offers.

7.2 Case study 1

The situation was as follows. We worked with an art corpus that contains around 5,000 works of art from the last 200 years by around 50 artists. The corpus included portraits, individual or groups of people, architecture, landscapes, and abstract paintings, done in different painting and drawing techniques. The images represented different styles. The range ran from photorealistic to distortion and from abstract to constructivist images. The task was to present the works of C. Gläser, W. Wang, and F. Kühner, which were contained in this corpus. To do this, we asked the curator S. Borchardt to isolate the images painted by these artists and split them into disjoint words, with each word comprising related images. In addition, further words were created (each with a reference image showing a corresponding keyword), one word for each artist which comprised their artwork, and one word for each of the following topics: abstract images, figurative images, landscape and architecture, still life, and genre painting. This resulted in a vocabulary of 30 words (Fig. 20). With the support of LadeCA, the curator was able to create the disjoint words in about two hours, and it took him about another hour to create the remaining words. The presentation with LadeCA.View was then generated automatically. In the presentation with LadeCA.View the dominant words are easy to recognize – they are the words with a very light background. With the help of LadeCA.View’s interactive functions, a user can quickly and easily explore the content, relationships, and structures of the collection (see Section 6.1).

Fig. 20

Vocabulary created for case 1 (top view of LadeCA.View showing the reference images of the words)

This case study is an example of a task that often presents itself in practice (e.g., in a museum). The curator of a large collection of artworks wants to use part of the collection for a presentation (e.g., an exhibition). The curator is familiar with the collection, knowing which images he wants to present and with what intention. Perhaps the images to be shown are already assembled, or can be assembled using their metadata. Main groups can now be formed according to the intention of the presentation; this can be done by directly specifying the groups (e.g., using metadata) or using the word formation interface (Section 3.2). This leads to a basic vocabulary that can be refined with the help of the appropriate support of LadeCA algorithms and interfaces, and can be structured by joining words (e.g., to create hierarchies).

7.3 Case study 2

The situation was as follows. We had scans of about 30,000 historic prints (from the Klebealben of the Albertina Vienna). In a study on the evaluation of LadeCA (Pflüger, 2021), 16 students of art history intuitively created 140 words (Fig. 21); this took about 5 to 10 minutes per word. This vocabulary (that is, the 140 words) reflects the content of the collection from the perspective of the participants and enables a user to explore and analyze the content of the collection with LadeCA.View.

Fig. 21

Vocabulary created for case 2 (top view of LadeCA.View showing the reference images of the words)

The participants had no prior experience with the images of the collection. The vocabulary reflects how the participants perceived the collection while creating the words. It can therefore not be assumed that the vocabulary actually represents the content of the collection in a comprehensive way. However, LadeCA offers the ability to check and optimize the vocabulary with appropriate algorithms and interfaces until the vocabulary actually represents the underlying collection. This process is accomplished with LadeCA in interaction with the user, checking whether all images labeled with the same (set of) words actually belong together; in the event that the vocabulary is insufficient, special interfaces offer the possibility to optimize the vocabulary for the corresponding collection with algorithmic support (e.g., by splitting words). Furthermore, LadeCA supports a user in working out hierarchical structures and supplementing the vocabulary accordingly (e.g., by creating supersets). LadeCA’s algorithms and interfaces for optimizing a vocabulary for a given collection of images are still in an experimental state and are currently being evaluated and documented in detail. LadeCA.View already offers a somewhat less complicated way of optimizing a vocabulary. All images in a collection that are not labeled with a word are combined into one word. The user is thus given the opportunity to view these images that are not covered by the vocabulary and to form new words from this set until all images are finally covered by words of the vocabulary. The formation of supersets for working out hierarchical structures is also possible directly with LadeCA.View.

This case study shows an example of a possible solution to problems with the structuring and presentation of very large collections. Art collections that are very extensive, such as the collection of historical prints at the Albertina Vienna (1,000,000 prints), or collections that are constantly being expanded and changed (e.g., the image archive Prometheus), do not have curators who know the entire collections down to all their details. Not only can such collections be structured effectively and economically with LadeCA and presented with LadeCA.View, but extensions and changes can also be integrated simultaneously into the structure and the presentation. Moreover, structuring and presenting collections with LadeCA requires only a small fraction of the time and costs usually required when using standard indexing systems such as ICONCLASS or AAT. Lastly, the close integration of expert knowledge and algorithms through the interfaces of LadeCA guarantees both professional quality and the consideration of all components and aspects of the collection, even as users can incorporate their knowledge, findings, and intention.

7.4 Case study 3

The third case was about a corpus of drawings by the artist Tiepolo; Tumanov (2019) analyzes this collection in detail. The corpus, created between 1797 and 1804, consists of 104 drawings, all showing the figure of Pulcinella at different life stages. The analysis of this corpus resulted in a vocabulary of 33 words (Fig. 22). However, due to the numerous varying repetitions or obviously contradictory representations, it is impossible to combine all drawings in one concise narrative. That means that there is no clearly intended narrative. Instead, the viewer is given a wider narrative scope within which a multitude of different stories can be constructed. Due to this situation, the recipient of the collection can take the place of the narrator and arrange individual drawings in different sequences, which then tell different life stories of Pulcinella. For a recipient who has not memorized the entire collection, however, it is impossible to use the full range of options to tell different stories. In this case, LadeCA.View not only gives a recipient an overview of the collection but also enables them to construct different stories about the life of Pulcinella without having to delve deeply into the collection.

Fig. 22

Vocabulary created for case 3 (top view of LadeCA.View showing the reference images of the words)

Art historians often analyze the semantic relationships among images of a specific collection and determine possible intentions of the collection. The results of such an investigation are usually communicated in text form and with the help of some image examples. As the example above shows, such results can also be described and communicated directly with LadeCA.View using the entire image material of the collection. This expands the possibilities for communicating the results.

8 Conclusions

This paper introduces the interface LadeCA.View. It describes LadeCA.View and shows that this interface enables experts to describe a collection of visual art in such a way that other experts can get an overview of the intention, content, and structures of the collection within a short period of time, and that it can also be used as an interface to probe more deeply into and edit an existing collection. Using three practice-oriented examples, this work shows possible uses for LadeCA.View and thus the ways in which LadeCA.View supports effective research in the context of digitally accessible art collections; we tested and optimized the usability of the system with a cursory user study (10 art history students). The whole concept of LadeCA.View is new – we are not aware of any method that is able to convey the properties of a collection as a whole with comparable effectiveness which simultaneously allows the user to explore these properties directly in more detail based on the image material.

9 Abbreviations

LadeCA: A Language to analyze, describe, and explore collections of visual art