Our system enables an effective workflow for gaze-based interaction with artwork imagery and is composed of two main parts (Fig. 3):
- a backend, consisting of two tools: one used by art experts (e.g., the curators of an exhibition) to define the contents that will be displayed, and the other used by the system administrator to choose the various settings;
- a frontend, consisting of a single application, which is the actual eye tracking interface used by end users (i.e., visitors).
Backend
The first backend tool, called ActiveArea Selector, allows art experts to easily create “gaze-aware” images and to define and update the contents that will be shown in the frontend application. In a museum, the set-up of an exhibit often changes when new artworks arrive on loan for a short period, or for important anniversaries and events. In all these cases, the multimedia content of an interactive installation needs to be updated accordingly. To make this operation as simple as possible, an art expert can use the ActiveArea Selector tool to load an artwork image, “draw” rectangular “regions of interest” (active areas) on it, and link specific multimedia content (text, image, audio, video, or a combination of these) to each of them. Some types of content can only be played separately: for example, while it is possible to combine a caption, an audio clip, and an image, it is not possible to play audio and video clips at the same time, or to show an image and a video together, which would be rather confusing for the visitor. The chosen contents can enhance the experience of the artwork by supplying additional historical, artistic, or scientific information.
The graphical user interface of the ActiveArea Selector is minimal, so as to be easily comprehensible to users who may be experts in art but not necessarily in technology (Fig. 4). The main panel is occupied by the image, which can be panned and zoomed using the lateral scroll bars and the mouse wheel. By activating the “edit mode” through a checkbox at the bottom of the screen, the mouse can be used to draw, move, and re-scale active areas (Fig. 4a). A right click on an active area displays a pop-up menu (Fig. 4b) through which the expert can enter a description in different languages and choose or update the linked media content. The selected active area changes its color from light blue to yellow, to differentiate it from the other areas and provide visual feedback. Once all areas have been drawn and their content has been added, a “gaze-aware” image, encoded as an XML file, is exported for the frontend application.
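The exact schema of the exported XML is not prescribed here. As a minimal sketch, the data behind a “gaze-aware” image could be modeled and serialized in C# as follows; all class, element, and attribute names (GazeAwareImage, ActiveArea, MediaItem, etc.) are illustrative assumptions, not the actual format:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical data model for a "gaze-aware" image.
public class GazeAwareImage
{
    [XmlAttribute] public string ImagePath { get; set; }
    [XmlElement("ActiveArea")] public List<ActiveArea> ActiveAreas { get; set; } = new();
}

public class ActiveArea
{
    // Rectangular region, in image coordinates.
    [XmlAttribute] public double X { get; set; }
    [XmlAttribute] public double Y { get; set; }
    [XmlAttribute] public double Width { get; set; }
    [XmlAttribute] public double Height { get; set; }

    // Optional grouping into "levels" for complex paintings (see below).
    [XmlAttribute] public int Level { get; set; } = 0;

    // Localized descriptions and linked media (text, image, audio, or video).
    [XmlElement("Description")] public List<LocalizedText> Descriptions { get; set; } = new();
    [XmlElement("Media")] public List<MediaItem> Media { get; set; } = new();
}

public class LocalizedText
{
    [XmlAttribute] public string Lang { get; set; }
    [XmlText] public string Text { get; set; }
}

public class MediaItem
{
    [XmlAttribute] public string Type { get; set; } // "text", "image", "audio", "video"
    [XmlAttribute] public string Path { get; set; }
}

public static class GazeAwareImageIO
{
    public static void Export(GazeAwareImage img, string path)
    {
        var serializer = new XmlSerializer(typeof(GazeAwareImage));
        using var writer = new StreamWriter(path);
        serializer.Serialize(writer, img);
    }
}
```

Under this modeling, the “levels” option described below amounts to no more than an extra attribute per area.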
Although there is no limit to the number of active areas that can be defined for an image, it is better not to have too many of these sensitive regions, to avoid cognitively “overloading” the visitor with too much information and multimedia content. In essence, only a few meaningful features of the artwork should be highlighted, ones that visitors can easily understand in a short time. After some trials during the development of the system, we found that up to three active areas are suitable for “small” images, while more areas (five or six) can be employed for bigger pictures. However, the tool also offers an additional option for images with a large number of active areas (e.g., a painting with many characters): active areas can be grouped into “levels”, so that the user can selectively display the areas of a specific level only. This is done through an additional selection menu that appears when a picture with associated levels is opened (see Sect. 5.2). This option should only be used for “complex” paintings with many relevant elements.
To speed up the initial image annotation phase, the ActiveArea Selector tool can be used by different art experts in parallel, each working on a different artwork.
The second backend tool is a settings panel used by the system administrator to set all the parameters of the frontend application, such as the colors of graphical elements, timers, maximum levels of image zoom, or the behavior of the active areas. In this case too, we opted for a simple interface showing, for each parameter, its current value and a short description of the valid values. This tool, too, does not require specific technical knowledge and can be effectively used by the curator of an exhibition to customize the user experience.
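As an illustration, these parameters could map onto a settings object like the following sketch; the property names are our own assumptions, while the default values follow those mentioned in the text (1 s dwell time, 2 s and 0.5 s lens timings, 10 s inactivity timeout; the remaining values are placeholders):

```csharp
using System.Collections.Generic;

// Hypothetical settings object behind the administrator panel.
public class GaeSettings
{
    // Dwell time (seconds) before a button counts as "pressed".
    public double DefaultDwellTime { get; set; } = 1.0;

    // Fixation times before the zoom lenses appear / disappear.
    public double LensAppearTime { get; set; } = 2.0;
    public double LensDisappearTime { get; set; } = 0.5;

    // Maximum zoom factor and initial scroll speed (illustrative values).
    public double MaxZoom { get; set; } = 4.0;
    public double InitialScrollSpeed { get; set; } = 50; // px/s

    // Highlight color of gazed controls and active areas (light blue).
    public string HighlightColor { get; set; } = "#ADD8E6";

    // Seconds without a detected user before resetting to the Idle page.
    public double InactivityTimeout { get; set; } = 10.0;

    // Per-control overrides, e.g. a longer dwell time for the exit button.
    public Dictionary<string, double> DwellTimeOverrides { get; set; } = new();
}
```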
Frontend: Gaze-based Artwork Explorer (GAE)
The frontend is the core of the proposed system. The setup consists of the eye tracker placed at the base of a computer screen (Fig. 5). The user sits in front of it and, after a short calibration procedure (which simply consists of looking at a few circles displayed in different positions), can interact with the application. Instead of the Eye Tribe eye tracker employed in the previous exhibition, which is now out of production, we have used the more recent Tobii 4C, together with its SDK, which integrates well with Microsoft's C# Windows Presentation Foundation (WPF), the standard framework for creating user interfaces in Windows-based applications.
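For illustration, subscribing to the stream of on-screen gaze points in C# follows the pattern below, adapted from the public samples of the Tobii Core SDK; we assume this is the SDK variant used with the 4C, and exact namespaces and method availability may differ across SDK versions:

```csharp
using System;
using Tobii.Interaction;

class GazeDemo
{
    static void Main()
    {
        // Connect to the Tobii service running on the machine.
        var host = new Host();

        // Subscribe to the stream of on-screen gaze points.
        var gazeStream = host.Streams.CreateGazePointDataStream();
        gazeStream.GazePoint((x, y, timestamp) =>
            Console.WriteLine($"Gaze at ({x:F0}, {y:F0}) px, t={timestamp}"));

        Console.ReadKey();
        host.DisableConnection();
    }
}
```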
The development of the eye tracking tool, called Gaze-based Artwork Explorer (shortened to GAE from now on), followed three main design principles: intuitiveness, generality, and robustness.
Both the interface and the interaction were designed to be as intuitive as possible: a multimedia application in a museum is generally used only once, and thus it must be immediately comprehensible to the user and have a very short learning curve. GAE supports different kinds of media, and its “behaviors” (such as the duration of a dwell time, i.e., the fixation duration required for an action to be triggered) can be customized by the administrator. Finally, to make the system robust, all buttons are large and well-spaced, so as to compensate for possible errors due, for example, to large movements of the user in front of the screen or to sub-optimal calibrations, both of which are possible in a crowded setting like a museum.
Figure 6 shows the system’s state diagram, illustrating how a user can move among the various “pages”.
The starting point is the Idle page (Fig. 7), which allows the user to choose the language and then to reach the Home page or to run a tutorial. The graphics on this page are very simple, so as to be easily understood by users who, at this stage, do not yet know how to use the system. Every button in the application changes its color when the user’s gaze is perceived on it, to provide visual feedback. A button is considered “pressed” after a certain dwell time (1 s by default). This prevents accidental “clicks” by inexperienced users who have never used an eye-controlled interface before. All the default values (such as dwell times, highlight colors, or zoom and scroll speeds) can be changed in the system’s settings. Different behaviors can be set for different controls, for example a short dwell time for the start button and a longer time for the exit button.
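Dwell-based activation boils down to timing how long the gaze stays within a control’s bounds. The following is a minimal, SDK-independent sketch of this logic; the DwellButton class and its members are our own illustrative names:

```csharp
using System;
using System.Windows; // WPF types: Rect, Point

// A button is highlighted while gazed at and "pressed" once the gaze
// has stayed on it for the configured dwell time.
public class DwellButton
{
    private readonly Rect bounds;
    private readonly TimeSpan dwellTime;
    private DateTime? gazeEnteredAt;

    public event Action Pressed;
    public bool IsHighlighted { get; private set; }

    public DwellButton(Rect bounds, double dwellSeconds = 1.0)
    {
        this.bounds = bounds;
        this.dwellTime = TimeSpan.FromSeconds(dwellSeconds);
    }

    // Call on every gaze sample (screen coordinates).
    public void OnGazeSample(Point gaze, DateTime now)
    {
        if (bounds.Contains(gaze))
        {
            IsHighlighted = true;      // visual feedback: change color
            gazeEnteredAt ??= now;     // start the dwell timer on entry
            if (now - gazeEnteredAt >= dwellTime)
            {
                Pressed?.Invoke();     // dwell completed: trigger the "click"
                gazeEnteredAt = null;  // avoid repeated triggers
            }
        }
        else
        {
            IsHighlighted = false;
            gazeEnteredAt = null;      // gaze left the button: reset timer
        }
    }
}
```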
The Tutorial page (Fig. 8) shows a short video (about 90 s) explaining how to interact with the tool. The user can stop the video or move it forward or backward with controls available at the bottom of the page. When the video ends, or when the exit button is selected, the Home page is loaded. This is a simplification with respect to our initial implementation, in which, during the tutorial, the user had to actively try each control. We adopted this strategy because the interactive tutorial was judged too long and annoying by the visitors of the “Battle of Pavia” exhibition. We also noticed that some of them stopped using the application a few minutes after the end of the tutorial. Even if, in general, an interactive tutorial is an effective way to learn new software, visitors of a museum normally do not want to learn how to use an application: they only want to quickly obtain information about the exhibited artworks. We thus shortened the tutorial and made the interaction mode uniform, so that the application is easier and quicker to learn. This is the reason why all buttons have the same behavior and the same visual feedback, and why we used only standard and intuitive icons (such as arrows or magnifying lenses).
The Home page (Fig. 9) acts as a control panel for the user, who can choose which artwork to explore among those available, change the current language, run the tutorial again, or exit. Artwork images are shown as thumbnails that can be selected like buttons. Arrows at the left and right sides of the screen allow the user to scroll the thumbnails horizontally, when necessary.
The Visualization page (Fig. 10) is the core of GAE. Its graphic structure and interaction mode have been completely redesigned with respect to the Visconti Castle application. In that case, all buttons were hidden, and a visual menu (for zooming in/out, changing the current image, and exiting the program) appeared when any area of the picture was fixated for 2 s (Fig. 2). Visitors found this approach not entirely comfortable, since zoom in/out operations are much more frequent than changing the displayed picture or leaving the application. In general, the presence of unnecessary functions may confuse the user and potentially lead to wrong actions. In GAE, we have therefore logically separated navigation within an image from navigation within the application.
A home button is always visible in the upper left corner, allowing the user to return to the Home page. By fixating any spot of the image for a certain time (2 s by default), two lens buttons appear through which it is possible to zoom in and out on the observed area. By looking at any other part of the image (for half a second by default), the lenses disappear. When the image is zoomed in, four arrow buttons appear at the four edges of the screen, allowing the user to scroll the image (Fig. 10b). Each button is semi-transparent, but it changes its color to light blue when the gaze is detected on it. This is an acceptable compromise between keeping the buttons clearly visible and occluding the image as little as possible. This choice also solves a problem of the previous application, in which the visitor could scroll by looking at any part of the screen’s edges. Even if that solution was reasonable and intuitive, we noticed that it could cause accidental scrolls when the user was simply observing the edges of a painting. Having explicit buttons makes the user more aware of the position of the controls, thus reducing accidental shifts of the image. To avoid abrupt movements, both zoom and scroll operations are initially slow and increase their speed only if the user’s gaze remains on the buttons.
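The gradual speed-up can be obtained by ramping the scroll (or zoom) speed as a function of how long the gaze has rested on the button. A minimal sketch, with illustrative parameter values:

```csharp
using System;

// Scroll speed ramps up the longer the gaze stays on an arrow button,
// so the image never jumps abruptly. All constants are placeholders.
public static class ScrollRamp
{
    const double MinSpeed = 50;    // px/s at the first instant of dwell
    const double MaxSpeed = 400;   // px/s ceiling
    const double RampSeconds = 2;  // time needed to reach full speed

    public static double SpeedFor(TimeSpan timeOnButton)
    {
        double t = Math.Min(timeOnButton.TotalSeconds / RampSeconds, 1.0);
        return MinSpeed + (MaxSpeed - MinSpeed) * t; // linear ramp
    }
}
```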
The displayed image can contain active areas previously defined with the ActiveArea Selector tool (Sect. 5.1). Active areas are not visible at the beginning: only when the user’s gaze is over one of them is its rectangular region highlighted in light blue (Fig. 11a, b). Active areas were also present in the Visconti Castle application, but, in that case, they could only provide textual information, such as the name of a depicted character, and no initial information about the number of available active areas was provided. At the exhibition, however, we noticed that only a few visitors actually found all the active areas: after the first ones, their interest seemed to diminish. To engage visitors more, we have introduced some gamification principles, adding a sort of “reward” when all the areas are found. As soon as a new image is loaded, a message displayed at the center of the screen informs the user about the number of active areas present in that picture and invites him or her to find all of them. When the user’s gaze is detected for the first time on an active area, a “congratulation message” appears, informing the user about the number of remaining areas (Fig. 11a). When all the areas have been found, a “cup” is shown (Fig. 11b). This simple mini-game can encourage visitors to explore the entire artwork and find all the available active areas with the associated media content.
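Internally, the discovery mini-game reduces to remembering which areas have already been gazed at and reacting on first contact. A minimal sketch, with hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Tracks first-time discovery of active areas and raises the events
// behind the messages described above ("n areas remaining", final reward).
public class DiscoveryTracker
{
    private readonly HashSet<int> found = new();
    private readonly int totalAreas;

    public event Action<int> AreaDiscovered; // carries the remaining count
    public event Action AllAreasFound;       // show the "cup"

    public DiscoveryTracker(int totalAreas) => this.totalAreas = totalAreas;

    // Call when the gaze enters an active area.
    public void OnGazeEnteredArea(int areaId)
    {
        if (!found.Add(areaId)) return;      // already discovered: no message
        int remaining = totalAreas - found.Count;
        if (remaining == 0) AllAreasFound?.Invoke();
        else AreaDiscovered?.Invoke(remaining);
    }
}
```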
When an active area is fixated, a play button appears beside the two zoom lenses (Fig. 11c). The area is then highlighted with a thin red border, with no background color. Indeed, while the user is searching for active areas, a clearly visible blue background helps make them stand out; once the user focuses his or her attention on a specific region, however, it is better to show the original colors of the painting. Looking at the play button triggers the display of the associated multimedia content, shown in a pop-up panel (Fig. 11d). The appearance of this panel automatically hides all buttons and makes the whole image semi-transparent except for the active area. This way, the attention of the user can be focused on the media content and on the related active area. To close the pop-up panel, the user simply has to look at the X button. Optionally, the media content can be opened directly as soon as the active area is fixated for a predefined time. It is important to stress that, in both cases, the activation of the multimedia content is consciously triggered by the user (with a long fixation or the “press” of the play button). Thus, if the visitor is not interested or has already explored all the multimedia elements, s/he can simply ignore them and continue looking at the painting without distractions. Figure 12 shows an example of interaction in the Visualization page.
As stated in Sect. 5.1, when there are many active areas in an image, they can be grouped into “levels” by the art expert and then displayed one group at a time by the user. When the Visualization page is loaded, a visual menu appears through which the user can choose which group of active areas to show, by simply looking at the corresponding button (Fig. 13). The menu contains as many buttons as the number of levels previously defined (for the specific image) by the art expert using the ActiveArea Selector tool.
At the end of the interaction experience, the user can either exit the application by selecting the corresponding button in the Home page or simply move away. On any page, if the user is not detected for a few seconds (10 s by default), the system resets itself and returns to the Idle page.
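The automatic reset can be driven by a timer that is re-armed whenever the eye tracker reports user presence. A sketch using WPF’s DispatcherTimer, with the 10 s default from the text and otherwise assumed names:

```csharp
using System;
using System.Windows.Threading; // WPF DispatcherTimer

// Resets the application to the Idle page when no user has been
// detected for the configured timeout (10 s by default).
public class InactivityMonitor
{
    private readonly DispatcherTimer timer;
    public event Action ResetToIdle;

    public InactivityMonitor(double timeoutSeconds = 10.0)
    {
        timer = new DispatcherTimer
        {
            Interval = TimeSpan.FromSeconds(timeoutSeconds)
        };
        timer.Tick += (s, e) => { timer.Stop(); ResetToIdle?.Invoke(); };
        timer.Start();
    }

    // Call whenever the eye tracker reports that a user is present.
    public void OnUserPresent()
    {
        timer.Stop();
        timer.Start(); // re-arm the countdown
    }
}
```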