
1 Introduction

One of the most dynamic fields of research today is intuitive human-computer interaction through intelligent agents. To provide users with efficient ways of interacting with computer systems, intelligent agents with two key features have been developed: natural language communication and knowledge handling [1]. Conversational agents (CAs) are an innovative mechanism for enabling verbal interaction between humans and computer systems. Although their understanding of natural language in general-purpose applications is still limited, it is possible to design and tune an efficient agent for a specific target domain. Moreover, their knowledge base can be well defined and fed by the huge amount of data available on the World Wide Web.

Although conversational technologies are being adopted in more and more domains, applications for interaction and content delivery in the cultural heritage domain are not as widespread as one would expect. Most such systems are developed specifically as virtual guides for museums and exhibitions; they are limited to specific content and intended for museum visitors. There is a noticeable lack of applications that deliver cultural heritage content to the general public.

In recent years, publicly available data on the web has grown tremendously in nearly every domain, including cultural heritage. The semantic data available for this domain can provide heritage communities with new methodologies and support the development of applications for both expert users and the general public: accessing heritage collections, creating online digital libraries of cultural artefacts, and navigating and interacting with these online resources (searching and retrieving data). Semantics can also help overcome issues specific to cultural heritage, such as the multidisciplinary nature of the analytical data in this field. Information about an artefact may come from different data sources in various formats, which makes efficient knowledge extraction and the interlinking of data sources difficult; semantic web technologies can address these issues.

In this paper we present a prototype of a web-based conversational agent that interacts with users in natural language and assists them in exploring European cultural heritage. The agent implements a simple conversational mechanism based on the Asistent platform [2]. The users’ intent is detected through keywords and entity recognition. The agent’s knowledge base is Europeana, a digital cultural library, museum and archive offering public access to millions of digital objects from thousands of contributing heritage collections across Europe. Europeana embraces the principles of the Semantic Web in the structure of its data model, which facilitates its integration into applications that provide recommendations or assistance based on its knowledge base.

The rest of the paper is structured as follows: Sect. 2 provides a short background on conversational agents and highlights existing implementations. Section 3 presents the Europeana digital library and the API used by the agent. Section 4 presents the implementation details of the web interface and a short qualitative and quantitative evaluation of the system. Section 5 concludes the paper and outlines future research directions.

2 Related Work

Owing to their intuitive interfaces and features, CAs are credited with several key benefits arising from their interaction effects on human users [3]. Their natural language capabilities make CAs very promising instruments for improving user access to the Web of Data, since they can deliver the desired information through a friendly, natural language conversation.

CAs are used in a variety of areas, and many applications benefit from their particular features. Some applications are in the field of virtual cultural heritage, where CAs serve as virtual guides for heritage sites, either in the real world or in virtual environments [4,5,6]. These implementations can not only entertain and engage visitors but also contribute to the learning process by offering personalized feedback, answering questions, storytelling, etc.

Conversational agents are also applied in adjacent areas such as education (for instruction and training) [7, 8]; question answering, where they are widely used to support access to large data sets such as encyclopedias, libraries or cultural institutions [9]; informational services (automated customer service, e-commerce) [10]; tourism [11]; municipalities [2]; and many other fields.

The Semantic Web is credited as particularly useful for exploratory search, in which users have only a vague idea of what exactly they are searching for and, while interacting with the semantic data, develop further insights that accumulate into knowledge about the subject of inquiry [12]. This incremental approach can be realized through turn-taking dialogues with conversational agents, a natural interaction scenario for human users engaged in exploratory semantic search [13].

The first conversational agents for accessing web information predate the Semantic Web. They aimed to improve web navigation by offering an interface that accepted search queries in the form of natural language questions instead of traditional menu-driven navigation and keyword search [14]. Since then, various systems exploiting structured data available on the web have been deployed [1, 14, 15, 16].

3 The Europeana Public Digital Library and API Access

The Europeana project is one of the major international projects built on the synergy between cultural heritage and the Semantic Web. Europeana is Europe’s digital cultural library, museum and archive, offering public access to millions of digital objects from thousands of contributing heritage collections across the European Union via a multilingual interface [17]. Its creators state that it should not be regarded merely as an accumulation of digital object representations; instead, its purpose should be to enable the generation of knowledge about cultural artefacts [18].

Accordingly, current efforts on Europeana focus on developing the technological solutions, data models and functionalities it needs to move beyond a classic digital library towards an interactive knowledge provider. In this context, Europeana is regarded [18] as a complex aggregation of digitized cultural artefacts and rich contextualization data, all in the process of being embedded in a Linked Open Data architecture [19].

The key mechanism for accessing the stored data is a public API (Application Programming Interface). Europeana offers an extensive interface for both end users and content providers, so that cultural heritage entities (institutions and private developers) can build their own applications by integrating, or even extending, the functionalities of the Europeana DLMS (Digital Library Management System) [20]. This framework for accessing Europe’s cultural heritage is used in an increasing number of projects that are built around the Europeana API and run by various cultural heritage institutions [21].

In this context, this paper aims to evaluate the usability and quality of the large amount of structured cultural heritage content digitized in Europeana. The API used for the evaluation is the Europeana REST API, which is based on HTTP calls and returns responses in JSON format. A Europeana Search API call is an HTTP request in a specific format sent to the Europeana API service URL:

https://www.europeana.eu/api/v2

The Europeana API offers four methods for searching and retrieving data, two of which were used given the requirements of this application: search and record. The search method returns a list of records in the Europeana repository that match the specified search parameters. A search request is sent to the following URL:

https://www.europeana.eu/api/v2/search.json

Every HTTP request to the Europeana API must include a query string parameter named wskey, which is used for authentication. The search.json method accepts several other query string parameters for filtering the results according to the user’s needs. The key parameters are query, which specifies the search term(s), and qf, which specifies a facet filtering query. Further parameters filter the results by other factors (copyright status, presence of a thumbnail, and others). A basic example of a search request URL is:

https://www.europeana.eu/api/v2/search.json?wskey=xxxx&query=mona+AND+lisa

In the above example, the response contains records that include both words, mona and lisa. It is also possible to restrict the search to a specific data field by naming it with a predefined syntax. For example, objects whose author is Leonardo da Vinci are retrieved with the syntax who:"Leonardo da Vinci":

https://www.europeana.eu/api/v2/search.json?wskey=xxxx&query=who:"Leonardo+da+Vinci"
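The request URLs above can also be assembled programmatically. The following Python sketch uses only the standard library; the function name build_search_url is hypothetical, and xxxx stands for a real API key. Parameters are encoded with urllib.parse.urlencode, which turns spaces into + as in the example URLs.

```python
from urllib.parse import urlencode

# Base URL of the Europeana Search API, as given in the text.
SEARCH_URL = "https://www.europeana.eu/api/v2/search.json"


def build_search_url(wskey, query, qf=None):
    """Build a search.json request URL (hypothetical helper).

    wskey: the API key used for authentication.
    query: the search term(s), e.g. 'mona AND lisa'.
    qf:    optional list of facet filter queries; each one becomes a
           separate qf parameter.
    """
    params = [("wskey", wskey), ("query", query)]
    for facet in qf or []:
        params.append(("qf", facet))
    # urlencode quotes values with quote_plus, so spaces become '+'.
    return SEARCH_URL + "?" + urlencode(params)
```

For the fielded query, build_search_url("xxxx", 'who:"Leonardo da Vinci"') returns a URL ending in query=who%3A%22Leonardo+da+Vinci%22: the colon and quotes are percent-encoded, which is the standard URL encoding of the same query string.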

The data fields that can be used in a query are title, who, what, when, and where, which lets users build complex queries that retrieve specific cultural objects. In this paper we limited our queries to two fields: what, which describes the type of the object (paintings, pottery, statues, etc.), and who, which identifies the object’s author.
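A what/who query of this kind can be composed from fielded clauses. The sketch below is a hypothetical helper, not the prototype’s code: it quotes each value so that multi-word names match as a phrase, as in the who:"Leonardo da Vinci" example, and it assumes that two fielded clauses can be combined with AND in the same way as plain terms.

```python
# Fields the text lists as usable in fielded queries.
ALLOWED_FIELDS = {"title", "who", "what", "when", "where"}


def field_clause(field, value):
    """Return a fielded clause such as who:"Leonardo da Vinci"."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported field: {field}")
    # Escape embedded quotes, then quote the whole value as a phrase.
    escaped = value.replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_query(what=None, who=None):
    """Combine optional what/who clauses with AND (assumed Boolean syntax)."""
    clauses = []
    if what:
        clauses.append(field_clause("what", what))
    if who:
        clauses.append(field_clause("who", who))
    if not clauses:
        raise ValueError("at least one of what/who is required")
    return " AND ".join(clauses)
```

For example, build_query(what="paintings", who="Monet") yields what:"paintings" AND who:"Monet", which can then be passed as the query parameter of a search request.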

The other Europeana API method used is record, which retrieves detailed information about a single record in the Europeana repository. A generic record request is sent to the following URL:

http://www.europeana.eu/api/v2/record/[recordID].json

The record.json method requires prior knowledge of the Europeana record’s ID string, which must be included explicitly in the URL, as in the following example:

http://www.europeana.eu/api/v2/record/9200300/BibliographicResource_3000052917527.json?wskey=xxxx
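Given a record ID, the record URL follows directly. This is a minimal sketch with a hypothetical function name; it keeps the base URL exactly as written in the text and strips a leading or trailing slash from the ID so that IDs written either way produce the same URL.

```python
from urllib.parse import urlencode

# Record endpoint base, as written in the text.
RECORD_BASE = "http://www.europeana.eu/api/v2/record"


def build_record_url(record_id, wskey):
    """Build a record.json URL from an ID such as
    '9200300/BibliographicResource_3000052917527'."""
    record_id = record_id.strip("/")
    if not record_id:
        raise ValueError("record_id must not be empty")
    # The ID goes into the path; only wskey is a query parameter.
    return f"{RECORD_BASE}/{record_id}.json?" + urlencode({"wskey": wskey})
```

Calling build_record_url("9200300/BibliographicResource_3000052917527", "xxxx") reproduces the example URL above.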

The response to a record request contains an object representing the EDM (Europeana Data Model) metadata record in JSON format. This object includes information specific to that record, such as the Europeana collection it belongs to, the record title, its location and time of creation, the EDM dataset name, and when the record was created in the digital library.
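A client typically needs only a few of these fields for display. The sketch below extracts three of them from a parsed record response. The key names ('object', 'title', 'europeanaCollectionName', 'timestamp_created') are assumptions about the JSON layout and should be checked against the Europeana API documentation; the code uses .get so that missing keys yield None rather than an error.

```python
def summarize_record(response):
    """Pick display fields from a parsed record.json response.

    All key names used here are assumed, not taken from the text.
    """
    obj = response.get("object", {})

    def first(value):
        # Many EDM fields are lists of strings; take the first entry if so.
        if isinstance(value, list):
            return value[0] if value else None
        return value

    return {
        "title": first(obj.get("title")),
        "collection": first(obj.get("europeanaCollectionName")),
        "created_in_library": obj.get("timestamp_created"),
    }
```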

4 Web-Based API Interface

To enable user-friendly, natural language communication with the Europeana database, we designed a simple web interface: the user enters a query, and the back-end analyses the text, constructs a request from the extracted query data, and sends it to the Europeana API.
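The text analysis step relies on keywords and entity recognition. The following sketch shows one simple way of doing this with keyword matching; it is not the Asistent platform’s actual mechanism, and the vocabularies are hypothetical, limited to the painters and artwork types used in the evaluation.

```python
import re

# Hypothetical vocabularies: painters and artwork types the prototype targets.
KNOWN_PAINTERS = {
    "giotto": "Giotto",
    "monet": "Monet",
    "picasso": "Picasso",
    "rembrandt": "Rembrandt",
    "van gogh": "van Gogh",
}
ARTWORK_TYPES = {
    "painting": "paintings", "paintings": "paintings",
    "picture": "pictures", "pictures": "pictures",
}


def detect_intent(text):
    """Return (who, what) extracted from free text, or None for each if absent."""
    lowered = text.lower()
    who = None
    for key, name in KNOWN_PAINTERS.items():
        # Word boundaries prevent matches inside longer words.
        if re.search(rf"\b{re.escape(key)}\b", lowered):
            who = name
            break
    what = None
    for token in re.findall(r"[a-z]+", lowered):
        if token in ARTWORK_TYPES:
            what = ARTWORK_TYPES[token]
            break
    return who, what
```

For the input "Show me Monet paintings" this returns ("Monet", "paintings"), a pair that maps onto the who and what fields of a search query. A dictionary lookup like this is easy to audit but only recognizes names it already knows, which matches the deliberately narrow scope of the prototype.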

4.1 Implementation

The whole web application was implemented using Django, a high-level Python web framework. Django follows the MVC paradigm, where the model (M) represents the data (usually stored in a database), the view (V) is the presentation layer of the app (the HTML web interface), and the controller (C) controls the flow of information between the model and the view and implements the business logic. Django enables rapid and reliable development.

The front-end of the application (see Fig. 1) was created using Bootstrap and custom JavaScript code. The application is divided into two parts: (i) the top part lets the user interact with the CA in natural language and displays part of the conversation history; (ii) the bottom part presents the search results in a structured way (see Fig. 2). Communication with the back-end CA is implemented through AJAX calls that exchange JSON objects.

Fig. 1. A simple chat web interface for accessing the Europeana API.

Fig. 2. Retrieved images for the query Monet paintings.

The input data is sent as a JSON object containing three fields: (i) question, which stores the user’s query; (ii) userID, which contains the ID of the current user and is used to track the conversation; and (iii) timestamp, which holds the time of the request. The JSON object is processed by the back-end functions.

The simple CA is created using the Asistent platform [2], which supports the creation, management and use of virtual assistants. It is offered as Software as a Service and is thus accessible through API calls. The platform is composed of several modules that provide answers to questions. The most important modules are: (i) keyword-based answers; (ii) dynamic answering using RSS feeds or CSS selectors; (iii) an indexing library that provides answers from structured data; and (iv) applications that provide additional functionality to end users. As a proof of concept, the prototype limits the search to the artwork types paintings or pictures and to well-known authors. Communication with the Asistent platform is implemented using the API call:

/ask (HTTP GET method) using input parameters:

  • question: The query from the user

  • context: The context in which the question was asked
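Putting the two steps together, the back-end receives the three-field JSON payload from the front-end and forwards the question to /ask as GET parameters. The sketch below only builds the request URL. ASK_ENDPOINT is a hypothetical placeholder, since the text does not give the platform’s address; the ISO 8601 timestamp format and the caller-supplied context are also assumptions.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical placeholder: the real Asistent service URL is not given.
ASK_ENDPOINT = "https://asistent.example.org/ask"


def make_payload(question, user_id):
    """Front-end style payload with the three fields described above."""
    return json.dumps({
        "question": question,
        "userID": user_id,
        # ISO 8601 is an assumed timestamp format.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })


def build_ask_url(payload_json, context=""):
    """Turn a received payload into a GET /ask URL with question and context."""
    payload = json.loads(payload_json)
    question = payload.get("question", "").strip()
    if not question:
        raise ValueError("payload has no question")
    return ASK_ENDPOINT + "?" + urlencode({"question": question, "context": context})
```

With an empty context, build_ask_url(make_payload("Monet paintings", "u1")) gives https://asistent.example.org/ask?question=Monet+paintings&context=.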

The answer is returned as a JSON object containing four fields: (i) answer, the textual response of the CA; (ii) ID, the serial number of the answer; (iii) URL, a website associated with the response; and (iv) data, a JSON object with a list of artworks obtained from Europeana.

The data field is used to construct the bottom part of the application, the gallery-like list (see Fig. 2). Selecting an element opens a full-screen overlay with a basic card showing more information about the selected object.
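On the client side, the response has to be split into the chat text and the gallery entries. The four top-level fields below follow the text. The sketch treats data as a list of artwork objects, and the per-artwork keys title and image are hypothetical. Entries without an image are kept but flagged so that the interface can show a placeholder.

```python
import json


def parse_ask_response(raw):
    """Return (answer text, gallery entries) from an /ask JSON response."""
    response = json.loads(raw)
    answer = response.get("answer", "")
    artworks = response.get("data") or []  # assumed: a list of artworks
    gallery = []
    for item in artworks:
        image = item.get("image")  # hypothetical key
        gallery.append({
            "title": item.get("title", "(untitled)"),  # hypothetical key
            "image": image,
            "has_image": bool(image),
        })
    return answer, gallery
```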

4.2 Evaluation of Search Results

A simple evaluation was performed based on the relevance of the search results and their perceived usefulness. We limited the search to paintings and/or images by five famous painters (Giotto, Monet, Picasso, Rembrandt, van Gogh) to ensure that relevant artwork is present in Europeana. For each painter, the first 100 results returned by Europeana, which should be the most relevant, were analyzed qualitatively through user feedback on two statements: Q1: the CA provides appropriate search results based on the input; and Q2: interacting with the CA is easy and flexible. Users graded each statement from 1 to 5 according to their perception (1 - totally disagree, 2 - partially disagree, 3 - neutral, 4 - partially agree, 5 - strongly agree). Next, a quantitative analysis was performed to obtain the ratio of relevant results in the first 100 hits.
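The grading scheme can be expressed as a small aggregation step: grades are validated against the five-point scale, averaged, and rounded to one decimal place. The function below is a hypothetical sketch of that arithmetic, not the authors’ evaluation script.

```python
# Five-point scale used for Q1 and Q2.
LIKERT = {
    1: "totally disagree",
    2: "partially disagree",
    3: "neutral",
    4: "partially agree",
    5: "strongly agree",
}


def mean_score(scores):
    """Average 1-5 grades and round to one decimal place."""
    if not scores:
        raise ValueError("no scores given")
    for s in scores:
        if s not in LIKERT:
            raise ValueError(f"score out of range: {s}")
    return round(sum(scores) / len(scores), 1)
```

Python’s round works on binary floats, so a mean that is exactly halfway between two one-decimal values (such as 4.25) may not round up; decimal.Decimal with ROUND_HALF_UP is an option if strict half-up rounding is required.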

Figures 2, 3 and 4 show part of the search results for different queries. Figure 2 shows the images retrieved for the query Monet paintings. Only some of the results are actual paintings or images of paintings; there are also documents linked to the painter, photographs from various events, and some relevant objects that lack an image. Of the 100 hits, 42 objects can be regarded as relevant, and they are not ranked by importance.

Fig. 3 lists the objects obtained from Europeana for the query van Gogh paintings. This query yielded a higher ratio of relevant results (67 out of 100), but ranking by importance is again missing, which lowers the perceived quality of the results. As before, the results include images of documents, book and journal covers, and irrelevant portraits of the painter. Fewer objects lack an image, but some are still present.

Fig. 3. Search results for the query van Gogh paintings.

Finally, Fig. 4 shows part of the search results for the query Picasso paintings. Of the first 100 hits, 37 objects are tagged as relevant, which, as for Monet (Fig. 2), is a rather low proportion. Photographs of Picasso make up a high share of the results, which is expected since he is a 20th-century artist who died in 1973. We also noticed that his most important artworks are missing from the results, which again points to a poor ranking mechanism in Europeana. As Fig. 4 shows, several objects lacking an image appear among the results, even though some of them represent relevant artworks by Picasso.

Fig. 4. Search results for the query Picasso paintings.

Table 1 summarizes the quantitative and qualitative evaluation. Q1 was evaluated for each painter, while Q2 addressed the overall usability of the system and is therefore represented by a single number. The Q1 and Q2 grades were averaged and rounded to one decimal place. As can be observed, the relevance ratings were not high, which means that users did not find the search results useful. In contrast, they found the CA easy to use and flexible (average 4.3 out of 5.0). The last row presents the ratio of relevant artworks in the first 100 hits. Except for van Gogh, these values are rather low, which is in line with the user evaluation for Q1.
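The relevance ratio is simply the number of relevant objects divided by the number of inspected hits. The sketch below applies it to the three counts reported in the text. Giotto and Rembrandt are omitted because their counts are not given in the prose, and the function name is hypothetical.

```python
# Relevant hits among the first 100 results, as reported in the text.
RELEVANT_IN_TOP_100 = {"Monet": 42, "van Gogh": 67, "Picasso": 37}


def relevance_ratio(relevant, total=100):
    """Fraction of relevant objects among the inspected hits."""
    if total <= 0 or not 0 <= relevant <= total:
        raise ValueError("invalid counts")
    return relevant / total


for painter, count in RELEVANT_IN_TOP_100.items():
    print(f"{painter}: {relevance_ratio(count):.2f}")
```

Running it prints 0.42, 0.67 and 0.37, one line per painter.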

Table 1. Evaluation summary.

This suggests that the Europeana API does not return sufficiently relevant results for such search queries, which limits its usability. An efficient ranking or scoring mechanism is needed to improve the relevance of the search results.

5 Conclusions and Future Work

In this paper we presented a proof of concept of a web-based application implementing a conversational agent that communicates with the user in natural language and provides an intuitive interface to the Europeana database, a digital cultural library offering public access to millions of digital objects from thousands of contributing heritage collections across the European Union. Within the application, the user can search for paintings and images by specific authors.

The evaluation of the search results highlighted a major issue with the Europeana database. Because it lacks a ranking measure that rates search results by the popularity or importance of the artwork, the relevance of the top search results is rather poor. This gives users the impression that the CA understands them poorly and that the system is of limited use for obtaining relevant results.

Future research will address this issue and focus on finding appropriate solutions for improving the quality and format of the responses to user queries. We expect that applying machine learning and linking additional data from external resources (e.g. Wikipedia) to design an evaluation function that ranks the artworks obtained from Europeana, and then displaying the 50 best-ranked results, will improve user acceptance and usefulness and increase the perceived intelligence of the conversational agent. Learning will be performed on the textual metadata associated with each record, complemented by decision rules that qualitatively evaluate a specific record. In addition, computer vision methods could be applied to the images in the record set to identify the most relevant records by comparing them with each other or with data from external sources.
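To illustrate the kind of evaluation function envisaged above, the following Python sketch re-ranks records with a few hand-written decision rules and keeps the top k (50 by default). The Record fields, the rule weights and the popularity score (which might, for instance, be derived from an external source such as Wikipedia) are all hypothetical. The learned component described above is not shown.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    creator: str
    has_image: bool
    popularity: float = 0.0  # hypothetical score from an external source


def rule_score(record, painter):
    """Hypothetical decision rules scoring one record for a painter query."""
    score = record.popularity
    if record.has_image:
        score += 1.0  # records with an image are more useful to display
    if painter.lower() in record.creator.lower():
        score += 2.0  # creator matches the requested painter
    return score


def top_ranked(records, painter, k=50):
    """Return the k best-scoring records; ties keep the original API order."""
    return sorted(records, key=lambda r: rule_score(r, painter), reverse=True)[:k]
```

Because Python’s sort is stable even with reverse=True, records with equal scores keep the order returned by Europeana, so the sketch only reorders results where the rules actually tell them apart.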