
Multimedia Systems, Volume 21, Issue 5, pp 427–449

Modeling performing arts metadata and relationships in content service for institutions

  • Pierfrancesco Bellini
  • Paolo Nesi
Open Access
Regular Paper

Abstract

The modeling of performing arts metadata is considered one of the most challenging problems, since performances add event-related complexities to the classical cultural heritage descriptors associated with physical objects. The most relevant shortcomings of the present models concern the modeling of information connected to performers and performances, which are the distinguishing aspects of the performing arts and are essential for the preservation of our cultural heritage and literature, both strongly connected with the performing arts. This paper presents the European Collected Library of Artistic Performance (ECLAP) semantic model, which has been specifically defined for aggregating and enriching performing arts content coming from several content providers. ECLAP has been set up by the European Commission to play the role of content aggregator for Europeana. The proposed ECLAP semantic model addresses most of the identified problems. It has been compared with present standards and is now supported by a graphic tool for user navigation among semantic relationships and Linked Open Data (LOD). The paper also describes the generation of LOD from the ECLAP semantic model and the mapping of the ECLAP model to the Europeana Data Model (EDM). The experience highlighted that some relevant elements produced, enriched and aggregated by ECLAP cannot be mapped into EDM, while the ECLAP model can address some details of the performing arts that are not at present addressed by the available standards.

Keywords

Performing arts · Metadata enrichment · Performing arts metadata · LOD · EDM · Metadata standards

1 Introduction

Institutional services in which users access content by searching and browsing online catalogs, obtaining lists of references from static archival models with no dynamic connection to the internet and to other archives, and with no information enrichment provided by the users themselves, now belong to history. With the introduction of web 2.0/3.0, data mining and semantic computing, and the wide usage of social media and mobile technologies, most digital libraries and museum services were forced to radically renovate their services. Very famous cultural institutions have partially adapted their services to exploit new technologies and opportunities, e.g., by gaining visibility on the major social networks. For example, among the top positions in terms of Facebook likes and/or Twitter followers we have: MoMA, Metropolitan Museum, Musée du Louvre, British Library, Guggenheim Museum, Centre Pompidou, British Museum, Getty Museum (Los Angeles), Smithsonian Institution, etc. In most cases, these institutions use social media solutions as promotional channels rather than exploiting semantic computing innovations to provide new services and tools for their customers, for example, to increase user engagement and to enrich the content itself. This last step would require mastering a higher level of technology awareness, which is much harder to achieve in terms of both acceptance and investment. Moreover, professional users are unsatisfied by general-purpose social media solutions, since these do not provide satisfactory facilities for advanced semantic aggregations and associations or for learning management, which are needed for educational and professional purposes.
These needs have determined the creation of a number of more specific and focused services that in the domain of digital libraries for performing arts can be identified as: Artyčok: http://www.artycok.tv, Digital Theatre: http://www.digitaltheatre.com, Digital Dance Archives: http://www.dance-archives.ac.uk, SP-ARK: http://www.sp-ark.org, and European Collected Library of Artistic Performance (ECLAP) http://www.eclap.eu.

Users of content services are becoming more demanding, requesting new features in the area of collaboration, social and semantic computing, such as managing content and user services via collaborative tools, aggregation tools, linked data access and integration, metadata interoperability and integration, connection with social networks, access and tools on mobile devices, semantic navigation tools linking open data, etc. To this end, when designing an institutional service in the domain of performing arts today, several aspects should be considered: an adequate metadata model for the performing arts, including aggregation and annotation models and tools; semantic relationships among content and users, also taking into account their actions and collaboration; technical metadata for content distribution and intellectual property management; mapping and publication of information as linked open data (LOD); the establishment of connections with relevant external sources of information such as dbPedia and geonames; the export of information toward international organizations such as Europeana; and, finally, navigation among the main established relationships among content and users.

The modeling of performing arts metadata is probably one of the most complex cases, since the concept of cultural heritage and artistic work includes not only manifestations (instances, for example, pictures of a painting) but also performances, adaptations, interpretations, etc., where the artistic capabilities are again dominant. Therefore, many standards have tried to address the problem, such as: MPEG with MPEG-7 descriptors (MPEG-7); EN 15744:2009 (Film identification - Minimum set of metadata for cinematographic works) and its superset EN 15907:2010 (Film identification - Enhancing interoperability of metadata - Element sets and structures) (EN 15907:2010); Functional Requirements for Bibliographic Records object oriented (FRBRoo) [16], which harmonizes FRBRer with the International Committee for Documentation Conceptual Reference Model (CIDOC-CRM) [18] and is carried out by the International Federation of Library Associations and Institutions (IFLA) and the International Council of Museums (ICOM) [14]; Dublin Core metadata terms [13]; Visual Resource Association-Core (VRA-CORE) [45]; Categories for the Description of Works of Art (CDWA) [11]; and, recently, the Europeana Data Model (EDM) [24]. In [40], a method for aligning several multimedia metadata models to the Multimedia Metadata Ontology (M3O) [39] has been presented. The Multimedia Metadata Ontology is grounded on a number of patterns that can be used for modeling annotations, aggregations, descriptions, situations, etc. Specifically, in [32], an analysis of the usage of FRBRoo for modeling performing arts descriptions as linked data has been presented. In [15, 17], a study of the mapping of FRBRoo structures and concepts to EDM has been proposed. This study focused specifically on the performing arts case, due to its high complexity. In [19], an analysis of the difficulties in modeling performing arts issues with ontologies has been carried out. This analysis highlighted some criticisms of FRBRoo (which could also be raised against other metadata models) regarding the modeling of both abstract plans for a performance and the several variations in the related instances, i.e., the real performances. Other earlier and relevant studies in this field are GLOPAC [26], partially derived from FRBR, and the Performing Arts Documentation Structure [36], grounded on the Media Art Notation System, which has been built on top of the MPEG-21 metadata framework. Nevertheless, these standards show limitations in modeling the information related to performers and performances.

Another relevant aspect is the description of annotations of multimedia content and the export of these data in open and accepted formats. The Annotea project [29] was one of the first to adopt semantic web technologies for annotations; it was originally designed for annotating web sites and therefore offers limited capabilities for annotating multimedia objects. The LEMO annotation framework [27], built on top of the Annotea model, supports annotations of media fragments [46]. Recently, the Open Annotation Collaboration (OAC) model [28] has been proposed; it is designed for use as linked data.

Moreover, a number of basic technologies and standards can be taken into account. Linked Data is a technique for data publishing which uses common web technologies to connect related data and make them accessible on the Web. It is based on identifying resources with HTTP Uniform Resource Identifiers (URIs) and on using standards like the Resource Description Framework (RDF) [38] to provide data about these resources and to connect them to other resources on the web [9]. In most cases, a common practice for resource description is to exploit available vocabularies. The reuse can be performed either by using already-defined classes and properties directly, or by creating a specific vocabulary whose subclasses and subproperties are derived from those already defined and accessible. Some well-known basic vocabularies are:
  • Dublin Core (http://purl.org/dc/terms/) for the description of human-created artifacts [13],

  • Friend of a Friend (http://xmlns.com/foaf/0.1/) for the description of people, organizations and relations among them,

  • Creative Commons (http://creativecommons.org/ns#) for the representation of legal information about works,

  • Basic Geo Vocabulary (http://www.w3.org/2003/01/geo/wgs84_pos#) for basic properties for the representation of geographical coordinates.
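
As a minimal sketch of how the vocabularies listed above can be combined in a Linked Data description, the following Python fragment writes a few RDF statements in N-Triples syntax using only the standard library. The resource URIs under example.org, the title, the person name and the coordinates are hypothetical and serve only as illustration; this is not the ECLAP LOD output. FOAF is written with its published namespace, http://xmlns.com/foaf/0.1/.

```python
# Vocabulary namespaces (FOAF's published namespace is .../foaf/0.1/).
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
CC = "http://creativecommons.org/ns#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def uri(value):
    """Format an absolute URI as an N-Triples term."""
    return f"<{value}>"


def literal(value, lang=None):
    """Format a plain or language-tagged literal as an N-Triples term."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def triple(subject, predicate, obj):
    """Join three already formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


# Hypothetical resource URIs, used only for illustration.
video = uri("http://example.org/content/video-1")
person = uri("http://example.org/person/director-1")
place = uri("http://example.org/place/theatre-1")

triples = [
    triple(video, uri(DCTERMS + "title"), literal("Rehearsal recording", "en")),
    triple(video, uri(DCTERMS + "creator"), person),
    triple(video, uri(DCTERMS + "spatial"), place),
    triple(video, uri(CC + "license"),
           uri("http://creativecommons.org/licenses/by/4.0/")),
    triple(person, uri(FOAF + "name"), literal("Jane Doe")),
    triple(place, uri(GEO + "lat"), literal("43.7696")),
    triple(place, uri(GEO + "long"), literal("11.2558")),
]

print("\n".join(triples))
```

Each resource is identified by an HTTP URI and each statement reuses a property from an existing vocabulary, which is the reuse practice described above. A production system would more likely rely on an RDF library than on string formatting.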

In the field of performing arts, there are also some specific contributions, although no single vocabulary covers all the aspects. The Music Ontology [37] aims at modeling the main concepts and properties of shared music (albums, tracks, performances, arrangements, etc.). It includes information that could be related to distribution models and services such as Napster, Last.FM, and iTunes. It has been used by BBC programs and music [30], with DBtune, even though it covers only music-related information. Moreover, the Linked Movie Database has a vocabulary specific to the film domain, while other ontologies like dbPedia [8] and Freebase are quite generic. In [12], an attempt to model an ontology of live performances has been presented. In [31], an analysis addressing the problem of linking content with relevant characters by exploiting LOD has been proposed. It can be useful for establishing relationships between performing arts authors and performers and digital resources and descriptors.

On the other hand, despite the large amount of work performed so far, none of the above-mentioned standards and solutions is satisfactory for modeling performing arts scenarios. The most relevant shortcomings concern the semantic descriptions and the modeling of the information related to performers and performances, which are the distinguishing aspects of the performing arts and are essential to the preservation of our cultural heritage and literature.

In this paper, the semantic model and tools of the ECLAP service for performing arts institutions are presented. ECLAP (European Collected Library of Artistic Performance, http://www.eclap.eu) has been set up with CIP PSP funding from the European Commission and partners. ECLAP is a portal and service which collects, enriches and distributes content coming from more than 35 performing arts institutions (i.e., content partners) in 18 different countries, including European countries, South Africa, Russia and Chile. An overview of ECLAP can be found in [20]. The ECLAP infrastructure and semantic models have been designed to cope with most of the above-mentioned problems of the performing arts domain. Up to now, the ECLAP infrastructure has processed more than 170,000 objects, made of more than 1 million items, in up to 13 different languages, and has recorded about half a million content accesses in the last year. ECLAP services include tools for content ingestion, workflow management, metadata enrichment, IPR definition, multichannel distribution (PC and mobiles), content aggregation (playlists and collections), and also export/publication toward Europeana in EDM via an OAI-PMH server and as LOD. The ECLAP content is processed to be described in terms of the so-called ECLAP semantic model. This representation model is much richer than the ECLAP ingestion model, which has been adopted to ease the conversion from several ingested metadata formats such as DC, FRBR, MARC, EAD, CDWA, etc. ECLAP also supports the management of discussion groups and distribution channels for the final users belonging to content partners, and thus takes care of the relationships that users accessing content may establish with the content itself and with one another.

The paper is organized as follows: Sect. 2 presents an overview of the ECLAP service and tools for performing arts archives. Section 3 presents the ECLAP semantic model, describing the entities and the supported relationships among different content kinds and users, taking into account performing arts aspects, IPR, annotations and aggregations, the linking of the ECLAP semantic model with external sources such as dbPedia and geonames, regulations, and collected dates. In Sect. 3.3, a comparative analysis of the ECLAP semantic model in representing performing arts metadata with respect to most of the above-mentioned metadata standards is also provided. The analysis has shown that ECLAP addresses some more details of the performing arts than the present standards do. Section 4 describes the LOD model generated from the ECLAP semantic model, with the related choices made to expose the complex ECLAP model to external portals, including content description, taxonomy, relationships, user descriptors and annotations (according to the MyStoryPlayer model), and links to LOD. Some examples are reported as well. In Sect. 5, an overview is given of the ECLAP Social Graph tool, which allows users to visualize and navigate the ECLAP semantic model. The Social Graph also allows users to prune and filter the relationships according to their interests. Some results of the user validation are also presented. Finally, in Sect. 6, the mapping of the ECLAP semantic model toward the EDM model of Europeana is presented. This mapping represents the final phase of the metadata aggregation process of European thematic and regional aggregators, which collect metadata to provide them to Europeana. Conclusions are drawn in Sect. 7. The Appendixes contain formal descriptions of the relationships modeled in the Social Graph and more information about the mapping of ECLAP toward EDM. These Appendixes are also available as web pages on ECLAP, but are reported herein as well for the sake of simplicity.

2 ECLAP overview

ECLAP is a Best Practice Network and a service provider. ECLAP services are offered to performing arts institutions which provide content on ECLAP with the aim of collecting, aggregating, enriching and distributing content toward end users and other international institutions (via OAI-PMH and LOD). As a Best Practice Network, ECLAP consists of working groups that analyze the state of the art and produce best practices and guideline documents to cope with technical and strategic problems in the performing arts sector [20]. To this end, three main ECLAP Working Groups (with corresponding blogs and forums) have been set up to cover the areas of: digital libraries and models for performing arts content, intellectual property management and tools, and digital content-based tools for teaching and learning performing arts in the new era. To make networking and discussions easier, ECLAP is also a repository of technical documents, demonstrators, best practices and standards which can be used to better understand problems and to find corresponding guidelines, state-of-the-art solutions, as well as future activities and project proposals.

The ECLAP content service exploits social media and semantic computing technologies and solutions for content and metadata enrichment, aggregation and distribution of rich multilingual performing arts content toward personal computers and mobiles. Presently, ECLAP distributes more than 170,000 distinct objects (video, audio, images, texts, 3D, braille, animations, web pages, epub, MPEG-21, documents, etc.), coming from more than 30 Content Providers (CPs), in up to 13 metadata languages. The content is made available to performing arts professionals, teachers and students, who form a community of more than 2,300 users.

The ECLAP content management performs a wide range of metadata enrichment activities (based on the AXCP media grid [3]). The typical metadata enrichment performed by ECLAP includes: the addition of technical descriptors to source files; the addition of more languages; geolocalization, which recognizes the locations mentioned in metadata and descriptors and augments them with formal geonames and thus GPS positions; the production of QR codes for museum inspection and linkage (a first step toward augmented reality); the creation of content aggregations (e.g., collections, playlists, e-courses, annotations); the addition of comments and tags; the association of taxonomical classifications; the establishment of connections with dbPedia open data for well-known personages (VIP names); the addition of a formal IPR license descriptor; the association of unambiguous dates and times with events; the association of a UUID (permitting the management of any kind of identifier that may be available for the single content element, such as ISBN, ISAN, ISMN, private coding IDs, etc.); the production of LOD; etc. [4].

With this large range of activities and semantic enrichment processes, ECLAP has to provide a suitable semantic model, as described in the next section. This paper focuses on presenting the ECLAP semantic model and comparing it with standards, providing information about the LOD service and model of ECLAP, together with its comparison with the Europeana EDM.

ECLAP users are professional users: teachers, researchers, archivists, performers, directors, artists, etc. (see, for example, the distribution of ECLAP users on http://www.eclap.eu/103996). Their motivations with respect to the mentioned requirements are mainly related to gaining access to content with a complete semantic description for research purposes, content study and comparison, fundraising, preparing lessons and proposing/producing new performances. On the other hand, most of them have a strong interest in seeing their content located in the same portal as that of well-known artists and thus widely promoted on the internet and on Europeana, so that their content might be used and referred to by other professionals and researchers in the same field.

According to the above-mentioned requirements, a comparison of ECLAP services with many other content and performing arts portals has been carried out and described in technical reports [21, 25]. Moreover, for the sake of completeness, a short overview is reported in Table 1, where the most attractive services are compared with ECLAP on the basis of the major requirement areas. It should be noted that most of the archiving solutions do not have aggregation and annotation tools. Instruments of the previous generation were typically standalone tools, such as Lignes de Temps [33] and Theatron (http://www.theatron.org), and thus they have not been reported in the table, while their analysis can be found in the cited references. On the other hand, ECLAP integrates a set of tools for semantic enrichment to establish aggregations, annotations and relationships among media and content [2, 7]. It can be noted that ECLAP offers a wider set of services and that, in most cases, they are integrated with each other and offer more functionalities. In most cases, their higher level depends on the capability of the ECLAP semantic model presented in this paper to express and exploit media and user relationships.
Table 1

Comparison of performing art services against major requirements

| Requirement | Artyčok (http://www.artycok.tv) | Digital Theatre (http://www.digitaltheatre.com) | SP-ARK (http://www.sp-ark.org) | RePlay (http://www.siobhandaviesreplay.com/) | UBU (http://www.ubu.com/) | GLOPAC (http://www.glopac.org) | ECLAP (http://www.eclap.eu) |
|---|---|---|---|---|---|---|---|
| Database of content aggregation | Limited (1,461) | Limited (36 theater productions) | Small (4,000) | Small (39 works and 9 related projects) | Small | Small | Yes (>170,000) |
| Number of partners | 14 | 20 UK | 1 | 5 | 8 | 18 | >30 from 13 countries |
| Networking and collaboration | Limited | Limited | Limited | No | No | No | Yes |
| Social media connections | No | Yes | Yes | No | No | No | Yes |
| Advanced semantic model with: classification, analysis, contextualization, relations, comparison | No | Limited | No | Yes | No | Yes | Yes |
| Aggregation tools | No | No | Yes | No | No | No | Yes |
| Multilingual metadata | Only EN and CZ | No | No | No | No | Partial | Yes |
| Audiovisual annotations | No | No | No | Partial | No | No | Yes |
| Multilingual search and retrieval | Partial | No | No | No | No | Partial | Yes |
| Linked Open Data | No | No | No | No | No | No | Yes |
| Social Graph modeling and access, semantic navigation | No | No | No | No | No | No | Yes |
| Connection with Europeana | No | No | No | No | No | No | Yes |

According to the latest surveys about the ECLAP service (also reported on the portal), the services users appreciated most are: the large collection of content, which enables them to create aggregations and comparisons of content and master classes coming from multiple institutions (it often occurs that famous artists create master class content for one institution only); the possibility of accessing content and its related relationships and aggregations via a graphical interface (i.e., the Social Graph); the coverage of the metadata schema, including multilingual support and IPR management; and the possibility of creating annotations on audiovisual content.

3 ECLAP semantic model

According to the aims summarized above, the semantic model for ECLAP has to provide the ground on which CPs can map ingested and uploaded content using several kinds of metadata models and sources. This also means providing a model in which all details and relationships can be represented regardless of their metadata source format: DC, EAD, MARC, custom models, FRBR, CDWA, etc. [42]. This process is performed in ECLAP using a formalized workflow [10]. To cope with the above-mentioned aspects, the information related to ECLAP content and users is modeled by means of the so-called ECLAP semantic model (described in the following), which is much richer than the ECLAP ingestion model adopted during metadata ingestion [10]. The ECLAP semantic model includes relationships and information that are typically missing in the classical metadata formats and that have been added to model the external links to dbPedia and geonames.org, the performance aspects, the IPR details, users and their relationships, annotations, aggregations, etc. Part of this information is automatically produced by the semantic enrichment algorithms of the ECLAP back office, while the rest comes from human-based crowdsourcing.

In Fig. 1, the general overview of the ECLAP semantic model is shown, where almost all the major entities are reported. The ECLAP semantic model has been defined as a compromise that takes into account several issues: (1) modeling content metadata of heterogeneous cross-media content coming from different formats and sources for performing arts; (2) modeling information and relationships with the users involved in the workflow, and modeling and managing the IPR for conditional access and user-generated content management; (3) modeling links with external open data and resources without changing the original metadata; (4) publishing information as EDM semantic model, LOD and other formats; (5) providing good performance in metadata access from back-office processes.
Fig. 1

ECLAP semantic model overview

In the semantic model, the Content element represents all the content kinds managed by the portal. Content is associated with Groups/Channels managed by CPs (each ECLAP content provider has at least one group/distribution channel to manage). Content is specialized into Event, Blog, WebPage, Forum and Media Objects. Blogs, WebPages and Forums are used to provide news and general unstructured information and to stimulate users' discussions on specific topics, while Media Objects represent the multimedia content and their aggregations that are accessible from ECLAP and published toward Europeana. Media Objects are specialized into AVObjects (audiovisual: Image, Video, Audio), which can be used in Annotations and in Playlists. Annotations are created by means of two kinds of relationships between audiovisual objects: One2One and Explosive. They are also the basic elements for building more complex annotations. In One2One annotations, an audiovisual object or one of its fragments is related to a segment of another or of the same audiovisual, and both are played at the same time; in Explosive annotations, an audiovisual fragment is related to a single time instant of an audiovisual, at which the fragment is played, interrupting the audiovisual being annotated. Annotations can be built and played using specific tools derived from MyStoryPlayer [7] and are saved according to the W3C Open Annotation model, as described in the following. Each Annotation can be associated with a set of information (Annotation Description), such as labels, text fragments, descriptors, etc. Playlists aggregate AVObjects in a sequence, allowing the usage of fragments of the Audio/Video. Collections aggregate a set of Media Objects, which can also include Documents, Playlists, etc., and thus also other Collections. Courses and Programmes are specializations of Collection consisting of an ordered set of Content.
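
The two basic annotation relationships can be sketched as a small data structure. The following Python fragment is an illustrative reading of the description above, not the MyStoryPlayer or ECLAP implementation: the class and field names, the use of seconds as time unit, and the window-based query are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fragment:
    """A time segment of an audiovisual object, in seconds."""
    media_id: str
    start: float
    end: float

    def __post_init__(self):
        if not 0 <= self.start <= self.end:
            raise ValueError("expected 0 <= start <= end")


@dataclass(frozen=True)
class One2One:
    """Source fragment played together with a segment of the target."""
    source: Fragment
    target: Fragment
    description: Optional[str] = None


@dataclass(frozen=True)
class Explosive:
    """Source fragment played at one instant of the target, interrupting it."""
    source: Fragment
    target_media_id: str
    at: float
    description: Optional[str] = None


def annotations_in_window(annotations, media_id, t0, t1):
    """Annotations on `media_id` that apply during the window [t0, t1)."""
    hits = []
    for a in annotations:
        if isinstance(a, One2One):
            seg = a.target
            if seg.media_id == media_id and seg.start < t1 and seg.end >= t0:
                hits.append(a)
        elif isinstance(a, Explosive):
            if a.target_media_id == media_id and t0 <= a.at < t1:
                hits.append(a)
    return hits


# Hypothetical media identifiers, for illustration only.
notes = [
    One2One(Fragment("video-B", 0, 30), Fragment("video-A", 10, 40)),
    Explosive(Fragment("video-C", 5, 15), "video-A", at=60),
]
print(annotations_in_window(notes, "video-A", 20, 21))
```

A One2One annotation spans an interval of the annotated audiovisual, whereas an Explosive annotation is attached to a single instant, which is why the query treats the two cases differently.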

Moreover, Content may have several Comments and/or Ranks (votes) and can be associated with a set of terms taken from a multilingual taxonomy. Taxonomy-based classifications describe the taxonomy terms associated with the content: for each term, the label in every language, the term id, the id of the top term of the hierarchy and the path from the term to the top term are reported. The Taxonomy is a qualified vocabulary expressed in SKOS [41]. Each Content (and thus also each MediaObject) is associated with different sets of metadata (see Table 2): the Dublin Core metadata (e.g., title, subject, type, description); the Technical metadata related to the content and its distribution (e.g., audio/video duration, image size, ingestion details, digitization details, content URL, available media resolution, compliant devices); the IPR License metadata (for managing content access, also localized by nationality or domain, Europeana.Right, license URL if any); the Workflow details related to management (e.g., kind of content lifecycle workflow (internal, external, test, europeana, eclaponly, …), status of the content in the workflow, actions to be done, etc.); and specific metadata for performing arts information (e.g., performance place, performance date, performing arts type, performers, etc.). The IPR License refers to an IPR Model formalizing the rights that can be exploited by each category of user (public anonymous, registered, educational, group, trusted), for a type of content in its different versions (e.g., resolution), and for the different devices, locations, times, etc.
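
The taxonomy-based classification described above (multilingual labels, term id, top term id and path to the top term) can be illustrated with a short sketch. The class name, the field names and the three sample terms are hypothetical; the actual ECLAP taxonomy is a SKOS vocabulary and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Term:
    """A taxonomy term with one label per language and an optional parent."""
    term_id: str
    labels: Dict[str, str]
    parent: Optional["Term"] = None

    def path_to_top(self) -> List["Term"]:
        """Return the terms from this one up to the top of its hierarchy."""
        path, node = [self], self
        while node.parent is not None:
            node = node.parent
            path.append(node)
        return path

    def describe(self) -> dict:
        """Collect the classification fields reported for each term."""
        path = self.path_to_top()
        return {
            "id": self.term_id,
            "labels": dict(self.labels),
            "top_term_id": path[-1].term_id,
            "path": [t.term_id for t in path],
        }


# Hypothetical terms, for illustration only.
arts = Term("t1", {"en": "Performing arts", "it": "Arti performative"})
dance = Term("t2", {"en": "Dance", "it": "Danza"}, parent=arts)
ballet = Term("t3", {"en": "Ballet", "it": "Balletto"}, parent=dance)

print(ballet.describe())
```

Storing the path together with the term lets a consumer of the classification retrieve all broader terms without walking the hierarchy again.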
Table 2

ECLAP metadata at a glance, divided into main categories

| Metadata category | Number of fields | Multilingual | Location name/info | Person names | Dates |
|---|---|---|---|---|---|
| Performing arts | Multiple | Y | YT | YT | YT |
| Dublin Core | 15 | Y | YT | YT | YT |
| Dublin Core Terms | 22 | Y | YT | YT | YT |
| Technical | 17 | N | YF, GPS (Lat, Long) | N | YF |
| IPR license | Multiple | N | YF | N | YF |
| Workflow | 10 | N | N | N | YF |
| Group/channel | Multiple | Y | N | N | N |
| Comment | Multiple | Y | YT | YT | YT |
| Annotation | Multiple | Y (description) | YT | YT | YT |
| Rank | Multiple | N | N | N | N |

In the Location Name/Info column, YT means that some fields may contain single or multiple locations in free text, while YF means that the set of locations is well formalized (using standard codes, for example). In the Person Names column, YT means that some of those fields may contain single or multiple citations of person names, including VIPs and Users, possibly in several different formats and languages. In the Dates column, YT means that some of those fields may contain single or multiple dates in several different formats, while YF means that the reported dates are well formed in the unified format of the portal. In the cases marked as YF, the information is directly produced by the ECLAP back office or resolved at ingestion/insertion time, so the format is well formed and unambiguous. In the YT cases, the information is included in free text without a precise format and semantics, so it has to be disambiguated and interpreted. This table does not report the relationships among content and users.

Table 2 reports the multiplicity of each metadata segment and whether it supports multilingual coding and representation. The total amount of information associated with the most complex content element may exceed 500 elements, excluding comments, annotations, ranks, etc.

This paragraph reports and comments on some examples of the properties defined for the performing arts metadata category. These properties have been: (1) defined as specializations of Dublin Core properties, and (2) identified by means of an analysis of the metadata schemas used by the 35 ECLAP international partners, as well as of other schemas used by other projects and metadata standards. The properties include: information about the performance depicted in the resource (place, city, country and date); the premiere of the performance (place, city, country and date); the contributors to the creation of the performance, each one with a specific Cast/Crew role (actor, dancer, light designer, hairdresser, director, set designer); the type of performing art (e.g., theater, dance, etc.); the name of the theater or dance company or musical group (e.g., Momix); the objects used in the performance; the artistic movement and acting styles in which the work can be classified (e.g., Classicism, Dada, Epic, Expressionism, etc.); the date when the recording was made, etc. A complete description of the ECLAP metadata fields is reported in [42], while indexing is described in [5].

Moreover, as represented in the semantic model of Fig. 1, some of the Dublin Core and performing arts metadata elements (e.g., coverage, spatial, performance place, performance city and country) may include citations of locations (which may be associated with geonames entities) and/or of Person Names. This means that some of the metadata fields of Content may contain information that can be related to external open data services, to enrich the original metadata, and/or to internal information. Person Names in free text metadata fields may refer to:
  • Well-known VIP personality (that may be solved by linking them to dbPedia or other source vocabulary),

  • User names of the portal (for example, a co-author). For example, a User may be mentioned in a metadata field (e.g., in the Dc:Description, thus establishing an implicit connection to be recognized and made explicit by the system); a User could also have uploaded a content, thus creating an implicit link with the content (see in the following for further details).

  • Cited Names, which are simple citations of persons in the free text and may create relationships with other content having similar citations (for example, the same piano player or the same director, who are neither VIP names nor ECLAP Users but may be cited several times in the same or in different content collections).
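
The three cases above can be sketched as a simple classification step. The fragment below assumes that candidate names have already been extracted from the free text and that a table of VIP links and a set of portal user names are available; the function names, the lookup tables, the naive normalization and the sample data are all hypothetical and do not describe the ECLAP enrichment algorithms.

```python
from collections import defaultdict


def normalize(name):
    """Lower-case a name and collapse whitespace (a deliberately naive key)."""
    return " ".join(name.lower().split())


def classify_person(name, vip_links, user_names):
    """Return (kind, reference) with kind in {'vip', 'user', 'cited'}."""
    key = normalize(name)
    if key in vip_links:
        return "vip", vip_links[key]
    if key in user_names:
        return "user", key
    return "cited", key


def cited_name_links(mentions, vip_links, user_names):
    """Group content ids by cited name shared across at least two items."""
    by_name = defaultdict(set)
    for content_id, name in mentions:
        kind, ref = classify_person(name, vip_links, user_names)
        if kind == "cited":
            by_name[ref].add(content_id)
    return {n: sorted(ids) for n, ids in by_name.items() if len(ids) > 1}


# Hypothetical lookup tables and mentions, for illustration only.
vip_links = {"wolfgang amadeus mozart":
             "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"}
user_names = {"jane doe"}
mentions = [
    ("content-1", "Wolfgang Amadeus Mozart"),
    ("content-1", "John Smith"),
    ("content-2", "john  smith"),
    ("content-3", "Jane Doe"),
]
print(cited_name_links(mentions, vip_links, user_names))
```

Real person-name matching would need to handle initials, transliterations and homonyms, which this exact-match sketch ignores.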

Metadata fields may also include instances of dates that can be very useful to identify events and build a temporal ordering of content facts: performances, uploads, publication, historical periods, etc. For example, a Dc:Description may include text such as “music concert of Mozart, held in Luzern, 03-01-98”, thus linking to W.A. Mozart and to a specific performance event.

In Fig. 2, the relations among Users and other entities in the semantic model are depicted. A User may be a member of one or more groups and can be a group administrator. Moreover, each User has his/her profile associated with a number of important features so as to manage content and establish relationships with content. Each Content is provided by a User, who can have the right to access (via an IPR Profile matching with the associated IPR Model of the content), and can suggest and vote/rank content to other users and toward social networks. The access to a given content by a User is a piece of information to be saved to create suggestions and recommendations.
Fig. 2

Relations of users with other major entities of the semantic model in Fig. 1

A Media Object is a specialization of Content that may be marked as favorite (similar to the Facebook “Like”) by a User, and a group administrator can insert a Content into the featured object list of the promoted content on the portal. Comments and Annotations are linked to the User who created them. Users are linked with other Users through the ‘knows’ relation, which builds the classical ‘Social Graph’, and each user can specify topics of interest among the taxonomy terms. The topics of interest of a User can be modeled similarly to the taxonomical model for content (this classification can be obtained from the user or dynamically calculated on the basis of the plays and/or content appreciations). To manage the Content, specific roles can be assigned to each User so as to access and change content information (Workflow Roles). On the other hand, an IPR Profile is assigned or computed for each User to verify the access rights whenever he/she accesses content, with respect to the IPR Model associated with the accessed content. Finally, Users can also be cited in some metadata fields of Content, for example, in the DC and/or performing arts metadata fields. This occurs quite often when user-generated content is provided, thus augmenting and aggregating archival content.

3.1 Mining and linking to external datasets and LOD

In the ECLAP semantic model, there are a number of specific fields where locations and Person Names may be directly referred to using a dictionary or vocabulary. On the other hand, the effort of producing qualified values is frustrated by the actual gathering of thousands and thousands of metadata records coming from several sources, in several different formats and with different interpretations of the metadata fields (e.g., different “DC dialects”), which have to be integrated into the single ECLAP archive. This prevents the normalization of person and location names at the ingestion phase, which would require the user to identify them from a predefined set. As to Person Names, the creation of a vocabulary can be very complex, since in the performing arts domain the metadata may include the names of all the actors, even those playing very small roles. Moreover, these names are mentioned in metadata fields defined as free text, and are available as free text in the original archives of the CPs. The fields marked in the Person Name column of Table 2 have to be processed by a natural language processing engine to extract Person Names in all their possible forms and languages, with the aim of disambiguating and normalizing them in the system [1]. The problem of named entity recognition with synonyms in text is well known, and it can be addressed with a variety of solutions ranging from simple grammars to machine learning. The identified names and their variations and permutations are searched on dbPedia to associate citations with external entities, the so-called VIP Names. A set of possible external resources (URLs to dbPedia) is associated with the master name and its synonyms. The identified names and their variations and permutations are also searched among ECLAP Users so as to associate citations in metadata fields with an ECLAP User.
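The lookup of "variations and permutations" of a name can be illustrated with a minimal sketch. It is not the ECLAP implementation: the normalization rules, the `MAX_TOKENS` cap and the identifiers in the usage example are assumptions made only for illustration, and the extraction of names from free text is taken as already done.

```python
import unicodedata
from itertools import permutations

MAX_TOKENS = 4  # assumed cap, keeps the number of permutations small


def normalize(name):
    """Lower-case a name, strip accents and commas, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(plain.lower().replace(",", " ").split())


def name_variants(name):
    """Return the token permutations of a normalized name."""
    tokens = normalize(name).split()
    if len(tokens) > MAX_TOKENS:
        return {" ".join(tokens)}
    return {" ".join(p) for p in permutations(tokens)}


def match_names(extracted, known):
    """Return identifiers from `known` (label -> identifier) matching a variant."""
    variants = name_variants(extracted)
    return sorted(ident for label, ident in known.items()
                  if normalize(label) in variants)
```

For instance, with a hypothetical vocabulary `{"Dario Fo": "vip:1", "Franca Rame": "vip:2"}`, the citation "Fo, Dario" would match only `vip:1`. The same function could be applied once against external labels and once against portal user names.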

In ECLAP, on about 170,000 objects, the algorithms have identified about 24,000 unique Person Names and more than 780,000 instances. About 9 % of the unique Person Names had at least one candidate correspondence on dbPedia, while only 0.67 % of them could be matched with at least one ECLAP user. Moreover, for each identified name, the whole set of Content in which the name is cited is made accessible to the user directly from the metadata via a link. This makes it possible to see, for each person name (even if it is neither a VIP nor an ECLAP user), the content mentioning the same name, and therefore to know more about the related user activities.

A second relevant analysis was related to the geographic locations and places. The aim was to identify geographical information in order to find matches with names appearing in the geonames.org dataset, thus obtaining formal locations and GPS positions. The most informative fields are the (first) performance place, city and country, and the Dublin Core spatial and coverage fields. Since exact matching did not produce enough results, the matching was performed using a full text search of the metadata field over the geographical names; the results were then filtered by requiring that the words of the matched name be present in the metadata field. The results were assessed using the precision/recall methodology, obtaining a precision of 98 % for cities (with a recall of 98.8 %) and of 99.5 % for countries (with a recall of 16 %, since in most cases the country was missing or identical to the city). Moreover, when the country field has been identified, the search for the city or place is limited to names of that country. The solution adopted is similar to the one proposed in [43, 44].
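The two-step matching described above (loose full text search, then a filter on the words of the matched name, with an optional country restriction) can be sketched as follows. The tiny `GAZETTEER` list is a hypothetical stand-in for the geonames.org dataset, and the word-overlap search only emulates a real full text engine.

```python
import re

# Hypothetical miniature gazetteer of (name, country code) pairs.
GAZETTEER = [("Luzern", "CH"), ("Firenze", "IT"),
             ("San Francisco", "US"), ("San Jose", "US")]


def words(text):
    """Split a text into a set of lower-case words."""
    return set(re.findall(r"\w+", text.lower()))


def match_places(field, country=None):
    """Return gazetteer names found in a free-text metadata field."""
    field_words = words(field)
    # Step 1: loose full-text search, any shared word makes a candidate.
    candidates = [(name, code) for name, code in GAZETTEER
                  if words(name) & field_words]
    # Step 2: keep only candidates whose words all occur in the field.
    matches = [(name, code) for name, code in candidates
               if words(name) <= field_words]
    # Step 3: when the country is known, restrict to names of that country.
    if country is not None:
        matches = [(name, code) for name, code in matches if code == country]
    return [name for name, _ in matches]
```

In this sketch, a field containing "San Francisco" produces both "San Francisco" and "San Jose" as candidates in step 1, and step 2 discards "San Jose" because "jose" does not occur in the field.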

3.2 Regularizing and disambiguating dates

As highlighted in Table 2, the metadata sets of ECLAP have instances of well-formed dates, and may also include many instances of dates in the free text fields. The latter may be highly heterogeneous in format and meaning, since they come from several tens of different CPs, collections, sources, standards and countries. In most cases, the provided metadata contain stratified information and revisions accumulated over time, and different ways of writing and classifying are used. As stated in Table 2, only a few dates are generated by the system. In most cases, dates are reported with different formats and/or languages; for example: 2013-04-01, April 2013, travanj 2013, 4th of May 1996, 4 mai 1996, etc. In many cases, the dates provided in the free text fields may be ambiguous and/or incomplete: 01-02-02, 04/02, 1995, etc. This complexity creates serious problems for the temporal ordering of content and thus of performances. Solving such problems requires algorithms that regularize and disambiguate dates, performing date classification and resolution over all kinds of collected metadata. The disambiguation process has to consider: (1) the language and the context; (2) the probability of each given format for the identified collection and Content Provider (CP), which can be deduced from the unambiguous dates found in the collection. Therefore, the algorithm has been based on a set of date model formats and natural language processing. Over the 170,000 content objects, about 864,000 dates have been identified (an average of about five dates per object), and about 80 % of them have been disambiguated and classified as: first performance, performance, upload, last change, issuing, acceptance, creation, recording, etc.
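Criterion (2) can be sketched as follows: the unambiguous dates of a collection are used to count how often each format occurs, and an ambiguous date is then read with the most frequent format among those that can parse it. The three candidate formats in `FORMATS` are an assumption for illustration, and the language and context processing of criterion (1) is not covered.

```python
from collections import Counter
from datetime import datetime

# Assumed candidate date models; a real system would handle many more.
FORMATS = ["%d-%m-%y", "%m-%d-%y", "%y-%m-%d"]


def valid_formats(text):
    """Return the candidate formats that can parse `text`."""
    ok = []
    for fmt in FORMATS:
        try:
            datetime.strptime(text, fmt)
            ok.append(fmt)
        except ValueError:
            pass
    return ok


def format_counts(collection_dates):
    """Count formats over the dates that admit exactly one reading."""
    counts = Counter()
    for text in collection_dates:
        ok = valid_formats(text)
        if len(ok) == 1:
            counts[ok[0]] += 1
    return counts


def disambiguate(text, counts):
    """Parse `text` with its most frequent valid format, or return None."""
    ok = valid_formats(text)
    if not ok:
        return None
    best = max(ok, key=lambda fmt: counts[fmt])
    return datetime.strptime(text, best).date()
```

With a collection whose unambiguous dates are all written day first (e.g., 25-12-97), the ambiguous string 01-02-02 from the text would be resolved as 1 February 2002. Raw counts are used here in place of estimated probabilities, which gives the same ranking within one collection.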

3.3 ECLAP model vs standards

The ECLAP semantic model has been designed to manage performing art content and their relationships with users and open data. To this end, a set of standards has been analyzed with particular attention on their capabilities in describing: performance place and date; first performance (premiere) place and date; role of each agent involved in the creation process (e.g., actor, director, musician); usage of standardized role names; roles used for performing arts (when roles are standardized); association of each actor with the character played; association of each musician with the instrument played; association of a performance and/or performance work with its related content (e.g., photos, piece text); association of the content with terms from classification schemes for subject or type description; documents, texts and free text, images, audio files and videos; semantic description of content (e.g., actions performed); relationships with open data such as geonames, dbPedia, etc.; legal IPR status, and possible license or IPR model per user kind.

To cope with the mentioned problems, an analysis has been performed to assess the needs of many prestigious institutions working on performing arts, thus confirming the above-mentioned requirements for the modeling of performing arts metadata. An analysis of a number of standards in modeling these aspects has been performed as a second step, thus producing the results summarized in Table 3 and discussed in the following, for each standard starting from DC.
Table 3

Summary of the comparison of standards for performing arts metadata; a Y reported as (Y) means partial support/coverage

| Aspects | MPEG-7 | EN 15907 | FRBRoo | DC | VRA-CORE | CDWA | ECLAP model |
|---|---|---|---|---|---|---|---|
| Performance place and date | (Y) | (Y) | Y | (Y) | Y | (Y) | Y |
| First performance (premiere) place and date | N | N | N | N | N | N | Y |
| Role of each agent involved in the creation process (e.g., actor, director, musician) | Y | Y | Y | (Y) | Y | Y | Y |
| Standardized roles | Y | N | Y | Y | N | N | Y |
| Supports all roles for performing arts | N | Y | Y | N | Y | Y | Y |
| Associate performance and/or performance work with related content (e.g., photos, piece text) | Y | Y | Y | Y | Y | Y | Y |
| Associate content with terms from classification schemes for subject or type description | Y | Y | Y | Y | Y | Y | Y |
| Describe documents and texts | N | N | Y | Y | N | Y | Y |
| Describe images, sounds and videos | Y | Y | Y | Y | Y | N | Y |
| Semantic description of content | Y | N | N | N | N | N | Y |
| Free text description | Y | Y | Y | Y | Y | Y | Y |
| IPR status description | (Y) | Y | Y | Y | Y | N | Y |
| IPR Model | Y via MPEG-21 REL | N | N | N | N | N | Y via ECLAP IPR Model |

Dublin Core [13] metadata terms are generic metadata elements designed to describe digital resources. There are no specific elements for the performing arts field. However, many performing arts details can be defined as specializations of the generic terms. The different contributors to the creation (e.g., actor, director) can be defined using MARC relator terms, which are defined as sub properties of dc:contributor. According to our analysis, the information about the first performance location is difficult to map to existing elements. The MARC relator terms do not cover all the professionals involved in the creation of performances (e.g., Acrobat). Moreover, it is not possible to associate the actor/musician with the name of the character/instrument played. The semantic description of content is limited to subject/coverage association. The DC.accessright field can be used to collect information on the IPR license or model, but the expected format is not formalized: it can be a URL or structured information.

MPEG-7 [34] allows the representation of information about: (1) people involved in the creation process with their specific role, using the CreationDS (Description Scheme), which can also include the character name and the instrument played, while the possible roles are standardized in the RoleCS (Coding Scheme); (2) performance location and date, using the Location and Date elements within the CreationCoordinates element in the CreationDS; (3) content classification for subject/type, using the ClassificationSchemeDescription DS to define a classification scheme; (4) scene description, using the simple Text Annotation element for free text description, KeywordAnnotation for keywords, the Structured Annotation element with Who, WhatObject, WhatAction, Where, When, Why and How sub-elements, the Dependency Structure element to represent the structure of a text annotation based on the syntactic dependency structure of the grammatical elements making up a sentence, and the Graph DS to describe a graph of relations amongst a set of description scheme instances (for example, a graph describing the narrative structure of a movie or the spatial structure of a set of segments). According to our analysis of this standard, not all types of professional roles used in performing arts are covered, information about the first performance is missing, and the standard is suitable neither for the description of documents and texts nor for IPR modeling. On the other hand, it is flexible enough to be extended. For the IPR, the MPEG-21 REL could be adapted to model potential licenses as PAR (Potential Available Rights), as in the AXMEDIS evolution of MPEG-21 [6], although MPEG-21 REL is by nature focused on modeling instances of licenses and not license models [48]. Unlike the ECLAP IPR Model, the AXMEDIS PAR model does not describe the permissions with respect to user roles and to the different kinds of digital resource. Thus, the PAR model proved unsuitable for the IPR modeling of cultural heritage collections.

EN 15907:2010 [23] defines a metadata set for cinematographic works entities such as cinematographic work, variant, manifestation, item, content and contextual entities Agent, Event. From the standard: “A cinematographic realization of a pre-existing non-film work is considered as a cinematographic work. This includes pure performance works such as concerts, original theatre performances, sports events, etc.”. The Has Agent relationship between cinematographic work, variant, manifestation, or item with an agent entity can express the “activity” of the agent (e.g., Actor) as well as the name of the character played by the agent. The production event element associated with the cinematographic work (representing the performance) may be used to report the performance location and date (using a specific value for the “production event type” sub-element, e.g., “performance”, “rehearsal”). In this case, there is no specific element for modeling performance event or space for “production event type”. The relations with non-video content as images, documents and other material associated with the performance work are marginally described. The information on the location and date of the premiere (first performance) is missing. It is not possible to describe semantically content apart from subject association. The IPR Model aspects are not addressed in this standard.

Visual Resources Association Core (VRA-CORE) [45] is a data standard for the description of works of visual culture, as well as images which may describe them. The standard is hosted by the Network Development and MARC Standards Office of the Library of Congress (LC) in partnership with the Visual Resources Association. The described core entities are work, image and collection. The work type can be a performance, the date type can be the performance date, the location type can be the performance kind. The agent can be assigned a role from a controlled vocabulary. On the basis of our analysis, we detected the lack of information to mark the first (premiere) performance (date and location); a ‘notes’ element can be used to state that a date/location refers to a premiere, but this is not fully satisfactory. The semantic description of content is limited to the association with a subject. The IPR Model aspects are not addressed in this standard.

Categories for the Description of Works of Art (CDWA) [11] describes the content of art databases by articulating a conceptual framework to describe and access information about works of art, architecture, other material culture, groups and collections of works, and related images. CDWA includes 532 categories and subcategories. A small subset of categories is considered core, since they represent the minimum information necessary to identify and describe a work. CDWA allows the representation of information about: the styles referring to the period of expression of a certain form of art (e.g., 5.1. styles/periods description; 5.2. styles/periods indexing terms); the subject; contextual information (e.g., 17. CONTEXT; 17.1. historical/cultural events); free text for description; critical comments; related works; copyright restrictions; related textual references; place/location with authority record; the creator (e.g., 4. CREATION: 4.1. creator description; 4.1.1. creator extent; 4.1.2. qualifier; 4.1.3. creator identity; 4.1.4. creator role). From the analysis, it seems that CDWA does not provide support for modeling: roles used for performing arts (when roles are standardized); the association of an actor with the character played; the association of a musician with the instrument played; and the detailed description of audio and video files. A partial solution to model roles may be to specialize the CREATION aspects reported above. On the other hand, creation in the performing arts is typically associated only with the author and the performer. Further limitations have been identified in modeling the IPR Model aspects for many different kinds of resources, while the copyright restriction can be generically defined without a specific formalization.

Functional Requirements for Bibliographic Records object oriented (FRBRoo) is the harmonization of FRBR and CIDOC-CRM performed by IFLA and ICOM. FRBRoo provides a number of classes for modeling performance work, recording works, performance plan, recordings, etc. In FRBRoo, classes that can be useful for the description of the performing arts works are: F20 performance work, F21 recording work, F25 performance plan, F26 recording, F27 work conception, F28 expression creation, F29 recording event, F30 publication event, F31 performance (subclass of: E7 activity, E5 event, E4 period, E2 temporal entity), F9 place, F10 person, F38 character. Properties that can be used for performance, performance work and performance plan are: R25 performed (was performed in) (domain: F31 performance; range: F25 performance plan), P14 carried out by (performed) (domain: E7 activity; range: E39 actor), P14.1 in the role of (range: E55 type), R12 is realized in (realizes) (domain: F20 performance work; range: F25 performance plan), R13 is realized in (realizes) (domain: F21 recording work; range: F26 recording), P4 has time-span (is time-span of) (domain: E2 temporal entity; range: E52 time-span), P7 took place at (witnessed) (domain: E4 period; range: E53 place).

In [32], Patrick Le Boeuf presented an analysis of the usage of FRBRoo for modeling performing arts descriptions as linked data, proposing several patterns and solutions. In [15, 17], a study on the mapping of FRBRoo structures and concepts to the Europeana EDM has been proposed. The study specifically focused on performing arts cases because of their high modeling complexity. In [19], an analysis of the difficulties in modeling performing arts issues with an ontology has been presented. The analysis also raised some criticism of FRBRoo (which could be extended to other models as well) concerning the modeling of both the abstract plan for a performance and the several variations in its related instances, i.e., the real performances. According to our analysis against the ECLAP requirements, FRBRoo has some limitations in modeling the full semantics of the first performance (either of a work or of a production). One could associate a type “premiere” with F31 performance, partially solving the problem. Moreover, it seems to be impossible to associate the actor/musician with the name of the character played or of the instrument played in a performance. Similarly, the semantic description of content is limited to the association with a subject. The cases presented in [15] share the same problems. IPR support for modeling information in FRBR is limited to the formalization of a reference to licenses. Thus, the IPR Model has to be formalized in other ways.

As a conclusion, MPEG-7 and EN 15907:2010 on film identification and the VRA-CORE 4.0 are mostly related to the description of audio visual aspects of video/image material, but they are not suitable for the description of documents and texts. The FRBRoo seems to be the most powerful to cope with the problems of the performing arts domain especially if we consider the current effort Europeana is doing to integrate it with EDM. On the other hand, ECLAP has demonstrated to be capable of modeling more details about performing arts with respect to the other models and standards, and it also integrates the aspects related to social activities and user-content relationships, for example, citations of VIP names, geonames, usernames, people.

4 ECLAP LOD model and service

The ECLAP portal provides access to RDF descriptions of the digital resources available on it through specific URIs. The RDF description of a resource is provided to LOD-enabled browsers, while standard web browsers are redirected to the usual HTML page with a human readable description. The resource descriptions provided cover the taxonomy terms used to classify content, the content annotations that relate pairs of audiovisual content, the groups to which the content is bound (e.g., the group of the CP), the ECLAP users with their connections with content, and the names referred to in the metadata.

where <axoid> is the unique identifier assigned to the content when uploaded (e.g., urn:axmedis:00000:obj:04e0caef-b33b-4f4a-ba50-a80d96766192), <tid> is the vocabulary term identifier (e.g., 501 for dance), <aid> is the identifier assigned to the annotation, <gid> is the identifier of the group (e.g., 3160 for the development group), <uid> is the user identifier (e.g., 1 is the portal administrator) and <nid> is the identifier for a name. The usage of numbers allows assigning unique and stable identifiers for each of them (since most can be freely changed by users, for example, the group name) and to develop iterators for accessing them.
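The choice between the RDF description and the HTML page can be sketched as a small content negotiation routine. The text does not state how the portal recognizes a LOD-enabled browser; the use of the HTTP Accept header, the media types in `RDF_TYPES` and the function names are assumptions made for illustration.

```python
# Assumed RDF media types; the portal may support others.
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def parse_accept(header):
    """Parse an Accept header into a {media type: quality} mapping."""
    prefs = {}
    for part in header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        quality = 1.0
        for param in fields[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if media:
            prefs[media] = quality
    return prefs


def choose_representation(header):
    """Return 'rdf' when an RDF type is preferred at least as much as HTML."""
    prefs = parse_accept(header or "")
    rdf_q = max(prefs.get(media, 0.0) for media in RDF_TYPES)
    html_q = max(prefs.get("text/html", 0.0), prefs.get("*/*", 0.0))
    return "rdf" if rdf_q > 0 and rdf_q >= html_q else "html"
```

A client sending only `application/rdf+xml` would receive RDF, while a typical browser header such as `text/html,application/xhtml+xml,*/*;q=0.8` would lead to the redirect to the HTML page.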

Moreover, a number of relationships exists as well among:
  • Content and vocabulary terms describing it,

  • content and aggregated content (e.g., collection, playlist) containing it,

  • content and groups that are used to provide the content (each ECLAP CP has a group),

  • content and annotations that describe it,

  • users and content, groups and annotations,

  • content and the geonames vocabulary for the places where performances were held, they are provided as a result of an enrichment made on the metadata,

  • content with Person Names cited in the metadata,

  • Person Names with ECLAP users or with DBPedia.

In Fig. 3, an example of how content is related with vocabulary/taxonomy terms, collections and annotations is reported. For the description of the entities a specific ontology has been designed, and this ontology is available as linked data. All URIs used for properties and classes are dereferenceable and point to the ontology description (e.g., http://www.eclap.eu/schema/eclap/performancePlace), available both as RDF and as human readable documentation in HTML.
Fig. 3

Example of relation among a content with collections, taxonomy terms, names, users, groups, places and annotations

4.1 Content description

Each content is described using RDF; the Dublin Core terms in the ECLAP semantic model are provided as they are, while the specific fields for ECLAP are provided using specific properties (e.g., eclap:performancePlace) which are declared refinements of more generic properties taken from standard schemas (e.g., dcterms:spatial). The relations with the vocabulary are provided using specific properties (e.g., eclap:genre for the terms of the genre hierarchy) linking the LOD URIs to the terms. Also these properties are declared as sub properties of Dublin Core terms.
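The effect of declaring an ECLAP property as a refinement of a more generic one can be sketched with a small inference step over triples: a client that only understands Dublin Core can still use the statement through the parent property. Only the pairing of eclap:performancePlace with dcterms:spatial comes from the text; the second entry follows the DCMI Metadata Terms, and the resource name and place value in the usage example are hypothetical.

```python
SUB_PROPERTY_OF = {
    "eclap:performancePlace": "dcterms:spatial",  # pairing given in the text
    "dcterms:spatial": "dcterms:coverage",        # from DCMI Metadata Terms
}


def with_inferred(triples, sub_property_of=SUB_PROPERTY_OF):
    """Add the triples implied by sub property declarations."""
    result = set(triples)
    frontier = list(result)
    while frontier:
        subject, prop, obj = frontier.pop()
        parent = sub_property_of.get(prop)
        if parent is not None and (subject, parent, obj) not in result:
            inferred = (subject, parent, obj)
            result.add(inferred)
            frontier.append(inferred)
    return result
```

Starting from the single hypothetical triple `("ex:video1", "eclap:performancePlace", "Milano")`, the function also yields the same statement with dcterms:spatial and dcterms:coverage as properties.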

The relations with other aggregated content, such as collections, are provided using the dcterms:isPartOf and dcterms:hasPart properties. Relations with the group of the content provider that supplies the content are expressed by specific properties, eclap:isProvidedBy and eclap:provides (both sub properties of dc:relation). These relations allow the linking of all the content; in particular, they can be useful for crawlers, allowing them to harvest all the content items of a provider. Moreover, a link to the content representation provided to Europeana is available as well. In addition, a link to the license using the Creative Commons properties (cc:license and cc:attributionURL) could be used if the content has an associated IPR model specifying a valid license URL.

The following is an example of the RDF representation of a video related to Dario Fo’s “Mistero Buffo”:

[RDF listing provided as an image in the original article]

4.2 Taxonomy description

ECLAP provides six thesauruses of terms for the classification of content (for a total of 231 terms):
  • Subject (e.g., teaching, philosophy, multiculture).

  • Genre (e.g., comedy, comic, drama).

  • Historical period (e.g., contemporary, classical, XX century)

  • Movement and style (e.g., experimental, theater of the absurd)

  • Performing arts type (e.g., dance, ballet, music, rock, theater, Noh)

  • Management and organization (e.g., performance, choreography)

Each term in the thesaurus is described using SKOS [47], the relations among the concepts are provided using the broader/narrower properties, and each term is described with multilingual labels in 13 different languages. Moreover, each term is linked with all the content items using that term by means of a specific isSubjectOf property.
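The broader/narrower relations make it possible to retrieve, for a term, all the more specific terms below it, which is useful when collecting the content classified under a whole branch. The sketch below assumes a small `BROADER` mapping built from the example terms listed above; the actual parent/child pairs of the ECLAP thesauri are not given in the text, so these pairs are hypothetical.

```python
from collections import defaultdict

# Hypothetical child -> parent pairs, in the spirit of skos:broader.
BROADER = {"ballet": "dance", "rock": "music", "Noh": "theater"}


def narrower_index(broader):
    """Invert a broader mapping into parent -> set of children."""
    index = defaultdict(set)
    for child, parent in broader.items():
        index[parent].add(child)
    return index


def term_and_descendants(term, broader=BROADER):
    """Return `term` together with all of its transitively narrower terms."""
    index = narrower_index(broader)
    seen, stack = {term}, [term]
    while stack:
        for child in index.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

Under these assumed pairs, asking for "dance" returns both "dance" and "ballet", while a leaf term such as "Noh" returns only itself.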

4.3 Annotations description

Annotations are used to relate the whole content, or some fragments of it, to a textual description and to another content or fragment. Annotations can also be associated with an additional descriptor (e.g., scene, gesture, character). Annotations are described using the OAC ontology [35], which is currently a W3C community working draft: the hasTarget property refers to the object being annotated; the FragmentSelector class is used to specify the temporal fragment of the annotated resource that is subject to the annotation; and the hasBody property refers to the annotation body, which can be a reference to another content or a text description. The annotatedBy property relates the annotation to the user who created it, and the annotatedAt property indicates when the annotation was created.

The annotation tool and model of ECLAP is MyStoryPlayer [2]. It supports two kinds of annotations, the One2One (that is shown in parallel with the media annotated) and the explosive annotation (that is shown stopping the media being annotated and showing the annotation audiovisual on the main canvas). This aspect of the semantic behavior of the annotation is not representable using OAC. To cope with this problem, an additional rdf:type has been added to formalize this type of annotation. The following is an example of a One2One annotation of a video fragment from second 29 to 227, with another video fragment (from second 67 to 119) and with a text description. There is also a dc:type element to associate a classification keyword with the annotation (acting style):

[RDF listing provided as an image in the original article]
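The structure of such an annotation can be approximated with plain Python dictionaries. This is only a sketch and not the RDF published by ECLAP: the oa: names other than those mentioned in the text (oa:SpecificResource, oa:hasSource, oa:hasSelector) are taken from the Open Annotation model, the temporal fragment value follows the W3C Media Fragments syntax `t=start,end`, and the type name eclap:One2One as well as all resource identifiers are hypothetical.

```python
def temporal_fragment(source, start, end):
    """Describe the fragment of `source` between `start` and `end` seconds."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return {
        "rdf:type": "oa:SpecificResource",
        "oa:hasSource": source,
        "oa:hasSelector": {
            "rdf:type": "oa:FragmentSelector",
            "rdf:value": f"t={start},{end}",
        },
    }


def one2one_annotation(target, body, text, keyword, user, created):
    """Assemble a One2One annotation relating two fragments and a text."""
    return {
        # The second type stands for the additional rdf:type; its name is hypothetical.
        "rdf:type": ["oa:Annotation", "eclap:One2One"],
        "oa:hasTarget": target,
        "oa:hasBody": [body, text],
        "dc:type": keyword,
        "oa:annotatedBy": user,
        "oa:annotatedAt": created,
    }
```

The example in the text would correspond to a target built with `temporal_fragment(video_a, 29, 227)`, a body built with `temporal_fragment(video_b, 67, 119)`, a text description, and "acting style" as the dc:type keyword.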

4.4 User description

Given the privacy implications of publishing personal information about users, only minimal personal information is given, namely the nickname. However, other relations are available, such as: the ‘knows’ relation connecting the user with ‘friend’ users, featured content, favorite content, uploaded content, created annotations, subscribed groups, and possible taxonomy terms of interest. The following is an example of the description of a user:

[RDF listing provided as an image in the original article]

The property isMemberOf is the inverse of the foaf:member property and the createdAnnotation property is the inverse of oa:annotatedBy. The hasFavourite property is defined as a sub property of foaf:interest.

4.5 Group description

Groups in ECLAP are used both as a way to aggregate users around a specific topic (e.g., the working group on IPR issues) and to aggregate content provided by a content provider (e.g., the Dario Fo and Franca Rame Archive). Each group has a set of users who are group administrators and a set of group members, and it is associated with media objects.

4.6 Name/person description

Each name found in the metadata during the named entity recognition phase is accessible with an RDF description providing the different names that were marked as synonyms, the possible links to dbPedia records for the same name, the link to the ECLAP user with the same name, and links to the content quoting this name (or its synonyms). The links to dbPedia are made via the rdfs:seeAlso property and not with the semantically stronger owl:sameAs: several candidate links on dbPedia are often found for the same name, and linking all of them with owl:sameAs would make them all equivalent to each other.
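The reason for avoiding owl:sameAs can be made concrete with a minimal sketch. Since owl:sameAs is symmetric and transitive, the resources it connects fall into equivalence classes, computed here with a simple union-find; the identifiers in the usage example are hypothetical.

```python
def same_as_classes(links):
    """Group resources into the equivalence classes implied by sameAs links."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)
    return list(classes.values())
```

If a name `eclap:name/1` were linked with owl:sameAs to two candidate records `dbpedia:A` and `dbpedia:B`, the result would be a single class containing all three, so the two dbPedia records would be asserted to denote the same entity. With rdfs:seeAlso no such inference is drawn and the candidates remain distinct.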

The following is the description of Dario Fo:

[RDF listing provided as an image in the original article]

And the following is the description of person “Paolo Nesi” that is linked with owl:sameAs with the ECLAP user:

[RDF listing provided as an image in the original article]

5 Relations display and navigation

ECLAP allows users to display and navigate the relations among the managed entities. The ‘Social Graph’ of a media object is shown when a content is played or when the user logs in. This graph is a simplification of the information that is available in the ECLAP semantic model and via linked data; to make it easier for users to understand, the terminology used for relations is not always the same as that used in LOD.

The graph is made of two kinds of nodes: rectangular-shaped nodes represent entities (content, terms, users, etc.), while circular-shaped nodes represent relations. Directed edges connect an entity node to a relation node and a relation node to an entity node. Examples of relations are shown in Fig. 4. Regarding the user interactions, the user is able to: Expand an entity node with its relations adding them to the graph; Focus on an entity, in this case the graph is cleared and only the focused node is shown with its relations; Open, which means playing the page or content associated with the node; use the Back button to go back to previous states of the graph (e.g., after a focus); zoom/pan the view; hide/show types of relations to reduce the complexity of the graph. A special node is the ‘More’ node that is presented when there are many nodes in a relationship (e.g., the content associated with a group). In this case, providing all nodes could be infeasible, thus a limited number of nodes is provided and a ‘more’ node is added to the relation. Clicking on it other nodes are added to the relation in a way similar to classical pagination used to present long lists in HTML.
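The behavior of the ‘More’ node can be sketched with a small structure that stores, for each (entity, relation) pair, the list of related entities and how many of them are currently shown. The class name, the page size and the identifiers in the usage example are assumptions; the actual ECLAP tool is not described at this level of detail.

```python
from collections import defaultdict


class SocialGraph:
    """Entities linked through relation nodes, revealed one page at a time."""

    def __init__(self, page_size=3):  # assumed page size
        self.page_size = page_size
        self.targets = defaultdict(list)  # (entity, relation) -> related entities
        self.shown = defaultdict(int)     # (entity, relation) -> count displayed

    def add(self, entity, relation, target):
        """Record that `entity` is linked to `target` through `relation`."""
        self.targets[(entity, relation)].append(target)

    def more(self, entity, relation):
        """Return the next page of related entities and whether 'More' remains."""
        key = (entity, relation)
        start = self.shown[key]
        page = self.targets[key][start:start + self.page_size]
        self.shown[key] = start + len(page)
        has_more = self.shown[key] < len(self.targets[key])
        return page, has_more
```

For a group with five content items and a page size of three, the first call to `more` returns three items and signals that a ‘More’ node is needed; the second call returns the remaining two and the ‘More’ node disappears.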
Fig. 4

An Example of ECLAP Social Graph

In Fig. 4, an example of ECLAP Social Graph of content is shown after expanding some nodes. The relationships visualized by the Social Graph are reported in Appendix A. The Social Graph is also presented in the Europeana ThoughtLab page on new ways of searching and browsing (http://pro.europeana.eu/web/guest/thoughtlab/new-ways-of-searching-and-browsing#SocialGraph).

According to the analysis of user interactions with the Social Graph and with the whole portal, 5.8 % of unique users interacted with the Social Graph. ECLAP users get access to the Social Graph in their home page, where several content lists are also accessible: recently played, last posted, popular, latest contributions from your groups and colleagues, top rated, your favorites, your uploads, potential colleagues, latest updates, featured content, etc. The Social Graph does not offer any support for creating new edges, while the above-mentioned lists can be of help in creating new edges. From the activation of the Social Graph (2013-01-29) until mid-September 2013, the most requested operation was to Open a node (43 %, for example, to access a recommendation or to see the content of other users), followed by Expand a node (29 %, of which expanding a media object covers 17 %), by More to see further related content (18 %), and by Focus (about 10 %). Figure 5 reports in more detail the distribution of the interactions among the different types of actions.
Fig. 5

Distribution of major user interactions on Social Graph, in percentage with respect to the total number of interactions

6 From ECLAP to EUROPEANA EDM model

Recently, the new EDM [22] for metadata ingestion and management has been proposed. The new model is based on well-defined semantic web standards such as ORE, Dublin Core [13] and SKOS [41]. Notable requirements for the EDM model were: (1) distinction between the “provided object” (painting, book, movie, archaeological site, archival file, etc.) and its digital representation; (2) distinction between an object and the metadata record describing it; (3) multiple records for the same object should be allowed, containing potentially contradictory statements about the object; (4) support for objects that are composed of other objects; (5) compatibility with different abstraction levels of description; (6) a standard metadata format that can be specialized; and (7) a standard vocabulary format that can be specialized. One of the main goals of EDM is to allow the integration of the different data models used for cultural heritage data, so as to collect and connect, through higher-level concepts, all the original descriptions coming from several Content Aggregators. Analyzing the EDM model in the context of content aggregation, two basic classes of resources provided to Europeana can be identified: the “provided object” itself and a (set of) digitally accessible representation(s) of it. This makes it possible to keep “works”, which are expected to be the focus of the users’ interest, separate from their digital representations, which are the elements manipulated in information systems like Europeana. Following the ORE approach, the provided object and its digital representations, as given by one provider, form an aggregation, modeled through the ore:Aggregation class. Each instance of ore:Aggregation is related to one resource standing for the provided object, through the ens:aggregatedCHO property, and to one or more resources that are the digital representations of the provided object, through the ens:hasView property.
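The aggregation pattern just described (one provided object and one or more views per ore:Aggregation) can be illustrated with a small Python sketch over plain triples. The property names are those quoted in the text; the resource identifiers and the `check_aggregations` helper are hypothetical and serve only to make the cardinality rule concrete.

```python
def check_aggregations(triples):
    """Return the aggregations that break 'one provided object, one or more views'."""
    aggregations = set()
    provided = {}   # aggregation -> set of provided objects
    views = {}      # aggregation -> set of digital representations
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj == "ore:Aggregation":
            aggregations.add(subject)
        elif predicate == "ens:aggregatedCHO":
            provided.setdefault(subject, set()).add(obj)
        elif predicate == "ens:hasView":
            views.setdefault(subject, set()).add(obj)
    return sorted(
        agg for agg in aggregations
        if len(provided.get(agg, ())) != 1 or not views.get(agg)
    )


# Hypothetical sample data: agg:2 has no digital representation.
sample = [
    ("agg:1", "rdf:type", "ore:Aggregation"),
    ("agg:1", "ens:aggregatedCHO", "cho:1"),
    ("agg:1", "ens:hasView", "view:1a"),
    ("agg:1", "ens:hasView", "view:1b"),
    ("agg:2", "rdf:type", "ore:Aggregation"),
    ("agg:2", "ens:aggregatedCHO", "cho:2"),
]
print(check_aggregations(sample))
```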

The present version of EDM integrates the former model of Europeana called Europeana Semantic Elements (ESE), by re-contextualizing each element in the more structured context of EDM.

In particular, in the context of EDM deployment, the values of ESE properties, which are currently provided as simple strings, could be given in a typical RDF [38] form, namely as pointers to full-fledged (RDF) resources standing for concepts, agents or places (to name a few), which would come with a complete description and links to other resources. This applies both to Dublin Core properties (e.g., dc:creator) and to ESE-specific ones (e.g., ens:isShownAt). As EDM supports the delivery of aggregated content, ECLAP can use Collections as a kind of aggregated content that may be provided to Europeana.

Moreover, ECLAP used the extensibility of EDM to define specializations of some properties, so as to provide more detailed information on content. For example, custom properties have been defined in the following way:
  • eclap:director rdfs:subPropertyOf dc:creator.

  • eclap:lightDesigner rdfs:subPropertyOf dc:contributor.

  • eclap:performanceDate rdfs:subPropertyOf dcterms:issued.

where the director property is defined as a sub-property of the Dublin Core creator, lightDesigner as a sub-property of contributor, and performanceDate as a sub-property of the issued date. However, ingestion into Europeana is performed by providing data as XML together with an XSLT that maps them to the EDM XML Schema. The EDM XML Schema uses the RDF/XML encoding of the EDM ontology and limits the kinds of properties and classes that can be used, so custom properties and custom classes are not allowed.
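One way to picture the consequence of this restriction is to fall back from each custom property to its declared super-property. The Python sketch below only illustrates that logic; the actual ingestion relies on an XSLT, and the `ALLOWED` set is a hypothetical stand-in for what the target schema accepts.

```python
# Sub-property declarations quoted in the text (child -> parent).
SUB_PROPERTY_OF = {
    "eclap:director": "dc:creator",
    "eclap:lightDesigner": "dc:contributor",
    "eclap:performanceDate": "dcterms:issued",
}

# Hypothetical whitelist standing in for the properties the target schema allows.
ALLOWED = {"dc:creator", "dc:contributor", "dcterms:issued", "dc:title"}


def generalize(prop):
    """Climb rdfs:subPropertyOf links until an allowed property is reached."""
    seen = set()
    while prop not in ALLOWED:
        if prop in seen or prop not in SUB_PROPERTY_OF:
            return None  # no allowed ancestor: the statement cannot be exported
        seen.add(prop)
        prop = SUB_PROPERTY_OF[prop]
    return prop


def to_allowed(statements):
    """Rewrite (property, value) pairs so that only allowed properties remain."""
    exported = []
    for prop, value in statements:
        target = generalize(prop)
        if target is not None:
            exported.append((target, value))
    return exported
```

The price of this fallback is a loss of detail: after export, a director is indistinguishable from any other creator.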

Therefore, the ECLAP metadata schema is mapped to the EDM schema using an object-centered perspective (the only one that Europeana ingestion currently supports). It should be noted that, in the performing arts domain, the content provided to Europeana in many cases does not strictly represent a physical object (like a book, a painting, a sculpture, …), but rather an event that occurred in the past, namely the performance. This is quite different from classical cultural heritage elements. For each ECLAP MediaObject to be provided to Europeana, three elements have to be produced: an edm:ProvidedCHO element representing the provided cultural heritage object with all its metadata; an edm:WebResource element representing the ECLAP portal web page showing the cultural heritage object; and an ore:Aggregation element connecting the ProvidedCHO with the WebResource and adding information about the provider (the aggregator and the content provider), the thumbnail of the digital resource, etc.
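The three elements produced for each MediaObject can be sketched with the Python standard library as follows. This is an illustration under stated assumptions, not the ECLAP export code: the choice of edm:isShownAt, edm:object, edm:provider and edm:dataProvider to carry the page, thumbnail and provider information is assumed here (the detailed mapping is the one given in Appendix B), and the `#aggregation` identifier scheme and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefixed):
    """Turn 'edm:ProvidedCHO' into the '{namespace}local' form ElementTree uses."""
    prefix, local = prefixed.split(":")
    return f"{{{NS[prefix]}}}{local}"


def media_object_to_edm(cho_uri, page_url, thumb_url, title, provider, data_provider):
    root = ET.Element(q("rdf:RDF"))

    # 1. The provided cultural heritage object with its descriptive metadata.
    cho = ET.SubElement(root, q("edm:ProvidedCHO"), {q("rdf:about"): cho_uri})
    ET.SubElement(cho, q("dc:title")).text = title

    # 2. The portal web page showing the object.
    ET.SubElement(root, q("edm:WebResource"), {q("rdf:about"): page_url})

    # 3. The aggregation tying the two together, plus provider and thumbnail.
    agg = ET.SubElement(
        root, q("ore:Aggregation"), {q("rdf:about"): cho_uri + "#aggregation"}
    )
    ET.SubElement(agg, q("edm:aggregatedCHO"), {q("rdf:resource"): cho_uri})
    ET.SubElement(agg, q("edm:isShownAt"), {q("rdf:resource"): page_url})
    ET.SubElement(agg, q("edm:object"), {q("rdf:resource"): thumb_url})
    ET.SubElement(agg, q("edm:provider")).text = provider
    ET.SubElement(agg, q("edm:dataProvider")).text = data_provider
    return root
```

Calling `ET.tostring(media_object_to_edm(...), encoding="unicode")` on placeholder values yields an RDF/XML document with the three sibling elements in the order listed above.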

In general, the Dublin Core elements (dc and dcterms) of the MediaObject are mapped directly to the ProvidedCHO elements, while the PerformingArts metadata are mapped to Dublin Core elements when possible. The taxonomy associations may also be mapped to Dublin Core elements, depending on the top element of the hierarchy (Subject is mapped to dc:subject, PerformingArtType to dc:type, HistoricalPeriod to dcterms:temporal, etc.). Moreover, the skos:Concept elements representing the terms used in the metadata are reported as well. The mapping is enhanced by the associations with Places, TimeSpans and Agents, which complement the textual metadata with links to RDF resources coming from LOD initiatives or well-known authority files, such as DBpedia for person names, geonames for places, etc. Appendix B reports a more detailed description of the mapping. Appendix C provides an example of mapping the metadata of an image from the Dario Fo and Franca Rame Archive.
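The taxonomy rule above, where the top element of the hierarchy decides the target Dublin Core property, can be sketched in Python as follows. Only the three correspondences named in the text are included; the sample term paths and the function name are hypothetical.

```python
# Top-level taxonomy element -> Dublin Core property (the three cases named in the text).
TOP_LEVEL_TO_DC = {
    "Subject": "dc:subject",
    "PerformingArtType": "dc:type",
    "HistoricalPeriod": "dcterms:temporal",
}


def map_taxonomy(term_paths):
    """Map each term path (top element first, term last) to a (property, term) pair.

    Paths whose top element has no known correspondence are returned separately
    instead of being silently dropped.
    """
    mapped, unmapped = [], []
    for path in term_paths:
        prop = TOP_LEVEL_TO_DC.get(path[0])
        if prop is None:
            unmapped.append(path)
        else:
            mapped.append((prop, path[-1]))
    return mapped, unmapped


# Hypothetical term paths, for illustration only.
paths = [
    ("PerformingArtType", "Theatre", "Comedy"),
    ("HistoricalPeriod", "Contemporary"),
    ("SomeOtherBranch", "Term"),
]
mapped, unmapped = map_taxonomy(paths)
```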

Recently, Europeana has issued guidelines for providing hierarchically organized content, which should be considered when supplying Europeana with aggregated content. This type of content is available on the ECLAP portal as Collections, Playlists, Annotations and Courses, but with the present EDM model most of the aggregated content in the ECLAP semantic model cannot be fully exported to Europeana. In fact, even when following the aggregation schema allowed by EDM, the information about (1) the temporal segments of media involved in playlists, (2) the semantic information related to annotations and synchronizations modeled in MyStoryPlayer, and (3) the full courses cannot be directly mapped into EDM. These therefore remain additional features of the ECLAP Content Aggregator with respect to the Europeana model and service. ECLAP also offers social network management, and the several relationships with users cannot be mapped either, but in most cases they are probably outside the scope of Europeana.

7 Conclusions

In this paper, the ECLAP semantic model, which addresses the problems of performing arts metadata and of content enrichment and aggregation, has been presented. It describes the entities and the relationships supported among the several kinds of content and user activities, focusing on performing arts aspects, IPR, annotations and aggregation, the linking to external sources such as DBpedia and geonames, regulations, and the many dates collected for the events associated with performances and content evolution. The proposed model for representing performing arts metadata has been compared with the most widespread and well-known standards, such as FRBRoo, DC, EDM and MPEG-7 (limiting the analysis to standards that actually have some specific capability to cope with performing arts aspects). The ECLAP model is also accessible as LOD, making available to the community the large set of ECLAP data, including content descriptions, taxonomy, relationships, user descriptors and annotations (according to the MyStoryPlayer model), links to external LOD, etc. Some examples have been reported as well. To give final users complete access to the ECLAP semantic model, a Social Graph tool has been proposed. It allows users to visualize and navigate the model, and also to prune and filter the relationships according to their interests. Some results of the user validation have also been presented. Finally, the mapping of the ECLAP semantic model to the EDM model of Europeana has been presented. This mapping is the final step that the metadata go through to reach the European aggregator of cultural heritage content.

ECLAP has successfully addressed and enriched more than 170,000 multilingual content items, providing them as LOD and in EDM. The Linked Open Data is freely accessible, and the EDM information is also accessible directly on the Europeana service. The experience has also highlighted that some relevant elements produced, enriched and aggregated by ECLAP cannot be mapped into EDM, since the ECLAP model can address some of the details related to the performing arts which are not presently addressed by the available standards.

Notes

Acknowledgments

The authors want to thank Hugo André Lopes, Alessandro Venturi, and Marco Serena for their help in developing the linked data support and the Social Graph visualization and its integration into ECLAP. Thanks also go to all the partners involved in ECLAP, and to the European Commission for funding ECLAP in the Theme CIP-ICT-PSP.2009.2.2, Grant Agreement No. 250481. Sincere thanks to Patrick Le Boeuf for all the comments and emails exchanged about performing arts modeling in FRBRoo and ECLAP.

References

  1. Bellandi, A., Bellini, P., Cappuccio, A., Nesi, P., Pantaleo, G., Rauch, N.: Assisted knowledge base generation, management and competence retrieval. Int. J. Softw. Eng. Knowl. Eng. 32(8), 1007–1038 (2012). doi: 10.1142/S021819401240013X
  2. Bellini, P., Nesi, P., Paolucci, M., Serena, M.: Models and tools for content aggregation and audiovisual cross annotation synchronization. In: Proceedings of 2011 IEEE International Symposium on Multimedia, pp. 210–215 (2011)
  3. Bellini, P., Bruno, I., Cenni, D., Nesi, P.: Micro grids for scalable media computing and intelligence on distributed scenarios. IEEE Multimed. 19(2), 69–79 (2012)
  4. Bellini, P., Bruno, I., Nesi, P.: A workflow model and architecture for content and metadata management based on grid computing. In: Proceedings of the ECLAP 2013 Conference, 2nd International Conference on Information Technologies for Performing Arts, Media Access and Entertainment, Springer LNCS (2013)
  5. Bellini, P., Cenni, D., Nesi, P.: On the effectiveness and optimization of information retrieval for cross media content. In: Proceedings of the KDIR 2012 part of IC3K 2012, International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (2012)
  6. Bellini, P., Nesi, P., Pazzaglia, F.: Exploiting P2P scalability for grant authorization in digital rights management solutions. Int. J. Multimed. Tools Appl. (2013)
  7. Bellini, P., Nesi, P., Serena, M.: MyStoryPlayer: semantic audio visual annotation and navigation tool. In: Proceedings of the 17th International Conference on Distributed Multimedia Systems, DMS11, Florence (2011)
  8. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. J. Web Sem. 7(3), 154–165 (2009)
  9. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009). doi: 10.4018/jswis.2009081901
  10. Bruno, I., Paolucci, M., Bellini, P., Mitolo, N.: DE3.3.2 content and metadata processing and semantification. http://www.eclap.eu/115117
  11.
  12. Colin, D.: The difficulty of an ontology of live performance. InterAct. UCLA J. Educ. Inf. Stud. 9(1) (2013)
  13.
  14. Dionissiadou, I.: Archives incorporating museum objects: the case of Performing Arts. In: 2010 Annual Conference of CIDOC, Shanghai, 8–10 Nov 2010. http://cidoc.meta.se/2010/full_papers/dionissiadou.pdf
  15. Doerr, M., Gradman, S., Le Boeuf, P., Aalberg, T., Bailly, R., Olensky, M.: Final report on EDM-FRBRoo Application Profile Task Force, Europeana (2013)
  16. Doerr, M., Bekiari, C., Le Boeuf, P.: FRBROO, a Conceptual Model for Performing Arts. In: 2008 Annual Conference of CIDOC, Athens, 15–18 Sep 2008. http://cidoc.mediahost.org/archive/cidoc2008/Documents/papers/drfile.2008-06-42.pdf
  17. Doerr, M., Gradman, S., Hennicke, S., Isaac, A., Meghini, C., van de Sompel, H.: The Europeana Data Model. In: Dissemination paper, IFLA 2010, World Library and Information Congress: 76th IFLA General Conference and Assembly, Gothenburg, 15 Aug 2010. http://www.ifla.org/files/hq/papers/ifla76/149-doerr-en.pdf
  18. Doerr, M.: The CIDOC conceptual reference module - an ontological approach to semantic interoperability of metadata. AI Mag. 24(3) (2003)
  19. Doty, C.: The difficulty of an ontology of live performance. InterAct. UCLA J. Educ. Inf. Stud. (2013)
  20. Bellini, P., Bruno, I., Cenni, D., Nesi, P., Paolucci, M., Serena, M.: A new generation digital content service for cultural heritage institutions. In: Proceedings of the ECLAP 2013 Conference, 2nd International Conference on Information Technologies for Performing Arts, Media Access and Entertainment, Springer LNCS (2013)
  21. Verbruggen, E., Baltussen, L.B., Mitolo, N., Nesi, P., Oomen, J., Van Biessum, H.: ECLAP early exploitation plan, M30. http://www.eclap.eu/115355
  22.
  23.
  24.
  25. Eversmann, P., Lint, E., Schuurman, J.: ECLAP: performing arts education, heritage and educational IT. Best practice recommendations, DE5.2.3 WGA. http://www.eclap.eu/136384
  26. Young, J.: On metadata, Performing Arts material in our digital world, Global Performing Arts Consortium. WWW.Glopac.org
  27. Haslhofer, B., Jochum, W., King, R., Sadilek, C., Schellner, K.: The LEMO annotation framework: weaving multimedia annotations with the Web. Int. J. Digit. Libr. 10(1), 15–32 (2009)
  28. Haslhofer, B., Simon, R., Sanderson, R., Van de Sompel, H.: The Open Annotation Collaboration (OAC) model. In: Proceedings of the 2011 Workshop on Multimedia on the Web (MMWEB ’11), pp. 5–9. IEEE Computer Society, Washington DC (2011)
  29. Kahan, J., Koivunen, M.R.: Annotea: an open RDF infrastructure for shared Web annotations. In: WWW’01: Proceedings of the 10th International Conference on World Wide Web, pp. 623–632. ACM Press, New York (2001)
  30. Kobilarov, G., Scott, T., Raimond, Y., Oliver, S., Sizemore, C., Smethurst, M., Bizer, C., Lee, R.: Media meets Semantic Web - how the BBC uses DBpedia and Linked Data to make connections. In: Proceedings of the 6th European Semantic Web Conference, pp. 723–737. Springer, Berlin (2009)
  31. Koster, L.: Linking library and theatre data. In: International Group of Ex Libris Users 2011 IGeLU Conference, University of Haifa, 11–13 Sept 2011
  32. Le Boeuf, P.: Towards Performing Arts Information As Linked Data? In: SIBMAS 2012 Conference: Best Practice! Innovative Techniques for Performing Arts Collections, Libraries and Museums = A la recherche de l’excellence! Approches innovantes dans les collections et bibliothèques des arts du spectacle, France (2012)
  33. Lignes de Temps - analyse, comment and annotate films and any audio/video recordings. http://www.iri.centrepompidou.fr/outils/lignes-de-temps-2/
  34.
  35. Open Annotation Collaboration, W3C. http://www.openannotation.org/
  36. Gray, S.: Conservation and Performance Art, Building the Performance Art Data Structure PADS. Master Dissertation, Northumbria University (2008)
  37. Raimond, Y., Sandler, M.B.: A Web of musical information. In: Bello, J.P., Chew, E., Turnbull, D. (eds) ISMIR, pp. 263–268 (2008). http://musicontology.com
  38.
  39. Saathoff, C., Scherp, A.: Unlocking the semantics of multimedia presentations in the web with the Multimedia Metadata Ontology. In: WWW, ACM, pp. 831–840 (2010)
  40. Scherp, A., Eibing, D., Saathoff, C.: A method for integrating multimedia metadata standards and metadata formats with the Multimedia Metadata Ontology. Int. J. Semant. Comput. 6(1), 25–49 (2012). doi: 10.1142/S1793351X12400028
  41.
  42. Sofou, N., Bellini, P.: ECLAP DE4.3 metadata descriptors interoperability. http://www.eclap.eu/115119
  43. Tordai, A., van Ossenbruggen, J., Schreiber, G.: Combining vocabulary alignment techniques. In: Proceedings of the 5th International Conference on Knowledge Capture (K-CAP ’09), pp. 25–32. ACM, New York (2009)
  44. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Silk - a link discovery framework for the Web of data. In: Proceedings of the 2nd International Workshop on Linked Data on the Web (LDOW), Madrid, Spain (2009)
  45.
  46. W3C (2009) Media fragments URI 1.0. W3C media fragments working group. URL http://www.w3.org/TR/media-frags/
  47. W3C Semantic Web Deployment Group (2009) SKOS simple knowledge organization system reference. URL http://www.w3.org/TR/2009/REC-skos-reference-20090818/
  48. Wang, X.: MPEG-21 rights expression language: enabling interoperable digital rights management. IEEE Multimedia 11(4), 84–87 (2004)

Copyright information

© The Author(s) 2014

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Authors and Affiliations

  1. Distributed Systems and Internet Technology Lab, DISIT, Dipartimento Ingegneria dell’Informazione, University of Florence, Florence, Italy
