As already mentioned, the first step toward developing the OAVA platform is to define the criteria that will be used to create the dataset. To this end, we review aggregation services and national registries and present related initiatives in Greece. We conclude the literature review with a brief overview of selection criteria for the development of online collections.
Aggregation services and national registries
Europeana, supported by the European Commission, provides access to Europe’s cultural heritage held in the collections of European galleries, libraries, archives, and museums (GLAMs). Europeana has developed the Europeana Publishing Framework to help European cultural heritage institutions share their collections. Image and text resources make up 97% of Europeana’s total content, while audiovisual resources make up the remaining 3%. Europeana partners include the European Film Gateway,Footnote 1 the EUScreenFootnote 2 project, and its successor, EUScreenXL. For audiovisual resources, the criteria for data partners differ slightly because of the characteristics of the format: a digital file available on the Internet via a permanent URL is considered a prerequisite. Regarding metadata, legacy and local metadata schemas were converted to the Europeana Data Model using the MINTFootnote 3 tool, while content selection criteria were not restrictive. Examples of resource types include TV programs and series, movies, news, entertainment programs, documentary films or series, advertisements, radio programs, and home videos. Copyright is a particular challenge for audiovisual resources. For this reason, the EU carried out a separate project on orphan audiovisual works: the FORWARD project, which ran from 2013 to 2017, aimed to “create a permanent registry for AV Orphan Works”.
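Converting a legacy schema to the Europeana Data Model is essentially a metadata crosswalk: each local field is mapped to a target property. MINT performs this through mappings defined by data partners; the sketch below only illustrates the general idea and is not MINT's implementation. The local field names (`Title`, `Producer`, `Broadcast date`, `Format`) and the mapping itself are hypothetical. The target properties follow Dublin Core naming, and the `edm:type` values come from EDM's controlled vocabulary. The record is deliberately flattened; real EDM separates the described object from its aggregation.

```python
from typing import Dict, List

# Hypothetical crosswalk from a legacy local schema to Dublin Core properties.
# The local field names are invented for illustration only.
CROSSWALK: Dict[str, str] = {
    "Title": "dc:title",
    "Producer": "dc:creator",
    "Broadcast date": "dc:date",
    "Summary": "dc:description",
}

# edm:type takes a value from a small controlled vocabulary (TEXT, IMAGE, SOUND, VIDEO, 3D).
FORMAT_TO_EDM_TYPE: Dict[str, str] = {
    "video": "VIDEO",
    "film": "VIDEO",
    "audio": "SOUND",
    "radio": "SOUND",
}


def to_edm(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Map one flat legacy record to a flattened EDM-like dict (property -> values)."""
    edm: Dict[str, List[str]] = {}
    for field, value in record.items():
        prop = CROSSWALK.get(field)
        # Unmapped fields and empty values are dropped.
        if prop and value.strip():
            edm.setdefault(prop, []).append(value.strip())
    # The local format label is normalized to an edm:type value when it is known.
    fmt = record.get("Format", "").strip().lower()
    if fmt in FORMAT_TO_EDM_TYPE:
        edm["edm:type"] = [FORMAT_TO_EDM_TYPE[fmt]]
    return edm
```

For example, `to_edm({"Title": "Evening news", "Format": "Video", "Notes": "internal"})` keeps the title, maps the format to `edm:type` VIDEO, and drops the unmapped `Notes` field. Keeping the mapping in data rather than code is what lets a tool such as MINT offer it as a configurable, per-partner step.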
The audiovisual heritage of the UK has been documented in various books and projects. A well-known monograph is “The Researcher’s Guide: Film, Television, Radio, and Related Documentation Collections in the UK”. “Film Archives UK” is a limited company in the UK that focuses on the development of the country’s public-sector film archives. To achieve its goals, it collaborates with regional film archives, such as the National Library of Scotland’s Moving Image Archive and the National Screen & Sound Archive of Wales. In Switzerland, Memobase is an association for the preservation of the country’s audiovisual heritage, supported by the Swiss Confederation. It aggregates metadata for audiovisual resources, i.e., photographs, films, sound, and video, from 60 Swiss memory institutions. A recent assessment of the progress of digitization, online accessibility, and digital preservation of cultural material in the European Union has shown that the digitization of audiovisual heritage is a high priority in 13 EU countries, namely Belgium, Czechia, Germany, Greece, Italy, Latvia, Malta, Austria, Poland, Hungary, Slovakia, Finland, and Sweden. According to , the COVID-19 pandemic revealed the need for more audiovisual resources that can be used in collaborative environments and online activities.
Trove is a collaboration between the National Library of Australia and several Australian organizations. It aggregates content from Australian GLAM institutions, mainly “libraries, museums, galleries, the media, government and community organisations”. Contributing organizations must (a) provide an XML sitemap for their websites, (b) use aggregation technologies, such as the OAI-PMH protocol and APIs, and (c) describe the provided resources using either the Trove Data Dictionary or simple metadata schemas, such as Dublin Core. The audiovisual content aggregated by Trove is categorized as “Music, Audio & Video”. At the end of May 2021, this category included 2,303,226 audiovisual resources (video, sound, and audio books),Footnote 4 only a small percentage of the 6 billion digital items available through Trove. Australia’s “National Registry of Audiovisual Collections” includes collections from a variety of providers, including “libraries and museums, community groups, political parties, historical societies, research centers, film societies, broadcasters, distributors, production companies and foreign legations, as well as individual collectors”.
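Requirement (b) refers to OAI-PMH, a protocol in which an aggregator repeatedly requests `ListRecords` pages from a provider and follows a `resumptionToken` until the list is exhausted. The sketch below is a generic harvester for the `oai_dc` (simple Dublin Core) format using only the Python standard library; it is not Trove's own harvester, and the repository URL in the usage note is hypothetical. Error responses and retry handling are omitted for brevity.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

Record = Dict[str, List[str]]


def parse_list_records(xml: Union[str, bytes]) -> Tuple[List[Record], Optional[str]]:
    """Parse one ListRecords response into (records, resumptionToken or None)."""
    root = ET.fromstring(xml)
    records: List[Record] = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        dc = rec.find(".//oai_dc:dc", NS)
        if header is None or dc is None:
            continue  # deleted records have a header but no metadata
        fields: Record = {"oai_identifier": [header.findtext("oai:identifier", "", NS)]}
        for element in dc:
            # "{http://purl.org/dc/elements/1.1/}title" -> "title"
            name = element.tag.split("}", 1)[-1]
            if element.text and element.text.strip():
                fields.setdefault(name, []).append(element.text.strip())
        records.append(fields)
    # A missing or empty resumptionToken marks the last page of the list.
    token = root.findtext(".//oai:resumptionToken", "", NS).strip()
    return records, token or None


def harvest(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[Record]:
    """Yield all records from a repository, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            return
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

A call such as `for record in harvest("https://repository.example.org/oai"): ...` (hypothetical endpoint) would stream every Dublin Core record, one page at a time, without holding the whole collection in memory.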
The “Digital New Zealand” (DigitalNZ) service was launched by the National Library of New Zealand in 2008 with the aim of aggregating all digital material related to the country and providing a one-stop point of access. It currently holds more than 30 million resources, of which only 478,500 are audio and video resources. DigitalNZ harvests metadata from content partners, structures it, and then makes it available through its API. If a content partner simply uploads its digital collection to a web page, DigitalNZ may harvest metadata and resources by scraping the HTML pages. DigitalNZ also helps prospective partners decide how to digitize their material (the Make it Digital program) and even provides a shared repository for partners that have small collections and no repository infrastructure.
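Scraping metadata from HTML pages typically means reading the `<meta>` elements that describe a page. The sketch below shows the idea with Python's standard `html.parser`; it is not DigitalNZ's actual tooling. It assumes, as a hypothetical convention, that partner pages embed Dublin Core (`<meta name="DC.title">`) or Open Graph (`<meta property="og:title">`) tags, and it falls back to the page `<title>` when no Dublin Core title is present.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumed metadata conventions: Dublin Core in HTML (<meta name="DC.title">)
# and Open Graph (<meta property="og:title">).
PREFIXES: Tuple[str, ...] = ("dc.", "dcterms.", "og:")


class MetaTagScraper(HTMLParser):
    """Collect matching <meta> tags and the text of the page <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, List[str]] = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Keys are lowercased so "DC.Title" and "dc.title" are treated alike.
            key = (attributes.get("name") or attributes.get("property") or "").lower()
            content = attributes.get("content")
            if content and key.startswith(PREFIXES):
                self.metadata.setdefault(key, []).append(content.strip())

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.title += data


def scrape_metadata(html: str) -> Dict[str, List[str]]:
    """Return metadata found in one HTML page; falls back to <title> for dc.title."""
    parser = MetaTagScraper()
    parser.feed(html)
    parser.close()
    result = dict(parser.metadata)
    if "dc.title" not in result and parser.title.strip():
        result["dc.title"] = [parser.title.strip()]
    return result
```

In practice, a scraper of this kind only works well when partners publish pages with consistent markup, which is one reason aggregators generally prefer structured interfaces such as OAI-PMH or an API.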
The Digital Public Library of America (DPLA) is a nonprofit organization that drew on the experience of both Europeana and Trove. As far as selection criteria are concerned, it relies on intermediate providers, called Hubs, which gather and prepare material from cooperating institutions. Hubs must initially submit 50,000 unique records and are responsible for ensuring that prerequisites for sound operation and long-term viability are met. Criteria related to audiovisual content include the level of description, the existence of metadata, ownership, and usage rights. The DPLA does not exclude audiovisual material hosted on commercial streaming services, such as YouTube and Vimeo, as long as a copyright statement is present and no advertisements are displayed. It is worth noting that the DPLA does not collect text transcriptions of audio.
The National Digital Library of India (NDLI) is a relatively new aggregation service “with a vision to turn NDLI into a National Knowledge Asset”. The NDLI aggregates metadata from a variety of sources and normalizes it for integration using the NDLI metadata schema. With regard to rights, the NDLI promotes the use of Rights Statements among its content partners. The NDLI currently aggregates more than 68 million textual documents, 1 million images, and nearly 866,000 video and audio resources.Footnote 5
In Greece, the National Documentation Centre acts as the Greek National Aggregator for Europeana. It has recently published a set of best practice guidelines for Greek GLAM institutions wishing to share their collections on Europeana via the national aggregation service SearchCulture.Footnote 6 These guidelines focus on the Europeana Data Model, licensing issues, and the technical characteristics of the digital files. In June 2021, SearchCulture provided access to nearly 2,500 audiovisual resources from galleries, archives, libraries, museums, and research institutions, less than 1% of the 717,700 resources offered by the service in total.
In 2006, the Hellenic National Audiovisual Archive (HNAA) was founded to collect audiovisual resources from different providers, mostly media organizations. One of its tasks was to create a registry of audiovisual archives and collections in Greece. A questionnaire was sent to universities, research institutes, libraries, archives, museums, cultural festivals, public organizations, and collectors. Preliminary results were presented in 2010; unfortunately, the project was never completed because the HNAA ceased operating in 2011. The National Centre of Audiovisual Media and Communication (EKOME), established in 2015, aims to protect, support, and promote public and private initiatives “in the field of audiovisual media and communication in Greece”. Having found that audiovisual resources are produced and stored in different electronic environments, EKOME is organizing the creation of a National Registry of Audiovisual Archives, which aims to register all institutions holding audiovisual archives. The National Registry has not yet been created. Currently, EKOME focuses on “attracting investment through the production of films, television series, documentaries, animation, and digital games in Greece”.
Selection criteria for developing collections
All these registries and aggregation services specify technical standards and licensing policies in order to collect audiovisual resources from different types of cultural, media, and research institutions [7, 10, 31, 34, 36, 38]. No selection criteria apply to the content of the audiovisual material itself; providers simply decide which content is eligible for inclusion. Selection criteria mostly apply to internal digitization projects and other collection development procedures. The relevant literature shows a high degree of homogeneity in the selection criteria described across studies and research papers. In 2001, the Digital Library Federation and the Council on Library and Information Resources published a report with selection criteria to help libraries create “high-quality collections of free Web resources”. The criteria in this report were organized into four categories: context, content, form/use, and process or technical. In 2012, the International Federation of Library Associations and Institutions (IFLA) published guidelines and criteria for the selection of electronic resources. These introduced content criteria (reliability, subject, complementarity with other sources) and technical criteria, in addition to evaluation criteria for commercial providers. NISO published a framework for building digital collections, providing best practices on collection development, item selection, and metadata; its guidance on item selection focuses mostly on technical characteristics. The American Library Association published a set of criteria that depend on the type of library (public, school, academic). These criteria are divided into general and special ones and focus mostly on content characteristics (reliability, objectivity, subject, relevance to the rest of the collection).
At the European level, the Minerva project provided useful guidelines for digitization projects. The guidelines acknowledged that selection criteria differ depending on each project’s goals. Nevertheless, some general criteria should always be considered, such as copyright and licensing issues, and accessibility. In Greece, a report with best practices for digitization projects was published by a research team at the University of Patras. Its selection criteria covered copyright, cost, and digitization process issues; retention; the organization and adequate documentation of the material to be digitized; the entity’s potential; the purpose of the digitization project; and the intended uses of the digitized material.
To conclude this section, well-known aggregation services of cultural content include EuropeanaFootnote 7 in Europe, the Digital Public Library of AmericaFootnote 8 in North America, the National Digital Library of India, TroveFootnote 9 in Australia, and DigitalNZ in New Zealand. All five are generic aggregation services with similar characteristics. They all cooperate with other institutions to aggregate content and provide policies, copyright best practices, and preservation and conservation guidance for the aggregated content. The institutions providing content are most often galleries, archives, libraries, and museums, as well as film archives and media institutions. Despite their generic character, all these services aim to aggregate audiovisual heritage resources among other types of resources. Currently, however, the audiovisual resources aggregated by each service are only a small fraction of its total content. The need to gather and provide access to audiovisual material in a unified way is therefore evident. National libraries play a key role in preserving a nation’s heritage. With the advent of the Internet, new types of online content are easily created and spread, and are lost once a web page is removed. Thus, many national libraries have begun to play an active role in developing national digital archives and aggregation services for the collection and conservation of online and audiovisual resources. In many cases, existing library tools, criteria, and methodologies already applied to the development of online collections are being extended and adapted to meet the needs of these new types of content. In Greece, the need to aggregate audiovisual material is especially pressing because there is no national searchable registry.
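The claim that audiovisual material is only a small fraction of each service's content can be checked against the counts reported earlier in this section. The sketch below uses only those figures; since DigitalNZ is described as holding "more than 30 million" resources, its computed share is an upper bound.

```python
# Audiovisual counts and collection totals as reported in this section.
# DigitalNZ's total is "more than 30 million", so its share is an upper bound.
REPORTED = {
    "Trove (May 2021)": (2_303_226, 6_000_000_000),
    "DigitalNZ": (478_500, 30_000_000),
    "SearchCulture (June 2021)": (2_500, 717_700),
}


def av_share(av_items: int, total_items: int) -> float:
    """Percentage of a collection made up of audiovisual items."""
    return 100.0 * av_items / total_items


for service, (av, total) in REPORTED.items():
    print(f"{service}: {av_share(av, total):.3f}%")
```

The resulting shares are roughly 0.04% for Trove, 1.6% for DigitalNZ, and 0.35% for SearchCulture, all well below Europeana's reported 3%.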
Licenses for reuse
Resources in the public domain can be used in the context of open access services. According to Greek and European legislation, a work falls into the public domain 70 years after the death of its creator. From then on, it can be modified and used commercially, with attribution to the author, without copyright or other charges.
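The 70-year rule above can be turned into a simple date check. Under the EU term directive (2006/116/EC), the term is counted from 1 January of the year following the author's death. For joint works it runs from the death of the last surviving co-author. For cinematographic or audiovisual works, the relevant persons are the principal director, the author of the screenplay, the author of the dialogue, and the composer of music specifically created for the work. The sketch below encodes only this core rule. It ignores anonymous works, related rights of producers and performers (which follow different terms), and other special cases, so it should be treated as a first filter rather than a legal determination.

```python
from datetime import date
from typing import Iterable

TERM_YEARS = 70  # protection lasts 70 years after the author's death


def public_domain_date(death_years: Iterable[int], term: int = TERM_YEARS) -> date:
    """First day on which a work is in the public domain.

    The term runs from the death of the last surviving relevant author and is
    counted from 1 January of the following year, so protection ends on
    31 December of (last death year + term).
    """
    last_death = max(death_years)  # raises ValueError if no year is given
    return date(last_death + term + 1, 1, 1)


def is_public_domain(death_years: Iterable[int], on: date) -> bool:
    """True if the work is already in the public domain on the given date."""
    return on >= public_domain_date(death_years)
```

For example, a film whose last relevant contributor died in 1952 would enter the public domain on 1 January 2023, even if the other contributors died much earlier.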
According to Greek law and European Union directives, any information generated by public sector organizations is considered public information, with a few exceptions (for example, national security organizations, or organizations collecting personal or other sensitive data). Public information may be used for any purpose by any individual or legal entity, even commercially, without the need to obtain consent from, or even inform, the content creator. Resources created by public organizations in Greece that fall under the Greek law on open data and the free provision of access to public information with the possibility of reuse are therefore also considered eligible for open access use.
All material publicly available on the Internet and licensed under any Creative Commons license is considered eligible for selection. Creative Commons licenses have been available since 2002 to facilitate the publication of content on the Internet. With them, content creators decide how their content may be used or reused, removing all or most of the copyright restrictions set by law.
Material publicly available on the Internet and labeled with a Rights Statements (rightsstatements.org) statement is usually considered eligible for open access use. However, certain statements may occasionally pose restrictions. Rights statements are typically used by cultural organizations whose collections contain content by various authors; they make it easier for collection owners to state explicitly how the content may or may not be used.
When considering the “Fair Use” of content, one must keep in mind that “Fair Use” is not a license. Under US copyright law, the term describes the right to use copyrighted material without permission, subject to an assessment of several factors, such as whether:
the service makes commercial use of the content.
the content presents facts and informs the public.
only part of the content is used.
the service obstructs the commercial exploitation of the content by its owners.
Whether a particular use of a resource qualifies as “Fair Use” is ultimately decided by US courts on a case-by-case basis.
Finally, this section considers licenses granted through hosting services. To publish content on a hosting service, the content creator must first accept the terms and conditions under which the content will be published. This usually means that the content creator is no longer the copyright holder, or no longer the sole copyright holder, of the content, which allows the types of reuse permitted by the hosting service. Resources hosted on such services are therefore candidates for open access platforms.
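When a resource's rights are expressed as a URI, the cases discussed in this section can be applied as an automatic first pass. The sketch below is a hypothetical selection policy, not one prescribed by the platform or by the license providers. It treats any Creative Commons license, CC0, or the Public Domain Mark as eligible. It also treats RightsStatements.org "No Copyright" and "No Known Copyright" statements as eligible, except for those that still carry contractual or other legal restrictions. Everything else (In Copyright statements, "Fair Use" claims, hosting-service terms, unknown statements) is sent to manual review. Public domain status by age and public-sector information need separate checks, because they are not expressed through such URIs.

```python
from enum import Enum
from urllib.parse import urlparse


class Eligibility(Enum):
    ELIGIBLE = "eligible for open access use"
    REVIEW = "needs manual review"


# Hypothetical policy: "No Copyright" statements that still carry contractual
# or other legal restrictions are sent to manual review.
RESTRICTED_NOC = {"noc-cr", "noc-nc", "noc-oklr"}


def classify_rights(uri: str) -> Eligibility:
    """Classify a resource by the URI of its license or rights statement."""
    parsed = urlparse(uri.strip().lower())
    host = parsed.netloc[4:] if parsed.netloc.startswith("www.") else parsed.netloc
    path = parsed.path
    if host == "creativecommons.org" and path.startswith(("/licenses/", "/publicdomain/")):
        return Eligibility.ELIGIBLE  # any CC license, CC0, or the Public Domain Mark
    if host == "rightsstatements.org" and path.startswith("/vocab/"):
        code = path.split("/")[2]  # e.g. "/vocab/noc-us/1.0/" -> "noc-us"
        if code == "nkc" or (code.startswith("noc") and code not in RESTRICTED_NOC):
            return Eligibility.ELIGIBLE
        return Eligibility.REVIEW  # In Copyright, undetermined, not evaluated, ...
    # Fair use claims, hosting-service terms and unknown statements: case by case.
    return Eligibility.REVIEW
```

Routing uncertain cases to review rather than rejecting them mirrors the section's distinction between licenses that clearly permit reuse and situations, such as "Fair Use", that can only be judged case by case.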