Legal challenges of an open web index

Calls for a European “third way” in matters of digital technology need to be put into action to achieve an actual change of the status quo. To achieve autonomy in the digital sphere, European alternatives to digital services and products need to be established. However, such efforts must also extend to the level of information technology infrastructure. To that end, decentralized indexing of the internet could significantly help to strengthen Europe’s digital sovereignty. Nevertheless, legal challenges need to be overcome to make that vision a reality.

Digital sovereignty has been identified as a major objective that may well define Europe's future in the digital age (cf. [1], which lists possible initiatives that could contribute to digital sovereignty; ibid., p. 9 et seq.).
There are several goals that relate to the concept of digital sovereignty:
1. To find a "third way" between the US approach of libertarian data capitalism (or "surveillance capitalism" as coined by [2]) and the approach of authoritarian states that seek total control over the actions of their citizens in the digital sphere.
2. To safeguard European fundamental rights, principles, and values in a globalized and digitized environment.
3. To strengthen Europe's economies and make them competitive in the 21st century.
4. To develop alternatives to established digital services and products, and thus ensure autonomy and freedom of choice.
5. To safeguard personal and professional secrets in the context of digital communication and the use of digital services and products while ensuring "freedom of expression."
The "Berlin Declaration on Digital Society and Value-Based Digital Government" of 8 December 2020 states that Europe must foster "our own key digital capacities," but then narrows this down to cloud infrastructure and public services [3, p. 12].
Some [4] classify the "fight for digital sovereignty" as an epochal struggle, since it does not only concern all vs. all, but describes a situation "of anyone allied with anyone, with variable alliances changing according to interests and opportunities." Therefore, one can witness the asymmetric dispute between internet companies and states, since companies develop, maintain, and profit from their digital services and assets, while states hold the "cybernetic control" to regulate those companies that carry out their business within the jurisdiction of the state, thereby establishing a (potential) significant counterbalance. While companies are drivers and creators of innovation, states are enablers as well as regulators, supervising the direction in which companies are heading.
Acknowledging the need for digital sovereignty also means acknowledging that Europe and other regions have fallen behind in matters of digitization [5]. Achieving the goals outlined above requires very different approaches in very different areas, ranging from economic promotion to the enactment of new laws.
One such approach that aims at digital sovereignty at the level of the internet infrastructure is the establishment of a European web index [6, p. 49 et seq., 7]. A web index plays a fundamental role in the effective use of the internet. In the Western world, the field of internet search is clearly dominated by Google, which operates its own web index [8]. The only alternative web index on an international scale is maintained by Microsoft for its own search engine Bing. In Russia and China respectively, Yandex and Baidu operate localized web indexes. Breaking a monopoly like Google's requires alternative search engines. However, in the current environment, such search engines still need to base their operation on one of the existing web indexes. Creating and maintaining a new web index is very costly and thus usually not a viable option for a small or emerging search provider. The solution to this dilemma might be an open and collaborative web index, where resources across Europe are pooled to create a web index that can serve as a basis for search engine operators to establish services independently from the existing monopoly and without the need to adhere to pre-dictated terms of use. It is also a chance for European emancipation in the sense that European values and fundamental rights can be made the benchmark for the establishment and operation of such a web index. Furthermore, by creating an alternative, and thus redundancy, the overall resilience of the modern information infrastructure is strengthened.

An open search infrastructure
Navigating the internet without the use of a search engine is at best impractical, at worst near impossible. A web index contains all the information that can be found in the results list of a search engine, e.g., the Uniform Resource Locator (URL), short summaries (so-called snippets), and compressed versions of images that are embedded in a website. As a result, search engines are an integral part of the internet infrastructure. Effectiveness and usability are key features for the success of any search engine.
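As a rough illustration of what such an index holds, the following Python sketch models a single index entry and a naive lookup that a separate search engine provider might build on top of it. The class `IndexEntry`, its field names, and the helper `search` are hypothetical; they only mirror the elements named above (URL, snippet, compressed images) and are not taken from any existing web index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One record of a hypothetical web index (field names are assumptions)."""
    url: str
    snippet: str
    thumbnails: tuple = ()  # compressed image previews, e.g. as raw bytes


def search(index, keyword):
    """Naive keyword match over snippets; stands in for a search engine."""
    needle = keyword.lower()
    return [entry.url for entry in index if needle in entry.snippet.lower()]


# Two made-up entries for demonstration purposes only.
index = [
    IndexEntry("https://example.org/a", "Open web index for Europe"),
    IndexEntry("https://example.org/b", "Recipes for apple cake"),
]
print(search(index, "index"))
```

The sketch keeps the index data (`IndexEntry`) apart from the query logic (`search`), which loosely corresponds to the separation of web index operator and search engine provider discussed in this article.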
Operators of web indexes have to establish and maintain a searchable directory of the internet, with the exception of the so-called Deep Net and Dark Net, which are not indexed. Web crawlers have to work constantly towards this goal. An open web index could make the results of this constant search effort available to research, which might benefit a range of disciplines, from social sciences to cybersecurity studies. The failure of the digital markets to guarantee plurality and diversity may condense into a duty of the state to promote the establishment of an open search infrastructure; such an infrastructure should be considered for inclusion in our understanding of the scope of Art. 106(2) of the Treaty on the Functioning of the European Union (TFEU).

Legal challenges
The main feature of an open search infrastructure is the separation of web index operator and search engine provider. This constellation can already frequently be found, but there is a significant difference: the open web index will by itself not provide a search engine for the end user. Instead, the operator(s) will simply provide a basis for others to provide services. This in itself might challenge what in law we understand a search engine to be. Art. 2 No. 5 of Regulation (EU) 2019/1150 [9] defines an "online search engine" as "a digital service that allows users to input queries in order to perform searches of, in principle, all websites, or all websites in a particular language, on the basis of a query on any subject in the form of a keyword, voice request, phrase or other input, and returns results in any format in which information related to the requested content can be found." The aspect of indexing is not included in this definition. By contrast, the European Court of Justice (ECJ) has stated that the "activity of a search engine" consists "in finding information published or placed on the internet by third parties, indexing it automatically, storing it temporarily and, finally, making it available to internet users according to a particular order of preference" [10, para 35].
A decentralized approach (with the operation of an open web index distributed over several jurisdictions) poses additional legal challenges. This concerns, among other aspects, national requirements for de-indexing, which may vary. If state actors are involved (e.g., national computing centers as operators of an open web index), then these actors are directly bound by constitutional law. This can lead to strict requirements on neutrality and anti-discrimination that must be adhered to. However, a valid counterpoint may result from national measures to promote certain issues or to strengthen minorities. Ultimately, constitutional requirements at the European level, at the member state level, and, in the case of a federal republic like Germany, even at the level of federal states might have to be taken into account.
Furthermore, numerous aspects of sub-constitutional law need to be taken into consideration to prevent a lack of compliance with legal requirements. These range from intellectual property rights to data protection law. There are also organizational legal issues (contracts, oversight, governance, etc.), which will not be addressed in this article.

Intellectual property rights
A web index does not necessarily have to store intellectual property (IP) content, but it is highly likely that the index will store contents that are protected by intellectual property rights (even in a compressed state, e.g., thumbnails of larger images found on websites) in order to enable search engines to provide services beyond simple links such as previews. This is especially the case if it wants to provide actual competition to established search providers. Search engines today no longer need just an index, but also numerous accompanying functions such as snippets, verticals, knowledge graphs, instant answers, or mobile applications. The legal implications will be covered in this section.
According to the case law of the ECJ, the declared main objective of Art. 3(1) of Directive 2001/29/EC [11] is to achieve a high level of protection for authors, which means that the concept of communication to the public must be understood broadly [12, para 22]. Therefore, the term "communication to the public" includes the operation of a file-sharing platform on the internet that refers to protected works by indexing metadata and, by providing a search engine, enables the users of this platform to find these works and share them within the framework of a "peer-to-peer" (= user-to-user) network [12]. In its decision, the ECJ considered it irrelevant that it was not the operator but rather the users of the platform who made the works available online. Only the indexing of the data and the availability of a search engine enabled the easy and comfortable sharing of the data. Since the platform operator was also able to recognize that a very large proportion of the files on the online file-sharing platform referred to works that were published without the consent of the rights-holder, it was clear to the operator that they were being communicated to a new public. Nevertheless, while the unlawful purpose of many file-sharing platforms might be considered recognizable by the (index) operator, this still needs to be acknowledged by an index operator.
In order to provide useful and effective information to users, a web index needs to provide tools to help them reach the intended information. Hyperlinks are a technical solution to this challenge and have been integral to the creation of the World Wide Web from the outset.
In both the "Svensson" case [13] and the "BestWater International" case [14], the ECJ classified a link as an "act of communication" within the meaning of Art. 3(1) of Directive 2001/29/EC, but not as a communication to the public as long as the linked articles are already freely accessible. In that case, "linking" does not create a "new public." Also, according to the ECJ [15], hyperlinks make it easier to discover other websites, including the protected works that may be accessed there, and undoubtedly offer faster and more direct access to protected works. However, any other interpretation of the term "making available to the public" would seriously affect the functioning of the internet and run counter to achieving the objective of the directive, namely to promote the development of the information society in Europe [16].
Nevertheless, providing links to a work that has been unlawfully posted on the internet can, under certain conditions, constitute a communication to the public. This is not the case if the work has been posted on a website with the permission of the author. However, if the work is published on another website without permission, the link constitutes a communication to the public. The knowledge or negligent ignorance of the copyright infringement on the part of the person placing the link should also be taken into account. An intention to make a profit can be used as a criterion, as it leads to a presumption that the person placing the link knew of the infringement. In the case of an open web index, the intention to make a profit can be considered subordinate, thereby resulting in a lower risk of the service being classified as a promoter of unlawful acts.
In addition to links, the "caching" of content is an essential feature of a web index. The term caching is generally understood as the temporary storage of information: the purpose of caching is to make the transmission of information requested by users faster and more efficient. Information that is required frequently can then be retrieved from the cache, which shortens loading times. In the case of search engines, "content caching" is of particular importance. While a search engine "crawls" the internet, it creates and saves a kind of "snapshot" of every web page it finds, which is then stored for a certain period in the "cache." In the preamble to Directive 2001/29/EC, Recitals (31) and (33) state that a fair balance of rights and interests between the different categories of rights-holders, as well as between the different categories of rights-holders and users of protected subject matter, must be safeguarded. Furthermore, the exclusive right of reproduction should be subject to an exception to allow certain acts of temporary reproduction, which are transient or incidental reproductions, forming an integral and essential part of a technological process and carried out for the sole purpose of enabling either efficient transmission in a network between third parties by an intermediary, or a lawful use of a work or other subject matter to be made. To the extent that they meet these conditions, this exception should include acts that enable browsing as well as acts of caching to take place, including those that enable transmission systems to function efficiently, provided that the intermediary does not modify the information and does not interfere with the lawful use of technology, widely recognized and used by industry, to obtain data on the use of the information [17].
On the other hand, it is disputed whether caching functions are privileged in relation to Directive 2001/29/EC, since these acts of reproduction conducted by search engines or a web index cannot be considered as an "integral part of a technical process," but rather as part of a service of the respective search engine or web index operator. In addition, the duration of storage in such content caching systems (several weeks to months) cannot be classified as "temporary" within the meaning of the Directive. The same applies to mirror servers. Therefore, content caching by a web index must either be permitted by the respective rights-holder or performed in accordance with the other legal exceptions expressed in Recital (33).
Thumbnails are small preview images or miniature images that are used as a preview for a larger version of these images or graphics. Thumbnails are commonly displayed by search engines in their result lists. The operators of a web index cannot rely on the privilege expressed in Directive 2001/29/EC for thumbnails. Furthermore, this type of use by the web index has an independent economic value of its own (cf. Recital 33). Even the privilege to quote cannot override the right to make a thumbnail publicly available, since a thumbnail does not qualify as a "quote" as it is not used in the context of an independent work for explanatory purposes or the like.
A web index is undisputedly a database. The protection of the web index operator as a database owner with regard to Directive 96/9/EC [18] must therefore also be evaluated. In addition to the copyright issues that arise for operators of an (open) web index when third-party content is added to the web index, the web index itself, and thus the database owner, naturally also enjoys legal protection. The Database Directive (96/9/EC) protects the rights of the database owner in a database, which Art. 1(2) defines as a collection of independent works, data, or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means.
According to Recital 41 of the Database Directive, the database owner is the person who takes the initiative to create the database and bears the investment risk. Therefore, the database owner has the exclusive right to reproduce, distribute, and communicate to the public the database as a whole or a substantial part of the database. This protection is very comprehensive. The way in which elements or information are extracted from a protected database is irrelevant [19].
According to Recital 42 of the Database Directive (96/9/EC), the database owner's right to prohibit unwanted usage applies not only to the maker of a competing product, but also to all actions of the user that "go beyond his legitimate rights and thereby harm the investment." Therefore, in principle, the Directive covers all acts of use that cause qualitatively or quantitatively significant damage to the database owner's investment, regardless of whether the act serves a commercial or a non-commercial purpose. According to the case law of the German Federal Court of Justice, on a purely quantitative assessment, the transfer of 10% of the data records from a database does not constitute a "substantial part" of that database [20].
For the assessment of the scope of the sui generis protection, it is in particular irrelevant whether the aim of the extraction and/or further use is to create another database. It is likewise irrelevant whether the database created competes with the original database and whether the new database is the same or a different size compared to the original database. Even a small part of a database can therefore be a substantial part if, for example, the database is based on particularly difficult and costly data acquisition. When an individual element is taken from a database, however, one cannot usually speak of a substantial part. It should be noted that the extraction of insubstantial parts can also encroach on the rights of the database owner if it is repeated and systematic and conflicts with the normal exploitation of the database or unreasonably prejudices the legitimate interests of the database owner.
Regarding an open web index, it is important to define who is to be positioned as the database owner, since this is the basis for the further utilization of rights, i.e., the start of the "chain of title." The granting of rights to use the open web index or to use it as a template for a search engine is imperative. This could happen in the form of individually negotiated agreements or as a general license. Thus, it could be deployed based on the "copyleft" or "non-copyleft" approach common in the field of open source licenses. However, this is more of a strategic than a legal question.

"Right to be forgotten"
The ECJ has "found that, in exploring the internet automatically, constantly and systematically in search of the information, which is published there, the operator of a search engine 'collects' such [personal] data which it subsequently 'retrieves', 'records' and 'organizes' within the framework of its indexing programs, 'stores' on its servers and, as the case may be, 'discloses' and 'makes available' to its users in the form of lists of search results" [21, para 28]. Since the first part of this statement also applies to a web index, the activities of a web index amount to the processing of personal data wherever such data is collected. The operation of a web index is thus subject to European Union data protection law.
The "right to be forgotten" has been explicitly enshrined in Art. 17 General Data Protection Regulation (GDPR) [22] after being first recognized by the ECJ in its Google Spain judgement [21]. The "right to be forgotten" means that the data subject may request de-indexing or de-listing from a search engine provider. Consequently, the ECJ has abandoned the term "right to be forgotten" and has since then predominantly used the term "right to de-referencing," while the original terminology lives on in the General Data Protection Regulation [23, para 38 et seq., 10, para 65]. Different scenarios must be considered: 1. The data subject requests de-referencing from the operator of a search engine based on the open web index. 2. The data subject requests de-referencing directly from the web index operator (or one of the operators, since we envision a collaborative operation). 3. The data subject requests de-referencing from both/multiple entities.
The Google Spain judgement was concerned with "the display, in the list of results that the internet user obtains by making a search by means of Google Search on the basis of the data subject's name, of links to pages of the on-line archives of a daily newspaper that contain announcements mentioning the data subject's name and relating to a real-estate auction connected with attachment proceedings for the recovery of social security debts" [21, para 28]. An open question is whether to apply the standards that apply to a search engine provider to an open web index or those that apply to an online archive.
Art. 17(3) GDPR states that an organization's right to process someone's data might override their right to be forgotten. From a legal point of view, the main challenge for a web index in this context is the technical design of the web index itself. The reasons listed in Art. 17(3) GDPR show that contradictions regarding data deletion are particularly evident when creating backups. A web index will have to create backups, since, in addition to honoring the right to data deletion, it is also legally obliged to back up data. In order to resolve this contradiction between the obligation to delete data and the obligation to back up data, it appears necessary, on the one hand, to carefully document the data processing purposes during the development of the web index and, on the other hand, to develop a viable deletion concept ("privacy by design"). Among other things, the deletion concept carefully documents the corresponding purpose as the basis for data collection and processing. The purpose should be described as precisely as possible. A deletion concept cannot by itself solve the technical difficulties involved in deleting individual data records. However, it offers a basis for arguing that data processing is carried out in accordance with Art. 17 GDPR.
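One conceivable technical building block of such a deletion concept is a log of granted de-referencing requests that is re-applied whenever a backup is restored, so that erased entries do not reappear. The following Python sketch illustrates this idea under simplifying assumptions: the index is a plain dictionary keyed by URL, and the names `ErasureRecord` and `DeletionLog` are hypothetical. It is neither a statement of what Art. 17 GDPR requires nor a complete design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ErasureRecord:
    """Documents one granted de-referencing request (hypothetical schema)."""
    url: str
    purpose: str       # documented reason, e.g. "Art. 17(1) GDPR request"
    granted_on: date


class DeletionLog:
    """Keeps granted erasures so they can be re-applied to restored backups."""

    def __init__(self):
        self._records = {}

    def grant(self, url, purpose, granted_on):
        # Store the documented purpose together with the affected URL.
        self._records[url] = ErasureRecord(url, purpose, granted_on)

    def apply(self, index):
        """Return a copy of the index without any erased URLs."""
        return {url: doc for url, doc in index.items()
                if url not in self._records}


live_index = {
    "https://example.org/auction": "notice",
    "https://example.org/news": "story",
}
backup = dict(live_index)  # backup taken before the erasure request

log = DeletionLog()
log.grant("https://example.org/auction", "Art. 17(1) GDPR request",
          date(2021, 1, 15))

live_index = log.apply(live_index)
restored = log.apply(backup)  # a restore must not resurrect erased entries
```

Note that the log itself retains the affected URL. Whether that is acceptable, or whether, for instance, only a hash of the URL should be kept, is exactly the kind of question the documented purposes of a deletion concept would have to answer.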
The implementation of the "right to be forgotten" is financially, structurally, and technologically challenging. Google states that since the Google Spain judgement, the company has received almost one million requests for de-listing that concern 3.8 million URLs. About half of these URLs were then actually de-listed [24]. There is reason to believe that the ECJ judgement has made the market entry of small search engine providers even more difficult as it adds another costly operational requirement (cf. [25, Art. 17 para 25]). If an open web index wishes to effectively promote plurality on the online search market, a design should be preferred that supports small and emerging search engine providers in this regard.

Competition law
Competition law could come into play in the context of a right to be indexed. If a website is not indexed, it cannot be found through any search engine that bases its service on that index. This can have adverse effects on businesses, organizations, etc., that may not be able to acquire new customers, donations, etc. This is especially the case when one search engine dominates the access of large parts of the population to the internet by serving as a de facto gateway. The essential facilities doctrine could come into play here [26, 27, p. 365, 28, p. 194]. The doctrine applies in the context of Art. 102 TFEU, which prohibits the abuse of a dominant position on the market. A dominant position is likely to appear on the web index market, since there are high access barriers (initial investments, network effects [29,30], etc.) resulting in a market of only very few operators.
The denial of access to an essential facility like the internet may constitute a prohibited abuse of this dominant position under Art. 102 TFEU [31]. There are strict conditions to this type of abuse since countermeasures result in a forced contract. First, the essential resource must be in the hands of a dominant operator in an upstream market. Second, access to this resource must be objectively necessary to engage in competitive activity in a downstream market. Third, the necessary resource must not be capable of being reproduced under reasonable economic conditions. Fourth, access must be technically possible but denied or allowed under unjustified restrictive conditions. The European Commission adds that the refusal must be likely to harm the consumer [32, para 81].
These conditions could be met in the case of an open web index as described here. A web index is a service of the upstream market that is objectively necessary for search engines and many other digital services on the downstream market of the internet. As mentioned above, it cannot be reproduced under reasonable economic conditions and consumers may be harmed if a website is not indexed. Whether access to a website and all of its content is technically possible and whether it is denied or allowed under unjustified restrictive conditions is a case-by-case decision.

The challenge of "openness"
Google's and Bing's web indexes are not publicly available in their unfiltered raw version due to economic interests. But even a non-profit web index as described here might face inevitable legal restrictions to its full publication to everyone. Intellectual property, the right to be forgotten, and penal law are in a basic conflict with the principle of openness. However, full access to an unprocessed web index can also promote research, economy, democracy, and fundamental rights, but at the same time might also enable unprecedented problematic or even criminal activities harming other fundamental rights and principles.
Depending on the design of the index, it is conceivable that only authorized legal or natural persons would get access to the unfiltered raw version by virtue of law or authorization of an administrative body. A second version of the index, one that includes deleted links not available to the public, might be retained for research or security purposes.
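Such a tiered design could be sketched as follows. The two tiers, the names `AccessTier` and `visible_entries`, and the representation of the index as a set of URLs are assumptions made purely for illustration; who counts as authorized would be determined by law or by an administrative body, not by the code.

```python
from enum import Enum


class AccessTier(Enum):
    PUBLIC = 1      # filtered index, available to everyone
    AUTHORIZED = 2  # raw index incl. de-referenced links, e.g. for research


def visible_entries(index, delisted, tier):
    """Return the URLs a caller of the given tier may see (sorted)."""
    if tier is AccessTier.AUTHORIZED:
        return sorted(index)
    return sorted(url for url in index if url not in delisted)


index = {"https://example.org/a", "https://example.org/b"}
delisted = {"https://example.org/b"}

public_view = visible_entries(index, delisted, AccessTier.PUBLIC)
raw_view = visible_entries(index, delisted, AccessTier.AUTHORIZED)
```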
The implementation of European values and rights in the web index might come to nothing if that implementation is not conveyed to the users of search engines based on the index. Therefore, certain requirements could be attached to the use of the web index by search engine providers. One such requirement might be to demonstrate an appropriate system for processing requests concerning the right to be forgotten. But such requirements could also be more far-reaching and relate to, for example, the ranking of search results in order to ensure neutrality and prevent discrimination.
However, forcing all service providers using the web index to adhere to certain ideals would clash with the idea of a truly open web index. What remains is the hope that those providers that adhere nevertheless will rise to the top in their respective markets or at least gain significant shares.

Neutrality and anti-discrimination
Index neutrality refers to an objective, quasi-natural sorting of the index. The additional value of an open web index lies in unfiltered access to the internet without unnecessary prior sorting, ranking, or prioritization.
However, even at the stage of merely creating and maintaining a web index, the operator(s) might not be able to avoid making certain selection and evaluation decisions. Absolute neutrality will not always be possible or even desirable. Selection and arrangement of foreign content is often inevitably accompanied by the risk of opinion manipulation or discrimination against certain content.
Therefore, transparency of this procedure and the corresponding algorithms is necessary. A European approach based on fundamental rights and values requires that the creation and maintenance of a web index are just as open as the outcome itself. In this way, they become subject to public democratic discourse and participation ensuring protection of minorities, diversity, and innovation. Therefore, these rights and values must be identified and subsequently concretized; conflicts between them must be resolved. Concretization in this case means to break down fundamental rights and principles to technical specifications (for a proposal on how to achieve this see [33, p. 64 et seq.]: the KORA method [concretization of legal requirements]). Ultimately, the design of an open web index should be constitutionally compatible beyond simply meeting the minimum requirements set by law; it should best support constitutional rights and principles [33, p. 72 et seq.]. Therefore, they must be enshrined in and promoted by technology and technical infrastructure. The perspective needs to be broader than, for instance, privacy or data protection by design; the goal should be "fundamental rights and principles by design" aiming at the best possible realization of these rights and principles.

Constitutionally compatible design
However, European values can also come into play at the level of the search engines that base their operation on an open web index. One example is Berlin-based Ecosia GmbH, which provides a search engine that is currently based on Bing and thus subservient to the terms and conditions set by Microsoft. Ecosia uses a major part of any earnings to plant trees.

Conclusion
An open web index does not spell the end of monopolies in the digital sphere, but it might very well be a significant stepping stone on the road to establishing valid alternatives to dominant digital services. The established search engines will still have a significant advantage over any newcomer: they can rely on years and years' worth of user data that can be and has been used to improve search algorithms, ergonomics, advertising, and all other relevant aspects of a search engine. If these data are personalized to a user, the right to data portability in Art. 20 GDPR could come into play, provided it is understood in a broader sense to include data that results from the actions of a data subject and is not limited to data that was directly provided by the data subject.
The current legal framework is geared towards a concurrence of web index and search engine; most court cases relate to Google. The separation of index and search engine poses a significant challenge not only for the design and operation of an open web index, particularly with regard to the "right to be forgotten," but also for the implementation of European rights and values. If openness is understood in a broad sense, then the index would also be available for search engine operators who do not strive for the best possible realization of fundamental rights and values. A compromise could be to make the index available to research without limitations, but to bind search engine operators to certain standards. These standards, however, must not lead to a situation where access to the "open" index becomes as restrictive as access to established indexes.
Despite all of the issues, the positive aspects of an open web index would undoubtedly prevail. On a political level, an open web index would join other efforts to emancipate Europe from the technological dominance of others. Keeping the potential for surveillance and other infringements of fundamental rights in the digital sphere in mind, the establishment of an open web index would also serve the duty to protect these fundamental rights. On an economic level, the dependence of the European economies and the Single Market as a whole would be reduced. The case for European autonomy was further strengthened when Google recently threatened to cut off Australia from its search engine in an effort to stop Australian legislation that would force the company to negotiate payments with news media companies [34], which is in line with similar threats made in the past in similar contexts. With an alternative readily available after the establishment of an open web index, lawmakers would be under less pressure when legislating for the digital sphere, thereby strengthening Europe both in the short and in the long run.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.