Any preservation of data or records inherently carries a risk of misuse. The digital world, including the sector of electronic archiving, has accentuated this phenomenon even further. On the one hand, it has facilitated unauthorised remote access to electronic data; on the other, it has enabled the unauthorised extraction of digital data: even where primary access is open or has been permitted by the citizen, certain tools and algorithms make it possible to indirectly derive other information without the citizen's consent or even knowledge. In a way, this is analogous to typical intelligence work, which is increasingly being used in the private sphere for various purposes.

This is also true for the risks of deanonymisation of personal data, that is, the reidentification of individuals. Although anonymisation, together with data destruction, belongs among the important tools of data minimisation, it is becoming increasingly apparent that anonymisation is not a panacea. Not only is it an option in certain situations rather than in all circumstances, given the need to preserve (personal) data, but the growing capabilities of information technology and artificial intelligence are also producing ever more effective tools for reidentifying individuals and deanonymising anonymised and pseudonymised data. In this respect, one could apply the observation of the Working Party on the Protection of Individuals with regard to the Processing of Personal Data, an independent European data protection and privacy advisory body: “Thus, anonymisation should not be regarded as a one-off exercise and the attending risks should be reassessed regularly by data controllers”.Footnote 1
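The reidentification risk described above can be made concrete with a small, purely illustrative sketch (all names and values are invented, not drawn from any real dataset): a record set stripped of names can often still be re-identified by joining its remaining quasi-identifiers, such as birth year and postcode, against an auxiliary public dataset.

```python
# "Anonymised" release: names stripped, but quasi-identifiers kept.
anonymised = [
    {"birth_year": 1961, "postcode": "11000", "diagnosis": "J45"},
    {"birth_year": 1987, "postcode": "60200", "diagnosis": "E11"},
]

# Publicly available auxiliary data (e.g. an electoral roll).
public = [
    {"name": "Jana Novakova", "birth_year": 1961, "postcode": "11000"},
    {"name": "Petr Svoboda", "birth_year": 1987, "postcode": "60200"},
]

def reidentify(released, auxiliary):
    """Join on quasi-identifiers; a unique match recovers the identity."""
    hits = []
    for rec in released:
        matches = [p for p in auxiliary
                   if p["birth_year"] == rec["birth_year"]
                   and p["postcode"] == rec["postcode"]]
        if len(matches) == 1:  # unique combination -> person re-identified
            hits.append((matches[0]["name"], rec["diagnosis"]))
    return hits

print(reidentify(anonymised, public))
# Both "anonymous" records are linked back to named individuals.
```

This is precisely why the Working Party insists that anonymisation is not a one-off exercise: the risk depends on what auxiliary data exist, and that universe grows over time.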

8.1 Data Retention as a Specific Form of Data Minimisation, and Data Storage Limitation

One of the specific levels on which the problem of minimising personal data and limiting their preservation manifests itself is the phenomenon of “data retention”: the process of retaining traffic and location data, that is, a broad set of data relating to the use of public telephone networks, public mobile telephone networks, and electronic communications networks.Footnote 2 In the legal systems of European countries, it is common to find data retention obligations imposed for a certain period of time on entities providing public communications networks or publicly available electronic communications services.Footnote 3 In October 2020, the most recent (but not the first) judgement of the Court of Justice of the European Union on the matter was delivered, holding that the governments of the Member States of the European Union may require operators to retain, and to grant access to, aggregated and non-targeted data only for the purpose of fighting serious crime or in a situation of serious threat to national security.Footnote 4 The Court went on to state that national legislation imposing an obligation to make traffic and location data available to security and intelligence services “exceeds the limits of what is strictly necessary and cannot be considered to be justified, within a democratic society”.Footnote 5 In this respect, the Court did not find legitimate legislation which allows public authorities to impose on providers of electronic communications services an obligation to transmit traffic and location data to security and intelligence services on a general and indiscriminate basis.Footnote 6

However, in some European countries, as well as in the European Union itself, the issue of data retention has been a matter of concern for some time. In 2006, the European Union passed a controversial directive that required EU Member States to ensure that traffic and location data of public telephone, mobile, and electronic communications networks were retained for a minimum of six months and a maximum of two years from the date of communication.Footnote 7 Subsequently, individual Member States started to implement the directive in their national law. In Germany, the directive was transposed at the end of 2007,Footnote 8 after which a number of constitutional complaints were filed alleging violation of telecommunications secrecy and of the right to informational self-determination as fundamental rights guaranteed by the German constitution.Footnote 9 In 2010, the German Federal Constitutional Court ruled the implementing law unconstitutional and annulled it. It declared the six-month general retention of traffic and location data contrary to the German constitution, as it contravened postal and telecommunications secrecy.Footnote 10 General retention and provision of data violates the principle of proportionality whenever the interference with fundamental freedoms is not proportionate to the need to protect the rights at stake. In the field of criminal proceedings, for example, the balance would lie in using such data only in cases of suspected serious offences.

Meanwhile, the European Directive continued to apply in Germany. This changed, however, with a 2014 judgement of the Court of Justice of the European Union, which retroactively annulled the controversial 2006 directive.Footnote 11 The Court considered the relationship of the Directive to the rights guaranteed by the Charter of Fundamental Rights of the European Union, in particular the “right to respect for private and family life, home and communications” and the right to the protection of personal data.Footnote 12 In particular, the Court took into account that “the data which providers of publicly available electronic communications services or of public communications networks must retain, pursuant to Articles 3 and 5 of Directive 2006/24, include data necessary to trace and identify the source of a communication and its destination, to identify the date, time, duration and type of a communication, to identify users’ communication equipment, and to identify the location of mobile communication equipment, data which consist, inter alia, of the name and address of the subscriber or registered user, the calling telephone number, the number called and an IP address for Internet services. Those data make it possible, in particular, to know the identity of the person with whom a subscriber or registered user has communicated and by what means, and to identify the time of the communication as well as the place from which that communication took place. They also make it possible to know the frequency of the communications of the subscriber or registered user with certain persons during a given period.”Footnote 13

Based on this, the Court concluded that “Those data, taken as a whole, may allow very precise conclusions to be drawn concerning the private lives of the persons whose data has been retained, such as the habits of everyday life, permanent or temporary places of residence, daily or other movements, the activities carried out, the social relationships of those persons and the social environments frequented by them”.Footnote 14 As a result, the very storage of such data for the purpose of their eventual further disclosure to the public authorities “directly and specifically affects private life”,Footnote 15 and therefore the right to respect for one’s private and family life, home, and communications, as well as the right to the protection of personal data guaranteed by the Charter of Fundamental Rights of the European Union.

In its judgement, the Court then summarised that the phenomenon of the retention of traffic and location data and their subsequent use by the public authorities without informing the person concerned constitutes an extensive and particularly serious interference with the abovementioned fundamental rights, highlighting the fact that if the persons concerned are not informed of the retention and use of such data, it may “generate in the minds of the persons concerned the feeling that their private lives are the subject of constant surveillance”.Footnote 16

In 2016, the Court of Justice of the European Union issued another judgement reiterating that the limitation of personal data protection in the context of the storage and provision of electronic communications must be reduced to the absolute minimum necessary.Footnote 17

Returning to Germany, development in the area of data retention was concluded, for the time being, by a 2015 law which, although it maintained the obligation to retain traffic and location data, reduced the retention period from 6 months to 10 weeks.Footnote 18 In addition, at the end of 2015, an amendment was added to the German Code of Criminal ProcedureFootnote 19 imposing significantly stricter rules and limitations on access to traffic and location data. Access is now granted only for the investigation of a selected group of particularly serious crimes, such as high treason, sexual abuse, child pornography, murder, and certain others.

The issue of data retention has also undergone significant development in the Czech Republic. A 2005 Act imposed an obligation on providers of a public communications network or a publicly available electronic communications service to retain traffic and location data for a maximum period of 12 months, while an implementing decree set this period at 6 months or, in the case of some data, 3 months.Footnote 20 Several years later, however, following an initiative of 51 Czech MPs, the Czech Constitutional Court reviewed this provision and annulled it.Footnote 21 Subsequently, an amendment was approved which made it mandatory to retain these data for 6 months.Footnote 22

In 2012, the Czech Constitutional CourtFootnote 23 repealed Section 88a of the Code of Criminal Procedure, which was subsequently revised and supplemented by the abovementioned amendment. The Czech legal system thus anticipated the German development by several years and narrowly specified the categories of offences for which law enforcement authorities may request data on telecommunications traffic (including intentional offences with a maximum penalty of at least three years).Footnote 24

The development in the Czech Republic was finally concluded in 2019 by a ruling of the Czech Constitutional Court, which upheld the obligation of providers of a public communications network or a publicly available electronic communications service to retain traffic and location data for a period of 6 months.Footnote 25

Data retention thus represents a specific situation in which data minimisation becomes the underlying question, both in terms of the scope of the data to be retained and of the time limit set for their retention. It is a special case because what tips the imaginary scales is the minimisation of the data that the state is entitled to keep on its citizens: on one side of the scales lie the citizens' security, their freedom, and a democratic open society; on the other side lie totalitarian practices, Big-Brother-like surveillance, and the disciplining of citizens.

Data minimisation in archives and archiving raises a key question similar to that posed by data retention, namely what data on their citizens public authorities have the right to maintain. Its specific focus, however, moves in a different direction: What citizen data may the state and public authorities keep permanently or for long periods of time? In the following part, I will take a closer look at how the principles of data minimisation and storage limitation shape the field of archives and archiving, including archival theory and methodology, and at the main tools archives and archiving use to apply them. I will briefly introduce this issue with the specific concept of data minimisation in relation to archiving in the public interest, using the diction of the European GDPR.

8.2 Data Minimisation and Storage Limitation in Relation to Archives and Archiving

In the European Union, the General Data Protection Regulation (GDPR)Footnote 26 has assigned archiving in the public interest—like scientific and historical research or statistical purposes—a privileged status. Archiving in the public interest has been exempted from the generally formulated and applied principles of personal data processing, including the very broadly formulated right to be forgotten; in that case, the purpose of archiving in the public interest is itself the reason for not applying the right to be forgotten. However, this privileged position also has significant limitations. Any privileges reserved for archiving in the public interest are conditional on guaranteeing the main principles of personal data processing as defined by the GDPR in Article 5, two of which are key: the “data minimisation” principle and the “storage limitation” principle. These safeguards are further specified in Article 89 of the GDPR under the somewhat convoluted wording that the relevant measures “may include pseudonymisation provided that those purposes can be fulfilled in that manner. Where those purposes can be fulfilled by further processing which does not permit or no longer permits the identification of data subjects, those purposes shall be fulfilled in that manner.”
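The mechanics behind Article 89 can be illustrated with a minimal, hypothetical sketch (the function names and data below are my own invention, not taken from the GDPR or any archival system): pseudonymisation replaces direct identifiers with tokens while a separately held key table keeps the process reversible, which is precisely why pseudonymised data remain personal data; destroying the key table, or never keeping one, moves the data towards anonymisation.

```python
import secrets

def pseudonymise(records, key_table):
    """Replace the direct identifier with a random token.
    The key table, held separately by the controller,
    makes the process reversible."""
    out = []
    for rec in records:
        token = key_table.setdefault(rec["name"], secrets.token_hex(8))
        out.append({**rec, "name": token})
    return out

def depseudonymise(records, key_table):
    """Reverse the token mapping using the key table."""
    reverse = {tok: name for name, tok in key_table.items()}
    return [{**rec, "name": reverse[rec["name"]]} for rec in records]

key_table = {}  # to be kept under separate technical safeguards
data = [{"name": "Jana Novakova", "birth_year": 1961}]

pseudo = pseudonymise(data, key_table)
print(pseudo[0]["name"] != "Jana Novakova")       # identifier replaced
print(depseudonymise(pseudo, key_table) == data)  # reversible with the key
# Destroying key_table would make the tokens irreversible,
# approximating anonymisation in the sense of Article 89.
```

The design point is that the safeguard lies not in the token itself but in the separation and eventual destruction of the key table, mirroring the GDPR's distinction between pseudonymised and anonymised data.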

The core of the regulation, crucial for archival and historical science, lies in its conditionality: If it is possible to erase the specific identity of a person, the obligation to de-identify that person applies, provided that the public interest of archiving and the scientific, historical research, or statistical purposes are not compromised. At this point, the European regulation leaves very considerable room for interpretation: Who shall assess, in what way and according to what criteria, whether the purpose pursued—in the case of archiving, it is the purpose of archiving in the public interest—can be fulfilled even when the identification of the persons concerned in the records and archives is erased? A typical case is census records, which Germany, for example, has been anonymising for several decades, and which the Czech Republic most recently anonymised in 2021 before transferring the records to an archive for permanent preservation.

Yet, regardless of the European GDPR, it is extremely important that archives and entire archival systems be able to view the protection of personality and privacy from the perspective of the actual permanent or—if we are more realistic—long-term preservation of data in archives. Here, archiving comes into contact with what can be aptly described—and not only in GDPR terms—as data minimisation and data storage limitation. In the vast majority of developed archival systems, archives are the places that both implement and are responsible for by far the largest share of the total reduction of public records created. In the context of archival appraisal and the selection of records for permanent preservation, archives perform by far the largest, completely legal destruction of public records occurring today; it usually amounts to around 95% of all public records created. With this comes a proportionate responsibility not only for the adequate and professional selection of records valuable from an archival-historical perspective, but also for the irreversible destruction of those records and data which, on the one hand, pose a serious risk of potential future misuse and, on the other, do not have such high historical-archival value that they should be preserved in an archive permanently or long term.

Archival theory, methodology, and practice, however, have so far neglected the potential risks of misuse of sensitive personal data contained in permanently (or long-term) stored records, focusing almost exclusively on the records' information content and their future usability for various research projects and private research interests. This is what archiving should change in the future. In the following part, I will take a closer look at some models of archival appraisal as they took shape in the post-1945 period. I will conclude by analysing another form of data minimisation and storage limitation, the process of anonymisation or pseudonymisation, proposing a concept that links anonymisation and pseudonymisation to the four categories of the right to be forgotten presented in Chap. 5. I will pay particular attention to the risks of deanonymisation and reidentification.

8.2.1 Records Destruction and Archival Appraisal as Basic Tools for Minimising Personal Data in Records and Archives

Archives are generally perceived primarily as institutions serving the purpose of preserving and archiving data and records. An equally important function, however, is reducing the records created by a wide range of entities, from public and private institutions to natural persons. The purpose of this reduction should be not only the necessary reduction of the volume of permanently or long-term archived material, but also the protection of the personality, privacy, and personal data of those concerned in the records. This purpose will play an increasingly important role in the future, given the significantly higher risk of data misuse in the case of electronic data and records compared to paper documents.

Developed archival systems across countries establish more or less similar models implementing a process by which a very small part of records is designated for permanent or long-term preservation and the absolute majority for legal destruction. This process occurs even before the actual archiving, preservation, and processing of archival records in archives. Although the legislation of some countries does oblige even private entities to submit their records to archives for retention procedures and archival appraisal, this is a rather marginal phenomenon. The reason is the common assumption that private entities, including individuals, should have the right to dispose of their records as they see fit (provided, of course, that other rights are not infringed, including the most closely watched protection of personality, privacy, and personal data). For this reason, and in keeping with the text as a whole, I will consider archival appraisal primarily using the example of public records.

In the English-speaking world, this process is most often referred to as “archival appraisal”; in German terminology it is called “archivische Bewertung”; in French the terms “évaluation” or “tri” are used; in Italian it is “selezione”. In terms of meaning, professional archival terminology reflects that this is a certain “selection” of records, a process that goes hand in hand with their “evaluation” based on certain content criteria. The International Council on Archives, in its draft methodology for records appraisal, provides a fairly adequate definition: “Appraisal is the process of evaluating records to determine how long to keep them, including to decide if the records have sufficient long term value to warrant the expense of preservation in an archives. Appraisal is fundamental to the archival endeavor, because appraisal determines what records will be kept and what records can be disposed.”Footnote 27

Especially after 1945, in the context of a massive increase in the volume of records created, the archival profession began to develop models of archival appraisal that sought to systematically reduce the volume of records before the moment of archiving. At the very core of the different models is the search for and identification of certain values or, in more recent terms, meanings that a record or archival material can carry, and, where applicable, the determination of for whom it carries them. The now classic concept of primary and secondary values by the American archivist and thinker Theodore R. Schellenberg achieved great international resonance and has had a significant impact on contemporary archives. “Primary value”, according to Schellenberg's mid-1950s reasoning, represents the meaning of the record for which it was originally created.Footnote 28 It is therefore the meaning primarily determined by the needs of the record creator. “Secondary value” is then determined by the interests of other institutions or private researchers; it expresses a secondary meaning: in the classical Schellenberg sense, the scientific, research, or historical meaning. Schellenberg divides secondary value into two categories. First, there is the “evidential value”, which consists in the information the record provides on the activity of the record creator. Second, there is the “informational value” or “research value”, which expresses what information the particular record contains for research in various sciences.Footnote 29 In both cases, however, the value serves research purposes.

Schellenberg's concept, however, introduced one crucial point: It marked a sharp departure from the opposing approach, whose most prominent representative, already before World War II, was Hilary Jenkinson, another distinguished classic of archival theory. The key was the departure from Jenkinson's idea that archives should preserve the exact arrangement of the collection of records as it was constituted by the creator, without discarding any of the records.Footnote 30 Jenkinson would have preferred to refrain from appraising records in terms of their possible historical significance, and Schellenberg in this respect represented the post-war awakening to the absurdity of Jenkinson's idea.

At the beginning of the twenty-first century, Canadian archival thinkers (Yvon Lemay, Sabine Mas, Louise Gagnon-Arguin) proposed expanding and complementing the classical concept of the primary and secondary value of the record with the concept of a tertiary value, presented at the sixth symposium of the Interdisciplinary Group for Research in Archival Science (Groupe interdisciplinaire de recherche en archivistique, GIRA), “Archives, from information to emotion” (“Les archives, de l’information à l’émotion”), held in Montréal in 2010.Footnote 31 The Canadian concept uses several not yet fully unified terms: the “emotional value” (“valeur émotive”),Footnote 32 the “sentimental value” (“valeur sentimentale”),Footnote 33 and also the “artistic-emotional-affective” nature of the value thus defined.Footnote 34 The common denominator of this idea is the hypothesis that archival records have, in addition to the abovementioned primary and secondary value, or the testimonial and informational value, also an emotional value. Yvon Lemay and Marie-Pierre Boucher provide a condensed illustration: “Imagine for a moment that on the desk in your office is a framed photograph of your mother, taken in hospital just before she passed. Needless to say, whenever you look at that photo, you feel devastated. Even though a very banal photograph, it has immense value.”Footnote 35 It is usually the authenticity of the (archival) record, with a direct link to its source, origin, or author, that creates a strong “emotional charge”.
Lemay and Boucher eventually systematised three basic functions of archival records and, with them, of archives as their controllers: (1) to inform (“informer”), (2) to bear witness (“témoigner”), and (3) to revive the past, to evoke, to “recall the past” (“évoquer”).Footnote 36 Not only Lemay and Boucher but also other members of the GIRA group appeal to archives and archivists to take this third value, dimension, and function of the archival record seriously enough to notice it and implement it in their activities, especially in the various forms of access to and use of archives, including exhibitions.

Eric Ketelaar, one of the most influential and formative figures in contemporary archival studies, looked at the layers of meaning in records from a slightly different direction. Ketelaar significantly reversed the perspective by asking how users (researchers, historians, etc.) create the meaning of the record and the historical source. He began with the claim that records have a whole range of “meanings”; in a way, they are a “repository of meanings”.Footnote 37 A few years ago, with the help of psychology, Ketelaar specified and systematised the various ways in which record meanings emerge or are constituted.Footnote 38 He based his distinction on one of the basic systematisations used in psychology, which distinguishes in principle three kinds of mental processes: cognitive (knowledge), affective (emotions), and conative (volition). Psychology uses this division in other contexts as well, for instance in the theory of propositional attitudes. Ketelaar transferred it into archival theory and methodology and applied it when considering the constitution of the meanings of records. He distinguished three modes of forming meaning: (1) the cognitive mode, (2) the affective mode, and (3) the conative mode. The cognitive mode of constructing the meaning of a record, or the cognitive mode of the user’s attitude towards the record, represents a purely cognitive approach and a cognitive motivation of the record user. The conative mode of constructing meaning is based on the specific motivations and intentions with which the user approaches the historical source: the motivation is different for a person looking for employment records to calculate his pension, for a historian writing an expert study, and for a thief looking to profit from selling archival records or valuable parts of archival material.
Finally, the affective mode embodies the emotional level of the user’s attitude to the record, archive, or historical source, corresponding to the tertiary value of the record in Canadian archival theory.

The above concepts focus on the layers of meaning that shape records. One step closer to the actual practice of appraisal are the models of archival appraisal, which began to take shape gradually from the 1950s onwards. Already in the early discussions of the 1950s, the West German archivists Georg Wilhelm Sante and Wilhelm Rohr called for the assessment of records to be based not on the individual records themselves, but on entire fonds and groups of creators as the starting point. Hans Booms later referred to these findings as the Sante-Rohr model.Footnote 39 Booms himself, President of the German Federal Archives from 1972 to 1989 and one of the most important German archival theorists and thinkers, built a different model in the 1970s, based on a different premise: Archival appraisal is intended to create a society-wide documentation of public life in all its complexity and diverse facets. For this purpose, Booms proposed creating models of historical documentation (“Überlieferungsmodelle”), which he called the “documentation plan” (“Dokumentationsplan”).Footnote 40 In more recent German debates, the term “documentation profile” (“Dokumentationsprofil”) has become more common. One of the results was a methodological aid for creating documentation profiles in municipal archives, approved by the German Federal Conference of Municipal Archives (“Bundeskonferenz der Kommunalarchive”) in 2008.Footnote 41

Booms's concept of a documentation plan, and the effort to mirror the whole of society in archives and archival collections, corresponds with the direction of archival methodology in the USA and Canada, which has been and still is greatly influenced by social history, including the so-called new social history.Footnote 42 Social history was absorbed into one of the world's most influential archival appraisal models, macro-appraisal, developed in Canada by Terry Cook (1947–2014) in the early 1990s, with origins dating back to the late 1980s. It was put into practice at the National Archives of Canada (now Library and Archives Canada), further developed until recently, and is gradually being applied in archival systems in various parts of the world.Footnote 43 In the macro-appraisal model, Cook calls for a shift away from concentrating on the substantive content of a particular individual record towards a focus on the functions, activities, programmes, and so on of the record creator.Footnote 44 The functions and activities of the creator must be understood in the context of their origin and within a broader (political, social, cultural, etc.) context, and based on this context the value of these functions, activities, and institutions, as well as the value of the creators themselves, needs to be determined. The theory of macro-appraisal thus consists in the transition from “content-based information” to “context-centred knowledge”.
Cook interprets this shift from the question “what information does the record contain” to the question “how and why was the record created” as a recovery, or “rediscovery of a sense of provenance”.Footnote 45 The reason why we can speak of a change in the meaning of the classical principle of provenance, the most important and default archival principle, is that in Cook's work provenance is no longer primarily the office and its organisational structure but, in short, the functional context of the record's creation. Similar tendencies in contemporary archiving can be found in the aforementioned Eric Ketelaar, who also claims to have found inspiration in Terry Cook.Footnote 46 While Cook uses macro-appraisal theory as a means of distancing himself from traditional archiving, Ketelaar, in a very similar vein, introduces the dichotomy of “descriptive archivistics” and “functional archivistics”.Footnote 47 Compared to Cook, however, Ketelaar's approach is characterised by a greater focus on the person of the record creator.

Archival methodology is, however, also developing other models of archival appraisal. One thoroughly elaborated model, including step-by-step application to public records, was created in Germany by the State Archives of Baden-Württemberg, one of the German Landesarchive. The method of “vertical and horizontal evaluation” is based on a comparison of the roles and functions of organisations and their components in their vertical structure (superordinate and subordinate units) and horizontal structure (division of competences between units at the same level, their cooperation, etc.).Footnote 48 By establishing the tasks, activities, and competences of a given body in the context of the competences, activities, and cooperation of other bodies (superior, subordinate, or at the same level), the archivist uses this method to identify the records with the greatest relevance and informative value.Footnote 49

The perspective of evaluating not only records but also the offices and organisations, that is, the creators of the records, and of appraising “their” records according to the creators' “value”, was already being pursued in East Germany in the 1960s. Essentially, the East German archivists tried to establish classes or categories of creators (registry creators) according to their social importance.Footnote 50 A more significant shift away from the emphasis on the record alone towards the evaluation of creators occurred in West Germany around the 1970s. The creators were evaluated within the government system; however, the evaluation remained in principle confined to the context of the state administration, with no broader or other context considered. The current debate in Germany, known as “horizontale und vertikale Bewertung”, can trace its predecessor to the German debates of the 1970s, which discussed “horizontally/vertically integrated appraisal” (“horizontal/vertikal integrierte Bewertung”; “horizontale/vertikale Integration der Bewertung”).Footnote 51 Since the 1990s, in addition to German archivists, Swiss archivists have also started to evaluate records based on the position of the creator (within the state administration, within the functions they perform, etc.).Footnote 52 At that time, the Swiss Federal Archives began to introduce the procedure of “Priorisierung” (“prioritisation”). This procedure assigns one of three priority classes (“Prioritätsklassen”), A, B, or C, on two levels: registry creators (“Registraturbildner”) are prioritised first, followed by the groups of records defined in the filing plans. The shift consists in the fact that the archival-historical significance of records is derived from the significance of their creators, classified into the individual priority classes.

It might seem that the theories and models of archival appraisal applied later during retention periods and record selection procedures are not primarily related to the protection of personal data, personality, and privacy. But that would be a false impression. The connection exists on two basic levels:

  1.

    At a general level, it is the basic link between archival appraisal, retention periods, and data minimisation, including personal data. The best way to protect these data is to destroy them. In this sense, the most powerful form of data minimisation and storage limitation is the very data reduction in the process of records selection and appraisal. If we stick to using the narrow terminology of the European GDPR, it is true that it views pseudonymisation as primarily a reversible process, in which personal data are recoverable. But the GDPR also introduces the “hard” data minimisation that takes place before the data are transferred to the archive, that is, the data, including personal data, are destroyed.

  2.

    On a concrete level, it is the individual models of appraisal and selection of records implying the question of which specific groups of records to destroy and which groups to preserve. It is at this point that the potential opens up for adopting a significantly different perspective on archival appraisal and records selection: The initial question is no longer what informative and testimonial value the materials may have for current and future research, which ultimately corresponds to the classic Schellenbergian “secondary value” of records. Instead of this approach, which focuses on the information content and usability of records for various research purposes, and which still prevails in most archival theory and practice, a perspective opens up that takes as its starting point the protection of personal data and the security risks of breaching this protection during long-term or permanent archiving. This is a perspective that archival methodology has so far overlooked and has not taken into account in any fundamental and comprehensive way. It is an approach that primarily asks whether, and if so how, and with what consequences, the data that are hypothetically transferred to the archive could be misused.

Archival science can thus open a new and broad field of research with crucial implications for archival practice. This raises the question of which specific groups of records are to be irreversibly destroyed and which are to be preserved from the perspective of the protection of personal data, personality, and privacy, considering the risks of their possible misuse. Chapter 7 has already provided some specific examples of the misuse of certain groups of records and archives in the twentieth and twenty-first centuries, including, among others, census and medical records.

I see several points crucial to the relationship between archiving and the process of data minimisation and storage limitation; they are:

  1.

    In the future, archival appraisal and retention management will become key tools for applying the principle of data minimisation and storage limitation within records management even before transferring the data for archiving, as well as in the case of archives and data already archived.

  2.

    Archival appraisal and disposal of records will henceforth acquire a new important function: They will serve as a fundamental pillar of justification for why personal data were preserved and passed on to the archiving phase rather than destroyed during their existence in records management. For the first few decades of their existence, records will largely contain the personal data of living persons, who will pass away only over time. At the beginning of their existence, archives are only at the starting line of a process that could, with a certain degree of exaggeration, be described as the “disappearance of personal data”. This process is driven by the gradual passing of the living persons concerned in the archives, since personal data are, in most legislative systems, by definition tied only to living individuals. The process is similar to post-mortem personality protection, in which the sensitivity of certain data related to personality and privacy fades as the period since the death of the individual concerned grows longer. The German Federal Court of Justice (Bundesgerichtshof) put it very precisely when, at the end of the 1960s, it stated that post-mortem personality protection is limited in time; it is not infinite.Footnote 53 In its judgement, the Court also emphasised that the need to protect the rights of the deceased “disappears as the memory of the deceased fades”.

    In any case, archiving, including archiving in the public interest, begins long before the persons concerned in the archival material die. Thus, archives will always have to deal with the fact that they are managing the personal data of living persons and are therefore subject to all the obligations imposed by the relevant legislation on the processing and protection of such data.

  3.

    Retention periods will acquire a much more important and multifaceted function. So far, they have fulfilled two main functions in the management of public records: For the creators, they serve as a tool for the timely and continuous release of capacity in their registries. But they serve a much more important function for an open democratic society: They compel the creators to propose public records for retention in a timely manner and without delay and to submit some of their records for archiving. In this respect, they are one of the key means for the exercise of transparent and controllable records management, archiving, and all government and public administration processes in general. However, in the new context we are examining, they will acquire another, at least equally crucial, role: They will become an essential lever for the application of the data storage limitation principle, to follow the terminology of the European GDPR, which considers this principle one of the fundamental pillars of personal data protection and understands it to mean that personal data cannot be retained for longer than is strictly necessary for their original purposes. Staying within the European Union, the GDPR has indeed approved a form of exemption from this principle, inter alia for archival purposes, but even this exemption has its limits, and it is not at all clear how its application will be interpreted in the future in individual countries and at the level of the Union (cf. Chap. 5).

    In any case, retention periods will become more important in the future for both records management and archiving.

  4.

    In the context of the archival appraisal of public records, archiving in the public interest is, in some situations, beginning to open up possibilities for the free consent (or refusal) of those whose personal data are being considered for archiving. Until now, archiving in the public interest has not taken the consent of persons to archiving into account and has relied on the standard legal authority for the preservation of records, including personal data.

8.2.2 Anonymisation, Pseudonymisation, and the Link to the Model of Four Categories of the Right to Be Forgotten

Destruction is not the only possible way in which data, including personal data contained in archival records, can be reduced. The second main tool for such reduction is the set of processes mostly summarised under the term data “anonymisation”.

Terminologically, the situation is somewhat complicated by the European GDPR, which in 2016 introduced (but not for the first time in the European Union) the concept of “pseudonymisation”. Let me thus open the following topic with a terminological analysis.

The GDPR distinguishes between two fundamentally different processes—anonymisation and pseudonymisation of data. Pseudonymisation, as defined by the GDPR, means replacing a piece of personal data (e.g., a name) with another identifier so that the personal data cannot be further associated with a specific data subject. However, the possibility of re-establishing this link between the personal data and their subject is preserved.Footnote 54 This re-established link must once again comply—if we remain in the EU area—with all the requirements set out in the GDPR.

Anonymisation, on the other hand, is a process whereby the link between personal data and their bearer is irreversibly broken. The German legislation is a little more specific, defining anonymisation as “the alteration of personal data in such a way that individual data on personal or factual circumstances cannot be attributed—or can be attributed only by means of disproportionate demands on time, cost and manpower—to a specific or identifiable natural person”.Footnote 55 In a similar way, it explicitly defines the process of pseudonymisation, namely as replacing the name and other identifiers with another character in order to exclude or substantially impede identification of the person concerned, that is, in the form understood and defined by the GDPR.Footnote 56
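The distinction can be sketched in a few lines of code. The following is a minimal illustration only: the record structure and names are invented, and real pseudonymisation schemes (and the safeguards the GDPR requires for the key table) are far more elaborate.

```python
import secrets

# Invented toy records; the names and attributes are purely illustrative.
records = [
    {"name": "Anna Novakova", "town": "Prague", "year": 1952},
    {"name": "Jan Svoboda", "town": "Brno", "year": 1948},
]

def pseudonymise(recs):
    """Replace the direct identifier with a random token, keeping a key table.

    The key table preserves the possibility of re-establishing the link
    between the data and their subject -- the defining feature of
    pseudonymisation under the GDPR."""
    key_table = {}
    out = []
    for r in recs:
        token = "P-" + secrets.token_hex(4)
        key_table[token] = r["name"]
        out.append({**r, "name": token})
    return out, key_table

def anonymise(recs):
    """Remove the direct identifier outright, keeping no key table.

    The link to the subject is irreversibly broken for this attribute;
    the remaining attributes may still allow inference, which is why
    anonymisation must be assessed against the data set as a whole."""
    return [{k: v for k, v in r.items() if k != "name"} for r in recs]

pseudo, key_table = pseudonymise(records)
anon = anonymise(records)
```

The asymmetry is visible directly: `key_table` allows the pseudonymised records to be re-linked to their subjects, whereas nothing retained alongside `anon` does.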

The question is whether the term “pseudonymisation” will gradually make its way into general archival terminology. Even though the GDPR and its pan-European validity will have a formative influence, it should be mentioned that Germany in particular knew and used this concept much earlier than the still-young GDPR.Footnote 57

Terminologically, one may ask into which category falls the blacking out of personal data—or of data that could lead to their being linked to a specific identifiable person—in copies of records or copies of archival material.

When “anonymised” (“pseudonymised”?) archival records are presented in research rooms (and similarly in other situations, typically when a court judgement is published with blacked-out personal data), this cannot be considered anonymisation in the proper sense of the word. The possibility of re-establishing the link between personal data and their subject remains, for a simple reason: The data redaction (blacking out) concerns only copies of archival records, while the originals—containing all the original data—remain intact in archival depots. Strictly speaking, however, we cannot even talk about pseudonymisation as defined by the GDPR, because the personal data are not replaced by any other identifier, and the redaction concerns not the original record but, again, only its copies.Footnote 58

However, abstracting from the narrow legal framework of this terminology, we could place data redaction (blacking out), or any other process of “depersonalisation” of archival copies, on the imaginary borderline between anonymisation and pseudonymisation. It combines features of both. On the one hand, considering the copy of the record/archive and the researcher who consults it, the process could be deemed anonymisation, as it is not, or should not be, possible to recover the eliminated personal data from the copy itself. On the other hand, in the Kantian sense of “an sich”, it is possible to restore the eliminated personal data by comparing the “depersonalised” copy of the record/archive with the original and filling in the relevant data on the basis of this comparison. And this is the most important feature of pseudonymisation, that is, the possibility of reconnecting the data to a specific person.

On the other hand, in cases where the records are transferred to the archive already anonymised by the creators themselves, so that no one else—not even the archive—will be able to “deanonymise” or “depersonalise” them again, we are dealing with anonymisation of the original documents, or archival materials, in the true sense of the word.

It is, among other things, this very fragile relationship between anonymisation and pseudonymisation that will probably play a more important role in the field of archives and records management in the future than it does today, and that will also become—at least in some cases, as I have demonstrated in detail using the example of census records—an important social issue. The discussion will concern in particular the extent to which mere pseudonymisation will suffice and the extent to which it will be necessary to use “hard”, irreversible anonymisation. The debates will undoubtedly include the economic dimension of the issue; data, as they say, are the “black gold” of the twenty-first century, and the irreversible and massive destruction of data will bring about considerable economic losses.Footnote 59

In view of the terminological analysis performed above, the procedure of pseudonymisation is applied practically without exception to records that have already been handed over for permanent archiving and have undergone the process of archival appraisal; the links between the data and their carriers are broken on copies, not on the original records, and the possibility of restoring the link between the data and their carriers thus remains.

The process of making archival materials available to researchers most often consists in blacking out the relevant parts that could lead to the identification of their bearer before the materials are offered to the researcher for consultation. In the case of analogue pseudonymisation of paper records, however, it is usually not enough to redact one copy; most times it is necessary to make a second copy and black out the specific data a second time. Some archives even manually cut the relevant areas out of copies of archival records, which is of course possible only in the case of paper documents and for the pseudonymisation of analogue data. This very demanding process is eventually crowned by providing access to the archival material in a research room to an often dissatisfied applicant: they receive the eagerly awaited material only after a considerable delay, which in the “age of access” they are gradually ceasing to tolerate, or they lose interest in data made available so late.

This point was already pinpointed in the 2005 Report on Archives in the enlarged European Union. “Changing societal expectations of the roles of the archivist in the twenty-first century are activated by the increasing irrelevance of constraints of place, time, and medium in ‘the age of access’, made possible by modern information and communication technologies. These facts increase citizens’ expectations of free access to authentic information 24 hours a day, seven days a week, wherever they happen to be.”Footnote 60

The anonymisation/pseudonymisation of personal data in archival material is not in itself very problematic. Yet there are at least two levels on which it raises fundamental questions:

  1.

    Process level: The process of analogue anonymisation/pseudonymisation of personal data on archival copies seems to be a more or less trivial matter. This is probably why it has not been given closer consideration. However, this assumption is wrong. Even at this procedural level, one basic difficulty stands out: the time-consuming and laborious nature of the whole process of personal data anonymisation/pseudonymisation, in its analogue as well as digital form. To support this thesis, we conducted an empirical survey with colleagues from two archives to quantify the labour and time needed for anonymising/pseudonymising personal data in archival records. The research was conducted in two public state archives in the Czech Republic, a member state of the European Union, namely the State Regional Archives in Prague and the National Archives of the Czech Republic. Although this was a form of analogue pseudonymisation, many of the steps, and the overall time and effort involved, are similar for digital pseudonymisation as well. This will be the case at least until artificial intelligence tools are applied to a more substantial extent to digital pseudonymisation, which is not yet happening to any significant degree in the archival environment.

The following calculation and list of the individual steps necessary for the processing of personal data in archives thus represents an insight into the archival practice of one of the EU Member States, but the calculations concerning some of the actions, especially the process of data pseudonymisation by means of blacking out copies of archival records, are more or less universally valid.

The first step is simply copying the archival records, including arranging and preparing the copies for the research room. The State Regional Archives in Prague performed their calculation on a sample of two boxes. The first box contained material from the Regional Committee of the Communist Party of Czechoslovakia (minutes of the Board), which took a total of 4 hours. The second box contained records from the Extraordinary People’s Court in Prague fonds and represented a full 10 hours of work. This preparation step is considerably more time-consuming for judicial materials.

What is the time required for the actual redaction of the copies (blacking out)? The State Regional Archives in Prague based their calculation on a sample of two boxes from the same fonds. Pseudonymisation of materials from the Regional Committee of the Communist Party of Czechoslovakia (minutes of the Board) took 3.25 hours per box of records, while pseudonymisation of archival materials from the Extraordinary People’s Court in Prague involved 9.75 hours of work per box of records. Pseudonymisation of judicial material is significantly more time-consuming due to the enormous amount of (sensitive) personal data.

The cumulative time devoted to anonymisation/pseudonymisation (not including time for research and other activities related to the preparation of the records for researchers), including the preparation of copies, ranges from 7.25 to 19.75 hours of work on a single box.

A similar empirical survey was carried out in parallel in the Czech National Archives.Footnote 61 The results are almost identical. It took 6.2 hours to make copies of one box containing records from the Central Committee of the Communist Party of Czechoslovakia, and another 8.2 hours to pseudonymise them. The same tasks took much longer in the case of the fonds belonging to the State Court, which in the early days of the communist regime in Czechoslovakia (1948–1953) conducted political trials of opponents of the regime and handed down death sentences in these politicised trials. For these fonds, the copying and preparation for pseudonymisation took 13.2 hours, while the pseudonymisation itself took 15 hours. The time required for pseudonymisation thus ranged from 14.4 to 28.2 hours per box of records.
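The figures from the two surveys can be summed up as follows (a simple restatement of the numbers cited above; the fonds labels are shortened here for readability):

```python
# Hours per box (preparation/copying, redaction), as reported in the two surveys.
surveys = {
    "State Regional Archives in Prague": {
        "Communist Party Regional Committee": (4.0, 3.25),
        "Extraordinary People's Court": (10.0, 9.75),
    },
    "National Archives of the Czech Republic": {
        "Communist Party Central Committee": (6.2, 8.2),
        "State Court": (13.2, 15.0),
    },
}

# Total hours of work per box for each fonds.
totals = {
    archive: {fonds: round(prep + redaction, 2)
              for fonds, (prep, redaction) in fondses.items()}
    for archive, fondses in surveys.items()
}
```

The totals reproduce the ranges cited above: 7.25–19.75 hours per box in the first survey and 14.4–28.2 hours in the second.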

These figures, however, by no means cover the entirety of the work that must be devoted to preparing archival material to be “stripped” of personal data. What follows reflects a specific situation of the Czech archival system, which in this respect is rather unique in international comparison.

To this must be added the time that Czech archives—as required by Czech archival legislation—should devote to searching the Information System of Population Registration to determine whether the persons mentioned in the archival records are alive. Furthermore, the archives should ask the living persons whether or not they agree to the disclosure of their personal data. The time and workload would be enormous, given that the archive would have to obtain all the necessary information (data from the population register, responses from persons whose personal data appear in the archival records), process it, organise all the associated agenda, and top it all off with pseudonymisation. Given that a single cardboard file box often contains dozens or even hundreds of names of potentially living persons, such work would be so time-consuming as to become virtually unbearable for archives under current conditions. An exact empirical calculation of the time spent on these latter tasks could not be made. Based on cursory archival practice so far, only an estimate can be given of the total workload associated with the complete execution of the activities mentioned above: processing a single box would probably take close to 30 hours.

In addition to the financial requirements associated with the necessary staffing for these tasks, it is necessary to add the significantly increased costs associated with data pseudonymisation, particularly the cost of copying archival records before they are pseudonymised. Here, too, the specific empirical figures are remarkable: A single box of post-1945 modern records contains on average 750–1000 sheets, most of which would have to be copied, in some circumstances even twice, due to the complete blacking out of the information.

  2.

    The second level on which the anonymisation/pseudonymisation of personal data in archival records may seem controversial is that of content: The actual core of the problem of irreversible anonymisation and, from the perspective of the researcher, reversible pseudonymisation of personal data lies in the fact that at the end of the process, the applicant, including historians and other scholars, receives material devoid of specific links to specific persons and historical actors, material that is, so to speak, “depopulated”. The implications, not only for historical research and for our understanding of the past, and therefore of ourselves, are enormous in such a case. Apart from this chapter, Chap. 5 also discusses some specific examples.

In the future, it remains open to what extent and in what situations archives—both public and private—will approach the pseudonymisation of personal data in the process of making archival material available to the public. Provisions on pseudonymisation are gradually being incorporated, for example, into donation agreements between record donors and public and private archives; the latter can be demonstrated by the example of the German Archives for the History of Psychoanalysis.Footnote 62

Even at the legislative level, there has been gradually increasing pressure to enforce the principle of data minimisation, especially in the form of breaking the link between data and their specific carrier. In its stronger form, data minimisation takes the form of irreversible data anonymisation; in its weaker form, of data pseudonymisation. It is not just the European GDPR that has come up with the principle of data minimisation and the requirement to break the links between the data and their subjects. For example, in Germany at the federal level, the Data Protection Act of 2017, as part of the process of transposing the GDPR principles into German legislation, established an obligation of maximum “data economy” (“Datensparsamkeit”), meaning that “as few personal data as possible” should generally be processed.Footnote 63 At the same time, this legislation seeks to promote the application of the process of anonymisation of personal and especially sensitive personal data (in the diction of the GDPR, special categories of personal data). However, it explicitly mentions anonymisation only in the case of research and statistical purposes, not in the case of archiving in the public interest.Footnote 64

However, it is unlikely that archives would choose the “hard” irreversible anonymisation of personal data in archival materials by eliminating them directly from the originals. The vast majority of archivists are people educated in fields close to the historical sciences. They look at archival sources from this “distant perspective” and broad horizon. They do not view archiving through the media lens of the “close perspective”, topicality, and “fast-moving time”, let alone through the eye of a tabloid looking for whatever juicy titbits the material may contain. On the contrary, archivists, in the process of creating archival-historical sources, seek to preserve and allow to emerge a formative reflection of reality, society, and culture, and to establish it in a tradition that will be passed on to future generations. However, this reflection and mirror of reality, the basis of historical consciousness and of individual and social memory, and at the same time one of the important pillars of the formation of human civilisation and culture, would be fundamentally reduced, bent, and shifted in its message if specific people and their traces were not preserved in historical sources. The depth of tradition, the understanding of our past, and the understanding of man himself would be seriously compromised.

It is a completely different question whether, and in which cases, the path of “hard” irreversible anonymisation will be chosen for records immediately before their archiving and hypothetical transfer to a public archive. In Chap. 7, I used census records to provide an international comparison of the irreversible anonymisation of personal data in records at the moment before archiving.

The difference between irreversible “hard” anonymisation and reversible “soft” pseudonymisation of data correlates with the proposed model of the four categories of the right to be forgotten, which I first introduced in Chap. 5. The proposed concept of the four categories of the right to be forgotten (1. temporary limited, 2. temporary absolute, 3. permanent limited, 4. permanent absolute) suggests that in the case of records and data for which the protection of personality, privacy, and personal data requires the fourth and strongest category, that is, the “permanent absolute” right to be forgotten, neither archives nor the purposes of archiving in the public interest should have the right to maintain these records and data; this should remain so even if such records were completely exempt from the system of closure periods and remained closed. This applies to the data that another piece of German case law from the late 1980s, this time by the Federal Constitutional Court,Footnote 65 referred to when it defined the inviolable sphere of personality rights. This case law then stood at the heart of the newly crystallised model of personality spheres. Alongside the social sphere, encompassing in principle the public life of individuals, there is the private sphere, comprising the small circle of family and close friends and private life in one’s own home. And finally, there is the intimate sphere. Although the German Federal Constitutional Court has not yet defined it precisely, it has determined the existence of “an ultimate inviolable sphere of private life, which is absolutely separate from public power. Even serious interests of the general public cannot justify interventions in this sphere.”Footnote 66 And it is in these cases that either the irreversible destruction of the data in question or at least their irreversible anonymisation should be chosen as the surest way to ensure that this sphere is never breached and the extremely sensitive data it contains are never misused in the future—bearing in mind the fragility of today’s democracies and the uncertain geopolitical constellation.

The time capsule tool that the Australians and the Irish use for archiving census records, as analysed above, belongs to the second category, the “temporary absolute” right to be forgotten. The relevant data are closed and access to them is completely restricted, but only for a limited period of time. In the case of this “temporary absolute” right to be forgotten, there is no need to use either pseudonymisation or anonymisation tools, simply because the data are completely inaccessible for the given period.

However, a high percentage of the records maintained in public and private archives corresponds to the first category, the “temporary limited” right to be forgotten, under which access for legitimate official purposes is allowed on a virtually permanent basis, while access for private purposes is prevented for various reasons (closure periods, personality and privacy protection, banking secrecy, classified information, etc.). This covers the vast majority of records maintained in public archives that belong to the group of records created after 1945. Records and archives falling under the “temporary limited” right to be forgotten represent by far the largest part of the material undergoing pseudonymisation, provided that its definition also includes the process of blacking out or other means of data redaction on copies of records.

The last and very high proportion of the content of archives (in the case of large state, provincial, regional, and similar public archives, this represents a significant majority of material) is archival material that is not subject to access restrictions for data protection reasons and that, based on our analyses, is not subject to the right to be forgotten in any of the four categories. Moreover, in the vast majority of archives, the volume of maintained records increases continuously as more and more archival acquisitions are made.

There are four interconnected processes that are fundamental to the whole issue of access to archives and the protection and processing of archived data. The first two are the process of “ageing of archives and data” and the process of “disappearance of personal data” as people who have left their mark on archival sources gradually pass away. Most archives across countries are based on the principle of continuous acquisition of new materials as more and more records are created and transferred to the archives; these archives should thus experience a continuous increase in the group of materials exempt from all the categories of the right to be forgotten.

This corresponds to the third process, the disappearance or transformation of the sensitivity of personal data. It is ultimately linked to a fourth process, which I briefly mentioned above and which is discussed in more detail in Chap. 2: the weakening of post-mortem protection of the personality rights of the deceased, which the German Federal Court of Justice aptly described in the cited judgement—paradigmatic also for the subsequent development of the interpretation of post-mortem personality protection—when it stated that the need to protect the rights of the deceased “disappears as the memory of the deceased fades”.Footnote 67

All of this certainly does not mean that the right to be forgotten becomes radically marginalised in archives over time. Not only will “young” records always represent a significant part of the content of archives in terms of volume, but the “youngest” records are also among the most attractive for researchers, analogous to historians’ interest in “young history”. Similarly, a significant percentage of requests for access to archival material touch on people’s private and, in a number of cases, intimate spheres. Archives will therefore still have to consider in which respects the right to be forgotten manifests in any particular case and which category of the proposed system of the right to be forgotten comes into play. Archives should then act accordingly and decide on the manner of access to the relevant archival records.

8.2.3 Deanonymisation and Reidentification

Ultimately, archives will have to give increasing consideration to the growing risks of deanonymisation of anonymised or pseudonymised personal data and of reidentification of individuals. These risks manifest in the vast majority of cases in the area of digital data and records, at the level of their digitally pseudonymised or anonymised form. In principle, however, they also apply to analogue records and to analogue pseudonymisation or anonymisation. On the one hand, various anonymisation and pseudonymisation techniques are being developed, which are not yet used to any significant extent in and by the archiving sector, especially given that “hard” anonymisation of data in archival materials is opted for only exceptionally. On the other hand, tools for data deanonymisation and the reidentification of individuals are being improved. The Working Party on the Protection of Individuals with regard to the Processing of Personal Data has identified three risk areas for successful anonymisation or pseudonymisation.Footnote 68

The first area is so-called “singling out”, that is, the possibility of isolating some or all records that identify a particular person in a given data set. The second risk area is “linkability”, the ability to link two or more records relating to a particular person or group of persons, whether within one data set or across several. The third risk is “inference”, the possibility of deducing, with significant probability, the value of an attribute from the values of a set of other attributes and thus deanonymising the data. The means used to prevent deanonymisation and reidentification of individuals should then address these three risk areas.
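The first of these risk areas can be illustrated with a small sketch. The toy example below, with entirely hypothetical records, column names, and values, shows how a combination of seemingly innocuous quasi-identifiers can single out one individual in a data set:

```python
# Toy data set; all values and column names are illustrative assumptions.
records = [
    {"zip": "11000", "birth_year": 1950, "diagnosis": "flu"},
    {"zip": "11000", "birth_year": 1950, "diagnosis": "flu"},
    {"zip": "12000", "birth_year": 1985, "diagnosis": "asthma"},
]

def matching(dataset, **attrs):
    """Return all records that share the given attribute values."""
    return [r for r in dataset
            if all(r.get(k) == v for k, v in attrs.items())]

# Singling out: this attribute combination isolates exactly one record,
# so anyone who knows these two facts about a person learns the diagnosis.
hits = matching(records, zip="12000", birth_year=1985)
print(len(hits), hits[0]["diagnosis"])  # 1 asthma
```

Linkability and inference work analogously: the same `matching` call, run across two data sets or against a partially known attribute profile, links or deduces the remaining values.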

It is neither the aim nor within the scope of this text to provide a detailed analysis of the various anonymisation/pseudonymisation techniques. We can, however, briefly mention the Working Party’s systematisation of two groups of anonymisation/pseudonymisation techniques applicable to data sets.Footnote 69 The first is “randomization”, which consists in altering the veracity of the data: their certainty is disturbed by introducing an element of randomness, and if the resulting uncertainty is sufficiently significant, the data can no longer be linked to a specific identifiable person. These techniques include, for example, noise addition, by which attributes in a data set are perturbed so that individual records become less accurate while the overall distribution is preserved. They also include permutation, which swaps attribute values between records so that some values become artificially linked to different data subjects than in reality. Randomisation techniques further include differential privacy, in which randomised noise is deliberately added, typically to the answers to queries over the data set. The second group of techniques applicable to data sets is “generalization”, which dilutes certain attributes of the data subjects by generalising them; for example, a territorial reference at the level of a municipality is extended to a region. This expands the set of potential carriers of the relevant data and makes their identification more difficult.
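As a minimal sketch, assuming entirely invented sample values, the noise addition, permutation, and generalisation techniques described above might look like this:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

def add_noise(ages, scale=3):
    """Noise addition: perturb each value, making individual records
    less accurate while roughly preserving the overall distribution."""
    return [a + rng.randint(-scale, scale) for a in ages]

def permute(salaries):
    """Permutation: shuffle attribute values between records, so some
    values end up linked to different data subjects than in reality."""
    shuffled = salaries[:]
    rng.shuffle(shuffled)
    return shuffled

def generalise_zip(zip_code, keep=2):
    """Generalisation: widen a municipality-level code to a regional
    prefix, enlarging the set of potential carriers of the data."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

print(add_noise([34, 58, 41]))         # perturbed ages
print(permute([30000, 52000, 41000]))  # same values, reassigned
print(generalise_zip("11000"))         # 11***
```

Note that each technique trades analytical precision for protection; which trade-off is acceptable depends on the purpose for which the data set is released.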

Currently, there are many cases pointing to the real risks of deanonymisation of anonymised or pseudonymised personal data and reidentification of individuals, and there are studies that empirically demonstrate the real possibilities of deanonymisation using specific cases and data sets. For example, Bradley Malin, using the IdentiFamily software program, demonstrated the possibility of linking depersonalised family ties to specific individuals and of reconstructing large-scale genealogies within the current population from available online sources, typically the media, obituaries, and the like.Footnote 70 In a remarkable study, Luc Rocher, Julien M. Hendrickx, and Yves-Alexandre de Montjoye attempted to quantify the probability of correctly reidentifying a particular person even in very incomplete data sets.Footnote 71 They created a model estimating the degree of individual uniqueness and concluded that, in the case of the American population, 99.98% of Americans can be correctly reidentified in any data set using 15 demographic characteristics. Their conclusions point to weaknesses in some current standard anonymisation methods.

Arvind Narayanan and Vitaly Shmatikov have developed and introduced an algorithm for people-centric reidentification of social media data.Footnote 72 They demonstrated that one-third of users with both a Twitter account and a Flickr account can be reidentified on an anonymous Twitter graph with an error rate of only 12%. The operators mistakenly assumed that the pseudonymisation they carried out (typically when selling the data to other entities, in particular for advertising and marketing purposes) prevented any reidentification; Narayanan and Shmatikov showed that pseudonymisation was not a sufficient guarantee in this case. Reidentification was enabled by the relationships between different persons, which are unique and can be used as an identifier. Their study is one of many proving that pseudonymisation of personal data and anonymity in the social media environment are merely fictitious and not sufficient to protect privacy.

Narayanan and Shmatikov address the topic of deanonymisation and reidentification comprehensively, demonstrating with concrete examples other ways in which supposedly anonymised or pseudonymised data can be reidentified. They applied their deanonymisation method to the Netflix Prize database containing movie ratings from 500,000 Netflix subscribers.Footnote 73 Using public data from the Internet Movie Database, they identified the Netflix records of specific users, revealing, among other things, their apparent political preferences and other potentially sensitive data.

The study “Unique in the crowd: The privacy bounds of human mobility” is a significant contribution to the field of reidentification.Footnote 74 The authors focused on the extent to which individuals can be reidentified from information about their movements. They performed a unicity test to determine how many spatio-temporal points a person must leave behind for their mobility trace to be uniquely identified. The data usable for such reidentification typically come from public sources: the address of residence or employment (usually at least one of these is known about a person), geolocation data provided directly by the person concerned, and so on. The authors based their research on a data set of 1.5 million individuals, the vast majority of whom moved within 100 kilometres, over a period of 15 months. They concluded that only four spatio-temporal points are sufficient to uniquely identify 95% of individuals, and that two randomly selected spatio-temporal points allow for the identification of more than 50% of individuals. Based on these findings, a general conclusion can be drawn: “mobility traces are highly unique, and can therefore be reidentified using little outside information”. Similarly, another study demonstrated that mobile phone data can be reidentified by means of the user’s top locations.Footnote 75 Deanonymisation via geolocation data is becoming both an increasingly common tool of data misuse and a hot research topic.Footnote 76
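The logic of such a unicity test can be approximated in a few lines. The sketch below, using a deliberately tiny invented set of (cell, hour) traces rather than the study’s real mobility data, computes the fraction of point combinations that pin down exactly one person; applied to real traces, the same logic yields percentages of the kind reported above:

```python
from itertools import combinations

# Invented spatio-temporal traces: each point is a (cell, hour) pair.
traces = {
    "alice": {("cellA", 8), ("cellB", 12), ("cellC", 18)},
    "bob":   {("cellA", 8), ("cellD", 12), ("cellC", 18)},
    "carol": {("cellA", 8), ("cellB", 12), ("cellE", 18)},
}

def unicity(traces, p):
    """Fraction of p-point subsets of a user's trace that occur in that
    user's trace only, i.e. that uniquely identify the person."""
    total = hits = 0
    for user, trace in traces.items():
        for points in combinations(sorted(trace), p):
            owners = [u for u, t in traces.items() if set(points) <= t]
            total += 1
            hits += owners == [user]
    return hits / total

# More known points -> higher unicity, mirroring the study's
# two-point versus four-point gap.
print(round(unicity(traces, 1), 2), round(unicity(traces, 2), 2))
```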

However, the risks of deanonymisation and reidentification are becoming a highly attractive and intensively researched topic on multiple levels. Very recent research has demonstrated the possibility of deanonymising data by using records of the music preferences and selections of users of various streaming services, revealing that users can be reidentified from these records even when they are stored in anonymised/pseudonymised form.Footnote 77 In a similar context, specific models for assessing the risks of data reidentification (including quantifications based on available statistical data) in mobile applications are being developed, both by public authorities and by citizens themselves, with the aim of evaluating the specific level of reidentification and privacy leakage risk.Footnote 78

In recent years, some (personal) data protection authorities have already started to include a general reidentification risk assessment, containing an analysis of the potential risks of data deanonymisation and reidentification of persons, in their methodological recommendations for the management of (personal) data.Footnote 79 These recommendations focus mainly on situations in which the data custodian intends to disclose or publish anonymised data. In such cases, the custodian should analyse the hypothetical risks of future deanonymisation and reidentification and carry out a reidentification risk assessment to ensure adequate protection of personal data, personality rights, and privacy. Archives should take inspiration from this and incorporate data reidentification risk assessments into their records management processes.
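A risk assessment of the kind such recommendations describe can be sketched, for instance, as a count of equivalence classes over quasi-identifiers (a k-anonymity style measure); the record fields and values below are hypothetical, not drawn from any authority’s methodology:

```python
from collections import Counter

def risk_report(records, quasi_identifiers):
    """Group records by their quasi-identifier values and report the
    smallest group size k: any record in a group of size 1 is unique,
    giving a worst-case reidentification risk of 1/k = 1.0."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers)
                      for r in records)
    k = min(classes.values())
    return {
        "k": k,
        "unique_records": sum(1 for c in classes.values() if c == 1),
        "max_risk": 1 / k,
    }

records = [
    {"region": "Prague", "birth_decade": 1950},
    {"region": "Prague", "birth_decade": 1950},
    {"region": "Brno",   "birth_decade": 1980},  # unique combination
]
print(risk_report(records, ["region", "birth_decade"]))
# {'k': 1, 'unique_records': 1, 'max_risk': 1.0}
```

Before disclosure, a custodian would want k above some agreed threshold, generalising or suppressing attributes until the reported risk is acceptable.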

To sum up, anonymisation cannot be perceived as a perfectly effective, irreversible act of destruction of certain data, or of the link between the data and their carriers. Data custodians, which of course include archives, should take the risks involved seriously. The digital world, including digital data management and archiving, has significantly increased the risks of data misuse: access to data has been greatly facilitated, especially with the possibility of remote online access, while technological advances and, in particular, the expected development of artificial intelligence capabilities increase the risks of deanonymising anonymised or pseudonymised data and reidentifying individuals. This context should be given substantial consideration, particularly when applying the principles of data minimisation and storage limitation, including in the field of data archiving.

8.3 Conclusion

Data minimisation and storage limitation in relation to archives and archiving are implied by an underlying question that is also decisive for the specific field of data retention: What data on their citizens do the state and public authorities have the right to maintain and exploit, taking into account the need to ensure the basic functions of the state, local government, necessary internal security, the administration of justice, and other public interests? Patrik van Eecke and Peter Craddock described this situation succinctly, albeit in relation to the European GDPR, in a statement that is generally valid for archives: “the main challenges to public archives are not whether data needs to be erased, but how to ensure that only the personal data that is truly necessary is processed and that personal data is anonymised or at least pseudonymised where possible”.Footnote 80

The principle of data minimisation and storage limitation in the context of the protection of (not only) personality rights, privacy, and personal data in the area of archiving and records management is based on a fundamental tension in which on the one hand, the state, public administration, and public authorities need to acquire and store a set of certain data on citizens to ensure their basic functions and, on the other hand, they need to deal with the ever increasing risk of misuse of such collected, stored, and possibly archived data. I have outlined some of the possible tools by which records management and archiving in particular can respond to this tension and seek outcomes that are consistent with a democratic legal order and that balance the polarised scales as much as possible. The conclusion of the book will then provide a set of recommendations that may be implemented in archival and records management practice, and will summarise the conclusions of the analyses conducted in this book.

The principle of data minimisation in the field of archiving and preservation raises two interrelated basic questions: What data do the state, public authorities, and society have the right to keep on their citizens and, where appropriate, what data may they preserve permanently or for a very long time?

The book has asked questions about the risks to data, personality, and privacy protection in situations when data are stored and, in specific cases, archived. This tension reflects one of the paradoxes of archiving: on the one hand, the very act of storing data, especially personal data, poses security risks of potential misuse, for example through ransomware attacks. On the other hand, archiving serves in certain cases as a tool of data protection and even as a defence against the risk of data theft and leakage, including some types of ransomware attacks, especially those that block access to data. In such cases, appropriately configured archiving and backup processes (including the important question of the frequency of such backups) outside the information systems of the creator, that is, in an archive, can under certain circumstances enable the recovery of the blocked data.

The tension also manifests itself on an economic level. On the one hand, any retention of personal data poses a potential risk of negative economic consequences for data controllers if data protection is breached and, for example, blackmail occurs. On the other hand, data represent the proverbial “black gold” of the twenty-first century and are a source of profit for the private and, by extension, the public sector.

Based on the performed analyses, the relationship of the archival sector and archiving to the process of minimisation and storage limitation of (personal) data can be summarised in several important and decisive points.

Although the analyses have indicated a gradually decreasing percentage of records transferred into archives for permanent retention, relative to the records destined for destruction by the process of archival appraisal, the continuously increasing volume of archived material remains a constant (cf. Chap. 6) that needs to be taken into account in the context of data minimisation.

Archival materials created less than approximately 100 years ago, as well as public records with archival potential, mostly contain personal, and often very sensitive, data of living persons. This applies to archival material during the initial phase of its existence, when it relates to individuals who are still alive. This phase corresponds approximately to the average life expectancy, which gradually increases, a trend likely to continue (barring further serious epidemics such as the COVID-19 pandemic or other catastrophic social developments). Archival records at this stage, during which they relate to living persons, are subject to a significant risk of misuse of the personal data of the persons concerned.

At the same time, the archiving process bears a feature that I refer to as “disappearance of personal data”, if we understand personal data as data relating only to living persons. In view of the fact that archives are destined for permanent preservation, the proportion of deceased individuals concerned in the archives in relation to the living increases over time. This is another important process that takes place within data archiving, and which can be described, again using some literary hyperbole, as the “ageing” of archives and data. Closely related to this are two other processes that occur during archiving, namely the disappearance or transformation of personal data sensitivity and the weakening of post-mortem protection of the personality rights of the deceased. These four formative processes will be discussed in more detail below and in the final section “Recommendations”.

On the other hand, archives constantly acquire more material, with a significant share of records subject to shorter retention periods; a period of five years after the record is closed is very common. These are very young records, which will often carry information about living persons for almost another 100 years. Archiving in the public interest will therefore have to deal with the preservation of personal and sensitive personal data of living persons in the future as well.

A principle paradigmatic for the relationship between archiving and data minimisation is embodied in the European GDPR: if it is possible to erase the specific identity of a person without compromising the public interest of archiving, together with the purposes of historical research and statistics, there is an obligation to perform such de-identification. However, this general rule is not in itself a solution to the whole problem. On the contrary, archiving faces a key question with very wide room for interpretation: how, and according to what criteria, to determine whether the purpose of archiving in the public interest can still be fulfilled for specific groups of records even after the irreversible loss of personal identifiability. It remains equally important, and open, who will assess this issue: whether it will be, for example, expert bodies, who will nominate their members, and how apolitical and professional decision-making will be ensured.

Records management and archives play an important role in the application of fundamental human rights. First, it is the right to know, the right of access to information and records, and second, the protection of privacy and personality rights.

There is a fundamental tension not only between data minimisation and data preservation, but within archiving in its entirety: on the one hand, the best protection for any data is their irreversible destruction. To put it the other way round, in the words of the Court of Justice of the European Union interpreting the impact of general retention of traffic and location data: “the mere retention of that data by the providers of electronic communications services entails a risk of abuse and unlawful access”.Footnote 81 Similarly, in order to protect privacy, personality, and personal data, it is usually most effective simply to irreversibly destroy the potentially or actually sensitive data. The “Archive of National Socialism” (“NS-Archive”) analysed above, which the East German communist dictatorship used to coerce, compromise, and blackmail its own citizens as well as those of “enemy” countries, demonstrated how easily public authorities can massively misuse sensitive data on people if such data are preserved. One of the fundamental pillars of official legal records destruction carried out as part of archival appraisal resonates with this intent to destroy sensitive and potentially exploitable personal data: the condition that a record that has passed through archival appraisal and been destined for destruction must be destroyed irretrievably.

On the other hand, for a specific individual as well as for the functioning of the whole state and society, it is absolutely necessary to preserve certain data for a certain period of time, which can sometimes be very long and extend over several decades, possibly even a hypothetically infinite period; and this is the moment archives enter the field with the vision of (feasible?) permanent data archiving. From a slightly different perspective, but within the same circle of issues, stands the right to know, the right of access to information and records, a right no functioning and free society can do without.

The tension and the need to balance between the need to preserve and the need to destroy/not preserve is a defining motive far beyond data management and archiving. It applies to inanimate as well as animate structures. However, while psychology more often focuses on how memory works and how to maximise its potential, seeing forgetting as a necessary and rather distracting counterpart,Footnote 82 philosophy has taken the phenomena of memory and forgetting further in many respects, in some cases treating them as equal partners. Friedrich Nietzsche expressed this in the second half of the nineteenth century when he addressed the relationship between knowledge and life and the function that history (historical science) fulfils, or should fulfil, for the performance of life. He asked what the relationship is between the past, knowledge of it, and life itself.Footnote 83 (Historical) knowledge does not exist by itself, but always relates in some way to life and must be judged as such. The question of knowledge is as important as the question of life. Nietzsche asks what (historical) knowledge means for life, what life needs it for, and what place knowledge should occupy in it. Life is constantly threatened by an excess of knowledge that could ultimately destroy it. In this respect, reasonable boundaries must be set so that life is not stifled or outright destroyed by “the explosion of knowledge”. Translated for the purposes of our research: a great deal of data and records must be destroyed, not preserved, “forgotten”, as a sine qua non condition for the possibility of preserving, processing, and making meaningful use of the tiny and most valuable part of the data created.

The second crucial point is the level on which balancing is, or should be, taking place. Data preservation, and especially long-term archiving, should seek a balance between two positions: on the one hand, the constant, unchanging protection of certain basic values and layers of an individual’s rights, personality, and privacy; on the other, the phenomenon characterised as early as 1968 by the German Federal Court of Justice, which defined post-mortem protection by means of direct proportionality: the need to protect the rights of the deceased “disappears as the memory of the deceased fades”.Footnote 84 Post-mortem personality protection is not, according to this judgement—which foreshadowed the subsequent case law of the German courts up to the present day—“timeless”. In other words: just as the memory of the deceased gradually fades, the protection of the personal data, personality, and privacy of the dead should weaken and diminish proportionally. This involves the two abovementioned processes: the disappearance or transformation of personal data sensitivity and the weakening of post-mortem protection of the personality rights of the deceased. A specific example illustrating the “disappearing sensitivity” of data in records and archival materials, or the transformation of the nature of that sensitivity, is the history of the so-called fichiers juifs in France, which I have discussed in detail in Sect. 7.3 of Chap. 7. It is therefore necessary to consider the “ageing” of archives and data relating to people whose deaths recede ever further into the past. This process, and post-mortem personality protection in general, are discussed in Chaps. 2, 3 and 4.

Let us apply this balancing act to a specific example from the history of European civilisation: information about the intimate life of the Roman emperor and Czech king Charles IV, who died almost 650 years ago, should be subject to the same protection of the most intimate privacy sphere as the same categories of data about, say, Václav Havel, a dissident who fought the communist regime in former Czechoslovakia and the first president of the state freed from communist-Soviet control, who died in 2011. “Hacking” either would result in exactly the same tabloid-like gossip, whether it concerns a long-dead emperor or a recently deceased president; it is still a breach of the protection of the most intimate privacy sphere, whether the data are 500 years or one day old. This protection should never be broken. By contrast, in the case of other data currently classified as sensitive personal data, such as health or ethnic origin, the more than 600-year-old information relating to Charles IV is not equal in sensitivity to the same information relating to Václav Havel. In the case of philosophical and religious beliefs, on the other hand, the sensitivity of the information about these two public figures is practically no different, and in both cases it is an area open to the public.

Data minimisation and storage limitation should take account of this context, which applies to both phases of the “life” of data: during their “active life”,Footnote 85 whether in the context of records management or of specific individuals, as well as in their post-archiving phase. Indeed, an essential moment in the process of data minimisation and storage limitation occurs precisely at the borderline, at the transition between “active life” and “passivation”, during the retention period and archival appraisal, in which the largest part of records and data is destroyed and the remaining minority, as I demonstrated in the first part of this text, is transferred to the archive for permanent archiving.

The possibility of granting consent to the preservation and archiving of personal and sensitive personal data repeatedly emerges as one of the tools for finding balance between the need and right to remember, together with the right to know, on the one hand, and the right to be forgotten and the protection of personality and privacy on the other. Some countries have already taken the first steps towards making the archiving of public records containing sensitive personal data in the public interest subject to the consent of the persons concerned. I have demonstrated this using the example of census records in Australia and Ireland and their archiving inside time capsules, with absolute restriction of access to their contents for 99 and 100 years respectively. Only the census records of citizens who gave their explicit consent are archived; at the same time, these people were given the opportunity to include a personal message to future generations. This manifests something I would call the “right to be remembered”, a kind of counterpart to the “right to be forgotten”: a person’s right to leave a certain memory, an imprint in reality, and to have it preserved and cared for in such a way that it remains distinct even after many decades and centuries. The model of archiving census records in Australia and Ireland combines these two rights on two levels: (1) citizens are given the freedom to choose whether their data are preserved; (2) where consent to archiving is given, the right to be forgotten is effectively applied for the first 99 or 100 years respectively, by means of an absolute restriction of access to the data. After this period, the records are opened and the right to be remembered is activated.

The final chapter concludes the whole book by summarising a set of recommendations that I view as suitable for implementation in archival policies on data minimisation and storage limitation in the context of protecting (not only) personality rights, personal data, and privacy of those who have left their traces in archival records.