Galleries, libraries, archives and museums (GLAMs) are striving to retain audience attention to issues related to cultural heritage, by implementing various novel opportunities for audience engagement through technological means online. Although born-digital assets for cultural heritage may have inundated the Internet in some areas, most of the time they are stored in “digital warehouses,” and the questions of the digital ecosystem’s sustainability, meaningful public participation and creative reuse of data still remain. Emerging technologies, such as artificial intelligence (AI), are used to bring born-digital archives to light, aiming to enhance the public’s engagement and participation. At the core of this debate lies both the openness of data and issues of privacy. How open to the public should born-digital archives be? Should everything be open and available online, and what does it take to achieve balance between openness and privacy, especially through AI initiatives? The study is qualitative and builds on the rationale of grounded theory. The role of AI development is critically investigated in relation to opening up born-digital archives online, by considering privacy and ethics issues. Grounded in the context of the author’s PhD research, the paper proposes a human-centred approach to AI development for democratising its development towards fairness and social inclusion, contrary to the stereotypical cliché of blackboxing, allowing space for the plurality of born-digital archives to flourish.
Tension between openness and privacy is nothing new. However, with the evolution of emerging technologies, and particularly artificial intelligence (AI), tensions have rapidly developed between openness, privacy and AI. In particular, questions about the openness and accessibility of cultural born-digital archives concern galleries, libraries, archives and museums (GLAMs), with regards to meaningful public engagement. Indisputably, cultural archives are meant to be used in a creative way, not stored in “digital warehouses” to be accessed and used only by the archivist and other experts. At the same time, preservation, accessibility and, ultimately, usefulness of born-digital assets are crucial for maintaining our collective memory.
The quality of the process for opening up born-digital archives through AI is fundamental in ensuring fairness, social inclusion and transparency, values that are necessary and aligned with what GLAMs advocate for. In most cases, AI currently operates through opacity: the public are generally not aware of how AI’s statistical models and algorithms actually work, yet still use them frequently. When designing strategies for opening up data online, it is crucial to reflect on aspects related to privacy and ethics; however, AI cannot currently be made fully intelligible to the public. Breaking with the stereotypical view of AI as a black box, this work discusses openness and transparency in AI, allowing possibilities for a socially inclusive, participatory culture through AI and tapping into the potential of a human approach to AI development. This article will shed light on aspects of openness and AI in an effort to democratise the process of opening up, concluding with a human-centred approach to AI for fair use. The study is qualitative, and the author has conducted and analysed a series of expert interviews to investigate the research question. In addition, this work builds on the rationale of grounded theory.
This study is part of the POEM (Participatory Memory Practices) project, an Initial Training Network doctoral programme funded by Marie Skłodowska-Curie EU-Horizon 2020 grants. The paper is structured as follows: in Sect. 2 the impact of digital assets in born-digital archives is discussed alongside a brief discussion of AI and its challenges; in Sect. 3 the methodology is presented; in Sect. 4 the research results are presented; in Sect. 5 there is the discussion; and the conclusion follows.
2 What is the impact of born-digital assets on archives?
To begin with, born-digital assets have inundated the Internet in various ways, playing a determining role in the digital ecosystem’s sustainability, on the one hand, and people’s meaningful and creative engagement, on the other. Briefly, “born-digital materials,” as discussed in this article, are defined as all of those assets whose life cycle started within the digital realm. In reference to digital cultural archives, born-digital materials can encompass emails, documentation processes (if automated), 3-D models, GIFs, memes, images and videos, to name a few. With advances in digital media and emerging technologies, such as AI, born-digital assets are growing by leaps and bounds in terms of both their quantity and their quality. With regard to their quantity, born-digital assets are increasing rapidly in number thanks to automation and machine learning techniques. As for quality, these advancements enable more and more innovative and elaborate techniques for enhancing and improving born-digital materials.
To understand and identify the issues that derive from the tension between open data and privacy, this work will briefly reflect on the aspects that have fundamentally changed the archival sector in the digital realm. Jaillant (2019) discusses the history of born-digital records in her work, and so this paper will not delve into that topic here. However, the most profound changes in the digital versus the physical realm are mainly related to the aspects of communication and production: immediacy of access, ease of creating digital assets and costless copying of the assets (Pollock 2018). The last of these is a key factor, as born-digital assets can now be reproduced in a much shorter time than before, usually with no extra costs. At the same time, and as a result of ease of access, many have advocated for the democratisation of the Internet, through open access and open knowledge initiatives.
Open knowledge initiatives have been getting more attention in recent years. The call for a more democratic and equitable Internet has prompted exploration of the potential for openness. These initiatives align well with such values as transparency, fairness, social inclusion and participation, values that are in turn aligned with GLAMs. The open data movement has attempted to address certain topical questions, specifically regarding the accessibility and usability of data and, thereafter, its reusability by the public. Certainly, GLAMs’ vision, to digitise and open up their collections to the public, has been at the core of their digital communications planning. There have been initiatives created by private and public aggregators, such as Google Arts and Culture (2021) and Europeana (2021), respectively, as well as social media that have aided GLAMs in opening up their collections to the wider public.
2.1 What does “open” really mean?
Indisputably, open data have gained ground recently; however, openness does not only mean digitising artworks and making them available online. An essential point of being open is the capability of being reused and remixed (Huggett 2018; Tzouganatou 2021), such that a given asset can then produce open knowledge. Open data sets can be available to the public in many ways, such as through GLAMs’ online portals, an aggregator or an application programming interface (API). The following example helps to illustrate how an open data set could be used by AI and thereafter by the public. Consider the case of a 3-D model of an archaeological site, alongside its information, namely metadata and paradata, which has been attributed a Creative Commons licence. These kinds of data sets could be used by GLAMs as training data for tagging purposes to improve AI models. At the same time, this practice could enhance GLAMs’ discoverability for the purposes of user engagement, intelligent search of text or image recognition. According to the definition provided by the Open Knowledge Foundation (2021), “‘open knowledge’ is any content, information or data that people are free to use, re-use and redistribute—without any legal, technological or social restriction.” Further, the close link between open knowledge and open data is explained as follows: “Open data are the building blocks of open knowledge. Open knowledge is what open data becomes when it’s useful, usable and used” (Open Knowledge Foundation 2021).
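The 3-D model example can be made concrete with a minimal sketch. It assumes a hypothetical record structure (`Asset`) and a hypothetical shortlist of licence identifiers treated as open (`OPEN_LICENCES`); neither comes from the study, and a real GLAM workflow would follow its own rights policy. The sketch only shows how licence, metadata and paradata together could decide whether an asset is a candidate for reuse as training data.

```python
from dataclasses import dataclass, field

# Assumption: licence identifiers treated as "open" for this illustration only.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


@dataclass
class Asset:
    """Hypothetical record for a born-digital asset."""
    title: str
    licence: str
    metadata: dict = field(default_factory=dict)  # describes the asset itself
    paradata: dict = field(default_factory=dict)  # documents how it was produced


def reusable_for_training(asset: Asset) -> bool:
    """True when the licence is open and both metadata and paradata are present."""
    return (
        asset.licence in OPEN_LICENCES
        and bool(asset.metadata)
        and bool(asset.paradata)
    )


assets = [
    Asset(
        "3-D model of an archaeological site",
        "CC-BY-4.0",
        {"type": "3-D model"},
        {"method": "photogrammetry"},
    ),
    Asset("Scanned letter", "All rights reserved", {"type": "image"}, {}),
]

# Keep only the assets that pass the illustrative check.
training_candidates = [a.title for a in assets if reusable_for_training(a)]
print(training_candidates)
```

A check of this kind addresses only the legal and documentary side of openness; whether the data then become useful and used remains a separate question.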
An important illustration of this direction is the implementation of the FAIR guiding principles (Wilkinson et al. 2016). The acronym refers to a set of guiding principles that aim to make data Findable, Accessible, Interoperable and Reusable. Moreover, initiatives like Creative Commons licences and the Traditional Knowledge (TK) labels are striving to enable fairer attribution and are making a huge positive impact on opening up knowledge. Ethics are inextricably linked with decisions about implementing practices of opening up knowledge, particularly when this is done via digital means that generate opacity through emerging technologies. Questions about why to open up, whom to open up to, the level of openness and the quality of the process of opening up itself are critical.
2.2 AI and its challenges
AI is described as a “fast evolving family of technologies that can bring a wide array of economic and societal benefits across the entire spectrum of industries and social activities” (European Commission 2021a). It has indisputably disrupted society and transformed workflows, with pervasiveness being a key quality (Furman and Seamans 2019). Digital platforms, or the so-called platformization of the web (Helmond 2015), have permeated the Internet, shifting from connecting to predicting (Mackenzie 2018). AI has penetrated every sector, including the heritage sector, in many forms, such as intelligent searching of text, image recognition, digital storytelling experiences and conversational interfaces, i.e. chatbots (Tzouganatou 2018). AI is indeed very promising but could also be “potentially harmful” (Madianou 2021, 865); hence, being cautious and critical of it is essential. People should make good use of AI and implement it to serve their needs, and not vice versa. The question that arises is “how can people understand the way automation actually works?” Answering it would mitigate the problem of AI being viewed as a black box and make the use of AI more open and ethical.
Automation and algorithms usually operate under opacity, and that is why people usually refer to AI as black box technology (Pasquale 2015): because it is challenging to decode and to understand. A black box “is an object, piece of software, or system in which the user can direct input but cannot examine or verify the processes that occur before the produced output” (Dennis 2021, 108). The opaque way that a black box operates has been widely discussed by Latour (1999), and it is not aligned with the notions of openness, as discussed above. Therefore, ethical issues arise from the use of digital tools operating under non-transparent practices, which are consequently unintelligible to their users (Dennis 2020, 215). Emerging technologies are linked with opaque practices, and the notions of “understanding” and the “ethical use” of AI are intertwined with initiatives like eXplainable AI (XAI), which recognises how essential it is to focus on how AI actually functions so that it becomes comprehensible enough for non-experts (Barredo Arrieta et al. 2020). Moreover, the concept of “responsible AI” has emerged from the XAI initiative; it promotes “a paradigm that imposes a series of AI principles to be met when implementing AI models in practice, including fairness, transparency, and privacy” (Barredo Arrieta et al. 2020, 46).
On the other hand, some basic yet crucial problems with machine learning, which limit actual intelligence, should be acknowledged. The history of intelligent machines began in the 1950s when the British mathematician and pioneer in computer science, Alan Turing, contemplated the notion that machines can think (Turing 1950). When John McCarthy coined the term “artificial intelligence” in 1955, he questioned whether machines can think intelligently. This question still has not been answered successfully. Yet, what does “intelligence” mean? The origin of the word is the Latin intelligere, which means “to understand.” Currently, machine learning is not actually capable of adapting to and understanding (semi-)complex issues like matters of sensitive data and privacy. It can appear intelligent, yet it is not. This is because machine learning cannot successfully incorporate causality, a long-standing discussion in the field of AI (Eberhardt 2007). Indeed, there have been advances in modelling predictions, yet there is much room for improvement when it comes to causality. Predicting something cannot (necessarily) be equated with understanding it or realising the “whys” and “hows.” So, when a machine predicts a user’s action or move, it follows a path that it has been trained to follow. It can predict, i.e. follow a specific path, but not actually understand why. At the core of understanding and analysis lies causality; this is what humans perform when analysing facts and making relevant choices accordingly.
Addressing the lack of causality and developing actually intelligent systems are not easy steps to perform. This process undeniably requires continuous training of the machines. However, one of the main problems that AI and machine learning encounter is incomplete data sets (Little and Rubin 2020). Machine learning needs training to “learn” and perform certain complex activities. Nevertheless, collecting representative data outside of lab conditions for the required training also involves some risks and might not ultimately lead to complete data sets. This is because society at large is biased, and the collected data will reflect that (Fricke 2020). In many cases, AI and its algorithms are blamed for being biased; however, any bias that exists does not result from the algorithms operating independently in a biased way. It arises because humans trained them to operate in such a way. Therefore, it is not that the algorithms are inherently biased, but that the way society functions, and in particular the way humans carry out the training, is reflected in the collected data.
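The two problems named above, incompleteness and skewed representation, can at least be made visible before any training takes place. The following is a minimal sketch, assuming a hypothetical list of records and hypothetical field names (`region`, `description`); it is not a method used in this study. It reports the share of each group in the data and counts missing values per field.

```python
from collections import Counter


def audit(records, group_key, required_fields):
    """Return group shares and per-field counts of missing values."""
    total = len(records)
    # Records without the grouping field are counted under "unknown".
    groups = Counter(r.get(group_key, "unknown") for r in records)
    shares = {group: count / total for group, count in groups.items()}
    # A value counts as missing when it is absent, None or an empty string.
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, ""))
        for f in required_fields
    }
    return shares, missing


# Hypothetical sample: three records from one region, one from another.
records = [
    {"region": "north", "description": "letter"},
    {"region": "north", "description": "photograph"},
    {"region": "north", "description": ""},
    {"region": "south", "description": "3-D model"},
]

shares, missing = audit(records, "region", ["region", "description"])
print(shares)
print(missing)
```

An audit of this kind does not remove bias; it only makes the composition of the data set visible, so that the people responsible for training can decide how to respond.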
As discussed above, AI currently operates mostly under opacity, which makes it hard for people to understand how it works and raises ethical considerations. The question that this article addresses is how to open up this process for more ethical public participation. This also involves supporting the potential of (human) intelligence while protecting privacy, which could align with the values of openness, transparency and fairness, values that GLAMs advocate for.
3 Methodology
This study was conducted in the context of the author’s PhD research project, which investigates the conditions of openness of cultural data in the digital ecosystem while producing a socioeconomic model capable of fostering public participation in a fair and equitable way. The methodology was qualitative and based on the analysis of interviews conducted during the study’s fieldwork, in February 2020, at the Finnish openGLAM chapter in Helsinki, part of Open Knowledge Finland. This study is based on the analysis of a subset of six out of a total of 21 semi-structured interviews with GLAM professionals, social innovators, service designers and open knowledge activists, in which the interviewees discussed and highlighted the importance of AI and its entanglements with openness, privacy and archives. The interviewees are referred to in this article by their real names. All interviewees gave their consent to this at the beginning of the interviews and signed a consent form permitting the use of the data for research purposes. For the purposes of this article, the author used the six interviews where AI emerged as a topic, alongside memos and codes, as part of the analytical process of the interviews. The selected participants have expertise in the intersection of GLAMs, digital technologies, the opening up of knowledge and the promotion of fairer practices in the digital economy. Furthermore, apart from their interdisciplinary focus, the participants had working and/or research experience in the field of cultural archives, openGLAM and/or emerging technologies. As such, the sample reflects the interdisciplinary nature of the research topic, and the tensions between openness, privacy and AI, in the GLAM sector. In total, the experts answered ten open-ended questions, with space for follow-up questions to elucidate answers when needed. The length of the interviews ranged from one hour to two hours and 14 minutes.
The questions focused on the confluence of open knowledge, GLAMs and the digital economy, particularly regarding the impact of the current digital ecosystem and its adopted emerging technologies, as well as practices, on future memory making.
Moreover, the study applied grounded theory (GT), which derived from the need for an open-ended and emergent method (Charmaz 2008) as a practice for grasping the fluidity of the research field. GT is a structured and inductive method for practising qualitative research that was introduced in 1967 by the sociologists Glaser and Strauss, who set the foundations of this method (Glaser and Strauss 1967). The process of GT comprises multiple stages, allowing the researcher to be flexible, while still working in a structured and systematic way, and to build the theory through a multisensory lens and versatile modes. This study applied GT through an exploration of emerging patterns in the collected data and ultimately rigorously developed a theory grounded in the research data. Once the interviews were conducted, they were transcribed and then the stage of coding and data analysis took place. One of the most crucial steps of GT is memo writing, which could be seen as the bridging of raw data to theory (Lempert 2007, 249), aiding the researcher in starting to construct the narrative. Memo writing was practised intensively during the interviews and also during the coding phase. In the course of the coding, software for qualitative analysis, MAXQDA (2021), was used to identify relevant thematic categories that emerged from the collected material. The initial coding was based on open coding, which is a line-by-line form of coding. Also, the practice of ‘in vivo codes’ was adopted, to reflect the actual words used by the research participants; at this stage, codes were derived from the data (Charmaz 2006). After the first efforts of open coding, the thematic categories were descriptive and to some degree repetitive, which is not unusual in the GT process (Holton 2007, 276), and hence an initial restructuring of codes took place. During the coding stages, the codes were restructured three times, reflecting the data and following the process towards ‘fine coding’.
Initially 18 themes emerged; this number was reduced to 10 by the final stage.
3.1 Results and analysis: a human-centred approach to AI
The thematic categories derived during the coding process comprised ten themes: (1) the importance of transparent practices in AI development; (2) human intervention and human agency in AI development; (3) privacy issues in born-digital archives; (4) the openness of data matters; (5) risks of opening up data online; (6) socially inclusive practices and the role of human and non-human actors; (7) AI’s impact on opening up born-digital materials online; (8) ownership issues in the digital era; (9) human intervention and ethics in AI development; and (10) human-centred technology.
3.2 AI transparency and inclusivity
All of the interviewees emphasised that the values of transparency, fairness and privacy are significant elements in how AI should perform, being at the service of humans. First, four out of the six interviewees suggested that the public is not actually aware of how AI technology works; specifically, the results that a user gets through a search engine could be limited by algorithms, often through surveillance mechanisms and related practices. Indicative of this is a statement by Susanna Kokkinen, head of Aalto University Records Management. She noted:
As we know, Google has algorithms that actually find you only the content that you search most, which means that you will never actually find what you are looking for, unless Google wants so. (Kokkinen, interview by author, 2020).
This aspect refers to the phenomenon of surveillance capitalism (Zuboff 2019) which has penetrated many aspects of digital life, greatly affecting people’s actions and thinking. It is particularly prominent for commercial platforms such as Google with a business model of obfuscation, where little is known of how AI and its algorithms actually operate and run.
The second aspect of non-transparency, as highlighted by all interviewees, is linked to the contributions and participation of users, leading to the potential exclusion of people. Opaque AI practices might hinder the public’s participation in digital cultural archives. This limiting of participation can result from the way AI functions and operates; for instance, AI can exclude people with no IT background.
In contrast, transparent practices and open-source coding could allow people with limited IT skills to understand how AI operates, while allowing space for using it in such a way as to serve their needs. In turn, inclusivity could also be fostered. Even though AI functions in an automated way, a lot of people are involved in the process, such as AI architects and developers. However, the people who design and build these systems are not usually the end users. According to five interviewees, this issue is particularly evident in the cultural heritage sector, where most people affiliated with GLAMs have a humanities background and yet are required to make use of AI programmes and experiences. Moreover, GLAMs’ audiences should not be expected or required to have advanced digital literacy skills to comprehend how emerging technologies operate.
3.3 AI and human agency
Another important element that was highlighted by five out of the six interviewees is the issue of human agency in AI development. This is illustrated in the following statement by Minna Ruckenstein, associate professor at the Consumer Society Research Centre and the Helsinki Centre for Digital Humanities at the University of Helsinki:
Algorithms have no power by themselves also because we are part of the algorithms. The way we use these machines, decides what kind of results come from these. So we are kind of part of this feedback loop. (Ruckenstein, interview by author, 2020)
Frequently, it is said that algorithms and AI do things, connoting an abstract quality inherently linked with these emerging technologies and often portraying algorithms as powerful agents (Ziewitz 2016). However, behind the abstractness of the algorithms are indeed people who design and develop them. Algorithms have no power by themselves; someone designs them, trains them and ultimately lets them run. Humans are part of the processes of developing, and usually maintaining as well, the algorithms and should know how to use these machines, while deciding what kind of results come from them. Therefore, humans are part of this process. In this respect, rather than talking about algorithmic power as if people were not involved, it is important to speak of human agency, which is also something that is “developed,” in a sense—for better or for worse—like the algorithms themselves.
It is crucial to realise that people are involved in all of the processes of designing, developing, evaluating and maintaining AI and that the “algorithmic power is inherently only ever partial” (Ferrari and Graham 2021, 13). The aspect of human agency is very prominent in the design and development of a technological system and should not be neglected or underestimated by any means. However, adding to the discussion of forming a potentially productive human–machine symbiosis (Cooley 1996; Gill 2019), and considering human and digital agency (Stapleton et al. 2020) and its nuances, it is also crucial, and perhaps safer, to acknowledge a degree of digital agency of machines (Huggett 2021, 422) and not to diminish digital or technological agency. In that respect, a human-centred approach to AI could aid a socially inclusive understanding and, hence, reinforce the ethical use of AI.
Moreover, all the interviewees agreed that there is no doubt that AI should be designed for humans and that humans should make good use of the technology to serve their needs either individually or collectively, and not vice versa. Cooley (1987) was one of the first to set the foundations of a human-centred systems and technology movement, putting people first, while critically reflecting on the limitations of intelligent machines and automations (Cooley 2018). Moreover, recently, on April 21, 2021, the European Commission (2021a) published its proposal for a regulation “laying down harmonised rules on artificial intelligence” (the Artificial Intelligence Act), in which it emphasises the importance of a human-centred AI: “Rules for AI available in the Union market or otherwise affecting people in the Union should, therefore, be human centric, so that people can trust that the technology is used in a way that is safe and compliant with the law, including the respect of fundamental rights.” At the core of this paper lies a human-centric approach to emerging technologies, starting from a fundamentally anthropocentric point: that technology should enhance or augment human skills but not exploit them or undermine them and should be used for social benefit (Cooley 1989, 2018; Gill 2016). In that respect, when discussing “protecting” an archive, data or material in the digital realm, particularly with regards to privacy issues, it is not merely about preserving the material per se. Indeed, it might be the case that the material object needs to be digitally “preserved,” yet in most cases it is the people and their stories behind that object that need to be private, to be protected. Therefore, the strategies needed for investigating and maintaining a balance between openness, privacy and AI should be focused and designed primarily around the principles and values of respecting humans, the environment and the society. 
For these reasons, this article argues for a human-centred approach to AI for regulating issues of openness and privacy.
As discussed above, at present, machine learning does not have the capability of being actually intelligent and of understanding when, or why, certain data must be open and when it should be kept private. Nevertheless, envisioning intelligent systems that would be capable of adapting to and understanding their environment is of the utmost importance. According to the analysis of the research results, what is required is an automation that becomes more “conscious” of the different nuances of born-digital archives, particularly when it comes to privacy, which is an area where issues can be very complex. Reflecting the plurality and diversity of born-digital archives could potentially be realised through advancing the intelligence of AI systems by shifting the focus to humans and allowing “democratic interventions” (Feenberg 2017, 646), enabling and encouraging participation as well as collaboration in the design process. Taking all of the above into consideration, this work proposes three fundamental principles for envisioning and practising a human-centred approach to AI: AI transparency and AI inclusivity (which are linked) and the importance of human agency.
4 Opening up born-digital archives
From the analysis of the collected data, it is evident that the purpose of opening up knowledge is not merely to provide access; access is only one preliminary part of the process. Using the word ‘process’ emphasises the elements of continuity and fluidity: opening up knowledge online is not a static state, nor is it the end goal or the final outcome. It requires a holistic approach, constant negotiations and strategies for its successful sustainability. It is a process to maintain, sustain and meaningfully engage the public and make knowledge useful to them. Hence, giving access to born-digital assets is perhaps one step. However, it is argued in this paper that having or acquiring access to digital cultural materials does not automatically or necessarily make them ‘open.’ Openness has many qualities, yet at its core lies the possibility for change, and, therefore, it offers the potential for public participation and co-creation: the possibility for someone to take or receive an asset in a specific shape or form and then elaborate on, enrich or alter it accordingly. Moreover, there are many layers and complexities throughout the process. One layer is that the data are to be open; nonetheless, one of the most challenging parts is acknowledging the nuances of openness itself and its different types, including aspects concerning privacy and ethics. Another crucial step follows: investigating how the data could be used and actually become useful. It is only when the data are open that someone can produce open knowledge, yet data should not be conflated with knowledge. There is always the question of whether open data actually produce open knowledge. The potential exists, but open data do not automatically produce open knowledge. A process needs to take place first.
What is required is not necessarily more open data but more ‘useful’ data, which is another complex part of AI. The human-centred approach and principles of AI development, presented in the previous section, could aid in making data more useful rather than just open, by amplifying the much-needed fairness and transparency in AI development for born-digital archives, potentially leading to meaningful public engagement.
4.1 Level of openness: privacy, human and digital agency
Indisputably, there are different levels of openness with regard to born-digital cultural archives. Not everything can be open, for example, owing to legal restrictions such as copyright; on the other hand, not everything should be open, because of privacy and ethical issues.
According to the General Data Protection Regulation (GDPR), personal data is “any information about a living individual which is capable of identifying that individual” (EU Commission 2016), whereas sensitive personal data is defined as any information relating to an individual’s “racial or ethnic origin, political opinions, religious or philosophical beliefs; trade-union membership; genetic data, biometric data processed solely to identify a human being; health-related data; data concerning a person’s sex life or sexual orientation” (European Commission 2021b).
The GDPR refers to sensitive personal data as “special categories of personal data” (EU Commission 2016). Personal data more generally could include one’s name, identification number, location data or an online identifier such as an IP address. It could also include other information (physical, genetic or cultural) that leads to an individual being identified. More care needs to be taken with sensitive personal data such as health data, religious beliefs and so on. Personal data, including sensitive data, should be protected, and it is argued that the ideology of “open everything” should be rethought in more pragmatic ways. Starting from the premise that open data does not equal useful data, what must be identified are the kinds of data that should and can be available online and those that can become useful and used. Different levels of openness would reflect the different needs of the born-digital materials, protecting any sensitive details that are present.
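The idea of different levels of openness can be sketched in code. The following minimal example assumes a hypothetical three-tier scheme (`OPEN`, `RESTRICTED`, `CLOSED`) and hypothetical field names loosely modelled on the GDPR definitions quoted above; the tiers and field lists are illustrative assumptions, not categories proposed by this study or a legal test. In line with a human-centred approach, the function only suggests a level that a person would then review.

```python
from enum import Enum


class Openness(Enum):
    """Hypothetical tiers of openness, for illustration only."""
    OPEN = "open"
    RESTRICTED = "restricted"
    CLOSED = "closed"


# Assumption: field names standing in for personal data identifiers.
PERSONAL_FIELDS = {"name", "identification_number", "location", "ip_address"}

# Assumption: field names standing in for special categories of personal data.
SPECIAL_CATEGORY_FIELDS = {
    "health",
    "religious_beliefs",
    "political_opinions",
    "ethnic_origin",
    "sexual_orientation",
    "trade_union_membership",
    "biometric",
    "genetic",
}


def suggest_level(record: dict) -> Openness:
    """Suggest a level of openness; a human reviewer makes the final decision."""
    # Only fields that actually carry a value are considered.
    present = {key for key, value in record.items() if value not in (None, "")}
    if present & SPECIAL_CATEGORY_FIELDS:
        return Openness.CLOSED
    if present & PERSONAL_FIELDS:
        return Openness.RESTRICTED
    return Openness.OPEN


meme = {"title": "Archive meme", "format": "GIF"}
email = {"title": "Donor email", "name": "A. Donor", "ip_address": "192.0.2.1"}
diary = {"title": "Digital diary", "name": "B. Writer", "health": "diagnosis noted"}

print(suggest_level(meme), suggest_level(email), suggest_level(diary))
```

A rule of this kind cannot grasp context, such as stories about people that contain no flagged field, which is precisely where human judgement remains necessary.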
Born-digital archives are impressively diverse. Understanding their needs and exploring how to address the level-of-openness issue may require a holistic and analytic approach. This means that sociotechnical aspects of born-digital assets should be considered. Digital archives do not merely live online as neutral digital entities. They are part of the digital ecosystem that is constantly stretching, and they encompass its entanglements (Taffel 2019) and emerging complexities. Legal and economic issues are crucial elements of the ecosystem, capable of influencing and impacting born-digital archives in relation to their digital life cycle (Huggett 2018). The phases of the digital life cycle (Digital Curation Centre 2021) involve the conception, production, accessibility, dissemination, reusability and sustainability of the assets. However, although these phases can be linear, they can also be messy. This is due to emerging technologies and automatic systems, where steps and phases of more “traditional” processes can become obsolete and in the long run be eliminated.
Issues of ownership are bound to legal aspects. Who creates an asset? Who owns it? These are important questions in beginning to investigate how to address and reflect on the levels of openness through automation. For example, copyright constraints are capable of restricting certain actions, ranging from access to digital archives to their reuse. However, in some cases, ownership issues can become quite unclear in the digital realm. When an asset is created by humans it may be clear who holds ownership over it, but who holds ownership over an asset that was created by automation and algorithms? All of these issues are important to understanding the complexities stemming from the ecosystem of the emerging digital reality. Addressing and reflecting on the different levels of openness through AI is a complex task; however, with the democratisation of AI (namely making emerging technologies and automation more inclusive, as discussed in the previous section), this potential could be tapped. A human-centred approach to AI would reflect the pluralism of digital cultural archives and the different levels of openness that are required.
On the basis of the analysis of the interview record, it can be suggested that human autonomy and agency (Onsrud and Campbell 2020, 236) could operate at a higher level in the born-digital archival context than digital agency. This is because of the complexities and challenges that born-digital assets bear, as discussed above. The different layers and levels of openness should be considered, alongside the plurality and diversity of born-digital archives and also the fact that privacy and preservation are core qualities for the ethical reuse of digital assets. Human intervention is required and needs to respond to the degree of importance of each level of openness; more care needs to be taken when personal data are entangled than when public domain data are involved.
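The principle that human intervention should scale with the sensitivity of the material can be sketched as a simple routing rule. The Python example below is a hypothetical illustration under stated assumptions, not a system described by the interviewees: the tier names, the `Decision` record and the `route` function are invented for this sketch, and the number of reviewers per tier is an arbitrary choice.

```python
from dataclasses import dataclass

# Hypothetical sensitivity tiers, ordered from least to most sensitive.
TIERS = ("public_domain", "copyrighted", "personal", "special_category")


@dataclass
class Decision:
    asset_id: str
    proposed_open: bool  # what an automated tool suggests
    needs_human: bool    # whether an archivist must confirm the suggestion
    reviewers: int       # how many people should sign off (assumed policy)


def route(asset_id, tier, proposed_open):
    """Scale human oversight with the sensitivity tier of an asset."""
    rank = TIERS.index(tier)  # raises ValueError for an unknown tier
    if rank == 0:
        # Public domain material: the automated suggestion is accepted as is.
        return Decision(asset_id, proposed_open, needs_human=False, reviewers=0)
    if rank >= 2:
        # Assets involving personal data are never opened automatically.
        proposed_open = False
    return Decision(asset_id, proposed_open, needs_human=True, reviewers=rank)
```

In this sketch, `route("a2", "special_category", True)` overrides the automated suggestion and asks for three reviewers, whereas a public domain asset passes through without review. The design choice is that automation only proposes, while human agency decides wherever personal data are entangled.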
The tension between openness and privacy at the crossroads of AI and born-digital archives is complex to navigate. Currently, AI and automation are not able to reflect the diversity and plurality of born-digital archives or to meet the needs of the archives in terms of different levels of openness, as discussed above. Taking all of the above into consideration, a human-centred approach could foster a fruitful relationship between open data, privacy and AI, in pursuit of a balance that also allows space for ethical public participation. Ultimately, a human-centred approach to AI, through amplifying human agency, could refine the level of openness and reflect the plurality of digital cultural archives, as well as meet the needs of the public. Opaque and exclusive AI practices are not aligned with openness and inclusivity, values that GLAMs advocate. Moreover, a human-centred approach could support social inclusion by taking into consideration the needs and wishes of diverse people and society at large. AI transparency could lead to inclusivity, as a way of making AI fairer and more democratic, by tapping into the potential to regulate different levels of privacy and openness. The principles this article proposes would help to narrow the gaps between AI technology, non-transparent practices, algorithms and the public, in the light of born-digital archives.
Availability of data and material (data transparency)
The requirements are complied with.
Whereas metadata is data about the data (for example, the creator of an asset and the time and place of its creation), paradata refers to information regarding the documentation processes. It is the “Documentation of the evaluative, analytical, deductive, interpretative and creative decisions made in the course of computer-based visualisation [that] should be disseminated in such a way that the relationship between research sources, implicit knowledge, explicit reasoning, and visualisation-based outcomes can be understood” (Denard 2012, 66).
The term “digital life cycle” refers to the phases of the curation life cycle model as adopted by the Digital Curation Centre (DCC) (2021).
Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, Garcia S, Gil-Lopez S, Molina D, Benjamins R, Chatila R, Herrera F (2020) Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012
Charmaz K (2006) Constructing grounded theory: a practical guide through qualitative analysis. SAGE Publications, London, Thousand Oaks, Calif
Charmaz K (2008) Grounded theory as an emergent method. In: Hesse-Biber SN, Leavy P (eds) Handbook of emergent methods. The Guilford Press, New York, pp 155–170
Cooley M (1987) Architect or Bee? the human price of technology. The Hogarth Press, London
Cooley M (1989) Human-centred Systems. In: Rosenbrock H (ed) Designing human-centred technology: a cross-disciplinary project in computer-aided manufacturing. Springer, London, pp 133–143
Cooley M (1996) On Human-Machine Symbiosis. In: Gill KS (ed) Human machine symbiosis: the foundations of human-centred systems design. Springer, London, pp 69–100
Cooley M (2018) Delinquent Genius: the strange affair of man and his technology. Spokesman Books, Nottingham
Denard H (2012) A new introduction to the London Charter. In: Bentkowska-Kafel A, Denard H (eds) Paradata and transparency in virtual heritage. Routledge, pp 57–71
Dennis LM (2020) Digital archaeological ethics: successes and failures in disciplinary attention. J Comput Appl Archaeol 3(1):210–218
Dennis LM (2021) Getting it right and getting it wrong in digital archaeological ethics. In: Champion E (ed) Virtual heritage: a guide. Ubiquity Press, London
Digital Curation Centre (2021) The DCC Curation Lifecycle Model. https://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf. Accessed 27 Aug 2021
Eberhardt F (2007) Causation and intervention. Doctoral dissertation, Carnegie Mellon University
Europeana (2021) Europeana. https://www.europeana.eu/. Accessed 16 Aug 2021
European Commission (2021a) Regulation of the European Parliament and of the Council: Laying Down Harmonised Rules on Artificial Intelligence. https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence-artificial-intelligence. Accessed 21 Apr 2021
European Commission (2021b) What personal data is considered sensitive? https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/legal-grounds-processing-data/sensitive-data/what-personal-data-considered-sensitive_en. Accessed 28 Apr 2021
European Commission (ed) (2016) General Data Protection Regulation: Regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)
Feenberg A (2017) A critical theory of technology. In: Felt U, Fouché R, Miller CA, Smith-Doerr L (eds) Handbook of science and technology studies. MIT Press, pp 635–663
Ferrari F, Graham M (2021) Fissures in algorithmic power: platforms, code, and contestation. Cult Stud:1–19
Fricke C (2020) Missing Fairness: The Discriminatory Effect of Missing Values in Datasets on Fairness in Machine Learning. Master’s Programme in Computer, Communication and Information Sciences, Aalto University
Furman J, Seamans R (2019) AI and the economy. Innov Policy Econ 19:161–191. https://doi.org/10.1086/699936
Gill KS (2016) Architect or Bee? Mike Cooley: the human spirit. AI & Soc 31:435–437. https://doi.org/10.1007/s00146-016-0675-2
Gill KS (2019) Holons on the Horizon: re-understanding automation and control. IFAC-PapersOnLine 52:556–561. https://doi.org/10.1016/j.ifacol.2019.12.605
Glaser BG, Strauss AL (1967) The discovery of grounded theory. Aldine de Gruyter, New York
Google Arts & Culture (2021) Google Arts & Culture. https://artsandculture.google.com/. Accessed 16 Aug 2021
Helmond A (2015) The platformization of the web: making web data platform ready. Soc Media Soc. https://doi.org/10.1177/2056305115603080
Holton JA (2007) The SAGE Handbook of Grounded Theory. In: SAGE Publications Ltd
Huggett J (2018) Reuse remix recycle. Adv Archaeol Pract 6(2):93–104. https://doi.org/10.1017/AAP.2018.1
Huggett J (2021) Algorithmic agency and autonomy in archaeological practice. Open Archaeol 7:417–434. https://doi.org/10.1515/opar-2020-0136
Jaillant L (2019) After the digital revolution: working with emails and born-digital records in literary and publishers’ archives. Arch Manuscr 47:285–304. https://doi.org/10.1080/01576895.2019.1640555
Latour B (1999) Pandora’s hope: essays on the reality of science studies. Harvard University Press, Cambridge
Lempert LB (2007) The SAGE Handbook of Grounded Theory. In: SAGE Publications Ltd
Little RJA, Rubin DB (2020) Statistical analysis with missing data. Wiley series in probability and statistics. Wiley, Hoboken
Mackenzie A (2018) From API to AI: platforms and their opacities. Inf Commun Soc 22:1989–2006. https://doi.org/10.1080/1369118x.2018.1476569
Madianou M (2021) Nonhuman humanitarianism: when ’AI for good’ can be harmful. Inf Commun Soc 24:850–868. https://doi.org/10.1080/1369118X.2021.1909100
MAXQDA (2021) MAXQDA The Art of Data Analysis. https://www.maxqda.com/. Accessed 16 Aug 2021
Onsrud H, Campbell J (2020) Being human in an algorithmically controlled World. Int J Human Arts Comput 14:235–252
Open Knowledge Foundation (2021) What is open? https://okfn.org/opendata/. Accessed 10 Feb 2021
Pasquale F (2015) The black box society. Harvard University Press, Cambridge
Pollock R (2018) The open revolution. A/E/T Press, London
Stapleton L, O’Neill B, McInerney P (2020) Intelligent control and automation systems: Mike Cooley’s Vision of Socially Responsible, Human-Centred Technology
Taffel S (2019) Digital media ecologies: entanglements of content, code and hardware. Bloomsbury Publishing, New York
Turing AM (1950) Computing machinery and intelligence. Mind LIX(236):433–460. https://doi.org/10.1093/mind/LIX.236.433
Tzouganatou A (2018) Can Heritage Bots Thrive? toward future engagement in cultural heritage. Adv Archaeol Pract 6:377–383. https://doi.org/10.1017/aap.2018.32
Tzouganatou A (2021) On complexity of GLAMs’ digital ecosystem: APIs as change makers for opening up knowledge. In: Rauterberg M (ed) Culture and computing design thinking and cultural computing. Springer International Publishing, Cham, pp 348–359
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, Hoen PAC’t, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18
Ziewitz M (2016) Governing algorithms: myth, mess, and methods. Sci Technol Human Values 41:3–16. https://doi.org/10.1177/0162243915608948
Zuboff S (2019) The age of surveillance capitalism: the fight for a human future at the new frontier of power. PublicAffairs, New York
Open Access funding enabled and organized by Projekt DEAL. Additionally, the research work has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Grant agreement No. 764859.
Conflict of interest
No potential conflict of interest is reported by the author.
Ethics approval can be found attached.
Consent to participate
Template of the informed consent can be found attached.
Consent for publication
Cite this article
Tzouganatou, A. Openness and privacy in born-digital archives: reflecting the role of AI development. AI & Soc 37, 991–999 (2022). https://doi.org/10.1007/s00146-021-01361-3
- Artificial intelligence
- Human-centred AI