1 Introduction

The rise of Computational Social Science (hereafter CSS) over the last decades has become intertwined with the rise of big data, and more recently with that of Artificial Intelligence (AI) and the automated processing of data on an enormous scale. New applications and uses of data arose with the advent of new data sources over the 2000s, and especially with the spread of mobile phones and mobile connectivity to most countries of the world. The following decade has borne out many of the predictions made when big data was first conceptualised—that it would make populations, activities and behaviour visible in ways that had not previously been possible and that this would have huge impacts on both analysis and intervention across a range of fields, from urban policy to epidemiology and from international development to humanitarian intervention. This chapter examines the use of CSS in relation to national and international policy issues and asks who benefits, and how, when computational methods and new data sources are used to conduct policy-relevant analysis.

What few commentators on big data foresaw was the extent to which it would represent a private sector revolution (Taylor & Broeders, 2015). The step change in volume, immediacy and power constituted by the new sources of large-scale data did not stem from bureaucratic or academic innovation, but from changes in the commercial world driven by new devices and massive investments in software, hardware and infrastructure. Despite the United Nations’ call in 2014 (United Nations, 2014) to use data for the public good, the potential of the new data sources to make people visible and to inform intervention has been realised primarily by commercial firms, with policy as a secondary user of what is still largely commercial data. The proprietary nature of much of the data used in CSS is important because it determines what information becomes available, to whom, and what kind of analysis and intervention it can inform. It also, as predicted more than a decade ago, creates hierarchies amongst researchers and institutions, since access to data is a privilege to be negotiated (boyd & Crawford, 2012). This has meant that the CSS field has so far been populated mainly by high-status researchers from well-funded institutions in high-income countries, who also tend to be male, white and connected to the well-funded academic disciplines of computer science, quantitative sociology and statistics or to policy interests that tend towards security, population management and economic development.

One example of the increasing hybridisation of data sourcing between commercial actors, international organisations and governments is the ‘Premise’ app. Developed in Silicon Valley, Premise is a crowdsourcing survey app that pays people small amounts to photograph or report features of their surroundings, from cash machines and construction sites to food prices. Initially marketed as a tool for international development agencies to source information remotely, it then became a tool for businesses to assess market possibilities and competition. It next morphed into a way for intelligence services to collect data covertly, with tasks offered such as photographing the locations of Shiite mosques in Kabul (Tau, 2021).

As the case of Premise suggests, data is not neutral. Unless we understand whom it reflects and how it was sourced, there is the potential for harm to what Metcalf and Crawford have termed the ‘downstream subjects of research’ (Metcalf & Crawford, 2016). Understanding data gathered remotely also poses epistemological problems: without domain and local knowledge to convey ground truth (Dalton et al., 2016), not only its analysis but any interventions it informs are likely to be flawed and unreliable. Digitally informed analysis and intervention also raise issues of power and justice, given that powerful interests drive the collection and use of data. Social data is always attached to people. Analysis often obscures that connection, but it remains throughout the lifecycle of bits and bytes, from information to intervention and evaluation.

Despite its usually explicit aim to have effects on people, Computational Social Science is not, however, subject to the kind of review process that is normal for other research on human subjects. One reason may be that it is designed to inform intervention, but it is not so far classified as constituting intervention itself. This places it in a different category to biomedical research, which is governed through an infrastructure for ethical review at the European, governmental and institutional level (EUREC, 2021). It also tends to escape review within academic institutions because CSS usually does not become registered as social scientific research unless a project lead is employed by a social science faculty. Due to their technical demands, many CSS projects originate in computer science, economics or data science departments and institutes. On the European level, CSS projects undergo an ethics check if the principal investigator flags them at the application stage as using personal data—which may not happen in many cases due to the definitional problem outlined by Metcalf and Crawford (2016). If they do go through review by the European Research Council’s ethics committee, they are reviewed for data protection compliance and for classic research ethics issues such as benefit-sharing, but as explored later in this chapter, this may not capture important ethical challenges relating to CSS, particularly where the benefits are defined as relating to policy.

2 Background: Computational Social Science and Data Justice

The central aim of using large-scale data and Computational Social Science methods to inform policy is to positively impact society. This aim, however, comes with no definition of which people should benefit and whether those are the same people who are reflected in the data. The unevenness of the new large-scale data sources, their representativeness and their potential for uneven effects when used in policy, therefore, are central concerns for any researcher or policymaker interested in not doing harm.

Over the 2010s, first the field of critical data studies and, later, the related field of data justice took up these issues methodologically and theoretically. These fields have roots in digital geography, charting how epistemologies of big data (Kitchin, 2014) differ from previous ways of seeing the world through statistics and administrative accounts and how the geography of where and how data is sourced determines whose truth it can tell us (Dalton et al., 2016). They are sceptical of the claims of granularity and representativeness often made about large-scale data, a scepticism also present in the post-colonial strand of critique, which has shown how the datafied representation of populations, cities and movement is always filtered through narratives of entrepreneurialism, innovation and modernity, which shape both the starting point and the uses of such analyses (Couldry & Mejias, 2019; Datta & Odendaal, 2019). Similar critiques can be found in sociological research, which takes issue with the idea that data can ever be neutral or raw (Gitelman, 2013) and which also exposes the underlying ideology of what van Dijck calls ‘dataism’:

a widespread belief in the objective quantification and potential tracking of all kinds of human behavior and sociality through online media technologies. Besides, dataism also involves trust in the (institutional) agents that collect, interpret, and share (meta)data culled from social media, internet platforms, and other communication technologies. (Van Dijck, 2014, p. 198)

Cumulatively, these accounts tend towards the conclusion that big data applied to policy issues is not only less granular and omniscient than the hype of the early 2010s promised, but also that, far from being objective, it is fundamentally shaped by the assumptions and standpoint of all the actors (many of them commercial) controlling its trajectory from creation to analysis and use. Not only are the questions asked of data usually oriented towards the needs and perspectives of the most powerful (Taylor & Meissner, 2020), but the data itself is generated, collected and shared in ways that reflect and confirm the status quo in terms of resource distribution, visibility and agency. As AI increasingly becomes an important part of data’s potential lifecycle, with data used to train, parameterise and feed models for business and policy, this dynamic, in which data reflects existing power and its interests, becomes magnified. Data is now not only useful for making visible the behaviour and movement of populations; it is useful for optimising them. Correspondingly, any lack of representativeness, or of understanding of the interests and dynamics the data reflects, is translated in this move from modelling to optimising into a direct shaping of subjects’ opportunities and possibilities (Kulynych et al., 2020).

Research on these issues of justice has been done in disciplines ranging from computer science (Philip et al., 2012) and information science (Heeks, 2017) to development studies (Taylor, 2017) and media studies (Dencik et al., 2016) and is increasingly affecting how regulators think about the data economy (Slaughter, 2021). How research, and specifically policy-relevant research conducted under the heading of Computational Social Science, intersects with this problem of data justice is the focus of this chapter. The questions that arise from CSS are not confined to data itself or to scientific or policy research methods. Instead, they span issues of democratic decision-making, representative government, the governance of data in general and social justice concerns of recognition, representation and redistribution. As Gangadharan and Niklas have argued (2019), doing justice to the subjects of datafication and datafied policy often implies decentring the technology and being less exceptionalist about data.

3 Questions and Challenges

It is possible to group the justice-related issues outlined above around two poles: the effects of CSS on those who use the data and its effects on those whom the data represents. The first concerns how data confers new forms of power on the already powerful through their access not only to data itself but also to resources, computing infrastructures and policy attention. The second relates to the way in which making people visible to policy does not automatically benefit them and may instead either amplify existing problems or create entirely new ones, while the remote nature of the research decreases people’s agency in relation to policy and decision-making. CSS methods, and the data that fuels them, frequently confer on researchers the power to make social phenomena visible at the aggregate level and continuously—people’s behaviour, whereabouts, activities and histories—and on policymakers the power to intervene in new ways.

The optimisation of social systems and its policy predecessors, nudging and governance through statistics, are all ways of intervening that rely on detailed quantitative data. Computational Social Science demonstrates the tendency of this datafied power to be unbalanced in its distribution, favouring those with the resources, infrastructure and power to gather and use data effectively. Like all social science, it involves a power relation between the researcher and the subject, but in the case of CSS, that subject can be an entire population. Large-scale data conveys the power to intervene but also the power to define problems in the first place: what Pentland has termed ‘the god’s eye view’ (Pentland, 2010) brings with it little accountability.

A justice perspective, above all, asks what would shape the power conferred by data towards the public interest. Adding a governance perspective means we should also consider how the negative possibilities of datafied power can be systematically identified and controlled. Computational Social Science, specifically where it has the aim of informing policy, is a relevant field in which to ask these questions for two reasons. First, because the ways in which it accesses data, analyses it and uses it to intervene are opaque to the public, taking place in the realm of large producers of data and high-level policymakers. This has meant that CSS has so far been relatively invisible to the kind of ethical or justice-based critiques that have arisen around AI and machine learning in recent years. Second, we should interrogate it because it increasingly has real and large-scale effects on populations, either local or distant, once translated into policy information.

3.1 Who Benefits?

The issue of the distribution of benefits from CSS is both discursive and contested. Discursive because, as with all scientific disciplines, there is an argument that fundamental research is justified by the search for knowledge alone, but this is counterbalanced by the responsibility that research on human subjects brings with it. CSS has not so far been categorised as human subjects research because, despite its connection to policy and the shaping of social processes, data is collected remotely and the human subjects are not directly connected to the research. This means that CSS research has not so far been subject to the same ethical review process as human subjects research, where researchers must explain how any benefits of their research will be distributed. The question is also contested because the human subjects of the research, given the chance, will often have very different understandings of what constitutes a benefit. For example, starting from the assumption that data exists and must therefore be used (Taylor, 2016) is problematic because it addresses data about society as ‘terra nullius’ (Couldry & Mejias, 2019), a raw resource which exists independently of the people it reflects. In contrast, the subjects of the research (city dwellers, migrants, workers, the subjects of development intervention and others) may disagree that this is true. The ‘terra nullius’ assumption has also been undermined by work on group privacy (Taylor et al., 2017), which argues that data which facilitates intervention upon people—whether personal data or not—raises the question of when it is justified to shape and optimise behaviour or social conditions. Given that CSS is usually conducted on remote subjects with only the consent of the intermediaries holding the data, its legitimacy is usually based on the interests of those intermediaries and the researchers, not the subjects of research themselves (Taylor, 2021).

To offer an example, data stemming from refugees’ use of mobile phones was made available by the Turkish national mobile network operator and used remotely by computational social scientists in the Data for Refugees challenge (2019). One group built a model that could identify where people were working informally—something 99% of Syrian refugees in Turkey were doing at the time due to lack of employment permission. The authors explain their logic for conducting the study:

Refugees don't normally have permission to work and only have access to informal employment. Our results not only provided country-wide statistics of employment but also gave a detailed breakdown of employment characteristics via heatmaps across Turkey. This information is valuable since it would allow GOs and NGOs to refine and target appropriate policy to generate opportunities and economic integration as well as social mobility specific to each area of Turkey. (Reece et al., 2019, p. 13)

It is possible to contest this, however. The fact that Turkey was legally restricting the right of refugees to internal mobility and employment—which the authors note many other countries also do—does not mean that this is in line with international human rights law (International Justice Resource Center, 2012). It is doubtful that the Syrians in the dataset would find that creating a way to make their mobility and informal employment visible was in their own interests. The authors’ claim that their model allows government and non-governmental organisations to target policy, generate opportunities and economic integration and help refugees become socially mobile rests on the optimistic assumption that these organisations are incentivised to do so. An alternative and more likely result is that the model would help the authorities constrain refugees’ ability to move and work, an incentive already present in Turkish law.

Whose interests does this analysis serve, then? First, the Turkish government’s, since the model can help enforce a national law against refugees moving and working freely. It may serve the interests of NGOs wishing to help refugees, but given the Turkish regime’s laws targeting organisations that do so (Deutsche Welle, 2020), this is unlikely. The national telecom provider is a potential beneficiary in terms of positive publicity and, potentially, governmental approval if Turkey’s authoritarian government sees the researchers’ analysis of the data as useful for its governance of refugee populations. Lastly, the researchers themselves benefit in the form of access to data and ensuing publications. And so we can chart how analysis that claims to be ‘Data for Refugees’ may in fact be data for government, data for telecom providers and data for academic researchers.

Scholars of data governance have debated the problem of determining interests in, and rights over, data once it enters the public sphere. Proposed solutions include public data commons and data trusts (Micheli et al., 2020), both of which appear at first sight ideal for protecting the rights of data subjects. These approaches are promising under conditions where data moves within the same jurisdiction (local, national or regional) in which it was created and where there is a fiduciary capable of representing the interests of the people reflected in the data (Delacroix & Lawrence, 2019). In the case of cross-border transfers of data for scientific research, however, this chain is often broken at the starting point. In the case (common in CSS) of mobile data on non-European populations, the data is de-identified and aggregated by the mobile network provider (Taylor, 2016) before it is made available for analysis, placing the network provider in the position of fiduciary. Creating a different fiduciary would, in the case explored above, mean empowering someone to represent the interests of all Syrian refugees in Turkey.
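To make concrete what this de-identification and aggregation step does, the sketch below illustrates, in purely hypothetical Python with invented field names and an invented suppression threshold rather than any operator's actual pipeline, how individual mobile phone records might be reduced to region-level counts before being shared. The output contains no identifiers and arguably no personal data in the legal sense, yet it still makes a group's presence and movement visible enough to inform intervention, which is precisely the group-privacy problem raised by the work cited above.

```python
# Purely illustrative sketch: hypothetical record format and threshold,
# not the actual pipeline of any mobile network operator.

def aggregate(records, k_anonymity_threshold=2):
    """Count distinct pseudonymous subscribers per (region, hour) cell,
    suppressing cells below the threshold.

    The result contains no identifiers, but it still maps where and when
    a group is present, which is what enables group-level intervention.
    """
    cells = {}
    for subscriber_id, region, hour in records:
        cells.setdefault((region, hour), set()).add(subscriber_id)
    return {
        cell: len(ids)
        for cell, ids in cells.items()
        if len(ids) >= k_anonymity_threshold  # drop easily re-identifiable cells
    }

# Hypothetical call-detail records: (pseudonymous id, region, hour of day)
records = [
    ("a91f", "Gaziantep", 8), ("a91f", "Gaziantep", 19),
    ("77c2", "Gaziantep", 8), ("3b0d", "Istanbul", 9),
    ("c44e", "Istanbul", 9), ("c44e", "Istanbul", 20),
]

print(aggregate(records))
# {('Gaziantep', 8): 2, ('Istanbul', 9): 2}
```

On this reading, the provider's aggregation step may settle the data protection question, but it does not settle the justice question, because the aggregated output remains actionable against the group it describes.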

This hints at several problems: can a fiduciary drawn from a group in a situation of extraordinary vulnerability be expected to have the power to protect that group’s interests? What happens when the group in question has, as in this case, a limited set of enforceable rights compared to everyone else with an interest in the data? For example, are the claims of the Western CSS community likely to be effectively contested by a population of refugees primarily engaged with their own survival? It is easy to see how, in cases where people within a population of interest are not able to assert their rights, even fiduciary arrangements quickly come to represent an idea of the public good that may not align with that group’s own ideas—if such a diverse group can agree on what is in its interests in the first place.

This case illustrates that, given the extraordinarily high stakes for refugees of being monitored and intervened upon, and given that the CSS in this case actively creates new vulnerabilities, more attention should be given to how far fiduciary-based models can stretch. In situations of radical power asymmetry, it is not clear that the fiduciary model necessarily leads to the legitimate use of data for research. In fact, drawing on discussions of indigenous data sovereignty, it is clear that in the case of people in situations of vulnerability, a model based on the assumption that data will be shared and reused may not be appropriate (Rainie et al., 2019). As indigenous scholars point out (Simpson, 2017), if refusal is not an option on the table for those who have been made vulnerable, further ideas about governance cease to be ethical choices.

3.2 Making People Visible: Surveillance as Social Science

Data sourced from platforms, large-scale administrative data from public services or data from monitoring of public space are, in their different ways, all forms of surveillance. They are often quite intimate, drawing a picture of how people use city space or move across borders, how they break rules and create informal ways to support their families in emergency situations and how they catch and pass on infectious diseases, spend their money, interact with each other and use public services. Human activity everywhere is becoming datafied, sometimes with people’s knowledge as they engage with platforms and online services, but often without their awareness as they are captured by CCTV, satellites, mobile phone network infrastructures, apps or payment services. Increasingly, these forms of surveillance intersect and feed into each other. Urban space has become securitised through the availability of CCTV and mobile phone data, just as borders have become securitised through satellite surveillance and geospatial sensing. But all these sensing technologies are dual use—either in their potential or in their actual usage by authorities. Urban crowd sensing systems, relying on mobile phone location data and social media analysis, were first created as a way to keep track of crowding during public events and then repurposed to help enforce pandemic public health measures. These functions also, however, support police and security services by showing how public protests evolve, by helping track how people move to and from locations authorities wish to control and by making it possible to identify protesters in real time—something law enforcement used to chilling effect during the Hong Kong protests of 2019–2020 (Zalnieriute, 2021).

Border enforcement activities have also become an important target for Computational Social Science methods. In 2019 the European Asylum Support Office was warned by the European Data Protection Supervisor (EDPS, 2019) that conducting social media analysis of groups assumed to be potential migrants in Africa, with the aim of tracking migration flows towards the EU’s borders, was illegal under European data protection law. This was a project the Asylum Support Office had inherited from the United Nations, which had been developing Computational Social Science approaches to big data for nearly a decade (Taylor, 2016) in collaboration with academic CSS researchers. Similarly, epidemiological surveillance has a long history of constructing models that show how people move across borders, first in relation to malaria and later dengue and Ebola (Pindolia et al., 2012; Wesolowski et al., 2014). These methods were co-designed and then separately developed by mobility researchers over the 2010s, culminating in the use of mobile phone connectivity for tracking infections (and people’s movement in general) during the COVID-19 pandemic (Ferretti et al., 2020). Mobile data in particular can inform many forms of monitoring, from policing borders to monitoring political protests, with methods shared between humanitarian technologists, public health specialists, security services and law enforcement.

These interactions between different forms of surveillance suggest two conclusions: first, that an innocuous history and set of uses can always be claimed for any methodology involving surveillance-derived data and, second, that the reverse is also true—all methods and types of data intersect at some point in the data’s lifecycle with uses that potentially or actually violate the right to protest anonymously, to move freely, to work, to self-determination and many other rights and entitlements. A justice-based approach illuminates these interactions rather than seeking innocuous explanations, and follows data and methods through their lifecycles to find the points where they generate injustice by rendering people visible in ways that are damaging to their rights and freedoms.

Much of this discussion comes down to the question of who has the right to derive policy-relevant conclusions from data, under what circumstances and on whose behalf. It is not a simple question: should people ‘own’ data about them (a notion present neither in data protection law nor in any other law, which confer rights over data on people only in specific circumstances in order to protect them from harm), or should the makers and managers of data be free to use it in line with whatever they conceive to be the public benefit? The issue seems mainly to revolve around how the public benefit will be agreed upon, rather than who has the right to data per se. Forced migrants in particular, but also those suffering marginalisation or disadvantage of any kind, may be generating information that is important not only to them but also to others—on environmental change, conflicts and humanitarian crises, for example, not to mention living conditions in cities and the adequate provision of public services such as education and transport. What should we say about the shared interests in data that can illuminate problems and inform change?

This is partly a question for democratic discussion—something that has not been well conceptualised so far. It is also, however, a normative question to which the EU needs to find a preliminary answer in order to make such a debate possible. One suggestion from work on data justice is that the normative framing tends to be that of economic growth and technical advancement, whereas an alternative but equally valid framing is the good of the groups reflected in the data. If the starting point for analysis is the interests of those groups, this demands not only different ways of analysing the ethics of a particular research project or policy advice process but also that democratic processes be set up for determining the interests of the groups in question (Taylor & De Souza, 2021). This becomes a much broader issue of decolonising international relations, reframing the allocation of fundamental rights so that they cover people on, for instance, both sides of the EU’s border, and treating people who are in conditions of conflict, forced migration or other precarious situations as the same kind of legal subject as more empowered and vocal research subjects in easier conditions.

4 Addressing Justice Concerns: Ethics, Regulation and Governance

The potential and actual justice problems for CSS outlined above are frequently seen as problems of research ethics. If researchers comply with data protection provisions, the logic goes, they will not violate the rights of those the data reflects. Similarly, if research ethics are followed—again, mainly focusing on the privacy and confidentiality of research data, because consent tends to come from the intermediaries offering the data—the subjects of the research will be protected. Both the data protection compliance and the research ethics/privacy approaches, however, are necessary but insufficient to address the justice concerns that arise from CSS methods and the ways in which they inform policy.

As the EDPS’ warning to the Asylum Support Office states, the problems caused by remote analysis of data on unaware and often vulnerable populations are not solved by preventing the identification of individual research subjects. In its letter the data protection supervisor’s office notes that ‘EASO accesses open source info, manually looks at groups and produces reports, which according to them no longer contain personal data’ and that ‘EASO’s monitoring activities subject them to enhanced surveillance due to the mere fact that they are or might become migrants or asylum seekers’. Both these statements accurately describe much of CSS research, hence the relevance of this example. The EDPS names two risks: possible inaccuracy in identifying groups (not individuals) who might attempt to cross borders irregularly—something with potentially serious consequences for the people involved—and the risk of discrimination against those people. The EDPS quotes theory on group privacy, noting ‘the risk of group discrimination, i.e. a situation in which inferences drawn from SMM [social media monitoring] can conceivably put a group, as group, at risk, in a way that cannot be covered by ensuring each member’s control over their individual data’ (EDPS, 2019) (the EDPS also notes, however, that the likelihood of such individuals knowing their data is being used in this way and ‘controlling’ it is vanishingly small).

The EDPS’ analysis of this problem merits serious consideration by CSS researchers, given that it overturns a generation of research ethics based on preserving the individual privacy and confidentiality of research subjects. If we shift the focus from the individual in the dataset—who will often be de-identified anyway—to the consequences of the analysis, a whole different set of concerns opens up, namely, those of rights violations, discrimination and illegitimate intervention on the collective level. In this scenario, it is not enough for researchers to claim that they are merely performing social scientific analysis and that the potential policy uses of their work are not their responsibility. CSS is intimately connected to policy through a history of providing findings on public health, migration dynamics, economic development, urban planning, labour market dynamics and myriad other areas which connect directly to policy uses.

It is not clear how CSS research should be governed so that research ethics are not violated. As experts have pointed out, research ethics practices, and the academic infrastructure of checks and balances that enforces them, urgently require updating for the era of big data research (Metcalf & Crawford, 2016). Given that the field of CSS does not conceptualise itself as ‘human subjects research’, researchers are not incentivised either to consider the downstream effects on whole populations or to weigh the justification for those effects. Instead they are strongly incentivised to make general statements about how their research will benefit society or institutions, without acknowledging that those benefits come with costs to others, most often the subjects of the research themselves. This lack of alignment between research ethics and much of CSS research does not justify proceeding with business as usual. Instead it sets a challenge to both CSS researchers and the policymakers who use their findings: to place real checks and balances on what research can be done, with processes involving both domain knowledge and rights expertise, and to undertake concentrated work to identify the ways in which projects may create or amplify injustice. Only by doing so can the acceptability and normality of doing unacknowledged dual-purpose research be countered.

This is particularly important given that data’s availability will potentially become much greater over the 2020s. New models for data sharing, such as those outlined in the EU’s Data Governance Act (European Commission, 2020), are designed to contribute to the availability of data for both CSS and AI, both redefining ‘public’ data as data with possible public uses and setting broader parameters for sharing it between business, government and research. These new models also include new intermediaries to ensure that ‘altruistic data sharing’ can occur without friction. Once enacted, this vastly greater legal and technical infrastructure will increase the interactions between the public and private sectors, allowing research to inform policy and business more comprehensively. The line between the two is likely to blur further, as governmental and EU research funding continues to be oriented towards serving business and the EU’s economic agenda. This blurring of boundaries between the commercial and research worlds is also likely to lead to more research that is policy-relevant in the sense of influencing social behaviour, just as nudging both inherited methods from and contributed to marketing research over the 2000s (Baldwin, 2014). Such a merging of commercial and governmental surveillance and analytical methodologies has already occurred: the Snowden revelations of 2013 (Lyon, 2014) showed that security surveillance was already based on scanning behavioural and social media data and that it was conducted not by in-house security staff but by commercial contractors. More recently the work of the Data Justice Lab in Cardiff, for example, has demonstrated that citizen scoring has transitioned from a commercial to a governmental practice, with the two connected by common methodologies and analytical practices (Dencik et al., 2018).

5 The Way Forward

The analysis in this chapter offers two main conclusions: First, that the field of CSS has evolved without an accompanying evolution of debates on ethics and justice and that these debates are long overdue. Second, that CSS is privileged as policy-relevant research precisely because of many of the features which bring up concerns about justice—large-scale datasets, remote data gathering, purely quantitative methods and an orientation towards policy questions rather than the needs of the research subjects.

The hype that has accompanied the discovery of new data sources and new ways of applying statistical methodologies to very large-scale data has frequently eclipsed the question of when doing such analysis is justified and whether the benefits it may create are proportionate to the costs of making people and their activities visible to new (policy) actors. Migration data offers a key lesson here: computational collection and analysis of large-scale data does not aim at identifying individuals and is therefore considered by its practitioners not to be problematic. However, when practised with the aim of providing an ‘early warning system’ for the approach of irregular migrants to the EU’s borders, it has the potential to violate fundamental human rights, both in the form of discrimination and by narrowing the right to claim asylum. Similarly, building models to identify those working irregularly in refugee-receiving states may be welcomed by state authorities and by the statistical methods community, but it does not represent a contribution to the care and wellbeing of the refugees in question. Once such a model exists, the researcher cannot unpublish it—it is open to use by anyone with access to the relevant type of data. The responsibility in this case lies squarely with the researcher, but accountability is absent.

One step, therefore—if the field of CSS and the policymakers it informs wish to move towards a justice-based approach—is to subject all CSS studies that involve data on people and inform any kind of intervention to the same kind of ethical review that is performed on standard social scientific research projects involving human subjects. This is not enough on its own, however: that ethical review also has to respond to concerns about proportionality, fairness and the appropriateness of the methods to the question, regardless of whether the research is remote or in-person. The examples offered in this chapter suggest that it is time to update research ethics to cover the fields and methods involved in big data and that this is also a concern for policymakers interested in aligning their work with human rights. Demand from CSS researchers and policymakers could provide the necessary stimulus to update academic research review for the 2020s and align checks and balances with contemporary research practices.

A second concern is that CSS is rarely, if ever, performed in circumstances where the individuals implicated by the research either influence the questions asked or have access to the conclusions. A notable exception is ‘citizen sensing’ methods (Suman, 2021), where people source data about their local environment and use it to create public awareness, policy change or both. There is much room for expanding these methodologies and practices, as well as formalising and standardising them so that they can be a more accessible resource for policymakers (Suman, 2019). Another exception is the informal version of citizen sensing, sousveillance, which has a long history of disrupting the use of digital data for restricting public freedoms. Like citizen sensing, which tends to challenge the business and policy status quo, sousveillance practices are a datafied tool for the marginalised or neglected to assert their rights and claim space in policy debates. Unlike established CSS analysis where people are addressed as passive research subjects generating data which can only meaningfully be analysed at scale, sousveillance analysis tends to be conducted on the micro-level, as, for example, in Akbari’s account of Iranian women tracking the morality police through Tehran in order to avoid their scrutiny (Akbari, 2019), van Doorn’s account of gig workers in Berlin collecting data to reverse-engineer a platform’s fee structures and challenge its labour practices (van Doorn, 2020) or AlgorithmWatch’s construction of a crowdsourced credit check model in Germany (AlgorithmWatch, 2018).

Although they also employ social science methods and can be rigorous and reliable, the entire point of these sousveillance methods is that they do not scale: they are local and specific, devised in response to particular challenges. They constitute participatory action research, a methodology in which the research subject sets the agenda and the aim is advancing social justice. Such methods constitute a claim to the right to participate, both in research and in society: they are an assertion of the presence and rights of the research subject. It is worth considering the numerous obstacles that this kind of research meets when it claims policy relevance: it has traditionally been rejected as unsystematic, not scalable and unreliable because it reflects a local, rather than generalised, understanding (Chambers, 2007). These methods can be seen as the antithesis of current CSS in that they present a contradictory set of assumptions about what constitutes reliability, policy relevance and participation. They also raise the question of whether CSS in its current policy- and optimisation-oriented form can align with social justice concerns or whether data governance in this sphere should instead aim for legal compliance and harm reduction.