1 Introduction

As Sabina Leonelli notes in her introduction to this volume, Bruno Latour’s notion of immutable mobiles – ‘objects which have the properties of being mobile but also immutable, presentable, readable and combinable with one another’ (Latour 1986, 7) – has been a useful starting point for making sense of data journeys in the sciences. In this contribution I take Latour’s notion as a point of departure for probing how digital data become ‘tools for communication’ (Leonelli 2016, 69) in astronomical research, oriented not only to the production of specific results but also to the repair or correction of data analysis practices. In doing so I take note of how data journeys in astronomy are shaped by its disciplinary setting in terms of researchers’ shared object of interest (the sky), their use of digital infrastructures and data standards, and their largely shared access to telescopes and data. This setting has pervasive effects on the mobility and uses of data in astronomy. One of these effects is that it enables practices to be reflexive, that is, it allows earlier observations and interpretations to be witnessably revised in sequences of action.Footnote 1

Hans-Jörg Rheinberger (1997, 106) has observed that, by making (traces of) transient events durable and available in many places and at various times, immutable mobiles are ‘able to retroact on other graphematic articulations – and, what is most important, not only on those from which they have originated.’ Drawing on William Ivins (1953) and Elizabeth Eisenstein (1979), Latour (1986, 19–20) can be read as illustrating this retroaction with the impact of printing technology on early modern astronomy, which made it possible for astronomers to notice differences and inconsistencies in data, allowing them to use new observations to re-assess prior ones.

The retroaction that Rheinberger describes is worth a closer look if one seeks to gain insights into contemporary data uses as social and material practices. For one thing, it brings the sequentiality and temporality of scientific work into focus. New data can lead researchers to re-consider prior records. They can spot differences where data were expected to show ‘the same,’ alerting data users to details of the unavoidably local and contextual production and interpretation of data. In the course of this work, data may be used as they are, dismissed, or repaired.Footnote 2

When conceived as machine-generated ‘inscriptions’ (Latour and Woolgar 1979), digital data may appear to be text-like, a form of writing. The transmission of writing has been commonly regarded as fundamentally distinct from dialogical exchanges in co-presence (Peters 1999). Sybille Krämer expresses this view starkly when she writes that ‘[t]ransmission is precisely not dialogical: the goal of technical communication is emission or dissemination, not dialogue. We can thus clearly distinguish between the personal principle of understanding and the postal principle of transmission’ (Krämer 2015, 23). As conversation analysts have demonstrated, talk-in-interaction (whether in face-to-face situations or mediated through telephones or screen-based media) is shaped by the ongoing repair of utterances: fellow conversationalists routinely resolve the meaning of indexical, context-dependent utterances in the ‘here and now’ of their interaction, and thus maintain mutual understanding and communicative order concurrently. For example, a speaker may correct an utterance upon noticing her recipient’s misunderstanding – a case of self-repair. In doing so participants maintain intersubjectivity (Schegloff 2006).

By contrast, uses of texts appear to be subjected less to the ‘tyranny of accountability’ (Enfield and Sidnell 2017) characteristic of social interaction in co-presence (Deppermann 2015). The interpretation of texts is less constrained than the interpretation of utterances in conversation, but it is also more necessary (McHoul 1982; Livingston 1995). Because of this, certain features of texts become more prominent and consequential for assuring the success of communication at a distance, including the resort to numbers (Porter 1995; Heintz 2007).

Some work on writing argues that the schism between transmission and dialogue is not as radical in practice as Krämer and others posit in principle. Thus, Dorothy Smith (2001, 175–176) suggested conceiving of the social, organizational and institutional uses of texts, especially of printed materials, as

text-reader conversations in which, unlike real-life conversations, one side of the conversation is fixed and unresponsive to the other’s responses. (…) However the reader takes it up, the text remains as a constant point of reference against which any particular interpretation can be checked. It is the constancy of the text that provides for the standardization effect. (…) Text-reader conversations are embedded in and organize local settings of work. (…) In standardizing one ‘party’ to every text-reader conversation, the terms of all conversations with the ‘same’ text are standardized. Among participants, an open-ended chain is created: text-reader-reader-reader-.

Much like Latour (1986), Smith explores the consequences of the spread of ‘identical copies’ to multiple sites, yet she focuses on the institutional, regulatory and always again locally situated uses of texts. If digital media technologies provide new possibilities for communication, one may wonder if, in scientists’ work with digital data, the schism of transmission and dialogue is likewise challenged.

Building on studies of social interaction and Alfred Schütz’s (1967) phenomenology of the social world, Charles Goodwin illustrates how social actors perform ‘co-operative, accumulative action on materials provided by predecessors who are not present’ (Goodwin 2018, 248). He argues that this pertains characteristically to scientific data production (Goodwin 2013, 8). Witnessing the training of an astronomy PhD student, I observed that the work of combining data from different telescopes is not only sequential, temporal, and contextual, but also reflexive (Hoeppe 2014). That is, past actions and interpretations were commonly re-assessed and repaired as this unfolding work was oriented to the (re-)construction of natural order. For example, when the output of an algorithm for parameter estimation was assessed and deemed implausible (yielding galaxies that were ‘too bright for their distance’), calibration exposures were re-inspected, resulting in the identification of an artifact of straylight that was subsequently subtracted to yield better calibrated ‘science images’ on which the algorithm was re-run. In involving such instances of repair, this work bears a resemblance to repair in talk-in-interaction and correction in instructional settings as they have been studied by ethnomethodologists and conversation analysts (Macbeth 2004; Schegloff 2006).Footnote 3 It also resonates with studies that have expanded and elaborated this notion of repair to address the maintenance of infrastructures and socio-material orders (Henke 2000; Graham and Thrift 2007; Schaffer 2011; Sims and Henke 2012).

My aim in this chapter is to make the notions of repair and reflexivity fruitful for the study of data journeys in the natural sciences. I do so by attempting a reading of an episode of data re-use from recent astronomy. I focus on the presumed detection, in 2004, of a galaxy at record distance from Earth. The original data became public at the time of publication and were soon re-used and supplemented with new observations by other teams. I inquire into how data re-using scientists sought to reconstruct the practices used in making the discovery claim, and found them at fault. Doing so allowed them not only to suggest the repair of data (such as removing artifacts) but also the repair of data use practices, which were subsequently taken up by the scientists who had claimed the discovery. I shall argue that this work was enabled by astronomy’s discipline-specific ‘architecture for observation,’ whose objectual, technological and institutional elements provide contexts and resources for achieving the reflexive repair of data and data use practices. Before describing and interpreting this episode (in Sects. 3 and 4) I sketch the architecture of astronomical observation in which it unfolded (Sect. 2).

While I draw mainly on published sources, the episode I describe happened when I worked as an editor and staff-writer of the popular astronomy magazine Sterne und Weltraum. I wrote two pieces about it (Hoeppe 2004, 2005). This magazine’s editorial offices are located at the Max Planck Institute for Astronomy in Heidelberg (Germany), a leading research institute, where I benefitted from witnessing rumour about the claimed discovery and assessments of it. This chapter is also informed by my subsequent 18 months of ethnography on digital astronomical research practices, conducted between 2007 and 2010, followed by re-visits between 2010 and 2017, as well as by my own graduate training in astrophysics.

2 An Architecture for Observation: Enabling Reflexive Uses of Data

Seeking to gain insights into data journeys in contemporary astronomy as a social and material practice, I first identify three recurrent disciplinary aspects that come to matter therein: It is marked by astronomers’ shared practices of observing and re-observing objects in the sky (a), by their data being almost exclusively digital and available in a standard format (b), and by the shared access to many observing facilities and much observational data (c). The first of these – an object or environment, of sorts – is specific to astronomy (although reference to shared environments or objects is common in other disciplines as well). The other two – a set of technologies and social institutions – are shared to a certain degree with other scientific disciplines.

Together these aspects contribute essentially to what I shall call the architecture of contemporary astronomical observation. It is a relatively stable, and partly institutionalized, configuration that is shared by diverse users throughout various projects. Today encompassing all branches of astronomy, this architecture has been shaped by the use of satellite observatories and radio telescopes (Hoeppe, in preparation). Data need not be digital or public to be able to travel, nor does the sky have to be fixed for this to succeed, but in contemporary astronomy the three aspects – (a), (b) and (c) – are central to researchers’ experience.Footnote 4 Here I prefer ‘architecture’ to the notion of ‘knowledge infrastructure’ (Edwards 2010; Borgman 2015; Hoeppe 2019a) for drawing attention to the discipline-specific, situated and material setting of observational astronomy and its pervasive effects on the mobility and uses of data.

My use of ‘architecture’ is informed primarily by Michael Lynch (1993) and Charles Goodwin (2010). Drawing on work by Gurwitsch, Merleau-Ponty and Foucault, Lynch (1993, 132) inquired into how acts of observation are shaped and constrained by disciplinary ‘archi-textural environments’ that comprise buildings, laboratory set-ups and other equipment. Goodwin (2010, 107) conceives of an ‘architecture for perception’ as ‘a physical object that embodies a solution to a repetitive cognitive task posed in the work of the community using it.’ My use of ‘architecture’ resonates more loosely, but still pertinently, with Knorr-Cetina’s (2003) notion, informed in turn by Fligstein (2001), of the reflexive architecture of financial markets, wherein traders engage (and co-constitute) a shared object (a financial market) through mediating digital technologies.

2.1 Object: ‘Astronomy is About Observing and Re-Observing Sources on the Sky’

In a blog post, New York University astronomer David W. Hogg (2008) noted in passing that ‘[a]ll of astronomy and astrophysics is built on the observation and reobservation of sources on the sky.’ Doing so is contingent on the stability or ‘immutability’ of the sky, which has been a commonplace for astronomers since Antiquity (Evans 1998). While some objects are known to move with respect to this apparently stable background, most celestial objects can be found again by reference to patterns of stars or celestial coordinates. These are dominant organizing principles for accessing observational data.

Whereas some ancient Greek philosophers famously imagined the astronomical sky to be a material sphere surrounding all observers on Earth (Aristotle 1939), contemporary astronomers tend to define it as ‘a two-dimensional distribution of intensity of electromagnetic radiation’ (Léna 1989, 245). But it only becomes a ‘two-dimensional distribution’ when thus represented using media like paper, photographs or digital technologies. The epistemic benefits of observing and re-observing objects in the sky are contingent on this use of media. In using diverse media, astronomers’ ‘mundane reason’ is oriented to reflexively producing consistent representations of the ‘same’ sky despite ever-present noise and artefacts in their data (Hoeppe 2014, 2019b; cf. Pollner 1987). In such work, cartographic reference posits the uniqueness of the world as a methodological maxim (Giere 2006) – an assumption that facilitates robustness reasoning in astronomy (cf. Wimsatt 2012 [1981]; Wylie, chapter “Radiocarbon Dating in Archaeology: Triangulation and Traceability”, this volume).

2.2 Technology: Astronomical Data Are Digital, and Utilize a Standard Format

A second aspect of contemporary astronomy’s architecture of observation is technological. Unlike the enormous diversity of materials that biologists, oceanographers or archaeologists can use (Leonelli 2016; Halfmann, this volume; Wylie, this volume), almost all data in contemporary astronomy are digital recordings of cosmic radiation. To unpack the specific salience of the digital for the travel of data, it is necessary to refine Latour’s (1986) notion of immutable mobiles, which included, among others, hand-drawn maps, machine generated inscriptions and printed tabulations. Rheinberger (2011, 344) suggests that the traces produced in laboratory experiments become ‘data proper’ (and proper immutable mobiles) only when they can be easily stored and retrieved. In my reading, he appears to be close to suggesting that ‘data proper’ are symbols. In Peirce’s (1992 [1894]) classification of the relation between signs and their objects, traces are indices and represent their object by contiguity. Photographs are indices as well as icons, signs which correlate with their objects by resemblance. Beyond this, digital photographs are also symbols, since – constituted by arrays of numbers, in binary format or otherwise – they use notational conventions. This resonates with an understanding of the digital as the ‘encoding’ of ‘information’ that permits its subsequent retrieval without loss (e.g. Dourish 2017, Chapter 1).

Invented in 1969, Charge-Coupled Devices (CCDs) are found in most digital cameras and at all observatories today (Smith and Tatarewicz 1985; McCray 2014). These detectors use the photoelectric effect to produce grid-shaped pixel images which can be read out and then stored, retrieved or transmitted as digital files. They are not only very sensitive and – once cooled with liquid nitrogen to reduce thermal noise in the detector – can be exposed for several hours. CCDs are also highly linear, recording incoming light in direct proportion to the exposure time. This implies that their outputs are directly amenable to arithmetic calculations, including the pixel-by-pixel addition, subtraction and division of images, with generative uses for epistemic work (Hoeppe 2019b). The linearity of CCDs also allows astronomers to calculate the exposure time necessary for reaching a specific sensitivity. This encourages conceiving of data in terms of the ‘abstract time’ of exposures and facilitates the scheduling of observing time – a requirement for the institutionalization of service-mode observing, in which observatory staff members produce data for absent data users (Hoeppe 2018).
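
The practical import of this linearity can be indicated with a minimal sketch. The following lines (in Python, using the NumPy library) show calibration as pixel-by-pixel arithmetic on hypothetical arrays; the array sizes, counts and calibration frames are illustrative assumptions, not the procedures of any observatory or team discussed in this chapter.

```python
import numpy as np

# Hypothetical arrays standing in for CCD read-outs; in practice these would
# be read from FITS files. All sizes and values are illustrative.
rng = np.random.default_rng(0)
science = rng.poisson(lam=1200.0, size=(1024, 1024)).astype(float)  # raw science exposure
bias = np.full((1024, 1024), 300.0)                                 # detector offset
flat = rng.normal(loc=1.0, scale=0.02, size=(1024, 1024))           # normalized flat field

# Because CCD counts scale linearly with the light received, basic calibration
# can be written as pixel-by-pixel arithmetic:
calibrated = (science - bias) / flat

# Linearity likewise allows exposures of the same field to be co-added for
# depth, or subtracted to reveal residual artifacts such as straylight:
# stacked  = calibrated_a + calibrated_b
# residual = calibrated_a - calibrated_b
```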

In 1979, astronomers defined FITS (Flexible Image Transport System), a shared data format to ‘transfer regularly gridded astronomical image data between different locations’ (Grosbøl et al. 1988, 359; cf. also McCray 2014). It was quickly adopted and endorsed by all major observatories and space agencies. FITS files are calculable objects which link metadata to images and tables; they have been the dominant data format in astronomy for more than 30 years. The FITS format has shaped astronomers’ understanding of what their data are like.Footnote 5 Its dominance contrasts with the diversity of data formats in disciplines like biology (Leonelli 2016), the Earth sciences (Halfmann, this volume) and economics (Morgan, this volume).
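
What it means for a FITS file to link metadata to images and tables can be illustrated with a brief sketch using the astropy library; the file name is hypothetical, and which header keywords are present beyond the standard ones varies by observatory.

```python
from astropy.io import fits

# "observation.fits" is a hypothetical file name. A FITS file pairs a header
# of keyword=value metadata with binary image (or table) data.
with fits.open("observation.fits") as hdul:
    header = hdul[0].header              # metadata of the primary HDU
    image = hdul[0].data                 # the image: a 2-D array of numbers
    print(header.get("TELESCOP"))        # standard keywords record, e.g., the
    print(header.get("DATE-OBS"))        # telescope, the date of observation
    print(header.get("EXPTIME"))         # and the exposure time
    print(image.shape)                   # pixel dimensions of the image
```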

2.3 Social Institutions: Sharing Instruments and Data

The third aspect of astronomy’s architecture for observation is institutional.Footnote 6 Since the 1960s, a dominant fraction of astronomical data has been produced by public observatories built and operated using tax money. In their process of allocating observing time, peer-review committees at major observatories and space agencies consider proposals from a diverse, international community of academic users. Current practices of observation and data management are deeply informed by how satellite telescopes and radio observatories have been operated since the late 1970s. These data have been digital throughout. Produced mostly at public institutions, they were made available exclusively to the applicant users for a period of proprietary use (typically 6 or 12 months), after which they became public. The commitment to do so instigated the formation of public data archives. Another defining element of the operation of satellite and radio observatories was the introduction of service-mode observing (Hoeppe 2018). Authors of observing projects can use data earlier, but they do not have preferential access to the local context of data production, including the ‘tacit knowledge’ of observatory staff members.

3 Re-Using Data to Assess an Astronomical Discovery Claim

Given this background I now consider a discovery claim and its subsequent evaluation, in which the original data, available publicly at the end of a period of proprietary use, were re-used and re-assessed in the light of additional observations.

3.1 Record Distance: “A Lensed Galaxy at z = 10.0”

In 2004, a group of five astronomers led by Roser Pelló of the Observatoire Midi-Pyrénées in Toulouse (France) announced the discovery of a galaxy at record distance from Earth (Pelló et al. 2004a). These researchers had used detectors at three large telescopes to observe clusters of galaxies, which, because of their considerable mass, are thought to act as gravitational lenses that magnify the light emitted from faint, distant background sources. By utilizing this ‘gravitational telescope,’ they hoped to exceed the sensitivity of previous searches for the most distant galaxies. What astronomers call redshift (abbreviated as z) is a measure of how much the wavelengths of the light emitted by cosmic objects are stretched due to cosmic expansion, shifting specific spectral features to longer wavelengths. Adopting a specific cosmological model allows computing both the distance and the look-back time, that is, how long this light has traveled to reach observers on Earth. Pelló et al. claimed to have discovered a galaxy at redshift 10.0 behind the galaxy cluster Abell 1835, corresponding to a look-back time of more than 13 billion years. This was a momentous claim, given that spectroscopically confirmed, and thus presumably reliable, record redshifts had increased more or less steadily from z = 5.7 in 1993 to ‘only’ z = 6.5 in 2004, with a few redshift 7 candidates awaiting spectroscopic confirmation (Hu and Cowie 2006).
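
How a redshift translates into a distance and a look-back time depends on the cosmological model adopted. As a hedged illustration, the following sketch uses the astropy.cosmology module with parameter values (H0 = 70 km/s/Mpc, Om0 = 0.3) chosen for convenience rather than taken from Pelló et al.; with these values one obtains a look-back time of roughly 13 billion years for z = 10, and an age of the Universe at that redshift of roughly 470 million years.

```python
from astropy.cosmology import FlatLambdaCDM

# Illustrative cosmological parameters; published analyses adopt their own.
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

z = 10.0
print(cosmo.lookback_time(z))   # ~13 Gyr: how long the light has traveled
print(cosmo.age(z))             # ~0.47 Gyr: age of the Universe at z = 10
```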

The Toulouse team relied on two lines of evidence. The first was a set of digital pixel images taken through a series of broad-band filters (each transmitting light of a specific wavelength range) in visible and near-infrared light, using the Wide-Field/Planetary Camera (WFPC2) of the Hubble Space Telescope (HST), the 3.6-meter Canada-France-Hawaii Telescope (CFHT) on Mauna Kea (Hawaii) and the Infrared Spectrometer And Array Camera (ISAAC) at one of the European Southern Observatory’s (ESO) four 8-meter Very Large Telescope (VLT) unit telescopes on Paranal (Chile). These data, all in FITS format, were obtained in service mode. Pelló et al. first reduced the digital images of Abell 1835, detected objects using SourceExtractor (Bertin and Arnouts 1996), a code widely used in the community, and assembled a catalogue of photometric measurements of the detected sources in the exposures taken through all the filters used.
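
The detection-and-photometry step can be sketched schematically. The snippet below uses the Python package sep, which re-implements SourceExtractor’s core algorithms (Bertin and Arnouts 1996); Pelló et al. used SourceExtractor itself with their own configuration, so the threshold and aperture values here are merely illustrative assumptions.

```python
import numpy as np
import sep


def build_catalogue(image, thresh_sigma=1.5, aperture_radius=3.0):
    """Detect sources in a reduced image and measure aperture fluxes.

    A schematic stand-in for the SourceExtractor step described above;
    thresholds and aperture size are illustrative, not Pelló et al.'s values."""
    data = np.ascontiguousarray(image, dtype=np.float32)
    background = sep.Background(data)               # estimate the sky background
    data_sub = data - background.back()             # and subtract it
    objects = sep.extract(data_sub, thresh_sigma, err=background.globalrms)
    flux, flux_err, _ = sep.sum_circle(data_sub, objects["x"], objects["y"],
                                       aperture_radius, err=background.globalrms)
    return objects, flux, flux_err
```

Running such a step on the exposures taken through each filter, and matching the resulting lists by position, yields a multi-band photometric catalogue of the kind assembled by Pelló et al.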

As in other attempts to find distant, young galaxies, Pelló et al. then searched for a discontinuity in the observed spectral energy distributions. To qualify as candidate high-redshift galaxies, objects had to be detected in the longer (near-infrared) wavebands but not in the shorter (visible) ones. The ‘break’ in between, ascribed to the observed wavelength of the redshifted Lyman α spectral emission line of hydrogen, was expected from previous observations of distant galaxies and simulated model spectra.
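
The selection criterion can be rendered schematically as follows; the band names, significance threshold and flux units are illustrative assumptions rather than Pelló et al.’s actual cuts.

```python
def is_break_candidate(flux, err, detect_sigma=3.0):
    """Schematic 'dropout' test: significant flux in the near-infrared bands
    but none in the visible ones, as expected if the break has been redshifted
    beyond the visible wavebands. Inputs are dictionaries of per-band fluxes
    and uncertainties in consistent units; all settings are illustrative."""
    undetected_visible = all(flux[band] < detect_sigma * err[band]
                             for band in ("V", "R", "I"))
    detected_infrared = any(flux[band] > detect_sigma * err[band]
                            for band in ("J", "H", "K"))
    return undetected_visible and detected_infrared
```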

Object #1916 in Pelló et al.’s catalogue was the most promising candidate. It was not detected in visible light, but it was detected in three near-infrared wavebands, with an apparent ‘jump’ between the so-called J-band (around 1.26 μm) and the H-band (around 1.65 μm; Fig. 1). To Pelló et al. this suggested a redshift of around 10, even though the detection in each single waveband was only marginally statistically significant.

Fig. 1

Figure 1 of Pelló et al. (2004a), showing digital photographic negatives of exposures of the Abell 1835 galaxy cluster using the Infrared Spectrometer And Array Camera (ISAAC) at the Very Large Telescope (VLT, Chile; above) with exposures of the field around the candidate high-redshift galaxy #1916, as taken with the WFPC2 camera on board the Hubble Space Telescope in the visual R band (bottom left) and the near-infrared J-, H-, and K-bands using ISAAC (bottom right). Pelló et al. claim the detection of #1916 in the J-, H- and K-bands. (Reproduced with permission © ESO)

The Toulouse team’s second line of evidence was a spectroscopic analysis. They recorded spectra of #1916 in the J-band, also with the ISAAC instrument at the VLT. Long exposures taken in two different observational set-ups suggested to them a statistically significant signal of a spectral line at a wavelength of 1.337 μm. Interpreting it as the redshifted Lyman α emission, they inferred a redshift of 10.0 for #1916. Pelló et al. argued that finding a galaxy at such a high redshift, whose light was emitted only 460 million years after the big bang, was in accordance with theoretical models of galaxy formation and cosmology. On March 1, 2004, ESO published a press release entitled ‘VLT smashes the record of the farthest galaxy known.’Footnote 7 It was widely taken up by popular news media.
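
The arithmetic behind this identification is simple: for a spectral line of rest wavelength λ_emitted observed at λ_observed, the redshift is z = λ_observed/λ_emitted − 1. Taking the rest wavelength of Lyman α (1215.67 Å) gives

```latex
z \;=\; \frac{\lambda_{\mathrm{observed}}}{\lambda_{\mathrm{emitted}}} - 1
  \;=\; \frac{1.337\,\mu\mathrm{m}}{0.121567\,\mu\mathrm{m}} - 1 \;\approx\; 10.0
```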

3.2 Three Hot Pixels

Pelló et al.’s ISAAC/VLT observations became public through ESO’s data archive website on March 3, 2004, one year after the observations were recorded and 2 days after the publication of the press release. Several scientists retrieved the data to scrutinize the analysis and to re-assess the data in light of additional observations. Soon thereafter the Toulouse team’s second line of evidence was challenged. Stephen Weatherley from Imperial College London and colleagues processed the spectroscopic data with an independent approach (Weatherley et al. 2004). After failing to confirm the spectral line, they tried to identify the source of the discrepancy with the analysis of the Toulouse team (Pelló et al. 2004a, which they refer to as P04) by replicating its procedure:

To find the cause of the discrepancy between our results for the Lyα line and those reported by P04, we re-reduced the data following the principles of P04, i.e. subtracting frames in pairs, then wavelength calibrating the frames, rebinning onto a linear wavelength scale. In this process we made a careful check for bad data. We identified three variable hot pixels3[pixels which did not record incoming light linearly and have to be excluded from the analysis] which result in spurious positive flux in four of the sky-subtracted frames in the region of the emission line. We confirmed that these are very easily identified when the frames are registered to the nearest pixel, but are harder to spot when the data are rebinned in the wavelength calibration step. The summed spurious positive flux, when averaged into the entire data set, corresponds approximately to the flux measured by P04; therefore these variable hot pixels plausibly account for the difference between our results and those of P04.

3 These have coordinates (28, 761), (28, 836), (919, 790) in the raw frames. (Weatherley et al. 2004, L32)

Weatherley et al. recognized that one step in the reduction procedure adopted by Pelló et al. (2004a) – ‘rebinning the data onto a linear wavelength scale’ – had caused the latter to fail to identify the three hot pixels as artifacts that, in a proper analysis, had to be removed from the data. In other words, Weatherley et al. could replicate the signal reported by Pelló et al. only by making what they regarded as a mistaken use of the data. By listing the positions of the hot pixels in the raw frames in a footnote, Weatherley et al. held the Toulouse team accountable, in detail, for its treatment of the raw data.
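
The kind of bad-pixel rejection at issue can be indicated with a brief sketch; the masking convention and the orientation of the pixel coordinates are assumptions made for illustration, while the three coordinate pairs are those listed by Weatherley et al. (2004).

```python
import numpy as np

# Coordinates of the variable hot pixels as listed by Weatherley et al. (2004);
# whether a pair indexes (column, row) or (row, column) depends on the software
# convention and is assumed here for illustration.
HOT_PIXELS = [(28, 761), (28, 836), (919, 790)]


def reject_hot_pixels(frame, hot_pixels=HOT_PIXELS):
    """Flag known hot pixels as invalid *before* any rebinning, so that their
    spurious counts cannot be smeared across neighbouring wavelength bins."""
    cleaned = frame.astype(float).copy()
    for x, y in hot_pixels:
        cleaned[y, x] = np.nan      # downstream steps must ignore NaN pixels
    return cleaned
```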

3.3 A Transient Source?

It did not take long before the Toulouse team’s first line of evidence (an object detected with the photometric properties of a high-redshift galaxy) was challenged as well. Only in combination with the photometry measured through broad-band filters was the high-redshift interpretation of the spectral line plausible. A single spectral line by itself would not have provided substantial evidence for any galaxy’s redshift, since the spectra of young, intensely star-forming galaxies exhibit several prominent spectral lines at widely different rest wavelengths; identifying the detected line with any of these other lines would point to different, and generally smaller, redshifts. Pelló et al.’s claim that the spectral line detected in #1916 was the redshifted Lyman α emission of a galaxy thus critically depended on the detected discontinuity in emission between the near-infrared J and H wavebands.

However, as pointed out by a team led by Malcolm Bremer of the University of Bristol (UK), both of these detections were ‘not highly significant’ (Bremer et al. 2004, L1). Shortly after the publication of Pelló et al.’s paper, Bremer and his colleagues were granted two blocks of Director’s Discretionary TimeFootnote 8 for using the NIRI (Near Infra-Red Imager) camera at the 8-meter Gemini North telescope on Mauna Kea (Hawaii) to obtain a deeper exposure of #1916 in the H-band. In their resulting publication, Bremer et al. (2004) state that they aim to ‘better constrain the H-band photometry (…) and to investigate the morphology of the source under the excellent seeing conditions that are often attainable at Gemini-North’ (Bremer et al. 2004, L2). Thus, they write that they are not merely out to replicate Pelló et al.’s claim but seek to refine their interpretation.

Even though Bremer et al.’s (2004) Gemini NIRI observations had been taken under excellent conditions and were significantly deeper, i.e. more sensitive, than the ones taken for Pelló et al. at the VLT, they failed to detect #1916 in the H-band. Their paper is a comprehensive exercise in making sense of this non-confirmation. They did so by first re-reducing Pelló et al.’s H-band data, which they showed side by side with their deeper H-band image (Fig. 2), confirming that their photometric calibration agrees well with that of Pelló et al. Next, Bremer et al. set out to probe whether, with their method and new data, they could have accidentally failed to detect #1916. To do so, they placed artificial objects into their digital exposures and demonstrated that, using their source detection and photometry algorithms, they could retrieve the properties of these objects, illustrating the soundness of their measurements. On this basis, they called the discontinuity between the J- and H-band fluxes into question, and with it a critical piece of evidence for the redshift of 10.0. Maintaining a cautious and considerate tone throughout, Bremer et al. discuss whether #1916 may not exist or may be intrinsically variable, and consider whether a transient object in the outer solar system could have been spotted in some exposures. They conclude that ‘the reality of any source at this position [of #1916] has to be strongly questioned’ (Bremer et al. 2004, L4).
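
The logic of such an injection-and-recovery test can be sketched compactly. The snippet below inserts Gaussian point sources of known flux into an image and counts how many are recovered by a detection step (here the sep package stands in for the critics’ own algorithms); the source model, flux level and detection settings are illustrative assumptions.

```python
import numpy as np
import sep


def recovery_fraction(image, flux, fwhm_pix, n_trials=100, thresh=1.5):
    """Inject artificial point sources of known flux and report the fraction
    recovered by the detection step: a simple completeness test of the kind
    described above. Source shape and all settings are illustrative."""
    rng = np.random.default_rng(1)
    sigma = fwhm_pix / 2.355
    ny, nx = image.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    recovered = 0
    for _ in range(n_trials):
        x0 = rng.uniform(20, nx - 20)
        y0 = rng.uniform(20, ny - 20)
        source = flux * np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))
        source /= 2 * np.pi * sigma ** 2            # normalize to total flux 'flux'
        frame = np.ascontiguousarray(image + source, dtype=np.float32)
        bkg = sep.Background(frame)
        detections = sep.extract(frame - bkg.back(), thresh, err=bkg.globalrms)
        if np.any(np.hypot(detections["x"] - x0, detections["y"] - y0) < fwhm_pix):
            recovered += 1
    return recovered / n_trials
```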

Fig. 2

Figure 1 of Bremer et al. (2004), showing their re-reduction of Pelló et al.’s (2004a, b) H-band image taken with ISAAC at the VLT (right) along with new H-band observations made with the NIRI camera at the Gemini North telescope at Mauna Kea (Hawaii). Bremer et al. emphasize that they have used the same display parameters as Pelló et al. Note that these images are rotated relative to those shown in Fig. 1. (© AAS. Reproduced with permission)

The lack of a detection at visible wavelengths was another piece of Pelló et al.’s evidence for the high redshift of #1916, an argument informed by model spectral energy distributions of young star-forming galaxies. To probe this further, members of Bremer’s team, now under the lead of Matt Lehnert of the Max Planck Institute for Extraterrestrial Physics in Garching (Germany), succeeded in obtaining Director’s Discretionary Time at the VLT for additional deep imaging in the (visible) V-band. They wrote: ‘A V-band detection would be decisive: it would demonstrate beyond any doubt that the source is not at z = 10’ (Lehnert et al. 2005, 81, emphasis in original). Unlike in their previous paper, their objective now appears to be to challenge Pelló et al.’s discovery claim. Despite going deeper than Pelló et al.’s previous V-band images, which had been taken with the Hubble Space Telescope, and despite assessing their detection limit by again placing faint artificial objects into their digital exposure and retrieving them algorithmically, Lehnert et al. fail to detect #1916 in the V-band. They note that ‘[f]ormally, a nondetection is consistent with the candidate having a redshift of 10’ (Lehnert et al. 2005, 82), and then embark on a long critical discussion of how a transient source, such as a supernova explosion or an object moving in the outer solar system, could have conspired to produce the signal that Pelló et al. claimed, finding none of these scenarios compelling.

3.4 Lost in the Noise

Yet another group of astronomers combined new observations of #1916 with a re-analysis of Pelló et al.’s ISAAC/VLT data. For an independent study of Abell 1835, Graham P. Smith of the California Institute of Technology and colleagues at the University of Arizona (USA) had been granted spectroscopic observations using LRIS, the Low-Resolution Imaging Spectrometer at the 10-meter Keck telescope on Mauna Kea (Hawaii), and infrared images taken with the Spitzer Space Telescope, a satellite observatory. These researchers were able to modify their observing run with LRIS so as to include the position of #1916, and to search for it in the Spitzer images, which had been scheduled prior to Pelló et al.’s discovery announcement. Neither of these observations yielded a detection at the position of #1916. It is noteworthy that Smith had been the principal investigator of the Hubble Space Telescope WFPC2 observations of A1835 that Pelló et al. (2004b) (re-)used.

Smith et al. (2006) then went on to re-analyze Pelló et al.’s H- and K-band data (see Fig. 3). After not detecting #1916 with what they regarded as a proper analysis set-up, they experimented with alternative algorithm settings (smoothing the images, varying the size of the detection area, etc.) to find out under which conditions Pelló et al.’s near-infrared images would yield the claimed detection. Doing so was similar to Weatherley et al.’s (2004) re-analysis of Pelló et al.’s ISAAC/VLT spectra. Smith et al. (2006) wondered how the apparently elongated shape of #1916 (as seen in Fig. 1, center of the bottom panel, and Fig. 2, right image) could be reproduced. They found that only by searching, inappropriately, for objects at an angular scale smaller than the resolution of the exposures would the stated detection at the position of #1916 be obtained. Doing so would make it one of 500 comparably large statistical fluctuations across the field, each of which could have been mistaken for a detection. They conclude that ‘there is no statistically sound evidence for the existence of #1916’ (Smith et al. 2006, 580).
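
The sensitivity of such detection runs to algorithm settings can be indicated with a small sketch; the parameter names follow the sep package, and the values are illustrative, not Smith et al.’s actual configuration.

```python
import numpy as np
import sep


def count_sources(image, thresh=1.5, minarea=5):
    """Count detections for a given significance threshold and minimum source
    area (in pixels). Setting 'minarea' well below the area of the instrumental
    resolution element lets noise fluctuations pass as 'sources'."""
    data = np.ascontiguousarray(image, dtype=np.float32)
    bkg = sep.Background(data)
    detections = sep.extract(data - bkg.back(), thresh,
                             err=bkg.globalrms, minarea=minarea)
    return len(detections)

# For example, comparing count_sources(image, minarea=9) with
# count_sources(image, minarea=1) indicates how far the number of 'detections'
# is inflated once sub-resolution fluctuations are admitted.
```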

Fig. 3

Figure 1 of Smith et al. (2006), showing re-reductions of Pelló et al.’s (2004a, b) H and K near-infrared images of the field around the position of the high-redshift galaxy candidate #1916. Note that these images are rotated relative to those shown in Fig. 1. Using the same data, Smith et al. fail to replicate Pelló et al.’s H- and K-band detection. (© AAS. Reproduced with permission)

3.5 The Toulouse Team Responds to Its Critics

Progressively faced with these accounts, the Toulouse team first endorsed Bremer et al.’s speculation that #1916 might be variable and announced a more detailed investigation (Pelló et al. 2004b).Footnote 9 Two years later they presented a comprehensive analysis of their search for distant galaxies in the fields of the galaxy clusters A1835 and AC114 (Richard et al. 2006). It includes improved photometry of #1916, which they rename as A1835#8, and, in separate online material, a newly estimated redshift: z = 7.38, which is much lower than the claim of a record redshift (Richard et al. 2006, Online Material, p. 4). Citing Lehnert et al. (2005) and Smith et al. (2006), Richard et al. acknowledge in their main paper that ‘the photometric properties of this source are still a matter of debate’ and note that ‘its nature (and hence also its redshift) presents a puzzle’ (Richard et al. 2006, 873). They drop it from their list of high-redshift galaxy candidates without addressing the alternative analyses of Bremer et al. and Smith et al., whose data had meanwhile become public.Footnote 10

All critics of the Toulouse team acknowledged communications with Roser Pelló in their publications (Weatherley et al. 2004, L29, L30; Bremer et al. 2004, L4; Lehnert et al. 2005, 84; Smith et al. 2006, 581). In their 2006 paper, the Toulouse team in turn acknowledges the ‘useful comments and discussions’ of its critics, including Graham Smith and his co-author Eiichi Egami (Richard et al. 2006, 879). A closer reading of their paper suggests that the Toulouse team’s refined data analysis is informed by their critics. The Online Materials to their paper are particularly interesting. There they describe improvements in the data reduction and attend carefully to the assessment of ‘false-positive detections.’ Not only did they now probe their completeness statistics by inserting (and algorithmically retrieving) artificial stars into their digital images (Richard et al. 2006, 867), as Bremer et al. (2004) had done (see above). They also argue for a careful analysis of the noise properties of near-infrared images that echoes the comments and recommendations of Smith et al. (2006). These Online Materials thus communicate the Toulouse team’s adoption of specific sequential operations for working with near-infrared exposures that had first been employed by secondary data users. In this way, members of the Toulouse team repaired (or corrected) their data analysis practices.

On September 27, 2010, the European Southern Observatory added a note to the 2004 press release on its website, stating that the ‘identification of this object with a galaxy at very high redshift is no longer considered to be valid by most astronomers.’Footnote 11

4 Discussion and Conclusions

This discovery claim and its subsequent dismissal constitute an episode of astronomical data journeys that involved 18 astronomers and data from seven different detectors attached to four large ground-based telescopes (one in Chile, three in Hawaii) and two satellites. These diverse data ‘met’ in ‘cartographic’ digital images, as well as in discipline-specific representational spaces: in tables listing measured radiation fluxes as a function of wavelength, and in their graphical representation as spectral energy distributions (SEDs), typically with model SED shapes overlaid (as in Fig. 1 of Pelló et al. 2004a). Once their proprietary period had ended, Pelló et al.’s (2004a) ISAAC/VLT data were successively re-analysed in the light of additional observations, and the question turned to what Pelló et al. had done with the data to see what they saw. Given that their observations were done in service mode, Pelló et al. did not have preferential access to the local context of data production at the observatory.

Seeing (or not seeing) #1916 in the reduced images was distinctly shaped by specific equipment and work practices (cf. Lynch 2013). Bremer et al. and Smith et al. present images of their re-reductions of Pelló et al.’s VLT/ISAAC data used for the discovery claim alongside reductions of their supplementary data. The critics insist that one has to make specific, identifiable and describable mistakes to make #1916 visible as a high-redshift galaxy. Weatherley et al. (2004) claim that the presumed spectral line becomes visible only when three hot pixels are not properly deleted from the data set, and Smith et al. (2006) find that only when parameters are set to values they consider inappropriate does the search algorithm identify #1916 as a proper source. All participants agreed that at least two lines of evidence were necessary to claim the discovery of a high-redshift galaxy, a shared demand for the robustness of evidence (see the chapters by Halfmann, Parker and Wylie).

Pelló et al.’s (2004a) discovery announcement elicited the critical responses and was thus generative of a sequence of actions. The unfolding ‘text-reader conversation’ (Smith 2001) was marked by a series of comparisons involving re-analyses of Pelló et al.’s VLT/ISAAC ‘raw’ data (as available on the observatory website) and re-assessments of the initial detection. The results of these re-analyses were made witnessably visible (see Figs. 2 and 3). This conversation was not entirely virtual, with scientists reading each other’s papers and working with the original data set in different ways. As mentioned above, all critics acknowledge communications with Roser Pelló, the lead author of the Toulouse team.Footnote 12

While any description of action is unavoidably incomplete at some level of detail (Livingston 2008, 161), the critics of the Toulouse team point to omissions of descriptive detail in the Pelló et al. (2004a) article that could challenge the replicability of its analysis. Thus, Weatherley et al. (2004, L31) miss a proper description of Pelló et al.’s bad pixel rejection methods, Bremer et al. (2004, L3) bemoan the unspecified observing time of the H-band exposure, and Smith et al. (2006, 576) note that Pelló et al. ‘neither explain how they reduced the [Hubble Space Telescope WFPC2] data nor how the detection limit was calculated.’ However, these critics claim to have nevertheless been able to re-construct what Pelló et al. had done (perhaps thanks to Roser Pelló’s clarifications; see above) – at least to their own satisfaction and to the expectation of what they themselves could be held accountable to. In this sense, the open access to data made analysis practices available for inspection by other researchers. This opens the way to a deeper mutual understanding of, and possibly agreement on, what proper procedures for using these data are.

As such, this episode can be read as an instance of the repair of data use practices. Members of the Toulouse team ended up learning from secondary users of ‘their’ data, making their revised understanding witnessable in the Online Materials of their Richard et al. (2006) article. It seems, then, that it was through the (separate) circulation of a discovery claim and the ‘raw’ data on which it was based that practices could travel from data re-users ‘back’ to those for whom the data were originally recorded. The ‘raw’ data themselves were not repaired, but remained fixed as the first element of a ‘text-reader conversation’ (Smith 2001). The work described was reflexive, inasmuch as past actions were re-interpreted in the light of new data and analyses, and made witnessable as such. In terms of its mediated character and its episodic temporality, which extended over 2 years, the repair of practices in this episode was markedly different from conversational repair or instructional correction (Macbeth 2004; Schegloff 2006). However, as argued previously for cases of maintaining, or re-establishing the functioning of, motor boats (Sohn-Rethel 1990), buildings (Henke 2000), scientific instruments (Schaffer 2011), infrastructures (Graham and Thrift 2007) and credibility (Sims and Henke 2012), the notion of repair is illuminating in its orientation to social, material and natural orders.

The architecture for observation that I described in Sect. 2 provided resources for the assessment and repair of data and data use practices. First, there are its objectual features. The ‘immutability of the heavens’ was already instrumental in assembling the data set that the Toulouse team gathered over a period of 2 years (Pelló et al. 2004a). The use of celestial coordinates for achieving reference was not described as being problematic in this episode. Only with respect to the possibility that Pelló et al. may have detected a transient source were time-variable phenomena, such as small objects moving in the outer solar system or supernovae, invoked (by Lehnert et al. and Smith et al.) as interpretive resources.

Secondly, there are its technological and medial features. The importance of the digitality of data is illustrated not only by the data’s apparent mobility (through information infrastructures), which – like the FITS data format – is presumed throughout and not mentioned in the publications cited, but also by the possibilities of analysis afforded by this medium, including Smith et al.’s experiments with inserting artificial objects into their images and their detailed assessment of the statistical properties of noise in their infrared images. The Toulouse team later adopted these techniques.

Thirdly, this episode was shaped institutionally by the open access to Pelló et al.’s VLT/ISAAC data after the proprietary period, which made it possible for others to reconstruct and criticize their actions. Moreover, with the exception of having access to the data earlier, Pelló et al. used ESO’s data archive just like those who later scrutinized, and contested, their discovery claim.

The possibility of re-using data for making sense of what the Toulouse team had done to see what they saw arguably contributed to avoiding a discourse in which a discovery claim was directly confronted with counter-evidence, resulting in its dismissal. As interest turned from the presumed discovery of a specific galaxy at record distance to the viability of the method of using galaxy clusters as ‘gravitational telescopes’ for such work, the reputation of the Toulouse team was not damaged beyond repair. Indeed, its members have continued to do much-respected research in the field.Footnote 13 Since their data had been taken by observatory staff in service mode, Pelló et al. could not be blamed for lacking technical skill or manipulative intentions in producing their data. Although Pelló et al. were informally blamed for having issued an overly bold and ultimately mistaken claim, nobody accused them of fraud. Galison (2003) and Leahey (2016) have pointed out that scandals of fraud are rare or even absent in contemporary astronomy, ascribing this mostly to the dearth of commercial interest and the large team sizes in the discipline. Going beyond this claim, it seems that if there is a particular ethos of sharing in astronomy, it may well be constituted by the ‘tyranny of accountability’ (Enfield and Sidnell 2017) of this work with open-access data in astronomy’s architecture for observation.