In 1993, the Sanger Institute—initially named the Sanger Centre—was founded in Britain as an institution to carry out large-scale genome mapping and sequencing.Footnote 1 It represented a significant departure from the strategy of the UK Human Genome Mapping Project (HGMP) and from the prior contribution of British laboratories to the European Commission’s Yeast Genome Sequencing Project. The Sanger Institute, instead, aligned more with the objectives of the US Human Genome Project (US-HGP) and the genome centres that both the US Department of Energy and National Institutes of Health were establishing to fulfil them. Rather than mapping and sequencing modest amounts of DNA—chiefly at the request of other laboratories—the Sanger Institute undertook to sequence substantial parts of the whole genomes of yeast, Homo sapiens and the worm Caenorhabditis elegans at its own initiative.

This chapter shows how the emergence of the Sanger Institute changed the landscape of genomic science internationally. We start by situating its origins in the rise of the Wellcome Trust from a modest British charity to a major biomedical funder, one that crucially shaped the contest between state-supported and commercial institutions over the ownership of the human genome map and sequence. The Wellcome Trust allied with the UK Medical Research Council (MRC) in supporting the establishment of a large-scale centre in the UK where John Sulston could finish the sequencing of C. elegans and contribute to the completion of the yeast and human reference genomes.

We stress the pivotal role of Sulston, along with Wellcome Trust manager Michael Morgan, in the conception and ethos of the Sanger Institute. This institution consolidated the factory-style operation that Sulston had envisaged for the sequencing of C. elegans, as a departure from the way he and his collaborators had constructed the physical map of this organism (Chap. 2). The Wellcome Trust, as the main funder of the Sanger Institute, provided Sulston with the necessary financial and organisational flexibility to overhaul the à la carte, community service approach embodied in the mapping of C. elegans and later in the HGMP. As a result, the Sanger Institute avoided establishing itself as an academic institution connected to a university or research council, and instead became a scientific centre that was managed by a charitable company, which was called Genome Research Limited. Due to this, an emphasis on efficiency and accountability in map and sequence data production became central to the identity of the Sanger Institute: these objectives were prioritised over contributing to answering research questions.Footnote 2

The chapter finishes by showing how, three years after the Sanger Institute was launched, Sulston and the Wellcome Trust decisively pushed for the unrestricted release of human sequence data at a conference—the First International Strategy Meeting on Human Genome Sequencing—held in Bermuda. The conference was attended by representatives from established genome centres in the USA and UK, as well as other institutions that were increasingly aligning with this form of conducting and organising human genomics. It led to the formation of what became known as the International Human Genome Sequencing Consortium (IHGSC), the institutional alliance that between 2001 and 2004 published the reference sequence of the whole human genome in the scientific literature and made it available in freely-accessible databases.Footnote 3

We stress how the concentration of mapping and sequencing operations in these institutions—via the channelling of grants to the IHGSC members—was key to how they coordinated the production of a single reference sequence. To this end, the IHGSC led the construction of a bespoke physical map that was designed to aid in the determination and assembly of the sequence. As a result, both the reference sequence and the map were shared products of consensus across the IHGSC, albeit in a way that involved restricted participation compared to the wider institutional involvement in the HGMP, the European Commission’s Human Genome Analysis Programme (HGAP) and earlier stages of the US-HGP. The IHGSC reference genome was also narrower than other preceding maps and sequences in terms of the variability across different individuals and human populations that it captured.

The publication of the IHGSC reference sequence, which was announced worldwide at a ceremony in the White House in 2000, led to the widespread belief that the Human Genome Project was a single, international initiative that had always sought to make the entire human genome sequence available in the public domain. The IHGSC members coordinated a number of factory-style genome centres to sequence the whole genome, and presented the reference sequence as successfully and rapidly completed in draft form in the ensuing Nature article.Footnote 4 Yet as we document in this book, the determination of the human reference sequence was but one of a plethora of initiatives and models that co-existed in the early-to-mid 1990s, some of them converging in the IHGSC effort and some being sidelined. Some sidelined models and initiatives, we argue, continued as distinct lines of genome research during and after the IHGSC endeavour. We conclude this chapter by showing how documenting these sidelined lineages allows us to move beyond the canonical history of genomics that portrays the human reference sequence as its totemic outcome. Our history also illustrates the funnelling effect that the Sanger Institute—and, more generally, the IHGSC—has had on the organisation and practice of genomics.

1 C. elegans Sequencing and the Patenting Controversy

Sulston’s C. elegans sequencing project continued a longstanding line of research on this organism at the Laboratory of Molecular Biology (LMB) in Cambridge, UK. As we showed in the previous chapter, C. elegans had been proposed as a model for investigating the genetics of development and behaviour by Sydney Brenner, and this became an early line of research at the LMB, which was founded in 1962. The worm C. elegans has since become established as a widely-adopted model organism. The requirements of the growing community of researchers investigating C. elegans genetics provided the basis for an international project to construct a physical map of its genome and later to determine its reference sequence, which was led by Sulston and Robert Waterston of Washington University in St Louis (WU). Yet the sequencing project, which started in 1989, substantially differed from the mapping exercise that the same scientific team had initiated five years earlier.

Firstly, Brenner had left the LMB in 1986 to establish a Molecular Genetics Unit at Addenbrooke’s Hospital in Cambridge, where he first proposed the HGMP. This had led Sulston and his LMB associate, Alan Coulson, to gain control of the C. elegans mapping project. Under Brenner’s leadership and before the start of the mapping, Sulston had compiled the lineages describing every cell division during the worm’s embryonic and post-embryonic development. Coulson had joined the C. elegans team from another LMB division headed by the inventor of the first DNA sequencing techniques, Frederick Sanger. Just before the start of the mapping effort, Coulson had applied Sanger’s sequencing methods to the determination of the full genome of various microorganisms, among them the PhiX174 and lambda bacteriophages (García-Sancho, 2010, p. 306ff). For Sulston and Coulson, these prior experiences cultivated a different vision of how to organise the description of the worm’s genome. Rather than following the requests of other laboratories seeking to locate genes—as had been the case with the C. elegans mapping project and would be so with the HGMP—they sought a comprehensive characterisation of the worm’s genome in which sequencing would be conducted at their own initiative.

Secondly, Sulston and Coulson’s vision aligned with the strategy that James Watson was starting to formulate at the US National Institutes of Health (NIH) Office for Human Genome Research. The start of the C. elegans sequencing project was preceded by a meeting between these three scientists and Waterston, who had commenced his involvement in the worm mapping project from his position at WU. By the end of the 1980s, Watson was starting to deploy the model of large-scale genome centres through which the NIH would contribute to the US-HGP, alongside the national laboratories of the US Department of Energy. One way in which he sought to increase the scale of genomic work was by supporting groups willing to tackle the full genomes of other organisms, in order to transfer the know-how developed in these efforts to the sequencing of H. sapiens. Boosting the groups at the LMB and WU was thus a priority for Watson.

Watson saw Sulston, Coulson and Waterston’s presence at the 1989 biennial symposium of the C. elegans community as an opportunity. The meeting was held at Cold Spring Harbor Laboratory, an institution that Watson directed alongside his newly-acquired NIH role. Watson approached the three worm researchers and invited them to apply for funding through the NIH Office, which that same year became the National Center for Human Genome Research; it was later redesignated as the National Human Genome Research Institute (NHGRI).Footnote 5 The funding would jump-start a comprehensive sequencing operation of the genomes of C. elegans and yeast: the former at WU and the LMB, and the latter at WU and Stanford University (Chap. 2). The NIH would support the sequencing of several yeast chromosomes and 3 million of the 100 million nucleotides of the worm’s genome, as long as the UK Government committed to providing two-thirds of the funding on the LMB side.

Watson’s plan was that this initial funding could subsequently be extended to broader areas of these and other genomes. As discussed earlier, Sulston describes the Cold Spring Harbor meeting in his memoirs as a “prison door” moment that shaped his scientific life forever: from the mapping and sequencing of the worm to the sequencing of the human genome (Chap. 2, see also Sulston & Ferry, 2002, p. 13). In what follows, we argue that the alignment of Sulston and Watson’s visions enabled the genome centre model to expand and gradually acquire international influence. More importantly, this alignment started narrowing the array of practices and institutional configurations that were considered to be genomic science.Footnote 6

Sulston and Coulson accepted Watson’s proposal and approached the MRC for the British tranche of funding. In their application, they presented a three-phase operation of which the MRC grant, if awarded, would only support the first two. These first two phases comprised “testing technical and managerial procedures” and the sequencing of about one million nucleotides of the C. elegans genome over three years. The work would develop at “extensions” of Sulston and Coulson’s existing laboratories and require the purchase of automatic instrumentation. The team would comprise three technicians and four scientists, including Sulston and Coulson, who would combine large-scale sequencing with the continued refinement of the physical map. The third phase would extend the sequencing endeavour to the whole C. elegans genome and be conducted at a “factory setting” to be established at “industrial estates” or other areas outside of traditional academic centres. By then, the team members of phases one and two would “move forward as team leaders” and additional “relatively unskilled” junior staff would need to be hired.Footnote 7

Sulston and Coulson explicitly stated in the application that they had been “encouraged to expect extensive support” from the US-HGP, this being the reason for a whole phase of their proposed operation starting after the MRC grant had concluded. In this regard, they argued that the infrastructure and “scale” required for the third phase could not be funded “in its entirety through the normal granting procedures”.Footnote 8

The MRC funded the proposal through the first cycle of grants awarded by the HGMP (1989–1992). Yet Sulston and Coulson anticipated that there would be no funding mechanisms, whether from the HGMP or any other usual biomedical grant-giving body, to further develop the sequencing of C. elegans once phase two had finished. The worm sequencing project had already been an outlier compared to the other HGMP grants (see examples in Chap. 3). This divergence increased in 1992, when the already expanded facilities of Sulston and Coulson needed to be transformed into an industrial sequencing facility. The duration and level of support that Sulston and Coulson required, as well as the factory-style institution they envisaged, were incompatible with the HGMP grants and their remit of funding targeted mapping and sequencing work at existing academic laboratories. Furthermore, by the early-to-mid 1990s, the HGMP was becoming increasingly involved in controversy.

One layer of controversy was between the interests of human and medical geneticists on the one hand, and the goals of the HGMP on the other. Following the first round of grants, the MRC and the advisory committees of the HGMP recommended tightening the funding criteria, in order to ensure that work supported by the Directed Programme of awards would address areas of the human genome that had not already been targeted by gene mapping projects. Given that the diseases and other traits on which British laboratories worked implicated a limited number of human genes, their research priorities started to diverge from the requirements of the Directed Programme. As a result of this, unless the applicants artificially expanded the areas they worked on, the HGMP Resource Centre would either face duplicate mapping requests from the grant holders or receive data on gene loci that did not fill the gaps in the human genome database (Balmer, 1996).

The second controversial issue was less internal to the HGMP and affected the ownership and patentability of DNA sequences. Following the launch of the US-HGP in 1990, the NIH allowed patent applications to be filed for DNA sequences comprising or connected to human genes. This move aligned with the scientific policies that had led to the emergence of the biotechnology industry the decade before, spurred during Ronald Reagan’s Administration and continued in George H. W. Bush’s (Rasmussen, 2014; Yi, 2015). The patenting of sequences triggered a heated debate, with some scientists and administrators vehemently opposing the creation of proprietary rights on such fundamental data. Among the fiercest critics of these practices was Watson, who in 1992 resigned from his US-HGP leadership position in protest. Although the NIH subsequently changed its policy and increasingly discouraged the patenting of the results of sequencing that it funded, other scientists and institutions welcomed the exercise of property rights on DNA sequences.

This was the case for Craig Venter, a biochemist initially based at the NIH Institute of Neurological Disorders and Stroke. Like many yeast genomicists, in the 1980s he realised that emergent DNA techniques would enable him to turn his attention from examining functionally-relevant proteins to identifying and analysing the genes involved in their synthesis. In 1992, Venter left the NIH due to his growing frustration with what he perceived as a conservative attitude: neurogeneticists and administrators in his home institute continued to focus on a set of pre-defined brain conditions rather than using recombinant and DNA sequencing techniques to find genes on a larger scale (Venter, 2007). He founded a charitable organisation called The Institute for Genomic Research (TIGR), which would go on to generate a large number of human DNA sequences that were potentially linked to diseases with a genetic basis.

The sequences were patented by TIGR and licensed exclusively to its partner biotechnology company, Human Genome Sciences (Jackson, 2015). The method that Venter used to locate and determine them involved producing Expressed Sequence Tags (ESTs). It yielded similar results to the complementary DNA (cDNA) sequencing approach that Ross Sibson was pursuing at the HGMP Resource Centre (Chap. 3). This activity led the MRC to also patent its cDNA sequences, in spite of the growing scientific and public controversy. Both MRC officers and HGMP staff justified the patents as a defensive move that would “protect the sequences” from proprietary commercial exploitation by Venter or any other entrepreneur.Footnote 9

The growing commercial interest in DNA sequences also affected the C. elegans project. In 1992, when the first two phases of the sequencing operation were close to completion, Sulston and Waterston were approached by Frederic Bourke, a US entrepreneur who wanted to enter the biotechnology market after a successful career in the retail industry. Bourke proposed the creation of a company that would complete the sequence of C. elegans and tackle the human genome. The firm would be based in Seattle, where yeast genome mapper Maynard Olson was moving after the University of Washington had established a Molecular Biotechnology Department with funding from IT tycoon Bill Gates (Chap. 2). Waterston and Sulston were never fully convinced of the feasibility of Bourke’s proposal. They both preferred to continue to be state-supported scientists, but the end of the MRC and NIH grants was causing uncertainty about their next move.Footnote 10

This led Sulston to discuss his future prospects with Aaron Klug, a structural biologist who had succeeded Brenner as director of the LMB. They both believed that in order to undertake the third stage of the worm project, Sulston would need a funding scheme committed to large-scale and comprehensive genome sequencing. Given the much more specific remit of the HGMP, Klug offered to mediate between the MRC and the Wellcome Trust, a charity that by the early-1990s was substantially reconfiguring its strategy and involvement in genomic science. These conversations led to the realisation of Sulston’s envisaged factory-style operation, in the form of a genome centre that undertook significant chunks of the whole-genome sequencing of C. elegans, yeast and H. sapiens. The new British centre aligned with the large-scale genome centre model of the US-HGP, and distanced itself from the distributed approach of the HGMP and the European Commission.

2 The Wellcome Trust and its Advocacy for a ‘New Genetics’

The Wellcome Trust’s status as a biomedical funder predates genomics research, the establishment of the LMB and the emergence of molecular biology. This charity was created in 1936, following the death of Henry Wellcome, owner of the British-based pharmaceutical company Burroughs Wellcome, which was later renamed as the Wellcome Foundation. The Wellcome Trust took ownership of the pharmaceutical company with the charitable mission of using its revenues to advance medicine through support for research (Hall & Bembridge, 1986). In 1986, it began a new strategy for its charitable work that consisted of gradually selling the shares of the pharmaceutical company and reinvesting the income. This strategy, engineered by the Trust’s new director of finance Ian Macgregor, meant that if the investments were successful, the revenue would generate potentially endless capital. This capital could then be used by the Trust to fund research and operate independently from the Wellcome Foundation. As the sale of shares increased throughout the late-1980s and early-1990s, so did the independence of the Trust, its resources to invest and, ultimately, its ability to function as a funding body.

This period of considerable growth coincided with important developments in genetics, a substantial part of which derived from the application of recombinant DNA techniques to medical research. In 1991, the Wellcome Trust appointed an expert group to advise on how best to support and seed the new genetic medicine. One of its first interventions was funding, along with the European Commission’s HGAP, the holding of chromosome mapping workshops in Europe. Yet as its investment revenues rose, the Trust became keen to distinguish itself from other charities focused on specific diseases and conditions, such as the Imperial Cancer Research Fund (ICRF). During the early-to-mid 1990s, at the same time that the ICRF became a main driver and participant in the HGMP, the Wellcome Trust developed a strategy with its advisory group to fund the establishment of research centres bridging genetics and medicine across the UK.Footnote 11

This was the context in which Klug’s mediation between the Wellcome Trust and the MRC took place. During the first half of 1992, he brokered a series of meetings between teams headed by Dai Rees and Bridget Ogilvie, chief executive of the MRC and director of the Wellcome Trust, respectively. They agreed that the Trust’s strategy of supporting new institutional settings for genetic medicine squared with Sulston’s aspiration for a factory-style genome centre. Furthermore, they concurred that in the light of their remits and available resources, the MRC was prepared to fund work on model organisms with smaller genomes—such as C. elegans—while the Wellcome Trust would financially support the mapping and sequencing of the human genome.

The next step was to visit Sulston’s group at the LMB and ask him for a detailed proposal that would be presented to the Wellcome Trust’s genetics advisory group, and then be externally reviewed. Michael Morgan, the Trust’s director of research partnerships and ventures, acted as the liaison between Sulston and the advisory group.Footnote 12 In July 1992, as the proposal was being reviewed, the financial capacities of the Wellcome Trust increased significantly when its chairman, Roger Gibbs, sold another tranche of Wellcome Foundation shares, leaving the Trust’s holding below 50%.

The proposal, submitted in the summer of 1992, argued for the establishment of a “new centre” that would be named after Frederick Sanger, the inventor of the first sequencing techniques and Coulson’s first line manager at the LMB. This new institution would grow “out of the C. elegans sequencing project” and become “a facility” that would “sequence and interpret a substantial part of the human genome”. As an “interim” goal, the Sanger Institute would “contribute heavily to the sequencing of the yeast genome” to help finish that project “within two to three years”, ahead of the European consortium’s schedule. Another key difference from both the European consortium’s approach to yeast sequencing, and the HGMP and Venter’s approach to human sequencing, was that rather than setting an embargo period for the release of the data or patenting the results, the Sanger Institute would aim for rapid dissemination of the maps and sequences it produced “to the public domain”.Footnote 13

The structure of the institute would comprise a “technology core” conducting DNA mapping and sequencing on a “large-scale” and “quasi-industrial basis”. This would be headed by a senior scientist and run by technical staff, many of whom would be “unskilled”. The core would be at the centre of operations, serving distinct sections devoted to C. elegans, yeast and human genome work, as well as informatics and cDNA sequencing (Fig. 4.1). The informatics section would assemble sequences from various DNA fragments, annotate genes within the strings of nucleotides (Chap. 6) and organise and store the information in databases. The cDNA section would “test” the value of the genome projects by using some of the mapping results to generate sequence data and address “biological research topics”, especially in the field of neuroscience. Biological research, however, was planned to represent only a small fraction of the overall activity of the Sanger Institute: about 10%. The remaining 90% would focus on large-scale mapping and sequencing across the whole genome rather than targeting smaller areas using the cDNA approach.Footnote 14

Fig. 4.1
[Figure: a block diagram of the technology core and the sections it serves (human 1, 2 and 3, cDNA studies, yeast, informatics, nematode, and one section marked with a question mark), above a group photograph of eight people.]

Above, a diagram included in the 1992 application to establish the Sanger Institute, showing how its structure and the distribution of work between the different mapping, sequencing and bioinformatics sections was envisaged. Below, a picture of the Board of Management of the Sanger Institute, including John Sulston (front row, second-from-left), Alan Coulson (back row, second-from-right) and Bart Barrell (back row, far-left). The roles of David Bentley (front row, second-from-right), Jane Rogers (front row, far-left), Murray Cairns (back row, second-from-left), Mike Stratton (back row, far-right) and Richard Durbin (front row, far-right) are discussed later in this chapter, and also in Chaps. 5 and 6. Above image: retrieved from Papers and Correspondence of Sir John Sulston, Wellcome Library, London, file number PP/SUL/B/1/1/1/2, p. 3; reproduced with permission from the Wellcome Library. Below image: retrieved from Waterston and Ferry (2019, p. 437, Figure 7); reprinted with permission from Wellcome Sanger Institute

Once the Sanger Institute was established, Coulson was appointed head of the C. elegans section, while Bart Barrell, a researcher who had also worked with Frederick Sanger on the development of sequencing techniques at the LMB, led the yeast genome effort. Sulston became responsible for the technology core and, from this position, was able to coordinate the whole institute (Fig. 4.1).Footnote 15

Another early recruit who decisively contributed to the Sanger Institute’s human genome work was David Bentley. Bentley had also started his career in Frederick Sanger’s LMB division, where he spent the first year of his PhD. While Bentley was conducting research there, Coulson and other co-workers were sequencing bacteriophage viruses using the newly-developed DNA sequencing techniques. Following the institutional migration of his supervisor, Bentley completed his PhD at Oxford and obtained his first academic positions in London. By the time of the Sanger Institute proposal, he was based in the Division of Medical and Molecular Genetics of Guy’s Hospital, London. The head of Bentley’s department was Martin Bobrow, a respected medical geneticist and member of the Wellcome Trust advisory group. Bobrow’s research focused on the mapping of genes connected to different conditions, among them haemophilia and muscular dystrophy.Footnote 16

Bentley used a promising technique called positional cloning to identify new gene–disease associations, most notably for immune disorders arising from mutations in the X chromosome (X-linked disorders). This technique had emerged in the late-1980s and enabled researchers to find disease-associated genes by progressively homing in on the genetic location of the putative gene. The identification and mapping of DNA markers surrounding the gene on the chromosome enabled researchers to narrow down the search to specific genomic regions. This was followed by sifting through the handful of genes in the region to find mutations that presented in patients suffering from the disorder. Positional cloning avoided the need for gathering the information that geneticists had traditionally needed, such as details about the protein involved in a genetic condition, in order to deduce the function of a gene and then identify it. At Guy’s Hospital, with funding from the HGMP, Bentley also initiated studies to develop shared resources for genome mapping, concentrating first on chromosomes 22 and X. These projects later became central to the production of the human reference sequence at the Sanger Institute.Footnote 17

Positional cloning, as with Venter’s EST method, enabled geneticists to scroll human chromosomes in search of different genes. A driving force behind this technique was Francis Collins, who started developing it at Yale University, the home institution of Frank Ruddle, co-founder of the journal Genomics and promoter of the chromosome workshops (Chap. 3). Collins later moved to the University of Michigan, where he used positional cloning in the mapping of various disease genes, including the mapping of the gene responsible for cystic fibrosis to chromosome 7 (Sferra & Collins, 1993).

Positional cloning and the EST method thus enabled Collins, Bentley and Venter to shift from one gene or chromosomal region to another. Rather than being constrained by a focus on a specific disease and having to determine the presence of particular proteins or other biomolecules, these three researchers could now move across conditions and research problems more easily. Yet the traditional funding regime and institutional organisation of medical genetics limited this multi-locus chromosome scrolling approach. In 1993, just months after Venter left the NIH to found TIGR, Bentley moved to the Sanger Institute with his Guy’s Hospital colleague, Ian Dunham. That same year, the NIH appointed Collins as director of the National Center for Human Genome Research (later NHGRI) following Watson’s resignation. In these three institutions, the researchers would address the whole human genome rather than looking for specific genes or regions connected with particular conditions.

Dunham and Bentley brought some of their Guy’s Hospital collaborations to the Sanger Institute, as well as their interest in chromosomes 22 and X. Yet, their remit at the Sanger Institute was to map and sequence the whole chromosomes rather than helping to locate specific disease-associated genes. This created tensions with some of the community of medical geneticists, especially those working on conditions or biological problems connected to chromosomes 22 and X (Sulston & Ferry, 2002, p. 131). The early-to-mid 1990s had witnessed a continuation of the policies with which Reagan in the USA and Margaret Thatcher in the UK had attempted to nurture the biotechnology industry the decade before: the encouragement of private sector investment in developing applications of publicly and charitably-funded biomedical research (Myelnikov, 2017). Due to this, some geneticists regarded the comprehensive efforts at the Sanger Institute as a threat to their means of acquiring scientific credit and funding prospects. By releasing their results in the public domain, Bentley, Dunham and their whole-chromosome teams could devalue the publication of sequence data in medical journals or the patenting of gene detection techniques that could be licensed to companies manufacturing diagnostic kits.

The key difference underlying these tensions was the Sanger Institute’s ambition of sequencing the whole genome rather than restricting its efforts to the traditional target of medical geneticists: protein-coding regions of the chromosomes corresponding with genes implicated in diseases. This difference was manifested by the emphasis that Sulston had placed on his commitment to determine the yeast, C. elegans, and human sequences from one end to the other and restrict the more targeted, cDNA approach to just one unit of his envisaged new institute. Although avoiding the abundant non-coding regions may have seemed to be an “advantage” of the cDNA approach, Sulston’s proposal argued that sequencing the chromosomes in full was “self-evidently the only route to a complete understanding” of the genomes. This was, to a large extent, due to “transcription control elements” regulating the activation of genes and the synthesis of proteins being found “largely in non-coding regions”.Footnote 18

Sulston and Coulson had become aware of the importance of these regulatory regions during the prior physical mapping of C. elegans. At that early stage, during the mid-to-late 1980s, the majority of mapping requests that they received came from laboratories working on the developmental biology of the worm. Investigating the switching of genes on and off during development had been a main objective of Brenner’s initiation of C. elegans as a model organism and Sulston’s descriptions of the worm’s embryonic and post-embryonic cell lineages (de Chadarevian, 1998). These mechanisms of gene regulation and their salience for worm researchers encouraged and justified Sulston and Coulson’s 1989 proposal to sequence the whole C. elegans genome, a proposal that later informed their vision for the Sanger Institute.

The Sanger Institute was thus the result of a confluence of interests and strategic visions. First, the genome centre model that Watson and other US-HGP champions were deploying converged with Sulston and Coulson’s will to address the whole C. elegans sequence in the 1989 prison door meeting that led to the formulation of the factory-style sequencing operation. Second, the commitment to tackling full genomes made by both the US-HGP and the C. elegans sequencing project aligned with the imaginaries and aspirations of three different communities: an established base of molecular biologists seeking comprehensive descriptions of the processes underlying life; a fledgling group of developmental biologists interested in regulatory as well as protein-coding regions of DNA; and a new breed of medical geneticists who, like Bentley and Dunham, sought to move beyond the traditional focus on individual disease genes. Third, this novel and comprehensive ambition, one that promised a much broader set of beneficiaries and stakeholders, persuaded the Wellcome Trust: an emergent funder that aimed to support distinctive new models of genetics research. As the US-HGP did with Watson’s genome centre model, the Wellcome funding operationalised Sulston and Coulson’s factory-style vision. This vision would acquire a life of its own during the mid-to-late 1990s and would reposition—and eclipse—existing genome programmes.

3 Managerial Optimisation and the Whole-Genome Coalition

Following favourable reports from its advisory group and external reviewers, the Wellcome Trust, along with the MRC, agreed to establish the Sanger Institute with a start-up grant of 40 to 50 million pounds.Footnote 19 The renewal of this initial funding would be subject to financial and scientific review following the first five years of operation. The main, and almost only, criterion for this review would be progress with the map and sequence data: a strong indication of this was that the cDNA unit requested in the initial proposal was not implemented.Footnote 20

While the rationale of the MRC in starting the conversations leading to the funding of the Sanger Institute had been to avoid losing its flagship C. elegans project to the USA—given the impossibility of continuing Sulston’s sequencing initiative at the LMB—the top priority of the Wellcome Trust was making a substantial contribution to the elucidation of the human genome. This difference largely stemmed from the disparate positions of each agency in terms of funding policy. The MRC needed to support a variety of biomedical disciplines using more rigid grant schemes, while the absence of other funding commitments placed the Trust in the enviable position of being able to make a larger award that would support the mapping and sequencing of human chromosomes. This was by far the most onerous expenditure item of the Sanger Institute and led the worm and yeast work to be subordinated to the human genome, as was the case in the US-HGP.

Morgan would help Sulston with the logistic and administrative details of setting up the new institute. During the second half of 1992, they toured a number of potential locations with the clear aim of avoiding traditional academic settings. After visiting various industrial parks on the outskirts of Cambridge, London and Edinburgh, they chose the country estate of Hinxton Hall, which included several large buildings and surrounding lands. The site, ten miles south of Cambridge, had served multiple purposes since the eighteenth century, the last of them being the hosting of a suite of laboratories for a metallurgical company (Fletcher & Porter, 1997, Ch. 3). Once the Wellcome Trust purchased the site, refurbishment works ensued to develop provisional facilities where operations could be quickly started and then expanded in the longer term (Fig. 4.2). Sulston and Morgan considered that the Hinxton location would benefit from its proximity to Cambridge, London and Oxford, while keeping it independent from academic environments.Footnote 21

Fig. 4.2

Above, the first building in which the Sanger Institute operated, located in the grounds of Hinxton Hall, an eighteenth-century estate ten miles south of Cambridge. The building had previously housed metallurgical laboratories. Below, the Wellcome Trust Genome Campus that has developed at the Hinxton site from 1993 onwards, with early buildings on the left-hand side of the image. Above image: reproduced from Fletcher and Porter (1997, p. 9) with permission from the Wellcome Library. Below image: reproduced with permission from Wellcome Sanger Institute.

This institutional independence was perceived as crucial for the smooth running of the Sanger Institute. From the proposal stage, Sulston had envisaged a radically different structure from that of any academic research institution. When Morgan sought to implement this vision, his belief was that merely avoiding university campuses would not suffice, and innovative day-to-day forms of operation would need to be added to the equation. After considering different options with MRC officials and Wellcome trustees, it was decided that the legal nature of the Sanger Institute would be that of a research institution funded and managed by a non-profit company called Genome Research Limited (GRL).Footnote 22 This led to a dual governance structure with a scientific manager and a separate head of corporate services. Jane Rogers, a senior LMB administrator, was recruited for the scientific manager position, with the remit of coordinating the mapping and sequencing projects. Murray Cairns, formerly a manager in the brewing industry, became head of corporate services and liaison between the Sanger Institute and GRL (Fig. 4.1). Both the MRC and the Trust were represented on GRL’s board of governors and oversaw the progress of the Sanger Institute (Sulston & Ferry, 2002, Ch. 3).

Sulston’s view was that the sequencing of the human genome could at least be started without the need for further technological developments. This was at odds with the initial goals of the US-HGP, which advocated a focus on mapping until the performance of automatic sequencing instruments had improved.Footnote 23 By the time the Sanger Institute opened, in 1993, its collaboration with Waterston’s WU group around the sequencing of C. elegans had entered a new phase, working towards characterising the whole genome of that worm. Yet the only comprehensive sequencing work that WU and other US genome centres were conducting was on organisms with substantially smaller genomes than that of H. sapiens.

At the same time, Venter’s patents on human sequences were growing in number. Several institutions—among them competing pharmaceutical companies—joined forces with US genome centres to determine cDNA sequences that were then released into the public domain (Hilgartner, 2017, pp. 149ff). For Sulston and Waterston, however, the best way of countering proprietary ambitions was launching a concerted effort that would sequence the entire genome and make it freely available, rather than targeting specific fragments or waiting for “some magic new” sequencing technology (Sulston & Ferry, 2002, p. 140).

Waterston was the first to articulate this urgency in a 1994 email to Sulston entitled “an indecent proposal”. Written shortly after he visited Britain, it outlined a strategy by which the genome centre at WU and the Sanger Institute could both tackle and complete the human sequence with a small number of collaborators. The plan required a commitment by funding agencies to concentrate support in a handful of carefully-chosen institutions with large-scale sequencing capacities. This implicitly challenged the more inclusive, distributed approach of the European Commission and UK HGMP, which allotted a greater degree of independence in conducting sequencing to the institutions that they funded. As Sulston put it in a meeting with Wellcome trustees in which he strongly defended Waterston’s approach, the underlying message was to “stop fiddling around” and realise that a concerted whole-genome project would be “cheaper” than “pour[ing] the budget into half efforts”.Footnote 24

Waterston’s proposal led to what began to be called the “megalomaniac” genome project (Sulston & Ferry, 2002, Ch. 4). In the months following the email, he and Sulston started pressing their funders to increase their grants either immediately, or during the next cycle of support of their centres, so they could meet their unprecedentedly ambitious sequencing targets. They also entered into correspondence with potential partners that could join them in the sequencing enterprise. These included another fledgling genome centre, the NIH-funded Whitehead Institute in Boston, as well as Généthon, based on the outskirts of Paris. Généthon was devoted to large-scale human genome mapping, and was the only European institution on a par with the Sanger Institute (Chaps. 2 and 3).Footnote 25 In Britain, the Hinxton site where the Sanger Institute was based attracted two other institutions in 1994, with the transfer of the HGMP Resource Centre from Northwick Park Hospital, and a successful bid to house the European Bioinformatics Institute (EBI).

The EBI was the result of the expansion of the first centralised database to store DNA sequences, which had been based for 14 years in the European Molecular Biology Laboratory (EMBL) at Heidelberg and was due to move to a building of its own (García-Sancho, 2011). Sulston and Michael Ashburner, a Cambridge-based computational biologist, submitted a proposal to incorporate it at the Hinxton site with support from the Wellcome Trust. The growing importance of the Sanger Institute in DNA sequencing and the advantages of having the EBI next door to this major sequence producer led the proposal to unexpectedly beat rival and less logistically-demanding bids from, among others, Heidelberg University.Footnote 26 As we show later in the book, input from the EBI was crucial in the assembly, annotation and curation of the reference sequences that the Sanger Institute and other genome centres produced (Chap. 6).

The other institution moving to Hinxton—the HGMP Resource Centre—was transformed into an interface providing training, computer access and other support for users of genomic data. This was the vision of the MRC, which developed the databases that the Resource Centre housed, and appointed new personnel to take on the fresh mission.Footnote 27 The coordination between the Resource Centre, the EBI and the Sanger Institute was, however, challenging at times, due to the differences in approach and scale between the three institutions.Footnote 28

In 1996, the Wellcome Trust convened an international meeting in Bermuda to which it invited scientists and administrators from institutions active in DNA mapping and sequencing, including many of Waterston and Sulston’s correspondents, as well as their funders.Footnote 29 One year before, and again at chairman Roger Gibbs’ initiative, the Wellcome Trust had considerably increased its financial capacity by selling its remaining Wellcome Foundation shares to Glaxo Laboratories, which had been a rival pharmaceutical company. The Bermuda meeting, as well as a number of preparatory and follow-up gatherings, has been carefully reconstructed by Kathryn Maxson Jones, Rachel Ankeny and Robert Cook-Deegan, who have documented the complex negotiations leading to the establishment of a set of principles for the free release of sequence data. These principles and further refinements have since shaped the practice of genomic science (Maxson Jones et al., 2018). Here, what we emphasise is how the meeting and its concluding principles enabled the Wellcome Trust and the NIH to operationalise Waterston and Sulston’s ambitions.

A critical mass of the Bermuda attendees agreed with the principles of making the sequence data that they determined rapidly available in open-access databases (Guyer, 1998). These databases were housed in three international repositories, located at the EBI, NIH National Center for Biotechnology Information and the National Institute of Genetics of Japan.Footnote 30 The Bermuda agreement cemented a longer-term commitment by its signatories to start a comprehensive and coordinated attack on the whole human genome sequence. The geography of data repositories, along with Sulston and Waterston’s pivotal role in promoting the rapid and unrestricted release of sequence data, placed the Wellcome Trust and the NIH in a strong position to lead this concerted effort. Venter, who attended the Bermuda meeting with TIGR colleagues, became increasingly isolated as he was one of the few participants defending proprietary rights on DNA sequence data.

Some Bermuda attendees suggested the Human Genome Organisation (HUGO) as a potential coordinator of an international whole-genome sequencing initiative. This organisation had been launched in 1988, after an encounter between Brenner and medical geneticists Victor McKusick and Walter Bodmer at Cold Spring Harbor Laboratory. Its objective was to coordinate scientists involved in human DNA mapping and sequencing, so they would collaborate and avoid duplicating efforts. HUGO, however, did not itself provide funding for mapping and sequencing enterprises. More importantly, it had been explicitly created as independent from any government-funded or transnational human genome programme. Its main activity throughout the 1990s had been organising the human chromosome workshops, which most HUGO member scientists attended with support from the Wellcome Trust and the March of Dimes, a US charity committed to pre-natal genetic screening (Bodmer, 1991). HUGO was thus too close to the distributed approach to mapping and sequencing that Sulston and Waterston had sought to overcome with the concerted, whole-genome effort stemming from the Bermuda agreement.Footnote 31

Due to this, the whole-genome coalition and their programme of work developed “more organically”. According to Mark Guyer—who became director of the NHGRI’s extramural (grant-funding) programme—the coordination was brokered by the funding agencies of the sequencing grants rather than being left to HUGO or any other entity that lacked the legal authority to manage grant funds. The necessary distribution of work and quality assessment was thus achieved through frequent meetings and even more frequent telephone calls, and involved effective collaboration among the participating scientists and funding agency staff.Footnote 32

This form of collaboration also allowed the convergence of the different human genome programmes, therefore encouraging the coalescing of scientists and funders into a single sequencing effort. As a result, the International Human Genome Sequencing Consortium (IHGSC) emerged with an initial membership of 20 large-scale sequencing centres from the USA, UK, France, Germany, China, and Japan, plus bioinformatics institutions and administrative agencies.Footnote 33 The remit of this alliance was to produce a reference sequence encompassing the full human genome, something that they did in draft form in 2000 and in more final form in 2003. The results were published in the journal Nature—the former as an initial draft in 2001 and the latter as a more finished version in 2004—and the data were released to the three open-access international repositories: the EMBL-EBI Nucleotide Sequence Database, NIH GenBank and the DNA Data Bank of Japan.

The IHGSC embodied the funnelling effect that genomics was experiencing by the turn of the millennium. The composition, size and remit of this coalition were much narrower than the diversity of institutions, programmes and modes of organisation that had proliferated in both human and non-human genomics from the late 1980s onwards. The distributed, piecemeal and more inclusive approaches of the HGMP and European Commission contrasted with this selective club, which pursued the sequencing of the whole human genome to prevent TIGR and Venter from patenting it. Watson, Sulston and Waterston’s visions converged and materialised in the IHGSC, which had the Wellcome Trust and NHGRI as its largest funders. This was reflected in the geography of the resulting coalition: twelve of the twenty large-scale centres of the IHGSC and four of the five top sequence contributors were based in the USA. The other top sequence contributor—second overall—was the UK’s Sanger Institute.Footnote 34

4 From Bespoke Map to Reference Sequence

Venter’s response to the emergent coalition was to formulate a strategy to determine the whole human sequence ahead of the IHGSC. In 1998, he became CEO of Celera Genomics, a company that launched a parallel effort to produce a whole human genome sequence. Unlike the IHGSC endeavour, this sequence would be temporarily stored in a private database, so patents could be sought. Venter used a sequencing approach that he had devised before the Bermuda meetings and successfully applied to the genome of the bacterium Haemophilus influenzae in 1995. This technique, called whole-genome shotgun sequencing, sought to enable the determination and assembly of full genomes without constructing a prior physical map, contrary to the hierarchical map-based sequencing that the IHGSC intended to execute. Along with the powerful automatic sequencing instrument that Venter had at his disposal—the ABI 3700 sequencer, produced by a company belonging to the same corporate group as Celera—the speed of this shotgun approach was a new threat to the open release agenda (Hilgartner, 2017, Ch. 7).

The NHGRI and Wellcome Trust reacted by increasing their financial support, as Waterston and Sulston had been requesting over the preceding four years. While the Trust awarded a substantially higher grant to the Sanger Institute for the period 1998 to 2003, the NHGRI channelled the US-HGP funding to the Whitehead Institute, WU, and Baylor College of Medicine, the latter hosting a new genome centre established in 1996. In 1997, the three large-scale mapping and sequencing centres that the US Department of Energy (DoE) sponsored merged into the Joint Genome Institute. These five institutions, which started to be called the Genomic 5 or G5, took the lead of the IHGSC operation (see note 34).

A problem presented by this funding boost, and by the advent of the IHGSC more generally, was how to combine this rapid, concentrated and comprehensive sequencing endeavour with the genetic linkage and physical mapping that the US-HGP and other funders of the coalition had been supporting. Despite having grown in resolution throughout the 1990s, most of the resulting genetic linkage and physical maps had not been produced for the specific goal of aiding whole-genome sequencing. This was due to the majority of human genome programmes combining the objective of improving maps with that of supporting medical genetics communities (Chap. 3). The maps were thus focused on a limited range of chromosomal regions that contained the loci of genes connected to diseases.

Additionally, rather than creating a consensus representation that could be used to build a reference sequence, these maps had been produced with the goal of uncovering variation: the differences in the mapped regions that presented across healthy and diseased individuals. This had been the case for the maps produced by the US-HGP in its early years, as well as those funded by the French and German national human genome programmes, which shared the community support ethos of the HGMP and HGAP. Another difficulty was that those maps had been generated by communities of medically and evolutionarily inclined geneticists that were only marginally represented in the G5 institutions and in the IHGSC as a whole.

In the face of this—and the pressing competition of Celera’s approach—the IHGSC decided to produce their own bespoke maps for whole-genome sequencing. This decision enabled the development of tools that were specifically designed to support the determination of the reference genome. The maps were intended to encompass the full set of human chromosomes at sufficient resolution, to enable the identification of the ordered DNA fragments that were needed to sequence all chromosomal regions and then assemble the results into a complete reference genome. These comprehensive bespoke maps could, however, also work as platforms to which prior maps—and the information contained in them about clinically relevant variation—could be linked. The whole-genome maps were mainly produced by the same institutions that undertook the sequencing. Making connections to the more detailed maps incorporating variation, however, required collaboration with the medical and human genetic groups that had been previously funded by the national and European genome programmes.

A first step towards the construction of these bespoke maps was obtaining a library or collection of DNA fragments encompassing the whole human genome. These fragments needed to be cloned: multiplied after their insertion into a reproducing organism, so they would be available in sufficient quantity for the sequencing operation. The Yeast Artificial Chromosomes (YACs) that had been used in the mapping of C. elegans (Chap. 2) were discarded due to their tendency to contaminate the foreign DNA inserted into them. Bacterial Artificial Chromosomes (BACs) or those derived from bacteriophage virus P1 (PACs) were preferred for their greater stability, despite only allowing smaller inserts.

The size of the human genome—considerably larger than that of any other species sequenced thus far—made the production of the library a complex endeavour requiring expert knowledge and technical dexterity. This led the IHGSC to rely on an external collaborator: Pieter de Jong’s laboratory at Roswell Park Cancer Institute (RPCI). Prior to his appointment at RPCI, de Jong had trained as a biochemical engineer in Europe and worked at the Lawrence Livermore National Laboratory—one of the DoE genome centres—during the early years of the US-HGP. This had furnished him with expertise in large-scale, whole-genome mapping technologies, which he applied to the detection of mutations involved in genetic diseases (Buxton et al., 1992).

Like Venter, Collins and Bentley, de Jong belonged to a community of younger and technologically-savvy biomedical researchers who were pushing the boundaries of medical genetics from specific single-locus diseases to broader areas of the human genome. At RPCI, where he moved in 1993, de Jong and his team distributed libraries to both genetic research institutions and large-scale genome centres.

Both the IHGSC reference sequence and its bespoke physical map were largely based on the RPCI-11 library, produced by de Jong’s team. This library was obtained from the blood of an anonymous male donor chosen from a set of ten men and ten women who came forward in answer to an advert placed in The Buffalo News on 23rd March 1997—RPCI is located in Buffalo, a city in upstate New York, close to the border with Canada marked by the Niagara Falls. Although the initial IHGSC policy was to use a wider range of DNA sources,Footnote 35 in the end almost three-quarters of the total number of nucleotides comprising the draft sequence published in 2001—over 74%—came from the RPCI-11 library, with a further 17% derived from seven other libraries, four of which were produced by de Jong’s group. Overall, more than 90% of the human reference sequence was therefore derived from these eight libraries, all of them produced using DNA sourced from male donors (International Human Genome Sequencing Consortium, 2001, p. 866).

One reason for this relatively small pool of DNA was that, as Adam Bostanci has shown, the IHGSC relied on data that suggested that the sequence similarity between any two humans was 99.9%, which to them made the choice of donor irrelevant, and the use of samples from people of different ethnicities and sexes scientifically meaningless (Bostanci, 2006). Given this, it was believed that reducing the number of libraries and mainly using one would substantially simplify the task of assembling the DNA fragments, while not affecting the representativeness of the results.

The bespoke mapping effort yielded physical maps of individual chromosomes and one comprising the whole human genome. It mainly used the RPCI-11 library to ease cross-referencing between the different mapping operations, and also across the mapping and sequencing projects.Footnote 36 Yet clones from other libraries and data from other mapping endeavours were also incorporated to enhance the content of the maps for particular chromosomes. This was the case for the X-chromosome map, which used fragments from the RPCI-13 library produced by de Jong’s team from a female donor who answered the Buffalo News advert (Bentley et al., 2001). The IHGSC members also collaborated with other institutions that had experience of mapping specific “known regions” of the chromosomes they were assigned. This provided the mappers with detailed knowledge and data that complemented and, at times, corrected the results of the maps they were devising with more generic protocols (The International Human Genome Mapping Consortium, 2001, p. 935). The compilation of these maps of individual chromosomes was, however, subordinated to the task of producing an overall one, and was therefore directed towards the drive to represent the whole genome rather than constituting tailored resources for medical or human genetics.

This hierarchy was reflected in the February 2001 issue of Nature in which the first full draft of the IHGSC reference sequence was published. Along with the sequence, the journal issue included a physical map of the whole human genome and ten maps of individual chromosomes. In all of them, the authors emphasised that the purpose of the maps had been easing the IHGSC operation via the creation of a “tiling path” of ordered DNA fragments that could then be sequenced and assembled into the reference genome. One of the teams stressed that the “only prerequisite” for devising those maps was having a “centralised repository” of data about the BAC clones that were used in the sequencing. Other resources, despite providing “useful information” for the selection of clones and “validation” of the results, “were auxiliary” to the centralised library (Brüls et al., 2001, p. 948; see also Tilford et al., 2001; Montgomery et al., 2001; Bentley et al., 2001). The whole-genome map article emphasised that, although human genome mapping had been an ongoing exercise for over a decade, its scale had “increased approximately tenfold” since 1998 to “keep pace with the ramping up of the sequencing effort” (The International Human Genome Mapping Consortium, 2001, p. 934).

The whole-genome map article was signed by an International Human Genome Mapping Consortium (IHGMC) that incorporated fourteen institutions from the IHGSC, including four of the large-scale sequencing centres from the G5. The rest of the mapping consortium membership comprised de Jong’s RPCI group and five consolidated teams of cancer geneticists (Table 4.1, below). The inclusion of these geneticists was partly driven by the Cancer Chromosome Aberration Project, an initiative funded by the NIH National Cancer Institute that sought to integrate markers of the disease across different human genome maps. This led to the markers and other results of the project being nested in the IHGMC maps.Footnote 37 One of the cancer genetics teams, based in the Albert Einstein College of Medicine, coordinated the physical mapping of chromosome 12, described in the 2001 Nature issue. The mapping of the rest of the chromosomes published that year was led by the Sanger Institute, the Whitehead Institute and Genoscope, all prominent members of the IHGSC.Footnote 38

Table 4.1 Table reflecting the overlaps between the institutions represented in the First International Strategy Meeting on Human Genome Sequencing (Bermuda, 1996; left-hand column), those forming the International Human Genome Mapping Consortium (IHGMC; middle column) at the time of the 2001 publication of the reference sequence in Nature and those listed as genome centres of the International Human Genome Sequencing Consortium (IHGSC; right-hand column) in the same publication

The IHGMC strategy differed from previous mapping initiatives that the human and medical genetics communities had pursued during the 1980s and 1990s. Compared to the chromosome 7 mapping led by the University of Toronto Hospital for Sick Children that we discussed earlier in the book (Chap. 3), the maps described in the 2001 Nature issue were constructed in a less inclusive and collective way, and they were also less attentive to variation. While the chromosome 7 effort involved the collation of contributions made by a wide range of medical genetics laboratories, the IHGMC was a more selective club formed of a smaller number of institutions—mainly genome centres—that mapped larger chromosomal areas. The reasons for undertaking the mapping were also different: positioning genes or markers implicated in diseases in the case of the Toronto-led chromosome 7 initiative, and preparing the genome for sequencing in the case of the IHGMC. This meant that once genes or markers were positioned—and regardless of the rest of the chromosome being mapped—the institutions coordinated by the Toronto hospital would turn to identifying variation in these target regions, and then investigating differences in the sequences of the pertinent segments of DNA across healthy and diseased individuals. The IHGMC, by contrast, prioritised the mapping of entire chromosomes and only used variation as a second layer of information to help verify and add detail to the whole map and, ultimately, to the reference genome sequence.

Celera’s sequencing strategy was actually more sensitive to variation in the genome, despite their whole-genome shotgun technique not requiring initial physical mapping. It was based on five blood donors selected from a pool of twenty-one, three of whom were female and two male—one of them was Venter. The company stated its commitment to a sequence that should be “a composite derived from multiple donors of diverse ethnic backgrounds”—one of the five selected volunteers was African-American, one Asian-Chinese, one Hispanic-Mexican and there were two Caucasians (Venter et al., 2001, p. 306). Celera’s commercial orientation and its plan of devising a restricted-access database required that variation be easily related to the sequence. The company’s potential customers, mainly in the biotechnology and pharmaceutical industries, needed to find the sequence data useful for biomedical research.

This became especially vital when, in 2000 and after continued pressure and mediation, Celera agreed to publish its draft sequence in the journal Science and make some of the data publicly available (Hilgartner, 2017, Ch. 7). This they did against the background of the full and open release of data from the IHGSC. Consequently, Celera decided to refocus its business strategy, increasingly emphasising the development of diagnostic and therapeutic tools using the sequence data, over charging for access to its databases (Rabinow & Dan-Cohen, 2005). This resulted in an alliance with the Toronto-led chromosome 7 effort and an alignment of their collectively produced physical map and associated medical annotations with Celera’s sequence. In 2003, Celera, the Toronto team and more than 40 other institutional co-authors—mainly from medical schools and hospitals—described the sequence of chromosome 7 in detail, including examinations of regions containing clinically-relevant variation (García-Sancho, Leng et al., 2022; see also Chap. 6). At the same time as this collaboration, the IHGSC and IHGMC published fully mapped and polished sequences of each human chromosome—sometimes in collaboration with other co-authors when specific knowledge was required—ahead of the more ‘complete’ version of their reference genome, which appeared in Nature in 2004.

All of this shows that both Celera and the consortia regarded their sequences as platforms to which further information could be linked: about chromosomal positions, inter-individual and inter-species variation, and the biological implications of these. Yet their large-scale mapping and sequencing endeavours and the publicity around them, especially after the 2001 draft publications, led to the consolidation of a success narrative that has emphasised the production of the reference sequence, and overlooked the ways in which this sequence was related to, and contextualised by, other forms of biological data. In stressing and praising the abstracted reference sequence, this master narrative has also reduced the prior diversity of genomics to a few participants, modes of organisation and forms of representing variation within the resulting maps and sequences. This funnelling effect has narrowed the public perception of what genomics was, and what it has produced. It has led to the marginalisation of those programmes and institutions that did not converge in the IHGMC and IHGSC endeavours. The Resource Centre became the only surviving component of the HGMP and was increasingly overshadowed by the Sanger Institute and EBI, while the distributed model of the European Commission dissipated around the turn of the millennium (Chap. 2).

This lost, or forgotten, diversity of genomics can be retrieved by examining the processes by which reference sequences were produced and what data were incorporated in, and linked to, them. Historicising the model of genomics instantiated by the IHGMC and IHGSC endeavours has enabled us to uncover the journeys made by different forms of genomic data towards either their incorporation in, or their linkage to, the human reference sequence. By reconstructing the historicity of these journeys, as fellow scholars have done with other scientific fields (Leonelli & Tempini, 2020), this chapter has resurfaced the contribution of human and medical geneticists to the human reference genome, despite these communities being only peripherally represented in Celera, the IHGMC and the IHGSC. The next chapter examines a different historical instantiation of large-scale, concentrated sequencing: the determination of the reference genome of the pig, Sus scrofa, which largely took place at the Sanger Institute. Here, the journeys of the data underlying that reference genome present greater continuities between the production of earlier maps and the reference sequence than were exhibited by the IHGMC and IHGSC.