Abstract
Marine bioprospecting, the exploration of genetic and biochemical material from marine organisms, can support a broad range of public and environmental health applications, such as disease treatment, diagnostics and bioremediation. Marine genetic resources are important reservoirs for such bioprospecting efforts; however, the extent to which they are used commercially for natural product discovery, and the marine sources from which they are derived, are not well understood. Here we introduce the Marine Bioprospecting Patent database, a comprehensive database of marine genes referenced in patent filings. It includes 92,550 protein-coding sequences associated with 4,779 patent filings, identified by analysing all relevant records from genetic sequence databases. Three companies alone (BASF, IFF and DuPont) included sequences from 949 species, more than half of the referenced species with identified marine origin. Microbial life in the deep sea, a vast and remote biome predominantly beyond national jurisdiction, is already attracting substantial economic interest: the top ten patent holders have all filed marine gene patents referencing sequences from deep-sea life. Our findings provide an updated understanding of the marine bioprospecting landscape, contribute to the sustainable use of marine biodiversity and underscore the need for policymakers to ensure stewardship of deep-sea ecosystems.
Main
Biodiscovery, the exploration and use of the genetic and biochemical properties of biological materials, has a long and rich history. For instance, centuries before penicillin was discovered in mould in a laboratory, skin diseases were already being treated in present-day Jordan with red soils whose potent antibacterial properties have only recently been confirmed1. Other examples include traditional medicines extracted from evergreen shrubs for cancer treatment2, derivatives of the foxglove plant used to treat heart problems3, antimalarial quinine4 and fungi-extracted podophyllotoxin to treat sexually transmitted diseases5. More recently, advances in genetics and sequencing technologies have spurred unprecedented growth in the scale of discoveries. Today, bioprospecting (the search for potential products of scientific and industrial value derived from biological resources such as animals, plants and microorganisms) often involves large-scale screening, analysis and prediction of prospective biological compounds through the exploration of sequence databases, including those holding DNA extracted directly from environmental samples6.
In this context, the ocean is considered a promising but largely untapped frontier for biodiscovery7. Marine organisms have evolved over millions of years to adapt to extreme conditions of temperature, salinity, light, pressure and water flow8. These conditions, together with a far longer evolutionary history, have contributed to substantially greater taxonomic and functional diversity in marine habitats than in other biomes9. Nearly one million eukaryotic species are believed to inhabit the ocean10, and the number of archaeal and bacterial species may be ten thousand times higher11, yet most remain undescribed by science.
Despite these knowledge gaps, marine biotechnology (the use of marine organisms and their compounds for a wide range of applications across industrial sectors) has distinguished itself from the broader biotechnology landscape. For instance, while nearly half of approved pharmaceuticals are based on biological compounds produced by living organisms, success rates are two to four times higher for compounds from marine organisms7,12. Sales and licensing revenues from marine drugs have exceeded US$1 billion annually since 201113, and the prospects for further commercial growth are substantial: in 2020 alone, more than 1,400 new compounds were isolated from marine species14. Biomolecules extracted from marine bacteria and other products developed from sequences of larger marine organisms are widely used in food production, diagnostics, bioremediation and disease treatment15. Notable examples include a thermostable enzyme from the archaeon Pyrococcus furiosus used in the production of lactose-free milk16, seawater cyanobacteria toxins developed into anticancer treatment products17 and green fluorescent protein from the jellyfish Aequorea victoria18, which is used extensively as a molecular marker in medical and diagnostic contexts as well as in fundamental research.
Establishing a regulatory landscape that keeps pace with rapid advances in biotechnology, while also promoting transparency, equitable access and benefit-sharing mechanisms, has proven challenging19. The entry into force of the Convention on Biological Diversity (CBD) in 1993 was a crucial milestone, as it defined genetic resources as ‘any material of marine plant, animal, microbial or other origin containing functional units of heredity of actual or potential value’ and established the fair and equitable sharing of benefits from their use as one of the convention’s three core objectives20. In 2014, the convention’s Nagoya Protocol entered into force, providing a framework to regulate access to, and the sharing of benefits from, marine genetic resources (MGR) sampled within national jurisdictions21. Yet some two-thirds of the ocean lies beyond national jurisdiction, and it was not until 2023, after protracted negotiations, that the ‘High Seas Treaty’ was agreed, including provisions to address MGR from areas beyond national jurisdiction (ABNJ)22.
Despite these encouraging developments, the actual and potential value of MGR for marine bioprospecting remains poorly understood. Previous studies have counted marine species referenced in patents23 or GenBank24, examined sequences in international patent applications25,26,27,28 or explored biological compounds for natural product discovery29,30. What these studies have in common, however, is that they do not focus on the connection between the actors who use MGR and the potential sources for natural product discovery. They are also limited by the sparse information in patent and GenBank records about the geographical origin of gene sequences, which in many cases are referenced without naming the source species. The unevenness of these data makes it difficult to interpret the true scale, scope and trajectory of marine bioprospecting.
Here we address these gaps by creating a comprehensive database of genetic sequences referenced in marine bioprospecting patent applications filed from 1989 to 2022. In addition to systematically compiling and presenting key data about the sequences, the proteins they encode, their dates of deposition and the patent holders, we address significant data gaps by developing and applying a BlastX-based sequence similarity model to identify the likely origin of sequences from unnamed species. We also assess the biodiversity data for species currently considered unique to ABNJ and highlight the special importance of deep-sea conservation for future biotechnology focused on the innovation and development of naturally derived products.
Results
Our analysis of patent filings revealed 29,065 nucleotide sequences from 1,474 disclosed marine species across 3,636 unique patents, representing approximately 1% of all gene patents submitted to the International Nucleotide Sequence Database Collaboration (INSDC). Many patents referenced multiple sequences, and a majority included both marine and non-marine sequences (Fig. 1a). Overall, marine sequences and species represented only 16% and 15%, respectively, of all sequences and species identified within the 3,636 patents (Fig. 1b,c). For comparison, approximately 242,000 marine species have been described to date (World Register of Marine Species (WoRMS), 2022), corresponding to roughly 10% of the 2.1 million species described by science31; the 1,474 marine species found in patents therefore amount to only about 0.6% of described marine species. This suggests considerable untapped potential for marine bioprospecting (Fig. 1b and Supplementary Table 1).
Types of sequence in marine gene patents
The patent applicants who referenced the largest numbers of unique genetic sequences included both protein-coding and non-coding sequences, the former having greater potential for natural product discovery (Fig. 2). Most of the companies with a large number of applications referenced protein-coding genes that originate from multiple species, with an average length between 500 and 2,000 nucleotides. Some applicants focused specifically on MGR from a single species and predominantly referenced non-coding sequences. For instance, the Fisheries Research Agency of the National Research and Development Agency in Japan included 1,179 sequences in their patent applications, mostly originating from the Japanese eel (Anguilla japonica), yet only 127 of these are protein-coding sequences. Similarly, the Japan Science and Technology Agency referenced 5,190 sequences from the sea vase tunicate (Ciona intestinalis), only 150 of which are protein-coding genes.
Most short non-coding sequences of identical length that originate from the same species exhibit a wide range of GC content (that is, the percentage of guanine and cytosine bases), which is typical of artificially modified sequences used for amplification or as probes for detecting specific DNA or RNA sequences (Supplementary Fig. 1). Of all the patents that include at least one sequence from a disclosed marine species, 71% contain nucleotide sequences that are potentially protein-coding genes. This suggests that most patents referencing MGR are related to bioprospecting (Fig. 3a). For sequences of particular interest (that is, those submitted to all patent systems), we provide examples illustrating how DNA molecules are converted into products of value (Box 1).
Marine Bioprospecting Patent database
While INSDC records provide considerable insight into the genes referenced in patents, only 37.3% of records include the name of the source species, and these were primarily filed under the World Intellectual Property Organization (WIPO), the European Patent Office, the Patent Office of Japan and the Korean Intellectual Property Office. Most of the remaining records are from the US Patent and Trademark Office, which does not share species names in its records (Supplementary Fig. 2).
To address this gap, we developed a sequence similarity model that uses BlastX searches to query all genetic sequences of unknown origin against the UniProtKB protein sequence database. This model identified an additional 60,636 sequences that can be attributed with a high degree of confidence to marine organisms. Together with the 31,914 protein-coding sequences from confirmed marine species, this yielded a comprehensive set of 92,550 sequences, which forms the basis for all subsequent analyses in this paper and was used to construct the Marine Bioprospecting Patent (MABPAT) database (https://mabpat.shinyapps.io/main/).
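The BlastX step can be illustrated with a minimal Python sketch. It assumes the searches were run with BLAST+ and exported in the default tabular format (`-outfmt 6`), keeps the highest-scoring hit per query and attributes the query to that hit's source species if the species is on a marine list. The e-value and identity cut-offs, the `subject_species` mapping and the accessions are hypothetical placeholders; the study's actual thresholds are not given in this section.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text):
    """Keep only the highest-bitscore hit for each query sequence."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def assign_marine_origin(tsv_text, subject_species, marine_species,
                         max_evalue=1e-10, min_pident=90.0):
    """Map each query to a marine species if its best hit passes the cut-offs.

    subject_species: protein accession -> source species (hypothetical input).
    max_evalue / min_pident: illustrative thresholds, not the study's values.
    """
    assigned = {}
    for query, hit in best_hits(tsv_text).items():
        species = subject_species.get(hit["sseqid"])
        if (species in marine_species
                and hit["evalue"] <= max_evalue
                and hit["pident"] >= min_pident):
            assigned[query] = species
    return assigned


# Toy BlastX output for two patent sequences (accessions are made up).
blast_tsv = (
    "pat_seq1\tacc1\t98.5\t300\t4\t0\t1\t900\t1\t300\t1e-150\t600\n"
    "pat_seq1\tacc2\t70.0\t300\t90\t2\t1\t900\t1\t300\t1e-40\t200\n"
    "pat_seq2\tacc3\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-90\t400\n"
)
species_of = {"acc1": "Pyrococcus furiosus",
              "acc2": "Escherichia coli",
              "acc3": "Arabidopsis thaliana"}
marine = {"Pyrococcus furiosus"}

print(assign_marine_origin(blast_tsv, species_of, marine))
# {'pat_seq1': 'Pyrococcus furiosus'}
```

Taking only the best hit per query is one simple design choice; a stricter variant could require that all top hits agree on a marine taxon before assigning origin.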
Key actors in marine biotechnology
We found that 100 applicants accounted for 58% of all patents that contain protein-coding sequences with identified marine origin (that is, bioprospecting patents). The remaining 42% were associated with applicants who filed fewer than two patents on average. For companies in the top 100 (Supplementary Table 3), the total number of patent applications would have been underestimated by at least one-third if we had not applied the sequence similarity model. Transnational corporations (1,675 applications) are the most frequent type of applicant, although roughly one-fifth of filings are from research institutes and their commercialization centres (634 applications) (Fig. 3b). In total, 78% of all bioprospecting patents filed by the top 100 were submitted by actors headquartered in the USA, Germany or Japan (Fig. 3c).
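The concentration figures in this paragraph (the share of patents held by the largest applicants and the average number of filings among the rest) can be computed from a patent-to-applicant mapping. The Python sketch below is illustrative only: it assumes each patent has already been attributed to a single resolved applicant, and the function name `concentration` and the toy data are hypothetical.

```python
from collections import Counter


def concentration(patent_applicant, k):
    """Share of patents held by the k largest applicants, plus the mean
    number of patents per applicant among all remaining applicants.

    patent_applicant: patent number -> resolved applicant name.
    """
    counts = Counter(patent_applicant.values())
    top = counts.most_common(k)
    top_total = sum(n for _, n in top)
    total = len(patent_applicant)
    rest_applicants = len(counts) - len(top)
    rest_mean = (total - top_total) / rest_applicants if rest_applicants else 0.0
    return top_total / total, rest_mean


# Toy data: applicant "A" holds three of five patents.
patents = {"P1": "A", "P2": "A", "P3": "A", "P4": "B", "P5": "C"}
share, rest_mean = concentration(patents, k=1)
print(f"top-1 share: {share:.0%}, mean patents of others: {rest_mean:.1f}")
# top-1 share: 60%, mean patents of others: 1.0
```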
The number of patents registered by each applicant is correlated with the total number of unique species included in those patents (r = 0.8168, P = 2.17552 × 10⁻³¹⁸). To illustrate how much biological diversity each applicant draws upon, we linked patent holders to the unique species included in their patent claims and aggregated these links at the domain (Fig. 4a) and phylum (Supplementary Fig. 6) levels. For each flow diagram, we also indicated whether the corresponding marine species had been observed in a deep-sea environment. The most active users of MGR depend primarily on sequences from bacteria and archaea (Fig. 4a). The ten largest actors, comprising eight multinational corporations and two public research bodies (Fig. 4a), collectively registered more than one-third of all patents in the top 100. Deep-sea marine species have attracted interest from all ten of the largest users of MGR.
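The correlation reported in this paragraph is a Pearson coefficient computed across applicants. A minimal standard-library sketch follows, assuming one (patents, unique species) pair per applicant; the toy numbers are invented. `scipy.stats.pearsonr` returns the coefficient together with a two-sided p-value; this sketch stops at r and the t statistic with n - 2 degrees of freedom from which that p-value is derived, and it is not necessarily the tool the authors used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length (n >= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0; needs |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data: patents filed and unique species referenced per applicant.
patents_per_applicant = [120, 45, 30, 12, 8, 3]
species_per_applicant = [60, 30, 12, 10, 2, 1]
r = pearson_r(patents_per_applicant, species_per_applicant)
print(round(r, 3), round(t_statistic(r, len(patents_per_applicant)), 2))
```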
The opacity of marine bioprospecting in ABNJ
Issues of access and benefit sharing related to genetic material from ABNJ are of particular interest, as they fall outside the scope of the Nagoya Protocol of the Convention on Biological Diversity and were at the core of the negotiations for the High Seas Treaty adopted in June 2023. It is therefore notable that, of the 1,639 species of identified marine origin referenced in INSDC patent records, 281 have been observed in ABNJ, and only 5 of these exclusively so. This contrasts with the 5,889 species found exclusively in ABNJ, predominantly from the Arthropoda, Foraminifera and Nematoda phyla, according to our analysis of species observation data available in the Ocean Biodiversity Information System (OBIS), a global open-access database on marine biodiversity (https://obis.org). The complete taxonomic distribution is given in Supplementary Fig. 7. According to records from the World Register of Deep-Sea Species (WoRDSS), 39% of these ABNJ-exclusive species have been found only in deep-sea environments, compared with only 15% of all species listed in WoRMS (Fig. 4b). ABNJ-specific species (Supplementary Table 4) occur predominantly at sub-Antarctic and Antarctic latitudes (Supplementary Fig. 8).
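The overlap counts in this paragraph (species observed in ABNJ, species exclusive to ABNJ and the deep-sea share of a group) reduce to set operations once each species has been tagged with where it was observed. A minimal Python sketch, assuming the ABNJ and national-waters species sets have already been derived, for example by intersecting OBIS occurrence records with maritime-boundary polygons (that spatial step is not shown). The function names and single-letter species are hypothetical.

```python
def abnj_breakdown(patent_species, abnj_species, national_species):
    """Classify species by where they have been observed.

    abnj_species: species with at least one occurrence beyond national jurisdiction.
    national_species: species with at least one occurrence within national waters.
    """
    abnj_only = abnj_species - national_species
    return {
        "patented_seen_in_abnj": patent_species & abnj_species,
        "abnj_only": abnj_only,
        "patented_abnj_only": patent_species & abnj_only,
        "abnj_only_not_patented": abnj_only - patent_species,
    }


def share(subset, universe):
    """Fraction of `universe` that also belongs to `subset`."""
    return len(subset & universe) / len(universe) if universe else 0.0


patent_species = {"A", "B", "C"}
abnj_species = {"B", "C", "D", "E"}
national_species = {"A", "B", "D"}
deep_sea_only = {"E"}  # species recorded only at deep-sea depths (toy input)

groups = abnj_breakdown(patent_species, abnj_species, national_species)
for name in sorted(groups):
    print(name, sorted(groups[name]))
print(share(deep_sea_only, groups["abnj_only"]))  # 0.5
```

Computing the same `share` over all species in a reference list gives the baseline against which the ABNJ-exclusive group is compared.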
ABNJ account for 64% of the ocean surface area and 95% of its volume. Deep-sea habitats and the water column, once thought to be largely devoid of life, are now known to harbour many marine species. While many of these species are thought to be widely distributed (cosmopolitan), hotspots of endemism occur throughout the deep sea, perhaps most strikingly around hydrothermal vent systems32. Of the 721 active hydrothermal vents with known geolocations, more than half (363) are located in ABNJ.
Discussion
Marine biotechnology is mainly focused on species that serve as model organisms in basic research and as a backbone in genetic engineering, enabling the creation of new drugs and increasing the efficiency of biotechnological processes for food and energy production, plant agriculture and the invention of new materials33. Marine species currently represent a small but important share of the sources used for natural product discovery7,30. Unravelling the global scope of economic interest in MGR is a crucial first step towards understanding the value of the biological functions encoded in genetic sequences and towards pathways for the fair and equitable sharing of benefits from their use.
Patent data are a valuable source of information in examining innovation and technological advancements, which are widely acknowledged as key drivers of firm performance and economic growth34,35. Aggregate patent application counts in particular are useful for studying national patenting activity36. Patent data also provide insights into the scope of ‘pre-emptive patenting’ to block competitors, to increase the market price of existing products or to ensure operational freedom37—strategies that biotechnology corporations are known to use38. While estimating the market value of patents or establishing links to commercialization is challenging39, patent data are a useful indicator for gaining insights into the long-term economic interest of societal actors in MGR applications on a global level, in the form of either knowledge production or market control.
The MABPAT database offers a global catalogue of patent sequences derived from marine species over the past three decades. It includes in-depth information on patent applications, the genetic sequences attached to them and the marine species from which the sequences were derived, effectively connecting the resources and the users of marine bioprospecting. In doing so, the MABPAT database not only fills an important research gap but also contributes to the transparency and interoperability of MGR use. By making it publicly available, we hope to enable further research to inform improved policymaking. The analysis that generated this database also yielded three key insights, which are addressed below.
Rapid technological advances and data governance
Scholars have suggested that the earliest form of a patent system can be traced back 2,500 years to ancient Greece and that the first modern patent law dates back to the year 147440. It is little surprise, then, that the patent system has struggled to keep pace with the rapid advances in genetics and genomics research of recent decades, as seen, for instance, in the considerable variation across jurisdictions in the ground rules for patenting genetic sequences27. Key developments over the past 30 years have focused on jurisdictional norms and compliance standards. In 1998, a mandatory data element for sequence description (‘organism’), intended to indicate biological origin, was introduced for international applications41. Yet current international standards42 still allow custom organism names not listed in the Integrated Taxonomic Information System (https://www.itis.gov/), including ‘unknown’, ‘unidentified’ and ‘artificial sequence’. The new INSDC requirements43, announced in November 2021, aim to ensure correct origin disclosure for all incoming sequences. However, their effect on the 24.5 million patent sequences already stored in the databases, as well as on new depositions, remains uncertain, because it is ultimately up to patent offices to define standards for the sequences attached to patent applications (https://www.ncbi.nlm.nih.gov/education/patent_and_ip_faqs/).
The analysis of patents therefore often depends on either accepting considerable data gaps or developing methods to reconstruct missing data. In this study, for instance, 17.2 million sequences would have been excluded from the analysis owing to the lack of species names (primarily from the US Patent and Trademark Office, the largest repository of biological sequences and patents). Instead, our sequence similarity model allowed us to estimate the patent shares across countries and actor types more comprehensively. This reconstruction allowed us to identify marine origin on the basis of molecular similarity rather than disclosed species names, and to confirm with higher confidence than previous work that Japan, the USA and Germany are the headquarters locations of the world’s primary MGR patent applicants25,27. The disproportionate importance of these three states suggests a corresponding responsibility to work towards innovative benefit-sharing and capacity-building mechanisms. These could include, for instance, the establishment of a multilateral fund for the equitable sharing of benefits between providers and users of digital sequence information (DSI), which has been agreed to be finalized at CBD COP16 (ref. 44).
Importance of microorganisms and deep-sea life for bioprospecting
Marine viruses are recognized as highly prevalent in ocean ecosystems and contribute the largest pool of genetic diversity45, yet they have seen little commercial activity to date beyond a limited focus on those that affect commercial aquaculture production. However, the potential role of viruses in creating proteins of interest for marine bioprospecting could be larger than is currently thought. Viruses have shaped the majority of the genomes of Archaea and Bacteria via horizontal gene transfer, the exchange of genetic material between organisms that are not in a parent–offspring relationship46. Bacterial and archaeal species often live in symbiosis and exchange genes with microbial eukaryotes (protists)47, and together these groups constitute the vast majority of organisms used in marine bioprospecting. Importantly, many archaeal and bacterial species used in bioprospecting live in deep-sea habitats, most of which are located in ABNJ. The diversity of microbial marine species is still highly underrepresented in databases that document the distribution and abundance of marine life (Box 2). This underrepresentation may account for the lack of patenting interest in species found exclusively in ABNJ. However, even with limited data, our findings show that ABNJ-specific species are 2.5 times more likely to inhabit the deep ocean than marine species in general.
Our analysis of the past three decades of global gene patents indicates that deep-sea species have become an important source for marine bioprospecting. All ten of the largest actors in marine bioprospecting already use deep-sea species. There is therefore a case for directing benefits from MGR utilization into conservation projects aimed at protecting at-risk deep-sea habitats48, not least because these habitats are a vital source for future biotechnology focused on the innovation and development of naturally derived products. More advanced biodiversity models that emphasize safeguarding entire communities with unique functional roles, including microbial species, should also be better integrated into conservation plans49.
With the successful conclusion of the High Seas Treaty and the recognition of DSI in the legally binding agreement, the use of MGR for bioprospecting and product discovery opens a new opportunity to protect biodiversity in deep-sea habitats. However, the INSDC database, the largest data repository of DSI, is currently missing from the biodiversity informatics landscape50; as a result, data on genetic diversity and on the spatial origin of genetic information are not available at a global level. Adopting the principles of Open and Responsible Data Governance and developing MGR data repositories51 will be necessary steps to overcome the lack of information on MGR in ABNJ.
Intellectual property questions are not addressed within the High Seas Treaty, yet commercial sensitivities and national patent regulations are important for benefit sharing related to MGR sourced from the deep sea in ABNJ. While the agreed text of the treaty includes a voluntary mechanism to ensure the traceability of MGR collected from ABNJ through to end products, implementation of the treaty will not affect sequences already used in marine bioprospecting to date. As patent holders are under no legal obligation to disclose the commercialization of their patents, the scale of commercial products developed and marketed from deep-sea organisms will remain poorly understood. A continued increase in corporate interest along current trajectories would lead to unequal opportunities for new developments in biotechnology.
Multi-stakeholder collaboration in MGR protection
Our analysis of bioprospecting patents revealed an asymmetrical distribution of patent registrations, consistent with previous findings25,27. The sector is dominated by transnational corporations, which have a greater capacity to undertake genomic research. More than one-third of the patents filed by the top 100 applicants were held by the ten largest actors, eight of which are large multinational corporations and none of which conduct marine research themselves, relying instead on public gene databases for sequences with potential commercial applications. While many multinational pharmaceutical companies have marine biology departments52, their total share of bioprospecting patents is modest (Supplementary Table 3). Still, corporate engagement in marine species discovery is hard to estimate accurately. Marine scientists who study microbial diversity often collaborate with the oil and gas industry to collect samples from deep-sea oil wells53,54. With the rising popularity of remotely operated vehicles for the inspection and maintenance of offshore oil and gas sites, more science–industry partnerships are likely to emerge to support the collection of biological data in the deep sea55.
The disproportionate role of a small number of actors also suggests potential for science–industry collaboration in the spirit of previous efforts with so-called keystone actors, an approach that engages the largest companies in a given sector to enable transformative change56. Constructive efforts to promote sustainable management in ABNJ have also been undertaken by partnerships such as the Deep Seas Project (https://www.deep-seas.eu) and the Common Oceans ABNJ Project57, as well as by regional bodies such as the OSPAR Commission, the North East Atlantic Fisheries Commission and the Sargasso Sea Commission58, which have addressed challenges related to illegal, unreported and unregulated fishing and to pollution on the basis of integrated and holistic approaches. The International Seabed Authority, empowered by UNCLOS (Supplementary Text 1) to manage the resources of the seabed in ABNJ, has begun to apply tools such as Regional Environmental Management Plans (REMPs) and has designated associated Areas of Particular Environmental Interest (APEIs) aimed at conserving ecosystem function and biodiversity. The impact of such measures could be further amplified by seeking a coordinated approach in accordance with overarching environmental goals59. Such initiatives can foster cross-sectoral dialogue and capacity-building activities that improve the ability of national governments and local communities to engage in sustainable resource use in ABNJ.
Corporate efforts to safeguard intellectual property rights, significant data gaps and the heterogeneity of data standards have contributed to the use of ambiguous terminology and a lack of precision in discussions concerning MGR and bioprospecting in ABNJ. This has shaped perceptions of the scale and nature of commercial interest in MGR from ABNJ, feeding expectations of a lucrative ‘deep-sea gold rush’ without adequate empirical support for such claims60,61. While the conclusion of the High Seas Treaty has laid the foundation for improved management in ABNJ, its entry into force and full implementation are a remote prospect and, in the meantime, voluntary collaborative efforts based on the best available science can help inform future binding mechanisms to ensure conservation and sustainable use. By filling the crucial knowledge gap in understanding the potential of MGR, the MABPAT database represents a first step in that direction.
Methods
Summary statistics of patents that include MGR
GenBank, the European Bioinformatics Institute database (EMBL-EBI) and the DNA DataBank of Japan (DDBJ) exchange their data daily and together form the INSDC. Genetic sequences associated with patents were retrieved from the patent division of the NCBI GenBank database on 10 November 2022; this comprised 24,600,503 annotated sequences. All files (gbpat1.seq.gz to gbpat254.seq.gz) were downloaded and processed following the methodology of ref. 25 to create database entries containing the nucleotide sequence, species name, patent number, patent date and the party registering the patent. To do so, each file was split into individual sequence records, and for each record we extracted the ‘origin’ field (nucleotide sequence), the ‘organism’ field (species name) and the ‘journal’ field (patent application number, year of application, patent system and patent applicant name). Unlike previous studies25,27 that restricted their analysis to sequences submitted in a given patent system, we considered both patents submitted in national jurisdictions and those filed under the Patent Cooperation Treaty (‘international’ patents) of WIPO.
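The record splitting and field extraction described above can be sketched in Python with the standard library. The sketch assumes the fixed 12-character keyword column of GenBank flat files and a `Patent: <office> <number> <sequence no.> <date>;` layout for patent JOURNAL lines, with the applicant on the continuation line; the sample record and patent number are made up, real division files begin with a header block, and unusual layouts are not handled. Biopython's `Bio.SeqIO.parse(handle, "genbank")` is a more robust alternative for production use.

```python
import gzip
import re

# Matches e.g. "Patent: WO 9999999-A 1 15-JAN-1999;" -> office, number, date, year.
PATENT_RE = re.compile(r"Patent:\s+(\w+)\s+(\S+)\s+\d+\s+(\d{2}-[A-Z]{3}-(\d{4}))")


def parse_record(lines):
    """Extract organism, JOURNAL (patent) data and sequence from one record."""
    organism, journal, sequence = None, [], []
    section = None
    for line in lines:
        if section == "ORIGIN":
            # Sequence lines: a position number followed by blocks of bases.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
            continue
        key = line[:12].strip()
        if key:  # a (sub-)keyword starts a new section
            section = key
            if key == "ORGANISM":
                organism = line[12:].strip()
            elif key == "JOURNAL":
                journal.append(line[12:].strip())
        elif section == "JOURNAL":
            journal.append(line.strip())  # continuation, e.g. applicant name
    match = PATENT_RE.search(" ".join(journal))
    return {
        "organism": organism,
        "office": match.group(1) if match else None,
        "patent_number": match.group(2) if match else None,
        "year": int(match.group(4)) if match else None,
        "applicant": " ".join(journal[1:]) or None,
        "sequence": "".join(sequence).upper(),
    }


def parse_records(lines):
    """Stream records from an iterable of lines; each record ends with '//'."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "//":
            yield parse_record(buffer)
            buffer = []
        else:
            buffer.append(line)


def parse_file(path):
    """Parse a compressed GenBank division file such as gbpat1.seq.gz."""
    with gzip.open(path, "rt") as handle:
        yield from parse_records(handle)


SAMPLE = """\
LOCUS       A00001                    20 bp    DNA     linear   PAT 15-JAN-1999
DEFINITION  Sequence 1 from a made-up patent.
SOURCE      Pyrococcus furiosus
  ORGANISM  Pyrococcus furiosus
            Archaea.
REFERENCE   1  (bases 1 to 20)
  JOURNAL   Patent: WO 9999999-A 1 15-JAN-1999;
            EXAMPLE BIOTECH CORP
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 atggcgcgta aaggcgccta
//
"""

for rec in parse_records(SAMPLE.splitlines()):
    print(rec)
```

Streaming line by line keeps memory use flat, which matters for 254 division files totalling millions of records.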
As of November 2022, the GenBank patent division included sequences from a total of 14,708 different species. To determine the subset of ‘marine species’, we applied the WoRMS taxon match tool to all database entries, resulting in a filtered list of 4,000 species. Web searches were conducted for each of these species to verify its marine origin and to collect further information about its nature. More than half of the matched species were subsequently excluded as not unique to marine environments, resulting in a list of 1,474 marine species, which was used to select patent records associated with disclosed marine species. See ref. 27 for details of how marine origin was determined and of the filtering criteria.
The taxonomy (domain and phylum) of 879 marine species was retrieved from the WoRMS database. Where these taxonomic levels were not available, we obtained species taxonomy from the NCBI taxonomy database (https://www.ncbi.nlm.nih.gov/taxonomy) and Wikipedia (https://en.wikipedia.org/wiki/) (220 and 356 species, respectively). We could not assign 19 of the marine species (predominantly marine bacterial strains) to taxonomic groups owing to uncertainty in their organism names. The complete list of marine species selected for this study is given in Supplementary Table 5a.
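The fallback order described above (WoRMS first, then the NCBI taxonomy database, then Wikipedia) amounts to taking the first source that supplies both ranks; the reported tallies of 879, 220 and 356 resolved species plus 19 unresolved sum to the 1,474 marine species. The Python sketch below uses in-memory dictionaries as stand-ins for those sources (no real API is called) and its entries are illustrative only.

```python
from collections import Counter


def resolve_taxonomy(species, sources):
    """Return (domain, phylum, source) from the first source with both ranks.

    sources: ordered (name, lookup) pairs; each lookup maps a species name to
    a dict that may contain 'domain' and 'phylum' keys (hypothetical inputs).
    """
    for name, lookup in sources:
        ranks = lookup.get(species, {})
        if ranks.get("domain") and ranks.get("phylum"):
            return ranks["domain"], ranks["phylum"], name
    return None  # unresolved, e.g. ambiguous strain names


worms = {"Anguilla japonica": {"domain": "Eukaryota", "phylum": "Chordata"},
         "Aequorea victoria": {"domain": "Eukaryota"}}  # phylum missing here
ncbi = {"Aequorea victoria": {"domain": "Eukaryota", "phylum": "Cnidaria"}}
wiki = {}
sources = [("WoRMS", worms), ("NCBI", ncbi), ("Wikipedia", wiki)]

species_list = ["Anguilla japonica", "Aequorea victoria", "marine bacterium XYZ"]
results = {s: resolve_taxonomy(s, sources) for s in species_list}
tally = Counter(r[2] if r else "unresolved" for r in results.values())
print(tally)
```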
MABPAT construction
Marine biotechnology pipelines usually focus on the search for biological compounds that encode a new functionality62. DNA contains two types of nucleotide sequence: protein-coding sequences and non-coding sequences. Non-coding sequences may or may not have a functional role, for example in regulating the genes that code for proteins involved in cell functions. Except for short peptides such as cone snail peptide toxins63, most natural products are derived from proteins, which are polypeptide chains of a certain length. The minimum length of a polypeptide chain that can form a protein is still debated; current estimates range from 50 (ref. 64) to 100 (ref. 65) amino acids, corresponding to 150 to 300 DNA base pairs (three bases per codon).
Another metric widely used in molecular biology and genomics to analyse variation in genome composition is nucleotide usage, normally calculated as GC content: the percentage of guanine (G) and cytosine (C) bases in a DNA sequence. GC pairs are held together by three hydrogen bonds, compared with two for AT pairs. Modern genetic engineering techniques such as CRISPR66 have proven very useful for enhancing important protein functions by altering DNA sequences. This could involve changing individual nucleotides or introducing short sequences that control gene regulation and protein synthesis. Because such targeted edits leave most of a coding sequence unchanged, the GC content of modified protein-coding sequences with similar functionality remains largely the same. Short DNA sequences below the minimum length required to encode a protein serve various purposes, such as amplifying a specific gene sequence (as PCR primers), and usually span a wide range of GC content.
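GC content as defined here is a simple ratio, and the "wide range" among short sequences of identical length is just the spread of that ratio. A minimal Python sketch follows; excluding ambiguous bases such as N from the denominator is an assumption of this sketch, not something the text specifies, and the primer-like sequences are invented.

```python
def gc_content(seq):
    """GC content (%) of a DNA sequence, counting only A, C, G and T.

    Ambiguous bases such as N are excluded from the denominator.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_spread(seqs):
    """Range (max - min) of GC content across a set of sequences."""
    values = [gc_content(s) for s in seqs]
    return max(values) - min(values)


# Toy primer-like sequences of identical length (20 nt) with very different GC.
primers = ["AT" * 10, "GC" * 5 + "AT" * 5, "GC" * 9 + "AT"]
print([gc_content(p) for p in primers])  # [0.0, 50.0, 90.0]
print(gc_spread(primers))                # 90.0
```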
To predict whether genetic sequences are protein coding, we applied two filtering criteria: a sequence length threshold and the presence of an open reading frame (ORF), a region that has the potential to be transcribed into RNA and then translated into protein. Sequences with an ORF longer than 150 base pairs were considered protein-coding sequences. Because most natural products are derived from proteins, we considered a patent application to be related to marine bioprospecting only if it included at least one protein-coding sequence. On this basis, we selected 31,914 protein-coding sequences associated with 1,039 marine species, together with 112,115 other sequences that had been submitted as part of the same applications.
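The ORF criterion and the patent-level rule above can be sketched in Python. The text specifies only "an ORF longer than 150 base pairs"; scanning all six reading frames, requiring an ATG start and one of the standard stop codons, counting the stop codon in the length and ignoring ORFs that run off the end of a sequence are all assumptions of this sketch. Truncated patent fragments would be missed under that last choice, so a looser variant might also accept open-ended ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (other letters kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Length (bp, stop codon included) of the longest ATG...stop ORF
    over the three forward and three reverse-complement reading frames.
    ORFs that run off the end of the sequence without a stop are ignored."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def is_protein_coding(seq, min_orf_bp=150):
    """Protein coding if the longest ORF exceeds min_orf_bp
    (150 bp corresponds to the lower estimate of ~50 amino acids)."""
    return longest_orf(seq) > min_orf_bp


def is_bioprospecting_patent(sequences):
    """A patent counts as bioprospecting-related if any sequence is protein coding."""
    return any(is_protein_coding(s) for s in sequences)


coding = "ATG" + "GCT" * 50 + "TAA"  # 156-bp ORF
short = "ATG" + "GCT" * 40 + "TAA"   # 126-bp ORF
print(longest_orf(coding), is_protein_coding(coding))  # 156 True
print(longest_orf(short), is_protein_coding(short))    # 126 False
```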
For all companies that have registered patents associated with MGR, we counted the total number of nucleotide sequences and calculated the average sequence length (Fig. 2). Using the estimated minimum protein length, we classified each company's sequences as protein coding or non-coding. Within each category, for the ten companies with the most genetic sequences attached to patent claims, we calculated the length and DNA composition (GC content) of each sequence and coloured each sequence by species of origin (Supplementary Fig. 1).
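A minimal sketch of the per-company tallies follows. It assumes each record is a (company, nucleotide sequence) pair and that the 150 bp lower bound of the minimum protein length estimate is the cut-off between protein-coding and non-coding sequences; the function names and toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MIN_PROTEIN_BP = 150  # assumed cut-off: lower bound of the 150-300 bp estimate


def summarise(records):
    """records: iterable of (company, sequence) pairs, one per patented sequence."""
    by_company = defaultdict(list)
    for company, seq in records:
        by_company[company].append(seq)
    summary = {}
    for company, seqs in by_company.items():
        lengths = [len(s) for s in seqs]
        coding = sum(1 for n in lengths if n >= MIN_PROTEIN_BP)
        summary[company] = {
            "n_sequences": len(seqs),
            "mean_length": mean(lengths),
            "n_protein_coding": coding,
            "n_non_coding": len(seqs) - coding,
        }
    return summary


def top_companies(summary, k=10):
    """Companies ranked by number of sequences attached to patent claims."""
    return sorted(summary, key=lambda c: summary[c]["n_sequences"], reverse=True)[:k]
```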
For each sequence included both in patent applications submitted in national jurisdictions and in ‘international’ patent applications (hereafter, sequences of special patenting interest), we collected the description of the invention and, if a translated nucleotide search (BlastX) returned a significant match to a protein with annotated function, the protein function. We then conducted web searches for each of these proteins to collect further information about protein function and potential applications. The resulting information about the sequences of special patenting interest is available in Supplementary Table 2.
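The selection rule for sequences of special patenting interest can be expressed as a set test. The sketch below assumes each sequence is mapped to the set of patent systems it was filed in and that international (PCT) filings carry the hypothetical label "WO"; the BlastX annotation and web searches are not modelled.

```python
INTERNATIONAL = "WO"  # hypothetical label for international (PCT) filings


def special_interest(sequence_systems: dict) -> set:
    """Sequence IDs filed internationally and in at least one national system."""
    return {
        seq_id
        for seq_id, systems in sequence_systems.items()
        if INTERNATIONAL in systems and len(systems - {INTERNATIONAL}) > 0
    }
```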
Patents owned by subsidiaries were reassigned to the ultimate owner of the controlled subsidiary, as stated in the Orbis company database, which contains information on around 400 million companies worldwide (Orbis; https://orbis.bvdinfo.com/). For jointly owned patents, ownership was assigned to the first company listed. After filtering, removing duplicate entity names and aggregating subsidiaries, we identified a total of 1,125 applicants and collected information about each through web searches, including the country where it is headquartered and the type of entity it represents. Our classification resulted in five major entity types: multinational companies (present in more than two countries), national companies, universities and their commercialization centres, governmental agencies and ‘other’ (predominantly applications submitted by private individuals). We also included patent applications containing protein-coding sequences with identified marine origin from 201 entities that we were unable to classify under any specific entity type (‘none’).
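The ownership rules can be sketched as a lookup over a subsidiary-to-parent mapping. Here a plain dictionary stands in for the Orbis ownership data (an assumption; Orbis access is not modelled), names are normalised only by case and whitespace, and the entity names are hypothetical.

```python
def normalise(name: str) -> str:
    """Collapse case and whitespace so that spelling variants of one entity match."""
    return " ".join(name.upper().split())


def ultimate_owner(name: str, parent_of: dict) -> str:
    """Follow subsidiary -> parent links (keys and values already normalised)
    until reaching an entity with no listed parent; a seen-set guards against cycles."""
    current, seen = normalise(name), set()
    while current in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current]
    return current


def assign_owner(applicants: list, parent_of: dict) -> str:
    """Jointly owned patents go to the first listed applicant, resolved to its ultimate owner."""
    return ultimate_owner(applicants[0], parent_of)


def company_type(n_countries: int) -> str:
    """Multinational means present in more than two countries."""
    return "multinational" if n_countries > 2 else "national"
```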
Each record in the MABPAT database includes the following: (1) patent applicant name, (2) type of applicant, (3) country where it is headquartered, (4) year of application, (5) patent application number, (6) patent system, (7) genetic sequence identification, (8) marine species name associated with the sequence, (9 and 10) species taxonomy, (11) taxonomic source, (12) whether species can be classified as ‘deep-sea’ species, (13) source of deep-sea presence, (14) whether species were observed in ABNJ, (15) genetic sequence, (16) GC content, (17) sequence length, (18) whether the sequence originated from a marine organism, (19) whether the marine origin of the sequence was disclosed by the patent applicant or bioinformatically predicted, (20) whether the sequence contains protein-coding information and (21) sequence prediction source. If the marine origin was predicted, the following information about the most similar protein entry in the reference database is provided: (22) protein entry header, (23) protein entry sequence identification, (24) protein entry title, (25) E-value, (26) hit identity and (27) query coverage.
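For readers loading the database programmatically, the 27 fields above map onto a typed record. The field names below are hypothetical labels chosen for this sketch and are not the column names of the published files; the validation rule simply encodes the statement that fields (22) to (27) accompany predicted marine origins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MabpatRecord:
    # (1)-(6) applicant and patent
    applicant_name: str
    applicant_type: str
    applicant_country: str
    application_year: int
    application_number: str
    patent_system: str
    # (7)-(14) sequence identity, species and habitat
    sequence_id: str
    species_name: str
    taxonomy_field_9: str  # the text lists (9) and (10) jointly as "species taxonomy"
    taxonomy_field_10: str
    taxonomic_source: str
    is_deep_sea: bool
    deep_sea_source: Optional[str]
    observed_in_abnj: bool
    # (15)-(21) sequence properties and provenance
    sequence: str
    gc_content: float
    sequence_length: int
    is_marine: bool
    marine_origin: str  # "disclosed" or "predicted"
    is_protein_coding: bool
    prediction_source: str
    # (22)-(27) most similar reference protein, only when marine origin was predicted
    hit_header: Optional[str] = None
    hit_sequence_id: Optional[str] = None
    hit_title: Optional[str] = None
    hit_evalue: Optional[float] = None
    hit_identity: Optional[float] = None
    hit_query_coverage: Optional[float] = None

    def __post_init__(self) -> None:
        if self.marine_origin not in {"disclosed", "predicted"}:
            raise ValueError(f"unexpected marine_origin: {self.marine_origin!r}")
        if self.marine_origin == "predicted" and self.hit_evalue is None:
            raise ValueError("predicted records should carry reference-hit details")
```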
Deep-sea presence of marine species
The presence of species in deep-sea habitats was assessed using multiple sources. For species in the domain Eukarya, we used the World Register of Deep-Sea Species (WoRDSS), a taxonomic database of deep-sea species. As Bacteria and Archaea are not included in WoRDSS, we searched the PubMed (https://pubmed.ncbi.nlm.nih.gov/) and Integrated Microbial Genomes and Microbiomes (https://img.jgi.doe.gov/) databases to establish their potential presence in deep-sea habitats, whether within or beyond national jurisdiction. Species whose deep-sea samples have already been found to be associated with international patent applications27 were also marked as ‘deep-sea’ species. To define deep-sea marine species, we followed the WoRDSS inclusion criterion, that is, that the biological material was sampled at depths greater than 500 m.
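The decision rule can be summarised as a small function. This is a sketch under stated assumptions: WoRDSS membership and literature-derived sampling depths are passed in as precomputed inputs (the hypothetical parameters wordss_species and reported_depths_m), and depths are given as positive metres below the surface.

```python
from typing import Iterable, Set

DEEP_SEA_MIN_DEPTH_M = 500  # WoRDSS inclusion criterion: sampled deeper than 500 m


def is_deep_sea(domain: str, species: str, wordss_species: Set[str],
                reported_depths_m: Iterable[float], patent_deep_sea_species: Set[str]) -> bool:
    # Deep-sea samples already linked to international patent applications
    if species in patent_deep_sea_species:
        return True
    # Eukaryotes: membership in WoRDSS decides
    if domain == "Eukarya":
        return species in wordss_species
    # Bacteria and Archaea: depths reported in the literature or IMG/M (positive metres)
    return any(depth > DEEP_SEA_MIN_DEPTH_M for depth in reported_depths_m)
```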
BlastX sequence similarity model and patent share estimation
Sequence similarity models are widely used to identify newly sequenced data or unknown species67. To conduct sequence similarity BlastX searches (translated nucleotide versus protein) against a database of annotated protein sequences, we built a reference database of all proteins belonging to the 627 genera of previously confirmed marine species listed in Supplementary Table 5a. A total of 24,024,531 proteins from all species within those genera were selected from the UniProt Knowledgebase (UniProtKB; ref. 73), including both Swiss-Prot (expertly curated protein records) and TrEMBL (bioinformatically predicted proteins).
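Building the reference set amounts to keeping UniProtKB entries whose organism falls in one of the selected genera. The sketch below parses the OS= (organism) field of standard UniProt FASTA headers and takes its first word as the genus; this is an assumption that fails for names such as "Candidatus ..." or "uncultured ...", which would need extra handling. The headers in the example are made up.

```python
import re
from typing import Iterable, Iterator, Optional, Set

# UniProt FASTA headers carry the organism as "OS=<name> OX=<taxon id>"
OS_FIELD = re.compile(r" OS=(.+?) OX=")


def genus_of_header(header: str) -> Optional[str]:
    match = OS_FIELD.search(header)
    return match.group(1).split()[0] if match else None


def filter_fasta_by_genus(lines: Iterable[str], genera: Set[str]) -> Iterator[str]:
    """Yield header and sequence lines of entries whose organism belongs to one of the genera."""
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = genus_of_header(line) in genera  # decide once per entry
        if keep:
            yield line
```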
BlastX searches with a specific set of search parameters (E-value ≤ 10⁻⁵, query coverage ≥ 80%, hit identity ≥ 99%) were used to verify that marine sequences could be identified to genus level with at least 95% confidence (correct hit) (Supplementary Fig. 3a). We also tested whether correct hits and searches with confidence below 95% tended to be concentrated in certain patent applications, among certain actors or in certain patent systems, but found no such pattern (Supplementary Fig. 3b,c). Using the sequence search tool DIAMOND68, we queried 12,716 protein-coding sequences with disclosed marine origin against the selected UniProtKB records, which yielded 10,514 correct hits (82.68% recovery rate).
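The filtering and recovery calculation can be sketched in Python as follows. It assumes DIAMOND was run in blastx mode with tabular output containing the columns qseqid, sseqid, pident, qcovhsp and evalue in that order, that the best hit per query is the passing hit with the lowest E-value, and that a correct hit means the hit protein belongs to the disclosed genus; the genus lookups are hypothetical inputs.

```python
import csv
from typing import Dict, Iterable

# Assumed column order, e.g. from: diamond blastx ... --outfmt 6 qseqid sseqid pident qcovhsp evalue
COLUMNS = ["qseqid", "sseqid", "pident", "qcovhsp", "evalue"]
MAX_EVALUE, MIN_QCOV, MIN_PIDENT = 1e-5, 80.0, 99.0  # thresholds stated in the text


def passes(row: Dict[str, str]) -> bool:
    return (float(row["evalue"]) <= MAX_EVALUE
            and float(row["qcovhsp"]) >= MIN_QCOV
            and float(row["pident"]) >= MIN_PIDENT)


def best_hits(tsv_lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Per query, keep the passing hit with the lowest E-value."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(tsv_lines, fieldnames=COLUMNS, delimiter="\t"):
        q = row["qseqid"]
        if passes(row) and (q not in best or float(row["evalue"]) < float(best[q]["evalue"])):
            best[q] = row
    return best


def recovery_rate(best: Dict[str, Dict[str, str]], true_genus: Dict[str, str],
                  subject_genus: Dict[str, str]) -> float:
    """Percentage of queries with a known genus whose best hit comes from that same genus."""
    correct = sum(1 for q, genus in true_genus.items()
                  if q in best and subject_genus.get(best[q]["sseqid"]) == genus)
    return 100.0 * correct / len(true_genus)


# Applied to the counts reported in the text: 10,514 correct hits out of 12,716 queries
print(round(100.0 * 10_514 / 12_716, 2))  # 82.68
```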
We then queried 7,467,396 sequences with unknown taxonomic origin (species tagged ‘unknown’, ‘unidentified’ or ‘synthetic construct’; 62.7% of all GenBank records) against the selected UniProtKB records and found 234,836 sequences matching proteins from 1,368 species not previously disclosed in patent records. We then checked whether each matched species is present exclusively in marine habitats, which resulted in a final list of 561 additional marine species (Supplementary Table 5b). Overall, we recovered 60,636 protein-coding sequences of previously undisclosed marine origin and 144,545 other sequences submitted as part of the same patent applications (2,257 patent applications in total).
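The selection and final filtering steps can be sketched as below, assuming the best hit of each unknown-origin sequence has already been mapped to a species name and that the list of exclusively marine species is available as a set; the identifiers are hypothetical.

```python
from typing import Dict, Set

UNKNOWN_TAGS = {"unknown", "unidentified", "synthetic construct"}  # species tags named in the text


def is_unknown_origin(organism: str) -> bool:
    return organism.strip().lower() in UNKNOWN_TAGS


def recover_marine(predicted_species: Dict[str, str], marine_only: Set[str]) -> Dict[str, str]:
    """Keep sequences whose best-matching species occurs exclusively in marine habitats."""
    return {seq_id: sp for seq_id, sp in predicted_species.items() if sp in marine_only}


def applications_touched(recovered: Dict[str, str], seq_to_application: Dict[str, str]) -> Set[str]:
    """Patent applications that contain at least one recovered marine sequence."""
    return {seq_to_application[seq_id] for seq_id in recovered}
```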
Finally, we compared summary statistics (number of sequences, number of patents and median year of application) for the ten largest patent applicants referencing sequences with disclosed marine origin and the ten largest referencing sequences with predicted marine origin (Supplementary Figs. 4 and 5, respectively), and found that both lists contained the largest applicant of each list (Bayer and BASF, respectively).
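The comparison can be sketched as follows, assuming one (applicant, patent ID, application year) row per referenced sequence and taking the median year over distinct patents rather than over sequences, which is an assumption, since the text does not specify which. All applicant names and counts are toy values.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def applicant_stats(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, dict]:
    """rows: one (applicant, patent_id, application_year) tuple per referenced sequence."""
    n_sequences: Counter = Counter()
    patent_years: Dict[str, Dict[str, int]] = defaultdict(dict)
    for applicant, patent_id, year in rows:
        n_sequences[applicant] += 1
        patent_years[applicant][patent_id] = year  # one year per distinct patent
    return {
        applicant: {
            "n_sequences": n,
            "n_patents": len(patent_years[applicant]),
            "median_year": median(list(patent_years[applicant].values())),
        }
        for applicant, n in n_sequences.items()
    }


def top_k(stats: Dict[str, dict], k: int = 10) -> List[str]:
    return sorted(stats, key=lambda a: stats[a]["n_sequences"], reverse=True)[:k]


def shared_top(disclosed: Dict[str, dict], predicted: Dict[str, dict], k: int = 10) -> set:
    """Applicants appearing in both top-k lists."""
    return set(top_k(disclosed, k)) & set(top_k(predicted, k))
```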
Hydrothermal vent presence and ABNJ-unique species counts
The geolocations of hydrothermal vents were collected from the InterRidge Vents Database. The World High Seas maritime boundary map was downloaded from Marine Regions (https://marineregions.org/). We checked whether the coordinates of each hydrothermal vent fell within any of the High Seas polygons. Spatial vector data were analysed with the R package sf version 1.0-9 (ref. 69).
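The authors performed this test with sf; as a dependency-free illustration of the same point-in-polygon idea, the sketch below swaps in an even-odd ray-casting test in Python. It treats longitude and latitude as planar coordinates and ignores interior rings (holes) and antimeridian wrapping, all of which a production analysis with sf would need to respect. Vent names and the polygon are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast towards +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude (never horizontal here)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vents_in_high_seas(vents: Sequence[Tuple[str, Point]],
                       high_seas_polygons: Sequence[Sequence[Point]]) -> List[str]:
    """Names of vents falling inside any High Seas polygon."""
    return [name for name, pt in vents
            if any(point_in_polygon(pt, ring) for ring in high_seas_polygons)]
```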
To establish the list of species uniquely present in ABNJ, we used species occurrence data from OBIS. We first retrieved all 28,375 species with at least one occurrence record in ABNJ (https://obis.org/area/1). For each of these species, we checked whether it had also been observed in the territorial waters of any country; species with at least one such occurrence record were excluded. Data were obtained from the OBIS database (2022) using the R packages robis version 2.11.0 (ref. 70) and parallel version 3.6.2 (ref. 71).
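The exclusion rule reduces to a set difference. The sketch below assumes each OBIS occurrence record has already been flagged as inside or outside ABNJ and treats every record outside ABNJ as a record within a country's waters; no robis calls are shown.

```python
from typing import Iterable, Set, Tuple


def abnj_unique_species(occurrences: Iterable[Tuple[str, bool]]) -> Set[str]:
    """occurrences: one (species, recorded_in_abnj) pair per occurrence record.
    A species is kept only if it has ABNJ records and no record inside national waters."""
    in_abnj: Set[str] = set()
    in_national_waters: Set[str] = set()
    for species, recorded_in_abnj in occurrences:
        (in_abnj if recorded_in_abnj else in_national_waters).add(species)
    return in_abnj - in_national_waters
```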
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Data were collected from publicly available data sources: INSDC (ftp.ncbi.nih.gov/genbank/)72, UniProtKB73, WoRMS (https://www.marinespecies.org)74, WoRDSS75, PubMed (https://pubmed.ncbi.nlm.nih.gov) and the Integrated Microbial Genomes and Microbiomes database (https://img.jgi.doe.gov). Species observation records were obtained from OBIS (https://obis.org). The geolocations of hydrothermal vents were collected from the InterRidge Vents database (http://vents-data.interridge.org)76. The maritime boundaries map of World High Seas was downloaded from Marine Regions (World EEZ v.11) (https://marineregions.org). The resulting MABPAT database is available at https://mabpat.shinyapps.io/main and via figshare at https://doi.org/10.6084/m9.figshare.25289404.v3 (ref. 77).
Code availability
Analysis scripts are available via GitHub at https://github.com/zhivkoplias/mabpat.
References
Falkinham, J. O. et al. Proliferation of antibiotic-producing bacteria and concomitant antibiotic production as the basis for the antibiotic activity of Jordan’s red soils. Appl. Environ. Microbiol. 75, 2735–2741 (2009).
Cragg, G. M. & Pezzuto, J. M. Natural products as a vital source for the discovery of cancer chemotherapeutic and chemopreventive agents. Med. Princ. Pract. 25, 41–59 (2016).
Hollman, P. C. H., Hertog, M. G. L. & Katan, M. B. Analysis and health effects of flavonoids. Food Chem. 57, 43–46 (1996).
Achan, J. et al. Quinine, an old anti-malarial drug in a modern world: role in the treatment of malaria. Malar. J. 10, 144 (2011).
Shah, Z. et al. Podophyllotoxin: history, recent advances and future prospects. Biomolecules 11, 603 (2021).
Atanasov, A. G. et al. Natural products in drug discovery: advances and opportunities. Nat. Rev. Drug Discov. 20, 200–216 (2021).
Sigwart, J. D. et al. Unlocking the potential of marine biodiscovery. Nat. Prod. Rep. 38, 1235–1242 (2021).
Beraldi-Campesi, H. Early life on land and the first terrestrial ecosystems. Ecol. Process. 2, 1 (2013).
Román-Palacios, C., Moraga-López, D. & Wiens, J. J. The origins of global biodiversity on land, sea and freshwater. Ecol. Lett. 25, 1376–1386 (2022).
Appeltans, W. et al. The magnitude of global marine species diversity. Curr. Biol. 22, 2189–2202 (2012).
Eguíluz, V. M. et al. Scaling of species distribution explains the vast potential marine prokaryote diversity. Sci. Rep. 9, 18710 (2019).
Gerwick, W. H. & Moore, B. S. Lessons from the past and charting the future of marine natural products drug discovery and chemical biology. Chem. Biol. 19, 85–98 (2012).
Blasiak, R. et al. A forgotten element of the blue economy: marine biomimetics and inspiration from the deep sea. PNAS Nexus 1, pgac196 (2022).
Carroll, A. R., Copp, B. R., Davis, R. A., Keyzers, R. A. & Prinsep, M. R. Marine natural products. Nat. Prod. Rep. 38, 362–413 (2021).
Blasiak, R. et al. Making marine biotechnology work for people and nature. Nat. Ecol. Evol. 7, 482–485 (2023).
Li, B. et al. Preparation of lactose-free pasteurized milk with a recombinant thermostable β-glucosidase from Pyrococcus furiosus. BMC Biotechnol. 13, 73 (2013).
Aesoy, R. & Herfindal, L. in Principles of Cancer Treatment and Anticancer Drug Development (ed. Link, W.) 137–139 (Springer International Publishing, 2022).
Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W. & Prasher, D. C. Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).
Wynberg, R. & Laird, S. A. Fast science and sluggish policy: the Herculean task of regulating biodiscovery. Trends Biotechnol. 36, 1–3 (2018).
Convention on Biological Diversity (Secretariat of the CBD, UN Environment Programme, 2011); https://www.cbd.int/convention/text
Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity (Secretariat of the CBD, UN Environment Programme, 2011); https://www.cbd.int/abs/text
UN General Assembly. Draft Agreement under the United Nations Convention on the Law of the Sea on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction; https://www.un.org/bbnj/sites/www.un.org.bbnj/files/draft_agreement_advanced_unedited_for_posting_v1.pdf (2023).
Oldham, P., Hall, S. & Barnes, C. Patent Landscape Report on Animal Genetic Resources (World Intellectual Property Organization, 2014); https://www.wipo.int/edocs/pubdocs/en/wipo_pub_947_3.pdf
Scholz, A. H. et al. Myth-busting the provider-user relationship for digital sequence information. Gigascience 10, giab085 (2021).
Arnaud-Haond, S., Arrieta, J. M. & Duarte, C. M. Marine biodiversity and gene patents. Science 331, 1521–1522 (2011).
Arrieta, J. M., Arnaud-Haond, S. & Duarte, C. M. What lies underneath: conserving the oceans’ genetic resources. Proc. Natl Acad. Sci. USA 107, 18318–18324 (2010).
Blasiak, R., Jouffray, J.-B., Wabnitz, C. C. C., Sundström, E. & Österblom, H. Corporate control and global governance of marine genetic resources. Sci. Adv. 4, eaar5237 (2018).
Blasiak, R., Jouffray, J.-B., Wabnitz, C. C. C. & Österblom, H. Scientists should disclose origin in marine gene patents. Trends Ecol. Evol. 34, 392–395 (2019).
Katz, L. & Baltz, R. H. Natural product discovery: past, present, and future. J. Ind. Microbiol. Biotechnol. 43, 155–176 (2016).
Jaspars, M. et al. The marine biodiscovery pipeline and ocean medicines of tomorrow. J. Mar. Biol. Assoc. 96, 151–158 (2016).
The IUCN Red List of Threatened Species Version 2024-1 (IUCN, 2022).
Van Dover, C. L. et al. Scientific rationale and international obligations for protection of active hydrothermal vent ecosystems from deep-sea mining. Mar. Policy 90, 20–28 (2018).
Khan, I., Akmal, K. F., Chong, W. S., Maran, B. A. V. & Shah, M. D. in Marine Biotechnology: Applications in Food, Drugs and Energy (eds Shah, M. D. et al.) Ch. 1 (Springer Nature, 2023).
Hasan, I. et al. The innovation–economic growth nexus: global evidence. Res. Policy 39, 1264–1276 (2010).
Lara-Lopez, A., Valdés, L., de Pinho, R. & Enevoldsen, H. In Global Ocean Science Report 2020: Charting Capacity for Ocean Sustainability (ed. Isensee, K.) 135–173 (UNESCO Publishing, 2020).
Haščič, I. et al. Public Interventions and Private Climate Finance Flows: Empirical Evidence from Renewable Energy Financing; OECD Environment Working Papers no. 80 (2015).
Guellec, D., Martinez, C. & Zuniga, M. P. Pre-emptive patenting: securing market exclusion and freedom of operation. Econ. Innov. New Technol. 21, 1–29 (2012).
Gurgula, O. Strategic patenting by pharmaceutical companies—should competition law intervene? IIC Int. Rev. Ind. Prop. Copyr. Law 51, 1062–1085 (2020).
Hall, B. H., Jaffe, A. & Trajtenberg, M. Market value and patent citations. RAND J. Econ. 36, 16–38 (2005).
Adams, J. N. in Research Handbook on Patent Law and Theory (ed. Takenaka, T.) Ch. 1, 2–26 (Edward Elgar Publishing, 2019).
Jefferson, O. A., Köllhofer, D., Ajjikuttira, P. & Jefferson, R. A. Public disclosure of biological sequences in global patent practice. World Pat. Inf. 43, 12–24 (2015).
Standard ST.26: Recommended Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings Using XML (Extensible Markup Language) (WIPO, 2023); https://www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf
Spatio-Temporal Annotation Policy (INSDC, 2021); https://www.insdc.org/news/spatio-temporal-annotation-policy-18-11-2021/
COP15: Nations Adopt Four Goals, 23 Targets for 2030 in Landmark UN Biodiversity Agreement (Secretariat of the CBD, UN Environment Programme, 2022); https://www.cbd.int/article/cop15-cbd-press-release-final-19dec2022
Suttle, C. Marine viruses—major players in the global ecosystem. Nat. Rev. Microbiol. 5, 801–812 (2007).
Sobecky, P. A. & Hazen, T. H. Horizontal gene transfer and mobile genetic elements in marine systems. Methods Mol. Biol. 532, 435–453 (2009).
Husnik, F. et al. Bacterial and archaeal symbioses with protists. Curr. Biol. 31, 862–877 (2021).
Cordes, E. E. & Levin, L. A. Exploration before exploitation. Science 359, 719 (2018).
Pollock, L. J. et al. Protecting biodiversity (in all its complexity): new models and methods. Trends Ecol. Evol. 35, 1119–1128 (2020).
Bingham, H. et al. The biodiversity informatics landscape: elements, connections and opportunities. Res. Ideas Outcomes 3, e14059 (2017).
Oldham, P., Chiarolla, C. & Thambisetty, S. Digital Sequence Information in the UN High Seas Treaty: Insights from the Global Biodiversity Framework-related Decisions; LSE Law School Policy Briefing Series 53/2023. Available at SSRN https://doi.org/10.2139/ssrn.4343130 (2023).
Trevisanut, S. & Bonfanti, A. Intellectual Property Rights Beyond National Jurisdiction: Outlining a Regime for Patenting Products Based on Marine Genetic Resources of the Deep-Sea Bed and High Sea. Available at SSRN https://doi.org/10.2139/ssrn.1861020 (2011).
Alexander, J. B. et al. Complementary molecular and visual sampling of fish on oil and gas platforms provides superior biodiversity characterisation. Mar. Environ. Res. 179, 105692 (2022).
Franco, N. R. et al. Bacterial composition and diversity in deep-sea sediments from the southern Colombian Caribbean Sea. Diversity 13, 10 (2020).
McLean, D. L. et al. Enhancing the scientific value of industry remotely operated vehicles (ROVs) in our oceans. Front. Mar. Sci. 7, 00220 (2020).
Österblom, H. et al. Scientific mobilization of keystone actors for biosphere stewardship. Sci. Rep. 12, 3802 (2022).
The Common Oceans: ABNJ Deep Seas Project (FAO, 2018); https://www.fao.org/3/CA2245EN/ca2245en.pdf
Wright, G. & Rochette, J. Regional Ocean Governance of Areas Beyond National Jurisdiction: Lessons Learnt and Ways Forward (STRONG High Seas Project, 2019); https://www.prog-ocean.org/wp-content/uploads/2019/03/STRONG-HS_Lessons-Learnt-Report.pdf
Amon, D. J. et al. Assessment of scientific gaps related to the effective environmental management of deep-seabed mining. Mar. Policy 138, 105006 (2022).
Leary, D. & Juniper, S. K. in The Limits of Maritime Jurisdiction (eds Schofield, C. et al.) Ch. 34, 769–785 (Martinus Nijhoff Publishers, 2014).
Leary, D. Marine genetic resources in areas beyond national jurisdiction: do we need to regulate them in a new agreement? Marit. Saf. Secur. Law J. 5, 22–47 (2018).
Rotter, A. et al. The essentials of marine biotechnology. Front. Mar. Sci. 8, 629629 (2021).
Terlau, H. & Olivera, B. M. Conus venoms: a rich source of novel ion channel-targeted peptides. Physiol. Rev. 84, 41–68 (2004).
Woolfson, D. N., Baker, E. G. & Bartlett, G. J. How do miniproteins fold? Science 357, 133–134 (2017).
Brunet, M. A., Leblanc, S. & Roucou, X. Reconsidering proteomic diversity with functional investigation of small ORFs and alternative ORFs. Exp. Cell Res. 393, 112057 (2020).
Zhang, F., Wen, Y. & Guo, X. CRISPR/Cas9 for genome editing: progress, implications and challenges. Hum. Mol. Genet. 23, 40–46 (2014).
Pearson, W. R. An introduction to sequence similarity (“homology”) searching. Curr. Protoc. Bioinformatics Chapter 3, 3.1.1–3.1.8 (2013).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Pebesma, E. Simple features for R: standardized support for spatial vector data. R J. 10, 439 (2018).
Provoost, P., Bosch, S. & Best, B. iobis/robis: robis 2.11.0. Zenodo https://zenodo.org/doi/10.5281/zenodo.1489948 (2022).
parallel: support for Parallel computation in R. R version 3.6.2 https://rdocumentation.org/packages/parallel/versions/3.6.2
Arita, M. et al. The international nucleotide sequence database collaboration. Nucleic Acids Res. 49, D121–D124 (2021).
The Uniprot Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
World Register of Marine Species (WoRMS Editorial Board, accessed 15 November 2022); https://doi.org/10.14284/170
Glover, A. G., Higgs, N. & Horton, T. World Register of Deep-Sea Species (WoRDSS) (accessed 15 November 2022); https://doi.org/10.14284/352
Beaulieu, S. E. & Szafranski, K. InterRidge Global Database of Active Submarine Hydrothermal Vent Fields Version 3.4 (InterRidge, accessed 1 February 2023).
Zhivkoplias, E. MArine Bioprospecting PATent dataset. figshare https://doi.org/10.6084/m9.figshare.25289404.v3 (2024).
How much does a patent cost? BlueIron (16 January 2022); https://blueironip.com/how-much-does-a-patent-cost
Butamax, Gevo settle patent dispute. Biomass Magazine (24 August 2015); https://biomassmagazine.com/articles/butamax-gevo-settle-patent-dispute-12339
Gevo acquires Butamax patent estate. Yahoo Finance (23 September 2021); https://finance.yahoo.com/news/gevo-acquires-butamax-patent-estate-130000249.html
Abida, H. et al. Bioprospecting marine plankton. Mar. Drugs 11, 4594–4611 (2013).
Delmont, T. O. et al. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genom. 2, 100123 (2022).
Haward, M. G. & Rogers, A. D. Marine genetic resources in areas beyond national jurisdiction: promoting marine scientific research and enabling equitable benefit sharing. Front. Mar. Sci. 8, 667274 (2021).
Tara Ocean Foundation, Tara Oceans et al. Priorities for ocean microbiome research. Nat. Microbiol. 7, 937–947 (2022).
Acknowledgements
We thank D. Khvostovetc for providing valuable consultancy in the Patent Cooperation Treaty and European Patent Convention. E.Z., A.P. and R.B. are funded by FORMAS, project number 2020-01048. A.P. is also funded by FORMAS, project number 2019-01220. P.D. is funded by the Research Platform Governance of Digital Practices at the University of Vienna. J.-B.J. is funded by the Knut and Alice Wallenberg Foundation (2021.0343).
Funding
Open access funding provided by Stockholm University.
Author information
Contributions
E.Z. and A.P. collected the raw data. E.Z., J.-B.J. and R.B. designed the research and analysed the data. E.Z., P.D., J.-B.J. and R.B. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Sustainability thanks Peter McGarvey and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Text 1 and 2, Figs. 1–8 and Tables 1–5.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhivkoplias, E., Jouffray, JB., Dunshirn, P. et al. Growing prominence of deep-sea life in marine bioprospecting. Nat Sustain 7, 1027–1037 (2024). https://doi.org/10.1038/s41893-024-01392-w
DOI: https://doi.org/10.1038/s41893-024-01392-w