1 CONTENTS

Introduction

1. Universal Databases of Genetic Information

2. Databases of Genetic Information of the Population

3. Impact on the Healthcare System

4. Acceptance by the Population of Programs of Genetic Certification

5. Analysis of Genetic Data

6. Data Security

7. Natural Biological Resources and Digitalization

Conclusions

2 INTRODUCTION

Genetic and, more broadly, biological information is at the heart of critical areas such as food production, medicine, and the environment, as well as key issues in human development, including the development of means to combat pandemics and ensuring the long-term resilience of society in times of climate change and global biodiversity loss. The development of DNA analysis technologies has led to a sharp decrease in the cost of sequencing and an increase in the amount of digitized genetic information. According to data from the National Human Genome Research Institute (USA), the cost of sequencing a single human genome fell from $100 million in 2001 to $1000 in 2017. Since 2007, the rate of decline in the cost of DNA sequencing has even exceeded the rate of increase in the performance of computer chips according to Moore’s Law [1]. As a result, as of 2023, the reported cost of human-genome analysis can be as low as $299 for 30× coverage and $999 for 100× coverage [2].
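The comparison with Moore’s Law can be made concrete with a rough calculation. The sketch below takes the two cost figures quoted above and assumes a constant exponential decline between them, which is a simplification, since the real curve is uneven. The two-year doubling period used for Moore’s Law is also an assumption introduced here for illustration, not a figure from the cited sources.

```python
import math


def halving_time_years(start_cost, end_cost, years):
    """Years needed for the cost to halve, assuming a constant exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


# Figures quoted in the text: $100 million (2001) and $1000 (2017).
sequencing_halving = halving_time_years(100e6, 1000, 2017 - 2001)

# Assumption (not from the text): Moore's Law taken as one doubling every 2 years.
MOORE_DOUBLING_YEARS = 2.0

print(f"Average halving time of sequencing cost: {sequencing_halving:.2f} years")
print(f"Assumed Moore's Law doubling time: {MOORE_DOUBLING_YEARS:.2f} years")
```

Under these assumptions the average halving time over 2001–2017 comes out at somewhat less than one year, i.e., roughly twice as fast as the assumed chip-performance doubling.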

The growth in sequencing productivity is accompanied by an increase in the volume of databases of genetic information. GenBank, one of the world’s leading databases of genetic information, maintained by the US National Center for Biotechnology Information, was growing at a rate of 61% per year as of 2022 [3]. This greatly exceeds the growth in the volume of digitized information in the world, which is projected to increase by 23% annually between 2020 and 2025 [4]. If such rates of digitalization of genetic information are maintained, the DNA of all living organisms on Earth could be analyzed and digitized within a time frame of a few decades to 110 years [5]. The development of methods for analyzing the information received and applying it to change the properties of living organisms will lead to the global introduction of genetic technologies both in agriculture and in everyday human life, through medical technologies and technologies that improve quality of life and life expectancy. Nevertheless, genetic technologies can be used to the detriment of humanity by individual states and by criminal and terrorist groups. Thus, humanity is at a stage of development when an increase in the amount of knowledge about the biological foundations of life can lead both to an improvement in the quality of life and to global challenges to the very existence of mankind. In this respect, genetic technologies come closer than any other field to nuclear technologies in terms of their potential for dual use.
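The gap between the two growth rates is easier to appreciate as doubling times. The following sketch uses only the 61% and 23% annual rates quoted above and assumes that each rate stays constant, which the cited sources do not guarantee; the ten-year horizon is chosen here purely for illustration.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Doubling time for a quantity growing at a constant annual rate (0.61 = 61%)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def growth_factor(annual_growth_rate, years):
    """Total multiplication of the quantity after the given number of years."""
    return (1 + annual_growth_rate) ** years


genbank_doubling = doubling_time_years(0.61)      # GenBank, as of 2022 [3]
global_data_doubling = doubling_time_years(0.23)  # all digitized information [4]

print(f"GenBank doubles about every {genbank_doubling:.1f} years")
print(f"Global digitized data doubles about every {global_data_doubling:.1f} years")
print(f"Ten-year growth factors: {growth_factor(0.61, 10):.0f}x vs {growth_factor(0.23, 10):.0f}x")
```

At these rates, genetic data would double in well under two years, while the overall volume of digitized information would need more than three.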

The purpose of this work is to analyze the current situation in the field of the digitalization of genetic information and the global challenges facing humanity when genetic information and tools for the implementation of genetic technologies become widely available.

3 UNIVERSAL DATABASES OF GENETIC INFORMATION

As of the end of 2021, according to the annual review “Molecular Biology Database Collection” of the journal Nucleic Acids Research, there were 1645 databases of molecular biological information in the world [6]. The largest is maintained by the International Nucleotide Sequence Database Collaboration (INSDC) consortium, which includes the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), and the National Center for Biotechnology Information (NCBI). Within the framework of this consortium, the joint International Nucleotide Sequence Database (INSD) was created in 2005, although actual cooperation between the three organizations began in the early 1980s. The largest partition of the database is the Sequence Read Archive (SRA), whose public part amounted to 14 petabytes (PB) as of June 2022 [7].

A similar database is maintained by the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), which was officially founded in 2019. The CNCB-NGDC is based on three institutes of the Chinese Academy of Sciences: the Beijing Institute of Genomics, the Institute of Biophysics, and the Shanghai Institute of Nutrition and Health. As of May 2023, the size of the Genome Sequence Archive (GSA) database was 25 PB [8]. In partnership with the European and American centers EBI and NCBI, the BIG Search project, a scalable universal cross-database search engine, was implemented. In addition, the NGDC developed the Database Commons project, which catalogs open biological databases around the world and provides access to them [9]. As of August 2022, the catalog contained 5832 databases from 72 countries. The largest number of databases is registered in the USA (1433), followed by China (1110), India (431), Great Britain (425), and other countries.

In accordance with Federal Law no. 643-FZ of December 29, 2022, “On Amendments to the Federal Law ‘On State Regulation in the Field of Genetic Engineering,’” the National Research Center “Kurchatov Institute” created a prototype of the National Genomic Information Database (NGID). The database is intended to ensure national security, the protection of the life and health of citizens, and sovereignty in the storage and use of genetic data, as well as the exchange of the information it contains between state bodies, local governments, and owners of genetic data when they interact in the course of genetic-engineering activities. From September 1, 2024, all genetic information received in the Russian Federation will be deposited with the NGID. The growth of the largest databases of genetic information is shown in Fig. 1.

Fig. 1. Growth of databases of genetic information between 2007 and 2023 in petabytes: (1) NCBI SRA, (2) NGDC GSA, (3) CNGBdb.

4 DATABASES OF GENETIC INFORMATION OF THE POPULATION

In the 1990s, the development of genetic technologies prompted national governments to create genetic databases of the population, the main incentives for the creation of which were security issues, namely the rapid identification of dangerous criminals, as well as the solution of medical problems.

China currently has the largest population database. The Institute of Forensic Science, Ministry of Public Security, China, has a database with information on the genetic profiles of at least 68 million people as of September 2018, while the size of the database grew from 55 to 68 million in less than a year [10]. According to third-party estimates made by the Australian Strategic Policy Institute, as of 2020, the database contained information on 105–140 million people [11].

The second largest database of population genetic profiles is in the United States: the Combined DNA Index System (CODIS), owned by the Federal Bureau of Investigation. The database was founded in 1998 and included more than 17 million profiles as of 2018 [10]. It mainly contains information about offenders; DNA samples are also collected from relatives of missing persons and from military personnel. A distinctive feature of the system is its structure, which comprises local, state, and federal levels and an autonomous information network, the Criminal Justice Information Services Wide Area Network (CJIS WAN). There are other projects for the genomic analysis of the population in the United States. In 2018, a program for the mass analysis of the genomes of the population for medical purposes, named “All of Us,” was launched under the auspices of the National Institutes of Health. The program planned to analyze the genomes of at least 1 million people by 2022; as of 2021, at least 270 000 genomes had been analyzed [12].

The third largest national database belongs to the UK. The Home Office’s National DNA Database (NDNAD) program, which has been operating since 1995, had collected 6.9 million genetic profiles as of June 30, 2022, from individuals who have been prosecuted in one way or another. The database contains information about tandem repeats, not genome-wide information. However, the samples of biological material are kept in storage, which makes them available for deeper analysis. In addition to the NDNAD, which serves forensic purposes, the UK Biobank program has been operating in the UK since 2006. As part of the program, the whole-genome DNA sequences of 500 000 people are planned to be made available by mid-2023. The program is being implemented as a public-private partnership with the participation of the Wellcome Sanger Institute, deCODE genetics, and the pharmaceutical companies Amgen, AstraZeneca, GlaxoSmithKline, and Johnson & Johnson [13].

The fourth largest national database is located in France. Since 1998, the Fichier National Automatisé des Empreintes Génétiques (FNAEG) network has been operating, which as of December 31, 2021 had collected more than 6 million genetic passports, mostly offenders [14].

The European Union is working to combine national population DNA databases into a single network, which was expected to include at least 16.8 million profiles by the end of 2021. The network is to include the national databases of the EU member states and the UK [15].

Interpol has one of the largest international databases of genetic information. It is reported that the database contains 247 thousand profiles provided by 84 member countries [16]. The information provided by countries to Interpol does not include a person’s identity, but contains an alphanumeric code. Member countries retain ownership of the information and can choose with which other countries they share their data.
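The arrangement described above, in which a profile is exchanged under an alphanumeric code rather than a name and the contributing country decides who may see it, can be sketched as follows. This is only an illustration of the general idea of pseudonymization: the text does not describe how Interpol actually generates its codes or enforces sharing, so the keyed-hash scheme, the record layout, the function names, the country codes, and the sample marker values are all hypothetical.

```python
import hashlib
import hmac


def pseudonymous_code(person_id: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable alphanumeric code from an identity without exposing it."""
    digest = hmac.new(secret_key, person_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length].upper()


def make_shared_record(person_id, profile, owner, share_with, secret_key):
    """Build the record that leaves the owner country: code and profile, no identity."""
    return {
        "code": pseudonymous_code(person_id, secret_key),
        "profile": dict(profile),
        "owner": owner,
        "share_with": frozenset(share_with),
    }


def can_view(record, country):
    """The owner country decides which other countries may see its record."""
    return country == record["owner"] or country in record["share_with"]


# Hypothetical example: identifiers, country codes, and marker values are made up.
record = make_shared_record(
    person_id="hypothetical-national-id-0001",
    profile={"D8S1179": (13, 14), "TH01": (6, 9.3)},
    owner="AA",
    share_with={"BB"},
    secret_key=b"key-held-only-by-the-owner-country",
)
print(record["code"], can_view(record, "BB"), can_view(record, "CC"))
```

Because the key never leaves the owner country in this sketch, only that country can link a code back to a person, while the same person always maps to the same code.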

The size of the genetic databases of commercial companies providing DNA analysis services to the public is comparable to the size of the largest government databases. As of early 2019, more than 26 million people are estimated to have provided samples to the top four companies. The largest number of analyses was performed by Ancestry (14 million), followed by 23andMe (9 million), MyHeritage (2.5 million), and Gene By Gene (2 million) [17]. We note that genetic testing can be quite a profitable business. In August 2020, Ancestry was acquired by Blackstone for $4.7 billion, with Ancestry reported to have over $1 billion in annual revenue [18].
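The claim of comparability can be checked with simple arithmetic on the figures quoted in this section. The sketch below sums the four commercial databases and sets the total against the government databases mentioned earlier. Note that the figures refer to different years (2018 to 2022) and that commercial customers may overlap between companies, so the comparison is indicative only.

```python
# Profiles in millions, as quoted in the text (reference years differ).
commercial_millions = {
    "Ancestry": 14,
    "23andMe": 9,
    "MyHeritage": 2.5,
    "Gene By Gene": 2,
}
government_millions = {
    "China, Ministry of Public Security (2018)": 68,
    "USA, CODIS (2018)": 17,
    "UK, NDNAD (2022)": 6.9,
    "France, FNAEG (2021)": 6,
}

commercial_total = sum(commercial_millions.values())
print(f"Top four commercial companies: {commercial_total} million profiles")

# Express each government database as a fraction of the commercial total.
for name, size in government_millions.items():
    print(f"{name}: {size} million ({size / commercial_total:.0%} of the commercial total)")
```

On these numbers, the four companies together hold more profiles than any of the listed government databases except the Chinese one.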

Increasing the productivity of sequencing and reducing its cost have led to the involvement of a wider range of countries in programs of genetic certification of the population, as well as the transition from the analysis of individual sections of the genome to whole-genome sequencing. This made it possible to move from tasks aimed primarily at identifying a person to the tasks of identifying genes encoding rare diseases, developing targeted pharmaceuticals, etc. As of early 2021, national genomic programs operate in 41 countries around the world. Basically, the programs are aimed at studying the variability of genomes in the population (90%), identifying variants that lead to diseases (71%), developing genomic infrastructure (59%), and developing personalized medicine (37%) [19].

There are a number of programs for the genetic analysis of the population in Russia. Neonatal screening of all newborns for five diseases (adrenogenital syndrome, galactosemia, congenital hypothyroidism, cystic fibrosis, and phenylketonuria) has been carried out since 2006; three more tests, for spinal muscular atrophy and primary immune disorders, were added to the mandatory list in 2022 [20]. Since 2021, the Ministry of Health of the Russian Federation has been conducting a pilot project to analyze 2000 newborn exome genes [21]. The Decree of the President of the Russian Federation of March 11, 2019 no. 97 established the foundations of the state policy of the Russian Federation in the field of ensuring chemical and biological safety for the period up to 2025, which implies “the implementation of genetic certification of the population, taking into account the legal framework for the protection of data on the personal genome of a person and formation of the genetic profile of the population.”

In Belarus, the issuance of genetic passports has also begun under a special program. The leading organization is the Institute of Genetics and Cytology of the National Academy of Sciences of Belarus. Testing is carried out for genetic predisposition to cardiovascular diseases, diabetes, osteoporosis, and metabolic syndrome; athletes are tested to identify genetic features favorable and unfavorable for sports; women with miscarriage problems are tested, etc. Over 15 thousand people have been tested [22].

To date, small, developed countries have come closest to almost complete genetic certification of the population. These include, in particular, Iceland, where, according to deCODE genetics, the complete genomes of more than 160 thousand inhabitants (out of a total population of 365 thousand) have been obtained [23]. The island state is of great interest for studying the genomics of human populations, hereditary diseases, etc., because the descent of most Icelanders from the first settlers is documented and for a number of centuries there was virtually no migration to the island. These features made it possible to create a genealogical database of the population, Islendingabok (“Book of Icelanders”) [24]. The project began in 2003 and contains the profiles of 904 000 people. This is believed to be half of the people who have ever lived in Iceland since its colonization in the 9th–10th centuries. Citizens of Iceland can access the database with an individual ID, and about two-thirds of the population has registered on the site. One of the popular services of this database is the ability to establish the degree of kinship with any other inhabitant of the country.

Taking advantage of the fact that most of the Icelandic population is descended from a relatively small circle of common ancestors, the company deCODE genetics successfully conducts research on the heredity of heart disease, type-2 diabetes, Alzheimer’s disease, schizophrenia, etc. Collaborating with the national medical system, the company analyzes hereditary data, along with behavioral and other data influencing the development of innate susceptibility to disease. Taking advantage of the wide access to various data, since 2002 the company has been actively involved in deciphering the human genome, having published, in particular, 5000 microsatellite markers linked to chromosomes [25].

A high degree of knowledge of the population makes it possible to identify the role of the genotype in the occurrence or presence of a predisposition to a number of diseases, but raises ethical questions, which include the disclosure of confidential questions of kinship, hereditary morbidity, etc. [26]. Is it necessary to inform citizens and their relatives about the risk of contracting incurable or serious diseases (cancer, Alzheimer’s, etc.)? What could be the mechanism and legal basis for such reporting? What should be the mechanism for accessing information about hereditary diseases in the database, etc. [27]?

5 IMPACT ON THE HEALTH SYSTEM

The accumulation of information about the genomes of the population is of paramount importance for the development of personalized medicine, which may become the main driver of the digitalization of genetic information in the coming years and lead to the more intensive development of precision medicine and pharmacogenomics, especially with regard to the effect of drugs on different races and ethnic groups [28]. Studies published in recent years have shown that, for example, white and African American populations in the United States have different alleles that regulate the biochemistry of oleic acid, which react differently with atenolol [29]. In this regard, it is worth noting a number of challenges facing a global healthcare system that is currently focused largely on mass-market drugs, the market for which may collapse with the development of precision medicine. Current players are therefore expected to show little interest in developing precision pharmaceuticals, since these do not pay off quickly. Although a number of pharmaceutical companies launched research in pharmacogenetics in 2003 [30], there is reason to believe that the pharmaceutical giants remain skeptical about the profitability of mass-produced “precise drugs”: their cost is difficult to reduce because of small batch sizes, especially if racial, ethnic, and hereditary factors are taken into account in recommendations to patients. Researchers compare the current interest in pharmacogenetics with the early work on gene therapy in the 1990s: the hopes raised at its inception led to an investment boom, but the impossibility of rapid commercial success, owing to the immaturity of the science, led to the collapse of many small biotech companies and to the exit of large pharmaceutical holdings from similar projects [31].

The development of new approaches in health insurance is required. A key function of precision medicine will be to identify those patients who can benefit the most from genomic testing, e.g., people with rare diseases or various forms of cancer, and then provide a report to the insurance company that takes into account the most effective drug-treatment model. Experts call this the transition “from a diagnostic odyssey to diagnostics,” counting on the fact that the correct and quick diagnosis will be the maximum possible taking into account genomic data [32]. Changes are also needed in medical training [33]. Training courses are required to include advanced knowledge of epigenomics, transcriptomics, proteomics, and metabolomics. As technology advances rapidly, physician training will be paramount, and there will be a need for innovative methods to support the education of health workers.

6 ACCEPTANCE BY THE POPULATION OF PROGRAMS OF GENETIC CERTIFICATION

With the exception of programs for the genetic certification of criminals, a key factor in the further growth of population genetic databases is the consent of the population to genetic analysis and to the storage of the resulting information in databases. There are, however, factors that can slow further growth in the amount of genetic information about the population [34]. An example is the change in the direct-to-consumer segment of the genetic testing market. Between 2016 and 2019, the number of clients of companies offering genetic testing, primarily for the purpose of identifying ethnicity, grew exponentially. For example, between 2018 and 2019 the number of clients of Ancestry, 23andMe, and other companies grew by 117% [35]. Nevertheless, as of February 2020, growth in the number of clients of commercial genetic-testing companies had slowed [36]. The suggested reasons are market saturation, public doubts about the confidentiality of testing, and results that did not match expectations. The market is saturated because those who wanted to determine their origin have already received results, while their next of kin are not interested in testing, since a relative’s result already answers the question for them. It should also be taken into account that the accuracy of genetic testing depends on the reference database against which the results are compared. In the early stages of information accumulation, this leads to lower testing accuracy, which in turn can lead to frustration with the results. The most acute problem is the doubts of the population and governmental organizations about the confidentiality of test results. A number of cases show that increasing the level of confidentiality of genetic data should become one of the most important priorities in the development of genetic technologies.

In December 2019, the US Department of Defense advised US military personnel not to undergo genetic testing by commercial companies, warning that such testing increases the risks for service members: “Exposing sensitive genetic information to outside parties poses personal and operational risks to Service members” [37]. In the UK in 2007, five employees of the Forensic Science Service, which maintains the UK’s national population DNA database, were convicted of stealing data in order to set up a rival company to genetically identify the population [38]. A UK lawyer lost her job after information about her DNA in a criminal database was made available to her employer, even though she had spent only 24 hours under arrest and had been found not guilty [39]. The effectiveness of the British database also raises questions among researchers. Costing more than 2.5 million pounds a year, the database makes it possible to track only serial killers or rapists who, if law enforcement agencies work effectively, are under constant supervision or in prison even without genetic information. In addition, because it concentrates only on UK citizens with certain criminal inclinations, the database is powerless in the case of crimes committed by citizens of other countries, let alone illegal migrants [40]. The company GEDmatch originally provided services to help in the search for relatives or for the biological parents of adopted children. In 2018, after it was announced that a serial killer had been identified through the company’s database, GEDmatch faced the mass closure of user profiles because of the possibility of their being used to search for criminals. The company was forced to change the conditions for providing information to law-enforcement agencies and was subsequently bought by a company specializing in DNA forensics [41].

23andMe in 2018 was forced to tighten data access for third-party companies, which include developers of software for health monitoring, weight loss, etc. due to public concerns about the confidentiality of the results. It is argued that now access for third-party organizations is provided according to a more complex procedure [42]. At the same time, the issue of human genotyping has often been the subject of legislative and judicial proceedings [43]. In addition, researchers criticize the company’s tests for provoking phobias in clients based on an unproven predisposition to certain fatal diseases (for example, conclusions are drawn on the basis of a pair of gene mutations, rather than gene-network interactions) [44]. Officially, the company did not provide testing data to intelligence agencies and law-enforcement agencies, but such a provision is contained in the user agreement “if such disclosure is reasonably necessary.” By 2019, the company had collected a genetic base of 9 million people, arguing that further progress is hindered by legislative regulation by the FDA [45].

The negative perception of genetic certification by the broad masses of the population can become an obstacle to the development of genetic databases. The closest analogue of such behavior appears to be the reaction of residents of various countries to vaccination during the COVID-19 pandemic. This example is especially telling in view of both the wide spread of the disease and the fact that the need for vaccination was actively emphasized by government and public institutions. Vaccine hesitancy, defined by the WHO as a “delay in acceptance or refusal of vaccination despite availability of vaccination services,” was declared by the WHO in 2019 to be one of the top ten threats to public health [46]. As of 2021, the readiness to be vaccinated averaged 75.2% in the 23 countries studied, which is 3.7% higher than in 2020 [47]. However, a significant proportion of the population is hesitant about vaccination and may show a similar attitude towards genetic certification. The introduction of COVID-19 vaccination certificates and varying degrees of freedom of movement depending on vaccination status also raise questions about the ethical aspects of introducing genetic passports. One potential issue could be the theoretical presence in a person of a genetically determined immunity to a disease, which could lead to greater freedom of movement and greater opportunities for a person with such genes [48]. The availability of information about a person’s “bad” and “good” genes rightly causes concern among the population, as this can manifest itself in the creation of unequal conditions in employment, health insurance tariffs, etc. [49]. Therefore, any decisions on the use of genetic technologies in relation to a person should take into account that person’s interests, the results of scientific research, and ethical standards [50].
It is necessary to find an optimal balance between the requirements of personal-data confidentiality, the need for scientific research, the development of personalized-medicine technologies, and the development of a business model that can support the exponential growth of genetic data in the future [51, 52].

7 ANALYSIS OF GENETIC DATA

The need to process large amounts of genetic data using modern digital technologies has become an incentive for information giants to cooperate with genetic programs and companies. Thus, back in 2015, the Broad Institute of MIT and Harvard, together with Google Genomics, developed an entire cooperation program for cloud access to genetic information, including for third-party researchers. In turn, a subsidiary of Google, which received the broader name of Cloud Life Sciences, today offers the ample opportunities of its servers to bioinformaticians in the field of data storage and processing. Its competitors are not far behind: Oracle Life Science has created ample opportunities for the transfer and storage of such discrete information as the results of clinical trials, including the storage of patient data; clients include Oyster Point Pharma and Pfizer. Accordingly, Microsoft Azure also has a similar service, offering “the acceleration of data processing for genomic analysis, precision medicine, clinical trials” by providing large-format storage, high-speed data processing, and support for bioinformatics. There is the formation of digital ecosystems of genetic information, which may lead to the monopolization of access to such information, similar to the current monopolization of social networks and other digital platforms.

Despite impressive growth rates, the process of the digitalization of genetic information is at the initial stage of its development. Currently, there is an accelerated growth of databases, characteristic of the stage of information accumulation, whereas tools for analyzing the information in databases and making decisions based on it are developing at a slower pace [53]. The main limitation is the lack of related information about the studied organisms, such as the presence of diseases, blood biochemical parameters, etc. As a result, data-analysis systems based on artificial intelligence (AI) can successfully analyze genetic information but can interpret it correctly only in certain cases [54, 55]. The second limiting factor is the need to standardize and validate recommendations made by AI. For example, in the field of medicine, experts need to understand how AI arrived at its solution to a problem (explainable AI) and whether this solution is an error that could lead to the death of a patient [56]. A number of countries are developing procedures to regulate the use of AI methods in medicine. For example, the US Food and Drug Administration published an action plan and requirements for developers of such technologies in January 2021 [57]. With the further development of AI methods and the expansion of biological-information databases, we can expect the accelerated development of research in this area and its ascent to a fundamentally new level.

8 DATA SECURITY

Widespread access to databases of genetic information and the results of scientific research leads to the mass development of amateur research in the field of molecular biology (biohacking). Examples of such research are the development of amateur vaccines against SARS-CoV-2 [58], anti-aging drugs [59], etc. This development of events is of concern to experts, since in a negative scenario this can lead to the creation of more advanced types of biological weapons and the emergence of a new type of “cyberbiological attack,” during which the infrastructure for the synthesis of nucleotide sequences can be hacked, and components for the synthesis of pathogens can fall into the hands of terrorists [60]. The published works on the experimental modification of dangerous viruses in the search for contagious forms to track probable natural mutations in this direction cannot but arouse concern among the population and the scientific community [61].

As a result, in parallel with the increase in the volume of digital genetic data, the volume of closed data is growing: genetic data described in publications are no longer deposited in the public domain. This applies not only to viruses and other pathogens but also to crops. For example, one publication of great scientific interest by an international team of authors, financed by funds from China, does not contain any open references to genomic data [62].

The question arises regarding the organizational form of large databases of genetic information containing sensitive information. For example, in France, a similar structure (France Genomique) was initiated by the state on the basis of the national institutes of health and medical research, agricultural research, and scientific research. Since January 2019, the parent and unifying organization has been the Commissariat for Atomic and Alternative Energy (CEA), created by Charles de Gaulle back in 1945. The existence of such a consortium under the leadership of an organization that has been responsible for decades for the French nuclear project is due to its managerial competencies, computing resources, and most importantly, the need to concentrate such an important national project as genomic research in one center [63].

9 NATURAL BIOLOGICAL RESOURCES AND DIGITALIZATION

Digitalization of the genetic information of natural ecosystems and organisms will play an important role in the development of measures for the conservation of endangered species of organisms, the study of natural biodiversity, the identification of new pathogenic organisms, and will serve as a source of information about new genetic sequences with potential use in biotechnology, agriculture, and medicine. In this direction, an important aspect is the detailed characterization of existing bioresource collections of microorganisms, plants, and animals. The digitalization of bioresource collections is already a significant driver of the growth of genetic information on a global scale. With the exception of work with pathogenic microorganisms, such research does not cause opposition from the general population. On the contrary, we are talking about the conservation of biodiversity, the conservation of endangered species, autochthonous animal breeds or crop varieties, which guarantees this process broad public and state support.

Also, the detailed characterization of bioresource collections plays an important role for the economy and an increase in the resilience of local communities in the face of climate change. An illustrative example is the work of the Ethiopian Biodiversity Institute (we are talking about one of the poorest countries in the world) [64]. The project, funded by the United Nations Global Environment Facility (GEF) and the World Bank, started in 1994 and focuses on the diversity of local crops (cereals, coffee, fruits, medicinal plants) cultivated in the traditional way by small farms. The cultures maintained in the collection are genotyped while being maintained as living cultivated collections located in six agroecological regions. Associations of nursery farmers were formed for each region, the local knowledge of farmers about their cultivars, methods of their cultivation and processing, and national selection was studied and documented.

National genetic banks of agricultural and wild crops also exist in a number of other countries, including Germany, Canada, and Brazil; in Belarus, the National Bank of Genetic Resources of Economically Useful Plants operates and is being replenished. A similar program is emerging in Russia: on March 30, 2023, the draft law “On Bioresource Centers and Biological (Bioresource) Collections” was submitted to the State Duma, introducing, in particular, the concepts of genetic resources and their national catalogs and collections, including for the purpose of regulating the digital genetic information received [65]. Given that Russia holds some of the largest bioresource collections in the world, such as the All-Russian Collection of Microorganisms of the National Research Center “Kurchatov Institute” (more than 20 000 samples) and the Collection of Plant Genetic Resources of the Vavilov All-Russian Institute of Plant Genetic Resources (more than 320 000 samples) [66], an explosive growth in the digitalization of genetic information is expected in Russia, which could make the NGID a dynamically developing database of genetic information in the long term, up to 2030.

10 CONCLUSIONS

Currently the process of the digitalization of biological information is in its early stages of development, characterized by the accelerated exponential growth of databases, which opens up great opportunities for the development of personalized medicine, biodiversity conservation, the development of biotechnology and agriculture. An urgent issue is the connection of AI technologies to the analysis of biological data, although its effectiveness has been proven only in certain areas where there is good data structuring. The availability of genetic and biological information in digital form and wide access to it open up new horizons for the development of synthetic biology, but at the same time provide opportunities for the development of dual-use technologies and biological terrorism. In view of this, it is necessary to develop mechanisms for monitoring and regulating the development of digital genetic and biological databases with broad involvement of the scientific community and business. It is urgent to develop standards for the safe use of digital genetic data, as well as the development of mechanisms for national and international control in this area, possibly similar to the International Atomic Energy Agency (IAEA). In this regard, the most important task is searching for the optimal balance between requirements for the confidentiality of personal data of the population, compliance with legal and ethical standards, the need for scientific research, the development of personalized medicine technologies and the development of business models and organizational and legal forms that can accompany further growth in the volume of genetic data.