Review

Systems biology is developing rapidly. It is an integrative research strategy designed to tackle the complexity of biological systems and their behavior at all levels of organization (from molecules, cells and organs to organisms and ecosystems) in normal and perturbed conditions. It is based on an understanding of biological functions as system properties that are different from those of the individual interacting components (reviewed in [1-9]). It integrates the mass of data that has been collected with various global measurement technologies (techniques that look at the complete set of genes, proteins or other features in an organism), in order to formulate predictive mathematical and computational models of functional and regulatory biological networks. Specific biological hypotheses can thus be tested by designing a series of perturbation experiments. It thus combines data-driven (bottom-up) [10] and model-driven (top-down) [11] approaches into a question-driven (middle-out) inquiry in search of basic principles [12-14]. In the end, systems approaches must be driven by high-quality hypothesis-driven biology and not just by data-accumulating technologies or high-performance computational modeling.

Although systems biology has ancient roots in physiology, biochemistry, and molecular and cellular biology, its current development is the result of recent advances in genomics and bioinformatics, which were made possible by the continuous development of high-throughput experimental and computational platforms. The field is also revisiting previous attempts at modeling biological complexity, by taking advantage of insights from system theory [15] and engineering sciences [16, 17]. It forms the basis for an extension of genetic engineering into synthetic biology: designing and building biological systems with new properties from modular components [18-21].

Evolution, development, physiology and disease are viewed in systems biology as dynamic processes that operate on widely different scales in space and time between biological states that are constrained by interrelationships among pathway and network components. In this context, detecting, understanding and treating disease translates into identifying and manipulating global perturbed networks rather than focusing only on unique failing components.

Here we review how medical genomics [22, 23], based on recent advances in high-throughput experimental and computational technologies, is evolving in the context of systems biology into a more prospective systems medicine [24-31]. This new kind of medicine will be able to overcome the current limitations of disease complexity (through stratification of patients and diseases by molecular diagnostics) and drug discovery (through the analysis and targeting of disease-perturbed networks) [32-35]. We discuss some of the technological, conceptual and organizational challenges that we will face in implementing this new vision and practice of biology and medicine, and we argue that it offers new opportunities to more efficiently tackle key medical problems in both developed and developing countries.

Technology is moving genomics from structure to function

The initial sequencing of the human genome was made possible by the automation of the DNA sequencing chemistry and by the development of data-acquisition tools and software for the reliable interpretation and assembly of the DNA sequence. It required a multi-billion-dollar investment and the participation of thousands of researchers in the public and private sectors over more than a decade. It came together with the sequencing of genomes in a variety of microorganisms, animals and plants. All these efforts combined served as test cases to trigger sustained technology developments.

With the next-generation DNA mega-sequencing technologies currently available, which enable the collection of billions of nucleotides in single instrument runs [36], it is now possible to sequence and assemble a human genome in a matter of weeks at a small fraction of the cost of the reference genome [37]. With both incremental progress and the introduction of third-generation sequencing technologies, it may soon become possible to collect large numbers of individual genomes in days for US$1,000 or less and ascertain their unique variations. This opens up the possibility of a Personal Genome Project to find correlations between genotypes and normal or diseased phenotypes [38]. In parallel, over a period of more than 30 years, successive generations of increasingly miniaturized DNA arrays have been used for expression profiling, benefiting from the extensive sequencing of partial and complete cDNA collections. Microarray technology, because of its intrinsic complexity and that of the transcriptome, has reached an intermediate stage of maturity compared with sequencing [39]; it is possible to detect variations in expression of many but not all gene transcripts under normal and perturbed conditions.

Early on, insufficient attention was paid by users of current microarray platforms to proper design and quality assessment, which is needed to control for variation in the large number of biological and experimental parameters involved. This compromised the usefulness of these platforms, for example in the development of classification and predictive biomarkers [40-42]. The introduction of standards and guidelines for complete microarray workflows [43] is helping to rectify the problem; these need to cover all aspects, from RNA integrity assessment [44, 45] to data analysis and reporting [46, 47]. There has also been constant progress in the use of advanced statistical methods for multivariate classification [48] and for gene-set enrichment analysis [49] of expression profiles. At the same time, it has become clear through the combination of tiling arrays and systematic sequencing that a larger fraction of the human genome is transcribed into diverse types of RNA than was previously thought [50, 51]. The increased power and reduced cost of deep sequencing thus mean that it is starting to compete with high-density microarrays [52, 53], to the extent that some believe this is marking the beginning of the end of microarrays [54].
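
As a rough illustration of what a gene-set test asks of an expression profile, the sketch below computes a one-sided hypergeometric over-representation p-value. This is a deliberately simpler stand-in, not the rank-based enrichment method cited in [49]; the function name, arguments and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_hits, n_overlap):
    """Probability of seeing at least n_overlap gene-set members among n_hits
    genes drawn without replacement from a universe of n_universe genes,
    n_set of which belong to the gene set (one-sided hypergeometric test)."""
    upper = min(n_set, n_hits)          # largest overlap that is possible
    total = comb(n_universe, n_hits)    # number of ways to pick the hit list
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 measured genes, a pathway of 100 genes,
# 200 differentially expressed genes, 8 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
```

A small p-value suggests that the pathway is over-represented among the differentially expressed genes; in practice such values would still need correction for the number of gene sets tested.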

However, given that each new generation of tools takes several years to mature, it is most likely that sequencing and microarrays will continue to coexist. Microarrays will probably be increasingly used for specialized applications, such as those related to transcription regulation, epigenetic modifications, and selection of subfractions of individual genomes for sequencing (for example, exons, highly conserved regions, and so on); whereas mega-sequencing will be used for deep exploration of transcriptomes. The results of transcriptome analyses will increasingly be validated by emerging technologies that use miniaturized high-throughput reverse-transcriptase PCR [55] or multiplex direct visualization and counting of RNA molecules; the latter technology has the added advantage of avoiding biases resulting from reverse transcription [56].

From the genome and transcriptome sequences, it has been possible to derive a relatively complete parts list of genes and, by extension, of proteins, thus revolutionizing the field of proteomics. It is crucial to note that mass spectrometry is effective in the identification of peptides, and not of complete proteins. In order to identify and quantify interesting proteins by mass spectrometry, either through shotgun or directed approaches, an investigator therefore needs to know the sequence of the peptides obtained by enzymatic digestion of those proteins. The current generation of proteomic tools that are based on high-performance combinations of chromatography and mass spectrometry thus enable the identification of a growing number of proteins, and can also identify them over a wide range of abundances and when they have complex secondary modifications. The technology can achieve this using fragmentation, peptide sequencing and, as noted above, comparison with proteins that have been predicted from genome and transcriptome sequences [57-59].
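
The dependence of peptide identification on predicted protein sequences can be sketched as an in-silico digestion followed by a lookup. The example assumes trypsin as the enzyme, with the commonly used rule of cleavage after K or R unless the next residue is P; the protein names and sequences are hypothetical, and real search engines additionally handle missed cleavages, modifications and mass matching.

```python
def tryptic_peptides(protein):
    """Split a protein sequence after each K or R that is not followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):            # C-terminal peptide without K/R
        peptides.append(protein[start:])
    return peptides


def build_peptide_index(proteins):
    """Map each predicted peptide to the set of proteins that could yield it."""
    index = {}
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            index.setdefault(peptide, set()).add(name)
    return index


# Hypothetical predicted proteins (toy sequences, not real database entries).
predicted = {"protA": "MKWVTFRPLLK", "protB": "AKRC"}
peptide_index = build_peptide_index(predicted)
```

An observed peptide sequence can then be looked up in `peptide_index` to recover candidate parent proteins, which is why incomplete genome or transcriptome annotation directly limits what can be identified.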

Recent results indicate that the description of complete reference proteomes is now within reach in advanced centers, using multiple reaction monitoring combined with mass spectrometry; this combination is the most powerful and rapid targeted approach currently available [60-62]. These complete proteomes will probably serve as a reference for the subsequent development of simpler targeted assays [63], which will be complemented by array-based global surveys using affinity-based protein-specific reagents [64-66], and in certain cases, by single-cell proteomics using high-speed flow cytometry [67, 68]. Furthermore, ongoing developments using nanomaterials are expected to provide next-generation proteomic analysis tools [69].

In addition to using chromatography and mass spectrometry, metabolomics is also taking advantage of nuclear magnetic resonance to analyze complex sets of metabolites in body fluids and tissues that reflect normal and disease states, and to study interactions with the gut microbial flora and environmental factors [70-72]. Increasing attention is being paid to lipidomics [73, 74] and glycomics [75, 76] as complementary sources of biomarkers.

The development of each of these global high-throughput technologies has triggered implementation of standard operating procedures, ontologies and quality-assurance pipelines for data collection and analysis using dedicated software and databases, and this has required a change in culture in biological laboratories [77-81]. In turn, the need for independent validation of the results obtained with these 'omics' technologies has stimulated the emergence of large-scale chemical-genetics and functional screens using cell microarrays and RNA interference [82-84].

Computational and mathematical tools empower systems biology

With the increasing availability of large amounts of data and curated information on all types of biological system components, the focus has progressively shifted to identifying the interactions they make, forming transient or permanent macromolecular structures with particular biological functions, and to looking at how the interactions can be represented computationally as metabolic, protein, microRNA and gene-regulatory networks [85, 86]. This emerging 'network biology' is taking advantage of advances in functional genomics, computational methods, computing power, and network and graph theories. It is reviving the advanced biochemistry that has been published in textbooks and illustrated in static wall charts for decades [87-89]. Network biology is revealing the existence of modular structures in biological networks that may explain the robustness of biological systems when they are exposed to changing environments [90-93].

The initial attempts to identify biologically relevant protein-protein interactions using the yeast two-hybrid technology were plagued by high rates of artifactual events. Thanks to methodological improvements, the rate of false positives has been reduced. The careful curation of these interactions from targeted assays reported in the literature has led to high quality but incomplete maps of the human interactome, which are now available and which are expected to be extended to more complete coverage in the future [94-96]. Given that biological networks change their architectures dynamically during biological processes, such as development, physiological responses and disease, their complete determination will continue to be an enormous scientific and technological challenge.

Similar progress is being made in assembling human signaling, metabolic and gene regulatory networks that are based on metabolites, RNA and microRNA expression, protein-protein and protein-DNA interactions [97-100]. This has required the development of standardized languages and software tools for graphical representation of molecular interaction maps and computation of predictive and dynamic models [101-105]. Integration methodologies have also been essential to combine diverse types of data that have been collected with different platforms and in many laboratories, and thus to generate testable hypotheses [106-110]. A limitation that is often overlooked is that the quality of the annotation resources is very variable [111-114]. This has triggered sustained community efforts for integrative annotation, which combine automated computation with human-supervised curation, the use of quality indices, text-mining tools, biological ontologies and the semantic web [115-120].

In general, the models derived from these integrated methodologies have not yet reached the level of detail and precision of those obtained through highly focused systems biology approaches, such as those that describe the transcriptional control in a free living microorganism under changing environmental conditions [121] or the early phases of development of the sea urchin [122]. It seems likely that the same operating principles of network structure and dynamics that have been revealed in these latter model systems will be relevant to human physiology and pathology [123].

In a parallel track, the Physiome Project is building on over half a century of molecular modeling of excitable cells that used ordinary and partial differential equations and is also using finite element lattices for geometric modeling of complete human organs. This project has steadily developed a computational physiology framework with its own modeling language [124, 125], and initial models of the beating heart, the contracting muscle and the breathing lung are already available [126, 127]. Cell and development simulation efforts use yet other types of modeling formalisms and languages, including Boolean networks, cellular automata and process algebra [128-135], and many others are being developed in computational neuroscience, which has yet to merge with systems biology [136].
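
Of the formalisms listed above, Boolean networks are the simplest to sketch: each component is either on or off, and a logical rule sets its next state from the current state of its regulators. The three-node circuit below is a hypothetical toy (A activates B, A and B jointly activate C, C represses A) and is not taken from any of the cited models; it uses synchronous updating, which is only one of several possible update schemes.

```python
def step(state, rules):
    """Synchronous update: every node is recomputed from the same previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def trajectory(state, rules, max_steps=50):
    """Iterate until a state repeats (or max_steps is reached).

    Returns the list of visited states and the index at which the
    attractor (fixed point or cycle) begins, or None if none was found."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state, rules)
    return seen, (seen.index(state) if state in seen else None)


# Hypothetical regulatory rules for a toy negative-feedback circuit.
RULES = {
    "A": lambda s: not s["C"],          # C represses A
    "B": lambda s: s["A"],              # A activates B
    "C": lambda s: s["A"] and s["B"],   # A and B jointly activate C
}

states, cycle_start = trajectory({"A": True, "B": False, "C": False}, RULES)
```

With these rules the starting state lies on a cycle, so the network oscillates rather than settling into a fixed point, which is the qualitative behavior expected of a negative-feedback loop.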

This diversity of approaches for modeling biological systems highlights the renewed importance of the contributions of mathematics, informatics and physics to systems biology [137-139]. Novel computational methods continue to be introduced, but because they are often based on distinct or incompatible principles, it is difficult or impossible to integrate them across the multiple levels of organization and time-scales characteristic of living systems [140-142]. Thus, multi-scale integration of different types of biological information (DNA, RNA, protein, networks, organelles, cells, tissues, organs, higher level phenotypes, and so on) remains a major challenge in systems biology. The plea for more theory by some of the founders of systems biology must be tempered by the fundamental need to have theories that closely reflect biological data through hypothesis-driven model testing [143]. Recent proposals based on allometric scaling [144] and scale relativity theory [8, 145] may provide the theoretical framework and mathematical tools required to overcome some of these limitations, and may reveal an important role for small fluctuations in driving the behavior of biological systems [146, 147].

The transition from medical genomics to systems medicine

With the availability of increasingly powerful high-throughput technologies, computational tools and integrated knowledge bases, it has become possible to establish new links between genes, biological functions and a wide range of human diseases [148-153]. This is providing signatures of pathological biology [154] and links to clinical research [155] and drug discovery [156, 157]. These are the hallmarks of systems medicine as it is emerging from the initial, more targeted efforts of medical genomics.

Success in the identification of mutations affecting the hundreds of genes involved in inherited disorders has been a major outcome of the first generations of genetic maps of the human genome. In contrast, the reported associations between genetic polymorphisms and common complex traits have rarely been confirmed in independent studies. The situation has changed in the past two years [158], with the availability of dense maps of single nucleotide polymorphisms and the adoption by the community of medical geneticists of consensus guidelines for the optimal design of genome-wide or targeted association studies, including rules for independent replication [159, 160]. Despite the very significant problems with signal-to-noise ratios that still severely limit the conclusions that can be drawn from such studies, progress has been made in identifying susceptibility loci involved in, for example, diabetes [161-163], obesity [164], and breast or lung cancer [165-167]. In the case of lung cancer, however, different scientific groups interpret the functional significance of the results differently. Further progress is expected now that the important role of other forms of genomic polymorphisms between individuals, including monozygotic twins, has been recognized; these include the effects of copy number variations and epigenetic modifications [168-170].

Taking advantage of expression-profiling surveys performed in extended human populations [171-174], systems biologists have started integrating physiopathology, network biology and DNA variations [175-177], providing novel insights into the mechanisms of various diseases, such as diabetes [178] and obesity [179]. Cancer, which can be considered as a prototypical systems disease, has benefited greatly from systems approaches and has served to a large extent as a test case to develop them [180-184]. This work has highlighted the importance of epigenetic variations in controlling transcriptional programs sustaining differentiation of normal and cancer stem cells [185, 186].

Transcriptome and proteome analyses of collections of cancer samples, combined with functional annotation and modeling of modulated molecular pathways and networks, have revealed useful biomarkers for the classification and diagnosis of cancer subtypes, the prognosis of patient outcomes, the prediction of treatment responses and the identification of perturbation targets for drug development [187-196]. As an illustration of the value of systems approaches, the predictive power and robustness of biomarkers can be significantly increased by integrating transcriptome profiles with interactome data to reveal more relevant functional subnetwork modules [197]. In a similar way, systems approaches are starting to have an impact on the study of immunological diseases [198], inflammation [199], infectious diseases such as tuberculosis [200], neurological diseases such as autism [201] and Alzheimer's [202], respiratory diseases such as asthma [203], cardiovascular and metabolic diseases [204-206] and many others. A common biological theme that emerges from many of these studies is the central role of the control and dysfunction of energy metabolism. This is illustrated in cardiac system bioenergetics by the Frank-Starling law of cardiac muscle contraction [207, 208], in cancer by the Warburg effect (the dependence of cancer cells on aerobic glycolysis) [209], and in neurodegenerative diseases and aging by increases of oxidative stress [210, 211].
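
The idea of combining expression profiles with interactome data to find subnetwork modules can be sketched with a greedy search. The version below is a simplified illustration and not the procedure of [197]: it assumes each gene already has a differential-expression z-score, scores a module as the sum of its z-scores divided by the square root of its size, and grows the module along interaction edges while that score improves. Gene names, z-scores and interactions are hypothetical.

```python
from math import sqrt


def module_score(genes, z):
    """Aggregate z-score of a gene set: sum of z-scores / sqrt(set size)."""
    return sum(z[g] for g in genes) / sqrt(len(genes))


def grow_module(seed, neighbors, z):
    """Greedily add the interacting gene that most improves the module score."""
    module = {seed}
    best = module_score(module, z)
    while True:
        # Genes that interact with the current module but are not yet in it.
        frontier = {n for g in module for n in neighbors.get(g, ())} - module
        candidates = [
            (module_score(module | {n}, z), n) for n in frontier if n in z
        ]
        if not candidates:
            break
        score, gene = max(candidates)
        if score <= best:               # no neighbor improves the score: stop
            break
        module.add(gene)
        best = score
    return module, best


# Hypothetical z-scores and interaction map.
Z = {"g1": 2.0, "g2": 1.8, "g3": -0.5, "g4": 2.2}
INTERACTIONS = {"g1": ["g2", "g3"], "g2": ["g1", "g4"], "g3": ["g1"], "g4": ["g2"]}

module, score = grow_module("g1", INTERACTIONS, Z)
```

In this toy case the search keeps the three coherently up-regulated, connected genes and leaves out the interacting gene whose expression does not change in the same direction. Scoring a connected module rather than single genes is what is expected to make such markers more robust across patient cohorts.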

When East and West, North and South meet to develop systems medicine

Systems approaches are likely to help elucidate the mechanisms underlying the fundamental biological processes perturbed in human diseases and, in doing so, enable more efficient therapeutic interventions. They will change how drug targets are identified. Novel treatments will include multiple drugs interacting with key interconnected components within functional network modules, each contributing a fraction of the effects of perturbations that cause disease. It is likely that they will be effective only when combined with the multiple interactions of other drugs. This reflects the way that biological systems function and are organized to maintain themselves and constantly adapt to developmental, environmental, physiological or pathological changes. It is also reminiscent of the principles underlying traditional medicines developed empirically within Chinese or Indian cultures for the past several thousand years. Initial attempts at systems approaches, using transcriptome and proteome analyses to study the synergistic effect of combining Western drugs with Chinese medicine components in the treatment of leukemia, are starting to bear fruit [192, 212, 213]. Similarly, metabolome studies are being used to analyze the composition of herbal medicines and explain their properties [214], and to establish how gut microorganisms modulate human metabolic phenotypes and respond to the health or disease state of their host [215].

Systems approaches are also providing evidence on the effects of stress, relaxation, nutrition and lifestyle on the course of health and diseases [216, 217]. Systems studies need to pay greater attention to gender, age and time differences in diet, disease development and treatment administration and responses [218-221]. These factors can be monitored, for example, using non-invasive metabolomics surveys of urine [222, 223], and they will increasingly also be monitored using molecular fingerprints of blood proteins that indicate relevant physiological or disease states. Other important contextual phenomena that also need to be taken into account in future studies include the effects of the mother's genetic makeup or feeding habits on the development of the fetus and the timing of its biological clock, which have been observed in animal models [224, 225], and the central role of the major histocompatibility complex in the development and control of disease through immunity and inflammation [226-229].

Thus, systems biology will provide the foundation for a prospective medicine that will be predictive, personalized, preventive and participatory [230], and that takes into account the multiple components of the healthcare system, including disease outcomes as reported by the patients themselves, and public and private organizations involved in healthcare management [231]. In addition to genomics and systems biology, the key components that will ensure the successful development of systems medicine are the modeling of physiopathology in a clinical-practice context [232], imaging [233], and bio-banking that complies with strictly enforced ethical regulations [234-236]. These intrinsically interdisciplinary endeavors will require dedicated centers and networks in which scientists of all disciplines can work together [237-239], with careful attention to clinical practice and education [240, 241].

In order to implement this vision, academia and industry will have to work closely together in an open-access and open-source environment focusing on the initial, pre-competitive phase of the drug discovery process. This will enable the subsequent development of valuable intellectual property that will result in more effective diagnostic and therapeutic approaches. Such developments might seem very far from the priorities of the less developed countries, in which the majority of the population is excluded from basic medical care. These countries are facing major challenges to their ability to fight infectious diseases and malnutrition, a situation aggravated by the shortage of safe drinking water and economic poverty [242]. International initiatives are underway to tackle these challenges in global health, such as support for engagement of communities in research and formulation of a research and development treaty that will redefine the rules for clinical trials and management of intellectual property rights [243, 244].

Strategic partnerships, such as the Systemoscope Consortium, propose guidelines and strategies for 'rethinking research, understanding life, improving health' [245]. We support the view that leaders of the developing countries should consider establishing integrative systems biology and medicine centers networked with those emerging in the developed countries. Implementing such centers at the heart of their much-needed healthcare infrastructures would ensure immediate access to the most advanced technologies, and allow developing countries to build an essential knowledge base centered on the analyses of their populations. These centers would provide a route to the adequate healthcare that is required to reduce the ever-growing gap between the developed and developing nations.