God does not care about our mathematical difficulties. He integrates empirically.

—Albert Einstein

Introduction

The debate between reductionism and holism is age-old in all realms of science. Reductionism explains a system’s behavior in terms of the individual factors responsible for it. Holism, in contrast, considers the behavior of the system in its totality and hence favors an integrative approach to understanding it. The concept of holism can be traced back to Aristotle’s Metaphysica, which says “The whole is greater than the sum of its parts”.

Science has been rooted in reductionism since its infancy. Reductionist thinking still holds significant influence on science, including biology, especially after the rise of molecular biology, which interpreted life, including heredity, as being guided by molecular means. Molecular biology is grounded in reductionist views wherein the complexity of biological systems is explained in terms of the physicochemical properties of the individual components. This reductionist understanding traces back to the exemplary one-gene one-polypeptide hypothesis of Beadle and Tatum, who demonstrated the direct relationship of genotype to phenotype at the molecular level. Technological advances in DNA sequencing and other analytical platforms have resulted in an exponential increase in molecular data (Ahmadian et al. 2006; Lister et al. 2009). In this context, the main challenge confronting the field is not to look back (incorporating previous findings is critical but will be comparatively easy) but to look forward to how one might plan and interpret the enormous new data that will soon be generated. As far as the interpretation of data into meaningful information is concerned, bioinformatics has been crucial in every aspect of omics-based research for managing various types of genome-scale datasets efficiently and extracting valuable knowledge (Shinozaki and Sakakibara 2009).

However, with the passage of time, the limitations of the reductionist molecular approach have become increasingly evident. It became clear that biological systems cannot be explained at the genetic level alone. Instead, they should be understood as complex systems resulting from dynamic interactions of different components at different levels, each individually functioning as a whole, which eventually control the phenotype. Complex systems exist at different levels of biological organization, ranging from the subatomic realm to individual organisms to whole populations and beyond. Hence a need arose for an integrative framework that provides a holistic understanding of biological systems. This new realm of science, in the post-reductionist era, is called systems biology. The incorporation of systems theory into the biological sciences was first proposed in the 1940s by the biologist Ludwig von Bertalanffy (1973) and was further developed in the 1950s by Ashby (1956).

Systems biology is an approach by which a system of interacting units is analyzed as a whole rather than by analyzing its individual constituents separately. Currently, three types of systems approaches exist: “top-down”, which entails the use of high-throughput ‘omics’ technologies to gain a holistic understanding of biological systems (Ideker et al. 2001); “bottom-up”, which starts with molecular properties and leads to the derivation of models to be tested and validated (Hartwell et al. 1999); and “middle-out”, which starts somewhere between the top and bottom levels and then works outward toward a hierarchy of models (Noble 2002). Among these, the ‘top-down’ and ‘bottom-up’ approaches have been most commonly used to describe biological systems (Fig. 1). Systems biology has enabled the maturation of science from a data collection enterprise to an explanatory endeavor.

Fig. 1 Types of modeling approaches in systems biology

Biological systems and systems biology of plants

Biological communities, differing in their form, structure and habitat, extend across the entire biosphere. Complexity and robustness are the key features of biological systems. Robustness plays a key role in maintaining the appropriate functioning of a complex system despite internal or external perturbations (e.g., stochastic noise, physiological and developmental signals, environmental change and genetic variation) (Stelling et al. 2004; Kitano 2007; Coruzzi et al. 2009). A careful examination of biological systems is indispensable for predicting the effects of developmental programs and of natural or human-induced perturbations on their composition and function. A key challenge is to interrogate a complex system at the molecular level while maintaining the global perspective needed to understand how the system functions. The levels of organization of biological systems include molecules, biochemical pathways, organelles, cells, tissues, organs and whole plants. Systems biology focuses on integration rather than separation of the parts below the cellular level, and hence biological systems extend to levels such as the genome, transcriptome, proteome, metabolome, interactome and cellome (Ideker et al. 2001; Kitano 2002). To gain a comprehensive picture, networks at different levels of biological organization must be integrated using experimental and computational modeling approaches (Keurentjes et al. 2011).

Plant systems biology

Plant biology has the potential to provide partial solutions to several of the most daunting problems faced by our planet in the twenty-first century, viz. the increasing scarcity of food, the depletion of global oil reserves and a shortage of freshwater. Plants are well suited to systems analyses because they are easy to handle, produce ample offspring for genetic analyses and, in many cases, have comparatively short generation times. Hence, modeling of plant biological phenomena can be accomplished in a quick, iterative way using the abundant molecular tools available (Keurentjes et al. 2008). The increasing interest in systems biology is driven by progress in high-throughput genomics and proteomics technologies, which allow researchers to build ample datasets relevant to differing plant responses (Minorsky 2003). Systems biology aids our understanding of plants through holistic systems approaches (Ideker et al. 2001; Kitano 2002).

Plant systems biology resides at the intersection of physiological, morphological, molecular, biochemical and genetic information applied to plant biology (Trewavas 2006). The need for the subject arose from the necessity to integrate and interpret large datasets from high-throughput genomic, transcriptomic, proteomic and metabolomic technologies. Despite the enormous efforts invested in generating huge experimental datasets, the development of computational platforms to analyze and decipher the data in the form of biological models remains scanty (Kirschner 2005; Yuan et al. 2008). Thus, plant systems biology encompasses the study of plants in response to biological, genetic or chemical perturbations, which includes monitoring of genes, proteins and biochemical pathways; application of omics approaches; integration of the data; and formulation of mathematical models that describe, in a biologically meaningful form, the structure and function of the system in response to the individual perturbations (Fig. 2).

Fig. 2 The schematic diagram of a typical plant systems biology approach

The ‘parts’ of plant systems biology

Genomics

The ‘genome’ comprises the complete genetic material of an organism. Genomics is the study of the organism’s whole genome. It lies at the base of the hierarchy of complex plant systems, provides a basis for explaining the organism’s behavior, and is hence a scientific discipline in its own right. The earlier molecular approaches to resolving ambiguities in plant identification and discrimination included techniques such as PCR, RFLP, AFLP, RAPD and sequencing (Titanji et al. 2007; Saliba-Colombani et al. 2000). The era of single-gene sequencing marked the beginning of plant genomics, followed by whole genome sequencing, single nucleotide polymorphism (SNP) and medium-density arrays, and eventually the current whole genome resequencing. Genomics is thus a prerequisite for understanding the role of genes in the development of an organism, and it acts as a driving force toward the systems biology approach for gaining insight into the organism in its totality.

The sequencing applications known today fall into one of two categories: de novo sequencing, wherein sequence reads are obtained from an unknown sequence and then assembled; or resequencing, wherein reads are mapped/aligned to a known reference sequence. De novo applications are far more labor- and cost-intensive than resequencing ones. Major resequencing applications include polymorphism discovery and transcription profiling. State-of-the-art genome sequencing, involving several path-breaking technologies, has enabled us to gain insight into the gene complement of the biosphere. Genome-scale technological advances have enabled whole genome sequencing of complex plant systems at lower costs. Publication of the first complete genome sequence of Arabidopsis thaliana in 2000, and subsequently that of the first monocotyledonous plant, Oryza sativa, in 2005, revolutionized research in plant genomics. These genomes were sequenced with the traditional clone-by-clone strategy, which involved sequencing overlapping bacterial artificial chromosome (BAC) clones selected from a physical map using the Sanger dideoxynucleotide chain termination method. The next approach introduced was whole genome shotgun (WGS) sequencing, which was used to produce the draft sequences of many plants such as Sorghum bicolor (Paterson et al. 2009) and Vitis vinifera (Jaillon et al. 2007). Since 2005, new high-throughput and cost-efficient sequencing platforms have emerged, the so-called ‘Next Generation Sequencing (NGS)’ technologies. These include the 454 FLX (Roche) (Margulies et al. 2005), the Genome Analyzer/HiSeq (Illumina Solexa) (Bennett et al. 2005) and the SOLiD (Life Technologies), as well as newer platforms such as the Heliscope (Helicos) (Milos 2008) and PacBio RS (Pacific Biosciences) (Eid et al. 2009) for single-molecule sequencing, and the Ion Torrent (Life Technologies), which is based on a semiconductor chip (Rothberg et al. 2011). A good number of plant genomes have recently been sequenced using NGS technology (Huang et al. 2009; Sato et al. 2011; Shulaev et al. 2010; Wang et al. 2011; Xu et al. 2011). Examples of draft plant genome sequences include those of cucumber (Cucumis sativus) and cassava (Manihot esculenta) (Feuillet et al. 2011).
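
As a toy illustration of the two modes described above, the following Python sketch contrasts de novo assembly (merging reads by their overlaps) with resequencing (placing a read on a known reference and reporting mismatches as candidate polymorphisms). The function names, the greedy merging rule and the parameters `min_len` and `max_mismatches` are assumptions made for this illustration only; they are not taken from any of the platforms or tools cited here, which rely on far more elaborate methods and must cope with sequencing errors, repeats and both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """De novo mode: repeatedly merge the pair of reads with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


def map_read(reference, read, max_mismatches=1):
    """Resequencing mode: best ungapped placement of `read` on `reference`.

    Returns (position, mismatches), where mismatches is a list of
    (reference_index, reference_base, read_base) tuples, or None if no
    placement has at most `max_mismatches` differences.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        diffs = [(pos + i, r, q) for i, (r, q) in enumerate(zip(window, read)) if r != q]
        if len(diffs) <= max_mismatches and (best is None or len(diffs) < len(best[1])):
            best = (pos, diffs)
    return best


if __name__ == "__main__":
    reference = "ATGGCGTACGTTAGC"  # hypothetical sequence
    print(greedy_assemble(["ATGGCGTA", "CGTACGTT", "CGTTAGC"]))
    print(map_read(reference, "GTACCTT"))
```

The de novo routine compares every pair of reads in each round, whereas the resequencing routine makes a single pass along the reference. Even in this toy form, the contrast hints at why de novo applications are the more demanding of the two.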

The main challenges in whole genome sequencing of plants are genome size and complexity (in the form of varied ploidy levels). Plant genome size (both the number of chromosomes and the total number of nucleotide base pairs) exhibits the greatest variation among forms of life, with the average representative size of a plant genome being ~6 Gb (Gregory et al. 2007), far larger than the average size of genomes sequenced so far from other organisms. Another hurdle is the heterozygosity observed in a variety of plant species; to address it, homozygous derivatives are selected for sequencing, and their sequences then act as templates for mapping the variation observed in the heterozygotes (Jaillon et al. 2007). Other factors affecting the success rate include the presence of repetitive and transposable elements (Ming et al. 2008) as well as high-copy chloroplast and mitochondrial genomes (Schnable et al. 2009), which complicate the de novo assembly of plant genome sequences and skew the coverage levels. Despite these limitations, plants do offer some advantages, such as the ability to be clonally propagated, which effectively immortalizes the genotypes of interest. Various de novo assemblers for next-generation sequencing platforms are listed in Table 1 (Imelfort and Edwards 2009; Zhang et al. 2011).

Table 1 List of de novo genome assembly software tools for plant genomics

The final step in genomics is the annotation and deposition of sequence data in public databases. These technological attainments have increased the need for improved bioinformatics capabilities to mine complex data for valuable biological information. The current databanks [e.g., NCBI GenBank, EMBL (European Molecular Biology Laboratory), DDBJ (DNA Data Bank of Japan)] contain a wealth of ~7.4 million plant DNA sequences, but relatively few whole organellar genomes and even fewer nuclear genomes.

Genome-wide association studies (GWAS)

The interaction of individual genes with the environment enables an organism to produce a phenotype. Many phenotypes are quantitative in nature and complex in etiology, with multiple environmental and genetic causes. There are innumerable examples of quantitative traits in plants, viz. plant yield, flowering time, sugar content, disease resistance and fruit weight, which result from the segregation of many genes and are influenced by environmental interactions (Paran and Zamir 2003). The classical approach to studying complex quantitative trait loci (QTLs) was linkage analysis. In this approach, polymorphisms are identified in two parents and then followed in a large segregating population. Linkage can be inferred from the identification of recombinants showing phenotypic differences from the parents. The limitation of this approach was that the resultant QTLs were restricted in allelic diversity and had limited genomic resolution (Borevitz and Nordborg 2003). Another approach to mapping QTLs is association mapping, which resolves complex trait variation down to the sequence level by exploiting historical and evolutionary recombination events at the level of whole populations (Nordborg and Tavaré 2002). Association mapping can be carried out at the candidate gene level or as a genome-wide study involving multiple gene associations. Current advances in NGS technologies and the decrease in associated costs have enabled a paradigm shift from candidate gene-based studies to the comparatively new approach of genome-wide association studies (GWAS) to map plant QTLs efficiently (Rafalski 2010; Schneeberger and Weigel 2011). Genome-wide association mapping, or a genome scan, investigates genetic variation across the whole genome to find signals of association for various complex quantitative traits.
The main advantages of GWAS over classical linkage mapping include: (1) increased mapping resolution, (2) reduced research time and (3) greater allele number (Yu and Buckler 2006).
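
The idea of scanning markers for signals of association can be sketched in a few lines of Python. The example below is a minimal illustration, assuming a simple per-marker linear regression of a quantitative trait on allele dosage (0, 1 or 2 copies of an allele); the function name `single_marker_scan`, the marker names and the data are hypothetical. It deliberately ignores population structure, kinship and multiple-testing correction, all of which a real GWAS would have to address, and it is not the method of any study cited here.

```python
from math import copysign, sqrt


def single_marker_scan(genotypes, phenotype):
    """Regress the phenotype on allele dosage at each marker.

    genotypes: dict mapping a marker name to a list of dosages, one per plant.
    phenotype: list of trait values in the same plant order.
    Returns (marker, slope, t_statistic) tuples, strongest signal first.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    results = []
    for marker, dosage in genotypes.items():
        mean_x = sum(dosage) / n
        sxx = sum((x - mean_x) ** 2 for x in dosage)
        if sxx == 0:
            continue  # monomorphic marker: carries no information
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosage, phenotype))
        # Standard error of the slope from the residual variance (n - 2 d.f.).
        se = sqrt(rss / (n - 2) / sxx)
        if se > 0:
            t_stat = slope / se
        else:
            # Perfect fit: no residual variance left.
            t_stat = 0.0 if slope == 0 else copysign(float("inf"), slope)
        results.append((marker, slope, t_stat))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical panel of six plants scored at three markers.
    genotypes = {
        "snp_a": [0, 0, 1, 1, 2, 2],
        "snp_b": [0, 1, 2, 0, 1, 2],
        "snp_c": [1, 1, 1, 1, 1, 1],
    }
    phenotype = [10.1, 9.9, 12.0, 12.2, 13.9, 14.1]
    for marker, slope, t_stat in single_marker_scan(genotypes, phenotype):
        print(marker, round(slope, 3), round(t_stat, 2))
```

Ranking by the absolute t-statistic, rather than by a p-value, keeps the sketch free of external dependencies.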

The GWAS strategy comprises the single or combined use of high-throughput NGS techniques, whole genome oligonucleotide arrays and genotyping-by-sequencing approaches to elucidate complex phenotypic variation in plants. The GWAS approach has been successfully exploited in many plant species, such as Arabidopsis (Atwell et al. 2010), rice (Huang et al. 2010b), maize (McMullen et al. 2009), barley (Pasam et al. 2012) and sorghum (Morris et al. 2013). In several plant species, diverse germplasm panels are being established for whole-genome association analysis (Caldwell et al. 2006; Hamblin et al. 2006; Nordborg et al. 2005; Yu and Buckler 2006).

The technological platforms suited to GWAS include the 454-GS FLX and the Illumina 1G Genome Analyzer, which identify SNPs through short resequencing reads of alleles from different individuals. Despite these innovations, GWAS have given rise to several systems-level bioinformatics challenges: modeling complex genotype–phenotype relationships by computational means, interpreting genetic associations using biological resources, and developing powerful yet user-friendly software that bridges genome biologists and bioinformaticians.

Epigenomics

Phenotypic variation is usually attributed to genetic variation in nucleotide sequences arising during the course of evolution. Candidate gene and genome-wide analyses are imperative for mapping the genotype–phenotype relationship. However, numerous biological phenomena, such as paramutation (Brink 1958), parental imprinting (Kermicle 1970), control of transposon activity (McClintock 1984) and transgene silencing (Baulcombe 2004; Matzke et al. 2009), could not be explained by the classical Mendelian notion of inheritance, and were described long before their molecular details were known. These phenomena contravened the Mendelian laws and appeared to follow a non-Mendelian mode of inheritance (Grant-Downton and Dickinson 2005). Intensive efforts to find the reasons and mechanisms underlying these apparent ‘exceptions’ to Mendelian suppositions have shaped the new field of ‘epigenetics’ (literally meaning ‘above genetics’).

The concept of “epigenesis” dates back to Aristotle, and in 1942 the British developmental biologist Conrad H. Waddington coined the term “epigenetics” (Waddington 1942). In current usage, epigenetics is the study of heritable changes in gene expression and function that cannot be explained by changes in DNA sequence (Bird 2007). The epigenome refers to the description of the various epigenetic regulators across the whole genome (He et al. 2011). In contrast to the genome, which is identical in all cell types throughout life, the epigenome is dynamic and varies between cell types and developmental stages, or in response to environmental stimuli.

Plants, being sessile autotrophs, demonstrate a high degree of developmental plasticity, to survive stressful environmental changes. Recent studies show that various epigenetic mechanisms underlie the survival strategies of plants under environmentally harsh conditions.

Epigenetic pathways are important components of the processes that regulate plant growth, development and reproduction. Various physiological processes in plant development, including flowering time, gametogenesis, stress response, light signaling and morphological change, are modulated directly or indirectly by epigenetic marks. DNA methylation, histone modifications and non-coding RNA-based mechanisms are the chief modes by which ‘epigenetic’ information can be encoded in plants.

DNA methylation

Cytosine methylation, i.e., the chemical modification of cytosine bases by addition of a methyl group at the 5-carbon position of the ring, is a common epigenetic mark of the plant genome. The patterns of cytosine methylation are varied and have been reviewed in detail by Castiglione et al. (2002).

These patterns of cytosine methylation are highly dynamic and can change significantly with both the developmental state and environmental perturbations (e.g., Burn et al. 1993). DNA methylation is particularly complex in plants with respect to the sequence sites that are methylated. It can take place at symmetric sites (CpG and CpNpG, where N is any nucleotide) as well as asymmetric sites (any other cytosine) in the genome. The CpG and CpNpG sites are called symmetric because the sequence is self-complementary, with methylatable cytosines paired on opposite strands. Asymmetric methylation sites are found less frequently in plants than their symmetric counterparts.
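
The three sequence contexts can be made concrete with a short Python sketch that classifies each cytosine on one strand by the bases that follow it. The function names and the convention of checking CpG before CpNpG (so that N is effectively any base other than G in the second class) are assumptions made for this illustration; cytosines too close to the end of the sequence to be classified are reported as "unknown".

```python
def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index `i` of the 5'->3' sequence `seq`."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position does not hold a cytosine")
    following = seq[i + 1:i + 3]  # up to two bases downstream
    if following[:1] == "G":
        return "CpG"
    if len(following) < 2:
        return "unknown"  # not enough downstream sequence to decide
    if following[1] == "G":
        return "CpNpG"
    return "asymmetric"


def context_counts(seq):
    """Count the cytosines of `seq` in each methylation context."""
    counts = {}
    for i, base in enumerate(seq.upper()):
        if base == "C":
            context = cytosine_context(seq, i)
            counts[context] = counts.get(context, 0) + 1
    return counts


if __name__ == "__main__":
    print(context_counts("CGCAGCTT"))  # hypothetical sequence
```

Only one strand is inspected here; cytosines on the complementary strand could be handled by applying the same function to the reverse complement.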

The techniques for comprehensive study of DNA methylation on a genome-wide scale include bisulfite sequencing, restriction endonuclease digestion coupled to microarray technology, and affinity purification (Beck and Rakyan 2008; Suzuki and Bird 2008). In bisulfite sequencing (BS-Seq), bisulfite treatment converts unmethylated cytosines to uracils but does not alter methylated cytosines, so that cytosines still read as cytosine after sequencing can be scored as methylated. More recently, single-molecule-based methods such as nanopore sequencing (Clarke et al. 2009) have been used to sequence the DNA methylome directly, without bisulfite treatment. The activities of methylation-sensitive restriction endonucleases such as HpaII and SmaI are inhibited by methylated cytosines, thereby allowing differentiation between methylated and unmethylated DNA fragments (Zilberman and Henikoff 2007). The combination of methylation-sensitive enzymatic digestion with microarray technologies was used in early cases of plant DNA methylome profiling (Tompa et al. 2002; Tran et al. 2005a, b). Affinity purification involves immunoprecipitation of DNA with an antibody that specifically recognizes methylated cytosine (mCIP). Recently, this technique has been combined with high-throughput sequencing (mCIP-Seq) to profile the DNA methylome in Arabidopsis (Gehring et al. 2009) and rice (Yan et al. 2010). BS-Seq is presently the most widely used technology for DNA methylation analyses in plants (Cokus et al. 2008; Feng et al. 2010; Hsieh et al. 2009; Lister et al. 2008; Zemach et al. 2010).
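
The logic of bisulfite-based scoring can be illustrated with a small simulation. The Python sketch below assumes an idealized experiment: complete conversion, no sequencing errors, and a read that is already aligned to the same strand of the reference without gaps. The function names are hypothetical, and uracil is written as T because that is how it is read after amplification and sequencing.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate the read obtained after bisulfite treatment and sequencing.

    Unmethylated cytosines become T (via uracil); cytosines whose 0-based
    index is in `methylated_positions` are protected and remain C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference, converted_read):
    """Score every reference cytosine from an aligned, converted read."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = "methylated"
        elif obs == "T":
            calls[i] = "unmethylated"
        else:
            calls[i] = "ambiguous"  # neither C nor T: mismatch or error
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"  # hypothetical fragment, methylated at index 1
    read = bisulfite_convert(reference, methylated_positions=[1])
    print(read)
    print(call_methylation(reference, read))
```

In real data each cytosine is covered by many reads, and the methylation level is estimated as the fraction of reads that retain a C at that position.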

Histone modifications

The histone proteins (H2A, H2B, H3 and H4), involved in the compaction of chromosomes, can be covalently modified by post-translational modifications such as methylation, acetylation, ubiquitination, phosphorylation, biotinylation and ADP-ribosylation. These modifications are thought to determine the accessibility of DNA to transcriptional regulators (Bird 2001; Schones and Zhao 2008). Histone acetylation generally creates an accessible chromatin conformation, while histone deacetylation, often coupled to histone methylation, produces a condensed chromatin conformation that promotes silencing and the formation of heterochromatin (Berger 2002). Histone methylation can confer either an active or a repressed transcriptional state depending on which lysine is methylated: methylation at histone 3 lysine 9 (H3K9), H4K20 and H3K27 is associated with silencing, while H3K4 methylation produces active chromatin (Feng and Jacobsen 2011). Histone modifications and DNA methylation are often intertwined; each epigenetic mark can influence the other’s recruitment to reinforce differential epigenetic states (Tariq and Paszkowski 2004; Cedar and Bergman 2009). Histone modifications can be studied using variants of the traditional chromatin immunoprecipitation (ChIP) technology, which probes histone–DNA interactions in vivo to determine the genomic location of chromatin-associated proteins (Solomon et al. 1988). The enriched DNA from a ChIP assay can be examined by genomic tiling microarray hybridization (ChIP-chip) or by sequencing-based technologies (ChIP-Seq), enabling genome-wide analyses of histone modifications (Mendenhall and Bernstein 2008; Park 2009; Schones and Zhao 2008).

Non-coding RNA

Small non-coding RNAs (smRNAs; miRNAs, siRNAs, snoRNAs and rasiRNAs), about 21–30 nucleotides long, play crucial epigenetic regulatory roles at different developmental stages in plants and in response to environmental perturbations (Bonnet et al. 2006). In recent years, smRNAs have been found to play a plethora of roles in developmental biology, plant physiology and genome stability as adaptive responses to environmental stimuli (Mirouze 2012). They also control the movement of transposable elements at the transcriptional and post-transcriptional levels (Vaucheret 2006). Genome-wide profiling of smRNAs has been carried out with the massively parallel signature sequencing (MPSS) approach (Lu et al. 2006; Nobuta et al. 2007). Briefly, this technique involves smRNA purification, adaptor ligation, reverse transcription and amplification of the cDNA products, followed by high-throughput sequencing (Wang et al. 2009a).

High-resolution profiling of the epigenome has uncovered genome-wide combinatorial interactions of DNA methylation, histone modifications and siRNAs, with complex relationships to chromatin accessibility and mRNA transcription in plants (Elling and Deng 2009). Recent technical advances in deciphering epigenetic mechanisms have started to convert epigenetic research into a high-throughput enterprise, to which bioinformatics is expected to make significant contributions. The Epigenomics of Plants International Consortium web site (https://www.plant-epigenome.org/) provides hyperlinks to plant epigenome data resources. Other web-based epigenetic resources for Arabidopsis research include SIGnAL (Zhang et al. 2006) and Epigara (Bouyer et al. 2011). Various bioinformatics tools are available for analyzing epigenetic data (Bock and Lengauer 2008), for example for ChIP-chip (Ringo, Tilescope), ChIP-seq (BLAT, BLASTn, ELAND, Solexa Analysis pipeline, SXOligosearch), CpG sequence prediction (Glimmer, GrailEXP) and bisulfite sequencing (ESME, BiQAnalyzer).

Transcriptomics

The term ‘transcriptome’ encompasses all parts of the genome that are expressed as RNA transcripts, including coding (mRNA) and non-coding (e.g., tRNA, miRNA) RNAs, at a given time in a cell or population of cells under a given set of environmental conditions (Wang et al. 2010). Traditional transcriptional analyses relied on northern blots, but the advent of high-throughput technologies has enabled us to elucidate the entire transcriptome of model and non-model plants. The modern methods of transcriptome elucidation include microarray analysis and NGS technologies. Microarray technology has been used for many years and has matured, advancing our insight into gene regulatory networks and their behavior under varying environmental conditions. In particular, the ATH1 Genome Array developed by Affymetrix™ has been extensively used for transcriptional studies in Arabidopsis (Busch and Lohmann 2007). Microarray technology has also been deployed to characterize the transcriptomes of other plant species, including maize (Zhu et al. 2009), rice (Hazen et al. 2005), barley (Delp et al. 2009), soybean (O’Rourke et al. 2009) and tomato (Auge et al. 2009). The success of microarray techniques and their extension toward other ‘omics’ approaches have been reviewed in Prowar (2012) and Tohge and Fernie (2012). A variety of bioinformatics tools and resources have emerged for microarray-based plant research, viz. Genevestigator (Zimmermann et al. 2004), NASCArrays (Craigon et al. 2004), ArrayExpress (Parkinson et al. 2005), the Gene Expression Omnibus (Edgar et al. 2002) and the Stanford Microarray Database (Ball et al. 2005).

A better alternative to microarray technology is sequence-based transcriptome profiling. Its significant advantages are, first, the potential to quantify transcript abundance and, second, independence from the availability of a sequenced genome. RNA-seq, which applies high-throughput NGS technologies, is a popular approach to collecting and quantifying large-scale sequences of coding and non-coding RNA pools at low cost (Wang et al. 2009b; Garber et al. 2011). The two main NGS platforms that have been used successfully for transcriptomic studies of non-model plants are Roche/454 (Margulies et al. 2005) and Solexa/Illumina (San Diego, California, USA). This approach has been used for the swift development of genomic resources in applied and emerging plant species (Zenoni et al. 2010; Gowik et al. 2011). The majority of RNA-seq projects have utilized a reference genome for the identification of splice variants, as in Arabidopsis and rice (Filichkin et al. 2010; Lu et al. 2010). Where reference genomes are unavailable, however, de novo assembly from RNA-seq data is used to provide transcriptional profiling in new species (Fu et al. 2011; Su et al. 2011; Wong et al. 2011). Apart from mRNA-based transcriptome studies, NGS technologies have also been exploited in studies of non-coding smRNAs in plants such as Solanum lycopersicum (Moxon et al. 2008), Medicago truncatula (Szittya et al. 2008; Lelandais-Brière et al. 2009) and rice (Zhou et al. 2010).

Various bioinformatics tools aid de novo assembly in RNA-seq based applications, viz. CAP3 (Huang and Madan 1999), CLCbio Genomics Workbench (CLC Bio), gsAssembler (454 Life Sciences), MIRA (Chevreux et al. 2004), TGICL (Pertea et al. 2003), Trinity (Grabherr et al. 2011) and Velvet (Zerbino and Birney 2008). Similarly, others are available for reference-guided assembly in RNA-seq, such as BLAT (Kent 2002), Bowtie (Langmead et al. 2009), BWA (Li and Durbin 2009), GSNAP (Wu and Nacu 2010) and the RNA sequence analysis tool RX (Hong et al. 2012). Moreover, programs such as TopHat (Trapnell et al. 2009) and SpliceMap (Au et al. 2010) have been used to map splice junctions onto the sequence.

Proteomics

The proteome, a term introduced by Wilkins et al. (1996), is the entire protein complement of a system expressed at a given time and under defined environmental conditions. The systematic analysis of the proteome is called proteomics (Pandey and Mann 2000; Patterson and Aebersold 2003; Phizicky et al. 2003). The complexity of the proteome is much greater than that of the transcriptome owing to the huge number of possible post-translational modifications, which make the proteome highly dynamic (Glinski and Weckwerth 2006). The general workflow of proteomic analyses involves separation of proteins and digestion with proteases, followed by peptide mass fingerprinting (determination of the masses of the fragments) and peptide sequencing by MS/MS, and then database searching and protein identification (Thelen and Peck 2007). Proteomics basically deals with analyzing changes in protein expression and with the study of protein structure, function and post-translational modifications, chiefly phosphorylation and ubiquitination (You-ping et al. 2010). A wide variety of analytical platforms for proteomics are currently available, including classical 2D gel electrophoresis, Edman sequencing and mass spectrometric methods. Among these, mass spectrometric variants such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), electrospray ionization mass spectrometry (ESI-MS) and MS–MS have contributed immensely to the development of the field (Park 2004; Domon and Aebersold 2006). An approach that analyzes proteins directly by MS, without gel separation, has also been developed and is referred to as multidimensional protein identification technology (MudPIT) (Appella et al. 1995; Washburn et al. 2001; Wolters et al. 2001). Isotope-coded affinity tag (ICAT)-MS has been used for quantitative proteomics.
In addition, a method based on fluorescence resonance energy transfer (FRET) between fluorescent tags on interacting proteins, using green, cyan and yellow fluorescent proteins, has been reported (Phizicky et al. 2003); it allows in vivo analysis by microscopy.
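
The digestion and mass-determination steps of the workflow above can be sketched in Python. The example assumes trypsin as the protease (the text says only "proteases"), the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), no missed cleavages, and standard monoisotopic residue masses for unmodified amino acids. The function names, the example sequences and the `tolerance` parameter are hypothetical, and the matching step is a simplification of what search engines such as those named in this section actually do.

```python
# Monoisotopic masses (Da) of amino acid residues, plus one water per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    protein = protein.upper()
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return WATER + sum(RESIDUE_MASS[residue] for residue in peptide.upper())


def fingerprint_matches(observed_masses, protein, tolerance=0.01):
    """Count observed masses that match a theoretical tryptic peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical)
        for mass in observed_masses
    )


if __name__ == "__main__":
    candidate = "MAGKRPSTRLLE"  # hypothetical protein sequence
    for peptide in tryptic_digest(candidate):
        print(peptide, round(peptide_mass(peptide), 4))
```

In a database search, a score of this kind would be computed for every candidate protein and the best-scoring candidates reported.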

Bioinformatics has helped immensely in the detailed analysis of proteomics data. Various 2D PAGE-related databases are available on the web, such as ECO-2DBASE (VanBogelen et al. 1999), SWISS-2DPAGE (Hoogland et al. 2000) and WORLD-2DPAGE (specifically for plant proteins). Programs available for image analysis of 2D gels include Melanie (Appel et al. 1997), PDQuest (BioRad), Progenesis (Rosengren et al. 2003) and Delta 2D (Decodon GmbH). Peptide mass fingerprinting analysis tools include Mascot, Sequest, Aldente, Popitam, Phenyx, FindMod, Profound, PepFrag, MS-Fit, OMSSA and TagIdent. The protein databases available on the web include the SwissProt/UniProt knowledgebase, Tr-EMBL, GenBank, Ensembl, PIR and EST databases (Vihinen 2001). The Arabidopsis and rice proteome databases are used by various proteomics labs today. The subcellular proteomic database (SUBA) hosts information on the subcellular localization of plant proteins, based on GFP tagging and proteomics methods. Other types of proteomics databases have been reviewed by Komatsu (2006).

An early attempt to map the proteomes of different plant tissues from rice and Arabidopsis was made by Tsugita et al. (1996). The works of Vener et al. (2001) and Sherrier et al. (1999) focused on the post-translational modifications of proteins. Subcellular proteomes and protein complexes in plants, e.g., those of the plasma membranes, chloroplasts, mitochondria and nuclei, have also been worked out (Rouquie et al. 1997; Peltier et al. 2000, 2001; Prime et al. 2000; Kruft et al. 2001; Millar et al. 2001; Bae et al. 2003). A genome-scale proteomics study in A. thaliana identified 13,000 proteins, covering almost half of the predicted gene models (Baerenfaller et al. 2008). More recently, the International Plant Proteomics Organization (www.inppo.com) was established as a global initiative to develop and improve connections between plant proteomics researchers and related fields (Agrawal et al. 2011). The field of proteomics is still in its infancy, and its challenges will have to be overcome through further conceptual and technological advances (van Wijk 2001).

Metabolomics

The term “metabolome”, suggested by Oliver et al. (1998), refers to the entire set of small-molecule metabolites produced by an organism. Metabolomics, a term coined by Oliver Fiehn (2002), is the comprehensive analysis of all the metabolites in an organism under a given set of conditions. The metabolites of an organism are more heterogeneous than its genes and proteins in their physical and chemical properties, varying widely in size, polarity, quantity and stability. This heterogeneity underlies the dynamism of the metabolome, which is subject to both temporal and spatial constraints (Fiehn 2002). Moreover, spatially defined metabolomes, such as those of organs, tissues and cells, have different metabolite profiles (Ebert et al. 2010; Schad et al. 2005; Sumner et al. 2011). This metabolic wealth is attributed not only to the number of genes (20,000–50,000) but also to the multiple substrate specificities of many enzymes (Aharoni et al. 2000), subcellular compartmentation and the occurrence of nonenzymatic reactions. An estimated 200,000 metabolites exist in plants (Pichersky and Gang 2000), although only ~50,000 have been elucidated (De Luca and St. Pierre 2000). Metabolic profiles provide a biochemical phenotypic assessment of plants and are therefore regarded as a cornerstone of systems biology (Hall 2006; Saito and Matsuda 2010).

The general workflow of a metabolomic analysis comprises four main stages, viz. sample preparation, data acquisition using analytical methods, data mining, and compound identification and quantification using statistical and bioinformatics analyses. The final task is to draw meaningful biological interpretations from the analyzed data (Fiehn 2002). Various analytical platforms may be used in metabolomics: nuclear magnetic resonance (NMR) and LC-NMR; mass spectrometry (MS) in its several forms, namely gas chromatography–mass spectrometry (GC-MS), capillary electrophoresis–mass spectrometry (CE-MS), liquid chromatography–mass spectrometry (LC-MS), liquid chromatography–electrochemistry–mass spectrometry (LC-EC-MS), direct infusion mass spectrometry (DIMS) and Fourier transform ion cyclotron resonance mass spectrometry (FTMS); thin layer chromatography (TLC); high-performance liquid chromatography (HPLC) equipped with different kinds of detectors (UV or photodiode array (PDA), fluorescent, electrochemical, etc.); and infrared (IR), Fourier transform infrared (FT-IR) and Raman spectroscopies (Fiehn et al. 2000; Verpoorte et al. 2008; Moco et al. 2008; Allwood and Goodacre 2010; Weckwerth and Morgenthal 2005). Amongst these, NMR and MS are the most widely applied (Fiehn 2002; Allwood and Goodacre 2010). Statistics and bioinformatics are indispensable for processing the large metabolome datasets generated by the high-performance instrumentation mentioned above. Statistical analyses include unsupervised and supervised algorithm-based methods. Unsupervised methods include principal component analysis (PCA), hierarchical clustering (HCA), K-means clustering, Soft Independent Modeling of Class Analogy (SIMCA) (Wold and Sjostrom 1977) and self-organizing maps (SOMs), while the supervised methods include discriminant function analysis (DFA) (Lachenbruch 1975), partial least squares (PLS), ANOVA, feed-forward neural networks (Cowan and Sharp 1988), support vector machines (Cristianini and Shawe-Taylor 2000), orthogonal partial least squares discriminant analysis (OPLS-DA) (Bylesjö et al. 2006), genetic algorithms (Goldberg 1989) and genetic programming (Koza 1992). Various bioinformatics tools and databases are available for handling, processing and analyzing metabolomics data. The diverse resources used for plant metabolomics-based studies have been reviewed in detail by Fukushima and Kusano (2013).
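
As a rough illustration of the unsupervised methods named above, the following minimal Python sketch computes the first principal component of a small metabolite table by power iteration on the covariance matrix. The intensity values and the function name `first_principal_component` are hypothetical and chosen only for illustration; the starting vector is an arbitrary assumption, and real studies would normally rely on an established statistics package.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for PC1 of a samples x metabolites table.

    Illustrative only: power iteration on the sample covariance matrix.
    Assumes at least two samples and some variance in the data.
    """
    n, p = len(rows), len(rows[0])
    # Mean-centre every metabolite (column).
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Arbitrary non-uniform starting vector (an assumption); power iteration
    # then converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centred samples onto the loading vector.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores

# Hypothetical intensities of two metabolites in four samples.
table = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loadings, scores = first_principal_component(table)
```

In this toy table the second metabolite is always twice the first, so the loading vector points along that shared direction and the scores order the samples along it.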

Several attempts have been made to explore the plant metabolome in relation to mutations (Bino et al. 2005; Yonekura-Sakakibara et al. 2008), identification of novel metabolites (Xie et al. 2008), environmental disturbances (Ahuja et al. 2010; Allwood et al. 2006, 2008, 2010; Choi et al. 2006; Jansen et al. 2008; Sanchez et al. 2010; van Dam and van der Meijden 2011; Ward et al. 2010), genotyping (Sumner et al. 2003; Fiehn et al. 2000) and genetic introgression (Fernie and Schauer 2009; Keurentjes et al. 2006; Schauer et al. 2006). Systems analyses have also been applied to secondary metabolites in plants (Huang et al. 2010a; Breitling et al. 2013; Spiering et al. 2014). An integrated systems approach has also been applied to the investigation of tomato fruit development, in which transcript, protein and metabolite analyses were combined (Osorio et al. 2011). Recent studies pertaining to secondary metabolites have addressed jasmonates (De Geyter et al. 2012), flavonoids (Groenenboom et al. 2013), polyphenolics (Bovy et al. 2010), alkaloids (Kim and Verpoorte 2010) and glucosinolates (Hall et al. 2010). The current challenges in the field include the heterogeneity and complexity of the plant metabolome and the correspondingly massive analytical effort needed to elucidate it.

Interactomics

The functioning of a cell, or of any system, depends on the dynamic and harmonized interactions of its molecular constituents, such as DNA, RNA, lipids, proteins and other small molecules, which have varied biochemical properties. Amongst these, protein–protein interactions are the most abundantly reported in plants, followed by DNA/RNA–protein interactions. DNA–protein interactions include the binding of histone proteins to DNA to form chromatin, which functions in the epigenetic regulation of various physiological processes, as discussed earlier. RNA-binding proteins (RBPs), which influence the synthesis, processing, nuclear export, stability and translation of RNA subsets, contribute to the coordination of gene expression, but reports on them are scanty (Fedoroff 2002; Belostotsky and Rose 2005) owing to the genetic compartmentation (nuclear and organellar) of plants and the consequently larger repertoire of RBPs than in other well-explored biological systems (Barkan 2009). Thus interactomics, the comprehensive analysis of the interactions between different macromolecules in an organism, predominantly protein–protein interactions, is critical to our understanding of cellular systems (Cusick et al. 2005; Morsy et al. 2008). Protein–protein interaction (PPI) studies help to elucidate the biophysical basis of cellular processes and to build a framework for the functional characterization of individual proteins. Profiling PPIs has been the major focus of interactomics in the past few years (Charbonnier et al. 2008), largely owing to technological advances and developing insight in the field. Plants are expected to have 75,000–15,000 protein interaction pairs (Morsy et al. 2008), from proteomes of 30,000–40,000 estimated proteins (Sterck et al. 2007).

Plant protein interactomes can be mapped using in vivo, in vitro and in silico methods. The various in vivo technological platforms used for the plant interactome mapping studies include, yeast two-hybrid (Y2H), split-ubiquitin system (SUS), bimolecular fluorescence complementation (BiFC), Split-luciferase system, fluorescence resonance energy transfer (FRET) and bioluminescence resonance energy transfer (BRET) (Morsy et al. 2008). The various in vitro analytics include affinity purification mass spectrometry (AP-MS), protein microarrays and surface plasmon resonance. The in silico prediction methods often integrate multiple types of data from various analyses like co-expression, co-localisation, co-evolution, functional comparison, occurrence of orthologs or interologs (i.e., interactions conserved across species), times of occurrence and type of interaction (e.g., direct, physical association, genetic interaction, etc.) and so on (Sharan and Ideker 2006; Sharan et al. 2007).
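
The interolog idea mentioned above (interactions conserved across species) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the protein identifiers, the ortholog table and the function name `predict_interologs` are all hypothetical, and real in silico predictors combine this kind of evidence with co-expression, co-localisation and other data.

```python
def predict_interologs(known_interactions, orthologs):
    """Transfer interactions from a reference species to a target species.

    known_interactions: iterable of (protein_a, protein_b) pairs observed
        in the reference species.
    orthologs: dict mapping a reference protein to the set of its
        orthologs in the target species.
    Returns a set of unordered target-species pairs (frozensets).
    """
    predicted = set()
    for a, b in known_interactions:
        # A pair is transferred only if BOTH partners have orthologs.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(frozenset((target_a, target_b)))
    return predicted

# Hypothetical reference interactions and ortholog assignments.
reference_ppi = [("Y1", "Y2"), ("Y2", "Y3")]
ortholog_map = {"Y1": {"P1"}, "Y2": {"P2a", "P2b"}}  # Y3 has no ortholog
plant_ppi = predict_interologs(reference_ppi, ortholog_map)
```

Here the Y1/Y2 interaction is transferred to both plant paralogs of Y2, while the Y2/Y3 interaction is dropped because Y3 has no ortholog in the hypothetical target species.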

The various protein–protein interaction web resources for plants include Arabidopsis Interactome 1 (Arabidopsis Interactome Mapping Consortium 2011), Arabidopsis membrane interactome (Mukhtar et al. 2011), Rice kinase-protein interaction map (Ding et al. 2009), auxin-signaling network (Vernoux et al. 2011), IntAct (Aranda et al. 2010), TAIR Protein Interaction data (Swarbreck et al. 2008), AtPID (Cui et al. 2008), CORNET (De Bodt et al. 2012), PAIR (Lin et al. 2011), PRIN (Gu et al. 2011), MINT (Licata et al. 2012), DIP (Xenarios et al. 2002), MatrixDB (Chautard et al. 2009), BIND (Alfarano et al. 2005), BioGRID (Stark et al. 2006), APID (Prieto and De Las 2006), iPfam (Finn et al. 2005), BAR (Geisler-Lee et al. 2007), STRING (Szklarczyk et al. 2011), AtPIN (Brandão et al. 2009) and PINA (Cowley et al. 2012).

The first computationally predicted interactome for Arabidopsis was reported by Geisler-Lee et al. (2007). The first organized experimental proteome-scale interactome map for plants, Arabidopsis Interactome 1 (AI-1), containing ~6,200 interactions among ~2,700 proteins, was published in 2011 (Arabidopsis Interactome Mapping Consortium 2011). Mukhtar et al. (2011) used AI-1 to study pathogen–host interactions. The PPI resources containing >50 interactions have been reviewed by Braun et al. (2013). Protein–protein interaction networks are key players in various biological phenomena such as signal transduction (Schoonheim et al. 2007), response to abiotic stresses (Tardif et al. 2007), the cell cycle (Van Leene et al. 2007), protein ubiquitination (Maor et al. 2007) and RNA processing (Hunt et al. 2008). For protein–protein interaction networks, network dynamics and comprehensiveness remain the issues that most need attention in order to obtain a better systems-level understanding of plants.

Other ‘omics’ approaches

Apart from the chief omics approaches discussed earlier, some recent approaches include lipidomics (the comprehensive study of the lipid entities of an organism) (Welti et al. 2007), hormonomics (the study of the entire set of endogenous hormones in a plant), lectinomics (bioinformatics studies of the carbohydrate-binding proteins known as lectins) and various others. The low molecular weight plant hormones include auxin, ABA, cytokinin, gibberellins, ethylene, brassinosteroids, jasmonates, salicylic acid (Davies 2004) and the more recently identified strigolactone, which acts as a shoot branching inhibitor (Gomez-Roldan et al. 2008; Umehara et al. 2008). Another concept that has gained much attention is ‘phenomics’, the high-throughput systematic analysis of phenotypes, which probably has its biggest application in plant biotechnology (Edwards and Batley 2004).

Integration of multiple ‘omics’ data

Advances in high-throughput analytics have enabled us to gain insights into individual classes of biomolecules with the help of the various ‘omics’ technologies discussed in the previous section. However, any single ‘omics’ approach may be inadequate to characterize the complexity and behavior of biological systems as a ‘whole’ (Gygi et al. 1999). Hence molecular research is gradually shifting toward the holistic perspective of systems biology, in which the individual ‘omics’ datasets are integrated to obtain a biologically meaningful interpretation of plant systems.

Integration of multiple layers of biological information should therefore provide a more precise ‘picture’ of ‘whole’ plant systems (Fig. 3). The multiple omics datasets must be integrated only after they have been preprocessed (normalization, missing value imputation and feature selection). Data integration is key to the successful development of the systems philosophy, since it allows comprehensive models of plant systems to be built. Given the enormous promise of multi-omics integration, the rational design of experiments and the analysis of heterogeneous data are gaining interest (Choi and Pavelka 2011). Successful integration will depend on appropriate experimental design, sound statistical analysis and correct interpretation of the results. It involves depositing individual ‘omics’ data in the respective public repositories, establishing relationships among the various kinds of datasets, visualizing the data and applying statistical and bioinformatics resources where and when needed (Fig. 3). These aspects have been discussed in detail by Joyce and Palsson (2006).
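
The preprocessing steps named above can be sketched as follows. This is a minimal illustration, not a recommended pipeline: mean imputation, z-score scaling and a variance filter are simple stand-ins assumed here for the three steps, and the feature names and values are hypothetical.

```python
from statistics import mean, pstdev

def impute_missing(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]

def zscore(column):
    """Normalize a feature to zero mean and unit (population) variance."""
    mu, sd = mean(column), pstdev(column)
    return [(x - mu) / sd for x in column]

def preprocess(table, min_sd=1e-9):
    """table: dict mapping feature name -> values over the same samples."""
    cleaned = {}
    for feature, column in table.items():
        filled = impute_missing(column)
        # Crude feature selection: drop features that do not vary at all.
        if pstdev(filled) <= min_sd:
            continue
        cleaned[feature] = zscore(filled)
    return cleaned

# Hypothetical features from three omics layers measured in three samples.
layers = {
    "gene_A": [1.0, None, 3.0],   # transcript with one missing value
    "metab_X": [5.0, 5.0, 5.0],   # invariant metabolite, filtered out
    "prot_Y": [2.0, 4.0, 6.0],    # protein abundance
}
clean = preprocess(layers)
```

After this step the surviving features from different layers are on a common scale, which is what makes them comparable in a joint analysis.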

Fig. 3 Integration of heterogeneous multiple ‘omics’ data

Various instances of omics data integration are available in the literature. There are a number of reports on the elucidation of gene function by combining the metabolomic analysis with genomic and transcriptomic data (Tohge et al. 2005; Hirai et al. 2004, 2007; Hirai and Saito 2008; Saito et al. 2008; Watanabe et al. 2008; Yonekura-Sakakibara et al. 2008; Okazaki et al. 2009). Very recently, an integrated transcriptomics and epigenomics approach was used in maize hybrids (He et al. 2013). The integrated use of transcriptomic and proteomic data has been reported in various recent studies involving whole plant nitrogen economics of maize (Amiour et al. 2012), growth to dormancy transition in white spruce stems (Galindo González et al. 2012), phytohormone crosstalk (Proietti et al. 2013) and flour quality in wheat (Altenbach et al. 2010). Similarly, integrated metabolome and transcriptome analyses were recently applied in analysis of rice developing caryopses under high-temperature conditions (Yamakawa and Hakata 2010), molecular events underlying pollination-induced and pollination-independent fruit sets (Wang et al. 2009c), the effects of DE-ETIOLATED1 down-regulation in tomato fruits (Enfissi et al. 2010) and changing metabolic systems in plants growing in field conditions, such as the rice mutant and transgenic barley (Kogel et al. 2010; Izawa et al. 2011). An integrated metabolome and proteome analysis was applied in wheat and rice coleoptiles to illustrate the differences in response to anoxia (Shingaki-Wells et al. 2011) and characterization of starch and raffinose metabolisms to low and high temperatures in A. thaliana (Mostafavi et al. 2008). An integrated transcriptome, proteome and metabolome approach was adopted to describe the cascading changes to UV-B in maize (Casati et al. 2011). 
Moreover, integrated hormonome, metabolome and transcriptome analyses of Arabidopsis transgenic lines displaying increased leaf growth have been reported, providing insight into the molecular mechanisms that control leaf size (Gonzalez et al. 2010). Literature mining is also a useful approach to knowledge integration in plant biology (Krallinger et al. 2008; Winnenburg et al. 2008). Beyond single processes, more complex problems such as photosynthesis have been addressed: Weston et al. (2011) characterized a network for the heat-responsive transcriptome of three plant species (Arabidopsis, Populus and soybean), in which the expression of one heat-responsive module was negatively correlated with leaf-level photosynthesis at a critical temperature. They later proposed a conceptual model in which traditional network analysis can be linked to whole-plant models (Weston et al. 2012). More recently, Fouracre (2014) discussed the application of systems approaches to understanding the Kranz anatomy of C4 plants. Several web-based resources, such as PLAN2L (Krallinger et al. 2009) and PosMed-plus (Positional Medline for plant upgrading science) (Makita et al. 2009), are available to integrate literature-derived bioentities and associated information.

There are several challenges to the integration of multiple omics data (Joyce and Palsson 2006; De Keersmaecker et al. 2006; Steinfath et al. 2007). One problem in complex annotation and integration is the lack of agreed formats across different omics datasets, owing to the heterogeneous repositories of the primary data sources. Solutions to this problem include the creation of ‘data warehouses’, the use of extensible markup language (XML), hypertext navigation, unmediated MultiDB queries, the creation of federated databases and the use of controlled vocabularies. A data warehouse retrieves data from multiple resources, translates the formats and puts them in one database. Examples of data warehouses include Atlas, BioMart, BioWarehouse, Columba, SYSTOMONAS, BioDWH, VINEdb, Booly and GNCPro (Turenne 2011). XML is a general-purpose markup language that helps in sharing data across heterogeneous systems. The development of the Systems Biology Markup Language (SBML) (Hucka et al. 2003) is probably the first and most successful effort in this direction. The Plant Ontology Consortium is a collaborative effort among model plant genome databases and plant researchers that aims to create, maintain and facilitate the use of a controlled vocabulary (ontology) for plants (Avraham et al. 2008). The other problem is that of statistical analysis, i.e., recognizing that the complexity of integrated analysis differs from that of individual omics analyses and applying a suitable method accordingly. The integration of omics data is thus far more than just ‘joining the pieces’; it is a journey of exploring uncharted territories and transforming information into more useful biological knowledge.

Modeling and simulation in plant system dynamics

Interest in applying systems thinking to the biological sciences dates back to von Bertalanffy (1933, 1969), Wiener (1948) and Forrester (1958, 1961). In the context of biology, Biochemical Systems Theory (Voit 2000) and Metabolic Control Theory (Heinrich and Schuster 1996) proposed general mathematical models of biological systems at and around a steady state (equilibrium). Successful modeling of plants is the ultimate goal of plant systems biology. A model (from the Latin modus, meaning manner or measure) usually represents the causal relationships in a system in mathematical terms. In systems biology, cells or higher units of biological organization are understood as systems of interacting elements. A systems-level explanation requires knowledge of the identity of the constituents of the biological system under study, their dynamic behavior and the interactions among them (Kitano 2002). This information can eventually be combined into a model that is not only consistent with current knowledge but can also predict the behavior of the system under new, unexplored perturbations. Modeling and simulation are central to bridging the gaps between theory and experiment (Dhar et al. 2004). Usually, experimental results require correct mathematical and statistical treatment, and model hypotheses require experimental evidence, to provide biologically meaningful interpretations. Modeling usually starts with the construction of biological networks from the available molecular datasets. Network construction and analysis are crucial components of systems biology.

A network (graph) in systems biology has two basic parts: the elements of the system are represented as nodes (also called vertices), and the interactions are represented as edges, that is, lines connecting pairs of nodes. Edges may be directed (running from a source, or starting node, to a sink, or ending node, and representing a unidirectional flow of material or information) or non-directed (representing mutual interactions for which the direction of information flow is not known). In biological networks, nodes represent the molecules present inside a cell (e.g., proteins, RNAs and/or metabolites) and edges (or links) between nodes represent their biological relationships (e.g., physical interactions, regulatory connections, metabolic reactions) (Blais and Dynlacht 2005). Signs representing activation or inhibition can be shown on edges to augment the information content of the network. Important characteristics of biological networks are a scale-free structure (highly connected nodes, referred to as “hubs”, are far fewer than nodes with few connections) and a relative scarcity of hubs that connect directly to one another (Barabasi and Oltvai 2004). The nodes of an interaction network represent populations of biomolecules whose abundance varies over time and in response to internal and environmental perturbations. To visualize these changes and create a model, the interaction network needs to be augmented with variables (expression, concentration, activity) that indicate the state of each node and with a set of equations indicating how the state changes in response to stimuli. Models may be static or dynamic, depending on whether they describe the behavior of the system over time. The four common types of networks in plant systems biology are gene-to-metabolite networks, protein–protein interaction networks, transcriptional regulatory networks and gene regulatory networks; the first three types are often static, while gene regulatory networks are frequently dynamic (Yuan et al. 2008) (Fig. 4).
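
The graph vocabulary used above (nodes, directed and signed edges, hubs) maps directly onto simple data structures. The sketch below is a minimal illustration with a hypothetical five-node network; the node names, the sign convention (+1 for activation, -1 for inhibition) and the hub threshold are assumptions made for the example only.

```python
from collections import Counter

# Each edge is (source, target, sign): +1 = activation, -1 = inhibition.
edges = [
    ("TF1", "geneA", +1),
    ("TF1", "geneB", +1),
    ("TF1", "geneC", -1),
    ("TF1", "geneD", +1),
    ("geneA", "geneB", -1),
]

def degrees(edge_list):
    """Total degree (incoming plus outgoing edges) of every node."""
    deg = Counter()
    for source, target, _sign in edge_list:
        deg[source] += 1
        deg[target] += 1
    return deg

def degree_distribution(deg):
    """Number of nodes observed at each degree value."""
    return Counter(deg.values())

def hubs(deg, min_degree):
    """Nodes whose degree reaches an (arbitrary) hub threshold."""
    return sorted(node for node, d in deg.items() if d >= min_degree)

deg = degrees(edges)
distribution = degree_distribution(deg)
hub_nodes = hubs(deg, min_degree=3)
```

In a scale-free network the degree distribution is heavily skewed, with many low-degree nodes and very few hubs; the toy network mimics this with a single highly connected transcription factor.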

Fig. 4 Plant biological networks. a Gene-to-metabolite network, b protein–protein interaction network, c transcriptional regulatory network and d gene regulatory network

Gene-to-metabolite networks

Gene-to-metabolite networks are derived from correlation analysis of genes and metabolites under a given set of conditions. Here the genes and metabolites act as nodes and the edges represent regulatory interactions. The interactions are interpreted according to the distance between the genes and the metabolites. These networks are highly complex and difficult to study in plants, owing to the enormous diversity and number of metabolites that plant cells produce in connection with their sessile lifestyle. The elucidation of gene-to-metabolite networks has added new research dimensions to plant science, such as the interrelation among biological processes, gene functional annotation and the discovery of new genes involved in the biosynthesis, regulation and transport of metabolites (Yuan et al. 2008). Gene-to-metabolite networks have been worked out in studies of stress responses (Goossens et al. 2003; Zulak et al. 2007; Carrari et al. 2006), the discovery of novel candidate genes for terpenoid indole alkaloid biosynthesis in Catharanthus roseus (Rischer et al. 2006), the response to nitrogen deficiency and diurnal cycles (Bläsing et al. 2005; Scheible et al. 2004), and so on.
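
A correlation-based network of this kind can be sketched as below. The example assumes Pearson correlation with an arbitrary cut-off of 0.9 as the criterion for drawing an edge; the gene and metabolite names and their profiles are hypothetical, and published studies may use other similarity or distance measures.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def gene_metabolite_edges(genes, metabolites, threshold=0.9):
    """Link a gene to a metabolite when |r| reaches the threshold."""
    edges = []
    for gene, expression in genes.items():
        for metabolite, level in metabolites.items():
            r = pearson(expression, level)
            if abs(r) >= threshold:
                edges.append((gene, metabolite, round(r, 3)))
    return edges

# Hypothetical profiles over four conditions.
genes = {
    "gene1": [1, 2, 3, 4],    # rises with the metabolite
    "gene2": [4, 3, 2, 1],    # falls as the metabolite rises
    "gene3": [1, -1, 1, -1],  # only weakly related
}
metabolites = {"metA": [2, 4, 6, 8]}
network = gene_metabolite_edges(genes, metabolites)
```

The sign of the retained correlation distinguishes positively from negatively associated pairs; a correlation on its own does not establish a regulatory relationship, which is why such edges are usually treated as hypotheses for follow-up experiments.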

Protein–protein interaction networks

In protein–protein interaction networks, the nodes are proteins, which are connected by directed edges if the direction of information flow during their interaction is known, or by non-directed edges if there is strong evidence of their physical interaction or association but no evidence for the directionality of the interaction (Assmann and Albert 2009). Two types of interactions are possible: genetic and physical. A genetic interaction network is a network of genes characterized on the basis of genetic interactions to explain gene function within physiological processes (Boone et al. 2007). However, this approach is difficult to implement in plants owing to their ploidy levels and long life cycles. Physical interactions, in contrast, are easier to characterize in plant systems. In plants, interaction maps have been experimentally elucidated for homo- and heterodimerization within two large classes of transcription factors: the MADS (MCM1, Agamous, Deficiens, SRF) box transcription factors (Immink et al. 2003; de Folter et al. 2005) and the MYB (myeloblastosis) transcription factor family (Zimmermann et al. 2004). Further details on the interactome are given in the preceding section on interactomics.

Transcriptional regulatory networks

A transcriptional regulatory network describes the regulatory interactions between transcription factors and their downstream genes. Such networks have two types of nodes, transcription factors and regulated genes, and two types of directed edges, viz. transcriptional regulation and translation (Babu et al. 2004). In addition, the regulatory edges can carry two types of signs, corresponding to activation or repression. Despite the general organizational similarity of networks across the phylogenetic spectrum, there are interesting qualitative differences among the network components, such as the transcription factors (Babu et al. 2004). Transcription factors usually regulate multiple genes, and hence transcriptional regulatory networks are unidirectional and do not have strongly connected components.

The various approaches to deciphering transcriptional regulatory networks include genome-wide expression profiling, genome-wide RNA interference (RNAi) screens (Baum and Craig 2004), transcription rate assessment by measurement of mRNA decay rates (Holstege et al. 1998; Nachman et al. 2004), evaluation of promoter co-occupancy by pairs of transcription factors (Geisberg and Struhl 2004) and computational prediction of cis-elements (Beer and Tavazoie 2004). In Arabidopsis, a transcriptional regulatory map was created for cold signaling mediated by the ICE1 transcription factor (Benedict et al. 2006). Recent reports on transcriptional regulatory networks include one in rice addressing the role of oxidative signals in chilling stress (Yun et al. 2010), those on responses to abiotic stresses in Arabidopsis and grasses (Nakashima et al. 2009) as well as rice (Todaka et al. 2012), light-regulated transcriptional networks in higher plants (Jiao et al. 2007) and so on.

Gene regulatory networks

In a gene regulatory network, the nodes correspond to genes, messengers or proteins, and the edges represent the regulatory interactions (activation, inhibition, repression or other functional interactions) among the components of the network. Complex gene regulatory networks are composed of genes, non-coding RNAs, proteins, metabolites and signaling components (Long et al. 2008). This type of network incorporates all stages of the regulation of gene expression, including the regulation of DNA transcription, post-transcriptional RNA processing and RNA translation, as well as post-translational events such as protein targeting and covalent protein modification. These networks are often used to display the dynamics of plant systems, unlike the other network types, which are usually static (Yuan et al. 2008). The ABC model, one of the first plant gene regulatory networks to be modeled, explained the interactions among transcription factors that regulate floral pattern formation across plant species (Coen and Meyerowitz 1991). Gene regulatory networks have been used in several studies of developmental and physiological processes in plants. These include an attempt to model the essential components controlling guard cell size in stomatal closure (Li et al. 2006), cell fate determination during flower development in A. thaliana (Espinosa-Soto et al. 2004), microRNA (miRNA)-mediated gene regulatory networks (Meng et al. 2011) and, more recently, the explanation of land plant evolution (Pires et al. 2013).
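
To make the notion of a dynamic network concrete, the sketch below simulates a small gene regulatory network as a synchronous Boolean model, which is one simple formalism assumed here for illustration rather than the method of any study cited above. The three nodes and their rules (A activates B, B activates C, C represses A) are hypothetical.

```python
def step(state, rules):
    """Synchronous update: every node reads the previous state."""
    return {node: rule(state) for node, rule in rules.items()}

def simulate(state, rules, steps):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [dict(state)]
    for _ in range(steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory

# Hypothetical negative feedback loop: A -> B -> C -| A.
rules = {
    "A": lambda s: not s["C"],  # A is ON unless repressed by C
    "B": lambda s: s["A"],      # B follows its activator A
    "C": lambda s: s["B"],      # C follows its activator B
}
start = {"A": True, "B": False, "C": False}
trajectory = simulate(start, rules, steps=6)
```

With these rules the network returns to its starting state after six updates, so the model oscillates instead of settling into a fixed point; this kind of time-dependent behavior is what a static network diagram cannot show.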

Hence, biological network construction and analysis has been an important approach in plant systems biology for explaining an organism or a biological process as a whole. High-throughput technologies provide huge amounts of quantitative data. However, the use of quantitative data is hindered in systems for which knowledge of mechanistic details and kinetic parameters is scarce. In such cases, the wealth of molecular data on individual constituents and their interactions can be helpful in modeling the system (Assmann and Albert 2009). The individual key components of systems biology, viz. genomics, transcriptomics, proteomics, metabolomics, etc., have been explained earlier. Together with these components, biological networks are chief aspects of plant systems biology. Although models cannot mimic the system with complete accuracy, they are highly capable of explaining the intrinsic complexity of plant systems.

Software and algorithms for plant systems biology

The use of bioinformatics software is indispensable for the comprehensive study of plant systems biology. In addition to the tools and resources used in the analyses of the individual ‘omics’ platforms, several resources are required for the elucidation of the ‘complete picture’. The various algorithms and software used for systems biology are discussed in detail in Joyce and Palsson (2006) and Turenne (2011). These include tools for network visualization, modeling environments, pathway construction and visualization tools, systems biology platforms and model repositories.

Visualization is a means of investigative data analysis and a key method for network analysis. The purpose of omics data visualization should be to create clear, meaningful and integrated resources without being besieged by the inherent complexity of the data (Gehlenborg et al. 2010). Several tools are available which help in visualization of ‘omics’ data on a systems scale like Sungear (Poultney et al. 2007), MapMan (Thimm et al. 2004), Genevestigator (Zimmermann et al. 2004), Cytoscape (Shannon et al. 2003), VirtualPlant (Katari et al. 2010), REACTOME (Joshi-Tope et al. 2005). Pathway databases are used for modeling systems, since they offer a clear-cut way of building network topologies by the annotated reaction systems. The various pathway databases for systems analyses include KEGG (Kanehisa et al. 2012), BioCyc (Caspi et al. 2010), Aracyc (Mueller et al. 2003), Pathway Interaction Database (PID) (Schaefer et al. 2009) and BioCarta (Nishimura 2001). Also, several comprehensive modeling environments are available, like Gepasi (Mendes 1997), Virtual Cell (Loew and Schaff 2001), Osprey (Breitkreutz et al. 2003), Arabidopsis eFP browser (Winter et al. 2007), COPASI (Hoops et al. 2006), R (http://www.R-project.org), MatLab and InfoBiotics workbench (Blakes et al. 2011), E-Cell (Tomita et al. 1999), Systems Biology WorkBench (Sauro et al. 2003).

Systems biology model repositories include the BioModels database (Le Novere et al. 2006) and JWS (Olivier and Snoep 2004). Both are public, centralized databases of curated, published, quantitative kinetic models of biochemical and cellular systems. Core systems biology platforms include SynBioWave (Staab et al. 2010), Cell Illustrator (Nagasaki et al. 2010), Moksiskaan (Laakso and Hautaniemi 2010), MEMOSys (Pabinger et al. 2011), Babelomics (Al-Shahrour et al. 2006), MetNet (Sucaet et al. 2012), etc.

Conclusion and future perspectives

Plants offer solutions to many of the problems of the twenty-first century, such as food and water scarcity. Although a single perfect solution is unlikely to be found, systems biology can serve as a means of understanding plants mechanistically and can eventually provide a basis for partial solutions. The productivity of individual plants could be enhanced by genetic modification. However, a comprehensive understanding of whole plants is a prerequisite for engineering them using molecular approaches. Ample knowledge of the responses of plants to internal and external stimuli is essential, and this can be gained through systems biology (Ideker et al. 2001). Investigation of the individual hierarchical biological components and their integration is key to systems biology. Although reductionist molecular approaches remain indispensable for expanding our insight, holistic systems biology approaches provide a more complete picture of plant systems. Alongside these promises, there are biological and computational challenges to the application of systems approaches to plants (Fernie 2012). The difficulty of deciphering the highly complex architecture of plants is one of the major challenges to the success of the field. This problem could be addressed by improving the current experimental platforms through technical innovation. The diversity of data formats among experimentally derived datasets is a major computational challenge to the integration of massive data (Heath and Kavraki 2009). In addition, owing to the complexity of plant biological networks, experimental testing is not always possible. Even so, such networks can help in predicting the most promising experimental strategies, thereby reducing the labor and time otherwise spent on ‘trial and error’-based approaches. These challenges can be met through the integrative efforts of diverse branches of plant science rather than by working in isolation. Hence systems biology will create immense opportunities for decoding the complexity of plants through fruitful collaboration between classical plant biologists and computational modelers. The ‘unity in diversity’ approach of systems biology is growing and will continue to shape the future of plant science, helping to realize the full worth of plants for the benefit of humanity within a practical time frame.