Local coexpression domains in the genome of rice show no microsynteny with Arabidopsis domains
- First Online:
- Cite this article as:
- Ren, XY., Stiekema, W.J. & Nap, JP. Plant Mol Biol (2007) 65: 205. doi:10.1007/s11103-007-9209-0
- 381 Downloads
Chromosomal coexpression domains are found in a number of different genomes under various developmental conditions. The size of these domains and the number of genes they contain vary. Here, we define local coexpression domains as adjacent genes where all possible pair-wise correlations of expression data are higher than 0.7. In rice, such local coexpression domains range from predominantly two genes, up to 4, and make up ∼5% of the genomic neighboring genes, when examining different expression platforms from the public domain. The genes in local coexpression domains do not fall in the same ontology category significantly more than neighboring genes that are not coexpressed. Duplication, orientation or the distance between the genes does not solely explain coexpression. The regulation of coexpression is therefore thought to be regulated at the level of chromatin structure. The characteristics of the local coexpression domains in rice are strikingly similar to such domains in the Arabidopsis genome. Yet, no microsynteny between local coexpression domains in Arabidopsis and rice could be identified. Although the rice genome is not yet as extensively annotated as the Arabidopsis genome, the lack of conservation of local coexpression domains may indicate that such domains have not played a major role in the evolution of genome structure or in genome conservation.
KeywordsArabidopsis Coexpression domain Higher-order chromatin structures Microsynteny Rice
Basic local alignment search tool
Microarray expression data
Massively parallel signature sequencing expression data
Million years ago
Partially syntenic coexpression domain
Partially syntenic non-coexpressed domain
The Arabidopsis information resource
The Institute of Genomic Research
The fast-growing data sets on genome annotation and genome-wide gene expression facilitate the study and comparison of gene activity between and among genomes. The genomic context of genes is supposed to play an important role in the regulation of gene expression (van Drunen et al. 1997). Non-random clusters of similarly expressed (co-regulated, coexpressed, highly expressed and/or broadly expressed) genes have been described in almost all organisms, ranging from prokaryotes to eukaryotes. In eukaryotes, from yeast to Arabidopsis to human, both short-range co-regulated/coexpressed clusters of two to five genes (Cohen et al. 2000; Ren et al. 2005; Zhan et al. 2006) and longer-range coexpression domains of up to 30 genes spanning up to 100 kb and more (Lercher et al. 2003; Ma et al. 2005; Spellman and Rubin 2002; Zhan et al. 2006) have been described. Duplicated genes (Lercher et al. 2003), shared promoter regions (Kruglyak and Tang 2000), shorter gene distance (Cohen et al. 2000; Roy et al. 2002; Semon and Duret 2006; Williams and Bowles 2004) and/or functional relatedness (Cohen et al. 2000; Lee et al. 2004; Spellman and Rubin 2002; Williams and Bowles 2004) were found to account for only part of the coexpression between genes. Therefore, most studies postulate that the occurrence of coexpression domains, small or large, is regulated on the level of higher-order chromatin structure (Cohen et al. 2000; Hershberg et al. 2005; Ren et al. 2005; Spellman and Rubin 2002; Williams and Bowles 2004), although alternative views exist (Semon and Duret 2006).
Previously we have defined and demonstrated the existence of local coexpression domains in the genome of Arabidopsis (Ren et al. 2005). A local coexpression domain was defined as any set of physically adjacent genes that are highly coexpressed with a pair-wise Pearson’s correlation coefficient larger than 0.7. It was shown that a small (5–10%) yet significant fraction of genes in the Arabidopsis genome is organized in such local coexpression domains. Genes in such local domains were for the major part not categorized in the same functional category (GOslim). Neither tandemly duplicated genes nor shared promoter sequence, nor gene distance, explained the occurrence of coexpression of genes in such chromosomal domains. This indicates that other parameters in genes or gene positions are important to establish coexpression in local domains of Arabidopsis chromosomes. Here it is analyzed whether a similar situation exists in the genome of the monocotyledonous model plant rice (Oryza sativa), for which an earlier study had shown the existence of longer-range domains (Ma et al. 2005). We combined the whole genome rice annotation data (TIGR version 3; www.tigr.com) with Massively Parallel Signature Sequencing (MPSS; mpss.udel.edu/rice) expression data as well as Affymetrix array expression data (GEO, GSE4438; www.ncbi.nlm.nih.gov/geo) for the rice cultivar japonica in a way similar to the analysis performed for the Arabidopsis genome (Ren et al. 2005). The results show that the characteristics of the two genomes with respect to the occurrence and configuration of local expression domains are remarkably similar.
Also in the rice genome, a small yet significant fraction of genes is organized in local coexpression domains that predominantly consist of two, up to 4, genes that are not categorized in the same functional category, irrespective of the expression platform used for analyses. The presence of tandemly duplicated genes, shared promoter sequence or gene distance is not fully explaining the occurrence of coexpression of genes in such chromosomal domains. Therefore, the regulation of local coexpression domains is postulated to be at the level of higher-order of chromatin structure. Given the similarities in the characteristics and occurrence of coexpression domains between Arabidopsis and rice, we investigated whether the genes involved showed microsynteny between the two genomes. These analyses did not identify the presence of syntenic local coexpression domains between Arabidopsis and rice.
Material and methods
The rice (Oryza sativa) genome was obtained from the website of The Institute of Genomic Research (TIGR; www.tigr.org). The rice TIGR version 3 [Jan. 2005] annotation has 57,915 gene loci. In case of alternative splicing, the longest variant of the gene was used. The genes along each chromosome were sorted based on ascending start coordinates and were numbered consecutively. These rank numbers (rank ID) helped to eliminate any discontinuity in the unique Os gene identifiers of the annotated genes and facilitated analyzing physically adjacent genes. In case of overlapping gene loci, the smaller one of the overlapping genes was removed from the data set. This way the order of both gene and rank ID numbers was maintained.
The MPSS expression data for rice (cv. Japonica) were obtained from the rice MPSS database (http://mpss.udel.edu/rice). Only the unique MPSS tags (mapping to the genome only once) and those mapping to unique gene identifiers in TIGR v3 were used in the analyses. The MA expression data were obtained from the Gene Expression Omnibus repository (GEO; www.ncbi.nlm.nih.gov/geo). The data with accession number GSE4438 were chosen for analysis, because they represent the only Affymetrix chip-based data set in the public domain with a reasonable coverage of the Oryza sativa cv. japonica transcriptome under four different experimental conditions (Walia et al. 2007). The earlier analysis in Arabidopsis also concerned Affymetrix chip data (Ren et al. 2005). The MA expression data were mapped to the TIGR v3 annotation based on gene accession numbers.
The expression values in libraries representing biological replicates, i.e., the same tissues under the same experimental conditions, were averaged. This way, for the MPSS data 18 different libraries were generated, that cover expression in 9 tissues (callus, panicle, leaves, root, germinating seed and seedling meristem, ovary and stigma, pollen, stem) under different experimental treatments or in different developmental stages. From the microarray (MA) data, only the expression data from cv. japonica were used to allow comparison with the MPSS data. Here, 4 different libraries were generated: a sensitive japonica genotype under control conditions and under salt stress; and a tolerant japonica genotype under control conditions and under salt stress (Walia et al. 2007). The expression data is from crown and growing point tissue.
Identification of local coexpression domains
Pearson’s correlation coefficient (R) was calculated between all adjacent pairs (duplets) of genes using the expression data from all 18 libraries. If R was higher than 0.7, the gene pair concerned was considered to be coexpressed. The value of R > 0.7 is generally considered a rule-of-thumb threshold (see for example bbc.botany.utoronto.ca/affydb/BAR_instructions) and is used in various analyses (Cohen et al. 2000; Lee et al. 2004; Ren et al. 2005). The number of coexpressed adjacent pairs was counted. To evaluate the statistical significance of these numbers, they were compared with the number of coexpressed pairs from 100 randomizations of the population of expressed genes using the cumulative binomial distribution (Cohen et al. 2000). Previous analyses indicated that more than 100 randomizations did not result in significant changes in the numbers obtained (Ren et al. 2005). In each round of randomization, non-adjacent pairs of genes were randomly selected with replacement from the list of expressed genes that have expressed neighbors till the same total number of pairs was obtained. Similarly, coexpressed adjacent triplets, quadruplets and pentaplets were identified as series of genes with consecutive IDs in which all possible (that is, (n!/(n−2)!)*2; where n is the number of genes involved) pair-wise R’s should be above the cut-off of 0.7. The significance of results was evaluated with randomizations equivalent to the procedure used in case of duplets.
Duplicated genes were identified by local pair-wise protein BLAST (BLASTP 2.2.6 [Apr-09-2003]; (Altschul et al. 1997)), on all gene pairs in the rice genome. A gene pair was considered to be duplicated (dup) if BLASTP yielded an E-value < 0.2 (Fukuoka et al. 2004; Lercher et al. 2003; Williams and Bowles 2004). To determine duplicated triplets, quadruplets and pentaplets, it was required that any pair of the genes concerned had a BLASTP E-value <0.2.
Analyses of gene orientation and gene distance
Adjacent gene pairs were separated into tandemly, divergently and convergently transcribed pairs according to their relative direction of transcription. The number of coexpressed pairs in each orientation group was expressed as percentage relative to the total number of adjacent pairs in that group. Random pairs were made by randomly picking two non-adjacent genes from the list of expressed genes represented in pairs, analyzed for their orientation and compared with the real genome using a variant of the two-sample t test for proportions for determining the significance of a difference between two population proportions (Ott and Longnecker 2001). The test statistic is based on the z statistic from the normal distribution and is given by (p1−p2)/ √ (p1*(1−p1)/n1 + p2*(1−p2)/n2), with p1 and p2 the two sample proportions, n1 and n2 the two sample sizes, under the condition that n1*p1, n1*(1−p1), n2*p2 and n2*(1−p2) are all larger than 5. The z value is converted to a p value using standard normal tables.
To determine the gene distance, the intergenic distance is used. This distance is defined as the length in nucleotides from the annotated end of one gene to the annotated start of the next gene, including the UTRs when known, otherwise the translation start and stop sites were taken. The data sets excluding the duplicated gene pairs were analyzed. For each data set, gene pairs were sorted based on gene distance from short to long and bins of 1,000 pairs were taken and analyzed, excluding the last bin with less than 1,000 pairs. The advantage of using equal pair bin is that it avoids unequal number of gene pairs in different distance categories. Per 1,000-pair bin, gene distance was calculated as the average over all 1,000 pairs. For each 1,000-pair bin, the fraction of coexpressed pairs relative to the total number of pairs in each orientation group in each bin was calculated and plotted.
Functional categorization of genes
TAIR’s GOslim, the Gene Ontology (GO) developed for plants (Berardini et al. 2004) was used to classify the genes present in local coexpression domains. The three aspects of GOslim, molecular function, biological process and cellular component, were analyzed in parallel. With Python scripts, the number of pairs of which both members could be classified in GOslim was determined, and the number of pairs of which both members fall into the same well-defined GOslim category was also determined. The GOslim categories of ‘unknown’ and ‘other’ were not included into well-defined categories, because they give less (or no) information about functional categorization. The percentage of coexpressed pairs falling into the same well-defined category was compared with that of non-coexpressed pairs to determine whether coexpressed genes are more enriched in the same functional category than non-coexpressed genes.
Assessing synteny between Arabidopsis and rice
The Inparanoid Eukaryotic Orthologous groups database (inparanoid.cgb.ki.se; O’Brien et al. 2005) was used to download all known orthologous and inparalogous clusters between Arabidopsis and rice. Inparanoid defines inparalogs as paralogs that arose through gene duplication after speciation. Inparalogs can form a group of genes that together are orthologous to a gene in another species. There are 9,044 orthologous clusters between Arabidopsis (from Ensemble) and rice (from the Model Organism database) and all of them were taken into account. These clusters were downloaded on Dec. 12, 2005. In the orthologous clusters, 15,544 sequences (proteins) from Arabidopsis are inparalogs and 14,807 sequences (proteins) from rice are inparalogs. More than half of Arabidopsis and rice inparalogs are many-to-many or many to-one orthology cases. Less than half of the cases are one-to-one orthology cases.
The Ensembl protein IDs (for Arabidopsis) and the Model organism database protein IDs (for rice) in Inparanoid were first translated to their unique gene identifiers in the respective TIGR annotation by BLASTP using an E-value < e−20. This yielded 14,753 unique Arabidopsis genes and 12,428 unique rice genes as inparalogs. The pairs of genes in local coexpression domains were analyzed to determine which genes in a rice local coexpressed pair have orthologs in an Arabidopsis local coexpressed pair, and vice versa. Because of the larger coverage of genes and tissues, only the MPSS expression data were used for this analysis. As coexpressed triplets and quadruplets are always combinations of coexpressed pairs, they were not further analyzed. For comparison, the pairs of genes that are not coexpressed were analyzed to determine how many non-coexpressed pairs, or one of their member genes, have orthologs in the other plant species. The numbers were then compared between coexpressed pairs and non-coexpressed pairs to determine the significance of occurrence of syntenic local coexpression domains.
Local coexpression domains consist of two to four neighboring genes
Description of rice expression data used for whole-genome local coexpression analysis
Genes with expression
Excluding overlapping genes
Without expressed neighbor(s)
represented in pairs
Tandemly duplicated pairs (td)
Total excluding td
Coexpressed excluding td
Coexpressed adjacent pairs
Tandemly duplicated pairs
Tandemly duplicated pairs
32 (5.6%) e
Notably duplicated genes are supposed to influence coexpression statistics due to their common origin (Lercher et al. 2003), although a surprising finding for the Arabidopsis coexpression domains was that only a minor fraction of duplicated genes were actually coexpressed (Ren et al. 2005). The occurrence of duplicated pairs in the rice set was determined with pair-wise protein BLAST using a cut-off of E < 0.2 (Fukuoka et al. 2004; Lercher et al. 2003; Williams and Bowles 2004). This identified 1,663 (12.9%) duplicated gene pairs in the MPSS gene pair data set and 573 (9.5%) duplicated gene pairs in the MA gene pair data set.
Local coexpression domains in the rice genome
Random genome (100×)
MPSS + tde
408 ± 17
1.46 × 10−17
356 ± 21
2.17 × 10−6
MA + tdg
301 ± 17
271 ± 16
MPSS + td
8.78 ± 2.9
2.95 × 10−5
7.74 ± 3.0
MA + td
6.54 ± 2.7
5.10 ± 2.4
MPSS + td
0.24 ± 0.47
1.81 × 10−3
0.18 ± 0.39
MA + td
0.14 ± 0.37
Orientation and distance do not solely explain the occurrence of local coexpression
Orientation of coexpressed gene pairs
Functional categorization of coexpressed genes
Distribution of gene pairs over GOslim categories (Non-duplicated pairs)
Microsynteny of local coexpression domains between rice and Arabidopsis
The structural characteristics of local coexpression domains in rice and in Arabidopsis (Ren et al. 2005) are remarkably similar. This prompts the question whether such domains also share functional characteristics and possibly consist of the same or related genes. Microsynteny in local expression domains of these two genomes would reflect conservation of such domains. The Inparanoid Eukaryotic Orthologous database (O’Brien et al. 2005) was used to retrieve the current list of genes that are supposed to be orthologous between Arabidopsis (14,753 genes) and rice (12,428 genes), including all many-to-many relationships. The genes establishing coexpressed pairs based on MPSS expression data in either Arabidopsis (944 pairs including 116 duplicated pairs; data from Ren et al. 2005, but updated to the TIGR 5 annotation) or rice (584 pairs, including 146 duplicated pairs) were searched against these lists. This way, we aimed to identify the pairs of which both genes in the pair have an ortholog in the other plant and these orthologs are also coexpressed The analyses showed that there was not a single coexpressed pair in either Arabidopsis or rice of which both genes are orthologous to a gene of a coexpressed pair in the other species. Therefore, given the current annotation of the two genomes, there are no syntenic local coexpression domains between Arabidopsis and rice.
Partially syntenic local coexpression domains can occur by chance
In 34 cases though, one gene of a coexpressed pair in one plant species was orthologous to at least one gene of a coexpressed pair in the other plant. That is 3.6% of all (944) coexpressed pairs in Arabidopsis and 5.8% of all (584) coexpressed pairs in rice. We will refer to such a case as a partially syntenic coexpression domain (PSCD). To assess the significance of such partially syntenic domains, we evaluated all the genes in non-coexpressed pairs, comparing Arabidopsis (15,629 pairs including 617 duplicated pairs) and rice (12,336 pairs including 1,517 duplicated pairs) to establish whether PSCDs are more enriched in the genome than partially syntenic non-coexpressed domains (PSND). We identified 4,488 PSNDs (72 due to duplicated pairs) between all non-coexpressed pairs of genes in both plant genomes. This is 28.7% of all Arabidopsis non-coexpressed pairs and 36.4% of all rice non-coexpressed pairs. The percentage of PSNDs among non-coexpressed pairs is 6–8 times higher than that of PSCDs from coexpressed pairs. Therefore, PSCDs do not seem to occur more often than expected by chance alone.
Local coexpression domains represent only a small part of the genome
Setting stringent criteria for coexpression using MPSS and microarray expression data, the rice genome was found to contain a small but significant number of local coexpression domains that range from predominantly two, up till 4, genes, irrespective of the expression platform used for analysis. This is similar to the situation in Arabidopsis (Ren et al. 2005). It shows that a genome is essentially not a random entity with respect to the occurrence of local coexpression domains. Our results agree with other coexpression studies where strong coexpression was shown to occur only within close proximity of several genes (Cohen et al. 2000; Hershberg et al. 2005; Lercher et al. 2003; Semon and Duret 2006). Although coexpression was shown to extend to regions covering up to 30 genes and more (Spellman and Rubin 2002) and to cover chromosomal regions up to 100 kb (Ma et al. 2005; Spellman and Rubin 2002; Williams and Bowles 2004) and more, there appears to exist a decrease in the strength of coexpression with increasing distances. The local coexpression domains described here represent ∼4–5% of the potential coexpression fraction in the whole genome as found in other studies (Semon and Duret 2006). Larger but looser coexpression domains might cover up to ∼10% (Cohen et al. 2000; Williams and Bowles 2004) till 20% (Spellman and Rubin 2002) of the genome. The difference in occurrence between local and longer-range weaker but still statistically significant coexpression domains is highly dependent on the method used (Semon and Duret 2006). Moreover, the expression platforms used focus on snapshots of gene expression at the RNA level. Such data ignore various post-transcriptional regulatory mechanisms that can for example result in poor correlations between RNA and protein levels. Although we have shown before that in a transgenic set-up the creation of an artificial local coexpression domain results in markedly improved RNA/protein correlations (Mlynarova et al. 2002), this does not need to be the case for all endogenous local coexpression domains now identified.
The terms cluster or chromosomal domain and associated terms such as neighboring are generally based on a (much) more loose definition compared to the definition used here to identify local coexpression domains. Local coexpression domains require a pair-wise correlation between the expressions of ALL adjacent genes above 0.7. The larger domains are defined on the basis of the use of a sliding window of either a given sequence length (number of nucleotides) or of a given number of genes (Spellman and Rubin 2002; Williams and Bowles 2004). In such a window, the average correlation is calculated and compared with simulated sets. This allows for the presence of genes within a domain that are not strongly (co)expressed but are “carried along for a ride” in the open chromatin domain (Spellman and Rubin 2002).
The two expression data sets here analyzed show very little overlap in coexpressed pairs (only 10 out of 4,643 all common pairs). The low number of shared pairs is thought to reflect the biological background of the data sets. The MPSS data concern a broad range of tissues and experimental conditions, while the MA data only cover crown and growing point tissue under control and salt stress condition (Walia et al et al. 2007). The number of libraries taken into account, as well as their biological background, obviously influences the possibility of identifying significant numbers of coexpression domains above the number of domains expected by chance. Two genes evaluated in four comparably similar conditions (MA data) are predicted to have a higher likelihood of exhibiting a similar expression pattern, than when examined in 18 different (MPSS) conditions. When considering genome-wide local coexpression of genes, a wide diversity of tissues and/or conditions should be taken for analysis. Highly tissue-specific coexpression may be masked in this approach and should be analyzed by other means.
Parameters shaping local coexpression domains
The existence of local coexpression domains in rice could not be explained solely by gene orientation, such as tandemly, divergently or convergently oriented gene pairs. No relative enrichment of the proportion of coexpressed pairs was seen. The fraction of coexpressed genes in the divergent orientation was even lower than for the other two orientations (Table 3). So shared promoter regions (for divergent pairs) and transcriptional read-through (for tandem pairs) do not explain the local coexpression domains in rice, similar to what we have concluded for Arabidopsis (Ren et al. 2005). This is in contrast to some other studies in which shared promoter region (for divergent pairs) and transcriptional read-through established coexpression domains (Semon and Duret 2006). Whereas we do not detect any preferred orientation to result in coexpression in rice, other studies show a higher degree of coexpression in divergent and tandemly oriented gene pairs (Williams and Bowles 2004; Zhan et al. 2006). The differences in conclusions are most likely due to the different methods used, such as the definition of coexpression of neighboring genes as well as the dataset and/or expression platform used.
Gene distance is not the explanatory factor for the occurrence of local coexpression domains in rice. No significant decrease in the fraction of coexpressed genes was observed with increasing intergenic distance (Fig. 2). The fraction of coexpressed pairs does not decrease even with gene distances up to 12 kb. It shows that at a relatively large distance, neighboring genes can still be coexpressed. Another study reported that when genes >12 kb apart were taken into account, the negative correlation between coexpression and gene distance was gone (Williams and Bowles 2004). In a comparative study of 6 eukaryotic genomes, coexpression was shown to vary at chromosomal distances above 100 kb (Fukuoka et al. 2004). This suggests that considerable coexpression of neighboring genes can occur even at large gene distance, although the coexpression may not be related to the physical distance anymore. While gene distance itself is not predictive for coexpression (Cohen et al. 2000; Kruglyak and Tang 2000), the likelihood of coexpression would favor short gene distances (Hershberg et al. 2005; Hurst et al. 2002; Lercher et al. 2003; Semon and Duret 2006). It should be kept in mind that the rice data set now analyzed is far from complete in terms of its annotation, so what are now far-apart neighboring genes may be no longer directly neighboring the moment the annotation is improved. Because the characteristics of the local coexpression domains in Arabidopsis and rice are remarkably alike, the actual numbers and possibly even genes involved in local coexpression domains may be subject to change, but not the occurrence of such domains in these genomes.
With the gene ontology developed for plants (GOslim), there is no evidence that coexpressed genes are more enriched in the same functional category in comparison to non-coexpressed genes (Table 4). Previous studies suggested that clustering of functionally related genes would occur in all metazoans (Cohen et al. 2000; Lercher et al. 2003). Recent studies demonstrated a significant enrichment for coexpressed genes in the same metabolic pathway (Williams and Bowles 2004) or the same biological processes (Zhan et al. 2006), although this appeared not to be the explanation for the coexpression of neighboring genes (Williams and Bowles 2004). In worm, clusters of similarly expressed genes cover similar biological functions (Roy et al. 2002). In human, coexpression over the whole genome was shown to correlate with functional relationships between the genes (Lee et al. 2004). Our study found no enrichment of coexpressed gene pairs in the same functional category than non-coexpressed pairs, suggesting that it is not necessarily true that the natural selection maintained regional coexpression by keeping genes with similar functions in adjacent positions (Cohen et al. 2000; Semon and Duret 2006).
The genomic context of genes is supposed to play an important role in the regulation of gene expression (van Drunen et al. 1997). In a number of coexpression studies in various organisms, the occurrence of coexpression domains, whether small (local) or larger (global), sometimes independent of gene orientation and gene distance, were all supposed to be regulated at the level of higher-order chromosomal structure (Cohen et al. 2000; Hershberg et al. 2005; Ren et al. 2005; Spellman and Rubin 2002; Williams and Bowles 2004; Zhan et al. 2006). It will be difficult to formally exclude the possibility of transcriptional regulation as a cause of local coexpression. Co-regulated transcription could occur through shared promoter elements between the neighboring genes in coexpression domains. However, previous experience with transgene expression data indicated that the particular position of neighboring genes in a genome affects the expression of that gene considerably (Mlynarova et al. 1994, 1995). Two physically neighboring transgenes could only be made to into an artificial local expression domain, that is, show correlated expression, when chromatin-organizing elements were placed around the genes (Mlynarova et al. 2002). Although this experimental result indicates the importance of chromatin organization in establishing local coexpression in a plant genome, this result does not need to be the case for all coexpression domains identified. With this caveat, this study of local coexpression domains in rice that are independent of duplication, gene orientation, or gene distance strengthen the notion that the regulation of genes in such domains resides at the level of higher-order chromatin structures. Future studies using for example advanced fluorescence in situ hybridization (FISH) technology on interphase chromosomes (Walter et al. 2006), or the transgenic approach outlined above (Mlynarova et al. 2002), will generate further experimental support for the location and characteristics of local coexpression domains. In addition, the new approaches in and insights from epigenetics and epigenomics (Henderson and Jacobsen 2007), such as the genome-wide mapping and analysis of DNA methylation (Zhang et al. 2006; Zilberman et al. 2007) and future histone modification maps (Esteller 2007) of plants, possibly in combination with genetics (Jansen and Nap 2001), will be useful to get more insight into the occurrence and function of local coexpression domains in plant genomes.
Lack of microsyntenic coexpression
From an evolutionary point of view, syntenic regions between species reveal genes for conserved and important traits. Macrosynteny is generally not easily detectable after a long evolutionary time, as colinearity erodes by various mechanisms, such as transposon activity, intra or inter-chromosomal rearrangements, duplications, translocations, inversions and/or individual divergence after speciation (Salse et al. 2002). While macrosynteny may not be detectable any more for genomes that diverged more than 100 million years ago (mya), microsynteny, i.e., conservation of local gene order and orientation, may still exist and be informative (Devos et al. 1999; Salse et al. 2002). Arabidopsis and rice are thought to have diverged about 120–200 mya (Salse et al. 2002). Microsyntenic local coexpression domains between Arabidopsis and rice would indicate the importance of the evolutionary conservation of regulatory systems beyond sequence similarity after the divergence of dicotyledonous and monocotyledonous plants. Analyses show that there is not a single coexpressed pair in either Arabidopsis or rice of which both genes are orthologous to a gene in a coexpressed pair in the other species. Therefore, there are no syntenic local coexpression domains between Arabidopsis and rice. Although the analyses were performed for only one monocot and one dicot and should be extended to many more genomes, the results could be taken to suggest that maintenance of coexpression has not been an important driving force in genome conservation during or after the divergence of dicotyledonous and monocotyledonous plants. Although individual genes in local coexpression domains in either rice or Arabidopsis may have an ortholog in the other species, establishing so-called partially syntenic coexpression domains (PSCDs), this does not seem to occur above chance in the context of whole-genome configurations. Without statistical significance, the occurrence of such PSCDs is unlikely to have any evolutionary relevance on a genome-wide scale. Detailed analyses of individual cases and gene locations may suggest the occurrence of local microsynteny and point to chains of evolutionary events in which the conservation of coexpression could be involved. However, more detailed studies are required to assess the functional relevance, if any, of such genomic constitutions.
We thank Prof. Blake Meyers (University of Delaware) for early access to the MPSS rice expression data and Dr. Ludmila Mlynarova (Wageningen University) for discussion. This research was supported by a program subsidy from the Netherlands Organization for Scientific Research (NWO) and by the Centre for BioSystems Genomics (CBSG), which is part of the Dutch Genomics Initiative.