Proteome Complexity Measures Based on Counting of Domain-to-Protein Links for Replicative and Non-Replicative Domains
We call the entire set of protein domains in the proteome of an organism the domainome. We define the domain-to-protein linkage profile (DPLP) as the list of domains in the domainome together with the number of occurrences (links to proteins) of each domain in the proteome. We estimated the DPLPs of the proteomes of the 156 complete genomes represented in the InterPro database. This work presents several quantitative measures of proteome complexity based on the DPLP. For each of the 156 genomes studied, we found two large sets of domains: D1, the domains that are not replicated within any protein of the proteome, and D2, the domains that occur two or more times in at least one protein of the proteome. The statistics of the observed domain-to-protein links (DPLs) for sets D1 and D2 do not exhibit simple ‘scale-free network’ properties: for D1, the distribution of DPLs in the proteome follows the Generalized Discrete Pareto function, whereas for D2 it follows the inverse gamma probability function. The dynamic range of DPLs is larger for D1 domains than for D2 domains, and this range correlates with the biological complexity of the organism. The proteins corresponding to the D1 and D2 sets differ significantly in their molecular functions, biological processes, and cellular components. We use the statistical distributions of the number of DPLs in the proteome, together with estimates of the differences between the DPLPs of pairs of organisms, as measures of the relative biological complexity of the organisms. In particular, we show quantitatively that the domain composition of human proteins is more complex than that of mouse or rat proteins.
Key words: proteome complexity; evolution; domain-to-protein links; skew distributions
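The definitions of the DPLP and of the sets D1 and D2 given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `domain_profile`, the toy proteome, and its protein and domain labels are hypothetical, and the sketch assumes that a domain-to-protein link is counted once per protein containing the domain, which is one possible reading of "occurrences (links to proteins)".

```python
from collections import Counter


def domain_profile(proteome):
    """Return (dplp, d1, d2) for a proteome.

    proteome: dict mapping a protein id to the list of domain ids found
    in that protein (a domain id is repeated if the domain is repeated).
    """
    dplp = Counter()    # domain -> number of proteins it is linked to
    replicated = set()  # domains seen two or more times in some protein

    for domains in proteome.values():
        copies_in_protein = Counter(domains)
        for domain, copies in copies_in_protein.items():
            # Assumption: one link per protein, regardless of copy number.
            dplp[domain] += 1
            if copies >= 2:
                replicated.add(domain)

    d2 = replicated              # replicated in at least one protein
    d1 = set(dplp) - replicated  # never replicated within any protein
    return dplp, d1, d2


# Hypothetical toy proteome, for illustration only.
toy_proteome = {
    "P1": ["kinase", "SH3"],
    "P2": ["SH3", "SH3", "PDZ"],
    "P3": ["kinase"],
    "P4": ["PDZ", "zinc_finger", "zinc_finger"],
}

dplp, d1, d2 = domain_profile(toy_proteome)
print(dict(dplp))
print(sorted(d1), sorted(d2))
```

In this toy input, `SH3` and `zinc_finger` fall into D2 because each is repeated within a single protein, while `kinase` and `PDZ` fall into D1. If links were instead counted per domain copy, only the increment inside the inner loop would change (adding `copies` instead of 1).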