The genomes of typical plant species contain more genes than those of animals, fungi, and bacteria. This should not surprise us – plants are autotrophs, able to grow with just water, minerals, gases, and sunshine. Moreover, they must protect themselves from the heterotrophs that need to consume plants for their own growth, and co-opt animals to help them reproduce. Immobile plants have evolved elaborate biochemical pathways that produce defensive, attractive, and rewarding “secondary” or “specialized” metabolites. Gene duplication is the major process creating new genes in all organisms, and the large number of plant genes devoted to such functions has arisen through repeated cycles of duplication and divergence. Even after diverging, duplicated genes still tend to encode enzymes with related functions. With the recent development of sensitive, high-throughput technologies that help identify the function(s) of specific genes and enzymes, the time is ripe for a major push to examine the roles that genes related through duplication play in the ecological interactions of plants with their environment, especially those mediated by chemistry.

The process of genetic change in populations (and species) requires two steps. First, a random genetic change occurs in an individual. Then, if this change is adaptive, selection increases the proportion of individuals who carry the new trait (random drift also plays a role in small populations and for alleles with trivial differences in adaptive value). With gene duplication, the process requires three steps – first the duplication, then a mutation in one copy that creates an altered function, then selection. It is believed that duplication increases the probability of subsequent mutations being beneficial, because at least one gene copy maintains the original function, so the disadvantage of losing this function is eliminated.

Gene families, and genome size in general, evolve in both directions, and inactivation and subsequent loss of genes also occur. The ancestor of humans likely had a few more genes for amino acid biosynthesis than we presently do, just as plants that evolve a parasitic lifestyle are known to lose genes in the process. Moreover, in specialized metabolism it is possible that a gene’s function could change without preserving the old function (i.e., without a prior gene duplication). Loss of a function is a distinct possibility when the compound it produces no longer confers an advantage to the plant (for example, if a pest has developed resistance to the compound) or, worse, when the compound becomes disadvantageous because a pest has acquired a taste for it. Yet, many of the plant genes specifying the synthesis of specialized metabolites are members of moderate (>10 members) or large (>30 members) gene families, and it is estimated that perhaps 10–20% of the genes in a plant’s genome specify enzymes for specialized metabolism (Somerville and Somerville, 1999). With plant genomes containing >25,000 genes, this amounts to roughly 2,500–5,000 genes or more, a large cost to the organism, not only in maintaining these genes but also in synthesizing the enzymes and the compounds themselves. Nonetheless, the sheer number of specialized compounds found in a single plant has often caused people to wonder whether all of them are adaptive. While several clear examples of specialized metabolites that confer a selective advantage have been documented, the function of many of these compounds is still unknown, although often suspected.

Recent developments in large-scale data collection – metabolic profiling, transcript profiling, and genomic sequencing – now make possible a combined effort by chemists, biochemists, molecular biologists, physiologists, and ecologists to examine the metabolome of each plant, identify the genes and enzymes responsible for secondary compound formation, and dissect their roles in the interactions of the plant with its environment. In this endeavor, gene families are prime targets. The commonality between members of each family, such as similar reaction mechanisms and similar substrates, will make it easier to identify the function of individual enzymes once the function of a representative member – often an enzyme in primary metabolism – has been established. Moreover, comparing the functions of orthologs in different species (or populations) allows us to assess the molecular evolutionary processes that give rise to new functions, in turn leading us to examine the ecological effects of the synthesis of new compounds in plants. Recent examples of such approaches include the identification of new cytochrome P450 oxidases in Brassicaceae that are involved in the synthesis of new phenolic/polyamine compounds deposited on the surface of pollen grains (Matsuno et al., 2009), and the characterization of a duplicate stearoyl-acyl carrier protein desaturase responsible for the synthesis of scent compounds in orchids that mimic insect sex pheromones (Schluter et al., 2011). Such an iterative process of studying gene families in plants will greatly speed up our understanding of the roles of plant specialized metabolites.