Isoprenoids form a very large and diverse group of natural products and are especially widespread in plants. The enzymes responsible for isoprenoid biosynthesis are correspondingly diverse, in structure as well as in function. To date, the evolutionary origin of prenyl-converting enzymes has not been well studied, although almost all of them use substrates with a common activating group: diphosphate (pyrophosphate).

As a starting point for a more detailed analysis of the evolution of prenyl-converting enzymes, we analyse diphosphate binding sites. Fundamental research on the related phosphate binding modes was published by Hirsch et al. [1] in 2007; in that work, proteins were first grouped according to their function and then analysed statistically. Our approach, in contrast, starts with an analysis of the diphosphate binding sites themselves and uses the results as the basis for a clustering, in order to avoid statistical bias caused by imperfect grouping.

To this end, several routines were developed in the Scientific Vector Language (SVL) provided by the Molecular Operating Environment (MOE) [2]. Starting from all diphosphate-binding proteins deposited in the Protein Data Bank, functional groups interacting with the pyrophosphate are identified, with metal ions and water molecules also taken into account as bridging structural elements. The interacting groups are then stored in a molecular database together with information such as the PDB classification of the protein, the metal content, the number of interacting groups in the binding site, and various properties of the pyrophosphate. Descriptive statistical analyses of the complete dataset, including the comparison of bond angles and bond lengths and the description of torsion angles, are performed to characterise the binding sites as a whole. A subsequent clustering based on these results is expected to uncover typical binding modes for pyrophosphates. For this purpose, additional SVL routines were developed to extract the required information from the binding sites. On the basis of these data, different clustering methods provided by the statistical package R [3] are applied and compared to gain insight into the range of pyrophosphate binding modes found in nature.
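
The geometric descriptors mentioned above (bond angles and torsion angles) can be illustrated with a minimal sketch. It is written in Python rather than SVL and is not the routine used in this work; the function names `bond_angle` and `torsion_angle` and the example coordinates are hypothetical and chosen only for illustration.

```python
import math

# Small vector helpers on 3-tuples (hypothetical names, illustration only).
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return tuple(x * s for x in a)

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, measured at the central atom b."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to [-1, 1] to guard against rounding errors.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Invented coordinates, not taken from any PDB entry.
    atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(bond_angle(*atoms[:3]))
    print(torsion_angle(*atoms))
```

In such a scheme, each binding site would contribute a vector of angles and distances, and these vectors could then serve as input to the clustering step.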