3D-dynamic representation of DNA sequences

A new 3D graphical representation of DNA sequences is introduced. This representation is called 3D-dynamic representation. It is a generalization of the 2D-dynamic representation. The sequences are represented by sets of “material points” in the 3D space. The resulting 3D-dynamic graphs are treated as rigid bodies. The descriptors characterizing the graphs are analogous to the ones used in classical dynamics. The classification diagrams derived from this representation are presented and discussed. Due to the third dimension, “the history of the graph” can be recognized graphically because the 3D-dynamic graph does not overlap with itself. Specific parts of the graphs correspond to specific parts of the sequence. This feature is essential for graphical comparisons of the sequences. Numerically, both the 2D and the 3D approaches are of high quality. In particular, a difference in a single base between two sequences can be identified and correctly described (one can identify which base) by both 2D and 3D methods.


Introduction
In modern biomedical sciences, methods derived from physics, mathematics, and numerical analysis are frequently applied. Therefore this branch of science is, in fact, interdisciplinary. In particular, the analysis of biological sequences (DNA, RNA, protein) combines methodology from several disciplines. Graphical representations are powerful methods because they allow for both graphical and numerical characterization of the sequences. The sequences are usually very long, and it is not obvious how to represent these objects. The questions of how to avoid degeneracy and how to express the features of the objects both graphically and numerically have led to numerous methods.
In the present work, we introduce a new 3D graphical representation method. The proposed method is a 3D generalization of the 2D-dynamic representation of DNA sequences [1]. The 2D-dynamic graphs represent the DNA sequences. They are composed of "material points" distributed in a 2D space. Their distribution is determined by the sequence. We proposed the moments of inertia and the coordinates of the centers of mass of the 2D-dynamic graphs for the numerical characterization of the DNA sequences [1]. We also considered the high-order moments of the mass-density distributions based on 2D-dynamic graphs as descriptors [2]. The mass overlaps and the angles between the X axis and the principal axis of inertia are also used for the description of similarity/dissimilarity of the DNA sequences [3].
Both our methods (2D- and 3D-dynamic representations) are based on a walk in a space, which is one of the common approaches in this field. The 2D graphical representation methods originated in visualizations of these walks [4][5][6]. Approaches based on a walk in a 3D space may be found in [7][8][9][10][11]. They differ in the basis vectors assigned to particular bases and in the numerical characterizations of the graphs. Examples of various 3D graphical representation methods may be found in [12][13][14][15][16][17][18][19][20][21][22][23].
In the present work we model a DNA sequence as a set of "material points" in the 3D space. As a consequence, the sequence is characterized by dynamical quantities, e.g., moments of inertia, analogously to the 2D-dynamic representation. Therefore we retained the name '3D-dynamic representation of DNA sequences'. Using the new model we construct the classification diagrams.

Method
The proposed method is based on the convention of a walk in a 3D space. A base in a sequence is represented by a material point in the 3D space. To each point an abstract mass is assigned. We start the walk at the point with coordinates (0,0,0). In each step this point is shifted by a unit vector. We represent the bases by the following unit vectors: A=(−1,0,1), G=(1,0,1), C=(0,1,1), and T=(0,−1,1). At the end of each vector we locate a mass m=1. As a consequence, the 3D-dynamic graph is obtained. It consists of material points with unit masses in the 3D space. The distribution of the points in the space is determined by the sequence.
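The construction of the walk can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy is available; the names `STEPS` and `dynamic_graph_3d` are hypothetical and do not come from the original work.

```python
import numpy as np

# Vectors assigned to the four bases, as listed in the text.
STEPS = {
    "A": (-1, 0, 1),
    "G": (1, 0, 1),
    "C": (0, 1, 1),
    "T": (0, -1, 1),
}


def dynamic_graph_3d(sequence):
    """Return an (N, 3) integer array with the coordinates of the material points.

    Every point carries the unit mass m = 1, so the masses are not stored.
    """
    steps = np.array([STEPS[base] for base in sequence.upper()], dtype=int)
    steps = steps.reshape(-1, 3)  # keeps the shape (0, 3) for an empty sequence
    # The walk starts at (0, 0, 0); a mass is placed at the end of each step,
    # so the positions are the cumulative sums of the step vectors.
    return np.cumsum(steps, axis=0)


points = dynamic_graph_3d("ACGT")
print(points)
```

Because every vector has z-component 1, the i-th point always lies at height z = i, which is why the graph cannot cross itself.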
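The coordinates of the center of mass of the 3D-dynamic graph in the {X,Y,Z} coordinate system are defined as

$$\mu_x = \frac{\sum_i m_i x_i}{\sum_i m_i}, \qquad \mu_y = \frac{\sum_i m_i y_i}{\sum_i m_i}, \qquad \mu_z = \frac{\sum_i m_i z_i}{\sum_i m_i},$$

where $x_i$, $y_i$, $z_i$ are the coordinates of the mass $m_i$. Since $m_i = 1$ for all the points, the total mass of the sequence is $N = \sum_i m_i$, where $N$ is the length of the sequence. Then, the coordinates of the center of mass of the 3D-dynamic graph may be expressed as

$$\mu_x = \frac{1}{N}\sum_i x_i, \qquad \mu_y = \frac{1}{N}\sum_i y_i, \qquad \mu_z = \frac{1}{N}\sum_i z_i.$$

The tensor of the moment of inertia is given by the matrix

$$\hat{I} = \begin{pmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{pmatrix}$$

with

$$I_{xx} = \sum_i m_i \left[ (y_i^{\mu})^2 + (z_i^{\mu})^2 \right], \qquad I_{yy} = \sum_i m_i \left[ (x_i^{\mu})^2 + (z_i^{\mu})^2 \right], \qquad I_{zz} = \sum_i m_i \left[ (x_i^{\mu})^2 + (y_i^{\mu})^2 \right],$$

$$I_{xy} = I_{yx} = -\sum_i m_i x_i^{\mu} y_i^{\mu}, \qquad I_{xz} = I_{zx} = -\sum_i m_i x_i^{\mu} z_i^{\mu}, \qquad I_{yz} = I_{zy} = -\sum_i m_i y_i^{\mu} z_i^{\mu},$$

where $x_i^{\mu}$, $y_i^{\mu}$, $z_i^{\mu}$ are the coordinates of $m_i$ in the Cartesian coordinate system for which the origin has been selected at the center of mass. The eigenvalue problem of the tensor of inertia is defined as

$$\hat{I}\,\omega_k = I_k\,\omega_k, \qquad k = 1, 2, 3,$$

where $I_k$ are the eigenvalues and $\omega_k$ the eigenvectors. The eigenvalues are obtained by solving the third-order secular equation

$$\begin{vmatrix} I_{xx} - I & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} - I & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} - I \end{vmatrix} = 0.$$

The eigenvectors $\omega_1$, $\omega_2$, $\omega_3$ are orthonormal. Thus, they form a basis for a new coordinate system. The corresponding axes of this new system are denoted $\Omega_1$, $\Omega_2$, $\Omega_3$ and referred to as the principal axes. The eigenvalues $I_1$, $I_2$, $I_3$ are called the principal moments of inertia and are equal to the moments of inertia associated with rotations around the principal axes.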
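The quantities defined above may be evaluated numerically along the following lines. The sketch assumes unit masses and uses `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order; whether this ordering matches the labelling I 1, I 2, I 3 used in the tables is an assumption. The function name `descriptors` is hypothetical.

```python
import numpy as np


def descriptors(points):
    """Center of mass, inertia tensor, principal moments and principal axes.

    `points` is an (N, 3) array of material-point coordinates with unit masses.
    """
    points = np.asarray(points, dtype=float)
    mu = points.mean(axis=0)            # (mu_x, mu_y, mu_z) for m_i = 1
    r = points - mu                     # coordinates relative to the center of mass
    # Inertia tensor: sum_i (|r_i|^2 * E - r_i r_i^T). Its diagonal holds
    # I_xx, I_yy, I_zz and its off-diagonal elements are the negative products.
    inertia = np.sum(r * r) * np.eye(3) - r.T @ r
    # eigh solves the symmetric eigenvalue problem; the columns of `axes`
    # are orthonormal eigenvectors (ASSUMPTION: ascending order = I_1, I_2, I_3).
    moments, axes = np.linalg.eigh(inertia)
    return mu, inertia, moments, axes


# Points of the 3D-dynamic graph of the short test sequence "ACGT".
mu, inertia, moments, axes = descriptors(
    [[-1, 0, 1], [-1, 1, 2], [0, 1, 3], [0, 0, 4]]
)
print(mu)
print(moments)
```

Solving the eigenvalue problem numerically is equivalent to finding the roots of the secular equation, but avoids expanding the determinant by hand.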
The relative orientation of the new and old coordinate systems may be described by the cosines of properly defined angles. Let $M_1$, $M_2$, and $M_3$ denote, respectively, the planes (X,Y), (X,Z), and (Y,Z). Similarly, $N_1$, $N_2$, $N_3$ stand for the planes ($\Omega_1$,$\Omega_2$), ($\Omega_1$,$\Omega_3$), ($\Omega_2$,$\Omega_3$), respectively. For the characterization of the 3D-dynamic graphs we use the cosines of the angles between the planes of the two systems of coordinates:

$$C_{ij} = \cos(M_i, N_j), \qquad i, j = 1, 2, 3.$$

It is also convenient to use square roots of the normalized principal moments of inertia.
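One possible way to compute these orientation descriptors is sketched below. The angle between two planes is taken as the angle between their normals. Two points are assumptions rather than statements from the text: the sign of each cosine depends on the arbitrary orientation of the eigenvectors, and the principal moments are normalized here by the total mass N. The function names are hypothetical.

```python
import numpy as np


def plane_cosines(axes):
    """Matrix C with C[i, j] = cosine of the angle between planes M_(i+1) and N_(j+1).

    `axes` is a 3x3 matrix whose columns are the eigenvectors omega_1..omega_3.
    """
    # Normals of M1=(X,Y), M2=(X,Z), M3=(Y,Z) are the Z, Y and X unit vectors.
    normals_m = np.eye(3)[[2, 1, 0]]
    # Normals of N1=(O1,O2), N2=(O1,O3), N3=(O2,O3) are omega_3, omega_2, omega_1.
    normals_n = np.asarray(axes, dtype=float).T[[2, 1, 0]]
    # ASSUMPTION: raw dot products are returned; their signs follow the
    # (arbitrary) orientation of the eigenvectors.
    return normals_m @ normals_n.T


def sqrt_normalized_moments(moments, n_points):
    """Square roots of the principal moments divided by N (ASSUMED normalization)."""
    return np.sqrt(np.asarray(moments, dtype=float) / n_points)


# If the principal axes coincide with X, Y, Z, each plane M_i coincides with N_i.
cosines = plane_cosines(np.eye(3))
radii = sqrt_normalized_moments([4.0, 9.0, 16.0], 4)
print(cosines)
print(radii)
```

With the assumed normalization, the resulting quantities have the dimension of length and play the role of radii of gyration about the principal axes.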

Results and discussion
The new approach has been applied to histone H4 coding sequences of different species listed in Table 1 and to alpha globin coding sequences of different species listed in Table 4. All histone H4 coding sequences have length N=312 and all alpha globin coding sequences have length N=429. Some examples of 3D-dynamic graphs are shown in Fig. 1. Figure 2 shows the 2D-dynamic graph for the same sequence (No. 3 in Table 1) as in Fig. 1. 2D-dynamic graphs remove the degeneracy present in the Nandy plots [5]. This degeneracy comes from the so-called repetitive walks (walks performed back and forth along the same trace). Because the points of the 2D-dynamic graphs may carry different masses, the repetitive walks can be recognized both graphically and numerically (the descriptors depend on masses different from 1). However, the 2D-dynamic graphs still do not retain the history of the sequence. By introducing the third dimension one can avoid self-overlapping of the graph.
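The difference between the two representations for a repetitive walk can be illustrated with a short sketch. It assumes that the 2D-dynamic graph corresponds to the (x, y) components of the 3D vectors, with unit masses accumulating at revisited points; the sequence `AGAGAG` and the function names are hypothetical examples, not data from the tables.

```python
from collections import Counter

# Vectors assigned to the four bases, as listed in the Method section.
STEPS = {"A": (-1, 0, 1), "G": (1, 0, 1), "C": (0, 1, 1), "T": (0, -1, 1)}


def walk_3d(sequence):
    """List of (x, y, z) positions of the material points of the 3D walk."""
    x = y = z = 0
    points = []
    for base in sequence.upper():
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def masses_2d(sequence):
    """Mass accumulated at each (x, y) point.

    ASSUMPTION: the 2D graph is the projection of the 3D walk onto the (X, Y) plane.
    """
    return Counter((x, y) for x, y, _ in walk_3d(sequence))


seq = "AGAGAG"  # a repetitive walk, back and forth along the X axis
points = walk_3d(seq)
masses = masses_2d(seq)
print(len(set(points)), "distinct points in 3D")
print(dict(masses))
```

In the projection only two points survive, each with an accumulated mass larger than 1, whereas in 3D every base keeps its own point because z grows by one at every step.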
Numerically, each graph is characterized by descriptors. The values of the descriptors considered in this work are shown in Tables 1, 2, 3, 4, 5, and 6. Due to the choice of the unit vectors representing the four bases, $\mu_x$ and $\mu_y$ give information about the relative number of particular bases in the sequences, and $\mu_z$ contains information about the lengths of the sequences only. $\mu_x$ and $\mu_y$ shown in Tables 1 and 4 are identical to $\mu_x$ and $\mu_y$ for the 2D-dynamic graphs for the same sequences [1]. New information is contained in the other descriptors (Tables 2, 3, 5, and 6). The descriptors are very sensitive: they correctly identify a single-base difference between two sequences. Sequence no. 6 in Table 4 (EF605407) differs by two bases from the sequence (MMAGL1) used in the calculations in [1]. The base T in MMAGL1 is replaced by [...]

[Fig. 4: Classification diagram $C_{11}$-$C_{12}$-$C_{13}$]

[Table 5: Principal moments of inertia of the graphs and cosines of the angles relative to $M_1$ representing alpha globin coding sequences]

Using the present approach one can also create very detailed classification diagrams (in this case, for histone H4 coding sequences of evolutionarily similar organisms). We gave the similarity matrix obtained with the standard Clustal W approach for histone H4 coding sequences in [3] (all similarity values are greater than or equal to 78%). The considered sequences are rather similar to each other and it is difficult to find a property which allows one to distinguish between [...] Table 1) with the sequences of plants rather than with the ones of vertebrates. Using the 2D-dynamic representation we found some properties that in effect give the classification of the sequences representing plants and vertebrates [24]. In the present work, we find more descriptors that give a similar classification.
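Two of the statements made earlier in this section, that the z-coordinate of the center of mass depends on the length only and that a single-base difference changes the descriptors, can be checked with a short sketch. Since every vector has z-component 1, the i-th point lies at z = i, so the z-coordinate of the center of mass equals (N+1)/2. The two 12-base sequences below are hypothetical and are not taken from the tables; the function name `center_of_mass` is likewise an assumption.

```python
import numpy as np

# Vectors assigned to the four bases, as listed in the Method section.
STEPS = {"A": (-1, 0, 1), "G": (1, 0, 1), "C": (0, 1, 1), "T": (0, -1, 1)}


def center_of_mass(sequence):
    """(mu_x, mu_y, mu_z) of the 3D-dynamic graph with unit masses."""
    steps = np.array([STEPS[base] for base in sequence.upper()], dtype=float)
    return np.cumsum(steps, axis=0).mean(axis=0)


# HYPOTHETICAL sequences that differ in one base: T -> C at position 8.
reference = "ATGGTGCTGTCT"
variant = "ATGGTGCCGTCT"

mu_ref = center_of_mass(reference)
mu_var = center_of_mass(variant)
print(mu_ref)
print(mu_var)
print(mu_var - mu_ref)
```

In this example the substitution T to C changes the y-component of the step by 2 for the substituted point and shifts all later points accordingly, so only the y-coordinate of the center of mass moves; the size of the shift depends on the position of the substitution.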
The descriptors representing the sequences of plants and of vertebrates are located in different parts of the diagrams. In order to visualize the classifications, the clusters of descriptors corresponding to different species have been separated by planes.
Summarizing, both approaches (2D- and 3D-dynamic representations) are examples of graphical representation methods. Very popular methods based on the alignment of the sequences give rather limited information about similarity/dissimilarity of the sequences. Their degeneracy is relatively high. The same similarity values are obtained if T, C, G, or A bases align. Using graphical representation methods one has a chance to consider different aspects of similarity separately, both graphically and numerically. The computing time of these methods is low.
The 3D-dynamic graphs are generalizations of the 2D-dynamic graphs. The descriptors used for the characterization of the graphs are also related to the dynamics. The proposed descriptors of the 3D-dynamic graphs lead to new classification diagrams for the considered data, analogously to the 2D-dynamic graphs [24]. Therefore the descriptors proposed for both 2D- and 3D-dynamic graphs are good, reliable, and sensitive tools for similarity/dissimilarity analysis of DNA sequences. The 3D-dynamic graphs retain the history of the sequences, and this is one of their advantages. The consecutive bases in the sequences are represented by the appropriate parts of the 3D-dynamic graphs (the 3D graph never overlaps with itself). Therefore the future applications of the 3D method, both as a graphical and as a numerical tool, seem to be promising.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.