DNA Mutations via Chern-Simons Currents

We test the validity of a possible schematization of DNA structure and dynamics based on the Chern-Simons theory, that is a topological field theory mostly considered in the context of effective gravity theories. By means of the expectation value of the Wilson Loop, derived from this analogue gravity approach, we find the point-like curvature of genomic strings in KRAS human gene and COVID-19 sequences, correlating this curvature with the genetic mutations. The point-like curvature profile, obtained by means of the Chern-Simons currents, can be used to infer the position of the given mutations within the genetic string. Generally, mutations take place in the highest Chern-Simons current gradient locations and subsequent mutated sequences appear to have a smoother curvature than the initial ones, in agreement with a free energy minimization argument.

based on approaches typical of statistical mechanics applied to complex systems, but rather on first principles of field theories of physics. This novel point of view might be used to complement outputs derived from statistical methods and to address issues in the biological and medical sciences, such as preventing diseases, predicting the evolution of a genetic string or investigating the docking among large biological molecules, potentially improving current knowledge of the biological scenario. The link between gravitational theories and the dynamics and interactions of complex biomolecules is the topological nature of the former, which can be essential to describe the complicated physical-chemical and biological behavior of the latter, which relies heavily on their topology. Basically, the main idea is to describe the DNA curvature by using the same formalism used for space-time, treating the interactions occurring in biological systems as driven by the same general principles that govern the gravitational interaction.
Moreover, the deterministic approach based on Chern-Simons gravity can also be merged with the intrinsically probabilistic aspect of standard bioinformatic techniques in different ways. As an example, using topological field theories to describe the DNA configuration can provide the exact position in which mutations take place, by means of the comparison between two sequence curvatures. Once the position of the mutation is identified, bioinformatics is able to predict the probabilistic evolution and the clinical impact of that mutation. Another potential application which can be considered in the context of the Chern-Simons formalism is the docking between macro-molecules [13]. The latter can be understood as the interaction among different points, which tend to attract each other only where the corresponding curvatures are similar (by analogy with the gravitational interaction). Also in this regard, the probabilistic vision provided by bioinformatic techniques can be combined with the prediction given by topological field theories, in order to develop a coherent scheme capable of predicting where and when a disease could manifest.
Although the application of Chern-Simons gravity to complex systems may seem unusual, topological field theories are widely studied in several branches of physics, due to their suitability at ultraviolet (UV) and infrared (IR) scales [14][15][16][17][18]. In general, they involve topological invariants, namely quantities which are conserved under homeomorphisms. Topological invariants, indeed, only depend on the spacetime topology, independently of the point-like geometry [19]. They find their best application in the description of the gravitational interaction, and are considered with the aim of finding alternatives to General Relativity (GR) which are better suited to the quantum formalism [20][21][22][23][24][25][26][27][28].
Moreover, although the theoretical predictions of GR are perfectly consistent with observations at the level of the solar system, the theory suffers from some shortcomings at larger scales. As an example, the late-time accelerated expansion of the universe is nowadays attributed to a never detected form of energy, called Dark Energy. Similarly, incompatibilities in the galaxy rotation curves led to the introduction of Dark Matter, which is supposed to account for 85% of the matter in the Universe and to have strongly influenced its evolution. These are two of the biggest problems suffered by GR; for a complete discussion see e.g. [29][30][31][32][33]. With the aim of solving part of these issues, mainly those related to a self-consistent quantization of the gravitational interaction, in the second half of the twentieth century S.S. Chern and J.H. Simons developed a topological field theory capable of describing gravity as a gauge invariant theory of different gauge groups [34]. It turns out that n-dimensional Lagrangians whose exterior derivative gives (n + 1)-dimensional topological invariants are quasi-gauge invariant, i.e. they only change by a surface term after performing a gauge transformation. However, since such topological invariants exist only in even dimensions, the validity of the formalism is restricted to odd dimensions only. This is the main obstacle toward the construction of a 3+1 dimensional topological theory of gravity, though odd-dimensional topological theories find wide application in several fields. See e.g. [14][15][16][17][18] for basic foundations of Chern-Simons gravity and [35][36][37][38] for applications.
Due to its applications to three-dimensional electromagnetic theory [39,40], one of the most studied Chern-Simons Lagrangians is the 2+1 dimensional U(1)-invariant Lagrangian, namely:
$$L_{CS} = A \wedge dA, \tag{1}$$
with $A$ being the one-form connection and $dA$ its exterior derivative. Notice that the exterior derivative of $L_{CS}$ provides the four-dimensional Pontryagin density, namely $P^{(4)} = F \wedge F$, where $F$ represents the two-form curvature defined as $F = dA$.
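As a quick consistency check of the two properties quoted above, and assuming the U(1) Lagrangian takes the standard form $L_{CS} = A \wedge dA$ with normalisation constants omitted (the gauge parameter $\lambda$ is notation introduced here for illustration), both the Pontryagin relation and the quasi-gauge invariance can be verified directly:

```latex
\begin{align}
  dL_{CS} &= d(A \wedge dA) = dA \wedge dA - A \wedge d(dA) = F \wedge F, \\
  A \to A + d\lambda: \quad
  L_{CS} &\to (A + d\lambda) \wedge d(A + d\lambda)
          = A \wedge dA + d\lambda \wedge dA
          = L_{CS} + d(\lambda\, dA).
\end{align}
```

Both lines use $d^2 = 0$. In the second line the Lagrangian changes only by the exact term $d(\lambda\, dA)$, which integrates to a surface term.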
The Lagrangian in Eq. (1) can also be applied to standard electromagnetic theory, providing a massive wave equation which carries extra polarization modes. Specifically, in coordinate representation, the Chern-Simons term can be considered along with the free electromagnetic Lagrangian to provide a massive wave equation of the form $(\Box + m^2)\, \epsilon^{\mu\nu\rho} F_{\mu\nu} = 0$, with $m$ being a constant having mass dimension, $\Box \equiv \partial_\mu \partial^\mu$ the d'Alembert operator, $F_{\mu\nu}$ the electromagnetic tensor and $\epsilon^{\mu\nu\rho}$ the Levi-Civita symbol.
Another topological invariant, used to construct a gauge-invariant Lagrangian, is the four-dimensional Euler density, which turns out to be the exterior derivative of the three-dimensional Anti de Sitter-invariant Chern-Simons Lagrangian, that is:
with $\epsilon_{\mu\nu\rho}$ being the Levi-Civita symbol, $R^{\mu\nu}$ the two-form curvature, $\wedge$ the exterior product and $l$ a real constant with dimension of length. In general, the (2n − 1)-dimensional Chern-Simons Lagrangian, invariant under the local Anti de Sitter group, reads:
Due to the AdS/CFT correspondence, the above Lagrangian is mainly considered in five dimensions, where cosmological and spherically symmetric solutions can be analytically found [43][44][45][46][47][48][49][50][51][52][53]. The Chern-Simons approach, as we are going to discuss, can represent a starting point for the analysis of biological systems. From a conceptual point of view, the issue arises because Quantum Mechanics, being a linear theory, may not be sufficient to approach the high non-linearity of biological systems. For this reason, the latter can be suitably described by non-linear theories like GR or Chern-Simons.
For instance, by means of the Chern-Simons formalism, some biological problems can be addressed, such as the presence of knotted DNAs and their interactions with proteins [54]. Furthermore, in [55] the interactions of unknotted RNAs with knotted proteins have been analyzed in the process of codon and correction of RNA in methyl transfer. A general equation to solve the dynamics of knotted proteins has also been proposed by Lin and Zewail [56], based on the Wilson loop operator for gene expression with a boundary phase condition.
On the other hand, the basic foundations behind the application of Chern-Simons theory to biology can be found in [57] and [13]. In these references, the authors develop the formal structure of the theory and consider some applications to biological systems, in order to unveil the mechanism of DNA-RNA transcription. They also provide some insights to specifically describe the junk area within the DNA sequence [57]. In [57], the theory is applied to the docking mechanism of biological macromolecules, such as the configurational dynamics occurring in protein-protein interactions.
Without claiming completeness, in Sec. II we outline the main properties of the theory, with the aim of subsequently testing its validity by considering DNA sequences and introducing known mutations. The introduction of a mutation yields a change in the point-like curvature of the given sequence, which may give important information regarding the biological impact carried by such a mutation. From the mutated sequence, it is possible to infer the frequency/probability of the mutation occurring, as well as to predict the evolution of the system towards a given configuration. This paper is organized as follows: in Sec. II we briefly review the application of Chern-Simons theory to DNA and RNA systems; in Sec. III the formalism is then applied to different strings of the KRAS human gene and to SARS-CoV-2 virus sequences. In the former case, we apply the model to analyze mutations occurring in a few regions of the KRAS human gene. The latter is a gene acting as an on/off switch in cell signaling which, among its functions, controls cell proliferation. When KRAS is mutated, negative signaling is disrupted, with the consequence that cells can continuously proliferate, often degenerating into tumors [58,59].
In our analysis, KRAS sequences with mutations are thus compared with reference sequences, with the aim of using Chern-Simons theory to infer predictions of biological interest. As for the latter case, which is naturally one of the most studied RNA sequences to date due to the pandemic, Bobay et al. [60] examined SARS-CoV-2 RNA using a genome-wide approach, observing that recombination events account for approximately 40% of the polymorphisms, and that gene exchange occurs only within strains of the same subgenus (Sarbecovirus). Moreover, frequent mutations tend to increase the likelihood of convergent mutations in regions exposed to strong positive selection, causing similarities in the sequences that could be misinterpreted as recombination, and introduce new diversifying mutations which might accumulate, hiding past recombination events [60].
Genomic sequences of various SARS-CoV-2 strains from all over the world are available on specific platforms (e.g. GISAID) and are increasingly monitored to track SARS-CoV-2 variants in a timely manner [61]; as large databases and systematic sequencing are required, irregular sampling in time and space represents a crucial limitation in tracking the evolution of the pandemic. The genetic diversity observed in SARS-CoV-2 populations across distinct geographic areas suggests that independent events of SARS-CoV-2 introduction occurred, with few exceptions including China, being the original source, and, to a lesser extent, Italy, which was involved early [62]. Quantitatively, amino acid mutations were found to be significantly more frequent over the entire viral sequence in SARS-CoV-2 genomes tracked in Europe (43.07%) than in Asia (38.08%) and in North America (29.64%) [61].
Here we compare sequences of single-stranded RNA SARS-CoV-2 viruses coming from different countries, using Chern-Simons currents to potentially explain the reason why SARS-CoV-2 variants seem to exhibit a higher incidence during the 2020/2021 pandemic. Finally, in Sec. IV we conclude the work by discussing results and future perspectives.

II. THE CHERN-SIMONS THEORY FOR DNA SYSTEMS
In this section we overview the application of Chern-Simons theory to DNA/RNA systems, outlining the main results obtained in [57]. The first step is to use quaternion fields to define a set of nitrogen bases over the DNA or RNA, namely:
The one-form connection A can be thought of as a state of the above written nitrogen bases, namely A ∈ {A, T/U, C, G}; consequently, the DNA curvature in the configuration space of nitrogen bases is represented by the two-form curvature F = dA, which in coordinate representation can be written as:
Therefore, taking into account the SU(2)-invariant Chern-Simons three-dimensional action, it is possible to define the Chern-Simons current as the measurable, gauge invariant quantity that can be obtained from the expectation value of the Wilson loop:
The Wilson loop is the trace of a path-ordered exponential of the gauge connection and represents the only gauge invariant of the theory:
Wilson loops can be obtained from the holonomy of the gauge connection around a given loop and are mainly used in lattice gauge theories and quantum chromodynamics [9][10][11][12]. They were originally introduced to address a nonperturbative formulation of quantum chromodynamics [63], but nowadays play an important role in the formulation of Loop Quantum Gravity, particle physics and String Theory. The choice of the three-dimensional action is the key point of the method: standard biology suggests that nitrogen bases combine with each other in triplets, therefore forming a three-dimensional space of configurations that can be described by means of the Chern-Simons three-form. Any point of the space is, thus, labeled by a given triplet. Sixty-four possible combinations arise after combining the nitrogen bases in triplets, and correspond to the combinations occurring in the genetic code. For this reason the space turns out to be discrete and finite.
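For orientation, the objects named in this section have the following standard textbook forms. This is a sketch under stated assumptions and not necessarily the set of expressions used in [57]: the level $k$, the manifold $M$, the closed curve $C$, the factor $i$ in the exponent and the overall normalisations are conventions assumed here, and the abelian form of $F_{\mu\nu}$ simply follows from the relation $F = dA$ quoted in the text.

```latex
\begin{align}
  F_{\mu\nu} &= \partial_\mu A_\nu - \partial_\nu A_\mu, \\
  S_{CS} &= \frac{k}{4\pi} \int_{M} \mathrm{Tr}\left( A \wedge dA
            + \frac{2}{3}\, A \wedge A \wedge A \right), \\
  W_C(A) &= \mathrm{Tr}\, \mathcal{P} \exp\left( i \oint_C A \right).
\end{align}
```

The counting of the configuration space is elementary: four bases arranged in ordered triplets give $4^3 = 64$ points.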
By means of Eq. (5), it is possible to define a discrete superstate of configurations, in which the nitrogen bases represent the dynamical variables, so that the genetic code is labeled by the Chern-Simons currents only. After a few calculations, the curvature spectrum of the genetic code can be obtained [57], as reported in Table 1. The same analysis can also be pursued by considering the amino acids, so that the genetic code is equivalently described by 21 different Chern-Simons currents. The simplest way to construct a curvature spectrum with respect to amino acids is to take the average values of the Chern-Simons currents which refer to triplets coding for the same amino acid. The Chern-Simons currents of the amino acids are listed in Table 2.
Table 2. Value of Chern-Simons current for the amino acids.

Amino acid CS Current Amino acid CS Current Amino acid CS Current Amino acid CS Current
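The averaging prescription described above can be sketched as follows. The function name `amino_acid_currents` and all numerical values are hypothetical placeholders chosen for illustration, not the entries of Table 1; only the codon assignments used (GGU, GGC, GGA, GGG for glycine and AUG for methionine) are standard.

```python
from collections import defaultdict


def amino_acid_currents(codon_current, codon_to_amino):
    """Average the codon-level currents over codons coding for the same amino acid."""
    buckets = defaultdict(list)
    for codon, current in codon_current.items():
        buckets[codon_to_amino[codon]].append(current)
    return {amino: sum(values) / len(values) for amino, values in buckets.items()}


# Hypothetical placeholder currents, NOT the values of Table 1.
codon_current = {"GGU": 1.0, "GGC": 2.0, "GGA": 3.0, "GGG": 4.0, "AUG": 5.0}

# Standard genetic-code assignments for the five codons above.
codon_to_amino = {"GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly", "AUG": "Met"}

averaged = amino_acid_currents(codon_current, codon_to_amino)
print(averaged)
```

With the real codon currents in place of the placeholders, the same function would return one averaged value per amino acid.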
Notice that the formalism allows one to assign a numerical value to each component of the genetic code, finding a point-by-point correspondence between triplets and curvature. Such a curvature of the DNA is the key parameter of our approach, as it may provide several predictions about the docking between two different parts of DNA or between DNA and RNA. The genomic curvature can also be used to find those positions having the highest probability of exhibiting a mutation. The introduction of the mutation, indeed, leads to a local variation of the curvature, whose value might suggest the clinical importance and the impact of the corresponding disease. Moreover, the curvature spectrum can provide important insights regarding the evolution of the genomic strings: those points with the highest curvature are the best candidates to evolve toward a more stable configuration, making the entire sequence more uniform in the configuration space of all the possible triplets.
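The point-by-point correspondence between triplets and curvature can be illustrated with a minimal sketch. It assumes that the reading frame starts at the first base and that an incomplete trailing triplet is discarded; the helper names and the numbers in `PLACEHOLDER_CURRENT` are hypothetical and are not taken from Table 1.

```python
def codons(sequence):
    """Split a nucleotide string into consecutive, non-overlapping triplets."""
    usable = len(sequence) - len(sequence) % 3  # drop an incomplete trailing triplet
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def curvature_spectrum(sequence, codon_current):
    """Map each triplet of the sequence to its tabulated current value."""
    return [codon_current[codon] for codon in codons(sequence)]


# Hypothetical placeholder currents, NOT the values of Table 1.
PLACEHOLDER_CURRENT = {"ATG": 0.5, "ACT": 1.5, "GAA": 2.5}

triplets = codons("ATGACTGAATA")
spectrum = curvature_spectrum("ATGACTGAATA", PLACEHOLDER_CURRENT)
print(triplets)
print(spectrum)
```

Plotting such a list against the triplet index gives the kind of curvature profile that is compared between original and mutated sequences.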

A. The Chern-Simons Current in Mutated KRAS Human Gene
The first application of the above described method is focused on the comparison between mutated and standard DNA sequences. In particular, we first consider the KRAS gene, whose details are reported in App. A. It is located on the 12th chromosome, from base 25,205,246 to base 25,250,929, and represents one of the most mutated human genes [58,64,65]. Then we introduce some known mutations into the original sequence, causing a change in the Chern-Simons current. Since the current is linked to the curvature of the DNA, the configuration space made of nitrogen bases changes its point-like curvature wherever a mutation is introduced.
By means of physical considerations, we theoretically expect the mutation to level out the graph, providing smoother variations of the current with respect to those of the original sequence. By analogy with other physical systems, the curved point is surrounded by a non-equilibrium region, which in turn tends to mutate in order to reach a minimum free energy state.
Moreover, this prescription is in agreement with the general criterion which governs thermodynamic transformations, according to which any spontaneous transformation must decrease the Gibbs free energy. This statement can be simply proved by considering the definition of the Gibbs free energy $G$, that is $G = U + pV - TS$, with $p$ being the pressure, $V$ the volume, $T$ the temperature, $S$ the entropy and $U$ the internal energy. Neglecting the contribution of $p$ and setting $T$ = const. (as standard for biological systems), the variation reads $\Delta G = \Delta U - T \Delta S$, so that, for the system to undergo a spontaneous transformation, the entropy must increase as the internal energy decreases. The latter can be thought of as the expectation value of the Hamiltonian of the system, which includes potential and kinetic energies. Therefore, requiring the Gibbs free energy to decrease spontaneously is equivalent to requiring the gravitational potential to decrease spontaneously. This means that, as the system evolves toward a configuration with $\Delta G < 0$, the potential energy decreases. By applying these considerations to the formalism developed in Sec. II, a spontaneous transformation must yield an evolution of the system toward flat regions in the configuration space. For these reasons, mutations of DNA/RNA sequences occur to render the graph smoother and to bring the general state toward an equilibrium configuration. Reversing the argument, those mutations which make the sequence more peaked than the original one are supposed to occur less frequently, since they lead to a higher free energy configuration. Therefore, significant variations should not occur in flat regions of the curvature spectrum, which are closer to an equilibrium state. The result of the analysis of the KRAS human gene via the Chern-Simons current method is reported in Fig. 1a. The most significant mutations occur in the regions between the 5th and the 15th amino acid, and between the 30th and the 35th.
For this reason, within these intervals, the original sequence differs from the mutated one. This is due to the fact that the presence of a point-like mutation (see e.g. positions 7, 14 and 33) also influences the curvature of the surrounding regions. Nevertheless, the Chern-Simons currents of the two sequences converge again at those points which are not affected by mutations. This shift between original and mutated sequence is more evident in Fig. 1, due to the large number of mutations introduced in a short sequence made of few amino acids (see Table 3). Further details are reported in App. A. As expected from the free energy minimization argument, mutations occur where the curvature is most peaked, providing a smoother general trend with respect to the original one. Notice, however, that mutations are not directly correlated to peaks, but rather to curvature gradients, namely they are mostly located near those points whose curvature is much higher (or lower) than that of their neighbours. By computing the differences between contiguous points, it is possible to associate mutations to peaks, as reported in Fig. 1b.
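The comparison between contiguous points can be sketched in a few lines. The profiles below are hypothetical numbers, not KRAS data, and `roughness` (the sum of absolute contiguous differences) is only one possible way to quantify smoothness, introduced here as an assumption rather than as the measure used in the analysis.

```python
def contiguous_differences(spectrum):
    """Difference between each point and the one immediately before it."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def roughness(spectrum):
    """Sum of absolute contiguous differences; smaller means a smoother profile."""
    return sum(abs(d) for d in contiguous_differences(spectrum))


def steepest_positions(spectrum, top=2):
    """Indices of the points reached by the `top` largest absolute jumps."""
    diffs = contiguous_differences(spectrum)
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]), reverse=True)
    return sorted(i + 1 for i in ranked[:top])


# Hypothetical current profiles (illustrative numbers only).
original = [1.0, 1.0, 4.0, 1.0, 1.0]
mutated = [1.0, 1.0, 2.0, 1.0, 1.0]

print(contiguous_differences(original))
print(steepest_positions(original))
print(roughness(original), roughness(mutated))
```

In this toy profile the largest gradients surround the isolated peak, and replacing the peak value with one closer to its neighbours lowers the roughness, which mirrors the qualitative behaviour described in the text.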
In the same region of the twelfth chromosome, another set of mutations occurs (Fig. 2).
Figure 2. Chern-Simons current in KRAS gene. Figure 2a shows the comparison between the original sequence (black dashed line) and the mutated one (red solid line), while Figure 2b shows the Chern-Simons current variation, obtained comparing the point-like differences between contiguous points of the original and mutated sequences. The region considered is 25,245,274-25,245,384 of the 12th chromosome.
Fig. 1 and Fig. 2 refer to the same region of KRAS, though different mutations are introduced in the two cases. More precisely, mutations occurring in these selected regions are split in two different sets, in order to facilitate reading and visualizing the curvature spectrum.
In the second half of the plot, the mutated sequence is shifted with respect to the original one. This can be physically motivated by considering the features of the mutations introduced in positions 22 and 26. Specifically, both mutations (see Table 4 for details) provide Chern-Simons current values which largely differ from the corresponding original ones. Therefore, though the variation is point-like, the overall trend is highly influenced by the occurrence of such mutations, with the consequence that the surrounding regions are also shifted. However, in position 21 and in position 23 (where no mutations occur) original and mutated sequences have the same current again.
It is worth noticing that, even in this case, a mutation corresponds to each peak, as theoretically inferred. Moreover, the mutated sequence makes the overall trend smoother than the original one, in agreement with theoretical predictions. To confirm this result, two other regions of human KRAS are analyzed in Fig. 3 and Fig. 4, where the original sequences are again compared with the corresponding mutated ones. Mutations are carefully chosen according to the database BioMuta. Also in this case, further details can be found in App. A. Notice that, in both cases, mutations occur where the sequence is peaked, in agreement with theoretical predictions. This is particularly evident in the former case (Fig. 3), where almost all peaks correspond to a mutation (see also Table 6). Moreover, the introduction of the mutations has the effect of avoiding abrupt differences in the overall trend of the curvature spectrum.
On the other hand, notice that a few mutations also occur in flat regions. This may be due to other factors that induce mutations and are not yet taken into account by our model, which is only based on the curvature gradient variation and the free energy minimization.

B. The Chern-Simons Current in Mutated COVID-19 Sequences
In this subsection we discuss the results provided by the application of the Chern-Simons formalism to different variants of SARS-CoV-2 virus. Let us start by introducing the main features of the latter.
The S glycoprotein is a Class I fusion protein, composed of two subunits (S1, S2) [66]; the S1 subunit contains the receptor binding domain (RBD), which binds directly to the main receptor, human angiotensin-converting enzyme 2 (hACE2), and determines both host range and cellular tropism [67]; the S2 subunit is directly involved in membrane fusion and virus endocytosis [68,69]. Receptor binding triggers conformational changes; specifically, host proteases (such as furin) mediate its functional transition by cleaving the interface between the two subunits (S1, S2). Additionally, the RBDs of SARS-CoV and SARS-CoV-2 are highly similar, apart from a few key residues which appear to enhance the transmissibility of the novel CoV [70,71]. The spike glycoprotein is the main inducer of neutralizing antibodies [72]; unfortunately, it shows the highest mutation rate among SARS-CoV-2 proteins [73,74], and a variable glycosylation can create novel CTL epitopes, possibly altering hACE2 binding and accessibility to proteases and neutralizing antibodies [68,75].
The purpose here is to find a correlation, in terms of Chern-Simons current, among the mutations of the sequences, which could give insights for localizing and predicting mutation sites in new variants of the virus. We analyze eleven strings, which underwent mutations with respect to the original sequence of SARS-CoV-2, first detected in Wuhan at the end of 2019. They all correspond to the same RNA region and were selected in accordance with Fig. 5. In particular, we compare the difference of Chern-Simons currents, considering variants from Asia, Europe, Oceania and North America. Specifically, sequence 19A is the first one, which arose in Wuhan and spread during the initial 2020 outbreak; 19B is the first variant detected in China; 20A was dominant mostly in Europe from March 2020, subsequently spreading globally; 20B and 20C are variants of 20A which mainly spread in early 2020; finally, 20D, 20E, 20F, 20G, 20H and 20I occurred in summer 2020 as variants of 20B, 20C and 20A. Among them, 20I and 20H are the English and South African variants. To be more precise, we used the tool Nextclade, yielding the graph of Fig. 5. This figure shows the aforementioned evolution of the sequences (https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv). Mutations of the triplets which caused the occurrence of variants are reported in App. B.
In our analysis, because of the large number of nitrogen bases, we only compute the difference of Chern-Simons currents between the original sequence and the mutated one. Specifically, we consider the slope of the current for each mutation, defined in Eq. (11). High values of the slope represent a large discrepancy between the original sequence and the mutated one in the curvature spectrum, while lower values account for small differences. We perform a one-to-one comparison between contiguous sequences (shown in Fig. 5), with the aim of finding a correlation between slopes and mutations. Each variant is compared with the corresponding predecessor, so that no comparison is carried out between sequences which do not directly evolve from one another, according to Fig. 5. For example, sequence 19A is not compared with 20I, and 20D is not compared with 20H. The analysis shows that mutations occur with the highest probability where the slope (as defined in Eq. (11)) of the Chern-Simons current assumes extreme values, namely when its modulus is extremely high or extremely low.
This means that even those mutations which do not cause significant current variations can support variants. In particular, the one-to-one comparison between the original and the corresponding mutated sequences shows that approximately 70% of mutations correspond to extreme values of current. This percentage increases up to 80% if we consider only those mutations which will effectively spread out (denoted in italic bold and highlighted in light yellow), as shown in App. B, Figs. 7-17. Consequently, this statistic can be used to point out which mutations of the sequence are more likely to evolve into a real, widespread variant of the virus. To be more precise, once we know the position of a given mutation, Chern-Simons currents can suggest which type of triplets will arise from such a mutation. In particular, as provided by the analysis, the mutated sequences should exhibit mutations whose related Chern-Simons currents provide extremely high or extremely low percentage variations with respect to the original ones. Therefore, we do not expect the sequence to evolve such that mutations cause intermediate values of current variations; rather, if the position of the mutation is known, we expect the triplet to mutate towards those possible configurations whose Chern-Simons current is either very close to or very far from the initial one (in terms of percentage). This means that from a given triplet we can select a set of possible mutations, namely those which cause either high or low current variations.
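The selection of candidate mutations from a given triplet can be sketched as follows, under loudly stated assumptions: `placeholder_current` is a hypothetical stand-in for the tabulated Chern-Simons currents (it is simply one plus the base-4 index of the codon), only single-base substitutions are considered, and the number of candidates kept at each extreme (`keep`) is an arbitrary illustrative choice.

```python
BASES = "ACGU"


def placeholder_current(codon):
    """Hypothetical stand-in for the tabulated current: 1 + base-4 index of the codon."""
    i, j, k = (BASES.index(base) for base in codon)
    return 1 + 16 * i + 4 * j + k


def single_base_mutants(codon):
    """All nine triplets that differ from `codon` in exactly one position."""
    return [codon[:p] + base + codon[p + 1:]
            for p in range(3) for base in BASES if base != codon[p]]


def percentage_variation(original, mutant, current=placeholder_current):
    """Percentage change of the current when `original` mutates into `mutant`."""
    j0 = current(original)
    return 100.0 * (current(mutant) - j0) / j0


def extreme_candidates(codon, keep=2, current=placeholder_current):
    """Mutants with the `keep` smallest and the `keep` largest absolute variations."""
    ranked = sorted(single_base_mutants(codon),
                    key=lambda m: abs(percentage_variation(codon, m, current)))
    return ranked[:keep], ranked[-keep:]


closest, farthest = extreme_candidates("GGU")
print(closest, farthest)
```

Replacing `placeholder_current` with a lookup into the actual curvature spectrum would turn this into a filter that discards mutants with intermediate percentage variations.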
The above results constitute a part of the analysis of the SARS-CoV-2 virus, which mainly relies on the evolution of given sequences towards mutated configurations. As mentioned above, this first part turns out to be useful to restrict all possible mutations within a given range, but can provide suitable information only if the position of the mutation is known a priori. From this point of view, no information regarding the mutation position can be provided. In the next part, we use the Chern-Simons formalism to select regions where mutations are most likely to occur.
With the aim of linking the currents with the probability of exhibiting mutations, we analyze only those sequences which generate variants, i.e. 19A, 20A, 20B and 20C. Specifically, as we can infer from Fig. 5, 19A generates 19B and 20A; 20A generates 20B, 20C and 20E; 20C generates 20H and 20G. Similarly to the previous analysis of the KRAS human gene, we aim to relate the curvature spectrum with the likelihood of finding mutations. To this purpose, we calculated the Chern-Simons currents of the 19A, 20A, 20B and 20C sequences and computed the current variations in those points affected by known mutations. Specifically, let $n$ be the position of a given mutation along the sequence and $j_n$ the corresponding Chern-Simons current. The normalized current variations are computed in accordance with Eqs. (12) and (13). This means that we are investigating the current variations where the mutations occur, with respect to the previous and the subsequent points, respectively. The comparison between these values, calculated for the triplets affected by mutations and the surrounding points, can be used to relate the current variation with RNA mutations. This prescription is suggested by the analysis performed on human KRAS regions, where it turns out that points far from the equilibrium state in the curvature spectrum are the best candidates to provide mutations. Here, given the large number of amino acids, the curvature spectrum cannot be computed entirely. For this reason, we only focused on noticeable mutations, namely preferred points which exhibit known triplet variations.
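A minimal sketch of the variation of the current at a mutated position with respect to the previous and the subsequent points is given below. The normalisation is an assumption made here for illustration: both differences are divided by the current $j_n$ at the mutated position, which may differ from the convention adopted in the analysis. The function name and the sample values are hypothetical.

```python
def normalized_variations(currents, n):
    """Relative variation of the current at index n with respect to both neighbours.

    Assumption: each difference is normalised by the current at n itself.
    Returns (backward, forward), i.e. the variation with respect to the
    previous point and to the subsequent point.
    """
    if not 0 < n < len(currents) - 1:
        raise ValueError("position n must have both a previous and a subsequent point")
    j_n = currents[n]
    backward = (j_n - currents[n - 1]) / j_n
    forward = (currents[n + 1] - j_n) / j_n
    return backward, forward


# Hypothetical currents around a mutated position (index 1).
currents = [2.0, 4.0, 3.0]
backward, forward = normalized_variations(currents, 1)
print(backward, forward)
```

Multiplying the two numbers by 100 gives percentage variations, which can then be compared between mutated and non-mutated positions.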
The analysis again shows that mutations mostly occur where the current variation, as calculated in Eqs. (12) and (13), is high-valued. More precisely, in a set of 125 total mutations, 59% of them (74/125, see Tables 7-10 Tables 7-10.
This result can be explained based on the achievements of the previous section, where non-equilibrium points turned out to be best candidates to provide nitrogen bases mutations. More precisely, large values of the current variations account for peaked regions, which tend to evolve to a lower curvature, that is a lower current. Reversing the argument, large variations of current are exhibited by points which are far from the minimum of energy, which is supposed to occur where the trend is constant.
In this framework, the application of Chern-Simons theory to DNA/RNA systems such as SARS-CoV-2 or KRAS, can give important information about the positions where the mutation is more likely to manifest. The consequent biological impact naturally follows, since this prediction can be used to prevent the occurrence of variants or to know in advance the probability for the sequence to evolve towards another configuration.
Taking into account these results, let us evaluate the spike region of the SARS-CoV-2 virus only, with the aim of analyzing the tertiary structure. In particular, we rely on the interaction points reported in Ref. [76], according to which the amino acids of the spike protein interact as reported in Fig. 6.
Figure 6. Tertiary structure of the spike protein of SARS-CoV-2 virus (as taken from [76], Fig. 3 therein). Green, orange and pink colors refer to the oligomannose content. Specifically, glycan sites labeled in green contain 80-100% of oligomannose, those labeled in orange 30-79% and those labeled in pink 0-29%. Light blue denotes ACE2 binding sites.
In light of the results provided by Ref. [76], we analyzed 11 contact points, namely 22 corresponding amino acids. The features of the latter, such as position, current or percentage variation with respect to the surrounding triplets, are reported in Table 11.
We considered 22 sites and calculated the Chern-Simons current variation of each amino acid with respect to the surrounding points in the linear structure. Apart from the first amino acid (position 19), none of them is affected by known mutations. It is interesting to observe that the fraction of large variations among the sites not affected by mutations is 7/21, namely 33%. Note that this percentage is considerably lower than the previously discussed one, which is of the order of 72%. This confirms that Chern-Simons current variations are high-valued where mutations develop. Moreover, the seven sites that undergo large percentage variations are of oligomannose type, as pointed out in Ref. [76]. This, in principle, could be the reason for such large values. For instance, the high value of the current variation at position 234 might be due to the proximity of the site to ACE2, or to the high percentage of glycosylation occurring at that amino acid.
Moreover, it turns out that the docking points have the same or similar values of current, which means a low percentage variation. This is expected from a physical point of view, since points with the same curvature tend to interact in order to reach a more stable configuration. Here too, the analogy with the gravitational interaction is straightforward.
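The criterion of "same or similar current" for docking points can be made concrete with a symmetric relative difference between the currents of the two residues in a contact pair. The sketch below is only illustrative: the function names, the 10% tolerance, the site labels and the current values are hypothetical and are not taken from Table 11.

```python
def relative_difference(j_a, j_b):
    """Symmetric percentage difference between two current values."""
    scale = (abs(j_a) + abs(j_b)) / 2
    if scale == 0:
        return 0.0  # both currents vanish: treat them as identical
    return 100.0 * abs(j_a - j_b) / scale


def similar_pairs(pairs, currents, tolerance=10.0):
    """Contact pairs whose currents differ by at most `tolerance` percent.
    The tolerance is an assumed, tunable parameter."""
    return [
        (a, b) for a, b in pairs
        if relative_difference(currents[a], currents[b]) <= tolerance
    ]


# Hypothetical site labels mapped to made-up current values.
currents = {1: 2.0, 2: 2.1, 3: 0.5, 4: 1.5}
contact_pairs = [(1, 2), (3, 4)]

matched = similar_pairs(contact_pairs, currents)
print(matched)  # [(1, 2)]
```

Normalizing by the mean of the two magnitudes keeps the measure independent of which residue of the pair is listed first.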

IV. CONCLUSIONS AND PERSPECTIVES
In this work we apply the schematization of nucleic acids based on the Chern-Simons theory, as developed in [13,57]. Our main purpose is to analyze DNA sequences, such as those contained in the KRAS human gene, and some notable RNA sequences, such as those of the best-known SARS-CoV variants. In particular, we compare known windows of the reference sequences with the corresponding notable mutations, reported in well-known and reputable genetic databases. To develop the formalism, the nitrogenous bases are recast as quaternion fields, combined in triplets as dictated by the golden rules of biology. These triplets form a three-dimensional configuration space that can be described through the Chern-Simons three-form. The expectation value of the only observable of the theory, the Wilson Loop, provides the so-called Chern-Simons current. The latter gives point-like information on the curvature of the genetic code, and can be used to compute the curvature spectrum of a given genetic string. If some triplet of the initial sequence changes due, for example, to the replacement of a nitrogenous base, the point-like curvature changes accordingly. Therefore, the introduction of mutations yields a variation in the Chern-Simons current. The difference between the original and the mutated sequence can be used to infer where DNA-DNA (or DNA-RNA) interactions take place, or to predict the probability of evolution toward a given configuration.
On the one hand, the latter application of our method can shed light on the possibility to develop proper vaccination strategies against, for instance, SARS-CoV-2 virus; on the other hand it can potentially be used to monitor pharmacological therapies and to quantify the risk of developing DNA/RNA mutations between remission and relapsing phases.
The analysis of four different regions of the KRAS human gene, an important gene acting as an on/off switch in cell signaling and controlling cell proliferation, shows that common features are shared by all the analyzed cases. Specifically, in almost all cases, a curvature peak of the region corresponds to a known mutation, which often yields a new curvature spectrum that is smoother than the reference one. This can be theoretically motivated by physical considerations: the most peaked regions represent non-equilibrium points, which tend to evolve toward a more stable configuration of minimum free energy.
Consequently, the variations in the curvature spectrum that lead to genetic mutations are likely to take place in the regions with higher curvature. This means that, as an effect of the mutations, the overall trend of the curvature spectrum of the sequence tends to become progressively smoother, avoiding abrupt variations and giving nearby points similar values of current. As mentioned above, this happens in most of the analyzed cases; however, DNA and RNA evolution can certainly also depend on many other factors that cannot be taken into account by this method. The application of Chern-Simons theory to DNA systems, indeed, relies only on the calculation of the intrinsic curvature assumed by biological systems in the configuration space made of nitrogenous bases. A free energy minimum principle then leads to the evolution of the configurations and may suggest likely positions for possible mutations.
We also apply our method to RNA sequences: in this case we pick the COVID-19 virus (SARS-CoV-2), a striking example of the present time, and apply the same prescription to more than 20 kbases of its sequences, coming from different countries. Due to the intrinsic tendency of RNA viruses to change their sequence upon replication, mutations of various types can occur, such as recombination and reassortment, which makes the related genomic analyses more complex.
Rather than analyzing the entire RNA sequence of the virus, which is very long, we prefer to focus on the regions that are reported to exhibit the most significant mutations, such as the region coding for the SARS-CoV-2 spike protein. Interestingly, the analysis shows that most mutations occur where the slope of the Chern-Simons current takes extremely high values, which corresponds to peaked regions in the curvature spectrum. This result can again be explained by the principle of minimum free energy, according to which amino acids located at peaks of the Chern-Simons current are intrinsically unstable and therefore tend to evolve toward a more stable configuration. Furthermore, we note that a few mutations also occur at low current values. This may happen because some regions with low current values, namely those having a small curvature and being rather flat, often border areas with steep gradients of the current, which denote high curvature. Then, in some cases, even regions with very small curvature may be affected by a nearby instability, due to the presence of a current gradient close to them. By comparing the low current variations listed in Figs. 7-16 with Tables 7-10, it turns out that 47% of the points exhibiting low current variations (between mutated and original sequences) are unstable due to the presence of a current gradient nearby.
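The idea that a flat site can still be destabilized by a nearby current gradient can be illustrated with a simple proximity check on a discrete current profile. This is a sketch under stated assumptions rather than the procedure used in this work: the jump threshold, the window of two sites, the function names and the profile values are all hypothetical.

```python
def steep_steps(currents, step_threshold):
    """Indices i where the jump between sites i and i+1 exceeds the
    threshold (a crude stand-in for a steep current gradient)."""
    return [
        i for i in range(len(currents) - 1)
        if abs(currents[i + 1] - currents[i]) > step_threshold
    ]


def near_gradient(site, steps, window=2):
    """True if `site` lies within `window` sites of either end of a
    steep step. The window size is an assumed parameter."""
    return any(
        min(abs(site - s), abs(site - (s + 1))) <= window
        for s in steps
    )


# Made-up profile: flat everywhere except one sharp peak at index 4.
currents = [1.0, 1.0, 1.0, 1.1, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
steps = steep_steps(currents, step_threshold=1.0)

print(steps)                    # [3, 4]
print(near_gradient(2, steps))  # True: flat site next to the peak
print(near_gradient(9, steps))  # False: flat site far from any step
```

Site 2 is itself flat but sits one position away from the rising edge of the peak, which is the situation described above for mutations found at low current values.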
Notice that, in our analysis, we only considered sequences of the SARS-CoV-2 virus from 2020. This is mostly because the majority of variants spread during 2020, so that a comparison like the one reported in Fig. 5 turns out to be more interesting.
As a final remark, the importance of the applications discussed here is twofold. On the one hand, they test the capability of a topological theory, used to schematize DNA and RNA configurations, to correctly represent their interactions and mutations. On the other hand, they suggest a general criterion to predict the locations in genetic sequences where a mutation is most likely to take place. This novel method, based on analogue gravity, can be helpful in addressing biological issues, especially when combined with standard bioinformatic approaches. For instance, the probable evolution of a given string, provided by the Chern-Simons formalism, can be combined with mathematical and statistical techniques to increase the likelihood of localizing the mutations. In this sense, the approach is deterministic and based on the dynamics of structures, rather than on their mere description. In future works we plan to provide further confirmation of the validity of our approach, by extending the analyses to other genetic sequences of both DNA- and RNA-based systems. We also aim to study the interactions between macromolecules, in order to check whether their point-like curvature values can provide information regarding the docking probability, or predict the points where interactions occur.

Table 11. List of amino acids of the 19A sequence in the spike protein, with corresponding positions, Chern-Simons currents and their variations with respect to surrounding positions. The listed amino acids are those involved in forming the tertiary structure, according to Ref. [76].