Quantum chemical “Aufbau” principles: how to estimate the shape of highly flexible (bio-)polymers? A recursively extendable “chemion picture” of Euler-Hückel-type

Abstract An outline is given of how to split the n-dimensional space of torsion angles occurring in flexible (bio-)polymers (like alkanes, nucleic acids, or proteins, for instance) into n one-dimensional potential curves. Forthcoming applications will focus on the “protein folding problem,” beginning with polyglycine. Context In accordance with Euler’s topology rules, molecules are considered to be composed of “vertices” (atoms, ligands, bonding sites, functional groups, and bigger fragments). Following Hückel, each vertex is represented by only one basis function. Starting from the “monofocal” hydrides CH4, NH3, OH2, FH, and SiH4, PH3, SH2, ClH as anchor units, “chemionic” Hamiltonians (of individual “chemion ensembles” and proportional nuclear charges) are constructed recursively, together with an appropriate basis set for the first five (normal) alkanes and some related oligomers like primary alcohols, alkyl amines, and alkyl chlorides. Methods Standard methods (“Restricted Hartree-Fock RHF” and “Full Configuration Interaction FCI”) are used to solve the various stationary Schrödinger equations. Two software packages are indispensable: “SMILES” for integral evaluations over Slater-type orbitals (STO), and “Numerical Recipes” for matrix diagonalizations and inversions. While managing with only two-center repulsion integrals, “implicit multi-center integrations” lead us to the non-empirical foundation of Hoffmann’s “Extended-Hückel Theory.”

Instead of saying "electron," we are going to use the term "chemion" for mainly two reasons:
• Due to their localization, chemions are characterized by an "ensemble individuality," which severely violates the indistinguishability postulate of Schrödinger [12,13] and Pauli [14].
• Speaking of "valence chemions," we claim the separability of the "valence shell" from its "atomic core" [15][16][17].
Later, we will even dare to distinguish "valence ligand chemions" and "valence bond chemions," which dominantly govern the molecular shape due to their responsibility for interfragmental torsions.
Such classifications emphasize the fact that physicists and chemists sometimes look at the same molecular object from different points of view. The questions "Was sind Elektronen?" ("What are electrons?") and "Kann Chemie auf Physik reduziert werden?" ("Can chemistry be reduced to physics?") are thoroughly discussed in articles and books by Hans Primas and Ulrich Müller-Herold [18][19][20][21][22][23][24][25]. In this context, let us just mention another example of an extraordinary view: topological properties of the charge density lead Bader to his quantum theoretical picture entitled "Atoms in Molecules" [26][27][28][29][30].
A priori predictions of molecular bond lengths, bond angles, dihedral and torsion angles are rather sophisticated [31][32][33][34]. "In ab initio electronic-structure calculations, approximate solutions are obtained to the molecular Schrödinger equation. The errors made in such calculations arise from the truncation of the one-electron space... and from the approximate treatment of the N-electron space..." [35]. If this is so, we cannot expect high-quality predictions of geometric parameters like bond lengths and bond angles. For highly flexible organic polymers (like alkane chains, proteins, or nucleic acids), however, the "secondary structure" is mostly determined by low-barrier rotations around single bonds [36,37].
Next, we shall try to establish some "Quantum Chemical Aufbau Principles" [54,55]. Introducing "vertex condensations" of Roothaan's Fock-matrix expression, we are prepared to recursively construct growing diagonal elements from precomputed fragments. A successively augmented construction set will serve us to approximately build up higher elements from lower units. Precomputed quantities can thus be readily taken from an external storage device, yielding a "Recursively Extendable Chemion Picture" [38,39].
Similar to the atomic "Aufbau Principles" of Niels Bohr [56,57] and Friedrich Hund [58,59], we want to bring some order into quantum chemical descriptions of the molecular world. Therefore, we close the present paper with an outlook to straightforward generalizations, which might help us to estimate the shape of even macromolecular chains, particularly those of biochemical interest. If it becomes possible one day to replace our model assumptions by more realistic geometries, appropriate orbital exponents, and above all more complex fragments (including doubly bound groups like phenyl, carbonyl, and the peptide bond), this might be an important step towards a quantum chemical description of the "protein folding problem" [60].
It is one of the basic insights of chemical taxonomy that molecules can be ordered into compound classes. Due to the recursive nature of our "Chemionic Hamiltonians," classifications of molecular families become evident [61], this time also quantum chemically [62,63].
During recent decades, considerable progress has been made in our science. However, the topic "explicitly correlated coupled cluster calculations," for instance, is beyond the scope of our article. Focusing on standard orbital theories with linear expansions, our main tools are "Restricted Hartree-Fock (RHF)" and "Full Configuration Interaction (FCI)." The article has four chapters: 1. The Chemical Model; 2. Solution Methods; 3. Compound Classes; 4. Outlook and Conclusions. Appendices: Anchor Data and First Results.

The chemical model
To begin with less complex examples, our paper focuses on polymerized organic chain molecules (normal, iso- and neoalkanes; primary, secondary, and tertiary amines or alcohols, for instance). From an even more exposed viewpoint, our goal is the geometry prediction of biopolymers (like nucleic acids or proteins) with all their sensitive interactions (among them interfragmental torsions and hydrogen bonds). Always keeping in mind these background intentions, our present paper starts with some general considerations.
Based on the Born-Oppenheimer assumption [44], molecular orbital theories, in general, have to specify mainly three items: • The number of chemions under consideration

Euler topologies
"Vertices" can be synonymously understood as "bonding sites": central atoms, hydrogen ligands, and "lone pair positions (love)."Beginning with the eight "monofocals" methane, ammonia, water, and hydrogen fluoride, as well as silane, phosphine, hydrogen sulfide, and hydrogen chloride we can easily see, that they all confirm the Eulerian topology rule [2][3][4]: Inspecting the Lewis-Langmuir formulae of more complex molecular frameworks, we find that the validity of this rule will always hold for "bifocals" (like ethane, methylamine, methanol, for instance), "trifocals" (like propane, ethylamine, ethanol . . .), and all the other "polyfocals" of such kind: It even remains valid for a chain of "functional groups" like H, CH 2 , or CH 3 , NH 2 , OH, F, SiH 3 , PH 3 , SH 2 , and Cl, the formulae of which do not take notice any more of their internal chemionic structure: The geometry of such a polymer mainly depends on the n torsion angles τ [36,37].A non-gradient search for the global minimum of the corresponding n-dimensional "Born-Oppenheimer Energy Function (BOEF)" practically can only be done for n=1 or n=2 [64].Therefore, we first have to answer the question of how to split a higher n-dimensional BOEF into n one-dimensional "potential curves."

Recursive polymerizations
Similar to a continuous polymerization process, which stepwise adds another monomer to its precursor, we consider a recursive chain growth. Taking the normal alkanes and primary amines and alcohols as an example, and using the convenient formula language (with X = CH3, NH2, OH) [8][9][10][11], it can be written down as in Table 1.
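Combined with the splitting into one-dimensional potential curves, such a stepwise chain growth suggests scanning only the newest torsion angle at each step. The following minimal sketch illustrates the cost argument under stated assumptions: with m grid points per angle, a full scan needs m^n energy evaluations, whereas n successive one-dimensional scans need only n·m. The function `toy_energy` is a hypothetical stand-in (threefold cosine terms plus a weak neighbor coupling), not the chemionic BOEF of this paper, and all names are illustrative.

```python
import math

def toy_energy(taus):
    """Hypothetical stand-in for a BOEF (assumption, not the paper's energy)."""
    torsion_terms = sum(0.5 * (1.0 + math.cos(3.0 * t)) for t in taus)
    coupling = sum(0.05 * math.cos(a - b) for a, b in zip(taus, taus[1:]))
    return torsion_terms + coupling

def sequential_scan(n_torsions, n_grid, energy=toy_energy):
    """Grow the chain one torsion at a time, scanning only the newest angle."""
    grid = [2.0 * math.pi * k / n_grid for k in range(n_grid)]
    fixed, evaluations = [], 0
    for _ in range(n_torsions):
        best_tau, best_energy = grid[0], math.inf
        for tau in grid:
            value = energy(fixed + [tau])  # earlier angles stay frozen
            evaluations += 1
            if value < best_energy:
                best_tau, best_energy = tau, value
        fixed.append(best_tau)
    return fixed, evaluations

def full_grid_cost(n_torsions, n_grid):
    """Number of energy evaluations of an exhaustive n-dimensional grid scan."""
    return n_grid ** n_torsions

torsions, cost = sequential_scan(n_torsions=5, n_grid=36)
print(cost, full_grid_cost(5, 36))  # 180 versus 60466176 energy evaluations
```

The sequential scan is a greedy strategy: it finds the global minimum only if later chain links do not substantially change the optimal values of earlier torsion angles.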

Proportional nuclear charges
In an electrically neutral molecule of Eqs. 2 and 3, the total chemion charge 2#chp corresponds to the sum of "valence charges" $Z^{\mathrm{val}}_C$ of all the nuclei (i.e., the column index of Mendeleev's and Meyer's "Periodic Table" [69][70][71]). Inspecting Table 1, however, 2#chp = 2 chemion charges must be distributed among the #nuc nuclei under consideration.
In order to guarantee a proportional set of positive point charges for any chemionic ensemble , we deal with "proportional valence charges" of the different nuclei [72,73]: The nuclear repulsion energy with respect to any chemionic ensemble then reads as follows:

Chemionic Hamiltonians
For a given chemionic ensemble, the Born-Oppenheimer Hamiltonian reads as follows: where $\Delta(\mathbf{r}_i)$ is the Laplace operator in Cartesian coordinates.
The components of the ground state energy
• are the nuclear Born-Oppenheimer repulsion $E^{\mathrm{nucl}}$ of Eq. 6,
• and the chemionic part $E^{\mathrm{chem}}$, which will be identified with the lowest "Full Configuration Interaction" energy (FCI) of singlet or triplet multiplicity:

Vertex orbitals of Euler-Hückel-type
From Hückel, we adopt the idea that each Eulerian "anchor vertex" V is represented by only one basis orbital $n^{\mathrm{val}}_V s$ of spherical symmetry [48][49][50][51][52], also for any "lone vertex (love)." The orbital type is specified through the following [70,71]:

$$n^{\mathrm{val}}_V = \begin{cases} 1 & \text{if } V = \text{H} \\ 2 & \text{if } V = \text{C, N, O, F or their "love"s} \\ 3 & \text{if } V = \text{Si, P, S, Cl or their "love"s} \end{cases} \qquad (10)$$

which is the row index of the "Periodic Table" [70,71].

Anchor orbitals, their exponents and locations
Orbital exponents are considered to be given by the ratio Alternatively, we might follow the recommendations of Slater's rules, which also incorporate an estimated "screening" [74][75][76]. The best thing to do, however, would be an exponent optimization by means of a non-linear variation procedure [77] (see Appendix A.3). Nevertheless, we would like to stress that we aim at a more symbolic approach in the spirit of Erich Hückel [48][49][50][51][52].
In our model, hence, "lone vertices (love)" are represented by an $n^{\mathrm{val}}s$-function, localized at one of the peripheral tetrahedron positions; for the distance from their center, we adopt the Bohr radius (B ≈ 0.529177249 Å). As an artifact, EHVO bases produce "rotational barriers" also for axially symmetric halogen compounds like F2, Cl2, and FCl.

Anchor integrals
Quantum chemical calculations, in general, have to deal with at least four types of integrals: For the one- and two-chemion anchor integrals EHVI over EHVO, we use the following notations: Numerical evaluations are available by the SMILES package of FORTRAN routines, for instance [78][79][80].
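To make the four integral types concrete (overlap, kinetic energy, nuclear attraction, and repulsion), the following sketch evaluates the well-known closed-form expressions for the simplest case: normalized 1s Slater-type orbitals in atomic units. It is an illustration only. It does not call the SMILES package, it does not cover the 2s and 3s vertex orbitals of the model, and the function names are hypothetical.

```python
import math

def sto_1s_overlap(zeta, distance):
    """Two-center overlap of two normalized 1s STOs with equal exponent zeta."""
    rho = zeta * distance
    return math.exp(-rho) * (1.0 + rho + rho * rho / 3.0)

def sto_1s_kinetic(zeta):
    """One-center kinetic energy <1s| -(1/2) Laplacian |1s> in hartree."""
    return 0.5 * zeta * zeta

def sto_1s_attraction(zeta, charge):
    """One-center attraction <1s| -Z/r |1s> to a point charge at the orbital center."""
    return -charge * zeta

def sto_1s_repulsion(zeta):
    """One-center repulsion integral (1s1s|1s1s)."""
    return 0.625 * zeta

# Hydrogen atom check: kinetic plus attraction gives -0.5 hartree for zeta = 1.
print(sto_1s_kinetic(1.0) + sto_1s_attraction(1.0, 1.0))
```

For two hydrogen-like orbitals with zeta = 1 at a distance of 1.4 bohr, `sto_1s_overlap` gives an overlap of about 0.753.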

Recursion orbitals and integrals
Now let us return to the simplified Lewis/Langmuir formulas of Table 1, which promise the successive "Aufbau" of macromolecular chains from smaller units: As we can see, each prolongation step n is composed of two "diagonal" and one "off-diagonal" contribution (αα, ββ^(n), and αβ^(n), respectively). In order to construct two basis functions for all the chain molecules with #chp = 1, we define the following set of normalized "Recursion Orbitals" With them, we arrive at • and "recursive repulsion integrals" R They read as follows: The "recursive attraction matrix" and the "recursive one-chemion matrix" then read as follows: Let us illustrate some recursion relations between the diagonal M-matrix elements of ethane, propane, normal butane, normal pentane etc., where M ∈ {S, K}.
Recursion relations for the ensemble-dependent matrices M ∈ {V, H} will be discussed below, in the context of Roothaan's Fock matrix expressions.

Solution methods
In order to approximately solve the time-independent Schrödinger Eq. 1, we start from an expansion of the molecular orbitals in terms of the vertex basis. In spite of the linear nature of this expansion, such "Roothaan-Hartree-Fock Orbitals (RHFO)" have to be determined iteratively. Expanding the wave function in terms of all the constructable singlet and triplet configurations, corresponding ground state energies are available by linear variations. Non-linear variations come in through an optimization of a few orbital exponents. The methodical chapter is subdivided into three sections:
2.1. Linear combinations of vertex orbitals
2.2. Self-consistent Roothaan-Hartree-Fock orbitals
2.3. Full configuration interaction

Linear combinations of vertex orbitals
Molecular RHFO are linear combinations of the given EHVO: For convenience, we skipped the chain length (n) on the left side.
With this expansion, molecular "Roothaan-Hartree-Fock Integrals (RHFI)" relate to the "Euler-Hückel Vertex Integrals (EHVI)" as follows: For large basis sets, the transformation of Eq. 21, in particular, is quite time-consuming. Due to the irreducible Euler-Hückel expansion, however, its expense is moderate.
Recursion relations for the matrices M ∈ {H, F} essentially are those of Eq. 18. This time, however, they additionally depend on the "diagonal" or "off-diagonal" ensemble parameter, as indicated above.
Given both Fock and overlap matrices $F^{(n)}$ and $S^{(n)}$, respectively, the Roothaan equation can be solved by means of standard techniques [40][41][42]. Due to the density dependence of F, however, the coefficient matrix $C^{\mathrm{RHF}}$ and the diagonal matrix $E^{\mathrm{RHF}}$ of orbital energies have to be determined iteratively. Self-consistency of the density matrix has been achieved if, for successive iterations (i) and (i − 1), the following convergence criterion $\sigma^{(i)}$ falls below a predefined threshold:
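As an illustration of one such solution step, the sketch below solves the Roothaan equation F C = S C E in closed form for the two-function case (two basis functions, as for the chain molecules with #chp = 1) and evaluates a root-mean-square change of the density matrix. The RMS form of the criterion is a common textbook choice and is assumed here; it is not necessarily the paper's own definition of σ. The numerical matrix entries are arbitrary illustrative values, and the solver assumes a non-degenerate spectrum.

```python
import math

def solve_roothaan_2x2(f, s):
    """Solve F C = S C E for symmetric 2x2 matrices (nested lists).
    Returns ascending orbital energies and S-normalized coefficient vectors."""
    # det(F - E S) = a E^2 + b E + c = 0
    a = s[0][0] * s[1][1] - s[0][1] ** 2
    b = -(f[0][0] * s[1][1] + f[1][1] * s[0][0] - 2.0 * f[0][1] * s[0][1])
    c = f[0][0] * f[1][1] - f[0][1] ** 2
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    energies = sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])
    vectors = []
    for e in energies:
        m00, m01 = f[0][0] - e * s[0][0], f[0][1] - e * s[0][1]
        m11 = f[1][1] - e * s[1][1]
        # null vector of (F - E S), taken from the numerically larger row
        if abs(m00) + abs(m01) >= abs(m01) + abs(m11):
            v = (-m01, m00)
        else:
            v = (-m11, m01)
        norm = math.sqrt(v[0] ** 2 * s[0][0] + 2.0 * v[0] * v[1] * s[0][1]
                         + v[1] ** 2 * s[1][1])
        vectors.append((v[0] / norm, v[1] / norm))
    return energies, vectors

def density_matrix(occupied):
    """Closed-shell density matrix P = 2 c c^T of one doubly occupied orbital."""
    return [[2.0 * ci * cj for cj in occupied] for ci in occupied]

def density_change(p_new, p_old):
    """Assumed criterion: RMS change of the density matrix between iterations."""
    diffs = [(x - y) ** 2 for r_new, r_old in zip(p_new, p_old)
             for x, y in zip(r_new, r_old)]
    return math.sqrt(sum(diffs) / len(diffs))

fock = [[-1.0, -0.9], [-0.9, -1.0]]      # illustrative values only
overlap = [[1.0, 0.75], [0.75, 1.0]]
energies, vectors = solve_roothaan_2x2(fock, overlap)
density = density_matrix(vectors[0])
```

In a full self-consistent cycle, the Fock matrix would be rebuilt from `density` and the step repeated until `density_change` falls below the chosen threshold.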

Full configuration interaction
Seeking solutions of the time-independent Schrödinger Eq. 1, "Full Configuration Interaction (FCI)" "is the best we can do" [43]. Although sparse, the so-called "minimal basis set" [40,41] representations $H^{\mathrm{FCI}}$ of many-chemion Hamiltonians are truly gigantic, even for rather small molecules [43].
Due to the irreducible nature of the chemionic vertex picture (with #vtx = #chp + 1), on the other hand, the matrix dimensions #sing and #trip of its singlet block $^{1}H^{\mathrm{FCI}}$ and triplet block $^{3}H^{\mathrm{FCI}}$, respectively, are quite moderate: The corresponding eigenvalue equations now read as follows: Matrix elements may become zero due to group theoretical reasons, with respect to the spatial symmetry of the molecule [65][66][67][68]. Regarding the "spin operators" $S^2$ and $S_z$, matrix elements between states with different quantum numbers S ∈ {0, 1} and $M_S$ (with integer values $-S \le M_S \le +S$) vanish [44]: triplet states thus neither mix with singlet states nor with other triplet states of different $M_S$.
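How moderate these dimensions are can be estimated with Weyl's standard dimension formula for spin-adapted configurations. The sketch below applies it to 2#chp chemions in #vtx = #chp + 1 vertex orbitals; this is a generic counting argument offered as an illustration, not a reproduction of the paper's own expressions for #sing and #trip, and the function names are hypothetical. Under this assumption the counts reduce to (n + 1)(n + 2)/2 singlets and n(n + 1)/2 triplets for n = #chp.

```python
from math import comb

def weyl_dimension(n_orbitals, n_electrons, spin):
    """Number of spin-adapted configurations with total spin S (integer here)
    for n_electrons (even) in n_orbitals spatial orbitals (Weyl's formula)."""
    half = n_electrons // 2
    numerator = ((2 * spin + 1)
                 * comb(n_orbitals + 1, half - spin)
                 * comb(n_orbitals + 1, half + spin + 1))
    return numerator // (n_orbitals + 1)

def chemion_fci_dimensions(n_pairs):
    """Singlet and triplet block sizes for #vtx = #chp + 1 vertex orbitals."""
    n_vertices = n_pairs + 1
    n_chemions = 2 * n_pairs
    return (weyl_dimension(n_vertices, n_chemions, 0),
            weyl_dimension(n_vertices, n_chemions, 1))

for n_pairs in (1, 2, 3, 4):
    print(n_pairs, chemion_fci_dimensions(n_pairs))
```

For four chemion pairs in five vertex orbitals (the monofocal case), this gives 15 singlet and 10 triplet configurations.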
For intermediately normalized wave functions [43], the singlet-type matrix elements (upper triangle) with nonvanishing expressions read as follows: where we used the following relations known from RHF theory [40,41]: The triplet-type matrix elements (upper triangle) with nonvanishing expressions read as follows: Triplet states are triply degenerate:

Compound classes
For a given "chemionic ensemble," the Hamiltonian of Eq. 7 is mainly specified through the quantities #chp and #vtx = #chp + 1, which enter into a parameter chart like Table 1. Parameter charts of "iso-chemionic" molecules (such as methane, ammonia, water, and hydrogen fluoride, for instance) thus refer to some topological relationship.
If the nuclear composition is even the same, family relations become apparently stronger. Table 2 indicates that (due to identical entries) normal alkanes, iso-alkanes, and neo-alkanes, primary, secondary, and tertiary alcohols, etc. form compound classes: their members only differ through $\{Z^{\mathrm{prop}}_C,\ C = 1, \ldots, \#\mathrm{nuc}\}$ (i.e., the number and kind of nuclei of the ensemble) and hence through another nuclear charge distribution.
Considering chemionic intra-valence separations, on the other hand, distinctions of "functional groups" like CH3 and CH2 become evident. Most important, however, is the "chain bond" or "backbone bond" like CH3−CH2, which keeps its parameter chart constant as in Table 2. Parameter tables of identical form thus indicate a chemical relationship: molecular families or compound classes essentially show the same parameter table.
According to Primas and Müller-Herold, those quantum theoretical descriptions are preferable which bring in some chemical evidence. For our purposes, Lorentz-relativistic kinematics, for instance, are useless; although being "better," such a picture does not offer the desired systematic abstractions [81].
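The relation #vtx = #chp + 1 can be checked by simple counting for the eight monofocal hydrides. The sketch below is an illustration under the vertex definition used in this paper (central atom, hydrogen ligands, and lone pair positions); the dictionary and function names are hypothetical.

```python
# name: (valence electrons of the central atom, number of hydrogen ligands)
MONOFOCALS = {
    "CH4": (4, 4), "NH3": (5, 3), "OH2": (6, 2), "FH": (7, 1),
    "SiH4": (4, 4), "PH3": (5, 3), "SH2": (6, 2), "ClH": (7, 1),
}

def count_vertices_and_pairs(valence_electrons, n_hydrogens):
    """Return (#vtx, #chp) for a monofocal hydride."""
    n_chemions = valence_electrons + n_hydrogens      # one chemion per hydrogen
    n_pairs = n_chemions // 2
    n_lone_pairs = (valence_electrons - n_hydrogens) // 2
    n_vertices = 1 + n_hydrogens + n_lone_pairs       # center + ligands + "love"s
    return n_vertices, n_pairs

for name, (valence, hydrogens) in MONOFOCALS.items():
    n_vtx, n_chp = count_vertices_and_pairs(valence, hydrogens)
    print(name, n_vtx, n_chp, n_vtx == n_chp + 1)
```

All eight molecules yield five vertices and four chemion pairs, which is why they share one parameter chart.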

Outlook and concluding remarks
There are at least two items which could improve our model:
• Making the vertex geometry more realistic through empirical bond lengths and angles [82]
• Optimizing the few orbital exponents by nonlinear variation [77]
Its applicability could be enlarged by adding an Euler-Hückel treatment of more complex fragments. In order to estimate the shape of proteins, condensed quantities for the 20 naturally occurring amino acids should enter into a data bank of fragments, together with the "peptide bond" [83]. The side-chain R is a fragment of the naturally occurring α-amino acids glycine (with R = H), alanine, serine, threonine, methionine, valine, leucine, isoleucine, phenylalanine, tyrosine, cysteine, aspartic acid, glutamic acid, arginine, lysine, histidine, tryptophan, asparagine, and glutamine. Proline has a five-membered ring instead of the −NH−CH(−R)− moiety. Similar considerations will be necessary for the conformation analysis of nucleic acids, which are alternating chains of phosphoric acids and pentose sugars, each substituted by a purine or pyrimidine base [84,85]. In contrast to polypeptides and proteins, DNA structures are better known. In 1953, after Chargaff had found the complementary pairing of adenine-thymine and guanine-cytosine, Watson and Crick proposed their famous "double helix model" [84,85].
Protein geometries, on the other hand, are very hard to predict. For a given sequence of amino acids (called its "primary structure"), a couple of "secondary structures" may appear: α-helices and β-strands, forming either parallel or anti-parallel sheets [83]. How such distinct domains fold into a more compact geometry is still an open question of biochemical sciences. "The general difficulties in obtaining protein structures using experimental techniques means that there is considerable interest in theoretical methods for predicting the three-dimensional structure of proteins from the amino acid sequence: this is often referred to as the protein folding problem" [60,[86][87][88][89][90][91][92][93][94][95].
While matrix dimensions can be kept constantly low during the described polymerization process, the evaluation of anchor integrals for each new torsion angle $\tau_n$ is still a "computational bottleneck" [40,41], especially of the "repulsions." Due to the recursive nature of our Euler-Hückel picture, however, some of them were already stored during the precursor step of torsion optimization (see Table 3 for X = C, N, O, F, Si, P, S, Cl and their three vertices V). Since the difference (a − b) is always constant (namely 4), the task of "anchor integration" is considerably accelerated. "Linear scaling" [96] can be achieved if we restrict ourselves to one- and two-center integrations only.
Let us close this article with some fundamental considerations on quantum mechanics, in general. Physical theories can be characterized by a *-algebra of observables. Algebras of classical theories (like Newton's mechanics, Maxwell's electrodynamics, Clausius' thermodynamics, and Einstein's theory of relativity) are "commutative" (i.e., any two observables Â and B̂ interchange pairwise: ÂB̂ = B̂Â). The algebra of quantum mechanics, however, is "noncommutative" with incompatible observables: ÂB̂ ≠ B̂Â. Different from the sharp values of classical observables, quantum theory deals with potential properties, which cannot be actualized simultaneously. For a thorough discussion of this topic, please consult the lectures and writings of Hans Primas, particularly his famous book of 1983, entitled "Chemistry, Quantum Mechanics and Reductionism" [18][19][20][21][22][23][24][25].
Following Ulrich Hoyer, on the other hand, Schrödinger's (stationary and time-dependent) equations, for instance, can be derived by applying the statistical rules of Boltzmann's distribution law. Alternatively, such a "Synthetic Quantum Theory" also emerges from Liouville's theorem. It seems that non-compatible properties of the molecular world can be readily explained by simple probability arguments. Let us just mention that such an "Ansatz" already shows the non-existence of He2, Ne2, and Be2; for the diatomics H2, Li2, B2, C2, N2, O2, and F2, it also leads to an estimation of dissociation energies; the bond length of H2 is predicted as $\sqrt[3]{4}\,B$, where B is the Bohr radius. According to Hoyer, pioneer quantum mechanics mainly suffers from a "hýsteron-próteron": instead of proceeding with a belated probabilistic interpretation of the postulated wave function, one should start with purely statistical axioms from the very beginning [97][98][99][100][101][102][103][104][105][106][107].
Finally, we want to express our thanks for help and friendship. Moreover, we would like to add three aphorisms by a home-attached philosopher, a cosmopolitan discoverer, and an expert for "life as work of art" [108], which might be encouraging for others, too.

A.1 Standard vertex geometries
Given the bond lengths $\lambda^{(AV_i)}$, the dihedral angle $\delta = 2\pi/3$, and with $\vartheta \overset{\mathrm{def}}{=} \pi - \arccos(-\tfrac{1}{3})$, the five vertex position vectors of the eight monofocal hydrides AV4 read as in Table 4. For A = C, reflecting the $a_0, a_1, a_2, a_3$ at the xy-plane and shifting them to the right (i.e., along the positive z-axis) by $\lambda^{(CC)}$, we get another four vectors $b_0, b_1, b_2, b_3$, which all together define the "eclipsed" conformation of ethane CH3−CH3. For a "bifocal" methyl compound CH3−XV3, in general (where X = C, N, O, F, Si, P, S, Cl), we equivalently adopt the additional coordinates of Table 5.
Defining a shift vector (with $\lambda \overset{\mathrm{def}}{=} \lambda^{(CC)}$), a rotation matrix, and a torsion matrix, the geometries of the growing chain are mainly defined through the flexible set of $b^n_i$-vectors. The required torsion angle $\tau_n$ refers to the minimum of the potential curve $E_0(\tau)$. In highly symmetric cases (as for "eclipsed" ethane (point group $D_{3h}$) and "staggered" ethane (point group $D_{3d}$)), energetic "degeneracies" will occur.
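The construction described above can be sketched numerically. The orientation chosen here is an assumption (first vertex vector on the positive z-axis, the other three on a cone around the negative z-axis), unit bond lengths are used instead of the tabulated values, and the torsion is taken as a plain rotation about the z-axis; Tables 4 and 5 are not reproduced, and all function names are hypothetical.

```python
import math

THETA = math.pi - math.acos(-1.0 / 3.0)   # cone angle from the text
DELTA = 2.0 * math.pi / 3.0               # dihedral angle between ligands

def tetrahedral_vertices(bond_length=1.0):
    """a0 on the +z axis, a1..a3 on a cone around the -z axis (assumed layout)."""
    a0 = (0.0, 0.0, bond_length)
    ring = [(bond_length * math.sin(THETA) * math.cos(i * DELTA),
             bond_length * math.sin(THETA) * math.sin(i * DELTA),
             -bond_length * math.cos(THETA)) for i in range(3)]
    return [a0] + ring

def reflect_and_shift(vectors, shift):
    """Reflect at the xy-plane, then shift along the positive z-axis."""
    return [(x, y, -z + shift) for x, y, z in vectors]

def torsion(vectors, tau):
    """Rotate position vectors about the z-axis by the torsion angle tau."""
    c, s = math.cos(tau), math.sin(tau)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in vectors]

def angle(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    return math.acos(dot / (math.hypot(*u) * math.hypot(*v)))

def min_azimuth_gap(ring_a, ring_b):
    """Smallest azimuthal angle between ligands of the two ends of the bond."""
    gaps = []
    for xa, ya, _ in ring_a:
        for xb, yb, _ in ring_b:
            d = (math.atan2(yb, xb) - math.atan2(ya, xa)) % (2.0 * math.pi)
            gaps.append(min(d, 2.0 * math.pi - d))
    return min(gaps)

a = tetrahedral_vertices()
b_eclipsed = reflect_and_shift(a, shift=1.0)
b_staggered = torsion(b_eclipsed, math.pi / 3.0)
print(math.degrees(angle(a[0], a[1])))            # tetrahedral angle
print(min_azimuth_gap(a[1:], b_eclipsed[1:]))     # eclipsed ligands
print(min_azimuth_gap(a[1:], b_staggered[1:]))    # staggered ligands
```

A torsion of π/3 converts the eclipsed into the staggered arrangement, which is the textbook relation between the two ethane conformers.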

A.4 Proportional nuclear charges
Due to "chemionic separations" and "vertex condensations," different nuclear charges enter into the diagonal and off-diagonal elements of attraction energy and their Fock matrices. Corresponding considerations hold for the nuclear repulsion energies. While in "diagonal" contributions $V_{CC}$ (and the corresponding $E^{\mathrm{nucl}}_{(\mathrm{dia})}$), 2#chp chemions have to be proportionally distributed on a subset of vertices, "off-diagonal" terms $V_{AB}$ (and the corresponding $E^{\mathrm{nucl}}_{(\mathrm{off})}$) contribute very low nuclear charges, namely the proportional weight of only two chemions. See Tables 9 and 10.
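One natural reading of such a proportional distribution is sketched below: the valence charges of the nuclei are scaled by a common factor so that they sum to the chemion charge 2#chp of the ensemble, and the point-charge repulsion is evaluated with the scaled values. The scaling rule is an assumption made for illustration, since the defining equation is not reproduced here; Tables 9 and 10 may use different values, and the function names are hypothetical.

```python
import math

def proportional_charges(valence_charges, n_pairs):
    """Assumed rule: scale valence charges so that they sum to 2 * n_pairs."""
    total = sum(valence_charges)
    return [2.0 * n_pairs * z / total for z in valence_charges]

def nuclear_repulsion(charges, positions):
    """Point-charge repulsion sum over pairs C < D of Z_C Z_D / R_CD (atomic units)."""
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            energy += charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return energy

# Ethane-like nuclear frame: two carbons (valence charge 4) and six hydrogens (1),
# carrying the charge of a single chemion pair (#chp = 1).
charges = proportional_charges([4, 4, 1, 1, 1, 1, 1, 1], n_pairs=1)
print(sum(charges), charges[0] / charges[2])
```

By construction the scaled charges keep their ratios (here 4 : 1 between carbon and hydrogen) while the ensemble stays electrically neutral.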

A.5 Approximate three-and four-center repulsion integrals
In order to accelerate the described recursive procedure, we consider two additional techniques for the purpose of further simplification: an explicit and an implicit pathway.

A.5.1 Explicit integrations
For the "Coulomb-type three-center repulsions," we write: For the "exchange-type three-center repulsions," we write: With respect to the four-center integrals, in general, such distinctions cannot be made.Nevertheless, we consider four different approximations: For the "Coulomb-type four-center repulsions," we write: For the "exchange-type four-center repulsions," we write: With the following substitutions, we get from Eq. (36 1 ):
The diagonal elements W (q.q) (53)

Because they reduce repulsive multi-center interactions to one- and two-center integrations only, the above expressions yield considerable accelerations. Their use will be another important step towards a "linear scaling" of our recursive algorithm [96].

A.6 Polyfocal ground state energies, torsion angles, and rotational barriers
All the "Born-Oppenheimer Energy Functions (BOEF)" discussed here are one-dimensional singlet and triplet state energies (i.e., potential curves). Of main interest are
• their minimum $E_0^{\min}$ and the corresponding torsion angle $\tau^{\min}$,
• their maximum $E_0^{\max}$ and the corresponding torsion angle $\tau^{\max}$,
• and the difference $\Delta E_0 = E_0^{\max} - E_0^{\min}$, which corresponds to a "barrier of internal rotation."
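These three quantities can be read off a sampled potential curve as sketched below. The threefold cosine curve is a hypothetical ethane-like model in arbitrary energy units, used only to exercise the scan; it is not a result of the chemionic calculations, and the function names are illustrative.

```python
import math

def scan_potential_curve(energy, n_points=360):
    """Sample E0(tau) on a uniform grid and extract minimum, maximum, and barrier."""
    grid = [2.0 * math.pi * k / n_points for k in range(n_points)]
    values = [energy(tau) for tau in grid]
    i_min = min(range(n_points), key=values.__getitem__)
    i_max = max(range(n_points), key=values.__getitem__)
    return {
        "tau_min": grid[i_min], "e_min": values[i_min],
        "tau_max": grid[i_max], "e_max": values[i_max],
        "barrier": values[i_max] - values[i_min],
    }

def threefold_model(tau, height=1.0):
    """Hypothetical model curve with maxima at tau = 0, 2*pi/3, 4*pi/3."""
    return 0.5 * height * (1.0 + math.cos(3.0 * tau))

result = scan_potential_curve(threefold_model)
print(result["tau_max"], result["barrier"])
```

Because the model curve has three equivalent minima, the reported `tau_min` is one of the staggered angles π/3, π, or 5π/3; for real potential curves a finer grid or a local refinement around the grid minimum may be needed.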