Despite the apparent complexity of semiempirical methods, there are only three possible sources of error: reference data may be inaccurate or inadequate, the set of approximations may include unrealistic assumptions or be too inflexible, and the parameter optimization process may be incomplete. In order for a method to be accurate, all three potential sources of error must be carefully examined, and, where faults are found, appropriate corrective action taken.
Reference data
In contrast to earlier methods, in which reference data was assembled by painstakingly searching the original literature, the current work relies heavily on the large compendia of data that have been developed in recent years. The most important of these are the WebBook [20], for thermochemistry, and the Cambridge Structural Database [21] (CSD), for molecular geometries.
During the early stages of the current work, consistency checks were performed to ensure that erroneous data were not used. These checks revealed many cases in which the calculated heats of formation were inconsistent with the reference heats of formation reported in the NIST database. On further checking, many of these reference data were also found [22, 23] to be inconsistent with other data in the WebBook. In those cases where there was strong evidence of error in the reference data, the offending data were deleted, and the WebBook updated [24].
For molecular geometries, gas phase reference data are preferred, but in many instances such data were unavailable, and recourse was made to condensed-phase data. Provided that care was taken to exclude those species whose geometries were likely to be significantly distorted by crystal forces, or which carried a large formal charge, condensed-phase data of the type found in the CSD were regarded as being suitable as reference data.
Because earlier methods used only a limited number of reference data, most of the cases where the method gave bad results were not discovered until after the method was published. In an attempt to minimize the occurrence of such unpleasant surprises, the set of reference data used was made as large as practical. To this end, where there was a dearth or even a complete absence of experimental reference data, recourse was made to high level calculations. Thus, for the Group VIII elements, there are relatively few stable compounds, and the main phenomena of interest involve rare gas atoms colliding with other atoms or molecules, so reference data representing the mechanics of rare gas atoms colliding with other atoms was generated from the results of ab-initio calculations. Additionally, there is an almost complete lack of thermochemical data for many types of complexes involving transition metals, so augmenting what little data there was with the results of ab-initio calculations was essential.
Use of Ab-Initio results
Ab-initio calculations provide a convenient source of reference data; for this work, extensive use has been made of results of Hartree-Fock and B3LYP density functional [25, 26] methods (DFT), both with the 6–31G(d) basis set for elements in the periodic table up to argon. For systems involving heavier elements, the B88–PW91 functional [27, 28] was used with the DZVP basis set. Within the spectrum of ab-initio methods these methods are not particularly accurate; many methods with larger basis sets and with post-Hartree-Fock corrections are more accurate. However, the methods used in this work were chosen because they were regarded as robust, practical methods, allowing many systems to be modeled in a reasonable amount of time, a condition that could not be achieved with the more sophisticated ab-initio methods.
Procedure used in deriving ΔHf
Reference heats of formation, ΔHf, for compounds and ions of elements for which there was a paucity of data were derived from DFT total energies in two stages. In the first stage, a basic set of ∼1,400 well-behaved compounds, for which reliable reference values of experimental ΔHf were available, was assembled. Only compounds containing one or more of the elements H, C, N, O, F, P, S, Cl, Br, and I were used. For this set, a root-mean-square fit was made to the reference ΔHf using the calculated total energies, \(E_{\text{Tot}}\), and the atom counts. Thus, the error function, S, in Eq. (1) was minimized.
$$ S = \sum\limits_{j} \left( \Delta H_{j}\left( \text{Ref.} \right) - 627.51\left( E_{\text{Tot}} + \sum\limits_{i} C_{i} n_{i} \right) \right)_{j}^{2} $$
(1)
In this expression, the \(C_{i}\) are constants for each atom of type i, and the \(n_{i}\) are the numbers of atoms of that type.
In the second stage, the contribution to the total energy of compounds containing element X arising from the elements in the first stage was removed using the coefficients from Equation (1). A second RMS fit was then performed. In this, the function minimized, S, was the RMS difference between the reference ΔHf of compound X and the values predicted from the DFT energy, Eq. (2).
$$ S = \sum\limits_{j} \left( \Delta H_{j}\left( \text{Ref.} \right) - 627.51\left( E_{\text{Tot}} + \sum\limits_{i} C_{i} n_{i} + C_{x} n_{x} \right) \right)_{j}^{2} $$
(2)
In this expression, the only unknown is the multiplier coefficient \(C_{x}\). After solving for \(C_{x}\), the ΔHf of any compound of X could then be predicted as soon as its DFT total energy was evaluated.
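Because Eq. (2) has a single unknown, its least-squares minimum has a closed form. The following is a minimal sketch of the second stage under that reading, not the code used in this work; the function names, the dictionary layout, and any numerical inputs are assumptions made for illustration.

```python
HARTREE_TO_KCAL = 627.51  # conversion factor appearing in Eqs. (1) and (2)


def fit_cx(compounds, c_first_stage, element_x):
    """Least-squares value of C_x in Eq. (2), the only unknown.

    compounds: list of dicts with keys
        'dhf_ref' (reference heat of formation, kcal/mol),
        'e_tot'   (DFT total energy, hartree),
        'atoms'   (element symbol -> atom count).
    c_first_stage: element symbol -> C_i from the first-stage fit.
    (This data layout is hypothetical.)
    """
    numerator = 0.0
    denominator = 0.0
    for cmpd in compounds:
        n_x = cmpd["atoms"].get(element_x, 0)
        # Contribution of the first-stage elements, already known.
        known = sum(c_first_stage[el] * n
                    for el, n in cmpd["atoms"].items() if el != element_x)
        # What C_x * n_x has to account for, in hartree.
        remainder = cmpd["dhf_ref"] / HARTREE_TO_KCAL - cmpd["e_tot"] - known
        numerator += n_x * remainder
        denominator += n_x * n_x
    return numerator / denominator


def predict_dhf(e_tot, atoms, coeffs):
    """Predicted heat of formation (kcal/mol) from a DFT total energy."""
    return HARTREE_TO_KCAL * (e_tot + sum(coeffs[el] * n for el, n in atoms.items()))
```

Setting dS/dC_x = 0 in Eq. (2) gives C_x as the sum of n_x times the remainder, divided by the sum of n_x squared, which is what `fit_cx` evaluates.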
Training set reference data
The training set of reference data used was considerably larger than that used in parameterizing PM3 [7, 8], where approximately 800 discrete species were used. In optimizing the parameters for PM6, somewhat over 9,000 separate species were used, of which about 7,500 were well-behaved stable molecules. The remainder consisted of reference data that were tailored to help define the values of individual parameters or sets of parameters.
Use of rules in parameter optimization
Most reference data can be expressed as simple facts. Indeed, all the earlier NDDO methods were parameterized using precisely four types of reference data: ΔHf, molecular geometries, dipole moments, and ionization potentials. During the development of PM6, however, the use of other types of reference data was found to be necessary. Because of their behavior, these new data are best described as “rules.” In this context, a rule can therefore be regarded as a reference datum that is a function of one or more other data. To illustrate the use of a rule, consider the binding energy of a hydrogen bond in the water dimer. By default, the weighting factor for ΔHf for normal compounds is 1.0 kcal mol−1. With this weighting factor, average unsigned errors in the predicted ΔHf of the order of 3–5 kcal mol−1 would be acceptable, particularly as the spectrum of values of ΔHf spans several hundreds of kilocalories per mole. However, the binding energy of a hydrogen bond in a water dimer is only 5 kcal mol−1. To have an average unsigned error (AUE) of 4 kcal mol−1 in the prediction of hydrogen bond energies would render such a method almost useless for modeling such phenomena.
One way to increase the importance of the hydrogen bond in water would be to increase the weight for the ΔHf of the water molecule, −57.8 kcal mol−1, and the water dimer system, ca. −120.6 kcal mol−1. While this would have the intended effect of increasing the weight of the hydrogen bond energy, it would also have the undesired effect of increasing the weight of the ΔHf of water.
An alternative would be to express the ΔHf of the water dimer in terms of the ΔHf of two individual water molecules. The difference between the two ΔHf, that of water dimer and that of two isolated water molecules, would be the energy of the hydrogen bond. If the weight assigned to this quantity were then increased, it would increase the weight for the hydrogen bond energy without also increasing the weight for the ΔHf of water. Such a reference datum is referred to here as a rule. That is, rules relate the ΔHf of a moiety to that of one or more other moieties. Thus, in the above example, the simple reference datum H, representing the ΔHf of an isolated water molecule, could be expressed as:
Using a rule-based reference datum to represent the strength of the hydrogen bond, and giving a weight of 10 to the hydrogen bond energy, the ΔHf of the water dimer would then be defined as
$$ H = 10\left( -5 + H_{\text{H}_{2}\text{O}} + H_{\text{H}_{2}\text{O}} \right) $$
In this expression, \(H_{\text{H}_{2}\text{O}}\) is the calculated ΔHf, in kcal mol−1, of an isolated water molecule. This rule could be interpreted as “The calculated strength of the hydrogen bond formed when two water molecules form the dimer should be 5 kcal mol−1, and the importance should be 100 times that of ordinary heats of formation.”
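The effect of such a rule can be illustrated with a short sketch. It assumes that the weighted residual of the rule enters the error function in the same way as any other reference datum; the function name and the sample values in the usage note are hypothetical.

```python
def hbond_rule_residual(dhf_dimer_calc, dhf_monomer_calc,
                        binding_ref=-5.0, weight=10.0):
    """Weighted residual for the water-dimer rule.

    The target for the dimer is built from the *calculated* monomer value,
    so an error in the monomer heat of formation does not leak into the
    hydrogen-bond term. All values are in kcal/mol.
    """
    target = binding_ref + 2.0 * dhf_monomer_calc
    return weight * (dhf_dimer_calc - target)
```

For example, a calculated monomer ΔHf of −55.0 and a calculated dimer ΔHf of −114.0 give a binding energy of −4.0, and the residual is 10.0 even though the monomer itself is in error by 2.8 kcal mol−1. If the residual is squared when it is added to the error function, a weight of 10 scales the contribution by a factor of 100.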
Rules are very useful in defining the parameter hypersurface. Examples of such tailoring are as follows:
Correcting qualitatively incorrect predictions
During the parameterization of transition metals, some systems were predicted to have qualitatively the wrong structure. For example, [CuIICl4]2− was initially predicted to have a tetrahedral structure, instead of the D2d geometry observed. To induce the parameters to change so as to make the D2d geometry more stable than the Td geometry, a rule was added to the set of reference data for copper compounds. This rule was constructed using the results of B3LYP calculations on [CuIICl4]2−. First, the total energies of the optimized B3LYP structure and that of the structure resulting from the semiempirical calculation were evaluated. The difference between these energies was then used in constructing the rule. In this case, the rule was that “The ΔHf of the geometry predicted by the faulty semiempirical method should be n.n kcal mol−1 more than that of the B3LYP geometry.” When such a rule was included in the parameter optimization, with an appropriate large weight, any tendency of the parameters to predict the incorrect geometry resulted in a large contribution to the error function. That is, with the new rule in place, there was a strong disincentive to prediction of the incorrect structure. Usually one rule was sufficient to correct most qualitative errors, but for a few complicated structures more than one rule was needed. The commonest need for multiple rules occurred when, initially, one rule was used to correct a faulty prediction and, after re-optimizing the parameters, the geometry optimized to a new structure that was distinctly different from either the correct structure or the incorrect structure covered by the rule. When that happened, the procedure just described was repeated, and a new rule added to the set of reference data to address the new incorrect structure. In extreme cases, several such rules might be needed, each one defining a geometry that was incorrect and should therefore be avoided.
Rare gas atoms at sub-equilibrium distances
For some elements, specifically those of Group VIII, there is an understandable shortage of useful experimental reference data. In addition, most simulations involving these elements are likely to involve a rare-gas atom dynamically interacting with another atom or with a molecule at distances significantly less than the equilibrium distance. This makes determining the potential energy surface at sub-equilibrium distances important. As with hydrogen bond energies, the energies involved in this domain are likely to be in the order of a few kcal mol−1. The shape of the potential energy surface (PES) can readily be mapped using DFT methods. By selecting two or three representative points on this PES, reference data rules can be constructed that describe the mechanical properties of the interactions. As with hydrogen bonding, a large weight can be assigned to these rules.
Use of rules to restrain parameter values
In general, uncharged atoms that are separated by a distance sufficiently large that all overlaps between orbitals on the two atoms are vanishingly small will not interact significantly, and what interaction energy exists would arise from van der Waals (VDW) terms, which by their nature are mildly stabilizing. Although statements of this type are obviously true, when they are expressed as rules and added to the training set of reference data they can help define the parameter values. For a pair of atoms, A and B, a simple diatomic system would be constructed in which the interatomic separation was the minimum distance at which any overlaps of the atomic orbitals would still be insignificant. The electronic state of such a system would then be the sum of the states of the two isolated atoms. Thus, if both A and B were silicon, then, since the ground state of an isolated silicon atom is a triplet, the combined state would be a quintet. Because the two atoms do not interact significantly, a rule could then be constructed that said “The energy of the diatomic system is equal to the sum of the energies of the two individual systems.” By giving this rule a large weight, any tendency of the method to generate a spurious attraction or repulsion between the atoms would be prevented.
Atomic energy levels
In keeping with the philosophy that a large amount of reference data should be used in the parameter optimization, spin-free atomic energy levels were used for most elements. The exceptions were carbon, nitrogen, and oxygen, where there were enough conventional reference data that the addition of atomic energy levels would not significantly improve the definition of the parameter surface.
NDDO approximations do not allow for spin-orbit coupling. Therefore, spin-free levels were needed. For a few elements, there were insufficient spin states to allow the spin-free energy levels to be calculated. For all the remaining elements, spin-free energy levels were calculated.
In Moore’s compendia [29–31] of atomic energy levels, observed emission spectra were used in determining the energy levels of the various states of neutral and ionized atoms. Most of these energy levels were characterized by three quantum numbers: the spin and orbital angular momenta, and the “J” or spin-orbit quantum number. The starting point for determining the spin-free atomic energy levels for a given element consisted of identifying each complete manifold of atomic energy levels for that element, that is, each set of levels split by spin-orbit coupling. If all members of the set were present, i.e., all energy levels from L+S to |L−S|, then the weighted barycenter of energy could be calculated. The spin-free energy level, E, was derived from the spin-split levels E(S,L,J) using Eq. (3).
$$ E = \frac{1} {{{\left( {2S + 1} \right)}{\left( {2L + 1} \right)}}}{\sum\limits_{J = {\left| {L - S} \right|}}^{L + S} {{\left( {2J + 1} \right)}E{\left( {S,L,J} \right)}} } $$
(3)
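Eq. (3) is a degeneracy-weighted average, and can be sketched in a few lines. The function name and the level energies used in the usage note are hypothetical; no values from Moore's tables are reproduced here.

```python
def spin_free_level(s, l, levels):
    """Barycenter of a spin-orbit-split manifold, as in Eq. (3).

    s, l:   total spin and orbital angular momentum quantum numbers.
    levels: dict mapping J -> E(S, L, J); must contain every J from
            |L - S| to L + S, otherwise the manifold is incomplete.
    """
    j = abs(l - s)
    expected = []
    while j <= l + s:
        expected.append(j)
        j += 1
    if set(levels) != set(expected):
        raise ValueError("manifold incomplete: need every J from |L-S| to L+S")
    # Each J level is (2J + 1)-fold degenerate.
    weighted = sum((2 * j + 1) * levels[j] for j in expected)
    return weighted / ((2 * s + 1) * (2 * l + 1))
```

For an illustrative triplet P manifold (S = 1, L = 1) with made-up energies of 0.0, 1.0, and 3.0 for J = 0, 1, and 2, the weights are 1, 3, and 5, and the barycenter is 18/9 = 2.0.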
In those cases where the ground state of an atom was itself a member of a spin-split manifold, the barycenter of the ground state manifold was calculated and used in re-defining the spin-free ground state. For all elements except tungsten, this change in definition was benign. There is a \(^{7}S_{3}\) level present in tungsten that is located only 8.4 kcal mol−1 above the ground state. This puts it inside the \(^{5}D_{J}\) manifold, which has a barycenter at 12.7 kcal mol−1. The effect of this was that, on going from a spin-split to a spin-free ground state, the ground state changed from \(6s^{2}5d^{4}\) or \(^{5}D\) to \(6s^{1}5d^{5}\) or \(^{7}S\), and the \(^{5}D\) state now became an excited state with an energy of 4.3 kcal mol−1. To allow for this, a corresponding change was made to the ground state configuration in the PM6 definition of tungsten.
Where there were relatively few other reference data, the singly-ionized, and, in rare cases, the doubly-ionized, spin-free states were also evaluated and used as reference data.
Each energy level contributed one reference datum to the training set. Most atoms have a large number of atomic energy levels, so in order to minimize the probability that a level might be incorrectly assigned, each level was labeled with three quantum numbers: the total spin momentum, the total angular momentum, and the principal quantum number for these two quantum numbers. These were compared with the corresponding values calculated from the state functions. Since each set of three quantum numbers is unique, the potential for misassignment was minimized. In rare cases, particularly during the early stages of parameter optimization, two states with the same total spin and angular quantum numbers would be interchanged, with the result that the calculated principal quantum number would also be interchanged. Such cases always involved the ground state, and were quickly identified and corrected.
Approximations
Most of the approximations used in PM6 are identical to those in AM1 and PM3. The differences are:
Core-core interactions
In the original MNDO set of approximations, two changes were made to the simple point-charge expression for the core-core repulsion term. Beyond about five Ångstroms, there should be no significant interaction of two neutral atoms. However, in MNDO, the two-electron, two-center \(\left\langle {\left. {s_{A} s_{A} } \right|\left. {s_{B} s_{B} } \right\rangle } \right.\) integrals and the electron-core interactions do not converge to the exact point charge expression; instead, they are always slightly smaller. To prevent there being a small net repulsion between two uncharged atoms, the core-core expression is modified by the exact 1/RAB term being replaced by the term used in the \(\left\langle {\left. {s_{A} s_{A} } \right|\left. {s_{B} s_{B} } \right\rangle } \right.\) integrals. An additional term is needed to represent the increased core-core repulsion at small distances due to the unpolarizable core. These two changes can be expressed as the MNDO core-core repulsion term as shown in Eq. (4).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|\left. {s_{B} s_{B} } \right\rangle } \right.{\left( {1 + e^{{ - \alpha _{A} R_{{AB}} }} + e^{{ - \alpha _{B} R_{{AB}} }} } \right)}$$
(4)
This approximation works well for most main-group elements, but when molybdenum was being parameterized, Voityuk [14] found that the errors in heats of formation and geometries were unacceptably large, and good results were achieved only when a diatomic term was added to the core-core approximation, as shown in Eq. (5).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|\left. {s_{B} s_{B} } \right\rangle } \right.{\left( {1 + x_{{AB}} e^{{ - \alpha _{{AB}} R_{{AB}} }} } \right)}$$
(5)
When PM3 parameters for elements of Group IA were being optimized, the MNDO approximation to the core-core expression was found to be unsuitable. In these elements there is only one valence electron, so the core charge is the same as that of hydrogen. A consequence of this was that the apparent size of these elements was also approximately that of a hydrogen atom, in marked contrast with observation. For these elements, diatomic core-core parameters were also found to be essential.
Further examination showed that when diatomic parameters were used, there was always an increase in accuracy; therefore, in the current work, Eq. (4) was replaced systematically by Eq. (5).
As the interatomic separation increased, Voityuk’s equation converged to the exact point-charge interaction, as expected. However, for rare gas interactions, an increase in accuracy was found when the rate of convergence was increased by the addition of a small perturbation. Subsequently, the perturbed function was found to be generally beneficial. Because of this, the general form of the core-core interaction used in PM6 is that given in Eq. (6).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|\left. {s_{B} s_{B} } \right\rangle } \right.{\left( {1 + x_{{AB}} e^{{ - \alpha _{{AB}} {\left( {R_{{AB}} + 0.0003R^{6}_{{AB}} } \right)}}} } \right)}$$
(6)
At normal chemical bonding distances, Eqs. (5) and (6) have essentially similar behavior, but at distances of greater than about 3 Å the effect of the perturbation is to make the PM6 function significantly smaller than the Voityuk approximation.
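The difference between the two forms can be checked numerically. The sketch below evaluates only the bracketed scaling factors of Eqs. (5) and (6), since the integral and core charges are common to both; distances are assumed to be in Å, and the parameter values in the usage note are arbitrary placeholders rather than PM6 parameters.

```python
import math


def voityuk_factor(r_ab, x_ab, alpha_ab):
    """Bracketed term of Eq. (5)."""
    return 1.0 + x_ab * math.exp(-alpha_ab * r_ab)


def pm6_factor(r_ab, x_ab, alpha_ab):
    """Bracketed term of Eq. (6), with the 0.0003 R^6 perturbation."""
    return 1.0 + x_ab * math.exp(-alpha_ab * (r_ab + 0.0003 * r_ab ** 6))
```

With placeholder values x_AB = 1 and α_AB = 2, the exponential parts differ by a factor of exp(−0.0006) at 1 Å, which is negligible, but by a factor of exp(−2.4576), roughly 0.09, at 4 Å.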
d-orbitals on main-group elements
Thiel and Voityuk have shown [13] that a large increase in accuracy results when d-orbitals are added to main-group elements that have the potential to be hypervalent. During preliminary stages of this work, d-orbitals were excluded from main-group elements, and the parameters were optimized. This work was then repeated but with d-orbitals on various main-group elements. The results were in accordance with Thiel’s observation: the accuracy of the method increased significantly. Because of this, d-orbitals were added to several main-group elements: the value of the increased accuracy far outweighs the extra computational cost.
The effect of the addition of d-orbitals was fundamentally different between main-group elements and transition metals. For main-group elements, the effect of d-orbitals is merely a perturbation: to a large degree the chemistry of these elements is determined by the s and p atomic orbitals. This is not the case with transition metals, where the d-orbitals are of paramount importance and the s and p orbitals are of only very minor significance. In recognition of the importance of the s and p shells in main-group chemistry, specific parameters are used for the five one-center two-electron integrals. Conversely, for the transition metals, the values of these integrals are derived directly from the internal orbital exponents.
Unpolarizable core
As noted earlier, the NDDO core-core interaction is a function of the number of valence electrons. For elements on the left of the periodic table these numbers are small and can cause the elements to appear to be too small. This was part of the rationale behind the adoption of Voityuk’s diatomic core-core parameters. However, even the Voityuk approximation failed during parameter optimization when, in rare cases, a pair of atoms would approach each other very closely. Examination of these catastrophes indicated that the cause was the complete neglect of the unpolarizable core of the atoms involved. To allow for its presence, the core-core interaction for all element pairs was modified by the addition of a simple function, \(f_{AB}\), based on the first term of the Lennard-Jones potential [32]. A candidate function was constructed, Eq. (7), using the fact that, to a first approximation, the size of an atom increases as the cube root of its atomic number.
$$ f_{AB} = c\left( \frac{Z_{A}^{1/3} + Z_{B}^{1/3}}{R_{AB}} \right)^{12} $$
(7)
The value of c was set to \(10^{-8}\), this being the best compromise between two requirements: at normal chemical distances the function should have a vanishingly small value, i.e., under normal conditions its value should be negligible; and at small interatomic separations the function should be highly repulsive, i.e., it should represent the unpolarizable core.
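A quick numerical check shows how sharply Eq. (7) switches on. This is a sketch only: distances are assumed to be in Å, the function name is hypothetical, and the result is left in whatever energy unit the constant c implies, since that unit is not stated here.

```python
C_CORE = 1.0e-8  # value of c quoted in the text


def core_repulsion(z_a, z_b, r_ab, c=C_CORE):
    """Evaluate f_AB of Eq. (7) for atomic numbers z_a, z_b at separation r_ab."""
    size = z_a ** (1.0 / 3.0) + z_b ** (1.0 / 3.0)
    return c * (size / r_ab) ** 12
```

For a carbon-carbon pair (Z = 6), the function is of the order of 10⁻⁴ at 1.5 Å and exceeds 100 at 0.5 Å; because of the twelfth power, halving the distance multiplies the value by 2¹² = 4096.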
Individual core-core corrections
For a small number of diatomic interactions, the general expression for the core-core interaction was modified in order to correct a specific fault. Because it is desirable to keep the methodology as simple as possible, modifications of the approximations were made only after determining that the existing approximations were inadequate. The diatomic specific modifications were:
O–H and N–H
In the original MNDO formalism, the general core-core interaction, Eq. (4), was replaced in the cases of O–H and N–H pairs with Eq. (8).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|} \right.\left. {s_{B} s_{B} } \right\rangle {\left( {1 + R_{{AB}} e^{{ - \alpha _{A} R_{{AB}} }} + R_{{AB}} e^{{ - \alpha _{B} R_{{AB}} }} } \right)}$$
(8)
An unintended effect of this change was that at distances where hydrogen-bonding interactions are important, the diatomic contribution to the ΔHf is greater than if the general approximation, Eq. (4), had been used. This contributed to a reduced hydrogen-bonding interaction in MNDO, and was a contributor to the need for modified core-core interactions in AM1 and PM3.
In PM6, the MNDO core-core approximation is replaced by Voityuk’s diatomic expression, but even with that modification, the resulting hydrogen bond interaction energy was too small. In an attempt to increase it, the Voityuk approximation was replaced by Eq. (9).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|\left. {s_{B} s_{B} } \right\rangle } \right.{\left( {1 + x_{{AB}} e^{{ - \alpha _{{AB}} R^{2}_{{AB}} }} } \right)}$$
(9)
At normal O–H and N–H separations, approximately 1 Å, Eqs. (5) and (9) have similar values, but at hydrogen bonding distances, ∼2 Å, the contribution arising from the exponential term is significantly reduced, resulting in a corresponding increased hydrogen bond interaction energy.
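To make the comparison concrete, the ratio of the exponential terms in Eqs. (9) and (5) can be written out. This assumes, purely for illustration, that the same \(x_{AB}\) and \(\alpha_{AB}\) are used in both forms; in practice each form would have its own optimized parameters.

$$ \frac{x_{AB} e^{-\alpha_{AB} R_{AB}^{2}}}{x_{AB} e^{-\alpha_{AB} R_{AB}}} = e^{-\alpha_{AB} R_{AB}\left( R_{AB} - 1 \right)} $$

With \(R_{AB}\) in Å, this ratio is 1 at \(R_{AB} = 1\) and \(e^{-2\alpha_{AB}}\) at \(R_{AB} = 2\), so for any positive \(\alpha_{AB}\) the exponential contribution at hydrogen-bonding distances is smaller than in Eq. (5).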
C–C
After optimizing all parameters, it was found that compounds containing yne groups, -C≡C-, were predicted to be too stable by about 10 kcal mol−1 per yne group. This error was unique to compounds with extremely short C–C distances, and in light of the increased emphasis on accurately reproducing the properties of organic compounds, the C–C core-core term was perturbed by the addition of a repulsive term. This term was optimized to correct the error in the yne groups and to have a negligible effect on all other C–C interactions. The optimized form of the C–C core-core interaction is given in Eq. (10).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|} \right.\left. {s_{B} s_{B} } \right\rangle {\left( {1 + x_{{AB}} e^{{ - \alpha _{{AB}} {\left( {R_{{AB}} + 0.0003R^{6}_{{AB}} } \right)}}} + 9.28e^{{ - 5.98R_{{AB}} }} } \right)}$$
(10)
Si–O
During testing of PM6, neutral silicate layers of the type found in talc, H2Mg3Si4O12, were found to be slightly repulsive instead of being slightly bound. An attempt was made to correct for this error by adding a weak perturbation to the Si–O interaction, illustrated by Eq. (11).
$$E_{n} {\left( {A,B} \right)} = Z_{A} Z_{B} \left\langle {\left. {s_{A} s_{A} } \right|} \right.\left. {s_{B} s_{B} } \right\rangle {\left( {1 + x_{{AB}} e^{{ - \alpha _{{AB}} {\left( {R_{{AB}} + 0.0003R^{6}_{{AB}} } \right)}}} - 0.0007e^{{ - {\left( {R_{{AB}} - 2.9} \right)}^{2} }} } \right)}$$
(11)
Nitrogen \(sp^{2}\) pyramidalization
Although PM6 predicted the degree of pyramidalization of primary amines correctly, it overestimated the pyramidalization of secondary and tertiary amines. The degree of pyramidalization of these amines was decreased by adding a function to make the calculated ΔHf more negative as the nitrogen became more planar, as shown in Eq. (12).
$$ \Delta H'_{f} = \Delta H_{f} - 0.5e^{-10\phi} $$
(12)
In this equation, the angle ϕ is a measure of the non-planarity of the nitrogen environment, and is given by 2π minus the sum of the three contained angles about the nitrogen atom. For planar \(sp^{2}\) secondary and tertiary amines, this correction amounted to 0.5 kcal mol−1 per nitrogen atom.
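The size of the correction in Eq. (12) falls off quickly with pyramidalization, as the following sketch shows. It assumes that ϕ is expressed in radians, which is implied by the 2π in its definition; the function name and the choice to pass angles in degrees are illustrative.

```python
import math


def planarity_correction(angles_deg):
    """Correction added to the heat of formation in Eq. (12), in kcal/mol.

    angles_deg: the three bond angles about the nitrogen atom, in degrees.
    """
    phi = 2.0 * math.pi - sum(math.radians(a) for a in angles_deg)
    return -0.5 * math.exp(-10.0 * phi)
```

For a planar nitrogen (three angles of 120°) the correction is −0.5 kcal mol−1, whereas for three tetrahedral angles of 109.5° ϕ is about 0.55 rad and the correction is only about −0.002 kcal mol−1.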
More elements
The NDDO basis sets of many of the elements parameterized in PM6 have not previously been described. For all elements except hydrogen, which has only an s orbital, the basis set consists of an s orbital, three p orbitals, and, for most elements, a set of five d orbitals. Slater atomic orbitals are used exclusively; these are of the form:
$$ \varphi = \frac{\left( 2\xi \right)^{n + 1/2}}{\left( \left( 2n \right)! \right)^{1/2}} r^{n - 1} e^{-\xi r} Y_{l}^{m}\left( \theta ,\phi \right) $$
where ξ is the orbital exponent, n is the principal quantum number (PQN), and the \(Y_{l}^{m}(\theta, \phi)\) are the normalized real spherical harmonics. The PQN are those of the valence shell, i.e., the set of atomic orbitals most important in forming chemical bonds. For PM6, the PQN used are shown in Table 1. For most main-group elements, the s and p PQN are the same, and, when d orbitals are present, all three PQN are the same: that is, the PQN are (ns, np, nd). For transition metals, the d PQN is one less than that of the s and p shells, i.e., (ns, np, (n–1)d). An exception to this generalization occurs in the elements of Group VIII. Here, the valence shell is completely filled, so in all chemical interactions that could occur between an atom of a Group VIII element and any other atom, electron density could only migrate from the Group VIII element to the other atom. That is, when a rare gas element forms any type of chemical bond it would necessarily become slightly positive. This is an unrealistic result. In order to allow rare gas atoms to have the potential of being slightly negative, the set of valence orbitals was changed from (ns, np) to (np, (n+1)s), for the elements Ne, Ar, Kr, and Xe. Helium is the only exception to this change, because it does not have a “1p” valence shell. For helium, the valence shell used was (1s, 2p), this being considered the best compromise.
Table 1 Principal quantum numbers for atomic orbitals
Parameter optimization
Background
The objective of parameter optimization is to modify the values of the parameters so as to minimize the error function, S, Eq. (13), representing the square of the differences between the values of reference data, \(Q_{ref}(i)\), and the values calculated using the semiempirical method, \(Q_{calc}(i)\), with appropriate weighting factors, \(g_{i}\).
$$S = {\sum\limits_i {{\left( {g_{i} {\left( {Q_{{calc}} {\left( i \right)} - Q_{{ref}} {\left( i \right)}} \right)}} \right)}^{2} } }$$
(13)
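Eq. (13) translates directly into code. The sketch below is a literal transcription with a hypothetical function name; the numbers in the usage note are invented solely to show the arithmetic.

```python
def error_function(q_calc, q_ref, weights):
    """Error function S of Eq. (13): sum of squared, weighted differences."""
    return sum((g * (calc - ref)) ** 2
               for g, calc, ref in zip(weights, q_calc, q_ref))
```

For two data with calculated values (−55.0, 1.0), reference values (−57.0, 1.5), and weights (1.0, 2.0), the weighted differences are 2.0 and −1.0, giving S = 5.0.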
This process is initiated by rendering the reference data in the training set dimensionless. The default conversion factors are given in Table 2, with weighting factors for reference data represented by rules being much larger, typically of the order of 5–20 kcal mol−1.
Table 2 Default weighting factors for reference data
The elements were divided into four sets: core elements (H, C, N, and O), other elements important in organic chemistry (F, Na, P, S, Cl, K, Br, I), the rest of the main group, and the transition metals. Elements were assigned to the different sets based on their presumed degree of importance in biochemistry, and this importance was converted into a weighting factor to be used in the parameter optimization procedure. Reference data representing species consisting only of core elements were given their default weight. When other elements were present, the weight was set to the default weight times the smallest multiplier shown in Table 2. Thus the default weight for a reference datum involving tetramethyllead, Pb(CH3)4, would be multiplied by 0.8, reflecting the fact that this species contains an element in the main group set.
For a given set of parameters, P, optimization proceeds by calculating the values of all the Q_calc(i), their first derivatives with respect to each parameter, P(j), and the second derivatives with respect to every pair of parameters. Evaluating these quantities is time-consuming, and considerable effort was expended in minimizing the need for explicit evaluation of these functions. The most efficient strategy developed [7] involved assuming that, in the region of parameter space near to the current values of the parameters, the values of the first derivatives of the Q_calc(i) with respect to P were, at least to a first approximation, constant. By making this assumption the values of the parameters could then be updated using perturbation theory. Because the assumption is only valid in the region of the starting point in parameter space, periodically the focus was moved to the new point in parameter space and a complete explicit re-evaluation of all the functions was performed. The parameter optimization process terminated when the scalar of the first derivatives dropped below a preset limit. This process was fully automated, and for given sets of reference data and parameters, parameter optimization could be performed rapidly, easily, and reliably.
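The general shape of such a procedure can be illustrated with a minimal Python sketch. It is not the procedure of ref. [7]: the perturbation-theory update is replaced here by a plain Gauss-Newton step, the derivatives are taken by finite differences, and all names (`optimize`, `solve`, `toy_model`) and the toy model itself are hypothetical. Each pass through the loop corresponds to one explicit re-evaluation, followed by an update that treats the first derivatives of Q_calc(i) as constant.

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def optimize(q_calc, p, q_ref, g, tol=1e-8, max_cycles=50, h=1e-6):
    """Minimize S = sum_i (g_i (Q_calc(i) - Q_ref(i)))^2 over the parameters p."""
    n_ref, n_par = len(q_ref), len(p)
    for _ in range(max_cycles):
        # Explicit evaluation of Q_calc and of the weighted residuals.
        q0 = q_calc(p)
        r = [g[i] * (q0[i] - q_ref[i]) for i in range(n_ref)]
        # First derivatives of the weighted Q_calc(i) with respect to each P(j).
        jac = []
        for j in range(n_par):
            shifted = p[:]
            shifted[j] += h
            qj = q_calc(shifted)
            jac.append([g[i] * (qj[i] - q0[i]) / h for i in range(n_ref)])
        # Gradient of S; stop when its norm falls below the preset limit.
        grad = [2.0 * sum(jac[j][i] * r[i] for i in range(n_ref)) for j in range(n_par)]
        if math.sqrt(sum(x * x for x in grad)) < tol:
            break
        # Assume the derivatives are constant near p: solve (J^T J) dp = -J^T r.
        jtj = [[sum(jac[j][i] * jac[k][i] for i in range(n_ref)) for k in range(n_par)]
               for j in range(n_par)]
        dp = solve(jtj, [-0.5 * gj for gj in grad])
        p = [p[j] + dp[j] for j in range(n_par)]
    return p


def toy_model(p):
    """Hypothetical stand-in for Q_calc: a two-parameter exponential decay."""
    return [p[0] * math.exp(-p[1] * x) for x in range(5)]


reference = toy_model([2.0, 0.5])
fitted = optimize(toy_model, [1.5, 0.4], reference, [1.0] * 5)
```

In the toy problem the reference values are generated from known parameters, so the fit can be checked by comparing `fitted` with `[2.0, 0.5]`.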
Sequence of optimization of parameters
Notwithstanding the reliability of the parameter optimization procedure, a simple global optimization of all the parameters for all 70 elements, involving over 9,000 discrete species, was found to be impractical because of the large number of derivatives involved. Such an optimization would involve over 2,000 parameters and over 10,000 reference data. The set of second derivatives alone would consist of about 2×10¹⁰ terms, that is, roughly 2,000²/2 = 2×10⁶ parameter pairs for each of the 10,000 reference data. With more powerful computers, evaluating such large sets of derivatives might be practical some day, but even then, one faulty reference datum or one faulty initial parameter value would ruin an optimization run. The strategy of parameter optimization was therefore approached with great caution, and the procedure finally adopted was as follows:
Because the elements H, C, N, and O are of paramount importance in biochemistry, and because large amounts of reference data are available, the starting point for parameter optimization involved the simultaneous optimization of parameters for these four elements. For the purposes of discussion, this set of four elements will be called the “core elements”.
Once stable parameters had been obtained, parameters for other elements important in organic chemistry were optimized in two stages. First, the parameters for the core elements were held constant, and parameters for the elements F, P, S, Cl, Br, and I were optimized one at a time. Then all parameters for all ten elements were simultaneously optimized. This set (the organic elements) was then used as the starting point for parameterizing the rest of the main group.
The same sequence was followed for the rest of the main-group elements. That is, parameters for each element were optimized while freezing the parameters for the organic elements. Then, once all the elements had been processed, all parameters for all of the 39 main-group elements, plus zinc, cadmium, and mercury, were optimized simultaneously.
When parameters for the transition metals were being optimized, all parameters for the main group elements were held constant. There were several reasons for this. Most importantly, the reference data for the transition metals, particularly the thermochemical data, were of lower quality, so one consideration was to prevent the transition metals from having a deleterious effect on the main-group elements. Another important consideration was that most compounds involving transition metals otherwise involved only elements of the organic set. Since parameters for these elements had been optimized using a training set consisting of all the main-group elements, the values of the optimized parameters would likely be relatively insensitive to the influence of the small number of additional reference data involving transition metals.
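The main-group part of this sequence can be written out as a short Python sketch that lists, for each step, which elements are being optimized and which are frozen. The generator `staged_schedule` is a hypothetical helper introduced for illustration, not part of any published code. The transition-metal stage, in which all main-group parameters stay frozen, is not modeled.

```python
def staged_schedule(stages):
    """Yield (optimized, frozen) element lists for a staged parameterization.

    `stages` is an ordered list of element groups. The first group is
    optimized simultaneously. For every later group, each element is
    optimized on its own with all earlier groups frozen, and then all
    elements processed so far are optimized simultaneously.
    """
    done = []
    for k, group in enumerate(stages):
        if k > 0:
            for element in group:
                # One element at a time; earlier groups are held constant.
                yield ([element], list(done))
        done = done + list(group)
        # Simultaneous optimization of everything processed so far.
        yield (list(done), [])


core = ["H", "C", "N", "O"]
organic = ["F", "P", "S", "Cl", "Br", "I"]
steps = list(staged_schedule([core, organic]))
```

For the core and organic groups this gives eight steps: one joint optimization of the core elements, six single-element optimizations with the core frozen, and a final joint optimization of all ten elements.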
In general, all parameters for a given element were optimized simultaneously; this was both efficient and convenient. In some optimizations, specifically those involving a new element, only sub-sets of parameters were used. Three main sub-sets were used:
Parameters that determine atomic electronic properties
For most elements, atomic energy levels are determined by six parameters: the one-electron one-center integrals Uss, Upp, Udd, and the internal orbital exponents ζsn, ζpn and ζdn. If the heat of ionization and sufficient atomic energy level data were available, these quantities could be uniquely defined; there would be no need for the use of molecular reference data. These parameters were the first to be optimized whenever an optimization was started for an element that had not previously been parameterized.
Parameters that determine molecular electronic properties
Two of the more important electronic molecular properties are the dipole moment, which indicates the degree of polarization within a molecule, and the ionization potential. These properties are determined primarily by 12 parameters: the six parameters that determine atomic electronic properties, plus six additional parameters, namely βs, βp, and βd and the Slater orbital exponents ζs, ζp, and ζd. In the second stage of parameter optimization, the first six parameters were held constant at the values defined using atomic data and the second set was optimized. During this operation, all geometries were fixed at their reference values.
Parameters that determine geometries
As soon as an initial optimized set of electronic parameters was available, the diatomic and other core-core parameters could be optimized. The most efficient process was to optimize these parameters initially without allowing the electronic parameters or the molecular geometries to optimize. If geometries were allowed to optimize, optimization of the core-core parameters would be slowed considerably, because of the tight dependency of the optimized geometries on the values of the core-core parameters, and vice versa.
As soon as all parameters had been optimized using fixed geometries, the geometries were allowed to relax and the parameters that determine geometry were re-optimized. After that there would be three sets of incompletely optimized parameters: the six atomic electronic parameters, the six molecular electronic parameters and the core-core parameters. The only remaining operation was the simultaneous optimization of all the parameters. If the training set of reference data was insufficient to unambiguously define the values of all the parameters, then, at that stage, the potential existed for the parameters to become ill-defined. An example of this would be where there were too few atomic energy levels to allow all six parameters in the first set to be defined. To allow for this, a penalty function was defined for each parameter. If the value of a parameter fell outside pre-defined limits, the error function S was incremented by a constant times the square of the excess. No penalty was applied if the value of a parameter was between the pre-defined limits; that is, no bias was applied to the numerical value of a parameter. During the early stages of simultaneous optimization of all the parameters for a given element the penalty function was used frequently. In the later stages the penalty function was invoked rarely, and then only when there was a distinct shortage of reference data.
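A penalty of this kind can be sketched in Python as follows. The function names, the limits, and the constant `k` are all assumptions made for illustration; the text does not give the actual limits or the value of the constant.

```python
def parameter_penalty(value, lower, upper, k):
    """Return k * (excess)^2 if value lies outside [lower, upper], else 0."""
    if value < lower:
        excess = lower - value
    elif value > upper:
        excess = value - upper
    else:
        # Inside the limits: no penalty, so no bias on the parameter value.
        return 0.0
    return k * excess ** 2


def penalized_error(s, values, limits, k):
    """Add the penalty for every parameter to the error function S.

    `limits` holds one hypothetical (lower, upper) pair per parameter.
    """
    return s + sum(
        parameter_penalty(v, lo, hi, k) for v, (lo, hi) in zip(values, limits)
    )
```

Because the penalty is exactly zero between the limits, it affects the optimization only when a poorly determined parameter starts to drift outside its allowed range.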