Minimal set of molecule-adapted atomic orbitals from maximum overlap criterion

The criterion of maximum overlap with the canonical free-atom orbitals is used to construct a minimal set of molecule-intrinsic orthogonal atomic orbitals that resemble their promolecular origins as closely as possible. Partial atomic charges derived from population analysis within the representation of such molecule-adapted atomic orbitals are examined for the first-row hydrides and compared with charges from other methods. The maximum overlap criterion is also used to approximate the exact free-atom orbitals obtained from ab initio calculations in an arbitrary basis set, and the influence of the resulting fitted canonical atomic orbitals on the properties of the molecule-adapted atomic orbitals is briefly discussed.


Introduction
It has recently been argued [1] within the framework of the Orbital Communication Theory (OCT) [2][3][4] of the chemical bond that the minimum basis (MB) of atomic orbitals (AO) occupied in the promolecular system of non-interacting atoms gives rise to the most intuitively "chemical" account of bond covalency/ionicity and provides an understanding of the diverse factors conditioning the efficiency of the AO interactions. It has turned out that, in the case of molecules with typical covalent bonds, the amount of information about electron localization calculated within the MB set of atomic orbitals is (to a certain degree) invariant with respect to basis-set enlargement [1].
The maximum overlap criterion (MOC) [5,6] has been proposed to numerically confirm that the MOC-approximated (fitted) small basis and the reference minimum set of AOs indeed generate practically identical OCT-indices.
The criterion of maximum overlap has also been used [7] to approximate the wave function calculated within a general (extended) basis set by the molecular orbitals (MO) calculated within an external minimal basis (EMB), i.e. Huzinaga's MINI basis functions [8]. It has been demonstrated that the electron population analysis (EPA) based on the resulting EMB orbitals gives rise to definitely more reliable and sensible partial atomic charges (PACs) than the corresponding Mulliken's and Löwdin's population analyses (MPA [9][10][11][12] and LPA [13], respectively), especially if calculated within extended sets of basis functions. Moreover, EMB charges have turned out to be comparable to PACs calculated within the representation of natural atomic orbitals (NAO) [14], and they preserve the proper convergence profile when the basis set is systematically enlarged.
The main goal of this paper is to utilize the MOC scheme (in the form presented in [7]) to generate the molecule-intrinsic minimal-basis (IMB) orbitals that most closely resemble (in the least-squares sense) the canonical free-atom orbitals (FAO) of the relevant promolecule.

Method details
Within the framework of molecular-orbital theory, the one-determinant wavefunction of the closed-shell ground-state molecular system of bonded atoms is fully determined by the subspace of occupied MOs, $|\psi^o\rangle$, calculated within a general set of non-orthogonal basis functions (not necessarily centered on particular atoms), $|\chi\rangle$, i.e. with overlap matrix $S_\chi = \langle\chi|\chi\rangle \neq \mathbf{1}$:

$$|\psi^o\rangle = |\chi\rangle\, C^o_\chi .$$

In the above equation the rectangular matrix $C^o_\chi$ collects the relevant linear-combination (LC) coefficients for the occupied molecular orbitals. The complementary subspace of virtual MOs is denoted $|\psi^v\rangle$. The dimension of the Hilbert space of all molecular orbitals, $n$, is the simple sum of the dimensions of the occupied and virtual MO subspaces, $n = n^o + n^v$; furthermore, $n^o$ can be expressed by the number of core MOs ($n^o_c$) and valence MOs ($n^o_v$), $n^o = n^o_c + n^o_v$. The number of basis functions $|\chi\rangle$ is denoted by $n_\chi$.
Now, let us consider the corresponding promolecular system given by the relevant non-bonded atoms at their molecular positions, calculated using the ROHF/GVB method with explicitly assumed fractional occupation numbers to ensure spherical symmetry of all atoms. For a separate atom $X$, the subspace of the resultant occupied canonical free-atom orbitals, $|\phi^0_X\rangle$, determines the minimal set of atomic orbitals strictly assigned to the atom in question; the matrix $B^0_X$ collects the linear-combination coefficients related to this subspace of occupied FAOs. Grouping the relevant LC matrices of all constituent atoms in the molecule therefore gives rise to the promolecular minimal basis of canonical atomic orbitals, $|\phi^0\rangle$, within the representation of $|\chi\rangle$. The number of all promolecular FAOs, $b$, is the simple sum of the number of core orbitals $|\phi^0_c\rangle$, $b_c$, and the number of valence orbitals $|\phi^0_v\rangle$, $b_v$, i.e. $b = b_c + b_v$. At this point it has to be mentioned that, generally, $b_c = n^o_c$ but $b_v > n^o_v$ and hence $b > n^o$, i.e. the number of FAOs is greater than the number of occupied MOs in the molecule; for example, the promolecule of CH4 contributes $b = 9$ FAOs (five on carbon and one on each hydrogen), whereas $n^o = 5$. The definition of molecule-intrinsic AOs therefore also requires some orbitals from the virtual MO subspace, which have to be chosen appropriately.
Recently, an efficient algorithm has been proposed for determining the $n^v_v$-dimensional virtual valence subspace within the framework of quasi-atomic minimal basis orbitals (QUAMBO) [15,16], which constitute the molecule-intrinsic minimal basis of valence atomic orbitals, $|\phi_v\rangle$. However, the use of the maximum overlap criterion allows one to easily generalize the original procedure to cover the core atomic orbitals as well. In short, the alternative algorithm that transforms the occupied non-orthogonal FAOs directly into orthogonal IMB orbitals, $|\phi^0\rangle \to |\phi\rangle$, can be summarized in two steps, in which the unitary matrix $U^v_\phi$ diagonalizes the appropriate metric matrix $W^v_\phi$. More details about the virtual valence MO subspace given by $C^v_\chi$ can be found in the original work [15]. The subspaces of occupied and virtual valence MOs can be collectively written as $|\psi\rangle$. The maximum overlap criterion is then used to find the molecule-intrinsic orthogonal atomic orbitals $|\phi\rangle$ that most closely resemble the canonical atomic orbitals $|\phi^0\rangle$, with the solution following from [7]. The presented procedure gives rise to a set of molecule-adapted atomic orbitals that preserves the inner-shell AOs as far as possible and yields effective valence AOs that reveal the influence of the network of chemical bonds in the molecule. In accordance with (8), the contracted set of molecular orbitals $|\psi\rangle$ can be expressed within the representation of the atomic orbitals $|\phi\rangle$.
Constructing such a set of IMB orbitals involves, each time, additional calculations of the orbitals $|\phi^0\rangle$ for the separated atoms. However, the maximum overlap criterion can also be successfully utilized to streamline the whole procedure.
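As an illustration of the orthogonalization idea, the following minimal Python/NumPy sketch applies the textbook symmetric (Löwdin) orthogonalization, which is known to yield the orthonormal set closest to a given non-orthogonal set in the least-squares sense. It is not a reproduction of the equations of [7]; the function name `symmetric_orthogonalize`, the toy overlap matrix `S` and the coefficient matrix `B0` are assumptions introduced purely for illustration.

```python
import numpy as np

def symmetric_orthogonalize(B0, S):
    """Return coefficients of the orthonormal orbitals closest to B0.

    B0 : (n_chi, b) coefficients of non-orthogonal orbitals in the basis chi
    S  : (n_chi, n_chi) overlap matrix of the basis chi
    (Hypothetical helper; textbook Loewdin orthogonalization.)
    """
    # Metric (overlap) matrix of the non-orthogonal orbitals.
    W = B0.T @ S @ B0
    # Diagonalize the metric: W = U diag(w) U^T with U orthogonal.
    w, U = np.linalg.eigh(W)
    # Inverse square root of the metric.
    W_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    return B0 @ W_inv_sqrt

# Toy data (hypothetical): three basis functions, two non-orthogonal orbitals.
S = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
B0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.2, 0.5]])

B = symmetric_orthogonalize(B0, S)
overlap_new = B.T @ S @ B             # overlap among the new orbitals
overlap_with_original = B.T @ S @ B0  # overlap with the starting orbitals
```

In this form the new orbitals are orthonormal, and their overlap matrix with the starting orbitals equals the square root of the metric matrix, so it is symmetric: no orbital is favoured over another during orthogonalization.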
In particular, the matrices $B^0_X$ for the consecutive elements $X$ of the periodic table, calculated with high accuracy within an extended basis set $|\chi\rangle$, can either reside on disk or be stored in the quantum-chemistry program. Then, according to the original MOC scheme [6], the required LC coefficients within an arbitrary set of basis functions $|\chi'\rangle$ can be calculated (exact to within the basis set superposition error) from these stored matrices. Here the labels $X$ and $X'$ conventionally refer to the atom-$X$-centered functions $|\chi_X\rangle$ and $|\chi'_{X'}\rangle$, respectively, and the resulting expressions involve the rectangular matrix of overlap integrals between these two basis subsets.
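The idea of re-expressing stored free-atom orbitals in another basis can be sketched as a least-squares projection followed by renormalization, which for a single orbital gives the normalized function of the target basis with maximum overlap with the reference orbital. The sketch below is an assumption-laden illustration rather than the scheme of [6]: it uses same-center, normalized s-type Gaussians, and all names (`s_overlap`, `fit_orbitals`) as well as the exponents and coefficients are hypothetical.

```python
import numpy as np

def s_overlap(exps_a, exps_b):
    """Overlap matrix between normalized, same-center s-type Gaussians."""
    a = np.asarray(exps_a, dtype=float)[:, None]
    b = np.asarray(exps_b, dtype=float)[None, :]
    return (2.0 * np.sqrt(a * b) / (a + b)) ** 1.5

def fit_orbitals(B_ref, S_target, S_cross):
    """Project reference orbitals onto a target basis and renormalize.

    B_ref    : (n_ref, k) coefficients in the reference basis
    S_target : (n_tgt, n_tgt) overlap matrix of the target basis
    S_cross  : (n_tgt, n_ref) overlaps between target and reference functions
    """
    # Least-squares projection onto the span of the target basis.
    C = np.linalg.solve(S_target, S_cross @ B_ref)
    # Renormalize each fitted orbital separately.
    norms = np.sqrt(np.einsum("ik,ij,jk->k", C, S_target, C))
    return C / norms

ref_exps = [0.15, 0.6, 2.5, 12.0]  # hypothetical "extended" exponents
tgt_exps = [0.2, 1.0, 6.0]         # hypothetical smaller target set

S_ref = s_overlap(ref_exps, ref_exps)
S_tgt = s_overlap(tgt_exps, tgt_exps)
S_cross = s_overlap(tgt_exps, ref_exps)

# One hypothetical reference orbital, normalized in the reference basis.
b = np.array([[0.3], [0.5], [0.3], [0.1]])
b /= np.sqrt(b.T @ S_ref @ b)

c = fit_orbitals(b, S_tgt, S_cross)
# Overlap integral between the fitted and the reference orbital (at most 1).
overlap_with_reference = (c.T @ S_cross @ b).item()
```

The quantity `1 - overlap_with_reference` plays the role of a deviation of the fitted orbital from the reference one. When several orbitals of one atom are fitted at once, the fitted set is in general no longer exactly orthogonal and may need to be re-orthogonalized.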

Numerical results
Numerical results presented in this section were obtained using a special program written by the authors to perform MOC calculations, as well as the ab initio quantum chemistry package GAMESS [17], the Natural Bond Orbital software NBO 5.0 [18] and the molecular visualization program MOLDEN [19]. Molecular calculations were carried out for species differing in chemical bond character, i.e. the following first-row hydrides: LiH, BeH2, BH3, CH4, NH3, H2O and HF (for comparison, calculations for LiF were performed as well), at the RHF level of theory and using experimental geometries, whereas calculations for free atoms were performed using the ROHF/GVB method with explicitly assumed fractional occupation numbers to ensure spherical symmetry of the atoms (more details can be found in the "Further Information" section of the GAMESS documentation [17]).

IMB-orbitals from MOC
First, let us analyze the sensitivity to the size of the basis set and the convergence properties of partial atomic charges involving IMB orbitals,

$$Q^{\mathrm{IMB}}_X = Z_X - N^{\mathrm{IMB}}_X$$

(here $Z_X$ stands for the atomic number of atom $X$ and $N^{\mathrm{IMB}}_X$ is the relevant electron population on atom $X$). These are compared with PACs from the following electron population analyses (PA):
- Mulliken's population analysis ($Q^{\mathrm{MPA}}_X$),
- Löwdin's population analysis ($Q^{\mathrm{LPA}}_X$),
- natural population analysis ($Q^{\mathrm{NPA}}_X$),
- electrostatic potential (ESP)-derived atomic charges ($Q^{\mathrm{ESP}}_X$) [20],
- the recently proposed population analysis involving the EMB of AOs from the criterion of maximum overlap with Huzinaga's MINI basis functions ($Q^{\mathrm{EMB}}_X$) [7].
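To make the relation between electron populations and partial charges concrete, the following minimal sketch evaluates charges of the form Z minus N for the two standard schemes, Mulliken and Löwdin, from a density matrix and an overlap matrix. The function name `partial_charges`, the two-function toy system and its numbers are hypothetical; in an orthonormal basis (such as a set of orthogonal atomic orbitals) both schemes reduce to summing the diagonal elements of the density matrix over the orbitals of each atom.

```python
import numpy as np

def partial_charges(P, S, atom_of_basis, Z, scheme="mulliken"):
    """Partial charges Q_X = Z_X - N_X from density matrix P and overlap S.

    atom_of_basis[mu] is the index of the atom carrying basis function mu.
    (Hypothetical helper for illustration.)
    """
    if scheme == "mulliken":
        gross = np.diag(P @ S)                  # Mulliken gross populations
    elif scheme == "lowdin":
        s, V = np.linalg.eigh(S)
        S_half = V @ np.diag(np.sqrt(s)) @ V.T  # S^(1/2)
        gross = np.diag(S_half @ P @ S_half)    # Loewdin populations
    else:
        raise ValueError("scheme must be 'mulliken' or 'lowdin'")
    N = np.zeros(len(Z))
    np.add.at(N, atom_of_basis, gross)          # sum populations per atom
    return np.asarray(Z, dtype=float) - N

# Toy two-electron system (hypothetical): one function on each of two atoms.
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
c = np.array([0.8, 0.3])
c = c / np.sqrt(c @ S @ c)     # normalize the doubly occupied MO
P = 2.0 * np.outer(c, c)       # closed-shell density matrix
Z = [1.0, 1.0]                 # nuclear charges of the toy atoms
atoms = [0, 1]

q_mulliken = partial_charges(P, S, atoms, Z, "mulliken")
q_lowdin = partial_charges(P, S, atoms, Z, "lowdin")
```

Both schemes conserve the total charge, but they distribute it differently between the atoms as soon as the basis functions overlap, which is the origin of the basis-set sensitivity discussed in this section.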
The list of basis sets used comprises: STO-3G (a), 6-31G (b), 6-311G (c), 6-311G** (d), 6-311++G** (e), 6-311++G**$^{d}$ (f) and 6-311++G**$^{t}$ (g); here the superscripts d and t stand for doubled and tripled sets of the relevant polarization functions on each atom, respectively. Table 1 collects the partial atomic charges of the non-hydrogen atoms for all hydrides under consideration and the PACs of the fluorine atom in the LiF molecule, calculated using various PA schemes and the most extended basis set, g. Even a cursory analysis of $Q^{\mathrm{LPA}}_X$ seems to reaffirm the well-known shortcoming of population analysis based on symmetrically orthogonalized basis functions (the charges converge to nearly zero values as the basis set is enlarged). Although PACs calculated within the framework of MPA assume more sensible values (with the exception of the CH4 molecule), they still seem to be somewhat underestimated with regard to the reference atomic charges $Q^{\mathrm{NPA}}_X$ and $Q^{\mathrm{ESP}}_X$. Moreover, the atomic charges $Q^{\mathrm{MPA}}_X$ are much more sensitive to the choice of basis set than $Q^{\mathrm{LPA}}_X$ [22] and, as proved by Ruedenberg [23], in the limit of a complete basis set they can actually take any value between $-\infty$ and $+\infty$. Details of the characteristic basis-set dependencies of $Q^{\mathrm{LPA}}_X$, $Q^{\mathrm{MPA}}_X$ and charges from other methods for two representative cases, CH4 and NH3, are shown in Figs. 1 and 2.
For the most part, the atomic charges $Q^{\mathrm{NPA}}_X$, $Q^{\mathrm{ESP}}_X$, $Q^{\mathrm{EMB}}_X$ and $Q^{\mathrm{IMB}}_X$ differ sharply from the MPA- and LPA-derived ones; they converge to definitely more reasonable values and thus seem to give an adequate picture of the molecular electronic structure in atomic resolution. This is especially evident if one compares the results for LiF; only atomic charges from NPA, ESP, EMB and IMB correctly predict the predominantly ionic (90-100 %) character of the Li-F chemical bond. However, as follows from a thorough analysis of the results in Table 1, in most cases the partial atomic charges $Q^{\mathrm{NPA}}_X$ and $Q^{\mathrm{EMB}}_X$ correlate quite well, and both tend to predict slightly more polarized bond densities than $Q^{\mathrm{IMB}}_X$ and $Q^{\mathrm{ESP}}_X$. These tendencies were also probed using the quantities $\delta Q_H$, which involve the charges on hydrogen atoms from various PA methods, $Q_H$, and the electronegativity differences on the Pauling scale [24], $\xi_{XH}$. The quantities $Q_{\mathrm{ref}}$ and $\xi_{\mathrm{ref}}$ entering $\delta Q_H$ refer to the diatomic with possibly the most polarized chemical bond, which in this case is LiF. Therefore, for a particular molecule, $\delta Q_H$ measures the distance between the relative polarizations of the H-X chemical bond evaluated using the calculated partial atomic charges and Pauling's electronegativity differences.
Fig. 3 Distances between relative bond-ionicity descriptors of X-H based on Pauling's electronegativity differences and those derived from partial atomic charges calculated using various schemes. Method: RHF/6-311++G** (with triple set of polarization functions), experimental geometries
Results are presented in Fig. 3. Since natural atomic charges are known to correlate quite well with electronegativity differences [14], let us confine ourselves to comparing the results for NPA and IMB charges only. It follows directly from Fig. 3 that the electronic structures predicted by the newly proposed IMB orbitals are generally closer to those resulting from Pauling's electronegativity differences ($\delta Q_H$ for IMB and NPA amounts to 0.2072 and 0.3140, respectively), especially for the nonmetal hydrides ($\delta Q_H$(IMB) = 0.0661 and $\delta Q_H$(NPA) = 0.1805). Just like the natural atomic charges, the IMB charges indicate a far more ionic Be-H bond than predicted by other methods, while in the case of LiH the IMB charge is closer to the prediction from $\xi_{\mathrm{LiH}}$. However, as also follows from Table 1, the most conspicuous difference between the atomic charges involving maximum overlap with the minimum set of AOs (IMB and EMB) and the PACs from NPA (as well as ESP) concerns the BH3 molecule. In particular, within the description involving IMB orbitals the boron atom seems to exhibit almost purely nonmetallic character (since $\xi_{\mathrm{BH}} \approx 0.16$ and $\delta Q_H$ is comparable with that for CH4, NH3, H2O and HF), whereas within the ESP and NPA approaches the boron atom reveals rather metallic/semimetallic character. A similar conclusion can be drawn from an analogue of (16) (Fig. 4) that measures the difference between the relative polarizations of the H-X bond evaluated from various PACs, relative Pauling's electronegativity differences (ELN) and relative "charges" from the ESP approach.
Table 2 collects electron populations for three different sets of minimal-basis AOs, i.e. those from the IMB, EMB and NPA formalisms. One should notice that within the subset of minimal-basis natural atomic orbitals the Rydberg-type functions are excluded, and thus the orbital populations usually sum only approximately to the overall number of electrons, $N$.
At first glance it is clear that the representation of IMB orbitals ensures the best separation of lone-pair and core orbitals; comparable results are obtained using NAOs. It is somewhat puzzling, however, that within the representation of EMB orbitals the lone pairs exhibit better separation than the core orbitals. Furthermore, the electron populations of the 2s orbital, $N^{\mathrm{EMB}}_{2s}$, assume significantly greater values than $N^{\mathrm{IMB}}_{2s}$ and $N^{\mathrm{NPA}}_{2s}$ (this feature is characteristic of "physically" orthogonalized atomic orbitals [25]). In the case of "p"-type orbitals, the NAO populations differ sharply from those based on minimal-basis AOs involving the MOC approach only for the borane molecule. Below we present averaged (over all molecules) differences between the relevant orbital populations from two different MB representations, which concisely recap Table 2:

MOC-approximated FAOs
Let us investigate the quality of free-atom AOs within an arbitrary basis set (from a to g) obtained using the MOC method from canonical AOs calculated at the ROHF/GVB level of theory within three reference sets of functions, a, d and g. Table 3 presents averaged (over all orbitals and all available basis sets a-g) overlap integrals, $S_{\mathrm{ref}}$, between the reference atomic orbitals calculated using the ab initio method, $|\chi_{\mathrm{ref}}\rangle$, and those obtained using the maximum overlap criterion, $|\chi_x\rangle$. Averaged quantities with subscripts were calculated only for basis sets less accurate ($|\chi_x\rangle$ = a, b, c) or more accurate ($|\chi_x\rangle$ = e, f, g) than the reference set of functions $|\chi_{\mathrm{ref}}\rangle$ = d.
In accordance with expectations, the results in Table 3 clearly indicate that using the maximum overlap criterion with the reference FAOs calculated within the set of functions g allows one to reproduce the canonical free-atom orbitals within any basis set a-f with the highest accuracy; the typical deviation of MOC-derived orbitals from ab initio FAOs is $10^{-5}$. Using the reference basis set with only one set of polarization functions and without diffuse ones, d, gives rise to very similar results provided that $|\chi_x\rangle$ = a, b, c; approximation of more accurate FAOs ($|\chi_x\rangle$ = e, f, g) leads to slightly worse results, with the deviation reaching $10^{-4}$. Obviously, for reference FAOs calculated with a very poor set of functions the MOC procedure does not allow one to obtain atomic orbitals of satisfactory quality (the atom-averaged deviation is up to $10^{-2}$).
Table 3 Average (over all basis sets a-g) overlap integrals between MOC-approximated free-atom orbitals and the corresponding canonical FAOs obtained from ab initio calculations using reference basis sets d and g
It was of special interest to us to probe to what extent using MOC-approximated FAOs in the construction of molecule-adapted AOs influences the results of EPA. Table 4 collects the partial atomic charges $Q^{\mathrm{IMB}}_X$, calculated within the representation of IMB orbitals involving exact FAOs and various basis sets (a-g), as well as the corresponding PACs, $Q^{d}_X$ and $Q^{g}_X$, involving MOC-approximated FAOs calculated using the extended basis sets d and g, respectively. The quantities $\bar{\Delta}_X$ stand for the averaged (over all molecules excluding BH3) differences between the IMB charges calculated within a particular basis set and the corresponding atomic charge calculated using exact FAOs within basis set g; for negative exact PACs the respective differences were multiplied by a factor of −1.
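The averaging with a sign flip for negative reference charges can be sketched in a few lines. The function name `mean_signed_deviation` and the numbers below are hypothetical and serve only to show the arithmetic; they are not values from Table 4. The sign flip presumably ensures that a positive result always corresponds to a charge larger in magnitude than the reference, so that deviations on positively and negatively charged atoms do not cancel.

```python
import numpy as np

def mean_signed_deviation(q_basis, q_exact):
    """Average difference q_basis - q_exact, with the sign flipped
    wherever the exact (reference) charge is negative."""
    q_basis = np.asarray(q_basis, dtype=float)
    q_exact = np.asarray(q_exact, dtype=float)
    diff = q_basis - q_exact
    # Multiply by -1 for negative reference charges.
    adjusted = np.where(q_exact < 0, -diff, diff)
    return adjusted.mean()

# Hypothetical charges for two molecules (not taken from the paper).
q_exact = [0.30, -0.50]   # reference charges in the largest basis set
q_basis = [0.32, -0.53]   # charges in some smaller basis set

deviation = mean_signed_deviation(q_basis, q_exact)
```

Without the sign flip the two differences in this toy example (+0.02 and −0.03) would partly cancel; with it both count as an increase in charge magnitude.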
As follows from Table 4, using approximated free-atom orbitals in the construction of IMB orbitals has no significant influence on the calculated IMB atomic charges, especially for calculations involving the minimal set of basis functions. Indeed, the first column of numbers in Table 4 clearly indicates that within the very poor basis set a the atomic charges are particularly insensitive to the quality of the FAOs used in the MOC procedure (this becomes obvious if one recalls the results from Table 3).
On average, the atomic charges $Q^{d}_X$ and $Q^{g}_X$ deviate from the corresponding exact charges $Q^{\mathrm{IMB}}_X$ by 0.0029e and 0.0020e, respectively, and thus, in view of the accuracy of population analysis, they lead to the same conclusions about the electronic structure of molecules. However, a more discerning comparison of the atomic charges $Q^{\mathrm{IMB}}_X$ and $Q^{g}_X$ reveals that the latter are usually closer to the corresponding exact IMB charges (calculated within the set of functions g). On the other hand, the atomic charges $Q^{d}_X$ deviate significantly from the exact charges only in the case of more accurate calculations (if we exclude the BH3 molecule and take into consideration only basis sets of TZV type, the average difference between the most exact values of $Q^{d}_X$ and $Q^{\mathrm{IMB}}_X$ is 0.0043e, i.e. about 0.6 %). Indeed, the convergence profiles that emerge from the values of $\bar{\Delta}_X$ allow one to conclude that the more accurate the canonical FAOs used to construct a set of IMB orbitals, the closer to the exact values (in the sense of basis-set completeness) are the resulting IMB charges calculated within an arbitrary basis set.

Summary
In this work we have introduced and briefly examined a simple method of generating a minimal set of molecule-adapted atomic orbitals. In contrast to the previously proposed method involving a reference set of (external) minimal-basis orbitals (e.g. Huzinaga's MINI basis set) [7], in this approach we have used the criterion of maximum overlap with the set of free-atom orbitals obtained from ab initio calculations (using the same set of basis functions) for the corresponding promolecule. Hence, the resulting minimal-basis orbitals are intrinsic to individual molecules and consequently exhibit appropriate convergence properties as the number of basis functions used in the calculations increases.
It has also been demonstrated that the MOC scheme can be successfully utilized to approximate canonical free-atom orbitals within an arbitrary basis set using high-quality FAOs calculated once (and by default stored on disk); therefore, generating IMB orbitals does not require promolecular calculations each time. Moreover, partial atomic charges calculated within the representation of such molecule-adapted atomic orbitals tend to converge noticeably faster to their exact values in the limit of a complete basis set.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.