Introduction

Ferroelectrics are insulating materials with a nonzero electric polarization switchable by an applied electric field. Empirically, a ferroelectric is characterized by a measured polarization versus electric field hysteresis loop, with the spontaneous polarization being half of the measured change in polarization at zero electric field (Fig. 1a). Ferroelectric materials are critical for many technologies and have found applications in electronic devices such as memories, sensors, and capacitors, among others1. Well-known ferroelectrics include those in the perovskite family, such as BaTiO3, PbTiO3, and KNbO32. Ferroelectric materials with magnetic order, or multiferroics, find applications in which the magnetic order can be altered by an external electric field3,4. BiFeO3 is an example of a multiferroic where the electric polarization can be coupled to the magnetic order5,6,7.

Fig. 1: Schematic representation of the key properties of a ferroelectric.

a The measured polarization P as a function of the applied electric field ϵ and b the typical double-well profile of the energy E as a function of the polarization P of a ferroelectric material. Highlighted in b are an atomic-scale schematic of the polar and nonpolar phases and their energy difference ΔE, used as a simple indicator of polarization switchability within the Landau-Devonshire framework.

For more than three decades, ferroelectric materials have been studied with density functional theory (DFT) and the modern theory of polarization8,9,10,11,12,13,14, where extensive progress has been made in understanding the physical and chemical mechanisms underlying the emergence of macroscopic polarization in bulk and finite materials15,16. DFT with standard exchange-correlation functionals has also been shown to predict polarization with good accuracy, often matching values measured by experiments within a small error1.

A necessary condition for having a nonzero spontaneous polarization is that the material possess a polar space group. However, many polar crystals with a nonzero polarization are not switchable and, therefore, not ferroelectric. Landau-Devonshire theory is often used to rationalize and predict the likelihood that the polarization in a material is switchable. Within this framework, a key descriptor for switchability is the energy gained by the symmetry-breaking atomic distortion taking a higher-symmetry nonpolar phase to a lower-symmetry polar crystal structure (Fig. 1b). Early studies by Abrahams17,18,19 used this descriptor and symmetry criteria to search for switchable polar materials in crystallographic databases, predicting candidate ferroelectrics, some of which have more recently been synthesized and confirmed20,21,22,23. Nowadays, Abrahams’ analysis can be readily performed using the online software on the Bilbao Crystallographic Server (BCS), such as PSEUDO24, which has been used in prior work to find a nonpolar reference structure starting from a polar one in certain families of compounds25,26,27,28.

The advent of high-throughput DFT approaches and materials databases has enabled new search strategies and larger-scale high-throughput ferroelectric materials searches29,30,31,32,33,34. Smidt et al.35 developed the first automated workflow for ferroelectric materials discovery and searched the Materials Project (MP) database for pairs of polar and nonpolar structures connected by a subgroup-group relationship. DFT values of the polarization of hundreds of polar compounds were automatically computed using the modern theory of polarization and an interpolation method for automatically selecting the branch of polarization13,14, the latter also having been implemented in the Atomate software36. For known compounds, the computed polarization was compared with experiment and found to agree to within 10% or better. Ultimately, ref. 35 identified 200 ferroelectrics, of which 74 were known or previously proposed and 126 were new candidates, some of which were later experimentally investigated37,38.

Although ref. 35 represents the largest database of potential ferroelectric materials automatically generated via first-principles methods to date, it did not discriminate for synthesizability and stability. Further, it neglected polar compounds that did not have a nonpolar counterpart phase in the database. Since there are ~10,000 polar insulating materials in the MP database without a nonpolar reference phase, relaxing this constraint and generating a nonpolar reference for each polar material would increase the number of candidate ferroelectrics.

Here, we present a new workflow for ferroelectric materials discovery and use it to compute the polarization of 447 polar materials in the MP database, predicting 182 potential candidate ferroelectric materials, 130 of which are previously uninvestigated. We make use of pseudosymmetries to generate a nonpolar reference structure starting from an initial polar structure, using the PSEUDO tool24 of the BCS (https://www.cryst.ehu.es/). While pseudosymmetries have been used to find reference nonpolar structures of potential ferroelectrics in the past39,40,41, they have yet to be applied in an automated fashion at the large scale considered here. In the present work, we automatically query and extract data from the PSEUDO web interface, enabling the automatic search of a nonpolar reference for any polar structure. In particular, with such a method, we generate a nonpolar reference and compute the polarization for a large set of insulating polar structures present in the MP database that do not have a nonpolar reference in the MP database. We focus on materials that are non-magnetic in the MP database, and we consider only materials that have formation energy on the convex hull, increasing their relevance to experiment. The results are publicly available through the MPContribs platform (https://contribs.materialsproject.org/). None of the polar compounds presented here were studied in ref. 35. Further, for the subset of materials that are most promising as ferroelectrics, we develop a ranking that combines the computed polarization and polar-nonpolar energy difference (Fig. 1b) with an automated literature search and a machine learning-based prediction of synthesizability42 to provide suggestions for new potential ferroelectrics for experimental investigation.

Results and discussion

Screening

We use the MP database for our high-throughput screening for ferroelectric materials. We consider materials in the database that have a polar space group, that are insulating with a DFT-PBE Kohn-Sham band gap higher than 0.1 eV, and that are non-magnetic (as reported by the MP). We consider only materials that have an energy above hull equal to zero eV per atom, that is, materials that are ‘on the hull’. The initial set defined by these criteria contains 1978 materials. We note that many materials reported in the MP to be above but within 0.1 eV per atom of the hull are synthesizable43,44 (e.g., BiFeO3). The constraint of zero energy above hull can be easily lifted, doubling or more the number of potential candidates; however, as this also significantly increases the computational cost and analysis effort, we leave these systems for future work. Consistent with potential synthesizability, we focus only on binary, ternary, and quaternary compounds with fewer than 56 atoms per primitive cell. Also, we exclude materials containing rare-earth elements because such elements can possess partially filled f-shells and strong spin-orbit coupling, and can require a case-by-case treatment. Finally, we restrict our focus to polar materials that either do not have a nonpolar reference phase in the MP database or that have such a reference that was not available at the time the database in ref. 35 was developed. None of the polar compounds presented here were studied in ref. 35.
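
The screening criteria listed above can be illustrated with a minimal Python sketch. This is not the workflow code used in this work: the record layout (the dictionary keys) and the example values are hypothetical, and treating "rare-earth elements" as the lanthanides only is an assumption made for the illustration. The space-group number ranges are those belonging to the ten polar point groups.

```python
# Space-group number ranges belonging to the ten polar point groups
# (1, 2, m, mm2, 4, 4mm, 3, 3m, 6, 6mm).
POLAR_SG_RANGES = [(1, 1), (3, 5), (6, 9), (25, 46), (75, 80), (99, 110),
                   (143, 146), (156, 161), (168, 173), (183, 186)]

# Assumption for this sketch: "rare-earth" means the lanthanides only.
LANTHANIDES = {"La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
               "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu"}


def is_polar(spacegroup_number):
    """Return True if the space group belongs to a polar point group."""
    return any(lo <= spacegroup_number <= hi for lo, hi in POLAR_SG_RANGES)


def passes_screen(entry):
    """Apply the screening filters to one record (hypothetical dict layout)."""
    elements = set(entry["elements"])
    return (
        is_polar(entry["spacegroup_number"])
        and entry["band_gap"] > 0.1             # eV, DFT-PBE gap
        and not entry["is_magnetic"]
        and entry["energy_above_hull"] == 0.0   # eV per atom, on the hull
        and 2 <= len(elements) <= 4             # binary to quaternary
        and entry["nsites"] < 56                # atoms per primitive cell
        and not (elements & LANTHANIDES)
    )


# Illustrative record with made-up numerical values.
example = {"elements": ["Li", "Ta", "O"], "spacegroup_number": 161,
           "band_gap": 3.7, "is_magnetic": False,
           "energy_above_hull": 0.0, "nsites": 10}
print(passes_screen(example))
```

Relaxing the hull constraint, as discussed above, would amount to replacing the equality test with a threshold on the energy above hull.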

For all polar materials studied, we automatically generate a nonpolar reference phase using the BCS PSEUDO tool. Among the higher-symmetry structures proposed by PSEUDO, we consider only the nonpolar ones, rank them according to their maximum atomic displacement (MAD) relative to the polar phase, and select the nonpolar structure that has the smallest MAD. This approach automatically provides nonpolar structures most likely to have minimal energy differences with the polar phase. The cases where PSEUDO returns only polar structures, about 40% of the initial set, are discarded. A natural extension is to re-run the pseudosymmetry search on each of these newly identified polar structures and find the closest nonpolar structure with higher symmetry. This extension of our workflow would recover ~40% of the compounds, but we leave it to future work. Finally, since the nonpolar structure in its conventional setting obtained from PSEUDO is already in the same setting as the polar one, it can be directly used in the automated workflow. To reduce the computational time whenever possible, we reduce both structures to their primitive unit cells.
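
The selection rule described above (keep only nonpolar proposals, then take the one with the smallest MAD) can be sketched as follows. The list-of-dictionaries format and the key names are hypothetical and do not reflect the actual output format of PSEUDO or of our interface; the numerical values are made up.

```python
def pick_nonpolar_reference(candidates):
    """Return the nonpolar candidate with the smallest MAD, or None.

    `candidates` is a list of dicts with hypothetical keys:
    'spacegroup' (label), 'is_polar' (bool), and 'mad' (Angstrom).
    """
    nonpolar = [c for c in candidates if not c["is_polar"]]
    if not nonpolar:
        # Only polar supergroups were proposed: the material is discarded.
        return None
    return min(nonpolar, key=lambda c: c["mad"])


# Made-up proposals for a single polar input structure.
proposals = [
    {"spacegroup": "R3m", "is_polar": True, "mad": 0.10},
    {"spacegroup": "R-3c", "is_polar": False, "mad": 0.45},
    {"spacegroup": "Pm-3m", "is_polar": False, "mad": 0.80},
]
print(pick_nonpolar_reference(proposals)["spacegroup"])
```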

The screening of the MP database with the above-mentioned filters and the subsequent pseudosymmetry search of the nonpolar parent structures returns 618 pairs of structures for which we compute the polarization. Eighty of these pairs had a calculation issue at some point in the workflow. Among the remaining 538 completed workflows, 447 were completed successfully and the polarization was computed. In 91 cases, either the nonpolar structure (69 cases) or one of the other structures (22 cases) is metallic, and the polarization calculation could not be completed. Figure 2a summarizes the screening process. In the following, the 447 compounds that completed the workflow are summarized, along with a brief comment on those with a nonpolar phase that is metallic within DFT-PBE. Fig. 2b summarizes the classifications made for these remaining compounds. Compounds that are simultaneously polar and magnetic, which require a careful assessment of the magnetic ground state, are not considered here and will be the subject of future work.

Fig. 2: Summary of the screening process and number of structures in output at each step.

a Subsets of candidate ferroelectric materials found during our screening and their size. Starting from the initial entries in the MP, filters on the number and type of elements are applied; PSEUDO is used to generate the nonpolar reference; and an automated workflow (WF) is used to compute the polarization and the polar-nonpolar phase energy difference. b Subsequent classification based on the polar-nonpolar phase energy difference, polarization, ICSD ids, and synthesizability (as computed following prior work42). In the charts, the abbreviations e.a.h., Eg, FeDB, #elements, no rare-earths, #sites, P, and ΔE stand for the energy above hull, the electronic band gap, the database of ferroelectrics in ref. 35, the number of elements, the exclusion of rare-earth elements, the number of sites, the polarization, and the polar-nonpolar energy difference, respectively. Paired and not paired refer to polar-nonpolar pairs present or not in the FeDB.

Main classification

We summarize the 447 compounds that emerge from our workflow in terms of their computed polarization P and the energy difference between polar and nonpolar phases ΔE, as shown in Fig. 3. The computed values of both quantities cover a large range. In particular, 50% of materials have ΔE in the range 0–100 meV per atom, and 70% possess P less than 50 μC cm−2. Based on previously reported values of these quantities for known ferroelectric and polar materials, such as BaTiO3, PbTiO3, LiNbO3, BiFeO3, YMnO3, HfO235,45, GaFeO346, AlN, Al0.5Sc0.5N47, ZnO48, and GaN (present work), computed with the same exchange-correlation functional (except for YMnO3, where the LDA functional was used) and nonpolar reference as those used here, we classify our 447 candidate materials into three tiers based on ΔE (Fig. 1b), indicated with different colors in Fig. 3.

Fig. 3: Main classification of the potential ferroelectrics in the dataset.

Computed polarization P (μC cm−2) and the polar-nonpolar phase energy difference ΔE (meV per atom) of polar compounds in the present dataset. Three tiers are highlighted according to the energy difference: the green tier contains potential ferroelectrics that are similar to common ferroelectrics; the orange tier contains polar materials where the energy difference could be tuned, and ferroelectricity appears, as it happens for AlN with Sc doping; the red tier contains polar materials with an energy difference that may be too high to allow ferroelectricity. The dashed box encloses a subset of materials most relevant as ferroelectrics and considered in the ranking and literature search. The negative values of the energy difference represent cases where the nonpolar reference structure has lower energy than the polar phase. Values of the ΔE and P for BaTiO3, PbTiO3, LiNbO3, BiFeO3, YMnO3, HfO235,45, GaFeO346, AlN, Al0.5Sc0.5N47, ZnO48, GaN (present work) are reported for comparison.

The first tier contains 222 materials with ΔE in the range 0–100 meV per atom (green region), similar to values reported for known ferroelectrics. The second tier contains 74 materials with ΔE in the range 100–200 meV per atom (orange region), similar to polar materials such as AlN. Pristine AlN is a polar material with a high P (~120 μC cm−2), but it is not a ferroelectric, as its P cannot be switched by an electric field. However, as found both theoretically47,49,50 and experimentally50, doping AlN with Sc atoms preserves P, reduces ΔE, and makes switching possible. Al1−xScxN has been synthesized50,51, and it is promising for applications52. Proposed compounds with ΔE in the same range may exhibit switchability via doping or other methods (e.g., strain, thin films). The third tier contains the remaining 95 compounds with ΔE above 200 meV per atom (red region). This value may be an upper limit for conventional ferroelectricity; to our knowledge, GaFeO3 has the largest predicted ΔE among materials that have been experimentally demonstrated to be ferroelectric46. These polar compounds, like GaN, could also be interesting for other properties (e.g., dielectric, piezoelectric, photovoltaic, nonlinear optical), but may not be bulk ferroelectrics, at least in pristine crystalline form.
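
As an illustration of the tiering described above, the following sketch maps a value of ΔE (in meV per atom) to a tier. The function name and the treatment of values exactly at 100 and 200 meV per atom are assumptions of this sketch; the text specifies only the ranges.

```python
def classify_tier(delta_e):
    """Map the polar-nonpolar energy difference (meV per atom) to a tier.

    Boundary handling (<= 100 and <= 200) is an assumption of this sketch.
    """
    if delta_e < 0:
        # The nonpolar reference is lower in energy than the polar phase.
        return "nonpolar reference more stable"
    if delta_e <= 100:
        return "tier 1 (green)"
    if delta_e <= 200:
        return "tier 2 (orange)"
    return "tier 3 (red)"


# Illustrative inputs only; these are not values from the dataset.
for value in (-5.0, 30.0, 150.0, 260.0):
    print(value, classify_tier(value))
```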

Subset of potential ferroelectrics

We now focus on the subset of 182 candidate ferroelectric materials in tiers 1 and 2 with a computed ΔE between 1 and 200 meV per atom and a predicted P larger than 10 μC cm−2. This subset is highlighted by the dashed box in Figs. 2b and 3.

We assess these candidate materials via three metrics. First, we rank promising ferroelectric materials identified here by combining their computed values of P and ΔE into a unitless effective score, \(F_{\mathrm{score}}\), defined as \(F_{\mathrm{score}}=\bar{P}-2\cdot \Delta \bar{E}\), where \(\bar{P}\) and \(\Delta \bar{E}\) are P and ΔE normalized with respect to their respective maximum values. \(F_{\mathrm{score}}\) is then normalized and scaled to take on values between zero and unity. It prioritizes candidates with high P and low ΔE. We note that our choice of ranking is not unique; since the present data are publicly available, the materials computed here can be ranked differently depending on specific target properties.
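
The score defined above can be evaluated as in the sketch below. The final rescaling to the interval between zero and unity is implemented here as a min-max scaling, which is an assumption: the text states only that the score is normalized and scaled. The input values are made up for illustration.

```python
def f_scores(p_values, de_values):
    """Compute a rescaled ranking score from P and Delta E lists.

    raw = P / max(P) - 2 * dE / max(dE); the raw scores are then
    min-max scaled to [0, 1] (the scaling choice is an assumption).
    """
    p_max = max(p_values)
    de_max = max(de_values)
    raw = [p / p_max - 2.0 * de / de_max
           for p, de in zip(p_values, de_values)]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Degenerate case: all candidates score identically.
        return [0.0 for _ in raw]
    return [(r - lo) / (hi - lo) for r in raw]


# Made-up values: P in uC cm^-2, Delta E in meV per atom.
polarizations = [100.0, 50.0, 20.0]
energy_differences = [20.0, 100.0, 200.0]
print(f_scores(polarizations, energy_differences))
```

With these inputs, the candidate combining the highest P with the lowest ΔE receives the top score, and the one with the lowest P and highest ΔE receives the bottom score.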

Second, we use Scopus (https://www.scopus.com/home.uri) and Google Scholar (GS) databases to determine whether the compounds have been synthesized or studied previously by theory, as described in Methods. We count the total number of entries for each material’s chemical formula in Scopus and GS databases, the sum of the number of abstracts containing relevant ferroelectric-related keywords, and the number of abstracts that refer to synthesis. These cumulative numbers indicate the extent of prior studies of the material both in general and in the context of ferroelectricity or piezoelectricity.

Third, we separate this subset of materials into two categories depending on whether they are in the Inorganic Crystal Structure Database (ICSD)53. The first category contains the 132 compounds with an ICSD id, which we take as increasing their likelihood of being synthesizable with conventional methods. The second category contains 50 materials that are not included in the ICSD, indicating that some of them have not yet been synthesized. To assess the synthesizability of these 50 compounds, we adopt the machine learning-based approach presented in ref. 42 and briefly described in the Methods section, which provides an approximate measure of which materials are more likely to be synthesizable. Using this model, 19 compounds are predicted to be likely synthesizable and 31 compounds less likely synthesizable. These subsets are visually represented in Fig. 2b.

The 30 compounds with an ICSD id with the highest \(F_{\mathrm{score}}\) are reported in Table 1. The remainder of the materials in this category are provided in Supplementary Table 1. In Table 2, the same ranking is reported for compounds absent from the ICSD but predicted to be likely synthesizable. Those predicted to be less likely synthesizable are reported in Supplementary Table 2. In these tables, the LP column indicates the presence in the literature of each formula; the R column indicates the number of abstracts containing at least one of the keywords related to ferroelectricity; the S column reports the number of abstracts referring to a synthesis method. The details on how these three quantities are determined can be found in the Methods section.

Table 1 Top 30 polar compounds with experimentally known structures.
Table 2 Theoretical polar structures with the synthesizability \(CL_{\mathrm{score}}\) higher than 0.5, as defined in Methods.

Tables 1 and 2, together with Supplementary Tables 1 and 2, summarize promising ferroelectrics and provide information on which of these materials have already been synthesized. A higher number of entries in the Scopus or GS databases (LP column) suggests that these compounds are relatively more prominent in the literature, and those materials that have a higher number of entries reporting synthesis (S column) are more likely to be readily synthesized. Cases where LP is ‘Medium’ or ‘High’ and R is equal to zero suggest that either they have been studied for applications other than ferroelectrics, or the investigated phase is different from the polar one. Overall, we find 53 compounds that are, for the most part, previously unknown, i.e., they have a total number of entries in the two databases lower than 10 (‘Low’ LP). Considering the results from Scopus, 51 have entries that can be related to ferroelectricity and piezoelectricity, 80 are known in the literature but have been studied for different properties and applications, and 119 have been featured in at least one paper mentioning synthesis methods. Therefore, our literature search suggests that ~130 materials are potential ferroelectrics that may be targeted for further theoretical and experimental effort.

Our literature search finds that the main applications explored for these materials include photovoltaic, photocatalytic, and nonlinear optical properties such as second harmonic generation (SHG); electrodes and electrolytes for batteries; and thermoelectricity. The study of polar and ferroelectric materials for photovoltaic applications is motivated by the generation of photocurrent in these materials54. More generally, nonlinear optical effects constitute an important application area55, and LiNbO3, for example, shows strong SHG and is also a ferroelectric56,57.

In the following, we discuss some of the materials present in Tables 1 and 2 and recently reported in the literature. In Table 1, among those with an ICSD id and reported as relevant, we recognize LiTaO3, GeTe, Bi2WO6, and SrNb2Bi2O9, which have already been extensively studied as ferroelectric or piezoelectric materials; Sr2Nb2O7, SnPS3, LiGaO2, Ga2S3, SnPSe3, KNbSi2O7, B4PbO7, and Li2GeO3 are also known ferroelectrics but less extensively studied. In ref. 58, the dielectric and piezoelectric properties of vanadium-alloyed Sr2Nb2O7 are reported, but the potential for ferroelectricity in this system has, to our knowledge, yet to be assessed. Jia et al.59 performed a computational study of the NbX2O family, finding materials with intrinsic ferroelectricity and antiferroelectricity, and proposed these materials as potential 2D ferroelectrics. The predicted values of P for the NbX2O (where X = Cl, I, Br) compounds in our database are in agreement with ref. 59. Some of our A2B3 candidates, and in particular some in the III2-VI3 family, appear to be promising from prior theoretical and experimental studies60,61 as 2D ferroelectrics62,63,64,65,66. Further, CsGeX3 (with X = Cl, Br, I) compounds are part of another recently synthesized family in which ferroelectricity is reported67. Our computed values of P for these compounds are very close to those previously measured and computed.

Of the remaining materials with an ICSD id reported in Supplementary Table 1, we mention SrAlGeH, a member of the hydrogenated Zintl family AeTrTtH (Ae = Ca, Sr, and Ba; Tr = Al and Ga; Tt = Si, Ge, and Sn); some of these compounds have been synthesized and studied for their potential bulk photovoltaic effects and ferroelectricity68. In addition, NaGaO2 has reportedly been synthesized69,70 but has not yet been explored as a ferroelectric. Interestingly, CuGaO2 has been identified as a promising ferroelectric71 and was synthesized recently72, and its P has been measured73. This material is not in our screening because its energy above hull is ~0.1 eV per atom according to the MP. Furthermore, multiferroic NaFeO2 has been reported as a ferroelectric and a weak ferromagnet74. Finally, we mention Ca4YB3O10, which has been investigated as a piezoelectric material75 and has been synthesized76. This system may not be a ferroelectric in its pristine form due to its high ΔE (low \(F_{\mathrm{score}}\)).

Among the structures without an ICSD number predicted to be synthesizable in Table 2, LiAlS2 is the only one reported as relevant. LiAlS2 has been recently reported as part of a study of a larger family of LiMX2 compounds computationally predicted as promising monolayer piezoelectric compounds77. Furthermore, its relative LiAlTe2 has been studied as ferroelectric in both bulk and 2D layered forms, and both P and ΔE reported values are similar to our calculated values78. Many materials reported here are quaternaries and hence relatively less explored, especially as ferroelectrics, according to our literature search. Also, their ΔE values are relatively high, suggesting that while they may not be ferroelectric in the bulk, they may be tuned to enable ferroelectricity.

In Supplementary Table 2, we also tabulate candidate ferroelectrics predicted to be less likely synthesizable. Among the materials with higher \(F_{\mathrm{score}}\), we identify some nitrides with stoichiometry XYN2 and XYN3, with large P and large ΔE values. These materials were also predicted in a prior high-throughput search79 and recently added to the MP database. Some of these nitrides were reported in previous computational studies as promising materials for photovoltaic applications80. Notably, TiZnN2 has been recently synthesized in its polar wurtzite-like phase81, but its P has not been measured yet. This class of materials certainly deserves further investigation to assess if ferroelectricity is possible. In general, although most of the materials in Supplementary Table 2 appear to be appealing as ferroelectric candidates, they may be somewhat more challenging to synthesize by conventional methods.

Overall, our large-scale high-throughput screening offers the opportunity to identify already known ferroelectrics and to highlight new potential ferroelectrics among materials that have been less investigated or overlooked. Materials present in our dataset, such as CsGeX3, were very recently synthesized and reported to be ferroelectrics; NbX2O and A2B3 have been the subject of recent theoretical investigations for ferroelectric properties, with computed P in agreement with values reported here, providing validation to our approach. Future studies building on our work could take multiple approaches. One could select candidates from our study that have been already reported in the literature but not studied as ferroelectrics. In this case, papers reporting synthesis techniques could be helpful as a starting point for realizing these materials experimentally. Or, one could select those that are relatively less known, in an attempt to identify a completely new ferroelectric material. We hope our database and workflow will inspire several such studies.

Dynamical stability of polar phases

The thermodynamic stability of the polar phases studied here is imposed during our screening by the stringent criterion of working only with materials with zero energy above hull. At zero temperature, dynamical stability is conventionally evaluated by computing the phonon spectrum and inspecting it for imaginary frequencies. We make use of the phonon band structures provided by Togo et al.82,83 (phonondb, available at https://mdr.nims.go.jp/collections/8g84ms862), obtained via a finite-difference approach within DFT, to roughly assess the dynamical stability of the polar materials in our subset of potential ferroelectric compounds. Among the 182 materials in this group, the phonondb database contains the phonon band structure for a subset of 94 compounds. Inspection of the phonondb suggests that out of these 94 cases, 23 are susceptible to instability at 0 K due to the presence of imaginary frequencies at q-points beyond the Γ point, principally zone-boundary instabilities. Notably, of the 23 compounds with potential 0 K instabilities, 22 are associated with an ICSD id, suggesting plausible stability at higher temperatures. Hence, while phonons computed at 0 K serve as a reliable metric for ascertaining dynamical stability at that temperature, they are not necessarily indicative of a material’s stability and synthesizability at different temperatures, pressures, strains, etc. For example, the tetragonal phase of BaTiO3, which is stable and ferroelectric at room temperature2,84,85, exhibits imaginary frequencies at both zone-center and zone-boundary points in its computed 0 K phonon band structure (https://next-gen.materialsproject.org/materials/mp-5986).
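
The kind of inspection described above can be sketched in a few lines. The input layout (a dictionary from q-point label to a list of mode frequencies in THz) is hypothetical and is not the phonondb file format; imaginary frequencies are assumed to be stored as negative numbers, a common plotting convention, and the tolerance used to ignore small numerical noise is an arbitrary choice for this example.

```python
def unstable_qpoints(frequencies_by_q, tol=-0.1):
    """Return labels of q-points whose lowest frequency is below `tol`.

    Imaginary modes are assumed to be encoded as negative values (THz);
    `tol` (also in THz) filters out small numerical noise near zero.
    """
    return sorted(q for q, freqs in frequencies_by_q.items()
                  if min(freqs) < tol)


# Made-up spectrum: stable at Gamma, soft mode at a zone-boundary point.
spectrum = {
    "Gamma": [-0.02, 0.0, 0.0, 3.1, 5.4],
    "M": [-1.7, 1.2, 2.8, 4.0, 6.3],
    "X": [0.9, 1.5, 2.2, 4.4, 6.0],
}
flagged = unstable_qpoints(spectrum)
print(flagged)
# Instabilities away from the zone center, as discussed in the text.
print([q for q in flagged if q != "Gamma"])
```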

Role of the nonpolar structure and energy profiles

In this section, we comment on the role of the nonpolar phases in the context of defining a material as ferroelectric. Within the modern theory of polarization8, the polarization of a polar material is defined as the difference between the polarization in the polar and nonpolar phases along an adiabatic path. Therefore, a reference nonpolar structure is often required to obtain an accurate value of the polarization for comparison with experiment. In some cases, e.g., in perovskites, the nonpolar phase can have a physical meaning as a high-temperature phase, and the difference in energy between polar and nonpolar phases provides a measure of the stability of the polar phase with temperature, as well as a measure of the ability to switch the polarization direction via an electric field (Fig. 1b).
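
As a compact restatement of this definition, one may parameterize the adiabatic path by a variable \(\lambda\) running from the nonpolar (\(\lambda = 0\)) to the polar (\(\lambda = 1\)) structure. The symbols \(\lambda\), \(\Omega\) (unit-cell volume), and \(\mathbf{R}\) (lattice vector) are notation introduced here for illustration and do not appear elsewhere in this work:

\[
\mathbf{P}_{s} = \int_{0}^{1} \frac{\partial \mathbf{P}}{\partial \lambda}\, d\lambda = \mathbf{P}(\lambda = 1) - \mathbf{P}(\lambda = 0) \quad \text{modulo} \quad \frac{e\mathbf{R}}{\Omega}
\]

The endpoint values are each defined only up to a quantum of polarization \(e\mathbf{R}/\Omega\), which is why following the same branch along the path matters when the difference is evaluated.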

While the nonpolar structure that we obtain using pseudosymmetries is valid as a reference to compute the polarization, because it is on an adiabatic path, there is no guarantee that this nonpolar structure is relevant to switching under an applied electric field. For instance, the polarization could rotate under an applied field, instead of maintaining its direction and reducing its magnitude, with the rotated phase still possessing a polar space group86. There may also be another nonpolar phase, one not related to the polar one by group-subgroup relationships, that could compete in energy and manifest during the switching. At the microscopic level, switching is generally accepted to occur in many ferroelectrics by domain wall motion87. Overall, this means that the intermediate phases occurring during switching, their energetics, and the corresponding electric field required for switching could differ from the ones we report here. The nonpolar structures used in our study, while strictly a means for computing the polarization and energy difference, are closest to the polar phase in terms of atomic displacements and, thereby, the most relevant and simplest for assessing the stability of the polar material and its potential ferroelectricity. Therefore, we use them to classify the materials into three tiers, separating those cases that are likely to be ferroelectric from others that are most likely not, despite the caveats mentioned above. Finally, we note that a quantitative estimate of the coercive field could be an alternative descriptor for switchability, but its accurate prediction, which depends on the microscopic nature of the switching pathway, is currently beyond the scope of high-throughput ab initio investigations.

We now comment on the energy profiles obtained by our interpolation method. The majority of these energy profiles show a classic double-well shape (Fig. 1b). However, in 35 cases the energy profile deviates from this standard shape and indicates that one of the intermediate interpolated structures has an energy higher than the nonpolar reference. This would suggest that the relevant energy barrier could actually be higher than the simple energy difference between the polar and nonpolar phases. However, the linear interpolation that we perform to generate the intermediate structures is fictitious, and the actual structures could find different ways to rearrange the atoms to minimize the energy. In the context of two-dimensional ferroelectric materials62, the nudged elastic band approach88 has been used to recover the usual double-well profile in such a case. The energy profiles generated automatically from linear interpolation of the endpoint structures do not always provide reliable information, and the actual path can require more advanced methods to be determined accurately. Also, in some cases, the atomic distortions along this fictitious path can lead to metallic structures. This was reported in ref. 35 and can be linked to our 22 cases (Fig. 2a) where a metallic interpolated structure is found.
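
The two ideas discussed above, linear interpolation between the endpoint structures and a check for profiles that rise above the nonpolar reference, can be sketched as follows. This is a simplified illustration and not the workflow implementation: it interpolates fractional coordinates only at fixed cell, assumes the atoms are listed in the same order in both structures, and uses made-up coordinates and energies.

```python
def interpolate_frac_coords(nonpolar, polar, n_intervals):
    """Linearly interpolate fractional coordinates between two structures.

    Returns n_intervals + 1 images from nonpolar (first) to polar (last).
    Displacements are wrapped to the nearest periodic image.
    """
    images = []
    for i in range(n_intervals + 1):
        lam = i / n_intervals
        image = []
        for a, b in zip(nonpolar, polar):
            # Shortest displacement in fractional coordinates.
            d = [(bj - aj) - round(bj - aj) for aj, bj in zip(a, b)]
            image.append([aj + lam * dj for aj, dj in zip(a, d)])
        images.append(image)
    return images


def barrier_exceeds_reference(energies):
    """True if any intermediate image lies above the nonpolar endpoint.

    `energies` are ordered from the nonpolar to the polar structure.
    """
    return max(energies[1:-1], default=energies[0]) > energies[0]


# Made-up two-atom example.
nonpolar = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
polar = [[0.0, 0.0, 0.02], [0.5, 0.5, 0.46]]
middle = interpolate_frac_coords(nonpolar, polar, 2)[1]
print(middle)
print(barrier_exceeds_reference([0.0, -0.03, -0.05]))  # double-well branch
print(barrier_exceeds_reference([0.0, 0.01, -0.05]))   # bump above reference
```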

Other sets of materials

We note that 75 compounds have an energy difference close to zero or negative (<1 meV per atom). In these compounds, the nonpolar phase we create is actually more stable than, or competing with, the polar one. In general, these cases constitute a discovery of another phase, not yet reported in the MP database, that expands knowledge of the energy landscape around the known polar phase. Cases with energy differences of ~1 meV per atom should be treated with care, as they are at the accuracy limit of our calculations; moreover, the synthesis of these materials could likely lead to the competing or more stable nonpolar phase instead, as reported recently for BiInO3 in ref. 37.

For 69 compounds, as reported in Supplementary Table 3, the polarization could not be computed due to the fact that the nonpolar structure turns out to be metallic (i.e., the computed DFT band gap is zero) within DFT-PBE. In these cases, there may be another nonpolar structure that is semiconducting, or the metallic nonpolar phase could be an artifact of the exchange-correlation functional we use. Although we do not investigate these cases further in the present work, we cannot rule these systems out as potential ferroelectric or piezoelectric materials. An example is LaWN3, which has been recently synthesized in its polar phase89 and shows piezoelectric properties; its nonpolar phase is known to be metallic89,90. We note that this specific compound is not among our current candidates because we exclude materials with rare-earth elements, but it underscores that the materials in this group should be considered for further investigation in future work.

Methods

Pseudosymmetries

The present workflow implements a large-scale use of the BCS’s PSEUDO tool via an automated interface to generate a nonpolar parent structure from an initial polar structure; this parent is used as a reference to compute the polarization. PSEUDO is a tool that can be used via a web interface on the BCS website. This method requires an input structure in the conventional cell setting and a tolerance value representing the maximum atomic displacement in Angstroms allowed for the pseudosymmetry search. A default value of 2 Å is used in our search. Starting with a polar structure, PSEUDO returns a list of possible parent structures of higher symmetry respecting (a) the group-supergroup relationship, (b) the lattice distortion default values, (c) the fact that the maximum atomic displacement is lower than the chosen threshold, and (d) the consistency of Wyckoff splitting for each species24. Therefore, PSEUDO does not simply rely on decreasing the tolerance for the symmetry identification of the input structure, as discussed in previous works33. PSEUDO instead represents a more general, rigorous, and effective approach for detecting high-symmetry parents of an initial structure. We note that the structures proposed by PSEUDO have the same cell parameters as the input polar structure, and therefore, a relaxation is needed to evaluate their total energy; we further note that their space groups are not necessarily nonpolar. Then, a nonpolar structure has to be chosen, downloaded in cif format, and used for different types of investigations24. The whole procedure to obtain a nonpolar structure is typically done manually by the user through the PSEUDO web interface, which limits the use of this tool at a large scale. To overcome this limitation, we implement a Python interface that can automatically interact with the PSEUDO website, retrieve the proposed structures as pymatgen91 structure objects, and rank them by their maximum atomic displacement and space group.
We note that the application of this workflow to polar phases of well-known ferroelectrics BaTiO3 and PbTiO3 leads to the cubic perovskite nonpolar reference structure. Other cases where the nonpolar phase found by our procedure matches the one reported in the literature are mentioned in the ’Subset of potential ferroelectrics’ section. Also, we highlight that our code can be used to automatically build the complete graph of all possible parent structures starting from an input structure, which can be used to study competing phases and the energy landscape around a certain structure of interest (not necessarily polar).
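
The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the interface developed in this work: the `Candidate` record, the function names, and the specific ranking rule (smallest maximum displacement first, ties broken in favor of the higher space group number) are hypothetical. The list of polar space groups follows from the ten polar point groups (1, 2, m, mm2, 4, 4mm, 3, 3m, 6, 6mm).

```python
from dataclasses import dataclass

# International space group number ranges belonging to the ten polar
# point groups: 1, 2, m, mm2, 4, 4mm, 3, 3m, 6, 6mm.
POLAR_SG_RANGES = [
    (1, 1), (3, 9), (25, 46), (75, 80), (99, 110),
    (143, 146), (156, 161), (168, 173), (183, 186),
]


def is_polar_space_group(number):
    """Return True if the space group number belongs to a polar point group."""
    if not 1 <= number <= 230:
        raise ValueError(f"invalid space group number: {number}")
    return any(lo <= number <= hi for lo, hi in POLAR_SG_RANGES)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one parent structure proposed by PSEUDO."""
    space_group_number: int
    max_displacement: float  # maximum atomic displacement, in Å


def rank_nonpolar_candidates(candidates, max_tolerance=2.0):
    """Keep nonpolar parents within tolerance and rank them.

    Assumed rule: smallest maximum displacement first; ties are broken
    in favor of the higher space group number.
    """
    nonpolar = [
        c for c in candidates
        if not is_polar_space_group(c.space_group_number)
        and c.max_displacement <= max_tolerance
    ]
    return sorted(nonpolar, key=lambda c: (c.max_displacement, -c.space_group_number))
```

For a tetragonal perovskite input (space group 99, polar), a cubic parent in space group 221 would pass the filter, whereas any proposed parent that is itself polar would be discarded.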

Computational details

The polarization is computed via DFT and the Berry phase approach8,9,10,11,12,13,14, as implemented in the Vienna Ab initio Simulation Package (VASP, version 6.3)92,93,94. We use the generalized gradient approximation (GGA) functional of Perdew, Burke, and Ernzerhof (PBE)95 and the projector augmented wave (PAW) pseudopotentials96. The interpolation of the polarization branch is implemented in the workflow described in ref. 35 and in the Atomate code36. We refer to these papers for more details on the workflow and the DFT parameters, as both have been used here in the same way. We reiterate that the values of lattice parameters, total energy, and polarization depend on the choice of the exchange-correlation functional45; this must be kept in mind when comparing these values for the materials reported in the present work with those taken from other works in the literature.

Polarization jumping detection

In ref. 35, cases such as CrO3, where the polarization values jumped from one branch to another nearby, resulting in an incorrect prediction of the polarization, were marked with a warning. In the present workflow, we improve the detection of a polarization jumping branch, and for the problematic cases, their polarization is recomputed using double the number of interpolated structures. Briefly, 34 cases out of 447 are recomputed in this way and recovered. Only seven cases still present a change of branch even with a doubled number of interpolation points, although they are less interesting for ferroelectric applications due to their very high ΔE. More details of the improved detection method are included in the Supplementary Methods.
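
To make the notion of a branch jump concrete, the following generic sketch follows a single polarization component along the interpolation path. It is not the improved detection method of this work, which is described in the Supplementary Methods. It relies only on the fact that the Berry phase polarization is defined modulo a polarization quantum; the function names and the `fraction` threshold are hypothetical.

```python
import numpy as np


def unwrap_branch(values, quantum):
    """Place each point on the branch closest to the previous point.

    `values` is one polarization component along the interpolation path,
    defined modulo `quantum` (same units as `values`).
    """
    out = np.asarray(values, dtype=float).copy()
    for i in range(1, len(out)):
        # Integer number of quanta that brings point i closest to point i-1.
        shift = np.round((out[i - 1] - out[i]) / quantum)
        out[i] += shift * quantum
    return out


def suspicious_steps(unwrapped, quantum, fraction=0.25):
    """Indices where the step is a large fraction of the quantum.

    Such steps make the branch assignment ambiguous. `fraction` is an
    illustrative threshold, not a value taken from the paper.
    """
    steps = np.abs(np.diff(unwrapped))
    return [i + 1 for i, step in enumerate(steps) if step > fraction * quantum]
```

When `suspicious_steps` returns a non-empty list, the path is too coarsely sampled to assign the branch reliably, which is the situation in which the workflow doubles the number of interpolated structures.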

Literature search

The large-scale screening of the literature performed here has been implemented by searching two known databases, Scopus (https://www.scopus.com/home.uri) and GS. The first database is searched with the MatScholar web tool (https://matscholar.com/), which provides access, via a website and a dedicated API, to abstracts, titles, and authors of scientific papers published before 2018 and reported in Scopus. MatScholar uses an unsupervised machine learning technique based on natural language processing trained on the Scopus database of scientific paper abstracts97,98. This allows the automatic extraction of materials properties, applications, phase labels, and synthesis and characterization methods, enabling the user to obtain information about materials from the literature. We searched this database for the formula both in the standard reduced format used in the MP and in alphabetical format. For each formula, we check how many entries contain relevant context-related tags (automatically extracted by the MatScholar algorithm) such as ’ferroelectric’, ’spontaneous polarization’, ’piezoelectric’, ’remnant polarization’, ’synthesis_methods’, to estimate whether that formula has been previously associated with ferroelectricity and whether a synthesis technique is reported. Furthermore, to include more recent contributions, we screen for formulas and relevant context-related text strings (equivalent to the above-mentioned tags) in the abstracts present in Scopus since 2018 by a simpler text-matching method. An extract of the output of this screening is shown in Supplementary Tables 4 and 5 for the best-known formulas, i.e., those for which the number of abstracts containing the formula is higher than 10. The sums of entries containing a ferroelectric-related or ’synthesis_method’ keyword, in both Scopus datasets (pre- and post-2018), are collected in the R and S columns, respectively, of Tables 1 and 2 and Supplementary Tables 1 and 2.
Further, since most of the materials are not found in the Scopus database, we search the GS database, which, in principle, allows the screening of the whole literature, for the chemical formula, only in the standard reduced format used in the MP, and record the total number of entries. An extract of the output of this screening is shown in Supplementary Table 6 for the top 30 most known formulas. Finally, to further understand which properties have been studied and which applications were proposed for these materials, we also searched each formula on GS, looked at the title and text snippet associated with each entry on the first page, and manually extracted the main material properties, applications, and research topics.
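
The simpler text-matching step applied to the post-2018 abstracts can be illustrated with the sketch below. It is an assumption-laden example rather than the script used in this work: the function names are hypothetical, the keyword list only mirrors the ferroelectric-related tags quoted above, and the formula parser handles only simple formulas without parentheses or fractional subscripts.

```python
import re

# Text strings mirroring the ferroelectric-related tags quoted in the text.
KEYWORDS = (
    "ferroelectric",
    "spontaneous polarization",
    "piezoelectric",
    "remnant polarization",
)


def alphabetical_formula(formula):
    """Reorder a simple formula alphabetically by element symbol.

    Handles only formulas without parentheses or fractional subscripts.
    """
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return "".join(element + count for element, count in sorted(tokens))


def count_hits(formula, abstracts, keywords=KEYWORDS):
    """Return (abstracts containing the formula, those also containing a keyword)."""
    variants = {formula, alphabetical_formula(formula)}
    # Lookarounds prevent matching inside a longer formula (e.g. SrBaTiO3).
    pattern = re.compile(
        "|".join(
            rf"(?<![A-Za-z0-9]){re.escape(v)}(?![A-Za-z0-9])" for v in variants
        )
    )
    total = related = 0
    for text in abstracts:
        if pattern.search(text):
            total += 1
            lowered = text.lower()
            if any(keyword in lowered for keyword in keywords):
                related += 1
    return total, related
```

Plain substring matching on keywords is deliberately permissive (for example, 'ferroelectric' also matches 'antiferroelectric'), so counts obtained this way are upper bounds and can include false positives.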

Our approach presents some potential pitfalls; in particular, the number of false positives can be high. This can occur because (a) a keyword appearing in the abstract does not guarantee that it actually refers to the considered material, and (b) the phase of the material studied in the paper can be different from the polar one that we consider here. However, we consider the false positives a minor issue. On the one hand, cases where many abstracts are found will certainly contain false positives, but it is very unlikely that they are all false positives. Assuming a fixed percentage of false positives (as a systematic error), we can confidently assume that these materials are the most reported in the literature. On the other hand, cases with few entries can still contain false positives, which can be easily checked manually, but the fact that these materials are not well known in the literature is still captured. Lastly, we could be missing papers where the chemical formula is not expressed in a conventional way. An example is Ca4YB3O10, which is known in the literature with the formula YCa4O(BO3)3. To take these limitations into account, we summarize the overall presence of the materials in the literature by summing the number of entries for each formula in the three different databases and by grouping the number of entries in three ranges, namely [0,10], [10,100], >100, and we refer to them with the labels ’Low’, ’Medium’, ’High’, respectively.
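
The grouping into three ranges can be written compactly as in the sketch below. Because the ranges quoted above share their end points, the sketch assumes that a total of exactly 10 is labeled ’Low’ and exactly 100 is labeled ’Medium’; this boundary convention, the function names, and the database keys in the usage note are assumptions.

```python
def literature_label(n_entries):
    """Map a total number of literature entries to a coarse label.

    Assumed boundary convention: 10 counts as 'Low', 100 as 'Medium'.
    """
    if n_entries < 0:
        raise ValueError("number of entries cannot be negative")
    if n_entries <= 10:
        return "Low"
    if n_entries <= 100:
        return "Medium"
    return "High"


def summarize_presence(counts_by_database):
    """Sum the entries found in each database and attach the label."""
    total = sum(counts_by_database.values())
    return total, literature_label(total)
```

For example, `summarize_presence({"scopus_pre2018": 3, "scopus_post2018": 1, "gs": 40})` gives a total of 44 entries and the label ’Medium’.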

Synthesizability score

The synthesizability of predicted compounds, i.e., those without an entry in the ICSD database, has been assessed by a machine learning technique that has been recently used to predict the synthesizability of perovskites99 and MAX phases100. In particular, it has also been applied to the entire MP database to provide a synthesizability score for all the materials42, which is particularly useful for structures that are only theoretical. This technique leverages the fact that the majority of the materials in the MP database have an experimental crystal structure reported in the ICSD database of experimental crystal structures53. This can be exploited by ML to predict whether a crystal structure is synthesizable by measuring its difference from structures that are known experimentally and returning a probability score called the crystal-likeness score (\(\mathrm{CL}_{\mathrm{score}}\)). This unitless score ranges between 0 and 1, with 0.5 taken as a threshold to distinguish likely synthesizable materials (above 0.5) from those less likely (below 0.5). The model has been trained on the MP structures with an ICSD id and shows an accuracy higher than 80%42. Then, it has been applied to predict the synthesizability of the theoretical structures in the MP database. The score for all the structures of the MP database is publicly available, and we use it to assess the synthesizability of the theoretical structures in the present dataset.