Background

Fluoride (F) has been used as a preventive and therapeutic agent in dentistry for over eight decades. It is widely known that its main side-effect (i.e., dental fluorosis) was reported decades prior to the accidental discovery of its caries-preventive effects1, which further led to investigations on the mechanisms of action involved, acute and chronic toxicity, as well as its safety and modes of administration. In brief, F can be delivered by community-based strategies (e.g., water, salt and milk fluoridation schemes), as well as by professionally applied or self-applied methods (e.g., toothpastes, mouthrinses, gels and varnishes), alone or in association2, and its use is regarded as safe and cost-effective when administered within the recommended levels3,4.

As for community-based methods, water fluoridation is by far the most widely used worldwide, covering ~400 million people in 25 countries5. It is regarded as a cost-effective method, consisting of the controlled addition of F to the public water supply at concentrations typically ranging from 0.7 to 1.2 mg/L, depending on the mean annual temperature6. Water fluoridation was considered one of the ten greatest public health achievements of the twentieth century by the US Centers for Disease Control and Prevention7, and it is endorsed by several scientific societies and health organizations, including the World Health Organization (WHO).

Despite the body of evidence attesting to the efficacy and safety of water fluoridation, this method has been the subject of heated debate in several parts of the world, with questions raised about the legal aspects of its compulsory nature and its potential harmful effects. Within this context, a recent systematic review with meta-analysis attempted to demonstrate the relationship between F exposure from the drinking water and intelligence quotient (IQ) impairment, concluding that exposure to water containing high F levels interferes with the child's intelligence development8. It is noteworthy, however, that no clear-cut threshold was established to determine which F levels would correspond to each study group, resulting in a wide variability within the control (0.25 to 1.03 mg F/L) and exposed (0.8–11.0 mg F/L) groups, with some overlaps between them. Other reviews were designed to gather the evidence regarding F developmental neurotoxicity9,10 and have highlighted the detrimental effects of high fluoride doses in children exposed through fluoridated water. It is important to highlight that the present study gathered evidence not only from children, but also from adults exposed to all fluoride sources according to the search strategy. Moreover, we sought to investigate the available evidence about neurological damage in general, not only mnemonic aspects. Also, some of the concentrations included in the control group of the earlier meta-analysis8 are not effective for caries control according to the WHO criteria, so that the issue of risks and benefits resulting from exposure to fluoridated water could not be analyzed. Furthermore, that review focused on IQ impairment without considering other neurological disorders that could also potentially be associated with F exposure.

Considering the above, the present systematic review and meta-analysis aimed to investigate the impact of environmental exposure to F from different sources on neurological disorders in humans. For studies that assessed F exposure from water, this review adopted the WHO guidelines to dichotomize between low (0.5 to 1.0 mg F/L) and high (above 2 mg F/L) exposure, allowing a discussion of the safety of the doses used in water fluoridation.
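The dichotomization described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds: the function name is hypothetical, and returning `None` for concentrations that fall in neither category (below 0.5 mg F/L, or between 1.0 and 2 mg F/L) is an assumption, since the text does not say how such values were handled.

```python
from typing import Optional

# Thresholds quoted in the text (mg F/L).
LOW_RANGE = (0.5, 1.0)   # "low" exposure: 0.5 to 1.0 mg F/L
HIGH_CUTOFF = 2.0        # "high" exposure: above 2 mg F/L


def classify_water_fluoride(mg_per_l: float) -> Optional[str]:
    """Return 'low', 'high', or None when the value fits neither category.

    Returning None for unclassified values is an illustrative assumption.
    """
    if mg_per_l < 0:
        raise ValueError("concentration cannot be negative")
    if LOW_RANGE[0] <= mg_per_l <= LOW_RANGE[1]:
        return "low"
    if mg_per_l > HIGH_CUTOFF:
        return "high"
    return None
```

Under this reading, a study group at 1.5 mg F/L would belong to neither the low nor the high category.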

Methods

Protocol and registration

This systematic review was registered in the PROSPERO database, under CRD number 42017067234. The review was performed according to Moher, Liberati11, following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement.

Eligibility criteria and search strategy

This review was designed using the PECO strategy. Based on it, we included observational studies in humans (P) exposed to high concentrations of F (E), compared with low concentrations (C), in which the associations between F and neurological damage (O) were investigated. Case reports, descriptive studies, review articles, opinion articles, technical articles, guidelines, as well as animal and in vitro studies were disregarded.

The study was based on the question: "Can chronic F exposure be associated with neurological damage?" The searches were performed in January 2021, with no restrictions on the date of publication or the language of the studies. The electronic databases used were: PubMed, Scopus, Web of Science, Lilacs, Cochrane and Google Scholar. The MeSH terms used were: “Humans”; “Central nervous system”; “Nervous system”; “Fluorine”; “Fluorides”; “Fluorine Compounds”; “Fluoride Poisoning”; “Neurobehavioral manifestations”; “Nervous System Disease”; “Neurologic manifestations”; “Intelligence”. All MeSH keywords and the search strategy were adapted according to the specifics of each database, as represented in Table A.1.

After the search stage, an alert was registered in each database for weekly notification of new studies that fit the search strategy. All citations were entered into a bibliographic reference manager (EndNote®, v. X7, Thomson Reuters, Philadelphia, USA) and duplicate studies were excluded, either automatically or manually. The search, study selection, risk of bias and data extraction stages were performed independently by two evaluators (G.H.N.M; M.O.P.A.) and checked by a third evaluator in case of disagreement (R.R.L).

Then, the study selection was made based on the title and abstract of articles and then by full-text analysis according to the recommended eligibility requirements. Reference lists of included studies were also evaluated for study selection.

Data extraction and assessment of methodological quality and risk of bias

From the included articles, data regarding the year of publication, study design, participant characteristics (origin and sample size), mean age, F concentration measurement parameters, diagnostic criteria for assessment of cognitive performance, results and statistical analysis were extracted and tabulated. In case of doubts about the methodology, lack of data in the studies and inability to find full articles, the authors were contacted via email with a weekly message for three consecutive weeks.

To assess the methodological quality and risk of bias, the checklist of Fowkes and Fulton12 was applied. This checklist has domains that relate to study and sample design; control group characteristics; quality of measurements and outcomes; completeness; and distorting influences.

After evaluating each criterion, a (++) sign was assigned for major study problems or (+) for minor problems, to assess whether the methods were adequate to produce consistent and valid information, as well as whether the results offered the expected effects. Items where the question was not applicable to the type of study were assigned the acronym NA (not applicable). "No problem" was assigned the sign (0). The evaluation for each domain was standardized by the examiners and is described in Table A.2.

After detailed evaluation of the methods and results, the studies were analyzed to verify the possibility of "bias", "confounding" and "chance". To determine the value of the study, three summary questions were answered: "Were the results biased?"; "Are factors of confusion or distortion present?" and "Is there a possibility that the results came about by chance?" "YES" and "NO" answers were given. If the answer was NO to all three questions, the article was considered reliable, with low risk of bias.
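The summary decision rule can be written as a small Python sketch. The function name is hypothetical, and treating any YES answer as "high" risk of bias is our reading of the text, which only states explicitly what happens when all three answers are NO.

```python
def overall_risk_of_bias(biased: bool, confounded: bool, by_chance: bool) -> str:
    """Combine the three summary answers (True = YES, False = NO).

    Three NO answers give 'low' risk of bias, as stated in the text.
    Mapping every other combination to 'high' is an assumption.
    """
    if not (biased or confounded or by_chance):
        return "low"
    return "high"
```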

Quantitative synthesis (meta-analysis)

The study data were analyzed using Review Manager software (Review Manager v. 5.3, The Cochrane Collaboration; Copenhagen, Denmark) to evaluate whether chronic exposure to F is associated with neurological deficit. In all analyses, only studies with low risk of bias were included.

A meta-analysis was performed to compare the percentage of low IQ with high and low chronic exposure to F. In the primary studies, F levels had been classified as low or high using heterogeneous concentrations. For the meta-analysis, we therefore classified the studies according to the WHO guidelines, which consider optimal levels to lie between 0.5 and 1.0 mg/L (low levels) and treat > 2 mg/L as high levels for water fluoridation13,14. The number of people with low IQ and the total number of participants in each case group (high fluoride) and control group (low fluoride) were included to calculate the odds ratio with a 95% confidence interval (CI).
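For a single study, the odds ratio and its 95% CI follow from the 2×2 table of low-IQ counts. The Python sketch below uses the standard log-odds-ratio (Woolf) interval; it is not taken from the Review Manager implementation, the function name is hypothetical, and the 0.5 continuity correction for empty cells is an assumption added so the function never divides by zero.

```python
import math
from typing import Tuple


def odds_ratio_ci(events_high: int, total_high: int,
                  events_low: int, total_low: int,
                  z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio of low IQ (high vs. low F exposure) with a Wald-type CI."""
    a = float(events_high)               # low IQ, high fluoride
    b = float(total_high - events_high)  # normal IQ, high fluoride
    c = float(events_low)                # low IQ, low fluoride
    d = float(total_low - events_low)    # normal IQ, low fluoride
    if min(a, b, c, d) < 0:
        raise ValueError("events cannot exceed totals or be negative")
    if min(a, b, c, d) == 0:
        # Assumed continuity correction for tables with an empty cell.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from any included study.
or_, lo, hi = odds_ratio_ci(30, 100, 10, 100)
```

A CI that excludes 1 indicates a statistically significant association at the chosen alpha.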

The heterogeneity among studies was tested using the I2 index (p-value < 0.05 was considered statistically significant). Both fixed- and random-effects models were used in the analyses of the studies. The final choice of effects model was based on the I2 index16. Forest plots were generated for each analysis and an alpha of 0.05 was adopted as the cut-off point for significance.
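As a rough illustration of how the I2 index is obtained, the Python sketch below computes Cochran's Q and I2 from per-study log odds ratios and their standard errors using inverse-variance weights. This is a simplification: the forest plot in this review reports the Mantel–Haenszel method, which weights studies differently. The function names are hypothetical, and the 50% cut-off in `choose_model` is an assumption, since the text does not state the threshold used to pick the effects model.

```python
from typing import Sequence, Tuple


def heterogeneity(log_ors: Sequence[float],
                  ses: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pooled log OR, Cochran's Q, I2 in percent)."""
    if len(log_ors) != len(ses) or not log_ors:
        raise ValueError("need one standard error per effect")
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2 = (Q - df) / Q, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


def choose_model(i2: float, threshold: float = 50.0) -> str:
    """Pick an effects model from I2; the 50% threshold is an assumption."""
    return "random" if i2 >= threshold else "fixed"
```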

Publication bias was assessed through Egger’s test and visual interpretation of the funnel plot17. A p-value < 0.05 indicated a likely publication bias across the studies. The Jamovi statistical software (version 1.6, Sydney, Australia) was used to generate figures and to run the test.
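Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (one over the standard error); an intercept far from zero suggests funnel-plot asymmetry. The Python sketch below is a plain-library illustration, not the Jamovi routine, and the function name is hypothetical. It returns the intercept, its standard error and the t statistic (n − 2 degrees of freedom); converting t to a p-value needs a t-distribution, which is left to a statistics package.

```python
import math
from typing import Sequence, Tuple


def egger_intercept(effects: Sequence[float],
                    ses: Sequence[float]) -> Tuple[float, float, float]:
    """Return (intercept, SE of intercept, t statistic) of Egger's regression."""
    n = len(effects)
    if n != len(ses) or n < 3:
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / s for s in ses]                    # precision
    y = [e / s for e, s in zip(effects, ses)]     # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are equal; regression undefined")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))
    s2 = residual_ss / (n - 2)                    # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat
```

When every study estimates the same effect regardless of its size, the intercept is zero; when smaller studies report larger effects, it moves away from zero.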

A sensitivity analysis was used to explore the influence of each study on the pooled meta-analysis or publication bias results. This analysis was adopted in case of substantial or considerable (50 to 100%) heterogeneity, or significant publication bias (p < 0.05). This evaluation was performed by manually omitting one study at a time and verifying its impact on the final results15.
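The leave-one-out procedure can be sketched as follows in Python. For brevity the pooling step uses inverse-variance fixed-effect weights on log odds ratios, which is an assumption and not the Mantel–Haenszel pooling reported in the forest plot; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pool_fixed(effects: Sequence[float],
               ses: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance pooled effect and I2 (percent)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, i2


def leave_one_out(effects: Sequence[float],
                  ses: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Re-pool the data with each study omitted in turn.

    Returns one (omitted index, pooled effect, I2) tuple per study.
    """
    if len(effects) != len(ses) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    effects, ses = list(effects), list(ses)
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_ses = ses[:i] + ses[i + 1:]
        pooled, i2 = pool_fixed(rest_effects, rest_ses)
        results.append((i, pooled, i2))
    return results
```

A study whose omission produces a marked drop in I2 or a shift in the pooled estimate is a candidate source of heterogeneity or bias.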

Level of evidence assessment–GRADE

The level of evidence was determined using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. This tool provides a structured process for developing and presenting evidence summaries that measure the quality of evidence to confirm or reject hypotheses in systematic reviews18.

GRADE has four levels of evidence (high, moderate, low and very low), and the level is lowered depending on whether issues such as risk of bias, inconsistency, imprecision and publication bias are rated as not serious, serious or very serious. Although observational studies begin as low-quality evidence, the level can be raised if the magnitude of the effect is large or very large19.

Consent for publication

All authors agree with the publication.

Results

Search results

Based on the database searches, 4,024 studies were found. Three studies were included after manually searching the reference lists20,21,22. After the removal of duplicate studies (714), 3,310 articles remained and were analyzed by title and abstract according to the eligibility criteria. A total of 3,260 were excluded, and 50 studies remained for full-text reading. Fifteen studies were excluded because, when assessing IQ, they did not compare between high and low F concentrations, four contained co-exposure to F and other concomitant elements, and four used the same sample as studies included in this systematic review (Table A.3). Thus, 27 studies were selected and underwent quality and risk of bias assessment. The summary of the selection process is shown in Fig. 1.
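The counts reported in this paragraph can be checked with simple arithmetic. The Python sketch below uses only the figures quoted above; the function and variable names are hypothetical. It starts from the 4,024 database records, because the text does not state at which stage the three manually identified studies enter the counts.

```python
from typing import Dict


def prisma_flow() -> Dict[str, int]:
    """Recompute the selection flow from the counts quoted in the text."""
    identified = 4024            # records from database searches
    duplicates = 714
    excluded_on_title_abstract = 3260
    full_text_exclusions = {
        "no high vs. low F comparison": 15,
        "co-exposure with other elements": 4,
        "same sample as an included study": 4,
    }
    screened = identified - duplicates
    full_text = screened - excluded_on_title_abstract
    included = full_text - sum(full_text_exclusions.values())
    return {"screened": screened, "full_text": full_text, "included": included}


flow = prisma_flow()
```

The three values reproduce the 3,310 screened records, 50 full-text articles and 27 included studies reported in the text.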

Figure 1

Flow diagram of databases searched according to PRISMA guidelines. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Characteristics of the studies

The 27 included studies were characterized as observational, cross-sectional type, among which 26 were analytical studies, and one was descriptive23. The age group investigated included individuals from 6 to 18 years of age. Most of the articles evaluated F exposure due to ingestion of naturally fluoridated water. Only one study analyzed populations exposed to F by burning coal24.

The F concentrations in drinking water categorized as low exposure in the selected studies ranged from 0.19 ppm25 to 2.01 ppm26, while high doses ranged from 1.5 ppm23,27 to 8.3 ppm28. Some studies considered a third intermediate category23,29,30,31,32,33, which ranged from 0.5 ppm30 to 3.1 ppm33. One study classified exposed groups according to four concentration levels, ranging from < 0.7 ppm to > 4.0 ppm21. One study did not provide high and low dose reference concentrations20 and the study developed with F exposure from coal burning24 reported only the content of F related to high exposure (0.0298 mg/m3).

Regarding the type of sample used for the estimation of F exposure, most of the studies evaluated the drinking water alone20,21,24,25,29,30,31,34,35,36, followed by measurement in both drinking water and urine of participants20,22,26,27,28,37,38,39,40,41, and in the air24. Some studies did not quantify the F levels, but determined the concentration from data available in national databases or on websites32,42,43. Three studies did not report the process used, nor the source consulted to establish F exposure23,44,45, and one mentioned the use of conventional chemical tests only, without specifying the method for F46.

In relation to the parameters of cognitive assessment, 26 studies used IQ to compare intellectual capacity between the high and low exposure groups, whereas one study23 evaluated neurological manifestations such as headache, insomnia, lethargy, polydipsia and polyuria. The tests applied for IQ evaluation varied among the studies; the "Raven's Standard Progressive Matrices test"20,21,27,29,30,31,34,38 and the "Standardized Chinese Test"22,28,37,40,41,44,46 were the most used, followed by "Raven's Coloured Progressive Matrices"25,32,33,43, "Stanford-Binet Intelligence Scale"26,39, "Chinese Binet IQ Test"24, "Raymond B. Cattell Test"35, "Wechsler Preschool and Primary Scale of Intelligence (WPPSI)"36, "Rui Wen Test Handbook"45 and "Form Board Test"42. The descriptive study23 used interviews with questionnaires prepared by qualified professionals as its data collection tool.

In the analysis of results, 23 studies showed a statistically significant difference between exposure to high and low doses of F. In three studies, the comparison of intellectual skill between the groups exposed to high and low F concentrations was not statistically significant30,34,46. The descriptive study23 reported the presence of alterations related to neurological manifestations in some groups with high-dose exposure (1.5–6.4 ppm). Table 1 shows details of all the characteristics of the included studies.

Table 1 Data extraction from included studies.

Risk of bias analysis

The quality of the studies was assessed based on risk of bias, confounding factors, and random occurrence. Eight studies were considered of low methodological quality and were classified as high risk of bias20,22,23,30,34,36,39,45. The other 19 articles were classified as low risk of bias and, despite having some problems, they were not serious enough to be classified as high risk of bias. In the "Study sample representative" domain, the problem items were the "Sampling method", "Sample size" and "Entry criteria/exclusion". In the "Sampling method", nine studies presented major problems (++) mainly related to the convenience sample. In the item "Sample size", two articles presented major problems, because they did not make a sample calculation and the sample was smaller than 50 participants. In the entry criteria/exclusion section, only two studies presented a minor problem due to co-exposure to arsenic and iodine.

For "Control group acceptable", the item "Definition of controls" presented two articles with minor problems (+) because they did not report the F concentration of the control group. Regarding "Matching/Randomization", nine studies did not mention randomization but performed matching, which was considered a minor problem (+). However, two articles mentioned neither randomization nor matching, which was considered a major problem (++).

In the domain "Quality of measurements and outcomes", the item with the most serious issues was "Blindness", as 18 studies did not adopt any kind of blinding, followed by "Quality control", with eight studies that did not describe the measurement method. Table 2 presents the risk of bias assessment of the 27 eligible articles.

Table 2 Quality assessment of the studies included in the review.

Level of evidence

The assessment of the level of certainty of the evidence was conducted through a narrative synthesis following the GRADE parameters for systematic reviews. The level of evidence of the studies was very low, both for studies evaluating IQ impairment and for the only study assessing other neurological manifestations, due to the observational nature of the study designs, as well as to methodological imprecision. For the studies that evaluated IQ impairment, a serious risk of bias was observed. The study evaluating neurological manifestations other than IQ impairment also presented a strongly suspected publication bias, given that these manifestations were measured with a questionnaire whose validation is unknown and which was not described in enough detail to be reproduced.

Although a narrative synthesis provides neither precise estimates nor measures of effect, it was concluded that the level of evidence of the studies taken together is not strong enough to affirm that high F exposure may produce neurological damage in children. Results are presented in Table 3.

Table 3 GRADE evidence profile table.

Quantitative analysis

Ten studies21,25,28,30,31,34,35,37,38,40 that provided sufficient data for the analysis were included in the meta-analysis. From the studies selected, it was only possible to run the meta-analysis for IQ, due to the scarcity of investigations on other neurological aspects. A total of 1383 individuals were exposed to high F levels and 1556 individuals to low levels. The results showed an association between high F exposure and decreased IQ (OR 3.88; 95% CI 2.41–6.23; p < 0.00001; I2 = 77%), demonstrating a deleterious effect of high levels of F on IQ (Fig. 2). This evidence was rated as very low (Table 3). Considerable heterogeneity (I2 = 77%, p < 0.00001, Fig. 2) and significant publication bias (p < 0.00001) (Fig. 3) were observed.
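As a consistency check on the pooled result, the standard error of the log odds ratio can be recovered from the reported CI, and from it an approximate z statistic and two-sided p-value. The Python sketch below assumes a symmetric Wald-type interval on the log scale with a critical value of 1.96; the function name is hypothetical and the calculation is ours, not part of the original analysis.

```python
import math
from typing import Tuple


def z_from_or_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Return (SE of log OR, z statistic, two-sided normal p-value)."""
    if not (0 < lower < odds_ratio < upper):
        raise ValueError("expected 0 < lower < OR < upper")
    se_log_or = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se_log_or, z, p_value


# Pooled estimate reported in the text: OR 3.88, 95% CI 2.41-6.23.
se, z, p = z_from_or_ci(3.88, 2.41, 6.23)
```

With the reported values, z comes out at roughly 5.6, which is consistent with the stated p < 0.00001.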

Figure 2

Forest plot of meta-analysis for ten studies (I2 = 77%). The association between chronic exposure to fluoride and cognitive deficit. CI, confidence interval; M-H, Mantel–Haenszel method. The figure was created using Review Manager v. 5.3 software (https://training.cochrane.org).

Figure 3

Funnel plot of meta-analysis for ten studies (I2 = 77%). The association between chronic exposure to fluoride and cognitive deficit (p < 0.001). The figure was created using Review Manager v. 5.3 software (https://training.cochrane.org).

After performing the sensitivity analysis, three studies were identified as a possible cause of publication bias25,30,31, and a low risk of publication bias was detected (p = 0.25; Figure A, Supplementary material 5) after the exclusion of these studies. However, considerable heterogeneity was still observed after the sensitivity analysis. When the three studies previously identified as a possible reason for publication bias were removed from the meta-analysis, the I2 index decreased from 77 to 62% (Table B, Supplementary material 5). Therefore, the interpretation of the meta-analysis results after sensitivity analysis is still limited due to the considerable heterogeneity across the studies.

Discussion

This systematic review and meta-analysis gathered evidence showing that, following the WHO classification of low and high levels in the drinking water, exposure to low/adequate water F levels is not associated with any neurological damage, while exposure to high levels is. The level of evidence for this association, however, was considered very low. Furthermore, IQ deficit was reported in the majority of the primary studies identified, and only one article reported other neurological manifestations.

Systematic reviews aim to gather all the available evidence in the literature to answer a guiding question according to predefined eligibility criteria. They use a well-designed, explicit and systematic methodology to minimize bias, generating reliable results, answers to raised questions and conclusions about certain problems, thus helping in decision making47,48. According to the Cochrane handbook for systematic reviews, this type of study has the following main characteristics: clear and well-defined objectives that follow the pre-established eligibility criteria; a methodology that is easily reproducible, well designed and transparent; a comprehensive search that identifies all studies meeting the eligibility criteria; an evaluation of the validity of the results of the included studies, assessing the risk of bias; and a presentation of all characteristics of the studies, including their results.

Combined with the qualitative synthesis, the meta-analysis brings together the quantitative data of the selected studies, making it possible to estimate the effects of the evidence and to assess whether or not it confirms the individual results of the studies included in the systematic review15. After these qualitative and quantitative analyses, the GRADE tool helps to compile all the results obtained in the systematic review in order to promote an analysis of evidence and its recommendations for an evidence-based practice. This assessment has four levels of recommendations: very low, low, moderate and high.

Despite some variations in the literature on the F concentrations in the drinking water regarded as both effective and safe, it has been often reported that 1 mg/L is the “optimum level”13,14 and, as previously mentioned, the concentrations may be adjusted at 0.7–1.2 mg/L, depending on climate, local environment and other sources of F6. In line with the above-mentioned observation, the 2017-updated edition of the WHO guidelines for drinking-water quality suggested that F levels must be within the 0.5–1.0 mg/L range in order to promote maximum caries-preventive benefits with minimum risk of dental fluorosis13,14. This justifies the threshold set in the present study to dichotomize F exposure into “low” and “high” categories. This is also more relevant from a public health standpoint, given that artificially fluoridated water facilities must comply with the aforementioned levels, whereas higher concentrations are usually related to focal points in areas in which F is naturally present in the water.

The mechanisms by which F can interfere with child neurodevelopment are associated with damage to nervous cells. Evidence suggests that chronic exposure to F in the prenatal and neonatal periods is potentially toxic to the metabolism and physiology of neuronal and glial cells, which leads to changes in processes related to memory and learning9,49,50,51. This is due to the ability of F to cross the placental and blood–brain barriers, especially in developing individuals, who are more susceptible to changes caused by F because they have greater permeability of this barrier and defense mechanisms that are still immature49,52,53,54. In addition, F can influence membrane ion channels, through interaction with the Ras protein, leading to changes in ion flow and nerve cell volume, which can lead to metabolic disturbances, changes in cell function and altered transmission of nerve impulses49,55.

According to the WHO, neurological disorders are multifactorial clinical conditions that may be characterized by signs and symptoms of different kinds, such as physical functioning limitations, behavioral problems, psychosocial limitations, and communicative and cognitive impairments56. Among these features, our study focused on cognitive functions due to the approach taken by the selected studies. In this sense, it is important to highlight that several techniques, tests and protocols to evaluate the cognitive functions are available57, since this central function may be characterized as a complex set of processes that aim to classify, recognize and comprehend information through reasoning, learning and execution58.

In this context, aiming to evaluate cognitive functions of people exposed to F, the researchers from the selected studies used a variety of IQ tests, as previously mentioned. As a result, different cognitive abilities were evaluated, without standardized and homogeneous parameters among the tests. Matzel and Sauce59 suggested a hierarchical model of intelligence, in which the general ability, i.e., intelligence, results from several domains of ability, such as reasoning, processing speed, memory and comprehension, which are evaluated by different methodologies. The Stanford-Binet IQ method, for example, includes tests of different abilities and estimates intelligence from the aggregate performance across the tests. In contrast, the Raven's Standard Progressive Matrices is based on a single ability, and its main feature is a progressive increase in the difficulty of perceptual reasoning60.

The studies included individuals ranging from 6 to 18 years of age. From an epidemiological point of view, this is not ideal, because intelligence tests were applied to participants with very different degrees of neurodevelopment. Data extraction indicates that all eligible studies were concentrated in the Asian continent. These data reflect the remarkable influence of geography on the epidemiology of clinical manifestations resulting from F exposure. The natural occurrence of high concentrations of fluoridated compounds in the drinking water used by rural communities increases their susceptibility to the adverse effects of F. Considering this aspect, one systematic review set out to evaluate the neurotoxic effects of F from studies conducted specifically in Chinese territory9, due to the high number of publications on this subject, whose dissemination is sometimes restricted by the language barrier.

The methodological quality analyses of the studies detected serious problems related to the quality of sample, measurements and outcomes. There were also problems related to the absence of randomization, sample size calculation and blinding, which increase the risk of bias and limit the inference capacity of studies on the neurotoxic effects of F.

Most studies did not assess the individual level of exposure to F, i.e., by urinary F samples. The F concentration in drinking water in regions with high and low F levels was the most reported method. However, there were also studies that used secondary data or did not report the F content in water, which significantly compromises the findings of these investigations. Furthermore, it should be considered that some studies used creatinine-adjusted urinary F concentrations to account for urinary dilution which may cause an additional bias61, since renal dysfunction in children may be associated with neurocognitive impairments62.

Another point worth mentioning is the increased risk of water contamination by other substances in the areas of naturally occurring F. Although some authors consider it unlikely that the effects attributed to F neurotoxicity can be triggered by other contaminants9, it is possible that the absence of control in relation to these parameters generates confounding factors. To ensure the balance of electrical charges, water with higher concentrations of endemically occurring F must contain higher concentrations of positive ions to balance out the F. This may affect the pH of the water or result in greater contamination by electropositive water contaminants, for example aluminum, zinc, arsenic, lead, mercury, and other metals and metalloids61.

Following the parameters of GRADE, the level of evidence was considered very low even for individuals exposed to high doses of F, due to imprecision problems (Table 3). This result is related to the types of studies included in this systematic review, as the level of evidence in observational studies starts at a low level, which can only increase if the study meets the other criteria of this evaluation. Despite the large number of participants in the analysis, the imprecision detected can be explained by possible methodological disparities among the studies that might interfere with the analysis of intelligence quotient (IQ) and neurological manifestations.

Another important limitation to be considered is the predominance of cross-sectional studies in this systematic review. Cross-sectional and ecological studies do not allow the establishment of cause-and-effect relationships. They are useful for investigating the effect of environmental exposures related to acute processes, as the time interval between exposure and measurement of physiological parameters is short. Therefore, cross-sectional studies are not the ideal model to assess the effect of chronic F exposure on a parameter such as human intelligence61. Longitudinal studies, on the other hand, are considered the most appropriate to assess chronic conditions, because the long-term follow-up of individuals makes it possible to infer causality63.

To sum up, although the selected studies showed an association between F exposure and IQ deficit, this association was only observed for individuals exposed to levels above those regarded as safe, and the certainty of the evidence for this association is very low. Within the above-mentioned limitations, the results of the present systematic review demonstrated that exposure to fluoridated water at levels recommended by the WHO can be considered safe, as it is not associated with IQ impairment.

Conclusion

Although the findings of this meta-analysis indicated that IQ damage can be triggered only by exposure to F at levels that exceed those recommended as a public health measure, the high heterogeneity observed compromises the final conclusions obtained by quantitative analyses. Thus, based on the evidence available on the topic, it is not possible to state either an association or the lack of an association between F exposure and any neurological disorder.