Ambient Air Pollution Exposure Assessments in Fertility Studies: a Systematic Review and Guide for Reproductive Epidemiologists

We reviewed the exposure assessments of ambient air pollution used in studies of fertility, fecundability, and pregnancy loss. Comprehensive literature searches were performed in the PubMed, Web of Science, and Scopus databases. Of 168 total studies, 45 met the eligibility criteria and were included in the review. We find that 69% of fertility and pregnancy loss studies have used one-dimensional proximity models or surface monitor data, while only 35% have used improved models, such as land-use regression models (4%), dispersion/chemical transport models (11%), or fusion models (20%). No published studies have used personal air monitors. Although air pollution exposure models have improved vastly over the past decade, moving from simple, one-dimensional distance measures or air monitor data to models that incorporate physicochemical properties and offer better predictive accuracy, precision, and spatiotemporal variability and resolution, the fertility literature has yet to fully incorporate these new methods. We describe each of these air pollution exposure models, assess the strengths and limitations of each, and summarize the findings of the literature on ambient air pollution and fertility that applies each method.


Introduction
Recent reviews of the literature regarding the associations of ambient air pollution with fertility and pregnancy loss have addressed important issues for epidemiology, including the synthesis of key results in human [1-3] and animal studies [4-6], the comparison of fertility in the general population and among those attempting to conceive through assisted reproductive technology (ART) [1,2,5,6], considerations of biological mechanisms [4,6], and the assessment of the relevant timing of exposure for these outcomes [3,4]. These reviews highlight the associations that have been found between fertility and pregnancy loss and the criteria air pollutants identified by the U.S. Environmental Protection Agency (EPA), including ground-level ozone (O3), particulate matter (PM), carbon monoxide (CO), sulfur dioxide (SO2), nitrogen oxides (NOx), and lead (Pb) [7]. However, none of these reviews assesses or compares the air pollution exposure assessment methods used in each study, which may account for some of the variability in findings. As a result, findings in the field are difficult to contextualize, and comparisons across studies are limited. Further, the majority of studies use proximity models or surface monitor data to assess air pollution exposure, while the use of newer and more sophisticated exposure assessment methods is rare.
In this systematic review, we aim to summarize and compare the results from studies on criteria air pollutants and spontaneous fertility and pregnancy loss in humans within and across distinct exposure assessment techniques and recommend air pollution exposure assessment methods for future research in this subject area. Here, we consider eight commonly used approaches to air pollution exposure assessment: (1) proximity models, (2) surface monitor data, (3) land-use regression models, (4) dispersion and chemical transport models, (5) fusion models, (6) kriging and other geostatistical methods, (7) satellite data, and (8) personal monitors.

Air Pollution Modeling in the Fertility Literature
Overall, we found that the majority of studies on air pollution and fertility have not yet incorporated new methods of exposure assessment for ambient air pollution, which offer improved predictive accuracy and precision, increased spatiotemporal variability and resolution, and incorporation of physicochemical properties. These methods have been applied to outcomes such as mortality [8] and lung cancer [9], resulting in important contributions to our understanding of air pollution health effects. From the simplest to the most computationally or logistically intensive, models used to assess air pollution exposure in fertility studies include (1) proximity models (9% of studies reviewed), (2) surface monitor data (60%), (3) land-use regression models (4%), (4) dispersion and chemical transport models (11%), and (5) fusion models (20%) (Table 1). We are not aware of any fertility studies that use kriging and other pure geospatial methods, pure satellite models, or personal monitors to assess air pollution exposure, which all have advantages that we discuss below. Individual study exposures, study design, and findings organized by the outcome and exposure assessment method are presented in Table 2.

Proximity Models
Proximity models are the simplest method for estimating air pollution exposure in epidemiologic studies and are used as the air pollution exposure assessment in 9% of fertility studies. Proximity models estimate pollution exposures as the distance from a pollution source to a fixed participant location, allowing the estimation of individual-level exposures at that location [55,56]. While proximity models are often inexpensive to implement as they do not require the monitoring or estimation of pollutants, they have several limitations. First, these models are a proxy but do not measure pollutants and do not typically account for pollution dispersion, chemical transformation of pollutants, meteorologic effects on pollutants, land-use, and topography. Second, while these models may offer insight into a recommended distance needed from roadways to communities to mitigate health consequences for city planning purposes, proximity models are non-specific and do not provide data on individual pollutants and disease etiology [55]. The use of proximity models is most appropriate when monitoring data are sparse and the study domain size is small (i.e., neighborhoods or small cities), such that the spatial resolution of available models does not allow for enough variability in exposure estimates.
In the fertility literature, four studies have used proximity models to estimate exposure to pollutants generated by vehicle exhaust [10,11,19,35] (Table 1). Three of these studies defined the exposure as the distance from participant residences to any Class A roadway, as defined by U.S. Census class codes [10,11,35]. Class A roadways include three categories: Class A1 (primary roads, typically interstate highways, with limited access, a division between the opposing directions of traffic, and defined exits), Class A2 (primary major, non-interstate highways, and major roads without access restrictions), and Class A3 (smaller, secondary roads, usually with more than two lanes) [10]. In one study, the distance to the roadway was calculated from participants' residential addresses to the nearest Class A road segment, and results showed that living closer to a Class A roadway was associated with increased risk of infertility, defined as failure to conceive after at least 12 months of unprotected intercourse [10]. In another study, Mendola et al. (2017) estimated exposure as the distance from the participants' residences to the nearest Class A major roadway, finding that the likelihood of conceiving rose by 3% for every additional 200 m between the couple's residence and the major roadway [11]. A third study that also classified the exposure as the distance from the participants' homes to the nearest Class A roadway found that women living within 100 m of a Class A roadway had a higher estimated risk of stillbirth (1.75; 95% CI 0.82-3.76) than those living more than 200 m away from a Class A roadway, although the confidence interval included the null [35].
More complex proximity models can distinguish analyses by road subtype (Class A1, A2, A3, and others) or incorporate traffic flow and land use [55]. The fourth study using proximity models incorporated traffic flow by assigning the annual average daily traffic (AADT) counts (according to the California Department of Transportation) to principal arterial interstates, principal arterial freeways and highways, minor arterials, and major and minor collectors [19]. The authors then estimated residential traffic exposure for participants based on the traffic counts for all road segments within a 300-m radius of each woman's residence [19]. In the full sample, these traffic measures were not associated with increased risk for spontaneous abortion, but in subgroup analyses limited to African-American women and non-smokers, these measures were associated with increased risk for spontaneous abortion [19].
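As an illustration of the two proximity metrics described in this section (distance to the nearest roadway and traffic counts within a fixed radius of the residence), the following is a minimal sketch and not the method of any study reviewed here. It assumes, for simplicity, that each road segment is reduced to a single representative point with an AADT value; real analyses would measure distance to road geometries in a GIS. The function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_road_distance_m(residence, road_points):
    """Distance (m) from a residence to the closest road point.

    residence: (lat, lon); road_points: iterable of (lat, lon, aadt).
    """
    return min(haversine_m(*residence, lat, lon) for lat, lon, _aadt in road_points)


def traffic_within_radius(residence, road_points, radius_m=300.0):
    """Sum of AADT counts over road points within radius_m of the residence."""
    return sum(aadt for lat, lon, aadt in road_points
               if haversine_m(*residence, lat, lon) <= radius_m)


# Hypothetical example: one residence and three road points (lat, lon, AADT).
residence = (0.0, 0.0)
road_points = [(0.0, 0.001, 12000), (0.0, 0.002, 8000), (0.0, 0.01, 30000)]

distance = nearest_road_distance_m(residence, road_points)
traffic = traffic_within_radius(residence, road_points, radius_m=300.0)
```

Neither quantity is a pollutant concentration; both remain proxies, which is the central limitation of proximity models noted above.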

Surface Monitor Data
Surface monitor data are used to define air pollution exposure in 60% of fertility studies (Table 1). These data are often the product of national regulatory networks, like the Air Quality System from the United States Environmental Protection Agency (U.S. EPA) [57]. Estimating exposures through monitor data has many benefits: the data are readily accessible to researchers, use consistent monitoring methods, include refined temporal measures (often hourly), and offer long-term historical data [58]. For example, the U.S. EPA's Air Quality System offers monitoring data for some pollutants since 1980 [57]. Consequently, surface monitor data are particularly useful for examining changes over time and thus could be used in projects with a case-crossover design. Despite these strengths, monitor data are sparse, as most counties in the USA do not have a regulatory air pollution monitor [56]. Fertility studies that use surface monitor data to estimate exposures have taken a variety of approaches to address this limitation (Fig. 2) … [49], or zip code centroids [29,48] or postal codes of participant residence [50].
Of the studies using surface monitor data, 12 of the 26 studies (46%) assessing NOx (including NO and NO2) found an association between high exposure to NOx and a poor fertility outcome (including low fertility, low fecundability, and high pregnancy loss) (Fig. 3). Associations for high exposure to pollutants and poor fertility outcomes were also found in 13 of the 20 studies that assessed SO2 (65%), 6 of the 17 studies that assessed CO (35%), 7 of the 14 studies that assessed ozone (50%), 9 of the 16 studies that assessed PM2.5 (56%), 8 of the 12 studies that assessed PM10 (67%), and in the one study that assessed PM10-2.5 (100%) (Fig. 3). Nonetheless, these studies varied widely in defining windows of exposure (Table 2), and significant associations were not found in every window of exposure assessed in each study.

Land-Use Regression Models
Just 4% of studies assessing the effect of air pollution on fertility use land-use regression (LUR) models as the exposure assessment method. LUR models, which use multivariable linear regression models to estimate the spatial distribution of pollutant concentrations [58,59], were first used for air pollution modeling in 1997 [60]. These models incorporate air monitor data, emissions data, land-use data (traffic, population density, etc.), and meteorological data (altitude, wind speed) to predict pollutant concentrations at point-specific locations where there are no monitor-based data [58,59,61,62]. Monitoring data used for LUR models are usually based on ambient monitoring networks [57], but some city-wide models are developed based on study-specific temporary networks of monitors. The strengths of LUR models include their ability to (1) predict at unmonitored locations at an address or point-level spatial scale, (2) use readily available ambient monitoring data, (3) use readily available GIS datasets for covariates, and (4) deliver a meaningful interpretation of coefficients. Nonetheless, these models have a few key limitations. Primarily, while LUR models can in theory predict at any spatial and temporal resolution, the predicted variability is often underestimated. Mobile monitoring studies have shown significantly more variability in air pollution than LUR models can predict [63,64]. The ability of LUR models to capture local variability is tied to the availability and quality of local covariates, like construction sites and gas stations, which are not often readily available [63,64]. Some researchers have attempted to address this limitation with information from Google Place, with mixed results [65].
Fig. 2 Methods used to assign exposures with surface monitor data
Fig. 3 Proportion of studies that found an association between pollutants and poor fertility outcomes. (a) Poor fertility outcomes are defined as low fertility/fecundability or high pregnancy loss. Blue percentages do not add to 100% because proximity models are not represented here and two studies use two types of models. See Table 1 for breakdown of the percentage of studies using each model
In the fertility literature, only two studies have used LUR models to estimate ambient air pollution exposures. Nieuwenhuijsen et al. (2014) assessed how the annual average concentration at the census tract level for NO2, NOx, PM10, PM10-2.5, and PM2.5 affected risk estimates for fertility in women and found that only PM10-2.5 was significantly associated with a reduced fertility rate [14]. Zhang et al. (2019) used a LUR model with a spatial resolution of a 1 km² grid to assess how exposure to PM2.5 during three pre-conception and seven post-conception windows of exposure influenced the odds of clinically recognized early pregnancy loss (CREPL) [30]. This study found an association between higher PM2.5 exposure and CREPL in two of the windows of exposure: during the second week post-conception and during the entire 1-month period post-conception [30]. Overall, among the studies that have used LUR models, neither the analyses of NOx (NO2 and NOx) nor the one study of PM10 found an association with poor fertility outcomes (Fig. 3). One of the two studies assessing PM2.5 and the one study assessing PM10-2.5 found associations with poor fertility outcomes. No studies using LUR models have assessed exposure to SO2, CO, or ozone in relation to our fertility outcomes of interest.
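The regression step at the core of the LUR models described in this section can be sketched as follows. This is an illustrative example under stated assumptions, not the model of either study above: the covariates (traffic density and population density), the monitor values, and the function names `fit_lur` and `predict_lur` are all hypothetical, and the data are synthetic so that the fitted coefficients can be checked against a known rule. It assumes NumPy is available.

```python
import numpy as np


def fit_lur(covariates, concentrations):
    """Fit an ordinary least squares LUR model.

    covariates: (n_sites, n_covariates) array of land-use predictors at monitors.
    concentrations: (n_sites,) array of measured pollutant concentrations.
    Returns the coefficient vector [intercept, beta_1, ..., beta_p].
    """
    X = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(X, concentrations, rcond=None)
    return beta


def predict_lur(beta, covariates):
    """Predict concentrations at locations described by their covariates."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    return X @ beta


# Hypothetical monitor sites: columns are [traffic density, population density].
site_covariates = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
                            [4.0, 3.0], [5.0, 5.0]])
# Synthetic NO2 values generated from a known linear rule (no noise).
site_no2 = 10.0 + 2.0 * site_covariates[:, 0] + 0.5 * site_covariates[:, 1]

beta = fit_lur(site_covariates, site_no2)

# Predict at an unmonitored residence using its land-use covariates.
residence_covariates = np.array([[2.5, 3.0]])
predicted_no2 = predict_lur(beta, residence_covariates)
```

Because the coefficients attach to named covariates, they can be interpreted directly, which is one of the strengths listed above. The sketch also shows why predictions can only vary as much as the available covariates do.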

Dispersion and Chemical Transport Models
Eleven percent of fertility studies use dispersion and chemical transport models to assess air pollution exposure. Dispersion models (DMs) mathematically simulate air pollution concentrations through the physical, fluid dynamical processes of transport and dispersion of air pollutants in the atmosphere from known or estimated emission inventories [61,66]. Transport is accounted for by incorporating meteorological data such as wind speed, velocity, and atmospheric boundary layer heights. Chemical transport models (CTMs) differ from dispersion models in that they simulate processes of chemical transformation, diffusion, and deposition to predict chemical concentrations in the atmosphere [61]. In practice, these models are often combined into dispersion/chemical transport models (D/CTMs) [61], which are deterministic models that incorporate principles from physics and chemistry and include data on emissions, meteorology, topography, and land-use to simulate how pollutants are distributed in the atmosphere [58,66]. DMs, CTMs, and D/CTMs can account for a variety of types of emission sources, such as point (waste sites, industrial facilities, etc.), area (also waste sites and industrial facilities), and line sources (roads), increasing their accuracy of prediction [55]. Mathematically, these models are formulated as a series of differential equations on a 4-dimensional space/time grid (e.g., longitude, latitude, elevation, and time).
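To give a sense of how a dispersion model turns emissions and meteorology into a concentration, the sketch below implements one of the simplest formulations, the steady-state Gaussian plume for a single point source with ground reflection. It is offered only as an illustration: the D/CTMs used in the studies reviewed here solve far richer systems on space/time grids. The linear growth of the dispersion parameters with downwind distance (`a_y`, `a_z`) is a crude assumption chosen for brevity, and all parameter names and values are hypothetical.

```python
import math


def gaussian_plume(q_g_s, wind_m_s, x_m, y_m, z_m, stack_height_m,
                   a_y=0.08, a_z=0.06):
    """Steady-state concentration (g/m^3) downwind of a point source.

    q_g_s: emission rate (g/s); wind_m_s: wind speed along the x axis (m/s).
    x_m, y_m, z_m: downwind, crosswind, and vertical receptor coordinates (m).
    stack_height_m: effective release height (m).
    a_y, a_z: assumed linear growth rates of the plume spread with distance.
    """
    if x_m <= 0:
        return 0.0  # the plume does not reach receptors upwind of the source
    sigma_y = a_y * x_m
    sigma_z = a_z * x_m
    crosswind = math.exp(-y_m ** 2 / (2 * sigma_y ** 2))
    # The second term mirrors the source below ground to model reflection.
    vertical = (math.exp(-(z_m - stack_height_m) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z_m + stack_height_m) ** 2 / (2 * sigma_z ** 2)))
    return q_g_s / (2 * math.pi * wind_m_s * sigma_y * sigma_z) * crosswind * vertical


# Hypothetical example: ground-level receptors 500 m downwind of a 20 m stack.
on_axis = gaussian_plume(100.0, 3.0, 500.0, 0.0, 0.0, 20.0)
off_axis = gaussian_plume(100.0, 3.0, 500.0, 50.0, 0.0, 20.0)
```

Even in this reduced form, the predicted concentration depends directly on the emission rate and wind speed supplied, which illustrates why the quality of emission and meteorological inputs matters so much for these models.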
While these models attempt to improve upon LUR models through the addition of physical and chemical processes, their performance is highly dependent on the quality of the input data [58]. Some comparisons of the performance of LUR models and D/CTMs for NO2 have found that D/CTMs performed better than LUR models for monitored and modeled concentrations on multiple sites, while other studies have seen moderate to good correlations between LUR models and D/CTMs [61]. Compared to LUR models, D/CTMs often produce estimates with higher temporal resolution but lower spatial resolution [67] since emission and meteorology predictions are more precise across time than space. Moreover, the computational intensity of D/CTMs scales directly with the spatial resolution. D/CTMs provide the benefit of not requiring a dense network of surface monitors for exposure estimation [62]. Further, D/CTMs incorporate explicit physics and chemistry so that they can better predict the secondary formation of pollutants. Nonetheless, these models have important limitations. Most notably, D/CTMs require a larger quantity and variety of input data than simpler models, and these data can be expensive and time-consuming to prepare and require D/CTM-specific expertise [55,58,62]. D/CTMs are suitable for research questions that require a refined temporal resolution, a large spatial domain (e.g., national or global), and circumstances in which changes in emissions are investigated [67].
In the fertility literature, D/CTMs have been used to estimate exposures with monthly averages at 20 × 20 m grid points [51], with monthly averages at a participant's place of residence [31], with weekly averages for an entire city district [22], with daily averages for postcode centroids [52], and at the municipality level (timing not specified) [53]. Of these studies, one of the five studies assessing NOx found an association between greater exposure to NOx and a poor fertility outcome (including low fertility/fecundability and high pregnancy loss) (Fig. 3). More specifically, this study found an association between higher NO2 exposure during the 16th gestational week and a lower relative risk of live birth [22]. Of the three studies that assessed PM10, only one found an association with a poor fertility outcome, specifically a higher risk of miscarriage among participants without prior miscarriages [31]. Only one study assessed ozone, finding that high ozone exposure in the first and second trimester was associated with a higher risk of stillbirth [51]. The one study that assessed exposure to SO2 did not find a significant association with stillbirth [53], and the one study that assessed exposure to PM2.5 did not find an association with stillbirth [51]. No studies that assessed exposure to CO or PM10-2.5 used a D/CTM.

Fusion Models
Fusion models combine any of the above methods and may incorporate monitoring data, emissions, meteorology, land-use, satellite imagery, or model components from LUR models, machine learning, and D/CTMs [55]. For example, D/CTM output may be used as a covariate in a LUR model, which brings the physical and chemical properties of the D/CTM into the LUR model while effectively calibrating the D/CTM to ground-based monitoring data. Moreover, LUR models usually benefit from the incorporation of satellite imagery data, particularly in data-sparse regions, or from the integration of a LUR model into a kriging model [68]. Generally, LUR models and D/CTMs are improved by incorporating them into a kriging or Gaussian process model.
In comparison to other air pollution exposure models, fusion models typically reduce prediction variance and bias due to their ability to capitalize on the advantages of their component parts [55,69]. Nonetheless, fusion models have a few important limitations. First, fusion models may be limited by the same concerns of their constituent model parts; for example, when combining a model with surface monitor data, the fusion model will be limited by the density of the surface monitor network [55]. Second, as models use varying temporal and spatial scales, integration can be both complex and computationally challenging [55].
The fertility literature includes a variety of fusion models to estimate air pollution exposures, including those that integrate surface monitoring data and D/CTMs [10,15,18,32,34,54], satellite measures and D/CTMs [33], and surface monitoring data, satellite data, and D/CTMs [16,17]. Of the studies using fusion models, 1 of the 5 studies (20%) assessing NOx (including NO and NO2) found an association between high exposure to NOx and a poor fertility outcome (including low fertility, low fecundability, and high pregnancy loss) (Fig. 3). Associations for high exposure to pollutants and poor fertility outcomes were also found in all three studies that assessed ozone (100%), 8 of the 10 studies that assessed PM2.5 (80%), 3 of the 5 studies that assessed PM10 (60%), and both of the studies that assessed PM10-2.5 (100%).
Specifically, three studies identified an association between PM2.5 and a reduced fertility rate [16-18], while another found associations between PM10, PM2.5, and particularly PM10-2.5 and a higher frequency of infertility [10] (Table 2). Another study found that NO2 and O3 exposures were associated with a lower fecundability odds ratio (FOR), but that PM10 was associated with a greater FOR [15]. Three studies that used fusion models found associations between PM2.5 and miscarriage [32-34]: one found that higher O3 and PM2.5 exposure over the whole pregnancy was associated with a higher hazard ratio (HR) for pregnancy loss [34], one found in a case-crossover analysis that exposure to PM10, PM10-2.5, and PM2.5 in the year prior to pregnancy was associated with higher odds of spontaneous abortion (SAB) [32], and one found that PM2.5 exposure was associated with higher odds of SAB. Of the studies that investigated stillbirth, one found that O3 exposure in various windows of exposure during pregnancy was associated with stillbirth [54], while another found that PM2.5 was associated with higher odds of stillbirth [33].

Other Methods of Exposure Assessment Not Applied in Studies of Fertility
To our knowledge, air pollution exposure estimates based purely on kriging models or satellite data, and estimates based on personal monitor data, have not been used in fertility studies.
Kriging, also known as Gaussian process regression, is a common and well-established method for geostatistical analysis. Like inverse distance weighting (IDW), kriging interpolates values at unmeasured locations based on values from measured locations, such as surface monitors. However, unlike IDW, kriging provides a theoretical basis for the weights: interpolation is developed from spatial correlation, can integrate covariates, and quantifies and incorporates predictive uncertainty [70,71]. Kriging may be estimated through classical, maximum likelihood, and Bayesian approaches. The simplest approaches, simple and ordinary kriging, assume no spatial pattern in the mean and treat all spatial variation as a component of the error. However, this limits the inclusion of covariates and is generally less flexible [71]. Kriging approaches that incorporate ancillary information through the mean trend, referred to as universal kriging, kriging with an external drift, or land-use regression kriging, offer greater flexibility and more realistic interpolations of values than pure kriging or IDW. An early publication on kriging by Oliver et al. (1992) [72] includes a useful and simple description of the structure of kriging and applies it to estimate the risk of childhood cancer in UK electoral wards. Further descriptions of kriging/Gaussian processes, and their various iterations, are found in many textbooks [71,73], and the methods are supported in many programs including ArcGIS, QGIS, R, Python, and Matlab. To our knowledge, no fertility studies have assessed air pollution using a pure kriging or other geostatistical method. Nonetheless, kriging alone has been used in other reproductive health studies, including those assessing the effects of ambient air pollution on preterm birth [74,75] and on low birth weight [76,77].
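The following minimal sketch shows how ordinary kriging derives its weights from a model of spatial correlation and returns a predictive variance alongside the estimate. It is a sketch under stated assumptions rather than a recommended implementation: the exponential semivariogram and its `sill` and `range_` parameters are fixed by hand here, whereas in practice they would be estimated from the data, and the monitor coordinates and values are hypothetical. It assumes NumPy is available.

```python
import numpy as np


def exp_variogram(h, sill=1.0, range_=10.0):
    """Exponential semivariogram with no nugget (assumed form and parameters)."""
    return sill * (1.0 - np.exp(-h / range_))


def ordinary_kriging(coords, values, target, sill=1.0, range_=10.0):
    """Ordinary kriging prediction and kriging variance at one target point.

    coords: (n, 2) monitor coordinates; values: (n,) measured concentrations;
    target: (2,) coordinates of the unmonitored location.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Semivariances between all pairs of monitors.
    pair_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(pair_dist, sill, range_)
    A[n, n] = 0.0  # Lagrange multiplier row/column forces weights to sum to 1

    # Semivariances between each monitor and the target.
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(coords - target, axis=1), sill, range_)

    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = float(weights @ values)
    variance = float(weights @ b[:n] + lagrange)
    return prediction, variance


# Hypothetical monitors at the corners of a 10 x 10 km square.
monitor_coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
monitor_pm25 = [10.0, 12.0, 14.0, 16.0]

center_prediction, center_variance = ordinary_kriging(
    monitor_coords, monitor_pm25, (5.0, 5.0))
```

Unlike IDW, the weights come from solving a linear system built on the assumed spatial correlation, and the returned variance grows as the target moves away from the monitors.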
Satellite data are particularly useful in filling gaps in areas with few ground monitors or with poor D/CTM resolution. Though satellite data have not been used independently in fertility studies, they have been incorporated into fusion models [16,17,33]. Satellite data are derived from processed spectral imagery of the Earth's surface that is used to detect and classify cloud cover, greenness on the planet's surface, particulate matter, and the occurrence of gases in the atmosphere [78]. Particulate matter is measured by columnar aerosol optical depth (AOD), which is currently derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) and Multiangle Imaging Spectroradiometer (MISR) satellite instruments. An upcoming 2022 mission from the National Aeronautics and Space Administration (NASA), Multi-Angle Imager for Aerosols (MAIA), will specifically focus on producing satellite AOD data for epidemiologic studies to capture the variation of PM2.5 and PM10 [79]. More information on AOD measures can be found in Sorek-Hamer et al.'s (2016) review [78]. Other criteria pollutants can be measured by the TROPOspheric Monitoring Instrument (TROPOMI) on the Sentinel-5 Precursor satellite [80]. TROPOMI provides measurements for atmospheric pollutants from 2018 to near real-time at 0.01 arc degree resolution for NO2, O3, SO2, CO, methane (CH4), and formaldehyde (CH2O) [80]. Despite the advantages of satellite data, interpreting the ground level concentration directly from the satellite images can be confounded by cloud cover, reflectance, and complexities of atmospheric mixing [81]. Further, depending on the satellite orbit, the spatiotemporal coverage of the atmosphere may be geostationary and temporally consistent or cover different swaths of the globe at different time intervals. Numerous algorithms have been developed to interpolate across the missing space and time values in the data. Duncan et al. (2014) provide a thorough overview of satellite data, though some of the available instruments and algorithms have changed since that publication [81].
Unlike the other exposure assessments discussed, personal monitoring devices offer the opportunity to quantify individual-level exposure to pollutants within specific microenvironments [56,82]. Personal monitors may be active monitors that use a power source or passive samplers such as silicone wristbands [83,84], some of which are able to characterize hundreds of different pollutants [85]. To date, the high cost of personal monitors, concerns about measurement accuracy and precision, and the relatively high burden to the participant have limited the use of personal monitors in research [56] and restricted the sample size and duration of exposure assessment in existing studies [85]. Nonetheless, recent advances in personal monitor technology have lowered the costs and made monitors smaller and more manageable for participants, creating an opportunity for future work [85]. We are not aware of any fertility studies that have used personal monitors to assess air pollution exposures, though the pre-conception period and pregnancy may provide important areas of opportunity for the use of personal monitors in research, particularly since exposure windows may be short and discrete. In other reproductive health research, personal monitors have been used to assess acute air pollution exposure during each trimester in a study assessing the effect of air pollution on fetal growth and birth outcomes [86] and to assess exposure to polycyclic aromatic hydrocarbons in pregnant women [87].

Discussion
The aim of this project was not to review the point estimates in the air pollution and fertility literature, but rather to review the air pollution exposure assessments used in the human fertility field. Figure 3 shows a summary of the results from studies assessing the relationship between ambient air pollution and fertility and pregnancy loss. The differences in exposure/disease definitions and spatial/temporal scales of interest make general comparisons across studies difficult. However, the results from fusion models show larger total exposure variability and better spatial/temporal heterogeneity compared to surface monitor data, which should improve the power to detect effects. There were not enough studies using LUR models or D/CTMs to draw similar comparisons. More studies are needed to make claims about whether more complex models truly offer an improvement upon traditional methods used in epidemiology in assessing these outcomes.

Other Considerations for Exposure Assessment in Fertility Research
Aside from the type of air pollution exposure assessment method used, the heterogeneity in spatial and temporal resolution also makes findings in the fertility literature difficult to compare across studies.
The spatial resolution of air pollution exposure across the studies reviewed varies widely, as do the methods used to assign exposures to participants. Fertility studies that use surface monitor data have estimated pollutant exposure using three methods: (1) averaging monitor data regionally, without using individual participant addresses, or, using participant addresses, either (2) assigning data from the closest monitoring station [23,27,28,37-44] or (3) applying inverse distance weighting [29,48-50] to more refined geographic areas, including zip codes, postal codes, residential addresses, and places of work (Fig. 2). Most of the studies average monitor data regionally, in regions as broad as a province. In these cases, the heterogeneity in exposure estimates relies more heavily on temporal differences than geographic differences. For studies that do use individual participant addresses to assign exposures, some assign exposures at the address itself while others assign exposures to the zip or postal code centroids of the addresses. However, assigning exposures based on zip codes can lead to exposure misclassification, as zip codes were developed as mail delivery routes, and thus, they vary widely in size and shape, and they are not tied to potentially meaningful geographic data including municipal boundaries, land-use data, or human-activity patterns [88]. This is important since location-based exposure misclassification can cause a bias toward the null [89], dampening the ability to discern associations between environmental exposures and health outcomes. Moving away from surface monitor data, air pollution output from LUR models, D/CTMs, and fusion models generally takes the form of refined exposure estimates at the grid-cell level. In the fertility literature, most LUR models, D/CTMs, and fusion models provide air pollution estimates at 12-km or 20-km grids.
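The three ways of assigning surface monitor data to participants described above can be contrasted in a short sketch. It is illustrative only: the planar coordinates in kilometres, the inverse-distance power of 2, the monitor values, and the function names are assumptions made for the example and are not taken from the studies reviewed.

```python
import math


def regional_average(monitors):
    """Mean of all monitor concentrations in a region (no address needed)."""
    return sum(conc for _x, _y, conc in monitors) / len(monitors)


def nearest_monitor_exposure(location, monitors):
    """Concentration at the monitor closest to the participant location."""
    closest = min(monitors,
                  key=lambda m: math.hypot(location[0] - m[0], location[1] - m[1]))
    return closest[2]


def idw_exposure(location, monitors, power=2.0):
    """Inverse distance weighted concentration at the participant location.

    monitors: list of (x_km, y_km, concentration) in planar coordinates.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, conc in monitors:
        dist = math.hypot(location[0] - x, location[1] - y)
        if dist == 0.0:
            return conc  # location coincides with a monitor
        weight = 1.0 / dist ** power
        weighted_sum += weight * conc
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical region with two monitors and one residence 2 km from the first.
monitors = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
residence = (2.0, 0.0)

by_region = regional_average(monitors)
by_nearest = nearest_monitor_exposure(residence, monitors)
by_idw = idw_exposure(residence, monitors)
```

With a regional average, every participant in the region receives the same value, so any contrast in exposure comes from time rather than place, as noted above. The nearest-monitor and IDW assignments differ between residences but are still limited by how many monitors exist.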
Existing air pollution exposure methods offer data with set geographic resolutions that provide exposure estimates at points (latitude × longitude), grids, census tracts, zip codes, cities, or other areas. While it may be tempting to try to improve the accuracy of individual exposure assessment by using an exposure assessment method that provides data at a point in order to assign exposures at an exact residence or workplace, this strategy may actually increase exposure misclassification. While participants' homes may be where they accumulate most of their pollution exposure, realistically, many people spend hours a day away from their homes at work or school, running errands, commuting, socializing, exercising, etc. Consequently, assigning exposures to the 12- or 20-km grid cell of residence may more appropriately quantify a person's daily exposure. In considering the spatial resolution of exposures for humans, epidemiologists must strike a balance and choose an area that accurately reflects a participant's realistic daily exposure. For example, the authors of a recent miscarriage study attempted to improve the accuracy of their air pollution assessment by estimating exposures as 1/3 of the woman's exposure at her workplace and 2/3 of her exposure at her residence [23]. While these are important considerations, depending on the research question, the decision as to which exposure assessment method to use may ultimately be derived from other factors, including the pollutants of interest, the temporal resolution of a model, model type, or other characteristics. Last, these considerations are important when using existing air pollution data, but prospective studies that use personal monitors will capture daily exposures at an individual level, thereby mitigating this potential exposure misclassification.
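The workplace and residence weighting mentioned above amounts to a time-weighted average across locations. A minimal sketch follows; the 1/3 and 2/3 weights are those reported for the miscarriage study [23], while the concentration values, the dictionary layout, and the function name are hypothetical.

```python
def time_weighted_exposure(concentrations, time_fractions):
    """Weighted average of location-specific concentrations.

    concentrations: dict mapping location name to pollutant concentration.
    time_fractions: dict mapping location name to fraction of time spent there;
    the fractions must sum to 1.
    """
    if abs(sum(time_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(concentrations[location] * fraction
               for location, fraction in time_fractions.items())


# Hypothetical PM2.5 concentrations (ug/m3) at a participant's two locations.
concentrations = {"residence": 12.0, "workplace": 18.0}
time_fractions = {"residence": 2 / 3, "workplace": 1 / 3}

exposure = time_weighted_exposure(concentrations, time_fractions)
```

The same function extends to additional locations, such as commuting routes, whenever activity data are available to justify the fractions.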
In the studies reviewed here, the temporal resolution of the air pollution exposure ranges from yearly to daily pollutant concentrations that are used to calculate windows of susceptibility, with many studies assessing similar but slightly different exposure windows. For example, six studies that assessed the effect of pollutants on miscarriage defined different preconception windows of susceptibility [21,23,27,30,32,34], ranging from the average concentration over the 2 years prior to pregnancy [32] to the average concentration over the week prior to conception [30]. More detail on each study's windows of exposure is provided in Table 2. For each of these disparate windows of exposure, every study reviewed here averaged pollutants across their selected windows of exposure with the exception of Mahalingaiah et al. (2016) [10], who also added the cumulative average exposure over a time period to the analysis, and the time-series studies [20,22,24,36], which use daily averages. To our knowledge, no studies in the fertility literature have examined differences between the average and the peak (or extended peak) pollutant concentrations within an exposure window. This distinction may be important in order to better understand how extremes in pollutant exposures influence the biological mechanisms underpinning fertility and reproductive health.
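The distinction between the average and the peak concentration within an exposure window can be illustrated with a short sketch. It assumes daily concentrations stored in a dictionary keyed by date and a window defined as a fixed number of days before a reference date such as the estimated conception date; the function name and the data are hypothetical and do not reproduce any reviewed study's method.

```python
from datetime import date, timedelta
from statistics import mean

def window_summary(daily, reference_date, days_before):
    """Summarize daily concentrations over the days preceding a reference date.

    The window covers reference_date - days_before through reference_date - 1.
    Days with no recorded value are skipped, and n_days reports how many were used.
    """
    window = [reference_date - timedelta(days=k) for k in range(1, days_before + 1)]
    values = [daily[d] for d in window if d in daily]
    if not values:
        raise ValueError("no daily data available in the requested window")
    return {"mean": mean(values), "peak": max(values), "n_days": len(values)}

if __name__ == "__main__":
    # Hypothetical daily concentrations for the week before a reference date:
    # six ordinary days and one high-pollution day.
    ref = date(2020, 1, 8)
    daily = {ref - timedelta(days=k): 10.0 for k in range(1, 8)}
    daily[ref - timedelta(days=3)] = 45.0
    print(window_summary(daily, ref, 7))
```

In this hypothetical week, a single high-pollution day raises the window mean only modestly while the peak is several times the typical daily value, which is the kind of contrast that an average-only analysis cannot capture.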
Temporal considerations present specific challenges for reproductive health and fertility studies, which often require refined daily exposure data. While chronic exposure to air pollution may contribute to infertility or reproductive health consequences, research on fertility often requires exposure assessment methods with daily temporal data to define windows of susceptibility for ovulation, conception, gestational trimesters, discrete periods before a pregnancy loss, or other exposure windows of interest. Studies examining acute exposures may require even more refined temporal resolution, at the hourly level, which is available from some chemical transport models, including the US EPA's Community Multiscale Air Quality Modeling System (CMAQ) [90]. Further, while surface monitor data have many limitations, they do offer consistent monitoring methods, refined temporal measures, and long-term historical data, making them a potentially good choice for studies examining changes over time for limited geographic areas, such as city- or county-scale case-crossover study designs [91].
Currently, 69% of fertility studies examining air pollution use proximity models or surface monitor data to assess the air pollution exposure. Future research in this field would benefit from the incorporation of newer and more sophisticated modeling methods with LUR models, D/CTMs, and fusion models. Embracing these new technologies could improve the predictive accuracy, precision, spatiotemporal variability, and resolution of air pollution exposure assessment methods. This shift is particularly important for fertility work and reproductive health epidemiology more broadly, which requires refined exposures to define discrete and biologically meaningful windows of exposure.
Funding This research was supported by the intramural research program of the National Institute of Environmental Health Sciences under award number Z01ES103333 and by the Division of the National Toxicology Program of the National Institute of Environmental Health Sciences, National Institutes of Health.

Data Availability Not applicable.
Code Availability Not applicable.

Declarations
Ethics Approval Not applicable.

Conflict of Interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.