1 Introduction

Mining operations such as drilling, cutting, crushing, blasting, and material handling inherently generate dust (Suarthana et al. 2011; Perret et al. 2017). Dust degrades air quality in the mining area and causes severe health hazards such as lung disease (Onder and Yigit 2009; NAS 2018). Cumulative inhalation of respirable coal mine dust (RCMD) can lead to diseases such as coal workers’ pneumoconiosis (CWP), silicosis, mixed dust pneumoconiosis, dust-related diffuse fibrosis (DDF), and progressive massive fibrosis (PMF) (Schatzel 2009; NAS 2018). During the last two decades, lung diseases among coal miners have resurged in several countries, especially major coal-producing countries (Rahimi 2020). Owing to the lack of reliable and consistent statistics, the global prevalence of lung diseases related to coal mine dust is difficult to estimate (Shekarian et al. 2021a).

In the United States, after the implementation of the interim coal mine dust standard of 3.0 mg/m³ in 1970 and the final standard of 2.0 mg/m³ in 1972, the prevalence of CWP and concentrations of coal mine dust declined substantially (Suarthana et al. 2011; MSHA 2014). However, a resurgence in disease incidence has been observed since the mid-1990s (NAS 2018). A study conducted by the National Institute for Occupational Safety and Health (NIOSH) in 2005 reported rapidly progressive disease among coal miners. The study revealed that, among the miners with pneumoconiosis, 35% were diagnosed with rapidly progressive disease, including 15% with the most severe and often lethal form (i.e., PMF) (Antao et al. 2005). Medical surveys further revealed that relatively young miners, who have spent their entire employment under modern dust control regulations, are being affected (Perret et al. 2017; Amandus and Piacitelli 1987). Such rapidly progressive pneumoconiosis has been observed specifically in the Appalachian coal region (Laney et al. 2010; Gamble et al. 2012; Blackley et al. 2014; Dwyer-Lindgren et al. 2017).

The geographical clustering of pneumoconiosis has been attributed mainly to the mining of thin coal seams in this area, which requires cutting a significant amount of roof and floor rock. These out-of-seam materials are mostly composed of silica (i.e., quartz) and silicate minerals (cyclosilicates, phyllosilicates, tectosilicates, etc.) (Johann-Essex et al. 2017), which have been associated with severe lung diseases (Schatzel 2009). Other factors such as duration and level of exposure, mine operation type, coal and rock-strata geological conditions, dust characteristics (i.e., size, shape, mineralogy, elemental content), dust mitigation techniques, mine size, coal rank, and advancements in cutting technologies have also been suggested as contributors to the recent unexpected trend (Antao et al. 2006; Laney et al. 2010; Laney and Attfield 2010; Gamble et al. 2012; Blackley et al. 2014; Johann-Essex et al. 2017; Graber et al. 2017; Dwyer-Lindgren et al. 2017; Sarver et al. 2019a, 2020; Shekarian 2020; Shekarian et al. 2021a, 2021b).

The recent resurgence of coal mine dust lung diseases has raised concern in the scientific and regulatory community. To address the concern, the Mine Safety and Health Administration (MSHA) issued the 2014 dust rule, which changed the RCMD measurement technology, allowable limits, and sampling protocol for RCMD exposure (NAS 2018). However, many researchers have questioned whether the reduced limit would actually target the root problem (e.g., Amandus and Piacitelli 1987; Antao et al. 2005). A recent report by the National Academies of Sciences, Engineering, and Medicine (NAS) highlighted the need for research on respirable coal dust, including characterization and deposition, sampling protocols, and monitoring strategies for controlling miners’ exposure to RCMD (NAS 2018).

This study was motivated by recommendations 8, 10, and 12 of the NAS report, which call for a greater understanding of the relationships between mining activities, technology changes in coal extraction, and the increase in coal workers’ diseases. The objective of this study is to determine the effect of several factors, namely mine size, coal rank, mine operation type, geographical location, and coal seam thickness, on the prevalence of CWP. To achieve this objective, a detailed statistical analysis of the relationship between the rate of CWP cases and the potential contributing factors was conducted.

2 Materials and methods

2.1 Data acquisition

A comprehensive dataset was extracted from the MSHA accident/injury and MSHA employee/production files. The number of CWP cases, mine ID (as the unique key for the data), and mine operation type were obtained from the MSHA accident/injury file. The number of employees (as an indicator of operation size), coal production rate, number of coal mines, coal seam thickness, geographic location, state, county, coal rank, and mine ID were collected from the MSHA employee/production file. These two groups of information were then merged on the mine ID using SQL data management software. The SQL summary report provided a total of 123,589 mine-year observations (i.e., one observation per mine i per year t, summed over all mines and years) from 1986 to 2018. These observations were obtained from 21,396 mine operations categorized as underground, surface, and others (i.e., auger, milling and preparation plant, office, culm banks, independent shops and yards, and surface at underground mine), as follows:

  (1) Number of observations for underground mines: X = 5477 (mine IDs) × 5.42 (avg.) = 29,707

  (2) Number of observations for surface mines: Y = 5759 (mine IDs) × 5.67 (avg.) = 32,643

  (3) Number of observations for other operations: Z = 10,160 (mine IDs) × 6.03 (avg.) = 61,239

where avg. is the average number of yearly observations per mine in each category from 1986 to 2018, rounded to two decimals.
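As a quick consistency check on the counts above, the following minimal Python sketch recomputes the totals and the per-category averages from the reported figures. The dictionary names are illustrative choices, not part of the original data files.

```python
# Mine counts and mine-year observation counts as reported in the text.
mines = {"underground": 5477, "surface": 5759, "other": 10160}
observations = {"underground": 29707, "surface": 32643, "other": 61239}

total_mines = sum(mines.values())
total_observations = sum(observations.values())

# Average number of yearly observations per mine in each category,
# rounded to two decimals as in the text.
averages = {k: round(observations[k] / mines[k], 2) for k in mines}

print(total_mines)         # 21396
print(total_observations)  # 123589
print(averages)
```

The totals match the 21,396 operations and 123,589 mine-year observations stated in the text. Because the averages are rounded, multiplying them by the mine counts reproduces the observation counts only approximately.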

All observations in the analysis included the following information:

  (1) MSHA mine identification (mine ID);

  (2) Mine operation (surface, underground, and others);

  (3) Geographic location (Appalachian, Western, and Interior);

  (4) State and county;

  (5) The average number of employees in each mine per year (as an indicator of mine size);

  (6) Total number of employee hours per year;

  (7) Number of reported CWP cases by mine per year;

  (8) Average coal seam height in each mine;

  (9) Coal rank of each mine.

In this study, the average number of employees was used as an indicator of mine size, classified into three categories: small (fewer than 50 employees), medium (between 50 and 100), and large (more than 100) (Laney and Attfield 2010; Blackley et al. 2014; Shekarian et al. 2021b). Furthermore, coal seams were categorized into three groups based on the average seam height: thin (less than 40 inches), medium (between 40 and 75 inches), and thick (more than 75 inches). Available coal rank data were categorized into the two major coal ranks in the U.S., anthracite and bituminous (the latter including both bituminous and sub-bituminous) (Table 1).
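The categorization rules above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the treatment of values that fall exactly on a boundary (50 or 100 employees, 40 or 75 inches) is an assumption, since the text does not state it unambiguously.

```python
def classify_mine_size(avg_employees: float) -> str:
    """Small: fewer than 50 employees; medium: 50 to 100; large: more than 100."""
    if avg_employees < 50:
        return "small"
    if avg_employees <= 100:  # assumption: boundary values count as medium
        return "medium"
    return "large"


def classify_seam(avg_height_in: float) -> str:
    """Thin: under 40 inches; medium: 40 to 75 inches; thick: over 75 inches."""
    if avg_height_in < 40:
        return "thin"
    if avg_height_in <= 75:  # assumption: boundary values count as medium
        return "medium"
    return "thick"


print(classify_mine_size(12), classify_seam(36))   # small thin
print(classify_mine_size(250), classify_seam(90))  # large thick
```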

Table 1 Description of variables involved in the statistical analysis for U.S. coal mines, 1986–2018

The summary report of SQL data was used to conduct the statistical analysis, including descriptive statistics, correlation, and regression analysis. Available data were used to formulate hypotheses for the relationship between CWP rate and the contributing factors (i.e., mine operation, geographic location, mine size, coal rank, and coal seam thickness). Figure 1 illustrates the schematic of the methodology steps.

Fig. 1 Summary of data management and methodology for statistical analysis

2.2 Descriptive statistics

A total of 123,589 mine-year observations were included in the regression analysis. The dependent variable was the CWP rate, defined as the number of CWP cases per total employee hours in each year for each mine. Mine operation type, mine size, coal seam thickness, geographic location, and coal rank were considered as the independent variables. Table 1 summarizes the type, classification, and description of the classification of each variable. For regression modeling, each multiclass variable was broken into multiple binary dummy variables. A total of 14 binary variables were defined for the independent variables (i.e., 3 for mine operation type, 3 for mine size, 3 for geographic location, 3 for seam thickness, and 2 for coal rank).
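The dummy-variable construction can be illustrated with a short Python sketch. The variable and level names below are hypothetical labels chosen for readability; only the number of classes per variable comes from the text.

```python
# Class labels per independent variable (names are illustrative).
LEVELS = {
    "operation": ["underground", "surface", "other"],
    "mine_size": ["small", "medium", "large"],
    "region": ["Appalachian", "Western", "Interior"],
    "seam": ["thin", "medium", "thick"],
    "coal_rank": ["anthracite", "bituminous"],
}


def encode(record):
    """Expand a record of categorical values into 0/1 indicator variables."""
    row = {}
    for var, levels in LEVELS.items():
        for level in levels:
            row[f"{var}_{level}"] = int(record[var] == level)
    return row


# A made-up mine-year record, used only to show the encoding.
example = {
    "operation": "underground",
    "mine_size": "small",
    "region": "Appalachian",
    "seam": "thin",
    "coal_rank": "bituminous",
}
dummies = encode(example)
print(len(dummies))           # 14
print(sum(dummies.values()))  # 5
```

When such indicators enter a regression, one class per variable is typically left out as the reference category (for example, the Western region in the geographic comparisons reported later), so that each coefficient is read relative to that reference.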

2.3 Regression analysis

Because of the panel nature of the data, a linear regression model was used to analyze the relationship between the CWP rate and the independent variables. For a dependent variable Y and an independent variable X, the relationship is given by:

$$Y_{i,t} = \beta X_{i,t} + u_{i} + \varepsilon_{i,t}$$
(1)

where Y denotes the dependent variable, β is the coefficient, X is the independent variable, the subscript i stands for the mine ID, the subscript t represents the year, u is the mine-specific unobserved heterogeneity (i.e., factors constant over time but unobserved by the econometrician), and ε is the error term (i.e., the observation-specific error) (Bruin 2006; Benfratello 2014).

Estimation models commonly assume that errors are independent and identically distributed. In longitudinal data such as those used in this study, however, this independence does not hold: observations for each unit (mine) are correlated, because each mine is observed in several years. One possible solution is to account for this within-mine correlation in the model fitting. The method used here, generalized estimating equations (GEE), does so without requiring the full joint distribution of the observations to be specified. GEE estimates the marginal effect of covariates averaged across units (Fitzmaurice et al. 2008). Here, this can be interpreted as the overall effect of mining operation, mine size, geographic location, coal rank, and coal seam thickness on CWP among coal miners in U.S. coal mines during 1986–2018. GEE has been widely used for panel estimation (Pang 2017; Shah et al. 2017; Shekarian et al. 2021b), and it provided the best fit for our data. The models' covariance structure was specified as exchangeable, implying a shared correlation between observations within each mine. Furthermore, variance inflation factor (VIF), robust regression, and homoscedasticity analyses were performed to support accurate estimation results.
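To make the exchangeable structure concrete, the minimal Python sketch below builds the working correlation matrix for a single mine observed in several years. The function name and the value alpha = 0.3 are hypothetical; in a GEE fit the shared correlation is estimated from the data rather than chosen by hand.

```python
def exchangeable_corr(n_obs: int, alpha: float):
    """Working correlation for one mine: 1 on the diagonal, alpha elsewhere."""
    return [[1.0 if j == k else alpha for k in range(n_obs)]
            for j in range(n_obs)]


# One mine observed in 4 years, with an assumed within-mine correlation of 0.3.
R = exchangeable_corr(4, 0.3)
for row in R:
    print(row)
```

Every pair of years within the same mine shares the same correlation, regardless of how far apart the years are; observations from different mines are treated as uncorrelated. Statistical packages that implement GEE usually expose this as a selectable covariance structure, although the text does not state which software was used for the estimation.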

A comprehensive review was conducted to develop hypotheses and examine the relationship between CWP rate and mining factors in U.S. coal mines from 1986 to 2018 (Shekarian 2020). This systematic review found several mine-, individual-, and dust-specific factors that may contribute to the prevalence of lung diseases (Shekarian 2020). The selected database allows some of those factors to be studied, including mine operation type, geographic location, mine size, coal rank, and seam height. Therefore, the model was used to test the following hypotheses:

  • H1: Workers in underground coal mines are more likely to develop CWP.

  • H2: The coal region contributes to CWP incidence rate.

  • H3: Workers in smaller operations are more likely to develop CWP.

  • H4: The coal rank contributes to the CWP incidence rate.

  • H5: Workers in thin-seam mine operations are more likely to develop CWP.

3 Results and discussion

3.1 Descriptive statistics

Describing single variables, frequencies, and correlation coefficients is a standard technique in statistical analysis (Rahimi 2020). Descriptive statistical analysis of the data was conducted by computing the mean, standard deviation, minimum, and maximum of all the variables (Table 2). This information was determined for each of the independent variables, in underground and surface mines, from the 1986 to 2018 data. For underground mines, 93% of observations were in the Appalachian region, 3% in the Western region, and the rest in the Interior region. The majority (about 80%) of mines were small. Additionally, the highest number of CWP cases in the observations was 946 (West Virginia, Boone County). In surface mines, approximately 86% of mines were in the Appalachian region, 9% in the Interior region, and the rest (5%) in the Western region. As with underground mines, the majority of mines (86%) were classified as small. Additionally, the highest number of CWP cases in the observations was 175 (West Virginia, Logan County). The descriptive statistical analysis of the CWP rate for the various independent variables is discussed below.

Table 2 Descriptive statistics for variables in underground and surface coal mines in the U.S., 1986–2018

Mine operation type: Mine operation type is a significant determinant of RCMD exposure (NAS 2018). RCMD compositions at surface and underground mines differ because of the different operations in these mining types (Schatzel 2009; Landen et al. 2011; Thakur 2019). Moreover, underground coal workers are at a higher risk of CWP than surface mine workers because of the confined space and the limitations of artificial ventilation systems. Analysis of CWP by mine operation shows that, out of 7337 CWP cases, the majority (i.e., 76%) were reported in underground mines. Approximately 11% of the CWP cases were reported in surface mine operations. The rest (i.e., 13%) were workers in other mine operations, mainly mills or preparation plants (Fig. 2a). As shown by the chronological data of CWP cases in U.S. coal mines (Fig. 2b), the rate of CWP decreased following implementation of the permissible exposure limit (PEL). However, the prevalence of CWP increased in the 1990s.

Fig. 2 Number of CWP cases as a function of (a) mine operation type and (b) year in the United States during 1986–2018 (total number of CWP cases is 7337)

Geographic location: The geographic location of coal mines is an important factor in assessing the RCMD health risk. Regional variations in dust characteristics exist because of the geographical clustering of coal mines in the U.S. In Central Appalachia, for instance, mines may have more rock-strata-sourced dust than mines in other regions (Sarver et al. 2020). Amandus' study showed that coal workers in the eastern region of the Appalachian coal field, including West Virginia and Pennsylvania, are at a higher risk of CWP than workers in western U.S. mines (Amandus and Piacitelli 1987; Thakur 2019). The findings of Sarver’s study supported the hypothesis that RCMD characteristics differ substantially among mining regions. Understanding the differences in mineral and elemental compositions, as well as in particle size distributions of RCMD, among geographic locations sheds light on the recent CWP resurgence (Sarver et al. 2019b).

From 1986 to 2018, a total of 106 counties in 16 states across the country reported coal miners with CWP. The number and distribution of CWP cases among coal miners in different states and counties are shown in Fig. 3. The hot spot areas, including West Virginia, Kentucky, Virginia, and Pennsylvania, reported a higher number of CWP cases. Several factors may contribute to the higher prevalence of CWP in this region, including the high silica content of mines in the Appalachian region, thin coal seams containing a high percentage of quartz, small mine sizes, and longer shift hours resulting in coal and silica dust accumulation (Gamble et al. 2012; Sarver et al. 2019a).

Fig. 3 Distribution of CWP per (a) state and (b) county for underground, surface, and total data during 1986–2018 in the U.S.

Mine size: Several studies have identified mine size (measured by the number of underground miners employed) as a predictor of CWP risk among U.S. underground coal miners (Laney and Attfield 2010; Laney et al. 2012; Shekarian et al. 2021b). These studies indicated that workers in small mines are at an increased risk of CWP, but it was unknown whether abnormal lung function is also linked to mine size. Blackley et al. (2014) showed that mine size significantly affects CWP prevalence and lung function abnormality. The spirometry and radiographic analysis of 3770 coal miners in their study showed a higher risk of abnormal spirometry (18.5% vs. 13.8%, p < 0.01), CWP (10.8% vs. 5.2%, p < 0.01), and progressive massive fibrosis (2.4% vs. 1.1%, p < 0.01) in miners working in small mine operations than in those working in large operations. They also concluded that coal workers in small mines (i.e., with fewer than 50 employees) in Kentucky, Virginia, or West Virginia are at a higher risk of CWP than those in large mines (Blackley et al. 2014). Suarthana et al. (2011) found an association between decreasing mine size and the prevalence of CWP and PMF among coal miners in the U.S. One possible explanation is that smaller mines may have fewer health and safety resources than larger operations (Suarthana et al. 2011). Moreover, previous investigations indicated that the average concentrations of RCMD in small mines are higher (Antao et al. 2005; Antao et al. 2006; Suarthana et al. 2011; Blackley et al. 2014).

Size analysis of coal mines in the U.S. revealed that most underground and surface mines are small (Fig. 4a). The distribution of CWP per mine size indicated that the number of CWP cases in large mines is greater than that in small and medium mines. However, the total number of CWP cases in each size class does not necessarily show that workers in large mines are at a higher risk of CWP. Therefore, the rate of CWP per number of employees was calculated to compare the prevalence of CWP across mine sizes. The data showed that the rate of CWP in small mines is higher than that in medium and large mines (Fig. 4b).
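The distinction between case counts and rates can be illustrated with a small Python sketch. The numbers below are hypothetical and are not the study's data; they are chosen only to show how a size class with more cases can still have a lower rate.

```python
# Hypothetical illustration only: these are NOT the study's data.
groups = {
    "small": {"cases": 30, "employees": 1_000},
    "large": {"cases": 90, "employees": 9_000},
}

# CWP rate as a percentage of employees in each size class.
rates = {name: 100 * g["cases"] / g["employees"] for name, g in groups.items()}

for name, g in groups.items():
    print(name, g["cases"], f"{rates[name]:.1f}%")
```

In this made-up example the large class reports three times as many cases as the small class, yet its rate is one third of the small-class rate, which is why the comparison in Fig. 4b is based on rates rather than counts.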

Fig. 4 (a) Percentage of underground and surface mines and (b) rate of CWP (%) per mine size in the U.S. during 1986–2018

Coal seam thickness: Coal seam thickness is one of the potential contributing factors that could influence the prevalence of CWP among coal miners (Laney and Attfield 2010; Blackley et al. 2014). Seam height in coal mines varies based on the coal reserves' geographic location and geological properties. Suarthana et al. (2011) reported that the average coal seam thickness for central Appalachia mines is lower than that in other regions. Further to this review, it was concluded that CWP and abnormal lung function prevalence were likely associated with the low seam height and small mine size in the U.S. (Suarthana et al. 2011; Shekarian 2020).

The distribution of surface and underground mines across the three coal seam thickness classes was examined (Fig. S1 in Supplementary Material). Fewer than 11% of mines had thick coal seams, and the majority of mines had either thin or medium seam heights. In underground mines, the medium seam height was dominant (i.e., 44%), while the majority (i.e., 50%) of surface mines operated thin coal seams.

The distribution of CWP per coal seam height and mine size for both underground and surface coal mines was subsequently studied. The results showed that the CWP rate is higher in underground mines operating medium seams than in those operating thin or thick seams (Fig. 5a). In surface mines, however, the CWP rate was higher for thick seams than for thin and medium seams (Fig. 5b). Regardless of coal seam thickness, the majority of CWP cases in underground and surface mines were reported in small mines.

Fig. 5 Rate of CWP (%) by seam thickness and mine size in U.S. (a) underground and (b) surface mines during 1986–2018

Coal rank: Several studies confirmed that the risk of CWP is higher for higher coal rank, even at the same level of RCMD concentration (Antao et al. 2005; Antao et al. 2006). Gamble et al. (2011) proposed higher-rank coal as a plausible factor for CWP prevalence within the Appalachian region (NIOSH 2008; Gamble et al. 2011). In many bituminous coal mines, the higher prevalence of CWP has also been linked to a higher quartz content in respirable dust (Gamble et al. 2012). Previous studies indicated that the apparent link between coal rank and CWP may be attributed to the particle surface charge and mineralogical composition of RCMD (Gamble et al. 2011, 2012; Sellaro and Sarver 2014). However, the causal effects of coal rank have not been investigated in isolation.

The distribution of U.S. surface and underground coal mines by coal rank showed that bituminous mines account for about 95% of the coal operations (Fig. 6a). Compared to surface anthracite mines (59 mines), only a few active underground anthracite mines (9 mines) existed in 2018. The distribution of CWP by coal rank indicated that bituminous coal accounts for about 95% of the CWP rate (Fig. 6b, c).

Fig. 6 (a) Number of underground and surface mines and (b, c) rate of CWP (%) per coal rank in the U.S. during 1986–2018

3.2 Regression analysis

A correlation analysis was performed (at three significance levels of 0.01, 0.05, and 0.1) to determine the presence and strength of correlations among the variables considered in this study. It indicated no strong correlation between the independent variables (Tables S1 and S2 in Supplemental information). Thus, these variables can be used in multivariable regression modeling. Regression analysis of the relationship between CWP rate and the identified contributing factors was carried out using the GEE model. Table 3 shows the main results of the GEE analysis, which were used to test the hypotheses on the relationship between CWP and the independent variables (described in Sect. 2.3).

Table 3 GEE estimation results

H1: It was hypothesized that workers in underground mines are more likely to develop CWP than those in other operations. The regression analysis showed that coal workers in underground coal mines are at a higher risk of CWP than surface coal miners (β = 4.010, p < 0.01). It also showed that coal workers in other mine operations (including milling and preparation plants) are at a higher risk of CWP than workers at surface mines (β = 2.706, p < 0.05). Therefore, H1 is supported, and mine operation type is a significant factor contributing to CWP prevalence.

H2: Geographical location was hypothesized to be a contributing factor to the prevalence of CWP. The regression analysis showed a significant positive coefficient for both Appalachia (including West Virginia, Kentucky, Pennsylvania, Virginia, Alabama, Tennessee, Maryland, and Ohio) and Interior (including Illinois and Indiana) regions compared to the Western (including Wyoming, Texas, New Mexico, Oklahoma, Utah, and Colorado) geographic region. The statistical analysis showed that, compared to the Western region, underground coal workers in both Appalachia (β = 4.407, p < 0.01), and Interior (β = 3.750, p < 0.01) geographic regions are at a greater risk of CWP. Therefore, H2 is supported for underground coal mines. The result of regression for surface mines, with Western region as a reference, showed that surface coal workers in the Appalachian region are at a higher risk of CWP (β = 5.101, p < 0.01). The outcome of the regression model for Interior region was not statistically significant. Therefore, H2 is supported only for Appalachia vs. Western surface coal mines.

H3: The third hypothesis investigated how the size of mine could influence the prevalence of CWP among coal miners. We categorized the mine size based on the average number of employees in each mine (small: less than 50; medium: between 50 and 100; large: more than 100). The results of the statistical analysis indicated that underground coal workers in small mines are at a higher risk of CWP in comparison with workers at medium (β = − 1.961, p < 0.01) and large (β = − 1.879, p < 0.01) mines. Therefore, H3 is supported for the underground mines. In surface mines, coal workers in small mines are at a higher risk of CWP in comparison with medium size mine workers (β = − 1.277, p < 0.1). The results were not statistically significant for large operations. Therefore, H3 is supported only for the medium vs. small surface coal mines.

H4: It was hypothesized that coal rank contributes to the CWP incidence rate. The statistical analysis showed a significant relationship between CWP rates and bituminous coal rank in underground mines. It indicated that underground bituminous coal miners are at a higher risk of CWP than underground anthracite coal miners (β = 7.383, p < 0.01). Therefore, H4 is supported for underground mines. On the other hand, surface anthracite coal miners are at a higher risk of CWP than surface bituminous coal miners (β = −1.476, p < 0.01). Therefore, H4 is also supported for surface mines. It should be noted that the MSHA database classifies coal rank only as bituminous and anthracite. Only 0.3% of coal production in 2018 came from anthracite (Shekarian 2020).

H5: Finally, the fifth hypothesis examined how coal seam thickness could influence the prevalence of CWP among coal miners. The seam thickness was categorized into three groups based on the average seam thickness in each mine (thin: seam height ≤ 40 inches; medium: 40 inches < seam height ≤ 75 inches; thick: seam height > 75 inches). The GEE result indicated that coal workers in underground mines operating thin (β = 1.416, p < 0.05) and medium (β = 1.397, p < 0.01) seams are at a higher risk of CWP than those working in thick-seam underground operations. Therefore, H5 is supported for underground coal mines. The regression result for surface mines does not allow a conclusion for coal workers in thin-seam surface operations, but coal workers in medium-seam operations (β = −1.969, p < 0.01) are at a lower risk of CWP than workers in thick-seam mines. Therefore, H5 is not supported for the medium vs. thick-seam surface coal mines.

To examine the accuracy of the regression model results, VIF and homoscedasticity analyses were conducted. VIF identifies multicollinearity in regression models. Multicollinearity exists when independent variables in a regression model are correlated with one another. This is a concern because the independent variables should be independent, and its presence negatively influences the estimation results. Each of the VIF scores for the dataset was less than 5 (mean score of 1.75) (Table S3), indicating that the assumption of no multicollinearity was met.
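For intuition, the VIF of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing that predictor on all the others. The minimal Python sketch below covers only the special case of two predictors, where R_j² reduces to the squared Pearson correlation between them. The indicator vectors are hypothetical and are not taken from the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical 0/1 indicators for eight mine-year observations.
x1 = [1, 0, 1, 0, 1, 0, 1, 0]
x2 = [1, 1, 0, 0, 1, 1, 0, 0]  # uncorrelated with x1
x3 = [1, 0, 1, 0, 1, 0, 0, 1]  # moderately correlated with x1

vif_uncorrelated = vif_two_predictors(x1, x2)
vif_correlated = vif_two_predictors(x1, x3)
print(vif_uncorrelated, round(vif_correlated, 3))
```

A VIF of 1 means the predictor shares no linear information with the other, and the value grows as the correlation approaches ±1. With more than two predictors, R_j² must come from a full auxiliary regression, which this sketch does not implement.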

Homogeneity of variance of the residuals is one of the main assumptions of GEE. Under homoscedasticity, the variance of the residuals is roughly equal across all predicted values of the dependent variable, which makes the regression estimates unbiased, consistent, and accurate (Salkind 2007; Shekarian et al. 2020). Homoscedasticity was tested using the Breusch–Pagan test (Table 4). The p-value was statistically significant at the 0.01 significance level; therefore, the null hypothesis of homoscedasticity was rejected. To account for the resulting heteroscedasticity, robust standard errors were used in the GEE model (Pitselis 2013; Wooldridge 2016).

Table 4 Homoscedasticity test results

4 Conclusions

In the United States, the increase in the rate of CWP since the mid-1990s has renewed the urgency among medical and scientific researchers to investigate the root causes of the problem. No previous study has investigated the effect of all of the available mining factors on the prevalence of CWP among coal miners in a multivariable model. This study attempted such a comprehensive analysis using longitudinal data collected from credible sources, namely the MSHA accident/injury and MSHA employee/production files. First, a 33-year panel data analysis of U.S. coal mines was conducted to determine the relationship between the CWP rate and mining parameters including mining operation type, geographic location, mine size, coal rank, and coal seam thickness. Second, for each type of mining operation, five hypotheses were developed to determine the relationship between the CWP rate and the contributing factors using a multiple linear regression model. The results of the GEE regression supported all of the hypotheses for underground coal mines. More specifically, mine operation, geographic location, mine size, coal seam thickness, and coal rank contribute to the prevalence of CWP among coal miners. In surface mines, mine size (only medium vs. small), geographic location (only Appalachia vs. Western region), coal rank, and seam height (only medium vs. thick seams) contribute to the prevalence of CWP.