Background

The prevalence of infectious diseases caused by a multitude of pathogens results in high economic losses for the pig industry [1, 2]. Genetic improvement for disease resilience is a practical option to help address the problem of infectious disease as it can ensure production efficiency, because resilient animals are defined as maintaining a relatively undepressed performance in the face of disturbances caused by infection [3, 4]. Disease resilience is a complex trait composed of multiple biological functions, such as growth, health, nutrient status, and other dynamic elements, including the efficiency of immune response and the rate of recovery from infection [5]. This complexity makes disease resilience hard to properly characterize and little is known about the genetic architecture that drives disease resilience. Alternatively, indirect selection of disease resilience based on immune-related traits may be a feasible breeding strategy, because the disease response of an animal largely depends upon its immunity [6, 7].

Blood cells comprise white blood cells, red blood cells, and platelets that are important elements of an animal’s immune status [8]. Complete blood count (CBC) is one of the most common clinical tests performed to evaluate concentrations and relative proportions of these circulating blood cells, which may help to uncover the layers of immune system complexity [9]. Our previous study [10] found that CBC traits collected from blood samples of pigs in both healthy and challenged conditions at 2-weeks before, and 2 and 6-weeks after exposure to a polymicrobial challenge were moderately to highly heritable (0.08 ± 0.04 to 0.53 ± 0.05). Changes of each CBC trait between blood samples collected at different time points (e.g. the change of a CBC level from 2-weeks before to 2-weeks after exposure to the challenge) were also found to be heritable, with estimates ranging from 0.06 ± 0.04 to 0.24 ± 0.04 [10]. These heritability estimates indicate the importance of the genetic component of CBC traits. Moreover, significant genetic correlations (either positive or negative) were found for several CBC traits collected after exposure to the challenge with the economically important production traits of grow-to-finish growth rate (GFGR) and treatment rate (TR) in response to the polymicrobial challenge (− 0.82 ± 0.47 to 0.89 ± 0.26) [10], which may further indicate the potential of developing those CBC traits as indicator traits of disease resilience. In addition to these significant genetic correlations for CBC with GFGR and TR, our previous study [10] also found high genetic correlations (≥ 0.40 ± 0.04) between the CBC traits. Changes in CBC traits between each time point were also found to be genetically correlated, with significant estimates ranging from − 0.42 ± 0.21 to − 0.92 ± 0.11 to 0.44 ± 0.22 to 0.98 ± 0.03 [10]. This allows multivariate models to be used for joint analyses of these genetically correlated traits, which provides the potential to improve statistical power and explore pleiotropy [11,12,13,14].

To date, some quantitative trait loci (QTL) have been identified for some blood cell traits in pigs under either healthy or disease challenged status by linkage and association analyses [15,16,17,18,19,20,21,22]. However, due to the use of a pathogen-specific challenge or a relatively low density of genetic markers, the genetic components of blood cell traits in pigs under typical commercial environments, where multiple disease-causing pathogens are present, remains largely unknown.

In this study, CBC traits were collected from pigs in a natural polymicrobial disease challenge model, as described by Bai et al. [10]. Standard univariate genome-wide association studies (GWAS) and multivariate GWAS based on a relatively high-density panel of 465,910 autosomal single-nucleotide polymorphisms (SNPs) were conducted for these CBC traits. The objectives were: (1) to reveal the genomic regions associated with the CBC traits and with their changes in response to the polymicrobial challenge; and (2) to explore the underlying genetic architecture for disease resilience of pigs in the face of a polymicrobial infectious challenge.

Results

Descriptive statistics and genetic parameters

Descriptive statistics (mean, standard deviation, minimum, maximum, and distribution) for the CBC data after removing the outliers, are shown in Additional file 1: Fig. S1, S2 and S3. Details about genetic parameters for the evaluated CBC traits, including heritabilities and genetic correlations, can be found in our previous study [10], which used the same 2593 genotyped animals. In addition to the genetic correlations with resilience already reported for these data by Bai et al. [10], we also found significant genetic correlations for platelet concentration in Blood 3 collected at 2-weeks after exposure to a polymicrobial infectious challenge with GFGR (0.40 ± 0.22) and TR (− 0.46 ± 0.26), and for the change of monocyte concentration from Blood 1 to Blood 3 (MONOΔ13) collected at 2-weeks before and 2-weeks after exposure to the challenge with GFGR (0.63 ± 0.21).

Population structure

As false positive results can be introduced in GWAS by confounding effects due to population stratification, multidimensional scaling (MDS) plots (Additional file 2: Fig. S4) were generated to provide a visualization of the population structure in the first three dimensions (C1, C2, and C3). Animals tended to cluster by farm of origin, as they shared a similar genetic background when they came from the same farm. Since batches were nested within farms and coded uniquely, population stratification associated with the farm effect was accounted for in the association analysis model by fitting the fixed effect of batch. The genomic inflation factors of single-step (SSGWAS) for the CBC traits ranged from 0.98 to 1.06, suggesting that there was no population stratification that confounded the GWAS results.

Association results and estimates for SNP effects

White blood cell traits

Five genomic regions were found to be significantly associated with white blood cell traits at a genome-wise false discovery rate (FDR) of 0.05 by univariate SSGWAS. Of note, SNPs located on Sus scrofa chromosome (SSC) 4, SSC10, and SSC12 were found to be associated with eosinophil concentration in Blood 3, which was collected 2 weeks after exposure to the challenge (EOSB3). Meanwhile, SNPs on SSC2 and two adjacent floating SNPs (significant SNPs without a group of supportive SNPs) on SSC9 were identified to be associated with MONOΔ13. The Manhattan and quantile-quantile (Q-Q) plots for EOSB3 and MONOΔ13 are shown in Additional file 2: Figs. S5 and S6. Top lead SNPs (the most significant SNP with a group of supportive SNPs) for significant associations (genome-wise FDR < 0.05) with EOSB3 and MONOΔ13 are shown in Table 1. For EOSB3, the additive genetic variances explained by the 1 Mb window of the top lead SNPs (SNP1, SNP2, SNP3) and their adjacent SNPs on SSC4, SSC10, and SSC12 were estimated to be 0.46, 0.35, and 0.53% of the additive genetic variance for EOSB3, respectively. SNP4 was a floating SNP on SSC2 and its 1 Mb window explained 0.12% of the additive genetic variance for MONOΔ13. The 1 Mb window for SNP5, the top lead SNP on SSC9, was estimated to explain about 1.23% of the additive genetic variance for MONOΔ13.

Table 1 Top significant SNPs identified by univariate SSGWAS for significant associations with white blood cell traits at a genome-wise false discovery rate (FDR) of 0.05

Estimates of additive and dominance effects for the top significant SNPs (genome-wise FDR < 0.05, including both top lead and top floating SNPs) associated with EOSB3 and MONOΔ13 are summarized in Table 1. A significant dominance effect (p < 0.05) was only identified for SNP2, which was associated with EOSB3. Estimates of additive effects were found to be significant (p < 0.05) for all SNPs that were associated with EOSB3 and MONOΔ13. For EOSB3, estimates of additive effects were − 0.05 ± 0.01, 0.14 ± 0.04, and − 0.06 ± 0.01 for SNP1, 2 and 3, respectively. Estimates of additive effects for MONOΔ13 were 0.08 ± 0.02 and − 0.08 ± 0.02 for SNP 4 and 5, respectively.

Due the relatively low genetic correlations and large standard errors between white blood cell traits [10], no genomic region was found to be significantly associated with white blood cell traits at FDR < 0.05 from multivariate SSGWAS.

Red blood cell and platelet traits

Nine genomic regions were found to be significantly associated with red blood cell and platelet traits at the genome-wise FDR of 0.05 by univariate and or multivariate SSGWAS. The Manhattan plots and Q-Q plots are shown in Additional file 2: Figs. S7, S8, S9, S10, S11 and S12. The four top lead SNPs for significant associations (genome-wise FDR < 0.05) and estimates of additive genetic variances explained by these top lead SNPs and their adjacent SNPs in a 1 Mb window are summarized in Table 2. Five floating SNPs (genome-wise FDR < 0.05) that explained small amounts of the additive genetic variance (0.05 to 0.21%) for associated red blood cell and platelet traits were found and are summarized in Table 3. Of note, several pleiotropic SNPs associated with red blood cell or platelet traits were identified by multivariate SSGWAS of CBC traits in Blood 1, 3, and 4 (collected at 2-weeks before, 2- and 6-weeks after the challenge, respectively). High genetic correlations were found between mean corpuscular hemoglobin (MCH), mean corpuscular volume (MCV), and red blood cell concentration (RBC) traits (Additional file 1: Table S1), and also between all three sampling time points for each of these traits (≥ 0.77 ± 0.08) [10]. Therefore, pleiotropic SNP7 on SSC6 was identified as the top lead and pleiotropic SNP for MCH in Blood 1 and for both MCV and RBC traits in all three blood samples (Table 2). The percentage of additive genetic variance explained by the 1 Mb window of SNP7 and its adjacent SNPs ranged from 0.29 to 0.57% for its associated traits. Moreover, SNP8 was the top lead and pleiotropic SNP on SSC8, which was associated with MCH, MCV, and RBC traits in all three blood samples. The percentages of additive genetic variance explained by SNP8 and its adjacent SNPs in a 1 Mb window were estimated to range from 0.28 to 0.35% for its associated traits. SNP9 on SSC17 was the top lead and pleiotropic SNP for mean platelet volume (MPV) in Blood 1 and 4. Together with adjacent SNPs in a 1 Mb window, SNP7 was estimated to explain about 0.49 and 0.40% of the additive genetic variances for MPV in Blood 1 and 4, respectively. Significant associations (genome-wise FDR < 0.05) for SNP7 with MCV in Blood 1 (genome-wise FDR = 0.003) and for SNP8 with MCV in Blood 4 (genome-wise FDR = 0.04) were also found by univariate SSGWAS but at a lower significance level compared to the multivariate SSGWAS. Meanwhile, univariate SSGWAS only indicated suggestive associations (genome-wise FDR of 0.10) for SNP8 with RBC in Blood 1 (genome-wise FDR = 0.09) and with MCV in Blood 1 (genome-wise FDR = 0.08).

Table 2 Top lead SNPsa identified by univariate and multivariate SSGWAS for significant associations with red blood cell and platelet traits at a genome-wise false discovery rate (FDR) of 0.05
Table 3 Top floating SNPsa identified by univariate and multivariate SSGWAS for significant associations with red blood cell and platelet traits at a genome-wise false discovery rate (FDR) of 0.05

For red blood cell and platelet traits, the estimates of additive and dominance effects for the top lead SNPs are summarized in Table 2 and for the top floating SNPs in Table 3. Of note, the additive effects for pleiotropic SNPs showed a tendency of affecting each CBC trait in the three blood samples in the same way, including SNP7 for MCV, SNP8 for MCH and MCV, SNP9 for MPV, SNP10 for MCV, SNP11 for PLT, and SNP12 for MCH. For pleiotropic SNP8, no significant additive effect was found for RBC traits.

Candidate genes and functional enrichment results

Browsing regions for candidate genes located within a maximum distance of 1 Mb on either side of the lead SNPs based on a genome-wise FDR < 0.1 for the associated CBC traits are summarized in Additional file 3: Table S2. Candidate gene functions for each CBC trait were explored with the Ingenuity Pathway Analysis (IPA) database. The top five enriched (p < 0.05) biological functions among diseases, molecular and cellular functions, and physiological system development and function categories for each CBC trait are summarized in Additional file 4. Enriched functions such as inflammatory responses, cell-to-cell signaling and interaction, cellular development, cell morphology, cellular growth and proliferation, and hematological system development and function were commonly identified for the candidate gene lists for white blood cell traits collected after exposure to the challenge, and / or the pleiotropic candidate gene lists for red blood cell and platelet traits collected before and after exposure to the challenge.

Candidate genes that have been reported by previous studies of pigs, human, mice, or rats to be functionally and biologically related to the same category of blood cells, as explored here, are summarized in Table 4. A group of immunity genes on SSC2 has been reported to be functionally and biologically related to monocytes, including TICAM2 (toll-like receptor adaptor molecule 2), TMED7(transmembrane emp24 domain-containing protein 7 precursor), and CDO1 (cysteine dioxygenase type 1), which were located proximal to SNP4, and COMMD10 (COMM domain containing 10), which harbored SNP4 (Table 4). An overview of the location of these candidate genes and the distribution of all the SNPs in this region on SSC2 is shown in the linkage disequilibrium (LD) haplotype map in Additional file 3: Fig. S13. SNP6 on SSC4 is intronic within candidate gene SPTA1 (spectrin alpha, erythrocytic 1) and the LD haplotype map for this region is shown in Additional file 3: Fig. S14. In Table 4, a group of candidate genes, including THAP11 (THAP domain containing protein 11), PSMB10 (proteasome subunit beta type 10), LCAT (lecithin-cholesterol acyltransferase), and SLC12A4 (Potassium/Chloride Cotransporter 1), was reported to be functionally related to red blood cells and were located close to SNP7 in the same haplotype block on SSC6 (Additional file 3: Fig. S15). SNP8 on SSC8 was found to be in LD (r2 > 0.30) with SNPs in the PDGFRA (platelet derived growth factor receptor alpha) gene (Additional file 3: Fig. S16).

Table 4 Candidate genes located within 1 Mb on either side of the top significant SNPsa that have been reported by previous studies of pigs, human, mice, and rats to be functionally and biologically related to CBC traits

Discussion

Potential roles of candidate genes

Functional enrichment analyses for the candidate gene lists for CBC traits indicated multiple enriched functions that can be considered as functionally and biologically relevant to white blood cell traits in response to a polymicrobial infectious challenge, and red blood cell and platelet traits that were collected before and after exposure to the challenge, such as inflammatory response, cell growth and proliferation, cell-to-cell signaling and interaction, and hematological system development and function.

The candidate genes in Table 4 have been reported to be relevant to particular types of CBC traits by studies in pigs, human, mice, and rat, which may help us to further understand the functions of these candidate genes related to CBC traits in response to the polymicrobial challenge. Of note, candidate genes ARHGEF2 (Rho/Rac guanine nucleotide exchange factor), TGFB2 (transforming growth factor beta 2), and MIR21 (microRNA miR-21) were identified to be functionally and biologically relevant to eosinophils. The product of ARHGEF2 regulates the activity of GTPases and has been identified to be highly expressed in eosinophils. GTPases are known to be involved in mediator release from granulocytes, which is a crucial event in the activation of eosinophils and neutrophils during inflammation [23, 24]. TGFB2 has also been found to be expressed mainly in eosinophils, and greater expression of TGFB2 has been identified to be associated with persistent eosinophilic inflammation (severe asthma) in human [25]. However, in the polymicrobial challenge, an increase in the number of eosinophils may be associated with parasitic infection (e.g. Ascaris suum) rather than respiratory disease. Eosinophils play an important role of killing larvae by releasing the toxic content of their granules as part of the immune response [26]. Thus, further investigations are warranted to investigate the functional relationships between the expression of TGFB2 and response to the challenge. Expression of MIR21 has not been identified in eosinophils but in other white blood cells, including lymphocytes, monocytes, macrophages, and dendritic cells, which work collaboratively with eosinophils in the immune response [27,28,29]. Although the mRNA targets for MIR21 are complex and remain an area of active investigation, it has been demonstrated that MIR21 acts as a key signal mediating the balance of the inflammatory reaction to promote healing, resolution, and a return to homeostasis [27].

For the candidate genes on SSC2, the product of COMMD10 has been found to be related to the function of phagosomes in murine macrophages, which promotes phagolysosome maturation and facilitates the timely killing of pathogens [30, 31]. The product of ATG12 (autophagy related 12) is involved in autophagy of circulating monocytes for degradation and recycling of cellular components, which prevents apoptosis (programmed cell death) of monocytes and is essential for monocyte-macrophage differentiation and cytokine production in the innate immune response [32, 33]. The product of CDO1, cysteine dioxygenase type 1, catalyzes taurine synthesis and it is commonly accepted that taurine plays an important role in the immune system as an antioxidant to protect phagocytes, including macrophages, from oxidative stress caused by the generation of reactive oxygen species at the site of inflammation [34,35,36,37]. Both TMED7 and TICAM2 are immunity genes and their products are involved in the function of toll-like receptors (TLRs), which are expressed on macrophages and monocytes and are responsible for the sensing of pathogen-associated-molecular-patterns in the extracellular environment and in endosomes [38,39,40]. Of note, overexpression of TMED7 has been found to be associated with inhibition of MyD88-independent TLR4 signaling and the protein encoded by TICAM2 has been identified as a bridge adaptor recruiting TLRs to mediate innate immune responses [38,39,40]. In addition, NAMPT (nicotinamide phosphoribosyl transferase) on SSC9 has been found to be functionally and biologically related to monocytes, and its gene product has been found to play an important role in governing monocyte recruitment and in monocyte-macrophage differentiation [41, 42].

For red blood cells, the majority of candidate genes reported here have been identified as key components involved in hematopoiesis and erythropoiesis responsible for the differentiation and development of red blood cells, including MNDA (myeloid cell nuclear differentiation antigen) on SSC4, CBFB (core-binding factor subunit beta) and THAP11 on SSC6, PDGFRA and KIT (KIT proto-oncogene, receptor tyrosine kinase) on SSC8, and RARA (retinoic receptor alpha) and THRA (thyroid hormone receptor alpha) on SSC12 [43,44,45,46,47,48,49,50,51,52]. In addition, SPTA1 on SSC4 encodes a protein in the red blood cell membrane, the products of LCAT and SLC12A4 on SSC6 regulate the lipid composition in the red blood cell membrane and cell swelling, respectively, and all these gene products work together to maintain the normal volume and biconcave shape of red blood cells, which helps to ensure the biological and biomechanical functions of the cells [53,54,55,56,57]. ACKR1 (atypical chemokine receptor 1) on SSC4 and PSMB10 on SSC6 are candidate genes that have been shown to be involved in the immune response of red blood cells. The receptor ACKR1 expressed in red blood cells was found to regulate immune responses by interacting with chemokines, and which works as a blood-based chemokine buffer involved with the uptake and degradation of chemokines [58]. Meanwhile, ACKR1 has also been identified as an essential regulator of hematopoiesis and erythropoiesis promoting interactions between nuclear progenitor red blood cells and hematopoietic stem cells in the bone marrow [58, 59]. PSMB10 is found to be responsible for intracellular protein degradation and generation of peptides that bind to class I major histocompatibility complex (MHC) molecules [60]. The MHC molecules display these peptides to cytotoxic CD8+ T cells to support their activity of immune surveillance [61]. Further, through a study of anemia caused by congenital red blood cell aplasia in human, PSMB10 has been suggested to be functional in the MHC class I machinery in mature red blood cells in response to inflammatory signaling [62].

Candidate genes for platelet traits were annotated into two major functions, platelet aggregation and megakaryopoiesis. Platelet aggregation involves platelet-to-platelet adhesion, which is essential for effective hemostasis following injury and bleeding, and megakaryopoiesis is the process of differentiation and development of platelets [63]. Among them, CD9 (CD9 antigen) on SSC5 encodes a major platelet cell surface glycoprotein and plays dual roles in megakarypoiesis and platelet aggregation [64,65,66,67]. The products of VWF (von Willebrand factor), PHB2 (prohibitin 2), and GNB3 (G protein subunit beta 3) on SSC5 and GNAS (guanine nucleotide binding protein) on SSC17 were found to be involved in platelet aggregation [68,69,70,71,72,73]. In addition to megakaryopoiesis, tubulin beta class VI coded by TUBB1 (tubulin beta 1 class VI) on SSC17 has been reported to play a role in maintaining platelet morphology [74,75,76].

Overlap with previously discovered QTL

In addition to the novel QTL for CBC traits identified in this study, some of the QTL identified have been previously reported. QTL on SSC8 located nearby the KIT gene were found to be associated with MCH, MCV, and RBC in this study. In addition, this region has also been identified to show a significant effect on the levels of NEU and HCT in the crossbreds of European Wild Boar × Yorkshire and Landrace × Yorkshire subsequent to stress and disease challenges [15, 22]. For QTL on SSC5 that associated with PLT traits here, Reiner et al. [17] found them to be associated with red blood cell traits in Pietrain × Meishan pigs including HCT, HGB, and RBC traits. These results may be caused by the common myeloid progenitors for all cells mentioned above. Moreover, it may also further indicate the pleiotropic roles of QTL involved in the functions of different blood cells. Apart from studies in pigs, the candidate gene SPTA1 associated with MCHC has also been identified by GWAS for red blood cell traits in human, which also functions in maintaining the shape and deformability of human red blood cells [77].

Potential links with disease resilience

Although the QTL uncovered for blood cell traits have small effects in this study, which has also been found in previous GWAS for blood cell traits of pigs and human [21, 77], the genes involved in these QTL are suggested to be involved in hematopoiesis and immune responses in the face of a polymicrobial infectious challenge. In turn, they may contribute to disease resilience, as hematopoiesis and immune response are collaborative mechanisms that play essential roles in defending against pathogens, maintaining homeostasis, and preventing death from the infection [6, 78, 79]. None of the QTL identified for the CBC traits were pleiotropic with GFGR or TR in response to the challenge. However, some candidate genes are known to have pleiotropic effects among different CBC traits and play roles in both hematopoiesis and immune response. For example, KIT may be a pleiotropic gene for multiple blood cell populations in response to stress and disease challenge, and ACKR1 exhibits pleiotropic effects on hematopoiesis and immune responses, as discussed above [15, 22, 58, 59]. Accordingly, these results highlight the importance of further investigating and validating the function of such pleiotropic genes in disease resilience.

Conclusions

In this study, we identified 14 genomic regions that were significantly associated (genome-wise FDR < 0.05) with CBC traits collected from the natural polymicrobial challenge model, including five for white blood cell traits and nine for red blood cell and platelet traits. Candidate genes or regions located nearby significant SNPs were found to have potential roles in immune response pathways, red blood cell morphology, platelet aggregation, and hematopoiesis, including granulopoiesis and granulocytic differentiation, erythropoiesis, and megakarypoiesis. These results complement previous GWAS for blood cell traits in pigs and contribute to improving our understanding of the genetic basis of blood cell composition before and after exposure to a polymicrobial infectious challenge. This study also advances understanding of the genetic control of disease resilience, as blood cells are key players in an animal’s immune response and are recruited by hematopoiesis. Validation and identification of the candidate genes and causal mutations are necessary to further investigate and develop the use of CBC traits to enhance genetic improvement of disease resilience for the pig industry.

Methods

Natural disease challenge model and phenotypic traits

Details of the natural disease challenge model (NDCM) and the collection of phenotypic traits are described in Bai et al. [10] and Putz et al. [80]. Briefly, the NDCM was established to simulate a polymicrobial infectious challenge and severe disease pressure often found at the commercial level of pig production. A 3-week healthy quarantine nursery after weaning and a test station that consisted of a 4-week second-stage nursery and an approximately 16-week grow-to-finish stage were the two main facilities in the NDCM. A total of 2743 healthy F1 crossbred (Landrace × Yorkshire) barrows owned by company Centre de Développement du Porc du Québec, Inc. were introduced into the NDCM in 42 batches at 3-week intervals after weaning at an average age of 21 days old. Each batch consisted of 60 or 75 weaned crossbred barrows that were provided by one of the seven members of PigGen Canada from healthy multiplier farms. All pigs from each batch were sourced from one multiplier farm, and over time, 13 farms were involved, and each supplied one to six batches of pigs. Pigs that went through the NDCM were first exposed to the polymicrobial infectious challenge in the second-stage nursery. The challenge was established by co-introducing four groups of 12 to 28 commercial seeder pigs from three strategically selected commercial farms with known disease outbreaks together with the first four batches of healthy barrows into the NDCM. Common disease-causing pathogens found in commercial farms were the major target pathogens in the NDCM, including multiple strains of porcine reproductive and respiratory syndrome virus (PRRSV) and swine influenza A virus, various respiratory and enteric bacterial pathogens (such as Mycoplasma hyopneumoniae, Haemophilus parasuis, Brachyspira hampsonii, Salmonella enterica serovar typhimurium, and Streptococcus suis), and two parasites (Cystoisospora suis and Ascaris suum). Subsequently, the challenge model was maintained as a continuous flow system by nose-to-nose direct contact between the new batch of healthy barrows and the preceding challenged group during the first week of challenge nursery period, except during the period of excessively high challenge pressure when indirect contact was used to help maintain the mortality rate below the target level established by the Animal Protection Committee. Serum samples were randomly collected from a subset of individuals for RT-PCR 4 weeks post-challenge and ELISA 6 weeks post-challenge to confirm that every batch had been exposed to PRRSV in the test station and to monitor the pathogens in the test station for each batch. The disease pressure varied by batch; not all pigs were exposed to all the same pathogens because not all pathogens were identified in all batches, as would be the case on a commercial farm. Clinical signs and mortality were monitored in the NDCM for treatment decisions made by herd veterinarians and trained staff, including individual treatments that were given on a case-by-case basis and group medication through water and feed on a batch-level, as necessary, to maintain a balance between disease pressure and animal welfare. Of a total of 2743 pigs, some of the animals died due to infectious diseases after exposure to the challenge (n = 561), a few animals died due to non-infectious and unclear reasons (n = 164) and their phenotypes were set to missing, the other animals (n = 2018) reaching the target slaughter weight at approximately 181 days old were slaughtered commercially and entered the food chain after the study.

In total, four sets of blood samples (Blood 1, Blood 2, Blood 3, and Blood 4) were collected from the jugular vein. Blood 2 was collected immediately before entry into the challenge nursery and polymicrobial infectious challenge at 40 days of age. Whole EDTA-anticoagulated blood was collected for CBC analyses at Blood 1, Blood 3, and Blood 4 (2-weeks before, and at 2- and 6-weeks after exposure to the challenge) at an average age of 26 days, 54 days, and 82 days, respectively, using the ADVIA® 2120i Hematology System (Siemens Healthineer, Erlangen, Germany) within 24 to 48 h of collection. The standard deviation of the bleeding age was 2, 3, and 3 days for Blood 1, Blood 3, and Blood 4, respectively. CBC traits used for this study were described and the outliers were removed previously [10]. There are three categories of CBC traits: (1) six white blood cell traits, including total white blood cell concentration (WBC, 103/μL), neutrophil concentration (NEU, 103/μL), lymphocyte concentration (LYM, 103/μL), monocyte concentration (MONO, 103/μL), eosinophil concentration (EOS, 103/μL), and basophil concentration (BASO, 103/μL); (2) seven red blood cell traits, consisting of red blood cell concentration (RBC, 106/μL), hemoglobin concentration (HGB, g/L), hematocrit (HCT, %), mean corpuscular volume (MCV, fL), mean corpuscular hemoglobin (MCH, pg), mean corpuscular hemoglobin concentration (MCHC, g/L), and red blood cell distribution width (RDW, %); and (3) two platelet traits, including platelet concentration (PLT, 103/μL) and mean platelet volume (MPV, fL). CBC traits were assessed at individual time points (Blood 1, Blood 3, Blood 4) as well as calculating and testing the changes between time points: Blood 1 to Blood 3 (Δ 13, calculated as Blood 3 − Blood 1), Blood 3 to Blood 4 (Δ 34, Blood 4 − Blood 3), and Blood 1 to Blood 4 (Δ 14, Blood 4 − Blood 1). All white blood cell traits in Blood 1, Blood 3, and Blood 4 were log10-transformed to reduce skewness of the distribution.

Body weights and veterinary treatments were recorded on an individual pig basis and used to calculate production traits, grow-to-finish growth rate (GFGR), and treatment rate (TR), which were regarded as economically important traits related to disease resilience. The GFGR for each animal was estimated using linear regression of body weights measured every 3 weeks in the grow-to-finish phase on ages from an average age of 69 days of age to the endpoint, i.e., either mortality due to infectious diseases or reaching the target slaughter weight at approximately 181 days old. The TR for each animal was the number of individual treatment events given on a case-by-case basis standardized by the number of days the animal spent in the NDCM (TR = number of treatment events/days × 100%). The TR for animals that died before receiving any treatment was set to missing.

SNP array genotyping and quality control

The genotyping using the 650 K Affymetrix Axiom® Porcine Genotyping Array was performed at Delta Genomics (Edmonton AB, Canada). Raw Affymetrix SNP data were processed by Delta Genomics with the Axiom Analysis Suite, using all defaults (sample call rate ≥ 97%; SNP call rate ≥ 97%; number of minor alleles observed ≥2). Imputation of sporadic missing genotypes was completed using FImpute [81]. The pedigree was utilized for imputation but only included the dam, since sire was typically unknown due to the use of pooled semen. The preGSf90 software in the BLUPF90 suite of programs was used to remove SNPs with minor allele frequency lower than 0.01 [82]. After quality control, there were 2593 genotyped animals with 465,910 autosomal SNPs remained for the subsequent analyses.

Population stratification and linkage disequilibrium estimation

Population stratification among genotyped animals was investigated using PLINK 1.90 [83] based on pairwise identity-by-state (IBS) distance, which was estimated using SNP genotypes. A multidimensional scaling (MDS) plot based on IBS pairwise distance was drawn by the ‘ggplot2’ package in R [84] to show the first three dimensions of the population structure. The genomic inflation factor and quantile–quantile (Q–Q) plots were applied to assess genomic inflation of the test statistics using the R packages of ‘GenABEL’ and ‘qqman’ [84,85,86]. The linkage disequilibrium (LD) of pairwise SNPs was measured as the squared correlation (r2) of allele counts for the two SNPs and haplotype blocks were built using the Haploview software [87, 88].

Single-step GWAS and models

Univariate and multivariate single-step GWAS (SSGWAS) for CBC traits were implemented in the BLUPF90 suite of programs [82, 89] with the joint pedigree-genomic relationship matrix (H) for single-marker associations, accommodating both genotyped (n = 2593) and non-genotyped (n = 150) animals. Details for algorithms employed for these analyses have been described by Aguilar et al. [89]. Briefly, BLUPF90 combines the algorithms for single-step GBLUP and for back-solving to obtain estimates and p-values for SNP associations from estimates of breeding values. The genomic relationship matrix (G) for genotyped animals was constructed as ZZ/2 ∑ pi(1 − pi), where the Z matrix contains centered SNP genotype codes and pi is the minor allele frequency for SNP i [90]. The p-values for SNP associations were adjusted for multiple testing by the Benjamini and Hochberg correction (false discovery rate, FDR) [84, 91]. An FDR threshold of 0.05 was used to control false positive results and to declare significant associations. The most significant SNP above the genome-wise FDR of 0.05 in each genomic region were referred to as the top significant SNP, which were further separated into top lead and top floating SNPs, which referred to top significant SNPs in a genomic region with or without a group of supportive SNPs, respectively.

The univariate mixed linear model used for GWAS can be described as follows:

$$ \mathbf{y}=\mathbf{Xb}+\mathbf{Za}+\mathbf{Wc}+\mathbf{e} $$

where y is a vector of observations on a CBC trait for all individuals, b is a vector of fixed effects, including the effect of batch and the covariate of bleeding age, X is a design matrix relating observations to the fixed effects, a is a vector of breeding values, Z is a design matrix that relates observations to breeding values, including genotyped and ungenotyped animals, and e is a vector of residual effects. Vector c represents a stack of vectors (cLitter, cPen1, cPen2, and cPen3) of independent and uncorrelated random environmental effects, including litter (cLitter) and pen effects in the quarantine unit (cPen1), in the test station second-stage nursery (cPen2), and in the test station grow-to-finish stage (cPen3). These random environmental effects were tested and fitted in the model for each CBC trait when they were significant (p < 0.05). Matrix W (WLitter, WPen1, WPen2, and WPen3) is a stack of incidence matrices that relate observations to the corresponding random environmental effects. The random effects fitted for each of CBC traits were the same as Bai et al. [10].

Assuming the random effects c and e are uncorrelated and identically distributed, the (co-)variances of random effects for univariate models are:

$$ \operatorname{var}\left[\begin{array}{c}\mathbf{a}\\ {}\mathbf{c}\\ {}\mathbf{e}\end{array}\right]=\left[\begin{array}{ccc}{\mathbf{H}\upsigma}_{\mathrm{a}}^2& 0& 0\\ {}0& {\mathbf{I}\boldsymbol{\upsigma}}_{\mathbf{c}}^{\mathbf{2}}& 0\\ {}0& 0& {\mathbf{I}\upsigma}_{\mathrm{e}}^2\end{array}\right] $$

where H is the joint pedigree-genomic relationship matrix for genotyped and non-genotyped animals as mentioned above, I is the identity matrix, \( {\upsigma}_{\mathrm{a}}^2 \) is the additive genetic variance, \( {\boldsymbol{\upsigma}}_{\mathbf{c}}^{\mathbf{2}} \) represents a stack of random effect variances (e.g. \( {\boldsymbol{\upsigma}}_{\mathbf{c}}^{\mathbf{2}}=\left[\begin{array}{cc}{\upsigma}_{{\mathrm{c}}_{\mathrm{Litter}}}^2& 0\\ {}0& {\upsigma}_{{\mathrm{c}}_{\mathrm{Pen}1}}^2\end{array}\right] \), when the random effects cLitter and cPen1 are significant and fitted in the model for a trait), and \( {\upsigma}_{\mathrm{e}}^2 \) is the residual variance.

The model for multivariate analyses resembles a stack of univariate models for each of the traits that were found to be highly genetic correlated in [10], which can be written as [11, 92]:

$$ \left[\begin{array}{c}{\mathbf{y}}_{\mathbf{1}}\\ {}{\mathbf{y}}_{\mathbf{2}}\\ {}{\mathbf{y}}_{\mathbf{3}}\end{array}\right]=\left[\begin{array}{c}{\mathbf{X}}_{\mathbf{1}}{\mathbf{b}}_{\mathbf{1}}+{\mathbf{Z}}_{\mathbf{1}}{\mathbf{a}}_{\mathbf{1}}+{\mathbf{W}}_{\mathbf{1}}{\mathbf{c}}_{\mathbf{1}}+{\mathbf{e}}_{\mathbf{1}}\\ {}{\mathbf{X}}_{\mathbf{2}}{\mathbf{b}}_{\mathbf{2}}+{\mathbf{Z}}_{\mathbf{2}}{\mathbf{a}}_{\mathbf{2}}+{\mathbf{W}}_{\mathbf{2}}{\mathbf{c}}_{\mathbf{2}}+{\mathbf{e}}_{\mathbf{2}}\\ {}{\mathbf{X}}_{\mathbf{3}}{\mathbf{b}}_{\mathbf{3}}+{\mathbf{Z}}_{\mathbf{3}}{\mathbf{a}}_{\mathbf{3}}+{\mathbf{W}}_{\mathbf{3}}{\mathbf{c}}_{\mathbf{3}}+{\mathbf{e}}_{\mathbf{3}}\end{array}\right] $$

For each trait in the multivariate model, the same effects were fitted as in the univariate models. For multivariate models, assuming random effects cn and residual effects en for the nth trait (n = 1, 2, 3) are uncorrelated and identically distributed, the (co-) variances of random effects are:

$$ \mathrm{Var}\left[\begin{array}{c}{\mathbf{a}}_{\mathbf{1}}\\ {}{\mathbf{a}}_{\mathbf{2}}\\ {}{\mathbf{a}}_{\mathbf{3}}\\ {}{\mathbf{c}}_{\mathbf{1}}\\ {}{\mathbf{c}}_{\mathbf{2}}\\ {}{\mathbf{c}}_{\mathbf{3}}\\ {}{\mathbf{e}}_{\mathbf{1}}\\ {}{\mathbf{e}}_{\mathbf{2}}\\ {}{\mathbf{e}}_{\mathbf{3}}\end{array}\right]=\left[\begin{array}{ccccccccc}{\mathbf{H}\upsigma}_{{\mathrm{a}}_1}^2& {\mathbf{H}\upsigma}_{{\mathrm{a}}_{12}}& {\mathbf{H}\upsigma}_{{\mathrm{a}}_{13}}& 0& 0& 0& 0& 0& 0\\ {}{\mathbf{H}\upsigma}_{{\mathrm{a}}_{21}}& {\mathbf{H}\upsigma}_{{\mathrm{a}}_2}^2& {\mathbf{H}\upsigma}_{{\mathrm{a}}_{23}}& 0& 0& 0& 0& 0& 0\\ {}{\mathbf{H}\upsigma}_{{\mathrm{a}}_{31}}& {\mathbf{H}\upsigma}_{{\mathrm{a}}_{32}}& {\mathbf{H}\upsigma}_{{\mathrm{a}}_3}^2& 0& 0& 0& 0& 0& 0\\ {}0& 0& 0& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{1}}}^{\mathbf{2}}& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{1}\mathbf{2}}}& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{1}\mathbf{3}}}& 0& 0& 0\\ {}0& 0& 0& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{2}\mathbf{1}}}& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{2}}}^{\mathbf{2}}& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{2}\mathbf{3}}}& 0& 0& 0\\ {}0& 0& 0& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{3}\mathbf{1}}}& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{3}\mathbf{2}}}& {\mathbf{I}\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{3}}}^{\mathbf{2}}& 0& 0& 0\\ {}0& 0& 0& 0& 0& 0& {\mathbf{I}\upsigma}_{{\mathrm{e}}_1}^2& {\mathbf{I}\upsigma}_{{\mathrm{e}}_{12}}& {\mathbf{I}\upsigma}_{{\mathrm{e}}_{13}}\\ {}0& 0& 0& 0& 0& 0& {\mathbf{I}\upsigma}_{{\mathrm{e}}_{21}}& {\mathbf{I}\upsigma}_{{\mathrm{e}}_2}^2& {\mathbf{I}\upsigma}_{{\mathrm{e}}_{23}}\\ {}0& 0& 0& 0& 0& 0& {\mathbf{I}\upsigma}_{{\mathrm{e}}_{31}}& {\mathbf{I}\upsigma}_{{\mathrm{e}}_{32}}& {\mathbf{I}\upsigma}_{{\mathrm{e}}_3}^2\end{array}\right] $$

where \( {\upsigma}_{{\mathrm{a}}_{12}}={\upsigma}_{{\mathrm{a}}_{21}} \), \( {\upsigma}_{{\mathrm{a}}_{13}}={\upsigma}_{{\mathrm{a}}_{31}} \), and \( {\upsigma}_{{\mathrm{a}}_{23}}={\upsigma}_{{\mathrm{a}}_{32}} \) are additive genetic covariances between traits, \( {\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{12}}}={\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{21}}} \), \( {\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{13}}}={\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{31}}} \), and \( {\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{23}}}={\boldsymbol{\upsigma}}_{{\mathbf{c}}_{\mathbf{32}}} \) are covariances for common random effects between two traits, \( {\upsigma}_{{\mathrm{e}}_{12}}={\upsigma}_{{\mathrm{e}}_{21}} \), \( {\upsigma}_{{\mathrm{e}}_{13}}={\upsigma}_{{\mathrm{e}}_{31}} \), and \( {\upsigma}_{{\mathrm{e}}_{23}}={\upsigma}_{{\mathrm{e}}_{32}} \) are covariances for residual effects between two traits.

Post-GWAS marker effect analyses

The percentage of additive genetic variance explained by a 1 Mb window (with a median of 224 adjacent SNPs) was estimated by conducting window-based inferences for additive genetic variance in the BLUPF90 suit of programs [82]. Each chromosome was evaluated by using a sliding (moving) 1 Mb window by using every SNP on the chromosome as a starting SNP for a window segment [82]. Therefore, a top significant SNP was contained within multiple windows and among them, the largest percentage of additive genetic variance explained by a window that contained that top significant SNP was reported for each trait.

Additive and dominance effects of each top significant SNP were estimated using the BLUPF90 suite of programs [82] based on the following model:

$$ \mathbf{y}=\mathbf{Xb}+\mathbf{Za}+\mathbf{Wc}+\mathbf{v}\upalpha +\mathbf{d}\updelta +\mathbf{e} $$

where y, X, b, Z, a, W, c, and e are the same as for the univariate model described above; v is a vector of the top significant SNP genotypes coded as − 1, 0, and 1 for the AA, AB, and BB, respectively; α is the additive effect; d is a vector of dominance coded as 1 for heterozygous genotype (AB) and 0 for homozygous genotypes (AA and BB); δ is the dominance effect. Vectors v and d were fitted as covariates and the top significant SNPs were fitted one by one in the model. The likelihood ratio test was used to test the significance of the additive and dominance effects for each of the top significant SNPs by comparing full models to restricted models that constrained additive or dominance effects to zero using the REMLF90 program of BLUPF90 [82]. When the dominance effect was not significant (p > 0.05), the additive effect for a SNP was re-estimated by removing the dominance effect from the model.

Post-GWAS bioinformatics analyses

Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, Redwood City, CA; https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/, IPA Spring 2020 release) was used for functional enrichment analyses of candidate genes in significant genomic regions for the CBC trait. A maximum distance of 1 Mb on either side of the lead SNPs based on a genome-wise FDR < 0.10 was used to search for candidate genes for white blood cell traits of EOSB3 and MONOΔ13. The lead pleiotropic SNPs at genome-wise FDR < 0.10 were used to search for common candidate genes located within 1 Mb on either side of the SNPs for red blood cell (MCH, MCV, and RBC) and (MPV and PLT) platelet traits in different time points before and after exposure to the challenge. A relaxed FDR < 0.10 threshold for associated SNPs was used here to increase identification of true positives for the significance of biological and functional relevance of candidate genes [93]. Identification of positional candidate genes was conducted using the UCSC Genome Browser for the Ensembl annotation of the Sscrofa11.1 build of the swine genome (https://genome.ucsc.edu). One collective gene list was created for each trait by combing all candidate genes in associated genomic regions for IPA [94, 95]. Human, mouse, and rat genes in the IPA knowledge base database were used as background for biological function analyses in diseases, molecular and cellular functions, and physiological system development and function categories. A biological function was considered significantly enriched if the p-value for the overlap comparison test between the input list of candidate genes and the IPA database was less than 0.05 [94,95,96].