Introduction

Many emerging infectious diseases are of zoonotic origins1. Approximately 70% of zoonoses are proposed to originate from wildlife and outbreaks of MERS-CoV2 from camels, Nipah virus3 from bats and SARS-CoV-2 from minks4 have shown that livestock and farmed animals can act as intermediate hosts and/or reservoirs which can sustain zoonotic transmission. In addition to passing viruses from wildlife to humans, livestock and farmed animals are also reservoir of zoonotic pathogens, such as bacteria Coxiella burnetii (C. burnettii) that causes Q-fever in goats, sheep and cattle5, and avian influenza6,7. Additionally, epidemiological studies have found evidence for an increased risk of respiratory disease in persons exposed to large-scale animal farming8,9. In part, this can be attributed to farm dust exposure for which adverse health effects have been demonstrated10,11,12, and the potential of zoonotic infections in these cases could be important yet under-reported. For instance, a Q-fever outbreak affecting approximately 4,000 people was reported in the Netherlands in 2007–2010 that was linked to circulation of C. burnettii in local dairy goat farms and dairy sheep farms13. Q-fever was not notifiable in small ruminant farms in the Netherlands before the outbreak, making early detection of zoonotic transmission of C. burnettii challenging14. Similarly, MERS-CoV had been circulating among dromedary camels long before it was recognized as a cause of severe respiratory disease in humans15. These livestock-associated zoonotic disease outbreaks emphasize that continued surveillance of livestock is essential, especially when the intensification of livestock production is widely practiced to meet increasing food demands of growing populations16.

Advances in next-generation sequencing (NGS) technology can provide a detailed description of the viral content of environmental samples and offers great potential for viral surveillance. Viral metagenomic approaches (using random priming rather than target specific primers) allow the detection of viruses present in a sample without prior knowledge of its presence17,18 as well as characterization of complex viral communities19,20. Indeed, reports of virus diversity in animals including bats21,22,23, pigs24,25,26, wild birds27,28, and chickens29 have been published since the development of NGS methods.

Chickens (Gallus gallus domesticus) are the most abundant domestic livestock in the world. According to the Food and Agriculture Organization of the United Nations (FAO), the global chicken population was > 22 billion birds in 201730. Chicken farming ranges from backyard to large-scale commercial farms that house ten-thousands of chickens. The Netherlands is the second largest agricultural exporter in the world31,32. There are approximately 0.1 billion housed chickens and around 1700 chicken farms in the Netherlands in 202033. Chicken can carry various pathogens such as Salmonella34, Campylobacter35 and avian influenza viruses including subtypes that can cause severe infection in humans like H5N136 and H7N96; and it has been found that avian influenza virus can be transmitted to humans as a result of close exposure to contaminated chicken7. Monitoring virus diversity in chickens and other livestock may be a key component for identifying potential zoonotic threats. Meanwhile, little is known about the virome composition of farm dust, although it has been suggested that farm dust might play a role in infectious disease transmission37,38. Farm dust typically consists of fragments of animal feed, bedding material, animal feces, animal dander, mites and microorganisms39, as well as fragments of poultry feathers in the case of poultry farm dust. The complex composition of dust particles allows them to accumulate and serve as a vector of biologically active material including antibiotic resistance genes, endotoxin, and, as shown here, complete viral genomes. Farm dust exposure may cause adverse health effects40,41. Most studies on farm dust have focused on measuring antimicrobial gene sequences42 and endotoxins10,12, with no published reports documenting the viral sequence content of farm dust samples.

Here, we describe a metagenomic NGS characterization of virus diversity in a paired sample set of farm dust and chicken feces collected at multiple time points in three commercial chicken farms across the Netherlands. We hypothesized that the viral sequences identified from dust samples would share similarities with viral sequences detected from chicken fecal samples from the same farms. We develop and present a sequencing methodology that enabled recovery of virus genome sequences from farm dust samples. We show that the chicken fecal virome and farm dust virome show similar patterns, demonstrating the potential of using dust as a part of surveillance matrix for monitoring farm virus. This study provides useful insights in understanding the virus communities in chicken and surrounding farm environments and provides an important tool for monitoring viral movement by dust, a previously underappreciated source of viral movement through our environment.

Results

Comparative virus detection in chicken farm dust samples and paired feces

A total of 46 individual farm dust samples and 56 pooled chicken feces were collected. Five chicken fecal samples were excluded due to insufficient cDNA concentration for library preparation. Individual farm dust samples collected at the same sampling moment from the same farm were pooled together during sample processing for sequencing, making up a final set of 13 pooled farm dust samples. Metagenomic deep sequencing of pooled chicken farm dust samples (N = 13) and pooled chicken feces (N = 51) generated ca 1–3 million paired-end short reads per sample. The negative blank control did not yield sufficient nucleic acid to be analyzed, indicating that consumables and reagents used were free of detectable viral materials.

The distribution of viral contigs from different virus families detected in chicken feces and chicken farm dust samples were generally similar (Fig. 1). The number of contigs detected in Picornaviridae (farm dust samples: 199; chicken feces; 914) was the highest in both chicken farm dust and chicken feces samples, followed by Parvoviridae (farm dust samples: 183; chicken feces: 293), Astroviridae (farm dust samples: 135; chicken feces: 267) and Caliciviridae (farm dust samples: 85; chicken feces: 226). Interestingly, more DNA virus contigs were generally detected in chicken farm dust samples when comparing to that of chicken fecal samples, including adenoviruses, genomoviruses, smacoviruses and circoviruses.

Figure 1
figure 1

An overview treemap of the number of viral contigs detected in all chicken fecal samples (A) and all chicken farm dust samples (B). The size of each virus family sector is proportional to the number of contigs detected in the family (= number of contigs in this family/total number contigs observed). Only viral contigs with minimum length of 300 nt, at least 70% identity to the reference sequences and a detection e-value threshold of < 1 × 10–10 when comparing with the closest reference sequence in our database were included.

From the total of 64 chicken farm dust samples (prefix: V_E; N = 13) and chicken feces (prefix: V_M; N = 51) analyzed, viral sequences belonging to the family Parvoviridae were consistently detected in all samples, followed by Picornaviridae with a 85% and 100% detection rate in chicken dust samples and chicken feces respectively (Table 1). Astroviridae were the third most commonly detected virus family detected in 85% of farm dust samples and 82% chicken feces, followed by Caliciviridae which were present in 69% (farm dust samples) and 78% (chicken feces) detection rate. The overall pattern of virus detection observed in dust samples was similar to that in chicken feces, some more variability was observed within farms over time (Table 1 and Fig. S1).

Table 1 Detection of viral sequences from different eukaryotic virus families in chicken feces and farm dust samples.

Although the number of Parvoviridae positive samples was slightly higher than for Picornaviridae, the actual number of Picornaviridae sequence contigs detected was the highest (Figs. 1 and 2). Within the Picornaviridae family, Sicinivirus sequences were consistently detected in 69% of farm dust samples and in all chicken feces (Table 2), followed by Anativirus (farm dust samples: 61%; chicken feces: 70%), and Megrivirus (farm dust samples: 46%; chicken feces: 43%). Of note, the number of contigs detected in different picornavirus genera corresponded to the detection pattern (Fig. 3; Fig. S2). In particular, Anatvirus, Gallivirus, Megrivirus and Sicinivirus sequences were detected in all three farms, while Orivirus was only identified in farm F_01 and F_03 and Avisivirus was only found in farm F_03 (Fig. S2). When looking at the virus family level, we did not observe any substantial differences in overall viral contents (Fig. 2) and virus-specific detection (Fig. S1) of samples from different farms and from different production cycles (different chicken ages) within farms.

Figure 2
figure 2

An overview of virus contigs detected in chicken feces and chicken farm dust samples. Each column represents one sample. Associated metadata is shown at the top panel. The color intensity of the heatmap (bottom panel) is determined by number of contigs with minimum length of 300 nt/bp, at least 70% identity and an e-value threshold of 1 × 10–10 when comparing with closest reference in our database. Sample order is assorted by farm ID, sampling time point. Time point is an arbitrary number.
 Samples were collected at the same time and place when both farm ID and time point match.

Table 2 Detection of viral sequences from picornavirus genera in chicken feces and farm dust samples.
Figure 3
figure 3

An overview of picornavirus contigs detected in chicken feces and chicken farm dust samples. Each column represents one sample. Associated metadata is shown at the top panel. The color intensity of the heatmap (bottom panel) is determined by number of contigs with minimum length of 300 nt, at least 70% identity and an e-value threshold of 1 × 10–10 when comparing with closest reference in our database. Sample order is assorted by farm ID, sampling time point. Time point is an arbitrary number.
 Samples were collected at the same time and place when both farm ID and time point match.

Genomic diversity in viral sequences identified from chicken farm dust samples and paired feces

Picornaviridae

Phylogenetic reconstruction was performed to compare the reported Picornaviridae sequences from 5 genera (Avisvirus, Anativirus, Gallivirus, Megrivirus and Sicinivirus) with global sequences. Forty-six picornavirus sequences with ≥ 80% genome coverage (farm dust samples: 8; chicken feces: 38) were included in the phylogenetic analysis of the polyprotein region. The maximum-likelihood (ML) phylogenetic tree (Fig. 4A) showed that the sequences identified from farm F_03, albeit small number (N = 12), belonged to all 5 different genera Avisvirus, Anativirus, Gallivirus, Megrivirus and Sicinivirus, suggesting a high viral diversity circulating among chicken in this farm. Interestingly, farm F_02 with the highest number of picornavirus contigs identified (N = 31) has a moderate level of viral diversity, with these contigs belonging to 3 genera Anativirus, Sicinivirus and Megrivirus. Only 3 picornavirus genomic sequences identified were from farm F_01, belonging to Sicinivirus (N = 1) and Anativirus (N = 2). Sicinivirus and Anativirus are the most commonly detected virus genera found with virus sequences detected in all 3 farms, while sequences of Gallivirus and Avisivirus were less commonly detected, found in only 1 farm, farm F_03.

Figure 4
figure 4

(A) Maximum-likelihood phylogenetic tree of 76 (sequences from this study: 46; reference sequences: 30) picornavirus partial polyprotein amino acid sequences (3786 sites) reconstructed using a best-fit evolutionary model LG + FC + R10. For clarity, only bootstrap supports > = 70 are shown as a light brown node point. Tip labels of our samples are color-coded by corresponding farm ID. Reference sequences are in grey. Tree was mid-point rooted for clarity. (B) Maximum-likelihood phylogenetic tree of 30 Sicinivirus partial polyprotein nucleotide sequences (sequences from this study: 15; reference sequences: 15; 8672 sites) reconstructed using a best-fit evolutionary model GTR + F + R10. For clarity, only bootstrap supports > = 70 are shown as a light brown node point. Tip labels of our samples are color-coded by corresponding farm ID. Reference sequences are in grey and tree was mid-point rooted for clarity.

Genome sequences detected from farm dust samples belonged to Sicinivirus and Megrivirus from farm F_02 that clustered with sequences identified in chicken feces from the same farm (Fig. 4A). Viral sequences in the Sicinivirus genus were detected in 69% of dust samples and all chicken feces samples with the highest number of viral contigs identified in each sample as compared to other genera in the Picornaviridae family. Slightly longer branches in the Sicinivirus sequences in the ML tree indicates higher diversity within this Sicinivirus clade (Fig. 4A). We further reconstructed a ML tree for all identified Sicinivirus polyprotein nucleotide sequences (N = 15; farm dust samples: 4; chicken feces: 11) in this study and compared them with global reference sequences (Fig. 4B). Two sequences from chicken feces (V_M_005_picorna_5 from farm F_01 and V_M_034_picorna_3 from farm F_03) were most distinct from the rest of the identified sequences (sharing only 69.4–72.3% nt identity) and were most closely related to strain JSY previously identified in China (78.9–79.0% nt identity). Other Sicinivirus sequences from the same farm often formed a monophyletic lineage, suggesting within-farm Sicinivirus sequences are highly genetically similar. Viral sequences identified from chicken feces and farm dust samples consistently clustered with each other forming their own sub-clusters within major clades in the ML trees, indicating high similarity between sequences from feces and dusts and that they are more closely related as compared to global reference sequences.

Astroviridae

Sixteen astrovirus sequences with ≥ 80% genome coverage were identified in this study, among which only 1 astrovirus sequence was found in farm dust (VE_7_astro_14; farm F_02). There was only 1 astrovirus genomic sequence from farm F_01. RNA-dependent RNA polymerase (RdRp) and capsid regions of these sequences were extracted for phylogenetic analyses (Fig. 5A,B, respectively). ML trees of both regions showed that these 16 astrovirus sequences belonged to two distinct lineages, A1 and A2 (Fig. 5). Sequences from samples from farm F_02 and F_03 were found in both lineages A1 and A2, suggesting co-circulation of different astrovirus strains in both farms. The 12 sequences from lineage A1 were most closely related to avian nephritis virus strains previously reported in China and Brazil (GenBank accession number: HM029238, MN732558 and MH028405), while the 4 sequences from lineage A2 were more closely related to each other and clustered with chicken astroviruses previously identified in Malaysia, Canada and the USA (GenBank accession number: MT491731-2, MT789782, MT789784 and JF414802). The 12 sequences from lineage A1 shared 96–100% amino acid (aa) identity in the RdRp region while 4 sequences from lineage A2 shared a higher 99–100% aa identity. For the less conserved capsid region, three small sub-lineages can be observed within lineage A1 (Fig. 5B). Notably, the only astrovirus sequence found in dust (strain VE_7_astro_14) was identical to astrovirus sequence recovered from its paired chicken feces sample (strain V_M_038_astro_4) at the same farm.

Figure 5
figure 5

(A) Maximum-likelihood phylogenetic tree of 33 astrovirus partial RdRp amino acid sequences (sequences from this study: 16; reference sequences: 17; 548 sites) reconstructed using a best-fit evolutionary model LG + FC + I + G4m. (B) Maximum-likelihood phylogenetic tree of 33 astrovirus partial capsid amino acid sequences (sequences from this study: 16; reference sequences: 17; 908 sites) reconstructed using a best-fit evolutionary model LG + FC + I + G4m. For clarity, only bootstrap supports > = 70 are shown as a light brown node point. Tip labels of our samples are color-coded by corresponding farm ID. Reference sequences are in grey and tree was mid-point rooted for clarity.

Caliciviridae

Similar to astrovirus distribution in feces versus dust, 13 out of 14 calicivirus sequences identified (with ≥ 80% genome coverage) were recovered from fecal samples with only 1 calicivirus sequence identified from a dust sample. Partial ORF1 polyprotein region of these sequences was extracted to construct ML tree for comparing the local calicivirus with global sequences. The ML tree showed that the reported calicivirus sequences were from two distinct lineages (C1 and C2) (Fig. 6). All calicivirus sequences from farm F_02 and F_03 belonged to lineage C1, while sequences from F_01 belonged to both lineages C1 and C2. The 11 sequences in lineage C1 shared 98.7–100% aa identity despite being collected from different farms and they shared 97.2–99.1% aa similarity with two partial ORF1 sequences identified in Germany in 2010 (GenBank accession number: JQ347523 and JQ347527). BLAST searches and multiple sequence alignment indicated that these 11 local calicivirus sequences are the first report of complete ORF1 sequence of that particular chicken calicivirus lineage on GenBank. The only calicivirus sequence from dust (VE_8_calici_66) clustered in lineage C1. The 3 sequences in lineage C2 are highly similar, sharing 99.9–100% aa identity and were all collected from farm F_01 at two time points. These 3 sequences shared 93.8–96.8% aa similarity with reference sequences from Germany, USA, Korea and Brazil (Bavaria virus, Bavovirus genus, GenBank accession number: HQ010042, MN810875, KM254170-1, MG846433-4).

Figure 6
figure 6

Maximum-likelihood phylogenetic tree of 30 calicivirus partial polyprotein amino acid sequences (sequences from this study: 14; reference sequences: 16; 2794 sites) reconstructed using a best-fit evolutionary model LG + FC + R10. For clarity, only bootstrap supports > = 70 are shown as a light brown node point. Tip labels of our samples are color-coded by corresponding farm ID. Reference sequences are in grey and tree was mid-point rooted for clarity.

Parvoviridae

A total of 34 parvovirus sequences with ≥ 80% genome coverage were identified (chicken farm dust: 8; chicken feces: 28). The amino acid sequences of the nonstructural protein (NS1) region were extracted for ML tree reconstruction. Two distinct lineages (P1 and P2) were observed in the ML tree supported by > 70% bootstrap value (Fig. 7). All but one sequences from farm F_01 and F_03 were clustered in lineage P1 and lineage P2 respectively, while sequences from farm F_02 belonged to both lineages. The 16 sequences in lineage P1 shared 99.5%-100% aa identity and they shared 97.3–99.6% aa identity with sequences from Switzerland and Brazil (GenBank accession number: OM469025, OM469088, OM469107 and MG846442-3). The 18 sequences in lineage P2 shared 99.4–100% aa identity with each other and 98.7–99.9% aa identity to sequences from Switzerland, Brazil, Korea and Canada (GenBank accession number: OM469032, OM469027, MG846441, KM254174 and MW306779). The inter-lineage aa identity between lineage P1 and P2 ranged from 73.7 to 74.4%.

Figure 7
figure 7

Maximum-likelihood phylogenetic tree of 48 parovirus partial NS1 amino sequences (sequences from this study: 34; reference sequences: 14; 700 sites) reconstructed using a best-fit evolutionary model JTT + FC + R2. For clarity, only bootstrap supports > = 70 are shown as a light brown node point. Tip labels of our samples are color-coded by corresponding farm ID. Reference sequences are in grey. The tree is rooted to a chaphamaparvovirus 1 strain, named PBD12.16-AU-2016, identified from Pacific black duck (GenBank accession number: MT247730).

Other virus families

Three nearly-complete coronavirus genome sequences were identified in 3 chicken feces (V_M_001, V_M_002 and V_M_003) from farm F_01. BLAST analyses showed that these sequences are infectious bronchitis virus (IBV) strains and share 99.92–99.95% identity at the nucleotide level with IBV vaccine strain 4/91 (GenBank accession number: KF377577), suggesting these IBV may possibly be viral shedding following vaccination or environmental contamination as vaccines are generally provided in drinking water.

Mash analysis

The phylogenetic comparison of large contig sequences presented above support the findings about the similarities between sequences from dust samples and possible farm animal sources. However, additional information can be obtained with global analyses that would compare all available sequence data from the samples. A method that allows a comparison of the large amount of unclassified sequences that are commonly observed in NGS data would provide additional information. We therefore used a kmer comparison tool Mash43,44 that prepares a hash description of all kmers of a given length and then allows rapid quantitative comparison of similarity of the kmers sets generated from different sequenced samples. We used the Mash distance function to calculate a Jaccard distance value for all pairs of dust sequence samples.

We compared all assembled viral contigs of the dust samples. Figure 8A shows the Jaccard distance between all 13 dust samples from 3 farms, colored by farm source. In all comparisons, pairs of dust samples from the same farm have lower Jaccard distance values (= greater similarity, darker blue in the figure) than inter-farm samples. It is also apparent that farm F_01 and farm F_02 were slightly more related to each other than either were to farm F_03. Similar patterns were obtained using all quality-controlled short read data from each dust sample (Fig. 8B).

Figure 8
figure 8

(A) Jaccard distance analysis of assembled viral contigs of 13 pooled farm dust samples. (B) Jaccard distance analysis of quality-controlled short reads of 13 pooled farm dust samples. The color intensity of the heatmaps is determined by MASH distance. Corresponding MASH distances are also shown in the heatmap. Sample labels are color-coded by farm ID and farm ID was added as prefix for clarity.

Discussion

Animals including livestock kept in close proximity to humans can act as intermediate hosts and/or sources for zoonotic disease transmission45. The current COVID-19 pandemic has raised awareness of zoonotic risks and has prompted the need for more proactive surveillance strategies involving animal surveillance, which could potentially guide zoonotic outbreak preparedness. Chickens are the most abundant livestock in the world30 and have been identified as reservoirs for various zoonotic pathogens35,36. In this study, we employed viral metagenomic deep sequencing to document the viruses present in farm dust samples and paired chicken feces collected within the same farms. These data revealed potential virus flow between chickens and their farm environment, highlighting the important role of dust as part of surveillance matrix to bring useful insights into the equation for monitoring virus flows at the animal-environment interface.

In this study, we described a random-primed viral metagenomic deep sequencing strategy to obtain viral sequences from dust samples. We were able to obtain long viral sequences providing ≥ 80% viral genome coverage from farm dust samples. Farm dust is known to be associated with adverse respiratory effects observed in farm workers with prolonged exposure40; however, whether viruses play a role is thus far unknown due to a knowledge gap in virus detection and virus diversity in farm dust. Characterizing farm dust viromes will aid in investigating possible health effects of occupational and environmental exposure to virus-containing farm dust. The method we have developed provides a platform for future surveillance in farm animals and environmental samples.

Viral contig distribution of chicken dust samples was in general comparable to that of chicken fecal samples. Viral sequences from Picornaviridae, Parvoviridae, Astroviridae and Caliciviridae were the most commonly detected in both sample types (Fig. 1), although a higher abundance of DNA virus sequences was interestingly observed in dust samples. Our phylogenetic analyses indicated that for all three farms monitored, viruses identified from farm dust samples were genetically closer to viruses identified from chicken feces collected from the same farm than from the other two farms in the study. Furthermore, the Jaccard similarity analysis (Fig. 8) further showed that for both known and unknown sequences the dust sequences were closer to other samples from the same farm, indicating some source specificity in the sampling method. This observation is in agreement with a previous study that reported the correlation found in the bacterial antimicrobial resistomes between animal feces and farm dust46. Although infectivity of the viruses identified was not determined in our study, the presence of nearly complete viral genome sequences and the complex protein/lipid/nucleic acid composition of dust makes it likely that dust indeed contains infectious virus particles. Future investigation on the infectivity of viruses in farm dust will provide useful insights in mapping virus flows between chickens and their surrounding environment, which collectively would guide future outbreak control strategies within and between farms.

Viral sequences identified in chicken feces from different farms were phylogenetically closely related to each other and formed sub-lineages or monophyletic lineages when comparing to global related sequences. This could possibly be explained by the pyramid structure of the broiler production system, as chicks found at each farm were all bought from a few suppliers; in other words, chicks could have been infected from the supplier and then brought the virus to different broiler farms with further spread within the farms, leading to similar viruses circulating in different broiler farms. Of note, the pyramid production structure was also suggested as a possible transmission route for ESBL/pAmpC producing bacteria between broiler farms47. Future characterization of virus diversity of chickens from the top part of the pyramid (suppliers) is needed to confirm the speculation.

Viral sequences belonging to five genera of the Picornaviridae family were identified in our study, suggesting a great diversity of picornaviruses circulating in chickens, which is consistent with previous studies that detected various picornavirus genera in chicken respiratory samples48,49. Astroviruses identified in this study clustered in two distinct phylogenetic lineages. All chicken astroviruses are grouped to one single species according to the current taxonomy assignment from International Committee on Taxonomy of Viruses50. Current classification of avastrovirus is based on phylogenetic analysis of capsid (ORF2) amino acid sequences51. More comprehensive assignment of genotypes is hindered by the limited number of complete capsid sequences. A similar observation was seen for Caliciviridae, where we identified two types of caliciviruses forming two distinct lineages in the ML tree. Lineage C2 is closely related to the Bavovirus genus, while the sequences from C1 are most related to an unclassified chicken calicivirus strain F10026n52 that only possessed a partial ORF1 sequence. Our data have provided 16 astrovirus and 14 calicivirus sequences, which helps expand the current knowledge of chicken astrovirus and calicivirus diversity and could potentially contribute to improved taxonomic assignment.

We observed that viral sequences in the Parvoviridae and Picornaviridae families were consistently detected in chicken feces and farm dust samples at multiple time points in all three farms. These virus families could potentially be used as signature markers when monitoring chicken and farm dust exposure. Additionally, these viruses could also be used as markers to assess the efficiency of disinfection procedures in farms and their impact on surrounding environments although stabilities of different viruses may differ due to different structures and we caution that viral activity assays have not yet been performed on this material. Further and perhaps larger scale of longitudinal surveillance with the inclusion of samples from layer hens and more environmental samples from surrounding areas, as well as samples from other bird species are certainly necessary to confirm our observation and the applicability of using these identified viruses as signature virus fingerprint.

Our study has several limitations. Although we collected chicken feces and farm dust samples longitudinally, limited number of samples per each time point especially for farm dust samples has hindered in-depth statistical comparisons between samples collected from different time points. An important limitation that should be considered is that the fecal and dust samples were each generated from pooled material, and dust by its very nature is a pooled sample (i.e. possibly containing sequences from multiple individual infections). The possibility that the resulting assembled genomes from these fecal and dust samples may be chimeras assembled from multiple individual infections should be considered in interpreting the results. This means that although one might intuitively expect identical sequences identified from the dust originating from the same source as the fecal sample, the technology and the pooling means that identical genome sequences may be difficult to find even though the sources of the viruses could be the same. Virologically important viruses should be examined using alternate methods such as direct isolation from individually infected animals. Chicken fecal samples and dust samples were processed slightly differently because of different sample nature and reagent availability, which could potentially lead to minor differences in detection sensitivity. Our finding showed that viral sequence content found in dust and chicken feces are relatively similar, suggesting any potential bias is minimal.

In conclusion, we provide a dataset of viral sequences generated from farm dust samples and feces and show a similar pattern between viral sequences from dust and feces samples in terms of composition and genetic similarity. These results support the idea that dust sampling can provide an accurate description of the farm virome and could potentially be incorporated as part of viral metagenomic surveillance matrix. In the long term, understanding virus flow between animals and humans is vital for identifying potential zoonotic threats and zoonotic outbreak control and preparedness.

Materials and methods

Sample collection

Pooled chicken feces and farm dust samples were collected in three commercial broiler farms in the Netherlands at 4–5 time points from May to August 2019. The three commercial broiler farms are located in three different regions in the Netherlands (Western part: Noord/Zuid-Holland region; Northern part: Friesland/Groningen region; Eastern part: Gelderland region). Detailed sampling strategy and sample metadata is described in Table 3. Each pooled poultry fecal sample was manually picked from the floor and contains fresh fecal material from 3–4 chicks. The chicks were not kept in cages but could walk freely. Farm dust samples were collected using a passive air sampling approach using electrostatic dustfall collectors (EDCs)53,54. Electrostatic cloths were sterilized through incubation at 200 °C for 4 h. Sterilized electrostatic cloths were then fixed to a pre-cleaned plastic frame. EDCs were exposed for 7 days at 1 m above the floor with the electrostatic cloths facing up in broiler farms to enable sampling of settling airborne dust instead of resuspended dust from the floor. EDCs were contained in a sterile plastic bag before and after sampling. All samples were transported under cold chain management and stored at − 20 °C/− 80 °C before processing.

Table 3 Detailed sampling strategy.

Sample processing and metagenomic deep sequencing

Chicken fecal samples were processed as previously described55. Briefly, chicken fecal suspension was prepared in Phosphate-buffered saline (PBS) and treated with TURBO DNAse (Thermo Fisher, USA) at 37 °C for 30 min, and then subjected to total nucleic acid extraction using QIAamp viral RNA mini kit (Qiagen, Germany) according to manufacturer’s instruction without addition of carrier RNA. For dust samples, electrostatic cloths were incubated in 3% beef extract buffer for 1 h on rolling as previously published56. After incubation, the suspension was collected and centrifugated at 4000g at 4 °C for 4 min to pellet any large particles or debris. Total viruses in dust suspension were concentrated using polyethylene glycol (PEG) similar to virus concentration in sewage published previously57,58. Briefly, PEG 6000 was added to each dust suspension to make up a final 10% PEG 6000 (Sigma-Aldrich, USA) concentration, followed by pH adjustment to pH 4 and overnight incubation at 4 °C with shaking. After overnight incubation, sample was centrifuged at 13,500 g at 4 °C for 90 min. Supernatant was removed and the remaining pellet was resuspended in 500 μL of pre-warmed glycine buffer, and then subjected to 5-min centrifugation at 13,000 g at 4 °C. Supernatant was collected, and supernatants from EDC samples that were collected in the same farm at the same time point (N = 1–4) were pooled together for further processing. Viral-enriched dust suspension samples were then treated with TURBO DNase as previously described55 to remove non-encapsulated DNA, followed by total nucleic acid extraction using MagMAX™ viral RNA isolation kit (Thermo Fisher, USA) according to manufacturer’ instructions but without the use of carrier RNA. Reverse transcription (using SuperScript III reverse transcriptase [Invitrogen, USA] and non-ribosomal hexamers59 and second strand cDNA synthesis (using Klenow fragment 3’-5’ exo- [New England BioLabs, USA]) of chicken fecal samples and dust samples was performed as previously described55 and following manufacturer’ instructions.

Standard Illumina libraries were prepared using Nextera XT DNA library kit following the manufacturers’ instructions. Final libraries were sequenced on the Illumina MiSeq platform (600 cycles; paired-end 2 × 300 bp). Chicken fecal samples were sequenced with a multiplex range of 16–19 samples per run. Dust samples and negative blank controls were sequenced with a multiplex of 12 samples per run.

Sequencing data analysis

Raw reads were subjected to adapter removal using Trim Galore/default Illumina software, followed by quality trimming using QUASR60 with a threshold of minimum length of 125 nt and median Phred score ≤ 30. The resulting quality-controlled reads were de novo assembled using metaSPAdes v3.12.061. De novo assembled contigs were classified using UBLAST v11.066762 against eukaryotic virus family protein databases as previously described23,26. We set a detection threshold of contig with minimum amino acid identity of 70%, minimum length of 300 nt and an e-value threshold of 1 × 10–10 when interpreting our contig classification results. Contig classification results were analyzed and visualized using R packages including dplyr, reshaped2 and ComplexHeatmap63,64,65 and python package squarify.

Phylogenetic analysis

Sequences with ≥ 80% of genome coverage from the Picornaviridae, Astroviridae, Caliciviridae and Parvoviridae families were used for phylogenetic analyses, comparing them with global sequences retrieved from GenBank. Multiple sequence alignments were performed using MAFFT v7.42766 and manually checked in Geneious v2021.0.3 (Biomatters, New Zealand). Best-fit models of evolution for phylogenetic tree reconstruction were estimated using ModelFinder module in IQ-TREE v1.6.1167, determined by Akaike Information Criterion (AIC). Maximum-likelihood trees were constructed using RAxML-NG v1.0.168 with 100 pseudo-replicates. Resulting trees were visualized and annotated using R package ggtree69.

Kmer analysis with MASH

To compare Jaccard distances between dust samples, all quality controlled short reads, or resulting assembled viral contigs were analyzed using the Triangle function from Mash v2.343,44 with a kmer size of 32 (-k 32) and a sketch size of 10,000 (-s 10,000). The resulting distances were visualized in a heatmap using R package ggplot270.