Introduction

Peripheral blood is a fluid connective tissue that circulates throughout the body and connects the entire biological system. Previous studies have shown that peripheral blood gene expression is an important source of information for profiling individual uniqueness [1, 2]. Peripheral blood shows distinct patterns of gene expression in different diseases, including Parkinson’s disease [3], childhood asthma [4], rheumatoid arthritis [5, 6], infectious diseases [7] and early preeclampsia [8, 9]. A unique pattern of immune dysregulation has been found in blood samples from COVID-19 patients [10, 11]. In addition, studies have shown that the composition and diversity of human blood microbial communities change with the onset of diseases including diabetes [12], cirrhosis [13], acute pancreatitis [14], cardiovascular diseases [15], schizophrenia [16] and cancer [17]. Peripheral blood is more accessible and less risky to collect than an invasive organ biopsy, and thus provides promising disease biomarkers for precision and translational medicine.

Peripheral whole blood (WB) contains white blood cells (WBCs), red blood cells (RBCs), platelets and plasma. WBCs include lymphocytes, granulocytes and monocytes, play critical roles in immunity and account for most of the complexity of blood. There are three popular methods to isolate WBCs from WB: buffy coat (BC) extraction, RBC lysis and peripheral blood mononuclear cell (PBMC) isolation. BC extraction collects the white layer of blood that forms after centrifugation, retaining most WBCs and platelets [18, 19]. RBC lysis removes erythrocytes and leaves WBCs and platelets [20]. PBMC isolation uses density gradient centrifugation to separate mononuclear cells, such as lymphocytes and monocytes [21]. These cell isolation methods retain different subpopulations of blood cells, leading to distinct expression patterns. Previous studies have observed fewer detected genes, higher variability of gene expression profiles and a lower signal-to-noise ratio in WB than in WBCs, which was attributed to RBC-derived globin transcripts [22, 23]. Gene expression patterns and cell subpopulations also varied among WB and WBC isolations [22]. Most of these studies evaluated the differences in a selected group of genes by microarray. With the development of RNA sequencing (RNA-seq), it is now possible to identify changes in the whole human transcriptome profile and in microbial composition and diversity among WBC isolations, and to assess their potential effect on biomarker discovery.

Here, we used RNA-seq to comprehensively investigate the influence of three popular WBC isolation methods (BC extraction, RBC lysis and PBMC isolation) on the human transcriptome profile and on microbial composition and diversity. By assessing these three cell isolation methods, we provide a reference to help researchers choose suitable pretreatments for particular study purposes.

Materials and methods

Sample collection and leukocyte isolation

Three healthy volunteers (two males and one female) were enrolled, and 10 mL of peripheral blood was collected from each in EDTA anticoagulation tubes (BD, 0202992058) and processed immediately. Each tube of WB was divided into four portions: three underwent the three different leukocyte isolation methods, and one was mixed with TRIzol LS Reagent (Thermo Fisher, 10296028). The methods of BC extraction, RBC lysis and PBMC isolation are described in Fig. 1. The study protocol was approved by the BGI Institutional Review Board (NO. BGI-IRB 17034). All donors signed consent forms for non-therapeutic use of their donated blood samples.

Fig. 1
Schematic of sample processing. Whole blood was collected in EDTA anticoagulation tubes (n = 3). All samples were treated immediately after collection

RNA extraction, library preparation and sequencing

Twelve RNA samples were extracted with TRIzol Reagent (Thermo Fisher, 15596026) or TRIzol LS Reagent (Thermo Fisher, 10296028) according to the manufacturer’s manual. RNA concentration and integrity were measured with an Agilent 2100 Bioanalyzer (Agilent Technologies, G2939A). Samples with a total RNA amount ≥ 200 ng and a RIN score ≥ 6 qualified for sequencing library construction. The RNase H method was applied to deplete rRNA [24]. Sequencing was performed on the BGISEQ-500RS platform (single-end 50 bp) developed by BGI.

Data filtering, alignment and expression quantification

Reads containing adaptors and low-quality reads were filtered with SOAPnuke [25], and rRNA reads were removed (hg19 rRNA reference) with SOAP2 [26] to obtain clean data. Metrics for clean data were calculated from the output files generated by SOAPnuke [25]. We used HISAT (version 2.1.0) [27] with default parameters to align all clean data to the human genome (hg19) and to calculate alignment metrics. We used RNA-SeQC (version 2.6.4) [28] to assess the proportion of reads aligned to annotated CDS exons, 5′UTR exons, 3′UTR exons, introns, TSS up/down 10 kb and other regions.

We aligned all clean data to human transcript references with Bowtie2 [29], using the refMrna.fa.gz file from the UCSC database [30], with NR_RNA removed, as the mRNA reference and the NONCODEv5_human.fa.gz file from the NONCODE database [31] as the non-coding RNA reference. Saturation curves showing the number of detected genes were generated from the BAM files produced by Bowtie2. Estimated counts and transcripts per million (TPM) values for all transcripts were obtained using Kallisto [32]. We performed principal component analysis (PCA) and calculated the coefficient of variation based on all expressed genes among isolation methods.
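For readers unfamiliar with the two quantities named above, the following minimal sketch shows the standard TPM normalization and a per-gene coefficient of variation. It is illustrative only and is not the Kallisto implementation; the function names, the toy transcript identifiers and the use of the sample standard deviation are assumptions made for this example.

```python
from statistics import mean, stdev


def tpm_from_counts(counts, eff_lengths):
    """Standard TPM: length-normalized rates rescaled to sum to one million.

    counts and eff_lengths are dicts keyed by transcript ID (hypothetical IDs here).
    """
    rates = {t: counts[t] / eff_lengths[t] for t in counts}
    total = sum(rates.values())
    return {t: rate / total * 1e6 for t, rate in rates.items()}


def coefficient_of_variation(values):
    """CV = SD / mean. The sample SD (n - 1) is an assumption of this sketch."""
    m = mean(values)
    return stdev(values) / m if m > 0 else float("nan")


# Toy data, not taken from the study.
counts = {"tx1": 100.0, "tx2": 300.0, "tx3": 600.0}
eff_lengths = {"tx1": 1000.0, "tx2": 1500.0, "tx3": 2000.0}
tpm = tpm_from_counts(counts, eff_lengths)

# CV of one gene across three hypothetical replicates of a pretreatment.
cv_example = coefficient_of_variation([10.0, 12.0, 14.0])
```

TPM values always sum to one million within a sample, which makes expression proportions comparable between samples of different sequencing depth.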

Blood cell subsets analysis

We estimated the proportion of globin mRNA reads (Supplementary Table S1) in clean data as a measure of residual erythrocytes. CIBERSORT [33] was used to estimate the relative proportion of leukocyte subpopulations.

Identification of differentially expressed genes (DEGs) and co-expression genes

We used edgeR [34] to identify DEGs between isolation methods. The Benjamini-Hochberg (BH) method [35] was employed to correct for multiple comparisons. DEGs were considered significant if they exhibited a BH-adjusted p-value ≤ 0.01 and a fold change ≥ 2.
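The filtering rule above can be sketched as follows. This is a plain-Python illustration of the BH step-up adjustment and the two thresholds, not the edgeR workflow itself; the function names and toy p-values are hypothetical, and applying the fold-change cutoff in both directions (|log2 fold change| ≥ 1) is an assumption of this sketch.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(adj_p, log2_fc, alpha=0.01, min_fold=2.0):
    """Apply the stated cutoffs; up- and down-regulation both count (assumption)."""
    return adj_p <= alpha and abs(log2_fc) >= math.log2(min_fold)


# Toy p-values, not taken from the study.
adj = bh_adjust([0.001, 0.008, 0.039, 0.041])
```

With these toy values only the first gene would pass the adjusted p-value cutoff of 0.01, and it would be called a DEG only if its fold change also reached 2.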

We then identified isolation-associated co-expressed genes with WGCNA [36] based on the TPM of all DEGs. We chose β = 9 as the soft threshold [37]. The Pearson correlation between module eigengenes (ME) [36] and isolation method was also calculated. Metascape [38] was used for disease enrichment analysis based on the DisGeNET database [39]. FunRich [40] was used to analyze the cell type enrichment of coding genes in different modules.
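To illustrate what the soft threshold does, the sketch below raises the absolute Pearson correlation between two expression vectors to the power β = 9. WGCNA is an R package and this is not its code; the function names are hypothetical, and the unsigned form of the adjacency is an assumption, since the network type is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(cor, beta=9):
    """Unsigned adjacency |cor|^beta (assumed network type)."""
    return abs(cor) ** beta


# Toy expression vectors, not taken from the study.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
strong = soft_threshold_adjacency(r)
weak = soft_threshold_adjacency(0.5)
```

Raising correlations to a high power keeps strong co-expression links close to 1 while pushing moderate ones toward 0, which is why a correlation of 0.5 contributes almost nothing at β = 9.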

Human gene expression in different blood cell types

Gene expression levels in 18 blood cell types were obtained from the Human Protein Atlas (version 20.0) [41] and Ensembl (version 92.38) [42]. The TPM values of uniquely detected genes in each cell group were merged and examined across cell types. The expression of coding genes in the black and yellow modules across blood cell types was also examined.

The blood microbiome analyses

Clean reads that failed to align to the human genome were further filtered to remove low-quality and low-complexity reads, and the remaining reads were classified using Kraken (version 0.10.5) [43] with a database including viral, archaeal, bacterial, protozoan, fungal and human sequences. We excluded the human reads and calculated the number of microbiome reads per million clean reads (microbiome-RPM) and the relative abundances of bacterial taxa at the phylum level. The alpha diversity of each sample was determined using the Simpson index. To measure sample-to-sample dissimilarities between microbial communities, we used the Bray–Curtis beta diversity index. Principal coordinates analysis (PCoA) was performed based on unweighted Bray–Curtis distances.
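The three summary statistics above can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names and taxon labels are hypothetical, the Simpson index is taken in its 1 − Σp² form (other conventions exist), and the Bray–Curtis dissimilarity is the standard abundance-based definition.

```python
def microbiome_rpm(microbial_reads, clean_reads):
    """Microbial reads per million clean reads."""
    return microbial_reads / clean_reads * 1e6


def simpson_diversity(counts):
    """Simpson diversity as 1 - sum(p_i^2) (assumed convention)."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> count dicts."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator


# Toy read counts with placeholder taxon labels, not taken from the study.
sample_a = {"taxon_1": 60, "taxon_2": 30, "taxon_3": 10}
sample_b = {"taxon_1": 50, "taxon_2": 50}

rpm = microbiome_rpm(250, 100_000_000)
alpha_a = simpson_diversity(sample_a)
beta_ab = bray_curtis(sample_a, sample_b)
```

A Bray–Curtis value of 0 means two samples have identical taxon counts and a value of 1 means they share no taxa, so a matrix of these pairwise values is what a PCoA would take as input.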

Statistical analysis

The coefficients of variation (CV) of the expression of commonly detected genes under each pretreatment were calculated. A paired two-sided t-test was used to compare the differences in CV, proportions of globin mRNA and leukocyte subsets between cell groups in this study, and p < 0.05 was considered significant. The relative proportions of leukocyte subsets were presented as mean ± SD.
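With three donors, a paired comparison between two pretreatments has two degrees of freedom, and the Student t distribution with two degrees of freedom has a closed-form cumulative distribution, F(t) = 1/2 + t / (2√(t² + 2)). The sketch below uses this to compute a paired two-sided p-value without external libraries. The function name and the toy values are hypothetical, and the sketch is valid only for exactly three pairs.

```python
import math
from statistics import mean, stdev


def paired_t_test_n3(x, y):
    """Paired two-sided t-test for exactly three pairs (df = 2).

    Returns (t, p). Uses the closed-form CDF of the t distribution with
    two degrees of freedom, so it must not be used for other sample sizes.
    """
    if len(x) != 3 or len(y) != 3:
        raise ValueError("this sketch only handles three paired observations")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(3))
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p


# Toy paired measurements for three donors, not taken from the study.
t_stat, p_value = paired_t_test_n3([5.0, 6.0, 7.5], [1.0, 1.5, 2.0])
```

Because only two degrees of freedom are available, |t| must exceed roughly 4.30 to reach p < 0.05, which illustrates how limited the statistical power is with three subjects.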

Results

Quality control of RNA and RNA-seq

RNA was extracted from all processed samples with an average RIN value of 8.1 ± 1.0 (mean ± SD), and sequencing libraries were successfully constructed (Supplementary Table S2). About 100 M clean reads were obtained for each RNA-seq sample (Supplementary Table S3). The proportions of rRNA reads and filtered reads in raw reads did not differ significantly among pretreatments (Supplementary Table S4). The Q20 of all samples exceeded 98.75%, indicating that more than 98.75% of bases had a sequencing accuracy of at least 99% (Fig. S1A). The GC content in WB and BC extraction was higher than in RBC lysis and PBMC isolation (Fig. S1B). All pretreatments had similar total aligned percentages, while WB and BC extraction showed higher multi-aligned and lower uniquely aligned percentages (Fig. S1C). We also observed that WB and BC extraction had a higher proportion of CDS exons and a lower proportion of introns (Fig. S1D).

Blood cell subsets analysis

The proportion of globin mRNA ranked WB > BC extraction > RBC lysis > PBMC isolation (Fig. 2a, Supplementary Table S4). The proportions of leukocyte subsets varied among subjects (Fig. S2, Supplementary Table S5). As expected, neutrophils were rare in PBMC, with increased proportions of lymphocytes and monocytes (Fig. 2b).

Fig. 2
Blood cell subsets among WBC isolations and WB. a Percentage of globin mRNA reads in clean reads. b The relative proportions of neutrophils, monocytes, CD8 cells, CD4 naïve cells, T regulatory cells and resting NK cells. * indicates p-value ≤ 0.05

Detection of human expressed genes

Saturation curves showed that the number of detected genes ranked PBMC isolation ≥ RBC lysis > BC extraction > WB at any given number of clean reads (Fig. 3a). The number of high-abundance genes (TPM > 1) and the number of uniquely detected genes (detected in only one cell group) showed similar trends (Fig. 3b, c).

Fig. 3
Sensitivity of human gene detection. a Sequencing saturation analysis of each pretreatment. b Distribution of all gene expression levels. c Venn diagram showing the overlap of the genes detected by different WBC isolations and WB. A gene is considered “expressed” in one pretreatment if it has a TPM value of at least 0.3 in all three biological replicates. d Principal component analysis (PCA) based on the expression of commonly detected genes. e Coefficient of variation of the expression of commonly detected genes. f Distribution of the expression of commonly detected and uniquely detected genes. g Heatmap of gene expression in 18 cell types. TPM was transformed by log2 (TPM + 1)
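The detection rule in the caption above (TPM of at least 0.3 in all three replicates) and the resulting sets of commonly and uniquely detected genes can be sketched as follows. The function names, gene labels and TPM values are hypothetical and serve only to illustrate the set logic.

```python
def expressed_genes(tpm_by_gene, threshold=0.3):
    """Genes whose TPM is >= threshold in every replicate of one pretreatment."""
    return {g for g, reps in tpm_by_gene.items() if all(v >= threshold for v in reps)}


def common_and_unique(expressed_by_group):
    """Return (genes detected in all groups, genes detected in exactly one group)."""
    groups = list(expressed_by_group)
    common = set.intersection(*expressed_by_group.values())
    unique = {}
    for g in groups:
        others = set().union(*(expressed_by_group[o] for o in groups if o != g))
        unique[g] = expressed_by_group[g] - others
    return common, unique


# Toy TPM values for two pretreatments and three replicates, not study data.
tpm = {
    "WB": {"g1": [1.0, 2.0, 0.5], "g2": [0.2, 0.9, 1.0], "g3": [0.0, 0.0, 0.0]},
    "PBMC": {"g1": [3.0, 2.0, 1.0], "g2": [0.5, 0.4, 0.3], "g3": [0.3, 0.3, 0.6]},
}
expressed = {group: expressed_genes(values) for group, values in tpm.items()}
common, unique = common_and_unique(expressed)
```

Requiring the threshold in every replicate is a conservative rule: a gene such as g2 in the toy WB data, which falls below 0.3 in a single replicate, is not counted as detected in that pretreatment.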

Based on the genes commonly detected in the four cell groups, PCA showed that the expression profiles of WB and BC extraction were similar, while the other two groups were distinct (Fig. 3d), and the CV ranked PBMC isolation < RBC lysis < BC extraction < WB (Fig. 3e). The expression of uniquely detected genes was lower than that of commonly detected genes (Fig. 3f). We further found that the uniquely detected genes in RBC lysis were highly expressed in granulocytes (neutrophils, basophils and eosinophils), and those in PBMC isolation were highly expressed in basophils and NK cells (Fig. 3g). Enrichment analysis for DisGeNET did not identify significant terms.

Characterization of DEGs among isolation methods

The fold change distribution of DEGs between any two groups is shown in Fig. 4a. No DEGs were identified between BC extraction and WB. Through WGCNA, we identified two integrative gene modules, labeled black and yellow (Fig. 4b). Furthermore, we found that the black module showed positive correlations with WB and BC extraction, while the yellow module showed a positive correlation with RBC lysis and a negative correlation with PBMC isolation (Fig. 4c). The heatmap showed that genes in the black module had relatively high expression in WB and BC extraction, while those in the yellow module were highly expressed in RBC lysis (Fig. 4d).

Fig. 4
Differences in human gene expression profiles among WBC isolations and WB. a The fold change distribution of DEGs. b Gene dendrogram. The color row underneath the dendrogram shows the module assignment determined by the Dynamic Tree Cut. c Pearson correlation between module eigengene and the pretreatment. The correlation (top) and p-value (bottom) are shown in each box. The numbers of coding (left) and non-coding (right) genes are shown in brackets. d Heatmap of gene expression in each sample. TPM was transformed by log2 (TPM + 1). e Distribution of gene expression levels. f Summary of enrichment analysis in DisGeNET. (Color figure online)

The cell type enrichment analysis showed that coding genes in the black module were significantly enriched in erythrocytes and those in the yellow module were enriched in neutrophils (Fig. S3A), which is consistent with the principles of the isolation methods. We further found that the genes in the black and yellow modules were also expressed in other leukocyte subsets (Fig. S3B). The gene expression distributions of the black and yellow modules are shown in Fig. 4e. In the DisGeNET enrichment analysis, the genes in the black module were mostly enriched in erythrocyte-related diseases such as erythroleukemia, anemia, beta thalassemia intermedia, acute erythroblastic leukemia and hereditary spherocytosis (Fig. 4f). Genes in the yellow module were mostly enriched in inflammation-related diseases, such as pneumonitis, inflammation, infection, juvenile psoriatic arthritis and inflammatory dermatosis (Fig. 4f).

The microbial composition and diversity among isolation methods

The microbiome-RPM varied among subjects and did not differ among isolation methods (Fig. 5a). We found that the microbial composition of each individual was stable across pretreatments, and Proteobacteria dominated all samples (Fig. 5b). PCoA demonstrated that the microbial communities at the genus level were mainly affected by the individual (Fig. 5c). Alpha and beta diversities at the genus level did not differ among groups (Fig. 5d, e).

Fig. 5
The microbial composition and diversity among WBC isolations and WB. a The microbiome-RPM. b Relative abundances of microbial taxa at the phylum level. c Principal coordinates analysis (PCoA) of microbial communities at the genus level based on unweighted Bray–Curtis distances. Alpha (Simpson index) (d) and beta (Bray–Curtis dissimilarity index) (e) diversity per sample at the genus level of classification

Discussion

Peripheral blood is a valuable source for noninvasive diagnosis and prognosis of various diseases and for biomarker discovery. Gene expression and the microbial composition and diversity in the blood provide important information about disease and health status. There are several methods to preprocess peripheral blood, and we comprehensively and systematically assessed the influence of three popular cell isolation methods on transcriptome profiling and on microbial composition and diversity profiling of peripheral blood.

RBCs make up around 45% of the WB volume, and the mapping rate of globin mRNA varies with the proportion of erythrocytes and affects the capability of RNA-seq [44, 45]. Among the three leukocyte isolation methods, BC extraction was similar to WB in PCA, in the globin mRNA mapping rate and in the composition of leukocyte subsets, which demonstrated that BC extraction could not deplete RBCs efficiently and yielded a blood cell composition comparable to that of WB. The BC layer probably contains some RBCs, or BC extraction inevitably includes part of the erythrocyte layer [18, 46]. With a residue of erythrocytes comparable to that of WB, the RNA-seq data of BC extraction showed fewer detected genes, a higher GC content and a higher proportion of CDS exons than RBC lysis and PBMC isolation. More low-abundance genes (TPM < 1) and fewer uniquely detected genes were also identified in WB and BC extraction. In addition, genes in the black module, which was positively associated with WB and BC extraction, were highly abundant in these two groups and enriched in diseases such as anemia and beta thalassemia. In summary, BC extraction could not remove erythrocytes effectively, which affected the capability of RNA-seq.

WBCs are important components of the peripheral immune system and play an essential role in protecting the body against infection, illness and disease. They include granulocytes, monocytes and lymphocytes. In healthy peripheral blood, neutrophils account for 50 to 75%, monocytes for 1 to 8% and lymphocytes for 20 to 40% of WBCs [47]. The relative proportions and expression of leukocyte subsets change with disease [48,49,50]. Neutrophils are usually retained by RBC lysis but depleted by PBMC isolation. Although both isolation methods removed most erythrocytes, they showed distinct transcriptome profiles depending on whether neutrophils were retained. Our results showed that the variability of gene expression was higher in RBC lysis than in PBMC isolation, which might be caused by variation in the proportion of neutrophils (51.19% ± 10.05%) in RBC lysis. Genes uniquely detected in RBC lysis showed low abundance (TPM < 1) and had relatively high expression in granulocytes (including neutrophils, basophils and eosinophils). Genes uniquely detected in PBMC isolation also showed low abundance (TPM < 1) and had relatively high expression in basophils and NK cells. Neutrophil genes were expressed more highly in RBC lysis than in PBMC isolation, as we observed in the yellow module [22, 23], and these genes were enriched in infection and inflammation. In conclusion, RBC lysis retained most WBC subsets, whereas PBMC isolation kept lymphocytes and monocytes but not neutrophils, leading to different transcriptome profiles.

There is increasing evidence that a microbiome exists in healthy human blood [51]. In our data, the microbiome content, the relative abundance at the phylum level and the microbiome diversity at the genus level differed among the three individuals but not among WBC isolation methods. Although more subjects are needed to draw a conclusion, similar phenomena have been reported. A previous study established a 16S rDNA quantitative polymerase chain reaction assay and a 16S targeted metagenomic sequencing pipeline specifically designed to analyze the blood microbiome, and demonstrated that it varied among healthy donors and blood fractions (BC, RBCs and plasma) [52]. Moreover, we found that Proteobacteria dominated the microbial composition across individuals and isolation methods, which is consistent with previous studies [14, 16, 52].

Different cell isolation methods yielded distinct blood cell subsets and affected transcriptome profiles. We recommend choosing the cell isolation method according to the research purpose. WB carries the complete information of blood cells, including both WBCs and RBCs. Collecting blood in PAXgene tubes and adding a globin mRNA depletion step before RNA-seq could be an effective strategy. Because the proportions of leukocyte subsets vary among individuals and these subsets have distinct functions, whether to focus on whole WBCs or on specific subpopulations such as PBMCs should depend on the disease or health status under study. If necessary, commercial kits for isolating specific cell subtypes could be used to enable a precise characterization. The cell isolation methods had less effect on microbial composition and diversity than on human transcriptome profiles. Given the relatively small number of subjects in this study, the findings need to be interpreted with caution.

Conclusions

We systematically assessed the effect of BC extraction, RBC lysis and PBMC isolation on human transcriptome profiles and microbial transcripts, and found that the composition of blood cell subpopulations varied with these methods. We provide a reference for researchers to develop proper sample processing strategies for their own study purposes.