The search in the PubMed database yielded 305 results, EMBASE yielded 585 and Web of Science yielded 272 results. In total, 1162 references were obtained. After removal of duplicates, 755 remained. After abstract and full-text screening, 14 articles met inclusion criteria. Subsequently, citation tracking was applied, which did not lead to any additional findings. Hence the final number of included articles was fourteen (Fig. 1).
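As a quick arithmetic check of the search flow described above, the reported counts can be tallied in a minimal Python sketch. The variable names are illustrative assumptions, not taken from the study. The two derived figures (duplicates removed, records excluded during screening) are computed from the reported numbers rather than stated in the text.

```python
# Counts as reported in the text; all names here are illustrative assumptions.
hits_per_database = {"PubMed": 305, "EMBASE": 585, "Web of Science": 272}

total_references = sum(hits_per_database.values())   # reported as 1162
after_deduplication = 755                             # reported in the text
included_articles = 14                                # reported in the text

# Derived values (not stated explicitly in the text).
duplicates_removed = total_references - after_deduplication
excluded_during_screening = after_deduplication - included_articles

print(f"Total references: {total_references}")
print(f"Duplicates removed: {duplicates_removed}")
print(f"Excluded during abstract/full-text screening: {excluded_during_screening}")
```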
Risk of bias assessment
Of the 14 studies, three were scored as having low, eight as having moderate and three as having high risk of bias. Regarding individual categories, first, risk of population bias was generally moderate: all studies reported age and sex, whereas only five studies provided specific and explicit inclusion and exclusion criteria for lumbar disc herniation [13, 17,18,19,20]. Second, selection bias could be ruled out in six studies and was also regarded as generally moderate [13, 17, 18, 20,21,22]. Third, risk of outcome bias was generally considered moderate as well; most studies clearly defined outcome measures, except for Schistad et al. Here, the authors described IL-8 measurements in the methods section but did not elaborate on them in the results. Studies that did not test the assumptions of parametric tests for VAS scores were awarded no points for statistical analysis [13, 18, 20, 21, 23,24,25]. None of the studies described clinical evaluation as independent of the treating physician. Fourth, the selected studies showed a low risk of attrition bias, as all 14 selected articles were prospective studies. Eight studies had a follow-up period longer than 6 months [13, 18, 20, 22,23,24, 26]. Finally, only five studies explicitly reported having no conflict of interest [18,19,20, 23]. An overview of the risk of bias scores is provided in Table 1.
Data extraction: macrophages and related cytokines and factors
The reported methods of measuring macrophages, cytokines and secreted factors varied widely. Some authors described their presence histologically in nucleus pulposus material removed during surgery; others examined the presence of macrophages and accompanying inflammatory factors in blood or cerebrospinal fluid (CSF). The choice of parameters studied also varied widely, and not all parameters associated with M1 and M2 macrophages were reported in the studies eligible for this review. The histological parameter for macrophages, CD68 (surface marker), was reported in a few studies. The M1-related factors encountered were interferon-gamma (IFN-γ), tumour necrosis factor alpha (TNF-α) and tumour necrosis factor receptors 1 and 2 (TNFR1/TNFR2); the M1-related cytokines reported were IL-1α, IL-1β, IL-6, IL-8 and IL-12. The M2-related factor reported was transforming growth factor-beta (TGF-β), and the M2-related cytokines reported were IL-4 and IL-10.
Association between macrophage marker and pain
Two out of the four studies on CD68 [22, 26] found a negative association with pain scores during follow-up [22, 26,27,28], and one study found a negative association with the straight leg raising (SLR) test, which means that patients with higher CD68 (macrophage) expression had less pain and lower SLR scores.
Association between pro-inflammatory factors (M1) and clinical outcome
In studies examining the association of TNF-α with VAS pain, SLR or ODI, five out of six studies found a positive association [17, 19, 20, 23,24,25], which means that patients with higher TNF-α levels had higher pain scores. The only study that did not find such an association had a high risk of bias. In most studies, the association of TNF-α with clinical parameters was evaluated at baseline, but it remained present in follow-up data [20, 24]. Both studies on TNFR1, a TNF-α receptor, found a positive association with pain scores, one at baseline and both during follow-up [23, 24]. In contrast, the same studies found that TNFR2 had a negative association with pain scores, which means that patients with high levels of TNFR2 reported lower pain scores [23, 24]. Three out of five studies on IL-6 found a positive association with pain scores and ODI [13, 18, 19, 23, 25]. One of the two studies that did not find an association had a high risk of bias. The other determined the IL-6 concentration in disc material, whereas the three studies that did find a positive association examined IL-6 in serum (Table 2).
Two out of four studies on IL-8 found a positive association with pain scores and SLR [13, 17, 19, 25]; one of these studies examined IL-8 in disc material, and the other in serum. The other two studies found no association with pain scores, SLR or ODI; one of these (high risk of bias) examined IL-8 in CSF, and the other examined IL-8 in serum.
All three studies on IL-1β showed no association with pain scores and SLR [23, 25]. IL-1β expression was examined in disc material, CSF and serum. The study on IL-1α found no association with pain scores.
Two studies examined the association of IFN-γ with pain or SLR and did not find an association [21, 25]. However, one of these studies had a high risk of bias, and the other examined the association with several VAS cut-off scores, thereby introducing outcome bias.
Association between anti-inflammatory factors (M2) and clinical outcomes
The two studies examining IL-10 reported different results [17, 19]. One study did not demonstrate an association with pain score or SLR. The other study demonstrated a negative association: in patients with higher pain scores or ODI, the serum concentration of IL-10 was lower than in patients with a low pain score or ODI (Table 2).
One of the two studies on IL-4 found a negative association with pain scores at 12-month follow-up. The other study demonstrated no association with VAS or ODI. The study on TGF-β found no association with pain or SLR.