Introduction

As a critical factor affecting milk yield and quality, mastitis represents the most relevant health problem in dairy ruminants worldwide (Ruegg 2017). According to the National Mastitis Council (NMC) mastitis is defined as “an inflammation of one or more quarters/halves of the mammary gland, almost always caused by an infecting microorganism” (Lopez-Benavides et al. 2012). Whereas clinical mastitis can be diagnosed by examination of the udder and of the milk for visible abnormalities, identifying subclinical mastitis is more challenging (Menzies and Ramanoon 2001; Oliver et al. 2004). In animals with subclinical mastitis, the diagnosis is mainly performed on the milk through indirect methods such as the Somatic Cell Count (SCC) (Bergonier et al. 2003; Persson and Olofsson 2011) or its field version, the California Mastitis Test (CMT) (Kelly et al. 2018). Being typically caused by an intramammary infection (IMI) (Ezzat Alnakip et al. 2014), the disease is also investigated through direct methods such as the bacteriological culture (BC) (Contreras et al. 2007) or molecular assays (i.e., PCR) (Chakraborty et al. 2019). The indirect screening approaches rely mainly on the principle that the udder microenvironment changes during the inflammatory process, with an increase in the concentration of immune cells and immune mediators (Hughes and Watson 2018). Polymorphonuclear neutrophils (PMNs) are the prevalent immune cells in the acute phase of mastitis; therefore, SCC and CMT perform well as diagnostic tools because of their indirect relationship to the presence of PMNs (Leitner et al. 2000; Sordillo and Streicher 2002). However, these tests may lack specificity (Rossi et al. 2018), especially in small ruminants (Souza et al. 2012). On the other hand, BC lacks sensitivity (Chakraborty et al. 2019), and it is hardly applicable as a mastitis screening tool given its requirements in terms of time, labor, and cost. 
Clinical examination, SCC, CMT, and BC should be used in combination to increase diagnostic performance (Lam et al. 2009; Chakraborty et al. 2019); however, a universally accepted diagnostic algorithm or protocol is not yet available.

During mammary gland inflammation, numerous antibacterial and immune defense proteins, including Acute Phase Proteins (APPs), lactoferrin (LF), cathelicidins (CATH), cytokines, chemokines, and growth factors, are released in the milk and can potentially serve as “mastitis markers” (Smolenski et al. 2011; Thomas et al. 2015). Accordingly, their implementation as alternative/integrative diagnostic tools has been the subject of several studies during the last decades (Viguier et al. 2009). Many of them focused on discovering new biomarkers for implementing diagnostic tools with improved sensitivity and specificity when compared to the currently available assays. For inflammation-related proteins devoid of intrinsic enzymatic activity, the measurement methods are typically immunoassays employing highly specific antibodies (Viguier et al. 2009). Adding to the possibility of increased diagnostic performances, the integration of traditional diagnostic approaches with immunoassays measuring mastitis marker proteins might bring additional benefits, including the ability to work efficiently on frozen samples, the high analytical throughput, the relatively low analytical costs, and the minimal requirements for dedicated personnel training, specialized or expensive instrumentations (Addis et al. 2016a).

A group of widely investigated potential biomarkers are Acute Phase Proteins (APPs), commonly employed as clinical biomarkers of inflammation in serum but also found in the milk. In particular, the milk isoforms of serum amyloid A (M-SAA) and haptoglobin (HP) (Hussein et al. 2018; Chakraborty et al. 2019; Iliev and Georgieva 2019) are among the most frequently used. Other proteins indicated as suitable mastitis markers are lactoferrin (LF) (Shimazaki and Kawai 2017) and cathelicidins (CATH) (Smolenski et al. 2011).

Biomarker discovery and implementation are constantly evolving, and comparative data on their diagnostic performances are lacking. Therefore, it is not easy to establish their relative advantages in the different dairy ruminant species compared to the current diagnostic approaches. To provide an organic overview of the topic, to understand if the data currently available in the literature are amenable to meta-analysis, and to attempt a comparative assessment of the respective diagnostic performances, we carried out a literature survey using the systematic review approach based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. In veterinary medicine, the methodology for systematic reviews has been defined by Sargeant and O’Connor (2020), who identified specific steps to follow. Accordingly, our review question falls within the fourth type, “Diagnostic test accuracy questions”, aimed at summarizing diagnostic test accuracy. Specifically, this systematic review aims at examining the scientific literature to answer the diagnostic question: “Which are the diagnostic performances of mastitis protein biomarkers investigated by immunoassays in ruminant milk?”.

Methods

Information sources and search strategy

We carried out this systematic review according to the guidelines of the PRISMA statement (Moher et al. 2009). We searched three databases (i.e., MedLine, Scopus, and Web of Science) up to January 28, 2021. For Scopus searches, we applied the default search settings (Article title, abstract, and keywords), whereas in Web of Science we used the specific database “Web of Science Core Collection”. Our review question falls within the fourth type, “Diagnostic test accuracy questions”, aimed at summarizing diagnostic test accuracy, and at answering the diagnostic question: “Which are the diagnostic performances of mastitis protein biomarkers investigated by immunoassays in ruminant milk?” as suggested by Sargeant and O’Connor (2020) for systematic reviews in veterinary medicine. Accordingly, the search terms included the words “biomarker”, “marker”, “intramammary infection”, “mastitis”, and “milk”. These search terms were enriched with the most common markers and detection assays to improve the retrieval of relevant scientific articles. Concerning markers, an initial survey of the literature indicated that the ones most associated with the words “milk” and “mastitis” were M-SAA, HP, LF, and CATH. The two immunoassays most frequently used for measuring protein markers devoid of intrinsic enzymatic activity were ELISA and lateral flow/immunochromatography. Once the search terms were defined, we combined them and their related MeSH terms into 42 specific searches, as follows: (“biomarker” OR “marker” OR “amyloid” OR “haptoglobin” OR “cathelicidin” OR “lactoferrin”) AND (“intramammary infection” OR “mastitis”) AND (“milk”) AND (“immunoassay” OR “ELISA” OR “lateral flow” OR “immunochromatography”) (Supplementary Table I).
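The Boolean structure of the search string above can be sketched in Python. This is only an illustration: the helper names `or_group` and `build_query` are hypothetical, the term lists are copied from the string above, and database-specific syntax (field tags, MeSH expansion, the split into individual searches) is not modeled.

```python
# Term groups copied from the search string reported in the text.
MARKER_TERMS = ["biomarker", "marker", "amyloid", "haptoglobin",
                "cathelicidin", "lactoferrin"]
DISEASE_TERMS = ["intramammary infection", "mastitis"]
MATRIX_TERMS = ["milk"]
ASSAY_TERMS = ["immunoassay", "ELISA", "lateral flow", "immunochromatography"]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Join the OR-groups with AND (hypothetical helper, not the authors' tool)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(MARKER_TERMS, DISEASE_TERMS, MATRIX_TERMS, ASSAY_TERMS)
print(query)
```

Keeping each concept (marker, disease, matrix, assay) in its own list makes it straightforward to add or drop a term and regenerate the string for each database.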

Study selection, data extraction, and synthesis method

Three researchers (AG, ST, and MP) independently screened title, abstract, and full-text to assess each article's compliance with the review question and resolved any disagreement by discussion and consensus. When necessary, a fourth researcher with expertise in the field (MFA) was consulted to reach an exclusion decision. In addition to articles unrelated to the review question, we excluded those written in languages other than English and those belonging to the categories review, case report, report, book chapter, editorial, abstract, and letter. From each eligible document, the following data were extracted: species, first author, year, country, study design, biomarker, technique, sample type and size, SCC, pathogens, unit of measurement, results, sensitivity, specificity, and cut-off. To synthesize the results, we applied the “Synthesis Without Meta-analysis” (SWiM) guidelines (Campbell et al. 2020) by using tables and graphs.
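For readers less familiar with the extracted accuracy measures, the relationship between cut-off, sensitivity, and specificity can be illustrated with a minimal Python sketch. The function name `accuracy_at_cutoff` and all sample values are made up for illustration; they are not data from any of the reviewed studies, and the sketch assumes that a marker value at or above the cut-off counts as test-positive.

```python
def accuracy_at_cutoff(samples, cut_off):
    """Return (sensitivity, specificity) for a marker at a given cut-off.

    samples: list of (marker_value, reference_positive) pairs, where
    reference_positive is the reference standard result (True/False).
    """
    tp = fp = tn = fn = 0
    for value, reference_positive in samples:
        test_positive = value >= cut_off  # assumption: higher value = positive
        if reference_positive and test_positive:
            tp += 1
        elif reference_positive and not test_positive:
            fn += 1
        elif not reference_positive and test_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative sample")
    sensitivity = tp / (tp + fn)  # share of diseased samples detected
    specificity = tn / (tn + fp)  # share of healthy samples correctly negative
    return sensitivity, specificity


# Hypothetical marker concentrations with reference standard results.
samples = [(0.5, False), (1.2, False), (3.0, False), (1.0, False),
           (2.5, True), (6.0, True), (9.1, True), (0.8, True)]
sens, spec = accuracy_at_cutoff(samples, cut_off=2.0)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because both measures depend on the chosen cut-off and on the reference standard, values extracted from different studies are comparable only when these two elements are reported.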

Quality assessment

The quality assessment tool (QUADAS) consists of 14 questions and two main sections, bias assessment and applicability, including four key domains and three key domains, respectively. In the bias assessment, we evaluated for every study the “animal selection” strategy, the “index test”, the “reference standard”, and “flow and timing”. The term “index test” refers to the test under study, while “reference standard” refers to the test considered the best available to diagnose the disease of interest (i.e., a single test, follow-up, or combination of tests).

In the applicability assessment, we rated how closely the studies matched the review question. For both sections, the risk was expressed as “high” or “low”, or as “unclear” when data were insufficient. The 33 screened records showed high heterogeneity in study design, animal selection, and reference standard tests.

Results and discussion

Results of the PRISMA procedure

The steps of the literature search are summarized in the PRISMA 2009 flow diagram (Fig. 1). The search led to the identification of 507 scientific papers (220 MedLine + 131 Scopus + 156 Web of Science); 16 further records were then added to the original search through an expert revision of the literature, resulting in 523 manuscripts (Supplementary Tables II, III, IV, and V). After removing duplicates, 133 records entered three main screening steps. Records were screened first on the title, then on the abstract (n = 72, intermediate step not included in Fig. 1), and finally on the full-text to evaluate their eligibility for qualitative and quantitative analysis (Supplementary Tables VI and VII). As a result of this procedure, 33 scientific articles were considered eligible (Supplementary Table VIII).
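The record counts reported above can be cross-checked with a few lines of Python. This is a simple arithmetic sketch: the variable names are hypothetical, and the two differences computed at the end are derived here from the reported totals rather than taken from the text.

```python
# Counts as reported in the text.
database_hits = {"MedLine": 220, "Scopus": 131, "Web of Science": 156}
expert_additions = 16
after_deduplication = 133
eligible = 33

identified = sum(database_hits.values())      # database records
total_records = identified + expert_additions  # plus expert revision

# Derived values (not stated in the text, computed from the totals above).
duplicates_removed = total_records - after_deduplication
excluded_in_screening = after_deduplication - eligible

print(identified, total_records, duplicates_removed, excluded_in_screening)
```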

Fig. 1 PRISMA 2009 flow diagram

Species overview

Sorting the 33 manuscripts by dairy species, 26 (78.8%) investigated cows, 4 (12.1%) sheep, 3 (9.1%) goats, and 2 (6.1%) buffaloes (Table 1). These counts sum to more than 33 because 2 papers addressed more than one species.

Table 1 Species and biomarker overview

Cow. Out of 26 papers on cow milk, 15 (57.7%) investigated M-SAA, 9 (34.6%) HP, 5 (19.2%) LF, and 2 each (7.7%) CATH, interleukin 1β (IL1β), and interleukin 6 (IL6). Other biomarkers were alpha-1-acid glycoprotein (AGP), bovine serum albumin (BSA), C-reactive protein (CRP), immunoglobulin G (IgG), interleukin 8 (IL8), interleukin 10 (IL10), interleukin 12 (IL12), lipopolysaccharide-binding protein (LBP), transforming growth factor α (TGFα), transforming growth factor β (TGFβ), and tumor necrosis factor α (TNFα), each addressed in 1 paper (3.8%) (Table 1). The samples were quarter milk in 18/26 papers (69.2%) and composite milk in 8/26 (30.8%). In one record (Sobczuk-Szul et al. 2014), the milk sample type was not specified, whereas in another study (Thomas et al. 2015) both quarter and composite samples were used. Concerning the diagnostic methods, ELISA was used in 25 (96.2%) records, whereas in 1 paper (3.8%) the biomarker was investigated by SPARCL. Moreover, 25 studies (96.2%) were observational, related to natural inflammation/infection, and only 1 was an experimental infection study. Tables 2, 3, 4 and 5 summarize the main findings of the 26 papers evaluating cows.

Table 2 Cow, results obtained for M-SAA by applying ELISA and SPARCL* (Dalanezi et al. 2020). The unit of measurement is μg/mL.
Table 3 Cow, results obtained for haptoglobin by ELISA. The unit of measurement is μg/mL.
Table 4 Cow, results obtained for other non-cytokine markers.
Table 5 Cow, results obtained for cytokine markers by ELISA.

Sheep. Out of 4 papers on sheep milk, 2 (40.0%) assessed CATH, while 1 each (20.0%) were on interleukins and M-SAA, respectively. ELISA was used in all studies, three of which were observational (60.0%) and 2 (40.0%) experimental. All studies were carried out on half-udder milk samples. Table 6 summarizes the main findings of the 4 papers.

Table 6 Sheep, results obtained for all markers by ELISA.

Goat

Two (66.7%) out of 3 studies assessed CATH, while 1 (33.3%) assessed LF. ELISA was used in all studies, all of which were observational. All papers investigated biomarkers in half-udder milk, but one (Chen et al. 2004) also used bulk milk samples. Table 7 summarizes the main findings of the three papers.

Table 7 Goat, results obtained for all markers by ELISA.

Water buffalo

Only two observational studies were performed on buffalo. The biomarkers investigated were LF and CATH from quarter milk by ELISA. Table 8 summarizes the main findings of the two papers.

Table 8 Water buffalo, results obtained for all markers by ELISA.

Biomarker overview

Table 1 summarizes our results in descending order of the number of records addressing each biomarker and dairy species. Among all markers, M-SAA was the most frequently mentioned (n. 16; 48.5%), followed by HP (n. 9; 27.3%), CATH (n. 8; 24.2%), and LF (n. 7; 21.2%). Other markers investigated were IL1β and IL6, addressed in 3 papers each (9.1%), followed by IgG (n. 2; 6.1%) and finally AGP, BSA, CRP, IL8, IL10, IL12, LBP, TGFα, TGFβ, and TNFα (n. 1 each; 3.0%).

Milk serum amyloid A (M-SAA)

M-SAA is produced extrahepatically by healthy mammary epithelial cells (McDonald et al. 2001; Larson et al. 2005) and during inflammation (Grönlund et al. 2003; Larson et al. 2005; Brenaut et al. 2014). M-SAA was the protein most investigated as a subclinical mastitis marker in ruminant milk, particularly in dairy cows (Table 2). In our study, we observed that in 17 papers M-SAA was investigated predominantly by ELISA with the commercial Tridelta solid sandwich ELISA kit in two variants (Tridelta Mast ID range MAA assay, Tridelta Development Ltd., Kildare, Ireland, Cat. No.: TP-802 for serum and TP-807 for milk). However, to diagnose mastitis, the authors did not discriminate between serum and milk amyloid isoforms but between the different matrices, defining the protein as SAA when analyzing serum and M-SAA when analyzing milk. Interestingly, in 5 studies M-SAA was investigated only by TP-802 (Grönlund et al. 2005; Eckersall et al. 2006b; Kováč et al. 2007; Åkerstedt et al. 2007, 2009), in 5 only by TP-807 (Åkerstedt et al. 2011; Shirazi-Beheshtiha et al. 2011; Jaeger et al. 2017; Hussein et al. 2018; Bochniarz et al. 2020; Wollowski et al. 2021), in 2 by both TP-802 and TP-807 (Gerardi et al. 2009; Safi et al. 2009), and in 5 a Tridelta kit was used but the test category was unspecified (Suojala et al. 2008; Pyörälä et al. 2011; Kovačević-Filipović et al. 2012; Szczubiał et al. 2012; Thomas et al. 2015). In particular, Gerardi et al. (2009) investigated M-SAA in milk with both TP-807 and TP-802 assays to compare their diagnostic performances. The sensitivity of the TP-807 test is 0.10 μg/mL, but a cut-off able to discriminate healthy from mastitic milk has not yet been defined. Miglio et al. (2013) reported an M-SAA peak almost 10 times higher in sheep milk than in cow milk. Although no official reference range is fixed for M-SAA in milk, the concentration in healthy sheep milk ranges from 23.75 to 35.61 μg/mL (Miglio et al. 2013), higher than that observed in cow milk (range: 0.0 - 7.5 μg/mL) (Gerardi et al. 2009). In goats, M-SAA was not a suitable mastitis marker: in this species, M-SAA levels increase physiologically as lactation progresses, as does SCC, even in the absence of infection (Pisanu et al. 2020).

Haptoglobin (HP)

HP was the second most represented marker in our literature search. Its performance for mastitis detection was analyzed in 9 records, only for cows and by ELISA (Table 3). HP found in milk has an undefined origin. However, similarly to M-SAA, extrahepatic production may also occur in the mammary tissue. Still, it has been demonstrated that HP concentration increases in milk upon endotoxin challenge and upon experimental and natural intramammary infection (IMI) (Grönlund et al. 2003; Eckersall et al. 2006; Gerardi et al. 2009). Interestingly, HP appears and rises in milk 3 hours, and in blood 9 hours, after inflammation (Hiss et al. 2004), indicating that the production of this biomarker by the mammary gland is rapid and specific. The diagnostic performance reported in cows by various authors is promising (Table 3) and encourages its evaluation in other dairy species as well. Given its characteristics, this biomarker might also be promising for the diagnosis of caprine mastitis, particularly in late lactation, when the SCC is high and other markers fail to provide satisfactory performances (Pisanu et al. 2020).

Cathelicidin (CATH)

CATH was measured mainly by ELISA in goats (n. 3), cows (n. 2), sheep (n. 2), and water buffalo (n. 1). CATH are host defense proteins with antimicrobial and immunomodulatory functions (van Harten et al. 2018) produced by milk PMNs (Kościuczuk et al. 2012) and mammary epithelial cells (Zanetti 2004, 2005; Addis et al. 2013; Cubeddu et al. 2017). The ruminant genome contains numerous CATH proteoform genes, but their differential abundance in mastitic milk is poorly known (Zanetti 2005). CATH showed a high diagnostic performance, especially in cows and sheep, also in late lactation. Interestingly, with a threshold set on healthy negative controls, the dedicated ELISA reaches a good sensitivity not only for cow and sheep milk (Addis et al. 2016a, 2016b), but also for water buffalo milk (Puggioni et al. 2020a). Conversely, the application of CATH-ELISA in goats remains unsatisfactory in late lactation, especially in pluriparous goats. In fact, the related physiological increase in PMN compromises its reliability, as mentioned above for M-SAA (Pisanu et al. 2020).

Lactoferrin (LF)

LF was primarily detected by ELISA in studies involving cows (n. 5), goats (n. 1), and water buffalo (n. 1). LF is a glycoprotein of the immune defense secreted by mammary epithelial cells during the late stage of milking and mammary involution (Welty et al. 1976; Galfi et al. 2016a). The presence of LF in milk is due to secretion by epithelial cells and degranulation of PMNs during inflammation (Lash et al. 1983). Even though LF is not an APP, it increases remarkably during the inflammatory response due to its production by mammary epithelial cells (Galfi et al. 2016a). Concerning test characteristics for goats and cows, two studies carried out a competitive ELISA using a lactoferrin antiserum from rabbit; goat lactoferrin was isolated and purified (Chen and Mao 2004; Chen et al. 2004). In other studies, cow LF was quantified by a commercial sandwich LF ELISA kit (Bethyl Laboratories, Montgomery, TX) (Cheng et al. 2008; Sobczuk-Szul et al. 2014; Galfi et al. 2016a, 2016b). For water buffalo, a specific ELISA kit was produced for the study (Özenç et al. 2019). None of the studies reported test characteristics for LF, and therefore no information on sensitivity or specificity is available for this marker.

Other markers

IL1β and IL6 were studied in both cows and sheep (Table 2), IL8 only in sheep, and the other proteins (AGP, BSA, CRP, IgG, IL10, IL12, LBP, TGFα, TGFβ, TNFα) only in cows. In humans, immune cytokines such as TNFα, IFNγ, and ILs are investigated as inflammatory markers to detect subclinical mastitis and to identify the Th1/Th2 ratio in the inflammatory process (Tuaillon et al. 2017). CRP was studied as a predictor of symptom severity in women's breast inflammation (Fetherston et al. 2006).

In cows, immune cells and their related cytokines, especially pro-inflammatory immune mediators, have been the subject of recent studies (Gulbe et al. 2020; Shaheen et al. 2020). In other dairy ruminants, however, these proteins and their roles in mastitis remain to be studied and understood.

Method overview

Clinical signs, SCC or CMT, and bacteriological culture results were the reference standard methods used, alone or in combination, to define the presence of mastitis or IMI in dairy ruminants (Chakraborty et al. 2019). Among the analytical techniques applied to evaluate protein biomarkers, ELISA was used in 31 of 33 (93.9%) selected records, whereas SPARCL (Spatial Proximity Analyte Reagent Capture Luminescence) and RID (radial immunodiffusion) were each applied in 1 paper.

Limitations of the systematic review

Issues in research methodology

Our research encountered several critical issues in applying the PRISMA standard methodology, especially concerning the search strategy. While selecting the best performing keywords for our review, we assessed several combinations to find those retrieving the most comprehensive yet selective set of publications possible. During the process, we had some unexpected findings; for instance, including the keyword “ruminant” produced a less sensitive search, leading to the decision to remove it. Interestingly, this suggests that the word “ruminant” is uncommonly used in title, abstract, or keywords, probably because authors prefer to report only the name of the dairy species. Furthermore, misleading titles and abstracts led to identifying papers that did not address the research question, and these had to be excluded (as detailed in Methods). On the other hand, we compensated for the possible loss of records due to improper index terms with an additional critical revision of the literature performed on PubMed by an expert author. Furthermore, the references of each retrieved article were screened as a further compensatory measure. Nevertheless, articles that do not contain at least one of the selected search terms in the title, abstract, or keywords are always at risk of exclusion. Therefore, it is very important that authors take particular care when drafting these crucial parts in order to maximize article retrieval.

Bias assessment and applicability of studies

Assessing the quality of primary studies is an essential step in systematic reviews. Therefore, the risk of bias and applicability must be evaluated and scored in all studies, especially those focused on diagnostic accuracy. Hence, we applied QUADAS, a quality assessment tool, to all the selected studies. Concerning the risk of bias (Supplementary Table IX), on animal selection (domain 1) 18/33 (54.5%) studies had a low risk of bias, 15/33 (45.5%) high, and 0/33 (0.0%) unclear risk. Regarding the index test (domain 2), one study out of 33 (3.0%) had a low risk of bias, 29/33 (87.9%) had high risk, and 3/33 (9.1%) unclear risk. For the reference standard (domain 3), we observed a low risk of bias in 22/33 studies (66.7%), high risk in 9/33 (27.3%), and unclear risk in 2/33 (6.1%). Finally, flow and timing (domain 4) showed low risk of bias in 21/33 records (63.6%), high risk in 11/33 (33.3%), and unclear risk in 1/33 (3.0%). Many studies showed low concerns about applicability, especially regarding domain 3 (Supplementary Table X). In detail, in domain 1, low risk was reported in 29/33 (87.9%), high in 4/33 (12.1%), and unclear in 0/33 (0.0%). In domain 2, records had low risk in 25/33 (75.8%), high in 5/33 (15.2%), and unclear in 3/33 (9.1%); whereas in domain 3 we observed low risk in 31/33 (93.9%) papers, high in 2/33 (6.1%), and unclear in 0/33 (0.0%).
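The risk-of-bias percentages above follow directly from the per-domain counts, as the following Python sketch shows. The dictionary layout and variable names are hypothetical; the counts are those reported in the text for the four risk-of-bias domains.

```python
N_STUDIES = 33

# Risk-of-bias counts per domain, as reported in the text.
risk_of_bias = {
    "animal selection": {"low": 18, "high": 15, "unclear": 0},
    "index test": {"low": 1, "high": 29, "unclear": 3},
    "reference standard": {"low": 22, "high": 9, "unclear": 2},
    "flow and timing": {"low": 21, "high": 11, "unclear": 1},
}

percentages = {}
for domain, counts in risk_of_bias.items():
    # Every study must be rated exactly once in each domain.
    assert sum(counts.values()) == N_STUDIES, domain
    percentages[domain] = {
        rating: round(100 * n / N_STUDIES, 1) for rating, n in counts.items()
    }

for domain, shares in percentages.items():
    print(domain, shares)
```

The per-domain check that the three ratings add up to the number of included studies is a quick way to catch transcription errors in such tallies.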

Conclusions and recommendations

Our work aimed at analytically assessing the scientific literature describing the use of non-enzymatic milk proteins as mastitis markers in dairy ruminant species with the PRISMA approach. Moreover, we aimed at summarizing and comparing the diagnostic performances of the immunoassays developed for their detection in the milk. As expected, the most frequently mentioned biomarkers were M-SAA, HP, CATH, and LF, which were investigated both in experimental/observational studies and in discovery/implementation approaches. Nonetheless, we observed several critical issues in study designs, reference standard methods (the lack of a “gold standard”), index tests (frequently performed without a blind approach), the units of measurement used for the same biomarker, and the types of statistical analysis performed, resulting in a heterogeneity of the collected data that was not amenable to meta-analysis. Unfortunately, this is a common finding in many meta-analyses and illustrates how important it is for case definitions and other criteria to be standardized between studies. Nevertheless, being related to the nature of the disease, some of these issues can hardly be solved, also because a truly reliable, sensitive, and specific reference diagnostic test does not exist. To deal with this, we applied an alternative synthesis method newly used in systematic reviews, the “Synthesis Without Meta-analysis” (SWiM), which improves transparency in reporting. The critical issues we observed further highlight the importance of title writing and keyword definition, both in the publishing and searching phases. When authors draft these crucial parts of their manuscripts, using appropriate consensus terminology will maximize retrieval in bibliographic searches, enhancing article visibility and data usability.