Introduction

Milk is consumed by a large share of the world's population. It is a nutritious product that can be consumed as such or after its transformation into various dairy products. It is the key source of nutrition for growing infants and is anchored in many cultures. Understanding its chemical composition and biochemical functions is important for ensuring the highest level of human nutrition and health. Therefore, milk has gained importance in research and has always been extensively analyzed regarding its composition.

Milk is a complex fluid with properties of three physical phases: a dilute emulsion, a colloidal dispersion, and a solution. On average, milk is composed of proteins and peptides (3.2%), lipids (4.6%), carbohydrates, mainly lactose (4.6%), vitamins (0.2%) and mineral substances (0.7%) [1, 2]. The composition of milk determines its nutritive quality and its authenticity, and it varies greatly over the lactation period [2]. Other factors that influence the final composition of milk are animal species, location, type and amount of food intake, seasonal factors, and diseases [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].

The many factors that influence the composition of milk make it a very complex product, and because of this complexity its analysis is not straightforward. To date, many techniques have been implemented for milk analysis. Examples include spectroscopic techniques such as dielectric, Raman, mid-infrared (MIR), near-infrared (NIR), and visible–near-infrared (Vis–NIR) spectroscopy, as well as capillary electrophoresis with UV detection. However, these techniques have some drawbacks. For example, the infrared absorption of milk components can be affected by interferences caused by light scattering from the milk fat globules [14]. In dielectric spectroscopy, the prediction of the main components of milk is influenced by many physical and chemical factors, such as structure, ions, water, fat, and protein [15]. Capillary electrophoresis is a powerful separation technique, but only a few methods have been reported recently for extensive elucidation of the composition of dairy products [16]. This may be due to the sample complexity and because only a few applications use direct UV absorption detection after the sample run [17].

Another widely used approach is chromatography, which is applied to separate milk components such as fat, proteins, lactose, minerals, and vitamins. Because of its outstanding flexibility, sensitivity, and specificity, liquid chromatography (LC) has become an indispensable tool in milk analysis. Its use in combination with mass spectrometry (MS) has increased in food analysis since the emergence of soft ionization techniques, such as electrospray ionization (ESI), which made it possible to couple LC with MS. For this reason, LC–MS has become increasingly significant for milk research and the study of milk authenticity. According to the Web of Science database, the number of studies in which milk was analyzed with LC–MS increased from around 42 to 1934 between 2000 and 2021.

The use of MS to detect the components of milk has the advantage of being free from interferences caused by light absorption or light scattering. Because MS can be coupled with LC, the components of milk are efficiently separated prior to detection, making LC–MS one of the most promising techniques in milk analysis.

Given the complexity of its matrix, milk is usually analyzed via LC–MS after extraction of the constituents of interest. Consequently, most reviews dealing with this topic report recent trends and advances in the field regarding only one major aspect or constituent. For example, Liu et al. (2018) and George, Gay, Trengove, and Geddes (2018) published comprehensive reviews on milk lipidomics [18, 19]. Similarly, Contarini and Povolo (2013) reported on the phospholipids in milk fat and the analytical strategies used to study them [20]. Other reviews focus on milk proteomics [21], milk glycomics [22], milk metabolomics [23], and the analysis of vitamins in milk, such as vitamin D [24] and vitamin E [25], using LC–MS. Still, the use of LC–MS techniques for the assessment of milk as a whole has not been sufficiently covered.

Therefore, in contrast to previous works, this review reports the application of LC–MS to the analysis of milk as a whole. The review is divided into four main parts according to the main constituents of milk and the studies in which they are analyzed. The studies on bovine milk, human milk and milk from other mammals are summarized in Tables 1, 2, 3 and 4. Most publications analyze one of the main milk components, such as proteins, lipids, or sugars, using proteomic, lipidomic, or glycomic techniques, followed by a smaller number of publications analyzing vitamins and trace elements. After a short review of the main systems applied to the LC–MS analysis of milk, each major milk constituent is discussed.

LC–MS instrumental systems for milk analysis

Separation and detection techniques in liquid chromatography and mass spectrometry have developed greatly over the last decades. One of the most important separation techniques is high-performance liquid chromatography (HPLC), as it enables high-throughput separation of complex matrices. Most notably, normal-phase (NP), reversed-phase (RP), and hydrophilic interaction (HILIC) liquid chromatography are increasingly described in food science. For milk, different phases are used depending on the constituent analyzed. The most common phases and HPLC columns are described in the next sections and are summarized in Tables 1, 2, 3 and 4.

HPLC has become increasingly valuable in conjunction with mass spectrometry. This is possible thanks to soft ionization techniques, such as electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI) [26]. As the cited literature shows, ESI and APCI are the most common soft ionization techniques used for LC–MS analysis of milk. Ionization of the compounds eluting from the column is necessary to transfer nonvolatile analyte molecules or ions from the liquid phase of the LC instrumentation into the gas phase of the mass spectrometer as molecular ions. The mass analyzers can then separate and detect them as individual analyte species based on their mass-to-charge ratio (m/z).
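As a minimal illustration of m/z, the Python sketch below converts a neutral monoisotopic mass into the m/z of protonated ions [M + zH]z+ and back. It assumes that all charges come from protons (no sodium or ammonium adducts). ESI, particularly of peptides and proteins, often produces multiply charged ions, so a single neutral mass appears at several m/z values. The 20 000 u protein mass is a hypothetical value chosen for illustration.

```python
# Mass of a proton (u), used for protonated ions [M + zH]z+.
PROTON_MASS = 1.007276


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the protonated ion [M + zH]z+ for a neutral monoisotopic mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Recover the neutral mass M from an observed [M + zH]z+ m/z value."""
    return mz * charge - charge * PROTON_MASS


if __name__ == "__main__":
    # Hypothetical 20 000 u protein observed at several ESI charge states.
    mass = 20000.0
    for z in (10, 15, 20):
        print(f"z = {z:2d}: m/z = {mz_from_mass(mass, z):.4f}")
```

The higher the charge state, the lower the m/z, which is why large biomolecules can be detected by analyzers with a limited m/z range.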

Mass analyzers can be divided into two main groups. The first group is based on ion beam transport and includes magnetic and magnetic/electric hybrid sector instruments, time-of-flight (TOF) analyzers, and quadrupole mass filters. The second group is based on ion trapping and includes the quadrupole ion trap (QIT), the Fourier-transform ion cyclotron resonance (FT-ICR, or FT-MS) analyzer, and the orbital trap analyzer (Orbitrap). In addition, there are hybrid instruments, such as the quadrupole time-of-flight (Q-TOF) analyzer.

More recently, the range of mass analyzers has been expanded by high-resolution analyzers (Orbitrap and FT-ICR). High-resolution MS (HRMS) determines analytes with high mass accuracy, measuring m/z values to several decimal places. This means that exact masses, rather than nominal masses, can be derived [27]. In addition, coupling several (usually two) mass analyzers in series, known as tandem mass spectrometry (MS/MS), allows accurate determination of the fragmentation patterns of biomolecules and enables their identification. HRMS and MS/MS represent the major improvements in biomolecular mass spectrometry. HRMS produces large amounts of data, especially in untargeted studies, and their evaluation requires post-processing software packages together with databases and chemometric tools. These modern tools are exploited in the omics fields. With the application of genomics, lipidomics, proteomics, and metabolomics, milk can be comprehensively analyzed, allowing a greater understanding of its composition at all levels [28].
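The difference between nominal and exact masses can be illustrated with a short Python sketch. It sums standard monoisotopic isotope masses for a simple formula string (no brackets, charges, or adducts, so it returns neutral masses) and computes the mass error in parts per million (ppm). Tripalmitin (C51H98O6) is used as an example; C50H94O7 is a hypothetical formula chosen only because it shares the nominal mass of 806, and the "measured" value is invented.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
# Nominal (integer) masses of the same isotopes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "P": 31, "S": 32}


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C51H98O6' (no brackets or charges)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def exact_mass(formula: str) -> float:
    """Neutral monoisotopic (exact) mass."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def nominal_mass(formula: str) -> int:
    """Neutral nominal (integer) mass."""
    return sum(NOMINAL[el] * n for el, n in parse_formula(formula).items())


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    for formula in ("C51H98O6", "C50H94O7"):
        print(formula, nominal_mass(formula), f"{exact_mass(formula):.4f}")
    # Hypothetical measured value for tripalmitin.
    print(f"{ppm_error(806.7371, exact_mass('C51H98O6')):.2f} ppm")
```

Both formulas have the same nominal mass, but their exact masses differ by about 0.036 u (roughly 45 ppm), a difference that HRMS can distinguish but a nominal-mass measurement cannot.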

In the next sections, the LC–MS applications and omics techniques used in milk research are reported in detail for each group of milk constituents, namely lipids, proteins, oligosaccharides, and vitamins. Each section gives examples from the research literature, with a focus on the authenticity of milk products.

Lipids (milk lipidomics)

Lipids can significantly affect the quality of milk and of the products derived from it [29]. The lipid fraction of milk contains important structural and functional components. It is predominantly composed of triacylglycerols (TAGs), followed by diacylglycerols, monoacylglycerols, sterols (mainly cholesterol), phospholipids, and fat-soluble vitamins [30]. The lipids are emulsified in the aqueous phase as globules, which contain the nonpolar lipids such as TAGs, cholesteryl esters, and retinol esters [31, 32]. The milk fat globule membrane (MFGM), a complex membrane made of unsaturated phospholipids, proteins, glycoproteins, cholesterol, enzymes, and other minor components, surrounds the lipid globules [33]. To maximize the value of the lipids present in raw milk, it is important to understand their composition and distribution pattern [34]. The discipline dealing with the study of the lipid composition of a given matrix is called lipidomics.

Since its emergence in 2003 [35], lipidomics has become one of the most promising research fields. With current lipidomic methods, the milk fat lipidome can be analyzed both qualitatively and quantitatively. Knowledge of the milk lipidome allows a greater understanding of the nutritive and organoleptic features of milk [8]. In the last 15 years, milk lipidomics has gained increasing importance, and a large share of studies investigate the lipid composition of milk by employing LC–MS and lipidomic techniques (Table 1).

Extraction of milk fat for lipidomic studies

LC–MS-based lipidomic analyses typically start with the extraction of the lipids from the milk sample. Extraction steps such as liquid–liquid extraction and/or solid-phase extraction are pivotal to purify and concentrate the lipids of interest. The most common extraction techniques are the methods of Folch [36] and Bligh–Dyer [37]. These methods involve liquid–liquid extraction using chloroform, methanol, and water in varying ratios. In some studies, chloroform is replaced with less toxic alternatives such as dichloromethane or methyl-tert-butyl ether (MTBE) [38]. The chloroform/methanol/water extraction may also be followed by a further extraction with dichloromethane or MTBE [39]. In the reviewed articles, milk fat is likewise extracted with these solvents (Table 1).
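For orientation, the solvent arithmetic of a Folch-type extraction can be sketched in Python. The sketch assumes the classical proportions (chloroform:methanol 2:1, v/v, about 20 volumes per sample volume, followed by a wash with 0.2 volumes of water or salt solution), which give a final chloroform:methanol:water ratio of about 8:4:3. Treating the whole milk sample as water is a simplifying assumption, and the defaults are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class FolchVolumes:
    """Volumes (mL) for one Folch-type extraction."""
    chloroform_ml: float
    methanol_ml: float
    wash_water_ml: float


def folch_volumes(sample_ml: float, solvent_factor: float = 20.0,
                  wash_fraction: float = 0.2) -> FolchVolumes:
    """Chloroform:methanol 2:1 (v/v) at ~20 volumes per sample volume,
    followed by an aqueous wash of 0.2 volumes of the extract."""
    solvent_ml = sample_ml * solvent_factor
    return FolchVolumes(
        chloroform_ml=solvent_ml * 2 / 3,
        methanol_ml=solvent_ml / 3,
        wash_water_ml=solvent_ml * wash_fraction,
    )


def final_ratio(v: FolchVolumes, sample_ml: float) -> tuple[float, float, float]:
    """Chloroform:methanol:water after the wash, scaled so chloroform = 8.
    Simplifying assumption: the whole milk sample is counted as water."""
    water_ml = v.wash_water_ml + sample_ml
    scale = 8 / v.chloroform_ml
    return (8.0, v.methanol_ml * scale, water_ml * scale)


if __name__ == "__main__":
    vols = folch_volumes(sample_ml=1.0)
    print(vols)
    print(final_ratio(vols, sample_ml=1.0))
```

The biphasic 8:4:3 system places the lipids in the lower chloroform-rich phase, while polar contaminants partition into the upper methanol/water phase.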

Table 1 Summary of studies performing lipid analysis of milk using LC–MS

LC–MS instrumentation for milk lipidomic analysis

Because of the complexity of lipids, complete lipidomic analysis requires more than one instrumental platform. The choice of instrumentation depends on the lipid class to be studied and the desired outcomes. For the separation of lipids in extracted milk samples, the most used LC techniques so far include reversed-phase LC (RP-LC), normal-phase LC (NP-LC), hydrophilic interaction liquid chromatography (HILIC), supercritical fluid chromatography, and ultra-performance convergence chromatography (UPC2) (Table 1).

APCI and ESI have emerged as the most widely used ionization techniques for lipid detection. For lipid detection, the studies mostly describe ion trap (IT), triple quadrupole (QqQ), linear trap quadrupole-Orbitrap (LTQ-Orbitrap), quadrupole hybrid Orbitrap (Q-Exactive Orbitrap), Q-TOF and Q-Trap analyzers (Table 1).

MS makes it possible to determine the mass of the intact lipid from its molecular ion and to elucidate its structure after fragmentation, for example the fatty acid composition of a specific triglyceride [40]. Therefore, LC–MS can determine the lipid classes of milk fat, the fatty acid composition of each lipid species, and in some cases the regiospecific position of the fatty acids. Furthermore, it can provide quantitative information on lipid classes and individual species.
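As a rough sketch of how intact-mass and fragment information relate for TAGs, the Python code below builds the neutral monoisotopic mass of a TAG from its three fatty acids (glycerol + 3 FA − 3 H2O), then predicts the ammoniated precursor [M + NH4]+ and the diacylglycerol-like fragments [M + H − RCOOH]+ formed by loss of one fatty acid. Fatty acids are described only by carbon number and double bonds (CnH(2n−2db)O2), so isomers are not distinguished, and ion intensities, which are needed for regiospecific assignment, are not modeled.

```python
# Monoisotopic element masses and electron mass (u).
C, H, N, O = 12.0, 1.00782503207, 14.0030740048, 15.99491461956
ELECTRON = 0.00054858
PROTON = H - ELECTRON
AMMONIUM = N + 4 * H - ELECTRON  # NH4+ ion

GLYCEROL = 3 * C + 8 * H + 3 * O  # C3H8O3
WATER = 2 * H + O


def fa_mass(cn: int, db: int) -> float:
    """Neutral monoisotopic mass of a free fatty acid C(cn)H(2cn - 2db)O2."""
    return cn * C + (2 * cn - 2 * db) * H + 2 * O


def tag_mass(fatty_acids: list[tuple[int, int]]) -> float:
    """Glycerol esterified with three fatty acids; three waters are lost."""
    if len(fatty_acids) != 3:
        raise ValueError("a TAG has exactly three acyl chains")
    return GLYCEROL + sum(fa_mass(cn, db) for cn, db in fatty_acids) - 3 * WATER


def expected_ions(fatty_acids: list[tuple[int, int]]) -> dict[str, float]:
    """Precursor [M+NH4]+ and one [M+H-RCOOH]+ fragment per distinct fatty acid."""
    m = tag_mass(fatty_acids)
    ions = {"[M+NH4]+": m + AMMONIUM}
    for cn, db in sorted(set(fatty_acids)):
        ions[f"[M+H-FA {cn}:{db}]+"] = m + PROTON - fa_mass(cn, db)
    return ions


if __name__ == "__main__":
    # POP: palmitic (16:0), oleic (18:1), palmitic (16:0); positions not evaluated.
    for label, mz in expected_ions([(16, 0), (18, 1), (16, 0)]).items():
        print(f"{label:18s} m/z {mz:.4f}")
```

For tripalmitin, this gives a precursor near m/z 824.77 and a fragment near m/z 551.50 from loss of palmitic acid.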

Analysis of single lipid classes

Fatty acids (FA)

Fatty acids (FA) constitute one of the main building blocks of most lipid classes in milk, and their high variability makes milk fat highly complex. For example, of the more than 370 different FA listed for bovine milk, only 14 occur at concentrations above 1% [30, 32].

The fatty acids in milk fat are mainly saturated (69–70%), followed by monounsaturated FA, most prevalently oleic acid, and only minor amounts of polyunsaturated FA, mainly linoleic and linolenic acids [41]. Only a small percentage of the fatty acids in milk fat are present as unesterified (free) fatty acids. Most occur as fatty acid residues within other lipid classes, e.g., within TAGs or phospholipids. Therefore, most studies focus on the fatty acid composition and distribution in more complex lipid species.

Fatty acid profiling is usually achieved through transesterification with subsequent GC-FID or GC–MS analysis [42]. Recently, however, LC–MS approaches have become more popular, e.g., for multi-class lipid profiling that includes free fatty acids [43].

In the past years, many studies have characterized the global milk fat lipidome. However, most of the publications reported here focused on the analysis of single lipid classes, mainly triacylglycerols and polar lipids such as phospholipids and sphingolipids.

Triacylglycerols (TAGs)

Triacylglycerols (TAGs), or triglycerides, are the most common form of natural lipids in both plants and animals and are the most abundant lipid class in bovine milk [2]. They are composed of a glycerol backbone esterified with three FA molecules. Because a high number of different FA can be combined, TAG mixtures typically have a very complex molecular composition with a high number of possible TAG species. Studies characterizing TAGs can be divided into three levels: group level, FA composition level, and FA position level. These levels provide information on TAG composition, FA make-up, and the regiospecific distribution of FA in TAG molecules, respectively.
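The combinatorial growth described above can be made concrete with a short Python calculation. Assuming, purely for illustration, that only the 14 fatty acids present above 1% in bovine milk are combined, it counts the theoretical number of TAG species at the FA composition level (positions ignored), with the sn-2 position distinguished but sn-1 and sn-3 treated as equivalent, and with all three positions distinguished. These are upper bounds, not numbers of species actually observed.

```python
from math import comb


def n_tag_species(n_fa: int) -> dict[str, int]:
    """Theoretical TAG counts for n_fa fatty acids (repetition allowed)."""
    return {
        # multisets of 3 FA: positions ignored
        "FA composition only": comb(n_fa + 2, 3),
        # unordered pair at sn-1/sn-3 times any FA at sn-2
        "sn-2 distinguished": n_fa * comb(n_fa + 1, 2),
        # ordered triples: sn-1, sn-2 and sn-3 all distinguished
        "fully stereospecific": n_fa ** 3,
    }


if __name__ == "__main__":
    for level, count in n_tag_species(14).items():
        print(f"{level:22s} {count}")
```

For 14 fatty acids this gives 560, 1470, and 2744 possible species, respectively; including the many minor FA increases these numbers by orders of magnitude.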

Before separation and detection, TAGs are extracted into the nonpolar fraction of milk fat. They are usually separated and characterized according to the total number of carbon atoms in their acyl residues (CN). Among the LC techniques applied to TAG separation, RP-LC, usually with C18 columns, is the most widely used (Table 1). LC separates TAG groups that differ in their equivalent carbon number (ECN), which is obtained by subtracting twice the number of double bonds (DB) from the CN: ECN = CN – 2 × DB. TAG species with the same ECN cannot be resolved by LC separation [44]. The mobile phases usually contain an ammonium salt so that TAGs are detected as ammoniated adducts. Subsequent identification is mainly performed in positive ionization mode using APCI-MS.
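The ECN rule can be expressed as a short Python sketch that groups TAGs, written in CN:DB shorthand, by ECN = CN – 2 × DB. Under the simplification stated above, TAGs in the same ECN group would co-elute in RP-LC. The TAG list is a hypothetical example.

```python
from collections import defaultdict


def ecn(cn: int, db: int) -> int:
    """Equivalent carbon number: total acyl carbons minus twice the double bonds."""
    return cn - 2 * db


def group_by_ecn(tags: list[str]) -> dict[int, list[str]]:
    """Group TAG shorthand names ('CN:DB') into ECN classes, lowest ECN first."""
    groups: dict[int, list[str]] = defaultdict(list)
    for tag in tags:
        cn, db = (int(part) for part in tag.split(":"))
        groups[ecn(cn, db)].append(tag)
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    example = ["48:0", "50:1", "52:2", "54:3", "54:2", "52:3"]
    print(group_by_ecn(example))
```

Here 48:0, 50:1, 52:2 and 54:3 all fall into ECN 48, while 52:3 and 54:2 form separate groups (ECN 46 and 50), which is why MS detection is needed to tell the members of one ECN group apart.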

In 2021, the use of HPLC-APCI-MS to identify intact TAGs in whole milk was reported for the first time [45]. The authors investigated the composition of bovine milk fat using a combination of prefractionation techniques, namely silica thin-layer and gel permeation chromatography, followed by molecular species analysis by HPLC–APCI-MS and high-temperature GC–MS. A total of 120 TAGs were identified, giving a first description of the bovine TAG lipidome. Other authors have also described the bovine TAG profile. In another study, a new approach to lipidomics was introduced by implementing an ultra-performance convergence chromatography (UPC2) system coupled with Q-TOF–MS [46]. A total of 49 triacylglycerols and 7 diacylglycerols were identified with this approach. A comprehensive investigation was conducted on TAG molecular species in milk containing at least one n-3 long-chain polyunsaturated fatty acid (LC-PUFA), namely eicosapentaenoic acid, docosahexaenoic acid, or docosapentaenoic acid, using HPLC-linear trap quadrupole-Orbitrap and HPLC-triple quadrupole MS techniques [47]. In total, 51 TAG species containing n-3 LC-PUFA were identified in bovine milk. In the most comprehensive characterization of bovine milk TAGs to date, 3454 species were detected [48]. Most of these species are likely of low abundance within the whole TAG lipidome, but together they represent a valuable reference for studying the composition of bovine milk.

Other authors reported the composition of milk fat from other mammalian species. TAGs from different milk fat sources (human, bovine and goat) were compared, and the FA components and their distribution in TAGs were identified using non-aqueous RP-HPLC-APCI-MS/MS [8]. TAGs were identified by HPLC–MS, while their FA composition was determined by GC–MS after transesterification. More than 160 different triglycerides were identified, and differences among the milk fat sources were determined. In a similar study, ultra-high-performance liquid chromatography (UHPLC) was implemented to characterize TAGs in milk samples of different origin and in milk derivatives [44]. Using a non-aqueous RP-LC/APCI-IT-TOF–MS method, 243 different TAGs were identified from their protonated molecules and fragment ions; these TAGs contained up to 22 different fatty acids. With their newly developed method, the authors reported the TAG composition of mozzarella cheese from buffalo milk for the first time. The camel milk lipidome and its TAG composition were also characterized [49]. After a prefractionation step, the authors used HPLC–ESI–MS to identify 135 TAG molecular species, providing the possibility of detecting milk adulteration through triglyceride profiling. Finally, the TAG composition of human milk fat was studied to establish a model for precisely evaluating human milk fat [50]. The authors analyzed the TAG composition at different lactation stages by RP-HPLC-APCI-MS. The model was built by profiling the human milk lipidome with four indices: fatty acid composition, fatty acid distribution, polyunsaturated fatty acids, and TAG composition. These indices could be used to aid the formulation of human milk substitutes for each lactation stage.

TAGs represent the most abundant lipid class in milk, and their characterization is important for understanding the composition of each milk matrix. As this section shows, in recent years the TAG composition has been elucidated in milk from different mammalian species and at different lactation stages. After TAGs, polar lipids are the most studied lipid classes.

Polar lipids (PLs)

Polar lipids (PLs) are an increasingly studied group of milk lipid classes because of their functional properties within the milk lipidome. They include phospholipids and sphingolipids, which are major components of the milk fat globule membrane (MFGM). Their combined lipophilic and hydrophilic properties contribute considerably to the emulsifying role of the membrane [51]. Glycerophospholipids (glycerol-based phospholipids) and sphingolipids are quantitatively the most important PLs in milk. The glycerophospholipids include phosphatidylcholine, phosphatidylethanolamine, phosphatidylinositol and phosphatidylserine, while sphingomyelin is the dominant sphingolipid [20]. Sphingolipids with an attached carbohydrate are called glycosphingolipids; the gangliosides are a broadly studied example.

PL analysis typically involves several purification and separation steps. Because of their low concentration, PLs need to be concentrated after liquid–liquid extraction of milk [20]. A second extraction step, usually solid-phase extraction, is often applied to purify PLs and separate them from the other fat constituents [52]. The different PL classes are then generally separated by NP-LC and, more recently, by HILIC (Table 1), with retention depending on the polar head group. ESI predominates as the ionization technique for PL analysis via MS, and the mass analyzers used to detect PLs are the same as those used for general milk lipid analysis. With LC–MS, not only can the different classes of phospholipids and sphingomyelin be separated, but the different species within each class can also be identified [51].

Donato, Cacciola, Cichello, Russo, Dugo, and Mondello [51] used HILIC in combination with IT-TOF MS to separate and detect the main phospholipid classes in milk from cow and donkey. In combination with evaporative light scattering detection (ELSD), the unique phospholipid profile and fatty acid make-up of cow and donkey milk were characterized. This was the first such report for donkey milk, and a similar approach was followed in a subsequent study [53], in which the cholesterol and phospholipid contents of donkey milk were characterized by HPLC-ELSD and HPLC–MS/MS and the fat profile of donkey milk was compared with those of other species. The lipid fraction of donkey milk appeared to be more concentrated in phospholipids than ruminant milk fat. Phosphatidylethanolamine was the most concentrated species, followed by phosphatidylcholine and sphingomyelin. Considerable quantities of glucosylceramide and lactosylceramide were identified in donkey milk for the first time. Cholesterol levels were higher in donkey milk than in the milk of other mammals, except for human milk, in which cholesterol showed a high correlation with the total phospholipid content. Similarly, milk from cow, goat, human, and donkey was analyzed via HILIC-ELSD, with final characterization of the phospholipid profile and fatty acid composition using IT-TOF–MS and GC-FID/GC–MS, respectively [54].

While some studies focused on differences in milk composition between species, others assessed changes in fat composition within one species. For example, the effects of diet on the PL (phospholipid and sphingomyelin) composition of bovine milk were investigated [6]. Different feeding systems reflecting different dairy farming practices were assessed, and perceptible changes in the relative amounts of all major polar lipid species were observed with the different diets. In detail, statistical analysis revealed significant differences in the fatty acid compositions of phosphatidylethanolamine and phosphatidylcholine species in milk from cows fed the different diets. Furthermore, some of the lipid species were reported for the first time in bovine milk.

A single integrated LC–MS method based on HILIC coupled to an LTQ-Orbitrap mass spectrometer was developed to identify and quantify phospholipids in bovine milk [55]. The method made it possible to quantify all essential PL species, a total of 70 PLs, within a single 45-min run with minimal sample pretreatment. Changes in milk fat composition over time have also been studied in human milk [56], where the phospholipid concentration of breast milk was followed over 12 months. The concentrations of each phospholipid class in colostrum, transitional milk and mature milk were measured using HPLC–MS/MS, and the authentic lipid profile of each lactation stage was determined.

Finally, several studies validated LC–MS methods and analyzed the major ganglioside species, such as GD3 (disialodihexosylganglioside) and GM3 (monosialodihexosylganglioside), in milk and dairy products from different sources, such as bovine and human milk. These studies included the assessment of the origin of the milk ingredients used [57, 58] and of temporal changes in human milk gangliosides during lactation [59, 60].

In summary, polar lipids in milk have been increasingly assessed with LC–MS techniques in recent years. Their assessment is important for studying their functional role in human health and nutrition. More recently, however, the trend has shifted towards the assessment of the global lipidome, which has become increasingly feasible with high-throughput lipidomic LC–MS techniques.

Global lipidomics and other fat constituents

There has been considerable interest in the study of single lipid classes, predominantly TAGs and PLs, in the past years. Nevertheless, many studies also focused on the characterization of the global milk fat lipidome, analyzing different lipid classes at once.

In a multi-class study, the human milk lipidome during lactation was characterized [61]. A single-phase extraction protocol with an MTBE solvent system was applied to a representative pool of human milk, and its polar and lipidic metabolic composition was characterized by LC-QTOF-MS. The main lipid classes, including saturated and unsaturated fatty acids, were identified.

Changes in fatty acid (FA), phospholipid (PL), and ganglioside (GD) composition in human milk during lactation were studied with a similar approach [62]. FA were determined by GC after direct methylation, whereas PL and GD classes were quantified by LC coupled with ELSD and with TOF–MS, respectively. Changes over the lactation period and across sampling locations were determined in a large cohort of Chinese mothers from Beijing, Guangzhou, and Suzhou. The content of saturated and monounsaturated FA and of PL decreased during lactation, while the content of polyunsaturated FA and GD increased. Differences in the content of FA, PL and GD between the cities were also determined.

In a similar study, the difference in lipid composition between term and preterm human milk was analyzed [63], and the implications of preterm human milk lipids for neonatal development were predicted. Similarly, the lipid composition of human milk at different lactation stages was compared with that of infant formula [64]. This comparative LC–MS-based lipidomic study was the first to comprehensively report the differences between breast milk and infant formula lipidomes at each lactation stage.

Finally, regarding human milk, several authors analyzed glucocorticoids in the lipid fraction of human milk via LC–MS/MS and developed and validated an isotope-dilution LC–MS/MS method to measure global cortisol and cortisone in human milk without enzymatic deconjugation [65, 66]. These studies are important examples of the characterization of glucocorticoid levels in human milk.

For milk from other species, several studies have assessed the lipid content using LC–MS-based lipidomics. Examples include the determination of the lipid content of bovine and goat milk and the identification of biomarkers for authentication and against adulteration [67], and the identification of lysophosphatidylcholine as a heat stress biomarker for dairy cattle in a bovine milk authentication study that followed changes in the lipid profile during a heat-stress challenge [68].

In conclusion, notable progress has been made in the past 20 years in identifying and quantifying the major lipid classes and species and in characterizing the global milk lipidome. However, further progress in milk lipidomics is still needed. Difficulties remain because of the lack of separation techniques powerful enough to resolve isomeric species and the difficulty of absolutely quantifying lipids at the species level, owing to the high number of lipid species in milk and the lack of standards [69]. Future studies should focus on systematic and extensive milk lipidomic analyses to understand the biological roles of milk lipids at the species level.

Proteins (milk proteomics)

Milk proteins are found in soluble form, in micellar form, or bound to the milk fat globule membrane (MFGM) [70]. Three protein fractions can be distinguished in milk: caseins, whey proteins, and a minor fraction of low-molecular-weight peptides (proteose peptones) and MFGM proteins [71]. The caseins form the largest fraction (around 79.5%), followed by the whey proteins (around 19.3%) and the minor fractions, comprising MFGM proteins, enzymes, and proteins arising from blood (around 1.9%) [1]. Each fraction can be analyzed with LC–MS and proteomics.

Two main approaches are used for proteomic analyses: top-down and bottom-up [72]. In a top-down approach, intact proteins are isolated and fragmented in the mass spectrometer. In a bottom-up approach, proteins are proteolytically digested, and the resulting peptides are then separated and fragmented. In conjunction with LC–MS, the bottom-up approach has gained popularity over the past years (Table 2). Depending on the proteomic approach selected, different extraction and isolation strategies are available.

Table 2 Studies performing protein analysis of milk using LC–MS

Extraction

Extraction of the protein components of milk usually involves fractionation of the milk constituents by centrifugation and/or liquid–liquid extraction (Table 2). The phase containing the desired protein fraction is then recovered and further processed depending on the proteomic approach.

In top-down proteomics, intact proteins are subjected to mass spectrometric analysis without further cleavage into peptides. In bottom-up proteomics, protein extraction is followed by cleavage with a specific enzyme and isolation of the resulting peptides. Peptides are commonly generated by enzymatic digestion either in-gel after SDS-PAGE (sodium dodecyl sulphate-polyacrylamide gel electrophoresis) or in-solution. The enzyme most commonly used for this purpose is trypsin, which cleaves proteins on the C-terminal side of arginine and lysine residues. The resulting C-terminal lysine or arginine residue yields peptide ions with high efficiency under acidic conditions and promotes effective fragmentation in MS/MS [73].
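A minimal in-silico tryptic digestion in Python illustrates the cleavage rule. It cuts after every K or R; as an added assumption (the widely used "Keil" convention in database searching), it does not cut when the next residue is proline. Missed cleavages can optionally be allowed. The input is a hypothetical toy sequence, not a milk protein.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return tryptic peptides, allowing up to `missed_cleavages` skipped sites."""
    seq = sequence.upper()
    # Sites lie just after K/R unless the next residue is P; the last residue is never a site.
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Join up to missed_cleavages + 1 consecutive fragments.
        for end in range(start + 1, min(start + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKPLRGKDEK"))
    print(tryptic_digest("AKPLRGKDEK", missed_cleavages=1))
```

With no missed cleavages, the toy sequence yields AKPLR, GK and DEK: the K in "AKP" is skipped because it precedes proline, and every peptide ends in K or R except where the protein itself ends.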

LC–MS instrumentation for milk proteomic analysis

After the proteins have been extracted from milk and, where applicable, digested into peptides, they are analyzed by LC coupled to MS via soft ionization interfaces. Advances in soft ionization techniques have refined the possibilities for characterizing intact peptides and proteins [74] and have particularly boosted the use of bottom-up proteomics [72]. The combination of LC for peptide separation and tandem MS for the identification of individual proteins in complex mixtures has become the most widely used bottom-up approach, including in milk proteomics.

In combination with online LC separation, ESI is the most widely used form of ionization in bottom-up proteomics. Most chromatographic methods use RP-LC, in which peptide mixtures are typically separated according to their hydrophobicity (Table 2). The peptides eluting from the column are directly ionized by ESI before entering the mass spectrometer.

Another ionization technique that has recently become popular, and is pivotal to present-day LC–MS-based proteomics, is nano-spray ionization. Conceptually, nano-spray forms ions in the same way as ESI, but it uses substantially lower flow rates and smaller needle diameters [75]. The lower flow rates allow longer analysis times, which favors the efficacy of tandem MS measurements. Furthermore, droplet formation is strongly favored, which increases the ionization efficiency [75].

While the identification of proteins and peptides by LC–MS involves the usual sample pretreatment protocols, their quantification requires further sample treatment steps and evaluation techniques. Two broad strategies are available for quantifying proteins and peptides: label-based and label-free methods [76]. The most widely used labeling techniques include metabolic, proteolytic, and chemical labeling strategies [77]. In milk proteomics, labeling usually involves chemical labeling, in which chemical reactions incorporate labels into the analytes for their relative and/or absolute quantification. A few examples are described in the publications reviewed.

Alternatively, label-free methods are widely applied in milk proteomics. They offer high sensitivity in MS analysis, and their sample preparation is less time-consuming and more cost-effective than that of label-based methods. Therefore, label-free methods are preferred in milk proteomic research. In label-free methods, the MS signal is correlated with the abundance of a protein or peptide in a sample [76]. Quantification is performed by measuring peak areas and/or by counting the MS/MS spectra acquired for each peptide [78, 79].
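The spectral-counting route can be sketched with the normalized spectral abundance factor (NSAF), one common way to turn MS/MS spectral counts into relative abundances: each protein's spectral count is divided by its length, because longer proteins yield more peptides, and the result is normalized to the sum over all proteins. NSAF is named here only as a representative metric; the reviewed studies may use other label-free measures, and the protein identifiers, counts and lengths below are hypothetical.

```python
def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalized spectral abundance factor: (SpC / L) / sum over proteins of (SpC / L)."""
    # Length-corrected spectral count (SAF) for each protein.
    saf = {protein: spectral_counts[protein] / lengths[protein] for protein in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra assigned to any protein")
    # Normalize so that all values sum to 1 within the sample.
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    counts = {"P1": 10, "P2": 20, "P3": 5}      # hypothetical spectral counts
    lengths = {"P1": 100, "P2": 400, "P3": 50}  # hypothetical lengths (residues)
    print(nsaf(counts, lengths))
```

In this toy example P2 has the most spectra but, once corrected for its length, it is the least abundant of the three proteins.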

Most milk proteomic studies reviewed here analyze one of the two main milk protein fractions, namely the caseins or the whey proteins. Great interest has also arisen in the investigation of the MFGM proteins and of glycosylated proteins and their functions in milk. The analysis of the global milk proteome has gained importance for comparing proteomes, for example of different animal species or lactation stages, and for assessing their authenticity. In this review, the studies analyzing the milk proteome are grouped according to these protein fractions.

Caseins

Caseins are characterized by their insolubility at pH 4.6 and 20 °C [80]. Based on the homology of their primary structures (amino acid sequences), caseins are grouped into the following families: αs1-, αs2-, β-, and κ-casein [81]. The identification and quantification of the different caseins make it possible to characterize the milk proteome and its authenticity, as described in the following studies.

A proteomic description of certain minor proteins of the human milk casein fraction by LC–MS/MS resulted in the identification of 82 proteins in the casein micelle [82]. A label-free quantitative analysis allowed the absolute quantities of total protein and caseins to be estimated simultaneously [83]. Human β-casein was assessed longitudinally using UHPLC coupled to an Orbitrap Fusion mass spectrometer. The absolute concentrations of α-casein, β-casein, and κ-casein were obtained in a large set of human milk samples. Also in human milk, the post-translational modifications of human β-casein over the lactation period were studied [84]. New O-glycosylations of β-casein were discovered, contributing to the profiling of the human milk proteome.

The casein fraction of milk from other mammalian species has also been studied [85, 86]. Cheeses produced from cow, sheep and goat milk were analyzed by mass spectrometry to verify their authenticity [85]. The species origin of the caseins in the cheeses was determined by a proteomic approach using LC-ESI-Q-TOF–MS. The genetic variants of β-casein were studied in milk from Western cattle breeds [86]. Bovine β-casein variants can be grouped into two main macro-families, namely A1-like and A2-like genetic variants. So-called A2 milk, containing the A2 variant of β-casein, is increasingly available on the market [86]. An LC–MS method was developed to distinguish between the different genetic variants. It provides a starting point for analyzing milk labeled as A2, in order to assess the authenticity of this type of milk and to detect possible contamination and fraud.

In summary, the studies presented here exemplify how casein characterization is used to assess genetic differences across species and breeds. After the casein fraction, the whey fraction is the most abundant protein fraction in milk. It is increasingly analyzed because of its functional role within the milk proteome, as described in the next section.

Whey proteins

Because of their functional role, the analysis of whey proteins is an important task for understanding the composition–function relationship of the milk proteome. Whey proteins are generally defined as those milk proteins that remain soluble at pH 4.6 and 20 °C; the most important include α-lactalbumin, β-lactoglobulin, serum albumin, immunoglobulins, and proteose peptones [81].

Several studies have investigated the composition of all or individual whey proteins in different mammalian species and at different lactation stages. One example is the absolute quantitation of β-lactoglobulin (β-LG) in different milk products [87]. β-LG was determined as an intact protein in whole bovine milk and in infant food containing yogurt. Absolute quantification at the protein level was achieved with a top-down proteomic approach to assess the source of β-LG.

In a similar investigation, an LC–MS/MS method based on signature peptides was developed to quantify bovine lactoferrin in infant formulas [88]. Three signature tryptic peptides of bovine lactoferrin were identified, and the method was validated by analyzing the lactoferrin content of four different brands of infant formula, milk, yogurt, whey protein concentrate, whole milk powder and skimmed milk powder. Also for bovine milk, LC–MS/MS analyses using bottom-up strategies were performed [89,90,91,92]. Comparative whey proteomics resulted in the identification of 71 proteins to authenticate the bovine whey proteome. The temporal changes in milk whey during the estrous cycle of cows were studied using global label-free LC–MS/MS [93]. Milk whey and extracellular-vesicle-enriched milk whey from day 21 of pregnancy were compared with samples from day 21 of the estrous cycle, with the aim of finding biomarkers of early pregnancy. In total, 218 proteins were detected, four of which were differentially expressed between the two groups and allowed early pregnancy to be detected. The milk whey proteome of Indian zebu (Sahiwal) cows was identified using LC–MS/MS and characterized with respect to host defense [94]. This study was the first to report the milk proteome of Indian zebu cattle.
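Quantification through a signature peptide usually relies on a calibration curve that relates the peptide's peak area to known amounts of the protein. The sketch below fits an ordinary least-squares line and back-calculates the concentration in a sample; it assumes a linear response and external calibration, which the text does not specify, and the calibration points are invented for illustration. Real methods typically also use internal standards and validate linearity, recovery and matrix effects.

```python
def fit_calibration(concentrations: list[float], areas: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of: area = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(areas):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def back_calculate(area: float, slope: float, intercept: float,
                   dilution_factor: float = 1.0) -> float:
    """Invert the calibration line and correct for sample dilution."""
    return (area - intercept) / slope * dilution_factor


if __name__ == "__main__":
    # Hypothetical calibration standards (e.g. mg/L) and signature-peptide peak areas.
    slope, intercept = fit_calibration([1.0, 2.0, 4.0, 8.0], [3.0, 5.0, 9.0, 17.0])
    print(back_calculate(11.0, slope, intercept, dilution_factor=10.0))
```

Keeping the fit and the back-calculation separate mirrors the usual workflow, in which one calibration curve is applied to many samples.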

A large share of studies assessed and compared the whey proteins of mammalian species other than cattle. For example, the whey proteome of three indigenous pure-breed Greek sheep and goats was studied [95]. One-dimensional nano-LC–MS/MS analysis suggested that the whey proteins in goat and sheep milk can be altered by geographical factors and by the dietary preferences of the animals across breeds and species. Proteomic LC–MS analysis of buffalo colostrum and milk resulted in the identification of 44 different whey proteins [96]. A comprehensive characterization of the sheep milk proteome over the lactation period suggested that whey proteins undergo drastic changes during the early development of newborn lambs [97]. Knowledge of the whey composition at each stage provides guidance for the early weaning of lambs. Isobaric tags for relative and absolute quantification (iTRAQ) coupled with LC–ESI–MS/MS were used to characterize the ovine milk whey proteome [97]. iTRAQ is an isobaric labeling technique in which the primary amine groups of proteins or peptides are derivatized with tags of identical nominal mass that release distinct reporter ions upon fragmentation [77]. Overall, 310 proteins were identified, 121 of which were differentially expressed.
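Relative quantification with isobaric tags can be sketched as follows, assuming a 4-plex iTRAQ design with reporter ions at nominal m/z 114 to 117: for each peptide-spectrum match (PSM), every channel's reporter intensity is divided by that of a reference channel, and the protein-level ratio is taken as the median over its PSMs. This is a simplification; actual pipelines also correct for isotopic impurities of the reagents and normalize channel loading, and the intensities used below are hypothetical.

```python
from statistics import median

# Assumed 4-plex layout: nominal m/z of the iTRAQ reporter ions, one per sample.
CHANNELS = ("114", "115", "116", "117")


def protein_ratios(psm_reporters: list[list[float]], reference: int = 0) -> dict[str, float]:
    """Median, over the PSMs of one protein, of each channel divided by the reference channel."""
    ratios = {}
    for ch, label in enumerate(CHANNELS):
        # Skip PSMs whose reference reporter is missing (zero intensity).
        per_psm = [row[ch] / row[reference] for row in psm_reporters if row[reference] > 0]
        ratios[label] = median(per_psm)  # raises StatisticsError if no PSM is usable
    return ratios


if __name__ == "__main__":
    # Hypothetical reporter intensities for three PSMs of one protein.
    psms = [
        [100.0, 200.0, 50.0, 100.0],
        [200.0, 400.0, 100.0, 200.0],
        [150.0, 330.0, 70.0, 160.0],
    ]
    print(protein_ratios(psms))
```

The median makes the protein ratio robust against a single noisy PSM, such as the third row here.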

Some studies also performed LC–MS-based whey proteomics on human milk. Whey proteins in human and bovine colostrum and mature milk were quantified with the aid of iTRAQ labeling and LC–MS/MS [98]. Overall, 584 whey proteins were analyzed with respect to their biological functions and functional variations in bovine and human milk at different stages of lactation.

In a similar study on human milk, 115 whey proteins were identified and their functions investigated [99]. Of the identified proteins, 35% were associated with the immune response. Several proteins monitored over a 12-month lactation period were differentially regulated between early and mature milk. Similarly, the temporal changes of the human milk proteome were studied [100]. Overall, 976 whey proteins were identified after protein fractionation by ion-exchange chromatography and electrophoresis. The human milk whey proteome was described at each lactation stage, with 152 proteins considerably regulated between transitional and mature milk. Finally, changes in the whey proteome between healthy donors and donors with a pathological condition were assessed [101]: whey protein expression was compared in colostrum from lactating mothers with and without gestational diabetes mellitus. A total of 601 proteins were identified in human milk whey without prior fractionation, 27 of which could be related to gestational diabetes mellitus using chemometrics as a tool to authenticate the health status of the donors.

In summary, the studies presented here investigated the composition of all or individual whey proteins in milk samples. They characterized whole whey proteomes or compared whey profiles across different types of milk, over time, and with respect to origin-, diet- and health-status-specific changes. The next most studied protein fraction in milk is that of the milk fat globule membrane proteins.

Milk fat globule membrane (MFGM) proteins

The milk fat globule membrane (MFGM) is an important milk fraction because of its high content of bioactive proteins [71]. For this reason, many predominantly functional studies have been performed to determine the MFGM proteome.

Several studies assessed the MFGM proteome of bovine milk. Micro-capillary-HPLC-nanospray-MS/MS after in-gel tryptic digestion was used to characterize the proteome of the MFGM isolated from bovine milk [102]. Overall, 120 proteins were identified and their biological functions assigned. LC–MS, among other techniques, was used to characterize the proteins and lipids isolated from the bovine MFGM [103]. This study was the first to determine the composition of bovine MFGM proteins and lipids from the same sample. Proteins were identified with proteomic techniques and lipids with a combination of GC and LC–MS. Absolute quantification was performed for the six most abundant bovine MFGM proteins in buttermilk protein concentrates [104]. The proteins were cleaved enzymatically to generate peptides, which were identified via their specific cleavage sequences using LC-HSRM/MS. The method was validated for the simultaneous measurement of multiple proteins. An LC–MS/MS-based shotgun approach was used to characterize the proteomes of two MFGM-enriched milk fractions [71]. With a label-free proteomic approach, 244 proteins were identified in whey protein concentrate and 133 in buttermilk. A direct LC–MS/MS analysis characterized the bovine MFGM proteome with respect to the isolation technique used to extract the proteins of interest [105]. The optimal workflow for detecting new MFGM proteins was assessed, and the approach revealed several previously unobserved proteins.

In the following studies, proteomic analyses of the MFGM from additional mammalian species were performed. The proteomic profiles of the MFGM from sheep with and without agalactia were compared [106]. MFGM proteins from camel milk were identified using a one-dimensional LC–MS/MS approach, which enabled the identification of 322 functional groups of proteins associated with the camel MFGM [107]. Proteins associated with the MFGM from goat milk were analyzed after in-gel trypsin digestion using nano-LC–MS/MS with Q-Exactive Orbitrap technology [108]. Overall, 442 functional groups of proteins and 127 functional groups of phosphoproteins were identified after phosphopeptide enrichment via strong cation exchange chromatography and titanium beads. The authors provided a comprehensive phosphoproteome of an MFGM sample. MFGM proteins isolated from buffalo milk and colostrum were also studied [109]. After simulated in vitro gastrointestinal digestion of the MFGM fractions, a detailed proteomic analysis of the generated peptides was performed by nano-LC-ESI MS/MS. Matching the peptide sequences to the component proteins of the MFGM allowed the MFGM proteins to be clustered into functional groups involved in lipid metabolism and energy production, protein synthesis and secretion, transport, cell signaling, catalytic activity, and immune function. This study provided a fundamental basis for assessing the composition–function relationship of milk.

In conclusion, the data reported in this section underline the relevance of characterizing the MFGM proteome. The remaining milk proteins are covered by the global proteomic studies presented in the next section.

Global milk proteome and other milk related proteins

The characterization of the global milk proteome is used to identify genetic variants, post-translational modifications and modifications induced by technological treatments, and to determine different lactation stages [110]. Global proteomics is therefore important for comparative studies and for the assessment of milk authenticity.

One of the first studies to use online LC–ESI–MS for milk proteins was conducted in 1995 [111]. Bovine milk diluted in a urea buffer containing a reducing agent was injected onto an RP-HPLC C8 column, and the flow was split before entering the MS interface. All the major milk proteins except α-lactalbumin were detected as intact proteins in a top-down approach. In a 2006 study, allergens derived from milk proteins were identified [112]. Emphasis was placed on casein, the most abundant protein fraction in milk and the one considered most allergenic, which was analyzed by LC–MS/MS after extraction and tryptic digestion. Proteomic techniques were also applied to detect milk allergens in food samples [113]. Seven peptide markers from the casein and whey fractions were analyzed using a Q-Trap LC–MS/MS system, and a single-run multianalyte method was developed to detect tryptic peptides from different milk proteins. This study was one of the first attempts to establish a promising multianalyte alternative to immunoanalytical methods for determining milk allergens in different food samples. Similarly, a method was developed to quantify twenty key bovine milk proteins simultaneously [114]. The selected proteins comprised all individual caseins, the major whey proteins and the best-known MFGM proteins. Changes in the sialylated N-glycans of bovine lactoferrin at different lactation stages were studied using MALDI-TOF–MS and HILIC-MS/MS [115]. The results provided new insights into the structure–function relationship of bovine lactoferrin. The protein profile of colostrum from primiparous and multiparous Holstein dairy cows was analyzed, and the proteome changes during the transition to mature milk were assessed using LC–MS/MS [116]. Parity-related variations in the colostrum and transition milk proteomes were characterized, demonstrating that differences in mammary secretion are evident from the onset of lactation, and potential biomarkers of mammary function and of the wholesomeness of bovine milk were assessed.

The milk proteomes of two bovine breeds, Jersey and Kashmiri, were assessed at day 90 of lactation using LC–MS/Q-TOF [117]. The two breeds were differentiated at the proteome level by differences in the expression of 81 high-abundance and 99 low-abundance proteins. Authenticating the respective cattle proteomes can help to determine the most suitable type of milk for preparing infant formula. The composition and properties of whey and MFGM proteins in Guanzhong goat and Holstein cow milk were characterized and compared by nano-LC and Orbitrap tandem MS [118]. In total, 417 whey proteins and 776 MFGM proteins were detected and characterized in goat and cow milk. The results suggested that the whey and MFGM proteins of Guanzhong goats and Holstein cows differ in their functions and pathways. Hay milk and conventional milk were studied for post-translational modifications of the proteins caused by the processing of raw milk [119]. The study aimed to evaluate season- and processing-related changes in the modified proteome of bovine milk from two different feeding systems. To this end, tryptic digests of conventional and hay milk were analyzed by LC–MS targeting 26 non-enzymatic modifications, and their authenticity was assessed [119]. An improved LC–MS method was developed to quantify the six main bovine milk proteins and to characterize the molecular diversity of the bovine milk proteome, including post-translational modifications [120].
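Targeting non-enzymatic modifications in a tryptic digest amounts to looking for peptides whose mass is shifted by a known increment relative to the unmodified peptide. The sketch below matches an observed mass difference against a small, assumed table of monoisotopic shifts (for example lactosylation, the addition of lactose minus water, +324.1056 Da); it is not the set of 26 modifications targeted in [119], and it ignores combined modifications and the position of the modified residue.

```python
# Assumed subset of monoisotopic mass shifts (Da); not the panel used in [119].
MASS_SHIFTS = {
    "deamidation": 0.98402,
    "oxidation": 15.99491,
    "carboxymethyl-lysine": 58.00548,   # + C2H2O2
    "carboxyethyl-lysine": 72.02113,    # + C3H4O2
    "hexosylation": 162.05282,          # one hexose minus water
    "lactosylation": 324.10565,         # lactose minus water
}


def annotate_shift(observed_mass: float, unmodified_mass: float,
                   tol_ppm: float = 10.0) -> list[str]:
    """Return the names of all single modifications explaining the mass difference."""
    delta = observed_mass - unmodified_mass
    tol_da = observed_mass * tol_ppm * 1e-6  # ppm tolerance converted to Da
    return [name for name, shift in MASS_SHIFTS.items() if abs(delta - shift) <= tol_da]


if __name__ == "__main__":
    # Hypothetical neutral peptide masses (Da).
    print(annotate_shift(1824.10565, 1500.0))
    print(annotate_shift(1515.99491, 1500.0))
```

Expressing the tolerance in ppm rather than in a fixed number of daltons keeps the matching window proportional to the peptide mass, as is customary for high-resolution instruments.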

Global proteomics was also conducted on milk from mammalian species other than cattle. A qualitative and quantitative analysis of the N/O-glycome of goat milk at different lactation stages was performed via HILIC-MS/MS and compared with the corresponding N/O-glycome of human milk glycoproteins [121]. Differences and similarities in the glycosylation patterns were found, providing a reference for the functional diversification of infant formula derived from goat milk. Changes in the bovine and camel milk proteomes after spray drying were studied using HPLC–MS with a quadrupole-Orbitrap instrument [122]. The caseins and whey proteins of camel and bovine milk were essentially preserved after drying, demonstrating that spray drying preserved the genuineness of both milks. A study on heat-induced changes in milk was also conducted [123]. The authors analyzed the fundamental differences in protein aggregates after heat treatment of camel and cow milk. Differences in protein composition after heat treatment of the milk (80 °C, 60 min) were determined, and a total of 45 different proteins were identified.

Regarding goat milk, two studies by Verma and colleagues analyzed the goat milk proteome to assess the genetic and geographical origin of different breeds [124, 125]. Their findings are important for assessing the authenticity of goat milk and its use as a potential functional ingredient in human nutrition. The authenticity of human, bovine and caprine milk was investigated by profiling the glycoproteomes of the different species using LC/MS-QTOF [126]. Caprine milk showed the highest concentrations of glycosylated proteins of the three species. Furthermore, 42 O-glycosylated and 56 N-glycosylated proteins were identified, of which only the human ones could be matched to important human biological pathways, suggesting their unique role in human neonatal development.

Lastly, extensive proteomic studies were conducted on human milk. An LC–MS/MS proteomic analysis of extracellular vesicles purified from human milk identified a total of 1963 proteins, with a notable overlap between donors [127]. Intact glycopeptides in human milk whey were analyzed, with 330 identified in colostrum and 327 in mature milk whey [128]. Label-free quantitative N-glycoproteomic analysis by LC–MS enabled the authors to assess the site-specific glycoforms of human milk whey and to describe their dynamic changes during lactation. Knowledge of the human milk peptide profile is important for formulating milk substitutes for infants. Bovine-derived infant formula, a common substitute for human milk, lacks key bioactivities present in human milk, which motivated the analysis of human milk peptides [129]. In total, 618 peptides specific to human milk were identified using LC–MS/MS, highlighting the need for a peptide-based approach to close the functional gap in substitute milk formulas. The concentrations of over 1300 milk proteins and 2000 endogenous milk peptides were profiled longitudinally over the lactation period for two individual human milk donors [130]. The temporal variations in the human milk proteome were assessed using quantitative LC–MS/MS. The lactational changes in the human milk proteome were described, providing a tool to monitor individual health status and the influence of milk on the newborn.

In conclusion, this section summarized LC–MS-based proteomic studies that assessed the composition of milk with respect to species, feeding, geographical origin, and post-translational modifications. The importance of milk for human health and nutrition is underlined by the number of works studying bovine milk as the primary source of milk products, followed by studies on human milk and on its composition compared with that of other mammalian species.

Oligosaccharides (milk glycomics)

Over the past 20 years, interest has increased in using LC–MS to study the glycomic composition of milk beyond lactose. Most studies focus on the analysis of milk oligosaccharides, which were discovered to be functional components of milk with roles beyond nutrition [131]. A growing body of evidence indicates that oligosaccharides are important for infant development because of their bioactive properties, such as immune functions, prebiotic activity and protection against microbial pathogens [132, 133].

Oligosaccharides in milk are a structurally varied group of unconjugated, branched glycans. In human milk, for example, they are the third most abundant solid fraction after lactose and fat [134]. Milk oligosaccharides are largely composed of 3–10 covalently linked monosaccharides [135]. They are synthesized in the mammary gland and are built from five monosaccharides: galactose (Gal), glucose (Glc), N-acetylglucosamine (GlcNAc), fucose (Fuc) and sialic acid, with N-acetylneuraminic acid (Neu5Ac) as the only sialic acid form identified so far [22]. Oligosaccharides in human milk appear to be the most complex among mammals, with over 200 different types described [22]. Almost all of them carry lactose at the reducing end [136].
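Because milk oligosaccharides are built from these few monosaccharides, their monoisotopic mass follows directly from the composition: the sum of the residue masses (Gal and Glc share the hexose residue mass, GlcNAc has the HexNAc mass, fucose the deoxyhexose mass) plus one water molecule. The sketch below assumes neutral, underivatized molecules; the optional reduction to the alditol adds two hydrogen atoms, and the [M−H]− value corresponds to detection in negative ion mode.

```python
# Monoisotopic residue masses (Da), i.e. monosaccharide minus water.
RESIDUE = {
    "Hex": 162.05282,     # galactose or glucose
    "HexNAc": 203.07937,  # N-acetylglucosamine
    "Fuc": 146.05791,     # fucose (a deoxyhexose)
    "Neu5Ac": 291.09542,  # N-acetylneuraminic acid
}
WATER = 18.01056
H2 = 2.01565          # two hydrogen atoms added on reduction to the alditol
PROTON = 1.007276


def oligosaccharide_mass(composition: dict[str, int], reduced: bool = False) -> float:
    """Neutral monoisotopic mass of a free oligosaccharide from its composition."""
    mass = sum(RESIDUE[residue] * count for residue, count in composition.items()) + WATER
    if reduced:
        mass += H2
    return mass


def deprotonated_mz(neutral_mass: float) -> float:
    """m/z of the singly charged [M-H]- ion."""
    return neutral_mass - PROTON


if __name__ == "__main__":
    examples = {
        "lactose": {"Hex": 2},
        "2'-fucosyllactose": {"Hex": 2, "Fuc": 1},
        "lacto-N-tetraose": {"Hex": 3, "HexNAc": 1},
        "3'-sialyllactose": {"Hex": 2, "Neu5Ac": 1},
    }
    for name, comp in examples.items():
        m = oligosaccharide_mass(comp)
        print(f"{name}: M = {m:.4f}, [M-H]- = {deprotonated_mz(m):.4f}")
```

Isomers such as 2'-fucosyllactose and 3-fucosyllactose have identical compositions and therefore identical masses, which is why chromatographic separation and tandem MS are needed to tell them apart.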

Their effects on the health and development of infants make milk oligosaccharides an important supplement in formulas for infants who cannot be breastfed [137]. For this reason, there is increasing interest in studying and understanding the composition of milk oligosaccharides. Glycomic LC–MS techniques help to elucidate the composition of milk oligosaccharides and their authenticity in relation to human health.

Extraction

To study the milk glycome, the oligosaccharides must be isolated from milk. Most studies reviewed here achieve this with separation techniques such as liquid–liquid extraction and centrifugation to remove fat and precipitate proteins (Table 3). Different purification strategies can be applied before the extracted oligosaccharides are determined. These include fractionation by gel permeation chromatography and various solid-phase extraction (SPE) techniques, which can be followed by labeling and derivatization techniques such as permethylation. In most cases, the oligosaccharides are analyzed in their “native” form after reduction to the corresponding alditols. Because native oligosaccharides absorb UV light poorly, mass spectrometry is extensively combined with liquid chromatography to analyze the milk glycome.

Table 3 Summary of studies performing oligosaccharide analysis of milk using LC–MS

LC–MS instrumentation to characterize milk oligosaccharides

LC–MS has become widely used in glycomics to elucidate the composition, form, arrangement and linkage of the sugar building blocks of milk oligosaccharides [138]. For the separation of isolated milk oligosaccharides, RP-HPLC, porous graphitic carbon (PGC) chromatography and HILIC are mostly described (Table 3). PGC and HILIC in particular have gained wide application in recent years. Because of its polar retention effect, PGC is highly suitable and widely employed for separating oligosaccharide isomers [139]. PGC can be applied to separate both native neutral and native acidic oligosaccharides.

Analysis

Most glycomic studies reported here focus on the analysis of human milk oligosaccharides (HMOs). The milk glycomes of other species, predominantly cattle and swine, have also been studied. Swine in particular represent a valuable model system because of their similarity to humans with respect to intestinal physiopathology [140].

Human milk oligosaccharides

Oligosaccharides were isolated from the lipids and proteins of individual human milk samples and analyzed using a new microchip liquid chromatography mass spectrometry (HPLC-Chip/MS) method and MALDI-FT ICR MS [141]. The profiles of milk samples from five different women were compared, and variations in their oligosaccharide profiles were detected. Tandem MS combined with exoglycosidase digestion allowed structural isomers to be differentiated unambiguously. The accurate mass measurements provided the oligosaccharide compositions of nearly 200 individual molecular species. The same authors monitored the oligosaccharides in human milk from five donors over a 3-month lactation period using HPLC-Chip/TOF–MS with a PGC chromatographic column [142]. The composition of the human milk glycome was assessed at each lactation stage.

A further study described an online HPLC-ESI–MS method combined with fluorescence detection for the simultaneous determination of the main oligosaccharides and lactose in human milk samples [143]. Twenty-two neutral and sialylated oligosaccharides were separated on an amide column after precolumn derivatization with 2-aminoacridone for fluorescence detection prior to MS analysis. An LC–MS method using a PGC column was validated to separate and quantify eleven neutral oligosaccharides from human milk [144]. After reduction of the HMOs, all isomers could be separated via gradient elution, and all co-eluting compounds could be differentiated by ESI–MS.

An LC–MS/MS method based on a PGC column was developed and validated to determine and quantify two regioisomeric disaccharides, N-acetyllactosamine and lacto-N-biose, in human colostrum [145]. A new high-throughput UPLC-triple-quadrupole-MS method was implemented to simultaneously determine and absolutely quantify neutral and acidic oligosaccharides in human milk [146]. The oligosaccharides were separated on an amide column without prior reduction. The method was used to determine the phenotypic secretor status of the mothers. This rapid, streamlined method was validated and applied to milk samples from two human cohorts, demonstrating its practical applicability to large-scale HMO profiling studies.

A comparative analysis of native and permethylated human milk oligosaccharides was performed with two distinct LC–Orbitrap-MS/MS-based glycomics platforms [147]. In the first platform, native HMOs were separated on a PGC column directly after reduction and SPE. In the second platform, the reduced oligosaccharides were permethylated and subsequently separated on a reversed-phase C8 column. The findings suggest that the analysis of native reduced HMOs after separation on a PGC column is more suitable for high-throughput experiments, while reversed-phase separation of permethylated derivatives offers advantages for identifying and structurally characterizing HMOs. Both methods are applicable to the glycomic authentication of human milk.

A method was developed for the rapid tentative identification of milk oligosaccharides based on a spectrum library using LC–MS/MS [148]. The milk samples were derived from African lion, Asian water buffalo, cow, and goat, and the data expanded an oligosaccharide library with entries from non-human milk. Several studies determined HMOs at different lactation stages, adding to the knowledge of how HMOs change during lactation [149, 150]. These studies analyzed the oligosaccharide diversity of human milk at different time points post-partum and covered the most abundant HMOs. Overall, HMO concentrations varied substantially between mothers and over the course of lactation, indicating their temporal relevance for the child’s development.

In summary, the studies on human milk glycomics presented here describe efforts to characterize the oligosaccharide composition of human milk using advanced LC–MS techniques. Human breast milk is the most important source of nutrients for the growing infant. However, when breastfeeding is not possible, it is important to be able to formulate the most suitable human milk surrogate. For this reason, it is important to study and gain knowledge about the authenticity of HMOs.

Oligosaccharides from other animal species and cross species studies

To identify the most suitable sources of nutrients and bioactive compounds for infant formula, the oligosaccharide composition of milk from other mammalian species has been studied extensively in recent years. This section reviews the studies assessing the glycomic composition of milk from non-human mammalian species. Most studies focus on the characterization of the porcine milk glycome. Swine are a significant domestic species and are considered an excellent model for nutritional studies because the physiology of their digestive and immune systems and their anatomy are similar to those of humans [140].

The composition of porcine milk oligosaccharides (PMOs) was analyzed using nano-LC-Chip-QTOF MS/MS over a lactation period from pre-colostrum to day 14 of lactation [140]. The development of oligosaccharides was studied in milk from three healthy sows in relation to the gut metagenome of the suckling piglets, to understand how they interact to shape the microbiome of nursing piglets. More than 30 oligosaccharides were identified in porcine milk. Their composition and evolution were similar to those of human milk, supporting the suitability of swine as a model for human nutritional studies and the importance of assessing porcine milk authenticity. HPLC-Chip/TOF–MS and nano-ESI-FTICR-MS methods were implemented to analyze the glycomes of different animal species [141, 151, 152]. The porcine milk glycome was monitored over the lactation period, and twenty-nine distinct porcine milk oligosaccharides were identified. Porcine milk contained both acidic and neutral oligosaccharides throughout lactation. The same approach was applied to bovine milk; in both glycomes, the predominant oligosaccharides were those containing sialic acid.

Regarding other mammalian species, HPLC-Chip/TOF-MS was used to profile acidic and neutral oligosaccharides in bovine colostrum from Holstein–Friesian cows during the first 3 days of lactation [132]. Further studies proposed alternative methods for authenticating the bovine milk glycome [135, 138], developing and validating glycomic methods based on HILIC-HPLC-HRMS/MS. Canine and feline milk were studied using a high-throughput LC–MS glycomics approach [153]. Similarities and differences between the two species' glycomes were described, providing a tool to relate the composition of canine and feline milk to the needs of nursing puppies and kittens. An in-depth study assessed the composition and quantity of oligosaccharides in goat milk using UPLC-MS/MS [154]. The oligosaccharides in the milk of Guanzhong (a local breed) and Saanen goats were quantified and compared to better understand the oligosaccharide composition of these two breeds and their potential as alternative sources for infant formula production.

In conclusion, LC–MS has been employed in recent years to profile the milk glycomes of different mammalian species. In most studies, the glycomes were compared between species with special regard to their similarity to the human milk glycome. Ultimately, a better understanding of the authenticity of each milk glycome, and especially the human one, is important to guide the formulation of adequate alternatives to human breast milk.

Inter-species adulteration in milk

The previous sections described the LC–MS characterization of milk from different species with respect to its composition of lipids, proteins, and sugars. These techniques enabled several authors to assess the species origin of the analyzed milk by lipidomic, proteomic and glycomic profiling. Notable examples include the use of HPLC–ESI–MS to detect milk adulteration through triglyceride profiling [49]. Triglyceride profiling was also used to assess the authenticity of hay milk, illustrating the potential of HPLC–ESI–MS to counteract milk adulteration in general [155]. Further studies determined the lipid content of bovine and goat milk and identified biomarkers for authentication and against adulteration [67].

Regarding milk proteomics, several studies assessed the authenticity of milk and milk products at the species level. Caseins appear to be a promising target for pinpointing the species origin of milk. Identifying and quantifying the different caseins makes it possible to characterize the milk proteome and verify its species origin.

For example, an LC–MS/MS-based proteomic description of minor proteins in the human milk casein fraction identified 82 proteins in the casein micelle [82]. Knowing the specific protein composition of casein micelles can help determine the species of the analyzed milk and thus provide markers of adulteration. Other examples of human casein analysis include the quantification of α-, β-, and κ-casein [83] and the assessment of post-translational modifications, which identified new O-glycosylation of human β-casein and contributed to profiling the human milk proteome [84].

Assessing the species and origin of milk is also important for cheese production. In one study, cheeses made from cow, sheep, and goat milk were analyzed by mass spectrometry to verify their authenticity and to develop tools against adulteration [85]. Another notable study developed an LC–MS method to distinguish so-called A2 milk from conventional milk through proteomic characterization of bovine A1-like and A2-like β-casein [86]. Such a method could make it possible to detect contamination and fraud.
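A1-like and A2-like β-casein variants differ at position 67 (histidine in A1-like, proline in A2-like variants), so the same proteolytic peptide spanning that residue appears at two masses. The sketch below shows the arithmetic behind such a distinction, assuming variant-specific peptide peak areas are already available. It is not the method of [86]: the assumption of equal ionization efficiency for both peptides and the 1 % tolerance used by the hypothetical `flag_a2_claim` function are placeholders chosen for illustration.

```python
# Monoisotopic residue masses (Da) of the amino acids at β-casein position 67.
HIS_RESIDUE = 137.05891  # histidine, A1-like variants
PRO_RESIDUE = 97.05276   # proline, A2-like variants


def mz_shift(charge: int) -> float:
    """m/z difference between the His67 and Pro67 forms of the same peptide at a given charge."""
    return (HIS_RESIDUE - PRO_RESIDUE) / charge


def a1_fraction(area_a1: float, area_a2: float) -> float:
    """A1-like share of β-casein, ASSUMING both peptides ionize with equal efficiency."""
    return area_a1 / (area_a1 + area_a2)


def flag_a2_claim(area_a1: float, area_a2: float, max_a1_fraction: float = 0.01) -> bool:
    """Flag a product sold as 'A2' if its A1-like share exceeds a HYPOTHETICAL tolerance."""
    return a1_fraction(area_a1, area_a2) > max_a1_fraction


if __name__ == "__main__":
    for z in (1, 3, 5):
        print(f"z={z}: shift {mz_shift(z):.5f}")
    print(flag_a2_claim(area_a1=5.0, area_a2=95.0))
```

In practice, response factors differ between peptides, so a quantitative method would calibrate each variant peptide separately rather than rely on raw area ratios.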

Overall, casein characterization by LC–MS is used to assess genetic differences across species and breeds and to provide tools against inter-species adulteration. Because food fraud is profitable in cheese manufacturing, methods to counteract it are important.

Finally, regarding milk glycomics, the previous section showed that knowing the exact oligosaccharide composition of milk is important for ensuring the correct supply of sugars in infant formulas. Studying the authenticity of the human milk glycome and its similarities to the glycomes of other mammalian species is therefore important for producing appropriate infant formulas and identifying possible deficiencies. However, few studies have yet used milk sugar composition to detect species adulteration.

Likewise, few studies address the global profiling of milk in relation to its species origin. Nevertheless, the advances made so far in milk metabolomics are promising for identifying new biomarkers.

Other constituents and milk metabolomics

The remaining milk components are investigated in milk metabolomic studies. Determining the levels of milk metabolites can help verify milk authenticity and uncover biomarkers for its assessment.

Vitamins

Most studies of other milk constituents analyze vitamin content, with vitamin D being the most extensively studied. Vitamin D has a secosteroid structure and occurs in two main forms, vitamin D2 (ergocalciferol) and vitamin D3 (cholecalciferol). Currently, the most widely used vitamin D detection methods are based on reversed-phase HPLC, usually with a C18 column and mass spectrometric or ultraviolet detection (Table 4). Robust LC–MS and LC–tandem MS methods were described for determining vitamin D3 levels in fresh bovine milk, commercial and fortified milk, and a dairy-based infant formula [156]. The authors authenticated the vitamin D levels in each milk sample without extensive sample pretreatment. A similar approach was used to analyze vitamin D levels in bovine milk, milk powder, and infant formula by LC–MS [157].

Table 4 Summary of studies on milk metabolomics for milk authenticity and analysis of other milk constituents using LC–MS

Vitamin D levels were also analyzed in milk from other mammals, including humans. The vitamin D content of rhesus monkey foremilk was determined by HPLC–MS/MS [158], the first report of vitamin D content in the foremilk of this species; 25-hydroxyvitamin D, vitamin D2, and vitamin D3 were quantified. Vitamin D and 25-hydroxyvitamin D levels were also compared in breast milk collected from Japanese mothers in 1989 and in 2016–2017 [159], with both analytes measured simultaneously by LC–MS/MS. The study described the current vitamin D status of lactating mothers relative to the 1980s, found that concentrations had decreased in recent years, and suggested that maternal vitamin D status should be improved to prevent rickets. A similar study on human milk determined vitamin D metabolite status in mother–infant pairs [160]. The authors validated sample preparation and LC–MS/MS methods and monitored vitamin D and 25-hydroxyvitamin D levels in human breast milk between six weeks and three months. 25-Hydroxyvitamin D2 and D3, as well as 3-epi-25-hydroxyvitamin D, were quantified, which made it possible to classify subjects as vitamin D deficient when their 25-hydroxyvitamin D level was below 50 nmol/L.
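The 50 nmol/L cut-off mentioned above implies two small calculations: converting mass concentrations into molar units and comparing total 25-hydroxyvitamin D with the threshold. The sketch below is illustrative only. It assumes that "total" means the sum of 25-hydroxyvitamin D2 and D3 (the 3-epimer is excluded here, a choice that varies between laboratories), and the function names are hypothetical rather than taken from [160].

```python
# Molar masses (g/mol) of the two 25-hydroxyvitamin D forms.
MOLAR_MASS = {"25OHD3": 400.64, "25OHD2": 412.65}
DEFICIENCY_CUTOFF_NMOL_L = 50.0  # cut-off quoted in the text


def ng_per_ml_to_nmol_per_l(value_ng_ml: float, metabolite: str) -> float:
    """ng/mL equals µg/L; dividing by g/mol gives µmol/L, and multiplying by 1000 gives nmol/L."""
    return value_ng_ml * 1000.0 / MOLAR_MASS[metabolite]


def total_25ohd(d2_nmol_l: float, d3_nmol_l: float) -> float:
    """ASSUMED definition of total 25-hydroxyvitamin D: D2 + D3 forms, 3-epimer excluded."""
    return d2_nmol_l + d3_nmol_l


def is_deficient(d2_nmol_l: float, d3_nmol_l: float,
                 cutoff: float = DEFICIENCY_CUTOFF_NMOL_L) -> bool:
    """True when total 25-hydroxyvitamin D falls below the cut-off."""
    return total_25ohd(d2_nmol_l, d3_nmol_l) < cutoff


if __name__ == "__main__":
    d3 = ng_per_ml_to_nmol_per_l(20.0, "25OHD3")
    print(f"20 ng/mL 25(OH)D3 = {d3:.2f} nmol/L; deficient: {is_deficient(0.0, d3)}")
```

The conversion shows why 20 ng/mL and 50 nmol/L are often quoted as roughly equivalent thresholds for 25-hydroxyvitamin D3.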

The studies presented here invested considerable effort in optimizing LC–MS detection of vitamin D in milk. Authenticating vitamin D levels is important for meeting nutritional needs related to human health, as the studies by Gjerde et al. and Tsugawa et al. highlight; both aimed to implement vitamin D monitoring to counteract vitamin D deficiency.

Milk metabolomics

The remaining studies assess the authenticity of the whole milk metabolome. Given the complexity of the milk matrix, few studies investigate the whole milk metabolome; however, some have done so using more than one analytical platform.

A multimethod approach combined with multivariate analysis was used to study seasonal variations in the mineral levels, fatty acid distribution, and other selected metabolites of bovine milk [161]. Targeted analysis of polar metabolites was performed using HILIC coupled to an Orbitrap mass spectrometer. Phosphatase activity was suggested as a potential biomarker of changes in milk composition and quality, enabling timely interventions to modulate milk composition and compensate for seasonal variation. Also for bovine milk, an untargeted-to-targeted metabolomics method based on UPLC–Q-TOF–MS was developed to distinguish reconstituted milk from ultra-high-temperature (UHT) milk [162]. Potential marker metabolites among peptides, lipids, and nucleic acids were detected that may help differentiate UHT from reconstituted milk. Extensive metabolic profiling was also performed on milk from different buffalo species [163]: of the 2,563 metabolites assessed, 37 differed significantly between species. A multi-omics approach was used to characterize the composition of the milk fat globule membrane (MFGM) [164]. The authors combined proteomic and lipidomic LC–MS/MS analyses to characterize the lipids and proteins of commercial MFGM samples from different bovine sources and found differences in protein and lipid concentrations across these sources. Lastly, an SPE-LC–ESI–MS/MS method was successfully applied to determine and quantify melatonin in bovine milk [165]. Assessing melatonin levels in breast and cow milk is important because infants depend on the milk they are fed for their melatonin supply.
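Untargeted studies such as [162] and [163] test thousands of features, so reporting a short list of "significantly different" metabolites usually involves correcting for multiple testing and requiring a minimum effect size. The sketch below shows one common generic combination, the Benjamini–Hochberg false discovery rate procedure plus a log2 fold-change filter. It is not claimed to be the statistical workflow of those studies; per-feature p-values are assumed to come from an upstream test, and the thresholds, feature names, and intensities are hypothetical.

```python
import math


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini–Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Step down from the largest p-value so each q-value is the minimum over higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_markers(features, q_cutoff: float = 0.05, min_abs_log2fc: float = 1.0) -> list[str]:
    """features: (name, mean intensity group A, mean intensity group B, p-value) tuples.
    Keeps features passing BOTH the FDR threshold and the fold-change threshold."""
    qvalues = benjamini_hochberg([f[3] for f in features])
    markers = []
    for (name, mean_a, mean_b, _), q in zip(features, qvalues):
        log2fc = math.log2(mean_b / mean_a)
        if q < q_cutoff and abs(log2fc) >= min_abs_log2fc:
            markers.append(name)
    return markers


if __name__ == "__main__":
    # Hypothetical features, NOT data from the cited studies.
    features = [
        ("feature_1", 100.0, 400.0, 0.001),  # large change, significant
        ("feature_2", 100.0, 110.0, 0.001),  # significant but small change
        ("feature_3", 100.0, 400.0, 0.4),    # large change, not significant
    ]
    print(select_markers(features))
```

Requiring both criteria guards against reporting features that are statistically significant but biologically negligible, as well as large but unreliable changes.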

Some studies also performed metabolomic analyses of human milk. The metabolomic fingerprints of milk from healthy donors and from donors with gestational diabetes mellitus were compared [166], and the effects of gestational diabetes mellitus on metabolomic variation patterns throughout lactation were described. A distinct metabolite pattern changed significantly at specific lactation stages in each type of breast milk.

Concluding remarks

Remarkable developments in LC–MS methods for dairy science over the past 15 years have led to a growing number of studies on milk lipids, proteins, oligosaccharides, and other compounds. An expanding body of literature describes the milk metabolomes of different mammalian species in terms of their lipid, protein, and oligosaccharide composition and authenticity. This demonstrates the importance of LC–MS-based omics studies for assessing milk authenticity with respect to species and geographical origin, as well as to processing in the dairy industry.

Nevertheless, the analysis still requires extensive extraction and purification of the compounds, HPLC separation needs further improvement, and matrix effects remain a relevant hindrance that is difficult to overcome. Although there are some studies on vitamin D, other fat-soluble and water-soluble vitamins have not been studied. The major drawback is that most LC–MS studies on milk analyze only one major milk fraction at a time, and global metabolomic studies are sparse. Furthermore, LC–MS-based omics methods face relevant obstacles.

In proteomic analyses, for example, one of the most common obstacles is the biological complexity of the milk matrix. The analytical challenges related to this complexity include proteolysis, post-translational modifications such as glycosylation, phosphorylation, and disulfide bond formation, and the wide dynamic range of protein concentrations in milk [21]. For lipidomic analyses, the biggest difficulties include the lack of a separation technique able to resolve isomeric species and compositional isomers and the lack of standards for absolute quantification of lipids at the species level [34].

At present, human milk profiling serves as a benchmark for understanding human health and nutrition with respect to the consumption of dairy products from different mammalian species. To fully characterize human milk and the milk of other mammals, a multidisciplinary approach is needed that integrates in-depth proteomic, glycomic, and lipidomic analyses with further improved metabolomic studies.