Introduction

Milk is the primary source of nutrition for mammals and a major vehicle for the transfer of maternal immunity; it therefore plays a vital role in the overall development of neonates. The milk proteome is extremely complex due to abundant post-translational modifications and various proteolytic processes1. Milk protein composition exhibits high heterogeneity due to numerous genetic variants and isoforms with different degrees of post-translational modification, such as phosphorylation and glycosylation in caseins2,3. Post-translational modifications shape the conformational structure of milk proteins, and constitutive levels of proteolytic activity produce a range of significant peptides. The post-translational modifications of the polypeptide chain occur in the Golgi apparatus of the mammary epithelial cells4. Casein phosphorylation at serine or threonine residues is catalyzed by kinase enzymes5. Phosphorylation is affected by factors such as protein sequence, efficacy of kinase enzymes, gene expression, substrate availability and access to the phosphorylation site, which is responsible for the specific protein conformation2,6.

Protein functions such as binding, stabilization, biological activity and interactions with proteins and other biomolecules are regulated by phosphorylation and dephosphorylation of the protein7. Phosphorylation stabilizes calcium phosphate nanoclusters in casein micelles8. The micellar structure of casein enables milk to carry calcium and phosphate to the neonate while limiting the risk of mammary gland bio-calcification9,10. The phosphorylation state of caseins varies widely, from 1P to 3P on CSN3, 4P to 5P on CSN2, 8P to 9P on CSN1S1 and 10P to 13P on CSN1S26,11. In bovine milk, CSN1S1 accounts for about 35% of the total casein and has 2 common phosphorylation isoforms: CSN1S1-8P and CSN1S1-9P. Similarly, CSN1S2 accounts for about 10% of the total casein and is present as isoforms from 10P to 14P and occasionally as 9P or 15P2,12.

The identification and analysis of phosphopeptides has been challenging because of their relatively low stoichiometry, inherently lower ionization efficiency and variation in phosphorylation sites13,14. However, advances in proteomics have greatly improved protein identification. Phosphorylation of milk proteins has been studied in llama15, camel16 and goat milk fat globule membrane proteins17 using MS/MS proteomic approaches. A total of 8 phosphopeptides corresponding to 18 phosphorylation sites were identified in CSN1S1, CSN1S2 and CSN2 of goat milk using nLC-MS/MS18. Casein phosphopeptides in goat milk have been studied by Olumee-Shabon and Boehmer18, and the phosphoproteome of goat milk fat globule membrane has been reported by Henry and others17. Phosphorylation has been reported for bovine caseins using inductively coupled plasma mass spectrometry (ICP-MS)19, equine CSN1S1 and CSN2 by nESI-MS/MS20,21, donkey CSN2 by MALDI-TOF and nESI-MS/MS22, and CSN1S1 and CSN2 of water buffalo by MS23.

Bovine milk is extensively used due to its high biological value and its versatility, as it can be transformed into cheese and several other dairy products. Casein and whey proteins have been investigated functionally for the presence of bioactive peptides24. These peptides show various biological activities and are released by proteolytic digestion of caseins and other milk proteins in the gut or during fermentation. This study was designed to gain insight into the molecular diversity of goat milk proteins and to identify the varying degrees of PTM in casein and non-casein proteins. High- and low-abundance milk proteins were characterized by a gel-based proteomic approach, i.e. 2DE and nLC-MS/MS. The present study analysed post-translational modifications such as phosphorylation, oxidation and carbamidomethylation on casein and non-casein proteins in goat milk. As post-translational modifications (PTMs) act as a major means of intracellular communication, the interactions of the various proteins carrying post-translational modifications were also analysed.

Results

The present study analyzed the post-translational modifications in low- and high-abundance proteins in the milk of genetically diverse goat breeds/genotypes reared under varied ecological and grazing conditions in India. It generated a comprehensive profile of PTMs in goat milk and identified PTM sites in relation to sample variation.

The goat milk proteome

Milk samples from 10 Indian goat breeds were investigated for genetic variation, proteome composition and post-translational modifications. The selected populations were sampled in their home tracts, were reared under a semi-intensive system and were apparently healthy. Altogether, 1240 milk samples were analyzed by 14% SDS-urea-PAGE, and different allelic combinations of caseins were resolved to identify the protein variants. The selected variants were then analyzed by combining highly sensitive 2DE and Q-Exactive nLC-MS/MS using a Thermo Fisher in-house reference database. Distinctive protein spots (n=144) selected from 21 variants in 2DE gels were processed by nLC-MS/MS. The MS data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD013593. The .raw files generated from nLC-MS/MS of goat milk proteins were searched with Sequest using Proteome Discoverer (v2.2). By comparing the databases, a milk proteome of 578 peptides for 348 genes (800 UniProt accessions) was identified using the reference database. Post-translational modifications were identified in 201 peptide sequences for 86 proteins (120 UniProtKB accessions).

Post translational modifications in goat milk proteins

The annotations of peptides carrying modifications, including cysteine carbamidomethylation, oxidation, acetylation and phosphorylation, in each protein are presented in Table S2. Post-translational modifications were observed in a total of 201 peptide sequences corresponding to 120 UniProt IDs across the reference databases. PTMs were observed in high- and low-abundance milk proteins in each breed. High-performance Q-Exactive LC-MS/MS analyses identified 287 sites of phosphorylation on 120 unique phosphoproteins. The Ser/Thr/Tyr/Asp ratio was 128:70:42:47. Phosphorylation at Asp residues is reported for the first time in goat milk proteins.
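As a quick arithmetic check on the residue distribution above, the following minimal sketch converts the reported Ser/Thr/Tyr/Asp counts into percentage shares and confirms that they add up to the 287 reported sites. The variable names are illustrative only and are not part of the original analysis pipeline.

```python
# Phosphosite counts by residue, as reported in the text above.
site_counts = {"Ser": 128, "Thr": 70, "Tyr": 42, "Asp": 47}

# Total number of phosphosites across all four residue types.
total_sites = sum(site_counts.values())

# Share of each residue among all reported phosphosites, in percent.
site_share = {
    residue: round(100 * count / total_sites, 1)
    for residue, count in site_counts.items()
}

print(total_sites)
print(site_share)
```

Serine accounts for a little under half of the reported sites, which is consistent with the statement in the Discussion that serine was the most frequently phosphorylated residue.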

Phosphorylation of high abundance proteins

The observed phosphosites for the high abundance proteins in goat milk are presented in Table 1. In the casein fraction, β-casein (CSN2) showed a maximum of 17 phosphosites (4D, 5T, 6S, 2Y); α-S2 casein (CSN1S2) showed 13 phosphosites (7S, 1D, 5T); α-S1 casein (CSN1S1) showed 11 phosphosites (5S, 3D, 1T, 2Y); and κ-casein (CSN3) showed 6 phosphosites (3S, 1Y, 2T). Among the whey proteins, β-lactoglobulin (BLG) (P02756) showed 19 phosphosites (7D, 3Y, 5T, 4S) and α-lactalbumin (LALBA) showed 4 phosphosites (3D, 1Y).

Table 1 Identified phosphosites for the major milk proteins in Capra hircus reference database.

The phosphorylated peptides generated from these proteins showed different isoforms and are presented in Table 2. The phosphorylated peptide 76–93 in CSN1S1 showed 5P at S78-S82 and peptide 58–73 showed 1P or 3P resulting in isoforms 10P or 12P. Similarly, peptides from CSN1S2 showed variations leading to different isoforms. Peptides 19–40, 19–37 showed 3 variations in site of phosphorylation such as 1P S24 or 2P S24, S25 or 3P S23, S24, S25. Peptide 142–153 showed 1P S145 or 1P S147 or 3P S145, T146, S147 and peptides 154–165/166 showed 1P S159 or 2P S159, T160 resulting in 8P isoform. β-casein peptides showed variations 1P or 6P in peptide 129–147, 2P or 3P in peptide 148–184 and may result in 10P and 11P isoforms. The identified fragment of κ-casein showed 4P to 5P variation by the level of phosphorylation of peptides 23–40 at 2P or 3P and 52–65/66 at 1P.

Table 2 Identification of phosphopeptides of caprine αS1, αS2, β, κ caseins and α-lactalbumin and β-lactoglobulin by nano-LC-MS/MS.

Similarly, LALBA peptide 78–89 showed 2P and 1P variation at D82, D83; peptide 113–127, 114–127 showed 1P at D116 and peptide 118–127 showed 1P at Y122 resulting in one more possible isoform up to 4P. In BLG the highest level of variation in phosphorylation was observed in peptide 33–58 with 3P, 4P, 5P and 6P at sites T36, Y38, S39, S45, D46 and S48. Furthermore, the peptides 59–78, 143–153/156 showed 1P and 2P variations resulting in 13P isoform of BLG. These findings indicated the different levels of isoforms in phosphorylation of the caprine milk proteins.
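The nP isoform notation used above maps directly onto mass: each phosphate group adds one HPO3 unit, a monoisotopic shift of about 79.96633 Da. The sketch below illustrates how the expected mass and m/z of a peptide change with the number of phosphate groups. The peptide mass of 2000.0 Da is a hypothetical placeholder, not a value from this study, and the function names are illustrative.

```python
# Monoisotopic mass added per phosphate group (HPO3), in Da.
PHOSPHO_DELTA = 79.96633
# Monoisotopic mass of a proton, in Da, used to compute m/z of [M+zH]z+ ions.
PROTON_MASS = 1.007276


def isoform_mass(unmodified_mass: float, n_phosphate: int) -> float:
    """Neutral monoisotopic mass of a peptide carrying n phosphate groups."""
    return unmodified_mass + n_phosphate * PHOSPHO_DELTA


def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """m/z of the protonated ion for a given neutral mass and charge state."""
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical unmodified peptide mass (assumption for illustration only).
peptide_mass = 2000.0
for n in (1, 2, 3):
    mass = isoform_mass(peptide_mass, n)
    print(f"{n}P: mass = {mass:.5f} Da, m/z (2+) = {mass_to_charge(mass, 2):.5f}")
```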

Phosphorylation of low abundance proteins

The sensitivity for the identification of phosphorylation sites in low abundance proteins was enhanced by combining the 2DE and nLC-MS/MS approaches. Phosphorylation was identified in 45 low abundance proteins, which are presented in Table 3. CREB binding protein (5P), inter-alpha-trypsin inhibitor heavy chain H2 (2P), olfactory receptor OR51D1 (3P), GYCAM1 (1P), putative transcription factor (2P), proteoglycan 4 (8P) and other proteins with varying levels of phosphorylation were identified. The antiviral interferon tau BB4, found only in goat milk, was also detected with a 7P level of phosphorylation.

Table 3 Phosphorylation in low abundance proteins of goat milk identified in the reference database with accession number and gene name.

Isoform in different breeds

By comparing databases, 123 common UniProtKB IDs with varying levels of phosphorylation were identified across the 10 breeds. The phosphosites and phosphoproteins observed in each breed are presented in Table S3. CSN1S1 (Q8MIH4) 9P (S27, D58, S61, S63, T64, D66, Y95, Y106, S130) was identified in samples of Himalayan local goats, whereas the D100 and S138 phosphosites were observed in Osmanabadi goats. Similarly, CSN1S2 (P33049) 10P (S23, S24, S25, S145, T146, S147, T154, S159, S160, T161) was observed in Jamunapari goats. Phosphorylation at S151 and D156 was observed in Himalayan local goats and at T164 in Osmanabadi goats. CSN2 (Q95L76) showed 13P (S50, D62, Y129, T135, S137, T141, T143, D144, S157, S167, T169, S181, D197) in Jakhrana goats. CSN3 (Q7YRX4) 5P (Y16, S24, S60, T64, S95) was observed in Attapady Black goats. α-lactalbumin (P00712) exhibited 4P (D82, D83, D116, Y122) in Himalayan local goats, and β-lactoglobulin (P02756) showed 16P (D29, T36, Y38, S39, S45, D46, Y60, T67, T94, D114, T115, D116, Y117, T143, D147, D155) in Osmanabadi, 15P in Attapady Black and 13P in Gaddi goats; sites S48, D51 and S54 were also noted.
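The breed-specific site lists above can be tallied by residue type to confirm that each list matches its stated nP level. The following is a minimal sketch of that bookkeeping; the site strings are copied from the text, while the dictionary layout and the `tally` helper are illustrative assumptions rather than part of the original workflow.

```python
from collections import Counter

# Stated isoform level and site list for each protein, copied from the text above.
reported = {
    "CSN1S1 (Q8MIH4)": (9, "S27 D58 S61 S63 T64 D66 Y95 Y106 S130"),
    "CSN1S2 (P33049)": (10, "S23 S24 S25 S145 T146 S147 T154 S159 S160 T161"),
    "CSN2 (Q95L76)": (13, "S50 D62 Y129 T135 S137 T141 T143 D144 S157 S167 T169 S181 D197"),
    "CSN3 (Q7YRX4)": (5, "Y16 S24 S60 T64 S95"),
    "LALBA (P00712)": (4, "D82 D83 D116 Y122"),
    "BLG (P02756)": (16, "D29 T36 Y38 S39 S45 D46 Y60 T67 T94 D114 T115 D116 Y117 T143 D147 D155"),
}


def tally(sites: str) -> Counter:
    """Count phosphosites per residue letter (S, T, Y or D)."""
    return Counter(site[0] for site in sites.split())


for protein, (n_phosphate, sites) in reported.items():
    counts = tally(sites)
    # The number of listed sites should equal the stated nP level.
    assert sum(counts.values()) == n_phosphate
    print(protein, f"{n_phosphate}P", dict(counts))
```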

Similarly, the low abundance proteins were identified in different breeds with varying sites of phosphorylation. Proteoglycan 4 (PRG4) showed an 8P isoform, PIK3C2A and CEP152 showed 7P each, and SLC1A6 showed 1P in Barbari goats. Interferon tau BB4 (7P), CREBBP (5P) and cationic trypsin (8P) isoforms were observed in Jamunapari goats. PDZD9 and SMC5 exhibited 5P and 7P, respectively, in Osmanabadi goats. Emilin-2 (3P), FAM222B (4P), and MnSOD and OVOL1, each with 2P isoforms, were observed in Sirohi goats.

Other post translational modifications

Other post-translational modifications, such as oxidation and carbamidomethylation, were also observed and are presented in Table 4. Caseins showed only oxidation and phosphorylation. Peptides of LALBA and BLG showed both oxidation and carbamidomethylation in addition to phosphorylation. β-casein showed oxidation at 7 sites (M117, M124, M159, M160, M171, M172, M198), whereas α-S2 casein displayed 3 oxidation sites (M42, M157, M206). PTMs were observed at C80, C110, C130 and C139 on α-lactalbumin and at C84, C178, M42 and M163 on β-lactoglobulin. Low abundance proteins also exhibited oxidation and carbamidomethylation, such as interferon tau BB4 (C122, M127), laminin subunit LAMC2 (C493, C496, C514, C517, C519, C531), olfactory receptor OR2K2 (C97, C112, M101, M107, M118), ND5 (C279, C291, M277), proteoglycan (C34, C44, C46, C50, C56, C57) and keratin KRT1 (C49, M259, M262, M296); histone showed M60, M63 and/or C111 modifications.

Table 4 Oxidation and carbamidomethylation sites as observed in high and low abundance goat milk proteins.

Functional prediction and protein interaction analysis and bioactive peptides

The identified proteome was classified into the functional categories biological process (BP) (87.8% of genes), cellular component (CC) (97.6% of genes) and molecular function (MF) (92.7% of genes) using DAVID 6.8. The detailed annotations, including UP_Keywords from DAVID, are available in Table S4. Enriched GO terms for BP included single-multicellular organismal process (P value 3.63 E−08), cellular component organization (P value 9.51 E−10), response to stress (P value 1.27 E−07), anatomical structure development (P value 2.94 E−07), defense process (P value 4.99 E−07) and response to hypoxia (P value 2.57 E−04). The proteins were mostly localised in membrane-bound organelle (P value 1.72 E−06), extracellular region (P value 4.05 E−46) and extracellular exosome (P value 1.27 E−51). The molecular functions (MF) largely involved binding activities such as protein binding, DNA binding, nucleosome binding and anion binding; antioxidant activity (P value 2.94 E−04) and structural molecule activity (P value 2.51 E−39) were also observed. The UP_Keywords comprised antimicrobials, phosphoprotein, disease mutation and methylation.

Protein interactions were analysed in STRING. MCL clustering of the identified proteome grouped the proteins into 16 clusters at the highest confidence score (>90%), as depicted in Figure 1a. The network comprised three major groups, in which the keratin proteins formed a distinct, closely interconnected cluster. Proteins involved in defense response were also involved in response to stress, signal transduction and tissue development.

Figure 1

Functional prediction and protein-protein interaction analysis: MCL clustering by STRING for all the identified proteins (a); Cytoscape network analysis of PTM gene subset (b); MCL clustering by STRING for PTM protein subset (c); Signal transduction pathway from reactome database for PTM protein subset (d); functional attributes of identified bioactive peptides (e).

Functional analysis of PTM proteins

The GO functional annotations for PTM genes subset were enriched using DAVID 6.8 for classification into BP (87.5% genes), CC (94.4% genes) and MF (87.5% genes). The details of annotations are presented in Table S5. Enriched GO terms included developmental process (P value 0.02069), response to stress (P value 0.0099), response to drug (P value 0.0040), biogenesis (P value 0.048), structural molecular activity (P value 5.19 E−04), protein dimerization activity (P value 0.0092) and extracellular space (P value 3.00 E−06).

The phosphoprotein subset, when analysed in Cytoscape ClueGO, yielded 76 genes that generated a network with 6 functional groups at different levels. The details of the annotations are presented in Table S6, Supplementary data file 1. The functional groups were lactation, response to hydrogen peroxide, positive regulation of ossification, defense response to gram-positive bacterium, cornification and systemic lupus erythematosus. Keratin proteins were associated with cornification, and H2BC12, H2BC21 and KRT6A with antimicrobial humoral immune response mediated by antimicrobial peptide. Proteins BMPR2, NPPC and ZBTB16 were found in defense response to gram-positive bacteria, CAD, CSN2 and CSN3 in lactation, and neurofilament in structural constituent of cytoskeleton. The estrogen signaling pathway was detected, and proteins H3C1, H3C13 and H3C15 were associated with pathways such as alcoholism, histone modifications and systemic lupus erythematosus (Fig. 1b).

The PTM gene subset exhibited a network comprising 8 clusters in STRING (Fig. 1c). The keratin proteins clustered with PRSS1 (trypsin); CSN2, CSN1S1, CSN3 and LALBA were identified in a single cluster; and histone proteins interacted closely with CREB-binding protein.

The identifiers (PTM genes) were analysed in the Reactome database, where 61 identifiers were found and 332 pathways were hit by at least one of them. The majority of proteins were associated with signal transduction, immune system, developmental biology and metabolism pathways. Pathways such as signaling by nuclear receptors, WNT, Notch and RHO GTPase were significant at p < 0.05 (Fig. 1d). The 17 identified phosphoproteins involved in signal transduction pathways included BMPR2, CCR2, CFL1, CREBBP, CSN2, H3FA, HIST1H2BK, HIST2H2BE, HIST2H3A, HIST2H3D, HIST3H2BB, LAMC2, OR2K2, OR51D1 and PRG4. The details of the pathways are given in Table S7, Supplementary data file 1.

Identification of bioactive peptides

The 201 peptide sequences with identified PTM sites were grouped as long (>25 amino acids), medium (7–25 amino acids) and small (<7 amino acids). The identified phosphopeptides were categorized as long (62 peptides), medium (145 peptides) and small (1 peptide). The functions of these peptides were determined from MBPDB at 80% alignment with known peptides and are presented in Table 5. The αS1-casein peptide 106–115 was anxiolytic and peptide 107–116 was antimicrobial. The κ-casein peptide 46–55 exhibited C3a receptor agonist and opioid functions. β-lactoglobulin peptide 158–164 and β-casein peptide 123–128 showed multiple functions. β-casein 199–217 was anti-inflammatory and α-lactalbumin 118–127 was ACE inhibitory. The identified bioactive peptides exhibited anti-microbial, DPP-IV inhibitory, anti-inflammatory, ACE inhibitory, antioxidant, proliferating, anti-oxidative, opioid, anti-hypertensive, anxiolytic and hypocholesterolemic functions (Fig. 1e).
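The length grouping described above (long >25, medium 7–25, small <7 amino acids) amounts to a simple threshold rule. A minimal sketch is given below; the function name and the example sequences are hypothetical and serve only to illustrate the cut-offs stated in the text.

```python
def length_class(peptide: str) -> str:
    """Classify a peptide as long (>25), medium (7-25) or small (<7 residues)."""
    n_residues = len(peptide)
    if n_residues > 25:
        return "long"
    if n_residues >= 7:
        return "medium"
    return "small"


# Hypothetical sequences chosen only to exercise each length class.
examples = ["SSEESA", "SSEESAE", "A" * 25, "A" * 26]
for sequence in examples:
    print(len(sequence), length_class(sequence))
```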

Table 5 The functions of identified bioactive peptides from MBPDB.

Discussion

Non-bovine milk is attracting researchers' attention due to its nutritional and therapeutic applications. Goat milk is prominent in nutraceutical formulation and in drug development using the goat mammary gland as a bioreactor. Goat milk has unique chemical, biochemical, physical and nutritional characteristics and has higher digestibility and lower allergenicity than cow milk25,26. Post-translational modifications (PTMs) of milk proteins contribute to their biological functions and their compositional complexity27. It has been commonly observed that phosphorylation of casein occurs at S or T residues in the tripeptide sequence S/T-X-A, where X represents any amino acid residue and A is an acidic residue5. In the present study, we identified phosphorylation at S, T, Y and D residues and other post-translational modifications such as carbamidomethylation, oxidation and acetylation. Phosphosites on Asp residues are reported for the first time in goat milk. Post-translational modification of proteins plays an important role in regulating cellular processes, protein function, protein localization and the formation of protein complexes. In eukaryotic cells, protein phosphorylation is among the most frequent post-translational modifications28. Post-translational modifications in proteins are crucial for activity state, localization and protein-protein interactions29. Therefore, the molecular diversity of goat milk proteins needs to be explored to identify post-translational modifications and to study their potential biological roles and protein–protein interactions.
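The S/T-X-A consensus mentioned above can be located in a sequence with a short pattern search. The sketch below is an illustration only: it assumes the acidic residue A is limited to Asp (D) or Glu (E), it does not model phosphorylated residues acting as the acidic determinant, and the example sequence is hypothetical rather than taken from a caprine casein entry.

```python
import re

# S/T-X-A motif with A restricted to D or E (assumption).
# The lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=([ST].[DE]))")


def find_motif_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the S/T, matched tripeptide) for each hit."""
    return [(match.start() + 1, match.group(1)) for match in MOTIF.finditer(sequence)]


# Hypothetical sequence containing an SSSEE-like cluster.
example = "KSSSEESAE"
print(find_motif_sites(example))
```

With this restricted definition, the first serine of an SSSEE-like cluster is not reported, because the residue two positions downstream is a serine rather than D or E.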

The present study focused on obtaining a comprehensive profile of the PTM sites of goat milk casein and non-casein proteins and their interactions. Proteomics has been employed as a tool for the discovery and characterization of post-translational modifications such as phosphorylation and oxidation30,31,32. Electrospray ionization (ESI) mass spectrometry (MS) is suitable for studying PTMs, including phosphorylation and glycosylation, since the technique provides molecular mass determination of native proteins. In this study, we analysed goat milk proteins to identify both casein and non-casein PTM sites using nLC-MS/MS. We processed distinct protein spots by mass spectrometry for identification of phosphorylation, oxidation, acetylation and carbamidomethylation. A peptidome of 201 peptide sequences with post-translational modifications corresponded to 86 proteins/120 UniProtKB accessions (Table S2). The numbers of phosphorylation sites identified on serine, threonine, tyrosine and aspartic acid were 128, 70, 42 and 47, respectively. Serine was the most frequently phosphorylated residue in the present study, confirming earlier reports18.

Phosphorylation has been well characterized in the bovine caseins33,34. There are various reports targeting casein fractions in bovine and non-bovine milk by various proteomic approaches32,35,36,37,38,39. The majority of bovine caseins exist in a phosphorylated form, and the phosphorylated residues vary from individual variants possessing one phosphorylated residue (κ-CN) to 13P for others (αS2 CN)40. Bijl and others41 demonstrated that high αS1-CN-8P concentration in bovine milk is a great benefit for the production of uncooked curd cheese because αS1-CN-8P is hydrolyzed more efficiently by chymosin during ripening. Bovine proteome analysis showed more than 30 phosphorylated proteins which included 5 CSN2, 15 CSN1S1, 10 CSN1S2 and 4 CSN3 casein components42. Similarly, donkey milk showed 11 CSN3, 6 CSN1S1 and 3 CSN1S2 casein components43.

Goat milk proteins have been analysed using MS/MS44,45,46,47,48,49. The present study yielded 105 phosphopeptides from casein and non-casein proteins across all the analysed samples. The conserved peptide sequence (SSSEE) in casein was also observed in CSN1S1 and CSN1S2 at 1P, 2P and 3P. Phosphosites were identified in casein and whey proteins and in 45 other low abundance proteins. The total numbers of phosphosites observed in the major milk proteins at S, T, Y and D residues were 32, 18, 11 and 21, respectively. The numbers of phosphorylation sites observed for CSN1S1, CSN1S2, CSN2 and CSN3 were 11, 13, 17 and 6, respectively (Table 1). Among the whey proteins, BLG showed 19 phosphosites (7D, 3Y, 5T, 4S) and LALBA showed 4 phosphosites (3D, 1Y). The identified phosphopeptides resulted in 12P, 8P, 11P, 5P, 13P and 4P isoforms of CSN1S1, CSN1S2, CSN2, CSN3, BLG and LALBA, respectively. Beta casein showed the highest degree of variation in phosphorylation, with 17 phosphosites identified along with other PTMs such as oxidation in the identified peptides. A higher number of casein phosphopeptides and phosphorylation sites is reported in the present study than in the previous study18.

Casein and whey proteins are post-translationally modified through proteolysis by milk enzymes, formation of disulphide bonds by oxidation of cysteine, differential phosphorylation of serine and threonine, and glycosylation of threonine residues50. The degree of phosphorylation of αS-casein is a prime factor affecting the technological properties of milk. Accordingly, "signature peptides" and "caseome" analysis are being used to investigate adulteration in milk of different species51. Cheese from different species has been authenticated using proteolytic peptides52,53. Proteome analysis of fermented milk products should therefore be carried out, given their nutritional, health and economic importance.

The present study reported 45 non-casein phosphoproteins assigned to various metabolic pathways (Table 3). The identified PTM sites varied among milk samples of different goat breeds (Table 4). Identification of low abundance proteins in milk is difficult, as single-step analysis fails to detect a large proportion of these proteins. Moreover, to overcome the limited number of entries in the caprine database, other reference databases were used for identification of low abundance proteins. The varying levels and sites of phosphorylation in different breeds may be attributable to physiological or environmental conditions under the influence of different agro-climatic regions. The phosphopeptides assigned to these proteins were mainly mono- or bi-phosphorylated (Table 2).

The other post translational modifications such as acetylation, oxidation, and carbamidomethylation have also been reported (Table 4). The post translational modifications play an important role in protein/peptide functioning and their interaction. N-terminal acetylation increases peptide stability by preventing N-terminal degradation54. Peptides with carbamidomethylation are mainly used in peptide mass fingerprinting for identification and characterization of proteins55.

In the present study, protein-protein interactions were analysed to understand the functional properties of the proteins (Fig. 1). GO annotations were analysed using DAVID, networks and interactions using Cytoscape and STRING, and pathways using the Reactome database. The keratin proteins interacted with trypsin (PRSS1), and histone proteins interacted with CREB binding protein (CREBBP). The identified phosphoproteins were associated with lactation, response to stress, histone modification, cornification and signaling pathways (Tables S5, S6, S7). It has been reported that caseins are associated with other secreted calcium (phosphate)–binding phosphoproteins, such as osteopontin, in milk56. Protein phosphorylation is vital for the regulation of metabolism, proliferation, inflammation, apoptosis, signaling and other important physiological processes. Autophosphorylation increases the catalytic efficiency of the receptor and provides binding sites for the assembly of downstream signaling complexes57. Caseins form micelles, which vary from species to species and, when cleaved, generate bioactive peptides with potential functions that make them proteins of interest. This degradation may be the consequence of gastrointestinal enzymatic hydrolysis, fermentation and other processes used in dairy production58.

The identification and characterization of phosphorylation sites are required to explore the signaling networks of milk proteins. Phosphorylation-site data provide definitive information on functional relationships between signaling proteins. The peptides released by enzymatic hydrolysis have specific biological functions owing to their functional properties and interactions at the cellular level59. The identified phosphorylated bioactive peptides were mainly anti-microbial, followed by ACE inhibitory, DPP-IV inhibitory and proliferating functions. Anti-oxidative, antioxidant, anxiolytic and hypocholesterolemic peptides were also confirmed from goat milk proteins (Fig. 1e). The protein fractions of non-bovine milk are being extensively investigated for their health potential, economic value and bioactive components/peptides.

Goat milk protein genotypes have been observed in different breeds. Protein and casein content depend on allelic variants and breeds in different regions60,61,62,63. The CSN1S1 gene acts as a natural model, and different genotypes occur due to interallelic combinations64. Indian goats have a higher frequency of A and B alleles65,66. Saanen, Alpine and other European goat breeds have a higher frequency of medium alleles such as C, D, E and F60,61,67,68. Interallelic combination in the casein complex therefore leads to differential protein synthesis as well as differences in other processing properties. Consequently, including different breeds/genotypes is expected to affect proteome identification and the pattern of post-translational modifications.

Conclusion

Proteomic analysis has been carried out to study human, bovine and non-bovine milk for nutritional and therapeutic applications. Phosphorylation has been well characterized using several technical, practical and bioinformatics approaches. The present study identified 201 peptides showing post-translational modifications in goat milk. Phosphorylation at Asp residues is reported for the first time in goat milk proteins. The rare conserved peptide sequence (SSSEE) in casein was observed in casein phosphopeptides (CSN1S1, CSN1S2). Phosphorylation regulates protein functions such as biological activity, interaction and stabilization by causing conformational changes in the protein. Therefore, the identification of post-translational modifications in the milk proteome has become necessary and may provide new insight to extend the milk proteome and its potential biological role.

Materials and methods

The workflow of the present study is depicted in Fig. 2.

Figure 2

Flow diagram showing the methodology of PTM analysis.

Milk collection and description of genetic stock

Goat milk samples were collected from 1240 animals belonging to 10 goat breeds/genotypes during postpartum days 30–65. Milk samples were collected from the natural habitats of the breeds, which belong to different geographical and agro-climatic regions of India. Samples were collected between 7:30 and 9:00 hrs. by hand milking. After disinfecting the udder, 30–40 ml of milk was collected directly into the collection tubes, transported to the laboratory at 4°C within 24 hrs. and stored at -20°C. Milk subsamples for protein analysis were stored at -40°C until further analysis. All sample collection was conducted in accordance with institutional practice, and the study was approved by the Institutional Animal Ethics Committee (IAEC).

The breeds analysed in the study belonged to arid, semi-arid, humid, coastal and mountain regions with different grazing conditions (Fig. 3). The details of samples collected with the description of natural habitats from each breed are presented in Table S1, Supplementary data file 1. The animals were apparently healthy and the body condition score was satisfactory (3–4). The animals are mainly reared under semi-intensive system depending mostly on field grazing and supplementation of dry fodder, concentrate and mineral mixture.

Figure 3

Geographical location of goat milk sampling. The breeds are abbreviated in the region of geographical origin. GD, Gaddi; HL, Himalayan local; JK, Jakhrana; B, Barbari; J, Jamunapari; S, Sirohi; BB, Black Bengal; GN, Ganjam; OS, Osmanabadi; AT, Attapady Black.

Gel based milk protein analysis

The milk protein variants were analysed by SDS-PAGE, and details have been described elsewhere44. The skimmed milk samples were centrifuged at 12,000 g and 4°C for 15 min to obtain a clear, transparent aqueous layer. This layer was separated, quantified and reduced in Laemmli's sample buffer for analysis by SDS-PAGE. Milk proteins were resolved on 14% SDS and urea PAGE using 8 µg in each lane. The gels were stained with Coomassie brilliant blue (R250) and subsequently scanned in the gel documentation system (Alpha Innotech Corporation, USA). Milk protein variants were identified by comparing the allelic variation with reference samples (confirmed by sequencing) and by determining the molecular weight.

The protein variants (n=21) identified in the 10 breeds using SDS-PAGE were selected for analysis by 2DE. Individual protein samples were subjected to in-gel rehydration of IPG strips (SERVA IPG Bluestrip 3–10 NL/7cm). Rehydration was carried out at room temperature for 16 hrs. with the protein sample diluted in rehydration buffer (8M Urea, 0.002% bromophenol blue, 2% w/v CHAPS, 3 mg dithiothreitol, 1.5% ampholyte pH 3–10). For first dimension electrophoresis, IPG strips were transferred to Hoefer IPG phor II. Isoelectric focusing (IEF) was then performed at 20°C in the following series of steps: constant 250 V, 1:00 hrs.; constant 500 V, 1:00 hrs.; gradient 1000 V, 1:00 hrs.; gradient 3000 V, 2:00 hrs.; constant 3000 V, 2:00 hrs. The strips were then equilibrated in SDS equilibration buffer (6M Urea, 2% Tris HCl, 0.002% Bromophenol Blue) for 20 min and loaded onto a 14% acrylamide gel for second dimension resolution. The second dimension run was carried out with a mini gel SE 260 at 4V/cm. The 2DE protein spots were visualised by staining with 0.2% silver nitrate solution.
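The IEF program above can be summarised as a total run time and an approximate number of volt-hours. The sketch below is a rough estimate under one stated assumption: each gradient step is treated as a linear ramp from the previous step's voltage to its target voltage, which the text does not specify. The step values are taken from the text; the function and variable names are illustrative.

```python
# (mode, target voltage in V, duration in h), in run order, from the text above.
ief_program = [
    ("constant", 250, 1.0),
    ("constant", 500, 1.0),
    ("gradient", 1000, 1.0),
    ("gradient", 3000, 2.0),
    ("constant", 3000, 2.0),
]


def volt_hours(program) -> float:
    """Approximate total volt-hours, assuming linear ramps for gradient steps."""
    total = 0.0
    previous_voltage = 0
    for mode, target_voltage, hours in program:
        if mode == "gradient":
            # Linear ramp (assumption): use the mean of start and end voltage.
            total += 0.5 * (previous_voltage + target_voltage) * hours
        else:
            total += target_voltage * hours
        previous_voltage = target_voltage
    return total


total_hours = sum(hours for _, _, hours in ief_program)
print(total_hours, volt_hours(ief_program))
```

Under this assumption the program runs for 7 h in total; the actual volt-hours delivered would depend on the instrument's ramp profile and current limits.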

Nano-liquid chromatography mass spectrometry (nLC-MS/MS)

Details of the nLC-MS/MS analysis have been described elsewhere44; the procedure was used with minor modifications for PTM analysis.

(a) Protein digestion

The silver-stained protein spots were carefully excised from the gels with a sterilized spatula and transferred into separate vials. Destaining and acetone precipitation were carried out. For in-gel digestion, samples were reduced with 5 mM TCEP (tris(2-carboxyethyl)phosphine) at 55 °C for 1 h and then alkylated with 50 mM iodoacetamide for 30 min at room temperature in the dark. The gel spots were shrunk with acetonitrile and air-dried for a few minutes at room temperature, followed by digestion with trypsin (1:50 trypsin/lysate ratio) for 16 h at 37 °C. Digests were dried in a speed vac for 1 h and the pellet was dissolved in buffer A (5% acetonitrile, 0.1% formic acid). Digests were cleaned with Sep-Pak. A titanium-based method was used for phosphopeptide binding. Digests were cleaned again using a C18 silica cartridge (The Nest Group, Southborough, MA) following the manufacturer's protocol and dried in a speed vac. The desalted, dried pellet was reconstituted in buffer A.

(b) Liquid chromatography mass spectrometry analysis

All experiments were performed using an EASY-nLC 1000 system (Thermo Fisher Scientific) coupled to a Q Exactive mass spectrometer (Thermo Fisher Scientific, Germany) equipped with a nano-electrospray ion source. The peptide mixture (1.0 µg) was loaded on a precolumn and resolved using a 5 cm PicoFrit column (360 µm outer diameter, 75 µm inner diameter, 10 µm tip) filled with 1.9 µm C18 resin (Dr. Maisch, Germany). The peptides were loaded with buffer A and eluted with a 0–40% gradient of buffer B (95% acetonitrile, 0.1% formic acid) at a flow rate of 500 nL/min for 10 min. The Q Exactive was operated in Top10 HCD data-dependent acquisition mode, dynamically choosing the 10 most abundant precursor ions from each survey scan, with a full-scan resolution of 70,000 at m/z 400. MS/MS scans were acquired at a resolution of 17,500 at m/z 400. The lock mass option was enabled for polydimethylcyclosiloxane (PCM) ions (m/z = 445.120025) for internal recalibration during the run.
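The precursor selection step of a data-dependent Top10 method can be illustrated with a minimal sketch. It only ranks the peaks of one survey scan by intensity and returns the most abundant ones; real instrument logic (charge-state filtering, intensity thresholds and similar settings) is not described in the text and is not modelled here. The function name `select_top_n` and the scan representation are hypothetical.

```python
def select_top_n(survey_scan, n=10):
    """Return the m/z values of the n most intense peaks in a survey scan.

    survey_scan: list of (mz, intensity) tuples from one full MS scan.
    The result is ordered from most to least intense.
    """
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:n]]


if __name__ == "__main__":
    # Made-up peaks for illustration only.
    scan = [(400.2, 5.0e4), (523.8, 9.1e5), (611.3, 2.4e5), (702.9, 7.7e3)]
    print(select_top_n(scan, n=2))
```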

(c) Protein identification and PTM analysis

The .raw files were analyzed using Proteome Discoverer (v2.2) against an in-house UniProt reference proteome database (Capra hircus, Ovis aries, Homo sapiens and Bos taurus). Because few protein annotations are available for goat, the search was carried out against the goat, human, sheep and cow databases, containing 7056, 20117, 27666 and 23869 entries, respectively. For the Sequest search, the precursor and fragment mass tolerances were set at 10–15 ppm and 0.5 Da, respectively. Enzyme specificity was set to trypsin/P (cleavage at the C terminus of K/R unless followed by P), with a maximum of 2 missed cleavages. Carbamidomethylation of cysteine was set as a fixed modification; oxidation of methionine, N-terminal acetylation and phosphorylation (S, T, Y, D) were considered as variable modifications. The peptide-spectrum match and protein false discovery rates (FDR) were both set to 0.01.
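The search parameters above can be made concrete with a small sketch of in silico digestion and a precursor tolerance check. It is not the Sequest or Proteome Discoverer implementation; it simply applies the cleavage rule as worded in the text (C-terminal to K/R unless followed by P) with up to 2 missed cleavages, and tests whether an observed m/z lies within a ppm window. The function names and the example sequence are hypothetical.

```python
def cleavage_sites(sequence):
    """Positions after which cleavage occurs: C-terminal to K or R,
    unless the next residue is P (rule as worded in the text)."""
    return [
        i + 1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]


def digest(sequence, max_missed=2):
    """Return all peptides with at most max_missed missed cleavages."""
    if not sequence:
        return []
    bounds = [0] + cleavage_sites(sequence) + [len(sequence)]
    last = len(bounds) - 1
    peptides = []
    for start in range(last):
        # end - start - 1 is the number of missed cleavages in the peptide.
        for end in range(start + 1, min(start + 1 + max_missed, last) + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


def within_ppm(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed m/z is within tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm


if __name__ == "__main__":
    # Made-up sequence for illustration; R followed by P is not cleaved.
    for peptide in digest("AGKSTRPLLKEE", max_missed=2):
        print(peptide)
```

With `max_missed=0` only the fully cleaved peptides are returned; raising the limit adds the longer peptides that span one or two uncleaved sites.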

Functional analysis

All identified milk proteins were assigned gene symbols via the UniProt Knowledgebase (http://www.uniprot.org/). The identified proteome and the PTM gene subset were classified by their functional annotations using Gene Ontology (GO) terms for biological process, subcellular localization and molecular function in the Database for Annotation, Visualization and Integrated Discovery (DAVID), version 6.8. Enrichment of the identified gene names was assessed against the Homo sapiens background using Fisher's exact test in DAVID.
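The idea behind the enrichment test can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution. This is a plain textbook version for illustration; DAVID's own scoring may differ in detail, and the function name and argument names below are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    hits:            genes in the input list annotated to the GO term
    list_size:       genes in the input list
    term_size:       background genes annotated to the GO term
    background_size: all genes in the background

    Returns P(X >= hits) for X drawn from the hypergeometric distribution.
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(enrichment_p_value(hits=8, list_size=50, term_size=200,
                             background_size=20000))
```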

The analysis of the PTM subset was performed using Cytoscape v.3.8.1. Protein interaction networks, biological pathways and protein clusters were generated with the ClueGO + CluePedia plug-ins, using Homo sapiens as the reference database. Analyses were carried out at a significance level of 0.05 using a hypergeometric test with Benjamini-Hochberg false discovery rate correction.
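The Benjamini-Hochberg correction applied to the term-level p-values can be sketched as follows. This is the standard step-up procedure written out for illustration, not code taken from ClueGO; the function name is hypothetical and the p-values in the usage example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p), and
    monotonicity is enforced by taking a running minimum from the
    largest rank downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        flag = "significant" if q <= 0.05 else "not significant"
        print(f"p = {p:.2f}  adjusted = {q:.4f}  {flag}")
```

Terms whose adjusted value is at or below 0.05 would be retained at the significance level stated above.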

Protein interaction networks were analysed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, version 11.0). STRING networks were calculated at the highest confidence score (0.900) for the entire set of milk proteins and for the PTM protein subsets, and interactions were clustered using the MCL algorithm. The gene search for the networks was performed against the Homo sapiens reference database.
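The two steps described here, filtering interactions at a confidence cutoff and clustering the remaining network, can be sketched in a few lines. This is a bare-bones Markov clustering (MCL) loop for illustration, not the STRING implementation. It assumes scores on a 0 to 1 scale and an inflation parameter of 2.0, neither of which is given in the text; node labels and scores in the usage example are made up.

```python
def filter_edges(edges, min_score=0.900):
    """Keep (node_a, node_b, score) edges at or above the cutoff."""
    return [(a, b, s) for a, b, s in edges if s >= min_score]


def _normalize_columns(matrix):
    """Scale each column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(nodes, edges, inflation=2.0, iterations=30):
    """Minimal Markov clustering; returns a set of frozensets of nodes."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Start from self-loops, then add the weighted, undirected edges.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b, score in edges:
        m[index[a]][index[b]] = score
        m[index[b]][index[a]] = score
    m = _normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        m = _normalize_columns([[v ** inflation for v in row] for row in m])
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return clusters


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D", "E", "F"]
    edges = [("A", "B", 0.95), ("B", "C", 0.95), ("A", "C", 0.95),
             ("D", "E", 0.99), ("E", "F", 0.99), ("D", "F", 0.99),
             ("C", "D", 0.50)]  # below the cutoff, removed by the filter
    print(mcl(nodes, filter_edges(edges)))
```

In this toy network the low-confidence edge between the two groups is removed by the cutoff, so the clustering recovers the two triangles as separate clusters.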

The modified peptides were grouped by length as small (<7 AA), medium (7–25 AA) and long (>25 AA). Known functions of these peptides were retrieved from the Milk Bioactive Peptide Database (MBPDB)69 at a threshold of 80% identity match.
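The length grouping is simple enough to state as code. The sketch below applies the three bins exactly as defined above; the function name is hypothetical, and the MBPDB identity search itself is not reproduced.

```python
def length_class(peptide):
    """Classify a peptide sequence by length:
    small (<7 AA), medium (7-25 AA) or long (>25 AA)."""
    n = len(peptide)
    if n < 7:
        return "small"
    if n <= 25:
        return "medium"
    return "long"


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    for seq in ["AGKSTR", "AGKSTRPLLKEE", "A" * 30]:
        print(len(seq), length_class(seq))
```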