Identification of the initial reactive sites of micellar and non-micellar casein exposed to microbial transglutaminase

To investigate the influence of the internal micellar structure on the course of enzymatic cross-linking, especially in the initial phase of the reaction, casein micelles isolated from raw milk by ultracentrifugation were incubated with microbial transglutaminase (mTG) and compared with non-micellar sodium caseinate. Reactive lysine and glutamine residues were identified using a label-free approach based on the detection of isopeptides in tryptic hydrolysates by targeted HRMS and manual inspection of fragmentation spectra. The identified reactive sites were further weighted by tracking isopeptide formation after incubation times of 15, 30, 45 and 60 min. Fifteen isopeptides formed in the early stage of mTG cross-linking of caseins were identified, and the lysine and glutamine residues involved in each were specified. The results revealed lysine K176 and glutamine Q175 of β-casein as the most reactive residues; given the variety of reaction partners identified in this study, they might be located in a highly flexible region of the molecule. Except for the isopeptide αs1 K34–αs2 Q101, which was found only in sodium caseinate (SC), all reactive sites were detected in both micellar and non-micellar casein, indicating that the initial phase of enzymatic cross-linking is not affected by the micellar aggregation of caseins.


Introduction
Microbial transglutaminase (mTG) is an enzyme that catalyzes an acyl transfer from a glutamine moiety to a primary amine. When lysine residues serve as the amine, the reaction results in the three-dimensional cross-linking of proteins. This reaction is widely used in the food industry to modify peptides/proteins and thus optimize their functionalities [1][2][3][4]. For instance, enzymatic cross-linking of milk proteins increases the viscosity and stability of acid-induced dairy gels and thus decreases the syneresis of yogurt gels [5][6][7]. Furthermore, casein micelles, the naturally formed nanocarriers of milk, show higher stability against dissociating agents or pressure after enzymatic cross-linking by mTG [8,9].
At the molecular level, αs-caseins are expected to be better substrates for mTG than β-casein and κ-casein because of the number of reactive residues in their amino acid sequences, in particular lysine (αs1/αs2/β/κ = 14/24/11/9), while glutamine is abundant in all caseins (αs1/αs2/β/κ = 14/16/20/15). However, in casein micelles, αs-caseins are considered to be located predominantly in the interior of the micellar structure, which results in lower accessibility for mTG and an inverted order of reactivity in micellar compared with non-micellar casein [10,11]. Several research groups have already evaluated the sites of individual caseins that are reactive in enzymatic cross-linking [12][13][14]. Usually, labeling substances such as [14C]-putrescine, [14C]-monodansylcadaverine and N-(glucose-glucose-glucitol-1)-cadaverine, which participate in the enzymatic reaction as acyl donor or acceptor, were used. Afterward, the hydrolyzed protein samples were screened for correspondingly labeled peptides to identify reactive sites [12][13][14]. Applied to complex protein associates such as casein micelles, this approach would cause several problems, because the labeling substance interferes with the intermolecular protein network. Hence, to the best of our knowledge, reactive sites for enzymatic cross-linking by mTG have so far been evaluated only in individual caseins or in sodium caseinate, but not in micellar casein (MC).
The primary aim of our study was to use a label-free approach to investigate reactive sites within casein micelles while maintaining their natural structure. To preserve this structure, casein micelles were isolated from raw milk using the procedure standardized and optimized in our working group, as previously reported and evaluated [10,15,16]. Furthermore, the direct identification of the formed isopeptides was expected to provide new insights into enzymatic cross-linking as well as into casein micelle structure, because it identifies both moieties (i.e., lysine and glutamine) involved in each cross-link.
To restrict the results of this study to the most reactive ("hottest") sites of casein, we focused on the initial phase of protein cross-linking using mild reaction conditions. Because the reactivity of individual caseins in casein micelles had previously been investigated mainly at a high degree of cross-linking [10,17,18], our approach additionally provides information about the influence of the micellar structure on enzymatic cross-linking at the very beginning of the reaction.
The apparatus Bi 18E from QCS GmbH (Maintal, Germany) was used for all experiments.

Preparation of casein suspension
Caseins were isolated from the same batch of raw milk to exclude genetic variations between the samples. In the first step, raw milk was defatted by centrifugation at 2000×g for 30 min at 4 °C and subsequently divided into two parts: one for production of sodium caseinate and the other for isolation of casein micelles.
Sodium caseinate was prepared according to Recio and Olieman [19]. Briefly, caseins were precipitated by acidifying skim milk with 1 M sodium acetate buffer (pH 4.3), separated from whey by centrifugation (2000×g, 20 min, 4 °C) and redissolved in deionized water while adjusting the pH to 7.0 with 1 M sodium hydroxide. To completely remove whey proteins, this procedure was performed three times. In an additional defatting step, the precipitated casein was washed with 1 L of acetone and then with 2 × 1 L of ethanol, and dried under a nitrogen stream overnight. To convert the dried acid casein into sodium caseinate, the precipitate was dissolved in deionized water, the pH was adjusted to 7.0 with 1 M sodium hydroxide, and the solution was lyophilized to obtain a storable powder.
Isolation of casein micelles was performed as reported in our previous study [10]. Defatted raw milk was lyophilized to obtain a storable skim milk powder, which could then be reconstituted in portions with bi-distilled water. MC was separated from whey as a pellet by ultracentrifugation at 100,000×g for 1 h at 20 °C using an Optima XE-90 ultracentrifuge equipped with an SW41-Ti rotor (both from Beckman Coulter, Krefeld, Germany). To remove remaining whey proteins, the pellets were washed twice with simulated milk ultrafiltrate (SMUF), prepared according to Jenness and Koops [20] and preserved with 0.02% (w/v) sodium azide. Each washing step was followed by ultracentrifugation as above. For the experiments, MC pellets were suspended in SMUF using an ultrasonic homogenizer HD2070 with sonotrode MS72 (Bandelin electronics, Berlin, Germany) for 2 × 2 min at 75% power and then stirred overnight at 6 °C.
All experiments were performed with suspensions of MC or sodium caseinate standardized to 1% (w/v) protein concentration by protein determination via UV signal at 280 nm using commercial sodium caseinate for calibration.
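The standardization step can be sketched as a linear A280 calibration followed by a dilution calculation. This is a minimal illustration assuming a linear (Beer-Lambert) response over the calibration range; the calibration points, the sample reading and the 1:20 pre-dilution are hypothetical, and the function names are illustrative only.

```python
"""Sketch: standardize casein suspensions to 1% (w/v) protein from A280.

Assumes a linear calibration with a commercial sodium caseinate standard;
all numbers below are hypothetical.
"""


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_a280(a280, slope, intercept, dilution=1.0):
    """Invert the calibration line; `dilution` corrects for pre-measurement dilution."""
    return (a280 - intercept) / slope * dilution


def diluent_volume_ml(conc_percent, sample_ml, target_percent=1.0):
    """Volume of SMUF to add so that the suspension reaches the target % (w/v)."""
    if conc_percent < target_percent:
        raise ValueError("sample is already below the target concentration")
    return sample_ml * (conc_percent / target_percent - 1.0)


# Hypothetical calibration: standard concentration (% w/v) vs. A280
std_conc = [0.025, 0.05, 0.075, 0.10]
std_a280 = [0.21, 0.42, 0.63, 0.84]
slope, intercept = fit_line(std_conc, std_a280)

# Hypothetical sample reading after a 1:20 dilution
c = concentration_from_a280(0.63, slope, intercept, dilution=20)
print(f"stock: {c:.2f}% (w/v); add {diluent_volume_ml(c, 10.0):.1f} mL SMUF per 10 mL")
```

In practice, a sample reading outside the calibrated absorbance range would need a different pre-dilution rather than extrapolation.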

Enzymatic cross-linking
The most important ("hottest") reactive sites of caseins were identified in an early stage of enzymatic cross-linking, characterized by the formation of a small amount of mainly dimeric proteins and as few trimeric or higher cross-linked species as possible. Therefore, mild reaction conditions were chosen, with an enzyme-substrate ratio of 1:80 (1 U/g casein) and an incubation time of 1 h. Cross-linking was carried out at 40 °C, the temperature optimum of the enzyme, and at pH 6.8, the pH optimum of casein micelles. To preserve the natural casein micelle structure, all samples were solubilized in SMUF. The enzyme reaction was terminated by heating the samples at 85 °C for 7 min. As a blank, SMUF was added to casein samples instead of mTG solution, and these samples were incubated in the same way.
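The enzyme dose for the mild conditions can be worked out from the stated values. This sketch assumes that the 1:80 enzyme-substrate ratio refers to mass (w/w) of enzyme preparation to casein; combined with 1 U/g casein, that reading implies a preparation activity of roughly 80 U/g, which is an inference rather than a value stated in the source.

```python
"""Sketch: mTG dose for the mild cross-linking conditions (1 U per g casein).

PREP_ACTIVITY_U_PER_G is an assumption derived from reading 1:80 as w/w.
"""

CASEIN_PERCENT = 1.0          # % (w/v), standardized protein concentration
DOSE_U_PER_G_CASEIN = 1.0     # mild conditions used in the study
PREP_ACTIVITY_U_PER_G = 80.0  # assumption: implied by 1:80 (w/w) at 1 U/g


def mtg_dose(volume_ml):
    """Return (casein g, enzyme units, enzyme preparation mg) for a sample volume."""
    casein_g = volume_ml * CASEIN_PERCENT / 100.0
    units = casein_g * DOSE_U_PER_G_CASEIN
    prep_mg = units / PREP_ACTIVITY_U_PER_G * 1000.0
    return casein_g, units, prep_mg


casein_g, units, prep_mg = mtg_dose(10.0)
print(f"{casein_g:.2f} g casein -> {units:.2f} U -> {prep_mg:.2f} mg preparation")
```

If the preparation used has a different specific activity, only `PREP_ACTIVITY_U_PER_G` changes; the dose in U/g casein stays fixed.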
After incubation, sodium caseinate was dialyzed (molecular weight cutoff 14,000 Da) and subsequently freeze-dried. Cross-linked MC was ultracentrifuged as described above. The supernatant, containing extramicellar casein, was removed, dialyzed and lyophilized. The MC pellet was suspended in 2 mL of bi-distilled water and subsequently lyophilized. All freeze-dried samples were then subjected to size exclusion chromatography (SEC) as described below.
To weight the importance of the different reactive sites, casein samples were additionally incubated for 15, 30, 45 and 60 min under the same conditions and analyzed without SEC purification.

Semi-preparative purification of dimeric casein species by size exclusion chromatography (SEC)
Semi-preparative SEC, based on a previously reported analytical method [21], was used to separate dimeric from monomeric proteins after mTG treatment. For this purpose, cross-linked and freeze-dried samples were solubilized to 0.2% protein in elution buffer (6 M urea, 0.1 M sodium chloride, 0.1 M sodium hydrogen phosphate dihydrate, 1.6 mM CHAPS) with an additional 1% DTT. For complete reduction of disulfide bonds, samples were kept overnight at 8 °C. After membrane filtration (1.2 µm), aliquots of 8 mL were injected into an NGC Quest 100 protein purification system consisting of two eluent pumps, one sample pump, conductivity and UV detectors and a BioFrac fraction collector, all from Bio-Rad Laboratories GmbH (Munich, Germany). Cross-linked and monomeric proteins were separated on a Superdex 200 HiLoad 26/60 prep grade column (GE Healthcare Life Sciences, Buckinghamshire, UK) by isocratic elution with the above elution buffer at 2 mL/min for 320 mL. Proteins were automatically collected in 7 mL fractions whenever the UV signal at 280 nm exceeded 10 mAU. Fractions containing dimeric proteins were pooled, dialyzed using cellulose dialysis tubing with a molecular weight cutoff of 14,000 Da (Sigma Aldrich, Steinheim, Germany) and freeze-dried for subsequent tryptic digestion.
The degree of oligomerization was calculated with the ChromLab software (Bio-Rad Laboratories GmbH, Munich, Germany) by relating the peak area of cross-linked species to the total area under the curve, as previously reported [22].
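The area ratio behind the degree of oligomerization can be sketched with trapezoidal integration of a UV trace. This is an illustration only: the trace and the elution volume separating oligomers from monomers are hypothetical, and the ChromLab software's own peak integration may differ. Larger (cross-linked) species elute before monomers in SEC, so the area up to the chosen cut-off counts as oligomer area.

```python
"""Sketch: degree of oligomerization from an SEC-UV trace.

Degree (%) = area of cross-linked species / total area under the curve.
The trace and the cut-off volume are hypothetical.
"""


def trapezoid(xs, ys):
    """Trapezoidal area under ys(xs)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def degree_of_oligomerization(volumes_ml, signal_mau, monomer_start_ml):
    """Percent of total area eluting at or before `monomer_start_ml`."""
    total = trapezoid(volumes_ml, signal_mau)
    early = [(v, s) for v, s in zip(volumes_ml, signal_mau) if v <= monomer_start_ml]
    vs, ss = zip(*early)
    return 100.0 * trapezoid(vs, ss) / total


# Hypothetical trace: small oligomer peak, then a larger monomer peak
volumes = list(range(11))
signal = [0, 0, 2, 2, 0, 0, 8, 8, 8, 0, 0]
print(f"{degree_of_oligomerization(volumes, signal, 4):.1f} %")
```

The cut-off is taken at a sampled point here for simplicity; with real data it would be set at the valley between the dimer and monomer peaks.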

Enzymatic hydrolysis
Freeze-dried samples were dissolved in TRIS buffer (0.1 M tris(hydroxymethyl)aminomethane, pH 7.8) to a protein concentration of 1 mg/mL. Aliquots of the sample solution were added to 0.02 mM trypsin, previously dissolved in 1 mM hydrochloric acid according to the instructions of the supplier (Sigma-Aldrich, Steinheim, Germany), to reach an enzyme-substrate ratio of 1:100. Enzymatic hydrolysis was carried out at 37 °C for 16 h and stopped by freezing the samples, followed by lyophilization.

Peptide sequencing via RP-HPLC-HRMS
Tryptic peptides and isopeptides were identified by reversed-phase high-performance liquid chromatography coupled with high-resolution mass spectrometry (RP-HPLC-HRMS). Lyophilized hydrolysates were dissolved in eluent A (0.1% formic acid in bi-distilled water) containing 0.1% DTT and membrane-filtered before analysis. Peptides were separated by RP-HPLC on an AdvanceBio Peptide Plus column (2.1 × 150 mm; 2.7 µm) and detected with a 6545 Q-TOF mass spectrometer, all from Agilent (Waldbronn, Germany). Elution used the gradient summarized in Table 1, with 0.1% formic acid in bi-distilled water as eluent A and 0.1% formic acid in acetonitrile as eluent B: an isocratic step at 98% A for 2 min, a gradient to 90% A within 5 min, to 80% A within 23 min, to 60% A within 10 min and to 5% A within 1 min, followed by an isocratic step at 5% A for 8 min, a gradient back to 98% A within 4 min and a post-run time of 18 min. The column oven temperature was set to 35 °C, and the flow rate was 0.2 mL/min. Peptides were ionized in positive mode with a Dual AJS ESI source (Agilent Jet Stream electrospray ionization). The mass spectrometer was tuned daily for the mass range up to m/z 3200 at 4 GHz in high-resolution mode. The fragmentor was set to 200 V, skimmer to 65 V, Oct1 RF Vpp to 750 V, nebulizer to 50 psig, VCap to 4000 V and nozzle voltage to 1000 V. The drying gas was used at 300 °C and 12 L/min, and the sheath gas at 350 °C and 11 L/min.
To identify as many peptides as possible simultaneously, the Auto-MSMS mode was used. Precursor selection was restricted to the three most abundant precursors per scan, with a preference for highly charged peptides; selected precursors were excluded after three spectra and released after 0.5 min. All precursors were fragmented at a collision energy of 20 eV. Spectra were acquired at 2 spectra/s over m/z 100-3200 in MS mode and m/z 50-3200 in MS/MS mode.
Peptides were identified with PEAKS Studio 8.5 (Bioinformatics Solutions Inc., Waterloo, Canada) using the UniProtKB/Swiss-Prot database (https://www.uniprot.org/). Each identified peptide was then defined in the software as a marker, so that the software could automatically search for isopeptides formed by enzymatic cross-linking. Marker masses were calculated by subtracting the monoisotopic mass of ammonia (17.0264 Da) from the original peptide mass, because the formation of each isopeptide bond by mTG releases one molecule of ammonia. All markers originating from αs1-, αs2-, β- and κ-casein are listed in Tables S1, S2, S3 and S4 of the supporting information, respectively. Phosphorylated, deamidated and dehydrated residues were also detected by the software via mass differences of +79.97, +0.98 and −18.01 Da, respectively. In addition, the data analysis allowed non-specific cleavage at both ends of the peptides and up to five missed cleavages, to account for the lower hydrolysis efficiency expected for cross-linked protein. The software results were then verified by manually searching for the isopeptide masses in the cross-linked samples and confirming their absence in the blank samples (non-cross-linked casein and the monomeric fraction from semi-preparative SEC). Furthermore, the fragmentation spectra of each isopeptide were improved by targeted analysis to ensure correct peptide sequencing: the identified peptide masses were used as precursors, and the collision energy was varied between 10 and 40 eV. The observed fragmentation pattern was then compared with the theoretical y- and b-ions.
To evaluate the importance of the identified isopeptides, caseins were incubated with mTG for different times (15, 30, 45 and 60 min) and subjected to tryptic digestion without prior SEC purification. Mass spectrometry was then carried out in scan mode using the same MS parameters as in Auto-MSMS mode. The previously identified isopeptide masses were extracted from the scans, and their peak areas were related to the peak area of the peptide αs1 F145-R151 (FYPELFR, m/z 486.2568; z = 2), which cannot be cross-linked by mTG because it contains neither glutamine nor lysine.
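The gradient program can be expressed as a timetable of (time, %A) breakpoints, with %B at any time obtained by linear interpolation. The breakpoints below are derived from the step durations given in the text; linear ramps between breakpoints and %B = 100 − %A (a two-eluent system) are assumptions of this sketch, and the function name is illustrative.

```python
"""Sketch: the RP-HPLC gradient as a timetable with linear segments."""

# (time in min, % eluent A), derived from the step durations in the text
GRADIENT = [
    (0.0, 98.0),   # start, isocratic
    (2.0, 98.0),   # end of 2 min isocratic step
    (7.0, 90.0),   # 5 min ramp
    (30.0, 80.0),  # 23 min ramp
    (40.0, 60.0),  # 10 min ramp
    (41.0, 5.0),   # 1 min ramp to wash conditions
    (49.0, 5.0),   # 8 min isocratic wash
    (53.0, 98.0),  # 4 min return to initial conditions
]
POST_RUN_MIN = 18.0


def percent_b(t_min):
    """Linearly interpolated %B (0.1% formic acid in acetonitrile) at time t."""
    if t_min <= GRADIENT[0][0]:
        return 100.0 - GRADIENT[0][1]
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            a = a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
            return 100.0 - a
    return 100.0 - GRADIENT[-1][1]  # initial conditions held during post-run


total_min = GRADIENT[-1][0] + POST_RUN_MIN
print(f"cycle time: {total_min:.0f} min; %B at 20 min: {percent_b(20.0):.2f}")
```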
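The precursor-selection rules can be sketched as a small data-dependent acquisition model with dynamic exclusion. This is a simplified illustration, not the instrument's algorithm: ranking by charge first and abundance second is an assumed interpretation of "highly charged peptides were preferred", and the m/z matching tolerance and the class name `Selector` are hypothetical.

```python
"""Sketch of top-3 precursor selection with dynamic exclusion.

Settings mirror the text: up to 3 precursors per cycle, exclusion after
3 MS/MS spectra, release after 0.5 min. Ranking rule and MZ_TOL are
assumptions.
"""
from dataclasses import dataclass, field

MAX_PRECURSORS = 3
SPECTRA_BEFORE_EXCLUSION = 3
RELEASE_AFTER_MIN = 0.5
MZ_TOL = 0.01  # hypothetical matching window (m/z)


@dataclass
class Selector:
    counts: dict = field(default_factory=dict)       # key -> MS/MS spectra acquired
    excluded_at: dict = field(default_factory=dict)  # key -> time exclusion began

    def _key(self, mz, z):
        return (round(mz / MZ_TOL), z)

    def select(self, t_min, peaks):
        """peaks: list of (mz, charge, abundance); returns chosen (mz, charge)."""
        # Release precursors whose exclusion period has elapsed
        for key, t0 in list(self.excluded_at.items()):
            if t_min - t0 >= RELEASE_AFTER_MIN:
                del self.excluded_at[key]
                self.counts[key] = 0
        # Prefer higher charge, then higher abundance (assumed ranking)
        ranked = sorted(peaks, key=lambda p: (-p[1], -p[2]))
        chosen = []
        for mz, z, _abundance in ranked:
            key = self._key(mz, z)
            if key in self.excluded_at:
                continue
            chosen.append((mz, z))
            self.counts[key] = self.counts.get(key, 0) + 1
            if self.counts[key] >= SPECTRA_BEFORE_EXCLUSION:
                self.excluded_at[key] = t_min
            if len(chosen) == MAX_PRECURSORS:
                break
        return chosen
```

Dynamic exclusion is what lets lower-abundance ions, such as low-level isopeptides, be fragmented once the dominant precursors have been sampled three times.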
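The mass bookkeeping behind the marker search can be sketched from standard monoisotopic residue masses: a marker is a peptide mass minus NH3, and an isopeptide is the sum of two peptide masses minus one NH3. The ammonia value is the one quoted in the text; modifications (phosphorylation, deamidation, dehydration) are ignored for brevity, and the function names are illustrative rather than PEAKS functionality.

```python
"""Sketch: peptide, marker and isopeptide masses and m/z values."""

RESIDUE = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
NH3 = 17.0264  # value used in the text


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[a] for a in seq) + WATER


def marker_mass(seq):
    """Peptide mass corrected for the NH3 released per isopeptide bond."""
    return peptide_mass(seq) - NH3


def isopeptide_mass(seq_gln, seq_lys):
    """Neutral mass of two peptides joined by one Gln-Lys isopeptide bond."""
    return peptide_mass(seq_gln) + peptide_mass(seq_lys) - NH3


def mz(neutral_mass, z):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z


m = isopeptide_mass("VLPVPQK", "HKEMPFPK")  # beta Q175 x beta K107/K113
print(f"{m:.4f} Da, z=4: m/z {mz(m, 4):.4f}")
print(f"FYPELFR z=2: m/z {mz(peptide_mass('FYPELFR'), 2):.4f}")
```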
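The normalization to the non-reactive reference peptide can be sketched as follows, with duplicate incubations averaged per time point. The peak areas are hypothetical, and the function names are illustrative.

```python
"""Sketch: isopeptide peak areas relative to the reference peptide FYPELFR.

Duplicate incubations per time point are averaged; all areas are hypothetical.
"""
from statistics import mean


def relative_area(iso_area, ref_area):
    """Isopeptide peak area divided by the reference peptide peak area."""
    return iso_area / ref_area


def time_course(replicates):
    """replicates: {time_min: [(iso_area, ref_area), ...]} -> {time_min: mean ratio}."""
    return {t: mean(relative_area(i, r) for i, r in reps)
            for t, reps in sorted(replicates.items())}


# Hypothetical duplicate measurements for one isopeptide
data = {
    15: [(2.0e4, 1.0e6), (2.4e4, 1.2e6)],
    30: [(5.0e4, 1.0e6), (4.6e4, 1.0e6)],
}
print(time_course(data))
```

Relating each isopeptide to a peptide that cannot take part in cross-linking compensates for run-to-run differences in injected amount and ionization.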

Statistical treatment
Incubation of caseins for 15, 30, 45 and 60 min was performed in duplicate.

Results and discussion
This study examines the influence of the micellar structure on the early stage of enzymatic cross-linking of caseins by mTG. To this end, MC and sodium caseinate (SC) were incubated with a low activity of mTG, so that only the most reactive sites were affected by enzymatic cross-linking, as indicated by the predominant formation of casein dimers in SEC (Fig. 1). After incubation, MC suspensions were ultracentrifuged to isolate extramicellar casein (EC) for separate analysis. As determined in our previously published study [10], EC accounts for 6.0-7.2% of total casein and contains all casein types, with β-casein as the major component.
The degree of oligomerization, calculated from SEC as the area of cross-linked molecules relative to the total area under the curve, was 14.9%, 13.5% and 13.9% in MC, EC and SC, respectively (Fig. 1). To enrich the isopeptides for HRMS analysis, dimer fractions were separated from monomer and trimer fractions by semi-preparative SEC. As a proof of concept, the collected fractions were analyzed by SDS polyacrylamide gel electrophoresis according to Schägger & von Jagow (1987) with Coomassie Brilliant Blue staining according to Radola (1980) to confirm the assignment of SEC peaks to oligomer types (see Supplementary Data: Fig. S4) [23,24]. The purified dimer fractions were digested with trypsin, and the resulting tryptic peptides were sequenced by HRMS.
As blanks, non-cross-linked MC and sodium caseinate were hydrolyzed in the same way, and the resulting peptides served as a reference for the peptides normally formed by tryptic digestion of caseins. These peptides were sequenced by HRMS in Auto-MSMS mode combined with the data analysis software PEAKS 8.5, reaching sequence coverages of 77%, 55%, 88% and 75% for αs1-, αs2-, β- and κ-casein, respectively. These ordinarily formed peptides were then defined as markers in the data analysis software. To account for the loss of ammonia during enzymatic cross-linking, the mass of ammonia (17.0264 Da) was subtracted from the original peptide masses. The marker peptides with their amino acid sequences, original and corrected monoisotopic masses, and a coverage map for each casein are given in the supplementary material (Tables S1-S4, Figure S1). Among these marker peptides, 103 contained glutamine and 63 contained lysine. Pairing every glutamine-containing marker with every lysine-containing marker therefore gives at least 103 × 63 = 6,489 possible dimer combinations, without considering posttranslational modifications or multiple glutamine or lysine moieties within one peptide.
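The combinatorial search can be sketched as a brute-force pairing of glutamine- and lysine-containing peptides, keeping a pair when its isopeptide mass matches an observed neutral mass within a ppm window. This is a simplified model of what the search software does, assuming no modifications and one cross-link per pair; the peptide inputs and observed mass are hypothetical, and `candidate_pairs` is an illustrative name.

```python
"""Sketch: brute-force Gln x Lys peptide pairing against observed masses."""
from itertools import product

NH3 = 17.0264  # Da, value used in the text


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def candidate_pairs(gln_peptides, lys_peptides, observed_masses, tol_ppm=20.0):
    """Return (gln_name, lys_name, observed, ppm) for every matching pair.

    gln_peptides / lys_peptides map a name to the neutral monoisotopic
    peptide mass (Da); one isopeptide bond removes one NH3.
    """
    hits = []
    for (qn, qm), (kn, km) in product(gln_peptides.items(), lys_peptides.items()):
        theoretical = qm + km - NH3
        for obs in observed_masses:
            err = ppm_error(obs, theoretical)
            if abs(err) <= tol_ppm:
                hits.append((qn, kn, obs, round(err, 2)))
    return hits


# 103 x 63 marker pairs, as counted in the text
n_pairs = sum(1 for _ in product(range(103), range(63)))
print(n_pairs)  # 6489

# Hypothetical inputs (masses rounded to 4 decimals)
gln = {"VLPVPQK": 779.4905}
lys = {"HKEMPFPK": 1012.5164, "hypothetical_K_peptide": 900.0}
print(candidate_pairs(gln, lys, [1774.9800]))
```

Every candidate from such a search still has to pass the manual checks described in the next subsection, since a mass match alone does not establish a cross-link.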

Identification of reactive sites
After the marker peptides had been defined, the software automatically detected isopeptides in the cross-linked samples. As discussed by Iacobucci and Sinz [25], automatic identification of peptide cross-links carries a high probability of false-positive results, and these authors recommended five guidelines to avoid misinterpretation of cross-links. Although our study differs from the studies they discussed, because no cross-linker substance was used, we likewise formulated requirements that a result of the automatic detection had to fulfil before it was accepted as a real, identified cross-link. Accordingly, we evaluated the software results manually and accepted them as isopeptides only if they met the following conditions: (1) the peptide mass was absent in blank samples, (2) the mass accuracy was within ±20 ppm, and (3) the amino acid sequence of the isopeptide was confirmed by targeted HRMS.
The data analysis procedure is illustrated by the following example. According to the software analysis, an isopeptide with m/z 444.7515 and a charge of z = 4 was identified in both micellar and free casein (Table S5, isopeptide #2). The absence of this mass at charge 4 in the blank samples was checked manually and confirmed (Fig. 2), meeting the first condition. The theoretical mass of this isopeptide is 1774.9799 Da. With a detected molecular mass of 1775.0060 Da, the mass accuracy is 14.7 ppm, meeting the second condition (within ±20 ppm). The software assigned this mass to a cross-link between two β-casein peptides, f(106-113) HKEMPFPK and f(170-176) VLPVPQK, specifically between β K107 and β Q175. Targeted HRMS was used to confirm the assigned amino acid sequence of the isopeptide and to pinpoint the reactive sites, especially in peptides with multiple lysine or glutamine moieties. In the isopeptide of interest, a cross-link could theoretically be present between β Q175 and β K107 or between β Q175 and β K113. Figure 3 shows the fragmentation spectrum of the isopeptide detected at m/z 444.7515. The type (y, b) and numbering of fragment ions follow Roepstorff and Fohlman [26]. The peptide chains of a cross-linked peptide are usually distinguished by Greek letters [25,27,28]; because these could easily be confused with αs- and β-caseins, we used Roman numerals instead. Table 2 compares the theoretical fragment ions for each cross-link option (β Q175-β K107 or β Q175-β K113) with the detected fragment ions, labeled in red or blue according to the respective peptide chain.
Because nearly the whole amino acid sequence of the isopeptide was covered by HRMS, the third condition is also met, and the detected peptide is confirmed as an isopeptide. Searching the fragmentation spectrum for linkage-specific ions revealed no specific ions for linkage β Q175-β K113, but six for linkage β Q175-β K107, which are highlighted in red in Table 2. Accordingly, targeted HRMS enabled reactive moieties to be distinguished from unreactive ones. All 15 isopeptides identified in this way are summarized in Table 3. Detailed analysis data, including detected m/z ratio, charge z, retention time, theoretical molecular mass of each isopeptide and mass accuracy in each casein sample, are given in the supplements (Tables S5-S6). In the last column of Table 3, results are graded according to the data quality obtained from the targeted analysis: 10 isopeptides with verified reactive sites and fragment ions from both peptide chains are marked with three stars; one isopeptide (#6) with fragment ions from both peptide chains but multiple possible reactive sites is marked with two stars; and four isopeptides (#7, #12, #14, #15) with fragment ions from only one peptide chain are marked with one star. The latter were detected only with low precursor abundance and can thus be considered less important reactive sites, at least for the early stage of the reaction.
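The three acceptance conditions can be written as a simple filter over candidate records. In this sketch, the `Candidate` record and its `sequence_confirmed` flag are hypothetical stand-ins for the manual blank check and targeted MS/MS review; only the ±20 ppm tolerance comes from the text.

```python
"""Sketch: the three acceptance conditions for software-proposed cross-links."""
from dataclasses import dataclass


@dataclass
class Candidate:
    observed_mass: float
    theoretical_mass: float
    in_blank: bool            # (1) mass also present in blank samples?
    sequence_confirmed: bool  # (3) sequence confirmed by targeted HRMS?


def ppm(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def accept(c, tol_ppm=20.0):
    """True only if all three conditions are met."""
    return (not c.in_blank
            and abs(ppm(c.observed_mass, c.theoretical_mass)) <= tol_ppm  # (2)
            and c.sequence_confirmed)
```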
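The logic used to tell β K107 from β K113 can be sketched by computing singly charged b- and y-ions of chain HKEMPFPK under each linkage hypothesis: any fragment that contains the linked lysine carries the partner peptide VLPVPQK minus NH3. This simplified model uses standard monoisotopic residue masses and ignores multiply charged ions and fragments of the partner chain, so it does not reproduce Table 2; it only shows which ions can discriminate between the two sites (positions 2 and 8 of the peptide correspond to K107 and K113).

```python
"""Sketch: linkage-diagnostic b/y ions for a Gln-Lys cross-linked peptide."""

RESIDUE = {"H": 137.05891, "K": 128.09496, "E": 129.04259, "M": 131.04049,
           "P": 97.05276, "F": 147.06841, "V": 99.06841, "L": 113.08406,
           "Q": 128.05858}
WATER = 18.010565
PROTON = 1.007276
NH3 = 17.0264


def peptide_mass(seq):
    return sum(RESIDUE[a] for a in seq) + WATER


def fragment_ions(seq, link_pos, partner_mass):
    """Singly charged b/y ions; fragments containing 1-based `link_pos`
    carry the partner peptide mass minus NH3."""
    shift = partner_mass - NH3
    n = len(seq)
    ions = {}
    for i in range(1, n):
        b = sum(RESIDUE[a] for a in seq[:i]) + PROTON
        ions[f"b{i}"] = b + shift if link_pos <= i else b
        y = sum(RESIDUE[a] for a in seq[n - i:]) + WATER + PROTON
        ions[f"y{i}"] = y + shift if link_pos > n - i else y
    return ions


def diagnostic_ions(seq, pos_a, pos_b, partner_mass, tol=0.02):
    """Ion names whose m/z differs between the two linkage hypotheses."""
    ions_a = fragment_ions(seq, pos_a, partner_mass)
    ions_b = fragment_ions(seq, pos_b, partner_mass)
    return sorted(name for name in ions_a if abs(ions_a[name] - ions_b[name]) > tol)


partner = peptide_mass("VLPVPQK")  # chain carrying beta Q175
print(diagnostic_ions("HKEMPFPK", 2, 8, partner))
# -> ['b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'y1', 'y2', 'y3', 'y4', 'y5', 'y6']
```

Only the ions in this list can support one site over the other; b1 and y7 have the same mass under both hypotheses.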

Evaluation of the identified isopeptides
The detected isopeptides were also divided into two groups: (I) homogeneous isopeptides, formed between molecules of the same casein type, and (II) heterogeneous isopeptides, formed between two different casein types. Both groups contain a similar number of isopeptides, indicating no preference for either type of cross-link. With respect to the steric arrangement of caseins in the casein micelle, this result indicates that caseins are not organized in uniform blocks consisting of only one casein type; rather, the casein micelle can be imagined as a protein aggregate of more random shape and internal arrangement. However, the group of homogeneous isopeptides consists mainly of linkages between β-caseins. For sodium caseinate, this could be explained by the previously reported self-association of free β-casein [29][30][31]. These β/β cross-links were also detected in MC, indicating that β-casein molecules are neighbors in casein micelles as well, which is consistent with the proposed formation of core polymers as initial protein aggregates in casein micelle formation [32,33].
Undeniably, especially the partially sequenced isopeptides (marked with one or two stars in Table 3) raise the question of whether their masses truly result from isopeptide bond formation or merely from incomplete tryptic digestion. Incomplete digestion is indeed highly probable, because isopeptide formation increases steric hindrance. Because the data analysis software considers non-specific as well as missed cleavages, some isopeptides were detected with two different molecular masses, depending on the course of digestion. For instance, isopeptide #4 (Table 3) was identified as a cross-link between β-casein K176 and β-casein Q175, but was detected with two different peptide masses, one of them 1541.9739 Da (#4(I)).

Posttranslational modifications alter the monoisotopic mass of a peptide. Phosphorylation in particular is a common modification of caseins, but sample preparation could cause additional modifications such as deamidation or dehydration. The software also considered these modifications. In the supplementary data, Figure S1 shows the modifications identified in the ordinarily formed casein peptides as yellow squares in the respective amino acid sequences. Some of the detected isopeptides, for example isopeptide #7 (Table 3), also contain phosphorylated serine residues, indicated as "J" (phosphoserine) instead of "S" (serine) in the respective amino acid sequence.

In summary, we identified 15 isopeptides formed in the early stage of mTG cross-linking of caseins and specified the exact lysine and glutamine residues involved in the reaction. β-casein participated in most of the isopeptides identified in this study, which underlines the highly flexible structure of the β-casein molecule. However, the most interesting result might be that nearly all of the identified isopeptides were found in both non-micellar and micellar casein.
Hence, it can be concluded that, in the initial phase of mTG cross-linking, the same reactive sites are available in casein micelles as in sodium caseinate, which emphasizes the high porosity of this natural nanostructure. Only isopeptide #14, formed between αs1 K34 and αs2 Q101, was detected exclusively in sodium caseinate.

Comparison to previously reported reactive sites
The identified reactive moieties αs1 Q13, αs1 Q130, β Q54/56, β Q175 and κ K24 have already been reported as mTG substrates in other studies (see Table 4) [12][13][14][34]. However, the comparison of our results with the literature also reveals considerable discrepancies. We identified 14 further casein moieties that had not previously been described as reactive sites in the enzymatic cross-linking of caseins, whereas 17 moieties reported as reactive sites in the literature were not confirmed in this study. This large difference is most likely due to the different analytical techniques. In the studies mentioned above, labeling substances bearing primary amine groups that mimic the lysine side chain (e.g., [14C]-putrescine, [14C]-monodansylcadaverine, N-(glucose-glucose-glucitol-1)-cadaverine) were used as reaction partners to identify reactive glutamine residues [12][13][14]. To suppress protein cross-linking, lysine residues were sometimes amidinated or succinylated, which could cause electrostatic repulsion and thus considerably alter the native protein structure [13]. Therefore, the reactive sites reported in the literature are strongly influenced by marker solubility and by protein stability in the presence of high marker concentrations. Moreover, the identification of reactive lysine residues was previously hindered by the lack of appropriate marker substances: Christensen et al. [12] described two reactive lysine residues in κ-casein (κ K21, κ K24) based only on mass calculations for a detected isopeptide mass. In contrast, HRMS allowed us to identify the reactive lysine residues precisely as well. Furthermore, the source of transglutaminase might also affect the preferred reactive sites. Finally, the stage of the enzymatic reaction might not be comparable between studies, as we focused on the early stage of protein cross-linking.
Accordingly, in the present study, the enzymatic cross-linking of casein micelles was performed under native conditions (no marker, SMUF as solvent) to preserve the native protein structure, and under mild reaction conditions (1 U/g casein; 40 °C, 60 min) to identify only the "hottest" reactive sites. The formed isopeptides were precisely identified by peptide sequencing via HRMS, which provided information not only about the reactive moieties but also about which residues are positioned close enough to each other to form an isopeptide. With this combination of HRMS, data analysis and data quality control, one trimer formed between αs2-casein K32, β-casein Q54/56 and β-casein Q175/K176 was also identified (see Table 3, isopeptide #15).

Influences of neighboring amino acids on enzymatic activity
To draw conclusions about the substrate specificity of mTG, Table 5 summarizes all reactive glutamine and lysine residues identified in this study together with their five neighboring amino acids on the N-terminal and C-terminal sides. Consecutive glutamine residues, previously reported as good transglutaminase substrates in other proteins [35][36][37][38], were also found within the reactive sequences in our study (αs1 Q130/131; β Q54/56). Glutamine β Q175 was a major reactive site, involved in the formation of homogeneous isopeptides (#2, #3, #4, #5) as well as cross-links to all other casein types (#10, #11, #13, #15). Tokai et al. [34] also reported the tryptic peptide β-casein f(170-176) VLPVPQK as the most reactive one in the enzymatic cross-linking of caseins by mTG. Our study further suggests that this region not only provides an amino acid sequence suitable for substrate recognition but is probably also located in a highly flexible part of the protein, since it is cross-linked to all other caseins. The reactivity of the directly adjacent lysine residue β K176 supports this finding, as it forms homogeneous isopeptides (#1, #4) and heterogeneous isopeptides with αs-caseins (#8, #15). Interestingly, Q175 is substituted by E175 in the genetic variant A1 of β-casein, so it is tempting to speculate that β-casein A1 could be a less efficient substrate for mTG. To the best of our knowledge, no experimental data on the suitability of β-casein A1 and A2 as mTG substrates have been reported so far. Therefore, further studies should address the impact of genetic variations on enzymatic cross-linking.
Coussons et al. [39] compared the amino acid sequences of proteins whose reactive glutamine residues had previously been reported in the literature and summarized similarities that could be used to predict the reactivity of glutamine side chains. Accordingly, hydrophobic amino acids are frequently present on the C-terminal side of modified glutamine. Within a window of five amino acids on the N-terminal and C-terminal sides, the reactive glutamines identified in our study are surrounded by many aliphatic amino acids (Table 5). Reactive lysines are likewise located in hydrophobic protein regions. Additionally, Coussons et al. [39] concluded that, in proteins with little secondary structure, positively charged residues at positions −5, +3, +4 and +5 and negatively charged residues at positions −4, +4 and +5 relative to glutamine (position 0) prevent enzymatic cross-linking. Except for αs1 Q9 and αs1 Q13, our results agree with this hypothesis, showing that the observations of Coussons et al. [39] can serve as guidelines for predicting the reactivity of glutamine residues in new proteins, but cannot replace experimental data. In the vicinity of reactive lysines, our results show that positively and negatively charged moieties are randomly present or absent, so that they appear to be neither a discouraging nor a promoting feature for cross-linking. Besides, cross-linked regions are predominantly surrounded by multiple proline residues, indicating that reactive sites could be located within loop structures.
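The ±5 residue windows of the kind listed in Table 5 can be extracted with a small helper. In this sketch, positions use the numbering of the supplied sequence (via `start`), residues beyond the ends are padded with "-", and the example uses only the tryptic peptide β-casein f(170-176) rather than the full protein, so the C-terminal side is padded; `window` is an illustrative name.

```python
"""Sketch: +/-5 residue window around a reactive residue."""


def window(seq, pos, flank=5, start=1, pad="-"):
    """Return (N-side, residue, C-side) around `pos`; seq[0] has number `start`."""
    i = pos - start
    if not 0 <= i < len(seq):
        raise IndexError("position outside sequence")
    left = seq[max(0, i - flank):i].rjust(flank, pad)
    right = seq[i + 1:i + 1 + flank].ljust(flank, pad)
    return left, seq[i], right


print(window("VLPVPQK", 175, start=170))  # -> ('VLPVP', 'Q', 'K----')
```

With the full mature protein sequence and `start=1`, the same call would return the complete neighborhood on both sides.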
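The charge criteria attributed to Coussons et al. [39] can be sketched as a screening function. The offsets come from the summary above; counting only K and R as positive and D and E as negative (ignoring His and phosphoserine) is an assumption of this sketch, and the test sequences are hypothetical. As stated above, such a screen offers a prediction, not a substitute for experimental data.

```python
"""Sketch: screening a glutamine's neighborhood with charge criteria."""

POSITIVE = set("KR")  # assumption: His and termini not counted
NEGATIVE = set("DE")  # assumption: phosphoserine not counted
POS_BLOCKING = (-5, 3, 4, 5)
NEG_BLOCKING = (-4, 4, 5)


def predicted_unreactive(seq, q_index):
    """True if a charged residue sits at a blocking offset from seq[q_index] (0-based)."""
    if seq[q_index] != "Q":
        raise ValueError("residue at q_index is not glutamine")

    def residue_at(offset):
        j = q_index + offset
        return seq[j] if 0 <= j < len(seq) else ""

    return (any(residue_at(o) in POSITIVE for o in POS_BLOCKING)
            or any(residue_at(o) in NEGATIVE for o in NEG_BLOCKING))


# Hypothetical sequences with the glutamine at index 5
print(predicted_unreactive("AAAAAQAAKAA", 5))  # K at +3 -> True
```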

Localization of reactive sites within casein molecules
To obtain information on the spatial localization of the reactive sites, Fig. 4 shows them mapped onto the predicted 3D models of the individual caseins described elsewhere [32,40-43].
Most notably, the identified reactive sites are primarily located in less structured regions or near β/γ-loops. Exceptions are αs2 K32 and κ K24, which are located in an α-helix and a β-sheet, respectively. Nevertheless, all of the reactive sites appear to be exposed to the solvent, which has previously been reported as a requirement for cross-linking [39,44]. Kumosinski et al. [40] described the molecular model of β-casein as "crab-like," with two hydrophilic "crab arms" (I: 28-55; II: 85-119) and the hydrophobic protein backbone as the "crab body." The hydrophobic backbone forms many loop structures through which water can easily diffuse. The reactive lysine K169, as well as the most reactive residues K176 and Q175, are located within this protein region. As plasmin can hydrolyze the hydrophilic crab arms at positions 28-29, 105-106 and 107-108 [45], these regions are presumably also well accessible to mTG, which is consistent with the reactive residues located there (K107, K113, Q54, Q56). In contrast, the reactive sites of αs1-casein are not spread over the whole molecule but are located in the N-terminal region. This region is followed by two "dog-leg" structures containing antiparallel β-sheets (Fig. 4) [32]. The first comprises the sequence 136-159 and promotes αs1-casein dimerization through sheet-sheet interactions, as marked by one asterisk in Fig. 4 [32]. Consequently, dimerization might bring the regions K105-K132 of two αs1-casein molecules into close proximity, enabling the formation of the homogeneous isopeptide #7 (αs1 K105-αs1 Q130/131). The proximity of αs1-caseins would be further supported by the binding of phosphoserine residues, mainly concentrated in the region 41-75 (marked by two asterisks in Fig. 4), to colloidal calcium phosphate. Via the second dog-leg structure of αs1-casein (162-175; three asterisks in Fig. 4), two αs1-casein dimers are expected to associate with one κ-casein, each with one of the two dog-leg structures of κ-casein (I: 21-34, II: 39-55) [32]. This αs1-κ-associate is described as an intermediate in the formation of a submicellar structure, offering four open sectors in its interior. Two of these sectors are hydrophobic and may accommodate four β-casein molecules through hydrophobic interactions, whereas the other two sectors are hydrophilic and thus allow water and enzymes to diffuse through the associate [32]. The single reactive site identified in κ-casein (K24) lies in the first of these dog-leg structures (I: 21-34), which comes into close proximity to other caseins, especially during submicellar assembly.

Fig. 4 3D models with secondary structure features (α-helices in yellow, β-sheets in violet) and annotated identified reactive sites (glutamine residues in blue, lysine residues in red); *region for interaction with the "dog-leg" structure of κ-casein; **region for self-association of αs1-casein. The 3D models were created with UCSF Chimera version 1.13.1 based on previously published data [32,40-43].
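Checking which structural region contains a given reactive residue is a simple interval lookup. The Python sketch below encodes only the αs1- and κ-casein residue ranges quoted in the text from [32]; the dictionary layout, the region labels and the function name `regions_of` are hypothetical choices for illustration.

```python
from __future__ import annotations

# Residue ranges (1-based, inclusive) as quoted in the text from [32].
# Hypothetical data structure for illustration only.
REGIONS: dict[str, dict[str, tuple[int, int]]] = {
    "alpha_s1": {
        "phosphoserine cluster": (41, 75),
        "dog-leg I (dimerization)": (136, 159),
        "dog-leg II (kappa-casein contact)": (162, 175),
    },
    "kappa": {
        "dog-leg I": (21, 34),
        "dog-leg II": (39, 55),
    },
}


def regions_of(protein: str, residue_no: int) -> list[str]:
    """Names of all annotated regions of `protein` that contain `residue_no`."""
    return [
        name
        for name, (first, last) in REGIONS[protein].items()
        if first <= residue_no <= last
    ]


if __name__ == "__main__":
    for protein, site in (("kappa", 24), ("alpha_s1", 105), ("alpha_s1", 130)):
        found = regions_of(protein, site)
        print(protein, site, found or "outside annotated regions")
```

The lookup places κ K24 in dog-leg I, while the αs1 sites K105 and Q130 fall outside the annotated dog-leg and phosphoserine regions.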

Weighting the identified reactive sites
As mentioned above, most of the identified reactive sites were detected equally in non-micellar casein, EC and MC. Thus, the micellar structure has no pronounced effect on enzymatic cross-linking in the initial phase. One exception is isopeptide #14, formed by a linkage between αs1 K34 and αs2 Q101, which was found only in sodium caseinate. In casein micelles, the αS-caseins in particular interact via their multiple phosphoserine residues with colloidal calcium phosphate, forming a skeletal structure. Accordingly, αS-caseins have been reported to be less accessible to enzymatic cross-linking in micellar than in non-micellar casein [10,11,46]. To further differentiate the reactive sites and to obtain more information about differences in cross-linking between non-micellar casein and MC, the amount of each individual isopeptide formed with increasing incubation time was estimated. MC and sodium caseinate were incubated with mTG for 15, 30, 45 and, as before, 60 min. EC was again obtained by ultracentrifugation of the cross-linked casein micelles. All samples were digested with trypsin, but this time without prior purification of dimers by SEC. Subsequently, the masses of the isopeptides identified above were searched for and detected by HRMS. The increase in peak areas with increasing reaction time provides additional proof of the results: only the peak areas of isopeptides are expected to increase with reaction time as cross-linking proceeds, whereas other peak areas should remain nearly constant. As an example, Fig. 5 shows the extracted ion chromatograms of isopeptide #10 (I) over incubation time, and Figures S2 and S3 in the supplementary data summarize the formation of each identified isopeptide over reaction time. However, each peptide or isopeptide has a different ionization efficiency, resulting in unequal peak areas that hamper direct comparison of the results.
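Searching HRMS data for isopeptides requires their expected masses. Because mTG links a glutamine side chain to a lysine ε-amino group with release of ammonia, the neutral mass of an isopeptide is the sum of both peptide masses minus NH3 (17.0265 Da). The Python sketch below illustrates this arithmetic with standard monoisotopic residue masses; the function names are hypothetical, and pairing 170VLPVPQK176 with itself only exercises the arithmetic and does not correspond to a specific isopeptide of this study. In practice, trypsin typically does not cleave after a cross-linked lysine, so the lysine-containing partner usually carries a missed cleavage; the sketch simply takes both sequences as given.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O added once per linear peptide
AMMONIA = 17.026549  # NH3 released when the isopeptide bond forms
PROTON = 1.007276    # mass of a proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def isopeptide_mass(gln_peptide: str, lys_peptide: str) -> float:
    """Neutral mass of two peptides joined by one epsilon-(gamma-glutamyl)lysine bond."""
    return peptide_mass(gln_peptide) + peptide_mass(lys_peptide) - AMMONIA


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = isopeptide_mass("VLPVPQK", "VLPVPQK")
    print(round(peptide_mass("VLPVPQK"), 4))  # 779.4905
    print(round(m, 4))                        # 1541.9545
    print(round(mz(m, 2), 4))                 # 771.9845
```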
Hence, peak areas were standardized by expressing the peak area of each isopeptide as a ratio to the peak area of the peptide αs1-145FYPELFR151, which is affected neither by enzymatic cross-linking nor by posttranslational modification and thus serves as an efficient "internal" reference peptide. Figure 6 shows a heat map of the resulting isopeptide-to-reference ratios; the exact values are given in Table S7 of the supplementary data. Most notably, the majority of isopeptides (#1, #3, #4, #6, #9, #11, #12, #13, #14, #15) are no longer detected or are detected only with small peak areas, as indicated by the grey areas in the map. This is attributable to the omission of the dimer isolation step. Consequently, purification of the cross-linked proteins is an important and necessary step, in particular for identifying "hot" reactive sites formed in the initial phase of the reaction. Nevertheless, tracking the peak areas over reaction time provides additional evidence for the correct identification of the reactive sites. The results revealed two groups of isopeptides that might be described as the "hottest" reactive sites, because they were well detected even without purification. These two groups comprise isopeptides formed preferentially either in EC or in SC. In particular, isopeptides #2 and #7, formed predominantly in EC, demonstrate that ECs do not react entirely like the free caseins in sodium caseinate, and thus that the equilibrium between colloidal and soluble caseins in casein micelle suspensions influences the cross-linking reaction in some way. It can be hypothesized that these reactive sites are well accessible to mTG in casein micelles. After cross-linking, the corresponding proteins might no longer adopt a steric conformation suitable for incorporation into the micelle, resulting in the release of the cross-linked micellar region into the soluble phase.
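The standardization and classification behind a heat map like Fig. 6 can be sketched in a few lines of Python. In this sketch the isopeptide area is divided by the reference-peptide area of the same run and multiplied by 1,000, as described for Fig. 6. Two further choices are assumptions rather than the authors' procedure: the class limits 5, 10, 15 and 20 are applied to this scaled ratio, and a positive least-squares slope over 15 to 60 min serves as a simple numerical stand-in for the visual check that isopeptide signals grow with reaction time. The peak areas in the usage example are hypothetical, not measured data.

```python
from __future__ import annotations

TIMES_MIN = (15, 30, 45, 60)  # incubation times used in the study


def normalized_ratios(iso_areas: list[float], ref_areas: list[float],
                      factor: float = 1000.0) -> list[float]:
    """Isopeptide peak area divided by the reference-peptide peak area
    (alpha_s1 145-FYPELFR-151) of the same run, scaled by `factor`."""
    if len(iso_areas) != len(ref_areas):
        raise ValueError("one reference area is needed per isopeptide area")
    return [factor * iso / ref for iso, ref in zip(iso_areas, ref_areas)]


def heat_map_class(value: float) -> str:
    """Assign a scaled ratio to one of six classes (assumed limits 5, 10, 15, 20)."""
    if value == 0:
        return "0"  # not detected (grey)
    for limit in (5, 10, 15, 20):
        if value < limit:
            return f"<{limit}"
    return ">=20"


def increases_with_time(values: list[float], times=TIMES_MIN) -> bool:
    """True if the least-squares slope of `values` against `times` is positive."""
    if len(values) != len(times):
        raise ValueError("one value is needed per time point")
    n = len(values)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var > 0


if __name__ == "__main__":
    # Hypothetical peak areas, NOT data from this study.
    ratios = normalized_ratios([2e5, 5e5, 8e5, 1.1e6], [1e8] * 4)
    print(ratios)                               # [2.0, 5.0, 8.0, 11.0]
    print([heat_map_class(r) for r in ratios])  # ['<5', '<10', '<10', '<15']
    print(increases_with_time(ratios))          # True
```

Normalizing each run to its own reference peptide cancels run-to-run differences in injected amount and spray stability. It does not remove the differences in ionization efficiency between isopeptides, so the scaled ratios are best compared for the same isopeptide across samples and time points.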
No reactive site showed preferential cross-linking in MC. This supports the view that the micellar structure may have some impact on enzymatic cross-linking, although, at least in the initial phase, this impact is less pronounced than expected. Moreover, isopeptides #5, #8 and #10 were preferentially formed in sodium caseinate, indicating that, within the micelle structure, the reactive sites of the cross-links β K107-β Q175, αs1 Q13-β K176 and αs1 K105-β Q175 might either be unfavorably arranged relative to each other or be located in a subunit that is less accessible to mTG.
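Deciding whether an isopeptide is "preferentially" formed in one sample amounts to comparing its normalized ratios across MC, EC and SC. The text does not state a numerical criterion, so the Python sketch below uses a hypothetical rule: a sample is reported as preferred only if its ratio (for example, the ratio at 60 min or the sum over all incubation times) exceeds that of every other sample at least two-fold. Both the threshold and the function name `preferred_sample` are illustrative assumptions.

```python
from __future__ import annotations


def preferred_sample(ratios: dict[str, float], min_fold: float = 2.0) -> str | None:
    """Return the sample whose ratio exceeds all others by at least `min_fold`.

    `ratios` maps a sample label (e.g. "MC", "EC", "SC") to a normalized
    isopeptide/reference ratio. Returns None if no sample clearly dominates.
    The 2-fold default is an arbitrary, hypothetical threshold.
    """
    if len(ratios) < 2:
        raise ValueError("need at least two samples to compare")
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    if top <= 0:
        return None  # not detected in any sample
    if runner_up == 0 or top / runner_up >= min_fold:
        return best
    return None


if __name__ == "__main__":
    # Hypothetical ratios, NOT data from this study.
    print(preferred_sample({"MC": 1.0, "EC": 6.0, "SC": 2.0}))  # EC
    print(preferred_sample({"MC": 4.0, "EC": 5.0, "SC": 4.5}))  # None
```

A fixed fold-change threshold keeps the classification reproducible, but its value should be justified against the replicate variability of the peak areas.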

Fig. 6 Heat map showing the ratio of the peak area of each isopeptide to the peak area of the reference peptide αs1-145FYPELFR151, multiplied by a factor of 1,000. Results were classified into six groups, shaded from 0 (grey) to over 20 × 10³ (bright red) in steps of 5 × 10³.

Concluding remarks
In conclusion, our study demonstrates a suitable analytical method for evaluating the reactive sites of enzymatic cross-linking without labeling and while maintaining the natural protein conformation. Moreover, both the lysine and the glutamine site of each cross-link were identified, which provides further information about the steric arrangement of, and interactions between, individual caseins. The two residues β Q175 and β K176 were identified as the most flexible reactive sites for the formation of isopeptides in both micellar casein and sodium caseinate in the initial phase of the reaction. The residues αs1 K34 and αs2 Q101 appeared to be less accessible to mTG in casein micelles, although they are potential reactive sites, as identified in sodium caseinate. Most notably, our study revealed that the casein micelle structure does not affect enzymatic cross-linking in the initial phase of the reaction as dramatically