Identification of the regulatory proteins in human pancreatic cancers treated with Trichostatin A by 2D-PAGE maps and multivariate statistical analysis
In this paper, principal component analysis (PCA) is applied to a spot quantity dataset comprising 435 spots detected in 18 samples belonging to two different cell lines (Paca44 and T3M4) of control (untreated) and drug-treated pancreatic ductal carcinoma cells. The aim of the study was the identification of the differences occurring between the proteomic patterns of the two investigated cell lines and the evaluation of the effect of the drug Trichostatin A on the protein content of the cells. PCA turned out to be a successful tool for the identification of the classes of samples present in the dataset. Moreover, the loadings analysis allowed the identification of the differentially expressed spots, which characterise each group of samples. The treatment of both the cell lines with Trichostatin A therefore showed an appreciable effect on the proteomic pattern of the treated samples. Identification of some of the most relevant spots was also performed by mass spectrometry.
Keywords: PCA; Chemometrics; Human pancreatic tumour; Trichostatin A; Protein identification
Since each cell or biological fluid has a rich protein content (often comprising thousands of proteins of different structure and size), an effective method for achieving their separation is necessary. In the field of proteomics [1, 2], the separation of proteins is usually achieved by two-dimensional (2D) electrophoresis, a very powerful tool which performs two successive electrophoretic runs: the first run (through a pH gradient) separates the proteins with respect to their isoelectric point, while the second run (through a porosity gradient or a highly sieving, constant concentration gel) separates them according to their molecular mass. This technique produces a two-dimensional map, a so-called 2D-PAGE (polyacrylamide gel electrophoresis), with the proteins appearing as spots spread all over the gel matrix. A 2D-PAGE map may thus be considered as a “snapshot” of the protein content of the investigated cell at a given point of its life cycle.
In this new post-genomic and proteomic era, the investigation of the protein content of different cell types has become fundamental. In fact, the physiological state of a particular cell or tissue is related to its protein content, and the onset of a particular disease may cause differences in the proteins contained in the pathological tissue: these differences may consist of changes in the relative abundance or even in the appearance/disappearance of some proteins [3–11]. The comparison of 2D-PAGE maps belonging to healthy subjects with samples belonging to individuals affected by any pathology thus becomes a fundamental tool for both diagnostic and prognostic purposes [3–11]. The 2D-PAGE technique is also widely applied in the field of drug development [12–15], especially for cancer: two-dimensional gel-electrophoresis may be used to investigate if the treatment with a particular drug has played the expected role on the protein content of the pathological cell and to evaluate which effect was produced (e.g. up- or down-regulation, appearance/disappearance of pathological chains).
The high complexity of the sample, which can produce maps with thousands of spots
The complex sample pre-treatment, characterised by several purification/extraction steps, which may contribute to the appearance of maps with spurious spots due to accidental chemical modifications
The small differences that often occur between the 2D-PAGE maps of treated and reference samples, which are much more difficult to recognise in complex maps.
The 2D-PAGE images to be compared are aligned, so that all images are reduced to the same size. This step needs the choice of at least two positively identified spots in all the maps; the maps are then matched to each other on the basis of the position of these two spots
The spots present on each map are independently revealed
The maps are matched to each other in order to identify the common information (spots present in all the maps) and the differing information (spots detected only in some of the samples). If the comparison is performed on a set of replicate maps, this step produces a “synthetic” map which summarises the common information and contains only the spots present in all the compared maps.
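The alignment step based on two positively identified spots can be illustrated with a small sketch. It assumes that the alignment is a similarity transform (uniform scaling, rotation and translation), which is the most general transform that two landmark spots determine exactly; the source does not state which transform the image-analysis software actually uses, and the function names and coordinates below are hypothetical.

```python
def similarity_from_two_landmarks(src_a, src_b, ref_a, ref_b):
    """Return (a, b) such that z_ref = a * z_src + b maps both landmarks exactly.

    Points are (x, y) tuples treated as complex numbers x + iy, so that the single
    complex coefficient `a` encodes uniform scaling and rotation together.
    """
    s_a, s_b = complex(*src_a), complex(*src_b)
    r_a, r_b = complex(*ref_a), complex(*ref_b)
    if s_a == s_b:
        raise ValueError("the two landmark spots must be distinct")
    a = (r_b - r_a) / (s_b - s_a)   # scale and rotation
    b = r_a - a * s_a               # translation
    return a, b


def align_spots(spots, a, b):
    """Apply the transform z -> a * z + b to a list of (x, y) spot positions."""
    aligned = []
    for x, y in spots:
        z = a * complex(x, y) + b
        aligned.append((z.real, z.imag))
    return aligned


# Hypothetical coordinates: the same two landmark spots located on a map and on the reference.
a, b = similarity_from_two_landmarks((0, 0), (1, 0), (10, 10), (12, 10))
aligned = align_spots([(0.5, 0.5)], a, b)
```

With only two landmarks the transform cannot correct local gel distortions; that would need more landmarks and a more flexible warping model.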
The large amount of information produced by the comparison of 2D maps can be investigated by modern multivariate techniques such as principal component analysis (PCA) [20–22], classification methods [23, 24] and multidimensional scaling (MDS).
Our research group has developed a new method based on both fuzzy logic and classification methods for the comparison of the proteomic pattern of classes of 2D-PAGE maps [26, 27]. This method has also been applied together with MDS for the study of 2D maps from control and diseased individuals. Different proteomic patterns have also been investigated by the use of three-way principal component analysis.
PCA has been applied to the comparison of 2D-PAGE maps on the basis of the spot volume since the mid-1980s by Anderson et al. in the USA and Tarroux et al. in France. Recently, it has been applied to the study of DNA and RNA fragments of several biological systems [32–35] and to the characterisation of proteomic patterns of different classes of tissues [36–41]. Another recent application of PCA is the characterisation of the anticancer activity of bohemine, a new olomoucine-derived synthetic cyclin-dependent kinase inhibitor, by Kovarova et al.
In this paper, PCA is applied to a dataset comprising 18 samples belonging to two different cell lines (Paca44 and T3M4) of human pancreatic cancer before and after treatment with a new drug (Trichostatin A). This approach focuses on the evaluation of the efficacy of the drug (reflected in a difference in the protein content of control and treated samples) and on the identification of the differences occurring between the samples (control/treated samples and Paca44/T3M4 cell lines). Some of the proteins responsible for the identified differences in the control and treated Paca44 samples were also characterised by means of mass spectrometry with the matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) technique.
Principal component analysis
Principal component analysis [20–22] is a multivariate statistical method which allows the representation of the original dataset in a new reference system characterised by new variables called principal components (PCs). Each PC has the property of explaining the maximum possible amount of residual variance contained in the original dataset: the first PC explains the maximum amount of variance contained in the overall dataset, while the second one explains the maximum residual variance. The PCs are thus calculated hierarchically, so that experimental noise and random variations are contained in the last PCs. The PCs, which are expressed as linear combinations of the original variables, are orthogonal to each other and can be used for an effective representation of the system under investigation with a lower number of variables than in the original case. The co-ordinates of the samples in the new reference system are called scores, while the coefficients of the linear combination describing each PC (i.e. the weights of the original variables on each PC) are called loadings. The graphical representation of scores by means of PCs allows the identification of groups of samples showing a similar behaviour (samples close to one another in the graph) or different characteristics (samples far from each other). By looking at the corresponding loading plot, it is possible to identify the variables which are responsible for the analogies or the differences detected for the samples in the score plot. From this point of view, PCA is a very powerful visualisation tool, which allows the representation of multivariate datasets by means of only a few PCs identified as the most relevant.
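The relationship between scores, loadings and explained variance can be illustrated with a minimal sketch. It assumes a samples × variables matrix that is only mean-centred (whether additional scaling was applied in the study is not stated), uses NumPy's singular value decomposition rather than the UNSCRAMBLER software used by the authors, and runs on synthetic random data with the same dimensions as the dataset (18 samples, 435 spots). The names `pca` and `X_demo` are hypothetical.

```python
import numpy as np


def pca(X, n_components=3):
    """PCA of a samples x variables matrix via SVD of the column-centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centre each variable (spot)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample co-ordinates on the PCs
    loadings = Vt[:n_components].T                    # weights of the variables on each PC
    explained = s ** 2 / np.sum(s ** 2)               # fraction of variance per PC
    return scores, loadings, explained[:n_components]


# Synthetic data only: 18 samples x 435 spots, with an artificial two-group structure.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(18, 435))
X_demo[:9, :50] += 3.0

scores, loadings, explained = pca(X_demo)
cumulative = np.cumsum(explained)   # cumulative explained variance
```

Plotting one column of `scores` against another gives a score plot, while the columns of `loadings` show which variables drive each PC.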
Cluster analysis techniques allow one to investigate the relationships between the objects or the variables of a dataset in order to recognise the existence of groups. The most commonly used approaches belong to the class of agglomerative hierarchical methods, in which the objects are grouped (linked together) on the basis of a measure of their similarity. The most similar objects or groups of objects are linked first. The result of such analyses is a graph, called a dendrogram, in which the objects (x-axis) are connected at decreasing levels of similarity (y-axis). The results of hierarchical clustering methods depend on the specific measure of similarity and on the linking method.
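The agglomerative procedure can be sketched as follows. The example assumes Euclidean distance as the (dis)similarity measure and average linkage as the linking method; both are illustrative choices, since the measure and linkage used in the study are not stated here, and the function name and the four test points are hypothetical.

```python
import numpy as np


def average_linkage(X):
    """Agglomerative clustering; returns the merges as (members_a, members_b, distance)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all objects.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all pairs of members.
                d = D[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], float(d)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Four hypothetical objects forming two well-separated pairs.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
merges = average_linkage(points)
```

The merge distances, read in order, are the heights at which the branches of a dendrogram join: the closest (most similar) objects are linked first and the two pairs are joined last.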
Four replicate 2D maps of a Paca44 cell line pool
Five replicate 2D maps of a T3M4 cell line pool
Four replicate 2D maps of a Paca44 cell pool treated for 48 h with Trichostatin A
Five replicate 2D maps of a T3M4 cell pool treated for 48 h with Trichostatin A.
Principal component analysis was performed with UNSCRAMBLER (Camo Inc., version 7.6, Norway). Cluster analysis was performed with STATISTICA (Statsoft Inc., version 5.1, USA). Graphical representations were performed with both UNSCRAMBLER and STATISTICA. The 2D-PAGE maps were scanned with a GS-710 densitometer (Bio-Rad Laboratories, Hercules, CA, USA) and analysed with the software PDQuest Version 6.2 (Bio-Rad Laboratories, Hercules, CA, USA).
Chemicals and materials
Urea, thiourea, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), iodoacetamide (IAA), tributylphosphine (TBP) and sodium dodecyl sulfate (SDS) were obtained from Fluka Chemie (Buchs, Switzerland). Bromophenol blue and agarose were from Pharmacia-LKB (Uppsala, Sweden). Acrylamide, N,N′-methylenebisacrylamide, ammonium persulfate, TEMED, the Protean IEF Cell, the GS-710 Densitometer and the 17-cm-long, immobilised pH 3–10 linear gradient strips were from Bio-Rad Laboratories (Hercules, CA, USA). Ethanol, methanol and acetic acid were from Merck (Darmstadt, Germany). Trichostatin A (TSA) was obtained from Sigma–Aldrich Ltd. (St. Louis, MO, USA). A 3.3 mM solution of TSA in absolute ethanol was prepared and stored at −80°C until use.
Cell treatment with TSA
Paca44 and T3M4 cells were grown in RPMI 1640 supplemented with 20 mM glutamine and 10% (v/v) FBS (BioWhittaker, Italy) and were incubated at 37°C with 5% (v/v) CO2. Subconfluent cells were treated with 0.2 mM TSA for 48 h.
Protein extraction from cells was performed with lysis buffer (40 mM Tris, 1% v/v NP40, 1 mM Na3VO4, 1 mM NaF, 1 mM PMSF, protease inhibitor cocktail). Cells were left in lysis buffer for 30 min on ice. After centrifugation at 14,000×g at 4°C for removal of particulate material, the protein solution was collected and stored at −80°C until use.
Two-dimensional gel electrophoresis
Seventeen-centimeter-long, pH 3–10 immobilized pH gradient strips (IPG; Bio-Rad Labs., Hercules, CA, USA) were rehydrated for 8 h with 450 μL of 2D solubilizing solution (7 M urea, 2 M thiourea, 5 mM tributylphosphine, 40 mM Tris and 20 mM iodoacetamide) containing 2 mg mL−1 of total reduced/alkylated protein from sample cells. Isoelectric focussing (IEF) was carried out with a Protean IEF Cell (Bio-Rad, Hercules, CA, USA) with a low initial voltage and then by applying a voltage gradient up to 10,000 V with a limiting current of 50 μA. The total product time×voltage applied was 70,000 Vh for each strip, and the temperature was set at 20°C. For the second dimension, the IPG strips were equilibrated for 26 min by rocking in a solution of 6 M urea, 2% w/v SDS, 20% v/v glycerol, 375 mM Tris-HCl, pH 8.8. The IPG strips were then laid on an 8–18% T gradient SDS-PAGE with 0.5% w/v agarose in the cathode buffer (192 mM glycine, 0.1% w/v SDS and Tris to pH 8.3). The anodic buffer was a solution of 375 mM Tris-HCl, pH 8.8. The electrophoretic run was performed by setting a current of 2 mA for each gel for 2 h, then 5 mA/gel for 1 h, 10 mA/gel for 20 h and 20 mA/gel until the end of the run. During the whole run the temperature was set at 11°C. Gels were stained overnight with colloidal Coomassie blue (0.1% w/v Coomassie Brilliant Blue G, 34% v/v methanol, 3% v/v phosphoric acid and 17% w/v ammonium sulphate); destaining was performed with a solution of 5% v/v acetic acid until a clear background was achieved.
Protein identification by mass spectrometry
In situ digestion and extraction of peptides
The spots of interest were carefully excised from the gel with a razor blade, placed in Eppendorf tubes, and destained by washing three times for 20 min in 50% v/v acetonitrile, 2.5 mM Tris, pH 8.5. The gel pieces were dehydrated at room temperature and covered with 10 μL of trypsin (0.04 mg mL−1) in Tris buffer (2.5 mM, pH 8.5) and left at 37°C overnight. The spots were crushed and peptides were extracted in 15 μL of 50% v/v acetonitrile, 1% v/v formic acid. The extraction was conducted in an ultrasonic bath for 15 min. The sample was centrifuged at 8,000×g for 2 min, and the supernatant was collected.
The extracted peptides were loaded onto the target plate by mixing 1 μL of each solution with the same volume of a matrix solution, prepared fresh every day by dissolving 10 mg mL−1 α-cyano-4-hydroxycinnamic acid in acetonitrile/ethanol (1:1, v/v), and allowed to dry. Measurements were performed using a TofSpec 2E MALDI-TOF instrument (Micromass, Manchester, UK), operated in reflectron mode, with an accelerating voltage of 20 kV. Peptide masses were searched against the SWISS-PROT, TrEMBL and NCBInr databases using the ProteinLynx program from Micromass, Profound from Prowl, and Mascot from Matrix Science.
Results and discussion
Protein pattern analysis with the PDQuest software
The 2D gels of all the samples (Paca44 control and treated with TSA, T3M4 control and treated with TSA) were scanned with a GS-710 densitometer (Bio-Rad) and analysed with the software PDQuest. A match-set was created from the protein patterns of the 18 replicate 2D maps. A standard gel was generated from the image with the highest spot number. Spot quantities of all gels were normalized to remove non-expression-related variations in spot intensity: the raw quantity of each spot in a gel was divided by the total quantity of all the spots in that gel that were included in the standard. The results were evaluated in terms of spot optical density (OD). The analysis with PDQuest allowed two types of comparison: between the two cell lines (Paca44 versus T3M4), and between the control and TSA-treated cell lines (control versus TSA-treated), in order to detect protein variations that were at least two-fold. Student’s t-test allowed the identification of 60 spots up-regulated and 45 spots down-regulated (significance level α=0.05) in the T3M4 cell line with respect to the Paca44 cell line, and of 11 spots up-regulated (α=0.05) and two spots down-regulated in TSA-treated cell lines with respect to the control samples.
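The normalisation and the two criteria described above (at least two-fold variation, Student's t-test) can be sketched as follows. This is not PDQuest's implementation: it assumes that the raw matrix already contains only the spots included in the standard, that gels are in rows and spots in columns, and that the t statistic uses the classical pooled-variance form. All function names and the small numerical example are hypothetical.

```python
import numpy as np


def normalise_by_total(raw):
    """Divide each gel's spot quantities by that gel's total quantity (gels in rows)."""
    raw = np.asarray(raw, dtype=float)
    return raw / raw.sum(axis=1, keepdims=True)


def fold_change(group_a, group_b):
    """Per-spot ratio of mean normalised quantities, group_b relative to group_a."""
    return group_b.mean(axis=0) / group_a.mean(axis=0)


def student_t(group_a, group_b):
    """Per-spot two-sample Student's t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    va = group_a.var(axis=0, ddof=1)
    vb = group_b.var(axis=0, ddof=1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = group_b.mean(axis=0) - group_a.mean(axis=0)
    return diff / np.sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical raw quantities: 2 control gels and 2 treated gels, 3 spots each.
control = normalise_by_total([[10, 30, 60], [12, 29, 59]])
treated = normalise_by_total([[30, 20, 50], [28, 21, 51]])

fc = fold_change(control, treated)
two_fold = (fc >= 2.0) | (fc <= 0.5)   # at least two-fold up- or down-regulation
t_values = student_t(control, treated)
```

To reproduce the α=0.05 decision, each t value would be compared with the critical value of the t distribution for na + nb − 2 degrees of freedom; that lookup is left out of the sketch.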
Figures 2c and d show the results of comparison between the control and TSA-treated cell lines. In Fig. 2c the two spots down-regulated in TSA-treated cell lines (which were thus more intense in the control) are marked in red, while in Fig. 2d the 11 spots up-regulated in TSA-treated cell lines are marked in red.
Principal component analysis
Results of PCA performed on the overall dataset: explained variance (%) and cumulative explained variance (%)
PC1 explains the information related to the two cell lines
PC2 carries the information about the TSA effect, mainly for the Paca44 cell line
PC3 carries the information about the sensitivity to TSA, mainly for the T3M4 cell line.
Figure 6 represents the loadings plots of the second component: the red-coloured spots identify spots characterised by a larger optical density in the control samples or missing in the TSA-treated ones, whereas the blue-coloured spots represent the opposite situation [i.e. spots with a larger density in the treated samples or missing in the control ones (of both cell lines)]. Figure 6 (top and bottom) represents two examples of control real samples of the two cell lines (characterised by large values of the red-coloured spots) and two examples of TSA-treated real samples of the two cell lines (characterised by large values of the blue-coloured spots).
The loading plots of the third PC are represented in Fig. 7: the red-coloured circles identify those spots showing a larger optical density in the control T3M4 cells and the TSA-treated Paca44 cells or spots which are absent in the other two classes of samples; the blue-coloured circles instead show the opposite behaviour: they represent the spots which show a larger optical density in the control Paca44 cells and the TSA-treated T3M4 cells or absent in the other two classes. In this case too, an example of a real sample of each class is presented: the top figure represents two examples of real samples characterised by large values of the red-coloured spots, whereas the bottom figure reports two examples of real samples characterised by large values of the blue-coloured spots.
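The way the loadings single out the two colour-coded groups of spots can be illustrated with a short sketch. It assumes that spots are selected by the sign and magnitude of their loading on a given PC; the threshold value and the loading vector are hypothetical, and which sign corresponds to which class of samples depends on the sign of the scores in the actual analysis.

```python
def split_by_loading(loadings, threshold):
    """Return (positive, negative) lists of spot indices with |loading| >= threshold."""
    positive = [i for i, w in enumerate(loadings) if w >= threshold]
    negative = [i for i, w in enumerate(loadings) if w <= -threshold]
    return positive, negative


# Hypothetical loadings of six spots on one PC.
pc_loadings = [0.62, -0.05, -0.48, 0.10, 0.55, -0.60]
pos, neg = split_by_loading(pc_loadings, threshold=0.4)
```

Spots with loadings close to zero contribute little to the PC and are left in neither group.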
The conclusions derived by means of PCA show very good agreement with those obtained by PDQuest analysis of the 2D-PAGE maps. The spots identified by PDQuest as the most characteristic were also identified by means of PCA; in the latter case, however, a larger number of spots were identified. Analysis of 2D-PAGE maps by dedicated software usually allows the identification of only those spots which exhibit at least a two-fold variation in the protein content. PCA is a robust tool which allows the detection of variations lower than the classical two-fold, since the changes due to the natural variability of the experimental steps are explained by the last PCs, which are not taken into account. The total information obtained by PCA is thus larger than that obtained by dedicated software; for example, in the present case, the existence of the three patterns identified by PC1–PC3 (Figs. 5, 6 and 7) could not be established by conventional PDQuest analysis.
Summary of the identified proteins from Paca44 cell line 2D gels. For spot numbers, refer to Fig. 2. Columns include Exp. Mr (Da) and No. of peptides.
Tropomyosin alpha four chain (Tropomyosin 4)
Calreticulin precursor (CRP55) (Calregulin)
Tropomyosin alpha three chain (Tropomyosin 3)
Translationally controlled tumor protein (TCTP)
37,406 and 35,899; 9.0 and 8.6; 5.69 E10 and 1.17 E10; Heterogeneous nuclear ribonucleoproteins A2/B1 and Glyceraldehyde 3-phosphate dehydrogenase, liver (EC 1.2.1.12); P22626 and P04406; 54 and 48; 16 and 16
51,736 and 46,142; 5.0 and 5.0; 3.38 E12 and 4.19 E9; ATP synthase beta chain, mitochondrial precursor (EC 3.6.3.14) and protein disulfide isomerase A6 precursor (EC 5.3.4.1); P06576 and Q15084; 41 and 41; 16 and 13
51,736 and 49,671; 5.0 and 4.8; 4.24 E11 and 1.20 E10; ATP synthase beta chain, mitochondrial precursor (EC 3.6.3.14) and Tubulin beta-1 chain; P06576 and P07437; 38 and 42; 15 and 14
16,310 and 17,160; 5.5 and 5.7; 1.91 and 2.01; 2.22 E9 and 5.89 E7; ARP2/3 complex 16 kDa subunit (P16-ARC) and Stathmin (Phosphoprotein p19) (pp19) (Oncoprotein 18); O15511 and P16949; 86 and 77; 14 and 24
Deduced protein product shows significant homology to coactosin
UEV protein (ubiquitin-conjugating E2 enzyme variant)
Biological significance of some interesting identified proteins
Among the proteins which were identified by MALDI-TOF analysis, of particular interest are the down-regulated translationally controlled tumour protein (TCTP) as well as the up-regulated protein stathmin (OP18). Their roles will be briefly discussed below.
Translationally controlled tumour protein (TCTP), a three-fold down-regulated polypeptide, seems to be involved in tumour reversion, that is, the process by which some cancer cells lose their malignant phenotype. In a recent study, Tuynder et al. showed that TCTP is strongly down-regulated in the reversion processes of human leukemia and breast cancer cell lines.
Stathmin (Oncoprotein 18, OP18) was eight-fold up-regulated by the TSA treatment. Stathmin is a p53-regulated member of a novel class of microtubule-destabilizing proteins known to promote microtubule depolymerization during interphase and late mitosis. Thus, high levels of stathmin could induce growth arrest at the G2 to mitotic boundary [46, 47]. This suggests a cell cycle arrest at the G2 phase of Paca44 cells treated with TSA. Since stathmin inhibits cell proliferation via a mitotic block, its up-regulation reported here appears to be consistent with the antitumoural activity of TSA.
Principal component analysis is applied here to a dataset comprising 18 samples belonging to control (untreated) and drug-treated human pancreatic cancer cells; the samples belong to two different cell lines: Paca44 and T3M4. PCA turned out to be a successful tool for the identification of the classes of samples present in the dataset; moreover, the loadings analysis allowed the identification of the regulatory spots which characterise each group of samples. Thus, the treatment of both cell lines with Trichostatin A showed an appreciable effect on the proteomic pattern of the treated samples. The separation of the samples into four groups by means of the first three PCs was also confirmed by cluster analysis. The conclusions drawn from PCA were in good agreement with those obtained from the differential analysis provided by PDQuest.
The MALDI-TOF analysis performed on the Paca44 cell line allowed the identification of some of the spots differentially expressed in control versus treated Paca44 samples. The biological significance of some of the proteins differentially expressed upon TSA treatment is discussed.
Supported by grants from AIRC (Associazione Italiana Ricerca sul Cancro, Milano, Italy), FIRB 2001 (No. RBNF01KJHT), MIUR (Ministero dell’Istruzione, dell’Università e della Ricerca, Rome, Italy; COFIN 2003), Fondazione Cassa di Risparmio di Verona and the European Community, Grant No. QLG2-CT-2001-01903 and No. QLG-CT-2002-01196.